This document describes the internal structure of the MPI communication tracer, trace consumer, analytics functionality, and visualisation functionality. This includes the interceptor library, parser, analytics pipeline, frontend display architecture, and test suite.
The project has three main runtime stages:
The interceptor library records MPI traffic and writes a .mpic trace.
tools/mpi_data_parser.py converts .mpic into .mpix.
vis/ loads .mpix, streams timeline chunks on demand, renders hardware and traffic in 3D, and overlays analytics.

The repository layout:

src/
    mpi_communication_tracking.c
    mpi_communication_tracking.h
tools/
    mpi_data_parser.py
    topology_generator.py
    slurm_topology_generator.py
vis/
    index.html
    style.css
    visualiser.js
    analytics.js
    analytics-3d.js
    analytics-controls.js
tests/
    CMakeLists.txt
    ctest_driver.py
    trace_parser.py
    test_*.c
    test_*.f90
The interceptor is a shared library loaded with LD_PRELOAD. It wraps MPI calls via PMPI and writes trace records with low overhead.
Tracing begins in MPI_Init / MPI_Init_thread, and the .mpic file is written out at MPI_Finalize. The library writes two record classes; among the operations covered are MPI_Sendrecv, MPI_Gather, MPI_Scatter, and MPI_Allgather.

The intercepted routines are:

MPI_Send, MPI_Recv, MPI_Bsend, MPI_Ssend, MPI_Rsend,
MPI_Isend, MPI_Ibsend, MPI_Issend, MPI_Irsend, MPI_Irecv, MPI_Sendrecv,
MPI_Wait, MPI_Waitall, MPI_Waitany, MPI_Waitsome,
MPI_Test, MPI_Testany, MPI_Testall, MPI_Testsome,
MPI_Barrier, MPI_Bcast, MPI_Reduce, MPI_Allreduce, MPI_Gather, MPI_Scatter, MPI_Allgather.

The interceptor also tracks pending non-blocking requests internally. Instead of logging non-blocking communication only at posting time, it:
remembers the posting call (MPI_Isend, MPI_Irecv, etc.) and records the communication when the corresponding request completes in MPI_Wait, MPI_Waitall, MPI_Waitany, MPI_Waitsome, MPI_Test, MPI_Testall, MPI_Testany, or MPI_Testsome.

This allows, among other things, receives posted with MPI_ANY_SOURCE to be attributed to their actual source once the request has completed (a conceptual sketch appears at the end of this section).

The library also includes symbol-level Fortran wrappers, primarily intended for:
mpif.h and use mpi based implementations. Coverage includes non-blocking completions and request-handling routines, but it is not yet intended as a full, portable mpi_f08 interception layer.
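As a conceptual illustration of the non-blocking request tracking described above, the sketch below shows the bookkeeping idea in Python; the real implementation is the C code in mpi_communication_tracking.c, and the names used here (pending_requests, on_post, on_complete, emit_trace_record) are illustrative only.

pending_requests = {}   # request handle -> details captured at posting time
trace_records = []      # stand-in for the in-memory trace buffer

def emit_trace_record(info):
    # Stand-in for the real record writer.
    trace_records.append(info)

def on_post(request, kind, peer, nbytes, tag):
    # Called from the Isend/Irecv wrappers: remember what was posted,
    # but do not emit a trace record yet.
    pending_requests[request] = {"kind": kind, "peer": peer,
                                 "bytes": nbytes, "tag": tag}

def on_complete(request, status_source):
    # Called from the Wait*/Test* wrappers once the request has completed.
    info = pending_requests.pop(request, None)
    if info is None:
        return
    if info["kind"] == "irecv":
        # Resolve MPI_ANY_SOURCE to the actual source reported in the status.
        info["peer"] = status_source
    emit_trace_record(info)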
.mpic Binary Format

The .mpic file is the raw output of the interceptor.
The file begins with:
One per rank, containing:
For each rank:
"P2P Small Type Messages""P2P Large Type Messages"To process a trace file we provide a tool; tools/mpi_data_parser.py, which is designed to try and parse the file as created, but can also deal with malformed or truncated files, using the following approach:
This is provided so that some profiling/tracing data can still be analysed if a program run fails to complete successfully, but the primary mode of working requires a correctly formed trace file.
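The general shape of this tolerant parsing can be sketched as follows; the record layout, format string, and function name below are hypothetical stand-ins for the real .mpic structures:

import struct

RECORD_FMT = "<iiqq"                      # hypothetical fixed-size record layout
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def read_records(path):
    # Read as many complete records as possible, stopping cleanly if the
    # file is truncated (for example when a run dies before MPI_Finalize).
    # Header handling is omitted for brevity.
    records = []
    with open(path, "rb") as f:
        while True:
            blob = f.read(RECORD_SIZE)
            if len(blob) < RECORD_SIZE:
                break                     # truncated or end of file
            records.append(struct.unpack(RECORD_FMT, blob))
    return records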
tools/mpi_data_parser.py converts .mpic into .mpix and creates a visualisation-friendly analysis layer.
The parser emits a compressed .mpix container with:
The timeline is split into fixed-size event chunks and compressed with zlib. This allows the visualisation tool to load only the chunks needed for the current time window, enabling large traces to be visualised without requiring the full dataset to be in memory at once.
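A minimal sketch of this chunking scheme is shown below, assuming a hypothetical chunk size and index layout; the actual .mpix container written by tools/mpi_data_parser.py differs in detail:

import json, zlib

CHUNK_EVENTS = 10000          # hypothetical fixed number of events per chunk

def write_chunks(events, out):
    # Compress the timeline in fixed-size chunks and record each chunk's
    # offset and length so a reader can fetch individual chunks on demand.
    index = []
    for start in range(0, len(events), CHUNK_EVENTS):
        chunk = events[start:start + CHUNK_EVENTS]
        blob = zlib.compress(json.dumps(chunk).encode("utf-8"))
        index.append({"offset": out.tell(), "length": len(blob),
                      "first_event": start})
        out.write(blob)
    return index              # stored in the header so readers can seek directly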
The parser constructs an analysis object in the .mpix header.
{
"summary": { ... },
"per_rank": [ ... ],
"top_ranks_by_out_bytes": [ ... ],
"top_ranks_by_in_bytes": [ ... ],
"top_ranks_by_touch_bytes": [ ... ],
"top_links": [ ... ],
"collective_roots": [ ... ],
"barrier_spreads": [ ... ],
"patterns": [ ... ],
"issues": [ ... ],
"time_windows": [ ... ]
}
summary: trace-wide aggregate statistics.
per_rank: the corresponding per-rank breakdown.
top_links: the hottest sender/receiver links from the canonical transfer subset.
collective_roots: summaries for concentrated rooted collective traffic.
barrier_spreads: heuristic skew estimates based on repeated barrier timing order.
time_windows: coarse phase summaries with per-window statistics.
The parser currently infers several communication motifs heuristically.
Triggered when one rank dominates communication volume and degree.
Triggered when rooted collective traffic is concentrated on a small number of roots.
Triggered when traffic is dominated by short rank-distance communication.
Triggered when communication concentrates on a small set of offsets and reciprocal pairs.
Triggered when the pair graph is dense and peer counts are high.
Triggered when balanced two-way communication is detected between a small number of pairs.
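As an illustration of how one of these heuristics might be structured, the sketch below flags a hub-like motif when a single rank dominates both traffic volume and peer count; the thresholds and function name are hypothetical, not the parser's actual criteria:

def detect_hub(per_rank_bytes, per_rank_degree, volume_share=0.5, degree_share=0.75):
    # Flag a hub-like motif: one rank accounts for a dominant share of the
    # bytes and communicates with most other ranks. Thresholds are illustrative.
    total = sum(per_rank_bytes.values())
    if total == 0:
        return None
    top = max(per_rank_bytes, key=per_rank_bytes.get)
    n_ranks = len(per_rank_bytes)
    if (per_rank_bytes[top] / total >= volume_share and
            per_rank_degree[top] >= degree_share * (n_ranks - 1)):
        return {"pattern": "hub", "rank": top,
                "volume_fraction": per_rank_bytes[top] / total}
    return None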
The parser emits heuristic issue records for likely performance problems.
Current issue types include:
small_message_overhead
communication_imbalance
barrier_imbalance
synchronization_heavy
collective_root_bottleneck
link_hotspot
global_collective_heavy

Each issue may contain:
severity
score
description
ranks
pairs
metrics

Important: these are diagnostics for visualisation and first-pass investigation, not formal proof of a problem.
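For illustration, an issue record might look like the following; the values, and the presence of a type field, are assumptions rather than the parser's exact output:

{
    "type": "link_hotspot",
    "severity": "high",
    "score": 0.87,
    "description": "One sender/receiver pair carries a large share of total bytes",
    "ranks": [3, 12],
    "pairs": [[3, 12]],
    "metrics": {"bytes": 104857600, "share_of_total": 0.42}
}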
The frontend is split into several files.
visualiser.js: main application logic.
analytics.js: renders the analytics panel.
Also supports:
analytics-3d.js: renders persistent 3D analytics overlays.
Also supports:
analytics-controls.js: renders the analytics overlay controls.
index.html loads .mpix through the browser file picker.
The frontend reads the compressed header first and populates:
As playback advances, the frontend loads only the relevant chunk using the offsets stored in the header.
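The same on-demand access pattern can be sketched in Python (the frontend does the equivalent in JavaScript); the header field names here are hypothetical:

import json, zlib

def load_chunk(f, header, chunk_idx):
    # Decompress a single timeline chunk using the offset/length recorded in
    # the header, so only the current time window needs to be in memory.
    entry = header["chunk_index"][chunk_idx]      # hypothetical field name
    f.seek(entry["offset"])
    blob = f.read(entry["length"])
    return json.loads(zlib.decompress(blob))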
At each playback step, the frontend:
These are independent of playback-time communication rendering and are built from parsedData.analysis.
These remain visible even when playback is paused or the current active timeline has moved elsewhere.
When the user clicks Isolate 3D on an issue card:
After layout changes, overlays are refreshed so arcs and halos align with the new positions.
The project uses CTest-native integration tests.
Coverage includes:
MPI_ANY_SOURCE
Wait, Waitall, Waitany, Waitsome
Test, Testall, Testany, Testsome

Optional Fortran tests cover:
MPI_WAIT
MPI_WAITALL
MPI_WAITANY
MPI_TESTALL

Fortran tests are controlled by the MPI_TRACE_FORTRAN_TESTS CMake option:

cmake -S . -B build -DMPI_TRACE_FORTRAN_TESTS=AUTO
cmake -S . -B build -DMPI_TRACE_FORTRAN_TESTS=ON
cmake -S . -B build -DMPI_TRACE_FORTRAN_TESTS=OFF
cd build
ctest --output-on-failure
Common extension points in src/ include:
MPI_Fint conversion routines

Common extensions in tools/mpi_data_parser.py:
MESSAGE_TYPES
analyse_trace(data)
analysis

Analytics should be:
To add a new analytics panel card, use analytics.js and follow the existing card helper pattern.
To add a new 3D overlay, use analytics-3d.js and read the required data from parsedData.analysis.
To add new overlay controls, use analytics-controls.js and wire them to Analytics3D.configure(...).
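For parser-side extensions, a minimal sketch of how a new analytic could be added inside analyse_trace(data) is shown below; the event layout, metric name, and new analysis key are hypothetical, and the body stands in for the real function:

def analyse_trace(data):
    analysis = {}
    # ... existing summary, per_rank, patterns, issues, time_windows ...

    # Hypothetical new analytic: bytes exchanged per message-type label.
    by_type = {}
    for ev in data["events"]:                      # assumed event layout
        by_type[ev["type"]] = by_type.get(ev["type"], 0) + ev.get("bytes", 0)
    analysis["bytes_by_message_type"] = by_type    # new key for the frontend to read

    return analysis

The frontend can then pick the new key up from parsedData.analysis and render it in analytics.js or analytics-3d.js.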
MPI_THREAD_MULTIPLE hardening is still incomplete.
mpi_f08 interception is not guaranteed to be portable.

Potential next improvements include:
# Configure
cmake -S . -B build -DMPI_TRACE_FORTRAN_TESTS=AUTO
# Build
cmake --build build
# Run tests
ctest --test-dir build --output-on-failure
# Profile an application
LD_PRELOAD=$PWD/build/src/libmpi_comm_tracker.so mpirun -n 16 ./your_mpi_application
# Parse trace
python tools/mpi_data_parser.py your_mpi_application-YYYYMMDDHHMMSS.mpic hardware_map.json
# Open frontend
# Load vis/index.html in a browser and open the .mpix file
When changing one layer, remember the others:
An interceptor change usually implies parser, frontend, and test updates.
A parser analytics change may imply analytics panel and analytics 3D updates.
A frontend visual change may require new fields in analysis.
A safe workflow is:
Build, run the tests, profile a small example, parse the resulting trace, and load the .mpix trace in the browser.