Profiling can be performed on Azure HPC VMs with various tools. Today we are going to look at MPI application profiling with AMD uProf.
AMD uProf is a tool for performance and system analysis. It can gather time-based and instruction-based profiles, and it can trace and visualize MPI processes and threads.
Profiling is the analysis of an application’s execution via the measurement of system metrics. When profiling, hardware counters can be sampled to verify the occurrence and frequency of certain hardware events. For example, these could be L1 cache misses, or a derived metric such as CPU cycles per instruction. Profiling can also include time-based measurements that relate to specific instructions in an application call stack.
The insight from profiling an application’s runtime can be used to improve latency, throughput, and scalability on HPC systems.
Modes of operation
AMD uProf is cross-platform. It provides both a CLI and a GUI. The CLI can generate a CSV report that can be analyzed later. For this experiment, however, we will use the CLI to gather metrics and then switch to the GUI to analyze and visualize the data.
Environment setup
Collecting and Generating Reports with CLI
Collect profile data:
PROF_CMD="AMDuProfCLI collect --config tbp -g --mpi --output-dir <output dir>"
mpirun <MPI flags> $PROF_CMD ./application.out
Collect MPI trace:
PROF_CMD="$AMD_PERF_DIR/AMDuProfCLI collect --trace mpi=openmpi,full -g --mpi --output-dir <output dir>"
mpirun <MPI flags> $PROF_CMD ./application.out
Generate report:
AMDuProfCLI report -g --detail --input-dir <output dir/prof dir>
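The collect and report steps above can be chained in a small wrapper script. The following is a minimal sketch, not an official workflow: the variables OUT_DIR, PROF_DIR, NP, APP and DRY_RUN and the helper functions are hypothetical names chosen for this example, and by default the script only prints the commands so you can check them before running anything.

```shell
# Sketch: build the collect and report command lines from the steps above.
# OUT_DIR, PROF_DIR, NP, APP and DRY_RUN are hypothetical placeholders.
OUT_DIR="${OUT_DIR:-$PWD/profout}"
PROF_DIR="${PROF_DIR:-<prof dir>}"   # session directory that uProf creates under OUT_DIR
NP="${NP:-4}"
APP="${APP:-./application.out}"
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the commands (default), 0 = run them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Time-based profile (tbp) with call stacks (-g) for an MPI launch (--mpi).
collect_cmd() {
    echo "AMDuProfCLI collect --config tbp -g --mpi --output-dir $OUT_DIR"
}

# $(collect_cmd) is left unquoted on purpose so it splits into separate arguments.
run mpirun -np "$NP" $(collect_cmd) "$APP"

# Generate the report from the collected session.
run AMDuProfCLI report -g --detail --input-dir "$OUT_DIR/$PROF_DIR"
```

Set DRY_RUN=0 and PROF_DIR to the actual session directory once the printed commands look right for your system.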
Note:
To avoid latency from processing too much data, refer to the following:
Other options:
The --config flag can be used to provide a config file that designates which events should be sampled. Example config files are provided within the AMD uProf directory. The --config flag also accepts predefined arguments, each of which captures a preset list of metrics. Use the following to list them:
./AMDuProfCLI info --list collect-configs
Alternatively, the --event flag can be used to pick out performance monitoring unit (PMU) events to collect. The flag can be repeated to collect more than one PMU event. To view the list of PMU events, use the following:
./AMDuProfCLI info --list pmu-events
Profiling and Tracing WRF
PROF_CMD="./AMDuProfCLI collect --config tbp --config cpi -g --mpi --output-dir <o/p dir>"
mpi_config.txt:
-np 1 $PROF_CMD ./wrf.exe
-np 119 ./wrf.exe
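One way to produce this file is to let the shell expand the profiler command while writing it, since mpirun is not expected to expand shell variables found inside an app file. The sketch below makes that assumption; OUT_DIR, TOTAL_RANKS, APPFILE and write_appfile are hypothetical names introduced for this example.

```shell
# Sketch: write mpi_config.txt with the profiler command already expanded.
# OUT_DIR, TOTAL_RANKS and APPFILE are hypothetical names for this example.
OUT_DIR="${OUT_DIR:-$PWD/profout}"
TOTAL_RANKS="${TOTAL_RANKS:-120}"
APPFILE="${APPFILE:-mpi_config.txt}"

PROF_CMD="./AMDuProfCLI collect --config tbp --config cpi -g --mpi --output-dir $OUT_DIR"

# One profiled rank plus (TOTAL_RANKS - 1) unprofiled ranks.
write_appfile() {
    cat > "$APPFILE" <<EOF
-np 1 $PROF_CMD ./wrf.exe
-np $((TOTAL_RANKS - 1)) ./wrf.exe
EOF
}

write_appfile
cat "$APPFILE"
```

Profiling a single rank keeps the amount of collected data small while the remaining ranks run unmodified.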
mpirun command:
mpirun --allow-run-as-root $PIN_PROCESSOR_LIST --rank-by slot -mca coll ^hcoll -x LD_LIBRARY_PATH -x PATH -x PWD --app mpi_config.txt
Note: The $PIN_PROCESSOR_LIST variable is a string like this:
"--bind-to cpulist:ordered --cpu-set 0,1,2,3,4,5,6,7,8", which pins the MPI ranks to the listed cores in order.
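For larger core counts, the pinning string can be generated rather than typed by hand. This is a minimal sketch that assumes the cores to use are numbered contiguously from 0; build_pin_list and NUM_CORES are hypothetical names introduced for this example.

```shell
# Sketch: build the pinning flags for cores 0..N-1.
# build_pin_list and NUM_CORES are hypothetical names for this example.
build_pin_list() {
    n="$1"
    cores=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Append a comma only when the list already has an entry.
        cores="${cores:+$cores,}$i"
        i=$((i + 1))
    done
    echo "--bind-to cpulist:ordered --cpu-set $cores"
}

PIN_PROCESSOR_LIST="$(build_pin_list "${NUM_CORES:-9}")"
echo "$PIN_PROCESSOR_LIST"
# With NUM_CORES unset, prints: --bind-to cpulist:ordered --cpu-set 0,1,2,3,4,5,6,7,8
```

If the cores you want are not contiguous, write the list explicitly instead.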
~/AMDuProf_Linux_x64_4.0.341/bin/AMDuProfCLI report -g --detail --input-dir …/profout/AMDuProf-wrf-Custom_MPI
Visualization and Analysis Overview
Please refer to the AMD uProf documentation (section 5, 5.5) for how to import and use the AMD uProf GUI.
Summary Hot Spots view:
Analyze Function Hotspots
Analyze Metrics
Shows a similar call stack, with filtering by process, thread, and module
Analyze Flame Graph
Analyze Call Graph
Sources
Maps function calls to assembly instructions for a detailed view
Trace
MPI flat profile
Time chart:
Support for hardware counters
Hardware counter VM visibility is exposed through the Virtual Performance Monitoring Unit (VPMU). VPMU is enabled on the following VM SKUs:
References