Intel Software Development Tools

Application Performance Snapshot

Application Performance Snapshot is an application profiling tool. It is included in the Intel Parallel Studio installation on the HybriLIT cluster.
Application Performance Snapshot lets you analyze parameters such as memory usage, the run time of each process, operating delays, etc.

To profile an application:

  1. Load a module that sets the environment variables for working with Intel Parallel Studio:

  2. Compile the application, linking the required library:

For MPI programs

for OpenMP programs

or

for a sequential program, where name_app.cpp is the name of the source file being compiled.

  3. Launch the application with aps, either directly or using a script file:

For MPI programs

for OpenMP programs

for a sequential program

  4. All collected data are placed in the folder aps_result_[launch date]. To display these data, generate an HTML report (Pic. 1, Pic. 2) using the following command:
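As a rough sketch of steps 1 to 4 for an MPI program, the sequence might look like the following. The module name, the executable name, the compiler options and the number of processes are assumptions chosen for illustration, not HybriLIT's actual values; check `module avail` on the cluster for the real module name. By default the script only prints the commands; setting DRY_RUN=0 executes them.

```shell
#!/bin/bash
# Sketch of the four profiling steps (MPI variant).
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

INTEL_MODULE="intel/<version>"   # placeholder: take the real name from `module avail`
APP_SRC="name_app.cpp"           # source file name used in the text
APP_BIN="./name_app"             # assumed name for the executable
NP=4                             # assumed number of MPI processes

# Step 1: load the Intel Parallel Studio environment.
run module load "$INTEL_MODULE"

# Step 2: compile with the Intel MPI C++ wrapper.
# For OpenMP one would use `icpc -qopenmp` instead, for sequential code plain `icpc`.
run mpiicpc -g -O2 -o "$APP_BIN" "$APP_SRC"

# Step 3: launch under Application Performance Snapshot.
# For OpenMP or sequential programs the launch is simply `aps ./name_app`.
run mpirun -n "$NP" aps "$APP_BIN"

# Step 4: generate the HTML report from the newest result folder.
RESULT_DIR="$(ls -d aps_result_* 2>/dev/null | tail -n 1)"
run aps --report="${RESULT_DIR:-aps_result_<launch date>}"
```

On the cluster, step 3 would normally be placed in a batch script file and submitted to the scheduler rather than run on the login node.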

Pic. 1. Screenshot of the Application Performance Snapshot report for an MPI application.

 

Pic. 2. Screenshot of the Application Performance Snapshot report for an OpenMP application.

 

Intel Trace Analyzer and Collector

Intel Trace Analyzer and Collector is a tool for tracing MPI processes.

Main tasks:

  • Visualizing the behavior of a parallel application;
  • Evaluating profiling statistics and load balancing;
  • Analyzing the performance of programs and code blocks;
  • Examining communication patterns and performance data in detail;
  • Detecting hotspots;
  • Decreasing execution time and increasing application efficiency.

To launch an application, repeat steps 1-2 of the Application Performance Snapshot section. Then launch the program with the -trace option. Please see an example below:

To view the results, open the ITAC GUI with the command:
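A minimal sketch of both steps, collecting the trace and then opening it, could look like this. The number of processes is an assumption; the executable name a.out is chosen to match the default trace file a.out.stf. As written, the script only prints the commands; set DRY_RUN=0 to execute them after the module is loaded and the program is compiled.

```shell
#!/bin/bash
# Sketch: collect an ITAC trace, then open it in the GUI.
# Commands are only printed unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

NP=4                 # assumed number of MPI processes
APP_BIN="./a.out"    # default executable name, which yields a.out.stf

# Collect the trace: the -trace option of Intel MPI's mpirun enables
# the Trace Collector for this run.
run mpirun -trace -n "$NP" "$APP_BIN"

# Open the resulting trace file in the Intel Trace Analyzer GUI.
run traceanalyzer "$APP_BIN.stf"
```

The GUI needs a graphical session, so on a remote cluster this typically means connecting with X forwarding enabled.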

As a result, an application window opens, in which you need to open a trace file (a.out.stf by default). An example is shown in Pic. 3.

The top of Pic. 3 shows the distribution of computational load across MPI processes and the interprocess communication (Event Timeline); the bottom of Pic. 3 shows the numeric values.


Pic. 3. Examples of communication in an MPI application.

With Message Profile, you can see which processes communicate and which communications are the most time-consuming (Pic. 4).


Pic. 4. Interprocess communication in an MPI application.

On the Collective Operations page, you can build a table showing the total time spent in collective operations in an MPI application (Pic. 5).


Pic. 5. Table of time spent on collective operations in an MPI application.

Conclusion
Application Performance Snapshot is aimed at a quick estimate of efficiency: it adds little overhead, profiles up to 32000 MPI processes, gives a quick estimate of MPI and OpenMP imbalance, and reports an overall performance figure (GFLOPS).
Intel Trace Analyzer and Collector allows detailed analysis of MPI applications: it detects communication patterns and locates specific bottlenecks in programs.