2021 Code Performance Series: From analysis to insight

Tiny URL: https://tinyurl.com/performanceanalysis2021

Final experience report: https://zenodo.org/record/5155503#.YQkwETrTVH4

We are organising a follow-up/wrap-up workshop in December: https://www.dur.ac.uk/arc/training/workshop/

An ExCALIBUR Knowledge Integration Activity in collaboration with the POP CoE, Durham’s Department of Computer Science, DiRAC and the N8 CIR (N8 Centre of Excellence in Computationally Intensive Research).

We appreciate the support of EPSRC through its ExCALIBUR programme (grant no EP/V00154X/1). The workshop series is hosted and supported by Durham’s Department of Computer Science, which makes its supercomputer DINE available to the workshop free of charge. The support of the N8 Centre of Excellence in Computationally Intensive Research (N8 CIR), funded by the N8 institutions (Universities of Durham, Lancaster, Leeds, Liverpool, Manchester, Newcastle, Sheffield and York), is gratefully acknowledged. Core course content is delivered by the POP CoE: the HPC Performance Optimisation and Productivity Centre of Excellence is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 824080.

If you benefit from the workshops, we would appreciate an acknowledgement in any resulting outputs.

Performance analysis is at the core of the development of exascale software: understanding why software performs well (or not) is the basis of any informed improvement of that code. Significant resources are put into the development of performance analysis tools by both academia and industry, and significant effort is put into training by tool developers, compute centres and vendors alike. Once the data has been gathered, however, it remains a challenge to translate “raw”-ish performance data into a language that domain specialists can understand and react to: How do performance metrics map onto algorithmic phases, how do efficiency patterns correlate with particular data arrangements, how do application parameters affect the runtime characteristics, and so forth. Due to this gap between the application perspective and the performance data, performance analysis is not routinely used by software end-users, and we do not see a performance-functionality co-design attitude in many areas. Our workshop series aims to improve this situation:

  • We run a series of performance analysis workshops. They allow participating groups to apply the discussed tools directly to their codes. This yields direct benefit on the code side, but it also upskills the participants.
  • We work on a document that uncovers and documents implicit performance data presentation languages. Certain conventions for presenting performance data are established in HPC. Many charts/visualisations carry information that is implicitly clear to HPC specialists such as ExCALIBUR RSEs, but this information is not explicitly documented or digestible for non-HPC users. The choice of a data presentation style is important: it carries information and opinions, and it is often a language barrier that challenges interdisciplinary teams.
  • We discuss strategies for mapping performance data/insight onto code documentation and user data. Performance data is agnostic of user context (algorithm phases, input data structure, …). To facilitate a fruitful conversation between users and performance analysts (RSEs), these views have to be combined (e.g. via appropriate graphical representations or reporting conventions); otherwise domain scientists struggle to understand and appreciate HPC effort and the implications of domain code design decisions on performance.

Format

The workshops will take place online via Zoom. Recordings of the lecture/tutorial parts will be made available. All workshop conversations, during the sessions and in the weeks between them, will take place on Slack.

There might be an (optional) in-person wrap-up/concluding workshop.

Call for participants

Our goals:

  • Acquire the skill set to run thorough performance analyses with a multitude of tools.
  • Run multifaceted, in-depth performance analyses of particular codes brought in by the participating teams.
  • Contribute towards a performance analysis landscape review report.
  • Provide feedback to performance tool developers.
  • Help us write down the unwritten laws of performance data representation, and uncover or sketch ways to bring performance analysis data and algorithmic know-how together.

We invite teams to join our workshop series. A team consists of at least 2-3 developers and focuses on one piece of code. We hope that each team sends at least one delegate per workshop, though it is clear that members have other commitments, too. Ideally, the teams/codes accompany us through the whole workshop series.

Each workshop session/day is split into two parts: a tutorial/lecture part in the morning and a hackathon-type part in the afternoon. The mornings are for the personal development of workshop participants, i.e. for improving their performance analysis skill set, and introduce the tools. For participants with previous knowledge, we will provide advanced instructions (most of the time the tool developers themselves will be available), but we do not expect previous knowledge.

In the afternoon of each workshop (and in its aftermath), the teams apply the tools discussed directly to their code. That is, teams work with the workshop organisers on a comprehensive performance overview of their code throughout the workshop series. This can happen either during the workshop afternoons or asynchronously via Slack in the weeks between sessions; in the end, we want a detailed, in-depth analysis of each team’s code.

Further to that, we expect teams to participate actively in our write-ups. That is, we expect teams to provide us with (informal) feedback and to report on their progress and experiences throughout the workshop. Our ambition is to collaboratively write a landscape overview/report document throughout the workshop series which discusses our experiences with performance analysis tools, identifies training and feature needs, and explicitly shows which kind of (visual) feedback performance analysis tools can deliver. We will write the report, but we rely on the participating teams to provide the input.

Individuals can participate in the workshop (in particular the morning sessions), even though we are primarily interested in whole teams. We are currently investigating whether individuals (as individuals or as team members) who participate in most of the workshops can receive some kind of accreditation.

The workshop series is free of charge, but we reserve the right to close registration or to select among the applicants should we receive too many applications. Highest priority will be given to ExCALIBUR teams/projects, N8 teams, and DiRAC code development teams. However, we explicitly encourage international colleagues who work on bigger pieces of software to apply.

21/1: Intro

  • 9:00-9:45 Intro of workshop and workshop concept
  • 10:00-11:00 Participants introduce their code base (what application is used), what parallel algorithms they use, and how these algorithms are implemented (used software/libraries)
  • 11:15-12:00 Profiling methodology and some high-level tools (B. Wylie)
  • 13:00-15:00 — Workshop —
    Get teams’ codes up and running
    Define proper benchmarking testcases
    ARM performance reports
    Intel performance reports

The morning session can be found here: https://durhamuniversity.zoom.us/rec/play/V57HWVGtnxxr7AxD2RKx5xCnGoprbVB2B2ujEOchizNbGahdLG4a8cJkBIgqbWgDBOkNPyWLieETUaM9.cMmAmJjU4KcQ05SM

18/2: Parallel profiling

  • 9:00-9:30 Lessons learned and take-away from last session (two case studies)
  • 9:30-10:00 Some scalability terminology
  • 10:00-12:00 Scalasca & Score-P (B. Wylie and colleagues); a minimal instrumentation sketch follows below
  • 13:00-15:00 — Workshop —
    Apply Scalasca to participants’ code
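As a companion to the Scalasca & Score-P session, here is a minimal sketch of manual user instrumentation with Score-P, so that a named algorithmic phase appears as its own region in the Scalasca/CUBE reports. The program, region name and build/run commands are illustrative placeholders rather than workshop material; Score-P’s compiler wrappers can of course also instrument a code without any source changes.

    /* sketch.c - minimal Score-P user-region example (illustrative only).
       Build (assumption):  scorep mpicc -DSCOREP_USER_ENABLE sketch.c
       Run under Scalasca:  scalasca -analyze mpirun -np 4 ./a.out       */
    #include <mpi.h>
    #include <scorep/SCOREP_User.h>

    static volatile double sink;   /* keeps the dummy work from being optimised away */

    static void solve(int rank)
    {
        /* Mark the solver phase so it shows up as a named region in the
           CUBE/Scalasca report rather than being hidden inside main(). */
        SCOREP_USER_REGION_DEFINE(solve_region)
        SCOREP_USER_REGION_BEGIN(solve_region, "solve",
                                 SCOREP_USER_REGION_TYPE_COMMON)

        for (long i = 0; i < 10000000L; ++i)     /* stand-in for real work */
            sink += (double)(i % (rank + 2)) * 1e-9;

        SCOREP_USER_REGION_END(solve_region)
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        solve(rank);
        MPI_Finalize();
        return 0;
    }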

The morning session can be found here:

https://durhamuniversity.zoom.us/rec/share/hYUc9J9-u62h_YTqfICxeUByQT6kSHn9PKdKnEDy5JB9gErzu6YSLDliDig39ipB.LOGjxoyamZRLG7o1

11/3: Tracing with Vampir

  • 9:00-9:30 Lessons learned and take-away from last session
  • 9:30-10:00 Score-P instrumentation & measurements (recap with exercises)
  • 10:00-11:00 MPI Tracing
    • Interactive event trace visualisation with Vampir
    • Interactive profile exploration with CUBE
  • 11:00-12:15 Case studies of parallel performance analyses with CUBE & Vampir
  • 13:00-13:30 MPI Programming Wrap-up (optional refresher; a code sketch follows this list)
    • Collective vs p2p routines: What does it mean?
    • Blocking vs non-blocking programming: What does it mean?
  • 13:00-15:00 — Workshop —
    Apply Vampir to participants’ code
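To make the two refresher questions above concrete, here is a small, self-contained sketch (our own illustrative example, not workshop material): it contrasts a blocking exchange with a non-blocking one that can overlap communication with independent work, and finishes with a collective as the counterpart to the point-to-point messages. Buffer sizes and tags are arbitrary.

    /* exchange.c - blocking vs non-blocking point-to-point (illustrative sketch).
       Build/run assumption:  mpicc exchange.c && mpirun -np 2 ./a.out            */
    #include <mpi.h>
    #include <stdio.h>

    #define N 4096

    int main(int argc, char **argv)
    {
        int rank, size, peer;
        double sendbuf[N], recvbuf[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size != 2) MPI_Abort(MPI_COMM_WORLD, 1);   /* sketch assumes 2 ranks */
        peer = 1 - rank;
        for (int i = 0; i < N; ++i) sendbuf[i] = rank;

        /* Blocking p2p: MPI_Sendrecv pairs the send and receive; two plain
           MPI_Send calls facing each other may deadlock for large messages. */
        MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, peer, 0,
                     recvbuf, N, MPI_DOUBLE, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Non-blocking p2p: post both operations, do independent work, then
           wait. Tracing tools visualise this as communication/computation
           overlap.                                                          */
        MPI_Request reqs[2];
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, peer, 1, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 1, MPI_COMM_WORLD, &reqs[1]);

        double local = 0.0;
        for (int i = 0; i < N; ++i) local += sendbuf[i];  /* independent work */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        /* Collective counterpart: one call that involves every rank, instead
           of hand-written pairwise messages.                                */
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("global sum = %f\n", global);
        MPI_Finalize();
        return 0;
    }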

The recording of the MPI “revision” can be found here. The morning session is also available at

https://durhamuniversity.zoom.us/rec/share/h3OkGxZQGur84snTdp7qVFf88vCr6o-G2TkfCAqm04-oNU0F1Xmcih_oUZUXjcr3.7S5GEFQJgZww1x24

8/4 (changed from 15/4): Trace analysis with Scalasca

  • 9:00-9:30 Lessons learned and take-away from last session
  • 9:30-10:30 MPI Trace Analysis
  • 10:30-11:00 ExCALIBUR’s H&ES programme
  • 11:00-12:00 Case studies of parallel performance analyses: About the progress made with SUSY Lattice
  • 13:00-13:30 Asynchronous MPI communication with OpenMP tasks (optional outlook; a sketch of the pattern follows this list)
  • 13:00-15:00 — Workshop —
    Apply Scalasca to participants’ codes
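The optional outlook on asynchronous MPI communication with OpenMP tasks boils down to a pattern like the one sketched below (our own illustrative example, not the speaker’s material): one thread drives a non-blocking halo exchange while OpenMP tasks update data that does not depend on the incoming message.

    /* overlap.c - overlapping MPI communication with OpenMP tasks (sketch only).
       Build/run assumption:  mpicc -fopenmp overlap.c && mpirun -np 2 ./a.out    */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define N 100000

    static double halo_send[N], halo_recv[N], interior[N];

    int main(int argc, char **argv)
    {
        int provided, rank, size;
        /* FUNNELED is enough here: only the master thread makes MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < N; ++i) { halo_send[i] = rank; interior[i] = i; }
        int right = (rank + 1) % size, left = (rank + size - 1) % size;

        #pragma omp parallel
        #pragma omp master
        {
            MPI_Request reqs[2];
            /* The master thread starts the halo exchange ...                 */
            MPI_Irecv(halo_recv, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(halo_send, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

            /* ... while idle threads can pick up this task and update the
               interior, which does not depend on the incoming halo.          */
            #pragma omp task
            for (int i = 0; i < N; ++i) interior[i] *= 2.0;

            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            /* The halo has arrived; now wait for the interior task as well.  */
            #pragma omp taskwait
        }

        if (rank == 0) printf("interior[0] = %f\n", interior[0]);
        MPI_Finalize();
        return 0;
    }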

Meeting Recording:
https://durhamuniversity.zoom.us/rec/share/bPI2vEk1_fYcmDWuAcW4xqCTEBvSK2LzzjPnYVI67h5oq-gDenKObV8wuzTGkuzc.mjgflXI-c71CToD-

Recording of Joachim’s talk on novel OpenMP/MPI features: https://durhamuniversity.zoom.us/rec/share/UJdgvdrZgNXGAY8UtLpeZBHXB_0X6oLj6uEg-norzVvsWcgP2wS6t0oniHorrSgK.-vu8Ndt0Vc1vI3QZ

20/5: MPI and multithreading correctness

  • 9:00-9:30 Lessons learned and take-away from last session
  • 9:30-10:15 MPI correctness with MUST (J. Protze)
  • 10:15-10:45 Threading correctness with Archer (J. Protze)
  • 10:45-11:15 Outlook: memory correctness checkers
    – The Clang memory sanitiser (J. Protze)
    – Score-P memory checks (B. Williams)
    – Valgrind with Massif (J. Protze)
  • 11:15-12:00 The concept of an RSE within ExCALIBUR (S. King)
    Followed by discussion/exchange
  • 13:00-15:00 — Workshop —
    Apply correctness checkers to participants’ codes (a small data-race example follows below)
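To give a flavour of what the correctness checkers report, below is a deliberately broken OpenMP snippet of our own (not workshop material) containing a classic data race that Archer, which builds on ThreadSanitizer, will flag. The compile flags are our assumption of the usual setup; details may differ per installation.

    /* race.c - deliberately broken OpenMP code for a correctness-tool demo.
       Build assumption:  clang -fopenmp -g -fsanitize=thread race.c
       (Archer reports OpenMP data races through ThreadSanitizer.)           */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        long counter = 0;

        /* Data race: every thread performs an unsynchronised
           read-modify-write on the shared variable 'counter'.               */
        #pragma omp parallel for
        for (int i = 0; i < 1000000; ++i)
            counter++;

        /* A correct version would use:
           #pragma omp parallel for reduction(+:counter)                     */
        printf("counter = %ld (expected 1000000)\n", counter);
        return 0;
    }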

Meeting Recording:
https://durhamuniversity.zoom.us/rec/share/2W_pa5E3t8f6Nk1HSDeVLPNizRZfGttRMQV2qaWT29f153x-ERuFANzvEqR2v1_5.Pd6ahOu5lKxYUIyE

24/6 (changed from 17/6): From MPI and OpenMP analysis to node analysis

  • 9:00-9:30 Lessons learned and take-away from last session
  • 9:30-10:30 Presentation of MAQAO performance analysis framework (J. Ibnamar)
  • 10:30-11:15 Hands-on – Using MAQAO: analysing a code & navigating through MAQAO ONE View reports (J. Valensi)
  • 11:15-12:00 Hands-on – Optimising a code with MAQAO (E. Oseret)
  • 13:00-15:00 — Workshop —
    Apply MAQAO to participants’ codes

Meeting Recording:
https://durhamuniversity.zoom.us/rec/share/J1tTibSbI7FqY80gjj72NiCFZ9QTarO3l9TYAr2ENj4lzacykv5BqvnwrhnW9sQ6.H95POlqfeizMll9d

15/7: Single node optimisation

  • 9:00-9:30 Lessons learned
  • 9:30-10:00 The why and how behind single node performance analysis – theory in a nutshell (G. Hager)
  • 10:00-11:00 Performance counter analysis with Likwid (T. Gruber)
  • 11:00-12:00 Live demo: performance modelling and analysis with a stencil code (G. Hager)
  • 13:00-15:00 — Workshop —
    Apply Likwid to participants’ codes (a marker API sketch follows below)
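For the afternoon, a typical workflow is to wrap a code’s hot loop in LIKWID’s marker API so that likwid-perfctr attributes hardware counter measurements to that region only. The sketch below is our own illustration: the kernel, region name, event group and pinning are placeholders, and the header/flag names follow the LIKWID documentation as we understand it (recent releases ship likwid-marker.h, older ones likwid.h).

    /* triad.c - LIKWID marker-API sketch (illustrative only).
       Build assumption:  gcc -O2 -DLIKWID_PERFMON triad.c -llikwid
       Run assumption:    likwid-perfctr -C 0 -g MEM_DP -m ./a.out           */
    #include <stdio.h>
    #include <likwid-marker.h>   /* older releases ship this as likwid.h */

    #define N 2000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        LIKWID_MARKER_INIT;
        for (long i = 0; i < N; ++i) { b[i] = 1.0; c[i] = 2.0; }

        /* Only events inside the named region are attributed to "triad". */
        LIKWID_MARKER_START("triad");
        for (long i = 0; i < N; ++i)
            a[i] = b[i] + 0.5 * c[i];
        LIKWID_MARKER_STOP("triad");

        LIKWID_MARKER_CLOSE;
        printf("a[0] = %f\n", a[0]);
        return 0;
    }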

Meeting recording:

https://durhamuniversity.zoom.us/rec/share/M8c4gg00N0n1NnRGKy7YMi7GH-FZxg2N6XE5u6S6KfIqpBhwss06-o77KAqG1jnM.bQibZMKE5J65Atzo

16/7: Feedback meeting

  • 9:00-11:00 Participating groups present their lessons learned and take-aways
  • 11:00-11:30 Open discussion and feedback round
  • 11:30-12:00 Alastair Basden: DiRAC-3

Further slots:

We plan to add further slots later in the year. Topics may include:

  • GPU/CUDA correctness verification
  • parallel debugging (and profiling) with commercial tools
  • GPU performance analysis/optimisation
  • x86 performance analysis/optimisation
  • vectorisation & core performance
  • parallel file I/O performance
  • performance analytics
  • performance data mining
  • scalability modelling
  • job build/execution automation
  • application semantic annotations

FAQ

  • Why is it not a single workshop, i.e. why do you want us to commit to a whole series of workshops?
    We think that users need time to digest ideas. You learn something about a tool, and then you have to use that tool for some time (days, weeks) before you can assess its value properly. With new insight, it is then more productive to discuss the next tool. With a series of workshops, we think you get the most out of all the material provided.
  • Why is it primarily for teams?
    We welcome individual participants – in particular for the tutorial-like sessions in the mornings – but our prime goal is to run performance analyses of large-scale, production-ready software. This is part of our mission, and it is a sound way for us to gather feedback about what codes need from tools and how they use them. Our experience tells us that the analysis of large-scale software is best done by teams whose members bring in different views. We think that working in a team makes the whole analysis exercise more productive. Furthermore, we are well aware that you are busy and thus might not be able to attend all sessions. With teams of participants, we hope that each code is exposed to most of the tools.
  • Why do you not focus on commercial tools in the first place?
    We do appreciate that there are commercial performance analysis tools out there which are extremely useful, particularly those from computer vendors for their own systems. However, we think that open tools add value since they are platform-agnostic and facilitate the use and performance comparison of a variety of computer systems. Also, the tools that we primarily discuss are driven by their own research agenda. That is, we will not run sales sessions; we will run sessions where tool developers and tool users work hand in hand, influence each other, and identify and start to tackle the next generation of HPC research questions.

Machines

We will primarily use Durham’s DINE supercomputer for all tutorials and exercises. Where appropriate, partitions of COSMA will be made available. Participants are also welcome to try out their codes on their own machines, but it is up to them to ensure beforehand that all tools are properly available there; we will not be able to provide in-depth support for local installations.

DINE access will be granted to all workshop participants free of charge.

Registration

Please use the link below to register your team. Individuals can register through this link as well, but teams are our primary focus. We will get in contact with you after registration – please allow at least a week or two – with a confirmation (if we have places left) and further workshop details. We will then also share supercomputer access instructions if you do not yet have access to our demo system.

Registration is now closed. Please contact the course organisers if you still want to sign up.