2021 Code Performance Series: From analysis to insight

Tiny URL: https://tinyurl.com/performanceanalysis2021

An ExCALIBUR Knowledge Integration Activity in collaboration with the POP CoE, Durham’s Department of Computer Science, DiRAC and the N8 CIR (N8 Centre of Excellence in Computationally Intensive Research).

We appreciate the support from EPSRC through the ExCALIBUR programme (grant no. EP/V00154X/1). The workshop series is hosted and supported by Durham’s Department of Computer Science, which makes its supercomputer DINE available to the workshop free of charge. The support of the N8 Centre of Excellence in Computationally Intensive Research (N8 CIR), funded by the N8 institutions (the Universities of Durham, Lancaster, Leeds, Liverpool, Manchester, Newcastle, Sheffield and York), is gratefully acknowledged. Core course content is delivered by the POP CoE: the HPC Performance Optimisation and Productivity Centre of Excellence is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 824080.

If you benefit from the workshops, we would appreciate an acknowledgement in any resulting outputs.

Performance analysis is at the core of the development of exascale software: understanding why software performs well (or does not) is the basis of any informed improvement of that very code. Significant resources are put into the development of performance analysis tools by both academia and industry, and significant effort is put into training by tool developers, compute centres and vendors alike. Once the data are gathered, however, it remains a challenge to translate more or less raw performance data into a language that domain specialists can understand and react to: How do performance metrics map onto algorithmic phases, how do efficiency patterns correlate with particular data arrangements, how do application parameters affect the runtime characteristics, … Due to this gap between the application perspective and the performance data, performance analysis is not routinely used by software end-users, and in many areas we do not see a performance-functionality co-design attitude. Our workshop series aims to improve this situation:

  • We run a series of performance analysis workshops. They allow participating groups to apply the tools discussed directly to their codes. This yields direct benefit on the code side, and it also upskills the participants.
  • We work on a document to uncover and document implicit performance data presentation languages. Certain conventions for presenting performance data are established in HPC. Many charts/visualizations carry information that is implicitly clear to HPC specialists such as ExCALIBUR RSEs, but this information is neither explicitly documented nor digestible for non-HPC users. The choice of a data presentation style is important: it carries information and opinions, and it is often a language barrier that challenges interdisciplinary teams.
  • We discuss strategies for mapping performance data/insight onto code documentation and user data. Performance data is agnostic of user context (algorithm phases, input data structure, …). To facilitate fruitful conversations between users and performance analysts (RSEs), these views have to be combined, e.g. via appropriate graphical representations or reporting conventions; otherwise domain scientists struggle to understand and appreciate HPC effort and the implications of domain code design decisions on performance. The short sketch after this list illustrates one way such a mapping can be expressed in code.
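To make the last point more concrete, below is a minimal, hypothetical sketch (in C) of one way to tie performance data to algorithmic phases: Score-P’s user-instrumentation macros let developers label code regions with names a domain scientist recognises, so that profiles and traces report per-phase metrics. The phase names and the solver structure are invented for illustration; the snippet assumes a Score-P installation with user instrumentation enabled (e.g. by compiling through the scorep wrapper with its --user option) and is not part of the workshop material itself.

    /* Hypothetical sketch: labelling algorithmic phases with Score-P user
     * regions so that profile and trace entries carry domain-level names.
     * Requires a Score-P installation; build through the scorep compiler
     * wrapper with user instrumentation enabled so that SCOREP_USER_ENABLE
     * is defined (otherwise the macros expand to nothing). */
    #include <scorep/SCOREP_User.h>

    void run_time_step(void)
    {
        SCOREP_USER_REGION_DEFINE( assemble_handle )
        SCOREP_USER_REGION_DEFINE( solve_handle )

        SCOREP_USER_REGION_BEGIN( assemble_handle, "assemble_system",
                                  SCOREP_USER_REGION_TYPE_COMMON )
        /* ... assemble the matrix and right-hand side ... */
        SCOREP_USER_REGION_END( assemble_handle )

        SCOREP_USER_REGION_BEGIN( solve_handle, "linear_solve",
                                  SCOREP_USER_REGION_TYPE_COMMON )
        /* ... run the linear solver ... */
        SCOREP_USER_REGION_END( solve_handle )
    }

With such annotations, viewers like CUBE or Vampir can group their measurements by these phase names rather than by raw function names, which gives domain scientists a view of the data in their own vocabulary.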

Format

The workshops will take place online via Zoom. Recordings of the lecture/tutorial parts will be made available. All workshop conversations, both during the sessions and in the weeks in between, will be held on Slack.

There might be an optional in-person wrap-up/concluding workshop, circumstances permitting.

Call for participants

Our goals:

  • Acquire the skill set to run thorough performance analyses with a multitude of tools.
  • Run multifaceted, in-depth performance analyses of particular codes brought in by the participating teams.
  • Contribute towards a performance analysis landscape review report.
  • Provide feedback to performance tool developers.
  • Help us write down the unwritten rules of performance data representation, and uncover or sketch ways to bring performance analysis data and algorithm know-how together.

We ask teams to join our workshop series. A team consists of at least 2-3 developers and focuses on one piece of code. We hope that each team sends at least one delegate per workshop, though it is clear that members have other commitments, too. In the best case, the teams/codes accompany us through the whole workshop series.

Each workshop session/day is split into two parts: a tutorial/lecture part in the morning and a hackathon-type part in the afternoon. The mornings are dedicated to the personal development of the workshop participants, i.e. to improving their performance analysis skill set and familiarising them with the tools. For participants with previous knowledge we will provide advanced instruction (most of the time the tool developers themselves will be available), but we do not expect previous knowledge.

In the afternoon of each workshop, and in its aftermath, the teams apply the tools discussed directly to their code. That is, teams work with the workshop organisers on a comprehensive performance overview of their code throughout the workshop series. This can happen either in the workshop afternoons or asynchronously via Slack in the weeks in between; in the end, we want to arrive at a detailed, in-depth analysis of the team codes.

Further to that, we expect teams to participate actively in our write-ups. That is, we expect teams to provide us with (informal) feedback and to report on their progress and experiences throughout the workshop. Our ambition is to collaboratively write, over the course of the workshop series, a landscape overview/report document which discusses our experiences with performance analysis tools, identifies training and feature needs, and explicitly shows which kind of (visual) feedback performance analysis tools can deliver. We will write the report, but we rely on the participating teams to provide the input.

Individuals can participate in the workshop (in particular the morning sessions), even though we are primarily interested in whole teams. We are currently investigating whether individuals (as individuals or as team members) who participate in most of the workshops can receive some kind of accreditation.

The workshop is free of charge, but we reserve the right to close registration or to select among the participants should we receive too many applications. Highest priority will be given to ExCALIBUR teams/projects, N8 teams, and DiRAC code development teams. However, we explicitly want to encourage international colleagues who work on bigger pieces of software to apply.

Frequently asked questions

  • Why is it not a single workshop, i.e. why do you want us to commit to a whole series of workshops?
    We think that users need time to digest ideas. You learn something about a tool, and then you have to use that tool for some time (days, weeks) before you can assess its value properly. With that new insight, it is then more productive to discuss the next tool. With a series of workshops, we think you get the most out of all the material provided.
  • Why is it primarily for teams?
    We welcome individual participants, in particular for the tutorial-like sessions in the mornings, but our prime goal is to run performance analysis for large-scale, production-ready software. This is part of our mission, and it is the soundest way for us to gather feedback about what codes need from tools and how they use them. Our experience tells us that the analysis of large-scale software is best done by teams where different members bring in different views. We think that working in a team makes the whole analysis exercise more productive. Furthermore, we are well aware that you are busy and thus might not be able to attend all sessions. With teams of participants, we hope that each code is exposed to most of the tools.
  • Why do you not focus on commercial tools first of all?
    We do appreciate that there are commercial performance analysis tools out there which are extremely useful, particularly those from computer vendors for their own systems. However, we think that open tools add value since they are platform-agnostic and facilitate use of, and performance comparison across, a variety of computer systems. Also, the tools that we primarily discuss are driven by their own research agenda. That is, we will not run sales sessions; we will run sessions where tool developers and tool users work hand in hand, influence each other, and identify and start to tackle the next generation of HPC research questions.

 

Slots

Each workshop spans two sessions: a tutorial/lecture-style morning session of around three hours, followed by an afternoon session where the participating groups apply the tools and ideas presented in the morning to their respective code.

  • 21/1 Intro session
    Organisers introduce the concept; each team introduces their code base and ambitions; some basic profiling methodology; initial basic profiling of user codes
  • 18/2 Parallel profiling
    Introduction to Scalasca/Score-P/CUBE; primary focus on OpenMP and MPI
  • 11/3 Trace collection
    MPI & OpenMP trace collection & analysis with Scalasca/Score-P/Vampir
  • 15/4 Parallel correctness
    MPI & OpenMP correctness verification with MUST/Archer (a toy example of the kind of defect these tools catch follows below, after the schedule)
  • 20/5 User workshop
    Feedback on achievements so far; roundtable on next workshop steps; presentations by workshop participants on how they traditionally present their performance data to their communities (home-made performance data collection and presentation)
  • 17/6 to be confirmed
  • 15/7 to be confirmed

    Topics of follow-up workshops will be announced in due course.
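As a flavour of the parallel-correctness slot, here is a deliberately broken toy MPI program of our own making (it is not workshop material): both ranks post a blocking receive before either one sends, so the program deadlocks. Correctness tools such as MUST, typically launched through their own wrapper (e.g. mustrun) instead of plain mpirun, report this class of error automatically.

    /* Toy example (ours, for illustration only): a classic MPI deadlock.
     * Both ranks block in MPI_Recv before either one sends, so neither
     * receive can ever complete. Correctness tools such as MUST detect
     * and report this kind of error. Assumes exactly two ranks. */
    #include <mpi.h>

    int main(int argc, char** argv)
    {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int other = 1 - rank;

        /* Deadlock: a matching MPI_Send would have to complete first on one rank. */
        MPI_Recv(&value, 1, MPI_INT, other, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&rank,  1, MPI_INT, other, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

Swapping the order of send and receive on one of the ranks (or using MPI_Sendrecv) resolves the deadlock.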

We plan to add further slots later in the year. Topics may include

  • GPU/CUDA correctness verification
  • parallel debugging (and profiling) with commercial tools
  • GPU performance analysis/optimisation
  • x86 performance analysis/optimisation
  • vectorisation & core performance
  • parallel file I/O performance
  • performance analytics
  • performance data mining
  • scalability modelling
  • job build/execution automation
  • application semantic annotations
You can get all the slots here as an ICS file, or you can grab them from https://outlook.office365.com/owa/calendar/a1dfd791bbfa47118814d6efbdaaad6b@durham.ac.uk/98feaa8a654c4c2e84b7893e4ac3a4362538988115792561318/calendar.html.

Machines

We will primarily use Durham’s DINE supercomputer for all tutorials and exercises. Where appropriate, partitions of Cosma will be made available. Participants are encouraged to try out their codes on their own machines as well, but it is up to them to ensure beforehand that all tools are properly available, i.e. we will not be able to provide in-depth support for local installations.

DINE access will be granted to all workshop participants free of charge.

Registration

Please use the link below to register your team. Individuals can register through this link as well, but teams are our primary focus. We will get in contact with you after registration (please give us at least a week or two) to confirm whether we have places left and to share further workshop details. We will then also share supercomputer access instructions if you do not yet have access to our demo system.

Register