Resource-aware computations on CPUs and GPUs

Instructors: Suraj Kumar, Loris Marchal & Frédéric Vivien (ROMA team, LIP, ENS Lyon), 2025-2026



Computing platforms have limited resources, such as memory, cache, bandwidth, and processing power. In the past, algorithms were designed with optimal complexity and were thus expected to make efficient use of processing power, while memory, cache, and bandwidth limitations were often ignored. However, these resources frequently become the primary factors limiting overall performance, especially (but not only) when using accelerators such as GPUs. GPUs offer greater processing capabilities and superior energy efficiency compared to CPUs, which has made them a crucial element of many computing systems over the past decade.

In this course, we will present, on the one hand, algorithmic approaches that have recently been proposed to utilize all resources efficiently and, on the other hand, how to implement these efficient algorithms on real hardware platforms. The typical use case will be linear algebra computations (matrix operations), which are the basis of both "traditional" high-performance computing applications and recent neural network computations.

[Figure: A sequential machine.]

[Figure: A distributed-memory machine.]

[Figure: NVIDIA V100 (Volta architecture) GPU.]

Outline


In this course, we will examine several research projects related to parallel computation in high-performance computing, machine learning, and data analytics.


Prerequisite

Experience with C/C++ is expected. Knowledge of parallel algorithms will be helpful, but not required.

Evaluation

The evaluation will be based on the following weightings:

Recommended reading (evolving)

  • I/O-Optimal Algorithms for Symmetric Linear Algebra Kernels
    Olivier Beaumont, Lionel Eyraud-Dubois, Julien Langou, Mathieu Vérité
    ACM Symposium on Parallelism in Algorithms and Architectures, 2022, pdf.
  • LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores
    Yiwei Zhang, Kun Li, Liang Yuan, Jiawen Cheng, Yunquan Zhang, Ting Cao, Mao Yang
    SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, 2024, pdf.
Bibliography

The course will be based on recent research articles in the area. However, we will follow the following textbook for GPU programming:

  • Programming Massively Parallel Processors: A Hands-on Approach (4th Edition)
    by Wen-mei W. Hwu, David B. Kirk and Izzat El Hajj