Data-aware algorithms for matrix and tensor computations

Instructor: Suraj Kumar (ROMA team, LIP, ENS Lyon), 2024-2025

In the last few decades, the arithmetic performance of computing systems has improved drastically, while the rate of data transfers has improved much more slowly. As a result, current computing systems are bottlenecked by the large volume of data they must move. One promising approach to this problem is to design algorithms that minimize data transfers. The goal of this course is to present a variety of such algorithms for matrix and tensor computations.

Tensors are multi-dimensional arrays and arise in several domains, e.g., data mining, neurosciences, quantum computing, molecular dynamics and computer vision. The course will present several tensor computations in detail.
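As a minimal illustration of the objects the course works with, the sketch below builds a 3-dimensional tensor with NumPy and extracts a fiber, a slice, and one common (row-major) unfolding; the variable names are only illustrative:

```python
import numpy as np

# A 3-dimensional tensor of shape 2 x 3 x 4: three indices address one entry.
T = np.arange(24, dtype=float).reshape(2, 3, 4)

# A mode-0 fiber (fix all indices but the first) and a frontal slice.
fiber = T[:, 0, 0]     # shape (2,)
front = T[:, :, 0]     # shape (2, 3)

# Mode-0 unfolding (matricization), here in row-major (C) ordering:
# each row collects all entries sharing the same first index.
T0 = T.reshape(2, -1)  # shape (2, 12)
print(fiber.shape, front.shape, T0.shape)
```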

[Figure: Tucker decomposition of a 3-dimensional tensor.]
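To make the figure concrete, here is a hedged sketch of one standard way to compute a Tucker decomposition, the truncated higher-order SVD (HOSVD); the function name and structure are illustrative, not an optimized implementation:

```python
import numpy as np

def hosvd(T, ranks):
    """Truncated HOSVD: one simple way to compute a Tucker decomposition
    T ~= G x1 U1 x2 U2 x3 U3 (a sketch, not an optimized routine)."""
    factors = []
    for mode, r in enumerate(ranks):
        # Unfold T along `mode` and keep the leading r left singular vectors.
        Tm = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(Tm, full_matrices=False)
        factors.append(U[:, :r])
    # Core tensor: contract T with each factor transpose along its mode.
    # Each tensordot sends the contracted mode to the last axis, so after
    # all three contractions the mode order is restored.
    G = T
    for U in factors:
        G = np.tensordot(G, U.conj(), axes=(0, 0))
    return G, factors

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 6))
G, Us = hosvd(T, (4, 5, 6))

# With full ranks, HOSVD reconstructs T exactly (up to round-off).
R = G
for U in Us:
    R = np.tensordot(R, U, axes=(0, 1))
print(np.allclose(R, T))  # True
```

Truncating `ranks` below the tensor dimensions gives a compressed core `G` with (smaller) factor matrices, which is the setting depicted in the figure.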

The course will also discuss various approaches to compute communication lower bounds (bounds on the minimum amount of data transfer required) for different matrix and tensor computations. Establishing communication lower bounds helps identify efficient algorithms, namely those that attain the bounds exactly.

Part 1: Communication lower bounds and optimal algorithms for matrix computations

In this part, we will first study the seminal work of Hong and Kung (from 1981) to determine the minimum data transfer between levels of a memory hierarchy on a sequential machine (a single processor). After that, we will review other approaches to obtain communication lower bounds on sequential and parallel systems. We will also study some algorithms for both types of systems that attain the lower bounds.
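As a preview of the kind of algorithm that attains such bounds: for classical n-by-n matrix multiplication, the Hong–Kung result implies Omega(n^3 / sqrt(M)) words must move between a fast memory of size M and slow memory, and blocked (tiled) multiplication with tile size b on the order of sqrt(M/3) matches this. The sketch below (illustrative; it assumes n is divisible by b) shows the tiled structure:

```python
import numpy as np

def tiled_matmul(A, B, b):
    """Blocked matrix multiply. With tile size b ~ sqrt(M/3), three b x b
    tiles fit in a fast memory of size M, and the total data moved is
    O(n^3 / b) = O(n^3 / sqrt(M)), matching the Hong-Kung lower bound."""
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(0, n, b):
        for j in range(0, n, b):
            for k in range(0, n, b):
                # Each inner iteration moves O(b^2) words for 2 b^3 flops.
                C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
print(np.allclose(tiled_matmul(A, B, 4), A @ B))  # True
```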

Part 2: Tensor computations

In this part, we will first look at tensor notations and different types of popular tensor operations. After that, we will study widely used tensor decompositions and different algorithms to compute them. We will also analyze the bottleneck computations of various tensor decomposition algorithms and discuss parallel and communication optimal approaches for those computations.
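One widely cited example of such a bottleneck kernel is the MTTKRP (matricized tensor times Khatri–Rao product) arising in CP decomposition algorithms such as CP-ALS. A hedged sketch for the first mode of a 3-dimensional tensor, using `einsum` to avoid forming the Khatri–Rao product explicitly (function name and row-major ordering are assumptions of this sketch):

```python
import numpy as np

def mttkrp_mode0(T, B, C):
    """Mode-0 MTTKRP for a 3-d tensor: M = T_(0) (C kr B), a bottleneck
    kernel of CP-ALS. einsum contracts directly, without materializing
    the (large) Khatri-Rao product."""
    return np.einsum('ijk,jr,kr->ir', T, B, C)

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2))
C = rng.standard_normal((5, 2))
M = mttkrp_mode0(T, B, C)

# Reference: unfold T (row-major) and multiply by the explicit
# Khatri-Rao product, matching the same ordering.
KR = np.einsum('jr,kr->jkr', B, C).reshape(4 * 5, 2)
M_ref = T.reshape(3, -1) @ KR
print(np.allclose(M, M_ref))  # True
```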

Part 3: Guest lectures on related popular topics

We will have some lectures on related topics from experts.

The course will also touch on several interesting research projects related to matrix/tensor computations in high performance computing, machine learning, data analytics, molecular simulations and quantum computing.


Experience with algorithm analysis and mathematical optimization will be helpful, but not required.


The evaluation will be based on the following weightings:

Recommended reading (evolving)

  • Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds
    Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse
    ACM Symposium on Parallelism in Algorithms and Architectures, July 2022, pp. 445–448.
  • Minimizing Communication in Numerical Linear Algebra
    Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz
    SIAM Journal on Matrix Analysis and Applications, Volume 32, Issue 3, pp. 866–901, 2011.
  • Tensor Decompositions and Applications
    Tamara G. Kolda and Brett W. Bader
    SIAM Review, Volume 51, Number 3, pp. 455–500, 2009.