Over the last few decades, the speed of computation has improved drastically, while the speed of data transfer has improved much more slowly. As a result, current computing systems are often bottlenecked by the large volume of data they must move. One promising approach to this problem is to design algorithms that minimize data transfers. The goal of this course is to present a variety of such algorithms for matrix and tensor computations.
Tensors are multi-dimensional arrays and arise in several domains, e.g., data
mining, neuroscience, quantum computing, molecular dynamics, and computer vision.
The course will present several tensor computations in detail.
The course will also discuss various approaches to compute communication lower
bounds (which specify how much data transfer is required) for different matrix
and tensor computations. Establishing communication lower bounds helps one
identify efficient algorithms, namely those that exactly attain the bounds.
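As a concrete example (a standard result from the literature, stated here only for illustration), for multiplying two n×n matrices with a classical algorithm on a sequential machine whose fast memory holds M words, the number of words W moved between slow and fast memory satisfies

```latex
% Classic communication lower bound for n x n matrix multiplication
% on a machine with fast memory of size M (Hong & Kung):
W = \Omega\!\left(\frac{n^{3}}{\sqrt{M}}\right)
```

Blocked algorithms attain this bound, which is what makes them communication optimal.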
Part 1: Communication lower bounds and optimal algorithms for matrix computations
In this part, we will first study the seminal work of Hong and Kung (from 1981)
to determine the minimum data transfer between levels of a memory hierarchy on a
sequential machine (a single processor).
After that, we will review other approaches for obtaining communication lower bounds on sequential
and parallel systems. We will also study, for both types of systems, some algorithms that attain the
lower bounds.
Matrix multiplications
Pebble games
Data transfer cost models
Memory dependent and independent communication lower bounds
Communication avoiding sequential and parallel algorithms
2D/2.5D/3D algorithms
Strassen's matrix multiplication
Matrix factorizations
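To make the idea of a communication-avoiding sequential algorithm concrete, here is a minimal sketch (my own illustration, not code from the course) of blocked matrix multiplication. The tile size b is assumed to be chosen so that three b×b tiles fit in fast memory; the transfer volume then drops from O(n^3) for the naive loop ordering to O(n^3/b), matching the lower bound when b is on the order of sqrt(M).

```cpp
#include <algorithm>
#include <vector>

// Blocked (tiled) C = C + A*B for n x n row-major matrices.
// Working on b x b tiles keeps the operands of each partial
// product resident in fast memory, reducing data transfers.
void blocked_matmul(int n, int b,
                    const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C) {
    for (int ii = 0; ii < n; ii += b)
        for (int jj = 0; jj < n; jj += b)
            for (int kk = 0; kk < n; kk += b)
                // Multiply the (ii,kk) tile of A by the (kk,jj) tile of B
                // and accumulate into the (ii,jj) tile of C.
                for (int i = ii; i < std::min(ii + b, n); ++i)
                    for (int k = kk; k < std::min(kk + b, n); ++k) {
                        double a = A[i * n + k];
                        for (int j = jj; j < std::min(jj + b, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The std::min guards make the code correct even when b does not divide n, which keeps the sketch usable for the kind of loop-permutation experiments discussed in the lectures.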
Part 2: Tensor computations
In this part, we will first look at tensor notations and different types of popular
tensor operations. After that, we will study widely used tensor decompositions and
different algorithms to compute them. We will also analyze the bottleneck computations
of various tensor decomposition algorithms and discuss parallel and communication
optimal approaches for those computations.
Tensor notations
Use of tensors in data analytics and quantum computing
Matricized tensor times Khatri-Rao product (MTTKRP) computations
Multiple tensor times matrix computations
Part 3: Guest lectures on related popular topics
We will have guest lectures from experts on related topics.
Memory-aware scheduling
Sparse matrix and tensor computations
Hierarchical and block low-rank matrices
Randomized methods to improve algorithmic costs
In the course, we will also look at several interesting research projects
related to matrix/tensor computations in high performance computing, machine learning,
data analytics, molecular simulations, and quantum computing.
Prerequisite
Experience with algorithm analysis and mathematical optimization will be helpful,
but not required.
Lecture notes
Lecture 1 (Sep 12): Overview of the course slides
and sequential matrix multiplication slides. Suggested reading pdf.
Assignment 1 is out with two questions (please see the slides).
Lecture 2 (Sep 14): Costs of some MPI routines slides.
A C++ program that practically analyzes all loop permutations for the sequential matrix
multiplication code.
Lecture 3 (Sep 19): Parallel matrix multiplication slides. Suggested reading pdf.
Assignment 2 is out (please see the slides), and it is due by Sep 28.
Lecture 5 (Sep 26): Symmetric computations slides. Suggested reading pdf.
Assignment 3 is out (please see the slides), and it is due by Oct 5.
Lecture 6 (Sep 28): Introduction to tensors slides.
Suggested reading pdf. Assignment 4 is out (please see the slides),
and it is due by Oct 10. If the unfoldings of the specified tensor are not very clear, please have a look at all
three unfoldings here.
Lecture 7 (Oct 3): Low rank approximations of tensors slides.
Suggested reading for Tensor train decomposition pdf.
The research topics/articles for the course project are out (please see the slides and inform me of your choices).
Lecture 8 (Oct 5): Guest lecture by Loris Marchal on Memory-aware scheduling slides.
Lecture 9 (Oct 10): Guest lecture by Grégoire Pichon on Low-Rank compression in sparse
solvers slides.
Lecture 11 (Oct 17): Multiple tensor times matrix computations slides.
Suggested reading pdf. Final assignment is out (please see the slides), and it is due by Oct 26.
Lecture 12 (Oct 19): Guest lecture by Bora Uçar on sparse matrices and
bipartite graphs slides.
Lecture 13 (Oct 24): Remaining portions of Multiple tensor times matrix computations.
Lecture 14 (Nov 7): Implementation of CP/Tucker/Tensor train decompositions in Matlab/Octave files.
Lectures 15-16 (Nov 9): Project presentations and discussion.
Acknowledgement: I am extremely thankful to Grey Ballard and Loris Marchal for sharing source files of slides of their courses. I am using some of their slides in this course.
Evaluation
The evaluation will be based on the following weightings:
5 homework assignments (best 3 will be counted): 40%
Project: 60%
Course project for the final evaluation
Each student, or a group of two students, will prepare a 5-6 page report and give a presentation
on one of the topics/articles below. The project report is due by Nov 6.
Research topics
Communication costs of a specific matrix factorization (LU, QR, TSQR, RRQR, ...)
Use of tensors in a particular domain, for example, neuroscience, data analysis, molecular simulations, quantum computing, face recognition, ...
Tensor train representation for the weight matrices of fully connected layers (Tensorizing Neural Networks). Presentation on Nov 9 at 16h50 by Maxime Just.
If you want to propose another topic/article, I would be happy to discuss it with you.