In the last few decades, the computational speed of computing systems has improved drastically, while data-transfer rates have improved far more slowly. As a result, current computing systems are often bottlenecked by the volume of data they must move. One promising approach to this problem is to design algorithms that minimize data transfers. The goal of this course is to present a variety of such algorithms for matrix and tensor computations.

Tensors are multi-dimensional arrays and arise in several domains, e.g., data mining, neuroscience, quantum computing, molecular dynamics, and computer vision. The course will present several tensor computations in detail.
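
To make the "multi-dimensional array" view concrete, here is a minimal NumPy sketch of a 3-way tensor and its basic parts (the shape and the sensor/time interpretation are illustrative assumptions, not data from any particular domain):

```python
import numpy as np

# A 3-way tensor is simply a 3-dimensional array, e.g. subjects x time x
# sensors in a neuroscience dataset (shape chosen only for illustration).
T = np.arange(2 * 3 * 4).reshape(2, 3, 4)

n_modes = T.ndim       # a tensor's dimensions are called its "modes"; here 3
slice0 = T[0, :, :]    # a slice: fix one index, shape (3, 4)
fiber = T[0, 0, :]     # a fiber: fix all indices but one, shape (4,)
```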

The course will also discuss various approaches for computing communication lower bounds (which determine the minimum amount of data transfer a computation requires) for different matrix and tensor computations. Establishing communication lower bounds lets one identify communication-optimal algorithms, i.e., algorithms that exactly attain the bounds.

In this part, we will first study the seminal work of Hong and Kung (from 1981) that determines the minimum data transfer between levels of the memory hierarchy on a sequential machine (a single processor). After that, we will review other approaches for obtaining communication lower bounds on sequential and parallel systems. We will also study algorithms for both types of systems that attain the lower bounds.
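
As a preview of what such a communication-avoiding algorithm looks like, here is a sketch of blocked (tiled) matrix multiplication, the classic sequential algorithm that attains the Hong-Kung O(n^3 / sqrt(M)) transfer bound for a fast memory of size M. The fast-memory size and tile choice below are illustrative assumptions, not tuned values:

```python
import numpy as np

def blocked_matmul(A, B, fast_mem=768):
    """Tiled matrix multiplication C = A @ B.

    With tile edge b ~ sqrt(fast_mem / 3), one b x b tile each of A, B and C
    fits in a fast memory of size fast_mem, and the number of transfers
    between slow and fast memory is O(n^3 / sqrt(fast_mem)), matching the
    Hong-Kung lower bound.  (fast_mem here is an illustrative parameter.)
    """
    n = A.shape[0]
    b = max(1, int((fast_mem // 3) ** 0.5))  # tile edge: 3 tiles fit at once
    C = np.zeros_like(A)
    for i in range(0, n, b):
        for j in range(0, n, b):
            for k in range(0, n, b):
                # Each b x b tile is loaded once for this update; all the
                # arithmetic on it happens before it is evicted.
                C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
    return C
```

NumPy's slicing clips ragged edges automatically, so n need not be a multiple of the tile size.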

- Matrix multiplications
- Pebble games
- Data transfer cost models
- Memory dependent and independent communication lower bounds
- Communication avoiding sequential and parallel algorithms
- 2D/2.5D/3D algorithms
- Strassen's matrix multiplication
- Matrix factorizations
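
As one example from the list above, Strassen's algorithm multiplies two n x n matrices with only seven recursive half-size multiplications instead of eight, giving an O(n^2.81) operation count and a correspondingly smaller communication footprint. The following is a minimal recursive sketch, assuming square matrices whose size is a power of two; the recursion cutoff is an illustrative choice, not a tuned value:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen's matrix multiplication (square, power-of-two sizes).

    Seven recursive products M1..M7 replace the eight of the classical
    blocked algorithm; below the cutoff we fall back to ordinary matmul.
    """
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```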

In this part, we will first introduce tensor notation and popular tensor operations. After that, we will study widely used tensor decompositions and different algorithms to compute them. We will also analyze the bottleneck computations of various tensor decomposition algorithms and discuss parallel and communication-optimal approaches for those computations.

- Tensor notations
- Use of tensors in data analytics and quantum computing
- Tensor decompositions: Canonical Polyadic (CP), Tucker, Tensor train
- Matricized tensor times Khatri-Rao product computations
- Multiple tensor times matrix computations
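
The matricized tensor times Khatri-Rao product (MTTKRP) in the list above is the bottleneck computation of CP decomposition algorithms such as alternating least squares. A small NumPy sketch of the mode-1 MTTKRP for a 3-way tensor follows; the sizes are arbitrary, and with NumPy's row-major layout the mode-1 unfolding pairs naturally with the Khatri-Rao product of B and C in that order:

```python
import numpy as np

I, J, K, R = 4, 5, 6, 3          # illustrative tensor sizes and CP rank
rng = np.random.default_rng(2)
T = rng.standard_normal((I, J, K))   # 3-way tensor
B = rng.standard_normal((J, R))      # mode-2 factor matrix
C = rng.standard_normal((K, R))      # mode-3 factor matrix

# Mode-1 matricization (unfolding) of T: shape (I, J*K).  With row-major
# layout, column j*K + k holds entry T[i, j, k].
T1 = T.reshape(I, J * K)

# Khatri-Rao product (column-wise Kronecker) of B and C: shape (J*K, R),
# with row ordering matching the columns of T1.
KR = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)

# Mode-1 MTTKRP: M[i, r] = sum_{j,k} T[i, j, k] * B[j, r] * C[k, r]
M = T1 @ KR
```

In practice, communication-efficient MTTKRP implementations avoid forming the large Khatri-Rao matrix explicitly; the explicit form above is only for clarity.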

We will also have guest lectures by experts on related topics.

- Memory-aware scheduling
- Sparse matrix and tensor computations
- Hierarchical and block low-rank matrices
- Randomized methods to improve algorithmic costs

The course will also look at several interesting research projects related to matrix/tensor computations in high-performance computing, machine learning, data analytics, molecular simulations, and quantum computing.

- 4 homework assignments (best 3 of 4 counted): 40%
- Project (60%): A list of selected research topics/papers will be provided. Each student, or group of two students, will prepare a report and give a presentation on one of these.

- On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations.
- Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms.
- Sparsity-Aware Tensor Decomposition.
- Higher-Order QR with Tournament Pivoting for Tensor Compression.
- Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping.