Numerical linear algebra is a well-established research area with a rich history of invention. It is the foundation and driver of state-of-the-art methods across a wide range of problems. Yet recent advances in the task-based approach to parallel execution of basic numerical linear algebra routines, such as QR decomposition and Cholesky factorization, on heterogeneous clusters remain underrepresented in AI. The task-based programming paradigm requires the entire algorithm to be expressed as a directed acyclic graph (DAG) of tasks. Each task is executed on a processing unit, e.g., a CPU core or a GPU, chosen by a scheduling library, and operates on a local copy of its data, also provided by the scheduling library. Implementing AI training and inference techniques within a task-based framework and resolving the resulting arithmetic and data-transfer bottlenecks are the main duties of our lab.
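To make the paradigm concrete, here is a minimal sketch of tiled Cholesky factorization, the textbook example of a task-graph algorithm. Each tile operation (POTRF, TRSM, SYRK, GEMM, following BLAS/LAPACK naming) would become one node of the DAG, with edges inferred from which tiles each task reads and writes; this sketch simply executes the tasks in a valid topological order on one CPU with numpy, whereas a real task-based runtime would dispatch ready tasks to CPU cores and GPUs concurrently. It is an illustration under those assumptions, not our lab's actual framework.

```python
import numpy as np

def tiled_cholesky_tasks(A, nb):
    """Factorize SPD matrix A in place into L (lower triangle), tile by tile.

    Each tile operation below corresponds to one task in the DAG; the
    loop nest enumerates the tasks in a dependency-respecting order.
    Assumes nb evenly divides the matrix size.
    """
    n = A.shape[0]
    nt = n // nb                                    # tiles per row/column
    T = lambda i, j: A[i*nb:(i+1)*nb, j*nb:(j+1)*nb]  # tile view (no copy)
    for k in range(nt):
        # POTRF task: Cholesky-factorize the diagonal tile.
        T(k, k)[:] = np.linalg.cholesky(T(k, k))
        for i in range(k + 1, nt):
            # TRSM task: L_ik = A_ik @ inv(L_kk)^T (triangular solve).
            T(i, k)[:] = np.linalg.solve(T(k, k), T(i, k).T).T
        for i in range(k + 1, nt):
            # SYRK task: rank-nb update of the trailing diagonal tile.
            T(i, i)[:] -= T(i, k) @ T(i, k).T
            for j in range(k + 1, i):
                # GEMM task: update the trailing off-diagonal tiles.
                T(i, j)[:] -= T(i, k) @ T(j, k).T
    return np.tril(A)
```

Note the key property the scheduler exploits: all TRSM tasks of step k are mutually independent, as are the SYRK/GEMM updates on distinct tiles, so a runtime can execute them in parallel as soon as their input tiles are ready.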

Deliverables

To achieve highly scalable performance for current AI needs, we will focus on:

algorithmic research that allows the same operations to be performed in different ways, so that performance can be tuned for clusters and supercomputers of different scales;
optimizing the performance of low-level building blocks;
development of quality software for parallel AI training and inference across state-of-the-art neural network models.