Linear algebra is the branch of mathematics concerning linear equations such as \(a_1 x_1 + \cdots + a_n x_n = b,\) and linear maps such as \((x_{1},\ldots ,x_{n})\mapsto a_{1}x_{1}+\cdots +a_{n}x_{n},\) and their representations in vector spaces and through matrices.
Link between Data Science and Linear Algebra
Data are organized in rectangular arrays, understood as matrices, and the mathematical concepts associated with matrices are used to manipulate data.
Palmer penguins illustration
```r
## install.packages("palmerpenguins")
library("palmerpenguins")
penguins_nona <- na.omit(penguins) ## remove NAs for now
penguins_nona |> print(n = 3)
```

```
# A tibble: 333 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
# ℹ 330 more rows
# ℹ 2 more variables: sex <fct>, year <int>
```
One row describes one specific penguin,
One column corresponds to one specific attribute of the penguin.
Focusing on quantitative variables, we get a rectangular array with \(n=333\) rows and \(d=5\) columns.
Each penguin might be seen as a vector in \(\mathbb{R}^d\) and each variable might be seen as a vector in \(\mathbb{R}^n\). The whole data set is a matrix of dimension \((n,d)\).
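To make this concrete, here is a minimal sketch in R, reusing the palmerpenguins setup above and keeping the five quantitative columns shown in the printout:

```r
## Build the (n, d) matrix of quantitative variables from the penguins data.
library(palmerpenguins)
penguins_nona <- na.omit(penguins)
X <- as.matrix(penguins_nona[, c("bill_length_mm", "bill_depth_mm",
                                 "flipper_length_mm", "body_mass_g", "year")])
dim(X)   ## 333 5: each row is a penguin in R^d, each column a variable in R^n
```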
Linear Algebra: What and Why
Regression Analysis
to predict one variable, the response variable named \(Y\), given the others, the regressors
or equivalently to identify a linear combination of the regressors which provides a good approximation of the response variable,
or equivalently to specify parameters \(\theta\) so that the prediction \(\hat{Y} = X \theta\), a linear combination of the regressors, is as close as possible to \(Y\).
We are dealing with linear equations: enter the world of Linear Algebra.
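A minimal sketch with base R's lm() on the penguins data (the choice of response and regressors here is purely illustrative):

```r
## Linear regression: find theta so that X theta approximates the response Y.
library(palmerpenguins)
penguins_nona <- na.omit(penguins)
fit <- lm(body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
          data = penguins_nona)
coef(fit)   ## the estimated parameters theta of the linear combination
```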
Principal Component Analysis
to explore relationships between variables,
or equivalently to quantify proximity between vectors,
but also to define a small set of new variables (vectors) which capture most of the information contained in the initial variables,
We are dealing with vectors and vector bases: enter the world of Linear Algebra.
Linear Algebra: Basic Objects
Vectors
\(x\in \R^n\) is a vector of \(n\) components.
By convention, \(x\) is a column vector\[x=\begin{pmatrix} x_1 \cr \vdots\cr x_n \end{pmatrix}.\]
When a row vector is needed we use the transpose operator \(\ ^\intercal\): \(x^\intercal = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}.\)
A matrix \(A \in \R^{n\times d}\) has \(n\) rows and \(d\) columns. The term on row \(i\) and column \(j\) is referred to as \(a_{ij}\),
Column \(j\) is referred to as \(A_{.,j}\), \(A_{.,j} = \begin{pmatrix} a_{1j} \cr \vdots\cr a_{nj} \end{pmatrix}.\)
To respect the column vector convention, row \(i\) is referred to as \(A_{i,.}^\intercal\), with \(A_{i,.}^\intercal = \begin{pmatrix} a_{i1} & \cdots & a_{id} \end{pmatrix}.\)
Let \(A\in \R^{n \times p}\) and \(B\in\R^{p \times d}\); then the product \(AB\) is well defined: it is a matrix \(C\) in \(\R^{n\times d}\) such that \[c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}.\]
As a vector \(x\) in \(\R^n\) is also a matrix in \(\R^{n\times 1}\), the product \(A x\) exists if the number of columns in \(A\) equals the number of rows of \(x\).
The transpose, denoted by \(\ ^\intercal\), of a matrix results from switching the rows and columns. Let \(A \in \R^{n\times d}\), \[A^{\intercal} \in \R^{d\times n}, \quad \text{and} \quad (A^{\intercal})_{ij}= A_{ji}.\]
Let \(x,y\) be in \((\R^n)^2,\) the dot product, also known as inner product or scalar product is defined as \[x\cdot y = y\cdot x = \sum_{i=1}^n x_i y_i = x^\intercal y= y^\intercal x.\] The norm of a vector \(x\), symbolized with \(\norm{x},\) is defined by \[\norm{x}^2 =\sum_{i=1}^n x_i^2 = x^\intercal x.\]
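In base R these operations look as follows (a small sketch on arbitrary small matrices and vectors; %*% is matrix multiplication and t() is the transpose):

```r
## Matrix entries, product, transpose, dot product, and norm in base R.
A <- matrix(1:6, nrow = 2)       ## a 2 x 3 matrix, filled column by column
B <- matrix(1:12, nrow = 3)      ## a 3 x 4 matrix
A[1, 2]                          ## the entry a_{12}
A[, 2]                           ## the column A_{., 2}
C <- A %*% B                     ## well defined: C is a 2 x 4 matrix
t(A)                             ## the transpose, a 3 x 2 matrix
x <- c(1, 2, 3); y <- c(4, 5, 6)
sum(x * y)                       ## dot product x . y = 32
sqrt(sum(x^2))                   ## norm of x
```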
A sequence of vectors \(x_1, \cdots, x_k\) from \(\R^n\) is said to be linearly independent if the only solution to \(\sum_{i=1}^k a_i x_i =\mathbf{0}\) is \(a_1 = \cdots = a_k = 0\).
The span of a set of vectors is the set of all possible linear combinations of these vectors.
In other words, if you take some vectors \(v_1, v_2, \dots, v_k\), their span is the collection of vectors you can build by multiplying each \(v_i\) by a scalar and then adding the results together.
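For instance, a tiny sketch in R with arbitrary vectors:

```r
## A vector in span(v1, v2) is any combination a1*v1 + a2*v2.
v1 <- c(1, 0, 1); v2 <- c(0, 1, 1)
a1 <- 2; a2 <- -3
a1 * v1 + a2 * v2        ## one element of the span: (2, -3, -1)
qr(cbind(v1, v2))$rank   ## 2: v1 and v2 are linearly independent
```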
A real vector space \((V, +, \cdot, \R)\) is a set \(V\) equipped with two operations (vector addition and scalar multiplication) satisfying the following properties for all \((x,y,z)\in V^3\) and all \((\lambda,\mu)\in\R^2:\)
(Closed under addition) \(x + y \in V,\)
(Closed under scalar multiplication) \(\lambda x \in V,\)
(commutativity)\(x + y = y + x,\)
(associativity of +)\((x + y) + z = x + (y + z),\)
(Null element for +) \(x + \mathbf{0} = x,\)
(Existence of additive inverse) There exists \(-x \in V\) such that \(x + (-x) = \mathbf{0},\)
(Distributivity of scalar sums)\((\lambda + \mu ) x = \lambda x + \mu x,\)
(Distributivity of vector sums)\(\lambda (x + y ) = \lambda x + \lambda y,\)
(Scalar multiplication identity)\(1 x = x.\)
For this course, it is enough to think of the set of vectors in \(\R^d\).
Basis of a Vector Space
A set B of elements of a vector space V is called a basis if every element of V can be written in a unique way as a finite linear combination of elements of B.
A subset \(B \subset V\) is a basis if it satisfies the two following conditions:
Linear independence. For every finite subset \(\{ \mathbf{v}_1, \dotsc, \mathbf{v}_m \}\) of \(B\), \[
c_{1}\mathbf{v}_{1} + \cdots + c_{m}\mathbf{v}_{m} = \mathbf{0} \implies
c_{1} = \cdots = c_{m} = 0.
\]
Spanning property. For every \(\mathbf{v} \in V\), there exist \(a_{1}, \dotsc, a_{n} \in \R\) and \(\mathbf{v}_{1}, \dotsc, \mathbf{v}_{n} \in B\) such that \[
\mathbf{v} = a_{1}\mathbf{v}_{1} + \cdots + a_{n}\mathbf{v}_{n}.
\]
The coefficients of this linear combination are referred to as components or coordinates of the vector with respect to B. The elements of a basis are called basis vectors.
If the basis vectors are pairwise orthogonal (\(v_i \cdot v_j = 0\) for \(i\neq j\)) and \(\norm{v_i} =1\) for all \(i\), the basis is orthonormal.
\(\left (\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right )\) is a basis of \(\R^2\)
Check whether or not the following set of vectors forms an orthonormal basis
in \(\R^3\): \[{v}_1=\frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix},\qquad v_2=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\\0\end{pmatrix},\qquad {v}_3=\frac{1}{\sqrt{6}}\begin{pmatrix}1\\1\\-2\end{pmatrix}.\]
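A quick numerical check in R (it does not replace the pen-and-paper verification):

```r
## Check orthonormality: t(V) %*% V should be the identity matrix.
v1 <- c(1, 1, 1) / sqrt(3)
v2 <- c(1, -1, 0) / sqrt(2)
v3 <- c(1, 1, -2) / sqrt(6)
V <- cbind(v1, v2, v3)
round(crossprod(V), 10)   ## identity: the vectors form an orthonormal basis of R^3
```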
Link with linear maps
Linear mapping
Let \(f\) be a function from \(\R^d\) to \(\R^n\), \(f\) is linear if and only if
For all \((\lambda,\mu) \in \R^2\) and for all \((x,y) \in (\R^d)^2,\) \[f(\lambda x + \mu y ) = \lambda f(x) + \mu f( y ).\]
Example
\[\begin{align}
f : \R^2 & \to \R\cr
x & \mapsto x_1-x_2
\end{align}\]
\[f(x) = A x, \quad A = \begin{pmatrix}
1 & -1
\end{pmatrix}\]
Matrix as a linear map on vector space
A real matrix \(A\in\R^{n\times d}\) defines a linear map from a real vector space of dimension \(d\) to a real vector space of dimension \(n\).
\[\begin{align}
A: &\R^d \to \R^n\\
& x \mapsto y = Ax
\end{align}\]
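A small sketch in R of the example above, where the \(1\times 2\) matrix \(A = \begin{pmatrix}1 & -1\end{pmatrix}\) encodes \(f(x)=x_1-x_2\):

```r
## The linear map f(x) = x1 - x2 as a matrix product.
A <- matrix(c(1, -1), nrow = 1)   ## the 1 x 2 matrix (1, -1)
x <- c(3, 5)
A %*% x                           ## -2 = f(x)
```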
Counterexample: the squared norm \(f(x) = x_1^2 + x_2^2\) satisfies \(f(\lambda x ) = \lambda^2 f( x)\), so it is not a linear function and cannot be expressed as a matrix.
Matrix rank
The sequence of columns of a matrix \(M\in \R^{n\times d}\) forms a sequence of \(d\) vectors \((c_1, \cdots,c_d)\) in \(\R^n\), while the rows of a matrix form a sequence of \(n\) vectors \((r_1, \cdots,r_n)\) in \(\R^d\).
The rank of a matrix is the length of the largest sequence of linearly independent columns or equivalently the length of the largest sequence of linearly independent rows.
The image of \(A\), that is the set of all linear combinations of its columns, is \(Im(A) = \left \lbrace y \in \R^n;\ \exists c\in \R^d,\ y = \sum_{l=1}^d c_l A_{., l} \right \rbrace.\)
Remarks
If \(A_{.,1}\) and \(A_{.,2}\) are linearly dependent, then \(A_{.,1} = \lambda A_{.,2}\) and \(Ac = ( c_1 \lambda + c_2) A_{.,2} + \cdots + c_d A_{.,d}\). The rank is the minimal number of vectors needed to generate \(Im(A)\), that is, the dimension of \(Im(A)\).
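In R the rank can be obtained through a QR factorization; a minimal sketch (adding a column that is a combination of existing ones leaves the rank unchanged):

```r
## The rank is the dimension of Im(A): a redundant column does not increase it.
A <- cbind(c(1, 0, 1), c(0, 1, 1))
qr(A)$rank                            ## 2
B <- cbind(A, A[, 1] + 2 * A[, 2])    ## third column lies in the span of the first two
qr(B)$rank                            ## still 2
```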
Remarkable matrices
Identity matrix
The square matrix \(I\) such that \(I_{ij} =\delta_{ij}\) is called the identity matrix and acts as the neutral element for matrix multiplication.
Let \(I_n\) be the identity matrix in \(\R^{n\times n}\); for any \(A \in \R^{m\times n}\), \(A I_n = A\) and \(I_m A = A\).
Diagonal matrix
A diagonal matrix is a matrix in which the elements outside the main diagonal are all zero. The term is mostly used for square matrices, but it may also apply to rectangular matrices.
A square matrix \(A\) whose columns form an orthonormal basis is called an orthonormal (or orthogonal) matrix. Such a matrix admits an inverse, and this inverse is simply \(A^\intercal\).
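As an illustration, a small sketch in R (the rotation matrix is just one convenient example of an orthonormal matrix; diag(n) builds \(I_n\)):

```r
## An orthonormal (orthogonal) matrix has inverse equal to its transpose.
theta <- pi / 6
A <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
round(t(A) %*% A, 10)   ## the 2 x 2 identity matrix: t(A) inverts A
diag(3)                 ## for reference, diag(n) builds the identity matrix I_n
```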
How are matrices used in Data Science
Solving Linear systems
Matrices provide a compact way to represent and solve systems of linear equations, which appear in regression models, optimization problems, and numerical simulations.
Identifying Invariant Directions
Matrices help reveal invariant directions in data transformations. This is the foundation of techniques like Principal Component Analysis (PCA), which reduces dimensionality while preserving essential structure.
Linear systems
Numerically, linear regression amounts to solving a linear system \[ X \theta = b,\] where \(X\in \R^{n\times p}\), \(\theta\in\R^p\) and \(b\in \R^n\). Assume \(n \geq p\).
There exists a solution given by \[\hat{\theta} = X_L^{-1} b, \] where \(X_L^{-1}\) is to be determined.
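A minimal numerical sketch in R, assuming \(X\) has full column rank, in which case the usual least-squares choice is \(X_L^{-1} = (X^\intercal X)^{-1}X^\intercal\):

```r
## Least-squares solution of X theta = b via the normal equations and via QR.
set.seed(1)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   ## design matrix with intercept
b <- X %*% c(2, -1, 0.5) + rnorm(n, sd = 0.1)         ## simulated noisy response
solve(crossprod(X), crossprod(X, b))                  ## (X^T X)^{-1} X^T b
qr.solve(X, b)                                        ## same estimate, via a QR factorization
```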
The model may be underdetermined and admit several solutions. This happens when
the regressors in \(X\) are dependent; in this case the problem is solved by adding additional linear constraints that \(\theta\) should respect,
there are too few observations (\(n<p\)); this is solved by adding penalty terms (LASSO and ridge regression, for example).
Gram–Schmidt Orthonormalization
The Gram–Schmidt process transforms a linearly independent set of vectors \[\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\] in an inner product space into an orthonormal basis\[\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n.\]
For three linearly independent vectors in \(\R^3\), the resulting set \(\{\mathbf{u}_1,\mathbf{u}_2,\mathbf{u}_3\}\) is an orthonormal basis of \(\R^3\).
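A minimal sketch of the process in R (the function name gram_schmidt is ours, and the columns of the input are assumed linearly independent):

```r
## Gram-Schmidt: orthonormalize the columns of V, one column at a time.
gram_schmidt <- function(V) {
  U <- V
  for (j in seq_len(ncol(V))) {
    u <- V[, j]
    for (i in seq_len(j - 1)) {
      u <- u - sum(U[, i] * V[, j]) * U[, i]   ## remove the projection on u_i
    }
    U[, j] <- u / sqrt(sum(u^2))               ## normalize
  }
  U
}
V <- cbind(c(1, 1, 1), c(1, -1, 0), c(1, 1, -2))
U <- gram_schmidt(V)
round(crossprod(U), 10)   ## identity matrix: the columns of U are orthonormal
```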
Eigenvalues, Eigenvectors
Consider a square matrix \(A \in \R^{d\times d}\) and a nonzero vector \(v \in \R^d\); \(v\) is an eigenvector for the eigenvalue \(\lambda\) if \[Av = \lambda v.\] If \(v\) is an eigenvector of \(A\), applying \(A\) to \(v\) simply multiplies it by the corresponding eigenvalue.
Identify the eigenvalues and eigenvectors of \(A_1 = \begin{pmatrix}1 & 0 \cr 0 & 2\end{pmatrix}\), \(A_2 = \begin{pmatrix}1 & 3 \cr 0 & 2\end{pmatrix},\) \(A_3 = \begin{pmatrix} 2 & 2 \cr -2 & 2 \end{pmatrix}.\)
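A quick numerical check of this exercise with base R's eigen() (it does not replace the hand computation):

```r
## Eigenvalues and eigenvectors of the three matrices from the exercise.
A1 <- diag(c(1, 2))
A2 <- matrix(c(1, 0, 3, 2), nrow = 2)   ## columns (1, 0) and (3, 2)
A3 <- matrix(c(2, -2, 2, 2), nrow = 2)  ## columns (2, -2) and (2, 2)
eigen(A1)   ## eigenvalues 2 and 1 (eigen() sorts them in decreasing order)
eigen(A2)   ## eigenvalues 2 and 1 again (A2 is triangular)
eigen(A3)   ## complex eigenvalues 2 + 2i and 2 - 2i: no real eigenvector
```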
Singular Value Decomposition (SVD)
Let \(A\) be an \(n\times d\) matrix of rank \(r\); there exist an \(n\times n\) orthogonal matrix \(P\), a \(d\times d\) orthogonal matrix \(Q\), and an \(r\times r\) diagonal matrix \(D_1\) whose diagonal terms are in decreasing order, such that: \[ A = P
\begin{pmatrix} D_1 & 0 \cr
0 & 0 \cr
\end{pmatrix} Q^\intercal\]
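In R the decomposition is computed with svd() (a small sketch; note that svd() returns the thin factorization, with the singular values already in decreasing order):

```r
## SVD of a 3 x 2 matrix: A = u diag(d) t(v), with d in decreasing order.
A <- matrix(c(3, 1, 1, 3, 0, 0), nrow = 3)   ## a 3 x 2 matrix of rank 2
s <- svd(A)
s$d                                          ## the singular values
max(abs(A - s$u %*% diag(s$d) %*% t(s$v)))   ## ~ 0: the factorization reconstructs A
```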
For a nice and visual course on SVD, see the Steve Brunton YouTube channel on this subject!