Linear Algebra Survival Kit For Data Science

Marie-Pierre Etienne

https://marieetienne.github.io/TopicsInStatistics/

2024-10-09

References on Linear Algebra for Data Scientists.

For a general overview

[Gen07]

[Har98]

With R or Python applications

[Yos21]

[Coh22]

Matrix computations reference book

[PP08]

Linear Algebra: What and why

Definition

Linear algebra is the branch of mathematics concerning linear equations such as \(a_1 x_1 + \cdots + a_n x_n = b,\) and linear maps such as \((x_{1},\ldots ,x_{n})\mapsto a_{1}x_{1}+\cdots +a_{n}x_{n},\) and their representations in vector spaces and through matrices.

Data are organized in rectangular arrays, understood as matrices, and the mathematical concepts associated with matrices are used to manipulate data.

Palmer penguins illustration

## install.packages("palmerpenguins")
library("palmerpenguins")
penguins_nona <- na.omit(penguins) ## remove_na for now
penguins_nona |> print(n=3)
# A tibble: 333 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
# ℹ 330 more rows
# ℹ 2 more variables: sex <fct>, year <int>
  • One row describes one specific penguin,
  • One column corresponds to one specific attribute of the penguin.

Focusing on quantitative variables, we get a rectangular array with \(n=333\) rows and \(d=5\) columns.

Each penguin might be seen as a vector in \(\mathbb{R}^d\) and each variable might be seen as a vector in \(\mathbb{R}^n\). The whole data set is a matrix of dimension \((n,d)\).
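As a quick R sketch (assuming the five quantitative columns are the ones listed above, year included), this rectangular array can be extracted as a numeric matrix:

num_cols <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "year")
X <- as.matrix(penguins_nona[, num_cols])  ## a numeric matrix
dim(X)                                     ## 333 rows, 5 columns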

Linear Algebra: What and why

Regression Analysis

  • to predict one variable, the response variable named \(Y\), given the others, the regressors

  • or equivalently to identify a linear combination of the regressors which provides a good approximation of the response variable,

  • or equivalently to specify parameters \(\theta,\) so that the prediction \(\hat{Y} = X \theta,\) a linear combination of the regressors, is as close as possible to \(Y\) (see the sketch below).

We are dealing with linear equations: enter the world of Linear Algebra.
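A minimal R sketch of this idea on the penguins data, assuming a single regressor (flipper length) and solving the normal equations by hand; in practice lm() does this for you:

y <- penguins_nona$body_mass_g                   ## response variable Y
X <- cbind(1, penguins_nona$flipper_length_mm)   ## intercept + one regressor
theta_hat <- solve(t(X) %*% X, t(X) %*% y)       ## normal equations: (X^T X) theta = X^T Y
y_hat <- X %*% theta_hat                         ## prediction, a linear combination of the regressors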

Principal Component Analysis

  • to explore relationship between variables,

  • or equivalently to quantify proximity between vectors,

  • but also to build a small set of new variables (vectors) which represent most of the information contained in the initial variables,

We are dealing with vectors and vector bases: enter the world of Linear Algebra.

Linear Algebra Basic objects

Vectors

\(x\in \R^n\) is a vector of \(n\) components.

By convention, \(x\) is a column vector \[x=\begin{pmatrix} x_1 \cr \vdots\cr x_n \end{pmatrix}.\]

When a row vector is needed we use the transpose operator \(^\intercal\):

\[x^\intercal = \begin{pmatrix} x_1, \cdots, x_n \end{pmatrix}.\] The null vector \(\mathbb{0}\in\R^d\),

\[\mathbb{0}^\intercal = \begin{pmatrix} 0, \cdots, 0\end{pmatrix}.\]

Matrix

A matrix \(A\in \R^{n\times d}\) has \(n\) rows and \(d\) columns:

\[A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1d} \cr a_{21} & a_{22} & \cdots & a_{2d} \cr \vdots & \vdots & & \vdots \cr a_{n1} & a_{n2} & \cdots & a_{nd} \end{pmatrix}\]

The term on row \(i\) and column \(j\) is referred to as \(a_{ij}\).

Column \(j\) is referred to as \(A_{.,j}\), \(A_{.,j} = \begin{pmatrix} a_{1j} \cr \vdots\cr a_{nj} \end{pmatrix}.\)

To respect the column vector convention, row \(i\) is referred to as \(A_{i,.}^\intercal\), \(A_{i,.}^\intercal = \begin{pmatrix} a_{i1}, & \cdots & , a_{id} \end{pmatrix}.\)
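In R, these conventions translate directly into matrix indexing; a small sketch with an arbitrary \(3\times 2\) matrix:

A <- matrix(1:6, nrow = 3, ncol = 2)  ## an arbitrary 3 x 2 matrix
A[2, 1]   ## the term a_21 (row 2, column 1)
A[, 2]    ## column A_.,2
A[1, ]    ## row A_1,.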

Basic operations

Simple operations

Multiplication by a scalar

Let \(\lambda\in\R\),

\[\lambda x = \begin{pmatrix} \lambda x_1 \cr \vdots\cr \lambda x_n \end{pmatrix}.\]

\[\lambda A = \begin{pmatrix} \lambda a_{11} & \lambda a_{12} & \cdots & \lambda a_{1d} \cr \lambda a_{21} & \lambda a_{22} & \cdots & \lambda a_{2d} \cr \vdots & \vdots & & \vdots \cr \lambda a_{n1} & \lambda a_{n2} & \cdots & \lambda a_{nd} \end{pmatrix}\]

Sum

If \(x\in \R^n\) and \(y\in \R^n\), then \(x+y\) exists

\[x+ y = \begin{pmatrix} x_1 + y_1 \cr \vdots\cr x_n + y_n\end{pmatrix}.\]

If \(A\) and \(B\) are two matrices with the same dimension, \(A+B\) exists

\[ A+B = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1d} + b_{1d}\cr \vdots & & \vdots \cr a_{n1} + b_{n1}& \cdots & a_{nd} + b_{nd} \end{pmatrix}. \]
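A small R sketch of scalar multiplication and sums, on arbitrary vectors and matrices:

x <- c(1, 0); y <- c(0, 1)
2 * x                                         ## multiplication by the scalar 2
x + y                                         ## component-wise sum
A <- matrix(c(1, 1, -1, 2, 3, -1), ncol = 2)
B <- matrix(0, nrow = 3, ncol = 2)            ## a matrix with the same dimension as A
A + B                                         ## sum of two matrices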

Visualization

Consider \(x=\begin{pmatrix} 1 \cr 0 \end{pmatrix}\) and \(y= \begin{pmatrix} 0 \cr 1 \end{pmatrix},\) and \(z=x+y.\)

Matrix product

Let \(A\in \R^{n \times p}\) and \(B\in\R^{p \times d}\); then the product \(AB\) is well defined: it is a matrix \(C\in \R^{n\times d}\) such that

\[C_{ij} = \sum_{k=1}^p A_{ik} B_{kj}, \quad 1\leq i\leq n, 1\leq j \leq d.\]

Example

  • \(A\in \R^{3 \times 2}\), \(B\in\R^{2 \times 4}\), with \(A = \begin{pmatrix} 1 & 2 \cr 1 & 3 \cr -1 & -1 \end{pmatrix}\) and \(B =\begin{pmatrix} -1 & -2 & 0 & 1\cr -2 & 1 & 1 & 0\end{pmatrix},\)

\(A B\) exists as the number of columns in \(A\) equals the number of rows in \(B\).

A useful presentation: write \(B\) above the result and \(A\) to its left; the entry in row \(i\) and column \(j\) of \(AB\) is the dot product of row \(i\) of \(A\) with column \(j\) of \(B\):

\[AB = \begin{pmatrix} 1\times(-1) + 2\times(-2) & 1\times(-2) + 2\times 1 & 1\times 0 + 2\times 1 & 1\times 1 + 2\times 0 \cr 1\times(-1) + 3\times(-2) & 1\times(-2) + 3\times 1 & 1\times 0 + 3\times 1 & 1\times 1 + 3\times 0 \cr (-1)\times(-1) + (-1)\times(-2) & (-1)\times(-2) + (-1)\times 1 & (-1)\times 0 + (-1)\times 1 & (-1)\times 1 + (-1)\times 0 \end{pmatrix},\]

\[AB = \begin{pmatrix} -5 & 0 & 2 & 1 \cr -7 & 1 & 3 & 1 \cr 3 & 1 &-1 & -1 \cr \end{pmatrix}. \]

In R, a matrix is filled column by column by default:

A = matrix(c(1,1,-1,2,3,-1), ncol = 2)
B = matrix(c(-1,-2, -2,1, 0, 1, 1, 0), ncol = 4)

The default operation \(A*B\) is not the matrix product but an element-by-element product; the matrix product is obtained by

A %*% B
     [,1] [,2] [,3] [,4]
[1,]   -5    0    2    1
[2,]   -7    1    3    1
[3,]    3    1   -1   -1

In Python, the matrix product is available through the numpy library

import numpy as np

A = np.array([[ 1, 2], [ 1, 3], [ -1, -1]]); A.view()
array([[ 1,  2],
       [ 1,  3],
       [-1, -1]])
B = np.array([[-1, -2, 0, 1], [ -2, 1, 1, 0]]); B.view()
array([[-1, -2,  0,  1],
       [-2,  1,  1,  0]])
C = A.dot(B);  C.view()
array([[-5,  0,  2,  1],
       [-7,  1,  3,  1],
       [ 3,  1, -1, -1]])

Matrix vector product

As a vector \(x\) in \(\R^n\) is also a matrix in \(\R^{n\times 1}\), the product \(A x\) exists if the number of columns in \(A\) equals the number of rows of \(x\).

Let \[A = \begin{pmatrix} 1 & -1 \cr 1 & 0 \cr -1 & 0.5 \end{pmatrix}\] and \(x\in \R^2\); then \(Ax\) exists and

\[Ax = \begin{pmatrix} x_1 - x_2 \cr x_1 \cr -x_1 + 0.5 x_2 \end{pmatrix}.\]
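A small R check of this product, with an arbitrary vector \(x = (2, 4)^\intercal\):

A <- matrix(c(1, 1, -1, -1, 0, 0.5), ncol = 2)  ## the 3 x 2 matrix above, filled column by column
x <- c(2, 4)                                    ## an arbitrary vector of R^2
A %*% x                                         ## (x1 - x2, x1, -x1 + 0.5 x2) = (-2, 2, 0)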

Let \(x\in\R^2\), \(y\in\R^3\) and \[A = \begin{pmatrix} 1 & -1 & 0 \cr 1 & 0 & -0.5 \cr -1 & 0.5 & 2 \end{pmatrix}, B = \begin{pmatrix} 1 & 1 & 0 \cr -1 & 0 & -0.5 \end{pmatrix},\quad C = \begin{pmatrix} 1 & -1 \cr -1 & 2 \cr 1 & -0.5 \end{pmatrix},\]

When possible, compute the following quantities

\[ Ax; \quad Ay; \quad AB; \quad BA; \quad AC; \quad CA; \quad Bx; \quad Cx. \]

Transposition

The transpose, denoted by \(^\intercal\), of a matrix results from switching the rows and columns. Let \(A \in \R^{n\times d}\); then \[A^{\intercal} \in \R^{d\times n} \quad\text{and}\quad (A^{\intercal})_{ij}= A_{ji}.\]

\[ A = \begin{pmatrix} a_{11} & a_{12} \cr a_{21} & a_{22} \cr a_{31} & a_{32} \end{pmatrix}, \quad A^{\intercal} = \begin{pmatrix} a_{11} & a_{21} & a_{31} \cr a_{12} & a_{22} & a_{32} \end{pmatrix} \]

Example

\[A = \begin{pmatrix} 1 & -1 \cr 1 & 0 \cr -1 & 0.5 \end{pmatrix}, \quad A^\intercal = \begin{pmatrix} 1 & 1 & -1 \cr -1 & 0 & 0.5 \end{pmatrix}, \]

Properties

  • \((A^\intercal)^\intercal = A,\)
  • \((AB)^\intercal = B^\intercal A^\intercal\) (checked numerically below),
  • \((A+B)^\intercal = (A)^\intercal + (B)^\intercal.\)
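These properties can be verified numerically in R, here on the matrices \(A\) and \(B\) of the product example:

A <- matrix(c(1, 1, -1, 2, 3, -1), ncol = 2)        ## 3 x 2
B <- matrix(c(-1, -2, -2, 1, 0, 1, 1, 0), ncol = 4) ## 2 x 4
all.equal(t(A %*% B), t(B) %*% t(A))                ## TRUE: (AB)^T = B^T A^T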

\[A = \begin{pmatrix} 1 & -1 & 0 \cr 1 & 0 & -0.5 \cr -1 & 0.5 & 2 \end{pmatrix}, B = \begin{pmatrix} 1 & 1 & 0 \cr -1 & 0 & -0.5 \end{pmatrix}, C=\begin{pmatrix} 1 & 2 \cr 2 & 1 \end{pmatrix}, x \in \R^2, y\in\R^3.\]

Compute \((C)^\intercal, (BA)^\intercal, x^\intercal B, (Ay)^\intercal, y^\intercal A^\intercal\)

Dot product and norm

Let \(x,y\) be in \((\R^n)^2,\) the dot product, also known as inner product or scalar product is defined as \[x\cdot y = y\cdot x = \sum_{i=1}^n x_i y_i = x^\intercal y= y^\intercal x.\] The norm of a vector \(x\), symbolized with \(\norm{x},\) is defined by \[\norm{x}^2 =\sum_{i=1}^n x_i^2 = x^\intercal x.\]

\[x=\begin{pmatrix} 1\cr 0 \end{pmatrix}, y=\begin{pmatrix} 0\cr 1 \end{pmatrix}, z=\begin{pmatrix} 1\cr 2 \end{pmatrix} \]

\[x\cdot y = 0, x\cdot z = 1, y \cdot z = 2, \quad \norm{x}=\norm{y}=1, \quad \norm{z} = \sqrt{5}\]

Let \((x,y)\in(\R^n)^2\),

\(x\cdot y = \norm{x} \norm{y} \cos(\theta)\)

\(\cos(\theta)= \frac{x\cdot y}{\norm{x} \norm{y}}\)
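A minimal R sketch of these formulas, on the vectors \(x\), \(y\), \(z\) above:

x <- c(1, 0); y <- c(0, 1); z <- c(1, 2)
sum(x * z)                                       ## dot product x . z = 1
sqrt(sum(z^2))                                   ## norm of z, sqrt(5)
sum(x * z) / (sqrt(sum(x^2)) * sqrt(sum(z^2)))   ## cos(theta) between x and z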

Vector space

Definition

A real vector space \((V, +, \cdot)\) is a set \(V\) equipped with two operations, vector addition and scalar multiplication, satisfying the following properties for all \((x,y,z)\in V^3\) and all \((\lambda,\mu)\in\R^2:\)

  • (closed under addition) \(x + y \in V,\)
  • (closed under scalar multiplication) \(\lambda x \in V,\)
  • (commutativity) \(x + y = y + x,\)
  • (associativity of +) \((x + y) + z = x + (y + z),\)
  • (Null element for +) \(x + \mathbb{0} = x,\)
  • (Existence of an additive inverse) There exists \(-x \in V\) such that \(x + (-x) = \mathbb{0},\)
  • (Associativity of scalar multiplication) \(\lambda (\mu x) = (\lambda \mu ) x,\)
  • (Distributivity of scalar sums) \((\lambda + \mu ) x = \lambda x + \mu x,\)
  • (Distributivity of vector sums) \(\lambda (x + y ) = \lambda x + \lambda y,\)
  • (Scalar multiplication identity) \(1 x = x.\)

For this course, it is enough to think of the set of vectors in \(\R^d\).

Linear mapping

Let \(f\) be a function from \(\R^d\) to \(\R^n\); \(f\) is linear if and only if

for all \((\lambda,\mu) \in \R^2\) and for all \((x,y) \in (\R^d)^2,\) \[f(\lambda x + \mu y ) = \lambda f(x) + \mu f( y ).\]

Example

\[\begin{align} f : \R^2 & \to \R\cr x & \mapsto x_1-x_2 \end{align}\]

\[\begin{align} f(\lambda x + \mu y) & = f \left ( \begin{pmatrix} \lambda x_1 + \mu y_1\cr \lambda x_2 + \mu y_2\cr \end{pmatrix} \right) = (\lambda x_1 + \mu y_1) - (\lambda x_2 + \mu y_2)=\lambda (x_1 - x_2) + \mu (y_1 -y_2)\cr & = \lambda f( x ) + \mu f( y )\cr \end{align}\]

\[f(x) = A x, \quad A = \begin{pmatrix} 1 & -1 \end{pmatrix}\]

Matrix as a linear map on vector space

A real matrix \(A\in\R^{n\times d}\) is a linear function operating from a real vector space of dimension \(d\) to a real vector space of dimension \(n\).

\[\begin{align} A: &\R^d \to \R^n\\ & x \mapsto y = Ax \end{align}\]

\[A = \begin{pmatrix} 1 & 2 \cr 1 & 3 \cr -1 & -1 \end{pmatrix}\]

\[\begin{align} A: &\R^2 \to \R^3\\ & x \mapsto y = \begin{pmatrix} 1 & 2 \cr 1 & 3 \cr -1 & -1 \end{pmatrix} \begin{pmatrix} x_1\cr x_2 \end{pmatrix} =\begin{pmatrix} x_1 + 2 x_2 \cr x_1 + 3 x_2 \cr -x_1-x_2 \end{pmatrix} \end{align}\]

In particular \(A \begin{pmatrix}1\cr 0\end{pmatrix} = A_{.,1}\) and \(A \begin{pmatrix} 0\cr 1\end{pmatrix} = A_{.,2}.\)
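A quick R check of this fact for the matrix \(A\) above:

A <- matrix(c(1, 1, -1, 2, 3, -1), ncol = 2)
A %*% c(1, 0)   ## equals A[, 1], the first column of A
A %*% c(0, 1)   ## equals A[, 2], the second column of A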

\[M = \begin{pmatrix} 1 & -1 \cr 1 & 2 \end{pmatrix}\]

\[\begin{align} & x \mapsto y = \begin{pmatrix} 1 & -1 \cr 1 & 2 \end{pmatrix} \begin{pmatrix} x_1\cr x_2 \end{pmatrix} =\begin{pmatrix} x_1 - x_2 \cr x_1 + 2 x_2 \end{pmatrix} \end{align}\]

In particular \(M \begin{pmatrix}1\cr 0\end{pmatrix} = M_{.,1}\) and \(M \begin{pmatrix} 0\cr 1\end{pmatrix} = M_{.,2}.\)

\[\begin{align} M: &\R^2 \to \R^2\\ & x \mapsto y = \begin{pmatrix} 1 & -1 \cr 1 & 2 \end{pmatrix} x \end{align}\] is a transformation of \(\R^2\)

Exercise 1

Let us consider the map from \(\R^2\) to \(\R^2\) which projects onto the \(x\) axis.

  • Is it a linear transformation of \(\R^2\) ?
  • If so, could you write the corresponding matrix \(P\) ?

Exercise 2

Let us consider the map from \(\R^2\) to \(\R\) which computes the squared norm of a vector, \(\norm{x}^2 = x_1^2 + x_2^2\).

  • Is it a linear transformation of \(\R^2\) ?
  • If so, could you write the corresponding matrix \(N\) ?

Exercise 1

  • Projection onto the \(x\) axis consists in forgetting the \(y\) component. \[\begin{align} f(\lambda_1 x + \lambda_2 z) & = f\left ( \begin{pmatrix} \lambda_1 x_1 + \lambda_2 z_1 \cr \lambda_1 x_2 + \lambda_2 z_2\cr \end{pmatrix} \right) \cr & = \begin{pmatrix} \lambda_1 x_1 + \lambda_2 z_1 \cr 0\cr \end{pmatrix}\cr & = \lambda_1 f(x) + \lambda_2 f(z), \end{align}\]

  • \(f\) is linear; its matrix form is \(P:\)

\[P = \begin{pmatrix} 1 & 0 \cr 0 & 0 \end{pmatrix},\]

\[Px = \begin{pmatrix} 1 & 0 \cr 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \cr x_2 \end{pmatrix} = \begin{pmatrix} x_1 \cr 0 \end{pmatrix}\]

Exercise 2

  • The squared norm \(f\) of \(x\) equals \(x_1^2 + x_2^2.\) As \(f(\lambda_1 x ) = \lambda_1^2 f( x)\), the squared norm is not a linear function and cannot be expressed as a matrix.

Linear independence

A sequence of vectors \(x_1, \cdots, x_k\) from \(\R^n\) is said to be linearly independent, if the only solution to \(X a = \begin{pmatrix} & & & \cr x_1 & x_2 & \cdots & x_k \cr & & & \cr \end{pmatrix} \begin{pmatrix} a_1 \cr a_2 \cr \vdots \cr a_k\end{pmatrix} =\mathbb{0}\) is \(a =\mathbb {0}\).

\(x_1 = \begin{pmatrix} 1 \cr 0 \cr 0 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 1 \cr 1 \cr 0 \end{pmatrix}, \quad X=\begin{pmatrix} 1 & 1 \cr 0 & 1 \cr 0 & 0 \cr \end{pmatrix}, \quad Xa = \begin{pmatrix}a_1 + a_2\cr a_2 \cr 0 \end{pmatrix}.\)

\(x_1\) and \(x_2\) are linearly independent.

\(x_1 = \begin{pmatrix} 1 \cr 1 \cr 1 \end{pmatrix}, \quad x_2 = \begin{pmatrix} 1 \cr 1 \cr 0 \end{pmatrix}, x_3 = \begin{pmatrix} 0 \cr 1 \cr 1 \end{pmatrix}, x_4 = \begin{pmatrix} 0 \cr -1 \cr 1 \end{pmatrix}\)

  • Are \((x_1, x_2,x_3)\) linearly independent?
  • Are \((x_1, x_2,x_4)\) linearly independent?

Matrix rank

The columns of a matrix \(M\in \R^{n\times d}\) form a sequence of \(d\) vectors \((c_1, \cdots,c_d)\) in \(\R^n\), while the rows form a sequence of \(n\) vectors \((r_1, \cdots,r_n)\) in \(\R^d\).

The rank of a matrix is the length of the largest sequence of linearly independent columns, or equivalently the length of the largest sequence of linearly independent rows.

  • \(X_1 =\begin{pmatrix} 1 & 1 \cr 0 & 1 \cr 0 & 0 \cr \end{pmatrix}, \quad rank(X_1) = 2; \quad \quad X_2=\begin{pmatrix} 1 & 1 & 0 \cr 1 & 1 & -1\cr 1 & 0 & 1 \end{pmatrix}, \quad rank(X_2) = 3;\)
  • \(X_3=\begin{pmatrix} 1 & 1 & -1 \cr 1 & 1 & -1\cr 1 & 0 & 1 \end{pmatrix}, \quad rank(X_3) = 2;\)

Let \(A= (a_1, \cdots, a_d)^\intercal \in \R^d\); \(A A^\intercal = \begin{pmatrix} & & \cr a_1 A & \cdots & a_d A\cr & & \cr \end{pmatrix}\) is of rank \(1\).
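A quick numerical check of this rank-one property, with an arbitrary vector \(a\) (using base R's qr() to compute the rank):

a <- c(1, 2, -1)    ## an arbitrary vector of R^3
M <- a %*% t(a)     ## the outer product a a^T, a 3 x 3 matrix
qr(M)$rank          ## 1: every column of M is a multiple of a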

#install.packages('Matrix')
library(Matrix)
X1 <- matrix(c(1,0,0,1,1,0), ncol = 2)
X2 <- matrix(c(1,1,1,1,1,0, 0, -1,1), ncol = 3)
X3 <- matrix(c(1,1,1,1,1,0, -1, -1,1), ncol = 3)
rankMatrix(X1)[1]
[1] 2
rankMatrix(X2)[1]
[1] 3
rankMatrix(X3)[1]
[1] 2
import numpy as np
from numpy.linalg import matrix_rank

X1 = np.array([[ 1, 1], [ 0, 1], [ 0, 0]])
X2 = np.array([[ 1, 1, 0], [ 1, 1, -1], [ 1, 0, 1]])
X3 = np.array([[ 1, 1, -1], [ 1, 1, -1], [ 1, 0, 1]])

matrix_rank(X1) 
np.int64(2)
matrix_rank(X2) 
np.int64(3)
matrix_rank(X3) 
np.int64(2)

Column Space, rank

If \(A\in\R^{n\times d},\) and \(c\in \R^d\), \[A c = \begin{pmatrix} & & & \cr A_{.,1} & A_{.,2} & \cdots & A_{.,d} \cr & & & \cr \end{pmatrix} \begin{pmatrix} c_1 \cr c_2 \cr \vdots \cr c_d \cr \end{pmatrix} = \begin{pmatrix} c_1 a_{11} + c_2 a_{12} + \cdots + c_d a_{1d} \cr \vdots \cr c_1 a_{n1} + c_2 a_{n2} + \cdots + c_d a_{nd} \cr \end{pmatrix} = c_1 A_{.,1} + c_2 A_{.,2} + \cdots + c_d A_{.,d}\]

Image of \(A\) or Column space of A

\(Im(A) = \left \lbrace y \in \R^n;\ \exists c\in \R^d,\ y = \sum_{l=1}^d c_l A_{., l} \right \rbrace\)

Remarks

  • If \(A_{.,1}\) and \(A_{.,2}\) are linearly dependent, \(A_{.,1} = \lambda A_{.,2}\) and \(Ac = ( c_1 \lambda + c_2) A_{.,2} + \cdots + c_d A_{.,d}\). The rank is the minimal number of vectors needed to generate \(Im(A)\), that is, the dimension of \(Im(A)\).

Remarkable matrices

Identity matrix

The square matrix \(I\) such that \(I_{ij} =\delta_{ij}\) is named the identity matrix and acts as the neutral element for matrix multiplication.

Let \(I_n\) be the identity matrix in \(\R^{n\times n}\); for any \(A \in \R^{m\times n}\), \(A I_n = A\) and \(I_m A = A\).
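A small R check, reusing the \(3\times 2\) matrix \(A\) of the product example:

A <- matrix(c(1, 1, -1, 2, 3, -1), ncol = 2)  ## a 3 x 2 matrix
I2 <- diag(2); I3 <- diag(3)                  ## identity matrices of size 2 and 3
all.equal(A %*% I2, A)                        ## TRUE
all.equal(I3 %*% A, A)                        ## TRUE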

Diagonal matrix

A diagonal matrix is a matrix in which all elements outside the main diagonal are zero. The term is mostly used for square matrices but may also be used for rectangular ones.

Let \(e_i = (\delta_{i1}, \cdots, \delta_{id})^\intercal\) and \(D = \begin{pmatrix}2 & 0 & 0 \cr 0 & 1 & 0 \cr 0 & 0 & -1 \end{pmatrix}\); then \[D e_1 = 2 e_1,\quad D e_2 = e_2,\quad D e_3 = - e_3.\]

\(D\) acts as simple scalar multiplier on the basis vectors.

Consider \(D = \begin{pmatrix}2 & 0 \cr 0 & 0.5 \end{pmatrix},\quad Dx = \begin{pmatrix} 2 x_1 \cr 0.5 x_2 \end{pmatrix}.\)
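A small R sketch of this rescaling, with an arbitrary vector:

D <- diag(c(2, 0.5))   ## the 2 x 2 diagonal matrix above
x <- c(1, 1)           ## an arbitrary vector
D %*% x                ## (2, 0.5): each coordinate is simply rescaled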

Orthogonal matrix

Definition

An orthogonal matrix \(Q\), also called an orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors; it therefore satisfies \[Q^\intercal Q = Q Q^\intercal = I.\]

\(Q = \begin{pmatrix}\frac{\sqrt{2}}{2} & 0 & -\frac{\sqrt{2}}{2}\cr \frac{\sqrt{2}}{2} & 0 & \frac{\sqrt{2}}{2}\cr 0 & 1 & 0 \end{pmatrix}\) \[Q Q^\intercal = I_3\]

\(Q\) transforms the basis vectors into new orthonormal vectors.

Consider \(Q = \begin{pmatrix}\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2}\cr \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{pmatrix},\quad Qx = \begin{pmatrix} \frac{\sqrt{2}}{2}x_1 - \frac{\sqrt{2}}{2} x_2\cr \frac{\sqrt{2}}{2}x_1 + \frac{\sqrt{2}}{2} x_2 \end{pmatrix}.\)
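A quick R check on this \(2\times 2\) example (a rotation by 45 degrees):

Q <- matrix(c(sqrt(2)/2, sqrt(2)/2, -sqrt(2)/2, sqrt(2)/2), ncol = 2)  ## the matrix Q above
round(t(Q) %*% Q, 10)    ## the identity matrix, up to rounding error
Q %*% c(1, 0)            ## the rotated basis vector; its norm is unchanged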

Eigenvalues, eigenvectors

Consider a square matrix \(A \in \R^{d\times d}\) and a nonzero vector \(v \in \R^d\); \(v\) is an eigenvector associated with the eigenvalue \(\lambda\) if \[Av = \lambda v.\] If \(v\) is an eigenvector of \(A\), applying \(A\) to \(v\) simply amounts to multiplying \(v\) by the corresponding eigenvalue.
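A quick numerical illustration in R, using eigen() on the diagonal matrix \(D\) of the previous section:

D <- diag(c(2, 1, -1))   ## the diagonal matrix with entries 2, 1, -1
eigen(D)$values          ## 2  1 -1: the diagonal entries
eigen(D)$vectors         ## the canonical basis vectors e_1, e_2, e_3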

  • Identify the eigenvalues and eigenvectors of \(A_1 = \begin{pmatrix}1 & 0 \cr 0 & 2\end{pmatrix}\), \(A_2 = \begin{pmatrix}1 & 3 \cr 0 & 2\end{pmatrix},\) \(A_3 = \begin{pmatrix} 2 & 2 \cr -2 & 2 \end{pmatrix}.\)

Singular Value Decomposition (SVD)

Let \(A\) be an \(n\times d\) matrix of rank \(r\); there exist an \(n\times n\) orthogonal matrix \(P\), a \(d\times d\) orthogonal matrix \(Q\), and a diagonal \(r\times r\) matrix \(D_1\) whose diagonal terms are in decreasing order, such that: \[ A = P \begin{pmatrix} D_1 & 0 \cr 0 & 0 \cr \end{pmatrix} Q^\intercal\]

For a nice and visual course on SVD, see the Steve Brunton YouTube channel on this subject!

Let \(A = \frac{\sqrt{2}}{6} \begin{pmatrix} 0 & 4 \cr 6 & 2 \cr -3 & -5 \end{pmatrix}\); then consider

\(P = \frac{1}{3} \begin{pmatrix} 1 & -2 & 2 \cr 2 & 2 & 1 \cr -2 & 1 & 2 \end{pmatrix}, \quad D = \begin{pmatrix} 2 & 0 \cr 0 & -1 \cr 0 & 0 \end{pmatrix}, \quad \text{and} \quad Q = \frac{\sqrt{2}}{2} \begin{pmatrix} 1 & -1 \cr 1 & 1 \end{pmatrix}\)

If \(A\) is a \(d\times d\) symmetric matrix, then

\[A = P D P^\intercal.\]
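A small R sketch of this decomposition, on an arbitrary symmetric matrix (for a symmetric matrix, eigen() returns orthonormal eigenvectors):

S <- matrix(c(2, 1, 1, 3), ncol = 2)   ## an arbitrary symmetric matrix
e <- eigen(S)
P <- e$vectors; D <- diag(e$values)    ## orthonormal eigenvectors and eigenvalues
all.equal(P %*% D %*% t(P), S)         ## TRUE: S = P D P^T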

A =matrix(c(0, sqrt(2), - sqrt(2)/2, 2* sqrt(2)/3, sqrt(2)/3, -5*sqrt(2)/6), ncol = 2);A
           [,1]       [,2]
[1,]  0.0000000  0.9428090
[2,]  1.4142136  0.4714045
[3,] -0.7071068 -1.1785113
svd(A)
$d
[1] 2 1

$u
           [,1]       [,2]
[1,] -0.3333333  0.6666667
[2,] -0.6666667 -0.6666667
[3,]  0.6666667 -0.3333333

$v
           [,1]       [,2]
[1,] -0.7071068 -0.7071068
[2,] -0.7071068  0.7071068

And now

It’s time to use these concepts for Data Analysis.

Bibliography

Cohen, M. X. (2022). Practical linear algebra for data science. O’Reilly Media, Inc.

Gentle, J. E. (2007). Matrix Algebra: Theory, Computations, and Applications in Statistics. Springer Texts in Statistics. Springer, New York, NY.

Harville, D. A. (1998). Matrix algebra from a statistician’s perspective.

Petersen, K. B., M. S. Pedersen, and others (2008). “The matrix cookbook”. In: Technical University of Denmark 7.15, p. 510.

Yoshida, R. (2021). Linear algebra and its applications with R. Chapman and Hall/CRC. URL: https://shainarace.github.io/LinearAlgebra/.