Linear algebra is the branch of mathematics concerning linear equations such as \(a_1 x_1 + \cdots + a_n x_n = b,\) and linear maps such as \((x_{1},\ldots ,x_{n})\mapsto a_{1}x_{1}+\cdots +a_{n}x_{n},\) and their representations in vector spaces and through matrices.
Link between Data science and Linear Algebra
Data are organized in rectangular arrays, understood as matrices, and the mathematical concepts associated with matrices are used to manipulate the data.
Palmer penguins illustration
## install.packages('palmerpenguins")library("palmerpenguins")penguins_nona <-na.omit(penguins) ## remove_na for nowpenguins_nona |>print(n=3)
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
# ℹ 330 more rows
# ℹ 2 more variables: sex <fct>, year <int>
One row describes one specific penguin,
One column corresponds to one specific attribute of the penguin.
Focusing on the quantitative variables, we get a rectangular array with \(n=333\) rows and \(d=5\) columns.
Each penguin might be seen as a vector in \(\mathbb{R}^d\) and each variable might be seen as a vector in \(\mathbb{R}^n\). The whole data set is a matrix of dimension \((n,d)\).
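As an illustration, one way to build this matrix in R is sketched below; the selection of the five quantitative columns is an assumption based on the variables listed above.

```r
# A minimal sketch: keep the quantitative columns and coerce them to a numeric matrix
library(palmerpenguins)
penguins_nona <- na.omit(penguins)
X <- as.matrix(penguins_nona[, c("bill_length_mm", "bill_depth_mm",
                                 "flipper_length_mm", "body_mass_g", "year")])
dim(X)  # 333 rows (penguins) and 5 columns (variables)
```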
Linear Algebra: What and why
Regression Analysis
to predict one variable, the response variable named \(Y\), given the others, the regressors
or equivalently to identify a linear combination of the regressors which provides a good approximation of the response variable,
or equivalently to specify parameters \(\theta\) so that the prediction \(\hat{Y} = X \theta\), a linear combination of the regressors, is as close as possible to \(Y\).
We are dealing with linear equations: enter the world of Linear Algebra.
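As a rough sketch, the least-squares parameters can be obtained in R by solving the normal equations \(X^\intercal X \theta = X^\intercal Y\); the choice of body_mass_g as response and of two regressors below is purely illustrative, not prescribed by the text.

```r
# Least squares by solving the normal equations (illustrative choice of variables)
y     <- penguins_nona$body_mass_g
X_reg <- cbind(1, penguins_nona$bill_length_mm, penguins_nona$flipper_length_mm)
theta_hat <- solve(t(X_reg) %*% X_reg, t(X_reg) %*% y)  # estimated parameters theta
y_hat     <- X_reg %*% theta_hat                        # prediction: linear combination of regressors
head(cbind(y, y_hat), 3)
```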
Principal Component Analysis
to explore relationship between variables,
or equivalently to quantify proximity between vectors,
but also to build a small set of new variables (vectors) which represent most of the information in the initial variables.
We are dealing with vectors and vector bases: enter the world of Linear Algebra.
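Below is a small sketch with base R's prcomp, assuming the matrix X of quantitative variables built earlier; it only illustrates that the new variables (principal components) summarize most of the initial ones.

```r
# PCA on the penguin measurements (X assumed from the earlier sketch)
pca <- prcomp(X, scale. = TRUE)   # scale so each variable contributes comparably
summary(pca)                      # share of variance captured by each new variable
head(pca$x[, 1:2], 3)             # coordinates of the penguins on the first two components
```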
Linear Algebra: basic objects
Vectors
\(x\in \R^n\) is a vector of \(n\) components.
By convention, \(x\) is a column vector\[x=\begin{pmatrix} x_1 \cr \vdots\cr x_n \end{pmatrix}.\]
When a row vector is needed, we use the transpose operator \(\ ^\intercal\): \(x^\intercal = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}.\)
Let \(A \in \R^{n\times d}\) be a matrix. The term on row \(i\) and column \(j\) is referred to as \(a_{ij}\),
The column \(j\) is referred to as \(A_{.,j}\), \(A_{.,j} = \begin{pmatrix} a_{1j} \cr \vdots\cr a_{nj} \end{pmatrix}.\)
To respect the column vector convention, the row \(i\) is referred to as \(A_{i,.}^\intercal\), \(A_{i,.}^\intercal = \begin{pmatrix} a_{i1} & \cdots & a_{id} \end{pmatrix}.\)
As a vector \(x\) in \(\R^n\) is also a matrix in \(\R^{n\times 1}\), the product \(A x\) exists if the number of columns of \(A\) equals the number of rows of \(x\).
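A small sketch of this compatibility rule in R, with arbitrary entries:

```r
A <- matrix(c(1, 0, -1, 2, 3, 1), nrow = 2)  # a 2 x 3 matrix with arbitrary entries
x <- c(1, 1, 1)                              # a vector of R^3, seen as a 3 x 1 matrix
A %*% x                                      # defined because ncol(A) == length(x)
```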
The transpose, denoted by \(\ ^\intercal\), of a matrix results from switching the rows and columns. Let \(A \in \R^{n\times d}\): \[A^{\intercal} \in \R^{d\times n} \quad\text{and}\quad (A^{\intercal})_{ij} = A_{ji}.\]
Let \(x,y\) be in \((\R^n)^2,\) the dot product, also known as inner product or scalar product, is defined as \[x\cdot y = y\cdot x = \sum_{i=1}^n x_i y_i = x^\intercal y= y^\intercal x.\] The norm of a vector \(x\), symbolized with \(\norm{x},\) is defined by \[\norm{x}^2 =\sum_{i=1}^n x_i^2 = x^\intercal x.\]
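These two quantities translate directly into R; a small illustrative sketch:

```r
x <- c(3, 4)
y <- c(1, 2)
sum(x * y)        # dot product: 3*1 + 4*2 = 11
drop(t(x) %*% y)  # the same value written as x^T y
sqrt(sum(x^2))    # norm of x: sqrt(3^2 + 4^2) = 5
```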
A real vector space \((V, +, \cdot\,)\) over \(\R\) is a set \(V\) equipped with two operations satisfying the following properties for all \((x,y,z)\in V^3\) and all \((\lambda,\mu)\in\R^2:\)
(closed under addition) \(x + y \in V,\)
(closed under scalar multiplication) \(\lambda x \in V,\)
(commutativity of +) \(x + y = y + x,\)
(associativity of +) \((x + y) + z = x + (y + z),\)
(null element for +) \(x + \mathbb{0} = x,\)
(existence of additive inverse) There exists \(-x \in V\) such that \(x + (-x) = \mathbb{0}.\)
For example, the map \(f(x_1, x_2) = x_1 - x_2\) is linear and can be written as \[f(x) = A x, \quad A = \begin{pmatrix}
1 & -1
\end{pmatrix}.\]
Matrix as a linear map on vector space
A real matrix \(A\in\R^{n\times d}\) defines a linear function from a real vector space of dimension \(d\) to a real vector space of dimension \(n\):
\[\begin{align}
A: &\R^d \to \R^n\\
& x \mapsto y = Ax
\end{align}\]
The square norm \(f(x) = x_1^2 + x_2^2\) satisfies \(f(\lambda x) = \lambda^2 f(x)\); it is therefore not a linear function and cannot be expressed as a matrix.
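A quick numerical sketch of the contrast, with an arbitrary 2 x 2 matrix:

```r
A <- matrix(c(2, -2, 2, 2), nrow = 2)   # an arbitrary linear map on R^2
f_lin  <- function(x) drop(A %*% x)     # linear: represented by a matrix
f_norm <- function(x) sum(x^2)          # square norm: not linear
x <- c(1, 3); lambda <- 5
all.equal(f_lin(lambda * x), lambda * f_lin(x))            # TRUE: scales linearly
isTRUE(all.equal(f_norm(lambda * x), lambda * f_norm(x)))  # FALSE: scales with lambda^2
```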
Linear independence
A sequence of vectors \(x_1, \cdots, x_k\) from \(\R^n\) is said to be linearly independent if the only solution to \[X a = \begin{pmatrix} x_1 & x_2 & \cdots & x_k \end{pmatrix} \begin{pmatrix} a_1 \cr a_2 \cr \vdots \cr a_k\end{pmatrix} = \mathbb{0}\] is \(a = \mathbb{0}\).
The sequence of columns of a matrix \(M\in \R^{n\times d}\) forms a sequence of \(d\) vectors \((c_1, \cdots,c_d)\) in \(\R^n\), while the rows of a matrix form a sequence of \(n\) vectors \((r_1, \cdots,r_n)\) in \(\R^d\).
The rank of a matrix is the length of the largest sequence of linearly independent columns or, equivalently, the length of the largest sequence of linearly independent rows.
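In R, the rank can be obtained for instance from the QR decomposition; a minimal sketch with a matrix whose third column is the sum of the first two:

```r
M <- cbind(c(1, 0, 1), c(2, 1, 0), c(3, 1, 1))  # third column = first + second
qr(M)$rank                                      # 2: at most two linearly independent columns
```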
The image of \(A\) is the set of all linear combinations of its columns: \(Im(A) = \left\lbrace y \in \R^n;\ \exists c\in \R^d,\ y = \sum_{l=1}^d c_l A_{.,l} \right\rbrace.\)
Remarks
If \(A_{.,1}\) and \(A_{.,2}\) are linearly dependent, \(A_{.,1} = \lambda A_{.,2}\) and \(Ac = ( c_1 \lambda + c_2) A_{.,2} + \cdots + c_d A_{.,d}\). The rank is the minimal number of vectors needed to generate \(Im(A)\), that is, the dimension of \(Im(A)\).
Remarkable matrices
Identity matrix
The square matrix \(I\) such that \(I_{ij} =\delta_{ij}\) is named the identity matrix and acts as the neutral element for matrix multiplication.
Let \(I_n\) be the identity matrix in \(\R^{n\times n}\); for any \(A \in \R^{m\times n}\), \(A I_n = A\) and \(I_m A = A\).
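In R, diag(n) builds \(I_n\); a quick sketch of the neutral-element property:

```r
A <- matrix(1:6, nrow = 2)    # an arbitrary 2 x 3 matrix
all.equal(A %*% diag(3), A, check.attributes = FALSE)  # TRUE: A I_3 = A
all.equal(diag(2) %*% A, A, check.attributes = FALSE)  # TRUE: I_2 A = A
```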
Diagonal matrix
A diagonal matrix is a matrix in which the elements outside the main diagonal are all zero. The term usually refers to square matrices, but it is sometimes used for rectangular matrices as well.
Orthogonal matrix
An orthogonal matrix \(Q\), also called an orthonormal matrix, is a real square matrix whose columns and rows are orthonormal vectors, and therefore satisfies \[Q^\intercal Q = Q Q^\intercal = I.\]
Eigenvalues and eigenvectors
Consider a square matrix \(A \in \R^{d\times d}\) and a nonzero vector \(v \in \R^d\); \(v\) is an eigenvector for the eigenvalue \(\lambda\) if \[Av = \lambda v.\] If \(v\) is an eigenvector of \(A\), applying \(A\) to \(v\) simply amounts to multiplying \(v\) by the corresponding eigenvalue.
Identify the eigenvalues and eigenvectors of \(A_1 = \begin{pmatrix}1 & 0 \cr 0 & 2\end{pmatrix}\), \(A_2 = \begin{pmatrix}1 & 3 \cr 0 & 2\end{pmatrix}\), and \(A_3 = \begin{pmatrix} 2 & 2 \cr -2 & 2 \end{pmatrix}.\)
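In R, eigen() computes both at once; a quick numerical check on \(A_1\) and \(A_3\) (the values returned for \(A_3\) are complex, so \(A_3\) has no real eigenvector):

```r
A1 <- diag(c(1, 2))
eigen(A1)           # eigenvalues 2 and 1, eigenvectors along the canonical basis
A3 <- matrix(c(2, -2, 2, 2), nrow = 2)
eigen(A3)$values    # 2 + 2i and 2 - 2i: complex eigenvalues
```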
Singular Value Decomposition (SVD)
Let \(A\) be an \(n\times d\) matrix of rank \(r\). There exist an \(n\times n\) orthogonal matrix \(P\), a \(d\times d\) orthogonal matrix \(Q\), and an \(r\times r\) diagonal matrix \(D_1\) whose diagonal terms are in decreasing order, such that: \[ A = P
\begin{pmatrix} D_1 & 0 \cr
0 & 0 \cr
\end{pmatrix} Q^\intercal\]
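In R, svd() returns the three factors; a small sketch on the centered penguin matrix (X assumed from the earlier sketch), in the compact form where the zero blocks are dropped:

```r
X_c <- scale(X, center = TRUE, scale = FALSE)  # center each variable
s <- svd(X_c)                                  # s$u, s$d, s$v such that X_c = U diag(d) V^T
s$d                                            # singular values, in decreasing order
all.equal(X_c, s$u %*% diag(s$d) %*% t(s$v), check.attributes = FALSE)  # TRUE
```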
For a nice and visual course on SVD, see the Steve Brunton YouTube channel on this subject!