Sunday, 7 July 2024

Learning - Linear Algebra

Asked Gemini and ChatGPT some queries and noted the answers here. :)


Linear Algebra 

- is the branch of mathematics that deals with vector spaces and linear transformations.

- what is a vector space? A vector space is a collection of vectors on which specific operations, like addition and scalar multiplication (multiplying a vector by a number), can be performed.


- what is a vector? Vectors represent quantities that have both size and direction. An example is force, which has both a strength and a direction.

  • Vectors can be added together and multiplied by scalars.
  • Vector addition: \mathbf{a} + \mathbf{b} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \end{pmatrix}
  • Scalar multiplication: c \mathbf{v} = c \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} c v_1 \\ c v_2 \end{pmatrix}
  • Represented graphically by arrows in space, with the direction and length corresponding to the vector's direction and magnitude.
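
  A quick NumPy sketch of the two operations above (the array values are made up for illustration):

    import numpy as np

    a = np.array([1.0, 2.0])
    b = np.array([3.0, -1.0])

    print(a + b)   # element-wise addition: [4. 1.]
    print(3 * a)   # scalar multiplication: [3. 6.]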

  - what are linear transformations? Functions that map one vector space to another while preserving the linear relationships between vectors. A linear transformation stretches, shrinks, or rotates vectors. Matrices represent linear transformations.


    - what are matrices? Matrices are rectangular arrays of numbers used to represent linear transformations, solve systems of linear equations, and store data. They are grid-like structures on which row and column operations can be performed to manipulate vectors.

    A matrix with m rows and n columns is called an m \times n matrix (read as "m by n matrix"). It is typically written in the form:

    A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}

    where a_{ij} represents the element in the i-th row and j-th column of the matrix.
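
    For example, a 2 x 3 matrix in NumPy (note that NumPy indices are 0-based, so the element a_{11} of the notation above is A[0, 0] in code):

      import numpy as np

      A = np.array([[1, 2, 3],
                    [4, 5, 6]])   # m = 2 rows, n = 3 columns

      print(A.shape)    # (2, 3)
      print(A[0, 0])    # element in the 1st row, 1st column -> 1
      print(A[1, 2])    # element in the 2nd row, 3rd column -> 6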


    Operations involving Vectors and Scalars

    • Dot Product: The dot product (or scalar product) of two vectors \mathbf{a} and \mathbf{b} is a scalar defined as: \mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n
    • Cross Product: The cross product of two vectors in \mathbb{R}^3 results in another vector perpendicular to both: \mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
    • Norm (Magnitude): The norm (or length) of a vector \mathbf{v} is given by: \|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}
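
    A small NumPy sketch of these three operations (illustrative values only):

      import numpy as np

      a = np.array([1.0, 2.0, 3.0])
      b = np.array([4.0, 5.0, 6.0])

      print(np.dot(a, b))        # dot product: 1*4 + 2*5 + 3*6 = 32.0
      print(np.cross(a, b))      # cross product: [-3.  6. -3.], perpendicular to a and b
      print(np.linalg.norm(a))   # norm: sqrt(1 + 4 + 9) ≈ 3.742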

    Matrix Operations

    1. Addition: Two matrices of the same dimension can be added by adding their corresponding elements.

      A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix}
    2. Scalar Multiplication: A matrix can be multiplied by a scalar by multiplying each element of the matrix by the scalar.

      cA = \begin{pmatrix} c a_{11} & c a_{12} & \cdots & c a_{1n} \\ c a_{21} & c a_{22} & \cdots & c a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c a_{m1} & c a_{m2} & \cdots & c a_{mn} \end{pmatrix}
    3. Matrix Multiplication: Two matrices A (of dimension m \times n) and B (of dimension n \times p) can be multiplied to form a matrix C = AB (of dimension m \times p).

      c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

      Each element c_{ij} of the resulting matrix C is the dot product of the i-th row of A and the j-th column of B. (A NumPy sketch of these matrix operations follows below.)

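    A NumPy sketch of the three matrix operations above, using small illustrative matrices:

      import numpy as np

      A = np.array([[1, 2],
                    [3, 4]])   # 2 x 2
      B = np.array([[5, 6],
                    [7, 8]])   # 2 x 2

      print(A + B)   # element-wise addition
      print(2 * A)   # scalar multiplication
      print(A @ B)   # matrix product: c_ij = sum over k of a_ik * b_kj
                     # [[19 22]
                     #  [43 50]]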



    Linear Algebra in Machine Learning

    1. Data Representation

    • Vectors and Matrices: Data is often represented as vectors (1D arrays) or matrices (2D arrays). For instance, a dataset with m samples and n features is represented as an m \times n matrix.
    • Tensors: Higher-dimensional arrays, known as tensors, are used for more complex data structures, such as images (3D tensors) or videos (4D tensors).
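
    For instance, typical array shapes in NumPy (all values random, purely for illustration):

      import numpy as np

      X = np.random.rand(100, 5)             # dataset: 100 samples x 5 features
      image = np.random.rand(64, 64, 3)      # one RGB image as a 3D tensor (height x width x channels)
      batch = np.random.rand(32, 64, 64, 3)  # a batch of 32 images as a 4D tensor

      print(X.shape, image.shape, batch.shape)   # (100, 5) (64, 64, 3) (32, 64, 64, 3)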

    2. Model Representation

    • Linear Models: Linear regression, logistic regression, and support vector machines use vectors and matrices to represent coefficients and features: \mathbf{y} = X\mathbf{w} + \mathbf{b}. Here, \mathbf{y} is the vector of predictions, X is the matrix of input features, \mathbf{w} is the weight vector, and \mathbf{b} is the bias vector.
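
    A minimal sketch of this prediction step in NumPy; the feature matrix, weights, and bias below are arbitrary placeholder values, not fitted parameters:

      import numpy as np

      X = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])   # 3 samples, 2 features
      w = np.array([0.5, -1.0])    # weight vector (assumed)
      b = 2.0                      # bias (assumed)

      y = X @ w + b                # one prediction per sample
      print(y)                     # [ 0.5 -0.5 -1.5]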

    3. Transformations and Projections

    • Linear Transformations: Matrices are used to perform linear transformations, such as scaling, rotating, and translating data points in space. These transformations are essential in neural networks and dimensionality reduction techniques.
    • Principal Component Analysis (PCA): PCA is a technique to reduce the dimensionality of data while preserving as much variance as possible. It involves eigenvalue decomposition of the covariance matrix: X^T X = V \Lambda V^T, where V is the matrix of eigenvectors and \Lambda is the diagonal matrix of eigenvalues.
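
    A rough sketch of PCA on random data: centre the features, eigendecompose the covariance matrix, and project onto the top-2 eigenvectors (the dataset size and number of kept components are arbitrary choices):

      import numpy as np

      X = np.random.rand(200, 5)               # 200 samples, 5 features
      X_centered = X - X.mean(axis=0)          # centre each feature

      cov = np.cov(X_centered, rowvar=False)   # 5 x 5 covariance matrix
      eigvals, eigvecs = np.linalg.eigh(cov)   # eigh is meant for symmetric matrices

      order = np.argsort(eigvals)[::-1]        # sort eigenvalues in decreasing order
      V = eigvecs[:, order[:2]]                # top-2 principal directions

      X_reduced = X_centered @ V               # project the data to 2 dimensions
      print(X_reduced.shape)                   # (200, 2)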

    4. Optimization

    • Gradient Descent: Optimization algorithms like gradient descent rely on linear algebra to update model parameters. The gradient of the loss function with respect to the parameters is computed using vector and matrix operations: \theta := \theta - \alpha \nabla J(\theta), where \theta represents the parameters, \alpha is the learning rate, and \nabla J(\theta) is the gradient of the loss function.
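
    A toy gradient-descent loop for linear regression with a mean-squared-error loss; the synthetic data, learning rate, and iteration count are arbitrary choices for illustration:

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 2))                      # 100 samples, 2 features
      true_w = np.array([2.0, -3.0])
      y = X @ true_w + rng.normal(scale=0.1, size=100)   # noisy targets

      theta = np.zeros(2)                                # parameters to learn
      alpha = 0.1                                        # learning rate

      for _ in range(200):
          grad = (2 / len(y)) * X.T @ (X @ theta - y)    # gradient of the MSE loss
          theta = theta - alpha * grad                   # theta := theta - alpha * grad J(theta)

      print(theta)                                       # should end up close to [2, -3]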

    5. Neural Networks

    • Forward Propagation: In neural networks, inputs are transformed through multiple layers using matrix multiplications and non-linear activation functions: \mathbf{a}^{(l+1)} = \sigma(W^{(l)} \mathbf{a}^{(l)} + \mathbf{b}^{(l)}), where \mathbf{a}^{(l)} is the activation vector of layer l, W^{(l)} is the weight matrix, \mathbf{b}^{(l)} is the bias vector, and \sigma is the activation function (see the sketch after this list).
    • Backpropagation: The backpropagation algorithm for training neural networks involves computing gradients of the loss function with respect to each parameter, which relies heavily on matrix calculus.
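
    A minimal forward pass through one hidden layer, following the forward-propagation formula above; the layer sizes and random weights are assumptions made just to show the shapes of the computation:

      import numpy as np

      def sigma(z):
          return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

      rng = np.random.default_rng(0)
      a0 = rng.normal(size=3)               # input activations (3 features)

      W1 = rng.normal(size=(4, 3))          # weights: 3 inputs -> 4 hidden units
      b1 = np.zeros(4)
      W2 = rng.normal(size=(1, 4))          # weights: 4 hidden units -> 1 output
      b2 = np.zeros(1)

      a1 = sigma(W1 @ a0 + b1)              # hidden-layer activations
      a2 = sigma(W2 @ a1 + b2)              # network output
      print(a1.shape, a2.shape)             # (4,) (1,)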

    6. Singular Value Decomposition (SVD)

    • Matrix Factorization: Techniques like SVD are used in dimensionality reduction, noise reduction, and data compression. In recommendation systems, SVD helps in decomposing the user-item interaction matrix: A = U \Sigma V^T, where A is the original matrix, U and V are orthogonal matrices, and \Sigma is a diagonal matrix of singular values.
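
    A small sketch of a truncated SVD on a made-up "user-item" rating matrix; the ratings are invented purely to show the decomposition and a low-rank reconstruction:

      import numpy as np

      A = np.array([[5.0, 3.0, 0.0, 1.0],
                    [4.0, 0.0, 0.0, 1.0],
                    [1.0, 1.0, 0.0, 5.0],
                    [0.0, 1.0, 5.0, 4.0]])

      U, s, Vt = np.linalg.svd(A, full_matrices=False)

      k = 2                                           # keep the 2 largest singular values
      A_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # low-rank approximation of A

      print(np.round(s, 2))                           # singular values, decreasing order
      print(np.round(A_approx, 2))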

    7. Probabilistic Models

    • Multivariate Gaussian Distribution: The covariance matrix of a multivariate Gaussian distribution is a fundamental concept in probabilistic models and Bayesian inference: p(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \mu)^T \Sigma^{-1} (\mathbf{x} - \mu)\right), where \mu is the mean vector and \Sigma is the covariance matrix.
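
    Evaluating the density formula above directly with NumPy; the mean vector, covariance matrix, and query point are arbitrary illustrative values:

      import numpy as np

      mu = np.array([0.0, 0.0])
      Sigma = np.array([[1.0, 0.5],
                        [0.5, 2.0]])
      x = np.array([1.0, -1.0])

      k = len(mu)
      diff = x - mu
      density = (np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)
                 / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma)))
      print(density)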

    8. Clustering

    • K-Means Clustering: In K-means clustering, linear algebra is used to calculate distances between points and centroids, update centroids, and minimize the sum of squared distances within clusters.
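
    One K-means iteration written with array operations: compute squared distances from every point to every centroid, assign each point to its nearest centroid, and recompute centroids as cluster means (random 2D data and k = 3, chosen only for illustration):

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 2))                              # 300 points in 2D
      centroids = X[rng.choice(len(X), size=3, replace=False)]   # initial centroids

      # squared Euclidean distances, shape (300, 3)
      dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
      labels = dists.argmin(axis=1)                              # nearest-centroid assignment

      # update each centroid as the mean of its assigned points
      centroids = np.array([X[labels == j].mean(axis=0) for j in range(3)])
      print(centroids)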

    Example Applications

    1. Image Processing: Images are represented as matrices of pixel values. Operations like convolution, used in convolutional neural networks (CNNs), rely on matrix multiplication.
    2. Natural Language Processing (NLP): Text data is often represented using embeddings, which are matrices that map words to vectors. Matrix operations are used in various NLP models, including transformers.
    3. Recommender Systems: Matrix factorization techniques, such as collaborative filtering, use linear algebra to predict user preferences based on historical data.
