ML Math - Scalars and Vectors

Scalars and vectors are foundational mathematical concepts in machine learning. Complex machine learning algorithms, such as neural networks, are built around them. A solid understanding of these concepts is therefore essential for navigating the many neural-network-based architectures you will encounter.

The Scalar

A scalar is a single real number representing magnitude or quantity, such as 5, 2, 7.6 or even -3.

s \in \mathbb{R}

Scalars can also be complex or boolean:

While we have defined a scalar as a real number, in machine learning it can also be a complex number or a boolean value. For the scope of this post, and for most work with standard machine learning packages, focusing on real numbers is sufficient.
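As a small illustration of the point above, here is a sketch of the three kinds of scalar in NumPy. The variable names and values are made up for this example; the only thing it is meant to show is that each one is a single value with zero dimensions.

```python
import numpy as np

# Three scalars of different kinds (illustrative values, not from a dataset)
s_real = np.float64(7.6)
s_complex = np.complex128(2 + 3j)
s_bool = np.bool_(True)

for s in (s_real, s_complex, s_bool):
    # np.ndim returns 0 for a scalar: it is a single value, not a list
    print(type(s).__name__, np.ndim(s))
```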

The Vector

A vector is a sequence of real numbers (scalars) representing magnitude and direction, such as [5, 2], [7.6, -3], or [1, 0, 4].

\mathbf{x} \in \mathbb{R}^n

We represent a vector as a column of $n$ elements. When a vector has $n$ real-valued components, we say it exists in an $n$-dimensional space:

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \begin{matrix} \longleftarrow \text{feature } x_1 \\ \longleftarrow \text{feature } x_2 \\ \vdots \\ \longleftarrow \text{feature } x_n \end{matrix}

In machine learning, we use a vector to represent a single data point in a dataset. Each individual number inside the vector is called a feature.

In the representation above, $\mathbf{x}$ is a single data point, and a data point can have many features $x_1, x_2, \dots, x_n$. These features can be anything, depending on your data. Take a housing price prediction dataset as an example: in order to predict the price, we need specific features such as the area of the house, the number of bedrooms, and the age of the house.

\mathbf{x} = \begin{bmatrix} 4500 \\ 8 \\ 10 \end{bmatrix} \begin{matrix} \longleftarrow x_1 \text{ (Area)} \\ \longleftarrow x_2 \text{ (Bedrooms)} \\ \longleftarrow x_3 \text{ (Age)} \end{matrix} \Biggr\} \text{ Features}

Notice how the equation above represents each feature as one element of the vector: the area is $x_1$, the number of bedrooms is $x_2$, and the age is $x_3$.
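The same house vector can be sketched in NumPy. This is a minimal illustration that assumes the feature order [area, bedrooms, age] used above; the variable names are chosen for this example. Note that the math notation counts from 1 while NumPy indexes from 0.

```python
import numpy as np

# House feature vector from the example: [area, bedrooms, age]
x = np.array([4500, 8, 10])

n = x.shape[0]      # number of features (the dimension n)
area = x[0]         # x_1 in the math notation
bedrooms = x[1]     # x_2
age = x[2]          # x_3

print(n, area, bedrooms, age)
# Output: 3 4500 8 10
```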

Vector Operations

We have seen how to represent a data point as a vector. The next step is to understand how to perform mathematical operations on vectors. By combining these operations, we can build the mathematical models that power machine learning algorithms.

Scalar Multiplication

The simplest operation is multiplying a vector by a scalar. This is often used to "weight" our features. When you multiply a vector by a scalar, you multiply every individual feature inside that vector by that number.

For example, if we want to double the values in our house vector:

a\mathbf{x} = a \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} a \cdot x_1 \\ a \cdot x_2 \\ a \cdot x_3 \end{bmatrix}
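As a worked instance, taking $a = 2$ and the house vector from the earlier example (area 4500, 8 bedrooms, age 10), every feature is doubled:

2\mathbf{x} = 2 \begin{bmatrix} 4500 \\ 8 \\ 10 \end{bmatrix} = \begin{bmatrix} 2 \cdot 4500 \\ 2 \cdot 8 \\ 2 \cdot 10 \end{bmatrix} = \begin{bmatrix} 9000 \\ 16 \\ 20 \end{bmatrix}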

Let's see this in code:

python

import numpy as np

# A scalar (e.g., a scaling factor)
a = 2

# A vector x (e.g., our house features)
x = np.array([4500, 8, 10])

# Multiplying the vector by the scalar
result = a * x

print(result)
# Output: [9000   16   20]

Vector Addition

To add two vectors together, both vectors must have the same number of features (the same dimension). For example, if vector $\mathbf{x}$ represents a house with 3 features, you can only add it to another vector $\mathbf{y}$ that also has 3 features.

We perform this addition element-wise. This means we simply add the numbers that are in the same position.

\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}

To represent this in code we will use numpy to perform the vector addition:

python

import numpy as np

# Representing House A as a vector [Area, Bedrooms, Age]
house_a = np.array([4500, 8, 55])

# Representing House B as a vector [Area, Bedrooms, Age]
house_b = np.array([2300, 10, 20])

# Adding the two vectors (Element-wise Addition)
total_features = house_a + house_b

print(f"House A: {house_a}")
print(f"House B: {house_b}")
print(f"Result:  {total_features}") 
# Output: [6800   18   75]

Notice how the result is a new vector: [4500+2300, 8+10, 55+20]
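To see why the number of features has to match, here is a small sketch of what happens when it does not. `house_c` is a hypothetical vector invented for this example, with only 2 features, and `shapes_compatible` is just a flag used to record the outcome.

```python
import numpy as np

house_a = np.array([4500, 8, 55])   # 3 features
house_c = np.array([2300, 10])      # hypothetical vector with only 2 features

try:
    house_a + house_c
    shapes_compatible = True
except ValueError as err:
    # NumPy cannot line up 3 elements with 2 elements
    shapes_compatible = False
    print(f"Cannot add: {err}")
```

One caveat: NumPy will not raise an error if one of the arrays has exactly one element. In that case it broadcasts the single value across the other vector, which is not the same thing as mathematical vector addition.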

Transposition

Transposing a vector is simply the act of flipping its orientation. If you have a column vector (a vertical list), transposing it turns it into a row vector (a horizontal list).

In machine learning mathematics, transposition is important because many vector operations require specific dimensions. A feature vector is usually written as a column vector:

The column vector:

\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \begin{matrix} \longleftarrow \text{feature } 1 \\ \longleftarrow \text{feature } 2 \\ \longleftarrow \text{feature } 3 \end{matrix}

The transposed vector ($\mathbf{x}^T$):

\mathbf{x}^T = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}

Let's represent this in code:

python

import numpy as np

# Creating a column vector as a 2-D array with shape (3, 1)
x = np.array([[2500], [3], [15]])

# Transposing the column vector gives a row vector with shape (1, 3)
x_transpose = x.T

print(x_transpose)
# Output: [[2500    3   15]]
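One detail worth knowing when you try this yourself: in NumPy, transposing a plain 1-D array does nothing, because a 1-D array has no row or column orientation to flip. The sketch below (variable names are chosen for this example) shows the difference between a 1-D array and an explicit column vector.

```python
import numpy as np

x_1d = np.array([2500, 3, 15])     # 1-D array, shape (3,)
x_col = x_1d.reshape(-1, 1)        # explicit column vector, shape (3, 1)
x_row = x_col.T                    # row vector, shape (1, 3)

print(x_1d.T.shape)   # (3,)   transposing a 1-D array leaves its shape unchanged
print(x_col.shape)    # (3, 1)
print(x_row.shape)    # (1, 3)
```

This is why the column vector in the example above is written with nested brackets: the extra dimension is what gives it an orientation.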

The Dot Product

To calculate the dot product, you multiply the corresponding elements of two vectors and then add all the results together. Because we add everything up at the end, the result of the dot product is always a single scalar.

Consider a house price prediction problem where a data point is represented by features such as area, number of bedrooms, and age of the house. We can represent these features as a column vector:

\mathbf{x} =\begin{bmatrix}400 \\7 \\6\end{bmatrix}

Our model also has a set of parameters, often called weights, which determine how much each feature contributes to the prediction:

\mathbf{w} =\begin{bmatrix}w_1 \\w_2 \\w_3\end{bmatrix}

To make a prediction, we multiply the feature values by their corresponding weights and add the results together. This operation is known as the dot product. When we write it as a matrix multiplication, the shapes must line up: the first vector must be a row vector and the second a column vector. This is where the transpose becomes important.

By transposing the feature vector, we obtain:

\mathbf{x}^{T} =\begin{bmatrix}400 & 7 & 6\end{bmatrix}

We can now multiply the row vector by the weight vector:

\mathbf{x}^{T}\mathbf{w}

The result is a single number:

\mathbf{x}^{T}\mathbf{w}=400w_1 + 7w_2 + 6w_3

Let's see this in code:

python

import numpy as np

# Feature vector (The House)
x = np.array([400, 7, 6])

# Weight vector (The Model's rules)
w = np.array([350, 3800, -450])

# Calculating the Dot Product
prediction = np.dot(w, x)

print(f"The predicted price is: {prediction}")
# Output: The predicted price is: 163900
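To connect the code back to the math, the sketch below computes the same number in two other ways: first by multiplying element-wise and summing, which is the definition of the dot product, and then as a row vector times a column vector, mirroring $\mathbf{x}^{T}\mathbf{w}$. The feature and weight values are the ones from the example above; the other variable names are chosen for this illustration.

```python
import numpy as np

x = np.array([400, 7, 6])          # features from the example
w = np.array([350, 3800, -450])    # weights from the example

# 1) Multiply element-wise, then sum: 400*350 + 7*3800 + 6*(-450)
by_hand = np.sum(x * w)

# 2) Row vector times column vector, mirroring x^T w
x_col = x.reshape(-1, 1)           # shape (3, 1)
w_col = w.reshape(-1, 1)           # shape (3, 1)
as_matrix = x_col.T @ w_col        # (1, 3) @ (3, 1) gives shape (1, 1)

print(by_hand)           # 163900
print(as_matrix.shape)   # (1, 1)
print(as_matrix[0, 0])   # 163900
```

The matrix form returns a 1-by-1 array rather than a bare number, which is why `np.dot` on 1-D arrays is usually the more convenient choice for a single prediction.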

Summary

Scalars are single numbers that represent magnitude, serving as the simplest building blocks of data.
Vectors are lists of scalars that represent a single data point, where every individual number is a specific feature.
Basic Operations like addition and scalar multiplication allow us to combine data points or scale their features element-wise.
Transposition is the process of flipping a vector's orientation from a column to a row, which is essential for mathematical alignment.
The Dot Product is one of the most important operations in machine learning; it is how a model multiplies features by weights and sums the results to calculate a final prediction or output.