Matrices are rectangular collections of numbers, meaning they are arranged in rows and columns. Learning about matrices is important because most of the data that social scientists use is rectangular; the rest, less structured data, is usually converted to a rectangular format before analysis.
Moreover, matrix algebra is the working engine behind regression models, a cornerstone tool for data analysis. When R and other software quickly convert rectangular data into a vector of regression coefficients, what goes on under the hood is usually a bunch of matrix algebra.
A single number (for example, 12) is referred to as a scalar.
\[ a = 12 \]
We can put several scalars together to make a vector. Vectors are usually denoted with an arrow on top, although social scientists tend to forgo this convention. Here is an example:
\[ \overrightarrow b = \begin{bmatrix} 12 \\ 14 \\ 15 \end{bmatrix} \]
Since this is a column of numbers, we refer to it as a column vector.
Of course, we can also have row vectors:
\[ \overrightarrow c = \begin{bmatrix} 12 & 14 & 15 \end{bmatrix} \]
Note that we use brackets \([ ... ]\) to demarcate vectors and matrices.
In R, we can construct vectors with the c() function, which combines elements (the c stands for concatenate):
c(5, 25, -2, 1)
[1] 5 25 -2 1
We can also create vectors that follow a pattern with the : operator or the seq() function:
10:20
[1] 10 11 12 13 14 15 16 17 18 19 20
seq(from = 3, to = 27, by = 3)
[1] 3 6 9 12 15 18 21 24 27
The summation operator \(\sum\) (i.e., the uppercase Sigma letter) lets us perform an operation on a sequence of numbers, which is often but not always a vector.
\[\overrightarrow d = \begin{bmatrix} 12 & 7 & -2 & 3 & -1 \end{bmatrix}\]
We can then calculate the sum of the first three elements of the vector, which is expressed as follows: \[\sum_{i=1}^3 d_i\]
Then we do the following math: \[12+7+(-2)=17\]
It is also common to use \(n\) in the superscript to indicate that we want to sum up to the very last element:
\[ \sum_{i=1}^n d_i = 12 + 7 + (-2) + 3 + (-1) = 19 \]
We can perform these operations using the sum() function in R:
d = c(12, 7, -2, 3, -1)
sum(d[1:3])
[1] 17
sum(d)
[1] 19
The product operator \(\prod\) (uppercase Pi) can also perform operations over a sequence of elements in a vector. Recall our previous vector:
\[\overrightarrow d = \begin{bmatrix} 12 & 7 & -2 & 3 & -1 \end{bmatrix}\]
We might want to calculate the product of all its elements, which is expressed as follows: \[\prod_{i=1}^n d_i = 12 \cdot 7 \cdot (-2) \cdot 3 \cdot (-1) = 504\]
In R, we can compute products using the prod() function:
prod(d)
[1] 504
We can append vectors together to form a matrix:
\[A = \begin{bmatrix} 12 & 14 & 15 \\ 115 & 22 & 127 \\ 193 & 29 & 219 \end{bmatrix}\]
The number of rows and columns of a matrix constitute the dimensions of the matrix. The first number is the number of rows (“r”) and the second number is the number of columns (“c”) in the matrix. The order of dimensions never changes.
Matrix \(A\) above, for example, is a \(3 \times 3\) matrix. Sometimes we’d refer to it as \(A_{3 \times 3}\). We commonly use capital letters to represent matrices. Sometimes, vectors and matrices are further denoted with bold font.
There are different ways to create matrices in R. One of the simplest is via rbind() or cbind(), which paste vectors together (by rows or by columns, respectively):
# Create some vectors
vector1 = 1:4
vector2 = 5:8
vector3 = 9:12
vector4 = 13:16
# Using rbind(), each vector will be a row
rbind_mat = rbind(vector1, vector2, vector3, vector4)
rbind_mat
[,1] [,2] [,3] [,4]
vector1 1 2 3 4
vector2 5 6 7 8
vector3 9 10 11 12
vector4 13 14 15 16
# Using cbind(), each vector will be a column
cbind_mat = cbind(vector1, vector2, vector3, vector4)
cbind_mat
vector1 vector2 vector3 vector4
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
An alternative is to use the aptly named matrix() function. The basic syntax is matrix(data, nrow, ncol, byrow):
data is the input vector that becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE, the input vector elements are arranged by row. By default (FALSE), elements are arranged by column.
Let’s see some examples:
# Elements are arranged sequentially by row.
M = matrix(c(1:12), nrow = 4, byrow = T)
M
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12
# Elements are arranged sequentially by column (byrow = F by default).
N = matrix(c(1:12), nrow = 4)
N
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
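We can confirm a matrix’s dimensions in R with the dim(), nrow(), and ncol() functions. A quick sketch using the matrix N created above:
dim(N)
[1] 4 3
nrow(N)
[1] 4
ncol(N)
[1] 3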
How do we refer to specific elements of the matrix? For example, matrix \(A\) is an \(m\times n\) matrix where \(m=n=3\). This is sometimes called a square matrix.
More generally, matrix \(B\) is an \(m\times n\) matrix where the elements look like this: \[B= \begin{bmatrix} b_{11} & b_{12} & b_{13} & \ldots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \ldots & b_{2n} \\ \vdots & \vdots & \vdots & \ldots & \vdots \\ b_{m1} & b_{m2} & b_{m3} & \ldots & b_{mn} \end{bmatrix}\]
Thus \(b_{23}\) refers to the second unit down and third across. More generally, we refer to row indices as \(i\) and to column indices as \(j\).
In R, we can access a matrix’s elements using square brackets:
# In matrix N, access the element at 1st row and 3rd column.
N[1, 3]
[1] 9
# In matrix N, access the element at 4th row and 2nd column.
N[4, 2]
[1] 8
When trying to identify a specific element, the first subscript is the element’s row and the second subscript is the element’s column (always in that order).
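We can also extract an entire row or column by leaving the other index blank. A quick sketch using N:
# Second row of N
N[2, ]
[1]  2  6 10
# Third column of N
N[, 3]
[1]  9 10 11 12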
Addition and subtraction are straightforward operations.
Matrices must have exactly the same dimensions for both of these operations.
We add or subtract each element with the corresponding element from the other matrix.
This is expressed as follows:
\[A \pm B=C\]
\[c_{ij}=a_{ij} \pm b_{ij} \text{ }\forall i,j\]
\[\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} \pm \begin{bmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33} \end{bmatrix}\] \[=\] \[\begin{bmatrix} a_{11}\pm b_{11} & a_{12}\pm b_{12} & a_{13}\pm b_{13}\\ a_{21}\pm b_{21} & a_{22}\pm b_{22} & a_{23}\pm b_{23}\\ a_{31}\pm b_{31} & a_{32}\pm b_{32} & a_{33}\pm b_{33} \end{bmatrix}\]
We start by creating two 2x3 matrices:
\[A= \begin{bmatrix} 3 & -1 & 2 \\ 9 & 4 & 6 \end{bmatrix}\]
# Create two 2x3 matrices.
matrix1 = matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix1
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
And \[B= \begin{bmatrix} 5 & 0 & 3 \\ 2 & 9 & 4 \end{bmatrix}\]
matrix2 = matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
matrix2
[,1] [,2] [,3]
[1,] 5 0 3
[2,] 2 9 4
We can simply use the + and - operators for addition and subtraction:
matrix1 + matrix2
[,1] [,2] [,3]
[1,] 8 -1 5
[2,] 11 13 10
matrix1 - matrix2
[,1] [,2] [,3]
[1,] -2 -1 -1
[2,] 7 -5 2
Scalar multiplication is very intuitive. As we know, a scalar is a single number. We multiply each value in the matrix by the scalar to perform this operation.
Formally, this is expressed as follows: \[A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}\] \[cA = \begin{bmatrix} ca_{11} & ca_{12} & ca_{13}\\ ca_{21} & ca_{22} & ca_{23}\\ ca_{31} & ca_{32} & ca_{33} \end{bmatrix}\]
In R, all we need to do is take an established matrix and multiply it by some scalar:
# matrix1 from our previous example
matrix1
[,1] [,2] [,3]
[1,] 3 -1 2
[2,] 9 4 6
matrix1 * 3
[,1] [,2] [,3]
[1,] 9 -3 6
[2,] 27 12 18
Multiplying matrices is slightly trickier than multiplying scalars.
Two matrices must be conformable for them to be multiplied together. This means that the number of columns in the first matrix equals the number of rows in the second.
When multiplying \(A \times B\), if \(A\) is \(m \times n\), \(B\) must have \(n\) rows.
The conformability requirement never changes. Before multiplying anything, check to make sure the matrices are indeed conformable.
The resulting matrix will have the same number of rows as the first matrix and the same number of columns as the second. For example, if \(A\) is \(i \times k\) and \(B\) is \(k \times j\), then \(A \times B\) will be \(i \times j\).
For example, which of the following can we multiply? What will be the dimensions of the resulting matrix?
\[ \begin{aligned} B_{4 \times 1}= \begin{bmatrix} 2 \\ 3\\ 4\\ 1 \end{bmatrix} M_{3 \times 3} = \begin{bmatrix} 1 & 0 & 2\\ 1 & 2 & 4\\ 2 & 3 & 2 \end{bmatrix} L_{2 \times 3} = \begin{bmatrix} 6 & 5 & -1\\ 1 & 4 & 3 \end{bmatrix} \end{aligned} \]
The only valid multiplication based on the provided matrices is \(L \times M\), which results in a \(2 \times 3\) matrix.
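To confirm this in R, here is a quick sketch that keys in \(L\) and \(M\) by column and checks the dimensions of their product:
L = matrix(c(6, 1, 5, 4, -1, 3), nrow = 2)
M = matrix(c(1, 1, 2, 0, 2, 3, 2, 4, 2), nrow = 3)
dim(L %*% M)
[1] 2 3
# M %*% L, by contrast, fails with a "non-conformable arguments" error.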
Why can’t we multiply in the opposite order?
The non-commutative property of matrix multiplication is a fundamental aspect of matrix algebra. The result is sensitive to the order in which the matrices are multiplied because of the dimensional-compatibility requirement, the dimensions of the resulting matrix, and the computation process itself.
When multiplying matrices, order matters. Even if multiplication is possible in both directions, in general \(AB \neq BA\).
Multiply each row by each column, summing up each pair of multiplied terms. This is sometimes referred to as the “dot product,” where we multiply matching members, then sum up.
The element in position \(ij\) is the sum of the products of elements in the \(i\)th row of the first matrix (\(A\)) and the corresponding elements in the \(j\)th column of the second matrix (\(B\)). \[c_{ij}=\sum_{k=1}^n a_{ik}b_{kj}\]
Suppose a company manufactures two kinds of furniture: chairs and sofas.
A chair costs $100 for wood, $270 for cloth, and $130 for feathers.
Each sofa costs $150 for wood, $420 for cloth, and $195 for feathers.
|          | Chair | Sofa |
|----------|-------|------|
| Wood     | 100   | 150  |
| Cloth    | 270   | 420  |
| Feathers | 130   | 195  |
The same information about unit cost (\(C\)) can be presented as a matrix.
\[C = \begin{bmatrix} 100 & 150\\ 270 & 420\\ 130 & 195 \end{bmatrix}\]
Note that each of the three rows of this 3 x 2 matrix represents a material (wood, cloth, or feathers), and each of the two columns represents a product (chair or sofa). The elements are the unit costs (in USD).
Now, suppose that the company will produce 45 chairs and 30 sofas this month. This production quantity can be represented in the following table, and also as a 2 x 1 matrix (\(Q\)):
| Product | Quantity |
|---------|----------|
| Chair   | 45       |
| Sofa    | 30       |
\[Q = \begin{bmatrix} 45 \\ 30 \end{bmatrix}\]
What will be the company’s total cost? The “total expenditure” is equal to the “unit cost” times the “production quantity” (the number of units).
The total expenditure (\(E\)) for each material this month is calculated by multiplying these two matrices.
\[\begin{aligned} E = CQ = \begin{bmatrix} 100 & 150\\ 270 & 420\\ 130 & 195 \end{bmatrix} \begin{bmatrix} 45 \\ 30 \end{bmatrix} = \begin{bmatrix} (100)(45) + (150)(30) \\ (270)(45) + (420)(30) \\ (130)(45) + (195)(30) \end{bmatrix} = \begin{bmatrix} 9,000 \\ 24,750 \\ 11,700 \end{bmatrix} \end{aligned}\]
Multiplying the 3x2 Cost matrix (\(C\)) times the 2x1 Quantity matrix (\(Q\)) yields the 3x1 Expenditure matrix (\(E\)).
As a result of this matrix multiplication, we determine that this month the company will incur expenditures of $9,000 on wood, $24,750 on cloth, and $11,700 on feathers.
Before attempting matrix multiplication, we must make sure the matrices are conformable (as we do for our manual calculations).
Then we can multiply our matrices together using the %*% operator.
C = matrix(c(100, 270, 130, 150, 420, 195), nrow = 3)
C
[,1] [,2]
[1,] 100 150
[2,] 270 420
[3,] 130 195
Q = matrix(c(45, 30), nrow = 2)
Q
[,1]
[1,] 45
[2,] 30
C %*% Q
[,1]
[1,] 9000
[2,] 24750
[3,] 11700
If you have a missing value or NA in one of the matrices you are trying to multiply (something we will discuss in further detail in the next module), you will have NAs in your resulting matrix.
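As a quick sketch of this behavior, suppose we introduce a missing value into a copy of Q (the name Q_na is just for illustration):
Q_na = Q
Q_na[2, 1] = NA
# Every element of the product that depends on the NA becomes NA
C %*% Q_na
     [,1]
[1,]   NA
[2,]   NA
[3,]   NA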
Addition and subtraction:
Associative: \((A \pm B) \pm C = A \pm (B \pm C)\)
Commutative: \(A \pm B = B \pm A\)
Multiplication:
Not commutative (in general): \(AB \neq BA\)
Associative: \(A(BC) = (AB)C\)
Distributive: \(A(B+C) = AB + AC\)
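Here is a quick numerical check of the multiplication properties in R, a sketch with small square matrices (the names P1, P2, and P3 are just for illustration):
P1 = matrix(1:4, nrow = 2)
P2 = matrix(5:8, nrow = 2)
P3 = matrix(c(2, 0, 1, 3), nrow = 2)
all(P1 %*% P2 == P2 %*% P1)                    # FALSE: not commutative
all(P1 %*% (P2 %*% P3) == (P1 %*% P2) %*% P3)  # TRUE: associative
all(P1 %*% (P2 + P3) == P1 %*% P2 + P1 %*% P3) # TRUE: distributive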
Square matrix
In a square matrix, the number of rows equals the number of columns (\(m=n\)):
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]
The diagonal of a matrix is the set of elements on the line from the upper-left to the lower-right corner of the matrix, as in \(d(A) = [1, 5, 9]\). Diagonals are particularly useful in square matrices.
The trace of a matrix, denoted as \(tr(A)\), is the sum of the diagonal elements of the matrix. \(tr(A) = 1+5+9 = 15\)
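In R, diag() extracts the diagonal, and the trace is simply the sum of that vector. A quick sketch with the square matrix above (stored as A_sq so it does not clash with objects defined later):
A_sq = matrix(1:9, nrow = 3, byrow = TRUE)
diag(A_sq)
[1] 1 5 9
sum(diag(A_sq))
[1] 15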
Diagonal matrix: a square matrix whose off-diagonal elements are all zero.
Scalar matrix: a diagonal matrix whose diagonal elements are all equal, for example:
\[ S = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 7 \end{bmatrix} \]
Identity matrix:
\[ I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
The identity matrix is a scalar matrix with all of the diagonal elements equal to one.
Remember that, as with all diagonal matrices, the off-diagonal elements are equal to zero.
The capital letter \(I\) is reserved for the identity matrix. For convenience, a 3x3 identity matrix can be denoted as \(I_3\).
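In R, diag(n) returns an \(n \times n\) identity matrix:
diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1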
The transpose is the original matrix with the rows and the columns interchanged.
The notation is either \(J'\) (“J prime”) or \(J^T\) (“J transpose”).
\[J = \begin{bmatrix} 4 & 5\\ 3 & 0\\ 7 & -2 \end{bmatrix}\]
\[J' = J^T = \begin{bmatrix} 4 & 3 & 7 \\ 5 & 0 & -2 \end{bmatrix}\]
In R, we use t() to get the transpose.
J = matrix(c(4, 3, 7, 5, 0, -2), ncol = 2)
J
[,1] [,2]
[1,] 4 5
[2,] 3 0
[3,] 7 -2
t(J)
[,1] [,2] [,3]
[1,] 4 3 7
[2,] 5 0 -2
Just like a number has a reciprocal, a matrix has an inverse.
When we multiply a matrix by its inverse we get the identity matrix (which is like “1” for matrices).
\[A \times A^{-1} = I\]
\[AA^{-1} = A^{-1}A = I\]
In a mathematical statistics course, you would be required to know how to calculate the inverse of a matrix by hand. For math camp purposes, we only need to know how to do it in R.
In R, we use the solve() function to calculate the inverse of a matrix:
A = matrix(c(3, 2, 5, 2, 3, 2, 5, 2, 4), ncol = 3)
A
[,1] [,2] [,3]
[1,] 3 2 5
[2,] 2 3 2
[3,] 5 2 4
solve(A)
[,1] [,2] [,3]
[1,] -0.29629630 -0.07407407 0.4074074
[2,] -0.07407407 0.48148148 -0.1481481
[3,] 0.40740741 -0.14814815 -0.1851852
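As a quick sanity check, multiplying A by its inverse should return the identity matrix (up to floating-point rounding):
round(A %*% solve(A), 10)   # prints the 3 x 3 identity matrix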
A system of equations can be represented by an augmented matrix.
System of equations: \[{\color{red}{3}}x + {\color{green}{6}}y = {\color{blue}{12}}\] \[{\color{red}{5}}x + {\color{green}{10}}y = {\color{blue}{25}}\]
In an augmented matrix, each row represents one equation in the system and each column represents a variable or the constant terms. \[\begin{bmatrix} {\color{red}{3}} & {\color{green}{6}} & {\color{blue}{12}}\\ {\color{red}{5}} & {\color{green}{10}} & {\color{blue}{25}} \end{bmatrix}\]
Thinking in matrices is helpful on its own when you need to solve somewhat complex systems of equations. It also turns out that a common social science data analytic tool relies on this principle.
We can use the logic above to calculate estimates for our ordinary least squares (OLS) models. We may not be able to cover this in person, but we leave it here since it may come up in your regression course.
OLS is a linear regression technique used to find the best-fitting line for a set of data points (observations) by minimizing the residuals (the differences between the observed and predicted values).
The goal is to minimize the sum of the squared errors.
Suppose, for example, we have a sample consisting of \(n\) observations.
The outcome variable is denoted as an \(n \times1\) column vector.
\[Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{bmatrix}\]
Suppose there are \(k\) explanatory variables and a constant term, meaning \(k+1\) columns and \(n\) rows.
We can represent these variables as an \(n \times (k+1)\) matrix, expressed as follows:
\[X= \begin{bmatrix} 1 & x_{11} & \dots & x_{1k} \\ 1 & x_{21} & \dots & x_{2k} \\ \vdots & \vdots & \dots & \vdots \\ 1 & x_{n1} & \dots & x_{nk} \end{bmatrix}\]
\(x_{ij}\) is the \(i\)-th observation of the \(j\)-th explanatory variable.
Let’s say we have 173 observations (\(n = 173\)) and 2 explanatory variables (\(k = 2\)).
This can be expressed as the following linear equation: \[y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \epsilon\]
In matrix form, we have: \[\begin{aligned} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{173} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{173,1} & x_{173,2} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_{173} \end{bmatrix}\end{aligned} \]
All 173 equations can be represented by: \[y=X\beta+\epsilon\]
Without getting too much into the mechanics, we can calculate our coefficient estimates with matrix algebra using the following equation:
\[\hat{\beta} = (X'X)^{-1}X'Y\]
Read aloud, we say “X prime X inverse, X prime Y”.
The little hat on our beta (\(\hat{\beta}\)) signifies that these are estimates.
Remember, the OLS method is to choose \(\hat{\beta}\) such that the sum of squared residuals (“SSR”) is minimized.
We will load the mtcars data set for this example, which contains data about many different car models.
cars_df = mtcars
Now, we want to estimate the association between hp (horsepower) and wt (weight), our independent variables, and mpg (miles per gallon), our dependent variable.
First, we transform our dependent variable into a matrix, using the as.matrix() function and specifying the relevant column of the mtcars data set, to create a column vector of our observed values for the DV.
Y = as.matrix(cars_df$mpg)
Y
[,1]
[1,] 21.0
[2,] 21.0
[3,] 22.8
[4,] 21.4
[5,] 18.7
[6,] 18.1
[7,] 14.3
[8,] 24.4
[9,] 22.8
[10,] 19.2
[11,] 17.8
[12,] 16.4
[13,] 17.3
[14,] 15.2
[15,] 10.4
[16,] 10.4
[17,] 14.7
[18,] 32.4
[19,] 30.4
[20,] 33.9
[21,] 21.5
[22,] 15.5
[23,] 15.2
[24,] 13.3
[25,] 19.2
[26,] 27.3
[27,] 26.0
[28,] 30.4
[29,] 15.8
[30,] 19.7
[31,] 15.0
[32,] 21.4
Next, we do the same thing for our independent variables of interest, and our constant.
# create two separate matrices for IVs
X1 = as.matrix(cars_df$hp)
X2 = as.matrix(cars_df$wt)

# create constant column
constant = rep(1, nrow(cars_df))

# bind them altogether into one matrix
X = cbind(constant, X1, X2)
X
constant
[1,] 1 110 2.620
[2,] 1 110 2.875
[3,] 1 93 2.320
[4,] 1 110 3.215
[5,] 1 175 3.440
[6,] 1 105 3.460
[7,] 1 245 3.570
[8,] 1 62 3.190
[9,] 1 95 3.150
[10,] 1 123 3.440
[11,] 1 123 3.440
[12,] 1 180 4.070
[13,] 1 180 3.730
[14,] 1 180 3.780
[15,] 1 205 5.250
[16,] 1 215 5.424
[17,] 1 230 5.345
[18,] 1 66 2.200
[19,] 1 52 1.615
[20,] 1 65 1.835
[21,] 1 97 2.465
[22,] 1 150 3.520
[23,] 1 150 3.435
[24,] 1 245 3.840
[25,] 1 175 3.845
[26,] 1 66 1.935
[27,] 1 91 2.140
[28,] 1 113 1.513
[29,] 1 264 3.170
[30,] 1 175 2.770
[31,] 1 335 3.570
[32,] 1 109 2.780
Next, we calculate \(X'X\), \((X'X)^{-1}\), and \(X'Y\). Remember the %*% operator for matrix multiplication.
# X prime X
XpX = t(X) %*% X

# X prime X inverse
XpXinv = solve(XpX)

# X prime Y
XpY = t(X) %*% Y

# beta coefficient estimates
bhat = XpXinv %*% XpY
bhat
[,1]
constant 37.22727012
-0.03177295
-3.87783074
You can come back to this code and verify that it is correct after you learn how to use R’s built-in function for estimating OLS regressions.
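As a preview, the built-in lm() function should reproduce the same estimates. A minimal sketch:
# Compare the reported coefficients with bhat above
lm(mpg ~ hp + wt, data = cars_df)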