SVD: One Matrix Decomposition to Rule Them All

One of the benefits of tweets, when following great people, is that you will encounter tweets that will crush your mind. The last week is just one of those weeks with fantastic and funny tweets.

One of Professor Daniela Witten’s tweets for @WomenInStat. And it begins to explain so well why the decomposition of singular values – in short, the SVD – is an array decomposition that checks all other values.

So yesterday I asked all of you what you wanted to hear from me this week and an answer came: SINGLE PRICE DECOMPOSITION (SVD).https://t.co/9Nyqp2Kr9k.

1/

– Women in statistics and data sciences (@WomenInStat) 21. July 2020.

Still in the #teamSVD camp, here’s a blog post that tries to decompress some tweets with examples of real data using amazing penguin data.

Before we get to the UDS, let’s download the Tidyverse, R-package.

Library (neat)
theme_set(theme_bw(16))

We will use the Palmer penguin data collected by Allison Horst directly from github cmdlinetips.com.

p2data <- https://raw.githubusercontent.com/cmdlinetips/data/master/palmer_penguins.csv

penguins_df <- read_csv(p2data) penguins_df %>% head()
## # A tibble: 6 x 7
## kind of island stream_length_mm jaw_depth_mm flipper_length_… Body_Mass_g Gender
##
## 1 Adelie Torge… 39.1 18.7 181 3750 male
## 2 Adelie Torge… 39.5 17.4 186 3800 FMA…
## Three Adelie Torge… ## 40.3 18 195 3250 FMA…
~ Four Adelie Torge… ~ ~ NA
## 5 Adelie Torge… ## 36.7 19.3 193 3450 fma…
## Six Adelie Torge… ## 39.3 20.6 190 3650 men.

Separate the numerical part of the penguin data from the columns with the penguin information.

penguins_meta <- penguins_df %>%
drop_na() %>%
select_if(is.character)

penguins_meta %>% head()
## ## A tibble: 6 x 3
## Island species Sex
##
## 1 Adelie Torgersen male
## 2 Adelie Torgersen female
## 3 Adelie Torgersen female
## 4 Adelie Torgersen female
## 5 Adelie Torgersen male
## 6 Adelie Torgersen female

We store the numeric variables describing the penguins in a new variable and use them to run the UDS.

penguins <- penguins_df %>%
drop_na() %>%
select_if(is.numeric)

Penguins %>% head()
## # A tibble: 6 x 4
## bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##
## 1 39,1 18.7 181 3750
## 2 39,5 17,4 186 3800
## 3 40,3 18 195 3250
## 4 36,7 19,3 193 3450
## 5 39,3 20,6 190 3650
## 6 38,9 17,8 181 3625

What is the single value decomposition (SVD)?

First, what does the IRS do? You give me a matrix of $n times p$X$ and I’ll give you back 3 matrices $U$, $D$, $V$. Without loss of universality, assume $n Geq p$. Then $U$ is a matrix of p$ multiplied by $n, and $D$ and $V$ have the dimension of $p multiplied by $p.

7/

– Women in statistics and data sciences (@WomenInStat) 21. July 2020.

We can use the svd() function in R to run SVD. We provide the data matrix as input for the svd() function.

# Run UDS with svd()
penguins_svd <- svd(penguins)

What happens if we apply svd() to the data matrix?

$U$, $D$, $V$ dissect the $X$ matrix because $X=UDV^T$. It’s as simple as that. But what makes this decomposition special (and unique) is a special set of $U$, $D$ and $V$ properties.

8/ Picture.twitter.com/TYG4andUMhE

– Women in statistics and data sciences (@WomenInStat) 21. July 2020.

In R we get from the svd() function a list of three objects corresponding to the decomposition of the matrix, the singular values d and the singular vectors u and v.

str(penguins_svd)
## List 3
## $ d : num [1:4] 78250,5 499,9 74 41,8
## $ u : num [1:333, 1:4] -0,048 -0,0486 -0,0416 -0,0442 -0,0467 …
## $ v : num [1:4, 1:4] -0.01022 -0.00389 -0.04657 -0.99886 0.1985 …

Let’s look at the individual values and vectors and understand their properties. For simplicity, we give unit values and vectors a simple name.

The cigar vector U is a matrix of 333 x 4, with the same dimensions as our data.

U <- penguins_svd$u U %>% dim()
## [1] 333 4

And it looks like this.

U %>% Head()

## [.1] [.2] [.3] [.4] ## [1,] -0.04798188 0.01338848 0.004108109 0.07663240
## [2,] -0.04862309 0.01809574 0.014438916 0.03365464
## [3,] -0.04160792 0.08855716 0.025312973 -0.01876370
## [4,] -0.04415929 0.06451464 0.067224723 0.03968621
## [5,] -0.04671088 0.04099607 0.024614580 0.08738297
## [6,] -0.04638621 0.02499507 0.006737752 0.04717352.

In the same way the individual vectors V have the dimension 4 x 4.

Q <- Penguins_svd$v

And this is what the different V-vectors look like.

V %>% headroom()
## [,1] [,2] [,3] [,4] ## [1,] -0,0102185 0,19849831 -0,975661291 -0,092623187
## [2,] -0,003891394 0,14068499 -0.065122188 0,987902673
## [3,] -0,046569777 0,96877576 0,209389404 -0,124341735
## [4,] -0,998855196 -0,04774608 0,000472226 0,002896004

And finally, the unit D-values we get are a vector of length 4.

D <- penguins_svd$d
D
## [1] 78250.54165 499.88402 74.04578 41.84098

What is so special about these unusual matrices?

The only U and V matrices are orthogonal. This means that if you multiply U with the transposition of U, you get an identity matrix with units along the diagonal and zeros elsewhere. This also applies to the single matrix V.

http://31.220.61.170/wp-content/uploads/2020/10/SVD-One-Matrix-Decomposition-to-Rule-Them-All.png

orthogonal_matrix_of_vd

Let’s check the orthogonal properties of the single matrices with the data of the penguins.

t(U) %*% U

We see that the diagonal elements are units and that the elements outside the diagonal are quite close to zero.

## [,1] [,2] [,3] [,4] ## [1,] 1.000000e+00 2.038300e-16 3.816392e-17 6.126555e-16
## [2,] 2.038300e-16 1.000000e+00 2.289835e-16 2.167862e-16
## [3,] 3.816392e-17 2.289835e-16 1.000000e+00 -1.156844e-16
## [4,] 6.126555e-16 2.167862e-16 -1.156844e-16 1.000000e+00.

And this also applies to the single Matrix V.

t(V) %*% V
## [,1] [,2] [,3] [,4] ## [1,] 1.000000e+00 -4.857226e-17 1.919038e-17 -3.473784e-16
## [2,] -4.857226e-17 1.000000e+00 2.058629e-17 5.583641e-18
## [3,] 1.919038e-17 2.058629e-17 1.000000e +00 -1.406308e-17
## [4,] -3.47373784e-16 5.583641e-18 -1.406308e-17 1.000000e +00 2.

This also applies if you change the order of multiplication.

V %*% t(V)
## [,1] [,2] [,3] [,4] ## [1,] 1,000000e+00 5,551115e-17 2,255141e-17 -6,179952e-18
## [2,] 5,551115e-17 1,000000e+00 8.326673e-17 3.486794e-16
## [3,] 2.255141e-17 8.326673e-17 1.000000e+00 -1.534146e-17
## [4,] -6.179952e-18 3.486794e-16 -1.534146e-17 1.000000e+00 8.

Another peculiarity of the individual vectors is that they are orthonormal. This means that the square elements in each U column are added to 1. And the inner product (dot product) between any pair of columns in U is 0. And the internal product between each pair of V-Loudspeakers is 0. The same goes for the V-shaped speakers.

http://31.220.61.170/wp-content/uploads/2020/10/1604113822_665_SVD-One-Matrix-Decomposition-to-Rule-Them-All.png

svd_singular_vector

First check that the square elements in each column 1 are equal to U and V.

columnsSums(U^2)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

columnSums (V^2)
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0

We see that the sum of the squares of each column of U and V equals 1.

We now want to check whether the column vectors are orthogonal, i.e. whether the inner product of the U-column pair (and of the V-columns) is equal to zero.

We see that the vectors of the columns U and V are orthogonal.

Total(V[.1]*V[.4])
## [1] -3,474868e-16

total(U[,1]*U[,2])
## [1] 1,715343e-16

Since both orthogonal and normal properties apply to U and V, we say that the individual vectors are orthonormal.

Recovery of source data using multiple vectorssingle

A common application of the SVD is data compression or sizing. The original data matrix with larger dimensions can be approximated with one or more single vectors.

http://31.220.61.170/wp-content/uploads/2020/10/1604113822_128_SVD-One-Matrix-Decomposition-to-Rule-Them-All.png

SVD_Rank_1_Approximation

Let’s reconstruct an approximate variant of the output data on a column vector of U and V and the first single value. We’re getting a first-order approach to the data.

D[1] * (U[,1] %*% t(V[,1]) %>% head()## [,1] [,2] [,3] [,4]## [1,] 38.36528 14.61066 174.8513 3750.310## [2,] 38.87798 14.80591 177.1879 3800.427## [3,] 33.26880 12.66977 151.6239 3252.115 1ST9 ## [4,] 35.30882 13.44667 160.9213 3451.533 1F33## [5,] 37.34901 14.22364 170.2196 3650.967## [6,] 37.08941 14.1247 169.0365 3625.591

Let’s have some top rows of about 1 date. Check the areas with the original data. And we see that the Grade 1 approach is pretty close to the original data.

Penguins %>% head()
## # A tibble: 6 x 4
## bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##
## 1 39,1 18.7 181 3750
## 2 39,5 17,4 186 3800
## 3 40,3 18 195 3250
## 4 36,7 19,3 193 3450
## 5 39,3 20,6 190 3650
## 6 38,9 17,8 181 3625

We can improve our reconstruction capacity by using more single vectors.

http://31.220.61.170/wp-content/uploads/2020/10/1604113828_149_SVD-One-Matrix-Decomposition-to-Rule-Them-All.png

SVD_Rank_2_Approximation

In this case we calculate the approximation of the data matrix row for the singular vectors and singular values of the first column.

((D[1]] * (U[,1] %*% t(V[,1])))+ (D[2] * (U[,2] %*% t(V[,2]))))%>% head())

## [,1] [,2] [,3] [,4] ## [1,] 39.69377 15.55222 181.3350 3749.991
## [2,] 40.67355 16.07852 185.9512 3799.995
## [3,] 42.05598 18.89765 194.5099 3250.001
## [4,] 41.71036 17.98374 192.1642 3449.993
## [5,] 41.41689 17.10673 190.0730 3649.989
## [6,] 39.56958 15.88258 181.1410 3624.994

Compared to the original penguin data, this is a very good approach.

Calculation of the CPA of SVD

The analysis of the basic components is only one of the specialties of the SVD. Applying the SVD to a data matrix with centred columns results in a PCA. V Columns – PC boot drives. The U columns are vectors of PC notation.

APC is a special case of SVD

APC is a special case of SVD

Let’s all send the SVD data to the penguin’s center. By centring we place all variables/columns in the data on the same scale. Therefore, a variable with a large bandwidth will not dominate.

# average alignment in the middle of columns
data4pca <- application(penguins,2,function(x)})
# application SVD
pca_by_svd <- svd(data4pca) # look at V singular vectors pca_by_svd$v %>% header()
## [,1] [,2] [,3] [,4] ## [1,] 0.004003162 -0,3192773 -0,94126475 -0,1098470736
## [2,] -0,001154327 0,08684753 -0,14449479 0,9856862728
## [3,] 0.015194547 -0.94354238 0.30518986 0.1278908060
## [4,] 0.999875876 0.01571702 -0.00103611 -0.0003657482

PCA can also be applied to penguin data with the prcomp() function. Note that the rotation matrix of the PCA is the same as the single V-vectors of the UDS on the central center data.

pca_obj <- prcomp(penguins) pca_obj$rotation %>% header()
## PC1 PC2 PC3 PC4
## bill_length_mm 0,004003162 -0,3192773 -0,94126475 -0,1098470736
## bill_depth_mm -0.001154327 0,08684753 -0,14449479 0,9856862728
## flipper_length_mm 0,015194547 -0,94354238 0,30518986 0,1278908060
## body_mass_g 0,999875876 0,01571702 -0,00103611 -0,0003657482

t(as.matrix(penguins)) %*% as.matrix(penguins)
## beak_length_mm beak_depth_mm flipper_length_mm body_mass_g
## beak_length_mm 654405.7 250640.99 2960705 62493450
## beak_depth_mm 250641.0 99400,11 1143413 23798630
## flipper_length_mm 2960705,0 1143412,60 13514330 284815600
## body_massa_g 62493450,0 23798630,00 284815600 6109136250

The SVD from zero: Pair of individual vectors simultaneously

The most interesting thing is that it is relatively easy to write a function from scratch to make the SVD work. We can write a function that calculates a few U and V using an iterative approach. The idea is to iteratively calculate the first unit vectors of U and V from the data, then remove the grade 1 approach from the data and apply the approach to calculate the second unit of the vectors U and V.

Introduction of the SVD from zero

Introduction of the SVD from zero

Here is function R, which calculates the first single SVD vectors from zero.

svd_from_scratch <- function(data){
# initialize u and v with random numbers
u <- rnorm(nrow(data))
v <- rnorm(ncol(data))
for (iteration on 1 :500){
u <- data %*% v
u <- u/sqrt(sum(u^2)))
v <- t(data) %*% u
v <- v / sqrt(sum(v^2)))
}
yield(list(u=u,v=v))
}

svdx <- svd_from_scratch(as.matrix(penguins))

hn(U[,1],svdx$u)
## [,1] ## [1,] -1
hn(V[,1],svdx$v)
## [,1] ## [1,] -1.

If you like the tweets of Professor Daniela Witten, you will love the introduction to the statistical training with R in collaboration with Professor Witten. And it’s available for free on the Internet.

And I’m looking forward to a second edition of the book.

If you like the first song of #islr, you’re gonna love it? Our second edition? …soon !!!! https://t.co/9w1DSYJAgH

– Daniela Witten (@daniela_witten) 16 May 2020

Related Tags: