DIGIT DIMENSION REDUCTION

Silas Liu - Nov. 12, 2021

R, Machine Learning Classification, PCA

Many machine learning problems involve a large number of predictors, which makes visualization challenging. With that many predictors, a single scatter plot can no longer show the data because of its high dimensionality.

Dimension reduction is a powerful technique for exploratory data analysis. By reducing the dimension of the dataset while preserving its important characteristics, visualization of the data becomes more feasible. This is achieved through Principal Component Analysis (PCA), which is based on the Singular Value Decomposition (SVD).
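
The link between SVD and PCA can be checked on a small example. The sketch below is only an illustration: it assumes base R's prcomp and svd functions and uses simulated data (the matrix x and its size are made up here, not taken from the digit dataset). It compares the principal components from prcomp with the ones built by hand from the SVD of the centered data.

```r
set.seed(1)

# Simulated data (an assumption for illustration): 100 observations, 5 correlated predictors
n <- 100
p <- 5
x <- matrix(rnorm(n * p), n, p) %*% matrix(rnorm(p * p), p, p)

# PCA with base R; prcomp centers the columns by default
pca <- prcomp(x)

# SVD of the column-centered matrix: x_centered = U D V^T
x_centered <- sweep(x, 2, colMeans(x))
s <- svd(x_centered)

# Standard deviations of the components are the singular values divided by sqrt(n - 1)
sdev_from_svd <- s$d / sqrt(n - 1)

# Principal components (scores) are U D; signs of columns are arbitrary, so compare absolute values
scores_from_svd <- s$u %*% diag(s$d)

same_sdev   <- isTRUE(all.equal(sdev_from_svd, pca$sdev))
same_scores <- isTRUE(all.equal(abs(scores_from_svd), abs(unname(pca$x))))
print(c(same_sdev = same_sdev, same_scores = same_scores))
```

The comparison uses absolute values because each component is only defined up to a sign flip.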

The PCA function applies an orthogonal transformation to the data, which preserves the distances between observations, so the total variation remains the same. The resulting principal components are columns ordered by decreasing variability, from most to least. This is what makes principal component analysis useful: most of the variation in the data is summarized in the first few components.
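
These properties can be verified numerically. The following is a minimal sketch, assuming base R's prcomp stands in for the PCA function mentioned above and using simulated data rather than the digits (the matrix x is hypothetical). It checks that pairwise distances and total variance are unchanged by the transformation, and that the component variances come out in decreasing order.

```r
set.seed(1)

# Simulated data (an assumption for illustration): 100 observations, 5 correlated predictors
n <- 100
p <- 5
x <- matrix(rnorm(n * p), n, p) %*% matrix(rnorm(p * p), p, p)

# PCA with base R; columns are centered but not scaled by default
pca <- prcomp(x)

# 1. Distances between observations are the same before and after the transformation
max_dist_diff <- max(abs(as.vector(dist(x)) - as.vector(dist(pca$x))))

# 2. Total variance is the same in the original columns and in the components
total_var_original   <- sum(apply(x, 2, var))
total_var_components <- sum(pca$sdev^2)

# 3. Component variances are sorted from largest to smallest
component_var <- pca$sdev^2

# Cumulative proportion of variance explained by the first k components
cum_prop <- cumsum(component_var) / sum(component_var)

print(max_dist_diff)
print(c(original = total_var_original, components = total_var_components))
print(round(cum_prop, 3))
```

The cumulative proportion in cum_prop is the usual guide for deciding how many components to keep for plotting.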

In a previous analysis, we studied several machine learning algorithms on the MNIST digit dataset.

DIGIT RECOGNITION