IRIS SPECIES

Silas Liu - Nov. 12, 2021

R, Machine Learning Classification, PCA

The Iris dataset is one of the best-known datasets in machine learning. It was first presented in 1936 by Ronald Fisher as an example for Linear Discriminant Analysis (LDA). It contains 4 features of the flowers (sepal length, sepal width, petal length and petal width) and 3 iris species: setosa, versicolor and virginica.
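
As a quick orientation, the structure described above can be checked directly in R. This is a minimal sketch that assumes the built-in `iris` data frame from base R is the copy of the data being used; the variable name `feature_names` is introduced here only for illustration.

```r
# `iris` is a built-in data frame in base R (datasets package).
data(iris)

# Rows are individual flowers; columns are the 4 features plus the label.
dim(iris)

# The feature columns are everything except the species label.
feature_names <- setdiff(names(iris), "Species")
print(feature_names)

# The three species and how many flowers belong to each.
print(levels(iris$Species))
print(table(iris$Species))
```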

Here we apply some machine learning techniques, such as LDA and Quadratic Discriminant Analysis (QDA), but the focus of this study is Principal Component Analysis (PCA) applied to the dataset to showcase dimension reduction.
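
To make the idea of dimension reduction concrete, here is a minimal sketch of PCA on the four features using base R's `prcomp`. Centering and scaling the features is an assumption made for this illustration, not necessarily the preprocessing used in the rest of the study, and the names `features`, `pca`, `var_explained` and `scores` are hypothetical.

```r
# Base R only; `iris` ships with the datasets package.
features <- iris[, 1:4]

# PCA on centered, unit-variance features (an assumption for this sketch).
pca <- prcomp(features, center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))

# Reduce from 4 dimensions to 2 by keeping the first two components.
scores <- as.data.frame(pca$x[, 1:2])
scores$Species <- iris$Species
print(head(scores))
```

The cumulative variance printed above indicates how much information is retained when only the first few components are kept, which is the usual basis for choosing how many dimensions to keep.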

A complementary analysis of PCA is available in the study of Singular Value Decomposition (SVD) applied to the MNIST digits dataset.

DIMENSION REDUCTION