This is a tidyverse extension for handling, manipulating, and visualizing ordination models with consistent conventions and in a tidy workflow.
This package is designed to integrate ordination analysis and biplot visualization into a tidyverse workflow. It is inspired in particular by the extensions ggbiplot and tidygraph.
The package consists of several modules:
the 'tbl_ord' class, a wrapper for various ordination object classes
extracting augmentation for the factors of an ordination
using dplyr-style verbs to add annotation to the factors
adjusting how inertia is conferred on the factors
methods of the above generics for several widely-used object classes
convenient formatting of ordination objects
ggbiplot(), a ggplot2 extension for rendering biplots
Ordination encompasses a variety of techniques for data compression, dimension reduction, feature extraction, and visualization. Well-known ordination techniques are predominantly unsupervised and include principal components analysis, multidimensional scaling, and correspondence analysis (Podani, 2000, Chapter 7; Palmer, n.d.). These methods are theoretically grounded in geometric data analysis (Le Roux & Rouanet, 2004) and powered by the matrix factorizations described below. A variety of other techniques may also be viewed as ordination, or treated using the same tools, including linear regression, linear discriminant analysis, k-means clustering, and non-negative matrix factorization.
Biplots are two-layered scatterplots widely used to visualize unsupervised SVD-based ordinations. Gabriel (1971) introduced biplots to represent the scores and loadings of PCA on a single set of axes. They have also been used to visualize generalized linear regression and linear discriminant analysis (Greenacre, 2010) and can be adapted to any 2-factor matrix decomposition.
The most popular ordination techniques use singular value decomposition (SVD) to factor a data matrix \(X\) into a product \(X=UDV'\) of two orthogonal (rotation) matrices \(U\) and \(V\) and a diagonal (scaling) matrix \(D\), with \(V'\) the transpose of \(V\). In most cases, the data matrix \(X\) is transformed from an original data matrix, e.g. by centering, scaling, double-centering, or log-transforming. The SVD introduces a set of shared orthogonal coordinates in which \(U\) encodes the rows of \(X\) and \(V\) encodes the columns of \(X\). The singular values in \(D\) measure the spread of \(X\) along each of these coordinates (their squares are the sums of squares of \(X\) along them, and so are proportional to the variances when \(X\) is centered). They proceed in decreasing order, so that the first \(r\) (for "rank") columns of \(U\) and of \(V\) produce a geometrically optimized approximation to \(X\).
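As a brief worked statement of this approximation (a sketch in the notation above; the subscripted symbols \(X_r\), \(U_r\), \(D_r\), \(V_r\), the singular values \(d_k\), and the singular vectors \(u_k\), \(v_k\) are labels introduced here for illustration), keeping only the first \(r\) singular values and the corresponding columns of \(U\) and \(V\) gives

\[
X_r = U_r D_r V_r' = \sum_{k=1}^{r} d_k \, u_k v_k' ,
\qquad
\| X - X_r \|_F^2 = \sum_{k > r} d_k^2 .
\]

By the Eckart-Young theorem, no matrix of rank at most \(r\) is closer to \(X\) in the Frobenius norm, which is presumably the sense in which the approximation is geometrically optimized. The discarded squared singular values measure exactly how much of \(X\) the first \(r\) coordinates fail to capture.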
Biplots of SVD-based ordinations usually plot the rows and columns of \(X\) on these \(r\) coordinate axes. For an SVD-based biplot to be truly geometric, the total variance contained in \(D\) must be conferred onto \(U\) or \(V\), or distributed over both (Orlov, 2015). When \(D\) is conferred onto \(U\), the rows of \(X\) are represented by the rows of \(UD\), and their distances in the biplot approximate their distances in the original column space of \(X\). Meanwhile, the columns of \(X\) are represented by the rows of \(V\). These are unit vectors in the full space of shared coordinates, so their squared lengths in the biplot indicate the proportion of their variance captured by the biplot axes and their cosines with each other approximate the correlations between the columns. Finally, the projection of a row's coordinates (point) onto a column's coordinates (vector) approximates the corresponding entry of \(X\).
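The relationships in the preceding paragraph can be checked numerically on a toy example. The following is a minimal sketch in plain Python, not the package's own R interface. The helper names (`matmul`, `transpose`, `distance`, `confer`, `reconstruction_error`) are hypothetical. The exponent `p` is an assumed way of parameterizing how \(D\) is split, with \(D^p\) going to \(U\) and \(D^{1-p}\) to \(V\). The factors are built by hand rather than computed by an SVD routine, and the example is full rank (\(r = 2\)), so the "approximations" here are exact.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def confer(u, d, v, p):
    """Hypothetical helper: return row coordinates U D^p and column
    coordinates V D^(1-p) for singular values d (a list)."""
    rows = [[x * dk ** p for x, dk in zip(row, d)] for row in u]
    cols = [[x * dk ** (1 - p) for x, dk in zip(row, d)] for row in v]
    return rows, cols

def reconstruction_error(rows, cols, x):
    """Largest absolute gap between (rows times cols-transposed) and x."""
    approx = matmul(rows, transpose(cols))
    return max(abs(a - b) for ra, rb in zip(approx, x) for a, b in zip(ra, rb))

# Hand-built SVD factors of a 4 x 2 matrix (assumed values for illustration).
U = [[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]]  # orthonormal columns
d = [4.0, 2.0]                                            # singular values, decreasing
V = [[0.6, -0.8], [0.8, 0.6]]                             # orthogonal matrix

# X = U D V'
X = matmul([[x * dk for x, dk in zip(row, d)] for row in U], transpose(V))

# However D is split between the factors, their inner products recover X.
for p in (1.0, 0.5, 0.0):
    rows, cols = confer(U, d, V, p)
    print("p =", p, "max reconstruction error:", reconstruction_error(rows, cols, X))

# With D conferred entirely onto U (p = 1), distances between row points
# match distances between the rows of X, and the column vectors have unit length.
rows, cols = confer(U, d, V, 1.0)
print("row distance in biplot coordinates:", distance(rows[0], rows[1]))
print("row distance in X:", distance(X[0], X[1]))
print("column vector lengths:", [distance(c, [0.0, 0.0]) for c in cols])
```

The choice of `p` changes which factor carries the geometry of the data (row distances when `p = 1`) but leaves the inner products, and hence the recovered entries of \(X\), unchanged.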
Podani J (2000) "Ordination". Introduction to the Exploration of Multivariate Biological Data, Chapter 7, 215--284. Backhuys Publishers, ISBN: 90-5782-067-6. https://web.archive.org/web/20200221000313/http://ramet.elte.hu/~podani/books.html
Palmer M (n.d.) Ordination Methods for Ecologists. Website, accessed 2019-07-12. http://ordination.okstate.edu/
Le Roux B & Rouanet H (2004) Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis. Springer Dordrecht, ISBN: 978-1-4020-2236-4. doi: 10.1007/1-4020-2236-0 https://link.springer.com/book/10.1007/1-4020-2236-0
Gabriel KR (1971) "The biplot graphic display of matrices with application to principal component analysis". Biometrika 58(3), 453--467. doi: 10.1093/biomet/58.3.453
Greenacre MJ (2010) Biplots in Practice. Fundacion BBVA, ISBN: 978-84-923846. https://www.fbbva.es/microsite/multivariate-statistics/biplots.html
Orlov K (2015) Answer to "PCA and Correspondence analysis in their relation to Biplot". CrossValidated, accessed 2019-07-12. https://stats.stackexchange.com/a/141755/68743