tSNE

t-distributed stochastic neighbor embedding, more commonly referred to as tSNE, is a technique used to reduce the number of dimensions of a data set. But why is this desired? Effective data visualization can only be done in two or three dimensions, so when there are many variables, as is the case in single cell transcriptomics, the number of axes in a plot has to be reduced to only a few. This is where tSNE comes in. The type of underlying correlations in the data (linear vs non-linear) affects how well a dimensionality reduction algorithm performs. If the structure of the data is non-linear, then linear techniques such as PCA (principal component analysis) will not capture it well in only a few dimensions, whereas tSNE, being a non-linear method, can. The tSNE algorithm works by iteratively adjusting the positions of the points in the low dimensional space so that the distribution of similarities between pairs of points closely matches that in the high dimensional space. Because the result is obtained by optimizing the positions of the points themselves rather than by fitting an explicit function, the same map from high to low dimensionality cannot be reused for different data; new data requires running the algorithm again.
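
The iterative adjustment described above can be illustrated with a small sketch in plain Python. It is a simplified illustration and not a reference implementation: it uses one fixed Gaussian bandwidth (`sigma`) for all points, whereas practical tSNE implementations tune a bandwidth per point from a perplexity setting, and it uses plain gradient descent without the momentum and early exaggeration tricks usually applied. The toy data, the function names and the parameter values are all assumptions made for the example.

```python
import math
import random

def pairwise_sq_dists(points):
    """Squared Euclidean distance between every pair of points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]

def high_dim_affinities(points, sigma=1.0):
    """Symmetric Gaussian similarities P in the original space.
    Simplification: one fixed sigma instead of a per-point perplexity search."""
    n = len(points)
    d = pairwise_sq_dists(points)
    cond = []
    for i in range(n):
        row = [math.exp(-d[i][j] / (2 * sigma ** 2)) if i != j else 0.0 for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

def low_dim_affinities(emb):
    """Student-t similarities Q in the embedding, plus the unnormalized weights."""
    n = len(emb)
    d = pairwise_sq_dists(emb)
    w = [[1.0 / (1.0 + d[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w], w

def kl_divergence(p, q, eps=1e-12):
    """Mismatch between the two similarity distributions (lower is better)."""
    return sum(pij * math.log((pij + eps) / (qij + eps))
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)

def tsne(points, n_iter=500, learning_rate=2.0, seed=0):
    """Move the 2-D points downhill on the KL divergence between P and Q."""
    rng = random.Random(seed)
    n = len(points)
    p = high_dim_affinities(points)
    emb = [[rng.gauss(0, 1e-2), rng.gauss(0, 1e-2)] for _ in range(n)]
    history = []
    for _ in range(n_iter):
        q, w = low_dim_affinities(emb)
        history.append(kl_divergence(p, q))
        new_emb = []
        for i in range(n):
            grad = [0.0, 0.0]
            for j in range(n):
                coeff = 4.0 * (p[i][j] - q[i][j]) * w[i][j]
                grad[0] += coeff * (emb[i][0] - emb[j][0])
                grad[1] += coeff * (emb[i][1] - emb[j][1])
            new_emb.append([emb[i][0] - learning_rate * grad[0],
                            emb[i][1] - learning_rate * grad[1]])
        emb = new_emb
    return emb, history

# Toy data (made up for illustration): two groups of 10 points in 5 dimensions.
data_rng = random.Random(0)
data = [[data_rng.gauss(center, 0.3) for _dim in range(5)]
        for center in (0.0, 3.0) for _point in range(10)]

embedding, history = tsne(data)
print("KL divergence at start:", history[0])
print("KL divergence at end:  ", history[-1])
```

Note that `tsne` returns coordinates for exactly the points it was given and nothing else. There is no fitted function to apply to a new sample, which is why a new data set needs its own run.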

Author: Brian Ladd (ESR5)

data visualization, dimensionality reduction, high dimensional data, statistics, tSNE