Genomics data analysis requires a normalization of feature counts that stabilizes technical variance, accounts for variable sequencing depth across cells, and preserves the ordering (monotonicity) of feature abundances within each cell. We show that normalization via an optimal variance-stabilizing transform for negative binomial count data followed by a proportional fitting step (PFlog) is the only feature-relabeling-equivariant method that satisfies all three desiderata. We demonstrate that this method, which is equivalent to a shifted centered log-ratio transform, outperforms other normalizations on numerous benchmarks across hundreds of single-cell RNA-seq datasets. We further show that both the shifted-log scale and the centered log-ratio geometry are important for preserving PCA and k-NN structure.
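The two-step pipeline described above, a shifted-log transform followed by proportional fitting, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Cells are rows and features are columns. A fixed pseudocount `shift=1.0` (i.e. `log1p`) stands in for the optimal negative binomial variance-stabilizing transform. Proportional fitting is taken to mean rescaling each cell so that its transformed values sum to a common target, assumed here to default to the mean cell total. The function names `shifted_log`, `proportional_fit`, and `pflog` are hypothetical.

```python
from typing import Optional

import numpy as np


def shifted_log(counts: np.ndarray, shift: float = 1.0) -> np.ndarray:
    """Elementwise log(x + shift).

    Assumption: a fixed shift stands in for the NB variance-stabilizing
    transform; shift=1.0 reproduces log1p. With shift >= 1 every output
    is nonnegative, so the scale factors below are never negative.
    """
    return np.log(counts + shift)


def proportional_fit(values: np.ndarray, target: Optional[float] = None) -> np.ndarray:
    """Rescale each row (cell) so that it sums to a common target.

    Assumption: the target defaults to the mean row total. Rows summing
    to zero (cells with no counts) are left unchanged to avoid dividing
    by zero.
    """
    totals = values.sum(axis=1, keepdims=True)
    if target is None:
        target = float(totals.mean())
    # Per-cell scale factor target / total, set to 1 where the total is 0.
    scale = np.divide(
        target,
        totals,
        out=np.ones_like(totals, dtype=float),
        where=totals != 0,
    )
    return values * scale


def pflog(counts, shift: float = 1.0) -> np.ndarray:
    """Hypothetical PFlog sketch: shifted log, then proportional fitting."""
    return proportional_fit(shifted_log(np.asarray(counts, dtype=float), shift))


counts = np.array([[0, 3, 10], [5, 0, 1], [2, 2, 2]])
normalized = pflog(counts)
print(normalized.sum(axis=1))  # every cell now has the same total
```

Both steps act on each cell as a whole and treat all features alike. As a result, permuting the feature columns before normalizing gives the same result as permuting afterwards, which is the relabeling equivariance named in the abstract. Because the log is increasing and each cell is multiplied by a positive factor, the order of feature abundances within a cell is also preserved.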
Booeshaghi, A. S. et al. · CC-BY 4.0