WitChi: Efficient Detection and Pruning of Compositional Bias in Phylogenomic Alignments Using Empirical Chi-Squared Testing

Convergent evolution, where unrelated taxa independently evolve similar nucleotide or amino acid compositions, can introduce compositional bias into biological sequence data. Such biases distort phylogenetic inference, particularly in deep or unevenly sampled phylogenomic datasets. While composition-aware models can mitigate this issue, their computational demands often preclude their use in large-scale analyses. We present WitChi, a computationally efficient tool for identifying and removing compositionally biased alignment columns using empirical significance testing. WitChi calculates taxon-specific chi-squared (χ²) scores and compares them to null distributions derived from permutations within alignment columns that preserve the phylogenetic structure of the alignment. Sites most responsible for deviation from the expected null are iteratively pruned using one of three scoring algorithms until the bias is no longer statistically detectable. Z-scores and p-values are provided for both taxa and alignments, offering interpretable metrics of the magnitude of compositional bias. Pruning of simulated compositionally heterogeneous alignments shows that WitChi reliably restores correct topologies under standard, compositionally stationary models. In benchmarks, WitChi outperforms BMGE's stationary-based trimming while scaling linearly with the number of taxa. Applied to the archaeal GTDB r220 dataset (5,869 taxa; 10,101 sites), WitChi completes pruning in under one hour on four CPU cores. The resulting phylogeny recovers key clades previously resolved only by in-depth analyses using complex models of sequence evolution. WitChi provides an efficient, scalable solution for detecting and removing compositional bias in phylogenomic datasets comprising thousands to tens of thousands of taxa, enabling more accurate phylogenetic inference across the tree of life.
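
The testing and pruning idea summarized above can be sketched in a few lines of Python. This is a minimal, hedged illustration, not WitChi's code, and every function name in it is hypothetical. It assumes that the taxon-specific score is the usual Pearson chi-squared of a taxon's residue counts against expected counts from the alignment-wide composition. It also assumes that null replicates shuffle residues among taxa independently within each column (keeping each column's composition fixed), and that gaps are the symbols in the hypothetical GAP_CHARS set. The greedy rule in prune() is a stand-in, because the abstract does not describe WitChi's three scoring algorithms.

```python
import random
import statistics
from collections import Counter

# Assumption: these symbols are treated as missing data and ignored in counts.
GAP_CHARS = frozenset("-?X")


def taxon_chi2(alignment):
    """Chi-squared score per taxon: its residue counts versus expected counts
    n_i * (overall frequency of each state) from the whole alignment."""
    counts = [Counter(ch for ch in seq if ch not in GAP_CHARS) for seq in alignment]
    total = Counter()
    for taxon_counts in counts:
        total.update(taxon_counts)
    n_all = sum(total.values())
    if n_all == 0:
        return [0.0] * len(alignment)
    scores = []
    for taxon_counts in counts:
        n_i = sum(taxon_counts.values())
        chi2 = 0.0
        for state in sorted(total):  # fixed order keeps float sums reproducible
            expected = n_i * total[state] / n_all
            if expected > 0:
                chi2 += (taxon_counts[state] - expected) ** 2 / expected
        scores.append(chi2)
    return scores


def permute_columns(alignment, rng):
    """Null replicate: shuffle residues among taxa independently in each
    column, so every column keeps its exact composition."""
    if not alignment or not alignment[0]:
        return list(alignment)
    columns = [list(col) for col in zip(*alignment)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


def _z_and_p(observed, null):
    """Z-score and +1-corrected empirical p-value of one observed statistic."""
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mu) / sd if sd > 0 else 0.0
    # Small tolerance so ties that differ only by float rounding count as
    # "at least as extreme" as the observed value.
    hits = sum(s >= observed - 1e-9 for s in null)
    return z, (1 + hits) / (1 + len(null))


def empirical_test(alignment, n_perm=200, seed=0):
    """Return ([(z, p) per taxon], (z, p) for the summed alignment statistic)."""
    rng = random.Random(seed)
    observed = taxon_chi2(alignment)
    null = [taxon_chi2(permute_columns(alignment, rng)) for _ in range(n_perm)]
    per_taxon = [_z_and_p(observed[i], [row[i] for row in null])
                 for i in range(len(observed))]
    whole = _z_and_p(sum(observed), [sum(row) for row in null])
    return per_taxon, whole


def prune(alignment, alpha=0.05, n_perm=200, seed=0):
    """Drop columns until the alignment-level p-value is no longer below alpha.

    Hypothetical greedy rule: remove the column whose deletion lowers the
    summed chi-squared the most. Returns the pruned alignment and the
    original indices of the removed columns.
    """
    aln = list(alignment)
    kept = list(range(len(aln[0])))
    removed = []
    while len(kept) > 1:
        _, (_, p) = empirical_test(aln, n_perm, seed)
        if p >= alpha:
            break
        best = min(range(len(kept)),
                   key=lambda j: sum(taxon_chi2([s[:j] + s[j + 1:] for s in aln])))
        removed.append(kept.pop(best))
        aln = [s[:best] + s[best + 1:] for s in aln]
    return aln, removed
```

For example, consider an alignment in which two taxa carry G and two carry A at every fourth site, with all other sites constant. It gives a small alignment-level p-value, and prune() removes only those biased columns until the test is no longer significant. Recomputing the statistic for every candidate column, and rerunning all permutations after each removal, keeps the sketch short but is far slower than a production tool would need to be.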

Koestlbacher, S. et al. · CC-BY 4.0