PROTRIDER: protein abundance outlier detection from mass spectrometry-based proteomics data with a conditional autoencoder

PROTRIDER is a method for detecting aberrant protein expression from mass spectrometry data that outperforms existing approaches and identifies enrichments of pathogenic variants, supporting its application in rare disease diagnostics and cancer proteomics.

Motivation: Detection of gene regulatory aberrations enhances our ability to interpret the impact of inherited and acquired genetic variation for rare disease diagnostics and tumor characterization. While numerous methods for calling RNA expression outliers from RNA-sequencing data have been proposed, established methods for calling protein expression outliers from mass spectrometry data are lacking.

Results: Here, we propose and assess various modeling approaches to call protein expression outliers across three datasets from rare disease diagnostics and oncology. We use as independent evidence the enrichment for outlier calls in matched RNA-seq samples and the enrichment for rare variants likely disrupting protein expression. We show that controlling for hidden confounders and technical covariates, while simultaneously modeling the occurrence of missing values, is largely beneficial and can be achieved using conditional autoencoders. Moreover, we find that the differences between experimental and fitted log-transformed intensities by such models exhibit heavy tails that are poorly captured with the Gaussian distribution and report stronger statistical calibration when instead using the Student’s t-distribution. Our resulting method, PROTRIDER, outperformed baseline approaches based on Z-scores of raw log-intensities, PCA, and isolation-based anomaly detection with isolation forests. The application of PROTRIDER reveals significant enrichments of AlphaMissense pathogenic variants in protein expression outliers. Overall, PROTRIDER provides a method to confidently identify aberrantly expressed proteins applicable to rare disease diagnostics and cancer proteomics.
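The point about heavy-tailed residuals can be illustrated with a minimal sketch. This is not PROTRIDER's implementation: the function name `two_sided_pvalues`, the per-protein maximum-likelihood fit, and the toy residuals are all assumptions made for illustration, and missing values are simply skipped here rather than modeled. The sketch fits either a Gaussian or a Student's t-distribution to the residuals of one protein and converts each residual into a two-sided p-value.

```python
import numpy as np
from scipy import stats


def two_sided_pvalues(residuals, dist="t"):
    """Two-sided p-values for the residuals of one protein across samples.

    residuals: observed minus fitted log-intensities; NaN marks a missing value.
    dist: "t" for a Student's t fit, "gaussian" for a normal fit.
    Hypothetical helper for illustration only.
    """
    r = np.asarray(residuals, dtype=float)
    observed = r[~np.isnan(r)]  # fit on non-missing entries only
    if dist == "gaussian":
        loc, scale = stats.norm.fit(observed)
        return 2.0 * stats.norm.sf(np.abs((r - loc) / scale))
    # scipy's t.fit returns (df, loc, scale), estimated by maximum likelihood
    df, loc, scale = stats.t.fit(observed)
    return 2.0 * stats.t.sf(np.abs((r - loc) / scale), df)


# Toy residuals (synthetic, not from the paper): heavy-tailed noise,
# one strongly aberrant sample at index 0 and one missing value at index 1.
rng = np.random.default_rng(0)
toy_residuals = 0.2 * rng.standard_t(4, size=300)
toy_residuals[0] = -5.0
toy_residuals[1] = np.nan

p_gauss = two_sided_pvalues(toy_residuals, dist="gaussian")
p_t = two_sided_pvalues(toy_residuals, dist="t")

print("Gaussian p-value of the aberrant sample:", p_gauss[0])
print("Student's t p-value of the aberrant sample:", p_t[0])
```

On heavy-tailed residuals, a Gaussian fit tends to assign extremely small p-values to large deviations, whereas the t fit is more conservative in the tails. The missing entry stays `NaN` in both outputs.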

Availability and implementation: PROTRIDER is freely available at github.com/gagneurlab/PROTRIDER and also available on Zenodo under the DOI zenodo.15569781.


publication

Year of publication

2025

Source

Bioinformatics, Volume 41, Issue 12, December 2025

Author

Daniela Klaproth-Andrade, Ines F Scheller, Georgios Tsitsiridis, Stefan Loipfinger, Christian Mertes, Dmitrii Smirnov, Holger Prokisch, Vicente A Yépez, Julien Gagneur
