Integrate Partially-Matched Multi-Omics Data with the MiniMax Statistic

Slides: https://gabriel.quarto.pub/jsm_2022_minimax/

Gabriel J. Odom, Antonio Colaprico, Tiago Silva, Steven Chen, and Lily Wang

1 August 2022

Motivation

  • You have multi-omics data related to the same disease (e.g., copy number variation, proteomics, or RNA-seq)
  • Few, if any, of the samples are shared across all data types
  • For some data types, you may only have gene-level summaries

Sample Overlap for TCGA COADREAD Data

We can’t join on samples. We can’t join on genes. How can we join these data sets without losing lots of information?

Join on Biological Pathways!

A biological pathway is a group of genes that work together to carry out or regulate a biological process.

“A biological pathway is a series of actions among molecules in a cell that leads to a certain product or a change in the cell. It can trigger the assembly of new molecules, such as a fat or protein, turn genes on and off, or spur a cell to move.” - NHGRI

The Algorithm

  1. Choose a collection of pathways, \(\mathbb{K}\), that overlap as little as possible (e.g., via the Greedy Set Cover algorithm)
  2. For each pathway \(k \in \mathbb{K}\) and each -omics data source \(1, \ldots, G\), calculate an appropriate pathway enrichment \(p\)-value, \(p_{kg}\)
  3. The MiniMax statistic for pathway \(k\) is the Minimum of all pairwise Maxima of \(p_{k1}, p_{k2}, \ldots, p_{kG}\)
  4. Algebraically, the MiniMax is equivalent to the second order statistic of \(p_{k\cdot}\), denoted \(p^{[2]}_k\)
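The steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: `greedy_set_cover` is the textbook greedy heuristic (the paper's exact pathway-selection procedure may differ), the step 2 enrichment p-values are assumed to be already computed, and all function names, pathway names, genes, and p-values are hypothetical. Step 4 holds because every pair of sources contains at least one p-value that is not the smallest, so every pairwise maximum is at least \(p^{[2]}_k\), and the pair formed by the two smallest p-values attains exactly \(p^{[2]}_k\).

```python
from itertools import combinations


def greedy_set_cover(pathways: dict[str, set[str]]) -> list[str]:
    """Greedy Set Cover heuristic: repeatedly keep the pathway that covers
    the most not-yet-covered genes until every gene in the union is covered."""
    uncovered = set().union(*pathways.values())
    chosen: list[str] = []
    while uncovered:
        # Largest gain in newly covered genes; sorting makes ties deterministic.
        best = max(sorted(pathways), key=lambda k: len(pathways[k] & uncovered))
        chosen.append(best)
        uncovered -= pathways[best]
    return chosen


def minimax_pairwise(pvals: list[float]) -> float:
    """Step 3: the minimum, over all pairs of -omics sources, of max(p_i, p_j)."""
    if len(pvals) < 2:
        raise ValueError("need p-values from at least two -omics sources")
    return min(max(a, b) for a, b in combinations(pvals, 2))


def minimax_order_stat(pvals: list[float]) -> float:
    """Step 4: the same quantity read off as the second smallest p-value."""
    if len(pvals) < 2:
        raise ValueError("need p-values from at least two -omics sources")
    return sorted(pvals)[1]


# Hypothetical toy inputs (not from the paper).
pathways = {
    "A": {"g1", "g2", "g3"},
    "B": {"g3", "g4"},
    "C": {"g4", "g5"},
    "D": {"g1", "g5"},
}
print(greedy_set_cover(pathways))  # ['A', 'C']

p_k = [0.30, 0.004, 0.02, 0.45]  # one pathway, G = 4 sources
print(minimax_pairwise(p_k), minimax_order_stat(p_k))  # 0.02 0.02
```

Sorting once is \(O(G \log G)\) versus \(O(G^2)\) pairs, which is why the order-statistic form is the convenient one in practice.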

Example: Find the MiniMax Statistic

Top pathways by MiniMax for colorectal adenocarcinoma; the smallest p-value for each pathway is in light blue, and the p-value that yields the MiniMax test statistic (the second smallest) is in dark blue.

Distribution of the MiniMax Statistic

  • We now have a test statistic, but what is the null distribution?
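  • As long as your -omics data sets do not share too many samples (so that, under the null hypothesis, the \(G\) p-values are approximately independent and uniform on \((0, 1)\)), \[ p^{[2]}_k \sim \mathcal{B}(\alpha = 2, \beta = G - 1), \] where \(\mathcal{B}\) is the Beta distribution.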
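Under that independence assumption, the null CDF has a closed form: \(p^{[2]}_k \le x\) exactly when at least two of the \(G\) p-values are \(\le x\), so \(P(p^{[2]}_k \le x) = 1 - (1 - x)^G - G\,x\,(1 - x)^{G - 1}\), which is the \(\mathcal{B}(2, G - 1)\) CDF. The sketch below (hypothetical function names, standard library only) turns an observed MiniMax into a pathway-level p-value and checks the formula against simulated independent uniform p-values; `scipy.stats.beta.cdf(x, 2, G - 1)` should give the same numbers.

```python
import random


def minimax_null_cdf(x: float, G: int) -> float:
    """CDF of Beta(2, G - 1), the null distribution of the second order
    statistic of G independent Uniform(0, 1) p-values.

    P(p[2] <= x) = P(at least two of the G p-values are <= x)
                 = 1 - (1 - x)^G - G * x * (1 - x)^(G - 1)
    """
    if G < 2:
        raise ValueError("G must be at least 2")
    x = min(max(x, 0.0), 1.0)
    return 1.0 - (1.0 - x) ** G - G * x * (1.0 - x) ** (G - 1)


def minimax_pvalue(pvals: list[float]) -> float:
    """Pathway-level p-value: small MiniMax values are evidence against the
    null, so the p-value is the null CDF at the observed MiniMax."""
    return minimax_null_cdf(sorted(pvals)[1], len(pvals))


def simulate_second_order_stat(G: int, n: int, seed: int = 1) -> list[float]:
    """Monte Carlo draws of p[2] when all G p-values are independent U(0, 1)."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(G))[1] for _ in range(n)]


# Hypothetical p-values for one pathway across G = 4 sources.
print(round(minimax_pvalue([0.30, 0.004, 0.02, 0.45]), 6))  # 0.002336
```

For \(G = 4\) the null is \(\mathcal{B}(2, 3)\), with density \(12\,x\,(1 - x)^2\) and mean \(2/5\).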

The MiniMax Distribution, \(G = 4\)

We performed extensive simulations to explore the effects of correlation between \(p_{ki}\) and \(p_{kj}\). In most common use cases, the test size of the MiniMax statistic was \(\le 0.06\).
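The kind of size check described above can be sketched as follows. This is not the authors' simulation design: the equicorrelated Gaussian construction (z-scores \(z_g = \sqrt{\rho}\,u + \sqrt{1 - \rho}\,e_g\) turned into two-sided p-values), the parameter values, and every function name are assumptions for illustration. Each null replicate is rejected when the MiniMax falls below the \(\mathcal{B}(2, G - 1)\) critical value, and the empirical test size is the rejection rate.

```python
import math
import random


def beta2_cdf(x: float, G: int) -> float:
    """CDF of Beta(2, G - 1): P(at least two of G iid U(0, 1) are <= x)."""
    return 1.0 - (1.0 - x) ** G - G * x * (1.0 - x) ** (G - 1)


def critical_value(alpha: float, G: int) -> float:
    """Solve beta2_cdf(c, G) = alpha by bisection (the CDF is increasing)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if beta2_cdf(mid, G) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value 2 * (1 - Phi(|z|)), written via erfc."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_size(G: int, rho: float, alpha: float = 0.05,
                   n_sim: int = 20_000, seed: int = 2022) -> float:
    """Rejection rate of the MiniMax test under the null when the G sources
    share an equicorrelated latent signal with correlation rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)
    crit = critical_value(alpha, G)
    rejections = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)
        pvals = sorted(
            two_sided_p(math.sqrt(rho) * shared
                        + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
            for _ in range(G)
        )
        if pvals[1] <= crit:  # pvals[1] is the MiniMax statistic
            rejections += 1
    return rejections / n_sim


for rho in (0.0, 0.3, 1.0):
    print(rho, empirical_size(4, rho))
```

With \(\rho = 0\) the rejection rate should sit near the nominal 0.05. With \(\rho = 1\) every source returns the same p-value, so the size equals the critical value itself (roughly 0.098 for \(G = 4\)), which shows why heavy sample overlap between data sets undermines the Beta null.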

Our Paper

https://doi.org/10.3389/fgene.2021.783713

Thank you!