Bioinformatics Paper | Wei Zhang et al. | University of Miami

An Integrative Multi-Omics Random Forest Framework (MRF-IMD)

Robust biomarker discovery via Inverse Minimal Depth importance.

Executive Summary

Modern high-throughput technologies generate large volumes of multi-omics data. However, identifying biomarkers shared across omics layers is challenging when the relationships between layers are nonlinear. This paper presents MRF-IMD, an unsupervised multivariate random forest framework that uses a novel metric, Inverse Minimal Depth (IMD), to prioritize shared features.

Problem

Linear methods such as sparse partial least squares (SPLS) and canonical correlation analysis (CCA) fail to capture complex, nonlinear biological interactions across omics layers.

Solution

MRF-IMD captures nonlinear hub features across omics layers and offers three selection strategies (Filter, Mixture, Transform) for robust biomarker discovery.

Impact Highlights

  • Outperforms SPLS and CCA in nonlinear simulations.
  • Identified 8 tumor clusters in a pan-cancer analysis.
  • Improved dementia prediction (P = 0.033) over existing scores.

The MRF-IMD Workflow

1. Multi-Omics Input

Samples with matched measurements across two omics layers (e.g., gene expression and DNA methylation). One layer serves as the predictors (X) and the other as the multivariate response (Y).
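Matching samples across layers is a prerequisite for building X and Y. The sketch below shows one way to keep only the samples measured in both layers and put the rows of both matrices in the same order. The helper name `align_layers` and the sample-ID inputs are hypothetical and are not part of MRF-IMD.

```python
import numpy as np

def align_layers(x_ids, x_mat, y_ids, y_mat):
    """Hypothetical helper: keep samples present in both omics layers and
    return the predictor (X) and response (Y) rows in one shared order."""
    shared = sorted(set(x_ids) & set(y_ids))      # samples measured in both layers
    x_index = {sid: i for i, sid in enumerate(x_ids)}
    y_index = {sid: i for i, sid in enumerate(y_ids)}
    X = x_mat[[x_index[s] for s in shared]]       # e.g. gene expression rows
    Y = y_mat[[y_index[s] for s in shared]]       # e.g. methylation rows
    return shared, X, Y
```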

2. Multivariate Forest

Fit a multivariate random forest. Each tree splits nodes on variables in X so as to maximize the heterogeneity of the multivariate response Y between the resulting daughter nodes.
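The sketch below shows one multivariate split search, scanning thresholds on a single predictor. It assumes a common multivariate criterion: the reduction in within-node sum of squares, summed over standardized response columns. The exact splitting rule used by MRF-IMD may differ, and `best_split` is a hypothetical name.

```python
import numpy as np

def node_sse(Z):
    """Within-node sum of squares, summed over all response columns."""
    if len(Z) == 0:
        return 0.0
    return float(((Z - Z.mean(axis=0)) ** 2).sum())

def best_split(x, Y):
    """Scan thresholds on one predictor x; return the threshold giving the largest
    drop in summed within-node SSE of the standardized responses (an assumed rule)."""
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sd = Y.std(axis=0)
    # Standardize each response column so that no single column dominates the split.
    Z = (Y - Y.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    parent = node_sse(Z)
    best_t, best_gain = None, 0.0
    for t in np.unique(x)[:-1]:  # skip the maximum so the right child is never empty
        left = x <= t
        gain = parent - node_sse(Z[left]) - node_sse(Z[~left])
        if gain > best_gain:
            best_t, best_gain = float(t), gain
    return best_t, best_gain
```

A large gain means the two daughter nodes differ strongly in Y. This is the sense in which a split "maximizes heterogeneity" between daughters.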

3. Calculate IMD

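Compute the Inverse Minimal Depth (IMD) of each predictor. A variable's minimal depth in a tree is the depth of its shallowest split, with the root node at depth 0. Strong variables split closer to the root and therefore receive high IMD scores.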
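A minimal sketch of an IMD-style score follows. It assumes that a tree contributes 1 / (1 + d) for a variable whose minimal depth is d, and 0 if the variable never splits in that tree. The 1 is added so that the root (depth 0) stays finite. Scores are then averaged over trees. This inverse form, the nested-dict tree layout, and the function names are all assumptions for illustration, not the paper's exact definition.

```python
def minimal_depths(tree, depth=0, out=None):
    """Record the shallowest depth at which each variable splits (root = 0).
    A tree is a hypothetical nested dict {"var", "left", "right"}; None is a leaf."""
    if out is None:
        out = {}
    if tree is None:
        return out
    v = tree["var"]
    out[v] = min(out.get(v, depth), depth)
    minimal_depths(tree["left"], depth + 1, out)
    minimal_depths(tree["right"], depth + 1, out)
    return out

def inverse_minimal_depth(forest, variables):
    """Average 1 / (1 + minimal depth) over trees (assumed form); 0 if never split."""
    scores = {v: 0.0 for v in variables}
    for tree in forest:
        for v, d in minimal_depths(tree).items():
            if v in scores:
                scores[v] += 1.0 / (1.0 + d)
    return {v: s / len(forest) for v, s in scores.items()}
```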

4. Feature Selection

Apply one of three selection strategies (Filter, Mixture, or Transform) to the IMD scores to identify robust shared biomarkers.

Strategy A: Filter

Selects variables whose IMD exceeds a threshold τ · σ. Best suited to parsimonious, stable signatures (e.g., ~73 genes in BRCA).
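The Filter rule can be sketched as a simple cutoff. This sketch assumes σ is the standard deviation of the IMD scores across all candidate variables and τ is a user-chosen multiplier. The summary above does not define σ, so treat both the interpretation and the function name `filter_select` as hypothetical.

```python
import statistics

def filter_select(imd, tau):
    """Keep variables whose IMD exceeds tau * sigma, taking sigma (an assumption)
    as the population standard deviation of IMD over all variables."""
    sigma = statistics.pstdev(list(imd.values()))
    cutoff = tau * sigma
    return sorted(v for v, s in imd.items() if s > cutoff)
```

Raising τ tightens the cutoff, which is consistent with the strategy yielding compact signatures.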

Strategy B: Mixture

Fits a two-component mixture model to the IMD scores to separate signal from noise. Offers a balanced trade-off between parsimony and sensitivity.
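As an illustration, the Mixture strategy can be sketched as a two-component Gaussian mixture fitted to the one-dimensional IMD scores by expectation-maximization (EM). A variable is flagged when it is more likely to belong to the higher-mean ("signal") component. The Gaussian form of the components, the 0.5 posterior cutoff, and the name `mixture_select` are assumptions. The paper's mixture family may differ.

```python
import numpy as np

def mixture_select(scores, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture to IMD scores with EM and flag
    variables assigned to the higher-mean ("signal") component."""
    x = np.asarray(scores, dtype=float)
    mu = np.percentile(x, [25, 75])          # start near the low and high ends
    var = np.full(2, x.var() + 1e-12)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step in log space to avoid underflow for well-separated components.
        log_d = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - (x[:, None] - mu) ** 2 / (2 * var))
        resp = np.exp(log_d - log_d.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12
    signal = int(np.argmax(mu))
    return resp[:, signal] > 0.5
```

The cutoff is learned from the shape of the score distribution rather than set by hand. This is one reason a mixture rule can sit between a strict fixed threshold and a more permissive standardized score.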

Strategy C: Transform

Standardizes IMD using a t-score. Best for detecting subtle signals or interaction effects.
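One possible reading of the Transform step is sketched below. For each variable, a one-sample t-score compares its mean per-tree inverse depth with the average IMD over all variables, using the tree-to-tree variability as the standard error. This interpretation, the per-tree input matrix, and the name `transform_scores` are assumptions made for illustration, not the paper's stated formula.

```python
import numpy as np

def transform_scores(per_tree_imd):
    """per_tree_imd: array (n_trees, n_vars) of each tree's inverse depth per
    variable (0 where it never splits). Returns one t-score per variable."""
    m = np.asarray(per_tree_imd, dtype=float)
    n_trees = m.shape[0]
    means = m.mean(axis=0)                # IMD per variable, averaged over trees
    sds = m.std(axis=0, ddof=1)           # tree-to-tree variability per variable
    grand = means.mean()                  # reference level: average IMD of all variables
    se = np.maximum(sds, 1e-12) / np.sqrt(n_trees)  # floor avoids division by zero
    return (means - grand) / se
```

Variables could then be ranked by t-score or kept above a cutoff (for example, 2, an illustrative value rather than the paper's). Because the score accounts for consistency across trees, a variable with a modest but stable IMD can still stand out.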