Variational Infinite Heterogeneous Mixture Model for Semi-supervised Clustering of Heart Enhancers.

Mehdi TF, Singh G, Mitchell JA, Moses AM

Bioinformatics 2019 Feb 07

PMID: 30753279

Abstract

Mammalian genomes can contain thousands of enhancers, but only a subset actively drives gene expression in a given cellular context. Integrated genomic datasets can be harnessed to predict active enhancers. One challenge in integrating large genomic datasets is their increasing heterogeneity: continuous, binary and discrete features may all be relevant. Because training examples are also typically scarce, semi-supervised approaches for heterogeneous data are needed; however, current enhancer prediction methods are not designed to handle heterogeneous data in the semi-supervised paradigm.