AI-driven hypothesis generation

OpenBioLink: A benchmark for automated hypothesis generation and link prediction in biomedicine

OpenBioLink is a resource and evaluation framework for evaluating link prediction models on heterogeneous biomedical graph data. It contains benchmark datasets as well as tools for creating custom benchmarks and training and evaluating models.

OpenBioLink overview

OpenBioLink project on Github

Paper preprint on arXiv

Peer reviewed version in the journal Bioinformatics (for citations)

Supplementary data

The OpenBioLink benchmark aims to meet the following criteria:

Openly available
Large-scale
Wide coverage of current biomedical knowledge and entity types
Standardized, balanced train-test split
Open-source code for benchmark dataset generation
Open-source code for evaluation (independent of model)
Integrating and differentiating multiple types of biological entities and relations (i.e., formalized as a heterogeneous graph)
Minimized information leakage between train and test sets (e.g., avoid inclusion of trivially inferable relations in the test set)
Coverage of true negative relations, where available
Differentiating high-quality data from noisy, low-quality data
Differentiating benchmarks for directed and undirected graphs in order to be applicable to a wide variety of link prediction methods
Clearly defined release cycle with versions of the benchmark and public leaderboard