Benchmarking heterogeneous network-based methods for drug repurposing

Nguyen T. T.; Calza S.
2026-01-01

Abstract

Drug repurposing (DR) has gained significant attention as a cost-effective strategy for identifying new therapeutic uses for existing drugs. Heterogeneous network-based methods are particularly promising because they exploit complex biological interactions. However, comprehensive benchmarking across multiple datasets is still needed to assess their reliability and generalizability. We systematically evaluate ten advanced heterogeneous network-based DR methods across eight drug-disease datasets, six publicly available and two newly introduced. The methods include (i) matrix factorization: NMF, NMF-PDR, NMF-DR, VDA-GKSBMF; (ii) matrix completion: BNNR, OMC, HGIMC; (iii) recommendation systems: IBCF, LIBMF; and (iv) a deep learning approach: DRDM. Performance is assessed using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR). We also analyze the impact of data sparsity and compare our findings with previous benchmarking studies. Our results show that OMC achieves the highest AUC and AUPR on most datasets. BNNR, DRDM, HGIMC, VDA-GKSBMF, and NMF-PDR also demonstrate competitive performance, with NMF-PDR outperforming the other NMF-based approaches. We find that differences in cross-validation strategies substantially affect reported AUPR values: previous studies overestimated performance by omitting many negative instances from evaluation. This work provides a reliable benchmarking framework and new datasets, offering insights for future research in DR.
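The sensitivity of AUPR to omitted negatives can be illustrated with a small synthetic sketch. It is not the paper's evaluation protocol or data. The score distributions, the class sizes, and the 1:1 negative subsampling are all assumptions chosen for illustration, and AUPR is approximated here by average precision. Under these assumptions, AUC stays roughly unchanged when most negatives are dropped, while AUPR rises sharply.

```python
import random


def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) formulation; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, labels):
    """AUPR approximated as average precision: mean precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / hits


def compare(seed=0, n_pos=200, n_neg=20000):
    """Evaluate the same synthetic scores with all negatives vs. a 1:1 subsample.

    Hypothetical setup: positives score ~ N(1, 1), negatives ~ N(0, 1).
    """
    rng = random.Random(seed)
    pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    neg_sub = rng.sample(neg, n_pos)  # "omit many negatives": keep one per positive

    full_scores, full_labels = pos + neg, [1] * n_pos + [0] * n_neg
    sub_scores, sub_labels = pos + neg_sub, [1] * n_pos + [0] * n_pos
    return {
        "auc_full": auc(full_scores, full_labels),
        "auc_sub": auc(sub_scores, sub_labels),
        "aupr_full": average_precision(full_scores, full_labels),
        "aupr_sub": average_precision(sub_scores, sub_labels),
    }


for name, value in compare().items():
    print(f"{name}: {value:.3f}")
```

Because precision depends on the ratio of positives to negatives among the top-ranked pairs, removing negatives from the test set raises precision at every threshold. AUC compares positives against negatives pairwise and is therefore largely insensitive to that ratio.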
Use this identifier to cite or link to this document: https://hdl.handle.net/11379/639269