Benchmarking heterogeneous network-based methods for drug repurposing
Nguyen T. T.; Calza S.
2026-01-01
Abstract
Drug repurposing (DR) has gained significant attention as a cost-effective strategy for identifying new therapeutic uses for existing drugs. Heterogeneous network-based methods are particularly promising because they exploit complex biological interactions. However, comprehensive benchmarking across multiple datasets is still needed to assess their reliability and generalizability. We systematically evaluate ten advanced heterogeneous network-based DR methods across eight drug-disease datasets: six publicly available and two newly introduced. The methods include (i) matrix factorization: NMF, NMF-PDR, NMF-DR, and VDA-GKSBMF; (ii) matrix completion: BNNR, OMC, and HGIMC; (iii) recommendation systems: IBCF and LIBMF; and (iv) a deep learning approach: DRDM. Performance is assessed using the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR). We also analyze the impact of data sparsity and compare our findings with previous benchmarking studies. Our results show that OMC achieves the highest AUC and AUPR on most datasets. BNNR, DRDM, HGIMC, VDA-GKSBMF, and NMF-PDR also demonstrate competitive performance, with NMF-PDR outperforming the other NMF-based approaches. We find that differences in cross-validation strategies substantially affect reported AUPR values: previous studies overestimated performance by omitting many negative instances. This work provides a reliable benchmarking framework and new datasets, offering insights for future DR research.
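Why omitting negatives inflates AUPR but not AUC can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' evaluation protocol: the Gaussian score distributions, the 50 held-out associations, the 5,000 unobserved drug-disease pairs, and the 1:1 negative subsampling ratio are all illustrative assumptions. The same held-out positives are scored once against all negatives and once against a random subsample, with non-interpolated average precision used as the AUPR estimate.

```python
import random

def average_precision(scores, labels):
    """Non-interpolated average precision, a common AUPR estimate."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits, ap = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / k  # precision at each rank where a positive is retrieved
    return ap / n_pos

def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
# Assumed scores: held-out known associations score higher on average
# than unobserved pairs; all sizes here are illustrative, not from the paper.
pos_scores = [rng.gauss(1.0, 1.0) for _ in range(50)]
neg_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]

def evaluate(neg):
    """Return (AUC, AUPR) for the fixed positives against the given negatives."""
    scores = pos_scores + neg
    labels = [1] * len(pos_scores) + [0] * len(neg)
    return roc_auc(scores, labels), average_precision(scores, labels)

# All unobserved pairs kept as negatives (about 1% prevalence).
auc_full, aupr_full = evaluate(neg_scores)
# Hypothetical 1:1 subsampling: most negatives are dropped.
auc_sub, aupr_sub = evaluate(rng.sample(neg_scores, 50))

print(f"all negatives: AUC={auc_full:.3f} AUPR={aupr_full:.3f}")
print(f"1:1 subsample: AUC={auc_sub:.3f} AUPR={aupr_sub:.3f}")
```

Because uniform subsampling of negatives leaves the expected AUC unchanged while raising the positive prevalence, AUC stays roughly the same between the two settings, whereas AUPR rises sharply under subsampling even though the model's scores are identical.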


