Balancing and Refining Representations for DTI Prediction: A Framework Combining One-SVM-US and a Modified VAE

Ghanbari, Ali; Keyhanian, Mohaddeseh; Pirgazi, Jamshid

doi:10.22044/jadm.2026.16927.2825

Document Type: Conceptual Paper

Authors

Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran.

Abstract

Accurate prediction of drug–target interactions (DTIs) is essential for drug discovery and drug repositioning. This study introduces a framework that addresses two key challenges in DTI prediction: dataset imbalance and high-dimensional feature representations. Proteins are described by nine statistical and sequence-based descriptors, and drugs by molecular fingerprints computed with the Morgan algorithm; the best-performing feature combinations are selected through validation so that the representation captures diverse biological and chemical information. To mitigate dataset imbalance, a one-class SVM-based undersampling method (One-SVM-US) models the distribution of positive interactions and uses it to guide the selective reduction of the majority (negative) class, balancing positive and negative samples. A supervised, classification-oriented variational autoencoder (VAE) then compresses the high-dimensional features into a lower-dimensional space while preserving the class-discriminative information relevant to interaction prediction. The refined features are classified with machine learning models to predict potential drug–target pairs. On benchmark datasets, the framework achieves AUC-ROC scores of 1.00 on the EN, GPCR, and NR datasets and 0.9731 on the IC dataset, improving on existing methods. These results support the robustness of the approach and its potential as a reliable tool for drug–target interaction prediction.
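The undersampling idea described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `OneClassSVM` and synthetic feature vectors. The abstract does not state how One-SVM-US chooses which majority-class samples to keep, so the selection rule below (keep the negatives that score as least similar to the positive distribution) is an assumption. The helper name `one_svm_undersample` and its parameters are hypothetical and not taken from the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def one_svm_undersample(X_pos, X_neg, nu=0.1, gamma="scale"):
    """Hypothetical helper: return as many negatives as there are positives."""
    # Fit the one-class SVM on the positive (interacting) pairs only,
    # so that it models the distribution of the minority class.
    ocsvm = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_pos)
    # Signed distance to the learned boundary: higher means more
    # similar to the positive distribution.
    scores = ocsvm.decision_function(X_neg)
    # ASSUMED rule (not stated in the abstract): keep the negatives that
    # look least like positives, i.e. those with the lowest scores.
    n_keep = min(len(X_pos), len(X_neg))
    keep_idx = np.argsort(scores)[:n_keep]
    return X_neg[keep_idx], keep_idx


# Synthetic stand-ins for drug-target pair feature vectors.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(20, 5))
X_neg = rng.normal(loc=0.5, scale=2.0, size=(200, 5))

X_neg_kept, keep_idx = one_svm_undersample(X_pos, X_neg)

# Balanced training set: equal numbers of positive and negative samples.
X_bal = np.vstack([X_pos, X_neg_kept])
y_bal = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg_kept))])
```

In this sketch the balanced set `X_bal`, `y_bal` would then be passed to the dimensionality-reduction and classification stages. Other selection rules, such as keeping negatives near the decision boundary, fit the same structure by changing only the sort criterion.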

Main Subjects

H.3.2.10. Medicine and science

Journal of AI and Data Mining

Volume 14, Issue 2
April 2026
Pages 257-273
