An ensemble-based drug–target interaction prediction approach using multiple feature information with data balancing

El-Behery, Heba; Attia, Abdel-Fattah; El-Fishawy, Nawal; Torkey, Hanaa

doi:10.1186/s13036-022-00296-7

Journal of Biological Engineering

Table 1 Summary and comparison of DTI prediction methods for identifying interactions, relative to our presented framework


Paper	Drug feature and protein feature	Method for negative samples	Description	Method
DTI-SNNFRA [30] (2021)	Drug: constitutional, topological, and geometrical descriptors. Protein: amino acid, pseudo-amino acid, and CTD	First, compute the similarity between the drugs and the proteins. Then, apply shared nearest neighbors and k-medoids clustering.	First, compute the similarity between the drugs and the proteins. Then, apply shared nearest neighbors and k-medoids clustering, and use the RUSBoost classifier for the prediction stage.	1. Shared nearest neighbors 2. RUSBoost classifier
DeepCon [31] (2019)	Drug: Morgan fingerprint. Protein: CNN on raw protein sequence, CTD	Depends on the similarity between the drugs and the proteins; the distance between the drug and the protein is then computed.	First, compute the distance based on the similarity of drug and target features to predict the negative samples and achieve class balance. Second, apply a DBN for the prediction stage.	1. The similarity of drug and target features 2. Deep belief network (DBN)
Idti-MLKdr [32] (2021)	Drug: Morgan fingerprint. Protein: AAC, DC, TC	Evaluate the molecular similarity of drug and target features based on the Tanimoto coefficient (TC). Then, the Cluster-Based Molecular Similarity algorithm calculates and selects the top-ranked drugs and targets.	Use the Tanimoto coefficient (TC) to measure the similarity between the drugs and between the proteins. Then, apply the cluster algorithm and, finally, multikernel learning (MKL).	1. Cluster algorithm 2. Multikernel learning (MKL)
PreDTIs [33] (2021)	Drug: drug-molecular substructure pattern fingerprint. Protein: PsePSSM	Use the SVM classifier. Then, calculate the Euclidean distance between the predicted values and the real feature values.	Use the SVM classifier. Then, calculate the Euclidean distance between the real and predicted values, and use LightGBM for prediction.	1. Euclidean distance 2. LightGBM classifier
[20] (2020)	Drug: molecular substructure fingerprints. Protein: apply the PSSM, then apply the LOOP method to extract protein features	Randomly select negative samples, equal in number to the positive samples.	Randomly select negative samples, equal in number to the positive samples. Apply rotation forest for prediction.	1. Rotation forest
[35] (2020)	Drug: Morgan fingerprint. Protein: 20 amino acids	The negative sample sets consist of the same number of randomly selected pairs of unrelated drugs and proteins.	Randomly select the negative samples. Apply Random Forest for prediction.	1. Random Forest classifier
[16] (2017)	Drug: molecular descriptors and molecular fingerprints (MFs). Protein: AAC, DC, and TC	The negative dataset can be randomly selected from the DTS.	Randomly select the negative samples. Apply the deep belief network for prediction.	1. Deep belief network (DBN)
[34] (2020)	Drug: E-state fingerprints. Protein: APAAC	The Euclidean distance from all unlabeled samples to the positive center is calculated and sorted. The farther the distance is, the more likely the sample is to be negative.	Compute the Euclidean distance from all unlabeled samples to the positive center. Apply support vector machines (SVM) for prediction.	1. Euclidean distance 2. Support vector machines (SVM)
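
The negative-sample strategies summarized in the table can be illustrated with a minimal Python sketch. It is not the implementation of any cited method: the function names (tanimoto, random_negatives, farthest_negatives), the representation of fingerprints as sets of on-bit indices, and the use of the mean of the positive feature vectors as the "positive center" are all assumptions made for illustration.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices (assumed encoding)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def random_negatives(unlabeled, n_positive, seed=0):
    """Random balancing: draw as many unlabeled pairs as there are positive pairs."""
    rng = random.Random(seed)
    return rng.sample(unlabeled, min(n_positive, len(unlabeled)))


def farthest_negatives(positives, unlabeled, n_negative):
    """Distance-based selection: rank unlabeled feature vectors by Euclidean
    distance to the positive center and keep the farthest ones.

    The positive center is assumed here to be the mean of the positive vectors.
    """
    dim = len(positives[0])
    center = [sum(v[i] for v in positives) / len(positives) for i in range(dim)]
    ranked = sorted(unlabeled, key=lambda v: math.dist(v, center), reverse=True)
    return ranked[:n_negative]
```

Random selection is the simplest way to balance the classes but may label unknown true interactions as negative; the distance-based variant trades that risk for a dependence on the chosen feature space and center.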


ISSN: 1754-1611
