We performed sequencing analyses for EGFR mutations on a subset of the UNC breast tumors that had been analyzed by microarray.
The ratio of the rate of nonsynonymous to synonymous substitution, dN/dS, for a gene provides one measure for detecting such selection pressure. We compared dN/dS for GWAS and drug target genes using human-mouse and human-chimp data from H-InvDB, and found that both gene sets are under stronger negative selection than the set of all genes. HGMD genes also exhibit negative selection in recent history. For dN/dS calculated from human-chimp orthologs, the selection against variants in drug target genes is slightly stronger than that against variants in GWAS-reported genes, suggesting that selection on drug targets has been stronger in recent history.

These three features capture the enrichment information from the analysis above, but some subtle relationships are not included. The problem of identifying drug targets based on their relationship to GWAS genes is similar to the problem of finding missing links in social network analysis. We therefore also use common neighbors ("common friends") with GWAS genes, a widely used feature in social network machine learning. The common neighbor feature is defined as the proportion of neighbors shared by two genes:

$$\mathrm{CN}(A,B) = \frac{|N_A \cap N_B|}{|N_A \cup N_B|}$$

where $N_A$ is the set of neighbors of gene A and $N_B$ is the set of neighbors of gene B. The total number of features for each gene is 3N, where N is the number of GWAS genes for that disease that are mapped to the protein network. Since the number of drug targets for a disease is very small compared with the total number of genes in the FI network, the training set would be highly unbalanced if all other genes in the network were used as the true negative set.
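A minimal sketch of the common neighbor feature follows. It assumes that "proportion of neighbors shared" means the Jaccard ratio |N_A ∩ N_B| / |N_A ∪ N_B|, and that the FI network is available as an adjacency map from gene to neighbor set. The gene names and toy network are hypothetical. Scoring a candidate gene against each of the N mapped GWAS genes yields N of its 3N features; the other two feature types are not reproduced here.

```python
from typing import Dict, List, Set


def common_neighbor_score(neighbors: Dict[str, Set[str]], a: str, b: str) -> float:
    """Proportion of neighbors shared by genes a and b (Jaccard ratio of neighbor sets)."""
    n_a = neighbors.get(a, set())
    n_b = neighbors.get(b, set())
    union = n_a | n_b
    if not union:
        return 0.0  # neither gene has any neighbors; treat the overlap as zero
    return len(n_a & n_b) / len(union)


def common_neighbor_features(
    neighbors: Dict[str, Set[str]], gene: str, gwas_genes: List[str]
) -> List[float]:
    """One common-neighbor value per mapped GWAS gene (N values for N GWAS genes)."""
    return [common_neighbor_score(neighbors, gene, g) for g in gwas_genes]


# Hypothetical toy interaction network, stored as undirected adjacency sets.
edges = [("T1", "X"), ("T1", "Y"), ("G1", "X"), ("G1", "Y"), ("G1", "Z"), ("G2", "W")]
neighbors: Dict[str, Set[str]] = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

# T1 shares 2 of 3 distinct neighbors with G1 and none with G2.
print(common_neighbor_features(neighbors, "T1", ["G1", "G2"]))
```

Storing neighbors as sets keeps each intersection and union proportional to the neighborhood sizes rather than to the size of the whole network.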
To address this issue, we focus on the 932 existing drug targets in DrugBank that are also in the FI network, and thus restrict the task to identifying targets of existing drugs that could potentially be repurposed to treat other diseases. Repurposing is an attractive goal, since it is much easier than developing a new drug from scratch. We include the 30 diseases with at least 10 approved drug targets and 10 GWAS genes in the FI network. Using the WEKA software package, we tested four machine learning methods: an SVM with a polynomial kernel, an SVM with an RBF kernel, a Naïve Bayes network, and Random Forests. Among these, Random Forests achieved the best result. The best case is Kawasaki disease, with a true positive rate of 70% and a false positive rate of 2.7%.

Potential new drug targets for drug repurposing

The false positive drug targets are drug targets for other diseases that have network properties very similar to those of the disease under study. Some of these may indeed be classifier errors. However, others may be good candidates for repurposing that have not previously been identified. For example, C1QB and C1QC are the highest-scoring proteins in the false positive list for the best case, Kawasaki disease. These are subcomponents of complement C1q.
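The classifiers above were run in WEKA; the Python sketch below is not WEKA. It only illustrates how the quoted rates are defined, TPR = TP / (TP + FN) and FPR = FP / (FP + TN), and how false positives can be ranked by classifier score to produce a candidate repurposing list. It assumes label 1 marks a known target for the disease under study and label 0 marks a drug target for some other disease. The 0.5 threshold, gene names, and scores are hypothetical.

```python
from typing import List, Tuple


def rates_and_false_positives(
    genes: List[str], labels: List[int], scores: List[float], threshold: float = 0.5
) -> Tuple[float, float, List[Tuple[str, float]]]:
    """Return (TPR, FPR, false positives sorted by descending score).

    labels: 1 = known drug target for this disease, 0 = drug target for another disease.
    scores: classifier score for the positive class (hypothetical threshold applied).
    """
    tp = fp = fn = tn = 0
    false_pos: List[Tuple[str, float]] = []
    for gene, y, s in zip(genes, labels, scores):
        predicted = s >= threshold
        if y == 1 and predicted:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted:
            fp += 1
            false_pos.append((gene, s))  # candidate for repurposing review
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    # Highest-scoring false positives first, as in the candidate list discussed above.
    false_pos.sort(key=lambda pair: pair[1], reverse=True)
    return tpr, fpr, false_pos


# Hypothetical example: two known targets (P*) and four targets of other diseases (N*).
genes = ["P1", "P2", "N1", "N2", "N3", "N4"]
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.6, 0.8, 0.2, 0.1]
tpr, fpr, candidates = rates_and_false_positives(genes, labels, scores)
print(tpr, fpr, candidates)
```

Because the negatives here are themselves approved drug targets, a high-scoring false positive is a target for which an existing drug already exists, which is what makes the ranked list useful for repurposing.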