We performed sequencing analyses on a subset of the UNC breast tumors analyzed by microarray for EGFR mutations
The ratio of the rate of nonsynonymous to synonymous change, dN/dS, for a gene provides one measure to detect such selection pressure. We compared the dN/dS for GWAS and drug target genes using human-mouse and human-chimp data from H-InvDB and found that both are under stronger selection than all genes. We found that HGMD genes also exhibit negative selection in recent history. The selection against variants in drug target genes is slightly stronger than that against variants in GWAS-reported genes for dN/dS calculated using human-chimp orthologs, suggesting that selection is stronger for drug targets in recent history.
This quantity is highly dependent on the total number of neighbors a gene has, so we also use the degree of the gene as a control. As the previous analysis shows, second neighbors of drug target genes are also enriched for GWAS genes; thus we also use the number of second-neighbor GWAS genes of a gene as a feature. These three features capture the enrichment information from the analysis above, but there are some subtle relationships not included. The problem of identifying drug targets based on their relationship to GWAS genes is similar to the problem of finding missing relationships in social network analysis. We therefore also use common friends with GWAS genes, a widely used feature in the social network machine learning field. The common neighbor feature is defined as the proportion of neighbors shared by two genes, where N_A is the set of neighbors of gene A and N_B is the set of neighbors of gene B. The total number of features for each gene is 3N, where N is the number of GWAS genes for that disease that are mapped to the protein network. Since the number of drug targets for a disease is very small compared to the total number of genes in the FI network, the training set is highly unbalanced if we use the latter as the true negative set.
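The network features described above can be illustrated with a minimal sketch on a toy graph. Several things here are assumptions rather than the authors' implementation: the function names are hypothetical, the count of first-neighbor GWAS genes is assumed to be the quantity that the degree controls for, and the "proportion of neighbors shared" is taken to be the size of the intersection of the two neighbor sets divided by the size of their union (a Jaccard-style ratio), since the exact formula is not given in the text.

```python
from typing import Dict, Set

# Undirected protein network as an adjacency map: gene -> set of neighbor genes.
Graph = Dict[str, Set[str]]


def second_neighbors(graph: Graph, gene: str) -> Set[str]:
    """Genes exactly two steps away: neighbors of neighbors, excluding
    the gene itself and its direct neighbors."""
    first = graph.get(gene, set())
    reached: Set[str] = set()
    for neighbor in first:
        reached |= graph.get(neighbor, set())
    return reached - first - {gene}


def shared_neighbor_proportion(graph: Graph, a: str, b: str) -> float:
    """ASSUMPTION: |N_A & N_B| / |N_A | N_B| (Jaccard-style ratio)."""
    n_a, n_b = graph.get(a, set()), graph.get(b, set())
    union = n_a | n_b
    return len(n_a & n_b) / len(union) if union else 0.0


def gene_features(graph: Graph, gene: str, gwas_genes: Set[str]) -> dict:
    """Collect the features discussed in the text for one candidate gene."""
    first = graph.get(gene, set())
    return {
        "degree": len(first),
        # ASSUMPTION: first-neighbor GWAS count is the degree-dependent quantity.
        "first_neighbor_gwas": len(first & gwas_genes),
        "second_neighbor_gwas": len(second_neighbors(graph, gene) & gwas_genes),
        "shared_with_gwas": {
            g: shared_neighbor_proportion(graph, gene, g)
            for g in sorted(gwas_genes)
        },
    }


# Toy network (hypothetical gene labels): edges A-B, A-C, B-D, C-D, D-E.
toy_graph: Graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
toy_gwas = {"B", "E"}

features_c = gene_features(toy_graph, "C", toy_gwas)
print(features_c)
```

In this toy network, gene C has no GWAS genes among its direct neighbors but reaches both B and E as second neighbors, which is the kind of indirect relationship the second-neighbor and common-neighbor features are meant to capture.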
To address this issue, we focus on the 932 existing drug targets in DrugBank that are also in the FI network, and thus restrict the task to identifying targets for existing drugs that can potentially be repurposed to treat other diseases. Repurposing is an attractive goal, since such use is much easier than developing a new drug from scratch. We include the 30 diseases with at least 10 approved drug targets and 10 GWAS genes in the FI network. We tested four machine learning methods using the WEKA software package: an SVM with a polynomial kernel, an SVM with an RBF kernel, a Naive Bayes network, and Random Forests. Among these, the best result is achieved by a Random Forest. The best case is Kawasaki disease, with a true positive rate of 70% and a false positive rate of 2.7%.
Potential new drug targets for drug repurposing
The false positive drug targets are drug targets for other diseases which have very similar network properties to those of the disease under study. These may indeed be mistakes made by the classifier.
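For reference, the true positive and false positive rates used to evaluate the classifiers can be computed from binary labels as in the sketch below. The function name and the toy labels are hypothetical and are not data from the study; a label of 1 stands for "drug target for the disease under study" and 0 for "drug target for another disease only".

```python
from typing import Sequence, Tuple


def true_and_false_positive_rates(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (TPR, FPR) for binary labels, where 1 marks a positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # TPR = TP / (TP + FN); FPR = FP / (FP + TN). Guard against empty classes.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy example (hypothetical labels): 3 true targets, 5 non-targets.
toy_true = [1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 0, 1, 0, 0, 0, 0]
toy_tpr, toy_fpr = true_and_false_positive_rates(toy_true, toy_pred)
print(f"TPR={toy_tpr:.3f}, FPR={toy_fpr:.3f}")
```

Each false positive in such a tally is a known target for some other disease that the classifier assigns to the disease under study, which is why these cases are worth inspecting as repurposing candidates rather than discarding outright.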