As mentioned, the sequence of the T. cruzi genome was obtained using a whole genome shotgun strategy, from a hybrid clone. Because of the sequence divergence between alleles in the CL Brener clone, assembly of this genome resulted in many cases in the separation of these alleles into separate contigs. This allowed us to align these sequences and identify sequence differences. However, because of the repetitive nature of the T. cruzi genome, we decided to focus this initial effort on mapping the genetic diversity in mostly single copy protein coding loci. These were defined as those sequences represented by no more than 2 coding sequences from the CL Brener genome in our sequence alignments.
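The single copy criterion above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the pipeline used in this work: alignments are represented as plain dictionaries, and the `is_reference` predicate and the `CLB_` identifier prefix in the usage note are hypothetical stand-ins for whatever marks a CL Brener reference coding sequence.

```python
from typing import Callable, Dict, List

# An alignment is modelled as {sequence id: aligned sequence}; this
# representation is an assumption made for the sketch.
Alignment = Dict[str, str]

MAX_REFERENCE_COPIES = 2  # "no more than 2 coding sequences" (from the text)


def count_reference_sequences(
    alignment: Alignment, is_reference: Callable[[str], bool]
) -> int:
    """Count the sequences in one alignment that come from the reference genome."""
    return sum(1 for seq_id in alignment if is_reference(seq_id))


def select_single_copy(
    alignments: List[Alignment],
    is_reference: Callable[[str], bool],
    max_copies: int = MAX_REFERENCE_COPIES,
) -> List[Alignment]:
    """Keep alignments with at most `max_copies` reference coding sequences."""
    return [
        aln
        for aln in alignments
        if count_reference_sequences(aln, is_reference) <= max_copies
    ]
```

For example, with a hypothetical predicate such as `lambda seq_id: seq_id.startswith("CLB_")`, an alignment holding three `CLB_` sequences would be discarded, while alignments holding one or two would be kept.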

Sequences used in this work include all of the annotated coding sequences from the reference CL Brener genome, plus the corresponding coding sequences from the Sylvio X10 genome, as well as other publicly available sequence data. After clustering sequences by similarity we obtained 7,639 multiple sequence alignments, 71.3% of which had 2 reference coding sequences from the CL Brener genome. Other alignments have increasing numbers of reference coding sequences. This set of alignments contains sequences from many of the large gene families of T. cruzi, and was not considered further. Even after this stringent filtering, there were still a number of alignments that contained only two reference sequences from the CL Brener genome, but that belonged to these large gene families: mucins, mucin-associated proteins, trans-sialidase-like proteins, etc.

These correspond to cases where highly similar copies of members of the family were separated from their paralogs during the clustering or assembly steps. Finally, a number of alignments had only one reference sequence from the CL Brener hybrid. These cases may correspond to haploid regions in the hybrid genome or to cases where two highly divergent alleles were separated during the clustering phase. We then scanned the multiple sequence alignments and identified columns containing sequence differences and/or indels. From the set of all alignments we identified 325,355 sites with variation, of which 28,316 corresponded to small indels. These polymorphic sites provide representative information about the diversity found in T. cruzi evolutionary lineages TcI and TcVI, but also in lineages TcII and TcIII. Columns containing variation in a multiple sequence alignment may correspond to polymorphic sites or to sequencing errors. To discriminate between these possibilities, we also analyzed the sequence neighborhood around each potential SNP. Based on this analysis we found 302,390 SNPs located in regions with a low density of SNPs.
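The column scan and the neighborhood analysis described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used here: the text does not give the window size or the density cutoff, so `window` and `max_neighbors` are hypothetical parameters, and the function names are invented for the example. Columns are classified as `indel` when they contain a gap character (`-`) and as `snp` otherwise.

```python
from typing import List, Tuple


def variable_columns(seqs: List[str]) -> List[Tuple[int, str]]:
    """Return (column index, kind) for every column that shows variation.

    kind is "indel" if the column contains a gap, otherwise "snp".
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for index, column in enumerate(zip(*seqs)):
        symbols = {base.upper() for base in column}
        if len(symbols) < 2:
            continue  # invariant column
        sites.append((index, "indel" if "-" in symbols else "snp"))
    return sites


def low_density_snps(
    sites: List[Tuple[int, str]],
    window: int = 10,        # hypothetical: flank size in alignment columns
    max_neighbors: int = 2,  # hypothetical: other SNPs tolerated in the window
) -> List[int]:
    """Keep SNP positions with few other SNPs within `window` columns."""
    positions = [pos for pos, kind in sites if kind == "snp"]
    kept = []
    for pos in positions:
        neighbors = sum(
            1 for other in positions if other != pos and abs(other - pos) <= window
        )
        if neighbors <= max_neighbors:
            kept.append(pos)
    return kept
```

The idea behind a filter of this kind is that an isolated difference is more likely to reflect a real polymorphism, whereas a tight cluster of differences may point to sequencing errors or to misaligned paralogs. The quadratic neighbor count keeps the sketch short; a sliding window over sorted positions would scale better on genome-sized inputs.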