Annotation

The genome was aligned to human NCBI36 by UCSC using BLASTZ. These alignments were used to transfer human Ensembl gene structures (Human Build 36f) to chimpanzee. In a first layer of annotation, 92% of the chimp-specific proteins were aligned to the chimp genome. The missing 8% correspond to fragments or to proteins that contain stop codons in the assembled genome.

More than 2000 chimp-specific protein sequences were used during the gene build process and were aligned using a combination of GeneWise and Exonerate. Owing to the small number of proteins (many of which aligned in the same location), an additional layer of gene structures was added by projection of human genes. The high-quality annotation of the human genome and the high degree of similarity between the human and chimpanzee genomes enable us to identify genes in chimpanzee by transferring human genes to the corresponding location in chimp.

The protein-coding transcripts of the human gene structures are projected through the whole-genome alignment (WGA) onto the chromosomes of the chimp genome. Small insertions/deletions that disrupt the reading frame of the resulting transcripts are corrected by inserting "frame-shift" introns into the structure.
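
As a rough illustration of the frame-shift intron idea, the sketch below splits a projected exon around small insertions whose length is not a multiple of three, so that the extra bases fall into a tiny intron and the reading frame is preserved. The function name, the coordinate convention (1-based, inclusive) and the restriction to insertions are assumptions made for this example; it is not the code used in the gene build.

```python
def add_frameshift_introns(exon_start, exon_end, insertions):
    """Split one exon around frame-disrupting insertions.

    exon_start, exon_end: 1-based inclusive coordinates on the target genome.
    insertions: (start, length) pairs for extra target bases inside the exon
                (hypothetical input format, assumed to lie within the exon).
    Returns a list of (start, end) sub-exons; the gaps between them are the
    "frame-shift" introns.
    """
    exons = []
    cursor = exon_start
    for start, length in sorted(insertions):
        if length % 3 == 0:
            continue  # whole codons inserted: reading frame is not disrupted
        if start > cursor:
            exons.append((cursor, start - 1))
        cursor = start + length  # skip the inserted bases
    if cursor <= exon_end:
        exons.append((cursor, exon_end))
    return exons


# A 1 bp insertion at position 150 of a 100 bp exon becomes a 1 bp intron.
pieces = add_frameshift_introns(100, 199, [(150, 1)])
print(pieces)  # [(100, 149), (151, 199)]
```

The remaining coding length (50 + 49 = 99 bases) is again a multiple of three, which is the point of the correction.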

For some human exons and parts of exons, the corresponding chimp sequence is missing from the assembly. In most of these cases, the missing exon is omitted from the chimpanzee gene model. In a small number of cases, however, where BLASTZ has aligned the human sequence to a gap in the chimp sequence, the exon is placed in the gap, resulting in a run of X's of the correct length in the translation.
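
To see why an exon placed in an assembly gap gives a run of X's, consider a minimal translation sketch in which any codon containing an unknown base (N) is reported as X. The function name and the example sequence are made up for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence; codons with unknown bases become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


# A 9 bp exon lying entirely in a gap (N's) yields three X's.
print(translate("ATGGCC" + "N" * 9 + "AAATAA"))  # MAXXXK*
```

Because the gap exon keeps the length of the human exon, the downstream codons stay in frame and translate normally.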

Some human transcripts fail to transfer cleanly (due to, for example, missing alignment in the orthologous regions). We have attempted to recover these using Exonerate. The single best Exonerate alignment to chimp is chosen for each "missing" human transcript, and transcripts with less than 50% identity to the source or less than 50% coverage of the source are discarded.
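
The selection step described above can be sketched as follows. The `Alignment` record, its field names and the use of the alignment score to define "best" are assumptions for this example; only the 50% identity and 50% coverage cut-offs come from the text.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    transcript_id: str  # source human transcript (hypothetical field names)
    score: float        # alignment score, assumed here to define "best"
    identity: float     # percent identity to the source transcript
    coverage: float     # percent of the source transcript covered


def select_recovered(alignments, min_identity=50.0, min_coverage=50.0):
    """Keep the single best alignment per transcript, then apply cut-offs."""
    best = {}
    for aln in alignments:
        current = best.get(aln.transcript_id)
        if current is None or aln.score > current.score:
            best[aln.transcript_id] = aln
    return {
        tid: aln
        for tid, aln in best.items()
        if aln.identity >= min_identity and aln.coverage >= min_coverage
    }


hits = [
    Alignment("T1", score=900, identity=98.0, coverage=95.0),
    Alignment("T1", score=400, identity=99.0, coverage=40.0),
    Alignment("T2", score=300, identity=45.0, coverage=80.0),  # low identity
    Alignment("T3", score=250, identity=90.0, coverage=30.0),  # low coverage
]
print(sorted(select_recovered(hits)))  # ['T1']
```

Note that the cut-offs are applied only to the best alignment, so a transcript whose best hit fails is dropped even if a lower-scoring hit would have passed.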