Annotation

Annotation of the AgamP3 assembly was carried out by VectorBase. The set of gene models presented (genebuild 4, released June 2007) combines manual annotation of chromosome arm 2L, data provided by the research community, and gene prediction using the Ensembl system. Prediction drew on three sources: GeneWise models generated from alignments of dipteran and other protein sets to the genome; alignments and gene predictions based on Anopheles ESTs; and selected ab initio predictions. 'Known' genes are those that could be named using entries from the community gene symbol database or from UniProt. More details can be found at VectorBase.

New Identifiers

From genebuild 4 onwards, VectorBase (VB) stable identifiers for genes, transcripts and proteins are used instead of Ensembl-style identifiers. All VB identifiers for Anopheles gambiae begin with the four letters AGAP. The same numeric value is used for a gene and all of its products. Alternative products of a single gene are distinguished by suffixes ending in A, B, C, etc. (for example, -RA for a transcript and -PA for a protein).
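
The naming scheme described above can be sketched as a small parser. This is only an illustration, not VectorBase code: the function and class names are hypothetical, and the pattern assumes that the numeric part is always six digits and that transcript and protein suffixes take the forms -R<letters> and -P<letters>, as in identifiers such as AGAP006864-RA and AGAP006864-PA.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern (inferred from identifiers such as AGAP006864-RA):
# "AGAP", six digits, then an optional "-R<letters>" (transcript)
# or "-P<letters>" (protein) suffix.
_VB_ID = re.compile(r"^(AGAP\d{6})(?:-([RP])([A-Z]+))?$")

_KIND_BY_LETTER = {None: "gene", "R": "transcript", "P": "protein"}


class VBIdentifier(NamedTuple):
    gene: str               # gene-level identifier, e.g. "AGAP006864"
    kind: str               # "gene", "transcript" or "protein"
    isoform: Optional[str]  # "A", "B", ... or None for a gene


def parse_vb_identifier(identifier: str) -> VBIdentifier:
    """Split a VB-style identifier into gene, feature kind and isoform."""
    match = _VB_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised AGAP identifier: {identifier!r}")
    gene, type_letter, isoform = match.groups()
    return VBIdentifier(gene, _KIND_BY_LETTER[type_letter], isoform)
```

Because a gene and all its products share the same numeric value, the gene field returned for a transcript or protein identifier is the identifier of the parent gene.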

Ensembl-style identifiers from the previous genebuild have been mapped to the new identifiers where possible. The web browser page IDHistoryView summarises the relationships between old and new identifiers. This page can be reached by clicking the 'ID history' link on the left of each gene view page, or from the search results for an unmapped old-style identifier.

Example: Gene CPR34 has the VB identifier AGAP006864, with transcript AGAP006864-RA and protein AGAP006864-PA. The corresponding old identifiers were gene ENSANGG00000020866, transcript ENSANGT00000023320 and protein ENSANGP00000024283.
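
As a rough sketch of how old-style identifiers might be handled in a script, the snippet below classifies an identifier by its type letter and looks it up in a small table. The table contains only the CPR34 example given above; the function name is hypothetical, and the pattern (ENSANG, then G, T or P, then eleven digits) is an assumption inferred from that example. Authoritative mappings should be taken from IDHistoryView.

```python
import re
from typing import Optional, Tuple

# Assumed old-style pattern: "ENSANG" + type letter (G, T or P) + 11 digits.
_OLD_ID = re.compile(r"^ENSANG([GTP])\d{11}$")
_FEATURE_BY_LETTER = {"G": "gene", "T": "transcript", "P": "protein"}

# Illustrative table holding only the CPR34 example from the text.
OLD_TO_NEW = {
    "ENSANGG00000020866": "AGAP006864",
    "ENSANGT00000023320": "AGAP006864-RA",
    "ENSANGP00000024283": "AGAP006864-PA",
}


def describe_old_identifier(old_id: str) -> Tuple[str, Optional[str]]:
    """Return (feature kind, new identifier or None if unmapped here)."""
    match = _OLD_ID.match(old_id)
    if match is None:
        raise ValueError(f"not an old-style identifier: {old_id!r}")
    return _FEATURE_BY_LETTER[match.group(1)], OLD_TO_NEW.get(old_id)
```

An identifier that matches the pattern but is absent from the table returns None for the new identifier, which mirrors the 'unmapped' case mentioned above.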