Ensembl FamilyView

Ensembl 'FamilyView' provides a list of closely related Ensembl gene predictions together with a consensus family description and shows the chromosomal location of family members on a karyotype ideogram. It also provides a list of vertebrate UniProt sequences and Ensembl protein predictions from other species that have been used to define the family. It therefore provides a way of exploring orthologues and closely related homologues across a range of animal species.

Ensembl Protein Family Members

The link 'Export a list of genes containing this family' will take you to the 'EnsMart' data mining system. The protein family will already have been used as a filter, and you will be ready to export the list of genes. Similarly, the link 'dump this family as FASTA' takes you to an EnsMart page ready to export the set of sequences.

Clustering

The protein family database is generated by running the Markov Clustering (MCL) algorithm (1, 2, 3, 4), as initially proposed for protein families by A.J. Enright, S. van Dongen and C.A. Ouzounis (5). Before clustering, an all-against-all BLASTP sequence similarity search is run on the superset of all Ensembl protein predictions from all species, together with all metazoan sequences from UniProt/Swiss-Prot and UniProt/TrEMBL. The resulting similarities are then used as input to the MCL algorithm, which groups the proteins into family clusters.
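The clustering step can be illustrated with a minimal pure-Python sketch of the MCL iteration (expansion by matrix squaring, then inflation by element-wise powering and column renormalisation). This is not the Ensembl pipeline code: the function names, the self-loop weighting, the inflation value of 2.0 and the toy similarity scores are all assumptions made for illustration, and a real run would use the dedicated MCL software on BLASTP-derived scores.

```python
def build_matrix(ids, hits):
    """Build a symmetric similarity matrix from pairwise scores (hypothetical input format)."""
    index = {name: i for i, name in enumerate(ids)}
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for (a, b), score in hits.items():
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = score
    for i in range(n):
        # Assumption: add a self-loop as strong as the node's best hit (1.0 if it has none).
        matrix[i][i] = max(matrix[i]) or 1.0
    return matrix


def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m):
    """Expansion: square the matrix, simulating two-step random walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise entries to a power and renormalise, favouring strong flows."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl_families(ids, hits, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters of ids found by alternating expansion and inflation."""
    n = len(ids)
    if n == 0:
        return []
    m = normalize_columns(build_matrix(ids, hits))
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each non-empty row is an attractor; the columns it reaches form one cluster.
    families = set()
    for i in range(n):
        members = frozenset(ids[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            families.add(members)
    return sorted(sorted(f) for f in families)


# Toy example with made-up identifiers and scores.
example_ids = ["P1", "P2", "P3", "P4", "P5"]
example_hits = {("P1", "P2"): 1.0, ("P1", "P3"): 1.0,
                ("P2", "P3"): 1.0, ("P4", "P5"): 1.0}
example_families = mcl_families(example_ids, example_hits)
```

In this toy input the two groups share no hits, so they end up in separate families. On real data the inflation parameter controls how finely weakly connected groups are split.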

For each cluster thus obtained, a Consensus Annotation is automatically generated from the UniProt/Swiss-Prot and UniProt/TrEMBL description lines of all UniProt members of that cluster, using a longest common substring approach.
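The longest common substring idea can be sketched as follows. This is a simplified, character-level illustration and not the annotation code used by Ensembl: the function name, the case-insensitive comparison and the example descriptions are assumptions.

```python
def consensus_annotation(descriptions):
    """Return the longest substring shared by all descriptions (case-insensitive)."""
    if not descriptions:
        return ""
    texts = [d.upper() for d in descriptions]
    shortest = min(texts, key=len)
    # Try candidate substrings of the shortest description, longest first.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in text for text in texts):
                return candidate.strip()
    return ""


# Made-up description lines for a hypothetical family.
example_descriptions = [
    "Hemoglobin subunit alpha",
    "Hemoglobin subunit beta",
    "Hemoglobin subunit gamma-1",
]
example_consensus = consensus_annotation(example_descriptions)
```

A brute-force search like this is adequate for short description lines. For large inputs, a suffix-based data structure would be the more usual choice.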

The annotation confidence score is the percentage of UniProt family members that have this annotation, or part of it, in their UniProt description. Note that only family members with 'informative' UniProt descriptions are taken into account. If the consensus annotation covers fewer than 40% of these informative UniProt members, the family description is set to 'AMBIGUOUS'.
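The scoring rule can be sketched as below. The source does not define what counts as an 'informative' description, so the keyword list here is purely an assumption, as are the function names. For simplicity the sketch counts only descriptions that contain the whole annotation, not part of it.

```python
# Assumption: descriptions containing these words are treated as uninformative.
UNINFORMATIVE_WORDS = ("HYPOTHETICAL", "UNKNOWN", "UNCHARACTERIZED")


def is_informative(description):
    """Hypothetical filter for descriptions that carry real annotation."""
    text = description.upper().strip()
    return bool(text) and not any(word in text for word in UNINFORMATIVE_WORDS)


def score_annotation(annotation, descriptions, threshold=40.0):
    """Return (family description, confidence score in percent)."""
    informative = [d for d in descriptions if is_informative(d)]
    if not informative:
        return "AMBIGUOUS", 0.0
    wanted = annotation.upper()
    hits = sum(1 for d in informative if wanted in d.upper())
    score = 100.0 * hits / len(informative)
    if score < threshold:
        # Too few informative members support the annotation.
        return "AMBIGUOUS", score
    return annotation, score


# Made-up description lines for a hypothetical family.
example_members = [
    "Hemoglobin subunit alpha",
    "Hemoglobin subunit beta",
    "Hemoglobin subunit gamma",
    "Myoglobin",
    "Hypothetical protein",
]
example_result = score_annotation("HEMOGLOBIN SUBUNIT", example_members)
```

Here the 'Hypothetical protein' entry is ignored, so three of the four informative members support the annotation and the score is 75%.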

References

1. Stijn van Dongen
Graph Clustering by Flow Simulation.
PhD thesis, University of Utrecht, May 2000

2. Stijn van Dongen
A cluster algorithm for graphs.
Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000

3. Stijn van Dongen
A stochastic uncoupling process for graphs.
Technical Report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000

4. Stijn van Dongen
Performance criteria for graph clustering and Markov cluster experiments.
Technical Report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000

5. Anton J. Enright, Stijn van Dongen and Christos A. Ouzounis
An efficient algorithm for large-scale detection of protein families.
Nucleic Acids Res. 2002 Apr 1;30(7):1575-1584.

The search box at the top of the page allows you to search for any identifier present in Ensembl. For detailed instructions see the Ensembl 'TextView' page.