Ensembl Variation Tables Description
Introduction
This document gives a high-level description of the tables that
make up the Ensembl variation schema. Tables are listed in alphabetical order, and the purpose of each table is explained. It is intended to
allow people to familiarise themselves with the schema when
encountering it for the first time, or when they need to use some
tables that they've not used before.
This document refers to version 63 of the Ensembl
variation schema.
A PDF document of the schema is available here.
A colour legend is available at the bottom of the page.
allele
This table stores information about each of a variation's alleles, along with population frequencies.
Column | Type | Default value | Description | Index |
allele_id | int(10) | | Primary key, internal identifier. | primary key |
variation_id | int(10) | | Foreign key references to the variation table. | key: variation_idx |
subsnp_id | int(15) | | Foreign key references to the subsnp_handle table. | key: subsnp_idx |
allele | varchar(25000) | | Allele found at the variation location, for the sample. e.g. "A". | key: variation_idx |
frequency | float | | Frequency of this allele in the sample. | |
sample_id | int(10) | | Foreign key references to the sample table. | |
count | int(10) | NULL | Number of individuals in the sample where this allele is found. | |
allele_group
This table, along with allele_group_allele, represents a particular multi-marker allele of a given multi-marker variation, or haplotype. It stores an associated population frequency.
Column | Type | Default value | Description | Index |
allele_group_id | int(10) | | Primary key, internal identifier. | primary key |
variation_group_id | int(10) | | Foreign key references to the variation_group table. | |
sample_id | int(10) | | Foreign key references to the population table. | |
name | varchar(255) | | The name of this allele group. | unique |
source_id | int(10) | | Foreign key references to the source table. | |
frequency | float | | The frequency of this allele_group within the referenced population. | |
allele_group_allele
This table represents an allele of one variation in a multi-marker variation, or haplotype. It stores a string of the allele.
Column | Type | Default value | Description | Index |
allele_group_id | int(10) | | Primary key, internal identifier. | unique key: allele_idx |
allele | varchar(255) | | Nucleotide present in the group. | |
variation_id | int(10) | | Foreign key references to the variation table. | unique key: allele_idx |
associate_study
This table contains identifiers of associated studies (e.g. NHGRI and EGA studies with the same PubMed identifier).
Column | Type | Default value | Description | Index |
study1_id | int(10) | | Primary key. Foreign key references to the study table. | primary key |
study2_id | int(10) | | Primary key. Foreign key references to the study table. | primary key |
attrib
Defines various attributes used elsewhere in the database.
Column | Type | Default value | Description | Index |
attrib_id | INT(11) | 0 | Primary key | primary key |
attrib_type_id | SMALLINT(5) | 0 | Key into the attrib_type table, identifies the type of this attribute | unique key: type_val_idx |
value | TEXT | | The value of this attribute | unique key: type_val_idx |
attrib_set
Groups related attributes together.
Column | Type | Default value | Description | Index |
attrib_set_id | INT(11) | 0 | Primary key | unique key: set_idx |
attrib_id | INT(11) | 0 | Key of an attribute in this set | unique key: set_idx key: attrib_idx |
attrib_type
Defines the set of possible attribute types used in the attrib table.
Column | Type | Default value | Description | Index |
attrib_type_id | SMALLINT(5) | 0 | Primary key | primary key |
code | VARCHAR(20) | '' | A short codename for this type (indexed, so should be used for lookups) | unique key: code_idx |
name | VARCHAR(255) | '' | The name of this type | |
description | TEXT | | Longer description of this type | |
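The attrib_type.code column is described as the one to use for lookups. As a minimal sketch, assuming a local SQLite stand-in for the MySQL tables (the type code `example_type` and the attribute values below are hypothetical, not real Ensembl data), resolving all attributes of one type is a join on attrib_type_id filtered by code:

```python
import sqlite3

# Hypothetical in-memory stand-in for the attrib and attrib_type tables.
# Column names follow the schema; the rows are made-up examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attrib_type (
    attrib_type_id INTEGER PRIMARY KEY,
    code TEXT UNIQUE,
    name TEXT,
    description TEXT
);
CREATE TABLE attrib (
    attrib_id INTEGER PRIMARY KEY,
    attrib_type_id INTEGER,
    value TEXT,
    UNIQUE (attrib_type_id, value)
);
INSERT INTO attrib_type VALUES (1, 'example_type', 'Example type', 'Hypothetical attribute type');
INSERT INTO attrib VALUES (10, 1, 'value_a'), (11, 1, 'value_b');
""")


def attrib_values(conn, type_code):
    """Return {attrib_id: value} for every attribute whose type has this code."""
    rows = conn.execute(
        """SELECT a.attrib_id, a.value
           FROM attrib a
           JOIN attrib_type t ON t.attrib_type_id = a.attrib_type_id
           WHERE t.code = ?
           ORDER BY a.attrib_id""",
        (type_code,),
    )
    return dict(rows)


print(attrib_values(conn, "example_type"))  # {10: 'value_a', 11: 'value_b'}
```

The same join, keyed on attrib_id instead of code, is one way a class_attrib_id stored in another table could be turned into a readable value.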
compressed_genotype_single_bp
This table holds genotypes compressed using the Perl pack() function. These genotypes are mapped to particular genomic locations rather than variation objects. The data have been compressed to reduce table size and increase the speed of the web code.
Column | Type | Default value | Description | Index |
sample_id | int(10) | | Primary key. Foreign key references to the sample table. | |
seq_region_id | int(10) | | Foreign key references seq_region in core db. Refers to the seq_region which this variant is on, which may be a chromosome, a clone, etc. | key: pos_idx |
seq_region_start | int | | The start position of the variation on the seq_region. | key: pos_idx |
seq_region_end | int | | The end position of the variation on the seq_region. | |
seq_region_strand | tinyint | | The orientation of the variation on the seq_region. | |
genotypes | blob | | Encoded representation of the genotype data. Each row in the compressed table stores genotypes from one individual in one fixed-size region of the genome (arbitrarily defined as 100 kb). The compressed string (created with Perl's pack function) consists of a repeating triplet of elements: a distance in base pairs from the previous genotype, followed by a pair of alleles. For example, a given row may have a start position of 1000, indicating the chromosomal position of the first genotype in this row. The unpacked genotypes field may then contain the following elements: 0, A, G, 20, C, C, 35, G, T, 320, A, A, ... The first genotype has a position of 1000 + 0 = 1000 and alleles A and G. The second genotype has a position of 1000 + 20 = 1020 and alleles C and C. The third genotype similarly has a position of 1020 + 35 = 1055 and alleles G and T, and so on. | |
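The decoding rule for the genotypes column can be sketched in Python. This assumes the blob has already been unpacked into a flat list of elements, exactly as in the worked example in the column description; the Perl pack template used to build the binary string is not given on this page, so the unpacking step itself is deliberately left out. `decode_genotypes` is a hypothetical helper name.

```python
from typing import List, Tuple, Union

Element = Union[int, str]


def decode_genotypes(row_start: int, elements: List[Element]) -> List[Tuple[int, str, str]]:
    """Turn unpacked (gap, allele_1, allele_2) triplets from one row into
    (position, allele_1, allele_2) tuples.

    Each gap is the distance in bp from the previous genotype; the first gap
    is measured from the row's seq_region_start.
    """
    if len(elements) % 3 != 0:
        raise ValueError("expected a whole number of (gap, allele, allele) triplets")
    genotypes = []
    position = row_start
    for i in range(0, len(elements), 3):
        gap, allele_1, allele_2 = elements[i:i + 3]
        position += int(gap)  # cumulative: each gap is relative to the previous genotype
        genotypes.append((position, str(allele_1), str(allele_2)))
    return genotypes


# The worked example from the genotypes column description.
example = [0, "A", "G", 20, "C", "C", 35, "G", "T", 320, "A", "A"]
for pos, a1, a2 in decode_genotypes(1000, example):
    print(pos, a1, a2)
# 1000 A G
# 1020 C C
# 1055 G T
# 1375 A A
```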
failed_allele
Contains alleles that did not pass the Ensembl filters
Column | Type | Default value | Description | Index |
failed_allele_id | int(11) | | Primary key, internal identifier. | primary key |
allele_id | int(10) | | Foreign key references to the allele table. | unique key: allele_idx |
failed_description_id | int(10) | | Foreign key references to the failed_description table. | unique key: allele_idx |
failed_description
This table contains descriptions of reasons for a variation being flagged as failed.
Column | Type | Default value | Description | Index |
failed_description_id | int(10) | | Primary key, internal identifier. | primary key |
description | text | | Text containing the reason why the Variation has been flagged as failed. e.g. "Variation does not map to the genome". | |
failed_variation
For various reasons it may be necessary to store information about a variation that has failed quality checks in the Variation pipeline. This table acts as a flag for such failures.
Column | Type | Default value | Description | Index |
failed_variation_id | int(11) | | Primary key, internal identifier. | primary key |
variation_id | int(10) | | Foreign key references to the variation table. | unique key: variation_idx |
failed_description_id | int(10) | | Foreign key references to the failed_description table. | unique key: variation_idx |
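A common use of failed_variation is to exclude flagged variations from a query. The sketch below assumes a SQLite stand-in holding only the columns it needs (the variation names are made up); a variation with no failed_variation row is treated as having passed the checks, and the reason for each failure is read from failed_description:

```python
import sqlite3

# Hypothetical in-memory subset of the variation, failed_variation and
# failed_description tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE failed_description (failed_description_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE failed_variation (
    failed_variation_id INTEGER PRIMARY KEY,
    variation_id INTEGER,
    failed_description_id INTEGER,
    UNIQUE (variation_id, failed_description_id)
);
INSERT INTO variation VALUES (1, 'rs_example_1'), (2, 'rs_example_2');
INSERT INTO failed_description VALUES (1, 'Variation does not map to the genome');
INSERT INTO failed_variation VALUES (1, 2, 1);
""")

# Variations with no failed_variation row: an anti-join via LEFT JOIN ... IS NULL.
passed = [name for (name,) in conn.execute("""
    SELECT v.name
    FROM variation v
    LEFT JOIN failed_variation fv ON fv.variation_id = v.variation_id
    WHERE fv.variation_id IS NULL
    ORDER BY v.variation_id""")]

# The reason each flagged variation failed.
reasons = conn.execute("""
    SELECT v.name, fd.description
    FROM failed_variation fv
    JOIN variation v ON v.variation_id = fv.variation_id
    JOIN failed_description fd ON fd.failed_description_id = fv.failed_description_id
    ORDER BY v.variation_id""").fetchall()

print(passed)   # ['rs_example_1']
print(reasons)  # [('rs_example_2', 'Variation does not map to the genome')]
```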
flanking_sequence
This table contains the upstream and downstream sequence surrounding a variation. Since each variation is defined by its flanking sequence, this table has a one-to-one relationship with the variation table.
Column | Type | Default value | Description | Index |
variation_id | int(10) | | Primary key. Foreign key references to the variation table. | primary key |
up_seq | text | | Upstream sequence. It initially stores the sequence taken from the core database; a later process then derives the position from it. | |
down_seq | text | | Downstream sequence. It initially stores the sequence taken from the core database; a later process then derives the position from it. | |
up_seq_region_start | int | | Start position of the upstream sequence in the region. | |
up_seq_region_end | int | | End position of the upstream sequence in the region. | |
down_seq_region_start | int | | Start position of the downstream sequence in the region. | |
down_seq_region_end | int | | End position of the downstream sequence in the region. | |
seq_region_id | int(10) | | Foreign key references seq_region in core db. Refers to the seq_region which this variant is on, which may be a chromosome, a clone, etc. | |
seq_region_strand | tinyint | | The orientation of the variation on the seq_region. | |
httag
This table represents the equivalent of a tagged_variation_feature for multi-marker variations, representing an instance where a haplotype is tagged by or tags another marker.
Column | Type | Default value | Description | Index |
httag_id | int(10) | | Primary key, internal identifier. | primary key |
variation_group_id | int(10) | | Foreign key references to the variation_group table. | key: variation_group_idx |
name | varchar(255) | | The name of the tag, for web purposes. | |
source_id | int(10) | | Foreign key references to the source table. | |
individual
Stores information about an identifiable individual, including gender and the identifiers of the individual's parents (if known).
Column | Type | Default value | Description | Index |
sample_id | int(10) | | Primary key, internal identifier. See the sample table. Corresponds to the individual ID. | primary key |
gender | enum('Male', 'Female', 'Unknown') | 'Unknown' | The sex of this individual. | |
father_individual_sample_id | int(10) | | Self-referential ID: the father of this individual, if known. | |
mother_individual_sample_id | int(10) | | Self-referential ID: the mother of this individual, if known. | |
individual_type_id | int(10) | | Foreign key references to the individual_type table. | |
individual_genotype_multiple_bp
This table holds uncompressed genotypes for given variations.
Column | Type | Default value | Description | Index |
variation_id | int(10) | | Primary key. Foreign key references to the variation table. | key: variation_idx |
subsnp_id | int(15) | | Foreign key references to the subsnp_handle table. | key: subsnp_idx |
allele_1 | varchar(25000) | | One of the alleles of the genotype, e.g. "TAG". | |
allele_2 | varchar(25000) | | The other allele of the genotype. | |
sample_id | int(10) | | Foreign key references to the individual table. | key: sample_idx |
individual_population
This table resolves the many-to-many relationship between the individual and population tables; i.e. samples may belong to more than one population. Hence it is composed of rows of individual and population identifiers.
Column | Type | Default value | Description | Index |
individual_sample_id | int(10) | | Foreign key references to the individual table. | key: individual_sample_idx |
population_sample_id | int(10) | | Foreign key references to the population table. | key: population_sample_idx |
individual_type
This table describes each of the possible types of individual (e.g. "fully_inbred", "mutant").
Column | Type | Default value | Description | Index |
individual_type_id | int(0) | | Primary key, internal identifier. | primary key |
name | varchar(255) | | Short name of the individual type. e.g. "fully_inbred", "mutant". | |
description | text | | Long name of the individual type. | |
meta
This table stores various metadata relating to the database, generally used by the Ensembl web code.
meta_coord
This table gives the coordinate system used by various tables in the database.
phenotype
This table stores details of the phenotypes associated with variation annotations.
Column | Type | Default value | Description | Index |
phenotype_id | int(10) | | Primary key, internal identifier. | primary key |
name | varchar(50) | | Phenotype short name. e.g. "CAD". | unique key: name_idx |
description | varchar(255) | | Phenotype long name. e.g. "Coronary Artery Disease". | |
polyphen_prediction
Stores the PolyPhen-2 prediction for every possible amino acid substitution in the Ensembl proteome.
Column | Type | Default value | Description | Index |
polyphen_prediction_id | int(10) | | Primary key | primary key |
protein_position_id | int(10) | | Foreign key into the protein_position table identifying the protein and position that this prediction applies to | key: pos_aa_idx |
amino_acid | char(1) | | The substituted amino acid | key: pos_aa_idx |
prediction | enum('unknown', 'benign', 'possibly damaging', 'probably damaging') | | The qualitative PolyPhen prediction for this substitution | |
probability | float | | The PolyPhen probability that this substitution is damaging | |
compressed_result_hash | blob | | A compressed string representation of a Perl hash with further results from PolyPhen (not released) | |
population
A table consisting simply of sample_ids representing populations; all data relating to the populations are stored in separate tables (see below).
A population may be an ethnic group (e.g. Caucasian, Hispanic), assay group (e.g. 24 Europeans), strain, phenotypic group (e.g. blue-eyed, diabetes), etc. Populations may be composed of other populations by defining relationships in the population_structure table.
Column | Type | Default value | Description | Index |
sample_id | int(10) | | Foreign key references to the sample table. Corresponds to the population ID. | primary key |
population_genotype
This table stores genotypes (pairs of alleles) and their frequencies for variations in given populations.
Column | Type | Default value | Description | Index |
population_genotype_id | int(10) | | Primary key, internal identifier. | primary key |
variation_id | int(10) | | Foreign key references to the variation table. | key: variation_idx |
subsnp_id | int(15) | NULL | Foreign key references to the subsnp_handle table. | key: subsnp_idx |
allele_1 | varchar(25000) | | First allele in the genotype. | |
allele_2 | varchar(25000) | | Second allele in the genotype. | |
frequency | float | | Frequency of the genotype in the population. | |
sample_id | int(10) | | Foreign key references to the population table. | key: sample_idx |
count | int(10) | NULL | Number of individuals who have this genotype, in this population. | |
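Where per-genotype counts are populated (count defaults to NULL, so they may not be), allele frequencies for one variation in one population can be derived by standard allele counting, with each diploid individual contributing two alleles. This is a sketch of that general calculation under a diploid assumption, not a description of how the frequencies stored in this database were produced; `allele_frequencies` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def allele_frequencies(genotype_counts: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Given (allele_1, allele_2, count) rows for one variation in one
    population, return the frequency of each allele.

    Assumes diploid genotypes: each counted individual carries two alleles.
    """
    allele_counts: Counter = Counter()
    for allele_1, allele_2, count in genotype_counts:
        allele_counts[allele_1] += count
        allele_counts[allele_2] += count
    total = sum(allele_counts.values())
    return {allele: n / total for allele, n in allele_counts.items()}


# Made-up counts: 2 x AA, 1 x AG, 1 x GG  ->  5 A alleles and 3 G alleles out of 8.
rows = [("A", "A", 2), ("A", "G", 1), ("G", "G", 1)]
print(allele_frequencies(rows))  # {'A': 0.625, 'G': 0.375}
```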
population_structure
This table stores hierarchical relationships between populations by relating them as populations and sub-populations.
Column | Type | Default value | Description | Index |
super_population_sample_id | int(10) | | Foreign key references to the population table. | unique key: sub_pop_sample_idx |
sub_population_sample_id | int(10) | | Foreign key references to the population table. | unique key: sub_pop_sample_idx |
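Because population_structure can nest populations to any depth, collecting every individual in a population together with its sub-populations needs a recursive walk over population_structure combined with the individual_population link table. A minimal sketch using SQLite's WITH RECURSIVE (all population and individual names are hypothetical; the production database is MySQL, where recursive common table expressions require MySQL 8.0 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE population_structure (
    super_population_sample_id INTEGER,
    sub_population_sample_id INTEGER,
    UNIQUE (super_population_sample_id, sub_population_sample_id)
);
CREATE TABLE individual_population (
    individual_sample_id INTEGER,
    population_sample_id INTEGER
);
-- Hypothetical data: population 1 contains 2, which contains 3.
-- IND_B belongs to two populations (the many-to-many case).
INSERT INTO sample VALUES (1, 'POP_ALL'), (2, 'POP_SUB'), (3, 'POP_SUBSUB'),
                          (10, 'IND_A'), (11, 'IND_B'), (12, 'IND_C');
INSERT INTO population_structure VALUES (1, 2), (2, 3);
INSERT INTO individual_population VALUES (10, 1), (11, 2), (12, 3), (11, 3);
""")


def individuals_in(conn, population_sample_id):
    """Names of individuals in a population or any of its nested sub-populations."""
    query = """
        WITH RECURSIVE pops(sample_id) AS (
            SELECT ?
            UNION
            SELECT ps.sub_population_sample_id
            FROM population_structure ps
            JOIN pops ON ps.super_population_sample_id = pops.sample_id
        )
        SELECT DISTINCT s.name
        FROM individual_population ip
        JOIN pops ON ip.population_sample_id = pops.sample_id
        JOIN sample s ON s.sample_id = ip.individual_sample_id
        ORDER BY s.name"""
    return [name for (name,) in conn.execute(query, (population_sample_id,))]


print(individuals_in(conn, 1))  # ['IND_A', 'IND_B', 'IND_C']
print(individuals_in(conn, 3))  # ['IND_B', 'IND_C']
```

Using UNION rather than UNION ALL in the recursive part stops the walk from revisiting a population that is reachable by more than one path.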
protein_info
Contains information about each translation in the Ensembl proteome, used by the nsSNP prediction tables.
Column | Type | Default value | Description | Index |
protein_info_id | int(10) | | Primary key | primary key |
transcript_stable_id | varchar(128) | | The stable ID of the transcript from which this protein is translated | key: transcript_idx |
transcript_version | smallint | | The version of the transcript | key: transcript_idx |
translation_md5 | char(32) | | A hexadecimal string representing the MD5 hash of the protein sequence | |
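A value of the same shape as translation_md5 can be computed with Python's hashlib. Exactly which string Ensembl hashes (for example, whether a terminal stop "*" is included, or the letter case) is not stated on this page, so the input normalisation here is an assumption; the peptide below is made up:

```python
import hashlib


def translation_md5(protein_sequence: str) -> str:
    """32-character hexadecimal MD5 digest of a protein sequence, the same
    width as the char(32) translation_md5 column.

    Assumption: the sequence is hashed exactly as given (no case folding,
    no stripping of a terminal stop).
    """
    return hashlib.md5(protein_sequence.encode("ascii")).hexdigest()


digest = translation_md5("MKTAYIAKQR")  # hypothetical peptide
print(len(digest))  # 32
```

Because identical sequences give identical digests, a column like this lets translations with the same protein sequence be matched without comparing full sequences.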
protein_position
Table with a row for each position in every Ensembl translation, used by the nsSNP prediction tables.
Column | Type | Default value | Description | Index |
protein_position_id | int(10) | | Primary key | primary key |
protein_info_id | int(10) | | Foreign key into the protein_info table, identifying the relevant protein | key: pos_idx |
position | mediumint | | The coordinate in the protein sequence | key: pos_idx |
amino_acid | char(1) | | The amino acid at this position | |
sift_median_conservation | float | | The median conservation at this position, as calculated by SIFT | |
sift_num_sequences_represented | smallint | | The number of sequences that SIFT found at this position in its multiple alignment | |
read_coverage
This table stores the read coverage in the resequencing of individuals. Each row contains an individual ID, chromosomal coordinates and a read coverage level.
Column | Type | Default value | Description | Index |
seq_region_id | int(10) | | Foreign key references seq_region in core db. Refers to the seq_region which the covered region is on, which may be a chromosome, a clone, etc. | key: seq_region_idx |
seq_region_start | int | | The start position of the covered region on the seq_region. | key: seq_region_idx |
seq_region_end | int | | The end position of the covered region on the seq_region. | |
level | tinyint | | Minimum number of reads. | |
sample_id | int(10) | | Foreign key references to the individual table. | |
sample
Sample is used as a generic catch-all term to cover individuals, populations and strains; it contains a name and description, as well as a size if applicable to the population.
Column | Type | Default value | Description | Index |
sample_id | int(10) | | Primary key, internal identifier. | primary key |
name | varchar(255) | | Name of the sample (can be an individual or a population name). | key: name_idx |
size | int | | Number of individuals in the sample. | |
description | text | | Description of the sample. | |
display | enum('REFERENCE', 'DEFAULT', 'DISPLAYABLE', 'UNDISPLAYABLE', 'LD', 'MARTDISPLAYABLE') | 'UNDISPLAYABLE' | Information used by the website: samples with little information are filtered from some web displays. | |
sample_synonym
Used to store alternative names for populations when data comes from multiple sources.
Column | Type | Default value | Description | Index |
sample_synonym_id | int(10) | | Primary key, internal identifier. | primary key |
sample_id | int(10) | | Foreign key references to the sample table. | key: sample_idx |
source_id | int(10) | | Foreign key references to the source table. | key: |
name | varchar(255) | | Name of the synonym (a different sample_id). | key: |
seq_region
This table stores the relationship between Ensembl's internal coordinate system identifiers and traditional chromosome names.
Column | Type | Default value | Description | Index |
seq_region_id | INT(10) | | Primary key. Foreign key references seq_region in core db. Refers to the seq_region which this variant is on, which may be a chromosome, a clone, etc... | primary key |
name | VARCHAR(40) | | The name of this sequence region. | unique key: name_idx |
sift_prediction
Stores the SIFT prediction for every possible amino acid substitution in the Ensembl proteome.
Column | Type | Default value | Description | Index |
sift_prediction_id | int(10) | | Primary key | primary key |
protein_position_id | int(10) | | Foreign key into the protein_position table identifying the protein and position that this prediction applies to | key: pos_aa_idx |
amino_acid | char(1) | | The substituted amino acid | key: pos_aa_idx |
prediction | enum('tolerated', 'deleterious') | | The qualitative SIFT prediction for this substitution | |
score | float | | The SIFT score for this substitution | |
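A prediction for one specific substitution is found by walking from the transcript in protein_info, to the residue in protein_position, to the substituted amino acid in sift_prediction. A minimal SQLite sketch with made-up rows (the stable ID `ENST_EXAMPLE`, the residue and the scores are hypothetical); polyphen_prediction has the same protein_position_id and amino_acid keys, so the same join shape applies to it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein_info (
    protein_info_id INTEGER PRIMARY KEY,
    transcript_stable_id TEXT,
    transcript_version INTEGER,
    translation_md5 TEXT
);
CREATE TABLE protein_position (
    protein_position_id INTEGER PRIMARY KEY,
    protein_info_id INTEGER,
    position INTEGER,
    amino_acid TEXT,
    sift_median_conservation REAL,
    sift_num_sequences_represented INTEGER
);
CREATE TABLE sift_prediction (
    sift_prediction_id INTEGER PRIMARY KEY,
    protein_position_id INTEGER,
    amino_acid TEXT,
    prediction TEXT CHECK (prediction IN ('tolerated', 'deleterious')),
    score REAL
);
-- Hypothetical rows: residue 5 of ENST_EXAMPLE is 'R'.
INSERT INTO protein_info VALUES (1, 'ENST_EXAMPLE', 1, NULL);
INSERT INTO protein_position VALUES (100, 1, 5, 'R', NULL, NULL);
INSERT INTO sift_prediction VALUES (1000, 100, 'W', 'deleterious', 0.01),
                                   (1001, 100, 'K', 'tolerated', 0.45);
""")


def sift_for(conn, transcript_stable_id, position, alt_amino_acid):
    """Return (reference amino acid, prediction, score), or None if no row exists."""
    return conn.execute(
        """SELECT pp.amino_acid, sp.prediction, sp.score
           FROM protein_info pi
           JOIN protein_position pp ON pp.protein_info_id = pi.protein_info_id
           JOIN sift_prediction sp ON sp.protein_position_id = pp.protein_position_id
           WHERE pi.transcript_stable_id = ? AND pp.position = ? AND sp.amino_acid = ?""",
        (transcript_stable_id, position, alt_amino_acid),
    ).fetchone()


print(sift_for(conn, "ENST_EXAMPLE", 5, "W"))  # ('R', 'deleterious', 0.01)
print(sift_for(conn, "ENST_EXAMPLE", 5, "G"))  # None
```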
source
This table contains details of the source from which a variation is derived. Most commonly this is NCBI's dbSNP; other sources include SNPs called by Ensembl.
Column | Type | Default value | Description | Index |
source_id | int(10) | | Primary key, internal identifier. | primary key |
name | varchar(255) | | Name of the source. e.g. "dbSNP" | |
version | int | | Version number of the source (if available). e.g. "132" | |
description | varchar(255) | | Description of the source. | |
url | varchar(255) | | URL of the source. | |
type | ENUM('chip') | NULL | Defines the type of the source, e.g. 'chip' | |
somatic_status | ENUM ('germline','somatic','mixed') | 'germline' | Indicates if this source includes somatic or germline mutations, or a mixture | |
structural_variation
This table stores information about structural variation features.
Column | Type | Default value | Description | Index |
structural_variation_id | int(10) | | Primary key, internal identifier. | primary key |
seq_region_id | int(10) | | Foreign key references seq_region in core db. Refers to the seq_region which this variant is on, which may be a chromosome, a clone, etc... | key: pos_idx |
seq_region_start | int(11) | | The start position of the variation on the seq_region. | key: pos_idx |
seq_region_end | int(11) | | The end position of the variation on the seq_region. | key: pos_idx |
seq_region_strand | tinyint(4) | | The orientation of the variation on the seq_region. | |
variation_name | varchar(255) | NULL | The external identifier or name of the variation. e.g. "esv9549". | key: name_idx |
source_id | int(10) | | Foreign key references to the source table. | |
study_id | int(10) | NULL | Foreign key references to the study table. | key: study_idx |
class_attrib_id | int(10) | 0 | Foreign key references to the attrib table. Defines the type of structural variant. | key: attrib_idx |
inner_start | int(11) | NULL | The 5' inner bound defined for the feature on the seq_region. | |
inner_end | int(11) | NULL | The 3' inner bound defined for the feature on the seq_region. | |
allele_string | longtext | | The variant allele, where known. | |
validation_status | ENUM('validated','not validated','high quality') | | Validation status of the variant. | |
study
This table contains details of the studies. Study information can come from internal studies (DGVa, EGA) or from external studies (UniProt, NHGRI, ...).
Column | Type | Default value | Description | Index |
study_id | int(10) | | Primary key, internal identifier. | primary key |
source_id | int(10) | | Foreign key references to the source table. | key: source_idx |
name | varchar(255) | NULL | Name of the study. e.g. "EGAS00000000001" | |
description | varchar(255) | NULL | Description of the study. | |
url | varchar(255) | NULL | URL to find the study data (http or ftp). | |
external_reference | varchar(255) | NULL | The pubmed/id or project name associated with this study. | |
study_type | set('GWAS') | | Indicates whether the study is a genome-wide association study (GWAS). | |
subsnp_handle
This table contains the SubSNP (ss) ID and the name of the dbSNP submitter handle.
Column | Type | Default value | Description | Index |
subsnp_id | int(11) | | Primary key. It corresponds to the subsnp identifier (ssID) from dbSNP. This ssID is stored in this table without the "ss" prefix. e.g. "120258606" instead of "ss120258606". | primary key |
handle | varchar(20) | | The dbSNP handle of the submitter of this ssID. | |
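Because subsnp_id stores the ssID without its "ss" prefix, converting between dbSNP's form and the stored integer is a small string operation. A sketch with hypothetical helper names:

```python
def ss_to_subsnp_id(ss_id: str) -> int:
    """'ss120258606' -> 120258606, the form stored in subsnp_handle.subsnp_id."""
    if not ss_id.startswith("ss") or not ss_id[2:].isdigit():
        raise ValueError(f"not a dbSNP ssID: {ss_id!r}")
    return int(ss_id[2:])


def subsnp_id_to_ss(subsnp_id: int) -> str:
    """120258606 -> 'ss120258606', the form used by dbSNP."""
    return f"ss{subsnp_id}"


print(ss_to_subsnp_id("ss120258606"))  # 120258606
print(subsnp_id_to_ss(120258606))      # ss120258606
```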
supporting_structural_variation
This table stores the name of the supporting evidence for the structural variants (e.g. DGVa structural variants).
Column | Type | Default value | Description | Index |
supporting_structural_variation_id | int(10) | | Primary key, internal identifier. | primary key |
name | varchar(255) | | The identifier or name of the supporting evidence. | |
structural_variation_id | int(10) | | Foreign key references to the structural_variation table. | key: structural_variation_idx |
tagged_variation_feature
This table lists variation features that are tagged by another variation feature. Tag pairs are defined as having an r2 > 0.99.
Column | Type | Default value | Description | Index |
variation_feature_id | INT(10) | | Primary key. Foreign key references to the variation_feature table. | primary key |
sample_id | INT(10) | | Primary key. Foreign key references to the sample table. | primary key |
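The r2 threshold refers to the usual linkage disequilibrium measure between two biallelic markers, r2 = D^2 / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA*pB. This page does not say how Ensembl estimates the haplotype frequencies that feed into it, so the sketch below simply takes them as inputs and applies the 0.99 cut-off; the function names are hypothetical:

```python
def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Standard LD r^2 between two biallelic markers.

    p_ab: frequency of haplotypes carrying allele A at marker 1 and B at marker 2
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either marker is monomorphic")
    return d * d / denom


def is_tag_pair(p_ab: float, p_a: float, p_b: float, threshold: float = 0.99) -> bool:
    """True when r^2 exceeds the tagging threshold stated for this table."""
    return r_squared(p_ab, p_a, p_b) > threshold


print(r_squared(0.5, 0.5, 0.5))   # 1.0  (complete LD: only AB and ab haplotypes)
print(r_squared(0.25, 0.5, 0.5))  # 0.0  (markers independent)
```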
tmp_individual_genotype_single_bp
This table is only needed to create the master schema when running the healthcheck system. It is needed for species other than human, so it is kept.
Column | Type | Default value | Description | Index |
variation_id | int(10) | | Primary key. Foreign key references to the variation table. | key: variation_idx |
subsnp_id | int(15) | | Foreign key references to the subsnp_handle table. | key: subsnp_idx |
allele_1 | char(1) | | One of the alleles of the genotype, e.g. "A". | |
allele_2 | char(1) | | The other allele of the genotype. | |
sample_id | int | | Foreign key references to the individual table. | key: sample_idx |
transcript_variation
This table relates a single allele of a variation_feature to a transcript (see Core documentation). It contains the consequence of the allele, e.g. intron_variant, non_synonymous_codon, stop_lost, etc., along with the change in amino acid in the resulting protein if applicable.
Column | Type | Default value | Description | Index |
transcript_variation_id | int(11) | | Primary key, internal identifier. | primary key |
feature_stable_id | varchar(128) | NULL | Foreign key to core databases. Unique stable id of related transcript. | key: feature_idx key: somatic_feature_idx |
variation_feature_id | int(11) | | Foreign key references to the variation_feature table. | key: variation_feature_idx |
allele_string | text | | Shows the reference sequence and variant sequence of this allele | |
somatic | tinyint(1) | 0 | Flags if the associated variation is known to be somatic | key: somatic_feature_idx |
consequence_types | set ( 'splice_acceptor_variant', 'splice_donor_variant', 'complex_change_in_transcript', 'stop_lost', 'coding_sequence_variant', 'non_synonymous_codon', 'stop_gained', 'synonymous_codon', 'frameshift_variant', 'nc_transcript_variant', 'mature_miRNA_variant', 'NMD_transcript_variant', '5_prime_UTR_variant', '3_prime_UTR_variant', 'incomplete_terminal_codon_variant', 'intron_variant', 'splice_region_variant', '5KB_downstream_variant', '500B_downstream_variant', '5KB_upstream_variant', '2KB_upstream_variant', 'initiator_codon_change', 'stop_retained_variant', 'inframe_codon_gain', 'inframe_codon_loss', 'pre_miRNA_variant' ) | | The consequence(s) of the variant allele on this transcript. | key: consequence_type_idx |
cds_start | int(11) | | The start position of the variation in CDS coordinates. | |
cds_end | int(11) | | The end position of the variation in CDS coordinates. | |
cdna_start | int(11) | | The start position of the variation in cDNA coordinates. | |
cdna_end | int(11) | | The end position of the variation in cDNA coordinates. | |
translation_start | int(11) | | The start position of the variation on the peptide. | |
translation_end | int(11) | | The end position of the variation on the peptide. | |
codon_allele_string | text | | The reference and variant codons | |
pep_allele_string | text | | The reference and variant peptides | |
hgvs_genomic | text | | HGVS representation of this allele with respect to the genomic sequence | |
hgvs_coding | text | | HGVS representation of this allele with respect to the CDS | |
hgvs_protein | text | | HGVS representation of this allele with respect to the protein | |
polyphen_prediction | enum('unknown', 'benign', 'possibly damaging', 'probably damaging') | NULL | The PolyPhen prediction for the effect of this allele on the protein | |
sift_prediction | enum('tolerated', 'deleterious') | NULL | The SIFT prediction for the effect of this allele on the protein | |
variation
This is the schema's generic representation of a variation, defined as a genetic feature that varies between individuals of the same species. The most common type is the single nucleotide polymorphism (SNP), though the schema also accommodates copy number variations (CNVs) and structural variations (SVs). A variation is defined by its flanking sequence rather than its mapped location on a chromosome; a variation may in fact have multiple mappings across a genome. This table stores a variation's name (commonly an ID of the form rs123456, assigned by dbSNP), along with a validation status and ancestral (or reference) allele.
Column | Type | Default value | Description | Index |
variation_id | int(10) | | Primary key, internal identifier. | primary key |
source_id | int(10) | | Foreign key references to the source table. | key: source_idx |
name | varchar(255) | | Name of the variation. e.g. "rs1333049". | unique |
validation_status | SET('cluster','freq', 'submitter','doublehit', 'hapmap','1000Genome', 'failed','precious') | | Variant discovery method and validation from dbSNP. | |
ancestral_allele | varchar(255) | NULL | Taken from dbSNP to show ancestral allele for the variation. | |
flipped | tinyint(1) | NULL | This is set to 1 if the variant is flipped from the negative to the positive strand during import. | |
class_attrib_id | int(10) | 0 | Class of the variation, key into the attrib table | |
somatic | tinyint(1) | 0 | Flags whether this variation is known to be somatic or not | |
variation_annotation
This table stores information linking genotypes and phenotypes. It stores various fields pertaining to the study conducted, along with the associated gene, risk allele frequency and a p-value.
Column | Type | Default value | Description | Index |
variation_annotation_id | int(10) | | Primary key, internal identifier. | primary key |
variation_id | int(10) | | Foreign key references to the variation table. | key: variation_idx |
phenotype_id | int(10) | | Foreign key references to the phenotype table. | key: phenotype_idx |
study_id | int(10) | | Foreign key references to the study table. | key: study_idx |
associated_gene | varchar(255) | NULL | Common name(s) of the gene(s) associated with the variation. | |
associated_variant_risk_allele | varchar(255) | NULL | Allele associated with the phenotype. | |
variation_names | varchar(255) | NULL | Name of the variation. e.g. "rs1333049". | |
risk_allele_freq_in_controls | double | NULL | Risk allele frequency in controls. | |
p_value | double | NULL | P-value of the phenotype/variation association. | |
variation_feature
This table represents mappings of variations to genomic locations. It stores an allele string representing the different possible alleles that are found at that locus e.g. "A/T" for a SNP, as well as a "worst case" consequence of the mutation. It also acts as part of the relationship between variations and transcripts.
Column | Type | Default value | Description | Index |
variation_feature_id | int(10) | | Primary key, internal identifier. | primary key |
seq_region_id | int(10) | | Foreign key references seq_region in core db. Refers to the seq_region which this variant is on, which may be a chromosome, a clone, etc... | key: pos_idx |
seq_region_start | int | | The start position of the variation on the seq_region. | key: pos_idx |
seq_region_end | int | | The end position of the variation on the seq_region. | key: pos_idx |
seq_region_strand | tinyint | | The orientation of the variation on the seq_region. | |
variation_id | int(10) | | Foreign key references to the variation table. | key: variation_idx |
allele_string | varchar(50000) | | This is a denormalised string taken from the alleles in the allele table associated with this variation. The reference allele (i.e. the one on the reference genome) comes first. | |
variation_name | varchar(255) | | A denormalisation taken from the variation table. This is the name or identifier that is used for displaying the feature. | |
map_weight | int | | The number of times that this variation has mapped to the genome. This is a denormalisation as this particular feature is one example of a mapped location. This can be used to limit the features that come back from a query. | |
flags | SET('genotyped') | | Flag to filter the selection of variations. | |
source_id | int(10) | | Foreign key references to the source table. | |
validation_status | SET('cluster', 'freq', 'submitter', 'doublehit', 'hapmap', '1000Genome', 'precious') | | Variant discovery method and validation status from dbSNP. | |
consequence_type | SET ( 'intergenic_variant', 'splice_acceptor_variant', 'splice_donor_variant', 'complex_change_in_transcript', 'stop_lost', 'coding_sequence_variant', 'non_synonymous_codon', 'stop_gained', 'synonymous_codon', 'frameshift_variant', 'nc_transcript_variant', 'mature_miRNA_variant', 'NMD_transcript_variant', '5_prime_UTR_variant', '3_prime_UTR_variant', 'incomplete_terminal_codon_variant', 'intron_variant', 'splice_region_variant', '5KB_downstream_variant', '500B_downstream_variant', '5KB_upstream_variant', '2KB_upstream_variant', 'initiator_codon_change', 'stop_retained_variant', 'inframe_codon_gain', 'inframe_codon_loss', 'miRNA_target_site_variant', 'pre_miRNA_variant', 'regulatory_region_variant', 'increased_binding_affinity', 'decreased_binding_affinity', 'binding_site_variant' ) | 'intergenic_variant' | The SO term(s) representing the "worst" consequence(s) of the variation in a transcript or regulatory region. | |
variation_set_id | SET ( '1','2','3','4','5','6','7','8', '9','10','11','12','13','14','15','16', '17','18','19','20','21','22','23','24', '25','26','27','28','29','30','31','32', '33','34','35','36','37','38','39','40', '41','42','43','44','45','46','47','48', '49','50','51','52','53','54','55','56', '57','58','59','60','61','62','63','64' ) | '' | The variation set(s) to which this variation feature belongs. A feature can belong to several sets; the empty default means none. | key: variation_set_idx |
class_attrib_id | int(10) | 0 | Class of the variation. Foreign key references to the attrib table. | |
somatic | tinyint(1) | 0 | Flags whether this variation_feature is somatic (1) or germline (0). | |
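A minimal Python sketch of how a client might use three of these columns once rows have been fetched: splitting allele_string on "/" (assuming, as stated above, that the reference allele comes first), keeping only features with map_weight = 1 (variations that map to a single location), and parsing variation_set_id. MySQL returns a SET value as a comma-separated string of its members and limits a SET column to 64 members, which matches the 64 set ids listed above. The VariationFeature class, the helper names and the rows are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VariationFeature:
    # Minimal stand-in for a variation_feature row (only the columns used here).
    variation_name: str
    allele_string: str
    map_weight: int
    variation_set_id: str  # SET value as returned by MySQL, e.g. "1,5" or ""


def split_alleles(allele_string: str) -> Tuple[str, List[str]]:
    """Split e.g. "C/G/T" into ("C", ["G", "T"]): reference allele first."""
    alleles = allele_string.split("/")
    return alleles[0], alleles[1:]


def set_ids(set_value: str) -> List[int]:
    """Parse a SET value such as "1,5,12" into [1, 5, 12]; "" means no sets."""
    return [int(x) for x in set_value.split(",") if x]


def uniquely_mapped(features: List[VariationFeature]) -> List[VariationFeature]:
    """Keep features whose variation maps to exactly one genomic location."""
    return [f for f in features if f.map_weight == 1]


# Made-up rows for illustration.
features = [
    VariationFeature("var_1", "A/T", 1, "1,5"),
    VariationFeature("var_2", "C/G/T", 3, ""),
    VariationFeature("var_3", "G/A", 1, "12"),
]
for f in uniquely_mapped(features):
    ref, alts = split_alleles(f.allele_string)
    print(f.variation_name, ref, alts, set_ids(f.variation_set_id))
```

In SQL, the same map_weight filter would typically be applied in the WHERE clause so that multiply-mapped features are never transferred to the client.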
variation_group
This table represents the equivalent of a variation for a multi-marker variation.
Column | Type | Default value | Description | Index |
variation_group_id | int(10) | | Primary key, internal identifier. | primary key |
name | varchar(255) | | The code or name of this variation_group. | unique |
source_id | int(10) | | Foreign key references to the source table. | |
type | enum('haplotype', 'tag') | | Type of the variation group. | |
variation_group_feature
This table represents the equivalent of a variation_feature for multi-marker variations, mapping a haplotype to a chromosomal coordinate system.
Column | Type | Default value | Description | Index |
variation_group_feature_id | int(10) | | Primary key, internal identifier. | primary key |
seq_region_id | int(10) | | Foreign key references seq_region in core db. Refers to the seq_region which this variant is on, which may be a chromosome, a clone, etc. | key: pos_idx |
seq_region_start | int | | The start position of the variation on the seq_region. | key: pos_idx |
seq_region_end | int | | The end position of the variation on the seq_region. | |
seq_region_strand | tinyint | | The orientation of the variation on the seq_region. | |
variation_group_id | int(10) | | Foreign key references to the variation_group table. | key: variation_group_idx |
variation_group_name | varchar(255) | | Name of the variation group. | |
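A common query on this table is "which multi-marker features overlap a region?". The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so the SQL is standard but untested against a real Ensembl database. It assumes start and end coordinates are inclusive. Only the table and column names come from the schema above; the rows and the overlapping_groups() helper are made up. The condition on seq_region_id and seq_region_start can use an index like pos_idx, while seq_region_end, which is not indexed here, is checked row by row.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory copy of variation_group_feature with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE variation_group_feature (
               variation_group_feature_id INTEGER PRIMARY KEY,
               seq_region_id INTEGER,
               seq_region_start INTEGER,
               seq_region_end INTEGER,
               seq_region_strand INTEGER,
               variation_group_id INTEGER,
               variation_group_name TEXT
           )"""
    )
    # Mirrors pos_idx from the table above (seq_region_id, seq_region_start).
    conn.execute(
        "CREATE INDEX pos_idx ON variation_group_feature (seq_region_id, seq_region_start)"
    )
    conn.executemany(
        "INSERT INTO variation_group_feature VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 10, 100, 500, 1, 1, "hap_A"),
            (2, 10, 450, 900, 1, 2, "hap_B"),
            (3, 10, 1000, 1200, 1, 3, "hap_C"),
            (4, 11, 100, 500, 1, 4, "hap_D"),
        ],
    )
    return conn


def overlapping_groups(conn: sqlite3.Connection, seq_region_id: int,
                       start: int, end: int) -> List[str]:
    """Names of features overlapping [start, end] (inclusive) on one seq_region."""
    sql = """
        SELECT variation_group_name
        FROM variation_group_feature
        WHERE seq_region_id = ?
          AND seq_region_start <= ?   -- feature starts before the window ends
          AND seq_region_end >= ?     -- feature ends after the window starts
        ORDER BY seq_region_start
    """
    return [row[0] for row in conn.execute(sql, (seq_region_id, end, start))]


conn = make_demo_db()
print(overlapping_groups(conn, 10, 501, 999))  # ['hap_B']
```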
variation_group_variation
This table represents an individual variation that makes up a multi-marker variation, and resolves the many-to-many relationship between variation and variation_group.
Column | Type | Default value | Description | Index |
variation_id | int(10) | | Foreign key references to the variation table. | unique key: variation_idx |
variation_group_id | int(10) | | Foreign key references to the variation_group table. | unique key: variation_idx |
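Because variation_group_variation resolves a many-to-many relationship, reaching from a group to its variations (or back) takes two joins through the link table. The sketch below uses Python's sqlite3 as a stand-in for MySQL. The variation table is reduced to a hypothetical two-column version (variation_id, name) for illustration, the helper names are invented, and the rows are made up.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE variation_group (
    variation_group_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE,
    type TEXT CHECK (type IN ('haplotype', 'tag'))
);
-- Link table: one row per (variation, group) pair, unique like variation_idx above.
CREATE TABLE variation_group_variation (
    variation_id INTEGER REFERENCES variation (variation_id),
    variation_group_id INTEGER REFERENCES variation_group (variation_group_id),
    UNIQUE (variation_id, variation_group_id)
);
"""


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO variation VALUES (?, ?)",
                     [(1, "var_1"), (2, "var_2"), (3, "var_3")])
    conn.executemany("INSERT INTO variation_group VALUES (?, ?, ?)",
                     [(1, "hap_A", "haplotype"), (2, "tag_B", "tag")])
    # var_2 belongs to both groups: the many-to-many case.
    conn.executemany("INSERT INTO variation_group_variation VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2), (3, 2)])
    return conn


def group_members(conn: sqlite3.Connection, group_name: str) -> List[str]:
    """Names of the variations that make up one variation_group."""
    sql = """
        SELECT v.name
        FROM variation_group AS g
        JOIN variation_group_variation AS gv ON gv.variation_group_id = g.variation_group_id
        JOIN variation AS v ON v.variation_id = gv.variation_id
        WHERE g.name = ?
        ORDER BY v.name
    """
    return [r[0] for r in conn.execute(sql, (group_name,))]


def groups_of(conn: sqlite3.Connection, variation_name: str) -> List[str]:
    """Names of every variation_group that a variation takes part in."""
    sql = """
        SELECT g.name
        FROM variation AS v
        JOIN variation_group_variation AS gv ON gv.variation_id = v.variation_id
        JOIN variation_group AS g ON g.variation_group_id = gv.variation_group_id
        WHERE v.name = ?
        ORDER BY g.name
    """
    return [r[0] for r in conn.execute(sql, (variation_name,))]


conn = make_demo_db()
print(group_members(conn, "hap_A"))  # ['var_1', 'var_2']
print(groups_of(conn, "var_2"))      # ['hap_A', 'tag_B']
```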
variation_set
This table contains the names of the sets and subsets of variations stored in the database. Each set usually corresponds to the project or subproject in which a group of variations was identified.
Column | Type | Default value | Description | Index |
variation_set_id | int(10) | | Primary key, internal identifier. | primary key |
name | VARCHAR(255) | | Name of the set e.g. "Phenotype-associated variations". | key: name_idx |
description | TEXT | | Description of the set. | |
short_name_attrib_id | INT(10) | NULL | Foreign key references to the attrib table. Short name used for web display. | |
variation_set_structure
This table stores the hierarchical relationships between variation sets: each row links a parent set (variation_set_super) to one of its subsets (variation_set_sub).
Column | Type | Default value | Description | Index |
variation_set_super | int(10) | | Primary key. Foreign key references to the variation_set table. | primary key; key: sub_idx |
variation_set_sub | int(10) | | Primary key. Foreign key references to the variation_set table. | primary key; key: sub_idx |
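Since each row holds only one parent/child link, finding every set nested under a given set (subsets of subsets, and so on) requires walking the hierarchy. A minimal Python sketch using a breadth-first search over (variation_set_super, variation_set_sub) pairs is shown below, assuming the pairs have already been fetched. The function name and set ids are hypothetical. On a database that supports them, a recursive common table expression could do the same walk in SQL.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def all_subsets(pairs: Iterable[Tuple[int, int]], root: int) -> Set[int]:
    """Return the ids of every set nested, at any depth, below `root`.

    `pairs` holds (variation_set_super, variation_set_sub) rows.
    """
    children = defaultdict(list)
    for super_id, sub_id in pairs:
        children[super_id].append(sub_id)

    seen: Set[int] = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:  # also guards against accidental cycles
                seen.add(child)
                queue.append(child)
    return seen


# Made-up hierarchy: set 1 contains 2 and 3; set 3 contains 4; set 5 contains 6.
pairs = [(1, 2), (1, 3), (3, 4), (5, 6)]
print(sorted(all_subsets(pairs, 1)))  # [2, 3, 4]
```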
variation_set_variation
A table for mapping variations to variation_sets.
Column | Type | Default value | Description | Index |
variation_id | int(10) | | Primary key. Foreign key references to the variation table. | primary key; key: variation_set_idx |
variation_set_id | int(10) | | Primary key. Foreign key references to the variation_set table. | primary key; key: variation_set_idx |
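The composite primary key on (variation_id, variation_set_id) means a variation can belong to many sets but appears at most once per set. The sketch below, using Python's sqlite3 as a stand-in for MySQL, looks up the names of the sets a variation belongs to and shows a duplicate membership row being rejected. Column lists are trimmed to what the example needs; the helper name and rows are made up.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variation_set (
            variation_set_id INTEGER PRIMARY KEY,
            name TEXT,
            description TEXT
        );
        CREATE TABLE variation_set_variation (
            variation_id INTEGER,
            variation_set_id INTEGER REFERENCES variation_set (variation_set_id),
            PRIMARY KEY (variation_id, variation_set_id)
        );
    """)
    conn.executemany("INSERT INTO variation_set VALUES (?, ?, ?)",
                     [(1, "set_A", "made-up set"), (2, "set_B", "made-up set")])
    conn.executemany("INSERT INTO variation_set_variation VALUES (?, ?)",
                     [(100, 1), (100, 2), (101, 2)])
    return conn


def set_names_for(conn: sqlite3.Connection, variation_id: int) -> List[str]:
    """Names of all variation sets containing the given variation."""
    sql = """
        SELECT s.name
        FROM variation_set_variation AS sv
        JOIN variation_set AS s ON s.variation_set_id = sv.variation_set_id
        WHERE sv.variation_id = ?
        ORDER BY s.name
    """
    return [r[0] for r in conn.execute(sql, (variation_id,))]


conn = make_demo_db()
print(set_names_for(conn, 100))  # ['set_A', 'set_B']

# The composite primary key rejects a duplicate membership row.
try:
    conn.execute("INSERT INTO variation_set_variation VALUES (100, 1)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```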
variation_synonym
This table allows for a variation to have multiple IDs, generally given by multiple sources.
Column | Type | Default value | Description | Index |
variation_synonym_id | int(10) | | Primary key, internal identifier. | primary key |
variation_id | int(10) | | Foreign key references to the variation table. | key: variation_idx |
subsnp_id | int(15) | | Foreign key references to the subsnp_handle table. | key: subsnp_idx |
source_id | int(10) | | Foreign key references to the source table. | unique key: source_idx |
name | varchar(255) | | Name of the synonym, e.g. 'rs1333049'. The variation ID corresponding to this name differs from the one stored in the variation_id column. | unique |
moltype | varchar(50) | | ... | |
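A typical use of this table is resolving a user-supplied identifier that may be either a variation's primary name or one of its synonyms. The sketch below, using Python's sqlite3 as a stand-in for MySQL, checks a hypothetical two-column variation table first and then falls back to variation_synonym. The composite unique key on (name, source_id) is an assumption read from the Index column above. A synonym name could in principle map to more than one variation, so the helper returns a list. Helper names and rows are made up.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE variation (variation_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE variation_synonym (
            variation_synonym_id INTEGER PRIMARY KEY,
            variation_id INTEGER REFERENCES variation (variation_id),
            subsnp_id INTEGER,
            source_id INTEGER,
            name TEXT,
            moltype TEXT,
            UNIQUE (name, source_id)  -- assumed composite unique key
        );
    """)
    conn.executemany("INSERT INTO variation VALUES (?, ?)",
                     [(1, "var_1"), (2, "var_2")])
    conn.executemany(
        "INSERT INTO variation_synonym VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, None, 10, "syn_X", None), (2, 2, None, 11, "syn_Y", None)],
    )
    return conn


def resolve(conn: sqlite3.Connection, name: str) -> List[int]:
    """Return variation_ids for `name`, trying primary names before synonyms."""
    row = conn.execute(
        "SELECT variation_id FROM variation WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return [row[0]]
    rows = conn.execute(
        "SELECT DISTINCT variation_id FROM variation_synonym "
        "WHERE name = ? ORDER BY variation_id",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]


conn = make_demo_db()
print(resolve(conn, "syn_Y"))  # [2]
```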
Colour legend
Other tables
Tables containing individual data
Tables containing sets of variations
Tables containing source and study data
Tables containing metadata
Tables containing "failed" data
Tables containing attribute data
Tables concerning protein data