eHive production system - Installation

Download and install the necessary external software

Note: You may have these packages already installed in your system.

  1. Perl 5.6 or higher, since eHive code is written in Perl.
  2. MySQL 5.1 or higher
    eHive keeps its state in a MySQL database, so you will need a MySQL server; version 5.1 or higher is recommended to maintain compatibility with Compara pipelines.
  3. Perl DBI API
    The Perl database interface (DBI), together with its MySQL driver (DBD::mysql)
  4. Perl UUID API
    eHive uses Universally Unique Identifiers (UUIDs) to identify workers internally.
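Before going further, it may help to confirm that the prerequisites above are in place. The following bash sketch checks the Perl version and tries to load the required Perl modules. It assumes the UUID API is provided by the CPAN module Data::UUID; this page does not name the module, so treat that as an assumption and set UUID_MODULE to override it. The MySQL client version is only printed, not parsed, because the output of "mysql --version" differs between releases. The function names are hypothetical helpers, not part of eHive.

```shell
#!/usr/bin/env bash
# Sketch: check the external prerequisites listed above.
# Assumption: the Perl UUID API is provided by Data::UUID (override via UUID_MODULE).
UUID_MODULE="${UUID_MODULE:-Data::UUID}"

# Succeeds if dotted version $1 >= $2, comparing fields numerically (5.10 > 5.6).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the running Perl version as a dotted string, e.g. 5.8.8.
perl_version() {
  perl -e 'printf "%vd\n", $^V'
}

# Succeeds if Perl can load module $1.
has_perl_module() {
  perl -M"$1" -e 1 2>/dev/null
}

check_prereqs() {
  local status=0 pv mod
  if ! pv="$(perl_version 2>/dev/null)"; then
    echo "perl: not found"
    return 1
  fi
  if version_ge "$pv" 5.6; then
    echo "perl $pv: ok"
  else
    echo "perl $pv: too old (5.6 or higher needed)"
    status=1
  fi
  # Only report the MySQL client version; check it against 5.1 by eye.
  if command -v mysql >/dev/null 2>&1; then
    echo "mysql client: $(mysql --version)"
  else
    echo "mysql client: not found"
    status=1
  fi
  for mod in DBI DBD::mysql "$UUID_MODULE"; do
    if has_perl_module "$mod"; then
      echo "$mod: ok"
    else
      echo "$mod: MISSING"
      status=1
    fi
  done
  return "$status"
}
```

Source the file and run check_prereqs; it returns a non-zero status if anything is missing or too old.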

Download and install essential and optional packages from BioPerl and EnsEMBL CVS

  1. Create a directory for the source code.
    It is advisable to have a dedicated directory into which EnsEMBL-related packages will be deployed. Unlike the DBI or UUID modules, which can be installed system-wide by a system administrator, the EnsEMBL packages are best installed under your home directory, where you have full (read+write) access to their files and directories. For example,
    $ mkdir $HOME/ensembl_main
    It will be convenient to set a variable pointing at this directory for future use (bash syntax shown):
    $ export ENS_CODE_ROOT=$HOME/ensembl_main
  2. Change into your ensembl codebase directory:
    $ cd $ENS_CODE_ROOT
  3. Log into the BioPerl CVS server (enter "cvs" as the password):
    $ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl login
  4. Export the bioperl-live package:
    $ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl export bioperl-live
  5. Log into the EnsEMBL CVS server at Sanger (enter "CVSUSER" as the password):
    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl login
    Logging in to :pserver:cvsuser@cvs.sanger.ac.uk:2401/cvsroot/ensembl
    CVS password: CVSUSER
  6. Export the ensembl and ensembl-hive CVS modules:
    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl export ensembl
    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl export ensembl-hive
  7. If, as is likely, you are going to use eHive in the context of Compara pipelines, you will also need to export ensembl-compara:
    $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl export ensembl-compara
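  8. Add the new packages to the PERL5LIB variable (bash syntax shown; the ensembl-compara entry is only needed if you exported it in step 7):
    $ export PERL5LIB=$ENS_CODE_ROOT/bioperl-live:$ENS_CODE_ROOT/ensembl/modules:$ENS_CODE_ROOT/ensembl-hive/modules:$ENS_CODE_ROOT/ensembl-compara/modules${PERL5LIB:+:$PERL5LIB}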
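Steps 1 and 8 can also be kept in a small bash snippet that is safe to source from ~/.bashrc more than once. This is a sketch, not part of eHive: the function names prepend_to and setup_ehive_env are hypothetical, and it assumes the layout created by the export commands above (Perl modules under each EnsEMBL package's modules/ directory, bioperl-live used as is). Directories that do not exist are skipped, so ensembl-compara is simply left out if step 7 was not done. It also puts ensembl-hive/scripts on PATH, as suggested in the next section.

```shell
#!/usr/bin/env bash
# Sketch: idempotent environment setup for the checkout above.
# Assumptions: bash; packages exported under ENS_CODE_ROOT as in steps 1-7.
# The function names are hypothetical helpers, not part of eHive.

# Prepend directory $2 to the colon-separated variable named $1
# unless it is already present, so sourcing this file twice is harmless.
prepend_to() {
  local var="$1" dir="$2" current
  current="${!var}"
  case ":$current:" in
    *":$dir:"*) ;;  # already present: leave unchanged
    *) printf -v "$var" '%s' "$dir${current:+:$current}" ;;
  esac
  export "$var"
}

setup_ehive_env() {
  local root="${ENS_CODE_ROOT:-$HOME/ensembl_main}" dir
  # Listed last-to-first because each call prepends; missing packages
  # (e.g. ensembl-compara if step 7 was skipped) are silently skipped.
  for dir in ensembl-compara/modules ensembl-hive/modules ensembl/modules bioperl-live; do
    if [ -d "$root/$dir" ]; then
      prepend_to PERL5LIB "$root/$dir"
    fi
  done
  # Optional: put the eHive control scripts on PATH.
  if [ -d "$root/ensembl-hive/scripts" ]; then
    prepend_to PATH "$root/ensembl-hive/scripts"
  fi
}
```

Saved as, say, $HOME/ehive_env.sh (a hypothetical file name), it can be loaded from ~/.bashrc with:
    $ source $HOME/ehive_env.sh && setup_ehive_env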

Useful files and directories of the eHive repository

  1. In ensembl-hive/scripts we keep the Perl scripts used to control pipelines. Adding this directory to your $PATH may make your life easier.
  2. In ensembl-hive/modules/Bio/EnsEMBL/Hive/PipeConfig we keep example pipeline configuration modules that can be used by init_pipeline.pl. A PipeConfig is a parametric module that defines the structure of the pipeline: which analyses have to be run, with what parameters, and in which order. The code for each analysis is contained in a RunnableDB module. Some tasks require bespoke RunnableDBs to be written, whereas other problems can be solved using only 'universal building blocks'. A typical pipeline is a mixture of both.
  3. In ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB we keep 'universal building block' RunnableDBs:
  4. In ensembl-hive/modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult we keep bespoke RunnableDBs for the long multiplication example pipeline.
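To see which PipeConfig and RunnableDB modules a particular checkout actually provides, the directories above can be listed as Perl package names. This is a minimal bash sketch; the function names are hypothetical, and it assumes the checkout lives under ENS_CODE_ROOT as set up earlier. It looks only one level deep, so "RunnableDB" lists the universal building blocks while "RunnableDB/LongMult" lists the long multiplication ones.

```shell
#!/usr/bin/env bash
# Sketch: list the Perl packages found in one eHive module directory.
# Assumes the checkout lives under ENS_CODE_ROOT as set up earlier;
# the function names are hypothetical, not part of eHive.

# Converts ".../ensembl-hive/modules/Bio/EnsEMBL/Hive/X/Y.pm" into "Bio::EnsEMBL::Hive::X::Y".
pm_path_to_package() {
  local rel="${1#*/ensembl-hive/modules/}"  # drop everything up to the modules root
  rel="${rel%.pm}"                          # drop the .pm suffix
  printf '%s\n' "${rel//\//::}"             # turn every "/" into "::"
}

# $1 is a directory relative to Bio/EnsEMBL/Hive, e.g. PipeConfig,
# RunnableDB or RunnableDB/LongMult. Only that directory's own .pm
# files are listed (no recursion into subdirectories).
list_hive_modules() {
  local base="${ENS_CODE_ROOT:-$HOME/ensembl_main}/ensembl-hive/modules/Bio/EnsEMBL/Hive"
  local f
  find "$base/$1" -maxdepth 1 -type f -name '*.pm' | sort | while IFS= read -r f; do
    pm_path_to_package "$f"
  done
}
```

For example:
    $ list_hive_modules PipeConfig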
