Maintaining a consistent development environment
1) Ensure all development is performed within a virtualenv. A good way to
bootstrap this is via virtualenv-burrito.
Execute the installation using:
$ curl -sL https://raw.githubusercontent.com/brainsik/virtualenv-burrito/master/virtualenv-burrito.sh | $SHELL
2) Make a virtualenv called BanzaiDB:
$ mkvirtualenv BanzaiDB
3) Install autoenv:
$ git clone https://github.com/kennethreitz/autoenv.git ~/.autoenv
$ echo 'source ~/.autoenv/activate.sh' >> ~/.bashrc
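With autoenv installed, a .env file in the project root can activate the virtualenv each time you cd into the checkout. The following is a minimal sketch, not part of the BanzaiDB repository. It assumes the virtualenv is named BanzaiDB (as in step 2), that virtualenvwrapper's workon command is available (virtualenv-burrito sets it up), and that you run the command from the root of your BanzaiDB checkout:

```shell
# Write a one-line .env file; autoenv sources it when you cd into this directory.
# Assumption: `workon` (virtualenvwrapper) is loaded before autoenv in ~/.bashrc.
echo 'workon BanzaiDB' > .env
```

Depending on the autoenv version, you may be asked to approve the new .env file the first time it is sourced.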
Familiarise yourself with the code
BanzaiDB/BanzaiDB.py is the core module. It handles database insertion,
deletion and updating.
For example:
$ ~/BanzaiDB/BanzaiDB$ python BanzaiDB.py -h
usage: BanzaiDB.py [-h] [-v] {init,populate,update,query} ...

BanzaiDB v0.1 - Database for Banzai NGS pipeline tool (http://github.com/mscook/BanzaiDB)

positional arguments:
  {init,populate,update,query}
                        Available commands:
    init                Initialise a DB
    populate            Populates a database with results of an experiment
    update              Updates a database with results from a new experiment
    query               List available or provide database query functions

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         verbose output

Licence: ECL 2.0 by Mitchell Stanton-Cook <m.stantoncook@gmail.com>
Listing help on populate:
$ python BanzaiDB.py populate -h
usage: BanzaiDB.py populate [-h] {qc,mapping,assembly,ordering,annotation} run_path

positional arguments:
  {qc,mapping,assembly,ordering,annotation}
                        Populate the database with data from the given pipeline step
  run_path              Full path to a directory containing finished experiments from a pipeline run

optional arguments:
  -h, --help            show this help message and exit
The fabfile (Fabric file) in the fabfile directory contains pre-written query
functions.
You can list them like this:
$ ~/BanzaiDB$ fab -l
Available commands:

    variants.get_variants_by_keyword           Return variants with a match in the "Product" with the regular_expression
    variants.get_variants_in_range             Return all the variants in given [start:end] range (inclusive of)
    variants.plot_variant_positions            Generate a PDF of SNP positions for given strains using GenomeDiagram
    variants.strain_variant_stats              Print the number of variants and variant classes for all strains
    variants.variant_hotspots                  Return the (default = 100) prevalent variant positions
    variants.variant_positions_within_atleast  Return positions that have at least this many variants
    variants.what_differentiates_strains       Provide variant positions that differentiate two given sets of strains
Note: python BanzaiDB.py query simply calls the fabfile discussed above.
Development workflow
Use GitHub. You will have already cloned the BanzaiDB repo (if you followed
the instructions above). To make things easier, please fork the repo
(https://github.com/mscook/BanzaiDB/fork) and update your local copy to point to
your fork.
Something like this:
$ # Assuming your fork is like this
$ # https://github.com/$YOUR_USERNAME/BanzaiDB/
$ vi .git/config
$ # Replace:
$ # url = git@github.com:mscook/BanzaiDB.git
$ # with:
$ # url = git@github.com:$YOUR_USERNAME/BanzaiDB.git
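The same change can be made without opening .git/config by hand, using git remote set-url. A minimal sketch follows. The helper name point_origin_at_fork is hypothetical (it is not part of BanzaiDB), and it assumes your clone already has a remote called origin and that your fork keeps the repository name BanzaiDB:

```shell
# point_origin_at_fork is a hypothetical helper name, not part of BanzaiDB.
# Usage (inside your local clone): point_origin_at_fork YOUR_USERNAME
point_origin_at_fork() {
    # Rewrite the URL of the existing "origin" remote to the fork's SSH URL,
    # then list the remotes so the change can be checked by eye.
    git remote set-url origin "git@github.com:$1/BanzaiDB.git" &&
    git remote -v
}
```

Calling point_origin_at_fork with your GitHub username should leave origin pointing at git@github.com:$YOUR_USERNAME/BanzaiDB.git, matching the manual edit above.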
With this setup you will be able to push development changes to your fork and
submit Pull Requests to the core BanzaiDB repo when you’re happy.
Important Note: Upstream changes will not be synced to your fork by
default. Before submitting a pull request, please sync your fork with
any upstream changes and resolve any merge conflicts. Information on
syncing a fork is available in GitHub's documentation.
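Syncing usually means adding the core repo as a second remote, fetching it, and merging it into your local branch. The sketch below is one way to do that, not a prescribed procedure. The helper name sync_fork and the remote name upstream are assumptions, and the default branch is assumed to be master:

```shell
# sync_fork is a hypothetical helper name, not part of BanzaiDB.
# Usage (inside your local clone): sync_fork UPSTREAM_URL [BRANCH]
sync_fork() {
    upstream_url=$1
    branch=${2:-master}   # assumption: the default branch is "master"

    # Add the "upstream" remote on first use; otherwise refresh its URL.
    if git config --get remote.upstream.url >/dev/null; then
        git remote set-url upstream "$upstream_url"
    else
        git remote add upstream "$upstream_url"
    fi

    # Fetch upstream history and merge it into the local branch.
    # If the merge reports conflicts, resolve them and commit before pushing.
    git fetch upstream &&
    git checkout "$branch" &&
    git merge "upstream/$branch"
}
```

For example, sync_fork https://github.com/mscook/BanzaiDB.git followed by git push origin master would bring both your local copy and your fork up to date before you open a pull request.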