What makes refgenie better?
- It provides a command-line interface to download individual resources. Think of it as GitHub for reference genomes. You just type: refgenie pull hg38/bwa_index
- It’s scripted. If you need resources that are not on the server, such as for a custom genome, you can build your own: refgenie build custom_genome/bowtie2_index
- It simplifies finding local asset locations. When you need a path to an asset, you can seek it, which keeps your pipelines portable across computing environments: refgenie seek hg38/salmon_index
- It provides a remote operation mode, useful for cloud applications. Get a path to an asset file hosted on AWS S3: refgenie seekr hg38/fasta --remote-class s3
- It includes a Python API. Tool developers can use rgc = refgenconf.RefGenConf("genomes.yaml") to get a Python object with paths to any genome asset, e.g., rgc.seek("hg38", "kallisto_index")
- It strictly determines genome compatibility. Users refer to genomes with arbitrary aliases, like “hg38”, but refgenie uses sequence-derived identifiers to verify genome identity with asset servers.
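As a rough illustration of the Python API point above, here is a minimal sketch of looking up several asset paths in one go. It relies only on the seek(genome, asset) call shown in the list. The helper name resolve_assets and the FakeConf stand-in (with its made-up directory layout) are assumptions added for illustration, so the snippet runs without refgenie installed.

```python
from typing import Dict, Iterable


def resolve_assets(rgc, genome: str, assets: Iterable[str]) -> Dict[str, str]:
    """Map each asset name to the path returned by rgc.seek(genome, asset).

    `rgc` can be any object exposing seek(genome, asset), such as a
    refgenconf.RefGenConf instance.
    """
    return {asset: rgc.seek(genome, asset) for asset in assets}


class FakeConf:
    """Hypothetical stand-in for RefGenConf, used only to make this runnable.

    The path layout below is invented and does not reflect how refgenie
    actually organizes assets on disk.
    """

    def seek(self, genome: str, asset: str) -> str:
        return f"/fake/genomes/{genome}/{asset}"


paths = resolve_assets(FakeConf(), "hg38", ["kallisto_index", "salmon_index"])
print(paths["kallisto_index"])
```

With a real configuration, FakeConf() would be replaced by refgenconf.RefGenConf("genomes.yaml"), and the assets would need to be pulled or built first.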
Commands
Install and configure
pip install --user refgenie
export PATH=~/.local/bin:$PATH
refgenie init -c data/reference_data/genome_config.yaml
export REFGENIE=data/reference_data/genome_config.yaml
refgenie listr
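Scripts that wrap refgenie may want to confirm that the REFGENIE variable exported above is actually set before doing any work. The following is a small sketch of such a check. The function name refgenie_config_path is hypothetical, and the check only reads the variable; it does not validate the file contents.

```python
import os
from typing import Mapping, Optional


def refgenie_config_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the absolute config path stored in REFGENIE, or raise if unset."""
    env = os.environ if env is None else env
    path = env.get("REFGENIE", "")
    if not path:
        raise RuntimeError("REFGENIE is not set; export it to point at your genome config")
    # Expand ~ and make the path absolute so later directory changes do not break it.
    return os.path.abspath(os.path.expanduser(path))


# An explicit mapping is passed here so the example does not depend on the caller's shell.
print(refgenie_config_path({"REFGENIE": "data/reference_data/genome_config.yaml"}))
```

Calling refgenie_config_path() with no argument reads the real environment instead.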
Download pre-built reference genome assets
The listr command lists remote assets so you can see what’s available:
refgenie listr
The pull command downloads the specific asset of your choice:
refgenie pull GENOME/ASSET
Where GENOME refers to a genome key (e.g., hg38) and ASSET refers to one or more specific asset keys (e.g., bowtie2_index). For example:
refgenie pull hg38/bowtie2_index
You can also pull many assets at once:
refgenie pull --genome mm10 bowtie2_index hisat2_index
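If you drive refgenie from a Python pipeline, the two pull forms above can be assembled programmatically. This is a minimal sketch, not part of refgenie itself: pull_command and pull are hypothetical helper names, and the sketch assumes the refgenie executable is on PATH when pull is actually called. Building the argument list separately makes it easy to inspect the command before running it.

```python
import subprocess
from typing import List, Sequence


def pull_command(genome: str, assets: Sequence[str]) -> List[str]:
    """Build the argument list for `refgenie pull`, mirroring the two forms above."""
    if not assets:
        raise ValueError("at least one asset is required")
    if len(assets) == 1:
        # Single asset: GENOME/ASSET form.
        return ["refgenie", "pull", f"{genome}/{assets[0]}"]
    # Several assets for one genome: --genome form.
    return ["refgenie", "pull", "--genome", genome, *assets]


def pull(genome: str, assets: Sequence[str]) -> None:
    """Run the pull; raises CalledProcessError if refgenie exits non-zero."""
    subprocess.run(pull_command(genome, assets), check=True)


print(" ".join(pull_command("hg38", ["bowtie2_index"])))
print(" ".join(pull_command("mm10", ["bowtie2_index", "hisat2_index"])))
```

The two print calls reproduce the example commands shown above without contacting any server.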