Building an Offline CRAM Reference Cache for Samtools
Introduction
Working with CRAM files is usually painless, until you try to run samtools on a machine with no internet access. The moment samtools encounters a CRAM whose reference sequences aren’t available locally, it attempts to fetch them from `https://www.ebi.ac.uk/ena/cram/md5/`. Without network access, that lookup fails with messages like:
[W::cram_get_ref] Attempting to fetch reference from EBI...
[E::cram_get_ref] Failed to download reference
The EBI servers can also be unreliable at times, so pipelines that process many samples may hit intermittent errors even when the machine has a working internet connection.
This post walks through why this happens, and how to build a local CRAM reference cache using bash so samtools never attempts a network lookup again.
Building a Local CRAM Reference Cache
Below is a minimal, reproducible workflow that:
- downloads a reference FASTA
- indexes it
- builds the CRAM reference cache
- configures samtools to use it
Everything happens locally—no Docker, no internet at runtime.
1. Install the required tools if you don’t have them already
On Ubuntu/Debian:
sudo apt update
sudo apt install -y \
samtools \
wget \
gzip \
perl \
libdigest-md5-perl \
libfile-spec-perl \
libfile-path-perl \
libfile-basename-perl
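Before going further, it can help to confirm that the tools actually ended up on your PATH; depending on how samtools was packaged, seq_cache_populate.pl in particular may live somewhere else. A minimal sketch, where require_tools is a hypothetical helper name and not part of samtools:

```shell
# Hypothetical helper: report any command that is not on PATH.
# Returns non-zero if at least one tool is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools used in this walkthrough
require_tools samtools wget gunzip perl seq_cache_populate.pl \
    || echo "install the missing tools before continuing" >&2
```

If seq_cache_populate.pl is reported missing, look for it in the samtools source tree or package contents and call it by its full path in the later steps.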
2. Create reference and cache directories
export REF_DIR=/ref
export REF_CACHE=/ref/cache
mkdir -p "$REF_DIR" "$REF_CACHE"
3. Download a reference FASTA and index it
Example: GRCh38 primary assembly from Ensembl.
wget -O "$REF_DIR/genome.fa.gz" \
  https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip "$REF_DIR/genome.fa.gz"
samtools faidx "$REF_DIR/genome.fa"
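samtools faidx writes an index next to the FASTA (genome.fa.fai) whose first two tab-separated columns are the sequence name and its length. As a quick sanity check that the download and decompression worked, the sketch below summarises that index; fai_summary is a hypothetical helper name.

```shell
# Hypothetical helper: summarise a .fai index
# (column 1 = sequence name, column 2 = sequence length).
fai_summary() {
    awk -F '\t' '{ n++; bases += $2 }
         END { printf "sequences=%d bases=%.0f\n", n, bases }' "$1"
}

# Usage, assuming the paths from the steps above:
#   fai_summary "$REF_DIR/genome.fa.fai"
```

A sequence count of zero, or a base total far from what you expect for the assembly, suggests the FASTA is truncated or was not decompressed properly.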
4. Build the CRAM reference cache
seq_cache_populate.pl -root "$REF_CACHE" "$REF_DIR/genome.fa"
This populates $REF_CACHE with one file per reference sequence, each named after the MD5 checksum of that sequence.
5. Use the cache when running samtools
Point both REF_PATH and REF_CACHE at the cache, using the %2s/%2s/%s pattern that tells htslib how to turn a sequence MD5 into the two-level path created by seq_cache_populate.pl. Because REF_PATH lists no URL, there is no remote server to fall back to:
export REF_PATH="$REF_DIR/cache/%2s/%2s/%s"
export REF_CACHE="$REF_DIR/cache/%2s/%2s/%s"
Now samtools will decode CRAM files without ever attempting a network fetch:
samtools view sample.cram
If the reference matches, samtools will silently use the local cache.
On a machine you log in to interactively, you may want to add the export commands, including the one for REF_DIR, to a file that sets up your login environment (e.g. ~/.bashrc).
Optional: Verifying the Cache
You can check that the cache contains MD5-named files:
find "$REF_DIR/cache" -type f | head
You should see a two‑level directory structure:
/ref/cache/ab/cd/ef1234567890...
/ref/cache/12/34/abcd5678ef90...
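To check a particular CRAM against the cache before taking a machine offline, you can compare the M5 tags in its @SQ header lines with the files on disk. The sketch below is one way to do that. It assumes the cache was built with the default two-level layout of seq_cache_populate.pl, where the first two pairs of hex characters of the MD5 become directories and the remaining characters become the file name. missing_refs is a hypothetical helper name; it reads header text on standard input so it can be fed from samtools view -H.

```shell
# Hypothetical helper: print the MD5 of every @SQ reference that is absent
# from the cache. Reads SAM/CRAM header text on stdin.
# Assumes the layout <root>/<md5 chars 1-2>/<chars 3-4>/<remaining chars>.
missing_refs() {
    root=$1
    grep '^@SQ' | tr '\t' '\n' | sed -n 's/^M5://p' |
    while read -r md5; do
        d1=$(printf '%s' "$md5" | cut -c1-2)
        d2=$(printf '%s' "$md5" | cut -c3-4)
        rest=$(printf '%s' "$md5" | cut -c5-)
        # Report the checksum only when no matching cache file exists
        [ -f "$root/$d1/$d2/$rest" ] || echo "$md5"
    done
}

# Usage, assuming the paths from the steps above:
#   samtools view -H sample.cram | missing_refs "$REF_DIR/cache"
```

An empty result means every reference named in the header is already cached; any MD5 that is printed belongs to a sequence that the FASTA you indexed does not contain.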