Introduction

Working with CRAM files is usually painless until you try to run samtools on a machine with no internet access. When samtools encounters a CRAM whose reference sequences aren’t available locally, it attempts to fetch them from `https://www.ebi.ac.uk/ena/cram/md5/`. On an offline system, this results in errors like:

[W::cram_get_ref] Attempting to fetch reference from EBI...
[E::cram_get_ref] Failed to download reference

The EBI’s servers can also be unreliable at times, so pipelines that process many samples may hit intermittent errors even when the system has a working internet connection.

This post walks through why this happens, and how to build a local CRAM reference cache using bash so samtools never attempts a network lookup again.

Building a Local CRAM Reference Cache

Below is a minimal, reproducible workflow that:

  • downloads a reference FASTA
  • indexes it
  • builds the CRAM reference cache
  • configures samtools to use it

Everything happens locally—no Docker, no internet at runtime.


1. Install the required tools if you don’t have them already

On Ubuntu/Debian:

sudo apt update
sudo apt install -y \
    samtools \
    wget \
    gzip \
    perl

2. Create reference and cache directories

export REF_DIR=/ref
export REF_CACHE=/ref/cache

mkdir -p "$REF_DIR" "$REF_CACHE"

3. Download a reference FASTA and index it

Example: GRCh38 primary assembly from Ensembl.

wget -O "$REF_DIR/genome.fa.gz" \
  https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

gunzip "$REF_DIR/genome.fa.gz"

samtools faidx "$REF_DIR/genome.fa"
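
The cache built in the next step is keyed on the MD5 checksum of each reference sequence, which is the value a CRAM records in the M5 tag of its @SQ header lines. As a rough sketch, assuming GNU coreutils (md5sum, tr) and grep are installed, the checksum of a single FASTA record can be reproduced by dropping the header line, removing whitespace, and upper-casing the bases before hashing. The function name fasta_record_md5 is made up for this example and is not part of samtools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of samtools): print the MD5 of ONE FASTA
# record read from stdin, normalised the way the M5 tag is defined:
# printable non-space characters only, upper-cased.
fasta_record_md5() {
    grep -v '^>' |       # drop the FASTA header line
      tr -cd '!-~' |     # delete everything outside '!'..'~' (newlines, spaces)
      tr 'a-z' 'A-Z' |   # upper-case the bases
      md5sum |
      cut -d' ' -f1      # keep only the hex digest
}

# Possible usage, assuming the FASTA names chromosome 1 as "1":
#   samtools faidx "$REF_DIR/genome.fa" 1 | fasta_record_md5
```

If the printed value matches the M5 tag for the same sequence in a CRAM header, that CRAM was most likely written against the same reference sequence.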

4. Build the CRAM reference cache

seq_cache_populate.pl -root "$REF_CACHE" "$REF_DIR/genome.fa"

This populates $REF_CACHE with one file per reference sequence, each named after the MD5 checksum of that sequence.

5. Use the cache when running samtools

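Set the environment variables. The %2s/%2s/%s pattern tells samtools how to map an MD5 checksum onto the two-level directory layout that seq_cache_populate.pl creates, and leaving the EBI URL out of REF_PATH keeps every lookup local:

export REF_PATH="$REF_DIR/cache/%2s/%2s/%s"
export REF_CACHE="$REF_DIR/cache/%2s/%2s/%s"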

Now samtools will decode CRAM files without ever attempting a network fetch:

samtools view sample.cram

If the MD5 checksums recorded in the CRAM header match sequences in the cache, samtools will silently use the local copies. On a machine that you log in to and use interactively, consider adding the export commands to a shell startup file (e.g. ~/.bashrc) so they apply to every session.
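
Before running a pipeline offline, it can help to confirm that every sequence a CRAM refers to is already in the cache. The sketch below is one way to do that, assuming bash and the two-level layout written by seq_cache_populate.pl. It reads header text on stdin, so it never touches the network itself. The function name check_cram_refs_cached is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read SAM/CRAM header text on stdin and report every
# @SQ line whose M5 checksum has no file under the cache root given as $1.
# Assumes the layout <root>/<2 hex>/<2 hex>/<remaining hex digits>.
check_cram_refs_cached() {
    local root=${1%/} missing=0 name md5 f
    local -a fields
    while IFS=$'\t' read -r -a fields; do
        [[ ${fields[0]} == "@SQ" ]] || continue
        name="?"
        md5=""
        for f in "${fields[@]}"; do
            case $f in
                SN:*) name=${f#SN:} ;;   # sequence name
                M5:*) md5=${f#M5:} ;;    # MD5 of the sequence
            esac
        done
        if [[ -z $md5 ]]; then
            echo "no M5 tag: $name"
            missing=1
        elif [[ ! -f "$root/${md5:0:2}/${md5:2:2}/${md5:4}" ]]; then
            echo "not cached: $name ($md5)"
            missing=1
        fi
    done
    return "$missing"   # 0 if everything was found, 1 otherwise
}

# Possible usage (reading a CRAM header does not need the reference):
#   samtools view -H sample.cram | check_cram_refs_cached /ref/cache
```

A non-zero exit status means at least one sequence would have to be fetched from elsewhere, which is the situation the cache is meant to avoid.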

Optional: Verifying the Cache

You can check that the cache contains MD5‑named files:

find "$REF_DIR/cache" -type f | head

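You should see a two‑level directory structure, where the first four hex digits of each MD5 form the two directory names and the remaining digits form the file name:

/ref/cache/ab/cd/ef1234567890...
/ref/cache/12/34/abcd5678ef90...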
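
To look up one particular sequence, the path can be worked out from its MD5 directly. The function below is a minimal sketch of that mapping, assuming the default two-level layout (two hex digits, two hex digits, then the rest of the digest as the file name). The name cache_path_for_md5 is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path an MD5 checksum would occupy under a
# cache root laid out as <root>/<2 hex>/<2 hex>/<remaining 28 hex digits>.
cache_path_for_md5() {
    local root=${1%/} md5=$2
    if [[ ! $md5 =~ ^[0-9a-f]{32}$ ]]; then
        echo "not a 32-character lowercase hex MD5: $md5" >&2
        return 1
    fi
    printf '%s/%s/%s/%s\n' "$root" "${md5:0:2}" "${md5:2:2}" "${md5:4}"
}

# Possible usage, with <md5> taken from an @SQ header line:
#   ls -l "$(cache_path_for_md5 /ref/cache <md5>)"
```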