This is a regularly updated post that collects the small tips that don’t warrant their own post. If a group of related tips reaches critical mass, it will be spun out into its own post, and a placeholder will remain here.

This post focuses on Biopython, and using this set of packages to deal with sequencing data.

Installation

pip install biopython

Bio.SeqIO

This is the main interface for reading and writing sequences in Python. It supports most of the formats that we need, including FASTA and FASTQ. Full tutorial here. Main documentation here. Bio.AlignIO handles alignment files, like MAF.

Sequence input

FASTA input

Read a FASTA file and process it seq-by-seq

from Bio import SeqIO

for record in SeqIO.parse("FASTA.fa", "fasta"):
    print(record.id)

Read a FASTA file as a handle, then iterate over the handle.

with open("example.fasta") as handle:
    for record in SeqIO.parse(handle, "fasta"):
        print(record.id)

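Index a FASTA file for dict-like access by record ID. SeqIO.index() records where each entry sits in the file and parses a record only when you ask for it, so it uses much less memory than Bio.SeqIO.to_dict(), which loads every record at once. The index itself still lives in memory and the returned object is read-only, so use with caution on very large files.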

from Bio import SeqIO

record_dict = SeqIO.index("example.fasta", "fasta")
print(record_dict["gi:12345678"])  # use any record ID
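To make the difference between the two approaches concrete, here is a minimal sketch that builds both a to_dict() dictionary and an index() object from the same file. The two-record FASTA content, the temporary file, and the record IDs seq1 and seq2 are made up for illustration.

```python
import os
import tempfile

from Bio import SeqIO

# Hypothetical two-record FASTA file, written to a temporary location.
fasta_text = ">seq1 first record\nACGTACGT\n>seq2 second record\nGGCCGGCC\n"
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(fasta_text)
    fasta_path = tmp.name

try:
    # to_dict() parses every record up front and keeps them all in memory.
    in_memory = SeqIO.to_dict(SeqIO.parse(fasta_path, "fasta"))

    # index() needs a filename (not a handle) and parses records on demand.
    indexed = SeqIO.index(fasta_path, "fasta")
    ids = sorted(indexed.keys())
    seq2 = str(indexed["seq2"].seq)
    same = str(in_memory["seq2"].seq) == seq2
    indexed.close()  # release the underlying file handle
finally:
    os.remove(fasta_path)

print(ids, seq2, same)
```

Both objects are looked up the same way, so switching between them usually only changes the line that creates the dictionary.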

Access the title and sequence of a record through its .id and .seq attributes.

print(record.id)
print(record.seq)
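As a small illustration of what these attributes hold, the sketch below parses a FASTA record from an in-memory string. The header line and sequence are invented for the example; note that .id is only the first word of the header, while .description is the whole header line.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical single-record FASTA content held in memory.
fasta_text = ">seq1 example record\nACGTACGTAA\n"

record = next(SeqIO.parse(StringIO(fasta_text), "fasta"))

record_id = record.id                    # first word of the header line
record_description = record.description  # full header line, without the ">"
record_sequence = str(record.seq)        # the sequence as a plain string
record_length = len(record)              # number of letters in the sequence

print(record_id, record_description, record_sequence, record_length)
```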

FASTA output

from Bio import SeqIO

# outRecord is a SeqRecord object, such as the type produced by SeqIO
with open("outFile.txt", "w") as outFH:
    SeqIO.write(outRecord, outFH, "fasta")
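SeqIO.write() also accepts a list (or iterator) of records and returns how many it wrote. The sketch below writes two made-up records to an in-memory handle so the result can be inspected; the IDs, descriptions, and sequences are placeholders.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Two hypothetical records to write out.
records = [
    SeqRecord(Seq("ACGT"), id="seq1", description="first"),
    SeqRecord(Seq("GGCC"), id="seq2", description="second"),
]

# StringIO stands in for a real file handle here.
handle = StringIO()
count = SeqIO.write(records, handle, "fasta")
fasta_output = handle.getvalue()

print(count)
print(fasta_output)
```

Checking the returned count is a cheap way to confirm that the number of records written matches what you expected.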

Manually creating a SeqRecord object

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

manualRecord = SeqRecord(
    Seq("ATGCAGCTGCATAGTACGTGCATGACTGCATGTACGACTAGTC"),
    id="NameOfManualRecord",
    description="Description of manual record",
)
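A quick way to check a hand-built record is to render it as FASTA text with the record's format() method, without touching the disk. The sketch below rebuilds the same record and shows how the id and description end up on the header line.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

manual_record = SeqRecord(
    Seq("ATGCAGCTGCATAGTACGTGCATGACTGCATGTACGACTAGTC"),
    id="NameOfManualRecord",
    description="Description of manual record",
)

# format("fasta") returns the record as a FASTA-formatted string.
fasta_text = manual_record.format("fasta")
header, sequence_line = fasta_text.splitlines()

print(fasta_text)
```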

Common things to do

GC content

from Bio.SeqUtils import gc_fraction
print(gc_fraction("TGCAGTACTAGCTACGT"))
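For reference, the value being reported is simply the share of G and C bases in the sequence. The sketch below computes it by hand for the same string, without Biopython; gc_fraction_by_hand is a hypothetical helper written for this post, and it ignores the ambiguity codes that the library function can handle.

```python
def gc_fraction_by_hand(sequence):
    """Return the fraction of G and C letters in an unambiguous DNA/RNA string."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


# 4 G's and 4 C's out of 17 bases.
gc = gc_fraction_by_hand("TGCAGTACTAGCTACGT")
print(round(gc, 3))
```

Note that the result is a fraction between 0 and 1, not a percentage.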

Translate an RNA seq

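# seqRecord is a SeqRecord object holding a nucleotide sequence, such as the type produced by SeqIO.
# translate() returns a new SeqRecord containing the protein sequence.
outRecord = seqRecord.translate()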
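Here is a small worked example of translation, using a short made-up mRNA wrapped in a record with a hypothetical ID. It assumes the standard genetic code (the default) and shows the to_stop option, which truncates the protein at the first stop codon.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical mRNA record; 39 bases, so 13 complete codons.
rna_record = SeqRecord(
    Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"),
    id="example_mrna",
)

# Translating a SeqRecord gives a new SeqRecord; stop codons appear as "*".
protein_record = rna_record.translate()
full_protein = str(protein_record.seq)

# Translating the Seq directly gives a Seq; to_stop=True ends at the first stop codon.
first_orf = str(rna_record.seq.translate(to_stop=True))

print(full_protein)
print(first_orf)
```

By default the translated record does not inherit the original ID, so set it yourself if you plan to write the protein out.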