Using sequence data in dataframes with Biostrings
Introduction
Sometimes you have a dataframe containing sequences that you have already processed, and you want to convert those sequences into a DNAStringSet object. From there, you can keep working on them with Biostrings functions, or write them out to a FASTA file.
Let’s say you have a dataframe like this:
SequencesDF <- data.frame(Title = c("wildType", "mutant", "Zoidberg"),
                          Seqs = c("AAATTCCC", "AAATGCCC", "GAGATATA"))
     Title     Seqs
1 wildType AAATTCCC
2   mutant AAATGCCC
3 Zoidberg GAGATATA
Converting it to a DNAStringSet
library(Biostrings)
# Convert the sequence column into a DNAStringSet object.
# Keep every row (no unique()), so the sequences stay aligned with the Title column.
SequencesObj <- DNAStringSet(SequencesDF$Seqs)
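Once the sequences are in a DNAStringSet, the usual Biostrings functions apply to the whole set at once. The following is a minimal sketch of that, using the same three example sequences; the variable names `seqs`, `rc`, and `gc` are made up for this illustration.

```r
library(Biostrings)

# Same example sequences as in the dataframe above
seqs <- DNAStringSet(c("AAATTCCC", "AAATGCCC", "GAGATATA"))

# Number of sequences in the set, and the length of each one
length(seqs)
width(seqs)

# Reverse complement of every sequence in the set
rc <- reverseComplement(seqs)
as.character(rc)

# Fraction of G or C bases per sequence (one row per sequence)
gc <- letterFrequency(seqs, letters = "GC", as.prob = TRUE)
gc
```

Each call returns one result per sequence, in the same order as the rows of the original dataframe, so the results can be attached back to it as new columns if needed.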
Adding names metadata
# Make the names "Sequence_1", "Sequence_2", ...
names(SequencesObj) <- paste0("Sequence_", seq_along(SequencesObj))
# Have a look
names(SequencesObj)
# Make the names the same as the "Title" column
names(SequencesObj) <- SequencesDF$Title
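The introduction mentions writing the sequences out to a FASTA file. A minimal sketch of that step is below, assuming the names should become the FASTA header lines. The output path is a temporary file chosen only for illustration, and `fasta_path` and `roundTrip` are hypothetical names; `stringsAsFactors = FALSE` is added so the example behaves the same on R versions before 4.0.

```r
library(Biostrings)

# Rebuild the example dataframe and the named DNAStringSet
SequencesDF <- data.frame(Title = c("wildType", "mutant", "Zoidberg"),
                          Seqs = c("AAATTCCC", "AAATGCCC", "GAGATATA"),
                          stringsAsFactors = FALSE)
SequencesObj <- DNAStringSet(SequencesDF$Seqs)
names(SequencesObj) <- SequencesDF$Title

# Write the set to a FASTA file (temporary path, for illustration only)
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(SequencesObj, filepath = fasta_path, format = "fasta")

# Have a look at the raw file: one ">name" header line per sequence
readLines(fasta_path)

# Read it back in to check that names and sequences survived
roundTrip <- readDNAStringSet(fasta_path)
names(roundTrip)
```

In real use, replace the temporary path with wherever the file should live, for example a `.fasta` file in your project directory.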