How to move data from Rocket to AWS
Interacting with AWS from a Mac/Linux Bash terminal is done using the AWS Command Line Interface (awscli). You'll need to download and install it onto whatever system you are using (in this case, Rocket). Note: this will let that system do anything that your AWS user is able to do, so make sure that you trust the system before doing this.
Installing the awscli on Rocket
- Install the awscli
  - Download and unpack the AWS command line interface with:

        mkdir -p /nobackup/$USER/bin/ /nobackup/$USER/bin/AWS
        cd /nobackup/$USER/bin/AWS
        curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
        unzip awscliv2.zip

    For more info on what just happened, go to https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
  - Now, you can access the awscli with /nobackup/$USER/bin/AWS/aws/dist/aws. This is a bit rubbish though, and just using aws as a command would be better. You can create a launcher for it using the instructions in this post (a sketch of one possible launcher is shown after this list).
- If you already have the awscli installed elsewhere, you can copy your configuration files across.
  - Log into a machine where you have the awscli installed and working. For Mac/Linux, go to ~/.aws. There are two files, config and credentials, that you will need to scp across into Rocket's ~/.aws/ folder (see the example after this list). That target folder is unlikely to exist, so create it with mkdir if necessary. Alternatively, you can run aws configure on Rocket to set the files up manually for the first time.
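One possible shape for that launcher is simply a symlink to the unpacked binary, placed in a directory that is already on your $PATH. This is only a sketch; the ~/bin location is my assumption and not part of the original instructions (see the $PATH notes further down for how to add such a directory):

    # Put a symlink called "aws" somewhere on your $PATH (here: ~/bin, which may not exist yet)
    mkdir -p ~/bin
    ln -s /nobackup/$USER/bin/AWS/aws/dist/aws ~/bin/aws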
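Similarly, the configuration-copying step might look something like this when run from the machine that already has a working awscli; yourUser and rocketHostname are placeholders for your own username and Rocket's address:

    # Run on your local machine, not on Rocket
    ssh yourUser@rocketHostname 'mkdir -p ~/.aws'                             # create the target folder if needed
    scp ~/.aws/config ~/.aws/credentials yourUser@rocketHostname:~/.aws/      # copy both files across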
Logging in with an MFA device
- If you are logging into an account which requires MFA, you will need to start a session which stores temporary credentials.
- More info here.
- If you are a member of the Sarra Ryan Lab, you will have access to the UserSessionStart.sh script, which does it all for you.
  - Grab the version of that script which you have access to, and make sure that you have the MFA Device ID configured on line 3, according to the documentation which comes with the script.
  - scp that across to Rocket.
  - For some reason, Rocket doesn't have the JSON parser called jq installed.
    - Go to a folder that is in your $PATH. Use echo $PATH to find an existing one, and go to this article for info on adding another one to your $PATH (a sketch is shown after this list).
    - Run the following commands to download and install jq:

          curl -s https://api.github.com/repos/jqlang/jq/releases/latest \
            | grep 'browser_download_url.*jq-linux-amd64' \
            | sed -E 's|.*https(.*)"|https\1|' \
            | xargs wget -O jq
          chmod 700 jq

    - Test the result by running jq. If you get help information, and not a command not found error, you were successful!
- Now, you can use the UserSessionStart.sh script to log in.
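On the $PATH point: if none of the folders listed by echo $PATH are writable by you, one common approach (the ~/bin location here is my assumption, not something the original instructions specify) is:

    # Make a personal bin directory and add it to your PATH permanently
    mkdir -p ~/bin
    echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc   # reload so the change applies to the current shell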
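If you don't have access to that script, the general idea behind it is to request temporary credentials from AWS STS using your MFA code and export them into the current shell. A minimal sketch of that pattern, assuming jq is installed and using placeholder values for the MFA Device ID and the token code (this is not the lab script itself):

    # Replace the ARN with your MFA Device ID and 123456 with the current code from your device
    CREDS=$(aws sts get-session-token \
      --serial-number arn:aws:iam::123456789012:mfa/yourUser \
      --token-code 123456)

    # Export the temporary credentials so that later aws commands in this shell use them
    export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.Credentials.AccessKeyId')
    export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.Credentials.SecretAccessKey')
    export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Credentials.SessionToken')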
Copying data from Rocket to AWS
- The documentation for this is actually quite good.
- Looking for files in S3 is similar to the bash ls command (a worked example is shown after this list).
  - Use aws s3 ls to look for buckets that you have access to.
  - Use aws s3 ls bucketName for the contents of the bucket.
  - Use aws s3 ls bucketName/folder/ to check a folder's contents. NB: The folder path ends in a forward slash!
- The aws s3 cp command can be used to copy individual files to and from an S3 bucket, and the documentation is here. In its basic form that command only copies one file at a time, so you will need to run it once per file (see the sketch after this list).
- The aws s3 sync command can be used to transfer whole folders, similarly to the Bash rsync command. Documentation is here.
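As a worked example of the listing commands (the bucket and folder names below are made up):

    aws s3 ls                                   # list the buckets you have access to
    aws s3 ls bucketName                        # list the top level of one bucket
    aws s3 ls bucketName/folder/subFolder/      # list a folder - note the trailing slash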
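And a sketch of a single-file copy versus a folder sync, again with made-up names:

    # Copy one file up to the bucket, then copy it back down again
    aws s3 cp results.txt s3://bucketName/folder/subFolder/
    aws s3 cp s3://bucketName/folder/subFolder/results.txt ./

    # Mirror a whole local folder into the bucket, rsync-style
    aws s3 sync ./localFolder/ s3://bucketName/folder/subFolder/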
Other notes to bear in mind:
- Locations/files that are on the bucket must be preceded by s3://, eg:

      aws s3 cp /path/to/local/file.txt s3://bucketName/folderLocation/subFolder/

- Remote locations should end with a forward slash.
- If you have multiple files in a folder, but don’t want to copy all of them, and they all have something in common (eg: they’re all BAM/BAI files), you can use the following:
      for f in *.bam *.bai
      do
        aws s3 cp "$f" s3://bucketName/target/location/
      done

  This is good for messy folders full of lots of different file types.
- Otherwise, you can make a folder, link all of the files that you want to copy into it, and then sync that folder:

      mkdir stagingFolder
      ln *.bam *.bai specificFile1.bla specificFile2.foo stagingFolder
      aws s3 sync stagingFolder s3://bucketName/target/location/
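A further option is to let the awscli do the filtering for you: both aws s3 cp and aws s3 sync accept --exclude and --include pattern options, so a loop or staging folder is not always needed. A hedged example (the order of the filters matters, so test it on something unimportant first):

    # Sync only the BAM/BAI files from the current folder, skipping everything else
    aws s3 sync . s3://bucketName/target/location/ --exclude "*" --include "*.bam" --include "*.bai"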