Chapter 5 Computational skills

The data you are working with is too large to analyze by hand. For example, the human genome is 3.2 billion bases. It would be hard to compare all that data to data from other species just using your eyes. Instead we’re going to need to learn some basic programming skills so we can tell the computer how to work with our data.

First, the computer we’re using doesn’t have the graphical interface you are used to on a PC or Mac. That means you have to type your commands; you can’t point and click. This has a hidden benefit: it’s easy to record what you did as a line of text, rather than trying to explain where to click at each step.

Let’s start by learning how to log in to our server and run commands without clicking.

5.1 Getting access

  • Follow the directions for students to obtain an account on URI’s High Performance Computing Cluster (Unity).
  • You will need to request an account under pi_rsschwartz_uri_edu.
  • Wait for your account to be approved.

5.2 Log in to the server

  • On the main Unity page select OpenOnDemand
  • Select URI to log in
  • At the top of the page select Shell - Unity Shell access

5.3 Basic navigation

You are now logged in to Unity, which is a cluster of computers that can be used for research. Computing clusters typically require you to type all of your commands, so first you’ll need to learn to navigate using only the keyboard.

For example, to list all the files in your home folder (where you are when you log in) type ls. There probably isn’t anything in this folder right now. This is just like opening a folder in a Finder window on a Mac or a folder on a PC, except you had to type the command instead of simply double clicking.

Second (and to help you get used to typing instead of clicking), let’s find out where we are in the folder structure. You should be in your home folder, which is inside another folder, and so on. Think of it as a highly organized filing cabinet in which you need to move between folders to access files. Type pwd to see the folder you are in. You’ll see your folder at the end, preceded by a slash, preceded by another folder, and so on.
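The two commands so far can be tried together as a quick check. This is only a sketch: the -a option to ls is not needed for anything in this chapter, but it shows that a folder that looks empty may still hold hidden files. What pwd prints depends on your account; for the account in the storage listing later in this chapter it would be something like /home/rsschwartz_uri_edu.

```shell
ls       # list the files and folders in the current folder
ls -a    # also list hidden files, whose names start with a dot
pwd      # print the full path of the folder you are in
```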

Your home folder is on a relatively small disk, so for all of our subsequent work we will use a large storage disk. This disk contains everyone’s data, organized by research lab. To get to the main folder for the lab you need to change directories using the cd command. Unlike the way you used the previous commands, but like most commands you’ll use, cd takes an argument, which is another “word” that comes after the command. In this case the argument is the “path” to the folder. Type cd /work/pi_rsschwartz_uri_edu.

Let’s start by making a new folder to put your work in. Use the mkdir command to make a new directory. When you make a directory you need to name it, so instead of just mkdir you need mkdir newfolder or some other folder name. This is a shared lab folder, so you should make a folder named with your username. For example, I might type mkdir rsschwartz. From now on I will refer to this as your main folder.

Move into your folder (use cd) and make a new folder called data. Now that you have a new folder, move into it. Type pwd again and notice which folder you are in. To go back to the folder containing your current folder type cd .., which will take you one level up in the folder hierarchy. Type pwd again to see where you are now.
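If you want to rehearse this sequence before touching the shared lab folder, the sketch below runs the same steps in a throwaway folder. The variable practice_dir and the folder name myusername are placeholders made up for illustration, not real Unity paths. On the cluster you would instead start from /work/pi_rsschwartz_uri_edu and use your own username.

```shell
# Placeholder: a temporary folder standing in for the shared lab folder.
practice_dir=$(mktemp -d)
cd "$practice_dir"

mkdir myusername   # placeholder name; use your own username on Unity
cd myusername      # move into your main folder
mkdir data         # make the data folder inside it
cd data            # move into data
pwd                # the path now ends in /myusername/data
cd ..              # go up one level
pwd                # the path now ends in /myusername
```

Running pwd after every move is a cheap habit that keeps you from creating folders in the wrong place.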

Now that you have seen a few commands, check out the Cheatsheet of commands at the end of the book for future reference.

5.4 Data storage

Unlike your personal computer, Unity has multiple storage options. You will have extremely large datasets, and for long-term storage these should be kept in the most cost-effective way possible. For more immediate work, however, you will need storage that can be read and written quickly. Faster storage is also more expensive, so you will have less space on it. For this reason we may work on data in one place (faster, but less space and more expensive) and then move our results to another location (more space, but slower).

When you log in you should see something like the following information:

/home/rsschwartz_uri_edu: 146M (1%) of 50G
/work/pi_rsschwartz_uri_edu: 19G (2%) of 1000G
/project/pi_rsschwartz_uri_edu: 0 (0%) of 5.0T

Read Unity’s storage guide for additional information.

Following these guidelines, you should use /work for running your analyses and /project for long-term storage.
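As a rough sketch of that workflow, the example below copies a finished results folder from a working location to a long-term location and then checks how much space a folder uses. The variables work_dir and project_dir, the results folder, and summary.txt are all placeholders invented for illustration. Here they point at temporary folders so the commands can be run anywhere; on Unity they would be folders under /work/pi_rsschwartz_uri_edu and /project/pi_rsschwartz_uri_edu.

```shell
# Placeholders: temporary folders standing in for your /work and /project folders.
work_dir=$(mktemp -d)
project_dir=$(mktemp -d)

# Pretend an analysis wrote a results folder in the fast /work space.
mkdir "$work_dir/results"
echo "example output" > "$work_dir/results/summary.txt"

# Copy the whole results folder to long-term storage (-r copies a folder and its contents).
cp -r "$work_dir/results" "$project_dir/"

# Check that the copy arrived before deleting anything from the working space.
ls "$project_dir/results"

# See how much space a folder uses (-s gives one total, -h uses units like M and G).
du -sh "$work_dir"
```

Copying first and deleting only after you have checked the copy is slower than moving the folder in one step, but it protects you if the transfer is interrupted.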