...

What you can quickly learn is how to query large amounts of data for the information you want, very fast. Using Bash or any other shell sometimes feels more like programming than like using a mouse. Commands are terse (often only a couple of characters long), their names are frequently cryptic, and their output is lines of text rather than something visual like a graph. On the other hand, with only a few keystrokes, the shell allows us to combine existing tools into powerful pipelines and handle large volumes of data automatically. This automation not only makes us more productive, but also improves the reproducibility of our workflows by allowing us to repeat them with a few simple commands. Understanding the basics of the shell is also a useful foundation before learning to program, since working with most programming languages involves using the shell.

0.1:

...

Navigation

We will begin with the basics of navigating the Unix shell.

...

The output will be a path to your home directory. Let's check if we recognize it by listing the contents of the directory. To do that, we use the ls command: 

$ ls
Applications Documents Library Music Public Desktop Downloads Movies Pictures

 

We may want more information than just a list of files and directories. We can get this by adding various flags (also known as options or switches) to our basic commands. These are additions to a command that give the computer a bit more guidance about what sort of output or manipulation you want.

If we type ls -l and hit Enter, the computer returns a list of files with information similar to what we would find in Finder (Mac) or Explorer (Windows): the permissions and owner of each item, its size in bytes, the date and time it was last modified, and its name.

$ ls -l
total 0 
drwx------+ 6 riley staff 204 Jul 16 11:50 Desktop 
drwx------+ 3 riley staff 102 Jul 16 11:30 Documents 
drwx------+ 3 riley staff 102 Jul 16 11:30 Downloads 
drwx------@ 46 riley staff 1564 Jul 16 11:38 Library 
drwx------+ 3 riley staff 102 Jul 16 11:30 Movies 
drwx------+ 3 riley staff 102 Jul 16 11:30 Music 
drwx------+ 3 riley staff 102 Jul 16 11:30 Pictures 
drwxr-xr-x+ 5 riley staff 170 Jul 16 11:30 Public

...

In everyday usage we are more used to units of measurement like kilobytes, megabytes, and gigabytes. Luckily, there's another flag, -h, that, when used with the -l option, uses unit suffixes (Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte) to reduce the number of digits to three or fewer, using base 2 for sizes.

On its own, ls -h has no visible effect, because a plain ls doesn't show file sizes. When we want to combine two flags, we can just write them together after a single hyphen. So, by typing ls -lh and hitting Enter we receive output in a human-readable format (note: the order of the flags doesn't matter).

 

$ ls -lh
total 0 
drwx------+ 6 riley staff 204B Jul 16 11:50 Desktop 
drwx------+ 3 riley staff 102B Jul 16 11:30 Documents 
drwx------+ 3 riley staff 102B Jul 16 11:30 Downloads 
drwx------@ 46 riley staff 1.5K Jul 16 11:38 Library 
drwx------+ 3 riley staff 102B Jul 16 11:30 Movies 
drwx------+ 3 riley staff 102B Jul 16 11:30 Music 
drwx------+ 3 riley staff 102B Jul 16 11:30 Pictures 
drwxr-xr-x+ 5 riley staff 170B Jul 16 11:30 Public
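A quick way to convince yourself that the order of the flags doesn't matter is to compare the outputs directly. The sketch below works in a throwaway directory created with mktemp -d. The file name sample.bin is only an example and is not part of the lesson data.

```bash
# Work in a temporary directory so nothing in your home directory changes.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Create a 2048-byte file of zero bytes (head -c works in GNU and BSD head).
head -c 2048 /dev/zero > sample.bin

# Three spellings of the same request: long listing plus human-readable sizes.
a=$(ls -lh)
b=$(ls -hl)
c=$(ls -l -h)

# If the order of the flags mattered, these strings would differ.
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "All three listings are identical"
echo "$a"
```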

...

We've now spent a great deal of time in our home directory. Let's go somewhere else. We can do that with the cd (change directory) command. (Note: by default, file and directory names are not case-sensitive on Windows and Mac, but they are on Linux.)

 

$ cd Desktop

Notice that the command didn't output anything. This means that it was carried out successfully. Let's check by using pwd:

 

$ pwd
/Users/riley/Desktop
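Besides checking with pwd, you can ask the shell directly whether the last command succeeded. The special parameter $? holds the exit status of the most recent command: 0 means success and any other number means failure. The sketch below changes into a temporary directory (created with mktemp -d) rather than your Desktop. The name no-such-directory is made up for the failing case.

```bash
# cd prints nothing when it succeeds...
dir=$(mktemp -d)
cd "$dir"
ok=$?
echo "cd exited with status $ok"        # 0 means success

# ...but prints an error and returns a non-zero status when it fails.
failed=0
cd no-such-directory || failed=$?
echo "cd exited with status $failed"    # non-zero means failure
pwd                                     # unchanged: still the temporary directory
```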

 

If something had gone wrong, however, the command would have told you. Let's see by trying to move into a (hopefully) non-existing directory:

 

$ cd "Evil plan to destroy the world"
bash: cd: Evil plan to destroy the world: No such file or directory

...

Notice that we surrounded the name with quotation marks. The shell separates the arguments to a command by spaces. To tell it that we mean 'one single thing called "Evil plan to destroy the world"' and not 'six different things', we put the name in (single or double) quotation marks.
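To see how the shell splits a command line into arguments, you can define a tiny helper function that reports how many arguments it received. The function name count_args is made up for this illustration. $# is the shell's standard count of the arguments passed to a function or script.

```bash
# Hypothetical helper: print how many arguments were passed to it.
count_args() {
  echo "$#"
}

count_args Evil plan to destroy the world      # unquoted: six separate arguments
count_args "Evil plan to destroy the world"    # double quotes: one argument
count_args 'Evil plan to destroy the world'    # single quotes: also one argument
```

A backslash before each space (Evil\ plan\ to\ destroy\ the\ world) would also keep the name together as one argument, but quotes are usually easier to read.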

We've now seen how we can go 'down' through our directory structure (that is, into more deeply nested directories). If we want to go back, we can type cd .. (cd followed by two dots). This moves us 'up' one directory, putting us back where we started. If we ever get completely lost, the command cd without any arguments will bring us right back to our home directory.

Previous Directory

To switch back and forth between the current directory and the previous one, use cd -.
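The three ways of moving back can be tried safely in a scratch directory. This sketch builds a nested directory (the names outer and inner are just examples) and records where you are after each step. Note that cd - also prints the path of the directory it switches to.

```bash
# Build a small nested structure in a temporary directory.
base=$(mktemp -d)
mkdir -p "$base/outer/inner"

cd "$base/outer/inner"
pwd                 # .../outer/inner

cd ..               # up one level
pwd                 # .../outer
after_up=$(pwd)

cd -                # back to the previous directory (and print its path)
after_dash=$(pwd)   # .../outer/inner

cd                  # no argument: back to your home directory
after_home=$(pwd)
```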

Try exploring

Move around the computer, get used to moving in and out of directories, see how different file types appear in the Unix shell. Be sure to use the pwd and cd commands, and the different flags for the ls command you learned so far. If you run Windows, also try typing 

...

Note: this command is for Mac and Linux users only; it does not work directly on Windows. If you use Windows, you can search for the shell command on http://man.he.net/ and view the associated manual page.

Find out about advanced ls commands

Find out, using the manual page, how to list the files in a directory ordered by their file size. Try it out in different directories. Can you combine it with the -l flag you learned before? Afterwards, find out how you can order a list of files by their last modification date. Try ordering files in different directories.

Answer

To order files in a directory by their file size, in combination with the -l flag:

...

2014-01-31_JA-africa.tsv   2014-02-02_JA-britain.tsv  gulliver.txt
2014-01-31_JA-america.tsv  33504-0.txt
2014-01_JA.tsv

Copying a file

Instead of moving a file, you might want to copy it (make a duplicate), for instance to make a backup before modifying the file with a script you're not quite sure about. Just like the mv command, the cp command takes two arguments: the name of the existing file and the name of the copy. How would you make a copy of the file gulliver.txt called gulliver-backup.txt? Try it!

 

Answer

cp gulliver.txt gulliver-backup.txt

Renaming a directory

Renaming a directory works in the same way as renaming a file. Try using the mv command to rename the firstdir directory to backup.

Answer

mv firstdir backup

Moving a file into a directory

If the last argument you give to the mv command is a directory rather than a file, the file given in the first argument will be moved into that directory. Try using the mv command to move the file gulliver-backup.txt into the backup directory.

Answer

mv gulliver-backup.txt backup
This would also work:
mv gulliver-backup.txt backup/gulliver-backup.txt
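The three answers above can be run end to end in a scratch directory. The sketch below creates stand-in versions of gulliver.txt and firstdir, so the real lesson files are not needed, and then lists the result.

```bash
# Set up stand-ins for the lesson's file and directory in a temporary location.
work=$(mktemp -d)
cd "$work"
echo "Travels into Several Remote Nations of the World" > gulliver.txt
mkdir firstdir

cp gulliver.txt gulliver-backup.txt   # duplicate the file
mv firstdir backup                    # rename the directory
mv gulliver-backup.txt backup         # move the copy into the renamed directory

ls                                    # lists backup and gulliver.txt
ls backup                             # lists gulliver-backup.txt
```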

...

(Regular expressions are not a feature of the shell, but some commands support them; we'll get back to that.)

...

  • The ? wildcard matches the regular expression . (a dot)

  • The * wildcard matches the regular expression .*
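To see the correspondence in practice, the sketch below creates a few empty files with made-up names and selects them twice: once with a shell wildcard and once with a grep regular expression. A regular expression matches a whole name only when it is anchored with ^ and $. A literal dot also has to be escaped as \. in a regular expression, because an unescaped dot matches any character.

```bash
dir=$(mktemp -d)
cd "$dir"
touch a1.txt ab.txt abc.txt b1.txt

# Shell wildcard (expanded by the shell): ? stands for exactly one character.
glob=$(ls a?.txt)

# Equivalent regular expression: . stands for exactly one character.
regex=$(ls | grep '^a.\.txt$')

echo "$glob"     # a1.txt and ab.txt
echo "$regex"    # the same two names

# The * wildcard corresponds to .* (zero or more of any character).
ls a*.txt                  # a1.txt, ab.txt and abc.txt
ls | grep '^a.*\.txt$'     # the same three names
```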

...

The echo command simply prints out the text you specify. Try it out: echo "Library Carpentry is awesome!". Interesting, isn't it?

You can also specify a variable: for instance, type NAME= followed by your name, with no spaces around the =. Then type echo "$NAME is a fantastic library carpentry student". What happens?

You can combine both text and normal shell commands using echo, for example the pwd command you learned earlier today. You do this by enclosing a shell command in $( and ), for instance $(pwd). Now, try out the following: echo "Finally, it is nice and sunny on" $(date). Note that the output of the date command is printed together with the text you specified. You can try the same with some of the other commands you have learned so far.

Why do you think the echo command is actually quite important in the shell environment?
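Put together, the variable and command-substitution steps from this exercise look roughly like the sketch below. Riley is a placeholder, so substitute your own name. There must be no spaces around = in an assignment: NAME = Riley would make the shell look for a command called NAME.

```bash
NAME=Riley                                  # assignment: no spaces around the = sign
message="$NAME is a fantastic library carpentry student"
echo "$message"
echo "You are in $(pwd)"                    # command substitution also works inside double quotes
echo "Finally, it is nice and sunny on" $(date)
```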

...

You may think there is not much value in such a basic command like echo. However, from the moment you start writing automated shell scripts, it becomes very useful. For instance, you often need to output text to the screen, such as the current status of a script.
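For example, a script that processes several files can use echo to report what it is doing. The sketch below is hypothetical: it creates two small text files in a temporary directory and announces each step as it counts their lines. The for loop simply repeats the indented commands once for each matching file.

```bash
#!/bin/sh
# Hypothetical example: use echo to report progress while a script runs.
dir=$(mktemp -d)
cd "$dir"
printf 'one\ntwo\n' > first.txt
printf 'three\n' > second.txt

echo "Starting at $(date)"
for f in *.txt; do
  echo "Counting lines in $f ..."
  n=$(wc -l < "$f" | tr -d ' ')   # tr strips the padding some wc versions add
  echo "$f has $n line(s)"
done
echo "Finished."
```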

...

And let's just check what files are in the directory and how large they are with ls -lh:

$ ls -lh

total 139M
-rw-r--r-- 1 riley staff 3.6M Jan 31 18:47 2014-01-31_JA-africa.tsv
-rw-r--r-- 1 riley staff 7.4M Jan 31 18:47 2014-01-31_JA-america.tsv
-rw-rw-r-- 1 riley staff 126M Jun 10  2015 2014-01_JA.tsv
-rw-r--r-- 1 riley staff 1.4M Jan 31 18:47 2014-02-02_JA-britain.tsv
-rw-r--r-- 1 riley staff 583K Feb  1 22:53 33504-0.txt
drwxr-xr-x 2 riley staff   68 Feb  2 00:58 backup
-rw-r--r-- 1 riley staff 598K Jan 31 18:47 gulliver.txt

...

In this episode we'll focus on the dataset 2014-01_JA.tsv, which contains journal article metadata, and the three .tsv files derived from it. Each of these three .tsv files contains all the records from 2014-01_JA.tsv whose 'Title' field includes a particular keyword, such as africa or america.

CSV and TSV Files

CSV (comma-separated values) is a common plain-text format for storing tabular data, where each record occupies one line and the values are separated by commas. TSV (tab-separated values) is the same except that values are separated by tabs rather than commas. Confusingly, "CSV" is sometimes used as an umbrella term for CSV, TSV and their variations. The simplicity of these formats makes them great for exchange and archiving. They are not bound to a specific program (unlike Excel files, say: there is no CSV program, just lots and lots of programs that support the format, including Excel), and you wouldn't have any problems opening a 40-year-old file today if you came across one

...

.
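As a small illustration, here is the same made-up record written once as TSV and once as CSV. printf turns \t into a tab character, and tr translates every tab into a comma. This naive conversion is only safe when no value itself contains a comma, because real CSV files protect such values with quoting.

```bash
dir=$(mktemp -d)
cd "$dir"

# A made-up table in TSV form: a header row and one record, tab-separated.
printf 'Title\tYear\nA Voyage to Lilliput\t1726\n' > sample.tsv

# Naive conversion to CSV: replace every tab with a comma.
tr '\t' ',' < sample.tsv > sample.csv

cat sample.tsv
cat sample.csv    # Title,Year then A Voyage to Lilliput,1726
```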

...

wc is the "word count" command: it can count the number of lines, words, bytes and characters in files (by default it reports lines, words and bytes). Since we love the wildcard operator, let's run the command wc *.tsv to get counts for all the .tsv files in the current directory (it takes a little time to complete):

...

The key is that any program that reads lines of text from standard input and writes lines of text to standard output can be combined with every other program that behaves this way as well. You can and should write your programs this way so that you and other people can put those programs into pipes to multiply their power.
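Here is a self-contained sketch of the kind of pipeline this lesson builds, using three made-up files instead of the real data. wc -l counts the lines in each file and adds a total line. sort -n orders those counts numerically, and head -n 1 keeps only the first line, which belongs to the shortest file.

```bash
dir=$(mktemp -d)
cd "$dir"
printf 'a\nb\nc\n' > long.tsv      # 3 lines
printf 'a\n'       > short.tsv     # 1 line
printf 'a\nb\n'    > medium.tsv    # 2 lines

# Each program reads the previous program's output from standard input.
shortest=$(wc -l *.tsv | sort -n | head -n 1)
echo "$shortest"                   # the line for short.tsv
```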

Adding another pipe

We have our wc -l *.tsv | sort -n | head -n 1 pipeline. What would happen if you piped this into cat? Try it!

Solution

The cat command just outputs whatever it gets as input, so you get exactly the same output from

...

Let's make a different pipeline. You want to find out how many files and directories there are in the current directory. Try to see if you can pipe the output from ls into wc to find the answer, or something close to the answer.

Solution

You get close with

$ ls -l | wc -l

...

The date command outputs the current date and time. Can you write the current date and time to a new file called logfile.txt? Then check the contents of the file.

Solution

$ date > logfile.txt
$ cat logfile.txt

...

While > writes to a file (replacing whatever was there), >> appends to the end of a file. Can you append the current date and time to the file logfile.txt?

Solution

$ date >> logfile.txt
$ cat logfile.txt
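The difference between > and >> is easiest to see by checking the file after each step. This sketch writes to a made-up file, demo.log, in a temporary directory rather than to your logfile.txt.

```bash
dir=$(mktemp -d)
cd "$dir"

echo "first"  >  demo.log    # create (or overwrite) the file
echo "second" >  demo.log    # overwrite again: only "second" remains
wc -l < demo.log             # one line

echo "third"  >> demo.log    # append: "second" is kept and "third" is added
wc -l < demo.log             # two lines
cat demo.log
```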

...

If you have time, you can also try to sort the results by piping them to sort, and/or explore the other flags of wc.

Solution

From man wc, you will see that there is a -w flag to print the number of words:

...

Search for all case-sensitive instances of a word you choose in all four .tsv files in this directory. Print your results to the shell.

...

$ grep hero *.tsv

Case sensitive search in select files

Search for all case-sensitive instances of a word you choose in the 'America' and 'Africa' .tsv files in this directory. Print your results to the shell.

Solution

$ grep hero *a.tsv

Count words (case sensitive)

...

Count all case-insensitive instances of that word in the 'America' and 'Africa' .tsv files in this directory. Print your results to the shell.

Solution

$ grep -ci hero *a.tsv
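Strictly speaking, grep -c counts matching lines, not matches, so a line that mentions the word twice is counted once. If you need the number of occurrences, one common approach is to pipe grep -o, which prints every match on its own line, into wc -l. This still works with several files such as *a.tsv, because grep -o then only prefixes each match with its file name. The sketch uses a made-up file instead of the lesson data.

```bash
dir=$(mktemp -d)
cd "$dir"
printf 'Hero meets hero\nno match here\nHERO\n' > sample.tsv

grep -ci hero sample.tsv          # two lines contain a match
grep -oi hero sample.tsv | wc -l  # three separate occurrences
```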

Case insensitive search in select files

...

  • Recall that the command ls -l | wc -l took us quite far, but the result was one too high because it included the "total" line in the line count.

  • With the knowledge of grep, can you figure out how to exclude the "total" line from the ls -l output?

  • Hint: You want to exclude any line starting with the text "total". The hat character (^) is used in regular expressions to indicate the start of a line.

Solution

To find any lines starting with "total", we would use:

...