On Linux, unlike the VAX, file names and directory names are case sensitive. You can use tab completion instead of typing the whole file or directory name. Press Tab twice to list the possible completions.
Working with files and directories from the Linux command line:
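
To see case sensitivity in practice, here is a minimal sketch that creates two files whose names differ only in case. The file names Report.txt and report.txt are made up for illustration, and the result assumes a case-sensitive filesystem, which is the usual default on Linux.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# On a case-sensitive filesystem these are two different files.
touch Report.txt report.txt

# Count the directory entries; two distinct files means the names did not collide.
file_count=$(ls | wc -l)
echo "files created: $file_count"
```

On a case-insensitive filesystem the second name would refer to the same file as the first, and the count would be 1.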
Command | Description |
---|---|
pwd | Show current directory |
cd ~ | Change to your home directory (same as your W: drive on Windows) |
cd dirname | Change directory to a subfolder named dirname |
cd .. | Go up a directory |
ls | List files and folders in the current directory (with color highlighting) |
dir | List files and folders in the current directory without color highlighting |
mkdir dirname | Make a new subfolder named dirname |
rmdir dirname | Delete the subfolder named dirname (folder must be empty) |
rm filename | Deletes the file named filename |
rm -rf dirname | Recursively delete the non-empty folder named dirname and all contained files and folders, without asking for confirmation |
mv oldname newname | Rename a file or directory from oldname to newname |
mv src destination | Move a file or directory from src to destination |
cp src destination | Copies a file from src to destination |
cp -av src destination | Recursively copies a folder from src to destination |
xdg-open . | Open a GUI file browser in the current directory |
xdg-open file | Open a file |
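
The commands in the table can be strung together into a short session. The sketch below is only an illustration: the names project, notes.txt, old_notes.txt and project_copy are hypothetical, and everything happens inside a temporary directory created with mktemp so that no real files are affected.

```shell
# Create a scratch directory so the commands below cannot touch real files.
scratch=$(mktemp -d)
cd "$scratch"

pwd                                 # show the current directory
mkdir project                       # make a new subfolder
cd project                          # change into it
echo "hello" > notes.txt            # create a small file to work with
cp notes.txt notes_backup.txt       # copy a file
mv notes_backup.txt old_notes.txt   # rename the copy
cd ..                               # go up a directory
cp -av project project_copy         # recursively copy the folder
rm project/notes.txt                # delete a single file
rm -rf project                      # delete the folder and everything in it
ls project_copy                     # list what is left in the copy
```

Because rm -rf does not ask for confirmation, it is worth running pwd and ls first to confirm what is about to be deleted.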
Command Line Tricks for Data Scientists
This is reproduced from: https://medium.com/@kadek/command-line-tricks-for-data-scientists-c98e0abe5da .
ICONV
File encodings can be tricky. For the most part, files these days are UTF-8 encoded. To understand some of the magic behind UTF-8, check out this excellent video. Nonetheless, there are times when we receive a file that isn't in this format, which can lead to some wonky attempts at swapping the encoding scheme. Here, iconv is a lifesaver. Iconv is a simple program that takes text in one encoding and outputs the text in another.

    # Convert -f (from) Latin-1 (ISO-8859-1)
    # -t (to) standard UTF-8
    iconv -f ISO-8859-1 -t UTF-8 < input.txt > output.txt

Useful options:
- iconv -l lists all known encodings
- iconv -c silently discards characters that cannot be converted
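
As a rough check that a conversion did what was intended, the sketch below builds a tiny Latin-1 file, converts it, and compares byte counts. The file names latin1.txt and utf8.txt are invented for the example, and the behaviour of -c shown here is what the common iconv implementations do; other implementations may differ.

```shell
tmpdir=$(mktemp -d)

# Write "café" in Latin-1: the byte 0xE9 (octal 351) is "é" in ISO-8859-1.
printf 'caf\351\n' > "$tmpdir/latin1.txt"

# Convert from ISO-8859-1 to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 < "$tmpdir/latin1.txt" > "$tmpdir/utf8.txt"

# In UTF-8, "é" takes two bytes (0xC3 0xA9), so the file grows by one byte.
latin1_bytes=$(wc -c < "$tmpdir/latin1.txt")
utf8_bytes=$(wc -c < "$tmpdir/utf8.txt")
echo "latin1: $latin1_bytes bytes, utf8: $utf8_bytes bytes"

# -c drops characters the target encoding cannot represent.
# ASCII has no "é", so that character is discarded from the output.
ascii_text=$(iconv -c -f ISO-8859-1 -t ASCII < "$tmpdir/latin1.txt")
echo "$ascii_text"
```

Note that -c loses data silently, so it is best kept for cases where the dropped characters are known not to matter.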
HEAD
If you are a frequent Pandas user, then head will be familiar. Often when dealing with new data, the first thing we want to do is get a sense of what exists. This leads to firing up Pandas, reading in the data, and then calling df.head(), which is strenuous, to say the least. Head, without any flags, prints the first 10 lines of a file. The true power of head lies in testing out cleaning operations. For instance, if we wanted to change the delimiter of a file from commas to pipes, one quick test would be: head mydata.csv | sed 's/,/|/g'
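
The same idea can be tried end to end on a throwaway file. This is a sketch under assumptions: the CSV contents (id, name, score columns) are made up for the example, and the sed expression replaces every comma, so it is only safe when no field contains a quoted comma.

```shell
sample=$(mktemp)

# Build a 12-line CSV: a header plus 11 data rows.
printf 'id,name,score\n' > "$sample"
for i in 1 2 3 4 5 6 7 8 9 10 11; do
    printf '%s,user%s,%s\n' "$i" "$i" "$((i * 10))" >> "$sample"
done

# head with no flags prints only the first 10 lines.
preview_lines=$(head "$sample" | wc -l)
echo "lines shown by default: $preview_lines"

# Try the delimiter swap on the first 3 lines before touching the full file.
preview=$(head -n 3 "$sample" | sed 's/,/|/g')
echo "$preview"
```

Once the preview looks right, the same sed expression can be run on the whole file with its output redirected to a new file.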