Working with files and directories from the Linux command line:

Command                 Description
pwd                     Show the current directory
cd ~                    Change to your home directory (same as your W: drive on Windows)
cd dirname              Change directory to a subfolder named dirname
cd ..                   Go up a directory
ls                      List files and folders in the current directory (with color highlighting)
dir                     List files and folders in the current directory without color highlighting
mkdir dirname           Make a new subfolder named dirname
rmdir dirname           Delete the subfolder named dirname (folder must be empty)
rm filename             Delete the file named filename
rm -rf dirname          Recursively delete the non-empty folder named dirname and all contained files and folders
mv oldname newname      Rename a file or directory from oldname to newname
mv src destination      Move a file or directory from src to destination
cp src destination      Copy a file from src to destination
cp -av src destination  Recursively copy a folder from src to destination, preserving file attributes and listing each file as it is copied
xdg-open .              Open a GUI file browser in the current directory
xdg-open file           Open a file with its default application
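To see how these commands fit together, here is a short practice session. It is only a sketch: the names project, notes.txt, backup.txt, old_notes.txt and project_copy are made up for illustration, and everything happens inside a temporary directory so no real files are touched.

```bash
# Work inside a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir project                 # make a new subfolder
cd project
pwd                           # show where we are
echo "hello" > notes.txt      # create a small file to work with
cp notes.txt backup.txt       # copy a file
mv backup.txt old_notes.txt   # rename the copy
cd ..                         # go back up one directory
cp -av project project_copy   # recursively copy the whole folder
rm project/old_notes.txt      # delete a single file
ls project project_copy       # list the contents of both folders
rm -rf project_copy           # delete the non-empty copy
```

Be careful with rm -rf outside of a scratch directory like this one: it deletes without asking and there is no recycle bin to recover from.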

 

Command Line Tricks for Data Scientists

This is reproduced from: https://medium.com/@kadek/command-line-tricks-for-data-scientists-c98e0abe5da .

ICONV

File encodings can be tricky. For the most part, files these days are UTF-8 encoded. To understand some of the magic behind UTF-8, check out this excellent video. Nonetheless, there are times when we receive a file that isn’t in this format, which can lead to some wonky attempts at swapping the encoding scheme. Here, iconv is a lifesaver: it is a simple program that takes text in one encoding and outputs it in another.

 
```bash
# Convert -f (from) Latin-1 (ISO-8859-1)
# -t (to) standard UTF-8
iconv -f ISO-8859-1 -t UTF-8 < input.txt > output.txt
```
 

Useful options:

  • iconv -l: list all known encodings
  • iconv -c: silently discard characters that cannot be converted
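A quick way to convince yourself that a conversion did something is to compare file sizes before and after. The sketch below builds its own tiny Latin-1 sample (the variable names sample, converted and ascii_only are made up for illustration) and then tries the -c option. Exact behaviour of -c, including its exit status, can differ between iconv implementations, so treat that last step as an illustration rather than a guarantee.

```bash
# Build a small Latin-1 sample: \351 is the single byte for "é" in ISO-8859-1.
sample=$(mktemp)
converted=$(mktemp)
printf 'caf\351\n' > "$sample"

# Convert from Latin-1 to UTF-8.
iconv -f ISO-8859-1 -t UTF-8 < "$sample" > "$converted"

# In UTF-8 the "é" takes two bytes (0xC3 0xA9), so the file grows by one byte.
wc -c < "$sample"
wc -c < "$converted"

# -c drops characters that have no equivalent in the target encoding
# ("é" does not exist in ASCII) instead of stopping with an error.
# Some iconv builds still report a non-zero status here, hence "|| true".
ascii_only=$(iconv -c -f UTF-8 -t ASCII < "$converted" || true)
echo "$ascii_only"
```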

HEAD

If you are a frequent Pandas user, then head will be familiar. Often when dealing with new data, the first thing we want to do is get a sense of what exists. This leads to firing up Pandas, reading in the data, and then calling df.head(): strenuous, to say the least. Without any flags, head prints the first 10 lines of a file. The true power of head lies in testing out cleaning operations. For instance, if we wanted to change the delimiter of a file from commas to pipes, one quick test would be: head mydata.csv | sed 's/,/|/g'.
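The same preview-then-apply idea can be sketched end to end. The CSV contents and the variable names csv and preview below are invented for illustration; the -n flag tells head how many lines to print instead of the default 10.

```bash
# Create a tiny hypothetical CSV to experiment on.
csv=$(mktemp)
printf 'id,name,score\n1,ana,9.5\n2,bo,7.0\n3,cy,8.2\n' > "$csv"

# Preview only the header and the first data row.
head -n 2 "$csv"

# Try the delimiter swap on that preview before touching the whole file.
preview=$(head -n 2 "$csv" | sed 's/,/|/g')
echo "$preview"

# Once the preview looks right, run the same sed over the full file.
sed 's/,/|/g' "$csv" > "${csv}.piped"
```

Testing on a couple of lines first is cheap, and it catches mistakes in the sed expression before they are written across a large file.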