Survey:

  • your name
  • your program, your year
  • your next step: continue in research or move to industry?
  • what is your favorite programming language?

Expectations

  • active participation
    • ask questions
    • try things out, spend time debugging
    • help others with their practical roadblocks

I will not:

  • google things for you. For example, if you ask me how to install jupyter notebook, I will send you this link
  • teach you how to use a programming language
  • tell you how past students did the module. You also shouldn’t ask them for any kind of code or material. We will check for this.

Questions?

Some of the following content is from Prof. Ané’s STAT 679 and Wikibook.

The Unix Shell

GUI (graphical user interface): easy but not reproducible.

CLI (command line interface) or REPL (read-evaluate-print loop): steep learning curve but reproducible and powerful.

After you have a terminal open, type echo $SHELL. You may see this:

$ echo $SHELL
/bin/bash

or you may get this:

% echo $SHELL
/bin/zsh

The terminal is the “window” (more or less), while the shell is a program (or a programming language, like R and Python are). You can even write your own shell!

There are several shell programs, bash and zsh being the most common. For the commands in these notes they are almost equivalent.

A list of shell commands

  • directory structure, root is /
  • relative versus absolute paths
    • in your code and projects: use relative paths as much as possible: it makes your code more portable, for others, and for yourself if you re-locate your own project folder
  • shortcuts: ., .., ~, -
    • cd - is so useful!
  • tab completion to get program and file names
  • up/down arrows and ! to repeat commands
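The path shortcuts above can be tried out in a throwaway directory. This is only a sketch: the project/data and project/code folders are hypothetical names made up for the demonstration, and mktemp -d is used so nothing in your own files is touched.

```shell
# Hypothetical scratch layout, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/project/data" "$demo/project/code"
cd "$demo/project/code"

ls ../data          # relative path: up to project, then down into data
cd ../data          # .. is the parent directory
cd .                # .  is the current directory, so nothing changes
echo ~              # ~  expands to your home directory
cd /                # an absolute path starts at the root, /
cd - > /dev/null    # -  is the previous directory (cd - also prints it)
here=$(basename "$PWD")
echo "$here"        # data
```

Note that ../data keeps working if the whole project folder is moved somewhere else, which is the portability argument for relative paths.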
   
man      display the manual of a command
pwd      print working directory: where am I?
ls       list. many options, e.g. -a (all), -l (long), -lrt (long, reverse-sorted by time: newest last)
cd       change directory
mkdir    make directory
rm       remove (forever). -f to force, -i to ask interactively, -r recursively
mv       move (and rename). can overwrite existing files, unless -i to ask
cp       copy. also overwrites existing files, unless -i to ask
wc       word count: lines, words, bytes. -l, -w, -c (-m for characters)
cat      concatenate files and print them
less     page through a file, because “less is more”. q to quit
sort     sort lines. -n for numerical sorting
head     first 10 lines. -n 3 for first 3 lines (etc.)
tail     last 10 lines. -n 3 for last 3 lines, -n +30 for line 30 and up
echo     print
ps       provide information about the currently running processes
kill     kill a process manually
chmod    change file permissions
history  show the history of all previous commands, numbered
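A short session combining several of these commands might look like the following. It is a sketch under stated assumptions: the results folder and the scores.txt file (with made-up numbers) are hypothetical, and everything happens in a scratch directory.

```shell
work=$(mktemp -d)                 # hypothetical scratch directory
cd "$work"
mkdir results
printf '10\n9\n100\n2\n' > results/scores.txt   # made-up data

wc -l < results/scores.txt                      # how many lines?
sort results/scores.txt | head -n 1             # text sort puts "10" first
smallest=$(sort -n results/scores.txt | head -n 1)   # numeric sort: 2
largest=$(sort -n results/scores.txt | tail -n 1)    # numeric sort: 100
cp results/scores.txt results/backup.txt        # copy
mv results/backup.txt results/scores_old.txt    # rename
ls -l results
```

The difference between sort and sort -n is the usual surprise here: without -n the lines are compared as text, so "10" and "100" come before "2".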
   

The shell is an incredibly powerful tool:

  • The Unix shell can do great things, but power comes with danger: it’s unsafe!
$ rm -rf *

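deletes every file and subdirectory in the current directory, without asking (names starting with a dot are not matched by * and are left alone).

rm removes files and directories
-r does it recursively (enters each directory within each directory)
-f “forces” removal, without asking you for confirmation for each individual file
* in the shell matches every name in the current directory that does not start with a dot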

You should understand what the command is doing before executing it!
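One habit that helps, shown here as a sketch rather than a rule: let the shell show you what a pattern expands to before handing it to rm. The file names below are hypothetical and live in a scratch directory.

```shell
scratch=$(mktemp -d)          # hypothetical throwaway directory
cd "$scratch"
touch keep.csv old1.tmp old2.tmp

echo rm -- *.tmp              # dry run: prints the command after * is expanded
ls -- *.tmp                   # or list exactly what the pattern matches
rm -- *.tmp                   # delete only once the list looks right
remaining=$(ls)
echo "$remaining"             # keep.csv
```

The -- marks the end of options, so a file whose name starts with a dash is not mistaken for a flag. When working interactively, rm -i asks before each removal.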

The Unix world: one file after another

When you think of a computer, you usually come up with the following things:

  • The computer itself

  • The keyboard

  • The mouse

  • Your hard drive with your files and directories on it

  • The network connection leading to the Internet

However, everything in the whole (Unix) universe is a file. Your (data) files are files. Your directories are files. Your hard drive is a file. Your keyboard is a read-only file of infinite size. Your monitor is a write-only file of infinite size. Your network connection is a read/write file.
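You can see this idea at work with the device files under /dev. The two used below, /dev/null and /dev/urandom, are not mentioned above; they are assumed to exist, as they do on Linux and macOS.

```shell
echo "discarded" > /dev/null       # a write-only file that swallows everything
ls -l /dev/null                    # it shows up in a listing like any other file

# /dev/urandom is a read-only file that never ends: take just 16 bytes of it.
nbytes=$(head -c 16 /dev/urandom | wc -c | tr -d ' ')
echo "$nbytes"                     # 16
```

The same commands (echo, ls, head) that work on ordinary data files work on devices, which is the point of treating everything as a file.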

Streams: what goes between files

Everything in Unix is a file – except that which sits between files. Between files Unix defines a mechanism that allows data to move from one file to another: the stream.

Unix has the following three standard streams:

  • Standard in (stdin): the standard stream for input into a file.

  • Standard out (stdout): the standard stream for output out of a file.

  • Standard error (stderr): the standard stream for error output from a file.
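The three streams can be redirected separately. In this sketch, exists.txt and missing.txt are hypothetical names: the first is created, the second deliberately is not, so that ls has something to say on both stdout and stderr.

```shell
tmp=$(mktemp -d)              # hypothetical scratch directory
cd "$tmp"
touch exists.txt

# The listing goes to stdout (>), the complaint about missing.txt to stderr (2>).
# ls exits with an error status here, hence the || true.
ls exists.txt missing.txt > out.txt 2> err.txt || true

cat out.txt                   # exists.txt
cat err.txt                   # the error message from ls

# stdin: sort reads its input from out.txt via <
first=$(sort < out.txt | head -n 1)
```

Keeping stdout and stderr apart means error messages never end up mixed into your data files.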

Text streams

Text streams let you process data as it flows past, rather than holding it all in memory.

Example: concatenate two data files. Open both in an editor, copy one, and paste it into the other?

  • may not have enough memory
  • manual operation: error-prone and not reproducible.

Instead: print the files’ contents to the standard output stream, and redirect this stream from our terminal to the file where we wish to save the combined results.

cat a.txt b.txt c.txt
cat *.txt > combine.txt
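A few variations, sketched with hypothetical two-line files in a scratch directory. One caution about cat *.txt > combine.txt: once combine.txt exists, it matches *.txt itself on the next run, so the sketch below writes to a name the pattern does not match.

```shell
data=$(mktemp -d)                  # hypothetical scratch directory
cd "$data"
printf 'a1\na2\n' > a.txt          # made-up contents
printf 'b1\n' > b.txt

cat a.txt b.txt > combined.out     # > creates or overwrites the output file
cat a.txt b.txt | wc -l            # | streams into another program, no file needed
printf 'c1\n' >> combined.out      # >> appends instead of overwriting
total=$(wc -l < combined.out | tr -d ' ')
echo "$total"                      # 4
```

None of these steps loads a whole file into an editor: the data moves from file to file as a stream.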

Mock interview question

How can you generate a uniformly distributed random integer between 1 and 7 using only a six-sided die?