
Introduction to Unix
Introduction to Unix, with an emphasis on Linux

12 Jun 2024 - 16:06 | Version 64

Linux is the most common Operating System (OS) you will find in scientific and technical computing. It is a Unix-like system. MacOS is closely related, so most of what is written here applies to it as well, since all Unixes look very similar as far as end users are concerned.

This documentation is meant to educate users of different levels, while organizing information by categories. As a result, each level is color-coded, and users may choose to focus on one color and skip the others. The overall goal is not to teach everything about every single command, but rather show what kind of tools are available.
Before we get started, some notes about this document:

A green background is the bare minimum any user should know: Unix for beginners

A yellow background is for regular Unix users who want better insight on a topic

Advanced users can learn new information about administration and programming

Commands are shown using the format command [option(s)] <mandatory argument(s)>
Anything in italics should be changed when you use it! The characters <, >, [, ] should be omitted

Key presses are shown as key combinations. Multiple keys pressed together are separated by -
^Key is the same as Ctrl-Key

Information specific to Linux is preceded by Tux, the Linux Penguin logo.

Information specific to MacOS is preceded by the Apple logo.

Flatiron-specific information appears in blue tip boxes

Unix, Linux?

Unix is a family of Operating Systems that have been around since 1969. They are by far the most common OS in use in servers, but also consumer computers (MacOS, WSL inside Windows), and even cell phones (Android, iOS).

What is Linux?

Linux was first developed in the early 1990s by Linus Torvalds, as a way to provide an open-source Unix-like operating system. It spread on Intel-like processors, as an alternative to Windows, and eventually became the de facto Operating System for servers. With time, it replaced vendor-specific OSes as well.

Linux distributions?

Linux is mostly used in the form of distributions, which contain the kernel, and different software packages. The main differences between them are how low-level functions are implemented (eg: how programs start on boot), and the package manager used. Related distributions can be grouped into families, with internal differences based on licensing model.
  • RedHat family: RedHat Enterprise Linux (RHEL), Fedora, CentOS, Rocky Linux
  • Debian family: Debian, Ubuntu, Mint
  • SUSE Linux Enterprise Server (SLES)

At FI, we use Rocky Linux 8 on the workstations and cluster nodes

The terminal

The most basic way to use Unix is through its terminal (or console). Do not be scared, programs and tools come with help!

A quick note about the colors/styles in this document for the examples in terminal windows:
[this@is the_prompt]$ This_is_a_command_you_would_type # Short description
And here is what Unix would respond to your command

Unix commands often look like: command <mandatory argument> [optional argument]
You will notice that some of these optional arguments have a short and long form
  • Long-form options are --long-option
    when a value is expected, the equal sign is used: --long-option=<value>
  • Usually short options take the form -X and can be aggregated, eg: -XYZ is -X -Y -Z
    when a value is expected, it is preceded by a space: -X <value>
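
As a small illustration of these forms, here is a sketch that runs the same commands with aggregated short options, separate short options, and a long option. It assumes GNU coreutils (the usual case on Linux) for the --lines=<value> form; the scratch directory and the file name numbers.txt are made up for the example.

```shell
# Work in a throwaway directory (hypothetical file name, for illustration only)
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/numbers.txt"

# Short options can be aggregated: -ar is the same as -a -r
aggregated=$(ls -ar "$workdir")
separated=$(ls -a -r "$workdir")

# A short option takes its value after a space...
short_form=$(head -n 2 "$workdir/numbers.txt")
# ...while the long form uses an equal sign (GNU coreutils assumed)
long_form=$(head --lines=2 "$workdir/numbers.txt")

echo "$short_form"
rm -r "$workdir"
```

Both ls calls list the same entries, and both head calls print the first two lines of the file.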

Connecting to a computer

In order to log into a computer, you will need a username and a password. Typically, you will see a prompt asking you for your username (login) and once entered, you will be prompted for a password. In Unix, every user has their own account (username/login). The administrator user (superuser) is called root.


Once you are logged in, you will see the prompt for your shell environment. This is where you will enter commands to launch programs, access files, …

The prompt will vary based on your shell/distribution, and will usually display your login, the name of the computer you are using, and the directory you are currently in.
In the examples, we will assume a user called user1 connected to the computer ws01. Different examples of prompt:
[user1@ws01 ~]$   Standard with bash on CentOS and Rocky
$   Simple version
user1@ws01:~$  On Ubuntu and Debian
ws01:~ user1$  zsh on MacOS

Common shells:
  • bash: (Bourne Again Shell) the standard for Linux
  • zsh: the standard for MacOS, compatible with bash
  • csh, tcsh: legacy shell, installed on Linux for backward compatibility

A shell can be closed using exit or logout. For GUI-based terminals, this will close the terminal.

Internally, what shells do is let users interact with the kernel of the Operating System.
They are interpreted programming languages, making them powerful tools if you take the time to study them carefully.

Personalizing the shell

alias <shortcut>='<command>' defines a shortcut to a command with its arguments.
Common ones are set up by default by most Unix installations: you see colors with ls because it is an alias to ls --color=auto
To see the list of aliases for your account, simply type the command alias

To test a different shell, you can launch it in your command line: bash, zsh etc
If you want to make the change permanent, you can use chsh -s <shell>

If you have an account at FI, you can change the default shell by using FIDO

You can change the way your prompt looks using the PS1 environment variable. There are many guides about that.
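
As a minimal sketch, assuming bash (other shells use different escape sequences), the following setting should produce a prompt similar to the [user1@ws01 ~]$ form used in this document:

```shell
# bash prompt escapes: \u = user name, \h = host name, \W = current directory, \$ = $ (or # for root)
PS1='[\u@\h \W]\$ '
```

The change only lasts for the current shell session.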

Tips about the terminal

  • When you start typing a command, you can press Tab, which will auto-complete with the known commands and sub-directories that start with these characters
  • You can copy-paste by selecting text with your mouse, and then middle-click. This usually works anywhere in Unix, including text editors
  • To see past commands, you can use the arrow keys ↑ and ↓
  • To see the complete list, you can use history. You will notice that each line has a number. To repeat an existing command, use ! with that number. Eg: !123
  • To find commands you have previously used, Ctrl-R will show you the most recent version, with auto-completion as you start typing

Command line editor cheat sheet
Most shortcuts are based on the location of the cursor

←        Move backward one character     →        Move forward one character
Ctrl-←   Move backward one word          Ctrl-→   Move forward one word
Home     Jump to the beginning of line   End      Jump to the end of line
Ctrl-L   Clear the terminal
Ctrl-K   Clear to the end of line        Ctrl-U   Clear from the beginning
Ctrl-W   Cut beginning of word
Ctrl-Y   Paste last cut                  Alt-Y    Cycle cut text
Ctrl-/   Undo (repeatable)

Command line history cheat sheet

↑         Previous command            ↓       Next command
history   Show the complete history   !<id>   Re-issue command id
Ctrl-R    Search a previous command

Getting help

Man pages

Standard Unix commands all come with man (manual) pages accessed with man <command>
You can then navigate using the arrow keys, or search using /<pattern>
[user1@ws01 ~]$ man rm
RM(1)                      User Commands                      RM(1)

       rm - remove files or directories
       rm [OPTION]... [FILE]...

The SEE ALSO section is extremely useful to find related commands

If you are programming in C, you can also use manual pages which detail APIs.
Eg: man fprintf

There are different categories of man pages. They correspond to the number in parentheses after the page name. The most common ones are:
  • (1) are commands and programs you can execute
  • (2) are for system calls
  • (3) correspond to functions from the C APIs

Command-line options

For most applications, there are often other forms of help from the command line, usually by passing the arguments help, -h, or --help. Example:

[user1@ws01 ~]$ gcc --help
Usage: gcc [options] file…
  -pass-exit-codes         Exit with highest error code from a phase.
  --help                   Display this information.

Files: Unix Filesystem tree

Just like for any other Operating System, files are organized inside directories (folders) in a tree structure where / is the root. In the following examples, we will use:

├── code
│   ├── bin
│   │   ├── hello_world
│   │   └── test_db
│   ├── bin2
│   │   └── hello_world
│   ├── include
│   │   └── database
│   │       └── db.h
│   ├── lib
│   │   └──
│   └── src
│       ├── database
│       │   ├── db.c
│       │   └── db.o
│       └── tests
│           ├── hello_world.c
│           ├── test_db.c
│           └── test_db.o
├── data
└── README.txt

Anywhere you are, there are special directories:
  • . is the current directory
  • .. is the parent directory
  • / is the root directory
  • ~ is your home directory (eg: /home/user1)

The formatted representation above is the output from the Linux command tree

Listing files

ls lists all the files and folders located at your current location (path) in the tree
[user1@ws01 folder]$ ls
code  data  README.txt

In Unix, files and directories starting with a dot . are hidden: they are not displayed by default by ls. To see them, use ls -a. They are often used to store settings. For instance in your home directory, you will find .bashrc, .ssh/

If you want to filter files by name, you can use wildcard patterns (also called globbing), which are simpler than the regular expressions used by grep. A common pattern is * (which matches any sequence of characters, including none). Example:
  • ls *.jpg shows only the files ending with the extension .jpg
  • ls *jwst* shows all the files which contain jwst in their name

ls -l gives details about the files: latest modification, permissions, size.

Tip: ls -ltr shows you the list of files in the current directory, sorted by modification time with the most recent last
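
To make the patterns above concrete, here is a short sketch run in a scratch directory. The file names are invented for the illustration.

```shell
start_dir=$PWD
workdir=$(mktemp -d)
cd "$workdir"
touch m31.jpg jwst_001.jpg jwst_002.png notes.txt .hidden_settings

jpg_files=$(ls *.jpg)      # only the names ending in .jpg
jwst_files=$(ls *jwst*)    # only the names containing jwst
visible=$(ls)              # hidden files are not listed
with_hidden=$(ls -a)       # also lists .hidden_settings (and . and ..)

echo "$jpg_files"
echo "$with_hidden"

cd "$start_dir"
rm -r "$workdir"
```

Note that the shell expands the pattern before ls runs, so ls simply receives the list of matching names.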

Seeing the current path

Most shell prompts show the current directory, but pwd provides the complete path starting from the root /
[user1@ws01 folder]$ pwd
/tmp/folder

In Unix, forward slashes / are used to separate sub-directories.

To change location, you can simply use cd (change directory). This can be absolute (starting at /) or relative.
You can go "up the tree" by using ..
cd - returns to the previous directory you were in.

[user1@ws01 folder]$ pwd
/tmp/folder
[user1@ws01 folder]$ cd code/ # Pressing tab here auto-completes to show the subfolders
bin/     bin2/    include/ lib/     src/     
[user1@ws01 folder]$ cd code/include/database/
[user1@ws01 database]$ pwd
/tmp/folder/code/include/database
[user1@ws01 database]$ cd ../../src/
[user1@ws01 src]$ pwd
/tmp/folder/code/src
[user1@ws01 src]$ cd /tmp/folder/data # An absolute path
[user1@ws01 data]$ pwd
/tmp/folder/data

Files and directories operations

Working with files

  • mv <filename> <new_name_or_destination> renames or moves a file
  • cp <source> <destination> copies a file
  • rm <filename> erases a file. Caution: there is no undo!
  • rmdir <dirname> removes an empty directory
  • rm -r <dirname> removes directories recursively. Use with caution
Operations taking the -r option will be performed recursively (on all the sub-directories)

  • cp -r <source> <destination> copies a directory recursively
  • touch <filename> creates an empty file, or changes its date to the current date and time

  • cp -p <source> <destination> copies a file while preserving the metadata (ownership and timestamps)
  • touch -d @<epoch_time> <filename> sets a file's date and time to the given value (in seconds since the epoch)
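
Here is a short sketch chaining these commands in a scratch directory, so that nothing important can be erased by mistake. All the file and directory names are made up for the example.

```shell
workdir=$(mktemp -d)
touch "$workdir/draft.txt"                         # create an empty file
cp "$workdir/draft.txt" "$workdir/backup.txt"      # copy it
mv "$workdir/draft.txt" "$workdir/final.txt"       # rename it
mkdir "$workdir/archive"
mv "$workdir/backup.txt" "$workdir/archive/"       # move it into a directory
cp -r "$workdir/archive" "$workdir/archive_copy"   # copy a directory recursively
rm "$workdir/final.txt"                            # erase a file: no undo!

listing=$(ls "$workdir")
echo "$listing"

rm -r "$workdir"                                   # remove the whole scratch tree
```

At the end, only archive and archive_copy are left, each containing backup.txt.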

File types

There are several standard extensions for file names in Unix:
  • .sh are shell scripts
  • .py are python scripts
  • .conf are configuration files
  • .o files are object files, compiled from a C, C++, Fortran program
  • .so are dynamically loaded libraries
  • .a are static libraries (they will be embedded in executables)
  • And of course, the standard .txt, .jpg, .png, …

The file command is used to give you more details about a file. For instance, it can be used to know if something is a 32-bit or 64-bit executable.
[user1@ws01 tests]$ file test_db.c
test_db.c: C source, ASCII text
[user1@ws01 tests]$ file test_db.o
test_db.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
[user1@ws01 tests]$ file test_db
test_db: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not stripped

Sometimes it can be useful to have links to files or directories. For instance:
  • To have a shortcut to a directory
  • When multiple versions of the same file exist, to create a default one
  • To share read-only data from a common source
To create a link, use the command ln -s <source> <destination> where destination can be a new name, or a directory (target location).

Example, and resulting effect as seen by ls:
[user1@ws01 folder]$ ln -s code/bin/test_db .
[user1@ws01 folder]$ ls -l test_db 
lrwxrwxrwx 1 user1 group1 16 Jul 24 22:10 test_db -> code/bin/test_db

Looking for and finding files

The find command can be used if you are looking for a file or a group of files. It is recursive and will search through sub-directories. Be careful as this can be taxing on the filesystem. Typical use cases:
  • find <directory> -name <pattern> looks in directory for files with the given pattern
  • It can be useful to look only for regular files, in this case, use the option -type f


[user1@ws01 folder]$ find . -name "*.o" # Find object files

find is a powerful command that can be used with the -exec optional argument to run code on the search result. For instance, this used to be the portable way to find a string contained in the files in a tree (the {} refer to an individual result from find ):
find <directory> -name <filename_pattern> -exec grep -n <needle> {} /dev/null \;

For example, this would be used to find all the include's in C files.

[u@w folder]$ find . -name "*.c" -exec grep -n include {} /dev/null \;
./code/src/tests/hello_world.c:1:#include <stdio.h>
./code/src/tests/test_db.c:1:#include <stdio.h>
./code/src/tests/test_db.c:2:#include "db.h"
./code/src/database/db.c:1:#include "db.h"

A more modern version would be:
grep -rn --include=<filename_pattern> <directory> -e '<pattern>'
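
As a sketch of both approaches on a tiny made-up tree (the directory layout and file contents are assumptions for the illustration, and grep -r with --include assumes GNU or BSD grep):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/src/tests"
printf '#include "db.h"\n' > "$workdir/src/tests/test_db.c"
printf 'placeholder\n' > "$workdir/src/tests/test_db.o"

# Regular files only (-type f) whose name ends in .c
c_files=$(find "$workdir" -type f -name "*.c")

# Recursive grep restricted to the same file name pattern
matches=$(grep -rn --include="*.c" -e 'include' "$workdir")

echo "$c_files"
echo "$matches"
rm -r "$workdir"
```

The pattern is quoted so that the shell passes it unchanged to find and grep instead of expanding it itself.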

If installed, the command locate <pattern> will look for a file by a portion of its name. It relies on a database that is rebuilt periodically, so recent files may not appear in the results!

File permissions

Permissions and groups

Unix is multi-user, and every file and directory has permissions (rights). Those are:
  • read: the ability to read a file
  • write: the ability to write/modify/erase a file
  • execute: the ability to execute a file (eg: program), or access the content of directories

These apply to three entities: user, group, other
  • user is the owner of the file
  • group is the group that file is assigned to
  • other is anyone else

Unix groups are used to share data between users. A user may be part of any number of groups. To see which groups you are part of, use the command groups

Seeing permissions

To see the permissions for a given set of files, use ls -l
The format you will see is:
      [ permissions ]
 ?    rwx   rwx   rwx     username     usergroup   size  date  filename
type  user group other  file's owner  file's group

If there is a -, it means the permission is not granted. Remember that to enter a directory, someone needs the execute permission on it, and the read permission to list its content (so at least r-x in practice)

[user1@ws01 folder]$ ls -l
total 0
drwxrwxr-x 7 user1 group1 66 Jun 24 12:23 code
drwxrwxr-x 5 user1 group1 47 Jun 23 14:38 data
-rw-r--r-- 1 user1 group1  0 Jun 24 17:03 README.txt

Changing permissions

chmod is used to grant (+) or remove (-) permissions:
chmod <entity_they_apply_to>[-|+]<permissions_to_change> <filename>
You can use a (all) instead of ugo to set permissions for everyone.

With the example above:
[user1@ws01 folder]$ chmod go-rwx data    # Only the user will be able to read that folder
[user1@ws01 folder]$ chmod g+w README.txt # members of that group will be able to modify that file
[user1@ws01 folder]$ ls -l
total 0
drwxrwxr-x 7 user1 group1 66 Jun 24 12:23 code
drwx------ 5 user1 group1 47 Jun 23 14:38 data
-rw-rw-r-- 1 user1 group1  0 Jun 24 17:03 README.txt

Permission masks

Instead of the human-readable permissions modifications like chmod ugo+rx, you will often see examples of scripts that use numbers, for instance chmod 775.

It sets the permissions based on the given number (the mask). The three digits are for user, group, and other, in that order. Each digit is based on the binary representation of read, write, and execute:
  • Read set to true (1) is 4 in decimal
  • Write set to true is 2 in decimal
  • Execute set to true is 1 in decimal
  • The total is the mask. For instance r-x would be 1*4 (r) + 0*2 (w) + 1*1 (x) = 5
  • chmod 775 will correspond to rwx rwx r-x
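
The arithmetic can be checked from the shell. This sketch builds the mask for rw- r-- --- and applies it to a temporary file; cut is only used here to keep the first 10 characters of the ls -l output.

```shell
# r = 4, w = 2, x = 1: add them up for each of user, group, other
user_digit=$((4 + 2 + 0))    # rw-
group_digit=$((4 + 0 + 0))   # r--
other_digit=$((0 + 0 + 0))   # ---
mask="${user_digit}${group_digit}${other_digit}"

scratch=$(mktemp)
chmod "$mask" "$scratch"
permissions=$(ls -l "$scratch" | cut -c1-10)   # file type + the 9 permission characters

echo "$mask $permissions"
rm "$scratch"
```

This prints 640 -rw-r-----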

Changing owner or group

You can change the owner or the group of a file or directory. Be careful: changing the owner might mean preventing yourself from changing that file anymore!
  • chown <username> <file_or_dir> will change the owner of the file or directory
  • chgrp <groupname> <file_or_dir> will change the group of the file or directory

It is often desirable to change a whole tree (directory and its subdirectories) and assign it to a new owner and group. For this purpose, use:
chown -R <username>:<group> <directory>

Applying default permissions

If you want to have a directory, where all the newly created files have the same group as the directory rather than the default, use chmod g+s.

[user1@ws01 folder]$ mkdir foo
[user1@ws01 folder]$ ls -ld foo # ls -d only shows the directory
drwxrwxr-x 2 user1 group1 6 Jul 24 17:20 foo
[user1@ws01 folder]$ chgrp group2 foo 
[user1@ws01 folder]$ ls -ld foo
drwxrwxr-x 2 user1 group2 6 Jul 24 17:20 foo
[user1@ws01 folder]$ cd foo
[user1@ws01 foo]$ touch bar1
[user1@ws01 foo]$ ls -l
total 0    # The new file's permissions are using the default group
-rw-rw-r-- 1 user1 group1 0 Jul 24 17:20 bar1
[user1@ws01 foo]$ cd ..
[user1@ws01 folder]$ chmod g+s foo
[user1@ws01 folder]$ ls -ld foo
drwxrwsr-x 1 user1 group2 6 Jul 24 17:20 foo
[user1@ws01 folder]$ cd foo
[user1@ws01 foo]$ touch bar2
[user1@ws01 foo]$ ls -l
total 0    # The new file's permissions are the same as the directory
-rw-rw-r-- 1 user1 group1 0 Jul 24 17:20 bar1
-rw-rw-r-- 1 user1 group2 0 Jul 24 17:20 bar2

Text files

Text files content

How to see the content of a file?
  • cat <filename(s)> displays the content of one or several files in the terminal (be careful with large files). It is very useful to concatenate multiple files together
  • more <filename> and less <filename> display the content, but let you navigate in the file (Enter, ↑, ↓, d, Pg up, Pg dn), and search using /<pattern>

  • head <filename> shows the beginning of a file
  • tail <filename> shows the end of a file
    tail -f <filename> keeps the file open and lets you see lines being added
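
By default, head and tail show 10 lines; the -n option changes that number. A small sketch, using a temporary file created just for the example:

```shell
scratch=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 > "$scratch"   # a 6-line file

first_lines=$(head -n 3 "$scratch")   # the first 3 lines
last_lines=$(tail -n 2 "$scratch")    # the last 2 lines

echo "$first_lines"
echo "$last_lines"
rm "$scratch"
```

This prints line 1 to line 3, followed by line 5 and line 6.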

Standard text editors

vi / vim

This is the text editor you will find in any Unix installation. It might look scary at first, but recent versions have greatly increased its usability.

To open or create a new file use vi <filename>

vi cheat sheet
i          insert mode                   Esc   exit current mode
:q         quit                          :q!   force quit
:w         write to file                 :x    is a shortcut for :wq
o          creates a new line            O     new line before current
I          insert at beginning of line   A     insert at end of line
J          merges two lines
:N         jump to line N                G     jump to last line
yy         copies current line           :yN   copies N lines
cc         cuts current line             :cN   cuts N lines
p          paste                         :u    undo
dd         deletes current line          :dN   deletes N lines
/pattern   searches pattern
/          next occurrence               ?     previous


Emacs is also installed on most systems. To launch it, use:
[user1@ws01 ~]$ emacs # To open an empty editor
[user1@ws01 ~]$ emacs file.ext # will open the specified file

emacs cheat sheet
Once inside the editor, all the commands are usually accessible using the Ctrl key.
Ctrl-x Ctrl-c   Exits the program   Ctrl-x Ctrl-f   Opens a file
Ctrl-x Ctrl-s   Saves a file        Ctrl-x Ctrl-w   Saves as a new file
Ctrl-x u        Undo


Nano is a simple text editor, with inline help at the bottom of the screen.
[user1@ws01 ~]$ nano # To create an empty buffer
[user1@ws01 ~]$ nano filename.ext # To open an existing file

Text files tools

Getting stats about a text file

wc is used to get information about a text file: number of lines, words, characters.
Common use cases:
  • wc -l <filename> counts the number of lines in a file
  • <command> | wc -l counts the number of results returned by command

Filtering text

grep <pattern> is a command that shows you the lines that match a pattern in a file or a string. This pattern can be a full-fledged regular expression.

Let's take an example with a file called animals.txt containing the following text:
I like cats: they are great animals, and their name is a Unix command.
Mine is called Cathy.
On the other hand, dogs are friendly and not as reclusive.
Overall, cats and dogs are good pets.

Now let's use grep on it:

[user1@ws01 input]$ grep cat animals.txt 
I like cats: they are great animals, and their name is a Unix command.
Overall, cats and dogs are good pets.
# If we use -i, the search becomes case-insensitive:
[user1@ws01 input]$ grep -i cat animals.txt
I like cats: they are great animals, and their name is a Unix command.
Mine is called Cathy.
Overall, cats and dogs are good pets.

When dealing with large files, using grep -n will show you the line numbers that have matched the search pattern.

It is often useful to know the context around the line you are looking for. In this case you can use the -C<N> flag, which will show the N lines before and the N lines after each match, in addition to the matching line itself.

On the other hand, sometimes you do not want certain lines (eg: in log files). In this case use grep -v <pattern_to_reject>

Finally, you can use regular expressions using grep -E <regular_expression>
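
The options described above can be tried on the same animals.txt text. This is only a sketch: the file is recreated in a scratch directory so that the example is self-contained.

```shell
workdir=$(mktemp -d)
cat > "$workdir/animals.txt" <<'EOF'
I like cats: they are great animals, and their name is a Unix command.
Mine is called Cathy.
On the other hand, dogs are friendly and not as reclusive.
Overall, cats and dogs are good pets.
EOF

numbered=$(grep -n dogs "$workdir/animals.txt")             # matches prefixed by their line number
rejected=$(grep -v cats "$workdir/animals.txt")             # lines that do NOT contain cats
context=$(grep -C1 Cathy "$workdir/animals.txt")            # the match, plus 1 line before and after
either=$(grep -E 'Cathy|reclusive' "$workdir/animals.txt")  # regular expression: one word or the other

echo "$numbered"
rm -r "$workdir"
```

Here grep -n reports lines 3 and 4, grep -v keeps lines 2 and 3, and grep -C1 shows lines 1 to 3.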

When looking for something in compressed text files, there is no need to uncompress them first. Use instead zgrep, bzgrep, xzgrep, or lzgrep which can be used for (respectively) .gz, .bz2, .xz, .lzma files

Replacing text

The sed command is used to edit a text quickly, especially to replace one string with another, using the syntax:
sed 's/<string1>/<string2>/' <filename>

Example with the file above:

[user1@ws01 input]$ sed 's/cat/bear/' animals.txt
I like bears: they are great animals, and their name is a Unix command.
Mine is called Cathy.
On the other hand, dogs are friendly and not as reclusive.
Overall, bears and dogs are good pets.
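
Note that with this syntax, sed only replaces the first match on each line. To replace every match, add the g (global) flag at the end of the expression, as in this small sketch:

```shell
first_only=$(echo "cat and cat" | sed 's/cat/bear/')     # only the first cat on the line
every_match=$(echo "cat and cat" | sed 's/cat/bear/g')   # g: all the matches on the line

echo "$first_only"
echo "$every_match"
```

This prints bear and cat, then bear and bear.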

If you are using vi, you can use the sed syntax as you are editing a file to replace a string by another. The command is (within vi):
:%s/<string1>/<string2>/gc

The tr command is used to delete or replace characters.

[user1@ws01 input]$ echo "tata" | tr a o # replaces a's with o's
toto
[user1@ws01 input]$ echo "tata" | tr -d a # deletes a's
tt

Sorting lines in a file

  • The sort <filenames> command sorts (alphabetically) concatenated text files line by line
  • uniq <filename> is used to remove duplicate lines from a single text file. It only compares adjacent lines, so the file should be sorted first
  • To combine sorting and duplicate removals, use sort -u <filenames>
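
A quick sketch of the difference, on a made-up list where the duplicates are not next to each other:

```shell
scratch=$(mktemp)
printf 'dog\ncat\ndog\nbird\ncat\n' > "$scratch"

adjacent_only=$(uniq "$scratch")       # removes nothing: no duplicate lines are adjacent
deduplicated=$(sort -u "$scratch")     # sorted, with duplicates removed
counted=$(sort "$scratch" | uniq -c)   # -c prefixes each line with its number of occurrences

echo "$deduplicated"
echo "$counted"
rm "$scratch"
```

The sort -u form prints bird, cat, dog, while uniq alone returns the 5 original lines.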

Extracting formatted data from a text file

When data is stored in a formatted text file, awk is used to parse and output it. By default, it assumes that the data is organized by rows and columns, where rows are lines, and columns are separated with spaces or tabs (this is configurable). In this case, if for instance we want to print the first and third columns of each row, we can use:
awk '{print $1, $3}' <filename>
Each column is identified by $<N> where N refers to the Nth column. $0 is the complete line.

Let's take an example, with the file users.dat containing some information about individuals:
Paul            Peterson        34     London
Julia           Smith           23   Boston
Mary-Jane       Allgood         54    Vancouver
Peter           Maxwell         45     Atlanta

Let's print their first names and age, comma-separated:
[user1@ws01 input]$ awk '{print $1","$3}' users.dat
Paul,34
Julia,23
Mary-Jane,54
Peter,45
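
When the columns are not separated by spaces or tabs, the separator can be set with the -F option. Here is a sketch with a comma-separated version of the same kind of data (the temporary file and its content are made up for the example):

```shell
scratch=$(mktemp)
printf 'Paul,Peterson,34,London\nJulia,Smith,23,Boston\n' > "$scratch"

# -F, tells awk that the columns are separated by commas
# In print, a comma between two values inserts a space in the output
names_and_cities=$(awk -F, '{print $1, $4}' "$scratch")

echo "$names_and_cities"
rm "$scratch"
```

This prints Paul London and Julia Boston, one per line.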

Finding differences between files

If you want to see how two text files differ, you can use the command:
diff <filename1> <filename2>

diff can also be used to generate patches (code fixes), which can be applied by other users to benefit from your changes.
  • Creating a patch: diff -Naur oldfile newfile > patchfile
  • Applying the patch: patch < patchfile in the directory of the file to be patched

Generic files tools

When downloading source code, you will often see an MD5 Checksum on the download page. This is to ensure that the file you obtained has not been corrupted. To calculate the checksum, use md5sum <filename>. This reads the whole file, so it can take a while on large files.

To compare two files byte by byte, use cmp <file1> <file2>. No output means the files are identical.
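
Besides its output, cmp also reports the result through its exit status, which is convenient in scripts. A minimal sketch with made-up files (the -s option keeps cmp silent):

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/original.txt"
cp "$workdir/original.txt" "$workdir/copy.txt"
printf 'alpha\ngamma\n' > "$workdir/changed.txt"

# cmp succeeds (exit status 0) only when the two files are identical
if cmp -s "$workdir/original.txt" "$workdir/copy.txt"; then copy_result=identical; else copy_result=different; fi
if cmp -s "$workdir/original.txt" "$workdir/changed.txt"; then changed_result=identical; else changed_result=different; fi

echo "copy: $copy_result, changed: $changed_result"
rm -r "$workdir"
```

This prints copy: identical, changed: different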


You might often see data or programs being distributed as .tar.gz or .tgz files (referred to as "tarballs"). Those are compressed (.gz, but can also be .bz2, .xz) archives (.tar).

An archive contains a whole tree structure (files and folders) and preserves the metadata (date, permissions), which makes it easy to distribute an exact copy of a work environment.

Opening an archive

You can uncompress and untar in a single instruction (the optional v stands for "verbose" and lists each file as it is processed. For large archives, you should omit it):
  • .tar.gz files: tar xzvf <archive_name>.tar.gz
  • .tar.bz2 files: tar xjvf <archive_name>.tar.bz2
  • .tar.xz files: tar xJvf <archive_name>.tar.xz

If you want to process in two steps, uncompress then untar:
  1. Uncompress files using gunzip, bunzip2, unxz
  2. To extract all the files from an archive, use tar xvf <archive_name>.tar

Example, after having copied the original .tar.gz file to another computer:
[user2@ws02 ~]$ ls
[user2@ws02 ~]$ tar xzvf folder.tar.gz 
[user2@ws02 ~]$ ls
folder      folder.tar.gz
[user2@ws02 ~]$ ls -l folder/code/bin
total 17    # Metadata preserved
-rwxrwxr-x 1 user2 group2 8120 Jun 24 11:39 hello_world
-rwx------ 1 user2 group2 8200 Jun 24 16:23 test_db

Creating an archive

The first step is to use tar cvf <archive_name>.tar <dir_to_archive> to create the archive.

The second step is to compress, for instance with gzip: gzip <archive_name>.tar which will generate <archive_name>.tar.gz

But you can use the shortcut that does both at the same time:
  • .tar.gz files: tar czvf <archive_name>.tar.gz <dir_to_archive>
  • .tar.bz2 files: tar cjvf <archive_name>.tar.bz2 <dir_to_archive>
  • .tar.xz files: tar cJvf <archive_name>.tar.xz <dir_to_archive>

With the tree presented earlier:
[user1@ws01 tmp]$ ls
[user1@ws01 tmp]$ tar czvf folder.tar.gz folder
[user1@ws01 tmp]$ ls
folder                folder.tar.gz   
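
Two more tar options are often handy: t lists the content of an archive without extracting it, and -C <directory> makes tar work from another directory. The sketch below does a full round trip in a scratch directory; the names are made up for the example.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/folder/data" "$workdir/restore"
echo "hello" > "$workdir/folder/README.txt"

# Create a compressed archive of folder, as seen from inside $workdir
tar czf "$workdir/folder.tar.gz" -C "$workdir" folder

# List the content without extracting anything (t instead of x)
archive_listing=$(tar tzf "$workdir/folder.tar.gz")

# Extract into another directory
tar xzf "$workdir/folder.tar.gz" -C "$workdir/restore"
restored=$(cat "$workdir/restore/folder/README.txt")

echo "$archive_listing"
echo "$restored"
rm -r "$workdir"
```

Listing an archive first is a good habit: it shows whether it will extract into a single directory or spread files in the current one.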

Variables and environment

Shell variables

Variables (used to store a value) in shell are set using the syntax <var_name>=<value>. There is no space on either side of the = sign! If the value contains multiple words, put double quotes " around the string.
You can read the value back using echo to print it:
[user1@ws01 ~]$ echo $t # This has not been set yet

[user1@ws01 ~]$ t="Hello friends"  # Assigns the value
[user1@ws01 ~]$ echo $t
Hello friends

When you want to print a string made of a variable followed by a suffix, it is necessary to delimit the variable name with curly brackets: ${var}.

For example:
[user1@ws01 ~]$ my_file=firstfile
[user1@ws01 ~]$ echo $my_file
firstfile
[user1@ws01 ~]$ echo $my_file_prev   # Unknown variable my_file_prev

[user1@ws01 ~]$ echo ${my_file}_prev # Will properly append
firstfile_prev

Environment variables

Shell variables are not seen by child processes (a program launched from the shell). For that purpose, Unix uses environment variables: they can be used by any program, and set from the command line. The common practice is to name them using SNAKE_CASE in capital letters. Some of them are predefined.
  • Setting an environment variable: use export ENV_VAR_NAME=<value>
  • Reading an environment variable: use echo $ENV_VAR_NAME
  • To unset an environment variable: unset ENV_VAR_NAME
  • env shows all the defined variables
  • printenv also shows the environment variables; set additionally lists the shell variables and functions

[user1@ws01 ~]$ echo $MY_SETTING # This has not been set yet

[user1@ws01 ~]$ export MY_SETTING=fast_computation  # Create the setting and give it a value
[user1@ws01 ~]$ echo $MY_SETTING
fast_computation
[user1@ws01 ~]$ export MY_SETTING=3.1416  # Another value
[user1@ws01 ~]$ echo $MY_SETTING
3.1416
[user1@ws01 ~]$ env | grep MY_SETTING
MY_SETTING=3.1416
[user1@ws01 ~]$ unset MY_SETTING
[user1@ws01 ~]$ echo $MY_SETTING # After it has been cleared
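
The difference between a shell variable and an environment variable can be observed by starting a child process. In this sketch, PLAIN_VAR and EXPORTED_VAR are hypothetical names, and sh -c is used to launch the child.

```shell
unset PLAIN_VAR EXPORTED_VAR
PLAIN_VAR=local_only          # shell variable: stays in this shell
export EXPORTED_VAR=shared    # environment variable: passed on to child processes

# sh -c starts a child process; ${VAR:-not_set} prints not_set when VAR is undefined
seen_by_child=$(sh -c 'echo "${PLAIN_VAR:-not_set} ${EXPORTED_VAR:-not_set}"')

echo "$seen_by_child"
```

This prints not_set shared: only the exported variable reaches the child process.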

Predefined environment variables

  • PWD: the current directory
  • HOSTNAME: name of the computer
  • USERNAME, USER: the username
  • HOME: the user's home folder, aka that's where you put your files

  • VISUAL, EDITOR: the default editor used when files need to be changed
  • DISPLAY: used when remote programs display on that computer screen. Usually :0
  • PATH: directories containing executables
  • LD_LIBRARY_PATH: directories containing dynamic libraries

If you are writing your own code in a compiled language, this is how compilers "know" where to find header files and libraries by default:
  • CPATH contains the list of directories containing header files known to the compilers (they do not need to be passed through -I)
  • LIBRARY_PATH contains the list of directories containing libraries known to the compilers (no need to use -L for these)

Permanent environment variables

You can make your settings permanent by putting your export commands in ~/.bashrc or ~/.zshrc (depending on your shell), along with other things like aliases.

More generally, if you want to keep some exports in a file which can later be recalled, you can use the source <filename> (or . <filename>) command which will execute the instructions contained in filename in the current shell, so that the environment variables it sets are kept.

[user1@ws01 folder]$ cat 
export MY_VERSION=4.5
export MY_GCC=/usr/bin/gcc
[user1@ws01 folder]$ bash # Executing the code
[user1@ws01 folder]$ echo "$MY_VERSION $MY_GCC"

[user1@ws01 folder]$ source # Source'ing it
[user1@ws01 folder]$ echo "$MY_GCC $MY_VERSION"
/usr/bin/gcc 4.5
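
The same comparison can be reproduced with the sketch below. The file name settings.sh is hypothetical, and sh is used instead of bash to run the file as a separate process.

```shell
workdir=$(mktemp -d)
# settings.sh is a made-up file name, used only for this illustration
printf 'export MY_VERSION=4.5\nexport MY_GCC=/usr/bin/gcc\n' > "$workdir/settings.sh"
unset MY_VERSION MY_GCC

sh "$workdir/settings.sh"                  # runs in a child shell: its exports are lost when it ends
after_running="${MY_VERSION:-not_set}"

. "$workdir/settings.sh"                   # same as source: runs in the current shell
after_sourcing="$MY_GCC $MY_VERSION"

echo "$after_running"
echo "$after_sourcing"
rm -r "$workdir"
```

This prints not_set, then /usr/bin/gcc 4.5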

If you look in your home folder on a Unix system, you will notice several files like .bashrc and .bash_profile which contain environment variables settings. The reason is that there are different types of shells.
  • .bash_profile is sourced from interactive login shells (you log on the machine)
  • .bashrc is sourced from interactive non-login shells (eg: you open a new terminal)
It is common to source ~/.bashrc from .bash_profile to make sure the behavior of both types of shells is similar.


A running instance of a program is called a process. You can monitor what is currently running, as well as control the different processes (provided you have the permissions!). Each process has a process identifier, called the pid.

Seeing all running processes

The ps command is used to check what processes are currently running.

Without arguments, the command shows only what was launched in your current terminal. To see more, you can then use:
  • ps au shows the processes attached to a terminal, for all the users
  • ps aux shows all the processes for all the users, including those not attached to a terminal

To see which processes use the most resources, use top, which will order them by CPU usage.

On modern Unixes, htop will show you how the processes are distributed among the different cores of a multicore computer.

To see the complete command line of a process, use ps p <PID> ww

Background and foreground processes

It can be useful to run a program in the background (aka non-interactively). For instance something that will take a lot of time and whose result can be redirected to a file. For that, simply launch your program followed by ampersand &. Eg: ./my_program &
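
In a script, it is often useful to keep track of a background process. This sketch uses two standard shell features not described above: $! (the pid of the last background process) and wait (which blocks until that process has finished). sleep 1 stands in for a long-running program.

```shell
sleep 1 &                 # & returns immediately while sleep keeps running
background_pid=$!         # pid of the process that was just started
echo "started process $background_pid"

wait "$background_pid"    # block until it has finished
background_status=$?      # its exit status (0 means success)
echo "finished with status $background_status"
```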

Modifying running processes

To put an interactive program in the background, you can pause it using Ctrl-Z. Then use the command bg. This will be similar to having launched it using &. If you need to bring it back to the foreground, use the command fg

To end a process running in the foreground, you can stop it using Ctrl-C.

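Programs can be stopped using kill <PID> where PID is the process ID (the PID column in ps). By default, this sends the SIGTERM signal, which asks the program to terminate (Ctrl-C sends the similar SIGINT signal). If the program is interactive and holds your terminal, simply open a new terminal first.
If that is not enough, you can force it with kill -9 <PID> (SIGKILL), which the program cannot ignore.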

  • To stop multiple instances of the same program at once, use killall <program_name>
  • For GUIs, if you want to kill a stuck program, use xkill and click on one of its windows
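
As a small illustration of launching, checking, and stopping a process, the sketch below uses sleep as a stand-in for a long-running program. It relies on $! (the PID of the last background command) and on wait, which are standard shell features but are not covered in the text above.

```shell
# Start a long-running command in the background; $! holds its PID.
sleep 60 &
pid=$!

# kill -0 sends no signal: it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "Process $pid is running"
fi

# Ask the process to terminate (the default signal of kill is SIGTERM).
kill "$pid"

# wait reports 128 + the signal number when a process was ended by a signal,
# so SIGTERM (signal 15) gives 143.
status=0
wait "$pid" || status=$?
echo "Exit status: $status"
```

A status above 128 is a quick way to tell that a program was ended by a signal rather than finishing on its own.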

Ctrl-D is interpreted as EOF (End Of File), and for shells will mean exit.

Programs always running even when logged out / a terminal is closed

Normally, when a program is running, it is stopped if you close the terminal that launched it, or log off the computer where it is executing.
If you want to make sure that it keeps running, you need to launch it using nohup. Typically this is used with &

Example: nohup ./my_long_program &

disown can be used if you have launched a program that you want to keep running after you terminate the shell it was launched from. The behavior is what would have happened if you had launched it using nohup.

Running programs with a lower priority

If you want to lower the priority of a process, you can prepend your command with nice, which means that if several programs are running at the same time, this one will run slower than those of higher priority. Example: nice ./heavy_duty_work

The /proc directory

If you look into /proc you will see that every single process running on your machine has a directory there, named by its pid. Inside each directory, you will find details about the process. For instance:
  • The cmdline file contains the complete command line which launched this process
  • The fd/ folder contains links to the files opened by the process
  • The io file contains stats about the input/output operations performed
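
For instance, a shell can inspect its own entry in /proc. This is a Linux-only sketch; it assumes /proc is mounted, which is the case on most Linux systems.

```shell
# $$ is the PID of the current shell, so /proc/$$ describes this very process.
pid=$$

# cmdline separates arguments with NUL bytes; turn them into spaces to read it.
cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline")
echo "Command line of $pid: $cmdline"

# Each entry in fd/ is a symbolic link to a file the process has open.
fd_count=$(ls "/proc/$pid/fd" | wc -l)
echo "Open file descriptors: $fd_count"
```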


Remote connection to another computer

Unix makes it very easy to remotely connect to another computer. This can be done in your local network, or to a computer on the other side of the planet.

In order to connect to a remote computer, you need to have an account there (aka a login), and use the ssh (secure shell) command.
You will then be prompted for your password on that system.
[user1@ws01 ~]$ ssh remote_username@remote_computer
remote_username@remote_computer's password: 
[remote_username@remote_computer ~]$ whoami

The ssh command can also be used in non-interactive mode, just to launch a command remotely on another computer. In this case, use the syntax
ssh <remote_username>@<remote_computer> <command>

You can set up password-less remote access using a public/private key: see the documentation for ssh-copy-id for details.

FI resources can be remotely accessed with these instructions

Copying files to/from another computer

In order to match cp, the network version scp uses similar arguments, but indicates the remote computer with :
  • scp <remote_user>@<remote_computer>:<remote_file> <local_path> will copy a file from a remote computer to the local one
  • scp <local_file> <remote_user>@<remote_computer>:<remote_path> will copy a file from the local computer to a remote one
  • The -r option provides recursive copy for directories

rsync is used to make sure that two files or directories are synchronized (it is available for both local or remote synchronizations). This will ensure that any new additions or modifications to the source directory are maintained (along with metadata). It is also able to restart if a transfer was previously interrupted.
  • rsync -a <source> <destination>
  • rsync -arz user2@server:/mnt/home/user2/data/ /home/user1/data --delete syncs the content of a remote directory from server to the local computer, recursively (-r), using compression (-z), and deleting removed files (--delete)
When synchronizing directories, the source ends with / while the destination does not!

Downloading files from the internet

wget <url> is used to download files from the internet, to the current directory. This works for both HTTP and FTP.

Checking network connectivity

ping <remote_computer> is used to check if a computer is responsive through the network. It can also be used to test that the current computer can access the network.
[user1@ws01 ~]$ ping
PING ( 56(84) bytes of data.
64 bytes from XXXX-YY-ZZZZ ( icmp_seq=1 ttl=115 time=1.04 ms
64 bytes from XXXX-YY-ZZZZ ( icmp_seq=2 ttl=115 time=0.992 ms
--- ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 0.930/0.977/1.035/0.040 ms

Note that sometimes administrators disable responses to ping for security reasons, so it might not always work!

Testing HTTP services ports on a server

NetCat (netcat, ncat, or nc) will try to connect to a remote computer (hostname) on the given port, using the TCP protocol, and report the result. This is useful to check if a service is open and accessible (for instance, not blocked by a firewall).
ncat -zv <hostname> <port>
-u can be used to check with the UDP protocol instead of TCP.
[user1@ws01 ~]$ ncat -zv 80  # Testing port 80
Ncat: Version 7.70 ( )
Ncat: Connected to # Success
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[user1@ws01 ~]$ ncat -zv 809 # Testing port 809
Ncat: Version 7.70 ( )
Ncat: Connection refused.              # Port closed
[user1@ws01 ~]$ ncat -zv 803 # Testing port 803
Ncat: Version 7.70 ( )
Ncat: Connection timed out.            # Port not used

Testing low level communication across the network

telnet <hostname> [port] is a low level tool to communicate with a remote computer.
It is often used to check the responses to simple commands, for instance for http servers.

How do Unixes find executables?

Binary locations

If you try and run a program that is located inside the folder you are currently in by just using its name, you will notice that you usually get the error command not found. However using ./ in front of it works. This is because Unix needs complete paths for executables. See the following example:

[user1@ws01 bin]$ ls # Let's check what files are in this directory
[user1@ws01 bin]$ hello_world # Launch the program
bash: hello_world: command not found
[user1@ws01 bin]$ ./hello_world
Hello World # It worked!

However, commands like ls or cat can be accessed from anywhere without giving their complete path. How is that possible? This is because of the PATH environment variable, which lists the directories where Unix looks for executables. You can see its current value, and prepend a new folder to it, using a colon : to separate the different directories. For instance, let's make our previous test program accessible.

[user1@ws01 bin]$ pwd
[user1@ws01 bin]$ ls
[user1@ws01 bin]$ hello_world
bash: hello_world: command not found
[user1@ws01 bin]$ echo $PATH
/usr/local/bin:/usr/bin:/bin  # aka /usr/local/bin, /usr/bin, /bin
[user1@ws01 bin]$ export PATH=/tmp/folder/code/bin:$PATH # Do not forget to put $PATH at the end, otherwise you will lose standard commands!
[user1@ws01 bin]$ echo $PATH
[user1@ws01 bin]$ hello_world 
Hello World

But what if you have several executables with the same name? This is where the directories order in PATH matters: Unix will use the first one it finds. That's why we always prepend to PATH rather than append to it. With our example, if we have a second executable called hello_world, but located in a different folder, and both of these folders are in PATH, we can use the command which to show which of the two will be called when not giving the full path!

[user1@ws01 code]$ ls
bin  bin2  include  lib  src
[user1@ws01 code]$ ls bin
[user1@ws01 code]$ ls bin2
[user1@ws01 code]$ which hello_world # Which version is used?
[user1@ws01 code]$ export PATH=/tmp/folder/code/bin2:$PATH  # Prepend the location of the new executable
[user1@ws01 code]$ echo $PATH
[user1@ws01 code]$ which hello_world # Which version is used?
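
The lookup order can be reproduced with a self-contained sketch. It creates two temporary directories (named bin and bin2 to mirror the example above) that each hold a small hello_world script; the messages they print are made up for the illustration. command -v is the shell built-in counterpart of which.

```shell
# Two throwaway directories, each holding its own hello_world script.
work=$(mktemp -d)
mkdir "$work/bin" "$work/bin2"
printf '#!/bin/sh\necho "Hello from bin"\n'  > "$work/bin/hello_world"
printf '#!/bin/sh\necho "Hello from bin2"\n' > "$work/bin2/hello_world"
chmod +x "$work/bin/hello_world" "$work/bin2/hello_world"

# Prepend bin: hello_world is now found without ./ in front of it.
export PATH="$work/bin:$PATH"
first=$(hello_world)

# Prepend bin2: it now comes first in PATH, so its copy wins.
export PATH="$work/bin2:$PATH"
hash -r                      # make bash forget any location it remembered
second=$(hello_world)

echo "$first"
echo "$second"
command -v hello_world       # prints the path that will actually be used
```

Remove the temporary directory with rm -rf "$work" once you are done experimenting.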


A program is often not just an executable: it also relies on a set of (dynamic) libraries. You might get the error: error while loading shared libraries .so when trying to run a program. This means that one of the dynamic libraries used by the executable you are trying to run cannot be found.

Just like executables are found using PATH, Unix uses the environment variable LD_LIBRARY_PATH to find libraries. And its use is similar to PATH: you can just prepend to it.

[user1@ws01 tests]$ ls
hello_world.c  test_db  test_db.c  test_db.o
[user1@ws01 tests]$ ./test_db # We want to run test_db
./test_db: error while loading shared libraries: cannot open shared object file: No such file or directory
[user1@ws01 tests]$ ls ../../lib/ # The library is here
[user1@ws01 tests]$ export LD_LIBRARY_PATH=/tmp/folder/code/lib/:$LD_LIBRARY_PATH
[user1@ws01 tests]$ ./test_db 
Initializing database
Database initialized

If you have multiple versions of the same library, the one that comes first in LD_LIBRARY_PATH will be used. You can check what dynamic libraries are used by an executable with ldd

[user1@ws01 tests]$ ldd test_db
	 =>  (0x00007fff22531000)
	 => /tmp/folder/code/lib/ (0x00007f063e27b000)
	 => /lib64/ (0x00007f063dead000)
	/lib64/ (0x00007f063e47d000)

If you are using a Modules Environment, what they do is actually set the PATH and LD_LIBRARY_PATH environment variables for the software you are using

Scripting: programming with shell

Sequences of commands in Unix

In Unix, different commands can be executed sequentially (one after the other) when typed on the same line.
  • You can use the semicolon ; to separate each command: a; b runs command a and then command b
  • You can use the logical AND && to conditionally run each command only if the previous command succeeds (the return code is 0). a && b runs command a, and then only runs command b if a succeeds
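
A short sketch of the difference, using the built-in commands true (always succeeds) and false (always fails) in place of real commands a and b:

```shell
# a ; b : b runs even though a (here: false) fails.
semicolon_result=$(false ; echo "b ran")

# a && b : b runs only when a succeeds.
and_after_failure="b skipped"
false && and_after_failure="b ran"

and_after_success="b skipped"
true && and_after_success="b ran"

echo "false ;  b -> $semicolon_result"
echo "false && b -> $and_after_failure"
echo "true  && b -> $and_after_success"
```

A typical use of && is a build step followed by a run step, where running makes no sense if the build failed.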


Redirection to files

You can redirect the output of a command to a file using the redirection operators > and >>
  • command > filename will create a new file called filename and the text returned by command will be written there
  • command >> filename will append the result of the command to filename, or create it if it does not exist

Commands and well-behaved programs actually print outputs and errors in two different places:
  • stdout is used for the results of the operation
  • stderr is used for error messages, warnings, information

If you use > or >> to redirect a command, you are actually only redirecting stdout, while on your terminal you see both. The following operators let you redirect both (these will overwrite the files: you can use >> to append):
  • > redirects stdout
  • 2> redirects stderr
  • &> redirects both of them, as they would appear in a terminal

A very common way to use redirection is to separately redirect to two files:
[user1@ws01 ~]$ command > command.out 2> command.err

Similarly, you can use the operator < to give an input file to a command. The content of the file is then sent to the stdin (standard input) of the command, which is where a program reads what you would otherwise type on the keyboard.

If you do not want to keep the output printed to the terminal, you can redirect it to /dev/null. This will be faster than writing to a file and then erasing it. Very useful if the output of a command is somewhat verbose.
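
The sketch below puts these operators together. The noisy function is a hypothetical stand-in for a real program: it writes one line to stdout and one to stderr (>&2 sends the output of echo to stderr). File names follow the command.out / command.err convention used above.

```shell
work=$(mktemp -d)

# Stand-in for a real program: one line on stdout, one on stderr.
noisy() {
    echo "result: 42"
    echo "warning: something looks odd" >&2
}

# Results and messages go to two separate files.
noisy > "$work/command.out" 2> "$work/command.err"

# Both streams go to a single file (bash syntax).
noisy &> "$work/command.log"

# Keep the results, discard the messages.
visible=$(noisy 2> /dev/null)

echo "command.out: $(cat "$work/command.out")"
echo "command.err: $(cat "$work/command.err")"
echo "kept after 2> /dev/null: $visible"
```

Remove the temporary directory with rm -rf "$work" when you are done.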

Chaining commands with pipe

It can be useful to chain commands, reusing the output of one as the input to another. This is done using the pipe operator |, which connects the output (stdout) of one command to the input (stdin) of the next one. The data flows from left to right.

[user1@ws01 ~]$ command1 | command2 | command3 | command4
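
As a concrete sketch, the pipeline below finds the most frequent word in a short list. The word list is made up for the example; sort, uniq and head are standard tools that are not described in this section.

```shell
# printf prints one word per line, sort groups identical lines together,
# uniq -c counts each group, sort -rn orders by count (largest first),
# and head -n 1 keeps only the first line.
top_word=$(printf '%s\n' apple pear apple fig pear apple |
    sort |
    uniq -c |
    sort -rn |
    head -n 1)
echo "$top_word"
```

Each command only sees the output of the one before it, so a pipeline can be built and checked one stage at a time.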

Storing an output in a variable

You can use backticks `<command>` to store the output of a command in a variable for later use. For example:

[user1@ws01 folder]$ my_var=`ls -l`
[user1@ws01 folder]$ echo $my_var # New lines are replaced by spaces!
total 0 drwxrwxr-x 7 user1 group1 66 Jun 24 12:23 code drwxrwxr-x 5 user1 group1 47 Jun 23 14:38 data -rw-r--r-- 1 user1 group1  0 Jun 24 17:03 README.txt

Similarly, the syntax $(<command>) can be used. It is especially useful as it can be nested. The innermost $( ) are executed first.

For example, to see all the executables located in the same directory as mpirun, one may write:
[user1@ws01 ~]$ ls $(dirname $(which mpirun))
                      mpicc  mpicxx   mpif77  mpifort  ompi-clean  ompi-server   ortecc      orted      orterun      oshc++  oshCC   oshfort      oshrun          shmemc++  shmemCC   shmemfort
mpic++                mpiCC  mpiexec  mpif90  mpirun   ompi_info   opal_wrapper  orte-clean  orte-info  orte-server  oshcc   oshcxx  oshmem_info  shmemcc   shmemcxx  shmemrun
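
Two details are worth a short sketch: quoting the variable keeps the new lines, and nested substitutions are resolved from the inside out. Here printf stands in for a real multi-line command, and /usr/local/bin is only a sample path.

```shell
# Capture a multi-line output.
my_var=$(printf 'line one\nline two\nline three')

# Unquoted: the shell splits the value into words, so new lines become spaces.
unquoted=$(echo $my_var)

# Quoted: the value is passed as is, new lines included.
quoted=$(echo "$my_var")

# Nesting: dirname runs first, then basename works on its result.
parent=$(basename "$(dirname /usr/local/bin)")

echo "$unquoted"
echo "$quoted"
echo "$parent"
```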

Simple arithmetic operations

Simple mathematical operations can be performed in shell using the syntax $((<operations>))
[user1@ws01 ~]$ echo $((3*6 - 1 + 5**2)) # ** is the exponentiation
42
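
Variables can be used by name inside $(( )), and the arithmetic is integer-only. A brief sketch, with made-up variable names and values:

```shell
files=7
per_batch=3

# Integer division drops the fractional part.
half=$(( files / 2 ))                                # 3

# % gives the remainder of the division.
leftover=$(( files % per_batch ))                    # 1

# A common idiom to round a division up instead of down.
batches=$(( (files + per_batch - 1) / per_batch ))   # 3

echo "half=$half leftover=$leftover batches=$batches"
```

For floating-point calculations, an external tool is needed, since the shell itself only handles integers.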

Flow control

Sometimes you want to perform more complex operations that repeat themselves, or act differently based on a test. Shells provide flow control for that.


Loops are performed using the sequence for do done

[user1@ws01 ~]$ for prime in 1 2 3 5 7 11
> do
> echo "A prime: $prime"
> done
A prime: 1
A prime: 2
A prime: 3
A prime: 5
A prime: 7
A prime: 11
[user1@ws01 ~]$ for vowel in a e i o u y; do echo "m$vowel"; done # as a single line

zsh comes with a powerful form of shell globbing (globs): a recursive pattern which can be used to iterate over sub-directories. To enable it with bash, use the command shopt -s globstar first.

The syntax to use globs is **, and a typical usage will look like this:
# This will iterate over all the .txt files
#   in the current directory and its subdirectories
for f in **/*.txt; do
    echo "$f"
done
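
To try recursive globbing safely, here is a self-contained sketch that builds a small temporary directory tree first. The file names are invented for the example, and it assumes bash version 4 or later, where the globstar option is available.

```shell
shopt -s globstar            # lets ** match any depth of sub-directories

# A small throwaway tree: three .txt files at different depths, one .log file.
work=$(mktemp -d)
mkdir -p "$work/a/b"
touch "$work/top.txt" "$work/a/mid.txt" "$work/a/b/deep.txt" "$work/a/skip.log"

count=0
for f in "$work"/**/*.txt; do
    echo "$f"
    count=$((count + 1))
done
echo "Found $count .txt files"

rm -rf "$work"
```

Quoting "$f" inside the loop keeps file names with spaces in one piece.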

Tests and branching

Tests are performed with if then elif else fi. Tests themselves are between double brackets [[ <test> ]]
if [[ $i -eq 0 ]]; then # EQual for numbers
   echo "Null value"
elif [[ $i -lt 5 ]]; then # Less Than
   echo "The number is smaller than 5"
elif [[ $i -eq 5 ]]; then
   echo "The number is 5"
else
   echo "The number is larger than 5"
fi

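To avoid cascading elif, shells provide an equivalent to the switch/case of other languages, with case in ) ;; esac
case "$animal_name" in
    "Rex" | "Fido")    # the names used as patterns are only illustrative
        echo "You might have a dog" ;;
    "Felix" | "Tom")
        echo "You might have a cat" ;;
    "Kermit")
        echo "You might have a frog" ;;
    *)                 # anything else
        echo "Sorry I cannot guess what animal you have" ;;
esac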

Script files

Instead of entering your commands one line at a time, you can create scripts, which are text files telling the shell to execute a sequence of operations. They usually have the extension .sh
Let's create a first script:

#!/bin/bash
# The first line above is called the shebang, and tells Unix how to interpret the instructions contained in that file
# Comments start with the pound/hashtag sign

echo "Hello World!"

To launch such a script, after making it executable (chmod a+x <script_file>), you can either give its path (./<script_file> if it is in the current folder) or prepend its directory path to PATH
[user1@ws01 ~]$ ./
Hello World!

Shells pre-define several variables that are very useful in scripts:
  • $0, $1, … are the elements passed on the command line, where $0 is the script
  • $# is the number of elements in the command line
  • $* and $@ are the arguments
  • $? is the return value of the latest issued command (0 if no error)
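
These variables can be tried without writing a script file, because inside a shell function $1, $# and $@ behave as they do inside a script (only $0 differs: it keeps naming the script or the shell). The describe_args function below is a hypothetical name used for this sketch.

```shell
describe_args() {
    echo "count: $#"
    echo "first: $1"
    for arg in "$@"; do      # "$@" keeps each argument intact, spaces included
        echo "arg: [$arg]"
    done
}

output=$(describe_args alpha "two words" gamma)
echo "$output"

# $? holds the return code of the last command: 0 means success.
true
ok_status=$?

# ( exit 3 ) is a tiny command that fails with return code 3.
failed_status=0
( exit 3 ) || failed_status=$?

echo "ok_status=$ok_status failed_status=$failed_status"
```

Quoting "$@" matters: without the quotes, "two words" would be split into two separate arguments.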

Scheduling automated operations

crontab is used to schedule the launch of commands periodically using the cron daemon, for instance running a back-up every day at midnight. Each user has their own cron jobs.
  • crontab -e lets you edit the schedule
  • crontab -l shows you the schedule
Editing the file is done with vi (unless you have set the environment variable VISUAL), and uses the following format (a * means all the possible values):

# You can put comments using hash/pound
# Minute  Hour(0-23)  Day of Month  Month  Day of week   Command
# Running a backup every day at 11pm
0           23           *          *         *    /home/user1/
# Running a sync every Monday at noon
0           12           *          *         1      /home/user1/

Graphical interfaces

X servers

You will often hear references to "X" or "X11" when talking about graphical interfaces in Unix. This is because in order for GUI-based programs to be used, an X-server has to be running on the local machine you are using (which is the default for desktop computers).

Some useful commands to check if X is running:
[user1@ws01 ~]$ xterm  # Opens a new terminal
[user1@ws01 ~]$ xclock # Displays a clock

Running graphical interfaces remotely

If you want to remotely run graphical based applications on another computer, but with the GUIs displayed on your local screen, you should use ssh -X or ssh -Y. This will ensure that the remote computer knows where to display the GUIs.

For example:
[user1@ws01 ~]$ ssh -Y remote_username@remote_computer
remote_username@remote_computer's password: 
[remote_username@remote_computer ~]$ xterm # this will launch a new terminal which will be opened on your local computer, but whose commands are executed on the remote one

Some notes about Unix administration


Daemon: a program that runs as a background process
Service: a program that gets requests from other processes. Usually implemented as a daemon. Eg: sshd

Common directories

The usual directories you will see in the Unix root filesystem are the following:
├── boot  # This is where the Linux kernel lives
├── dev   # Used to access device drivers
├── etc   # Used to store configuration files
├── home  # home directories of users on the local machine
│   ├── user1
│   ├── user2
│   └── user3
├── mnt   # Filesystems mounted from other sources: network-based, USB …
├── opt   # Contains software packages
├── proc  # Information about the system and running processes
├── root  # The files of the administrator
├── tmp   # The scratch area on a local disk
├── usr          # files that come with the Operating System
│   ├── bin      # programs and commands
│   ├── include  # include files when programming
│   ├── lib      # libraries
│   ├── lib64    # 64-bit libraries
│   └── local    # more bin, include, lib, lib64, added by local admin
└── var
    └── log      # Most programs running as services will write logs here

System information

The following commands are useful to know more about your system.

uname gives you the name of your operating system, and is often used to get details about which kernel is running with uname -r

  • cat /etc/os-release gives you the exact version of the Linux distribution that is currently running
  • lscpu and cat /proc/cpuinfo give you information about the processors in your system: number, brand, model, capabilities
  • cat /proc/meminfo contains the information about the memory
  • lspci shows you the complete list of devices in your computer
  • lsmod shows you the list of drivers loaded by the OS
  • dmesg shows you the messages logged by the kernel, starting with the most recent boot of the system (timestamp 0). Useful to debug hardware problems.
    As root, you can find that information in /var/log/messages
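
A short sketch combining two of these sources in a script. It assumes a Linux system; PRETTY_NAME is one of the standard fields of /etc/os-release, and the file is read in a subshell so that its variables do not leak into the current shell.

```shell
kernel=$(uname -r)
echo "Kernel release: $kernel"

# /etc/os-release is a list of NAME=value lines, so the shell can source it.
if [ -r /etc/os-release ]; then
    distro=$( . /etc/os-release; echo "$PRETTY_NAME" )
    echo "Distribution: $distro"
fi
```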

Filesystems information

To know about disk space usage, you can use the following commands:
  • df gives you the storage usage per filesystem
  • du -h -d 1 gives you the disk usage of the current directory and of each of its immediate sub-directories
    du -h -d 1 | sort -h will order them by size

mount is used to show the different filesystems (storage: "hard drives") which are currently made available to the users. They can consist of physical drives, logical drives (eg: ramdisk), or network-mounted storage.

mount is also used to modify mounting points, but these changes will not survive a reboot.

/etc/fstab contains the list of mounting points that will be automatically mounted when the Operating System starts, aka these settings are permanent.

Package managers

Different distributions have different package managers. They are the programs in charge of installing and updating new applications, libraries, tools, and solving dependencies between them.

  • The RedHat family uses yum
    yum info <package_name> is used to query if a package exists or is installed (can use regular expressions)
    yum install <package_name> installs a package. Need to be root
  • Debian and related distributions use apt
    apt info <package_name> is used to query packages
    apt install <package_name> installs a package. Need to be root

At the low level, RedHat packages are found inside .rpm archives. Try and avoid working with these directly as manually finding dependencies can be tedious.

If a software package is only distributed as an RPM and you need to extract what is inside, you can use the rpm2cpio command piped into cpio

[user1@ws01 ~]$ rpm2cpio ./packagename-version.x86_64.rpm | cpio -idmv

Similarly, Debian distributions rely on .deb archives, which you can extract using the ar command first, then tar.

[user1@ws01 deb_extraction]$ ls
[user1@ws01 deb_extraction]$ ar x fahclient.deb 
[user1@ws01 deb_extraction]$ ls -l
total 6388
-rw-r--r-- 1 user1 group1    2372 Jul 22 13:58 control.tar.xz
-rw-r--r-- 1 user1 group1 3263500 Jul 22 13:58 data.tar.xz
-rw-r--r-- 1 user1 group1       4 Jul 22 13:58 debian-binary
-rw-rw-r-- 1 user1 group1 3266064 Oct 23  2020 fahclient.deb
# data.tar.xz contains the files we are interested in
[user1@ws01 deb_extraction]$ tar xvf data.tar.xz 

Running programs as a different user

Switching users

su - [username] can be used to change to another user. If no username is specified, root is assumed. You will be prompted to enter the password of the target account, unless you are root.
The optional argument - is used to specify you want to use a login shell with their environment (aka it will clear up your own environment first)

Launching a program as another user

sudo <command> can be used to launch a command as another user. This is often used to give admin access to a user for only a subset of commands, without giving them the root password.

For users who are authorized to run as root, a common practice is to use sudo su - <username> to become user username.

Launching the same commands on multiple computers

pdsh (parallel commands to remote hosts) is used to send the same command to different remote computers. It is often used with computers that have logical names, for instance using ranges of numbers to identify them.

The following would query the computers ws0, ws1, ws2, ws3 about the root filesystem /. Note that this operation is done in parallel, so results are coming back without following the alphanumerical order!

[root@workstation ~]$ pdsh -w ws[0-3] df -H /
ws2: Filesystem      Size  Used Avail Use% Mounted on
ws2: /dev/sda1        33G   18G   15G  55% /
ws1: Filesystem      Size  Used Avail Use% Mounted on
ws1: /dev/sda1        33G   18G   15G  55% /
ws3: Filesystem      Size  Used Avail Use% Mounted on
ws3: /dev/sdb1        33G   18G   15G  55% /
ws0: Filesystem      Size  Used Avail Use% Mounted on
ws0: /dev/sda1        33G   18G   15G  55% /

Sciware

Do you want to know more? Sciware-23 was about the topic of "Command line and shell interaction". See the slides here