Linux is the most common Operating System (OS) you will find in scientific and technical computing. It is a version of Unix. MacOS is closely related, so most of what is written here applies to it as well, since all Unixes are very similar as far as end users are concerned.
This documentation is meant to educate users of different levels, while organizing information by categories. As a result, each level is color-coded, and users may choose to focus on one color and skip the others. The overall goal is not to teach everything about every single command, but rather show what kind of tools are available.
Before we get started, some notes about this document:
A green background is the bare minimum any user should know: Unix for beginners
A yellow background is for regular Unix users who want better insight on a topic
Advanced users can learn new information about administration and programming
Commands are shown using the format
command [option(s)] <mandatory argument(s)>
Anything in italics should be changed when you use it! The characters <, >, [, ] should be omitted
Key presses are shown as key combinations. Multiple keys pressed together are separated by -
^Key is the same as Ctrl-Key
Information specific to Linux is preceded by Tux, the Linux Penguin logo.
Information specific to MacOS is preceded by the Apple logo.
Flatiron-specific information appears in blue tip boxes
Unix, Linux?
Unix is a family of Operating Systems that have been around since 1969. They are by far the most common OS in use in servers, but also consumer computers (MacOS, WSL inside Windows), and even cell phones (Android, iOS).
What is Linux?
Linux was first developed in the early 1990s by Linus Torvalds, as a way to provide an open-source Unix-like operating system. It spread on Intel-compatible processors, as an alternative to Windows, and eventually became the de facto Operating System for servers. With time, it replaced vendor-specific OSes as well.
Linux distributions?
At FI, we use Rocky Linux 8 on the workstations and cluster nodes
The terminal
The most basic way to use Unix is through its terminal (or console). Do not be scared, programs and tools come with help!
A quick note about the colors/styles in this document for the examples in terminal windows:
[this@is the_prompt]$ This_is_a_command_you_would_type # Short description
And here is what Unix would respond to your command
Connecting to a computer
In order to log into a computer, you will need a username and a password. Typically, you will see a prompt asking you for your username (login) and once entered, you will be prompted for a password. In Unix, every user has their own account (username/login). The administrator user (superuser) is called root.
Shell
Once you are logged in, you will see the prompt for your shell environment. This is where you will enter commands to launch programs, access files, …
The prompt will vary based on your shell/distribution, and will usually display your login, the name of the computer you are using, and the directory you are currently in.
In the examples, we will assume a user called user1 connected to the computer ws01. Different examples of prompts:
[user1@ws01 ~]$ Standard with bash on CentOS and Rocky
$ Simple version
user1@ws01:~$ On Ubuntu and Debian
ws01:~ user1$ zsh on MacOS
A shell can be closed using exit or logout. For GUI-based terminals, this will close the terminal.
Internally, what shells do is let users interact with the kernel of the Operating System.
They are interpreted programming languages, making them powerful tools if you take the time to study them carefully.
Personalizing the shell
alias <shortcut>='<command>' creates a shortcut to a command with arguments.
Common ones are set up by default by most Unix installations: you see colors with ls because it is an alias to ls --color=auto
To see the list of aliases for your account, simply type the command alias
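For instance, here is a hypothetical shortcut for a long listing sorted by modification time (add it to your shell start-up file, eg ~/.bashrc, to make it permanent):
[user1@ws01 ~]$ alias lt='ls -lt --color=auto'   # define the shortcut
[user1@ws01 ~]$ lt code                          # now behaves like ls -lt --color=auto code
[user1@ws01 ~]$ alias lt                         # show what it expands to
alias lt='ls -lt --color=auto'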
To test a different shell, you can launch it from your command line: bash, zsh, etc.
If you want to make the change permanent, you can use chsh -s <shell>
If you have an account at FI, you can change the default shell by using FIDO
You can change the way your prompt looks using the PS1 environment variable. There are many guides about that.
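A minimal sketch for bash, where the escape sequences \u, \h, and \W expand to the username, hostname, and current directory:
[user1@ws01 ~]$ export PS1='[\u@\h \W]\$ '   # the standard prompt shown in these examples
[user1@ws01 ~]$ export PS1='\u:\w\$ '        # username and full path instead
user1:~$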
Tips about the terminal
- When you start typing a command, you can press Tab, which will auto-complete with the known commands and sub-directories that start with these characters
- You can copy-paste by selecting text with your mouse, and then middle-click. This usually works anywhere in Unix, including text editors
- To see past commands, you can use the arrow keys ↑ and ↓
- To see the complete list, you can use history. You will notice that each line has a number. To repeat an existing command, use ! with that number. Eg: !123
- To find commands you have previously used, Ctrl-R will show you the most recent matching command, with auto-completion as you start typing
Getting help
Man pages
Standard Unix commands all come with man (manual) pages, accessed with man <command>
You can then navigate using the arrow keys, or search using /<pattern>
[user1@ws01 ~]$ man rm
RM(1) User Commands RM(1)
NAME
rm - remove files or directories
SYNOPSIS
rm [OPTION]... [FILE]...
DESCRIPTION
…
OPTIONS
…
SEE ALSO
…
The SEE ALSO section is extremely useful to find related commands
If you are programming in C, you can also use manual pages which detail APIs.
Eg: man fprintf
There are different categories of man pages. They correspond to the number in parentheses after the page name. The most common ones are:
- (1) are commands and programs you can execute
- (2) are for system calls
- (3) correspond to functions from the C APIs
Command-line options
For most applications, there are often other forms of help from the command line, usually by passing the arguments help, -h, or --help. Example:
[user1@ws01 ~]$ gcc --help
Usage: gcc [options] file…
Options:
-pass-exit-codes Exit with highest error code from a phase.
--help Display this information.
…
Files: Unix Filesystem tree
Just like in any other Operating System, files are organized inside directories (folders) in a tree structure where / is the root. In the following examples, we will use:
/tmp
├── code
│ ├── bin
│ │ ├── hello_world
│ │ └── test_db
│ ├── bin2
│ │ └── hello_world
│ ├── include
│ │ └── database
│ │ └── db.h
│ ├── lib
│ │ └── libdb_api.so
│ └── src
│ ├── database
│ │ ├── db.c
│ │ └── db.o
│ └── tests
│ ├── hello_world.c
│ ├── test_db.c
│ └── test_db.o
├── data
└── README.txt
Anywhere you are, there are special directories:
- . is the current directory
- .. is the parent directory
- / is the root directory
- ~ is your home directory (eg: /home/user1)
The formatted representation above is the output from the Linux command tree
Listing files
ls lists all the files and folders located at your current location (path) in the tree
[user1@ws01 folder]$ ls
code data README.txt
Seeing the current path
Most shell prompts show the current directory, but pwd provides the complete path starting from the root /
[user1@ws01 folder]$ pwd
/tmp/folder
In Unix, forward slashes / are used to delimit sub-directories.
Navigating the tree
To change location, you can simply use cd (change directory). Paths can be absolute (starting at /) or relative.
You can go "up the tree" by using ..
cd - returns to the previous directory you were in.
Examples:
[user1@ws01 folder]$ pwd
/tmp/folder
[user1@ws01 folder]$ cd code/ # Pressing tab here auto-completes to show the subfolders
bin/ bin2/ include/ lib/ src/
[user1@ws01 folder]$ cd code/include/database/
[user1@ws01 database]$ pwd
/tmp/folder/code/include/database
[user1@ws01 database]$ cd ../../src/
[user1@ws01 src]$ pwd
/tmp/folder/code/src
[user1@ws01 src]$ cd /tmp/folder/data # An absolute path
[user1@ws01 data]$ pwd
/tmp/folder/data
Files and directories operations
Working with files
- mv <filename> <new_name_or_destination> renames or moves a file
- cp <source> <destination> copies a file
- rm <filename> erases a file. Caution: there is no undo!
- rmdir <dirname> removes an empty directory
- rm -r <dirname> removes directories recursively. Use with caution
Operations taking the -r option will be performed recursively (on all the sub-directories)
- cp -p <source> <destination> copies a file while preserving the metadata (ownership and timestamps)
- touch -d @<epoch_time> <filename> sets a file's date and time to the given value
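A minimal sketch of these commands, using the README.txt file from the example tree (the other file names are hypothetical):
[user1@ws01 folder]$ cp README.txt README.bak          # make a copy
[user1@ws01 folder]$ mv README.bak notes.txt           # rename it
[user1@ws01 folder]$ cp -p notes.txt data/notes.txt    # copy it, preserving ownership and timestamps
[user1@ws01 folder]$ touch -d @1700000000 notes.txt    # set its date from an epoch time
[user1@ws01 folder]$ rm notes.txt data/notes.txt       # remove both copies (no undo!)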
File types
There are several standard extensions for file names in Unix:
- .sh are shell scripts
- .py are python scripts
- .conf are configuration files
- .o files are object files, compiled from C, C++, or Fortran sources
- .so are dynamically loaded libraries
- .a are statically linked libraries (they will be embedded in executables)
- And of course, the standard .txt, .jpg, .png, …
Links
Looking for and finding files
find is a powerful command that can be used with the -exec optional argument to run code on the search results. For instance, this used to be the portable way to find a string contained in the files of a tree (the {} refers to an individual result from find):
find <directory> -name <filename_pattern> -exec grep -n <needle> {} /dev/null \;
For example, this would be used to find all the include's in C files.
[u@w folder]$ find . -name "*.c" -exec grep -n include {} /dev/null \;
./code/src/tests/hello_world.c:1:#include <stdio.h>
./code/src/tests/test_db.c:1:#include <stdio.h>
./code/src/tests/test_db.c:2:#include "db.h"
./code/src/database/db.c:1:#include "db.h"
A more modern version would be:
grep -rn --include=<filename_pattern> <directory> -e '<pattern>'
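With the example tree, the grep version gives essentially the same matches as the find command above (the order of the results may differ):
[u@w folder]$ grep -rn --include="*.c" . -e include
./code/src/database/db.c:1:#include "db.h"
./code/src/tests/hello_world.c:1:#include <stdio.h>
./code/src/tests/test_db.c:1:#include <stdio.h>
./code/src/tests/test_db.c:2:#include "db.h"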
If installed, the command locate <pattern> will look for a file by a portion of its name. This relies on a database that is rebuilt periodically, so recent files will not appear in the results!
File permissions
Permissions and groups
Unix is multi-user, and every file and directory has permissions (rights). Those are:
- read: the ability to read a file
- write: the ability to write/modify/erase a file
- execute: the ability to execute a file (eg: program), or access the content of directories
These apply to three entities: user, group, other
- user is the owner of the file
- group is the group that file is assigned to
- other is anyone else
Unix groups are used to share data between users. A user may be part of any number of groups. To see which groups you are part of, use the command groups
Seeing permissions
To see the permissions for a given set of files, use ls -l
The format you will see is:
?rwxrwxrwx  username  usergroup  size  date  filename
where the first character is the file type, the next nine characters are the rwx permissions for user, group, and other, followed by the file's owner and the file's group.
If there is a -, it means the permission is not granted. Remember that for someone to be allowed to enter a directory, they need both read and execute permissions (at least r-x)
[user1@ws01 folder]$ ls -l
total 0
drwxrwxr-x 7 user1 group1 66 Jun 24 12:23 code
drwxrwxr-x 5 user1 group1 47 Jun 23 14:38 data
-rw-r--r-- 1 user1 group1 0 Jun 24 17:03 README.txt
Changing permissions
chmod is used to grant (+) or remove (-) permissions:
chmod <entity_they_apply_to>[-|+]<permissions_to_change> <filename>
You can use a (all) instead of ugo to set permissions for everyone.
With the example above:
[user1@ws01 folder]$ chmod go-rwx data # Only the user will be able to read that folder
[user1@ws01 folder]$ chmod g+w README.txt # members of that group will be able to modify that file
[user1@ws01 folder]$ ls -l
total 0
drwxrwxr-x 7 user1 group1 66 Jun 24 12:23 code
drwx------ 5 user1 group1 47 Jun 23 14:38 data
-rw-rw-r-- 1 user1 group1 0 Jun 24 17:03 README.txt
Permission masks
Changing owner or group
It is often desirable to change a whole tree (directory and its subdirectories) and assign it to a new owner and group. For this purpose, use:
chown -R <username>:<group> <directory>
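Note that changing the owner of files usually requires root privileges; regular users can change the group of their own files (to another group they belong to) with chgrp. A hypothetical example:
[root@ws01 tmp]# chown -R user2:group2 /tmp/folder/data   # as root: new owner and group for the whole tree
[user1@ws01 folder]$ chgrp -R group2 data                 # as a regular user: change only the group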
Applying default permissions
If you want a directory where all the newly created files get the same group as the directory rather than your default group, use chmod g+s.
[user1@ws01 folder]$ mkdir foo
[user1@ws01 folder]$ ls -ld foo # ls -d only shows the directory
drwxrwxr-x 2 user1 group1 6 Jul 24 17:20 foo
[user1@ws01 folder]$ chgrp group2 foo
[user1@ws01 folder]$ ls -ld foo
drwxrwxr-x 2 user1 group2 6 Jul 24 17:20 foo
[user1@ws01 folder]$ cd foo
[user1@ws01 foo]$ touch bar1
[user1@ws01 foo]$ ls -l
total 0 # The new file's permissions are using the default group
-rw-rw-r-- 1 user1 group1 0 Jul 24 17:20 bar1
[user1@ws01 foo]$ cd ..
[user1@ws01 folder]$ chmod g+s foo
[user1@ws01 folder]$ ls -ld foo
drwxrwsr-x 1 user1 group2 6 Jul 24 17:20 foo
[user1@ws01 folder]$ cd foo
[user1@ws01 foo]$ touch bar2
[user1@ws01 foo]$ ls -l
total 0 # The new file's permissions are the same as the directory
-rw-rw-r-- 1 user1 group1 0 Jul 24 17:20 bar1
-rw-rw-r-- 1 user1 group2 0 Jul 24 17:20 bar2
Text files
Text files content
How to see the content of a file?
- cat <filename(s)> displays the content of one or several files in the terminal (be careful with large files); it is also very useful to concatenate multiple files together
- more <filename> and less <filename> display the content, but let you navigate in the file (Enter, ↑, ↓, d, Pg up, Pg dn), and search using /<pattern>
Standard text editors
vi / vim
This is the text editor you will find in any Unix installation. It might look scary at first, but recent versions have greatly increased its usability.
To open or create a new file use
vi <filename>
vi cheat sheet
i insert mode
Esc exit current mode
:q quit, :q! force quit
:w write to file
:x is a shortcut for :wq
o opens a new line below the current one
O opens a new line above the current one
I insert at beginning of line
A insert at end of line
J merges two lines
:N jump to line N
G jump to last line
yy copies (yanks) the current line, Nyy copies N lines
dd deletes (cuts) the current line, Ndd deletes N lines
cc replaces the current line (deletes it and enters insert mode)
p paste
u undo
/pattern searches for pattern, n next occurrence, ? searches backwards
emacs
Emacs should also be installed on any system. To launch it, use:
[user1@ws01 ~]$ emacs # To open an empty editor
[user1@ws01 ~]$ emacs file.ext # will open the specified file
emacs cheat sheet
Once inside the editor, all the commands are usually accessible using the Ctrl key.
Ctrl-x Ctrl-c exits the program
Ctrl-x Ctrl-f opens a file
Ctrl-x Ctrl-s saves the file
Ctrl-x Ctrl-w saves as a new file
Ctrl-x u undo
nano
Nano is a simple text editor, with inline help at the bottom of the screen.
[user1@ws01 ~]$ nano # To create an empty buffer
[user1@ws01 ~]$ nano filename.ext # To open an existing file
Text files tools
Getting stats about a text file
Filtering text
When dealing with large files, using grep -n will show you the line numbers that matched the search pattern.
It is often useful to know the context around the line you are looking for; in this case you can use the -C<N> flag, which will show N lines of context before and after each matching line.
On the other hand, sometimes you do not want certain lines (eg: in log files); in this case use grep -v <pattern_to_reject>
Finally, you can use regular expressions with grep -E <regular_expression>
When looking for something in compressed text files, there is no need to uncompress them first. Use instead zgrep, bzgrep, xzgrep, or lzgrep, which can be used for (respectively) .gz, .bz2, .xz, and .lz files
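A minimal sketch of these flags, with a hypothetical log file:
[user1@ws01 logs]$ grep -n -C2 ERROR run.log      # matching lines with 2 lines of context before and after
[user1@ws01 logs]$ grep -v DEBUG run.log          # everything except the DEBUG lines
[user1@ws01 logs]$ zgrep -c ERROR run.log.gz      # count matches in a compressed copy, without uncompressing it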
Replacing text
If you are using vi, you can use the sed syntax as you are editing a file to replace a string with another. The command is (within vi):
:%s/<string1>/<string2>/gc
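The same substitution syntax can be used with sed directly from the command line; a minimal sketch with hypothetical file names:
[user1@ws01 ~]$ sed 's/<string1>/<string2>/g' input.txt > output.txt   # write the modified copy to a new file
[user1@ws01 ~]$ sed -i 's/<string1>/<string2>/g' input.txt             # GNU sed: modify the file in place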
The tr command is used to delete or replace characters.
Example:
[user1@ws01 input]$ echo "tata" | tr a o # replaces a's with o's
toto
[user1@ws01 input]$ echo "tata" | tr -d a # deletes a's
tt
Sorting lines in a file
Finding differences between files
If you want to see how two text files differ, you can use the command:
diff <filename1> <filename2>
diff can also be used to generate patches (code fixes), which can be applied by other users to benefit from your changes.
- Creating a patch: diff -Naur oldfile newfile > patchfile
- Applying the patch: patch < patchfile in the directory of the file to be patched
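A minimal sketch of the whole workflow, using the db.c file from the example tree:
[user1@ws01 database]$ cp db.c db.c.orig                   # keep a pristine copy
[user1@ws01 database]$ vi db.c                             # make your changes
[user1@ws01 database]$ diff -Naur db.c.orig db.c > db.patch
[user2@ws02 database]$ patch < db.patch                    # another user applies it to their copy of db.c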
When downloading source code, you will often see an MD5 checksum on the download page. This is to ensure that the file you obtained has not been corrupted. To calculate the checksum, use md5sum <filename>
Note that this is a costly operation for large files.
To compare two files byte by byte, use
cmp <file1> <file2>
No output means the files are identical.
Archives
You might often see data or programs being distributed as .tar.gz or .tgz files (referred to as "tarballs"). Those are compressed (.gz, but it can also be .bz2 or .xz) archives (.tar).
An archive contains a whole tree structure (files and folders) and, when created, conserves the metadata (dates, permissions), making it easy to distribute an exact copy of a work environment.
Opening an archive
You can uncompress and untar in a single instruction (the optional v stands for "verbose"; for large archives, you should omit it):
- .tar.gz files: tar xzvf <archive_name>.tar.gz
- .tar.bz2 files: tar xjvf <archive_name>.tar.bz2
- .tar.xz files: tar xJvf <archive_name>.tar.xz
If you prefer to proceed in two steps, uncompress then untar:
- Uncompress the file using gunzip, bunzip2, or unxz
- To extract all the files from the archive, use tar xvf <archive_name>.tar
Example, after having copied the original .tar.gz file to another computer:
[user2@ws02 ~]$ ls
folder.tar.gz
[user2@ws02 ~]$ tar xzvf folder.tar.gz
folder/
folder/code/
folder/code/bin2/
folder/code/bin2/hello_world
folder/code/src/
folder/code/src/tests/
folder/code/src/tests/hello_world.c
…
[user2@ws02 ~]$ ls
folder folder.tar.gz
[user2@ws02 ~]$ ls -l folder/code/bin
total 17 # Metadata preserved
-rwxrwxr-x 1 user2 group2 8120 Jun 24 11:39 hello_world
-rwx------ 1 user2 group2 8200 Jun 24 16:23 test_db
Creating an archive
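To create an archive, the same tar command is used with the c (create) option instead of x; for instance, to bundle the example folder:
[user1@ws01 tmp]$ tar czvf folder.tar.gz folder/    # gzip-compressed
[user1@ws01 tmp]$ tar cjvf folder.tar.bz2 folder/   # bzip2-compressed
[user1@ws01 tmp]$ tar cJvf folder.tar.xz folder/    # xz-compressed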
Variables and environment
Shell variables
Variables (used to store a value) in shell are set using the syntax <var_name>=<value>
There is no space on either side of the = sign! If the value contains multiple words, you can use double quotes " around the string.
You can read the value back using echo to print it:
[user1@ws01 ~]$ echo $t # This has not been set yet
[user1@ws01 ~]$ t="Hello friends" # Assigns the value
[user1@ws01 ~]$ echo $t
Hello friends
Environment variables
Shell variables are not seen by child processes (a program launched from the shell). For that purpose, Unix uses environment variables: they can be used by any program, and set from the command line. The common practice is to name them using SNAKE_CASE in capital letters. Some of them are predefined.
- Setting an environment variable: export ENV_VAR_NAME=<value>
- Reading an environment variable: echo $ENV_VAR_NAME
- Unsetting an environment variable: unset ENV_VAR_NAME
- env shows all the defined environment variables
- printenv shows all the defined environment variables, or the value of a single one when given its name
Example:
[user1@ws01 ~]$ echo $MY_SETTING # This has not been set yet
[user1@ws01 ~]$ export MY_SETTING=fast_computation # Create the setting and give it a value
[user1@ws01 ~]$ echo $MY_SETTING
fast_computation
[user1@ws01 ~]$ export MY_SETTING=3.1416 # Another value
[user1@ws01 ~]$ echo $MY_SETTING
3.1416
[user1@ws01 ~]$ env | grep MY_SETTING
MY_SETTING=3.1416
[user1@ws01 ~]$ unset MY_SETTING
[user1@ws01 ~]$ echo $MY_SETTING # After it has been cleared
Predefined environment variables
- PWD: the current directory
- HOSTNAME: name of the computer
- USERNAME, USER: the username
- HOME: the user's home folder, aka that's where you put your files
If you are writing your own code in a compiled language, this is how compilers "know" where to find header files and libraries by default:
- CPATH contains the list of directories containing header files known to the compilers (they do not need to be passed through -I)
- LIBRARY_PATH contains the list of directories containing libraries known to the compilers (no need to use -L for these)
Permanent environment variables
If you look in your home folder on a Unix system, you will notice several files like .bashrc and .bash_profile which contain environment variable settings. This is because there are different types of shells.
- .bash_profile is sourced by interactive login shells (you log on the machine)
- .bashrc is sourced by interactive non-login shells (eg: you open a new terminal)
It is common to source ~/.bashrc from .bash_profile to make sure the behavior of both types of shells is similar.
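A minimal sketch of what that looks like in ~/.bash_profile:
# In ~/.bash_profile: also load the settings from ~/.bashrc
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi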
Processes
The instance of a running program is called a process. You can monitor what is currently running, as well as control the different processes (permissions permitting!). Each process has a process identifier, called the pid.
Seeing all running processes
The ps command is used to check what processes are currently running.
Without arguments, the command shows only what was launched in your current terminal.
To see more, you can then use:
- ps ux shows all the processes for the current user
- ps aux shows all the processes for all the users
To see which processes use the most resources, use top, which will order them by CPU usage.
On modern Unixes, htop will show you how the processes are distributed among the different cores of a multicore computer.
To see the complete command line of a process, use ps p <PID> ww
Background and foreground processes
It can be useful to run a program in the background (aka non-interactively), for instance something that will take a lot of time and whose output can be redirected to a file. For that, simply launch your program followed by an ampersand &. Eg: ./my_program &
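A minimal sketch with a hypothetical program, redirecting its output to a log file so nothing is lost while it runs in the background:
[user1@ws01 ~]$ ./my_program > my_program.log 2>&1 &   # stdout and stderr both go to the log file
[1] 12345                                              # the shell prints the job number and the pid (example values)
[user1@ws01 ~]$ jobs                                   # list the background jobs of this shell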
Modifying running processes
To put an interactive program in the background, you can pause it using Ctrl-Z, then use the command bg. This will be similar to having launched it using &. If you need to bring it back to the foreground, use the command fg
- To stop multiple instances of the same program at once, use killall <program_name>
- For GUIs, if you want to kill a stuck program, use xkill and click on one of its windows
Ctrl-D is interpreted as EOF (End Of File), and for shells will mean exit.
Programs always running even when logged out / a terminal is closed
disown can be used if you have launched a program that you want to keep running after you terminate the shell it was launched from. The behavior is what would have happened if you had launched it using nohup.
Running programs with a lower priority
If you want to lower the priority of a process, you can prepend your command with nice, which means that if several programs are running at the same time, this one will run slower than those of higher priority. Example: nice ./heavy_duty_work
The /proc directory
If you look into /proc you will see that every single process running on your machine has a directory there, named by its pid. Inside each directory, you will find details about the process. For instance:
- cmdline contains the complete command line which launched this process
- The fd/ folder contains links to the files opened by the process
- io contains stats about the input/output operations performed
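A small illustration using the current shell ($$ expands to its pid):
[user1@ws01 ~]$ ls /proc/$$/                      # the entry for the current shell
[user1@ws01 ~]$ tr '\0' ' ' < /proc/$$/cmdline    # its command line (the arguments are NUL-separated)
[user1@ws01 ~]$ ls -l /proc/$$/fd                 # the files it currently has open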
Networking
Remote connection to another computer
Unix makes it very easy to remotely connect to another computer. This can be done in your local network, or to a computer on the other side of the planet.
In order to connect to a remote computer, you need to have an account there (aka a login), and use the ssh (secure shell) command. You will then be prompted for your password on that system.
[user1@ws01 ~]$ ssh remote_username@remote_computer
remote_username@remote_computer's password:
[remote_username@remote_computer ~]$ whoami
remote_username
The ssh command can also be used in non-interactive mode, just to launch a command remotely on another computer. In this case, use the syntax
ssh <remote_username>@<remote_computer> <command>
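For instance, with the same hypothetical remote computer as above:
[user1@ws01 ~]$ ssh remote_username@remote_computer hostname   # run a single command and come back
remote_username@remote_computer's password:
remote_computer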
You can set up password-less remote access using a public/private key: see the documentation for ssh-copy-id for details.
If a command inside an ssh terminal appears to freeze, you can disconnect using the sequence of keys Enter, then ~, then .
Copying files to/from another computer
In order to match cp, the network version scp uses similar arguments, but indicates the remote computer with a colon :
- scp <remote_user>@<remote_computer>:<remote_file> <local_path> will copy a file from a remote computer to the local one
- scp <local_file> <remote_user>@<remote_computer>:<remote_path> will copy a file from the local computer to a remote one
- The -r option provides recursive copy for directories
Downloading files from the internet
wget <url> is used to download files from the internet to the current directory. This works for both HTTP and FTP.
Checking network connectivity
Testing HTTP services ports on a server
NetCat (netcat, ncat, or nc) will try to connect to a remote computer (hostname) on the given port, using the TCP protocol, and report the result. This is useful to check if a service is open and accessible (for instance, not blocked by a firewall)
ncat -zv <hostname> <port>
-u can be used to check with the UDP protocol instead of TCP.
[user1@ws01 ~]$ ncat -zv www.aserver.com 80 # Testing port 80
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 142.250.176.196:80. # Success
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
[user1@ws01 ~]$ ncat -zv www.aserver.com 809 # Testing port 809
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection refused. # Port closed
[user1@ws01 ~]$ ncat -zv www.aserver.com 803 # Testing port 803
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connection timed out. # Port not used
Testing low level communication across the network
telnet <hostname> [port] is a low-level tool to communicate with a remote computer.
It is often used to check the responses to simple commands, for instance for HTTP servers.
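A minimal sketch against the same hypothetical web server as in the NetCat example: type an HTTP request by hand, followed by an empty line, and the server should answer with a status line, its headers, and the page:
[user1@ws01 ~]$ telnet www.aserver.com 80
Trying 142.250.176.196...
Connected to www.aserver.com.
Escape character is '^]'.
GET / HTTP/1.0
Host: www.aserver.com

HTTP/1.0 200 OK
…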
How do Unixes find executables?
Binary locations
If you try to run a program that is located inside the folder you are currently in by just using its name, you will notice that you usually get the error command not found. However, using ./ in front of it works. This is because Unix needs complete paths for executables. See the following example:
[user1@ws01 bin]$ ls # Let's check what files are in this directory
hello_world
[user1@ws01 bin]$ hello_world # Launch the program
bash: hello_world: command not found
[user1@ws01 bin]$ ./hello_world
Hello World # It worked!
However, commands like ls and cat can be accessed from anywhere without giving their complete path. How is that possible? This is because of the PATH environment variable, which lets Unix know that the executables in a directory should be accessible system-wide. You can see what its current value is, and prepend to it to add a new folder, with a colon : to separate the different directories. For instance, let's make our previous test program accessible.
[user1@ws01 bin]$ pwd
/tmp/folder/code/bin
[user1@ws01 bin]$ ls
hello_world
[user1@ws01 bin]$ hello_world
bash: hello_world: command not found
[user1@ws01 bin]$ echo $PATH
/usr/local/bin:/usr/bin:/bin # aka /usr/local/bin, /usr/bin, /bin
[user1@ws01 bin]$ export PATH=/tmp/folder/code/bin:$PATH # Do not forget to put $PATH at the end, otherwise you will lose standard commands!
[user1@ws01 bin]$ echo $PATH
/tmp/folder/code/bin:/usr/local/bin:/usr/bin:/bin
[user1@ws01 bin]$ hello_world
Hello World
But what if you have several executables with the same name? This is where the order of the directories in PATH matters: Unix will use the first one it finds. That's why we always prepend to PATH rather than append to it. With our example, if we have a second executable called hello_world, but located in a different folder, and both of these folders are in PATH, we can use the command which to show which of the two will be called when not giving the full path!
[user1@ws01 code]$ ls
bin bin2 include lib src
[user1@ws01 code]$ ls bin
hello_world
[user1@ws01 code]$ ls bin2
hello_world
[user1@ws01 code]$ which hello_world # Which version is used?
/tmp/folder/code/bin/hello_world
[user1@ws01 code]$ export PATH=/tmp/folder/code/bin2:$PATH # Prepend the location of the new executable
[user1@ws01 code]$ echo $PATH
/tmp/folder/code/bin2:/tmp/folder/code/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin
[user1@ws01 code]$ which hello_world # Which version is used?
/tmp/folder/code/bin2/hello_world
Libraries
A program is often not simply an executable, but a set of (dynamic) libraries. You might get the error error while loading shared libraries … .so when trying to run a program. This means that one of the dynamic libraries used by the executable you are trying to run cannot be found.
Just like executables are found using PATH, Unix uses the environment variable LD_LIBRARY_PATH to find libraries. And its use is similar to PATH: you can just prepend to it.
[user1@ws01 tests]$ ls
hello_world.c test_db test_db.c test_db.o
[user1@ws01 tests]$ ./test_db # We want to run test_db
./test_db: error while loading shared libraries: libdb_api.so: cannot open shared object file: No such file or directory
[user1@ws01 tests]$ ls ../../lib/ # The library is here
libdb_api.so
[user1@ws01 tests]$ export LD_LIBRARY_PATH=/tmp/folder/code/lib/:$LD_LIBRARY_PATH
[user1@ws01 tests]$ ./test_db
Initializing database
Database initialized
If you have multiple versions of the same library, the one that comes first in LD_LIBRARY_PATH will be used. You can check what dynamic libraries are used by an executable with ldd
[user1@ws01 tests]$ ldd test_db
linux-vdso.so.1 => (0x00007fff22531000)
libdb_api.so => /tmp/folder/code/lib/libdb_api.so (0x00007f063e27b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f063dead000)
/lib64/ld-linux-x86-64.so.2 (0x00007f063e47d000)
If you are using a Modules Environment, what it actually does is set the PATH and LD_LIBRARY_PATH environment variables for the software you are using
Scripting: programming with shell
Sequences of commands in Unix
In Unix, different commands can be executed sequentially (one after the other) when typed on the same line.
- You can use the semicolon ; to separate each command: a; b runs command a and then command b
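For instance, still in the example tree:
[user1@ws01 folder]$ cd code/bin; ls; cd -   # go there, list the files, come back
hello_world test_db
/tmp/folder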
Redirection
Redirection to files
You can redirect the output of a command to a file using the redirection operators > and >>
- command > filename will create a new file called filename and the text returned by command will be written there
- command >> filename will append the result of the command to filename, or create it if it does not exist
Similarly, you can use the operator < to give an input file to a command: the content of the file is sent to the command's standard input (stdin).
If you do not want to keep the output printed to the terminal, you can redirect it to /dev/null. This will be faster than writing to a file and then erasing it. Very useful if the output of a command is somewhat verbose.
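A minimal sketch using the example tree (verbose_program is hypothetical):
[user1@ws01 folder]$ ls code/bin > list.txt           # create list.txt containing the output of ls
[user1@ws01 folder]$ ls code/bin2 >> list.txt         # append to it
[user1@ws01 folder]$ wc -l < list.txt                 # use the file as input: count its lines
3
[user1@ws01 folder]$ ./verbose_program > /dev/null    # discard the output entirely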
Chaining commands with pipe
It can be useful to chain commands, reusing the output of one as the input to another. This is done using the pipe operator |, which practically consists in redirecting the output of one command as the input of the next one. The commands are evaluated from left to right
[user1@ws01 ~]$ command1 | command2 | command3 | command4
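A concrete example, counting the C files found by the find command used earlier:
[user1@ws01 folder]$ find . -name "*.c" | wc -l   # the list of files produced by find is counted by wc
3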
Storing an output in a variable
Similarly, the syntax $(<command>) can be used to store the output of a command in a variable. It is especially useful as it can be nested: the innermost $( ) are executed first.
For example, to see all the executables located in the same directory as mpirun, one may write:
[user1@ws01 ~]$ ls $(dirname $(which mpirun))
aggregate_profile.pl mpicc mpicxx mpif77 mpifort ompi-clean ompi-server ortecc orted orterun oshc++ oshCC oshfort oshrun shmemc++ shmemCC shmemfort
mpic++ mpiCC mpiexec mpif90 mpirun ompi_info opal_wrapper orte-clean orte-info orte-server oshcc oshcxx oshmem_info profile2mat.pl shmemcc shmemcxx shmemrun
Simple arithmetic operations
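Shell arithmetic can be done with the $(( )) construct; a minimal sketch:
[user1@ws01 ~]$ echo $((2 + 3 * 4))   # integer arithmetic
14
[user1@ws01 ~]$ n=5
[user1@ws01 ~]$ echo $((n * 2))       # variables do not need the $ inside $(( ))
10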
Flow control
Sometimes you want to perform more complex operations that repeat themselves, or act differently based on a test. Shells have flow control for that
Loops
zsh comes with a powerful concept called shell globbing (globs), which can be used to iterate over sub-directories. To enable it with bash, use the command shopt -s globstar first.
The syntax to use globs is **, and a typical usage will look like this:
# This will iterate over all the .txt files
# in the current directory and its subdirectories
for f in **/*.txt
do
echo $f
done
Tests and branching
To avoid cascading elif, shells provide an equivalent to other languages' switch/case with case … in … ) … ;; … esac
case "$animal_name" in
"Pluto")
echo "You might have a dog"
;;
"Felix")
echo "You might have a cat"
;;
"Kermit"|"Jean-Baptiste")
echo "You might have a frog"
;;
*)
echo "Sorry I cannot guess what animal you have"
;;
esac
Script files
Shells pre-define several variables that are very useful in scripts:
- $0, $1, … are the elements passed on the command line, where $0 is the script itself
- $# is the number of elements in the command line
- $* and $@ are the arguments
- $? is the return value of the latest issued command (0 if no error)
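A minimal sketch of a script (a hypothetical greet.sh) using these variables:
#!/bin/bash
# greet.sh: greet every name passed on the command line
echo "This script is $0 and received $# argument(s)"
for name in "$@"
do
    echo "Hello $name"
done
Running it:
[user1@ws01 ~]$ bash greet.sh Alice Bob
This script is greet.sh and received 2 argument(s)
Hello Alice
Hello Bob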
Scheduling automated operations
crontab is used to schedule the launch of commands periodically using the cron daemon, for instance running a back-up every day at midnight. Each user has their own cron jobs.
- crontab -e lets you edit the schedule
- crontab -l shows you the schedule
Editing the file is done with vi (unless you have set the environment variable VISUAL), and uses the following format (a * means all the possible values):
# You can put comments using hash/pound
Minute Hour(0-23) Day of Month Month Day of week Command
# Running a backup every day at 11pm
0 23 * * * /home/user1/backup.sh
# Running a sync every Monday at noon
0 12 * * 1 /home/user1/sync.sh
Graphical interfaces
X servers
You will often hear references to "X" or "X11" when talking about graphical interfaces in Unix. This is because in order for GUI-based programs to be used, an X-server has to be running on the local machine you are using (which is the default for desktop computers).
Some useful commands to check if X is running:
[user1@ws01 ~]$ xterm # Opens a new terminal
[user1@ws01 ~]$ xclock # Displays a clock
Running graphical interfaces remotely
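This is typically done with X forwarding in ssh, using the -X (or -Y) option; a minimal sketch:
[user1@ws01 ~]$ ssh -X remote_username@remote_computer   # enable X forwarding for this session
[remote_username@remote_computer ~]$ xclock              # the clock opens in a window on your local display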
Some notes about Unix administration
Definitions
Common directories
The usual directories you will see in the Unix root filesystem are the following:
/
├── boot # This is where the Linux kernel lives
├── dev # Used to access device drivers
├── etc # Used to store configuration files
├── home # home directories of users on the local machine
│ ├── user1
│ ├── user2
│ └── user3
├── mnt # Filesystems mounted from other sources: network-based, USB …
├── opt # Contains software packages
├── proc # Information about the system and running processes
├── root # The files of the administrator
├── tmp # The scratch area on a local disk
├── usr # files that come with the Operating System
│ ├── bin # programs and commands
│ ├── include # include files when programming
│ ├── lib # libraries
│ ├── lib64 # 64-bit libraries
│ └── local # more bin, include, lib, lib64, added by local admin
└── var
└── log # Most programs running as services will write logs here
The following commands are useful to know more about your system.
uname gives you the name of your operating system, and is often used to get details about which kernel is running with uname -r
To know about disk space usage, you can use the following commands:
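The usual commands here are df (free space per mounted filesystem) and du (size of a directory tree):
[user1@ws01 ~]$ df -h                # human-readable free space for all mounted filesystems
[user1@ws01 ~]$ du -sh /tmp/folder   # total size of a directory (here, the example tree)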
/etc/fstab contains the list of mount points that will be automatically mounted when the Operating System starts, aka these settings are permanent.
Package managers
At the low level, RedHat packages are found inside .rpm archives. Try to avoid working with these directly, as manually finding dependencies can be tedious.
If a software package is only distributed as an RPM and you need to extract what is inside, you can use the rpm2cpio command piped into cpio
[user1@ws01 ~]$ rpm2cpio ./packagename-version.x86_64.rpm | cpio -idmv
Similarly, Debian distributions rely on .deb archives, which you can extract using the ar command first, then tar.
[user1@ws01 deb_extraction]$ ls
fahclient.deb
[user1@ws01 deb_extraction]$ ar x fahclient.deb
[user1@ws01 deb_extraction]$ ls -l
total 6388
-rw-r--r-- 1 user1 group1 2372 Jul 22 13:58 control.tar.xz
-rw-r--r-- 1 user1 group1 3263500 Jul 22 13:58 data.tar.xz
-rw-r--r-- 1 user1 group1 4 Jul 22 13:58 debian-binary
-rw-rw-r-- 1 user1 group1 3266064 Oct 23 2020 fahclient.deb
# data.tar.xz contains the files we are interested in
[user1@ws01 deb_extraction]$ tar xvf data.tar.xz
./
./etc/
./etc/init.d/
./etc/init.d/FAHClient
./usr/
./usr/bin/
./usr/bin/FAHClient
…
Running programs as a different user
Switching users
su - [username] can be used to change to another user. If no username is specified, root is assumed. You will be prompted to enter the password of the target account, unless you are root.
The optional argument - is used to specify that you want a login shell with that user's environment (aka it will clear up your own environment first)
Launching a program as another user
sudo <command> can be used to launch a command as another user. This is often used to give admin access to a user for only a subset of commands, without giving them the root password.
For users who are authorized to run commands as root, a common practice is to use sudo su - <username> to become user username.
Launching the same commands on multiple computers
pdsh (parallel commands to remote hosts) is used to send the same command to different remote computers. It is often used with computers that have logical names, for instance using ranges of numbers to identify them.
The following would query the computers ws0, ws1, ws2, and ws3 about the root filesystem /. Note that this operation is done in parallel, so the results come back without following the alphanumerical order!
[root@workstation ~]$ pdsh -w ws[0-3] df -H /
ws2: Filesystem Size Used Avail Use% Mounted on
ws2: /dev/sda1 33G 18G 15G 55% /
ws1: Filesystem Size Used Avail Use% Mounted on
ws1: /dev/sda1 33G 18G 15G 55% /
ws3: Filesystem Size Used Avail Use% Mounted on
ws3: /dev/sdb1 33G 18G 15G 55% /
ws0: Filesystem Size Used Avail Use% Mounted on
ws0: /dev/sda1 33G 18G 15G 55% /
Sciware
Do you want to know more? Sciware-23 was about the topic of "Command line and shell interaction". See the slides here