Unix commands
This tutorial is based on the Unix system. In this chapter, we introduce some basic Unix commands that are useful in later modules: directory management, text file reading, input and output redirection, and the pipe. The files used in this tutorial are the earthquake catalog and station list for the Banda Arc–Australian Plate Collision Zone (Jiang et al., 2022).
Developed by Po Wang LAM (Ryan) under the instruction of Han CHEN.
1 Target
Master basic directory and file operations.
Understand input and output redirection and the pipe.
Make sub-catalogs and sub-station lists from the downloaded complete earthquake catalog and station lists, and save the files in the corresponding directories.
2 Commands and concepts included
Directory management: pwd, cd, mkdir, ls
File reading: more, cat, wc, grep, awk
Input and output redirection: >, <
Pipe: |
3 Commands usage and examples
3.1 Directory management
Before any operation, it is important to make sure you are in the correct working directory. To find out which directory you are in, use the print working directory command pwd. It prints the current working directory to the terminal.
$ pwd
/current-directory   # for illustration only
You are now in current-directory. Next, change to the directory you want to work in with the change directory command cd. Here we use the desktop as an example.
$ cd /home/user/desktop
$ pwd
/home/user/desktop
Note
The path /home/user/desktop is used here as an example. Please change it to your own directory accordingly.
Now you are in /home/user/desktop. Before downloading the data, you may want to create a dedicated directory for storing them. This can be done with the mkdir command.
$ mkdir Data_storage
The ls command lists the contents of the current directory. Run ls now to make sure that Data_storage has been created.
$ ls
Data_storage
Now you can change your working directory to Data_storage.
$ cd Data_storage
$ ls
After running ls, no output is shown, which means that the directory is currently empty.
Tip
Several common commands for dealing with directories and files
ls (list files): list directories and file names
cd (change directory): switch to another directory
pwd (print working directory): show the current directory
mkdir (make directory): create a new directory
rmdir (remove directory): delete an empty directory
cp (copy): copy a file or directory
rm (remove): delete a file or directory
mv (move): move files and directories, or rename them
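The commands in this tip can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a scratch directory created with mktemp; the names notes.txt, copy.txt, renamed.txt and archive are invented for illustration only.

```shell
# Work inside a throwaway directory (hypothetical location from mktemp).
scratch=$(mktemp -d)
cd "$scratch"

mkdir Data_storage                 # create a new directory
ls                                 # prints: Data_storage
cd Data_storage                    # enter it
pwd                                # prints a path ending in /Data_storage

echo "demo" > notes.txt            # create a small file to practise on
cp notes.txt copy.txt              # copy: both files now exist
mv copy.txt renamed.txt            # rename within the same directory
mkdir archive
mv renamed.txt archive/            # move into another directory
rm notes.txt archive/renamed.txt   # delete the two files
rmdir archive                      # works only because archive is empty

cd ..                              # go up one level
rmdir Data_storage                 # remove the now-empty directory
```

Note that rmdir refuses to delete a directory that still contains files, which makes it a safer choice than rm when you only expect to remove empty directories.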
3.2 File reading
3.2.1 Browse documents
Now we can download the catalog file. The file can be downloaded from here. After downloading, the archive is first stored in ~/Downloads. To move it from ~/Downloads to the directory you created, use the mv command.
$ pwd
/home/user/desktop/Data_storage
$ mv ~/Downloads/Unix_command_Materials.tgz ./
$ ls
Unix_command_Materials.tgz
Note
The directory ~/Downloads is the usual download location on Unix systems. Please change it to your own directory accordingly.
The file has now been moved to your working directory. Because it is a compressed archive (.tgz, a gzip-compressed tar file), it needs to be extracted. This can be done with the tar command.
$ tar -zxf Unix_command_Materials.tgz
$ ls
CUSeisTut Unix_command_Materials.tgz
The archive has now been extracted and the directory CUSeisTut has been created. Change your working directory to CUSeisTut and use ls to check the contents.
$ cd CUSeisTut
$ ls
A_Detailed_EQ_Catalog Stations_Info TSR_Paper
$ ls A_Detailed_EQ_Catalog
banda_arc_catalog.txt
$ ls Stations_Info TSR_Paper
Stations_Info:
GE_3_stations.txt YS_30_stations.txt

TSR_Paper:
tsr-2021041.1.pdf tsr-2021041_supplement.docx
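The tar step used on the downloaded archive can be rehearsed on a tiny archive built on the spot. This is only a sketch: the directory Demo, the file list.txt and the archive name Demo.tgz are made up for illustration and are not part of the tutorial materials.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Build a small directory tree to archive (hypothetical names).
mkdir -p Demo/Stations
echo "station list placeholder" > Demo/Stations/list.txt

tar -czf Demo.tgz Demo      # c = create, z = gzip-compress, f = archive file name
rm -r Demo                  # remove the original so the extraction is visible
tar -tzf Demo.tgz           # t = list the contents without extracting
tar -xzf Demo.tgz           # x = extract (the same options as -zxf)
cat Demo/Stations/list.txt  # prints: station list placeholder
```

Listing an archive with -t before extracting is a quick way to see which directory it will create.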
We will first look at the earthquake catalog banda_arc_catalog.txt stored in A_Detailed_EQ_Catalog. Change your working directory to that directory.
Before processing the file, we want to make sure it is readable and contains the expected data. There are several ways to read a text file in the terminal; we introduce two of them: more and cat.
$ cd A_Detailed_EQ_Catalog
$ ls
banda_arc_catalog.txt
$ more banda_arc_catalog.txt
 indx year mon day time sec_relative_to_day res lat lon dep mag visual_flag hypodd_flag
 2 2014 03 18 16:56:34.260000 60994.2600 0.697 -9.092 124.191 82.1 3.1 1 1
 3 2014 03 19 14:39:17.472600 52757.4726 1.593 -8.519 126.329 23.2 3.0 1 0
 4 2014 03 20 15:32:39.914100 55959.9141 0.706 -7.482 127.900 192.6 3.7 1 0
 5 2014 03 21 15:10:35.760000 54635.7600 1.090 -8.936 124.325 81.2 1.6 1 1
 6 2014 03 21 18:16:37.720199 65797.7202 1.898 -8.928 125.775 69.8 2.3 1 0
 7 2014 03 22 08:18:17.449799 29897.4498 1.095 -8.974 125.587 -4.5 3.0 1 0
 8 2014 03 22 13:57:21.227599 50241.2276 0.792 -9.891 123.950 62.9 3.9 1 0
 9 2014 03 22 15:43:47.217700 56627.2177 1.546 -8.747 122.528 114.1 2.4 1 0
 10 2014 03 22 19:23:44.019000 69824.0190 0.642 -7.383 128.098 218.4 2.9 1 0
 11 2014 03 23 04:37:50.889999 16670.8900 0.787 -9.359 124.140 57.7 3.3 1 1
 12 2014 03 23 12:16:37.705200 44197.7052 1.017 -10.549 123.603 -4.1 3.3 1 0
 13 2014 03 24 15:56:41.793600 57401.7936 1.512 -7.118 127.153 152.0 3.0 1 0
 14 2014 03 24 17:16:01.760000 62161.7600 1.904 -9.108 124.246 68.7 3.1 1 1
 17 2014 03 25 17:14:15.798098 62055.7981 0.768 -7.407 126.617 199.0 2.5 1 0
 18 2014 03 25 18:57:23.260000 68243.2600 0.606 -9.000 124.130 75.6 1.8 1 1
 19 2014 03 26 06:32:57.887500 23577.8875 0.851 -7.449 127.497 166.7 2.6 1 0
 23 2014 03 29 20:31:47.700000 73907.7000 0.734 -9.381 123.590 82.9 3.0 1 1
 24 2014 03 31 10:56:15.241300 39375.2413 0.973 -7.424 126.049 23.6 4.1 1 0
 26 2014 04 01 01:00:36.254898 3636.2549 0.932 -7.614 127.268 188.9 2.9 1 0
 27 2014 04 02 03:01:25.829400 10885.8294 1.684 -7.507 122.368 21.8 4.4 1 0
 28 2014 04 02 15:26:23.424199 55583.4242 0.827 -9.073 124.145 37.0 2.4 1 0
 29 2014 04 02 18:20:08.600000 66008.6000 0.840 -8.897 124.130 78.2 2.2 1 1
 30 2014 04 02 19:06:23.708500 68783.7085 1.137 -9.317 120.447 131.6 3.3 1 0
 31 2014 04 03 11:44:25.794800 42265.7948 0.842 -7.779 128.245 218.4 3.5 1 0
 33 2014 04 04 07:06:34.235300 25594.2353 1.358 -7.970 123.634 119.6 3.6 1 0
 34 2014 04 04 11:18:03.472600 40683.4726 0.769 -8.941 124.175 14.9 2.4 1 0
 35 2014 04 04 16:50:22.464900 60622.4649 2.933 -8.011 127.448 115.0 2.2 1 0
 36 2014 04 05 02:37:26.165300 9446.1653 1.564 -8.899 125.963 -4.5 2.4 1 0
--More--(0%)
The more command shows the file one screen at a time. Press Enter to show one more line or Space to show the next screen, and press q (or Ctrl+C) to exit.
$ cat banda_arc_catalog.txt
 ......
 28683 2018 12 29 12:28:05.790000 44885.7900 0.860 -8.220 123.825 180.2 3.5 1 1
 28684 2018 12 29 15:32:39.982800 55959.9828 1.553 -8.846 124.025 66.1 2.0 1 0
 28686 2018 12 29 20:29:02.320000 73742.3200 0.652 -10.036 123.314 31.6 2.3 1 1
 28688 2018 12 30 03:43:30.899300 13410.8993 1.217 -7.994 128.068 183.4 3.7 1 0
 28689 2018 12 30 04:10:53.381499 15053.3815 1.544 -9.896 118.925 33.8 5.0 1 0
 28692 2018 12 30 14:47:19.025200 53239.0252 0.865 -7.788 127.986 176.9 3.5 1 0
 28695 2018 12 30 18:21:30.339600 66090.3396 0.942 -9.970 123.299 60.6 2.6 1 0
 28696 2018 12 30 20:22:25.646099 73345.6461 0.928 -8.069 123.234 206.4 3.2 1 0
 28697 2018 12 31 04:36:34.280000 16594.2800 0.769 -8.808 124.321 96.4 2.5 1 1
 28699 2018 12 31 19:13:00.751600 69180.7516 0.983 -8.879 123.539 5.2 2.9 1 0
The cat command prints the whole content at once.
We can count the lines and characters of the catalog with the wc command.
$ wc -l banda_arc_catalog.txt
19075 banda_arc_catalog.txt
$ wc -c banda_arc_catalog.txt
1735843 banda_arc_catalog.txt
The parameters -l and -c choose what is counted: -l counts the lines of the file, and -c counts its bytes (for a plain ASCII text file, this equals the number of characters). The catalog is large: it has 19075 lines, that is, 19074 events plus the first line, which describes the columns.
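The relation between the line count and the number of events can be checked on a small file. The sketch below assumes a three-line sample written to a temporary file; the shortened column layout is invented and is not the real catalog format.

```shell
# Hypothetical miniature catalog: one header line plus two events.
sample=$(mktemp)
printf 'indx year mag\n2 2014 3.1\n3 2014 3.0\n' > "$sample"

wc -l "$sample"    # 3 lines: the header plus two events
wc -c "$sample"    # 36 bytes, counting the newline at the end of each line
wc -w "$sample"    # 9 words (whitespace-separated items)
```

The number of events is the line count minus one, because the header line is counted as well.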
Tip
In awk, $n refers to the n-th column of the current line: $1 represents the 1st column, $2 represents the 2nd column, and $NF represents the last column.
After viewing the original catalog, we can divide it for analysis. The goal is to generate sub-catalogs based on properties such as time and magnitude. grep and awk are two simple commands for searching a file for specific content.
3.2.2 Extract the text content
grep is a command that searches for text strings. It outputs every row that contains the search string.
For example, to search for earthquakes that occurred in 2014, we can use the following command.
1 $ grep '2014' banda_arc_catalog.txt
2 …
3 2707 2014 12 31 13:46:44.510000 49604.5100 1.210 -8.959 123.953 91.9 3.2 1 1
4 2709 2014 12 31 15:50:22.129999 57022.1300 0.599 -8.116 120.694 3.2 2.1 1 1
5 2710 2014 12 31 15:53:45.350000 57225.3500 1.305 -9.457 119.644 64.3 2.2 1 1
6 2711 2014 12 31 16:22:48.430000 58968.4300 1.488 -9.476 120.123 75.6 1.9 1 1
7 2712 2014 12 31 18:16:21.280000 65781.2800 1.329 -10.439 123.626 110.1 2.4 1 1
8 2714 2014 12 31 19:07:44.320000 68864.3200 1.350 -9.601 119.909 55.2 1.8 1 1
9 2715 2014 12 31 19:14:17.627898 69257.6279 1.265 -9.331 124.087 4.9 2.5 1 0
10 2716 2014 12 31 19:19:49.720000 69589.7200 1.649 -9.662 119.824 24.4 1.9 1 1
11 2717 2014 12 31 19:47:27.840000 71247.8400 1.068 -9.146 118.864 51.9 2.6 1 1
12 2718 2014 12 31 20:24:31.120000 73471.1200 1.724 -9.072 123.987 73.2 1.9 1 1
13 2720 2014 12 31 20:33:22.680000 74002.6800 1.350 -8.293 120.601 39.5 2.1 1 1
14 2721 2014 12 31 20:40:36.260000 74436.2600 0.917 -10.070 119.152 13.2 2.2 1 1
15 2722 2014 12 31 22:46:19.320000 81979.3200 1.516 -9.511 120.082 70.5 1.9 1 1
16 6638 2015 05 07 06:06:54.990000 22014.9900 0.837 -9.458 125.033 7.5 2.7 1 1
17 6771 2015 05 12 06:06:54.222800 22014.2228 0.684 -8.800 120.459 109.0 2.1 1 0
18 12014 2015 11 04 19:28:46.320000 70126.3200 1.223 -8.298 125.076 5.2 2.4 1 1
19 20141 2016 06 18 18:32:52.840000 66772.8400 0.836 -8.311 123.929 176.8 1.9 1 1
20 20142 2016 06 18 19:14:26.900000 69266.9000 1.531 -9.441 124.789 4.1 1.8 1 1
21 20143 2016 06 18 20:03:26.445700 72206.4457 2.369 -8.408 126.589 -3.2 2.6 1 0
22 20144 2016 06 18 20:31:17.911600 73877.9116 1.388 -8.957 119.781 21.3 2.0 1 0
23 20147 2016 06 19 01:38:58.808900 5938.8089 1.081 -11.094 118.986 110.4 3.2 1 0
24 20148 2016 06 19 05:31:08.009999 19868.0100 1.257 -9.681 119.794 56.3 2.3 1 1
25 22014 2016 09 11 08:59:33.940000 32373.9400 1.514 -8.672 118.465 119.5 2.6 1 1
26 27150 2018 07 25 10:21:49.201498 37309.2015 0.947 -8.532 126.669 69.8 3.4 1 0
The command prints every row that contains the string '2014'. However, there is a problem: some rows contain '2014' even though the event did not occur in the year 2014 (e.g., lines 16 to 26 of the listing above, where '2014' appears in the index, time, or seconds column).
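The false matches can be reproduced on a few rows. The sketch below uses a shortened, invented column layout and compares a plain string search with a regular expression that only accepts 2014 in the second column; the regular expression is one possible workaround, not the tutorial's method.

```shell
# Hypothetical miniature catalog; two rows carry "2014" only in the index.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 indx year mon day mag
 2 2014 03 18 3.1
 2707 2014 12 31 3.2
 12014 2015 11 04 2.4
 20141 2016 06 18 1.9
EOF

# -c prints the number of matching rows instead of the rows themselves.
grep -c '2014' "$sample"                 # 4: every row containing the string
grep -cE '^ *[0-9]+ +2014 ' "$sample"    # 2: only rows whose 2nd column is 2014
```

The anchored pattern works, but it is tied to the exact layout of the file, which is one reason a column-aware tool is more convenient for this task.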
awk is a more powerful tool for manipulating data and producing reports. It allows the user to use variables, numeric functions, string functions, and logical operators.
Tip
General form: awk 'pattern {action}' file_name
pattern: selects the lines on which the action is executed; for example, NR>10 selects the lines after line 10.
action: what to do with each selected line. The default action prints the whole line, but an action can be more specific, such as printing selected columns or doing a calculation.
file_name: the file to process.
The -F parameter specifies the delimiter. For example, awk -F "[|]" 'pattern {action}' file_name uses | as the delimiter. By default, columns are separated by whitespace (spaces or tabs).
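The pattern, the action and the -F parameter can be seen working together on a small delimited file. In the sketch below the station codes, coordinates and the |-separated layout are invented for illustration; the real station lists may be formatted differently.

```shell
# Hypothetical station list with | as the delimiter.
stations=$(mktemp)
printf 'NET|STA|LAT|LON\nXX|AAA|-8.5|125.1\nXX|BBB|-9.2|124.3\n' > "$stations"

# Pattern and action: skip the header (NR>1), print station and latitude.
awk -F '|' 'NR>1 {print $2, $3}' "$stations"
# prints:
#   AAA -8.5
#   BBB -9.2

# Pattern only: the default action prints the whole matching line.
awk -F '|' '$2=="BBB"' "$stations"      # prints: XX|BBB|-9.2|124.3

# Action only: it runs on every line. NF is the number of columns.
awk -F '|' '{print NR, NF}' "$stations"
```

Without -F '|', awk would treat each of these lines as a single column, because they contain no whitespace.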
$ awk 'NR<20{if ($11>3.0) print;}' banda_arc_catalog.txt
 indx year mon day time sec_relative_to_day res lat lon dep mag visual_flag hypodd_flag
 2 2014 03 18 16:56:34.260000 60994.2600 0.697 -9.092 124.191 82.1 3.1 1 1
 4 2014 03 20 15:32:39.914100 55959.9141 0.706 -7.482 127.900 192.6 3.7 1 0
 8 2014 03 22 13:57:21.227599 50241.2276 0.792 -9.891 123.950 62.9 3.9 1 0
 11 2014 03 23 04:37:50.889999 16670.8900 0.787 -9.359 124.140 57.7 3.3 1 1
 12 2014 03 23 12:16:37.705200 44197.7052 1.017 -10.549 123.603 -4.1 3.3 1 0
 14 2014 03 24 17:16:01.760000 62161.7600 1.904 -9.108 124.246 68.7 3.1 1 1
 24 2014 03 31 10:56:15.241300 39375.2413 0.973 -7.424 126.049 23.6 4.1 1 0
In this example, the pattern is NR<20 and the action is if ($11>3.0) print. In plain words: for the first 19 lines, print every row whose column 11 ($11) is larger than 3.0. Since column 11 is the magnitude and line 1 is the header, the output is all events with a magnitude over 3.0 among the first 18 events. The header line is printed as well, because the text 'mag' is compared with 3.0 as a string and sorts after it.
We can print only the columns we are interested in by listing them after print.
$ awk 'NR<20{if ($11>3.0) print $1,$2,$3,$4,$5,$8,$9,$10;}' banda_arc_catalog.txt
indx year mon day time lat lon dep
2 2014 03 18 16:56:34.260000 -9.092 124.191 82.1
4 2014 03 20 15:32:39.914100 -7.482 127.900 192.6
8 2014 03 22 13:57:21.227599 -9.891 123.950 62.9
11 2014 03 23 04:37:50.889999 -9.359 124.140 57.7
12 2014 03 23 12:16:37.705200 -10.549 123.603 -4.1
14 2014 03 24 17:16:01.760000 -9.108 124.246 68.7
24 2014 03 31 10:56:15.241300 -7.424 126.049 23.6
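If the header line is not wanted in the output, one option is to add NR>1 to the pattern. The following is a sketch on the header and first three events of the catalog, copied into a temporary file; combining the conditions with && is a variation on the command above, not the form used in the tutorial.

```shell
# The header and the first three events of the catalog, as shown earlier.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 indx year mon day time sec_relative_to_day res lat lon dep mag visual_flag hypodd_flag
 2 2014 03 18 16:56:34.260000 60994.2600 0.697 -9.092 124.191 82.1 3.1 1 1
 3 2014 03 19 14:39:17.472600 52757.4726 1.593 -8.519 126.329 23.2 3.0 1 0
 4 2014 03 20 15:32:39.914100 55959.9141 0.706 -7.482 127.900 192.6 3.7 1 0
EOF

# NR>1 skips the header; $11>3.0 keeps magnitudes strictly above 3.0.
awk 'NR>1 && $11>3.0 {print $1, $8, $9, $11}' "$sample"
# prints:
#   2 -9.092 124.191 3.1
#   4 -7.482 127.900 3.7
```

Event 3 is dropped because its magnitude is exactly 3.0, which is not larger than 3.0.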
3.3 File output and input
> and < are the output and input redirection operators in Unix. Most Unix commands take input from your terminal and send the resulting output back to your terminal, as shown in the examples above.
If the notation > file is appended to a command, the output of that command is written to the file instead of your terminal.
$ echo "hello world"
hello world
$ echo "hello world" > test.txt
$ cat test.txt
hello world
In the second echo command, > test.txt was appended, so the output was written to the file test.txt rather than to the terminal. The cat command then shows the contents of the file.
Note
echo: prints its arguments to the screen. Plain text is printed as it is; anything with a special meaning to the shell (for example, a variable such as $HOME) is replaced by its value before being printed.
> creates the file if it does not exist and overwrites it if it already exists!
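Note
Types of redirection
> : write standard output to a file, overwriting its contents
>> : append standard output to the end of a file
< : read standard input from a file
<< : here-document; read standard input from the lines that follow, up to a chosen delimiter word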
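The four operators can be compared side by side in a scratch directory. This is a minimal sketch; the file names log.txt and note.txt and the delimiter word EOF are arbitrary choices.

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "first"  > log.txt     # > creates log.txt (or overwrites it)
echo "second" >> log.txt    # >> appends, so log.txt now has two lines
wc -l < log.txt             # < feeds log.txt to wc as standard input; prints 2

# << starts a here-document: the lines up to EOF become standard input.
cat <<EOF > note.txt
line A
line B
EOF
cat note.txt                # prints line A and line B

echo "replaced" > log.txt   # > again: the two earlier lines are gone
cat log.txt                 # prints: replaced
```

Because > silently discards the previous contents, it is worth checking the target file name before running a command that redirects into an existing file.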
We can now use awk to extract earthquakes based on various criteria and save the output as a sub-catalog by using >.
$ mkdir earthquake-2014
$ awk '{if ($2==2014) print;}' banda_arc_catalog.txt > earthquake-2014/banda_arc_catalog-2014.txt
$ cd earthquake-2014
$ ls
banda_arc_catalog-2014.txt
$ wc -l banda_arc_catalog-2014.txt
1871 banda_arc_catalog-2014.txt
Here we make a directory earthquake-2014, extract the earthquakes that occurred in 2014 (i.e., $2==2014), and save the sub-catalog to the file banda_arc_catalog-2014.txt in that directory. We then count the number of events with wc -l.
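The same extraction can be repeated for several years with a shell loop. The sketch below is an extension under stated assumptions: the miniature catalog, the shortened column layout and the output names catalog-YEAR.txt are invented, and the for loop and the awk -v option (which passes a shell value into awk as a variable) are not otherwise covered in this chapter.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Hypothetical miniature catalog with the year in column 2.
cat > catalog.txt <<'EOF'
 indx year mon day mag
 2 2014 03 18 3.1
 2707 2014 12 31 3.2
 6638 2015 05 07 2.7
EOF

for year in 2014 2015; do
    mkdir -p "earthquake-$year"                        # one directory per year
    awk -v y="$year" '$2==y' catalog.txt > "earthquake-$year/catalog-$year.txt"
    wc -l "earthquake-$year/catalog-$year.txt"         # events in that year
done
```

The header line is not copied into the sub-catalogs, because its second column is the word year rather than a matching number.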
< directs a file to a command as its standard input, for example:
$ echo line1 > test.txt
$ echo line2 >> test.txt
$ while read line
  do
      echo "$line"
  done < test.txt
line1
line2
Here we use < to direct the file test.txt to the while read line loop as its standard input. The loop reads the file line by line, storing each line in the variable line (referenced as $line).
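A slightly more defensive form of the same loop is sketched below. The options shown are standard shell features rather than something the tutorial prescribes: IFS= keeps leading and trailing spaces, and read -r keeps backslashes literal. The counter variable count is an illustrative addition.

```shell
list=$(mktemp)
printf 'line1\nline2\n' > "$list"

count=0
while IFS= read -r line; do
    count=$((count + 1))      # number the lines as they are read
    echo "$count: $line"
done < "$list"
# prints:
#   1: line1
#   2: line2

echo "read $count lines"      # prints: read 2 lines
```

Because the file is attached with < after done, the loop runs in the current shell and the value of count is still available afterwards.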
3.4 The Pipe in Unix
A pipe redirects the output of one command to the input of another command.
$ grep 2014 banda_arc_catalog.txt | wc -l
1882
$ awk '{if ($2==2014) print;}' banda_arc_catalog.txt | wc -l
1871
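In this example, we used the pipe | to redirect the output of the grep and awk commands to the input of the wc -l command. The two counts differ (1882 versus 1871) because grep also matches rows in which '2014' appears outside the year column.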
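Pipes can also be chained, so that each command filters the output of the previous one. The sketch below reproduces the difference between the two counts on an invented miniature catalog with a shortened column layout.

```shell
# Hypothetical miniature catalog; the last row has "2014" only in the index.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 indx year mon day mag
 2 2014 03 18 3.1
 2707 2014 12 31 3.2
 12014 2015 11 04 2.4
EOF

grep 2014 "$sample" | wc -l                     # 3: includes the false match
awk '$2==2014' "$sample" | wc -l                # 2: year column only
grep 2014 "$sample" | awk '$2==2014' | wc -l    # 2: awk removes the false match
```

In the last command nothing is written to disk: the rows flow from grep to awk to wc, and only the final count reaches the terminal.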
4 Exercise
4.1. Make a directory earthquake-Mag under A_Detailed_EQ_Catalog. Extract the year, mon, day, time, lat, lon, dep, and mag of earthquakes with magnitudes between 4 and 6, and save the output to the file earthquake-Mag-4-6.txt. Count the number of extracted earthquakes.
4.2. Make a station list file containing only station 'ALRB' and save it as Substation.lst. Only the network, station, latitude, longitude, and elevation are needed. Then append the same information for station 'PPLA' to Substation.lst.
References
Jiang C, Zhang P, White M C A, et al. A Detailed Earthquake Catalog for Banda Arc–Australian Plate Collision Zone Using Machine‐Learning Phase Picker and an Automated Workflow[J]. The Seismic Record, 2022, 2(1): 1-10.