Understanding the `comm` Command in Linux
In the world of Linux, the `comm` command is a lesser-known but highly effective
tool for comparing two sorted files line by line. It's particularly useful for
identifying lines that are unique to one file or common to both.
Syntax
The basic syntax of the `comm` command is:
comm [OPTION]... FILE1 FILE2
`FILE1` and `FILE2` are the two sorted files you want to compare.
`[OPTION]...` represents the various options that can be applied to the `comm` command.
Options
Here's a table outlining the options for the `comm` command. The long options have no single-letter shorthand:
Option | Description |
---|---|
--check-order | Check that the input is correctly sorted, even if all input lines are pairable. |
--nocheck-order | Do not check that the input is correctly sorted. |
--output-delimiter=STR | Separate columns with the string STR instead of a tab. |
--help | Display a help message and exit. |
--version | Display version information and exit. |
-1 | Suppress the output of column 1 (lines unique to FILE1). |
-2 | Suppress the output of column 2 (lines unique to FILE2). |
-3 | Suppress the output of column 3 (lines that appear in both files). |
Creating Example Files
To demonstrate the `comm` command, we need two sorted text files. Let's create
them:
File 1: list1.txt
Create the file:
vim list1.txt
Insert the following sorted list:
apple
banana
cherry
date
Save and exit with `:wq`.
File 2: list2.txt
Create the second file:
vim list2.txt
Type in the sorted list:
banana
date
fig
grape
Save and exit as before.
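If you would rather not open an editor, the same two files can be created from the shell. This is a minimal sketch: the scratch directory from `mktemp -d` is an assumption added here so that nothing in your current directory gets overwritten, and `sort -c` is used only to confirm that each file is already sorted.

```shell
# Work in a scratch directory (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# printf writes one item per line; the items are listed in sorted order.
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# sort -c exits with status 0 only when the file is already sorted.
sort -c list1.txt && sort -c list2.txt && echo "both files are sorted"
```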
Example 1: Basic Comparison
To compare the two files and output three columns:
comm list1.txt list2.txt
Output:
apple
                banana
cherry
                date
        fig
        grape
Here, the first column (no indentation) contains lines unique to `list1.txt`, the second column
(indented by one tab) has lines unique to `list2.txt`, and the third column (indented by two tabs)
shows the lines common to both files.
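Because `comm` marks each column only with leading tab characters, the layout can be hard to read at a glance. One way to make it explicit is to let `awk` split each line on tabs and report which column the text landed in. This is a sketch that assumes the lines themselves contain no tab characters; the scratch directory and the `labeled` variable are illustrative names, not part of `comm`.

```shell
# Recreate the example files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# With tab as the field separator, the number of fields (NF) equals the
# column number, and the last field ($NF) holds the line's text.
labeled=$(comm list1.txt list2.txt | awk -F'\t' '{ printf "column %d: %s\n", NF, $NF }')
echo "$labeled"
```

With the example files, `apple` and `cherry` are labeled column 1, `fig` and `grape` column 2, and `banana` and `date` column 3.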
Example 2: Suppressing Columns
To suppress the first column and compare the files:
comm -1 list1.txt list2.txt
Output:
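        banana
        date
fig
grape
The `-1` option removes lines that are only in `list1.txt`. The remaining columns shift left:
lines unique to `list2.txt` are no longer indented, and common lines are indented by one tab.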
Example 3: Suppressing Multiple Columns
You can suppress more than one column:
comm -23 list1.txt list2.txt
Output:
apple
cherry
The `-23` option suppresses both the second and third columns, displaying only
the lines unique to `list1.txt`.
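The other two-column combinations follow the same pattern: `-13` leaves only the lines unique to the second file, and `-12` leaves only the lines found in both. A short sketch using the example files (the scratch directory and variable names are illustrative):

```shell
# Recreate the example files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# Suppress columns 1 and 3: only lines unique to list2.txt remain.
only_in_list2=$(comm -13 list1.txt list2.txt)

# Suppress columns 1 and 2: only lines present in both files remain.
in_both=$(comm -12 list1.txt list2.txt)

printf 'Only in list2.txt:\n%s\n' "$only_in_list2"
printf 'In both files:\n%s\n' "$in_both"
```

Here the first command yields `fig` and `grape`, and the second yields `banana` and `date`. Since only one column is left in each case, the output carries no leading tabs.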
Example 4: No Check for Sorted Order
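If you are certain your files are sorted, you can skip the sorted-order check with `--nocheck-order` (this option has no short form). With correctly sorted input, the output is the same as in Example 1:
comm --nocheck-order list1.txt list2.txt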
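Skipping the check is only safe when the input really is sorted; on unsorted input, `comm` can miss matching lines. When in doubt, sorting first is the more dependable route. The sketch below assumes a hypothetical unsorted file, `unsorted1.txt`, and pipes its sorted contents into `comm`, where `-` in place of FILE1 means "read from standard input". `LC_ALL=C` is set on both commands so that `sort` and `comm` agree on the ordering.

```shell
# Scratch directory with one unsorted file (hypothetical) and one sorted file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' cherry apple date banana > unsorted1.txt
printf '%s\n' banana date fig grape > list2.txt

# Sort on the fly and feed the result to comm through standard input ("-").
common=$(LC_ALL=C sort unsorted1.txt | LC_ALL=C comm -12 - list2.txt)
echo "$common"
```

This prints `banana` and `date`, the two lines the files share.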
Example 5: Custom Output Delimiter
You can specify a custom delimiter to separate columns:
comm --output-delimiter=',' list1.txt list2.txt
Output:
apple
,,banana
cherry
,,date
,fig
,grape
Commas now take the place of the tabs that separate columns: a line in column 2 is preceded by
one comma, and a line in column 3 by two.
Combining `comm` with Other Commands
The `comm` command is often used in conjunction with other Unix commands. For
instance, to count the number of common lines:
comm -12 list1.txt list2.txt | wc -l
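The same idea extends to all three categories. As a rough sketch, the snippet below counts the lines unique to each file and the lines they share; the variable names and the scratch directory are illustrative, and the `$(( ))` arithmetic is there only to strip any padding that some `wc` implementations add.

```shell
# Recreate the example files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# Count each category by suppressing the other two columns.
only1=$(( $(comm -23 list1.txt list2.txt | wc -l) ))
only2=$(( $(comm -13 list1.txt list2.txt | wc -l) ))
both=$(( $(comm -12 list1.txt list2.txt | wc -l) ))

echo "only in list1.txt: $only1"
echo "only in list2.txt: $only2"
echo "in both files:     $both"
```

For the example files, each of the three counts is 2.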
Conclusion
The `comm` command offers a straightforward way to compare sorted files, making
it a useful tool for system administrators, developers, and data analysts
working with Linux. Its ability to suppress columns provides flexibility,
enabling users to get precisely the comparison they need. By mastering `comm`
and its options, you can efficiently work with sorted data, perform comparisons,
and streamline your workflows on the Linux command line.