Understanding the `comm` Command in Linux
The comm command is a lesser-known but effective Linux tool for comparing two
sorted files line by line. It is particularly useful for identifying lines that
are common to both files or unique to one of them.
Syntax
The basic syntax of the comm command is:
comm [OPTION]... FILE1 FILE2
`FILE1` and `FILE2` are the two sorted files you want to compare.
`[OPTION]...` represents the various options that can be applied to the `comm` command.
Options
Here's a table outlining the options for the comm command:
| Option | Description |
|---|---|
| `--check-order` | Check that the input is correctly sorted, even if all input lines are pairable. |
| `--nocheck-order` | Do not check that the input is correctly sorted. This option has no short form. |
| `--output-delimiter=STR` | Separate columns with the string STR instead of a tab. This option has no short form. |
| `--help` | Display a help message and exit. |
| `--version` | Display version information and exit. |
| `-1` | Suppress the output of column 1 (lines unique to FILE1). |
| `-2` | Suppress the output of column 2 (lines unique to FILE2). |
| `-3` | Suppress the output of column 3 (lines that appear in both files). |
Creating Example Files
To demonstrate the comm command, we need two sorted text files. Let's create
them:
File 1: list1.txt
Create the file:
vim list1.txt
Insert the following sorted list:
apple
banana
cherry
date
Save and exit with `:wq`.
File 2: list2.txt
Create the second file:
vim list2.txt
Type in the sorted list:
banana
date
fig
grape
Save and exit as before.
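If you would rather not open an editor, the same two files can be created from the shell. This is a minimal sketch that assumes you are working in a scratch directory where `list1.txt` and `list2.txt` may be overwritten; the `sort -c` check at the end is an extra step added here, not part of the original walkthrough.

```shell
# Create the two example files without opening an editor.
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# sort -c exits with a non-zero status if a file is not sorted,
# so the message is printed only when both files pass the check.
sort -c list1.txt && sort -c list2.txt && echo "both files are sorted"
```

Because comm expects sorted input, checking the files with `sort -c` before comparing them can catch mistakes early.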
Example 1: Basic Comparison
To compare the two files and output three columns:
comm list1.txt list2.txt
Output:
apple
		banana
cherry
		date
	fig
	grape
The columns are separated by tabs. Lines with no indentation are unique to
list1.txt (column 1), lines indented by one tab are unique to list2.txt
(column 2), and lines indented by two tabs appear in both files (column 3).
Example 2: Suppressing Columns
To suppress the first column and compare the files:
comm -1 list1.txt list2.txt
Output:
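	banana
	date
fig
grape
The -1 option removes lines that are only in list1.txt. The remaining columns
shift left: lines unique to list2.txt are now unindented, and the common lines
are indented by one tab.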
Example 3: Suppressing Multiple Columns
You can suppress more than one column:
comm -23 list1.txt list2.txt
Output:
apple
cherry
The -23 option suppresses both the second and third columns, displaying only
the unique lines from list1.txt.
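The other two-column combinations work the same way. The sketch below recreates the example files so that it is self-contained, then uses `-13` and `-12`; the variable names `only_in_2` and `in_both` are chosen here purely for illustration.

```shell
# Recreate the example files so this snippet runs on its own.
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# Lines only in list2.txt (suppress columns 1 and 3).
only_in_2=$(comm -13 list1.txt list2.txt)

# Lines present in both files (suppress columns 1 and 2).
in_both=$(comm -12 list1.txt list2.txt)

printf 'only in list2.txt:\n%s\n' "$only_in_2"
printf 'in both files:\n%s\n' "$in_both"
```

When only one column remains, the output carries no leading tabs, which makes it convenient to pipe into other commands.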
Example 4: No Check for Sorted Order
In cases where you are certain your files are sorted, you can skip the sorted order check. The option only exists in its long form; there is no -n shorthand:
comm --nocheck-order list1.txt list2.txt
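Skipping the check does not make comm work reliably on unsorted input; it only silences the warning. If your files are not sorted yet, one approach is to sort them first. The sketch below uses hypothetical file names (`unsorted1.txt`, `unsorted2.txt`, `sorted1.txt`, `sorted2.txt`) and sets `LC_ALL=C` on the assumption that you want sort and comm to use the same byte-wise ordering.

```shell
# Hypothetical unsorted inputs, used only for this illustration.
printf '%s\n' cherry apple date banana > unsorted1.txt
printf '%s\n' grape banana fig date > unsorted2.txt

# Sort each file first; LC_ALL=C makes sort and comm agree on the ordering.
LC_ALL=C sort unsorted1.txt > sorted1.txt
LC_ALL=C sort unsorted2.txt > sorted2.txt

# Print the lines common to both files (banana and date).
LC_ALL=C comm -12 sorted1.txt sorted2.txt
```

Writing the sorted copies to separate files keeps the originals untouched and works in any POSIX shell.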
Example 5: Custom Output Delimiter
You can specify a custom delimiter to separate columns:
comm --output-delimiter=',' list1.txt list2.txt
Output:
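apple
,,banana
cherry
,,date
,fig
,grape
Commas are now used as the column separator in place of tabs: lines unique to
list1.txt have no prefix, lines unique to list2.txt are preceded by one comma,
and common lines are preceded by two.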
Combining comm with Other Commands
The comm command is often used in conjunction with other Unix commands. For
instance, to count the number of common lines:
comm -12 list1.txt list2.txt | wc -l
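The same pattern extends to the other columns. As a rough sketch, the snippet below counts the lines in each of the three categories for the example files; the variable names `only1`, `only2`, and `both` are illustrative.

```shell
# Recreate the example files so this snippet runs on its own.
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# Count the lines in each category by keeping one column at a time.
only1=$(comm -23 list1.txt list2.txt | wc -l)
only2=$(comm -13 list1.txt list2.txt | wc -l)
both=$(comm -12 list1.txt list2.txt | wc -l)

# $((...)) strips any padding that some wc implementations add.
echo "only in list1.txt: $((only1))"
echo "only in list2.txt: $((only2))"
echo "in both files:     $((both))"
```

For the example files, each of the three counts is 2.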
Conclusion
The comm command offers a straightforward way to compare sorted files, making
it a useful tool for system administrators, developers, and data analysts
working with Linux. Its ability to suppress columns provides flexibility,
enabling users to get precisely the comparison they need. By mastering comm
and its options, you can efficiently work with sorted data, perform comparisons,
and streamline your workflows on the Linux command line.