Understanding the `comm` Command in Linux

In the world of Linux, the comm command is a lesser-known but highly effective tool for comparing two sorted files line by line. It's particularly useful for identifying lines that are common to both files or unique to one of them.

Syntax

The basic syntax of the comm command is:

comm [OPTION]... FILE1 FILE2
  • FILE1 and FILE2 are the two sorted files you want to compare.
  • [OPTION]... represents the various options that can be applied to the comm command.
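Because comm expects sorted input, unsorted files usually need a pass through sort first. The following is a minimal sketch of that idea, not the only way to do it. The file names unsorted_a.txt, unsorted_b.txt, and sorted_b.txt are made up for illustration, and the snippet relies on comm reading standard input when a FILE argument is given as -.

```shell
# Use one collation order for both sort and comm so they agree on what "sorted" means.
export LC_ALL=C

# Hypothetical unsorted inputs, created in a scratch directory.
cd "$(mktemp -d)"
printf '%s\n' cherry apple banana > unsorted_a.txt
printf '%s\n' banana fig apple > unsorted_b.txt

# Sort one file to disk and pipe the other in.
# A FILE argument of "-" tells comm to read standard input.
sort unsorted_b.txt > sorted_b.txt
result=$(sort unsorted_a.txt | comm - sorted_b.txt)
printf '%s\n' "$result"
```

The original files are left untouched; only the sorted copies are compared.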

Options

Here's a table outlining the options for the comm command:

Option                    Description
--check-order             Check that the input is correctly sorted, even if all input lines are pairable.
--nocheck-order           Do not check that the input is correctly sorted. There is no short form.
--output-delimiter=STR    Separate columns with the string STR instead of a tab. There is no short form.
--help                    Display a help message and exit.
--version                 Display version information and exit.
-1                        Suppress the output of column 1 (lines unique to FILE1).
-2                        Suppress the output of column 2 (lines unique to FILE2).
-3                        Suppress the output of column 3 (lines that appear in both files).

Creating Example Files

To demonstrate the comm command, we need two sorted text files. Let's create them:

  1. File 1: list1.txt

    Create the file:

    vim list1.txt

    Insert the following sorted list:

    apple
    banana
    cherry
    date

    Save and exit with :wq.

  2. File 2: list2.txt

    Create the second file:

    vim list2.txt

    Type in the sorted list:

    banana
    date
    fig
    grape

    Save and exit as before.
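If you prefer not to open an editor, the same two files can be created from the shell. This is a small sketch that also uses sort -c to confirm both files are in sorted order before they are handed to comm. The scratch directory and the files_sorted variable are illustrative choices, not part of the original walkthrough.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)"

# printf '%s\n' writes each argument on its own line.
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# sort -c prints nothing and exits 0 when a file is already sorted.
if sort -c list1.txt && sort -c list2.txt; then
  files_sorted=yes
else
  files_sorted=no
fi
echo "sorted: $files_sorted"
```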

Example 1: Basic Comparison

To compare the two files and output three columns:

comm list1.txt list2.txt

Output:

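apple
                banana
cherry
                date
        fig
        grape

Here, the first column contains lines unique to list1.txt, the second column has lines unique to list2.txt, and the third column shows the common lines. Each column is indented by one more tab than the one before it, so lines unique to list2.txt start after one tab and common lines start after two.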
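Since the column a line belongs to is encoded only by its leading tabs, the raw output can be hard to read at a glance. As a rough sketch, awk can split each line on tabs and use the field count to label it. This assumes the compared lines contain no tab characters themselves; the labels and the labeled variable are invented for the example.

```shell
# Recreate the example files so the snippet runs on its own.
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# Split on tabs: 1 field = column 1, 2 fields = column 2, 3 fields = column 3.
labeled=$(comm list1.txt list2.txt | awk -F'\t' '
  NF == 1 { print "only in list1: " $1 }
  NF == 2 { print "only in list2: " $2 }
  NF == 3 { print "in both: " $3 }
')
printf '%s\n' "$labeled"
```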

Example 2: Suppressing Columns

To suppress the first column and compare the files:

comm -1 list1.txt list2.txt

Output:

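        banana
        date
fig
grape

The -1 option removes lines that are only in list1.txt. With that column gone, lines unique to list2.txt start at the left margin and the common lines are indented by a single tab.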

Example 3: Suppressing Multiple Columns

You can suppress more than one column:

comm -23 list1.txt list2.txt

Output:

apple
cherry

The -23 option suppresses both the second and third columns, displaying only the unique lines from list1.txt.
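Suppressing two columns at a time turns comm into a simple set tool: each pair of flags leaves exactly one column, printed without any leading tabs. The sketch below captures all three combinations for the example files; the variable names are illustrative.

```shell
# Recreate the example files so the snippet runs on its own.
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

only_in_1=$(comm -23 list1.txt list2.txt)   # difference: list1 minus list2
only_in_2=$(comm -13 list1.txt list2.txt)   # difference: list2 minus list1
in_both=$(comm -12 list1.txt list2.txt)     # intersection

printf 'only in list1.txt:\n%s\n' "$only_in_1"
printf 'only in list2.txt:\n%s\n' "$only_in_2"
printf 'in both files:\n%s\n' "$in_both"
```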

Example 4: No Check for Sorted Order

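If you are certain your files are sorted, or you deliberately want to compare input that is not, you can skip the sorted order check with --nocheck-order (this option has no short form):

comm --nocheck-order list1.txt list2.txt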
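To see what the order check actually does, it helps to feed comm a file that is deliberately out of order. The sketch below assumes GNU coreutils, where --check-order and --nocheck-order are available; the file names and the two status variables are made up for the example. It records only the exit status of each run.

```shell
cd "$(mktemp -d)"
printf '%s\n' b a > unsorted.txt   # deliberately out of order
printf '%s\n' a b > sorted.txt

# With --check-order, comm stops with a non-zero status on unsorted input.
strict_status=0
comm --check-order unsorted.txt sorted.txt > /dev/null 2>&1 || strict_status=$?

# With --nocheck-order, comm does not test the ordering at all.
lax_status=0
comm --nocheck-order unsorted.txt sorted.txt > /dev/null 2>&1 || lax_status=$?

echo "--check-order exit status:   $strict_status"
echo "--nocheck-order exit status: $lax_status"
```

Skipping the check does not make the comparison of unsorted input meaningful; it only silences the complaint, so sorting first is usually the safer route.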

Example 5: Custom Output Delimiter

You can specify a custom delimiter to separate columns:

comm --output-delimiter=',' list1.txt list2.txt

Output:

apple
,,banana
cherry
,,date
,fig
,grape

Commas now take the place of the tabs: lines unique to list1.txt have no leading comma, lines unique to list2.txt have one, and common lines have two.

Combining comm with Other Commands

The comm command is often used in conjunction with other Unix commands. For instance, to count the number of common lines:

comm -12 list1.txt list2.txt | wc -l
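The same pattern extends to the other two columns. As a small sketch, the snippet below counts all three categories for the example files. The tr -d ' ' step is there because some wc implementations pad their output with spaces; the variable names are illustrative.

```shell
# Recreate the example files so the snippet runs on its own.
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry date > list1.txt
printf '%s\n' banana date fig grape > list2.txt

# Count the lines in each category, stripping any padding from wc.
common_count=$(comm -12 list1.txt list2.txt | wc -l | tr -d ' ')
only1_count=$(comm -23 list1.txt list2.txt | wc -l | tr -d ' ')
only2_count=$(comm -13 list1.txt list2.txt | wc -l | tr -d ' ')

echo "common=$common_count only_in_list1=$only1_count only_in_list2=$only2_count"
```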

Conclusion

The comm command offers a straightforward way to compare sorted files, making it a useful tool for system administrators, developers, and data analysts working with Linux. Its ability to suppress columns provides flexibility, enabling users to get precisely the comparison they need. By mastering comm and its options, you can efficiently work with sorted data, perform comparisons, and streamline your workflows on the Linux command line.
