
The Linux `join` Command: Combining Text Files on Common Fields

In Linux, the join command combines lines from two files that share a common field, similar to a JOIN operation in SQL. It's particularly useful for merging tabular data that is split across two files, and it expects both files to be sorted on that field.

Syntax

The basic syntax of the join command is:

join [OPTION]... FILE1 FILE2
  • FILE1 and FILE2 are the two files you want to join.
  • [OPTION]... represents the various options that can be applied to the join command.

Options

Here's a table of some common options for the join command:

Long Option        Shorthand     Description
--ignore-case      -i            Ignore differences in case when comparing fields.
(none)             -o FORMAT     Format of the output line.
(none)             -t CHAR       Field separator character for input and output.
(none)             -v FILENUM    Print only the unpairable lines from file FILENUM.
--check-order      (none)        Check that the input is correctly sorted.
--nocheck-order    (none)        Do not check that the input is correctly sorted.
--version          (none)        Display version information and exit.
--help             (none)        Display a help message and exit.

Creating Example Files

Let's create two files using vim that we can use in our examples:

  1. File 1: employees.txt

    Create the file and insert data:

    vim employees.txt

    Press i to insert and type:

    101 John
    102 Jane
    103 Doe

    Save with :wq.

  2. File 2: departments.txt

    Create the file and insert data:

    vim departments.txt

    Type in insert mode:

    101 Accounting
    102 Marketing
    104 Sales

    Save and exit as before.
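If you would rather not open an editor, the same two files can be created from the shell. This is a minimal sketch; the workdir scratch directory is a hypothetical name used only for illustration and is not part of the original walkthrough.

```shell
# Create the two example files without opening an editor.
# workdir is a hypothetical scratch directory used only for this sketch.
workdir=$(mktemp -d)

# printf '%s\n' writes each argument on its own line.
printf '%s\n' '101 John' '102 Jane' '103 Doe' > "$workdir/employees.txt"
printf '%s\n' '101 Accounting' '102 Marketing' '104 Sales' > "$workdir/departments.txt"

# Show both files to confirm their contents.
cat "$workdir/employees.txt" "$workdir/departments.txt"
```

Both files are already sorted on the first field, which is what join expects.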

Example 1: Basic Join Operation

To join the two files on the common field (employee ID):

join employees.txt departments.txt

Output:

101 John Accounting
102 Jane Marketing

Lines whose first fields match are combined into a single output line. IDs 103 and 104 each appear in only one file, so they are left out.

Example 2: Join with a Specified Field Separator

If your files use a separator other than whitespace, you can specify it with -t:

join -t ',' employees.csv departments.csv

(For this example to work, you'd need to have CSV files with comma-separated values.)
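To try this out, here is a small sketch that builds comma-separated versions of the example data and joins them. The file contents and the workdir scratch directory are assumptions made for illustration.

```shell
# Hypothetical CSV versions of the example files, kept in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '101,John' '102,Jane' '103,Doe' > "$workdir/employees.csv"
printf '%s\n' '101,Accounting' '102,Marketing' '104,Sales' > "$workdir/departments.csv"

# -t ',' makes the comma the field separator for both input and output.
csv_result=$(join -t ',' "$workdir/employees.csv" "$workdir/departments.csv")
printf '%s\n' "$csv_result"
# Prints:
# 101,John,Accounting
# 102,Jane,Marketing
```

Note that the output uses the same separator as the input.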

Example 3: Join and Output Specific Fields

To join files and output specific fields, you use -o:

join -o 1.1,2.2 employees.txt departments.txt

Output:

101 Accounting
102 Marketing

This shows only the employee ID from employees.txt and department name from departments.txt.

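Example 4: Join without Checking Sort Order

join expects both files to be sorted on the join field. If they are not, and you do not wish to sort them, use --nocheck-order to suppress the sort-order check:

join --nocheck-order employees.txt departments.txt

This only disables the check; it does not sort anything, so the results may be incomplete or incorrect if the files are not already sorted on the join field. Note that -v is a different option: -v 1 prints only the lines of FILE1 that have no match in FILE2 (here, 103 Doe).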
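When the inputs are not sorted, sorting them on the join field first is usually the more reliable route. A minimal sketch, assuming hypothetical unsorted copies of the example files in a scratch directory:

```shell
# Hypothetical unsorted copies of the example data.
workdir=$(mktemp -d)
printf '%s\n' '103 Doe' '101 John' '102 Jane' > "$workdir/employees_unsorted.txt"
printf '%s\n' '104 Sales' '102 Marketing' '101 Accounting' > "$workdir/departments_unsorted.txt"

# Sort each file on its first field (the join field) before joining.
sort -k 1,1 "$workdir/employees_unsorted.txt" > "$workdir/employees_sorted.txt"
sort -k 1,1 "$workdir/departments_unsorted.txt" > "$workdir/departments_sorted.txt"

sorted_result=$(join "$workdir/employees_sorted.txt" "$workdir/departments_sorted.txt")
printf '%s\n' "$sorted_result"
# Prints:
# 101 John Accounting
# 102 Jane Marketing
```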

Example 5: Outer Join

To perform a left outer join, showing all records from FILE1, use:

join -a 1 employees.txt departments.txt

Output:

101 John Accounting
102 Jane Marketing
103 Doe

The -a 1 option includes all lines from FILE1, even if there's no matching line in FILE2.
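The same idea extends to a full outer join by passing -a for both files. The sketch below also uses -e with -o to fill missing fields with a placeholder; the NULL placeholder text and the scratch directory are assumptions chosen for illustration.

```shell
# Recreate the example files in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '101 John' '102 Jane' '103 Doe' > "$workdir/employees.txt"
printf '%s\n' '101 Accounting' '102 Marketing' '104 Sales' > "$workdir/departments.txt"

# -a 1 -a 2 : also print unpairable lines from both files
# -o 0,1.2,2.2 : output the join field, then field 2 of each file
# -e 'NULL' : replace missing output fields with the text NULL
outer_result=$(join -a 1 -a 2 -e 'NULL' -o 0,1.2,2.2 \
    "$workdir/employees.txt" "$workdir/departments.txt")
printf '%s\n' "$outer_result"
# Prints:
# 101 John Accounting
# 102 Jane Marketing
# 103 Doe NULL
# 104 NULL Sales
```

Spelling out the columns with -o keeps every output line at three fields, which makes the result easier to process further.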

Example 6: Case-Insensitive Join

To join files without caring about case differences:

join -i employees.txt departments.txt

The -i option will ignore case when comparing fields for a match.
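With the numeric IDs used so far, -i makes no visible difference. To see its effect, here is a small sketch with hypothetical files whose keys differ only in letter case; the file names and contents are assumptions made for illustration.

```shell
# Hypothetical files whose join keys differ only in letter case.
workdir=$(mktemp -d)
printf '%s\n' 'a01 John' 'b02 Jane' > "$workdir/staff_lower.txt"
printf '%s\n' 'A01 Accounting' 'B02 Marketing' > "$workdir/depts_upper.txt"

# With -i, a01 matches A01 and b02 matches B02.
ci_result=$(join -i "$workdir/staff_lower.txt" "$workdir/depts_upper.txt")
printf '%s\n' "$ci_result"
```

This should print two joined lines, each pairing an employee with a department.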

Combining join with Other Commands

You can also combine join with other Linux commands for more advanced data manipulation. For example, to sort the output, you might use:

join employees.txt departments.txt | sort

Or to count the number of joined lines:

join employees.txt departments.txt | wc -l

Conclusion

The join command is an essential tool for combining data from different sources in a Linux environment. It provides a flexible way to perform relational database-like operations without the need for complex software, making it invaluable for shell scripting and data processing tasks.

Remember to ensure your input files are sorted on the join field; otherwise join may miss matching lines.
