If you work with Microsoft Excel, you may have to merge two files. In fact, that is a common task with Excel: consolidate information in a single item. However, doing that can be daunting with native Excel functions. In this post, we see how we can join tables in Excel in a way that mimics the SQL join statement.
To join tables in Excel, we are going to use a command-line tool. This means you have to run the join from your command prompt or terminal, and not inside Excel. While this is not something most users do, it is not that complicated.
Excel Join Tables
The tool
The first thing we have to do is downloading the tool that can do the merge for us. The download contains an executable file and the source code in Python, in case you want to tweak the software yourself. Use the link below to start the download.
If you are on Windows, you should put the excelmerger.exe
C:\Windows\System32
Once you have done that, you can open the command prompt (Win+R
, cmd
How to use excelmerger
Now, we can see how to use this tool by excelmerger --help
As you can see, to join tables in Excel, you have to provide some mandatory and some optional parameters.
The mandatory parameters
- First File (
-ff
) indicates the first of the two files to merge into a single one. By default, excelmerger will merge the second file into the first. This means that, unless you specify otherwise, this file will be overridden with the final result. - First Sheet (
-fs
) indicates the name of the sheet within the first file. - Second File (
-sf
) is the file that will be merged into the first file. - Second Sheet (
-ss
) is the name of the sheet inside the second file that you want to merge. - Join Column (
-j
) indicates the column on which you want to do the join. In other words, the script will look at the same column in both files: when the column has the same value for both files, it will copy some data from the second to the first file. Note that you don’t have to provide the letter of the column, but the header you put in the first row (e.g. “Account Name” instead of “A”). - Merge Cell (
-m
) indicates which columns (by their name) to copy from second to the first file for every row that is joined. You can use more than one merge cell, but column names must match between the two files.
The optional parameters
- Save as (
--save-as
) allows you to not edit the first file and save the result to a third, new, file. - Dry run (
--dry-run
) shows the output but does not save anything, useful for testing as you don’t modify your files. - Allow override (
--allow-override
) will overwrite existing content in any cell when doing the copy, if not non-empty cells will not be overwritten with data from the second file. Active by default. - Max empty rows and columns (
--max-empty-rows
and--max-empty-columns
) tell the script how many empty rows and columns the script has to check anyway before considering everything as finished. By default, it is 10, but you may need to increase these numbers if your file is full of empty rows. - Highlight (
--highlight
) will set the background of any copied cell to red
See it in action
From this explanation, the script may seem a little complicated, but it isn’t. An example will clarify that for you. Imagine you have two excel files: users.xlsx
accounts.xlsx
What we want here is to enure all the balances are up to date by copying them from accounts into users. We can use the following command from the terminal, but before that ensure you close both files.
excelmerger ^
-ff "users.xlsx" -fs Sheet1 ^
-sf "accounts.xlsx" -ss Sheet1 ^
-j "Account ID" ^
-m "Balance" ^
--highlight ^
--verbose
Tip: ^
\
This produces the following output inside users.xlsx
.
As you can see, the script has overwritten all the balance column. All the cells modified by the script are in red. As you can see, it has correctly filled the cell D3 that used to be empty.
Under the hood
If you know Python, you may want to tweak or tune this script a little bit. The script is fairly simple, and it is only a bunch of files. However, here some guidelines to help you get started.
main.py
contains only the parsing of arguments.merger.py
contains the actual function running the merge, that instantiates two Side classes to represent the two “sides” of the merge.side.py
has all the intelligence. It describes the Side class that represents the Excel sheet. It has several methods: the one to absorb data from another Side, the one to export some data (to be imported from another Side), and the join method to find rows when there is a match.
Remember that the script is provided to you “as-is”, without any warranty. Thus, feel free to modify it to better address your needs.
In conclusion
Joining two tables in Excel can be very painful. Not anymore, with excelmerger, a simple CLI tool that you can use for free to merge two excel files. As a minimum, provide the name of the two files (and the sheets in them), a column on which do the join, and at least one column to copy from the second file into the first.
This tool helped me be more efficient and productive, as it saved me a lot of time spent doing brainless tasks. How are you planning to use it?