I heavily use git to track my coding projects, articles, business work, and more. One of the beautiful things about git is that you can easily compare different states of your work using its built-in diff functionality. Only two conditions need to be met to use it: first, you need a git repository, and second, the file needs to be tracked by that repository.
But what if you only want to modify a single file and compare it to an older version, all without the need for a git repository? This is where this article comes into play. The goal is to create a diff-tool that allows you to compare two versions of a file.
Furthermore, our diff-tool should be able to export the computed diff to an HTML file. To follow this article, you need nothing but Python. This article was specifically written for Python 3.8.2 (CPython). The source code can be found on GitHub. Without further introduction, let’s jump in!
The Python standard library contains a module called difflib.
According to the documentation, this module provides classes and functions for comparing sequences.
Furthermore, it can produce its output in various formats, such as context diffs, unified diffs, ndiff-style deltas, and HTML.
While inspecting the module, the unified_diff() function stands out from all the others.
Looking at the provided examples and generated outputs, it seems to be exactly the diff-computing function we are looking for.
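It takes up to eight arguments, but only the first two are required:
* a and b are the two lists of strings to compare
* fromfile and tofile are the names shown in the header of the diff (empty strings by default)
* fromfiledate and tofiledate are optional modification times, also shown in the header
* n is the number of context lines (three by default); context lines are used to provide the user some context where the changes happened
* lineterm is the line terminator appended to the control lines (those starting with ---, +++, or @@); it defaults to a newline so that the output can be processed properly by writelines()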
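To see some of the optional arguments in action, here is a small sketch that compares two in-memory lists created with str.splitlines(), which drops the trailing newlines. Because of that, it sets lineterm to an empty string and joins the result with newlines itself, while n=0 suppresses the context lines. The file names old.txt and new.txt are made up for illustration and only appear in the header.

```python
import difflib

# str.splitlines() removes the line endings, so lineterm="" keeps the
# control lines (---, +++, @@) newline-free as well
old = "cheese\ntomates".splitlines()
new = "cheese\ntomatoes\nsalami".splitlines()

diff = list(
    difflib.unified_diff(
        old,
        new,
        fromfile="old.txt",  # hypothetical names, only used in the header
        tofile="new.txt",
        n=0,  # no context lines around the changes
        lineterm="",
    )
)
print("\n".join(diff))
# --- old.txt
# +++ new.txt
# @@ -2 +2,2 @@
# -tomates
# +tomatoes
# +salami
```

Note how the hunk header changes with n=0: the unchanged cheese line is no longer part of the hunk, so it starts at line 2 in both files.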
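In essence, the unified_diff() function takes two lists of strings and compares them.
It returns a generator that yields the lines of the delta.
If both lists are equal, the generator yields nothing, meaning the delta is empty.
If there are any differences, the generator yields the corresponding delta lines.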
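A quick way to convince yourself of this behaviour is to wrap the generator in list(). The following sketch uses two made-up fruit lists: comparing a list with itself produces an empty list, while comparing the two different lists produces a header, a hunk header, and the changed lines.

```python
import difflib
import types

old = ["apple\n", "banana\n"]
new = ["apple\n", "cherry\n"]

delta = difflib.unified_diff(old, new)
print(isinstance(delta, types.GeneratorType))  # True: lines are produced lazily

# Comparing a list with itself yields no lines at all
print(list(difflib.unified_diff(old, old)))  # []

# Differing lists yield the header, a hunk header, and the changed lines
print(list(delta))
# ['--- \n', '+++ \n', '@@ -1,2 +1,2 @@\n', ' apple\n', '-banana\n', '+cherry\n']
```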
Let’s take a simple example:
You are planning a pizza party together with your best friend and write down some ingredients you need to buy first.
For simplicity, the shopping list is a plain text file (my_shopping_list.txt), which contains just two lines: cheese and tomates.
You send it over to your friend, and he adds salami, as neither of you has it at home.
Furthermore, he notices your typo and corrects it as well.
To pass the changes properly to our diff-tool, he makes a copy of the list and renames it to friends_shopping_list.txt.
Here is the final shopping list:

```
cheese
tomatoes
salami
```
Of course, this is a fairly simple example and the texts are short, so a human could easily spot the differences.
But let’s have fun and follow the example.
To compute the diff between both files, we read them into memory and pass them to the unified_diff() function:

```python
# shopping_list_diff.py
import difflib
import sys

file1 = open("my_shopping_list.txt").readlines()
file2 = open("friends_shopping_list.txt").readlines()

delta = difflib.unified_diff(file1, file2)
sys.stdout.writelines(delta)
```
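First, we import difflib and sys.
Second, we read the content of both files and save it to two separate variables, file1 and file2.
As unified_diff() expects lists of strings, we use readlines(), which returns the lines of a file as a list, each line keeping its trailing newline character.
Subsequently, we compute the delta of both lists and write it to sys.stdout using writelines().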
Executing the script at hand results in the following output:

```
$ python shopping_list_diff.py
--- 
+++ 
@@ -1,2 +1,3 @@
 cheese
-tomates
+tomatoes
+salami
```
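The output shows that the line cheese was not touched, but is included as a context line because the following line was modified.
The line tomates has been removed, while tomatoes as well as salami were added.
The hunk header @@ -1,2 +1,3 @@ summarizes this: the hunk covers two lines starting at line 1 of the old file and three lines starting at line 1 of the new file.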
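If you ever want to process hunk headers programmatically, a small regular expression is enough. The following is a minimal sketch; parse_hunk_header() is a hypothetical helper, not part of difflib. It relies on the unified diff convention, which difflib follows, of omitting the line count when a range spans exactly one line.

```python
import re
from typing import Tuple

# Matches headers such as "@@ -1,2 +1,3 @@" or "@@ -2 +2,2 @@"
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header: str) -> Tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) of a hunk header."""
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError(f"not a unified diff hunk header: {header!r}")
    old_start, old_count, new_start, new_count = match.groups()
    # A missing count means the range spans exactly one line
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1,2 +1,3 @@\n"))  # (1, 2, 1, 3)
```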
If you look at the header of the printed delta, you can see that we do not get any information about what --- and +++ stand for or which files they represent.
Let’s adjust our script by passing the file names to the unified_diff() function as its fromfile and tofile arguments:

```python
delta = difflib.unified_diff(file1, file2, "my_shopping_list.txt", "friends_shopping_list.txt")
```
Now, running the script results in:

```
$ python shopping_list_diff.py
--- my_shopping_list.txt
+++ friends_shopping_list.txt
@@ -1,2 +1,3 @@
 cheese
-tomates
+tomatoes
+salami
```
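The header can carry even more information: the fromfiledate and tofiledate arguments append a timestamp to each file name, separated by a tab. The sketch below assumes you want each file’s modification time in ISO 8601 format, taken from os.stat(); the helper name file_mtime() is hypothetical. To stay self-contained, it first writes both shopping lists into a temporary directory.

```python
import difflib
import os
import sys
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_mtime(path: Path) -> str:
    """Return the modification time of path as a local ISO 8601 string."""
    timestamp = datetime.fromtimestamp(os.stat(path).st_mtime, timezone.utc)
    return timestamp.astimezone().isoformat()


with tempfile.TemporaryDirectory() as tmp:
    old_path = Path(tmp) / "my_shopping_list.txt"
    new_path = Path(tmp) / "friends_shopping_list.txt"
    old_path.write_text("cheese\ntomates\n")
    new_path.write_text("cheese\ntomatoes\nsalami\n")

    with old_path.open() as f:
        file1 = f.readlines()
    with new_path.open() as f:
        file2 = f.readlines()

    delta = list(
        difflib.unified_diff(
            file1,
            file2,
            old_path.name,
            new_path.name,
            file_mtime(old_path),  # fromfiledate
            file_mtime(new_path),  # tofiledate
        )
    )
    sys.stdout.writelines(delta)
```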
Great! We implemented a simple script that computes and prints the difference between the contents of two files. Let’s move on and turn it into a command-line tool.
To turn our script into a useful command-line tool, we use Python’s argparse module.
First of all, we put our previously written code into a function called create_diff(), which accepts two arguments, old_file and new_file, both of them pathlib.Path objects.
We use the passed Path objects to read the files’ content and to get the names of the provided files via their name attribute.
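The name attribute is what keeps the diff header short: it contains only the final path component, without any directories. The following sketch shows this with a file written to a temporary directory; the directory only exists to make the example self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_shopping_list.txt"
    path.write_text("cheese\ntomates\n")

    # readlines() keeps the trailing newline of every line
    with path.open() as f:
        lines = f.readlines()

    print(path)       # full path, including the temporary directory
    print(path.name)  # my_shopping_list.txt
    print(lines)      # ['cheese\n', 'tomates\n']
```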
As our little script is now more general-purpose and no longer limited to shopping lists, we put our code into a new file called diff_tool.py, which is a more suitable name for our script.
So far, the script looks like this:

```python
# diff_tool.py
import difflib
import sys
from pathlib import Path


def create_diff(old_file: Path, new_file: Path):
    file_1 = open(old_file).readlines()
    file_2 = open(new_file).readlines()

    delta = difflib.unified_diff(file_1, file_2, old_file.name, new_file.name)
    sys.stdout.writelines(delta)
```
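One possible refinement, which is not part of the original tool, is to let create_diff() write to any text stream instead of hard-coding sys.stdout. The sketch below adds a hypothetical out parameter and uses with statements so that both files are closed right after reading. Defaulting out to None and looking up sys.stdout inside the function (rather than using sys.stdout as the default value) means a redirected stdout is still respected. It also makes the function easy to test with an io.StringIO buffer.

```python
import difflib
import io
import sys
import tempfile
from pathlib import Path
from typing import Optional, TextIO


def create_diff(old_file: Path, new_file: Path, out: Optional[TextIO] = None) -> None:
    if out is None:
        out = sys.stdout  # looked up at call time, not at definition time
    # Close both files as soon as their lines have been read
    with open(old_file) as f:
        file_1 = f.readlines()
    with open(new_file) as f:
        file_2 = f.readlines()

    delta = difflib.unified_diff(file_1, file_2, old_file.name, new_file.name)
    out.writelines(delta)


# Usage: capture the diff in memory instead of printing it
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "my_shopping_list.txt"
    new = Path(tmp) / "friends_shopping_list.txt"
    old.write_text("cheese\ntomates\n")
    new.write_text("cheese\ntomatoes\nsalami\n")

    buffer = io.StringIO()
    create_diff(old, new, out=buffer)
    print(buffer.getvalue())
```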
Next, we define a new function main(), which is responsible for the general workflow.
```python
# previous diff_tool.py code
import argparse


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("old_file_version")
    parser.add_argument("new_file_version")
    args = parser.parse_args()

    old_file = Path(args.old_file_version)
    new_file = Path(args.new_file_version)

    create_diff(old_file, new_file)


if __name__ == "__main__":
    main()
```
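At first, a new argument parser is defined.
We tell the parser to accept two positional arguments, old_file_version and new_file_version.
Both are required, since argparse treats positional arguments as mandatory by default.
parse_args() parses the command-line input and returns a namespace object whose attributes hold the parsed values.
Subsequently, both command-line arguments are accessed and converted into Path objects.
Finally, create_diff() is called with old_file and new_file as arguments.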
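parse_args() also accepts an explicit list of strings, which is handy for trying out a parser without touching the command line. The sketch below shows that, and additionally passes type=Path to add_argument() so that argparse performs the Path conversion itself. This is an alternative to converting the values in main(), not what the tool above does, and build_parser() is a hypothetical helper name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Compare two versions of a file.")
    # type=Path converts each raw string argument into a Path object
    parser.add_argument("old_file_version", type=Path)
    parser.add_argument("new_file_version", type=Path)
    return parser


# Passing a list instead of relying on sys.argv
args = build_parser().parse_args(["my_shopping_list.txt", "friends_shopping_list.txt"])
print(args)
print(args.old_file_version.name)  # my_shopping_list.txt
```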
Note: If you want to learn more about the argparse module, I highly recommend Python’s argparse tutorial, which provides a gentler introduction to Python command-line parsing.
Now, if we execute the script without any arguments, it tells us which arguments are required:

```
$ python diff_tool.py
usage: diff_tool.py [-h] old_file_version new_file_version
diff_tool.py: error: the following arguments are required: old_file_version, new_file_version
```
Providing both shopping lists still results in the desired output:

```
$ python diff_tool.py my_shopping_list.txt friends_shopping_list.txt
--- my_shopping_list.txt
+++ friends_shopping_list.txt
@@ -1,2 +1,3 @@
 cheese
-tomates
+tomatoes
+salami
```
So far, we have built a simple diff-tool by turning our short script from the beginning into a command-line tool. Cool! Now, we will add a few more lines to support HTML output.
The difflib module provides an HtmlDiff class, which can be used to create an HTML table (or a complete HTML file containing the table) showing a side-by-side, line-by-line comparison of text with inter-line and intra-line change highlights.
In our example, we use the HtmlDiff.make_file() method, which returns a string representing a complete HTML file.
The resulting page highlights any differences line by line.
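To get a feeling for the difference between a complete page and a bare table, the following sketch calls both make_file() and make_table() on the shopping lists. The context=True and numlines=1 arguments (show only changed lines plus one line of context) and the wrapcolumn=60 setting are illustrative choices, not something the tool below relies on.

```python
import difflib

old = ["cheese\n", "tomates\n"]
new = ["cheese\n", "tomatoes\n", "salami\n"]

# wrapcolumn wraps long lines instead of letting the table grow horizontally
html_diff = difflib.HtmlDiff(wrapcolumn=60)

# A complete HTML document, including <html>, <head>, styles, and a legend
page = html_diff.make_file(old, new, "my_shopping_list.txt", "friends_shopping_list.txt")

# Only the <table> element, e.g. for embedding into an existing page;
# context=True shows changed lines plus numlines lines of context
table = html_diff.make_table(old, new, "old", "new", context=True, numlines=1)

print(len(page), len(table))
```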
Therefore, we extend our script as follows:
```python
# diff_tool.py
import argparse
import difflib
import sys
from pathlib import Path
from typing import Optional


def create_diff(old_file: Path, new_file: Path, output_file: Optional[Path] = None):
    file_1 = open(old_file).readlines()
    file_2 = open(new_file).readlines()

    if output_file:
        delta = difflib.HtmlDiff().make_file(
            file_1, file_2, old_file.name, new_file.name
        )
        # make_file() declares UTF-8 in the generated page, so write it as UTF-8
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(delta)
    else:
        delta = difflib.unified_diff(file_1, file_2, old_file.name, new_file.name)
        sys.stdout.writelines(delta)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("old_file_version")
    parser.add_argument("new_file_version")
    parser.add_argument("--html", help="specify html to write to")
    args = parser.parse_args()

    old_file = Path(args.old_file_version)
    new_file = Path(args.new_file_version)

    if args.html:
        output_file = Path(args.html)
    else:
        output_file = None

    create_diff(old_file, new_file, output_file)


if __name__ == "__main__":
    main()
```
The create_diff() function now takes an additional, optional third parameter output_file, which is also a Path object.
This is the file we write our HTML diff into.
We check whether an output_file was passed.
If so, we compute the diff in HTML format and save it to the passed file.
Note: We use the w mode for writing. If the file already exists, it is truncated first.
If no output_file was passed, we compute the unified diff and write it to sys.stdout, as before.
We extend the main() function by registering an additional, optional command-line argument --html, which takes a filename as input.
If a filename is provided, it is converted into a Path object and passed to create_diff(); otherwise, None is passed and the unified diff is printed.
After executing the following command, you have a diff.html file in your current working directory, which you can open with your favourite browser to see the actual diff.

```
$ python diff_tool.py my_shopping_list.txt friends_shopping_list.txt --html diff.html
```
Congratulations, you have made it through the article!
While reading it, you learned how to compute a simple diff using Python’s difflib module.
Furthermore, you turned your little diff-script into a command-line tool using Python’s argparse module.
Subsequently, you added a few lines of code to also support HTML as an output format.
You can check out the difflib documentation, get to know other ways to compute diffs, look into other kinds of diffs, and extend your diff-tool even further.
Additionally, you can check out the article’s GitHub repository and compute the diff between
Do you find all changes?
I hope you enjoyed reading the article. Make sure to share it with your friends and colleagues. If you have not already, consider following me on Twitter where I am @DahlitzF. Stay curious and keep coding!