Creating Your Own Diff-Tool Using Python

Building your own diff tool is easier than it sounds. In fact, it can be pretty fun and rewarding once you get the hang of it.

Before anything else: what’s a diff tool? Well, if you’re unfamiliar with the term, let me explain. A diff tool (short for “difference”) is essentially a program that compares two files or pieces of text and highlights any differences between them. This can be incredibly useful when working on code projects, as it allows you to easily see what changes have been made since your last commit or pull request.
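To make that concrete, here is a minimal sketch of what a diff looks like in practice, using Python’s built-in `difflib` on two small in-memory lists. The fruit names and the `old.txt` / `new.txt` labels are made up for illustration; no files are read.

```python
import difflib

# Two versions of the same "file", represented as lists of lines.
old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# unified_diff yields the diff one line at a time; list() collects the lines.
diff_lines = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))

# Every line already ends with a newline, so join with an empty string.
print("".join(diff_lines), end="")
```

In the output, lines that were removed start with `-`, lines that were added start with `+`, and unchanged context lines start with a space.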

So how do we create our own diff tool using Python? First things first: let’s import the necessary modules. `sys` gives us access to the command-line arguments and the standard streams, and `argparse` takes care of parsing those arguments. And of course, we’ll be using `difflib`, a module in Python’s standard library that provides classes and functions for comparing sequences (more on this later).

# Importing necessary modules
import sys # importing the sys module for access to command-line arguments and the standard streams
from argparse import ArgumentParser # importing the ArgumentParser class from the argparse module for parsing command-line arguments
import difflib # importing the difflib module for comparing sequences

# Creating an ArgumentParser object
parser = ArgumentParser(description='Compare two files and display the differences.')

# Adding arguments to the parser
parser.add_argument('file1', help='first file to compare') # adding a positional argument for the first file to compare
parser.add_argument('file2', help='second file to compare') # adding a positional argument for the second file to compare
parser.add_argument('-o', '--output', help='output file to save the differences') # adding an optional argument for specifying an output file to save the differences

# Parsing the command-line arguments
args = parser.parse_args()

# Reading the contents of the first file
with open(args.file1, 'r') as f1:
    file1 = f1.readlines()

# Reading the contents of the second file
with open(args.file2, 'r') as f2:
    file2 = f2.readlines()

# Using difflib to compare the two files
# unified_diff returns a generator that can only be consumed once, so we store its lines in a list
differences = list(difflib.unified_diff(file1, file2, fromfile=args.file1, tofile=args.file2))

# Displaying the differences
# Each line already ends with a newline, so we write it as-is instead of using print()
sys.stdout.writelines(differences)

# Saving the differences to an output file if specified
if args.output:
    with open(args.output, 'w') as f:
        f.writelines(differences)
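One detail about the script above is worth a closer look: `unified_diff()` returns a generator, and a generator can only be walked through once. The short sketch below, with made-up sample lines, shows what happens if you try to use it twice. This is why it helps to collect the lines into a list before both printing them and saving them to a file.

```python
import difflib

old_lines = ["one\n", "two\n"]
new_lines = ["one\n", "three\n"]

diff_iter = difflib.unified_diff(old_lines, new_lines)

# The first pass consumes the generator and yields every diff line.
first_pass = list(diff_iter)

# The second pass finds nothing left: the generator is already exhausted.
second_pass = list(diff_iter)

print(len(first_pass), len(second_pass))
```

Here `second_pass` comes back empty, which is exactly what an output file would receive if the same generator had already been printed.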

Now let’s create our main function, which will be responsible for handling command-line arguments and calling the `create_diff()` function we’ll write in a bit.

# Main function to handle command-line arguments and call create_diff() function
def main():
    # Parse command-line arguments using argparse
    parser = ArgumentParser(description='Create a diff between two files') # Creates an ArgumentParser object with a description
    parser.add_argument('old', help='The old file to compare against') # Adds an argument for the old file
    parser.add_argument('new', help='The new file with changes') # Adds an argument for the new file
    args = parser.parse_args() # Parses the arguments and stores them in the args variable
    
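    # Call create_diff function and print output
    diff = create_diff(args.old, args.new) # Calls the create_diff function with the old and new file arguments and stores the output in the diff variable
    print(diff, end='') # Prints the diff; end='' avoids an extra blank line because the diff already ends with a newline

# Call the main function
# Keep this at the very bottom of the script, after create_diff() has been defined
if __name__ == '__main__':
    main()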

As you can see, we’re using `argparse` to parse our command-line arguments (the old file and the new file). We then call the `create_diff()` function with these two files as arguments, which will return a string containing the diff output. Finally, we print this output to the console.
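If you want to see what `argparse` hands back without running the script from a terminal, you can pass a list of strings to `parse_args()` directly. The sketch below assumes a small helper, `build_parser()`, and two placeholder file names; both are hypothetical and only there for illustration.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    # Hypothetical helper: builds the same parser that main() uses.
    parser = ArgumentParser(description="Create a diff between two files")
    parser.add_argument("old", help="The old file to compare against")
    parser.add_argument("new", help="The new file with changes")
    return parser


# Passing a list simulates typing: python script.py before.txt after.txt
example_args = build_parser().parse_args(["before.txt", "after.txt"])
print(example_args.old, example_args.new)
```

Each positional argument becomes an attribute on the returned object, named after the string given to `add_argument()`.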

Now let’s write that `create_diff()` function! This is where things get interesting.

# This function takes in two file paths as arguments and returns a string containing the diff output between the two files.
def create_diff(old_file: str, new_file: str) -> str:
    # Open the old file and read its contents into a list
    with open(old_file, 'r') as f1:
        old = f1.readlines()
    
    # Open the new file and read its contents into a list
    with open(new_file, 'r') as f2:
        new = f2.readlines()
    
    # Use difflib's unified_diff function to compute the differences between the two files
    d = difflib.unified_diff(old, new, fromfile=old_file, tofile=new_file)
    
    # Each diff line already ends with a newline, so join with an empty string and return the result
    return ''.join(d)

Here we’re opening the old and new files using `open()`, which returns a file object that allows us to read from the file. Calling `readlines()` on each file object gives us a list of lines, which is the form `difflib` expects when comparing two texts.

Next, we use `difflib`’s `unified_diff()` function to compute the diff between our old and new files. This function returns a generator that yields the differences between the two sequences (in this case, the lines in each file) one string at a time. We then join these strings together into a single string.

Finally, we return the resulting string as output from our `create_diff()` function. And that’s it! Our diff tool is complete.
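A quick way to try the function out is to point it at two throwaway files. The sketch below restates a compact variant of `create_diff()` so that it runs on its own, then compares two temporary files whose names and contents are invented for the example.

```python
import difflib
import os
import tempfile


def create_diff(old_file: str, new_file: str) -> str:
    # Compact variant of the function above, repeated so the example is self-contained.
    with open(old_file, "r") as f1:
        old = f1.readlines()
    with open(new_file, "r") as f2:
        new = f2.readlines()
    return "".join(difflib.unified_diff(old, new, fromfile=old_file, tofile=new_file))


# Create two small files in a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, "old.txt")
    new_path = os.path.join(tmp, "new.txt")
    with open(old_path, "w") as f:
        f.write("alpha\nbeta\n")
    with open(new_path, "w") as f:
        f.write("alpha\ngamma\n")
    result = create_diff(old_path, new_path)

print(result, end="")
```

The printed diff should mark `beta` as removed and `gamma` as added, with `alpha` shown as unchanged context.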

But wait, what if we want to support HTML output? Luckily for us, Python’s `difflib` module provides an `HtmlDiff` class that can be used to create an HTML table showing a side-by-side, line-by-line comparison of text with inter-line and intra-line change highlights.

# Import the HtmlDiff class from the difflib module
from difflib import HtmlDiff

# Define the main function
def main():
    # Parse command-line arguments using argparse
    # Create an ArgumentParser object and set the description
    parser = ArgumentParser(description='Create a diff between two files')
    # Add arguments for the old and new files
    parser.add_argument('old', help='The old file to compare against')
    parser.add_argument('new', help='The new file with changes')
    # Parse the arguments and store them in the args variable
    args = parser.parse_args()
    
    # Read both files into lists of lines, which is the input HtmlDiff expects
    with open(args.old, 'r') as f1:
        old = f1.readlines()
    with open(args.new, 'r') as f2:
        new = f2.readlines()
    # Create an HtmlDiff object and use the make_file method to generate a complete HTML page
    # The first two arguments are the lists of lines to compare
    # The last two arguments are the labels shown above each column (here, the file names)
    html = HtmlDiff().make_file(old, new, args.old, args.new)
    # Print the HTML output
    print(html)

Here we’re using `HtmlDiff()` to generate an HTML page containing a table of the differences between our two files. We then pass this output to Python’s built-in `print()` function, which writes the raw HTML to standard output. To actually view it, redirect the output to a file ending in `.html` and open that file in a web browser.
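If you would rather have the script write the HTML file itself, something along these lines should work. This is a minimal sketch: the `write_html_diff()` helper, its parameter names, and the `diff_example.html` file name are all assumptions made for the example, not part of `difflib`.

```python
import os
import tempfile
from difflib import HtmlDiff


def write_html_diff(old_lines, new_lines, out_path, old_name="old", new_name="new"):
    # Hypothetical helper: renders a side-by-side HTML page and saves it to out_path.
    html = HtmlDiff().make_file(old_lines, new_lines, fromdesc=old_name, todesc=new_name)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(html)
    return out_path


# Sample input lines and an output location in the system's temp directory.
out_file = os.path.join(tempfile.gettempdir(), "diff_example.html")
write_html_diff(["a\n", "b\n"], ["a\n", "c\n"], out_file)
print("HTML diff written to", out_file)
```

Once the file exists, you can open it by double-clicking it, or from Python with the standard library’s `webbrowser.open()`.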

And that’s all there is to it! With just a few lines of code and some basic knowledge of Python, we can create our own diff tool using the power of `difflib`. So go ahead, give it a try and see what kind of magic you can make happen with this awesome library.
