How to Use Difflib in Python

Difflib is a module in Python’s standard library that compares two sequences (usually strings or lists of lines) and finds the differences between them. It’s like having a magic wand for finding out what changed in your code, but without all the fancy spells and wizardry.

So how do we use Difflib? Let me show you! First, let’s import it:

# Import difflib, the standard-library module for comparing sequences and finding the differences between them.
import difflib

# Create two strings to compare
string1 = "Hello, world!"
string2 = "Hello, everyone!"

# Create a SequenceMatcher object that compares string1 with string2 and store it in "differences".
# The first argument (None) means no characters are treated as "junk" to be ignored.
differences = difflib.SequenceMatcher(None, string1, string2)

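# Print the similarity ratio between the two strings
print(differences.ratio())

# Output: approximately 0.62 (the exact value is 18/29)

# ratio() returns a float between 0 and 1, computed as 2 * M / T, where M is the number of
# matched characters and T is the combined length of both strings (1.0 means identical).
# Here M = 9 ("Hello, ", "o" and "!") and T = 13 + 16 = 29, so the strings are roughly 62% similar.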

# We can also use the get_opcodes() method to get a list of 5-tuples (tag, i1, i2, j1, j2)
# describing how to turn string1 into string2. The tag is 'equal', 'replace', 'delete' or
# 'insert', and it applies to the slices string1[i1:i2] and string2[j1:j2].
differences_list = differences.get_opcodes()

# Print the list of opcodes
print(differences_list)

# Output: [('equal', 0, 7, 0, 7), ('replace', 7, 8, 7, 12), ('equal', 8, 9, 12, 13), ('replace', 9, 12, 13, 15), ('equal', 12, 13, 15, 16)]

# The first tuple says the first 7 characters ("Hello, ", including the space) are equal.
# The second says "w" (index 7 in string1) is replaced by "every" (indices 7-11 in string2).
# After that, "o" matches "o", "rld" is replaced by "ne", and the final "!" matches "!".

# We can loop through the opcodes and print each one in a more readable format.
# repr() (the !r in the f-string) keeps spaces visible.
for tag, i1, i2, j1, j2 in differences_list:
    # Print the tag, the affected slice of string1 and the matching slice of string2
    print(f"{tag}: {string1[i1:i2]!r} -> {string2[j1:j2]!r}")

# Output:
# equal: 'Hello, ' -> 'Hello, '
# replace: 'w' -> 'every'
# equal: 'o' -> 'o'
# replace: 'rld' -> 'ne'
# equal: '!' -> '!'
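To see where the ratio comes from and to put the opcodes to work, here’s a small self-contained sketch. It recomputes `ratio()` by hand from `get_matching_blocks()`, then walks the opcodes to build an inline view of the change. The `inline_diff` helper and its `[-...-]{+...+}` markup are made up for this example; they are not part of difflib.

```python
import difflib

string1 = "Hello, world!"
string2 = "Hello, everyone!"
matcher = difflib.SequenceMatcher(None, string1, string2)

# get_matching_blocks() returns Match(a, b, size) triples, ending with a size-0 sentinel
matched = sum(block.size for block in matcher.get_matching_blocks())
total = len(string1) + len(string2)
manual_ratio = 2.0 * matched / total
print(matched, total)                    # 9 29
print(manual_ratio == matcher.ratio())   # True


def inline_diff(a, b):
    """Hypothetical helper: mark removed text as [-...-] and added text as {+...+}."""
    sm = difflib.SequenceMatcher(None, a, b)
    pieces = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            pieces.append(a[i1:i2])
            continue
        # 'delete' and 'replace' remove a[i1:i2]; 'insert' and 'replace' add b[j1:j2]
        if i2 > i1:
            pieces.append("[-" + a[i1:i2] + "-]")
        if j2 > j1:
            pieces.append("{+" + b[j1:j2] + "+}")
    return "".join(pieces)


print(inline_diff(string1, string2))     # Hello, [-w-]{+every+}o[-rld-]{+ne+}!
```

The helper only reads the tags and slice indices, so the same loop works for any pair of strings.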

Now that we have Difflib in our toolbox, let’s say we want to compare two files, `file1.txt` and `file2.txt`. First, we read the contents of both files into memory using Python’s built-in `open()` function:

# Import the difflib library; we'll use it to compare the files once they're loaded
import difflib

# Open the first file in read mode and store its full text in text1
with open('file1.txt', 'r') as f1:
    text1 = f1.read()

# Open the second file in read mode and store its full text in text2
with open('file2.txt', 'r') as f2:
    text2 = f2.read()

# The with statement closes each file automatically once its block ends.
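Don’t have `file1.txt` and `file2.txt` lying around? Here’s a minimal sketch that creates two throwaway files with the standard `tempfile` module and reads them back. The file contents are made up for illustration. It uses `readlines()`, which returns a list of lines that keep their trailing newlines; that’s the shape difflib’s line-based tools such as `Differ.compare()` expect.

```python
import os
import tempfile

# Made-up sample contents for the two throwaway files
sample1 = "apple\nbanana\ncherry\n"
sample2 = "apple\ncherry\ndate\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path1 = os.path.join(tmpdir, "file1.txt")
    path2 = os.path.join(tmpdir, "file2.txt")

    # Write the sample files
    with open(path1, "w") as f:
        f.write(sample1)
    with open(path2, "w") as f:
        f.write(sample2)

    # Read them back as lists of lines; each line keeps its "\n"
    with open(path1, "r") as f:
        lines1 = f.readlines()
    with open(path2, "r") as f:
        lines2 = f.readlines()

# The temporary directory and both files are deleted when the with block ends
print(lines1)  # ['apple\n', 'banana\n', 'cherry\n']
print(lines2)  # ['apple\n', 'cherry\n', 'date\n']
```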

Once we have the contents of both files, we can compare them with the `compare()` method of Difflib’s `Differ` class. It works on lists of lines, so we split each text first. `compare()` returns a generator, so we wrap it in `list()` to keep the results around:

# Import the difflib library to use its Differ class
import difflib

# Split each file's text into a list of lines; keepends=True keeps the newline characters
lines1 = text1.splitlines(keepends=True)
lines2 = text2.splitlines(keepends=True)

# Differ().compare() yields one output line per input line, each prefixed with a
# two-character code; list() collects them so we can loop over them more than once
difference_list = list(difflib.Differ().compare(lines1, lines2))

# Print the comparison; the lines keep their own newline characters
print(''.join(difference_list), end='')

# In the output, '- ' marks a line only in file1.txt, '+ ' a line only in file2.txt,
# '  ' a line present in both, and '? ' a hint line pointing at changes within a line.
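Because `Differ.compare()` only needs two lists of lines, you can try it without any files at all. Here’s a minimal sketch with made-up lines, including a pair of similar lines so you can see a `'? '` hint line:

```python
import difflib

# Made-up "file contents", already split into lines
lines1 = ["apple\n", "banana\n", "cherry\n"]
lines2 = ["apple\n", "cherry\n", "date\n"]

result = list(difflib.Differ().compare(lines1, lines2))
print("".join(result), end="")
# Prints:
#   apple
# - banana
#   cherry
# + date

# When a changed line is similar enough to the original, Differ also emits
# a '? ' hint line whose '+' markers sit under the inserted characters
similar = list(difflib.Differ().compare(["hello world\n"], ["hello there world\n"]))
print("".join(similar), end="")
```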

Difflib also provides `ndiff()`, a convenience function that wraps `Differ().compare()` and returns a generator of lines describing the differences between two sequences. Its output is not a “unified diff”: every line starts with a two-character code, `'- '` (only in the first sequence), `'+ '` (only in the second), `'  '` (in both) or `'? '` (a hint line that belongs to neither input). Here is one way to label each line of the output:

# Import the ndiff function from the difflib library
from difflib import ndiff

# Define two strings to compare
string1 = "Hello, world!"
string2 = "Hello, everyone!"

# ndiff() normally takes two lists of lines. Passing plain strings makes it
# compare them character by character, so each "line" below is one character.
diff = ndiff(string1, string2)

# Split each output line into its two-character code and the item itself
for line in diff:
    code, item = line[:2], line[2:]
    if code == "+ ":
        # Present only in string2 (added)
        print("Added:    ", repr(item))
    elif code == "- ":
        # Present only in string1 (removed)
        print("Removed:  ", repr(item))
    elif code == "? ":
        # Hint line marking changes within a line; not part of either input
        print("Hint:     ", repr(item))
    else:
        # Code "  ": present in both sequences (unchanged)
        print("Unchanged:", repr(item))
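`ndiff()` is really meant for lists of lines, and its output keeps every line from both inputs. That means difflib’s `restore()` function can rebuild either original from the delta: pass `1` for the first sequence or `2` for the second. Here’s a small sketch with made-up lines:

```python
import difflib

# Made-up input lines
before = ["one\n", "two\n", "three\n"]
after = ["one\n", "2\n", "three\n", "four\n"]

# Compute the delta once and keep it as a list so it can be reused
delta = list(difflib.ndiff(before, after))
print("".join(delta), end="")

# restore() yields the lines of the requested side, with the codes stripped
# and any '? ' hint lines skipped
print(list(difflib.restore(delta, 1)) == before)  # True
print(list(difflib.restore(delta, 2)) == after)   # True
```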

To print only the differences between the two files, we can loop through each item in `difference_list` and use Python’s built-in `print()` function:

# Loop through each line of the comparison result
for diff_item in difference_list:
    # Lines that start with two spaces are unchanged, so skip them
    if diff_item.startswith('  '):
        continue
    # What remains are '+ ' lines (only in file2), '- ' lines (only in file1)
    # and '? ' hint lines whose '^', '+' and '-' markers point at changed characters.
    # Each line keeps its own newline, so tell print() not to add another one.
    print(diff_item, end='')

# Checking the prefix with startswith() matters: testing whether '+' or '-' appears anywhere
# in the line would also match unchanged lines that merely contain those characters.
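If what you actually want is the “unified diff” format that tools like `git diff` produce and `patch` consumes, difflib provides it through a separate function, `unified_diff()`. Here’s a minimal sketch that uses made-up lines in place of real file contents; the `fromfile` and `tofile` arguments only set the header labels:

```python
import difflib
import sys

# Made-up lines standing in for the contents of file1.txt and file2.txt
lines1 = ["apple\n", "banana\n", "cherry\n"]
lines2 = ["apple\n", "cherry\n", "date\n"]

patch = list(difflib.unified_diff(lines1, lines2, fromfile="file1.txt", tofile="file2.txt"))
sys.stdout.writelines(patch)
# Prints:
# --- file1.txt
# +++ file2.txt
# @@ -1,3 +1,3 @@
#  apple
# -banana
#  cherry
# +date
```

Unlike `Differ`, the unified format uses a one-character prefix and shows only a few lines of context (three by default, set with the `n` parameter) around each change.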

And that’s it! You now know how to use Difflib in Python to compare two files and find the differences between them. It may not be as exciting as a magic wand or wizardry spells, but it sure is useful for finding out what changed in your code without all the hassle.

SICORPS