Memory Mapping with the mmap Module

Python’s mmap module gives you memory mapping (or mmap for short), and it’s like having superpowers when dealing with large files.

But first, let’s take a quick detour to talk about computer memory. You see, three memory concepts matter here: physical, virtual, and shared. Physical memory is the RAM in your computer that holds data while programs are using it. Virtual memory is the address space the operating system presents to each process; it is backed by RAM and, when needed, by disk, so a program can address more data than fits into RAM at once. And then we have shared memory, which is a region of memory that multiple processes can map and modify at the same time.
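
To make the idea of shared memory a little more concrete, here is a minimal sketch in which two separate mappings of the same file see each other's changes. The function name and the temporary file are made up for this illustration, and both mappings live in one process to keep the example short; in practice they would usually belong to different processes.

```python
import mmap
import os
import tempfile


def demo_shared_view():
    """Write through one mapping and read the change through another."""
    # Hypothetical scratch file, created only for this sketch.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * 16)

        with open(path, "r+b") as f1, open(path, "r+b") as f2:
            # Two independent mappings of the same file. The default
            # mapping is shared, so both refer to the same file pages.
            with mmap.mmap(f1.fileno(), 0) as writer, \
                 mmap.mmap(f2.fileno(), 0) as reader:
                writer[0:5] = b"hello"
                return reader[0:5]
    finally:
        os.remove(path)


print(demo_shared_view())
```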

Now, here’s how mmap works. Essentially, it maps a file on disk into a region of your process’s virtual address space. This allows you to access the contents of the file as if they were stored directly in memory. The operating system loads pages of the file only when you touch them and keeps them in its cache, so you avoid the extra copies and system calls of ordinary reads and writes. This can result in significant performance improvements for certain types of operations, especially random access to large files.
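
As a rough illustration of "as if they were stored directly in memory", the sketch below maps a small temporary file and then uses len(), slicing, and find() on it, much as you would with a bytes object. The function name and the sample contents are invented for the example.

```python
import mmap
import os
import tempfile


def demo_mapping_behaves_like_bytes():
    """Map a small temporary file and inspect it with ordinary indexing."""
    # Hypothetical sample file, created only for this sketch.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"first line\nsecond line\n")

        with open(path, "rb") as f:
            # Length 0 maps the whole file; ACCESS_READ makes it read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                size = len(mm)               # size of the mapping in bytes
                head = mm[:5]                # slicing returns a bytes copy
                newline_at = mm.find(b"\n")  # search without calling read()
        return size, head, newline_at
    finally:
        os.remove(path)


print(demo_mapping_behaves_like_bytes())  # (23, b'first', 10)
```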

So how do we use mmap? Let’s take a look at some examples! First, let’s say you have a large text file that you want to read and modify. Instead of reading the entire file into memory (which could be slow and inefficient), you can use mmap to create a mapping between the file and a region of memory:

# Import the mmap module
import mmap

# Define the filename of the large text file
filename = 'my_large_file.txt'

# Open the file in binary read/write mode; the default mapping is writable,
# so the file must be opened for writing as well
with open(filename, 'r+b') as f:
    # A length of 0 maps the entire file, so there is no need to compute its size
    # (note that mapping an empty file raises ValueError)
    with mmap.mmap(f.fileno(), 0) as mmap_obj:
        # Do some operations on the memory-mapped data...
        # Pages are loaded on demand, so the whole file is never read into memory at once
        print(len(mmap_obj))  # size of the mapping in bytes

Now you can access the contents of the file as if they were stored directly in memory:

# This script reads a portion of a memory-mapped file and prints its contents to the console.

# Import the necessary module for memory-mapping files
import mmap

# Open the file in binary read-only mode and map it to memory
with open('file.txt', 'rb') as file:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmap_obj:
        # Read 1MB of data starting at byte offset 1MB
        # (a slice that runs past the end of the file is simply cut short)
        data = mmap_obj[1024*1024:2*1024*1024]

# Print the contents of the data (a bytes object) to the console
print(data)
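
Because a mapping is a bytes-like object, it can also be handed to the re module, which lets you search a large file without reading it into a string first. The following is only a sketch: the function names, the log format, and the "ERROR <number>" pattern are assumptions made up for the example.

```python
import mmap
import os
import re
import tempfile


def find_error_codes(path):
    """Return every number that follows 'ERROR ' in the file at `path`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The pattern must be bytes because the mapping holds raw bytes.
            codes = re.findall(rb"ERROR (\d+)", mm)
    return [int(code) for code in codes]


def demo_find_error_codes():
    # Hypothetical log file, created only for this sketch.
    fd, path = tempfile.mkstemp(suffix=".log")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"INFO start\nERROR 404\nINFO retry\nERROR 500\n")
        return find_error_codes(path)
    finally:
        os.remove(path)


print(demo_find_error_codes())  # [404, 500]
```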

Slicing copies only the bytes you ask for, and the operating system loads only the pages you actually touch. But that’s not all! You can also use mmap to modify portions of a file without having to rewrite the entire thing:

# This script demonstrates how to use mmap to modify portions of a file without having to rewrite the entire thing.

# First, we import the mmap module which allows us to work with memory-mapped files.
import mmap

# Next, we create a byte string containing "hello, world!" followed by a newline character.
data = b"hello, world!\n"

# The write starts at byte offset 1MB, so the file must be at least 1MB + len(data) bytes long.
offset = 1024*1024

# The "r+b" mode allows us to read and write to the file.
# The "with" statements ensure that the mapping and the file are closed properly when we are done.
with open("example.txt", "r+b") as file:
    # mmap.mmap() takes the file descriptor and a length; a length of 0 maps the whole file.
    with mmap.mmap(file.fileno(), 0) as mmap_obj:
        # Slice assignment cannot resize the mapping, so the slice must be exactly len(data) bytes long.
        mmap_obj[offset:offset + len(data)] = data
        # Flush the change so it is written back to the file on disk.
        mmap_obj.flush()

# By using mmap, we can modify specific portions of a file without having to rewrite the entire thing.
# This can result in significant performance improvements for certain types of operations.
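
One way to package the in-place write as a reusable helper is sketched below. The helper name overwrite_in_place, the bounds check, and the sample "status=" record are assumptions added for illustration. The demo reads the file back afterwards to confirm that only the targeted bytes changed and the file kept its length.

```python
import mmap
import os
import tempfile


def overwrite_in_place(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at `offset` without resizing the file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            end = offset + len(new_bytes)
            if end > len(mm):
                raise ValueError("write would run past the end of the mapping")
            mm[offset:end] = new_bytes  # slice and data have the same length
            mm.flush()                  # push the change out to the file


def demo_overwrite():
    # Hypothetical fixed-width record, created only for this sketch.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"status=PENDING\n")
        # "DONE" is padded with spaces so it replaces exactly 7 bytes.
        overwrite_in_place(path, 7, b"DONE   ")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(demo_overwrite())  # b'status=DONE   \n'
```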

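And that’s it! With mmap, you can read and modify large files, often faster than with traditional file I/O operations, especially when you jump around inside the file. But be careful: a mapping reserves virtual address space for everything it covers, and every page you touch occupies RAM, so mapping huge files carelessly can consume a lot of resources. Changes made through a writable mapping also go straight to the file. So map only what you need, and reach for mmap only when it actually helps!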
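
One way to keep resource use in check is to map only a window of the file instead of the whole thing. mmap.mmap() accepts an offset argument for this, and the offset must be a multiple of mmap.ALLOCATIONGRANULARITY. The helper below is a sketch under that constraint; the names read_window and demo_read_window are made up, and the caller is assumed to request a range that lies inside the file.

```python
import mmap
import os
import tempfile


def read_window(path, start, length):
    """Return `length` bytes beginning at byte `start`, mapping only a window."""
    granularity = mmap.ALLOCATIONGRANULARITY
    # Round `start` down to an allowed offset and remember how far
    # into the mapped window the requested bytes begin.
    aligned = (start // granularity) * granularity
    delta = start - aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=aligned) as mm:
            return mm[delta:delta + length]


def demo_read_window():
    granularity = mmap.ALLOCATIONGRANULARITY
    # Hypothetical test file spanning three allocation units.
    payload = bytes(i % 251 for i in range(3 * granularity))
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        start = granularity + 10
        return read_window(path, start, 5), payload[start:start + 5]
    finally:
        os.remove(path)


window, expected = demo_read_window()
print(window == expected)  # True
```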
