This can be a real time saver if you’re dealing with large data sets or streaming data from external sources.
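For large inputs, a member can be copied into the archive in chunks instead of being loaded into memory first. The following is a minimal sketch, assuming Python 3.6 or later, where `ZipFile.open()` accepts `mode="w"`. The `add_stream` helper, the member name `big.bin`, and the chunk size are hypothetical choices made for illustration.

```python
import io
import shutil
from zipfile import ZipFile, ZIP_DEFLATED


def add_stream(archive, member_name, source, chunk_size=64 * 1024):
    """Copy a readable binary stream into the archive chunk by chunk.

    Hypothetical helper: it is not part of the zipfile module.
    """
    # ZipFile.open(..., mode="w") returns a writable file-like object
    # for a single archive member (available since Python 3.6).
    with archive.open(member_name, mode="w") as dest:
        shutil.copyfileobj(source, dest, chunk_size)


# Stand-in for a large or streamed input; any readable binary stream works.
payload = b"0123456789" * 100_000

buffer = io.BytesIO()
with ZipFile(buffer, "w", compression=ZIP_DEFLATED) as archive:
    add_stream(archive, "big.bin", io.BytesIO(payload))

# Read the member back to check the round trip.
with ZipFile(buffer) as archive:
    restored = archive.read("big.bin")
```

For members that may exceed 2 GiB, `ZipFile.open()` also takes `force_zip64=True`, since the final size is not known when the member header is written.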
Before Python 3.5, writing an archive to an unseekable stream was not supported. Since version 3.5, `ZipFile` can write to unseekable outputs such as pipes or sockets. Note that `zipfile` has no `seekable_readers` attribute, and `writestr()` accepts no such argument: `ZipFile` checks for itself whether the output supports `tell()` and `seek()`. If it does not, the CRC and sizes of each member are written in a data descriptor after the member’s data instead of in its header.
Here’s an example:
# Import necessary libraries
import os        # change into the temporary directory
import tempfile  # create a temporary directory
from io import BytesIO  # in-memory binary buffer
from zipfile import ZipFile, ZIP_DEFLATED  # zip archive support

# Create a temporary directory for our test data
tempdir = tempfile.mkdtemp()
os.chdir(tempdir)

# Generate some sample data and save it to a file
data1 = b"This is some sample data 1."
with open("sample_data_1", "wb") as f:
    f.write(data1)

# Create a zip file and add our sample data to it
with ZipFile("output.zip", "w", compression=ZIP_DEFLATED) as archive:
    # file1: contents read back from the file on disk
    with open("sample_data_1", "rb") as f:
        archive.writestr("file1", f.read())
    # file2: contents taken from an in-memory BytesIO buffer
    archive.writestr("file2", BytesIO(b"This is some sample data 2.").getvalue())
    # file3: a bytes object passed directly
    archive.writestr("file3", data1)
In this example, we first create some sample data and write it to a file in a temporary directory using the `open()` function. We then open our output zip file in write mode and add three members to it: “file1”, whose contents are read from the file on disk; “file2”, whose contents come from a BytesIO buffer; and “file3”, a bytes object passed directly. In none of these cases do we tell `zipfile` whether anything is seekable. `writestr()` only ever receives the data as bytes or str, so the seekability of the source does not matter; what matters is the output file, and `ZipFile` detects that on its own.
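To see the unseekable case in action, the sketch below writes an archive to an output that cannot seek or report its position. The `NonSeekableSink` class is a hypothetical stand-in for a pipe or socket, and it keeps the written bytes in memory only so the result can be inspected afterwards. It assumes Python 3.5 or later.

```python
import io
from zipfile import ZipFile


class NonSeekableSink(io.RawIOBase):
    """Hypothetical write-only output that supports neither seek() nor tell()."""

    def __init__(self):
        super().__init__()
        self._chunks = bytearray()

    def writable(self):
        return True

    def write(self, b):
        data = bytes(b)
        self._chunks += data
        return len(data)  # number of bytes accepted

    def getvalue(self):
        return bytes(self._chunks)


sink = NonSeekableSink()
with ZipFile(sink, "w") as archive:
    # No seekability flag is passed; ZipFile probes the output itself.
    archive.writestr("hello.txt", b"written to an unseekable stream")

# Re-open the collected bytes from a seekable buffer to inspect the result.
with ZipFile(io.BytesIO(sink.getvalue())) as archive:
    info = archive.getinfo("hello.txt")
    content = archive.read("hello.txt")
    # Bit 3 (0x08) of the general-purpose flags marks a data descriptor.
    uses_data_descriptor = bool(info.flag_bits & 0x08)
```

Here `uses_data_descriptor` should come out as `True`, because the member's CRC and sizes could not be patched into its header after the fact. Writing the same member to a seekable output, such as a regular file, leaves that flag bit clear.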