Introduction
The zipfile module in Python compresses and decompresses files in the ZIP format. Because ZIP is such a common format, this module gets used quite frequently. Here, I’ll document some usage patterns for zipfile, which will be convenient for both myself and others.
To work with an archive, you first instantiate a ZipFile object. ZipFile takes the archive name (a string) as its required parameter. The second parameter is optional and indicates the open mode, similar to file operations: r, w, or a, representing read, write, and append, respectively. The default is r, read mode.
There are two very important classes in zipfile: ZipFile and ZipInfo. In most cases, these two are all we need. ZipFile is the main class used to create and read ZIP files, while ZipInfo stores information about each file within the ZIP archive.
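As a minimal sketch of the three modes, the snippet below creates an archive, appends to it, and reads it back. The archive name demo.zip and the member names are hypothetical, and everything happens in a temporary directory so nothing is left behind. It uses ZipFile as a context manager, which closes the archive automatically.

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "demo.zip")  # hypothetical archive name

    # 'w' creates the archive (or overwrites an existing one).
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr("first.txt", "hello")

    # 'a' appends to the existing archive.
    with zipfile.ZipFile(archive_path, "a") as zf:
        zf.writestr("second.txt", "world")

    # 'r' is the default mode and opens the archive for reading.
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()

print(names)
```

Using the with statement means you do not have to remember to call close(), which matters in write modes because the archive's directory is only written when it is closed.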
I. Basic Operations of These Two Classes
For example, to list the contents of a ZIP file, let’s assume filename is the path to the archive:
import zipfile
z = zipfile.ZipFile(filename, 'r')
# The second parameter 'r' means reading a zip file, 'w' means creating a zip file
for f in z.namelist():
    print(f)
The code above reads the names of all files in a ZIP archive. z.namelist() returns a list of all filenames within the archive.
Let’s look at another example:
import zipfile
z = zipfile.ZipFile(filename, 'r')
for i in z.infolist():
    print(i.file_size, i.header_offset)
Here, z.infolist() is used, which returns information about all files within the compressed archive as a list of ZipInfo objects. A ZipInfo object describes one file inside the archive. Commonly used attributes are filename, file_size, and header_offset, representing the filename, the uncompressed file size, and the offset of the file’s header within the archive, respectively. In fact, the z.namelist() call used earlier simply reads the filename from each ZipInfo object and returns them as a list.
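The relationship between namelist() and infolist() can be checked with a small sketch. It builds a throwaway archive in memory (the member names a.txt and docs/b.txt are made up for illustration) and compares the two views.

```python
import io
import zipfile

# Build a small archive in memory; the member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("a.txt", "alpha")
    zf.writestr("docs/b.txt", "bravo!")

with zipfile.ZipFile(buffer) as zf:
    # Pull the filename out of every ZipInfo object...
    names_from_info = [info.filename for info in zf.infolist()]
    # ...and compare with what namelist() reports.
    same = names_from_info == zf.namelist()
    # file_size is the uncompressed size in bytes.
    sizes = {info.filename: info.file_size for info in zf.infolist()}

print(same, sizes)
```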
To read the contents of a file inside a compressed archive, use the read method of ZipFile:
import zipfile
z = zipfile.ZipFile(filename, 'r')
print(z.read(z.namelist()[0]))
This reads the first file in z.namelist() and prints it to the screen. Of course, you can also save it to a file. Below is the method for creating a ZIP archive, which is quite similar to the reading method:
import zipfile, os
z = zipfile.ZipFile(filename, 'w')
# Note that the second parameter here is 'w', and filename is the name of the compressed archive
Suppose you want to add all files from a directory named testdir to the archive (only the entries directly inside testdir are added here, without descending into subdirectories):
if os.path.isdir(testdir):
    for d in os.listdir(testdir):
        # Join the directory and entry name in a platform-independent way
        z.write(os.path.join(testdir, d))
z.close()
The code above is very simple. Now, consider a problem: if I add test/111.txt to the compressed archive, but I want it to be stored as test22/111.txt inside the archive, what should I do? This is where the second parameter of the ZipFile write method comes into play. You just need to call it like this:
z.write("test/111.txt", "test22/111.txt")
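To see the renaming in action, here is a self-contained sketch. It creates a temporary test/111.txt on disk, stores it under test22/111.txt, and lists the archive afterwards. The temporary directory and the archive name out.zip are assumptions made only for this illustration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Create the source file test/111.txt inside a temporary directory.
    src_dir = os.path.join(tmp, "test")
    os.mkdir(src_dir)
    src_file = os.path.join(src_dir, "111.txt")
    with open(src_file, "w") as fh:
        fh.write("sample")

    archive_path = os.path.join(tmp, "out.zip")  # hypothetical archive name
    with zipfile.ZipFile(archive_path, "w") as zf:
        # The second argument (arcname) is the name stored inside the archive.
        zf.write(src_file, "test22/111.txt")

    with zipfile.ZipFile(archive_path) as zf:
        stored_names = zf.namelist()

print(stored_names)
```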
II. Basic Operations of ZipFile and ZipInfo Classes
1. class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Creates a ZipFile object, representing a ZIP file. The file parameter is the path of the file or a file-like object; the mode parameter specifies the mode for opening the ZIP file. The default value is 'r', which means reading an existing ZIP file. It can also be 'w' or 'a'. 'w' means creating a new ZIP document or overwriting an existing one. 'a' means appending data to an existing ZIP document. The compression parameter indicates the compression method used when writing the ZIP document, and its value can be zipfile.ZIP_STORED or zipfile.ZIP_DEFLATED. If the ZIP file to be operated on exceeds 2GB, allowZip64 should be set to True (since Python 3.4 this is already the default).
import zipfile
f = zipfile.ZipFile(filename, 'r')  # The second parameter 'r' means reading a zip file; 'w' creates one and 'a' appends to one
for f_name in f.namelist():  # f.namelist() returns a list of all filenames within the archive.
    print(f_name)
# The code above reads the names of all files in a zip archive.
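To make the effect of the compression parameter concrete, the following sketch writes the same data twice, once with ZIP_STORED and once with ZIP_DEFLATED, and compares the resulting archive sizes. The helper archive_size and the payload are hypothetical, chosen only because repetitive data compresses well.

```python
import io
import zipfile

# Highly repetitive data, so DEFLATE has something to work with.
payload = b"zipfile " * 1000


def archive_size(compression):
    """Return the size in bytes of an in-memory archive holding payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("payload.bin", payload)
    return len(buf.getvalue())


stored_size = archive_size(zipfile.ZIP_STORED)      # no compression
deflated_size = archive_size(zipfile.ZIP_DEFLATED)  # zlib/DEFLATE compression

print(stored_size, deflated_size)
```

With ZIP_STORED the archive is slightly larger than the payload itself, because the ZIP headers are added on top of the uncompressed data.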
ZipFile also provides the following commonly used methods and properties:
ZipFile.getinfo(name)
Retrieves information about a specified file within the ZIP document. Returns a zipfile.ZipInfo object, which includes detailed file information.
ZipFile.infolist()
Retrieves information about all files within the ZIP document, returning a list of zipfile.ZipInfo objects.
ZipFile.namelist()
Retrieves a list of all file names within the ZIP document.
ZipFile.extract(member[, path[, pwd]])
Extracts the specified file from the ZIP document, by default to the current directory. The member parameter specifies the name of the file to be extracted or its corresponding ZipInfo object; the path parameter specifies the folder where the extracted file will be saved; pwd is the decompression password. The following example extracts all files from duoduo.zip, located in the current working directory, to the D:/Work directory:
import zipfile, os
f = zipfile.ZipFile(os.path.join(os.getcwd(), 'duoduo.zip'))  # Build the archive path from the current working directory
for name in f.namelist():
    f.extract(name, r'd:/Work')  # Extract files to d:/Work
f.close()
ZipFile.extractall([path[, members[, pwd]]])
Extracts all files from the ZIP document, by default to the current directory; path selects a different target folder. The default value for the members parameter is a list of all file names within the ZIP document, but you can also set it yourself to select specific files to extract.
ZipFile.printdir()
Prints a table of contents for the ZIP document to the console.
ZipFile.setpassword(pwd)
Sets the default password used to extract encrypted files from the ZIP document.
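A short sketch of extractall() with an explicit members list follows. The archive is built in memory and the member names are invented for the example; only the listed members end up in the temporary target directory.

```python
import io
import os
import tempfile
import zipfile

# Build a small in-memory archive; the member names are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("keep.txt", "wanted")
    zf.writestr("skip.txt", "not wanted")
    zf.writestr("sub/also_keep.txt", "wanted too")

with tempfile.TemporaryDirectory() as target_dir:
    with zipfile.ZipFile(buf) as zf:
        # Extract only the selected members into target_dir.
        zf.extractall(path=target_dir, members=["keep.txt", "sub/also_keep.txt"])

    # Collect what actually landed on disk, as archive-style relative paths.
    extracted = sorted(
        os.path.relpath(os.path.join(root, name), target_dir).replace(os.sep, "/")
        for root, _dirs, files in os.walk(target_dir)
        for name in files
    )

print(extracted)
```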
ZipFile.read(name[, pwd])
Retrieves the binary data of the specified file within the ZIP document. The following example demonstrates the use of read(). The ZIP document contains a text file duoduo.txt. read() is used to read its binary data, which is then saved to D:/duoduo.txt.
import zipfile, os
zipFile = zipfile.ZipFile(os.path.join(os.getcwd(), 'duoduo.zip'))
data = zipFile.read('duoduo.txt')
# (lambda f, d: (f.write(d), f.close()))(open(r'd:/duoduo.txt', 'wb'), data) # One-line statement to write the file. Think about it! ~_~
with open(r'd:/duoduo.txt', 'wb') as f:
    # data is a single bytes object, so write it in one call
    f.write(data)
zipFile.close()
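Since read() returns bytes, text has to be decoded before it can be used as a string. The sketch below shows that, and also reads the same member through ZipFile.open(), which returns a file-like object and avoids loading a large member all at once. The in-memory archive and the UTF-8 encoding are assumptions for this example.

```python
import io
import zipfile

# Build an in-memory archive containing one small text file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("duoduo.txt", "line one\nline two\n")

with zipfile.ZipFile(buf) as zf:
    raw = zf.read("duoduo.txt")   # bytes, not str
    text = raw.decode("utf-8")    # decode explicitly to get a str

    # open() gives a binary file-like object that can be read piece by piece.
    with zf.open("duoduo.txt") as member:
        first_line = member.readline()

print(type(raw).__name__, repr(first_line))
```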
ZipFile.write(filename[, arcname[, compress_type]])
Adds a specified file to the ZIP document. filename is the file path, arcname is the name to be saved within the ZIP document, and the compress_type parameter indicates the compression method, whose value can be zipfile.ZIP_STORED or zipfile.ZIP_DEFLATED. The following example demonstrates how to create a ZIP document and add the file D:/test.doc to the compressed document.
import zipfile, os
zipFile = zipfile.ZipFile(r'D:/test.zip', 'w')  # Open the archive for writing
zipFile.write(r'D:/test.doc', 'name_to_save', zipfile.ZIP_DEFLATED)
zipFile.close()
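ZipFile.writestr(zinfo_or_arcname, bytes)
writestr() writes data directly into the compressed document, without needing a file on disk. The first argument is either the name to use inside the archive or a ZipInfo object.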
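As an illustration of both calling styles, the sketch below passes writestr() first a plain archive name and then a ZipInfo object carrying an explicit timestamp. The member names and the date are invented for the example.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # Style 1: give the name to use inside the archive.
    zf.writestr("notes/hello.txt", b"hello from memory")

    # Style 2: give a ZipInfo object to control the entry's metadata.
    # ZIP timestamps have two-second resolution, so an even second is used.
    info = zipfile.ZipInfo("dated.txt", date_time=(2020, 1, 2, 3, 4, 6))
    zf.writestr(info, "entry with explicit metadata")

with zipfile.ZipFile(buf) as zf:
    hello = zf.read("notes/hello.txt")
    dated = zf.getinfo("dated.txt").date_time

print(hello, dated)
```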
2. Class ZipInfo
The ZipFile.getinfo(name) method returns a ZipInfo object, representing information about the corresponding file in the ZIP document. It supports the following attributes:
- ZipInfo.filename: Get file name.
- ZipInfo.date_time: Get last modification time of the file. Returns a tuple containing 6 elements: (year, month, day, hour, minute, second).
- ZipInfo.compress_type: Compression type.
- ZipInfo.comment: Comment for the individual archive member.
- ZipInfo.extra: Extra field data.
- ZipInfo.create_system: Get the system that created this ZIP document.
- ZipInfo.create_version: Get the PKZIP version that created the ZIP document.
- ZipInfo.extract_version: Get the PKZIP version required to extract the ZIP document.
- ZipInfo.reserved: Reserved field, current implementation always returns 0.
- ZipInfo.flag_bits: ZIP flag bits.
- ZipInfo.volume: Volume number of the file header.
- ZipInfo.internal_attr: Internal attributes.
- ZipInfo.external_attr: External attributes.
- ZipInfo.header_offset: File header offset.
- ZipInfo.CRC: CRC-32 of the uncompressed file.
- ZipInfo.compress_size: Get compressed size.
- ZipInfo.file_size: Get uncompressed file size.
The following simple example illustrates the meaning of these attributes:
import zipfile, os
zipFile = zipfile.ZipFile(os.path.join(os.getcwd(), 'duoduo.zip'))
zipInfo = zipFile.getinfo('file_in_archive.txt')
print ('filename:', zipInfo.filename) # Get file name
print ('date_time:', zipInfo.date_time) # Get last modification time of the file. Returns a tuple containing 6 elements: (year, month, day, hour, minute, second)
print ('compress_type:', zipInfo.compress_type) # Compression type
print ('comment:', zipInfo.comment) # Document comment
print ('extra:', zipInfo.extra) # Extra field data
print ('create_system:', zipInfo.create_system) # Get the system that created this ZIP document.
print ('create_version:', zipInfo.create_version) # Get the PKZIP version that created the ZIP document.
print ('extract_version:', zipInfo.extract_version) # Get the PKZIP version required to extract the ZIP document.
print ('reserved:', zipInfo.reserved) # Reserved field, current implementation always returns 0.
print ('flag_bits:', zipInfo.flag_bits) # ZIP flag bits.
print ('volume:', zipInfo.volume) # Volume label of the file header.
print ('internal_attr:', zipInfo.internal_attr) # Internal attributes.
print ('external_attr:', zipInfo.external_attr) # External attributes.
print ('header_offset:', zipInfo.header_offset) # File header offset.
print ('CRC:', zipInfo.CRC) # CRC-32 of the uncompressed file.
print ('compress_size:', zipInfo.compress_size) # Get compressed size.
print ('file_size:', zipInfo.file_size) # Get uncompressed file size.
zipFile.close()
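A few of these attributes become more useful when combined. The following sketch, built on an in-memory archive with made-up contents, converts date_time into a datetime object and derives a compression ratio from compress_size and file_size.

```python
import datetime
import io
import zipfile

# Build an in-memory archive with one compressible member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file_in_archive.txt", "abc" * 500)

with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("file_in_archive.txt")

    # date_time is a 6-tuple, which datetime accepts directly.
    modified = datetime.datetime(*info.date_time)

    # Ratio of compressed to uncompressed size (smaller means better compression).
    ratio = info.compress_size / info.file_size

    is_deflated = info.compress_type == zipfile.ZIP_DEFLATED

print(modified, info.file_size, info.compress_size, is_deflated)
```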
III. Python Example: Using In-Memory Zipfile Objects to Package Files
import io
import zipfile

class InMemoryZip(object):
    def __init__(self):
        # ZIP data is binary, so Python 3 uses io.BytesIO (Python 2 used StringIO.StringIO)
        self.in_memory_zip = io.BytesIO()

    def append(self, filename_in_zip, file_contents):
        # Get a handle to the in-memory zip in append mode; leaving the
        # with block closes it so the archive directory is written out
        with zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False) as zf:
            # Write the file to the in-memory zip
            zf.writestr(filename_in_zip, file_contents)
            # Mark the files as having been created on Windows so that
            # Unix permissions are not inferred as 0000
            for zfile in zf.filelist:
                zfile.create_system = 0
        return self

    def read(self):
        self.in_memory_zip.seek(0)
        return self.in_memory_zip.read()

    def writetofile(self, filename):
        # Binary mode, because read() returns bytes (Python 2 used file(filename, "w"))
        with open(filename, "wb") as f:
            f.write(self.read())

if __name__ == "__main__":
    # Run a test
    imz = InMemoryZip()
    imz.append("test.txt", "Another test").append("test2.txt", "Still another")
    imz.writetofile("test.zip")
Python Reading Zip Files
The following code demonstrates how to read a ZIP file using Python, print all files within the compressed archive, and read the first file from the compressed archive.
import zipfile
z = zipfile.ZipFile("zipfile.zip", "r")
# Print list of files in the zip file
for filename in z.namelist():
    print('File:', filename)
# Read the first file in the zip file
first_file_name = z.namelist()[0]
content = z.read(first_file_name)
print(first_file_name)
print(content)
Python Writing/Creating Zip Files
Python primarily uses the write method of ZipFile to write ZIP files.
import zipfile
z = zipfile.ZipFile('test.zip', 'w', zipfile.ZIP_DEFLATED)
z.write('test.html')
z.close()
When creating a ZipFile instance, there are 2 points to note:
- Use 'w' or 'a' mode to open the ZIP file in a writable manner.
- Compression modes are ZIP_STORED and ZIP_DEFLATED. ZIP_STORED is merely a storage mode and does not compress files (this is the default value). If you need to compress files, you must use ZIP_DEFLATED mode.
IV. Python Method for Cracking Encrypted Zip Files
First, let’s create a file on the desktop. We create a text file named q.txt, then compress it, remembering to set a password during compression. I’ll set the password to 123456.
Using Python’s zipfile module, we’ll write a ZIP file password cracker. We need the extractall method of the ZipFile class, which takes an optional password parameter.
After importing the library, instantiate a new ZipFile object with the filename of the password-protected ZIP file. To decompress this ZIP file, we call the extractall method and provide the password in the optional pwd parameter.
Create a .py file in the project’s root directory, then place our compressed file in the same directory. Our .py file code:
import zipfile
zipFile = zipfile.ZipFile("q.zip","r") # This is our compressed file
zipFile.extractall(pwd="123456") # This is our password
This code simply tries to decompress our compressed file with the given password. Most online tutorials write it this way, but when I run it on Python 3.6, it fails with an error. The error roughly means that the pwd parameter expects a bytes value but received a str, so it’s a type mismatch. Let’s convert the password to bytes. Our .py file code will be as follows:
import zipfile
zipFile = zipfile.ZipFile("q.zip","r")
password = '123456'
zipFile.extractall(pwd=str.encode(password))
Now, let’s run the project again.
This time, there’s no error.
We can see that a new file, the one we compressed earlier, has appeared in our project’s root directory.
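The str versus bytes rule can be reproduced without an encrypted archive. In this sketch, setpassword() is used on a plain in-memory archive purely to show the type check; the archive itself is not password-protected, since zipfile cannot create encrypted archives, and the member name q.txt is reused only for illustration.

```python
import io
import zipfile

password = "123456"
pwd_bytes = password.encode("utf-8")  # str -> bytes, the type zipfile expects

# A plain (unencrypted) in-memory archive, used only to demonstrate the type check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("q.txt", "plain content")

with zipfile.ZipFile(buf) as zf:
    try:
        zf.setpassword(password)  # passing a str is rejected
        str_rejected = False
    except TypeError:
        str_rejected = True

    zf.setpassword(pwd_bytes)     # passing bytes is accepted

print(str_rejected, pwd_bytes)
```

password.encode("utf-8") and str.encode(password) produce the same bytes; the method form is simply the more common spelling.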
Next, let’s continue to refactor. What happens if we execute this script with an incorrect password? Let’s add some error handling code to the script to display the error message.
import zipfile
zipFile = zipfile.ZipFile("q.zip","r")
try:
password = '123s456' # Incorrect password
zipFile.extractall(pwd=str.encode(password))
except Exception as ex:
print(ex)
In this version of our .py file we deliberately use an incorrect password to see what happens when it runs. The printed error message tells us the password is incorrect.
We can use the exception thrown for an incorrect password to test whether our dictionary file (the zidian.txt that follows) contains the ZIP file’s password. After instantiating a ZipFile object, we open the dictionary file and try each word in it. If extractall() executes without error, we print a message with the correct password. If extractall() throws a password error, we ignore the exception and continue with the next password in the dictionary.
First, let’s create a zidian.txt file.
Next, we’ll write our password dictionary in zidian.txt, one password per line, making sure the correct password is among them.
Then, place our password dictionary into the project.
Next, we’ll continue to modify our script.
import zipfile
zipFile = zipfile.ZipFile("q.zip", "r")
# Open our dictionary file
passFile = open('zidian.txt')
for line in passFile.readlines():
    # Read each line of data (each password), dropping the trailing newline
    password = line.strip('\n')
    try:
        zipFile.extractall(pwd=str.encode(password))
        print('=========Password is:' + password + '\n')
        # If the password is correct, exit the program
        exit(0)
    except Exception as ex:
        # Wrong password: skip it and try the next one
        pass
Next, let’s look at the execution result.
Haha, we have successfully cracked the ZIP file password! From here, it’s easy to see that as long as our dictionary contains the password, we can crack it.
We continue to optimize our project:
import zipfile

def extractFile(zFile, password):
    try:
        zFile.extractall(pwd=str.encode(password))
        # If successful, return the password
        return password
    except Exception:
        # Wrong password: signal failure with None
        return None

def main():
    zFile = zipfile.ZipFile("q.zip", "r")
    # Open our dictionary file
    passFile = open('zidian.txt')
    for line in passFile.readlines():
        # Read each line of data (each password), dropping the trailing newline
        password = line.strip('\n')
        guess = extractFile(zFile, password)
        if guess:
            print("=========Password is:" + password + "\n")
            exit(0)

if __name__ == '__main__':
    main()
This is much better! Next, I’ll provide code to generate all six-digit numeric passwords:
f = open('zidian.txt', 'w')
for id in range(1000000):
    # zfill(6) pads with leading zeros, so 42 becomes 000042
    password = str(id).zfill(6) + '\n'
    f.write(password)
f.close()
After successful execution, we can see that our zidian.txt has been generated with numbers from 000000 to 999999. This means the dictionary now covers every 6-digit numeric password for ZIP files!