Recently I wanted to back up a website, but PHP imposes file size limits on downloads, and I was too lazy to set up FTP for it. So I thought of temporarily creating a subdomain site and then using Python 3 with the requests library to download all of the files and folders under the site's root directory, which takes care of the backup.
1. Install the requests library
pip install requests
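To confirm that the install worked (an optional quick check), you can import the library from Python 3 and print its version; any reasonably recent release is fine for this script:

import requests
print(requests.__version__)  # prints a version string if the install succeeded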
2. Download all files and folders under a directory
The main thing to handle here is folders. The script checks whether a link points to a folder (in the server's "Index of" directory listing, such links contain a trailing "/"); if so, it creates the matching local folder and recurses into it. Otherwise the link is a file, and it is downloaded directly with requests.get. Without further ado, here is the code:
import requests
import re
import os
import sys
def help(script):  # Print a usage example
    text = 'python3 %s\thttps://www.bobobk.com ./' % script
    print(text)

def get_file(url, path):  # File download function
    # stream=True avoids loading the whole file into memory before writing
    content = requests.get(url, stream=True)
    print("write %s in %s" % (url, path))
    with open(path + url.split("/")[-1], 'wb') as filew:
        for chunk in content.iter_content(chunk_size=512 * 1024):
            if chunk:  # filter out keep-alive new chunks
                filew.write(chunk)

def get_dir(url, path):  # Folder handling logic
    content = requests.get(url).text
    if "<title>Index of" in content:  # the page is a directory listing
        sub_url = re.findall('href="(.*?)"', content)
        print(sub_url)
        for i in sub_url:
            if "/" in i:  # links containing "/" are subdirectories
                i = i.split("/")[0]
                print(i)
                if i != "." and i != "..":
                    if not os.path.exists(path + i):
                        os.mkdir(path + i)
                    get_dir(url + "/" + i, path + i + "/")
                    print("url:" + url + "/" + i + "\nurl_path:" + path + i + "/")
            else:  # plain file link
                get_file(url + "/" + i, path)
    else:  # not a listing page, so treat the URL itself as a file
        get_file(url, path)

if __name__ == '__main__':
    if len(sys.argv) <= 1:
        help(sys.argv[0])
        sys.exit(0)
    else:
        get_dir(sys.argv[1], "./")
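Besides invoking the script from the command line as the help text shows, the two functions can also be imported from another Python script. Here is a minimal sketch under a couple of assumptions: the code above is saved as site_backup.py (a hypothetical filename), and the temporary subdomain (a hypothetical URL below) serves a plain "Index of" directory listing:

import os
from site_backup import get_dir  # assumes the code above was saved as site_backup.py

# Hypothetical temporary subdomain exposing the web root as a directory listing.
# The destination folder must already exist and end with "/" so the script can
# append file and folder names to it.
os.makedirs("./backup", exist_ok=True)
get_dir("https://backup.example.com", "./backup/")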
At this point, the entire directory structure and files of the original website have been fully downloaded and restored locally.