Using Clash + Requests to Run a Python Crawler over a VPN

Posted on 2022-06-14 | Category: Python, Crawler Proxy

Preface

When a site you need to scrape for work is only reachable through a VPN, how do you scrape its data?

Environment

Windows 11
Requests v2.27.1
Clash for Windows v0.19.20
Python v3.9.12
A VPN subscription (bring your own)

Tutorial

Open Clash and enable the VPN.

Run the following test code:

import time
import urllib3

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36'
}

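# Retry failed connections up to 5 times. Assigning to
# requests.adapters.DEFAULT_RETRIES has no effect, so mount an adapter instead.
s = requests.Session()
adapter = requests.adapters.HTTPAdapter(max_retries=5)
s.mount('http://', adapter)
s.mount('https://', adapter)
# Ask the server to close the connection after each response
# (Session has no keep_alive attribute, so setting it does nothing)
s.headers['Connection'] = 'close'
# Skip certificate verification
s.verify = False
# Suppress the warning that urllib3 emits once verify=False is set
urllib3.disable_warnings()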

url = 'https://www.google.com/search?q=ip'
proxies = {
    'http': 'http://127.0.0.1:7890/',
    'https': 'http://127.0.0.1:7890/'
}

res = s.get(url=url, headers=headers, proxies=proxies)
time.sleep(2)
with open('daili.html', 'wb') as fp:
    fp.write(res.content)
print(res.status_code)
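
If Clash is not running, or the target site is slow, the test script above either raises an exception or waits indefinitely. The following is a minimal sketch of one way to guard against that, assuming the same local proxy address as above. The helper name `fetch_via_proxy` and the 10-second default timeout are illustrative choices, not part of the original script.

```python
import requests


def fetch_via_proxy(url, proxy_url="http://127.0.0.1:7890", timeout=10):
    """Return the Response, or None if the proxy or the target cannot be reached.

    proxy_url and timeout are assumed defaults for illustration only.
    """
    proxies = {"http": proxy_url, "https": proxy_url}
    try:
        res = requests.get(url, proxies=proxies, timeout=timeout)
        # Treat 4xx/5xx responses as failures as well
        res.raise_for_status()
        return res
    except requests.exceptions.ProxyError as exc:
        # Usually means nothing is listening on the proxy port
        print(f"Proxy unreachable (is Clash running?): {exc}")
    except requests.exceptions.RequestException as exc:
        # Timeouts, DNS errors, HTTP error statuses, and so on
        print(f"Request failed: {exc}")
    return None
```

A caller can then check for `None` before writing the page to disk, instead of letting an unhandled exception end the crawl.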

Notes

Port 7890 is the port the Clash client listens on; you can check it on the Clash --> General page.
The proxy mode defaults to Rule.
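
Before starting a crawl, it can help to confirm that something is actually listening on the Clash port. The sketch below uses only the standard library. The function name `is_port_open` is hypothetical, and 7890 is assumed to match the value shown in Clash; a successful check shows only that the port accepts connections, not that the proxy can reach the target site.

```python
import socket


def is_port_open(host="127.0.0.1", port=7890, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on 127.0.0.1:7890")
    else:
        print("Nothing is listening on 127.0.0.1:7890; check that Clash is running")
```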