The requests library

Call parameters

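A practical approach: copy the request as a curl command, paste it into https://curl.trillworks.com/ to get the equivalent Python code, then modify that code.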
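To illustrate what such a conversion produces, here is a hand-written sketch. The curl command, the URL, the query parameter, and the header values are all hypothetical; the converter's real output may be laid out differently.

```python
import requests

# Hypothetical command copied from the browser's network panel:
#   curl 'https://example.com/api/items?page=2' \
#        -H 'Accept: application/json' \
#        -H 'User-Agent: Mozilla/5.0'

URL = "https://example.com/api/items"   # hypothetical endpoint
PARAMS = {"page": "2"}                  # query string taken from the curl URL
HEADERS = {                             # one entry per -H flag
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0",
}


def fetch_items(timeout=10):
    """Send the translated request; the timeout is an added safeguard."""
    response = requests.get(URL, params=PARAMS, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response
```

Keeping the URL, parameters, and headers in separate names makes it easy to change one of them later without re-copying the whole command.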

SSLError: HTTPSConnectionPool

Reference: https://www.cnblogs.com/hum0ro/p/9536033.html

Option 1: pass verify=False

response = requests.get('https://www.baidu.com/', headers=header, verify=False)

This triggers the warning InsecureRequestWarning: Unverified HTTPS request is being made to host. To silence it:

import urllib3
urllib3.disable_warnings()

# Add the following line before the requests.get(url...) call
requests.packages.urllib3.disable_warnings()
r = requests.get(url...)
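Calling disable_warnings() with no arguments silences every urllib3 HTTP warning for the whole process. A narrower variant is sketched below: it ignores only InsecureRequestWarning, and only around the one unverified call. The helper name get_without_verification is an assumption made for this example, not part of requests.

```python
import warnings

import requests
import urllib3


def get_without_verification(url, **kwargs):
    """GET `url` with certificate checks off, hiding only InsecureRequestWarning."""
    with warnings.catch_warnings():
        # The filter is restored when the `with` block exits, so other
        # requests made elsewhere still show their warnings.
        warnings.simplefilter("ignore", urllib3.exceptions.InsecureRequestWarning)
        return requests.get(url, verify=False, **kwargs)
```

Note that verify=False removes protection against man-in-the-middle attacks, so it is best kept to hosts you control or to quick debugging.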

Option 2: install the following packages (this does not always help)

conda install cryptography pyOpenSSL certifi
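If the SSLError comes from an outdated or missing CA bundle (an assumption; it can have other causes), the freshly installed certifi bundle can be passed to requests explicitly instead of turning verification off. A minimal sketch, with get_with_certifi as a hypothetical helper name:

```python
import certifi
import requests


def get_with_certifi(url, **kwargs):
    """GET `url`, verifying the server certificate against certifi's CA bundle."""
    # certifi.where() returns the path of the bundled cacert.pem file.
    return requests.get(url, verify=certifi.where(), **kwargs)
```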

Error 10060: ProtocolError: ('Connection aborted.', TimeoutError(10060...

Possible fixes to try:

  • Lower the request rate by waiting longer between requests: time.sleep(15)
  • Rotate the User-Agent at random:
    import random

    # Every entry must contain only Latin-1 characters; an entry with a
    # literal "…" makes requests raise UnicodeEncodeError.
    user_agent_list = [
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
        "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.5; en-US; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15",
    ]
    # headers is the dict later passed to requests.get(..., headers=headers)
    headers['User-Agent'] = random.choice(user_agent_list)
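The two suggestions above can be combined into one small loop. This is only a sketch: the function names build_headers and fetch_all, the 30-second timeout, and the shortened two-entry list are assumptions made for the example.

```python
import random
import time

import requests

# Two of the User-Agent strings listed above; extend as needed.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
]


def build_headers(base_headers=None):
    """Return a copy of `base_headers` with a randomly chosen User-Agent."""
    headers = dict(base_headers or {})
    headers["User-Agent"] = random.choice(USER_AGENTS)
    return headers


def fetch_all(urls, delay_seconds=15, timeout=30):
    """Fetch each URL in turn, sleeping between requests to lower the rate."""
    responses = []
    for index, url in enumerate(urls):
        if index:  # no need to wait before the very first request
            time.sleep(delay_seconds)
        responses.append(requests.get(url, headers=build_headers(), timeout=timeout))
    return responses
```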

UnicodeEncodeError

Error message:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2026' in position 30: ordinal not in range(256)

A likely cause is a character outside Latin-1 in a header value such as User-Agent, for example the ellipsis (…) in the string below:

"Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/61.0",

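A quick way to locate the offending header before sending the request is to try encoding every value as Latin-1. The helper below is a sketch; the name find_non_latin1_headers is hypothetical and it assumes all header values are str.

```python
def find_non_latin1_headers(headers):
    """Map each header name to the first substring that Latin-1 cannot encode."""
    problems = {}
    for name, value in headers.items():
        try:
            value.encode("latin-1")
        except UnicodeEncodeError as exc:
            # exc.start/exc.end delimit the run of unencodable characters
            problems[name] = value[exc.start:exc.end]
    return problems


headers = {
    "Accept": "text/html",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/61.0",
}
print(find_non_latin1_headers(headers))  # {'User-Agent': '…'}
```

The fix is then to replace the flagged value with a complete User-Agent string copied from a real browser.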
ConnectionError

The error message looks like:

ConnectionError: HTTPSConnectionPool(host='www.xxx.org', port=443): Max retries exceeded with url

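A common cause is that the number of HTTP connections has exceeded the limit. Connections are keep-alive by default, so the server ends up holding too many open connections and cannot accept new ones. Fixes:

  • Disable persistent connections in the headers: 'Connection': 'close'
  • Set this before making requests: requests.adapters.DEFAULT_RETRIES = 5 (this may have no effect, because HTTPAdapter binds its default retry count when it is defined; passing max_retries to an HTTPAdapter mounted on a Session is the documented way to configure retries)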
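Both fixes can be applied on a Session. The sketch below uses HTTPAdapter(max_retries=...) with a urllib3 Retry object instead of the module-level constant; the function name make_session and the backoff value are assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=5, backoff_factor=1.0):
    """Build a Session that retries failed connections and closes each one after use."""
    session = requests.Session()
    retry = Retry(total=total_retries, backoff_factor=backoff_factor)
    adapter = HTTPAdapter(max_retries=retry)
    # Mount the adapter for both schemes so every request goes through it.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    # Ask the server to close the connection instead of keeping it alive.
    session.headers["Connection"] = "close"
    return session
```

A caller would then use session = make_session() followed by session.get(url, timeout=...) in place of requests.get(url).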

Connection problems like these can usually be worked around with a proxy such as JMS, although the connection may become slower.

© Licensed under CC BY-NC-SA 4.0
