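Whether you use urllib3 or the third-party requests library, you cannot completely get rid of two headers: User-Agent and Accept-Encoding. Even after clearing requests' own default headers, the lower-level libraries underneath add them back automatically whenever they are missing. http.client's HTTPConnection.putrequest() inserts "Accept-Encoding: identity" for HTTP/1.1 requests, and recent urllib3 versions supply a default User-Agent.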
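To observe the stdlib side of this, the sketch below sends a GET through plain http.client without passing any headers and has a throwaway local server report the header names it received. The local server and the helper names EchoHandler and headers_seen_by_server are our own illustrative assumptions, not part of the original setup. On CPython 3 the list should contain Host and Accept-Encoding but no User-Agent, because http.client by itself only adds the former two.

```python
import http.client
import http.server
import threading


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Replies with the request headers it received, one 'Name: value' per line."""

    def do_GET(self):
        body = "".join(f"{k}: {v}\n" for k, v in self.headers.items()).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def headers_seen_by_server(extra_headers=None):
    """Send one GET through http.client; return the header names the server got."""
    # Port 0 lets the OS pick a free port; the server handles exactly one request.
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/", headers=extra_headers or {})
        text = conn.getresponse().read().decode()
        conn.close()
    finally:
        thread.join(timeout=5)
        server.server_close()
    return [line.split(":", 1)[0] for line in text.splitlines() if line]


if __name__ == "__main__":
    # The caller passes no headers at all; whatever shows up was added by http.client.
    print(headers_seen_by_server())
```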
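To remove these two headers, you can monkey-patch the underlying library by replacing the method where this behavior lives, http.client.HTTPConnection._send_request. The code is as follows: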
import http.client

def _send_request(self, method, url, body, headers, encode_chunked):
    # Honor an explicitly requested Host: header.
    header_names = frozenset(k.lower() for k in headers)
    skips = {}
    if 'host' in header_names:
        skips['skip_host'] = 1
    # Change 1: always skip the automatic "Accept-Encoding: identity" header
    # (the stdlib only skips it when the caller sets Accept-Encoding itself).
    skips['skip_accept_encoding'] = 1

    self.putrequest(method, url, **skips)

    # chunked encoding will happen if HTTP/1.1 is used and either
    # the caller passes encode_chunked=True or the following
    # conditions hold:
    # 1. content-length has not been explicitly set
    # 2. the body is a file or iterable, but not a str or bytes-like
    # 3. Transfer-Encoding has NOT been explicitly set by the caller
    if 'content-length' not in header_names:
        # only chunk body if not explicitly set for backwards
        # compatibility, assuming the client code is already handling the
        # chunking
        if 'transfer-encoding' not in header_names:
            # if content-length cannot be automatically determined, fall
            # back to chunked encoding
            encode_chunked = False
            content_length = self._get_content_length(body, method)
            if content_length is None:
                if body is not None:
                    if self.debuglevel > 0:
                        print('Unable to determine size of %r' % body)
                    encode_chunked = True
                    self.putheader('Transfer-Encoding', 'chunked')
            else:
                self.putheader('Content-Length', str(content_length))
    else:
        encode_chunked = False

    # Change 2: headers that must not be sent (compared case-insensitively)
    excluded_headers = {'user-agent'}
    for hdr, value in headers.items():
        if hdr.lower() not in excluded_headers:
            self.putheader(hdr, value)
    if isinstance(body, str):
        # RFC 2616 Section 3.7.1 says that text default has a
        # default charset of iso-8859-1.
        body = http.client._encode(body, 'body')
    self.endheaders(body, encode_chunked=encode_chunked)

# Replace the stdlib method so that HTTP requests no longer carry the
# automatically added User-Agent and Accept-Encoding headers.
http.client.HTTPConnection._send_request = _send_request
The two annotated places in the code above are the only modifications: one always sets skip_accept_encoding, the other adds the excluded_headers check.
Apply this patch before using requests, then use requests as usual:
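import requests

url = 'http://127.0.0.1:8000/'  # hypothetical endpoint; replace with your own
data = {'key': 'value'}         # hypothetical form data

session = requests.Session()
session.headers = {}  # drop requests' own default session headers (User-Agent, Accept-Encoding, ...)
req = requests.Request('POST', url, data=data)
prepped = req.prepare()
print(prepped.headers)  # headers prepared by requests itself, before the lower layers run
res = session.send(prepped)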
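Because the patch above copies the body of a private method, it is tied to the _send_request implementation of the Python version it was copied from. A less invasive variant, sketched below as an alternative rather than the method described above, wraps the two HTTPConnection methods that actually write the headers, putrequest() and putheader(). It assumes every request eventually passes through these two methods. That holds for plain http.client, and, as far as we know, urllib3 2.x builds requests with putrequest()/putheader() directly instead of calling _send_request, in which case only a variant like this would take effect. EXCLUDED_HEADERS, apply_patch, EchoHandler and headers_seen_by_server are hypothetical names used only for this illustration.

```python
import http.client
import http.server
import threading

# Lower-case names of headers that must never reach the wire (assumed list).
EXCLUDED_HEADERS = {"user-agent"}

# Keep references to the original methods so the wrappers can delegate to them.
_orig_putrequest = http.client.HTTPConnection.putrequest
_orig_putheader = http.client.HTTPConnection.putheader


def _putrequest(self, method, url, skip_host=False, skip_accept_encoding=False):
    # Always suppress the automatic "Accept-Encoding: identity" header;
    # an Accept-Encoding passed explicitly by the caller is still sent later.
    return _orig_putrequest(self, method, url, skip_host=skip_host,
                            skip_accept_encoding=True)


def _putheader(self, header, *values):
    # header may be str or bytes; compare its name case-insensitively.
    name = header.decode("latin-1") if isinstance(header, bytes) else header
    if name.lower() in EXCLUDED_HEADERS:
        return  # drop the header silently
    return _orig_putheader(self, header, *values)


def apply_patch():
    """Install the wrappers; HTTPSConnection inherits them as a subclass."""
    http.client.HTTPConnection.putrequest = _putrequest
    http.client.HTTPConnection.putheader = _putheader


class EchoHandler(http.server.BaseHTTPRequestHandler):
    """Replies with the request headers it received, one 'Name: value' per line."""

    def do_GET(self):
        body = "".join(f"{k}: {v}\n" for k, v in self.headers.items()).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def headers_seen_by_server(extra_headers=None):
    """Send one GET through http.client; return the header names the server got."""
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoHandler)
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/", headers=extra_headers or {})
        text = conn.getresponse().read().decode()
        conn.close()
    finally:
        thread.join(timeout=5)
        server.server_close()
    return [line.split(":", 1)[0] for line in text.splitlines() if line]


if __name__ == "__main__":
    apply_patch()
    print(headers_seen_by_server({"User-Agent": "custom", "X-Test": "1"}))
```

Running the script should list only Host and X-Test: the explicit User-Agent is dropped by the putheader wrapper, and the automatic Accept-Encoding is suppressed through skip_accept_encoding. As with the original patch, call apply_patch() before sending any request.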