How to remove the User-Agent header from HTTP requests in Python

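Whether you use urllib3 or the third-party requests library, two headers cannot be removed completely through the normal API: User-Agent and Accept-Encoding. The lower layers of the Python HTTP stack check for them and add them automatically whenever they are missing.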
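One way to see what the standard library adds by itself is to capture the bytes http.client would send, without opening a socket. The sketch below is only illustrative: RecordingConnection and sent_header_names are hypothetical helper names, and example.invalid is a placeholder host. On an unpatched interpreter it should show http.client adding Host and Accept-Encoding on its own. A default User-Agent, where one shows up, is to my knowledge supplied by a layer above http.client (recent urllib3 versions, for example).

```python
import http.client


class RecordingConnection(http.client.HTTPConnection):
    """Hypothetical helper: records outgoing bytes instead of using a socket."""

    def __init__(self, host):
        super().__init__(host)
        self.sent = b""

    def send(self, data):
        # http.client calls send() with the serialized request head.
        # Overriding it means no network I/O happens at all.
        self.sent += data


def sent_header_names(headers):
    """Return the lower-cased names of the headers http.client would send."""
    conn = RecordingConnection("example.invalid")  # placeholder host
    conn.request("GET", "/", headers=headers)
    head = conn.sent.decode("iso-8859-1")
    lines = head.split("\r\n")[1:]  # drop the request line
    return [line.split(":", 1)[0].lower() for line in lines if line]


# No headers passed in, so everything listed here was added automatically.
default_names = sent_header_names({})
print(default_names)
```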

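To get rid of them you can monkey patch the underlying library. The behavior lives in http.client.HTTPConnection._send_request, so that is the method to replace. Because _send_request is a private method, check the copy below against the http.client source of the Python version you run. The code is as follows: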

import http.client

def _send_request(self, method, url, body, headers, encode_chunked):
    # Honor explicitly requested Host: and Accept-Encoding: headers.
    header_names = frozenset(k.lower() for k in headers)
    skips = {}
    if 'host' in header_names:
        skips['skip_host'] = 1
    # Patch 1: always skip the automatic Accept-Encoding header
    skips['skip_accept_encoding'] = 1

    self.putrequest(method, url, **skips)

    # chunked encoding will happen if HTTP/1.1 is used and either
    # the caller passes encode_chunked=True or the following
    # conditions hold:
    # 1. content-length has not been explicitly set
    # 2. the body is a file or iterable, but not a str or bytes-like
    # 3. Transfer-Encoding has NOT been explicitly set by the caller

    if 'content-length' not in header_names:
        # only chunk body if not explicitly set for backwards
        # compatibility, assuming the client code is already handling the
        # chunking
        if 'transfer-encoding' not in header_names:
            # if content-length cannot be automatically determined, fall
            # back to chunked encoding
            encode_chunked = False
            content_length = self._get_content_length(body, method)
            if content_length is None:
                if body is not None:
                    if self.debuglevel > 0:
                        print('Unable to determine size of %r' % body)
                    encode_chunked = True
                    self.putheader('Transfer-Encoding', 'chunked')
            else:
                self.putheader('Content-Length', str(content_length))
    else:
        encode_chunked = False
    excluded_headers = ['User-Agent']  # Patch 2: headers to drop (exact-case match)
    # headers
    for hdr, value in headers.items():
        if hdr not in excluded_headers:
            self.putheader(hdr, value)
    if isinstance(body, str):
        # RFC 2616 Section 3.7.1 says that text default has a
        # default charset of iso-8859-1.
        body = http.client._encode(body, 'body')
    self.endheaders(body, encode_chunked=encode_chunked)

# Stop HTTP requests from automatically carrying User-Agent and Accept-Encoding headers
http.client.HTTPConnection._send_request = _send_request

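There are two changes relative to the standard-library version of the method: skip_accept_encoding is always set, and any header listed in excluded_headers is filtered out before the headers are sent.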

Apply the patch above before using requests, then use requests as you normally would:

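import requests

# url and data are placeholders for your own endpoint and payload
session = requests.Session()
session.headers = {}  # clear the session's default headers
req = requests.Request('POST', url, data=data)
prepped = req.prepare()  # prepared without merging any session headers
res = session.send(prepped)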
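To confirm what actually goes over the wire, it can help to point a request at a throwaway local server that records the header names it receives. The sketch below uses only the standard library. EchoHeadersHandler and received are hypothetical names, and the http.client call stands in for the requests snippet above, which could be aimed at the same address instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # one list of lower-cased header names per request


class EchoHeadersHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        received.append([name.lower() for name in self.headers.keys()])
        self.send_response(204)  # no body needed for this check
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port; server_port holds the real one.
server = HTTPServer(("127.0.0.1", 0), EchoHeadersHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

try:
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    conn.request("GET", "/")
    conn.getresponse().read()
    conn.close()
finally:
    server.shutdown()
    server.server_close()

print(received[0])
```

With the patch installed, User-Agent and Accept-Encoding should be absent from the printed list.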