让urllib2使用自定义的DNS

今天用urllib2去抓网页的时候,发现非常慢,而用浏览器打开网页就非常快,所以我怀疑urllib2在进行DNS解析这一步骤慢了,因为浏览器它自己会对DNS进行缓存的。

在不改本地hosts文件的情况下,可以用代码设置来改变urllib2的DNS解析逻辑。

示例:

import urllib2
import httplib
import socket

def MyResolver(host):
  if host == 'news.bbc.co.uk':
    return '66.102.9.104' # Google IP
  else:
    return host

class MyHTTPConnection(httplib.HTTPConnection):
  def connect(self):
    self.sock = socket.create_connection((MyResolver(self.host),self.port),self.timeout)
class MyHTTPSConnection(httplib.HTTPSConnection):
  def connect(self):
    sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)

class MyHTTPHandler(urllib2.HTTPHandler):
  def http_open(self,req):
    return self.do_open(MyHTTPConnection,req)

class MyHTTPSHandler(urllib2.HTTPSHandler):
  def https_open(self,req):
    return self.do_open(MyHTTPSConnection,req)

opener = urllib2.build_opener(MyHTTPHandler,MyHTTPSHandler)
urllib2.install_opener(opener)

f = urllib2.urlopen('http://news.bbc.co.uk')
data = f.read()
from lxml import etree
doc = etree.HTML(data)

>>> print doc.xpath('//title/text()')
['Google']

来源:http://stackoverflow.com/questions/2236498/tell-urllib2-to-use-custom-dns