Taking web page screenshots with PhantomJS

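During development I ran into this requirement: when sharing a web page to Weibo, the page has to be captured as a screenshot and attached to the post as its image. PhantomJS happens to fit this need well.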

PhantomJS repository: https://github.com/ariya/phantomjs/

Installing PhantomJS on Ubuntu:

sudo apt-get install phantomjs

Create a new JavaScript file, github.js:

var page = require('webpage').create();
page.open('http://github.com/', function () {
    page.render('github.png');
    phantom.exit();
});

Then run it:

phantomjs github.js

Passing arguments to the script:

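// Use PhantomJS to save an HTML page as an image.
// Argument 1: the full URL of the page. Argument 2: the full path where the image is saved.
var page = require('webpage').create();

var args = require('system').args;
if (args.length < 3) {
    // args[0] is the script name, so both arguments are required
    phantom.exit(1);
} else {
    var address = args[1];
    var dest_filename = args[2];

    page.settings.userAgent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25';
    //page.customHeaders = {'Referer': 'localhost'};
    page.open(address, function (status) {
        if (status !== 'success') {
            // The page failed to load: print nothing and exit with an error code
            phantom.exit(1);
            return;
        }
        page.render(dest_filename);
        // Print the output path only after rendering, so the caller can check it
        console.log(dest_filename);
        phantom.exit();
    });
}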
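The script reads its arguments by position (index 1 is the URL, index 2 is the output path), so any caller has to supply them in that order. Below is a minimal Python sketch of assembling the argument list. The helper name build_phantomjs_command and the URL check are illustrative assumptions, not part of PhantomJS; html2jpg.js is the file name this script is saved under in the Python code later in this post.

```python
def build_phantomjs_command(url, dest_path, script_path='html2jpg.js'):
    """Return the argument list for: phantomjs <script> <url> <dest_path>.

    build_phantomjs_command is a hypothetical helper for illustration.
    """
    # The script expects a complete URL, so reject anything without a scheme.
    if not url.startswith(('http://', 'https://')):
        raise ValueError('expected a full http(s) URL, got %r' % url)
    # Order matters: the script reads index 1 as the URL and index 2 as the path.
    return ['phantomjs', script_path, url, dest_path]
```

For example, build_phantomjs_command('http://github.com/', '/tmp/github.png') gives a list that can be handed to subprocess without going through a shell.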

PhantomJS can be called from Python:

import datetime
import os
import subprocess
import time


def timeout_command(command, timeout):
    """Run a command (a list of arguments) and return its output lines.

    If the command has not exited within `timeout` seconds, kill it
    and return None.
    """
    start = datetime.datetime.now()
    # universal_newlines=True makes stdout yield text rather than bytes
    process = subprocess.Popen(command, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, universal_newlines=True)
    while process.poll() is None:
        time.sleep(0.2)
        now = datetime.datetime.now()
        if (now - start).total_seconds() > timeout:
            process.kill()
            process.wait()  # reap the child so it does not linger as a zombie
            return None
    return process.stdout.readlines()


def html2jpg(url):
    """Save a web page as an image; return the image path, or None on failure."""
    filename = '%s.jpg' % time.time()
    imgdir = '/tmp/screenshots/'
    if not os.path.isdir(imgdir):
        os.makedirs(imgdir)
    filepath = os.path.join(imgdir, filename)
    jspath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'html2jpg.js')
    # Passing a list (no shell) keeps special characters in the URL
    # from being interpreted by a shell
    cmd = ['phantomjs', jspath, url, filepath]
    ret = timeout_command(cmd, timeout=5)
    if not ret:
        return None
    # The script prints the destination path on its first output line
    if ret[0].strip() != filepath:
        return None
    img_url = '/tmp/screenshots/%s' % filename
    return img_url
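The polling loop above works on old Python versions. On Python 3.5 or later, the same "run, but give up after N seconds" behaviour is available through the timeout parameter of subprocess.run, which kills the child process before raising TimeoutExpired. A minimal sketch under that assumption; the name run_with_timeout is hypothetical, and treating a non-zero exit code as failure is a choice made here rather than something the original code does.

```python
import subprocess


def run_with_timeout(args, timeout):
    """Run args (a list) and return its stdout lines.

    Returns None if the command times out or exits with a non-zero code.
    run_with_timeout is a hypothetical helper for illustration.
    """
    try:
        result = subprocess.run(args, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here
        return None
    if result.returncode != 0:
        return None
    return result.stdout.splitlines()
```

A call such as run_with_timeout(['phantomjs', 'html2jpg.js', url, filepath], timeout=5) could then stand in for timeout_command, with the returned lines checked the same way.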

If running the phantomjs command fails with an error along the lines of being unable to open the X server, upgrading PhantomJS to the latest version fixes it.

To fix Chinese characters not rendering (missing fonts): apt-get install xfonts-wqy

Zhihu uses a different library, wkhtmltojpg:

http://www.zhihu.com/question/21455769

Python has a similar library:

https://github.com/jeanphix/Ghost.py

I tried it, though, and ran into some problems; it does not seem very stable or mature yet.