How to scrape the article list of a WeChat official account

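Open the official account's "View history messages" page, share it within WeChat to yourself while logged in to web WeChat (https://wx2.qq.com/), and then inspect the shared link's page element, as shown: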

[Screenshot: the shared link inspected in the browser's element view]

Look at the href attribute of the A tag. Its value is:

/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=http%3A%2F%2Fmp.weixin.qq.com%2Fmp%2Fgetmasssendmsg%3F__biz%3DMjM5MDMwMzIwMA%3D%3D%23wechat_webview_type%3D1%26wechat_redirect&skey=%40crypt_6aa23c5d_10e641a2bcf2ec7d4041015bbfaf1c31&deviceid=e846890001557767&pass_ticket=gs3O0nNq60Q6I1QjnQ%252BuLABJx3r%252FhI4wM%252BDu18SZj6lKj%252FZl2CEQkZjtCzghtAN0&opcode=2&scene=1&username=@6a993ed3aa2a8717bb313803d02212de04cc49b080617011a39c72094dd35d94

Prepend the host https://wx2.qq.com (the path already starts with a slash) to get:

https://wx2.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=http%3A%2F%2Fmp.weixin.qq.com%2Fmp%2Fgetmasssendmsg%3F__biz%3DMjM5MDMwMzIwMA%3D%3D%23wechat_webview_type%3D1%26wechat_redirect&skey=%40crypt_6aa23c5d_10e641a2bcf2ec7d4041015bbfaf1c31&deviceid=e846890001557767&pass_ticket=gs3O0nNq60Q6I1QjnQ%252BuLABJx3r%252FhI4wM%252BDu18SZj6lKj%252FZl2CEQkZjtCzghtAN0&opcode=2&scene=1&username=@6a993ed3aa2a8717bb313803d02212de04cc49b080617011a39c72094dd35d94
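
The prefixing step above can be sketched in Python with the standard library's `urljoin`, which resolves a root-relative href against a base URL without doubling the slash. The function name `absolutize` and the shortened sample href are illustrative assumptions, not part of the original procedure.

```python
from urllib.parse import urljoin

# Base of the web WeChat host used in this article.
WEB_WECHAT_BASE = "https://wx2.qq.com/"


def absolutize(href: str, base: str = WEB_WECHAT_BASE) -> str:
    """Turn the root-relative href copied from the A tag into a full URL."""
    return urljoin(base, href)


# Shortened sample href (session parameters omitted for readability).
sample_href = (
    "/cgi-bin/mmwebwx-bin/webwxcheckurl"
    "?requrl=http%3A%2F%2Fmp.weixin.qq.com%2Fmp%2Fgetmasssendmsg"
    "%3F__biz%3DMjM5MDMwMzIwMA%3D%3D"
    "%23wechat_webview_type%3D1%26wechat_redirect"
    "&opcode=2&scene=1"
)

print(absolutize(sample_href))
```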

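Requesting this URL returns a 301 redirect to the article list page. From there, extracting the article links from the list page is straightforward.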

After dropping the parameters that are not needed, the link above simplifies to this URL:

https://wx2.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=http%3A%2F%2Fmp.weixin.qq.com%2Fmp%2Fgetmasssendmsg%3F__biz%3DMjM5MDMwMzIwMA%3D%3D%23wechat_webview_type%3D1%26wechat_redirect
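
The simplified URL appears to be a fixed webwxcheckurl prefix plus a percent-encoded `requrl`, so it can probably be assembled for any account whose `__biz` value is known. The following is a minimal sketch under that assumption; the helper names `history_target`, `build_check_url` and `extract_biz` are hypothetical, and the URL template is simply copied from the example above.

```python
from urllib.parse import parse_qs, quote, urlsplit

CHECK_URL = "https://wx2.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl"


def history_target(biz: str) -> str:
    """Unencoded history-page address, following the example's pattern."""
    return (
        "http://mp.weixin.qq.com/mp/getmasssendmsg"
        f"?__biz={biz}#wechat_webview_type=1&wechat_redirect"
    )


def build_check_url(biz: str) -> str:
    """Percent-encode the target (including ':', '/', '?', '=', '#', '&')."""
    return f"{CHECK_URL}?requrl={quote(history_target(biz), safe='')}"


def extract_biz(check_url: str) -> str:
    """Recover __biz from a webwxcheckurl link (parse_qs decodes requrl)."""
    target = parse_qs(urlsplit(check_url).query)["requrl"][0]
    return parse_qs(urlsplit(target).query)["__biz"][0]


url = build_check_url("MjM5MDMwMzIwMA==")
print(url)
print(extract_biz(url))
```

Going the other way with `extract_biz` is handy for pulling the account ID out of a link copied from the browser, since the extra session parameters are simply ignored.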

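The requrl parameter is the address of the official account page that the request redirects to, and the __biz value inside it is the account's unique ID. For the request to redirect correctly, it must also carry the login cookies. Testing showed that the following cookies are enough: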

curl -H "Cookie: wxuin=1616249307; wxsid=rgPoQoMYwOT9Qu5f; webwx_data_ticket=AQY+4MOfYXdYXfOeo0Lwh+Kw" "http://wx2.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=http%3A%2F%2Fmp.weixin.qq.com%2Fmp%2Fgetmasssendmsg%3F__biz%3DMjM5MDMwMzIwMA%3D%3D%23wechat_webview_type%3D1%26wechat_redirect"

These cookies can be obtained by logging in to the web version of WeChat.
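
For scripting, the curl call above might be reproduced with Python's standard library roughly as follows. This is a sketch, not a tested client: the function names are hypothetical, the cookie values are placeholders to be replaced with the ones from your own web WeChat session, and it assumes the server answers with a redirect whose `Location` header points at the article list page.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following the redirect so Location can be read."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def build_request(check_url: str, cookies: dict) -> urllib.request.Request:
    """Attach the session cookies as a single Cookie header, as curl -H does."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(check_url, headers={"Cookie": cookie_header})


def resolve_redirect(check_url: str, cookies: dict, timeout: float = 10.0):
    """Return the Location header of the response, or None if there is none."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(check_url, cookies), timeout=timeout) as resp:
            return resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.headers.get("Location")


# Usage (performs a network request, so it is left commented out):
# cookies = {"wxuin": "...", "wxsid": "...", "webwx_data_ticket": "..."}
# print(resolve_redirect("http://wx2.qq.com/cgi-bin/mmwebwx-bin/webwxcheckurl?requrl=...", cookies))
```

Reading the `Location` header instead of following the redirect keeps the two steps separate, so the list page can be fetched and parsed with whatever tool is preferred.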