Crawler gets a 404 response:

While crawling data, I used the following statements to fetch the page and save the scraped content:

import requests
r = requests.get(url, timeout=60, headers=headers, stream=True)
# print(r.status_code)
with open(r'D:\us\{}\{}\{}\img\{}.jpg'.format(year, mouth_day, id, l), 'wb') as f:
    f.write(r.content)  # write the response body to the image file (the with-block closes it)

Printing the response status code showed that the server returned 404.
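Because the statements above write `r.content` to disk whatever the status code is, a 404 error page ends up saved as a `.jpg`. A minimal sketch that checks the status first; `save_if_ok` is a hypothetical helper name, and it accepts any object with `status_code` and `content` attributes, such as the `requests.Response` returned by `requests.get`:

```python
from pathlib import Path


def save_if_ok(response, path):
    """Write response.content to path only when the server answered 200.

    `response` is anything with `status_code` and `content` attributes,
    e.g. the object returned by requests.get(). Returns True if written.
    """
    if response.status_code != 200:
        # Report the failing code instead of saving an HTML error page as an image
        print("skipped {}: HTTP {}".format(path, response.status_code))
        return False
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing folders such as ...\img
    target.write_bytes(response.content)
    return True
```

With the variables from the original snippet, the call would be `save_if_ok(r, r'D:\us\{}\{}\{}\img\{}.jpg'.format(year, mouth_day, id, l))`.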

Solution:

After repeated attempts, I found that the error came from the information carried in the request headers:

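Code that ran fine earlier can fail when it is reused later because the Cookie has gone stale, so update the Cookie promptly with the current value from your own browser.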
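One way to make that refresh easier is to keep the Cookie out of the source code. A sketch, assuming a hypothetical environment variable `CRAWLER_COOKIE` that holds the `Cookie` request header copied from the browser's developer tools (Network tab); `parse_cookie_header` and `load_cookies` are illustrative names, and the resulting dict is the shape `requests.get` accepts through its `cookies` parameter:

```python
import os


def parse_cookie_header(raw):
    """Turn a browser 'Cookie' header ("k1=v1; k2=v2") into a dict.

    Values may contain '=' (e.g. base64 padding), so only the first '=' splits.
    If a name appears twice, the last value wins.
    """
    cookies = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments such as a trailing ';'
        name, _, value = pair.partition("=")
        cookies[name.strip()] = value.strip()
    return cookies


def load_cookies(var="CRAWLER_COOKIE"):
    """Read the Cookie string from an environment variable (hypothetical name)."""
    raw = os.environ.get(var)
    if not raw:
        raise RuntimeError("set {} to the Cookie header copied from your browser".format(var))
    return parse_cookie_header(raw)
```

A request would then pass `cookies=load_cookies()` to `requests.get`, with the hard-coded `"Cookie"` entry removed from `headers` so the two sources do not conflict. Refreshing the Cookie then only means updating the environment variable.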

headers = {
    "connection": "close",
    # Copy this value from your own browser (developer tools, Network tab); it goes stale over time
    "Cookie": "JSESSIONID=E4674C29E2A76CB08BB651053D8C951E.bswa3n; wipo-visitor-uunid=ff51e08378c28600; "
              "_gcl_au=1.1.661799052.1650246701; _ga=GA1.3.1709262595.1650246701; "
              "_pk_ref.14.ec75=%5B%22%22%2C%22%22%2C1650442707%2C%22https%3A%2F%2Fwww3.wipo.int%2F%22%5D; "
              "_hjSessionUser_787562=eyJpZCI6IjhhNGViODJkLTFiNTEtNWNmNC1iMDc0LTliNDRiZGJkYTlhZCIsImNyZWF0ZWQ"
              "iOjE2NTA0NDI3MDc5NTAsImV4aXN0aW5nIjpmYWxzZX0=; "
              "_pk_id.14.ec75=845d6b854d46c8ec.1650440759.2.1650442818.1650440759.; "
              "_gid=GA1.3.807169207.1650806717; _gid=GA1.2.807169207.1650806717; "
              "_ga=GA1.1.1709262595.1650246701; _pk_id.9.ec75=3222e84a40150571.1650246702.; "
              "_pk_id.9.d630=a4ae4c09b954546d.1650246701.; "
              "_pk_uid=0%3DczoxNjoiMzIyMmU4NGE0MDE1MDU3MSI7%3A_%3D4d811534abc282543fa0eeaad6da945e10b9c701; "
              "_ga_15TSHJ0HWP=GS1.1.1651022240.33.0.1651022878.0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/95.0.4638.54 Safari/537.36",
}
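When a long Cookie value is wrapped across several adjacent string literals, a space left at a break point becomes part of the value and silently corrupts it, which can also make the server reject the request. A small sketch that flags such pairs before the request is sent; `find_suspect_pairs` is a hypothetical helper name:

```python
def find_suspect_pairs(raw):
    """Return cookie pairs that look damaged: no '=' or whitespace inside the name/value."""
    suspects = []
    for pair in raw.split(";"):
        pair = pair.strip()  # the single space after each ';' is normal and removed here
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        if not sep or any(ch.isspace() for ch in name + value):
            suspects.append(pair)
    return suspects
```

Calling `find_suspect_pairs(headers["Cookie"])` should return an empty list; any pair it returns is worth comparing against the value shown in the browser.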