Using requests

I. Basic usage
1. Preparation: install the requests library, either with pip (pip install requests) or from within PyCharm.
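
One quick way to confirm that the installation worked is to import the library and print its version. This is only a minimal check, and the version printed depends on your own environment.

```python
import requests

# If this import succeeds, requests is installed for the current interpreter.
print(requests.__version__)
```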
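2. A first example: the methods in the requests library are clear and simple. Fetching a web page takes nothing more than a call to the get method:

Code: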

import requests
response = requests.get('http://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.cookies)

Output:

<class 'requests.models.Response'>
200
<class 'str'>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

The other HTTP methods are just as simple to use in requests:

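# The module is named requests; httpbin.org provides one endpoint per method.
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')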
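
The one-line helpers above all take a URL and differ only in the HTTP method they use. As a rough illustration that never touches the network, the sketch below builds (but does not send) one request per method with `requests.Request(...).prepare()`. The httpbin.org addresses are assumed here purely as example URLs.

```python
import requests

# (method, URL) pairs; the URLs are illustrative and are never contacted here.
calls = [
    ('POST', 'http://httpbin.org/post'),
    ('PUT', 'http://httpbin.org/put'),
    ('DELETE', 'http://httpbin.org/delete'),
    ('HEAD', 'http://httpbin.org/get'),
]

prepared_methods = []
for method, url in calls:
    # prepare() builds the request that would be sent, without sending it.
    prepared = requests.Request(method, url).prepare()
    prepared_methods.append(prepared.method)
    print(prepared.method, prepared.url)
```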

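**3.** GET requests
GET is the most common request in HTTP, so we look at it in detail.
We start with a simple GET request to http://httpbin.org/get. When the site sees that the client sent a GET request, it returns the corresponding request information:
httpbin.org (http://httpbin.org/) is a site built specifically for testing HTTP requests.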

import requests
response = requests.get('http://httpbin.org/get')
print(response.text)

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6076af01-5222d2ca535f9b6039280a85"
  }, 
  "origin": "18.167.102.111", 
  "url": "http://httpbin.org/get"
}

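We can see that the request went through and that the result includes our request headers, URL, IP address, and other information.
We can also pass parameters in a GET request to attach extra information.
Building the URL by hand:

r = requests.get('http://httpbin.org/get?name=germey&age=22')

The parameters begin after ? and are joined with &. Writing URLs this way by hand is clearly cumbersome, so requests provides the params argument.
Example using params: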

import requests
data = {
    'name': 'germey',
    'age': '22'
}
response = requests.get('http://httpbin.org/get',params=data)
print(response.text)

Output:

{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6076b273-24b1eebd6542e66659560793"
  }, 
  "origin": "18.167.102.111", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}
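
The URL that requests builds from params can also be inspected without sending anything. The following is a minimal sketch that prepares the same request offline. The second request, with its `q` parameter, is a made-up example added only to show that values needing escaping are encoded automatically.

```python
import requests

data = {
    'name': 'germey',
    'age': '22',
}

# Build the request without sending it, then inspect the final URL.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)

# A value containing a space is escaped when the query string is built.
escaped = requests.Request(
    'GET', 'http://httpbin.org/get', params={'q': 'web crawler'}
).prepare()
print(escaped.url)
```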

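From the console output we can see that requests has automatically built the URL for us. The response body is a string in JSON format; calling the json() method parses it into a Python dict.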

print(response.json())  # json is a method and must be called with parentheses
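
To make the conversion concrete without a network call, the sketch below parses a shortened copy of the httpbin.org reply with the standard json module, which is roughly what the json() method does with the response body. The variable `text` is a stand-in for response.text and is not a real response.

```python
import json

# Stand-in for response.text: a shortened copy of the httpbin.org reply.
text = (
    '{"args": {"age": "22", "name": "germey"}, '
    '"url": "http://httpbin.org/get?name=germey&age=22"}'
)

parsed = json.loads(text)  # roughly what response.json() does with the body
print(type(text))
print(type(parsed))
print(parsed['args']['name'])
```

Once parsed, individual fields can be read with ordinary dict indexing instead of string searching.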

Fetching a web page
Taking Zhihu as an example, we fetch the content of the Zhihu home page:

import requests

headers = {
    # Paste the cookie string from your own logged-in browser session.
    'cookie': 'your cookies here',
    # A browser-like user agent, so the request is not sent as python-requests.
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
r = requests.get('https://www.zhihu.com/', headers=headers)
print(r.text)
print(r.status_code)
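
To see what passing headers actually changes, the sketch below prepares the request through a `requests.Session` without sending it and compares the library's default User-Agent with the one that would be sent. The shortened user-agent string is an illustrative assumption, not a value required by Zhihu.

```python
import requests

headers = {
    # Shortened, illustrative browser identifier.
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)',
}

session = requests.Session()
request = requests.Request('GET', 'https://www.zhihu.com/', headers=headers)

# prepare_request() merges the session defaults with our headers; nothing is sent.
prepared = session.prepare_request(request)

default_agent = requests.utils.default_user_agent()
print('default:', default_agent)
print('sent:   ', prepared.headers['User-Agent'])
```

Header names are matched case-insensitively, so the lowercase 'user-agent' key replaces the default 'User-Agent' value.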

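Fetching binary data
The Zhihu page we fetched above is really just an HTML document. What should we do if we want to fetch binary data such as images, video, or audio?
We take the Zhihu icon as an example: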

import requests

headers = {
    'cookie': 'your cookies here',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
r = requests.get('https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.a53ae37b.png', headers=headers)
print(r.text)  # garbled: the binary data is being forced into text form
# r.content holds the raw bytes; write them in binary mode.
# The image is saved in the same directory as the .py file.
with open('zhihu.png', 'wb') as f:
    f.write(r.content)
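
The difference between text and raw bytes can be reproduced offline. In the sketch below, `content` is a stand-in for r.content and holds only the fixed 8-byte signature that every PNG file starts with. The temporary file name is an arbitrary choice for illustration.

```python
import os
import tempfile

# Stand-in for r.content: the 8-byte signature at the start of every PNG file.
content = b'\x89PNG\r\n\x1a\n'

# Decoding bytes that are not text produces replacement characters,
# which is why printing the text form of an image looks garbled.
as_text = content.decode('utf-8', errors='replace')
print(repr(as_text))

# Writing in binary mode ('wb') stores the bytes unchanged.
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')
with open(path, 'wb') as f:
    f.write(content)

with open(path, 'rb') as f:
    round_trip = f.read()
print(round_trip == content)
```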

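4. POST requests
Now that we have covered the basic GET request, we turn to POST, another common request method. Here we submit some data and httpbin.org returns the relevant information:

import requests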
data = {'name': 'germey', 'age': '22'}
r = requests.post('http://httpbin.org/post',data=data)
print(r.text)

Output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6076f804-446c1b1d538e216f066e33ee"
  }, 
  "json": null, 
  "origin": "111.17.194.60", 
  "url": "http://httpbin.org/post"
}

The response shows the data we submitted.
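
How the data argument travels in the request can be checked without contacting the server. The sketch below prepares the same POST offline and prints the encoded body. The second request uses the `json` argument of requests, which the text above does not cover, as an assumed variation for submitting JSON instead of form data.

```python
import requests

data = {'name': 'germey', 'age': '22'}

# Form submission: data= is URL-encoded into the request body.
form = requests.Request('POST', 'http://httpbin.org/post', data=data).prepare()
print(form.headers['Content-Type'])
print(form.body)

# JSON submission: json= serializes the dict and sets a JSON content type.
as_json = requests.Request('POST', 'http://httpbin.org/post', json=data).prepare()
print(as_json.headers['Content-Type'])
```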
5. Responses

import requests

headers = {
    'cookie': 'your cookies here',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
}
r = requests.get('https://www.zhihu.com/', headers=headers)
print('Status code:', r.status_code)
print('Response headers:', r.headers)  # r.headers holds the response headers
print('cookies:', r.cookies)

Output:

Status code: 200
Response headers: {'Server': 'CLOUD ELB 1.0.0', 'Date': 'Wed, 14 Apr 2021 14:18:07 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Vary': 'Accept-Encoding', 'set-cookie': 'tst=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT; httponly, KLBRSID=ed2ad9934af8a1f80db52dcb08d13344|1618409886|1618409886; Path=/', 'content-security-policy': "default-src * blob:; img-src * data: blob: resource: t.captcha.qq.com cstaticdun.126.net necaptcha.nosdn.127.net; connect-src * wss: blob: resource:; frame-src 'self' *.zhihu.com mailto: tel: weixin: *.vzuu.com mo.m.taobao.com getpocket.com note.youdao.com safari-extension://com.evernote.safari.clipper-Q79WDW8YH9 zhihujs: captcha.guard.qcloud.com pos.baidu.com dup.baidustatic.com openapi.baidu.com wappass.baidu.com passport.baidu.com *.cme.qcloud.com vs-cdn.tencent-cloud.com t.captcha.qq.com c.dun.163.com; script-src 'self' blob: *.zhihu.com g.alicdn.com qzonestyle.gtimg.cn res.wx.qq.com open.mobile.qq.com 'unsafe-eval' unpkg.zhimg.com unicom.zhimg.com resource: captcha.gtimg.com captcha.guard.qcloud.com pagead2.googlesyndication.com cpro.baidustatic.com pos.baidu.com dup.baidustatic.com i.hao61.net 'nonce-395681d5-2009-4f24-ba9b-2f4de9719d15' hm.baidu.com zz.bdstatic.com b.bdstatic.com imgcache.qq.com vs-cdn.tencent-cloud.com ssl.captcha.qq.com t.captcha.qq.com cstaticdun.126.net c.dun.163.com ac.dun.163.com/ acstatic-dun.126.net; style-src 'self' 'unsafe-inline' *.zhihu.com unicom.zhimg.com resource: captcha.gtimg.com ssl.captcha.qq.com t.captcha.qq.com cstaticdun.126.net c.dun.163.com ac.dun.163.com/ acstatic-dun.126.net", 'x-frame-options': 'SAMEORIGIN', 'strict-transport-security': 'max-age=15552000; includeSubDomains', 'surrogate-control': 'no-store', 'cache-control': 'no-cache, no-store, must-revalidate, private, max-age=0', 'pragma': 'no-cache', 'expires': '0', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block', 'X-Backend-Response': '0.435', 'Referrer-Policy': 'no-referrer-when-downgrade', 'X-SecNG-Response': '0.43999981880188', 'X-UDID': 'AFDupPWGbBCPTjG_3TfIUhnlEgcac_LzB2M=', 'x-lb-timing': '0.441', 'x-idc-id': '2', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'X-NWS-LOG-UUID': '7093856299592433989', 'Connection': 'keep-alive', 'X-Cache-Lookup': 'Cache Miss', 'x-edge-timing': '0.462', 'x-cdn-provider': 'tencent'}
cookies: <RequestsCookieJar[<Cookie KLBRSID=ed2ad9934af8a1f80db52dcb08d13344|1618409886|1618409886 for www.zhihu.com/>]>

There are of course many other response attributes, which are not listed here.
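
Two details of the response object are easy to try offline. The sketch below uses `CaseInsensitiveDict`, the type requests uses for response headers, and the `requests.codes` lookup for status codes. The `headers` and `status_code` variables are stand-ins for r.headers and r.status_code, not values from a live request.

```python
import requests
from requests.structures import CaseInsensitiveDict

# Stand-in for r.headers, holding a single example header.
headers = CaseInsensitiveDict({'Content-Type': 'text/html; charset=utf-8'})
print(headers['content-type'])      # lookup ignores letter case
print(headers.get('CONTENT-TYPE'))

# requests.codes maps readable names to numeric status codes.
status_code = 200  # stand-in for r.status_code
print(status_code == requests.codes.ok)
print(requests.codes.not_found)
```

Comparing against `requests.codes.ok` rather than a bare number can make status checks easier to read.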

											Source: notes on the book 《Python3网络爬虫开发实战》