The requests Library Explained
The requests Library
Chinese documentation for requests: https://docs.python-requests.org/zh_CN/latest/user/quickstart.html
Video link: https://www.bilibili.com/video/BV1Us41177P1?p=2
A detailed look at urlopen: https://blog.csdn.net/qq_41845823/article/details/119465293
Comparison of requests.get and urlopen (in plain language): https://blog.csdn.net/qq_41845823/article/details/119517519
Introductory Example
import requests
response = requests.get("https://www.baidu.com/")
print(type(response))
print(type(response.text))
print(response.cookies)
# Output:
#<class 'requests.models.Response'>
#<class 'str'>
#<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
Various Request Methods
import requests
requests.post("http://httpbin.org/post")
requests.put("http://httpbin.org/put")
requests.delete("http://httpbin.org/delete")
requests.head("http://httpbin.org/get")
requests.options("http://httpbin.org/get")
Requests
Basic GET request
Basic usage
import requests
response = requests.get("http://httpbin.org/get")
# .text returns the response body as a string
print(response.text)
GET request with parameters
import requests
# put the parameters in the URL itself; with GET, parameters are carried in the URL
response = requests.get("http://httpbin.org/get?name=germey&age=22")
# .text returns the response body as a string
print(response.text)
import requests
data = {
    'name': 'germey',
    'age': 22
}
response = requests.get("http://httpbin.org/get",params=data)
# .text returns the response body as a string
print(response.text)
Parsing JSON
import requests
response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(type(response.json()))
Output:
<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.25.1', 'X-Amzn-Trace-Id': 'Root=1-610f79c8-788ecf637b57eaa258660c0b'}, 'origin': '36.152.116.137', 'url': 'http://httpbin.org/get'}
<class 'dict'>
- This result is the same as what json.loads() produces
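To see that equivalence for yourself, a minimal sketch (it hits the same httpbin.org endpoint as above, so it needs network access):

```python
import json
import requests

response = requests.get("http://httpbin.org/get")
# parsing the raw text with the json module yields the same dict
# that response.json() returns
assert response.json() == json.loads(response.text)
print(response.json()['url'])
```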
Fetching binary data
When you need to fetch images, video, audio, or other files, the content is a binary stream. In that case use response.content instead of response.text.
import requests
response = requests.get("https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fattach.bbs.miui.com%2Fforum%2F201205%2F03%2F01400598djmyeczcskh2yr.jpg&refer=http%3A%2F%2Fattach.bbs.miui.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1630996425&t=5f11732e42d109ede6212fc50826a8ce")
print(type(response.text), type(response.content))
print(response.text)
print(response.content)
Output (abridged):
(a stretch of garbled characters -- this is response.text, the image bytes wrongly decoded as text)
...\xba\x8bW\xad\xf9]\xa2\xfec\xc8|\x94l\xf5s\xd9)c\xa8o\xc5B\xc5\x9bR\xbb\x19\x12\x1c7\x96\x82\xb88~j7\xba\xdb... (a fragment of response.content, the raw binary data)
import requests
response = requests.get("https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fattach.bbs.miui.com%2Fforum%2F201205%2F03%2F01400598djmyeczcskh2yr.jpg&refer=http%3A%2F%2Fattach.bbs.miui.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1630996425&t=5f11732e42d109ede6212fc50826a8ce")
print(response.content)
with open("test.jpg", "wb") as f:
    f.write(response.content)
Adding Headers
Sometimes a request without headers gets an error back from the server, or your client gets blocked outright. Zhihu, for example:
import requests
response = requests.get('https://www.zhihu.com/')
print(response.text)
Output:
403 Forbidden 403 Forbidden
openresty
Just add a headers argument to the request:
import requests
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
response = requests.get('https://www.zhihu.com/', headers=headers)
print(response.text)
Basic POST request
- In the requests library, a POST request only requires wrapping the data or headers in a dict and passing them as arguments.
- In urllib.request.urlopen, by contrast, data must first be encoded with bytes() before it can be passed as a parameter.
POST is used exactly like GET; just change the method name to requests.post.
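A minimal sketch of a POST request (httpbin.org echoes back what it receives; the form fields reuse the sample data from the GET example above):

```python
import requests

# form data is passed as a plain dict; requests handles the encoding
data = {'name': 'germey', 'age': 22}
response = requests.post("http://httpbin.org/post", data=data)
# httpbin echoes the submitted fields back under the "form" key
print(response.json()['form'])
```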
Response
response attributes
import requests
headers = {
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
response = requests.get('http://www.jianshu.com', headers=headers)
# status code
print(type(response.status_code), response.status_code)
# headers
print(type(response.headers), response.headers)
# cookies
print(type(response.cookies), response.cookies)
print(type(response.url), response.url)
print(type(response.history), response.history)
Output:
<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'Server': 'Tengine', 'Date': 'Sun, 08 Aug 2021 07:02:38 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Vary': 'Accept-Encoding', 'X-Frame-Options': 'SAMEORIGIN', 'X-XSS-Protection': '1; mode=block', 'X-Content-Type-Options': 'nosniff', 'ETag': 'W/"5cfbf685a4b630685cdeda963311bba5"', 'Cache-Control': 'max-age=0, private, must-revalidate', 'Set-Cookie': 'locale=zh-CN; path=/', 'X-Request-Id': '15ea5165-4d3b-4c80-b62a-f01a04452d6e', 'X-Runtime': '0.005068', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[
Checking the status code
import requests
response= requests.get("http://www.baidu.com")
exit() if not response.status_code == requests.codes.ok else print("Request Successfully")
import requests
response= requests.get("http://www.baidu.com")
exit() if not response.status_code == 200 else print("Request Successfully")
requests ships with a built-in status-code lookup object, requests.codes:
# Informational.
100: ('continue',), 101: ('switching_protocols',), 102: ('processing',), 103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),
# Success.
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\o/', '✓'), 201: ('created',), 202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'), 204: ('no_content',),
205: ('reset_content', 'reset'), 206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'), 208: ('already_reported',),
226: ('im_used',),
# Redirection.
300: ('multiple_choices',), 301: ('moved_permanently', 'moved', '\o-'), 302: ('found',),
303: ('see_other', 'other'), 304: ('not_modified',), 305: ('use_proxy',), 306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect', 'resume_incomplete', 'resume',),  # These 2 to be removed in 3.0
# Client Error.
400: ('bad_request', 'bad'), 401: ('unauthorized',), 402: ('payment_required', 'payment'),
403: ('forbidden',), 404: ('not_found', '-o-'), 405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',), 407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'), 409: ('conflict',), 410: ('gone',), 411: ('length_required',),
412: ('precondition_failed', 'precondition'), 413: ('request_entity_too_large',),
414: ('request_uri_too_large',), 415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',), 418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',), 422: ('unprocessable_entity', 'unprocessable'), 423: ('locked',),
424: ('failed_dependency', 'dependency'), 425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'), 428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'), 431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'), 449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'), 499: ('client_closed_request',),
# Server Error.
500: ('internal_server_error', 'server_error', '/o\', '✗'), 501: ('not_implemented',),
502: ('bad_gateway',), 503: ('service_unavailable', 'unavailable'), 504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'), 506: ('variant_also_negotiates',),
507: ('insufficient_storage',), 509: ('bandwidth_limit_exceeded', 'bandwidth'), 510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),
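Each of these names can be compared against status_code directly; a small sketch (no network needed):

```python
import requests

# each alias resolves to its numeric status code
assert requests.codes.ok == 200
assert requests.codes.not_found == 404
assert requests.codes.teapot == 418
print(requests.codes.ok)  # 200
```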
Advanced Operations
Uploading files
import requests
files = {'file': open('test.jpg', 'rb')}
response = requests.post('http://httpbin.org/post', files=files)
print(response.text)
Getting cookies
import requests
response = requests.get("https://www.baidu.com")
print(response.cookies)
for key, value in response.cookies.items():
    print(key + '=' + value)
One use of cookies is simulating a login, i.e. session persistence:
import requests
requests.get("http://httpbin.org/cookies/set/number/123456")
response = requests.get("http://httpbin.org/cookies")
print(response.text)
- Written this way there is a problem: the two requests.get calls are completely independent requests (as if issued from two separate browsers), so the cookie set by the first call is not sent with the second, and the second response shows no cookies. It should instead be written with a Session:
import requests
s = requests.Session()
s.get("http://httpbin.org/cookies/set/number/123456")
response = s.get("http://httpbin.org/cookies")
print(response.text)
Certificate verification
Many sites today are accessed over https, which raises the question of certificate verification.
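For a site whose certificate is invalid or self-signed, requests raises an SSLError by default; passing verify=False skips the check. A sketch (the 12306 URL is just the example commonly used in tutorials of this era, and it requires network access):

```python
import requests
import urllib3

# suppress the InsecureRequestWarning that verify=False triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# verify=False skips certificate verification entirely
response = requests.get('https://www.12306.cn', verify=False)
print(response.status_code)
```

Alternatively, verify can point at a local CA bundle (verify='/path/to/ca.pem'), and a client certificate can be supplied with cert=('cert.pem', 'key.pem').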
Proxy settings
When crawling a site, you sometimes need to hit it many times in a row. The site may track your request count and ban your IP once it detects abnormal traffic. In that case, access the site through a proxy and keep switching proxies while the crawler runs.
Timeout settings
import requests
response = requests.get('http://www.12306.cn/', timeout=0.0001)
print(response.status_code)
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get("http://www.taobao.com/", timeout=0.001)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
Authentication
Some sites require a username and password; requests.get provides the auth parameter for passing them.
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://120.27.34.24:9001', auth=HTTPBasicAuth('user', '123'))
print(r.status_code)
You can also skip the HTTPBasicAuth class and pass a plain tuple directly:
import requests
r = requests.get('http://120.27.34.24:9001', auth=('user', '123'))
print(r.status_code)
Exception handling
import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException
try:
    # note: the host here ("ttpbin.org") is misspelled, so this raises a
    # ConnectionError, which falls through to the RequestException branch
    response = requests.get("http://ttpbin.org/get", timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except HTTPError:
    print('Http error')
except RequestException:
    print('Error')