Python's requests library explained in detail: the seven methods, with scraping examples
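Contents
- Setup and testing
- The 7 main methods of the Requests library (preface)
- The seven methods in detail
- Summary
- Examples
- Reliable, stable code matters most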
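Setup and testing
Install the requests library
pip install requests
Start IDLE
Fetch the Baidu home page as a test and print it
The output shows that the Baidu home page has been fetched.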
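The test described above might look like the following minimal sketch. The helper name fetch_page and the 10-second timeout are illustrative assumptions, not part of the original session.

```python
import requests


def fetch_page(url, timeout=10):
    """Fetch url and return (status_code, text) of the response."""
    r = requests.get(url, timeout=timeout)
    return r.status_code, r.text


# Example call (needs network access):
#   status, text = fetch_page("http://www.baidu.com")
#   print(status)       # 200 means the request succeeded
#   print(text[:200])   # first part of the page source
```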
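The 7 main methods of the Requests library (preface)
requests.get(url)
In the source code of the requests library, the get method is a thin wrapper around the request method.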
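To make the wrapping concrete, here is a hand-written stand-in that behaves the way the library's get does. The name my_get is hypothetical; it is a sketch of the idea, not a copy of the library source.

```python
import requests


def my_get(url, params=None, **kwargs):
    """Stand-in for requests.get: delegate everything to requests.request."""
    # The HTTP method is fixed to GET; all other arguments pass straight through.
    return requests.request("GET", url, params=params, **kwargs)
```

Calling my_get("http://www.baidu.com/s", params={"wd": "Python"}) should therefore be equivalent to calling requests.get with the same arguments.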
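The two key objects in the Requests library: Request and Response
Attributes of the Response object
A practical example
r.apparent_encoding: the encoding inferred from the content of the page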
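The commonly used Response attributes can be inspected with a small helper like the one below. The function names are illustrative assumptions; the attributes themselves (status_code, encoding, apparent_encoding, text, content) are part of the Response object.

```python
import requests


def describe_response(r):
    """Collect the commonly used attributes of a Response into a dict."""
    return {
        "status_code": r.status_code,              # 200 means success
        "encoding": r.encoding,                    # guessed from the HTTP headers
        "apparent_encoding": r.apparent_encoding,  # guessed from the body itself
        "text_length": len(r.text),                # body decoded to str
        "content_length": len(r.content),          # raw body as bytes
    }


def text_with_detected_encoding(r):
    """Decode the body using the encoding inferred from its content."""
    r.encoding = r.apparent_encoding
    return r.text


# Example call (needs network access):
#   r = requests.get("http://www.baidu.com", timeout=10)
#   print(describe_response(r))
```

If r.text shows garbled characters, switching r.encoding to r.apparent_encoding before reading r.text is usually the first thing to try.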
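A general code framework for fetching web pages
Exceptions in the Requests library
The Response object provides a method for dealing with exceptions: r.raise_for_status(), which raises an HTTPError when the status code indicates an error (4xx or 5xx).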
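A general framework along these lines could be sketched as follows. The function name get_html_text, the 30-second timeout, and the wording of the error message are assumptions made for illustration.

```python
import requests


def get_html_text(url, timeout=30):
    """Return the page text, or an error message if anything goes wrong."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()              # raises requests.HTTPError on 4xx/5xx
        r.encoding = r.apparent_encoding  # decode with the content-based guess
        return r.text
    except requests.RequestException as exc:
        # RequestException is the base class of ConnectionError, HTTPError,
        # Timeout and the other exceptions raised by requests.
        return f"Request failed: {exc}"


# Example call (needs network access):
#   print(get_html_text("http://www.baidu.com")[:200])
```

Because every failure path ends in the except branch, the caller always gets a string back instead of a traceback.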
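The 7 main methods of the Requests library
Understanding the HTTP protocol
The user requests a URL and the server sends back a response.
HTTP operations on resources
Resources are managed through six operations (GET, HEAD, POST, PUT, PATCH, DELETE), each independent of the others.
Understanding the difference between PATCH and PUT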
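The difference can be modelled without any network traffic. The sketch below treats a resource as a plain dict; the functions put and patch and the sample user record are hypothetical and only mimic the semantics of the two HTTP methods.

```python
def put(resource, new_representation):
    """PUT: replace the whole resource; fields that are not sent are lost."""
    return dict(new_representation)


def patch(resource, changes):
    """PATCH: update only the fields that were sent; the rest are kept."""
    updated = dict(resource)
    updated.update(changes)
    return updated


user = {"name": "alice", "email": "a@example.com", "city": "Beijing"}

print(patch(user, {"name": "bob"}))
# {'name': 'bob', 'email': 'a@example.com', 'city': 'Beijing'}

print(put(user, {"name": "bob"}))
# {'name': 'bob'}
```

To change one field with PUT, the client has to send every field again; with PATCH it sends only the changed one, which saves bandwidth.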
The HTTP protocol and the Requests library
requests.post()
requests.put()
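How post and put encode the data they send can be checked offline by preparing a request without sending it. The URL below is a hypothetical placeholder and is never contacted.

```python
import requests

URL = "http://example.com/resource"  # hypothetical endpoint, never contacted

# A dict passed as data is form-encoded into the request body.
form = requests.Request("POST", URL, data={"key1": "value1"}).prepare()
print(form.body)                     # key1=value1
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded

# A string passed as data is sent as the body unchanged.
raw = requests.Request("PUT", URL, data="ABC").prepare()
print(raw.method, raw.body)          # PUT ABC
```

requests.post(URL, data=...) and requests.put(URL, data=...) build the same kind of request and then send it.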
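The seven methods in detail
requests.request(method, url, **kwargs)
method: the request method, one of seven
**kwargs: parameters that control the request, all optional
params
data
json
headers
cookies
files
timeout: the timeout, in seconds
proxies
allow_redirects, stream, verify, cert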
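A few of these parameters can be seen at work without touching the network by preparing requests instead of sending them. Apart from the Baidu search URL, the URLs and the helper fetch_with_limits are hypothetical.

```python
import json
import requests

# params: appended to the URL as a query string.
url_req = requests.Request("GET", "http://www.baidu.com/s",
                           params={"wd": "Python"}).prepare()
print(url_req.url)                       # http://www.baidu.com/s?wd=Python

# json: serialised into the body and labelled as JSON.
json_req = requests.Request("POST", "http://example.com/api",
                            json={"key1": "value1"}).prepare()
print(json.loads(json_req.body))         # {'key1': 'value1'}
print(json_req.headers["Content-Type"])  # application/json

# headers: custom HTTP headers, for example a browser-like User-Agent.
ua_req = requests.Request("GET", "http://example.com/",
                          headers={"User-Agent": "Mozilla/5.0"}).prepare()
print(ua_req.headers["User-Agent"])      # Mozilla/5.0


def fetch_with_limits(url):
    """timeout, allow_redirects and verify apply when the request is sent."""
    return requests.get(url, timeout=10, allow_redirects=True, verify=True)
```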
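Summary
requests.get(url, params=None, **kwargs)
The get method retrieves content from the server and can also send data to it through params.
requests.head(url, **kwargs)
requests.post(url, data=None, json=None, **kwargs)
requests.put(url, data=None, **kwargs)
requests.patch(url, data=None, **kwargs)
requests.delete(url, **kwargs)
These six methods often need particular access-control parameters, so the most commonly used ones appear explicitly in each function signature; the less common ones are left to the optional **kwargs.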
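The signatures listed above can be read straight from an installed copy of the library with the standard inspect module. This is only a quick way to check them, and the exact output may vary between versions of requests.

```python
import inspect
import requests

# Print each convenience function together with its declared parameters.
for func in (requests.get, requests.head, requests.post,
             requests.put, requests.patch, requests.delete):
    print(func.__name__, inspect.signature(func))
```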
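Summary
Examples
Scraping a JD page
Scraping an Amazon product page
Successful access: status code 200
Access returning status code 500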
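One common reason a site answers a script with an error status while a browser gets 200 is the default User-Agent header, which identifies the requests library. Whether that explains the 500 above is an assumption; the sketch below only shows how to send a browser-like header. The function name fetch_as_browser is hypothetical.

```python
import requests


def fetch_as_browser(url, timeout=10):
    """GET url while presenting a browser-like User-Agent header."""
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        r = requests.get(url, headers=headers, timeout=timeout)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as exc:
        return f"Request failed: {exc}"


# The header requests sends by default names the library and its version.
print(requests.utils.default_user_agent())
```

After a real request, r.request.headers shows which headers were actually sent.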
Submitting search keywords to Baidu/360
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import requests
>>> kv = {'wd':'Python'}
>>> r = request.get("http://www.baidu.com/s",params = kv)
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    r = request.get("http://www.baidu.com/s",params = kv)
NameError: name 'request' is not defined
>>> r = requests.get("http://www.baidu.com/s",params = kv)
>>> r.status_code
200
>>>
The request succeeded.
>>> r.requests.url
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    r.requests.url
AttributeError: 'Response' object has no attribute 'requests'
>>> r.request.url
'https://wappass.baidu.com/static/captcha/tuxing.html?&logid=8036147062640333629&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3DPython&signature=2fa70a21294522eebadb45a7c1695212&timestamp=1632972573'
>>> len(r.text)
1545
>>> r.text
'<!DOCTYPE html>\n<html lang="zh-CN">\n<head>\n <meta charset="utf-8">\n <title>ç\x99¾åº¦å®\x89å\x85¨éª\x8cè¯\x81</title>\n <meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n <meta name="apple-mobile-web-app-capable" content="yes">\n <meta name="apple-mobile-web-app-status-bar-style" content="black">\n <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">\n <meta name="format-detection" content="telephone=no, email=no">\n <link rel="shortcut icon" href="https://www.baidu.com/favicon.ico" type="image/x-icon">\n <link rel="icon" sizes="any" mask href="https://www.baidu.com/img/baidu.svg">\n <meta http-equiv="X-UA-Compatible" content="IE=Edge">\n <meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">\n <link rel="stylesheet" href="https://ppui-static-wap.cdn.bcebos.com/static/touch/css/api/mkdjump_0635445.css" />\n</head>\n<body>\n <div class="timeout hide">\n <div class="timeout-img"></div>\n <div class="timeout-title">ç½\x91ç»\x9cä¸\x8dç»\x99å\x8a\x9bï¼\x8c请ç¨\x8då\x90\x8eé\x87\x8dè¯\x95</div>\n <button type="button" class="timeout-button">è¿\x94å\x9b\x9eé¦\x96页</button>\n </div>\n <div class="timeout-feedback hide">\n <div class="timeout-feedback-icon"></div>\n <p class="timeout-feedback-title">é\x97®é¢\x98å\x8f\x8dé¦\x88</p>\n </div>\n\n<script src="https://wappass.baidu.com/static/machine/js/api/mkd.js"></script>\n<script src="https://ppui-static-wap.cdn.bcebos.com/static/touch/js/mkdjump_fbb9952.js"></script>\n</body>\n</html>'
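Two things stand out in this output. The final URL is on wappass.baidu.com, so the 200 status belongs to a captcha page rather than to search results. The Chinese text is also garbled, which is what happens when UTF-8 bytes are decoded as ISO-8859-1. The sketch below reverses that wrong decoding for the page title copied from the output above.

```python
# The garbled <title> exactly as it appears in r.text.
garbled = "ç\x99¾åº¦å®\x89å\x85¨éª\x8cè¯\x81"

# Undo the wrong ISO-8859-1 decoding, then decode the bytes as UTF-8.
title = garbled.encode("iso-8859-1").decode("utf-8")
print(title)  # 百度安全验证 ("Baidu security verification")
```

In a live session the same effect comes from setting r.encoding = r.apparent_encoding (or 'utf-8') before reading r.text.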
Fetching and saving an image from the web
Python 3.8.8 (default, Apr 13 2021, 15:08:03) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import requests
>>> path = "E:/tp/abc.jpg"
>>> url = "http://www.sinaimg.cn/dy/slidenews/1_img/2016_43/63957_743512_229766.jpg"
>>> r = requests.get(url)
>>> r.status_code
200
>>> with open(path,'wb') as f:
    f.write(r.content)
156992
>>>
>>> f.close()
>>>
Reliable, stable code matters most
The code should not fail no matter how it is run.
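Applied to the image download above, this principle could look like the following sketch. The function name save_image, the return messages, and the 30-second timeout are assumptions made for illustration.

```python
import os
import requests


def save_image(url, root):
    """Download url into directory root, keeping the file name from the URL."""
    path = os.path.join(root, url.split("/")[-1])
    try:
        os.makedirs(root, exist_ok=True)   # create the folder if it is missing
        if os.path.exists(path):
            return "File already exists"
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        with open(path, "wb") as f:        # the with block closes the file itself
            f.write(r.content)
        return "File saved"
    except (requests.RequestException, OSError) as exc:
        return f"Download failed: {exc}"


# Example call (needs network access):
#   save_image("http://www.sinaimg.cn/dy/slidenews/1_img/2016_43/63957_743512_229766.jpg", "E:/tp")
```

Because the with statement closes the file, no separate f.close() call is needed, and network or disk errors come back as a message instead of a traceback.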