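[Crawler 1] Two ways to install the third-party library requests, and how to use it: after reading this you will be able to write a basic crawler too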


白粥bz

Published 2022-01-20 22:35:43

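I. Installing requests

A. Install with the pip install requests command

1. In a cmd (Command Prompt) window, enter:

pip install requests

2. Or run the same command in PyCharm's Terminal tab.

B. If your connection is slow, you can use a mirror in China instead: the PyPI mirror (Tsinghua source)

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple requests

Note: run this from cmd or PyCharm in the same way; only the command differs.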
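
To confirm that the installation worked, one option is to ask Python whether it can locate the package. This is a minimal sketch using only the standard library; the helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install requests` has succeeded in this environment
print("requests installed:", is_installed("requests"))
```

If this prints False inside PyCharm but True in cmd, the two are probably using different Python interpreters.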

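II. Fetching content with requests' get

import requests   # import the module once requests is installed
url = 'https://www.sogou.com/web?query=周杰伦'

resp = requests.get(url)
print(resp)  # <Response [200]> means OK
print(resp.text) # page source
# The returned page contains a notice along these lines: this CAPTCHA is used to confirm that these requests are your normal activity and not sent by an automated program; your help with verification is needed
resp.close()  # always close the response once you have finished fetching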

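Sometimes the data cannot be fetched normally because the server detects that the request came from an automated program (an anti-crawling mechanism). In that case we need to imitate the device a real person would use by supplying a User-Agent, the header that identifies the client making the request. You can copy a User-Agent from the browser's developer tools (F12, or right-click and choose Inspect).

The modified code: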

import requests
url = 'https://www.sogou.com/web?query=周杰伦'
# Put the User-Agent in the request headers; note that this is a dictionary
headers_dic = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
resp = requests.get(url,headers=headers_dic)
print(resp)  
print(resp.text) 
resp.close()

If you want the search term to be a variable:

query = input('Enter what you want to search for: ')
url = f'https://www.sogou.com/web?query={query}'
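
Dropping raw input straight into the URL works for simple words, but characters such as & or # would change the meaning of the URL. Here is a sketch of percent-encoding the query first with urllib.parse.urlencode from the standard library; build_search_url is a hypothetical helper, not part of requests. Passing params={'query': query} to requests.get would encode the value for you as well.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sogou.com/web"


def build_search_url(query):
    """Build the search URL with the query value percent-encoded."""
    return f"{BASE_URL}?{urlencode({'query': query})}"


print(build_search_url("周杰伦"))
print(build_search_url("rock & roll"))
```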

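III. Fetching data with requests' post (the dictionary key is kw)

# Use Baidu Translate to translate an English word
import requests
query = input('Enter the English word you want to translate: ')
url = 'https://fanyi.baidu.com/sug'
data = {
    "kw":query
}
resp = requests.post(url,data=data)    # post sends its variables via data=; get sends them via params=
# print(resp.text)
print(resp.json())
resp.close()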

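Note: the browser's developer tools can be used to inspect this request as well.

Tip: GET and POST carry their variables in different places, so keep them apart: GET puts them in the URL query string (params), while POST puts them in the request body (data).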
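
The difference can be seen without sending anything over the network. The sketch below swaps in the standard library's urllib.request.Request purely to build (not send) one request of each kind; the word "dog" is a made-up sample input.

```python
from urllib.parse import urlencode
from urllib.request import Request

# GET: the variables are appended to the URL as a query string
get_req = Request("https://www.sogou.com/web?" + urlencode({"query": "dog"}))

# POST: the variables are encoded into the request body instead
post_req = Request(
    "https://fanyi.baidu.com/sug",
    data=urlencode({"kw": "dog"}).encode("utf-8"),
)

# Nothing is sent here; we only look at how each request is put together
print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET request has no body and the POST URL has no query string, which is the distinction the tip describes.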

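IV. Fetching the Douban comedy chart

1. Server-side rendering: the server merges the data into the HTML and returns the whole thing to the browser.
             The data is visible in the page source.
2. Client-side rendering: the first request returns only an HTML skeleton; a second request fetches the data, which is then displayed.
            The data is not visible in the page source.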
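
A quick way to tell the two cases apart is to search the raw page source for a value you can see in the browser. This is a rough sketch; the two HTML strings and the helper name data_in_page_source are made up for illustration.

```python
def data_in_page_source(page_source, keyword):
    """Return True if the keyword appears in the raw HTML the server returned."""
    return keyword in page_source


# Made-up page sources for illustration
server_rendered = "<html><body><ul><li>Movie A</li><li>Movie B</li></ul></body></html>"
client_rendered = "<html><body><div id='app'></div><script src='app.js'></script></body></html>"

print(data_in_page_source(server_rendered, "Movie A"))
print(data_in_page_source(client_rendered, "Movie A"))
```

When the check fails for a page whose data is clearly visible in the browser, the data most likely arrives through a separate request.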

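So, to get the comedy chart data, open the XHR tab in the browser's network panel and find the matching request. It belongs to this page:

豆瓣电影分类排行榜 - 喜剧片 (Douban Movie Category Chart - Comedy)

Its URL carries several query parameters (type, interval_id, action, start, limit), so rebuild it with the parameters passed separately: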

import requests
# https://movie.douban.com/j/chart/top_list?type=24&interval_id=100%3A90&action=&start=0&limit=20

url = 'https://movie.douban.com/j/chart/top_list'
# Repackage the query parameters as a dictionary
param = {
    'type': '24',
    'interval_id': '100:90',
    'action': '',
    'start': 0,
    'limit': 20
}

resp = requests.get(url=url,params=param)
# The rebuilt address is identical to the original one
print(resp.request.url) #https://movie.douban.com/j/chart/top_list?type=24&interval_id=100%3A90&action=&start=0&limit=20
print(resp.text) # blocked by the anti-crawling mechanism, so no data comes back
print(resp.request.headers)  # inspect the request headers that were sent
resp.close()

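Running this returns no data because the site has an anti-crawling mechanism. The User-Agent in the printed request headers shows that the request identifies itself as coming from Python (requests sends python-requests/<version> by default):

The final code after the fix: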

import requests

url = 'https://movie.douban.com/j/chart/top_list'
# Repackage the query parameters as a dictionary
param = {
    'type': '24',
    'interval_id': '100:90',
    'action': '',
    'start': 0,
    'limit': 20
}
# Put the User-Agent in the request headers; note that this is a dictionary
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
resp = requests.get(url=url,params=param,headers=headers)
print(resp.json()) # now the data we are looking for comes back
resp.close()

Extension: the code above returns 20 records. To fetch more, change start to 0, 20, 40, 60, and so on.
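
Stepping start through 0, 20, 40, ... can be written as a loop. In this sketch the network call is passed in as a callable so the paging logic can be tried offline. page_params, fetch_many and fake_fetch are hypothetical names, and fake_fetch stands in for a chart that is assumed to have 50 entries.

```python
def page_params(start, limit=20):
    """Query parameters for one page of the comedy chart."""
    return {'type': '24', 'interval_id': '100:90', 'action': '',
            'start': start, 'limit': limit}


def fetch_many(fetch_page, pages, limit=20):
    """Collect up to `pages` pages of records.

    fetch_page is any callable that takes the params dict and
    returns a list of records for that page.
    """
    results = []
    for start in range(0, pages * limit, limit):
        batch = fetch_page(page_params(start, limit))
        if not batch:  # an empty page means there is nothing more to fetch
            break
        results.extend(batch)
    return results


def fake_fetch(params):
    """Offline stand-in that pretends the chart has 50 entries."""
    data = list(range(50))
    return data[params['start']:params['start'] + params['limit']]


print(len(fetch_many(fake_fetch, pages=2)))
```

With requests, fetch_page could be something like lambda p: requests.get(url, params=p, headers=headers).json(), reusing the url and headers from the code above.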

V. Fetching Honor of Kings skins


# Download the hero skin images for Honor of Kings (王者荣耀)
import os
import requests

url = "http://pvp.qq.com/web201605/js/herolist.json"
# Request headers
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
save_dir = 'D:/Mypic/'
os.makedirs(save_dir, exist_ok=True)  # make sure the output folder exists before writing

response = requests.get(url, headers=headers)
json_list = response.json()
response.close()
# Base address of the skin images, for example:
# https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/110/110-bigskin-1.jpg
base_url = 'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'
# Extract each hero's number, name and skins
for hero in json_list:
    hero_num = hero["ename"]    # ename: hero number
    hero_name = hero["cname"]   # cname: hero name
    # skin_name: the skin names, separated by "|"
    skin_names = hero["skin_name"].split("|")
    skin_count = len(skin_names)
    print(f'Hero: {hero_name}, number of skins: {skin_count}')
    # Save the images; the skin images are numbered from 1
    for i, skin in enumerate(skin_names, start=1):
        url_pic = f'{base_url}{hero_num}/{hero_num}-bigskin-{i}.jpg'
        pic = requests.get(url_pic).content  # raw bytes
        with open(f'{save_dir}{hero_name}-{skin}.jpg', 'wb') as f:
            f.write(pic)
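
The URL-building step can be pulled out into a small function so that it can be checked without downloading anything. This is only a sketch: skin_image_urls is a hypothetical helper, and the skin names "Skin A|Skin B" are made-up sample data. The hero number 110 comes from the example URL in the comment above.

```python
BASE_URL = 'https://game.gtimg.cn/images/yxzj/img201606/skin/hero-info/'


def skin_image_urls(hero_num, skin_names):
    """Pair every skin name with its image URL; image numbers start at 1.

    skin_names is the "|"-separated string found in herolist.json.
    """
    return [
        (name, f'{BASE_URL}{hero_num}/{hero_num}-bigskin-{i}.jpg')
        for i, name in enumerate(skin_names.split('|'), start=1)
    ]


for name, link in skin_image_urls(110, 'Skin A|Skin B'):
    print(name, link)
```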