Python Web Scraping: Installing and Using the JsonPath Parsing Library
1. Installing JsonPath
JsonPath is used to extract data from JSON.
Open CMD, change to the Scripts directory of your Python installation, and run:
pip install jsonpath
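To confirm that the package landed in the interpreter you actually run, a small check like the one below can help. This is only a sketch: `is_installed` is a hypothetical helper name, not part of jsonpath. Running `python -m pip install jsonpath` is an alternative that avoids changing to the Scripts directory.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once `pip install jsonpath` has succeeded for this interpreter.
print("jsonpath installed:", is_installed("jsonpath"))
```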
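2. How JsonPath differs from XPath
XPath queries XML/HTML documents, whereas JsonPath queries JSON data. The examples in this article load the JSON from a local file with json.load(), but jsonpath.jsonpath() operates on an already parsed Python object, so a server response parsed with json.loads() can be queried the same way. Reference article: http://blog.csdn.net/luxideyao/article/details/77802389
Write the JSON file (saved as 4.json, the name the Python code loads):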
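As a quick illustration of where the data can come from, the sketch below parses the same JSON text twice: once directly from a string and once via a local file. The `response_text` value is a made-up stand-in for a decoded server response. Both routes produce the same nested dict, and that dict is what gets passed to jsonpath.jsonpath().

```python
import json
import os
import tempfile

# Hypothetical response body, standing in for response.read().decode('utf-8').
response_text = '{"store": {"bicycle": {"color": "黑色", "price": 19.95}}}'

# Route 1: parse the text directly, no file involved.
obj_from_text = json.loads(response_text)

# Route 2: write the text to a local file first, then parse the file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(response_text)
    with open(path, "r", encoding="utf-8") as fp:
        obj_from_file = json.load(fp)

# Both routes yield the same Python object.
print(obj_from_text == obj_from_file)
```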
{ "store": {
    "book": [
      { "category": "修真",
        "author": "六道",
        "title": "坏蛋是怎样练成的",
        "price": 8.95
      },
      { "category": "修改",
        "author": "天蚕土豆",
        "title": "斗破苍穹",
        "price": 12.99
      },
      { "category": "修真",
        "author": "唐家三少",
        "title": "斗罗大陆",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "修真",
        "author": "南派三叔",
        "title": "星辰变",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "author": "老马",
      "color": "黑色",
      "price": 19.95
    }
  }
}
Write the Python code
import json
import jsonpath

# Load the JSON file into a Python object
with open('4.json', 'r', encoding='utf-8') as fp:
    obj = json.load(fp)

# Authors of all books in the store
author_list = jsonpath.jsonpath(obj, '$.store.book[*].author')
# All authors, at any depth
author_list = jsonpath.jsonpath(obj, '$..author')
# All elements directly under store
tag_list = jsonpath.jsonpath(obj, '$.store.*')
# The price of everything inside store
price_list = jsonpath.jsonpath(obj, '$.store..price')
# The third book
book = jsonpath.jsonpath(obj, '$..book[2]')
# The last book
book = jsonpath.jsonpath(obj, '$..book[(@.length-1)]')
# The first two books
# book_list = jsonpath.jsonpath(obj, '$..book[0,1]')
book_list = jsonpath.jsonpath(obj, '$..book[:2]')
# A filter condition needs a question mark in front of the parentheses
# All books that have an isbn
book_list = jsonpath.jsonpath(obj, '$..book[?(@.isbn)]')
# Books that cost more than 10
book_list = jsonpath.jsonpath(obj, '$..book[?(@.price>10)]')
print(book_list)
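To make the query syntax less opaque, here is a rough pure-Python sketch of what several of the expressions above ask for. It is not how the jsonpath library is implemented. `find_all` is a hypothetical helper written for this illustration, the data is a trimmed copy of the store JSON, and the order of results may differ from what the library returns.

```python
import json

# Trimmed copy of the store data used above.
obj = json.loads("""
{"store": {
  "book": [
    {"author": "六道", "title": "坏蛋是怎样练成的", "price": 8.95},
    {"author": "天蚕土豆", "title": "斗破苍穹", "price": 12.99},
    {"author": "唐家三少", "title": "斗罗大陆", "isbn": "0-553-21311-3", "price": 8.99},
    {"author": "南派三叔", "title": "星辰变", "isbn": "0-395-19395-8", "price": 22.99}
  ],
  "bicycle": {"author": "老马", "color": "黑色", "price": 19.95}
}}
""")


def find_all(node, key):
    """Hypothetical helper: collect every value stored under `key` at any depth,
    roughly what the recursive-descent query '$..key' asks for."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


books = obj["store"]["book"]

all_authors = find_all(obj, "author")             # like $..author
first_two = books[:2]                             # like $..book[:2]
last_book = books[-1]                             # like $..book[(@.length-1)]
with_isbn = [b for b in books if "isbn" in b]     # like $..book[?(@.isbn)]
over_ten = [b for b in books if b["price"] > 10]  # like $..book[?(@.price>10)]

print(all_authors)
print([b["title"] for b in over_ten])
```

Seen this way, `..` is a recursive search through nested dicts and lists, and `?( )` is a per-element filter over a list.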
3. Example: Taopiaopiao (淘票票)
Goal: fetch the city names from the Taopiaopiao website.
The code is as follows:
import urllib.request
import json
import jsonpath
url = 'https://dianying.taobao.com/cityAction.json?activityId&_ksTS=1660459746946_108&jsoncallback=jsonp109&action=cityAction&n_s=new&event_submit_doGetAllRegion=true'
headers = {
# ':authority': 'dianying.taobao.com',
# ':method': 'GET',
# ':path': '/cityAction.json?activityId&_ksTS=1660459746946_108&jsoncallback=jsonp109&action=cityAction&n_s=new&event_submit_doGetAllRegion=true',
# ':scheme': 'https',
'accept': 'text/javascript, application/javascript, application/ecmascript, application/x-ecmascript, */*; q=0.01',
# 'accept-encoding': 'gzip, deflate, br',
'accept-language': 'zh-CN,zh;q=0.9',
'bx-v': '2.2.2',
'cookie': 't=e1b36d2553572e46a78d2f77ae0ba17a; cookie2=1a48588cb63edadedfe7415e8ba197de; v=0; _tb_token_=7b03734ef7eb9; cna=Q1x9G4NYUxcCAbe6vxxlbI/T; xlly_s=1; tb_city=110100; tb_cityName="sbG+qQ=="; tfstk=cDcNBVGd8CdNMFDrLWN2lBSqdcwOZDr0V6zzI3sYulzWbSeGijsY-_QU8r7FmRf..; l=eBag1-XmLvObyoEWBO5Zourza77tNIRb4sPzaNbMiInca6ddtFTqRNCHWorHSdtjgtCfWetzqZSAbdLHR3AgCc0c07kqm05o3xvO.; isg=BDAwbCdMJst8r_rPQMuk2uyyAf6CeRTDLIjnTiqBWgte5dCP0okoU87bPe2F9cyb',
'referer': 'https://dianying.taobao.com/?spm=a1z21.3046609.city.1.32c0112aTZ8oQq&city=110100',
'sec-ch-ua': '".Not/A)Brand";v="99", "Google Chrome";v="103", "Chromium";v="103"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'sec-fetch-dest': 'empty',
'sec-fetch-mode': 'cors',
'sec-fetch-site': 'same-origin',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
'x-requested-with': 'XMLHttpRequest',
}
request = urllib.request.Request(url=url,headers=headers)
response = urllib.request.urlopen(request)
content = response.read().decode('utf-8')
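# Strip the JSONP wrapper jsonp109(...) so that only the JSON text remains
content = content.split('(')[1].split(')')[0]
# Save a local copy. This step is optional: jsonpath queries the parsed object, so json.loads(content) would work as well
with open('淘票票.json', 'w', encoding='utf-8') as fp:
    fp.write(content)
# Load the saved file back into a Python object
with open('淘票票.json', 'r', encoding='utf-8') as fp:
    obj = json.load(fp)
city_list = jsonpath.jsonpath(obj, '$..regionName')
print(city_list)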
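The split-based unwrapping above works as long as the JSON itself contains no parentheses. A slightly more defensive sketch is shown below, under stated assumptions: `strip_jsonp` is a hypothetical helper, and `sample` is a made-up body in the same callback(...) shape, not the real Taopiaopiao payload. It keeps everything between the first `(` and the last `)` and parses the text directly with json.loads(), without going through a file.

```python
import json


def strip_jsonp(text: str) -> str:
    """Hypothetical helper: return the text between the first '(' and the last ')'."""
    start = text.index("(") + 1
    end = text.rindex(")")
    return text[start:end]


# Made-up response body; only the regionName field name comes from the example above.
sample = 'jsonp109({"cities":[{"regionName":"示例 (市)"}]});'

obj = json.loads(strip_jsonp(sample))
print(obj["cities"][0]["regionName"])
```

With this sample, `sample.split('(')[1].split(')')[0]` would cut the text off at the parenthesis inside the city name and leave invalid JSON, which is the case the first/last approach is meant to handle.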