Processing taxi GPS data with the TransBigData package

The sample datasets used in this example are in the GitHub repository: https://github.com/ni1o1/transbigdata/tree/main/docs/source/gallery

Below we show how to use the TransBigData package and call its functions to process taxi GPS data quickly.

First we import TransBigData and read the data:

import transbigdata as tbd
import pandas as pd
import geopandas as gpd
# Read the data (the CSV file has no header row)
data = pd.read_csv('TaxiData-Sample.csv', header=None)
data.columns = ['VehicleNum','Time','Lng','Lat','OpenStatus','Speed']
data
        VehicleNum      Time         Lng        Lat  OpenStatus  Speed
0            34745  20:27:43  113.806847  22.623249           1     27
1            34745  20:24:07  113.809898  22.627399           0      0
2            34745  20:24:27  113.809898  22.627399           0      0
3            34745  20:22:07  113.811348  22.628067           0      0
4            34745  20:10:06  113.819885  22.647800           0     54
...            ...       ...         ...        ...         ...    ...
544994       28265  21:35:13  114.321503  22.709499           0     18
544995       28265  09:08:02  114.322701  22.681700           0      0
544996       28265  09:14:31  114.336700  22.690100           0      0
544997       28265  21:19:12  114.352600  22.728399           0      0
544998       28265  19:08:06  114.137703  22.621700           0      0

544999 rows × 6 columns

# Read the study-area polygons (geopandas was imported above as gpd)
sz = gpd.read_file(r'sz/sz.shp')
sz.crs = None
sz.plot()

[Figure: ../_images/output_3_1.png]

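Data preprocessing

TransBigData also bundles common preprocessing methods. tbd.clean_outofshape takes the data and the study-area geometry and removes the records that fall outside the study area. tbd.clean_taxi_status removes the records in which the passenger-carrying status changes only momentarily. When calling these methods, pass in the corresponding column names: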

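# Data preprocessing
# Remove records outside the study area
data = tbd.clean_outofshape(data, sz, col=['Lng', 'Lat'], accuracy=500)
# Remove taxi records whose passenger-carrying status changes only momentarily
data = tbd.clean_taxi_status(data, col=['VehicleNum', 'Time', 'OpenStatus'])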
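
To make "momentary status change" concrete, here is a minimal plain-Python sketch of one possible rule: a record is dropped when its status differs from both its previous and its next record of the same vehicle, and those two neighbours agree with each other. The function name drop_status_flickers and the sample records are hypothetical, and the rule is an assumption made for illustration rather than a description of what tbd.clean_taxi_status does internally.

```python
from itertools import groupby
from operator import itemgetter


def drop_status_flickers(records):
    """Drop one-record status blips such as 0, 1, 0 within a vehicle.

    records: list of (vehicle, time, status) tuples. Times are zero-padded
    HH:MM:SS strings, so sorting them as strings sorts them by time.
    """
    ordered = sorted(records, key=itemgetter(0, 1))
    cleaned = []
    for _, group in groupby(ordered, key=itemgetter(0)):
        rows = list(group)
        for i, row in enumerate(rows):
            has_both_neighbours = 0 < i < len(rows) - 1
            if (has_both_neighbours
                    and rows[i - 1][2] == rows[i + 1][2]
                    and row[2] != rows[i - 1][2]):
                continue  # status differs from two agreeing neighbours
            cleaned.append(row)
    return cleaned


# Hypothetical records for a single vehicle; the third one is a blip.
sample = [
    (1, "00:00:00", 0),
    (1, "00:00:10", 0),
    (1, "00:00:20", 1),
    (1, "00:00:30", 0),
    (1, "00:00:40", 0),
    (1, "00:00:50", 1),
    (1, "00:01:00", 1),
]
cleaned = drop_status_flickers(sample)
print(len(sample), "->", len(cleaned))
```

Neighbours are always taken from the original sequence, so removing one blip does not change how the records around it are judged.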

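Data gridding

Representing the data distribution on a grid is the most basic approach. After GPS data is gridded, every data point carries the information of the grid cell it falls in, and the distribution expressed on the grid stays close to the real one. To grid data with TransBigData, first determine the gridding parameters (which can be thought of as defining a grid coordinate system); with these parameters the gridding itself is quick: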

# Gridding
# Define the bounds and obtain the gridding parameters
bounds = [113.6, 22.4, 114.8, 22.9]
params = tbd.grid_params(bounds, accuracy=500)
params

(113.6, 22.4, 0.004872390756896538, 0.004496605206422906)
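
The four values can be read as the grid origin (longitude, latitude) followed by the cell width and height in degrees. As a rough sketch of where such numbers can come from, the function below converts a cell size in metres into degrees, assuming a spherical Earth with a radius of 6371004 m and using the middle latitude of the bounds to scale longitude. The function name grid_params_sketch, the radius, and the formula are assumptions made for illustration; the library's own computation may differ.

```python
import math

EARTH_RADIUS_M = 6371004  # assumed mean Earth radius in metres


def grid_params_sketch(bounds, accuracy=500):
    """Return (lon_start, lat_start, delta_lon, delta_lat) for square cells.

    bounds: [lon1, lat1, lon2, lat2]; accuracy: cell size in metres.
    """
    lon1, lat1, lon2, lat2 = bounds
    lon_start, lat_start = min(lon1, lon2), min(lat1, lat2)
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    # A degree of longitude shrinks with the cosine of the latitude.
    mid_lat = math.radians((lat1 + lat2) / 2)
    delta_lat = accuracy / metres_per_degree
    delta_lon = accuracy / (metres_per_degree * math.cos(mid_lat))
    return lon_start, lat_start, delta_lon, delta_lat


params_sketch = grid_params_sketch([113.6, 22.4, 114.8, 22.9], accuracy=500)
print(params_sketch)
```

Under these assumptions the result agrees with the tuple printed above to well within one part in a million, which suggests a 500 m cell spans roughly 0.0049 degrees of longitude and 0.0045 degrees of latitude around Shenzhen.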

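With the gridding parameters in hand, map each GPS point to a grid cell. The LONCOL and LATCOL columns together identify one cell:

# Map GPS points to grid cells
data['LONCOL'], data['LATCOL'] = tbd.GPS_to_grids(data['Lng'], data['Lat'], params)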
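
The mapping from a coordinate to a cell index is simple arithmetic. The sketch below assumes that cell (0, 0) is centred on the grid origin, so each coordinate is shifted by half a cell before dividing by the cell size. The function name gps_to_grid_sketch and the half-cell convention are assumptions made for illustration; they happen to reproduce the LONCOL and LATCOL values shown for the trajectory points later in this example.

```python
import math


def gps_to_grid_sketch(lon, lat, params):
    """Return the (LONCOL, LATCOL) index of the cell containing a point."""
    lon_start, lat_start, delta_lon, delta_lat = params
    # Shift by half a cell so that cell (0, 0) is centred on the origin.
    lon_col = math.floor((lon - (lon_start - delta_lon / 2)) / delta_lon)
    lat_col = math.floor((lat - (lat_start - delta_lat / 2)) / delta_lat)
    return lon_col, lat_col


# Gridding parameters as printed earlier in this example.
params = (113.6, 22.4, 0.004872390756896538, 0.004496605206422906)
print(gps_to_grid_sketch(114.013016, 22.664818, params))  # (85, 59)
```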

Count the number of records in each grid cell:

# Aggregate the record count per grid cell
datatest = data.groupby(['LONCOL', 'LATCOL'])['VehicleNum'].count().reset_index()

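Generate the geometry of each grid cell and convert the result to a GeoDataFrame:

# Generate the grid cell polygons
datatest['geometry'] = tbd.gridid_to_polygon(datatest['LONCOL'], datatest['LATCOL'], params)
# Convert to a GeoDataFrame (geopandas was imported above as gpd)
datatest = gpd.GeoDataFrame(datatest)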
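
For intuition, the rectangle behind a cell index can be computed by hand. The sketch below returns the four corners of a cell, again assuming that cell (0, 0) is centred on the grid origin. The function name grid_cell_corners is hypothetical, and the result is a plain list of coordinate pairs rather than the geometry objects the library produces.

```python
def grid_cell_corners(lon_col, lat_col, params):
    """Return the corners of a cell as (lon, lat) pairs, counter-clockwise."""
    lon_start, lat_start, delta_lon, delta_lat = params
    # Cell centre, assuming cell (0, 0) is centred on the grid origin.
    center_lon = lon_start + lon_col * delta_lon
    center_lat = lat_start + lat_col * delta_lat
    west, east = center_lon - delta_lon / 2, center_lon + delta_lon / 2
    south, north = center_lat - delta_lat / 2, center_lat + delta_lat / 2
    return [(west, south), (east, south), (east, north), (west, north)]


# Gridding parameters as printed earlier in this example.
params = (113.6, 22.4, 0.004872390756896538, 0.004496605206422906)
corners = grid_cell_corners(85, 59, params)
for lon, lat in corners:
    print(round(lon, 6), round(lat, 6))
```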

Plot the grid to check that the gridding worked:

# Plot
datatest.plot(column='VehicleNum')

[Figure: ../_images/output_17_1.png]

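Trip OD extraction and aggregation

Use the tbd.taxigps_to_od method, passing in the corresponding column names, to extract the origin-destination (OD) pair of each trip:

# Extract OD from the GPS data
oddata = tbd.taxigps_to_od(data, col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])
oddata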
        VehicleNum     stime        slon       slat     etime        elon       elat    ID
427075       22396  00:19:41  114.013016  22.664818  00:23:01  114.021400  22.663918     0
131301       22396  00:41:51  114.021767  22.640200  00:43:44  114.026070  22.640266     1
417417       22396  00:45:44  114.028099  22.645082  00:47:44  114.030380  22.650017     2
376160       22396  01:08:26  114.034897  22.616301  01:16:34  114.035614  22.646717     3
21768        22396  01:26:06  114.046021  22.641251  01:34:48  114.066048  22.636183     4
...            ...       ...         ...        ...       ...         ...        ...   ...
57666        36805  22:37:42  114.113403  22.534767  22:48:01  114.114365  22.550632  5332
175519       36805  22:49:12  114.114365  22.550632  22:50:40  114.115501  22.557983  5333
212092       36805  22:52:07  114.115402  22.558083  23:03:27  114.118484  22.547867  5334
119041       36805  23:03:45  114.118484  22.547867  23:20:09  114.133286  22.617750  5335
224103       36805  23:36:19  114.112968  22.549601  23:43:12  114.089485  22.538918  5336

5337 rows × 8 columns
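
Conceptually, a trip starts where the status of a vehicle switches from 0 (empty) to 1 (occupied) and ends at the next switch back to 0. The plain-Python sketch below applies that idea to a handful of records. The function name extract_od_sketch is hypothetical, the first sample record is made up to provide an idle point before the pickup, and the transition rule is an assumption made for illustration rather than the library's implementation.

```python
from itertools import groupby
from operator import itemgetter


def extract_od_sketch(records):
    """Return (vehicle, stime, slon, slat, etime, elon, elat) tuples.

    records: (vehicle, time, lon, lat, status) tuples with zero-padded
    HH:MM:SS time strings, so string order equals time order.
    """
    trips = []
    ordered = sorted(records, key=itemgetter(0, 1))
    for vehicle, group in groupby(ordered, key=itemgetter(0)):
        origin, prev_status = None, None
        for _, time, lon, lat, status in group:
            if prev_status == 0 and status == 1:
                origin = (time, lon, lat)  # pickup: status goes 0 -> 1
            elif prev_status == 1 and status == 0 and origin is not None:
                trips.append((vehicle, *origin, time, lon, lat))  # drop-off
                origin = None
            prev_status = status
    return trips


records = [
    (22396, "00:18:00", 114.012000, 22.665000, 0),  # made-up idle point
    (22396, "00:19:41", 114.013016, 22.664818, 1),
    (22396, "00:21:01", 114.018898, 22.662500, 1),
    (22396, "00:23:01", 114.021400, 22.663918, 0),
]
trips = extract_od_sketch(records)
print(trips)
```

With these records the sketch yields one trip from 00:19:41 to 00:23:01, matching the first row of the OD table above. A trip that is still in progress when the data ends is not reported, because no drop-off record closes it.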

Aggregate the extracted OD pairs by grid cell and generate a GeoDataFrame:

# Grid the OD pairs and aggregate them
od_gdf = tbd.odagg_grid(oddata, params)
od_gdf.plot(column='count')

[Figure: ../_images/output_22_1.png]
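
Aggregating OD by grid amounts to counting how many trips share the same pair of origin cell and destination cell. The sketch below does this with a Counter, using the same assumed half-cell gridding convention as the earlier grid examples. The names to_grid and aggregate_od_sketch are hypothetical, and the second sample trip is made up so that two trips land in the same pair of cells.

```python
import math
from collections import Counter


def to_grid(lon, lat, params):
    """Map a point to its (LONCOL, LATCOL) cell, cell (0, 0) centred on the origin."""
    lon_start, lat_start, delta_lon, delta_lat = params
    return (math.floor((lon - (lon_start - delta_lon / 2)) / delta_lon),
            math.floor((lat - (lat_start - delta_lat / 2)) / delta_lat))


def aggregate_od_sketch(trips, params):
    """Count trips per (origin cell, destination cell) pair."""
    counts = Counter()
    for slon, slat, elon, elat in trips:
        counts[(to_grid(slon, slat, params), to_grid(elon, elat, params))] += 1
    return counts


# Gridding parameters as printed earlier in this example.
params = (113.6, 22.4, 0.004872390756896538, 0.004496605206422906)
sample_trips = [
    (114.013016, 22.664818, 114.021400, 22.663918),  # first row of the OD table
    (114.013100, 22.664900, 114.021300, 22.664000),  # made-up trip, same cells
    (114.046021, 22.641251, 114.066048, 22.636183),  # fifth row of the OD table
]
od_counts = aggregate_od_sketch(sample_trips, params)
for (origin, destination), count in od_counts.items():
    print(origin, "->", destination, count)
```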

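Aggregating trip OD to zones

TransBigData also provides a method that aggregates OD directly to zones:

# Aggregate OD to zones (without gridding parameters, points are matched to zones directly by longitude and latitude)
od_gdf = tbd.odagg_shape(oddata, sz, round_accuracy=6)
od_gdf.plot(column='count')

[Figure: ../_images/output_25_1.png]

# Aggregate OD to zones (with gridding parameters, points are gridded first and then matched, which speeds up matching; recommended for large datasets)
od_gdf = tbd.odagg_shape(oddata, sz, params=params)
od_gdf.plot(column='count')

[Figure: ../_images/output_26_1.png]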

Map plotting with matplotlib

tbd provides basemap loading as well as a scale bar and a compass. Use plot_map to add a basemap and plotscale to add the scale bar and compass:

# Create the figure
import matplotlib.pyplot as plt
fig = plt.figure(1, (8, 8), dpi=80)
ax = plt.subplot(111)
plt.sca(ax)
# Add the basemap (tbd.plot_map is used here, so no separate plot_map import is needed)
tbd.plot_map(plt, bounds, zoom=12, style=4)
# Create the axes for the colorbar and give it a title
cax = plt.axes([0.05, 0.33, 0.02, 0.3])
plt.title('count')
plt.sca(ax)
# Plot the OD
od_gdf.plot(ax=ax, vmax=100, column='count', cax=cax, legend=True)
# Plot the zone boundaries
sz.plot(ax=ax, edgecolor=(0, 0, 0, 1), facecolor=(0, 0, 0, 0.2), linewidths=0.5)
# Add the scale bar and compass
tbd.plotscale(ax, bounds=bounds, textsize=10, compasssize=1, accuracy=2000, rect=[0.06, 0.03], zorder=10)
plt.axis('off')
plt.xlim(bounds[0], bounds[2])
plt.ylim(bounds[1], bounds[3])
plt.show()

[Figure: ../_images/output_29_0.png]

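Extracting taxi trajectories

Use the tbd.taxigps_traj_point method, passing in the GPS data and the OD data, to extract the trajectory points. It returns the points recorded during trips (data_deliver) and the points recorded between trips (data_idle):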

data_deliver,data_idle = tbd.taxigps_traj_point(data,oddata,col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])
data_deliver
        VehicleNum      Time         Lng        Lat  OpenStatus  Speed  LONCOL  LATCOL      ID  flag
427075       22396  00:19:41  114.013016  22.664818           1   63.0    85.0    59.0     0.0   1.0
427085       22396  00:19:49  114.014030  22.665483           1   55.0    85.0    59.0     0.0   1.0
416622       22396  00:21:01  114.018898  22.662500           1    1.0    86.0    58.0     0.0   1.0
427480       22396  00:21:41  114.019348  22.662300           1    7.0    86.0    58.0     0.0   1.0
416623       22396  00:22:21  114.020615  22.663366           1    0.0    86.0    59.0     0.0   1.0
...            ...       ...         ...        ...         ...    ...     ...     ...     ...   ...
170960       36805  23:42:31  114.092766  22.538317           1   66.0   101.0    31.0  5336.0   1.0
170958       36805  23:42:37  114.091721  22.538349           1   65.0   101.0    31.0  5336.0   1.0
170974       36805  23:42:43  114.090752  22.538300           1   60.0   101.0    31.0  5336.0   1.0
170973       36805  23:42:49  114.089813  22.538099           1   62.0   101.0    31.0  5336.0   1.0
253064       36805  23:42:55  114.089500  22.538067           1   51.0   100.0    31.0  5336.0   1.0

190492 rows × 10 columns

data_idle
        VehicleNum      Time         Lng        Lat  OpenStatus  Speed  LONCOL  LATCOL      ID  flag
416628       22396  00:23:01  114.021400  22.663918           0   25.0    86.0    59.0     0.0   0.0
401744       22396  00:25:01  114.027115  22.662100           0   25.0    88.0    58.0     0.0   0.0
394630       22396  00:25:41  114.024551  22.659834           0   21.0    87.0    58.0     0.0   0.0
394671       22396  00:26:21  114.022797  22.658367           0    0.0    87.0    57.0     0.0   0.0
394672       22396  00:26:29  114.022797  22.658367           0    0.0    87.0    57.0     0.0   0.0
...            ...       ...         ...        ...         ...    ...     ...     ...     ...   ...
64411        36805  23:53:09  114.120354  22.544300           1    2.0   107.0    32.0  5336.0   0.0
64405        36805  23:53:15  114.120354  22.544300           1    1.0   107.0    32.0  5336.0   0.0
64390        36805  23:53:21  114.120354  22.544300           1    0.0   107.0    32.0  5336.0   0.0
64406        36805  23:53:27  114.120354  22.544300           1    0.0   107.0    32.0  5336.0   0.0
64393        36805  23:53:33  114.120354  22.544300           1    0.0   107.0    32.0  5336.0   0.0

312779 rows × 10 columns

Generate the occupied and idle trajectories from the trajectory points:

traj_deliver = tbd.points_to_traj(data_deliver)
traj_deliver.plot()

[Figure: ../_images/output_36_1.png]

traj_idle = tbd.points_to_traj(data_idle)
traj_idle.plot()

[Figure: ../_images/output_37_1.png]

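Trajectory visualization

TransBigData also builds on the visualization plugin provided by kepler.gl to offer one-step data preparation and visualization.

To use this feature, first install the keplergl Python package:

pip install keplergl

Visualize the trajectory data:

tbd.visualization_trip(data_deliver)

[Figure: ../_images/kepler-traj.png]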
