How to Delete Large Volumes of Data in Elasticsearch

1. Delete By Query API

POST twitter/_delete_by_query
{
  "query": {
    "match": {
      "message": "some message"
    }
  }
}

Typical usage:

http://172.16.96.*:9200/index/_delete_by_query?slices=3&wait_for_completion=false&scroll_size=5000&conflicts=proceed

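slices: the number of slices (parallel sub-tasks) the request is split into. Here it is chosen according to the number of CPUs.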

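wait_for_completion: if true (the default for _delete_by_query), the request blocks until the deletion has finished. If false, Elasticsearch performs some preflight checks, launches the request, and immediately returns a task that can be used with the Tasks API to cancel the task or get its status. Elasticsearch also creates a record of this task as a document at .tasks/task/${taskId}. You can keep or delete this document as you see fit; once you no longer need it, delete it so that Elasticsearch can reclaim the space it uses.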

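scroll_size: the number of documents fetched per scroll (cursor) batch. Set it with index.max_result_window in mind: scroll_size should be smaller than index.max_result_window, which defaults to 10000.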

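conflicts: while it runs, _delete_by_query issues multiple search requests in sequence to find all matching documents. Each time a batch of documents is found, a corresponding bulk request deletes them. If a search or bulk request is rejected, _delete_by_query retries it using a default policy (up to 10 times, with exponential back-off). Reaching the retry limit aborts _delete_by_query, and all failures are returned in the failures element of the response. Deletions that have already been performed remain in effect; in other words, the process is not rolled back, only aborted. When the first failure causes the abort, every failure from the failed bulk request is returned in the failures element, so there can be quite a few failed entries. To count version conflicts instead of aborting on them, set conflicts=proceed in the URL or "conflicts": "proceed" in the request body.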
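The request and parameters described above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive client: the helper names (`build_delete_by_query_url`, `task_urls`, `start_delete_by_query`) and the base URL are assumptions made for the example, and the default values simply mirror the URL shown earlier.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_delete_by_query_url(base_url, index, slices=3, scroll_size=5000):
    """Build the _delete_by_query URL with the parameters discussed above."""
    params = {
        "slices": slices,
        "wait_for_completion": "false",  # return a task instead of blocking
        "scroll_size": scroll_size,
        "conflicts": "proceed",          # count version conflicts, do not abort
    }
    return f"{base_url}/{index}/_delete_by_query?{urlencode(params)}"


def task_urls(base_url, task_id):
    """Tasks API endpoints for checking and cancelling a task."""
    return {
        "status": f"{base_url}/_tasks/{task_id}",           # GET
        "cancel": f"{base_url}/_tasks/{task_id}/_cancel",   # POST
    }


def start_delete_by_query(base_url, index, query):
    """Send the request and return the task id (hypothetical helper).

    Assumes an unauthenticated cluster reachable at base_url.
    """
    body = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        build_delete_by_query_url(base_url, index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # With wait_for_completion=false the body looks like {"task": "node:id"}.
        return json.load(response)["task"]
```

A call such as `start_delete_by_query("http://localhost:9200", "twitter", {"match": {"message": "some message"}})` would return a task id that can be passed to `task_urls` to check on or cancel the deletion.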

2. Check the status details of a running delete task through the Tasks API

GET http://172.16.96.*:9200/_tasks?detailed=true&actions=*/delete/byquery

{
  "nodes": {}
}

Example (one task entry returned while the deletion is still running):

"w4axaLWqQiq19k0wYvHQIw:4550655": {
  "node": "w4axaLWqQiq19k0wYvHQIw",
  "id": 4550655,
  "type": "transport",
  "action": "indices:data/write/delete/byquery",
  "status": {
    "slice_id": 5,
    "total": 43088333,
    "updated": 0,
    "created": 0,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
      "bulk": 0,
      "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1.0,
    "throttled_until_millis": 0
  },
  "description": "delete-by-query [history-alarm-data]",
  "start_time_in_millis": 1644559029887,
  "running_time_in_nanos": 5024318204892,
  "cancellable": true,
  "cancelled": false,
  "parent_task_id": "w4axaLWqQiq19k0wYvHQIw:4550644",
  "headers": {}
}
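A task entry like the one above can be condensed into a rough progress figure. The sketch below is only an illustration: `summarize_status` is a hypothetical helper, and it assumes that progress can be estimated by comparing the sum of `updated`, `created`, and `deleted` with `total`.

```python
def summarize_status(task):
    """Estimate the progress of one delete-by-query task entry."""
    status = task["status"]
    total = status["total"]
    # Documents handled so far; for a delete, "deleted" is the relevant counter.
    processed = (
        status.get("updated", 0)
        + status.get("created", 0)
        + status.get("deleted", 0)
    )
    percent = 100.0 * processed / total if total else 0.0
    return {
        "slice_id": status.get("slice_id"),
        "processed": processed,
        "total": total,
        "percent": round(percent, 2),
        "running_seconds": task["running_time_in_nanos"] / 1e9,
    }
```

When `slices` is greater than 1, each slice reports its own entry, so the per-slice summaries would need to be added up to get an overall figure.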

Once the delete task has finished, the same request returns:

{
  "nodes": {}
}
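Since an empty `nodes` object signals completion, the check can be automated with a small polling loop. This is a sketch under stated assumptions: `fetch` is a caller-supplied function (hypothetical) that performs the `GET _tasks?detailed=true&actions=*/delete/byquery` request and returns the parsed JSON, and the interval and poll limit are arbitrary defaults.

```python
import time


def has_running_tasks(response):
    """True if any node in a _tasks response still lists tasks."""
    nodes = response.get("nodes", {})
    return any(node.get("tasks") for node in nodes.values())


def wait_until_done(fetch, interval_seconds=5.0, max_polls=100):
    """Poll until no matching task remains; return False if the limit is hit."""
    for _ in range(max_polls):
        if not has_running_tasks(fetch()):
            return True
        time.sleep(interval_seconds)
    return False
```

Passing the fetch function in keeps the loop independent of any particular HTTP client.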

To list running search tasks:

GET http://172.16.96.*:9200/_tasks?detailed=true&actions=*/read/search

{
  "nodes": {
    "YM7smnJOQGWJ4VJf_THNNg": {
      "name": "YM7smnJ",
      "transport_address": "172.16.96.*:9300",
      "host": "172.16.96.*",
      "ip": "172.16.96.*:9300",
      "roles": [
        "master",
        "data",
        "ingest"
      ],
      "tasks": {
        "YM7smnJOQGWJ4VJf_THNNg:172443363": {
          "node": "YM7smnJOQGWJ4VJf_THNNg",
          "id": 172443363,
          "type": "netty",
          "action": "indices:data/read/search",
          "description": "indices[tracker_travel], types[travel], search_type[DFS_QUERY_THEN_FETCH], source[{\"from\":0,\"size\":5000,\"query\":{\"bool\":{\"must\":[{\"match\":{\"imei\":{\"query\":\"863637044306814\",\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"range\":{\"begin_time\":{\"from\":1644795551000,\"to\":null,\"include_lower\":true,\"include_upper\":true,\"boost\":1.0}}}],\"disable_coord\":false,\"adjust_pure_negative\":true,\"boost\":1.0}},\"version\":true,\"sort\":[{\"begin_time\":{\"order\":\"asc\"}}]}]",
          "start_time_in_millis": 1644819308136,
          "running_time_in_nanos": 21187617,
          "cancellable": true
        },
        "YM7smnJOQGWJ4VJf_THNNg:172443360": {
          "node": "YM7smnJOQGWJ4VJf_THNNg",
          "id": 172443360,
          "type": "netty",
          "action": "indices:data/read/search",
          "description": "indices[tracker_travel], types[travel], search_type[DFS_QUERY_THEN_FETCH], source[{\"from\":0,\"size\":5000,\"query\":{\"bool\":{\"must\":[{\"match\":{\"imei\":{\"query\":\"863637044306806\",\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"range\":{\"begin_time\":{\"from\":1644806393000,\"to\":null,\"include_lower\":true,\"include_upper\":true,\"boost\":1.0}}}],\"disable_coord\":false,\"adjust_pure_negative\":true,\"boost\":1.0}},\"version\":true,\"sort\":[{\"begin_time\":{\"order\":\"asc\"}}]}]",
          "start_time_in_millis": 1644819308131,
          "running_time_in_nanos": 26798586,
          "cancellable": true
        },
        "YM7smnJOQGWJ4VJf_THNNg:172443354": {
          "node": "YM7smnJOQGWJ4VJf_THNNg",
          "id": 172443354,
          "type": "netty",
          "action": "indices:data/read/search",
          "description": "indices[tracker_travel], types[travel], search_type[DFS_QUERY_THEN_FETCH], source[{\"from\":0,\"size\":5000,\"query\":{\"bool\":{\"must\":[{\"match\":{\"imei\":{\"query\":\"863637044306749\",\"operator\":\"OR\",\"prefix_length\":0,\"max_expansions\":50,\"fuzzy_transpositions\":true,\"lenient\":false,\"zero_terms_query\":\"NONE\",\"boost\":1.0}}},{\"range\":{\"begin_time\":{\"from\":1644792687000,\"to\":null,\"include_lower\":true,\"include_upper\":true,\"boost\":1.0}}}],\"disable_coord\":false,\"adjust_pure_negative\":true,\"boost\":1.0}},\"version\":true,\"sort\":[{\"begin_time\":{\"order\":\"asc\"}}]}]",
          "start_time_in_millis": 1644819308126,
          "running_time_in_nanos": 31671549,
          "cancellable": true
        }
      }
    }
  }
}
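A response of this shape can be flattened to spot the longest-running searches. The following is a minimal sketch; `slowest_tasks` is a hypothetical helper, and it assumes the response has already been parsed into a Python dict.

```python
def slowest_tasks(response, top=10):
    """Return (task_id, action, running_ms) tuples, slowest first."""
    rows = []
    for node in response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            running_ms = task["running_time_in_nanos"] / 1e6
            rows.append((task_id, task["action"], running_ms))
    # Sort by running time, longest first, and keep only the top entries.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]
```

Applied to the response above, the first entry would be the task with the largest `running_time_in_nanos`.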