问题

线上aws rds 偶尔出现sql查询很慢 异常慢。  比如如下的一个表:

最终发现是一个很普通的sql:

select * from int_twilio_twilionumber where merchant_id = 1234 and active = true and "default"=true
order by date_created limit 1

 这个查询竟然发了2-3分钟无法回复。

调查

1. 查看表大小  800MB:, 表记录大概4000条。

SELECT pg_size_pretty( pg_total_relation_size('int_twilio_twilionumber') );

2. 查看当时pg stat:

PostgreSQL: Documentation: 12: 27.2. The Statistics Collector

select * from pg_stat_activity where query like 'select * from int_twilio_twilionumber%' order by backend_start desc limit 20

发现请求状态如下:

发现阻塞在IO操作上, 怀疑是vacuum有问题。

文档有说:

wait_event_typetextThe type of event for which the backend is waiting, if any; otherwise NULL. Possible values are:
  • LWLock: The backend is waiting for a lightweight lock. Each such lock protects a particular data structure in shared memory. wait_event will contain a name identifying the purpose of the lightweight lock. (Some locks have specific names; others are part of a group of locks each with a similar purpose.)

  • Lock: The backend is waiting for a heavyweight lock. Heavyweight locks, also known as lock manager locks or simply locks, primarily protect SQL-visible objects such as tables. However, they are also used to ensure mutual exclusion for certain internal operations such as relation extension. wait_event will identify the type of lock awaited.

  • BufferPin: The server process is waiting to access to a data buffer during a period when no other process can be examining that buffer. Buffer pin waits can be protracted if another process holds an open cursor which last read data from the buffer in question.

  • Activity: The server process is idle. This is used by system processes waiting for activity in their main processing loop. wait_event will identify the specific wait point.

  • Extension: The server process is waiting for activity in an extension module. This category is useful for modules to track custom waiting points.

  • Client: The server process is waiting for some activity on a socket from user applications, and that the server expects something to happen that is independent from its internal processes. wait_event will identify the specific wait point.

  • IPC: The server process is waiting for some activity from another process in the server. wait_event will identify the specific wait point.

  • Timeout: The server process is waiting for a timeout to expire. wait_event will identify the specific wait point.

  • IO: The server process is waiting for a IO to complete. wait_event will identify the specific wait point.

3. 用同样的数据创建一个新表 发现查询很快。

CREATE TABLE test_twilio_twilionumber AS
SELECT
    *
FROM
    int_twilio_twilionumber

4. 执行vacuum操作,问题解决。 该操作很慢! 而且会锁表。

vacuum (VERBOSE, analyze) int_twilio_twilionumber
vacuum full int_twilio_twilionumber

PostgreSQL: Documentation: 9.5: VACUUM

总结

1. 如果pg某个查询很慢,排除sql语句性能问题后, 可以查看表: pg_stat_activity.

2. 如果怀疑是磁盘错误 可以使用vacuum full拷贝数据。  

Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐