SQLite批量插入效率

SQLite对于大批量插入数据时的效率提升方法

PengX_Seek

8420人浏览 · 2022-03-31 19:25:50

PengX_Seek · 2022-03-31 19:25:50 发布

一、循环单条数据插入

SQLite插入单条数据效率并不高，原因是：

Because it does not have a central server to coordinate access, SQLite must close and reopen the database file, and thus invalidate its cache, for each transaction. In this test, each SQL statement is a separate transaction so the database file must be opened and closed and the cache must be flushed 1000 times.

因为它没有中央服务器来协调访问，所以SQLite必须为每个事务关闭并重新打开数据库文件，从而使其缓存无效。在这个测试中，每个SQL语句都是一个独立的事务，因此必须打开和关闭数据库文件，并且必须刷新缓存1000次。

插入 1000 条数据测试：

const static char * insert_record_static = "INSERT INTO tbl_test (a, b, c) VALUES (1, 1, 1);";
// 1、循环插入 1000 条数据
t_start = clock();
for(int i = 0; i < 1000; i++)
{
    sqlite3_exec(db, insert_record_static, NULL, NULL, &err);
}
t_stop = clock();
std::cout << "Insert static 1000 cost time " << (t_stop - t_start) << " ms" << std::endl;

结果输出:

Insert static 1000 cost time 5322 ms

二、使用事务提升插入效率

使用 BEGIN 命令手动启动事务，COMMIT 命令是用于把事务调用的更改保存到数据库中的事务命令。由此可以得出，只有等到 COMMIT 命令才会执行磁盘IO的读写，事务中执行的多次插入语句，只用打开关闭一次数据库文件，极大的提升了批量插入数据的效率。

插入 100000 条数据测试：

const static char * insert_record_static = "INSERT INTO tbl_test (a, b, c) VALUES (1, 1, 1);";
sqlite3_exec(db, "begin;", 0, 0, 0);
t_start = clock();
for(int i = 0; i < 100000; i++)
{
    sqlite3_exec(db, insert_record_static, 0, 0, &err);
}
sqlite3_exec(db, "commit;", 0, 0, 0);
t_stop = clock();
std::cout << "Insert static 100000 cost time " << (t_stop - t_start) << " ms" << std::endl;

结果输出:

Insert static 1000 cost time 1588 ms

三、使用预编译执行提升插入效率

To execute an SQL statement, it must first be compiled into a byte-code program using one of these routines. Or, in other words, these routines are constructors for the prepared statement object.

要执行SQL语句，必须首先使用这些例程之一将其编译成字节码程序。换句话说，这些例程是预备语句对象的构造函数。

实际sqlite每次在执行语句时，都会对语句进行编译，查看源码可以看到sqlite3_exec函数中都会调用sqlite3_prepare_v2执行编译，因此直接采用预编译的方式代替语句的直接执行也能够提升效率。

插入 100000 条数据测试：

sqlite3_exec(db, "begin;", 0, 0, 0);
t_start = clock();
sqlite3_prepare_v2(db, insert_record_bind, strlen(insert_record_bind), &stmt, 0);
for(int i = 0; i < 100000; i++)
{
    sqlite3_bind_int(stmt, 1, i);
    sqlite3_bind_int(stmt, 2, i * 2);
    sqlite3_bind_int(stmt, 3, i * 3);
    sqlite3_step(stmt);
    sqlite3_reset(stmt);
}
sqlite3_exec(db, "commit;", 0, 0, 0);
t_stop = clock();
std::cout << "Insert bind 100000 cost time " << (t_stop - t_start) << " ms" << std::endl;

结果输出:

Insert bind 100000 cost time 411 ms

四、关闭写同步（synchronous）

个人建议这个优化项可以忽略，速度的提高带来的结果是安全性的降低。

synchronous的设置有四种：

sqlite3_exec(db, "PRAGMA synchronous = EXTRA;", 0, 0, 0);   // (3)
sqlite3_exec(db, "PRAGMA synchronous = FULL;", 0, 0, 0);    // (2)
sqlite3_exec(db, "PRAGMA synchronous = NORMAL;", 0, 0, 0);  // (1)
sqlite3_exec(db, "PRAGMA synchronous = OFF;", 0, 0, 0);     // (0)

EXTRA (3)

EXTRA synchronous is like FULL with the addition that the directory containing a rollback journal is synced after that journal is unlinked to commit a transaction in DELETE mode. EXTRA provides additional durability if the commit is followed closely by a power loss.

EXTRA同步类似于FULL，只不过在日志被解除链接后，包含回滚日志的目录将被同步，以以DELETE模式提交事务。如果提交之后紧接着出现功耗损失，则EXTRA提供了额外的持久性。

FULL (2)

When synchronous is FULL (2), the SQLite database engine will use the xSync method of the VFS to ensure that all content is safely written to the disk surface prior to continuing. This ensures that an operating system crash or power failure will not corrupt the database. FULL synchronous is very safe, but it is also slower. FULL is the most commonly used synchronous setting when not in WAL mode.

当synchronous为FULL(2)时，SQLite数据库引擎将使用VFS的xSync方法来确保在继续之前将所有内容安全地写入磁盘表面。这可以确保操作系统崩溃或电源故障不会破坏数据库。全同步非常安全，但也比较慢。 FULL是不使用WAL模式时最常用的同步设置。

NORMAL (1)

When synchronous is NORMAL (1), the SQLite database engine will still sync at the most critical moments, but less often than in FULL mode. There is a very small (though non-zero) chance that a power failure at just the wrong time could corrupt the database in journal_mode=DELETE on an older filesystem. WAL mode is safe from corruption with synchronous=NORMAL, and probably DELETE mode is safe too on modern filesystems. WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash. Transactions are durable across application crashes regardless of the synchronous setting or journal mode. The synchronous=NORMAL setting is a good choice for most applications running in WAL mode

当同步模式为NORMAL(1)时，SQLite数据库引擎仍然会在最关键的时刻进行同步，但频率比FULL模式低。在一个较旧的文件系统上，在错误的时间出现电源故障可能会破坏journal_mode=DELETE中的数据库，这种可能性很小(尽管不为零)。 WAL模式在使用synchronous=NORMAL时是安全的，DELETE模式在现代文件系统上可能也是安全的。 WAL模式总是与synchronous=NORMAL一致，但是WAL模式会失去耐久性。使用synchronous=NORMAL的WAL模式提交的事务可能会在断电或系统崩溃后回滚。事务在应用程序崩溃时是持久的，无论采用同步设置还是日志模式。对于大多数以WAL模式运行的应用程序来说，synchronous=NORMAL设置是一个很好的选择。

OFF (0)

With synchronous OFF (0), SQLite continues without syncing as soon as it has handed data off to the operating system. If the application running SQLite crashes, the data will be safe, but the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface. On the other hand, commits can be orders of magnitude faster with synchronous OFF.

使用同步OFF (0)， SQLite一旦将数据传递给操作系统，就会继续不进行同步。如果运行SQLite的应用程序崩溃，数据将是安全的，但如果操作系统崩溃或计算机在数据被写到磁盘表面之前失去电源，则数据库可能会损坏。另一方面，使用同步OFF可以使提交快几个数量级。

五、测试代码

// sqlite_example.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <time.h>
#include <iostream>

#include "sqlite3/sqlite3.h"

int _tmain(int argc, _TCHAR* argv[])
{
    int ret = 0;
    char *err = NULL;
    sqlite3 *db = NULL;
    sqlite3_stmt *stmt = NULL;
    clock_t t_start, t_stop;


    // 打开数据库
    sqlite3_open("sqlite_example.db", &db);
    // 关闭写同步
    sqlite3_exec(db, "PRAGMA synchronous = OFF;", 0, 0, 0);
    // 删除表
    sqlite3_exec(db, "DROP TABLE if exists tbl_test;", 0, 0, 0);
    // 创建表
    sqlite3_exec(db, "CREATE TABLE tbl_test (a INTEGER, b INTEGER, c INTEGER);", 0, 0, 0);

    const static char * insert_record_static = "INSERT INTO tbl_test (a, b, c) VALUES (1, 1, 1);";
    const static char * insert_record_bind = "INSERT INTO tbl_test (a, b, c) VALUES (?, ?, ?);";

    // 1、循环插入 1000 条数据
    t_start = clock();
    for(int i = 0; i < 1000; i++)
    {
        sqlite3_exec(db, insert_record_static, 0, 0, &err);
    }
    t_stop = clock();
    std::cout << "Insert static 1000 cost time " << (t_stop - t_start) << " ms" << std::endl;

    t_start = clock();
    sqlite3_prepare_v2(db, insert_record_bind, strlen(insert_record_bind), &stmt, 0);
    for(int i = 0; i < 1000; i++)
    {
        sqlite3_bind_int(stmt, 1, i);
        sqlite3_bind_int(stmt, 2, i * 2);
        sqlite3_bind_int(stmt, 3, i * 3);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    t_stop = clock();
    std::cout << "Insert bind 1000 cost time " << (t_stop - t_start) << " ms" << std::endl;

    // 2、使用事务提升插入效率
    sqlite3_exec(db, "begin;", 0, 0, 0);
    t_start = clock();
    for(int i = 0; i < 100000; i++)
    {
        sqlite3_exec(db, insert_record_static, 0, 0, &err);
    }
    sqlite3_exec(db, "commit;", 0, 0, 0);
    t_stop = clock();
    std::cout << "Insert static 100000 cost time " << (t_stop - t_start) << " ms" << std::endl;

    // 3、使用预编译提升插入效率
    sqlite3_exec(db, "begin;", 0, 0, 0);
    t_start = clock();
    sqlite3_prepare_v2(db, insert_record_bind, strlen(insert_record_bind), &stmt, 0);
    for(int i = 0; i < 100000; i++)
    {
        sqlite3_bind_int(stmt, 1, i);
        sqlite3_bind_int(stmt, 2, i * 2);
        sqlite3_bind_int(stmt, 3, i * 3);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
    }
    sqlite3_exec(db, "commit;", 0, 0, 0);
    t_stop = clock();
    std::cout << "Insert bind 100000 cost time " << (t_stop - t_start) << " ms" << std::endl;

    sqlite3_close(db);
    sqlite3_finalize(stmt);

    return 0;
}