SQLite
一、概述
AILab-PDBC (Python DataBase Connectivity)是数智教育发展(山东)有限公司 AI Lab 100 团队开发的高效、灵活的数据接口(API)
ailab100.pdbc.sql类用于用于关系型数据库管理系统(RDBMS)的读写,支持MySQL
、微软SQL SERVER、SQLite、ORACLE、MariaDB 、PostgreSQL等主流结构化数据库。
本章介绍如何使用ailab100.pdbc.sql.Mysql 连接 SQLite 数据库进行读写和下载,以及数据库类型和Pandas类型之间的转换。
SQLite是一款轻量级的关系型数据库,它的设计目标是用于嵌入式系统中的,占用资源很低,只需要几百K的内存,支持Windows、Linux、Unix等主流的操作系统。
二、API 接口说明
(一) DBConnector 基类
(二) SQLite 类
三、应用案例及技术性能
(一) SQLite应用案例
Since SQLite adopts a dynamic type system, we infer type as follow:
- If there is a declared type of the column, we derive the type using column affinity rules, code can be found here.
- Otherwise we directly adopt the value's type in the first row of the result (in each partition), which results in INTEGER, REAL, TEXT and BLOB.
- If the first row of the result is NULL in the partition, try next partition. Throw an error if first rows of all partitions are NULL for a column.
SQLite Connection
import connectorx as cx
db_path = '/home/user/path/examples.alcedo_pdbc' # path to your SQLite database
conn = 'sqlite://' + db_path # connection token
query = 'SELECT * FROM `database.dataset.table`' # query string
cx.read_sql(conn,query) # read data from SQLite
Example on windows:
import connectorx as cx
import urllib
db_path = urllib.parse.quote("C:\\user\\path\\examples.alcedo_pdbc"
) # url encode the path to your SQLite database
conn = 'sqlite://' + db_path # connection token
query = 'SELECT * FROM `database.dataset.table`' # query string
cx.read_sql(conn,query) # read data from SQLite
(二) 数据类型映射
SQLite - Pandas 数据类型映射
| SQLite Type | Pandas Type | Comment |
|---|---|---|
| INTEGER | int64, Int64(nullable) | declared type that contains substring "int" |
| BOOL | bool, boolean(nullable) | declared type is "boolean" or "bool" |
| REAL | float64 | declared type that contains substring "real", "floa", "doub" |
| TEXT | object | declared type that contains substring "char", "clob", "text" |
| BLOB | object | declared type that contains substring "blob" |
| DATE | datetime64[ns] | declared type is "date" |
| TIME | object | declared type is "time" |
| TIMESTAMP | datetime64[ns] | declared type is "datetime" or "timestamp", the format must follow YYYY-MM-DD HH:MM:SS"/"YYYY-MM-DD HH:MM:SS.SSS |
(三) 性能测试
实验室通过比较 Python 中的不同解决方案,采用 4 个线程并行处理,读取 SQL Server 中一个 10,981,106 行的数据表 (1,092,616,192 字节,1.02 GB) 加载到 DataFrame 中,实验结果如下:
(1) 响应时间 (越短越好)

(2) 内存消耗 (越低越好)

总之,AILab PDBC 使用的内存减少了 1/3 ,与 Pandas 相比响应时间减少了近 1 倍(与 Polars 相比,响应时间相差无几 )。