跳转至

SQLite

一、概述

AILab-PDBC (Python DataBase Connectivity)是数智教育发展(山东)有限公司 AI Lab 100 团队开发的高效、灵活的数据接口(API)

ailab100.pdbc.sql类用于用于关系型数据库管理系统(RDBMS)的读写,支持MySQL 、微软SQL SERVERSQLiteORACLEMariaDBPostgreSQL等主流结构化数据库。

本章介绍如何使用ailab100.pdbc.sql.Mysql 连接 SQLite 数据库进行读写和下载,以及数据库类型和Pandas类型之间的转换。

SQLite是一款轻量级的关系型数据库,它的设计目标是用于嵌入式系统中的,占用资源很低,只需要几百K的内存,支持Windows、Linux、Unix等主流的操作系统。

二、API 接口说明

(一) DBConnector 基类

(二) SQLite 类

三、应用案例及技术性能

(一) SQLite应用案例

Since SQLite adopts a dynamic type system, we infer type as follow:

  • If there is a declared type of the column, we derive the type using column affinity rules, code can be found here.
  • Otherwise we directly adopt the value's type in the first row of the result (in each partition), which results in INTEGER, REAL, TEXT and BLOB.
  • If the first row of the result is NULL in the partition, try next partition. Throw an error if first rows of all partitions are NULL for a column.

SQLite Connection

import connectorx as cx

db_path = '/home/user/path/examples.alcedo_pdbc'  # path to your SQLite database
conn = 'sqlite://' + db_path  # connection token
query = 'SELECT * FROM `database.dataset.table`'  # query string
cx.read_sql(conn,query)  # read data from SQLite

Example on windows:

import connectorx as cx
import urllib

db_path = urllib.parse.quote("C:\\user\\path\\examples.alcedo_pdbc"
                             )  # url encode the path to your SQLite database         
conn = 'sqlite://' + db_path  # connection token
query = 'SELECT * FROM `database.dataset.table`'  # query string
cx.read_sql(conn,query)  # read data from SQLite

(二) 数据类型映射

SQLite - Pandas 数据类型映射

SQLite Type Pandas Type Comment
INTEGER int64, Int64(nullable) declared type that contains substring "int"
BOOL bool, boolean(nullable) declared type is "boolean" or "bool"
REAL float64 declared type that contains substring "real", "floa", "doub"
TEXT object declared type that contains substring "char", "clob", "text"
BLOB object declared type that contains substring "blob"
DATE datetime64[ns] declared type is "date"
TIME object declared type is "time"
TIMESTAMP datetime64[ns] declared type is "datetime" or "timestamp", the format must follow YYYY-MM-DD HH:MM:SS"/"YYYY-MM-DD HH:MM:SS.SSS

(三) 性能测试

实验室通过比较 Python 中的不同解决方案,采用 4 个线程并行处理,读取 SQL Server 中一个 10,981,106 行的数据表 (1,092,616,192 字节,1.02 GB) 加载到 DataFrame 中,实验结果如下:

(1) 响应时间 (越短越好)

pg-time.png

(2) 内存消耗 (越低越好)

pg-time.png

总之,AILab PDBC 使用的内存减少了 1/3 ,与 Pandas 相比响应时间减少了近 1 倍(与 Polars 相比,响应时间相差无几 )。