PostgreSQL
一、概述
AILab-PDBC (Python DataBase Connectivity)是数智教育发展(山东)有限公司 AI Lab 100 团队开发的高效、灵活的数据接口(API)
ailab100.pdbc.sql类用于用于关系型数据库管理系统(RDBMS)的读写,支持MySQL
、微软SQL SERVER、SQLite、ORACLE、MariaDB 、PostgreSQL等主流结构化数据库。
本章介绍如何使用ailab100.pdbc.sql.ORACLE 连接 PostgreSQL 数据库进行读写和下载,以及数据库类型和Pandas类型之间的转换。
PostgreSQL是以加州大学伯克利分校计算机系开发的对象关系型数据库管理系统(RDBMS)。
二、API 接口说明
(一) DBConnector 基类
(二) PostgreSQL 类
Protocols
binary: Postgres Binary COPY protocol, recommend to use in general since fast data parsing speed.csv: Postgres CSV COPY protocol, recommend to use when network is slow (csvusually results in smaller size thanbinary).cursor: Conventional wire protocol (slowest one), recommend to use only whenbinaryandcsvis not supported by the source (e.g. Redshift).
Postgres Connection
Adding `sslmode=require` to connection uri parameter force SSL connection. Example: `postgresql://username:password@host:port/db?sslmode=require`. `sslmode=disable` to disable SSL connection.
To connect to redshift, replace `postgresql://` with `redshift://`.
import connectorx as cx
conn = 'postgres://username:password@server:port/database' # connection token
query = "SELECT * FROM table" # query string
cx.read_sql(conn,query) # read data from Postgres
Postgres-Pandas Type Mapping
| Postgres Type | Pandas Type | Comment |
|---|---|---|
| BOOL | bool, boolean(nullable) | |
| INT2 | int64, Int64(nullable) | |
| INT4 | int64, Int64(nullable) | |
| INT8 | int64, Int64(nullable) | |
| FLOAT4 | float64 | |
| FLOAT8 | float64 | |
| NUMERIC | float64 | cannot support precision larger than 28 |
| TEXT | object | |
| BPCHAR | object | |
| VARCHAR | object | |
| CHAR | object | |
| BYTEA | object | |
| DATE | datetime64[ns] | |
| TIME | object | |
| TIMESTAMP | datetime64[ns] | |
| TIMESTAMPZ | datetime64[ns] | |
| UUID | object | |
| JSON | object | |
| JSONB | object | |
| ENUM | object | need to convert enum column to text manually (::text) when using csv and cursor protocol |
| ltree | object | binary protocol supported only after Postgres version 13 |
| lquery | object | binary protocol supported only after Postgres version 13 |
| ltxtquery | object | binary protocol supported only after Postgres version 13 |
| INT2[] | object | list of i64 |
| INT4[] | object | list of i64 |
| INT8[] | object | list of i64 |
| FLOAT4[] | object | list of f64 |
| FLOAT8[] | object | list of f64 |
| NUMERIC[] | object | list of f64 |
Performance (db.m6g.4xlarge RDS)
- Time chart, lower is better.

- Memory consumption chart, lower is better.

In conclusion, ConnectorX uses 3x less memory and 13x less time compared with Pandas.