Metadata-Version: 2.4
Name: pyxlsbwriter
Version: 0.1.0
Summary: Read and write XLSB and XLSX files efficiently.
Author: Krzysztof Duśko
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Office/Business :: Office Suites
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Provides-Extra: test
Requires-Dist: pyodbc; extra == "test"
Provides-Extra: examples
Requires-Dist: memory-profiler; extra == "examples"

# Python XLSB Reader & Writer

A Python library for reading and writing XLSB and XLSX files efficiently.

## Installation

```bash
pip install pyxlsbwriter
```

## Usage

### Basic Example

```python
from pyxlsbwriter import XlsbWriter
import datetime
from decimal import Decimal

data = [
    ["Name", "Age", "City", "info"],
    [-123, 2147483647, 2147483648, 2147483999],
    ["x", "y", "z", datetime.datetime.today()],
    ["Alice", 25, "New York", datetime.date.today()],
    ["Bob", 30, "London", Decimal(3.14)],
    ["Charlie", 35, "Paris", datetime.datetime.now()],
    [True, False, None, datetime.datetime.utcnow()]
]

# Initialize writer with a specific compression level
with XlsbWriter("output.xlsb", compressionLevel=6) as writer:
    # Add a visible sheet
    writer.add_sheet("Visible Sheet")
    writer.write_sheet(data)

    # Add a hidden sheet
    writer.add_sheet("Hidden Sheet", hidden=True)
    writer.write_sheet([["This sheet is hidden."]])
```

### XlsxWriter Example

```python
from pyxlsbwriter import XlsxWriter
import datetime
from decimal import Decimal

data = [
    ["Name", "Age", "City", "info"],
    [-123, 2147483647, 2147483648, 2147483999],
    ["x", "y", "z", datetime.datetime.today()],
    ["Alice", 25, "New York", datetime.date.today()],
    ["Bob", 30, "London", Decimal(3.14)],
    ["Charlie", 35, "Paris", datetime.datetime.now()],
    [True, False, None, datetime.datetime.utcnow()]
]

# Initialize writer with a specific compression level
with XlsxWriter("output.xlsx", compressionLevel=6) as writer:
    # Add a visible sheet
    writer.add_sheet("Visible Sheet")
    writer.write_sheet(data)

    # Add a hidden sheet
    writer.add_sheet("Hidden Sheet", hidden=True)
    writer.write_sheet([["This sheet is hidden."]])
```

### Reading XLSB and XLSX Files

Reading files is done via the `ExcelReader` class, which automatically detects the format.

```python
from pyxlsbwriter import ExcelReader

with ExcelReader("input.xlsx") as reader:  # or .xlsb
    names = reader.get_sheet_names()
    print(f"Sheets: {names}")

    for sheet_name in names:
        rows = reader.read_all(sheet_name)
        for row in rows:
            print(row)

# Generator usage (memory efficient for large files):
with ExcelReader("large_file.xlsb") as reader:
    for row in reader.get_rows("Sheet1"):
        print(row)
```

### Streaming from a Database (ODBC)

This example shows how to stream data directly from a database query into an XLSB file. This is highly memory-efficient as it doesn't load the entire dataset into memory.

First, ensure you have `pyodbc` installed:
```bash
pip install pyodbc
```

Then, you can use a generator function to feed data to `XlsbWriter`.

```python
import os
import pyodbc
from typing import Generator
from pyxlsbwriter import XlsbWriter

# --- Configuration ---
# Make sure you have an ODBC driver and a configured DSN, or use a DSN-less connection string.
DSN = "DRIVER={Your ODBC Driver};SERVER=your_server;DATABASE=your_db;UID=your_user;PWD=your_password"
QUERY = "SELECT * FROM YourTable"
OUTPUT_FILENAME = "db_output.xlsb"

def row_generator(cursor: pyodbc.Cursor) -> Generator[list[any], None, None]:
    """
    Generates rows from a pyodbc cursor, yielding headers first, followed by data rows.
    """
    # Extract column headers from cursor description
    headers = [column[0] for column in cursor.description]
    yield headers

    # Yield each row until the cursor is exhausted
    while row := cursor.fetchone():
        yield list(row)

# --- Main Execution ---
try:
    # Connect to the database
    with pyodbc.connect(DSN) as conn:
        cursor = conn.cursor()
        cursor.execute(QUERY)

        # Use XlsbWriter to write the data stream
        with XlsbWriter(OUTPUT_FILENAME) as writer:
            writer.add_sheet("Database Export")
            writer.write_sheet(row_generator(cursor))
            
            # You can also add hidden sheets with metadata, like the query itself
            writer.add_sheet("SQL Query", hidden=True)
            writer.write_sheet([["SQL"], [QUERY]])


    print(f"Successfully created '{OUTPUT_FILENAME}'")

except pyodbc.Error as ex:
    sqlstate = ex.args[0]
    print(f"Database connection or query execution error: {sqlstate}\n{ex}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```

## Performance

`pyxlsbwriter` is designed for high performance, both when writing and reading large datasets.

### Writing

The binary `.xlsb` format is significantly more compact and faster to write compared to the standard XML-based `.xlsx` format.

A benchmark was performed with a 50000x50 dataset:

| Library        | Time Taken | File Size |
|----------------|------------|-----------|
| `pyxlsbwriter` (XlsbWriter) | 2.48s | 7.52 MB |
| `pyxlsbwriter` (XlsxWriter) | 5.02s | 6.58 MB |
| [`xlsxwriter`](https://pypi.org/project/xlsxwriter/) | 11.56s | 11.35 MB |

Note: The `xlsxwriter` library offers significantly more features than `pyxlsbwriter`, which is why its performance is lower. `pyxlsbwriter` is optimized for high-performance writing of large datasets with a focus on speed and file size efficiency.

### Reading

| Dataset       | XLSX  | XLSB  |
|---------------|-------|-------|
| 65K rows x 14 cols | ~1.9s | ~2.4s |
| 200K rows x 4 cols  | ~2.0s | ~3.2s |

XLSX reading uses the native C-based `expat` XML parser, while XLSB reading is implemented in pure Python (binary BIFF record parser), which explains the difference in read performance.

You can run the writing benchmark yourself using the script located in `examples/performance_test.py`.
