Metadata-Version: 2.4
Name: notion2pandas
Version: 1.10.0
Summary: Notion Client extension to import notion Database into pandas Dataframe
Project-URL: Homepage, https://gitlab.com/Jaeger87/notion2pandas
Author-email: Andrea Rosati <rosati.1595834@gmail.com>
License: MIT
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: <3.15,>=3.10
Requires-Dist: notion-client==3.0.0
Requires-Dist: numpy>=2.1.0
Requires-Dist: pandas==2.3.3
Description-Content-Type: text/markdown

# Notion2Pandas
<p align="center">
<img src="https://gitlab.com/Jaeger87/notion2pandas/-/raw/main/readme_assets/logo.png?ref_type=heads"  class="center">
</p>

<div align="center">
  <p>
    <a href="https://pypi.org/project/notion2pandas/"><img src="https://gitlab.com/Jaeger87/notion2pandas/-/badges/release.svg" alt="Latest Release"></a>
    <a href="https://pepy.tech/projects/notion2pandas"><img src="https://static.pepy.tech/personalized-badge/notion2pandas?period=total&units=INTERNATIONAL_SYSTEM&left_color=GRAY&right_color=BRIGHTGREEN&left_text=downloads" alt="PyPI Downloads"></a>
  <a href="https://codecov.io/gl/Jaeger87/notion2pandas" >
<img src="https://codecov.io/gl/Jaeger87/notion2pandas/branch/main/graph/badge.svg?token=R4XZ2IWDDU"/>
</a>
</p>
</div>

Notion2Pandas is a Python 3 package that extends the capabilities of the excellent [notion-sdk-py](https://ramnes.github.io/notion-sdk-py/) by [Ramnes](https://github.com/ramnes). It enables seamless import of a Notion database into a pandas DataFrame and back, with just a single line of code.

## Installation

```bash
pip install notion2pandas
```

## Quick Start

### Synchronous Client

```python
import os

from notion2pandas import Notion2PandasClient

# Create client
n2p = Notion2PandasClient(auth=os.environ["NOTION_TOKEN"])

# Import database to DataFrame
df = n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"])

# Work with your data
df.loc[df['Name'] == 'Task 1', 'Status'] = 'Done'

# Save changes back to Notion
n2p.update_notion_DB_from_dataframe(os.environ["DATABASE_ID"], df)
```

### Asynchronous Client

```python
import os

from notion2pandas.async_client import AsyncNotion2PandasClient

# Create async client
async_n2p = AsyncNotion2PandasClient(auth=os.environ["NOTION_TOKEN"])

# Import database to DataFrame (concurrent processing)
df = await async_n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"])

# Work with your data
df.loc[df['Name'] == 'Task 1', 'Status'] = 'Done'

# Save changes back to Notion (concurrent updates)
await async_n2p.update_notion_DB_from_dataframe(os.environ["DATABASE_ID"], df)
```

## Usage

<p align="center">
<img src="https://gitlab.com/Jaeger87/notion2pandas/-/raw/main/readme_assets/notiondb2pandas.gif?ref_type=heads"  class="center">
</p>

### Choosing Between Sync and Async

**Use the synchronous client (`Notion2PandasClient`)** when:
- Working in Jupyter notebooks or simple scripts
- You prefer straightforward, blocking code

**Use the asynchronous client (`AsyncNotion2PandasClient`)** when:
- You need concurrent operations for better performance
- Your application is already async

### Basic Usage (Sync)

* Import the Notion2PandasClient class:
```python
from notion2pandas import Notion2PandasClient
```

* Create an instance by passing your authentication token:
```python
import os

n2p = Notion2PandasClient(auth=os.environ["NOTION_TOKEN"])
```

* Use the `from_notion_DB_to_dataframe` method to get the data into a dataframe:
```python
df = n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"])
```

* When you're done working with your dataframe, use the `update_notion_DB_from_dataframe` method to save the data back to Notion:
```python
n2p.update_notion_DB_from_dataframe(os.environ["DATABASE_ID"], df)
```

### Basic Usage (Async)

* Import the AsyncNotion2PandasClient class:
```python
from notion2pandas.async_client import AsyncNotion2PandasClient
```

* Create an instance by passing your authentication token:
```python
import os

async_n2p = AsyncNotion2PandasClient(auth=os.environ["NOTION_TOKEN"])
```

* Use the `from_notion_DB_to_dataframe` method with `await`:
```python
df = await async_n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"])
```

* Save changes back to Notion with `await`:
```python
await async_n2p.update_notion_DB_from_dataframe(os.environ["DATABASE_ID"], df)
```

### Configuring Concurrent Requests (Async Only)

Control the number of concurrent API requests to balance speed and rate limits:

```python
# Conservative (safer for rate limits)
async_n2p = AsyncNotion2PandasClient(
    auth=os.environ["NOTION_TOKEN"],
    max_concurrent_requests=5
)

# Aggressive (faster, but might hit rate limits)
async_n2p = AsyncNotion2PandasClient(
    auth=os.environ["NOTION_TOKEN"],
    max_concurrent_requests=20
)

# Default is 10 concurrent requests
```
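
The `max_concurrent_requests` cap can be pictured as a semaphore that bounds how many requests are in flight at once. A minimal sketch of that pattern in plain `asyncio` (the names `fetch_all` and `fetch_one` are illustrative, not part of the library):

```python
import asyncio

async def fetch_all(page_ids, max_concurrent_requests=10):
    # a semaphore caps how many coroutines may run the "API call" at once
    semaphore = asyncio.Semaphore(max_concurrent_requests)

    async def fetch_one(page_id):
        async with semaphore:
            await asyncio.sleep(0)  # stand-in for a real network call
            return page_id

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(fetch_one(p) for p in page_ids))

results = asyncio.run(fetch_all(range(25), max_concurrent_requests=5))
```

A lower cap trades throughput for safety against rate limits; a higher cap does the opposite.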

### Working with Filters and Sorts

#### Sync
```python
published_filter = {
    "filter": {
        "property": "Published",
        "checkbox": {
            "equals": True
        }
    }
}

df = n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"], published_filter)
```

#### Async
```python
published_filter = {
    "filter": {
        "property": "Published",
        "checkbox": {
            "equals": True
        }
    }
}

df = await async_n2p.from_notion_DB_to_dataframe(os.environ["DATABASE_ID"], published_filter)
```

## 🆕 Working with Data Sources (API 2025-09-03)

Starting with Notion API version 2025-09-03, databases can contain **multiple data sources**. Notion2Pandas now fully supports this feature!

### Understanding Data Sources

Each Notion database now contains one or more **data sources**. When you use the basic methods without specifying a data source, Notion2Pandas automatically uses the **first data source** in the database.

### Getting Data Source Information

#### Sync
```python
# Get list of all data sources in a database
data_sources = n2p.get_data_source_ids(database_id)
# Returns: [{'id': 'ds_abc123...', 'name': 'Main Tasks'}, 
#           {'id': 'ds_def456...', 'name': 'Archive'}]
```

#### Async
```python
# Get list of all data sources in a database
data_sources = await async_n2p.get_data_source_ids(database_id)
# Returns: [{'id': 'ds_abc123...', 'name': 'Main Tasks'}, 
#           {'id': 'ds_def456...', 'name': 'Archive'}]
```

### Working with Specific Data Sources

#### Sync
```python
# Import from a specific data source
df = n2p.from_notion_DB_to_dataframe_kwargs(
    database_id=os.environ["DATABASE_ID"],
    data_source_id="your_data_source_id"
)

# Update a specific data source
n2p.update_notion_DB_from_dataframe(
    database_id=os.environ["DATABASE_ID"],
    df=df,
    data_source_id="your_data_source_id"
)
```

#### Async
```python
# Import from a specific data source
df = await async_n2p.from_notion_DB_to_dataframe_kwargs(
    database_id=os.environ["DATABASE_ID"],
    data_source_id="your_data_source_id"
)

# Update a specific data source
await async_n2p.update_notion_DB_from_dataframe(
    database_id=os.environ["DATABASE_ID"],
    df=df,
    data_source_id="your_data_source_id"
)
```

### 🎯 New Methods for Multi-Source Databases

#### Get All Data Sources as DataFrames (Sync)

```python
# Get all data sources from a database
dataframes_dict = n2p.from_notion_database_to_dataframes(
    database_id=os.environ["DATABASE_ID"]
)

# Access individual data sources
for data_source_id, df in dataframes_dict.items():
    print(f"Data Source: {data_source_id}")
    print(df.head())
```

#### Get All Data Sources as DataFrames (Async)

```python
# Get all data sources from a database (processed concurrently)
dataframes_dict = await async_n2p.from_notion_database_to_dataframes(
    database_id=os.environ["DATABASE_ID"]
)

# Access individual data sources
for data_source_id, df in dataframes_dict.items():
    print(f"Data Source: {data_source_id}")
    print(df.head())
```

#### Get Specific Data Sources (Sync)

```python
# Get specific data sources
data_source_ids = ["ds_abc123...", "ds_def456..."]
dataframes_dict = n2p.from_notion_data_sources_to_dataframes(
    data_source_ids=data_source_ids,
    filter_params=published_filter  # Optional: apply filters
)
```

#### Get Specific Data Sources (Async)

```python
# Get specific data sources (processed concurrently)
data_source_ids = ["ds_abc123...", "ds_def456..."]
dataframes_dict = await async_n2p.from_notion_data_sources_to_dataframes(
    data_source_ids=data_source_ids,
    filter_params=published_filter  # Optional: apply filters
)
```

### Backward Compatibility

**All existing code continues to work!** If you don't specify a `data_source_id`, Notion2Pandas automatically:
1. Retrieves all data sources for the database
2. Selects the first one
3. Logs which data source is being used

This ensures seamless migration to the new API version.

## PageID and Row_Hash

In the pandas dataframe, you'll find two additional columns compared to those in the original database: **PageID** and **Row_Hash**.

- **PageID**: The unique ID of the Notion page for that entry
- **Row_Hash**: A calculated value based on the row's field values, used by `update_notion_DB_from_dataframe` to detect changes and avoid unnecessary API calls

⚠️ **Important:** Do not modify these columns, as changes can lead to malfunctions!
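
The exact hashing scheme is internal to the library, but the idea can be sketched as hashing each row's field values so that only changed rows trigger an API call (the `row_hash` helper below is illustrative, not the library's implementation):

```python
import hashlib

import pandas as pd

def row_hash(row: pd.Series) -> str:
    # hash the concatenated cell values; the library's actual scheme may differ
    payload = "|".join(str(value) for value in row.values)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

df = pd.DataFrame({"Name": ["Task 1"], "Status": ["To Do"]})
hash_before = row_hash(df.iloc[0])

df.loc[0, "Status"] = "Done"
hash_after = row_hash(df.iloc[0])
# the hashes now differ, so only this row would be pushed back to Notion
```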

## Utility Functions

Notion2Pandas extends the [Client](https://github.com/ramnes/notion-sdk-py/blob/main/notion_client/client.py) (or [AsyncClient](https://github.com/ramnes/notion-sdk-py/blob/main/notion_client/client.py)) class from notion_client, so all notion_client features are available. Additionally, Notion2Pandas provides convenient wrapper functions:

### Database and Data Source Methods

All methods are available in both sync and async versions. Async methods require `await`.

* `get_data_source_ids(database_id)` - Get all data sources in a database
* `get_database_columns(database_id, data_source_id=None)` - Get columns/properties
* `from_notion_database_to_dataframes(database_id, **kwargs)` - Get all data sources as dict of DataFrames
* `from_notion_data_sources_to_dataframes(data_source_ids, **kwargs)` - Get specific data sources as dict

### Page Methods


* `create_page(parent_id, properties=None, parent_type='data_source_id')`
* `update_page(page_id, **kwargs)`
* `retrieve_page(page_id)`
* `delete_page(page_id)`
* `delete_rows_and_pages(dataframe, rows_to_delete_indexes)`

### Block Methods

* `retrieve_block(block_id)`
* `retrieve_block_children_list(block_id)`
* `update_block(block_id, field, field_value_updated)`

## Read/Write Functions

Notion2Pandas automatically parses Notion data types, but you can customize this behavior. Each Notion data type is associated with a tuple of two functions: one for reading and one for writing.

### Example: Custom Date Parsing

Parse only the start date from date ranges:

```python
def date_read_only_start(notion_property):
    return notion_property.get('date').get('start') if notion_property.get('date') is not None else '' 

def date_write_only_start(row_value):
    return {'date': {'start': row_value} if row_value != '' else None}, True

# Works for both sync and async clients
n2p.set_lambdas('date', date_read_only_start, date_write_only_start)
# or
async_n2p.set_lambdas('date', date_read_only_start, date_write_only_start)
```

### Function Signatures

Read and write functions can accept up to three arguments:
- **Primary argument**: `notion_property` (read) or `row_value` (write) - the data being processed
- `column_name` (optional): the column name, useful for column-specific logic
- `n2p` (optional): the Notion2PandasClient instance

**Return values:**
- Read functions: Return the value to insert into the DataFrame
- Write functions: Return a tuple `(value_for_notion, should_update_bool)`

⚠️ **Important:** Arguments must always be in this order: `(notion_property/row_value, column_name, n2p)`
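
For instance, a minimal read/write pair for `checkbox` following these conventions might look like this (the payload shapes mirror the date example above; treat this as a sketch, not the library's built-in functions):

```python
def checkbox_read(notion_property):
    # read function: extract the boolean from the Notion property payload
    return notion_property.get("checkbox", False)

def checkbox_write(row_value):
    # write function: the second tuple element tells the client
    # whether the value should actually be pushed to Notion
    return {"checkbox": bool(row_value)}, True

value = checkbox_read({"checkbox": True})
payload, should_update = checkbox_write(value)
```

The pair would then be registered with `n2p.set_lambdas('checkbox', checkbox_read, checkbox_write)`, exactly as in the date example.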

### Example: Column-Specific Logic

Handle different columns differently:

```python
import ast

def relation_read(notion_property: dict, column_name: str) -> str:
    relations = notion_property.get('relation', [])
    relation_ids = [relation.get('id') for relation in relations]
    
    # Special handling for single-relation columns
    if column_name == 'Test table 1':
        if len(relation_ids) > 0:
            return relation_ids[0]
        return ''
    
    return str(relation_ids)

def relation_write(row_value: str, column_name: str):
    if row_value == '':
        return {"relation": []}, True
    
    if column_name == 'Test table 1':
        return {"relation": [{"id": row_value}]}, True
    
    notion_relations = ast.literal_eval(row_value)
    relation_ids = [{"id": notion_relation} for notion_relation in notion_relations]
    
    return {"relation": relation_ids}, True

n2p.set_lambdas('relation', relation_read, relation_write)
```

### Rich Text and Title Handling

Notion2Pandas preserves formatting and mentions in `rich_text` and `title` fields using Markdown-like syntax:

- **Bold**: `**text**`
- **Italic**: `*text*`
- **Underline**: `<u>text</u>`
- **Strikethrough**: `~~text~~`
- **Code**: `<code>text</code>`
- **Color**: `<span style="color:{color}">text</span>`
- **Links**: `[text](url)`
- **Equations**: `$expression$`
- **Mentions**:
  - Users: `<notion-user id="{user_id}" name="{name}" />`
  - Pages: `<notion-page id="{page_id}" title="{title}" href="{url}" />`

This allows you to edit formatted text in DataFrames while preserving all formatting when writing back to Notion.
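
In practice that means a rich-text cell can be edited as ordinary Markdown-like text; a purely local sketch:

```python
import pandas as pd

# a DataFrame cell holding Notion rich text in the Markdown-like syntax above
df = pd.DataFrame({"Notes": ["**Urgent**: review the [spec](https://example.com)"]})

# edit the formatting directly as text; on the next update call it is
# converted back into Notion rich text with the formatting preserved
df.loc[0, "Notes"] = df.loc[0, "Notes"].replace("**Urgent**", "~~Done~~")
```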

💡 **Tip:** To implement custom parsing, start with the original functions from [n2p_read_write.py](https://gitlab.com/Jaeger87/notion2pandas/-/blob/main/notion2pandas/n2p_read_write.py) and modify them.

### Supported Data Types

| Notion Data Type | Function Key         |
|------------------|---------------------|
| Title            | title               |
| Rich Text        | rich_text           |
| Checkbox         | checkbox            |
| Number           | number              |
| Date             | date                |
| Date Range       | date_range          |
| Select           | select              |
| Multi Select     | multi_select        |
| Status           | status              |
| Email            | email               |
| People           | people              |
| Phone Number     | phone_number        |
| URL              | url                 |
| Relation         | relation            |
| Rollup           | rollup              |
| Files            | files               |
| Formula          | formula             |
| String           | string              |
| Unique ID        | unique_id           |
| Button           | button              |
| Created By       | created_by          |
| Created Time     | created_time        |
| Last Edited By   | last_edited_by      |
| Last Edited Time | last_edited_time    |
| Place            | place               |

## Adding and Removing Rows

### Adding Rows

When you add a row to the DataFrame and call `update_notion_DB_from_dataframe`, Notion2Pandas automatically creates a new page in Notion.

Both sync and async clients work the same way - just remember to use `await` with the async client.
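
A local sketch of adding a row (this assumes an empty `PageID` is what marks a row as a page still to be created — verify against your version before relying on it):

```python
import pandas as pd

df = pd.DataFrame({
    "PageID": ["abc123"],
    "Row_Hash": [""],
    "Name": ["Task 1"],
    "Status": ["Done"],
})

# assumption: an empty PageID marks the row as a new page to create
new_row = {"PageID": "", "Row_Hash": "", "Name": "Task 2", "Status": "To Do"}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# then push the DataFrame back:
# n2p.update_notion_DB_from_dataframe(database_id, df)
```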

### Removing Rows (Sync)

Deleting rows from the DataFrame doesn't automatically delete pages from Notion. Use the `delete_rows_and_pages` method:

```python
# Delete specific rows and their corresponding Notion pages
rows_to_delete = [0, 5, 10]  # DataFrame row indices
n2p.delete_rows_and_pages(df, rows_to_delete)
```

### Removing Rows (Async)

```python
# Delete specific rows and their corresponding Notion pages
rows_to_delete = [0, 5, 10]  # DataFrame row indices
await async_n2p.delete_rows_and_pages(df, rows_to_delete)
```

This method:
1. Deletes the pages from Notion
2. Removes the rows from the DataFrame
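
The DataFrame-side effect is roughly equivalent to dropping the rows and resetting the index, sketched locally below; the page deletion itself requires the client:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["a", "b", "c", "d"]})
rows_to_delete = [0, 2]

# what delete_rows_and_pages does to the DataFrame (a sketch; the real
# method also archives the matching Notion pages)
df = df.drop(index=rows_to_delete).reset_index(drop=True)
```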

## Adding Page Data to the DataFrame

Sometimes you need data from the Notion page itself (not just database properties). You can add custom columns during DataFrame creation:

### Sync Example

```python
from notion2pandas import Notion2PandasClient

def get_cover_page(notion_page):
    """Extract the cover image URL from a Notion page"""
    cover_obj = notion_page.get('cover')
    if cover_obj is None:
        return ''
    cover_type = cover_obj.get('type')
    if cover_type == 'external':
        return cover_obj.get('external').get('url')
    if cover_type == 'file':
        return cover_obj.get('file').get('url')
    return ''

def get_icon_page(notion_page):
    """Extract the icon from a Notion page"""
    icon_obj = notion_page.get('icon')
    if icon_obj is None:
        return ''
    icon_type = icon_obj.get('type')
    if icon_type == 'external':
        return icon_obj.get('external').get('url')
    if icon_type == 'file':
        return icon_obj.get('file').get('url')
    if icon_type == 'emoji':
        return icon_obj.get('emoji')
    return ''

def get_image_url(notion_blocks):
    """Extract the first image URL from page blocks"""
    if notion_blocks is None:
        return ''
    for block in notion_blocks.get('results', []):
        if block.get('type') == 'image':
            image = block.get('image')
            image_type = image.get('type')
            if image_type == 'file':
                return image.get('file').get('url')
    return ''

# Define custom columns
custom_page_prop = {
    'icon': get_icon_page,
    'cover': get_cover_page
}

custom_block_prop = {
    'inside_image': get_image_url
}

# Create DataFrame with custom columns
n2p = Notion2PandasClient(auth='token')
df = n2p.from_notion_DB_to_dataframe_kwargs(
    'database_id',
    columns_from_page=custom_page_prop,
    columns_from_blocks=custom_block_prop
)
```

### Async Example

```python
from notion2pandas.async_client import AsyncNotion2PandasClient

# Use the same custom functions as above

# Define custom columns
custom_page_prop = {
    'icon': get_icon_page,
    'cover': get_cover_page
}

custom_block_prop = {
    'inside_image': get_image_url
}

# Create DataFrame with custom columns (processes pages concurrently)
async_n2p = AsyncNotion2PandasClient(auth='token')
df = await async_n2p.from_notion_DB_to_dataframe_kwargs(
    'database_id',
    columns_from_page=custom_page_prop,
    columns_from_blocks=custom_block_prop
)
```

### Important Notes

⚠️ **Performance Warning:** Using `columns_from_page` or `columns_from_blocks` results in:
- **One API call per row** for `columns_from_page`
- **One API call per row** for `columns_from_blocks`
- Using both means **two API calls per row**

💡 **Async Advantage:** The async client processes these calls concurrently, significantly reducing total execution time for large databases.

🔒 **Read-Only:** Custom columns are **read-only**. Modifying their values in the DataFrame will **not** update Notion. Use the appropriate Notion API methods to update this data.

## Notion Executor

The `_notion_executor` method handles all Notion API calls with automatic retry logic for:
- Network issues
- Rate limits
- Internal server errors
- Other transient failures
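
The retry behavior can be pictured as a fixed-delay loop over at most `maxAttemptsExecutioner` attempts. A sketch with illustrative names (`with_retries` is not a library function):

```python
import time

def with_retries(call, max_attempts=3, seconds_to_retry=0):
    # re-run the call on failure, waiting between attempts, much like the
    # executor does with maxAttemptsExecutioner / secondsToRetry
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(seconds_to_retry)

attempts = []

def flaky_call():
    # fails twice, then succeeds: a stand-in for a transient API error
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("transient error")
    return "ok"

result = with_retries(flaky_call)
```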

### Configuration (Sync)

```python
n2p = Notion2PandasClient(
    auth=token,
    secondsToRetry=20,        # Wait 20 seconds between retries
    maxAttemptsExecutioner=10  # Try up to 10 times
)
```

### Configuration (Async)

```python
async_n2p = AsyncNotion2PandasClient(
    auth=token,
    secondsToRetry=20,         # Wait 20 seconds between retries
    maxAttemptsExecutioner=10,  # Try up to 10 times
    max_concurrent_requests=10  # Async-specific: concurrent requests limit
)
```

**Default values:**
- `secondsToRetry`: 30 seconds
- `maxAttemptsExecutioner`: 3 attempts
- `max_concurrent_requests`: 10 (async only)

## Logging

Both `Notion2PandasClient` and `AsyncNotion2PandasClient` use the built-in logger from `NotionClient` to provide helpful debug and info messages during execution.

### Option 1: Set Log Level (Simple)

```python
import logging
from notion2pandas import Notion2PandasClient
from notion2pandas.async_client import AsyncNotion2PandasClient

# Sync
n2p = Notion2PandasClient(auth="your_token", log_level=logging.DEBUG)

# Async
async_n2p = AsyncNotion2PandasClient(auth="your_token", log_level=logging.DEBUG)
```

### Option 2: Custom Logger (Advanced)

For full control over logging behavior:

```python
import logging
from notion2pandas import Notion2PandasClient

# Create a custom logger
logger = logging.getLogger("notion2pandas")
logger.setLevel(logging.DEBUG)

# Create handler (e.g., output to stdout)
handler = logging.StreamHandler()

# Define custom format
formatter = logging.Formatter("[%(levelname)s] %(asctime)s - %(message)s")
handler.setFormatter(formatter)

# Add handler to logger
logger.addHandler(handler)

# Pass the logger to the client (works for both sync and async)
n2p = Notion2PandasClient(auth="your_token", logger=logger)
```

**Note:** If both `logger` and `log_level` are provided, the custom logger takes precedence.

## Migration Guide: API 2025-09-03

### What Changed?

The Notion API now supports **multi-source databases**, where a single database can contain multiple data sources. This is a significant architectural change.

### Do I Need to Change My Code?

**No!** Notion2Pandas maintains full backward compatibility. Existing code will:
1. Automatically detect and use the first data source
2. Log which data source is being used
3. Work exactly as before

### When Should I Specify data_source_id?

Specify `data_source_id` when you need to:
- Work with a specific data source in a multi-source database
- Ensure you're querying the correct data source
- Work with multiple data sources programmatically

### Example Migration

**Before (still works):**
```python
df = n2p.from_notion_DB_to_dataframe(database_id)
# Async
df = await async_n2p.from_notion_DB_to_dataframe(database_id)
```

**After (explicit data source):**
```python
# Get available data sources
data_sources = n2p.get_data_source_ids(database_id)
# Async
data_sources = await async_n2p.get_data_source_ids(database_id)

print(f"Available sources: {data_sources}")

# Use specific data source
df = n2p.from_notion_DB_to_dataframe_kwargs(
    database_id=database_id,
    data_source_id=data_sources[0]['id']
)
# Async
df = await async_n2p.from_notion_DB_to_dataframe_kwargs(
    database_id=database_id,
    data_source_id=data_sources[0]['id']
)
```

**Working with all data sources:**
```python
# Get all data sources as separate DataFrames
all_dfs = n2p.from_notion_database_to_dataframes(database_id)
# Async (processed concurrently)
all_dfs = await async_n2p.from_notion_database_to_dataframes(database_id)

for ds_id, df in all_dfs.items():
    print(f"Processing data source: {ds_id}")
    # Work with each DataFrame separately
```

## Roadmap

Planned features for upcoming releases:
- Handling Notion's rate limit of 2,700 API calls per 15 minutes
- Custom DataFrame implementations

## Changelog

View the complete version history on the [changelog page](https://gitlab.com/Jaeger87/notion2pandas/-/blob/main/CHANGELOG.md?ref_type=heads).

## Support

Notion2Pandas is an open-source project. Contributions are welcome!

- **Report Issues**: Found a bug? [Open an issue](https://gitlab.com/Jaeger87/notion2pandas/-/issues)
- **Propose Changes**: Have an improvement? [Submit a merge request](https://gitlab.com/Jaeger87/notion2pandas/-/merge_requests)
- **Fork the Project**: Disagree with the direction? You're free to fork with our blessing!

All proposals will be evaluated and responded to.

## License

This project is open-source and available under the MIT License.