Metadata-Version: 2.4
Name: folio_data_import
Version: 0.6.0
Summary: A python module to perform bulk import of data into a FOLIO environment. Currently supports MARC and user data import.
Author: Brooks Travis
Author-email: Brooks Travis <brooks.travis@gmail.com>
License-Expression: MIT
Requires-Dist: folioclient>=1.0.3,<2.0
Requires-Dist: pymarc>=5.2.2
Requires-Dist: pyhumps>=3.8.0
Requires-Dist: tabulate>=0.9.0
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: flake8-black>=0.3.6
Requires-Dist: flake8-bugbear>=24.8.19
Requires-Dist: flake8-bandit>=4.1.1
Requires-Dist: flake8-isort>=6.1.1
Requires-Dist: flake8-docstrings>=1.7.0
Requires-Dist: cyclopts>=4.2.0
Requires-Dist: questionary>=2.1.1
Requires-Dist: folio-uuid>=1.0.0
Requires-Dist: pydantic>=2.12.3
Requires-Dist: psycopg2-binary>=2.9.11 ; extra == 'postgres'
Requires-Dist: redis>=7.1.0 ; extra == 'redis'
Requires-Python: >=3.10, <4.0
Provides-Extra: postgres
Provides-Extra: redis
Description-Content-Type: text/markdown

# folio_data_import

## Description

This project is designed to import data into the FOLIO LSP. It provides a simple and efficient way to import data from various sources using FOLIO's REST APIs.

## Features

- Import MARC records using FOLIO's Data Import system
- Import User records using FOLIO's User APIs
- Batch post Instances, Holdings, and Items to FOLIO inventory storage

## Installation

## Installation

Using `pip`
```shell
pip install folio_data_import
```
or `uv pip`
```shell
uv pip install folio_data_import
```

To install the project from the git repo using Poetry, follow these steps:

1. Clone the repository.
2. Navigate to the project directory: `$ cd /path/to/folio_data_import`.
3. [Install `uv`](https://docs.astral.sh/uv/getting-started/installation/) if you haven't already.
4. Install the project and its dependencies: `$ uv sync`.
5. Run the application using Poetry: `$ uv run folio-data-import --help`.

Make sure to activate the virtual environment created by `uv` before running the application.

## Usage

This package provides CLI commands for importing data into FOLIO:

```shell
# Main command with subcommands
folio-data-import <subcommand> [options]

# Or use standalone commands
folio-user-import [options]
folio-marc-import [options]
folio-batch-poster [options]
```

**Tab Completion:** Install shell completions for better CLI experience:
```shell
folio-data-import --install-completion
```

### Environment Variables

All commands support environment variables for FOLIO connection credentials, allowing you to avoid repeating these parameters:

```shell
export FOLIO_GATEWAY_URL="https://folio-snapshot-okapi.dev.folio.org"
export FOLIO_TENANT_ID="diku"
export FOLIO_USERNAME="diku_admin"
export FOLIO_PASSWORD="admin"
```

Once set, you can omit these parameters from your commands:

```shell
# Instead of:
folio-data-import users --gateway-url "..." --tenant-id "..." --username "..." --password "..." --user-file users.jsonl

# You can simply use:
folio-data-import users --user-file users.jsonl
```

This works for all subcommands: `users`, `marc`, and `batch-poster`.

## CLI Commands

### folio-data-import users

**Alias:** `folio-user-import`

Import users to FOLIO with extended functionality beyond `mod-user-import`.

#### Quick Start

```shell
folio-data-import users \
  --gateway-url "https://folio-snapshot-okapi.dev.folio.org" \
  --tenant-id diku \
  --username diku_admin \
  --password admin \
  --user-file users.jsonl
```

#### Features

**Service Point Management:** Specify service points using codes instead of UUIDs:
```
{
    "username": "checkin-all",
    "barcode": "1728439497039848103",
    "active": true,
    "type": "patron",
    "patronGroup": "staff",
    "departments": [],
    "personal": {
        "lastName": "Admin",
        "firstName": "checkin-all",
        "addresses": [
          {
            "countryId": "HU",
            "addressLine1": "Andrássy Street 1.",
            "addressLine2": "",
            "city": "Budapest",
            "region": "Pest",
            "postalCode": "1061",
            "addressTypeId": "Home",
            "primaryAddress": true
          }
        ],
        "preferredContactTypeId": "email"
    },
    "requestPreference": {
        "holdShelf": true,
        "delivery": false,
        "fulfillment": "Hold Shelf"
    }
    "servicePointsUser": {
        "defaultServicePointId": "cd1",
        "servicePointsIds": [
            "cd1",
            "Online",
            "000",
            "cd2"
        ]
    }
}
```

**Flexible Matching:** Match users by `id`, `externalSystemId`, `username`, or `barcode`:
```shell
folio-data-import users --user-file users.jsonl --user-match-key username
```

**Preferred Contact Type:** Accepts FOLIO IDs or human-friendly strings (`mail`, `email`, `text`, `phone`, `mobile`). Set a default for users without a valid value:
```shell
folio-data-import users --user-file users.jsonl --default-preferred-contact-type email
```

**Update Behavior:** Control how existing records are updated:

> **⚠️ Breaking change in v0.6.0:** Prior versions incorrectly merged incoming records into existing records, preserving fields absent from the incoming data. Starting in v0.6.0, the default behavior is full replacement — fields not present in the incoming record are removed. If you depend on the previous merge behavior, use `--update-only-present-fields`.

- **Full replacement** (default): The incoming record completely replaces the existing record. Fields present in the existing record but absent from the incoming record are removed. The preferred contact type is preserved from the existing record when not specified in the incoming record.
  ```shell
  folio-data-import users --user-file users.jsonl
  ```

- **Partial update**: Only fields present in the incoming record are updated; missing fields are preserved from the existing record.
  ```shell
  folio-data-import users --user-file users.jsonl --update-only-present-fields
  ```

**Field Protection:** Protect specific fields from being updated:

- **Job-level protection** (applies to all records):
  ```shell
  folio-data-import users --user-file users.jsonl \
    --fields-to-protect "personal.preferredFirstName,barcode"
  ```

- **Per-record protection** (using custom field `protectedFields`):
  ```json
  {
    "username": "jdoe",
    "customFields": {
      "protectedFields": "barcode,personal.telephone,personal.addresses"
    }
  }
  ```

**User Deletion:** Delete users listed in the input file(s) instead of creating or updating them:
```shell
folio-data-import users --user-file users.jsonl --delete-all
```

- Only users that exist in FOLIO and match via the configured match key are deleted
- Staff and system users are automatically skipped to prevent accidental removal
- Associated request preferences, permission user records, and service point assignments are also removed
- An interactive confirmation prompt is displayed before deletions begin; use `--yes`/`-y` to skip it (required in non-interactive environments)

#### Input Format

JSON Lines format - one user object in the style<sup>*</sup> of [mod-user-import](https://github.com/folio-org/mod-user-import) (with extended support mentioned above) per line

<sup>*</sup>also supports dereferenced (UUIDs instead of reference strings) user objects (eg. directly extracted from `/users`)

### folio-data-import marc

**Alias:** `folio-marc-import`

Import binary MARC21 records via FOLIO's Data Import system using the [change-manager](https://github.com/folio-org/mod-source-record-manager?tab=readme-ov-file#data-import-workflow) APIs.

#### Quick Start

```shell
folio-data-import marc \
  --gateway-url "https://folio-snapshot-okapi.dev.folio.org" \
  --tenant-id diku \
  --username diku_admin \
  --password admin \
  --marc-source-path records.mrc
```

The command will prompt you to select a [Data Import Job Profile](https://docs.folio.org/docs/metadata/additional-topics/jobprofiles/) configured in your FOLIO tenant.

#### Features

- Process single files or entire directories of MARC files
- Interactive job profile selection
- Real-time progress tracking
- Automatic retry on transient errors

**Note:** FOLIO's import logs can be unreliable. If you don't see a job summary when your job completes, check Data Import in FOLIO (Data Import > Actions > View all logs...).

### folio-data-import batch-poster

**Alias:** `folio-batch-poster`

Efficiently batch post Instances, Holdings, and Items to FOLIO's inventory storage endpoints with support for creating new records and updating existing ones.

#### Quick Start

```shell
folio-data-import batch-poster \
  --gateway-url "https://folio-snapshot-okapi.dev.folio.org" \
  --tenant-id diku \
  --username diku_admin \
  --password admin \
  --object-type Items \
  --file-paths items.jsonl \
  --batch-size 100 \
  --upsert
```

#### Key Features

- **Multiple File Support**: Process multiple files with glob patterns
  ```shell
  folio-data-import batch-poster --object-type Items --file-paths "items_*.jsonl"
  ```

- **Upsert Mode**: Create new records or update existing ones
  ```shell
  folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert
  ```

- **Field Preservation**: Control which fields are preserved during updates
  ```shell
  folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert \
    --preserve-statistical-codes \
    --preserve-administrative-notes \
    --preserve-temporary-locations \
    --overwrite-item-status
  ```

- **Selective Patching**: Update only specific fields
  ```shell
  folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert \
    --patch-existing-records \
    --patch-paths "barcode,status,itemLevelCallNumber"
  ```

- **Failed Records**: Automatically save failed records to a file
  ```shell
  folio-data-import batch-poster --object-type Items --file-paths items.jsonl \
    --failed-records-file failed_items.jsonl
  ```

- **Progress Tracking**: Real-time progress bar with statistics
  - Disable with `--no-progress` for CI/CD environments

- **Config File Support**: Use a JSON config file for complex configurations
  ```shell
  folio-data-import batch-poster config.json
  ```
  
  Example `config.json`:
  ```json
  {
    "object_type": "Items",
    "file_paths": ["items1.jsonl", "items2.jsonl"],
    "batch_size": 100,
    "upsert": true,
    "preserve_statistical_codes": true,
    "preserve_item_status": true,
    "failed_records_file": "failed_items.jsonl"
  }
  ```

#### Input Format

Input files should be JSONL (JSON Lines) format - one complete JSON object per line:

```jsonl
{"id": "item-001", "barcode": "12345", "status": {"name": "Available"}}
{"id": "item-002", "barcode": "12346", "status": {"name": "Available"}}
{"id": "item-003", "barcode": "12347", "status": {"name": "Checked out"}}
```

#### Common Use Cases

**Create new items:**
```shell
folio-data-import batch-poster --object-type Items --file-paths new_items.jsonl
```

**Update existing items (by ID):**
```shell
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert
```

**Update only barcodes and call numbers:**
```shell
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert \
  --patch-existing-records \
  --patch-paths "barcode,itemLevelCallNumber"
```

**Process multiple files:**
```shell
folio-data-import batch-poster --object-type Holdings \
  --file-paths holdings_*.jsonl \
  --batch-size 500 \
  --upsert
```

#### Available Options

Run `folio-data-import batch-poster --help` to see all available options:
- `--object-type`: Type of inventory object (Items, Holdings, or Instances) - **Required**
- `--file-paths`: Path(s) to JSONL file(s) - supports glob patterns - **Required**
- `--batch-size`: Number of records per batch (1-1000, default: 100)
- `--upsert`: Enable create-or-update mode
- `--preserve-statistical-codes`: Keep existing statistical codes during updates
- `--preserve-administrative-notes`: Keep existing administrative notes
- `--preserve-temporary-locations`: Keep temporary location (Items only)
- `--preserve-temporary-loan-types`: Keep temporary loan type (Items only)
- `--preserve-item-status`: Keep item status (Items only, default: true)
- `--patch-existing-records`: Enable selective field patching
- `--patch-paths`: Comma-separated list of fields to patch
- `--failed-records-file`: Path to save failed records
- `--no-progress`: Disable progress bar (useful for CI/CD)

## Programmatic Usage

All CLI commands can also be used programmatically in your Python applications.

### BatchPoster

```python
import asyncio
from folioclient import FolioClient
from folio_data_import.BatchPoster import BatchPoster

async def post_items():
    # Create FOLIO client
    folio = FolioClient(
        okapi_url="https://folio-snapshot-okapi.dev.folio.org",
        tenant_id="diku",
        username="diku_admin",
        password="admin"
    )
    
    # Configure batch poster
    config = BatchPoster.Config(
        object_type="Items",
        batch_size=100,
        upsert=True,
        preserve_statistical_codes=True
    )
    
    # Post records
    async with BatchPoster(
        folio, 
        config,
        failed_records_file="failed_items.jsonl"
    ) as poster:
        await poster.post_records("items.jsonl")
        print(f"Posted: {poster.stats.records_posted}, Failed: {poster.stats.records_failed}")

asyncio.run(post_items())
```

### UserImporter

```python
import asyncio
from folioclient import FolioClient
from folio_data_import.UserImport import UserImporter

async def import_users():
    folio = FolioClient(
        okapi_url="https://folio-snapshot-okapi.dev.folio.org",
        tenant_id="diku",
        username="diku_admin",
        password="admin"
    )
    
    config = UserImporter.Config(
        user_file="users.jsonl",
        user_match_key="username",
        default_preferred_contact_type="email"
    )
    
    importer = UserImporter(folio, config)
    result = await importer.do_work()
    print(f"Imported {result.records_created} users")

asyncio.run(import_users())
```

### MARCImportJob

```python
import asyncio
from folioclient import FolioClient
from folio_data_import.MARCDataImport import MARCImportJob

async def import_marc():
    folio = FolioClient(
        okapi_url="https://folio-snapshot-okapi.dev.folio.org",
        tenant_id="diku",
        username="diku_admin",
        password="admin"
    )
    
    config = MARCImportJob.Config(
        marc_source_path="records.mrc",
        job_profile_id="profile-uuid",
        job_profile_name="Bibliographic records"
    )
    
    job = MARCImportJob(folio, config)
    result = await job.start_import()
    print(f"Imported {result.created_records} MARC records")

asyncio.run(import_marc())
```

### Additional Documentation

For complete API documentation and advanced usage, [visit our full documentation](https://folio-data-import.readthedocs.io/en/latest/).

## Contributing

Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.

## License

This project is licensed under the [MIT License](LICENSE).
