Metadata-Version: 2.4
Name: pipewell-jira-ingest
Version: 1.0.1
Summary: Async Jira data pipeline supporting Cloud and Data Center, with pluggable multi-protocol output
Project-URL: Homepage, https://github.com/pipewell/jira-ingest
Project-URL: Repository, https://github.com/pipewell/jira-ingest
Project-URL: Documentation, https://github.com/pipewell/jira-ingest#readme
Project-URL: Changelog, https://github.com/pipewell/jira-ingest/releases
Project-URL: Bug Tracker, https://github.com/pipewell/jira-ingest/issues
Author-email: Olumide Ibilaiye <donolu@gmail.com>
License: MIT
License-File: LICENSE
Keywords: asyncio,data-engineering,etl,jira,sqlalchemy
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: adlfs>=2024.2
Requires-Dist: aiocache>=0.12
Requires-Dist: aiofiles>=23.0
Requires-Dist: aiohttp>=3.9
Requires-Dist: click>=8.1
Requires-Dist: fsspec>=2024.3
Requires-Dist: gcsfs>=2024.3
Requires-Dist: pandas>=2.2
Requires-Dist: pyarrow>=15.0
Requires-Dist: pydantic-settings>=2.2
Requires-Dist: pydantic>=2.6
Requires-Dist: python-dateutil>=2.9
Requires-Dist: s3fs>=2024.3
Requires-Dist: tenacity>=8.2
Provides-Extra: database
Requires-Dist: sqlalchemy>=2.0; extra == 'database'
Provides-Extra: dev
Requires-Dist: mypy>=1.9; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.1; extra == 'dev'
Requires-Dist: python-dotenv>=1.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: sqlalchemy>=2.0; extra == 'dev'
Provides-Extra: redshift
Requires-Dist: redshift-connector>=2.1; extra == 'redshift'
Requires-Dist: sqlalchemy>=2.0; extra == 'redshift'
Description-Content-Type: text/markdown

# jira-ingest

[![PyPI version](https://img.shields.io/pypi/v/pipewell-jira-ingest.svg)](https://pypi.org/project/pipewell-jira-ingest/)
[![Python versions](https://img.shields.io/pypi/pyversions/pipewell-jira-ingest.svg)](https://pypi.org/project/pipewell-jira-ingest/)
[![Licence: MIT](https://img.shields.io/badge/Licence-MIT-blue.svg)](LICENSE)

Async Jira data pipeline for Data Center and Cloud. Fetches projects, releases, boards, issues, and transitions; writes Parquet, CSV, or JSON Lines to local disk, S3, Azure Blob, or GCS; and optionally loads records into any SQLAlchemy-compatible database.

## Features

- Dual-mode: Jira Data Center (Bearer PAT + optional mTLS) and Jira Cloud (Basic Auth)
- Async fetching with concurrent project processing, in-memory caching, and exponential-backoff retry
- Configurable custom-field extraction -- map any `customfield_XXXXX` ID to a logical name
- PII hashing for assignee and author fields
- Output formats: Parquet (Snappy), CSV, JSON Lines
- Output destinations: local filesystem, S3, Azure Blob, GCS via [fsspec](https://filesystem-spec.readthedocs.io)
- Pluggable database loader: PostgreSQL, Redshift (with S3 COPY fast path), Snowflake, DuckDB, SQLite
- Click CLI with `run` and `validate` commands
- Pydantic v2 settings and data schemas
- ruff + mypy strict + pre-commit + GitHub Actions CI
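
To illustrate the exponential-backoff retry mentioned in the list above, here is a minimal sketch using only the standard library. It is not the package's implementation (the package lists `tenacity` as a dependency); the function name `retry_with_backoff`, its parameters, and the jitter factor are assumptions made for illustration.

```python
import asyncio
import random


async def retry_with_backoff(func, *, attempts=5, base_delay=0.5, max_delay=30.0):
    """Await func(), retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; not part of jira-ingest.
    """
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so concurrent workers do not retry in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
```

A caller would wrap each request coroutine, for example `await retry_with_backoff(lambda: fetch_page(url))`, where `fetch_page` is a placeholder for your own coroutine.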

## Quick start

```bash
pip install pipewell-jira-ingest
cp .env.example .env   # edit with your Jira URL and credentials
jira-ingest validate   # confirm connectivity
jira-ingest run        # fetch everything and write to ./output
```

For database loading, install one of the optional extras:

```bash
pip install "pipewell-jira-ingest[database]"   # PostgreSQL, SQLite, etc.
pip install "pipewell-jira-ingest[redshift]"   # Redshift with S3 COPY fast path
```

## Documentation

| Guide | Description |
|---|---|
| [Authentication](https://github.com/pipewell/jira-ingest/blob/main/docs/authentication.md) | Jira Cloud vs Data Center, PAT vs Basic Auth, mTLS certificates, scoping by project |
| [Output sinks](https://github.com/pipewell/jira-ingest/blob/main/docs/sinks.md) | Local filesystem, S3, Azure Blob, GCS -- URIs, auth options, output layout |
| [Database loading](https://github.com/pipewell/jira-ingest/blob/main/docs/database-loading.md) | PostgreSQL, Redshift S3 COPY, Snowflake, DuckDB, SQLite; programmatic API |
| [Custom fields](https://github.com/pipewell/jira-ingest/blob/main/docs/custom-fields.md) | Mapping `customfield_XXXXX` IDs to logical names, finding field IDs |

## Configuration reference

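Settings are read from environment variables (or a `.env` file). Jira and output settings use the prefix `JIRA_`; the database-loading settings (`DATABASE_URL`, `DATABASE_SCHEMA`, `REDSHIFT_IAM_ROLE`) are listed without it.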

| Variable | Default | Description |
|---|---|---|
| `JIRA_MODE` | `cloud` | `cloud` or `dc` |
| `JIRA_URL` | required | Jira base URL |
| `JIRA_API_TOKEN` | required | API token (Cloud) or PAT (DC) |
| `JIRA_EMAIL` | required for Cloud | Account email |
| `JIRA_CERT_PEM` | | Base64-encoded PEM for mTLS (DC only) |
| `JIRA_PROJECT_KEYS` | all projects | Comma-separated project keys to scope the run |
| `JIRA_OUTPUT_FORMAT` | `parquet` | `parquet`, `csv`, or `jsonl` |
| `JIRA_SINK_URI` | `./output` | fsspec URI for output destination |
| `JIRA_SINK_OPTIONS` | `{}` | JSON dict of auth options forwarded to fsspec |
| `JIRA_CUSTOM_FIELDS` | `{}` | JSON dict mapping logical name to Jira field ID |
| `JIRA_LOG_LEVEL` | `INFO` | Log verbosity |
| `DATABASE_URL` | | SQLAlchemy URL to load into a database after writing |
| `DATABASE_SCHEMA` | | Target schema (used with `DATABASE_URL`) |
| `REDSHIFT_IAM_ROLE` | | IAM role ARN for Redshift S3 COPY |

## CLI

```
jira-ingest run [OPTIONS]

  --env-file TEXT            Path to .env file  [default: .env]
  --start-date TEXT          Filter issues from date (YYYY-MM-DD)
  --end-date TEXT            Filter issues until date (YYYY-MM-DD)
  --date-suffix TEXT         Output file date suffix  [default: today]
  --database-url TEXT        SQLAlchemy URL to load into a database
  --db-schema TEXT           Target database schema
  --redshift-iam-role TEXT   IAM role ARN for Redshift S3 COPY

jira-ingest validate [OPTIONS]

  --env-file TEXT            Path to .env file  [default: .env]
```
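
For scheduled, date-windowed runs, the options above can be assembled programmatically. This is a sketch only: `build_run_command` is a hypothetical helper, and whether `--end-date` is inclusive is not stated here, so check the behaviour before relying on a particular window.

```python
from datetime import date, timedelta


def build_run_command(start: date, end: date, env_file: str = ".env") -> list[str]:
    """Build the argument list for a date-windowed `jira-ingest run`.

    Hypothetical helper; uses only flags documented in the CLI section.
    """
    return [
        "jira-ingest", "run",
        "--env-file", env_file,
        "--start-date", start.isoformat(),  # YYYY-MM-DD
        "--end-date", end.isoformat(),      # YYYY-MM-DD
    ]


today = date.today()
command = build_run_command(today - timedelta(days=1), today)
print(" ".join(command))
```

The list can be handed to `subprocess.run(command, check=True)` from a scheduler or wrapper script.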

## Output layout

```
{JIRA_SINK_URI}/
  issues/issues_{date}.parquet
  projects/projects_{date}.parquet
  releases/releases_{date}.parquet
  boards/boards_{date}.parquet
  transitions/transitions_{date}.parquet
```
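
When reading the output downstream, the layout above can be turned into paths with a small helper. This is a sketch under assumptions: `dataset_path` is a hypothetical function, the `YYYY-MM-DD` form of the date suffix is a guess, and only the Parquet naming shown above is covered.

```python
# Dataset names taken from the layout above.
DATASETS = ("issues", "projects", "releases", "boards", "transitions")


def dataset_path(sink_uri: str, dataset: str, date_suffix: str) -> str:
    """Return the expected Parquet path for one dataset under the sink URI."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # Strip a trailing slash so the joined path has exactly one separator.
    return f"{sink_uri.rstrip('/')}/{dataset}/{dataset}_{date_suffix}.parquet"


for name in DATASETS:
    print(dataset_path("./output", name, "2024-05-01"))
```

A path built this way can be passed to `pandas.read_parquet`, which also accepts fsspec URIs such as `s3://...` together with `storage_options`.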

## Development

```bash
pip install -e ".[dev]"
pre-commit install
pytest
```
