Metadata-Version: 2.4
Name: bh-audit-logger
Version: 0.3.0
Summary: Cloud-agnostic Python audit logger for emitting PHI-safe behavioral healthcare audit events conforming to bh-audit-schema v1.1
Project-URL: Homepage, https://github.com/bh-healthcare/bh-audit-logger
Project-URL: Documentation, https://github.com/bh-healthcare/bh-audit-logger#readme
Project-URL: Repository, https://github.com/bh-healthcare/bh-audit-logger
Project-URL: Issues, https://github.com/bh-healthcare/bh-audit-logger/issues
Project-URL: Changelog, https://github.com/bh-healthcare/bh-audit-logger/blob/main/CHANGELOG.md
Author-email: BH Healthcare <oss@bh-healthcare.github.io>
License: Apache-2.0
License-File: LICENSE
Keywords: audit,behavioral-health,healthcare,hipaa,logging,phi
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Logging
Classifier: Typing :: Typed
Requires-Python: >=3.11
Provides-Extra: dev
Requires-Dist: mypy>=1.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: jsonschema
Requires-Dist: jsonschema>=4.0.0; extra == 'jsonschema'
Description-Content-Type: text/markdown

# bh-audit-logger

Cloud-agnostic Python utilities for emitting **privacy-preserving audit events** for behavioral healthcare systems.

Events conform to **bh-audit-schema v1.1**:
https://github.com/bh-healthcare/bh-audit-schema

## Why

Audit logging in healthcare is often inconsistent across services and jobs.
This library provides a small, boring, correct baseline for emitting structured audit events from **any Python code** — Lambdas, workers, CLIs, ETL jobs, cron scripts — without logging raw PHI.

It is **not tied to FastAPI** (see [bh-fastapi-audit](https://github.com/bh-healthcare/bh-fastapi-audit) for middleware-based logging).

## Quickstart

```bash
pip install bh-audit-logger
```

```python
from bh_audit_logger import AuditLogger, AuditLoggerConfig

logger = AuditLogger(
    config=AuditLoggerConfig(
        service_name="overstory-datalake",
        service_environment="prod",
    )
)

logger.audit(
    "READ",
    actor={"subject_id": "service_lambda", "subject_type": "service"},
    resource={"type": "Patient", "id": "patient_123"},
    outcome={"status": "SUCCESS"},
    correlation={"request_id": "req_abc"},
)
```

By default, each event is emitted as **one compact JSON line** via Python `logging` (stdout-friendly).

### Example output

```json
{"schema_version":"1.1","event_id":"6d3f0f6b-0c1a-4b9f-9d6f-9f6f7f5b2b0a","timestamp":"2026-03-28T12:00:00.000Z","service":{"name":"overstory-datalake","environment":"prod"},"actor":{"subject_id":"service_lambda","subject_type":"service"},"action":{"type":"READ","data_classification":"UNKNOWN"},"resource":{"type":"Patient","id":"patient_123"},"outcome":{"status":"SUCCESS"},"correlation":{"request_id":"req_abc"}}
```
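
For readers unfamiliar with the "one compact JSON line" convention, here is a minimal standard-library sketch of the idea. It does not use the library itself; the helper `to_compact_line` and the logger name `example.audit` are hypothetical names chosen for illustration.

```python
import json
import logging
import sys

# Message-only format so the log line is exactly the JSON document.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")


def to_compact_line(event: dict) -> str:
    """Serialize an event as a single line with no extra whitespace."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(event, separators=(",", ":"))


event = {
    "schema_version": "1.1",
    "action": {"type": "READ", "data_classification": "UNKNOWN"},
    "outcome": {"status": "SUCCESS"},
}
line = to_compact_line(event)

# Hypothetical logger name, used only in this sketch.
logging.getLogger("example.audit").info(line)
```

One event per line keeps the output easy to split and parse in line-oriented log collectors.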

## Production usage: container logging

```python
from bh_audit_logger import AuditLogger, AuditLoggerConfig, LoggingSink

logger = AuditLogger(
    config=AuditLoggerConfig(
        service_name="my-service",
        service_environment="prod",
    ),
    sink=LoggingSink(logger_name="bh.audit", level="INFO"),
)
```

This works anywhere stdout is collected, including **CloudWatch**, **GCP Cloud Logging**, **Azure Monitor**, and **Kubernetes logging pipelines**.
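
The following sketch shows one way to route the `bh.audit` logger to stdout with the standard `logging` module. It assumes that `LoggingSink` logs the JSON string as the message and does not install handlers of its own; check the library's behavior before relying on this.

```python
import logging
import sys

# Same name as the logger_name passed to LoggingSink above.
audit_log = logging.getLogger("bh.audit")
audit_log.setLevel(logging.INFO)

handler = logging.StreamHandler(sys.stdout)
# Emit the message only, so each output line stays parseable JSON.
handler.setFormatter(logging.Formatter("%(message)s"))
audit_log.addHandler(handler)

# Keep audit lines away from the root logger's handlers to avoid duplicates.
audit_log.propagate = False
```

Disabling propagation is a design choice: it keeps audit lines separate from application logs, which may use a different format.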

## Production hardening

### Frozen config

`AuditLoggerConfig` is frozen after creation (`@dataclass(frozen=True)`) to prevent runtime mutation of security settings:

```python
config = AuditLoggerConfig(
    service_name="my-service",
    metadata_allowlist=frozenset({"batch_id", "region"}),
)
config.sanitize_errors = False  # raises dataclasses.FrozenInstanceError (an AttributeError subclass)
```
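
To illustrate what a frozen dataclass gives you, here is a standalone sketch. `ExampleConfig` is a hypothetical stand-in, not the library's class. It also shows `dataclasses.replace`, the usual standard-library way to derive a modified copy of a frozen instance.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ExampleConfig:  # hypothetical stand-in for illustration only
    service_name: str
    sanitize_errors: bool = True
    metadata_allowlist: frozenset[str] = frozenset()


config = ExampleConfig(service_name="my-service")

blocked = False
try:
    config.sanitize_errors = False  # mutation attempt on a frozen instance
except FrozenInstanceError as exc:
    # FrozenInstanceError subclasses AttributeError.
    blocked = isinstance(exc, AttributeError)

# replace() builds a new instance and leaves the original untouched.
staging = replace(config, service_name="my-service-staging")
```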

### Sink failure isolation

By default, sink failures are logged but never propagate to your application logic:

```python
config = AuditLoggerConfig(
    service_name="my-service",
    emit_failure_mode="log",       # "silent", "log" (default), or "raise"
    failure_logger_name="bh.audit.internal",
)
```
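
The three modes can be pictured with the sketch below. It is not the library's implementation; `emit_with_isolation` and `broken_sink` are hypothetical names, and the choice to log only the exception type is an assumption made here for caution.

```python
import logging
from typing import Callable, Literal

FailureMode = Literal["silent", "log", "raise"]


def emit_with_isolation(
    emit: Callable[[dict], None],
    event: dict,
    mode: FailureMode = "log",
    failure_logger_name: str = "bh.audit.internal",
) -> bool:
    """Return True if the event was emitted, False if a failure was absorbed."""
    try:
        emit(event)
        return True
    except Exception as exc:
        if mode == "raise":
            raise
        if mode == "log":
            # Log the exception type only; the message could carry sensitive data.
            logging.getLogger(failure_logger_name).warning(
                "audit emit failed: %s", type(exc).__name__
            )
        return False


def broken_sink(event: dict) -> None:
    """Hypothetical sink that always fails."""
    raise OSError("disk full")
```

With `"silent"` or `"log"`, the caller's code path continues after a sink error; with `"raise"`, the original exception reaches the caller.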

### Metadata restrictions

Metadata values must be scalar JSON types (`str`, `int`, `float`, `bool`, `None`). Dict, list, and tuple values are silently dropped. Long strings are truncated:

```python
config = AuditLoggerConfig(
    service_name="my-service",
    metadata_allowlist=frozenset({"batch_id", "region"}),
    max_metadata_value_length=200,
)
```
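
The rules above (allowlisted keys only, scalar values only, long strings truncated) can be sketched as a small filter. This is an illustration under those stated rules, not the library's code; `filter_metadata` is a hypothetical helper.

```python
from typing import Any

SCALAR_TYPES = (str, int, float, bool, type(None))


def filter_metadata(
    metadata: dict[str, Any],
    allowlist: frozenset[str],
    max_value_length: int = 200,
) -> dict[str, Any]:
    """Keep allowlisted keys with scalar values; truncate long strings."""
    cleaned: dict[str, Any] = {}
    for key, value in metadata.items():
        if key not in allowlist:
            continue  # key was never opted in
        if not isinstance(value, SCALAR_TYPES):
            continue  # dicts, lists, tuples, and other objects are dropped
        if isinstance(value, str) and len(value) > max_value_length:
            value = value[:max_value_length]
        cleaned[key] = value
    return cleaned


result = filter_metadata(
    {"batch_id": "b-1", "region": "x" * 300, "tags": ["a"], "note": "hi"},
    allowlist=frozenset({"batch_id", "region", "tags"}),
)
# "tags" is allowlisted but not scalar; "note" is scalar but not allowlisted.
```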

### Internal counters

Track emission health via lightweight counters:

```python
logger = AuditLogger(config=config)
# ... emit events ...
print(logger.stats.snapshot())
# {"events_emitted_total": 42, "emit_failures_total": 0, "events_dropped_total": 0, "validation_failures_total": 0}
```
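
As a rough picture of what lightweight counters with a `snapshot()` method can look like, here is a thread-safe sketch. `ExampleStats` is hypothetical; only the counter names are taken from the output shown above.

```python
import threading


class ExampleStats:  # hypothetical stand-in, not the library's stats class
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = {
            "events_emitted_total": 0,
            "emit_failures_total": 0,
            "events_dropped_total": 0,
            "validation_failures_total": 0,
        }

    def increment(self, name: str, amount: int = 1) -> None:
        with self._lock:
            self._counts[name] += amount

    def snapshot(self) -> dict[str, int]:
        # Return a copy so callers cannot mutate internal state.
        with self._lock:
            return dict(self._counts)


stats = ExampleStats()
stats.increment("events_emitted_total", 42)
snap = stats.snapshot()
snap["events_emitted_total"] = 0  # changes the copy only
```

A snapshot like this can be polled from a health check, for example to alert when `emit_failures_total` grows.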

### Non-blocking async emission (optional)

v0.3 adds `EmitQueue` for non-blocking emission from async code:

```python
from bh_audit_logger import EmitQueue

# `sink`, `stats`, and `event` are assumed to exist already.
queue = EmitQueue(sink, stats, maxsize=5000)
queue.start()
queue.enqueue(event)
# ... later, from inside a coroutine ...
await queue.shutdown()
```
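
The general pattern behind a bounded, non-blocking emit queue can be sketched with `asyncio.Queue`. This is an assumption-laden illustration, not the library's `EmitQueue`: the class `ExampleEmitQueue`, its drop-on-full policy, and its counters are all hypothetical.

```python
import asyncio
from typing import Callable


class ExampleEmitQueue:  # hypothetical; not the library's EmitQueue
    def __init__(self, emit: Callable[[dict], None], maxsize: int = 5000) -> None:
        self._emit = emit
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
        self._task: asyncio.Task | None = None
        self.dropped = 0
        self.failed = 0

    def start(self) -> None:
        # Must be called while an event loop is running.
        self._task = asyncio.get_running_loop().create_task(self._drain())

    def enqueue(self, event: dict) -> bool:
        """Never blocks: a full queue drops the event and counts it."""
        try:
            self._queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            self.dropped += 1
            return False

    async def _drain(self) -> None:
        while True:
            event = await self._queue.get()
            try:
                self._emit(event)
            except Exception:
                self.failed += 1  # a failing sink must not kill the worker
            finally:
                self._queue.task_done()

    async def shutdown(self) -> None:
        await self._queue.join()  # wait until queued events are processed
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def main() -> tuple[list, list, int]:
    received: list = []
    queue = ExampleEmitQueue(received.append, maxsize=2)
    # The worker has not started yet, so the third event finds the queue full.
    accepted = [queue.enqueue({"n": i}) for i in range(3)]
    queue.start()
    await queue.shutdown()
    return received, accepted, queue.dropped


received, accepted, dropped = asyncio.run(main())
```

Dropping on overflow trades completeness for latency; whether that trade is acceptable for audit data depends on your requirements.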

## Sinks

| Sink | Use case | Notes |
|---|---|---|
| `LoggingSink` *(default)* | Production | One compact JSON line per event via Python `logging`; stdout-friendly |
| `JsonlFileSink` | Local dev, demos | Appends to a `.jsonl` file; thread-safe, flush-on-write by default |
| `MemorySink` | Tests | Optionally bounded (`maxlen`); use `len(sink)` and `sink.events` in assertions |

Pass any sink to `AuditLogger(config=..., sink=...)`. Omit `sink` to get `LoggingSink` by default.

## Configuration

`AuditLoggerConfig` fields (frozen after creation):

| Field | Type | Default | Description |
|---|---|---|---|
| `service_name` | `str` | *required* | Name of the service emitting events |
| `service_environment` | `str` | `"unknown"` | Deployment environment (prod, staging, dev) |
| `service_version` | `str \| None` | `None` | Service version/build identifier |
| `default_actor_id` | `str` | `"unknown"` | Default actor when none provided |
| `default_actor_type` | `Literal["human", "service"]` | `"service"` | Default actor type |
| `metadata_allowlist` | `frozenset[str]` | `frozenset()` | Allowed metadata keys (empty = no metadata) |
| `sanitize_errors` | `bool` | `True` | Sanitize error messages (redact SSN/email/phone) |
| `error_message_max_len` | `int` | `200` | Max length for sanitized error messages |
| `emit_failure_mode` | `Literal["silent", "log", "raise"]` | `"log"` | How to handle sink failures |
| `time_source` | `Callable` | `utcnow` | Injectable time source for testing |
| `id_factory` | `Callable` | `uuid4` | Injectable ID factory for testing |
| `schema_version` | `str` | `"1.1"` | Schema version for emitted events |

## Typed event blocks

v0.3 exports `TypedDict` definitions for all event sub-blocks:

```python
from bh_audit_logger import (
    AuditEvent, ServiceBlock, ActorBlock, ActionBlock,
    ResourceBlock, OutcomeBlock, CorrelationBlock,
    ActionType, ActorType, OutcomeStatus, DataClassification,
)
```

## PHI-safe by default

- **No request/response bodies** — the library never tries to capture payloads
- **Metadata is opt-in and strictly allowlisted** — only keys in `metadata_allowlist` pass through; values must be scalar JSON types
- **Error messages are sanitized** — SSN, email, phone patterns are redacted and messages are length-capped
- **PHI safety is enforced by tests** that assert synthetic PHI tokens never appear in emitted events
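
To make the sanitization idea concrete, here is a minimal sketch of regex-based redaction with a length cap, followed by a check in the spirit of the PHI tests mentioned above. The patterns, replacement tokens, and the helper `sanitize_error_message` are illustrative assumptions, not the library's actual rules, and they target US-style formats only.

```python
import re

# Illustrative patterns only; real-world PHI detection needs broader coverage.
_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[REDACTED_EMAIL]"),
    (
        re.compile(r"(?<!\d)\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}(?!\d)"),
        "[REDACTED_PHONE]",
    ),
]


def sanitize_error_message(message: str, max_len: int = 200) -> str:
    """Redact SSN, email, and phone patterns, then cap the length."""
    for pattern, replacement in _PATTERNS:
        message = pattern.sub(replacement, message)
    return message[:max_len]


raw = "lookup failed for jane.doe@example.com, ssn 123-45-6789, phone 555-867-5309"
clean = sanitize_error_message(raw)

# Synthetic PHI tokens must not survive sanitization.
for token in ("jane.doe@example.com", "123-45-6789", "555-867-5309"):
    assert token not in clean
```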

## Schema conformance

All events conform to [bh-audit-schema v1.1](https://github.com/bh-healthcare/bh-audit-schema). The v1.1 schema adds:
- `DENIED` outcome status (for authorization denials)
- Conditional FAILURE validation (requires `error_type` + `error_message`)
- `maxLength`/`minLength` bounds on all string fields
- Scalar-only metadata enforcement
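
The conditional FAILURE rule can be sketched as a plain Python check. This assumes `error_type` and `error_message` live on the `outcome` block, which the text above does not confirm; `check_outcome` is a hypothetical helper, and the vendored JSON schema remains the authoritative definition.

```python
def check_outcome(outcome: dict) -> list[str]:
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems: list[str] = []
    if outcome.get("status") == "FAILURE":
        # Assumption: both fields are required on the outcome block itself.
        for field in ("error_type", "error_message"):
            if not outcome.get(field):
                problems.append(f"FAILURE outcome is missing {field}")
    return problems


incomplete = check_outcome({"status": "FAILURE"})
complete = check_outcome(
    {
        "status": "FAILURE",
        "error_type": "TimeoutError",
        "error_message": "upstream timed out",
    }
)
denied = check_outcome({"status": "DENIED"})
```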

## Optional schema validation

```bash
pip install "bh-audit-logger[jsonschema]"
```

```python
from bh_audit_logger import validate_event

event = {...}
validate_event(event)  # raises ValidationError on failure
```

Validates against the vendored bh-audit-schema v1.1 JSON schema included in the package.

## Related projects

- **bh-audit-schema**: [github.com/bh-healthcare/bh-audit-schema](https://github.com/bh-healthcare/bh-audit-schema) — the schema standard
- **bh-fastapi-audit**: [github.com/bh-healthcare/bh-fastapi-audit](https://github.com/bh-healthcare/bh-fastapi-audit) — FastAPI middleware for automatic audit logging

## License

Apache 2.0
