Metadata-Version: 2.4
Name: attribution-campaign-context
Version: 0.1.0
Summary: Portable UTM and attribution telemetry primitives for Tigrbl apps.
Author-email: Jacob Stewart <jacob@groupsum.xyz>
License: Apache-2.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: <3.14,>=3.10
Requires-Dist: pydantic>=2
Provides-Extra: tigrbl
Requires-Dist: tigrbl>=0.1.0; extra == 'tigrbl'
Description-Content-Type: text/markdown

# attribution-campaign-context

[![CI](https://github.com/tigrbl/attribution-campaign-context/actions/workflows/ci.yml/badge.svg)](https://github.com/tigrbl/attribution-campaign-context/actions/workflows/ci.yml)
![Python](https://img.shields.io/badge/python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)
![License](https://img.shields.io/badge/license-Apache--2.0-green)

`attribution-campaign-context` is a downstream Tigrbl package for portable marketing-attribution telemetry. It captures UTM parameters, click IDs, referrer data, landing URLs, visitor IDs, and session IDs without forcing every consuming app into one business schema.

The package is designed for apps that need:

- request-level attribution extraction
- append-only attribution-touch records
- polymorphic attribution association tables
- optional validation of downstream business subjects
- composable middleware and hook contracts for conversions and first/last-touch linking

## Installation

```powershell
uv add attribution-campaign-context
```

For local development:

```powershell
uv sync --all-groups
uv run pytest
```

## Package Layers

This package exposes two distinct layers.

### 1. Importable Python exports

These are concrete symbols you can import today:

- `UTM_KEYS`
- `CLICK_ID_KEYS`
- `AttributionContext`
- `extract_attribution`
- `AttributionTouch`
- `AttributionSubjectLink`
- `SubjectRef`
- `SubjectResolver`
- `SubjectValidationError`
- `PublicSurface`
- `PUBLIC_SURFACES`
- `AttributionMiddleware`
- `AttributionRuntimeState`
- `touch_from_context`
- `subject_ref`
- `validate_subject_ref`
- `subject_link`
- `attribution_pre_handler`
- `attribution_post_handler`
- `attribution_post_commit`

### 2. Public operator surfaces

These are the package's documented integration contracts, exposed through `PUBLIC_SURFACES` metadata:

- tables: `AttributionTouch`, `AttributionSubjectLink`
- planned tables: `AttributionVisitor`, `AttributionSession`
- hooks: `attribution_pre_handler`, `attribution_post_handler`, `attribution_post_commit`
- middleware: `AttributionMiddleware`
- helpers: `extract_attribution`, `SubjectResolver`, `touch_from_context`, `subject_ref`, `subject_link`

The middleware and hook names are implemented as framework-light helpers so a consuming Tigrbl app can wire them into its own persistence and request lifecycle.

## Exported Python API

### `UTM_KEYS`

Allowlisted UTM keys:

- `utm_source`
- `utm_medium`
- `utm_campaign`
- `utm_term`
- `utm_content`
- `utm_id`

### `CLICK_ID_KEYS`

Allowlisted click-ID keys:

- `gclid`
- `gbraid`
- `wbraid`
- `fbclid`
- `msclkid`
- `ttclid`
- `li_fat_id`

### `AttributionContext`

`extract_attribution(...)` returns an `AttributionContext` with:

- `utm: dict[str, str]`
- `click_ids: dict[str, str]`
- `raw_params: dict[str, str]`
- `referer: str | None`
- `landing_path: str | None`
- `landing_url: str | None`
- `visitor_id: str | None`
- `session_id: str | None`
- `has_signal: bool`

### `AttributionTouch`

`AttributionTouch` is the canonical append-only touch ledger model.

Primary fields:

- `id`
- `visitor_id`
- `session_id`
- `utm_source`
- `utm_medium`
- `utm_campaign`
- `utm_term`
- `utm_content`
- `utm_id`
- `click_ids`
- `landing_path`
- `landing_url`
- `referer`
- `user_agent_hash`
- `ip_hash`
- `raw_params`
- `created_at`
- `expires_at`

### `AttributionSubjectLink`

`AttributionSubjectLink` is the generic attribution association table model.

Primary fields:

- `id`
- `touch_id`
- `subject_resource`
- `subject_id`
- `relation`
- `subject_table`
- `subject_pk_type`
- `subject_tenant_id`
- `snapshot`
- `created_at`

Supported `relation` values:

- `first_touch`
- `last_touch`
- `conversion`
- `assist`

### `SubjectRef`

`SubjectRef` is a normalized description of a downstream business subject:

- `resource`
- `subject_id`
- `table`
- `pk_type`
- `tenant_id`

### `SubjectResolver`

`SubjectResolver` is the application-provided validation protocol:

- `canonical_resource(model_or_resource) -> str`
- `canonical_id(obj_or_payload) -> str`
- `exists(resource, subject_id, db) -> bool`

### Runtime helpers

- `touch_from_context` converts extracted request context into an `AttributionTouch`.
- `subject_ref` canonicalizes loose or resolver-backed subject references.
- `validate_subject_ref` asks an application resolver whether a subject exists.
- `subject_link` creates `AttributionSubjectLink` association rows.
- `attribution_pre_handler` snapshots request attribution before business handling.
- `attribution_post_handler` resolves downstream subject identifiers after business handling.
- `attribution_post_commit` creates conversion, first-touch, last-touch, or assist links after commit.
- `AttributionMiddleware` is a pure ASGI middleware for extraction, visitor/session cookie minting, request-state storage, and optional touch recording.

### `PUBLIC_SURFACES`

`PUBLIC_SURFACES` is the package's machine-readable inventory of public operator surfaces. It is useful for integration docs, app bootstrapping, SSOT alignment, and tooling that needs to know what the package claims as first-class or planned.

## UTM Keys

| Key | Meaning | Typical Values | Notes |
| --- | --- | --- | --- |
| `utm_source` | The source that sent the visitor. | `google`, `newsletter`, `linkedin` | Lowercased during extraction. |
| `utm_medium` | The acquisition medium or channel. | `cpc`, `email`, `social`, `referral` | Lowercased during extraction. |
| `utm_campaign` | The campaign name or grouping. | `spring_launch`, `retargeting_q2` | Preserved as provided after trim. |
| `utm_term` | Paid-search keyword or audience term. | `crm`, `founder tools` | Preserved as opaque business text. |
| `utm_content` | Creative, link, placement, or variant marker. | `hero_cta`, `sidebar_a`, `video_15s` | Useful for A/B attribution. |
| `utm_id` | Stable campaign identifier. | `cmp_2026_04_001` | Useful when campaign names change. |
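
For context on where these keys come from, the following is a minimal sketch of tagging a landing URL with UTM parameters using only the standard library. `tag_url` is a hypothetical helper written for this example and is not part of the package; the parameter values are taken from the table above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url: str, **utm: str) -> str:
    """Hypothetical helper: merge UTM parameters into a URL's query string."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    # Skip empty values so the link carries only meaningful parameters.
    query.update({key: value for key, value in utm.items() if value})
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = tag_url(
    "https://example.com/pricing",
    utm_source="newsletter",
    utm_medium="email",
    utm_campaign="spring_launch",
    utm_id="cmp_2026_04_001",
)
print(tagged)
```

A visitor who follows a link like this arrives with the query parameters that `extract_attribution` allowlists.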

## Click ID Keys

| Key | Meaning | Typical Source | Notes |
| --- | --- | --- | --- |
| `gclid` | Google Click ID. | Google Ads | Preserved as an opaque identifier. |
| `gbraid` | Google privacy-preserving click identifier. | Google Ads | Preserved as an opaque identifier. |
| `wbraid` | Google web-to-app / privacy-preserving click identifier. | Google Ads | Preserved as an opaque identifier. |
| `fbclid` | Meta click identifier. | Meta | Preserved as an opaque identifier. |
| `msclkid` | Microsoft Advertising click identifier. | Microsoft Advertising | Preserved as an opaque identifier. |
| `ttclid` | TikTok click identifier. | TikTok Ads | Preserved as an opaque identifier. |
| `li_fat_id` | LinkedIn ad tracking identifier. | LinkedIn Ads | Preserved as an opaque identifier. |

## `extract_attribution`

### Signature

```python
extract_attribution(
    request,
    *,
    max_value_length: int = 256,
    extra_keys: set[str] | frozenset[str] = frozenset(),
    visitor_cookie: str = "tigrbl_vid",
    session_cookie: str = "sid",
) -> AttributionContext
```

### What it reads

- request query parameters
- request headers
- request cookies
- request path
- request URL

### What it does

- allowlists only known UTM keys, click IDs, and optional `extra_keys`
- trims values
- drops empty values
- truncates values to `max_value_length`
- lowercases `utm_source` and `utm_medium`
- returns normalized visitor and session cookie values
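
The rules above can be illustrated with a small standalone sketch. This is not the package's implementation: `normalize_query` is a hypothetical stand-in that works on a URL string rather than a request object, and it assumes keys are matched case-sensitively and that trimming happens before truncation.

```python
from urllib.parse import parse_qsl, urlsplit

# Key sets copied from the UTM and click-ID lists in this README.
UTM = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "utm_id"}
CLICK_IDS = {"gclid", "gbraid", "wbraid", "fbclid", "msclkid", "ttclid", "li_fat_id"}
LOWERCASED = {"utm_source", "utm_medium"}


def normalize_query(url, *, max_value_length=256, extra_keys=frozenset()):
    """Hypothetical stand-in that applies the documented rules to a query string."""
    allowed = UTM | CLICK_IDS | set(extra_keys)
    kept = {}
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key not in allowed:
            continue  # allowlist: unknown keys are ignored
        value = value.strip()  # trim surrounding whitespace
        if not value:
            continue  # drop empty values
        value = value[:max_value_length]  # truncate long values
        if key in LOWERCASED:
            value = value.lower()  # only source and medium are lowercased
        kept[key] = value
    return kept


params = normalize_query(
    "https://example.com/?utm_source=%20Google%20&utm_medium=CPC"
    "&utm_campaign=Spring_Launch&utm_term=&gclid=abc123&debug=1"
)
print(params)
```

In this sketch `utm_campaign` keeps its original casing, the empty `utm_term` is dropped, and the unlisted `debug` key never reaches the result.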

### Extraction example

```python
from attribution_campaign_context import extract_attribution

context = extract_attribution(request)

if context.has_signal:
    print(context.utm)
    print(context.click_ids)
    print(context.referer)
```

### Extraction with app-specific keys

```python
from attribution_campaign_context import extract_attribution

context = extract_attribution(
    request,
    extra_keys={"affiliate_id", "creative_id"},
    visitor_cookie="visitor_id",
    session_cookie="session_id",
)
```

## `AttributionTouch`

`AttributionTouch` is the package-owned record of what attribution signal was present on a request. Treat it as append-only telemetry rather than mutable business state.

Typical uses:

- preserve first observed marketing context
- preserve last observed marketing context
- support funnel, conversion, and assist analysis
- support downstream event uploads or warehouse joins

### Touch creation example

```python
from attribution_campaign_context import AttributionTouch, extract_attribution

context = extract_attribution(request)

touch = AttributionTouch(
    visitor_id=context.visitor_id,
    session_id=context.session_id,
    utm_source=context.utm.get("utm_source"),
    utm_medium=context.utm.get("utm_medium"),
    utm_campaign=context.utm.get("utm_campaign"),
    utm_term=context.utm.get("utm_term"),
    utm_content=context.utm.get("utm_content"),
    utm_id=context.utm.get("utm_id"),
    click_ids=context.click_ids,
    landing_path=context.landing_path,
    landing_url=context.landing_url,
    referer=context.referer,
    raw_params=context.raw_params,
)
```

## `AttributionSubjectLink`

`AttributionSubjectLink` is the package's attribution association table. It links one package-owned touch record to one downstream business subject without forcing a hard foreign key to an arbitrary application table.

Keeping the subject side as a soft reference is the main portability boundary in this package.

### Why use an association table

Use `AttributionSubjectLink` when:

- one package must work across many different consuming apps
- the converted subject might be a `lead`, `user`, `organization`, `opportunity`, `quote`, or `order`
- multiple touch relationships may exist for the same subject
- you need first-touch, last-touch, conversion, and assist rows without mutating the original touch

### Why not use direct foreign keys

If this package hard-coded a foreign key to one app table, it would stop being portable. The association table keeps the subject side polymorphic:

```text
touch_id -> package-owned touch row
subject_resource -> app-defined logical resource name
subject_id -> app-defined opaque primary key
```
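
To make the polymorphic shape concrete, here is a minimal sketch using an in-memory SQLite database. The table names (`touch`, `subject_link`, `crm_leads`, `shop_orders`) and the reduced column set are assumptions for illustration; the package's real models carry more fields.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE touch (id TEXT PRIMARY KEY, utm_source TEXT);
    -- The link table references the touch row, but no app-owned table.
    CREATE TABLE subject_link (
        id INTEGER PRIMARY KEY,
        touch_id TEXT NOT NULL REFERENCES touch (id),
        subject_resource TEXT NOT NULL,
        subject_id TEXT NOT NULL,
        relation TEXT NOT NULL
    );
    -- Two unrelated app-owned tables with different key types.
    CREATE TABLE crm_leads (id TEXT PRIMARY KEY);
    CREATE TABLE shop_orders (id INTEGER PRIMARY KEY);
    """
)
db.execute("INSERT INTO touch VALUES ('touch_123', 'newsletter')")
db.execute("INSERT INTO crm_leads VALUES ('lead_456')")
db.execute("INSERT INTO shop_orders VALUES (789)")
db.executemany(
    "INSERT INTO subject_link (touch_id, subject_resource, subject_id, relation) "
    "VALUES (?, ?, ?, ?)",
    [
        ("touch_123", "lead", "lead_456", "conversion"),
        ("touch_123", "order", "789", "conversion"),  # integer key stored as text
    ],
)
rows = db.execute(
    """
    SELECT l.subject_resource, l.subject_id, t.utm_source
    FROM subject_link AS l
    JOIN touch AS t ON t.id = l.touch_id
    ORDER BY l.id
    """
).fetchall()
print(rows)
```

The same link table serves both subjects because `subject_id` is stored as opaque text and resolved by the app, not by a database constraint.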

### Association example

```python
from attribution_campaign_context import AttributionSubjectLink

link = AttributionSubjectLink(
    touch_id="touch_123",
    subject_resource="lead",
    subject_id="lead_456",
    relation="conversion",
    subject_table="crm_leads",
    subject_pk_type="uuid",
    subject_tenant_id="tenant_123",
    snapshot={"status": "qualified"},
)
```

### Typical association patterns

- one lead with one `first_touch` row and one `last_touch` row
- one order with one `conversion` row
- one opportunity with multiple `assist` rows
- one user linked across multiple sessions to many touches
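
As a sketch of how such patterns might be planned, the snippet below assigns relations to a visitor's touches by age. The policy shown (earliest is `first_touch`, latest is `last_touch`, everything between is `assist`) is a common convention assumed for this example, not something the package prescribes; `Touch` and `plan_relations` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Touch:
    """Hypothetical stand-in carrying only the fields this sketch needs."""

    id: str
    created_at: datetime


def plan_relations(touches):
    """Return (touch_id, relation) pairs for one subject, ordered oldest first."""
    ordered = sorted(touches, key=lambda touch: touch.created_at)
    if not ordered:
        return []
    plan = [(ordered[0].id, "first_touch")]
    plan += [(touch.id, "assist") for touch in ordered[1:-1]]
    # A single touch is both the first and the last touch.
    plan.append((ordered[-1].id, "last_touch"))
    return plan


touches = [
    Touch("t2", datetime(2026, 4, 2)),
    Touch("t1", datetime(2026, 4, 1)),
    Touch("t3", datetime(2026, 4, 3)),
]
plan = plan_relations(touches)
print(plan)
```

Each pair in the plan would become one `AttributionSubjectLink` row, which leaves the original touches unmodified.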

## `SubjectResolver` and validation modes

`SubjectResolver` lets an app decide how strictly it wants to validate downstream subject references.

### Loose mode

Loose mode records `subject_resource` and `subject_id` without existence checks.

Use loose mode when:

- the business record is created asynchronously
- the subject may exist in another service
- the app wants maximal portability with minimal coupling

Example:

```python
# Loose mode: record the pair as given, with no resolver and no existence check.
subject_resource = "lead"
subject_id = "lead_456"
```

### Validated mode

Validated mode uses a `SubjectResolver` to canonicalize resource names, normalize ids, and confirm that the subject exists before writing the association row.

Use validated mode when:

- the business object is local to the app
- you want to avoid orphaned attribution links
- multiple models alias the same logical resource

Example resolver:

```python
from attribution_campaign_context import SubjectResolver


class AppSubjectResolver(SubjectResolver):
    def canonical_resource(self, model_or_resource):
        return str(model_or_resource).lower()

    def canonical_id(self, obj_or_payload):
        return str(getattr(obj_or_payload, "id", obj_or_payload))

    async def exists(self, resource, subject_id, db):
        return await db.subject_exists(resource, subject_id)
```
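
For tests or local experiments, a resolver does not need a database. The sketch below is a self-contained, hypothetical in-memory resolver that mirrors the three protocol methods by shape only; it does not import the package. `InMemoryResolver`, its `known` and `aliases` arguments, and the `validated_pair` helper are all assumptions made for this example.

```python
import asyncio


class InMemoryResolver:
    """Hypothetical resolver with the same three methods as `SubjectResolver`."""

    def __init__(self, known, aliases=None):
        self._known = set(known)  # (resource, subject_id) pairs that exist
        self._aliases = dict(aliases or {})  # alias -> canonical resource name

    def canonical_resource(self, model_or_resource):
        if isinstance(model_or_resource, str):
            name = model_or_resource
        else:
            name = type(model_or_resource).__name__
        name = name.lower()
        return self._aliases.get(name, name)

    def canonical_id(self, obj_or_payload):
        return str(getattr(obj_or_payload, "id", obj_or_payload))

    async def exists(self, resource, subject_id, db=None):
        return (resource, subject_id) in self._known


async def validated_pair(resolver, resource, subject, db=None):
    """Canonicalize, then refuse to continue if the subject is unknown."""
    resource = resolver.canonical_resource(resource)
    subject_id = resolver.canonical_id(subject)
    if not await resolver.exists(resource, subject_id, db):
        raise LookupError(f"unknown subject {resource}:{subject_id}")
    return resource, subject_id


resolver = InMemoryResolver(known={("lead", "456")}, aliases={"prospect": "lead"})
pair = asyncio.run(validated_pair(resolver, "Prospect", 456))
print(pair)
```

The alias map shows the case where multiple names point at the same logical resource: `Prospect` and `lead` resolve to one canonical value before the existence check runs.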

### Strict app mode

Strict app mode keeps the package's portable association row, but the consuming app may additionally:

- validate through a `SubjectResolver`
- write app-local foreign keys
- enforce tenant isolation rules
- restrict which `subject_resource` values are allowed
- attach app-owned denormalized fields for reporting

Use strict app mode when:

- the app has a stable local business schema
- the app wants stronger invariants than the package itself can enforce
- compliance or governance requires hard business constraints

In strict app mode, `attribution-campaign-context` remains the shared portable layer, and app-specific tables or joins add stronger local guarantees on top.
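
A minimal sketch of two such app-local rules, a resource allowlist and a tenant match, might look like the following. `ALLOWED_RESOURCES` and `check_link` are hypothetical names owned by the consuming app, not package exports.

```python
# Hypothetical app policy: only these logical resources may be linked.
ALLOWED_RESOURCES = frozenset({"lead", "order", "signup"})


def check_link(subject_resource, subject_tenant_id, request_tenant_id):
    """Raise ValueError unless the link satisfies the app's local rules."""
    if subject_resource not in ALLOWED_RESOURCES:
        raise ValueError(f"subject_resource {subject_resource!r} is not allowed")
    if subject_tenant_id != request_tenant_id:
        raise ValueError("subject belongs to a different tenant")


# Passes silently: the resource is allowed and the tenants match.
check_link("lead", "tenant_123", "tenant_123")
```

An app would run a check like this before constructing the `AttributionSubjectLink`, so the portable row is only written once the local invariants hold.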

## Middleware, helpers, and hooks

The package documents middleware and hook surfaces as first-class operator contracts.

### `AttributionMiddleware`

`AttributionMiddleware` is the intended request-entry integration point.

Responsibilities:

- call `extract_attribution`
- read or mint visitor/session cookies
- decide whether the request has meaningful attribution signal
- optionally persist an `AttributionTouch`
- attach touch/context state for later hooks
- emit `Set-Cookie` only when state changes
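
To show the cookie-related responsibilities in isolation, here is a minimal sketch of a pure ASGI middleware that reads or mints a visitor cookie. It is not the package's `AttributionMiddleware`: the class name `VisitorCookieSketch`, the use of `scope["state"]` as request state, and the cookie attributes are assumptions for illustration. Only the default cookie name `tigrbl_vid` comes from the `extract_attribution` signature.

```python
import asyncio
import uuid
from http.cookies import SimpleCookie


class VisitorCookieSketch:
    """Hypothetical pure ASGI middleware: read or mint a visitor cookie."""

    def __init__(self, app, cookie_name="tigrbl_vid"):
        self.app = app
        self.cookie_name = cookie_name

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers", []))
        cookies = SimpleCookie(headers.get(b"cookie", b"").decode("latin-1"))
        existing = cookies.get(self.cookie_name)
        visitor_id = existing.value if existing is not None else uuid.uuid4().hex
        # Expose the id to downstream handlers (assumed request-state location).
        scope.setdefault("state", {})["visitor_id"] = visitor_id

        async def send_wrapper(message):
            # Emit Set-Cookie only when a new id was minted for this request.
            if message["type"] == "http.response.start" and existing is None:
                cookie = f"{self.cookie_name}={visitor_id}; Path=/; HttpOnly; SameSite=Lax"
                extra = (b"set-cookie", cookie.encode("latin-1"))
                message = {**message, "headers": [*message.get("headers", []), extra]}
            await send(message)

        await self.app(scope, receive, send_wrapper)


async def demo(cookie_header=None):
    """Drive the middleware once and return (visitor_id, response headers)."""

    async def app(scope, receive, send):
        await send({"type": "http.response.start", "status": 204, "headers": []})
        await send({"type": "http.response.body", "body": b""})

    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    headers = [(b"cookie", cookie_header)] if cookie_header else []
    scope = {"type": "http", "method": "GET", "path": "/", "headers": headers}
    await VisitorCookieSketch(app)(scope, receive, send)
    return scope["state"]["visitor_id"], dict(sent[0]["headers"])


new_id, new_headers = asyncio.run(demo())
old_id, old_headers = asyncio.run(demo(b"tigrbl_vid=abc123"))
```

A first-time visitor receives a `Set-Cookie` header, while a returning visitor's existing id is reused and the response headers are left untouched.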

Middleware-level adoption is the first full-runtime integration tier.

### `attribution_pre_handler`

Use this before business create/update logic when you want to snapshot current attribution state into request-local context or into a local payload before persistence.

Typical uses:

- enrich a create payload with current touch id
- cache the extracted attribution context for downstream logic
- set request-scoped first-touch or last-touch candidates

### `attribution_post_handler`

Use this after business handling when the downstream subject id is only known after creation.

Typical uses:

- database-generated primary keys
- handler-generated lead ids
- post-validation resource canonicalization

### `attribution_post_commit`

This is the default conversion-link hook. Write `AttributionSubjectLink` rows here, after the business transaction succeeds.

Typical uses:

- create a `conversion` row for a lead, order, or signup
- update or insert `first_touch` and `last_touch` associations
- attach `assist` rows when multi-touch attribution is desired

This hook exists so attribution linkage does not get written for business operations that later roll back.
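
The rollback guarantee can be sketched with a tiny unit-of-work that runs queued callbacks only when its block exits cleanly. `UnitOfWork`, `after_commit`, and `create_lead` are hypothetical names for this example; a consuming app would rely on its own transaction and hook machinery instead.

```python
class UnitOfWork:
    """Hypothetical transaction wrapper that runs callbacks only after commit."""

    def __init__(self):
        self._after_commit = []

    def after_commit(self, callback):
        self._after_commit.append(callback)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        callbacks, self._after_commit = self._after_commit, []
        if exc_type is None:
            # Commit path: the business write succeeded, so links may be written.
            for callback in callbacks:
                callback()
        # Rollback path: queued link writes are discarded; the error propagates.
        return False


links = []


def create_lead(uow, lead_id, fail=False):
    """Queue a conversion link, then simulate the business write."""
    uow.after_commit(lambda: links.append(("touch_123", "lead", lead_id, "conversion")))
    if fail:
        raise RuntimeError("business write failed")


with UnitOfWork() as uow:
    create_lead(uow, "lead_1")

try:
    with UnitOfWork() as uow:
        create_lead(uow, "lead_2", fail=True)
except RuntimeError:
    pass

print(links)
```

Only the lead whose write succeeded ends up with a link; the failed operation leaves no attribution row behind.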

## Composable flow

The intended package flow is:

1. request enters middleware
2. middleware calls `extract_attribution`
3. middleware persists an `AttributionTouch` when touch recording is enabled
4. middleware stores touch/context on request state
5. `attribution_pre_handler` can copy or snapshot current attribution state
6. business handler runs
7. business object is created or updated
8. `attribution_post_handler` resolves the final downstream subject id
9. transaction commits
10. `attribution_post_commit` writes `AttributionSubjectLink` association rows

That gives the app a clean split between touch capture and business-subject association.

## Levels of adoption

### Level 1: extraction only

Use only the request helper.

```python
from attribution_campaign_context import extract_attribution

context = extract_attribution(request)
```

Use this when you only need attribution in handler logic or logging.

### Level 2: touch ledger only

Capture extraction results into `AttributionTouch`.

```python
from attribution_campaign_context import AttributionTouch, extract_attribution

context = extract_attribution(request)
touch = AttributionTouch(
    visitor_id=context.visitor_id,
    session_id=context.session_id,
    click_ids=context.click_ids,
    raw_params=context.raw_params,
)
```

Use this when you want an append-only attribution ledger but are not yet linking touches to business entities.

### Level 3: touch plus association table

Capture the touch, then link it to a downstream entity through `AttributionSubjectLink`.

```python
from attribution_campaign_context import AttributionSubjectLink

link = AttributionSubjectLink(
    touch_id="touch_123",
    subject_resource="signup",
    subject_id="signup_456",
    relation="conversion",
)
```

Use this when you want attribution attached to signups, leads, quotes, opportunities, or orders.

### Level 4: validated mode

Use a `SubjectResolver` before writing association rows.

```python
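# Runs inside an async handler; `resolver`, `created_lead`, and `db`
# are provided by the consuming app.
resource = resolver.canonical_resource("Lead")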
subject_id = resolver.canonical_id(created_lead)

if await resolver.exists(resource, subject_id, db):
    ...
```

Use this when you want to keep portability but block invalid downstream references.

### Level 5: strict app mode

Keep the portable package rows, and add app-local constraints on top.

Examples:

- local FK from an app-specific reporting table to the app's `lead` table
- local allowlist for allowed `subject_resource` values
- tenant-aware existence checks
- app-specific denormalized reporting columns

Use this when one app wants stronger invariants than the portable package should require globally.

## Conversion hook examples

### Signup conversion

```python
AttributionSubjectLink(
    touch_id=current_touch_id,
    subject_resource="signup",
    subject_id=signup.id,
    relation="conversion",
)
```

### Lead first-touch and last-touch

```python
AttributionSubjectLink(
    touch_id=first_touch_id,
    subject_resource="lead",
    subject_id=lead.id,
    relation="first_touch",
)

AttributionSubjectLink(
    touch_id=last_touch_id,
    subject_resource="lead",
    subject_id=lead.id,
    relation="last_touch",
)
```

### Opportunity assist touch

```python
AttributionSubjectLink(
    touch_id=assist_touch_id,
    subject_resource="opportunity",
    subject_id=opportunity.id,
    relation="assist",
)
```

## Documentation

- [Attribution keys](docs/attribution-keys.md)
- [Public operator surfaces](docs/public-operator-surfaces.md)
- [SSOT governance packs](docs/ssot-governance-packs.md)

## Non-goals

- attribution data must not drive auth, tenancy, billing, entitlements, or authorization
- the package must not require foreign keys to arbitrary downstream business tables
- the package must stay portable across multiple consuming apps
