Metadata-Version: 2.4
Name: deped-entity
Version: 0.2.1
Summary: Builds a canonical stakeholder entity SQLite database from the Sheet1 workbook.
Requires-Python: >=3.14
Requires-Dist: deped-dcp-template>=0.4.2
Requires-Dist: email-validator>=2.3
Requires-Dist: pydantic>=2.13
Description-Content-Type: text/markdown

# deped-entity

`deped-entity` builds `artifacts/entities.db`, the canonical SQLite artifact for
DepEd stakeholder entities. It reads the `Sheet1` worksheet of the stakeholder workbook, derives
shared `natural_key` values for schools and offices, and stores normalized
location, access, community, personnel, contact, transportation, ITO, coordinate
audit, and build provenance tables.

Downstream packages should use this artifact for entity identity instead of
rebuilding stakeholder tables independently.

## Inputs

Required:

- `data/2026-04-29-entities.xlsx`, or another stakeholder workbook with the
  expected `Sheet1` headers

Optional:

- `data/2026-04-29-ito-list.xlsx` for Regional and Division ICT Officer tables
- `data/2025-12-20-geo-k12-deped.csv` for school coordinate baseline audits

The `justfile` recipes use these default paths. To build from a different local
file, call the CLI directly with explicit paths.
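A pre-flight check can confirm which default inputs are present before running
`just build`. This is a hypothetical helper, not part of `deped-entity`: it only
checks that files exist at the default relative paths and does not validate the
`Sheet1` headers.

```python
from pathlib import Path

# Default paths used by the justfile, as listed in the Inputs section.
REQUIRED_INPUTS = {"stakeholder workbook": Path("data/2026-04-29-entities.xlsx")}
OPTIONAL_INPUTS = {
    "ITO list": Path("data/2026-04-29-ito-list.xlsx"),
    "geo baseline": Path("data/2025-12-20-geo-k12-deped.csv"),
}


def check_inputs(root: Path = Path(".")) -> tuple[list[str], list[str]]:
    """Return (missing required, missing optional) input labels under root."""
    missing_required = [
        label for label, rel in REQUIRED_INPUTS.items() if not (root / rel).is_file()
    ]
    missing_optional = [
        label for label, rel in OPTIONAL_INPUTS.items() if not (root / rel).is_file()
    ]
    return missing_required, missing_optional
```

A missing required input means the build cannot run; missing optional inputs
only skip the ITO tables or the coordinate baseline.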

## Build

Download shared source files:

```sh
just download
```

Build the standard artifact:

```sh
just build
```

Audit the result:

```sh
just audit
```

The standard build writes `artifacts/entities.db`, loads the stakeholder
workbook, imports ITO data, loads the coordinate baseline, creates indexes and
views, and records a `build_runs` provenance row.

## CLI

Build with explicit paths:

```sh
uv run entity build \
  --input data/2026-04-29-entities.xlsx \
  --ito-input data/2026-04-29-ito-list.xlsx \
  --geo-input data/2025-12-20-geo-k12-deped.csv \
  --db artifacts/entities.db
```

Refresh only ITO tables in an existing database:

```sh
uv run entity ito \
  --input data/2026-04-29-ito-list.xlsx \
  --db artifacts/entities.db
```

Inspect artifact counts and audit metrics:

```sh
uv run entity audit --db artifacts/entities.db
```

## Consumer Notes

- `entities.source_row_id` holds the workbook `ID` and uniquely identifies each
  source row.
- `entities.natural_key` is the shared cross-artifact identity key.
- Multiple workbook rows can share one `natural_key`; use
  `v_recent_entries` when a consumer needs the latest row per entity.
- `entities.longitude` and `entities.latitude` remain workbook-submitted
  coordinates. The optional geo CSV is loaded into `school_coordinates_base` and
  compared through `v_school_coordinate_deviations`.
- ITO rows are kept even when best-effort matching cannot resolve an
  `entity_id`.

## Docs

- [Operator Guide](docs/operations.md)
- [Artifact Contract](docs/reference/artifact-contract.md)
- [Normalization Reference](docs/reference/normalization.md)
- [Development](docs/development.md)
