A medallion-zone, federation-first design for the Statistical Institute of Jamaica — replacing manual ingestion and ad-hoc spreadsheets with auditable, on-prem data flows.







Document map
Sections are numbered for cross-reference. Indented entries are sub-sections. Page numbers match the printed pagination, not the PDF viewer.

Section 3.1
Every external source — SFTP drops, mail attachments, vendor REST endpoints, JDBC pulls from MariaDB — feeds a single validated, audit-trailed pipeline. The admin's morning routine of downloading files, renaming them, and moving them between folders disappears entirely.
The first NiFi release covers five sources that together account for ~83% of the manual hours logged in the 2025 STATIN time-tracking audit. Each becomes a separate ProcessGroup with its own schema validator, retry policy, and provenance hash.
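As a rough illustration of two of the per-source safeguards named above, the sketch below shows a provenance hash and a retry policy in Python. It is not the NiFi configuration itself: the function names `provenance_hash` and `with_retry`, the choice of SHA-256, and the exponential backoff schedule are all assumptions made for the example.

```python
import hashlib
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def provenance_hash(payload: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes received.

    SHA-256 is an assumption; the source does not name the hash algorithm.
    """
    return hashlib.sha256(payload).hexdigest()


def with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying OSError failures with exponential backoff.

    Delays grow as base_delay * 1, 2, 4, ... (hypothetical policy).
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting; a production flow would more likely rely on the scheduler's or NiFi's own retry settings.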
Table 3.1 — Day-one ingestion sources
| Source | Owner | Method | Records/day | Latency |
|---|---|---|---|---|
| CPI submissions | STATIN — Prices | NiFi SFTP listener | 14,820 | 5 min |
| Customs declarations | Jamaica Customs | NiFi REST poll | 9,610 | 15 min |
| Trade statistics | Min. of Industry | JDBC pull · MariaDB | 22,180 | 10 min |
| Population estimates | STATIN — Demography | Manual upload | 240 | 24 hr |
| Daily total | | | 50,250 | — |
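
The volumes in Table 3.1 can be checked with a few lines of arithmetic. The sketch below sums the four rows shown and compares the result with the stated daily total. The difference of 3,400 records/day would be consistent with the fifth day-one source not being listed in the table, though that reading is an assumption. The `share` helper is hypothetical.

```python
# Records/day for the four rows shown in Table 3.1.
listed = {
    "CPI submissions": 14_820,
    "Customs declarations": 9_610,
    "Trade statistics": 22_180,
    "Population estimates": 240,
}
stated_total = 50_250  # value in the "Daily total" row

listed_sum = sum(listed.values())
unlisted = stated_total - listed_sum  # volume not attributed to a listed row


def share(records: int, total: int) -> float:
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * records / total, 1)


print(f"listed rows: {listed_sum:,}")
print(f"not listed:  {unlisted:,}")
for source, records in listed.items():
    print(f"{source}: {share(records, stated_total)}% of stated total")
```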
Replaces
Manual SFTP polling · CSV rename scripts · Excel VLOOKUP across three workbooks · USB-drive backups · the email thread titled "FINAL_v7_use this one.xlsx".
Every ProcessGroup is versioned in the NiFi Registry, peer-reviewed in a pull request, and rolled out by Airflow. The flow descriptor below defines the CPI listener.
Listing 3.1 — cpi-sftp-listener.flow.json
```json
{
  "name": "cpi-sftp-listener",
  "zone": "raw",
  "schedule": "*/5 * * * *",
  "source": { "type": "sftp", "path": "/in/cpi/*.csv" },
  "validate": { "schema": "cpi.v3", "on_fail": "quarantine" },
  "sink": "s3://lake/raw/cpi/{yyyy}/{mm}/"
}
```
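To make the descriptor's fields concrete, here is a minimal Python sketch that parses it, checks that the expected keys are present, and expands the sink template for a given date. The set of required keys, the reading of `{yyyy}` and `{mm}` as four-digit year and two-digit month, and the helper names `load_descriptor` and `resolve_sink` are assumptions for illustration, not part of the `d4n` tooling.

```python
import json
from datetime import date

# Assumed required keys, taken from the fields shown in Listing 3.1.
REQUIRED_KEYS = {"name", "zone", "schedule", "source", "validate", "sink"}

DESCRIPTOR = """
{
  "name": "cpi-sftp-listener",
  "zone": "raw",
  "schedule": "*/5 * * * *",
  "source": { "type": "sftp", "path": "/in/cpi/*.csv" },
  "validate": { "schema": "cpi.v3", "on_fail": "quarantine" },
  "sink": "s3://lake/raw/cpi/{yyyy}/{mm}/"
}
"""


def load_descriptor(text: str) -> dict:
    """Parse a flow descriptor and reject it if expected fields are missing."""
    flow = json.loads(text)
    missing = REQUIRED_KEYS - set(flow)
    if missing:
        raise ValueError(f"descriptor is missing keys: {sorted(missing)}")
    if len(flow["schedule"].split()) != 5:
        raise ValueError("schedule must be a five-field cron expression")
    return flow


def resolve_sink(template: str, day: date) -> str:
    """Fill the {yyyy} and {mm} placeholders (assumed to mean year and month)."""
    return (
        template.replace("{yyyy}", f"{day.year:04d}")
        .replace("{mm}", f"{day.month:02d}")
    )


flow = load_descriptor(DESCRIPTOR)
print(resolve_sink(flow["sink"], date(2025, 3, 14)))
# prints: s3://lake/raw/cpi/2025/03/
```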
Listing 3.2 — registering & deploying the flow
```console
$ d4n flow register --file cpi-sftp-listener.flow.json
✔ schema cpi.v3 resolved
✔ published to NiFi Registry @ rev f3a91c
$ d4n flow deploy --env prod --rev f3a91c
✔ deployed · next run 06:00 UTC
```