Document elements · 02

Callouts, quotes & code

Block-level annotation patterns for technical proposals and reports. Every callout shares one pattern, a colored left rule plus an icon; between callout types only the rule color and the icon change. Magenta is reserved for the "Replaces" callout, the Data4Now content motif.


Note

Trino federates queries to SQL Server and MariaDB through its JDBC-based connectors. No data is copied into a separate store; where the connector supports it, filters are pushed down to each source.
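One way to confirm that a filter is actually pushed down is Trino's EXPLAIN statement. This sketch reuses the mssql catalog and dim.parish table from the federated CPI example on this page; the filter value 7 is arbitrary. When pushdown happens, the predicate typically appears as part of the SQL Server table scan (as a constraint on parish_id) rather than as a separate filter step above the scan. The exact plan wording varies between Trino versions, so treat this as a diagnostic aid rather than a contract.

```sql
-- Sketch: inspect the plan to see whether the WHERE clause reaches SQL Server.
-- Catalog/schema/table come from the CPI example; the value 7 is arbitrary.
EXPLAIN
SELECT p.parish_id,
       p.parish_name
FROM   mssql.dim.parish AS p
WHERE  p.parish_id = 7;
```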

Best practice

Run airflow dags reserialize after every NiFi flow change so the scheduler picks up the new schema fingerprint.


Warning

Anonymized buckets are not a substitute for k-anonymity review. Apply the suppression rules in §4.2 before exposing aggregated views to public dashboards.
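The heart of a k-anonymity review is counting how many records share each combination of quasi-identifiers and suppressing any combination seen fewer than k times. The sketch below is a generic minimum-class-size check, not the §4.2 suppression rules themselves: the columns parish_id, age_band and sex, the inline sample rows, and k = 3 are all hypothetical placeholders. It uses only a WITH clause and GROUP BY, so it should run unchanged in Trino.

```sql
-- Sketch: generic k-anonymity check (minimum equivalence-class size).
-- Hypothetical: columns, sample rows and k = 3 are placeholders, not §4.2 values.
WITH records (parish_id, age_band, sex) AS (
  VALUES
    (1, '25-34', 'F'),
    (1, '25-34', 'F'),
    (1, '25-34', 'F'),
    (2, '65+',   'M')
),
classes AS (
  -- Each distinct quasi-identifier combination is one equivalence class.
  SELECT parish_id, age_band, sex, count(*) AS class_size
  FROM   records
  GROUP BY parish_id, age_band, sex
)
SELECT parish_id, age_band, sex, class_size
FROM   classes
WHERE  class_size >= 3   -- k: classes smaller than this are suppressed
ORDER BY parish_id, age_band, sex;
```

With these sample rows only the 1 / 25-34 / F class (three records) survives; the single-record 2 / 65+ / M class is suppressed. In practice the same check would run against the anonymized table before an aggregated view reaches a public dashboard.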


Do not

Never write to the raw/ zone from a notebook. The raw zone is append-only, written only through NiFi, and must retain provenance hashes for audit.

Replaces

Manual SFTP polling · CSV rename scripts · Excel vlookups across three workbooks · USB-drive backups · the email thread titled "FINAL_v7_use this one.xlsx".

We replaced six weeks of CSV-juggling with a single SELECT statement. The lake didn't replace anyone — it freed four statisticians to do statistics again.
Dr. M. Henriques — Director, Statistical Computing · STATIN
SQL · Trino · federated-cpi.sql
-- Monthly CPI by parish, federated across SQL Server + MinIO
SELECT
  p.parish_name,
  date_trunc('month', c.observed_at) AS month,
  avg(c.price_index)                 AS cpi
FROM   mssql.dim.parish                 AS p
JOIN   lake.aggregated.cpi_observations AS c
  ON   c.parish_id = p.parish_id
WHERE  c.observed_at >= DATE '2025-01-01'  -- typed DATE literal rather than a bare string
GROUP BY 1, 2
ORDER BY 1, 2;

The medallion architecture organizes data across five zones (Raw, Anonymized, Staging, Aggregated, Archive), each with its own retention policy and access group¹.

¹ Retention windows follow the STATIN Records & Information Management Policy (2024), §6, "Statistical micro-data". The Archive zone is governed separately by the Veeam immutability schedule.