{% extends "base.html" %} {% block title %}TraceBi — Home{% endblock %} {% block content %}
Four ideas that everything else is built on. Understand these and the rest follows.
The core data container in TraceBi is the DataSet — a thin wrapper
around a pandas DataFrame. The critical rule: every operation
returns a new DataSet. Nothing mutates the original.
This means you can chain transformations freely — filter, transform, sort, rename, select — and each step produces a clean, independent result. If you filter the wrong rows, the original is still there. If you want to branch a dataset two different ways, both branches start from the same immutable source.
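The branching behaviour described above can be sketched without TraceBi at all. The `MiniDataSet` class below is a hypothetical stand-in (not the library's `DataSet`), and it holds plain dictionaries instead of a pandas DataFrame so that it runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class MiniDataSet:
    """Hypothetical stand-in for a DataSet: every method builds a new object."""
    rows: Tuple[Row, ...]

    def filter(self, predicate: Callable[[Row], bool]) -> "MiniDataSet":
        # Build a new container; self.rows is never reassigned.
        return MiniDataSet(tuple(r for r in self.rows if predicate(r)))

    def sort(self, key: str, descending: bool = False) -> "MiniDataSet":
        ordered = sorted(self.rows, key=lambda r: r[key], reverse=descending)
        return MiniDataSet(tuple(ordered))


orders = MiniDataSet((
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 45},
))

# Two independent branches from the same source.
eu_orders = orders.filter(lambda r: r["region"] == "EU")
big_orders = orders.filter(lambda r: r["amount"] >= 80).sort("amount", descending=True)
```

After both branches are built, `orders` still holds all three rows, which is the property the immutability rule is meant to guarantee.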
Every DataSet carries a lineage list — a chain of
LineageNode records. Each node stores the operation type, a
human-readable description, the row counts before and after, and a timestamp.
Because each operation appends a new node rather than replacing anything,
a DataSet at the end of a chain holds the complete history
of everything that produced it — connector, filters, joins, transforms,
aggregations — traceable back to the original source.
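As a rough illustration of how an append-only lineage chain accumulates, here is a minimal sketch. `Node` and `TracedData` are hypothetical names, and the field names are assumptions based on the description above rather than the actual `LineageNode` definition.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Node:
    """One lineage entry (stand-in for LineageNode; field names are assumed)."""
    operation: str
    description: str
    rows_before: int
    rows_after: int
    timestamp: datetime


@dataclass(frozen=True)
class TracedData:
    """Rows plus the chain of nodes that produced them."""
    rows: Tuple[Row, ...]
    lineage: Tuple[Node, ...] = ()

    def _derive(self, rows: Iterable[Row], operation: str, description: str) -> "TracedData":
        new_rows = tuple(rows)
        node = Node(operation, description, len(self.rows), len(new_rows),
                    datetime.now(timezone.utc))
        # Append to a copy of the chain; the parent's lineage is left untouched.
        return TracedData(new_rows, self.lineage + (node,))

    def filter(self, predicate: Callable[[Row], bool], description: str) -> "TracedData":
        return self._derive((r for r in self.rows if predicate(r)), "filter", description)

    def select(self, *columns: str) -> "TracedData":
        kept = ({c: r[c] for c in columns} for r in self.rows)
        return self._derive(kept, "select", "keep " + ", ".join(columns))

    def lineage_lines(self) -> List[str]:
        return [f"{n.operation}: {n.description} ({n.rows_before} -> {n.rows_after} rows)"
                for n in self.lineage]


source = TracedData(rows=(
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 45},
))
result = source.filter(lambda r: r["region"] == "EU", "region == 'EU'").select("id", "amount")

for line in result.lineage_lines():
    print(line)
```

The source object keeps an empty chain while `result` carries two nodes, one per operation, in the order they were applied.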
TraceBi structures pipelines as three named layers (Bronze, Silver, and Gold), each with a distinct purpose.
Each layer reads from the previous one, writes to a configurable sink (SQL table, CSV, memory), and records its lineage. Every layer can be scheduled independently and re-run on demand.
A StarSchema sits above the connector layer and adds BI semantics.
You declare two types of tables: facts and dimensions.
Once defined, schema.query() is fully declarative. You describe the result
you want — which measures, grouped by which dimension attributes, filtered how —
and TraceBi resolves all the joins, applies filters, and aggregates automatically.
You never write the join logic by hand.
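To make the idea of a declarative query concrete, the sketch below resolves dot-notation references such as `"dim_customer.region"` against a fact table and aggregates one measure. The `query` function, its parameters, and the sample tables are all illustrative assumptions; this is not how `schema.query()` is implemented, only one way the same kind of lookup could work.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# Dimension name -> (foreign-key column on the fact table, lookup keyed by that value)
Dimensions = Dict[str, Tuple[str, Dict[Any, Row]]]


def resolve(fact_row: Row, ref: str, dimensions: Dimensions) -> Any:
    """Follow a 'dimension.column' reference from a fact row to a dimension value."""
    table, column = ref.split(".", 1)
    foreign_key, lookup = dimensions[table]
    return lookup[fact_row[foreign_key]][column]


def query(facts: List[Row], dimensions: Dimensions, measure: str,
          group_by: List[str], filters: Optional[Dict[str, Any]] = None) -> List[Row]:
    """Sum one measure, grouped and filtered by dimension attributes."""
    totals: Dict[Tuple[Any, ...], float] = defaultdict(float)
    for row in facts:
        if any(resolve(row, ref, dimensions) != wanted
               for ref, wanted in (filters or {}).items()):
            continue
        key = tuple(resolve(row, ref, dimensions) for ref in group_by)
        totals[key] += row[measure]
    return [{**dict(zip(group_by, key)), measure: total}
            for key, total in sorted(totals.items())]


fact_orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 1, "amount": 45.0},
    {"customer_id": 3, "amount": 60.0},
]
dimensions: Dimensions = {
    "dim_customer": ("customer_id", {
        1: {"region": "EU", "segment": "SMB"},
        2: {"region": "US", "segment": "Enterprise"},
        3: {"region": "EU", "segment": "Enterprise"},
    }),
}

by_region = query(fact_orders, dimensions, "amount", ["dim_customer.region"])
enterprise = query(fact_orders, dimensions, "amount", ["dim_customer.region"],
                   filters={"dim_customer.segment": "Enterprise"})
```

The caller names only the measure, the grouping attributes, and the filters; the join from fact to dimension happens inside `resolve`.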
Six building blocks — each independent, all composable.
CSV, SQL (any SQLAlchemy dialect), BigQuery, Snowflake, and in-memory DataFrames —
all through a single connector.load(name) interface. Swap sources
without touching transform code.
Bronze → Silver → Gold layers, each writing to a configurable sink. Declarative cleaning in Silver (cast, deduplicate, drop nulls), StarSchema-powered aggregation in Gold.
Declare facts and dimensions once. Query with dot-notation dimension references
("dim_customer.region"). Auto-joins, filters, and aggregates — no SQL.
Compose reports from TextSection, TableSection, and
ChartSection blocks. Render to Excel or HTML. A lineage manifest
is written alongside every render.
Register layers, assign cron schedules, and declare dependencies between them. Run history is persisted to SQLite with row counts and upstream run IDs linking the full cross-layer chain.
Interactive Plotly/Dash dashboards with associative filters — selecting one panel auto-filters all others that share the same column. Mounted directly inside the TraceBi web server.
Follow along with the built-in demo data, or bring your own sources.
No files, no database, no setup. load_demo_model() returns a fully connected
DataModel backed by in-memory sample data — orders, customers, and a revenue
trend — so you can follow every step of this walkthrough immediately after
pip install tracebi.
Every method on DataSet — filter, transform, sort, select, rename —
returns a new immutable DataSet with the step appended to its lineage chain.
Call print_lineage() at any point to see the full audit trail.
A Report is assembled from section objects — text headings, tables, and charts.
Pass any DataSet directly into a section. Render to Excel or HTML; both
renderers write a manifest.json capturing the full lineage of every dataset.
Build a Dashboard from panel components — metrics, charts, tables, and filters.
Filters are associative: selecting a region
automatically updates every panel that shares that column, with no event-wiring code.
Register connectors and logical table names in a DataModel.
You can mix sources — orders from SQL, a customer lookup from CSV, a reference
table from BigQuery — and reference them all by name in your transforms.
Call model.connect() once to open all connections.
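The following sketch shows one way logical table names could be mapped onto different sources behind a shared `load(name)` method. Only `load(name)` comes from the text above; `CsvFolderConnector`, `MemoryConnector`, `MiniModel`, and `register` are hypothetical names, and the real `DataModel` API may look quite different.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Protocol, Tuple

Row = Dict[str, Any]


class Connector(Protocol):
    def load(self, name: str) -> List[Row]: ...


class CsvFolderConnector:
    """Hypothetical connector: each table is '<name>.csv' inside one folder."""
    def __init__(self, folder: str) -> None:
        self.folder = Path(folder)

    def load(self, name: str) -> List[Row]:
        with open(self.folder / f"{name}.csv", newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))


class MemoryConnector:
    """Hypothetical connector backed by a dict of in-memory tables."""
    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables

    def load(self, name: str) -> List[Row]:
        return [dict(r) for r in self.tables[name]]


class MiniModel:
    """Maps a logical table name to (connector, source-specific name)."""
    def __init__(self) -> None:
        self._tables: Dict[str, Tuple[Connector, str]] = {}

    def register(self, logical_name: str, connector: Connector, source_name: str) -> None:
        self._tables[logical_name] = (connector, source_name)

    def load(self, logical_name: str) -> List[Row]:
        connector, source_name = self._tables[logical_name]
        return connector.load(source_name)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "orders_2024.csv").write_text(
        "id,customer_id,amount\n1,10,120\n2,11,80\n", encoding="utf-8")

    model = MiniModel()
    model.register("orders", CsvFolderConnector(tmp), "orders_2024")
    model.register("customers", MemoryConnector({
        "crm_customers": [{"customer_id": "10", "region": "EU"},
                          {"customer_id": "11", "region": "US"}],
    }), "crm_customers")

    # Transform code only ever sees the logical names.
    orders = model.load("orders")
    customers = model.load("customers")
```

Swapping the CSV folder for another source would change only the `register` call, not the code that calls `model.load("orders")`.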
Every method on DataSet — filter, transform, sort, select, rename —
returns a new immutable DataSet with the step appended to its lineage chain.
Call print_lineage() at any point to see the full audit trail, or
fingerprint() to get a hash of the current data state.
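The text does not say how `fingerprint()` computes its hash, so the sketch below is only one plausible approach: serialise the rows to canonical JSON and hash the result with SHA-256. The function name `sketch_fingerprint` is hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def sketch_fingerprint(rows: List[Dict[str, Any]]) -> str:
    """Hash the data state; key order within a row does not affect the result."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = [{"id": 1, "amount": 120}, {"id": 2, "amount": 80}]
b = [{"amount": 120, "id": 1}, {"amount": 80, "id": 2}]  # same data, keys reordered
c = [{"id": 1, "amount": 121}, {"id": 2, "amount": 80}]  # one value changed

same = sketch_fingerprint(a) == sketch_fingerprint(b)
changed = sketch_fingerprint(a) != sketch_fingerprint(c)
```

In this version row order still matters, so sorting a dataset would produce a different hash; whether that is desirable depends on what the fingerprint is used for.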
For production pipelines, structure your work as Bronze → Silver → Gold layers.
Each layer reads from the previous layer's sink and writes to its own, which the next layer then reads.
SilverLayer transforms are declared once and reused on every run.
GoldLayer queries a StarSchema to produce the final
analytic output.
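The layered flow can be sketched with an in-memory sink and a Silver step list that is declared once and reapplied on every run. Everything here is an assumption made for illustration: the step tuples, the function names, and the table names are invented, and the Gold step uses a plain aggregation where TraceBi would query a StarSchema.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Sink = Dict[str, List[Row]]  # in-memory sink: table name -> rows

# Declared once, reused on every run (hypothetical step format).
SILVER_STEPS: List[Tuple[Any, ...]] = [
    ("drop_nulls", "amount"),
    ("deduplicate", "id"),
    ("cast", "amount", float),
]


def apply_steps(rows: List[Row], steps: List[Tuple[Any, ...]]) -> List[Row]:
    for step in steps:
        kind, column = step[0], step[1]
        if kind == "drop_nulls":
            rows = [r for r in rows if r[column] is not None]
        elif kind == "deduplicate":
            seen, unique = set(), []
            for r in rows:
                if r[column] not in seen:
                    seen.add(r[column])
                    unique.append(r)
            rows = unique
        elif kind == "cast":
            rows = [{**r, column: step[2](r[column])} for r in rows]
        else:
            raise ValueError(f"unknown step: {kind}")
    return rows


def run_bronze(source: List[Row], sink: Sink) -> None:
    sink["bronze_orders"] = [dict(r) for r in source]  # raw copy


def run_silver(sink: Sink) -> None:
    sink["silver_orders"] = apply_steps(sink["bronze_orders"], SILVER_STEPS)


def run_gold(sink: Sink) -> None:
    totals: Dict[str, float] = {}
    for r in sink["silver_orders"]:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    sink["gold_revenue_by_region"] = [
        {"region": region, "revenue": total} for region, total in sorted(totals.items())
    ]


raw = [
    {"id": 1, "region": "EU", "amount": "120"},
    {"id": 1, "region": "EU", "amount": "120"},  # duplicate
    {"id": 2, "region": "US", "amount": None},   # null measure
    {"id": 3, "region": "EU", "amount": "45.5"},
]
sink: Sink = {}
run_bronze(raw, sink)
run_silver(sink)
run_gold(sink)
```

Because each function reads only from the sink, any layer can be re-run on its own as long as the layer before it has already written its output.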
A Report is assembled from section objects — text headings, tables, and
charts. Pass any DataSet directly into a TableSection or
ChartSection. Render to Excel or HTML; both renderers write a
manifest.json alongside the file capturing the full lineage of every
dataset in the report.
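A renderer that writes a manifest next to its output might look roughly like the sketch below. The manifest layout, the `render_with_manifest` function, and the `report.html` file name are assumptions for illustration; only the `manifest.json` file name comes from the text above.

```python
import json
import tempfile
from html import escape
from pathlib import Path
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def render_with_manifest(title: str,
                         datasets: Dict[str, Tuple[List[Row], List[str]]],
                         out_dir: str) -> Tuple[Path, Path]:
    """Write report.html plus manifest.json; datasets maps name -> (rows, lineage steps)."""
    parts = [f"<h1>{escape(title)}</h1>"]
    manifest: Dict[str, Any] = {"report": title, "datasets": {}}
    for name, (rows, lineage) in datasets.items():
        columns = list(rows[0]) if rows else []
        header = "".join(f"<th>{escape(c)}</th>" for c in columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(r[c]))}</td>" for c in columns) + "</tr>"
            for r in rows
        )
        parts.append(f"<h2>{escape(name)}</h2><table><tr>{header}</tr>{body}</table>")
        manifest["datasets"][name] = {"row_count": len(rows), "lineage": list(lineage)}

    html_path = Path(out_dir) / "report.html"
    manifest_path = Path(out_dir) / "manifest.json"
    html_path.write_text("\n".join(parts), encoding="utf-8")
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return html_path, manifest_path


with tempfile.TemporaryDirectory() as tmp:
    html_path, manifest_path = render_with_manifest(
        "Revenue",
        {"revenue": ([{"region": "EU", "revenue": 225.0},
                      {"region": "US", "revenue": 80.0}],
                     ["load orders", "filter amount > 0", "aggregate by region"])},
        tmp,
    )
    html_text = html_path.read_text(encoding="utf-8")
    loaded = json.loads(manifest_path.read_text(encoding="utf-8"))
```

Keeping the manifest as a separate file means the lineage can be inspected or archived without parsing the rendered report.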
Register all your layers with a PipelineRunner. Assign each layer a cron
schedule and optionally declare which layer it depends on. Run any layer on demand
with runner.run(name), or pass refresh=True to cascade
through all upstream dependencies first. Every run is persisted to SQLite with
row counts and an upstream_run_id linking the full cross-layer chain.
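The run-history idea can be sketched with Python's built-in `sqlite3` module. `run(name)`, `refresh=True`, and `upstream_run_id` are taken from the text; the `MiniRunner` class, the `register` signature, and the table schema are assumptions, and cron scheduling is left out entirely.

```python
import sqlite3
from typing import Callable, Dict, Optional, Tuple


class MiniRunner:
    """Hypothetical runner: records each layer run and links it to its upstream run."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            " run_id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " layer TEXT NOT NULL,"
            " row_count INTEGER NOT NULL,"
            " upstream_run_id INTEGER)"
        )
        # layer name -> (callable returning a row count, name of the upstream layer)
        self.layers: Dict[str, Tuple[Callable[[], int], Optional[str]]] = {}

    def register(self, name: str, fn: Callable[[], int],
                 depends_on: Optional[str] = None) -> None:
        self.layers[name] = (fn, depends_on)

    def _latest_run_id(self, layer: str) -> Optional[int]:
        row = self.db.execute(
            "SELECT MAX(run_id) FROM runs WHERE layer = ?", (layer,)).fetchone()
        return row[0]

    def run(self, name: str, refresh: bool = False) -> int:
        fn, depends_on = self.layers[name]
        upstream_id = None
        if depends_on is not None:
            # refresh=True re-runs the upstream chain first; otherwise reuse its last run.
            upstream_id = (self.run(depends_on, refresh=True) if refresh
                           else self._latest_run_id(depends_on))
        row_count = fn()
        cur = self.db.execute(
            "INSERT INTO runs (layer, row_count, upstream_run_id) VALUES (?, ?, ?)",
            (name, row_count, upstream_id),
        )
        self.db.commit()
        return cur.lastrowid


runner = MiniRunner()
runner.register("bronze", lambda: 100)
runner.register("silver", lambda: 90, depends_on="bronze")
runner.register("gold", lambda: 5, depends_on="silver")

gold_run_id = runner.run("gold", refresh=True)
history = runner.db.execute(
    "SELECT layer, row_count, upstream_run_id FROM runs ORDER BY run_id").fetchall()
```

Following `upstream_run_id` from the Gold row back through Silver to Bronze reconstructs the cross-layer chain for that particular run.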
Build a Dashboard from panel components — metrics, charts, tables, and
filters. Filters are associative: selecting
a region in the filter panel automatically updates every chart and table that shares
that column, without any event-wiring code. Mount the dashboard into the TraceBi web
server via DashboardServer — no separate Dash process required.
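Associative filtering comes down to one rule: a selection on a column applies to every panel whose data has that column and leaves the rest alone. The sketch below shows that rule on plain lists of rows; `apply_selection` and the panel names are hypothetical, and no Dash or Plotly code is involved.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_selection(panels: Dict[str, List[Row]], column: str, value: Any) -> Dict[str, List[Row]]:
    """Filter every panel that shares `column`; pass other panels through unchanged."""
    filtered: Dict[str, List[Row]] = {}
    for name, rows in panels.items():
        shares_column = bool(rows) and column in rows[0]
        filtered[name] = ([r for r in rows if r[column] == value]
                          if shares_column else list(rows))
    return filtered


panels = {
    "revenue_by_region": [{"region": "EU", "revenue": 225.0},
                          {"region": "US", "revenue": 80.0}],
    "orders_table": [{"id": 1, "region": "EU"},
                     {"id": 2, "region": "US"},
                     {"id": 3, "region": "EU"}],
    "monthly_trend": [{"month": "2024-01", "revenue": 150.0},
                      {"month": "2024-02", "revenue": 155.0}],
}

after = apply_selection(panels, "region", "EU")
```

Here the chart and the table both narrow to EU rows, while the monthly trend, which has no `region` column, is returned as it was.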
Two ways to work with TraceBi — pick what fits your workflow.
Install TraceBi and use it from a notebook or script. Define your connectors, build your medallion pipeline, query your star schema, and render reports to HTML or Excel — no web server required.
Then follow the walkthrough above. Run any example to see it in action:
Register your connectors, models, reports, and pipelines in
web/demo_app.py. The web server exposes a UI for running reports,
browsing pipelines, and viewing lineage — plus a full REST API.
Every DataSet carries a chain of LineageNode records
describing exactly which connector, transformation, and timestamp produced each result.
Run any report here to visualise the full DAG.
Every UI feature is backed by a documented REST API. Trigger report runs, kick off pipeline layers, and pull lineage data programmatically, without touching the UI.