18 Lineage Tracking
Track transformation history and understand data provenance using the @lineage mode.
18.1 Example 1: Basic Lineage Tracking
Enable lineage tracking by adding lineage=True to your operations, then use @lineage mode to see the transformation history.
import pandas as pd
import additory as add

df = pd.DataFrame({
    'price': [100, 200, 150],
    'quantity': [2, 1, 3]
})

# Perform operation with lineage tracking enabled
result = add.transform(
    '@calc',
    df,
    columns=['price', 'quantity'],
    expression='price * quantity',
    as_='total',
    lineage=True  # Enable lineage tracking
)

# Get lineage report
lineage_report = add.scan(
    '@lineage',
    result
)
print(lineage_report)

Output:
═══════════════════════════════════════════════════════════════
LINEAGE REPORT
═══════════════════════════════════════════════════════════════
DataFrame: 3 rows × 3 columns
Operations: 1 transformations applied
───────────────────────────────────────────────────────────────
Step 1: add.transform - 2026-03-11T09:53:59
───────────────────────────────────────────────────────────────
Rows: 3 → 3 (no change)
Columns Added: total
Parameters:
columns: ["price","quantity"]
expression: ["price * quantity"]
strategy: null
by: null
Key Points:
- Add lineage=True to any operation (transform, to, synthetic)
- Lineage metadata is stored with the DataFrame
- Use @lineage mode to generate a human-readable report
- The report shows operation history, parameters, and column changes
Note: This also works with polars DataFrames.
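To make the "Rows" and "Columns Added" lines of the report concrete, here is a minimal sketch of how such a summary could be derived by comparing a DataFrame's shape and column names before and after an operation. The helper summarize_change is hypothetical and written only for illustration; it is not part of additory, and the library may compute these lines differently.

```python
def summarize_change(rows_before, rows_after, cols_before, cols_after):
    """Return report-style lines describing row and column changes.

    Hypothetical helper for illustration; not an additory function.
    """
    if rows_before == rows_after:
        row_note = "no change"
    else:
        row_note = f"{rows_after - rows_before:+d} rows"
    # Columns present after the operation but not before it.
    added = [c for c in cols_after if c not in cols_before]
    lines = [f"Rows: {rows_before} → {rows_after} ({row_note})"]
    if added:
        lines.append("Columns Added: " + ", ".join(added))
    return lines


# Shapes and column names taken from Example 1.
lines = summarize_change(
    3, 3,
    ["price", "quantity"],
    ["price", "quantity", "total"],
)
print("\n".join(lines))
# Rows: 3 → 3 (no change)
# Columns Added: total
```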
18.2 Example 2: Multi-Step Lineage Tracking
Lineage accumulates across multiple operations, giving you a complete transformation history.
import pandas as pd
import additory as add

df = pd.DataFrame({
    'price': [100, 200, 150],
    'cost': [60, 120, 90],
    'quantity': [2, 1, 3]
})

# Step 1: Calculate profit with lineage tracking
result = add.transform(
    '@calc',
    df,
    columns=['price', 'cost'],
    expression='price - cost',
    as_='profit',
    lineage=True
)

# Step 2: Calculate total revenue (lineage continues)
result = add.transform(
    '@calc',
    result,
    columns=['price', 'quantity'],
    expression='price * quantity',
    as_='revenue',
    lineage=True
)

# Get complete lineage report
lineage_report = add.scan(
    '@lineage',
    result
)
print(lineage_report)

Output:
═══════════════════════════════════════════════════════════════
LINEAGE REPORT
═══════════════════════════════════════════════════════════════
DataFrame: 3 rows × 5 columns
Operations: 2 transformations applied
───────────────────────────────────────────────────────────────
Step 1: add.transform - 2026-03-11T10:15:23
───────────────────────────────────────────────────────────────
Rows: 3 → 3 (no change)
Columns Added: profit
Parameters:
columns: ["price","cost"]
expression: ["price - cost"]
───────────────────────────────────────────────────────────────
Step 2: add.transform - 2026-03-11T10:15:23
───────────────────────────────────────────────────────────────
Rows: 3 → 3 (no change)
Columns Added: revenue
Parameters:
columns: ["price","quantity"]
expression: ["price * quantity"]
Key Insights:
- Lineage persists across multiple operations
- Each step is numbered and timestamped
- You can see the complete transformation pipeline
- Useful for debugging complex data workflows
Note: This also works with polars DataFrames.
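The accumulation shown in the report can be pictured as an append-only log in which each operation adds one numbered, timestamped entry. The sketch below models that idea in plain Python. The record_step function and the dictionary layout are assumptions made for illustration and do not describe additory's internal storage.

```python
from datetime import datetime


def record_step(log, operation, parameters):
    """Append one numbered, timestamped entry to an in-memory log.

    Hypothetical helper; additory's real metadata format is not shown here.
    """
    log.append({
        "step": len(log) + 1,  # steps are numbered in call order
        "operation": operation,
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "parameters": parameters,
    })
    return log


# Mirror the two operations from Example 2.
log = []
record_step(log, "add.transform", {"expression": "price - cost", "as_": "profit"})
record_step(log, "add.transform", {"expression": "price * quantity", "as_": "revenue"})

for entry in log:
    print(f"Step {entry['step']}: {entry['operation']} - {entry['timestamp']}")
```

Because every entry is appended rather than overwritten, the order of the log is the order of the pipeline, which is what makes the report useful for debugging.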
18.3 Example 3: Lineage Without Tracking - Error Handling
If you forget to enable lineage tracking, @lineage mode raises a ValueError whose message explains how to turn it on.
import pandas as pd
import additory as add

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6]
})

# Transform WITHOUT lineage tracking
result = add.transform(
    '@calc',
    df,
    columns=['a', 'b'],
    expression='a + b',
    as_='c'
    # Note: lineage=True is missing!
)

# Try to get lineage report
try:
    lineage_report = add.scan('@lineage', result)
except ValueError as e:
    print(e)

Output:
No lineage metadata found. Lineage tracking must be enabled by adding
lineage=True to add.to(), add.transform(), or add.synthetic() calls.
Example:
df = add.transform('@calc', df, strategy={'total': 'price * qty'}, lineage=True)
result = add.scan('@lineage', df)
Error Handling:
- Clear error message when lineage is missing
- Provides an example of how to enable lineage
- Helps you quickly fix the issue
Note: This also works with polars DataFrames.
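In a longer pipeline you may prefer to treat a missing lineage record as "no report" instead of an exception. The sketch below wraps the call in a small helper. lineage_or_none is a hypothetical name, and fake_scan is a stand-in for add.scan that exists only so the example runs without the library installed.

```python
def lineage_or_none(scan, df):
    """Call scan('@lineage', df); return None if it raises ValueError."""
    try:
        return scan("@lineage", df)
    except ValueError:
        return None


def fake_scan(mode, df):
    """Stand-in for add.scan: raises ValueError when no lineage is present."""
    if "lineage" not in df:
        raise ValueError("No lineage metadata found.")
    return "LINEAGE REPORT"


print(lineage_or_none(fake_scan, {"data": [1, 2, 3]}))                   # None
print(lineage_or_none(fake_scan, {"data": [1], "lineage": ["step 1"]}))  # LINEAGE REPORT
```

With the real library the call would be lineage_or_none(add.scan, result). Note that this catches every ValueError, so it can also hide unrelated problems; use it only where a missing report is acceptable.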
18.4 Lineage Tracking Features
18.4.1 What Gets Tracked
- Operation Type: Which function was called (transform, to, synthetic)
- Timestamp: When the operation occurred
- Parameters: All operation parameters
- Row Changes: Rows before and after
- Column Changes: Columns added or modified
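The parameter lines in the example reports (for instance columns: ["price","quantity"] and strategy: null) look like compact JSON. Assuming that is the case, which the source does not confirm, the sketch below reproduces that formatting with the standard json module. render_parameters is a hypothetical helper, not an additory function.

```python
import json


def render_parameters(parameters):
    """Format each parameter as 'name: <compact JSON value>'.

    Hypothetical helper; it only mimics the look of the example reports.
    """
    return [
        f"{name}: {json.dumps(value, separators=(',', ':'))}"
        for name, value in parameters.items()
    ]


# Parameter values as they appear in the Example 1 report.
params = {
    "columns": ["price", "quantity"],
    "expression": ["price * quantity"],
    "strategy": None,
    "by": None,
}
rendered = render_parameters(params)
print("\n".join(rendered))
# columns: ["price","quantity"]
# expression: ["price * quantity"]
# strategy: null
# by: null
```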
18.4.2 When to Use Lineage
- Debugging: Understand how data was transformed
- Auditing: Track data provenance for compliance
- Documentation: Auto-generate transformation documentation
- Collaboration: Share transformation history with team
18.4.3 Performance Impact
- Minimal overhead (<100ms per operation)
- Metadata stored in memory only
- No impact on the speed of the transformation itself
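If you want to check the overhead on your own data, you can time the same call with and without lineage=True. The sketch below shows one way to do that with time.perf_counter. time_call is a hypothetical helper, and the two workload functions are placeholders standing in for real add.transform calls.

```python
import time


def time_call(fn, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder workloads; replace with real calls, for example
# lambda: add.transform(..., lineage=True)
def without_lineage():
    return sum(range(10_000))


def with_lineage():
    record = {"operation": "demo"}  # simulated bookkeeping
    return sum(range(10_000)), record


overhead = time_call(with_lineage) - time_call(without_lineage)
print(f"Estimated overhead: {overhead * 1000:.3f} ms")
```

Taking the best of several runs reduces noise from other processes. For very fast operations the difference can still come out slightly negative, which simply means the overhead is below what this method can measure.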
18.5 Parameters
18.5.1 Required Parameters
- mode: '@lineage' for lineage tracking
- df: Input DataFrame with lineage metadata
18.5.2 Optional Parameters
- columns: Filter report to specific columns (not yet implemented)
- trace: Trace specific cell transformations (not yet implemented)
- as_type: Output format (currently only 'text' is supported)
18.5.3 Positional Parameters
# Also works without naming certain parameters:
lineage_report = add.scan('@lineage', df)
18.6 Enabling Lineage Tracking
18.6.1 In add.transform()
result = add.transform(
    '@calc',
    df,
    columns=['a', 'b'],
    expression='a + b',
    as_='c',
    lineage=True  # Enable lineage
)
18.6.2 In add.to()
result = add.to(
    orders,
    bring_from=customers,
    bring='name',
    against='customer_id',
    lineage=True  # Enable lineage
)
18.6.3 In add.synthetic()
result = add.synthetic(
    '@new',
    n=100,
    strategy={'a': {'distribution': 'normal', 'mean': 0, 'std': 1}},
    lineage=True  # Enable lineage
)
18.7 Limitations (v0.1.4)
18.7.1 Session-Only Lineage
- Lineage metadata is stored in memory only
- Not persisted when saving DataFrames to disk
- Lost when Python session ends
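Until persistent lineage is available, one workaround is to write the text report to a sidecar file next to the saved data before the session ends. The sketch below assumes the report converts cleanly to text with str(), as print(lineage_report) in the examples suggests. save_report and the .lineage.txt suffix are hypothetical choices, not additory conventions.

```python
from pathlib import Path
import tempfile


def save_report(report_text, data_path):
    """Write the lineage report next to the data file as <name>.lineage.txt."""
    data_path = Path(data_path)
    sidecar = data_path.with_name(data_path.name + ".lineage.txt")
    sidecar.write_text(str(report_text), encoding="utf-8")
    return sidecar


# Demonstrate with a temporary directory and a placeholder report string.
with tempfile.TemporaryDirectory() as tmp:
    path = save_report("LINEAGE REPORT\nOperations: 1", Path(tmp) / "result.csv")
    saved_text = path.read_text(encoding="utf-8")
    print(path.name)  # result.csv.lineage.txt
```

This keeps a human-readable record only. The lineage metadata itself is still lost, so a DataFrame reloaded from disk will not produce a report from @lineage mode.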
18.7.2 Future Enhancements (v0.2.0)
- Persistent lineage with add.save() and add.load()
- Cell-level tracing with the trace parameter
- Column-specific lineage with the columns parameter
- Lineage visualization and export
18.8 Next Steps
- Page 1: Basic data scanning with @analyze mode
- Page 3: Real-world data quality workflows