22 Troubleshooting Guide
A comprehensive reference for common errors, solutions, and best practices when using Additory.
22.1 Quick Error Reference
| Error Type | Common Cause | Quick Fix |
|---|---|---|
| TypeError: DataFrame must be pandas or polars | Wrong input type | Pass pandas or polars DataFrame |
| ValueError: Column 'X' not found | Missing column | Check column names with df.columns |
| ImportError: Rust bindings not available | Missing Rust extension | Install with pip install additory[rust] |
| ValueError: Cannot use 'as_type' with 'lineage=True' | Conflicting parameters | Use either lineage=True OR as_type, not both |
| RuntimeError: Transform failed | Invalid mode or parameters | Check mode name has @ prefix |
| KeyError: 'column_name' | Column doesn’t exist | Verify column exists in DataFrame |
22.2 Common Errors by Function
22.2.1 add.to() Errors
22.2.1.1 Error: “Column ‘X’ not found in reference DataFrame”
Cause: The column specified in bring doesn’t exist in the bring_from DataFrame.
Solution:
# ❌ Wrong - 'age' doesn't exist in customers
result = add.to(orders, bring_from=customers, bring='age', against='customer_id')
# ✅ Correct - check available columns first
print(customers.columns) # ['customer_id', 'name', 'email']
result = add.to(orders, bring_from=customers, bring='name', against='customer_id')
22.2.1.2 Error: “Key column ‘X’ not found”
Cause: The column specified in against doesn’t exist in one or both DataFrames.
Solution:
# ❌ Wrong - 'id' doesn't exist in orders
result = add.to(orders, bring_from=customers, bring='name', against='id')
# ✅ Correct - use the actual key column name
result = add.to(orders, bring_from=customers, bring='name', against='customer_id')
22.2.1.3 Error: “All reference DataFrames must be the same type”
Cause: Mixing pandas and polars DataFrames in the list passed to bring_from.
Solution:
# ❌ Wrong - mixing pandas and polars
orders_jan = pd.DataFrame(...) # pandas
orders_feb = pl.DataFrame(...) # polars
result = add.to(customers, bring_from=[orders_jan, orders_feb], ...)
# ✅ Correct - use same type for all
orders_jan = pd.DataFrame(...) # pandas
orders_feb = pd.DataFrame(...) # pandas
result = add.to(customers, bring_from=[orders_jan, orders_feb], ...)
22.2.1.4 Error: “strategy parameter must be a dict”
Cause: Passing a string instead of a dict for the aggregation strategy.
Solution:
# ❌ Wrong - strategy is a string
result = add.to(customers, bring_from=orders, bring='amount',
                against='customer_id', strategy='sum')
# ✅ Correct - strategy is a dict
result = add.to(customers, bring_from=orders, bring='amount',
                against='customer_id', strategy={'amount': 'sum'})
22.2.2 add.transform() Errors
22.2.2.1 Error: “Mode ‘@calc’ requires ‘expression’ parameter”
Cause: Missing required expression parameter for calculation mode.
Solution:
# ❌ Wrong - missing expression
result = add.transform('@calc', df, columns=['price', 'quantity'], as_='total')
# ✅ Correct - include expression
result = add.transform('@calc', df, columns=['price', 'quantity'],
                       expression='price * quantity', as_='total')
22.2.2.2 Error: “Mode ‘@filter’ requires ‘where’ parameter”
Cause: Missing required where parameter for filter mode.
Solution:
# ❌ Wrong - missing where condition
result = add.transform('@filter', df, columns=['price', 'stock'])
# ✅ Correct - include where condition
result = add.transform('@filter', df, columns=['price', 'stock'], where='stock > 0')
22.2.2.3 Error: “Mode ‘@aggregate’ requires ‘by’ parameter”
Cause: Missing required by parameter for aggregation mode.
Solution:
# ❌ Wrong - missing by parameter
result = add.transform('@aggregate', df, columns=['region', 'sales'],
                       strategy={'sales': 'sum'})
# ✅ Correct - include by parameter
result = add.transform('@aggregate', df, columns=['region', 'sales'],
                       by='region', strategy={'sales': 'sum'})
22.2.2.4 Error: “Invalid mode ‘calc’ - did you mean ‘@calc’?”
Cause: Missing @ prefix on mode name.
Solution:
# ❌ Wrong - missing @ prefix
result = add.transform('calc', df, expression='price * 2', as_='doubled')
# ✅ Correct - include @ prefix
result = add.transform('@calc', df, expression='price * 2', as_='doubled')
22.2.2.5 Error: “Number of expressions must match number of output names”
Cause: Mismatch between number of expressions and as_ names.
Solution:
# ❌ Wrong - 2 expressions but 1 output name
result = add.transform('@calc', df,
                       expression=['price * quantity', 'price - cost'],
                       as_='total')
# ✅ Correct - matching counts
result = add.transform('@calc', df,
                       expression=['price * quantity', 'price - cost'],
                       as_=['total', 'profit'])
22.2.3 add.synthetic() Errors
22.2.3.1 Error: “Mode ‘@new’ requires ‘n’ parameter”
Cause: Missing the required n parameter for creating a new DataFrame.
Solution:
# ❌ Wrong - missing n parameter
result = add.synthetic('@new', strategy={'id': {'type': 'sequence'}})
# ✅ Correct - include n parameter
result = add.synthetic('@new', n=100, strategy={'id': {'type': 'sequence'}})
22.2.3.2 Error: “Mode ‘@augment’ requires ‘df’ parameter”
Cause: Missing the required df parameter for augmenting an existing DataFrame.
Solution:
# ❌ Wrong - missing df parameter
result = add.synthetic('@augment', n=50, strategy={'age': {'distribution': 'normal'}})
# ✅ Correct - include df parameter
result = add.synthetic('@augment', df=existing_df, n=50,
                       strategy={'age': {'distribution': 'normal'}})
22.2.3.3 Error: “Invalid distribution ‘gaussian’ - did you mean ‘normal’?”
Cause: Using incorrect distribution name.
Solution:
# ❌ Wrong - 'gaussian' is not a valid distribution name
result = add.synthetic('@new', n=100,
                       strategy={'age': {'distribution': 'gaussian'}})
# ✅ Correct - use 'normal' for Gaussian distribution
result = add.synthetic('@new', n=100,
                       strategy={'age': {'distribution': 'normal', 'mean': 35, 'std': 8}})
Valid distributions: normal, lognormal, uniform, exponential, poisson, binomial, beta
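Checking a distribution name before calling add.synthetic() can surface a typo early. The sketch below is a standalone helper and not part of Additory: check_distribution and the ALIASES table are hypothetical names introduced here, and the list of valid names is copied from this section.

```python
import difflib

# Names copied from the "Valid distributions" list in this section.
VALID_DISTRIBUTIONS = (
    "normal", "lognormal", "uniform", "exponential",
    "poisson", "binomial", "beta",
)

# Hypothetical synonym table for illustration; Additory may not define one.
ALIASES = {"gaussian": "normal", "gauss": "normal"}


def check_distribution(name):
    """Return the name if it is valid, otherwise raise ValueError with a hint."""
    if name in VALID_DISTRIBUTIONS:
        return name
    lowered = name.lower()
    suggestion = ALIASES.get(lowered)
    if suggestion is None:
        # Fall back to fuzzy matching for simple misspellings.
        close = difflib.get_close_matches(lowered, VALID_DISTRIBUTIONS, n=1)
        suggestion = close[0] if close else None
    hint = f" - did you mean '{suggestion}'?" if suggestion else ""
    raise ValueError(f"Invalid distribution '{name}'{hint}")
```

Calling check_distribution('gaussian') raises a ValueError that points to 'normal', so the mistake shows up before any data is generated.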
22.2.3.4 Error: “LinkedList requires ‘levels’ parameter”
Cause: Missing required levels parameter for linked list strategy.
Solution:
# ❌ Wrong - missing levels
result = add.synthetic('@new', n=10,
                       strategy={'location': {'type': 'linked_list'}})
# ✅ Correct - include levels
result = add.synthetic('@new', n=10,
                       strategy={'location': {
                           'type': 'linked_list',
                           'levels': [
                               ['USA', 'Canada', 'Mexico'],
                               ['NY', 'Toronto', 'CDMX']
                           ]
                       }})
22.2.4 add.scan() Errors
22.2.4.1 Error: “Mode ‘@lineage’ requires lineage=True in previous operations”
Cause: Trying to scan lineage on a DataFrame that wasn’t created with lineage=True.
Solution:
# ❌ Wrong - no lineage tracking enabled
df = pd.DataFrame({'a': [1, 2, 3]})
result = add.transform('@calc', df, expression='a * 2', as_='b')
lineage = add.scan('@lineage', result) # Error!
# ✅ Correct - enable lineage tracking
df = pd.DataFrame({'a': [1, 2, 3]})
result = add.transform('@calc', df, expression='a * 2', as_='b', lineage=True)
lineage = add.scan('@lineage', result) # Works!
22.2.4.2 Error: “Invalid mode ‘analyze’ - did you mean ‘@analyze’?”
Cause: Missing @ prefix on mode name.
Solution:
# ❌ Wrong - missing @ prefix
result = add.scan('analyze', df)
# ✅ Correct - include @ prefix
result = add.scan('@analyze', df)
22.3 Parameter Validation Errors
22.3.1 Error: “Cannot use ‘as_type’ with ‘lineage=True’”
Cause: Trying to use both lineage=True and as_type together.
Explanation: Lineage metadata is stored in the DataFrame’s native format and would be lost during type conversion.
Solution:
# ❌ Wrong - using both lineage and as_type
result = add.to(orders, bring_from=customers, bring='name',
                against='customer_id', lineage=True, as_type='polars')
# ✅ Option 1: Track lineage (returns same type as input)
result = add.to(orders, bring_from=customers, bring='name',
                against='customer_id', lineage=True)
# ✅ Option 2: Convert type (no lineage tracking)
result = add.to(orders, bring_from=customers, bring='name',
                against='customer_id', as_type='polars')
# ✅ Option 3: Convert after tracking lineage (lineage will be lost)
result = add.to(orders, bring_from=customers, bring='name',
                against='customer_id', lineage=True)
result_polars = pl.from_pandas(result) # Convert separately
22.4 Data Type Errors
22.4.1 Error: “TypeError: DataFrame must be pandas or polars”
Cause: Passing the wrong type (dict, list, numpy array, etc.) instead of a DataFrame.
Solution:
# ❌ Wrong - passing dict
data = {'a': [1, 2, 3], 'b': [4, 5, 6]}
result = add.transform('@calc', data, expression='a + b', as_='c')
# ✅ Correct - convert to DataFrame first
import pandas as pd
df = pd.DataFrame(data)
result = add.transform('@calc', df, expression='a + b', as_='c')
22.4.2 Error: “Column ‘X’ has incompatible type for operation”
Cause: Trying to perform numeric operations on string columns or vice versa.
Solution:
# ❌ Wrong - trying to multiply string column
df = pd.DataFrame({'product': ['A', 'B'], 'quantity': [10, 20]})
result = add.transform('@calc', df, expression='product * 2', as_='doubled')
# ✅ Correct - use numeric columns for math operations
result = add.transform('@calc', df, expression='quantity * 2', as_='doubled')
22.5 Empty DataFrame Errors
22.5.1 Error: “Cannot perform operation on empty DataFrame”
Cause: Trying to transform or analyze a DataFrame with 0 rows.
Solution:
# ❌ Wrong - operating on empty DataFrame
df = pd.DataFrame({'a': [], 'b': []})
result = add.transform('@calc', df, expression='a + b', as_='c')
# ✅ Correct - check for empty DataFrame first
if len(df) > 0:
    result = add.transform('@calc', df, expression='a + b', as_='c')
else:
    print("DataFrame is empty, skipping transformation")
    result = df
22.6 Missing Value Errors
22.6.1 Error: “Cannot aggregate column with all null values”
Cause: Trying to aggregate a column that contains only null/None values.
Solution:
# ❌ Wrong - column has all nulls
df = pd.DataFrame({'region': ['A', 'A', 'B'], 'sales': [None, None, None]})
result = add.transform('@aggregate', df, columns=['region', 'sales'],
                       by='region', strategy={'sales': 'sum'})
# ✅ Correct - handle nulls first
df = pd.DataFrame({'region': ['A', 'A', 'B'], 'sales': [None, None, None]})
# Fill nulls with 0 or use @deduce mode
df['sales'] = df['sales'].fillna(0)
result = add.transform('@aggregate', df, columns=['region', 'sales'],
                       by='region', strategy={'sales': 'sum'})
22.7 Best Practices
22.7.1 1. Always Check Column Names
# Before using columns, verify they exist
print(df.columns)
print(df.dtypes) # Also check data types
22.7.2 2. Use Consistent DataFrame Types
# Don't mix pandas and polars in the same operation
# Pick one and stick with it for related operations
22.7.3 3. Enable Logging for Debugging
# Enable logging to see detailed operation information
result = add.to(orders, bring_from=customers, bring='name',
                against='customer_id', logging=True)
22.7.4 4. Validate Data Before Operations
# Check for empty DataFrames
if len(df) == 0:
    print("Warning: DataFrame is empty")
# Check for required columns
required_cols = ['price', 'quantity']
missing_cols = [col for col in required_cols if col not in df.columns]
if missing_cols:
    print(f"Missing columns: {missing_cols}")
# Check for null values
null_counts = df.isnull().sum()
if null_counts.any():
    print(f"Null values found:\n{null_counts[null_counts > 0]}")
22.7.5 5. Use Explicit Parameter Names
# ✅ Good - clear and explicit
result = add.to(
    bring_to=orders,
    bring_from=customers,
    bring='name',
    against='customer_id'
)
# ❌ Avoid - positional parameters can be confusing
result = add.to(orders, customers, 'name', 'customer_id')
22.7.6 6. Handle Errors Gracefully
# Wrap operations in try-except for production code
try:
    result = add.transform('@calc', df, expression='price * quantity', as_='total')
except ValueError as e:
    print(f"Validation error: {e}")
    # Handle error appropriately
except RuntimeError as e:
    print(f"Operation failed: {e}")
    # Handle error appropriately
22.7.7 7. Test with Small Data First
# Test operations on a small sample before running on full dataset
sample_df = df.head(10)
result = add.transform('@calc', sample_df, expression='price * quantity', as_='total')
# If successful, run on full dataset
result = add.transform('@calc', df, expression='price * quantity', as_='total')
22.8 Debugging Checklist
When you encounter an error, check these items in order:
- Mode name: Does it have the @ prefix? (@calc, not calc)
- Column names: Do all referenced columns exist in the DataFrame?
- Data types: Are you using numeric columns for math operations?
- Required parameters: Does the mode have all required parameters?
- Parameter types: Are parameters the correct type (dict, list, string)?
- DataFrame type: Is it pandas or polars (not dict, list, etc.)?
- Empty data: Does the DataFrame have at least one row?
- Null values: Are there unexpected null values in key columns?
- Conflicting parameters: Are you using lineage=True with as_type?
- Rust bindings: Is the Rust extension installed for Rust-based modes?
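Several of the checklist items can be automated before an operation runs. The following is a minimal sketch that uses only pandas. The preflight function and its parameters are hypothetical and not part of Additory, and it leaves out the polars, conflicting-parameter, and Rust checks.

```python
import pandas as pd


def preflight(df, mode, required_cols=(), numeric_cols=(), key_cols=()):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Mode name: must carry the @ prefix.
    if not mode.startswith("@"):
        problems.append(f"mode '{mode}' is missing the @ prefix")
    # DataFrame type: this sketch only recognises pandas.
    if not isinstance(df, pd.DataFrame):
        problems.append(f"expected a pandas DataFrame, got {type(df).__name__}")
        return problems  # the remaining checks need a DataFrame
    # Column names: every referenced column must exist.
    missing = [c for c in required_cols if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Data types: columns used in math must be numeric.
    non_numeric = [
        c for c in numeric_cols
        if c in df.columns and not pd.api.types.is_numeric_dtype(df[c])
    ]
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    # Empty data: at least one row is required.
    if len(df) == 0:
        problems.append("DataFrame is empty")
    # Null values: key columns should not contain nulls.
    null_keys = [c for c in key_cols if c in df.columns and df[c].isnull().any()]
    if null_keys:
        problems.append(f"null values in key columns: {null_keys}")
    return problems
```

Returning a list rather than raising on the first failure lets a caller see every problem in one pass, which follows the idea of working through the checklist in order.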
22.9 Getting Help
If you’re still stuck after checking this guide:
- Check the error message carefully - It often contains the solution
- Enable logging - Use logging=True to see detailed operation info
- Simplify the operation - Try with minimal parameters first
- Check the examples - Review the documentation examples for similar use cases
- Verify installation - Ensure Rust bindings are installed: pip install additory[rust]
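The installation check can also be done from Python with the standard library. This is a sketch with hypothetical helper names. The guide does not give the import name of the compiled Rust extension, so module_available takes it as an argument and nothing is assumed about it.

```python
from importlib import metadata, util


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def module_available(name):
    """Return True if a top-level module with this name can be located."""
    return util.find_spec(name) is not None


# Report what is installed under the distribution name used by pip.
print("additory version:", installed_version("additory"))
```

A result of None means pip has no record of the package in the active environment, which usually indicates that a different interpreter or virtual environment is in use.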
22.10 Common Patterns That Work
22.10.1 Pattern 1: Multi-Step Pipeline
# Build pipelines step by step, checking results at each stage
df = pd.DataFrame(...)
# Step 1: Calculate
result = add.transform('@calc', df, expression='price * quantity', as_='total')
print(f"After calc: {result.shape}")
# Step 2: Filter
result = add.transform('@filter', result, where='total > 100')
print(f"After filter: {result.shape}")
# Step 3: Aggregate
result = add.transform('@aggregate', result, by='region',
                       strategy={'total': 'sum'})
print(f"After aggregate: {result.shape}")
22.10.2 Pattern 2: Safe Column Access
# Always verify columns exist before using them
def safe_transform(df, expression, as_name, required_cols):
    # required_cols lists the columns the expression refers to;
    # the caller supplies them because the expression is not parsed here
    missing = [col for col in required_cols if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return add.transform('@calc', df, expression=expression, as_=as_name)
# Use the safe wrapper
result = safe_transform(df, 'price * quantity', 'total', ['price', 'quantity'])
22.10.3 Pattern 3: Type-Safe Operations
# Ensure consistent types throughout pipeline
import pandas as pd
# Start with pandas
df = pd.DataFrame(...)
# All operations return pandas (no as_type conversion)
result = add.to(df, bring_from=ref_df, bring='name', against='id')
result = add.transform('@calc', result, expression='price * 2', as_='doubled')
result = add.transform('@filter', result, where='doubled > 100')
# Final result is pandas
assert isinstance(result, pd.DataFrame)
22.11 Error Message Glossary
| Error Message | Meaning | Action |
|---|---|---|
| “Column ‘X’ not found” | Column doesn’t exist | Check df.columns |
| “Mode ‘X’ not recognized” | Invalid mode name | Add @ prefix or check spelling |
| “Parameter ‘X’ is required” | Missing required parameter | Add the parameter |
| “Type mismatch” | Wrong data type | Convert to correct type |
| “Empty DataFrame” | No rows in DataFrame | Check data source |
| “Rust bindings not available” | Extension not installed | Run pip install additory[rust] |
| “Cannot convert” | Type conversion failed | Check data compatibility |
| “Invalid strategy” | Wrong strategy format | Use dict format: {'col': 'agg'} |
| “Conflicting parameters” | Incompatible parameters used together | Remove one parameter |
| “Operation failed” | Generic operation error | Check all parameters and data |
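The glossary can be turned into a small lookup for use in logs or except blocks. The sketch below is illustrative: suggest_action and GLOSSARY are hypothetical names, the phrases and actions are copied from the table above, and matching is a plain case-insensitive substring test.

```python
# (phrase, action) pairs taken from the glossary table.
GLOSSARY = [
    ("rust bindings not available", "Run pip install additory[rust]"),
    ("not found", "Check df.columns"),
    ("not recognized", "Add @ prefix or check spelling"),
    ("is required", "Add the parameter"),
    ("type mismatch", "Convert to correct type"),
    ("empty dataframe", "Check data source"),
    ("cannot convert", "Check data compatibility"),
    ("invalid strategy", "Use dict format: {'col': 'agg'}"),
    ("conflicting parameters", "Remove one parameter"),
    ("operation failed", "Check all parameters and data"),
]


def suggest_action(error):
    """Return the glossary action for an exception or message, or None."""
    message = str(error).lower()
    for phrase, action in GLOSSARY:
        if phrase in message:
            return action
    return None
```

For example, suggest_action(ValueError("Column 'age' not found")) returns "Check df.columns", and a message that matches no phrase returns None so the caller can fall back to the full error text.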
Note: This guide covers the most common errors. For function-specific details, refer to the individual function documentation pages.