add.to()

Add columns from one dataframe to another

What does add.to() do?

The add.to() function lets you bring columns from one dataframe (the reference) into another dataframe (the target) by matching on common columns. Think of it like a lookup or join operation, but with simpler syntax.

Common use cases:

📋 Table of Contents

📖 Parameters

Parameter Type Required Description
target_df DataFrame ✅ Yes The dataframe you want to add columns to (your main dataframe)
from_df DataFrame ✅ Yes The reference dataframe containing the columns you want to bring over
bring str or list ✅ Yes The column name(s) you want to bring from the reference dataframe. Can be a single column name (string) or multiple columns (list)
against str or list ✅ Yes The column name(s) to match on (the common columns between both dataframes). Can be a single column name (string) or multiple columns (list)
💡 Note: The function works with both pandas and polars dataframes. It will automatically detect the type and return the same type.

🚀 Example 1: Simplest Usage

Scenario: You have an orders dataframe and want to add product prices from a products dataframe.

Setup: Create sample data
import pandas as pd
import additory as add

# Your orders dataframe
orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'product_id': ['A', 'B', 'A'],
    'quantity': [2, 1, 3]
})

# Reference products dataframe
products = pd.DataFrame({
    'product_id': ['A', 'B', 'C'],
    'price': [10.99, 25.50, 15.00]
})
Add prices to orders
# Bring the 'price' column from products, matching on 'product_id'
result = add.to(
    orders,
    from_df=products,
    bring='price',
    against='product_id'
)

print(result)
Output
   order_id product_id  quantity  price
0         1          A         2  10.99
1         2          B         1  25.50
2         3          A         3  10.99

📦 Example 2: Bringing Multiple Columns

Scenario: You want to bring multiple columns at once (price AND product name).

Setup: Enhanced products data
import pandas as pd
import additory as add

orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'product_id': ['A', 'B', 'A'],
    'quantity': [2, 1, 3]
})

products = pd.DataFrame({
    'product_id': ['A', 'B', 'C'],
    'product_name': ['Widget', 'Gadget', 'Doohickey'],
    'price': [10.99, 25.50, 15.00],
    'category': ['Tools', 'Electronics', 'Tools']
})
Bring multiple columns using a list
# Bring both 'product_name' and 'price' columns
result = add.to(
    orders,
    from_df=products,
    bring=['product_name', 'price'],  # List of columns
    against='product_id'
)

print(result)
Output
   order_id product_id  quantity product_name  price
0         1          A         2       Widget  10.99
1         2          B         1       Gadget  25.50
2         3          A         3       Widget  10.99

🔗 Example 3: Matching on Multiple Columns

Scenario: You need to match on more than one column (e.g., product_id AND region).

Setup: Regional pricing data
import pandas as pd
import additory as add

# Orders with region information
orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'product_id': ['A', 'B', 'A', 'B'],
    'region': ['US', 'US', 'EU', 'EU'],
    'quantity': [2, 1, 3, 2]
})

# Products with regional pricing
products = pd.DataFrame({
    'product_id': ['A', 'A', 'B', 'B'],
    'region': ['US', 'EU', 'US', 'EU'],
    'price': [10.99, 12.50, 25.50, 28.00]
})
Match on both product_id AND region
# Match on multiple columns using a list
result = add.to(
    orders,
    from_df=products,
    bring='price',
    against=['product_id', 'region']  # List of matching columns
)

print(result)
Output
   order_id product_id region  quantity  price
0         1          A     US         2  10.99
1         2          B     US         1  25.50
2         3          A     EU         3  12.50
3         4          B     EU         2  28.00

🌟 Example 4: Real-World Scenario

Scenario: E-commerce order processing - bringing customer details and product information into orders.

Setup: Complete e-commerce data
import pandas as pd
import additory as add

# Orders dataframe
orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004],
    'customer_id': [501, 502, 501, 503],
    'product_id': ['WIDGET-A', 'GADGET-B', 'WIDGET-A', 'TOOL-C'],
    'quantity': [2, 1, 5, 3],
    'order_date': ['2024-01-15', '2024-01-16', '2024-01-16', '2024-01-17']
})

# Customer information
customers = pd.DataFrame({
    'customer_id': [501, 502, 503],
    'customer_name': ['Alice Johnson', 'Bob Smith', 'Carol White'],
    'email': ['alice@email.com', 'bob@email.com', 'carol@email.com'],
    'tier': ['Gold', 'Silver', 'Gold']
})

# Product catalog
products = pd.DataFrame({
    'product_id': ['WIDGET-A', 'GADGET-B', 'TOOL-C'],
    'product_name': ['Premium Widget', 'Smart Gadget', 'Pro Tool'],
    'price': [29.99, 149.99, 79.99],
    'category': ['Widgets', 'Electronics', 'Tools']
})
Step 1: Add customer information
# First, bring customer details
orders_with_customers = add.to(
    orders,
    from_df=customers,
    bring=['customer_name', 'tier'],
    against='customer_id'
)

print("After adding customer info:")
print(orders_with_customers)
Step 2: Add product information
# Then, bring product details
complete_orders = add.to(
    orders_with_customers,
    from_df=products,
    bring=['product_name', 'price', 'category'],
    against='product_id'
)

print("\nComplete orders with all information:")
print(complete_orders)
Step 3: Calculate order totals
# Now you can easily calculate totals
complete_orders['order_total'] = complete_orders['quantity'] * complete_orders['price']

print("\nFinal result with totals:")
print(complete_orders)
Final Output
   order_id  customer_id   product_id  quantity  order_date   customer_name   tier    product_name   price     category  order_total
0      1001          501   WIDGET-A         2  2024-01-15  Alice Johnson   Gold  Premium Widget   29.99      Widgets        59.98
1      1002          502   GADGET-B         1  2024-01-16     Bob Smith Silver    Smart Gadget  149.99  Electronics       149.99
2      1003          501   WIDGET-A         5  2024-01-16  Alice Johnson   Gold  Premium Widget   29.99      Widgets       149.95
3      1004          503     TOOL-C         3  2024-01-17    Carol White   Gold        Pro Tool   79.99        Tools       239.97

⚠️ Common Pitfalls & Tips

Tip 1: Make sure the matching columns exist in both dataframes with the same names.
Tip 2: If a match isn't found, the brought columns will have NaN (missing) values.
Tip 3: The function performs a LEFT join, meaning all rows from the target dataframe are kept.
Tip 4: You can chain multiple add.to() calls to bring columns from different reference dataframes.

🎯 Quick Reference

Basic syntax template
# Single column, single match
result = add.to(target_df, from_df=reference_df, bring='column_name', against='match_column')

# Multiple columns, single match
result = add.to(target_df, from_df=reference_df, bring=['col1', 'col2'], against='match_column')

# Single column, multiple matches
result = add.to(target_df, from_df=reference_df, bring='column_name', against=['match1', 'match2'])

# Multiple columns, multiple matches
result = add.to(target_df, from_df=reference_df, bring=['col1', 'col2'], against=['match1', 'match2'])