Add columns from one dataframe to another
What does add.to() do?
The add.to() function lets you bring columns from one dataframe (the reference) into another dataframe (the target) by matching on common columns. Think of it like a lookup or join operation, but with simpler syntax.
Common use cases:
| Parameter | Type | Required | Description |
|---|---|---|---|
| target_df | DataFrame | ✅ Yes | The dataframe you want to add columns to (your main dataframe) |
| from_df | DataFrame | ✅ Yes | The reference dataframe containing the columns you want to bring over |
| bring | str or list | ✅ Yes | The column name(s) you want to bring from the reference dataframe. Can be a single column name (string) or multiple columns (list) |
| against | str or list | ✅ Yes | The column name(s) to match on (the common columns between both dataframes). Can be a single column name (string) or multiple columns (list) |
Scenario: You have an orders dataframe and want to add product prices from a products dataframe.
import pandas as pd
import additory as add
# Your orders dataframe
orders = pd.DataFrame({
'order_id': [1, 2, 3],
'product_id': ['A', 'B', 'A'],
'quantity': [2, 1, 3]
})
# Reference products dataframe
products = pd.DataFrame({
'product_id': ['A', 'B', 'C'],
'price': [10.99, 25.50, 15.00]
})
# Bring the 'price' column from products, matching on 'product_id'
result = add.to(
orders,
from_df=products,
bring='price',
against='product_id'
)
print(result)
order_id product_id quantity price
0 1 A 2 10.99
1 2 B 1 25.50
2 3 A 3 10.99
Scenario: You want to bring multiple columns at once (price AND product name).
import pandas as pd
import additory as add
orders = pd.DataFrame({
'order_id': [1, 2, 3],
'product_id': ['A', 'B', 'A'],
'quantity': [2, 1, 3]
})
products = pd.DataFrame({
'product_id': ['A', 'B', 'C'],
'product_name': ['Widget', 'Gadget', 'Doohickey'],
'price': [10.99, 25.50, 15.00],
'category': ['Tools', 'Electronics', 'Tools']
})
# Bring both 'product_name' and 'price' columns
result = add.to(
orders,
from_df=products,
bring=['product_name', 'price'], # List of columns
against='product_id'
)
print(result)
order_id product_id quantity product_name price
0 1 A 2 Widget 10.99
1 2 B 1 Gadget 25.50
2 3 A 3 Widget 10.99
Scenario: You need to match on more than one column (e.g., product_id AND region).
import pandas as pd
import additory as add
# Orders with region information
orders = pd.DataFrame({
'order_id': [1, 2, 3, 4],
'product_id': ['A', 'B', 'A', 'B'],
'region': ['US', 'US', 'EU', 'EU'],
'quantity': [2, 1, 3, 2]
})
# Products with regional pricing
products = pd.DataFrame({
'product_id': ['A', 'A', 'B', 'B'],
'region': ['US', 'EU', 'US', 'EU'],
'price': [10.99, 12.50, 25.50, 28.00]
})
# Match on multiple columns using a list
result = add.to(
orders,
from_df=products,
bring='price',
against=['product_id', 'region'] # List of matching columns
)
print(result)
order_id product_id region quantity price
0 1 A US 2 10.99
1 2 B US 1 25.50
2 3 A EU 3 12.50
3 4 B EU 2 28.00
Scenario: E-commerce order processing - bringing customer details and product information into orders.
import pandas as pd
import additory as add
# Orders dataframe
orders = pd.DataFrame({
'order_id': [1001, 1002, 1003, 1004],
'customer_id': [501, 502, 501, 503],
'product_id': ['WIDGET-A', 'GADGET-B', 'WIDGET-A', 'TOOL-C'],
'quantity': [2, 1, 5, 3],
'order_date': ['2024-01-15', '2024-01-16', '2024-01-16', '2024-01-17']
})
# Customer information
customers = pd.DataFrame({
'customer_id': [501, 502, 503],
'customer_name': ['Alice Johnson', 'Bob Smith', 'Carol White'],
'email': ['alice@email.com', 'bob@email.com', 'carol@email.com'],
'tier': ['Gold', 'Silver', 'Gold']
})
# Product catalog
products = pd.DataFrame({
'product_id': ['WIDGET-A', 'GADGET-B', 'TOOL-C'],
'product_name': ['Premium Widget', 'Smart Gadget', 'Pro Tool'],
'price': [29.99, 149.99, 79.99],
'category': ['Widgets', 'Electronics', 'Tools']
})
# First, bring customer details
orders_with_customers = add.to(
orders,
from_df=customers,
bring=['customer_name', 'tier'],
against='customer_id'
)
print("After adding customer info:")
print(orders_with_customers)
# Then, bring product details
complete_orders = add.to(
orders_with_customers,
from_df=products,
bring=['product_name', 'price', 'category'],
against='product_id'
)
print("\nComplete orders with all information:")
print(complete_orders)
# Now you can easily calculate totals
complete_orders['order_total'] = complete_orders['quantity'] * complete_orders['price']
print("\nFinal result with totals:")
print(complete_orders)
order_id customer_id product_id quantity order_date customer_name tier product_name price category order_total
0 1001 501 WIDGET-A 2 2024-01-15 Alice Johnson Gold Premium Widget 29.99 Widgets 59.98
1 1002 502 GADGET-B 1 2024-01-16 Bob Smith Silver Smart Gadget 149.99 Electronics 149.99
2 1003 501 WIDGET-A 5 2024-01-16 Alice Johnson Gold Premium Widget 29.99 Widgets 149.95
3 1004 503 TOOL-C 3 2024-01-17 Carol White Gold Pro Tool 79.99 Tools 239.97
add.to() calls to bring columns from different reference dataframes.
# Single column, single match
result = add.to(target_df, from_df=reference_df, bring='column_name', against='match_column')
# Multiple columns, single match
result = add.to(target_df, from_df=reference_df, bring=['col1', 'col2'], against='match_column')
# Single column, multiple matches
result = add.to(target_df, from_df=reference_df, bring='column_name', against=['match1', 'match2'])
# Multiple columns, multiple matches
result = add.to(target_df, from_df=reference_df, bring=['col1', 'col2'], against=['match1', 'match2'])