4  add.to() - Multiple Columns & Keys

4.1 Overview

Learn how to bring multiple columns at once and use multiple keys for matching. This page builds on basic lookups by showing you how to work with more complex data relationships.

What you’ll learn: - How to bring multiple columns in a single operation - How to use multiple keys for matching (composite keys) - How to combine multiple columns and multiple keys

Prerequisites: - Basic Lookup - Understanding of single column lookups


4.2 Example 1: Bring Multiple Columns

Business Context: You have orders with customer IDs, and you need to add both the customer name and email address for shipping labels.

Code:

import additory as add
import pandas as pd

# Customers with multiple attributes
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'email': ['alice@email.com', 'bob@email.com', 'charlie@email.com'],
    'city': ['NYC', 'LA', 'Chicago']
})

# Orders
orders = pd.DataFrame({
    'order_id': [101, 102, 103],
    'customer_id': [1, 2, 3],
    'amount': [100, 200, 150]
})

# Bring multiple columns using a list
result = add.to(
    orders,
    bring_from=customers,
    bring=['name', 'email'],      # List of columns to bring
    against='customer_id'
)

# Positional parameters (also works without naming certain parameters):
# result = add.to(orders, customers, ['name', 'email'], 'customer_id')

print(result)

Output:

   order_id  customer_id  amount     name              email
0       101            1     100    Alice  alice@email.com
1       102            2     200      Bob    bob@email.com
2       103            3     150  Charlie charlie@email.com

Explanation: - Use a list ['name', 'email'] to bring multiple columns - Both columns are added in a single operation - The ‘city’ column is not brought because it’s not in the list - This is more efficient than calling add.to() twice

Note: This also works with polars DataFrames.


4.3 Example 2: Multiple Keys (Composite Keys)

Business Context: Your sales data is uniquely identified by both region AND product. You need to match targets using both keys together.

Code:

import additory as add
import pandas as pd

# Sales data (identified by region + product)
sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West'],
    'product': ['A', 'A', 'B', 'B'],
    'revenue': [1000, 1500, 800, 1200]
})

# Targets (also identified by region + product)
targets = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West'],
    'product': ['A', 'A', 'B', 'B'],
    'target': [900, 1400, 750, 1100]
})

# Match using both region AND product
result = add.to(
    sales,
    bring_from=targets,
    bring='target',
    against=['region', 'product']    # List of key columns
)

# Positional parameters (also works without naming certain parameters):
# result = add.to(sales, targets, 'target', ['region', 'product'])

print(result)

Output:

  region product  revenue  target
0   East       A     1000     900
1   West       A     1500    1400
2   East       B      800     750
3   West       B     1200    1100

Explanation: - Use a list ['region', 'product'] to match on multiple keys - Both keys must match for a successful lookup - This is called a “composite key” in database terminology - East-A matches East-A, but East-A does NOT match West-A

Note: This also works with polars DataFrames.


4.4 Example 3: Multiple Columns AND Multiple Keys

Business Context: Your transaction data is identified by date and store_id. You need to add manager name and region information using both keys.

Code:

import additory as add
import pandas as pd

# Transactions (identified by date + store_id)
transactions = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'store_id': [1, 2, 1],
    'sales': [500, 600, 550]
})

# Store information (also by date + store_id)
store_info = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02'],
    'store_id': [1, 2, 1],
    'manager': ['Alice', 'Bob', 'Alice'],
    'region': ['North', 'South', 'North']
})

# Bring multiple columns using multiple keys
result = add.to(
    transactions,
    bring_from=store_info,
    bring=['manager', 'region'],        # Multiple columns
    against=['date', 'store_id']        # Multiple keys
)

# Positional parameters (also works without naming certain parameters):
# result = add.to(transactions, store_info, ['manager', 'region'], ['date', 'store_id'])

print(result)

Output:

         date  store_id  sales manager region
0  2024-01-01         1    500   Alice  North
1  2024-01-01         2    600     Bob  South
2  2024-01-02         1    550   Alice  North

Explanation: - Combines both techniques: multiple columns AND multiple keys - Each row is matched on BOTH date AND store_id - Both manager and region are brought together - Store 1 on different dates can have different managers (as shown)

Note: This also works with polars DataFrames.


4.5 Understanding Lists in add.to()

4.5.1 Single vs Multiple Values

# Single column (string)
bring='name'              # Brings one column

# Multiple columns (list)
bring=['name', 'email']   # Brings two columns

# Single key (string)
against='customer_id'     # Matches on one column

# Multiple keys (list)
against=['region', 'product']  # Matches on two columns

4.5.2 When to Use Multiple Keys

Use multiple keys when: - Your data has composite primary keys - A single column isn’t unique enough - You need to match on multiple dimensions (e.g., time + location)

Example scenarios: - Sales by region + product - Transactions by date + store - Inventory by warehouse + SKU - Prices by date + product


4.6 Key Takeaways

  • Use lists ['col1', 'col2'] to bring multiple columns at once
  • Use lists ['key1', 'key2'] to match on multiple keys (composite keys)
  • You can combine both: multiple columns AND multiple keys
  • All keys must match for a successful lookup
  • This is more efficient than multiple separate add.to() calls

4.7 Common Questions

Q: Can I bring all columns from the reference DataFrame?
A: Not directly. You must specify which columns to bring. This is intentional to keep operations explicit and avoid accidentally bringing unwanted columns.

Q: What if the key columns have different names in each DataFrame?
A: For basic usage, key columns must have the same names. We’ll cover handling different key names in advanced examples.

Q: Is there a limit to how many columns or keys I can use?
A: No practical limit, but keep it reasonable for performance and readability.

Q: What happens if some rows don’t match on all keys?
A: Those rows will have NaN (null) values in the brought columns, just like single-key lookups.


4.8 Next Steps


Version: 0.1.3
Last Updated: March 9, 2026