Convert categorical columns to one-hot encoded columns
What does add.onehotencoding() do?
The add.onehotencoding() function converts categorical columns into binary (0/1) columns, creating one new column for each unique category. This is essential for machine learning algorithms that require numeric input.
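
To make the transformation concrete, here is a minimal pure-Python sketch of one-hot encoding a single column. It is not additory's implementation: the helper name `one_hot` and the alphabetical ordering of categories are assumptions made for illustration.

```python
def one_hot(values):
    """Return one 0/1 indicator list per unique category (illustrative helper)."""
    categories = sorted(set(values))  # assumption: categories in alphabetical order
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


regions = ['North', 'South', 'East', 'North', 'West']
encoded = one_hot(regions)

# Print each indicator list under a "column_category" style name
for category, indicator in encoded.items():
    print(f"region_{category}: {indicator}")
```

Each row has exactly one 1 across the indicator lists, marking the category that row originally held.
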
Common use cases:
| Parameter | Type | Required | Description |
|---|---|---|---|
| df | DataFrame | ✅ Yes | The dataframe containing categorical columns to encode |
| columns | list or None | ❌ No | Specific columns to encode. If None, auto-detects categorical columns |
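
The table does not say how auto-detection decides which columns are categorical. One plausible rule, shown here as an assumption and not as additory's actual logic, is to treat every non-numeric column as categorical. The helper `guess_categorical_columns` is hypothetical; `select_dtypes` is a standard pandas method.

```python
import pandas as pd


def guess_categorical_columns(df):
    """Hypothetical helper: treat every non-numeric column as categorical."""
    return df.select_dtypes(exclude="number").columns.tolist()


customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'region': ['North', 'South', 'East'],
    'tier': ['Gold', 'Silver', 'Bronze'],
    'age': [25, 35, 45],
})

# Numeric columns (customer_id, age) are skipped; text columns are kept
detected = guess_categorical_columns(customers)
print(detected)
```

Under a rule like this, a free-text column would also be picked up, which is one reason to pass `columns` explicitly.
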
Scenario: You have a customer dataset with categorical columns and want to encode all of them automatically.
import pandas as pd
import additory as add
# Customer data with categorical columns
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],
    'region': ['North', 'South', 'East', 'North', 'West'],
    'tier': ['Gold', 'Silver', 'Bronze', 'Gold', 'Silver'],
    'status': ['Active', 'Inactive', 'Active', 'Active', 'Inactive'],
    'age': [25, 35, 45, 30, 40]
})
print("Original data:")
print(customers)
# Let additory automatically detect and encode categorical columns
result = add.onehotencoding(customers)
print("\nAfter one-hot encoding:")
print(result)
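
Encoded result:

| | customer_id | age | region_East | region_North | region_South | region_West | tier_Bronze | tier_Gold | tier_Silver | status_Active | status_Inactive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 25 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | 2 | 35 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 2 | 3 | 45 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 4 | 30 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
| 4 | 5 | 40 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
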
Scenario: You only want to encode specific categorical columns, not all of them.
import pandas as pd
import additory as add
# Survey data with multiple categorical columns
survey = pd.DataFrame({
    'response_id': [1, 2, 3, 4, 5],
    'product_rating': ['Excellent', 'Good', 'Fair', 'Excellent', 'Good'],
    'recommend': ['Yes', 'Yes', 'No', 'Yes', 'Maybe'],
    'purchase_intent': ['Definitely', 'Probably', 'Unlikely', 'Definitely', 'Maybe'],
    'age_group': ['25-34', '35-44', '45-54', '25-34', '35-44'],
    'comments': ['Great product!', 'Could be better', 'Not for me', 'Love it!', "It's okay"]
})
print("Original survey data:")
print(survey)
# Encode only the rating and recommendation columns;
# purchase_intent, age_group, and comments are left unchanged
result = add.onehotencoding(
    survey,
    columns=['product_rating', 'recommend']
)
print("\nAfter encoding specific columns:")
print(result)
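
Encoded result:

| | response_id | purchase_intent | age_group | comments | product_rating_Excellent | product_rating_Fair | product_rating_Good | recommend_Maybe | recommend_No | recommend_Yes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Definitely | 25-34 | Great product! | 1 | 0 | 0 | 0 | 0 | 1 |
| 1 | 2 | Probably | 35-44 | Could be better | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 3 | Unlikely | 45-54 | Not for me | 0 | 1 | 0 | 0 | 1 | 0 |
| 3 | 4 | Definitely | 25-34 | Love it! | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 5 | Maybe | 35-44 | It's okay | 0 | 0 | 1 | 1 | 0 | 0 |
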
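
Before choosing which columns to pass, it can help to count the distinct values in each candidate column, since every distinct value becomes a new column. This sketch uses only pandas' `nunique` and does not call additory; the variable names are illustrative.

```python
import pandas as pd

survey = pd.DataFrame({
    'product_rating': ['Excellent', 'Good', 'Fair', 'Excellent', 'Good'],
    'recommend': ['Yes', 'Yes', 'No', 'Yes', 'Maybe'],
    'purchase_intent': ['Definitely', 'Probably', 'Unlikely', 'Definitely', 'Maybe'],
    'age_group': ['25-34', '35-44', '45-54', '25-34', '35-44'],
    'comments': ['Great product!', 'Could be better', 'Not for me', 'Love it!', "It's okay"],
})

# Distinct values per column = number of columns that encoding it would add
unique_counts = survey.nunique()
print(unique_counts)

# Columns where every row differs (such as free text) are poor candidates
all_distinct = [col for col, n in unique_counts.items() if n == len(survey)]
print("Every value distinct:", all_distinct)
```

Here `comments` has a different value in every row, so encoding it would add one column per response without grouping anything.
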
New columns are named original_column_category (e.g., "region_North", "tier_Gold").
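
The same naming pattern can be reproduced with pandas' own `get_dummies`, which joins the column name and the category with an underscore by default. This is shown for comparison only; the source does not state whether additory uses `get_dummies` internally.

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4, 5],
    'region': ['North', 'South', 'East', 'North', 'West'],
    'tier': ['Gold', 'Silver', 'Bronze', 'Gold', 'Silver'],
})

# dtype=int yields 0/1 values instead of True/False
encoded = pd.get_dummies(customers, columns=['region', 'tier'], dtype=int)

# Untouched columns come first, followed by one column per category
print(encoded.columns.tolist())
```
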
# Auto-detect all categorical columns
result = add.onehotencoding(df)
# Encode specific columns only
result = add.onehotencoding(df, columns=['column1', 'column2'])
# Single column encoding
result = add.onehotencoding(df, columns=['status'])