Convert categorical columns to one-hot encoded columns
What does add.onehotencoding() do?
The add.onehotencoding() function converts categorical columns into binary (0/1) columns, creating one new column for each unique category. This is essential for machine learning algorithms that require numeric input.
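The transformation is simple enough to sketch in plain pandas. The helper below, `one_hot_sketch`, is hypothetical. It is not additory's implementation and only illustrates the idea: one 0/1 column per unique category, named column_category, with the source column dropped unless you ask to keep it.

```python
import pandas as pd


def one_hot_sketch(df: pd.DataFrame, column: str, drop_original: bool = True) -> pd.DataFrame:
    """Hypothetical helper (not additory): add one 0/1 column per unique category."""
    out = df.copy()
    # Sorted unique categories, so the new columns appear in alphabetical order
    categories = sorted(out[column].dropna().unique())
    for category in categories:
        # 1 where the row holds this category, 0 otherwise (missing values get 0 everywhere)
        out[f"{column}_{category}"] = (out[column] == category).astype(int)
    if drop_original:
        out = out.drop(columns=column)
    return out


colors = pd.DataFrame({"id": [1, 2, 3], "color": ["red", "blue", "red"]})
print(one_hot_sketch(colors, "color"))
```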
Common use cases:
| Parameter | Type | Required | Description |
|---|---|---|---|
| df | DataFrame | ✅ Yes | The dataframe containing categorical columns to encode |
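| columns | str | ✅ Yes | Column name to encode (single column) |
| drop_original | bool | No | Whether to drop the source column after encoding; it is dropped by default, pass False to keep it |
| max_cardinality_ratio | float | No | Cardinality limit for the encoded column (the first scenario passes 1.0) |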
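The first scenario sets `max_cardinality_ratio=1.0`. Judging only by its name, this parameter likely limits how many distinct categories a column may have relative to its number of rows. Assuming that reading (distinct values divided by rows, which is an assumption and not confirmed by additory), the sketch below computes the ratio for the scenario's categorical columns. `region` has 4 distinct values in 5 rows, a ratio of 0.8, which may be why that example raises the limit. The helper `cardinality_ratio` is hypothetical.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "region": ["North", "South", "East", "North", "West"],
    "tier": ["Gold", "Silver", "Bronze", "Gold", "Silver"],
    "status": ["Active", "Inactive", "Active", "Active", "Inactive"],
})


def cardinality_ratio(series: pd.Series) -> float:
    """Hypothetical helper; assumed meaning: distinct non-missing values / row count."""
    return series.nunique() / len(series)


ratios = {col: cardinality_ratio(customers[col]) for col in ["region", "tier", "status"]}
print(ratios)
```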
Scenario: You have a customer dataset and want to encode the region column.
import pandas as pd
import additory as add
# Customer data with categorical columns
customers = pd.DataFrame({
'customer_id': [1, 2, 3, 4, 5],
'region': ['North', 'South', 'East', 'North', 'West'],
'tier': ['Gold', 'Silver', 'Bronze', 'Gold', 'Silver'],
'status': ['Active', 'Inactive', 'Active', 'Active', 'Inactive'],
'age': [25, 35, 45, 30, 40]
})
print("Original data:")
print(customers)
# Encode the region column
result = add.onehotencoding(customers, columns='region', max_cardinality_ratio=1.0)
print("\nAfter one-hot encoding:")
print(result)
customer_id tier status age region_East region_North region_South region_West
0 1 Gold Active 25 0 1 0 0
1 2 Silver Inactive 35 0 0 1 0
2 3 Bronze Active 45 1 0 0 0
3 4 Gold Active 30 0 1 0 0
4 5 Silver Inactive 40 0 0 0 1
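To sanity-check the output above without additory, pandas' own `pd.get_dummies` produces the same columns, in the same order, with the same values for this data. This is a cross-check with a different library, not a description of how additory works internally.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "region": ["North", "South", "East", "North", "West"],
    "tier": ["Gold", "Silver", "Bronze", "Gold", "Silver"],
    "status": ["Active", "Inactive", "Active", "Active", "Inactive"],
    "age": [25, 35, 45, 30, 40],
})

# dtype=int yields 0/1 columns (recent pandas versions default to True/False)
encoded = pd.get_dummies(customers, columns=["region"], dtype=int)
print(encoded)
```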
Scenario: You want to encode a column but keep the original for reference.
import pandas as pd
import additory as add
# Survey data
survey = pd.DataFrame({
'response_id': [1, 2, 3, 4, 5],
'product_rating': ['Excellent', 'Good', 'Fair', 'Excellent', 'Good'],
'recommend': ['Yes', 'Yes', 'No', 'Yes', 'Maybe']
})
print("Original survey data:")
print(survey)
# Encode the recommend column but keep the original
result = add.onehotencoding(
survey,
columns='recommend',
drop_original=False
)
print("\nAfter encoding (original kept):")
print(result)
response_id product_rating recommend recommend_Maybe recommend_No recommend_Yes
0 1 Excellent Yes 0 0 1
1 2 Good Yes 0 0 1
2 3 Fair No 0 1 0
3 4 Excellent Yes 0 0 1
4 5 Good Maybe 1 0 0
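When given a whole DataFrame, `pd.get_dummies` drops the columns it encodes. A pandas counterpart of `drop_original=False` is therefore to encode the Series on its own and concatenate the indicators back. As above, this is a pandas cross-check, not additory's code.

```python
import pandas as pd

survey = pd.DataFrame({
    "response_id": [1, 2, 3, 4, 5],
    "product_rating": ["Excellent", "Good", "Fair", "Excellent", "Good"],
    "recommend": ["Yes", "Yes", "No", "Yes", "Maybe"],
})

# Encoding the Series alone returns only the indicator columns,
# so concatenating them leaves the original "recommend" column in place
dummies = pd.get_dummies(survey["recommend"], prefix="recommend", dtype=int)
result = pd.concat([survey, dummies], axis=1)
print(result)
```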
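New columns are named original_column_category, that is, the source column name and the category joined by an underscore (e.g., "region_North", "tier_Gold").
The original column is dropped by default (pass drop_original=False to keep it).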
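Because the new columns follow the column_category pattern, they are easy to select afterwards by prefix. The sketch below uses `pd.get_dummies` as a stand-in for an encoded frame, since its default names follow the same pattern. It is not additory output. One caveat: a prefix match would also catch an unrelated column that happens to start with "tier_".

```python
import pandas as pd

# Stand-in for an encoded frame (built with pandas, not additory)
encoded = pd.get_dummies(
    pd.DataFrame({"tier": ["Gold", "Silver", "Bronze", "Gold"]}),
    columns=["tier"],
    dtype=int,
)

# Select the indicator columns generated from "tier" by their name prefix
tier_cols = [c for c in encoded.columns if c.startswith("tier_")]
print(tier_cols)  # ['tier_Bronze', 'tier_Gold', 'tier_Silver']

# One-hot property: exactly one 1 per row (holds here because "tier" has no missing values)
assert (encoded[tier_cols].sum(axis=1) == 1).all()

# The category is whatever follows the prefix (str.removeprefix needs Python 3.9+)
categories = [c.removeprefix("tier_") for c in tier_cols]
print(categories)
```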
# Encode a single column (drops original)
result = add.onehotencoding(df, columns='column_name')
# Keep the original column
result = add.onehotencoding(df, columns='column_name', drop_original=False)
# Encode multiple columns (call multiple times)
result = add.onehotencoding(df, columns='region')
result = add.onehotencoding(result, columns='tier')
result = add.onehotencoding(result, columns='status')
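If you are not tied to additory, pandas' `pd.get_dummies` accepts a list of columns, so the three chained calls above collapse into one. This is a pandas alternative, not an additory feature. The parameter table lists `columns` as a single `str`, so passing a list to `add.onehotencoding` is not documented.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "North"],
    "tier": ["Gold", "Silver", "Gold"],
    "status": ["Active", "Inactive", "Active"],
})

# One call encodes every listed column; each encoded source column is dropped
result = pd.get_dummies(df, columns=["region", "tier", "status"], dtype=int)
print(list(result.columns))
```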