add.harmonize_units()

Convert and standardize units in your data

What does add.harmonize_units() do?

The add.harmonize_units() function converts values recorded in mixed units to a single, consistent target unit. If you pass target_unit, every value is converted to that unit; if you leave it as None, the most common unit in the unit column is used as the target.

📖 Parameters

Parameter      Type          Required   Description
df             DataFrame     ✅ Yes     The dataframe containing values and units to harmonize
value_column   str           ✅ Yes     Name of the column containing numeric values
unit_column    str           ✅ Yes     Name of the column containing unit strings
target_unit    str or None   ❌ No      Target unit to convert to. If None, auto-detects most common unit
position       str           ❌ No      Where to place new columns ("end", "start", etc.)

🚀 Example 1: Auto-detect Target Unit (Simplest)

Scenario: You have temperature readings in mixed Celsius and Fahrenheit, and want to standardize to the most common unit.

Setup: Create sample temperature data
import pandas as pd
import additory as add

# Temperature readings from different sensors
temperatures = pd.DataFrame({
    'sensor_id': ['A1', 'A2', 'B1', 'B2', 'C1'],
    'location': ['Kitchen', 'Living Room', 'Bedroom', 'Bathroom', 'Garage'],
    'temperature': [22.5, 75.2, 20.0, 68.0, 15.5],
    'unit': ['C', 'F', 'C', 'F', 'C']
})

print("Original temperature data:")
print(temperatures)
Auto-harmonize to most common unit
# Let additory detect the most common unit (Celsius in this case)
result = add.harmonize_units(
    temperatures,
    value_column='temperature',
    unit_column='unit'
)

print("\nAfter harmonizing units:")
print(result)
Output
  sensor_id    location  temperature unit  temperature_harmonized unit_harmonized
0        A1     Kitchen         22.5    C                   22.5               C
1        A2  Living Room         75.2    F                   24.0               C
2        B1     Bedroom         20.0    C                   20.0               C
3        B2    Bathroom         68.0    F                   20.0               C
4        C1      Garage         15.5    C                   15.5               C
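
The harmonized values above can be reproduced by hand with plain pandas, which is a useful sanity check. The following is only an illustrative sketch, not additory's implementation: the helper name fahrenheit_to_celsius is hypothetical, and how the library breaks ties between equally common units is not covered here.

```python
import pandas as pd

temperatures = pd.DataFrame({
    'temperature': [22.5, 75.2, 20.0, 68.0, 15.5],
    'unit': ['C', 'F', 'C', 'F', 'C'],
})

# The most frequent unit string becomes the target (3 x 'C' vs 2 x 'F' here).
target_unit = temperatures['unit'].value_counts().idxmax()

def fahrenheit_to_celsius(value):
    """Hypothetical helper: the standard Fahrenheit -> Celsius formula."""
    return (value - 32) * 5 / 9

# Keep Celsius rows as they are; replace Fahrenheit rows with converted values.
is_fahrenheit = temperatures['unit'] == 'F'
harmonized = temperatures['temperature'].where(
    ~is_fahrenheit,
    fahrenheit_to_celsius(temperatures['temperature']),
).round(2)

print(target_unit)
print(harmonized.tolist())
```

The printed values should match the temperature_harmonized column in the output above.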

🎯 Example 2: Specify Target Unit

Scenario: You have product weights in mixed units and want everything converted to kilograms.

Setup: Product catalog with mixed weight units
import pandas as pd
import additory as add

# Product catalog with different weight units
products = pd.DataFrame({
    'product_id': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'product_name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'weight': [2.5, 180, 0.8, 8.2, 1.2],
    'weight_unit': ['kg', 'g', 'kg', 'lbs', 'kg'],
    'price': [999, 699, 399, 299, 89]
})

print("Original product data:")
print(products)
Convert all weights to kilograms
# Specify target unit as kilograms
result = add.harmonize_units(
    products,
    value_column='weight',
    unit_column='weight_unit',
    target_unit='kg'
)

print("\nAfter converting to kilograms:")
print(result)
Output
  product_id product_name  weight weight_unit  price  weight_harmonized unit_harmonized
0       P001       Laptop     2.5          kg    999               2.50              kg
1       P002        Phone   180.0           g    699               0.18              kg
2       P003       Tablet     0.8          kg    399               0.80              kg
3       P004      Monitor     8.2         lbs    299               3.72              kg
4       P005     Keyboard     1.2          kg     89               1.20              kg
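
The arithmetic behind the weight_harmonized column can be checked with a small factor table. This is a sketch under assumptions, not the library's code: KG_PER_UNIT and convert_weight are hypothetical names, and routing every conversion through kilograms is one common design, not necessarily the one additory uses.

```python
# Kilograms per one unit of each weight abbreviation listed in this document.
# The pound is defined as exactly 0.45359237 kg; an ounce is 1/16 of a pound.
KG_PER_UNIT = {
    'kg': 1.0,
    'g': 0.001,
    'lbs': 0.45359237,
    'oz': 0.45359237 / 16,
}

def convert_weight(value, from_unit, to_unit):
    """Hypothetical helper: convert by going through kilograms as the base unit."""
    kilograms = value * KG_PER_UNIT[from_unit]
    return kilograms / KG_PER_UNIT[to_unit]

# The (weight, weight_unit) pairs from the product catalog above.
weights = [(2.5, 'kg'), (180, 'g'), (0.8, 'kg'), (8.2, 'lbs'), (1.2, 'kg')]
in_kg = [round(convert_weight(value, unit, 'kg'), 2) for value, unit in weights]

print(in_kg)
```

Using a single base unit means each new unit needs only one factor, instead of one factor per pair of units.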

🌡️ Supported Unit Types

Temperature: Celsius (C), Fahrenheit (F), Kelvin (K)
Weight: Kilograms (kg), Grams (g), Pounds (lbs), Ounces (oz)
Distance: Meters (m), Kilometers (km), Miles (mi), Feet (ft), Inches (in)
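
Weight and distance conversions are simple multiplications, but temperature scales also differ by an offset, so they need slightly different handling. The sketch below shows one way to convert between any pair of C, F and K by passing through Kelvin. The function names are hypothetical and this is not taken from additory's source.

```python
def to_kelvin(value, unit):
    """Hypothetical helper: convert a temperature in C, F or K to Kelvin."""
    if unit == 'K':
        return value
    if unit == 'C':
        return value + 273.15
    if unit == 'F':
        return (value - 32) * 5 / 9 + 273.15
    raise ValueError(f"Unsupported temperature unit: {unit!r}")

def from_kelvin(kelvin, unit):
    """Hypothetical helper: convert a temperature in Kelvin to C, F or K."""
    if unit == 'K':
        return kelvin
    if unit == 'C':
        return kelvin - 273.15
    if unit == 'F':
        return (kelvin - 273.15) * 9 / 5 + 32
    raise ValueError(f"Unsupported temperature unit: {unit!r}")

def convert_temperature(value, from_unit, to_unit):
    """Convert between any two of C, F, K using Kelvin as the pivot."""
    return from_kelvin(to_kelvin(value, from_unit), to_unit)

print(round(convert_temperature(68.0, 'F', 'C'), 2))
print(round(convert_temperature(0.0, 'C', 'F'), 2))
```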

⚠️ Important Notes

New Columns: Creates two new columns: <value_column>_harmonized (for example temperature_harmonized or weight_harmonized) and unit_harmonized.
Original Data: Original columns are preserved alongside the harmonized versions.
Unit Recognition: Supports common unit abbreviations and full names.
Precision: Maintains reasonable precision during conversions.
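
The exact set of spellings the function recognizes is not listed here, so if your data contains inconsistent casing, stray whitespace, or unusual names, it may help to normalize the unit column yourself before harmonizing. The following is a minimal pandas sketch; the UNIT_ALIASES map is entirely hypothetical and should be replaced with the spellings that actually occur in your data.

```python
import pandas as pd

# Hypothetical alias map: keys are lower-cased, stripped spellings;
# values are the abbreviations used in this document.
UNIT_ALIASES = {
    'c': 'C', 'celsius': 'C', '°c': 'C',
    'f': 'F', 'fahrenheit': 'F', '°f': 'F',
    'k': 'K', 'kelvin': 'K',
}

units = pd.Series(['Celsius', ' F ', 'kelvin', 'C', 'rankine'])

# Trim whitespace and lower-case before looking up the alias.
cleaned = units.str.strip().str.lower()
normalized = cleaned.map(UNIT_ALIASES)

# Spellings with no alias come back as NaN; review them before converting.
unrecognized = units[normalized.isna()]

print(normalized.tolist())
print(unrecognized.tolist())
```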

🎯 Quick Reference

Basic syntax templates
# Auto-detect target unit (most common)
result = add.harmonize_units(df, value_column='temp', unit_column='unit')

# Specify target unit
result = add.harmonize_units(df, value_column='weight', unit_column='unit', target_unit='kg')

# Control column positioning
result = add.harmonize_units(df, value_column='distance', unit_column='unit',
                             target_unit='km', position='start')
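
If you need a column placement that the position argument does not offer, the returned DataFrame can be reordered with ordinary pandas indexing. The sketch below uses a hand-built stand-in for the Example 2 result (so it runs without additory) and moves the harmonized columns directly after weight_unit. The variable names are illustrative only.

```python
import pandas as pd

# Stand-in for the DataFrame returned in Example 2 (first two rows only).
result = pd.DataFrame({
    'product_id': ['P001', 'P002'],
    'weight': [2.5, 180.0],
    'weight_unit': ['kg', 'g'],
    'price': [999, 699],
    'weight_harmonized': [2.5, 0.18],
    'unit_harmonized': ['kg', 'kg'],
})

new_cols = ['weight_harmonized', 'unit_harmonized']
other_cols = [col for col in result.columns if col not in new_cols]

# Insert the harmonized columns right after the original unit column.
insert_at = other_cols.index('weight_unit') + 1
ordered = other_cols[:insert_at] + new_cols + other_cols[insert_at:]
result = result[ordered]

print(list(result.columns))
```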