Generate additional data rows or create data from scratch
What does add.augment() do?
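The add.augment() function has three modes: augment existing data with more rows, create an entirely new dataset from scratch, or load sample data for testing. The first argument selects the mode.
Three modes:
- Augment: pass a DataFrame to generate more rows similar to the existing ones.
- Create: pass "@new" together with a strategy dict that describes each column.
- Sample: pass "@sample" to load sample data.
Parameters: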
| Parameter | Type | Required | Description |
|---|---|---|---|
| df | DataFrame or str | ✅ Yes | DataFrame to augment, "@new" to create, or "@sample" for sample data |
| n_rows | int | ❌ No | Number of rows to generate (default: 5) |
| strategy | str or dict | ❌ No | "auto" for augment mode, dict for create mode (default: "auto") |
| seed | int or None | ❌ No | Random seed for reproducible results |
| output_format | str | ❌ No | Output format: "pandas", "polars", "cudf" (default: "pandas") |
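As a rough illustration of how the first argument selects a mode, here is a minimal sketch. The helper name `resolve_mode` and its error messages are hypothetical and are not part of additory; it only mirrors the rules stated in the table, assuming that create mode requires a dict strategy.

```python
def resolve_mode(df, strategy="auto"):
    """Hypothetical helper: classify the first argument into one of three modes."""
    if isinstance(df, str):
        if df == "@new":
            # Assumption: create mode needs a per-column strategy dict.
            if not isinstance(strategy, dict):
                raise ValueError('"@new" requires strategy to be a dict')
            return "create"
        if df == "@sample":
            return "sample"
        raise ValueError(f"unrecognized string argument: {df!r}")
    # Anything that is not a string is treated as data to augment.
    return "augment"


print(resolve_mode("@sample"))
print(resolve_mode("@new", {"id": "increment:start=1"}))
print(resolve_mode([{"customer_id": 1}]))
```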
Scenario: You have a small customer dataset and want to generate more similar customers for testing.
import pandas as pd
import additory as add
# Small customer dataset
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'age': [25, 35, 45],
    'income': [50000, 75000, 90000],
    'region': ['North', 'South', 'East']
})
print("Original customer data:")
print(customers)
# Generate 10 more customers similar to existing ones
result = add.augment(customers, n_rows=10)
print(f"\nAugmented data ({len(result)} rows):")
print(result)
Augmented data (13 rows):
    customer_id  age  income  region
0             1   25   50000   North
1             2   35   75000   South
2             3   45   90000    East
3             4   28   52000   North
4             5   38   78000   South
5             6   42   87000    East
6             7   31   68000   North
7             8   29   55000   South
8             9   47   92000    East
9            10   26   51000   North
10           11   36   76000   South
11           12   44   89000    East
12           13   33   71000   North
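To make "similar rows" more concrete, the following is a minimal sketch of one possible approach, written with plain Python lists of dicts rather than pandas. It is not additory's algorithm: the function `augment_rows` and its `id_column` parameter are hypothetical, and the sketch assumes integer columns are sampled uniformly between the observed minimum and maximum while other columns are sampled from the observed values. The library output above contains values slightly outside the original range (for example age 47), so the real method evidently differs.

```python
import random


def augment_rows(rows, n_rows, id_column, seed=None):
    """Hypothetical sketch: append n_rows synthetic rows resembling `rows`."""
    rng = random.Random(seed)
    next_id = max(r[id_column] for r in rows) + 1
    result = [dict(r) for r in rows]  # keep the original rows first
    for i in range(n_rows):
        new_row = {}
        for col in rows[0]:
            values = [r[col] for r in rows]
            if col == id_column:
                # Continue the ID sequence after the largest existing ID.
                new_row[col] = next_id + i
            elif all(isinstance(v, int) for v in values):
                # Integer column: draw within the observed range (inclusive).
                new_row[col] = rng.randint(min(values), max(values))
            else:
                # Any other column: reuse one of the observed values.
                new_row[col] = rng.choice(values)
        result.append(new_row)
    return result


customers = [
    {"customer_id": 1, "age": 25, "income": 50000, "region": "North"},
    {"customer_id": 2, "age": 35, "income": 75000, "region": "South"},
    {"customer_id": 3, "age": 45, "income": 90000, "region": "East"},
]
augmented = augment_rows(customers, n_rows=10, id_column="customer_id", seed=42)
print(len(augmented))
```

Sampling each column independently, as done here, ignores relationships between columns such as age and income; a real augmentation routine may try to preserve them.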
Scenario: You need to create a completely new dataset with specific column types and patterns.
import pandas as pd
import additory as add
# Define strategy for each column
strategy = {
    'employee_id': 'increment:start=1',  # Sequential IDs starting from 1
    'name': 'choice:[John Smith,Jane Doe,Mike Brown,Sarah Lee,Tom Wilson]',  # Pick from list
    'age': 'range:22-65',  # Ages between 22 and 65
    'department': 'choice:[HR,IT,Sales,Marketing]',  # Pick from list
    'salary': 'range:40000-120000'  # Salary range
}
# Create 50 employees from scratch
result = add.augment("@new", n_rows=50, strategy=strategy)
print("Created employee data:")
print(result.head(10)) # Show first 10 rows
Created employee data:
   employee_id        name  age  department  salary
0            1  John Smith   28          IT   65000
1            2    Jane Doe   34       Sales   72000
2            3  Mike Brown   45          HR   58000
3            4   Sarah Lee   29   Marketing   69000
4            5  Tom Wilson   38          IT   85000
5            6  John Smith   31       Sales   71000
6            7    Jane Doe   42          HR   62000
7            8  Mike Brown   27   Marketing   67000
8            9   Sarah Lee   35          IT   78000
9           10  Tom Wilson   29       Sales   73000
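The strategy strings used above follow three shapes: `increment:start=N`, `range:LOW-HIGH`, and `choice:[a,b,c]`. The sketch below shows one way such strings could be interpreted, using only the standard library. The functions `make_generator` and `create_rows` are hypothetical, and the sketch assumes that ranges are inclusive integer ranges with non-negative bounds and that choices are drawn at random; additory's own parsing rules and supported options may differ.

```python
import itertools
import random


def make_generator(spec, rng):
    """Hypothetical parser: turn one strategy string into a value factory."""
    kind, _, arg = spec.partition(":")
    if kind == "increment":
        # "increment:start=1" -> 1, 2, 3, ...
        start = int(arg.split("=", 1)[1]) if arg else 1
        counter = itertools.count(start)
        return lambda: next(counter)
    if kind == "range":
        # "range:22-65" -> random integer in [22, 65] (assumed inclusive)
        low, high = (int(part) for part in arg.split("-"))
        return lambda: rng.randint(low, high)
    if kind == "choice":
        # "choice:[HR,IT]" -> random pick from the listed options
        options = [item.strip() for item in arg.strip("[]").split(",")]
        return lambda: rng.choice(options)
    raise ValueError(f"unknown strategy: {spec!r}")


def create_rows(strategy, n_rows, seed=None):
    """Build n_rows dicts, one value per column, from a strategy dict."""
    rng = random.Random(seed)
    generators = {col: make_generator(spec, rng) for col, spec in strategy.items()}
    return [{col: gen() for col, gen in generators.items()} for _ in range(n_rows)]


strategy = {
    "employee_id": "increment:start=1",
    "age": "range:22-65",
    "department": "choice:[HR,IT,Sales,Marketing]",
}
rows = create_rows(strategy, n_rows=5, seed=0)
for row in rows:
    print(row)
```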
Scenario: You need some realistic sample data quickly for testing or demos.
import additory as add
# Load 100 rows of sample data
sample_data = add.augment("@sample", n_rows=100)
print("Sample data loaded:")
print(sample_data.head())
print(f"\nDataset shape: {sample_data.shape}")
print(f"Columns: {list(sample_data.columns)}")
Sample data loaded:
   customer_id        name  age  income  region
0            1  John Smith   28   52000   North
1            2    Jane Doe   34   68000   South
2            3  Mike Brown   45   85000    East
3            4   Sarah Lee   29   47000    West
4            5  Tom Wilson   38   72000   North
Dataset shape: (100, 5)
Columns: ['customer_id', 'name', 'age', 'income', 'region']
Use the seed parameter for consistent results across runs.
# Augment existing data
result = add.augment(df, n_rows=100)
# Create from scratch
result = add.augment("@new", n_rows=50, strategy={'id': 'increment:start=1', 'name': 'choice:[John,Jane,Bob]'})
# Load sample data (if available)
result = add.augment("@sample", n_rows=1000)
# With reproducible seed
result = add.augment(df, n_rows=100, seed=42)
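The effect of a seed can be shown without additory at all. The sketch below uses Python's standard `random.Random` to illustrate the general idea that the same seed yields the same sequence; the function `sample_ages` is hypothetical, and how additory uses its seed internally is not shown in this document.

```python
import random


def sample_ages(n, seed=None):
    """Hypothetical example: draw n ages from a privately seeded generator."""
    rng = random.Random(seed)  # separate generator; global random state is untouched
    return [rng.randint(22, 65) for _ in range(n)]


first = sample_ages(5, seed=42)
second = sample_ages(5, seed=42)
print(first == second)  # same seed, same values
```

With seed=None the generator is seeded from a system source, so repeated calls will generally produce different values.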