Generate synthetic data from scratch
What does add.synthetic() do?
The add.synthetic() function generates synthetic data from scratch, applying a generation strategy to each column.
Common use cases include creating test data, mock datasets, and sample data for development.
| Parameter | Type | Required | Description |
|---|---|---|---|
| df | str | ✅ Yes | Use '@new' to create data from scratch |
| n_rows | int | ❌ No | Number of rows to generate (default: 5) |
| strategy | dict | ✅ Yes | Dictionary mapping column names to generation strategies |
| seed | int | ❌ No | Random seed for reproducibility (default: None) |
'@new') with generative strategies.
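The parameter table above can be read as a small set of argument checks. The sketch below is only an illustration in plain Python: `check_synthetic_args` is a hypothetical helper, not part of additory, and the rule that `n_rows` must be non-negative is an assumption.

```python
def check_synthetic_args(df, strategy, n_rows=5, seed=None):
    """Hypothetical helper that mirrors the parameter table (not additory code)."""
    if df != '@new':
        raise ValueError("df must be '@new' to create data from scratch")
    if not isinstance(strategy, dict):
        raise ValueError("strategy must be a dict of column name -> strategy string")
    # Assumption: a negative row count is treated as an error.
    if not isinstance(n_rows, int) or n_rows < 0:
        raise ValueError("n_rows must be a non-negative integer")
    if seed is not None and not isinstance(seed, int):
        raise ValueError("seed must be an int or None")
    return {'df': df, 'n_rows': n_rows, 'strategy': strategy, 'seed': seed}


# Only df and strategy are required; n_rows and seed fall back to 5 and None.
print(check_synthetic_args('@new', {'id': 'increment:start=1'}))
```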
Scenario: Generate a simple dataset with sequential IDs.
import additory as add
# Generate 5 rows (default) with incrementing IDs starting from 1
df = add.synthetic(
    '@new',
    strategy={'id': 'increment:start=1'}
)
print(df)
   id
0   1
1   2
2   3
3   4
4   5
Scenario: Generate a more complex dataset with multiple columns using different strategies.
import additory as add
# Generate 10 rows with different strategies for each column
df = add.synthetic(
    '@new',
    n_rows=10,
    strategy={
        'id': 'increment:start=1',
        'age': 'range:18-65',
        'status': 'choice:[active,inactive,pending]'
    },
    seed=42
)
print(df)
   id  age    status
0   1   52    active
1   2   33  inactive
2   3   45   pending
3   4   28    active
4   5   61  inactive
5   6   22   pending
6   7   39    active
7   8   56  inactive
8   9   31   pending
9  10   48    active
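To make the strategy strings in this example concrete, here is a minimal sketch of how such strings could be interpreted using only Python's standard library. It is not additory's implementation: `generate_column` and `generate_rows` are hypothetical names, the range bounds are assumed to be inclusive, and the values it produces for a given seed will not necessarily match the output shown above.

```python
import random


def generate_column(spec, n_rows, rng):
    """Interpret one strategy string and return a list of n_rows values."""
    kind, _, arg = spec.partition(':')
    if kind == 'increment':
        # 'increment:start=1' -> 1, 2, 3, ...
        start = int(arg.split('=', 1)[1])
        return [start + i for i in range(n_rows)]
    if kind == 'range':
        # 'range:18-65' -> random integers; bounds assumed inclusive,
        # and this simple parser only handles non-negative bounds.
        low, high = (int(part) for part in arg.split('-', 1))
        return [rng.randint(low, high) for _ in range(n_rows)]
    if kind == 'choice':
        # 'choice:[a,b,c]' -> random pick from the listed values
        options = [item.strip() for item in arg.strip('[]').split(',')]
        return [rng.choice(options) for _ in range(n_rows)]
    raise ValueError(f'unsupported strategy: {spec!r}')


def generate_rows(strategy, n_rows=5, seed=None):
    """Build one dict per row from a column -> strategy string mapping."""
    rng = random.Random(seed)
    columns = {name: generate_column(spec, n_rows, rng)
               for name, spec in strategy.items()}
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


rows = generate_rows(
    {'id': 'increment:start=1',
     'age': 'range:18-65',
     'status': 'choice:[active,inactive,pending]'},
    n_rows=10,
    seed=42,
)
for row in rows:
    print(row)
```

Each column is generated independently from its own strategy string, which is why the strategy dictionary can mix sequential, numeric, and categorical columns freely.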
- 'increment:start=1' - Sequential numbers
- 'range:18-65' - Random integers in range
- 'choice:[value1,value2,value3]' - Random selection from list
- 'lists@variable_name' - Linked lists (see Example 3)

Scenario: Generate data with semantic relationships using linked lists. This is useful for creating related data, such as adverse events with their medications and severity levels.
import additory as add
# Define a linked list with explicit column names
# Format: [Column_Names:[name1,name2,name3]]
# Then: [primary_key, [related_values1], [related_values2]]
AE_CM_SEV = [
    ['Column_Names:[adverse_event,medication,severity]'],
    ['Headache', ['Aspirin', 'Ibuprofen'], ['mild', 'moderate']],
    ['Nausea', ['Ondansetron'], ['severe']]
]
# Generate 10 rows using the linked list
df = add.synthetic(
    '@new',
    n_rows=10,
    strategy={'col1': 'lists@AE_CM_SEV'},
    seed=42
)
print(df)
   adverse_event   medication  severity
0       Headache      Aspirin      mild
1       Headache      Aspirin      mild
2       Headache    Ibuprofen      mild
3         Nausea  Ondansetron    severe
4       Headache      Aspirin  moderate
5       Headache    Ibuprofen      mild
6         Nausea  Ondansetron    severe
7       Headache      Aspirin      mild
8       Headache    Ibuprofen moderate
9         Nausea  Ondansetron    severe
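The output above keeps each medication and severity consistent with its adverse event. One way to picture how a linked list can produce rows like these is the sketch below, written with the standard library only. It assumes the primary entry is chosen uniformly at random and that each related value is then drawn independently from that entry's lists. additory's actual sampling may differ, and `sample_linked_rows` is a hypothetical name.

```python
import random

AE_CM_SEV = [
    ['Column_Names:[adverse_event,medication,severity]'],
    ['Headache', ['Aspirin', 'Ibuprofen'], ['mild', 'moderate']],
    ['Nausea', ['Ondansetron'], ['severe']],
]


def sample_linked_rows(linked, n_rows=5, seed=None):
    """Draw rows whose related values always belong to the chosen primary key."""
    header = linked[0][0]  # e.g. 'Column_Names:[a,b,c]'
    names = [name.strip()
             for name in header.split(':', 1)[1].strip('[]').split(',')]
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Assumption: pick one entry uniformly, then one value per related list.
        primary, *related = rng.choice(linked[1:])
        values = [primary] + [rng.choice(options) for options in related]
        rows.append(dict(zip(names, values)))
    return rows


for row in sample_linked_rows(AE_CM_SEV, n_rows=10, seed=42):
    print(row)
```

Because the related values are drawn only from the lists attached to the selected primary key, a row such as Nausea with Aspirin can never appear.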
- ['Column_Names:[col1,col2,col3]'] - Defines column names
- [primary, [values1], [values2]] - Primary key with related value lists

'@new') only.
synthetic() call.
Set the seed parameter to get consistent results across runs.
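The effect of a fixed seed can be illustrated with Python's standard `random` module. This is a general sketch of seeded generation, not a description of how additory handles its seed internally, and `draw_ages` is a hypothetical helper.

```python
import random


def draw_ages(n, seed=None):
    """Return n random ages from a private, optionally seeded generator."""
    rng = random.Random(seed)  # separate generator; global random state is untouched
    return [rng.randint(18, 65) for _ in range(n)]


first = draw_ages(5, seed=42)
second = draw_ages(5, seed=42)
print(first == second)  # True: the same seed reproduces the same sequence
```

With `seed=None` the generator is seeded from system entropy, so repeated calls are not expected to match.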
import additory as add

# Simple increment
df = add.synthetic('@new', strategy={'id': 'increment:start=1'})
# Multiple strategies
df = add.synthetic('@new', n_rows=10, strategy={
    'id': 'increment:start=1',
    'age': 'range:18-65',
    'status': 'choice:[active,inactive]'
}, seed=42)
# Linked lists
MY_LIST = [
    ['Column_Names:[col1,col2,col3]'],
    ['A', ['B'], ['C']]
]
df = add.synthetic('@new', n_rows=5, strategy={'col1': 'lists@MY_LIST'}, seed=42)