scitex_repro
scitex-repro — Reproducibility utilities for scientific computing.
Provides tools for reproducible scientific computing:
- Random state management (RandomStateManager)
- ID generation (gen_ID)
- Timestamp generation (gen_timestamp)
- Array hashing (hash_array)
- scitex_repro.gen_ID(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)
Generate a unique identifier with timestamp and random characters.
Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.
- Parameters:
time_format (str, optional) – Format string for the timestamp portion. Default is “%YY-%mM-%dD-%Hh%Mm%Ss”, which produces timestamps such as “2025Y-05M-31D-12h30m45s”.
N (int, optional) – Number of random characters to append. Default is 8.
now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.
- Returns:
Unique identifier in format “{timestamp}_{random_chars}”
- Return type:
str
Examples
>>> id1 = gen_id()
>>> print(id1)
2025Y-05M-31D-12h30m45s_a3Bc9xY2

>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a

>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"
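The now_fn parameter can be exercised without touching the system clock. The sketch below is a stand-in written for illustration, not the scitex_repro source: sketch_gen_id and fixed_now are hypothetical names, and drawing the suffix from ASCII letters and digits with random.choices is an assumption about how the random characters might be produced.

```python
import random
import string
from datetime import datetime


def sketch_gen_id(time_format="%YY-%mM-%dD-%Hh%Mm%Ss", N=8, *, now_fn=None):
    """Hypothetical stand-in that mirrors the documented gen_id signature."""
    # Fall back to the real clock when no fake is injected.
    now = (now_fn or datetime.now)()
    stamp = now.strftime(time_format)
    # Assumption: the suffix is drawn from ASCII letters and digits.
    suffix = "".join(random.choices(string.ascii_letters + string.digits, k=N))
    return f"{stamp}_{suffix}"


def fixed_now():
    """Fake clock for tests: always returns the same moment."""
    return datetime(2025, 5, 31, 12, 30, 45)


run_id = sketch_gen_id(now_fn=fixed_now)
stamp, suffix = run_id.rsplit("_", 1)
print(stamp)        # 2025Y-05M-31D-12h30m45s
print(len(suffix))  # 8
```

With the clock pinned, only the random suffix varies between calls, so a test can assert on the timestamp portion exactly.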
- scitex_repro.gen_id(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)[source]
Documented identically to gen_ID above: the parameters, return value, and examples are the same.
- scitex_repro.gen_timestamp(*, now_fn=None)[source]
Generate a timestamp string for file naming.
Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for creating unique filenames or version identifiers.
- Parameters:
now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.
- Returns:
Timestamp string in format “YYYY-MMDD-HHMM”
- Return type:
str
Examples
>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230

>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv
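As a rough illustration of the YYYY-MMDD-HHMM format and the now_fn hook, the following sketch uses a hypothetical sketch_gen_timestamp function. The strftime pattern "%Y-%m%d-%H%M" is an assumption inferred from the documented output, not taken from the library source.

```python
from datetime import datetime


def sketch_gen_timestamp(*, now_fn=None):
    """Hypothetical stand-in for the documented YYYY-MMDD-HHMM format."""
    now = (now_fn or datetime.now)()
    # Assumption: this strftime pattern is what yields YYYY-MMDD-HHMM.
    return now.strftime("%Y-%m%d-%H%M")


# Inject a fake clock so the result is deterministic.
frozen = sketch_gen_timestamp(now_fn=lambda: datetime(2025, 5, 31, 12, 30))
print(frozen)  # 2025-0531-1230
```

Because the format stops at minute resolution, two calls within the same minute return the same string, so a timestamp on its own does not guarantee a unique filename.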
- scitex_repro.timestamp(*, now_fn=None)
Documented identically to gen_timestamp above: the parameters, return value, and examples are the same.
- scitex_repro.hash_array(array_data)[source]
Generate hash for array data.
Creates a deterministic hash for numpy arrays, useful for verifying data integrity and reproducibility.
- Parameters:
array_data (np.ndarray) – Array to hash
- Returns:
16-character hash string
- Return type:
str
Examples
>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> hash1 = hash_array(data)
>>> hash2 = hash_array(data)
>>> hash1 == hash2
True
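One plausible way to obtain a deterministic 16-character hash is to digest the array's raw bytes and truncate the hex string. The sketch below assumes SHA-256 and also mixes in the dtype and shape; sketch_hash_array is a hypothetical name, and the algorithm actually used by hash_array is not stated in this reference.

```python
import hashlib

import numpy as np


def sketch_hash_array(array_data):
    """Hypothetical stand-in: 16 hex characters derived from the array."""
    arr = np.ascontiguousarray(array_data)
    digest = hashlib.sha256()
    # Include dtype and shape so equal bytes with a different layout differ.
    digest.update(str(arr.dtype).encode("utf-8"))
    digest.update(str(arr.shape).encode("utf-8"))
    digest.update(arr.tobytes())
    return digest.hexdigest()[:16]


a = np.array([1, 2, 3, 4, 5])
b = a.copy()
c = np.array([1, 2, 3, 4, 6])
print(sketch_hash_array(a) == sketch_hash_array(b))  # True
print(sketch_hash_array(a) == sketch_hash_array(c))  # False
```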
- class scitex_repro.RandomStateManager(seed=42, verbose=False)[source]
Bases: object
Simple, robust random state manager for scientific computing.
Examples
>>> from scitex_repro import RandomStateManager
>>>
>>> # Method 1: Direct usage
>>> rng = RandomStateManager(seed=42)
>>> data = rng("data").random(100)
>>>
>>> # Verify reproducibility
>>> rng.verify(data, "my_data")
- get_np_generator(name)[source]
Get or create a named NumPy random generator.
- Parameters:
name (str) – Generator name (e.g., “data”, “model”, “augment”)
- Returns:
Independent NumPy random generator
- Return type:
numpy.random.Generator
Examples
>>> rng = RandomStateManager(42)
>>> gen = rng.get_np_generator("data")
>>> values = gen.random(100)
>>> perm = gen.permutation(100)
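To show what "independent" named generators can mean in practice, here is a minimal sketch that derives a separate NumPy stream from each (seed, name) pair. The function sketch_named_generator and the use of zlib.crc32 to turn the name into an integer are assumptions made for illustration; the reference does not say how RandomStateManager derives its per-name seeds.

```python
import zlib

import numpy as np


def sketch_named_generator(seed, name):
    """Hypothetical stand-in: one independent Generator per (seed, name) pair."""
    # zlib.crc32 is stable across processes, unlike the built-in hash() on str.
    name_key = zlib.crc32(name.encode("utf-8"))
    return np.random.default_rng([seed, name_key])


data_a = sketch_named_generator(42, "data").random(3)
data_b = sketch_named_generator(42, "data").random(3)
model = sketch_named_generator(42, "model").random(3)
print(np.array_equal(data_a, data_b))  # True
print(np.array_equal(data_a, model))   # False
```

The sketch builds a fresh generator on every call. The documented method is described as "get or create", which suggests that repeated calls with the same name return a cached generator whose stream continues rather than restarts.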
- __call__(name, verbose=None)[source]
Get or create a named NumPy random generator.
This is a backward-compatibility wrapper around get_np_generator(). Prefer calling get_np_generator() directly for clarity.
- verify(obj, name=None, verbose=True)[source]
Verify object matches cached hash (detects broken reproducibility).
First call: caches the object’s hash.
Later calls: verify that the object matches the cached hash.
- Parameters:
obj (Any) – Object to verify (array, tensor, data, model weights, etc.). Supports: numpy arrays, torch tensors, tf tensors, jax arrays, lists, dicts, pandas dataframes, and basic types.
name (str, optional) – Cache name. Auto-generated if not provided.
- Returns:
True if matches cache (or first call), False if different
- Return type:
Examples
>>> data = generate_data()
>>> rng.verify(data, "train_data")  # First run: caches
>>> # Next run:
>>> rng.verify(data, "train_data")  # Verifies match
- _compute_hash(obj)[source]
Compute hash for various object types.
Supports:
- NumPy arrays
- PyTorch tensors
- TensorFlow tensors
- JAX arrays
- Pandas DataFrames/Series
- Lists, tuples, dicts
- Basic types (int, float, str, bool)
- Return type:
- get_sklearn_random_state(name)[source]
Get a random state for scikit-learn.
Scikit-learn accepts an integer for its random_state parameter.
Examples
>>> rng = RandomStateManager(42)
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test = train_test_split(
...     X, test_size=0.2,
...     random_state=rng.get_sklearn_random_state("split")
... )
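A named integer seed could be derived along the following lines. The function sketch_sklearn_random_state and the SHA-256 based derivation are assumptions for illustration rather than the library's method; the only firm constraint used is that an integer random_state must lie between 0 and 2**32 - 1.

```python
import hashlib


def sketch_sklearn_random_state(seed, name):
    """Hypothetical stand-in: map (seed, name) to an integer seed."""
    digest = hashlib.sha256(f"{seed}:{name}".encode("utf-8")).digest()
    # Keep 4 bytes so the result fits the 0 .. 2**32 - 1 range.
    return int.from_bytes(digest[:4], "big")


split_seed = sketch_sklearn_random_state(42, "split")
print(split_seed == sketch_sklearn_random_state(42, "split"))  # True
print(0 <= split_seed < 2**32)  # True
```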
- get_torch_generator(name)[source]
Get or create a named PyTorch generator.
- Parameters:
name (str) – Generator name
- Returns:
PyTorch generator with deterministic seed
- Return type:
torch.Generator
Examples
>>> import torch
>>> rng = RandomStateManager(42)
>>> gen = rng.get_torch_generator("model")
>>> torch.randn(5, 5, generator=gen)
- scitex_repro.get(verbose=False)[source]
Get or create the global RandomStateManager instance.
- Parameters:
verbose (bool, optional) – Whether to print status messages (default: False)
- Returns:
Global instance
- Return type:
RandomStateManager
Examples
>>> from scitex_repro import get
>>> rng = get()
>>> data = rng("data").random(100)
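"Get or create" describes a lazily initialised shared instance. A minimal sketch of that pattern follows; SketchManager, _global_instance, and sketch_get are hypothetical names, and the real get() may differ in detail.

```python
class SketchManager:
    """Hypothetical stand-in for RandomStateManager."""

    def __init__(self, seed=42, verbose=False):
        self.seed = seed
        self.verbose = verbose


_global_instance = None  # hypothetical module-level slot


def sketch_get(verbose=False):
    """Return the shared instance, creating it on first use."""
    global _global_instance
    if _global_instance is None:
        _global_instance = SketchManager(verbose=verbose)
    return _global_instance


first = sketch_get()
second = sketch_get()
print(first is second)  # True
```

Sharing one instance means every module that calls the accessor draws from the same set of named generators, which is what keeps a multi-file experiment on a single seed.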