scitex_repro

scitex-repro — Reproducibility utilities for scientific computing.

Provides tools for reproducible scientific computing:
  • Random state management (RandomStateManager)

  • ID generation (gen_ID)

  • Timestamp generation (gen_timestamp)

  • Array hashing (hash_array)

scitex_repro.gen_ID(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)

Generate a unique identifier with timestamp and random characters.

Documented identically to gen_id below: the parameters, return value, and examples are the same.

scitex_repro.gen_id(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8, *, now_fn=None)

Generate a unique identifier with timestamp and random characters.

Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.

Parameters:
  • time_format (str, optional) – Format string for the timestamp portion. Default is “%YY-%mM-%dD-%Hh%Mm%Ss”, which produces timestamps such as “2025Y-05M-31D-12h30m45s”.

  • N (int, optional) – Number of random characters to append. Default is 8.

  • now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.

Returns:

Unique identifier in format “{timestamp}_{random_chars}”

Return type:

str

Examples

>>> from scitex_repro import gen_id
>>> id1 = gen_id()
>>> print(id1)
2025Y-05M-31D-12h30m45s_a3Bc9xY2
>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a
>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"
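The now_fn hook is easiest to see with a small stand-in. The sketch below is not the library's implementation: make_id and fixed_clock are hypothetical names, and make_id only mirrors the documented behavior (formatted timestamp, underscore, N random alphanumeric characters) so that the effect of injecting a fixed clock can be checked without installing the package.

```python
import random
import string
from datetime import datetime


def make_id(time_format="%YY-%mM-%dD-%Hh%Mm%Ss", N=8, *, now_fn=None):
    """Hypothetical stand-in mirroring the documented gen_id behavior."""
    now_fn = now_fn or datetime.now  # fall back to the real clock
    stamp = now_fn().strftime(time_format)
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(random.choice(alphabet) for _ in range(N))
    return f"{stamp}_{suffix}"


def fixed_clock():
    """Fake clock for tests: always returns the same datetime."""
    return datetime(2025, 5, 31, 12, 30, 45)


run_id = make_id(now_fn=fixed_clock)
stamp, suffix = run_id.rsplit("_", 1)
print(stamp)        # 2025Y-05M-31D-12h30m45s
print(len(suffix))  # 8
```

Only the timestamp portion becomes deterministic this way; the random suffix still differs between calls, so a test would typically assert on the prefix and on the suffix length rather than on the full ID.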
scitex_repro.gen_timestamp(*, now_fn=None)

Generate a timestamp string for file naming.

Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for creating unique filenames or version identifiers.

Parameters:

now_fn (callable, optional) – Zero-argument callable returning a datetime-like object with .strftime(). Defaults to datetime.now. Injection point for deterministic tests — pass a fake that returns a fixed datetime instead of mocking datetime globally.

Returns:

Timestamp string in format “YYYY-MMDD-HHMM”

Return type:

str

Examples

>>> from scitex_repro import gen_timestamp
>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230
>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv
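The documented YYYY-MMDD-HHMM layout corresponds to a strftime pattern along the lines of "%Y-%m%d-%H%M". The sketch below assumes that pattern (it is inferred from the documented output, not taken from the library source) and applies it to fixed datetimes to show what minute resolution implies for file naming.

```python
from datetime import datetime

# Pattern inferred from the documented "YYYY-MMDD-HHMM" layout (an assumption).
TIMESTAMP_FORMAT = "%Y-%m%d-%H%M"

fixed = datetime(2025, 5, 31, 12, 30)
stamp = fixed.strftime(TIMESTAMP_FORMAT)
print(stamp)  # 2025-0531-1230

# Minute resolution: two moments within the same minute give the same string,
# so the timestamp alone does not guarantee unique filenames.
same_minute = datetime(2025, 5, 31, 12, 30, 59).strftime(TIMESTAMP_FORMAT)
print(stamp == same_minute)  # True
```

When several files may be written within one minute, combining the timestamp with a random suffix (as gen_id does) is one way to avoid collisions.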
scitex_repro.timestamp(*, now_fn=None)

Generate a timestamp string for file naming.

Documented identically to gen_timestamp above: the parameters, return value, and examples are the same.

scitex_repro.hash_array(array_data)

Generate hash for array data.

Creates a deterministic hash for numpy arrays, useful for verifying data integrity and reproducibility.

Parameters:

array_data (np.ndarray) – Array to hash

Returns:

16-character hash string

Return type:

str

Examples

>>> import numpy as np
>>> from scitex_repro import hash_array
>>> data = np.array([1, 2, 3, 4, 5])
>>> hash1 = hash_array(data)
>>> hash2 = hash_array(data)
>>> hash1 == hash2
True
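The hashing algorithm behind hash_array is not stated here. As a rough illustration of how a deterministic 16-character array hash can be built, the sketch below truncates a SHA-256 digest of the array's dtype, shape, and raw bytes. The function name sketch_hash_array and the choice of SHA-256 are assumptions, so its output should not be expected to match hash_array.

```python
import hashlib

import numpy as np


def sketch_hash_array(array_data):
    """Hypothetical stand-in: 16 hex characters derived from the array contents."""
    arr = np.ascontiguousarray(array_data)
    digest = hashlib.sha256()
    # Include dtype and shape so arrays with identical bytes but a different
    # layout do not hash to the same value.
    digest.update(str(arr.dtype).encode())
    digest.update(str(arr.shape).encode())
    digest.update(arr.tobytes())
    return digest.hexdigest()[:16]


data = np.array([1, 2, 3, 4, 5])
h1 = sketch_hash_array(data)
h2 = sketch_hash_array(data.copy())         # same contents, different object
h3 = sketch_hash_array(data.reshape(5, 1))  # same bytes, different shape
print(len(h1))   # 16
print(h1 == h2)  # True
```

Because the hash depends only on contents, equal arrays produce equal hashes across runs and machines, which is what makes it usable as a reproducibility check.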
class scitex_repro.RandomStateManager(seed=42, verbose=False)

Bases: object

Simple, robust random state manager for scientific computing.

Examples

>>> from scitex_repro import RandomStateManager
>>>
>>> # Method 1: Direct usage
>>> rng = RandomStateManager(seed=42)
>>> data = rng("data").random(100)
>>>
>>> # Verify reproducibility
>>> rng.verify(data, "my_data")
__init__(seed=42, verbose=False)

Initialize with automatic module detection.

_auto_fix_seeds(verbose=None)

Automatically detect all available random modules and fix their seeds.

get_np_generator(name)

Get or create a named NumPy random generator.

Parameters:

name (str) – Generator name (e.g., “data”, “model”, “augment”)

Returns:

Independent NumPy random generator

Return type:

numpy.random.Generator

Examples

>>> rng = RandomStateManager(42)
>>> gen = rng.get_np_generator("data")
>>> values = gen.random(100)
>>> perm = gen.permutation(100)
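How the manager derives a per-name generator is not described here. One common way to get independent, reproducible streams is to combine the base seed with a stable hash of the name and feed both to numpy.random.SeedSequence, as sketched below. The helper named_generator is hypothetical and may differ from what RandomStateManager actually does.

```python
import hashlib

import numpy as np


def named_generator(seed, name):
    """Hypothetical helper: derive a Generator from a base seed and a name."""
    # Hash the name to a stable integer. Python's built-in hash() is salted
    # per process for strings, so it would not be reproducible across runs.
    name_key = int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")
    seq = np.random.SeedSequence([seed, name_key])
    return np.random.default_rng(seq)


a = named_generator(42, "data").random(3)
b = named_generator(42, "data").random(3)
c = named_generator(42, "model").random(3)
print(np.array_equal(a, b))  # True: same seed and name give the same stream
```

Giving each purpose its own name ("data", "model", "augment") means that drawing extra numbers for one purpose does not shift the stream used by another.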
__call__(name, verbose=None)

Get or create a named NumPy random generator.

This is a backward-compatibility wrapper around get_np_generator(); prefer calling get_np_generator() directly for clarity.

Parameters:
  • name (str) – Generator name

  • verbose (bool, optional) – Whether to show the deprecation warning

Returns:

NumPy random generator with deterministic seed

Return type:

numpy.random.Generator

verify(obj, name=None, verbose=True)

Verify object matches cached hash (detects broken reproducibility).

  • First call: caches the object’s hash.

  • Later calls: verify that the object matches the cached hash.

Parameters:
  • obj (Any) – Object to verify (array, tensor, data, model weights, etc.). Supported types: NumPy arrays, PyTorch tensors, TensorFlow tensors, JAX arrays, lists, dicts, pandas DataFrames, and basic types.

  • name (str, optional) – Cache name. Auto-generated if not provided.

Returns:

True if matches cache (or first call), False if different

Return type:

bool

Examples

>>> rng = RandomStateManager(42)
>>> data = rng.get_np_generator("data").random(100)
>>> rng.verify(data, "train_data")  # First run: caches the hash
>>> # Next run:
>>> rng.verify(data, "train_data")  # Verifies against the cached hash
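The cache-then-compare behavior can be sketched with a hash stored in a small file per name. Everything in the block is an assumption made for illustration: the function sketch_verify, the JSON file layout, and the restriction to JSON-serializable objects are not taken from the library.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sketch_verify(obj, name, cache_dir):
    """Hypothetical stand-in: cache a hash on first call, compare afterwards."""
    digest = hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    cache_file = Path(cache_dir) / f"{name}.json"
    if not cache_file.exists():
        cache_file.write_text(json.dumps({"hash": digest}))
        return True  # first call: nothing to compare against yet
    cached = json.loads(cache_file.read_text())["hash"]
    return cached == digest


with tempfile.TemporaryDirectory() as tmp:
    first = sketch_verify([1, 2, 3], "train_data", tmp)    # caches the hash
    second = sketch_verify([1, 2, 3], "train_data", tmp)   # matches the cache
    changed = sketch_verify([1, 2, 4], "train_data", tmp)  # differs from the cache
print(first, second, changed)  # True True False
```

A False result on a later run signals that something upstream (seed handling, data loading order, library version) changed the data, which is the situation verify is meant to detect.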
_compute_hash(obj)

Compute hash for various object types.

Supports:
  • NumPy arrays

  • PyTorch tensors

  • TensorFlow tensors

  • JAX arrays

  • Pandas DataFrames/Series

  • Lists, tuples, dicts

  • Basic types (int, float, str, bool)

Return type:

str

checkpoint(name='checkpoint')

Save current state of all generators.

restore(checkpoint)

Restore from checkpoint.

temporary_seed(seed)

Context manager for temporary seed change.
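The checkpoint, restore, and temporary_seed entries above give no examples. The general pattern (save the state, reseed, then restore) can be sketched with the standard library's random module alone. The name sketch_temporary_seed is hypothetical, and the real methods presumably cover more than this one module.

```python
import random
from contextlib import contextmanager


@contextmanager
def sketch_temporary_seed(seed):
    """Hypothetical stand-in using only the stdlib random module."""
    saved = random.getstate()   # checkpoint the current state
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved)  # restore, even if the block raises


random.seed(0)
expected_next = random.random()  # what the outer stream yields first

random.seed(0)
with sketch_temporary_seed(123):
    inside_a = random.random()
with sketch_temporary_seed(123):
    inside_b = random.random()
after = random.random()          # the outer stream is unaffected

print(inside_a == inside_b)    # True
print(after == expected_next)  # True
```

Restoring in a finally clause matters: without it, an exception inside the block would leave the temporary seed in effect for the rest of the program.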

get_sklearn_random_state(name)

Get a random state for scikit-learn.

Scikit-learn uses integers for random_state parameter.

Parameters:

name (str) – Generator name

Returns:

Random state integer for sklearn

Return type:

int

Examples

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> rng = RandomStateManager(42)
>>> X = np.arange(20).reshape(10, 2)  # placeholder data
>>> X_train, X_test = train_test_split(
...     X, test_size=0.2,
...     random_state=rng.get_sklearn_random_state("split")
... )
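Integer seeds passed as random_state must lie between 0 and 2**32 - 1. A stable integer in that range can be derived from a base seed and a name as sketched below. The helper sketch_random_state and the hashing scheme are assumptions, so the integers will not match those returned by get_sklearn_random_state.

```python
import hashlib


def sketch_random_state(seed, name):
    """Hypothetical helper: stable integer in [0, 2**32 - 1] from seed and name."""
    payload = f"{seed}:{name}".encode()
    # Four digest bytes give an unsigned 32-bit value, which fits the range
    # accepted for integer seeds.
    return int.from_bytes(hashlib.sha256(payload).digest()[:4], "big")


split_state = sketch_random_state(42, "split")
split_again = sketch_random_state(42, "split")
cv_state = sketch_random_state(42, "cv")
print(split_state == split_again)    # True
print(0 <= split_state < 2**32)      # True
```

Using a distinct name per purpose ("split", "cv") keeps the train/test split stable even when another part of the pipeline changes how many random numbers it consumes.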
get_torch_generator(name)

Get or create a named PyTorch generator.

Parameters:

name (str) – Generator name

Returns:

PyTorch generator with deterministic seed

Return type:

torch.Generator

Examples

>>> import torch
>>> rng = RandomStateManager(42)
>>> gen = rng.get_torch_generator("model")
>>> torch.randn(5, 5, generator=gen)
get_generator(name)

Alias for get_np_generator for compatibility.

clear_cache(patterns=None)

Clear verification cache files.

Parameters:

patterns (str or list of str, optional) – Specific cache patterns to clear. If None, clears all.

Returns:

Number of cache files removed

Return type:

int

scitex_repro.get(verbose=False)

Get or create the global RandomStateManager instance.

Parameters:

verbose (bool, optional) – Whether to print status messages (default: False)

Returns:

Global instance

Return type:

RandomStateManager

Examples

>>> from scitex_repro import get
>>> rng = get()
>>> data = rng("data").random(100)
scitex_repro.reset(seed=42, verbose=False)

Reset global RandomStateManager with new seed.

Parameters:
  • seed (int) – New seed value

  • verbose (bool, optional) – Whether to print status messages (default: False)

Returns:

New global instance

Return type:

RandomStateManager

Examples

>>> from scitex_repro import reset
>>> rng = reset(seed=123)
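The relationship between get() and reset() follows the usual module-level singleton pattern: get() lazily creates one shared instance, and reset() replaces it. The sketch below shows that pattern with hypothetical names (SketchManager, sketch_get, sketch_reset). It illustrates the documented behavior only and is not the library's code.

```python
class SketchManager:
    """Hypothetical stand-in for RandomStateManager; it only stores the seed."""

    def __init__(self, seed=42):
        self.seed = seed


# Holder for the shared instance (a dict avoids the need for `global`).
_state = {"instance": None}


def sketch_get():
    """Return the shared instance, creating it on first use."""
    if _state["instance"] is None:
        _state["instance"] = SketchManager()
    return _state["instance"]


def sketch_reset(seed=42):
    """Replace the shared instance with a fresh one using the given seed."""
    _state["instance"] = SketchManager(seed)
    return _state["instance"]


first = sketch_get()
print(sketch_get() is first)  # True: repeated calls share one instance
fresh = sketch_reset(seed=123)
print(sketch_get() is fresh)  # True: later calls see the new instance
print(fresh.seed)             # 123
```

One consequence of this pattern is that references obtained before a reset keep pointing at the old instance, so code that holds on to a manager may need to call get() again afterwards.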
scitex_repro.fix_seeds(seed=42, os=True, random=True, np=True, torch=True, tf=False, jax=False, verbose=False, **kwargs)

Deprecated: Use RandomStateManager instead.

This function maintains backward compatibility with the old fix_seeds API.