scitex_core.repro

Reproducibility utilities for scientific computing.

This module provides tools for ensuring reproducible scientific experiments:

  • Unique ID generation (gen_id)

  • Timestamp generation (gen_timestamp)

  • Array hashing for verification (hash_array)

  • Random state management across libraries (RandomStateManager)

scitex_core.repro.gen_id(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8)[source]

Generate a unique identifier with timestamp and random characters.

Creates a unique ID by combining a formatted timestamp with random alphanumeric characters. Useful for creating unique experiment IDs, run identifiers, or temporary file names.

Parameters:
  • time_format (str, optional) – strftime format string for the timestamp portion. Default is “%YY-%mM-%dD-%Hh%Mm%Ss”, which produces timestamps such as “2025Y-05M-31D-12h30m45s”.

  • N (int, optional) – Number of random characters to append. Default is 8.

Returns:

Unique identifier in format “{timestamp}_{random_chars}”

Return type:

str

Examples

>>> from scitex_core.repro import gen_id
>>> id1 = gen_id()
>>> print(id1)  # the random suffix differs on every call
2025Y-05M-31D-12h30m45s_a3Bc9xY2
>>> id2 = gen_id(time_format="%Y%m%d", N=4)
>>> print(id2)
20250531_xY9a
>>> # For experiment tracking
>>> exp_id = gen_id()
>>> save_path = f"results/experiment_{exp_id}.pkl"

Notes

  • Random component uses alphanumeric characters (a-z, A-Z, 0-9)

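  • IDs generated within the same second still differ, except in the unlikely event that their random suffixes collide

  • With the default format, IDs contain only letters, digits, hyphens, and an underscore, so they are safe to use in file names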
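The ID layout described above can be sketched with the standard library alone. This is a minimal illustration, not the scitex_core implementation: `make_id` and its `now` and `rng` parameters are hypothetical names introduced here so that the timestamp can be pinned for the example.

```python
import random
import string
from datetime import datetime

# Hypothetical helper for illustration; not the scitex_core implementation.
def make_id(time_format="%YY-%mM-%dD-%Hh%Mm%Ss", n=8, now=None, rng=None):
    """Return '<formatted timestamp>_<n random alphanumeric characters>'."""
    now = now or datetime.now()
    rng = rng or random
    alphabet = string.ascii_letters + string.digits  # a-z, A-Z, 0-9
    suffix = "".join(rng.choice(alphabet) for _ in range(n))
    return f"{now.strftime(time_format)}_{suffix}"

fixed = datetime(2025, 5, 31, 12, 30, 45)
example_id = make_id(now=fixed)
print(example_id)  # starts with 2025Y-05M-31D-12h30m45s_ ; the suffix varies
```

In the default format, the letters after each directive (Y, M, D, h, m, s) are literal characters, which is why the output reads “2025Y-05M-31D-…”.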

scitex_core.repro.gen_ID(time_format='%YY-%mM-%dD-%Hh%Mm%Ss', N=8)

Generate a unique identifier with timestamp and random characters.

Parameters, return value, examples, and notes are the same as for gen_id() above.

scitex_core.repro.gen_timestamp()[source]

Generate a timestamp string for file naming.

Returns a timestamp in the format YYYY-MMDD-HHMM, suitable for time-stamped filenames or version identifiers. The resolution is one minute, so two calls within the same minute return the same string.

Returns:

Timestamp string in format “YYYY-MMDD-HHMM”

Return type:

str

Examples

>>> from scitex_core.repro import gen_timestamp
>>> timestamp = gen_timestamp()
>>> print(timestamp)
2025-0531-1230
>>> filename = f"experiment_{gen_timestamp()}.csv"
>>> print(filename)
experiment_2025-0531-1230.csv

Notes

  • Format: YYYY-MMDD-HHMM (e.g., “2025-0531-1230”)

  • Month and day are zero-padded to 2 digits

  • Hour and minute are zero-padded to 2 digits

  • Suitable for filesystem use (no special characters except hyphen)
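The format above corresponds to a single strftime pattern. The sketch below assumes that pattern and uses a hypothetical `make_timestamp` helper with an injectable `now` argument; it is not the library's code.

```python
from datetime import datetime

# Hypothetical helper for illustration; not the scitex_core implementation.
def make_timestamp(now=None):
    """Format a datetime as YYYY-MMDD-HHMM (minute resolution)."""
    now = now or datetime.now()
    return now.strftime("%Y-%m%d-%H%M")

stamp = make_timestamp(datetime(2025, 5, 31, 12, 30))
print(stamp)  # 2025-0531-1230
```

Because seconds are not part of the pattern, a random suffix (as in gen_id) is the safer choice when several files may be written within the same minute.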

scitex_core.repro.timestamp()

Generate a timestamp string for file naming.

Return value, examples, and notes are the same as for gen_timestamp() above.

scitex_core.repro.hash_array(array_data)[source]

Generate hash for array data.

Creates a deterministic hash for numpy arrays, useful for verifying data integrity and reproducibility.

Parameters:

array_data (np.ndarray) – Array to hash

Returns:

16-character hash string

Return type:

str

Examples

>>> import numpy as np
>>> from scitex_core.repro import hash_array
>>> data = np.array([1, 2, 3, 4, 5])
>>> hash1 = hash_array(data)
>>> hash2 = hash_array(data)
>>> hash1 == hash2
True
>>> # Different data produces different hash
>>> data2 = np.array([1, 2, 3, 4, 6])
>>> hash3 = hash_array(data2)
>>> hash1 != hash3
True

Notes

  • Uses SHA-256 hashing algorithm

  • Returns first 16 characters of hex digest

  • Same array will always produce same hash

  • Useful for detecting changes in data
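A truncated SHA-256 digest, as the notes describe, can be sketched over raw bytes. `short_hash` is a hypothetical name, and the standard-library `array` type stands in for a NumPy array here; with NumPy, the bytes could come from `ndarray.tobytes()`. Whether hash_array also mixes in dtype or shape is not stated in this reference.

```python
import hashlib
from array import array

# Hypothetical helper for illustration; not the scitex_core implementation.
def short_hash(buffer: bytes, length: int = 16) -> str:
    """Return the first `length` hex characters of the SHA-256 digest."""
    return hashlib.sha256(buffer).hexdigest()[:length]

a = array("q", [1, 2, 3, 4, 5]).tobytes()  # five 64-bit integers as raw bytes
b = array("q", [1, 2, 3, 4, 6]).tobytes()  # last element differs

hash_a = short_hash(a)
print(len(hash_a))              # 16
print(hash_a == short_hash(a))  # True: same bytes, same hash
print(hash_a == short_hash(b))  # False: one element changed
```

Sixteen hex characters carry 64 bits of the digest, which is ample for detecting accidental changes but is not a substitute for the full digest where tamper resistance matters.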

class scitex_core.repro.RandomStateManager(seed=42, verbose=False)[source]

Bases: object

Simple, robust random state manager for scientific computing.

Provides centralized management of random number generators with deterministic seeding across multiple ML/scientific libraries.

Parameters:
  • seed (int, optional) – Master seed for all random number generators (default: 42)

  • verbose (bool, optional) – Print status messages (default: False)

Examples

>>> from scitex_core.repro import RandomStateManager
>>>
>>> # Direct usage
>>> rng_manager = RandomStateManager(seed=42)
>>> gen = rng_manager("data")
>>> data = gen.random(100)
>>>
>>> # Verify reproducibility
>>> rng_manager.verify(data, "my_data")
>>>
>>> # Named generators for different purposes
>>> data_gen = rng_manager("data")
>>> model_gen = rng_manager("model")
>>> augment_gen = rng_manager("augment")

Notes

  • Automatically detects and seeds available libraries (NumPy, PyTorch, TensorFlow, JAX)

  • Creates independent named generators for different experiment components

  • Verification cache stored in ~/.scitex/rng/
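One way to obtain independent named generators from a single master seed is to hash the seed together with the name. The sketch below shows that idea with the standard library's `random.Random`; `derive_seed` and `named_generator` are hypothetical, and the derivation RandomStateManager actually uses (it returns numpy.random.Generator objects) is not documented here.

```python
import hashlib
import random

# Hypothetical helpers for illustration; not the scitex_core implementation.
def derive_seed(master_seed: int, name: str) -> int:
    """Map (master seed, name) to a stable 32-bit child seed."""
    digest = hashlib.sha256(f"{master_seed}:{name}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

def named_generator(master_seed: int, name: str) -> random.Random:
    """Create a generator whose stream depends only on the seed and the name."""
    return random.Random(derive_seed(master_seed, name))

data_a = named_generator(42, "data").random()
data_b = named_generator(42, "data").random()
model = named_generator(42, "model").random()

print(data_a == data_b)  # True: same seed and name reproduce the stream
print(data_a, model)     # a different name gives an unrelated stream
```

Using hashlib rather than the built-in `hash()` matters in such a scheme, because string hashes from `hash()` are randomized per process unless PYTHONHASHSEED is fixed.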

__call__(name, verbose=None)[source]

Get or create a named NumPy random generator.

This is a convenience wrapper for get_np_generator().

Parameters:
  • name (str) – Generator name

  • verbose (bool, optional) – Whether to show deprecation warning

Returns:

NumPy random generator with deterministic seed

Return type:

numpy.random.Generator

__init__(seed=42, verbose=False)[source]

Initialize with automatic module detection.

_auto_fix_seeds(verbose=None)[source]

Automatically detect and fix ALL available random modules.

_compute_hash(obj)[source]

Compute hash for various object types.

Supports:

  • NumPy arrays

  • PyTorch tensors

  • TensorFlow tensors

  • JAX arrays

  • Pandas DataFrames/Series

  • Lists, tuples, dicts

  • Basic types (int, float, str, bool)

Return type:

str

checkpoint(name='checkpoint')[source]

Save current state of all generators.

Parameters:

name (str, optional) – Checkpoint name (default: “checkpoint”)

Returns:

Path to checkpoint file

Return type:

Path

clear_cache(patterns=None)[source]

Clear verification cache files.

Parameters:

patterns (str or list of str, optional) – Specific cache patterns to clear. If None, clears all. Can be:

  • Single name: “my_data”

  • List of names: [“data1”, “data2”]

  • Glob pattern: “experiment_*”

  • None: clear all cache files

Returns:

Number of cache files removed

Return type:

int
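The pattern handling described for `patterns` (a single name, a list, a glob, or None) can be sketched with `fnmatch`. The helper name `clear_matching`, the `.hash` file extension, and matching on the file stem are all assumptions made for this illustration; the actual cache layout under ~/.scitex/rng/ is not specified here.

```python
import fnmatch
import tempfile
from pathlib import Path

# Hypothetical helper for illustration; not the scitex_core implementation.
def clear_matching(cache_dir: Path, patterns=None) -> int:
    """Delete cache files whose stem matches any pattern; return the count."""
    if patterns is None:
        patterns = ["*"]       # None clears everything
    elif isinstance(patterns, str):
        patterns = [patterns]  # accept a single name or glob
    removed = 0
    for path in sorted(cache_dir.iterdir()):  # sorted() snapshots the listing
        if any(fnmatch.fnmatch(path.stem, p) for p in patterns):
            path.unlink()
            removed += 1
    return removed

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for stem in ["old_data", "experiment_1", "experiment_2", "keep_me"]:
        (cache / f"{stem}.hash").write_text("")
    by_name = clear_matching(cache, "old_data")
    by_glob = clear_matching(cache, "experiment_*")
    the_rest = clear_matching(cache)

print(by_name, by_glob, the_rest)  # 1 2 1
```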

Examples

>>> rng_manager = RandomStateManager(42)
>>> rng_manager.clear_cache()  # Clear all
>>> rng_manager.clear_cache("old_data")  # Clear specific
>>> rng_manager.clear_cache(["test1", "test2"])  # Clear multiple
>>> rng_manager.clear_cache("experiment_*")  # Clear pattern

get_generator(name)[source]

Alias for get_np_generator for compatibility.

get_np_generator(name)[source]

Get or create a named NumPy random generator.

Parameters:

name (str) – Generator name (e.g., “data”, “model”, “augment”)

Returns:

Independent NumPy random generator

Return type:

numpy.random.Generator

Examples

>>> rng_manager = RandomStateManager(42)
>>> gen = rng_manager.get_np_generator("data")
>>> values = gen.random(100)
>>> perm = gen.permutation(100)

get_sklearn_random_state(name)[source]

Get a random state for scikit-learn.

Scikit-learn uses integers for random_state parameter.

Parameters:

name (str) – Generator name

Returns:

Random state integer for sklearn

Return type:

int

Examples

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> rng_manager = RandomStateManager(42)
>>> X = np.arange(20).reshape(10, 2)  # placeholder feature matrix
>>> X_train, X_test = train_test_split(
...     X, test_size=0.2,
...     random_state=rng_manager.get_sklearn_random_state("split")
... )

get_torch_generator(name)[source]

Get or create a named PyTorch generator.

Parameters:

name (str) – Generator name

Returns:

PyTorch generator with deterministic seed

Return type:

torch.Generator

Examples

>>> import torch
>>> rng_manager = RandomStateManager(42)
>>> gen = rng_manager.get_torch_generator("model")
>>> weights = torch.randn(5, 5, generator=gen)

restore(checkpoint)[source]

Restore from checkpoint.

Parameters:

checkpoint (str or Path) – Path to checkpoint file
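The checkpoint and restore pair can be pictured as saving each generator's internal state to a file and loading it back later. The sketch below does this for standard-library `random.Random` objects with `pickle`; `save_state`, `load_state`, and the file name are hypothetical, and the on-disk format RandomStateManager uses is not documented here.

```python
import pickle
import random
import tempfile
from pathlib import Path

# Hypothetical helpers for illustration; not the scitex_core implementation.
def save_state(generators: dict, path: Path) -> Path:
    """Pickle the internal state of every named generator."""
    states = {name: gen.getstate() for name, gen in generators.items()}
    path.write_bytes(pickle.dumps(states))
    return path

def load_state(generators: dict, path: Path) -> None:
    """Rewind each named generator to its pickled state."""
    states = pickle.loads(path.read_bytes())
    for name, state in states.items():
        generators[name].setstate(state)

generators = {"data": random.Random(42)}
with tempfile.TemporaryDirectory() as tmp:
    ckpt = save_state(generators, Path(tmp) / "checkpoint.pkl")
    before = [generators["data"].random() for _ in range(3)]
    load_state(generators, ckpt)
    after = [generators["data"].random() for _ in range(3)]

print(before == after)  # True: the stream replays from the checkpoint
```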

temporary_seed(seed)[source]

Context manager for temporary seed change.

Parameters:

seed (int) – Temporary seed value
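A temporary seed change is typically a save, reseed, restore sequence wrapped in a context manager. The sketch below applies that pattern to the standard library's global `random` state only; which libraries RandomStateManager.temporary_seed covers is not stated here, and this standalone function is a hypothetical stand-in rather than the library's method.

```python
import random
from contextlib import contextmanager

# Hypothetical helper for illustration; not the scitex_core implementation.
@contextmanager
def temporary_seed(seed: int):
    """Seed the global `random` module inside the block, then restore it."""
    saved = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved)  # restore even if the block raises

random.seed(42)
expected_next = random.Random(42).random()  # first value a seed of 42 yields

with temporary_seed(123):
    inside = random.random()

after = random.random()
print(inside == random.Random(123).random())  # True: block ran under seed 123
print(after == expected_next)                 # True: outer stream was untouched
```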

Examples

>>> import numpy as np
>>> rng_manager = RandomStateManager(42)
>>> with rng_manager.temporary_seed(123):
...     data = np.random.random(10)

verify(obj, name=None, verbose=True)[source]

Verify object matches cached hash (detects broken reproducibility).

  • First call: caches the object’s hash

  • Later calls: verify that the object matches the cached hash

Parameters:
  • obj (Any) – Object to verify (array, tensor, data, model weights, etc.). Supports NumPy arrays, PyTorch tensors, TensorFlow tensors, JAX arrays, lists, dicts, pandas DataFrames, and basic types.

  • name (str, optional) – Cache name. Auto-generated from caller location if not provided.

  • verbose (bool, optional) – Print verification results (default: True)

Returns:

True if matches cache (or first call), False if different

Return type:

bool

Raises:

ValueError – If verification fails (object doesn’t match cached hash)
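The cache-then-compare behaviour can be sketched as follows. This is an assumption-laden illustration: `verify_bytes`, the one-file-per-name layout, and the `.hash` extension are hypothetical, the input is restricted to bytes, and the sketch returns False on a mismatch rather than raising.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical helper for illustration; not the scitex_core implementation.
def verify_bytes(payload: bytes, name: str, cache_dir: Path) -> bool:
    """Cache the hash on the first call; compare against it afterwards."""
    digest = hashlib.sha256(payload).hexdigest()
    cache_file = cache_dir / f"{name}.hash"
    if not cache_file.exists():
        cache_file.write_text(digest)  # first call: remember the hash
        return True
    return cache_file.read_text() == digest  # later calls: compare

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = verify_bytes(b"1,2,3", "train_data", cache)
    same = verify_bytes(b"1,2,3", "train_data", cache)
    changed = verify_bytes(b"1,2,4", "train_data", cache)

print(first, same, changed)  # True True False
```

Because the cached hash persists between runs, a stale cache entry will report a mismatch after an intentional change to the data; clear_cache() is the documented way to reset it.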

Examples

>>> data = generate_data()  # placeholder for your own data-producing code
>>> rng_manager.verify(data, "train_data")  # First run: caches
>>> # Next run:
>>> rng_manager.verify(data, "train_data")  # Verifies match

scitex_core.repro.get(verbose=False)[source]

Get or create the global RandomStateManager instance.

Parameters:

verbose (bool, optional) – Whether to print status messages (default: False)

Returns:

Global instance

Return type:

RandomStateManager

Examples

>>> from scitex_core.repro import get
>>> rng_manager = get()
>>> data = rng_manager("data").random(100)

scitex_core.repro.reset(seed=42, verbose=False)[source]

Reset global RandomStateManager with new seed.

Parameters:
  • seed (int) – New seed value

  • verbose (bool, optional) – Whether to print status messages (default: False)

Returns:

New global instance

Return type:

RandomStateManager

Examples

>>> from scitex_core.repro import reset
>>> rng_manager = reset(seed=123)
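
The get() and reset() pair follows the shape of a module-level singleton: one shared instance created on first use and replaced on reset. A minimal sketch of that pattern, with a hypothetical `Manager` class standing in for RandomStateManager, might look like this:

```python
import random

# Hypothetical stand-ins for illustration; not the scitex_core implementation.
class Manager:
    def __init__(self, seed: int = 42):
        self.seed = seed
        self.rng = random.Random(seed)

_instance = None  # module-level slot for the shared instance

def get() -> Manager:
    """Return the shared manager, creating it on first use."""
    global _instance
    if _instance is None:
        _instance = Manager()
    return _instance

def reset(seed: int = 42) -> Manager:
    """Replace the shared manager with one built from `seed`."""
    global _instance
    _instance = Manager(seed)
    return _instance

first = get()
print(get() is first)     # True: repeated calls share one instance
replaced = reset(seed=123)
print(get() is replaced)  # True: get() now returns the new instance
print(get() is first)     # False: the old instance is no longer shared
```

In this pattern, code that stored a reference to the old instance keeps using it after reset(), so calling get() again after a reset is the safer habit.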