scitex_io

scitex-io: Universal scientific data I/O with plugin registry.

Supports 30+ formats out of the box. Register custom handlers via:

from scitex_io import register_saver, register_loader

@register_saver(".myformat")
def save_myformat(obj, path, **kw): ...

@register_loader(".myformat")
def load_myformat(path, **kw): ...

Top-level imports are lazy (PEP 562), so import scitex_io is cheap: public symbols are loaded on first attribute access. See _skills/general/03_interface_01_python-api/04_lazy-imports-and-optional-deps.md.
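PEP 562 lets a module define a module-level __getattr__ that runs only when normal attribute lookup fails, and that hook is what makes deferred loading possible. The sketch below shows the mechanism on a throwaway module object. The name make_lazy_module and the mapping to standard-library targets are hypothetical and do not reflect scitex_io's actual layout. In a real package, the same __getattr__ function would sit at the top level of __init__.py.

```python
import importlib
import types

# Hypothetical table: public attribute name -> (module to import, attribute in it).
LAZY_ATTRS = {
    "dumps": ("json", "dumps"),
    "sqrt": ("math", "sqrt"),
}


def make_lazy_module(name, lazy_attrs):
    """Build a module object whose attributes are resolved on first access."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        # PEP 562: only called when normal attribute lookup on the module fails.
        try:
            module_name, target = lazy_attrs[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_name), target)
        # Store the resolved object so later lookups skip __getattr__ entirely.
        setattr(module, attr, value)
        return value

    module.__getattr__ = __getattr__
    return module


demo = make_lazy_module("demo", LAZY_ATTRS)
print("dumps" in vars(demo))  # False: nothing resolved yet
print(demo.dumps({"a": 1}))   # resolves json.dumps on first use, then calls it
print("dumps" in vars(demo))  # True: cached after first access
```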

scitex_io.register_saver(ext, fn=None, *, builtin=False)

Register a save handler for a file extension.

Can be used as a decorator or called directly:

@register_saver(".json")
def my_json_saver(obj, path, **kwargs): ...

register_saver(".json", my_json_saver)

Parameters:
  • ext (str) – File extension (e.g., ".json" or "json"; the leading dot is optional).

  • fn (Callable, optional) – Handler with signature (obj, path, **kwargs) -> None. If None, a decorator is returned.

  • builtin (bool) – If True, registers as built-in (lower priority). User registrations always override built-ins.
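The registration rules above can be modelled as a minimal, self-contained sketch. It covers the decorator and direct-call forms, the optional leading dot, and a separate built-in tier that user registrations override. The dictionaries _BUILTIN_SAVERS and _USER_SAVERS and the helper _normalize are hypothetical, and scitex_io's internal storage may differ.

```python
from typing import Callable, Dict, Optional

# Hypothetical two-tier storage: built-in handlers and user overrides.
_BUILTIN_SAVERS: Dict[str, Callable] = {}
_USER_SAVERS: Dict[str, Callable] = {}


def _normalize(ext: str) -> str:
    # "json" and ".json" both map to ".json".
    return ext if ext.startswith(".") else "." + ext


def register_saver(ext: str, fn: Optional[Callable] = None, *, builtin: bool = False):
    table = _BUILTIN_SAVERS if builtin else _USER_SAVERS
    key = _normalize(ext)

    def decorator(func: Callable) -> Callable:
        table[key] = func
        return func  # hand the function back unchanged so it stays callable

    if fn is not None:
        return decorator(fn)  # direct call: register_saver(".json", fn)
    return decorator  # decorator form: @register_saver(".json")


def get_saver(ext: str) -> Optional[Callable]:
    key = _normalize(ext)
    # A user registration shadows a built-in one for the same extension.
    if key in _USER_SAVERS:
        return _USER_SAVERS[key]
    return _BUILTIN_SAVERS.get(key)


def _builtin_txt(obj, path, **kwargs):
    pass  # placeholder built-in handler


register_saver("txt", _builtin_txt, builtin=True)


@register_saver(".txt")
def save_txt(obj, path, **kwargs):
    with open(path, "w") as f:
        f.write(str(obj))


print(get_saver("txt") is save_txt)  # True: the user handler wins
```

Returning the function from the decorator keeps save_txt usable as an ordinary function after registration.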

scitex_io.register_loader(ext, fn=None, *, builtin=False)

Register a load handler for a file extension.

Same API as register_saver(): it can be used as a decorator or called directly.

Parameters:
  • ext (str) – File extension (e.g., ".json" or "json"; the leading dot is optional).

  • fn (Callable, optional) – Handler with signature (path, **kwargs) -> Any. If None, a decorator is returned.

  • builtin (bool) – If True, registers as built-in (lower priority). User registrations always override built-ins.

scitex_io.get_saver(ext)

Look up the save handler registered for ext. User registrations take priority over built-ins. Returns None if no handler is registered.

Return type:

Optional[Callable]

scitex_io.get_loader(ext)

Look up the load handler registered for ext. User registrations take priority over built-ins. Returns None if no handler is registered.

Return type:

Optional[Callable]

scitex_io.list_formats()

List all registered formats.

Returns:

{"save": {"builtin": […], "user": […]}, "load": {"builtin": […], "user": […]}}

Return type:

dict

scitex_io.unregister_saver(ext)

Remove a user-registered saver. Returns True if found.

Return type:

bool

scitex_io.unregister_loader(ext)

Remove a user-registered loader. Returns True if found.

Return type:

bool

scitex_io.load(lpath, ext=None, show=False, verbose=False, cache=True, **kwargs)

Load data from various file formats.

The handler is chosen from the file extension (or from ext, if given), and repeated loads are cached by default.

Parameters:
  • lpath (Union[str, Path]) – The path to the file to be loaded. Can be a string or pathlib.Path object.

  • ext (str, optional) – File extension to use for loading. If None, automatically detects from filename. Useful for files without extensions (e.g., UUID-named files). Examples: ‘pdf’, ‘json’, ‘csv’

  • show (bool, optional) – If True, display additional information during loading. Default is False.

  • verbose (bool, optional) – If True, print verbose output during loading. Default is False.

  • cache (bool, optional) – If True, enable caching for faster repeated loads. Default is True.

  • **kwargs (dict) – Additional keyword arguments to be passed to the specific loading function.

Returns:

The loaded data object, which can be of various types depending on the input file format.

Return type:

object

Raises:
  • ValueError – If the file extension is not supported.

  • FileNotFoundError – If the specified file does not exist.

Supported extensions:
  • Data formats – .csv, .tsv, .xls, .xlsx, .xlsm, .xlsb, .json, .yaml, .yml

  • Scientific – .npy, .npz, .mat, .hdf5, .con

  • ML/DL – .pth, .pt, .cbm, .joblib, .pkl

  • Documents – .txt, .log, .event, .md, .docx, .pdf, .xml

  • Images – .jpg, .png, .tiff, .tif

  • EEG data – .vhdr, .vmrk, .edf, .bdf, .gdf, .cnt, .egi, .eeg, .set

  • Database – .db

Examples

>>> data = load('data.csv')
>>> image = load('image.png')
>>> model = load('model.pth')
>>> # Load file without extension (e.g., UUID PDF)
>>> pdf = load('f2694ccb-1b6f-4994-add8-5111fd4d52f1', ext='pdf')
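The dispatch rule in load() can be illustrated with a small standalone function: use ext when it is given, and otherwise use the file's suffix. _LOAD_HANDLERS, load_sketch, and the two handlers are hypothetical stand-ins. The sketch omits the real function's caching, show, and verbose behaviour.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Union

# Hypothetical handler table keyed by extension (with leading dot).
_LOAD_HANDLERS: Dict[str, Callable[..., Any]] = {
    ".json": lambda path, **kw: json.loads(Path(path).read_text(), **kw),
    ".txt": lambda path, **kw: Path(path).read_text(),
}


def load_sketch(lpath: Union[str, Path], ext: Optional[str] = None, **kwargs: Any) -> Any:
    path = Path(lpath)
    if not path.exists():
        raise FileNotFoundError(f"No such file: {path}")
    if ext is not None:
        # An explicit ext wins, so extension-less names (e.g. UUIDs) still dispatch.
        key = ext if ext.startswith(".") else "." + ext
    else:
        key = path.suffix  # ".json" for "data.json", "" when there is no suffix
    loader = _LOAD_HANDLERS.get(key)
    if loader is None:
        raise ValueError(f"Unsupported file extension: {key!r}")
    # Remaining keyword arguments are forwarded to the format-specific loader.
    return loader(path, **kwargs)
```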

scitex_io.load_configs(IS_DEBUG=None, show=False, verbose=False, config_dir=None)

Load YAML configuration files from the specified directory.

Parameters:
  • IS_DEBUG (bool, optional) – Debug-mode flag. If None, it is read from IS_DEBUG.yaml.

  • show (bool) – Show configuration changes.

  • verbose (bool) – Print detailed information.

  • config_dir (Union[str, Path], optional) – Directory containing configuration files. Can be a string or pathlib.Path object. Defaults to "./config" if None.

Returns:

Merged configuration dictionary

Return type:

DotDict

scitex_io.glob(expression, parse=False, ensure_one=False)

Perform a glob operation with natural sorting and extended pattern support.

This function extends the standard glob functionality with natural sorting and curly-brace expansion in the glob pattern.

Parameters:
  • expression (Union[str, Path]) – The glob pattern to match against file paths. Can be a string or pathlib.Path object. Supports standard glob syntax and curly-brace expansion (e.g., 'dir/{a,b}/*.txt').

  • parse (bool, optional) – Whether to extract {placeholder} values from the matched paths (see the examples). Default is False.

  • ensure_one (bool, optional) – Ensure exactly one match is found (an AssertionError is raised otherwise). Default is False.

Returns:

If parse=False, a naturally sorted list of file paths. If parse=True, a tuple (paths, parsed_results).

Return type:

Union[List[str], Tuple[List[str], List[dict]]]

Examples:

>>> glob('data/*.txt')
['data/file1.txt', 'data/file2.txt', 'data/file10.txt']
>>> glob('data/{a,b}/*.txt')
['data/a/file1.txt', 'data/a/file2.txt', 'data/b/file1.txt']
>>> paths, parsed = glob('data/subj_{id}/run_{run}.txt', parse=True)
>>> paths
['data/subj_001/run_01.txt', 'data/subj_001/run_02.txt']
>>> parsed
[{'id': '001', 'run': '01'}, {'id': '001', 'run': '02'}]
>>> paths, parsed = glob('data/subj_{id}/run_{run}.txt', parse=True, ensure_one=True)
AssertionError  # if more than one file matches
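Natural sorting and brace expansion can both be built on the standard library. The sketch below approximates glob() with hypothetical helpers. natural_key splits a name into digit and non-digit runs so that "file2" sorts before "file10". expand_braces rewrites each comma-separated {a,b} group into separate patterns before they are passed to glob.glob. Only non-nested braces are handled, and groups without a comma (such as {id}) are left untouched.

```python
import glob as _glob
import itertools
import re
from typing import List


def natural_key(text: str) -> list:
    # re.split with a capture group alternates text/digits, so odd indices are numbers.
    return [int(tok) if i % 2 else tok for i, tok in enumerate(re.split(r"(\d+)", text))]


def expand_braces(pattern: str) -> List[str]:
    # "dir/{a,b}/*.txt" -> ["dir/", "a,b", "/*.txt"]; odd parts are alternatives.
    parts = re.split(r"\{([^{}]*,[^{}]*)\}", pattern)
    choices = [part.split(",") if i % 2 else [part] for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


def natural_glob(pattern: str) -> List[str]:
    matches = set()  # a set avoids duplicates when expansions overlap
    for expanded in expand_braces(pattern):
        matches.update(_glob.glob(expanded))
    return sorted(matches, key=natural_key)
```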

scitex_io.parse_glob(expression, ensure_one=False)

Convenience function for glob with parsing enabled, i.e. glob(expression, parse=True, ensure_one=ensure_one).

Parameters:
  • expression (Union[str, Path]) – The glob pattern to match against file paths. Can be a string or pathlib.Path object.

  • ensure_one (bool, optional) – Ensure exactly one match is found. Default is False.

Returns:

Matched paths and parsed results.

Return type:

Tuple[List[str], List[dict]]

Examples:

>>> paths, parsed = parse_glob('data/subj_{id}/run_{run}.txt')
>>> paths
['data/subj_001/run_01.txt', 'data/subj_001/run_02.txt']
>>> parsed
[{'id': '001', 'run': '01'}, {'id': '001', 'run': '02'}]
>>> paths, parsed = parse_glob('data/subj_{id}/run_{run}.txt', ensure_one=True)
AssertionError  # if more than one file matches
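Placeholder parsing can be sketched by deriving two patterns from one expression. In the glob pattern each {name} becomes *, and in the regular expression it becomes a named group. parse_glob_sketch is hypothetical and makes three assumptions: values contain no "/", each placeholder name appears once, and the expression has no other wildcards. It mirrors the documented ensure_one behaviour with an assertion, but uses plain rather than natural sorting.

```python
import glob as _glob
import re
from typing import Dict, List, Tuple

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def parse_glob_sketch(expression: str, ensure_one: bool = False) -> Tuple[List[str], List[Dict[str, str]]]:
    glob_pattern = PLACEHOLDER.sub("*", expression)
    # Escape the literal text and turn each {name} into a named capture group.
    regex_parts, last = [], 0
    for m in PLACEHOLDER.finditer(expression):
        regex_parts.append(re.escape(expression[last:m.start()]))
        regex_parts.append(f"(?P<{m.group(1)}>[^/]+)")
        last = m.end()
    regex_parts.append(re.escape(expression[last:]))
    regex = re.compile("".join(regex_parts))

    paths, parsed = [], []
    for path in sorted(_glob.glob(glob_pattern)):
        match = regex.fullmatch(path)
        if match:
            paths.append(path)
            parsed.append(match.groupdict())
    if ensure_one:
        assert len(paths) == 1, f"expected exactly one match, found {len(paths)}"
    return paths, parsed
```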

scitex_io.reload(module_or_func, verbose=False)

Reload a module, or the module containing a given function.

If a module is passed, it is reloaded directly; if a function is passed, the module that defines it is reloaded. This is useful during development to pick up code changes without restarting the Python interpreter.

Parameters:
  • module_or_func (module or function) – The module to reload, or a function whose containing module should be reloaded.

  • verbose (bool, optional) – If True, print additional information during the reload process. Default is False.

Returns:

None

Raises:
  • Exception – If the module cannot be found or if an error occurs during the reload.

Notes:

  • Reloading modules can have unexpected side effects, especially for modules that maintain state or have complex imports. Use with caution.

  • This function modifies sys.modules, which affects the global state of the Python interpreter.

Examples:

>>> import my_module
>>> reload(my_module)
>>> from my_module import my_function
>>> reload(my_function)
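The behaviour described above maps closely onto importlib.reload from the standard library. In this minimal sketch, reload_sketch is a hypothetical name. It resolves a function to its defining module with inspect.getmodule and reloads that module. Unlike the documented function, it returns the reloaded module for convenience. As the notes warn, names bound earlier with "from module import name" still point at the old objects and must be re-imported.

```python
import importlib
import inspect
import types
from typing import Callable, Union


def reload_sketch(module_or_func: Union[types.ModuleType, Callable], verbose: bool = False) -> types.ModuleType:
    if inspect.ismodule(module_or_func):
        module = module_or_func
    else:
        # Map the function back to its defining module (via __module__ / sys.modules).
        module = inspect.getmodule(module_or_func)
        if module is None:
            raise ValueError(f"cannot find the module of {module_or_func!r}")
    if verbose:
        print(f"Reloading {module.__name__}")
    # Re-executes the module's source inside the existing module object.
    return importlib.reload(module)
```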

scitex_io.flush(sys=<module 'sys' (built-in)>)

Flushes the system’s stdout and stderr, and syncs the file system. This ensures all pending write operations are completed.
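A standard-library sketch of the same steps, under the hypothetical name flush_sketch, first flushes the two Python-level streams and then calls os.sync where it exists. os.sync is available on Unix only, so the sketch skips it elsewhere. Whether scitex_io guards the call the same way is an assumption.

```python
import os
import sys


def flush_sketch() -> None:
    # Push Python's buffered text to the operating system.
    sys.stdout.flush()
    sys.stderr.flush()
    # Ask the OS to commit pending writes to disk (os.sync exists on Unix only).
    if hasattr(os, "sync"):
        os.sync()
```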

scitex_io.cache(id, *args)

Store or fetch data using a pickle file.

This function provides a simple caching mechanism for Python objects. The named variables are serialized with pickle into a file keyed by a unique identifier; if they are already cached, they can be retrieved without recomputation.

Parameters:
  • id (str) – A unique identifier for the cache file.

  • *args (str) – Names of the variables to cache or load.

Returns:

A tuple of cached values corresponding to the input variable names.

Return type:

tuple

Raises:
  • ValueError – If the cache file is not found and not all variables are defined.

Example:

>>> import scitex_io
>>> import numpy as np
>>>
>>> # Variables to cache
>>> var1 = "x"
>>> var2 = 1
>>> var3 = np.ones(10)
>>>
>>> # Saving: all variables are defined, so they are written to the cache
>>> var1, var2, var3 = scitex_io.cache("my_id", "var1", "var2", "var3")
>>> print(var1, var2, var3)
>>>
>>> # Loading: the variables are undefined but the id exists, so they are read back
>>> del var1, var2, var3
>>> var1, var2, var3 = scitex_io.cache("my_id", "var1", "var2", "var3")
>>> print(var1, var2, var3)
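The example works because cache() looks up variables in the caller's scope by name. The simplified sketch below assumes two things: that the caller's frame is inspected, and that values are pickled to <cache_dir>/<id>.pkl. The function name cache_sketch, the cache_dir parameter, and the file layout are hypothetical.

```python
import inspect
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional, Tuple


def cache_sketch(cache_id: str, *names: str, cache_dir: Optional[str] = None) -> Tuple[Any, ...]:
    path = Path(cache_dir or tempfile.gettempdir()) / f"{cache_id}.pkl"
    frame = inspect.currentframe()
    try:
        caller = frame.f_back if frame is not None else None
        # Snapshot the caller's local variables (module globals at top level).
        caller_vars = dict(caller.f_locals) if caller is not None else {}
    finally:
        del frame  # break the reference cycle created by holding a frame
    if all(name in caller_vars for name in names):
        # Every variable exists in the caller: store them under their names.
        values = tuple(caller_vars[name] for name in names)
        with path.open("wb") as f:
            pickle.dump(dict(zip(names, values)), f)
        return values
    if path.exists():
        # Some variables are missing: restore all of them from the pickle file.
        with path.open("rb") as f:
            stored = pickle.load(f)
        return tuple(stored[name] for name in names)
    raise ValueError(f"cache file {path} not found and not all variables are defined")
```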

scitex_io.configure_cache(enabled=None, max_size=None, verbose=None)

Configure cache settings.

Parameters:
  • enabled (Optional[bool]) – Enable or disable caching

  • max_size (Optional[int]) – Maximum number of files to cache

  • verbose (Optional[bool]) – Enable verbose logging

Return type:

None

scitex_io.get_cache_info()

Get cache statistics and configuration.

Returns:

Cache information including stats and config

Return type:

Dict[str, Any]

scitex_io.clear_load_cache()

Clear all cached data.

Return type:

None
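configure_cache(max_size=...) limits how many files are cached, and get_cache_info() reports statistics. One plausible design is an in-memory LRU cache keyed by absolute path and modification time, which the sketch below implements under that assumption. LoadCache and its method and dictionary key names are hypothetical and need not match what get_cache_info() actually returns.

```python
import os
from collections import OrderedDict
from typing import Any, Callable, Dict, Tuple


class LoadCache:
    """Hypothetical LRU cache for file loads, keyed by (absolute path, mtime)."""

    def __init__(self, max_size: int = 32, enabled: bool = True) -> None:
        self.max_size = max_size
        self.enabled = enabled
        self._entries: "OrderedDict[Tuple[str, float], Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def load(self, path: str, loader: Callable[[str], Any]) -> Any:
        if not self.enabled:
            return loader(path)
        # Including mtime means an edited file produces a new key (a fresh load).
        key = (os.path.abspath(path), os.path.getmtime(path))
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = loader(path)
        self._entries[key] = value
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def info(self) -> Dict[str, Any]:
        return {"size": len(self._entries), "max_size": self.max_size,
                "hits": self.hits, "misses": self.misses, "enabled": self.enabled}

    def clear(self) -> None:
        self._entries.clear()
```

In this design, entries for an older mtime stay in the cache until LRU eviction removes them.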

class scitex_io.DotDict(dictionary=None)

Bases: object

A dictionary-like object that allows attribute-style access (for keys that are valid identifiers) and standard item access for all keys (including integers).

__init__(dictionary=None)
get(key, default=None)
to_dict(include_private=False)

Recursively convert to plain dict.

keys()
values()
items()
update(dictionary)
setdefault(key, default=None)
pop(key, *args)
copy()
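Attribute-style access depends mainly on a __getattr__ that falls back to the underlying dictionary. The stripped-down sketch below assumes the wrapped dict is stored under a private attribute. MiniDotDict is a hypothetical name, and the real DotDict also provides the other methods listed above, such as setdefault, pop, and copy.

```python
from typing import Any, Dict, Optional


class MiniDotDict:
    """Hypothetical minimal dict wrapper with attribute-style access."""

    def __init__(self, dictionary: Optional[Dict[Any, Any]] = None) -> None:
        # Bypass our own __setattr__ so the storage is not treated as a key.
        object.__setattr__(self, "_data", {})
        for key, value in (dictionary or {}).items():
            self[key] = value

    def __setitem__(self, key: Any, value: Any) -> None:
        # Wrap nested dicts so chained access like cfg.model.lr works.
        self._data[key] = MiniDotDict(value) if isinstance(value, dict) else value

    def __getitem__(self, key: Any) -> Any:
        return self._data[key]

    def __getattr__(self, name: str) -> Any:
        # Runs only when normal lookup fails; read _data via __dict__ to avoid recursion.
        data = self.__dict__.get("_data", {})
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        self[name] = value

    def get(self, key: Any, default: Any = None) -> Any:
        return self._data.get(key, default)

    def to_dict(self) -> Dict[Any, Any]:
        # Recursively unwrap nested MiniDotDict values into plain dicts.
        return {k: v.to_dict() if isinstance(v, MiniDotDict) else v for k, v in self._data.items()}
```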

scitex_io.save_image(obj, spath, **kwargs)

scitex_io.save_text(obj, spath)

Save text content to a file.

Parameters:
  • obj (str) – The text content to save.

  • spath (str) – Path where the text file will be saved.

Return type:

None

scitex_io.save_mp4(fig, spath_mp4)

scitex_io.save_listed_dfs_as_csv(listed_dfs, spath_csv, indi_suffix=None, overwrite=False, verbose=False)

Save a list of DataFrames to a single CSV file, stacked vertically.

Parameters:
  • listed_dfs – [df1, df2, df3, …, dfN]. They are written vertically in list order.

  • spath_csv – Output path, e.g., /hoge/fuga/foo.csv.

  • indi_suffix – Label written to the top-left cell of each DataFrame's block: '{}'.format(indi_suffix[i]), where i is the index of the DataFrame. If indi_suffix is None, only '{}'.format(i) is written.

scitex_io.save_listed_scalars_as_csv(listed_scalars, spath_csv, column_name='_', indi_suffix=None, round=3, overwrite=False, verbose=False)

Put the scalars into a DataFrame and save it as CSV.

scitex_io.save_optuna_study_as_csv_and_pngs(study, sdir)
scitex_io.json2md(obj, level=1)
scitex_io.embed_metadata(image_path, metadata)

Embed metadata into an existing image or PDF file.

Parameters:
  • image_path (str) – Path to the image or PDF file (PNG, JPEG, SVG, or PDF).

  • metadata (Dict[str, Any]) – Dictionary containing metadata (must be JSON-serializable).

Return type:

None

Example

>>> metadata = {
...     'experiment': 'seizure_prediction_001',
...     'session': '2024-11-14',
...     'analysis': 'PAC'
... }
>>> embed_metadata('result.png', metadata)
>>> embed_metadata('result.pdf', metadata)

scitex_io.read_metadata(image_path)

Read metadata from an image or PDF file.

Parameters:

image_path (str) – Path to the file (PNG, JPEG, SVG, or PDF)

Return type:

Optional[Dict[str, Any]]

Returns:

Dictionary containing metadata, or None if no metadata found

Example

>>> metadata = read_metadata('result.png')
>>> print(metadata['experiment'])
seizure_prediction_001
>>> metadata = read_metadata('result.pdf')

scitex_io.has_metadata(image_path)

Check if an image file has embedded metadata.

Parameters:

image_path (str) – Path to the image file

Return type:

bool

Returns:

True if metadata exists, False otherwise

Example

>>> if has_metadata('result.png'):
...     print(read_metadata('result.png'))

Explorers

scitex_io.H5Explorer

alias of None

scitex_io.ZarrExplorer

alias of None