Metadata-Version: 2.3
Name: spacecadet
Version: 0.3.0
Summary: A parallel execution engine that doesn't know anything about serialization
Author: Akshay Gupta
Author-email: Akshay Gupta <akgcodes@gmail.com>
License: MIT license
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Programming Language :: Python :: Free Threading :: 2 - Beta
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development
Requires-Dist: commitizen ; extra == 'build'
Requires-Dist: uv ; extra == 'build'
Requires-Dist: spacecadet[build] ; extra == 'dev'
Requires-Dist: spacecadet[docs] ; extra == 'dev'
Requires-Dist: spacecadet[lazyscribe] ; extra == 'dev'
Requires-Dist: spacecadet[qa] ; extra == 'dev'
Requires-Dist: spacecadet[tests] ; extra == 'dev'
Requires-Dist: sphinx ; extra == 'docs'
Requires-Dist: sphinx-gallery ; extra == 'docs'
Requires-Dist: sphinx-inline-tabs ; extra == 'docs'
Requires-Dist: lazyscribe>=2.1.0,<3.0.0 ; extra == 'lazyscribe'
Requires-Dist: pre-commit ; extra == 'qa'
Requires-Dist: pre-commit-uv ; extra == 'qa'
Requires-Dist: pyproject-fmt ; extra == 'qa'
Requires-Dist: ruff ; extra == 'qa'
Requires-Dist: ty ; extra == 'qa'
Requires-Dist: pytest ; extra == 'tests'
Requires-Dist: pytest-cov ; extra == 'tests'
Requires-Python: >=3.10.0
Project-URL: Repository, https://codeberg.org/akgcodes/spacecadet/
Provides-Extra: build
Provides-Extra: dev
Provides-Extra: docs
Provides-Extra: lazyscribe
Provides-Extra: qa
Provides-Extra: tests
Description-Content-Type: text/markdown

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
![PyPI](https://img.shields.io/pypi/v/spacecadet)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/spacecadet)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![ty](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ty/main/assets/badge/v0.json)](https://github.com/astral-sh/ty)

# Multi-core parallelism in pure Python

`spacecadet` is a library for parallel code execution in which each task can itself use
multiple cores. Instead of defining custom serialization methods, `spacecadet` uses
`lazyscribe` to track and manage artifacts.

## Multithreaded execution

### Variable substitution via `lazyscribe`

To use this functionality, please install the `lazyscribe` extra:

```bash
uv pip install "spacecadet[lazyscribe]"
```

Suppose you are building a model and tracking every experiment with `lazyscribe`:

```python
from lazyscribe import Project

project = Project("project.json", mode="w")
with project.log("my-experiment") as exp:
    exp.log_artifact(name="features", value=["a", "b", "c"], handler="json")
    ...

project.save()
```

Suppose you have the following function to report the outcome of this experiment:

```python
import logging

LOG = logging.getLogger(__name__)

def model_report(model_name: str, features: list[str]):
    """Log the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )
```

We can use `spacecadet` to directly connect the artifact from our experiment to the
function.

```python
from lazyscribe import Project
from spacecadet.threading import cadet

project = Project("project.json", mode="r")


@cadet(source=project["my-experiment"])
def model_report(model_name: str, features: list[str]):
    """Log the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )
```

This decorator converts `model_report` so that calling it returns a thread, a lightly customized
extension of `threading.Thread`, instead of running the function immediately. Let's execute it:

```python
thread = model_report("My model name", "features")
thread.start()
thread.join()
```

The log output shows that the literal string `"features"` has been replaced with the value
`["a", "b", "c"]` from the _artifact_ named `"features"` in our experiment. If you're
shipping _application_-side code that must work with any source `lazyscribe` experiment
or repository, you can also define the source later:

```python
# Application-side code

@cadet
def model_report(model_name: str, features: list[str]):
    """Log the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

# User-side code
thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
```
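
To make the substitution and the deferred `options` call concrete, here is a minimal sketch using only the standard library. It is not `spacecadet`'s implementation: `SubstitutingThread` and `cadet_sketch` are hypothetical names, and the sketch assumes the source behaves like a mapping from artifact names to values, so a plain `dict` stands in for a `lazyscribe` experiment. Arguments are resolved inside `run`, so the source only has to be bound before `start()` is called.

```python
import threading
from collections.abc import Callable, Mapping
from typing import Any


class SubstitutingThread(threading.Thread):
    """Hypothetical sketch: swap string arguments for artifact values at run time."""

    def __init__(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        super().__init__()
        self._fn = fn
        self._call_args = args
        self._call_kwargs = kwargs
        self._artifacts: Mapping[str, Any] | None = None  # bound later via options()
        self.result: Any = None

    def options(self, source: Mapping[str, Any]) -> None:
        # Bind the artifact source after construction but before start().
        self._artifacts = source

    def _resolve(self, value: Any) -> Any:
        # Replace a string argument only if it names an artifact in the source.
        if self._artifacts is not None and isinstance(value, str) and value in self._artifacts:
            return self._artifacts[value]
        return value

    def run(self) -> None:
        args = [self._resolve(a) for a in self._call_args]
        kwargs = {k: self._resolve(v) for k, v in self._call_kwargs.items()}
        self.result = self._fn(*args, **kwargs)


def cadet_sketch(fn: Callable[..., Any]) -> Callable[..., SubstitutingThread]:
    """Hypothetical decorator: calling the wrapped function builds a thread instead."""

    def build(*args: Any, **kwargs: Any) -> SubstitutingThread:
        return SubstitutingThread(fn, *args, **kwargs)

    return build


@cadet_sketch
def report(model_name: str, features: list[str]) -> str:
    return f"{model_name}: {', '.join(features)}"


thread = report("My model name", "features")
thread.options(source={"features": ["a", "b", "c"]})  # dict stands in for an experiment
thread.start()
thread.join()
print(thread.result)  # My model name: a, b, c
```

Only strings that match an artifact name are replaced, so `"My model name"` passes through unchanged.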

### Managing thread allocation

You can also use `spacecadet` to run functions that are themselves multithreaded. Suppose you have
a 12-core system and each of your functions uses 4 cores. To reduce thread contention, you want at
most 3 of these functions (12 / 4) running at the same time. `spacecadet` handles this with a
semaphore-like object that acquires and releases multiple threads at once.

```python
@cadet(num_threads=4)
def model_report(model_name: str, features: list[str]):
    """Log the features used by the model.

    Parameters
    ----------
    model_name : str
        The name of the model.
    features : list[str]
        A list of features used by the model.
    """
    LOG.info(
        "Model %s uses the following features: %s", model_name, ", ".join(features)
    )

thread = model_report("My model name", "features")
thread.options(source=project["my-experiment"])
thread.start()
thread.join()
```
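
`threading.Semaphore.acquire` takes one permit per call, so the multi-thread acquisition described above needs a small extension. The sketch below is an assumption about how such an object can work, not `spacecadet`'s `ThreadedSemaphore`: `MultiPermitSemaphore` is a hypothetical name, and it uses a `threading.Condition` so that each task takes all of its permits in a single step. Six tasks that each need 4 of 12 permits never run more than 3 at a time.

```python
import threading
import time


class MultiPermitSemaphore:
    """Hypothetical sketch: a counting semaphore that grants several permits at once."""

    def __init__(self, total: int) -> None:
        self._available = total
        self._cond = threading.Condition()

    def acquire(self, n: int) -> None:
        with self._cond:
            # Block until n permits are free, then take them all together.
            self._cond.wait_for(lambda: self._available >= n)
            self._available -= n

    def release(self, n: int) -> None:
        with self._cond:
            self._available += n
            self._cond.notify_all()  # wake waiters so they can re-check the count


pool = MultiPermitSemaphore(12)
lock = threading.Lock()
running = 0
peak = 0


def task(num_threads: int) -> None:
    global running, peak
    pool.acquire(num_threads)
    try:
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for multithreaded work
        with lock:
            running -= 1
    finally:
        pool.release(num_threads)


workers = [threading.Thread(target=task, args=(4,)) for _ in range(6)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(peak)  # never more than 3
```

Taking all permits atomically is the important design choice: if each task instead called `acquire()` four times on a plain `threading.Semaphore`, several tasks could each hold a partial set of permits and wait on one another forever. In this sketch, a request for more permits than the total blocks indefinitely.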

By default, `spacecadet` uses `os.cpu_count` to detect the number of cores on your machine, and
this value is the total number of threads available. When
`spacecadet.threading.CadetThread.start` is called, it "acquires" `num_threads` threads (4 in the
example above) from the available pool. If you don't want to use `os.cpu_count` to determine the
total number of threads, you have a few options:

1. Specify the number of available threads through the environment variable `SPACECADET_MAX_THREADS`:

    ```bash
    export SPACECADET_MAX_THREADS=12
    ```

2. Use a context manager to temporarily set the number of threads:

    ```python
    from spacecadet.semaphore import ThreadedSemaphore

    with ThreadedSemaphore(12):
        ...

    # outside of the context manager, os.cpu_count is used again
    ```

**NOTE**: `spacecadet.semaphore.ThreadedSemaphore` is designed as a [singleton object](https://refactoring.guru/design-patterns/singleton).
No matter how many times you instantiate the class, the parameters from the first instantiation are used
until all references to that instance have been deleted:

```python
>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 12 available threads
```

Deleting every reference to the instance lets you change the total number of available threads:

```python
>>> from spacecadet.semaphore import ThreadedSemaphore
>>> alloc = ThreadedSemaphore(12)
>>> print(alloc)
Allocator: 12 available threads
>>> del alloc
>>> alloc = ThreadedSemaphore(1)
>>> print(alloc)
Allocator: 1 available threads
```
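
One way to get this "first instance wins until every reference is gone" behavior is a singleton that holds only a weak reference to its live instance. The sketch below is a hypothetical illustration, not `ThreadedSemaphore`'s actual code: `SingletonAllocator` is an invented name, and it relies on CPython's reference counting, which frees the instance (and clears the weak reference) as soon as the last strong reference is deleted.

```python
import weakref


class SingletonAllocator:
    """Hypothetical sketch: one shared instance while any strong reference exists."""

    _instance_ref = None  # weakref.ref to the live instance, or None

    def __new__(cls, total: int) -> "SingletonAllocator":
        existing = cls._instance_ref() if cls._instance_ref is not None else None
        if existing is not None:
            # A live instance exists: ignore the new argument and reuse it.
            return existing
        instance = super().__new__(cls)
        instance.total = total
        # Keep only a weak reference so the class does not keep the instance alive.
        cls._instance_ref = weakref.ref(instance)
        return instance

    def __repr__(self) -> str:
        return f"Allocator: {self.total} available threads"


alloc = SingletonAllocator(12)
print(alloc)  # Allocator: 12 available threads
alloc = SingletonAllocator(1)
print(alloc)  # Allocator: 12 available threads
del alloc
alloc = SingletonAllocator(1)
print(alloc)  # Allocator: 1 available threads
```

The sketch deliberately has no `__init__`: Python calls `__init__` on whatever `__new__` returns, so an `__init__` would overwrite `total` on every call. A version shared across threads would also need a lock around `__new__`.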
