Metadata-Version: 2.4
Name: trainlib
Version: 0.3.1
Summary: Minimal framework for ML modeling. Supports advanced dataset operations and streamlined training.
Author-email: Sam Griesemer <git@olog.io>
License-Expression: MIT
Project-URL: Homepage, https://doc.olog.io/trainlib
Project-URL: Documentation, https://doc.olog.io/trainlib
Project-URL: Repository, https://git.olog.io/olog/trainlib
Project-URL: Issues, https://git.olog.io/olog/trainlib/issues
Keywords: machine-learning
Classifier: Programming Language :: Python
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Requires-Python: >=3.13
Description-Content-Type: text/markdown
Requires-Dist: torch
Requires-Dist: colorama>=0.4.6
Requires-Dist: matplotlib>=3.10.8
Requires-Dist: numpy>=2.4.1
Requires-Dist: tensorboard>=2.20.0
Requires-Dist: tqdm>=4.67.1
Requires-Dist: setuptools<=81.0.0
Provides-Extra: dev
Requires-Dist: ipykernel; extra == "dev"
Provides-Extra: doc
Requires-Dist: furo; extra == "doc"
Requires-Dist: myst-parser; extra == "doc"
Requires-Dist: sphinx; extra == "doc"
Requires-Dist: sphinx-togglebutton; extra == "doc"
Requires-Dist: sphinx-autodoc-typehints; extra == "doc"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"

# Overview
Minimal framework for ML modeling, supporting advanced dataset operations and
streamlined training workflows.

# Install
The `trainlib` package can be installed from PyPI:

```sh
pip install trainlib
```

# Development
- Initialize/synchronize the project with `uv sync`, creating a virtual
  environment with base package dependencies.
- If needed, install the development dependencies with `uv sync --extra dev`.

## Testing
- To run the unit tests, make sure to first have the test dependencies
  installed with `uv sync --extra test`, then run `make test`.
- For notebook testing, run `make install-kernel` to make the environment
  available as a Jupyter kernel (to be selected when running notebooks).

## Documentation
- Install the documentation dependencies with `uv sync --extra doc`.
- Run `make docs-build` (optionally preceded by `make docs-clean`), and serve
  locally with `make docs-serve`.

# Development remarks
- Across `Trainer` / `Estimator` / `Dataset`, I've considered a
  `ParamSpec`-based typing scheme to better orchestrate alignment in the
  `Trainer.train()` loop, e.g., so we can statically check whether a dataset
  appears to fulfill the argument requirements of the estimator's
  `loss()` / `metrics()` methods. Something like

  ```py
  from collections.abc import Generator

  from torch import Tensor, nn

  class Estimator[**P](nn.Module):
      def loss(
          self,
          input: Tensor,
          *args: P.args,
          **kwargs: P.kwargs,
      ) -> Generator:
          ...

  class Trainer[**P]:
      def __init__(
          self,
          estimator: Estimator[P],
          ...
      ): ...
  ```

  might be how we begin threading signatures. But ensuring dataset items can
  match `P` is challenging. One option is a "packed" object that encapsulates
  the data flowing through `P`-shaped signatures:

  ```py
  from abc import abstractmethod
  from collections.abc import Callable, Iterator

  class PackedItem[**P]:
      def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
          self._args = args
          self._kwargs = kwargs
  
      def apply[R](self, func: Callable[P, R]) -> R:
          return func(*self._args, **self._kwargs)
  
  
  class BatchedDataset[U, R, I, **P](Dataset):
      @abstractmethod
      def _process_item_data(
          self,
          item_data: I,
          item_index: int,
      ) -> PackedItem[P]:
          ...
  
      def __iter__(self) -> Iterator[PackedItem[P]]:
          ...
  ```
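
  As a sketch of the intent, a `PackedItem` could carry a dataset item's
  payload and apply it to any `P`-compatible callable. The `squared_error`
  function and its `target` parameter below are illustrative assumptions, not
  part of trainlib's API:

  ```python
  from collections.abc import Callable

  class PackedItem[**P]:
      def __init__(self, *args: P.args, **kwargs: P.kwargs) -> None:
          self._args = args
          self._kwargs = kwargs

      def apply[R](self, func: Callable[P, R]) -> R:
          # Unpack the stored payload into the callable's signature.
          return func(*self._args, **self._kwargs)

  # Illustrative loss function; a checker can infer P from the constructor
  # call below and verify it against this signature.
  def squared_error(pred: float, target: float) -> float:
      return (pred - target) ** 2

  item = PackedItem(0.5, target=1.0)
  print(item.apply(squared_error))  # 0.25
  ```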

  Meaningfully shaping those signatures is what remains, and that turns out to
  be infeasible with the flexibility of current type expressions. For
  instance, when trying to appropriately type my base `TupleDataset`:

  ```py
  class SequenceDataset[I, **P](HomogenousDataset[int, I, I, P]):
      ...
  
  class TupleDataset[I](SequenceDataset[tuple[I, ...], "?"]):
      ...
  ```

  Here there is no way to shape a `ParamSpec` that indicates arbitrarily many
  arguments of a fixed type (`I` in this case), which would let me unpack item
  tuples into an appropriate `PackedItem`.

  Until this (among other issues) becomes clearer, I'm settling on a simpler
  `TypedDict`-based type variable. We won't have particularly strong static
  checks for item alignment inside `Trainer`, but this seems about as good as
  it gets with the current typing infrastructure.
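
  As a rough sketch of that direction (the `XYItem` schema, its key names, and
  the estimator classes here are illustrative assumptions, not trainlib's
  actual types):

  ```python
  from typing import TypedDict

  # Illustrative item schema; the "x" / "y" keys are assumptions for the
  # example only.
  class XYItem(TypedDict):
      x: list[float]
      y: float

  # An estimator generic over its item type: a checker can verify that a
  # dataset's items structurally match T, though the per-argument alignment
  # a ParamSpec would give is lost.
  class Estimator[T]:
      def loss(self, item: T) -> float:
          raise NotImplementedError

  class MeanSquareEstimator(Estimator[XYItem]):
      def loss(self, item: XYItem) -> float:
          return sum(v * v for v in item["x"]) / len(item["x"])

  print(MeanSquareEstimator().loss({"x": [1.0, 2.0], "y": 0.0}))  # 2.5
  ```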
