Metadata-Version: 2.4
Name: aind-metadata-extractor
Version: 0.3.2
Summary: Generated from aind-library-template
Author: Allen Institute for Neural Dynamics
License: MIT
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic
Requires-Dist: pydantic-settings
Provides-Extra: dev
Requires-Dist: black; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: interrogate; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: Sphinx; extra == "dev"
Requires-Dist: furo; extra == "dev"
Provides-Extra: smartspim
Requires-Dist: requests; extra == "smartspim"
Provides-Extra: bergamo
Requires-Dist: scanimage-tiff-reader==1.4.1.4; extra == "bergamo"
Requires-Dist: numpy==1.26.4; extra == "bergamo"
Provides-Extra: utils
Requires-Dist: numpy>=1.26.4; extra == "utils"
Requires-Dist: h5py>=3.11.0; extra == "utils"
Requires-Dist: scipy>=1.11.0; extra == "utils"
Requires-Dist: pandas>=2.2.2; extra == "utils"
Provides-Extra: mesoscope
Requires-Dist: aind-metadata-extractor[bergamo]; extra == "mesoscope"
Requires-Dist: aind-metadata-extractor[utils]; extra == "mesoscope"
Requires-Dist: pillow>=10.4.0; extra == "mesoscope"
Requires-Dist: tifffile==2024.2.12; extra == "mesoscope"
Dynamic: license-file

# aind-metadata-extractor

## Install

Install only the dependencies for the specific extractor you plan to run. The available extractors are listed as optional dependencies in the `pyproject.toml` file and as folders under `src/aind_metadata_extractor`.

During installation, pass the extractor as an optional dependency:

```
pip install 'aind-metadata-extractor[<your-extractor>]'
```
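If the package is already installed, the declared extras can also be read from its metadata. The following is a minimal sketch using only the standard library; `list_extras` is a hypothetical helper name, not part of this package. Note that not every extra is an extractor (for example, `dev` and `utils` are support groups).

```python
from importlib.metadata import PackageNotFoundError, metadata


def list_extras(dist_name: str = "aind-metadata-extractor") -> list[str]:
    """Return the extras declared by an installed distribution.

    Returns an empty list if the distribution is not installed.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per extra in the package metadata
    return sorted(meta.get_all("Provides-Extra") or [])


print(list_extras())
```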

## Develop

To build a new extractor, define a new output model in the `models/` folder. Then create a new extractor folder and inherit from `BaseExtractor`. Implement the functions:

- `.run_job()` should store the metadata output object (matching the model) in `self.metadata` and return a dictionary with the `model_dump()` contents
- `._extract()` should perform the actual data loading, metadata-service calls, etc., necessary to build the metadata model and return it

Your extractor comes with an inherited function `.write()`, which writes the metadata to the file `<extractor>.json`.
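The contract described above can be sketched as follows. This is an illustration only: the `BaseExtractor` and `ExampleModel` classes here are stand-ins written so the sketch runs without the package installed, and the `name` attribute and constructor arguments are assumptions. The real package provides its own `BaseExtractor` and uses pydantic models, whose `model_dump()` the dataclass below merely mimics.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ExampleModel:
    """Hypothetical output model (the real ones live in models/)."""

    subject_id: str
    n_files: int

    def model_dump(self) -> dict:
        # Mimics pydantic's BaseModel.model_dump() for this sketch
        return asdict(self)


class BaseExtractor:
    """Stand-in for the package's BaseExtractor (assumed behaviour only)."""

    name = "base"  # hypothetical attribute used to name the output file

    def __init__(self, output_directory: str = "."):
        self.output_directory = Path(output_directory)
        self.metadata = None

    def write(self) -> Path:
        # Write the stored metadata to <extractor>.json
        path = self.output_directory / f"{self.name}.json"
        path.write_text(json.dumps(self.metadata.model_dump(), indent=2))
        return path


class ExampleExtractor(BaseExtractor):
    """Hypothetical new extractor."""

    name = "example"

    def _extract(self) -> ExampleModel:
        # A real extractor would load data files or call the metadata service
        return ExampleModel(subject_id="000000", n_files=3)

    def run_job(self) -> dict:
        # Store the model on the instance, then return its dictionary form
        self.metadata = self._extract()
        return self.metadata.model_dump()
```

Keeping the loading logic in `._extract()` and the bookkeeping in `.run_job()` means `.write()` can rely on `self.metadata` being populated once the job has run.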

### Testing

When testing locally you only need to run your own tests (i.e. `coverage run -m unittest discover -s tests/<new-extractor>`). Do not modify the tests for other extractors in your PRs.

Before opening a PR, modify the file `test_and_lint.yml` and add a new test-group:

```
test-group: ['core', 'smartspim', 'mesoscope', 'utils', '<new-extractor>']
```

Then add the test-group settings below that:

```
    - test-group: '<new-extractor>'
      dependencies: '[dev,<new-extractor>]'
      test-path: 'tests/<new-extractor>'
      test-pattern: 'test_*.py'
```

On GitHub, each test group runs independently with its own dependencies, and the coverage results are then combined in a final step.

## Run

Each extractor uses a `JobSettings` object to collect the information about data and metadata files needed to create an `Extractor`, which is run by calling `.run_job()`. For example, for *smartspim*:

```{python}
from pathlib import Path

from aind_metadata_extractor.smartspim.job_settings import JobSettings
from aind_metadata_extractor.smartspim.extractor import SmartspimExtractor

DATA_DIR = Path("<path-to-your-data>")

job_settings = JobSettings(
    subject_id="786846",
    metadata_service_path="http://aind-metadata-service/slims/smartspim_imaging",
    # Join with "/" because Path does not support "+" with a string
    input_source=str(DATA_DIR / "SmartSPIM_786846_2025-04-22_16-44-50"),
    output_directory=".",
    slims_datetime="2025-04-22T18:30:08.915000Z",
)
extractor = SmartspimExtractor(job_settings=job_settings)
extractor.run_job()
extractor.write()
```

The results will be saved in `smartspim.json`.
