Metadata-Version: 2.4
Name: nmdc-metadata-suggestor-ai-tool
Version: 1.0.1
Summary: NMDC Submission portal metadata suggestor tool, powered by AI
Project-URL: Homepage, https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool
Project-URL: Repository, https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool
Project-URL: Issues, https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool/issues
License: NMDC Server Copyright (c) 2023, The Regents of the University of California,
        through  Lawrence Berkeley National Laboratory (subject to receipt of any
        required  approvals from the U.S. Dept. of Energy).  All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        (1) Redistributions of source code must retain the above copyright notice,
        this list of conditions and the following disclaimer.
        
        (2) Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in the
        documentation and/or other materials provided with the distribution.
        
        (3) Neither the name of the University of California, Lawrence Berkeley
        National Laboratory, U.S. Dept. of Energy nor the names of its contributors
        may be used to endorse or promote products derived from this software
        without specific prior written permission.
        
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
        ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
        LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
        CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
        SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
        INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
        CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
        ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
        POSSIBILITY OF SUCH DAMAGE.
        
        You are under no obligation whatsoever to provide any bug fixes, patches,
        or upgrades to the features, functionality or performance of the source
        code ("Enhancements") to anyone; however, if you choose to make your
        Enhancements available either publicly, or directly to Lawrence Berkeley
        National Laboratory, without imposing a separate written license agreement
        for such Enhancements, then you hereby grant the following license: a
        non-exclusive, royalty-free perpetual license to install, use, modify,
        prepare derivative works, incorporate into other computer software,
        distribute, and sublicense such enhancements or derivative works thereof,
        in binary and source code form.
License-File: LICENSE
Keywords: bioinformatics,doi,llm,metadata,nmdc
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.12
Requires-Dist: curl-cffi>=0.13.0
Requires-Dist: google-auth>=2.0.0
Requires-Dist: google-genai>=1.62.0
Requires-Dist: linkml-runtime>=1.7.0
Requires-Dist: nmdc-submission-schema>=11.0.0
Requires-Dist: openai>=2.21.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.31.0
Provides-Extra: dev
Requires-Dist: black>=24.0.0; extra == 'dev'
Requires-Dist: mypy>=1.8.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: responses>=0.25.0; extra == 'dev'
Requires-Dist: ruff>=0.3.0; extra == 'dev'
Requires-Dist: types-requests>=2.31.0; extra == 'dev'
Description-Content-Type: text/markdown

# nmdc-metadata-suggestor-ai-tool

A Python application that provides the AI-powered metadata suggestor tool for the NMDC Submission Portal. The project uses [uv](https://github.com/astral-sh/uv) for dependency management and Docker for containerization.

## Prerequisites

- Python 3.12 or higher
- [uv](https://github.com/astral-sh/uv) (or use Docker)
- Docker and Docker Compose (for containerized development)

## Quick Start

### Option 1: Using uv (Local Development)

1. **Install uv** (if not already installed):
   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   # or
   pip install uv
   ```

2. **Clone and set up**:
   ```bash
   git clone https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool.git
   cd nmdc-metadata-suggestor-ai-tool
   ```

3. **Install dependencies**:
   ```bash
   uv sync
   ```

4. **Configure environment**:
   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   ```

5. **Use the package in Python**:
   ```bash
   uv run python
   ```

   ```python
   from nmdc_metadata_suggestor_ai_tool.llm_client import LLMClient
   from nmdc_metadata_suggestor_ai_tool.recommendation_pipeline import run_recommendation_pipeline

   submission_object = {
       # NMDC submission JSON payload
   }

   client = LLMClient(access_provider="gcp")
   result = run_recommendation_pipeline(submission_object, client)
   print(result.model_dump())
   ```
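
The `submission_object` placeholder in step 5 stands for an NMDC submission JSON payload. As a minimal sketch, assuming the payload has been saved to a local JSON file, it could be loaded as shown here. The `load_submission` helper and the `submission.json` file name are hypothetical and are not part of the package.

```python
import json
from pathlib import Path
from typing import Any, Union


def load_submission(path: Union[str, Path]) -> dict[str, Any]:
    """Read a submission payload from a JSON file (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as handle:
        payload = json.load(handle)
    # The pipeline example in step 5 passes a dict, so reject other JSON types early.
    if not isinstance(payload, dict):
        raise ValueError(
            f"Expected a JSON object in {path}, got {type(payload).__name__}"
        )
    return payload
```

The returned dict can then be used in place of the empty `submission_object` literal, for example `submission_object = load_submission("submission.json")`.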

### Option 2: Using Docker

1. **Clone the repository**:
   ```bash
   git clone https://github.com/microbiomedata/nmdc-metadata-suggestor-ai-tool.git
   cd nmdc-metadata-suggestor-ai-tool
   ```

2. **Configure environment**:
   ```bash
   cp .env.example .env
   # Edit .env and add your API keys
   ```

3. **Run with Docker Compose** (development):
   ```bash
   docker-compose up
   ```

4. **Or build and run the production image**:
   ```bash
   docker build -t nmdc-suggestor .
   docker run --env-file .env nmdc-suggestor
   ```

## Development

### Project Structure

```
nmdc-metadata-suggestor-ai-tool/
├── src/
│   └── nmdc_metadata_suggestor/
│       ├── __init__.py
│       ├── recommendation_pipeline.py       # Pipeline orchestration
│       ├── llm_client.py                    # LLM client for AI interactions
│       ├── cli/
│       │   ├── __init__.py
│       │   └── doi_cli.py                   # DOI operations CLI
│       ├── models/
│       │   ├── __init__.py
│       │   ├── doi.py                       # DOI data models
│       │   └── llm_output.py                # LLM output model
│       └── publication_ingestion/
│           ├── __init__.py
│           ├── download_pdf.py              # PDF retrieval logic
│           └── retreive_pdf_link.py         # PDF link discovery
├── tests/                                    # Test files
├── scripts/                                  # Vertex AI test scripts
├── docs/                                     # Documentation
├── pyproject.toml                            # Project dependencies and metadata
├── Dockerfile                                # Production Docker image
├── Dockerfile.dev                            # Development Docker image
├── docker-compose.yml                        # Docker Compose configuration
├── .env.example                              # Example environment variables
└── README.md                                 # This file
```

### Running Tests

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src/nmdc_metadata_suggestor

# Run specific test file
uv run pytest tests/test_example.py
```

### Code Quality

```bash
# Format code with Black
uv run black src tests

# Lint with Ruff
uv run ruff check src tests

# Type check with MyPy
uv run mypy src
```

### Adding Dependencies

```bash
# Add a production dependency
uv add package-name

# Add a development dependency
uv add --dev package-name

# Sync the environment with pyproject.toml and the lockfile
uv sync
```

## Configuration

Configuration is managed through environment variables or a `.env` file. See `.env.example` for available options:

- `DEFAULT_MODEL`: Default LLM model to use
- `MAX_TOKENS`: Maximum tokens for LLM responses
- `TEMPERATURE`: Temperature for LLM responses (0.0-1.0)
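
As a rough illustration of how these three variables might be read and validated, here is a minimal sketch. It is not the package's actual configuration code: the `LLMSettings` class, the `read_llm_settings` function, and all fallback values are assumptions made for the example. Calling `load_dotenv()` from python-dotenv beforehand would copy the values from `.env` into `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for the options listed above."""

    default_model: str
    max_tokens: int
    temperature: float


def read_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Build settings from environment-style variables.

    The fallback values are placeholders for illustration only.
    """
    max_tokens = int(env.get("MAX_TOKENS", "1024"))
    if max_tokens <= 0:
        raise ValueError("MAX_TOKENS must be a positive integer")

    temperature = float(env.get("TEMPERATURE", "0.0"))
    # The documented range for TEMPERATURE is 0.0-1.0.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("TEMPERATURE must be between 0.0 and 1.0")

    return LLMSettings(
        default_model=env.get("DEFAULT_MODEL", "example-model"),
        max_tokens=max_tokens,
        temperature=temperature,
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests without touching the real environment.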

## Docker Development Workflow

### Interactive Development

For interactive development with hot-reload:

```bash
# Start container in background
docker-compose up -d

# Execute commands in the container
docker-compose exec app uv run pytest
docker-compose exec app uv run black src

# Access shell
docker-compose exec app bash

# Stop container
docker-compose down
```

### Production Build

```bash
# Build production image
docker build -t nmdc-suggestor:latest .

# Run production container
docker run --env-file .env nmdc-suggestor:latest
```

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and quality checks
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

See [LICENSE](LICENSE) for licensing terms.


