Metadata-Version: 2.4
Name: nucleotide-archive-mcp
Version: 0.0.3
Summary: MCP server for searching European Nucleotide Archive (ENA) datasets. Find RNA-seq studies, retrieve metadata, and discover related publications to validate research hypotheses.
Project-URL: Documentation, https://nucleotide_archive_mcp.readthedocs.io/
Project-URL: Homepage, https://github.com/biocontext-ai/nucleotide_archive_mcp
Project-URL: Source, https://github.com/biocontext-ai/nucleotide_archive_mcp
Author: Malte Kuehl
Maintainer-email: Malte Kuehl <malte.kuehl@clin.au.dk>
License: 
        Apache License Version 2.0
        Version 2.0, January 2004
        http://www.apache.org/licenses/
        
        TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
        1. Definitions.
        
            "License" shall mean the terms and conditions for use, reproduction,
            and distribution as defined by Sections 1 through 9 of this document.
        
            "Licensor" shall mean the copyright owner or entity authorized by
            the copyright owner that is granting the License.
        
            "Legal Entity" shall mean the union of the acting entity and all
            other entities that control, are controlled by, or are under common
            control with that entity. For the purposes of this definition,
            "control" means (i) the power, direct or indirect, to cause the
            direction or management of such entity, whether by contract or
            otherwise, or (ii) ownership of fifty percent (50%) or more of the
            outstanding shares, or (iii) beneficial ownership of such entity.
        
            "You" (or "Your") shall mean an individual or Legal Entity
            exercising permissions granted by this License.
        
            "Source" form shall mean the preferred form for making modifications,
            including but not limited to software source code, documentation
            source, and configuration files.
        
            "Object" form shall mean any form resulting from mechanical
            transformation or translation of a Source form, including but
            not limited to compiled object code, generated documentation,
            and conversions to other media types.
        
            "Work" shall mean the work of authorship, whether in Source or
            Object form, made available under the License, as indicated by a
            copyright notice that is included in or attached to the work
            (an example is provided in the Appendix below).
        
            "Derivative Works" shall mean any work, whether in Source or Object
            form, that is based on (or derived from) the Work and for which the
            editorial revisions, annotations, elaborations, or other modifications
            represent, as a whole, an original work of authorship. For the purposes
            of this License, Derivative Works shall not include works that remain
            separable from, or merely link (or bind by name) to the interfaces of,
            the Work and Derivative Works thereof.
        
            "Contribution" shall mean any work of authorship, including
            the original version of the Work and any modifications or additions
            to that Work or Derivative Works thereof, that is intentionally
            submitted to Licensor for inclusion in the Work by the copyright owner
            or by an individual or Legal Entity authorized to submit on behalf of
            the copyright owner. For the purposes of this definition, "submitted"
            means any form of electronic, verbal, or written communication sent
            to the Licensor or its representatives, including but not limited to
            communication on electronic mailing lists, source code control systems,
            and issue tracking systems that are managed by, or on behalf of, the
            Licensor for the purpose of discussing and improving the Work, but
            excluding communication that is conspicuously marked or otherwise
            designated in writing by the copyright owner as "Not a Contribution."
        
            "Contributor" shall mean Licensor and any individual or Legal Entity
            on behalf of whom a Contribution has been received by Licensor and
            subsequently incorporated within the Work.
        
        2. Grant of Copyright License. Subject to the terms and conditions of
           this License, each Contributor hereby grants to You a perpetual,
           worldwide, non-exclusive, no-charge, royalty-free, irrevocable
           copyright license to reproduce, prepare Derivative Works of,
           publicly display, publicly perform, sublicense, and distribute the
           Work and such Derivative Works in Source or Object form.
        
        3. Grant of Patent License. Subject to the terms and conditions of
           this License, each Contributor hereby grants to You a perpetual,
           worldwide, non-exclusive, no-charge, royalty-free, irrevocable
           (except as stated in this section) patent license to make, have made,
           use, offer to sell, sell, import, and otherwise transfer the Work,
           where such license applies only to those patent claims licensable
           by such Contributor that are necessarily infringed by their
           Contribution(s) alone or by combination of their Contribution(s)
           with the Work to which such Contribution(s) was submitted. If You
           institute patent litigation against any entity (including a
           cross-claim or counterclaim in a lawsuit) alleging that the Work
           or a Contribution incorporated within the Work constitutes direct
           or contributory patent infringement, then any patent licenses
           granted to You under this License for that Work shall terminate
           as of the date such litigation is filed.
        
        4. Redistribution. You may reproduce and distribute copies of the
           Work or Derivative Works thereof in any medium, with or without
           modifications, and in Source or Object form, provided that You
           meet the following conditions:
        
            (a) You must give any other recipients of the Work or
            Derivative Works a copy of this License; and
        
            (b) You must cause any modified files to carry prominent notices
            stating that You changed the files; and
        
            (c) You must retain, in the Source form of any Derivative Works
            that You distribute, all copyright, patent, trademark, and
            attribution notices from the Source form of the Work,
            excluding those notices that do not pertain to any part of
            the Derivative Works; and
        
            (d) If the Work includes a "NOTICE" text file as part of its
            distribution, then any Derivative Works that You distribute must
            include a readable copy of the attribution notices contained
            within such NOTICE file, excluding those notices that do not
            pertain to any part of the Derivative Works, in at least one
            of the following places: within a NOTICE text file distributed
            as part of the Derivative Works; within the Source form or
            documentation, if provided along with the Derivative Works; or,
            within a display generated by the Derivative Works, if and
            wherever such third-party notices normally appear. The contents
            of the NOTICE file are for informational purposes only and
            do not modify the License. You may add Your own attribution
            notices within Derivative Works that You distribute, alongside
            or as an addendum to the NOTICE text from the Work, provided
            that such additional attribution notices cannot be construed
            as modifying the License.
        
            You may add Your own copyright statement to Your modifications and
            may provide additional or different license terms and conditions
            for use, reproduction, or distribution of Your modifications, or
            for any such Derivative Works as a whole, provided Your use,
            reproduction, and distribution of the Work otherwise complies with
            the conditions stated in this License.
        
        5. Submission of Contributions. Unless You explicitly state otherwise,
           any Contribution intentionally submitted for inclusion in the Work
           by You to the Licensor shall be under the terms and conditions of
           this License, without any additional terms or conditions.
           Notwithstanding the above, nothing herein shall supersede or modify
           the terms of any separate license agreement you may have executed
           with Licensor regarding such Contributions.
        
        6. Trademarks. This License does not grant permission to use the trade
           names, trademarks, service marks, or product names of the Licensor,
           except as required for reasonable and customary use in describing the
           origin of the Work and reproducing the content of the NOTICE file.
        
        7. Disclaimer of Warranty. Unless required by applicable law or
           agreed to in writing, Licensor provides the Work (and each
           Contributor provides its Contributions) on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
           implied, including, without limitation, any warranties or conditions
           of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
           PARTICULAR PURPOSE. You are solely responsible for determining the
           appropriateness of using or redistributing the Work and assume any
           risks associated with Your exercise of permissions under this License.
        
        8. Limitation of Liability. In no event and under no legal theory,
           whether in tort (including negligence), contract, or otherwise,
           unless required by applicable law (such as deliberate and grossly
           negligent acts) or agreed to in writing, shall any Contributor be
           liable to You for damages, including any direct, indirect, special,
           incidental, or consequential damages of any character arising as a
           result of this License or out of the use or inability to use the
           Work (including but not limited to damages for loss of goodwill,
           work stoppage, computer failure or malfunction, or any and all
           other commercial damages or losses), even if such Contributor
           has been advised of the possibility of such damages.
        
        9. Accepting Warranty or Additional Liability. While redistributing
           the Work or Derivative Works thereof, You may choose to offer,
           and charge a fee for, acceptance of support, warranty, indemnity,
           or other liability obligations and/or rights consistent with this
           License. However, in accepting such obligations, You may act only
           on Your own behalf and on Your sole responsibility, not on behalf
           of any other Contributor, and only if You agree to indemnify,
           defend, and hold each Contributor harmless for any liability
           incurred by, or claims asserted against, such Contributor by reason
           of your accepting any such warranty or additional liability.
        
        END OF TERMS AND CONDITIONS
        
        APPENDIX: How to apply the Apache License to your work.
        
              To apply the Apache License to your work, attach the following
              boilerplate notice, with the fields enclosed by brackets "[]"
              replaced with your own identifying information. (Don't include
              the brackets!)  The text should be enclosed in the appropriate
              comment syntax for the file format. We also recommend that a
              file or class name and description of purpose be included on the
              same "printed page" as the copyright notice for easier
              identification within third-party archives.
        
        Copyright (c) 2025, Malte Kuehl
        
        Licensed under the Apache License, Version 2.0 (the "License");
        you may not use this file except in compliance with the License.
        You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
        Unless required by applicable law or agreed to in writing, software
        distributed under the License is distributed on an "AS IS" BASIS,
        WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
        See the License for the specific language governing permissions and
        limitations under the License.
License-File: LICENSE
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.11
Requires-Dist: click
Requires-Dist: fastmcp
Requires-Dist: httpx==0.28.*
Requires-Dist: pydantic<3,>=2.12
Provides-Extra: dev
Requires-Dist: pre-commit; extra == 'dev'
Requires-Dist: twine>=4.0.2; extra == 'dev'
Provides-Extra: doc
Requires-Dist: docutils!=0.18.*,!=0.19.*,>=0.8; extra == 'doc'
Requires-Dist: ipykernel; extra == 'doc'
Requires-Dist: ipython; extra == 'doc'
Requires-Dist: myst-nb>=1.1; extra == 'doc'
Requires-Dist: pandas; extra == 'doc'
Requires-Dist: setuptools; extra == 'doc'
Requires-Dist: sphinx-autoapi; extra == 'doc'
Requires-Dist: sphinx-book-theme>=1; extra == 'doc'
Requires-Dist: sphinx-copybutton; extra == 'doc'
Requires-Dist: sphinx-tabs; extra == 'doc'
Requires-Dist: sphinx>=8.1; extra == 'doc'
Requires-Dist: sphinxext-opengraph; extra == 'doc'
Provides-Extra: test
Requires-Dist: coverage; extra == 'test'
Requires-Dist: pytest; extra == 'test'
Requires-Dist: pytest-asyncio; extra == 'test'
Description-Content-Type: text/markdown

# RNA Dataset Search - MCP Server

[![BioContextAI - Registry](https://img.shields.io/badge/Registry-package?style=flat&label=BioContextAI&labelColor=%23fff&color=%233555a1&link=https%3A%2F%2Fbiocontext.ai%2Fregistry)](https://biocontext.ai/registry)
[![Tests][badge-tests]][tests]
[![Documentation][badge-docs]][documentation]

[badge-tests]: https://img.shields.io/github/actions/workflow/status/biocontext-ai/nucleotide_archive_mcp/test.yaml?branch=main
[badge-docs]: https://img.shields.io/readthedocs/nucleotide_archive_mcp

A Model Context Protocol (MCP) server for searching and accessing RNA sequencing datasets from the [European Nucleotide Archive (ENA)](https://www.ebi.ac.uk/ena/browser/home). Find publicly available bulk RNA-seq and single-cell RNA-seq datasets to validate research hypotheses or reproduce published analyses.

**Optimized for**: Human and mouse disease-related RNA-seq studies with support for bulk, single-cell, and spatial transcriptomics.

## Features

- **Disease-Focused Search**: Find datasets by disease, organism, and tissue type
- **Advanced Technology Filtering**:
  - Simple presets: bulk, single-cell, small-rna, ribo-seq, rna-all
  - Granular control: Filter by 50+ library strategies (RNA-Seq, miRNA-Seq, ChIP-Seq, ATAC-seq, etc.)
  - Source filtering: TRANSCRIPTOMIC, GENOMIC, METAGENOMIC, etc.
- **Common Organism Names**: Use "human", "mouse", "rat" instead of scientific names
- **Download Support**: Generate wget/curl scripts for downloading FASTQ files
- **Study Metadata**: Retrieve comprehensive metadata including PubMed IDs
- **Publication Links**: Discover datasets associated with PubMed publications
- **Flexible Queries**: Build custom queries with multiple field conditions
- **Field Discovery**: Explore available search and return fields
- **Environment Configuration**: Customize API endpoints, timeouts, and logging via environment variables

## Available Tools

The MCP server provides 10 specialized tools:

### Search & Discovery
1. **search_rna_studies** - Unified search with preset filters or advanced library strategy/source filtering
2. **list_library_types** - List all 50+ available library strategies and sources
3. **get_study_details** - Get comprehensive metadata for a specific study (includes PubMed IDs)
4. **find_studies_by_publication** - Find studies associated with a PubMed ID
5. **search_studies_by_keywords** - Flexible keyword search across study titles

### Download & Access
6. **get_download_urls** - Get FTP download URLs for all data files in a study
7. **generate_download_script** - Generate bash scripts (wget/curl) for downloading data

### Advanced
8. **get_available_fields** - Discover searchable and returnable fields for different data types
9. **get_result_types** - List all available data types in ENA
10. **build_custom_query** - Construct advanced queries with multiple field conditions

## Example Use Cases

### Simple Searches (Preset Filters)
- Find human cancer bulk RNA-seq datasets: `disease="cancer"`
- Search for single-cell RNA-seq in mouse brain: `organism="mouse", tissue="brain", technology="single-cell"`
- Find small RNA sequencing studies: `technology="small-rna"`
- Ribosome profiling experiments: `technology="ribo-seq"`

### Advanced Searches (Specific Library Types)
- ChIP-Seq chromatin studies: `library_strategies=["ChIP-Seq"]`
- ATAC-seq accessibility data: `library_strategies=["ATAC-seq"]`
- Combined small RNA types: `library_strategies=["miRNA-Seq", "ncRNA-Seq"]`
- Any single-cell data: `library_sources=["TRANSCRIPTOMIC SINGLE CELL"]`
- Metatranscriptomic data: `library_sources=["METATRANSCRIPTOMIC"]`
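
The parameter sets above are passed to the server as tool arguments. As a rough illustration, the sketch below builds the JSON-RPC 2.0 `tools/call` message that the Model Context Protocol uses for tool invocation. The helper name `build_tool_call` is hypothetical, and it is an assumption that all of these parameters belong to `search_rna_studies`. In practice an MCP client constructs this message for you.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Preset search: single-cell RNA-seq in mouse brain (parameters taken from the list above).
preset_call = build_tool_call(
    "search_rna_studies",
    {"organism": "mouse", "tissue": "brain", "technology": "single-cell"},
)

# Advanced search: combine two small-RNA library strategies.
# Assumption: `library_strategies` is accepted by the same tool.
advanced_call = build_tool_call(
    "search_rna_studies",
    {"library_strategies": ["miRNA-Seq", "ncRNA-Seq"]},
    request_id=2,
)

print(json.dumps(preset_call, indent=2))
```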

### Workflow Examples
- Download FASTQ files from a specific study
- Discover datasets from a specific publication
- Generate download scripts with MD5 verification
- List all available sequencing technologies: `list_library_types()`
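
For readers who prefer to check downloads outside the generated bash script, MD5 verification can be sketched in Python as follows. This is not the script the server generates. The function names are hypothetical, and the demonstration uses a small temporary stand-in file rather than a real FASTQ download.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a temporary stand-in file, not a real download.
content = b"@read1\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.fastq"
    demo.write_bytes(content)
    expected = hashlib.md5(content).hexdigest()
    download_ok = verify_download(demo, expected)
    mismatch_detected = not verify_download(demo, "0" * 32)

print(download_ok, mismatch_detected)
```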

## Getting started

Please refer to the [documentation][], in particular the [API documentation][].

You can also find the project on [BioContextAI](https://biocontext.ai), the community hub for biomedical MCP servers: [nucleotide_archive_mcp on BioContextAI](https://biocontext.ai/registry/biocontext-ai/nucleotide_archive_mcp).

## Installation

You need to have Python 3.11 or newer installed on your system.
If you don't have Python installed, we recommend installing [uv][].

There are several alternative options to install nucleotide_archive_mcp:

### 1. Use `uvx` to run it immediately
After publication to PyPI:
```bash
uvx nucleotide_archive_mcp
```

Or from a Git repository:

```bash
uvx git+https://github.com/biocontext-ai/nucleotide_archive_mcp.git@main
```

### 2. Include it in a client that supports the `mcp.json` standard

If the MCP server is published to PyPI, use the following configuration:

```json
{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uvx",
      "args": ["nucleotide_archive_mcp"]
    }
  }
}
```
In case the MCP server is not yet published to PyPI, use this configuration:

```json
{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uvx",
      "args": ["git+https://github.com/biocontext-ai/nucleotide_archive_mcp.git@main"]
    }
  }
}
```

For purely local development (e.g., in Cursor or VS Code), use the following configuration:

```json
{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uvx",
      "args": [
        "--refresh",
        "--from",
        "path/to/repository",
        "nucleotide_archive_mcp"
      ]
    }
  }
}
```

If you want to reuse an existing environment for local development, use the following configuration:

```json
{
  "mcpServers": {
    "nucleotide_archive_mcp": {
      "command": "uv",
      "args": ["run", "--directory", "path/to/repository", "nucleotide_archive_mcp"]
    }
  }
}
```
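
If a client already has an `mcp.json` with other servers, the entry can be merged in rather than replacing the whole file. The sketch below shows one way to do that on an in-memory dictionary. The helper `add_server` and the `other_server` entry are hypothetical, and where the file lives depends on the client.

```python
import json
from typing import Any


def add_server(config: dict[str, Any], name: str, command: str, args: list[str]) -> dict[str, Any]:
    """Return a copy of an mcp.json-style config with one server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    return {**config, "mcpServers": servers}


# A hypothetical existing configuration with another server already registered.
existing = {"mcpServers": {"other_server": {"command": "uvx", "args": ["other_server"]}}}

# Add the PyPI-based entry shown earlier in this section.
updated = add_server(existing, "nucleotide_archive_mcp", "uvx", ["nucleotide_archive_mcp"])

print(json.dumps(updated, indent=2))
```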

### 3. Install it through `pip`:

```bash
pip install --user nucleotide_archive_mcp
```

### 4. Install the latest development version:

```bash
pip install git+https://github.com/biocontext-ai/nucleotide_archive_mcp.git@main
```

## Configuration

The server can be configured via environment variables. Copy `.env.example` to `.env` and customize:

```bash
# ENA API Configuration
ENA_PORTAL_API_BASE=https://www.ebi.ac.uk/ena/portal/api  # Override API base URL
ENA_BROWSER_API_BASE=https://www.ebi.ac.uk/ena/browser/api
ENA_TIMEOUT=30.0                # Request timeout in seconds
ENA_SEARCH_LIMIT=20            # Default search result limit
ENA_MAX_RPS=10.0               # Rate limiting (requests per second)

# Logging
LOG_LEVEL=INFO                 # DEBUG, INFO, WARNING, ERROR, CRITICAL
```

These settings allow you to:
- Use custom or mirror ENA API endpoints
- Adjust timeouts for slow connections
- Control default result limits
- Configure rate limiting for large batch operations
- Set logging verbosity for debugging
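
To make the variable types concrete, here is a minimal sketch of how such settings could be read and converted in Python. It is not the server's actual implementation. The `EnaSettings` class and `load_settings` function are hypothetical, and using the example values above as fallbacks is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnaSettings:
    portal_api_base: str
    browser_api_base: str
    timeout: float      # seconds
    search_limit: int   # default number of results
    max_rps: float      # requests per second
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> EnaSettings:
    """Read the documented variables; fallbacks mirror the example values (an assumption)."""
    return EnaSettings(
        portal_api_base=env.get("ENA_PORTAL_API_BASE", "https://www.ebi.ac.uk/ena/portal/api"),
        browser_api_base=env.get("ENA_BROWSER_API_BASE", "https://www.ebi.ac.uk/ena/browser/api"),
        timeout=float(env.get("ENA_TIMEOUT", "30.0")),
        search_limit=int(env.get("ENA_SEARCH_LIMIT", "20")),
        max_rps=float(env.get("ENA_MAX_RPS", "10.0")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


# Override two values explicitly; everything else falls back to the examples.
custom = load_settings({"ENA_TIMEOUT": "60", "LOG_LEVEL": "debug"})
print(custom.timeout, custom.search_limit, custom.log_level)
```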

## Data Citation and Attribution

When using data from ENA in publications, please cite the data appropriately:

### How to Cite ENA Data

The top-level Project accession should be cited along with a link to the data in the ENA browser:

> "The data for this study have been deposited in the European Nucleotide Archive (ENA) at EMBL-EBI under accession number PRJEBxxxx (https://www.ebi.ac.uk/ena/browser/view/PRJEBxxxx)."

Replace `PRJEBxxxx` with the actual study accession number from your search results.
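
When citing several studies, the sentence can be filled in programmatically. The sketch below formats the citation template and browser link for a given accession. The function names are hypothetical, and the accession used is only an example.

```python
ENA_BROWSER_VIEW = "https://www.ebi.ac.uk/ena/browser/view/"


def browser_url(accession: str) -> str:
    """Return the ENA browser URL for an accession."""
    return ENA_BROWSER_VIEW + accession.strip()


def citation_sentence(accession: str) -> str:
    """Fill the citation template from this section with a project accession."""
    accession = accession.strip()
    return (
        "The data for this study have been deposited in the European Nucleotide "
        "Archive (ENA) at EMBL-EBI under accession number "
        f"{accession} ({browser_url(accession)})."
    )


print(citation_sentence("PRJDB2345"))
```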

### Accessing Data in ENA Browser

All accessions can be viewed in the ENA browser:
- Direct URL: `https://www.ebi.ac.uk/ena/browser/view/<accession>`
- Example: https://www.ebi.ac.uk/ena/browser/view/PRJDB2345

### ORCID Data Claiming

ENA studies can be claimed against your ORCID ID through the [EBI Search interface](https://www.ebi.ac.uk/ebisearch/orcidclaimdocumentation.ebi). Search for your projects and click "Claim to ORCID" to link them to your ORCID profile.

## Data Policy and Usage

### ENA/INSDC Data Policy

This tool accesses data from the European Nucleotide Archive (ENA), which is part of the International Nucleotide Sequence Database Collaboration (INSDC) with DDBJ and GenBank.

**Key Points:**
- **Open Access**: All data in ENA/INSDC databases are freely and publicly accessible
- **No Restrictions**: Data have no use restrictions or licensing requirements
- **Redistribution**: Free redistribution and use of data is permitted
- **Permanence**: All submitted records remain permanently accessible
- **Attribution**: Proper citation of original submissions is expected (see above)

### Data Availability

Data in ENA can be:
- **Public**: Freely accessible through this tool and ENA browser
- **Confidential**: Pre-publication data not yet publicly available (not searchable through this tool)

Released data should be cited appropriately in publications and claimed via ORCID where applicable.

### Data Standards

ENA promotes data harmonization through:
- **Sample Checklists**: Minimum information standards for different data types
- **MIxS Standards**: Genomic Standards Consortium (GSC) minimum information standards
- **Community Standards**: Research community-developed reporting standards

For more information, see the [ENA Data Standards](https://ena-docs.readthedocs.io/en/latest/submit/general-guide/metadata.html) documentation.

## Disclaimer

This tool provides access to data from the European Nucleotide Archive (ENA) at EMBL-EBI. Please note:
- **Independent**: Not officially affiliated with or endorsed by ENA, EMBL-EBI, or INSDC
- **Quality**: Data quality and accuracy are the responsibility of the original submitters
- **Updates**: ENA data and APIs may change; this tool is maintained to reflect current ENA services
- **Support**: For issues with ENA data or services, contact [ENA Support](https://www.ebi.ac.uk/ena/browser/support)

The European Nucleotide Archive is developed and maintained at EMBL-EBI under the guidance of the INSDC International Advisory Committee.

## Contact

If you find a bug in this MCP server, please report it via the [issue tracker][].

For questions about ENA data or services, contact [ENA Support](https://www.ebi.ac.uk/ena/browser/support).

## Acknowledgments

This tool accesses data from:
- **European Nucleotide Archive (ENA)** at EMBL-EBI
- **International Nucleotide Sequence Database Collaboration (INSDC)**

Special thanks to the ENA team for maintaining the public API and comprehensive documentation.

[uv]: https://github.com/astral-sh/uv
[issue tracker]: https://github.com/biocontext-ai/nucleotide_archive_mcp/issues
[tests]: https://github.com/biocontext-ai/nucleotide_archive_mcp/actions/workflows/test.yaml
[documentation]: https://nucleotide_archive_mcp.readthedocs.io
[changelog]: https://nucleotide_archive_mcp.readthedocs.io/en/latest/changelog.html
[api documentation]: https://nucleotide_archive_mcp.readthedocs.io/en/latest/api.html
[pypi]: https://pypi.org/project/nucleotide_archive_mcp
