Metadata-Version: 2.4
Name: elixir-training-mcp
Version: 0.0.3
Summary: A Model Context Protocol (MCP) server to access data about training materials.
Project-URL: Homepage, https://github.com/elixir-europe-training/ELIXIR-TrP-KG-training-metadata
Project-URL: Documentation, https://github.com/elixir-europe-training/ELIXIR-TrP-KG-training-metadata
Project-URL: History, https://github.com/elixir-europe-training/ELIXIR-TrP-KG-training-metadata/releases
Project-URL: Tracker, https://github.com/elixir-europe-training/ELIXIR-TrP-KG-training-metadata/issues
Project-URL: Source, https://github.com/elixir-europe-training/ELIXIR-TrP-KG-training-metadata
Author-email: Vincent Emonet <vincent.emonet@gmail.com>
Maintainer-email: Vincent Emonet <vincent.emonet@gmail.com>
License: MIT License
        
        Copyright (c) 2025-present ELIXIR Europe Training
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: Elixir,MCP,Search,Training
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Requires-Dist: extruct>=0.16.0
Requires-Dist: httpx>=0.28.1
Requires-Dist: lxml>=5.0.0
Requires-Dist: mcp>=1.19.0
Requires-Dist: pydantic-settings>=2.11.0
Requires-Dist: pydantic>=2.12.0
Requires-Dist: pyld>=2.0.4
Requires-Dist: rdflib>=7.4.0
Requires-Dist: w3lib>=2.2.0
Description-Content-Type: text/markdown

# Project 18: Mining the potential of knowledge graphs for metadata on training

## Abstract

Knowledge graphs (KGs) can greatly increase the potential of data by revealing hidden relationships and turning data into useful information. A KG is a graph-based representation of data as subject-predicate-object triples, which are typically stored in triplestores. The entities are usually described using pre-defined ontologies, which increase interoperability and connect data that would otherwise remain isolated in siloed databases. This structured representation can greatly facilitate complex querying and the application of deep learning approaches such as generative AI.
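
To make the triple model concrete, here is a minimal in-memory sketch in plain Python. It is not rdflib, which the project itself depends on. The `ex:` resources, the `schema:about` links and the sample data are hypothetical. `schema:` stands for the schema.org vocabulary and `edam:` for the EDAM ontology.

```python
from typing import Iterator, Optional

# A triple is (subject, predicate, object); IRIs are abbreviated as prefixed names.
Triple = tuple[str, str, str]

# Hypothetical example data: two training resources tagged with the same EDAM topic.
TRIPLES: set[Triple] = {
    ("ex:rnaseq-course", "rdf:type", "schema:Course"),
    ("ex:rnaseq-course", "schema:about", "edam:topic_3170"),
    ("ex:rnaseq-course", "schema:provider", "ex:provider-a"),
    ("ex:galaxy-intro", "rdf:type", "schema:LearningResource"),
    ("ex:galaxy-intro", "schema:about", "edam:topic_3170"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard (like a SPARQL variable)."""
    for subj, pred, obj in TRIPLES:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


# Which resources share the topic edam:topic_3170? The link emerges from the shared object.
related = sorted(subj for subj, _, _ in match(p="schema:about", o="edam:topic_3170"))
print(related)  # ['ex:galaxy-intro', 'ex:rnaseq-course']
```

A real triplestore adds indexes and a query language (SPARQL), but the core idea is this pattern matching over triples.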

ELIXIR and its Nodes are making a major effort to make the wealth of open training materials in the computational life sciences reusable, among other things through guidelines and support for annotating training materials with standardized metadata. One major step towards standardized metadata is the Bioschemas training profile, which has become a standard for representing training metadata. Although these resources are standardized and interoperable, there is still a lot of untapped potential to turn them into valuable information by linking training data across various databases.

In this project, we aim to create queryable KGs derived from training metadata in the Bioschemas format, available from platforms such as TeSS and glittr.org. In a subsequent step, we will investigate the potential of such KGs for several use cases, including the construction of custom learning paths, the creation of detailed trainer profiles, and the connection of training metadata to other databases. These use cases will also shed light on the limits of the currently available metadata, and will help guide future choices about richer metadata and standards.

## Leads

Geert van Geest, Harshita Gupta, Vincent Emonet

## 💬 MCP server

A [Model Context Protocol (MCP)](https://modelcontextprotocol.io) server to access and search through the training materials of multiple ELIXIR repositories, such as [TeSS](https://tess.elixir-europe.org/) and [Glittr](https://glittr.org/).

### ⚡️ Usage

> [!IMPORTANT]
>
> Requirement: [`uv`](https://docs.astral.sh/uv/getting-started/installation/), to easily handle Python scripts and virtual environments.

Run with the STDIO transport:

```sh
uv run elixir-training-mcp
```

Deploy as a Streamable HTTP server:

```sh
uv run elixir-training-mcp --http
```
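
Under the hood, MCP messages are JSON-RPC 2.0. The STDIO transport exchanges them as newline-delimited JSON over the server's stdin/stdout. The sketch below only builds and frames a `tools/call` message. A real client, for example the `mcp` Python SDK that this package already depends on, must first perform the `initialize` handshake. The `keyword` argument name is a hypothetical placeholder, not the documented input schema of `keyword_search`.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids must not be reused within a session


def jsonrpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object, as used by all MCP transports."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def frame_for_stdio(message: dict) -> bytes:
    """Encode one message for the STDIO transport: a single line of UTF-8 JSON.

    json.dumps escapes newlines inside strings, so the only raw newline is the delimiter.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


# Hypothetical call: the argument name "keyword" is an assumption; read the real
# input schema from the server's tools/list response.
call = jsonrpc_request("tools/call", {"name": "keyword_search", "arguments": {"keyword": "RNA-seq"}})
line = frame_for_stdio(call)
print(line)
```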

### 🧰 Available MCP tools

Once the server is running you can call the following tools from your MCP-compatible client:

| Tool | Description |
| ---- | ----------- |
| `search_training_materials` | Proxies the live TeSS API and returns raw JSON results. |
| `keyword_search` | Searches the harvested TTL datasets (TeSS + GTN) by free-text keyword and returns enriched metadata. |
| `provider_search` | Filters harvested resources by provider name (case-insensitive). |
| `location_search` | Returns TeSS course instances in a given country (optionally city). |
| `date_search` | Finds TeSS course instances starting within a provided ISO date range. |
| `topic_search` | Matches harvested resources by EDAM identifier or topic label. |
| `dataset_stats` | Summarises dataset diagnostics (resource counts, type distribution, access modes). |
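
The filtering semantics described for `provider_search`, `location_search` and `date_search` can be illustrated with a small sketch over plain Python records. This is not the server's implementation, which works on the harvested RDF. The `CourseInstance` type, the sample data and the exact matching rules are assumptions. In particular, the sketch assumes equality rather than substring matching and an inclusive date range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CourseInstance:
    """Hypothetical, flattened view of one harvested course instance."""
    name: str
    provider: str
    country: str
    city: Optional[str]
    start: date


def provider_search(items: list[CourseInstance], provider: str) -> list[CourseInstance]:
    # Case-insensitive equality; whether the real tool matches substrings is unknown.
    wanted = provider.casefold()
    return [c for c in items if c.provider.casefold() == wanted]


def location_search(items: list[CourseInstance], country: str, city: Optional[str] = None) -> list[CourseInstance]:
    # Country is required; the city filter is applied only when given.
    return [
        c for c in items
        if c.country.casefold() == country.casefold()
        and (city is None or (c.city or "").casefold() == city.casefold())
    ]


def date_search(items: list[CourseInstance], start: str, end: str) -> list[CourseInstance]:
    # Inclusive ISO 8601 range, e.g. "2025-01-01" to "2025-03-31".
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return [c for c in items if lo <= c.start <= hi]


# Made-up sample data for illustration only.
courses = [
    CourseInstance("RNA-seq analysis", "Provider A", "Switzerland", "Lausanne", date(2025, 3, 10)),
    CourseInstance("Galaxy basics", "Provider B", "United Kingdom", None, date(2025, 6, 2)),
]
print([c.name for c in date_search(courses, "2025-01-01", "2025-03-31")])  # ['RNA-seq analysis']
```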

> [!NOTE]
> The local tools read from `data/tess_harvest.ttl` and `data/gtn_harvest.ttl`. Regenerate these files with the harvest scripts if you need fresher data.

### 🔌 Connect client to MCP server

Follow the instructions for your favorite chat client.

To add a new MCP server to [**VSCode GitHub Copilot**](https://code.visualstudio.com/docs/copilot/overview):

- Install the [`GitHub.copilot`](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) extension
- Open the Command Palette (`ctrl+shift+p` or `cmd+shift+p`)
- Search for `MCP: Add Server...`
  - Choose `STDIO`, and provide the command: `uvx elixir-training-mcp`
  - Or choose `HTTP`, and provide the MCP server URL, e.g. http://localhost:8000/mcp

To use it with the STDIO transport, your VSCode `mcp.json` should look like this:

```json
{
   "servers": {
      "elixir-training-mcp": {
         "type": "stdio",
         "command": "uvx",
         "args": ["elixir-training-mcp"]
      }
   }
}
```

> [!TIP]
>
> You can use a local folder for development:
>
> ```json
> {
>    "servers": {
>       "elixir-training-mcp": {
>          "type": "stdio",
>          "cwd": "~/dev/ELIXIR-TrP-KG-training-metadata",
>          "command": "uv",
>          "args": ["run", "elixir-training-mcp"]
>       }
>    }
> }
> ```

You can also connect to a running server using Streamable HTTP:

```json
{
    "servers": {
        "elixir-training-mcp-http": {
            "url": "http://localhost:8000/mcp",
            "type": "http"
        }
    }
}
```
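
For clients without built-in MCP support, the sketch below shows what the first request of a Streamable HTTP session looks like. It is a JSON-RPC `initialize` message POSTed to the `/mcp` endpoint with an `Accept` header allowing both JSON and server-sent events. It only builds the request. A complete client would then send `notifications/initialized` and reuse any `Mcp-Session-Id` header the server returns. The protocol version and client name below are assumptions.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8000/mcp"  # endpoint used in the configuration above


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build (but do not send) the first POST of an MCP Streamable HTTP session."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: use a version your server supports
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The server may answer with plain JSON or with an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


req = build_initialize_request()
# With the server running, send it with urllib.request.urlopen(req) and read the JSON or SSE response.
```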

## Harvesting

The `data/tess_harvest.ttl` file (about 7 MB) is included in the repository. You can rebuild it by running the script below, which harvests JSON-LD data and converts it into this Turtle file. Expect it to take about 30 minutes, because parsing JSON-LD is expensive:

```sh
uv run src/elixir_training_mcp/harvest/harvest_tess.py
```

Harvest the GTN data:

```sh
uv run src/elixir_training_mcp/harvest/harvest_gtn.py
```
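
The package depends on `extruct`, a library that extracts embedded metadata such as JSON-LD from HTML pages. Assuming the harvest works on pages of that kind, the stdlib sketch below illustrates the JSON-LD part of that step only. It collects and parses `<script type="application/ld+json">` blocks. It is a simplified stand-in, not the project's harvest code, and it ignores microdata and RDFa. The sample page is made up.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect and parse the body of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values may be None.
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.items.append(json.loads("".join(self._chunks)))


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Course", "name": "Example course", "url": "https://example.org/courses/1"}
</script>
<script>console.log("not metadata")</script>
</head><body></body></html>"""

parser = JsonLdExtractor()
parser.feed(PAGE)
parser.close()
print([item["@type"] for item in parser.items])  # ['Course']
```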


> [!NOTE]
>
> TeSS contains training materials and courses from various providers, including GTN (Galaxy Training Network) training materials. Metadata about the same material in TeSS and GTN can be matched on `schema:url`.
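
As an illustration of matching on `schema:url`, the sketch below pairs TeSS and GTN resource identifiers whose URLs coincide. The README only states that matching uses `schema:url`. The normalisation rules here are an assumption: lowercase scheme and host, drop the fragment and any trailing slash. The identifiers and URLs in the example are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(url: str) -> str:
    """Assumed normalisation: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, ""))


def match_on_url(tess: dict[str, str], gtn: dict[str, str]) -> dict[str, str]:
    """Map TeSS resource ids to GTN resource ids that share the same normalised schema:url.

    Both inputs map a resource identifier to its schema:url value.
    """
    gtn_by_url = {normalise_url(url): rid for rid, url in gtn.items()}
    matches = {}
    for tess_id, url in tess.items():
        key = normalise_url(url)
        if key in gtn_by_url:
            matches[tess_id] = gtn_by_url[key]
    return matches


# Illustrative identifiers and URLs.
tess = {
    "tess:1": "https://training.galaxyproject.org/training-material/topics/introduction/",
    "tess:2": "https://example.org/other",
}
gtn = {"gtn:intro": "https://Training.GalaxyProject.org/training-material/topics/introduction"}
print(match_on_url(tess, gtn))  # {'tess:1': 'gtn:intro'}
```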

Deploy a SPARQL endpoint over the harvested files at http://localhost:8000:

```sh
uv run rdflib-endpoint serve src/elixir_training_mcp/data/*_harvest.ttl
```
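
Once the endpoint is running, any SPARQL 1.1 Protocol client can query it. The stdlib sketch below builds a GET request with the query in the `query` parameter, asks for JSON results, and flattens a results document into plain dicts. The endpoint path is an assumption, as is the schema.org namespace (`http://` versus `https://`) used in the harvested data; adjust both to match your setup. The sample results are made up.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:8000/"  # assumption: adjust if SPARQL is served under another path

# Assumption: the harvested data uses the http://schema.org/ namespace.
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?resource ?name WHERE {
  ?resource schema:name ?name .
} LIMIT 5
"""


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """SPARQL 1.1 Protocol: send the query as the `query` parameter of a GET request."""
    return Request(
        endpoint + "?" + urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json"},
    )


def bindings_to_rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


req = build_sparql_request(QUERY)
# With the endpoint running: json.load(urllib.request.urlopen(req)) gives a results document like this one.
SAMPLE = {
    "head": {"vars": ["resource", "name"]},
    "results": {"bindings": [
        {"resource": {"type": "uri", "value": "https://example.org/r/1"},
         "name": {"type": "literal", "value": "Example course"}},
    ]},
}
print(bindings_to_rows(SAMPLE))  # [{'resource': 'https://example.org/r/1', 'name': 'Example course'}]
```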

