Metadata-Version: 2.4
Name: OntoCheck
Version: 0.0.5.0
Summary: A package for assessing the quality and structure of ontologies.
Author-email: "Rishabh Kundu, Van Tran, Redad Mehdi, Ethan Frakes, Abhishek Daundkar, Maliesha Sumudumalie, Vibha S. Mandayam, Jacob A. Lample, Mengjie Li, Laura S. Bruckman, Erika I. Barcelos, Alp Sehirlioglu, Yinghui Wu, Roger H. French" <rxf131@case.edu>
Maintainer-email: Redad Mehdi <mxm1684@case.edu>
License: Copyright 2025 SDLE Lab
        
        Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Project-URL: Homepage, https://github.com/cwru-sdle/OntoCheck
Project-URL: Documentation, https://ontocheck.readthedocs.io/en/latest/
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rdflib
Requires-Dist: networkx
Requires-Dist: requests
Dynamic: license-file

# OntoCheck

**Query-Driven Ontology Assessment for Scientific Domain Applications**

[![PyPI](https://img.shields.io/pypi/v/OntoCheck)](https://pypi.org/project/OntoCheck/)
[![Documentation](https://readthedocs.org/projects/ontocheck/badge/?version=latest)](https://ontocheck.readthedocs.io/en/latest/)
[![License: BSD-2](https://img.shields.io/badge/License-BSD--2-blue.svg)](LICENSE)

---

## Overview

As scientific fields increasingly adopt FAIR data principles, ontologies have become essential for encoding the semantics of scientific investigations. Yet evaluating ontology quality remains a manual, technically demanding bottleneck. Current frameworks emphasize structural correctness but fail to assess practical utility against the real-world queries posed by domain scientists.

OntoCheck is an open-source Python tool that unifies domain-agnostic structural metrics with a novel, query-driven assessment methodology. By analyzing SPARQL queries derived from natural-language competency questions, OntoCheck compares the required query terms against an ontology's full vocabulary to yield complementary metrics for vocabulary coverage and utilization density. This empowers domain scientists and data engineers to make evidence-based decisions about ontology selection without requiring deep expertise in formal knowledge representation.

OntoCheck is actively developed and maintained by the **SDLE Research Center at Case Western Reserve University**.

---

## Installation

```bash
pip install OntoCheck
```

**Requirements:** Python 3.8 or later.

---

## Quick Start

### Command-Line Interface

```bash
# Display available metrics and usage information
ontocheck -h

# Run specific metrics on an ontology file
ontocheck path/to/ontology.ttl --metrics altLabelCheck definitionCheck

# Run all available task-agnostic metrics
ontocheck path/to/ontology.ttl --metrics all

# Specify custom output file paths
ontocheck path/to/ontology.ttl --metrics all --log-file results.log --csv-file results.csv
```

### Python API

```python
from ontocheck import run_ontology_assessment

# Run selected metrics
run_ontology_assessment(
    ttl_file="path/to/ontology.ttl",
    metrics=["altLabelCheck", "definitionCheck", "isolatedElements"],
)

# Run all task-agnostic metrics
run_ontology_assessment(
    ttl_file="path/to/ontology.ttl",
    metrics="all",
)
```

### Task-Based Assessment

```python
from ontocheck import task_based_metric_v_0_0_1

result = task_based_metric_v_0_0_1(
    ttl_file="path/to/ontology.ttl",
    questions="competency_questions.json",
    domain_prefixes=["mds"],
    domain_ns_fragments=["cwrusdle.bitbucket.io/mds"],
)

print(f"Relevance: {result['relevance']:.2%}")
print(f"Accuracy:  {result['accuracy']:.2%}")
```

---

## Available Metrics

OntoCheck provides **17 task-agnostic metrics** organized into four categories, along with a **task-based assessment methodology**.

### Labeling

| Metric | Function | Description |
|---|---|---|
| `checkLabel` | `mainLabelCheck_v_0_0_1` | Proportion of named classes carrying human-readable identifiers |
| `altLabelCheck` | `mainAltLabelCheck_v_0_0_1` | Proportion of named classes carrying synonyms |
| `definitionCheck` | `mainDefCheck_v_0_0_1` | Proportion of named classes carrying formal definitions |
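
The three labeling metrics share one shape: count the named classes that carry a given annotation and divide by the total number of named classes. The following is a minimal sketch of that proportion, not OntoCheck's implementation. It assumes triples are plain Python tuples of prefixed names and that labels are expressed with `rdfs:label`; the function `label_coverage` and the example classes are hypothetical.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples
# of prefixed names. The predicate names below are assumptions.
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
RDFS_LABEL = "rdfs:label"


def label_coverage(triples, annotation=RDFS_LABEL):
    """Return the fraction of declared classes that carry `annotation`."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    annotated = {s for s, p, _ in triples if p == annotation and s in classes}
    # Assumption: an ontology with no classes scores 0.0 instead of raising.
    return len(annotated) / len(classes) if classes else 0.0


# Hypothetical ontology fragment: three classes, two of them labeled.
triples = [
    ("ex:Sample", RDF_TYPE, OWL_CLASS),
    ("ex:Furnace", RDF_TYPE, OWL_CLASS),
    ("ex:Recipe", RDF_TYPE, OWL_CLASS),
    ("ex:Sample", RDFS_LABEL, "Sample"),
    ("ex:Furnace", RDFS_LABEL, "Furnace"),
]

print(f"Label coverage: {label_coverage(triples):.2%}")  # 2/3 = 66.67%
```

Passing a different `annotation` (for example a synonym or definition predicate) gives the analogous proportion for the other two rows of the table.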

### Structural

| Metric | Function | Description |
|---|---|---|
| `isolatedElements` | `check_for_isolated_elements` | Identifies orphaned classes within the ontology |
| `classConnections` | `count_class_connected_components` | Identifies disconnected subgraphs |
| `missingDomainRange` | `get_properties_missing_domain_and_range` | Identifies properties that lack domain and range declarations |
| `leafNodeCheck` | `mainLeafNodeCheck_v_0_0_1` | Identifies all leaf nodes in the ontology hierarchy |
| `semanticConnection` | `mainSemanticConnection_v_0_0_1` | Verifies grounding in upper-level ontologies (e.g., CCO, BFO) |
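
To illustrate what "isolated elements" and "disconnected subgraphs" mean in practice, the sketch below treats classes as graph nodes and subclass links as undirected edges, then groups the classes into connected components. It is a standard-library illustration under those assumptions, not the code behind `isolatedElements` or `classConnections`; the function `connected_components` and the class names are hypothetical.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components, treating edges as undirected."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited node collects one component.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical class hierarchy: (subclass, superclass) pairs.
classes = ["Material", "Metal", "Polymer", "Instrument", "Orphan"]
subclass_of = [("Metal", "Material"), ("Polymer", "Material")]

components = connected_components(classes, subclass_of)
isolated = sorted(next(iter(c)) for c in components if len(c) == 1)

print(len(components))  # 3
print(isolated)         # ['Instrument', 'Orphan']
```

A class that forms a component of size one has no links to any other class, which is the sense of "orphaned" assumed here.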

### Accessibility

| Metric | Function | Description |
|---|---|---|
| `sparqlEndpoint` | `check_sparql_accessibility_ttl` | Verifies reachability of the SPARQL endpoint |
| `rdfDump` | `check_rdf_dump_accessibility_ttl` | Verifies availability of the RDF data dump |
| `humanLicense` | `check_human_readable_license_ttl` | Verifies presence and fitness of licensing information |
| `externalLinks` | `check_external_data_provider_links_ttl` | Checks validity of external links within the ontology |

### Naming Convention

| Metric | Function | Description |
|---|---|---|
| `classCapitalCheck` | `mainClassNameCapitalCheck_v_0_0_1` | Flags departures from standard capitalization |
| `classSpaceCheck` | `mainClassNameSpaceCheck_v_0_0_1` | Flags use of spaces in class identifiers |
| `spellCheck` | `spell_check_v_0_0_1` | Checks spelling in labels and definitions |
| `duplicateLabels` | `find_duplicate_labels_from_graph` | Identifies duplicate labels across entities |
| `searchClass` | `mainClassSearch_v_0_0_1` | Identifies classes matching a user-specified string |
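
The capitalization and space checks can be pictured as simple string tests on class identifiers. The sketch below is an assumption-laden illustration, not OntoCheck's rules: it takes "standard capitalization" to mean UpperCamelCase, and the function `naming_issues` and the sample names are hypothetical.

```python
import re

# Assumption: a well-formed class name is UpperCamelCase (letters and digits,
# starting with an uppercase letter).
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")


def naming_issues(class_names):
    """Map each offending class name to the list of problems found."""
    issues = {}
    for name in class_names:
        problems = []
        if " " in name:
            problems.append("contains space")
        # Judge capitalization with spaces removed so the two checks stay independent.
        if not UPPER_CAMEL.fullmatch(name.replace(" ", "")):
            problems.append("not UpperCamelCase")
        if problems:
            issues[name] = problems
    return issues


report = naming_issues(["SolarCell", "solarCell", "Solar Cell", "solar cell"])
for name, problems in report.items():
    print(f"{name!r}: {', '.join(problems)}")
```

Here `SolarCell` passes both checks, while the other three names are flagged for capitalization, spacing, or both.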

### Task-Based Assessment

The task-based methodology measures how well an ontology supports analytical queries by computing two complementary metrics from SPARQL competency questions:

- **Relevance** = |T_a ∩ T_o| / |T_a| -- the fraction of task-required terms that the ontology defines
- **Accuracy** = |T_a ∩ T_o| / |T_o| -- the fraction of ontology terms utilized by the task queries

where T_a is the set of domain terms extracted from the SPARQL queries and T_o is the set of domain terms defined in the ontology.
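
The two ratios can be worked through with plain Python sets. This is a minimal sketch of the formulas above, not the body of `task_based_metric_v_0_0_1`: the function `coverage_metrics`, the example terms, and the choice to return 0.0 for an empty set are all assumptions made for illustration.

```python
def coverage_metrics(task_terms, ontology_terms):
    """Return (relevance, accuracy) for two collections of term identifiers."""
    t_a = set(task_terms)      # T_a: terms required by the SPARQL queries
    t_o = set(ontology_terms)  # T_o: terms defined in the ontology
    shared = t_a & t_o         # T_a ∩ T_o

    # Assumption: an empty denominator yields 0.0 instead of raising.
    relevance = len(shared) / len(t_a) if t_a else 0.0
    accuracy = len(shared) / len(t_o) if t_o else 0.0
    return relevance, accuracy


# Hypothetical term sets: the queries need four terms, the ontology defines six,
# and three terms appear in both.
task_terms = {"mds:Sample", "mds:hasTemperature", "mds:Furnace", "mds:hasOperator"}
ontology_terms = {
    "mds:Sample", "mds:hasTemperature", "mds:Furnace",
    "mds:Tool", "mds:Recipe", "mds:hasMass",
}

relevance, accuracy = coverage_metrics(task_terms, ontology_terms)
print(f"Relevance: {relevance:.2%}")  # 3/4 = 75.00%
print(f"Accuracy:  {accuracy:.2%}")   # 3/6 = 50.00%
```

In this example the ontology covers most of what the queries need (high relevance), but half of its vocabulary goes unused by the task (lower accuracy), which is why the two metrics are reported together.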

---

## Documentation

Full documentation is available at [ontocheck.readthedocs.io](https://ontocheck.readthedocs.io/en/latest/).

---

## Authors

- Rishabh Kundu
- Redad Mehdi
- Van D. Tran
- Ethan Frakes
- Abhishek Daundkar
- Maliesha Sumudumalie
- Vibha S. Mandayam
- Jacob A. Lample
- Mengjie Li
- Laura S. Bruckman
- Erika I. Barcelos
- Alp Sehirlioglu
- Roger H. French
- Yinghui Wu

## Affiliation

Materials Data Science for Stockpile Stewardship Center of Excellence (MDS3 COE), Case Western Reserve University, Cleveland, OH 44106, USA

---

## Acknowledgments

- U.S. Department of Energy's National Nuclear Security Administration -- Award Number **DE-NA0004104** and Contract Number **B647887**
- U.S. Department of Energy's Office of Energy Efficiency and Renewable Energy (EERE) under the Solar Energy Technologies Office (SETO) -- Agreement Numbers **DE-EE0009353** and **DE-EE0009347**
- U.S. National Science Foundation -- Award Number **2133576**

---

## How to Cite

If you use OntoCheck in your work, please cite:

> Rishabh Kundu, Redad Mehdi, Van D. Tran, Ethan Frakes, Abhishek Daundkar, Maliesha Sumudumalie, Vibha S. Mandayam, Jacob A. Lample, Mengjie Li, Laura S. Bruckman, Erika I. Barcelos, Alp Sehirlioglu, Roger H. French, Yinghui Wu (2025). OntoCheck: Query-Driven Ontology Assessments for Scientific Domain Applications. [Python]. https://pypi.org/project/OntoCheck/

---

## License

OntoCheck is released under the [BSD-2-Clause License](LICENSE).
