Metadata-Version: 2.4
Name: SPARQLMojo
Version: 0.15.9
Summary: An SQLAlchemy-like ORM for SPARQL endpoints.
License-Expression: MIT
License-File: LICENSE
Keywords: sparql,rdf,orm,pydantic,linked-data,semantic-web
Author: Oliver Sampson
Requires-Python: >=3.12
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: SPARQLWrapper (>=2.0.0)
Requires-Dist: pydantic (>=2.12.4,<3.0.0)
Requires-Dist: rdflib (>=6.0.0)
Project-URL: Documentation, https://codeberg.org/Gitterdan/SPARQLMojo
Project-URL: Homepage, https://codeberg.org/Gitterdan/SPARQLMojo
Project-URL: Repository, https://codeberg.org/Gitterdan/SPARQLMojo
Description-Content-Type: text/markdown

# SPARQLMojo

An SQLAlchemy-like ORM for SPARQL endpoints with Pydantic validation. Currently in beta, so there may be breaking changes.

## Table of Contents

- **[Full Documentation Index](docs/README.md)**
- [Features](#features)
- [Installation](#installation)
- [Version](#version)
- [Usage](#usage)
- [HTTP Method Configuration](#http-method-configuration)
- [Identity Map](#identity-map)
- [PREFIX Management System](#prefix-management-system)
- [Language-Tagged Literals](#language-tagged-literals)
- [Collection Fields](#collection-fields)
- [UPDATE Operations](#update-operations)
- [Running Tests](#running-tests)
- [Test Dataset](#test-dataset)
- [Limitations](#limitations)
- [Known Issues and Risks](#known-issues-and-risks)
- [VALUES Clause Support](#values-clause-support)
- [Property Paths](#property-paths)
- [Ontology-Aware Models with SchemaRegistry](#ontology-aware-models-with-schemaregistry)
- [Class Hierarchy Support](#class-hierarchy-support)
- [Field-Level Filtering](#field-level-filtering)
- [Release Process](#release-process)
- [Dependencies](#dependencies)
- [Key Benefits of Pydantic Integration](#key-benefits-of-pydantic-integration)
- [License](#license)

## Features

- Declarative RDF models using Python classes with **Pydantic validation**
- Type-safe field definitions with automatic validation
- A session layer for querying and updating SPARQL endpoints
- A query compiler that converts Pythonic queries to SPARQL
- **Session identity map** to prevent duplicate instances and ensure consistency
- **PREFIX management system** for namespace handling with short-form IRIs
- **Language-tagged literal support** for multilingual text data
- **Property path support** with ORM-like convenience methods, including inverse paths for reverse relationship traversal
- **Field-level filtering** with intuitive syntax and automatic datatype casting for numeric comparisons
- **String filtering on IRI fields** with chainable `str()`, `lower()`, `upper()` methods for case-insensitive matching
- **Ontology-aware models** with SchemaRegistry for automatic inverse relationship discovery via `owl:inverseOf`
- **InverseField** for clean, semantic reverse relationship navigation with automatic fallback to SPARQL `^` operator
- **Class hierarchy support** with automatic polymorphic queries — querying a base class returns all subclass instances without any extra configuration

## Installation

```bash
# Install dependencies
poetry install

# Or install the package in editable mode
pip install -e .
```

## Version

Check the installed version:

```python
import sparqlmojo
print(sparqlmojo.__version__)  # e.g. 0.15.9
```

Or from the command line:

```bash
python -c "import sparqlmojo; print(sparqlmojo.__version__)"
```

### Versioning Workflow

This project uses semantic versioning with automated releases. See the [Release Process](#release-process) section for details on creating releases.

## Usage

```python
from typing import Annotated

from sparqlmojo import (
    Condition,
    InverseField,
    IRIField,
    LiteralField,
    Model,
    ObjectPropertyField,
    RDF_TYPE,
    SchemaRegistry,
    Session,
    SPARQLCompiler,
    SubjectField,
)


class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    age: Annotated[int | None, LiteralField("schema:age")] = None
    knows: Annotated[str | None, ObjectPropertyField("schema:knows", range_="Person")] = None


# Create a session
s = Session(endpoint="http://example.org/sparql")

# For endpoints with separate read/write URLs (e.g., Fuseki):
# s = Session(
#     endpoint="http://example.org/sparql",           # For SELECT queries
#     write_endpoint="http://example.org/update"      # For INSERT/DELETE/UPDATE
# )

# Configure HTTP method for SELECT queries (see "HTTP Method Configuration" below):
# s = Session(endpoint="http://example.org/sparql", query_method="GET")

# Build and compile a query
q = s.query(Person).filter(Condition("age", ">", 30)).limit(5)
sparql = SPARQLCompiler.compile_query(q)
print(sparql)

# Create an instance with validation
bob = Person(iri="http://example.org/bob", name="Bob", age=28)
s.add(bob)
s.commit()

# Pydantic validates types automatically
try:
    invalid = Person(iri="http://example.org/alice", name="Alice", age="not a number")  # Raises ValidationError
except Exception as e:
    print(f"Validation error: {e}")
```

## HTTP Method Configuration

SPARQLMojo supports configurable HTTP methods for SPARQL SELECT queries. By default, POST is used to avoid URL length limitations with large queries.

### Query Methods

| Method | Description | Use Case |
|--------|-------------|----------|
| `POST` | Use HTTP POST for SELECT queries (default) | Recommended for most cases; avoids URL length issues |
| `GET` | Use HTTP GET for SELECT queries | Required by some read-only endpoints; better caching |

### Configuration

```python
from sparqlmojo import Session

# Default: Always use POST (safest option)
session = Session(endpoint="http://example.org/sparql")
# or explicitly:
session = Session(endpoint="http://example.org/sparql", query_method="POST")

# Use GET (for endpoints that require it or for caching benefits)
session = Session(endpoint="http://example.org/sparql", query_method="GET")
```

### When to Use Each Mode

**POST (Default)**
- Recommended for most applications
- No risk of HTTP 414 "URI Too Long" errors
- Works with queries of any size, including large VALUES clauses
- Some proxies/CDNs may not cache POST requests

**GET**
- Better HTTP caching (responses can be cached by proxies)
- Required by some read-only SPARQL endpoints
- Risk of HTTP 414 errors with large queries (URLs > 2000 characters)
- Query is visible in server access logs (potential security consideration)

Note: UPDATE queries (INSERT, DELETE) always use POST regardless of this setting, as required by the SPARQL protocol.

## Identity Map

SPARQLMojo includes a Session identity map that prevents duplicate instances and keeps all references to the same entity consistent:

```python
# First retrieval creates new instance
person1 = session.get(Person, "http://example.org/bob")

# Second retrieval returns the SAME instance (not a duplicate)
person2 = session.get(Person, "http://example.org/bob")

assert person1 is person2  # True - same object reference

# Changes to one reference are visible in all references
person1.name = "Robert"
print(person2.name)  # "Robert" - same object
```

### Benefits

- **Memory Efficiency**: Uses weak references for automatic garbage collection
- **Consistency**: All operations on the same entity work with the same object
- **Performance**: Avoids creating duplicate objects for the same entity
- **Automatic Management**: No manual cache management required

### Manual Cache Management

```python
# Remove specific instance from identity map
session.expunge(person)

# Clear all instances from identity map
session.expunge_all()
```

## PREFIX Management System

SPARQLMojo includes a comprehensive PREFIX management system for namespace handling:

### Features

- **Built-in Common Prefixes**: schema, foaf, rdf, rdfs, owl, xsd, dc, dcterms, skos, ex
- **Custom Prefix Registration**: Add your own namespace prefixes
- **Short-form IRI Support**: Use `schema:Person` instead of full IRIs
- **Automatic PREFIX Declarations**: SPARQL queries include proper PREFIX clauses
- **IRI Expansion/Contraction**: Convert between short-form and full IRIs

### Usage

```python
from typing import Annotated

from sparqlmojo import IRIField, LiteralField, Model, RDF_TYPE, Session, SubjectField

# Define model with short-form IRIs
class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    age: Annotated[int | None, LiteralField("schema:age")] = None

# Create session with built-in prefix registry
session = Session()

# Register custom prefix
session.register_prefix("my", "http://example.org/my/")

# Query generation with automatic PREFIX declarations
query = session.query(Person)
sparql = query.compile()
# Generates: PREFIX schema: <http://schema.org/> ...

# IRI expansion/contraction
expanded = session.expand_iri("schema:Person")  # "http://schema.org/Person"
contracted = session.contract_iri("http://schema.org/Person")  # "schema:Person"
```

### Benefits

- **Improved Developer Experience**: No need to write full IRIs everywhere
- **Better Readability**: Code is more concise and understandable
- **Easy Maintenance**: Update namespace URIs in one place
- **Standards Compliance**: Generates proper SPARQL PREFIX declarations

## Language-Tagged Literals

SPARQLMojo supports language-tagged literals via `LangString` and `MultiLangString` fields for multilingual text data with BCP 47 language tag validation.

→ [Full documentation](docs/language-tagged-literals.md)
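
To illustrate what a language-tagged literal looks like on the wire, here is a stdlib-only sketch (not SPARQLMojo's actual implementation; `LangString` handles this internally). The regex is a deliberately simplified subset of BCP 47 that covers common tags like `en`, `en-US`, and `zh-Hant`:

```python
import re

# Simplified BCP 47 shape: a primary language subtag plus optional subtags.
# (The full grammar is richer; this sketch only covers common tags.)
LANG_TAG = re.compile(r"^[a-zA-Z]{2,8}(-[a-zA-Z0-9]{1,8})*$")

def lang_literal(text: str, tag: str) -> str:
    """Render a language-tagged literal as it appears in SPARQL/Turtle."""
    if not LANG_TAG.match(tag):
        raise ValueError(f"invalid language tag: {tag!r}")
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{tag}'

print(lang_literal("Hello", "en"))       # "Hello"@en
print(lang_literal("Bonjour", "fr-CA"))  # "Bonjour"@fr-CA
```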

## Collection Fields

SPARQLMojo supports collection fields (`LiteralList`, `LangStringList`, `IRIList`, `TypedLiteralList`) for aggregating multiple values from multi-valued RDF properties into Python lists, with support for filtering, size limiting, and efficient multi-field queries.

→ [Full documentation](docs/collection-fields.md)
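
Conceptually, a collection field aggregates the one-row-per-value shape of SPARQL SELECT results into a single Python list per subject. A minimal stdlib sketch of that aggregation step (the subjects and values are made up; the real list field types are documented above):

```python
from collections import defaultdict

# Rows as a SELECT over a multi-valued property returns them:
# one row per (subject, value) pair.
rows = [
    ("ex:book1", "fiction"),
    ("ex:book1", "classic"),
    ("ex:book2", "history"),
]

def aggregate(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect multi-valued bindings into one list per subject."""
    by_subject: dict[str, list[str]] = defaultdict(list)
    for subject, value in rows:
        by_subject[subject].append(value)
    return dict(by_subject)

print(aggregate(rows))
# {'ex:book1': ['fiction', 'classic'], 'ex:book2': ['history']}
```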

## UPDATE Operations

SPARQLMojo supports UPDATE operations with dirty tracking, as well as batch inserts, updates, and deletes with automatic chunking for large datasets.

→ [Full documentation](docs/update-operations.md)
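
The core idea behind dirty tracking is that only changed fields produce SPARQL. A hedged, stdlib-only sketch of how a field diff might compile to a `DELETE`/`INSERT`/`WHERE` update (the function and its signature are hypothetical; see the linked docs for the real API):

```python
def compile_update(subject: str, changes: dict[str, tuple[str, str]]) -> str:
    """Build a SPARQL update for dirty fields only.

    `changes` maps predicate -> (old literal, new literal).
    """
    deletes = " ".join(f'<{subject}> {p} "{old}" .' for p, (old, _) in changes.items())
    inserts = " ".join(f'<{subject}> {p} "{new}" .' for p, (_, new) in changes.items())
    return f"DELETE {{ {deletes} }} INSERT {{ {inserts} }} WHERE {{ {deletes} }}"

sparql = compile_update("http://example.org/bob", {"schema:name": ("Bob", "Robert")})
print(sparql)
```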

## Running Tests

```bash
# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_basic.py
```

**See Also**: [Test Fixtures Documentation](tests/README.md) for comprehensive documentation of shared fixtures, test models, and test organization.

## Test Dataset

The project includes a comprehensive library management test dataset in `tests/fixtures/library.ttl` with Books, Users, and Checkout Records, along with worked examples showing how Python model instances translate to RDF triples.

→ [Full documentation](docs/test-dataset.md)

## Limitations

This is a prototype with several intentional limitations:

- **No transaction support**: Staged changes are flushed without atomicity or rollback guarantees
- **No conflict resolution**: Basic operations only
- **Not production-ready**: Focuses on demonstrating design patterns

For real-world use, consider adding:
- Proper literal typing
- Better parsing of results
- Streaming results and pagination
- Transaction support

## Known Issues and Risks

### Pydantic Internal API Dependency

SPARQLMojo uses Pydantic's internal `ModelMetaclass` to enable the intuitive field-level filtering syntax:

```python
# This clean syntax is powered by the custom metaclass
query.filter(Person.name == "Alice")
query.filter(Product.price > 100)
```

**The Risk**: The metaclass is imported from Pydantic's **private internal API**:

```python
from pydantic._internal._model_construction import ModelMetaclass as PydanticModelMetaclass
```

The `_internal` prefix indicates this is not part of Pydantic's public API and **could change without notice** in any Pydantic release. According to the Pydantic maintainers, they "want to be able to refactor the `ModelMetaclass` without it being considered a breaking change."

**What This Means**:
- ⚠️ **No stability guarantees**: The metaclass implementation may change in minor/patch releases
- ⚠️ **No deprecation warnings**: Changes won't be announced in advance
- ⚠️ **Potential breakage**: Any Pydantic update could require code changes

**Mitigation Strategy**:
1. **Pin Pydantic version** carefully in production environments
2. **Test thoroughly** after any Pydantic updates before upgrading
3. **Fallback available**: If the metaclass breaks, fall back to the less elegant method-based approach:
   ```python
   # Alternative syntax that doesn't depend on private APIs
   query.filter(Person._get_field_filter("name") == "Alice")
   ```

**Why We Use It Anyway**: The UX benefit of the SQLAlchemy-like syntax is significant for a prototype focused on design clarity. For production use, consider the risk-reward tradeoff for your specific needs.

**References**:
- [Pydantic Issue #6381: ModelMetaclass Import Location](https://github.com/pydantic/pydantic/issues/6381)
- [Pydantic Discussion #7185: ModelField and ModelMetaclass in v2](https://github.com/pydantic/pydantic/discussions/7185)

## VALUES Clause Support

SPARQLMojo supports the SPARQL VALUES clause for efficient query constraints with explicit value sets, via both an ORM-style field-reference API and a dict-style API for multi-variable bindings.

→ [Full documentation](docs/values-clause.md)
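
To make the target concrete, here is an illustrative, stdlib-only sketch of the SPARQL text a multi-variable `VALUES` binding compiles down to (the helper function is hypothetical, not SPARQLMojo's API):

```python
def values_clause(bindings: dict[str, list[str]]) -> str:
    """Render a SPARQL VALUES clause for one or more variables.

    Each variable's list supplies one term per row, so all lists
    must have the same length.
    """
    vars_ = " ".join(f"?{v}" for v in bindings)
    rows = zip(*bindings.values())  # one tuple of terms per row
    body = " ".join("(" + " ".join(row) + ")" for row in rows)
    return f"VALUES ({vars_}) {{ {body} }}"

print(values_clause({"name": ['"Alice"', '"Bob"'], "age": ["30", "28"]}))
# VALUES (?name ?age) { ("Alice" 30) ("Bob" 28) }
```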

## Property Paths

SPARQLMojo supports SPARQL property paths for advanced relationship traversal, with ORM-like convenience methods (`transitive`, `zero_or_more`, `inverse`, etc.) and a `PropertyPath` escape hatch for complex expressions.

→ [Full documentation](docs/property-paths.md)
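
The convenience methods map onto SPARQL 1.1 path operators (`+`, `*`, `^`, `/`). A stdlib sketch of that mapping, using hypothetical free functions rather than SPARQLMojo's actual method-chaining API:

```python
def one_or_more(p: str) -> str:   # transitive traversal
    return f"{p}+"

def zero_or_more(p: str) -> str:  # includes the starting node
    return f"{p}*"

def inverse(p: str) -> str:       # reverse the edge direction
    return f"^{p}"

def sequence(*ps: str) -> str:    # follow paths in order
    return "/".join(ps)

# "Everyone reachable via schema:knows, then their employer":
path = sequence(one_or_more("schema:knows"), "schema:worksFor")
print(path)  # schema:knows+/schema:worksFor
```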

## Ontology-Aware Models with SchemaRegistry

SPARQLMojo provides ontology-aware modeling through `SchemaRegistry`, enabling automatic inverse relationship discovery via `owl:inverseOf` and compile-time schema validation (domain, range, cardinality).

→ [Full documentation](docs/schema-registry.md)
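
The key property of `owl:inverseOf` is that it is symmetric: if `schema:author` is the inverse of `schema:authorOf`, the reverse also holds. A minimal stdlib sketch of the discovery step (the ontology pairs and function are illustrative, not the registry's real internals):

```python
# Ontology assertions of owl:inverseOf, as (property, inverse) pairs.
ontology = [("schema:author", "schema:authorOf")]

def build_inverse_index(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """owl:inverseOf is symmetric, so index both directions."""
    index: dict[str, str] = {}
    for p, q in pairs:
        index[p] = q
        index[q] = p
    return index

index = build_inverse_index(ontology)
print(index["schema:authorOf"])  # schema:author
```

With such an index in hand, an `InverseField` can resolve a named inverse property when one is declared, and fall back to the SPARQL `^` operator when none exists.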

## Class Hierarchy Support

SPARQLMojo supports `rdfs:subClassOf` class hierarchies — querying a base class automatically returns all registered subclass instances via polymorphic `VALUES ?type` queries, with no extra configuration required.

→ [Full documentation](docs/class-hierarchy.md)
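
To sketch the polymorphic query mechanics: the transitive subclass closure of the queried class is computed, then emitted as a `VALUES ?type` constraint. A stdlib-only illustration with made-up class names (it assumes an acyclic hierarchy):

```python
# rdfs:subClassOf edges: subclass -> superclass
subclass_of = {
    "ex:Novel": "ex:Book",
    "ex:Textbook": "ex:Book",
    "ex:Book": "ex:CreativeWork",
}

def subclasses(base: str) -> list[str]:
    """The base class plus every transitive subclass (assumes no cycles)."""
    found = [base]
    for cls in found:  # the list grows as deeper subclasses are discovered
        found.extend(sub for sub, sup in subclass_of.items() if sup == cls)
    return found

types = subclasses("ex:Book")
print(f"VALUES ?type {{ {' '.join(types)} }}")
# VALUES ?type { ex:Book ex:Novel ex:Textbook }
```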

## Field-Level Filtering

SPARQLMojo provides intuitive field-level filtering similar to SQLAlchemy, with Python comparison operators, automatic datatype casting, chainable string methods for IRI fields, and logical operators (`and_`, `or_`, `not_`).

→ [Full documentation](docs/field-level-filtering.md)
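
The automatic datatype casting matters because SPARQL compares untyped lexical forms as strings. A hedged sketch of the compilation idea (the function is hypothetical; SPARQLMojo generates this from expressions like `Person.age > 30`):

```python
def compile_filter(var: str, op: str, value: object) -> str:
    """Numeric comparisons get an xsd:integer cast so values compare
    numerically rather than lexically ("9" > "30" as strings!)."""
    if isinstance(value, int):
        return f"FILTER (xsd:integer(?{var}) {op} {value})"
    return f'FILTER (?{var} {op} "{value}")'

print(compile_filter("age", ">", 30))        # FILTER (xsd:integer(?age) > 30)
print(compile_filter("name", "=", "Alice"))  # FILTER (?name = "Alice")
```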

## Release Process

SPARQLMojo uses a tag-based release workflow with automated CHANGELOG management and Codeberg Releases.

### Workflow Overview

1. **During Development**: Update `CHANGELOG.md` in the `[Unreleased]` section when creating merge requests
2. **Accumulate Changes**: Multiple MRs can add to `[Unreleased]` before a release
3. **Create Release**: Tag the commit to trigger automated release creation

### For Contributors (Merge Request Time)

When creating a merge request, update `CHANGELOG.md` under the `[Unreleased]` section:

```markdown
## [Unreleased]

### Fixed
- Issue #123: Fixed bug in query compilation

### Added
- New feature for advanced filtering

### Changed
- Improved performance of batch operations
```

Follow [Keep a Changelog](https://keepachangelog.com/) format with sections:
- `Fixed` - Bug fixes
- `Added` - New features
- `Changed` - Changes to existing functionality
- `Deprecated` - Soon-to-be removed features
- `Removed` - Removed features
- `Security` - Security fixes

### For Maintainers (Release Time)

When ready to release a new version:

```bash
# 1. Preview release notes and create tag
./scripts/tag-release.sh v0.12.0

# 2. Push the tag to trigger CI/CD automation
git push origin v0.12.0
```

The CI/CD workflow (`.gitea/workflows/release.yml`) automatically:
- Extracts release notes from `[Unreleased]` section
- Updates `CHANGELOG.md` (`[Unreleased]` → `[0.12.0] - 2026-03-05`)
- Adds new empty `[Unreleased]` section at the top
- Commits and pushes CHANGELOG update to main
- Creates Codeberg release with extracted notes
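
The extraction step above amounts to slicing the changelog between the `[Unreleased]` heading and the next release heading. A stdlib sketch of that logic (illustrative only; the actual workflow lives in the release scripts):

```python
import re

CHANGELOG = """\
# Changelog

## [Unreleased]

### Fixed
- Issue #123: Fixed bug in query compilation

## [0.11.0] - 2026-01-10
"""

def unreleased_notes(text: str) -> str:
    """Everything between the [Unreleased] heading and the next release heading."""
    match = re.search(r"## \[Unreleased\]\n(.*?)(?=\n## \[|\Z)", text, re.DOTALL)
    return match.group(1).strip() if match else ""

print(unreleased_notes(CHANGELOG))
```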

**Manual Alternative** (if CI/CD unavailable):

```bash
# 1. Create and push tag
git tag v0.12.0 && git push origin v0.12.0

# 2. Run publish script manually
./scripts/publish-release.sh v0.12.0

# 3. Push CHANGELOG update
git push origin main
```

### Release Scripts

- **`tag-release.sh`** - Create annotated tag with release notes preview
- **`publish-release.sh`** - Update CHANGELOG and publish to Codeberg
- **`create-release.sh`** - Legacy all-in-one script (use `tag-release.sh` instead)

See `scripts/README.md` for detailed documentation.

### Version Format

Use semantic versioning: `vMAJOR.MINOR.PATCH`

- **MAJOR**: Breaking changes
- **MINOR**: New features (backward compatible)
- **PATCH**: Bug fixes (backward compatible)

Examples: `v0.11.0`, `v1.0.0`, `v1.2.3`

## Dependencies

- `pydantic>=2.12.4,<3.0.0` - Data validation and type checking
- `SPARQLWrapper>=2.0.0` - SPARQL endpoint communication
- `rdflib>=6.0.0` - RDF graph parsing and manipulation

## Key Benefits of Pydantic Integration

- **Type Safety**: Fields are validated at runtime against their type annotations
- **Better IDE Support**: Full autocomplete and type hints in modern IDEs
- **Clear Error Messages**: Pydantic provides detailed validation errors
- **Automatic Coercion**: Compatible types are automatically converted (e.g., `"123"` → `123` for int fields)
- **Extra Field Protection**: Unknown fields are rejected by default
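These behaviors come straight from Pydantic. A small stand-alone demonstration using a plain `BaseModel` (SPARQLMojo's `Model` builds on the same machinery; this `Person` is a hypothetical stand-in, not the library's class):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Person(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown fields are rejected
    name: str
    age: int

# Compatible types are coerced: the string "28" becomes the int 28.
p = Person(name="Bob", age="28")
print(p.age, type(p.age).__name__)  # 28 int

# Incompatible values raise a detailed ValidationError.
try:
    Person(name="Bob", age="not a number")
except ValidationError as e:
    print(e.error_count(), "validation error")
```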

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

