Metadata-Version: 2.1
Name: sdata-core
Version: 0.1.8
Summary: Structured data format with modern metadata system, Dublin Core and JSON-LD support
License: MIT
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: python-dateutil>=2.8
Requires-Dist: sortedcontainers>=2.4
Description-Content-Type: text/markdown

# sdata_core

A structured data format with a modern metadata system, Dublin Core support, and JSON-LD export.

## Features

- **Type-safe Metadata**: Dataclass-based `Attribute` and `Metadata` classes
- **Dublin Core Support**: Built-in vocabulary mapping for scientific data
- **JSON-LD Export**: Semantic-web-compatible output
- **DataFrame Integration**: Convert metadata to and from pandas DataFrames
- **SUUID**: Semantic UUIDs for reproducible identification
- **JSON Schema**: Auto-generated validation schemas

## Installation

```bash
pip install sdata_core
```

Or with uv:

```bash
uv add sdata_core
```

## Quick Start

### Basic Metadata

```python
from sdata_core import Attribute, Metadata, DType

# Create a standalone attribute
attr = Attribute(
    name="temperature",
    value=293.15,
    dtype=DType.FLOAT,
    unit="K",
    description="Sample temperature"
)

# Create metadata container
meta = Metadata(name="Experiment 001")
meta.set_attr("force", 5000.0, unit="N", dtype=DType.FLOAT)
meta.set_attr("material", "DP800 Steel")
meta.set_attr("valid", True, dtype=DType.BOOL)

# Access attributes
print(meta["force"].value)  # 5000.0
print(meta.keys())  # ['force', 'material', 'valid']
```

### Serialization

```python
# `meta` is the Metadata container built in "Basic Metadata" above
# JSON export (NaN-safe)
json_str = meta.to_json()

# DataFrame export
df = meta.to_dataframe()
print(df)
#              name        value unit   dtype
# key
# force        force       5000.0    N   float
# material  material  DP800 Steel    -     str
# valid        valid         True    -    bool

# Round-trip
meta2 = Metadata.from_json(json_str)
meta3 = Metadata.from_dataframe(df)
```
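
The "NaN-safe" note matters because Python's `json.dumps` writes `NaN` and `Infinity` by default, and strict JSON parsers reject both. How `to_json()` handles this internally is not described here. The sketch below shows one common approach using only the standard library: replace non-finite floats with `None` before encoding. The helper name `nan_to_none` is hypothetical and not part of sdata_core.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace non-finite floats (NaN, inf, -inf) with None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {key: nan_to_none(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(item) for item in value]
    return value


record = {"force": float("nan"), "material": "DP800 Steel", "valid": True}

# allow_nan=False makes json.dumps raise ValueError on non-finite floats,
# so a successful call confirms the output is strict JSON.
strict_json = json.dumps(nan_to_none(record), allow_nan=False)
print(strict_json)  # {"force": null, "material": "DP800 Steel", "valid": true}
```

The trade-off of this approach is that `NaN` and a genuinely missing value both become `null`, so the distinction is lost on a round trip.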

### Dublin Core Integration

```python
from sdata_core import Metadata, DublinCore, add_dc_attribute

meta = Metadata(name="Research Dataset")

# Add Dublin Core metadata
add_dc_attribute(meta, "title", "Tensile Test Results")
add_dc_attribute(meta, "creator", "Dr. Jane Smith")
add_dc_attribute(meta, "identifier", "doi:10.1234/example")

# Get Dublin Core representation
dc_dict = DublinCore.to_dc_dict(meta)
print(dc_dict)
# {'dc:title': 'Tensile Test Results', 'dc:creator': 'Dr. Jane Smith', ...}
```

### JSON-LD Export

```python
import json

jsonld = meta.to_jsonld()
print(json.dumps(jsonld, indent=2))
# {
#   "@context": {
#     "@vocab": "https://schema.org/",
#     "dc": "http://purl.org/dc/elements/1.1/",
#     ...
#   },
#   "@type": "sdata_core:Metadata",
#   ...
# }
```

### Type-Annotated Fields

```python
from typing import Annotated
from sdata_core import FieldMeta, create_attribute_from_annotated

# Define typed field with metadata
Temperature = Annotated[float, FieldMeta(
    unit="K",
    description="Temperature measurement",
    ontology="http://purl.obolibrary.org/obo/PATO_0000146"
)]

# Create attribute from annotated type
attr = create_attribute_from_annotated("sample_temp", 293.15, Temperature)
print(attr.unit)      # "K"
print(attr.ontology)  # "http://purl.obolibrary.org/obo/PATO_0000146"
```
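
For readers unfamiliar with `typing.Annotated`, the following standard-library sketch shows how metadata attached to a type can be read back at runtime. It is presumably similar in spirit to what `create_attribute_from_annotated` does, but that is an assumption. `UnitInfo` and `read_annotation` are hypothetical stand-ins, not sdata_core names.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin


@dataclass(frozen=True)
class UnitInfo:
    """Hypothetical stand-in for a field-metadata object (not FieldMeta)."""
    unit: str
    description: str


Temperature = Annotated[
    float, UnitInfo(unit="K", description="Temperature measurement")
]


def read_annotation(annotated_type):
    """Return (base_type, first UnitInfo found or None) for an Annotated type."""
    if get_origin(annotated_type) is not Annotated:
        raise TypeError("expected a typing.Annotated type")
    base_type, *extras = get_args(annotated_type)
    info = next((e for e in extras if isinstance(e, UnitInfo)), None)
    return base_type, info


base_type, info = read_annotation(Temperature)
print(base_type.__name__, info.unit)  # float K
```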

### Semantic UUIDs (SUUID)

```python
from sdata_core import SUUID

# Create deterministic SUUID from name
sid = SUUID.from_name(class_name="Experiment", name="Test 001")
print(sid.sname)  # "Experiment__test_001__<uuid>"
print(sid.did)    # "did:sdata_core-suuid:Experiment__test_001__<uuid>"

# Random SUUID
sid2 = SUUID(class_name="Data", name="sample")
print(sid2.huuid)  # Random 32-char hex string
```
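
To illustrate the difference between deterministic and random identifiers, here is a minimal standard-library sketch based on `uuid.uuid5` (name-based) and `uuid.uuid4` (random). It is not necessarily the scheme SUUID uses. The namespace and the `deterministic_hex` helper are assumptions made for this example only.

```python
import uuid

# Hypothetical namespace, chosen for this illustration only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def deterministic_hex(class_name: str, name: str) -> str:
    """Derive a stable 32-character hex ID from a class name and a name."""
    return uuid.uuid5(NAMESPACE, f"{class_name}:{name}").hex


first = deterministic_hex("Experiment", "Test 001")
second = deterministic_hex("Experiment", "Test 001")
other = deterministic_hex("Experiment", "Test 002")

print(first == second)        # True: same inputs give the same ID
print(first == other)         # False: a different name gives a different ID
print(len(uuid.uuid4().hex))  # 32: a random UUID has the same hex length
```

Name-based IDs let two independent runs agree on an identifier without coordination, while random IDs avoid collisions between objects that share a name.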

### JSON Schema Generation

```python
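# jsonschema is a third-party package that is not among this package's
# declared dependencies; install it separately (pip install jsonschema).
import jsonschema

schema = Metadata.get_schema()
print(schema["title"])  # "sdata_core Metadata Schema"

# Validate the dict form of an existing Metadata instance (`meta` from the
# examples above) against the generated schema
data = meta.to_dict()
jsonschema.validate(instance=data, schema=schema)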
```

## Supported Data Types

| DType | Python Type | Description |
|-------|-------------|-------------|
| `DType.FLOAT` | `float` | Floating point numbers |
| `DType.INT` | `int` | Integers |
| `DType.STR` | `str` | Strings |
| `DType.BOOL` | `bool` | Booleans |
| `DType.TIMESTAMP` | `datetime` | ISO 8601 timestamps |
| `DType.LIST` | `list[str]` | List of strings |
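
When mapping Python values onto a table like this, check order matters because `bool` is a subclass of `int` in Python. The sketch below is a standalone illustration of that pitfall. The function `infer_dtype_label` and its string labels are hypothetical, and whether sdata_core infers types this way when `dtype` is omitted is not stated here.

```python
from datetime import datetime


def infer_dtype_label(value) -> str:
    """Return an illustrative dtype label for a Python value."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return "list"
    if isinstance(value, str):
        return "str"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


print(infer_dtype_label(True))           # bool
print(infer_dtype_label(5000))           # int
print(infer_dtype_label(293.15))         # float
print(infer_dtype_label("DP800 Steel"))  # str
```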

## Export Formats

- **JSON**: `to_json()` / `from_json()`
- **DataFrame**: `to_dataframe()` / `from_dataframe()`
- **CSV**: `to_csv()` / `from_csv()`
- **JSON-LD**: `to_jsonld()` / `from_jsonld()`
- **Dict**: `to_dict()` / `from_dict()`

## License

MIT License

