Metadata-Version: 2.4
Name: yamldb
Version: 2.0.0
Summary: A file based database using yaml as file format
Author-email: Gregor von Laszewski <laszewski@gmail.com>
Maintainer-email: Gregor von Laszewski <laszewski@gmail.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           Copyright 2017 Gregor von Laszewski, Indiana University
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
Project-URL: Homepage, https://github.com/cloudmesh/yamldb
Project-URL: Documentation, https://github.com/cloudmesh/yamldb/blob/main/README.md
Project-URL: Repository, https://github.com/cloudmesh/yamldb.git
Project-URL: Issues, https://github.com/cloudmesh/yamldb/issues
Project-URL: Changelog, https://github.com/cloudmesh/yamldb/blob/main/CHANGELOG.md
Keywords: helper library,cloudmesh
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Environment :: Other Environment
Classifier: Environment :: Plugins
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: User Interfaces
Classifier: Topic :: System
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: System :: Shells
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cloudmesh-ai-common
Requires-Dist: jmespath
Requires-Dist: ruamel.yaml
Requires-Dist: portalocker
Requires-Dist: click
Requires-Dist: msgpack
Requires-Dist: fastapi
Requires-Dist: uvicorn
Provides-Extra: encrypt
Requires-Dist: cryptography; extra == "encrypt"
Dynamic: license-file

# YamlDB

YamlDB is a lightweight, file-based database that uses YAML for storage. It provides a simple API for managing nested configuration data with support for atomic writes, concurrency locking, and advanced querying.

## Features

- **Nested Key Access**: Use dot-notation (e.g., `user.profile.name`) to get or set values.
- **Atomic Writes**: Ensures data integrity by writing to a temporary file before replacing the original.
- **Concurrency Locking**: Uses system-level advisory locks (`portalocker`) to prevent data corruption during concurrent access.
- **Comment Preservation**: Powered by `ruamel.yaml`, it preserves comments and formatting in your YAML files.
- **Write Optimization**: An `auto_flush` mechanism and `_dirty` flag reduce unnecessary disk I/O.
- **Advanced Querying**: Integrated JMESPath support for complex searches.
- **Type Casting**: Optional explicit casting when storing values (`set(..., cast=int)`) and when retrieving them (`get_as(key, int)`).
- **Transactions**: Atomic bulk updates with full rollback support.
- **CLI Tool**: Manage your YAML databases directly from the terminal.
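
The atomic-write feature follows a common pattern: write the new contents to a temporary file in the same directory, flush it to disk, then rename it over the original with `os.replace`, which Python documents as atomic on POSIX systems. The sketch below uses only the standard library and plain text; the helper name `atomic_write_text` is hypothetical and is not part of the YamlDB API or its actual implementation.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Hypothetical helper: replace `path` with `text` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".yml")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # On any failure, drop the temp file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "cluster_config.yml")
        atomic_write_text(target, "cluster:\n  name: hpc-cluster-01\n")
        with open(target, encoding="utf-8") as f:
            print(f.read(), end="")
```

Because the rename happens only after the new file is fully written, a crash mid-write leaves either the old file or the new one, never a truncated mix.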

## Installation

### Standard Installation
From the root of a local checkout of the repository:
```bash
pip install .
```

### Installation with Encryption Support
To use the `:encrypt:` backend, you need the `cryptography` library:
```bash
pip install ".[encrypt]"
```

## Quick Start

### Programmatic API (Computing Infrastructure Example)

YamlDB is ideal for managing infrastructure manifests, cluster configurations, and node metadata.

```python
from yamldb import YamlDB

# Initialize DB for cluster configuration
db = YamlDB(filename="cluster_config.yml", auto_flush=True)

# Define infrastructure components using dot-notation
db.set("cluster.name", "hpc-cluster-01")
db.set("cluster.nodes.node01.gpu_count", "8", cast=int)
db.set("cluster.nodes.node01.status", "online")
db.set("cluster.nodes.node02.gpu_count", "4", cast=int)
db.set("cluster.nodes.node02.status", "maintenance")

# Retrieve infrastructure details
gpu_count = db.get_as("cluster.nodes.node01.gpu_count", int)
status = db.get("cluster.nodes.node01.status")

# Advanced Search (JMESPath)
# Find all nodes that are currently 'online'.
# cluster.nodes is a mapping, so project its values (*) into a list before filtering.
online_nodes = db.search("cluster.nodes.* | [?status=='online']")

# Bulk Updates in a Transaction (e.g., updating cluster version)
with db.transaction():
    db.set("cluster.version", "2.4.1")
    db.set("cluster.last_updated", "2026-04-27")
    # If an exception is raised inside the block, neither change is kept
```
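
One common way to get all-or-nothing behavior like `db.transaction()` is to snapshot the data before the block, suppress flushing while it runs, and restore the snapshot if an exception escapes. The sketch below shows that idea on a plain dictionary; the `TinyStore` class and its methods are hypothetical and only illustrate the technique, not YamlDB's internals.

```python
import copy
from contextlib import contextmanager


class TinyStore:
    """Hypothetical in-memory store illustrating snapshot-based transactions."""

    def __init__(self):
        self.data = {}
        self.flushes = 0
        self._in_transaction = False

    def set(self, key, value):
        # Walk/create nested mappings for a dotted key, then assign the leaf.
        node = self.data
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
        if not self._in_transaction:
            self.flush()  # auto-flush outside transactions

    def flush(self):
        # Stand-in for writing the file to disk.
        self.flushes += 1

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy(self.data)  # state to restore on failure
        self._in_transaction = True
        try:
            yield self
        except BaseException:
            self.data = snapshot  # roll back every change made in the block
            raise
        else:
            self.flush()  # one write for the whole batch
        finally:
            self._in_transaction = False


store = TinyStore()
store.set("cluster.version", "2.4.0")
try:
    with store.transaction():
        store.set("cluster.version", "2.4.1")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
print(store.data["cluster"]["version"])  # 2.4.0
```

A side benefit of deferring the flush is that a successful transaction costs a single disk write instead of one per `set()`.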

### CLI Usage

The `yamldb` command-line tool lets you read, modify, and query your YAML databases directly from the terminal.

#### General Usage
```bash
yamldb [OPTIONS] COMMAND [ARGS]...
```

#### Commands

**`get`**: Retrieve a value using dot-notation.
```bash
yamldb get <file> <key>
# Example: yamldb get config.yml user.profile.name
```

**`set`**: Set a value. Automatically creates parent keys if they don't exist.
```bash
yamldb set <file> <key> <value>
# Example: yamldb set config.yml app.version 1.2.0
```
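
Setting a dotted key such as `app.version` amounts to walking the nested mappings one segment at a time and creating any missing intermediate mappings along the way. A minimal sketch on plain dictionaries, using hypothetical helper names `set_dotted` and `get_dotted` that are not part of YamlDB's API:

```python
from typing import Any


def set_dotted(data: dict, key: str, value: Any) -> None:
    """Hypothetical: set data[a][b][c] = value for key 'a.b.c', creating missing parents."""
    *parents, leaf = key.split(".")
    node = data
    for part in parents:
        node = node.setdefault(part, {})  # create the parent mapping if absent
        if not isinstance(node, dict):
            raise TypeError(f"{part!r} holds a scalar, cannot descend into it")
    node[leaf] = value


def get_dotted(data: dict, key: str, default: Any = None) -> Any:
    """Hypothetical: return the value at 'a.b.c', or `default` if any segment is missing."""
    node: Any = data
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config: dict = {}
set_dotted(config, "app.version", "1.2.0")
print(config)  # {'app': {'version': '1.2.0'}}
print(get_dotted(config, "app.version"))  # 1.2.0
```

This sketch refuses to overwrite a scalar that sits where a mapping is needed; silently replacing it would be the other reasonable design choice.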

**`delete`**: Remove a key from the database.
```bash
yamldb delete <file> <key>
# Example: yamldb delete config.yml user.old_setting
```

**`search`**: Query the database using JMESPath expressions.
```bash
yamldb search <file> <query>
# Example: yamldb search cluster_config.yml "cluster.nodes.* | [?status=='online']"
```
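
JMESPath filter expressions (`[?...]`) only operate on lists. When the data is a mapping of named entries, such as `cluster.nodes` in the Quick Start, its values must first be projected into a list, for example with `cluster.nodes.* | [?status=='online']`. The plain-Python sketch below computes the same result without the `jmespath` package, to show what the query does; `filter_by_status` is a hypothetical name.

```python
def filter_by_status(data: dict, status: str) -> list:
    """Hypothetical plain-Python equivalent of: cluster.nodes.* | [?status=='<status>']"""
    nodes = data.get("cluster", {}).get("nodes", {})
    # `*` projects the mapping's values into a list (null values are dropped) ...
    candidates = [node for node in nodes.values() if node is not None]
    # ... and `[?status=='...']` keeps only the entries whose field matches.
    return [n for n in candidates if isinstance(n, dict) and n.get("status") == status]


cluster = {
    "cluster": {
        "name": "hpc-cluster-01",
        "nodes": {
            "node01": {"gpu_count": 8, "status": "online"},
            "node02": {"gpu_count": 4, "status": "maintenance"},
        },
    }
}
print(filter_by_status(cluster, "online"))  # [{'gpu_count': 8, 'status': 'online'}]
```

The node names (`node01`, `node02`) are not part of the result, because the projection keeps only the values. With the `jmespath` package installed, `jmespath.search("cluster.nodes.* | [?status=='online']", cluster)` returns the same list.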

**`stats`**: Display write efficiency and I/O statistics.
```bash
yamldb stats <file>
# Example: yamldb stats config.yml
```

## Advanced API Reference

### `items_recursive()`
A generator that yields all leaf nodes in the database as `(dot_notation_key, value)` pairs. Useful for auditing entire infrastructure states.
```python
for key, value in db.items_recursive():
    print(f"{key}: {value}")
# Prints one line per leaf, e.g. cluster.nodes.node01.gpu_count: 8
```
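
A leaf walker like `items_recursive()` can be written as a recursive generator that extends the dotted prefix at each level and yields only leaf values. The sketch below is a hypothetical stand-alone function on plain dictionaries; it shows the technique and is not YamlDB's implementation (for example, it treats lists and empty mappings as leaves).

```python
from typing import Any, Iterator, Tuple


def iter_leaves(data: dict, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Hypothetical: yield (dotted_key, value) for every leaf; empty mappings count as leaves."""
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Descend into non-empty mappings; their children carry the longer path.
            yield from iter_leaves(value, path)
        else:
            yield path, value


cluster = {"cluster": {"name": "hpc-cluster-01",
                       "nodes": {"node01": {"gpu_count": 8, "status": "online"}}}}
for key, value in iter_leaves(cluster):
    print(f"{key}: {value}")
# cluster.name: hpc-cluster-01
# cluster.nodes.node01.gpu_count: 8
# cluster.nodes.node01.status: online
```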

### `find_all(value)` & `filter(predicate)`
Quickly locate infrastructure components based on their state.
```python
# Find every entry whose value is 'maintenance' (e.g., node status fields)
maintenance_nodes = db.find_all("maintenance")

# Find all integer values greater than 4 (e.g., nodes with more than 4 GPUs)
high_capacity_nodes = db.filter(lambda v: isinstance(v, int) and v > 4)
```
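
Both lookups can be expressed as a filter over the database's leaves: `find_all(value)` is the special case whose predicate is an equality test. The sketch below returns matching dotted keys; the function names `filter_keys` and `find_all_keys`, and the choice to return keys, are assumptions for illustration, since the return type of YamlDB's own methods is not documented here.

```python
from typing import Any, Callable, Iterator, List, Tuple


def _leaves(data: dict, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    # Yield (dotted_key, value) pairs for every leaf value.
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            yield from _leaves(value, path)
        else:
            yield path, value


def filter_keys(data: dict, predicate: Callable[[Any], bool]) -> List[str]:
    """Hypothetical: dotted keys of all leaves whose value satisfies `predicate`."""
    return [key for key, value in _leaves(data) if predicate(value)]


def find_all_keys(data: dict, target: Any) -> List[str]:
    """Hypothetical: dotted keys of all leaves equal to `target`."""
    return filter_keys(data, lambda v: v == target)


nodes = {"cluster": {"nodes": {
    "node01": {"gpu_count": 8, "status": "online"},
    "node02": {"gpu_count": 4, "status": "maintenance"},
}}}
print(find_all_keys(nodes, "maintenance"))  # ['cluster.nodes.node02.status']
print(filter_keys(nodes, lambda v: isinstance(v, int) and v > 4))  # ['cluster.nodes.node01.gpu_count']
```

The predicate sees only values, not keys, so `v > 4` matches any integer leaf above 4, not just GPU counts.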

### `update_many(data_dict)`
Perform multiple infrastructure updates atomically.
```python
db.update_many({
    "cluster.nodes.node01.status": "offline",
    "cluster.nodes.node01.last_reboot": "2026-04-27",
    "cluster.global.maintenance_mode": True
})
```
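
One way to make a batch of dotted-key updates all-or-nothing is copy-on-write: apply every update to a deep copy and only swap the copy in once all of them succeed. The `update_many_copy` function below is a hypothetical sketch of that technique on plain dictionaries, not YamlDB's code.

```python
import copy
from typing import Any, Dict


def update_many_copy(data: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: return a new mapping with every dotted-key update applied.

    If any update fails, the exception propagates and `data` is unchanged,
    because all work happens on a deep copy.
    """
    draft = copy.deepcopy(data)
    for key, value in updates.items():
        *parents, leaf = key.split(".")
        node = draft
        for part in parents:
            node = node.setdefault(part, {})  # create missing parents in the draft
            if not isinstance(node, dict):
                raise TypeError(f"cannot descend into scalar at {part!r} in {key!r}")
        node[leaf] = value
    return draft


state = {"cluster": {"nodes": {"node01": {"status": "online"}}}}
state = update_many_copy(state, {
    "cluster.nodes.node01.status": "offline",
    "cluster.global.maintenance_mode": True,
})
print(state["cluster"]["nodes"]["node01"]["status"])  # offline
```

The caller only rebinds `state` after the function returns, so a failure halfway through the batch never exposes a partially updated mapping.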

### Wildcard Retrieval
You can use the `*` wildcard in `get()` or via bracket access to retrieve multiple values across the database. This is powered by JMESPath under the hood.

```python
# Get the status of ALL nodes in the cluster
# Returns a list, e.g. ['online', 'maintenance'] for the two Quick Start nodes
statuses = db.get("cluster.nodes.*.status")

# Get the GPU count for all nodes
# Returns a list, e.g. [8, 4]
gpu_counts = db["cluster.nodes.*.gpu_count"]
```
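
A `*` segment in a dotted path fans out over every child of the mapping at that level, so `cluster.nodes.*.status` visits `status` under each node and skips nodes that lack it. The sketch below expands such paths in plain Python; `get_wildcard` is a hypothetical helper, and it flattens all matches into a single list.

```python
from typing import Any, List


def get_wildcard(data: Any, path: str) -> List[Any]:
    """Hypothetical: collect every value matching a dotted path where '*' matches any key."""
    matches = [data]
    for part in path.split("."):
        next_matches = []
        for node in matches:
            if not isinstance(node, dict):
                continue  # scalars have no children to descend into
            if part == "*":
                next_matches.extend(node.values())
            elif part in node:
                next_matches.append(node[part])
        matches = next_matches
    return matches


cluster = {"cluster": {"nodes": {
    "node01": {"gpu_count": 8, "status": "online"},
    "node02": {"gpu_count": 4, "status": "maintenance"},
}}}
print(get_wildcard(cluster, "cluster.nodes.*.status"))     # ['online', 'maintenance']
print(get_wildcard(cluster, "cluster.nodes.*.gpu_count"))  # [8, 4]
```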

### Write Efficiency (`get_stats`)
Track how many disk writes were avoided thanks to the `_dirty` flag.
```python
stats = db.get_stats()
print(f"Write Efficiency: {stats['write_efficiency']}")
```
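
The idea behind a `_dirty` flag is that a flush only touches the disk when something changed since the last write; otherwise the write is skipped and counted. The class below is a hypothetical sketch of that bookkeeping, with flat keys for brevity. The counter names and the efficiency formula (skipped flushes divided by requested flushes) are assumptions for illustration; YamlDB's `get_stats()` may define `write_efficiency` differently.

```python
class DirtyTracker:
    """Hypothetical sketch: skip disk writes when nothing has changed."""

    def __init__(self):
        self.data = {}
        self._dirty = False
        self.flush_requests = 0
        self.writes_performed = 0

    def set(self, key, value):
        # Mark dirty only when the stored value actually changes.
        if key not in self.data or self.data[key] != value:
            self.data[key] = value
            self._dirty = True
        self.flush()  # behaves like auto_flush=True

    def flush(self):
        self.flush_requests += 1
        if not self._dirty:
            return  # nothing changed: avoid the disk write
        self.writes_performed += 1  # stand-in for writing the file
        self._dirty = False

    def get_stats(self):
        skipped = self.flush_requests - self.writes_performed
        efficiency = skipped / self.flush_requests if self.flush_requests else 0.0
        return {
            "flush_requests": self.flush_requests,
            "writes_performed": self.writes_performed,
            "writes_skipped": skipped,
            "write_efficiency": f"{efficiency:.0%}",
        }


tracker = DirtyTracker()
tracker.set("metrics.cpu", 45)
tracker.set("metrics.cpu", 45)  # same value: flush requested but skipped
tracker.set("metrics.cpu", 50)
tracker.set("metrics.cpu", 50)
print(tracker.get_stats()["write_efficiency"])  # 50%
```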

## Configuration

- `filename`: Path to the YAML file.
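- `backend`:
    - `:file:` (default): Standard human-readable YAML storage.
    - `:memory:`: In-memory storage (no disk I/O).
    - `:binary:`: High-performance binary storage using JSON serialization.
    - `:encrypt:` (experimental): Encrypted storage; requires the `encrypt` extra and a `password`.
- `password`: Password used to encrypt and decrypt the file with the `:encrypt:` backend.
- `auto_flush`: If `True` (default), changes are written to disk immediately unless inside a transaction.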

## Advanced Features

### Binary Storage
For applications requiring high performance and smaller file sizes, use the `:binary:` backend.
```python
db = YamlDB(filename="data.bin", backend=":binary:")
db.set("metrics.cpu", 45)

# Export binary data to human-readable YAML for debugging
db.convert_to_yaml("debug_export.yml")
```

### Secure Storage (Encryption)
For sensitive data, use the `:encrypt:` backend. This encrypts the **entire database file** (including keys and structure) using AES-128 symmetric encryption.

> **Note**: The `:encrypt:` backend is currently **experimental**. We are actively refining its implementation and would greatly appreciate your feedback!

```python
# Initialize an encrypted database
db = YamlDB(
    filename="secrets.enc", 
    backend=":encrypt:", 
    password="your-strong-password"
)

# Use it exactly like a normal YamlDB
db.set("cluster.admin_password", "super-secret-123")
db.set("cluster.api_key", "abc-123-def-456")

# The file 'secrets.enc' is now a binary blob that is unreadable 
# without the correct password.
```
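
Password-based encryption needs a step that turns the password into a fixed-size key. A common approach, assumed here rather than taken from YamlDB's source, is to stretch the password with PBKDF2-HMAC-SHA256 and a random salt stored alongside the ciphertext. The 32-byte, URL-safe base64 key this produces is the format accepted by `cryptography.fernet.Fernet`, which combines AES-128-CBC with an HMAC-SHA256 authentication tag. The helper `derive_key` and its default iteration count are hypothetical, illustrative choices.

```python
import base64
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Hypothetical: stretch a password into a URL-safe base64-encoded 32-byte key.

    The output format matches what cryptography.fernet.Fernet accepts.
    """
    raw = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


salt = os.urandom(16)  # store the salt next to the ciphertext; it is not secret
key = derive_key("your-strong-password", salt)
print(len(base64.urlsafe_b64decode(key)))  # 32

# With the optional dependency installed (pip install ".[encrypt]"):
#   from cryptography.fernet import Fernet
#   token = Fernet(key).encrypt(b"cluster:\n  api_key: abc-123-def-456\n")
```

The same password and salt always yield the same key, which is what lets the file be decrypted later; a fresh random salt per file prevents identical passwords from producing identical keys.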

### Web UI Prototype
YamlDB comes with a lightweight Web UI for visual data management.

**To run the Web UI:**
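1. Make sure the dependencies are available: `fastapi` and `uvicorn` are installed together with `yamldb`; if they are missing, run `pip install fastapi uvicorn`.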
2. Run the server: `python yamldb/bin/run_webui.py`
3. Open your browser to `http://localhost:8000`

The Web UI allows you to browse the database tree, set/delete values via dot-notation, and monitor write efficiency in real time.
