Metadata-Version: 2.4
Name: mcard
Version: 0.1.23
Summary: MCard: Local-first Content Addressable Storage with Content Type Detection
Home-page: https://github.com/xlp0/MCard_TDD
Author: Ben Koo
Author-email: Ben Koo <koo0905@gmail.com>
Project-URL: Homepage, https://github.com/xlp0/MCard_TDD
Project-URL: Source, https://github.com/xlp0/MCard_TDD
Project-URL: Tracker, https://github.com/xlp0/MCard_TDD/issues
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: python-dateutil>=2.9.0.post0
Requires-Dist: SQLAlchemy>=1.4.47
Requires-Dist: aiosqlite>=0.17.0
Requires-Dist: python-dotenv>=1.1.0
Requires-Dist: chardet>=5.1.0
Requires-Dist: structlog>=23.2.0
Requires-Dist: python-json-logger>=3.0.0
Requires-Dist: colorlog>=6.7.0
Requires-Dist: PyYAML>=6.0.0
Requires-Dist: wasmtime>=39.0.0
Provides-Extra: xml
Requires-Dist: lxml>=4.9.0; extra == "xml"
Provides-Extra: dev
Requires-Dist: pytest<9,>=8.2.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.8; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: mypy>=1.7.1; extra == "dev"
Requires-Dist: black>=23.11.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: twine>=6.1.0; extra == "dev"
Requires-Dist: lxml>=4.9.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# MCard: Local-First Content Addressable Storage

[![Python Version](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![CI/CD Pipeline](https://github.com/xlp0/MCard_TDD/actions/workflows/ci.yml/badge.svg)](https://github.com/xlp0/MCard_TDD/actions/workflows/ci.yml)
[![Quality Gate](https://img.shields.io/badge/quality-enterprise-green)](https://github.com/xlp0/MCard_TDD)

MCard is a powerful Python library implementing an algebraically closed data structure for content-addressable storage. It provides a robust system where every piece of content is uniquely identified by its cryptographic hash and temporally ordered, enabling content verification, deduplication, and versioning.

The system features a modular architecture with support for multiple content types and a flexible database backend (SQLite).

## Executive Summary

- Implements content-addressable storage with guaranteed integrity and temporal ordering
- Built for performance: binary-first storage with smart text views when needed
- Developer-friendly: rich Python API, examples, tests, and BMAD-driven TDD workflow
- Enterprise-grade: comprehensive logging, CI/CD pipeline, security auditing, and quality gates
- Production-ready: 99.4% test success rate, zero breaking changes, modern tooling (ruff, uv)
- Quick start and test in minutes with `uv` and provided scripts

## 🔮 Future Vision: Towards Verifiable Execution with PTR and Audited Collections

MCard is evolving beyond content-addressable storage to become the foundational layer for a **Polynomial Type Runtime (PTR)** and **Audited MCard Collections**. This strategic direction integrates advanced concepts from the **Cubical Logic Model (CLM)** and **Purely Functional Software Deployment (PFSD)** to deliver a system with mathematically verifiable execution, immutable audit trails, and content-driven programming capabilities.

### Key Capabilities in Development:

- **PTR-Verified Execution**: All critical operations will be mediated by PTR, ensuring that only formally verified PCards (programs defined by CLM's Abstract, Concrete, and Balanced dimensions) are executed. This guarantees correctness and prevents unverified code from running. The runtime may evolve from Python to formal verification systems like **LEAN** for mathematical proof of correctness (see PRD Section 12.4).
- **Immutable Audit Fabric**: Every significant system event, policy decision, and execution trace will be captured as an **Audited MCard**. These collections form a tamper-evident, content-addressed evidence ledger, enabling provable accountability and compliance.
- **Unified Observability**: PTR and audit pipelines will emit OpenTelemetry (OTel)-compatible traces, metrics, and logs, enabling vendor-neutral integration with existing observability stacks (Prometheus, Jaeger, Grafana, commercial APMs) as described in the PRD's observability requirements.
- **Content-Driven Programming**: Leveraging CLM, programs themselves will be represented as graphs of hash-addressed MCards and PCards, moving away from traditional file-based code towards a verifiable, compositional programming model.
- **Polyglot Execution**: The PTR now supports multi-language execution (Python, JavaScript, Rust, C, Wasm), allowing developers to write PCards in their preferred language while maintaining a unified verification and execution interface.
- **Evolutionary Architecture**: MCard now implements an evolutionary model for PCards, distinguishing between persistent **PCard Identities** (upgradable contracts) and immutable **PCard Snapshots** (versioned logic), enabling safe upgrades and forking of logic models.
- **Reproducible Deployment**: Adhering to PFSD principles, the entire software lifecycle (build, verify, deploy) will be driven by content-addressed artifacts and pure functions, ensuring full reproducibility and auditability of all system states.

#### High-Level Observability Architecture

```mermaid
graph LR
  PTR[PTR Runtime] --> OTel[OpenTelemetry Collector]
  OTel --> Prom[Prometheus / Metrics]
  OTel --> Jaeger[Jaeger / Traces]
  OTel --> Graf[Grafana / APM / Dashboards]
```

This evolution transforms MCard into a self-hosting execution kernel, where correctness, security, and auditability are mathematical properties of the system, not just operational conventions. For detailed product requirements, see the [MCard PRD](docs/prd.md).

## 🌌 The Prologue of Spacetime: A Monadic Narrative

We are currently operationalizing the **Meta-Narrative Framework** through the **Prologue of Spacetime**, a twelve-chapter execution plan that builds the universe of the project using the **Cubical Logic Model (CLM)**.

### The Monadic Template
To support this, we have introduced the **Narrative Monad** (`mcard/ptr/core/clm_template.py`), a functional programming template that composes:
*   **Reader Monad**: Carries the Cultural Context (Tri Hita Karana) and Configuration.
*   **State Monad**: Carries the evolving World State (Village Prosperity, Network Topology).
*   **Writer Monad**: Accumulates the Log of the journey (The Story Text, The Audit Trail).
*   **IO Monad**: Handles the Effects at the boundaries (User Interaction, System Deployment).

This template allows us to write each chapter as a pure function: `Chapter :: Context -> State -> (Result, NewState, Log)`.
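The composition can be sketched in plain Python (an illustrative sketch, not the actual `clm_template.py` API; the `compose` and `count` names here are made up for the example):

```python
from typing import Any, Callable

# A chapter is a pure function: Context -> State -> (Result, NewState, Log)
Chapter = Callable[[dict, dict], tuple]

def compose(first: Chapter, second: Chapter) -> Chapter:
    """Sequence two chapters: the context (Reader) is shared read-only,
    the state (State) is threaded through, and the logs (Writer) append."""
    def combined(ctx: dict, state: dict) -> tuple:
        _, state1, log1 = first(ctx, state)
        result, state2, log2 = second(ctx, state1)
        return result, state2, log1 + log2
    return combined

# Illustrative chapter: counting advances the state and writes to the log.
def count(ctx: dict, state: dict) -> tuple:
    n = state.get("count", 0) + 1
    return n, {**state, "count": n}, [f"{ctx['village']}: counted to {n}"]

story = compose(count, count)
result, final_state, log = story({"village": "the village"}, {})
```

Note that the IO boundary stays outside `compose`: everything inside is pure, so chapters can be tested without effects.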

### Chapter 0 (Prologue): The Value of Counting
The prologue chapter, "The Value of Counting," establishes the **MVP Card: The Counter**. It uses the **Water Clock** HyperCard prototype to demonstrate how the act of observation (IO) creates discrete identity (State) and fairness (Writer).

*   **Specification**: `chapters/chapter_00_prologue/mvp_counter.yaml`
*   **Prototype**: `chapters/chapter_00_prologue/water_clock.py`

### Chapter 1: Resource-Aware Computation (Arithmetic)
This chapter demonstrates **polyglot consensus** across multiple runtimes (Python, JavaScript, Rust, C, WebAssembly, Lean). It implements the Cubical Logic Model through arithmetic operations, proving that truth is invariant across representations.

*   **CLM Specifications**: `chapters/chapter_01_arithmetic/*.yaml`
*   **Runtimes**: Python, JavaScript, Rust, C, WASM, Lean 4

### Chapter 2: Content Addressing (Handle)
This chapter introduces the **Content Handle** system, enabling human-readable names to reference content-addressed MCards. It demonstrates the duality between immutable hash-based retrieval and mutable handle-based resolution.

*   **CLM Specifications**: `chapters/chapter_02_handle/*.clm`
*   **Test Data**: `chapters/chapter_02_handle/test_data/*.yaml`
*   **Key Features**:
    - **UTF-8 Handles**: International characters supported (文檔, مستند, ドキュメント, документ)
    - Handle validation uses Unicode categories (letters, digits, `_`, `-`)
    - NFC normalization + casefold for consistent lookup
    - Version history tracking
    - Dual retrieval (hash vs handle)
*   **Monadic API**: `get_by_handle_m()`, `resolve_handle_m()` return `Maybe` monad for functional composition
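The normalization step can be illustrated with the standard library (a behavioral sketch of the feature list above, not the library's internal code):

```python
import unicodedata

def normalize_handle(handle: str) -> str:
    """NFC-normalize and casefold a handle for consistent lookup,
    as described in the feature list above."""
    return unicodedata.normalize("NFC", handle).casefold()

# Two byte-level spellings of the same handle resolve to one key:
composed = "caf\u00e9-docs"     # precomposed 'é'
decomposed = "cafe\u0301-docs"  # 'e' + combining acute accent
assert normalize_handle(composed) == normalize_handle(decomposed)
assert normalize_handle("Doc-文檔") == normalize_handle("doc-文檔")
```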

### Chapter 3: LLM Integration
This chapter introduces the **LLM Runtime**, enabling Large Language Model execution as a first-class CLM runtime. It demonstrates monadic LLM interactions with local models via Ollama.

*   **CLM Specifications**: `chapters/chapter_03_llm/*.clm`
*   **Providers**: Ollama (default), LMStudio, OpenAI (extensible)
*   **Default Models**: `gemma3:latest`, `llama3:latest`, `qwen3:latest`
*   **Key Features**:
    - **Monadic Interface**: `prompt_monad()`, `chat_monad()` return `IO[Either]` for composition
    - **System Prompts**: Full chat completion support with system/user/assistant roles
    - **Structured Output**: JSON extraction and entity recognition
    - **File Summarization**: Summarize `.md` and `.py` files with configurable styles
*   **Demo Scripts**:
    - `scripts/demo_llm_runtime.py` - Interactive demos
    - `chapters/chapter_03_llm/file_summarizer_logic.py` - File summarization
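As a rough sketch of the `IO[Either]` shape (the real `prompt_monad()` calls an LLM provider; here the effect is stubbed out, and everything except the `prompt_monad` name is illustrative):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Right:
    value: Any

@dataclass
class Left:
    error: str

class IO:
    """Defer an effect until run() is called."""
    def __init__(self, thunk: Callable[[], Any]):
        self._thunk = thunk

    def run(self) -> Any:
        return self._thunk()

    def map(self, f: Callable[[Any], Any]) -> "IO":
        return IO(lambda: f(self.run()))

def prompt_monad(prompt: str) -> IO:
    # Stub: a real implementation would send the prompt to Ollama and
    # return Left(error) on failure instead of raising.
    return IO(lambda: Right(f"echo: {prompt}"))

# Composition happens before any effect runs:
pipeline = prompt_monad("hello").map(
    lambda e: Right(e.value.upper()) if isinstance(e, Right) else e
)
result = pipeline.run()
assert isinstance(result, Right)
```

Because failures travel as `Left` values rather than exceptions, downstream steps compose without try/except scaffolding.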

### Chapter 4: High-Performance Data Loading
This chapter enables **bulk data ingestion** and **performance benchmarking**, dealing with both text and binary data (images, audio). It demonstrates recursive loading and mixed-content handling.

*   **CLM Specifications**: `chapters/chapter_04_load_dir/*.clm`
    *   `binary_loader.clm`: Benchmarks loading binary datasets (Images, Audio).
    *   `recursive_options.clm`: Demonstrates control over recursive vs. flat directory scanning.
    *   `hub_loader.clm`: Benchmarks loading text datasets.
*   **Logic Implementation**: `chapters/chapter_04_load_dir/loader_logic.py`
    *   **Features**: Recursive scanning, binary detection, vector embedding skipping for non-text.
    *   **Metrics**: Throughput (Files/sec, MB/sec), Retrieval Latency.
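The throughput metrics reduce to a simple calculation (a generic sketch; `load_fn` is a placeholder for whatever ingests one file and returns its size in bytes):

```python
import time

def measure_throughput(load_fn, files):
    """Compute Files/sec and MB/sec for a bulk load, as reported by the
    chapter's benchmarks. load_fn(f) ingests one item and returns bytes."""
    start = time.perf_counter()
    total_bytes = sum(load_fn(f) for f in files)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard zero elapsed
    return len(files) / elapsed, total_bytes / (1024 * 1024) / elapsed

# Example with an in-memory stand-in for file ingestion:
files = [b"a" * 1024] * 100
files_per_sec, mb_per_sec = measure_throughput(len, files)
```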

### Chapter 5: Reflection & Metacognition
This chapter introduces **Systemic Self-Awareness**. It uses CLMs to analyze *other* CLMs, creating a closed loop where the system understands its own structure.

*   **CLM Specifications**: `chapters/chapter_05_reflection/*.clm`
    *   `clm_inventory.clm`: Scans and catalogs all 40+ CLMs in the project.
    *   `runtime_audit.clm`: Analyzes the distribution of polyglot runtimes (Python, Lean, Rust, etc.).
    *   **Narrative Weaver**: `narrative_weaver.clm` reconstructs the "Prologue of Spacetime" story from disjoint chapters.
*   **Concept**: The system can now "read" itself, treating its own code as data (MCard), fulfilling the monadic vision of reflection.

### Advanced Capabilities: GraphRAG & Multimodal Perception
Beyond the core CLM chapters, MCard includes advanced prototypes for **Graph Retrieval-Augmented Generation (GraphRAG)** and **Multimodal Perception**. These features transform MCard into a system that can "see" images and "reason" about relationships.

*   **Logic Implementation**: `mcard/rag/graph/` (Entities, Relationships, Community Detection)
*   **GraphRAG Engine**:
    *   **Entity Extraction**: Automatically extracts entities and relationships from text using local LLMs.
    *   **Knowledge Graph**: Stores entities and relationships in SQLite with graph traversal capabilities.
    *   **Community Detection**: Implements Label Propagation Algorithm (LPA) to detect clusters.
    *   **Hybrid Search**: Combines `sqlite-vec` vector similarity with graph traversal.
*   **Multimodal Vision**:
    *   **Vision Embeddings**: Integrates `llama3.2-vision` to "see" images.
    *   **Describe-then-Embed**: Generates rich text descriptions of images for cross-modal search.
*   **Runnable Demos**:
    *   `scripts/demo_graphrag.py`: Full GraphRAG pipeline (Extraction -> Graph -> Query).
    *   `scripts/demo_graph_communities.py`: Community detection and summarization.
    *   `scripts/demo_vision_rag.py`: Multimodal vision embedding and search.



## ☯️ Architectural Philosophy: The Monad-Polynomial Duality

MCard's design is grounded in the complementary relationship between **Monadic Control** and **Polynomial Data**. This duality allows us to build a system that is both rigorously safe and infinitely flexible.

### 1. Monads: The Invariant Container
We employ Monadic design patterns (`IO`, `Reader`, `Writer`, `State`) to establish the **invariant laws** of execution. Monads manage the *context*—**how** computation happens. They handle:
*   **Purity & Safety**: Encapsulating side effects (IO) and error handling (Either).
*   **Observability**: Accumulating audit logs and traces (Writer).
*   **Context**: Passing configuration and security policies (Reader).

### 2. Polynomials: The Variant Content
Complementing this, **Polynomial Functors** ($P(X) = \sum_i A_i \times X^{B_i}$) inject **variability** into the system. They represent the *content*—**what** is being computed.
*   **Structure**: The PCard is a reified polynomial, defining a sum of choices (operations) and products of data (inputs).
*   **Flexibility**: By treating logic as data (Polynomials), we can swap implementations (Polyglot Runtimes) without changing the execution container.
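Concretely, a polynomial shape can be reified as data (an illustrative sketch, not the PCard implementation; `PCardShape` is a made-up name):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PCardShape:
    """A reified polynomial P(X) = sum_i A_i x X^B_i: a sum of named
    operations (the positions A_i), each taking B_i inputs drawn from X."""
    operations: dict  # operation name -> arity B_i

    def apply(self, op: str, *args):
        """Check that a concrete invocation matches the shape."""
        if op not in self.operations:
            raise KeyError(f"unknown operation {op!r}")
        if len(args) != self.operations[op]:
            raise TypeError(f"{op} expects {self.operations[op]} inputs")
        return (op, args)

# A toy arithmetic PCard: two summands, with arities 2 and 1.
arith = PCardShape({"add": 2, "neg": 1})
assert arith.apply("add", 3, 4) == ("add", (3, 4))
```

Because the shape is plain data, any runtime that understands the encoding can interpret it, which is what makes swapping implementations possible.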

### The Synthesis
The **PTR (Polynomial Type Runtime)** acts as the bridge. It *interprets* the **Polynomial** (the variable logic of a PCard) using the **Monad** (the invariant rules of the Runtime). This ensures that while the *logic* can be infinite and varied, the *execution* remains safe, auditable, and formally verifiable.

## Table of Contents

- [Executive Summary](#executive-summary)
- [Architectural Philosophy](#-architectural-philosophy-the-monad-polynomial-duality)
- [Getting Started](#-getting-started)
- [Quick Start](#-quick-start)
- [BMAD Method](#-bmad-method-how-we-work)
- [Project Structure](#-project-structure)
- [Core Concepts](#core-concepts)
  - [Required Attributes](#required-attributes-for-each-mcard)
  - [VCard as Curried MCard](#vcard-as-curried-mcard)
- [Running Tests](#-running-tests)
- [Content Type Detection and Validation](#-content-type-detection-and-validation)
- [Examples](#examples)
- [Enterprise Logging System](#-enterprise-logging-system)
- [CI/CD Pipeline](#-cicd-pipeline)
- [Code Quality & Linting](#-code-quality--linting)
- [Documentation](#-documentation)
- [Advanced Topics](#advanced-topics)
- [Zero Trust AuthN/Z](#zero-trust-authnz)

## 📦 Data Model

MCard is built around a simple but powerful data model:

- **Card**: The fundamental unit of content with a unique hash
- **Hash**: Cryptographic identifier for content (SHA-256 by default)
- **Content**: Optimized BLOB storage
  - Binary format ensures maximum performance and exact content preservation
  - Efficient storage for both text and binary data
  - MCard's browsing interface provides human-readable views when needed
- **G-Time**: Global time value for temporal ordering of content claims
- **Temporal Ordering**: Built-in support for temporal ordering of content claims
- **Modular Architecture**: Extensible design with pluggable components
- **Type Hints**: Built with Python type hints

## ✨ Features

- **Content-Addressable Storage**: Store and retrieve content using cryptographic hashes (SHA-256 by default)
- **Optimized Storage**: BLOB format ensures maximum performance while MCard handles all text conversions
- **Content Type Detection**: Automatic detection of various file formats (JSON, XML, CSV, Markdown, Python, etc.)
- **Robust Binary Signatures**: Accurate detection of PNG, JPEG, GIF, PDF, ZIP/OpenXML, and RIFF (WAV/AVI) using raw-byte signatures (no lossy text preprocessing)
- **Smarter YAML Heuristics**: Reduced false positives (e.g., Python dict strings are no longer misclassified as YAML)
- **Temporal Ordering**: Built-in support for temporal ordering of content claims
- **Modular Architecture**: Extensible design with pluggable components
- **Type Safety**: Built with Python type hints and Pydantic models
- **Async Support**: Asynchronous API for improved performance

## 🚀 Getting Started

### Database Inspection

MCard uses BLOB storage for optimal performance and data integrity. The binary format allows for efficient storage and retrieval while MCard handles all necessary text conversions. To inspect the database:

```bash
# Open the database in SQLite CLI
sqlite3 mcard.db

# View the schema
.schema

# View binary content (first 20 bytes as hex)
SELECT hash, hex(substr(content, 1, 20)) as preview, g_time FROM card LIMIT 5;

# MCard's API provides easy access to content in various formats:
# - get_content() - Returns raw bytes for maximum performance
# - get_content(as_text=True) - Returns decoded text when needed
# - to_dict() - Automatically converts content to appropriate formats
```

## 🚀 Developers Start Here: Using MCard in an Internal Service

If you are building an **internal service** and want to use MCard as your content-addressable storage, this is the recommended starting pattern.

### 1. Install and set up the environment

```bash
git clone https://github.com/xlp0/MCard_TDD.git
cd MCard_TDD
./activate_venv.sh
```

## Zero Trust AuthN/Z

We adopt a Zero Trust Architecture (ZTA) — "Never Trust, Always Check" — for admitting content identities into MCard collections. All network-facing operations evaluate identity, policy and context continuously, not once.

Design goals:
- Continuous verification before, during, and after content admission
- Policy-as-code for authorization decisions
- Deterministic, testable contracts using Pocketflow’s prep → exec → post specification

Pocketflow-style contract for content admission:

- prep (preconditions)
  - Caller identity established via network-aware auth (e.g., mTLS, OIDC, DiD/JWT)
  - Policy context resolved (tenant, collection, role, risk, device posture, IP/geofence)
  - Content metadata validated (size, type, provenance hints)
  - Rate limit/quota check passes

- exec (action)
  - Compute content hash deterministically (e.g., SHA-256)
  - Run validation pipeline; branch per MIME (binary/text validators)
  - Consult authorization engine with (subject, action=admit, resource=hash, context)
  - Persist only if decision = allow and validation = pass

- post (postconditions)
  - Emit audit event with decision rationale, policy version, and evaluator inputs
  - Update temporal index (g_time) and secondary indices (FTS) if applicable
  - Return VCard admission receipt (hash, g_time, policy_decision, signatures)

Currying relation across cards:
- MCard: base content-addressable identity (content, hash, g_time) - the functorial fixed point
- PCard: polynomial representation/interface protocol for MCard - effectively the same as MCard but with computational interpretation $F(X) = \sum_i (A_i \times X^{B_i})$
- VCard: curried specialization of MCard/PCard + (verification rules, validation rules, security checks)

In effect, PCard provides the polynomial functional interface to MCard's fixed point substrate, and VCard completes the currying by applying security/verification rules to the MCard/PCard foundation. See [MVP Cards for PKC](docs/MVP%20Cards%20for%20PKC.md) and [PCard Architecture](docs/PCard%20Architecture.md) for detailed mathematical foundations.

Testing notes:
- Model prep/exec/post as explicit test phases; use property tests for invariants (idempotent hash; monotonic g_time)
- Include negative suites (policy deny, malformed identity, validator fail)
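The two invariants can be checked with nothing but the standard library (a minimal sketch of the property tests suggested above):

```python
import hashlib
import time

def content_hash(data: bytes) -> str:
    """Deterministic content identity (SHA-256), as in the exec phase."""
    return hashlib.sha256(data).hexdigest()

# Idempotent hash: admitting the same content twice yields one identity.
payload = b"admit me"
assert content_hash(payload) == content_hash(payload)

# Monotonic g_time: successive admissions never move backwards in time.
t1 = time.monotonic_ns()
t2 = time.monotonic_ns()
assert t2 >= t1
```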

### Prerequisites

- Python 3.9 or higher
- [uv](https://github.com/astral-sh/uv) - A fast Python package installer and resolver

### Basic Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/xlp0/MCard_TDD.git
   cd MCard_TDD
   ```

2. Set up the Python environment using the provided script:
   ```bash
   ./activate_venv.sh
   ```
   This will set up and activate the project's virtual environment.

3. For development, install additional development dependencies:
   ```bash
   uv pip install -e ".[dev]"
   ```

### 🏃 Executing Logic (CLM)
The primary way to run logic in this system is via the **Polynomial Type Runtime (PTR) CLI**. This unifies loading, assembly, and polyglot execution.

```bash
# Run a specific CLM Chapter
uv run python -m mcard.ptr.cli run chapters/chapter_01_arithmetic/advanced_comparison.yaml
```

See [mcard/ptr/README.md](mcard/ptr/README.md) for full documentation on the runtime engine and the `CLMRunner` API.

### Optional Dependencies

MCard supports optional features that can be installed as extras:

  ```bash
  uv pip install -e ".[xml]"
  ```

## 🌐 Polyglot Runtime Support

MCard's **Polynomial Type Runtime (PTR)** supports **polyglot execution**, enabling you to write PCards in multiple programming languages while maintaining a unified verification and execution interface. Each runtime provides different strengths for specific use cases.

### Why Multiple Runtimes?

The polyglot architecture enables:
- **Language-specific optimization**: Use the best tool for each task (Rust for performance, Python for rapid prototyping, Lean for formal verification)
- **Formal verification**: Lean 4 provides mathematical proof of correctness for critical operations
- **Cross-platform deployment**: WASM enables browser and edge deployment
- **Legacy integration**: C and JavaScript support existing codebases
- **Type safety**: Multiple type systems provide defense in depth

### Required Runtime Installations

To run **all** polyglot tests and examples, you'll need to install the following runtimes:

#### 1. **Python** (Required)
**Purpose**: Core runtime, rapid prototyping, API implementation

**Installation**: Already required (Python 3.9+)
```bash
python3 --version  # Should be 3.9 or higher
```

#### 2. **JavaScript/Node.js** (Optional but Recommended)
**Purpose**: JavaScript PCard execution, frontend integration, WASM tooling

**Installation**:
```bash
# macOS (using Homebrew)
brew install node

# Ubuntu/Debian
sudo apt install nodejs npm

# Verify installation
node --version  # Should be v14 or higher
npm --version
```

**Why needed**: Executes JavaScript-based PCards, enables browser-side verification, and supports the WASM compilation toolchain.

#### 3. **Rust** (Optional but Recommended)
**Purpose**: High-performance PCard execution, WASM compilation, systems programming

**Installation**:
```bash
# Install Rust using rustup (all platforms)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Add WASM target for WebAssembly compilation
rustup target add wasm32-wasi

# Verify installation
rustc --version
cargo --version
```

**Why needed**: Executes Rust-based PCards with near-native performance, compiles to WASM for universal deployment, and provides memory safety guarantees.

#### 4. **WASM Runtime (wasmtime)** (Optional)
**Purpose**: Execute WebAssembly modules, universal deployment target

**Installation**:
```bash
# macOS (using Homebrew)
brew install wasmtime

# Linux
curl https://wasmtime.dev/install.sh -sSf | bash

# Or install Python bindings
uv pip install wasmtime

# Verify installation
wasmtime --version
```

**Why needed**: Runs compiled WASM modules from Rust/C, enables sandboxed execution, and provides a universal runtime for edge deployment.

#### 5. **Lean 4** (Optional but Important)
**Purpose**: Formal verification, mathematical proof of correctness, theorem proving

**Installation**:
```bash
# Install elan (Lean version manager)
curl https://raw.githubusercontent.com/leanprover/elan/master/elan-init.sh -sSf | sh

# Install Lean 4
elan install leanprover/lean4:stable

# Verify installation
lean --version  # Should show Lean 4.x
```

**Why needed**: Provides mathematically verified computation for critical operations. Lean 4's type system ensures correctness by construction, making it ideal for security-critical and financial applications.

#### 6. **C Compiler (gcc/clang)** (Optional)
**Purpose**: Low-level systems programming, legacy integration, bare-metal execution

**Installation**:
```bash
# macOS (Xcode Command Line Tools)
xcode-select --install

# Ubuntu/Debian
sudo apt install build-essential

# Verify installation
gcc --version
# or
clang --version
```

**Why needed**: Compiles C-based PCards for maximum performance and minimal runtime overhead, integrates with existing C libraries, and enables bare-metal deployment.

### 🚀 Using the PTR CLI

MCard includes a powerful CLI tool `ptr` for managing and executing PCards.

#### 1. Check Runtime Status

```bash
uv run ./ptr --status
```
Output:
```
=== Polyglot Runtime Status ===
✓ Python 3.9.22
✓ Javascript
✓ Rust
✓ C
✓ Wasm
✓ Lean
===============================
Available: 6/6 runtimes
```

#### 2. List Available PCards

```bash
uv run ./ptr --list
```

#### 3. Execute a PCard

Run a PCard by file path or hash:

```bash
# Run runtime check PCard
uv run ./ptr runtime_status_check.clm

# Run arithmetic PCard with input
uv run ./ptr chapters/samples/python_arithmetic.clm 21
# Output: 42.0

# Run by hash (if previously loaded)
uv run ./ptr acdec653... 21
```

#### 4. Programmatic API (Advanced)

For integration into your own applications:

```python
from mcard.ptr.core.runtime import RuntimeFactory


# Check if system can execute PCards
if not RuntimeFactory.at_least_one_available():
    raise RuntimeError("No runtimes available - cannot execute PCards!")

# Get detailed status for all runtimes
status = RuntimeFactory.get_detailed_status()
print(f"Python version: {status['python']['version']}")
```

#### 5. CI/CD Validation

```bash
uv run pytest tests/ptr/test_runtime.py::TestRuntimeFactory::test_list_available_runtimes -v
```


### Minimal vs Full Installation

**Minimal** (Python only): Run the core MCard system and Python-based PCards
```bash
# Just activate the virtual environment
./activate_venv.sh
```

**Full Polyglot** (All runtimes): Run all PCard types and formal verification
```bash
# Install all runtimes as described above
./activate_venv.sh
# Then install: Node.js, Rust, WASM, Lean 4, C compiler
```


## 🔧 Recent Bug Fixes and Improvements

### Lean 4 Float Parsing Fix (December 2025)

**Problem**: Lean 4's standard library doesn't provide a `String.toFloat?` function, causing compilation errors in arithmetic PCards that needed to parse floating-point numbers from JSON strings.

**Solution**: Implemented a custom `parseFloat` function in `chapters/arithmetic/logic_advanced.lean` that:
- Parses integers and converts them to floats
- Handles decimal numbers by splitting on `'.'` and computing fractional parts
- Supports scientific notation (e.g., `1.5e-10`) by parsing mantissa and exponent separately
- Gracefully handles negative numbers and invalid input (returns `0.0` as default)
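The same algorithm, transliterated to Python for reference (the shipped parser lives in `logic_advanced.lean`; this sketch only mirrors its described behavior):

```python
def parse_float(s: str) -> float:
    """Mirror of the Lean parser's strategy: optional sign, integer part,
    fractional part split on '.', optional exponent split on 'e'/'E'.
    Invalid input falls back to 0.0, matching the Lean default."""
    s = s.strip()
    if not s:
        return 0.0
    sign = 1.0
    if s[0] in "+-":
        sign = -1.0 if s[0] == "-" else 1.0
        s = s[1:]
    mantissa, _, exp_str = s.lower().partition("e")
    int_part, _, frac_part = mantissa.partition(".")
    try:
        value = float(int(int_part or "0"))
        if frac_part:
            value += int(frac_part) / (10 ** len(frac_part))
        exponent = int(exp_str) if exp_str else 0
    except ValueError:
        return 0.0
    return sign * value * (10.0 ** exponent)
```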

**Files Modified**:
- `chapters/arithmetic/logic_advanced.lean`: Added custom float parser with scientific notation support

**Impact**: All Lean-based arithmetic tests now pass, enabling formal verification of floating-point operations.

### Test Suite Fixes (December 2025)

**1. YAMLTemplateLoader Parameter Fix**
- **Problem**: Test was using incorrect parameter name (`template_dir` instead of `templates_dir`) and calling non-existent method `get_template()`
- **Solution**: Updated tests to use correct `templates_dir` parameter and `load_template()` method
- **Files**: `tests/ptr/test_clm.py`

**2. RustRuntime Environment Validation Fix**
- **Problem**: Mock didn't prevent wasmtime import, causing validation test to return `True` instead of expected `False`
- **Solution**: Added `patch.dict('sys.modules', {'wasmtime': None})` to properly mock the import
- **Files**: `tests/ptr/test_runtime.py`

**3. RustRuntime WASM Execution Test Fix**
- **Problem**: Test expected "not yet implemented" but actual implementation returns file error
- **Solution**: Updated assertion to match actual error message format
- **Files**: `tests/ptr/test_runtime.py`

**Test Results**: All **388 tests now pass** (1 skipped) ✅

### Lean 4 Polyglot Runtime Fixes (December 2025)

**1. Boolean Comparison Case Sensitivity**
- **Problem**: Lean 4 outputs lowercase `true`/`false`, but Python YAML parser loads booleans as `True`/`False`. Comparison `str("true") != str(True)` caused all boolean test cases to fail.
- **Solution**: Added case-insensitive boolean comparison in `CLMChapterLoader._compare_results()` that normalizes both values to lowercase before comparing.
- **Files**: `mcard/ptr/clm/loader.py`
- **Impact**: All 26 standalone Lean CLM test cases now pass (primality, propositional logic, etc.)
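The normalization is essentially the following (a simplified sketch of the `_compare_results()` fix, not the loader's exact code):

```python
def compare_results(actual, expected) -> bool:
    """Normalize both sides to lowercase strings so Lean's 'true'
    matches Python's True."""
    return str(actual).strip().lower() == str(expected).strip().lower()

assert compare_results("true", True)
assert compare_results(False, "false")
assert not compare_results("true", False)
```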

**2. Lean Polyglot Context Passing**
- **Problem**: In polyglot consensus tests, `LeanRuntime.execute()` was passing `target.get_content()` (which was "dummy") instead of the actual operation context containing `op`, `a`, `b` values.
- **Solution**: Updated `LeanRuntime.execute()` to detect polyglot mode (when context has `op`, `a`, `n` keys) and pass the context as JSON, while still supporting standalone CLM mode that uses target content.
- **Files**: `mcard/ptr/core/runtime.py`
- **Impact**: All 8 runtimes (Python, JavaScript, Rust, C, WASM, Lean, R, Julia) now achieve consensus in polyglot tests.

**Test Results**: All **461 tests now pass** (9 skipped) ✅

### Code Quality Improvements

- ✅ **Zero lint errors** across all Lean, Python, and test files
- ✅ **100% polyglot test coverage** for Python, JavaScript, Rust, C, WASM, Lean, R, and Julia runtimes
- ✅ **Improved error handling** in runtime validation and WASM execution
- ✅ **Better test mocking** for platform-dependent runtime checks



## 🧭 BMAD Method: How We Work

We use BMAD to drive a tight Test-Driven Development loop for MCard.

- RED: write a failing test that captures the next smallest behavior
- GREEN: implement the minimal code to pass only that test
- REFACTOR: improve design while keeping tests green

BMAD helper script and config in this repo:
- `bmad_workflow.py` – CLI to guide the RED/GREEN/REFACTOR loop
- `bmad_config.yaml` – test categories, coverage goals, environment
- `BMAD_GUIDE.md` – step-by-step usage and tips

Quick usage:
```bash
# Start a new TDD cycle for a behavior
./bmad_workflow.py start "create card from bytes"

# After writing the failing test
./bmad_workflow.py mark-written

# After making the test pass
./bmad_workflow.py mark-passing

# After refactoring
./bmad_workflow.py complete-refactor

# View current status
./bmad_workflow.py status
```

## 🏗️ Project Structure

```
MCard_TDD/
├── mcard/                    # Core Python package
│   ├── cli.py                # CLI implementation
│   ├── config/               # Configuration management
│   ├── engine/               # Database engine implementations
│   │   ├── base.py           # Base engine interface
│   │   └── sqlite_engine.py  # SQLite implementation
│   ├── model/                # Data models and content handling
│   │   ├── card.py           # Core MCard implementation
│   │   ├── card_collection.py # Collections of MCards
│   │   ├── ptr/                  # Polynomial Type Runtime (PTR)
│   │   │   ├── core/             # Core PTR engine
│   │   │   ├── clm/              # CLM framework
│   │   │   └── mcard_integration/# MCard integration
│   │   ├── detectors/        # Content type detectors
│   │   └── hash/             # Hashing implementations
│   │       └── algorithms/   # Hash algorithm implementations
│   └── ptr/                  # PTR package (alias/wrapper)
├── data/                     # Data storage directories
│   ├── db/                   # Database files
│   ├── loaded_content/       # Processed content storage
│   └── test_content/         # Test content files
├── docs/                     # Documentation
│   ├── reports/              # Generated reports and summaries
│   └── to-do-plan/           # Project planning documents
├── examples/                 # Example scripts
│   └── demos/                # Demo scripts
├── scripts/                  # Utility scripts and CLI wrappers
├── tests/                    # Test suite
│   ├── data/                 # Test data
│   └── test_data/            # Additional test data
├── pyproject.toml            # Project configuration
└── README.md                 # This file
```

## 🚦 Quick Start

### Using the Python API (synchronous)

```python
from mcard import MCard, default_collection

# Create a new card (text)
card = MCard("Hello, MCard!")
hash_value = default_collection.add(card)

# Retrieve the card by hash
retrieved = default_collection.get(hash_value)
print(retrieved.get_content().decode("utf-8"))  # Hello, MCard!

# Search for cards containing a substring
results = default_collection.search_by_string("Hello")
for c in results.items:
    try:
        print(c.get_content().decode("utf-8"))
    except UnicodeDecodeError:
        print(f"[binary] {c.hash}")
```

## 🧪 Running Tests

Run the test suite (with uv):

```bash
uv run pytest -q
```

For test coverage report:

```bash
uv run pytest --cov=mcard --cov-report=term-missing
```

### Verifying CLM Specifications
To run the full suite of Cubical Logic Models (CLMs) across all chapters:

```bash
uv run python scripts/verify_all_clms.py
```

This script recursively scans `chapters/` for `.clm` and `.yaml` files, executes them using the `PTR` runtime, and reports a comprehensive pass/fail summary.

## 🔍 Content Type Detection and Validation

- **Binary-first strategy**: `BinaryFirstStrategy` runs signature detection directly on raw bytes via `BinarySignatureDetector.detect_from_bytes()` to avoid corrupting binary content.
- **Text detection**: falls back to text detectors only when no binary signature is recognized.
- **Validation registry**: `ValidationRegistry` dispatches to `BinaryValidator` or `TextValidator` depending on the detected MIME type.
- **YAML detection**: `TextFormatDetector._is_yaml()` was refined to avoid misclassifying Python-like content as YAML.
- **Problematic bytes guard**: the optional env flag `MCARD_INTERPRETER_GUARD_PROBLEMATIC=1` treats certain pathological byte patterns as binary to prevent hangs.
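The binary-first flow can be sketched in plain Python. This is an illustrative stand-alone function, not the actual `BinarySignatureDetector` API; the signature table is a tiny subset chosen for the example:

```python
def detect_mime(data: bytes) -> str:
    """Binary-first detection: check magic signatures on raw bytes,
    fall back to text detection only when no signature matches."""
    signatures = {
        b"\x89PNG\r\n\x1a\n": "image/png",
        b"%PDF-": "application/pdf",
        b"\x1f\x8b": "application/gzip",
    }
    for magic, mime in signatures.items():
        if data.startswith(magic):
            return mime
    # No binary signature recognized -- try to interpret as UTF-8 text
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"
```

Running signature checks before any text decoding is what prevents binary payloads from being corrupted by a premature decode step.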

## 🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📚 Documentation

For more detailed documentation, please see the `docs/` directory:

- [Card Collection Guide](docs/card_collection_guide.md)
- [Global Time Design](docs/design_g_time.md)
- [Test-Driven Development Guide](docs/tdd_guide.md)
- [Cubical Logic Model](docs/cubical_logic_model.md)
- [MVP Cards for PKC](docs/MVP%20Cards%20for%20PKC.md) - Mathematical foundation of MCard/PCard/VCard currying architecture
- [PCard Architecture](docs/PCard%20Architecture.md) - Detailed polynomial functor implementation
- [Product Requirements Document](docs/prd.md) - Zero Trust AuthN/Z requirements
- [Architecture Document](docs/architecture.md) - System design with Zero Trust integration
- [Architecture & Knowledge Representation](docs/Architecture_and_Knowledge_Representation.md) - **NEW**: Comprehensive guide to the Evolutionary CLM, Polyglot PTR, and Theoretical Foundations (includes System Diagram).
- [CLM Language Specification](docs/CLM_Language_Specification.md) - **NEW**: Detailed grammar, syntax, and semantics of the CLM YAML language.

For version history, see [CHANGELOG.md](CHANGELOG.md).


## Core Concepts

MCard implements an algebraically closed system where:
1. Every MCard is uniquely identified by its content hash (SHA-256 by default, with other algorithms configurable).
2. Every MCard has an associated claim time (timezone-aware timestamp with microsecond precision).
3. The database maintains these invariants automatically.
4. Content integrity is guaranteed through immutable hashes.
5. Temporal ordering is preserved at microsecond precision.

This design provides several key guarantees:
- **Content Integrity**: The content hash serves as both identifier and verification mechanism.
- **Temporal Signature**: All cards are associated with a timestamp: `g_time`.
- **Precedence Verification**: The claim time enables determination of content presentation order.
- **Algebraic Closure**: Any operation on MCards produces results that maintain these properties.
- **Type Safety**: Built on Pydantic with strict validation and type checking.

### Required Attributes for Each MCard

Each MCard **must** have the following three required attributes:

#### 1. **`content`**: The actual data being stored (string or bytes).
#### 2. **`hash`**: A cryptographic hash of the content, using SHA-256 by default (configurable to other algorithms).
#### 3. **`g_time`**: A timezone-aware timestamp with microsecond precision, representing the global time when the card was claimed.
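The three required attributes can be derived with the standard library alone. This is a hedged sketch of the invariants, not the internals of the `MCard` constructor:

```python
import hashlib
from datetime import datetime, timezone

content = b"Hello, MCard!"                        # 1. content: the stored bytes
hash_value = hashlib.sha256(content).hexdigest()  # 2. hash: SHA-256 by default
g_time = datetime.now(timezone.utc)               # 3. g_time: timezone-aware timestamp

assert len(hash_value) == 64     # hex digest of a 256-bit hash
assert g_time.tzinfo is not None # timezone-aware, microsecond precision
```

Because the hash is a pure function of the content, recomputing it at any time verifies integrity; `g_time` supplies the temporal ordering.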

### VCard as Curried MCard/PCard

We model VCard as a curried specialization of MCard/PCard using functional composition:

- MCard = functorial fixed point: $F(content) = content$ (immutable data substrate)
- PCard = polynomial interface protocol for MCard: $F(X) = \sum_i (A_i \times X^{B_i})$ (computational interpretation)
- VCard = MCard/PCard + (verification rules, validation rules, security checks) (boundary enforcement)

Benefits:
- PCard provides the polynomial functional interface to MCard's fixed point substrate
- VCard completes the currying by adding security/verification boundaries
- Separation of concerns: data substrate (MCard) → computational interface (PCard) → security boundaries (VCard)
- Composability: different security rule sets can be applied to the same MCard/PCard foundation
- Testability: unit-test each layer independently

See [MVP Cards for PKC](docs/MVP%20Cards%20for%20PKC.md) for the complete mathematical foundation of this currying architecture.
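The layering can be sketched with plain dataclasses. The names `MCardLike`, `PCardLike`, and `VCardLike` are hypothetical, illustrating only the composition of substrate, interface, and security boundary, not the real classes:

```python
from dataclasses import dataclass
from typing import Callable, List
import hashlib

@dataclass(frozen=True)
class MCardLike:
    """Immutable data substrate: content plus its derived hash."""
    content: bytes

    @property
    def hash(self) -> str:
        return hashlib.sha256(self.content).hexdigest()

@dataclass
class PCardLike:
    """Computational interface: an interpretation over an MCard's bytes."""
    card: MCardLike
    interpret: Callable[[bytes], object]

    def run(self):
        return self.interpret(self.card.content)

@dataclass
class VCardLike:
    """Security boundary curried on top: checks run before computation."""
    pcard: PCardLike
    checks: List[Callable[[MCardLike], bool]]

    def run(self):
        for check in self.checks:
            if not check(self.pcard.card):
                raise PermissionError("verification failed")
        return self.pcard.run()

card = MCardLike(b"Hello, MCard!")
pcard = PCardLike(card, lambda raw: raw.decode("utf-8"))
vcard = VCardLike(pcard, [lambda c: len(c.content) > 0])
```

Swapping the `checks` list applies different security rule sets to the same MCard/PCard foundation, which is the composability benefit noted above.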

## Directory Structure

- `mcard/`: Contains the main application code.
  - `algorithms/`: Hash algorithm implementations (renamed from `hash_algorithms`)
  - `engine/`: Database engines (SQLite, DuckDB)
  - `model/`: Core data models
  - `api.py`: FastAPI endpoints
  - `logging_config.py`: Logging configuration
- `examples/`: Example scripts demonstrating how to use the MCard system.
- `tests/`: Contains test files for the application.
  - `persistence/`: Database persistence tests
  - `unit/`: Unit tests
- `logs/`: Contains log files generated by the application.
- `data/db/`: Directory for storing database files used by the application.
- `data/files/`: Directory reserved for storing general files used by the application.
- `data/test_content/`: Test files of various types for content detection and validation.
- `data/loaded_content/`: Output directory for loaded and processed content (now gitignored).
- `docs/`: Project documentation.

## Database Storage and Indexing

MCard uses SQLite with BLOB storage for `content`. A virtual FTS5 table `documents` is maintained via triggers for text search. On first initialization, the engine creates the `card` table (BLOB content), the FTS table, and the triggers that keep them in sync.
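A minimal sketch of this arrangement, assuming a simplified schema (the real table and trigger definitions in `sqlite_engine.py` may differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (hash TEXT PRIMARY KEY, content BLOB, g_time TEXT);

-- FTS5 virtual table kept in sync with `card` via triggers
CREATE VIRTUAL TABLE documents USING fts5(hash, text_content);

CREATE TRIGGER card_ai AFTER INSERT ON card BEGIN
    INSERT INTO documents (hash, text_content)
    VALUES (new.hash, CAST(new.content AS TEXT));
END;

CREATE TRIGGER card_ad AFTER DELETE ON card BEGIN
    DELETE FROM documents WHERE hash = old.hash;
END;
""")

conn.execute(
    "INSERT INTO card VALUES (?, ?, ?)",
    ("h1", b"hello world", "2024-01-01T00:00:00.000000Z"),
)
rows = conn.execute(
    "SELECT hash FROM documents WHERE documents MATCH 'hello'"
).fetchall()
```

The triggers mean callers never write to the FTS table directly; inserting into `card` is enough to make the content searchable.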

## Examples

### Default MCard API Example: `examples/MCard_Demo.py`

This script demonstrates the simplest way to use the MCard API through the `default_utility` interface. It covers:

- Adding new cards (with plain text or dictionaries, which are auto-converted to JSON)
- Retrieving cards by hash
- Searching for cards by content
- Counting the total number of cards in the collection

#### How to Run the Demo

```bash
python examples/MCard_Demo.py
```

#### Key Features
- **Minimal Setup**: Uses `from mcard import default_utility` for immediate access to core functionality.
- **Add and Retrieve**: Shows how to add cards and retrieve them by hash.
- **Search**: Demonstrates searching for cards containing a specific substring.
- **Summary Output**: Prints the total number of cards and search results.

---

### Modular Content Loader Example: `examples/Content_Loader.py`

This script demonstrates how to use the MCard system's content detection and storage features in a modular, easy-to-understand way. It:

- Loads files from `data/test_content/` (supports both text and binary types)
- Uses the `ContentTypeInterpreter` to detect file types and validate content
- Creates MCards for each file, handling text and binary content appropriately
- Saves processed files to `data/loaded_content/` with unique, type-appropriate filenames
- Prints summaries of processed files and cleans up temporary files

#### How to Run the Example

```bash
python examples/Content_Loader.py
```

#### Key Features of the Example
- **Modular Functions**: The script is organized into clear, single-purpose functions (e.g., `load_test_files`, `create_mcard_for_file`, `save_card_to_file`, etc.) for maintainability and extensibility.
- **Automatic Content Type Detection**: Uses file signatures and content validation to determine file type and extension.
- **Binary and Text Handling**: Handles binary files (e.g., images) and text files differently, ensuring correct storage and retrieval.
- **Output Directory**: All processed content is saved to `data/loaded_content/` (which is now gitignored).
- **Temporary File Cleanup**: Removes temporary binary files after processing.

See the script and its docstrings for further details and customization options.

### Handling Problematic Files (very large/single-line)

Some files can be pathological (e.g., extremely large single-line text or unstructured binaries). The loader now safely handles these via streamed text normalization with adaptive soft wrapping and strict byte/time caps.

- Defaults remain safe: problematic files are skipped unless `include_problematic=True`.
- When included, problematic files are processed as normalized text with UTF-8 replacement and soft wraps on-the-fly.
- If streaming fails unexpectedly, the system falls back to a capped binary BLOB read.
- Metadata captured for normalized files includes `original_size` and `original_sha256_prefix`.

Example using `load_file_to_collection()` from `mcard.file_utility`:

```python
from pathlib import Path
from mcard.model.card_collection import CardCollection
from mcard.file_utility import load_file_to_collection

collection = CardCollection()

# Load a single file with safe streamed normalization (and optional metadata-only mode)
results = load_file_to_collection(
    Path("tests/test_data/OneMoreLongStringFile.js"),
    collection,
    include_problematic=True,             # opt-in to include problematic files
    max_bytes_on_problem=2 * 1024 * 1024, # cap for streaming/fallback paths
    metadata_only=False                   # set True to store only metadata for problematic files
)

# Or load a directory recursively with the same options
results = load_file_to_collection(
    Path("tests/test_data"),
    collection,
    recursive=True,
    include_problematic=True,
    max_bytes_on_problem=2 * 1024 * 1024,
    metadata_only=False
)
```

Notes:

- Normalized text is stored with `mime_type='text/plain'` and includes `normalized=True` and `wrap_width` in the file info.
- When fallback occurs, MIME is `application/octet-stream` and only capped bytes are stored.
- Adaptive wrap width is chosen by extension via env-configured values.

Environment variables to tune behavior:

- `MCARD_WRAP_WIDTH_DEFAULT` (default 1000)
- `MCARD_WRAP_WIDTH_KNOWN` (default 1200)
- `MCARD_MAX_PROBLEM_TEXT_BYTES` (default 2MB)
- `MCARD_READ_TIMEOUT_SECS` (default 30)
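The streamed normalization described above can be sketched as a stand-alone function. This is illustrative only (the actual loader logic lives in `mcard.file_utility`), and for simplicity it ignores multi-byte characters split across chunk boundaries:

```python
def normalize_stream(chunks, wrap_width=1000, max_bytes=2 * 1024 * 1024):
    """Decode chunks with UTF-8 replacement, insert soft wraps at
    wrap_width, and stop once the byte cap is reached."""
    out, written, line_len = [], 0, 0
    for chunk in chunks:
        text = chunk.decode("utf-8", errors="replace")
        for ch in text:
            if written >= max_bytes:          # strict byte cap
                return "".join(out)
            out.append(ch)
            written += 1
            line_len = 0 if ch == "\n" else line_len + 1
            if line_len >= wrap_width:        # adaptive soft wrap
                out.append("\n")
                line_len = 0
    return "".join(out)
```

A 2,500-character single-line input with `wrap_width=1000` comes back with two inserted soft wraps, keeping downstream text handling bounded.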


## .gitignore Notes

- The `data/loaded_content/` directory is now included in `.gitignore` and will not be tracked by git. This ensures that output/generated files do not pollute the repository.

## PyTest Configuration

- The project uses [PyTest](https://docs.pytest.org/en/stable/) for testing.
- Tests are located in the `tests` directory.
- The configuration file `pytest.ini` specifies test paths and naming conventions.

## 🏢 Enterprise Logging System

MCard features a comprehensive, enterprise-grade logging system with structured multi-file logging, performance monitoring, and security auditing capabilities.

### Key Features

- **Structured Logging**: JSON-formatted logs with consistent schema across all components
- **Multi-File Strategy**: Separate log files for different concerns (application, security, performance)
- **Performance Monitoring**: Built-in `PerformanceTimer` for operation timing and bottleneck identification
- **Security Auditing**: Dedicated `SecurityAuditLogger` for compliance and security event tracking
- **Colorized Console Output**: Enhanced developer experience with color-coded log levels
- **Backward Compatibility**: Dual logging system maintains compatibility with existing code

### Logging Architecture

```
logs/
├── mcard.log              # Main application logs (rotating, 10MB, 5 backups)
├── mcard_security.log     # Security audit trail
├── mcard_performance.log  # Performance metrics and timing
└── mcard_structured.log   # Structured JSON logs for analysis
```

### Usage Examples

#### Basic Application Logging
```python
from mcard.config.improved_logging import setup_improved_logging, get_logger

def main():
    setup_improved_logging()  # Initialize enterprise logging
    logger = get_logger(__name__)
    logger.info("Application started", extra={"component": "main", "version": "0.1.23"})
```

#### Performance Monitoring
```python
from mcard.config.improved_logging import PerformanceTimer

with PerformanceTimer("database_operation") as timer:
    # Your database operation here
    result = collection.add(card)
    timer.add_metadata({"records_processed": 1, "operation": "add"})
```

#### Security Auditing
```python
from mcard.config.improved_logging import SecurityAuditLogger

audit_logger = SecurityAuditLogger()
audit_logger.log_access_attempt("user123", "read", "card_collection", success=True)
audit_logger.log_data_modification("user123", "create", "card", {"hash": "abc123"})
```


### Configuration

Environment variables for logging control:
- `MCARD_SERVICE_LOG_LEVEL`: Controls log level (DEBUG, INFO, WARNING, ERROR)
- `MCARD_LOG_FORMAT`: Choose between 'json' or 'standard' formatting
- `MCARD_ENABLE_PERFORMANCE_LOGGING`: Enable/disable performance monitoring
- `MCARD_ENABLE_SECURITY_LOGGING`: Enable/disable security audit logging

### Migration from Legacy Logging

The system maintains full backward compatibility. Existing code using the old logging system continues to work unchanged, while new code can leverage the enhanced features. A migration script is provided:

```bash
python migrate_logging.py
```

## 🚀 CI/CD Pipeline

MCard implements a comprehensive CI/CD pipeline with multi-platform testing, security checks, and automated deployment.

### Pipeline Features

- **Multi-Platform Testing**: Ubuntu, macOS, and Windows support
- **Multi-Python Version**: Python 3.9, 3.10, 3.11, and 3.12 compatibility
- **Comprehensive Testing**: Unit tests, integration tests, and coverage reporting
- **Security Scanning**: Automated security checks with bandit and safety
- **Code Quality Gates**: Linting, formatting, and type checking
- **Automated Deployment**: PyPI publishing on release

### Workflow Structure

```yaml
# .github/workflows/ci.yml
jobs:
  test:        # Multi-platform, multi-Python testing
  security:    # Security vulnerability scanning  
  build:       # Package building and PyPI deployment
```

### Quality Metrics

- **Test Coverage**: 99.4% success rate across all test suites
- **Security Score**: Zero critical vulnerabilities detected
- **Code Quality**: 100% compliance with ruff linting rules
- **Performance**: All tests complete in under 5 minutes

### Running CI Locally

```bash
# Run the full test suite locally
make test-all

# Run security checks
make security-check

# Run linting and formatting
make lint
make format
```

## 🔧 Code Quality & Linting

MCard maintains enterprise-grade code quality through modern tooling and automated checks.

### Tooling Stack

- **Ruff**: Lightning-fast Python linter and formatter (replaces black, isort, flake8)
- **MyPy**: Static type checking for type safety
- **Pre-commit**: Automated code quality checks on commit
- **Pytest**: Comprehensive testing framework with coverage reporting

### Recent Quality Improvements

#### Lint Error Cleanup (Commit: b6a70a9)
- ✅ **1000+ lint errors resolved** across 76 files
- ✅ **Exception handling improved** with proper 'from e' clauses  
- ✅ **Modern typing** - replaced deprecated `typing.List` with `list[T]`
- ✅ **Import organization** - consistent import sorting and grouping
- ✅ **Code readability** - improved formatting and structure
- ✅ **Zero regressions** - all 17 tests continue to pass

#### Development Tooling Enhancements (Commit: 4ee6a7d)
- ✅ **Enhanced Makefile** with 15+ development commands
- ✅ **Pre-commit hooks** for automated quality checks
- ✅ **GitHub Actions** with multi-OS/Python version support
- ✅ **Security scanning** integrated into CI pipeline
- ✅ **Project health score** improved from 7.5/10 to 8.5/10

### Code Quality Commands

```bash
# Lint checking
uv run ruff check mcard/

# Auto-formatting  
uv run ruff format mcard/

# Type checking
uv run mypy mcard/ --ignore-missing-imports

# Run all quality checks
make quality-check
```

### Quality Standards

- **Line Length**: 88 characters (Black-compatible)
- **Import Sorting**: Automatic with ruff
- **Type Hints**: Required for all public APIs
- **Test Coverage**: Minimum 75% coverage required
- **Documentation**: Comprehensive docstrings for all modules

 

## Advanced Topics

### Hegel's Dialectic in Testing and CI/CD

Hegel's dialectic is a philosophical framework that describes the process of development and change through a triadic structure: thesis, antithesis, and synthesis. Here's how it relates to software testing and Continuous Integration/Continuous Deployment (CI/CD):

1. **Thesis (Initial Code)**: Represents the initial code or feature implementation, the starting point where a developer writes code to fulfill a specific requirement or feature.

2. **Antithesis (Testing and Bugs)**: Arises during the testing phase, where tests are executed. If tests fail or bugs are discovered, they represent a challenge to the initial implementation, highlighting discrepancies between intended functionality and actual behavior.

3. **Synthesis (Refinement and Improvement)**: Occurs when developers address the issues identified during testing, leading to a refined version of the code that resolves conflicts between the initial implementation and testing outcomes.

### CI/CD Integration
In a CI/CD pipeline, this dialectical process is continuous:

- **Continuous Integration**: Developers frequently integrate code changes into a shared repository. Each integration triggers automated tests, allowing for rapid identification of issues against the current codebase.

- **Continuous Deployment**: Once the code passes testing, it can be automatically deployed, representing the synthesis where refined code is made available to users.

This iterative process fosters continuous improvement, where each round of testing and deployment leads to better software quality and functionality. By applying Hegel's dialectic, teams can embrace the idea that conflict (in the form of bugs and failures) is a natural and necessary part of the development process, ultimately leading to a more robust and effective product.

### Handling Duplicate Events

When a duplicate card is detected, the `duplicate_event_card` is assigned a new timestamp value. This ensures that even though the content is identical to the original card, the hash value will be unique due to the different timestamp. This mechanism allows for robust handling of duplicate content while maintaining the integrity of the system.
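A minimal sketch of such an event payload (illustrative field names; the real structure is produced by `generate_duplication_event`, which returns a JSON string):

```python
import json
from datetime import datetime, timezone

def make_duplicate_event(original_hash: str) -> str:
    """Build a JSON duplicate event; the fresh timestamp keeps the event
    card's own hash unique even though the content it refers to repeats."""
    event = {
        "event_type": "duplicate",
        "original_hash": original_hash,
        "g_time": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(event, sort_keys=True)
```

Two duplicate events for the same content carry different `g_time` values, so storing the event itself never collides with the original card.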

### MD5 Collision Testing

The test suite includes verification of MD5 collision detection using known collision pairs from the FastColl attack. These pairs produce identical MD5 hashes despite having different content:

### MD5 Collision Pair
```
Input 1:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2
                                                                     ^^^                                    ^^^

Input 2:
4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2
                                                                     ^^^                                    ^^^
```

Key differences:
1. `200` vs `202`
2. `d15` vs `d1d`

Both inputs produce the same MD5 hash value, demonstrating MD5's vulnerability to collision attacks. This is why MCard defaults to using more secure hash functions like SHA-256.
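The collision can be verified directly with `hashlib`, using the hex strings above:

```python
import hashlib

msg1 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa200a8284bf36e8e4b55b35f427593d849676da0d1555d8360fb5f07fea2"
)
msg2 = bytes.fromhex(
    "4dc968ff0ee35c209572d4777b721587d36fa7b21bdc56b74a3dc0783e7b9518"
    "afbfa202a8284bf36e8e4b55b35f427593d849676da0d1d55d8360fb5f07fea2"
)

assert msg1 != msg2                                              # different content
assert hashlib.md5(msg1).digest() == hashlib.md5(msg2).digest()  # identical MD5
assert hashlib.sha256(msg1).digest() != hashlib.sha256(msg2).digest()  # SHA-256 separates them
```

The final assertion is why a content-addressable store cannot safely use MD5: two distinct contents would map to one identifier.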

### Modular PCards with Function Entry Points

MCard supports modular programming by allowing PCards to reference code stored in other MCards via `code_hash`. This enables code reuse and separation of concerns.

Additionally, PCards can specify a function `entry_point` within the referenced code. The runtime automatically handles input type conversion based on the PCard's input definition.

Example PCard YAML:
```yaml
concrete:
  runtime: "python"
  operation: "custom"
  entry_point: "custom_cos"  # Function to call
  implementation:
    inputs:
      angle: "float"         # Input type for automatic conversion
  code_hash: "..."           # Hash of the MCard containing the Python code
```
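How a runtime might resolve the entry point and convert inputs can be sketched as follows. This is a hedged illustration using `exec` on an inline code string, not the actual PTR implementation:

```python
# Code that would normally be stored in a separate MCard and fetched by code_hash
code_source = """
import math

def custom_cos(angle):
    return math.cos(angle)
"""

input_spec = {"angle": "float"}   # from the PCard's implementation.inputs
raw_inputs = {"angle": "0.0"}     # inputs often arrive as strings

converters = {"float": float, "int": int, "str": str}

namespace: dict = {}
exec(code_source, namespace)      # load the referenced code
fn = namespace["custom_cos"]      # resolve entry_point by name

# Convert each raw input according to its declared type, then call
kwargs = {k: converters[input_spec[k]](v) for k, v in raw_inputs.items()}
result = fn(**kwargs)
```

The declared input types drive the conversion step, so callers can pass string arguments (e.g. from YAML or a CLI) and still invoke a typed function.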

## Testing Behavior

The current tests, particularly `test_sqlite_persistence.py`, clear the database after each test function runs. This means `test_mcard.db` will only contain the data from the last test executed. If the `clear()` call in the fixture is uncommented, the content of the last test is removed as well.

## Core Dependencies

- `SQLAlchemy>=1.4.47`: SQL toolkit and ORM
- `aiosqlite>=0.17.0`: SQLite async driver (project code uses synchronous APIs but retains this dependency for compatibility)
- `python-dateutil>=2.9.0.post0`: Date/time utilities
- `python-dotenv>=1.1.0`: Environment management

## Description
MCard is a project designed to facilitate card management with a focus on validation and logging features.

## Installation

### Using uv

You can install the MCard package from PyPI (once published):

```bash
uv pip install mcard
```

### Installing from source

To install MCard directly from the source code:

```bash
# Clone the repository
git clone https://github.com/yourusername/MCard_TDD.git
cd MCard_TDD

# Install in development mode with uv
uv pip install -e .

# Install with development dependencies
uv pip install -e ".[dev]"
```

### Development Environment Setup

MCard uses modern Python tooling with `uv` for fast dependency management and virtual environment handling.

#### Quick Setup (Recommended)
```bash
# Clone and setup in one command
git clone https://github.com/xlp0/MCard_TDD.git
cd MCard_TDD
./activate_venv.sh
```

The `activate_venv.sh` script automatically:
- ✅ Disables conda (if present) to avoid conflicts
- ✅ Creates a virtual environment using `uv venv .venv`
- ✅ Activates the virtual environment
- ✅ Installs all dependencies with `uv sync --all-extras --dev`
- ✅ Ensures you're using the project's preferred Python environment

#### Manual Setup
```bash
# Create and activate virtual environment with uv
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies with development extras
uv sync --all-extras --dev

# Verify installation
uv run pytest tests/ -v
```

#### Development Commands
```bash
# Run tests with coverage
uv run pytest --cov=mcard --cov-report=term-missing

# Lint and format code
uv run ruff check mcard/
uv run ruff format mcard/

# Type checking
uv run mypy mcard/ --ignore-missing-imports

# Run all quality checks
make quality-check
```

## Usage

After installation, you can use MCard in your Python code (synchronous API as shown above), or via the CLI.

### CLI usage

The package installs a console script `mcard` with the following subcommands:

```bash
# Initialize the database (creates schema if needed)
mcard init --db data/cli_demo.db

# Add a card from text or a file
mcard add --text "hello world" --db data/cli_demo.db
mcard add --file README.md --db data/cli_demo.db

# Retrieve a card by hash
mcard get --hash <hash> --db data/cli_demo.db

# Search by a text fragment
mcard search --query hello --db data/cli_demo.db

# Count cards
mcard count --db data/cli_demo.db
```

## 🎯 Recent Major Improvements

### Enterprise Logging System Implementation (Commits: e0b18ce, 8d657bf, f7c7446)
- ✅ **Comprehensive logging overhaul** with structured multi-file logging strategy
- ✅ **Performance monitoring** with `PerformanceTimer` for operation timing
- ✅ **Security auditing** with dedicated `SecurityAuditLogger` for compliance
- ✅ **Backward compatibility** maintained - zero breaking changes to existing code
- ✅ **11/11 tests pass** for logging system with comprehensive test coverage
- ✅ **Automated migration script** for upgrading existing codebases
- ✅ **Production-ready** with colorlog dependency and structured JSON output

### Code Quality & Lint Cleanup (Commit: b6a70a9)
- ✅ **1000+ lint errors resolved** across 76 files with zero regressions
- ✅ **Exception handling modernized** with proper 'from e' clauses
- ✅ **Type system upgraded** - replaced deprecated `typing.List` with modern `list[T]`
- ✅ **Import organization** - consistent sorting and grouping across codebase
- ✅ **Code readability improved** with better formatting and structure
- ✅ **All 17 tests continue to pass** - no functionality broken

### Development Tooling & CI/CD Pipeline (Commit: 4ee6a7d)
- ✅ **Enhanced development environment** with modern tooling (ruff, pre-commit)
- ✅ **Comprehensive CI/CD pipeline** with GitHub Actions supporting multi-OS/Python versions
- ✅ **Automated security checks** integrated into pipeline
- ✅ **Improved Makefile** with 15+ development commands for common tasks
- ✅ **Project health score** improved from 7.5/10 to 8.5/10
- ✅ **Production-ready** development practices implemented

### Repository Optimization & Cleanup
- ✅ **Enhanced .gitignore** with 272 comprehensive patterns (Commit: 11cddd7)
- ✅ **Repository size optimization** - removed 187MB of unnecessary files
- ✅ **Git history cleanup** using git filter-repo for cleaner repository
- ✅ **Cross-platform support** for macOS, Windows, and Linux development
- ✅ **AI IDE framework compatibility** with BMAD-METHOD integration

### BMAD-METHOD Framework Integration (Commit: 81f1647)
- ✅ **11 specialized agent roles** for comprehensive development workflow
- ✅ **Zero Trust Architecture** with enterprise-grade security patterns
- ✅ **Polynomial functor mathematics** for advanced MCard operations
- ✅ **Cross-platform agent definitions** for multiple AI IDEs
- ✅ **207 files committed** with major architectural enhancements

### Configuration Management Refactoring
- ✅ **Renamed `EnvConfig` to `EnvParameters`** for better clarity and consistency
- ✅ **Moved configuration management** from `env_config.py` to `env_parameters.py`
- ✅ **Updated all references** to use the new class name across the codebase
- ✅ **Enhanced test coverage** for configuration parameters
- ✅ **Maintained singleton pattern** for configuration management
- ✅ **Ensured backward compatibility** with existing environment variable handling

### Database & Performance Enhancements
- ✅ **Implemented `get_all()` method** in SQLiteEngine for efficient pagination
- ✅ **Added support for page size and page number parameters**
- ✅ **Enhanced error handling** for invalid pagination parameters
- ✅ **Improved performance** by optimizing SQL queries
- ✅ **Added comprehensive test coverage** for pagination functionality

## Recent Changes

### Directory Structure Updates
- The `hash_algorithms` directory has been renamed to `algorithms` for simplicity and clarity.
- The `hash_validator.py` file has been renamed to `validator.py` to simplify the naming convention.

### Updated Imports
- All relevant import statements across the codebase have been updated to reflect the new structure and naming.

### Engine Refactor
- Removed the abstract `search_by_content` method from `SQLiteEngine` and `DuckDBEngine`.
- Integrated search functionality into the `search_by_string` method, allowing searches across content, hash, and g_time fields.

### Event Generation
- Updated `generate_duplication_event` and `generate_collision_event` to return JSON strings.
- Enhanced event structure to include upgraded hash functions and content size.

### Logging
- Integrated logging into test cases for better traceability and debugging.

### MCard Class Update
- The `MCard` constructor now accepts a `hash_function` parameter, providing more flexibility in hash generation.

### Tests
- Adjusted tests to verify the new event generation logic and ensure search functionality works as intended.

## Centralized Configuration Management

### Overview
MCard has adopted a centralized configuration management approach to improve maintainability, scalability, and readability. This involves consolidating all configuration constants into a single location, making it easier to manage and update configuration values across the application.

### Configuration Constants
All configuration constants are now defined in `config_constants.py`. This file contains named constants for various configuration values, including:

- Database schema and paths
- Hash algorithm constants and hierarchy
- Environment variable names
- API configuration
- HTTP status codes
- Error messages
- Event types and structure

### Benefits
Centralized configuration management provides several benefits, including:

- **Single Source of Truth**: All configuration constants are managed in one location.
- **Type Safety**: Constants are properly typed and documented.
- **Maintainability**: Changes to configuration values only need to be made in one place.
- **Code Completion**: IDE support for constant names improves developer productivity.
- **Documentation**: Each constant group is documented with its purpose and usage.
- **Testing**: Test files use the same constants as production code, ensuring consistency.

### Implementation
The `config_constants.py` file uses an enum-based approach for hash algorithms, ensuring type safety and readability. The file is organized into logical groups, making it easier to find and update specific configuration values.

### Example Usage
To use a configuration constant, simply import the `config_constants` module and access the desired constant. For example:
```python
from config_constants import HASH_ALGORITHM_SHA256

# Use the SHA-256 hash algorithm
hash_algorithm = HASH_ALGORITHM_SHA256
```

## Using MCardFromData for Stored Values

When retrieving stored MCard data from the database, always use the subclass `MCardFromData`. Because the hash is already stored, this subclass skips recomputing it, significantly speeding up MCard instantiation.
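The difference can be sketched with stand-in classes (a minimal illustration of the pattern, not the real implementation):

```python
import hashlib

class MCardSketch:
    """Stand-in for MCard: the hash is computed from the content."""
    def __init__(self, content: bytes):
        self.content = content
        self.hash = hashlib.sha256(content).hexdigest()

class MCardFromDataSketch:
    """Stand-in for MCardFromData: the stored hash is trusted as-is,
    so no hashing work happens at instantiation time."""
    def __init__(self, content: bytes, hash: str, g_time: str):
        self.content = content
        self.hash = hash
        self.g_time = g_time
```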

## Project Structure

```plaintext
MCard_TDD/
├── mcard/
│   ├── algorithms/         # Hash algorithm implementations
│   ├── engine/             # Database engines (SQLite, DuckDB)
│   ├── model/              # Core data models
│   ├── api.py              # FastAPI endpoints
│   └── logging_config.py   # Logging configuration
├── tests/
│   ├── persistence/        # Database persistence tests
│   └── unit/               # Unit tests
├── docs/                   # Project documentation
├── data/
│   ├── db/                 # Database files
│   └── files/              # General files
└── logs/                   # Application logs
```

## Configuration
### Environment Setup
Create a `.env` file with the following variables:

```plaintext
MCARD_DB_PATH=data/db/mcard_demo.db
TEST_DB_PATH=data/db/test_mcard.db
MCARD_SERVICE_LOG_LEVEL=DEBUG
```
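At runtime these variables can be read with `os.getenv`, with python-dotenv (a declared dependency) populating `os.environ` from the `.env` file first. A minimal sketch, using the defaults documented below as fallbacks:

```python
import os

# from dotenv import load_dotenv   # python-dotenv, a declared dependency
# load_dotenv()                    # would populate os.environ from .env

db_path = os.getenv("MCARD_DB_PATH", "data/db/mcard_demo.db")
log_level = os.getenv("MCARD_SERVICE_LOG_LEVEL", "DEBUG")
```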

## Development Guidelines
### Using MCardFromData
When retrieving stored data, use `MCardFromData` instead of the base `MCard` class:

```python
from mcard.model.card import MCardFromData

# content, hash, and g_time come from a previously stored database row
stored_card = MCardFromData(content=content, hash=hash, g_time=g_time)
```

### Hash Algorithm Configuration
The default hash algorithm is SHA-256, but it's configurable:
```python
from mcard.algorithms import HASH_ALGORITHM_SHA256
```

## Installation

To set up the project, follow these steps:

1. Create a virtual environment:
   ```bash
   python -m venv .venv
   ```

2. Activate the virtual environment:
   - On macOS and Linux:
     ```bash
     source .venv/bin/activate
     ```
   - On Windows:
     ```bash
     .venv\Scripts\activate
     ```

3. Configure your environment:
   - Copy `.env.example` to create your own `.env` file.
   - The default configuration uses:
     - Database path: `data/db/mcard_demo.db`.
     - Hash algorithm: SHA-256.
     - Connection pool size: 5.
     - Connection timeout: 30 seconds.

## Directory Structure

- **mcard/**
  - **engine/**: Database engine implementations, currently only SQLite.
  - **model/**: Core data models, including `MCard`.
- **tests/**: All test cases for the MCard library, ensuring functionality and correctness.

## SQLite Persistence Testing

- **tests/persistence/sqlite_test.py**: Contains test cases for SQLite persistence, ensuring data integrity and consistency.

The tests in `test_sqlite_persistence.py` clear the database after each test function runs, so the `test_mcard.db` file only contains the data from the last test executed. If the `clear()` call in the fixture is uncommented, it removes the content of the last test as well. This behavior ensures that each test starts with a clean database, giving more accurate and reliable results.
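The clear-after-each-test pattern can be illustrated with plain `sqlite3` (a simplified stand-in for the project's fixture; table and column names are assumed):

```python
import sqlite3

# In-memory stand-in for test_mcard.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS card (hash TEXT PRIMARY KEY, content BLOB)")

# A test inserts some data...
conn.execute("INSERT INTO card VALUES (?, ?)", ("abc123", b"payload"))

# ...and the fixture's teardown clears the table afterwards,
# mirroring the clear() step described above
conn.execute("DELETE FROM card")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM card").fetchone()[0]  # 0
```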
