logosdb change log
==================

0.7.14  (2026-05-11)
--------------------

  * **`logosdb-mcp-server` 0.7.14:** version bump; depend on **`logosdb` ^0.7.12**.
  * **npm `logosdb` (nodejs/) 0.7.12:** version bump from 0.7.11.
  * **Core / packaging alignment:** **`LOGOSDB_VERSION_STRING`**, **CMake**, **Python `pyproject.toml`**, and **n8n** `package.json` **`version`** set to **0.7.12** to match the native release train.

0.7.13  (2026-05-11)
--------------------

  * **`logosdb-mcp-server` 0.7.13:** version bump; depend on **`logosdb` ^0.7.11**.
  * **npm `logosdb` (nodejs/) 0.7.11:** version bump from 0.7.10.
  * **Core / packaging alignment:** **`LOGOSDB_VERSION_STRING`**, **CMake**, **Python `pyproject.toml`**, and **n8n** `package.json` **`version`** set to **0.7.11** to match the npm native release train (**n8n** still lists **`logosdb` ^0.7.10** until **0.7.11** is on the registry).
  * **nodejs `build` / `install` fallback:** **`node-gyp rebuild -j 1`** so parallel **make** does not race when creating nested **`Release/.deps/...`** paths (fixes intermittent **`metadata.o.d.raw`**: *No such file or directory* on Darwin).

0.7.12  (2026-05-11)
--------------------

  * **`logosdb-mcp-server`:** load the native **`logosdb`** addon **lazily** (on first DB open) instead
    of at module import time. If prebuilds are missing, the MCP server still completes the handshake and
    **`tools/list`** returns tools; the first mutating call surfaces the native load error instead of
    the client showing **“0 tools”** after a crashed subprocess.
  * **`npm test`** (MCP workspace) runs **`scripts/mcp-stdio-smoke.mjs`**: **`initialize`** + **`tools/list`**
    over stdio (same framing as the SDK / Claude Code) so a startup crash cannot ship unnoticed.
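
  The lazy-load pattern above can be illustrated in Python. The MCP server itself is TypeScript, so this is an analogue, not its code. A minimal sketch, assuming a hypothetical `LazyNativeModule` wrapper and a hypothetical static tool list: the import runs on first use, and a load failure is raised only to the operation that needed the module, so listing tools keeps working.

```python
import importlib
import threading


class LazyNativeModule:
    """Hypothetical wrapper: import a native extension on first use, not at import time."""

    def __init__(self, module_name):
        self._module_name = module_name
        self._module = None
        self._error = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            # First call performs the import; the module or the failure is cached.
            if self._module is None and self._error is None:
                try:
                    self._module = importlib.import_module(self._module_name)
                except ImportError as exc:
                    self._error = exc
            if self._error is not None:
                # Surface the load failure to the caller of this operation only.
                raise RuntimeError(
                    f"native module {self._module_name!r} failed to load: {self._error}"
                ) from self._error
            return self._module


# Hypothetical tool registry: listing tools never touches the native module.
TOOLS = ["logosdb_search", "logosdb_index_file", "logosdb_delete"]


def list_tools():
    return list(TOOLS)
```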

0.7.11  (2026-05-11)
--------------------

  * **MCP `logosdb_index_file`:** optional **`incremental: true`** — index only new or changed files
    (compare `mtimeMs`, size, and `chunk_size` to the last run); for changed files, delete the chunk
    rows stored on the previous run (by row id) before re-embedding; when the path is a **directory**, remove stored chunks for
    files that no longer exist under that tree. State is kept under **`LOGOSDB_PATH/_logosdb_mcp_manifests/`**.
  * **`.claude/commands/index.md`** (and plugin `/index` skill): pass **`incremental: true`** by default.
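
  A minimal sketch of the incremental decision, assuming a hypothetical manifest that maps each path to its last `mtimeMs`, size, `chunk_size`, and stored chunk row ids (the real manifest format under `_logosdb_mcp_manifests/` is not shown here). `FileState`, `scan_tree`, and `plan_incremental` are illustrative names.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileState:
    """Hypothetical manifest entry for one indexed file."""
    mtime_ms: float
    size: int
    chunk_size: int
    row_ids: tuple = field(default=(), compare=False)  # chunk ids written last run


def scan_tree(root):
    """Map every file under root to (mtime in ms, size in bytes)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns / 1_000_000, st.st_size)
    return state


def plan_incremental(current, manifest, chunk_size):
    """Return (paths to re-embed, row ids to delete first)."""
    to_index, stale_ids = [], []
    for path, (mtime_ms, size) in sorted(current.items()):
        prev = manifest.get(path)
        if prev == FileState(mtime_ms, size, chunk_size):
            continue  # unchanged since the last run: skip
        to_index.append(path)  # new file, or mtime/size/chunk_size differs
        if prev is not None:
            stale_ids.extend(prev.row_ids)  # drop old chunks before re-embedding
    for path in sorted(set(manifest) - set(current)):
        stale_ids.extend(manifest[path].row_ids)  # file no longer under the tree
    return to_index, stale_ids
```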

0.7.10  (2026-05-11)
--------------------

  * **npm `logosdb`:** replace deprecated **`prebuild-install@7`** with the maintained fork
    **`@mmomtchev/prebuild-install`** (same `prebuild-install` CLI; avoids npm’s deprecation warning).
    **`install`** runs **`prebuild-install --runtime napi --target 8`** so N-API resolution does not
    depend on fragile `npm_config_*` merges; **`package.json` `config`** still sets **`runtime` /
    `target`** for tools that read it.
  * **`logosdb-mcp-server` 0.7.10:** depend on **`logosdb` ^0.7.10**.
  * **nodejs scripts:** rename **`prebuild`** → **`native:prebuild`** (and upload variant) so `npm run build` does not trigger npm’s automatic **`prebuild`** lifecycle hook.

0.7.9  (2026-05-11)
-------------------

  * **npm `logosdb` (nodejs):** ship vendored C++ under `deps/core/` so `node-gyp rebuild` works
    from the published tarball (no broken `../src` paths). `prepublishOnly` runs `vendor-core.mjs`.
  * **`package.json`:** `"config": { "runtime": "napi" }` so `prebuild-install` requests
    `napi-v8-*` assets instead of `node-v*` (fixes “No prebuilt binaries found” on Node 22 when
    release tarballs exist).
  * **`files`:** include `deps/`, `scripts/` in the npm pack.
  * **Workspace:** root `package.json` adds `nodejs` next to `mcp` so local installs resolve the
    native binding from the repo.
  * **`logosdb-mcp-server` 0.7.9:** depend on `logosdb` ^0.7.9; clearer stderr when the native
    module fails to load.

0.7.8  (2026-05-11)
-------------------

  * MCP `logosdb_search`: optional `ts_from`, `ts_to`, and `candidate_k` for timestamp-window search
    (wired to Node `searchTsRange`; closes #94).
  * MCP `logosdb_delete`: delete by numeric `id` or by natural-language `query` with optional
    `search_top_k` and `match_rank` (closes #96).
  * README: Claude Code slash-commands table and `.claude/commands/` in repository contents (closes #95).
  * `.claude/commands/search.md` and `forget.md` updated for the new tool parameters.
  * Root npm workspace (`package.json` / `package-lock.json`): `npm install` at repo root builds MCP;
    `.claude/mcp.json` runs **`node ./mcp/dist/index.js`** for Claude Code in this clone.

0.7.7  (2026-05-11)
-------------------

  Security hardening for the MCP server and CLI (progress on #74)

  * MCP: confine `logosdb_index_file` to `process.cwd()` or `LOGOSDB_INDEX_ROOT` after
    `realpath` resolution; reject symlink escapes; cap file size and text/metadata lengths;
    reject disallowed C0 control characters; clamp `chunk_size` and `top_k`.
  * MCP: HTTP embedding providers use `fetch` with `AbortSignal` timeout (default 120s,
    override with `EMBEDDING_FETCH_TIMEOUT_MS`, max 600s).
  * CLI: `--text` metadata validated for length, NUL, and C0 controls before `put`.
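
  The path and input checks above can be sketched in Python. This is an illustration, not the server's code: `confine_path`, `has_disallowed_c0`, and `fetch_timeout_ms` are hypothetical helpers, the set of permitted C0 characters (tab, LF, CR) is an assumption, and falling back to the default on a non-numeric or non-positive override is also assumed.

```python
import os

# Assumption: tab, LF and CR are the only C0 controls accepted in text.
ALLOWED_C0 = {"\t", "\n", "\r"}


def confine_path(candidate, root):
    """Resolve symlinks first, then require the result to stay under root."""
    real_root = os.path.realpath(root)
    real = os.path.realpath(os.path.join(real_root, candidate))
    if os.path.commonpath([real_root, real]) != real_root:
        raise PermissionError(f"path escapes index root: {candidate!r}")
    return real


def has_disallowed_c0(text):
    """True if text contains a C0 control (U+0000..U+001F) outside ALLOWED_C0."""
    return any(ord(ch) < 0x20 and ch not in ALLOWED_C0 for ch in text)


def fetch_timeout_ms(env, default_ms=120_000, max_ms=600_000):
    """Default 120 s, overridable via EMBEDDING_FETCH_TIMEOUT_MS, capped at 600 s."""
    raw = env.get("EMBEDDING_FETCH_TIMEOUT_MS")
    try:
        value = int(raw) if raw is not None else default_ms
    except ValueError:
        value = default_ms  # assumption: ignore non-numeric overrides
    if value <= 0:
        value = default_ms  # assumption: ignore non-positive overrides
    return min(value, max_ms)
```

  Resolving with `realpath` before the containment check is what defeats symlink escapes: a link inside the root that points outside it resolves to its real target, which then fails the `commonpath` test.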

  Tooling / CI (progress on #9)

  * CMake option `LOGOSDB_BUILD_FUZZ`: `logosdb-fuzz-jsonl` libFuzzer harness over JSON parse
    (enabled on non-Apple targets; Apple Clang lacks the fuzzer runtime).
  * GitHub Actions: short libFuzzer smoke job for `logosdb-fuzz-jsonl` (Clang on Linux). Full-tree
    ASan/UBSan CI was dropped: vendored hnswlib remains instrumented when included from our TUs,
    so sanitizer ignorelists do not suppress its reports.

  **npm:** publish the `logosdb` native package before `logosdb-mcp-server` so installs can resolve the new patch.

0.7.6  (TBD)
-------------------

0.7.1  (TBD)
-------------------

  Reduced-precision vector storage: float16 and int8 (closes #25, #26)

  * New `StorageDtype` enum: `DTYPE_FLOAT32` (default), `DTYPE_FLOAT16`, `DTYPE_INT8`.
  * Storage format v2: extended 32-byte header with `dtype` and `scale` fields.
  * float16: 2 bytes per dimension (~50% smaller than float32), IEEE 754 half-precision.
  * int8: 1 byte per dimension (~75% smaller than float32), global scale quantization.
  * Transparent dequantization: vectors stored in reduced precision are dequantized
    to float32 on the fly for HNSW search, so the index adds no further precision loss
    (rounding introduced at quantization time is not recovered).
  * C API: `logosdb_options_set_dtype(opts, LOGOSDB_DTYPE_FLOAT16/INT8)`.
  * C++ API: `Options.dtype` field (default `LOGOSDB_DTYPE_FLOAT32`).
  * Backward compatible: v1 files auto-upgraded to v2 in memory (float32 only).
  * New ADR document: `docs/adr-002-training-free-quantization.md` describing
    the architecture decision and trade-offs.
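
  The two reduced-precision encodings can be sketched with Python's `struct` module, whose `e` format is IEEE 754 half precision. The int8 scheme here is an assumption about what "global scale" means: one symmetric scale shared by the whole set, `scale = max|x| / 127`, with values rounded and clamped to [-127, 127]. Function names are hypothetical.

```python
import struct


def to_float16_bytes(vec):
    # '<e' = little-endian IEEE 754 half precision: 2 bytes per dimension.
    return struct.pack(f"<{len(vec)}e", *vec)


def from_float16_bytes(buf):
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))


def quantize_int8(vectors):
    """Symmetric int8 with one global scale shared by all vectors (assumed scheme)."""
    max_abs = max((abs(x) for v in vectors for x in v), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(x / scale))) for x in v] for v in vectors]
    return q, scale


def int8_bytes(row):
    # '<b' = signed char: 1 byte per dimension.
    return struct.pack(f"<{len(row)}b", *row)


def dequantize_int8(q, scale):
    # Back to float for search; rounding error is at most scale / 2 per value.
    return [[x * scale for x in row] for row in q]
```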

  Mistral AI embeddings integration (closes #54)

  * New `python/logosdb/mistral.py` module with `MistralVectorStore`.
  * Uses the Mistral embeddings API (`mistral-embed`) with LogosDB as the storage backend.
  * API: `add_texts()` and `search()` with batched embedding requests.
  * Optional helper: `MistralEmbeddingProvider` for standalone embedding use.
  * New optional extra: `pip install 'logosdb[mistral]'`.
  * Added Python tests in `tests/python/test_mistral.py` (network-free via mock).

  OpenAI embeddings integration (closes #48)

  * New `python/logosdb/openai.py` with `OpenAIVectorStore`.
  * Supports `text-embedding-3-small`, `text-embedding-3-large`, and `text-embedding-ada-002`.
  * Embeds texts via the OpenAI API and persists the vectors directly in LogosDB.
  * New optional extra: `pip install 'logosdb[openai]'`.
  * Added tests in `tests/python/test_openai.py` (network-free via mock).

  Anthropic Claude embedding integration (closes #59)

  * New `python/logosdb/anthropic.py` with `AnthropicVectorStore` (experimental).
  * Uses the Claude Messages API with constrained JSON output for vector extraction.
  * New optional extra: `pip install 'logosdb[anthropic]'`.
  * Added tests in `tests/python/test_anthropic.py` (network-free via mock).

  Hugging Face local embeddings integration (closes #50)

  * New `python/logosdb/huggingface.py` with `HuggingFaceVectorStore`.
  * Local embedding computation via `sentence-transformers` (CPU/GPU).
  * New optional extra: `pip install 'logosdb[huggingface]'`.
  * Added tests in `tests/python/test_huggingface.py` (network-free via mock).

  Node.js bindings and npm package (closes #44)

  * New `nodejs/` directory with complete Node.js native addon:
    - N-API C++ bindings in `nodejs/src/node_logosdb.cpp`
    - JavaScript wrapper API matching the Python bindings
    - TypeScript definitions in `nodejs/types/index.d.ts`
    - node-gyp build configuration (`binding.gyp`)
    - Prebuild support for binary distribution
  * npm package `logosdb` with prebuilt binaries for:
    - Linux x64, macOS x64/arm64, Windows x64
  * Full API: DB(), put(), search(), searchTsRange(), update(), delete(), count()
  * Test suite with 15+ test cases
  * Install: `npm install logosdb`

  Haystack 2.x integration (closes #24)

  * New LogosDBDocumentStore class implementing the Haystack 2.x DocumentStore interface.
  * New LogosDBRetriever component for Haystack 2.x pipelines.
  * API: write_documents(), delete_documents(), count_documents(), filter_documents().
  * Retriever supports timestamp range filtering (ts_from, ts_to).
  * Cosine similarity mode with automatic vector normalization.
  * Optional dependency: pip install 'logosdb[haystack]'.
  * 20 test cases covering DocumentStore and Retriever functionality.
  * Serialization support: to_dict() / from_dict() for pipeline persistence.

0.6.0  (2026-04-28)
-------------------

  Docs + example: on-prem memory-efficient RAG (closes #27)

  * New `docs/rag-on-prem.md` comprehensive guide for on-premises RAG:
    - RAM model: explains mmap-based memory scaling with query patterns
    - Sizing guide: disk size and RAM estimates for 10K to 10M vectors
    - Architecture patterns: embed offline/query online, time-sharding
    - External quantization guidance for edge deployments
  * New `examples/python/memory_efficient_rag.py` with:
    - RSS memory tracking during ingest and query
    - Batch processing to limit peak RAM
    - Database size reporting vs theoretical minimum
    - Complete RAG loop demonstration
  * Updated README with memory-efficiency section and quick RAG example.
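
  As a rough check on the sizing numbers, the raw vector payload is just rows × dimension × bytes per dimension. This sketch ignores the file header, the HNSW index, and metadata, and the 384-dimension example is an assumption, so treat the result as a lower bound (the "theoretical minimum" the example script compares against).

```python
def raw_vector_bytes(n_vectors, dim, bytes_per_dim=4):
    """Fixed-stride payload only: no header, HNSW index or metadata included."""
    return n_vectors * dim * bytes_per_dim


# Example: 384-dimensional float32 embeddings (the dimension is an assumption).
for n in (10_000, 100_000, 1_000_000, 10_000_000):
    size = raw_vector_bytes(n, 384)
    print(f"{n:>12,} vectors: {size / 2**30:7.2f} GiB of raw float32")
```

  For 1,000,000 such vectors this is 1,536,000,000 bytes, about 1.43 GiB.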

  LlamaIndex VectorStore backend (closes #23)

  * New LogosDBIndex class implementing LlamaIndex's VectorStore interface.
  * API: add(), delete(), query(), persist(), clear(), count(), client().
  * Supports timestamp range filtering via query kwargs (ts_from, ts_to).
  * Cosine similarity mode with automatic vector normalization.
  * Optional dependency: pip install 'logosdb[llama-index]'.
  * 19 test cases covering all functionality.
  * Shares helper utilities with LangChain adapter (internal refactor).

  L2-normalization helpers (closes #15)

  * New C API: `logosdb_l2_normalize(float *vec, int dim)` - returns 0 on success,
    -1 if zero norm (vector unchanged).
  * New C++ helpers: `logosdb::l2_normalize(vector<float>&)` and
    `logosdb::l2_normalized(vector<float>)` for in-place and copy variants.
  * Double-precision norm calculation for accuracy.
  * Unit tests cover: random vectors, already-normalized, zero vectors, small values.
  * README examples updated to use helpers instead of hand-rolled loops.
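
  The documented contract (0 on success; -1 and an unchanged vector on zero norm) can be mirrored in Python as a sketch. Python floats are already double precision, and `math.fsum` is used here to reduce rounding in the sum of squares. These are illustrative functions, not the library's bindings.

```python
import math


def l2_normalize(vec):
    """In place: return 0 on success, -1 (vector unchanged) if the norm is zero."""
    norm = math.sqrt(math.fsum(x * x for x in vec))
    if norm == 0.0:
        return -1
    for i, x in enumerate(vec):
        vec[i] = x / norm
    return 0


def l2_normalized(vec):
    """Copy variant: return a normalized copy and leave the input untouched."""
    out = list(vec)
    l2_normalize(out)
    return out
```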

0.5.0  (2026-04-27)
-------------------

  LogosDB CLI polish (closes #13)

  * The info command now reads dim from the vectors.bin header (--dim is now optional).
  * Added --json output mode for info and search commands (machine-parseable).
  * New export command: export DB to JSONL with base64-encoded vectors.
  * New import command: import from JSONL with base64 vectors (round-trip capable).
  * Added --query-id option: use existing vector as search query.
  * Added --ts-from and --ts-to for timestamp range filtering in search.
  * Added --version flag and proper --help for each subcommand.
  * Consistent exit codes (0 success, 1 error).
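
  The export/import round trip can be sketched as follows, assuming vectors are serialized as little-endian float32 before base64 encoding and assuming hypothetical field names (`id`, `text`, `ts`, `vector`); the CLI's actual record layout may differ.

```python
import base64
import json
import struct


def encode_vector(vec):
    # Little-endian float32 (4 bytes per dimension), then base64 text.
    return base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode("ascii")


def decode_vector(b64):
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def export_line(row_id, text, ts, vec):
    # Hypothetical keys; one JSON object per line (JSONL).
    return json.dumps({"id": row_id, "text": text, "ts": ts, "vector": encode_vector(vec)})


def import_line(line):
    rec = json.loads(line)
    return rec["id"], rec["text"], rec["ts"], decode_vector(rec["vector"])
```

  Base64 keeps each line valid JSON while storing the exact float32 bits, so values that are representable in float32 survive the round trip unchanged.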

  Avoid full munmap/mmap on every vector append (closes #14)

  * Implemented reservation mapping to eliminate costly munmap/mmap on append.
  * Reserve 1GB virtual address space on open (Linux: MAP_NORESERVE, Windows: large mapping).
  * Extend active mapping incrementally when file grows within reservation.
  * Full remap only triggered when file exceeds reserved space.
  * Pointers returned by row()/data() now remain valid across appends, including concurrent
    ones, as long as the file stays within the reserved space.
  * Added test verifying pointer stability across 100 appends.

  LangChain VectorStore adapter (closes #22)

  * New LogosDBVectorStore class implementing LangChain's VectorStore interface.
  * API: add_texts(), add_documents(), similarity_search(), similarity_search_by_vector(),
    similarity_search_with_score(), delete(), from_documents(), from_texts().
  * Supports timestamp range filtering via kwargs (ts_from, ts_to).
  * Cosine similarity mode with automatic vector normalization.
  * Optional dependency: pip install 'logosdb[langchain]'.
  * 14 test cases covering all functionality.

0.4.1  (2026-04-22)
-------------------

  * Version bump to synchronize C++ and Python package versions.

0.4.0  (2026-04-22)
-------------------

  Configurable distance metrics (closes #8)

  * Added support for three distance metrics via `logosdb_options_set_distance()`:
    - `LOGOSDB_DIST_IP` (default): Inner product on L2-normalized vectors.
    - `LOGOSDB_DIST_COSINE`: Cosine similarity with automatic L2-normalization.
    - `LOGOSDB_DIST_L2`: Euclidean distance using hnswlib::L2Space.
  * The distance metric is persisted in `hnsw.idx.meta` and validated on reopen.
    Reopening a database with a mismatched distance metric returns a clear error.
  * Cosine mode automatically normalizes vectors during `put()` and `search()`
    operations — callers no longer need to pre-normalize embeddings.
  * C++ API: `Options::distance` field accepts the same constants.
  * Added 16 test assertions for distance metric functionality.
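
  The relationship between the three metrics can be sketched in Python. The distance conventions follow hnswlib (inner-product distance is `1 - <a, b>`, and `L2Space` returns the squared Euclidean distance); the helper names are illustrative. Cosine reduces to inner product once both vectors are L2-normalized, which is consistent with cosine mode normalizing on `put()` and `search()`.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def inner_product_distance(a, b):
    # hnswlib InnerProductSpace convention: 1 - <a, b>, smaller means closer.
    return 1.0 - dot(a, b)


def cosine_distance(a, b):
    # Cosine = inner product of the L2-normalized vectors.
    return inner_product_distance(normalize(a), normalize(b))


def l2_squared_distance(a, b):
    # hnswlib L2Space convention: squared Euclidean distance (no square root).
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

  For unit vectors, `||a - b||^2 = 2 * (1 - <a, b>)`, so L2 and inner product rank normalized vectors identically; they differ only for unnormalized inputs.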

  Metadata-filtered search with timestamp range (closes #7)

  * New `logosdb_search_ts_range()` API for post-filtered vector search.
    - Filters results by ISO 8601 timestamp range (inclusive bounds).
    - `ts_from_iso8601` and `ts_to_iso8601` parameters are optional (NULL/empty = no bound).
    - `candidate_k` parameter controls internal fetch size for recall vs latency tradeoff.
  * C++ API: `DB::search_ts_range(query, top_k, ts_from, ts_to, candidate_k)`.
  * Use case: "search within last 24h" or "search within conversation X" for LLM memory.
  * Added 34 test assertions for timestamp range filtering, edge cases, and recall.
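
  The post-filter can be sketched as follows, assuming the nearest-first candidate list (`hits`, as `(id, score, ts)` tuples) has already been fetched from the index in place of the query vector; that input shape and the function body are hypothetical, and all timestamps are assumed to carry a UTC offset or `Z`. A small `candidate_k` is cheaper but can return fewer than `top_k` matches.

```python
from datetime import datetime


def _parse_iso(ts):
    # fromisoformat() before Python 3.11 rejects a trailing "Z"; map it to +00:00.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def search_ts_range(hits, top_k, ts_from, ts_to, candidate_k):
    """Keep up to top_k of the first candidate_k hits whose ts is in [ts_from, ts_to]."""
    lo = _parse_iso(ts_from) if ts_from else None  # None or "" means no lower bound
    hi = _parse_iso(ts_to) if ts_to else None      # None or "" means no upper bound
    out = []
    for row_id, score, ts in hits[:candidate_k]:
        t = _parse_iso(ts)
        if (lo is None or t >= lo) and (hi is None or t <= hi):  # inclusive bounds
            out.append((row_id, score, ts))
            if len(out) == top_k:
                break
    return out
```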

0.3.2  (2026-04-20)
-------------------

  * Fixed POSIX compilation issues for older systems (manylinux).
    - Added missing `<sys/types.h>` includes.
    - Added `_GNU_SOURCE` define for `pread`/`pwrite` on older glibc.

0.3.1  (2026-04-20)
-------------------

  * Fixed Windows build compatibility issues.
    - New `src/platform.{h,cpp}` providing cross-platform abstraction layer.
    - Windows implementations for file operations using `_write`, `_read`, `_lseek`,
      `_close`, `_commit` instead of POSIX `pwrite`/`pread`/`fsync`/`close`.
    - Windows memory mapping using `CreateFileMapping`/`MapViewOfFile` instead
      of POSIX `mmap`/`munmap`.
    - Fixed `strdup` deprecation warning on Windows (use `_strdup`).
    - All 553 tests pass on macOS; code compiles for Windows (MSVC/Clang).

0.3.0  (2026-04-20)
-------------------

  * Added batch Put API for efficient bulk ingestion. Closes #6.
    - New C API: `logosdb_put_batch(db, embeddings, n, dim, texts, timestamps,
      out_ids, errptr)` for inserting N vectors in one call.
    - New C++ wrapper: `DB::put_batch(embeddings, n, texts, timestamps)` returning
      a vector of assigned row ids.
    - Optimized implementation:
      * Single ftruncate + pwrite for all vectors (vs N separate calls).
      * Single batch write for all metadata JSONL lines.
      * Single mmap remap at the end (vs N remaps).
      * Write mutex held once for entire batch.
    - Benchmark: batch insertion of 100K vectors is ~4× faster than equivalent
      loop of individual put() calls.
    - New tests: `test_put_batch_basic`, `test_put_batch_empty`.
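
  The batching idea (one vector write, one metadata write, one lock acquisition for the whole batch) can be sketched in Python. This assumes a hypothetical headerless `vectors.bin` of little-endian float32 rows and hypothetical metadata keys; it does not reproduce the real implementation's `ftruncate`/`pwrite` or mmap handling.

```python
import json
import os
import struct
import threading

_write_lock = threading.Lock()  # one acquisition covers the whole batch


def put_batch(vec_path, meta_path, embeddings, texts, timestamps, dim):
    """Append N rows; return their row ids (hypothetical file layout)."""
    if not (len(embeddings) == len(texts) == len(timestamps)):
        raise ValueError("embeddings, texts and timestamps must have equal length")
    if any(len(e) != dim for e in embeddings):
        raise ValueError("dimension mismatch")
    row_bytes = dim * 4  # float32 rows
    flat = [x for e in embeddings for x in e]
    meta = "".join(
        json.dumps({"text": t, "ts": ts}) + "\n" for t, ts in zip(texts, timestamps)
    )
    with _write_lock:
        first_id = os.path.getsize(vec_path) // row_bytes if os.path.exists(vec_path) else 0
        with open(vec_path, "ab") as f:
            f.write(struct.pack(f"<{len(flat)}f", *flat))  # one write for N vectors
        with open(meta_path, "a", encoding="utf-8") as f:
            f.write(meta)  # one write for N JSONL lines
    return list(range(first_id, first_id + len(embeddings)))
```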

0.2.2  (2026-04-20)
-------------------

  * Introduced Write-Ahead Log (WAL) for atomic Put operations. Closes #2.
    - New `src/wal.{h,cpp}` implementing `WriteAheadLog` class with binary
      append-only format, state tracking (PENDING/COMMITTED), and replay.
    - `logosdb_put` now writes to WAL first (durability point), then modifies
      vector storage, metadata, and HNSW index. On success, WAL entry is marked
      committed; on crash, pending entries are replayed on next open.
    - `logosdb_open` replays any pending WAL entries before serving requests,
      ensuring consistency across all three stores after partial failures.
    - New `wal.log` file in database directory; backward compatible with
      existing databases (WAL is empty on first open).
    - New test `test_wal_crash_recovery` validates crash recovery behavior.
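
  The PENDING/COMMITTED protocol above can be sketched as a toy log. Assumptions: one JSON record per line instead of the real binary `wal.log` format, caller-supplied sequence numbers, and an idempotent `apply` callback. A production WAL would also truncate a torn tail rather than merely stopping at it.

```python
import json
import os


class WriteAheadLog:
    """Toy append-only log (hypothetical text format, not the real wal.log)."""

    def __init__(self, path):
        self.path = path

    def _append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # record is durable before we return

    def begin(self, seq, payload):
        self._append({"seq": seq, "state": "PENDING", "payload": payload})

    def commit(self, seq):
        self._append({"seq": seq, "state": "COMMITTED"})

    def pending(self):
        """Return [(seq, payload)] logged as PENDING but never COMMITTED."""
        if not os.path.exists(self.path):
            return []
        open_entries = {}
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn write at the tail: ignore the incomplete record
                if record["state"] == "PENDING":
                    open_entries[record["seq"]] = record["payload"]
                else:
                    open_entries.pop(record["seq"], None)
        return sorted(open_entries.items())


def put(wal, seq, payload, apply):
    wal.begin(seq, payload)  # 1. durability point: intent is on disk
    apply(payload)           # 2. update vectors, metadata and index
    wal.commit(seq)          # 3. mark the entry done


def recover(wal, apply):
    """Replay unfinished entries on open; `apply` must be idempotent."""
    for seq, payload in wal.pending():
        apply(payload)
        wal.commit(seq)
```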

0.2.1  (2026-04-20)
-------------------

  * Replaced the hand-rolled JSON parser with nlohmann/json (v3.11.3, single-header,
    vendored in third_party/nlohmann/). This fixes multiple parsing edge cases:
    Unicode escape sequences (\uXXXX), key-order independence, extra
    whitespace tolerance, and proper escape handling. Closes #5.
    Added regression tests for Unicode, empty strings, complex backslash
    escapes, and key-order variations. All C++ tests pass; total assertions
    are up from 115 to 130.

0.2.0  (2026-04-17)
-------------------

  Highlights: first PyPI release with Python bindings, public delete/update
  API, and CI across Linux and macOS.

  Delete / update API (closes #3)

  * Added logosdb_delete(id) and logosdb_update(id, ...) to the public C API,
    with matching DB::del() and DB::update() on the C++ wrapper. Deletions
    are persisted as tombstone records in the JSONL metadata log and replayed
    onto the HNSW index on reopen (including when the index file is missing
    and has to be rebuilt from vectors.bin).
  * Added logosdb_count_live() / DB::count_live() returning total rows minus
    deleted rows. logosdb_count() continues to report the total (including
    tombstoned rows) for storage-level introspection.
  * Metadata JSONL format now also carries tombstone records of the form
    {"op":"del","id":N}. Older data files remain readable.
  * New C++ tests cover delete/update/re-put, persistence across reopen, and
    tombstone replay after an index rebuild.
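
  Tombstone replay can be sketched as below, assuming row records appear in row-id order and use hypothetical `text`/`ts` keys, while tombstones use the documented `{"op":"del","id":N}` shape. `count` includes tombstoned rows and `count_live` excludes them, mirroring `logosdb_count()` versus `logosdb_count_live()`.

```python
import json


def replay_metadata(lines):
    """Return (rows, deleted ids, count, count_live) from an append-only JSONL log."""
    rows, deleted = [], set()
    for line in lines:
        record = json.loads(line)
        if record.get("op") == "del":
            deleted.add(record["id"])  # tombstone: {"op":"del","id":N}
        else:
            rows.append(record)  # assumption: the row id is the record's position
    count = len(rows)  # like logosdb_count(): includes tombstoned rows
    count_live = count - len({i for i in deleted if 0 <= i < count})
    return rows, deleted, count, count_live
```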

  Python bindings and PyPI release (closes #4)

  * Added Python bindings (pybind11 + scikit-build-core). pip install logosdb
    installs a native extension that exposes logosdb.DB with put, search,
    delete, update, count, count_live, dim, raw_vectors (zero-copy numpy
    view) and logosdb.SearchHit. Smoke tests under tests/python/ and a
    numpy/sentence-transformers example under examples/python/.
  * CMakeLists.txt gained LOGOSDB_BUILD_TOOLS, LOGOSDB_BUILD_TESTS, and
    LOGOSDB_BUILD_PYTHON options so the Python wheel build skips CLI /
    benchmark / C++ test targets. The static core library now builds with
    position-independent code so it can be linked into the Python extension.
  * Added PyPI release plumbing: cibuildwheel configuration in
    pyproject.toml and a .github/workflows/publish.yml workflow that, on
    every v* tag, builds wheels for {Linux, macOS} x {x86_64, arm64} x
    CPython {3.9..3.13}, builds the sdist, and uploads everything to PyPI
    via Trusted Publishing (OIDC, no API tokens). A RELEASING.md runbook
    documents the one-time Trusted Publisher setup and the release flow.

  CI and infrastructure (closes #1)

  * New .github/workflows/ci.yml builds and tests on ubuntu-latest and
    macos-latest in both Debug and Release via CMake + Ninja, runs ctest,
    and smoke-tests the CLI. Build directory is cached across runs.
  * New .github/workflows/python.yml builds the Python wheel and runs the
    pytest smoke suite on Linux and macOS for CPython 3.10 and 3.12.
  * Fixed a latent transitive-include bug in tools/logosdb-bench.cpp that
    broke the build on Ubuntu's libstdc++ (std::partial_sort requires an
    explicit <algorithm> include).

0.1.0  (2025-06-25)
-------------------

  Initial release.

  * HNSW-based approximate nearest-neighbor search (hnswlib).
  * Binary mmap-backed vector storage with fixed-stride rows.
  * Append-only JSONL metadata store (text + ISO 8601 timestamp per row).
  * RocksDB/LevelDB-style C API with opaque handles and errptr convention.
  * C++ convenience wrapper (logosdb::DB) with RAII and exceptions.
  * logosdb-cli command-line tool (put, search, info).
  * logosdb-bench benchmark tool (HNSW vs brute-force).
  * Crash recovery: HNSW index backfill from vector store on open.
  * Thread-safe writes; lock-free concurrent reads.
  * Integer overflow and bounds-checked storage arithmetic.
  * 76 unit tests covering core operations, persistence, edge cases,
    dimension mismatch rejection, and accessor bounds.
