logosdb change log
==================

0.6.0  (2026-04-28)
-------------------

  Docs + example: on-prem memory-efficient RAG (closes #27)

  * New `docs/rag-on-prem.md` comprehensive guide for on-premises RAG:
    - RAM model: explains how mmap-backed memory use scales with query patterns
    - Sizing guide: disk size and RAM estimates for 10K to 10M vectors
    - Architecture patterns: embed offline/query online, time-sharding
    - External quantization guidance for edge deployments
  * New `examples/python/memory_efficient_rag.py` with:
    - RSS memory tracking during ingest and query
    - Batch processing to limit peak RAM
    - Database size reporting vs theoretical minimum
    - Complete RAG loop demonstration
  * Updated README with memory-efficiency section and quick RAG example.

  LlamaIndex VectorStore backend (closes #23)

  * New LogosDBIndex class implementing LlamaIndex's VectorStore interface.
  * API: add(), delete(), query(), persist(), clear(), count(), client().
  * Supports timestamp range filtering via query kwargs (ts_from, ts_to).
  * Cosine similarity mode with automatic vector normalization.
  * Optional dependency: pip install 'logosdb[llama-index]'.
  * 19 test cases covering all functionality.
  * Shares helper utilities with LangChain adapter (internal refactor).

  L2-normalization helpers (closes #15)

  * New C API: `logosdb_l2_normalize(float *vec, int dim)` returns 0 on success,
    or -1 if the vector has zero norm (the vector is left unchanged).
  * New C++ helpers: `logosdb::l2_normalize(vector<float>&)` and
    `logosdb::l2_normalized(vector<float>)` for in-place and copy variants.
  * Double-precision norm calculation for accuracy.
  * Unit tests cover: random vectors, already-normalized, zero vectors, small values.
  * README examples updated to use helpers instead of hand-rolled loops.
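
  The contract described above (in-place normalization, 0 on success, -1 on a
  zero norm with the input untouched, norm accumulated in double precision)
  can be illustrated with a small Python sketch. This is not the library's
  code: the function names only mirror the C/C++ helpers, and returning an
  unchanged copy for a zero vector in `l2_normalized` is an assumption.

```python
import math
from array import array


def l2_normalize(vec):
    """Normalize `vec` in place. Returns 0 on success, -1 if the norm is zero."""
    # Python floats are doubles, so the squared norm is accumulated in double
    # precision even when `vec` stores float32 values (array typecode "f").
    norm_sq = math.fsum(x * x for x in vec)
    if norm_sq == 0.0:
        return -1  # zero norm: leave the vector unchanged
    inv = 1.0 / math.sqrt(norm_sq)
    for i in range(len(vec)):
        vec[i] = vec[i] * inv
    return 0


def l2_normalized(vec):
    """Copy variant: returns a normalized float32 copy of `vec`.

    Assumption: a zero vector is returned as an unchanged copy.
    """
    out = array("f", vec)
    l2_normalize(out)
    return out


if __name__ == "__main__":
    v = array("f", [3.0, 4.0])
    status = l2_normalize(v)
    print(status, list(v))
```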

0.5.0  (2026-04-27)
-------------------

  LogosDB CLI polish (closes #13)

  * info command now reads dim from vectors.bin header (--dim optional).
  * Added --json output mode for info and search commands (machine-parseable).
  * New export command: export DB to JSONL with base64-encoded vectors.
  * New import command: import from JSONL with base64 vectors (round-trip capable).
  * Added --query-id option: use existing vector as search query.
  * Added --ts-from and --ts-to for timestamp range filtering in search.
  * Added --version flag and proper --help for each subcommand.
  * Consistent exit codes (0 success, 1 error).
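
  To show why base64-encoded vectors make the export/import pair round-trip
  capable, here is a minimal Python sketch of one JSONL record. The field
  names ("id", "text", "ts", "vector") and the little-endian float32 layout
  are assumptions for illustration; the changelog does not specify the
  actual record schema.

```python
import base64
import json
import struct


def encode_row(row_id, vector, text, ts):
    """Serialize one row as a JSONL line (hypothetical field names)."""
    # Pack as little-endian float32, then base64 so the bytes survive JSON.
    raw = struct.pack(f"<{len(vector)}f", *vector)
    record = {
        "id": row_id,
        "text": text,
        "ts": ts,
        "vector": base64.b64encode(raw).decode("ascii"),
    }
    return json.dumps(record)


def decode_row(line):
    """Inverse of encode_row: returns (id, vector, text, ts)."""
    record = json.loads(line)
    raw = base64.b64decode(record["vector"])
    vector = list(struct.unpack(f"<{len(raw) // 4}f", raw))
    return record["id"], vector, record["text"], record["ts"]


if __name__ == "__main__":
    line = encode_row(7, [0.5, -1.25, 2.0], "hello", "2026-04-27T12:00:00Z")
    print(line)
    print(decode_row(line))
```

  Because the float32 bytes are encoded directly rather than printed as
  decimal text, decoding reproduces the stored bit patterns exactly.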

  Avoid full munmap/mmap on every vector append (closes #14)

  * Implemented reservation mapping to eliminate costly munmap/mmap on append.
  * Reserve 1 GB of virtual address space on open (Linux: MAP_NORESERVE, Windows: large mapping).
  * Extend active mapping incrementally when file grows within reservation.
  * A full remap is triggered only when the file exceeds the reserved space.
  * Pointers returned by row()/data() now remain valid across concurrent appends
    while the file stays within the reservation.
  * Added test verifying pointer stability across 100 appends.

  LangChain VectorStore adapter (closes #22)

  * New LogosDBVectorStore class implementing LangChain's VectorStore interface.
  * API: add_texts(), add_documents(), similarity_search(), similarity_search_by_vector(),
    similarity_search_with_score(), delete(), from_documents(), from_texts().
  * Supports timestamp range filtering via kwargs (ts_from, ts_to).
  * Cosine similarity mode with automatic vector normalization.
  * Optional dependency: pip install 'logosdb[langchain]'.
  * 14 test cases covering all functionality.

0.4.1  (2026-04-22)
-------------------

  * Version bump to synchronize C++ and Python package versions.

0.4.0  (2026-04-22)
-------------------

  Configurable distance metrics (closes #8)

  * Added support for three distance metrics via `logosdb_options_set_distance()`:
    - `LOGOSDB_DIST_IP` (default): Inner product on L2-normalized vectors.
    - `LOGOSDB_DIST_COSINE`: Cosine similarity with automatic L2-normalization.
    - `LOGOSDB_DIST_L2`: Euclidean distance using hnswlib::L2Space.
  * The distance metric is persisted in `hnsw.idx.meta` and validated on reopen.
    Reopening a database with a mismatched distance metric returns a clear error.
  * Cosine mode automatically normalizes vectors during `put()` and `search()`
    operations, so callers no longer need to pre-normalize embeddings.
  * C++ API: `Options::distance` field accepts the same constants.
  * Added 16 test assertions for distance metric functionality.

  Metadata-filtered search with timestamp range (closes #7)

  * New `logosdb_search_ts_range()` API for post-filtered vector search.
    - Filters results by ISO 8601 timestamp range (inclusive bounds).
    - `ts_from_iso8601` and `ts_to_iso8601` parameters are optional (NULL/empty = no bound).
    - `candidate_k` parameter controls internal fetch size for recall vs latency tradeoff.
  * C++ API: `DB::search_ts_range(query, top_k, ts_from, ts_to, candidate_k)`.
  * Use case: "search within last 24h" or "search within conversation X" for LLM memory.
  * Added 34 test assertions for timestamp range filtering, edge cases, and recall.
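
  The post-filtering approach and the role of `candidate_k` can be sketched
  in Python as follows. This is an illustration under assumptions, not the
  library's implementation: a brute-force inner-product ranking stands in
  for the HNSW search, rows are plain (id, vector, timestamp) tuples, and
  the default of 10 x top_k for `candidate_k` is hypothetical.

```python
from datetime import datetime


def search_ts_range(rows, query, top_k, ts_from=None, ts_to=None, candidate_k=None):
    """Return up to top_k row ids whose timestamp lies in [ts_from, ts_to].

    rows: list of (row_id, vector, iso8601_timestamp) tuples.
    None or an empty string for ts_from / ts_to means "no bound".
    """
    if candidate_k is None:
        candidate_k = top_k * 10  # hypothetical default, not from the library
    lo = datetime.fromisoformat(ts_from) if ts_from else None
    hi = datetime.fromisoformat(ts_to) if ts_to else None

    # Step 1: fetch candidate_k nearest rows (brute force stands in for HNSW).
    def score(row):
        return sum(a * b for a, b in zip(query, row[1]))

    candidates = sorted(rows, key=score, reverse=True)[:candidate_k]

    # Step 2: post-filter the candidates by timestamp, bounds inclusive.
    hits = []
    for row_id, _vec, ts in candidates:
        t = datetime.fromisoformat(ts)
        if lo is not None and t < lo:
            continue
        if hi is not None and t > hi:
            continue
        hits.append(row_id)
        if len(hits) == top_k:
            break
    return hits


if __name__ == "__main__":
    rows = [
        (0, [1.0, 0.0], "2026-04-20T10:00:00"),
        (1, [0.9, 0.1], "2026-04-21T10:00:00"),
        (2, [0.8, 0.2], "2026-04-22T10:00:00"),
    ]
    print(search_ts_range(rows, [1.0, 0.0], 2, ts_from="2026-04-21T10:00:00"))
```

  Since filtering happens after the nearest-neighbor fetch, a `candidate_k`
  that is too small can return fewer than `top_k` hits even when matching
  rows exist; a larger value improves recall at the cost of latency.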

0.3.2  (2026-04-20)
-------------------

  * Fixed POSIX compilation issues for older systems (manylinux).
    - Added missing `<sys/types.h>` includes.
    - Added `_GNU_SOURCE` define for `pread`/`pwrite` on older glibc.

0.3.1  (2026-04-20)
-------------------

  * Fixed Windows build compatibility issues.
    - New `src/platform.{h,cpp}` providing cross-platform abstraction layer.
    - Windows implementations for file operations using `_write`, `_read`, `_lseek`,
      `_close`, `_commit` instead of POSIX `pwrite`/`pread`/`fsync`/`close`.
    - Windows memory mapping using `CreateFileMapping`/`MapViewOfFile` instead
      of POSIX `mmap`/`munmap`.
    - Fixed `strdup` deprecation warning on Windows (use `_strdup`).
    - All 553 tests pass on macOS; code compiles for Windows (MSVC/Clang).

0.3.0  (2026-04-20)
-------------------

  * Added batch Put API for efficient bulk ingestion. Closes #6.
    - New C API: `logosdb_put_batch(db, embeddings, n, dim, texts, timestamps,
      out_ids, errptr)` for inserting N vectors in one call.
    - New C++ wrapper: `DB::put_batch(embeddings, n, texts, timestamps)` returning
      a vector of assigned row ids.
    - Optimized implementation:
      * Single ftruncate + pwrite for all vectors (vs N separate calls).
      * Single batch write for all metadata JSONL lines.
      * Single mmap remap at the end (vs N remaps).
      * Write mutex held once for entire batch.
    - Benchmark: batch insertion of 100K vectors is ~4× faster than an
      equivalent loop of individual put() calls.
    - New tests: `test_put_batch_basic`, `test_put_batch_empty`.

0.2.2  (2026-04-20)
-------------------

  * Introduced Write-Ahead Log (WAL) for atomic Put operations. Closes #2.
    - New `src/wal.{h,cpp}` implementing `WriteAheadLog` class with binary
      append-only format, state tracking (PENDING/COMMITTED), and replay.
    - `logosdb_put` now writes to WAL first (durability point), then modifies
      vector storage, metadata, and HNSW index. On success, WAL entry is marked
      committed; on crash, pending entries are replayed on next open.
    - `logosdb_open` replays any pending WAL entries before serving requests,
      ensuring consistency across all three stores after partial failures.
    - New `wal.log` file in database directory; backward compatible with
      existing databases (WAL is empty on first open).
    - New test `test_wal_crash_recovery` validates crash recovery behavior.
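
  The write-ahead ordering described above (log first, then the three
  stores, then mark committed; replay pending entries on open) can be
  modelled with a toy in-memory Python class. Everything here is a
  simplified assumption for illustration: the real WAL is a binary file,
  and the class, method, and parameter names below are hypothetical.

```python
PENDING, COMMITTED = 0, 1


class ToyStore:
    """In-memory model of a WAL guarding three stores."""

    def __init__(self):
        self.wal = []        # entries: [state, row_id, payload]
        self.vectors = {}    # stands in for vector storage
        self.metadata = {}   # stands in for the metadata log
        self.index = set()   # stands in for the HNSW index

    def _apply(self, row_id, payload):
        # Idempotent: re-applying the same entry leaves the stores unchanged.
        self.vectors[row_id] = payload["vector"]
        self.metadata[row_id] = payload["text"]
        self.index.add(row_id)

    def put(self, row_id, vector, text, crash_after_wal=False):
        entry = [PENDING, row_id, {"vector": vector, "text": text}]
        self.wal.append(entry)  # durability point: the log is written first
        if crash_after_wal:
            return  # simulate a crash before any store was modified
        self._apply(row_id, entry[2])
        entry[0] = COMMITTED

    def recover(self):
        """Replay pending entries, as done on open. Returns how many were replayed."""
        replayed = 0
        for entry in self.wal:
            if entry[0] == PENDING:
                self._apply(entry[1], entry[2])
                entry[0] = COMMITTED
                replayed += 1
        return replayed


if __name__ == "__main__":
    store = ToyStore()
    store.put(0, [1.0], "a")
    store.put(1, [2.0], "b", crash_after_wal=True)
    print(store.recover(), sorted(store.index))
```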

0.2.1  (2026-04-20)
-------------------

  * Replaced hand-rolled JSON parser with nlohmann/json (v3.11.3, single-header,
    vendored in third_party/nlohmann/). This fixes multiple parsing edge cases:
    unicode escape sequences (\uXXXX), key ordering independence, extra
    whitespace tolerance, and proper escape handling. Closes #5.
    Added regression tests for unicode, empty strings, complex backslash
    escapes, and key-order variations. All 130 C++ tests pass; total assertions
    up from 115 to 130.

0.2.0  (2026-04-17)
-------------------

  Highlights: first PyPI release with Python bindings, public delete/update
  API, and CI across Linux and macOS.

  Delete / update API (closes #3)

  * Added logosdb_delete(id) and logosdb_update(id, ...) to the public C API,
    with matching DB::del() and DB::update() on the C++ wrapper. Deletions
    are persisted as tombstone records in the JSONL metadata log and replayed
    onto the HNSW index on reopen (including when the index file is missing
    and has to be rebuilt from vectors.bin).
  * Added logosdb_count_live() / DB::count_live() returning total rows minus
    deleted rows. logosdb_count() continues to report the total (including
    tombstoned rows) for storage-level introspection.
  * Metadata JSONL format now also carries tombstone records of the form
    {"op":"del","id":N}. Older data files remain readable.
  * New C++ tests cover delete/update/re-put, persistence across reopen, and
    tombstone replay after an index rebuild.
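
  How tombstone records relate to `count()` and `count_live()` can be
  sketched in Python. Only the tombstone form {"op":"del","id":N} comes
  from the notes above; treating every line without an "op" field as an
  ordinary row, and the field names of those rows, are assumptions made
  for illustration.

```python
import json


def replay_metadata(lines):
    """Replay a JSONL metadata log.

    Returns (count, count_live, deleted_ids), where count includes
    tombstoned rows and count_live excludes them.
    """
    total = 0
    deleted = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("op") == "del":
            deleted.add(record["id"])  # tombstone: mark the row as deleted
        else:
            total += 1  # assumption: any other line is an ordinary row
    return total, total - len(deleted), deleted


if __name__ == "__main__":
    log = [
        '{"text": "first", "ts": "2026-04-17T09:00:00Z"}',
        '{"text": "second", "ts": "2026-04-17T09:05:00Z"}',
        '{"op": "del", "id": 0}',
    ]
    print(replay_metadata(log))
```

  On reopen, the set of deleted ids is what would be re-applied to the
  index, whether the index was loaded from disk or rebuilt from vectors.bin.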

  Python bindings and PyPI release (closes #4)

  * Added Python bindings (pybind11 + scikit-build-core). pip install logosdb
    installs a native extension that exposes logosdb.DB with put, search,
    delete, update, count, count_live, dim, raw_vectors (zero-copy numpy
    view) and logosdb.SearchHit. Smoke tests under tests/python/ and a
    numpy/sentence-transformers example under examples/python/.
  * CMakeLists.txt gained LOGOSDB_BUILD_TOOLS, LOGOSDB_BUILD_TESTS, and
    LOGOSDB_BUILD_PYTHON options so the Python wheel build skips CLI /
    benchmark / C++ test targets. The static core library now builds with
    position-independent code so it can be linked into the Python extension.
  * Added PyPI release plumbing: cibuildwheel configuration in
    pyproject.toml and a .github/workflows/publish.yml workflow that, on
    every v* tag, builds wheels for {Linux, macOS} x {x86_64, arm64} x
    CPython {3.9..3.13}, builds the sdist, and uploads everything to PyPI
    via Trusted Publishing (OIDC, no API tokens). A RELEASING.md runbook
    documents the one-time Trusted Publisher setup and the release flow.

  CI and infrastructure (closes #1)

  * New .github/workflows/ci.yml builds and tests on ubuntu-latest and
    macos-latest in both Debug and Release via CMake + Ninja, runs ctest,
    and smoke-tests the CLI. Build directory is cached across runs.
  * New .github/workflows/python.yml builds the Python wheel and runs the
    pytest smoke suite on Linux and macOS for CPython 3.10 and 3.12.
  * Fixed a latent transitive-include bug in tools/logosdb-bench.cpp that
    broke the build on Ubuntu's libstdc++ (std::partial_sort requires an
    explicit <algorithm> include).

0.1.0  (2025-06-25)
-------------------

  Initial release.

  * HNSW-based approximate nearest-neighbor search (hnswlib).
  * Binary mmap-backed vector storage with fixed-stride rows.
  * Append-only JSONL metadata store (text + ISO 8601 timestamp per row).
  * RocksDB/LevelDB-style C API with opaque handles and errptr convention.
  * C++ convenience wrapper (logosdb::DB) with RAII and exceptions.
  * logosdb-cli command-line tool (put, search, info).
  * logosdb-bench benchmark tool (HNSW vs brute-force).
  * Crash recovery: HNSW index backfill from vector store on open.
  * Thread-safe writes; lock-free concurrent reads.
  * Integer overflow and bounds-checked storage arithmetic.
  * 76 unit tests covering core operations, persistence, edge cases,
    dimension mismatch rejection, and accessor bounds.
