Metadata-Version: 2.4
Name: mongoeco
Version: 3.2.0
Summary: Async-first MongoDB-like persistence library with pluggable storage engines.
Author: Vicente Ruiz
License-Expression: Apache-2.0
Keywords: mongodb,pymongo,testing,sqlite,embedded-database
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cxp>=2.0.0
Requires-Dist: pyuca<2,>=1.2
Requires-Dist: shapely<3,>=2.1
Requires-Dist: usearch<3,>=2.23
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: coverage; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: twine; extra == "dev"
Provides-Extra: mongodb7
Requires-Dist: pymongo<5,>=4.9; extra == "mongodb7"
Provides-Extra: mongodb8
Requires-Dist: pymongo<5,>=4.9; extra == "mongodb8"
Provides-Extra: mongodb-real
Requires-Dist: pymongo<5,>=4.9; extra == "mongodb-real"
Provides-Extra: wire
Requires-Dist: pymongo<5,>=4.9; extra == "wire"
Provides-Extra: json-fast
Requires-Dist: orjson<4,>=3.11; extra == "json-fast"
Dynamic: license-file

# mongoeco

`mongoeco` is an async-first embedded runtime aligned with the canonical CXP
`database/mongodb` interface, with a MongoDB-like surface and pluggable
storage engines.

It is built for local development, test environments, embedded persistence and
compatibility work where a PyMongo-shaped API is useful without requiring a
real MongoDB server for every workflow.

## Current Scope

What is already in place:

* async and sync client APIs
* memory and SQLite engines
* transactional local sessions and local admin/runtime introspection
* aggregation runtime with pushdown and spill guardrails
* a canonical public capability model for `database/mongodb` through CXP
* compatibility modeling across MongoDB dialects and PyMongo profiles
* local wire/driver runtime
* local geospatial, classic `$text`, `$search` and ANN-backed `$vectorSearch`

What this is not:

* a drop-in replacement for a production MongoDB cluster
* a full Atlas Search implementation
* a geodesic geospatial engine
* a full-text/vector engine with server-grade scaling guarantees

## When To Use `mongoeco`

`mongoeco` fits well when you want:

* a local `database/mongodb` runtime for development or CI;
* a local PyMongo-like runtime without a server process;
* more semantic fidelity than a lightweight mock;
* embedded persistence with either memory or SQLite;
* local `$text`, `$search` and `$vectorSearch` without standing up a server;
* explicit compatibility and `explain()` diagnostics instead of opaque
  best-effort behavior.

Reference:

* [docs/use-cases.md](docs/use-cases.md)
* [docs/use-cases/embedded-app.md](docs/use-cases/embedded-app.md)
* [docs/use-cases/test-runtime.md](docs/use-cases/test-runtime.md)
* [docs/use-cases/local-search-and-retrieval.md](docs/use-cases/local-search-and-retrieval.md)
* [docs/comparisons.md](docs/comparisons.md)

## Installation

Editable local install:

```bash
python -m pip install -e .
```

Development install:

```bash
python -m pip install -e .[dev]
```

The base install includes `cxp>=2.0.0`, so `mongoeco` can expose the
canonical `database/mongodb` contract directly.

Reference:

* [docs/cxp.md](docs/cxp.md)

The public compatibility export and the top-level `cxp` blocks surfaced by
`find(...).explain()` / `aggregate(...).explain()` are projected from that
canonical CXP capability model.

The public CXP surface also includes canonical first-level operation bindings:

* `read` -> `find`, `find_one`, `count_documents`, `estimated_document_count`,
  `distinct`
* `write` -> `insert_one`, `insert_many`, `update_one`, `update_many`,
  `replace_one`, `delete_one`, `delete_many`, `bulk_write`
* `aggregation` -> `aggregate`
* `change_streams` -> `watch`
* `transactions` -> `start_session`, `with_transaction`

For `search` and `vector_search`, `mongoeco` still binds the public operation
`aggregate` and uses metadata to describe the supported stage-level subset.
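
The bindings above can be restated as a plain lookup table. The sketch below is
illustrative only: `OPERATION_BINDINGS` and `capabilities_for` are hypothetical
names invented for this example, not part of the `mongoeco` or `cxp` API, and
the real catalog carries more metadata than a flat mapping.

```python
# Illustrative restatement of the capability -> operation bindings listed above.
# OPERATION_BINDINGS and capabilities_for are hypothetical names, not mongoeco API.
OPERATION_BINDINGS: dict[str, tuple[str, ...]] = {
    "read": (
        "find", "find_one", "count_documents",
        "estimated_document_count", "distinct",
    ),
    "write": (
        "insert_one", "insert_many", "update_one", "update_many",
        "replace_one", "delete_one", "delete_many", "bulk_write",
    ),
    "aggregation": ("aggregate",),
    "change_streams": ("watch",),
    "transactions": ("start_session", "with_transaction"),
    # search and vector_search reuse the public `aggregate` operation.
    "search": ("aggregate",),
    "vector_search": ("aggregate",),
}


def capabilities_for(operation: str) -> list[str]:
    """Return every capability whose binding includes the given operation."""
    return [cap for cap, ops in OPERATION_BINDINGS.items() if operation in ops]


print(capabilities_for("watch"))      # ['change_streams']
print(capabilities_for("aggregate"))  # ['aggregation', 'search', 'vector_search']
```

The reverse lookup makes the last point visible: `aggregate` is the only
operation shared by several capabilities, which is why stage-level metadata is
needed to tell them apart.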

`mongoeco` does not ship a live CXP provider wrapper for its clients. Instead,
it exposes the canonical catalog and projects the active capability path
through `compat` and `explain()`. External systems can wrap `mongoeco` if they
want to negotiate profiles or instantiate resources through CXP.

The direct path from `mongoeco` itself is:

```python
from mongoeco import MongoClient, MONGODB_CATALOG, export_cxp_catalog
from mongoeco.engines.memory import MemoryEngine


print(export_cxp_catalog()["interface"])

with MongoClient(MemoryEngine()) as client:
    collection = client.get_database("demo").get_collection("items")
    collection.insert_one({"_id": 1, "score": 8})
    explain = collection.aggregate([{"$match": {"score": {"$gte": 8}}}]).explain()
    print(MONGODB_CATALOG.interface)
    print(explain["cxp"])
```

The base package includes `pyuca` as a runtime dependency, so Unicode
collations keep deterministic UCA-backed behavior even when `PyICU` is not
installed.

Advanced ICU collation backend (optional):

```bash
python -m pip install PyICU
```

Collation backend policy:

* `PyICU` stays optional by contract: it is never required for the supported
  baseline subset
* `PyICU` if available: preferred backend, including advanced collation knobs
  such as `backwards`, `alternate`, `maxVariable` and `normalization`
* `pyuca` fallback: Unicode collation for the supported basic subset
  (`locale=en`, `strength`, `caseLevel`, `numericOrdering`)
* if advanced knobs are requested without `PyICU`, `mongoeco` raises an error
  instead of silently ignoring them
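
The policy above amounts to a small decision rule. The following sketch shows
that rule in isolation; it is not `mongoeco`'s implementation.
`pick_collation_backend` and `ADVANCED_KNOBS` are hypothetical names, the knob
set only covers the four options named above, and `ValueError` is a placeholder
for whatever error type the library actually raises.

```python
# Hypothetical sketch of the collation backend policy; not mongoeco's code.
# The knob set only covers the options named in the list above.
ADVANCED_KNOBS = {"backwards", "alternate", "maxVariable", "normalization"}


def pick_collation_backend(collation: dict, icu_available: bool) -> str:
    """Choose a backend name, or fail loudly if the request needs PyICU."""
    if icu_available:
        # PyICU is preferred whenever it is importable.
        return "PyICU"
    requested = sorted(ADVANCED_KNOBS.intersection(collation))
    if requested:
        # Advanced knobs are rejected instead of being silently ignored.
        raise ValueError(f"collation options {requested} require PyICU")
    return "pyuca"


print(pick_collation_backend({"locale": "en", "strength": 2}, icu_available=False))
print(pick_collation_backend({"locale": "en", "backwards": True}, icu_available=True))
```

Raising on an unsupported knob keeps sort order from changing silently between
machines that do and do not have `PyICU` installed.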

Optional fast JSON backend:

```bash
python -m pip install -e .[json-fast]
```

`mongoeco` uses the standard library `json` module by default, even if
`orjson` is installed. You can choose the backend at process start with
`MONGOECO_JSON_BACKEND`:

* `stdlib`: always use the standard library JSON backend
* `orjson`: require `orjson` and use it
* `auto`: use `orjson` when available, otherwise fall back to `stdlib`

Example:

```bash
MONGOECO_JSON_BACKEND=orjson python your_app.py
```

Unicode collation backend:

* `mongoeco` prefers `PyICU` when it is available
* otherwise it uses the bundled `pyuca` dependency
* the `simple` collation keeps using the BSON/Python baseline comparator and
  rejects Unicode tailoring knobs such as `caseLevel` or `numericOrdering`
* the currently supported locale surface is `simple` and `en`
* the currently supported strengths are `1`, `2` and `3`
* `numericOrdering` and `caseLevel` are supported for `locale=en`
* `PyICU` and `pyuca` are intentionally close, but may still differ on
  advanced tailoring details outside the currently supported locale surface
* `mongoeco.collation_backend_info()` reports the active Unicode backend,
  while `mongoeco.collation_capabilities_info()` reports the supported locale
  surface and which advanced knobs require `PyICU`

Local wire driver and SDAM:

* the local driver now starts non-direct single-seed topologies as
  provisional `UNKNOWN` and relies on `hello` discovery to converge towards
  `standalone`, `replicaSet` or `sharded` topology shapes
* retryable reads and writes now apply to real wire connection failures too:
  connect/read/write socket errors are normalized to `ConnectionFailure`
* replica-set discovery also tracks per-server health states (`healthy`,
  `recovering`, `degraded`, `unreachable`) and uses them to prefer healthier
  candidates when ordering eligible servers
* clients expose `sdam_capabilities()` so the supported SDAM subset is
  inspectable at runtime instead of being implicit in the implementation

Local change streams:

* local change streams retain a bounded in-memory history; the retention size
  can be tuned with `change_stream_history_size` on async/sync clients and on
  direct async database/collection constructors
* local change streams can also persist that retained history to a journal file
  with `change_stream_journal_path`, allowing `resume_after` / `start_after`
  to survive client recreation inside the same local environment
* the journal can also be hardened with `change_stream_journal_fsync=True`
  and bounded by size with `change_stream_journal_max_bytes`
* when journaling is enabled, `mongoeco` keeps an incremental event log and
  compacts it back into a retained snapshot as the local history rolls forward;
  each log entry carries an integrity checksum and truncated tail writes are
  ignored on reload
* clients, databases and direct collections expose `change_stream_state()` so
  local retained history, journal files and compaction progress can be
  inspected at runtime
* clients, databases and direct collections also expose
  `change_stream_backend_info()`, which makes the contract explicit:
  change streams are local, optionally persistent via journal, resumable inside
  that local environment, and not distributed across nodes
## Quick Start

Async with the in-memory engine:

```python
import asyncio

from mongoeco import AsyncMongoClient
from mongoeco.engines.memory import MemoryEngine


async def main() -> None:
    async with AsyncMongoClient(MemoryEngine()) as client:
        collection = client.demo.users
        await collection.insert_one({"_id": "1", "name": "Ada"})
        document = await collection.find_one({"name": "Ada"})
        print(document)


asyncio.run(main())
```

Sync with SQLite:

```python
from mongoeco import MongoClient
from mongoeco.engines.sqlite import SQLiteEngine


with MongoClient(SQLiteEngine("mongoeco.db")) as client:
    collection = client.demo.users
    collection.insert_one({"_id": "1", "name": "Ada"})
    print(collection.find_one({"_id": "1"}))
```

## Examples

Executable examples live under [examples/README.md](examples/README.md):

* [memory_quickstart.py](examples/memory_quickstart.py)
* [sqlite_embedded_app.py](examples/sqlite_embedded_app.py)
* [test_runtime_local.py](examples/test_runtime_local.py)
* [search_and_vector_local.py](examples/search_and_vector_local.py)
* [vector_search_diagnostics.py](examples/vector_search_diagnostics.py)
* [cxp_adapter.py](examples/cxp_adapter.py)

The local `$search` subset now includes:

* `text`
* `phrase` with optional `slop`
* `autocomplete`
* `wildcard`
* `regex`
* `exists`
* `in`
* `equals`
* `range`
* `near`
* `compound`

Examples worth showing first:

* [test_runtime_local.py](examples/test_runtime_local.py)
  demonstrates `MemoryEngine` and `SQLiteEngine` as local contract runtimes
  with the same `$search.phrase` behavior.
* [search_and_vector_local.py](examples/search_and_vector_local.py)
  demonstrates exact `phrase` versus `phrase.slop`, plus a local `compound`
  query with title/body `phrase` + `in` + `range` + `exists` + `regex`.
* [vector_search_diagnostics.py](examples/vector_search_diagnostics.py)
  demonstrates how to read `similarity`, `numCandidates`, `minScore`,
  projected `vectorSearchScore`, residual filtering and exact fallback in local
  `$vectorSearch`.
* [cxp_adapter.py](examples/cxp_adapter.py)
  demonstrates the canonical CXP `database/mongodb` catalog and the `cxp`
  projection exposed by `aggregate(...).explain()`.

## Compatibility

`mongoeco` now exposes three public contract layers:

* the canonical CXP capability model for `database/mongodb`
* MongoDB server semantics through `mongodb_dialect`
* PyMongo surface compatibility through `pymongo_profile`

Planning mode is a separate, fourth concern:

* `STRICT` fails fast when a query, update or aggregation shape is not
  executable under the current runtime
* `RELAXED` preserves the request metadata and reports `planning_issues`
  instead of compiling an executable plan for unsupported shapes
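
The difference between the two modes can be sketched with a toy planner. All of
the names below (`PlanningMode`, `Plan`, `plan_pipeline`, `SUPPORTED_STAGES`)
are hypothetical and exist only for this illustration; only the `STRICT` /
`RELAXED` labels and the `planning_issues` field come from the description
above, and `NotImplementedError` stands in for the library's real error type.

```python
from dataclasses import dataclass, field
from enum import Enum


class PlanningMode(Enum):
    STRICT = "strict"
    RELAXED = "relaxed"


@dataclass
class Plan:
    request: dict
    executable: bool
    planning_issues: list[str] = field(default_factory=list)


# Hypothetical supported subset, chosen only to make the demo concrete.
SUPPORTED_STAGES = {"$match", "$sort", "$limit"}


def plan_pipeline(pipeline: list[dict], mode: PlanningMode) -> Plan:
    """Toy planner: STRICT raises on unsupported stages, RELAXED records them."""
    issues = [
        f"unsupported stage: {name}"
        for stage in pipeline
        for name in stage
        if name not in SUPPORTED_STAGES
    ]
    if issues and mode is PlanningMode.STRICT:
        raise NotImplementedError("; ".join(issues))
    # RELAXED keeps the request metadata and reports issues instead of a plan.
    return Plan(request={"pipeline": pipeline}, executable=not issues, planning_issues=issues)


pipeline = [{"$match": {"score": {"$gte": 8}}}, {"$fancy": {}}]
relaxed = plan_pipeline(pipeline, PlanningMode.RELAXED)
print(relaxed.executable, relaxed.planning_issues)
```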

See:

* [COMPATIBILITY.md](COMPATIBILITY.md)
* [DIALECTS.md](DIALECTS.md)
* [docs/architecture/index.md](docs/architecture/index.md)
* [docs/comparisons.md](docs/comparisons.md)

## Testing

The repository currently uses the standard library test runner:

```bash
python -m pip install -e .[dev,wire]
python -m unittest discover -s tests -p 'test*.py'
```

Contract-testing rule for new features:

* every new public feature should land with async/sync parity coverage when
  both surfaces expose it
* engine-visible behavior should also add cross-engine parity coverage for
  `MemoryEngine` and `SQLiteEngine` whenever the contract is meant to be shared
* regressions caused by facade reconstruction (`with_options()`, `database`,
  `get_collection()`, `rename()`) should be fixed with explicit tests for the
  inherited runtime options involved, not only with the implementation change
* feature work that changes public errors or degraded planning behavior should
  pin the relevant user-facing message or error shape in tests
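
A cross-engine parity test in the spirit of these rules might look like the
sketch below. It reuses only the calls shown in the Quick Start
(`MongoClient`, `MemoryEngine`, `SQLiteEngine`, `insert_one`, `find_one`); the
test class name, the `_roundtrip` helper and the temporary database file name
are made up for this example, and the test skips itself when `mongoeco` is not
installed.

```python
import importlib.util
import tempfile
import unittest
from pathlib import Path

MONGOECO_INSTALLED = importlib.util.find_spec("mongoeco") is not None


@unittest.skipUnless(MONGOECO_INSTALLED, "mongoeco is not installed")
class CrossEngineParityTest(unittest.TestCase):
    """Hypothetical parity check: both engines return the same stored document."""

    def _roundtrip(self, engine):
        from mongoeco import MongoClient

        with MongoClient(engine) as client:
            collection = client.demo.users
            collection.insert_one({"_id": "1", "name": "Ada"})
            return collection.find_one({"_id": "1"})

    def test_find_one_matches_across_engines(self):
        from mongoeco.engines.memory import MemoryEngine
        from mongoeco.engines.sqlite import SQLiteEngine

        with tempfile.TemporaryDirectory() as tmp:
            memory_doc = self._roundtrip(MemoryEngine())
            sqlite_doc = self._roundtrip(SQLiteEngine(str(Path(tmp) / "parity.db")))
        self.assertEqual(memory_doc, sqlite_doc)
```

Saved as a `test*.py` file under `tests/`, a case like this is picked up by the
`unittest discover` command shown above.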

Architecture reference:

* [docs/architecture/index.md](docs/architecture/index.md)

## Benchmarks

A benchmark harness, documented in
[benchmarks/README.md](benchmarks/README.md), supports reproducible local
profiling, regression tracking and community-facing performance analysis.

Quick smoke run:

```bash
python -m benchmarks.run \
  --engine all \
  --size 1000 \
  --warmup 0 \
  --repetitions 1
```

The harness currently covers:

* reads and point lookups;
* sort/limit and cursor materialization;
* mostly-streamable vs materializing aggregation;
* targeted local `search` and `vectorSearch` diagnostics.

Current rule of thumb from local diagnostics:

* `MemoryEngine` remains strongest on many Python-baseline filter paths;
* `SQLiteEngine` is strongest when it can push work to SQL, FTS5 or `usearch`;
* `wildcard`, `exists`, `in`, `equals`, `range` and some `compound` search shapes in
  SQLite now use a mix of materialized candidate prefilters and exact Python
  matching, depending on the operator/backend path;
* `vectorSearch` on SQLite is already materially faster than the exact
  baseline when the ANN backend is materialized;
* the public vector diagnostics also expose `similarity`, effective
  `numCandidates`, candidate evaluation counts and exact fallback reasons in
  benchmark metadata, so benchmark discussions can stay concrete instead of
  anecdotal.

For anything you plan to cite publicly, use the reproducible commands in
[benchmarks/README.md](benchmarks/README.md) instead of copying ad hoc local
numbers into docs.

## Project Status

The repository is in active development and the public package surface is still
best treated as pre-release.

Release-readiness checklist:

* [docs/release-checklist.md](docs/release-checklist.md)
* [docs/roadmap-3.2.md](docs/roadmap-3.2.md)
* [docs/release-3.2.0-draft.md](docs/release-3.2.0-draft.md)

## License

This project is licensed under the Apache License 2.0. See
[LICENSE](LICENSE).
