Metadata-Version: 2.4
Name: nestfind
Version: 0.1.1
Summary: Powerful deep search for nested dict/list structures
Project-URL: Homepage, https://github.com/romysaputrasihanandaa/nestfind
Project-URL: Issues, https://github.com/romysaputrasihanandaa/nestfind/issues
License: MIT
Keywords: data,deep-search,dict,json,nested,search
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# nestfind

Powerful deep search for nested dict/list structures in Python.

Traverses arbitrarily nested `dict`/`list` data using a flexible path-based syntax,
supporting fallback paths, multiple sources, wildcard matching, predicate filtering, and more.

## Installation

```bash
pip install nestfind
```

## Quick Start

```python
from nestfind import deep_search

data = {
    "user": {
        "profile": {
            "name": "Alice",
            "email": "alice@example.com"
        }
    }
}

deep_search(data, "user", "profile", "name")   # → "Alice"
deep_search(data, "email")                      # → "alice@example.com"  (wide search)
```

## Path Segment Types

| Segment | Description | Example |
|---------|-------------|---------|
| `str` | Wide search key — finds key anywhere in nested structure | `"name"` |
| `str + "!"` | Condition key — returns the **parent dict** containing this key | `"uri!"` |
| `str + "?"` | Optional key — exact match only, skips wide search if not found | `"nickname?"` |
| `"*"` | Wildcard — matches ALL keys/items at this level | `"*"` |
| `int` | List index — exact positional access, supports negative | `0`, `-1` |
| `callable` | Predicate filter — include item only if callable returns truthy | `lambda u: u.get("active")` |

## Modes

### Single path
```python
deep_search(data, "a", "b", "c")
```

### Fallback mode — tries paths in order, returns first non-empty result
```python
deep_search(data, ["uri"], ["browser_native_hd_url"])
```

### Multi-source mode — each list is `[source, *keys]`
```python
deep_search([source1, "key1"], [source2, "key2", "key3"])
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `return_first` | `bool` | `True` | If `True`, return the first match; if `False`, return a list of all matches |
| `default` | `Any` | `None` | Value to return if nothing is found |
| `type_filter` | `type` or `tuple` of types | `None` | Only return results that are instances of this type |
| `value_filter` | `callable` | `None` | Only return results where `value_filter(v)` is truthy |
| `transform` | `callable` | `None` | Apply function to each result before returning |
| `max_depth` | `int` | `None` | Maximum nesting depth for wide search |
| `exclude_keys` | `list[str]` | `None` | Skip these keys during wide search |
| `strict` | `bool` | `False` | Disable wide search — exact path traversal only |
| `with_path` | `bool` | `False` | Return `(value, path)` tuples instead of bare values |
| `debug` | `bool` | `False` | Enable debug logging |

## Examples

```python
from nestfind import deep_search, DeepSearch

data = {
    "users": [
        {"id": 1, "name": "Alice", "active": True},
        {"id": 2, "name": "Bob",   "active": False},
    ]
}

# Get all names using wildcard
deep_search(data, "users", "*", "name", return_first=False)
# → ["Alice", "Bob"]

# Filter with predicate
deep_search(data, "users", lambda u: u.get("active"), "name")
# → "Alice"

# Return with path
deep_search(data, "name", with_path=True)
# → ("Alice", ["users", 0, "name"])

# Type filter
deep_search(data, "id", type_filter=int)
# → 1

# Class wrapper — bind config once, reuse
ds = DeepSearch(exclude_keys=["metadata"], max_depth=5)
ds(data, "users", "*", "name", return_first=False)
# → ["Alice", "Bob"]
```

### `DeepSearch` class

Bind configuration once and reuse across calls:

```python
from nestfind import DeepSearch

class FacebookMapper:
    deep_search = DeepSearch(exclude_keys=["metadata"])

    def map(self, raw):
        return self.deep_search(raw, "user", "name")
```

---

## Advanced Examples

### Parsing inconsistent API responses

Real-world APIs often return the same data under different keys depending on the endpoint or version.
Use **fallback mode** to handle all variants transparently:

```python
# Instagram-style response — video URL can live under many keys
media = {
    "video_versions": [
        {"type": 101, "url": "https://cdn.example.com/video_hd.mp4"},
        {"type": 102, "url": "https://cdn.example.com/video_sd.mp4"},
    ]
}

url = deep_search(
    media,
    ["video_versions", 0, "url"],       # preferred: first video version
    ["video_dash_manifest"],             # fallback 1
    ["browser_native_hd_url"],           # fallback 2
    ["browser_native_sd_url"],           # fallback 3
)
# → "https://cdn.example.com/video_hd.mp4"
```
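
The fallback rule described above (try each path in order, keep the first non-empty result) can be sketched without the library. `walk_exact` and `first_non_empty` are hypothetical helpers, and treating `None`, `""`, `[]` and `{}` as "empty" is an assumption made for this sketch, not a statement about how nestfind defines it.

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def walk_exact(node: Any, path: Sequence[Key]) -> Any:
    """Follow `path` one step at a time; return None as soon as a step is missing."""
    for key in path:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif (isinstance(node, list) and isinstance(key, int)
              and -len(node) <= key < len(node)):
            node = node[key]
        else:
            return None
    return node


def first_non_empty(node: Any, *paths: Sequence[Key]) -> Any:
    """Return the value of the first path that resolves to something non-empty."""
    for path in paths:
        value = walk_exact(node, path)
        if value not in (None, "", [], {}):  # assumed definition of "empty"
            return value
    return None


media = {"video_versions": [{"type": 101, "url": "https://cdn.example.com/video_hd.mp4"}]}
legacy = {"browser_native_sd_url": "https://cdn.example.com/video_sd.mp4"}
paths = (["video_versions", 0, "url"], ["browser_native_hd_url"], ["browser_native_sd_url"])

print(first_non_empty(media, *paths))   # https://cdn.example.com/video_hd.mp4
print(first_non_empty(legacy, *paths))  # https://cdn.example.com/video_sd.mp4
```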

### Multi-source with priority

When you have multiple raw payloads and want the first one that has a given value:

```python
post    = {"media": {"image_versions": {"candidates": [{"url": "https://img.example.com/post.jpg"}]}}}
story   = {}   # empty / missing
reel    = {"image_versions": {"candidates": [{"url": "https://img.example.com/reel.jpg"}]}}

thumbnail = deep_search(
    [story,  "image_versions", "candidates", 0, "url"],
    [post,   "media", "image_versions", "candidates", 0, "url"],
    [reel,   "image_versions", "candidates", 0, "url"],
)
# → "https://img.example.com/post.jpg"  (story was empty, post matched first)
```
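
As a rough mental model of the `[source, *keys]` convention, each spec can be unpacked into a source and a key path, and the first source that yields a value wins. The helpers `get_path` and `first_from_sources` below are hypothetical, and the sketch uses exact traversal only (no wide search).

```python
from functools import reduce
from operator import getitem


def get_path(source, keys, default=None):
    """Index into `source` key by key; any missing step yields `default`."""
    try:
        return reduce(getitem, keys, source)
    except (KeyError, IndexError, TypeError):
        return default


def first_from_sources(*specs, default=None):
    """Each spec is [source, *keys]; return the first value that is not None."""
    for source, *keys in specs:
        value = get_path(source, keys)
        if value is not None:
            return value
    return default


post = {"media": {"image_versions": {"candidates": [{"url": "https://img.example.com/post.jpg"}]}}}
story = {}
reel = {"image_versions": {"candidates": [{"url": "https://img.example.com/reel.jpg"}]}}

thumbnail = first_from_sources(
    [story, "image_versions", "candidates", 0, "url"],
    [post, "media", "image_versions", "candidates", 0, "url"],
    [reel, "image_versions", "candidates", 0, "url"],
)
print(thumbnail)  # https://img.example.com/post.jpg
```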

### Predicate filtering across list items

Collect the video URL of every **video** item in a feed that has more than 1M views:

```python
feed = {
    "items": [
        {"media_type": 2, "view_count": 1_500_000, "video_url": "https://cdn.example.com/a.mp4"},
        {"media_type": 1, "view_count": 3_000_000, "image_url": "https://cdn.example.com/b.jpg"},
        {"media_type": 2, "view_count": 800_000,   "video_url": "https://cdn.example.com/c.mp4"},
        {"media_type": 2, "view_count": 2_200_000, "video_url": "https://cdn.example.com/d.mp4"},
    ]
}

viral_videos = deep_search(
    feed,
    "items",
    lambda item: item.get("media_type") == 2 and item.get("view_count", 0) > 1_000_000,
    "video_url",
    return_first=False,
)
# → ["https://cdn.example.com/a.mp4", "https://cdn.example.com/d.mp4"]
```
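
For comparison, the same selection written as a plain list comprehension makes the role of the predicate explicit: it is called once per list item, and only items for which it returns a truthy value contribute a result. `is_viral_video` is a hypothetical name for the lambda used above.

```python
feed = {
    "items": [
        {"media_type": 2, "view_count": 1_500_000, "video_url": "https://cdn.example.com/a.mp4"},
        {"media_type": 1, "view_count": 3_000_000, "image_url": "https://cdn.example.com/b.jpg"},
        {"media_type": 2, "view_count": 800_000,   "video_url": "https://cdn.example.com/c.mp4"},
        {"media_type": 2, "view_count": 2_200_000, "video_url": "https://cdn.example.com/d.mp4"},
    ]
}


def is_viral_video(item: dict) -> bool:
    """Keep videos (media_type 2) with more than one million views."""
    return item.get("media_type") == 2 and item.get("view_count", 0) > 1_000_000


viral_videos = [
    item["video_url"]
    for item in feed["items"]
    if is_viral_video(item) and "video_url" in item
]
print(viral_videos)
# ['https://cdn.example.com/a.mp4', 'https://cdn.example.com/d.mp4']
```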

### Condition key `"!"` — grab the parent dict

Useful when you need the whole object that *contains* a specific key, not just the value at that key:

```python
story = {
    "reel": {
        "items": [
            {
                "id": "abc123",
                "media": {
                    "uri": "https://cdn.example.com/story.mp4",
                    "width": 1080,
                    "height": 1920,
                }
            }
        ]
    }
}

# Get the entire media dict that contains "uri", not just the uri value
media_obj = deep_search(story, "media", "uri!")
# → {"uri": "https://cdn.example.com/story.mp4", "width": 1080, "height": 1920}

# Now you can access sibling keys directly
print(media_obj["width"], media_obj["height"])   # 1080 1920
```
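
Conceptually, a condition key turns "find the value" into "find the dict that owns the key". A minimal sketch of that idea, assuming a depth-first search and using the hypothetical helper `find_parent` (not part of nestfind):

```python
from typing import Any, Optional


def find_parent(node: Any, key: str) -> Optional[dict]:
    """Return the first dict (depth-first) that has `key` among its own keys."""
    if isinstance(node, dict):
        if key in node:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_parent(child, key)
        if found is not None:
            return found
    return None


story = {
    "reel": {
        "items": [
            {"id": "abc123",
             "media": {"uri": "https://cdn.example.com/story.mp4", "width": 1080, "height": 1920}}
        ]
    }
}

media_obj = find_parent(story, "uri")
print(media_obj["width"], media_obj["height"])  # 1080 1920
```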

### Optional key `"?"` — graceful missing fields

Skip a segment silently when it may or may not exist, without falling back to wide search:

```python
user_a = {"profile": {"display_name": "Alice",  "nickname": "ali"}}
user_b = {"profile": {"display_name": "Bob"}}   # no nickname

# "nickname?" won't error or wide-search if missing — just moves on
for user in [user_a, user_b]:
    label = deep_search(
        user,
        "profile", "nickname?",     # use nickname if present …
        default=deep_search(user, "profile", "display_name"),  # … else display_name
    )
    print(label)
# → "ali"
# → "Bob"
```
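
Note that in the loop above the `default=` argument is an ordinary Python expression, so the inner `deep_search` call runs on every iteration, whether or not `nickname` exists. For reference, the intended rule ("nickname if present, else display name") looks like this in plain Python; `pick_label` is a hypothetical helper and uses exact keys only.

```python
from typing import Optional


def pick_label(user: dict) -> Optional[str]:
    """Prefer the nickname; fall back to display_name when it is missing or empty."""
    profile = user.get("profile", {})
    return profile.get("nickname") or profile.get("display_name")


user_a = {"profile": {"display_name": "Alice", "nickname": "ali"}}
user_b = {"profile": {"display_name": "Bob"}}  # no nickname

labels = [pick_label(user) for user in (user_a, user_b)]
print(labels)  # ['ali', 'Bob']
```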

### `with_path` — audit where a value came from

When debugging deeply nested structures, knowing *where* a value was found is as important as the value itself:

```python
config = {
    "services": {
        "auth": {
            "database": {
                "host": "db-auth.internal",
                "port": 5432,
            }
        },
        "api": {
            "database": {
                "host": "db-api.internal",
                "port": 5432,
            }
        }
    }
}

results = deep_search(config, "host", return_first=False, with_path=True)
# → [
#     ("db-auth.internal", ["services", "auth", "database", "host"]),
#     ("db-api.internal",  ["services", "api",  "database", "host"]),
# ]

for value, path in results:
    print(" → ".join(str(p) for p in path), "=", value)
# services → auth → database → host = db-auth.internal
# services → api  → database → host = db-api.internal
```
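
One way to picture how a path can be tracked is a recursive generator that carries the keys and indices visited so far and yields them alongside each match. This is an illustrative sketch, not nestfind's implementation; `iter_matches` is a hypothetical name, and whether the library keeps descending below a match is not specified here.

```python
from typing import Any, Iterator, List, Tuple, Union

PathStep = Union[str, int]


def iter_matches(node: Any, key: str,
                 path: Tuple[PathStep, ...] = ()) -> Iterator[Tuple[Any, List[PathStep]]]:
    """Yield (value, path) for every occurrence of `key`, in depth-first order."""
    if isinstance(node, dict):
        for k, v in node.items():
            here = path + (k,)
            if k == key:
                yield v, list(here)
            yield from iter_matches(v, key, here)  # keep looking below this entry
    elif isinstance(node, list):
        for i, v in enumerate(node):
            yield from iter_matches(v, key, path + (i,))


config = {
    "services": {
        "auth": {"database": {"host": "db-auth.internal", "port": 5432}},
        "api":  {"database": {"host": "db-api.internal",  "port": 5432}},
    }
}

results = list(iter_matches(config, "host"))
for value, path in results:
    print(" > ".join(str(p) for p in path), "=", value)
# services > auth > database > host = db-auth.internal
# services > api > database > host = db-api.internal
```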

### `transform` + `value_filter` — extract and reshape in one pass

```python
raw = {
    "stats": {
        "impressions": "12400",   # string from API
        "clicks":      "837",
        "spend":       "42.50",
    }
}

# Pull all numeric-looking strings and cast to float in one call
values = deep_search(
    raw,
    "stats",
    "*",
    return_first=False,
    value_filter=lambda v: isinstance(v, str) and v.replace(".", "").isdigit(),
    transform=float,
)
# → [12400.0, 837.0, 42.5]
```
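
One caveat with the `isdigit()` check above: it accepts strings such as `"1.2.3"`, which `float()` rejects with a `ValueError`, and it rejects negative numbers such as `"-5"`. A stricter predicate could look like the sketch below; `looks_numeric` is a hypothetical helper, shown here on a plain dict rather than through the library.

```python
import re

# Optional sign, digits, optional fractional part; nothing else allowed.
_DECIMAL = re.compile(r"-?\d+(?:\.\d+)?")


def looks_numeric(value) -> bool:
    """True only for strings that are a plain signed decimal number."""
    return isinstance(value, str) and _DECIMAL.fullmatch(value) is not None


stats = {
    "impressions": "12400",
    "clicks": "837",
    "spend": "42.50",
    "version": "1.2.3",  # passes the isdigit() check, but float() would raise
    "delta": "-5",       # fails the isdigit() check, but is a valid number
}

values = [float(v) for v in stats.values() if looks_numeric(v)]
print(values)  # [12400.0, 837.0, 42.5, -5.0]
```

Since `value_filter` accepts any callable, a function like this could presumably be passed in place of the lambda.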

### `exclude_keys` + `max_depth` — scoped search in large payloads

Prevent the wide search from wandering into noisy or irrelevant subtrees:

```python
response = {
    "data": {
        "user": {"id": 1, "name": "Alice"},
    },
    "metadata": {
        "user": {"id": 999, "name": "__system__"},   # should be ignored
    },
    "debug": {
        "trace": {"user": {"id": -1}}                # deep noise, also ignored
    }
}

name = deep_search(
    response,
    "user", "name",
    exclude_keys=["metadata", "debug"],
    max_depth=3,
)
# → "Alice"  (metadata and debug subtrees are skipped entirely)
```
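
To make the effect of the two options concrete, here is a rough sketch of a scoped search in plain Python. `scoped_find` is a hypothetical function, and the depth convention (root at depth 0, each nested container one level deeper) is an assumption of this sketch; nestfind may count depth differently.

```python
from typing import Any, Collection, Optional


def scoped_find(node: Any, key: str, exclude: Collection[str] = (),
                max_depth: Optional[int] = None, depth: int = 0) -> Any:
    """Depth-first search that skips excluded keys and stops below `max_depth`."""
    if max_depth is not None and depth > max_depth:
        return None
    if isinstance(node, dict):
        if key in node and key not in exclude:
            return node[key]
        # Excluded keys are pruned together with everything beneath them.
        children = [v for k, v in node.items() if k not in exclude]
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = scoped_find(child, key, exclude, max_depth, depth + 1)
        if found is not None:
            return found
    return None


response = {
    "metadata": {"user": {"id": 999, "name": "__system__"}},
    "data": {"user": {"id": 1, "name": "Alice"}},
}

print(scoped_find(response, "user")["name"])                        # __system__
print(scoped_find(response, "user", exclude={"metadata"})["name"])  # Alice
```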

### `strict=True` — exact path, no surprises

When you know the exact structure and want to disable wide search for performance or correctness:

```python
data = {
    "a": {
        "b": {
            "c": 42,
            "extra": {"c": 999}   # would be found by wide search
        }
    }
}

deep_search(data, "a", "b", "c")                    # → 42  (the exact path matches, so no wide search is needed)
deep_search(data, "a", "b", "c", strict=True)       # → 42  (exact path only)
deep_search(data, "b", "c",     strict=True)        # → None (strict: won't descend into "a" automatically)
```

### Reusable mapper class with `DeepSearch`

Bind a shared configuration at the class level and override per-call as needed:

```python
from nestfind import DeepSearch

class InstagramMediaMapper:
    ds = DeepSearch(exclude_keys=["debug", "logging"], max_depth=8)

    def map(self, raw: dict) -> dict:
        return {
            "id":        self.ds(raw, "pk"),
            "shortcode": self.ds(raw, "code"),
            "type":      self.ds(raw, "media_type", type_filter=int),
            "url":       self.ds(
                             raw,
                             ["video_versions", 0, "url"],
                             ["image_versions", "candidates", 0, "url"],
                         ),
            "width":     self.ds(raw, "original_width",  type_filter=int),
            "height":    self.ds(raw, "original_height", type_filter=int),
            "owner_id":  self.ds(raw, "owner", "pk"),
            "timestamp": self.ds(raw, "taken_at",        type_filter=int),
        }
```
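
The "bind once, override per call" pattern can be sketched with a small callable wrapper that stores keyword defaults and merges them with whatever each call passes. `BoundCall` and `lookup` are hypothetical stand-ins used only to show the pattern; they say nothing about how `DeepSearch` is implemented internally.

```python
from typing import Any, Callable


class BoundCall:
    """Stores keyword defaults for `func`; per-call keywords take precedence."""

    def __init__(self, func: Callable[..., Any], **defaults: Any) -> None:
        self._func = func
        self._defaults = defaults

    def __call__(self, *args: Any, **overrides: Any) -> Any:
        # Later entries win, so overrides replace the bound defaults.
        return self._func(*args, **{**self._defaults, **overrides})


def lookup(data: dict, key: str, default: Any = None, upper: bool = False) -> Any:
    """Toy search function standing in for a configurable lookup."""
    value = data.get(key, default)
    return value.upper() if upper and isinstance(value, str) else value


find = BoundCall(lookup, default="n/a", upper=True)

print(find({"name": "alice"}, "name"))               # ALICE
print(find({}, "name"))                              # N/A
print(find({"name": "alice"}, "name", upper=False))  # alice
```

Because the wrapper is a plain callable object rather than a function, storing it as a class attribute does not turn it into a bound method, so `self.ds(raw, ...)` receives `raw` as its first argument.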

## License

MIT
