Metadata-Version: 2.4
Name: pydlist
Version: 2.0.0
Summary: Data list management library
Author-email: Willy Pregliasco <willy.pregliasco@gmail.com>
License: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: ipykernel>=7.2.0
Provides-Extra: pivot
Requires-Dist: duckdb>=1.0.0; extra == "pivot"
Provides-Extra: excel
Requires-Dist: openpyxl>=3.0.0; extra == "excel"
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Provides-Extra: all
Requires-Dist: pydlist[dev,excel,pivot]; extra == "all"

# Dlist

A list of dictionaries with database-like operations. Think of it as a lightweight, in-memory table where each row is a Python dictionary.

**Requires Python ≥ 3.10**

## Contents

- [Quickstart](#quickstart)
  - [Installation](#installation)
  - [Core concepts](#core-concepts)
  - [Exploring data](#exploring-data)
- [Indexing](#indexing)
- [Four core operations](#four-core-operations)
- [Arithmetic](#arithmetic)
- [Sorting](#sorting)
- [Input/Output](#inputoutput)
  - [Reading JSON files](#reading-json-files)
  - [Reading Excel files](#reading-excel-files)
  - [Formatted output](#formatted-output)
- [Quick reference](#quick-reference)
- [Project structure](#project-structure)
- [License](#license)

## Quickstart

```python
from dlist import Dlist

data = [
    {'name': 'alice', 'age': '30', 'city': 'london'},
    {'name': 'bob',   'age': '25', 'city': 'london'},
    {'name': 'charlie', 'age': '28', 'city': 'paris'},
]

d = Dlist(data, id_='name')

d.filter(city='london')          # select rows  → new Dlist (2 items)
d.assign(status='active')        # set fields   → new Dlist (3 items, all with status)
d.partition('city')              # group by key → {'london': Dlist, 'paris': Dlist}
d.tree()                         # explore structure in terminal
```

### Installation

```bash
pip install dlist
```

Or for development:

```bash
pip install dlist[dev]
```

---

### Core concepts

#### Data model

A `Dlist` is a list of dictionaries. Scalar values are stored as **strings**; **list** values are preserved as-is. An optional `id_` field acts as a unique primary key.

```python
d = Dlist([
    {'id': '1', 'product': 'laptop',   'price': '1200'},
    {'id': '2', 'product': 'mouse',    'price': '25'},
    {'id': '3', 'product': 'keyboard', 'price': '80'},
], id_='id')
```

#### Nested dictionaries

Values can be nested dicts. Access nested keys using `__` (double underscore), similar to Django's ORM syntax:

```python
data = [
    {'name': 'john', 'address': {'city': 'NYC', 'zip': '10001'}},
    {'name': 'jane', 'address': {'city': 'LA',  'zip': '90001'}},
]
d = Dlist(data, id_='name')

d.vals('address__city')              # → ['LA', 'NYC']
d.filter(address__city='NYC')        # → Dlist with john
```

Internally, the nested structure is always preserved. The `__` syntax is just for addressing.

#### List values

Fields can contain lists. List values are preserved (not coerced to strings) and get special treatment in `filter`, `vals`, and `partition`:

```python
data = [
    {'id': '1', 'name': 'photo1', 'tags': ['photo', 'landscape']},
    {'id': '2', 'name': 'video1', 'tags': ['video', 'interview']},
    {'id': '3', 'name': 'mixed',  'tags': ['photo', 'video']},
    {'id': '4', 'name': 'doc1',   'tags': 'report'},
]
d = Dlist(data, id_='id')

d.filter(tags='photo')       # → items 1, 3 (membership test)
d.filter(tags='-photo')      # → items 2, 4 (exclude)
d.filter(tags='*port*')      # → item 4 (glob on each element)
d.vals('tags')               # → ['interview', 'landscape', 'photo', 'report', 'video']
d.vals('tags', count=True)   # → {'photo': 3, 'video': 2, ...}
```

When displayed in tables, lists are shown as comma-separated values: `photo, landscape`.

---

### Exploring data

#### `keys()` — available keys

```python
d.keys()                    # all keys (sorted)
d.keys('address')           # subkeys under 'address'
d.keys(All=True)            # only keys present in every record
d.keys(count=True)          # {'name': 3, 'address': 3, ...}
```

#### `vals()` — values for a key

```python
d.vals('name')              # ['jane', 'john']
d.vals('address__city', count=True)   # {'NYC': 1, 'LA': 1}
```

#### `tree()` — visual overview

```python
d.tree()
```

```
Dlist (2 items, id='name')
├── name    2/2 ['john' (1), 'jane' (1)]
└── address
    ├── city    2/2 ['NYC' (1), 'LA' (1)]
    └── zip     2/2 ['10001' (1), '90001' (1)]
```

Each leaf shows: `key  unique_values/records_with_key [top values (count), ...]`

Use `d.tree(root='address')` to zoom into a subtree, or `d.tree(top=5)` to show more values.

---

## Indexing

Access records by integer position, by `id`, or by list:

```python
d[0]                # → dict (first record)
d['john']           # → dict (by id)
d[[0, 1]]           # → Dlist (subset)
d[['john', 'jane']] # → Dlist (subset by ids)
```

### Assignment

Assignment **merges** the new dict into the existing record:

```python
d['john'] = {'phone': '555-1234'}
# john now has name, address AND phone

d[['john', 'jane']] = {'verified': 'True'}
# both records get the new key
```

### Setting or generating an id

Promote an existing field to id, or generate sequential ids:

```python
d2 = d.set_id('name')                        # existing field → id
d2 = d.sort('name').set_id('id', generate='R')  # sort first, then generate R01, R02, …
d2 = d.set_id('id', generate='E', digits=5)  # E00001, E00002, …
```

Auto digits uses one more digit than needed (5 records → 2 digits, 150 → 4). Raises `ValueError` if the key already exists in records.

---

## Four core operations

All operations are **non-mutating** — they return a new `Dlist`, leaving the original untouched.

### `filter(**query)` — select rows

Returns a subset of records matching the query.

```python
d.filter(city='london')              # exact match
d.filter(name='a*')                  # glob pattern
d.filter(city='-paris')              # exclude value
d.filter(phone=True)                 # key must exist
d.filter(phone=False)                # key must NOT exist
d.filter(city='london', age='30')    # AND (all must match)
d.filter(tags='photo')               # list membership (if tags is a list)
```

Get the complement (matched + rest):

```python
matched, rest = d.filter(complement=True, city='london')
```

### `map(f, inputL, outputL)` — transform fields

Applies a function to each record, reading from `inputL` keys and writing to `outputL` keys.

```python
# Single output — return a bare value
d2 = d.map(lambda name: name.upper(), ['name'], ['upper_name'])

# Multiple outputs — return a tuple or list
d2 = d.map(lambda name, age: (name.upper(), int(age) + 1),
           ['name', 'age'], ['upper', 'next_age'])
```

Apply only to matching records (others are kept unchanged):

```python
d2 = d.map(lambda price: str(float(price) * 0.9),
           ['price'], ['price'],
           query={'product': 'laptop'})
```

If `f` returns `None`, the record is kept unchanged. If an input key is missing, the record is skipped.

### `assign(query={}, **kwargs)` — set constant values

Shorthand for `map` when you just want to set fixed values:

```python
d2 = d.assign(status='active')                          # all records
d2 = d.assign(query={'city': 'london'}, status='uk')    # only matching
```

### `partition(key)` — group by key

Splits into a dict of Dlists, one per unique value:

```python
groups = d.partition('city')
# {'london': Dlist(2 items), 'paris': Dlist(1 items)}
```

Records where the key is missing go to `None`.

---

## Arithmetic

```python
dl1 + dl2    # merge: combine records, dl1 values win on conflict
dl1 - dl2    # subtract: remove dl2 records from dl1
dl1 == dl2   # equality: same records and same id
```

With `id`, `+` merges dicts with matching ids and appends new ones. Without `id`, it compares by full dict equality.

---

## Sorting

`sort()` is the only in-place operation. It returns `self` for chaining:

```python
d.sort()                          # sort by id field
d.sort('name')                    # sort by any key
d.sort('price', reverse=True)     # descending
d.sort('address__city')           # nested key
d.sort(['type', 'name'])          # multi-key: primary by type, secondary by name
```

---

## Input/Output

### Reading JSON files

```python
d = Dlist.read('data.json', id_='id')          # from file
d = Dlist.read('[{"id": "1", "name": "a"}]')   # from JSON string
d = Dlist.read([{'id': '1', 'name': 'a'}])     # from list of dicts
```

Join multiple JSON files by a shared key (requires `duckdb`):

```python
d = Dlist.read_pivot('evidence/*.json')                     # glob pattern
d = Dlist.read_pivot({'files.json': 'file',
                      'types.json': 'type'}, pivot='id')    # explicit mapping
```

---

### Reading Excel files

Parse `.xlsx` workbooks with automatic header detection and subcategory recognition:

```python
d = Dlist.read('report.xlsx')                          # format auto-detected from .xlsx
d = Dlist.read('report.xlsx', header_row=3)            # explicit header row
d = Dlist.read('report.xlsx', sheets=[1, 2])           # specific sheets
d = Dlist.read('report.xlsx', header_row={1: 3, 2: 1}) # per-sheet header
```

Each record gets a `row` field (original row number). Multi-sheet workbooks add a `tab` field. Subcategory rows (column A empty, column B has a label) inject a `category` field into subsequent records.

```python
d = Dlist.read('report.xlsx', header_row=3, sheets=[1], id_='ID')
d.filter(category='Forensic*')    # filter by subcategory
d.partition('category')           # group by subcategory
```

The Excel formatter writes subcategory labels to column B, so `write` → `read` round-trips preserve categories.

See [docs/parsers.md](docs/parsers.md) for full details.

---

### Formatted output

Write tables in multiple formats — ASCII, Markdown, LaTeX, Excel, and JSON:

```python
print(d.write(format='ascii', keys=['id', 'name']))
print(d.write(format='md', keys=['id', 'name']))
print(d.write(format='latex', keys=['id', 'name']))
d.write('output.xlsx', keys=['id', 'name'])   # format auto-detected from .xlsx
d.write('data.json')                           # format auto-detected from .json
```

Categorized output groups records by one or more keys:

```python
print(d.write(format='ascii', ctree=['type'], keys=['id', 'name'],
              titles={'type': 'Type: {}', 'id': 'ID', 'name': 'Name'}))
```

All table formatters default to `width='auto'` (columns sized to content). Override with `width=10` for fixed-width or `maxwidth=20` to cap auto-width.

See [docs/formatters.md](docs/formatters.md) for full details on each format, styling options, multi-column layout, pagination, and Excel subcategory output.

---

## Quick reference

| Method | Description |
|--------|-------------|
| **Creation** | |
| `Dlist(data, id_='key')` | Create from list of dicts |
| `Dlist.read('file.json')` | Read from JSON file, string, or list |
| `Dlist.read('file.xlsx')` | Read from Excel (format auto-detected) |
| `Dlist.read_pivot('*.json')` | Join multiple JSON files by shared key |
| **Querying** | |
| `d.filter(**query)` | Select rows matching query → new Dlist |
| `d.filter(complement=True, ...)` | Select + return non-matching rows |
| `d.vals('key')` | Unique values for a key |
| `d.vals('key', count=True)` | Value frequencies as dict |
| `d.keys()` | All available keys |
| `d.tree()` | Visual structure overview |
| **Indexing** | |
| `d[0]`, `d['id_val']` | Get record by position or id |
| `d[[0, 1]]`, `d[['a', 'b']]` | Get subset → new Dlist |
| `d['id_val'] = {...}` | Merge dict into record |
| **Transforming** | |
| `d.map(f, inputs, outputs)` | Apply function to each record → new Dlist |
| `d.assign(**kwargs)` | Set constant values → new Dlist |
| `d.assign(query={...}, **kw)` | Set values on matching records only |
| `d.partition('key')` | Group by key → dict of Dlists |
| `d.set_id('key')` | Promote existing field to id |
| `d.set_id('id', generate='R')` | Generate sequential ids (R01, R02, …) |
| `d.sort('key')` | Sort in place (returns self) |
| `d.sort(['k1', 'k2'])` | Multi-key sort (primary, secondary, …) |
| **Arithmetic** | |
| `d1 + d2` | Merge two Dlists |
| `d1 - d2` | Subtract records |
| `d1 == d2` | Equality check |
| **Output** | |
| `d.write()` | JSON string |
| `d.write('file.json')` | Save to JSON file |
| `d.write('file.xlsx', keys=[...])` | Save to Excel (auto-detected) |
| `d.write(format='ascii', keys=[...])` | ASCII table string |
| `d.write(format='md', keys=[...])` | Markdown table string |
| `d.write(format='latex', keys=[...])` | LaTeX table string |
| `d.write(..., ctree=['key'])` | Categorized output |

---

## Project structure

```
dlist/
├── src/dlist/
│   ├── __init__.py      # Package entry point
│   ├── dlist.py         # Dlist class (core operations)
│   ├── helpers.py       # flattenDict, structDict, merge_dicts, getDitem
│   ├── formatters/      # ASCII, Markdown, LaTeX, Excel, JSON output
│   └── parsers/         # JSON, Excel input, pivot joins
├── tests/
│   ├── test_core.py         # core operations + list fields
│   ├── test_formatters.py   # ASCII, Markdown, LaTeX, Excel
│   └── test_excel_parser.py # Excel parser
├── docs/
│   ├── formatters.md    # formatter reference
│   └── parsers.md       # parser reference (Excel details)
├── example_input.xlsx   # sample Excel file for parser demo
└── pyproject.toml
```

## License

MIT
