Metadata-Version: 2.4
Name: bag-nl
Version: 0.3.0
Summary: BAG tools
Author-email: Hans Gremmen <hans@leptonix.net>
License: MIT
Requires-Python: >=3.11
Requires-Dist: corylus>=0.1.3
Requires-Dist: dpath>=2.1.6
Requires-Dist: sqlalchemy>=2
Requires-Dist: tqdm>=4.66.0
Provides-Extra: test
Requires-Dist: pytest>=8; extra == 'test'
Description-Content-Type: text/markdown

# bag-nl

**bag-nl** is a small toolkit for importing **Dutch BAG (Basisregistratie Adressen en Gebouwen)** extracts into a **SQLite** (or other SQLAlchemy-compatible) database.

It focuses on a fast, one-way ETL process: read BAG ZIP archives, normalize a subset of the data, and write it into a relational schema that is easier to query from Python or SQL.

## Features

- **Command-line interface**: `bag-nl import ...` for one-shot imports.
- **SQLite support out of the box**; any SQLAlchemy-supported database can be used via a DSN.
- Imports the following **BAG object types** into normalized tables:

  - **Nummeraanduidingen** (`bag_num`)
  - **Openbare ruimte** (`bag_opr`)
  - **Panden** (`bag_pnd`)
  - **Verblijfsobjecten** (`bag_vbo`)
  - **Adressen met status** (`bag_sta`)
  - **Ligplaatsen** (`bag_lig`)
  - **Woonplaatsen** (`bag_wpl`)
  - **Woonplaats–gemeente relaties** (`bag_wpl_gem`)
  - **Relaties verblijfsobject–pand** (`bag_vbo_pnd`)
  - **Gebruiksdoelen van verblijfsobjecten** (`bag_vbo_purpose`)
  - A shared **text code table** (`bag_text`) containing status and purpose codes

- Calculates simple **tile codes** from geometry to speed up spatial prefix queries.
- Uses **SQLAlchemy ORM** so you can query the resulting database directly from Python.

## Installation

This package requires **Python 3.12** or later.

Install from source:

```bash
git clone https://git.leptonix.net/bag-nl.git
cd bag-nl
pip install .
```

Or install using `pip` directly from the remote if your environment allows:

```bash
pip install git+https://git.leptonix.net/bag-nl.git
```

The installer pulls in the runtime dependencies:

- `corylus` (XML parsing and time utilities)
- `dpath` (deep dictionary access)
- `SQLAlchemy >= 2`
- `tqdm`

## Getting BAG Data

`bag-nl` expects **BAG extract ZIP archives** in the current working directory when you run the import.

Roughly:

- For most collections, it looks for files matching:

  - `9999*<TYPE>*`

  where `<TYPE>` is the uppercase prefix of the collection name (for example, `VBO`, `PND`, `NUM`, `OPR`, `STA`, `LIG`, `WPL`).

- For the `wpl_gem` collection, it looks for files matching:

  - `GEM-WPL*`

The exact filenames and formats follow the official BAG delivery conventions. If the expected files are not present, the importer will fail with an error.

Obtain the official BAG extracts from the Dutch Kadaster or your data provider, then place the relevant ZIP files in a directory and run `bag-nl` from there.

## Usage

### Command-line Interface

The installed console script is `bag-nl`.

Basic usage:

```bash
bag-nl import /path/to/bag.db
```

This will:

1. Create (or open) the SQLite database at `/path/to/bag.db`.
2. Create all necessary tables if they do not exist.
3. Import all default collections.

### Database Argument

The `db` parameter can be:

- A **SQLite file path**, for example:

  ```bash
  bag-nl import data/bag.sqlite
  ```

- A full **SQLAlchemy DSN** for another database backend, for example:

  ```bash
  bag-nl import "postgresql+psycopg2://user:pass@host/dbname"
  ```

When you pass a plain path without `://`, it is treated as a SQLite file.

### Selecting Collections

By default, the following collections are imported:

- `text`
- `wpl`
- `wpl_gem`
- `opr`
- `num`
- `vbo`
- `vbo_purpose`
- `pnd`
- `vbo_pnd`
- `sta`
- `lig`

You can restrict the import using the `--collections` option:

```bash
# Import only numbers and public spaces
bag-nl import bag.db --collections num opr

# Comma-separated syntax also works
bag-nl import bag.db --collections vbo,num,pnd
```

Both forms can be mixed; internally, the option is normalized to a list of collection names.

### Test Mode

Use `--test` to limit how many rows are imported per collection.

- Without a value: imports a small sample (currently **10 rows** per collection):

  ```bash
  bag-nl import bag.db --test
  ```

- With an integer: imports at most **N rows** per collection:

  ```bash
  bag-nl import bag.db --test 1000
  ```

This is useful for quickly validating that the pipeline works on your environment.

### Verbose Error Output

Add `--verbose` to see more detailed exception information when the import fails:

```bash
bag-nl import bag.db --collections vbo --verbose
```

Without `--verbose`, only the exception message is printed; with `--verbose`, the full Python representation (including type and traceback location) is shown.

## Database Schema Overview

All tables live in a single schema, with simple integer primary keys and some indexes for common queries.

A brief overview:

- `bag_text`  
  Simple mapping between **text codes** (status, usage types, etc.) and integer IDs used throughout the schema.

- `bag_num`  
  BAG **address numbers** (nummeraanduidingen). Fields include:

  - `id` (primary key)
  - `postcode`
  - `number`
  - `extra`
  - `letter`
  - `type`
  - `opr` (linked to `bag_opr.id`)
  - `date` (julian day of validity start)

- `bag_opr`  
  **Public spaces** (openbare ruimten):

  - `id`, `name`, `type`, `status`, `wpl`, `date`

- `bag_pnd`  
  **Buildings** (panden):

  - `id`, `geo`, `year`, `status`, `date`

- `bag_vbo`  
  **Verblijfsobjecten** (occupiable units):

  - `id`, `num` (main address), `geo`, `area`, `tile`, `date`

- `bag_sta` / `bag_lig`  
  Addresses with **status**, for addresses on land and water:

  - `id`, `num`, `status`, `geo`, `tile`, `date`

- `bag_vbo_pnd`  
  Relationship between **verblijfsobjecten** and **panden**:

  - composite key `(vbo, pnd)`

- `bag_vbo_purpose`  
  Relationship between **verblijfsobjecten** and **usage purposes**:

  - composite key `(vbo, purpose)`, where `purpose` references `bag_text.id`.

- `bag_wpl`  
  **Woonplaatsen** (localities):

  - `id`, `name`, `geo`, `tile`, `date`

- `bag_wpl_gem`  
  Relationship between **woonplaatsen** and **gemeenten**:

  - composite key `(wpl, gem)`

Geometry fields (`geo`) are stored as text using an internal, compact representation derived from the BAG XML geometries. Tile fields (`tile`) are short string prefixes representing location buckets, intended to speed up prefix filtering.

## Programmatic Use

You can also import and use the components directly from Python:

```python
from bag_nl.importer import importer
from bag_nl.models import BagNum, BagVbo
from sqlalchemy.orm import Session

# Run an import
importer(db="bag.sqlite", collections=["num", "vbo"])

# Query results
from bag_nl.db import get_session_factory

SessionFactory = get_session_factory("bag.sqlite")
with SessionFactory() as session:
    some_address = session.query(BagNum).filter_by(postcode="1234AB", number=1).first()
    print(some_address)
```

This flows through the same schema and importer as the CLI.

## Limitations and Caveats

- The tool assumes **specific BAG XML structures** and **file naming conventions**. Changes in upstream BAG release formats may require code adjustments.
- There is **no upsert logic**; running the importer twice into the same database can produce integrity errors from conflicting primary keys.
- Geometry is stored as **unnamed, internal textual encodings**; there is no GeoJSON or WKT export built in.
- Some parts of the parsing (for example, status codes and usage purposes) rely on a fixed internal table of strings. Unknown strings will cause failures.

Use this tool as an ETL helper for your own BAG workflows, and be prepared to adapt the importer or models as Kadaster changes their formats.

## License

This project is licensed under the **MIT License**. See the `LICENSE` file for details.

