Metadata-Version: 2.3
Name: datanomy
Version: 0.3.1
Summary: Explore the anatomy of your columnar data files (Parquet, Arrow, and more)
Author: Raúl Cumplido
Author-email: Raúl Cumplido <raulcumplido@gmail.com>
Requires-Dist: textual>=0.90.0
Requires-Dist: pyarrow>=24.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: click>=8.1.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pytest>=8.0.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0 ; extra == 'dev'
Requires-Dist: ruff>=0.8.0 ; extra == 'dev'
Requires-Dist: mypy>=1.0.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.2.0 ; extra == 'dev'
Requires-Python: >=3.10
Project-URL: Homepage, https://github.com/raulcd/datanomy
Project-URL: Repository, https://github.com/raulcd/datanomy
Project-URL: Issues, https://github.com/raulcd/datanomy/issues
Provides-Extra: dev
Description-Content-Type: text/markdown

# Datanomy

> Explore the anatomy of your columnar data files

**Datanomy** is a terminal-based tool for inspecting and understanding data files.
It provides an interactive view of your data's structure, metadata, and internal organization.

## Supported formats

- **Parquet** (`.parquet`, `.parq`)
- **Arrow IPC** (`.arrow`, `.feather`, `.ipc`)

## Features for Parquet view

### General Structure

![General Structure](https://github.com/user-attachments/assets/eee4ea85-e5c8-4661-a2e2-0321b26076f1)

### Schema

![Schema](https://github.com/user-attachments/assets/e66087ce-f8b4-439d-b7fe-78da5f5d8a48)

### Data

![Data](https://github.com/user-attachments/assets/cbe278af-0240-4ded-9b0e-704ddb489e71)

### Metadata

![Metadata](https://github.com/user-attachments/assets/a71cf396-8c00-40e2-94de-da38ce4af745)

### Stats

![Stats](https://github.com/user-attachments/assets/f437a6a8-be71-413b-b15f-10b4376df981)

## Features for Arrow IPC view

### Structure

File-level layout showing header, record batches, and footer.
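
As a rough illustration of that layout, the sketch below finds the footer of an Arrow IPC file using only the framing defined by the Arrow IPC file format: a leading `ARROW1` magic padded to 8 bytes, and a trailing little-endian int32 footer size followed by `ARROW1` again. The helper name `read_ipc_framing` and the fake payload are assumptions made for this example. This is not how Datanomy itself reads files, and the placeholder bytes are not valid Arrow messages.

```python
import struct

ARROW_MAGIC = b"ARROW1"


def read_ipc_framing(data: bytes) -> dict:
    """Locate the footer of an Arrow IPC file held in memory (hypothetical helper)."""
    # Smallest possible frame: 8-byte padded magic + 4-byte footer size + 6-byte magic.
    if len(data) < 18:
        raise ValueError("too short to be an Arrow IPC file")
    # The file starts and ends with the same magic string.
    if not data.startswith(ARROW_MAGIC) or not data.endswith(ARROW_MAGIC):
        raise ValueError("missing ARROW1 magic")
    # A little-endian int32 footer size sits right before the trailing magic.
    size_end = len(data) - len(ARROW_MAGIC)
    (footer_size,) = struct.unpack("<i", data[size_end - 4:size_end])
    footer_start = size_end - 4 - footer_size
    if footer_size < 0 or footer_start < 8:
        raise ValueError("invalid footer size")
    return {"body_start": 8, "footer_start": footer_start, "footer_size": footer_size}


# Fake file for illustration only: placeholder bytes stand in for the
# schema/record batch messages ("B") and the footer ("F").
fake = (
    ARROW_MAGIC + b"\x00\x00"   # header: magic padded to 8 bytes
    + b"B" * 24                 # schema and record batches
    + b"F" * 16                 # footer
    + struct.pack("<i", 16)     # footer size
    + ARROW_MAGIC               # trailing magic
)
print(read_ipc_framing(fake))
```

With a real file, the same function could be applied to the bytes returned by `pathlib.Path("data.arrow").read_bytes()`; the footer itself is a Flatbuffers message that this sketch does not decode.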

### Schema

Arrow schema with per-column type and nullability details.

### Data

Preview of the first 50 rows.

### Metadata

File and schema-level metadata.

### Buffers

Physical buffer layout for each column: validity bitmap bits (color-coded valid/null), hex preview of values, offsets, and data buffers. For nested types (list, struct, map, dictionary), child array buffers are shown recursively.
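
To make these buffers concrete, here is a minimal pure-Python sketch of how a nullable string column is laid out in the Arrow columnar format: a validity bitmap with one bit per value (least significant bit first), int32 offsets, and the concatenated UTF-8 data. The function name `encode_utf8_column` is an assumption for this example, and the sketch skips the alignment padding that real Arrow buffers carry. It illustrates the format, not Datanomy's implementation.

```python
import struct


def encode_utf8_column(values):
    """Encode optional strings as (validity, offsets, data) buffers (hypothetical helper)."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per value
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            # Set bit i, counting from the least significant bit of each byte.
            validity[i // 8] |= 1 << (i % 8)
            data += value.encode("utf-8")
        # A null repeats the previous offset, so its slice is empty.
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(validity), offsets_buf, bytes(data)


validity, offsets, data = encode_utf8_column(["a", None, "bc"])
print(format(validity[0], "08b"))  # 00000101: read right to left, values 0 and 2 are valid
print(offsets.hex(" ", 4))         # four little-endian int32 offsets: 0, 1, 1, 3
print(data.hex(" "))               # 61 62 63, the bytes of "abc"
```

Value `i` occupies `data[offsets[i]:offsets[i + 1]]`, which is why a column of N values needs N + 1 offsets.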

## Installation

```bash
# From PyPI
uv tool install datanomy
# or with pip
pip install datanomy

# From source
uv tool install "datanomy @ git+https://github.com/raulcd/datanomy.git"
# or by cloning the repo
git clone https://github.com/raulcd/datanomy.git
cd datanomy
uv sync
```

## Usage

```bash
# Run without installing using uvx
uvx datanomy data.parquet

# Inspect a Parquet file
datanomy data.parquet

# Inspect an Arrow IPC file
datanomy data.arrow
```

You can also run Datanomy straight from the repository with uvx, which uses the development version:

```bash
uvx "git+https://github.com/raulcd/datanomy.git" data.parquet
uvx "git+https://github.com/raulcd/datanomy.git" data.arrow
```

## Keyboard Shortcuts

- `q` - Quit the application

## Development

```bash
# Install dependencies
uv sync

# Run from source
uv run datanomy path/to/file.parquet
uv run datanomy path/to/file.arrow
```

```bash
# Install dev dependencies
uv sync --extra dev

# Run tests
uv run pytest

# Format code
uv run ruff format .

# Lint
uv run ruff check .

# Type check
uv run mypy .
```

## License

Apache License 2.0

## Contributing

Contributions welcome! Please open an issue or PR.

---

Built with [Textual](https://textual.textualize.io/) and [PyArrow](https://arrow.apache.org/docs/python/)
