Metadata-Version: 2.4
Name: napsack
Version: 0.1.0
Summary: NAPsack records and aggregates your computer use — screenshots plus input events (click, keypress, scroll, cursor move). It groups activity into event bursts and uses a VLM pipeline to generate human-readable captions describing what happened.
Requires-Python: <=3.13,>=3.11
Description-Content-Type: text/markdown
Requires-Dist: mss<11.0.0,>=10.0.0
Requires-Dist: numpy==2.2
Requires-Dist: opencv-python<5.0.0.0,>=4.11.0.86
Requires-Dist: pandas<3.0.0,>=2.3.0
Requires-Dist: matplotlib<4.0.0,>=3.10.3
Requires-Dist: scikit-learn<2.0.0,>=1.7.0
Requires-Dist: ruptures<2.0.0,>=1.1.9
Requires-Dist: ipdb<0.14.0,>=0.13.13
Requires-Dist: plotly<7.0.0,>=6.2.0
Requires-Dist: nbformat<6.0.0,>=5.10.4
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: google-generativeai>=0.8.5
Requires-Dist: datasets<4.0.0,>=3.0.0
Requires-Dist: pillow>=11.3.0
Requires-Dist: imageio>=2.37.0
Requires-Dist: screeninfo>=0.8.1
Requires-Dist: pynput>=1.8.1
Requires-Dist: scikit-image>=0.25.2
Requires-Dist: google-genai>=1.45.0
Requires-Dist: openai>=2.8.1
Requires-Dist: google-cloud-storage>=3.6.0
Requires-Dist: google-cloud-bigquery>=3.38.0

# NAPsack

**NAPsack** records and structures your computer use by generating natural-language captions from screenshots and input events (click, keypress, scroll, cursor move).

<img alt="napsack_overview" src="https://github.com/user-attachments/assets/dd9ca2c5-288c-4977-8dc9-10ca343e56db" />

---
# Quickstart

> Requires Python 3.11+ and `ffmpeg` for video generation. Use `uv` to run the commands below.

## API Keys

NAPsack uses a VLM to generate captions. Create a `.env` file in the project root (or export variables in your shell):

```shell
cp .env.example .env
```

Then fill in the key for your chosen client:

| Client | Variable | Where to get it |
|--------|----------|-----------------|
| `gemini` (default) | `GEMINI_API_KEY` | [Google AI Studio](https://aistudio.google.com/apikey) |
| `vllm` | _(none — pass `--vllm-url`)_ | Self-hosted vLLM server |
| `bigquery` | _(uses Application Default Credentials)_ | `gcloud auth application-default login` |

For Gemini, your `.env` should contain:

```
GEMINI_API_KEY=your_key_here
```

**Record** a session (press `Ctrl+C` to stop)
```shell
uv run -m record --monitor
```
**Label** the recorded session
```shell
uv run -m label --session logs/session_name --client gemini
```

> NAPsack supports the `gemini` and `vllm` clients for data labeling and integrates with BigQuery.

# Output

```shell
logs/session_name
├── screenshots         # Recorded screenshots
├── aggregations.jsonl  # Recorded event bursts
├── captions.jsonl      # All VLM-generated captions
├── annotated.mp4       # Final video showing generated captions and input events
└── data.jsonl          # Final data containing raw input events and VLM-generated captions
```
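All outputs except the video are JSON Lines files (one JSON object per line). A minimal sketch of reading them back, using only the standard library; the field names in the comment are illustrative assumptions, not a documented schema:

```python
import json

def read_jsonl(path):
    """Yield one dict per non-empty line of a .jsonl file.

    Works for aggregations.jsonl, captions.jsonl, and data.jsonl;
    the keys inside each record (e.g. a "caption" field) depend on
    the file and are not specified here.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```
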

# Method

## Record

NAPsack groups temporally adjacent input events of the same type into **event bursts**. An event is assigned to the current burst if the time since the preceding event of that type does not exceed the corresponding **gap** threshold and the elapsed time since the burst start remains within the **max** duration.

* If the **gap** threshold is exceeded, a new burst is started.
* If the **max** duration is exceeded, the first half of the current burst is finalized and saved, while the second half becomes the active burst.
* If the active monitor changes, the current burst is force-restarted.
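The gap/max rule can be sketched for a single event type as follows. The threshold values and function name are placeholders, not NAPsack's actual configuration:

```python
GAP = 1.0   # placeholder: max silence (s) between events in one burst
MAX = 10.0  # placeholder: max duration (s) of a single burst

def group_bursts(timestamps, gap=GAP, max_dur=MAX):
    """Group sorted event timestamps of one event type into bursts."""
    bursts = []
    current = []
    for t in timestamps:
        # Gap exceeded: finalize the current burst, start a new one.
        if current and t - current[-1] > gap:
            bursts.append(current)
            current = []
        current.append(t)
        # Max duration exceeded: finalize the first half of the burst;
        # the second half becomes the active burst.
        if t - current[0] > max_dur:
            half = len(current) // 2
            bursts.append(current[:half])
            current = current[half:]
    if current:
        bursts.append(current)
    return bursts
```
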

## Label

The `label` module:

* Loads recorded sessions or raw screenshots and splits them into chunks.
* Uses prompts (in `label/prompts`) to instruct the VLM to generate captions that describe the user's actions and context.
* Produces `captions.jsonl` and `data.jsonl` (captions aligned to screenshots and events).
* Optionally renders an annotated video (`annotated.mp4`) showing captions and event visualizations overlaid on frames.

The label step performs a second layer of aggregation: it uses the bursts detected at recording time and further refines and annotates them with VLM outputs to create final human-readable summaries.
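The shape of this second layer can be sketched as below. `caption_burst` stands in for the real VLM call, and the record fields are assumptions for illustration, not NAPsack's actual schema:

```python
def annotate_bursts(bursts, caption_burst):
    """Attach a VLM caption to each recorded burst.

    `bursts` are dicts from the recording step; `caption_burst` is any
    callable mapping a burst to a human-readable caption string.
    """
    records = []
    for burst in bursts:
        records.append({
            "start": burst["start"],
            "end": burst["end"],
            "type": burst["type"],
            "caption": caption_burst(burst),
        })
    return records
```
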


