Metadata-Version: 2.4
Name: vscode-ark
Version: 2.0.0
Summary: Comprehensive analysis system for VS Code/Copilot Chat sessions with behavioral signal extraction and heat scoring
Home-page: https://github.com/goCosmix/vscode-ark
Author: Ernie Butcher
Author-email: Ernie Butcher <ernie@fiosii.com>
Maintainer-email: Ernie Butcher <ernie@fiosii.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/goCosmix/vscode-ark
Project-URL: Repository, https://github.com/goCosmix/vscode-ark.git
Project-URL: Issues, https://github.com/goCosmix/vscode-ark/issues
Project-URL: Documentation, https://github.com/goCosmix/vscode-ark#readme
Project-URL: Changelog, https://github.com/goCosmix/vscode-ark/blob/main/CHANGELOG.md
Keywords: vscode,copilot,chat,analysis,behavioral,signals,heat-score,ai,conversation
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Logging
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: watchfiles>=0.20
Requires-Dist: click>=8.0
Requires-Dist: sentence-transformers>=2.2.2
Requires-Dist: numpy>=1.26
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: isort; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# VS Code Ark

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/vscode-ark.svg)](https://pypi.org/project/vscode-ark)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A complete analysis system for VS Code + Copilot Chat sessions that turns raw editor activity into behavioral signals, semantic intelligence, and a local web dashboard.

## ✨ Key Benefits

- **Behavioral signal intelligence** for Copilot Chat sessions.
- **Heat scoring** to surface friction, recovery points, and session quality.
- **Semantic search** across session transcripts, code symbols, and tool calls.
- **Background web UI** with structured panels, alerts, and session drilldown.
- **Live watcher daemon** to keep session analytics current.
- **Exportable data** for JSON, JSONL, and text workflows.

## 📋 Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Web UI](#web-ui)
- [CLI Reference](#cli-reference)
- [Architecture](#architecture)
- [Roadmap](#roadmap)
- [Configuration](#configuration)
- [Development](#development)
- [Contributing](#contributing)
- [License](#license)

## 🚀 Installation

### Prerequisites

- Python 3.8+
- VS Code with the Copilot Chat extension installed

### Install from PyPI

```bash
pip install vscode-ark
```

### Install with pipx

```bash
pipx install vscode-ark
```

### Install from source

```bash
git clone https://github.com/goCosmix/vscode-ark.git
cd vscode-ark
pip install -e .
```

### Install development dependencies

```bash
pip install -e ".[dev]"
# or
make install-dev
```

> The `cda` console command is installed into your active Python environment's `bin` directory (`Scripts` on Windows). Activate your virtual environment before running `cda`.

## ⚡ Quick Start

1. **Initialize the database**

```bash
cda sync
```

2. **Start the watcher daemon**

```bash
cda watch start
```

3. **Inspect the PMF runtime services**

```bash
cda pmf services
```

4. **Build semantic intelligence**

```bash
cda embed build
```

5. **Start the web UI**

```bash
cda ui start
```

6. **Open your browser**

Visit `http://127.0.0.1:10001`

## 🌐 Web UI

- **Background service**: `cda ui start`
- **Stop service**: `cda ui stop`
- **Service status**: `cda ui status`
- **Foreground mode**: `cda serve`

The web UI includes:

- Session drilldown panels and charts
- Behavioral signal summaries
- Alert and recommendation views
- Searchable transcript and tool-call detail
- File/VFS browsing and raw session inspection

## 🧠 Core Features

- Behavioral signals with 200+ keyword patterns across six categories
- Frustration heat scoring and recovery analytics
- Full-text search and semantic search with embeddings
- Code symbol indexing for Python/JS/TS
- Incremental ingestion with crash-resilient queue replay
- Export workflows for JSON, JSONL, and text

## 📦 Package and Release

- Published on PyPI as `vscode-ark`
- Current release version: `2.0.0`
- CLI entry point: `cda`
- License: MIT

## 🛣 Roadmap

See `docs/ROADMAP.md` for product direction, milestone planning, and release priorities.

## 🤝 Contributing

See `CONTRIBUTING.md` for development setup, test guidance, and PR workflow.

## 📜 License

This project is licensed under the MIT License.

## 🧠 SQLite limits and mitigation

- **Single writer in WAL mode**: the system uses one writer process for ingest/reconstruct/extract/embed and allows many concurrent readers via SQLite WAL.
- **Large VFS blob handling**: for very large raw artifacts, the clean approach is chunked storage or external file references instead of a single enormous BLOB.
- **Page size and cache**: SQLite's default page size is 4 KiB (4096 bytes). The code sets `PRAGMA cache_size=-2000` (about 2 MB of page cache), `PRAGMA mmap_size=268435456` (256 MiB), and `PRAGMA temp_store=MEMORY` to improve read/cache performance on larger databases.
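- **Further tuning**: rebuild the DB with a larger page size (e.g. `PRAGMA page_size=32768` followed by `VACUUM`) if you need more efficient storage for very large session history. SQLite cannot change the page size while the database is in WAL mode, so switch the journal mode before rebuilding.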
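
As a rough illustration of the settings listed above, the following sketch opens a database with Python's standard `sqlite3` module and applies the same pragmas. The function name `open_tuned` and the `db_path` argument are hypothetical; the project's own connection code may differ.

```python
import sqlite3


def open_tuned(db_path: str) -> sqlite3.Connection:
    """Open a SQLite database and apply the pragmas described above.

    Hypothetical helper for illustration; not the project's actual code.
    """
    conn = sqlite3.connect(db_path)
    # WAL lets many readers run alongside a single writer.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative cache_size is a size in KiB (about 2 MB of page cache here).
    conn.execute("PRAGMA cache_size=-2000")
    # Request up to 256 MiB of memory-mapped I/O; SQLite may cap or ignore this.
    conn.execute("PRAGMA mmap_size=268435456")
    # Keep temporary tables and indices in memory.
    conn.execute("PRAGMA temp_store=MEMORY")
    return conn
```

`cache_size`, `mmap_size`, and `temp_store` are per-connection settings, so a helper like this would need to run each time a connection is opened.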

## 🔧 Configuration

- **VS Code Data Directory**: Defaults to the macOS path (`~/Library/Application Support/Code/User`). Override it with `export VSCODE_DATA_DIR=/path/to/vscode/data` (for example, `~/.config/Code/User` on Linux).
- **No other config needed**: Everything is CLI-driven with local SQLite.

## 🏗️ Architecture

```
VS Code Storage → ingest.py → vfs + sessions + transcripts
                      ↓
               reconstruct.py → exchanges (structured conversations)
                      ↓
               extract.py → signals + tokens + heat scores + analysis
                      ↓
               embed.py → semantic embeddings + summaries + alerts
                      ↓
               watcher.py → live sync + FTS indexing + queue resilience
                      ↓
               cda → query interface + policy enforcement
```

### Core Components

| Component | Purpose | Key Features |
|-----------|---------|--------------|
| **ingest.py** | Data ingestion | VFS storage, gzip compression, session metadata |
| **reconstruct.py** | Conversation processing | Exchange threading, tool call linking, FTS indexing |
| **extract.py** | Signal analysis | Behavioral pattern recognition, heat scoring, token accounting |
| **watcher.py** | Live monitoring | File watching, incremental updates, crash recovery |
| **cda** | Query interface | 25+ commands, policy filtering, rich formatting |

### Database Schema

- **workspaces** - VS Code workspace metadata
- **sessions** - Chat session information and metadata
- **vfs** - Gzip-compressed file storage with SHA256 hashes
- **exchanges** - Structured conversation turns with tool calls
- **exchange_signals** - Behavioral signal annotations
- **symbols** - Code symbol index (functions, classes, etc.)
- **token_usage** - Per-request token consumption tracking
- **compactions** - Context window summarization events
- **session_analysis** - Aggregated session metrics and heat scores
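
To check which of these tables exist in a given database file, a small read-only inspection sketch like the one below may help. The helper name `list_tables` is hypothetical, and the database path is whatever location your installation uses; column layouts are not documented here, so the sketch only lists table names.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of user tables in a SQLite database, sorted.

    Hypothetical helper for illustration; not part of the `cda` CLI.
    """
    # mode=ro opens the file read-only so inspection cannot modify it.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Full-text search indexes usually appear as several extra shadow tables, so the output can contain more names than the list above.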

## 🖥️ CLI Reference

### Core Commands

```bash
# System Management
cda status              # Show daemon status and queue information
cda stats               # System-wide statistics and coverage
cda sync                # Full data ingestion and rebuild
cda reconstruct         # Rebuild conversations and search index
cda pmf services        # List embedded PMF runtime services
cda pmf status [service] # Show runtime status for PMF services
cda pmf start <service>  # Start a PMF-managed Ark service
cda pmf stop <service>   # Stop a PMF-managed Ark service
cda pmf restart <service> # Restart a PMF-managed Ark service
cda pmf logs <service>   # Tail runtime logs for a PMF service

# Session Analysis
cda sessions            # List all sessions (newest first)
cda session <id>        # Show detailed session information
cda workspace <id>      # Show sessions for a workspace
cda workspaces          # List all workspaces

# Search & Query
cda search <query>      # Full-text search across conversations
cda code-search <pattern> [--symbol] [--regex]  # Search code symbols or code content
cda semantic-search <query> # Semantic search using embeddings
cda similar <session>     # Find sessions similar to a session
cda related <session>     # Alias for semantic related sessions
cda summarize <session>   # Show session summary, topics, and recommendations
cda topics                # List semantic topic tags
cda alerts <session>      # Show semantic anomaly alerts
cda recommend <session>   # Show session recommendations
cda tools <query>       # Search tool call arguments
cda memory              # Show memory files and global state

# Behavioral Analysis
cda signals [session]   # Show behavioral signals
cda heat [session]      # Frustration and heat analysis
cda behavior            # Aggregate behavioral intelligence
cda saved               # Sessions that recovered from high heat

# Data Export
cda export <session>    # Export session as JSON/JSONL/text
cda replay <session>    # Print conversation as readable text

# Advanced
cda query <sql>         # Execute raw SQL queries
cda tokens [session]    # Token usage analysis
cda compactions [session] # Context compaction events
cda edits               # Edit session analytics

# Policy Management
cda policy allow <pattern>   # Add allow pattern
cda policy deny <pattern>    # Add deny pattern
cda policy list              # Show current policies

# Live Monitoring
cda watch start             # Start watcher daemon
cda watch stop              # Stop watcher daemon
cda watch restart           # Restart watcher daemon
cda ui start                # Start web UI background service
cda ui stop                 # Stop web UI background service
cda ui status               # Show web UI background service status
```

### Command Examples

```bash
# Search for error handling discussions
cda search "error handling" --limit 20

# Find sessions with high frustration
cda heat --limit 10

# Search for specific functions in code
cda code-search "def process_data" --symbol

# Search code content with regex or plain text
cda code-search "timeout" --regex

# Find semantically related sessions
cda related abc123

# Summarize a session with semantic topics and recommendations
cda summarize abc123

# Export a session for external analysis
cda export abc123 --format jsonl --output session.jsonl

# Monitor live sessions
cda watch start
cda status  # Check queue status
```
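
A JSONL export can be loaded for external analysis with a few lines of Python. This is a minimal sketch that assumes each non-empty line of the export is one JSON object; the field names inside each record are not documented here, so the helper (hypothetically named `read_jsonl`) returns plain dictionaries.

```python
import json
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `records = list(read_jsonl("session.jsonl"))` reads the file produced by the export command above into memory.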

## 📊 Data Analysis

### Behavioral Signals

The system recognizes 6 signal types with 200+ keyword patterns:

| Signal Type | Weight | Description | Example Keywords |
|-------------|--------|-------------|------------------|
| **correction** | 3 | User correcting agent behavior | "stop", "wrong", "nope", "wait" |
| **pre_correction** | 2 | Early frustration signs | "actually", "hold on", "slow down" |
| **redirect** | 1 | User changing direction | "pivot", "change direction", "instead" |
| **affirmation** | 0 | Positive feedback | "good", "right", "perfect", "thanks" |
| **approval** | 0 | Task completion approval | "that works", "looks good", "approved" |
| **frustration** | 5 | Strong negative signals | "this is broken", "not working", "terrible" |

### Heat Score Algorithm

```
Heat Score = min(100, Σ(signal_weights))
```

- **Peak Heat**: Maximum heat reached in session
- **Final Heat**: Heat at session end
- **Recovery**: Sessions that return to low heat after high peaks
- **Saved Sessions**: High-heat sessions that recover with affirmations
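
The formula can be sketched in Python using the weights from the signal table. This covers only the capped sum; how heat falls again during recovery is not specified in this document, so the sketch does not model it. The names `SIGNAL_WEIGHTS` and `heat_score` are illustrative, not the project's API.

```python
from typing import Iterable

# Weights copied from the Behavioral Signals table.
SIGNAL_WEIGHTS = {
    "correction": 3,
    "pre_correction": 2,
    "redirect": 1,
    "affirmation": 0,
    "approval": 0,
    "frustration": 5,
}


def heat_score(signal_types: Iterable[str]) -> int:
    """Sum the weights of the given signals and cap the result at 100."""
    total = sum(SIGNAL_WEIGHTS[s] for s in signal_types)
    return min(100, total)
```

For instance, `heat_score(["correction", "frustration", "affirmation"])` evaluates to `8` (3 + 5 + 0), and any sequence whose weights sum past 100 is capped at `100`.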

### Token Usage Tracking

- Per-request token consumption (prompt + completion)
- Model identification and version tracking
- Context compaction event logging
- Cost estimation capabilities

## ⚙️ Configuration

### Automatic Detection

VS Code Ark automatically detects paths using standard locations:

- **macOS**: `~/Library/Application Support/Code/User/`
- **Windows**: `%APPDATA%\Code\User\`
- **Linux**: `~/.config/Code/User/`
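
Path detection along these lines could look like the following sketch. It assumes the `VSCODE_DATA_DIR` override takes precedence over the platform defaults; the function name `vscode_user_dir` and its parameters are hypothetical and exist only to make the logic testable.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def vscode_user_dir(
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Resolve the VS Code user data directory for a platform.

    Hypothetical helper; the override precedence is an assumption.
    """
    env = os.environ if env is None else env
    override = env.get("VSCODE_DATA_DIR")
    if override:
        return Path(override).expanduser()
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Code" / "User"
    if platform.startswith("win"):
        # Falls back to a relative path if APPDATA is unset.
        return Path(env.get("APPDATA", "")) / "Code" / "User"
    # Treat everything else as Linux-style.
    return Path.home() / ".config" / "Code" / "User"
```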

### Environment Variables

```bash
export VSCODE_ARK_DB=/path/to/custom.db    # Custom database location
export VSCODE_ARK_CONFIG=/path/to/config   # Custom config directory
```

### Policy Configuration

Data access policies are stored in `policy.txt`:

```
ALLOW important-project
DENY sensitive-data
ALLOW *.py
```
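
The evaluation rules for these patterns are not spelled out here. As one possible reading, the sketch below parses the file format shown above and treats patterns as shell-style globs, with the assumption that any matching `DENY` rule wins and that a name must otherwise match an `ALLOW` rule. Both the precedence and the helper names (`parse_policy`, `is_allowed`) are assumptions for illustration only.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, pattern)


def parse_policy(text: str) -> List[Rule]:
    """Parse 'ALLOW <pattern>' / 'DENY <pattern>' lines into rules."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        action, _, pattern = line.partition(" ")
        action = action.upper()
        pattern = pattern.strip()
        if action in ("ALLOW", "DENY") and pattern:
            rules.append((action, pattern))
    return rules


def is_allowed(name: str, rules: List[Rule]) -> bool:
    """Assumed semantics: a matching DENY wins, else an ALLOW must match."""
    if any(a == "DENY" and fnmatchcase(name, p) for a, p in rules):
        return False
    return any(a == "ALLOW" and fnmatchcase(name, p) for a, p in rules)
```

Under these assumed semantics, the example policy above would allow `main.py` and `important-project`, and reject `sensitive-data` as well as anything that matches no rule.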

## 🔧 Development

### Setup Development Environment

```bash
make install-dev
```

### Running Tests

```bash
make test              # Run test suite
make test-cov          # Run with coverage report
```

### Code Quality

```bash
make lint              # Run flake8 and mypy
make format            # Format with black and isort
```

### Building

```bash
make build             # Build distribution packages
make publish           # Publish to PyPI (requires credentials)
```

### Project Structure

```
vscode-ark/
├── vscode_ark/           # Main package
│   ├── __init__.py
│   └── cli.py           # Command-line interface
├── scripts/             # Utility scripts
│   ├── ingest.py        # Data ingestion
│   ├── reconstruct.py   # Conversation processing
│   ├── extract.py       # Signal analysis
│   └── watcher.py       # Live monitoring
├── tests/               # Test suite
├── docs/                # Documentation
├── pyproject.toml       # Package configuration
├── setup.py            # Legacy setup
├── Makefile            # Development tasks
└── README.md           # This file
```

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes and add tests
4. Run the test suite: `make test`
5. Format code: `make format`
6. Commit your changes: `git commit -m 'Add amazing feature'`
7. Push to the branch: `git push origin feature/amazing-feature`
8. Open a Pull Request

### Development Guidelines

- **Type Hints**: All functions should have type annotations
- **Docstrings**: Comprehensive docstrings for public APIs
- **Tests**: Unit tests for all new functionality
- **Linting**: Code must pass flake8 and mypy checks
- **Formatting**: Code must be formatted with black and isort

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built for analyzing VS Code/Copilot Chat interaction patterns
- Inspired by the need for better human-AI interaction insights
- Uses SQLite FTS5 for high-performance full-text search
- Implements behavioral signal processing for conversation analysis

---

**VS Code Ark** - Understanding the human side of AI conversations.
