Metadata-Version: 2.1
Name: cobblequery
Version: 1.1.0
Summary: CLI tool for doing data joining
Home-page: https://github.com/jdotpy/cobble
Author: KJ
Author-email: jdotpy@users.noreply.github.com
License: UNKNOWN
Download-URL: https://github.com/jdotpy/cobble/tarball/master
Keywords: tools
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Description-Content-Type: text/markdown
Provides-Extra: dev

# Cobble

A CLI tool for ad-hoc data joining, filtering, and analysis. Pipe in CSV, JSON, or plain text and query it with a shell-friendly syntax.

## Install

```bash
pip install cobblequery
```

## Quick Start

```bash
# Pipe CSV and filter
cat users.csv | cobble '| search role=admin | select name,email'

# Read a file directly (skip stdin)
cobble -s '| from data.csv | head 5'

# Chain JSON lines from another tool
kubectl get pods -o json | cobble '| select name,status'
```

## CLI Usage

```
usage: cobble [-h] [--dry-run] [-i] [-s] [-q QUERY_FILE] [query]

positional arguments:
  query                 Query to run

optional arguments:
  -h, --help            show this help message and exit
  --dry-run             Parse and validate query without running
  -i, --interactive     Do an interactive edit of the query
  -s, --no-stdin        Dont insert the stdin generator
  -q QUERY_FILE, --query-file QUERY_FILE
                        Read query text from a file
```

Queries are pipe-delimited chains of commands:

```
| command1 args | command2 args | ...
```

## Examples

### Filtering rows

```bash
# Exact match
cat servers.csv | cobble '| search env=prod'

# Numeric comparison
cat orders.csv | cobble '| search total>100'

# Multiple conditions (AND)
cat orders.csv | cobble '| search status=shipped,total>=50'

# Regex match
cat logs.csv | cobble '| search path~=/api/v[23]'

# Not equal
cat users.csv | cobble '| search role!=guest'

# Python expression for complex logic
cat data.csv | cobble '| py "age > 21 and status == \"active\""'
```

### Selecting and renaming fields

```bash
# Pick specific fields
cat users.csv | cobble '| select name,email,role'

# Rename a field
cat data.csv | cobble '| rename new_name=old_name'
```

### Sorting

```bash
# Sort ascending
cat scores.csv | cobble '| sort name'

# Sort descending
cat scores.csv | cobble '| sort -score'

# Multi-field sort
cat employees.csv | cobble '| sort department,-salary'
```

### Limiting results

```bash
# First 10 rows
cat huge.csv | cobble '| head 10'

# Rows 5 through 15
cat data.csv | cobble '| slice 5,15'
```

### Aggregation

```bash
# Count and sum with auto-named output fields (field_operation)
cat sales.csv | cobble '| agg by=region sum(revenue) count(id)'
# Output: {"region": "west", "revenue_sum": 48000, "id_count": 12}

# Named output fields
cat sales.csv | cobble '| agg by=region, total=sum(revenue), n=count(id)'
# Output: {"region": "west", "total": 48000, "n": 12}

# No group-by (aggregate everything)
cat sales.csv | cobble '| agg total=sum(revenue)'

# Multiple group-by fields
cat data.csv | cobble '| agg by=year,quarter avg(revenue) min(cost) max(cost)'
```

Available aggregation functions: `sum`, `count`, `avg`, `min`, `max`, `first`, `last`, `dc` (distinct count), `values`, `unique_values`.

### Joining datasets

```bash
# Join users with their departments (left join, first match)
cobble -s '| from users.csv | join dept_id [ | from departments.csv ] | select name,dept_name'

# Join with different key names (source_key:target_key)
cobble -s '| from orders.csv | join customer_id:id [ | from customers.csv ] | select order_id,name'

# Inner join (only matching rows)
cobble -s '| from orders.csv | join product_id, type=inner [ | from products.csv ]'

# Outer join (all rows from both sides)
cobble -s '| from left.csv | join id, type=outer [ | from right.csv ]'

# Expand join (one output row per match, like SQL)
cobble -s '| from students.csv | join class_id, target=expand [ | from enrollments.csv ]'

# Join with field selection
cobble -s '| from users.csv | join team_id [ | from teams.csv ] | select name,team_name'
```

Join types: `left` (default), `inner`, `outer`.
Join targets: `first` (default), `last`, `expand`, `agg`, `agg_str`.

### Computed fields

```bash
# Add a new field (fields are available by name directly)
cat products.csv | cobble '| set margin="price - cost"'

# String manipulation
cat users.csv | cobble '| set domain="email.split(\"@\")[1]"'

# The value. prefix also works for dot-access on nested data
cat data.json | cobble '| set full="value.first + \" \" + value.last"'
```

### Combining datasets

```bash
# Append rows from another file
cobble -s '| from jan.csv | append { | from feb.csv }'
```

### Generating data

```bash
# Generate a numbered sequence
cobble -s '| range end=100 | set squared="i ** 2"'
```

### Real-world examples

```bash
# Find top 5 customers by total spend
cobble -s '| from orders.csv | agg by=customer_id sum(total) count(total) | sort -total_sum | head 5'

# Join server metrics with inventory, filter to production
cobble -s '| from metrics.csv
    | join hostname [ | from inventory.csv ]
    | search env=prod
    | sort -cpu
    | select hostname,cpu,memory,team'

# Aggregate log counts by status code, show top errors
cat access.log.csv | cobble '| search status>=400 | agg by=status count(path) | sort -path_count'

# Compare two CSVs - find entries only in the second file
cobble -s '| from new.csv | join id, type=inner [ | from old.csv ] | select id,name'
```

### Quoting rules

- Values with spaces or special characters should be quoted: `"my value"`
- Expressions containing operators should be quoted: `"Total Sales>100"`
- Escape quotes inside quoted strings with backslash: `"say \"hello\""`

## Command Reference

| Command | Aliases | Description |
|---------|---------|-------------|
| `from` | | Read from file or stdin (`-`) |
| `search` | `s` | Filter rows by field matching |
| `py` | `where` | Filter rows with Python expressions |
| `select` | `table` | Pick specific fields |
| `set` | `eval` | Add/modify fields with Python expressions |
| `sort` | | Sort by fields (`-field` for descending) |
| `agg` | `aggregate`, `stats` | Group and aggregate |
| `join` | | Join with another dataset |
| `slice` | `head`, `limit` | Limit row range |
| `rename` | | Rename fields |
| `unpack` | | Expand list fields into rows |
| `append` | | Concatenate another pipeline |
| `range` | | Generate numbered rows |

## Search Operators

| Operator | Example | Description |
|----------|---------|-------------|
| `=` | `field=value` | Exact match (numeric-aware) |
| `!=` | `field!=value` | Not equal |
| `>` | `field>10` | Greater than |
| `>=` | `field>=10` | Greater or equal |
| `<` | `field<10` | Less than |
| `<=` | `field<=10` | Less or equal |
| `~=` | `field~=pat.*` | Regex match |

Numeric comparisons handle type coercion transparently -- string `"25"` from CSV is correctly compared as a number.


