Metadata-Version: 2.4
Name: diskcomp
Version: 1.0.0
Summary: Compare two drives and find duplicate files. Zero dependencies, cross-platform, with undo.
Project-URL: Repository, https://github.com/w1lkns/diskcomp
Project-URL: Issues, https://github.com/w1lkns/diskcomp/issues
Author-email: Wilkins Morales <wilkinscom@gmail.com>
License: MIT
License-File: LICENSE
Keywords: cli,cross-platform,deduplication,drive-comparison,file-management
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Filesystems
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Provides-Extra: rich
Requires-Dist: rich>=13.0.0; extra == 'rich'
Description-Content-Type: text/markdown

# diskcomp

[![CI](https://github.com/w1lkns/diskcomp/workflows/CI/badge.svg)](https://github.com/w1lkns/diskcomp/actions)
[![PyPI version](https://img.shields.io/pypi/v/diskcomp.svg)](https://pypi.org/project/diskcomp/)

Find and safely delete duplicate files — across two drives or within one. Zero dependencies, cross-platform, with undo.

## Quick Install

**Download binary** (no Python required):

**macOS:**
```bash
# Homebrew
brew tap w1lkns/diskcomp
brew install diskcomp

# Or download directly
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-macos
chmod +x diskcomp
./diskcomp --help
```

**Linux:**
```bash
# Download directly
curl -L -o diskcomp https://github.com/w1lkns/diskcomp/releases/latest/download/diskcomp-linux
chmod +x diskcomp
./diskcomp --help
```

**Windows:**
```cmd
REM Download diskcomp-windows.exe from GitHub Releases:
REM https://github.com/w1lkns/diskcomp/releases/latest
diskcomp-windows.exe --help
```

**Python install** (if you have Python):

**pipx** (recommended — handles PATH automatically):
```bash
pipx install diskcomp
diskcomp --help
```

> Don't have pipx? `brew install pipx` on macOS, `pip install pipx` elsewhere.

**pip install**:
```bash
pip install diskcomp
diskcomp --help
```

**Single-file version** (no install, no dependencies):
```bash
curl -O https://raw.githubusercontent.com/w1lkns/diskcomp/main/diskcomp.py
python3 diskcomp.py --help
```

## Quick Start

**Interactive mode** (no arguments — clears screen, shows menu):
```bash
diskcomp
```

The launch menu offers:
```
  1) Compare two drives
  2) Clean up a single drive
  3) Load previous report
  4) Help
  5) Quit
```

**Compare two drives** (command-line):
```bash
diskcomp --keep /Volumes/backup --other /Volumes/external
```

**Clean up a single drive** (find internal duplicates):
```bash
diskcomp --single /Volumes/my-drive
```

**Dry-run** (count files without hashing):
```bash
diskcomp --keep /path/A --other /path/B --dry-run
```

**Load a previous report** (skip re-scanning):
```bash
diskcomp --delete-from ./diskcomp-report-20260322-235800.csv
```

## Usage & Flags

| Flag | Description | Example |
|------|-------------|---------|
| `--keep PATH` | Path to the "keep" drive (files to retain). Required for a two-drive comparison from the command line. | `--keep /Volumes/backup` |
| `--other PATH` | Path to the "other" drive (duplicates are deleted from here). Required for a two-drive comparison from the command line. | `--other /Volumes/external` |
| `--single PATH` | Scan one drive for internal duplicates (redundant copies on the same drive). | `--single /Volumes/photos` |
| `--dry-run` | Walk and count files without hashing (quick preview). | `--dry-run` |
| `--limit N` | Hash only the first N files per drive (intended for testing). | `--limit 100` |
| `--output PATH` | Custom report path (default: `~/diskcomp-report-YYYYMMDD-HHMMSS.csv`). | `--output ./my-report.csv` |
| `--format csv\|json` | Report format: `csv` or `json` (default: `csv`). | `--format json` |
| `--min-size SIZE` | Minimum file size to include (default: `1KB`). Accepts bytes, KB, MB, GB. | `--min-size 10MB` |
| `--delete-from PATH` | Load an existing report and start deletion workflow (skip re-scanning). | `--delete-from ./diskcomp-report-20260322.csv` |
| `--undo PATH` | View the audit log of a previous deletion session. | `--undo ./diskcomp-undo-20260322.json` |
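
As an illustration of how a value such as `--min-size 10MB` might be turned into a byte count, here is a minimal Python sketch. The function name `parse_size` is hypothetical, and the sketch assumes binary multiples (1 KB = 1024 bytes); diskcomp's own parser may differ on both points.

```python
import re

# Assumption: binary multiples (1 KB = 1024 bytes). diskcomp may use decimal ones.
_UNITS = {"": 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Convert a size string such as '10MB' or '512' into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    unit = unit.upper()  # accept '10mb' as well as '10MB'
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(float(number) * _UNITS[unit])
```

Under these assumptions, `parse_size("1KB")` gives `1024` and a bare number such as `"512"` is read as bytes.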

## How It Works

1. **Drive Health Checks** (pre-scan, two-drive mode):
   - Space summary for both drives
   - Filesystem detection (HFS+, NTFS, ext4, exFAT, etc.)
   - Read-only detection (warns if "keep" drive is read-only)
   - Read speed benchmark (128 MB)
   - Optional SMART data (if `smartmontools` available)

2. **Scanning & Hashing**:
   - Walks drives recursively
   - Skips OS noise (`.DS_Store`, `Thumbs.db`, `System Volume Information`, etc.)
   - Two-pass optimization: files are first grouped by size, and only files whose size matches another file are hashed with SHA-256
   - Live progress bar with speed and ETA

3. **Reporting**:
   - CSV or JSON report saved to `~/diskcomp-report-YYYYMMDD-HHMMSS.{csv,json}`
   - Atomic writes (temp → rename, safe against crashes mid-write)

4. **Deletion Workflow** (optional):
   - **Mode A (Interactive):** Shows both copies numbered `(1)` and `(2)`; you pick which to delete, skip, or abort. A running total of space freed is shown after each deletion.
   - **Mode B (Batch):** Dry-run preview with file type breakdown → type `DELETE` to confirm → progress bar
   - Undo log written **before** each deletion (audit-first pattern)
   - Always abortable with `Ctrl+C`
   - Can re-run from a saved report without re-scanning (option 3 in menu or `--delete-from`)

5. **Undo Log** (`--undo` flag):
   - JSON file listing all deleted files with paths, sizes, hashes, and timestamps
   - Deletion is permanent — the log is an audit trail, not a restore mechanism
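
The two-pass idea from step 2 (filter by size, then hash) can be sketched in Python as follows. This is an illustration under stated assumptions, not diskcomp's actual code: the names `sha256_of` and `find_duplicates` are hypothetical, and the sketch leaves out OS-noise filtering, `--min-size`, and progress reporting.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _index_by_size(root):
    """Pass 1: map file size -> list of paths (metadata only, no reads)."""
    by_size = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
    return by_size


def find_duplicates(keep_root, other_root):
    """Return (keep_path, other_path, sha256) for files with equal content."""
    keep_sizes = _index_by_size(keep_root)
    other_sizes = _index_by_size(other_root)
    matches = []
    # Pass 2: hash only files whose size occurs on both sides.
    for size in keep_sizes.keys() & other_sizes.keys():
        keep_hashes = {}
        for path in keep_sizes[size]:
            keep_hashes.setdefault(sha256_of(path), path)
        for path in other_sizes[size]:
            digest = sha256_of(path)
            if digest in keep_hashes:
                matches.append((keep_hashes[digest], path, digest))
    return matches
```

Files with a size that appears on only one side are never read, which is where the time saving comes from on large drives.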

## Safety Model

**The user is always in control.** diskcomp prioritizes safety over convenience:

- **No automatic deletion** — every destructive action requires explicit confirmation
- **Undo log first** — log written before any file is deleted
- **Read-only detection** — warns if a drive appears read-only and skips it for deletion
- **Dry-run mode** — preview all operations without side effects
- **Abortable** — press `Ctrl+C` at any prompt to stop safely
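
The "undo log first" rule can be illustrated with a short Python sketch. The function name `delete_with_audit` and the one-JSON-object-per-line layout are assumptions made for this example; the real log is described only as a JSON file holding paths, sizes, hashes, and timestamps.

```python
import json
import os
import time


def delete_with_audit(paths, log_path):
    """Delete files one by one, recording each in a log before removal."""
    deleted = []
    with open(log_path, "a", encoding="utf-8") as log:
        for path in paths:
            entry = {
                "path": path,
                "size": os.path.getsize(path),
                "deleted_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
            }
            # Write and sync the entry BEFORE touching the file, so a crash
            # can leave a logged-but-present file, never the reverse.
            log.write(json.dumps(entry) + "\n")
            log.flush()
            os.fsync(log.fileno())
            os.remove(path)
            deleted.append(path)
    return deleted
```

Ordering the log write ahead of `os.remove` means the log can overstate what was deleted after a crash, but it should never miss a file that is gone.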

## Reports

**CSV format** (default, spreadsheet-friendly):
```csv
status,original_file,duplicate_file,size_mb,verification_hash
DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...
UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...
UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...
```

| Column | Values |
|--------|--------|
| `status` | `DELETE_FROM_OTHER`, `UNIQUE_IN_KEEP`, `UNIQUE_IN_OTHER` |
| `original_file` | Path to the copy to keep |
| `duplicate_file` | Path to the copy to delete |
| `size_mb` | File size in MB |
| `verification_hash` | SHA256 hex string |
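
Because the report is plain CSV with the columns listed above, it can be post-processed with Python's standard `csv` module. The sketch below totals `size_mb` per `status`; the helper name `summarise_report` is hypothetical, and the sample rows are the ones shown earlier in this section.

```python
import csv
from collections import defaultdict


def summarise_report(lines):
    """Total size_mb per status from an iterable of CSV lines."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row["status"]] += float(row["size_mb"])
    return dict(totals)


sample = """status,original_file,duplicate_file,size_mb,verification_hash
DELETE_FROM_OTHER,/Volumes/keep/photos/pic1.jpg,/Volumes/other/photos/pic1.jpg,2.5,abc123...
UNIQUE_IN_KEEP,/Volumes/keep/docs/resume.pdf,,0.1,def456...
UNIQUE_IN_OTHER,,/Volumes/other/temp/junk.tmp,5.0,ghi789...
""".splitlines()

print(summarise_report(sample))
# {'DELETE_FROM_OTHER': 2.5, 'UNIQUE_IN_KEEP': 0.1, 'UNIQUE_IN_OTHER': 5.0}
```

For a real report, pass an open file instead of `sample`, for example `open(path, newline="", encoding="utf-8")`.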

**JSON format** (programmatic use):
```bash
diskcomp --keep /Volumes/keep --other /Volumes/other --format json
```
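
Reports are described as written atomically (temp → rename). A common way to get that behaviour in Python is sketched below. The function name `write_report_atomically` and the shape of `rows` are assumptions for illustration; the source does not show the JSON layout diskcomp uses.

```python
import json
import os
import tempfile


def write_report_atomically(rows, destination):
    """Write rows as JSON to a temp file, then rename it over destination."""
    folder = os.path.dirname(os.path.abspath(destination))
    # The temp file lives in the destination folder so the rename stays on
    # one filesystem, which is what lets os.replace swap it in atomically.
    fd, temp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(rows, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(temp_path, destination)
    except BaseException:
        # Leave no partial file behind if anything failed.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

A reader of `destination` sees either the previous complete report or the new complete report, never a half-written one.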

## Known Limitations

### NTFS Drives on macOS and Linux

NTFS (Windows filesystem) drives are mounted read-only by default on macOS, and on Linux systems without a write-capable NTFS driver. On a read-only mount:
- diskcomp can **scan** and **identify** duplicates on NTFS drives
- diskcomp **cannot delete** files from NTFS drives without a third-party driver

**Workaround:**
- **macOS**: [ntfs-3g with macFUSE](https://github.com/gromgit/homebrew-fuse) or [Tuxera NTFS](https://www.tuxera.com/products/ntfs-for-mac/)
- **Linux**: `sudo apt install ntfs-3g` (Debian/Ubuntu) or `sudo dnf install ntfs-3g` (Fedora)

diskcomp detects this and warns during health checks.
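
How diskcomp performs this detection is not described here. One simple approach, shown purely as an assumption, is to try creating a temporary file on the drive; the helper name `is_writable` is hypothetical.

```python
import tempfile


def is_writable(directory):
    """Return True if a temporary file can be created inside directory."""
    try:
        # TemporaryFile removes the probe file automatically on close.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Read-only mounts, missing paths, and permission errors land here.
        return False
```

A probe like this reports what the operating system actually allows, regardless of which filesystem or driver is in use.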

## Optional Enhancements

**Rich library** — professional progress bars and color styling:
```bash
pip install "diskcomp[rich]"
```

**smartmontools** — enables SMART data display:
- **macOS:** `brew install smartmontools`
- **Linux:** `apt-get install smartmontools` or `pacman -S smartmontools`
- **Windows:** `wmic logicaldisk` (built-in, no install needed)

Without these, diskcomp uses ANSI progress bars and skips SMART data.
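
The fallback for an optional dependency can be sketched as below. Both helper names are hypothetical, and the plain-text bar only stands in for whatever diskcomp renders when `rich` is missing.

```python
import importlib.util


def rich_available():
    """True when the optional 'rich' package can be imported."""
    return importlib.util.find_spec("rich") is not None


def plain_bar(done, total, width=20):
    """Render a text progress bar such as '[#####.....] 50%'."""
    fraction = done / total if total else 1.0
    filled = int(width * fraction)
    return f"[{'#' * filled}{'.' * (width - filled)}] {fraction:.0%}"


print(plain_bar(5, 10, width=10))
# [#####.....] 50%
```

A caller would check `rich_available()` once at startup and choose the renderer accordingly, so the core tool keeps working with zero dependencies.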

## Cross-Platform Testing

CI validates diskcomp on **9 combinations**:
- **macOS** (latest) × Python 3.8, 3.10, 3.12
- **Linux** (Ubuntu latest) × Python 3.8, 3.10, 3.12
- **Windows** (latest) × Python 3.8, 3.10, 3.12

All tests pass and the single-file build is verified on each combination.

## Development

**Run tests locally:**
```bash
python -m pytest tests/
```

**Generate single-file version:**
```bash
python build_single.py
python diskcomp.py --help
```

## License

MIT — See LICENSE file for details.
