Metadata-Version: 2.4
Name: fishlib
Version: 0.5.0
Summary: A Python library for parsing, standardizing, and comparing seafood product descriptions in foodservice
Home-page: https://github.com/KTG0409/fishlib
Author: Karen Morton
Author-email: kmorton319@gmail.com
Keywords: seafood,fish,foodservice,parsing,standardization,pricing,comparison
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Office/Business
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=6.0.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0.0; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-python
Dynamic: summary

# fishlib 🐟

A Python library for parsing, standardizing, and comparing seafood product descriptions in the food industry.

**The Problem:** Seafood product descriptions are messy. The same product can be described a hundred different ways. Comparing prices across distributors, suppliers, or market data requires deep domain knowledge to know if two items are actually comparable.

**The Solution:** `fishlib` parses item descriptions into structured attributes, standardizes them to common codes, and enables apples-to-apples comparisons—so you don't need to be a fish expert to work with seafood data.

## Installation

```bash
pip install fishlib
```

## Quick Start

```python
import fishlib

# Parse any item description
item = fishlib.parse("SALMON FIL ATL SKON DTRM 6OZ IVP")

print(item)
# {
#     'species': 'Atlantic Salmon',
#     'form': 'FIL',
#     'skin': 'SKON',
#     'bone': 'BNLS',
#     'trim': 'D',
#     'size': '6OZ',
#     'size_bucket': '6-8OZ',
#     'pack': 'IVP',
#     'storage': 'FRZ'
# }

# Get a comparison key for matching
key = fishlib.comparison_key(item)
print(key)
# "SALMON|ATLANTIC|FIL|SKON|BNLS|D|6-8OZ"

# Check if two items are comparable
distributor_item = "SALMON PORTION ATL BNLS SKLS 6 OZ CENTER CUT"
circana_item = "Portico Salmon Fillet 6 oz Boneless / Skinless"

match = fishlib.match(distributor_item, circana_item)
print(match)
# {
#     'is_match': True,
#     'confidence': 0.85,
#     'differences': ['form: PORTION vs FIL'],
#     'recommendation': 'Comparable with caution - form differs'
# }
```
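
Once every item carries a comparison key, an apples-to-apples price comparison reduces to grouping rows on that string. The sketch below is a minimal illustration and does not call `fishlib`. The function name `group_prices_by_key`, the source labels, the prices, and the second key are all made up for the example; only the key format follows the output shown above.

```python
from collections import defaultdict


def group_prices_by_key(rows):
    """Group (comparison_key, source, price) rows by comparison key.

    Hypothetical helper, not part of fishlib: it only shows what can be
    done with the key strings once they have been computed.
    """
    groups = defaultdict(dict)
    for key, source, price in rows:
        groups[key][source] = price
    return dict(groups)


# Illustrative rows only: the prices and source names are invented.
rows = [
    ("SALMON|ATLANTIC|FIL|SKON|BNLS|D|6-8OZ", "distributor_a", 9.10),
    ("SALMON|ATLANTIC|FIL|SKON|BNLS|D|6-8OZ", "distributor_b", 8.75),
    ("SALMON|ATLANTIC|FIL|SKLS|BNLS|E|6-8OZ", "distributor_a", 9.95),
]

grouped = group_prices_by_key(rows)
for key, prices in grouped.items():
    # Only keys quoted by more than one source can be compared directly.
    if len(prices) > 1:
        print(key, prices)
```

Items that differ in any keyed attribute (here, Trim D skin-on versus Trim E skinless) land in separate groups, so they are never compared by accident.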

## Features

### Parse Item Descriptions
Turn messy text into structured data:

```python
fishlib.parse("SALMON SOCKEYE FIL WILD ALASKA SKON 8OZ IQF")
# Returns structured dict with all attributes
```

### Origin Tracking (v0.4.0)
Separate harvest and processing countries for accurate sourcing:

```python
fishlib.parse("POLLOCK FIL WILD ALASKA PROCESSED IN CHINA 6OZ")
# Returns:
# {
#     'origin_harvest': 'USA',
#     'origin_processed': 'CHN',
#     'freeze_cycle': 'TWICE',
#     ...
# }
```

### Freeze Cycle Inference (v0.4.0)
Automatically determines single-frozen vs twice-frozen:

- Finfish + Asian processing country → **TWICE** (twice-frozen)
- Finfish domestic processing → **SINGLE** (single-frozen)
- Crustaceans/mollusks → exempt
- Freeze cycle mismatch = **hard block** on comparability
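
The rules above can be sketched as a small decision function. This is an illustration of the stated rules, not the library's implementation: the function names, the `group` labels, the country subset, and the choice to represent "exempt" as `None` are all assumptions.

```python
# Hypothetical subset of processing countries; the library's real list is not shown here.
ASIAN_PROCESSING = {"CHN", "VNM", "THA"}
EXEMPT_GROUPS = {"crustacean", "mollusk"}


def infer_freeze_cycle(group, origin_processed, domestic="USA"):
    """Return 'TWICE', 'SINGLE', or None (exempt or not inferable)."""
    if group in EXEMPT_GROUPS:
        return None  # crustaceans and mollusks are exempt
    if group == "finfish":
        if origin_processed in ASIAN_PROCESSING:
            return "TWICE"
        if origin_processed == domestic:
            return "SINGLE"
    return None  # no rule applies, so make no claim


def freeze_cycle_blocks(cycle_a, cycle_b):
    """Hard block only when both cycles are known and they differ."""
    return cycle_a is not None and cycle_b is not None and cycle_a != cycle_b


print(infer_freeze_cycle("finfish", "CHN"))    # TWICE
print(infer_freeze_cycle("finfish", "USA"))    # SINGLE
print(freeze_cycle_blocks("TWICE", "SINGLE"))  # True
```

Returning `None` for exempt or unknown cases keeps the hard block from firing on items where the rule does not apply.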

### Size Bucket Matching (v0.4.2)
Exact sizes and ranges map to competitive buckets for PMI comparisons:

```python
fishlib.parse("POLLOCK FIL 2OZ")['size_bucket']    # '2-3OZ'
fishlib.parse("POLLOCK FIL 2-3OZ")['size_bucket']  # '2-3OZ'

# Now comparable!
fishlib.is_comparable("POLLOCK FIL 2OZ", "POLLOCK FIL 2-3OZ")  # True
```
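
One way to picture the bucketing is a lookup against a table of ounce ranges. The sketch below is an assumption-laden illustration, not `fishlib` code: the helper name `size_bucket`, the regular expression, and the bucket table are hypothetical, and only the `2-3OZ` and `6-8OZ` buckets actually appear in this README.

```python
import re

# Hypothetical bucket edges in ounces; (4, 5) is invented for illustration.
BUCKETS = [(2, 3), (4, 5), (6, 8)]

# Matches "2OZ", "6 OZ", or a range such as "2-3OZ".
_SIZE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(?:-\s*(\d+(?:\.\d+)?))?\s*OZ", re.IGNORECASE
)


def size_bucket(text):
    """Return the bucket label that contains the stated size, or None."""
    m = _SIZE_RE.search(text)
    if m is None:
        return None
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    for lo, hi in BUCKETS:
        # The whole stated size or range must fit inside the bucket.
        if lo <= low and high <= hi:
            return f"{lo}-{hi}OZ"
    return None


print(size_bucket("POLLOCK FIL 2OZ"))    # 2-3OZ
print(size_bucket("POLLOCK FIL 2-3OZ"))  # 2-3OZ
```

Because both descriptions resolve to the same label, an exact size and a range can be compared on the bucket instead of the raw size string.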

### Enhanced Attribute Extraction (v0.2.0)

```python
# Crab meat grade detection
fishlib.parse("CRAB MEAT JUMBO LUMP PASTEURIZED")
# Returns: {'meat_grade': 'JUMBO_LUMP', ...}

# Preparation status (raw, cooked, smoked, cured)
fishlib.parse("SHRIMP 16/20 P&D COOKED")
# Returns: {'preparation': 'COOKED', ...}

# Value-added detection (breaded, stuffed, marinated, etc.)
fishlib.parse("COD FIL PANKO CRUSTED 4OZ")
# Returns: {'value_added': 'BREADED', ...}
```

### Standardize Codes
Consistent codes across any data source:

| Attribute | Codes |
|-----------|-------|
| **Form** | FIL (Fillet), PRTN (Portion), LOIN, WHL (Whole), STEAK, etc. |
| **Skin** | SKON (Skin On), SKLS (Skinless), SKOFF (Skin Off) |
| **Bone** | BNLS (Boneless), BIN (Bone In), PBO (Pin Bone Out) |
| **Trim** | A, B, C, D, E (see Trim Guide) |
| **Pack** | IVP, IQF, CVP, BULK |
| **Storage** | FRZ (Frozen), FRSH (Fresh), RFRSH (Refreshed) |
| **Meat Grade** | JUMBO_LUMP, LUMP, BACKFIN, SPECIAL, CLAW |
| **Preparation** | RAW, COOKED, SMOKED, CURED |
| **Value-Added** | BREADED, STUFFED, MARINATED, GLAZED, BLACKENED, FORMED |

### Species Support
Built-in knowledge for 46 seafood categories and 90+ species:

- **Salmon**: Atlantic, King/Chinook, Sockeye, Coho, Keta/Chum, Pink
- **Crab**: King, Snow, Dungeness, Blue, Stone, Jonah, Soft Shell
- **Lobster**: Maine, Canadian, Warm Water
- **Shrimp**: White, Pink, Brown, Tiger, Rock, Royal Red
- **Groundfish**: Cod (Atlantic, Pacific, Black/Sablefish, Ling), Haddock, Pollock, Rockfish
- **Flatfish**: Flounder, Halibut, Sole (Dover, Petrale, Lemon, Rex, Gray)
- **Shellfish**: Scallops (Sea, Bay, Calico), Clams, Oysters, Mussels
- **Snapper**: Red, Yellowtail, Vermilion, Lane, Mangrove, Silk
- **Grouper**: Red, Black, Gag, Yellowedge, Scamp
- **Catfish**: US Farm-Raised (Domestic), Channel (Imported), Blue
- **Other Finfish**: Branzino, Sea Bass (Chilean, Black, Striped), Trout, Barramundi, Wahoo, Monkfish, Mahi, Swordfish, Tuna, Anchovy, Whiting, Perch, Sardine, Herring, Mackerel, Hake, Orange Roughy, Corvina, Cobia, Hamachi, Pike
- **Other Shellfish**: Crawfish, Calamari, Octopus, Langostino, Conch

### Reference Data
Access industry knowledge:

```python
# Salmon trim levels
fishlib.reference.trim_levels('salmon')
# Returns definitions for Trim A-E with skin status

# Species price tiers (relative positioning, not dollar amounts)
fishlib.species.get_price_tier('salmon', 'king')
# Returns: 'ultra-premium'

# Cut style definitions
fishlib.reference.cut_style('center_cut')
# Returns: {'description': 'Portions from center of fish only...', 'premium': True}
```

### Match & Compare
Find comparable items across data sources:

```python
# Simple match
fishlib.is_comparable(item1, item2)  # Returns True/False

# Detailed match with confidence score
fishlib.match(item1, item2)  # Returns match details

# Find best matches in a list
fishlib.find_matches(target_item, list_of_items, threshold=0.8)
```
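
The `differences` list in a match result reads as an attribute-by-attribute comparison of two parsed items. A minimal sketch of that idea follows. It is not the library's matching logic: the helper name `attribute_differences`, the attribute list, and the hand-written input dicts are assumptions, and confidence scoring is deliberately left out.

```python
def attribute_differences(
    parsed_a,
    parsed_b,
    attributes=("species", "form", "skin", "bone", "trim", "size_bucket"),
):
    """List attributes that both items define but disagree on."""
    diffs = []
    for attr in attributes:
        a, b = parsed_a.get(attr), parsed_b.get(attr)
        # Skip attributes missing from either side: absence is not a conflict.
        if a is not None and b is not None and a != b:
            diffs.append(f"{attr}: {a} vs {b}")
    return diffs


# Hand-written dicts for illustration, not actual parser output.
item_a = {"species": "Atlantic Salmon", "form": "PORTION", "skin": "SKLS",
          "bone": "BNLS", "size_bucket": "6-8OZ"}
item_b = {"species": "Atlantic Salmon", "form": "FIL", "skin": "SKLS",
          "bone": "BNLS", "size_bucket": "6-8OZ"}

print(attribute_differences(item_a, item_b))  # ['form: PORTION vs FIL']
```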

## Trim Guide (Salmon)

| Trim | Description | Skin |
|------|-------------|------|
| **A** | Backbone off, bellybone off | ON |
| **B** | + Backfin off, collarbone off, belly fat/fins off | ON |
| **C** | + Pin bone out | ON |
| **D** | + Back trimmed, tailpiece off, belly membrane off, nape trimmed | ON |
| **E** | Everything in D + skin removed | OFF |

**Key insight:** Trim A-D are all skin ON. Only Trim E is skin OFF.

**Foodservice standard:** Trim D (skin on) and Trim E (skin off).
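
The trim-to-skin rule in the table lends itself to a simple consistency check on parsed data. The sketch below is illustrative only: the helper name `trim_matches_skin` is hypothetical, and treating both `SKLS` and `SKOFF` as "skin off" is an assumption made for the example.

```python
# Skin status per trim level, copied from the Trim Guide table.
TRIM_SKIN = {"A": "ON", "B": "ON", "C": "ON", "D": "ON", "E": "OFF"}

# Assumption: both skinless codes count as skin OFF.
SKIN_CODE_STATUS = {"SKON": "ON", "SKLS": "OFF", "SKOFF": "OFF"}


def trim_matches_skin(trim, skin_code):
    """Return True/False if the pair can be checked, None if either is unknown."""
    expected = TRIM_SKIN.get(trim)
    actual = SKIN_CODE_STATUS.get(skin_code)
    if expected is None or actual is None:
        return None
    return expected == actual


print(trim_matches_skin("D", "SKON"))  # True
print(trim_matches_skin("E", "SKON"))  # False
```

A `False` result flags a description worth a second look, such as a Trim E item labeled skin-on.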

## Cut Styles (Portions)

| Style | Description | Value |
|-------|-------------|-------|
| **Center Cut** | From center of fish only, no tails/nape | Premium |
| **Bias** | Cut at angle for better presentation | Premium |
| **Block** | Straight cuts end-to-end, includes tails | Mid |
| **Random** | Mixed pieces, various shapes | Value |

## Why This Exists

In food distribution, comparing prices requires knowing if products are truly comparable. A "6oz salmon fillet" from two different sources might be:

- Center-cut bias portion (premium)
- Block-cut with tail pieces (commodity)

Without the right attributes, price comparisons are meaningless. `fishlib` encodes the domain knowledge needed to make accurate comparisons—so you don't need 20 years of fish experience to work with seafood data.

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for full version history.

### Latest: v0.4.3
- **Rockfish**: Own category (Pacific Rockfish / Sebastes) — no longer misclassified as Striped Bass
- **Striped Bass**: Reversed word order ("BASS STRIPED") now parses correctly
- **Catfish**: Split into Domestic (US farm-raised), Channel (imported), and Blue subspecies
- **Scallop**: Fixed false-match on standalone "SEA" alias

### v0.4.2
- **Size buckets**: `2OZ` and `2-3OZ` now match for competitive comparisons

### v0.4.0
- **Origin split**: Separate harvest vs processing country tracking
- **Freeze cycle**: Automatic single-frozen vs twice-frozen inference

### v0.3.0
- **14 new species**: Anchovy, Whiting, Perch, Sardine, Herring, Mackerel, Hake, Orange Roughy, Corvina, Cobia, Langostino, Conch, Hamachi, Pike

### v0.2.0
- **New attributes**: meat_grade, preparation, value_added
- **19 new species**: Snapper, Grouper, Branzino, Sea Bass, Trout, Barramundi, Wahoo, Monkfish, Crawfish

### v0.1.0
- Initial release

## Contributing

Contributions welcome! Areas of interest:

- Additional species and regional variants
- International market terminology
- Packaging and processing codes

## Author

**Karen Morton** — Seafood industry professional with 20+ years of experience in category management and procurement.

`fishlib` grew out of years of managing seafood categories and the realization that this knowledge should be accessible to everyone, not trapped in experts' heads.

## License

MIT License — Use it, modify it, share it. Just make seafood data better for everyone.
