Metadata-Version: 2.4
Name: gandula
Version: 1.0.0
Summary: Gandula grabs football data and puts it back into play.
Project-URL: Homepage, https://salabufmg.github.io
Project-URL: Repository, https://github.com/salabufmg/gandula
Project-URL: Documentation, https://github.com/salabufmg/gandula
Author-email: Thiago Costa Porto <thiagocostaporto@gmail.com>, Ricardo Furbino <ricardofurbino@gmail.com>, Luiza Chagas <luiza.chagas@dcc.ufmg.br>, Leo Sá Martins <leomartins@dcc.ufmg.br>, João Lucas <leomartins@dcc.ufmg.br>
Maintainer-email: Thiago Costa Porto <thiagocostaporto@gmail.com>, Ricardo Furbino <ricardofurbino@gmail.com>, Bruno Sá Martins <bruno290103@gmail.com>
License: MIT License
        
        Copyright (c) 2024 Thiago Costa Porto
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: football,pff,sports analytics
Requires-Python: >=3.10
Requires-Dist: imageio[ffmpeg]>=2.36.0
Requires-Dist: ipython>=8.26.0
Requires-Dist: ipywidgets>=8.1.5
Requires-Dist: jinja2>=3.1.4
Requires-Dist: jupyter>=1.0.0
Requires-Dist: matplotlib>=3.9.2
Requires-Dist: mplsoccer>=1.4.0
Requires-Dist: orjson>=3.10.7
Requires-Dist: pandas>=2.2.2
Requires-Dist: pre-commit>=4.0.1
Requires-Dist: pydantic>=2.8.2
Requires-Dist: pygifsicle>=1.1.0
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: ruff>=0.6.2
Requires-Dist: structlog>=24.4.0
Requires-Dist: tqdm>=4.66.5
Provides-Extra: dev
Requires-Dist: pytest>=9.0.0; extra == 'dev'
Provides-Extra: pitch-control
Requires-Dist: torch>=2.0.0; extra == 'pitch-control'
Provides-Extra: s3
Requires-Dist: boto3>=1.34.0; extra == 's3'
Description-Content-Type: text/markdown

<div style="text-align: center; padding-bottom: 8px">
    <h1>gandula</h1>
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5c/Ball_boy.jpg/1200px-Ball_boy.jpg" alt="gandula" />
</div>

---

`gandula` is a Python library developed by the Sports Analytics Lab at the Federal University of Minas Gerais (UFMG) for working with PFF football tracking and event data.

Gandula is the Brazilian Portuguese word for ball boy. It dates from the 1930s and derives from "gandulo", an archaic Portuguese word for a slacker or beggar. At the time, the word referred to idle boys who spent their days watching football at the pitches in Rio and helped out by fetching balls kicked out of play. In 1939, Clube de Regatas Vasco da Gama hired the Argentinian striker Bernardo Gandulla, who was known for returning the ball as a gesture of fair play, and the term `gandula` then spread across the country. In our `gandula`, the ball is the data, and the data scientists and analysts are the stars of the game.

---

## Data Sources

gandula supports two data sources:

- **S3 tracking data**: Stream or download tracking frames (player and ball positions at 30 fps) directly from PFF's S3 bucket. Requires AWS credentials (`PFF_AWS_ACCESS_KEY_ID` / `PFF_AWS_SECRET_ACCESS_KEY`). Each match includes tracking data (`.jsonl.bz2`), metadata (`metadata.json`), and rosters (`rosters.json`).

- **Local event data (Gradient v2.6)**: Load match events from local JSON files in the Gradient v2.6 format (`{game_id}.json`). Each file contains possession events with embedded tracking snapshots, video URLs, grades, and more.

---

## Quick Start

### Installation (development)

```bash
git clone git@github.com:SALabUFMG/gandula.git
cd gandula
```

Set up the environment with [uv](https://docs.astral.sh/uv/):

```bash
uv sync
```

For S3 access:

```bash
uv sync --extra s3
```

For pitch control (requires PyTorch):

```bash
uv sync --extra pitch-control
```

### Setup

Create a `.env` file in the project root with your AWS credentials:

```bash
PFF_AWS_ACCESS_KEY_ID='your_access_key'
PFF_AWS_SECRET_ACCESS_KEY='your_secret_key'
```

Then load them at the top of your scripts or notebooks:

```python
from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
```

All gandula S3 functions will pick up the credentials automatically.
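
Before calling any S3 function, it can help to confirm that both variables actually reached the process environment. This is a minimal sketch using only the standard library; the helper `missing_pff_credentials` is hypothetical and not part of gandula, and it only checks that the variables are set, not that the keys are valid.

```python
import os

# Variable names taken from the `.env` example above.
REQUIRED_VARS = ('PFF_AWS_ACCESS_KEY_ID', 'PFF_AWS_SECRET_ACCESS_KEY')


def missing_pff_credentials(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_pff_credentials()
if missing:
    print('Missing credentials: ' + ', '.join(missing))
```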

---

## Usage

The best way to get started is the **[walkthrough notebook](notebooks/walkthrough.ipynb)**, which covers every major feature end-to-end:

1. Loading event data from Gradient v2.6 JSON files
2. Exploring and filtering events by type
3. Converting events to DataFrames
4. Loading tracking data from S3 (frames, metadata, rosters)
5. Visualizing tracking frames (single frame & animated sequences)
6. Converting frames to DataFrames
7. Joining events with tracking data
8. Exporting frames as GIF, PNG, and MP4
9. Feature engineering (player speed, ball speed)
10. Pitch coordinate transformation
11. Pitch control computation & visualization
12. Accessing video URLs
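
To give a feel for the feature engineering in step 9, here is a rough standalone sketch of how a speed can be derived from consecutive positions sampled at 30 fps. It is not gandula's implementation: the function name `speeds` is hypothetical, the input is a plain list of `(x, y)` tuples rather than gandula frame objects, and it assumes consecutive frames with no gaps. The result is in coordinate units per second, so it is metres per second only if the coordinates are in metres.

```python
import math

FPS = 30  # tracking frame rate stated under "Data Sources"


def speeds(positions: list[tuple[float, float]], fps: float = FPS) -> list[float]:
    """Speed between each pair of consecutive frames, in units per second."""
    return [
        # Distance covered in one frame, scaled by frames per second.
        math.hypot(x1 - x0, y1 - y0) * fps
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    ]
```

Frame-to-frame differences on real tracking data tend to be noisy, so some smoothing is usually applied before such values are used; see the walkthrough notebook for how gandula itself handles it.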

### Quick examples

```python
import gandula

# --- Event data ---
events = gandula.get_events('41177.json')
df = gandula.gradient_events_to_dataframe(events)

# --- S3 tracking data ---
matches = gandula.list_s3_matches(competition_id=1, season='2025-2026')
frames = gandula.get_s3_frames(matches[0])

# --- Visualize ---
gandula.view(frames[0])

# --- Export ---
gandula.export(frames[100:200], fmt='gif', filename='play')

# --- Pitch control ---
from gandula.utils import compute_pitch_control_from_frames

result = compute_pitch_control_from_frames(
    frames,
    attacking_team='home',
    start_frame=frames[100].frame_id,
    end_frame=frames[400].frame_id,
    period=1,
)
gandula.view(result, frame_index=0)
```

### More notebooks

| Notebook | What it shows |
|----------|---------------|
| `pff-load-from-json.ipynb` | Load and explore Gradient v2.6 event data |
| `pff-data-transformation.ipynb` | Transform events to DataFrames, filter, group |
| `pff-search.ipynb` | Search events by type, extract video URLs |
| `pff-tracking.ipynb` | Load, visualize, and export S3 tracking data |
| `pff-defensive-line-height.ipynb` | Defensive line metric from tracking data |
| `pff-events-withing-tracking-to-pandas.ipynb` | Join events with tracking data in pandas |

---

## Documentation

- [API Reference](docs/api-reference.md)
- [Architecture](docs/architecture.md)
- [Usage Guide](docs/usage-guide.md)
- [Data Models](docs/data-models.md)

---

## Development

Install dev dependencies and pre-commit hooks:

```bash
uv sync --extra dev
pre-commit install
```

Run tests:

```bash
uv run pytest tests/
```

---

## License & Copyright

`gandula` is distributed under the MIT License; see the `LICENSE` file.

The main image is "Ballkid at soccer, China" by [Micah Sittig](https://www.flickr.com/photos/35468134321@N01), licensed under CC BY 2.0.
