silly-kicks
Copyright (c) 2019 KU Leuven Machine Learning Research Group (Tom Decroos, Pieter Robberechts)
Copyright (c) 2026 Karsten S. Nielsen

This product is a maintained fork of socceraction
(https://github.com/ML-KULeuven/socceraction). Major architectural changes
since 1.0.0 are documented in CHANGELOG.md.

Third-Party Libraries
---------------------

kloppy --- standardizing soccer tracking/event data (BSD-3-Clause License).
Copyright (c) kloppy contributors.
See: https://github.com/PySport/kloppy

pandas, numpy, scikit-learn --- core dependencies (BSD / standard licenses).

Optional gradient-boosting backends (xgboost, lightgbm, catboost) --- listed
in pyproject.toml; each retains its upstream license.

Mathematical / Methodological References
----------------------------------------

The SPADL action representation (silly_kicks/spadl/) implements the framework
described in: Decroos, T., Van Haaren, J., & Davis, J. (2018). "SPADL: A
Common Framework for Action Description in Soccer." Workshop on Machine
Learning and Data Mining for Sports Analytics (ECML-PKDD).

The VAEP action valuation framework (silly_kicks/vaep/) implements:
Decroos, T., Bransen, L., Van Haaren, J., & Davis, J. (2019). "Actions
Speak Louder Than Goals: Valuing Player Actions in Soccer." Proc. KDD '19.
The HybridVAEP variant (silly_kicks/vaep/hybrid.py) removes result leakage and
was introduced in this fork; it has no separate academic citation.

The Atomic-SPADL representation and Atomic-VAEP framework
(silly_kicks/atomic/) implement: Decroos, T., Robberechts, P., & Davis, J.
(2020). "Introducing Atomic-SPADL: A New Way to Represent Event Stream Data."
DTAI Sports Analytics Blog.

The Expected Threat (xT) grid (silly_kicks/xthreat.py) seeds from:
Singh, K. (2018). "Introducing Expected Threat (xT)." karun.in/blog/expected-threat
The grid is recomputable from event data; the seed values are reference-only.

The tracking namespace primitive layer (silly_kicks/tracking/, PR-S19,
ADR-004) implements ingestion + linkage primitives across Gradient Sports
(formerly PFF), Sportec, Metrica, and SkillCorner. It introduces no new
academic methodology beyond the canonical ADR.

The four tracking-aware action-context features in
silly_kicks/tracking/features.py and silly_kicks/atomic/tracking/features.py
(PR-S20, ADR-005) implement methodologies described in:

- Lucey, P., Bialkowski, A., Monfort, M., Carr, P., & Matthews, I. (2014).
  "Quality vs Quantity: Improved Shot Prediction in Soccer using Strategic
  Features from Spatiotemporal Data." MIT Sloan Sports Analytics Conference.
  (canonical "defenders in shot triangle" feature; nearest-defender-distance
  for shots)

- Anzer, G., & Bauer, P. (2021). "A goal scoring probability model for shots
  based on synchronized positional and event data in football and futsal."
  Frontiers in Sports and Active Living, 3, 624475.
  (player_speed, distance-to-defender, and defending-GK-position as xG features)

- Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics
  Conference.
  (zone-based defender intensity in pitch-control framework)

- Spearman, W., Basye, A., Dick, G., Hotovy, R., & Pop, P. (2017).
  "Physics-Based Modeling of Pass Probabilities in Soccer." MIT Sloan Sports
  Analytics Conference.
  (kinematic time-to-intercept pitch control model; acceleration-based TTI +
  logistic influence + ratio aggregation. Implemented in
  silly_kicks/tracking/pitch_control/_spearman.py (PR-S31, TF-7).)

- Fernandez, J., & Bornn, L. (2018). "Wide Open Spaces: A statistical technique
  for measuring space creation in professional soccer." MIT Sloan Sports
  Analytics Conference.
  (bivariate-normal pitch control model; velocity-scaled anisotropic Gaussian
  influence fields + sigmoid team aggregation. Implemented in
  silly_kicks/tracking/pitch_control/_fernandez_bornn.py (PR-S31, TF-7).)

- Shaw, L., & Sudarshan, M. (2020). "A Framework for Tactical Analysis and
  Individual Offensive Production Assessment in Soccer Using Markov Models."
  (xT and pitch control integration; ball-travel-time filter concept.
  Informs the ball_position conditioning in compute_pitch_control dispatch.)

- Power, P., Ruiz, H., Wei, X., & Lucey, P. (2017). "Not all passes are
  created equal: Objectively measuring the risk and reward of passes in
  soccer from tracking data." KDD '17.
  (receiver-zone risk/reward modelling)

- Pollard, R., & Reep, C. (1997). "Measuring the effectiveness of playing
  strategies at soccer." Journal of the Royal Statistical Society Series D,
  46(4), 541-550.
  (early shot-quality / pressure-from-defenders concept)

- Savitzky, A., & Golay, M. J. E. (1964). "Smoothing and Differentiation of
  Data by Simplified Least Squares Procedures." Analytical Chemistry, 36(8),
  1627-1639.
  (Savitzky-Golay polynomial smoothing + analytical derivative -- used for
  position smoothing and velocity derivation in
  silly_kicks.tracking.preprocess (PR-S24, ADR-004 invariants 6/7).
  PR-S24 also uses Anzer & Bauer (2021) above for the
  pre_shot_gk_angle_to_shot_trajectory and pre_shot_gk_angle_off_goal_line
  features in silly_kicks/tracking/features.py and the atomic mirror.)

- Andrienko, G., Andrienko, N., Budziak, G., Dykes, J., Fuchs, G.,
  von Landesberger, T., & Weber, H. (2017). "Visual analysis of pressure in
  football." Data Mining and Knowledge Discovery, 31, 1793-1839.
  Used by: silly_kicks.tracking.features.pressure_on_actor (method="andrienko_oval").
  Numerical defaults from section 3.1: D_front=9 m, D_back=3 m, q=1.75.

- Link, D., Lang, S., & Seidenschwarz, P. (2016). "Real Time Quantification of
  Dangerousity in Football Using Spatiotemporal Tracking Data." PLOS ONE,
  11(12): e0168768.
  Used by: silly_kicks.tracking.features.pressure_on_actor (method="link_zones").
  Zone radii (HOZ=4, LZ=3, HZ=2 m) and angular boundaries (45 deg, 90 deg) from
  Figure 2. The paper additionally labels a 1 m "High Pressure Zone (HPZ)"
  inner arc with prose-described "constant high pressure", but Eq (3) of
  the paper does not special-case it -- silly-kicks honors Eq (3) as the
  formal specification (Plan A: equation-faithful, no discontinuity).
  The saturation constant k3 is not published in the paper; the silly-kicks
  default k3=1.0 is an engineering choice exposed as a kwarg. Calibration is
  deferred post-release to an Optuna sweep (silly-kicks TODO TF-24).

- Bekkers, J. (2025). "Pressing Intensity: An Intuitive Measure for Pressing
  in Soccer." arXiv:2501.04712.
  Used by: silly_kicks.tracking.features.pressure_on_actor (method="bekkers_pi").
  Time-to-intercept formula extends Spearman 2017 / Shaw / Pleuler with
  velocity-direction penalty. Defaults from paper + canonical implementation
  (UnravelSports/unravelsports, BSD-3-Clause).

- Herold, M., Goes, F., Nopp, S., Bauer, P., Thompson, C., & Meyer, T.
  (2022). "Machine learning-based analysis of match performance indicators
  for classifying match outcomes in professional football." arXiv:2511.06191.
  Used by: silly_kicks.tracking._defensive_line.compute_defensive_line (TF-14).
  Defensive-line height, compactness, and lateral spread as match-outcome
  discriminators. Default N=4.

- Forcher, L., Forcher, L., Altmann, S., Stein, T., Biermann, H., Dutt, M.,
  & Memmert, D. (2022). "How to defend against the pass into the box? Using
  explainable machine learning to identify defensive strategies." KDD Workshop
  on Data Science for Sports Analytics. arXiv:2511.00121.
  Used by: silly_kicks.tracking._defensive_line (TF-14).
  Back-line shape as defensive feature for pass-into-box models.

- FIFA (2022). "Enhanced Football Intelligence: Physical Report --
  FIFA World Cup Qatar 2022."
  Used by: silly_kicks.tracking._defensive_line (TF-14).
  Practitioner documentation of 4-back defensive-line metrics at tournament
  level. Validates the N=4 default and line-height / compactness operationalisation.
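
As an illustration of the Savitzky-Golay entry above, the sketch below fits a
quadratic by least squares on a symmetric window and reads the smoothed
position and the analytical first derivative (velocity) off the fitted
coefficients at the window centre. The function name savgol_quadratic, the
7-sample window, and the 25 Hz frame rate are assumptions made for this
example; they are not taken from silly_kicks.tracking.preprocess.

```python
DT = 1.0 / 25.0  # assumed frame interval in seconds (25 Hz tracking feed)


def savgol_quadratic(y, half_window, dt):
    """Quadratic Savitzky-Golay fit on a symmetric window.

    Returns (smoothed, velocity). Samples without a full window on both
    sides are left as None rather than extrapolated.
    """
    ks = range(-half_window, half_window + 1)
    n = 2 * half_window + 1
    s2 = sum(k * k for k in ks)   # sum of k^2 over the window
    s4 = sum(k ** 4 for k in ks)  # sum of k^4 over the window
    denom = n * s4 - s2 * s2
    smoothed = [None] * len(y)
    velocity = [None] * len(y)
    for i in range(half_window, len(y) - half_window):
        window = [y[i + k] for k in ks]
        sum_y = sum(window)
        sum_ky = sum(k * v for k, v in zip(ks, window))
        sum_k2y = sum(k * k * v for k, v in zip(ks, window))
        # Constant term of the fitted quadratic = smoothed value at centre.
        smoothed[i] = (s4 * sum_y - s2 * sum_k2y) / denom
        # Linear term of the fitted quadratic = derivative at centre.
        velocity[i] = sum_ky / (s2 * dt)
    return smoothed, velocity


# A player moving at a constant 3 m/s along x, sampled at 25 Hz.
xs = [10.0 + 3.0 * (i * DT) for i in range(20)]
smoothed, velocity = savgol_quadratic(xs, half_window=3, dt=DT)
```

On this noise-free linear input the fit reproduces the input positions and
the 3 m/s slope up to floating-point rounding.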

The off-ball-runs and line-break detection features in
silly_kicks/tracking/_off_ball_runs.py (PR-S30, TF-4) are novel
implementations inspired by:

- Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics
  Conference.
  (OBSO framework -- Off-Ball Scoring Opportunity; off-ball-runs and
  line-break concepts.)

- Power, P., Ruiz, H., Wei, X., & Lucey, P. (2017). "Not all passes are
  created equal: Objectively measuring the risk and reward of passes in
  soccer from tracking data." KDD '17.
  (Contextual passing risk/reward; qualitatively mentions line-breaking
  passes inside formation clustering.)

- Bauer, P., & Anzer, G. (2021). "Data-driven detection of counterpressing in
  professional football." Data Mining and Knowledge Discovery, 35(5), 2009-2049.
  (Section 3 describes a velocity-toward-ball heuristic for carrier
  identification, used as input to their counterpressing classifier.
  Adapted for infer_ball_carrier primitive in silly_kicks.tracking.)

- Vidal-Codina, F., Evans, N., El Fakir, B., & Billingham, J. (2022).
  "Automatic Event Detection in Football Using Tracking Data."
  Sports Engineering, 25, 18.
  (Inertia/hysteresis recommendation for ball-possession algorithms;
  motivates the gamma hysteresis parameter in infer_ball_carrier.)

The DAS adapter (silly_kicks/tracking/_das.py) wraps the accessible-space
package implementing: Bischofberger, J., & Baca, A. (2026). "Dangerous
accessible space: a unified model of space and value in team sports."
Journal of Big Data, 13, 76. Package: accessible-space on PyPI (MIT).

The VAEP windowing variants (silly_kicks/vaep/labels.py) implement design
choices from the DTAI Sports blog series: Cascioli, Robberechts, Van Tente
& Davis (2024-2025). "Three Key Design Decisions for Possession State Value
Models: An Experimental Analysis." Parts 1-3. KU Leuven / DTAI Sports.

The team shape envelope features in silly_kicks/tracking/_team_shape.py
(PR-S33, TF-31) implement methodologies described in:

- Clemente, F. M., Couceiro, M. S., Martins, F. M. L., & Mendes, R. (2013).
  "Measuring Tactical Behaviour Using Technological Metrics: Case Study of a
  Football Game." International Journal of Sports Science & Coaching, 8(4).
  (stretch index = mean Euclidean distance from team centroid; canonical
  per-team spatial descriptors)

- Zhang, G., Kempe, M., McRobert, A., Folgado, H., & Olthof, S. B. H. (2025).
  "Navigating team tactical analysis in football: An analytical pipeline
  leveraging player tracking technology." International Journal of Sports
  Science & Coaching.
  (centroid, convex hull area, team length, team width, stretch index as
  canonical team shape metrics)
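
The team shape descriptors named above can be sketched in a few lines. This
is a minimal illustration, assuming player positions are (x, y) tuples in
metres with x along the pitch length; the function names are hypothetical
and are not the silly_kicks/tracking/_team_shape.py API. Convex hull area is
omitted for brevity.

```python
import math


def centroid(positions):
    """Mean (x, y) of the given player positions."""
    xs, ys = zip(*positions)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def stretch_index(positions):
    """Mean Euclidean distance of players from the team centroid."""
    cx, cy = centroid(positions)
    return sum(math.hypot(x - cx, y - cy) for x, y in positions) / len(positions)


def team_length_width(positions):
    """Extent along x (length) and along y (width)."""
    xs, ys = zip(*positions)
    return max(xs) - min(xs), max(ys) - min(ys)


# Four players on the corners of a 6 m x 8 m rectangle.
players = [(0.0, 0.0), (6.0, 0.0), (6.0, 8.0), (0.0, 8.0)]
team_centroid = centroid(players)          # (3.0, 4.0)
team_stretch = stretch_index(players)      # 5.0
team_extent = team_length_width(players)   # (6.0, 8.0)
```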

The GK influence primitives in silly_kicks/tracking/_gk_influence.py
(PR-S34, TF-15, GKDV Layer 1) implement threat-weighted pitch control
decomposition using:

- Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics
  Conference.
  (pitch control foundation, TTI kinematic model)

- Fernandez, J., & Bornn, L. (2018). "Wide Open Spaces: A Statistical
  Technique for Measuring Space Creation in Professional Soccer." MIT Sloan
  SAC.
  (alternate pitch control formulation)

- Singh, K. (2018). "Introducing Expected Threat (xT)."
  karun.in/blog/expected-threat
  (threat surface for weighting)

The Get Goalside critique of raw pitch control GK over-crediting motivated
the threat-weighting correction.

The Ward-clustering line-breaking detection in
silly_kicks/tracking/_line_breaking.py (PR-S33, TF-32) implements the
methodology described in:

- Karakus, O., & Arkadas, H. (2025). "Through the Gaps: Uncovering Tactical
  Line-Breaking Passes with Clustering." arXiv:2506.06666. ECML/PKDD MLSA 2025.
  (1D Ward hierarchical clustering on x-coordinates for defensive line
  identification; cross-product straddle test for pass-segment intersection)
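
The cross-product straddle test mentioned above can be sketched as follows.
This is an illustration only: it assumes the defensive line has already been
identified and is given as a segment between two adjacent defenders, and the
function names are hypothetical rather than those used in
silly_kicks/tracking/_line_breaking.py. The clustering step is not shown.

```python
def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_straddle(p1, p2, q1, q2):
    """True if segment p1-p2 properly crosses segment q1-q2.

    Each segment's endpoints must lie strictly on opposite sides of the
    other segment's supporting line. Touching or collinear cases are
    treated as "no crossing" in this sketch.
    """
    d1 = cross(p1, p2, q1)
    d2 = cross(p1, p2, q2)
    d3 = cross(q1, q2, p1)
    d4 = cross(q1, q2, p2)
    return d1 * d2 < 0 and d3 * d4 < 0


# Two adjacent defenders forming a line at x = 70 m.
defender_a, defender_b = (70.0, 20.0), (70.0, 40.0)

# A pass that travels through the gap, and one that stops short of it.
breaks_line = segments_straddle((60.0, 30.0), (80.0, 32.0), defender_a, defender_b)
stops_short = segments_straddle((60.0, 30.0), (65.0, 30.0), defender_a, defender_b)
```

Here breaks_line is True and stops_short is False.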

The cover shadow features in silly_kicks/tracking/_cover_shadows.py
(PR-S36, TF-30) implement methodologies described in:

- Cascioli, L., Wang, A., Stradiotti, L., Van Roy, M., Robberechts, P.,
  Wouters, M., Jaspers, A., & Davis, J. (2025). "Quantifying Off-Ball
  Defensive Impact through Cover Shadows." Hudl Research / DTAI, KU Leuven.
  (Lane Control physics-based pass-blocking model; blocking score
  counterfactual threat reduction metric)

- Spearman, W., Basye, A., Dick, G., Hotovy, R., & Pop, P. (2017).
  "Physics-Based Modeling of Pass Probabilities in Soccer." MIT Sloan SAC.
  (Ball drag model: quadratic air resistance with rho=1.22, C_D=0.25,
  A=0.038, m=0.42; referenced by Cascioli et al. for ball travel time)
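
A worked sketch of the ball drag model above, assuming SI units for the
listed constants (rho in kg/m^3, A in m^2, m in kg) and straight-line motion
with drag as the only force. Under those assumptions m * dv/dt =
-0.5 * rho * C_D * A * v^2 has the closed form v(t) = v0 / (1 + k*v0*t) with
k = rho * C_D * A / (2 * m), which gives the travel time over a distance d as
t(d) = (exp(k*d) - 1) / (k*v0). The function names are hypothetical, and
whether silly-kicks uses this closed form or a numerical integration is not
stated here.

```python
import math

RHO = 1.22    # air density, assumed kg/m^3
C_D = 0.25    # drag coefficient
AREA = 0.038  # ball cross-sectional area, assumed m^2
MASS = 0.42   # ball mass, assumed kg

# Drag constant k in dv/dt = -k * v^2 (units: 1/m).
K = RHO * C_D * AREA / (2.0 * MASS)


def ball_travel_time(distance, v0):
    """Seconds for the ball to cover `distance` metres from launch speed v0."""
    return math.expm1(K * distance) / (K * v0)


def ball_speed_at(distance, v0):
    """Ball speed in m/s after `distance` metres (v = v0 * exp(-k * d))."""
    return v0 * math.exp(-K * distance)


# A 20 m pass struck at 15 m/s.
t_drag = ball_travel_time(20.0, 15.0)
t_no_drag = 20.0 / 15.0
v_arrival = ball_speed_at(20.0, 15.0)
```

With drag the pass takes longer than the drag-free time d / v0 and arrives
slower than it was struck.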

The per-player influence primitives in silly_kicks/tracking/_player_influence.py
(PR-S51, TF-36 + TF-33) compose:

- Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics
  Conference.
  (pitch control decomposition, TTI kinematic model for uniquely reachable area)

- Singh, K. (2018). "Introducing Expected Threat (xT)."
  karun.in/blog/expected-threat
  (threat surface for per-player off-ball xT weighting)

The per-player composition (off-ball xT via PC share × xT, uniquely reachable
area generalized from GK-specific to all outfield) is novel to silly-kicks.

The ghost-GK positioning model in silly_kicks/tracking/_ghost_gk.py
(PR-S52, TF-18, GKDV Layer 2) implements methodologies described in:

- Le, H. M., Yue, Y., Carr, P. & Lucey, P. (2017). "Data-Driven Ghosting
  Using Deep Imitation Learning." MIT Sloan Sports Analytics Conference.
  (Ghost player positioning concept; league-average baselines)

- Dutta, R., Yurko, R. & Ventura, S. (2024). "NFL Ghosts: A framework for
  evaluating defender positioning with conditional density estimation."
  arXiv:2406.17220.
  (RFCDE density estimation for positional ghosts; leaf co-occurrence
  weighting; 2D KDE over tree-partitioned feature space)

- Pospisil, T. & Lee, A. B. (2018). "RFCDE: Random Forests for Conditional
  Density Estimation." arXiv:1804.05753.
  (Random forest leaf-assignment for CDE; weighted kernel density estimation)

Except where noted under Third-Party Code Attribution below, the
implementations are independent Python translations of the published
methodologies, not derived from any source code. They are licensed under the
same terms as silly-kicks (MIT License).

Third-Party Code Attribution
----------------------------

The Bekkers Pressing Intensity time-to-intercept formula
(silly_kicks.tracking._kernels._bekkers_tti) is a re-implementation of
the canonical Python source published under the BSD 3-Clause License by
Joris Bekkers / UnravelSports:

    https://github.com/UnravelSports/unravelsports
    unravel/soccer/models/utils.py -- time_to_intercept()

Required attribution per BSD-3-Clause:

    Copyright (c) 2025 UnravelSports
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions
    are met:
    (1) Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.
    (2) Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in
        the documentation and/or other materials provided with the
        distribution.
    (3) Neither the name of the copyright holder nor the names of its
        contributors may be used to endorse or promote products derived
        from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
    FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
    COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
    INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
    BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
    LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
    LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
    ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
    POSSIBILITY OF SUCH DAMAGE.

silly-kicks re-implements the algorithm with attribution; any modifications
to numerical constants, parameter handling, or aggregation logic are
documented in the source module docstring of silly_kicks.tracking._kernels.

A 30-line excerpt of the canonical source (``time_to_intercept`` +
``probability_to_intercept``) is also vendored at
``tests/_vendored/unravelsports_tti.py`` with the BSD-3-Clause license header
preserved verbatim. This excerpt is used solely by the golden-master parity
test (``tests/tracking/test_pressure_bekkers_golden_master.py``) on Python
versions where the live ``unravelsports`` package cannot be installed
(``unravelsports>=1.2`` requires Python 3.11+; silly-kicks targets >=3.10).
The vendored copy is test-only -- it is NOT exposed in the silly-kicks
runtime distribution and is not consumed by any silly-kicks public API.


Test Data Sources
-----------------

Test fixtures under tests/datasets/ are excluded from the published
silly-kicks wheel via the [tool.hatch.build.targets.wheel] packages
config in pyproject.toml. They exist solely to exercise the converter
pipelines in CI.

The IDSSE per-period orientation fixture
(tests/datasets/idsse/per_period_match.parquet, PR-S23 / silly-kicks
3.0.1) is derived from the DFL Bundesliga Match Tracking Data -- Open
Match Data 2022/23, published by Bassek et al. (2025) under CC-BY 4.0.
Match identifier: idsse_J03WMX (public DFL competition identifier --
no PII). Citation: Bassek, M., Skinner, J., Niemann, J., et al. (2025).
"An Open Bundesliga Match Tracking Dataset." DFL DataHub.

The Metrica per-period orientation fixture
(tests/datasets/metrica/per_period_match.parquet, PR-S23 / silly-kicks
3.0.1) is Sample Game 1 from
https://github.com/metrica-sports/sample-data, published under
CC-BY-NC-4.0 (same license as Sample Game 2 used by sample_match.parquet).
Coordinates were rescaled at extraction time from Metrica's native 0-1
normalised frame to the silly-kicks-input 0-105 / 0-68 frame.
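
The rescaling described above amounts to a per-axis multiplication. A minimal
sketch, assuming a plain linear scale with no axis flip (whether the fixture
extraction also flips the y-axis is not stated here); the function name is
hypothetical:

```python
PITCH_LENGTH = 105.0  # metres, x extent of the silly-kicks input frame
PITCH_WIDTH = 68.0    # metres, y extent of the silly-kicks input frame


def rescale_normalised(x_norm, y_norm):
    """Map a point from the 0-1 normalised frame to the 0-105 / 0-68 frame."""
    return x_norm * PITCH_LENGTH, y_norm * PITCH_WIDTH


centre_spot = rescale_normalised(0.5, 0.5)  # (52.5, 34.0)
far_corner = rescale_normalised(1.0, 1.0)   # (105.0, 68.0)
```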

The existing IDSSE contract-test fixture
(tests/datasets/idsse/sample_match.parquet) is from DFL DataHub
free-sample data (non-commercial redistribution permitted). The
existing Metrica contract-test fixture
(tests/datasets/metrica/sample_match.parquet) is Sample Game 2 from
metrica-sports/sample-data (CC-BY-NC-4.0).

The kloppy test fixtures (tests/datasets/kloppy/*.xml + .json) are
vendored from kloppy under BSD-3-Clause. The metrica_events.json
within is originally from metrica-sports/sample-data Sample Game 2
(CC-BY-NC-4.0).
