silly-kicks — C4 Architecture

Generated from architecture.dsl using the Structurizr → PlantUML rendering pipeline.

System Context diagram: (Right! Luxury!) Lakehouse

People
  Analyst / End User: Browses the Taipy dashboard for tactical metrics (xG, VAEP, line-breaking, PAUSA, formations). Optionally pulls HF datasets/models for offline analysis.
  Operator (Karsten): Develops, deploys, monitors. Triggers ingestion + dbt builds, reviews CI, manages workflow cards + ADRs.

System
  (Right! Luxury!) Lakehouse: Multi-provider soccer analytics platform.

External systems
  StatsBomb Open Data: GitHub-hosted JSON event data + 360 freeze frames (~3500 matches across PL/La Liga/CL/WC).
  Wyscout Public Dataset: Figshare-hosted JSON event data (~1900 matches).
  IDSSE Open Data (DFL Bundesliga): Figshare-hosted DFL Match Information XML + Position XML + Event XML (7 matches with synchronised event+tracking).
  Metrica Sports Sample Data: GitHub-hosted anonymised tracking + events (3 sample matches).
  SkillCorner Broadcast Tracking: kloppy-loaded broadcast-only tracking (~250 matches, no event data).
  HuggingFace Hub: Org `luxury-lakehouse` hosts 19 datasets (events, tracking, predictions) + 17 model cards + 1 Taipy Space + 1 ScoutGPT inference Space.
  GitHub Actions: CI: ruff, pyright, dbt parse, dbt slim/full builds (live-CI on PR), wheel build + upload to Databricks Volume on push to main.
  OpenRouter / Anthropic / OpenAI / Local LLMs: LLM gateway for Evolve Engine (architecture exploration), Mad Scientist audit skills, ml-intern. Local LLMs via DGX Spark (Llama 3.3 70B Q4).

Relationships (label [technology])
  Pulls JSON match/event/360 data [HTTPS]
  Pulls JSON event data [HTTPS]
  Pulls DFL XML files [HTTPS]
  Pulls CSV/JSON sample data [HTTPS]
  Pulls broadcast tracking via kloppy [HTTPS]
  Publishes datasets via `ingestion.hf_publish` [huggingface_hub API]
  LLM-driven architecture mutation [litellm/HTTPS]
  Builds wheel on push to main [uv build + Databricks SDK]
  Browses match analytics [HTTPS via HF Spaces]
  Downloads datasets/models [huggingface_hub]
  Pushes commits, opens PRs [git/HTTPS]
  Runs evolve cycles [CLI]

Legend: person, system
Container diagram: (Right! Luxury!) Lakehouse [system]

Containers
  Taipy Web Application [Python 3.10, Taipy 4.1, ll_ext, plotly]: 16-page interactive dashboard: Pass Map, Heat Map, xG, VAEP, Line Breaking, Formations, Pass Timing, etc. Reads Lakebase synced tables for sub-100ms queries.
  ScoutGPT Inference Space [Python 3.10, PyTorch 2.5, transformers]: Per-pass counterfactual ranking + sequence completion. Currently disabled at runtime; weights published as HF model artifact.
  luxury-lakehouse Wheel [Python 3.10 wheel]: Single deployable Python wheel (~600 modules) carrying ingestion + analytics + workflows + shared. Built by hatchling, force-includes dbt project + HF cards. Auto-uploaded to UC Volume on push to main.
  Ingestion + Compute Pipelines [Python wheel on Databricks Jobs Serverless]: 27 workflow-card-defined jobs running on Databricks Serverless. Per-provider ingestion (statsbomb/wyscout/idsse/metrica/skillcorner) + per-method compute (pitch control, OBSO, PAUSA, DEFCON, line-breaking, off-ball xT, space creation, formations, ScoutGPT/Football2Vec/xG inference). `applyInPandas` distribution.
  ML Training Jobs [Python 3.10 + uv + PEP 723 + HF Jobs]: PEP 723 `uv run` scripts on HF Jobs L40Sx1: xG v2 (Deep Sets + MC dropout), ScoutGPT (Doc2Vec/Word2Vec/transformer), Football2Vec (gensim in-house), VAEP (silly-kicks), OBSO grids. MLflow `@champion` aliases + UC Volume weights.
  Evolve Engine [Python 3.10, JAX, PyTorch, openevolve, openrouter]: Architecture-exploration framework: pre-registered rules promote/keep/cut variants. Backends: LocalCudaBackend, RemoteSSHBackend (Media-PC + DGX Spark), HF Jobs L40Sx1. Sandboxed AST-allowlist `exec()` per ADR-001.
  dbt Project [dbt-core 1.10-1.12, dbt-databricks, dbt_utils, dbt_expectations]: Silver views + gold marts. ~80 models. ADR-011 Kimball-conformed dim_matches/dim_teams/dim_players/dim_competitions; ADR-013 ML inference output marts (fct_xg_predictions_v2, fct_pausa_values). Live-CI on Databricks SQL Warehouse.
  Bronze Delta Tables [Delta Lake on Unity Catalog `soccer_analytics.bronze`]: Raw provider data: events, tracking, freeze frames, matches, teams, players. Schema-on-write with `_ingested_at` audit. Per-provider isolation; Liquid Clustering on match_id.
  Silver Views (dbt) [dbt-core view materialization]: Provider-specific staging views: stg_<provider>__<table>. Type-cast + dedup + bronze-completeness passthroughs. Materialized as views in `soccer_analytics.dev_silver`.
  Gold Marts (dbt) [Delta Lake on Unity Catalog `soccer_analytics.dev_gold`]: Conformed-fact marts (fct_passes, fct_shots, fct_action_values, fct_match_summary, fct_player_stats, fct_pausa_values, etc.) + dimensions (dim_matches, dim_teams, dim_players, dim_competitions). All migrated to Kimball surrogate FKs (match_key/team_key/player_key) per ADR-011.
  Lakebase (PostgreSQL) [Lakebase managed PostgreSQL]: Synced tables (read replica of Delta marts) on managed Postgres. Custom indexes via `scripts/create_indexes.py` for sub-100ms point lookups from Taipy. ADR-005 grants automation; ADR-002 §4 schema-drift guards.
  UC Volume / Model Registry [Unity Catalog Volumes + MLflow]: Per-method model weights (xG v1+v2, VAEP, ScoutGPT, Football2Vec, PSxG, EPV grids) under `/Volumes/soccer_analytics/dev_gold/model_weights/`. MLflow `@champion` aliases per ADR-012 for atomic model promotion.
  Workflow Cards (YAML) [YAML + Pydantic models]: Per-pipeline operational contract: governance posture, references, inputs/outputs, execution config, performance budgets, cost. 27 cards. Renders in Taipy AI/ML Workflows page; consumed by `wf-dbt-build.yaml` for graph orchestration.

People and external systems (descriptions as in the System Context diagram)
  Analyst / End User
  Operator (Karsten)
  StatsBomb Open Data
  Wyscout Public Dataset
  IDSSE Open Data (DFL Bundesliga)
  Metrica Sports Sample Data
  SkillCorner Broadcast Tracking
  HuggingFace Hub
  GitHub Actions
  OpenRouter / Anthropic / OpenAI / Local LLMs

Relationships (label [technology])
  Pulls JSON match/event/360 data [HTTPS]
  Pulls JSON event data [HTTPS]
  Pulls DFL XML files [HTTPS]
  Pulls CSV/JSON sample data [HTTPS]
  Pulls broadcast tracking via kloppy [HTTPS]
  Publishes datasets via `ingestion.hf_publish` [huggingface_hub API]
  Reads ML model weights for inference [Spark]
  Publishes trained model weights [huggingface_hub API]
  Uploads weights to UC Volume + sets MLflow champion [MLflow + Spark]
  LLM-driven architecture mutation [litellm/HTTPS]
  Publishes evolve seed programs + checkpoints [huggingface_hub API]
  Builds wheel on push to main [uv build + Databricks SDK]
  Hosted as luxury-lakehouse Space [Spaces runtime]
  Hosted as inference Space [Spaces runtime]
  Browses match analytics [HTTPS via HF Spaces]
  Downloads datasets/models [huggingface_hub]
  Pushes commits, opens PRs [git/HTTPS]
  Runs evolve cycles [CLI]
  Writes raw provider data [Delta MERGE/append]
  Writes ML inference outputs to dev_gold (legacy direct-write being retired per ADR-013) [Delta replaceWhere]
  Reads training inputs (events, tracking, freeze frames) [Spark]
  Reads via source declarations [Spark SQL]
  Builds (materialized views) [dbt run]
  Builds (incremental + table marts) [dbt run]
  Reads bronze sources [view definition]
  Reads silver views [view + mart joins]
  Synced via Databricks Lakebase Sync [managed sync]
  Reads synced marts via psycopg + connection pool [PostgreSQL/TCP]
  Reads + renders AI/ML Workflows page from YAML [Pydantic models]
  wf-dbt-build.yaml enumerates every dbt mart [YAML]
  Reads governance + execution config [Pydantic load]
  Carries Python code (deployed via UC Volume) [wheel install]
  Carries shared analytics modules [uv install via PEP 723]
  Carries shared modules (compiled into HF Space) [manage_space.py]
  Loads ScoutGPT weights at startup [huggingface_hub mirror]
  Dispatches variant evaluations [Backend abstraction]

Legend: person, system, container, system boundary

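The Evolve Engine entry in the container diagram mentions a sandboxed AST-allowlist `exec()` per ADR-001. The rules in ADR-001 are not reproduced in this document, so the following is only a minimal sketch of the general technique using Python's standard `ast` module. The allowlist contents and the names `ALLOWED_NODES`, `DisallowedSyntax`, `check_allowlist`, and `run_sandboxed` are assumptions made for illustration, not the project's actual code.

```python
import ast

# Hypothetical allowlist: the only AST node types a variant program may contain.
# Imports and attribute access are deliberately absent, so they are rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.UnaryOp, ast.USub, ast.FunctionDef, ast.arguments, ast.arg,
    ast.Return, ast.Call,
)


class DisallowedSyntax(ValueError):
    """Raised when the source contains a node type outside the allowlist."""


def check_allowlist(source: str) -> ast.Module:
    """Parse the source and reject it if any node is not allowlisted."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise DisallowedSyntax(f"{type(node).__name__} is not allowed")
    return tree


def run_sandboxed(source: str) -> dict:
    """Validate the source, then exec it with an empty builtins mapping."""
    tree = check_allowlist(source)
    namespace = {"__builtins__": {}}  # no open(), __import__(), etc.
    exec(compile(tree, "<variant>", "exec"), namespace)
    namespace.pop("__builtins__", None)
    return namespace


if __name__ == "__main__":
    result = run_sandboxed("def f(x):\n    return x * 2 + 1\ny = f(20)\n")
    print(result["y"])
```

An allowlist fails closed: any syntax that was not explicitly approved is rejected before `exec()` runs. A check like this narrows what a program can express, but on its own it is not a complete security boundary, and it does not limit CPU time or memory.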
workspace "silly-kicks" "Football action classification (SPADL) and valuation (VAEP) library" {

    model {
        // --- Actors ---
        analyst = person "Soccer Analytics Practitioner" "Data scientist or analyst who classifies and values football actions"
        pipeline = person "Downstream Pipeline" "Production data pipeline that calls silly-kicks inside Spark UDFs"

        // --- External Systems ---
        kloppy = softwareSystem "kloppy" "PySport event data normalization library" "External"
        mlLibs = softwareSystem "ML Libraries" "XGBoost, CatBoost, LightGBM gradient boosting frameworks" "External"

        // --- The System ---
        sillyKicks = softwareSystem "silly-kicks" "Classifies football actions into SPADL representation and values them via VAEP" {

            spadl = container "silly_kicks.spadl" "SPADL conversion + post-conversion enrichments: 23 action types, 4 provider converters with preserve_native passthrough, ConversionReport audit; public enrichment helpers (add_names, add_possessions, GK analytics suite — gk_role / distribution_metrics / pre_shot_gk_context)" "Python" "Library"
            vaep = container "silly_kicks.vaep" "VAEP framework: feature extraction, label generation (binary + xG), model training, action valuation. Includes HybridVAEP (result-leakage-free)" "Python" "Library"
            atomic = container "silly_kicks.atomic" "Atomic SPADL/VAEP: continuous action representation with 33 extended action types and deferred single-sort conversion" "Python" "Library"
            xthreat = container "silly_kicks.xthreat" "Expected Threat model: pitch grid value surface via dynamic programming" "Python" "Library"
        }

        // --- Relationships: Context level ---
        analyst -> sillyKicks "Converts event data and values actions using" "Python API"
        pipeline -> sillyKicks "Calls inside Spark applyInPandas UDFs via" "Python import"
        sillyKicks -> kloppy "Accepts EventDataset from" "kloppy bridge"
        sillyKicks -> mlLibs "Trains and predicts with" "Python API"

        // --- Relationships: Container level ---
        analyst -> spadl "Converts raw events to SPADL actions and enriches via" "convert_to_actions() + add_*() helper family"
        analyst -> vaep "Values actions via" "VAEP.fit() / VAEP.rate() / HybridVAEP"
        analyst -> xthreat "Computes pitch value surface via" "ExpectedThreat.fit()"

        pipeline -> spadl "Passes per-game DataFrames to" "lazy import inside UDF"
        pipeline -> vaep "Scores actions with pre-trained models via" "VAEP.rate()"

        spadl -> kloppy "Accepts kloppy EventDataset in kloppy converter" "kloppy bridge"

        vaep -> spadl "Reads SPADL config, schema constants, and action names from" "Python import"
        vaep -> mlLibs "Delegates model training to" "fit() dispatch via _LEARNER_REGISTRY"
        atomic -> spadl "Extends SPADL with atomic action types via" "Python import"
        atomic -> vaep "Inherits VAEP pipeline via AtomicVAEP subclass" "Python import"
        xthreat -> spadl "Reads SPADL config and schema from" "Python import"
    }

    views {
        systemContext sillyKicks "SystemContext" {
            include *
            autoLayout
        }

        container sillyKicks "Containers" {
            include *
            autoLayout
        }

        styles {
            element "Person" {
                shape Person
                background #08427B
                color #ffffff
            }
            element "Software System" {
                background #1168BD
                color #ffffff
            }
            element "External" {
                background #999999
                color #ffffff
            }
            element "Container" {
                background #438DD5
                color #ffffff
            }
            element "Library" {
                shape RoundedBox
            }
            element "Database" {
                shape Cylinder
            }
            element "Component" {
                background #85BBF0
                color #000000
            }
            relationship "Relationship" {
                color #707070
            }
        }
    }

}
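
The `silly_kicks.xthreat` container above is described as computing a pitch grid value surface via dynamic programming. As a rough illustration of that idea, the sketch below iterates the commonly cited Expected Threat recurrence, xT(i) = s(i) * g(i) + m(i) * sum_j T(i, j) * xT(j), on a toy three-cell grid. It is written in plain Python under stated assumptions: the function name, parameter names, and all numbers are hypothetical, and it is not taken from the silly-kicks implementation.

```python
def expected_threat(shoot_prob, goal_prob, move_prob, transition,
                    n_iter=500, tol=1e-12):
    """Iterate the xT recurrence until the value surface stops changing.

    shoot_prob[i]    probability of shooting from cell i
    goal_prob[i]     probability that a shot from cell i scores
    move_prob[i]     probability of moving the ball from cell i
    transition[i][j] probability that a move from cell i ends in cell j
    """
    n = len(shoot_prob)
    xt = [0.0] * n  # start from a zero surface
    for _ in range(n_iter):
        new_xt = [
            shoot_prob[i] * goal_prob[i]
            + move_prob[i] * sum(transition[i][j] * xt[j] for j in range(n))
            for i in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_xt, xt))
        xt = new_xt
        if delta < tol:
            break
    return xt


# Toy flattened grid: cell 0 is far from goal, cell 2 is closest to it.
# Rows of `transition` sum to 0.8, i.e. 20% of moves are assumed to fail.
SHOOT = [0.0, 0.1, 0.5]
GOAL = [0.0, 0.05, 0.3]
MOVE = [1.0, 0.9, 0.5]
TRANSITION = [
    [0.4, 0.4, 0.0],
    [0.2, 0.3, 0.3],
    [0.0, 0.4, 0.4],
]

if __name__ == "__main__":
    for cell, value in enumerate(expected_threat(SHOOT, GOAL, MOVE, TRANSITION)):
        print(cell, round(value, 4))
```

Each pass propagates value one more action into the future, so cells nearer the goal end up with higher values than cells further away in this toy setup.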