Meridian OSS - Comprehensive Work Plan
Executive Summary
After an end-to-end analysis of the repository, I identified 60+ issues across 6 categories. This work plan organizes them into seven phases, each with concrete tasks.

Phase 1: Critical Bugs (Fix Immediately)
1.1 Closure Bug in Scheduler Job Registration
File: src/meridian/core.py:177
Problem: The lambda looks up name when it runs, not when it is defined (Python's late-binding closures). By the time the scheduler calls any job, the loop has finished, so every job materializes the last feature name.

# BUG: All jobs will use the LAST value of 'name'
for name, feature in self.registry.features.items():
    if feature.materialize and feature.refresh:
        self.scheduler.schedule_job(
            func=lambda: self._materialize_feature(name),  # ← closure captures reference
            ...
        )

Tasks:

 Change to capture by value: func=lambda n=name: self._materialize_feature(n)
 Add test to verify each scheduled job materializes correct feature
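
The late-binding behaviour and the default-argument fix can be reproduced without the scheduler. The sketch below is a minimal illustration, not Meridian code: the feature names and both helper functions are made up for the example.

```python
from typing import Callable, Dict, List


def build_jobs_late_binding(names: List[str]) -> Dict[str, Callable[[], str]]:
    """Buggy variant: each lambda reads `name` when called, after the loop has ended."""
    jobs: Dict[str, Callable[[], str]] = {}
    for name in names:
        jobs[name] = lambda: name
    return jobs


def build_jobs_bound(names: List[str]) -> Dict[str, Callable[[], str]]:
    """Fixed variant: the default argument is evaluated once per iteration, at definition time."""
    jobs: Dict[str, Callable[[], str]] = {}
    for name in names:
        jobs[name] = lambda n=name: n
    return jobs


# Hypothetical feature names, used only for the demonstration.
features = ["user_age", "user_clicks", "user_spend"]
late = {key: job() for key, job in build_jobs_late_binding(features).items()}
bound = {key: job() for key, job in build_jobs_bound(features).items()}
```

Here every entry in late resolves to the last name in the list, while bound maps each key to its own name. functools.partial(self._materialize_feature, name) would be an equivalent fix if default arguments feel too implicit.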
1.2 Print Statement Instead of Logger
File: src/meridian/store/offline.py:79
Problem: Uses print() for error output instead of structured logger, breaking observability.

except Exception as e:
    print(f"Offline retrieval failed: {e}")  # Should use logger
    return entity_df

Tasks:

 Replace with logger.warning("offline_retrieval_failed", error=str(e))
 Import structlog at module top and create a module-level logger: logger = structlog.get_logger()
1.3 Silent Exception Swallowing in Materialization
File: src/meridian/core.py:226-227
Problem: Materialization failures are logged but never propagate, making debugging difficult in production.

except Exception as e:
    logger.error("materialize_failed", feature=feature_name, error=str(e))
    # No re-raise, no alerting, no metrics increment

Tasks:

 Add FEATURE_MATERIALIZE_FAILURES Prometheus counter
 Increment counter on failure: FEATURE_MATERIALIZE_FAILURES.labels(feature=feature_name).inc()
 Consider adding alerting hook (optional callback parameter)
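
One possible shape for the counter plus optional alerting hook is sketched below. It is an assumption-laden illustration: a collections.Counter stands in for the prometheus_client counter so the example runs with the standard library only, and materialize_with_reporting, on_failure and _always_fails are hypothetical names, not existing Meridian APIs.

```python
import logging
from collections import Counter
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("meridian.sketch")

# Stand-in for a prometheus_client Counter labelled by feature (assumption).
failure_counts: Counter = Counter()

FailureHook = Callable[[str, Exception], None]


def materialize_with_reporting(
    feature_name: str,
    materialize: Callable[[str], None],
    on_failure: Optional[FailureHook] = None,
) -> bool:
    """Run one materialization; on failure log it, count it, and notify the hook."""
    try:
        materialize(feature_name)
        return True
    except Exception as exc:
        logger.error("materialize_failed feature=%s error=%s", feature_name, exc)
        failure_counts[feature_name] += 1
        if on_failure is not None:
            try:
                on_failure(feature_name, exc)
            except Exception:
                # A broken alert hook must not take the scheduler thread down.
                logger.exception("alert_hook_failed feature=%s", feature_name)
        return False


def _always_fails(feature_name: str) -> None:
    raise RuntimeError("source table missing")


alerts: List[Tuple[str, str]] = []
ok = materialize_with_reporting(
    "user_spend", _always_fails, on_failure=lambda f, e: alerts.append((f, str(e)))
)
```

The scheduler thread still survives the failure, but the failure now leaves a trace in three places: the log, the counter, and whatever the hook forwards to.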
1.4 Unguarded Optional Imports in store/__init__.py
File: src/meridian/store/__init__.py:3-4
Problem: PostgresOfflineStore and RedisOnlineStore are imported without try/except, so importing the package crashes when the optional dependencies are missing. This contradicts the careful handling in config.py.

from .postgres import PostgresOfflineStore  # Will crash without asyncpg
from .redis import RedisOnlineStore  # Will crash without redis

Tasks:

 Wrap imports in try/except like config.py does:
try:
    from .postgres import PostgresOfflineStore
except ImportError:
    PostgresOfflineStore = None  # type: ignore

 Update __all__ to conditionally include classes
Phase 2: Security Issues
2.1 SQL Injection Vulnerability in Postgres Store
File: src/meridian/store/postgres.py:76-91
Problem: Feature names are interpolated directly into SQL without sanitization.

joins += (
    f" LEFT JOIN LATERAL ("
    f" SELECT {feature}"  # ← Feature name directly in SQL
    f" FROM {feature} f"  # ← Table name directly in SQL (already has nosec comment)
    ...
)

Tasks:

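 Add validation for feature names: if not re.fullmatch(r'[a-zA-Z_][a-zA-Z0-9_]*', feature): raise ValueError(...). Use fullmatch rather than match with a trailing $, because $ also matches just before a final newline.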
 Consider using SQL identifier quoting via SQLAlchemy
 Add test for injection attempt (e.g., "; DROP TABLE --")
2.2 SQL Injection in DuckDB Store
File: src/meridian/store/offline.py:61-67
Problem: Same issue as Postgres store - feature names interpolated directly.

Tasks:

 Add same validation as Postgres fix
 Consider abstracting validation to shared utility
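
A shared validator for both stores might look like the sketch below. The function name validate_identifier and its error message are assumptions; only the character rule comes from the task in 2.1.

```python
import re

# Letters, digits and underscores only; must not start with a digit.
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_identifier(name: str) -> str:
    """Return `name` unchanged if it is a safe SQL identifier, else raise ValueError."""
    if not isinstance(name, str) or _IDENTIFIER_RE.fullmatch(name) is None:
        raise ValueError(f"Invalid SQL identifier: {name!r}")
    return name


def is_rejected(name: str) -> bool:
    """Helper for tests: True when validate_identifier refuses the name."""
    try:
        validate_identifier(name)
    except ValueError:
        return True
    return False
```

Calling validate_identifier(feature) immediately before each f-string interpolation keeps the check next to the risk. Allow-listing characters is stricter than quoting, so names that need quoting (spaces, mixed case in Postgres) would be rejected rather than escaped.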
2.3 Missing API Key Timing Attack Protection
File: src/meridian/server.py:40
Problem: String comparison for API keys uses == which is vulnerable to timing attacks.

if api_key_header == expected_key:  # Timing attack vulnerable
    return api_key_header

Tasks:

 Replace with secrets.compare_digest(api_key_header, expected_key); check for a missing header first, since compare_digest raises TypeError when given None
 Import secrets module
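
A constant-time check that also tolerates a missing header can be sketched as follows. api_key_matches is a hypothetical helper name; the real check in server.py may be structured differently.

```python
import secrets
from typing import Optional


def api_key_matches(provided: Optional[str], expected: str) -> bool:
    """Compare API keys in constant time; a missing header never matches."""
    if provided is None:
        return False
    # compare_digest accepts str only when both values are ASCII, so compare bytes.
    return secrets.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Encoding to bytes avoids a TypeError if a client sends a non-ASCII key, which would otherwise surface as a 500 instead of a rejected request.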
2.4 Docker Container Runs as Root
File: Dockerfile
Problem: The container runs as the root user, so a compromised process has full privileges inside the container.

Tasks:

 Add non-root user:
RUN useradd -m -s /bin/bash meridian
USER meridian

 Ensure app files are readable by new user
2.5 Environment Variable Mismatch in docker-compose
File: docker-compose.yml:7-8
Problem: Uses REDIS_URL and POSTGRES_URL but code expects MERIDIAN_REDIS_URL and MERIDIAN_POSTGRES_URL.

environment:
  - REDIS_URL=redis://redis:6379  # Code expects MERIDIAN_REDIS_URL
  - POSTGRES_URL=postgresql://...  # Code expects MERIDIAN_POSTGRES_URL

Tasks:

 Update to MERIDIAN_REDIS_URL and MERIDIAN_POSTGRES_URL
 Add MERIDIAN_ENV=production
Phase 3: Code Quality Improvements
3.1 Duplicate Dependency Declarations
File: pyproject.toml:28-29, 37-39
Problem: redis>=5.0.0 and sqlalchemy>=2.0.0 are declared twice.

dependencies = [
    "redis>=5.0.0",  # Line 28
    "sqlalchemy>=2.0.0",  # Line 29
    ...
    "sqlalchemy>=2.0.0",  # Line 37 - DUPLICATE
    "redis>=5.0.0",  # Line 39 - DUPLICATE
]

Tasks:

 Remove duplicate entries (lines 37, 39)
3.2 Redundant Comment in core.py
File: src/meridian/core.py:107-108
Problem: Duplicate comment lines.

# Select scheduler based on online store type
# Select scheduler based on online store type  # ← DUPLICATE

Tasks:

 Remove duplicate comment
3.3 Missing Type Hints
File: src/meridian/core.py:186
Problem: The sync wrapper around asyncio.run() has no return type annotation, so the type of the result is not visible to callers or type checkers.

Tasks:

 Add return type annotation to _materialize_feature
3.4 Inconsistent Async Pattern in UI
File: src/meridian/ui.py:33, 106-109
Problem: Main function is async but Streamlit doesn't natively support async. Pattern is fragile.

async def main(args: Optional[List[str]] = None) -> None:
    ...
if __name__ == "__main__":
    asyncio.run(main())

Tasks:

 Document that Streamlit runs this file directly via CLI, not __main__
 Consider wrapping async calls in asyncio.run() inside sync functions for Streamlit compatibility
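
The second task, a synchronous entry point that wraps individual async calls, could look like this sketch. run_sync and load_feature_names are hypothetical names; the coroutine merely stands in for whatever async store call the UI makes.

```python
import asyncio
from typing import Any, Coroutine, List, TypeVar

T = TypeVar("T")


def run_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from sync code.

    asyncio.run() raises RuntimeError if an event loop is already
    running in the current thread, so this is for plain sync callers only.
    """
    return asyncio.run(coro)


async def load_feature_names() -> List[str]:
    """Stand-in for an async store call (assumption)."""
    await asyncio.sleep(0)
    return ["user_age", "user_clicks"]


def main() -> List[str]:
    """Synchronous entry point: async work is wrapped call by call."""
    return run_sync(load_feature_names())


names = main()
```

Keeping main() synchronous means the script behaves the same whether it is executed directly or by a runner that does not await coroutines.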
3.5 Import Order Issue
File: src/meridian/core.py:85
Problem: Import at module level after class definitions (has noqa but indicates architectural issue).

from meridian.config import get_store_factory  # noqa: E402

Tasks:

 Consider restructuring to avoid circular import (FeatureStore depends on config, config imports store types)
 Alternative: Move to __init__.py level import
3.6 Magic Numbers
File: src/meridian/scheduler_dist.py:32, 36
Problem: Magic numbers for jitter and lock TTL.

time.sleep(random.uniform(0, 1.0))  # Magic 1.0 second max jitter
lock_ttl = max(1, int(interval_seconds * 0.9))  # Magic 0.9 multiplier

Tasks:

 Create constants:
MAX_JITTER_SECONDS = 1.0
LOCK_TTL_MULTIPLIER = 0.9
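
With the constants in place, the two expressions can move into small named helpers. The sketch below assumes the helper names compute_lock_ttl and pick_jitter, and adds MIN_LOCK_TTL_SECONDS for the literal 1 in the original max() call; none of these exist in scheduler_dist.py today.

```python
import random

MAX_JITTER_SECONDS = 1.0    # upper bound for the random start delay
LOCK_TTL_MULTIPLIER = 0.9   # lock expires slightly before the next run is due
MIN_LOCK_TTL_SECONDS = 1    # assumed name for the floor used in max(1, ...)


def compute_lock_ttl(interval_seconds: float) -> int:
    """Lock TTL in whole seconds, never below the minimum."""
    return max(MIN_LOCK_TTL_SECONDS, int(interval_seconds * LOCK_TTL_MULTIPLIER))


def pick_jitter(rng: random.Random) -> float:
    """Random delay in [0, MAX_JITTER_SECONDS] to spread out lock attempts."""
    return rng.uniform(0, MAX_JITTER_SECONDS)
```

For a 60-second interval this yields a 54-second TTL, and passing a seeded random.Random makes the jitter deterministic in tests.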

Phase 4: Test Coverage Expansion
4.1 Missing Tests - Core Module
Current Coverage: Partial (need to verify exact %)

Tasks:

 Test FeatureStore._repr_html_() output structure
 Test Entity._repr_html_() output structure
 Test FeatureRegistry.get_features_for_entity() with no matches
 Test get_training_data() with empty features list
 Test get_training_data() with missing timestamp column
 Test feature decorator error when entity not registered
4.2 Missing Tests - CLI Module
Tasks:

 Test serve with --api-key flag
 Test serve with --reload flag (mock uvicorn)
 Test ui with missing streamlit dependency
 Test ui with valid feature file
4.3 Missing Tests - Server Module
Tasks:

 Test /metrics endpoint returns valid Prometheus format
 Test /features with non-existent entity
 Test /features with empty features list
 Test middleware timing metrics
4.4 Missing Tests - Scheduler Module
Tasks:

 Test shutdown() method
 Test job deduplication (same job_id)
 Test DistributedScheduler with mock Redis failures
4.5 Missing Tests - Store Modules
Tasks:

 Test PostgresOfflineStore.execute_sql() with empty result
 Test RedisOnlineStore with connection URL format
 Test InMemoryOnlineStore.set_online_features_bulk() with large DataFrame
 Test DuckDB with table creation race conditions
4.6 Edge Case Tests
Tasks:

 Test _parse_timedelta with zero values ("0s", "0m")
 Test feature with both sql and func defined
 Test entity with no type hints (should raise ValueError)
 Test get_online_features with feature not in registry
Phase 5: CI/CD & Infrastructure
5.1 GitHub Actions Improvements
File: .github/workflows/ci.yml

Tasks:

 Add coverage reporting upload to Codecov or Coveralls
 Add timeout to jobs: timeout-minutes: 10
 Add concurrency setting to cancel in-progress runs on new push
 Add permissions block for security
permissions:
  contents: read

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

5.2 Release Workflow Improvements
File: .github/workflows/release.yml

Tasks:

 Add tests execution before publish
 Add version format validation (semver regex)
 Add GitHub Release creation step with changelog
 Add build verification step
- name: Run tests before release
  run: |
    uv venv && uv pip install -e ".[dev]"
    uv run pytest

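- name: Validate version format
  env:
    REF_NAME: ${{ github.ref_name }}  # pass via env so the tag name is never interpolated into the script
  run: |
    if [[ ! "$REF_NAME" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
      echo "Invalid version format: $REF_NAME"
      exit 1
    fi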

5.3 Pre-commit Hook Updates
File: .pre-commit-config.yaml

Tasks:

 Add detect-secrets for credential protection:
- repo: https://github.com/Yelp/detect-secrets
  rev: v1.4.0
  hooks:
    - id: detect-secrets

 Add commitizen for conventional commits (optional)
 Update ruff version: v0.1.6 → latest stable
5.4 Dockerfile Improvements
Tasks:

 Pin image with digest for reproducibility:
FROM python:3.9-slim@sha256:<digest>

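 Add health check (python:3.9-slim does not ship curl, so use the interpreter; assumes the server exposes /health):
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health', timeout=2)" || exit 1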

 Add .dockerignore file for smaller builds
5.5 pyproject.toml Improvements
Tasks:

 Add pytest-cov configuration:
[tool.pytest.ini_options]
addopts = "--cov=meridian --cov-report=term-missing --cov-fail-under=80"

 Add bandit to dev dependencies
 Add pip-audit to dev dependencies for vulnerability scanning
Phase 6: Documentation Updates
6.1 Missing Documentation
Tasks:

 Add API documentation for all public methods in core.py
 Add docstring to async_breaker_call() function
 Add docstring to create_app() in server.py
 Document all environment variables in one place
6.2 Documentation Fixes
Tasks:

 docs/quickstart.md:34 references non-existent why-not-feast.md (should be feast-alternative.md)
 Add CLI command reference (meridian --help output)
 Document Prometheus metrics exposed by the server
 Document circuit breaker configuration options
6.3 Missing Use Case Documentation
Tasks:

 Add example for Python + SQL hybrid features
 Add example for production deployment with Kubernetes
 Add troubleshooting for common Redis connection issues
6.4 Architecture Documentation
Tasks:

 Add sequence diagram for get_online_features flow
 Add sequence diagram for get_training_data flow
 Document cache fallback chain with diagram
Phase 7: Feature Enhancements
7.1 Error Transparency
Tasks:

 Add --debug flag to CLI for verbose logging
 Add structured error codes for API responses
 Distinguish "no features found" vs "feature computation failed" in responses
7.2 Observability Improvements
Tasks:

 Add meridian_online_store_circuit_breaker_state gauge metric
 Add meridian_scheduler_jobs_active gauge metric
 Add meridian_cache_hit_ratio gauge metric
 Add Redis connection pool metrics
7.3 Graceful Shutdown
File: src/meridian/scheduler.py, src/meridian/scheduler_dist.py

Tasks:

 Register shutdown handlers for SIGTERM/SIGINT
 Wait for in-flight jobs before shutdown
 Add graceful shutdown to FastAPI server
7.4 Feature Validation
Tasks:

 Validate entity exists before registering feature
 Validate refresh interval is reasonable (> 0, < 24h)
 Warn if materialize=True but no sql defined
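
The refresh bound and the materialize warning could be combined in one registration-time check. This is a sketch under assumptions: validate_feature_options and its parameters are hypothetical, and refresh is assumed to already be a timedelta by the time it is validated.

```python
import warnings
from datetime import timedelta
from typing import Optional

MIN_REFRESH = timedelta(0)         # exclusive lower bound
MAX_REFRESH = timedelta(hours=24)  # exclusive upper bound


def validate_feature_options(
    name: str,
    refresh: Optional[timedelta],
    materialize: bool,
    sql: Optional[str],
) -> None:
    """Raise on an out-of-range refresh; warn on materialize=True without sql."""
    if refresh is not None and not (MIN_REFRESH < refresh < MAX_REFRESH):
        raise ValueError(
            f"Feature {name!r}: refresh must be > 0 and < 24h, got {refresh}"
        )
    if materialize and not sql:
        warnings.warn(
            f"Feature {name!r} has materialize=True but no sql defined",
            UserWarning,
            stacklevel=2,
        )
```

The out-of-range interval is a hard error because the scheduler cannot do anything sensible with it, while the missing sql is only a warning, matching the wording of the task.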
7.5 Configuration Validation
Tasks:

 Add Pydantic models for config validation
 Validate MERIDIAN_POSTGRES_URL format on startup
 Validate MERIDIAN_REDIS_URL format on startup
 Provide helpful error messages for misconfiguration
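
Before Pydantic models are introduced, the URL checks can be prototyped with the standard library. The sketch below uses urllib.parse as a stand-in for Pydantic validation; validate_store_url and the allowed scheme sets are assumptions, and the variable names follow what section 2.5 says the code expects.

```python
from typing import Dict, Set
from urllib.parse import urlsplit

# Assumed scheme allow-list per environment variable.
ALLOWED_SCHEMES: Dict[str, Set[str]] = {
    "MERIDIAN_REDIS_URL": {"redis", "rediss"},
    "MERIDIAN_POSTGRES_URL": {"postgresql", "postgres"},
}


def validate_store_url(var_name: str, value: str) -> str:
    """Return `value` if it looks like a usable store URL, else raise ValueError."""
    allowed = ALLOWED_SCHEMES[var_name]
    parts = urlsplit(value)
    if parts.scheme not in allowed:
        raise ValueError(
            f"{var_name} must start with one of {sorted(allowed)} followed by '://', "
            f"got {value!r}"
        )
    if not parts.hostname:
        raise ValueError(f"{var_name} is missing a host name: {value!r}")
    return value
```

Naming the offending variable and the accepted schemes in the message is what turns a connection timeout ten minutes later into an immediate, actionable startup error.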

Verification Plan
Automated Tests
Run pytest after each phase.
Ensure the coverage report shows an increase toward 90%.
Run docker build to verify Dockerfile changes.
Run pre-commit run --all-files to verify new hooks.
