Litefs Health Check Documentation

Generated by TRAE SOLO at 2026-03-27

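Overview

Litefs provides built-in health checks for monitoring whether a service is running and whether it is ready. The feature is implemented as middleware, supports user-defined check functions, and returns its results as JSON.

Features

1. Health check endpoint

  • Endpoint: /health (customizable)

  • Method: GET

  • Response format: JSON

Response example (healthy)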

{
  "status": "healthy",
  "timestamp": 1711526400.123,
  "checks": {
    "database": {
      "status": "pass",
      "timestamp": 1711526400.123
    },
    "cache": {
      "status": "pass",
      "timestamp": 1711526400.123
    }
  }
}

Response example (unhealthy)

{
  "status": "unhealthy",
  "timestamp": 1711526400.123,
  "checks": {
    "database": {
      "status": "pass",
      "timestamp": 1711526400.123
    },
    "cache": {
      "status": "fail",
      "timestamp": 1711526400.123
    }
  }
}

Response example (error)

{
  "status": "unhealthy",
  "timestamp": 1711526400.123,
  "checks": {
    "database": {
      "status": "error",
      "error": "Connection timeout",
      "timestamp": 1711526400.123
    }
  }
}

2. Readiness check endpoint

  • Endpoint: /health/ready (customizable)

  • Method: GET

  • Response format: JSON

Response example (ready)

{
  "status": "ready",
  "timestamp": 1711526400.123,
  "checks": {
    "migrations": {
      "status": "pass",
      "timestamp": 1711526400.123
    },
    "config": {
      "status": "pass",
      "timestamp": 1711526400.123
    }
  }
}

Response example (not ready)

{
  "status": "not_ready",
  "timestamp": 1711526400.123,
  "checks": {
    "migrations": {
      "status": "fail",
      "timestamp": 1711526400.123
    }
  }
}

Usage

Basic usage

from litefs import Litefs
from litefs.middleware import HealthCheck

app = Litefs(webroot='./site')

app.add_middleware(HealthCheck, path='/health', ready_path='/health/ready')

app.run()

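Adding health checks

# `db` and `cache` stand for the application's own client objects (not defined here).
def check_database():
    """Check the database connection."""
    try:
        db.connect()
        return True
    except Exception:
        return False

def check_cache():
    """Check the cache service."""
    return cache.is_connected()

def check_disk_space():
    """Check free disk space."""
    import shutil
    total, used, free = shutil.disk_usage('.')
    return free > 1024 * 1024 * 1024  # require at least 1 GB free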

app.add_health_check('database', check_database)
app.add_health_check('cache', check_cache)
app.add_health_check('disk_space', check_disk_space)

Adding readiness checks

def check_migrations():
    """Check that database migrations have completed."""
    return migration_status.is_complete()

def check_config():
    """Check that the configuration has been loaded."""
    return config.is_loaded()

app.add_ready_check('migrations', check_migrations)
app.add_ready_check('config', check_config)

Custom endpoint paths

app.add_middleware(
    HealthCheck,
    path='/status',
    ready_path='/status/ready'
)

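Check function specification

Health check function

def health_check_function() -> bool:
    """
    Health check function.
    
    Returns:
        bool: True if healthy, False if unhealthy.
    """
    pass

Readiness check function

def ready_check_function() -> bool:
    """
    Readiness check function.
    
    Returns:
        bool: True if ready, False if not ready.
    """
    pass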

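Exception handling

If a check function raises an exception, that check's status is reported as error and the error message is included in the response.

def check_database():
    """Check the database connection."""
    try:
        db.connect()
        return True
    except Exception:
        # Re-raise: the exception is caught by the health check and reported as "error"
        raise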
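The mapping described above (True gives pass, False gives fail, an exception gives error) can be sketched as a small aggregator. This is only an illustration of the response shape shown in the examples, not Litefs's actual implementation; `run_checks` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]],
               ok_status: str = "healthy",
               bad_status: str = "unhealthy") -> dict:
    """Run every check and build a report shaped like the JSON examples."""
    results = {}
    all_passed = True
    for name, func in checks.items():
        entry = {}
        try:
            entry["status"] = "pass" if func() else "fail"
        except Exception as exc:
            # A raising check is reported as "error" instead of propagating.
            entry["status"] = "error"
            entry["error"] = str(exc)
        entry["timestamp"] = time.time()
        if entry["status"] != "pass":
            all_passed = False
        results[name] = entry
    return {
        "status": ok_status if all_passed else bad_status,
        "timestamp": time.time(),
        "checks": results,
    }


def _broken_check() -> bool:
    """Hypothetical check that always raises, for demonstration."""
    raise RuntimeError("Connection timeout")


report = run_checks({"database": lambda: True, "cache": _broken_check})
```

Passing `ok_status="ready"` and `bad_status="not_ready"` would produce the readiness variant of the same report.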

Common check examples

Database check

def check_database():
    """Check the database connection."""
    try:
        import sqlite3
        conn = sqlite3.connect('database.db')
        conn.execute('SELECT 1')
        conn.close()
        return True
    except Exception:
        return False

Redis check

def check_redis():
    """Check the Redis connection."""
    try:
        import redis
        r = redis.Redis(host='localhost', port=6379)
        r.ping()
        return True
    except Exception:
        return False

Disk space check

def check_disk_space():
    """Check free disk space."""
    import shutil
    total, used, free = shutil.disk_usage('.')
    free_gb = free / (1024 ** 3)
    return free_gb > 1.0  # require at least 1 GB free
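When several volumes need different thresholds, the check above can be generated by a small factory instead of being copied. This is a minimal sketch; `make_disk_check` is a hypothetical helper, not part of Litefs.

```python
import shutil
from typing import Callable


def make_disk_check(path: str, min_free_bytes: int) -> Callable[[], bool]:
    """Return a check that passes while `path` has at least `min_free_bytes` free."""
    def check() -> bool:
        # disk_usage returns a named tuple with total, used and free byte counts.
        return shutil.disk_usage(path).free >= min_free_bytes
    return check


# Same 1 GB threshold as the example above, for the current directory.
check_disk_space = make_disk_check(".", 1024 ** 3)
```

The returned function takes no arguments and returns a bool, so it matches the check function specification and can be registered with `app.add_health_check('disk_space', check_disk_space)`.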

Memory check

def check_memory():
    """Check available memory."""
    import psutil  # third-party package
    mem = psutil.virtual_memory()
    return mem.available > 1024 * 1024 * 1024  # require at least 1 GB available

External API check

def check_external_api():
    """Check an external API."""
    try:
        import requests
        response = requests.get('https://api.example.com/health', timeout=5)
        return response.status_code == 200
    except Exception:
        return False

Integration examples

Kubernetes integration

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litefs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litefs
  template:
    metadata:
      labels:
        app: litefs
    spec:
      containers:
      - name: litefs
        image: litefs:latest
        ports:
        - containerPort: 9090
        livenessProbe:
          httpGet:
            path: /health
            port: 9090
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 9090
          initialDelaySeconds: 5
          periodSeconds: 5

Docker Compose integration

# docker-compose.yml
version: '3.8'
services:
  litefs:
    image: litefs:latest
    ports:
      - "9090:9090"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9090/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Load balancer integration

# nginx.conf
upstream litefs {
    server 127.0.0.1:9090;
}

server {
    listen 80;
    server_name example.com;

    location /health {
        proxy_pass http://litefs/health;
        access_log off;
    }

    location / {
        proxy_pass http://litefs;
    }
}

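Best practices

1. Keep check functions fast

A health check function should finish within a few seconds so that the probe calling it does not time out.

def check_database():
    """Good practice: set a timeout."""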
    try:
        import sqlite3
        conn = sqlite3.connect('database.db', timeout=5)
        conn.execute('SELECT 1')
        conn.close()
        return True
    except Exception:
        return False
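Not every client library exposes a timeout parameter. In that case one option is to enforce a deadline from the outside by running the check in a worker thread. The sketch below uses only the standard library; `with_timeout` is a hypothetical helper, not a Litefs API, and a check that times out keeps running in its worker thread until it returns on its own.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable

# Shared pool, so a slow check does not block the caller past the deadline.
_executor = ThreadPoolExecutor(max_workers=4)


def with_timeout(check: Callable[[], bool], seconds: float) -> Callable[[], bool]:
    """Wrap `check` so it reports False when it exceeds `seconds`."""
    def wrapper() -> bool:
        future = _executor.submit(check)
        try:
            # Exceptions raised by the check are re-raised here unchanged.
            return bool(future.result(timeout=seconds))
        except FutureTimeout:
            return False
    return wrapper


def _slow_check() -> bool:
    """Hypothetical check that takes longer than its deadline."""
    time.sleep(0.3)
    return True


guarded = with_timeout(_slow_check, seconds=0.05)
```

Because the wrapper has the same zero-argument, bool-returning shape as the original function, it can be registered in place of the unwrapped check.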

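2. Keep check functions idempotent

Calling a check repeatedly must not change any state, so repeated calls return the same result as long as the system itself has not changed.

def check_database():
    """Good practice: do not modify state."""
    try:
        import sqlite3
        conn = sqlite3.connect('database.db', timeout=5)
        conn.execute('SELECT 1')  # read-only operation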
        conn.close()
        return True
    except Exception:
        return False

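3. Distinguish health checks from readiness checks

  • Health check: verifies that the service is running normally

  • Readiness check: verifies that the service is ready to handle requests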

app.add_health_check('database', check_database_connection)
app.add_ready_check('migrations', check_database_migrations)

4. Give checks meaningful names

Use descriptive names so that failures are easy to trace.

app.add_health_check('database_primary', check_primary_db)
app.add_health_check('database_replica', check_replica_db)
app.add_health_check('cache_redis', check_redis_cache)

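Troubleshooting

Health check returns 503

  1. Verify that the check functions are implemented correctly

  2. Check whether a check function is raising an exception

  3. Verify that the dependent services are running

  4. Look for error messages in the logs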
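A quick way to narrow down the steps above is to read the response body itself, assuming the 503 response carries the same JSON structure as the response examples earlier in this document. The helper below is a hypothetical sketch that lists every check that did not pass.

```python
import json
from typing import Dict


def failing_checks(body: str) -> Dict[str, str]:
    """Map each non-passing check name to its error message or status."""
    report = json.loads(body)
    failures = {}
    for name, entry in report.get("checks", {}).items():
        if entry.get("status") != "pass":
            # Prefer the error message when present, else the raw status.
            failures[name] = entry.get("error", entry.get("status", "unknown"))
    return failures


# Sample body combining the "unhealthy" and "error" examples from this document.
sample = json.dumps({
    "status": "unhealthy",
    "timestamp": 1711526400.123,
    "checks": {
        "database": {"status": "error", "error": "Connection timeout",
                     "timestamp": 1711526400.123},
        "cache": {"status": "fail", "timestamp": 1711526400.123},
        "disk_space": {"status": "pass", "timestamp": 1711526400.123},
    },
})
problems = failing_checks(sample)
```

Here `problems` contains the `database` and `cache` entries only, which points directly at the dependency to inspect.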

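Readiness check returns 503

  1. Verify that the readiness check functions are implemented correctly

  2. Check whether initialization has finished

  3. Check whether the configuration loaded correctly

  4. Look for error messages in the logs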

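Check timeouts

  1. Optimize the check functions to reduce execution time

  2. Set reasonable timeouts on external calls

  3. Consider using asynchronous checks

  4. Cache check results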
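Caching check results, the last item above, can be sketched as a wrapper that reuses the previous result for a short time. `cached_check` is a hypothetical helper rather than a Litefs feature; the `clock` parameter exists only to make the behaviour easy to test.

```python
import time
from typing import Callable


def cached_check(check: Callable[[], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> Callable[[], bool]:
    """Wrap `check` so it runs at most once per `ttl_seconds`."""
    state = {"expires": None, "value": False}

    def wrapper() -> bool:
        now = clock()
        if state["expires"] is None or now >= state["expires"]:
            # Cache is empty or stale: run the real check and store the result.
            state["value"] = bool(check())
            state["expires"] = now + ttl_seconds
        return state["value"]

    return wrapper


calls = []


def _expensive_check() -> bool:
    """Hypothetical expensive check that records each real invocation."""
    calls.append(1)
    return True


check_cached = cached_check(_expensive_check, ttl_seconds=10.0)
check_cached()
check_cached()  # served from the cache, the real check is not called again
```

A short TTL is usually preferable: a cached pass can hide a failure for up to `ttl_seconds`, so the value should stay below the probe interval that matters to the orchestrator.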

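Tests

The health check feature comes with a full set of unit tests:

python tests/unit/test_health_check.py

Test coverage:

  • ✅ Default initialization

  • ✅ Initialization with custom paths

  • ✅ Adding health checks

  • ✅ Adding readiness checks

  • ✅ Requests to non-health-check endpoints

  • ✅ Requests with non-GET methods

  • ✅ All checks passing

  • ✅ Some checks failing

  • ✅ A check raising an exception

  • ✅ Response when no checks are registered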

Example code

A complete health check example:

#!/usr/bin/env python
# coding: utf-8

import sys
import os

sys.path.insert(0, os.path.join(os.path.dirname(__file__), '../src'))

from litefs import Litefs
from litefs.middleware import (
    CORSMiddleware,
    LoggingMiddleware,
    SecurityMiddleware,
    HealthCheck,
)


def check_database():
    """Check the database connection."""
    try:
        import sqlite3
        conn = sqlite3.connect('database.db', timeout=5)
        conn.execute('SELECT 1')
        conn.close()
        return True
    except Exception:
        return False


def check_cache():
    """Check the cache service (placeholder that always passes)."""
    return True


def check_disk_space():
    """Check free disk space."""
    import shutil
    total, used, free = shutil.disk_usage('.')
    return free > 1024 * 1024 * 1024  # require at least 1 GB free


def check_external_api():
    """Check an external API (placeholder that always passes)."""
    return True


def check_migrations():
    """Check database migrations (placeholder that always passes)."""
    return True


def main():
    """Start the server."""
    app = Litefs(webroot='./examples/basic/site', debug=True)
    
    app.add_middleware(LoggingMiddleware)
    app.add_middleware(SecurityMiddleware)
    app.add_middleware(CORSMiddleware)
    app.add_middleware(HealthCheck, path='/health', ready_path='/health/ready')
    
    app.add_health_check('database', check_database)
    app.add_health_check('cache', check_cache)
    app.add_health_check('disk_space', check_disk_space)
    app.add_health_check('external_api', check_external_api)
    
    app.add_ready_check('migrations', check_migrations)
    
    print("Starting Litefs server with health checks...")
    print("Health check endpoint: http://localhost:9090/health")
    print("Ready check endpoint: http://localhost:9090/health/ready")
    
    app.run()


if __name__ == '__main__':
    main()

Related documentation