{% if config.theme.language == "zh" %}零侵入分析 AI 工作负载,SQL 驱动的分析能力
{% else %}Zero-intrusion profiling for AI workloads with SQL-powered analytics
{% endif %}{% if config.theme.language == "zh" %}无需修改代码即可附加到运行中的进程。无需插桩、无需重启、不中断工作流。
{% else %}Attach to running processes without code changes. No instrumentation, no restarts, no workflow disruption.
{% endif %}{% if config.theme.language == "zh" %}使用标准 SQL 查询性能数据。用熟悉的语法分析 torch 追踪、内存使用等。
{% else %}Query performance data with standard SQL. Use familiar syntax to analyze torch traces, memory usage, and more.
{% endif %}{% if config.theme.language == "zh" %}直接在目标进程中运行 Python 代码。检查变量、修改状态、实时调试。
{% else %}Run Python code directly in target processes. Inspect variables, modify state, and debug in real time.
{% endif %}{% if config.theme.language == "zh" %}捕获带有变量值的执行堆栈。精确了解代码在任何时刻的执行状态。
{% else %}Capture execution stacks with variable values. Understand exactly what your code is doing at any moment.
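This page does not describe how Probing captures stacks. To show what "a stack with variable values" contains, here is a standard-library sketch. It walks the Python frame chain with CPython's `sys._getframe` and records each frame's function, line, and locals. `capture_stack` and `train_step` are hypothetical names, and this is not Probing's implementation.

```python
import sys
from types import FrameType
from typing import Optional


def capture_stack(frame: Optional[FrameType] = None, max_repr: int = 80) -> list:
    """Walk from `frame` (default: the caller's frame) out to the outermost frame,
    recording function name, file, line number, and a short repr of each local."""
    if frame is None:
        frame = sys._getframe(1)  # start at the caller, skipping capture_stack itself
    stack = []
    while frame is not None:
        code = frame.f_code
        stack.append({
            "function": code.co_name,
            "file": code.co_filename,
            "line": frame.f_lineno,
            # Truncate reprs so large tensors or lists do not flood the output.
            "locals": {k: repr(v)[:max_repr] for k, v in frame.f_locals.items()},
        })
        frame = frame.f_back
    return stack  # innermost frame first


def train_step(step):
    loss = 0.25 * step  # stand-in for a real training computation
    return capture_stack()


print(train_step(3)[0])
```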
{% endif %}{% if config.theme.language == "zh" %}监控跨多节点的进程。通过跨节点关联调试分布式训练问题。
{% else %}Monitor processes across multiple nodes. Debug distributed training issues with cross-node correlation.
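One example of cross-node correlation is spotting a straggler rank. The sketch below compares per-step durations gathered from several ranks and flags the outlier. Several details are assumptions for illustration, not Probing's API: the data layout (a dict from rank to step times in seconds), the 1.5x threshold, and the helper name `find_stragglers`.

```python
from statistics import median


def find_stragglers(step_times: dict, threshold: float = 1.5) -> list:
    """Return ranks whose median step time exceeds `threshold` x the median across ranks.

    `step_times` maps a (hypothetical) rank id to that rank's per-step durations,
    e.g. collected on each node and merged in one place.
    """
    per_rank = {rank: median(times) for rank, times in step_times.items() if times}
    overall = median(per_rank.values())
    return sorted(rank for rank, t in per_rank.items() if t > threshold * overall)


samples = {
    0: [1.00, 1.02, 0.98],
    1: [1.01, 0.99, 1.00],
    2: [2.10, 2.05, 2.20],  # a slow rank, e.g. a degraded GPU or network link
    3: [1.00, 1.03, 0.97],
}
print(find_stragglers(samples))  # → [2]
```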
{% endif %}{% if config.theme.language == "zh" %}典型工作负载下性能影响小于 5%。生产级效率,适用于真实部署。
{% else %}Less than 5% performance impact in typical workloads. Production-grade efficiency for real deployments.
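The overhead figure is a relative slowdown. To check it on your own workload, time the same job with and without probes attached and compare the two. `overhead_percent` and `time_it` are hypothetical helpers. The 10.0 s and 10.3 s timings are made-up example inputs, not measurements.

```python
import time


def overhead_percent(baseline_s: float, profiled_s: float) -> float:
    """Relative slowdown of the profiled run versus the baseline, in percent."""
    return (profiled_s - baseline_s) / baseline_s * 100.0


def time_it(fn, repeats: int = 5) -> float:
    """Best-of-N wall-clock seconds for fn(); the minimum is least affected by noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Illustrative inputs: 10.0 s per epoch without probes, 10.3 s with probes attached.
print(f"{overhead_percent(10.0, 10.3):.1f}%")  # → 3.0%
```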
{% endif %}pip install probing
# {% if config.theme.language == "zh" %}查找进程{% else %}Find your process{% endif %}
pgrep -f "python.*train"
# {% if config.theme.language == "zh" %}注入探针{% else %}Inject probes{% endif %}
probing -t <pid> inject
# {% if config.theme.language == "zh" %}查询性能数据{% else %}Query performance data{% endif %}
probing -t <pid> query "SELECT * FROM python.torch_trace LIMIT 10"
# {% if config.theme.language == "zh" %}在进程中执行代码{% else %}Execute code in process{% endif %}
probing -t <pid> eval "import torch; print(torch.cuda.memory_allocated())"
# {% if config.theme.language == "zh" %}捕获堆栈跟踪{% else %}Capture stack trace{% endif %}
probing -t <pid> backtrace
{% if config.theme.language == "zh" %}使用熟悉的 SQL 语法查询您的 AI 工作负载数据。使用标准数据库操作分析 PyTorch 追踪、内存使用和自定义指标。
基于 Apache DataFusion 构建的高性能分析引擎,支持聚合、窗口函数和复杂连接。
了解更多 → {% else %}Query your AI workload data with familiar SQL syntax. Analyze PyTorch traces, memory usage, and custom metrics using standard database operations.
Built on Apache DataFusion for high-performance analytics with support for aggregations, window functions, and complex joins.
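Window functions differ from the aggregations in the example query on this page. Instead of collapsing rows into groups, they annotate every row with a value. The plain-Python sketch below spells out what a per-module rolling average over `step` would compute. The equivalent SQL in the docstring reuses the `module`, `step`, and `duration` columns from the example query. The rows are invented sample data, and `rolling_avg_by_module` is a hypothetical name.

```python
from collections import defaultdict, deque


def rolling_avg_by_module(rows, window=3):
    """Plain-Python analogue of
        AVG(duration) OVER (PARTITION BY module ORDER BY step
                            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    Unlike GROUP BY, every input row is kept and annotated with a value."""
    recent = defaultdict(lambda: deque(maxlen=window))  # one sliding window per module
    out = []
    for row in sorted(rows, key=lambda r: (r["module"], r["step"])):
        win = recent[row["module"]]
        win.append(row["duration"])
        out.append({**row, "rolling_avg": sum(win) / len(win)})
    return out


rows = [
    {"module": "encoder", "step": 1, "duration": 4.0},
    {"module": "encoder", "step": 2, "duration": 6.0},
    {"module": "encoder", "step": 3, "duration": 8.0},
    {"module": "encoder", "step": 4, "duration": 10.0},
    {"module": "decoder", "step": 1, "duration": 2.0},
]
for r in rolling_avg_by_module(rows):
    print(r["module"], r["step"], r["rolling_avg"])
```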
Learn More → {% endif %}-- {% if config.theme.language == "zh" %}查找最慢操作{% else %}Find slowest operations{% endif %}
SELECT
    module,
    AVG(duration) AS avg_time,
    MAX(allocated) AS peak_memory
FROM python.torch_trace
WHERE step > 100
GROUP BY module
ORDER BY avg_time DESC
LIMIT 10;
{% if config.theme.language == "zh" %}在运行中的进程中执行任意 Python 代码而无需停止它们。检查变量、查看 GPU 内存、实时修改行为。
交互式 REPL 模式适用于探索性调试,支持 Tab 补全并可完全访问进程上下文。
了解更多 → {% else %}Execute arbitrary Python code in running processes without stopping them. Inspect variables, check GPU memory, and modify behavior on the fly.
Interactive REPL mode for exploratory debugging. Tab completion and full access to the process context.
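Running code "with full access to the process context" means executing it against the process's live namespace rather than a copy. The standard-library sketch below shows that idea with `exec` and `contextlib.redirect_stdout`. It is not Probing's implementation, and `eval_in_context` is a hypothetical name.

```python
import contextlib
import io


def eval_in_context(source: str, namespace: dict) -> str:
    """Run `source` against an existing namespace and return what it printed.

    Because `namespace` is the live dict (not a copy), assignments made by the
    snippet persist, so successive commands can build on each other.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


# Stand-in for a training process's module globals.
state = {"lr": 0.1, "step": 1200}
print(eval_in_context("print(lr, step)", state), end="")  # → 0.1 1200
eval_in_context("lr = lr / 2", state)                     # modifies state in place
print(state["lr"])                                        # → 0.05
```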
Learn More → {% endif %}# {% if config.theme.language == "zh" %}检查 GPU 内存{% else %}Check GPU memory{% endif %}
probing -t <pid> eval "
import torch
print(f'Allocated: {torch.cuda.memory_allocated()/1024**3:.2f} GiB')
print(f'Reserved:  {torch.cuda.memory_reserved()/1024**3:.2f} GiB')"
# {% if config.theme.language == "zh" %}交互式 REPL{% else %}Interactive REPL{% endif %}
probing -t <pid> repl
{% if config.theme.language == "zh" %}调试训练不稳定、卡住和性能下降问题。实时了解训练为何发散或停止。
{% else %}Debug training instabilities, hangs, and performance regressions. Real-time insight into why training diverges or stops.
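For a hang, the first question is usually where each thread is blocked. Probing's `backtrace` command is the documented way to answer it. The sketch below shows the underlying information directly: CPython exposes every thread's current frame via `sys._current_frames()`, and the sketch formats each one with `traceback.format_stack`. `dump_all_threads` is a hypothetical helper, and the blocked worker thread only simulates a hang.

```python
import sys
import threading
import traceback


def dump_all_threads() -> str:
    """Format the current Python stack of every thread in this process.

    In a hung training job, the telling frames are usually the ones blocked in
    a collective op, a lock, or a data-loader queue.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        header = f"Thread {names.get(ident, '?')} (ident={ident})"
        chunks.append(header + "\n" + "".join(traceback.format_stack(frame)))
    return "\n".join(chunks)


# Demo: a worker thread waiting on an Event that is never set, mimicking a hang.
stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="stuck-worker", daemon=True)
worker.start()
report = dump_all_threads()
print(report)
stop.set()
worker.join()
```

The standard library's `faulthandler.dump_traceback(all_threads=True)` produces a similar dump without custom code.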
{% endif %}{% if config.theme.language == "zh" %}追踪训练步骤中的 GPU/CPU 内存使用。检测内存泄漏并优化内存效率。
{% else %}Track GPU/CPU memory usage across training steps. Detect memory leaks and optimize memory efficiency.
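One simple leak heuristic is to fit a line to per-step memory readings and flag a persistently positive slope after warm-up. The readings could be, for example, values of `torch.cuda.memory_allocated()` sampled once per step. The sketch below assumes such a list of readings. The warm-up length, the 1 MiB-per-step threshold, and the helper names are illustrative choices, not Probing defaults.

```python
def leak_slope(samples: list) -> float:
    """Least-squares slope of memory readings (e.g. bytes allocated, one per step).

    A persistently positive slope after warm-up suggests memory that is never
    freed, such as loss tensors appended to a list without .item() or .detach().
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def looks_like_leak(samples, warmup=10, min_growth_per_step=1 << 20):
    """Flag average growth of at least `min_growth_per_step` bytes (default 1 MiB) per step."""
    return leak_slope(samples[warmup:]) >= min_growth_per_step


GiB, MiB = 1 << 30, 1 << 20
steady = [4 * GiB] * 40
leaking = [4 * GiB + step * 2 * MiB for step in range(40)]
print(looks_like_leak(steady), looks_like_leak(leaking))  # → False True
```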
{% endif %}{% if config.theme.language == "zh" %}识别前向/反向传播中的瓶颈。找到慢操作并优化模型性能。
{% else %}Identify bottlenecks in forward/backward passes. Find slow operations and optimize model performance.
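To see which phase dominates a step, you can accumulate wall-clock time per phase. The sketch below does this with a context manager. `phase` and `slowest` are hypothetical helpers, and `time.sleep` stands in for real forward and backward calls. This is a hand-rolled illustration, not Probing's tracer.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # phase name -> accumulated seconds


@contextmanager
def phase(name: str):
    """Add the wall-clock time spent inside the `with` block to totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start


def slowest(n: int = 3):
    """Phases sorted by total time, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


for _ in range(3):
    with phase("forward"):
        time.sleep(0.01)  # stand-in for model(batch)
    with phase("backward"):
        time.sleep(0.02)  # stand-in for loss.backward()
print(slowest())
```

On GPU, CUDA kernels launch asynchronously. Real measurements therefore need `torch.cuda.synchronize()` before reading the clock; otherwise the time is attributed to whichever phase happens to wait.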
{% endif %}{% if config.theme.language == "zh" %}无需重启即可监控 AI 服务。收集自定义指标并实时调试生产问题。
{% else %}Monitor AI services without restarts. Collect custom metrics and debug production issues live.
{% endif %}{% if config.theme.language == "zh" %}Probing 是开源的,欢迎参与!
{% else %}Probing is open source. Get involved!
{% endif %}