
📊 Overview

IPMI Monitor is a web-based tool for monitoring server hardware via IPMI (Intelligent Platform Management Interface) and Redfish APIs. It provides real-time visibility into your server fleet's health.

What It Monitors

  • System Event Log (SEL) - Hardware events, errors, warnings
  • Sensor Readings - Temperature, voltage, fan speed, power consumption
  • Hardware Inventory - CPU, memory, storage, GPU information
  • Connectivity Status - BMC and OS reachability
  • Power State - On/off status with remote control
💡 Tip IPMI Monitor works with any server that has an IPMI-compliant BMC (Baseboard Management Controller), including Dell iDRAC, HP iLO, Supermicro, ASUS, and Lenovo servers.

🚀 Quick Start

1. Add Your First Server

  1. Go to Settings → Manage Servers
  2. Click ➕ Add New Server
  3. Enter the BMC IP address (e.g., 192.168.1.100)
  4. Give it a friendly name (e.g., server-01)
  5. Click Add Server

2. Configure IPMI Credentials

If your servers use custom IPMI credentials:

  1. Click the server in the list to edit
  2. Enter the IPMI username and password
  3. Click 🔌 Test BMC to verify
  4. Save changes

3. View Server Health

Return to the Dashboard to see your servers. Click any server card to view detailed events, sensors, and inventory.

🎯 Key Concepts

BMC (Baseboard Management Controller)

A dedicated processor on the server motherboard that operates independently of the main CPU. It allows remote monitoring and management even when the server is powered off or the OS has crashed.

IPMI vs Redfish

IPMI | Redfish
Legacy protocol (UDP port 623) | Modern REST API (HTTPS, TCP port 443)
Widely supported | More detailed information
Binary protocol | JSON responses
💡 Recommendation Use Auto protocol mode - IPMI Monitor will try Redfish first for more detailed data, then fall back to IPMI.

BMC IP vs OS IP

  • BMC IP - The management network IP, often ending in 0 (e.g., 192.168.1.100)
  • OS IP - The server's main network IP where the OS runs, often ending in 1 (e.g., 192.168.1.101)
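If your addresses follow a fixed pattern like the example above, a small helper can guess the OS IP from the BMC IP. This is a hypothetical sketch that assumes the OS IP is the BMC IP plus 1; IPMI Monitor's own logic isn't documented here, and any server that breaks the pattern should get an explicit OS IP in its settings.

```python
import ipaddress


def guess_os_ip(bmc_ip: str, offset: int = 1) -> str:
    """Hypothetical helper: assumes OS IP = BMC IP + a fixed offset (default 1)."""
    # ip_address() validates the input and supports integer arithmetic.
    return str(ipaddress.ip_address(bmc_ip) + offset)


print(guess_os_ip("192.168.1.100"))  # 192.168.1.101
```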

📱 Dashboard

The main dashboard shows all monitored servers in a grid view.

Server Cards

Each card displays:

  • Server Name and BMC IP
  • Status Badge: 🟢 Online, 🔴 Offline, 🟡 Warning
  • Event Count: Number of events in the last 24 hours
  • Temperature: Current CPU/inlet temperature

Auto-Refresh

Dashboard data refreshes automatically every 60 seconds. Event collection runs every 5 minutes by default (configurable via POLL_INTERVAL), so a new event can take up to one collection cycle to appear.

🖥️ Server Details

Click any server card to view detailed information across three tabs.

Events Tab

Shows System Event Log (SEL) entries with:

  • Timestamp - When the event occurred
  • Severity - Critical (🔴), Warning (🟡), Info (🔵)
  • Description - Event message from BMC

Event Actions

  • Clear DB Events - Remove from IPMI Monitor only (BMC unaffected)
  • Clear BMC SEL - Clear actual BMC log (⚠️ Admin only, use carefully)

Sensors Tab

Real-time sensor readings including:

  • Temperature sensors (CPU, inlet, exhaust, DIMMs)
  • Voltage sensors (3.3V, 5V, 12V, battery)
  • Fan speeds (RPM)
  • Power consumption (Watts)

Inventory Tab

Hardware information collected via IPMI FRU, Redfish, and SSH:

  • System manufacturer, model, serial number
  • CPU model, core count
  • Memory total, slots used
  • Storage devices with sizes
  • GPU information (if present)
💡 For detailed inventory Enable SSH in Settings → SSH tab. This allows collection of exact CPU model, memory configuration, and storage details.

📋 Events & Logs

Common Event Types

Event | Meaning | Action
Correctable ECC Error | Memory error detected and corrected | Monitor frequency; replace DIMM if recurring
Uncorrectable ECC Error | Memory error that couldn't be fixed | Replace DIMM immediately
Temperature Threshold | Component exceeded temperature limit | Check cooling, clean dust, verify airflow
Fan Failure | Fan stopped or below speed threshold | Replace fan ASAP to prevent overheating
Power Supply Failure | PSU issue detected | Check/replace PSU, verify redundancy

🌡️ Sensor Readings

Temperature Guidelines

Sensor | Normal | Warning | Critical
CPU Temperature | < 70°C | 70-85°C | > 85°C
Inlet Temperature | < 30°C | 30-40°C | > 40°C
DIMM Temperature | < 60°C | 60-75°C | > 75°C
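To apply these guidelines in your own scripts, the bands translate directly into a lookup. This is a sketch: the sensor keys are hypothetical, the boundaries follow the table (70°C and 85°C both count as Warning for a CPU), and your BMC also reports its own vendor-defined thresholds, which may differ.

```python
# Guideline bands from the table above, per sensor: (warning_from, critical_above) in °C.
# The sensor keys are hypothetical; map your BMC's sensor names onto them.
TEMP_BANDS = {
    "cpu": (70.0, 85.0),
    "inlet": (30.0, 40.0),
    "dimm": (60.0, 75.0),
}


def classify_temperature(sensor: str, celsius: float) -> str:
    """Return 'normal', 'warning', or 'critical' for a reading."""
    warning_from, critical_above = TEMP_BANDS[sensor]
    if celsius > critical_above:
        return "critical"
    if celsius >= warning_from:
        return "warning"
    return "normal"


print(classify_temperature("cpu", 78))  # warning
```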

Voltage Guidelines

Rail | Normal Range
3.3V | 3.1V - 3.5V
5V | 4.75V - 5.25V
12V | 11.4V - 12.6V
VBAT (Backup Battery) | 2.8V - 3.3V
⚠️ Low VBAT Warning If VBAT drops below 2.5V, replace the CMOS battery. A depleted battery can cause BIOS settings to reset.
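The same approach works for voltage rails. A minimal sketch using the ranges above plus the VBAT replacement threshold from the warning; the rail names are hypothetical keys, not necessarily the sensor names IPMI Monitor or your BMC uses.

```python
# Normal ranges in volts, from the table above (rail keys are hypothetical).
VOLTAGE_RANGES = {
    "3.3V": (3.1, 3.5),
    "5V": (4.75, 5.25),
    "12V": (11.4, 12.6),
    "VBAT": (2.8, 3.3),
}
VBAT_REPLACE_BELOW = 2.5  # from the Low VBAT Warning


def check_voltage(rail: str, volts: float) -> str:
    """Return 'ok', 'out of range', or 'replace battery' (VBAT only)."""
    if rail == "VBAT" and volts < VBAT_REPLACE_BELOW:
        return "replace battery"
    low, high = VOLTAGE_RANGES[rail]
    return "ok" if low <= volts <= high else "out of range"


print(check_voltage("VBAT", 2.4))  # replace battery
```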

🔧 Hardware Inventory

Data Sources

Source | Data Collected | Requirements
IPMI FRU | Manufacturer, model, serial, board info | IPMI access
IPMI SDR | Sensor list, CPU/DIMM counts | IPMI access
Redfish API | Detailed CPU, memory, storage, GPU | Redfish-enabled BMC
SSH to OS | Exact CPU model, memory config, drives | SSH enabled + credentials

Collecting Inventory

Inventory is collected automatically during setup. To refresh:

  1. Go to Server Detail → Inventory tab
  2. Click 📦 Collect Inventory
  3. Wait for collection to complete

For bulk collection: Settings → Manage Servers → 📦 Collect All Inventory

⚙️ Manage Servers

Adding Servers

Go to Settings → Manage Servers → Add New Server:

  • BMC IP - The IPMI management IP
  • Server Name - A friendly name for identification
  • OS IP - Optional, for SSH inventory collection
  • Protocol - Auto (recommended), IPMI only, or Redfish only

Editing Servers

Click any server in the list to open the edit dialog:

  • Change name, IPs, protocol
  • Set custom IPMI credentials
  • Configure SSH credentials
  • Test BMC - Verify IPMI connection
  • Test SSH - Verify SSH connection
  • Check Redfish - Test Redfish availability

Bulk Import

Import servers from a YAML/JSON file. Mount your config file at /app/config/servers.yaml.

# servers.yaml example
servers:
  - name: server-01
    bmc_ip: 192.168.1.100      # BMC (IPMI management) IP
    server_ip: 192.168.1.101   # OS IP (optional, used for SSH inventory)
  - name: server-02
    bmc_ip: 192.168.1.102
    server_ip: 192.168.1.103
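Before mounting the file, a quick pre-flight check can catch typos in IP addresses. This sketch assumes the JSON form uses the same keys as the YAML example (servers, name, bmc_ip, server_ip) and treats server_ip as optional, since the OS IP is optional; the duplicate-BMC check is an added assumption about what a valid file looks like, not a documented IPMI Monitor rule. It uses only the standard library, so it reads JSON rather than YAML.

```python
import ipaddress
import json


def load_servers(text: str) -> list:
    """Parse a JSON server list shaped like the YAML example and validate it."""
    servers = json.loads(text)["servers"]
    seen_bmc = set()
    for entry in servers:
        if not entry.get("name"):
            raise ValueError(f"server entry without a name: {entry}")
        # ip_address() raises ValueError for malformed addresses.
        ipaddress.ip_address(entry["bmc_ip"])
        if "server_ip" in entry:  # OS IP is optional
            ipaddress.ip_address(entry["server_ip"])
        # Assumption: each BMC IP should appear only once.
        if entry["bmc_ip"] in seen_bmc:
            raise ValueError(f"duplicate bmc_ip: {entry['bmc_ip']}")
        seen_bmc.add(entry["bmc_ip"])
    return servers


example = '{"servers": [{"name": "server-01", "bmc_ip": "192.168.1.100", "server_ip": "192.168.1.101"}]}'
print(len(load_servers(example)))  # 1
```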

🔐 SSH Configuration

SSH enables detailed inventory collection from the server's OS.

Enable SSH

  1. Go to Settings → SSH tab
  2. Toggle Enable SSH to OS
  3. Configure default credentials

SSH Key Management

Store SSH keys centrally and assign them to servers:

  1. Click ➕ Add New Key
  2. Give it a name (e.g., "Production Key")
  3. Paste the private key content
  4. Assign the key to a server using the dropdown in the server edit dialog
💡 Key Format Keys should be in OpenSSH format, starting with -----BEGIN OPENSSH PRIVATE KEY-----
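Commonly reported causes of the "error in libcrypto" SSH failure are a key pasted with Windows line endings or without its final newline. If you prepare keys with a script, a small normalizer like this can help. It is an illustration, not what IPMI Monitor does internally, and it only checks the BEGIN/END markers rather than validating the key material.

```python
BEGIN_MARKER = "-----BEGIN OPENSSH PRIVATE KEY-----"
END_MARKER = "-----END OPENSSH PRIVATE KEY-----"


def normalize_openssh_key(pasted: str) -> str:
    """Normalize a pasted OpenSSH private key; raise ValueError if markers are missing."""
    # Convert CRLF/CR line endings to LF and drop surrounding whitespace.
    text = pasted.replace("\r\n", "\n").replace("\r", "\n").strip()
    # Remove stray spaces at the start/end of each line.
    lines = [line.strip() for line in text.split("\n")]
    if lines[0] != BEGIN_MARKER or lines[-1] != END_MARKER:
        raise ValueError("not an OpenSSH private key (BEGIN/END lines missing)")
    # End with exactly one trailing newline after the END marker.
    return "\n".join(lines) + "\n"
```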

Per-Server Overrides

Each server can have custom SSH settings different from the defaults:

  • Custom OS IP (if it doesn't follow the usual BMC IP pattern)
  • Custom username
  • Different SSH key
  • Custom port

🔔 Alerts & Rules

Alert Rules

Pre-configured rules watch for:

  • Temperature exceeding thresholds
  • Fan speed below minimum
  • ECC memory errors
  • Power supply issues
  • Critical BMC events

Creating Custom Rules

  1. Go to Settings → Alerts
  2. Click Add Rule
  3. Select alert type and condition
  4. Set threshold and severity
  5. Enable notification channels

Cooldown

Each rule has a cooldown period to prevent alert spam. Default is 5-15 minutes depending on severity.
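A cooldown can be pictured as a gate keyed per rule and server: the first alert goes out, and repeats are suppressed until the cooldown has elapsed. This is a sketch of the general technique, assuming a hypothetical "<rule>:<server>" key and a monotonic clock; it is not IPMI Monitor's actual implementation.

```python
import time
from typing import Dict, Optional


class CooldownGate:
    """Allow one notification per key, then suppress repeats until the cooldown elapses."""

    def __init__(self, cooldown_seconds: float) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._last_sent: Dict[str, float] = {}

    def should_notify(self, key: str, now: Optional[float] = None) -> bool:
        """key is hypothetical, e.g. "<rule>:<server>"; now defaults to a monotonic clock."""
        now = time.monotonic() if now is None else now
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down; suppressed alerts don't reset the timer
        self._last_sent[key] = now
        return True


gate = CooldownGate(cooldown_seconds=5 * 60)  # 5 minutes, the low end of the default range
```

Keying by rule and server means a fan failure on one server doesn't silence the same alert on another.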

📬 Notifications

Telegram Setup

  1. Message @BotFather on Telegram
  2. Create a new bot with /newbot
  3. Copy the bot token
  4. Get your chat ID (message @userinfobot)
  5. Paste both in Settings → Notifications → Telegram
  6. Click Test to verify
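If the Test button fails, you can check the token and chat ID independently with Telegram's public Bot API method sendMessage. This sketch uses only the standard library; the token and chat ID are placeholders, and it is not necessarily how IPMI Monitor sends its own messages.

```python
import json
import urllib.request


def build_telegram_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a POST to the Bot API sendMessage method with a JSON body."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_telegram(token: str, chat_id: str, text: str, timeout: float = 10.0) -> dict:
    """Send the message; Telegram replies with JSON containing "ok": true on success."""
    request = build_telegram_request(token, chat_id, text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Placeholders: use the token from @BotFather and the chat ID from @userinfobot.
# print(send_telegram("123456:ABC-DEF", "987654321", "IPMI Monitor test message"))
```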

Email Setup

Configure SMTP settings for email notifications. Works with Gmail, SendGrid, or any SMTP server.
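To confirm your SMTP details before entering them, a short standard-library script is enough. This sketch assumes your provider accepts STARTTLS on port 587 (typical for Gmail and SendGrid, but check your provider's documentation); host names, addresses, and credentials are placeholders.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_test_email(sender: str, recipient: str) -> EmailMessage:
    """Compose a plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "IPMI Monitor SMTP test"
    msg.set_content("If you can read this, the SMTP settings work.")
    return msg


def send_email(msg: EmailMessage, host: str, port: int = 587,
               username: Optional[str] = None, password: Optional[str] = None) -> None:
    # Upgrade the connection with STARTTLS before authenticating.
    with smtplib.SMTP(host, port, timeout=15) as smtp:
        smtp.starttls()
        if username:
            smtp.login(username, password or "")
        smtp.send_message(msg)


# send_email(build_test_email("alerts@example.com", "you@example.com"),
#            "smtp.example.com", username="alerts@example.com", password="app-password")
```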

Webhook

Send alerts to Slack, Discord, or custom endpoints. Webhooks receive JSON payloads with alert details.
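For a custom endpoint, any HTTP server that accepts a JSON POST will do. Below is a minimal standard-library receiver. The payload fields IPMI Monitor sends aren't documented in this section, so it simply decodes and logs whatever JSON object arrives; the port (8080) is an arbitrary choice.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(raw: bytes) -> dict:
    """Decode a webhook body into a dict; reject anything that isn't a JSON object."""
    payload = json.loads(raw.decode("utf-8"))  # JSONDecodeError is a ValueError
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:  # also covers invalid UTF-8 (UnicodeDecodeError)
            self.send_response(400)
            self.end_headers()
            return
        print("alert received:", alert)
        self.send_response(204)  # accepted, no response body
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen on all interfaces until interrupted; call serve() to start."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```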

🛡️ Security & Users

User Roles

Role | Permissions
Admin | Full access: manage users, security, AI features, power control
Read-Write | Manage servers, run power commands, but not user management
Read-Only | View only - no changes allowed

Anonymous Access

Enable to allow viewing the dashboard without login. Anonymous users get read-only access.

⚠️ Security Note Only enable anonymous access on trusted networks. Sensitive information like server names and events will be visible.

📊 Prometheus & Grafana Integration

IPMI Monitor provides a built-in Prometheus exporter for integration with your existing monitoring stack.

Prometheus Metrics Endpoint

Metrics are exposed at /metrics in Prometheus text format:

http://your-ipmi-monitor:5000/metrics

Available Metrics

Metric | Type | Description
ipmi_server_reachable | Gauge | Whether BMC is reachable (1=yes, 0=no)
ipmi_server_power_on | Gauge | Server power state (1=on, 0=off)
ipmi_temperature_celsius | Gauge | Temperature readings per sensor
ipmi_fan_speed_rpm | Gauge | Fan speed readings
ipmi_voltage_volts | Gauge | Voltage sensor readings
ipmi_power_watts | Gauge | Power consumption
ipmi_events_total | Gauge | Total events collected per server
ipmi_events_critical_24h | Gauge | Critical events in last 24h
ipmi_events_warning_24h | Gauge | Warning events in last 24h
ipmi_total_servers | Gauge | Total monitored servers
ipmi_reachable_servers | Gauge | Number of reachable servers
ipmi_alerts_total | Gauge | Total fired alerts
ipmi_alerts_unacknowledged | Gauge | Unacknowledged alerts
ipmi_last_collection_timestamp | Gauge | Unix timestamp of last collection
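To read these values from a script without running Prometheus, the text format is simple to parse. This is a simplified sketch: it skips HELP/TYPE lines, ignores timestamps, and does not unescape label values, and the label name (server) in the example is hypothetical. For anything beyond quick checks, the official Python client library offers a full parser, prometheus_client.parser.text_string_to_metric_families.

```python
import re

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Return (name, labels, value) tuples from Prometheus text-format output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, label_str, value = match.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))  # float() accepts NaN/+Inf
    return samples


example = """\
# TYPE ipmi_server_reachable gauge
ipmi_server_reachable{server="server-01"} 1
ipmi_server_reachable{server="server-02"} 0
"""
down = [labels for name, labels, value in parse_metrics(example)
        if name == "ipmi_server_reachable" and value == 0]
print(down)  # [{'server': 'server-02'}]
```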

Prometheus Configuration

Add this to your prometheus.yml:

scrape_configs:
  - job_name: 'ipmi-monitor'
    static_configs:
      - targets: ['ipmi-monitor:5000']
    scrape_interval: 60s
    scrape_timeout: 30s
    metrics_path: /metrics
💡 Target Configuration
  • ipmi-monitor:5000 - Docker network (container name)
  • localhost:5000 - Same host
  • 192.168.1.50:5000 - Remote IP

Pre-built Grafana Dashboard

We provide a ready-to-import Grafana dashboard with:

  • Fleet Overview - Total servers, reachable count, alerts
  • Server Health - Per-server temperature, power, events
  • Event Timeline - Critical/warning events over time
  • Temperature Heatmap - Temperature trends across fleet
  • Alert History - Alert counts and status

Import Dashboard

  1. Go to Grafana → Dashboards → Import
  2. Download the dashboard JSON from:
    github.com/cryptolabsza/ipmi-monitor/grafana/dashboards/ipmi-monitor.json
  3. Upload or paste the JSON
  4. Select your Prometheus data source
  5. Click Import

Example Grafana Alerts

Create Grafana alerts based on IPMI Monitor metrics:

# High Temperature Alert
ipmi_temperature_celsius{sensor=~"CPU.*"} > 80

# Server Unreachable
ipmi_server_reachable == 0

# Critical Events Spike (delta() rather than increase(), since this metric is a gauge)
delta(ipmi_events_critical_24h[1h]) > 5

# Multiple Servers Down
count(ipmi_server_reachable == 0) > 2
💡 Scrape Interval Note Scraping /metrics reads cached data from the last collection cycle (default: every 5 minutes). Faster scrape intervals won't give you fresher data - they'll just read the same values repeatedly.

🤖 AI Features

Premium AI features provide intelligent analysis of your server fleet.

Features Included

  • Fleet Health Summaries - Daily overview of all servers
  • Maintenance Tasks - AI-identified work items with priorities
  • Predictive Analytics - Predicts likely failures before they happen
  • Root Cause Analysis - Deep analysis of specific events
  • AI Chat - Interactive assistant for questions

Getting Started

  1. Go to Settings → AI Features
  2. Click Start Free Trial
  3. Sign up for a CryptoLabs account
  4. AI features activate automatically

Pricing

  • 1 month free trial
  • Then $100/month for up to 50 servers
  • +$15 per additional 10 servers
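For budgeting, the pricing above can be worked out as follows. The pricing doesn't say how a partial block is billed, so this sketch assumes every started block of 10 servers beyond 50 costs the full $15; check the actual billing terms before relying on it. The free trial month is not modeled.

```python
BASE_PRICE_USD = 100   # per month, covers up to BASE_SERVERS
BASE_SERVERS = 50
BLOCK_PRICE_USD = 15   # per additional block of servers
BLOCK_SIZE = 10


def monthly_price(servers: int) -> int:
    """Monthly price after the trial; assumes partial blocks are billed as full blocks."""
    extra = max(0, servers - BASE_SERVERS)
    blocks = (extra + BLOCK_SIZE - 1) // BLOCK_SIZE  # integer ceiling division
    return BASE_PRICE_USD + blocks * BLOCK_PRICE_USD


print(monthly_price(75))  # 145
```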

💬 AI Chat

Ask questions about your servers in natural language.

Example Questions

  • "Which servers have high temperatures?"
  • "Show me servers with ECC errors"
  • "What maintenance is needed this week?"
  • "Explain this error: [paste event]"
  • "How do I add a new server?"
  • "What does ECC mean?"

Tips for Better Responses

  • Be specific about which server if asking about one
  • Include time ranges when relevant ("in the last 24 hours")
  • Ask follow-up questions for more detail

🔧 Maintenance Tasks

AI analyzes events and sensors to generate maintenance work items.

Priority Levels

Priority | Meaning | Timeframe
Critical | Immediate risk of outage | Today
High | Component degrading | This week
Medium | Needs attention | Next maintenance window
Low | Monitor and plan | When convenient

Task Information

Each task includes:

  • Affected Servers - Specific server names
  • Component - What hardware needs attention
  • Reason - Why this task was generated
  • Suggested Action - What to do
  • Evidence - Supporting data from events/sensors

🔍 Troubleshooting

Server Shows Offline

  1. Verify BMC IP is reachable: ping 192.168.1.100
  2. Check IPMI credentials in server edit
  3. Use Test BMC button to diagnose
  4. Verify the firewall allows UDP port 623 (IPMI), plus TCP port 443 if you use Redfish
  5. Try accessing BMC web interface directly
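When Test BMC isn't conclusive, you can check TCP reachability from the IPMI Monitor host yourself. This sketch only works for TCP services such as Redfish (443) or SSH (22). IPMI over LAN uses UDP port 623, which is connectionless, so a plain connect test cannot confirm it; use Test BMC or a dedicated IPMI client such as ipmitool for that.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, no route, DNS failure, ...
        return False


# Example (replace with your BMC IP):
# print(tcp_port_open("192.168.1.100", 443))  # Redfish over HTTPS
```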

SSH Test Fails

  • "Permission denied" - Wrong SSH key or password
  • "Connection refused" - SSH not running or wrong port
  • "No route to host" - Wrong IP or network issue
  • "error in libcrypto" - Key format issue; re-paste the full key, including the BEGIN and END lines

Missing Inventory Data

  1. Enable SSH in Settings → SSH tab
  2. Configure SSH credentials for the server
  3. Check SSH connectivity with the Test SSH button
  4. Click Collect Inventory

No Events Showing

  • Wait for collection cycle (default 5 minutes)
  • Verify server is enabled in settings
  • Some BMCs have an empty SEL by default
  • Check that the BMC firmware supports SEL

📚 Glossary

Term | Definition
BMC | Baseboard Management Controller - dedicated processor for server management
IPMI | Intelligent Platform Management Interface - protocol for BMC communication
Redfish | Modern REST API alternative to IPMI
SEL | System Event Log - BMC's record of hardware events
FRU | Field Replaceable Unit - hardware inventory data
SDR | Sensor Data Record - sensor configuration data
ECC | Error Correcting Code - memory error detection/correction
DIMM | Dual Inline Memory Module - RAM stick
PSU | Power Supply Unit
VBAT | Backup battery voltage (usually CR2032 for CMOS)
iDRAC | Dell's BMC implementation
iLO | HP's BMC implementation

🔌 API Reference

IPMI Monitor provides a REST API for integration.

Authentication

API endpoints require session authentication. Log in first by sending a POST request to /login.

Key Endpoints

GET  /api/servers                 - List all servers
GET  /api/servers/managed         - List managed servers
GET  /api/server/{ip}/events      - Get server events
GET  /api/server/{ip}/sensors     - Get sensor readings
GET  /api/servers/{ip}/inventory  - Get hardware inventory
POST /api/servers/{ip}/inventory  - Collect inventory
GET  /api/auth/status             - Check auth status
POST /api/test/bmc                - Test BMC connection
POST /api/test/ssh                - Test SSH connection
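A minimal client for these endpoints needs to keep the session cookie returned by /login. The sketch below uses only the standard library. The login form field names (username, password), the form encoding, and JSON responses from every endpoint are assumptions, and it doesn't verify that login succeeded (you could check /api/auth/status for that), so confirm the details against the full API documentation.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


class IpmiMonitorClient:
    """Sketch of a session-cookie client; field names and response formats are assumptions."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        # The cookie processor stores the session cookie set by /login.
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def url(self, path: str) -> str:
        return f"{self.base_url}/{path.lstrip('/')}"

    def login(self, username: str, password: str) -> None:
        # Assumption: /login accepts form-encoded "username" and "password" fields.
        form = urllib.parse.urlencode({"username": username, "password": password})
        with self.opener.open(self.url("/login"), data=form.encode(), timeout=10):
            pass

    def get_json(self, path: str):
        with self.opener.open(self.url(path), timeout=30) as resp:
            return json.load(resp)

    def collect_inventory(self, ip: str):
        # POST /api/servers/{ip}/inventory triggers a collection (per the list above).
        req = urllib.request.Request(self.url(f"/api/servers/{ip}/inventory"),
                                     data=b"", method="POST")
        with self.opener.open(req, timeout=120) as resp:
            return json.load(resp)


# client = IpmiMonitorClient("http://ipmi-monitor:5000")
# client.login("admin", "your-password")
# print(client.get_json("/api/servers"))
```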
📖 Full API Documentation For complete API docs, see the GitHub repository.