Metadata-Version: 2.4
Name: inferyx-monitoring
Version: 1.0.15
Summary: Monitor batch pipelines via API and email alerts — install and deploy on Linux servers from PyPI
Author-email: Inferyx DevOps <devops@inferyx.com>
License: Copyright (c) 2026 Inferyx. All rights reserved.
        
        Proprietary software. Unauthorized copying, distribution, or use is prohibited
        unless agreed in writing with Inferyx.
        
        For open-source release, replace this file with your chosen license (e.g. MIT, Apache-2.0)
        before publishing to public PyPI.
        
Keywords: batch,monitoring,pipeline,email,inferyx
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: System Administrators
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Topic :: System :: Monitoring
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0.0
Requires-Dist: requests>=2.31.0
Requires-Dist: python-dateutil>=2.8.2
Provides-Extra: dev
Requires-Dist: build; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: check-manifest; extra == "dev"
Dynamic: license-file

# inferyx-monitoring — Server deployment

Monitor batch jobs listed in a CSV file, poll their status via an API, and send email alerts for failures, missed runs, long-running jobs, and missing API data.

**Package:** `inferyx-monitoring`  
**CLI:** `inferyx-monitoring`

---

## What's new in 1.0.14

- **`PIPELINE_CHECK_MODE=schedule_windows`**: checks each batch only in two short windows per schedule slot (start and end) instead of polling all day.
- **`PIPELINE_CHECK_WINDOW_MINUTES`**: how long each check window stays open (default `10` minutes).
- **`Status` column**: `Active` batches are monitored; `Suspended` batches are skipped.
- **Timezone fix**: API timestamps that carry a timezone no longer cause comparison errors.
- **CSV repair**: glued rows such as `Activebatch_other` or `Statusbatch_other` are split automatically.
- **API URL fix**: each request now sends the correct `name=` for its batch, and an empty API response body is treated as "no records".
- **`PIPELINE_API_FILTER_BY_SCHEDULE_DATE=false`**: the default, and the recommended setting for the Inferyx API.
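
The timezone item in the list above refers to a common Python pitfall: comparing a timezone-aware timestamp with a naive one raises `TypeError`. The following is a minimal sketch of one way to normalize both sides before comparing. It is not the package's actual code; the helper `to_utc` and the choice to treat naive values as UTC are assumptions made for illustration.

```python
from datetime import datetime, timezone

def to_utc(ts):
    """Return an aware UTC datetime (hypothetical helper).

    Assumption: naive values are already in UTC. Adjust if your
    schedule times are in server-local time instead.
    """
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)

# Aware timestamp, as it might appear in an API payload (made-up value).
api_time = datetime.fromisoformat("2026-01-05T07:12:00+05:30")
# Naive timestamp, as it might be built from a CSV schedule entry.
expected = datetime(2026, 1, 5, 1, 30)

try:
    api_time > expected
except TypeError as exc:
    print("direct compare fails:", exc)

# After normalization the comparison is well defined.
print(to_utc(api_time) > to_utc(expected))
```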

### Check modes

| `PIPELINE_CHECK_MODE` | Behavior |
|-----------------------|----------|
| `schedule_windows` | **Recommended.** START window at `ExpectedStartTime + grace`; END window at `ExpectedStartTime + AvgExecutionTime + grace`. No API calls outside those windows. |
| `full_window` | Legacy: poll from expected start until end of day. |

Example: a batch row with `Daily`, `7:00`, `"10 mins"`, `Active`, using `PIPELINE_SCHEDULE_GRACE_MINUTES=5` and `PIPELINE_CHECK_WINDOW_MINUTES=10`:

| Window | Time | Alerts |
|--------|------|--------|
| START | 7:05 – 7:15 | missed, no_data, failed |
| END | 7:15 – 7:25 | long-running, failed |
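
The window arithmetic in the table above can be reproduced with a short sketch. The function name `check_windows` is hypothetical and the package may compute its windows differently; the sketch only applies the two formulas from the check-modes table (`ExpectedStartTime + grace` and `ExpectedStartTime + AvgExecutionTime + grace`), with each window staying open for the configured number of minutes.

```python
from datetime import datetime, timedelta

def check_windows(expected_start, avg_minutes, grace_minutes=5, window_minutes=10):
    """Return ((start_open, start_close), (end_open, end_close)) for one slot."""
    grace = timedelta(minutes=grace_minutes)
    window = timedelta(minutes=window_minutes)
    start_open = expected_start + grace
    end_open = expected_start + timedelta(minutes=avg_minutes) + grace
    return (start_open, start_open + window), (end_open, end_open + window)

# Arbitrary date; only the time of day matters for this example.
slot = datetime(2026, 1, 5, 7, 0)
start_win, end_win = check_windows(slot, avg_minutes=10)

fmt = "%H:%M"
print("START", start_win[0].strftime(fmt), "-", start_win[1].strftime(fmt))
print("END  ", end_win[0].strftime(fmt), "-", end_win[1].strftime(fmt))
```

With a 10-minute average run time and a 5-minute grace period, the END window opens exactly when the START window closes, which matches the table.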

---

## Paths

| Item | Path |
|------|------|
| Install directory | `/opt/pipeline-monitor` |
| Config | `/opt/pipeline-monitor/.env` |
| Batch list | `/opt/pipeline-monitor/jfl_batch.csv` |
| Log file | `/opt/pipeline-monitor/pipeline_script.log` |
| Python venv | `/opt/pipeline-monitor/.venv` |
| Service user | `inferyx` |
| systemd unit | `inferyx-monitoring.service` |

---

## 1. Install

```bash
sudo apt update
sudo apt install -y python3 python3-venv python3-pip
```

**Service user** — check first; create only if missing:

```bash
id inferyx
```

If the command prints `no such user`, create the user:

```bash
sudo useradd --system --home-dir /opt/pipeline-monitor --shell /usr/sbin/nologin inferyx
```

If the user already exists, skip the command above and continue.

```bash
sudo mkdir -p /opt/pipeline-monitor
sudo chown inferyx:inferyx /opt/pipeline-monitor

sudo -u inferyx python3 -m venv /opt/pipeline-monitor/.venv
sudo -u inferyx /opt/pipeline-monitor/.venv/bin/pip install --upgrade pip inferyx-monitoring
```

---

## 2. Create config files (first time)

```bash
cd /opt/pipeline-monitor
sudo -u inferyx /opt/pipeline-monitor/.venv/bin/inferyx-monitoring --init-config --work-dir /opt/pipeline-monitor
```

Creates `.env` and `jfl_batch.csv` **only if missing**.  
`pip install --upgrade` **never** overwrites your live `.env` or `jfl_batch.csv`.

---

## 3. Configure `.env`

```bash
sudo -u inferyx vi /opt/pipeline-monitor/.env
sudo chmod 600 /opt/pipeline-monitor/.env
```

### Required

| Variable | Description |
|----------|-------------|
| `PIPELINE_SMTP_HOST` | SMTP server (e.g. `smtp.office365.com`) |
| `PIPELINE_SMTP_PORT` | SMTP port (usually `587`) |
| `PIPELINE_SMTP_USERNAME` | SMTP login / sender email |
| `PIPELINE_FROM_NAME` | Display name in From header |
| `PIPELINE_SMTP_PASSWORD` | SMTP password |
| `PIPELINE_MAIL_TO` | Primary alert recipients (comma-separated) |
| `PIPELINE_API_BASE_URL` | Base endpoint URL, with **no** `name=` query parameter |
| `PIPELINE_API_TOKEN` | API authentication token |
| `PIPELINE_API_TOKEN_HEADER` | Header name for token (usually `token`) |
| `PIPELINE_DEVOPS_EMAIL` | Recipient for `no_data` and script-failure alerts |
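
Before starting the service, it can help to confirm that every required variable from the table above is set. The sketch below is a standalone check, not part of the package: `parse_env` and `missing_required` are hypothetical helpers, and the parser assumes plain `KEY=VALUE` lines, which may be stricter or looser than what the monitor itself accepts.

```python
REQUIRED = [
    "PIPELINE_SMTP_HOST", "PIPELINE_SMTP_PORT", "PIPELINE_SMTP_USERNAME",
    "PIPELINE_FROM_NAME", "PIPELINE_SMTP_PASSWORD", "PIPELINE_MAIL_TO",
    "PIPELINE_API_BASE_URL", "PIPELINE_API_TOKEN", "PIPELINE_API_TOKEN_HEADER",
    "PIPELINE_DEVOPS_EMAIL",
]

def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_required(text):
    """Return the required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED if not values.get(name)]

# Deliberately incomplete config, for illustration only.
sample = """
# partial config
PIPELINE_SMTP_HOST=smtp.office365.com
PIPELINE_SMTP_PORT=587
PIPELINE_API_TOKEN_HEADER=token
"""
print(missing_required(sample))
```

To check a live file, pass the contents of `/opt/pipeline-monitor/.env` to `missing_required` instead of `sample`.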

### API example

```env
PIPELINE_API_BASE_URL=http://your-host:8080/framework/metadata/getBaseEntityStatusByCriteria
PIPELINE_API_TOKEN=your_token_here
PIPELINE_API_TOKEN_HEADER=token
PIPELINE_API_FILTER_BY_SCHEDULE_DATE=false
```

- Use the **base path only**. The monitor adds `name=<batch>` for each row in `jfl_batch.csv`.
- Keep `PIPELINE_API_FILTER_BY_SCHEDULE_DATE=false` unless your API supports `startDate` / `endDate`.
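
To illustrate why the base URL must not already carry `name=`, here is a rough sketch of how a per-batch URL could be assembled. The function `build_status_url` is hypothetical and only mirrors the behavior described above (the monitor appends `name=<batch>` for each CSV row); the real request may include further parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_status_url(base_url, batch_name):
    """Append name=<batch> to a base URL that must not already contain it."""
    base_url = base_url.rstrip("?")
    query = urlsplit(base_url).query
    if "name" in parse_qs(query, keep_blank_values=True):
        raise ValueError("PIPELINE_API_BASE_URL must not contain name=")
    separator = "&" if query else "?"
    return base_url + separator + urlencode({"name": batch_name})

base = "http://your-host:8080/framework/metadata/getBaseEntityStatusByCriteria"
url = build_status_url(base, "batch_appsflyer")
print(url)
```

A base URL with a hard-coded `name=` would either be rejected by a check like this one or, without the check, query the same batch for every row.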

### Email templates (optional)

| Variable | Purpose |
|----------|---------|
| `PIPELINE_MAIL_CC` | CC recipients |
| `PIPELINE_MAIL_SUBJECT_*` | Subject per alert: `DEFAULT`, `NO_DATA`, `FAILED`, `RUNNING`, `MISSED` |
| `PIPELINE_MAIL_BODY_*` | HTML body per alert (same keys as subjects) |
| `PIPELINE_MAIL_SIGNATURE` | HTML footer on every alert |

Placeholders: `{batch_name}`, `{issue_type}`, `{issue_type_upper}`, `{status}`, `{error}`, `{error_line}`, `{expected_start_time}`, `{expected_end_time}`, `{frequency}`, `{avg_time}`, `{current_time}`, `{signature}`
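
As a quick way to preview a subject or body template, the placeholders can be filled in with Python's `str.format_map`. This is only a sketch: it assumes the templates use `str.format`-style substitution, which the placeholder syntax suggests but this document does not confirm, and `render` and `KeepMissing` are hypothetical names.

```python
class KeepMissing(dict):
    """Leave unknown placeholders untouched instead of raising KeyError."""

    def __missing__(self, key):
        return "{" + key + "}"

def render(template, **values):
    """Substitute the known placeholders in a subject or body template."""
    return template.format_map(KeepMissing(values))

subject = "[{issue_type_upper}] {batch_name} at {current_time}"
preview = render(subject, issue_type_upper="FAILED", batch_name="batch_appsflyer")
print(preview)
```

Under the same assumption, literal braces in an HTML body (for example inline CSS) would need to be doubled as `{{` and `}}`.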

### Scheduling (optional)

| Variable | Default | Meaning |
|----------|---------|---------|
| `PIPELINE_CHECK_MODE` | `full_window` | `schedule_windows` = check only at start/end windows; `full_window` = poll all day |
| `PIPELINE_CHECK_WINDOW_MINUTES` | `10` | Minutes each start/end check window stays open |
| `PIPELINE_CHECK_INTERVAL` | `60` | Seconds between service poll cycles |
| `PIPELINE_SCHEDULE_GRACE_MINUTES` | `5` | Minutes to wait after the expected start or end before alerting |
| `PIPELINE_POST_RUN_GRACE_MINUTES` | `60` | Used only when `PIPELINE_CHECK_MODE=full_window` |
| `PIPELINE_ALERT_COOLDOWN_MINUTES` | `60` | Cooldown between repeat alerts |
| `PIPELINE_FAILED_ALERT_ONCE_PER_DAY` | `true` | One failed alert per day per batch |
| `PIPELINE_ALERT_ONCE_PER_DAY_ALL_SCENARIOS` | `true` | One alert per issue type per day |

Recommended production settings:

```env
PIPELINE_CHECK_MODE=schedule_windows
PIPELINE_CHECK_WINDOW_MINUTES=10
PIPELINE_SCHEDULE_GRACE_MINUTES=5
```
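
The cooldown and once-per-day options from the scheduling table can be pictured as a small gate in front of the mailer. The sketch below is one plausible reading, not the package's implementation: `should_send` is a hypothetical function, and how the monitor actually combines `PIPELINE_ALERT_COOLDOWN_MINUTES` with the once-per-day flags is an assumption.

```python
from datetime import datetime, timedelta

def should_send(last_sent, now, cooldown_minutes=60, once_per_day=True):
    """Decide whether a repeat alert for one (batch, issue type) pair may go out."""
    if last_sent is None:
        return True  # nothing has been sent yet
    if once_per_day and last_sent.date() == now.date():
        return False  # already alerted today
    return now - last_sent >= timedelta(minutes=cooldown_minutes)

first = datetime(2026, 1, 5, 7, 6)
print(should_send(None, first))
print(should_send(first, first + timedelta(minutes=30)))
print(should_send(first, first + timedelta(minutes=90), once_per_day=False))
```

Under this reading, the once-per-day flags take precedence on the same calendar day, and the cooldown only matters when they are disabled or the date has changed.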

---

## 4. Configure `jfl_batch.csv`

```bash
sudo -u inferyx vi /opt/pipeline-monitor/jfl_batch.csv
```

**Rules:**

- **One batch per line**: each row ends with its `Status` value (`Active` or `Suspended`), and the next batch starts on a new line.
- Do not glue rows together (bad: `Activebatch_other` or `Statusbatch_other`).
- Quote values that contain commas or spaces: `"09:00,10:00"`, `"3 mins"`.
- Use 24-hour time: `0:30:00`, `16:30:00`.
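
To show what a glued row looks like and how it could be split, here is a small regex-based sketch. It is not the package's repair routine: `split_glued_rows` is a hypothetical helper that assumes `Status` is the last column and that a status word directly followed by more text marks a missing line break.

```python
import re

# A status word (or the header word "Status") that follows a comma and is
# immediately followed by more text marks a missing line break.
GLUED = re.compile(r"(?<=,)(Active|Suspended|Status)(?=[^\s,])")

def split_glued_rows(text):
    """Insert the missing newline after each glued status value."""
    return GLUED.sub(r"\1\n", text)

glued = (
    "Name,Frequency,ExpectedStartTime,AvgExecutionTime,ExpectedDayOfMonth,Status"
    'batch_a,Daily,7:00:00,"10 mins",,Active'
    'batch_b,Daily,8:00:00,"3 mins",,Suspended\n'
)
repaired = split_glued_rows(glued)
print(repaired)
```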

| Column | Required | Description |
|--------|----------|-------------|
| `Name` | Yes | Batch name as in the API |
| `Frequency` | Yes | `Daily`, `Hourly`, `Monthly`, weekday name, etc. |
| `ExpectedStartTime` | For scheduled batches | `9:30:00` or `"09:00,10:00,11:00"` |
| `AvgExecutionTime` | Recommended | `"3 mins"`, `"10 mins"`, `"1 Hr"` |
| `ExpectedDayOfMonth` | For `Monthly` | Day of month, 1–31 |
| `Status` | Optional | `Active` or `Suspended` (default: Active) |
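
The `AvgExecutionTime` values shown in the table are free-text durations. As a sketch of how such values can be turned into a `timedelta`, the hypothetical helper `parse_avg_time` below accepts a number followed by a minute or hour unit. The set of unit spellings is an assumption; the package may accept more or fewer.

```python
import re
from datetime import timedelta

# Minutes per unit; the accepted spellings are an assumption.
UNIT_MINUTES = {
    "min": 1, "mins": 1, "minute": 1, "minutes": 1,
    "hr": 60, "hrs": 60, "hour": 60, "hours": 60,
}

def parse_avg_time(value):
    """Convert strings such as '3 mins' or '1 Hr' to a timedelta."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", value)
    if not match:
        raise ValueError(f"unrecognized duration: {value!r}")
    unit = match.group(2).lower()
    if unit not in UNIT_MINUTES:
        raise ValueError(f"unrecognized unit in: {value!r}")
    return timedelta(minutes=float(match.group(1)) * UNIT_MINUTES[unit])

for text in ("3 mins", "10 mins", "1 Hr"):
    print(text, "->", parse_avg_time(text))
```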

### Example

```csv
Name,Frequency,ExpectedStartTime,AvgExecutionTime,ExpectedDayOfMonth,Status
batch_appdb_events,Daily,"09:00,10:00,11:00,12:00","3 mins",,Active
batch_appsflyer,Daily,0:30:00,"10 mins",,Active
batch_appdb_jiosense_bronze_silver,Daily,"01:00,08:00,15:00,21:00","12 mins",,Active
dashboard_batch_my_finances,Daily,5:45:00,"10 mins",,Active
dashboard_batch_ppl_rewards,Daily,6:50:00,"4 mins",,Active
batch_ignosis_part1,Daily,11:00:00,"80 mins",,Active
dashboard_batch_agentic,Daily,8:40:00,"10 mins",,Suspended
```
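
To sanity-check a batch list before deploying it, the file can be read with Python's standard `csv` module. The sketch below is independent of the package (which lists pandas as a dependency and may parse the file differently); `load_active_batches` is a hypothetical helper that keeps `Active` rows, treats a missing `Status` as `Active` as described in the column table, and splits multi-slot start times.

```python
import csv
import io

def load_active_batches(text):
    """Return (name, [start times]) for every row that is not suspended."""
    batches = []
    for row in csv.DictReader(io.StringIO(text)):
        status = (row.get("Status") or "Active").strip()
        if status.lower() != "active":
            continue  # Suspended rows are skipped
        raw_times = row.get("ExpectedStartTime") or ""
        times = [t.strip() for t in raw_times.split(",") if t.strip()]
        batches.append((row["Name"], times))
    return batches

# Three rows taken from the example above.
sample = '''Name,Frequency,ExpectedStartTime,AvgExecutionTime,ExpectedDayOfMonth,Status
batch_appdb_events,Daily,"09:00,10:00,11:00,12:00","3 mins",,Active
batch_appsflyer,Daily,0:30:00,"10 mins",,Active
dashboard_batch_agentic,Daily,8:40:00,"10 mins",,Suspended
'''
for name, times in load_active_batches(sample):
    print(name, times)
```

The quoted `ExpectedStartTime` value is read as a single field and then split into four schedule slots, which is why the quoting rule above matters.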

---

## 5. Test

Run as **one single line** (do not use `\` line breaks — they cause `unrecognized arguments` errors):

```bash
sudo -u inferyx /opt/pipeline-monitor/.venv/bin/inferyx-monitoring --once --work-dir /opt/pipeline-monitor --env-file /opt/pipeline-monitor/.env --csv-file /opt/pipeline-monitor/jfl_batch.csv
```

```bash
tail -100 /opt/pipeline-monitor/pipeline_script.log
```

---

## 6. Start service (systemd)

```bash
sudo tee /etc/systemd/system/inferyx-monitoring.service <<'EOF'
[Unit]
Description=Inferyx Pipeline Batch Monitor
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=inferyx
Group=inferyx
WorkingDirectory=/opt/pipeline-monitor
ExecStart=/opt/pipeline-monitor/.venv/bin/inferyx-monitoring --work-dir /opt/pipeline-monitor --env-file /opt/pipeline-monitor/.env --csv-file /opt/pipeline-monitor/jfl_batch.csv
Restart=always
RestartSec=10
Environment=PYTHONUNBUFFERED=1
Environment=PIPELINE_ENV_FILE=/opt/pipeline-monitor/.env
Environment=PIPELINE_LOG_FILE=/opt/pipeline-monitor/pipeline_script.log
Environment=PIPELINE_CSV_FILE=/opt/pipeline-monitor/jfl_batch.csv

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now inferyx-monitoring.service
sudo systemctl status inferyx-monitoring.service
```

Logs:

```bash
sudo journalctl -u inferyx-monitoring.service -f
```

---

## 7. Upgrade package

When a newer version is available on PyPI:

```bash
sudo -u inferyx /opt/pipeline-monitor/.venv/bin/pip install --upgrade inferyx-monitoring
sudo systemctl restart inferyx-monitoring.service
```

Your `.env` and `jfl_batch.csv` are not changed by the upgrade.

---

## Troubleshooting

| Symptom | Fix |
|---------|-----|
| `unrecognized arguments:` | Run the test command as **one line** — no `\` at end of lines |
| API error / empty response | Set `PIPELINE_API_FILTER_BY_SCHEDULE_DATE=false` and remove `name=` from `PIPELINE_API_BASE_URL` |
| Wrong batch in API URL | Base URL must not contain a fixed batch name |
| CSV errors / wrong batches | One batch per line; fix glued rows like `Activebatch_...` |
| No email | Check SMTP settings in `.env` |
| Skip a batch | Set its `Status` to `Suspended` in the CSV |
