Your Prometheus stack fires the alert.
Wachd explains why.
You have already done the hard part: curated alert rules, set up Loki, wired Grafana dashboards. When those alerts fire, you still spend 40 minutes reconstructing what happened across commits, logs, and metrics. Wachd runs that reconstruction automatically and delivers the result before your on-call engineer opens a terminal.
Wachd does not replace your observability stack
Prometheus, Grafana, Loki, and Tempo are doing their jobs well. They collect the signals, store the data, and fire alerts when thresholds are crossed.
The gap is what happens after the alert fires. Your on-call engineer gets a notification reading "HighErrorRate firing: checkout-api" and then opens four tabs: Grafana for metrics, Loki for logs, GitHub for recent commits, and a Confluence runbook that may or may not be current.
Wachd closes that gap. It receives the alert via webhook, queries your existing Prometheus and Loki endpoints, reads recent commits from GitHub, and returns a diagnosis before the page reaches anyone.
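To make the collection step concrete, here is a minimal sketch of the kind of requests involved. It is not Wachd's actual implementation. It only builds URLs for the standard Prometheus and Loki range-query HTTP endpoints; the base URLs, the PromQL and LogQL expressions, the `service` label, and the 30-minute lookback are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def context_queries(alert_start, service, lookback_minutes=30,
                    prom_base="http://prometheus:9090",
                    loki_base="http://loki:3100"):
    """Build range-query URLs covering the window before an alert.

    Every default and query expression here is a hypothetical placeholder.
    """
    start = alert_start - timedelta(minutes=lookback_minutes)
    start_s = int(start.timestamp())
    end_s = int(alert_start.timestamp())

    # Prometheus query_range takes Unix timestamps in seconds.
    prom_params = {
        "query": f'sum(rate(http_requests_total{{service="{service}",code=~"5.."}}[5m]))',
        "start": start_s,
        "end": end_s,
        "step": "30s",
    }
    # Loki query_range takes Unix timestamps in nanoseconds.
    loki_params = {
        "query": f'{{service="{service}"}} |= "error"',
        "start": start_s * 10**9,
        "end": end_s * 10**9,
        "limit": 500,
    }
    return {
        "prometheus": f"{prom_base}/api/v1/query_range?{urlencode(prom_params)}",
        "loki": f"{loki_base}/loki/api/v1/query_range?{urlencode(loki_params)}",
    }


# Example: the 30 minutes leading up to an alert that fired at 12:00 UTC.
alert_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
urls = context_queries(alert_time, "checkout-api")
```

Both requests are read-only GETs against endpoints you already run, which is consistent with the no-agents, no-exporters approach described here.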
How your stack fits together with Wachd
Wachd reads from your existing endpoints. No agents, no schema changes, no new exporters.
| Tool | What it does today | What Wachd adds |
|---|---|---|
| Prometheus / Alertmanager | Fires the alert, routes it via webhook_configs | Receives the alert, queries Prometheus for metric context around alert time |
| Grafana Alerting | Fires the alert, routes to contact points | Receives alert via webhook contact point, HMAC validated |
| Loki | Stores logs | Pulls last 30 min of error logs for the affected service automatically |
| Tempo / Jaeger | Stores traces | Trace IDs in logs can be surfaced in the root-cause summary |
| Mimir / Thanos | Long-term metric storage | Queries metric history for baseline comparison around alert time |
| GitHub / GitLab | Stores code | Reads last N commits to the affected service repo — read-only |
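The table mentions HMAC validation of incoming Grafana webhooks. As a rough illustration of what such a check generally looks like, the sketch below recomputes a signature over the raw request body and compares it in constant time. The SHA-256 digest, the hex encoding, and the function name are assumptions for illustration, not documented Wachd behavior.

```python
import hashlib
import hmac


def signature_is_valid(body: bytes, received_hex: str, shared_secret: bytes) -> bool:
    """Return True if received_hex matches the HMAC-SHA256 of body.

    Hypothetical helper: the real header name and signature format
    depend on how the sender and receiver are configured.
    """
    expected = hmac.new(shared_secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, received_hex)
```

The signature must be computed over the exact bytes received. Re-serializing parsed JSON before hashing can change whitespace or key order and cause valid requests to fail.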
Connect in under 5 minutes
Wachd receives alerts through a standard webhook, the same mechanism Grafana and Alertmanager already support.
Grafana Alerting
1. Go to Alerting → Contact points → New contact point
2. Type: Webhook
3. URL: your Wachd webhook URL for the team
4. Optional: add the shared secret as an Authorization header
5. Add to a notification policy and you are done
Prometheus Alertmanager
1. Add a receiver with a webhook_configs entry in alertmanager.yml
2. Set url to your Wachd webhook URL
3. Wachd accepts the standard Alertmanager webhook payload
4. No schema changes required
# alertmanager.yml (Prometheus Alertmanager)
receivers:
  - name: wachd
    webhook_configs:
      - url: 'https://wachd.company.internal/api/v1/webhook/<teamId>/<secret>'
        # send_resolved: true  # optional
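For reference, the standard Alertmanager webhook payload is a JSON object with a top-level `status` and an `alerts` array, where each entry carries `labels`, `annotations`, and a `startsAt` timestamp. The sketch below shows how a receiver might extract the fields needed for triage. The `service` label and the `AlertSummary` type are hypothetical, and this is not Wachd's parsing code.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AlertSummary:
    name: str
    service: str
    status: str
    started_at: datetime


def summarize(payload: str) -> list[AlertSummary]:
    """Extract alert name, service, status, and start time from a payload."""
    data = json.loads(payload)
    summaries = []
    for alert in data.get("alerts", []):
        labels = alert.get("labels", {})
        # Normalize a trailing "Z" so datetime.fromisoformat also accepts
        # the timestamp on Python versions before 3.11.
        started = datetime.fromisoformat(alert["startsAt"].replace("Z", "+00:00"))
        summaries.append(AlertSummary(
            name=labels.get("alertname", "unknown"),
            # Assumes alerts carry a "service" label; adjust to your rules.
            service=labels.get("service", "unknown"),
            status=alert.get("status", data.get("status", "unknown")),
            started_at=started,
        ))
    return summaries
```

The start time and service label are the two values that anchor everything else: they determine which time window and which log stream to query.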
What Wachd delivers when an alert fires
Every alert triggers automatic collection from your Prometheus, Loki, and GitHub endpoints. The result is delivered via SMS, voice, email, or Slack — with the analysis already attached.
- A two-sentence probable cause in plain English
- The most likely contributing signal: recent deploy, config change, dependency timeout, or resource exhaustion
- The metric values before and after the anomaly window
- The commit(s) that touched the affected service in the hours before the alert
- A suggested action: rollback, investigate dependency, or escalate
- A link to the most similar past incident if one exists in your team's history
PII is stripped before the AI sees anything. Wachd runs with Ollama in-cluster (air-gapped) or with Claude, OpenAI, or Gemini, and the provider is configured in one line in values.yaml.
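As a simple illustration of what stripping PII from log lines can look like, the sketch below masks email addresses and IPv4 addresses with regular expressions before the text is passed on. The patterns, placeholder tokens, and function name are assumptions for illustration. The actual redaction rules Wachd applies are not described here, and real-world redaction usually covers more categories than these two.

```python
import re

# Deliberately simple patterns; they favor readability over full coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line: str) -> str:
    """Replace email and IPv4 addresses in a log line with placeholders."""
    line = EMAIL.sub("<email>", line)
    return IPV4.sub("<ip>", line)
```

Redacting before the model call, rather than after, means the same safeguard applies whether the provider is in-cluster or a hosted API.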
Built for teams who already run their own stack
Kubernetes operators with curated alert rules
You have already invested in Cilium, cert-manager, Tempo, and Mimir alerts. The alert quality is good — the problem is triage speed after they fire. Wachd adds the explanation layer without changing your alert configuration.
Teams running Prometheus + Grafana + Loki together
The three-tool combination gives you metrics, dashboards, and logs. Wachd reads from all three simultaneously when an alert fires and correlates the result into a single diagnosis. No need to open each tool separately.
Self-hosted and air-gapped environments
Wachd runs entirely inside your cluster. With Ollama enabled, there are zero outbound API calls. All incident data, AI analysis, and on-call history stay inside your Kubernetes namespace.
Add root-cause analysis to your existing Prometheus stack
Keep your Prometheus, Grafana, Loki, and Tempo setup exactly as it is. Add Wachd as the layer that explains alerts — deploys in under 30 minutes, Apache 2.0, no account required.
Questions? sales@wachd.io or Discord.