Monitoring
Dagy provides health checks, notification channels, and alert rules for monitoring your pipelines in production.
Health Checks
The API exposes health check endpoints that verify connectivity to all infrastructure components:
curl https://api.dagy.io/health
Response:
{
"status": "healthy",
"version": "1.0.0",
"components": [
{"name": "dynamodb", "status": "healthy", "message": "Connected"},
{"name": "s3", "status": "healthy", "message": "Bucket configured"},
{"name": "sqs", "status": "healthy", "message": "Queue configured"}
],
"timestamp": "2026-01-15T10:30:00Z"
}
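For scripted checks, the response can be filtered down to the components that need attention. The following is a minimal sketch that assumes only the response shape shown above; the function name `failing_components` and the sample `degraded` message are illustrative, not part of Dagy.

```python
import json


def failing_components(health: dict) -> list[tuple[str, str, str]]:
    """Return (name, status, message) for every component that is not healthy."""
    return [
        (c["name"], c["status"], c.get("message", ""))
        for c in health.get("components", [])
        if c["status"] != "healthy"
    ]


# Sample payload shaped like the documented response, with one component
# altered for illustration (the "degraded" message text is made up).
sample = json.loads("""
{
  "status": "degraded",
  "version": "1.0.0",
  "components": [
    {"name": "dynamodb", "status": "healthy", "message": "Connected"},
    {"name": "s3", "status": "degraded", "message": "No bucket configured"},
    {"name": "sqs", "status": "healthy", "message": "Queue configured"}
  ],
  "timestamp": "2026-01-15T10:30:00Z"
}
""")

for name, status, message in failing_components(sample):
    print(f"{name}: {status} ({message})")
```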
Component Status Values
| Status | Meaning |
|---|---|
| healthy | Component is operational |
| degraded | Component is accessible but misconfigured (e.g., no table or bucket configured) |
| unhealthy | Component is unreachable or erroring |
The overall status is healthy only if every component is healthy. Any degraded component downgrades the overall status to degraded, and any unhealthy component sets the overall status to unhealthy.
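The aggregation rule can be sketched in a few lines. This is an illustration of the rule as described, not Dagy's actual implementation; the function name `overall_status` is hypothetical, and it assumes that unhealthy takes precedence when both degraded and unhealthy components are present.

```python
def overall_status(component_statuses) -> str:
    """Collapse per-component statuses into a single overall status."""
    statuses = list(component_statuses)
    # Assumption: "unhealthy" outranks "degraded" when both occur.
    if "unhealthy" in statuses:
        return "unhealthy"
    if "degraded" in statuses:
        return "degraded"
    return "healthy"


print(overall_status(["healthy", "degraded", "healthy"]))
```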
Checked Components
| Component | What It Checks |
|---|---|
| DynamoDB | Flows table is configured and accessible |
| S3 | Artifact bucket is configured |
| SQS | Events queue URL is configured |
Notification Channels
Dagy supports four notification channel types for delivering alerts:
| Channel Type | Description |
|---|---|
| slack | Posts messages to a Slack webhook URL |
| email | Sends email notifications |
| webhook | Sends HTTP POST to a custom URL |
| pagerduty | Creates PagerDuty incidents |
Managing Channels
Create a channel:
curl -X POST https://api.dagy.io/notifications/channels \
-H "Authorization: Bearer <token>" \
-d '{
"channel_type": "slack",
"name": "Pipeline Alerts",
"config": {"webhook_url": "https://hooks.slack.com/services/..."}
}'
List channels:
curl https://api.dagy.io/notifications/channels \
-H "Authorization: Bearer <token>"
Channels are scoped to the organization. See the Notifications API Reference for full endpoint documentation.
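The same create-channel request can be prepared from Python with the standard library. The sketch below only builds the request object and does not send it. The helper name `build_create_channel_request` is hypothetical, and the `Content-Type: application/json` header is an assumption (the curl example does not set it).

```python
import json
import urllib.request

# The four channel types listed in the table above.
CHANNEL_TYPES = {"slack", "email", "webhook", "pagerduty"}


def build_create_channel_request(token, channel_type, name, config,
                                 base_url="https://api.dagy.io"):
    """Build (but do not send) a POST request that creates a channel."""
    if channel_type not in CHANNEL_TYPES:
        raise ValueError(f"unsupported channel type: {channel_type!r}")
    body = json.dumps(
        {"channel_type": channel_type, "name": name, "config": config}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/notifications/channels",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


req = build_create_channel_request(
    token="test-token",
    channel_type="slack",
    name="Pipeline Alerts",
    config={"webhook_url": "https://hooks.slack.com/services/..."},
)
print(req.get_method(), req.full_url)
```

To send the request, pass it to `urllib.request.urlopen(req)`. The shape of the response body is not covered here.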
Alert Rules
Alert rules define when to send notifications and through which channels.
Trigger Types
| Trigger | Description |
|---|---|
| on_failure | Fire when a flow run fails |
| on_success | Fire when a flow run succeeds |
| on_sla_breach | Fire when a run exceeds the sla_seconds threshold |
| on_retry | Fire when a task retries |
Creating Alert Rules
curl -X POST https://api.dagy.io/notifications/rules \
-H "Authorization: Bearer <token>" \
-d '{
"name": "ETL Failure Alert",
"trigger": "on_failure",
"flow_name": "daily_etl",
"channel_ids": ["ch_abc123"],
"sla_seconds": null
}'
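When rules are created from scripts, it can help to validate the payload before posting it. The following sketch builds the same JSON body as the curl example. The helper name `build_alert_rule` is hypothetical, and the checks (a non-empty channel list, a positive `sla_seconds` for `on_sla_breach`) are assumptions about sensible input rather than documented API constraints.

```python
import json

# The four trigger types listed in the table above.
TRIGGERS = {"on_failure", "on_success", "on_sla_breach", "on_retry"}


def build_alert_rule(name, trigger, flow_name, channel_ids, sla_seconds=None):
    """Return the request body for an alert rule as a dict."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger!r}")
    if not channel_ids:
        raise ValueError("at least one channel id is required")
    # Assumption: an SLA rule is only meaningful with a positive threshold.
    if trigger == "on_sla_breach" and (sla_seconds is None or sla_seconds <= 0):
        raise ValueError("on_sla_breach requires a positive sla_seconds")
    return {
        "name": name,
        "trigger": trigger,
        "flow_name": flow_name,
        "channel_ids": list(channel_ids),
        "sla_seconds": sla_seconds,
    }


rule = build_alert_rule("ETL Failure Alert", "on_failure", "daily_etl", ["ch_abc123"])
print(json.dumps(rule, indent=2))
```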
SLA Monitoring
Set trigger to "on_sla_breach" and provide an sla_seconds value (the threshold in seconds) to alert when a run exceeds its expected duration:
curl -X POST https://api.dagy.io/notifications/rules \
-H "Authorization: Bearer <token>" \
-d '{
"name": "ETL SLA Breach",
"trigger": "on_sla_breach",
"flow_name": "daily_etl",
"channel_ids": ["ch_abc123"],
"sla_seconds": 1800
}'
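With `sla_seconds` set to 1800, the rule targets runs that take longer than 30 minutes. The comparison can be sketched as follows. This illustrates the threshold arithmetic only: it assumes the duration is measured as wall-clock time from run start to run end, and the function name `breaches_sla` is hypothetical.

```python
from datetime import datetime, timezone


def breaches_sla(started_at: datetime, finished_at: datetime, sla_seconds: int) -> bool:
    """Return True when the run duration strictly exceeds the SLA threshold."""
    duration = (finished_at - started_at).total_seconds()
    return duration > sla_seconds


start = datetime(2026, 1, 15, 10, 0, 0, tzinfo=timezone.utc)
on_time = datetime(2026, 1, 15, 10, 30, 0, tzinfo=timezone.utc)  # exactly 1800 s
late = datetime(2026, 1, 15, 10, 31, 0, tzinfo=timezone.utc)     # 1860 s

print(breaches_sla(start, on_time, 1800))
print(breaches_sla(start, late, 1800))
```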
Audit Logging
All mutation operations are recorded in the audit log with before/after snapshots:
{
"org_id": "org_123",
"event_time": "2026-01-15T10:30:00Z#abc123",
"resource_type": "flow",
"resource_id": "daily_etl",
"action": "deploy",
"actor_email": "user@example.com",
"before_json": null,
"after_json": {"version": "3", "deployment": "prod"},
"ip_address": "203.0.113.1"
}
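When reviewing entries, it is often useful to see which fields a mutation changed. The sketch below compares `before_json` with `after_json` and splits `event_time` at the `#` character. Both helper names are hypothetical, and reading the part after `#` as a uniqueness suffix is an assumption based on the sample entry.

```python
def split_event_time(event_time: str) -> tuple[str, str]:
    """Split 'timestamp#suffix' into (timestamp, suffix)."""
    timestamp, _, suffix = event_time.partition("#")
    return timestamp, suffix


def changed_fields(before, after) -> dict:
    """Map each changed key to a (before, after) pair.

    A null snapshot is treated as an empty object.
    """
    before = before or {}
    after = after or {}
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(before.keys() | after.keys())
        if before.get(key) != after.get(key)
    }


entry = {
    "event_time": "2026-01-15T10:30:00Z#abc123",
    "before_json": None,
    "after_json": {"version": "3", "deployment": "prod"},
}

print(split_event_time(entry["event_time"]))
print(changed_fields(entry["before_json"], entry["after_json"]))
```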
Query the audit trail:
curl "https://api.dagy.io/audit-logs?resource_type=flow&limit=50" \
-H "Authorization: Bearer <token>"
Querying the audit log requires the admin.audit permission. See the Audit Logs API Reference for full details.
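For scripted queries, the URL can be assembled with the standard library so that filter values are escaped correctly. This sketch uses only the two query parameters shown in the curl example, `resource_type` and `limit`; the helper name `audit_logs_url` is hypothetical, and any other filter passed to it would be an assumption about the API.

```python
from urllib.parse import urlencode


def audit_logs_url(base_url="https://api.dagy.io", **filters) -> str:
    """Build the audit log query URL, skipping filters set to None."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = f"{base_url}/audit-logs"
    return f"{url}?{query}" if query else url


print(audit_logs_url(resource_type="flow", limit=50))
```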
Local Run Monitoring
When using run_local(), Dagy records run events and metadata locally:
| Location | Content |
|---|---|
| ~/.dagy/runs/<run_id>/run.log | Timestamped run log |
| ~/.dagy/runs/<run_id>/metadata.json | Run and task metadata |
| ~/.dagy/runs/<run_id>/task_runs/<task_id>/ | Per-task logs and metadata |
| ~/.dagy/dagy.duckdb | Historical run data |
For each task attempt, the local event recorder tracks the start and end times, duration, status, and captured stdout/stderr.
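These files can be inspected with a short script. The sketch below relies only on the directory layout in the table above. The helper names are hypothetical, and the contents of `metadata.json` are assumed to be a JSON object whose keys are not documented here, so the script simply returns whatever it finds.

```python
import json
from pathlib import Path


def run_dir(run_id: str, root=None) -> Path:
    """Return the directory for a local run (root defaults to ~/.dagy)."""
    base = Path(root) if root is not None else Path.home() / ".dagy"
    return base / "runs" / run_id


def load_run_metadata(run_id: str, root=None) -> dict:
    """Load metadata.json for a run; assumes the file holds a JSON object."""
    with open(run_dir(run_id, root) / "metadata.json", encoding="utf-8") as f:
        return json.load(f)


def list_task_ids(run_id: str, root=None) -> list[str]:
    """List task ids that have a directory under task_runs/."""
    task_root = run_dir(run_id, root) / "task_runs"
    if not task_root.is_dir():
        return []
    return sorted(p.name for p in task_root.iterdir() if p.is_dir())
```

For example, `load_run_metadata("<run_id>")` reads from `~/.dagy/runs/<run_id>/metadata.json`, and passing `root` points the helpers at a different directory, which is convenient in tests.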