Components
Dagy is composed of several distinct components working together. This document describes each component's responsibilities, interfaces, and implementation.
Control Plane
API Lambda
The central component serving all REST endpoints. Built with FastAPI and deployed to AWS Lambda via the Mangum adapter.
Responsibilities:
- Serve 114 REST API endpoints
- Authenticate requests (access tokens, API keys, JWT)
- Enforce RBAC permissions on all endpoints
- Apply rate limiting (token-bucket, 120 req/min)
- Audit-log all mutations
- Route events to SQS for async processing
Key files:
- src/dagy_api/app.py: FastAPI application and all endpoint definitions
- src/dagy_api/auth.py: Authentication and RBAC logic
- src/dagy_api/rate_limit.py: Token-bucket rate limiter middleware
- src/dagy_api/models.py: Pydantic request/response models
- src/dagy_api/config.py: Settings from environment variables
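The token-bucket limit of 120 requests per minute can be illustrated with a minimal sketch. This is not the code in rate_limit.py; the TokenBucket class, its fields, and the injectable now parameter are assumptions made for illustration. A bucket holds up to 120 tokens and refills at 2 tokens per second, so short bursts are allowed while the sustained rate stays at 120 req/min.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Hypothetical bucket sized for 120 requests per minute."""

    capacity: float = 120.0
    refill_per_second: float = 120.0 / 60.0  # 2 tokens per second
    tokens: float = 120.0
    updated_at: float = field(default_factory=time.monotonic)

    def allow(self, now: Optional[float] = None) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would typically respond with HTTP 429
```

In a real deployment each Lambda instance has its own memory, so a per-client bucket like this would need shared storage to enforce a global limit; how Dagy handles that is not stated here.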
Event Processor (SQS Consumer)
Processes asynchronous events from the SQS queue. It runs in the same Lambda function as the API, but is invoked by SQS instead of API Gateway.
Responsibilities:
- Execute flow runs via the selected backend
- Process scheduler ticks (find due schedules, trigger runs)
- Poll async backend status (Step Functions, ECS)
- Update run status and trigger notifications
Key files:
- src/dagy_api/events.py: Event routing and processing logic
- src/dagy_api/event_bus.py: SQS message publishing
- src/dagy_api/execution.py: Direct execution from artifact
Scheduler
Implemented as an EventBridge rule that sends scheduler_tick events to SQS every minute. The SQS consumer processes ticks by querying due schedules and triggering runs.
Responsibilities:
- Query due schedules (enabled schedules where next_run_epoch is in the past)
- Trigger runs for due schedules
- Update schedule metadata (next_run_at, last_triggered_at)
- Handle catchup policies (none, all)
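The tick handling above can be sketched as follows. The Schedule shape, the fixed interval_seconds field (used instead of a cron expression), and the meaning given to the catchup policies are all assumptions for illustration: here "all" triggers one run per missed slot, while "none" triggers only the most recent slot.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Schedule:
    name: str
    enabled: bool
    next_run_epoch: int    # next due time, seconds since the Unix epoch
    interval_seconds: int  # hypothetical: fixed interval instead of cron
    catchup: str = "none"  # "none" or "all"


def due_schedules(schedules: List[Schedule], now: int) -> List[Schedule]:
    """Enabled schedules whose next run time has already passed."""
    return [s for s in schedules if s.enabled and s.next_run_epoch <= now]


def process_tick(schedule: Schedule, now: int) -> List[int]:
    """Return the slots to trigger and advance next_run_epoch past now."""
    if schedule.interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    missed: List[int] = []
    slot = schedule.next_run_epoch
    while slot <= now:
        missed.append(slot)
        slot += schedule.interval_seconds
    schedule.next_run_epoch = slot
    # Assumed semantics: "all" replays every missed slot, "none" only the latest.
    return missed if schedule.catchup == "all" else missed[-1:]
```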
Execution Backends
Backend Abstraction
All backends implement the ExecutionBackend abstract base class:
class ExecutionBackend(ABC):
    @property
    def name(self) -> str: ...
    @property
    def capabilities(self) -> BackendCapabilities: ...
    def submit_run(...) -> BackendRunResult: ...
    def get_run_status(...) -> BackendRunResult: ...
    def cancel_run(...) -> BackendRunResult: ...
    def get_logs(...) -> List[BackendLogEntry]: ...
Lambda Backend
Synchronous execution within the Lambda runtime. Best for short, simple tasks.
| Property | Value |
|---|---|
| Max duration | 15 minutes (Lambda timeout) |
| Max memory | 10 GB |
| Parallel tasks | No (sequential) |
| Cost model | Per-invocation + duration |
| Native retry | No (Dagy SDK handles retries) |
Key file: src/dagy_api/backends/lambda_backend.py
Step Functions Backend
Orchestrates flows as AWS Step Functions state machines. Best for parallel workflows.
| Property | Value |
|---|---|
| Max duration | 1 year |
| Max memory | N/A (delegates to Lambda) |
| Parallel tasks | Yes (Parallel state) |
| Cost model | Per state transition ($0.000025) |
| Native retry | Yes (ASL Retry policies) |
Translates FlowSpec to Amazon States Language (ASL) with support for sequential chains, parallel branches, retry policies, and timeouts.
Key file: src/dagy_api/backends/step_functions.py
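As a rough illustration of the FlowSpec-to-ASL translation, the sketch below converts a sequential chain of tasks into a state machine definition with retry policies and timeouts. The TaskSpec shape and the to_asl function are hypothetical stand-ins, not the contents of step_functions.py, and parallel branches are left out for brevity. The ASL fields themselves (StartAt, States, Type, Resource, Next, End, Retry, TimeoutSeconds) are standard.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class TaskSpec:
    """Hypothetical, simplified stand-in for one task in a FlowSpec."""
    name: str
    resource_arn: str
    max_retries: int = 0
    timeout_seconds: int = 300


def to_asl(tasks: List[TaskSpec]) -> Dict[str, Any]:
    """Translate a sequential chain of tasks into an ASL definition."""
    if not tasks:
        raise ValueError("a flow needs at least one task")
    states: Dict[str, Any] = {}
    for index, task in enumerate(tasks):
        state: Dict[str, Any] = {
            "Type": "Task",
            "Resource": task.resource_arn,
            "TimeoutSeconds": task.timeout_seconds,
        }
        if task.max_retries > 0:
            # Retry on any error with exponential backoff (values are illustrative).
            state["Retry"] = [{
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 2,
                "BackoffRate": 2.0,
                "MaxAttempts": task.max_retries,
            }]
        if index + 1 < len(tasks):
            state["Next"] = tasks[index + 1].name
        else:
            state["End"] = True
        states[task.name] = state
    return {"StartAt": tasks[0].name, "States": states}
```

The returned dictionary can be serialized with json.dumps and passed as the definition when creating a state machine.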
ECS Fargate Backend
Runs flows as containerized tasks on ECS Fargate. Best for resource-intensive workloads.
| Property | Value |
|---|---|
| Max duration | Unlimited |
| Max memory | 120 GB |
| Parallel tasks | Yes |
| Cost model | Per vCPU-hour + GB-hour |
| Native retry | No (Dagy handles retries) |
Supports custom Docker images, flexible CPU/memory allocation, VPC networking, and CloudWatch log streaming.
Key file: src/dagy_api/backends/ecs.py
Backend Router
Automatically selects the optimal backend using a 5-level resolution strategy:
- Explicit: Per-run executor parameter override
- Deployment: default_executor on the deployment
- Flow: default_executor on the flow
- Rules engine: Duration, resource, and complexity rules
- Default: Falls back to Lambda
Routing rules:
- Memory > 3 GB → ECS
- Duration > 1 hour → ECS
- Duration 15 min–1 hour → Step Functions
- Task count > 50 → Step Functions
- Default → Lambda
Key files:
- src/dagy_api/backends/router.py: Router and rules
- src/dagy_api/backends/registry.py: Backend registration
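The resolution order and routing rules can be sketched as below. This is an illustration under stated assumptions, not the code in router.py: the RunRequest fields, the backend identifiers ("lambda", "step_functions", "ecs"), and the treatment of exactly 15 minutes as a Step Functions case are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRequest:
    """Hypothetical inputs to backend selection."""
    executor: Optional[str] = None            # per-run override
    deployment_default: Optional[str] = None  # default_executor on the deployment
    flow_default: Optional[str] = None        # default_executor on the flow
    memory_gb: float = 0.0
    duration_minutes: float = 0.0
    task_count: int = 0


def apply_rules(req: RunRequest) -> Optional[str]:
    """Rules engine: first matching rule wins, None if nothing matches."""
    if req.memory_gb > 3:
        return "ecs"
    if req.duration_minutes > 60:
        return "ecs"
    if req.duration_minutes >= 15:  # assumed boundary: Lambda caps at 15 minutes
        return "step_functions"
    if req.task_count > 50:
        return "step_functions"
    return None


def select_backend(req: RunRequest) -> str:
    """Walk the five levels in order and return the first one that is set."""
    return (
        req.executor
        or req.deployment_default
        or req.flow_default
        or apply_rules(req)
        or "lambda"
    )
```

Because explicit settings come first, a run with executor set to "lambda" stays on Lambda even if its memory estimate would otherwise route it to ECS.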
Persistence Layer
All data access goes through a repository layer that provides org-scoped CRUD methods. The platform manages 21 entities organized by domain:
| Domain | Entities |
|---|---|
| Execution | Flow, Deployment, Run, TaskRun, Schedule |
| Auth | User, AccessToken, AccessLog |
| Organization | Organization, Membership, ApiKey |
| Flow Builder | DagDraft |
| Billing | UsageEvent, UsageAggregate, Subscription |
| Enterprise | AuditLog, Secret, NotificationChannel, AlertRule, Environment, Sensor |
See Data Model for entity schemas and relationships.
S3 Artifact Store
Stores flow artifacts (ZIP packages) uploaded during deployment.
- Bucket enforces SSL and blocks public access
- Artifacts organized as: dagy/flows/{flow_name}/{version}/artifact.zip
- Retrieved by execution backends at run time
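A small helper shows how the artifact key layout above might be built. The function name artifact_key and its validation rule are assumptions for illustration; only the key pattern comes from this document.

```python
def artifact_key(flow_name: str, version: str) -> str:
    """Build the S3 object key for a flow artifact."""
    for part in (flow_name, version):
        # Reject empty parts and slashes so the key keeps its fixed depth.
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"dagy/flows/{flow_name}/{version}/artifact.zip"
```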
Enterprise Features
Audit Logging
Captures all mutation operations across all endpoints.
- Org-scoped audit entries
- Per-resource audit trail queries
- Records: actor_email, resource_type, resource_id, action, before/after JSON
- log_audit_event() called from every mutation endpoint (wrapped in try/except)
Key file: src/dagy_api/audit.py
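The try/except wrapping suggests that a failed audit write should not fail the mutation itself. A minimal sketch of that pattern follows; record_audit_event and its write hook are hypothetical names, not the signature of log_audit_event() in audit.py, and the created_at field is an added assumption.

```python
import json
import logging
import time
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("dagy.audit")


def record_audit_event(
    write: Callable[[Dict[str, Any]], None],  # hypothetical persistence hook
    *,
    org_id: str,
    actor_email: str,
    resource_type: str,
    resource_id: str,
    action: str,
    before: Optional[dict] = None,
    after: Optional[dict] = None,
) -> bool:
    """Persist one audit entry without ever raising into the endpoint."""
    try:
        write({
            "org_id": org_id,
            "actor_email": actor_email,
            "resource_type": resource_type,
            "resource_id": resource_id,
            "action": action,
            "before": json.dumps(before, sort_keys=True),
            "after": json.dumps(after, sort_keys=True),
            "created_at": int(time.time()),
        })
        return True
    except Exception:
        # Log and continue: the mutation has already been applied.
        logger.exception("audit write failed for %s %s", resource_type, resource_id)
        return False
```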
Secret Management
Fernet-encrypted secrets with per-org and per-environment scoping.
- Encryption key from DAGY_SECRETS_KEY environment variable
- Secrets encrypted at rest, decrypted only via GET /secrets/{name}/value
- Environment-specific overrides (e.g., different DB credentials per env)
Key file: src/dagy_api/secrets.py
Notifications
Multi-channel alerting system with configurable rules.
- Channel types: Slack, email, webhook, PagerDuty
- Alert triggers: on_failure, on_success, on_sla_breach, on_retry
- evaluate_and_dispatch() called from events.py on terminal run states
- Optional flow_name filter for flow-specific alerts
Key file: src/dagy_api/notifications.py
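Rule evaluation on a terminal run state might look like the sketch below. The AlertRule shape, the matching_rules function, and the run status names ("failed", "succeeded") are assumptions; only the trigger names and the optional flow_name filter come from this document. The SLA-breach and retry triggers are omitted because they are not tied to a terminal state.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed mapping from terminal run status to alert trigger.
TRIGGER_BY_STATUS = {"failed": "on_failure", "succeeded": "on_success"}


@dataclass
class AlertRule:
    channel: str                     # e.g. "slack", "email", "webhook", "pagerduty"
    trigger: str                     # e.g. "on_failure"
    flow_name: Optional[str] = None  # None means the rule applies to every flow


def matching_rules(rules: List[AlertRule], run_status: str, flow_name: str) -> List[AlertRule]:
    """Select the rules that should fire for a run reaching run_status."""
    trigger = TRIGGER_BY_STATUS.get(run_status)
    if trigger is None:
        return []  # non-terminal or unknown status: nothing to dispatch
    return [
        rule for rule in rules
        if rule.trigger == trigger and rule.flow_name in (None, flow_name)
    ]
```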
Health Monitoring
Component-level health checks with latency measurement.
- Checks database, S3, and SQS connectivity
- Returns healthy/degraded/unhealthy status
- /health (public) and /health/detailed (authenticated)
Key file: src/dagy_api/sla.py
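A component-level check with latency measurement can be sketched as follows. The run_checks function and the aggregation rule (no failures is healthy, some failures is degraded, all failures is unhealthy) are assumptions for illustration; the actual thresholds used by Dagy are not described here.

```python
import time
from typing import Any, Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, Any]:
    """Run each connectivity check, timing it and capturing failures."""
    components: Dict[str, Any] = {}
    for name, check in checks.items():
        started = time.perf_counter()
        try:
            ok = bool(check())
        except Exception:
            ok = False  # an exception counts as a failed check
        latency_ms = (time.perf_counter() - started) * 1000
        components[name] = {"ok": ok, "latency_ms": round(latency_ms, 2)}

    failures = sum(1 for result in components.values() if not result["ok"])
    if failures == 0:
        status = "healthy"
    elif failures < len(components):
        status = "degraded"
    else:
        status = "unhealthy"
    return {"status": status, "components": components}
```

The public endpoint could return only the status field, while the authenticated one returns the full per-component breakdown.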
Frontend
Next.js Application
React-based SPA with Clerk authentication, served via Vercel or CloudFront+S3.
Key pages:
| Route | Purpose |
|---|---|
| /dashboard | Usage summary, recent runs, top flows |
| /runs | Run list with status filtering |
| /flows | Flow catalog and versions |
| /flow-builder | Visual flow editor (React Flow) |
| /events | Schedules and sensors |
| /observability | Metrics and monitoring |
| /environments | Environment and secrets management |
| /team | RBAC and member management |
| /settings | Audit logs, notifications, API keys |
Key libraries:
- React Flow (@xyflow/react) for DAG canvas
- React Query (@tanstack/react-query) for API data fetching
- Clerk (@clerk/nextjs) for authentication
- Tailwind CSS for styling
Infrastructure (CDK)
All AWS resources are defined in infrastructure/dagy_stack.py using AWS CDK:
- 21 database tables (on-demand billing)
- 1 S3 bucket (artifacts, versioned)
- 1 SQS queue (event processing)
- 1 Lambda function (API + SQS consumer)
- 1 ECS cluster (Fargate capacity)
- 1 ECR repository (custom worker images)
- IAM roles for Lambda, Step Functions, ECS
- CloudWatch log groups