System Overview
Dagy is a Python-native DAG orchestration platform designed for building, scheduling, and monitoring data pipelines at scale. It follows a serverless-first architecture deployed on AWS.
Architecture Diagram
+--------------------------------------------------------------+
|                           Clients                            |
|  +----------+  +----------+  +----------+  +--------------+  |
|  | Next.js  |  | Dagy CLI |  | Python   |  | Webhooks /   |  |
|  | Frontend |  |          |  | SDK      |  | Sensors      |  |
|  +----+-----+  +----+-----+  +----+-----+  +------+-------+  |
+-------+-------------+-------------+---------------+----------+
        |             |             |               |
        v             v             v               v
+--------------------------------------------------------------+
|                     API Gateway (HTTPS)                      |
+----------------------------+---------------------------------+
                             |
+----------------------------v---------------------------------+
|                          API Lambda                          |
| +----------------------------------------------------------+ |
| |           FastAPI Application (Mangum Adapter)           | |
| |   +----------+ +----------+ +----------+ +-----------+   | |
| |   | Auth     | | Rate     | | RBAC     | | Audit     |   | |
| |   | Middle.. | | Limiter  | | Enforcer | | Logger    |   | |
| |   +----------+ +----------+ +----------+ +-----------+   | |
| |  +----------------------------------------------------+  | |
| |  |                 69 REST Endpoints                  |  | |
| |  | Flows | Runs | Schedules | Billing | Secrets | ... |  | |
| |  +----------------------------------------------------+  | |
| +----------------------------------------------------------+ |
+------+----------+----------+----------+----------+-----------+
       |          |          |          |          |
       v          v          v          v          v
 +----------+ +---------+ +--------+ +--------+ +------------+
 | Database | | S3      | | SQS    | | Stripe | | Backends   |
 | 21 tables| |Artifacts| | Events | |Billing | |            |
 +----------+ +---------+ +---+----+ +--------+ |+----------+|
                              |                 || Lambda   ||
                              v                 || Backend  ||
                        +-----------+           |+----------+|
                        | SQS Cons. |           || Step     ||
                        | (Lambda)  |---------->|| Functions||
                        +-----------+           |+----------+|
                                                || ECS      ||
                                                || Fargate  ||
                                                |+----------+|
                                                +------------+
Request Flow
Flow Registration
- Developer writes @flow/@task decorated Python code
- dagy build packages code + dependencies into a ZIP artifact
- dagy deploy uploads artifact to S3 and calls POST /flows
- API Lambda stores FlowSpec and creates a Deployment record
- If schedule is included, creates a Schedule record
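To make the registration steps concrete, here is a minimal sketch of what decorated code and a registration payload might look like. The task and flow decorators defined here are local stand-ins, not Dagy's SDK, and the spec fields, the registration_payload helper, and the artifact key are all hypothetical.

```python
# Hypothetical stand-ins for Dagy's @task and @flow decorators.
# The registry layout and spec fields below are illustrative assumptions.
from typing import Callable, Dict, List

_TASKS: List[str] = []  # names of tasks seen so far, in definition order


def task(fn: Callable) -> Callable:
    """Record the task name and return the function unchanged."""
    _TASKS.append(fn.__name__)
    return fn


def flow(fn: Callable) -> Callable:
    """Attach a spec dict describing the flow and the tasks defined so far."""
    fn.spec = {"name": fn.__name__, "tasks": list(_TASKS)}
    return fn


@task
def extract() -> List[int]:
    return [1, 2, 3]


@task
def load(rows: List[int]) -> int:
    return len(rows)


@flow
def etl() -> int:
    return load(extract())


def registration_payload(flow_fn: Callable, artifact_key: str, schedule: str = "") -> Dict:
    """Build the body a deploy step might send to POST /flows."""
    body = {"flow_spec": flow_fn.spec, "artifact_key": artifact_key}
    if schedule:
        body["schedule"] = schedule  # only present when a schedule is included
    return body
```

Calling registration_payload(etl, "artifacts/etl.zip") yields a body with the flow spec and artifact location but no schedule key, mirroring the optional Schedule record in the last step.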
Run Execution
- Client sends POST /runs with deployment name and parameters
- API Lambda validates request, checks RBAC and quota limits
- Run record created (status: QUEUED)
- Run trigger event published to SQS queue
- SQS Consumer Lambda picks up event
- BackendRouter selects optimal backend (Lambda, Step Functions, or ECS)
- Selected backend executes the flow:
  - Lambda: Inline execution within the consumer Lambda
  - Step Functions: Creates/starts state machine execution
  - ECS: Registers task definition, launches Fargate task
- Task results written (TaskRun records)
- Run status updated (SUCCEEDED/FAILED)
- Notifications evaluated and dispatched (if alert rules match)
- Usage metrics recorded
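The run lifecycle above can be sketched in memory as follows. This is an illustration under stated assumptions: a deque stands in for the SQS queue, a dict stands in for the runs table, and create_run, consume_one, and the record fields are hypothetical names rather than Dagy's actual code.

```python
# In-memory sketch of the run lifecycle; a deque stands in for the SQS queue.
# All names here (create_run, consume_one, the dict fields) are hypothetical.
from collections import deque
from typing import Callable, Deque, Dict
import uuid

RUNS: Dict[str, Dict] = {}    # stand-in for the runs table
QUEUE: Deque[Dict] = deque()  # stand-in for the SQS queue


def create_run(deployment: str, params: Dict) -> str:
    """API side: store a QUEUED run and publish a trigger event."""
    run_id = uuid.uuid4().hex
    RUNS[run_id] = {"deployment": deployment, "params": params, "status": "QUEUED"}
    QUEUE.append({"type": "run_trigger", "run_id": run_id})
    return run_id


def consume_one(execute: Callable[[Dict], None]) -> None:
    """Consumer side: take one event, execute the flow, record the outcome."""
    event = QUEUE.popleft()
    run = RUNS[event["run_id"]]
    try:
        execute(run["params"])
        run["status"] = "SUCCEEDED"
    except Exception:
        run["status"] = "FAILED"
```

The API call returns as soon as the event is queued; the status only moves past QUEUED once the consumer has processed the event, which is the decoupling the queue provides.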
Scheduling
- EventBridge sends scheduler_tick event to SQS every minute
- SQS Consumer processes tick, queries for due schedules
- For each due schedule, triggers a run (following the Run Execution flow above)
- Schedule next_run_at and last_triggered_at updated
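A single scheduler tick might be handled roughly as below. The Schedule dataclass, process_tick, and the use of fixed intervals are assumptions for illustration; only the next_run_at and last_triggered_at field names come from the steps above, and the source does not say how schedules are expressed.

```python
# Sketch of one scheduler tick. Fixed intervals are an assumption; the
# source does not say how schedules are expressed (cron, interval, etc.).
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Schedule:
    deployment: str
    interval: timedelta
    next_run_at: datetime
    last_triggered_at: Optional[datetime] = None


def process_tick(schedules: List[Schedule], now: datetime) -> List[str]:
    """Trigger every due schedule and advance its timestamps."""
    triggered = []
    for s in schedules:
        if s.next_run_at <= now:
            triggered.append(s.deployment)  # a real system would enqueue a run here
            s.last_triggered_at = now
            s.next_run_at = now + s.interval
    return triggered
```

Schedules whose next_run_at lies in the future are left untouched, so repeated ticks within the same interval do not trigger duplicate runs.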
Design Principles
Serverless-first: The entire control plane runs on Lambda with no always-on infrastructure. The database provides zero-maintenance persistence. SQS decouples API from execution.
Multi-backend: Flows are routed to the optimal backend based on their characteristics. Short tasks run on Lambda, parallel workflows on Step Functions, and resource-intensive jobs on ECS Fargate.
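One way such routing could work is sketched below. The FlowProfile fields, the select_backend function, and the ordering of the checks are hypothetical; the 900-second and 10240 MB thresholds are AWS Lambda's documented maximum timeout and memory, used here only as plausible cut-offs.

```python
# Hypothetical routing rule; the thresholds and field names are assumptions.
from dataclasses import dataclass


@dataclass
class FlowProfile:
    est_seconds: int        # expected runtime
    memory_mb: int          # peak memory requirement
    parallel_branches: int  # number of independent task branches


def select_backend(p: FlowProfile) -> str:
    """Pick a backend from coarse flow characteristics."""
    # 900 s is Lambda's maximum timeout and 10240 MB its maximum memory.
    if p.est_seconds > 900 or p.memory_mb > 10240:
        return "ecs"
    if p.parallel_branches > 1:
        return "step_functions"
    return "lambda"
```

Checking resource limits first means a heavy flow goes to ECS even when it is also parallel; a different priority order would be equally defensible.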
Multi-tenant: Every resource is scoped by org_id. Database queries always filter by organization. RBAC enforces role-based permissions within each org.
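The combination of org scoping and role checks can be illustrated with a small in-memory sketch. The role names, permission strings, and record layout are assumptions for illustration; only the org_id field comes from the description above.

```python
# Sketch of org scoping plus a role check. Role names, the permission
# table, and the record layout are illustrative assumptions.
from typing import Dict, List

ROLE_PERMISSIONS = {
    "viewer": {"runs:read"},
    "editor": {"runs:read", "runs:create"},
}

ALL_RUNS: List[Dict] = [
    {"id": "r1", "org_id": "org_a"},
    {"id": "r2", "org_id": "org_b"},
]


def require(role: str, permission: str) -> None:
    """Raise PermissionError unless the role grants the permission."""
    if permission not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"{role} lacks {permission}")


def list_runs(org_id: str, role: str) -> List[Dict]:
    """Return only the caller's organization's runs, after the RBAC check."""
    require(role, "runs:read")
    return [r for r in ALL_RUNS if r["org_id"] == org_id]
```

Because the org_id filter lives inside list_runs rather than in the caller, a handler cannot forget to apply it.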
Event-driven: All execution is asynchronous via SQS. Run triggers, schedule ticks, and status polls are all events. Sensors and webhooks provide external event ingestion.
Security by default: All mutation endpoints are audit-logged. Secrets are encrypted at rest with Fernet. RBAC is enforced on every endpoint. Rate limiting prevents abuse.