Architecture

System Overview

Dagy is a Python-native DAG orchestration platform designed for building, scheduling, and monitoring data pipelines at scale. It follows a serverless-first architecture deployed on AWS.

Architecture Diagram

+--------------------------------------------------------------+
|                         Clients                               |
|  +----------+  +----------+  +----------+  +--------------+  |
|  | Next.js  |  | Dagy CLI |  | Python   |  |  Webhooks /  |  |
|  | Frontend |  |          |  | SDK      |  |  Sensors     |  |
|  +----+-----+  +----+-----+  +----+-----+  +------+-------+  |
+-------+--------------+--------------+---------------+--------+
        |              |              |               |
        v              v              v               v
+--------------------------------------------------------------+
|                    API Gateway (HTTPS)                        |
+----------------------------+---------------------------------+
                             |
+----------------------------v---------------------------------+
|                     API Lambda                                |
|  +----------------------------------------------------------+|
|  |  FastAPI Application (Mangum Adapter)                     ||
|  |  +----------+ +----------+ +----------+ +-----------+    ||
|  |  | Auth     | | Rate     | | RBAC     | | Audit     |    ||
|  |  | Middle.. | | Limiter  | | Enforcer | | Logger    |    ||
|  |  +----------+ +----------+ +----------+ +-----------+    ||
|  |  +----------------------------------------------------------+
|  |  | 69 REST Endpoints                                      |
|  |  | Flows | Runs | Schedules | Billing | Secrets | ...     |
|  |  +----------------------------------------------------------+
|  +----------------------------------------------------------+|
+------+----------+----------+----------+----------+-----------+
       |          |          |          |          |
       v          v          v          v          v
+----------+ +--------+ +--------+ +--------+ +------------+
| Database | |   S3   | |  SQS   | | Stripe | | Backends   |
| 21 tables| |Artifacts| | Events | |Billing | |            |
+----------+ +--------+ +---+----+ +--------+ |+----------+|
                             |                  ||  Lambda   ||
                             v                  ||  Backend  ||
                      +-----------+             |+----------+|
                      | SQS Cons. |             ||  Step    ||
                      | (Lambda)  |------------>|| Functions||
                      +-----------+             |+----------+|
                                                ||  ECS     ||
                                                || Fargate  ||
                                                |+----------+|
                                                +------------+

Request Flow

Flow Registration

  1. Developer writes @flow/@task decorated Python code
  2. dagy build packages code + dependencies into a ZIP artifact
  3. dagy deploy uploads artifact to S3 and calls POST /flows
  4. API Lambda stores FlowSpec and creates a Deployment record
  5. If a schedule is included, the API Lambda also creates a Schedule record

Run Execution

  1. Client sends POST /runs with deployment name and parameters
  2. API Lambda validates request, checks RBAC and quota limits
  3. Run record created (status: QUEUED)
  4. Run trigger event published to SQS queue
  5. SQS Consumer Lambda picks up event
  6. BackendRouter selects optimal backend (Lambda, Step Functions, or ECS)
  7. Selected backend executes the flow:
    • Lambda: Inline execution within the consumer Lambda
    • Step Functions: Creates/starts state machine execution
    • ECS: Registers task definition, launches Fargate task
  8. Task results written (TaskRun records)
  9. Run status updated (SUCCEEDED/FAILED)
  10. Notifications evaluated and dispatched (if alert rules match)
  11. Usage metrics recorded
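
As a rough illustration of steps 3 to 5 and 7 to 9, the sketch below models the run lifecycle with in-memory stand-ins: a dict in place of the runs table and a deque in place of the SQS queue. The function names, event fields, and record layout are assumptions made for this example; validation, RBAC, quota checks, backend routing, notifications, and usage metrics are left out.

```python
from __future__ import annotations

import json
import uuid
from collections import deque

RUNS: dict[str, dict] = {}      # stand-in for the runs table
QUEUE: deque[str] = deque()     # stand-in for the SQS queue


def create_run(deployment: str, parameters: dict) -> str:
    """API side: persist a QUEUED run and publish a trigger event."""
    run_id = str(uuid.uuid4())
    RUNS[run_id] = {"deployment": deployment, "parameters": parameters, "status": "QUEUED"}
    QUEUE.append(json.dumps({"type": "run_trigger", "run_id": run_id}))
    return run_id


def consume_one(flows: dict) -> None:
    """Consumer side: take one event, execute the flow, record the outcome."""
    event = json.loads(QUEUE.popleft())
    run = RUNS[event["run_id"]]
    try:
        run["result"] = flows[run["deployment"]](**run["parameters"])
        run["status"] = "SUCCEEDED"
    except Exception as exc:  # any failure inside the flow marks the run FAILED
        run["error"] = repr(exc)
        run["status"] = "FAILED"


flows = {"add": lambda a, b: a + b}
ok = create_run("add", {"a": 1, "b": 2})
bad = create_run("add", {"a": 1})        # missing parameter, so the flow raises
consume_one(flows)
consume_one(flows)
```

The API call returns as soon as the event is queued; the final status only appears once the consumer has processed it, which is why clients poll or subscribe for the outcome.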

Scheduling

  1. EventBridge sends scheduler_tick event to SQS every minute
  2. SQS Consumer processes tick, queries for due schedules
  3. For each due schedule, triggers a run (entering the Run Execution flow above at step 2)
  4. Schedule next_run_at and last_triggered_at updated
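
The tick handling above can be sketched as follows. This is a minimal illustration, assuming fixed-interval schedules rather than cron expressions; the `Schedule` dataclass and `process_tick` function are hypothetical, and only the `next_run_at` and `last_triggered_at` field names come from the steps above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Schedule:
    deployment: str
    interval: timedelta                      # assumption: fixed interval, not cron
    next_run_at: datetime
    last_triggered_at: datetime | None = None


def process_tick(schedules: list[Schedule], now: datetime) -> list[str]:
    """Trigger every due schedule once and advance its next_run_at."""
    triggered = []
    for s in schedules:
        if s.next_run_at <= now:
            triggered.append(s.deployment)   # a real consumer would create a run here
            s.last_triggered_at = now
            # Skip past any missed ticks so next_run_at lands in the future.
            while s.next_run_at <= now:
                s.next_run_at += s.interval
    return triggered


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
schedules = [
    Schedule("hourly_report", timedelta(hours=1), now - timedelta(minutes=5)),
    Schedule("nightly_sync", timedelta(days=1), now + timedelta(hours=6)),
]
due = process_tick(schedules, now)
```

Advancing `next_run_at` in the same pass that triggers the run means a second tick delivered for the same minute finds nothing due, which matters because SQS standard queues can deliver a message more than once.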

Design Principles

Serverless-first: The entire control plane runs on Lambda with no always-on infrastructure. The database provides zero-maintenance persistence. SQS decouples API from execution.

Multi-backend: Flows are routed to the optimal backend based on their characteristics. Short tasks run on Lambda, parallel workflows on Step Functions, and resource-intensive jobs on ECS Fargate.
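
The routing criteria used by BackendRouter are not documented here, so the following is only a sketch of one plausible heuristic. The `FlowProfile` fields and the `select_backend` function are hypothetical; the two Lambda limits are AWS's published maximums for timeout and memory.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FlowProfile:
    est_seconds: int        # expected wall-clock duration
    memory_mb: int          # peak memory requirement
    parallel_branches: int  # number of tasks that can run concurrently


LAMBDA_MAX_SECONDS = 900        # AWS Lambda maximum timeout (15 minutes)
LAMBDA_MAX_MEMORY_MB = 10_240   # AWS Lambda maximum memory


def select_backend(profile: FlowProfile) -> str:
    """Pick a backend name from a flow's resource profile (illustrative rules)."""
    # Anything that cannot fit inside a single Lambda invocation goes to ECS.
    if profile.est_seconds > LAMBDA_MAX_SECONDS or profile.memory_mb > LAMBDA_MAX_MEMORY_MB:
        return "ecs"
    # Fan-out workflows benefit from Step Functions' parallel state handling.
    if profile.parallel_branches > 1:
        return "step_functions"
    return "lambda"
```

For example, `select_backend(FlowProfile(30, 512, 1))` returns `"lambda"`, while a two-hour job returns `"ecs"` regardless of its branch count.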

Multi-tenant: Every resource is scoped by org_id. Database queries always filter by organization. RBAC enforces role-based permissions within each org.
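
To make the org scoping concrete, here is a small sketch using an in-memory SQLite database. The `flows` table layout and the `list_flows` helper are assumptions for illustration and do not reflect Dagy's actual schema; the point is that `org_id` is a required argument of every query helper rather than an optional filter.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (id INTEGER PRIMARY KEY, org_id TEXT NOT NULL, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO flows (org_id, name) VALUES (?, ?)",
    [("org_a", "etl"), ("org_a", "report"), ("org_b", "etl")],
)


def list_flows(conn: sqlite3.Connection, org_id: str) -> list[str]:
    """Return flow names for one organization only."""
    rows = conn.execute(
        # The org_id predicate is bound as a parameter, never interpolated.
        "SELECT name FROM flows WHERE org_id = ? ORDER BY name",
        (org_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Two organizations can both own a flow called `etl` without seeing each other's rows, since no helper exists that queries the table without an `org_id`.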

Event-driven: All execution is asynchronous via SQS. Run triggers, schedule ticks, and status polls are all events. Sensors and webhooks provide external event ingestion.
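
One way a single consumer could handle several event kinds is to dispatch on a type field in the message body. The sketch below follows the standard shape of a Lambda SQS event (`Records`, `body`, `messageId`) and the partial batch response format, which only takes effect when ReportBatchItemFailures is enabled on the event source mapping. The message fields `type` and `payload` and the handler names are assumptions, not Dagy's actual wire format.

```python
from __future__ import annotations

import json

HANDLED: list[tuple[str, dict]] = []   # records what was dispatched, for illustration


def on_run_trigger(payload: dict) -> None:
    HANDLED.append(("run_trigger", payload))


def on_scheduler_tick(payload: dict) -> None:
    HANDLED.append(("scheduler_tick", payload))


def on_status_poll(payload: dict) -> None:
    HANDLED.append(("status_poll", payload))


HANDLERS = {
    "run_trigger": on_run_trigger,
    "scheduler_tick": on_scheduler_tick,
    "status_poll": on_status_poll,
}


def handler(event: dict, context: object = None) -> dict:
    """Dispatch each SQS record by type; report failed records for retry."""
    failures = []
    for record in event.get("Records", []):
        try:
            message = json.loads(record["body"])
            HANDLERS[message["type"]](message.get("payload", {}))
        except Exception:
            # Unknown types and malformed bodies are returned to the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"type": "run_trigger", "payload": {"run_id": "r1"}})},
    {"messageId": "m2", "body": json.dumps({"type": "scheduler_tick"})},
    {"messageId": "m3", "body": json.dumps({"type": "unknown"})},
]}
result = handler(sample_event)
```

Reporting failures per record lets the successfully processed messages in a batch be deleted while only the failed ones are retried.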

Security by default: All mutation endpoints are audit-logged. Secrets are encrypted at rest with Fernet. RBAC is enforced on every endpoint. Rate limiting prevents abuse.
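
As an illustration of how RBAC enforcement and audit logging might be combined on a mutation endpoint, the sketch below uses a decorator. The role names, permission strings, `requires` decorator, and audit record fields are all hypothetical; in the real application these concerns live in the middleware stack shown in the diagram.

```python
from __future__ import annotations

import functools
from typing import Callable

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "viewer": {"flows:read"},
    "editor": {"flows:read", "flows:write"},
    "admin": {"flows:read", "flows:write", "secrets:write"},
}
AUDIT_LOG: list[dict] = []   # stand-in for the audit table


class Forbidden(Exception):
    """Raised when the caller's role lacks the required permission."""


def requires(permission: str, mutation: bool = False) -> Callable:
    """Check the caller's role before the call; audit successful mutations."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(user: dict, *args, **kwargs):
            if permission not in ROLE_PERMISSIONS.get(user["role"], set()):
                raise Forbidden(f"{user['role']} lacks {permission}")
            result = fn(user, *args, **kwargs)
            if mutation:
                AUDIT_LOG.append(
                    {"org_id": user["org_id"], "actor": user["id"], "action": fn.__name__}
                )
            return result
        return wrapper
    return decorator


@requires("secrets:write", mutation=True)
def set_secret(user: dict, name: str, value: str) -> str:
    return f"stored {name}"   # a real handler would encrypt and persist the value
```

The audit entry never includes the secret value itself, only who performed which action in which organization.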