Dagy Documentation
Product docs, SDK references, API guides, architecture notes, and deployment runbooks, all accessible from the exported site.
Dagy provides a code-first SDK with @flow and @task decorators, a visual drag-and-drop DAG builder, three execution backends (Lambda, Step Functions, ECS Fargate), enterprise features (RBAC, audit logging, secrets management), and a complete SaaS billing stack.
Quick Navigation
Getting Started
- Installation: Install the Dagy SDK and CLI
- Quickstart: Build and run your first flow in 5 minutes
- Local Development: Develop and test flows locally
Core Concepts
- Core Model: Flows, tasks, and DAG structure
- Execution: How flows run across backends
- Retries & Timeouts: Fault tolerance configuration
- Scheduling: Cron, interval, and one-time schedules
- States & Lifecycle: Task and run state machine
CLI Reference
- CLI Reference: Complete command documentation (install, auth, build, deploy, run, logs)
- CLI Refactoring Plan: Enterprise-grade improvements roadmap
SDK Reference
- SDK Overview: Decorators, classes, and key functions
- SDK API Reference: Complete @flow, @task, and runtime API
- SDK Cookbook: 57 complete example flows covering all patterns
- Flow Lists: CLI commands for managing flows
REST API
- API Overview: Authentication, rate limiting, and conventions
- API Endpoints: Complete reference for 60+ endpoints
- Authentication: Access tokens, JWT, API keys, and RBAC
- Error Model: Error responses and status codes
- OpenAPI: Auto-generated OpenAPI specification
Architecture
- System Overview: High-level architecture and data flow
- Components: Control plane, executors, and UI
- Data Model: Entity schemas and relationships
- Flow Node Architecture: Node framework design, connector model, lifecycle, and persistence
Guides
- Flow Builder Guide: Visual drag-and-drop pipeline builder
- Creating Custom Nodes: Step-by-step guide to building custom flow nodes
- Integration Guides: S3, Kafka, databases, Slack, and more
- Deployment Guide: Self-hosted deployment on AWS
Configuration
- Configuration Overview: Environment variables and settings
- Backend Selection: Lambda, Step Functions, and ECS
- Executors: Executor configuration and routing
- Database: Database configuration
- Profiles: CLI profile management
- UI Customization: Frontend configuration
Operations
- Deployment: CDK deployment runbooks
- Monitoring: Health checks, metrics, and alerting
- Runbooks: Operational procedures
- Scheduler Troubleshooting: Debugging schedules
- Incident Response: Incident management
Troubleshooting
- Common Issues: Frequently encountered problems and solutions
Contributing
- Documentation Guide: How to contribute to these docs
Platform Capabilities
Dagy supports the full lifecycle of data pipeline development:
Define pipelines using Python decorators (@flow, @task), the FlowNode base class for custom nodes, or the visual DAG builder with 27 node types across 7 categories (including control flow). Nodes declare typed connectors that enforce data compatibility at connection time.
Execute on three backends: AWS Lambda for short tasks, Step Functions for parallel orchestration, and ECS Fargate for resource-intensive workloads. The BackendRouter automatically selects the optimal backend based on duration, memory, and complexity rules.
Schedule with cron expressions, fixed intervals, one-time execution, or manual triggers. Scheduling is timezone-aware, with configurable catchup policies.
Monitor with built-in health checks (component-level: database, S3, SQS), audit logging on all mutations, notification channels (Slack, email, webhook, PagerDuty), and alert rules for failure, success, SLA breach, and retry events.
Secure with 4-role RBAC (owner, admin, developer, viewer) covering 22 granular permissions, Fernet-encrypted secrets management, and API key authentication with scoped access.
Bill with usage metering (run count, compute seconds, API calls), plan-based quotas (free, pro, enterprise), and Stripe integration for checkout, subscriptions, and customer portal.
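As a rough illustration of the routing idea in the Execute paragraph above, the sketch below picks a backend from estimated duration, memory, and parallelism. The thresholds are Lambda's 15-minute and 10 GB limits, which these docs cite as the point where ECS Fargate becomes appropriate. `FlowProfile`, `choose_backend`, the returned strings, and the rule order are hypothetical names chosen for this example; they are not Dagy's actual `BackendRouter` implementation.

```python
from dataclasses import dataclass

# Lambda's hard limits: 15 minutes per invocation and 10 GB (10,240 MB) of memory.
LAMBDA_MAX_SECONDS = 15 * 60
LAMBDA_MAX_MEMORY_MB = 10_240


@dataclass(frozen=True)
class FlowProfile:
    """Hypothetical summary of a flow's resource needs (not a Dagy class)."""
    est_duration_s: int     # expected wall-clock time of the longest task
    memory_mb: int          # peak memory a single task needs
    parallel_branches: int  # number of independent DAG branches


def choose_backend(profile: FlowProfile) -> str:
    """Return 'ecs_fargate', 'step_functions', or 'lambda' (illustrative rules)."""
    # Work that cannot fit inside one Lambda invocation goes to Fargate.
    if (profile.est_duration_s > LAMBDA_MAX_SECONDS
            or profile.memory_mb > LAMBDA_MAX_MEMORY_MB):
        return "ecs_fargate"
    # Fan-out across several branches is handed to Step Functions.
    if profile.parallel_branches > 1:
        return "step_functions"
    # Short, small, linear flows run directly on Lambda.
    return "lambda"
```

For example, `choose_backend(FlowProfile(est_duration_s=3600, memory_mb=2048, parallel_branches=1))` returns `"ecs_fargate"` under these assumed rules, because an hour-long task exceeds the Lambda time limit.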
Architecture at a Glance
             ┌─────────────┐
             │ Next.js UI  │
             │ (Clerk Auth)│
             └──────┬──────┘
                    │
             ┌──────▼──────┐
             │   FastAPI   │
             │  (Lambda)   │
             └──┬───┬───┬──┘
                │   │   │
       ┌────────┘   │   └────────┐
       ▼            ▼            ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐
  │  Lambda  │ │   Step   │ │   ECS    │
  │ Backend  │ │Functions │ │ Fargate  │
  └──────────┘ └──────────┘ └──────────┘
       │            │            │
       └────────┐   │   ┌────────┘
                ▼   ▼   ▼
            ┌──────────────┐
            │   Database   │
            │ (21 tables)  │
            └──────────────┘
            ┌──────────────┐
            │ S3 Artifacts │
            └──────────────┘
            ┌──────────────┐
            │  SQS Events  │
            └──────────────┘
Support
For issues and feature requests, please use the GitHub issue tracker. For questions about the SDK, API, or deployment, refer to the relevant documentation section above or the Troubleshooting Guide.
Browse by Section
Getting Started
Concepts
Core Model
Definitions of Flow, Task, DAG, and how they relate.
Dependency Packages
Dependency packages let you bundle Python dependencies into reusable archives that can be attached to one or more flow deployments.
Execution
Dagy offers six runtime tiers to match your workflow's resource and duration needs. The system automatically routes each flow run to the appropriate tier. You can also override the tier explicitly with the `execution_mode` parameter at deploy time or the task level.
Namespaces, Versioning & Tags
Three metadata dimensions organize flows across teams, environments, and release cycles.
Retries & Timeouts
Dagy provides configurable retry policies, backoff strategies, jitter, and hard timeouts at both the task and flow level.
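To make the retry terms in the entry above concrete, here is a minimal sketch of exponential backoff with optional jitter, written with the standard library only. It does not use the Dagy SDK: `backoff_delays`, `run_with_retries`, and every parameter name are assumptions made for this example, and the hard-timeout side of the feature is not covered.

```python
import random
import time


def backoff_delays(max_retries, base_delay=1.0, factor=2.0,
                   max_delay=60.0, jitter=0.0, rng=random):
    """Yield the wait (in seconds) before each retry attempt."""
    for attempt in range(max_retries):
        # Exponential growth, capped so waits never exceed max_delay.
        delay = min(base_delay * factor ** attempt, max_delay)
        # Jitter spreads the delay by +/- (jitter * delay) to avoid retry storms.
        spread = delay * jitter
        yield max(0.0, delay + rng.uniform(-spread, spread))


def run_with_retries(fn, max_retries=3, sleep=time.sleep, **backoff):
    """Call fn(); on failure wait and retry, re-raising after max_retries."""
    delays = backoff_delays(max_retries, **backoff)
    while True:
        try:
            return fn()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                # Retries are exhausted: surface the last error to the caller.
                raise
            sleep(delay)
```

Passing `sleep` as a parameter keeps the helper testable: a test can record the requested delays instead of actually waiting.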
Python SDK
SDK API Reference
Complete reference for all public classes, decorators, and functions in the Dagy SDK.
Dagy SDK Cookbook
A comprehensive guide with 50+ practical examples for building workflows with the Dagy Python SDK. Each example is complete, runnable, and demonstrates real-world patterns.
SDK Examples
This page provides a quick overview of common Dagy SDK patterns. For the complete collection of 57 example flows covering all use cases, see the [SDK Cookbook](cookbook.md).
Flow Lists
List all deployed flows using the CLI or the Python SDK.
SDK Overview
The Dagy SDK is a Python library for defining, building, deploying, and running data pipelines. It provides a decorator-based API for defining flows and tasks, local execution for development, and packaging tools for cloud deployment.
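The decorator-based style described in the SDK Overview entry can be pictured with a small, self-contained sketch. The `task` and `flow` decorators below are local stand-ins written for this page so the example runs without the SDK installed; they are not the Dagy SDK's implementations, and the real decorators' import path, arguments, and behavior should be taken from the SDK API Reference.

```python
import functools

RUN_LOG = []  # records task names in the order they execute


def task(fn):
    """Stand-in task decorator: log the task name, then run the function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        RUN_LOG.append(fn.__name__)
        return fn(*args, **kwargs)
    return wrapper


def flow(fn):
    """Stand-in flow decorator: reset the log, then run the flow body locally."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        RUN_LOG.clear()
        return fn(*args, **kwargs)
    return wrapper


@task
def extract():
    return [1, 2, 3]


@task
def transform(rows):
    return [row * 10 for row in rows]


@task
def load(rows):
    return sum(rows)


@flow
def etl():
    # Passing one task's result to the next expresses the DAG edges
    # extract -> transform -> load.
    return load(transform(extract()))
```

Calling `etl()` runs the three tasks in dependency order in the current process, which is the kind of local execution the overview mentions for development.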
Backend API
Authentication & Authorization
Dagy supports multiple authentication methods and enforces role-based access control (RBAC) on all API endpoints.
Dagy API Reference
Complete API endpoint documentation for the Dagy platform. All endpoints require authentication via Bearer token or API key unless otherwise noted.
API Error Model
All API errors are returned as JSON with an HTTP status code and a `detail` field describing the error.
OpenAPI Specification
Dagy's API is built with FastAPI, which automatically generates an OpenAPI 3.0 specification from the endpoint definitions, Pydantic models, and type annotations.
API Overview
The Dagy REST API provides programmatic access to all platform features: flow management, run execution, scheduling, team management, billing, secrets, notifications, and monitoring.
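The Backend API entries above mention Bearer-token authentication and JSON error bodies with a `detail` field. The sketch below shows one way a client might build the header and surface such an error. It makes no network calls; `auth_headers`, `ApiError`, `raise_for_error`, and the sample error text are hypothetical, and the API-key header is omitted because its name is not stated here.

```python
import json


def auth_headers(token):
    """Build request headers for Bearer-token authentication."""
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}


class ApiError(Exception):
    """Client-side error carrying the HTTP status and the `detail` text."""

    def __init__(self, status, detail):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def raise_for_error(status, body):
    """Raise ApiError for 4xx/5xx responses; return None otherwise."""
    if status < 400:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None  # body was not JSON (for example, a proxy error page)
    # Prefer the documented `detail` field; fall back to the raw body.
    if isinstance(payload, dict):
        detail = payload.get("detail", body)
    else:
        detail = body
    raise ApiError(status, detail)
```

A caller would pass the status code and body from whichever HTTP client it uses, for example `raise_for_error(404, '{"detail": "Flow not found"}')`, and catch `ApiError` to read `.status` and `.detail`.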
Configuration
Backend Configuration
Environment variables and service settings for the control plane.
Database Configuration
Dagy uses managed database tables for control-plane metadata. The infrastructure provisions all required tables automatically via CDK.
Executors Configuration
Dagy uses **runtime tiers** as the primary user-facing concept for configuring execution environments. Each runtime tier is backed by one of three execution backends: **Lambda**, **Step Functions**, and **ECS Fargate**. These are selected automatically based on your tier choice and flow requirements. The backend router handles the internal infrastructure mapping, ensuring your flows run with the right amount of compute resources.
Configuration Overview
Configuration layering and defaults.
Profiles
Dagy supports profile-based configuration (similar to AWS CLI profiles) to manage multiple environments from a single machine.
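For readers unfamiliar with AWS-CLI-style profiles, the sketch below parses an INI-style file with one section per environment and returns the settings for a named profile. The file layout, the `api_url` key, the example URLs, and `load_profile` are assumptions for illustration only; the Profiles page documents Dagy's actual file location and keys.

```python
import configparser

# Hypothetical profile file content: one section per environment.
SAMPLE_PROFILES = """
[default]
api_url = https://api.example.com

[staging]
api_url = https://staging.example.com
"""


def load_profile(text, name="default"):
    """Return the settings of one named profile as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(name):
        raise KeyError(f"unknown profile: {name}")
    return dict(parser[name])
```

With this layout, `load_profile(SAMPLE_PROFILES, "staging")` selects the staging settings, and omitting the name falls back to the `default` profile.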
Architecture
Components
Dagy is composed of several distinct components working together. This document describes each component's responsibilities, interfaces, and implementation.
Data Model
Dagy organizes platform data into domain-specific entities. This document describes the core entities, their relationships, and access patterns.
ECS Fargate Execution Framework Architecture
The Dagy ECS Fargate execution framework enables distributed execution of DAG workloads on AWS ECS Fargate, suitable for long-running workloads that exceed Lambda's 15-minute execution limit or require more than 10 GB of memory. The framework follows a control-plane pattern where Lambda functions coordinate DAG launches while containerized ECS Fargate tasks execute the actual work.
Dagy Flow Node Architecture
System Overview
Dagy is a Python-native DAG orchestration platform designed for building, scheduling, and monitoring data pipelines at scale. It follows a serverless-first architecture deployed on AWS.
Guides
Creating Custom Flow Nodes
This guide walks you through building custom nodes for the Dagy Flow Designer, from a minimal example to production-grade patterns. Custom nodes extend the platform's capabilities by adding domain-specific processing steps that integrate seamlessly with the visual builder and execution engine.
Dependency Packages
Dependency packages let you bundle Python libraries that your flows need at runtime. Instead of managing `requirements.txt` files manually, Dagy provides a managed experience: search PyPI, validate compatibility, and build a deployable ZIP artifact. All of this works from the UI or API.
Dagy Self-Hosted Deployment Guide
This guide provides comprehensive instructions for deploying Dagy, a DAG orchestration platform, in a self-hosted AWS environment. It covers infrastructure deployment, configuration, and operational best practices.
ECS Fargate Execution Guide
ECS Fargate execution allows you to run Dagy DAG flows as containerized workloads in AWS ECS, suitable for workloads exceeding Lambda's 15-minute timeout or requiring more than 10 GB of memory. This guide covers setup, usage, and operational patterns.
Flow Builder User Guide
Welcome to the Dagy Flow Builder – a powerful, intuitive visual interface for creating data pipelines without writing code. This guide will walk you through everything you need to know to build, deploy, and manage your data workflows.
Operations
Deployment
Infrastructure deployment process for Dagy.
ECS Fargate Operations Runbook
The Dagy ECS infrastructure is defined in `infrastructure/dagy_stack.py` using AWS CDK. Before deploying, configure your stack settings in the StackConfig dataclass.
Incident Response
This page covers how Dagy captures exceptions, provides diagnostic data, and supports incident investigation.
Monitoring
Dagy provides health checks, notification channels, and alert rules for monitoring your pipelines in production.
Runbooks
Operational playbooks and procedures.
Troubleshooting
Overview
Dagy CLI Refactoring Plan: Enterprise-Grade Standards
This document identifies implementation gaps, inconsistencies, and missing enterprise patterns in the Dagy CLI (`src/dagy/cli/`), and outlines a prioritized plan to bring it to production-ready quality.
Dagy CLI Reference
Complete reference for the `dagy` command-line interface, the primary tool for building, deploying, and operating Dagy workflows.