Dagy Self-Hosted Deployment Guide
This guide provides comprehensive instructions for deploying Dagy, a DAG orchestration platform, in a self-hosted AWS environment. It covers infrastructure deployment, configuration, and operational best practices.
Table of Contents
- Prerequisites
- Architecture Overview
- Infrastructure Deployment
- Environment Variables Reference
- Frontend Deployment
- Backend Configuration
- Security Configuration
- Monitoring & Observability
- Scaling & Performance
- Backup & Disaster Recovery
- Upgrading
- Troubleshooting
Prerequisites
AWS Account Requirements
- Active AWS account with appropriate IAM permissions
- Access to AWS CloudFormation, Lambda, S3, SQS, EC2, ECS, and IAM services
- EC2 key pair created (for ECS cluster access if needed)
Local Development Environment
- Node.js 18+ (for frontend builds)
- Python 3.11+ (for backend and CDK)
- AWS CLI v2 configured with appropriate credentials
- AWS CDK CLI v2+:
  npm install -g aws-cdk
  cdk --version # Should be v2.x.x or higher
- Docker 20.10+ (for building Lambda container images):
  docker --version
  docker login # To push to ECR
- Git for cloning the repository
- uv (Python package manager - recommended):
  curl -LsSf https://astral.sh/uv/install.sh | sh
External Services
- Clerk Account (for frontend authentication)
- Create account at https://clerk.com
- Obtain API keys and webhook signing secret
- Stripe Account (optional, for billing features)
- Create account at https://stripe.com
- Obtain publishable and secret keys, plus product price IDs
AWS IAM Permissions
The IAM user deploying Dagy needs these permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"apigateway:*",
"cloudformation:*",
"dynamodb:*",
"ec2:*",
"ecr:*",
"ecs:*",
"events:*",
"iam:*",
"lambda:*",
"logs:*",
"s3:*",
"sns:*",
"sqs:*",
"states:*"
],
"Resource": "*"
}
]
}
Architecture Overview
Component Diagram
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                      │
│           Hosted on Vercel, CloudFront+S3, or ECS           │
│                   (Clerk Authentication)                    │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTPS
                         ▼
┌─────────────────────────────────────────────────────────────┐
│             API Gateway (HTTP) + JWT Authorizer             │
└────────────────────────┬────────────────────────────────────┘
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
   ┌─────────┐    ┌─────────────┐    ┌──────────┐
   │ Lambda  │    │    Step     │    │   ECS    │
   │  (API)  │    │  Functions  │    │ Fargate  │
   │ Runner  │    │ (Workflow)  │    │ (Tasks)  │
   └────┬────┘    └──────┬──────┘    └────┬─────┘
        │                │                │
        └────────────────┼────────────────┘
                         │
        ┌────────────────┼────────────────┐
        ▼                ▼                ▼
   ┌─────────┐    ┌──────────────┐   ┌─────────┐
   │Database │    │      S3      │   │   SQS   │
   │ (21 tbl)│    │ (Artifacts)  │   │ (Events)│
   └─────────┘    └──────────────┘   └─────────┘
Components Overview
| Component | Purpose | Technology |
|---|---|---|
| Frontend | Web UI for DAG management, monitoring, scheduling | Next.js 14+, Clerk Auth |
| API Layer | REST API for flows, runs, deployments, scheduling | FastAPI + Mangum on Lambda |
| Lambda Runner | Default execution backend for tasks | Python 3.12 Lambda Runtime |
| Step Functions | Workflow state machine execution | AWS Step Functions |
| ECS Fargate | Long-running task execution | ECS on Fargate |
| Database | Core data persistence (21 tables) | Managed |
| S3 | Flow artifacts, logs, task outputs | S3 Buckets |
| SQS | Event queue for async task scheduling | SQS Standard Queue |
Data Flow Examples
Flow Registration
User submits flow artifact → API Lambda validates →
Store in S3 → Create FLOWS table entry →
Return flow ID to user
Run Execution
Trigger run request → API Lambda creates RUN record →
Enqueue to SQS → Execution backend polls SQS →
Execute tasks → Update RUN/TASK_RUNS status →
Store artifacts in S3 → Send completion event
Scheduling
Schedule created in UI → Store in SCHEDULES table →
EventBridge rule triggers API Lambda →
Create run via API → Follow normal run execution flow
Infrastructure Deployment
Step 1: Clone Repository and Install Dependencies
# Clone the Dagy repository
git clone https://github.com/equinox-data/dagy.git
cd dagy
# Install Python dependencies
uv sync --extra api
# Install Node dependencies (for frontend build, if deploying together)
cd web
npm install
cd ..
Step 2: Configure CDK Context Parameters
Create environment-specific configuration files in the infrastructure/ directory. The CDK app expects YAML files named after environments.
Example: infrastructure/develop.yml
---
environment: develop
region: us-east-1
app: dagy
owner: data
company: my-company
project_cost: engineering
aws_account: "123456789012"
python_version: "3.12"
dagy:
  ecr_repository_name: "dagy-service-worker-develop"
  ecr_push_principals:
    - "arn:aws:iam::123456789012:role/eqx-buildmaster"
  # Optional VPC configuration (for private environments)
  # vpc_id: "vpc-12345678"
  # subnet_ids:
  #   - "subnet-12345678"
  #   - "subnet-87654321"
  # security_group_ids:
  #   - "sg-12345678"
  # JWT authentication
  # jwt_required: true
  # jwt_issuer: "https://your-auth-provider.com"
  # jwt_audience: "your-api-audience"
  # jwks_url: "https://your-auth-provider.com/.well-known/jwks.json"
  # CORS configuration
  api_cors_allowed_origins:
    - "http://localhost:3000"
    - "https://yourdomain.com"
  # Lambda container image (leave blank to auto-build)
  # lambda_image_uri: "123456789012.dkr.ecr.us-east-1.amazonaws.com/dagy-service-worker-develop:latest"
  # lambda_image_tag: "latest"
Example: infrastructure/production.yml
---
environment: production
region: us-east-1
app: dagy
owner: data
company: my-company
project_cost: engineering
aws_account: "987654321098"
python_version: "3.12"
dagy:
  ecr_repository_name: "dagy-service-worker-prod"
  ecr_push_principals:
    - "arn:aws:iam::987654321098:role/ci-cd-role"
  vpc_id: "vpc-prod123456"
  subnet_ids:
    - "subnet-prod111111"
    - "subnet-prod222222"
  security_group_ids:
    - "sg-prod123456"
  jwt_required: true
  jwt_issuer: "https://your-auth-provider.com"
  jwt_audience: "dagy-api"
  jwks_url: "https://your-auth-provider.com/.well-known/jwks.json"
  api_cors_allowed_origins:
    - "https://dagy.yourdomain.com"
    - "https://api.dagy.yourdomain.com"
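A quick pre-flight check can catch configuration mistakes before `cdk deploy` runs. The sketch below is not part of the Dagy CDK app; `validate_config`, the set of keys it treats as required, and the error messages are illustrative assumptions based on the two example files above. It takes the already-parsed YAML as a plain dictionary.

```python
from typing import Any

# Keys assumed to be mandatory, based on the example files above (an assumption).
REQUIRED_TOP_LEVEL = ("environment", "region", "app", "aws_account", "dagy")
JWT_KEYS = ("jwt_issuer", "jwt_audience", "jwks_url")


def validate_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed environment config."""
    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in config]

    # The account ID is quoted in the YAML so it stays a string of 12 digits.
    account = config.get("aws_account")
    if account is not None:
        if not (isinstance(account, str) and len(account) == 12 and account.isdigit()):
            problems.append("aws_account must be a 12-digit string")

    # When JWT is required, the three related settings must be present too.
    dagy = config.get("dagy") or {}
    if dagy.get("jwt_required"):
        problems += [
            f"jwt_required is true but dagy.{key} is not set"
            for key in JWT_KEYS
            if not dagy.get(key)
        ]
    return problems
```

An empty list means the file passed these checks; anything else can be printed and used to abort the deployment script.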
Step 3: Build Lambda Container Image
The CDK deployment includes an automatic image build step via publish_image.sh. This script builds a Docker image optimized for AWS Lambda.
# Navigate to infrastructure directory
cd infrastructure
# Set environment variables
export DAGY_ENVIRONMENT=develop
export AWS_REGION=us-east-1
# The CDK app will automatically build and push the image to ECR
# Or manually build:
docker build -t dagy-api:latest ..
The Docker image is based on public.ecr.aws/lambda/python:3.12 and includes all necessary dependencies (FastAPI, Mangum, boto3, etc.).
Step 4: Bootstrap and Deploy CDK Stack
# Bootstrap CDK (one-time setup per AWS account/region)
cd infrastructure
cdk bootstrap aws://ACCOUNT-ID/REGION \
--profile your-aws-profile
# Example:
cdk bootstrap aws://123456789012/us-east-1 --profile default
# Deploy the stack
cdk deploy --env develop \
--require-approval never \
--profile your-aws-profile
# Or with explicit parameters:
cdk deploy --env develop \
-c environment=develop \
-c region=us-east-1 \
-c account=123456789012 \
-c aws_profile=default \
--require-approval never
The deployment process will:
- Validate environment configuration
- Build and push Lambda Docker image to ECR
- Create/update CloudFormation stack with all resources
- Print the stack outputs with resource names and endpoints
Step 5: Collect Deployment Outputs
After successful deployment, the CDK outputs will include:
Outputs:
dagy-development.APIEndpoint = https://xyz123.execute-api.us-east-1.amazonaws.com
dagy-development.FlowsTableName = dagy-flows-development
dagy-development.DeploymentsTableName = dagy-deployments-development
dagy-development.RunsTableName = dagy-runs-development
dagy-development.TaskRunsTableName = dagy-task-runs-development
dagy-development.SchedulesTableName = dagy-schedules-development
dagy-development.UsersTableName = dagy-users-development
dagy-development.AccessTokensTableName = dagy-access-tokens-development
dagy-development.AccessLogsTableName = dagy-access-logs-development
dagy-development.OrganizationsTableName = dagy-organizations-development
dagy-development.MembershipsTableName = dagy-memberships-development
dagy-development.APIKeysTableName = dagy-api-keys-development
dagy-development.DAGDraftsTableName = dagy-dag-drafts-development
dagy-development.UsageEventsTableName = dagy-usage-events-development
dagy-development.UsageAggregatesTableName = dagy-usage-aggregates-development
dagy-development.SubscriptionsTableName = dagy-subscriptions-development
dagy-development.AuditLogsTableName = dagy-audit-logs-development
dagy-development.SecretsTableName = dagy-secrets-development
dagy-development.NotificationChannelsTableName = dagy-notification-channels-development
dagy-development.AlertRulesTableName = dagy-alert-rules-development
dagy-development.EnvironmentsTableName = dagy-environments-development
dagy-development.SensorsTableName = dagy-sensors-development
dagy-development.ArtifactBucketName = dagy-artifacts-123456789012-us-east-1-development
dagy-development.EventsQueueURL = https://sqs.us-east-1.amazonaws.com/123456789012/dagy-events-development
dagy-development.LambdaFunctionName = dagy-lambda-development
dagy-development.EventBridgeRuleArn = arn:aws:events:us-east-1:123456789012:rule/dagy-schedule-rule-development
Save these outputs for the next configuration steps.
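One way to keep these values is to parse the printed lines into a dictionary. The sketch below assumes the `stack.OutputName = value` line format shown above and the naming pattern visible in this guide, where an output such as `TaskRunsTableName` corresponds to the variable `DAGY_TASK_RUNS`. The helper names `parse_outputs` and `table_env_vars` are hypothetical.

```python
import re

# Matches lines such as "dagy-development.APIEndpoint = https://..."
OUTPUT_LINE = re.compile(r"^\s*(?P<stack>[\w-]+)\.(?P<name>\w+)\s*=\s*(?P<value>\S+)\s*$")
# Zero-width boundaries between CamelCase words, keeping acronyms such as "API" whole.
WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def parse_outputs(text: str) -> dict[str, str]:
    """Map each output name to its value, ignoring lines that are not outputs."""
    outputs = {}
    for line in text.splitlines():
        match = OUTPUT_LINE.match(line)
        if match:
            outputs[match["name"]] = match["value"]
    return outputs


def table_env_vars(outputs: dict[str, str]) -> dict[str, str]:
    """Derive DAGY_* table variables from the *TableName outputs."""
    env = {}
    for name, value in outputs.items():
        if name.endswith("TableName"):
            stem = name[: -len("TableName")]
            env["DAGY_" + WORD_BOUNDARY.sub("_", stem).upper()] = value
    return env
```

The result of `table_env_vars(parse_outputs(text))` can be compared against the Environment Variables Reference to confirm that every table has a value.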
Environment Variables Reference
The Dagy API Lambda requires the following environment variables to be set. These are automatically configured by the CDK stack based on the resources it creates.
Tables (Core)
| Variable | Description | Required |
|---|---|---|
| DAGY_FLOWS | Flows table name | Yes |
| DAGY_DEPLOYMENTS | Deployments table name | Yes |
| DAGY_RUNS | Flow runs table name | Yes |
| DAGY_TASK_RUNS | Task runs table name | Yes |
| DAGY_SCHEDULES | Schedules table name | Yes |
Tables (Authentication & Access)
| Variable | Description | Required |
|---|---|---|
| DAGY_USERS | Users table name | Yes |
| DAGY_ACCESS_TOKENS | Access tokens table name | Yes |
| DAGY_ACCESS_LOGS | Access logs for auditing | Yes |
Tables (Organizations & Teams)
| Variable | Description | Required |
|---|---|---|
| DAGY_ORGANIZATIONS | Organizations table name | Yes |
| DAGY_MEMBERSHIPS | Organization memberships | Yes |
| DAGY_API_KEYS | API keys for programmatic access | Yes |
Tables (Flow Builder)
| Variable | Description | Required |
|---|---|---|
| DAGY_DAG_DRAFTS | Unsaved DAG drafts | Yes |
Tables (Billing & Usage)
| Variable | Description | Required |
|---|---|---|
| DAGY_USAGE_EVENTS | Individual API call events | Yes |
| DAGY_USAGE_AGGREGATES | Aggregated usage metrics | Yes |
| DAGY_SUBSCRIPTIONS | Subscription/plan information | Yes |
Tables (Enterprise Features)
| Variable | Description | Required |
|---|---|---|
| DAGY_AUDIT_LOGS | Detailed audit trail | Yes |
| DAGY_SECRETS | Encrypted secrets storage | Yes |
| DAGY_NOTIFICATION_CHANNELS | Alert notification destinations | Yes |
| DAGY_ALERT_RULES | Alert rule definitions | Yes |
| DAGY_ENVIRONMENTS | Deployment environments | Yes |
| DAGY_SENSORS | Sensor/trigger configurations | Yes |
Storage & Queue
| Variable | Description | Required |
|---|---|---|
| DAGY_ARTIFACT_BUCKET | S3 bucket for flow artifacts | Yes |
| DAGY_EVENTS_QUEUE_URL | SQS queue URL for task events | Yes |
| DAGY_FLOW_EXECUTOR_FUNCTION | Lambda function name/ARN for the Flow Executor backend | No (required for in-process/task-isolated execution modes) |
Secrets & Encryption
| Variable | Description | Required |
|---|---|---|
| DAGY_SECRETS_KEY | Fernet key for encrypting secrets | No (required if secrets used) |
Format: Base64-encoded Fernet key. Generate with:
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
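A Fernet key is 32 random bytes encoded as URL-safe base64, which gives a 44-character string. The sketch below checks that shape using only the standard library, so a mistyped or truncated `DAGY_SECRETS_KEY` can be caught before deployment. The helper name `is_valid_fernet_key` is hypothetical and not part of Dagy or the `cryptography` package.

```python
import base64


def is_valid_fernet_key(key: str) -> bool:
    """Return True if key decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except ValueError:
        # Covers non-ASCII input and malformed base64 (binascii.Error is a ValueError).
        return False
    return len(raw) == 32
```

This only validates the format; it does not prove that the key matches the one used to encrypt existing secrets.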
JWT Authentication
| Variable | Description | Required |
|---|---|---|
| DAGY_JWT_REQUIRED | Enable JWT authentication (true/false) | No (default: false) |
| DAGY_JWT_ISSUER | JWT issuer URL | Yes if JWT required |
| DAGY_JWT_AUDIENCE | JWT audience claim | Yes if JWT required |
| DAGY_JWKS_URL | JWKS endpoint URL | Yes if JWT required |
| DAGY_ACCESS_TOKEN_TTL_SECONDS | Token expiration in seconds | No (default: 86400 = 24 hours) |
Stripe (Optional - For Billing)
| Variable | Description | Required |
|---|---|---|
| STRIPE_SECRET_KEY | Stripe API secret key | No (required for billing) |
| STRIPE_WEBHOOK_SECRET | Stripe webhook signing secret | No (required for billing) |
| STRIPE_PRICE_PRO | Stripe price ID for Pro plan | No |
| STRIPE_PRICE_ENTERPRISE | Stripe price ID for Enterprise plan | No |
Example Lambda Environment Variables Configuration
Via AWS Lambda console or CDK, set:
DAGY_FLOWS=dagy-flows-development
DAGY_DEPLOYMENTS=dagy-deployments-development
DAGY_RUNS=dagy-runs-development
DAGY_TASK_RUNS=dagy-task-runs-development
DAGY_SCHEDULES=dagy-schedules-development
DAGY_USERS=dagy-users-development
DAGY_ACCESS_TOKENS=dagy-access-tokens-development
DAGY_ACCESS_LOGS=dagy-access-logs-development
DAGY_ORGANIZATIONS=dagy-organizations-development
DAGY_MEMBERSHIPS=dagy-memberships-development
DAGY_API_KEYS=dagy-api-keys-development
DAGY_DAG_DRAFTS=dagy-dag-drafts-development
DAGY_USAGE_EVENTS=dagy-usage-events-development
DAGY_USAGE_AGGREGATES=dagy-usage-aggregates-development
DAGY_SUBSCRIPTIONS=dagy-subscriptions-development
DAGY_AUDIT_LOGS=dagy-audit-logs-development
DAGY_SECRETS=dagy-secrets-development
DAGY_NOTIFICATION_CHANNELS=dagy-notification-channels-development
DAGY_ALERT_RULES=dagy-alert-rules-development
DAGY_ENVIRONMENTS=dagy-environments-development
DAGY_SENSORS=dagy-sensors-development
DAGY_ARTIFACT_BUCKET=dagy-artifacts-123456789012-us-east-1-development
DAGY_EVENTS_QUEUE_URL=https://sqs.us-east-1.amazonaws.com/123456789012/dagy-events-development
DAGY_SECRETS_KEY=<base64-fernet-key-here>
DAGY_JWT_REQUIRED=false
DAGY_ACCESS_TOKEN_TTL_SECONDS=86400
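A startup check along the following lines can report every missing variable at once instead of failing on first use. This is a sketch rather than Dagy's own startup code. The variable names come from the reference tables above, while `missing_variables` is a hypothetical helper.

```python
import os
from collections.abc import Mapping
from typing import Optional

# The 21 table variables marked "Yes" in the reference tables above.
TABLE_VARS = (
    "DAGY_FLOWS", "DAGY_DEPLOYMENTS", "DAGY_RUNS", "DAGY_TASK_RUNS", "DAGY_SCHEDULES",
    "DAGY_USERS", "DAGY_ACCESS_TOKENS", "DAGY_ACCESS_LOGS",
    "DAGY_ORGANIZATIONS", "DAGY_MEMBERSHIPS", "DAGY_API_KEYS",
    "DAGY_DAG_DRAFTS",
    "DAGY_USAGE_EVENTS", "DAGY_USAGE_AGGREGATES", "DAGY_SUBSCRIPTIONS",
    "DAGY_AUDIT_LOGS", "DAGY_SECRETS", "DAGY_NOTIFICATION_CHANNELS",
    "DAGY_ALERT_RULES", "DAGY_ENVIRONMENTS", "DAGY_SENSORS",
)
REQUIRED_VARS = TABLE_VARS + ("DAGY_ARTIFACT_BUCKET", "DAGY_EVENTS_QUEUE_URL")
# Required only when DAGY_JWT_REQUIRED is "true".
JWT_VARS = ("DAGY_JWT_ISSUER", "DAGY_JWT_AUDIENCE", "DAGY_JWKS_URL")


def missing_variables(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if env.get("DAGY_JWT_REQUIRED", "false").lower() == "true":
        missing += [name for name in JWT_VARS if not env.get(name)]
    return missing
```

Calling `missing_variables()` with no argument inspects the process environment; passing a dictionary makes the check easy to unit test.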
Frontend Deployment
Option 1: Deploy on Vercel (Recommended)
Vercel provides the easiest deployment path with automatic builds, edge caching, and HTTPS.
Prerequisites
- Vercel account (https://vercel.com)
- GitHub repository with Dagy code
Steps
- Connect GitHub repository to Vercel
  - Go to https://vercel.com/new
  - Select your Dagy GitHub repository
  - Vercel auto-detects it as a Next.js project
- Configure Clerk environment variables
  NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_xxxxx
  CLERK_SECRET_KEY=sk_live_xxxxx
  NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
  NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
  NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/flows
  NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/flows
- Configure API endpoint environment variables
  NEXT_PUBLIC_API_URL=https://api.dagy.io/app
- Deploy
  - Click "Deploy"
  - Vercel builds and deploys automatically on every push to main
  - Get your frontend URL (e.g., https://dagy.vercel.app)
Option 2: CloudFront + S3 Deployment
For organizations preferring AWS-only solutions.
Prerequisites
- AWS CloudFront and S3 setup
- ACM certificate for domain
Steps
- Build Next.js application
  cd web
  npm install
  npm run build
- Create S3 bucket for static exports
  aws s3 mb s3://dagy-frontend-production --region us-east-1
  # Enable static website hosting
  aws s3api put-bucket-website \
    --bucket dagy-frontend-production \
    --website-configuration '{
      "IndexDocument": {"Suffix": "index.html"},
      "ErrorDocument": {"Key": "404.html"}
    }'
- Upload built files
  # Next.js 14+ removed the `next export` command. With output: 'export' set in
  # next.config.js, `npm run build` writes the static site to out/
  # Sync to S3
  aws s3 sync out/ s3://dagy-frontend-production/ --delete
- Create CloudFront distribution, then clear its cache after each upload
  # Create invalidation to clear cache
  aws cloudfront create-invalidation \
    --distribution-id E123ABC \
    --paths "/*"
Option 3: ECS Fargate Deployment
For full containerization within AWS.
Prerequisites
- ECS cluster
- ECR repository
- ALB/NLB for routing
Steps
- Build Docker image
  cd web
  docker build -t dagy-frontend:latest -f Dockerfile.prod .
- Push to ECR
  aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
  docker tag dagy-frontend:latest \
    123456789012.dkr.ecr.us-east-1.amazonaws.com/dagy-frontend:latest
  docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/dagy-frontend:latest
- Create ECS task definition with:
  - Image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/dagy-frontend:latest
  - Port: 3000
  - Environment variables for Clerk and API URL
- Create ECS service with ALB target group
Clerk Configuration
- Create Clerk application
  - Go to https://dashboard.clerk.com
  - Create new application
  - Choose "Web" and "Next.js"
- Get API keys
  - Navigate to "API Keys"
  - Copy "Publishable Key" and "Secret Key"
- Configure allowed origins
  - Go to "Domains"
  - Add your frontend domain(s)
  - Add API Gateway domain if using cross-origin auth
- Setup webhooks (for user sync to database)
  - Go to "Webhooks"
  - Create webhook for user.created and user.deleted events
  - Point to https://api.yourdomain.com/webhooks/clerk
Frontend Environment Variables
# Clerk Authentication
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_xxxxx
CLERK_SECRET_KEY=sk_live_xxxxx
# API Configuration
NEXT_PUBLIC_API_URL=https://api.dagy.io/app
NEXT_PUBLIC_API_VERSION=v1
# Clerk URLs
NEXT_PUBLIC_CLERK_SIGN_IN_URL=/sign-in
NEXT_PUBLIC_CLERK_SIGN_UP_URL=/sign-up
NEXT_PUBLIC_CLERK_AFTER_SIGN_IN_URL=/flows
NEXT_PUBLIC_CLERK_AFTER_SIGN_UP_URL=/flows
# Optional: Analytics, error tracking, etc.
NEXT_PUBLIC_SENTRY_DSN=https://xxxxx@sentry.io/xxxxx
CORS Configuration for API
If frontend and API are on different domains, configure CORS in CDK:
# In infrastructure/develop.yml
dagy:
  api_cors_allowed_origins:
    - "https://dagy.yourdomain.com"
    - "https://dagy.vercel.app"
    - "http://localhost:3000" # For local development
Backend Configuration
Execution Backends
Dagy supports three execution backends for running tasks. Configure which backends are available in your environment.
1. Lambda Backend (Default)
Simplest option; no additional configuration needed beyond Lambda function permissions.
# In dagy_api/backends/lambda_backend.py
# Automatically invokes task functions as Lambda functions
# Default concurrency: 1000 (account limit)
Pros:
- Zero infrastructure management
- Automatic scaling
- Pay-per-execution pricing
Cons:
- 15-minute timeout limit per execution
- Limited to Lambda execution environment
- Cold starts impact latency
Configuration:
- Set EXECUTION_BACKEND=lambda (default)
- No additional environment variables needed
2. Step Functions Backend
For complex workflow orchestration with state machines.
Pros:
- 1-year execution duration
- Complex branching and retry logic
- Visual workflow monitoring in AWS Console
Cons:
- More expensive per execution
- Requires separate state machine definition
- Additional complexity
Setup:
- Create IAM role for Step Functions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"states:StartExecution"
],
"Resource": "*"
}
]
}
- Configure in CDK:
# infrastructure/develop.yml
dagy:
  step_functions_role_arn: "arn:aws:iam::123456789012:role/step-functions-role"
3. ECS Fargate Backend
For long-running tasks and custom dependencies. Fargate does not offer GPUs; GPU workloads need ECS on the EC2 launch type with GPU instance types.
Pros:
- Full container control
- Custom runtimes and libraries
- Task duration well beyond Lambda's 15-minute limit
Cons:
- Requires ECS cluster management
- Higher baseline costs
- More operational overhead
Setup:
- Create ECS cluster:
aws ecs create-cluster --cluster-name dagy-tasks
# Create CloudWatch log group
aws logs create-log-group --log-group-name /ecs/dagy-tasks
- Create task definition:
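# --container-definitions takes a JSON list. The awslogs driver on Fargate also
# needs a task execution role; the role ARN below is a placeholder.
aws ecs register-task-definition \
  --family dagy-task-worker \
  --network-mode awsvpc \
  --requires-compatibilities FARGATE \
  --cpu 256 \
  --memory 512 \
  --execution-role-arn arn:aws:iam::123456789012:role/ecsTaskExecutionRole \
  --container-definitions '[{
    "name": "task-worker",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/dagy-worker:latest",
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/dagy-tasks",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }]'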
- Configure in CDK:
# infrastructure/develop.yml
dagy:
  ecs_cluster_name: "dagy-tasks"
  ecs_task_definition_arn: "arn:aws:ecs:us-east-1:123456789012:task-definition/dagy-task-worker:1"
  ecs_subnets:
    - "subnet-12345678"
  ecs_security_groups:
    - "sg-12345678"
Rate Limiting Configuration
Dagy includes token bucket rate limiting to prevent abuse.
Default configuration:
- 120 requests per minute per API key/IP
- 200 token burst capacity
Customize in src/dagy_api/app.py:
from dagy_api.rate_limit import RateLimitMiddleware
app.add_middleware(
RateLimitMiddleware,
requests_per_minute=120, # Requests per minute
burst_size=200 # Max burst tokens
)
Rate limit headers in responses:
X-RateLimit-Limit: 120
X-RateLimit-Remaining: 45
Retry-After: 30
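To make the two settings concrete, here is a minimal token bucket sketch. It is not the code behind `RateLimitMiddleware`, whose internals this guide does not show; the `TokenBucket` class and its injectable `clock` are illustrative assumptions. The bucket starts full at `burst_size`, refills at `requests_per_minute / 60` tokens per second, and each request spends one token.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    requests_per_minute: float = 120
    burst_size: float = 200
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    tokens: float = field(init=False)
    updated_at: float = field(init=False)

    def __post_init__(self) -> None:
        # Start with a full bucket so an initial burst is allowed.
        self.tokens = self.burst_size
        self.updated_at = self.clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        now = self.clock()
        refill = (now - self.updated_at) * self.requests_per_minute / 60.0
        self.tokens = min(self.burst_size, self.tokens + refill)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With the defaults, a client can send 200 requests back to back, after which it is limited to a sustained 2 requests per second. A real middleware would keep one bucket per API key or IP address.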
RBAC (Role-Based Access Control)
Dagy implements organization and membership-based RBAC.
Roles:
- Owner: Full access to organization
- Admin: Manage flows, runs, schedules, users
- Developer: Create and run flows
- Viewer: Read-only access
Membership management:
- Store in DAGY_MEMBERSHIPS table
- Contains: user_id, org_id, role
- Check role on every API request
Example permission check:
from dagy_api.auth import check_org_permission
async def create_flow(org_id: str, request: Request):
    check_org_permission(
        org_id=org_id,
        user_id=request.state.user_id,
        required_role="developer"
    )
    # ... create flow
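The role check itself can be pictured as a comparison of ranks. The sketch below assumes the four roles form a strict hierarchy (Viewer, then Developer, then Admin, then Owner), which the role descriptions suggest but do not state outright. It is not the implementation of `check_org_permission`; `ROLE_RANK`, `has_role`, `require_role`, and `PermissionDenied` are hypothetical names, and the in-memory dictionary stands in for the memberships table.

```python
# Higher rank includes every permission of the lower ranks (an assumption).
ROLE_RANK = {"viewer": 0, "developer": 1, "admin": 2, "owner": 3}


class PermissionDenied(Exception):
    """Raised when a user lacks the required role in an organization."""


def has_role(actual_role: str, required_role: str) -> bool:
    """Return True if actual_role is at least as privileged as required_role."""
    return ROLE_RANK[actual_role.lower()] >= ROLE_RANK[required_role.lower()]


def require_role(
    memberships: dict[tuple[str, str], str],
    user_id: str,
    org_id: str,
    required_role: str,
) -> None:
    """Look up the user's role in the organization and raise if it is too low."""
    role = memberships.get((user_id, org_id))
    if role is None or not has_role(role, required_role):
        raise PermissionDenied(f"{user_id} needs role {required_role!r} in {org_id}")
```

A user with no membership record in the organization is rejected the same way as a user whose role is too low.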
Security Configuration
JWT Authentication Setup
Enable JWT authentication to require valid tokens for all API requests.
Prerequisites
- JWT issuer (e.g., Auth0, Clerk, Cognito)
- JWKS (JSON Web Key Set) endpoint
- JWT issuer URL and audience
Enable JWT Authentication
- Configure in environment YAML:
# infrastructure/develop.yml
dagy:
  jwt_required: true
  jwt_issuer: "https://your-auth-provider.com"
  jwt_audience: "dagy-api"
  jwks_url: "https://your-auth-provider.com/.well-known/jwks.json"
- Set Lambda environment variables:
DAGY_JWT_REQUIRED=true
DAGY_JWT_ISSUER=https://your-auth-provider.com
DAGY_JWT_AUDIENCE=dagy-api
DAGY_JWKS_URL=https://your-auth-provider.com/.well-known/jwks.json
- Redeploy CDK stack:
cd infrastructure
cdk deploy --env develop
JWT Validation Flow
1. Client sends request: Authorization: Bearer eyJhbGc...
2. API Gateway HttpJwtAuthorizer validates the token's issuer, audience, and expiry
3. The authorizer verifies the signature against public keys fetched from the JWKS endpoint
4. Lambda receives request with claims in context
5. API checks scopes/permissions
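The claim checks in this flow can be illustrated in a few lines. The sketch below covers only the issuer, audience, and expiry comparisons on an already-decoded claims dictionary. It deliberately does not verify the signature, which in this architecture is the authorizer's job, so it must not be used as a standalone validator. The function name `claim_problems` is hypothetical.

```python
import time
from typing import Any, Optional


def claim_problems(
    claims: dict[str, Any],
    issuer: str,
    audience: str,
    now: Optional[float] = None,
) -> list[str]:
    """Return the reasons a decoded claim set would be rejected (empty if none)."""
    now = time.time() if now is None else now
    problems = []

    if claims.get("iss") != issuer:
        problems.append("issuer mismatch")

    # The "aud" claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        problems.append("audience mismatch")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        problems.append("token expired or exp missing")

    return problems
```

The `issuer` and `audience` arguments correspond to `DAGY_JWT_ISSUER` and `DAGY_JWT_AUDIENCE`.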
API Key Management
For programmatic access, implement API keys as alternative to JWT.
API key format: dagy_[base32-encoded-32-random-bytes]
Example (shortened for illustration; 32 random bytes encode to 56 base32 characters):
dagy_JBSWY3DPEBLW64TMMQ======
Storage:
- Hash API keys before storing in DAGY_API_KEYS table
- Use bcrypt or Argon2
Validation in middleware:
from fastapi import Request
async def validate_api_key(request: Request):
    auth_header = request.headers.get("authorization", "")
    if auth_header.startswith("Bearer dagy_"):
        api_key = auth_header.split(" ", 1)[1]
        # Hash and look up in database
        # Set request.state.user_id and request.state.org_id
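Generating a key in the stated format and deriving a value to store can be sketched as follows. Note one swapped-in technique: this example uses an unsalted SHA-256 digest rather than bcrypt or Argon2, on the assumption that the stored value must support a direct "hash and look up" query, which a salted hash does not allow on its own. Whether that trade-off is acceptable is a design decision for your deployment; the helper names are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets

PREFIX = "dagy_"


def generate_api_key() -> str:
    """Return 'dagy_' followed by 32 random bytes in base32 (56 characters)."""
    return PREFIX + base64.b32encode(secrets.token_bytes(32)).decode("ascii")


def key_digest(api_key: str) -> str:
    """Deterministic digest to store and query instead of the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def matches(api_key: str, stored_digest: str) -> bool:
    """Compare digests in constant time."""
    return hmac.compare_digest(key_digest(api_key), stored_digest)
```

The plaintext key is shown to the user once at creation time; only the digest is written to the table.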
Secrets Encryption
Store sensitive data (API keys, passwords, credentials) encrypted.
Generate Fernet Encryption Key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Output: a 44-character URL-safe base64 string ending in "=" (values that start with gAAAAA are encrypted tokens, not keys)
Set in Lambda
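# Note: --environment replaces the function's entire variable set,
# so include every existing variable alongside the key
aws lambda update-function-configuration \
  --function-name dagy-api-lambda \
  --environment "Variables={DAGY_SECRETS_KEY=PASTE_GENERATED_KEY_HERE}"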
Encrypt Secrets in Code
from cryptography.fernet import Fernet
import os
# Fail fast with a KeyError if the key is not configured
key = os.environ["DAGY_SECRETS_KEY"].encode()
cipher = Fernet(key)
def encrypt_secret(value: str) -> str:
    return cipher.encrypt(value.encode()).decode()
def decrypt_secret(encrypted: str) -> str:
    return cipher.decrypt(encrypted.encode()).decode()
# Usage
encrypted = encrypt_secret("my-api-key-123")
decrypted = decrypt_secret(encrypted)
Store in DAGY_SECRETS Table
# Database schema
{
"secret_id": "sec_12345", # Partition key
"org_id": "org_abc", # Sort key
"name": "github-token", # Friendly name
"encrypted_value": "gAAAAABl...", # Encrypted value
"created_at": 1704067200,
"created_by": "user_123"
}
VPC and Security Groups
For production, deploy in a VPC with restricted network access.
Configure VPC in CDK
# infrastructure/production.yml
dagy:
  vpc_id: "vpc-prod123456"
  subnet_ids:
    - "subnet-prod111111" # Private subnet 1
    - "subnet-prod222222" # Private subnet 2
  security_group_ids:
    - "sg-prod-api" # Security group for Lambda
Security Group Rules
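# API Gateway invokes Lambda through the Lambda service API, not through the
# function's network interfaces, so sg-prod-api needs no inbound rule.
# Allow outbound HTTPS from Lambda to AWS service endpoints
aws ec2 authorize-security-group-egress \
  --group-id sg-prod-api \
  --ip-permissions IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges='[{CidrIp=0.0.0.0/0}]'
# Allow the Lambda security group to reach the database security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-prod-db \
  --protocol tcp \
  --port 443 \
  --source-group sg-prod-api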
Lambda VPC Execution
Lambda functions in VPC require:
- ENI in private subnets
- NAT Gateway for external access (S3, database endpoints)
- VPC endpoints for AWS services
# Create VPC endpoint for S3
aws ec2 create-vpc-endpoint \
--vpc-id vpc-prod123456 \
--service-name com.amazonaws.us-east-1.s3 \
--route-table-ids rtb-prod111111
# Create VPC endpoint for database
aws ec2 create-vpc-endpoint \
--vpc-id vpc-prod123456 \
--service-name com.amazonaws.us-east-1.dynamodb \
--route-table-ids rtb-prod111111
IAM Least-Privilege Policies
Create minimal IAM roles for Lambda function.
Lambda execution role:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DatabaseAccess",
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:Query",
"dynamodb:Scan",
"dynamodb:DeleteItem"
],
"Resource": [
"arn:aws:dynamodb:us-east-1:123456789012:table/dagy-flows-*",
"arn:aws:dynamodb:us-east-1:123456789012:table/dagy-runs-*",
"arn:aws:dynamodb:us-east-1:123456789012:table/dagy-*"
]
},
{
"Sid": "S3ArtifactAccess",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::dagy-artifacts-*/*"
},
{
"Sid": "SQSAccess",
"Effect": "Allow",
"Action": [
"sqs:SendMessage",
"sqs:ReceiveMessage",
"sqs:DeleteMessage"
],
"Resource": "arn:aws:sqs:us-east-1:123456789012:dagy-events-*"
},
{
"Sid": "LambdaInvoke",
"Effect": "Allow",
"Action": "lambda:InvokeFunction",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:dagy-*"
},
{
"Sid": "CloudWatchLogs",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/dagy-*"
}
]
}
Monitoring & Observability
Health Check Endpoints
Dagy provides health check endpoints for monitoring.
Endpoints:
GET /health
GET /health/detailed
Example health response:
{
"status": "healthy",
"timestamp": "2024-03-01T12:00:00Z",
"version": "0.1.0"
}
Detailed health response:
{
"status": "healthy",
"timestamp": "2024-03-01T12:00:00Z",
"version": "0.1.0",
"components": {
"database": {
"status": "healthy",
"latency_ms": 42
},
"s3": {
"status": "healthy",
"latency_ms": 156
},
"sqs": {
"status": "healthy",
"latency_ms": 31
}
}
}
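A response of this shape can be assembled from independent component probes. The sketch below is an assumption about how such an endpoint might be built, not Dagy's handler: `detailed_health` is a hypothetical function, each probe is a callable that returns its latency in milliseconds or raises on failure, and the `"unhealthy"` status and `error` field are illustrative additions not shown in the sample responses.

```python
from datetime import datetime, timezone
from typing import Callable


def detailed_health(probes: dict[str, Callable[[], float]], version: str = "0.1.0") -> dict:
    """Run each probe and summarize the results in the detailed health format."""
    components = {}
    for name, probe in probes.items():
        try:
            components[name] = {"status": "healthy", "latency_ms": round(probe())}
        except Exception as exc:  # a failing probe must not break the endpoint
            components[name] = {"status": "unhealthy", "error": str(exc)}

    # The overall status is healthy only if every component is healthy.
    healthy = all(c["status"] == "healthy" for c in components.values())
    return {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "version": version,
        "components": components,
    }
```

In practice each probe would time a cheap call against the database, S3, or SQS.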
Configure health checks:
# In application load balancer
aws elbv2 create-target-group \
--name dagy-api \
--protocol HTTP \
--vpc-id vpc-prod123456 \
--port 80 \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3
CloudWatch Metrics and Alarms
Enable Custom Metrics
import boto3
cloudwatch = boto3.client('cloudwatch')
def publish_metric(metric_name: str, value: float, unit: str = "Count"):
    cloudwatch.put_metric_data(
        Namespace='Dagy',
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': unit,
                'Dimensions': [
                    {'Name': 'Environment', 'Value': 'production'},
                    {'Name': 'Service', 'Value': 'api'}
                ]
            }
        ]
    )
# Example: Track flow execution
execution_time_ms = 1250.0  # Example value; measure this around the run
publish_metric('FlowExecutionTime', execution_time_ms, 'Milliseconds')
publish_metric('FailedRuns', 1, 'Count')
Create Alarms
# Lambda error rate alarm
aws cloudwatch put-metric-alarm \
--alarm-name dagy-lambda-errors \
--alarm-description "Alert if Lambda errors exceed 10 per 5 minutes for 2 consecutive periods" \
--metric-name Errors \
--namespace AWS/Lambda \
--statistic Sum \
--period 300 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
# Database throttling alarm (the table name is an example; create one alarm per table)
aws cloudwatch put-metric-alarm \
  --alarm-name dagy-database-throttles \
  --alarm-description "Alert if database writes are throttled" \
  --metric-name WriteThrottleEvents \
  --namespace AWS/DynamoDB \
  --dimensions Name=TableName,Value=dagy-runs-development \
  --statistic Sum \
  --period 60 \
  --threshold 0 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
Audit Logging Configuration
Store all user actions in DAGY_AUDIT_LOGS for compliance and debugging.
Audit log schema:
{
"audit_id": "aud_abc123", # Partition key
"timestamp": 1704067200, # Sort key
"org_id": "org_xyz",
"user_id": "user_123",
"action": "flow_created",
"resource_type": "flow",
"resource_id": "flow_abc",
"changes": {
"name": {"old": null, "new": "my-flow"},
"version": {"old": null, "new": "1.0.0"}
},
"ip_address": "203.0.113.42",
"user_agent": "Mozilla/5.0...",
"status": "success" # or "failure"
}
Log all significant actions:
from dagy_api.audit import log_audit_event
async def create_flow(org_id: str, data: FlowData, request: Request):
    # Create flow...
    # Log audit event
    log_audit_event(
        org_id=org_id,
        user_id=request.state.user_id,
        action="flow_created",
        resource_type="flow",
        resource_id=flow.id,
        changes={"name": {"old": None, "new": flow.name}},
        ip_address=request.client.host,
        status="success"
    )
Alert Rules for Pipeline Monitoring
Configure alerts to notify on flow execution failures, SLA breaches, etc.
Alert rule schema:
{
"rule_id": "rule_123", # Partition key
"org_id": "org_xyz", # Sort key
"name": "High failure rate",
"enabled": True,
"condition": {
"metric": "run_failure_rate",
"threshold": 0.1, # 10% failure rate
"window_minutes": 5,
"operator": "GreaterThan"
},
"notification_channels": ["channel_slack_123"],
"created_at": 1704067200
}
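How a rule of this shape might be evaluated is sketched below. The evaluation logic is an assumption; the guide does not describe Dagy's alerting engine. Runs are represented as `(finished_at, status)` pairs with epoch-second timestamps, the status value `"failed"` and the `"LessThan"` operator name are illustrative, and only the `run_failure_rate` metric is handled.

```python
import operator
from typing import Any

# Operator names mapped to comparisons; only "GreaterThan" appears in the schema above.
OPERATORS = {"GreaterThan": operator.gt, "LessThan": operator.lt}


def run_failure_rate(runs: list[tuple[float, str]], now: float, window_minutes: int) -> float:
    """Fraction of runs finished inside the window whose status is 'failed'."""
    cutoff = now - window_minutes * 60
    recent = [status for finished_at, status in runs if finished_at >= cutoff]
    if not recent:
        return 0.0
    return sum(status == "failed" for status in recent) / len(recent)


def rule_fires(rule: dict[str, Any], runs: list[tuple[float, str]], now: float) -> bool:
    """Evaluate an enabled rule's condition against recent runs."""
    if not rule.get("enabled", True):
        return False
    condition = rule["condition"]
    value = run_failure_rate(runs, now, condition["window_minutes"])
    return OPERATORS[condition["operator"]](value, condition["threshold"])
```

When a rule fires, the next step would be to send a message to each entry in `notification_channels`.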
Example: Create alert rule via API
curl -X POST https://api.example.com/alerts/rules \
-H "Authorization: Bearer ${TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"name": "Flow failure rate",
"metric": "run_failure_rate",
"threshold": 0.15,
"window_minutes": 10,
"notification_channel_ids": ["channel_123"]
}'
Managing Deployment Settings
After a flow is deployed, its runtime settings can be updated without redeploying the artifact. This is useful for changing execution strategy, adjusting schedules, or attaching new dependency packages.
Settings UI
The Dagy web UI provides a Flow Settings dialog accessible from the Flows page. Click the dropdown menu on any flow row and select Settings, or use the Flow Settings button in the flow detail panel.
The settings dialog allows you to update:
- Runtime tier: Choose from nano through xlarge runtime tiers based on workload size
- Default executor: Auto (determined by tier)
- Schedule: Set or change the cron expression or interval for automated runs
- Dependency packages: Attach or remove dependency package slugs resolved at runtime
- Tags: Add, update, or remove key-value metadata tags
Changes that affect runtime behavior (runtime tier, schedule, dependency packages) trigger a confirmation prompt before saving. Existing in-progress runs are not affected; changes apply to new runs only.
Settings API
Use PUT /deployments/{name}/settings to update settings programmatically. Only the fields included in the request body are updated:
curl -X PUT https://api.dagy.io/v1/deployments/daily-etl/settings \
-H "Authorization: Bearer {token}" \
-H "Content-Type: application/json" \
-d '{
"execution_mode": "micro",
"schedule": "0 9 * * 1-5",
"dep_package_slugs": ["pandas-layer"]
}'
See the API Reference for the full field list and response schema.
Scaling & Performance
Database Capacity Planning
The database supports two billing modes:
On-Demand (Default)
- Recommended for variable workloads
- Auto-scales capacity
- Pay per request
Enable on-demand:
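# In dagy_stack.py. The stack sets on-demand explicitly; if billing_mode is
# omitted, the CDK Table construct itself falls back to PROVISIONED.
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk.aws_dynamodb import Attribute, AttributeType, BillingMode
table = dynamodb.Table(
self, "flows-table",
partition_key=Attribute(name="flow_id", type=AttributeType.STRING),
billing_mode=BillingMode.PAY_PER_REQUEST
)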
Provisioned
- Better for predictable workloads
- Cheaper at higher scale
- Requires capacity planning
Example: Switch to provisioned
table = dynamodb.Table(
self, "flows-table",
partition_key=Attribute(name="flow_id", type=AttributeType.STRING),
billing_mode=BillingMode.PROVISIONED,
read_capacity=100, # RCUs
write_capacity=100 # WCUs
)
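# Enable auto-scaling. The returned handle does nothing until a policy is
# attached; the 70% utilization target is an illustrative value.
read_scaling = table.auto_scale_read_capacity(
min_capacity=10,
max_capacity=1000
)
read_scaling.scale_on_utilization(target_utilization_percent=70)
write_scaling = table.auto_scale_write_capacity(
min_capacity=10,
max_capacity=1000
)
write_scaling.scale_on_utilization(target_utilization_percent=70)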
Capacity Calculation
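For read capacity:
- 1 RCU = 1 strongly consistent read/sec or 2 eventually consistent reads/sec, for items up to 4 KB
- Example: 1000 flow reads/minute ≈ 16.7 reads/sec = 17 RCUs minimum (strongly consistent)
For write capacity:
- 1 WCU = 1 write/sec, for items up to 1 KB
- Example: 500 flow creates/minute ≈ 8.3 writes/sec = 9 WCUs minimum
Larger items consume one additional unit per extra 4 KB (reads) or 1 KB (writes).
Add 30% buffer for spikes (round the result up):
Required RCUs = peak_reads_per_sec * 1.3
Required WCUs = peak_writes_per_sec * 1.3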
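The arithmetic above can be wrapped in a small calculator. This is a sketch under the stated sizing rules (4 KB per read unit, 1 KB per write unit, eventually consistent reads at half cost); the function names and the default 30% buffer are illustrative choices, not part of the Dagy codebase.

```python
import math


def required_rcus(reads_per_sec, item_kb=4.0, strongly_consistent=True, buffer=1.3):
    """Read capacity units for a peak read rate, rounded up."""
    units_per_read = math.ceil(item_kb / 4.0)  # one unit per 4 KB, rounded up
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(reads_per_sec * units_per_read * buffer, 6))


def required_wcus(writes_per_sec, item_kb=1.0, buffer=1.3):
    """Write capacity units for a peak write rate, rounded up."""
    units_per_write = math.ceil(item_kb / 1.0)  # one unit per 1 KB, rounded up
    return math.ceil(round(writes_per_sec * units_per_write * buffer, 6))
```

For example, `required_rcus(1000 / 60, buffer=1.0)` reproduces the 17 RCU minimum from the read example, and the default buffer raises it to 22.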
Lambda Concurrency Limits
Control Lambda concurrency to manage costs and prevent throttling.
Account concurrency: Default 1000 concurrent executions per Region (new accounts may start with a lower quota)
Set function concurrency:
aws lambda put-function-concurrency \
--function-name dagy-api-lambda \
--reserved-concurrent-executions 500
Reserved concurrency:
- Guarantees capacity for critical functions
- Also caps the function at that number of concurrent executions
- Deducts from account total
- Useful for production environments
Cold start optimization:
# Provision capacity to reduce cold starts (applies to a published version or alias, here LIVE)
aws lambda put-provisioned-concurrency-config \
--function-name dagy-api-lambda \
--provisioned-concurrent-executions 50 \
--qualifier LIVE
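When choosing values for reserved or provisioned concurrency, a common starting point is request rate multiplied by average duration. The sketch below applies that estimate and checks how much of the account limit remains unreserved; both function names are hypothetical and the numbers are examples only.

```python
import math


def estimated_concurrency(requests_per_sec, avg_duration_sec):
    """Concurrent executions needed: request rate times average duration."""
    return math.ceil(round(requests_per_sec * avg_duration_sec, 6))


def unreserved_remaining(account_limit, reserved_by_function):
    """Concurrency left for functions that have no reservation."""
    return account_limit - sum(reserved_by_function.values())
```

At 100 requests/sec with a 0.5 second average duration, `estimated_concurrency(100, 0.5)` gives 50, which matches the provisioned concurrency used in the command above.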
ECS Task Scaling
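Auto-scale ECS tasks based on CPU or memory metrics. The policy below tracks CPU; use the ECSServiceAverageMemoryUtilization predefined metric to track memory instead.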
# Create auto-scaling target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/dagy-tasks/dagy-worker \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 2 \
--max-capacity 100
# CPU-based scaling policy
aws application-autoscaling put-scaling-policy \
--policy-name scale-by-cpu \
--service-namespace ecs \
--resource-id service/dagy-tasks/dagy-worker \
--scalable-dimension ecs:service:DesiredCount \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 300
}'
SQS Visibility Timeout Tuning
Configure the SQS visibility timeout to exceed the longest expected task execution time, so that a message is not redelivered while it is still being processed.
Default: 30 seconds
Set visibility timeout:
aws sqs set-queue-attributes \
--queue-url https://sqs.us-east-1.amazonaws.com/123456789012/dagy-events \
--attributes VisibilityTimeout=300 # 5 minutes for longer tasks
Recommended:
- Short tasks (< 1 min): 120 seconds
- Medium tasks (1-5 min): 300 seconds
- Long tasks (5+ min): 900 seconds
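The recommendations above can be encoded as a small lookup, for example in a deployment script. This is a sketch; `recommended_visibility_timeout` is a hypothetical helper, and treating a task of exactly 5 minutes as "long" is an assumption, since the table leaves that boundary open.

```python
def recommended_visibility_timeout(expected_task_seconds):
    """Map expected task duration to the visibility timeout tiers listed above."""
    if expected_task_seconds < 0:
        raise ValueError("expected_task_seconds must be non-negative")
    if expected_task_seconds < 60:
        return 120  # short tasks
    if expected_task_seconds < 300:
        return 300  # medium tasks
    return 900  # long tasks (assumes 5 minutes and up)
```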
Performance Optimization Checklist
- Enable database auto-scaling or set appropriate capacity
- Configure Lambda reserved concurrency for predictable workloads
- Use VPC endpoints for private access to AWS services
- Enable database caching for hot data
- Implement query pagination for large result sets
- Use batch operations for bulk inserts/updates
- Configure appropriate CloudWatch metrics and alarms
- Set up CloudFront caching for static assets
- Implement database connection pooling
- Monitor Lambda cold start times and optimize layer size
Backup & Disaster Recovery
Database Point-in-Time Recovery
Enable automatic backups for all database tables.
# Enable PITR for all tables
for table in dagy-flows-dev dagy-runs-dev dagy-deployments-dev; do
aws dynamodb update-continuous-backups \
--table-name $table \
--point-in-time-recovery-specification \
PointInTimeRecoveryEnabled=true
done
Restore from backup:
# Restore to a specific time
aws dynamodb restore-table-to-point-in-time \
--source-table-name dagy-runs-dev \
--target-table-name dagy-runs-dev-restored \
--restore-date-time 2024-03-01T12:00:00Z
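Before running a restore, it can help to confirm that the requested time falls inside the recovery window. The sketch below assumes a 35-day window, the maximum that point-in-time recovery retains; `check_restore_time` is a hypothetical helper that returns the timestamp in the format used by the command above.

```python
from datetime import datetime, timedelta, timezone


def check_restore_time(restore_time, now, recovery_days=35):
    """Return the timestamp as an ISO 8601 UTC string, or raise ValueError."""
    if restore_time.tzinfo is None:
        raise ValueError("restore_time must be timezone-aware")
    if restore_time > now:
        raise ValueError("restore time is in the future")
    if restore_time < now - timedelta(days=recovery_days):
        raise ValueError("restore time is outside the recovery window")
    return restore_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```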
S3 Versioning for Artifacts
Enable versioning on artifact buckets to protect against accidental deletion.
# Enable versioning
aws s3api put-bucket-versioning \
--bucket dagy-artifacts-123456789012-us-east-1-development \
--versioning-configuration Status=Enabled
# Enable lifecycle policy to expire old versions after 90 days
aws s3api put-bucket-lifecycle-configuration \
--bucket dagy-artifacts-123456789012-us-east-1-development \
--lifecycle-configuration '{
"Rules": [
{
"ID": "expire-old-versions",
"Status": "Enabled",
"Filter": {"Prefix": ""},
"NoncurrentVersionExpiration": {"NoncurrentDays": 90}
}
]
}'
Cross-Region Replication
For disaster recovery, replicate critical data across regions.
# Create replication role
aws iam create-role \
--role-name s3-replication-role \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {"Service": "s3.amazonaws.com"},
"Action": "sts:AssumeRole"
}
]
}'
# Attach replication policy
aws iam put-role-policy \
--role-name s3-replication-role \
--policy-name replication \
--policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
"Resource": "arn:aws:s3:::dagy-artifacts-*"
},
{
"Effect": "Allow",
"Action": ["s3:GetObjectVersionForReplication", "s3:GetObjectVersionAcl"],
"Resource": "arn:aws:s3:::dagy-artifacts-*/*"
},
{
"Effect": "Allow",
"Action": ["s3:ReplicateObject", "s3:ReplicateDelete"],
"Resource": "arn:aws:s3:::dagy-artifacts-replica/*"
}
]
}'
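# Create replica bucket in different region
aws s3api create-bucket \
--bucket dagy-artifacts-replica \
--region us-west-2 \
--create-bucket-configuration LocationConstraint=us-west-2
# Replication requires versioning on the destination bucket as well
aws s3api put-bucket-versioning \
--bucket dagy-artifacts-replica \
--versioning-configuration Status=Enabled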
# Enable replication
aws s3api put-bucket-replication \
--bucket dagy-artifacts-123456789012-us-east-1-development \
--replication-configuration '{
"Role": "arn:aws:iam::123456789012:role/s3-replication-role",
"Rules": [
{
"Status": "Enabled",
"Priority": 1,
"DeleteMarkerReplication": {"Status": "Enabled"},
"Filter": {"Prefix": ""},
"Destination": {
"Bucket": "arn:aws:s3:::dagy-artifacts-replica",
"ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
"Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
}
}
]
}'
Backup Strategy
Recommended backup frequency:
- Database: Continuous (PITR enabled) + daily snapshots
- S3: Versioning enabled + cross-region replication
- Secrets: Encrypted backups in separate AWS account
Restore procedure:
- Restore database tables from PITR to new table
- Verify data integrity in non-production
- Update Lambda environment variables to point to restored tables
- Validate API functionality
- Gradually shift traffic to restored environment
Upgrading
CDK Stack Updates
Minor updates (configuration changes, security patches):
cd infrastructure
# Review changes
cdk diff --env production
# Deploy
cdk deploy --env production \
--require-approval never
Database Migration Considerations
Schema changes:
- The database is schemaless, but application code enforces structure
- Add backward compatibility for new fields
- Use feature flags to enable new functionality gradually
Example: Add new field with default
# Old code
run = {
"run_id": "run_123",
"status": "completed"
}
# New code - handle records written before the field existed
completion_time = run.get("completion_time")  # None when the field is missing
Migrate existing records:
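import boto3
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("dagy-runs-prod")
# Scan page by page; a single scan call returns at most 1 MB of data
scan_kwargs = {}
while True:
    response = table.scan(**scan_kwargs)
    for item in response["Items"]:
        if "completion_time" not in item:
            table.update_item(
                Key={"run_id": item["run_id"]},
                UpdateExpression="SET completion_time = :ct",
                ExpressionAttributeValues={":ct": 0}
            )
    # LastEvaluatedKey is absent once the final page has been read
    if "LastEvaluatedKey" not in response:
        break
    scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]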
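A backfill like this is easier to trust if it can be exercised locally before it touches production data. The sketch below runs the same scan-and-update logic against an in-memory stand-in; `FakeTable` and `backfill_completion_time` are hypothetical names, and the fake only mimics the small part of the table interface used here (paged `scan` and a single `SET` update).

```python
class FakeTable:
    """In-memory stand-in for a table; returns `page_size` items per scan."""

    def __init__(self, items, page_size=2):
        self.items = {item["run_id"]: dict(item) for item in items}
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None):
        keys = sorted(self.items)
        start = 0
        if ExclusiveStartKey is not None:
            start = keys.index(ExclusiveStartKey["run_id"]) + 1
        page = keys[start:start + self.page_size]
        response = {"Items": [dict(self.items[k]) for k in page]}
        if start + self.page_size < len(keys):
            response["LastEvaluatedKey"] = {"run_id": page[-1]}
        return response

    def update_item(self, Key, UpdateExpression, ExpressionAttributeValues):
        # Only understands the single "SET completion_time = :ct" expression.
        self.items[Key["run_id"]]["completion_time"] = ExpressionAttributeValues[":ct"]


def backfill_completion_time(table, default=0):
    """Set completion_time on every record that lacks it; return the update count."""
    updated = 0
    scan_kwargs = {}
    while True:
        response = table.scan(**scan_kwargs)
        for item in response["Items"]:
            if "completion_time" not in item:
                table.update_item(
                    Key={"run_id": item["run_id"]},
                    UpdateExpression="SET completion_time = :ct",
                    ExpressionAttributeValues={":ct": default},
                )
                updated += 1
        if "LastEvaluatedKey" not in response:
            return updated
        scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```

Running the backfill twice against the same table should report zero updates the second time, which is a quick check that the migration is idempotent.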
Zero-Downtime Deployment Strategy
Blue-Green Deployment:
- Deploy new Lambda version alongside existing version
- Shift 10% of traffic to the new version using Lambda alias weighted routing
- Monitor errors and metrics
- Gradually increase traffic: 25% → 50% → 100%
- Rollback immediately if issues detected
# Create alias for traffic shifting
aws lambda create-alias \
--function-name dagy-api-lambda \
--name LIVE \
--function-version 1
# Update alias to shift traffic: v1 stays primary (90%), v2 receives 10%
aws lambda update-alias \
--function-name dagy-api-lambda \
--name LIVE \
--function-version 1 \
--routing-config '{"AdditionalVersionWeights": {"2": 0.1}}'
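The gradual shift described in the steps above can be expressed as a small decision function that a rollout script calls after each monitoring interval. This is a sketch: `next_weight` is a hypothetical helper, and the 1% error-rate ceiling is an assumed value that should be tuned to the service.

```python
ROLLOUT_STEPS = [0.10, 0.25, 0.50, 1.00]  # traffic share sent to the new version


def next_weight(current_weight, error_rate, max_error_rate=0.01):
    """Return the next traffic share for the new version, or 0.0 to roll back."""
    if error_rate > max_error_rate:
        return 0.0  # roll back: all traffic returns to the old version
    for step in ROLLOUT_STEPS:
        if step > current_weight:
            return step
    return 1.0  # already fully shifted
```

Each returned value would then be applied as the alias weight for the new version; a return value of 0.0 means the routing configuration should be cleared.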
Canary Deployment:
# API Gateway canary setting (available for REST APIs through the apigateway commands;
# apigatewayv2 HTTP APIs do not accept canary settings)
aws apigateway create-deployment \
--rest-api-id abc123 \
--stage-name prod \
--canary-settings percentTraffic=10,useStageCache=false
Troubleshooting
Common Deployment Issues
Issue: CDK bootstrap fails
Symptoms: TemplateURL must be a valid S3 URL
Solution:
# Ensure AWS credentials are correct
aws sts get-caller-identity
# Try bootstrap again with explicit parameters
cdk bootstrap aws://123456789012/us-east-1 \
--profile default \
--force
Issue: Lambda cannot access the database
Symptoms: User: arn:aws:lambda:... is not authorized to perform: dynamodb:GetItem
Solution:
# Verify Lambda execution role has database permissions
aws iam list-attached-role-policies \
--role-name dagy-lambda-role
# Attach database policy if missing (this managed policy is broad; in production,
# prefer a policy scoped to the Dagy tables)
aws iam attach-role-policy \
--role-name dagy-lambda-role \
--policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
Issue: Lambda environment variables not set
Symptoms: KeyError: 'DAGY_FLOWS'
Solution:
# Check current Lambda config
aws lambda get-function-configuration \
--function-name dagy-api-lambda \
--query Environment
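# Update environment variables (this replaces the full set, so include every variable;
# the quotes stop the shell from brace-expanding the value)
aws lambda update-function-configuration \
--function-name dagy-api-lambda \
--environment "Variables={DAGY_FLOWS=dagy-flows-dev,DAGY_RUNS=dagy-runs-dev}"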
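A bare KeyError at import time gives little context in the logs. One option, sketched here with hypothetical names (`require_env`, `MissingConfigError`), is to validate all required variables up front and report every missing name at once. The variable names come from the example above.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when required environment variables are absent or empty."""


def require_env(names, environ=os.environ):
    """Return the requested variables, or raise listing every missing one."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise MissingConfigError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in names}
```

Calling `require_env(["DAGY_FLOWS", "DAGY_RUNS"])` at startup would then fail with a single message naming both tables if neither is configured.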
Health Check Debugging
Lambda health endpoint failing
# Test health endpoint directly
curl -X GET \
https://xyz123.execute-api.us-east-1.amazonaws.com/health
# Check Lambda logs
aws logs tail /aws/lambda/dagy-api-lambda --follow
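# Invoke Lambda directly for debugging (AWS CLI v2 needs the binary-format flag for a raw JSON payload)
aws lambda invoke \
--function-name dagy-api-lambda \
--cli-binary-format raw-in-base64-out \
--payload '{"resource": "/health", "httpMethod": "GET"}' \
response.json
cat response.json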
Database connectivity issues
# Test database connection
import boto3
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("dagy-flows-dev")
try:
    response = table.get_item(Key={"flow_id": "test"})
    print("Database connectivity: OK")
except Exception as e:
    print(f"Database error: {e}")
Lambda Cold Start Optimization
Measure cold start time
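import time
# Module scope runs once per cold start, so time initialization here.
# Lambda also logs this phase as "Init Duration" in the REPORT line.
_init_start = time.time()
# Module-level imports and client setup go here
init_ms = (time.time() - _init_start) * 1000
print(f"Init time: {init_ms:.0f}ms")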
Cold start is > 500ms? Consider:
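- Reducing Lambda package size (250 MB unzipped limit for zip packages; container images can be up to 10 GB)
- Using Lambda layers for dependencies (zip packages only)
- Provisioned concurrency for predictable traffic
- Moving to Lambda@Edge for reduced network latency (this does not shorten cold starts)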
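Lambda writes an Init Duration value to the REPORT log line of every cold-start invocation, so the 500 ms threshold above can be checked directly from logs. The sketch below parses that value from a single line; the function names are hypothetical and the sample line in the usage note is illustrative.

```python
import re

INIT_DURATION = re.compile(r"Init Duration: ([\d.]+) ms")


def init_duration_ms(report_line):
    """Return the Init Duration from a REPORT line, or None for warm starts."""
    match = INIT_DURATION.search(report_line)
    return float(match.group(1)) if match else None


def is_slow_cold_start(report_line, threshold_ms=500.0):
    """True when the line records a cold start slower than the threshold."""
    duration = init_duration_ms(report_line)
    return duration is not None and duration > threshold_ms
```

A line such as `REPORT RequestId: abc Duration: 12.34 ms Billed Duration: 13 ms Memory Size: 512 MB Max Memory Used: 90 MB Init Duration: 612.50 ms` would be flagged, while warm invocations, which omit the Init Duration field, are skipped.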
Optimize Lambda package size
# Check current package size
aws lambda get-function \
--function-name dagy-api-lambda \
--query 'Configuration.CodeSize'
# Remove unnecessary files from Docker image
# In Dockerfile:
RUN find . -name "*.pyc" -delete
RUN find . -name "__pycache__" -type d -delete
Database Throttling
Symptoms
- ProvisionedThroughputExceededException
- Lambda timeouts
- API 5xx errors
Solutions
# Check consumed capacity
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB \
--metric-name ConsumedWriteCapacityUnits \
--dimensions Name=TableName,Value=dagy-flows-dev \
--start-time 2024-03-01T00:00:00Z \
--end-time 2024-03-01T23:59:59Z \
--period 300 \
--statistics Sum
# Increase capacity (for provisioned mode)
aws dynamodb update-table \
--table-name dagy-flows-dev \
--provisioned-throughput ReadCapacityUnits=200,WriteCapacityUnits=200
# Or switch to on-demand mode
aws dynamodb update-table \
--table-name dagy-flows-dev \
--billing-mode PAY_PER_REQUEST
Optimization strategies
- Enable database change streams for CDC
- Use batch operations for bulk reads and writes
- Implement query pagination
- Use TTL for automatic data cleanup
- Create secondary indexes for common queries
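Alongside these strategies, callers can absorb short throttling bursts by retrying with exponential backoff and jitter. The sketch below shows the pattern in isolation; `ThrottledError` is a stand-in for the real throttling exception and `call_with_backoff` is a hypothetical helper. The AWS SDKs already retry throttled calls by default, so a wrapper like this mainly applies around higher-level operations.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throttling error such as ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying throttled calls with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)  # "full jitter": random delay between 0 and cap
```

The `sleep` and `rng` parameters exist so the retry schedule can be tested without real waiting.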
Rate Limiting Issues
Symptoms
- 429 Too Many Requests responses
- X-RateLimit-Remaining: 0 header
Adjust rate limits
# In src/dagy_api/app.py
app.add_middleware(
RateLimitMiddleware,
requests_per_minute=240, # Increased from 120
burst_size=400 # Increased from 200
)
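To reason about how `requests_per_minute` and `burst_size` interact, it can help to model them as a token bucket. The sketch below is one common interpretation of such parameters and is an assumption; the actual RateLimitMiddleware may be implemented differently. `TokenBucket` is a hypothetical class.

```python
import time


class TokenBucket:
    """Allows `burst_size` requests at once, refilled at `requests_per_minute`."""

    def __init__(self, requests_per_minute, burst_size, clock=time.monotonic):
        self.rate_per_sec = requests_per_minute / 60.0
        self.capacity = float(burst_size)
        self.tokens = float(burst_size)
        self.clock = clock
        self.updated_at = clock()

    def allow(self):
        """Consume one token if available; False means respond with 429."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill for the elapsed time, never exceeding the burst capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model, raising `burst_size` lets clients send more requests back to back, while `requests_per_minute` sets the sustained rate once the burst is spent.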
Test rate limiting
# Send 150 requests back to back and print only each status code
for i in {1..150}; do
curl -s -o /dev/null -X GET \
https://api.example.com/health \
-H "Authorization: Bearer dagy_test" \
-w "Status: %{http_code}\n"
done
Frontend Issues
Clerk authentication not working
# Check Clerk keys are correctly set
echo $NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY
# Verify domain is allowed in Clerk dashboard
# Settings → Domains → check yourdomain.com is listed
# Check webhook is configured
# Settings → Webhooks → user.created event points to API
API CORS errors
Access to XMLHttpRequest at 'https://api.example.com/flows'
from origin 'https://frontend.example.com' has been blocked by CORS policy
Solution: Update CDK configuration
# infrastructure/production.yml
dagy:
  api_cors_allowed_origins:
    - "https://frontend.example.com"
    - "https://www.example.com"
Then redeploy:
cdk deploy --env production
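When CORS errors persist after redeploying, the usual cause is an origin that does not match an allowed entry exactly (scheme, host, and port all count, and a trailing slash breaks the match). The sketch below illustrates that exact-match check; `cors_headers` is a hypothetical function, not the Dagy implementation, and the origins are the ones from the configuration above.

```python
ALLOWED_ORIGINS = {"https://frontend.example.com", "https://www.example.com"}


def cors_headers(origin, allowed_origins=ALLOWED_ORIGINS):
    """Return CORS response headers for an allowed origin, else an empty dict."""
    if origin in allowed_origins:
        return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    return {}
```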
Monitoring and Debugging
Enable verbose logging
# In Lambda environment
DAGY_LOG_LEVEL=DEBUG
# In local testing
export DAGY_LOCAL_VERBOSE=true
uv run python -m dagy_api.app
CloudWatch Insights queries
# Find errors in logs
fields @timestamp, @message, @logStream
| filter @message like /(?i)error|exception/
| stats count() as errors by @logStream
# Track API latency
fields @duration
| stats avg(@duration), max(@duration), pct(@duration, 99)
# Find slow database queries
fields @duration, @message
| filter @message like /database|DynamoDB/
| stats pct(@duration, 95), pct(@duration, 99)
Support and Resources
- Documentation: https://docs.dagy.io
- GitHub Issues: https://github.com/equinox-data/dagy/issues
- Slack Community: https://dagy-community.slack.com
- Email Support: support@dagy.io
Next Steps
- Complete the deployment steps above
- Run health checks on the API: curl https://api.example.com/health
- Deploy the frontend and configure authentication
- Create your first flow using the SDK
- Deploy the flow and execute it via the API
- Set up monitoring, alerting, and backup policies
- Configure RBAC and security policies for your organization
Version: 1.0.0
Last Updated: March 2024
Author: Dagy Team