Troubleshooting Guide
Common issues and their resolutions when working with the Dagy platform.
Flow Executor Issues
"Flow Executor backend was requested but is not available"
Error message:
Flow Executor backend was requested (execution_mode='micro') but is not available.
Ensure the DAGY_FLOW_EXECUTOR_FUNCTION environment variable is set.
Cause: The run's execution_mode is set to nano or micro, which routes to the flow-executor backend. However, the Flow Executor Lambda is not registered because DAGY_FLOW_EXECUTOR_FUNCTION is not set on the API Lambda.
Resolution:
- Deploy the Flow Executor Lambda using the CDK stack (it is provisioned automatically as
{app}-flow-executor-{environment}). - Verify the
DAGY_FLOW_EXECUTOR_FUNCTIONenvironment variable is set on both the API Lambda and the Flow Executor Lambda itself (the CDK stack handles this viacommon_env). - After deploying, the
BackendRegistrywill register theflow-executorbackend on startup and routing will succeed.
Import errors after deployment (missing dependency packages)
Symptoms: The flow starts but fails with ModuleNotFoundError for packages that should be available (e.g., pandas, numpy).
Cause: Dependency packages are configured on the deployment but either the packages don't exist, or the S3 URIs can't be resolved.
Resolution:
- Verify the deployment has the correct
dep_package_slugsset:dagy deployments show my-deployment - Ensure the dependency packages exist and are in
ACTIVEstatus:dagy dep-packages list - Check that the Flow Executor Lambda's IAM role has
s3:GetObjectpermission on thedagy/dep-packages/*prefix in the artifacts bucket. - Check CloudWatch logs for the Flow Executor Lambda for download errors.
Per-task invocation can't find upstream output
Symptoms: In micro runtime tier, a downstream task fails with errors like KeyError or None when trying to access output from an upstream task.
Cause: Per-task invocation mode persists each task's output to S3 and loads it for downstream consumers. If S3 permissions are misconfigured or the output was too large, the handoff fails.
Resolution:
- Verify the Flow Executor Lambda's IAM role has both
s3:GetObjectands3:PutObjecton thedagy/task-state/*prefix. - Check CloudWatch logs for S3 upload/download errors between task invocations.
- Ensure upstream tasks return serializable output (JSON-safe dicts, lists, strings, numbers).
- For large outputs, verify the Lambda's memory is sufficient. Task state serialization happens in-memory.
Flow times out (15-minute Lambda limit)
Symptoms: The run status shows FAILED with a timeout error or the Lambda times out before all tasks complete.
Cause: Lambda functions have a maximum execution time of 900 seconds (15 minutes). If your flow's total execution exceeds this, the invocation is terminated.
Resolution:
- Use per-task invocation: Set
execution_modetomicroso each task runs in its own 15-minute invocation rather than sharing one:dagy deployments settings my-deployment --execution-mode micro - Use an alternative runtime tier: For flows that need more than 15 minutes per task, consider larger tiers (
small,medium,large,xlarge) which support longer durations:dagy deployments settings my-deployment --execution-mode large - Optimize tasks: Break large tasks into smaller units to fit within the invocation timeout.
Routing Issues
Flow runs on the wrong backend
Symptoms: A flow configured with execution_mode runs on the plain lambda backend instead of flow-executor, missing workspace bootstrap and dependency packages.
Cause: This was a known issue (C1 in the gap analysis) that has been fixed. With the fix, the system now fails explicitly instead of silently falling through. If you still see silent fallthrough, ensure you are running the latest version.
Resolution:
- Verify
DAGY_FLOW_EXECUTOR_FUNCTIONis set (see first section above). - Check the run record's
executorfield to confirm which backend was used:dagy runs show <run_id> - If the executor shows
lambdainstead offlow-executor, check thatexecution_modeis set on the deployment or run.
Scheduled/triggered runs ignore execution_mode
Symptoms: Runs triggered by the scheduler or via flow_trigger events don't use the expected execution mode.
Cause: This was a known issue (C2 in the gap analysis) where flow_trigger events did not forward execution_mode. This has been fixed. The execution_mode is now included in the RunRequest and the run_execute event payload.
Resolution: Ensure you are running the latest version. If the issue persists, check:
- The deployment record has
execution_modeset. - The schedule's trigger event includes
execution_modein the payload. - CloudWatch logs for the event handler show
execution_modebeing passed through.
Retried runs lose execution_mode
Symptoms: A run that is retried ends up on a different backend than the original run.
Cause: This was a known issue (C3 in the gap analysis) where run_retry events did not forward execution_mode. This has been fixed. The retry handler now looks up the run record and includes execution_mode in the retry event.
Resolution: Ensure you are running the latest version. The fix reads execution_mode from the original run record when constructing the retry event.
Deployment Issues
"Invalid execution_mode" validation error
Symptoms: Deploy or settings update fails with Invalid execution_mode: <value>. Must be one of {'nano', 'micro', 'small', 'medium', 'large', 'xlarge'}.
Cause: The execution_mode field only accepts valid runtime tier values.
Resolution: Use one of the valid values:
nano(default): entire flow runs in a single invocationmicro: each task runs in a separate invocationsmall,medium,large,xlarge: larger runtime tiers with extended execution times
CLI Issues
"Failed to trigger run" error
Symptoms: dagy run my-deployment fails with an API error.
Resolution:
- Verify the API URL is configured:
dagy config show - Ensure your token is valid:
dagy login - Check the deployment exists and is in
ACTIVEstatus - If using
--execution-mode, ensure the value is one ofnano,micro,small,medium,large, orxlarge