Flow Builder User Guide

Welcome to the Dagy Flow Builder – a powerful, intuitive visual interface for creating data pipelines without writing code. This guide will walk you through everything you need to know to build, deploy, and manage your data workflows.

Overview
Interface Layout
The Task Node
Connectors & Typed Data Flow
Building Your First DAG
Edge Creation & Dependencies
Task Configuration Panel
Flow Configuration Panel
Validation
Custom Nodes
Saving & Loading
Deploying from the Builder
Editing Existing Flows
Keyboard Shortcuts
Best Practices
Troubleshooting

Overview

What is the Flow Builder?

The Flow Builder is a visual, drag-and-drop interface for creating Directed Acyclic Graphs (DAGs) – data pipelines that process information through a series of interconnected tasks. Built with React Flow, it provides an intuitive canvas where you can design complex workflows without writing a single line of code.

Who is it for?

Data Analysts who want to build pipelines without learning Python
Business Users who need to orchestrate data workflows
Data Engineers who want a faster way to prototype pipelines
Teams that need to collaborate on pipeline design

When to Use the Visual Builder vs Code SDK

Use the Flow Builder when you:

Want a quick visual overview of your pipeline
Are building relatively simple to moderately complex workflows
Need to collaborate with non-technical team members
Want to iterate rapidly without coding
Prefer drag-and-drop simplicity

Use the Code SDK when you:

Need fine-grained control over task behavior
Are building highly complex, conditional workflows
Have custom logic that doesn't fit pre-built nodes
Want version control and code review for your pipelines
Are integrating with existing Python codebases

Accessing the Builder

You can access the Flow Builder in two ways:

From the Dashboard: Click the "New Flow" button in the left sidebar (usually at the top or under a "Create" section)
Direct URL: Navigate directly to /flow-builder in your Dagy instance

Interface Layout

The Flow Builder is organized into four main areas, designed to make pipeline creation intuitive and efficient.

Left Sidebar: Node Panel

The left sidebar is a compact 180px strip containing the node panel. Features include:

Task Node: A single draggable Task node that serves as the universal node type
Drag Interface: Click and drag the Task node onto the canvas to add it to your workflow
Simplicity: With a single node type, you focus on configuration rather than selection

To add a task to your pipeline, drag the Task node from the node panel onto the canvas, then configure it with a Python import path that defines what it does.

Center: Canvas

The canvas is the main workspace where you build your DAG:

Grid Background: Helps with alignment and visual organization
Nodes: Appear as rectangular blocks with connector handles
Connector Handles: Small colored circles on node edges. Hover over any handle to see its name, accepted data types, and whether it's required
Edges: Lines connecting specific connectors between nodes that show typed data flow
Minimap: Small preview in the corner showing your entire workflow
Zoom Controls: Buttons (+/-) or scroll wheel to zoom in/out
Pan: Click and drag on empty canvas to move around your workflow

Right Sidebar: Task Configuration Panel

When you click on a node, the right panel opens with two tabs:

Config Tab - Core task configuration:

Task Name: Editable field to identify the task (must be unique within the DAG)
Description: Optional explanation of what the node does
Import Path: The Python module and function to execute (format: module:function)
Retries: Number of retry attempts (0-10, default: 0)
Retry Delay: Seconds to wait between retry attempts
Timeout: Maximum execution time in seconds
Concurrency Limit: How many instances can run in parallel
Action Buttons: Duplicate or delete the node

Code Tab - Generated Python code:

Shows the auto-generated @task decorator and function signature with proper arguments
Displays Python code with syntax highlighting
Read-only view of what will be executed
Helps visualize the task in code form

The panel closes when you click elsewhere on the canvas or on another node.

Top Toolbar

The toolbar at the top provides quick access to workflow management:

Flow Name Display: Shows the current flow name with an "Editing" badge when editing an existing flow
Settings Button (Settings2 icon): Opens the Flow Configuration Panel
Save Button: Manually save your current draft (or press Ctrl+S)
Deploy Button: Prepare your DAG for production
Undo/Redo Buttons: Navigate through your editing history
Validation Button: Check for issues before saving/deploying

The Task Node

The Flow Builder uses a single, universal Task node type. Instead of choosing from 28+ specialized node types, you drag a Task node onto the canvas and configure it with a Python import path that defines what it does.

Why a Single Node Type?

Simplicity: No need to search through dozens of node categories
Flexibility: Any Python function can be executed, from data ingestion to transformations to notifications
Consistency: All tasks are configured the same way
Extensibility: Add new capabilities without modifying the builder

How Task Nodes Work

Drag a Task node from the left sidebar onto the canvas
Configure it with a name and import path (e.g., s3_operations:read_csv)
Connect it to other tasks via edges
View the generated Python code in the Code tab

The import path determines what your task does:

s3_operations:read_csv – reads CSV from S3
data_transforms:filter_active – filters active records
notifications:send_slack_message – sends Slack notifications
ml_models:predict – runs ML inference
Any custom Python function you write

Task Node Anatomy

A Task node on the canvas displays:

Task Name (center): The identifier for this task
Top Connector (Inbound): Input data from upstream tasks
Bottom Connector (Outbound): Output data to downstream tasks
Configuration Indicator: Shows if the task is fully configured

Connectors & Typed Data Flow

Every node in the Flow Builder has typed connectors — small handles on the top (inputs) and bottom (outputs) of each node. Connectors carry type information that determines what connections are valid.

How Connectors Work

Each connector has a name, description, data types, cardinality, and required flag. When you hover over any connector handle, a tooltip shows this information. The Flow Builder only allows connections between compatible connectors.

Data Types

The system supports the data types listed below. When you draw an edge, the source connector's data types must be compatible with the target connector's types:

any — Universal wildcard, connects to everything
string — Text data (also compatible with json and document targets)
number — Numeric data
boolean — True/false values
json — Structured JSON objects
dataframe — Tabular data (also compatible with json and list targets)
list — Arrays/sequences (also compatible with json targets)
embedding — Vector embeddings (also compatible with list targets)
image, audio, document — Media types
event — Event payloads (also compatible with json targets)
trigger — Execution signals with no data payload
error — Error information for error handling channels

Connection Validation

When you drag an edge from one node to another, the system validates the connection in real time:

Direction check: You must connect an outbound connector (bottom) to an inbound connector (top)
Type compatibility: At least one source data type must match or be coercible to a target data type
Cardinality check: The connector hasn't exceeded its maximum number of connections

If a connection is invalid, the edge is silently rejected. Check the browser console for specific rejection reasons.
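
The three checks above can be sketched in Python. This is a minimal illustration, not the builder's actual implementation: the `Connector` class, its field names, and the choice to treat `any` as a wildcard on either side are assumptions. The coercion table is taken from the Data Types list.

```python
from dataclasses import dataclass
from typing import Optional

# Coercions from the Data Types list: source type -> extra target types it can feed.
COERCIBLE = {
    "string": {"json", "document"},
    "dataframe": {"json", "list"},
    "list": {"json"},
    "embedding": {"list"},
    "event": {"json"},
}


@dataclass
class Connector:
    """Hypothetical stand-in for a connector handle (not the builder's real model)."""
    name: str
    direction: str                          # "outbound" or "inbound"
    data_types: frozenset
    max_connections: Optional[int] = None   # None is assumed to mean unlimited
    current_connections: int = 0


def types_compatible(source_types, target_types):
    """True if at least one source type matches or is coercible to a target type."""
    if "any" in source_types or "any" in target_types:
        return True  # assumption: "any" acts as a wildcard on both sides
    for src in source_types:
        if src in target_types:
            return True
        if COERCIBLE.get(src, set()) & set(target_types):
            return True
    return False


def check_connection(source, target):
    """Apply the direction, type, and cardinality checks; return (ok, reason)."""
    if source.direction != "outbound" or target.direction != "inbound":
        return False, "direction"
    if not types_compatible(source.data_types, target.data_types):
        return False, "type"
    for connector in (source, target):
        limit = connector.max_connections
        if limit is not None and connector.current_connections >= limit:
            return False, "cardinality"
    return True, "ok"
```

For example, a `dataframe` output can feed a `list` input because the Data Types list marks that pair as compatible, while a `dataframe` output into a `number` input would be rejected.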

Node Configuration Properties

Every task, regardless of its purpose, is configured with these properties:

Property	Type	Required	Description
name	string	Yes	Unique identifier for the task within the DAG
description	string	No	Human-readable explanation of the task's purpose
import_path	string	Yes	Python module and function (format: `module:function`)
retries	integer	No	Number of retry attempts (0-10, default: 0)
retry_delay	integer	No	Seconds to wait between retry attempts
timeout	integer	No	Maximum execution time in seconds
concurrency_limit	integer	No	Maximum parallel instances of this task
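
As a rough illustration, the properties in the table map naturally onto a small Python data class. The class name `TaskConfig` and the `problems()` helper are hypothetical; the defaults mirror the ones stated in this guide (retries 0, retry delay 0, no default for timeout or concurrency limit).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskConfig:
    """Hypothetical container for the task properties listed in the table."""
    name: str
    import_path: str
    description: str = ""
    retries: int = 0
    retry_delay: int = 0
    timeout: Optional[int] = None
    concurrency_limit: Optional[int] = None

    def problems(self):
        """Return a list of human-readable issues; an empty list means none found."""
        issues = []
        if not self.name.strip():
            issues.append("name is required")
        module, sep, function = self.import_path.partition(":")
        if not (module and sep and function):
            issues.append("import_path must look like module:function")
        if not 0 <= self.retries <= 10:
            issues.append("retries must be between 0 and 10")
        return issues
```

Calling `TaskConfig("Load Customer Data", "s3_operations:read_csv").problems()` returns an empty list, while a blank name or a path without a colon is reported.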

Building Your First DAG

Let's walk through creating a simple data pipeline step-by-step.

Step 1: Access the Builder

Navigate to the Flow Builder by clicking "New Flow" in the sidebar or visiting /flow-builder.

Step 2: Add Your First Task Node

Look at the left sidebar – you'll see the Task node in the node panel
Drag the Task node onto the center canvas
Release to drop the node

You'll see a rectangular node appear on the canvas with:

A node title (default: "Task")
A connector on the top (input)
A connector on the bottom (output)

Step 3: Add More Task Nodes

Let's build a three-task pipeline:

Drag another Task node onto the canvas to the right of the first one
Drag a third Task node to the right of the second one

You now have three nodes ready to be configured and connected.

Step 4: Configure Each Task

Start with the first task:

Click on the first Task node – the right panel will open showing its configuration
Change the Name from "Task" to something descriptive like "Load Customer Data"
Enter a Description: "Reads customer CSV from S3 bucket"
Set Import Path: s3_operations:read_csv
Click the Code tab to see the generated Python code
Click elsewhere to close the configuration panel

Now configure the second task:

Click on the second Task node
Set Name: "Filter Active Customers"
Set Description: "Keeps only customers with active status"
Set Import Path: data_transforms:filter_active
Click elsewhere to close the configuration panel

Finally, configure the third task:

Click on the third Task node
Set Name: "Notify Team"
Set Description: "Send summary to #data-ops channel"
Set Import Path: notifications:send_slack_message
Click elsewhere to close the configuration panel

Step 5: Connect the Tasks

Now connect your nodes to show data flow:

Hover over the bottom (output) of the "Load Customer Data" node – you'll see the connector highlight
Click and drag from this connector to the top (input) of the "Filter Active Customers" node
An edge (line) appears connecting the two nodes
Repeat: Connect "Filter Active Customers" to "Notify Team"

Your pipeline is now complete: Load Customer Data → Filter Active Customers → Notify Team

Step 6: Review Your DAG

Check the canvas – you should see three connected nodes
Use the minimap in the corner to see the overall structure
Use scroll or zoom controls to get a better view

Step 7: Save Your Work

Click the Save button in the top toolbar (or press Ctrl+S). Your draft is now stored and you can return to it later.

Edge Creation & Dependencies

Edges are the connections between nodes that represent both data flow and execution order.

Creating Edges

Method 1: Drag from Connector Handle

Hover over a connector handle (bottom of the source node for outputs, top of the target node for inputs)
A tooltip appears showing the connector name and accepted data types
Click and drag from the source connector to the target connector
Release to create the edge — the system validates type compatibility automatically
If the types are incompatible, the connection is rejected silently

Method 2: Smart Connection

If a node has only one output connector and the target has only one input connector, React Flow will snap to the correct handles automatically
For nodes with multiple connectors, aim for the specific handle you want

Understanding Data Flow

Direction: Edges flow from outbound connectors (bottom) to inbound connectors (top)
Typed connections: Each edge carries data between specific typed connectors
Execution Order: Target nodes don't execute until their source nodes complete
Animated Edges: Edges animate to show direction of data flow
Multiple Inputs: An inbound connector can accept edges from multiple sources (depending on cardinality)
Multiple Outputs: An outbound connector can send edges to multiple targets

Deleting Edges

Click on an edge (the line between nodes) to select it
Press Delete or Backspace to remove it
The nodes remain; only the connection is deleted

Creating Complex Pipelines

Edges enable sophisticated workflows:

Fan-out: One node connects to multiple downstream nodes (parallel processing)
Fan-in: Multiple nodes connect to one downstream node (data merging)
Sequential: Nodes connect in a line (serial processing)
Diamond Pattern: Node A → B and C, then B → D and C → D (split, then merge)

Task Configuration Panel

The right sidebar panel provides detailed control over each task's behavior. Understanding these options will help you build robust pipelines.

Config Tab: Basic Identification

Name (Required)

Must be unique within your DAG
Should be descriptive (e.g., "Load User Data" not "Task 1")
Used in logs, alerts, and execution reports
Changed by editing the text field in the configuration panel

Description (Optional)

Free-form text explaining the task's purpose
Useful for team collaboration
Appears in tooltips when hovering over nodes
Helps future maintainers understand your pipeline

Config Tab: Import Path (Required)

The Import Path tells Dagy which Python function to execute for your task.

Format: module_name:function_name

Examples:

customer_data:load_from_s3 – calls load_from_s3() from customer_data module
transformations:clean_text – calls clean_text() from transformations module
ml_models:predict – calls predict() from ml_models module

The function must:

Be defined in a Python module your Dagy instance can access
Accept parameters passed by the DAG
Return data for downstream nodes
Be decorated with @task if using the Python SDK
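
To make the `module_name:function_name` format concrete, here is a minimal sketch of how such a path could be resolved with the standard library. How Dagy resolves import paths internally is not documented here, so `resolve_import_path` is a hypothetical helper, useful mainly for checking a path locally.

```python
import importlib


def resolve_import_path(import_path):
    """Split "module:function", import the module, and return the callable."""
    module_name, sep, function_name = import_path.partition(":")
    if not (module_name and sep and function_name):
        raise ValueError(f"expected module:function, got {import_path!r}")
    module = importlib.import_module(module_name)   # raises ImportError if missing
    func = getattr(module, function_name)           # raises AttributeError if missing
    if not callable(func):
        raise TypeError(f"{import_path!r} does not refer to a callable")
    return func
```

For instance, `resolve_import_path("json:dumps")` returns the standard library's `json.dumps`; a path such as `customer_data:load_from_s3` resolves the same way once `customer_data` is importable.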

Config Tab: Retry Settings

Retries (Default: 0)

Number of times to re-run a failed task
Range: 0-10 attempts
Useful for flaky network operations or transient errors
Example: Set to 3 for API calls that occasionally timeout

Retry Delay (Default: 0)

Seconds to wait between retry attempts
Example: Set to 5 for API calls with rate limiting
Allows external systems time to recover
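
The interaction between Retries and Retry Delay can be sketched as a simple loop. This is an illustration under the assumption that `retries` counts additional attempts after the first run; `run_with_retries` is a hypothetical helper, not a Dagy function.

```python
import time


def run_with_retries(func, retries=0, retry_delay=0, sleep=time.sleep):
    """Call func; on failure, wait retry_delay seconds and re-run up to `retries` times."""
    attempts = retries + 1  # the first run plus the configured re-runs
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(retry_delay)
```

With `retries=3` and `retry_delay=5`, a task that fails twice and then succeeds runs three times in total and waits 5 seconds before each re-run.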

Config Tab: Timeout Settings

Timeout (No default)

Maximum execution time in seconds
Task fails if it doesn't complete within this window
Example: Set to 300 for a 5-minute maximum
Prevents hanging tasks from blocking your pipeline

Config Tab: Concurrency Control

Concurrency Limit (No default)

Maximum number of parallel instances of this task
Example: Set to 1 to ensure sequential execution
Set to 5 to allow up to 5 simultaneous runs
Useful when tasks have resource constraints

Config Tab: Action Buttons

Duplicate Button

Creates an exact copy of the current node
Copy appears on the canvas with "_copy" suffix
Useful for similar tasks in parallel branches
Also available via Ctrl+D

Delete Button

Removes the node and all its connected edges
No confirmation prompt is shown; use the Undo button to recover a deleted node
Useful for removing experimental nodes

Code Tab: Generated Python Code

The Code tab displays the auto-generated Python code for your task:

@task(
    name="Load Customer Data",
    retries=0,
    timeout=None,
    concurrency_limit=None
)
def load_customer_data():
    """Reads customer CSV from S3 bucket"""
    # Import and call the function from your module
    from s3_operations import read_csv
    return read_csv()

This read-only view shows:

The @task decorator with your configuration
The function name (derived from your task name)
Configuration parameters (retries, timeout, concurrency_limit)
The import statement and function call
Helpful for understanding what will be executed

The Code tab is automatically updated when you change your task configuration in the Config tab.
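
The guide shows the task name "Load Customer Data" becoming the function name `load_customer_data`. The exact rule the builder applies is not documented here; the sketch below is one plausible derivation, and `function_name_from_task` is a hypothetical helper.

```python
import keyword
import re


def function_name_from_task(task_name):
    """Turn a display name into a valid, lowercase Python identifier."""
    # Collapse every run of non-alphanumeric characters into one underscore.
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", task_name).strip("_").lower()
    if not slug:
        return "task"
    # Identifiers cannot start with a digit or collide with a keyword.
    if slug[0].isdigit() or keyword.iskeyword(slug):
        slug = f"task_{slug}"
    return slug
```

Under this rule, "Filter Active Customers" would become `filter_active_customers`.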

Flow Configuration Panel

The Flow Configuration Panel contains flow-level settings that apply to the entire pipeline. Access it by clicking the Settings2 icon (gear icon) in the toolbar.

Flow Name

Required for deployment
The identifier for your entire pipeline
Used in logs, dashboards, and execution history
Appears in the toolbar when editing

Flow Version

Semantic versioning (e.g., 1.0.0, 1.1.0, 2.0.0)
Auto-suggested when deploying edited flows
Helps track pipeline evolution
Appears in deployment history

Flow Description

Optional explanation of what this flow does
Useful for team collaboration
Helps future maintainers understand the pipeline's purpose

Executor Selection

Choose where tasks in this flow execute:

Lambda: Fast, serverless execution. Good for short tasks (< 5 min), low memory (< 512MB), stateless operations
Step Functions: AWS-native orchestration. Good for AWS-integrated workflows and state machines
ECS: Containerized execution. Best for long-running tasks (> 15 min), memory-intensive operations, or custom Docker images

This setting is the default for all tasks; individual tasks can override this if needed.

Environment Selector

Select the execution environment (e.g., development, staging, production)
Determines which database, API keys, and configuration your tasks access
Helpful for testing flows before production deployment

Summary Badges

Quick visual indicators showing:

Node Count: Number of tasks in the flow
Status: Draft, Published, or Editing
Last Modified: Timestamp of the last change

Validation

Before saving or deploying your DAG, the builder validates it for common issues. Understanding validation helps you catch problems early.

Validation Checks

The builder performs these checks:

1. Cycle Detection (Kahn's Algorithm)

What it checks: Ensures your DAG has no circular dependencies
Why it matters: Tasks in a cycle wait on each other, so no valid execution order exists
Example error: "Task A → B → A" creates a cycle
How to fix: Review your connections and remove the circular edge

2. Required Fields

What it checks: Every task must have a name and import_path
Why it matters: These fields are essential for execution
Example error: "Node 'Task 1': name is required"
How to fix: Click the node and fill in the missing fields

3. Orphan Detection

What it checks: Identifies nodes that have neither incoming nor outgoing edges
Why it matters: Orphaned nodes never execute
Example warning: "Node 'Old Task' has no connections"
How to fix: Either connect the node or delete it

4. Duplicate Labels

What it checks: No two nodes can have the same name
Why it matters: Names are unique identifiers for execution tracking
Example error: "Duplicate node name: 'Process Data' appears 2 times"
How to fix: Rename one of the duplicate nodes

5. Import Path Validation

What it checks: Attempts to verify the module:function exists
Why it matters: Invalid paths cause runtime execution failures
Example error: "Cannot resolve import_path 'nonexistent:function'"
How to fix: Verify the module name and function exist in your codebase

6. Connection Type Compatibility

What it checks: Source connector data types must be compatible with target connector types
Why it matters: Type mismatches cause runtime data errors
How it works: The builder prevents invalid connections during edge creation. Existing edges are re-validated when you run validation.
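
Three of the checks above (cycle detection, orphan detection, and duplicate names) are small graph routines. The sketch below shows one way to write them; it uses Kahn's algorithm as named in check 1, but the function names and the node/edge representation (ids plus `(source, target)` pairs) are assumptions, not the builder's internal code.

```python
from collections import Counter, deque


def nodes_in_cycles(node_ids, edges):
    """Kahn's algorithm: repeatedly remove nodes whose upstream nodes are all removed.

    Returns the nodes that could never be removed, i.e. nodes on a cycle or
    downstream of one. An empty list means the graph is acyclic.
    """
    indegree = {n: 0 for n in node_ids}
    downstream = {n: [] for n in node_ids}
    for src, dst in edges:
        downstream[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in node_ids if indegree[n] == 0)
    ordered = set()
    while ready:
        node = ready.popleft()
        ordered.add(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return [n for n in node_ids if n not in ordered]


def orphan_nodes(node_ids, edges):
    """Nodes that appear in no edge at all."""
    connected = {n for edge in edges for n in edge}
    return [n for n in node_ids if n not in connected]


def duplicate_names(names):
    """Map each repeated name to the number of times it appears."""
    return {name: count for name, count in Counter(names).items() if count > 1}
```

For the example error in check 1, `nodes_in_cycles(["A", "B"], [("A", "B"), ("B", "A")])` reports both nodes, since neither can ever run first.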

Running Validation

Automatic Validation

Runs before every save and deploy
Displays issues in a panel below the canvas
Does not prevent saving a draft; errors only block deployment

Manual Validation

Click the "Validation" button in the top toolbar
Opens a detailed validation report
Shows all errors, warnings, and successful checks

Reading Validation Output

Validation results appear in a panel with:

Errors (Red)

Must be fixed before deployment
Examples: cycles, missing required fields, duplicates

Warnings (Yellow)

Do not prevent deployment but should be reviewed
Examples: orphaned nodes, unusual configurations

Info (Blue)

Informational messages about your DAG
Examples: "DAG is valid", "Total nodes: 5"

Invalid Nodes Highlighting

When validation finds issues:

Error nodes appear with a red border
Error edges appear in red
Affected nodes are highlighted on the canvas
Click a highlighted node to see details in the configuration panel

Custom Nodes

Beyond the built-in Task node, your organization can create custom node types that appear in the Flow Builder. Custom nodes extend functionality for domain-specific use cases.

What Are Custom Nodes?

Custom nodes are specialized processing steps built by your team. They extend the FlowNode base class in Python, defining their own connectors, configuration schema, and execution logic. Once registered, they appear in the sidebar and can be used just like built-in nodes — drag them onto the canvas, configure them, and connect them to other nodes.

How to Create Custom Nodes

Creating a custom node involves writing a Python class that extends FlowNode and implements three methods: metadata() (identity), connectors() (typed ports), and execute() (logic). The framework auto-registers your class when Python imports the module.

For the full step-by-step tutorial, see the Creating Custom Nodes guide.
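
As a rough picture of the shape described above, the sketch below defines a stand-in `FlowNode` base class and one subclass. Everything here is illustrative: the real base class comes from the Dagy SDK, and the method signatures, return shapes, and the `__init_subclass__` registration mechanism are assumptions rather than the SDK's documented behavior.

```python
class FlowNode:
    """Stand-in base class for illustration only; not the Dagy SDK's FlowNode."""
    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Registering when the class is defined is one way to get the
        # "auto-registers on import" behavior the guide describes.
        FlowNode.registry[cls.__name__] = cls

    def metadata(self):
        raise NotImplementedError

    def connectors(self):
        raise NotImplementedError

    def execute(self, inputs):
        raise NotImplementedError


class UppercaseText(FlowNode):
    """Hypothetical custom node that upper-cases a string."""

    def metadata(self):
        return {"type": "uppercase_text", "description": "Upper-cases a string"}

    def connectors(self):
        return {
            "inputs": [{"name": "text", "data_types": ["string"], "required": True}],
            "outputs": [{"name": "result", "data_types": ["string"]}],
        }

    def execute(self, inputs):
        return {"result": inputs["text"].upper()}
```

Refer to the Creating Custom Nodes guide for the actual class signatures before writing a real node.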

How Custom Nodes Appear in the Builder

Custom nodes registered via the API or Python SDK are fetched automatically when you open the Flow Builder. They appear in the sidebar alongside the Task node. You can drag and configure them just like any other node.

Managing Custom Nodes via API

Your organization can register, update, and delete custom nodes through the Node Registry API:

POST /nodes/registry — Register a new custom node
PUT /nodes/registry/{node_type} — Update an existing custom node
DELETE /nodes/registry/{node_type} — Remove a custom node

See the API Endpoints reference for full details.

Saving & Loading

The builder saves your work automatically, and you can also save manually at any time, so you can iterate safely and return to previous drafts.

Auto-Save

How it works:

Every 30 seconds, the builder saves your canvas state automatically
Saves as a draft record in the database
Happens silently in the background
You'll see a brief "Saving..." indicator

What gets saved:

Node positions and configurations
Edge connections
Canvas zoom level and pan position
All parameter values

Scope:

Drafts are organization-scoped
Only users in your organization can access your drafts
Drafts are separate from deployed flows

Manual Save

Ctrl+S Keyboard Shortcut

Press Ctrl+S (or Cmd+S on Mac)
Your draft saves immediately
You'll see a confirmation message

Save Button

Click the "Save" button in the top toolbar
Your draft saves immediately
You'll see a confirmation message

Loading Drafts

From the Dashboard:

Navigate to the Flows or DAGs section
Look for "Drafts" or "Recent Drafts"
Click on a draft to open it in the builder
The canvas loads with your previous configuration

Draft Information:

Name and description
Last modified timestamp
Org-scoped access
Status indicator (draft/deployed)

Draft Storage

Storage Location:

Stored as draft records in the database
Format: JSON
Indexed by organization and user

Retention:

Drafts are retained indefinitely
You can have multiple drafts simultaneously
Archive old drafts when no longer needed

Deploying from the Builder

Once your DAG is complete and validated, deploying makes it a live, executable flow.

Step 1: Click Deploy

Click the Deploy button in the top toolbar
A deployment dialog opens with a form

Step 2: Fill Deployment Metadata

The deployment form is pre-populated from the Flow Configuration Panel when editing existing flows. For new flows, fill in:

Flow Name (Required)

Descriptive name for your pipeline (e.g., "Customer ETL v1.0")
Can include version numbers
Used in logs and dashboards

Version (Auto-generated)

Semantic versioning (e.g., 1.0.0)
When editing, the form suggests the next version (e.g., 1.0.1)
Helps track pipeline evolution

Description (Optional)

Explain what this flow does
Note any recent changes
Helps team members understand the pipeline

Executor (Pre-populated from Flow Config)

Choose where tasks execute:

Lambda: Fast, serverless, good for short tasks
Step Functions: AWS-native orchestration
ECS: Containerized workloads, better for long-running tasks

Overridable per-task if needed

Schedule (Optional)

Choose how often to run this flow:

Manual: Only run when triggered explicitly
One-time: Run a single time at a specified date/time
Cron Expression: Advanced scheduling (e.g., "0 2 * * *" for 2 AM daily)
Interval: Repeat every N minutes/hours/days

Example schedules:

0 0 * * * – Every day at midnight
0 */6 * * * – Every 6 hours
0 9 * * 1-5 – Weekdays at 9 AM
*/15 * * * * – Every 15 minutes
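
The example schedules use the standard five-field cron layout: minute, hour, day of month, month, day of week. A tiny sketch that labels each field can make an expression easier to read; `label_cron_fields` is a hypothetical helper, not part of Dagy, and it does not validate field values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def label_cron_fields(expression):
    """Pair each part of a five-field cron expression with its field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))
```

Reading `0 9 * * 1-5` this way gives minute `0`, hour `9`, any day of month, any month, and days of week `1-5` (Monday through Friday).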

Tags (Optional)

Add labels like "production", "customer-data", "v2"
Help organize and filter flows
Useful for cost allocation and team assignment

Step 3: Review and Confirm

Review all metadata for accuracy
Check that executor choice matches your workload
Verify the schedule if setting one up
Click "Deploy" button to proceed

Step 4: Deployment Process

When you click Deploy, the builder:

Serializes your canvas into FlowSpec JSON format
Validates the entire specification
Posts to the Dagy API endpoint /flows
Creates a Flow record in your organization
Returns a flow ID and status
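
The serialization step can be pictured as turning canvas state into a JSON document. The FlowSpec schema is not specified in this guide beyond `nodes`, `id`, and `position`, so every other key below (`name`, `version`, `import_path`, `edges`, `source`, `target`) is a hypothetical placeholder, as is the `build_flowspec` helper.

```python
import json


def build_flowspec(name, version, nodes, edges):
    """Serialize canvas state to a JSON string.

    Only "nodes", "id", and "position" mirror the FlowSpec excerpt in this
    guide; the remaining keys are illustrative assumptions.
    """
    spec = {
        "name": name,
        "version": version,
        "nodes": [
            {
                "id": node["id"],
                "position": node["position"],
                "import_path": node["import_path"],
            }
            for node in nodes
        ],
        "edges": [{"source": src, "target": dst} for src, dst in edges],
    }
    return json.dumps(spec, indent=2)
```

The resulting string would form the body of the POST to /flows. Authentication and response handling are left out because this guide does not describe them; see the API Endpoints reference.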

Step 5: Post-Deployment

After successful deployment:

You'll see a success message with the flow ID
The flow appears in your Flows dashboard
Execution begins according to your schedule
You can monitor execution in the Flow Details page

Deployment Failures

If deployment fails:

Error message explains the issue
Common causes: Missing required fields, invalid executor, validation errors
Solution: Review the error, correct the issue, try again
Drafts are not affected by failed deployments

Editing Existing Flows

The builder allows you to import and modify previously deployed flows, maintaining version control and deployment history.

Edit Flow Workflow

Step 1: Load the Flow

Navigate to the Flows dashboard
Click on a deployed flow
Click "Edit in Builder"
The builder opens with that flow loaded

What gets pre-populated:

All task configurations and connections
Flow name, version, description
Executor, environment, and other flow-level settings
Canvas layout and zoom level
The toolbar shows the flow name with an "Editing" badge

Step 2: Make Changes

Modify task configurations
Add or remove tasks
Adjust connections
Update flow-level settings via the Flow Config Panel
Test via validation

Step 3: Version Management

The deployment form automatically suggests the next version
Example: 1.0.0 → 1.0.1 (patch) or 1.1.0 (minor)
Document changes in the description field

Step 4: Re-Deploy

Click "Deploy" button
The deployment form is pre-populated with current flow settings
Update the version and description
Choose whether to create a new version or update existing
Execute the updated flow

Auto-Layout When Importing

When importing a FlowSpec JSON, the builder:

Analyzes the DAG structure
Calculates optimal node positions
Applies hierarchical layout algorithm
Preserves original node positions if included in JSON
Spacing ensures readability with no node overlaps

Preserving Node Positions

When you export and re-import a FlowSpec JSON with position data:

{
  "nodes": [
    {
      "id": "task1",
      "position": {"x": 100, "y": 200}
    }
  ]
}

The builder:

Uses stored positions instead of auto-layout
Respects your manual arrangement
Maintains organization across imports/exports
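
One way to picture the combination of auto-layout and preserved positions is a layered layout that only fills in nodes lacking a stored position. The sketch below is an assumption-laden illustration, not the builder's algorithm: it places each node one layer below its deepest upstream node (top to bottom, matching inputs on top and outputs on the bottom), uses made-up spacing values, and expects an acyclic graph.

```python
def layout(nodes, edges, x_gap=250, y_gap=150):
    """Return {node_id: {"x": ..., "y": ...}}, keeping stored positions as-is.

    nodes: list of dicts with "id" and an optional "position".
    edges: list of (source_id, target_id) pairs; must not contain cycles.
    """
    upstream = {n["id"]: [] for n in nodes}
    for src, dst in edges:
        upstream[dst].append(src)

    depth = {}

    def depth_of(node_id):
        # Layer = 1 + deepest upstream layer; nodes with no upstream sit in layer 0.
        if node_id not in depth:
            depth[node_id] = 1 + max(
                (depth_of(u) for u in upstream[node_id]), default=-1
            )
        return depth[node_id]

    next_column = {}
    positions = {}
    for node in nodes:
        if "position" in node:
            positions[node["id"]] = dict(node["position"])  # respect manual placement
            continue
        layer = depth_of(node["id"])
        column = next_column.get(layer, 0)
        next_column[layer] = column + 1
        positions[node["id"]] = {"x": column * x_gap, "y": layer * y_gap}
    return positions
```

Note that this simple version does not check whether a stored position overlaps a computed one; a production layout would need to handle that.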

Editing Drafts

You can also edit saved drafts that have not been deployed yet:

Navigate to Drafts section
Click on a draft
Make modifications
Save changes
Deploy when ready

Keyboard Shortcuts

Keyboard shortcuts accelerate your workflow in the builder. Commit these to memory for faster development.

Navigation & View

Shortcut	Action
Scroll Wheel	Zoom in/out on canvas
Ctrl + A / Cmd + A	Select all nodes and edges
Click + Drag (empty)	Pan canvas (move around)
Home	Fit entire DAG in view

Editing

Shortcut	Action
Ctrl + Z / Cmd + Z	Undo last action
Ctrl + Shift + Z / Cmd + Shift + Z	Redo last action
Delete / Backspace	Delete selected node(s) or edge(s)
Ctrl + D / Cmd + D	Duplicate selected node
Ctrl + X / Cmd + X	Cut selected node(s)
Ctrl + C / Cmd + C	Copy selected node(s)
Ctrl + V / Cmd + V	Paste copied node(s)

Saving & Validation

Shortcut	Action
Ctrl + S / Cmd + S	Save draft
Ctrl + Shift + S / Cmd + Shift + S	Save with name dialog
Ctrl + Enter / Cmd + Enter	Validate and show report

Node Selection

Shortcut	Action
Click	Select single node/edge
Shift + Click	Add to selection
Click + Drag	Select multiple nodes (box select)
Escape	Deselect all

Best Practices

Following these best practices will make your DAGs more maintainable, reliable, and efficient.

Naming Conventions

Descriptive Task Names

Use clear, action-oriented names
Good: "Load Customer Data", "Filter Active Users", "Send Slack Alert"
Avoid: "Task 1", "Process", "Step A"

Naming Consistency

Use consistent patterns across your org
Example: verb + object: "Load_Data", "Transform_Data", "Export_Data"
Example: action + system: "FetchFrom_API", "WriteTo_Database"

Unique Names

Every task must have a unique name within its DAG
Names become identifiers in logs and monitoring

DAG Complexity

Keep DAGs Simple

Recommended: fewer than 20 nodes per DAG
Benefits: easier to understand, faster to execute, simpler debugging
If exceeding 20 nodes, consider splitting into multiple flows

Logical Grouping

Group related tasks together visually
Use descriptive names to show relationships
Organize left-to-right: ingestion → transform → export

Avoid Deep Nesting

Deep chains (100+ sequential tasks) are hard to debug
Break into separate flows with intermediate storage
Use parallel branches instead of sequential when possible

Validation & Testing

Always Validate Before Deploy

Click "Validation" button
Fix all errors and review warnings
Ensure no orphaned nodes or cycles

Test Locally First

Export as FlowSpec JSON
Review in the Code tab to see generated @task decorators
Verify your import paths are correct
Test with sample data similar to production

Version Semantically

Use semantic versioning: MAJOR.MINOR.PATCH
MAJOR: breaking changes
MINOR: new features
PATCH: bug fixes
Example: 1.2.3 → 1.2.4 (patch), 1.3.0 (minor), 2.0.0 (major)
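
The bump rules above are mechanical enough to express in a few lines. `bump_version` is a hypothetical helper for illustration; it handles plain MAJOR.MINOR.PATCH strings only, without pre-release or build suffixes.

```python
def bump_version(version, part="patch"):
    """Return the next semantic version for the given part."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"      # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

Starting from 1.2.3, this yields 1.2.4 for a patch, 1.3.0 for a minor release, and 2.0.0 for a major release, matching the example in the list.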

Configuration

Timeout Settings

Always set timeouts for external API calls
Default: None (unlimited)
Recommended: API calls 30-60s, file operations 300s
Prevents hung tasks from blocking pipeline

Retry Logic

Use retries for flaky operations (APIs, network)
Avoid retries for deterministic failures
Set retry_delay for rate-limited APIs
Recommended: 2-3 retries for transient errors

Concurrency Limits

Set to 1 for serial, state-dependent tasks
Set to 5-10 for parallel, independent tasks
Consider downstream system capacity
Monitor resource usage during execution

Execution Planning

Choose the Right Executor

Lambda: < 5 min, < 512MB, stateless tasks
Step Functions: AWS-native workflows, state machines
ECS: Long-running (> 15 min), memory-intensive, containerized

Schedule Appropriately

Off-peak hours for heavy operations
Check downstream system availability
Consider timezone implications
Monitor execution history

Monitor and Alert

Set up notifications for failures
Monitor execution time trends
Alert on resource anomalies
Review logs regularly

Documentation

Describe Your Flows

Add meaningful descriptions to each task
Explain complex transformations
Document custom parameters
Help future maintainers understand intent

Document Changes

Use version descriptions to document updates
Note what changed and why
Reference related tickets or PRs
Maintain changelog

Troubleshooting

Common Issues and Solutions

Canvas Not Loading

Problem: Flow Builder page stays blank or shows loading spinner indefinitely.

Causes:

Browser JavaScript disabled
React Flow CSS not loaded
Network connection issue
JavaScript errors during page load (visible in the browser console)

Solutions:

Check browser console (F12 → Console tab)
Look for JavaScript errors
Verify React Flow CSS is loading (Network tab)
Try refreshing the page (Ctrl+R / Cmd+R)
Clear browser cache and try again
Try a different browser to isolate issues

Debug steps:

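// In the browser console, check whether the React Flow canvas has mounted.
// (React Flow is normally bundled, so a global `ReactFlow` may not exist.)
console.log(document.querySelector(".react-flow") !== null)
// Should print true once the canvas has rendered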

Nodes Not Connecting

Problem: Can't create edges between nodes; drag operation doesn't work.

Causes:

Data type incompatibility: The source connector's data types don't match the target connector's accepted types
Cardinality exceeded: The connector already has the maximum number of connections
Dragging from an inbound connector to another inbound connector (must be outbound → inbound)
Trying to connect a node to itself
Handles not properly rendered

Solutions:

Hover over both connectors to check their data types; the two connectors must share at least one type
Verify you're dragging from a BOTTOM handle (outbound) to a TOP handle (inbound)
Check the browser console (F12) for specific rejection messages (e.g., "Connection rejected: No compatible data types")
If a connector has a cardinality of "one", disconnect the existing edge first
Try zooming in to see individual connector handles
Refresh the page if handles don't appear

Visual indicators:

Outbound handles (bottom): Connect these as sources
Inbound handles (top): Connect these as targets
Hovering shows a tooltip with connector name, data types, and required status
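
The rejection rules listed under Causes can be summarized in a short sketch. The Connector class, its fields, and check_connection are hypothetical names chosen for illustration; the builder's actual data model and messages may differ.

```python
from dataclasses import dataclass


@dataclass
class Connector:
    node_id: str
    direction: str             # "outbound" or "inbound"
    data_types: set            # types the connector emits or accepts
    cardinality: str = "many"  # "one" or "many"
    connections: int = 0       # edges already attached


def check_connection(source, target):
    """Return None if the edge is allowed, otherwise the reason it is rejected."""
    if source.node_id == target.node_id:
        return "cannot connect a node to itself"
    if source.direction != "outbound" or target.direction != "inbound":
        return "must connect outbound -> inbound"
    if not (source.data_types & target.data_types):
        return "no compatible data types"
    for connector in (source, target):
        if connector.cardinality == "one" and connector.connections >= 1:
            return "cardinality exceeded"
    return None


out = Connector("extract", "outbound", {"dataframe"})
inp = Connector("load", "inbound", {"json"})
print(check_connection(out, inp))  # no compatible data types
```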

Validation Errors

Problem: Validation shows errors you don't know how to fix.

Causes:

Missing required fields (name, import_path)
Import path doesn't exist in your codebase
Duplicate task names
Circular dependencies

Solutions by error type:

"Name is required"

Click the affected node
Enter a name in the "Name" field
Name must be unique and descriptive

"Cannot resolve import_path"

Verify the module exists in your Python path
Check spelling: module_name:function_name
Ensure function is exported and accessible
Test import locally: from module_name import function_name
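
To test an import path outside the builder, a small script along these lines can help. It assumes only the module_name:function_name format shown above; resolve_import_path is a hypothetical helper, not a Dagy API, and it must run in the same Python environment your tasks use.

```python
import importlib


def resolve_import_path(import_path: str):
    """Import 'module_name:function_name' and return the callable."""
    module_name, sep, function_name = import_path.partition(":")
    if not sep or not module_name or not function_name:
        raise ValueError("expected 'module_name:function_name'")
    module = importlib.import_module(module_name)  # ModuleNotFoundError if missing
    target = getattr(module, function_name)        # AttributeError if misspelled
    if not callable(target):
        raise TypeError(f"{import_path} is not callable")
    return target


# Example with a standard-library function:
print(resolve_import_path("json:dumps")([1, 2]))  # [1, 2]
```

The exception type tells you which part of the path is wrong: ModuleNotFoundError points at the module, AttributeError at the function name.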

"Duplicate task name"

Click each node with that name
Rename to unique values
Use suffixes if needed: "Process Data v1", "Process Data v2"
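
If you have an exported flow and want to find the duplicates quickly, a sketch like the following works on a list of nodes. The node shape (a dict with a "name" key) is an assumption for illustration and may not match the exported format exactly.

```python
from collections import Counter


def find_duplicate_names(nodes):
    """Return the task names that appear more than once, sorted."""
    counts = Counter(node["name"] for node in nodes)
    return sorted(name for name, count in counts.items() if count > 1)


nodes = [
    {"name": "Process Data"},
    {"name": "Load"},
    {"name": "Process Data"},
]
print(find_duplicate_names(nodes))  # ['Process Data']
```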

"Circular dependency detected"

Review your edge connections
Look for A → B → ... → A paths
Remove the edge that completes the circle
Use the canvas to visualize the cycle
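
For large flows, a cycle can be hard to spot by eye. The sketch below finds one A → B → ... → A path with a depth-first search. It assumes edges are available as (source, target) pairs; find_cycle is a hypothetical helper and is not necessarily how the builder's validator works.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes [A, B, ..., A], or None if acyclic."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # edge back into the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # ['A', 'B', 'C', 'A']
```

The last edge in the returned path (here C → A) is the one that completes the circle; removing any single edge on the path breaks the cycle.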

Deploy Failures

Problem: Deploy button returns an error.

Causes:

Validation errors exist
Required metadata missing
Invalid executor choice
API connectivity issues
Permission/authentication issues

Solutions:

Run validation first
Fix any reported errors
Fill in all required deployment fields
Check network connectivity
Verify your organization has permission to create flows
Check API status page
Try again in a few moments

Performance Issues

Problem: Canvas is slow, zooming is laggy, selection is delayed.

Causes:

Very large DAG (100+ nodes)
Browser has low memory
Too many browser tabs open
GPU acceleration disabled

Solutions:

Split large DAG into smaller flows
Close other browser tabs
Try a different browser
Check browser performance in DevTools
Zoom out to see entire DAG
Restart browser if persistent

Lost Work

Problem: Changes not saved or draft disappeared.

Causes:

Didn't click Save button
Auto-save didn't complete
Browser crashed before save
Cleared browser cache/storage
Network disconnection during save

Prevention:

Manually save frequently (Ctrl+S)
Check for "Saving..." indicator
Monitor auto-save status
Avoid closing tab without saving
Use version control for exports

Recovery:

Check browser history for draft URL
Contact support for database recovery
Check backup/snapshot if available
Export from deployed flow if one exists

Import Issues

Problem: Can't import FlowSpec JSON or import fails.

Causes:

Invalid JSON format
Missing required fields in JSON
Version mismatch
File too large
Corrupt file

Solutions:

Validate JSON format (use online JSON validator)
Verify all required fields are present: name, nodes, edges
Check file size (should be < 10MB)
Re-export from original source
Try manual import instead of drag-drop
Recreate in builder manually if needed

Valid FlowSpec example:

{
  "name": "My Flow",
  "version": "1.0.0",
  "nodes": [],
  "edges": []
}
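
Before importing, you can run a quick local check on the file. The sketch below covers only the points listed above (valid JSON, the required fields name, nodes, and edges, and the 10MB guideline). It additionally assumes that nodes and edges must be lists, as in the example; check_flowspec is a hypothetical helper, not the builder's own validator.

```python
import json

REQUIRED_FIELDS = ("name", "nodes", "edges")
MAX_BYTES = 10 * 1024 * 1024  # the 10MB guideline from the list above


def check_flowspec(text: str) -> list:
    """Return a list of problems found in a FlowSpec JSON string (empty if none)."""
    problems = []
    if len(text.encode("utf-8")) >= MAX_BYTES:
        problems.append("file is 10MB or larger")
    try:
        spec = json.loads(text)
    except json.JSONDecodeError as exc:
        return problems + [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(spec, dict):
        return problems + ["top-level value must be a JSON object"]
    for field in REQUIRED_FIELDS:
        if field not in spec:
            problems.append(f"missing required field: {field}")
    for field in ("nodes", "edges"):
        # Assumption: both are lists, as in the example above.
        if field in spec and not isinstance(spec[field], list):
            problems.append(f"{field} must be a list")
    return problems


print(check_flowspec('{"name": "My Flow", "version": "1.0.0", "nodes": [], "edges": []}'))
print(check_flowspec('{"name": "My Flow"}'))
```

The first call prints an empty list; the second reports the missing nodes and edges fields.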

Getting Help

Where to find help:

Documentation: Browse the Dagy docs site
API Reference: Detailed endpoint documentation
Community Forum: Ask questions and share solutions
Support: Contact the Dagy support team

Happy building! The Flow Builder makes it easy to create powerful data pipelines without writing code. Start simple, iterate frequently, and leverage the validator to catch issues early.
