RPA Tools

RPA (Robotic Process Automation) tools are a special kind of tool that automates browser interactions using Playwright. The platform gives them automatic screenshot capture, action logging, and browser session management.

What are RPA Tools?

RPA tools automate tasks on the web by controlling a browser programmatically. They are ideal for:

  • Web scraping and data extraction

  • Form filling and submission

  • Clicking buttons and navigating websites

  • Automating repetitive web-based workflows

  • Interacting with web applications that don't have APIs

RPA Tool Structure

RPA tools must implement the RPATool interface with type: 'rpa':

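```typescript
import { z } from 'zod';
import type { Page } from 'playwright';

// BrowserTaskMode is the SDK's browser-provider enum (see Browser Provider Configuration).
interface RPATool<Input extends z.ZodSchema, Memory> {
  name: string; // Unique tool identifier
  description: string; // What the tool does (used by LLM)
  input: Input; // Zod schema for input validation
  isGlobal?: boolean; // Optional: available across all LLM calls
  type: 'rpa'; // Required: marks this as an RPA tool
  proxy?: string; // Optional: proxy country code (e.g., "US", "GB", "DE")
  browserTaskMode?: BrowserTaskMode; // Optional: browser provider (local/cloud/onPrem)
  execute: (params: {
    input: z.infer<Input>; // Validated input matching the Zod schema
    state: any; // Conversation state, including memory (of type Memory) and sessionId
    agent: any; // Agent instance (PII gateway, logging, other platform features)
    page: Page; // Playwright Page, provided automatically for RPA tools
  }) => Promise<{ result?: any }>;
}
```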

Execute Function Signature

Compared with regular tools, an RPA tool's execute function receives one additional parameter, page, which is a Playwright Page object.

Parameters

  • input: Validated data matching your Zod schema

  • state: Current conversation state including memory, sessionId, and other context

  • agent: Agent instance providing access to PII gateway, logging, and other platform features

  • page: Playwright Page object for browser automation - automatically provided for RPA tools
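The following is a minimal sketch of an execute function that uses these parameters. To keep it self-contained, PageLike describes only the subset of Playwright's Page used here (the real page parameter is a full Page). The shapes of input and state are illustrative assumptions, not the platform's actual types.

```typescript
// Subset of Playwright's Page used below; a real Playwright Page satisfies it.
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
}

// Illustrative parameter shape (assumption): input has already been validated by Zod.
interface ExecuteParams {
  input: { url: string };
  state: { sessionId: string }; // conversation state (memory, sessionId, ...)
  agent: unknown; // platform agent (PII gateway, logging, ...), unused here
  page: PageLike;
}

// Navigate to the requested URL and return the page title as the tool result.
async function execute({ input, page }: ExecuteParams): Promise<{ result?: string }> {
  // The platform adds the screenshots and the "Navigate to: ..." log entry.
  await page.goto(input.url);
  const title = await page.title();
  return { result: title };
}
```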

Proxy Configuration

RPA tools can optionally set proxy to a country code (for example, proxy: 'US') to route browser traffic through a specific geographic location.

Note: Proxy configuration only applies when using cloud or on-prem browser providers. Local browser sessions ignore proxy settings.
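The sketch below shows where proxy sits in a tool definition. It also models the rule from the note: the proxy applies to cloud and on-prem providers and is ignored for local sessions. Several names are local stand-ins (assumptions) for the SDK's types: ToolConfig and the 'local' | 'cloud' | 'onPrem' strings. effectiveProxy is a hypothetical helper that illustrates the documented behavior; it is not a platform function.

```typescript
// Local stand-ins for illustration only; the real SDK types may differ.
type ProviderMode = 'local' | 'cloud' | 'onPrem';

interface ToolConfig {
  name: string;
  type: 'rpa';
  proxy?: string; // country code, e.g. "US", "GB", "DE"
}

// Hypothetical tool whose browser traffic should exit in the UK.
const checkUkPricesTool: ToolConfig = {
  name: 'checkUkPrices',
  type: 'rpa',
  proxy: 'GB',
};

// Hypothetical helper modeling the documented rule:
// local browser sessions ignore the proxy setting.
function effectiveProxy(tool: ToolConfig, mode: ProviderMode): string | undefined {
  return mode === 'local' ? undefined : tool.proxy;
}
```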

Browser Provider Configuration

RPA tools can optionally specify which browser provider to use by setting browserTaskMode.

Available options:

  • BrowserTaskMode.LOCAL: Use a local browser instance running on the same machine as the agent. Intended for local development only.

  • BrowserTaskMode.CLOUD: Use a cloud browser provider. Supports proxy configuration and automatically handles Cloudflare and other anti-bot protections.

If not specified, the tool will use the browser provider configured in your environment (BROWSER_TASK_MODE environment variable).
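Here is a sketch of the precedence described above. A browserTaskMode set on the tool wins; otherwise the BROWSER_TASK_MODE environment variable applies. The BrowserTaskMode object below is a local stand-in for the SDK's enum, and its string values ('local', 'cloud') are assumptions. In real tool code, import the enum from the platform package. env is passed in explicitly (in practice, process.env) so the sketch stays self-contained.

```typescript
// Local stand-in for the SDK's BrowserTaskMode enum; the string values are assumptions.
const BrowserTaskMode = {
  LOCAL: 'local',
  CLOUD: 'cloud',
} as const;
type BrowserTaskMode = (typeof BrowserTaskMode)[keyof typeof BrowserTaskMode];

interface ModeConfig {
  browserTaskMode?: BrowserTaskMode;
}

// The tool-level setting wins; otherwise fall back to BROWSER_TASK_MODE.
function resolveBrowserTaskMode(
  tool: ModeConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  return tool.browserTaskMode ?? env.BROWSER_TASK_MODE;
}

// Hypothetical tool pinned to the cloud provider, e.g. for a site behind Cloudflare.
const cloudTool: ModeConfig = { browserTaskMode: BrowserTaskMode.CLOUD };
```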

Automatic Features

When you mark a tool as type: 'rpa', the platform automatically:

  1. Browser Session Management: Creates and manages a browser session

  2. Screenshot Capture: Takes screenshots before and after every Playwright action (click, fill, type, goto, etc.)

  3. Action Logging: Automatically logs all browser actions (e.g., "Navigate to: https://example.com", "Click: .submit-button")

  4. Error Handling: Captures screenshots and HTML on errors for debugging. You don't need to add your own error-handling code (try/catch blocks) for this.

All screenshots and logs are automatically displayed in the UI when the tool executes.
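The platform does this for you, so tool code needs no logging of its own. Purely to illustrate the idea, the sketch below wraps a page in a JavaScript Proxy. Before each goto, click, or fill call, the wrapper records a log line in the format shown above. This is not the platform's implementation: screenshot capture is omitted, and ActionPage is a minimal stand-in for Playwright's Page.

```typescript
// Minimal stand-in for the Playwright Page methods being wrapped.
interface ActionPage {
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
}

// Human-readable labels, matching the log format described above.
const labels = new Map<string, string>([
  ['goto', 'Navigate to'],
  ['click', 'Click'],
  ['fill', 'Fill'],
]);

// Return a page whose goto/click/fill calls are logged before they run.
function withActionLogging<P extends ActionPage>(page: P, log: string[]): P {
  return new Proxy(page, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      const label = typeof prop === 'string' ? labels.get(prop) : undefined;
      // Pass through anything that is not a wrapped action.
      if (label === undefined || typeof value !== 'function') return value;
      return (...args: unknown[]) => {
        log.push(`${label}: ${String(args[0])}`); // first argument is the URL or selector
        return value.apply(target, args);
      };
    },
  });
}
```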

Basic RPA Tool Example

Here's a simple RPA tool that navigates to a website and extracts data:
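This is a minimal sketch, not the original example. The getPageHeading tool, its description, and the h1 selector are hypothetical. PageLike stands in for Playwright's Page. The Zod input schema appears only as a comment so that the sketch runs without dependencies.

```typescript
// Subset of Playwright's Page used by this tool; the real page satisfies it.
interface PageLike {
  goto(url: string): Promise<unknown>;
  waitForSelector(selector: string): Promise<unknown>;
  textContent(selector: string): Promise<string | null>;
}

// Hypothetical tool: open a page and return its main heading.
const getPageHeadingTool = {
  name: 'getPageHeading',
  description: 'Opens a web page and returns the text of its main <h1> heading',
  type: 'rpa' as const,
  // In a real tool, `input` is a Zod schema, e.g.:
  // input: z.object({ url: z.string().describe('URL of the page to read') }),
  execute: async ({ input, page }: { input: { url: string }; page: PageLike }) => {
    // 1. Open the target page
    await page.goto(input.url);
    // 2. Wait until the heading is rendered (preferred over waitForNavigation)
    await page.waitForSelector('h1');
    // 3. Extract the heading text; no try/catch needed, the platform captures failures
    const heading = await page.textContent('h1');
    return { result: heading?.trim() ?? '' };
  },
};
```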

RPA Development Guidelines

Prerequisites

Before creating an RPA tool, gather:

  • The URL of the website you need to automate

  • The task to be performed (e.g., "Fill out a form", "Extract order details")

  • Any required information (e.g., login credentials, form data)

Development Workflow

  1. Plan the automation steps

    • Break down the task into discrete steps (e.g., "Navigate to URL", "Click on X button", "Type Y into Z input field")

    • Identify the selectors you'll need for each element

  2. Test with Playwright MCP

    • Use the Playwright MCP to drive the browser interactively

    • Read the HTML of the page to find correct selectors

    • Test each step before writing the code

    • If a step fails, adjust it and try again, up to 3 times

  3. Write the tool code

    • Use the page object provided in the execute function

    • Add comments to explain what each action does

    • Use appropriate Playwright methods (click, fill, type, goto, etc.)

  4. Test the complete tool

    • Run: npm run tool <toolName> <param1>=<value1> <param2>=<value2> ...

    • Add the cookies=false flag to start without cookies from previous sessions (useful for login tasks)

    • Check screenshots and logs in the UI

    • If the tool fails, check the rpaTestResults folder for error screenshots and HTML
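One way to carry the plan from step 1 into the code of step 3 is to write each discrete step, with its selector, as data and run the steps in order. This is an illustrative pattern, not a platform requirement. StepPage stands in for the Playwright page. The URL and selectors are placeholders; replace them with ones you have verified through the Playwright MCP.

```typescript
// Subset of Playwright's Page used by the runner.
interface StepPage {
  goto(url: string): Promise<unknown>;
  click(selector: string): Promise<void>;
  fill(selector: string, value: string): Promise<void>;
  waitForSelector(selector: string): Promise<unknown>;
}

// One discrete, testable step of the plan, with the selector it targets.
type Step =
  | { action: 'goto'; url: string }
  | { action: 'fill'; selector: string; value: string }
  | { action: 'click'; selector: string }
  | { action: 'waitFor'; selector: string };

// Run the planned steps in order; the first failing step stops the run.
async function runSteps(page: StepPage, steps: Step[]): Promise<void> {
  for (const step of steps) {
    switch (step.action) {
      case 'goto':
        await page.goto(step.url);
        break;
      case 'fill':
        await page.fill(step.selector, step.value);
        break;
      case 'click':
        await page.click(step.selector);
        break;
      case 'waitFor':
        await page.waitForSelector(step.selector);
        break;
    }
  }
}

// Hypothetical plan for "submit a contact form"; selectors come from the page HTML.
const contactFormPlan: Step[] = [
  { action: 'goto', url: 'https://example.com/contact' },
  { action: 'fill', selector: '#name', value: 'Ada Lovelace' },
  { action: 'fill', selector: '#message', value: 'Hello!' },
  { action: 'click', selector: 'button[type="submit"]' },
  { action: 'waitFor', selector: '.success-message' },
];
```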

Available Data Sources

In the context of an RPA tool, you have access to:

  1. Tool input parameters: Data extracted by the LLM from the conversation

  2. Environment variables: process.env for configuration and secrets

  3. State and memory: Current conversation state and agent memory

If you need additional data, ask the user whether you should add it to the input parameters.
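The sketch below shows how the three sources might combine in a login step. The username comes from the tool input, the password from an environment variable, and the portal URL from memory. Several names are hypothetical: LoginInput, LoginState, the PORTAL_PASSWORD variable, and the selectors. env is passed in explicitly (in a real tool you would read process.env) so the sketch stays self-contained.

```typescript
// Subset of Playwright's Page used below.
interface LoginPage {
  goto(url: string): Promise<unknown>;
  fill(selector: string, value: string): Promise<void>;
  click(selector: string): Promise<void>;
  waitForSelector(selector: string): Promise<unknown>;
}

// Illustrative shapes (assumptions), not the platform's actual types.
interface LoginInput { username: string } // 1. extracted by the LLM
type Env = Record<string, string | undefined>; // 2. e.g. process.env
interface LoginState { memory: { portalUrl: string } } // 3. agent memory

async function logIn(input: LoginInput, env: Env, state: LoginState, page: LoginPage): Promise<{ result: string }> {
  // Secrets come from the environment; never hard-code or guess credentials.
  const password = env.PORTAL_PASSWORD;
  if (!password) throw new Error('PORTAL_PASSWORD is not configured');

  await page.goto(state.memory.portalUrl); // URL remembered in agent memory
  await page.fill('#username', input.username); // value from tool input
  await page.fill('#password', password); // value from environment
  await page.click('button[type="submit"]');
  await page.waitForSelector('#dashboard'); // confirm the login succeeded
  return { result: `Logged in as ${input.username}` };
}
```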

General Rules

  • Avoid waitForNavigation (important): it can lead to hangs. Use waitForSelector instead.

  • Test before implementing: Use Playwright MCP to test interactions before writing code

  • Don't close browser sessions: The platform manages browser sessions automatically

  • Never assume credentials: Always ask the user for credentials if they're unavailable

  • Verify changes: When modifying an RPA script, re-run the complete flow to verify it works

  • Follow instructions: Complete only the task specified - don't add extra steps

  • Use Playwright MCP during development: Prefer interactive testing over running the tool directly while building
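The sketch below applies the first and third rules. After a click that triggers a page load, it waits for an element you expect on the resulting page instead of calling waitForNavigation, and it leaves the session open. The selectors and the 15-second timeout are placeholders. SubmitPage stands in for Playwright's Page, whose waitForSelector accepts a timeout option in milliseconds.

```typescript
// Subset of Playwright's Page used below.
interface SubmitPage {
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
  textContent(selector: string): Promise<string | null>;
}

// Submit a form and wait for an element that only exists on the next page,
// instead of calling page.waitForNavigation().
async function submitAndConfirm(page: SubmitPage): Promise<string> {
  await page.click('button[type="submit"]');
  // Resolves once the confirmation is rendered; rejects if the timeout elapses first.
  await page.waitForSelector('.confirmation-number', { timeout: 15_000 });
  const text = await page.textContent('.confirmation-number');
  // No page.close() or browser.close(): the platform owns the browser session.
  return (text ?? '').trim();
}
```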

CAPTCHA Handling

If a website requires a CAPTCHA:

  • During development, use the resolve_captcha MCP tool.

  • In the RPA tool code, use the resolveCaptcha function exported by the @minded-ai/mindedjs package.

  • CAPTCHA resolution does not always succeed, so add a retry mechanism that attempts to resolve the CAPTCHA up to 5 times.
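The retry requirement can be met with a small generic helper like the one sketched below. It makes up to 5 attempts and pauses between them. withRetries and the 1-second delay are illustrative choices, not part of the platform. The helper treats a thrown error as a failed attempt.

```typescript
// Retry an async operation up to `maxAttempts` times, waiting `delayMs` between tries.
// A thrown error counts as a failed attempt; the first successful result is returned.
async function withRetries<T>(
  operation: (attempt: number) => Promise<T>,
  maxAttempts = 5,
  delayMs = 1_000,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Pause before the next attempt (but not after the last one).
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw new Error(`Operation failed after ${maxAttempts} attempts: ${String(lastError)}`);
}
```

In tool code, the callback would call resolveCaptcha from @minded-ai/mindedjs. Its arguments and return value are not described here, so check the package before wiring it in. If it signals failure through its return value rather than by throwing, throw inside the callback so the helper retries.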

Tool Registration

RPA tools are registered the same way as regular tools.

Tool Nodes

RPA tools can be used in tool nodes just like regular tools.

The platform will automatically:

  • Create a browser session

  • Execute the RPA tool with screenshot and log capture

  • Display results in the UI

Testing

Local Testing

Debugging Failed Executions

If a tool fails:

  1. Check the rpaTestResults/ folder for:

    • screenshot.jpeg: Final state screenshot

    • content.html: HTML content at failure point

  2. Review logs in the UI to see which step failed

  3. Check screenshots to see the visual state at each step

Limitations

  • Browser Context: Each RPA tool execution uses a fresh browser context, but cookies from previous sessions are preserved

  • Session Management: Browser sessions are managed automatically - don't manually create or destroy them

  • Concurrent Execution: Multiple RPA tools in the same flow share the same browser session
