RPA Tools
RPA (Robotic Process Automation) tools are special tools that automate browser interactions using Playwright. They provide automatic screenshot capture, action logging, and browser session management.
What are RPA Tools?
RPA tools automate tasks on the web by controlling a browser programmatically. They are ideal for:
Web scraping and data extraction
Form filling and submission
Clicking buttons and navigating websites
Automating repetitive web-based workflows
Interacting with web applications that don't have APIs
RPA Tool Structure
RPA tools must implement the RPATool interface with type: 'rpa':
interface RPATool<Input extends z.ZodSchema, Memory> {
  name: string; // Unique tool identifier
  description: string; // What the tool does (used by LLM)
  input: Input; // Zod schema for input validation
  isGlobal?: boolean; // Optional: available across all LLM calls
  type: 'rpa'; // Required: marks this as an RPA tool
  proxy?: string; // Optional: proxy country code (e.g., "US", "GB", "DE")
  browserTaskMode?: BrowserTaskMode; // Optional: browser provider (local/cloud/onPrem)
  execute: ({ input, state, agent, page }) => Promise<{ result? }>;
}
Execute Function Signature
RPA tools receive an additional page parameter (a Playwright Page object) in their execute function.
Parameters
input: Validated data matching your Zod schema
state: Current conversation state including memory, sessionId, and other context
agent: Agent instance providing access to PII gateway, logging, and other platform features
page: Playwright Page object for browser automation - automatically provided for RPA tools
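As a rough sketch of how these parameters fit together, the snippet below destructures them in an execute function. PageLike and ExecuteArgs are local stand-ins so the snippet runs on its own. In a real tool, page is the Playwright Page supplied by the platform, and the exact shapes of state and agent are assumptions here.

```typescript
// Stand-in for the subset of Playwright's Page used here (an assumption for
// illustration); in a real tool the platform passes a full Playwright Page.
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
}

// Hypothetical argument shape. Only the four parameter names, plus the
// sessionId and memory fields on state, come from the description above.
interface ExecuteArgs<Input, Memory> {
  input: Input; // already validated against the tool's Zod schema
  state: { sessionId: string; memory: Memory };
  agent: unknown; // platform features such as the PII gateway and logging
  page: PageLike; // provided automatically for RPA tools
}

type PageTitleInput = { url: string };
type PageTitleMemory = { lastVisitedUrl?: string };

// Destructure only the parameters the tool needs.
async function execute({ input, state, page }: ExecuteArgs<PageTitleInput, PageTitleMemory>) {
  await page.goto(input.url); // navigate using the validated input
  const title = await page.title(); // read the page title via the page parameter
  return { result: { title, sessionId: state.sessionId } };
}
```

The return value follows the { result } shape from the interface above; what goes inside result is up to the tool.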
Proxy Configuration
RPA tools can optionally specify a proxy country code to route browser traffic through a specific geographic location.
Note: Proxy configuration only applies when using cloud or on-prem browser providers. Local browser sessions ignore proxy settings.
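A minimal sketch of a tool that sets the proxy field might look like this. ProxyAwareTool is a local stand-in for the relevant RPATool fields. The tool name, the effectiveProxy helper, and the mode strings are hypothetical and only illustrate the note about local sessions.

```typescript
// Minimal stand-in for the RPATool fields that matter for proxy routing.
interface ProxyAwareTool {
  name: string;
  type: 'rpa';
  proxy?: string; // country code such as "US", "GB", or "DE"
}

// Hypothetical tool whose browser traffic should be routed through the United Kingdom.
const checkUkPricing: ProxyAwareTool = {
  name: 'checkUkPricing',
  type: 'rpa',
  proxy: 'GB',
};

// Hypothetical helper mirroring the note above: local sessions ignore the proxy.
// The mode strings are assumptions, not the platform's actual enum values.
function effectiveProxy(
  tool: ProxyAwareTool,
  mode: 'local' | 'cloud' | 'onPrem',
): string | undefined {
  return mode === 'local' ? undefined : tool.proxy;
}
```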
Browser Provider Configuration
RPA tools can optionally specify which browser provider to use.
Available options:
BrowserTaskMode.LOCAL: Use a local browser instance (the browser runs on the same machine as the agent). For local development only.
BrowserTaskMode.CLOUD: Use the cloud browser provider (allows proxy configuration and automatically passes Cloudflare and other anti-bot protection).
If not specified, the tool will use the browser provider configured in your environment (BROWSER_TASK_MODE environment variable).
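The fallback described above can be sketched as follows. The BrowserTaskMode object is a local stand-in for the platform's enum with assumed string values. resolveBrowserTaskMode and the tool name are hypothetical, and the platform's actual resolution logic may differ.

```typescript
// Stand-in for the platform's BrowserTaskMode; the string values are assumed.
const BrowserTaskMode = { LOCAL: 'local', CLOUD: 'cloud' } as const;
type BrowserTaskMode = (typeof BrowserTaskMode)[keyof typeof BrowserTaskMode];

// Hypothetical helper: a tool-level browserTaskMode wins; otherwise fall back
// to the BROWSER_TASK_MODE environment variable.
function resolveBrowserTaskMode(
  toolMode: BrowserTaskMode | undefined,
  env: Record<string, string | undefined>, // pass process.env in a real setup
): string | undefined {
  return toolMode ?? env['BROWSER_TASK_MODE'];
}

// A tool pinned to the cloud provider regardless of the environment setting.
const pinnedTool = {
  name: 'submitClaim', // hypothetical tool name
  type: 'rpa' as const,
  browserTaskMode: BrowserTaskMode.CLOUD,
};
```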
Automatic Features
When you mark a tool as type: 'rpa', the platform automatically:
Browser Session Management: Creates and manages a browser session
Screenshot Capture: Takes screenshots before and after every Playwright action (click, fill, type, goto, etc.)
Action Logging: Automatically logs all browser actions (e.g., "Navigate to: https://example.com", "Click: .submit-button")
Error Handling: Captures screenshots and HTML on errors for debugging. You do not need to add try/catch blocks for this.
All screenshots and logs are automatically displayed in the UI when the tool executes.
Basic RPA Tool Example
Here's a simple RPA tool that navigates to a website and extracts data:
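A minimal sketch of such a tool could look like the following. To keep it runnable on its own, PageLike stands in for Playwright's Page and the Zod schema is shown only as a comment. In a real tool you would import z from zod, type the object as RPATool, and receive a real Page from the platform. The tool name, URL input, and the .price selector are hypothetical.

```typescript
// Stand-in for the subset of Playwright's Page used below.
interface PageLike {
  goto(url: string): Promise<unknown>;
  waitForSelector(selector: string): Promise<unknown>;
  textContent(selector: string): Promise<string | null>;
}

interface PriceInput {
  productUrl: string;
}

const getProductPrice = {
  name: 'getProductPrice', // hypothetical tool name
  description: 'Open a product page and return the displayed price',
  // input: z.object({ productUrl: z.string().url() }), // Zod schema in a real tool
  type: 'rpa' as const, // marks this as an RPA tool
  execute: async ({ input, page }: { input: PriceInput; page: PageLike }) => {
    // Navigate to the product page
    await page.goto(input.productUrl);
    // Wait for the price element to appear before reading it
    await page.waitForSelector('.price');
    // Extract the text and strip surrounding whitespace
    const price = await page.textContent('.price');
    return { result: { price: price?.trim() ?? null } };
  },
};
```

There is no try/catch and no browser setup or teardown in the sketch, since the platform handles error capture and session management for RPA tools.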
RPA Development Guidelines
Prerequisites
Before creating an RPA tool, gather:
The URL of the website you need to automate
The task to be performed (e.g., "Fill out a form", "Extract order details")
Any required information (e.g., login credentials, form data)
Development Workflow
Plan the automation steps
Break down the task into discrete steps (e.g., "Navigate to URL", "Click on X button", "Type Y into Z input field")
Identify the selectors you'll need for each element
Test with Playwright MCP
Use the Playwright MCP to interact with the browser interactively
Read the HTML of the page to find correct selectors
Test each step before writing the code
Iterate up to 3 times if a step fails
Write the tool code
Use the page object provided in the execute function
Add comments to explain what each action does
Use appropriate Playwright methods (click, fill, type, goto, etc.)
Test the complete tool
Run: npm run tool <toolName> <param1>=<value1> <param2>=<value2> ...
Add the cookies=false flag to avoid using cookies from previous sessions (useful for login tasks)
Check screenshots and logs in the UI
If the tool fails, check the rpaTestResults folder for error screenshots and HTML
Available Data Sources
In the context of an RPA tool, you have access to:
Tool input parameters: Data extracted by the LLM from the conversation
Environment variables: process.env for configuration and secrets
State and memory: Current conversation state and agent memory
If you need additional data, ask the user whether you should add it to the input parameters.
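As an illustration of combining the three sources, the sketch below gathers values from the tool input, an environment object, and memory. All field and variable names (orderId, PORTAL_URL, customerEmail) are hypothetical. The environment is passed in as a parameter so the snippet runs on its own; a real tool would read process.env.

```typescript
// Hypothetical shapes for illustration only.
interface OrderLookupInput {
  orderId: string; // extracted by the LLM from the conversation
}
interface OrderLookupMemory {
  customerEmail?: string; // stored in agent memory
}

function gatherOrderLookupData(
  input: OrderLookupInput,
  env: Record<string, string | undefined>, // pass process.env in a real tool
  memory: OrderLookupMemory,
) {
  const portalUrl = env['PORTAL_URL'];
  if (!portalUrl) {
    // Fail loudly rather than guessing a value that was never provided.
    throw new Error('PORTAL_URL is not set');
  }
  return {
    orderId: input.orderId,
    portalUrl,
    customerEmail: memory.customerEmail ?? null,
  };
}
```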
General Rules
Avoid waitForNavigation: It can lead to hangs. Use waitForSelector instead.
Test before implementing: Use Playwright MCP to test interactions before writing code
Don't close browser sessions: The platform manages browser sessions automatically
Never assume credentials: Always ask the user for credentials if they're unavailable
Verify changes: When modifying an RPA script, re-run the complete flow to verify it works
Follow instructions: Complete only the task specified - don't add extra steps
Use Playwright MCP during development: Prefer interactive testing over running the tool directly while building
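The waitForSelector rule can be sketched as follows: after an action that triggers a page change, wait for an element that only exists on the destination page. PageLike is a local stand-in for the Playwright Page methods used, and both selectors and the timeout are hypothetical.

```typescript
// Stand-in for the subset of Playwright's Page used here.
interface PageLike {
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Submit a form, then wait for an element that appears once the next page has loaded.
async function submitAndWait(page: PageLike): Promise<void> {
  await page.click('button[type="submit"]');
  // Instead of page.waitForNavigation(), wait for a concrete element to appear.
  await page.waitForSelector('#confirmation-message', { timeout: 15000 });
}
```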
CAPTCHA Handling
If a website requires a CAPTCHA:
Use the resolve_captcha MCP tool in the development phase.
Use the resolveCaptcha function exported by the @minded-ai/mindedjs package in the RPA tool code.
CAPTCHA resolution is not always successful, so add a retry mechanism that attempts to resolve the CAPTCHA up to 5 times.
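One way to sketch the retry mechanism is a small generic helper. withRetries is a hypothetical name, and because the signature of resolveCaptcha is not given here, the helper takes the solving step as a callback rather than calling resolveCaptcha directly.

```typescript
// Hypothetical helper: run an async attempt up to maxAttempts times and
// return the first successful result.
async function withRetries<T>(attempt: () => Promise<T>, maxAttempts = 5): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw new Error(`CAPTCHA not resolved after ${maxAttempts} attempts: ${String(lastError)}`);
}

// In a tool, the callback would wrap the package's resolveCaptcha call, e.g.
// await withRetries(() => resolveCaptcha(/* arguments per the package docs */));
```

The helper assumes a failed resolution surfaces as a thrown error or rejected promise; if resolveCaptcha reports failure through its return value instead, the callback would need to throw in that case.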
Tool Registration
RPA tools are registered the same way as regular tools; see the Tools Documentation for the registration steps.
Tool Nodes
RPA tools can be used in tool nodes just like regular tools.
The platform will automatically:
Create a browser session
Execute the RPA tool with screenshot and log capture
Display results in the UI
Testing
Local Testing
Run the tool from the command line with npm run tool <toolName> <param1>=<value1> <param2>=<value2> ..., adding cookies=false to start without cookies from previous sessions.
Debugging Failed Executions
If a tool fails:
Check the rpaTestResults/ folder for:
screenshot.jpeg: Final state screenshot
content.html: HTML content at failure point
Review logs in the UI to see which step failed
Check screenshots to see the visual state at each step
Limitations
Browser Context: Each RPA tool execution uses a fresh browser context, but cookies from previous sessions are preserved (pass cookies=false when testing to start without them)
Session Management: Browser sessions are managed automatically - don't manually create or destroy them
Concurrent Execution: Multiple RPA tools in the same flow share the same browser session
See Also
Tools Documentation - General tool development guide
Playwright Documentation - Complete Playwright API reference
Node Types - Using tools in flow nodes