RPA Tools

RPA (Robotic Process Automation) tools are tools marked with type: 'rpa' that automate browser interactions using Playwright. For these tools, the platform provides automatic screenshot capture, action logging, and browser session management.

What are RPA Tools?

RPA tools automate tasks on the web by controlling a browser programmatically. They are ideal for:

Web scraping and data extraction
Form filling and submission
Clicking buttons and navigating websites
Automating repetitive web-based workflows
Interacting with web applications that don't have APIs

RPA Tool Structure

RPA tools must implement the RPATool interface with type: 'rpa':

import { z } from 'zod';
import type { Page } from 'playwright';
import { ProxyConfig, BrowserTaskMode } from '@minded-ai/mindedjs';

// Example: RPA tool shape with input and optional output schemas (simplified; state and agent are typed loosely)
interface RPATool<Input extends z.ZodSchema, Memory = any, Output extends z.ZodSchema = z.ZodTypeAny> {
  name: string; // Unique tool identifier
  description: string; // What the tool does (used by the LLM)
  input: Input; // Zod schema for input validation
  output?: Output; // Optional: Zod schema for validating the tool's result
  isGlobal?: boolean; // Optional: available across all LLM calls
  type: 'rpa'; // Required: marks this as an RPA tool
  proxyConfig?: ProxyConfig; // Optional: unified proxy configuration
  browserTaskMode?: BrowserTaskMode; // Optional: browser provider (local/cloud/onPrem)
  persistSession?: boolean; // Optional: persist cookies & localStorage across executions (default: true)
  execute: (args: { input: z.infer<Input>; state: any; agent: any; page: Page }) => Promise<{ result?: unknown }>;
}
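
Because type: 'rpa' acts as a discriminant, code that handles a mix of tools can branch on it. The sketch below uses hypothetical, pared-down local types (ToolLike, RpaToolLike; these are not the @minded-ai/mindedjs types) to show that check. It also shows how the documented persistSession default of true could be applied when the field is omitted.

```typescript
// Hypothetical, pared-down shapes for illustration; not the @minded-ai/mindedjs types.
interface BaseToolLike {
  name: string;
  description: string;
}

interface RegularToolLike extends BaseToolLike {
  type?: undefined;
}

interface RpaToolLike extends BaseToolLike {
  type: 'rpa';
  persistSession?: boolean;
}

type ToolLike = RegularToolLike | RpaToolLike;

// Type guard: narrows a tool to the RPA variant using the 'rpa' discriminant.
function isRpaTool(tool: ToolLike): tool is RpaToolLike {
  return tool.type === 'rpa';
}

// Applies the documented default: sessions persist unless persistSession is explicitly false.
function shouldPersistSession(tool: RpaToolLike): boolean {
  return tool.persistSession ?? true;
}

const tools: ToolLike[] = [
  { name: 'lookup_faq', description: 'Answer FAQ questions' },
  { name: 'rpa_get_order_details', description: 'Extract order details', type: 'rpa' },
  { name: 'rpa_login_fresh', description: 'Log in from a clean session', type: 'rpa', persistSession: false },
];

const rpaTools = tools.filter(isRpaTool);
console.log(rpaTools.map((t) => `${t.name}: persist=${shouldPersistSession(t)}`));
```

A tool that must always start from a clean login would set persistSession: false explicitly rather than rely on the default.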

Execute Function Signature

RPA tools receive an additional page parameter (Playwright Page object) in their execute function:

execute: ({ input, state, agent, page }) => Promise<{ result?: unknown }>;

Parameters

input: Validated data matching your Zod schema
state: Current conversation state including memory, sessionId, and other context
agent: Agent instance providing access to PII gateway, logging, and other platform features
page: Playwright Page object for browser automation, provided automatically to RPA tools

Proxy Configuration

RPA tools can optionally specify proxy configuration using the unified ProxyConfig type. This allows you to use Minded-managed proxies or custom proxy servers:

import { Tool, ProxyProvider, BrowserTaskMode } from '@minded-ai/mindedjs';

// Custom proxy server (schema and Memory are defined as in the Basic RPA Tool Example)
const rpaToolCustom: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa',
  proxyConfig: {
    provider: ProxyProvider.CUSTOM,
    server: 'http://proxy.example.com:8080',
    username: 'user', // Optional
    password: 'pass', // Optional
  },
  browserTaskMode: BrowserTaskMode.CLOUD, // Proxies are ignored by local browser sessions
  // ... execute and the rest of the tool definition
};

Proxy Configuration Options:

Minded Proxy - Trusted IP: Routes traffic through a Minded proxy for IP whitelisting, without region selection
Minded Proxy - Region: Routes traffic through a Minded proxy in a specific country/region (requires countryCode)
Custom Proxy: Uses your own proxy server (requires server; username and password are optional)

Note: Proxy configuration only applies when using cloud or on-prem browser providers. Local browser sessions ignore proxy settings.
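
The three proxy options differ mainly in which fields they require. The minimal sketch below uses a hypothetical local union that mirrors those documented requirements: countryCode for region mode and server for a custom proxy. It uses string literals ('minded', 'trustedIp', 'region', 'custom') as illustrative labels instead of the real ProxyProvider and MindedProxyMode enums, whose member values are not listed here. It also encodes the note that local browser sessions ignore proxy settings.

```typescript
// Hypothetical local mirror of the documented proxy options (not the library's ProxyConfig type).
type ProxyOption =
  | { provider: 'minded'; mode: 'trustedIp' }
  | { provider: 'minded'; mode: 'region'; countryCode?: string }
  | { provider: 'custom'; server?: string; username?: string; password?: string };

type Mode = 'local' | 'cloud' | 'onPrem';

// Returns a list of problems; an empty list means the documented required fields are present.
function validateProxy(proxy: ProxyOption): string[] {
  const problems: string[] = [];
  if (proxy.provider === 'minded' && proxy.mode === 'region' && !proxy.countryCode) {
    problems.push('region mode requires countryCode');
  }
  if (proxy.provider === 'custom' && !proxy.server) {
    problems.push('custom proxy requires server');
  }
  return problems;
}

// Local sessions ignore proxy settings; cloud and on-prem sessions use them.
function effectiveProxy(mode: Mode, proxy?: ProxyOption): ProxyOption | undefined {
  return mode === 'local' ? undefined : proxy;
}

console.log(validateProxy({ provider: 'minded', mode: 'region' }));
```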

Browser Provider Configuration

RPA tools can optionally specify which browser provider to use:

import { BrowserTaskMode } from '@minded-ai/mindedjs';

const rpaTool: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa',
  browserTaskMode: BrowserTaskMode.CLOUD, // Optional: local/cloud/onPrem
  // ... rest of tool definition
};

Available options:

BrowserTaskMode.LOCAL: Uses a local browser instance running on the same machine as the agent. Intended for local development only.
BrowserTaskMode.CLOUD: Uses a cloud browser provider. Supports proxy configuration and automatically handles Cloudflare and other anti-bot protections.

If not specified, the tool will use the browser provider configured in your environment (BROWSER_TASK_MODE environment variable).
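
The fallback rule above can be sketched as a small resolver: a tool-level browserTaskMode wins, and BROWSER_TASK_MODE is used otherwise. This is an illustration, not the platform's code. It assumes the environment variable holds one of the strings local, cloud, or onPrem, mirroring the interface comment. resolveTaskMode is a hypothetical name, and returning undefined when nothing is configured is also an assumption.

```typescript
// Hypothetical resolver illustrating the documented precedence; not platform code.
type TaskMode = 'local' | 'cloud' | 'onPrem';

const KNOWN_MODES: readonly TaskMode[] = ['local', 'cloud', 'onPrem'];

function resolveTaskMode(
  toolMode: TaskMode | undefined,
  env: Record<string, string | undefined>,
): TaskMode | undefined {
  // 1. An explicit setting on the tool wins.
  if (toolMode) return toolMode;
  // 2. Otherwise fall back to BROWSER_TASK_MODE (assumed to use the same spelling).
  const fromEnv = env.BROWSER_TASK_MODE;
  if (fromEnv === undefined) return undefined; // nothing configured
  if ((KNOWN_MODES as readonly string[]).indexOf(fromEnv) === -1) {
    throw new Error(`Unrecognized BROWSER_TASK_MODE: ${fromEnv}`);
  }
  return fromEnv as TaskMode;
}

// In real code you would pass process.env; a plain object keeps the example deterministic.
console.log(resolveTaskMode(undefined, { BROWSER_TASK_MODE: 'cloud' })); // prints cloud
```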

Automatic Features

When you mark a tool as type: 'rpa', the platform automatically:

Browser Session Management: Creates and manages a browser session
Screenshot Capture: Takes screenshots before and after every Playwright action (click, fill, type, goto, etc.)
Action Logging: Automatically logs all browser actions (e.g., "Navigate to: https://example.com", "Click: .submit-button")
Error Handling: Captures screenshots and HTML on errors for debugging, so you don't need to add try/catch blocks for this.

All screenshots and logs are automatically displayed in the UI when the tool executes.
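
The platform does this instrumentation for you, so RPA tool code never needs it. The hypothetical wrapper below only makes the idea concrete. It writes log lines in the documented style (for example "Navigate to: https://example.com" and "Click: .submit-button") around an async action, records a failure line, and re-throws when the action fails. The names instrument, ActionLog, and LABELS are illustrative, and this is not how the platform is actually implemented.

```typescript
// Hypothetical instrumentation sketch; the platform's real implementation is not shown in these docs.
type ActionLog = string[];

// Human-readable labels for a few action kinds, matching the documented log style.
const LABELS: Record<string, string> = {
  goto: 'Navigate to',
  click: 'Click',
  fill: 'Fill',
};

// Runs `action`, logging "<Label>: <target>" before it and an error line if it throws.
async function instrument<T>(
  log: ActionLog,
  kind: string,
  target: string,
  action: () => Promise<T>,
): Promise<T> {
  log.push(`${LABELS[kind] ?? kind}: ${target}`);
  try {
    return await action();
  } catch (err) {
    log.push(`Failed: ${kind} ${target}: ${(err as Error).message}`);
    throw err; // re-throw so the caller still sees the failure
  }
}

async function demo(): Promise<ActionLog> {
  const log: ActionLog = [];
  await instrument(log, 'goto', 'https://example.com', async () => 'ok');
  await instrument(log, 'click', '.submit-button', async () => undefined);
  return log;
}

demo().then((log) => console.log(log));
```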

Basic RPA Tool Example

Here's a simple RPA tool that navigates to a website and extracts data:

import { z } from 'zod';
import { Tool, logger, BrowserTaskMode, ProxyProvider, MindedProxyMode } from '@minded-ai/mindedjs';
import memorySchema from '../schema';

type Memory = z.infer<typeof memorySchema>;

const schema = z.object({
  url: z.string().describe('The website URL to navigate to'),
});

const rpaGetOrderDetailsTool: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa', // Mark as RPA tool
  proxyConfig: {
    provider: ProxyProvider.MINDED,
    mode: MindedProxyMode.REGION,
    countryCode: 'US',
  }, // Optional: use US proxy
  browserTaskMode: BrowserTaskMode.CLOUD, // Optional: use cloud provider
  execute: async ({ input, state, agent, page }) => {
    logger.info({
      message: 'Navigating to website',
      sessionId: state.sessionId,
      url: input.url,
    });

    // Navigate to the website
    // Screenshot and log are automatically captured
    await page.goto(input.url, { timeout: 60000 });

    // Wait for content to load
    await page.waitForSelector('.order-summary', { timeout: 10000 });

    // Extract data from the page
    const orderItems = await page.locator('.order-item').allInnerTexts();
    const totalAmount = await page.locator('.total-amount').innerText();

    logger.info({
      message: 'Extracted order details',
      sessionId: state.sessionId,
      itemCount: orderItems.length,
    });

    // Update state if needed (assumes memorySchema defines lastOrderUrl)
    state.memory.lastOrderUrl = input.url;

    return {
      result: {
        items: orderItems,
        total: totalAmount,
      },
    };
  },
};

export default rpaGetOrderDetailsTool;
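
innerText() and allInnerTexts() return strings, so a value such as the order total usually needs post-processing before it goes into result or memory. The minimal sketch below assumes, hypothetically, that the page shows totals like "$1,234.56". parseAmount and cleanLines are illustrative helpers, not part of the SDK. parseAmount does not handle locales that use a comma as the decimal separator.

```typescript
// Hypothetical helper: turns display text such as "Total: $1,234.56" into a number.
// Assumes "." is the decimal separator and "," groups thousands.
function parseAmount(text: string): number | null {
  const match = text.match(/-?[\d,]*\.?\d+/);
  if (!match) return null;
  const value = Number(match[0].replace(/,/g, ''));
  return Number.isFinite(value) ? value : null;
}

// Trims each extracted line and drops any that are empty after trimming.
function cleanLines(lines: string[]): string[] {
  return lines.map((line) => line.trim()).filter((line) => line.length > 0);
}

console.log(parseAmount('Total: $1,234.56')); // prints 1234.56
console.log(cleanLines(['  Widget x2 ', '', 'Gadget x1']));
```

Inside execute, the helpers would wrap the extracted values, for example total: parseAmount(totalAmount).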

RPA Development Guidelines

Prerequisites

Before creating an RPA tool, gather:

The URL of the website you need to automate
The task to be performed (e.g., "Fill out a form", "Extract order details")
Any required information (e.g., login credentials, form data)

Development Workflow

Plan the automation steps
- Break down the task into discrete steps (e.g., "Navigate to URL", "Click on X button", "Type Y into Z input field")
- Identify the selectors you'll need for each element
Test with Playwright MCP
- Use the Playwright MCP to interact with the browser interactively
- Read the HTML of the page to find correct selectors
- Test each step before writing the code
- Iterate up to 3 times if a step fails
Write the tool code
- Use the page object provided in the execute function
- Add comments to explain what each action does
- Use appropriate Playwright methods (click, fill, type, goto, etc.)
Test the complete tool
- Run: npm run tool <toolName> <param1>=<value1> <param2>=<value2> ...
- Add the cookies=false flag to avoid reusing cookies from previous sessions (useful for login tasks)
- Check screenshots and logs in the UI
- If the tool fails, check the rpaTestResults folder for error screenshots and HTML
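
Planning steps as data makes it easier to answer "which step failed?" later. The sketch below is a hypothetical pattern: the Step type and runSteps are not SDK APIs. Each step has a human-readable description and an async action. The runner stops at the first failure and re-throws with the step's description, so the platform's own error capture still runs. In a real tool each action would call page methods; here plain functions stand in so the example runs without a browser.

```typescript
// Hypothetical step-runner pattern; Step and runSteps are illustrative, not SDK APIs.
interface Step {
  description: string; // e.g. "Click on Submit button"
  run: () => Promise<void>;
}

// Runs steps in order; on failure, re-throws with the failing step's number and description.
async function runSteps(steps: Step[], log: (msg: string) => void = console.log): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    const step = steps[i];
    log(`Step ${i + 1}/${steps.length}: ${step.description}`);
    try {
      await step.run();
    } catch (err) {
      throw new Error(`Step ${i + 1} failed (${step.description}): ${(err as Error).message}`);
    }
  }
}

// Usage with stand-in actions (a real tool would call page.goto, page.click, etc.).
const messages: string[] = [];
runSteps(
  [
    { description: 'Navigate to URL', run: async () => {} },
    { description: 'Click on Submit button', run: async () => { throw new Error('selector not found'); } },
  ],
  (msg) => messages.push(msg),
).catch((err: Error) => console.log(err.message));
```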

Available Data Sources

In the context of an RPA tool, you have access to:

Tool input parameters: Data extracted by the LLM from the conversation
Environment variables: process.env for configuration and secrets
State and memory: Current conversation state and agent memory

If you need additional data, ask the user whether you should add it to the input parameters.
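
Secrets such as login credentials normally come from process.env. A small guard keeps a missing variable from surfacing later as a confusing login failure, in line with the rule of never assuming credentials. requireEnv and the variable names PORTAL_USERNAME and PORTAL_PASSWORD are hypothetical. The environment object is passed in so the sketch can be tested without touching the real process environment.

```typescript
// Hypothetical helper: reads a required variable or fails with a clear message.
function requireEnv(env: Record<string, string | undefined>, key: string): string {
  const value = env[key];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${key}`);
  }
  return value;
}

// Usage (in a tool you would pass process.env instead of fakeEnv).
const fakeEnv = { PORTAL_USERNAME: 'demo-user', PORTAL_PASSWORD: 's3cret' };
const credentials = {
  username: requireEnv(fakeEnv, 'PORTAL_USERNAME'),
  password: requireEnv(fakeEnv, 'PORTAL_PASSWORD'),
};
console.log(credentials.username); // prints demo-user
```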

General Rules

Avoid waitForNavigation (important): it can lead to hangs, and Playwright itself marks it as deprecated. Use waitForSelector instead.
Test before implementing: Use Playwright MCP to test interactions before writing code
Don't close browser sessions: The platform manages browser sessions automatically
Never assume credentials: Always ask the user for credentials if they're unavailable
Verify changes: When modifying an RPA script, re-run the complete flow to verify it works
Follow instructions: Complete only the task specified - don't add extra steps
Use Playwright MCP during development: Prefer interactive testing over running the tool directly while building

CAPTCHA Handling

If a website requires a CAPTCHA:

Use the resolve_captcha MCP tool during development.
Use the resolveCaptcha function exported by the @minded-ai/mindedjs package in the RPA tool code.
CAPTCHA resolution does not always succeed, so add a retry mechanism that attempts to resolve the CAPTCHA up to 5 times.
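
A generic retry loop is enough to meet the "up to 5 attempts" rule. This page does not document the signature or return value of resolveCaptcha, so the sketch below (with the hypothetical helper retry) takes any async attempt function. It treats a thrown error or a false result as a failed attempt. Adapt the success check to whatever resolveCaptcha actually returns.

```typescript
// Hypothetical retry helper; wire your resolveCaptcha call into `attempt`.
async function retry(
  attempt: (attemptNumber: number) => Promise<boolean>,
  maxAttempts = 5,
  delayMs = 1000,
): Promise<void> {
  let lastError: unknown;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      if (await attempt(i)) return; // success
      lastError = new Error(`Attempt ${i} did not succeed`);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
    // Short pause between attempts (skipped after the last one).
    if (i < maxAttempts) await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`CAPTCHA not resolved after ${maxAttempts} attempts: ${String(lastError)}`);
}

// Usage with a stand-in that succeeds on the third try (replace with the real resolveCaptcha call).
let calls = 0;
retry(async () => ++calls >= 3, 5, 10).then(() => console.log(`resolved after ${calls} attempts`));
```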

Tool Registration

RPA tools are registered the same way as regular tools:

// In your tools/index.ts
import rpaGetOrderDetailsTool from './rpaGetOrderDetails';

export default [
  rpaGetOrderDetailsTool,
  // ... other tools
];
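
Since name is the unique tool identifier, a duplicate in the exported array is an easy mistake to make when copy-pasting a tool. A hypothetical startup check (findDuplicateNames, not part of the SDK) can catch it. It relies only on each tool having a name string.

```typescript
// Hypothetical sanity check for a tools array; only `name` is inspected.
function findDuplicateNames(tools: ReadonlyArray<{ name: string }>): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const tool of tools) {
    if (seen.has(tool.name)) duplicates.add(tool.name);
    seen.add(tool.name);
  }
  return Array.from(duplicates);
}

const registered = [
  { name: 'rpa_get_order_details' },
  { name: 'rpa_submit_form' },
  { name: 'rpa_get_order_details' },
];
console.log(findDuplicateNames(registered)); // prints [ 'rpa_get_order_details' ]
```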

Tool Nodes

RPA tools can be used in tool nodes just like regular tools:

nodes:
  - name: 'Extract Order Data'
    type: 'tool'
    toolName: 'rpa_get_order_details'

The platform will automatically:

Create a browser session
Execute the RPA tool with screenshot and log capture
Display results in the UI

Testing

Local Testing

# Test an RPA tool locally
npm run tool rpa_get_order_details url=https://example.com

# Test without cookies (useful for login flows)
npm run tool rpa_get_order_details url=https://example.com cookies=false
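
The command takes the tool name followed by key=value pairs, with cookies=false acting as a switch rather than a tool input. The sketch below shows one way such arguments could be split. It is a hypothetical illustration of the documented syntax, not the source of the npm run tool script. It assumes values may themselves contain "=", as URLs with query strings often do.

```typescript
// Hypothetical parser for "toolName key=value ..." arguments (not the real CLI implementation).
interface ParsedToolArgs {
  toolName: string;
  params: Record<string, string>;
  useCookies: boolean;
}

function parseToolArgs(argv: string[]): ParsedToolArgs {
  const [toolName, ...rest] = argv;
  if (!toolName) throw new Error('Usage: npm run tool <toolName> <param>=<value> ...');
  const params: Record<string, string> = {};
  let useCookies = true;
  for (const arg of rest) {
    const eq = arg.indexOf('=');
    if (eq <= 0) throw new Error(`Expected key=value, got: ${arg}`);
    const key = arg.slice(0, eq);
    const value = arg.slice(eq + 1); // keep any further '=' inside the value
    if (key === 'cookies') useCookies = value !== 'false';
    else params[key] = value;
  }
  return { toolName, params, useCookies };
}

console.log(parseToolArgs(['rpa_get_order_details', 'url=https://example.com', 'cookies=false']));
```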

Debugging Failed Executions

If a tool fails:

Check the rpaTestResults/ folder for:
- screenshot.jpeg: Final state screenshot
- content.html: HTML content at failure point
Review logs in the UI to see which step failed
Check screenshots to see the visual state at each step

Limitations

Browser Context: Each RPA tool execution uses a fresh browser context, but cookies are preserved between executions (see persistSession)
Session Management: Browser sessions are managed automatically - don't manually create or destroy them
Concurrent Execution: Multiple RPA tools in the same flow share the same browser session
