# RPA Tools

RPA (Robotic Process Automation) tools automate browser interactions using Playwright. They provide automatic screenshot capture, action logging, and browser session management.

## What are RPA Tools?

RPA tools automate tasks on the web by controlling a browser programmatically. They are ideal for:

* Web scraping and data extraction
* Form filling and submission
* Clicking buttons and navigating websites
* Automating repetitive web-based workflows
* Interacting with web applications that don't have APIs

## RPA Tool Structure

RPA tools must implement the `RPATool` interface with `type: 'rpa'`:

```ts
// Example: RPA Tool with input and optional output schemas
interface RPATool<Input extends z.ZodSchema, Memory = any, Output extends z.ZodSchema = z.ZodTypeAny> {
  name: string; // Unique tool identifier
  description: string; // What the tool does (used by LLM)
  input: Input; // Zod schema for input validation
  output?: Output; // Optional: Zod schema for output validation (e.g., outputSchema)
  isGlobal?: boolean; // Optional: available across all LLM calls
  type: 'rpa'; // Required: marks this as an RPA tool
  proxyConfig?: ProxyConfig; // Optional: unified proxy configuration
  browserTaskMode?: BrowserTaskMode; // Optional: browser provider (local/cloud/onPrem)
  persistSession?: boolean; // Optional: persist cookies & localStorage across executions (default: true)
  execute: ({ input, state, agent, page }) => Promise<{ result? }>;
}
```

## Execute Function Signature

RPA tools receive an additional `page` parameter (Playwright Page object) in their execute function:

```ts
execute: ({ input, state, agent, page }) => Promise<{ result? }>;
```

### Parameters

* **`input`**: Validated data matching your Zod schema
* **`state`**: Current conversation state including memory, sessionId, and other context
* **`agent`**: Agent instance providing access to PII gateway, logging, and other platform features
* **`page`**: Playwright Page object for browser automation - **automatically provided for RPA tools**

### Proxy Configuration

RPA tools can optionally specify proxy configuration using the unified `ProxyConfig` type. This allows you to use Minded-managed proxies or custom proxy servers:

```ts
import { Tool, BrowserTaskMode, ProxyProvider } from '@minded-ai/mindedjs';

// `schema` and `Memory` are defined as in the basic example further down this page.

// Custom proxy server
const rpaToolCustom: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa',
  proxyConfig: {
    provider: ProxyProvider.CUSTOM,
    server: 'http://proxy.example.com:8080',
    username: 'user', // Optional
    password: 'pass', // Optional
  },
  browserTaskMode: BrowserTaskMode.CLOUD, // Proxy settings are ignored for local browser sessions
  // ... rest of tool definition
};
```

**Proxy Configuration Options:**

* **Minded Proxy - Trusted IP**: Routes through Minded proxy for IP whitelisting without region selection
* **Minded Proxy - Region**: Routes through Minded proxy with a specific country/region (requires `countryCode`)
* **Custom Proxy**: Use your own proxy server (requires `server`, optional `username` and `password`)

**Note**: Proxy configuration only applies when using cloud or on-prem browser providers. Local browser sessions ignore proxy settings.
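
The three options above can be modelled as a discriminated union. The sketch below is illustrative only: it uses local string-literal types instead of the package's `ProxyConfig`, `ProxyProvider`, and `MindedProxyMode` exports, and the `'TRUSTED_IP'` mode name is an assumption, since only `MindedProxyMode.REGION` appears on this page. `validateProxyConfig` is a hypothetical helper, not part of the SDK.

```ts
// Local stand-ins for the package types; names and literal values are assumptions.
type MindedProxy =
  | { provider: 'MINDED'; mode: 'TRUSTED_IP' } // assumed mode name
  | { provider: 'MINDED'; mode: 'REGION'; countryCode: string };

type CustomProxy = {
  provider: 'CUSTOM';
  server: string;
  username?: string;
  password?: string;
};

type ProxyConfigSketch = MindedProxy | CustomProxy;

// Returns a list of problems; an empty list means the required fields are present.
function validateProxyConfig(config: ProxyConfigSketch): string[] {
  const problems: string[] = [];
  if (config.provider === 'CUSTOM') {
    if (config.server.trim() === '') {
      problems.push('Custom proxy requires a non-empty `server`');
    }
  } else if (config.mode === 'REGION') {
    if (config.countryCode.trim() === '') {
      problems.push('Region mode requires a non-empty `countryCode`');
    }
  }
  return problems;
}
```

Modelling the options this way lets the compiler reject, for example, a region proxy without a `countryCode`, while the runtime check catches empty strings.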

### Browser Provider Configuration

RPA tools can optionally specify which browser provider to use:

```ts
import { BrowserTaskMode } from '@minded-ai/mindedjs';

const rpaTool: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa',
  browserTaskMode: BrowserTaskMode.CLOUD, // Optional: local/cloud/onPrem
  // ... rest of tool definition
};
```

Available options:

* `BrowserTaskMode.LOCAL`: Uses a local browser instance running on the same machine as the agent. Intended for local development only.
* `BrowserTaskMode.CLOUD`: Uses the cloud browser provider. Supports proxy configuration and automatically passes Cloudflare and other anti-bot protections.

If not specified, the tool will use the browser provider configured in your environment (`BROWSER_TASK_MODE` environment variable).
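
The fallback rule can be pictured as a small lookup: the tool-level setting wins, otherwise the environment variable is used. This is a sketch under assumptions, not the platform's code: the `BrowserMode` string values are guessed from the `local/cloud/onPrem` comment above, and `resolveBrowserMode` is a hypothetical helper.

```ts
// Local stand-in for BrowserTaskMode; the string values are assumptions.
type BrowserMode = 'local' | 'cloud' | 'onPrem';

const KNOWN_MODES: readonly BrowserMode[] = ['local', 'cloud', 'onPrem'];

// The tool-level setting takes precedence; otherwise fall back to BROWSER_TASK_MODE.
// Returns undefined when neither source provides a recognised mode.
function resolveBrowserMode(
  toolMode: BrowserMode | undefined,
  env: Record<string, string | undefined>,
): BrowserMode | undefined {
  if (toolMode !== undefined) return toolMode;
  const fromEnv = env['BROWSER_TASK_MODE'];
  return KNOWN_MODES.find((mode) => mode === fromEnv);
}
```

Passing the environment in as a parameter (for example `process.env` in Node) keeps the function easy to test.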

### Automatic Features

When you mark a tool as `type: 'rpa'`, the platform automatically:

1. **Browser Session Management**: Creates and manages a browser session
2. **Screenshot Capture**: Takes screenshots before and after every Playwright action (click, fill, type, goto, etc.)
3. **Action Logging**: Logs every browser action (e.g., "Navigate to: https://example.com", "Click: .submit-button")
4. **Error Handling**: Captures screenshots and HTML on errors for debugging. You do not need to add try/catch blocks to handle errors.

All screenshots and logs are automatically displayed in the UI when the tool executes.
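
To build intuition for why no try/catch is needed in tool code, the sketch below shows one way a before/after capture could be wrapped around a single action. It is purely conceptual: the platform's actual mechanism is not described on this page, and `withCapture`, `ActionLog`, and the `capture` hook are hypothetical names.

```ts
type ActionLog = { label: string; phase: 'before' | 'after' | 'error' };

// Runs `capture` before and after an action, and once more if the action throws.
async function withCapture<T>(
  label: string,
  action: () => Promise<T>,
  capture: (entry: ActionLog) => Promise<void>,
): Promise<T> {
  await capture({ label, phase: 'before' });
  try {
    const result = await action();
    await capture({ label, phase: 'after' });
    return result;
  } catch (error) {
    await capture({ label, phase: 'error' });
    throw error; // rethrow so the caller still sees the failure
  }
}
```

Because the wrapper records the failure and rethrows, the error still surfaces to whoever runs the tool, without the tool author writing any handling code.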

## Basic RPA Tool Example

Here's a simple RPA tool that navigates to a website and extracts data:

```ts
import { z } from 'zod';
import { Tool, logger, BrowserTaskMode, ProxyProvider, MindedProxyMode } from '@minded-ai/mindedjs';
import memorySchema from '../schema';

type Memory = z.infer<typeof memorySchema>;

const schema = z.object({
  url: z.string().describe('The website URL to navigate to'),
});

const rpaGetOrderDetailsTool: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa', // Mark as RPA tool
  proxyConfig: {
    provider: ProxyProvider.MINDED,
    mode: MindedProxyMode.REGION,
    countryCode: 'US',
  }, // Optional: use US proxy
  browserTaskMode: BrowserTaskMode.CLOUD, // Optional: use cloud provider
  execute: async ({ input, state, agent, page }) => {
    logger.info({
      message: 'Navigating to website',
      sessionId: state.sessionId,
      url: input.url,
    });

    // Navigate to the website
    // Screenshot and log are automatically captured
    await page.goto(input.url, { timeout: 60000 });

    // Wait for content to load
    await page.waitForSelector('.order-summary', { timeout: 10000 });

    // Extract data from the page
    const orderItems = await page.locator('.order-item').allInnerTexts();
    const totalAmount = await page.locator('.total-amount').innerText();

    logger.info({
      message: 'Extracted order details',
      sessionId: state.sessionId,
      itemCount: orderItems.length,
    });

    // Update state if needed
    state.memory.lastOrderUrl = input.url;

    return {
      result: {
        items: orderItems,
        total: totalAmount,
      },
    };
  },
};

export default rpaGetOrderDetailsTool;
```

## RPA Development Guidelines

### Prerequisites

Before creating an RPA tool, gather:

* **The URL of the website** you need to automate
* **The task to be performed** (e.g., "Fill out a form", "Extract order details")
* **Any required information** (e.g., login credentials, form data)

### Development Workflow

1. **Plan the automation steps**
   * Break down the task into discrete steps (e.g., "Navigate to URL", "Click on X button", "Type Y into Z input field")
   * Identify the selectors you'll need for each element
2. **Test with Playwright MCP**
   * Use the Playwright MCP to interact with the browser interactively
   * Read the HTML of the page to find correct selectors
   * Test each step before writing the code
   * Iterate up to 3 times if a step fails
3. **Write the tool code**
   * Use the `page` object provided in the execute function
   * Add comments to explain what each action does
   * Use appropriate Playwright methods (click, fill, type, goto, etc.)
4. **Test the complete tool**
   * Run: `npm run tool <toolName> <param1>=<value1> <param2>=<value2> ...`
   * Add the `cookies=false` flag to avoid reusing cookies from previous sessions (useful for login tasks)
   * Check screenshots and logs in the UI
   * If the tool fails, check the `rpaTestResults` folder for error screenshots and HTML

### Available Data Sources

In the context of an RPA tool, you have access to:

1. **Tool input parameters**: Data extracted by the LLM from the conversation
2. **Environment variables**: `process.env` for configuration and secrets
3. **State and memory**: Current conversation state and agent memory

If you need additional data, ask the user whether you should add it to the input parameters.
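
When a value comes from environment variables, it helps to fail loudly if it is missing instead of continuing with a guessed default. The helper below is a minimal sketch; `requireEnv` is a hypothetical name, and the environment is passed in as a plain object so the function can be exercised without real secrets.

```ts
// Hypothetical helper: returns a required environment value or throws if it is absent.
function requireEnv(name: string, env: Record<string, string | undefined>): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Inside a tool this might be called as `requireEnv('PORTAL_PASSWORD', process.env)`, where `PORTAL_PASSWORD` is a made-up variable name used only for illustration.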

### General Rules

* **Avoid `waitForNavigation`**: It can cause the tool to hang. Use `waitForSelector` instead.
* **Test before implementing**: Use Playwright MCP to test interactions before writing code
* **Don't close browser sessions**: The platform manages browser sessions automatically
* **Never assume credentials**: Always ask the user for credentials if they're unavailable
* **Verify changes**: When modifying an RPA script, re-run the complete flow to verify it works
* **Follow instructions**: Complete only the task specified - don't add extra steps
* **Use Playwright MCP during development**: Prefer interactive testing over running the tool directly while building
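
The first rule can be applied with a small pattern: after a click that triggers navigation, wait for an element that only exists on the destination page. The sketch below declares a minimal `PageLike` interface (an assumption made so the snippet compiles without Playwright installed); the `page` object passed to `execute` is expected to satisfy it, since Playwright's `Page` has both methods. `clickAndWaitFor` is a hypothetical helper.

```ts
// Minimal structural type covering the two Playwright Page methods used here.
interface PageLike {
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Clicks an element, then waits for a selector that marks the destination page
// as ready, instead of calling waitForNavigation.
async function clickAndWaitFor(
  page: PageLike,
  clickSelector: string,
  readySelector: string,
  timeoutMs = 10000,
): Promise<void> {
  await page.click(clickSelector);
  await page.waitForSelector(readySelector, { timeout: timeoutMs });
}
```

For example, `await clickAndWaitFor(page, '.submit-button', '.order-summary')` submits a form and continues once the order summary is present.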

### CAPTCHA Handling

**If a website requires a CAPTCHA:**

* Use the `resolve_captcha` MCP tool during development.
* Use the `resolveCaptcha` function exported by the `@minded-ai/mindedjs` package in the RPA tool code.
* CAPTCHA resolution does not always succeed. Add a retry mechanism that attempts to resolve the CAPTCHA up to 5 times.
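
A retry loop for this could look like the sketch below. The signature and return value of `resolveCaptcha` are not documented on this page, so the helper takes a generic `solve` callback instead; wrapping the real call so that it returns `true` on success is left to the tool author. `retryUntilSolved` is a hypothetical name.

```ts
// Retries an async attempt until it reports success or the attempts run out.
// `solve` is a placeholder for whatever performs one CAPTCHA attempt.
async function retryUntilSolved(
  solve: () => Promise<boolean>,
  maxAttempts = 5,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await solve()) return attempt; // number of attempts it took
    } catch {
      // Treat a thrown error like a failed attempt and try again.
    }
  }
  throw new Error(`CAPTCHA not solved after ${maxAttempts} attempts`);
}
```

Throwing after the last attempt lets the failure surface through the platform's automatic error capture rather than being silently ignored.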

## Tool Registration

RPA tools are registered the same way as regular tools:

```ts
// In your tools/index.ts
import rpaGetOrderDetailsTool from './rpaGetOrderDetails';

export default [
  rpaGetOrderDetailsTool,
  // ... other tools
];
```

## Tool Nodes

RPA tools can be used in tool nodes just like regular tools:

```yaml
nodes:
  - name: 'Extract Order Data'
    type: 'tool'
    toolName: 'rpa_get_order_details'
```

The platform will automatically:

* Create a browser session
* Execute the RPA tool with screenshot and log capture
* Display results in the UI

## Testing

### Local Testing

```bash
# Test an RPA tool locally
npm run tool rpa_get_order_details url=https://example.com

# Test without cookies (useful for login flows)
npm run tool rpa_get_order_details url=https://example.com cookies=false
```

### Debugging Failed Executions

If a tool fails:

1. Check the `rpaTestResults/` folder for:
   * `screenshot.jpeg`: Final state screenshot
   * `content.html`: HTML content at failure point
2. Review logs in the UI to see which step failed
3. Check screenshots to see the visual state at each step

## Limitations

* **Browser Context**: Each RPA tool execution uses a fresh browser context. Cookies and localStorage are carried over between executions when `persistSession` is enabled (the default).
* **Session Management**: Browser sessions are managed automatically - don't manually create or destroy them
* **Concurrent Execution**: Multiple RPA tools in the same flow share the same browser session

## See Also

* [Tools Documentation](/low-code-editor/tools.md) - General tool development guide
* [Playwright Documentation](https://playwright.dev) - Complete Playwright API reference
* [Node Types](/low-code-editor/nodes.md) - Using tools in flow nodes


