# RPA Tools

RPA (Robotic Process Automation) tools automate browser interactions using Playwright. They provide automatic screenshot capture, action logging, and browser session management.

## What are RPA Tools?

RPA tools automate tasks on the web by controlling a browser programmatically. They are ideal for:

* Web scraping and data extraction
* Form filling and submission
* Clicking buttons and navigating websites
* Automating repetitive web-based workflows
* Interacting with web applications that don't have APIs

## RPA Tool Structure

RPA tools must implement the `RPATool` interface with `type: 'rpa'`:

```ts
// Example: RPA Tool with input and optional output schemas
interface RPATool<Input extends z.ZodSchema, Memory = any, Output extends z.ZodSchema = z.ZodTypeAny> {
  name: string; // Unique tool identifier
  description: string; // What the tool does (used by LLM)
  input: Input; // Zod schema for input validation
  output?: Output; // Optional: Zod schema for output validation (e.g., outputSchema)
  isGlobal?: boolean; // Optional: available across all LLM calls
  type: 'rpa'; // Required: marks this as an RPA tool
  proxyConfig?: ProxyConfig; // Optional: unified proxy configuration
  browserTaskMode?: BrowserTaskMode; // Optional: browser provider (local/cloud/onPrem)
  persistSession?: boolean; // Optional: persist cookies & localStorage across executions (default: true)
  execute: ({ input, state, agent, page }) => Promise<{ result? }>;
}
```

## Execute Function Signature

RPA tools receive an additional `page` parameter (Playwright Page object) in their execute function:

```ts
execute: ({ input, state, agent, page }) => Promise<{ result? }>;
```

### Parameters

* **`input`**: Validated data matching your Zod schema
* **`state`**: Current conversation state including memory, sessionId, and other context
* **`agent`**: Agent instance providing access to PII gateway, logging, and other platform features
* **`page`**: Playwright Page object for browser automation - **automatically provided for RPA tools**

### Proxy Configuration

RPA tools can optionally specify proxy configuration using the unified `ProxyConfig` type. This allows you to use Minded-managed proxies or custom proxy servers:

```ts
import { Tool, BrowserTaskMode, ProxyProvider } from '@minded-ai/mindedjs';

// Custom proxy server
const rpaToolCustom: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa',
  proxyConfig: {
    provider: ProxyProvider.CUSTOM,
    server: 'http://proxy.example.com:8080',
    username: 'user', // Optional
    password: 'pass', // Optional
  },
  browserTaskMode: BrowserTaskMode.CLOUD, // Proxy settings apply to cloud or on-prem browsers only
  // ... rest of tool definition
};
```

**Proxy Configuration Options:**

* **Minded Proxy - Trusted IP**: Routes through Minded proxy for IP whitelisting without region selection
* **Minded Proxy - Region**: Routes through Minded proxy with a specific country/region (requires `countryCode`)
* **Custom Proxy**: Use your own proxy server (requires `server`, optional `username` and `password`)

**Note**: Proxy configuration only applies when using cloud or on-prem browser providers. Local browser sessions ignore proxy settings.
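
The required fields for each option can be checked before a tool is registered. The following is a minimal sketch that uses local stand-in types rather than the package's real `ProxyConfig`. The `kind` values (`'trustedIp'`, `'region'`, `'custom'`) and the helper `missingProxyFields` are hypothetical names chosen for illustration, not part of `@minded-ai/mindedjs`.

```ts
// Local stand-ins for illustration only; the real types come from
// '@minded-ai/mindedjs' and may be shaped differently.
type SketchProxyConfig =
  | { kind: 'trustedIp' }
  | { kind: 'region'; countryCode?: string }
  | { kind: 'custom'; server?: string; username?: string; password?: string };

// Returns the names of required fields that are missing for the chosen option.
function missingProxyFields(config: SketchProxyConfig): string[] {
  switch (config.kind) {
    case 'trustedIp':
      return []; // no extra fields required
    case 'region':
      return config.countryCode ? [] : ['countryCode'];
    case 'custom':
      return config.server ? [] : ['server']; // username and password are optional
  }
}
```

A check like this could run in a unit test over your tool definitions so that a missing `countryCode` or `server` is caught before the tool reaches a browser session.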

### Browser Provider Configuration

RPA tools can optionally specify which browser provider to use:

```ts
import { BrowserTaskMode } from '@minded-ai/mindedjs';

const rpaTool: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa',
  browserTaskMode: BrowserTaskMode.CLOUD, // Optional: local/cloud/onPrem
  // ... rest of tool definition
};
```

Available options:

* `BrowserTaskMode.LOCAL`: Use a local browser instance that runs on the same machine as the agent. Intended for local development only.
* `BrowserTaskMode.CLOUD`: Use the cloud browser provider. Supports proxy configuration and automatically passes Cloudflare and other anti-bot checks.

If not specified, the tool will use the browser provider configured in your environment (`BROWSER_TASK_MODE` environment variable).
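
The precedence described above (tool-level setting first, environment variable second) can be modeled as a small function. This is a sketch under assumptions: the string values `'local'`, `'cloud'`, and `'onPrem'` are taken from the interface comment and may not match the real `BrowserTaskMode` enum values, and `resolveMode` is a hypothetical helper, not a platform API. What the platform does when neither value is set is not covered here.

```ts
// Hypothetical string values; the real BrowserTaskMode enum may use different ones.
type SketchMode = 'local' | 'cloud' | 'onPrem';

const KNOWN_MODES: readonly SketchMode[] = ['local', 'cloud', 'onPrem'];

// The tool-level setting wins; otherwise fall back to the BROWSER_TASK_MODE value.
// Returns undefined when neither source provides a recognized mode.
function resolveMode(
  toolMode: SketchMode | undefined,
  env: Record<string, string | undefined>,
): SketchMode | undefined {
  if (toolMode !== undefined) return toolMode;
  const fromEnv = env['BROWSER_TASK_MODE'];
  return KNOWN_MODES.find((mode) => mode === fromEnv);
}
```

In a Node.js process you would pass `process.env` as the second argument.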

### Automatic Features

When you mark a tool as `type: 'rpa'`, the platform automatically:

1. **Browser Session Management**: Creates and manages a browser session
2. **Screenshot Capture**: Takes screenshots before and after every Playwright action (click, fill, type, goto, etc.)
3. **Action Logging**: Automatically logs all browser actions (e.g., "Navigate to: https://example.com", "Click: .submit-button")
4. **Error Handling**: Captures a screenshot and the page HTML on errors for debugging. You don't need to wrap browser actions in try/catch blocks for this.

All screenshots and logs are automatically displayed in the UI when the tool executes.

## Basic RPA Tool Example

Here's a simple RPA tool that navigates to a website and extracts data:

```ts
import { z } from 'zod';
import { Tool, logger, BrowserTaskMode, ProxyProvider, MindedProxyMode } from '@minded-ai/mindedjs';
import memorySchema from '../schema';

type Memory = z.infer<typeof memorySchema>;

const schema = z.object({
  url: z.string().describe('The website URL to navigate to'),
});

const rpaGetOrderDetailsTool: Tool<typeof schema, Memory> = {
  name: 'rpa_get_order_details',
  description: 'Navigate to a website and extract order details',
  input: schema,
  type: 'rpa', // Mark as RPA tool
  proxyConfig: {
    provider: ProxyProvider.MINDED,
    mode: MindedProxyMode.REGION,
    countryCode: 'US',
  }, // Optional: use US proxy
  browserTaskMode: BrowserTaskMode.CLOUD, // Optional: use cloud provider
  execute: async ({ input, state, agent, page }) => {
    logger.info({
      message: 'Navigating to website',
      sessionId: state.sessionId,
      url: input.url,
    });

    // Navigate to the website
    // Screenshot and log are automatically captured
    await page.goto(input.url, { timeout: 60000 });

    // Wait for content to load
    await page.waitForSelector('.order-summary', { timeout: 10000 });

    // Extract data from the page
    const orderItems = await page.locator('.order-item').allInnerTexts();
    const totalAmount = await page.locator('.total-amount').innerText();

    logger.info({
      message: 'Extracted order details',
      sessionId: state.sessionId,
      itemCount: orderItems.length,
    });

    // Update state if needed
    state.memory.lastOrderUrl = input.url;

    return {
      result: {
        items: orderItems,
        total: totalAmount,
      },
    };
  },
};

export default rpaGetOrderDetailsTool;
```

## RPA Development Guidelines

### Prerequisites

Before creating an RPA tool, gather:

* **The URL of the website** you need to automate
* **The task to be performed** (e.g., "Fill out a form", "Extract order details")
* **Any required information** (e.g., login credentials, form data)

### Development Workflow

1. **Plan the automation steps**
   * Break down the task into discrete steps (e.g., "Navigate to URL", "Click on X button", "Type Y into Z input field")
   * Identify the selectors you'll need for each element
2. **Test with Playwright MCP**
   * Use the Playwright MCP to interact with the browser interactively
   * Read the HTML of the page to find correct selectors
   * Test each step before writing the code
   * Iterate up to 3 times if a step fails
3. **Write the tool code**
   * Use the `page` object provided in the execute function
   * Add comments to explain what each action does
   * Use appropriate Playwright methods (click, fill, type, goto, etc.)
4. **Test the complete tool**
   * Run: `npm run tool <toolName> <param1>=<value1> <param2>=<value2> ...`
   * Add the `cookies=false` flag to avoid reusing cookies from previous sessions (useful for login tasks)
   * Check screenshots and logs in the UI
   * If the tool fails, check the `rpaTestResults` folder for error screenshots and HTML

### Available Data Sources

In the context of an RPA tool, you have access to:

1. **Tool input parameters**: Data extracted by the LLM from the conversation
2. **Environment variables**: `process.env` for configuration and secrets
3. **State and memory**: Current conversation state and agent memory

If you need additional data, ask the user whether you should add it to the input parameters.
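
For secrets read from environment variables, it helps to fail early with a clear message instead of typing `undefined` into a form. A minimal sketch follows; `requireEnv` and the variable name `PORTAL_PASSWORD` are hypothetical and used only for illustration.

```ts
// Reads a required value from an env-like object (pass process.env in a real tool).
// Throws a descriptive error instead of returning undefined or an empty string.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Inside an execute function this might look like:
//   const password = requireEnv(process.env, 'PORTAL_PASSWORD'); // hypothetical variable name
//   await page.fill('#password', password);
```

Because the platform captures errors automatically, a thrown error here should surface in the UI logs without extra handling.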

### General Rules

* **Avoid `waitForNavigation`**: It can cause the tool to hang. Use `waitForSelector` instead.
* **Test before implementing**: Use Playwright MCP to test interactions before writing code
* **Don't close browser sessions**: The platform manages browser sessions automatically
* **Never assume credentials**: Always ask the user for credentials if they're unavailable
* **Verify changes**: When modifying an RPA script, re-run the complete flow to verify it works
* **Follow instructions**: Complete only the task specified - don't add extra steps
* **Use Playwright MCP during development**: Prefer interactive testing over running the tool directly while building
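
The first rule can be illustrated with a small helper that clicks a control and then waits for an element that only exists on the destination page. This is a sketch: `clickAndWaitFor` and `PageLike` are hypothetical names, and `PageLike` is a minimal structural subset of Playwright's `Page` that the `page` object passed to `execute` should satisfy.

```ts
// Minimal structural subset of Playwright's Page used by this sketch.
interface PageLike {
  click(selector: string): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

// Clicks a control that triggers a navigation, then waits for an element that
// only appears on the destination page instead of calling waitForNavigation.
async function clickAndWaitFor(
  page: PageLike,
  clickSelector: string,
  readySelector: string,
  timeoutMs = 10000,
): Promise<void> {
  await page.click(clickSelector);
  await page.waitForSelector(readySelector, { timeout: timeoutMs });
}
```

Inside a tool this might be called as `await clickAndWaitFor(page, '#login-button', '.dashboard')`, where both selectors are placeholders for the ones you found while testing with Playwright MCP.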

### CAPTCHA Handling

**If a website requires a CAPTCHA:**

* Use the `resolve_captcha` MCP tool in the development phase.
* Use the `resolveCaptcha` function exported by the `@minded-ai/mindedjs` package in the RPA tool code.
* CAPTCHA resolution is not always successful. Add a retry mechanism that attempts to resolve the CAPTCHA up to 5 times.
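
One way to structure the retry is a generic helper that is independent of how the CAPTCHA is solved. The sketch below assumes you wrap the `resolveCaptcha` call in a function that resolves to `true` on success. The signature and return value of `resolveCaptcha` are not documented on this page, so the `attempt` callback and the `retryUntilSuccess` helper are hypothetical.

```ts
// Retries an async attempt until it reports success or the attempts run out.
// `attempt` is a placeholder for whatever wraps the package's resolveCaptcha call.
// Returns the number of the attempt that succeeded.
async function retryUntilSuccess(
  attempt: () => Promise<boolean>,
  maxAttempts = 5,
): Promise<number> {
  for (let tryNumber = 1; tryNumber <= maxAttempts; tryNumber++) {
    try {
      if (await attempt()) return tryNumber; // solved on this try
    } catch {
      // Treat a thrown error like a failed attempt and try again.
    }
  }
  throw new Error(`CAPTCHA not resolved after ${maxAttempts} attempts`);
}
```

The final error is thrown deliberately after the last attempt so that the platform's automatic error capture records a screenshot and the page HTML for debugging.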

## Tool Registration

RPA tools are registered the same way as regular tools:

```ts
// In your tools/index.ts
import rpaGetOrderDetailsTool from './rpaGetOrderDetails';

export default [
  rpaGetOrderDetailsTool,
  // ... other tools
];
```

## Tool Nodes

RPA tools can be used in tool nodes just like regular tools:

```yaml
nodes:
  - name: 'Extract Order Data'
    type: 'tool'
    toolName: 'rpa_get_order_details'
```

The platform will automatically:

* Create a browser session
* Execute the RPA tool with screenshot and log capture
* Display results in the UI

## Testing

### Local Testing

```bash
# Test an RPA tool locally
npm run tool rpa_get_order_details url=https://example.com

# Test without cookies (useful for login flows)
npm run tool rpa_get_order_details url=https://example.com cookies=false
```

### Debugging Failed Executions

If a tool fails:

1. Check the `rpaTestResults/` folder for:
   * `screenshot.jpeg`: Final state screenshot
   * `content.html`: HTML content at failure point
2. Review logs in the UI to see which step failed
3. Check screenshots to see the visual state at each step

## Limitations

* **Browser Context**: Each RPA tool execution uses a fresh browser context. Cookies and localStorage are carried over between executions when `persistSession` is enabled (the default).
* **Session Management**: Browser sessions are managed automatically - don't manually create or destroy them
* **Concurrent Execution**: Multiple RPA tools in the same flow share the same browser session

## See Also

* [Tools Documentation](https://docs.minded.com/low-code-editor/tools) - General tool development guide
* [Playwright Documentation](https://playwright.dev) - Complete Playwright API reference
* [Node Types](https://docs.minded.com/low-code-editor/nodes) - Using tools in flow nodes
