Operator

The Operator lets an agent run autonomous web actions in a live Chromium browser, guided by an LLM of your choice. It supports structured inputs and outputs, live view previews, optional country-based proxying, and an on-prem mode.

What it does

  • Plans + Executes: The node first asks the LLM to propose parameters (based on your input schema) and compiles your prompt. It then starts a browser session and runs the task to completion.

  • Live session: Shows a live view preview of the browser session.

  • Structured I/O: Define inputSchema to guide the LLM’s tool-call arguments and outputSchema to hint desired output structure.

  • Tool access: Exposes agent tools that allow execution requests to the browser agent.

Configuration fields (node)

  • prompt (required): Natural-language instructions for the browser agent. Supports variables from the state object.

  • model (optional): Preferred LLM (UI defaults to gpt-4o).

  • inputSchema (optional): Array of fields { name, type: 'string'|'number', description?, required? }.

  • outputSchema (optional): Array of fields { name, type: 'string'|'number', description?, required? }. Sent to the browser agent to shape outputs.

  • proxy (optional): Two‑letter country code (e.g., IL, US). Disabled when running on‑prem.

  • hooks (optional): Array of { name }. Passed to the browser agent.

  • onPrem (optional): If true, uses on‑premises browser infrastructure and auto‑captures screenshots.
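The field shapes above can be written down as TypeScript types. This is only an illustrative sketch: the type names (SchemaField, OperatorNodeConfig) and the validateConfig helper are hypothetical and not part of the platform SDK. Only the field names and allowed values come from the list above.

```typescript
// Hypothetical types mirroring the documented node fields; not an official SDK API.
type SchemaField = {
  name: string;
  type: "string" | "number";
  description?: string;
  required?: boolean;
};

type OperatorNodeConfig = {
  prompt: string;
  model?: string;
  inputSchema?: SchemaField[];
  outputSchema?: SchemaField[];
  proxy?: string;
  hooks?: { name: string }[];
  onPrem?: boolean;
};

// Illustrative check of the documented constraints; returns human-readable problems.
function validateConfig(config: OperatorNodeConfig): string[] {
  const problems: string[] = [];
  if (config.prompt.trim() === "") {
    problems.push("prompt is required");
  }
  if (config.proxy && !/^[A-Z]{2}$/.test(config.proxy)) {
    problems.push("proxy must be a two-letter country code");
  }
  if (config.proxy && config.onPrem) {
    problems.push("proxy is disabled when onPrem is true");
  }
  return problems;
}

// Example node configuration with populated input and output schemas.
const exampleConfig: OperatorNodeConfig = {
  prompt: "Search for the product and return its price.",
  model: "gpt-4o",
  inputSchema: [{ name: "productName", type: "string", required: true }],
  outputSchema: [{ name: "price", type: "number", description: "Listed price" }],
  proxy: "US",
  onPrem: false,
};
```

Calling validateConfig(exampleConfig) returns an empty array, while a config that sets both proxy and onPrem: true is reported as a conflict.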

On‑prem mode

By default, the browser agent runs in the cloud. Customers who want the browser to run on their own infrastructure and network can use on-prem mode, which is supported out of the box by running the on-prem kit on a PC or cloud VM.

For help with on-prem setup, contact us at [email protected].

How it works

The on-prem kit acts as a CDP (Chrome DevTools Protocol) server: it launches Chromium locally and exposes the protocol over a WebSocket to the Minded backend service. The browser agent therefore still runs in the cloud, while the browser itself runs locally inside the customer's network, so it can access internet services without whitelisting.

Installation

  1. Enable on-prem mode in the node configuration (onPrem: true)

  2. Run the Docker image, passing your agent ID as an environment variable (production):

docker run --name minded-browser-onprem --pull always -e AGENT_ID=[Agent ID] public.ecr.aws/o9t8x5z4/browser-use-onprem-kit:latest

Proxy

The browser agent can route its traffic through a proxy server in a chosen country, which is useful when a site must be reached from a specific region. Enable it by setting a two-letter country code in the node configuration (e.g., proxy: 'US'). The proxy is disabled when running on-prem.

MFA

The browser agent can be configured to handle MFA. Currently, TOTP is supported out of the box for MFA login flows.

Tools

Operators can use tools: reusable functions that the agent can call during a browser session to perform specific actions (e.g., extract data, interact with APIs, or update memory). Tools are defined using the platform’s Tool interface and can be made available to the browser agent within your browser task flows.

How tools work in operators

  • Tool calls: During a browser task, the LLM can decide to call any tool that is registered and exposed to the agent. The tool receives validated input, the current state, and the agent instance.

  • Structured input/output: Tools use Zod schemas for input validation and can return structured results, which are sent back to the LLM and/or merged into agent memory.

  • Tools available to browser tasks: Tools marked as allowExecutionRequests: true are available to all nodes, including browser tasks.

Tool execution in operators

You can mention tool names in your browser task prompt to encourage the LLM to use them, or let the LLM decide when to call them based on context.

When the LLM calls a tool during a browser task:

  • The tool’s execute function runs with the provided input and current state.

  • Any returned state is merged into the agent’s memory/state.

  • The result is sent back to the LLM and can be used in subsequent reasoning or actions.
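The three steps above can be sketched as a small function. The shapes here (SketchTool, ToolResult, runToolCall) are hypothetical simplifications and not the platform's actual Tool interface. The parse function stands in for a Zod schema, and the shallow state merge is an assumption about how returned state is combined.

```typescript
// Hypothetical, simplified shapes; the platform's real Tool interface may differ.
type AgentState = Record<string, unknown>;

type ToolResult = { result: unknown; state?: AgentState };

type SketchTool<I> = {
  name: string;
  // Stand-in for a Zod schema: returns the typed input or throws on invalid data.
  parse: (raw: unknown) => I;
  execute: (input: I, state: AgentState) => Promise<ToolResult>;
};

// Runs one tool call: validate input, execute, merge returned state, return the result.
async function runToolCall<I>(
  tool: SketchTool<I>,
  rawInput: unknown,
  state: AgentState,
): Promise<{ result: unknown; state: AgentState }> {
  const input = tool.parse(rawInput);
  const output = await tool.execute(input, state);
  // Shallow merge is an assumption; the platform may merge state differently.
  const merged = { ...state, ...(output.state ?? {}) };
  return { result: output.result, state: merged };
}

// Example tool that records a price found during the browser task.
const savePrice: SketchTool<{ price: number }> = {
  name: "savePrice",
  parse: (raw) => {
    const price = (raw as { price?: unknown } | null)?.price;
    if (typeof price !== "number") throw new Error("price must be a number");
    return { price };
  },
  execute: async (input) => ({
    result: `Saved price ${input.price}`,
    state: { lastPrice: input.price },
  }),
};
```

In this sketch, invalid input is rejected before execute runs, so the tool body only ever sees validated data.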

For more on tool structure and best practices, see the Tools documentation.

Browser manipulation inside a tool

You can manipulate the browser from within a tool using a package of your choice (e.g., Puppeteer or Playwright) by connecting to the CDP URL that is passed to the tool in the state object.
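A minimal sketch of attaching to the running session from inside a tool follows. The state key name cdpUrl is an assumption (check the actual key in your state object), and readActiveTitle is a hypothetical helper. The connect function is injected so the sketch does not depend on a specific package; the structural types cover only the small subset of Puppeteer's API that is used.

```typescript
// Minimal structural types covering the subset of Puppeteer's API used here.
type PageLike = { title(): Promise<string> };
type BrowserLike = {
  pages(): Promise<PageLike[]>;
  disconnect(): Promise<void> | void;
};
type Connect = (cdpUrl: string) => Promise<BrowserLike>;

// Reads the title of the first open tab in the session.
// `cdpUrl` is a hypothetical state key; the real name may differ.
async function readActiveTitle(
  state: { cdpUrl?: string },
  connect: Connect,
): Promise<string> {
  if (!state.cdpUrl) throw new Error("CDP URL missing from state");
  const browser = await connect(state.cdpUrl);
  try {
    const [firstPage] = await browser.pages();
    if (!firstPage) throw new Error("no open pages");
    return await firstPage.title();
  } finally {
    // Disconnect rather than close, so the Operator's session keeps running.
    await browser.disconnect();
  }
}
```

With Puppeteer, the connect argument could be (url) => puppeteer.connect({ browserWSEndpoint: url }). Playwright offers chromium.connectOverCDP(url), but its browser object exposes pages through contexts, so the types above would need adjusting.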

Operator node example

# Flow YAML snippet illustrating a browserTask node
- name: 444f2347-dcc0-4c4e-b0af-3743033c8f5e
  type: browserTask
  displayName: Search product price in Amazon
  prompt: |
    Your task is to search for a product in Amazon and return the price of the product.

    1. Navigate to amazon.com
    2. Search for "iphone 15"
    3. Return the price of the product
  model: gpt-4o
  inputSchema: []
  outputSchema: []
  proxy: ''
  onPrem: false
