Parallel LLM Requests

Parallel LLM requests reduce latency by sending multiple identical requests and using the fastest response. In scenarios with variable network conditions, this can cut response times by 30-50%.

Quick Start

The easiest way to enable parallel requests is through your minded.json configuration:

{
  "flows": ["./src/flows"],
  "tools": ["./src/tools"],
  "agent": "./src/agent.ts",
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true
    }
  }
}

Your agent will automatically use parallel requests for all LLM calls:

import { Agent } from '@minded-ai/mindedjs';
import memorySchema from './agentMemorySchema';
import config from '../minded.json';
import tools from './tools';

const agent = new Agent({
  memorySchema,
  config, // Parallel configuration is automatically applied
  tools,
});

Configuration Options

MindedChatOpenAI

For agents running on the Minded platform, use MindedChatOpenAI with parallel configuration:

{
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "temperature": 0.7
    }
  }
}

AzureChatOpenAI

For Azure OpenAI deployments:

{
  "llm": {
    "name": "AzureChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "azureOpenAIApiVersion": "2024-02-01"
    }
  }
}

Required environment variables:

AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_API_INSTANCE_NAME=your_instance_name
AZURE_OPENAI_API_DEPLOYMENT_NAME=your_deployment_name

ChatOpenAI

For standard OpenAI API:

{
  "llm": {
    "name": "ChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "openAIApiKey": "${OPENAI_API_KEY}"
    }
  }
}

Configuration Parameters

Parameter            Type     Default  Description
numParallelRequests  number   1        Number of parallel requests (2-5 recommended)
logTimings           boolean  false    Enable detailed timing logs

Performance Notes

  • Optimal Range: 2-3 parallel requests usually provide the best latency/cost balance

  • Cost Impact: You pay for all parallel requests made

  • Best Use Cases: Variable network conditions, consistency requirements

  • Latency Reduction: Typically 30-50% faster response times

Monitoring Performance

When logTimings: true is enabled, you'll see detailed performance logs:

[Model] Fastest request completed { requestTime: 1.234, numParallelRequests: 3 }
[Model] Time saved using parallel requests {
  fastestRequestTime: 1.234,
  secondFastestRequestTime: 1.567,
  allFinishTime: 2.345,
  timeSaved: 1.111,
  timeSavedFromSecond: 0.333
}
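
The derived fields follow from the raw per-request timings. A minimal sketch of that arithmetic, inferred from the example values above (the helper below is illustrative, not part of the SDK):

interface ParallelTimings {
  fastestRequestTime: number; // seconds until the winning request finished
  secondFastestRequestTime: number; // seconds until the runner-up finished
  allFinishTime: number; // seconds until every parallel request finished
}

// Illustrative helper (not part of @minded-ai/mindedjs): how the derived
// log fields relate to the raw timings.
function summarizeSavings(t: ParallelTimings) {
  return {
    // Saved versus waiting for all parallel requests: 2.345 - 1.234 = 1.111
    timeSaved: t.allFinishTime - t.fastestRequestTime,
    // Saved versus the second-fastest request: 1.567 - 1.234 = 0.333
    timeSavedFromSecond: t.secondFastestRequestTime - t.fastestRequestTime,
  };
}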

Advanced Usage

Dynamic Configuration

You can adjust parallel requests based on environment:

{
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": "${NODE_ENV === 'production' ? 3 : 1}",
      "logTimings": "${NODE_ENV === 'development'}"
    }
  }
}
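
Alternatively, you can keep minded.json static and apply the environment-specific override in code, since the Agent constructor accepts a plain configuration object (as in the Quick Start example). A minimal sketch; the spread-based override is illustrative:

import { Agent } from '@minded-ai/mindedjs';
import memorySchema from './agentMemorySchema';
import baseConfig from '../minded.json';
import tools from './tools';

const isProd = process.env.NODE_ENV === 'production';

// Override only the parallel-request settings, keeping the rest of minded.json intact
const config = {
  ...baseConfig,
  llm: {
    ...baseConfig.llm,
    properties: {
      ...baseConfig.llm.properties,
      numParallelRequests: isProd ? 3 : 1, // fewer parallel requests in development
      logTimings: !isProd,                 // timing logs outside production only
    },
  },
};

const agent = new Agent({ memorySchema, config, tools });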

Manual Instantiation with createParallelWrapper

For advanced use cases where you need direct control over the LLM instance, you can manually apply parallel wrapping:

import { createParallelWrapper } from '@minded-ai/mindedjs';
import { ChatOpenAI, AzureChatOpenAI } from '@langchain/openai';

// Manual wrapping for ChatOpenAI
const parallelOpenAI = createParallelWrapper(
  new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
    temperature: 0.7,
  }),
  {
    numParallelRequests: 3,
    logTimings: true,
  },
);

// Manual wrapping for AzureChatOpenAI
const parallelAzure = createParallelWrapper(
  new AzureChatOpenAI({
    azureOpenAIApiKey: process.env.AZURE_OPENAI_API_KEY,
    azureOpenAIApiInstanceName: process.env.AZURE_OPENAI_INSTANCE!,
    azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_DEPLOYMENT!,
    azureOpenAIApiVersion: '2024-02-01',
  }),
  {
    numParallelRequests: 2,
    logTimings: false,
  },
);

// Use directly with Agent (bypassing configuration)
const agent = new Agent({
  memorySchema,
  config: {
    flows: ['./flows'],
    tools: [],
    llm: parallelOpenAI as any, // Direct LLM instance
  },
  tools: [],
});

Note: The configuration-based approach is recommended for most use cases as it's simpler and more maintainable. Use manual instantiation only when you need specific control over LLM creation or are integrating with existing LangChain workflows.

How It Works

MindedChatOpenAI (Backend Processing)

  • Parallel requests are handled on the Minded platform backend

  • Multiple requests sent to Azure OpenAI from the backend

  • Fastest response returned to your agent

  • Optimal for production deployments

Other LLM Providers (Client-Side Processing)

  • Parallel requests handled in your application (see the sketch after this list)

  • Multiple requests sent directly to the LLM provider

  • Good for development and custom deployments
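
Conceptually, client-side parallelism amounts to firing several identical calls and keeping whichever succeeds first. A simplified sketch of the idea (not the library's actual implementation):

// Conceptual sketch only; not the SDK's internal code.
// Fire n identical requests and resolve with the first one that succeeds.
async function fastestOf<T>(makeRequest: () => Promise<T>, n: number) {
  const attempts = Array.from({ length: n }, () => makeRequest());
  // Promise.any resolves with the first fulfilled attempt and only rejects
  // if every attempt fails.
  return Promise.any(attempts);
}

// Example: three identical chat completions, fastest response wins.
// const response = await fastestOf(() => llm.invoke(messages), 3);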

Best Practices

  1. Start Small: Begin with 2-3 parallel requests

  2. Monitor Costs: Each parallel request counts toward your usage

  3. Enable Logging: Use logTimings: true during development to measure improvements

  4. Environment-Specific: Use fewer parallel requests in development

  5. Configuration Over Code: Prefer minded.json configuration over manual instantiation

Troubleshooting

No Performance Improvement

  • Check network latency variability

  • Ensure numParallelRequests > 1

  • Verify timing logs are showing multiple requests

Increased Costs

  • Reduce numParallelRequests

  • Consider cost vs. latency trade-offs

  • Monitor usage patterns

Rate Limiting

  • Lower numParallelRequests

  • Implement backoff strategies (see the sketch after this list)

  • Contact provider about rate limits
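
Because each parallel request counts against your provider's rate limits, retrying with exponential backoff can absorb transient rate-limit errors. A provider-agnostic sketch; the retry parameters and wrapped call are illustrative:

// Generic exponential backoff around any async LLM call (illustrative only).
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Wait 500ms, 1s, 2s, ... before retrying
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Example: retry a wrapped parallel call when the provider rate-limits it.
// const response = await withBackoff(() => parallelOpenAI.invoke(messages));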
