# Parallel LLM Requests

Parallel LLM requests reduce latency by sending multiple identical requests and using whichever response arrives first. In scenarios with variable network conditions, this can cut response times by 30-50%.

## Quick Start

The easiest way to enable parallel requests is through your `minded.json` configuration:

```json
{
  "flows": ["./src/flows"],
  "tools": ["./src/tools"],
  "agent": "./src/agent.ts",
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true
    }
  }
}
```

Your agent will automatically use parallel requests for all LLM calls:

```typescript
import { Agent } from '@minded-ai/mindedjs';
import memorySchema from './agentMemorySchema';
import config from '../minded.json';
import tools from './tools';

const agent = new Agent({
  memorySchema,
  config, // Parallel configuration is automatically applied
  tools,
});
```

## Configuration Options

### MindedChatOpenAI (Recommended)

For agents running on the Minded platform, use `MindedChatOpenAI` with parallel configuration:

```json
{
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "temperature": 0.7
    }
  }
}
```

### AzureChatOpenAI

For Azure OpenAI deployments:

```json
{
  "llm": {
    "name": "AzureChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "azureOpenAIApiVersion": "2024-02-01"
    }
  }
}
```

Required environment variables:

```env
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_API_INSTANCE_NAME=your_instance_name
AZURE_OPENAI_API_DEPLOYMENT_NAME=your_deployment_name
```

### ChatOpenAI

For standard OpenAI API:

```json
{
  "llm": {
    "name": "ChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "openAIApiKey": "${OPENAI_API_KEY}"
    }
  }
}
```

## Configuration Parameters

| Parameter             | Type    | Default | Description                                   |
| --------------------- | ------- | ------- | --------------------------------------------- |
| `numParallelRequests` | number  | 1       | Number of parallel requests (2-5 recommended) |
| `logTimings`          | boolean | false   | Enable detailed timing logs                   |

## Performance Notes

* **Optimal Range**: 2-3 parallel requests usually provide the best latency/cost balance
* **Cost Impact**: You pay for all parallel requests made
* **Best Use Cases**: Variable network conditions, consistency requirements
* **Latency Reduction**: Typically 30-50% faster response times
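The intuition behind these numbers is that racing N requests means your latency is the *minimum* of N samples rather than a single draw. A small simulation sketches this, modeling per-request latency as a fixed base plus exponential jitter; the distribution and numbers are illustrative assumptions, not measurements of any real provider:

```typescript
// Simulate the fraction of latency saved by racing n identical requests,
// assuming per-request latency = 0.5s base + exponential jitter (mean 1s).
function simulate(n: number, trials = 10_000): number {
  const sample = () => 0.5 + -Math.log(1 - Math.random()); // base + Exp(1) jitter
  let singleTotal = 0;
  let fastestTotal = 0;
  for (let t = 0; t < trials; t++) {
    const times = Array.from({ length: n }, sample);
    singleTotal += times[0]; // latency of one request
    fastestTotal += Math.min(...times); // latency of the fastest of n
  }
  return 1 - fastestTotal / singleTotal; // fraction of latency saved
}
```

Under these assumptions, `simulate(3)` lands around 40-45% savings; the more variable the jitter relative to the base latency, the larger the gain.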

## Monitoring Performance

When `logTimings: true` is enabled, you'll see detailed performance logs:

```
[Model] Fastest request completed { requestTime: 1.234, numParallelRequests: 3 }
[Model] Time saved using parallel requests {
  fastestRequestTime: 1.234,
  secondFastestRequestTime: 1.567,
  allFinishTime: 2.345,
  timeSaved: 1.111,
  timeSavedFromSecond: 0.333
}
```
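The derived fields in this log follow directly from the raw per-request durations. A minimal sketch of that derivation, with field names mirroring the log output (the actual implementation inside the wrapper may differ):

```typescript
// Derive the timing-log fields from per-request durations (in seconds).
function summarizeTimings(durations: number[]) {
  const sorted = [...durations].sort((a, b) => a - b);
  const fastestRequestTime = sorted[0];
  const secondFastestRequestTime = sorted[1];
  const allFinishTime = sorted[sorted.length - 1];
  return {
    fastestRequestTime,
    secondFastestRequestTime,
    allFinishTime,
    // Time saved vs. waiting for the slowest request to finish.
    timeSaved: allFinishTime - fastestRequestTime,
    // Margin by which the winner beat the runner-up.
    timeSavedFromSecond: secondFastestRequestTime - fastestRequestTime,
  };
}
```

Running this on the durations from the log above (`[2.345, 1.234, 1.567]`) reproduces `timeSaved: 1.111` and `timeSavedFromSecond: 0.333`.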

## Advanced Usage

### Dynamic Configuration

You can adjust parallel requests based on environment:

```json
{
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": "${NODE_ENV === 'production' ? 3 : 1}",
      "logTimings": "${NODE_ENV === 'development'}"
    }
  }
}
```
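If you build the configuration object in code rather than relying on template expressions in `minded.json`, the same environment-dependent settings can be expressed directly (a sketch; the object shape mirrors the JSON config above):

```typescript
// Environment-dependent LLM settings built in code instead of via
// template expressions in minded.json.
const isProd = process.env.NODE_ENV === 'production';

const llmConfig = {
  name: 'MindedChatOpenAI',
  properties: {
    model: 'gpt-4o',
    numParallelRequests: isProd ? 3 : 1, // parallelism only in production
    logTimings: !isProd, // timing logs only during development
  },
};
```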

### Manual Instantiation with createParallelWrapper

For advanced use cases where you need direct control over the LLM instance, you can manually apply parallel wrapping:

```typescript
import { Agent, createParallelWrapper } from '@minded-ai/mindedjs';
import { ChatOpenAI, AzureChatOpenAI } from '@langchain/openai';
import memorySchema from './agentMemorySchema';

// Manual wrapping for ChatOpenAI
const parallelOpenAI = createParallelWrapper(
  new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
    temperature: 0,
  }),
  {
    numParallelRequests: 3,
    logTimings: true,
  },
);

// Manual wrapping for AzureChatOpenAI
const parallelAzure = createParallelWrapper(
  new AzureChatOpenAI({
    azureOpenAIApiKey: process.env.AZURE_OPENAI_API_KEY,
    azureOpenAIApiInstanceName: process.env.AZURE_OPENAI_INSTANCE!,
    azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_DEPLOYMENT!,
    azureOpenAIApiVersion: '2024-02-01',
  }),
  {
    numParallelRequests: 2,
    logTimings: false,
  },
);

// Use directly with Agent (bypassing configuration)
const agent = new Agent({
  memorySchema,
  config: {
    flows: ['./flows'],
    tools: [],
    llm: parallelOpenAI as any, // Direct LLM instance
  },
  tools: [],
});
```

**Note**: The configuration-based approach is recommended for most use cases as it's simpler and more maintainable. Use manual instantiation only when you need specific control over LLM creation or are integrating with existing LangChain workflows.

## How It Works

### MindedChatOpenAI (Backend Processing)

* Parallel requests are handled on the Minded platform backend
* Multiple requests sent to Azure OpenAI from the backend
* Fastest response returned to your agent
* Optimal for production deployments

### Other LLM Providers (Client-Side Processing)

* Parallel requests handled in your application
* Multiple requests sent directly to the LLM provider
* Good for development and custom deployments
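Conceptually, client-side parallel processing amounts to racing N identical calls and taking the first one to settle. A simplified sketch of the idea, not the actual MindedJS implementation (which presumably also handles errors and cancellation):

```typescript
// Race n identical requests and return the first result to settle.
// Note: Promise.race also settles on the first *rejection*; a production
// wrapper would likely wait for the first success instead.
async function fastestOf<T>(makeRequest: () => Promise<T>, n: number): Promise<T> {
  const requests = Array.from({ length: n }, () => makeRequest());
  return Promise.race(requests);
}
```

The losing requests still run to completion (and are still billed), which is why cost scales linearly with `numParallelRequests`.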

## Best Practices

1. **Start Small**: Begin with 2-3 parallel requests
2. **Monitor Costs**: Each parallel request counts toward your usage
3. **Enable Logging**: Use `logTimings: true` during development to measure improvements
4. **Environment-Specific**: Use fewer parallel requests in development
5. **Configuration Over Code**: Prefer `minded.json` configuration over manual instantiation

## Troubleshooting

### No Performance Improvement

* Check network latency variability
* Ensure `numParallelRequests > 1`
* Verify timing logs are showing multiple requests

### Increased Costs

* Reduce `numParallelRequests`
* Consider cost vs. latency trade-offs
* Monitor usage patterns

### Rate Limiting

* Lower `numParallelRequests`
* Implement backoff strategies
* Contact provider about rate limits
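One possible client-side backoff strategy is exponential delay with jitter around the request call. This is not part of the MindedJS API; it is a generic sketch of the mitigation mentioned above:

```typescript
// Retry fn with exponential backoff plus random jitter.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Delay grows as base * 2^attempt, scaled by jitter in [0.5, 1.5).
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Jitter spreads retries out so that parallel requests that were rate-limited together do not all retry at the same instant.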

