# Parallel LLM Requests

Parallel LLM requests reduce latency by sending multiple identical requests and using whichever response arrives first. Under variable network conditions, this typically cuts response times by 30-50%.
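
Conceptually, the mechanism is a race: fire N identical calls and use whichever succeeds first. A minimal sketch of the idea (not the library's actual implementation):

```typescript
// Minimal sketch of the race-the-requests idea, not MindedJS internals.
async function fastestOf<T>(makeRequest: () => Promise<T>, n: number): Promise<T> {
  // Fire n identical requests in parallel.
  const attempts = Array.from({ length: n }, () => makeRequest());
  // Promise.any resolves with the first attempt to fulfill and only
  // rejects if every attempt fails.
  return Promise.any(attempts);
}
```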

## Quick Start

The easiest way to enable parallel requests is through your `minded.json` configuration:

```json
{
  "flows": ["./src/flows"],
  "tools": ["./src/tools"],
  "agent": "./src/agent.ts",
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true
    }
  }
}
```

Your agent will automatically use parallel requests for all LLM calls:

```typescript
import { Agent } from '@minded-ai/mindedjs';
import memorySchema from './agentMemorySchema';
import config from '../minded.json';
import tools from './tools';

const agent = new Agent({
  memorySchema,
  config, // Parallel configuration is automatically applied
  tools,
});
```

## Configuration Options

### MindedChatOpenAI (Recommended)

For agents running on the Minded platform, use `MindedChatOpenAI` with parallel configuration:

```json
{
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "temperature": 0.7
    }
  }
}
```

### AzureChatOpenAI

For Azure OpenAI deployments:

```json
{
  "llm": {
    "name": "AzureChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "azureOpenAIApiVersion": "2024-02-01"
    }
  }
}
```

Required environment variables:

```env
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_API_INSTANCE_NAME=your_instance_name
AZURE_OPENAI_API_DEPLOYMENT_NAME=your_deployment_name
```
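
A fail-fast check at startup can surface missing credentials before the first request; this is a convenience sketch, not part of the SDK:

```typescript
// Convenience check, not part of the SDK: fail fast on missing Azure credentials.
const required = [
  'AZURE_OPENAI_API_KEY',
  'AZURE_OPENAI_API_INSTANCE_NAME',
  'AZURE_OPENAI_API_DEPLOYMENT_NAME',
];
for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
}
```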

### ChatOpenAI

For standard OpenAI API:

```json
{
  "llm": {
    "name": "ChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": 3,
      "logTimings": true,
      "openAIApiKey": "${OPENAI_API_KEY}"
    }
  }
}
```
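
The `${OPENAI_API_KEY}` placeholder is substituted from the environment (the same mechanism used in Dynamic Configuration below), so the key must be set:

```env
OPENAI_API_KEY=your_openai_key
```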

## Configuration Parameters

| Parameter             | Type    | Default | Description                                   |
| --------------------- | ------- | ------- | --------------------------------------------- |
| `numParallelRequests` | number  | 1       | Number of parallel requests (2-5 recommended) |
| `logTimings`          | boolean | false   | Enable detailed timing logs                   |
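
In TypeScript terms, the two parameters form a small options shape; the field names come from the table, though the exact exported type is an assumption:

```typescript
// Hypothetical shape of the parallel-request options; field names match the table.
interface ParallelRequestOptions {
  /** Number of identical requests to fan out; 1 disables parallelism (2-5 recommended). */
  numParallelRequests?: number;
  /** When true, log per-request timing details. Defaults to false. */
  logTimings?: boolean;
}
```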

## Performance Notes

* **Optimal Range**: 2-3 parallel requests usually provide the best latency/cost balance
* **Cost Impact**: You pay for all parallel requests made (see the rough estimate below)
* **Best Use Cases**: Variable network conditions, consistency requirements
* **Latency Reduction**: Typically 30-50% faster response times
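
Because every fan-out request is billed, expected spend scales roughly linearly with `numParallelRequests`. A back-of-envelope estimate (the per-token prices are illustrative placeholders, not current rates):

```typescript
// Rough cost estimate; prices are illustrative placeholders, not real rates.
const numParallelRequests = 3;
const promptTokens = 1_000;
const completionTokens = 300;
const inputPricePerMTokUsd = 2.5; // placeholder
const outputPricePerMTokUsd = 10; // placeholder

const costPerRequestUsd =
  (promptTokens / 1e6) * inputPricePerMTokUsd +
  (completionTokens / 1e6) * outputPricePerMTokUsd;
const totalCostUsd = costPerRequestUsd * numParallelRequests; // 3x a single call

console.log({ costPerRequestUsd, totalCostUsd }); // 0.0055 and 0.0165
```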

## Monitoring Performance

When `logTimings: true` is enabled, you'll see detailed performance logs:

```
[Model] Fastest request completed { requestTime: 1.234, numParallelRequests: 3 }
[Model] Time saved using parallel requests {
  fastestRequestTime: 1.234,
  secondFastestRequestTime: 1.567,
  allFinishTime: 2.345,
  timeSaved: 1.111,
  timeSavedFromSecond: 0.333
}
```
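
The logged fields are derived from the durations of the individual fan-out requests; the numbers above are consistent with this derivation:

```typescript
// Reconstructing the logged metrics from per-request durations (seconds).
const durations = [1.234, 1.567, 2.345]; // three parallel requests
const sorted = [...durations].sort((a, b) => a - b);

const fastestRequestTime = sorted[0]; // 1.234
const secondFastestRequestTime = sorted[1]; // 1.567
const allFinishTime = sorted[sorted.length - 1]; // 2.345

// Saved versus waiting for every request to finish:
const timeSaved = allFinishTime - fastestRequestTime; // 1.111
// Saved versus taking only the runner-up response:
const timeSavedFromSecond = secondFastestRequestTime - fastestRequestTime; // 0.333
```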

## Advanced Usage

### Dynamic Configuration

You can adjust parallel requests based on environment:

```json
{
  "llm": {
    "name": "MindedChatOpenAI",
    "properties": {
      "model": "gpt-4o",
      "numParallelRequests": "${NODE_ENV === 'production' ? 3 : 1}",
      "logTimings": "${NODE_ENV === 'development'}"
    }
  }
}
```
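
The same effect can be achieved in code, which keeps `minded.json` static: build the config object at startup and pass it to the Agent. A sketch, reusing the Quick Start imports:

```typescript
import { Agent } from '@minded-ai/mindedjs';
import memorySchema from './agentMemorySchema';
import baseConfig from '../minded.json';
import tools from './tools';

const isProduction = process.env.NODE_ENV === 'production';

// Override the parallel settings at startup instead of templating minded.json.
const config = {
  ...baseConfig,
  llm: {
    ...baseConfig.llm,
    properties: {
      ...baseConfig.llm.properties,
      numParallelRequests: isProduction ? 3 : 1,
      logTimings: !isProduction,
    },
  },
};

const agent = new Agent({ memorySchema, config, tools });
```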

### Manual Instantiation with createParallelWrapper

For advanced use cases where you need direct control over the LLM instance, you can manually apply parallel wrapping:

```typescript
import { Agent, createParallelWrapper } from '@minded-ai/mindedjs';
import { ChatOpenAI, AzureChatOpenAI } from '@langchain/openai';
import memorySchema from './agentMemorySchema';

// Manual wrapping for ChatOpenAI
const parallelOpenAI = createParallelWrapper(
  new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4o',
    temperature: 0,
  }),
  {
    numParallelRequests: 3,
    logTimings: true,
  },
);

// Manual wrapping for AzureChatOpenAI
const parallelAzure = createParallelWrapper(
  new AzureChatOpenAI({
    azureOpenAIApiKey: process.env.AZURE_OPENAI_API_KEY,
    azureOpenAIApiInstanceName: process.env.AZURE_OPENAI_API_INSTANCE_NAME!,
    azureOpenAIApiDeploymentName: process.env.AZURE_OPENAI_API_DEPLOYMENT_NAME!,
    azureOpenAIApiVersion: '2024-02-01',
  }),
  {
    numParallelRequests: 2,
    logTimings: false,
  },
);

// Use directly with Agent (bypassing configuration)
const agent = new Agent({
  memorySchema,
  config: {
    flows: ['./flows'],
    tools: [],
    llm: parallelOpenAI as any, // Direct LLM instance
  },
  tools: [],
});
```

**Note**: The configuration-based approach is recommended for most use cases as it's simpler and more maintainable. Use manual instantiation only when you need specific control over LLM creation or are integrating with existing LangChain workflows.

## How It Works

### MindedChatOpenAI (Backend Processing)

* Parallel requests are handled on the Minded platform backend
* Multiple requests sent to Azure OpenAI from the backend
* Fastest response returned to your agent
* Optimal for production deployments

### Other LLM Providers (Client-Side Processing)

* Parallel requests handled in your application
* Multiple requests sent directly to the LLM provider
* Good for development and custom deployments

## Best Practices

1. **Start Small**: Begin with 2-3 parallel requests
2. **Monitor Costs**: Each parallel request counts toward your usage
3. **Enable Logging**: Use `logTimings: true` during development to measure improvements
4. **Environment-Specific**: Use fewer parallel requests in development
5. **Configuration Over Code**: Prefer `minded.json` configuration over manual instantiation

## Troubleshooting

### No Performance Improvement

* Check network latency variability
* Ensure `numParallelRequests > 1`
* Verify timing logs are showing multiple requests

### Increased Costs

* Reduce `numParallelRequests`
* Consider cost vs. latency trade-offs
* Monitor usage patterns

### Rate Limiting

* Lower `numParallelRequests`
* Implement backoff strategies (see the sketch below)
* Contact provider about rate limits
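
A retry-with-backoff wrapper is one generic way to absorb provider 429s; this sketch is not a MindedJS API:

```typescript
// Generic exponential backoff for rate-limited calls; not a MindedJS API.
async function withBackoff<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Exponential delay with jitter to avoid synchronized retries.
      const delay = baseDelayMs * 2 ** attempt * (1 + Math.random());
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```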
