# Document Processing

Parse and extract data from images, PDFs, Word documents, spreadsheets, and more using AI-powered document processing. This tool handles both single and multiple document parsing with optional data extraction.

## Overview

**Supported formats:** Images (JPG, PNG, GIF, BMP, WebP, TIFF), Documents (PDF, DOC, DOCX, TXT, RTF, ODT), Spreadsheets (XLS, XLSX, CSV, ODS), Presentations (PPT, PPTX, ODP), Web formats (HTML, HTM, MD, XML)

**Available extraction modes:**

1. **Structured Extraction with Schema**: Extract data into a predefined Zod schema using AI
2. **Structured Extraction with Prompt**: Guide extraction using custom prompts
3. **Raw Text Extraction**: Parse a document and extract plain text without AI processing
4. **Multiple Document Processing**: Process several documents in one call; their content is concatenated before extraction

## Processing Modes

Document processing supports two modes: **managed** (default) and **local**. In managed mode, documents are uploaded to the Minded cloud and processed there, which provides secure API key storage and automatic cost tracking and requires no SDK configuration. Local mode processes documents directly in your SDK using LlamaCloud.

To use local mode, set the `DOCUMENT_PROCESSING_MODE` environment variable to `local` and provide a `LLAMA_CLOUD_API_KEY`:

```bash
DOCUMENT_PROCESSING_MODE=local
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
```
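
As a rough illustration of how these two variables interact, the sketch below resolves them into a single configuration object and fails early when local mode is requested without a key. `resolveProcessingConfig` and `ProcessingConfig` are hypothetical names that are not part of `@minded-ai/mindedjs`, and the SDK's own validation may be more lenient than this.

```typescript
// Hypothetical helper, not part of @minded-ai/mindedjs.
type ProcessingMode = 'managed' | 'local';

interface ProcessingConfig {
  mode: ProcessingMode;
  llamaCloudApiKey?: string;
}

function resolveProcessingConfig(env: Record<string, string | undefined>): ProcessingConfig {
  // Managed is the documented default when the variable is unset.
  const raw = env['DOCUMENT_PROCESSING_MODE'] ?? 'managed';
  if (raw !== 'managed' && raw !== 'local') {
    throw new Error(`Unknown DOCUMENT_PROCESSING_MODE: ${raw}`);
  }
  if (raw === 'local') {
    // Local mode needs a LlamaCloud key, so fail before any document is parsed.
    const key = env['LLAMA_CLOUD_API_KEY'];
    if (!key) {
      throw new Error('LLAMA_CLOUD_API_KEY is required when DOCUMENT_PROCESSING_MODE=local');
    }
    return { mode: 'local', llamaCloudApiKey: key };
  }
  return { mode: 'managed' };
}
```

In a Node.js application this would typically be called once at startup as `resolveProcessingConfig(process.env)`.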

## Document Quality

When using managed mode, you can control the quality of document parsing via the `quality` option. This maps to `DocumentQuality` exported from `@minded-ai/mindedjs`:

| Value        | Credit cost | Description                                                                                                                                                                          |
| ------------ | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `'advanced'` | 3x          | **Default.** High-quality OCR and parsing. Handles complex layouts, diagrams, images, scanned documents, and most document types with high accuracy. Recommended for production use. |
| `'standard'` | 1x          | Faster processing, lower cost. Suitable for text-based documents.                                                                                                                    |

The `quality` option is available on both the `parseDocument` and `parseDocumentAndExtractStructuredData` functions, as well as on the `minded-parse-documents` flow node. It has no effect in local processing mode.
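
To make the trade-off concrete, here is a small sketch that compares the relative cost of a batch at each quality level. The multipliers come from the table above; the helper `relativeBatchCost` is hypothetical, and the idea that cost scales linearly with the number of documents is an assumption, since the billing unit is not specified here.

```typescript
// Multipliers taken from the quality table; everything else is an assumption.
type Quality = 'standard' | 'advanced';

const CREDIT_MULTIPLIER: Record<Quality, number> = {
  standard: 1,
  advanced: 3,
};

// Hypothetical helper: relative cost of a batch, measured in "standard document" units.
// Assumes cost grows linearly with the number of documents.
function relativeBatchCost(documentCount: number, quality: Quality = 'advanced'): number {
  if (!Number.isInteger(documentCount) || documentCount < 0) {
    throw new RangeError('documentCount must be a non-negative integer');
  }
  return documentCount * CREDIT_MULTIPLIER[quality];
}

// Ten documents at the default quality versus 'standard'.
const advancedCost = relativeBatchCost(10);
const standardCost = relativeBatchCost(10, 'standard');
```

Under these assumptions, `advancedCost` is three times `standardCost`, which is the gap to weigh against the accuracy gain on scanned or layout-heavy documents.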

## Using in Flows

Document processing includes built-in AI extraction: use the node's `prompt` and `outputSchema` properties to specify what data to extract. No additional extraction tool is needed.

**Available properties:**

* `parameters.documentSource` (string or array, required): URL or file path to a single document, or array of URLs/file paths to process multiple documents. When an array is provided, documents are parsed and concatenated with double newlines.
* `parameters.returnStructuredOutput` (boolean, optional, default: `false`): Set to `true` to enable AI-powered extraction, `false` for raw text only. When `true`, requires either `prompt` or `outputSchema` (or both).
* `quality` (`'standard'` | `'advanced'`, optional, default: `'advanced'`): Controls OCR/parsing quality in managed mode. `'advanced'` (default) provides high-quality extraction supporting complex layouts, diagrams, and images at 3x the credit cost of `'standard'`. Use `'standard'` for simple, clean text-based documents where speed and cost matter. Has no effect in local processing mode.
* `prompt` (string, optional): Instructions for AI-powered extraction. Ignored when `returnStructuredOutput` is `false`.
* `outputSchema` (schema object, optional): Define the structure of extracted data for structured extraction. Ignored when `returnStructuredOutput` is `false`.
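
The rules in the list above can be expressed as a small pre-flight check. This is only a sketch: `ParseDocumentsNode` and `validateParseNode` are hypothetical names that mirror the documented properties, and the platform may enforce these rules differently.

```typescript
// Hypothetical shape mirroring the node properties listed above.
interface ParseDocumentsNode {
  parameters: {
    documentSource: string | string[];
    returnStructuredOutput?: boolean;
  };
  quality?: 'standard' | 'advanced';
  prompt?: string;
  outputSchema?: unknown[];
}

// Returns a list of human-readable problems; an empty list means the node looks consistent.
function validateParseNode(node: ParseDocumentsNode): string[] {
  const problems: string[] = [];
  const { documentSource, returnStructuredOutput = false } = node.parameters;

  // documentSource may be a single value or an array of values.
  const sources = Array.isArray(documentSource) ? documentSource : [documentSource];
  if (sources.length === 0 || sources.some((s) => s.trim() === '')) {
    problems.push('parameters.documentSource must contain at least one non-empty URL or file path');
  }

  const hasPrompt = typeof node.prompt === 'string' && node.prompt.trim() !== '';
  const hasSchema = Array.isArray(node.outputSchema) && node.outputSchema.length > 0;

  if (returnStructuredOutput && !hasPrompt && !hasSchema) {
    problems.push('returnStructuredOutput is true, so prompt or outputSchema is required');
  }
  if (!returnStructuredOutput && (hasPrompt || hasSchema)) {
    problems.push('prompt and outputSchema are ignored while returnStructuredOutput is false');
  }
  return problems;
}
```

Returning a list of messages instead of throwing makes it easy to surface every issue in a flow definition at once.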

### Examples

Raw text extraction without AI processing:

```yaml
name: Main flow
nodes:
  - name: 'parse-invoice'
    type: appTool
    displayName: 'Parse Invoice'
    actionKey: 'minded-parse-documents'
    actionName: 'Parse Document'
    appName: 'Minded'
    parameters:
      documentSource: '{state.memory.invoiceUrl}'
      returnStructuredOutput: false
```

Structured extraction with schema and prompt:

```yaml
name: Main flow
nodes:
  - name: 'parse-uploaded-document'
    type: appTool
    displayName: 'Parse Uploaded Document'
    actionKey: 'minded-parse-documents'
    actionName: 'Parse Document'
    appName: 'Minded'
    parameters:
      documentSource: '{state.memory.filePath}'
      returnStructuredOutput: true
    prompt: 'Extract invoice number, amount, date, and vendor'
    outputSchema:
      - name: invoiceNumber
        type: string
        description: Invoice number
      - name: amount
        type: number
        description: Total amount
      - name: date
        type: string
        description: Invoice date
      - name: vendor
        type: string
        description: Vendor name
```

Unstructured extraction guided by prompt only:

```yaml
name: Main flow
nodes:
  - name: 'extract-names-addresses'
    type: appTool
    displayName: 'Extract Names and Addresses'
    actionKey: 'minded-parse-documents'
    actionName: 'Parse Document'
    appName: 'Minded'
    parameters:
      documentSource: '{state.memory.uploadedFile}'
      returnStructuredOutput: true
    prompt: 'Extract all names and addresses from this document'
```

Processing multiple documents with structured extraction:

```yaml
name: Main flow
nodes:
  - name: 'parse-multiple-invoices'
    type: appTool
    displayName: 'Parse Multiple Invoices'
    actionKey: 'minded-parse-documents'
    actionName: 'Parse Document'
    appName: 'Minded'
    parameters:
      documentSource: '{state.memory.invoiceUrls}'
      # Also possible to pass an array as JSON with different items, including items from other arrays:
      # documentSource: '["{state.memory.invoiceFilePath}", "{state.memory.invoiceUrls[0]}"]'
      returnStructuredOutput: true
    prompt: 'Extract all invoice data from the provided documents'
    outputSchema:
      - name: invoices
        type: array
        description: Array of invoice data
        items:
          - name: invoiceNumber
            type: string
            description: Invoice number
          - name: amount
            type: number
            description: Total amount
          - name: date
            type: string
            description: Invoice date
```

## Programmatic Usage

The SDK provides three main functions for document processing:

1. **`parseDocumentAndExtractStructuredData`** - Parse one or more documents and optionally extract structured data with AI

   ```typescript
   type parseDocumentAndExtractStructuredData = <T>(options: {
     documentSources: string[],               // Required: Array of URLs/file paths. The results of multiple documents are concatenated.
     sessionId: string,                       // Required: Session identifier
     returnStructuredOutput: boolean,         // Required: Enable/disable AI extraction
     llm?: BaseLanguageModel,                 // Optional: LLM instance (required when returnStructuredOutput is true)
     outputSchema?: ZodType<T>,               // Optional: Zod schema for structured extraction
     outputSchemaPrompt?: string,             // Optional: Instructions for extraction
     processingMode?: DocumentProcessingMode, // Optional: Processing mode (default: DocumentProcessingMode.MANAGED)
     llamaCloudApiKey?: string,               // Optional: API key for local mode
     quality?: DocumentQuality,               // Optional: 'standard' | 'advanced' (default: 'advanced'). Managed mode only.
   }) => Promise<{
     rawContent?: string,                     // Concatenated content when multiple documents provided
     structuredContent?: T | string           // Extracted from concatenated content when multiple documents provided
   }>
   ```
2. **`parseDocument`** - Parse a document and extract raw text only

   ```typescript
   type parseDocument = (options: {
     documentSource: string,                  // Required: URL or file path
     sessionId: string,                       // Required: Session identifier
     processingMode?: DocumentProcessingMode, // Optional: Processing mode (default: DocumentProcessingMode.MANAGED)
     llamaCloudApiKey?: string,               // Optional: API key for local mode
     quality?: DocumentQuality,               // Optional: 'standard' | 'advanced' (default: 'advanced'). Managed mode only.
   }) => Promise<{
     rawContent?: string,
     metadata?: { fileSize?: number, fileType: string, processingTime: number, contentLength: number }
   }>
   ```
3. **`extractStructuredDataFromString`** - Extract structured data from already parsed text

   ```typescript
   type extractStructuredDataFromString = <T>(options: {
     content: string,                  // Required: Text content to extract from
     llm: BaseLanguageModel,           // Required: LLM instance
     sessionId: string,                // Required: Session identifier
     schema?: ZodType<T>,              // Optional: Zod schema for structured extraction
     prompt?: string                   // Optional: Instructions for extraction
   }) => Promise<T | string>
   ```
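
Because the signatures above allow `structuredContent` to be absent, a plain string, or a typed object, callers generally need to narrow the value before using it. The sketch below shows one way to do that; the `Invoice` interface and `describeResult` function are illustrative only and are not part of the SDK.

```typescript
// Illustrative type standing in for whatever your Zod schema infers.
interface Invoice {
  invoiceNumber: string;
  amount: number;
}

// Narrow the `T | string | undefined` union before touching typed fields.
function describeResult(structuredContent: Invoice | string | undefined): string {
  if (structuredContent === undefined) {
    return 'no structured content';
  }
  if (typeof structuredContent === 'string') {
    // The extraction came back as free text rather than a typed object.
    return `text: ${structuredContent}`;
  }
  return `invoice ${structuredContent.invoiceNumber} for ${structuredContent.amount}`;
}
```

With a real call, you would pass `result.structuredContent` from `parseDocumentAndExtractStructuredData` into a function like this.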

### Structured Extraction with Schema

Extract data matching a predefined Zod schema.

{% hint style="info" %}
**Note:** Use `nullable()` instead of `optional()` for fields that are not required.
{% endhint %}

```typescript
import { parseDocumentAndExtractStructuredData } from '@minded-ai/mindedjs';
import { z } from 'zod';

// Invoice processing with structured schema
const invoiceSchema = z.object({
  invoiceNumber: z.string(),
  vendor: z.string(),
  amount: z.number(),
  dueDate: z.string(),
  lineItems: z.array(
    z.object({
      description: z.string(),
      quantity: z.number(),
      unitPrice: z.number(),
    }),
  ),
});

const result = await parseDocumentAndExtractStructuredData({
  documentSources: ['./invoice.pdf'],
  sessionId: state.sessionId,
  returnStructuredOutput: true,
  llm: agent.llm,
  outputSchema: invoiceSchema,
  outputSchemaPrompt: 'Extract all invoice details including line items',
});

console.log(result.structuredContent); // Typed data matching your schema
console.log(result.rawContent); // Original raw text
```

### Structured Extraction with Prompt Only

Extract data using AI with a custom prompt, without a predefined schema.

```typescript
import { parseDocumentAndExtractStructuredData } from '@minded-ai/mindedjs';

// Contract analysis with prompt guidance
const result = await parseDocumentAndExtractStructuredData({
  documentSources: ['./contract.pdf'],
  sessionId: state.sessionId,
  returnStructuredOutput: true,
  llm: agent.llm,
  outputSchemaPrompt: 'Extract parties involved, key dates, payment terms, and termination clauses',
});

console.log(result.structuredContent); // String or object with extracted information
```

### Raw Text Extraction

Parse a document without AI processing to get plain text only.

```typescript
import { parseDocument } from '@minded-ai/mindedjs';

const result = await parseDocument({
  documentSource: './document.pdf',
  sessionId: state.sessionId,
});

console.log(result.rawContent); // Raw extracted text
console.log(result.metadata); // File size, type, processing time, content length
```

### Using URLs

Both parsing functions accept URLs as well as file paths: `parseDocument` through `documentSource`, and `parseDocumentAndExtractStructuredData` through `documentSources`.

```typescript
import { parseDocumentAndExtractStructuredData } from '@minded-ai/mindedjs';
import { z } from 'zod';

const result = await parseDocumentAndExtractStructuredData({
  documentSources: ['https://example.com/invoice.pdf'],
  sessionId: state.sessionId,
  returnStructuredOutput: true,
  llm: agent.llm,
  outputSchema: z.object({
    invoiceNumber: z.string(),
    amount: z.number(),
    dueDate: z.string(),
  }),
});
```

### Processing Multiple Documents

Process multiple documents by providing an array of URLs or file paths. Documents are parsed in parallel and their content is concatenated with double newlines before optional structured extraction.

```typescript
import { parseDocumentAndExtractStructuredData } from '@minded-ai/mindedjs';
import { z } from 'zod';

// Extract data from multiple invoices into an array
const invoiceSchema = z.array(
  z.object({
    invoiceNumber: z.string(),
    vendor: z.string(),
    amount: z.number(),
    date: z.string(),
  })
);

const result = await parseDocumentAndExtractStructuredData({
  documentSources: [
    './invoice1.pdf',
    './invoice2.pdf',
    './invoice3.pdf',
  ],
  sessionId: state.sessionId,
  returnStructuredOutput: true,
  llm: agent.llm,
  outputSchema: invoiceSchema,
  outputSchemaPrompt: 'Extract invoice data from all provided invoices',
});

console.log(result.structuredContent); // Array of invoice data
console.log(result.rawContent); // Concatenated text from all invoices
```

You can also extract raw text from multiple documents without structured extraction:

```typescript
import { parseDocumentAndExtractStructuredData } from '@minded-ai/mindedjs';

const result = await parseDocumentAndExtractStructuredData({
  documentSources: [
    'https://example.com/doc1.pdf',
    'https://example.com/doc2.pdf',
  ],
  sessionId: state.sessionId,
  returnStructuredOutput: false,
});

console.log(result.rawContent); // Concatenated text from both documents
```
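
To picture what "parsed in parallel and concatenated with double newlines" means for `rawContent`, consider the following sketch. It is a simplified model, not the SDK's implementation: `concatenateParsedDocuments`, `parseAllAndConcatenate`, and the `ParseFn` type are hypothetical, and whether the SDK preserves input order or trims each document is an assumption.

```typescript
// Hypothetical stand-in for a single-document parser that returns raw text.
type ParseFn = (source: string) => Promise<string>;

// Join already-parsed contents with a blank line between documents.
function concatenateParsedDocuments(contents: string[]): string {
  return contents.join('\n\n');
}

// Promise.all starts every parse at once and yields results in input order,
// so the concatenated text follows the order of `sources` in this model.
async function parseAllAndConcatenate(sources: string[], parseOne: ParseFn): Promise<string> {
  const contents = await Promise.all(sources.map((source) => parseOne(source)));
  return concatenateParsedDocuments(contents);
}

const combined = concatenateParsedDocuments(['Invoice #1', 'Invoice #2']);
```

Here `combined` is `'Invoice #1\n\nInvoice #2'`. Since document boundaries are only marked by a blank line, an extraction prompt that mentions how many documents to expect can help the model separate them.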

### Identity Document Verification

Extract personal information from ID documents with structured validation.

```typescript
import { parseDocumentAndExtractStructuredData } from '@minded-ai/mindedjs';
import { z } from 'zod';

const idSchema = z.object({
  documentType: z.enum(['passport', 'driver_license', 'national_id']),
  fullName: z.string(),
  dateOfBirth: z.string(),
  documentNumber: z.string(),
  expiryDate: z.string().nullable(),
  address: z.string().nullable(),
});

const result = await parseDocumentAndExtractStructuredData({
  documentSources: ['./id-card.jpg'],
  sessionId: state.sessionId,
  returnStructuredOutput: true,
  llm: agent.llm,
  outputSchema: idSchema,
  outputSchemaPrompt: 'Extract personal information from this ID document',
});
```

### Extract from Already Parsed Content

Use `extractStructuredDataFromString` to extract structured data from text you've already obtained.

```typescript
import { extractStructuredDataFromString } from '@minded-ai/mindedjs';
import { z } from 'zod';

// With schema for structured extraction
const structured = await extractStructuredDataFromString({
  content: 'Invoice #12345\nTotal: $500.00\nDate: 2024-01-15',
  llm: agent.llm,
  schema: z.object({
    invoiceNumber: z.string(),
    totalAmount: z.number(),
    date: z.string(),
  }),
  sessionId: state.sessionId,
});
// Returns: {invoiceNumber: "12345", totalAmount: 500, date: "2024-01-15"}

// Without schema for unstructured extraction
const unstructured = await extractStructuredDataFromString({
  content: 'Invoice #12345\nTotal: $500.00\nDate: 2024-01-15',
  llm: agent.llm,
  prompt: 'Extract the invoice number, total amount, and date',
  sessionId: state.sessionId,
});
// Returns: String or object with LLM's analysis
```

### Local Processing Mode

By default, document processing uses the Minded cloud service. To process documents locally using LlamaCloud, set the `DOCUMENT_PROCESSING_MODE` environment variable and provide a LlamaCloud API key:

```bash
# Processing mode: 'managed' (default) or 'local'
DOCUMENT_PROCESSING_MODE=local

# Required for local mode:
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
```

You can also pass the API key and processing mode directly to the function:

```typescript
import { parseDocument, DocumentProcessingMode } from '@minded-ai/mindedjs';

const result = await parseDocument({
  documentSource: './contract.pdf',
  processingMode: DocumentProcessingMode.LOCAL,
  sessionId: state.sessionId,
  llamaCloudApiKey: process.env.LLAMA_CLOUD_API_KEY,
});
```

## Supported File Types

| Category      | Extensions                                  | Notes                                    |
| ------------- | ------------------------------------------- | ---------------------------------------- |
| Images        | .jpg, .jpeg, .png, .gif, .bmp, .webp, .tiff | Direct processing                        |
| Documents     | .pdf, .doc, .docx, .txt, .rtf, .odt         | Requires LlamaCloud for advanced formats |
| Spreadsheets  | .xls, .xlsx, .csv, .ods                     | Requires LlamaCloud for binary formats   |
| Presentations | .ppt, .pptx, .odp                           | Requires LlamaCloud                      |
| Web           | .html, .htm, .md, .xml                      | Basic text extraction                    |
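
If you want to reject unsupported files before sending them for parsing, the table above can be turned into a lookup. This is a sketch under the assumption that the extension alone identifies the format; `categorize` and `EXTENSIONS` are hypothetical names, and the SDK may detect file types differently (for example, from content).

```typescript
type FileCategory = 'Images' | 'Documents' | 'Spreadsheets' | 'Presentations' | 'Web';

// Extension lists copied from the supported file types table.
const EXTENSIONS: Record<FileCategory, string[]> = {
  Images: ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.tiff'],
  Documents: ['.pdf', '.doc', '.docx', '.txt', '.rtf', '.odt'],
  Spreadsheets: ['.xls', '.xlsx', '.csv', '.ods'],
  Presentations: ['.ppt', '.pptx', '.odp'],
  Web: ['.html', '.htm', '.md', '.xml'],
};

// Returns the category for a URL or file path, or undefined if the extension is not listed.
function categorize(source: string): FileCategory | undefined {
  // Drop any query string or fragment so URLs like .../a.pdf?x=1 still match.
  const path = source.split(/[?#]/)[0] ?? '';
  const dot = path.lastIndexOf('.');
  if (dot === -1) return undefined;
  const ext = path.slice(dot).toLowerCase();
  for (const category of Object.keys(EXTENSIONS) as FileCategory[]) {
    if (EXTENSIONS[category].indexOf(ext) !== -1) return category;
  }
  return undefined;
}
```

For instance, `categorize('./invoice.PDF')` yields `'Documents'`, while a source with no recognizable extension yields `undefined`.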
