Document Processing

Parse and extract data from images, PDFs, Word documents, spreadsheets, and more using AI-powered document processing. This tool handles both document parsing and data extraction in a single step.

Overview

Supported formats:

Images: JPG, PNG, GIF, BMP, WebP, TIFF
Documents: PDF, DOC, DOCX, TXT, RTF, ODT
Spreadsheets: XLS, XLSX, CSV, ODS
Presentations: PPT, PPTX, ODP
Web formats: HTML, HTM, MD, XML
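To reject unsupported files before calling the tool, the format list above can be turned into a small lookup. This is a minimal sketch: formatCategory and FORMAT_CATEGORIES are hypothetical names, not part of the SDK, and the check only looks at the file extension.

```typescript
// Extensions copied from the supported-formats list; nothing else is assumed to work.
const FORMAT_CATEGORIES: Record<string, readonly string[]> = {
  image: ['jpg', 'png', 'gif', 'bmp', 'webp', 'tiff'],
  document: ['pdf', 'doc', 'docx', 'txt', 'rtf', 'odt'],
  spreadsheet: ['xls', 'xlsx', 'csv', 'ods'],
  presentation: ['ppt', 'pptx', 'odp'],
  web: ['html', 'htm', 'md', 'xml'],
};

// Hypothetical helper: returns the category for a file name,
// or null when the extension is missing or not in the list.
function formatCategory(fileName: string): string | null {
  const match = /\.([a-z0-9]+)$/i.exec(fileName);
  if (!match) return null;
  const ext = (match[1] ?? '').toLowerCase();
  for (const [category, extensions] of Object.entries(FORMAT_CATEGORIES)) {
    if (extensions.includes(ext)) return category;
  }
  return null;
}
```

A null result means the file should be skipped or converted first. URLs with query strings would need the path extracted before this check.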

Built-in extraction modes:

Structured Extraction: Extract data into a predefined Zod schema using AI
Unstructured Extraction: Extract information based on prompt instructions
Raw Text Extraction: Extract plain text without AI processing

Processing Modes

Processing mode is controlled via the DOCUMENT_PROCESSING_MODE environment variable:

Managed (Default)

The backend handles processing. Benefits: secure API key storage, automatic cost tracking, and no SDK configuration.

Local

The SDK handles processing. Set DOCUMENT_PROCESSING_MODE=local in your environment. This mode requires LLAMA_CLOUD_API_KEY to be set in the SDK's environment.
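The rules for the two modes can be checked at startup with a few lines. This is a sketch under stated assumptions: resolveProcessingMode is a hypothetical helper, and the SDK's own validation and error messages may differ.

```typescript
type ProcessingMode = 'managed' | 'local';

// Hypothetical helper (not an SDK export). Takes the environment as a plain
// object so it can be called with process.env or with a test fixture.
function resolveProcessingMode(env: Record<string, string | undefined>): ProcessingMode {
  // Managed is the documented default when the variable is unset.
  const raw = (env['DOCUMENT_PROCESSING_MODE'] ?? 'managed').toLowerCase();
  if (raw === 'managed') return 'managed';
  if (raw === 'local') {
    // Local mode is documented as requiring a LlamaCloud key.
    if (!env['LLAMA_CLOUD_API_KEY']) {
      throw new Error('Local mode requires LLAMA_CLOUD_API_KEY');
    }
    return 'local';
  }
  throw new Error(`Unknown DOCUMENT_PROCESSING_MODE: ${raw}`);
}
```

In a Node.js process this would be called as resolveProcessingMode(process.env), which fails early instead of at the first PDF.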

Using in Flows

Document processing includes built-in AI extraction: use systemPrompt to specify what data to extract. No additional extraction tool is needed.

Example: Document Processing

- name: 'parse-invoice' # Must be unique within the flow
  type: appTool
  displayName: 'Parse Invoice'
  actionKey: 'minded-parse-documents'
  actionName: 'Parse Document'
  appName: 'Minded'
  parameters:
    documentSource: '{state.memory.invoiceUrl}'
    extractRaw: true # Return raw text without AI extraction

- name: 'parse-uploaded-document' # Must be unique within the flow
  type: appTool
  displayName: 'Parse Uploaded Document'
  actionKey: 'minded-parse-documents'
  actionName: 'Parse Document'
  appName: 'Minded'
  parameters:
    documentSource: '{state.memory.filePath}'
    systemPrompt: 'Extract invoice number, amount, date, and vendor'

- name: 'extract-names-addresses' # Must be unique within the flow
  type: appTool
  displayName: 'Extract Names and Addresses'
  actionKey: 'minded-parse-documents'
  actionName: 'Parse Document'
  appName: 'Minded'
  parameters:
    documentSource: '{state.memory.uploadedFile}'
    systemPrompt: 'Extract all names and addresses from this document'

Available parameters:

documentSource: URL or file path to the document (auto-detected)
extractRaw: Set to true to return raw text without AI processing
schema: Zod schema for structured extraction
systemPrompt: Instructions for AI-powered extraction
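How these parameters map onto the three extraction modes can be sketched as a small decision function. The precedence used here when several parameters are set together is an assumption, as is the fallback to raw text when none is set. ParseDocumentParams and extractionMode are hypothetical names.

```typescript
// Hypothetical mirror of the parameter list above.
interface ParseDocumentParams {
  documentSource: string;
  extractRaw?: boolean;
  schema?: unknown;
  systemPrompt?: string;
}

type ExtractionMode = 'raw' | 'structured' | 'unstructured';

// Assumed precedence: extractRaw wins, then schema, then systemPrompt.
function extractionMode(params: ParseDocumentParams): ExtractionMode {
  if (params.extractRaw) return 'raw';
  if (params.schema !== undefined) return 'structured';
  if (params.systemPrompt) return 'unstructured';
  // Assumption: with nothing to guide the AI, only raw text can be returned.
  return 'raw';
}
```

A schema may be combined with a systemPrompt, as the structured examples in this document do. The result is still structured extraction.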

Programmatic Usage

Structured Extraction

Note: Use nullable() instead of optional() for fields that are not required. A value that the document does not contain is then returned as null.

import { extractFromDocument } from '@minded-ai/mindedjs';
import { z } from 'zod';

const schema = z.object({
  name: z.string(),
  email: z.string(),
  phoneNumber: z.string().nullable(),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './id-card.jpg',
  schema,
  systemPrompt: 'Extract personal information from this ID document',
});

console.log(result.data.name); // Typed data matching your schema

Unstructured Extraction

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './contract.pdf',
  systemPrompt: 'Extract parties involved, key dates, payment terms, and termination clauses',
});

console.log(result.data); // String with extracted information

Raw Text Extraction

// Raw mode: no llm, schema, or systemPrompt is passed.
const result = await extractFromDocument({
  documentPath: './document.pdf',
  config: {
    // Needed in local mode only
    llamaCloudApiKey: process.env.LLAMA_CLOUD_API_KEY,
  },
});

console.log(result.data); // Raw text content

Using URLs

const result = await extractFromDocument({
  llm: agent.llm,
  documentUrl: 'https://example.com/invoice.pdf',
  schema: z.object({
    invoiceNumber: z.string(),
    amount: z.number(),
    dueDate: z.string(),
  }),
});
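When the source may be either a URL or a local path, a small helper can pick between documentUrl and documentPath. This is a sketch: toDocumentInput is a hypothetical helper, and treating only http(s) prefixes as URLs is an assumption about what the caller receives.

```typescript
type DocumentInput = { documentUrl: string } | { documentPath: string };

// Hypothetical helper: anything starting with http:// or https:// is treated
// as a URL, everything else as a file path.
function toDocumentInput(source: string): DocumentInput {
  return /^https?:\/\//i.test(source)
    ? { documentUrl: source }
    : { documentPath: source };
}
```

The result can be spread into the call, for example extractFromDocument({ llm: agent.llm, schema, ...toDocumentInput(source) }).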

Configuration

Environment Variables

# Processing mode: 'managed' (default) or 'local'
DOCUMENT_PROCESSING_MODE=managed

# Required for local mode only:
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key

Document Processor Options

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './image.jpg',
  schema: yourSchema,
  config: {
    llamaCloudApiKey: 'your-key',
    useBase64: true,
    maxImageWidth: 1600,
    imageQuality: 90,
  },
});

LLM Configuration

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './document.pdf',
  schema: yourSchema,
  llmConfig: {
    model: 'gpt-4o',
    temperature: 0.1,
  },
});

Example Use Cases

Invoice Processing

const invoiceSchema = z.object({
  invoiceNumber: z.string(),
  vendor: z.string(),
  amount: z.number(),
  dueDate: z.string(),
  lineItems: z.array(
    z.object({
      description: z.string(),
      quantity: z.number(),
      unitPrice: z.number(),
    }),
  ),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './invoice.pdf',
  schema: invoiceSchema,
  systemPrompt: 'Extract all invoice details including line items',
});
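Extracted numbers are worth a sanity check before they are used downstream. The sketch below compares the line-item total with the invoice amount. It assumes the amount is the plain sum of the line items with no tax, discount, or shipping. The interfaces mirror invoiceSchema by hand, and lineItemsMatchAmount is a hypothetical helper.

```typescript
// Hand-written mirror of invoiceSchema so the sketch has no dependencies.
interface LineItem {
  description: string;
  quantity: number;
  unitPrice: number;
}

interface Invoice {
  invoiceNumber: string;
  vendor: string;
  amount: number;
  dueDate: string;
  lineItems: LineItem[];
}

// Hypothetical check: true when the line items add up to the invoice amount,
// within a small tolerance for floating-point rounding.
function lineItemsMatchAmount(invoice: Invoice, tolerance = 0.005): boolean {
  const total = invoice.lineItems.reduce(
    (sum, item) => sum + item.quantity * item.unitPrice,
    0,
  );
  return Math.abs(total - invoice.amount) <= tolerance;
}
```

A false result does not prove that the extraction is wrong, since real invoices often include tax, but it is a cheap signal for manual review.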

Identity Document Verification

const idSchema = z.object({
  documentType: z.enum(['passport', 'driver_license', 'national_id']),
  fullName: z.string(),
  dateOfBirth: z.string(),
  documentNumber: z.string(),
  expiryDate: z.string().nullable(),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './id-document.jpg',
  schema: idSchema,
});
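After extraction, the nullable expiryDate can be checked against the current date. This is a minimal sketch that assumes the model returns the date in a format Date.parse understands, such as YYYY-MM-DD. The schema above does not enforce that, so unparseable values are reported as unknown. expiryStatus is a hypothetical helper.

```typescript
type ExpiryStatus = 'valid' | 'expired' | 'unknown';

// Hypothetical helper. `now` is a parameter so the check is deterministic in tests.
function expiryStatus(expiryDate: string | null, now: Date): ExpiryStatus {
  // null means the document carried no expiry date.
  if (expiryDate === null) return 'unknown';
  const expiresAt = Date.parse(expiryDate);
  // Date.parse returns NaN for strings it cannot interpret.
  if (Number.isNaN(expiresAt)) return 'unknown';
  // A date-only ISO string is parsed as midnight UTC at the start of that day.
  return expiresAt < now.getTime() ? 'expired' : 'valid';
}
```

Requesting a fixed date format in systemPrompt would make this check more reliable.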

Extract Names and Addresses

const contactsSchema = z.object({
  contacts: z.array(
    z.object({
      name: z.string(),
      address: z.string(),
    }),
  ),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './document.pdf',
  schema: contactsSchema,
  systemPrompt: 'Extract all names and addresses from this document',
});

Troubleshooting

PDF Processing Fails

Error: PDF processing requires LLAMA_CLOUD_API_KEY

Solution (local mode only): Set the LLAMA_CLOUD_API_KEY environment variable.

Image Too Large

Error: Image processing failed: Input image exceeds pixel limit

Solution: Reduce maxImageWidth in the config object passed to extractFromDocument.
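A starting value for maxImageWidth can be estimated from the image dimensions. The sketch below assumes that resizing keeps the aspect ratio and that the pixel limit is known to the caller. The actual limit is not stated here, so it is a parameter. maxWidthWithinPixelLimit is a hypothetical helper.

```typescript
// Hypothetical helper: largest width (in pixels) at which a proportionally
// resized image stays within pixelLimit total pixels.
function maxWidthWithinPixelLimit(width: number, height: number, pixelLimit: number): number {
  // At width w the height is w * (height / width), so the pixel count is
  // w^2 * height / width. Solving for w gives the square root below.
  const widest = Math.floor(Math.sqrt((pixelLimit * width) / height));
  // Never suggest upscaling beyond the original width.
  return Math.min(width, widest);
}
```

For a 4000x3000 image and a limit of 1,920,000 pixels this gives 1600, which would then be passed as config.maxImageWidth.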

Schema Validation Errors

Error: LLM extraction failed: Invalid schema

Solution: Verify that your Zod schema matches the expected data structure, and use nullable() rather than optional() for fields that are not required.