Document Processing

MindedJS provides document processing capabilities that let you extract data from many document types, including images, PDFs, Word documents, spreadsheets, and presentations.

Overview

The document processing system supports three modes of operation:

  1. Structured Extraction with Schema: Extract data into a predefined structure using AI

  2. Unstructured Extraction with Prompt: Extract information based on prompt instructions

  3. Raw Text Extraction: Extract plain text without AI processing

Supported document types:

  • Images: JPG, PNG, GIF, BMP, WebP, TIFF

  • Documents: PDF, DOC, DOCX, TXT, RTF, ODT

  • Spreadsheets: XLS, XLSX, CSV, ODS

  • Presentations: PPT, PPTX, ODP

  • Web formats: HTML, HTM, MD, XML

Library Tool Integration

The document processing functionality is available as a library tool called minded-parse-documents that can be added to your flows through the Minded platform. As a library tool, it provides several configuration advantages:

System Prompt Configuration

The systemPrompt property can be configured directly in the Minded platform's flow editor when adding the minded-parse-documents tool to your flow. This allows you to:

  • Set extraction instructions without code changes

  • Customize prompts for different use cases within the same agent

  • Update extraction behavior through the UI without redeployment

When a system prompt is configured in the platform:

  • The platform-configured prompt takes precedence and guides the AI extraction

  • The prompt is automatically applied to all document processing operations in that flow node

When no system prompt is set:

  • The tool falls back to using any Zod schema provided for structured extraction

  • If no schema is provided, the tool performs basic text extraction with minimal AI guidance

  • Raw text extraction mode bypasses AI processing entirely regardless of prompt configuration
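
The precedence rules above can be read as a small decision function. The following is a minimal sketch of that ordering, not the library's implementation: the `ExtractionMode` type, the `NodeConfig` shape, and the `resolveMode` function are hypothetical names introduced only for illustration. It also assumes that a schema combined with a prompt still yields structured output, as in the Quick Start example.

```typescript
// Hypothetical names: none of these types or functions are part of MindedJS.
type ExtractionMode = 'raw' | 'structured' | 'prompt' | 'basic';

interface NodeConfig {
  extractRaw?: boolean;    // raw text extraction requested
  platformPrompt?: string; // system prompt set in the flow editor, if any
  hasSchema?: boolean;     // whether a Zod schema was provided
}

function resolveMode(config: NodeConfig): ExtractionMode {
  // Raw extraction bypasses AI processing regardless of prompt configuration.
  if (config.extractRaw) return 'raw';

  // Assumption: a schema produces structured output, guided by the prompt when one exists.
  if (config.hasSchema) return 'structured';

  // A prompt without a schema gives prompt-guided, unstructured extraction.
  const hasPrompt = (config.platformPrompt ?? '').trim().length > 0;
  if (hasPrompt) return 'prompt';

  // Neither prompt nor schema: basic text extraction with minimal AI guidance.
  return 'basic';
}
```

Writing the checks in this order makes the "regardless of prompt configuration" rule for raw extraction explicit: it is tested first, so nothing later can override it.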

Parameter Configuration

Other tool parameters that can be pre-configured in the platform include:

  • loadFrom: Source type (url, path, buffer, string)

  • extractRaw: Whether to extract raw text without AI processing

  • schema: Zod schema for structured data extraction

Parameters not set in the platform can still be provided by the LLM during execution, allowing for flexible hybrid configuration.
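
One way to picture this hybrid configuration is as a merge of two partial parameter sets. The sketch below is an assumption about the behavior described above, not the platform's code: `ToolParams` and `mergeParams` are hypothetical names, and the rule that a platform-configured value wins over an LLM-provided one is inferred from the sentence above rather than confirmed.

```typescript
// Hypothetical shape covering a few of the parameters listed above.
interface ToolParams {
  loadFrom?: 'url' | 'path' | 'buffer' | 'string';
  extractRaw?: boolean;
  systemPrompt?: string;
}

// Assumption: values configured in the platform take priority;
// the LLM only fills in parameters the platform left unset.
function mergeParams(platform: ToolParams, fromLlm: ToolParams): ToolParams {
  return {
    loadFrom: platform.loadFrom ?? fromLlm.loadFrom,
    extractRaw: platform.extractRaw ?? fromLlm.extractRaw,
    systemPrompt: platform.systemPrompt ?? fromLlm.systemPrompt,
  };
}

// The platform fixes the source type and disables raw extraction;
// the LLM supplies the prompt at execution time.
const effective = mergeParams(
  { loadFrom: 'url', extractRaw: false },
  { loadFrom: 'path', extractRaw: true, systemPrompt: 'Summarize the invoice' },
);
```

The nullish coalescing operator matters here: an explicit `extractRaw: false` set in the platform is preserved, whereas `||` would let the LLM-provided `true` override it.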

Quick Start

Structured Data Extraction (with Schema)

import { extractFromDocument } from '@minded-ai/mindedjs';
import { z } from 'zod';

// Define the data structure you want to extract
const schema = z.object({
  name: z.string(),
  email: z.string(),
  phoneNumber: z.string().optional(),
});

// Extract structured data using the agent's LLM
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './path/to/id-card.jpg',
  schema,
  systemPrompt: 'Extract personal information from this ID document',
});

console.log(result.data.name); // Typed data matching your schema
console.log(result.metadata.processingTime); // Processing time in ms

Unstructured Extraction (Prompt Only)

// Extract information without a predefined schema
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './contract.pdf',
  systemPrompt: `
    Extract and summarize:
    1. All parties involved
    2. Key dates and deadlines
    3. Payment terms
    4. Termination clauses
  `,
});

console.log(result.data); // String containing the extracted information

Raw Text Extraction (No AI)

// Extract raw text without AI processing
const result = await extractFromDocument({
  documentPath: './document.pdf',
  config: {
    llamaCloudApiKey: process.env.LLAMA_CLOUD_API_KEY,
  },
});

console.log(result.data); // Raw text content from the document

Using Document URLs

const result = await extractFromDocument({
  llm: agent.llm,
  documentUrl: 'https://example.com/invoice.pdf',
  schema: z.object({
    invoiceNumber: z.string(),
    amount: z.number(),
    dueDate: z.string(),
  }),
});

Configuration

Environment Variables

Set this environment variable to enable parsing of formats that rely on LlamaCloud:

# Required for advanced document parsing (PDFs, Word docs, etc.)
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key

Document Processor Configuration

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './image.jpg',
  schema: yourSchema,
  config: {
    llamaCloudApiKey: 'your-key', // Override env variable
    useBase64: true, // Return images as base64
    maxImageWidth: 1600, // Max image width for processing
    imageQuality: 90, // JPEG quality (0-100)
  },
});

LLM Configuration

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './document.pdf',
  schema: yourSchema,
  llmConfig: {
    model: 'gpt-4o', // Override default model
    temperature: 0.1, // Lower temperature for consistent extraction
  },
});

Advanced Usage

Custom System Prompts

Tailor the AI's extraction behavior with custom prompts:

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './medical-record.pdf',
  schema: z.object({
    patientName: z.string(),
    diagnosis: z.string(),
    medications: z.array(z.string()),
  }),
  systemPrompt: `
    You are a medical data extraction specialist.
    Extract information with high accuracy and attention to medical terminology.
    If any required field is unclear or missing, mark it as "Not clearly specified".
  `,
});

Handling Multiple Document Sources

const processDocuments = async (documents: string[]) => {
  const results = await Promise.all(
    documents.map((path) =>
      extractFromDocument({
        llm: agent.llm,
        documentPath: path,
        schema: invoiceSchema,
      }),
    ),
  );

  return results.map((r) => r.data);
};
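
Because Promise.all rejects as soon as any single promise rejects, one unreadable file in the example above discards the results for every other document. A minimal sketch of a more tolerant batch helper follows. `processAll`, `Extractor`, and `BatchResult` are hypothetical names, and the extraction call is passed in as a function so the sketch stays independent of the library.

```typescript
// Hypothetical helper types; `extract` stands in for a call such as
// (path) => extractFromDocument({ llm: agent.llm, documentPath: path, schema }).
type Extractor<T> = (documentPath: string) => Promise<T>;

interface BatchResult<T> {
  succeeded: { path: string; data: T }[];
  failed: { path: string; reason: string }[];
}

async function processAll<T>(paths: string[], extract: Extractor<T>): Promise<BatchResult<T>> {
  // Catch errors per document so one failure cannot reject the whole batch.
  const outcomes = await Promise.all(
    paths.map(async (path) => {
      try {
        return { ok: true as const, path, data: await extract(path) };
      } catch (err) {
        const reason = err instanceof Error ? err.message : String(err);
        return { ok: false as const, path, reason };
      }
    }),
  );

  // Split the outcomes while keeping the input order within each group.
  const result: BatchResult<T> = { succeeded: [], failed: [] };
  for (const outcome of outcomes) {
    if (outcome.ok) {
      result.succeeded.push({ path: outcome.path, data: outcome.data });
    } else {
      result.failed.push({ path: outcome.path, reason: outcome.reason });
    }
  }
  return result;
}
```

All documents are still processed concurrently. For large batches you may want to cap the number of requests in flight, which this sketch does not do.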

Error Handling

import { extractFromDocument, logger } from '@minded-ai/mindedjs';

try {
  const result = await extractFromDocument({
    llm: agent.llm,
    documentPath: './document.pdf',
    schema: yourSchema,
  });

  logger.info({ message: 'Extracted', data: result.data });
} catch (err) {
  // A caught value is typed `unknown` under strict TypeScript, so normalize it before reading the message
  const message = err instanceof Error ? err.message : String(err);

  if (message.includes('LLAMA_CLOUD_API_KEY')) {
    logger.error({ message: 'LlamaCloud API key required for PDF processing', err });
  } else if (message.includes('Document not found')) {
    logger.error({ message: 'File does not exist', err });
  } else {
    logger.error({ message: 'Processing failed', err });
  }
}
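
If several call sites need the same checks, the message matching can be pulled into one function. The sketch below is a hypothetical helper, not a MindedJS API: `ErrorKind` and `classifyExtractionError` are illustrative names, and the substrings are taken from the error messages shown on this page, whose exact wording may differ between library versions.

```typescript
// Hypothetical classification of the error messages documented on this page.
type ErrorKind =
  | 'missing-api-key'
  | 'not-found'
  | 'image-too-large'
  | 'invalid-schema'
  | 'network'
  | 'unknown';

function classifyExtractionError(err: unknown): ErrorKind {
  // Thrown values are not guaranteed to be Error instances.
  const message = err instanceof Error ? err.message : String(err);

  if (message.includes('LLAMA_CLOUD_API_KEY')) return 'missing-api-key';
  if (message.includes('Document not found')) return 'not-found';
  if (message.includes('exceeds pixel limit')) return 'image-too-large';
  if (message.includes('Invalid schema')) return 'invalid-schema';
  if (message.includes('Failed to fetch document from URL')) return 'network';
  return 'unknown';
}
```

Matching on message text is fragile, so treat the `unknown` branch as the normal fallback rather than an exceptional case.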

Example Use Cases

Invoice Processing

const invoiceSchema = z.object({
  invoiceNumber: z.string(),
  vendor: z.string(),
  amount: z.number(),
  dueDate: z.string(),
  lineItems: z.array(
    z.object({
      description: z.string(),
      quantity: z.number(),
      unitPrice: z.number(),
    }),
  ),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './invoice.pdf',
  schema: invoiceSchema,
  systemPrompt: 'Extract all invoice details including line items',
});

Identity Document Verification

const idSchema = z.object({
  documentType: z.enum(['passport', 'driver_license', 'national_id']),
  fullName: z.string(),
  dateOfBirth: z.string(),
  documentNumber: z.string(),
  expiryDate: z.string().optional(),
  issuingCountry: z.string().optional(),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './id-document.jpg',
  schema: idSchema,
});

Contract Analysis

const contractSchema = z.object({
  parties: z.array(z.string()),
  effectiveDate: z.string(),
  expiryDate: z.string().optional(),
  keyTerms: z.array(z.string()),
  terminationClauses: z.array(z.string()),
  governingLaw: z.string().optional(),
});

const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './contract.docx',
  schema: contractSchema,
  systemPrompt: 'Focus on legal terms, dates, and parties involved',
});

Tool Implementation

Using the Library Tool

The easiest way to add document processing to your flows is using the minded-parse-documents library tool available in the Minded platform. Simply add it to your flow through the platform UI and configure the parameters as needed.

Standalone Usage

You can also use the document processor independently of the agent:

import { DocumentProcessor, extractFromDocument } from '@minded-ai/mindedjs';

// Using the class directly
const processor = new DocumentProcessor({
  llamaCloudApiKey: 'your-key',
  maxImageWidth: 1200,
});

const result = await processor.extractFromDocument({
  documentPath: './document.pdf',
  schema: yourSchema,
});

// Using the convenience function
const result2 = await extractFromDocument({
  documentPath: './document.pdf',
  schema: yourSchema,
  config: {
    llamaCloudApiKey: 'your-key',
  },
});

Troubleshooting

Common Issues

PDF Processing Fails

Error: PDF processing requires LLAMA_CLOUD_API_KEY

Solution: Set the LLAMA_CLOUD_API_KEY environment variable.

Image Too Large

Error: Image processing failed: Input image exceeds pixel limit

Solution: Reduce maxImageWidth in configuration or resize images beforehand.
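
If you resize images yourself before processing, the target dimensions follow from simple proportional scaling. The sketch below shows only that arithmetic. `Size` and `fitToMaxWidth` are hypothetical names, and how the library applies maxImageWidth internally is not documented here.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scale an image down so its width does not exceed maxWidth,
// preserving the aspect ratio. Images that already fit are left unchanged.
function fitToMaxWidth(size: Size, maxWidth: number): Size {
  if (size.width <= maxWidth) {
    return { width: size.width, height: size.height };
  }
  return {
    width: maxWidth,
    height: Math.round((size.height * maxWidth) / size.width),
  };
}

// A 4000x3000 photo limited to a width of 1600 becomes 1600x1200.
const resized = fitToMaxWidth({ width: 4000, height: 3000 }, 1600);
```

Pass the resulting dimensions to whichever image tool you use for the actual resize.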

Schema Validation Errors

Error: LLM extraction failed: Invalid schema

Solution: Check that your Zod schema matches the structure of the data you expect from the document, and mark fields that may be absent as optional.

Network Timeouts

Error: Failed to fetch document from URL: timeout

Solution: Implement retry logic or download files locally first.
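
Retry logic for transient failures can be sketched as a generic wrapper with exponential backoff. `withRetry` and `backoffDelayMs` are hypothetical helpers, not part of MindedJS, and the attempt count and delays are illustrative defaults rather than recommended values.

```typescript
// Delay before the next attempt: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run an async operation, retrying on any thrown error up to maxAttempts times.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        const delay = backoffDelayMs(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}

// Usage (assuming `agent` and `schema` exist in your code):
// const result = await withRetry(() =>
//   extractFromDocument({ llm: agent.llm, documentUrl: 'https://example.com/invoice.pdf', schema }),
// );
```

This wrapper retries on every error. In practice you would likely retry only network failures and let schema or configuration errors fail immediately.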

Debugging

Enable debug logging to troubleshoot issues:

import { logger } from '@minded-ai/mindedjs';

// Set LOG_LEVEL environment variable
process.env.LOG_LEVEL = 'debug';

// Or use the logger directly
const documentPath = './document.pdf';
logger.debug('Processing document:', { path: documentPath });

Supported File Types

| Category      | Extensions                                   | Notes                                    |
| ------------- | -------------------------------------------- | ---------------------------------------- |
| Images        | .jpg, .jpeg, .png, .gif, .bmp, .webp, .tiff  | Direct processing                        |
| Documents     | .pdf, .doc, .docx, .txt, .rtf, .odt          | Requires LlamaCloud for advanced formats |
| Spreadsheets  | .xls, .xlsx, .csv, .ods                      | Requires LlamaCloud for binary formats   |
| Presentations | .ppt, .pptx, .odp                            | Requires LlamaCloud                      |
| Web           | .html, .htm, .md, .xml                       | Basic text extraction                    |
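
To validate uploads before calling the processor, the categories listed here can be encoded as a lookup. This is a sketch built from the extension list on this page. `Category`, `EXTENSIONS`, and `categorize` are hypothetical names, and the library may detect file types differently (for example by content rather than by extension).

```typescript
type Category = 'image' | 'document' | 'spreadsheet' | 'presentation' | 'web';

// Extensions copied from the supported file types listed on this page.
const EXTENSIONS: Record<Category, string[]> = {
  image: ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.tiff'],
  document: ['.pdf', '.doc', '.docx', '.txt', '.rtf', '.odt'],
  spreadsheet: ['.xls', '.xlsx', '.csv', '.ods'],
  presentation: ['.ppt', '.pptx', '.odp'],
  web: ['.html', '.htm', '.md', '.xml'],
};

// Return the category for a file name, or undefined if the extension is not listed.
function categorize(fileName: string): Category | undefined {
  const dot = fileName.lastIndexOf('.');
  if (dot < 0) return undefined;

  const ext = fileName.slice(dot).toLowerCase();
  for (const category of Object.keys(EXTENSIONS) as Category[]) {
    if (EXTENSIONS[category].indexOf(ext) !== -1) return category;
  }
  return undefined;
}
```

A check like this lets a flow reject unsupported files early, before any LLM or LlamaCloud call is made.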
