Document Processing
MindedJS provides powerful document processing capabilities that allow you to extract data from various document types including images, PDFs, Word documents, spreadsheets, and more.
Overview
The document processing system supports three modes of operation:
Structured Extraction with Schema: Extract data into a predefined structure using AI
Unstructured Extraction with Prompt: Extract information based on prompt instructions
Raw Text Extraction: Extract plain text without AI processing
Supported document types:
Images: JPG, PNG, GIF, BMP, WebP, TIFF
Documents: PDF, DOC, DOCX, TXT, RTF, ODT
Spreadsheets: XLS, XLSX, CSV, ODS
Presentations: PPT, PPTX, ODP
Web formats: HTML, HTM, MD, XML
Library Tool Integration
The document processing functionality is available as a library tool called minded-parse-documents that can be added to your flows through the Minded platform. As a library tool, it provides several configuration advantages:
System Prompt Configuration
The systemPrompt property can be configured directly in the Minded platform's flow editor when adding the parseDocument tool to your flow. This allows you to:
Set extraction instructions without code changes
Customize prompts for different use cases within the same agent
Update extraction behavior through the UI without redeployment
When a system prompt is configured in the platform:
The platform-configured prompt takes precedence and guides the AI extraction
The prompt is automatically applied to all document processing operations in that flow node
When no system prompt is set:
The tool falls back to using any Zod schema provided for structured extraction
If no schema is provided, the tool performs basic text extraction with minimal AI guidance
Raw text extraction mode bypasses AI processing entirely regardless of prompt configuration
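The precedence rules above can be read as a small decision function. The following is a minimal sketch of that reading, not the tool's actual implementation: the names ExtractionMode, ToolSettings, and resolveMode are hypothetical and are not part of the MindedJS API.

```typescript
// Hypothetical types and names, used only to illustrate the precedence rules.
type ExtractionMode = 'raw' | 'prompt-guided' | 'schema-only' | 'basic-text';

interface ToolSettings {
  extractRaw?: boolean; // raw text extraction requested
  systemPrompt?: string; // prompt configured in the platform, if any
  hasSchema?: boolean; // whether a Zod schema was provided
}

function resolveMode(settings: ToolSettings): ExtractionMode {
  // Raw extraction bypasses AI processing regardless of prompt configuration.
  if (settings.extractRaw) return 'raw';
  // A configured (non-empty) prompt takes precedence and guides the extraction.
  if (settings.systemPrompt !== undefined && settings.systemPrompt.trim() !== '') {
    return 'prompt-guided';
  }
  // Without a prompt, fall back to the schema when one is provided.
  if (settings.hasSchema) return 'schema-only';
  // Neither prompt nor schema: basic text extraction with minimal AI guidance.
  return 'basic-text';
}
```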
Parameter Configuration
Other tool parameters that can be pre-configured in the platform include:
loadFrom: Source type (url, path, buffer, string)
extractRaw: Whether to extract raw text without AI processing
schema: Zod schema for structured data extraction
Parameters not set in the platform can still be provided by the LLM during execution, allowing for flexible hybrid configuration.
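One way to picture this hybrid configuration is as a merge of two partial parameter sets. The sketch below assumes that a value pre-configured in the platform wins over a value supplied by the LLM; the ParseParams type and the mergeParams function are hypothetical names for illustration, not part of the MindedJS API.

```typescript
// Hypothetical parameter shape, based on the parameters listed above.
type ParseParams = {
  loadFrom?: 'url' | 'path' | 'buffer' | 'string';
  extractRaw?: boolean;
  systemPrompt?: string;
};

// Assumption: platform-configured values take priority, and the LLM
// only fills in parameters the platform left unset.
function mergeParams(platform: ParseParams, fromLlm: ParseParams): ParseParams {
  return {
    loadFrom: platform.loadFrom ?? fromLlm.loadFrom,
    extractRaw: platform.extractRaw ?? fromLlm.extractRaw,
    systemPrompt: platform.systemPrompt ?? fromLlm.systemPrompt,
  };
}
```

For example, if the platform sets only loadFrom to 'url' and the LLM supplies extractRaw, the merged configuration carries both values.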
Quick Start
Structured Data Extraction (with Schema)
import { extractFromDocument } from '@minded-ai/mindedjs';
import { z } from 'zod';
// Define the data structure you want to extract
const schema = z.object({
  name: z.string(),
  email: z.string(),
  phoneNumber: z.string().optional(),
});
// Extract structured data using the agent's LLM
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './path/to/id-card.jpg',
  schema,
  systemPrompt: 'Extract personal information from this ID document',
});
console.log(result.data.name); // Typed data matching your schema
console.log(result.metadata.processingTime); // Processing time in ms
Unstructured Extraction (Prompt Only)
// Extract information without a predefined schema
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './contract.pdf',
  systemPrompt: `
    Extract and summarize:
    1. All parties involved
    2. Key dates and deadlines
    3. Payment terms
    4. Termination clauses
  `,
});
console.log(result.data); // String containing the extracted information
Raw Text Extraction (No AI)
// Extract raw text without AI processing
const result = await extractFromDocument({
  documentPath: './document.pdf',
  config: {
    llamaCloudApiKey: process.env.LLAMA_CLOUD_API_KEY,
  },
});
console.log(result.data); // Raw text content from the document
Using Document URLs
const result = await extractFromDocument({
  llm: agent.llm,
  documentUrl: 'https://example.com/invoice.pdf',
  schema: z.object({
    invoiceNumber: z.string(),
    amount: z.number(),
    dueDate: z.string(),
  }),
});
Configuration
Environment Variables
Set the following environment variable to enable advanced document parsing:
# Required for advanced document parsing (PDFs, Word docs, etc.)
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
Document Processor Configuration
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './image.jpg',
  schema: yourSchema,
  config: {
    llamaCloudApiKey: 'your-key', // Override env variable
    useBase64: true, // Return images as base64
    maxImageWidth: 1600, // Max image width for processing
    imageQuality: 90, // JPEG quality (0-100)
  },
});
LLM Configuration
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './document.pdf',
  schema: yourSchema,
  llmConfig: {
    model: 'gpt-4o', // Override default model
    temperature: 0.1, // Lower temperature for consistent extraction
  },
});
Advanced Usage
Custom System Prompts
Tailor the AI's extraction behavior with custom prompts:
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './medical-record.pdf',
  schema: z.object({
    patientName: z.string(),
    diagnosis: z.string(),
    medications: z.array(z.string()),
  }),
  systemPrompt: `
    You are a medical data extraction specialist.
    Extract information with high accuracy and attention to medical terminology.
    If any required field is unclear or missing, mark it as "Not clearly specified".
  `,
});
Handling Multiple Document Sources
const processDocuments = async (documents: string[]) => {
  const results = await Promise.all(
    documents.map((path) =>
      extractFromDocument({
        llm: agent.llm,
        documentPath: path,
        schema: invoiceSchema,
      }),
    ),
  );
  return results.map((r) => r.data);
};
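Note that Promise.all rejects as soon as any single document fails, so one bad file discards the results of the whole batch. If you would rather keep per-document outcomes, a small generic helper along these lines may help. This is a sketch under stated assumptions: Outcome and settleAll are hypothetical names, not MindedJS exports, and each task would wrap one extractFromDocument call.

```typescript
// Hypothetical helper: records success or failure for each task
// instead of rejecting the whole batch on the first error.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: unknown };

async function settleAll<T>(tasks: Array<() => Promise<T>>): Promise<Outcome<T>[]> {
  return Promise.all(
    tasks.map(async (task): Promise<Outcome<T>> => {
      try {
        return { ok: true, value: await task() };
      } catch (error) {
        // Capture the failure so the other documents can still complete.
        return { ok: false, error };
      }
    }),
  );
}

// Usage idea (not executed here): wrap each extraction in a task, e.g.
// settleAll(paths.map((p) => () => extractFromDocument({ llm, documentPath: p, schema })))
```

Outcomes are returned in the same order as the input tasks, so each result can be matched back to its document path.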
Error Handling
try {
  const result = await extractFromDocument({
    llm: agent.llm,
    documentPath: './document.pdf',
    schema: yourSchema,
  });
  logger.info({ message: 'Extracted', data: result.data });
} catch (err) {
  // In strict TypeScript `err` is typed `unknown`, so normalize it before reading the message
  const errorMessage = err instanceof Error ? err.message : String(err);
  if (errorMessage.includes('LLAMA_CLOUD_API_KEY')) {
    logger.error({ message: 'LlamaCloud API key required for PDF processing', err });
  } else if (errorMessage.includes('Document not found')) {
    logger.error({ message: 'File does not exist', err });
  } else {
    logger.error({ message: 'Processing failed', err });
  }
}
Example Use Cases
Invoice Processing
const invoiceSchema = z.object({
  invoiceNumber: z.string(),
  vendor: z.string(),
  amount: z.number(),
  dueDate: z.string(),
  lineItems: z.array(
    z.object({
      description: z.string(),
      quantity: z.number(),
      unitPrice: z.number(),
    }),
  ),
});
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './invoice.pdf',
  schema: invoiceSchema,
  systemPrompt: 'Extract all invoice details including line items',
});
Identity Document Verification
const idSchema = z.object({
  documentType: z.enum(['passport', 'driver_license', 'national_id']),
  fullName: z.string(),
  dateOfBirth: z.string(),
  documentNumber: z.string(),
  expiryDate: z.string().optional(),
  issuingCountry: z.string().optional(),
});
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './id-document.jpg',
  schema: idSchema,
});
Contract Analysis
const contractSchema = z.object({
  parties: z.array(z.string()),
  effectiveDate: z.string(),
  expiryDate: z.string().optional(),
  keyTerms: z.array(z.string()),
  terminationClauses: z.array(z.string()),
  governingLaw: z.string().optional(),
});
const result = await extractFromDocument({
  llm: agent.llm,
  documentPath: './contract.docx',
  schema: contractSchema,
  systemPrompt: 'Focus on legal terms, dates, and parties involved',
});
Tool Implementation
Using the Library Tool
The easiest way to add document processing to your flows is using the minded-parse-documents library tool available in the Minded platform. Simply add it to your flow through the platform UI and configure the parameters as needed.
Standalone Usage
You can also use the document processor independently of the agent:
import { DocumentProcessor, extractFromDocument } from '@minded-ai/mindedjs';
// Using the class directly
const processor = new DocumentProcessor({
  llamaCloudApiKey: 'your-key',
  maxImageWidth: 1200,
});
const result = await processor.extractFromDocument({
  documentPath: './document.pdf',
  schema: yourSchema,
});
// Using the convenience function
const result2 = await extractFromDocument({
  documentPath: './document.pdf',
  schema: yourSchema,
  config: {
    llamaCloudApiKey: 'your-key',
  },
});
Troubleshooting
Common Issues
PDF Processing Fails
Error: PDF processing requires LLAMA_CLOUD_API_KEY
Solution: Set the LLAMA_CLOUD_API_KEY environment variable.
Image Too Large
Error: Image processing failed: Input image exceeds pixel limit
Solution: Reduce maxImageWidth in configuration or resize images beforehand.
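If you resize images yourself before processing, you will typically want to preserve the aspect ratio. The sketch below only computes target dimensions for a given width cap; fitToMaxWidth is a hypothetical helper, and it makes no claim about how MindedJS applies maxImageWidth internally. The actual resizing would be done with an image library of your choice.

```typescript
// Hypothetical helper: compute dimensions that fit within maxWidth
// while keeping the original aspect ratio. Never upscales.
function fitToMaxWidth(
  width: number,
  height: number,
  maxWidth: number,
): { width: number; height: number } {
  if (width <= maxWidth) {
    // Already small enough, keep the original size.
    return { width, height };
  }
  const scale = maxWidth / width;
  return { width: maxWidth, height: Math.round(height * scale) };
}
```

For a 3200x2400 image and a cap of 1600, this yields 1600x1200.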
Schema Validation Errors
Error: LLM extraction failed: Invalid schema
Solution: Check that your Zod schema matches the expected data structure.
Network Timeouts
Error: Failed to fetch document from URL: timeout
Solution: Implement retry logic or download files locally first.
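The retry logic mentioned above could look something like the following. This is a minimal sketch with exponential backoff: withRetry is a hypothetical helper, not a MindedJS export, and the attempt count and delays are arbitrary defaults you should tune for your environment.

```typescript
// Hypothetical helper: retries an async operation with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
        const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError;
}

// Usage idea (not executed here):
// const result = await withRetry(() => extractFromDocument({ llm, documentUrl, schema }));
```

This version retries on every error. In practice you may want to retry only on timeouts or network failures, and fail fast on errors such as a missing API key.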
Debugging
Enable debug logging to troubleshoot issues:
// Set LOG_LEVEL environment variable
process.env.LOG_LEVEL = 'debug';
// Or use the logger directly
import { logger } from '@minded-ai/mindedjs';
logger.debug('Processing document:', { path: documentPath });
Supported File Types
Images: .jpg, .jpeg, .png, .gif, .bmp, .webp, .tiff (direct processing)
Documents: .pdf, .doc, .docx, .txt, .rtf, .odt (requires LlamaCloud for advanced formats)
Spreadsheets: .xls, .xlsx, .csv, .ods (requires LlamaCloud for binary formats)
Presentations: .ppt, .pptx, .odp (requires LlamaCloud)
Web: .html, .htm, .md, .xml (basic text extraction)
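If you need to check a file against the list above before submitting it, a simple extension lookup is usually enough. The sketch below mirrors the categories listed here; the FileCategory type, SUPPORTED_EXTENSIONS map, and categorize function are hypothetical names, and the library's own detection logic may differ.

```typescript
// Hypothetical lookup table mirroring the supported file types listed above.
type FileCategory = 'image' | 'document' | 'spreadsheet' | 'presentation' | 'web';

const SUPPORTED_EXTENSIONS: Record<FileCategory, string[]> = {
  image: ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.tiff'],
  document: ['.pdf', '.doc', '.docx', '.txt', '.rtf', '.odt'],
  spreadsheet: ['.xls', '.xlsx', '.csv', '.ods'],
  presentation: ['.ppt', '.pptx', '.odp'],
  web: ['.html', '.htm', '.md', '.xml'],
};

// Returns the category for a file name, or undefined if the extension is not listed.
function categorize(fileName: string): FileCategory | undefined {
  const dot = fileName.lastIndexOf('.');
  if (dot === -1) return undefined;
  const ext = fileName.slice(dot).toLowerCase(); // match case-insensitively
  for (const category of Object.keys(SUPPORTED_EXTENSIONS) as FileCategory[]) {
    if (SUPPORTED_EXTENSIONS[category].indexOf(ext) !== -1) return category;
  }
  return undefined;
}
```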