Document Processing
Parse and extract data from images, PDFs, Word documents, spreadsheets, and more using AI-powered document processing. This tool handles both single and multiple document parsing with optional data extraction.
Overview
Supported formats: Images (JPG, PNG, GIF, BMP, WebP, TIFF), Documents (PDF, DOC, DOCX, TXT, RTF, ODT), Spreadsheets (XLS, XLSX, CSV, ODS), Presentations (PPT, PPTX, ODP), Web formats (HTML, HTM, MD, XML)
Available extraction modes:
Structured Extraction with Schema: Extract data into a predefined Zod schema using AI
Structured Extraction with Prompt: Guide extraction using custom prompts
Raw Text Extraction: Parse document and extract plain text without AI processing
Multiple Document Processing: Process multiple documents at once; their content is concatenated before extraction
Processing Modes
Document processing supports two modes: managed (the default) and local. In managed mode, documents are uploaded to the Minded cloud and processed there, which keeps API keys stored securely, tracks costs automatically, and requires no SDK configuration. In local mode, documents are processed directly from your SDK using LlamaCloud.
To use local mode, set the DOCUMENT_PROCESSING_MODE environment variable to local and provide a LLAMA_CLOUD_API_KEY:
DOCUMENT_PROCESSING_MODE=local
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
Using in Flows
Document processing includes built-in AI extraction: use the node's prompt and outputSchema properties to specify what data to extract. No additional extraction tool is needed.
Available properties:
parameters.documentSource (string or array, required): URL or file path to a single document, or an array of URLs/file paths to process multiple documents. When an array is provided, the documents are parsed and their contents concatenated with double newlines.
parameters.returnStructuredOutput (boolean, optional, default: false): Set to true to enable AI-powered extraction, or false for raw text only. When true, either prompt or outputSchema (or both) is required.
prompt (string, optional): Instructions for AI-powered extraction. Ignored when returnStructuredOutput is false.
outputSchema (schema object, optional): Defines the structure of the extracted data for structured extraction. Ignored when returnStructuredOutput is false.
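The property rules above can be checked before a flow runs. The sketch below is a hypothetical pre-flight check, not part of the SDK: the `DocumentNodeConfig` shape is assumed from the documented property names only (the real flow definition format is not shown here), and `outputSchema` is typed as a generic placeholder object. It reports an error when structured output is requested without a prompt or schema, and a warning when prompt/outputSchema would be ignored.

```typescript
// Hypothetical shape of a document-processing node, using only the property
// names documented above; the real flow definition format may differ.
interface DocumentNodeConfig {
  parameters: {
    documentSource: string | string[];
    returnStructuredOutput?: boolean; // documented default: false
  };
  prompt?: string;
  outputSchema?: Record<string, unknown>; // placeholder for the schema object
}

interface NodeCheck {
  errors: string[];
  warnings: string[];
}

// Applies the two documented rules: structured output needs a prompt or a
// schema, and prompt/outputSchema are ignored when it is turned off.
function checkDocumentNode(node: DocumentNodeConfig): NodeCheck {
  const errors: string[] = [];
  const warnings: string[] = [];
  const structured = node.parameters.returnStructuredOutput ?? false;
  const hasPrompt = node.prompt !== undefined;
  const hasSchema = node.outputSchema !== undefined;

  if (structured && !hasPrompt && !hasSchema) {
    errors.push("returnStructuredOutput is true but neither prompt nor outputSchema is set");
  }
  if (!structured && (hasPrompt || hasSchema)) {
    warnings.push("prompt/outputSchema will be ignored because returnStructuredOutput is false");
  }
  return { errors, warnings };
}
```

For example, a node with `returnStructuredOutput: true` and no prompt or schema yields one error, while a node that sets only `prompt` (leaving the default `false`) yields a warning instead.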
Examples
Raw text extraction without AI processing:
Structured extraction with schema and prompt:
Unstructured extraction guided by prompt only:
Processing multiple documents with structured extraction:
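As a rough illustration of the four cases listed above, here they are written as plain TypeScript objects. This is a sketch under assumptions: the object shape mirrors the documented property names but is not the actual flow definition syntax, the file paths and URL are made up, and the `outputSchema` values are placeholder JSON-Schema-style objects because the real schema format is not specified here. The hypothetical `describeNode` helper reports how many documents each node reads and which kind of output it produces.

```typescript
// Hypothetical node shape using the documented property names.
type DocumentNode = {
  parameters: { documentSource: string | string[]; returnStructuredOutput?: boolean };
  prompt?: string;
  outputSchema?: Record<string, unknown>; // placeholder schema object
};

const invoiceSchema = {
  type: "object",
  properties: {
    invoiceNumber: { type: "string" },
    total: { type: "number" },
    dueDate: { type: ["string", "null"] },
  },
};

const customerSchema = {
  type: "object",
  properties: { fullName: { type: "string" }, address: { type: "string" } },
};

const examples: Record<string, DocumentNode> = {
  // Raw text: returnStructuredOutput omitted, so it defaults to false.
  rawText: { parameters: { documentSource: "https://example.com/contract.pdf" } },
  // Structured extraction guided by both a schema and a prompt.
  schemaAndPrompt: {
    parameters: { documentSource: "./invoices/march.pdf", returnStructuredOutput: true },
    prompt: "Extract the invoice number, total amount and due date.",
    outputSchema: invoiceSchema,
  },
  // Unstructured extraction: prompt only, no schema.
  promptOnly: {
    parameters: { documentSource: "./letters/complaint.docx", returnStructuredOutput: true },
    prompt: "Summarize the customer's main complaint in one sentence.",
  },
  // Multiple documents: contents are concatenated, then extracted together.
  multipleDocuments: {
    parameters: {
      documentSource: ["./kyc/passport.jpg", "./kyc/utility-bill.pdf"],
      returnStructuredOutput: true,
    },
    outputSchema: customerSchema,
  },
};

function describeNode(node: DocumentNode): string {
  const source = node.parameters.documentSource;
  const count = Array.isArray(source) ? source.length : 1;
  let output: string;
  if (!(node.parameters.returnStructuredOutput ?? false)) {
    output = "raw text";
  } else if (node.outputSchema !== undefined) {
    output = "structured (schema)";
  } else {
    output = "unstructured (prompt)";
  }
  return `${count} document(s) -> ${output}`;
}

for (const [name, node] of Object.entries(examples)) {
  console.log(`${name}: ${describeNode(node)}`);
}
```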
Programmatic Usage
The SDK provides three main functions for document processing:
parseDocumentAndExtractStructuredData: Parse one or more documents and optionally extract structured data with AI
parseDocument: Parse a document and extract raw text only
extractStructuredDataFromString: Extract structured data from already parsed text
Structured Extraction with Schema
Extract data matching a predefined Zod schema.
Note: Use nullable() instead of optional() for fields that are not required.
Structured Extraction with Prompt Only
Extract data using AI with a custom prompt, without a predefined schema.
Raw Text Extraction
Parse document without AI processing to get plain text only.
Using URLs
The parsing functions accept URLs in addition to file paths via the documentSource and documentSources parameters.
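This section does not say how a source string is recognised as a URL rather than a file path. If your own code needs the same distinction (for example, to check that local files exist before handing a mixed list to the SDK), a minimal sketch is below. `classifySource` is hypothetical, and it assumes that only `http:` and `https:` URLs count as remote; everything else, including relative paths that `new URL` rejects, is treated as a file.

```typescript
type SourceKind = "url" | "file";

// Hypothetical helper, not part of the SDK.
function classifySource(source: string): SourceKind {
  try {
    const parsed = new URL(source);
    // Only web URLs are treated as remote in this sketch.
    return parsed.protocol === "http:" || parsed.protocol === "https:" ? "url" : "file";
  } catch {
    // new URL() throws on strings without a scheme, such as "./scan.png".
    return "file";
  }
}

const sources = ["https://example.com/report.pdf", "./scans/receipt.png"];
console.log(sources.map(classifySource)); // [ 'url', 'file' ]
```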
Processing Multiple Documents
Process multiple documents by providing an array of URLs or file paths. Documents are parsed in parallel and their content is concatenated with double newlines before optional structured extraction.
You can also extract raw text from multiple documents without structured extraction:
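The multi-document behaviour described above (parse every source in parallel, join the texts with double newlines, then optionally run extraction) can be sketched as follows. `parseOne` and `extract` are hypothetical stand-ins for the SDK's parsing and AI extraction, so the sketch runs without any service. Two choices here are assumptions rather than documented SDK behaviour: results keep the input order (which `Promise.all` guarantees), and a single failed parse rejects the whole batch.

```typescript
type Parser = (source: string) => Promise<string>;
type Extractor<T> = (text: string) => Promise<T>;

// Parse all sources concurrently and join their text with a blank line.
async function parseMany(sources: string[], parseOne: Parser): Promise<string> {
  // Promise.all starts every parse at once and returns results in input order.
  const texts = await Promise.all(sources.map((source) => parseOne(source)));
  return texts.join("\n\n");
}

// Optional extraction runs once, on the combined text.
async function parseManyAndExtract<T>(
  sources: string[],
  parseOne: Parser,
  extract?: Extractor<T>,
): Promise<string | T> {
  const combined = await parseMany(sources, parseOne);
  return extract ? await extract(combined) : combined;
}

// Usage with a fake parser that just labels each source.
const fakeParse: Parser = async (source) => `text of ${source}`;
parseMany(["a.pdf", "b.pdf"], fakeParse).then((text) => {
  console.log(JSON.stringify(text)); // "text of a.pdf\n\ntext of b.pdf"
});
```

Because extraction sees one combined string, a schema for multiple documents should describe the data you want from the set as a whole, not from each document separately.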
Identity Document Verification
Extract personal information from ID documents with structured validation.
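Whatever schema you use for ID documents, it can help to re-check the extracted values before acting on them. The sketch below is hypothetical: the field names, the YYYY-MM-DD date format and the checks are illustrative choices (this section does not define an ID schema), and missing values are modelled as `null`, in line with the nullable() note above. The current date is passed in so the check is deterministic.

```typescript
// Hypothetical extracted shape; missing fields are null rather than omitted.
interface IdDocumentData {
  fullName: string | null;
  documentNumber: string | null;
  dateOfBirth: string | null; // expected as YYYY-MM-DD
  expiryDate: string | null; // expected as YYYY-MM-DD
}

const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Returns a list of problems; an empty list means the checks passed.
function verifyIdData(data: IdDocumentData, today: string): string[] {
  const issues: string[] = [];
  if (!data.fullName) issues.push("missing full name");
  if (!data.documentNumber) issues.push("missing document number");

  const dates = [["dateOfBirth", data.dateOfBirth], ["expiryDate", data.expiryDate]] as const;
  for (const [field, value] of dates) {
    if (value === null || !ISO_DATE.test(value)) {
      issues.push(`${field} is missing or not YYYY-MM-DD`);
    }
  }

  // ISO dates of equal length compare correctly as plain strings.
  if (data.expiryDate !== null && ISO_DATE.test(data.expiryDate) && data.expiryDate < today) {
    issues.push("document is expired");
  }
  return issues;
}
```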
Extract from Already Parsed Content
Use extractStructuredDataFromString to extract structured data from text you've already obtained.
Local Processing Mode
By default, document processing uses the Minded cloud service. To process documents locally using LlamaCloud, set the DOCUMENT_PROCESSING_MODE environment variable and provide a LlamaCloud API key:
You can also pass the API key and processing mode directly to the function:
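The documented facts are that managed mode is the default, and that local mode is selected with DOCUMENT_PROCESSING_MODE=local and needs a LlamaCloud API key. The sketch below shows one way to resolve that configuration in your own code. `resolveProcessingMode`, `ModeOptions`, `mode` and `llamaCloudApiKey` are hypothetical names, not the SDK's parameters, and it assumes values passed directly take precedence over environment variables. The environment is passed in as a plain object (in Node you would pass `process.env`).

```typescript
type ProcessingMode = "managed" | "local";

// Hypothetical option names; check the SDK for the real parameter names.
interface ModeOptions {
  mode?: ProcessingMode;
  llamaCloudApiKey?: string;
}

function resolveProcessingMode(
  options: ModeOptions,
  env: Record<string, string | undefined>,
): { mode: ProcessingMode; apiKey?: string } {
  // Explicit option first (assumed precedence), then the env var, then the default.
  const mode: ProcessingMode =
    options.mode ?? (env.DOCUMENT_PROCESSING_MODE === "local" ? "local" : "managed");
  if (mode === "managed") return { mode };

  // Local mode cannot run without a LlamaCloud key.
  const apiKey = options.llamaCloudApiKey ?? env.LLAMA_CLOUD_API_KEY;
  if (!apiKey) {
    throw new Error("Local mode requires a LlamaCloud API key (LLAMA_CLOUD_API_KEY)");
  }
  return { mode, apiKey };
}
```

Failing fast when the key is missing surfaces a misconfiguration at startup rather than on the first document.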
Supported File Types
| Category | Extensions | Processing notes |
|---|---|---|
| Images | .jpg, .jpeg, .png, .gif, .bmp, .webp, .tiff | Direct processing |
| Documents | .pdf, .doc, .docx, .txt, .rtf, .odt | Requires LlamaCloud for advanced formats |
| Spreadsheets | .xls, .xlsx, .csv, .ods | Requires LlamaCloud for binary formats |
| Presentations | .ppt, .pptx, .odp | Requires LlamaCloud |
| Web | .html, .htm, .md, .xml | Basic text extraction |
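To check a file against the supported types above before processing it, the same information can be held in a lookup table. The sketch is hypothetical: `lookupFileType` is not an SDK function, and it only reports what the supported-types list says. It expects a file name or path, so strip any URL query string first; extensions are matched case-insensitively.

```typescript
type Category = "Images" | "Documents" | "Spreadsheets" | "Presentations" | "Web";

interface FileTypeInfo {
  extensions: string[];
  notes: string;
}

// Transcribed from the supported file types listed above.
const FILE_TYPES: Record<Category, FileTypeInfo> = {
  Images: { extensions: [".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".tiff"], notes: "Direct processing" },
  Documents: { extensions: [".pdf", ".doc", ".docx", ".txt", ".rtf", ".odt"], notes: "Requires LlamaCloud for advanced formats" },
  Spreadsheets: { extensions: [".xls", ".xlsx", ".csv", ".ods"], notes: "Requires LlamaCloud for binary formats" },
  Presentations: { extensions: [".ppt", ".pptx", ".odp"], notes: "Requires LlamaCloud" },
  Web: { extensions: [".html", ".htm", ".md", ".xml"], notes: "Basic text extraction" },
};

function lookupFileType(fileName: string): { category: Category; notes: string } | undefined {
  // Use only the last path segment so dots in directory names are ignored.
  const base = fileName.split(/[\\/]/).pop() ?? fileName;
  const dot = base.lastIndexOf(".");
  if (dot === -1) return undefined;
  const ext = base.slice(dot).toLowerCase();

  for (const [category, info] of Object.entries(FILE_TYPES) as [Category, FileTypeInfo][]) {
    if (info.extensions.includes(ext)) return { category, notes: info.notes };
  }
  return undefined; // not in the supported list
}

console.log(lookupFileType("reports.v2/Q1.XLSX")); // { category: 'Spreadsheets', notes: 'Requires LlamaCloud for binary formats' }
```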