Document Processing

Parse and extract data from images, PDFs, Word documents, spreadsheets, and more using AI-powered document processing. This tool handles both single and multiple document parsing with optional data extraction.

Overview

Supported formats: Images (JPG, PNG, GIF, BMP, WebP, TIFF), Documents (PDF, DOC, DOCX, TXT, RTF, ODT), Spreadsheets (XLS, XLSX, CSV, ODS), Presentations (PPT, PPTX, ODP), Web formats (HTML, HTM, MD, XML)

Available extraction modes:

  1. Structured Extraction with Schema: Extract data into a predefined Zod schema using AI

  2. Structured Extraction with Prompt: Guide extraction using custom prompts

  3. Raw Text Extraction: Parse document and extract plain text without AI processing

  4. Multiple Document Processing: Process multiple documents at once - content is concatenated for extraction

Processing Modes

Document processing supports two modes: managed (default) and local. In managed mode, documents are uploaded to the Minded cloud and processed there, which provides secure API key storage and automatic cost tracking and requires no SDK configuration. In local mode, documents are processed directly in your SDK using LlamaCloud.

To use local mode, set the DOCUMENT_PROCESSING_MODE environment variable to local and provide a LLAMA_CLOUD_API_KEY:

DOCUMENT_PROCESSING_MODE=local
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key
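As a rough sketch of how these two variables interact, the helper below reads them from an environment map, falls back to managed mode, and rejects local mode when no key is present. The helper name, the returned shape, and the error messages are assumptions made for illustration; they are not part of the SDK.

```typescript
// Hypothetical helper, not an SDK function: resolves the processing mode
// from the two environment variables described above.
type ProcessingMode = "managed" | "local";

interface ProcessingConfig {
  mode: ProcessingMode;
  llamaCloudApiKey?: string;
}

function resolveProcessingConfig(
  env: Record<string, string | undefined>,
): ProcessingConfig {
  // Managed mode is the documented default when the variable is unset.
  const rawMode = env["DOCUMENT_PROCESSING_MODE"] ?? "managed";

  if (rawMode === "local") {
    const key = env["LLAMA_CLOUD_API_KEY"];
    if (!key) {
      throw new Error("LLAMA_CLOUD_API_KEY is required in local mode");
    }
    return { mode: "local", llamaCloudApiKey: key };
  }
  if (rawMode === "managed") {
    return { mode: "managed" };
  }
  throw new Error(`Unknown DOCUMENT_PROCESSING_MODE: ${rawMode}`);
}
```

In a Node.js process this could be called as `resolveProcessingConfig(process.env)`.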

Using in Flows

Document processing includes built-in AI extraction: use the node's prompt and outputSchema properties to specify what data to extract. No additional extraction tool is needed.

Available properties:

  • parameters.documentSource (string or array, required): URL or file path to a single document, or array of URLs/file paths to process multiple documents. When an array is provided, documents are parsed and concatenated with double newlines.

  • parameters.returnStructuredOutput (boolean, optional, default: false): Set to true to enable AI-powered extraction, or false for raw text only. When true, at least one of prompt or outputSchema is required.

  • prompt (string, optional): Instructions for AI-powered extraction. Ignored when returnStructuredOutput is false.

  • outputSchema (schema object, optional): Defines the structure of the extracted data. Ignored when returnStructuredOutput is false.
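The rules in the property list above can be sketched as a small consistency check. The interface and function names here are hypothetical, and the exact shape of outputSchema is left open because the source only calls it a schema object.

```typescript
// Hypothetical types mirroring the documented node properties.
interface DocumentProcessingNodeProps {
  parameters: {
    documentSource: string | string[];
    returnStructuredOutput?: boolean; // documented default: false
  };
  prompt?: string;
  outputSchema?: Record<string, unknown>; // concrete schema shape is an assumption
}

// Returns a list of problems; an empty list means the properties are consistent.
function checkNodeProps(props: DocumentProcessingNodeProps): string[] {
  const problems: string[] = [];
  const { documentSource, returnStructuredOutput = false } = props.parameters;

  // Normalize the single-document form to an array.
  const sources = Array.isArray(documentSource) ? documentSource : [documentSource];
  if (sources.length === 0 || sources.some((s) => s.trim() === "")) {
    problems.push("documentSource must contain at least one non-empty URL or path");
  }

  // Structured output needs a prompt, a schema, or both.
  if (returnStructuredOutput && !props.prompt && !props.outputSchema) {
    problems.push("returnStructuredOutput requires prompt, outputSchema, or both");
  }
  return problems;
}
```

A check like this is only a local sanity test before a flow is deployed; the platform's own validation may be stricter or differ in detail.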

Examples

Raw text extraction without AI processing:
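A minimal sketch of the documented properties for this case, written as a TypeScript object. The URL is a placeholder, and any other fields a flow node needs (such as its name or type) are omitted because the source does not list them.

```typescript
// Only the documented properties are shown; the URL is a placeholder.
const rawTextNode = {
  parameters: {
    documentSource: "https://example.com/contract.pdf",
    returnStructuredOutput: false, // raw text only, no AI extraction
  },
};
```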

Structured extraction with schema and prompt:
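A sketch combining both properties might look like the following. The JSON-Schema-like shape of outputSchema, the field names, and the URL are assumptions; the source says only that outputSchema is a schema object.

```typescript
// Only the documented properties are shown; values are placeholders.
const invoiceNode = {
  parameters: {
    documentSource: "https://example.com/invoice.pdf",
    returnStructuredOutput: true, // enables AI-powered extraction
  },
  prompt: "Extract the invoice number, issue date, and total amount.",
  // Assumed JSON-Schema-like shape; the format the platform expects may differ.
  outputSchema: {
    type: "object",
    properties: {
      invoiceNumber: { type: "string" },
      issueDate: { type: "string" },
      totalAmount: { type: "number" },
    },
    required: ["invoiceNumber", "issueDate", "totalAmount"],
  },
};
```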

Extraction guided by a prompt only, without a schema:
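With no outputSchema, the prompt alone steers what the AI returns. The sketch below shows only the documented properties; the URL and prompt text are placeholders.

```typescript
// Prompt-only extraction: returnStructuredOutput is true and no schema is given.
const summaryNode = {
  parameters: {
    documentSource: "https://example.com/report.pdf",
    returnStructuredOutput: true,
  },
  prompt: "Summarize the key findings and list any deadlines mentioned.",
};
```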

Processing multiple documents with structured extraction:
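For several documents, documentSource becomes an array, and extraction runs over the concatenated content. The file names, prompt, and JSON-Schema-like outputSchema shape below are illustrative assumptions.

```typescript
// An array of sources: the documents are parsed and concatenated before extraction.
const applicationNode = {
  parameters: {
    documentSource: [
      "https://example.com/application-form.pdf",
      "https://example.com/proof-of-address.jpg",
    ],
    returnStructuredOutput: true,
  },
  prompt: "Extract the applicant's full name and current address.",
  // Assumed schema shape; the format the platform expects may differ.
  outputSchema: {
    type: "object",
    properties: {
      fullName: { type: "string" },
      address: { type: "string" },
    },
  },
};
```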

Programmatic Usage

The SDK provides three main functions for document processing:

  1. parseDocumentAndExtractStructuredData - Parse one or more documents and optionally extract structured data with AI

  2. parseDocument - Parse document and extract raw text only

  3. extractStructuredDataFromString - Extract structured data from already parsed text

Structured Extraction with Schema

Extract data matching a predefined Zod schema.

Note: Use nullable() instead of optional() for fields that are not required.
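In type terms, Zod's nullable() produces a field that is always present but may be null, while optional() produces a field that may be absent. The plain-TypeScript sketch below illustrates that difference without importing Zod; the field names and the checking function are hypothetical.

```typescript
// A nullable field is always present and may be null (z.string().nullable()).
interface CompanyWithNullable {
  name: string;
  taxId: string | null;
}

// An optional field may be missing entirely (z.string().optional()).
interface CompanyWithOptional {
  name: string;
  taxId?: string;
}

// Runtime check of the "nullable" contract: the key must exist, even if null.
function matchesNullableShape(value: Record<string, unknown>): boolean {
  return (
    typeof value["name"] === "string" &&
    "taxId" in value &&
    (value["taxId"] === null || typeof value["taxId"] === "string")
  );
}

const found: CompanyWithNullable = { name: "Acme", taxId: null };
const sparse: CompanyWithOptional = { name: "Acme" };
```

Under the nullable form, a value the extractor cannot find is reported as an explicit null rather than as a missing key.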

Structured Extraction with Prompt Only

Extract data using AI with a custom prompt, without a predefined schema.

Raw Text Extraction

Parse document without AI processing to get plain text only.

Using URLs

All functions accept URLs in addition to file paths via the documentSource and documentSources parameters.

Processing Multiple Documents

Process multiple documents by providing an array of URLs or file paths. Documents are parsed in parallel and their content is concatenated with double newlines before optional structured extraction.
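The parse-in-parallel-then-concatenate behavior described above can be sketched as follows. The parser is a local stand-in so the example runs without the SDK, and the function names are hypothetical. Whether the SDK keeps the input order is not stated in this document; this sketch does, because Promise.all returns results in input order.

```typescript
// Stand-in parser so the sketch runs without the SDK; a real parser would
// fetch and parse the document at `source`.
type Parser = (source: string) => Promise<string>;

const fakeParser: Parser = async (source) => `text of ${source}`;

// Joins already-parsed texts with a blank line (double newline) between documents.
function joinParsedDocuments(texts: string[]): string {
  return texts.join("\n\n");
}

// Parses all sources concurrently, then concatenates the results.
async function parseAll(sources: string[], parse: Parser): Promise<string> {
  const texts = await Promise.all(sources.map((s) => parse(s)));
  return joinParsedDocuments(texts);
}

parseAll(["a.pdf", "b.docx"], fakeParser).then((combined) => {
  console.log(combined);
});
```

The combined string is what an optional structured extraction step would then receive as its input.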

You can also extract raw text from multiple documents without structured extraction.

Identity Document Verification

Extract personal information from ID documents with structured validation.

Extract from Already Parsed Content

Use extractStructuredDataFromString to extract structured data from text you've already obtained.

Local Processing Mode

By default, document processing uses the Minded cloud service. To process documents locally using LlamaCloud, set the DOCUMENT_PROCESSING_MODE environment variable to local and provide a LlamaCloud API key in LLAMA_CLOUD_API_KEY.

You can also pass the API key and processing mode directly to the function.

Supported File Types

| Category      | Extensions                                  | Notes                                    |
|---------------|---------------------------------------------|------------------------------------------|
| Images        | .jpg, .jpeg, .png, .gif, .bmp, .webp, .tiff | Direct processing                        |
| Documents     | .pdf, .doc, .docx, .txt, .rtf, .odt         | Requires LlamaCloud for advanced formats |
| Spreadsheets  | .xls, .xlsx, .csv, .ods                     | Requires LlamaCloud for binary formats   |
| Presentations | .ppt, .pptx, .odp                           | Requires LlamaCloud                      |
| Web           | .html, .htm, .md, .xml                      | Basic text extraction                    |