# Data Extraction

The `minded-extraction` tool uses an LLM to extract structured data from unstructured text and return it in a predefined format.

## Overview

* **Structured Extraction with Zod Schema**: Define the exact data structure to extract
* **Prompt-based Extraction**: Extract information using custom prompts
* **Validation and Retries**: Automatic validation with configurable retry logic
* **Structured Output Support**: Uses the LLM's native structured output when available

## Using in Flows

```yaml
- id: extractCustomerInfo
  type: tool
  toolName: minded-extraction
  prompt: Extract customer name, email, and phone number from the message
```

### Tool Parameters

| Parameter    | Type    | Description                            | Required |
| ------------ | ------- | -------------------------------------- | -------- |
| content      | string  | Text to extract from                   | Yes      |
| schema       | object  | Zod-compatible schema                  | No       |
| systemPrompt | string  | Custom instructions                    | No       |
| examples     | array   | Input/output examples                  | No       |
| strictMode   | boolean | Enable validation (default: true)      | No       |
| maxRetries   | number  | Retry attempts on failure (default: 3) | No       |
| defaultValue | any     | Fallback value                         | No       |
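For readers who prefer types to tables, the parameters can be modelled as below. This is a minimal sketch: `ExtractionParams` and `withDefaults` are hypothetical names used for illustration and are not exports of the library. Only the two defaults stated in the table (`strictMode: true`, `maxRetries: 3`) are applied.

```typescript
// Hypothetical type mirroring the parameter table; not the library's own type.
interface ExtractionParams {
  content: string;
  schema?: Record<string, unknown>;
  systemPrompt?: string;
  examples?: unknown[];
  strictMode?: boolean;
  maxRetries?: number;
  defaultValue?: unknown;
}

// Apply the defaults documented in the table when a value is omitted.
function withDefaults(params: ExtractionParams) {
  return {
    ...params,
    strictMode: params.strictMode ?? true,
    maxRetries: params.maxRetries ?? 3,
  };
}

const resolved = withDefaults({ content: 'Call me at 555-0100' });
```

Using `??` rather than `||` keeps explicit values such as `strictMode: false` or `maxRetries: 0` intact.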

### Overriding Parameters in Flows

```yaml
- name: 'Extract Customer Info'
  type: 'tool'
  toolName: 'minded-extraction'
  parameters:
    content: '{state.memory.rawText}'
    schema:
      name:
        type: 'string'
        description: 'Customer full name'
      email:
        type: 'string'
        description: 'Email address'
        required: false
      phone:
        type: 'string'
    systemPrompt: 'Extract contact information from the text'
    strictMode: true
    maxRetries: 3
```

**Available schema field properties:**

* `type`: `'string'`, `'number'`, `'boolean'`, `'array'`, or `'object'`
* `description`: Optional field description
* `required`: Optional boolean (defaults to true)
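To make the field properties concrete, here is a minimal, dependency-free sketch of how such a schema could be checked against an extracted value. `FieldSpec` and `checkAgainstSchema` are hypothetical names introduced for illustration; the tool's real validation is Zod-based and may behave differently (for example, this sketch treats `null` as missing, which is an assumption).

```typescript
// Hypothetical types mirroring the field properties listed above.
type FieldType = 'string' | 'number' | 'boolean' | 'array' | 'object';

interface FieldSpec {
  type: FieldType;
  description?: string;
  required?: boolean; // treated as true when omitted
}

// Returns a list of problems; an empty list means the value conforms.
function checkAgainstSchema(
  value: Record<string, unknown>,
  schema: Record<string, FieldSpec>,
): string[] {
  const problems: string[] = [];
  for (const [key, spec] of Object.entries(schema)) {
    const field = value[key];
    if (field === undefined || field === null) {
      // Assumption: null counts as a missing field.
      if (spec.required ?? true) problems.push(`${key}: missing required field`);
      continue;
    }
    const actual = Array.isArray(field) ? 'array' : typeof field;
    if (actual !== spec.type) problems.push(`${key}: expected ${spec.type}, got ${actual}`);
  }
  return problems;
}

// Same shape as the YAML schema in the flow example.
const contactSchema: Record<string, FieldSpec> = {
  name: { type: 'string', description: 'Customer full name' },
  email: { type: 'string', description: 'Email address', required: false },
  phone: { type: 'string' },
};

// `email` may be absent because it is optional; `phone` has the wrong type.
const problems = checkAgainstSchema({ name: 'Ada Lovelace', phone: 42 }, contactSchema);
```

Here `problems` contains a single entry for `phone`, while the absent optional `email` is accepted.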

## Programmatic Usage

```typescript
import { extract, createExtractor } from '@minded-ai/mindedjs';
import { z } from 'zod';

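// Schema shared by both examples
const personSchema = z.object({
  name: z.string(),
  age: z.number(),
});

// Direct extraction (`content` and `agent` are assumed to be defined elsewhere)
const result = await extract(
  content,
  {
    schema: personSchema,
    systemPrompt: 'Extract person details',
  },
  agent.llm,
);

// Create a reusable extractor and call it with the same content
const extractor = createExtractor(personSchema, { systemPrompt: 'Extract data' });
const reusedResult = await extractor(content, agent.llm);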
```

## How It Works

1. **With Structured Output Support**: Uses the LLM's `withStructuredOutput` for direct, schema-compliant extraction.
2. **Fallback Mode**: Generates a prompt that includes the schema description, parses the JSON in the response, validates it against the Zod schema, and retries with error feedback if validation fails.
3. **Non-strict Mode**: Skips validation for more flexible extraction.
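The fallback mode's parse, validate, and retry cycle can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: `extractWithRetries`, `CallModel`, and `Validate` are hypothetical names, the model is any function that returns text, and `maxRetries` is assumed to count retries after the first attempt.

```typescript
// Hypothetical signatures: `callModel` stands in for any LLM call that returns text,
// and `validate` stands in for schema validation (for example, a Zod parse).
type CallModel = (prompt: string) => Promise<string>;
type Validate<T> = (value: unknown) => T; // throws when the value does not match

async function extractWithRetries<T>(
  basePrompt: string,
  callModel: CallModel,
  validate: Validate<T>,
  maxRetries = 3,
): Promise<T> {
  let prompt = basePrompt;
  let lastError = 'unknown error';
  // Assumption: one initial attempt plus up to `maxRetries` retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(prompt);
    try {
      // JSON.parse and validate both throw on failure, which triggers a retry.
      return validate(JSON.parse(raw));
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Feed the failure back so the next attempt can correct it.
      prompt = `${basePrompt}\n\nThe previous response was rejected: ${lastError}\nReturn valid JSON only.`;
    }
  }
  throw new Error(`Extraction failed after ${maxRetries + 1} attempts: ${lastError}`);
}
```

Rebuilding the retry prompt from `basePrompt` each time keeps it from growing with every attempt; only the most recent error is passed back to the model.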

