# Data Extraction

Extract structured data from unstructured text using AI. The `minded-extraction` tool uses LLM capabilities to parse content and return data in a predefined format.

## Overview

* **Structured Extraction with Zod Schema**: Define exact data structure
* **Prompt-based Extraction**: Extract information using custom prompts
* **Validation and Retries**: Automatic validation with configurable retry logic
* **Structured Output Support**: Uses LLM native structured output when available

## Using in Flows

```yaml
- id: extractCustomerInfo
  type: tool
  toolName: minded-extraction
  prompt: Extract customer name, email, and phone number from the message
```

### Tool Parameters

| Parameter    | Type    | Description                            | Required |
| ------------ | ------- | -------------------------------------- | -------- |
| content      | string  | Text to extract from                   | Yes      |
| schema       | object  | Zod-compatible schema                  | No       |
| systemPrompt | string  | Custom instructions                    | No       |
| examples     | array   | Input/output examples                  | No       |
| strictMode   | boolean | Enable validation (default: true)      | No       |
| maxRetries   | number  | Retry attempts on failure (default: 3) | No       |
| defaultValue | any     | Fallback value                         | No       |

### Overriding Parameters in Flows

```yaml
- name: 'Extract Customer Info'
  type: 'tool'
  toolName: 'minded-extraction'
  parameters:
    content: '{state.memory.rawText}'
    schema:
      name:
        type: 'string'
        description: 'Customer full name'
      email:
        type: 'string'
        description: 'Email address'
        required: false
      phone:
        type: 'string'
    systemPrompt: 'Extract contact information from the text'
    strictMode: true
    maxRetries: 3
```

**Available schema field properties:**

* `type`: `'string'`, `'number'`, `'boolean'`, `'array'`, or `'object'`
* `description`: Optional field description
* `required`: Optional boolean (defaults to true)

## Programmatic Usage

```typescript
import { extract, createExtractor } from '@minded-ai/mindedjs';
import { z } from 'zod';

// Direct extraction
const result = await extract(
  content,
  {
    schema: z.object({
      name: z.string(),
      age: z.number(),
    }),
    systemPrompt: 'Extract person details',
  },
  agent.llm,
);

// Create reusable extractor
const extractor = createExtractor(schema, { systemPrompt: 'Extract data' });
const result = await extractor(content, agent.llm);
```

## How It Works

1. **With Structured Output Support**: Uses LLM's `withStructuredOutput` for direct schema-compliant extraction
2. **Fallback Mode**: Generates prompt with schema description, parses JSON, validates against Zod schema, retries with error feedback
3. **Non-strict Mode**: Skips validation for flexible extraction
