Document Data Ingestion allows users to extract structured data from uploaded documents and add it to the data context of a workflow. This makes it easier to automate workflows that rely on information stored in PDFs, images, or scanned files.
How to add
Select the Extract data from files option and add it to your workflow.
Setup
First, select the File Upload field from your workflow’s Forms step.
Once a file is uploaded, users can choose between two options for extracting data:
Option 1: From Test File
- Upload a test document (PDF or image).
- Click “Extract fields”.
- The system will automatically generate example prompts based on the uploaded document.
- Users can review, edit, or remove these prompts.
Supported file types: PDF and image files (e.g., JPG, PNG).
Option 2: Manual
Users can manually define their extraction prompts:
- In the Field Name of the data extraction rules, enter the label for the data field.
- This label will appear in the workflow’s data context mapping for use in downstream steps.
Adding a Data Field Name & Writing Effective Prompts
The last step is to specify the label to assign to the extracted data. This label will appear in the workflow’s data context and be available for mapping in later steps.
The optional Custom Instructions field allows users to give detailed instructions on how to locate, extract, and optionally modify the data before it’s used. This might include:
- Formatting changes
- Summarization
- Validation logic
Tip: Don’t hesitate to get creative! Tools like ChatGPT, Claude, or Gemini can help generate effective prompts.
Prompt Examples by Strategy
Field: Patient Name
Strategy | Prompt |
---|---|
Label-Based | “Extract the text next to the label ‘Patient Name:’.” |
Contextual | “Find the name above or near ‘Date of Birth’ or ‘Sex’.” |
Positional | “Look for a name in the upper-left corner of the document header.” |
Pattern-Based | “Extract the first full name (First Last) near identifiers like MRN.” |
Fallback | “If multiple names appear, return the one nearest ‘Patient Info’ or ‘DOB’.” |
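To make the Pattern-Based and Fallback strategies above more concrete, here is a minimal Python sketch of the rule they describe: collect "First Last" candidates and keep the one closest to an identifier such as MRN. This is only an illustration of the logic the prompt asks for, not how the feature works internally; the function name `name_nearest` and the simple name pattern are assumptions made for this example.

```python
import re
from typing import Optional

# Assumed pattern: two capitalized words separated by a single space.
NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def name_nearest(text: str, anchor: str = "MRN") -> Optional[str]:
    """Return the 'First Last' candidate whose start is closest to the anchor."""
    pos = text.find(anchor)
    if pos == -1:
        return None  # anchor not present, nothing to disambiguate with
    candidates = list(NAME.finditer(text))
    if not candidates:
        return None
    best = min(candidates, key=lambda m: abs(m.start() - pos))
    return best.group()

sample = "General Hospital\nJane Doe  MRN: 12345\nDr. Alan Smith"
print(name_nearest(sample))
```

A prompt that names the anchor explicitly ("near MRN", "near DOB") gives the extraction step the same kind of tie-breaker when a document contains several names.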
Field: Company Name (Invoices, Contracts, Forms)
Strategy | Prompt |
---|---|
Label-Based | “Find the value next to ‘Company Name’ or ‘Organization Name’.” |
Header-Based | “Use the first bold, capitalized name at the top of the document.” |
Contextual | “Identify sender’s company near the address or contact info.” |
Pattern-Based | “Look for names ending in Inc, LLC, Ltd, or Corp near the header.” |
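The Pattern-Based prompt above can be thought of as a text pattern. The following Python sketch shows one possible way to express "names ending in Inc, LLC, Ltd, or Corp" as a regular expression, for example to sanity-check an extracted value in a downstream step. The pattern and the helper `find_company_names` are hypothetical and are not part of the product.

```python
import re
from typing import List

# Assumed pattern: one or more capitalized words followed by a corporate suffix.
COMPANY_SUFFIX = re.compile(
    r"\b((?:[A-Z][\w&'-]*\s+)+(?:Inc|LLC|Ltd|Corp)\.?)(?!\w)"
)

def find_company_names(text: str) -> List[str]:
    """Return every substring that looks like 'Some Name LLC' (or Inc, Ltd, Corp)."""
    return [m.group(1) for m in COMPANY_SUFFIX.finditer(text)]

print(find_company_names("Invoice from Acme Widgets LLC, 12 Main St"))
```

The negative lookahead keeps words such as "Corporate" from being mistaken for the suffix "Corp".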
Field: Invoice Total (Invoices, Billing Docs)
Strategy | Prompt |
---|---|
Label-Based | “Extract the amount labeled ‘Total Due’, ‘Balance Due’, etc.” |
Positional | “Use the largest currency value in the bottom-right corner.” |
Fallback | “Use the last currency value before payment instructions.” |
Formatting | “Normalize to $X,XXX.XX format and remove extra characters.” |
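As an illustration of what the Formatting prompt above asks for, the Python sketch below normalizes a raw amount to the $X,XXX.XX form. It assumes US-style numbers (comma as thousands separator, period as decimal point); the function name `normalize_total` is hypothetical, and the code could serve as a downstream check rather than a description of the feature itself.

```python
import re
from decimal import Decimal

def normalize_total(raw: str) -> str:
    """Find the first US-style amount in raw and return it as $X,XXX.XX."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no amount found in {raw!r}")
    # Strip thousands separators before parsing to avoid float rounding issues.
    amount = Decimal(match.group().replace(",", ""))
    return f"${amount:,.2f}"

print(normalize_total("USD 1234.5"))
```

Using `Decimal` rather than `float` keeps currency values exact, which matters if the result is compared or summed later in the workflow.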
Field: Contract Effective Date (Agreements, Legal Docs)
Strategy | Prompt |
---|---|
Label-Based | “Find the date labeled ‘Effective Date’ or ‘Agreement Start Date’.” |
Semantic | “Extract the date after ‘This agreement is made effective as of’.” |
Positional | “Use the first date in the contract body’s opening paragraph.” |
Formatting | “Return the date in ISO format: YYYY-MM-DD.” |
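The ISO format requested in the Formatting prompt above is easy to verify after extraction. The Python sketch below converts a few common date spellings to YYYY-MM-DD; the list of accepted input formats and the name `to_iso_date` are assumptions for this example, and month names are parsed according to the default English locale.

```python
from datetime import datetime

# Assumed input formats; extend this tuple to match your documents.
KNOWN_FORMATS = ("%Y-%m-%d", "%B %d, %Y", "%m/%d/%Y", "%d %B %Y")

def to_iso_date(raw: str) -> str:
    """Return raw as YYYY-MM-DD, or raise ValueError if no format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"unrecognized date: {raw!r}")

print(to_iso_date("January 5, 2024"))
```

Note that a value such as 03/04/2023 is ambiguous between month-first and day-first conventions; this sketch assumes month-first, so it is worth stating the expected convention in the prompt as well.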
Custom Instructions (Advanced Prompts)
Use this area to define global rules for document handling. These apply across the entire file, not just a single field.
Examples:
- “If this isn’t a driver’s license, don’t extract anything.”
- “Only process documents that are invoices.”
- “Reject if this is a handwritten document.”
- “Ignore PDFs with more than 2 pages.”
- “Don’t upload the document if any required field is missing.”
- “Process only if document contains the word ‘Contract’.”
- “Reject all scanned images; only accept digital PDFs.”
Feature Limitations
- Only accepts documents submitted through the File Upload field on a Form step.
- Designed to read documents such as PDFs and images; it cannot modify or generate documents.
Compliance & Security
This feature leverages Claude for document processing and data extraction. Please ensure your use of this feature complies with your organization’s data handling and privacy policies.