Import PDF File Scripting - AI Importing
Revision as of 03:26, 10 November 2025
Overview
The AI-Powered PDF Invoice Import System automates the extraction and processing of supplier invoices from PDF documents. It uses vision AI models to read invoice PDFs and automatically create AP (Accounts Payable) invoices in the accounting system, reducing manual data entry.
System Capabilities
- Multi-file processing: Select and process multiple PDF invoices in a single operation
- AI vision processing: Leverages advanced AI models to read and interpret invoice documents
- Flexible extraction: Adapts to different invoice formats and layouts automatically
- OCR accuracy: Intelligent character recognition with disambiguation of similar characters
- Mathematical validation: Verifies totals and GST calculations for data integrity
- Automated invoice creation: Generates complete AP invoices with all line items
- Batch support: Optional batch processing for grouping related supplier invoices
AI Processing Methods
The system supports two processing approaches:
Vision-Based Processing (Recommended)
Processes the PDF directly using AI vision models:
- Analyzes the visual layout and structure of the invoice
- Handles complex formatting, tables, and multi-column layouts
- Works with scanned documents and image-based PDFs
- Processes multi-page invoices as a complete document
- More accurate for invoices with complex layouts
Text Extraction Processing
Extracts text from the PDF, then processes it with AI:
- Uses GemBox libraries to extract structured text
- Suitable for text-based PDFs with simple layouts
- Faster processing for straightforward invoices
- May struggle with complex table layouts or scanned documents
Supported AI Providers
Anthropic (Claude)
- Default and recommended provider
- Excellent document understanding capabilities
- Strong structured data extraction
- Handles complex invoice layouts
OpenAI (GPT Vision)
- Alternative vision processing option
- Compatible with GPT-4 Vision models
- Good for standard invoice formats
Configuration Parameters
The system requires the following configuration:
- AIVisionEnabled: Enable/disable vision processing (true/false)
- AIModel: Model identifier string
- AIModelId: Specific model version (e.g., "claude-sonnet-4-20250514")
- AIApiKey: Encrypted API key for AI service authentication
- FileName: Path to PDF file (when processing single files)
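A holder for these parameters might look like the following minimal sketch. The parameter names come from the list above; the `ImportConfig` class, its `validate` method, and the placeholder values are assumptions for illustration, not the system's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImportConfig:
    """Hypothetical container for the parameters listed above."""
    AIVisionEnabled: bool = True
    AIModel: str = ""                # model identifier string
    AIModelId: str = ""              # specific model version
    AIApiKey: str = ""               # encrypted API key, never plaintext
    FileName: Optional[str] = None   # only set when processing a single file

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is usable."""
        problems = []
        if not self.AIModelId:
            problems.append("AIModelId is required")
        if not self.AIApiKey:
            problems.append("AIApiKey is required")
        if self.FileName is not None and not self.FileName.lower().endswith(".pdf"):
            problems.append("FileName must point to a PDF file")
        return problems


# Placeholder values; "claude" and "<encrypted>" are illustrative only.
config = ImportConfig(
    AIModel="claude",
    AIModelId="claude-sonnet-4-20250514",
    AIApiKey="<encrypted>",
)
print(config.validate())
```

Collecting problems in a list, rather than raising on the first one, lets a script report every missing setting in a single message.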
OCR and Character Recognition
The AI system includes intelligent character disambiguation:
Common Character Confusions
The system is instructed to carefully distinguish between:
- Z vs 2: Z has a diagonal line, 2 has curves
- O vs 0: O is round, 0 may have a slash or be more oval
- I vs 1 vs l: Context-aware recognition (numbers vs letters)
- S vs 5: Shape and context analysis
- G vs 6: Character form recognition
Validation Checks
- Mathematical verification of line totals
- GST calculation validation (typically 15% in NZ)
- Cross-checking of subtotals and final amounts
- Multi-page continuity verification
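The arithmetic checks above could be sketched as follows. This assumes the extracted line amounts are GST-exclusive and that a two-cent rounding tolerance is acceptable; the function name `check_totals` and the tolerance are illustrative assumptions, not part of the system.

```python
from decimal import Decimal

GST_RATE = Decimal("0.15")    # NZ rate mentioned above
TOLERANCE = Decimal("0.02")   # assumed allowance for rounding differences


def check_totals(line_amounts, subtotal, gst, total):
    """Return a list of inconsistencies found in the extracted totals."""
    issues = []
    line_sum = sum(line_amounts, Decimal("0"))
    if abs(line_sum - subtotal) > TOLERANCE:
        issues.append(f"Line items sum to {line_sum}, but subtotal is {subtotal}")
    expected_gst = (subtotal * GST_RATE).quantize(Decimal("0.01"))
    if abs(expected_gst - gst) > TOLERANCE:
        issues.append(f"GST should be about {expected_gst}, but invoice shows {gst}")
    if abs(subtotal + gst - total) > TOLERANCE:
        issues.append(f"Subtotal plus GST is {subtotal + gst}, but total is {total}")
    return issues


lines = [Decimal("100.00"), Decimal("50.00")]
print(check_totals(lines, Decimal("150.00"), Decimal("22.50"), Decimal("172.50")))
```

`Decimal` is used instead of `float` so that currency comparisons are not affected by binary floating-point error.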
Data Extraction Process
The system can extract various fields depending on the invoice format:
Standard Fields
- Invoice Number
- Invoice Date
- Company/Supplier Name
- Client/Customer Name
- Account Numbers
- Order Numbers
- Reference Numbers
Financial Totals
- Line item amounts
- Subtotal (excluding GST)
- GST/Tax amount
- Total amount (including GST)
Line Item Details
- Product codes or descriptions
- Quantities
- Unit prices
- Extended amounts
- Date information (for time-based services)
- Reference codes or job numbers
Custom Fields
The AI prompt can be customized to extract additional fields specific to your supplier's invoice format:
- Vehicle registration numbers
- Serial numbers
- Odometer readings
- Job descriptions
- Barcode data
- Custom reference fields
Invoice Creation Configuration
Transaction Settings
- Transaction Type: APINV (AP Invoice)
- Transaction Type Code: 20 (default for AP invoices)
- Created Date: Defaults to current date/time
- Created User: Configurable (default: "Admin")
Processing Dates
- Invoice Date: Extracted from PDF
- Payment Date: Calculated (typically 20th of following month)
- Process Date: Defaults to current date
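As an illustration of the payment date rule, the following sketch computes the 20th of the month after the invoice date. The function name `payment_date` is hypothetical, and the `day` parameter is assumed to be 28 or less so that it exists in every month.

```python
from datetime import date


def payment_date(invoice_date: date, day: int = 20) -> date:
    """Return the given day of the month following invoice_date."""
    if invoice_date.month == 12:
        # December rolls over into January of the next year
        return date(invoice_date.year + 1, 1, day)
    return date(invoice_date.year, invoice_date.month + 1, day)


print(payment_date(date(2025, 11, 10)))
```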
Account Assignments
- Supplier ID (OtherPartyId): Must be configured in script
- Location ID: Configurable (default: "Misc")
- Order Number: Extracted from invoice or use reference
- Reference: Invoice number from PDF
Line Item Configuration
- Item ID: Default item code for imported lines (must be configured)
- Description: Extracted from PDF line items
- Department: Optional, can be mapped from invoice fields
- Quantity: Extracted from line items
- Unit Price: Calculated or extracted
- GST Treatment: Configurable (GST inclusive or exclusive)
GST Calculation Methods
The system supports different GST calculation approaches:
GST Inclusive Amounts
When invoice amounts include GST:
Total With GST = Line Amount (as shown on invoice)
GST Exc Total = Total With GST ÷ 1.15
GST Amount = Total With GST - GST Exc Total
Unit Price = GST Exc Total ÷ Quantity
GST Exclusive Amounts
When invoice amounts exclude GST:
GST Exc Total = Line Amount (as shown on invoice)
Total With GST = GST Exc Total × 1.15
GST Amount = Total With GST - GST Exc Total
Unit Price = GST Exc Total ÷ Quantity
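Both calculation methods can be expressed in one small function. This is a sketch under stated assumptions: a 15% GST rate, rounding to cents with half-up rounding, and a non-zero quantity. The function name `gst_breakdown` and the dictionary keys are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_FACTOR = Decimal("1.15")   # 1 + 15% GST
CENT = Decimal("0.01")


def to_cents(value):
    """Round a Decimal to two places (half-up rounding is an assumption)."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def gst_breakdown(line_amount, quantity, gst_inclusive):
    """Apply the inclusive or exclusive formulas to one invoice line."""
    if gst_inclusive:
        total_with_gst = line_amount
        gst_exc_total = to_cents(total_with_gst / GST_FACTOR)
    else:
        gst_exc_total = line_amount
        total_with_gst = to_cents(gst_exc_total * GST_FACTOR)
    return {
        "TotalWithGst": total_with_gst,
        "GstExcTotal": gst_exc_total,
        "GstAmount": total_with_gst - gst_exc_total,
        "UnitPrice": to_cents(gst_exc_total / quantity),  # quantity must be non-zero
    }


print(gst_breakdown(Decimal("115.00"), Decimal("2"), gst_inclusive=True))
print(gst_breakdown(Decimal("100.00"), Decimal("4"), gst_inclusive=False))
```

For a GST-inclusive line of 115.00 with quantity 2, this gives a GST-exclusive total of 100.00, GST of 15.00, and a unit price of 50.00.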
Multi-Page Invoice Handling
The system is designed to handle multi-page invoices:
- Processes all pages of the PDF document
- Extracts line items spanning multiple pages
- Maintains line item sequence
- Validates totals across entire document
- Provides page count in extraction results
Batch Processing
Batch Creation
Optional batch grouping for related invoices:
- Batch ID: Defaults to supplier ID or can be specified
- Batch Comment: Descriptive text for the batch
- Automatic Batching: Groups invoices by supplier
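Grouping by supplier, with the batch ID defaulting to the supplier ID, might be sketched as below. The dictionary keys (`OtherPartyId`, `BatchId`, `BatchComment`, `Invoices`) and the function name are assumptions chosen for illustration, not the system's data model.

```python
from collections import defaultdict


def group_into_batches(invoices, batch_comment="PDF import"):
    """Group invoice dicts by supplier; the batch ID defaults to the supplier ID."""
    by_supplier = defaultdict(list)
    for invoice in invoices:
        by_supplier[invoice["OtherPartyId"]].append(invoice)
    return [
        {
            "BatchId": supplier_id,
            "BatchComment": f"{batch_comment}: {supplier_id}",
            "Invoices": items,
        }
        for supplier_id, items in by_supplier.items()
    ]


sample = [
    {"OtherPartyId": "SUP1", "Reference": "A"},
    {"OtherPartyId": "SUP2", "Reference": "B"},
    {"OtherPartyId": "SUP1", "Reference": "C"},
]
for batch in group_into_batches(sample):
    print(batch["BatchId"], len(batch["Invoices"]))
```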
Batch Benefits
- Groups related invoices for review
- Simplifies posting process
- Maintains audit trail
- Enables bulk approval workflows
Error Handling
The system reports errors in four categories:
File Selection Errors
- No file selected
- Invalid file type (non-PDF)
- File not found or inaccessible
- Corrupted PDF files
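A pre-flight check covering these four cases could look like the sketch below. The function name `check_pdf_file` is hypothetical, and the corruption test is deliberately shallow: it only looks for the `%PDF-` marker near the start of the file, so a file can pass this check and still be damaged further in.

```python
import os


def check_pdf_file(path):
    """Return an error message for the selected file, or None if it looks usable."""
    if not path:
        return "No file selected"
    if not path.lower().endswith(".pdf"):
        return "Invalid file type (non-PDF)"
    if not os.path.isfile(path):
        return "File not found or inaccessible"
    try:
        with open(path, "rb") as handle:
            header = handle.read(1024)
    except OSError:
        return "File not found or inaccessible"
    # PDF files carry a "%PDF-" marker near the start of the file
    if b"%PDF-" not in header:
        return "Corrupted PDF file"
    return None


print(check_pdf_file(""))
print(check_pdf_file("invoice.txt"))
```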
AI Processing Errors
- AI service connection failures
- Invalid or empty AI responses
- JSON parsing errors
- Incomplete data extraction
Data Validation Errors
- Missing required fields
- Invalid date formats
- Mathematical inconsistencies
- Zero or negative amounts
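The field-level checks might be sketched as follows. The field names (`InvoiceNumber`, `InvoiceDate`, `Total`) and the default date format are assumptions; they should match whatever JSON schema the extraction prompt requests.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical field names; align these with the prompt's JSON schema.
REQUIRED_FIELDS = ("InvoiceNumber", "InvoiceDate", "Total")


def validate_invoice(data, date_format="%Y-%m-%d"):
    """Return a list of validation errors for one extracted invoice dict."""
    missing = [name for name in REQUIRED_FIELDS if data.get(name) in (None, "")]
    errors = [f"Missing required field: {name}" for name in missing]
    if "InvoiceDate" not in missing:
        try:
            datetime.strptime(str(data["InvoiceDate"]), date_format)
        except ValueError:
            errors.append(f"Invalid date format: {data['InvoiceDate']}")
    if "Total" not in missing:
        try:
            if Decimal(str(data["Total"])) <= 0:
                errors.append("Zero or negative amount")
        except InvalidOperation:
            errors.append(f"Total is not a number: {data['Total']}")
    return errors


print(validate_invoice({"InvoiceNumber": "INV-1", "InvoiceDate": "10/11/2025", "Total": "0"}))
```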
Invoice Creation Errors
- Invalid supplier ID
- Missing item master data
- Missing department codes
- Transaction validation failures
- Database constraint violations
Customizing the AI Prompt
The extraction prompt can be customized for specific invoice formats:
Prompt Structure
1. Role definition (extraction agent)
2. Task description (extract invoice data)
3. OCR guidance (character disambiguation)
4. Field list (specific fields to extract)
5. JSON schema (output format)
6. Validation rules (totals, dates, etc.)
7. Output constraints (no extra text, no markdown)
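To make the seven-part structure concrete, here is a minimal sketch that assembles a prompt from those parts. The wording of each section, the `LineItems` schema, and the function name `build_prompt` are illustrative assumptions; the actual prompt used by the script may differ.

```python
import json


def build_prompt(fields, date_format="yyyy-MM-dd"):
    """Assemble an extraction prompt following the seven-part structure."""
    schema = {name: "string" for name in fields}
    schema["LineItems"] = [
        {"Description": "string", "Quantity": "number", "Amount": "number"}
    ]
    sections = [
        # 1. Role definition
        "You are an invoice data extraction agent.",
        # 2. Task description
        "Extract the invoice data from the attached PDF, checking every page.",
        # 3. OCR guidance
        "Read characters carefully: distinguish Z/2, O/0, I/1/l, S/5 and G/6.",
        # 4. Field list
        "Fields to extract: " + ", ".join(fields) + ".",
        # 5. JSON schema
        "Return JSON matching this schema:\n" + json.dumps(schema, indent=2),
        # 6. Validation rules
        f"Line amounts must sum to the subtotal. Format dates as {date_format}.",
        # 7. Output constraints
        "Output only the JSON object, with no extra text and no markdown.",
    ]
    return "\n\n".join(sections)


print(build_prompt(["InvoiceNumber", "InvoiceDate", "Total"]))
```

Keeping each part as a separate list entry makes it easier to document supplier-specific changes, since a customization usually touches only one section.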
Customization Examples
Adding Custom Fields:
Add fields to the JSON schema in the prompt to extract additional data specific to your invoices.
Changing Date Formats:
Specify date format requirements in the prompt (e.g., "yyyy-MM-dd", "dd/MM/yyyy").
Field Name Variations:
Provide alternative field names the AI should recognize (e.g., "Cust. Order No", "Customer Order", "Order Ref").
Calculation Rules:
Specify how amounts should be calculated or validated.
Implementation Workflow
Basic Implementation Steps
1. Configure AI provider and API credentials
2. Set default supplier ID and item codes
3. Customize extraction prompt for invoice format
4. Configure GST calculation method
5. Set location and account defaults
6. Test with sample invoices
7. Review and validate created invoices
8. Adjust prompt and settings as needed
Testing Recommendations
- Start with clear, simple invoices
- Verify mathematical accuracy of extractions
- Check department and item code assignments
- Validate date parsing and calculations
- Test multi-page invoice handling
- Review batch creation behavior
Best Practices
Invoice Preparation
- Use clear, readable PDF scans
- Ensure full pages are captured
- Avoid skewed or rotated scans
- Check PDF file integrity before processing
- Process similar invoice types together
Configuration
- Set appropriate default values for all parameters
- Use descriptive batch comments
- Configure supplier-specific item codes
- Validate master data prerequisites
- Document custom prompt modifications
Data Quality
- Review AI-extracted data before posting
- Verify mathematical calculations
- Check supplier ID assignments
- Validate department code mapping
- Confirm date calculations
Performance
- Process invoices in reasonable batch sizes
- Monitor AI service response times
- Handle errors gracefully with clear messages
- Log processing results for audit trails
Advanced Features
JSON Response Extraction
The system includes a helper method to extract clean JSON from AI responses:
- Navigates AI response structure
- Extracts text content from nested JSON
- Handles various response formats
- Provides error handling for malformed responses
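A helper of this kind might be sketched as below. It assumes the raw response body is either an Anthropic Messages API response (a `content` list of blocks with `type` and `text`) or an OpenAI Chat Completions response (`choices[0].message.content`); the function name `extract_json` and the brace-trimming strategy are illustrative, not the system's actual helper.

```python
import json


def extract_json(response_body: str) -> dict:
    """Pull the invoice JSON object out of a raw AI response body."""
    envelope = json.loads(response_body)
    if "content" in envelope:
        # Anthropic-style: concatenate all text blocks
        text = "".join(
            block.get("text", "")
            for block in envelope["content"]
            if block.get("type") == "text"
        )
    elif "choices" in envelope:
        # OpenAI-style: first choice carries the message text
        text = envelope["choices"][0]["message"]["content"] or ""
    else:
        raise ValueError("Unrecognized AI response format")
    # Keep only the outermost {...}; this drops markdown fences or stray prose
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("AI response contains no JSON object")
    return json.loads(text[start:end + 1])


body = json.dumps(
    {"content": [{"type": "text", "text": "```json\n{\"InvoiceNumber\": \"INV-1\"}\n```"}]}
)
print(extract_json(body))
```

Malformed JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so callers can catch a single exception type for all failure cases.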
Dynamic Field Mapping
The system can map extracted fields to invoice line items:
- Product codes to item IDs
- Reference numbers to departments
- Custom fields to standard accounting fields
- Date parsing and conversion
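The mappings above could be sketched with simple lookup tables. All field names (`ProductCode`, `Reference`, `ItemId`, `Department`, `Date`), the sample codes, and the function name `map_line` are hypothetical; the sketch also assumes the prompt requested dates in yyyy-MM-dd format.

```python
from datetime import datetime


def map_line(extracted, item_map, department_map, default_item_id):
    """Map one extracted line (a dict) onto hypothetical invoice-line fields."""
    line = {
        # Unknown product codes fall back to the configured default item
        "ItemId": item_map.get(extracted.get("ProductCode"), default_item_id),
        "Description": extracted.get("Description", ""),
        # Department is optional, so an unmapped reference leaves it empty
        "Department": department_map.get(extracted.get("Reference")),
    }
    raw_date = extracted.get("Date")
    # Assumes yyyy-MM-dd dates were requested in the prompt
    line["Date"] = datetime.strptime(raw_date, "%Y-%m-%d").date() if raw_date else None
    return line


mapped = map_line(
    {"ProductCode": "P-100", "Description": "Tyre", "Reference": "JOB7", "Date": "2025-11-10"},
    item_map={"P-100": "TYRES"},
    department_map={"JOB7": "WORKSHOP"},
    default_item_id="MISC",
)
print(mapped)
```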
Calculation Flexibility
Supports various calculation scenarios:
- Zero-quantity items (single services)
- Division by zero protection
- Rounding rules for currency
- Tax-inclusive vs tax-exclusive amounts
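Division-by-zero protection and currency rounding might be combined as in this sketch. Treating a zero quantity as a single service (quantity 1) and rounding half-up to cents are assumptions made for illustration; the function name `unit_price` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def unit_price(gst_exc_total, quantity):
    """Unit price rounded to cents; a zero quantity is treated as one service."""
    if quantity == 0:
        # Avoid division by zero: assume a single service was billed
        quantity = Decimal("1")
    return (gst_exc_total / quantity).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(unit_price(Decimal("100.00"), Decimal("0")))
print(unit_price(Decimal("10.00"), Decimal("3")))
```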
Integration Points
The PDF Import System integrates with:
- Item Management: Item code lookup and validation
- Department Management: Department code resolution
- Supplier Management: Supplier/vendor record validation
- Transaction Processing: Invoice creation and persistence
- Batch Management: Batch header creation and tracking
- AI Vision Services: External API for document analysis
Security Considerations
- API keys are encrypted using 128-bit encryption
- File access restricted to allowed paths
- User authentication tracked for created invoices
- Audit trail maintained for all imports
Troubleshooting
Common Issues
Problem: AI extracts incorrect invoice numbers
Solution: Add specific field location hints to the prompt and emphasize OCR character disambiguation
Problem: Missing line items from multi-page invoices
Solution: Ensure the prompt explicitly mentions checking all pages, and verify the PDF page count
Problem: GST calculations don't match
Solution: Verify that the GST inclusive/exclusive setting matches the invoice format
Problem: Department codes not assigned
Solution: Check that the department master data exists, and verify the field mapping in the script
Problem: JSON deserialization errors
Solution: Check the AI response format, verify date format compatibility, and review the JSON schema
Future Enhancements
Potential improvements for consideration:
- Automatic supplier detection and matching
- Learning from correction patterns
- Support for additional currencies
- Purchase order matching (three-way matching)
- Email-based invoice submission
- Duplicate invoice detection
- Confidence scoring for extracted data
- Interactive review and correction interface
- Export of extraction results for verification
- Batch progress tracking and reporting