Invoice capture is a growing area of AI in which most companies are making their first purchase of an AI product, because it is easy to integrate and delivers significant benefits.
While digitization helped automate numerous processes, it relied mostly on rule-based software. Invoice capture, however, involves both reading the invoice text with Optical Character Recognition (OCR) and understanding its context with machine learning.
What is invoice capture?
Invoice capture (also called invoice data extraction or invoice OCR) is the extraction of structured data from invoices so that they can be processed automatically. For most companies, invoice capture has been the first back-office process to be automated with AI.
If there is significant uncertainty about the data, a human is notified to take a look at the invoice. If data extraction is deemed successful, the data is fed to the record-keeping and payment systems.
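The routing step described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `ExtractedField` type, the `route_invoice` function and the 0.90 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical cutoff: any field below this confidence triggers a human review.
REVIEW_THRESHOLD = 0.90


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction model


def route_invoice(fields: List[ExtractedField],
                  threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'human_review' if any field is uncertain, else 'auto_process'."""
    if not fields:
        # Nothing was extracted, so a person has to look at the invoice.
        return "human_review"
    if any(f.confidence < threshold for f in fields):
        return "human_review"
    return "auto_process"


clear_invoice = [
    ExtractedField("supplier", "Acme Ltd", 0.99),
    ExtractedField("total", "1250.00", 0.97),
]
blurry_invoice = [
    ExtractedField("supplier", "Acme Ltd", 0.98),
    ExtractedField("total", "1250.00", 0.62),
]

print(route_invoice(clear_invoice))   # auto_process
print(route_invoice(blurry_invoice))  # human_review
```

In a real deployment, "auto_process" would hand the data to the record-keeping and payment systems, while "human_review" would open the invoice in a manual data entry screen.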
What are the benefits of invoice capture?
- helps reduce back-office costs by reducing manual effort
- allows employees to focus on higher value-added activities
- reduces invoice processing errors
- allows faster turnaround, which prevents the unnecessary back and forth between suppliers and the business that consumes valuable employee time
- allows auditability: invoice data can be stored with visual bounding boxes that show where on the invoice each value was extracted from. If the company discovers faults in data extraction, these records can be used to trace the source of the errors so they can be corrected for future invoices.
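As a rough illustration of the auditability point, the sketch below stores each extracted value together with the page and bounding box it came from. The `AuditedField` type, the JSON layout and the pixel coordinates are assumptions made for this example; actual solutions define their own storage formats.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class AuditedField:
    name: str
    value: str
    page: int
    # (left, top, right, bottom) in pixels on the scanned page
    bbox: Tuple[int, int, int, int]


def to_audit_record(invoice_id: str, fields: List[AuditedField]) -> str:
    """Serialize extracted values together with their source regions."""
    payload = {"invoice_id": invoice_id, "fields": [asdict(f) for f in fields]}
    return json.dumps(payload, indent=2)


def locate(record_json: str, field_name: str) -> Optional[tuple]:
    """Return (page, bbox) for a field so a reviewer can inspect that region."""
    record = json.loads(record_json)
    for field in record["fields"]:
        if field["name"] == field_name:
            return field["page"], tuple(field["bbox"])
    return None


record = to_audit_record(
    "INV-1042",
    [
        AuditedField("supplier", "Acme Ltd", 1, (40, 30, 220, 55)),
        AuditedField("total", "1250.00", 1, (420, 710, 520, 735)),
    ],
)

# If the total later turns out to be wrong, the stored box points to the
# exact region of the page that the value was read from.
print(locate(record, "total"))
```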
What are the differences between invoice capture and OCR?
While OCR captures only text, invoice capture solutions capture the key-value pairs and tables that are required to auto-process invoices.
Capturing key-value pairs
Invoices include key-value pairs such as company name and bank account number. Invoice capture solutions can extract these key-value pairs from documents.
Capturing tables
Most invoices include an itemized list of services or products provided. Invoice capture solutions can recognize these itemized lists and process them.
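To make the difference between raw OCR output and captured data concrete, here is a minimal sketch that turns a block of OCR text into key-value pairs and a table of line items. The sample text, the field names and the regular expressions are invented for illustration. Commercial solutions rely on machine learning rather than hand-written patterns like these, so the sketch only shows the shape of the output, not how it is produced in practice.

```python
import re
from typing import Any, Dict, List

# What plain OCR returns: a block of text with no structure.
ocr_text = """\
Supplier: Acme Ltd
Invoice number: INV-1042
IBAN: DE00 0000 0000 0000 0000 00
Description Qty Unit price
Widget 4 2.50
Gasket 10 0.75
"""

# "Key: value" lines, e.g. "Supplier: Acme Ltd"
KEY_VALUE = re.compile(r"^(?P<key>[^:]+):\s*(?P<value>.+)$")
# Table rows of the form "<description> <integer qty> <price with 2 decimals>"
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+(?P<qty>\d+)\s+(?P<unit_price>\d+\.\d{2})$"
)


def parse_invoice_text(text: str) -> Dict[str, Any]:
    """Split OCR text into key-value pairs and itemized table rows."""
    pairs: Dict[str, str] = {}
    items: List[Dict[str, Any]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        kv = KEY_VALUE.match(line)
        if kv:
            pairs[kv.group("key")] = kv.group("value")
            continue
        item = LINE_ITEM.match(line)
        if item:
            items.append({
                "description": item.group("description"),
                "qty": int(item.group("qty")),
                "unit_price": float(item.group("unit_price")),
            })
    return {"key_values": pairs, "line_items": items}


parsed = parse_invoice_text(ocr_text)
print(parsed["key_values"])
print(parsed["line_items"])
```

The structured result, unlike the raw text, can be passed directly to an accounting system: each key maps to one value, and each line item is a separate row.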
What are the different types of invoice capture solutions?
In the invoice automation landscape, there are 3 types of solutions:
- Template based solutions: The end user inputs the document structure into the software. These solutions were prevalent before the recent rise of machine learning solutions. However, they are no longer relevant because:
  - There are many different invoice structures and they tend to change over time, which results in errors.
  - Using templates creates a code base that needs to be maintained.
  - Inputting template structures into the software is additional work, and ideally automation solutions should not create new tasks for users.
- Pre-trained ML solutions: Companies build automation solutions based on millions of invoices. These solutions work well but can run into issues when they face types of invoices they have not encountered before.
- Continuously trained ML solutions: The best solutions on the market. They are trained on millions of invoices, and their developers work with customers to ensure that the solution is constantly trained on new invoices.
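The fragility of template based solutions can be shown with a small sketch. It assumes a template that maps each field to a fixed region of the page and an OCR step that returns words with pixel positions. The `Word` type, the `TEMPLATE` regions and the coordinates are hypothetical and only serve to show what happens when a supplier changes its layout.

```python
from typing import Dict, List, NamedTuple, Tuple


class Word(NamedTuple):
    text: str
    x: int  # horizontal position of the word on the page, in pixels
    y: int  # vertical position of the word on the page, in pixels


Region = Tuple[int, int, int, int]  # (left, top, right, bottom)

# Hypothetical template entered by an end user for one supplier's layout.
TEMPLATE: Dict[str, Region] = {
    "invoice_number": (400, 40, 600, 70),
    "total": (400, 700, 600, 730),
}


def extract_with_template(words: List[Word],
                          template: Dict[str, Region]) -> Dict[str, str]:
    """Collect the words that fall inside each field's fixed region."""
    result: Dict[str, str] = {}
    for field, (left, top, right, bottom) in template.items():
        inside = [w for w in words
                  if left <= w.x <= right and top <= w.y <= bottom]
        inside.sort(key=lambda w: (w.y, w.x))  # reading order
        result[field] = " ".join(w.text for w in inside)
    return result


original_layout = [Word("INV-1042", 410, 50), Word("1250.00", 450, 710)]
# Same supplier after a redesign: the total moved 60 pixels down the page.
changed_layout = [Word("INV-1042", 410, 50), Word("1250.00", 450, 770)]

print(extract_with_template(original_layout, TEMPLATE))
print(extract_with_template(changed_layout, TEMPLATE))
```

With the original layout both fields are found. After the redesign the total comes back empty until someone updates the template, which is the maintenance burden described above.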
What types of companies provide invoice capture solutions?
- Established companies in the accounts payable management business have started to provide invoice data extraction solutions. Because theirs were the first solutions on the market, some are dated and rely on templates.
- Tech giants: Amazon AWS Textract is a newcomer to the field with a competitive price of $5 per 100 pages (at 1M+ pages/month). Amazon also offers the ability to combine Textract with other services such as Ground Truth. For example, Ground Truth could provide human validators to check documents that Textract cannot process with a high level of confidence. This combination of services could allow companies to completely outsource their document processing. Such combinations can also be built on top of non-Amazon solutions, since most invoice capture solutions offer APIs.
- Startups that leverage machine learning to build flexible solutions. With the increasing commercialization of AI over the last 10 years, AI has increasingly been applied to extracting structured data from semi-structured data. Outsiders could see startups as doomed after Amazon’s entry into the business. However, startups still have a few feature advantages over Amazon, as Rossum.ai explains below:
Welcome to the data capture club, @awscloud! If you’re looking for something beyond simple key-value pairs that can train on your feedback and end users can use, talk to us at Rossum. #RossumAI #machinelearning #OCR #textract https://t.co/BVmygxlU1d
— Rossum (@RossumAi) November 30, 2018
What is the complete list of companies that provide invoice capture solutions?
|Company||Number of employees on LinkedIn||Area of focus||Pricing||Largest customers||On-prem solution||Automated processing rate*||Type of solution|
|Amazon AWS Textract||N/A||Document data extraction||$0.05 per page**||Roche||Possible with AWS Outposts***||Pre-trained ML|
|Coupa||1000+||B2B spend management||Template based|
|Datamolino||11-50||Bookkeeping automation||Not template based|
|Docparser||1-5||Document data extraction||$0.05 per document (up to 5 pages per document)||SMEs||N/A||Template based|
|Docucharm||1-5||Document data extraction||N/A||Continuously trained ML|
|Hypatos||6-10||Document data extraction||Deutsche Bank||Available||Continuously trained ML|
|Instabase||11-50||Document data extraction|
|pdfdata.io||1-5||Document data extraction||Template based|
|Proactis||501-1000||B2B spend management||Numerous Fortune 500||Available||80%||Continuously trained ML|
|Rossum||11-50||Document data extraction||Continuously trained ML|
|SapphireOne||1-5||ERP, CRM, DMS and Business Accounting Software||Template based|
|Tabula (open source)||Not applicable||Table extraction||Template based|
|Tipalti||100-500||B2B spend management||Continuously trained ML|
|Xtracta||11-50||Document data extraction||Available||Continuously trained ML|
** Including key-value pair and table extraction at a volume of 1M+ pages/month
*** Outposts was announced at AWS re:Invent 2018 but is not yet available. Services such as RDS, ECS, EKS, SageMaker and EMR have been announced as the first to be available after launch.
How to choose your invoice capture vendor?
Ask for the false positive and manual data extraction rates. Then run a proof of concept (PoC) to see the actual rates on the invoices your company receives.
- False positives are invoices that are auto-processed but contain data extraction errors. These are difficult to identify and can disrupt operations, for example if order sizes are incorrectly extracted from the invoice. Minimizing them should be the absolute focus.
- Manual data extraction is necessary when the automated data extraction system has limited confidence in its result. This could be due to an unfamiliar invoice format, poor image quality or a misprint by the supplier. This rate is also important to minimize, but there is a trade-off between false positives and manual data extraction: more manual data extraction can be preferable to more false positives.
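The trade-off between the two rates can be explored during a PoC by replaying the results at different confidence thresholds. The sketch below is one way to do this, under the assumption that the vendor reports a confidence score per invoice and that your team has verified each extraction by hand. The `PocResult` type and the ten sample results are made up for the example, and both rates are expressed as a share of all invoices.

```python
from typing import List, NamedTuple, Tuple


class PocResult(NamedTuple):
    confidence: float  # lowest field confidence reported for the invoice
    correct: bool      # did the extraction match the manually verified data?


def rates(results: List[PocResult], threshold: float) -> Tuple[float, float]:
    """Return (manual_rate, false_positive_rate) as shares of all invoices."""
    total = len(results)
    auto = [r for r in results if r.confidence >= threshold]
    manual_rate = (total - len(auto)) / total
    # False positives: auto-processed invoices whose extraction was wrong.
    false_positive_rate = sum(1 for r in auto if not r.correct) / total
    return manual_rate, false_positive_rate


# Hypothetical PoC sample of ten invoices.
poc = [
    PocResult(0.99, True), PocResult(0.98, True), PocResult(0.97, True),
    PocResult(0.95, True), PocResult(0.93, False), PocResult(0.90, True),
    PocResult(0.85, True), PocResult(0.80, False), PocResult(0.70, False),
    PocResult(0.60, True),
]

for threshold in (0.75, 0.94):
    manual, false_pos = rates(poc, threshold)
    print(f"threshold {threshold}: manual {manual:.0%}, false positives {false_pos:.0%}")
# threshold 0.75: manual 20%, false positives 20%
# threshold 0.94: manual 60%, false positives 0%
```

In this sample, raising the threshold removes the false positives at the cost of sending three times as many invoices to manual data extraction.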
Ask how the solution learns from invoices from which it cannot yet extract data with confidence. The best solutions have an interface that lets your team guide the solution: as your company’s employee picks the key-value pairs, the invoice capture solution takes note so it can be more confident about a similar invoice next time.
Evaluate the ease of use of their manual data entry solution. Your company’s back-office personnel will use it to manually process invoices that cannot be automatically processed with confidence.
Beyond this, best practice procurement questions make sense. For example:
- How widely adopted is their solution? Do they have Fortune 500 customers?
- Are their customers happy with their solution and support? It could be useful to ask an acquaintance at a company that already uses the solution. Since invoice automation does not improve a company’s marketing or sales, even competitors may be willing to share their views of invoice automation solutions with one another.
- What are the options to integrate the solution? Is IT on-board with the integration approach?
- What is their Total Cost of Ownership (TCO)? Different solutions use different units of pricing (e.g. price per page or price per document), which makes this comparison difficult. However, using a sample from your archives, you can estimate the cost.
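As a worked example of the pricing comparison, the sketch below estimates the extraction cost of a sample of invoices under per-page and per-document pricing. The two price points are taken from the table above ($0.05 per page and $0.05 per document of up to 5 pages). The sample page counts are invented, and the rule that a longer document is billed as several documents is an assumption that should be checked with the vendor. Extraction fees are also only one part of TCO.

```python
import math
from typing import List


def cost_per_page(page_counts: List[int], price_per_page: float) -> float:
    """Total cost when every page is billed."""
    return round(sum(page_counts) * price_per_page, 2)


def cost_per_document(page_counts: List[int], price_per_document: float,
                      pages_included: int) -> float:
    """Total cost when documents are billed in blocks of `pages_included` pages.

    Assumption: a document longer than `pages_included` pages is billed as
    several documents. Confirm the actual rule with the vendor.
    """
    billable = sum(math.ceil(pages / pages_included) for pages in page_counts)
    return round(billable * price_per_document, 2)


# Hypothetical sample from the archive: number of pages in each invoice.
sample = [1, 1, 2, 1, 3, 8, 1, 2]

print(cost_per_page(sample, 0.05))         # 19 pages -> 0.95
print(cost_per_document(sample, 0.05, 5))  # 9 billable documents -> 0.45
```

Scaling the sample result to your monthly invoice volume gives a first estimate of the pricing component of TCO for each vendor.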
If you have questions, feel free to contact us.