Invoice capture is a growing area of AI and is where most companies make their first purchase of an AI product. This is because invoice capture is an easy-to-integrate solution with significant benefits.

While digitization helped automate numerous processes, it mostly relied on rule-based software. Invoice capture software is different: it involves both reading the invoice text with Optical Character Recognition (OCR) and understanding its context with machine learning.

We answer your invoice capture questions below:

What is invoice capture?

Invoice capture (also called invoice data extraction or invoice OCR) is the extraction of structured data from invoices so that they can be processed automatically. For most companies, invoice capture has been the first back-office process to be automated with AI.

If there is significant uncertainty about the extracted data, a human is notified to review the invoice. If data extraction is deemed successful, the data is fed to the record-keeping and payment systems.

Companies need to set up quality assurance processes in any automated process where errors can be costly. Invoice capture is no exception. To ensure that wrong payments are not made, suspicious invoices and invoices that require payments beyond a certain limit need to be reviewed by humans.
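
The review rules described above can be sketched as a small routing function. This is only an illustration: the threshold values, the `ExtractedInvoice` type and the `route` function are hypothetical names chosen for this example, not part of any vendor's product.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them to your own risk tolerance.
MIN_CONFIDENCE = 0.95          # below this, extraction counts as uncertain
REVIEW_AMOUNT_LIMIT = 10_000.00  # payments above this always get a human check


@dataclass
class ExtractedInvoice:
    invoice_id: str
    total_amount: float
    min_field_confidence: float  # lowest confidence among extracted fields (0 to 1)


def route(invoice: ExtractedInvoice) -> str:
    """Return 'human_review' or 'auto_process' for an extracted invoice."""
    if invoice.min_field_confidence < MIN_CONFIDENCE:
        return "human_review"  # the software is unsure about the data
    if invoice.total_amount > REVIEW_AMOUNT_LIMIT:
        return "human_review"  # payment exceeds the review limit
    return "auto_process"      # data goes to record keeping and payment
```

Checking confidence first and amount second keeps the two reasons for review separate, which makes it easier to report why an invoice ended up in the manual queue.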

What are the benefits of invoice capture?

Invoice capture:

  • helps reduce back-office costs by reducing manual effort
  • allows employees to focus on higher value-added activities
  • reduces invoice processing errors
  • allows faster turnaround, which prevents unnecessary back-and-forth between suppliers and the business that consumes valuable employee time
  • allows auditability: Invoice data can be stored with visual bounding boxes that show where each piece of data was extracted from the invoice. If the company discovers that data extraction was faulty, these documents can be used to understand the source of the errors so that they can be corrected for future invoices.
  • improves compliance: Invoices hold numerous data fields which are traditionally not captured manually. By capturing all data on an invoice, invoice capture software enables companies to run compliance checks on invoice data.

What are the differences between invoice capture and OCR?

While OCR captures text, invoice capture solutions capture the key-value pairs and tables that are required to auto-process invoices.

Capturing key-value pairs

Invoices include key-value pairs such as company name and bank account number. Invoice capture solutions can extract these key-value pairs from documents.

Source: Amazon AWS Textract

Capturing tables

Most invoices include an itemized list of services or products provided. Invoice capture solutions can recognize these itemized lists and process them.

Source: Amazon AWS Textract
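
To make the difference from plain OCR text concrete, here is a minimal sketch of what structured output from invoice capture could look like: key-value pairs with a confidence score and a bounding box, plus an itemized table. All class and field names are assumptions made for this illustration and do not reflect any specific vendor's response format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    """Where on the page a value was found (relative coordinates, 0 to 1)."""
    page: int
    left: float
    top: float
    width: float
    height: float


@dataclass
class KeyValue:
    key: str
    value: str
    confidence: float
    box: BoundingBox  # kept for auditability


@dataclass
class LineItem:
    description: str
    quantity: float
    unit_price: float

    @property
    def amount(self) -> float:
        return self.quantity * self.unit_price


@dataclass
class CapturedInvoice:
    fields: List[KeyValue] = field(default_factory=list)
    line_items: List[LineItem] = field(default_factory=list)

    def get(self, key: str) -> Optional[str]:
        """Look up an extracted value by key, or None if it was not captured."""
        for kv in self.fields:
            if kv.key == key:
                return kv.value
        return None

    def line_total(self) -> float:
        """Sum of all itemized amounts."""
        return sum(item.amount for item in self.line_items)


invoice = CapturedInvoice(
    fields=[KeyValue("vendor_name", "Acme GmbH", 0.98,
                     BoundingBox(page=1, left=0.1, top=0.05, width=0.3, height=0.02))],
    line_items=[LineItem("Consulting", 2, 150.0), LineItem("Travel", 1, 80.0)],
)
print(invoice.get("vendor_name"), invoice.line_total())  # Acme GmbH 380.0
```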

What are the different types of invoice capture solutions?

In the invoice automation landscape, there are 3 types of solutions:

  • Template-based solutions: The end user inputs the document structure into the software. These solutions were prevalent before the recent rise of machine learning solutions. However, they are no longer relevant because:
    • There are many different structures for invoices and these structures tend to change over time. This results in errors.
    • Using templates creates a code base that needs to be maintained.
    • Inputting template structures into the software is additional work. Ideally, automation solutions should not create new manual tasks for users.
  • Pre-trained machine learning (ML) solutions: Companies build automation solutions based on millions of invoices. These solutions work well but can run into issues when they face types of invoices they have not encountered before.
  • Continuously trained ML solutions: These are the best solutions on the market. They are trained on millions of invoices, and developers work with their customers to ensure that the solution is constantly trained on new invoices.
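
To illustrate why template-based extraction is brittle, here is a minimal sketch that treats a template as a set of regular expressions written for one supplier's layout. The template, field names and sample invoice texts are all made up for this example.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical template for one supplier's layout: field name -> pattern.
ACME_TEMPLATE: Dict[str, Pattern[str]] = {
    "invoice_number": re.compile(r"Invoice No\.\s*(\S+)"),
    "total": re.compile(r"Total:\s*([\d.,]+)"),
}


def extract_with_template(text: str,
                          template: Dict[str, Pattern[str]]) -> Dict[str, Optional[str]]:
    """Apply each pattern to the OCR text; a field is None when nothing matches."""
    result: Dict[str, Optional[str]] = {}
    for name, pattern in template.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


known_layout = "Invoice No. INV-001\nTotal: 1,250.00"
changed_layout = "Invoice # INV-002\nAmount due 980.00"

print(extract_with_template(known_layout, ACME_TEMPLATE))
# {'invoice_number': 'INV-001', 'total': '1,250.00'}
print(extract_with_template(changed_layout, ACME_TEMPLATE))
# {'invoice_number': None, 'total': None}
```

As soon as the supplier relabels a field, the template returns nothing and someone has to update it, which is the maintenance burden described in the list above.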

What type of companies provide invoice capture solutions?

Established accounts payable tech companies

These companies were the first to provide invoice data extraction solutions. Since their solutions were the first on the market, some are dated and rely on templates.

Tech giants

Amazon AWS Textract is a newcomer in the field and has a competitive price of $5 per 100 pages (at 1M+ pages/month). Amazon also brings the ability to combine Textract with other services such as Ground Truth. For example, Ground Truth could provide human validators to check documents that Textract cannot process with a high level of confidence. This combination of services could allow companies to completely outsource their document processing. Such combined services can also be built on top of other companies’ solutions, since most invoice capture solutions offer APIs.

Startups

Startups leverage machine learning to build flexible solutions. With the growing commercialization of AI over the last 10 years, AI has increasingly been applied to extracting structured data from semi-structured data. Outsiders might see startups as doomed after Amazon’s entry into the business. However, startups still have major advantages compared to Amazon.

Hypatos, one of the startups in this space, pulled a UiPath. As you may remember, UiPath was the first RPA company to introduce a free version of their product in 2016. Three years later, it was the most valuable RPA company, with a valuation of ~$7 billion as of April 2019.

Hypatos introduced a free version of their tool called Community Edition in November 2019. Though the free version is less accurate than their paid product, Subscription Edition, it could still be good enough for most use cases.

Rossum.ai is another startup in this space.

What is the complete list of companies that provide invoice capture solutions?

| Company | Number of employees on LinkedIn | Area of focus | Pricing | Largest customers | On prem solution | Type of solution |
|---|---|---|---|---|---|---|
| Amazon AWS Textract | N/A | Document data extraction | $0.05 per page** | Roche | Possible with AWS Outposts*** | Pre-trained ML |
| Coupa | 1000+ | B2B spend management | | | | Template based |
| Datamolino | 11-50 | Bookkeeping automation | | | | Not template based |
| Docparser | 1-5 | Document data extraction | $0.05 per document (up to 5 pages per document) | SMEs | N/A | Template based |
| Docucharm | 1-5 | Document data extraction | | | N/A | Continuously trained ML |
| Hypatos | 11-50 | Document data extraction & advanced processing | Community Edition is free | PwC, Deloitte, EY, Schwarz Gruppe | Available | Continuously trained ML |
| Infrrd | 100-500 | Document data extraction | | | | |
| Instabase | 11-50 | Document data extraction | | | | |
| pdfdata.io | 1-5 | Document data extraction | | | | Template based |
| Proactis | 501-1000 | B2B spend management | | Numerous Fortune 500 | Available | Continuously trained ML |
| Rossum | 11-50 | Document data extraction | | | N/A | Continuously trained ML |
| SapphireOne | 1-5 | ERP, CRM, DMS and Business Accounting Software | | | | Template based |
| Tabula (open source) | Not applicable | Table extraction | | | | Template based |
| Tipalti | 100-500 | B2B spend management | | | | Continuously trained ML |
| Xtracta | 11-50 | Document data extraction | | | Available | Continuously trained ML |

* According to case studies
** Including key value pair+table extraction at a volume of 1M+ pages/month
*** Outposts was announced at AWS re:Invent 2018 but is not yet available. RDS, ECS, EKS, SageMaker and EMR have been announced as the first services to be available after launch.

How do you choose your invoice capture vendor?

Choose a provider that supplies a solution in line with your company’s data privacy policies. Your company’s data privacy policy can be a show-stopper to using external APIs such as Amazon AWS Textract. Most providers offer on-premise solutions so data privacy policies would not necessarily stop your company from using an invoice capture solution.

Ask for the false positive and manual data extraction rates. Then run a Proof of Concept (PoC) project to see the actual rates on the invoices received by your company.

  • False positives are invoices that are auto-processed but have errors in data extraction. These are difficult to identify and can disrupt operations. For example, incorrect extraction of payment amounts would be problematic. Minimizing these should be the absolute focus.
  • Manual data extraction is necessary when the automated data extraction system has limited confidence in its result. This could be due to a different invoice format, poor image quality or a misprint by the supplier. This is also important to minimize, but there’s a trade-off between false positives and manual data extraction. Having more manual data extraction can be preferable to having false positives.
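
The trade-off between the two rates can be measured on PoC results with a few lines of code. The sketch below assumes that each PoC invoice yields a confidence score and that you have manually checked whether the extraction was correct; the function name and the sample data are hypothetical.

```python
from typing import List, Tuple


def poc_rates(results: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Compute (false_positive_rate, manual_rate) as shares of all invoices.

    results holds one (confidence, extraction_was_correct) pair per invoice.
    Invoices below the threshold go to manual extraction; invoices at or above
    it are auto-processed, and the incorrect ones among them are false positives.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no PoC results to evaluate")
    manual = sum(1 for conf, _ in results if conf < threshold)
    false_positives = sum(1 for conf, ok in results if conf >= threshold and not ok)
    return false_positives / total, manual / total


# Made-up PoC sample: (confidence, extraction_was_correct)
sample = [(0.99, True), (0.97, True), (0.92, False), (0.90, True), (0.60, False)]

print(poc_rates(sample, threshold=0.85))  # (0.2, 0.2)
print(poc_rates(sample, threshold=0.95))  # (0.0, 0.6)
```

In this sample, raising the threshold removes the false positive but triples the share of invoices that need manual handling.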

Leverage a PoC to measure the automation rate you can expect to achieve. This depends on the number of fields you expect to capture from the documents. A typical set of ~10 fields, including items like purchase order ID and vendor name, can enable data entry into ERP and payment systems. Best practice vendors achieve ~80% straight-through processing (STP) by extracting all of these ~10 fields with almost no errors ~80% of the time. Though there may be errors from time to time, manually checking the largest payments can ensure that no significant wrong payment slips through the net.
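
One way to measure this in a PoC is to count the share of invoices for which every required field matches a manually verified value. The sketch below assumes extracted and verified data are available as dictionaries in the same order; the field list is a hypothetical subset of the ~10 fields mentioned above.

```python
from typing import Dict, List, Sequence

# Hypothetical subset of the fields needed for ERP and payment entry.
REQUIRED_FIELDS = ("purchase_order_id", "vendor_name", "total_amount")


def stp_rate(extracted: List[Dict[str, str]],
             verified: List[Dict[str, str]],
             fields: Sequence[str] = REQUIRED_FIELDS) -> float:
    """Share of invoices where all required fields match the verified values."""
    if not verified or len(extracted) != len(verified):
        raise ValueError("need one extracted record per verified record")
    clean = sum(
        1
        for got, want in zip(extracted, verified)
        if all(got.get(name) == want[name] for name in fields)
    )
    return clean / len(verified)
```

Note that a single wrong field disqualifies the whole invoice, so the STP rate is usually well below the per-field accuracy a vendor may quote.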

Ask about the advanced processing options provided by the vendor. Extraction is the first step in data collection; in most cases it needs to be followed by data processing. For example, invoices need to be checked for VAT compliance (e.g. domestic invoices without VAT need to explain why VAT is excluded), and failure to do so could result in significant fines for the company depending on the country. Hypatos provides numerous advanced processing options; however, we have not seen other vendors provide such features, as they focus exclusively on data extraction.
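
The VAT example above could be expressed as a simple post-extraction rule. This is a deliberately simplified sketch: real VAT rules differ by country, and the `InvoiceData` type and its field names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InvoiceData:
    supplier_country: str
    buyer_country: str
    vat_amount: float
    vat_exemption_note: Optional[str] = None  # text explaining why VAT is excluded


def vat_issues(invoice: InvoiceData) -> List[str]:
    """Return a list of compliance issues; an empty list means none were found."""
    issues: List[str] = []
    is_domestic = invoice.supplier_country == invoice.buyer_country
    has_note = bool((invoice.vat_exemption_note or "").strip())
    if is_domestic and invoice.vat_amount == 0 and not has_note:
        issues.append("domestic invoice without VAT lacks an exemption explanation")
    return issues
```

Returning a list rather than a boolean leaves room for further checks to be added next to the VAT rule.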

Ask how the solution learns about new invoices. The best solutions have an interface that allows your team to help guide the solution. As your company’s employee picks the key-value pairs, the invoice capture solution takes note so it can be more confident about a similar invoice next time.

Evaluate the ease of use of their manual data entry solution. It will be used by your company’s back-office personnel as they manually process invoices that cannot be automatically processed with confidence.

Beyond this, best practice procurement questions make sense. For example:

  • How widely adopted is their solution? Do they have Fortune 500 customers?
  • Are their customers happy with their solution and support? It could be useful to ask an acquaintance from a company that is already using their solution. Since invoice automation is not a solution that would improve the marketing or sales of a company, even competitors could share their views of invoice automation solutions with one another.
  • What are the options to integrate the solution with your company’s systems (e.g. ERP)? Is IT on board with the integration approach?
  • What is their Total Cost of Ownership (TCO)? Different solutions use different units of pricing (e.g. price per page or price per document), which makes this comparison difficult. However, using a sample from your archives, you could estimate the cost.
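
As a rough sketch of such an estimate, the code below compares per-page and per-document pricing on a sample of page counts taken from an archive. The prices mirror the examples in the vendor table, while the rule that longer documents are billed as several documents is an assumption; check each vendor's actual terms.

```python
import math
from typing import List


def cost_per_page(page_counts: List[int], price_per_page: float) -> float:
    """Cost when the vendor bills every page."""
    return sum(page_counts) * price_per_page


def cost_per_document(page_counts: List[int], price_per_document: float,
                      pages_included: int) -> float:
    """Cost when the vendor bills per document with a page allowance.

    Assumption: a document longer than pages_included is billed as
    several documents (rounded up).
    """
    billed_units = sum(math.ceil(pages / pages_included) for pages in page_counts)
    return billed_units * price_per_document


# Made-up sample: number of pages in each archived invoice.
sample_pages = [1, 2, 1, 7, 3]

per_page = cost_per_page(sample_pages, price_per_page=0.05)
per_doc = cost_per_document(sample_pages, price_per_document=0.05, pages_included=5)
print(f"per page: ${per_page:.2f}, per document: ${per_doc:.2f}")
```

Scaling the sample result by your monthly invoice volume gives a first estimate of the extraction cost; integration and manual review effort would still need to be added for a full TCO.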
