Commercial Consumption of Document AI – How Legacy and Newcomer IDP Manufacturers Handle LLMs and SLMs

One thing that remains constant about Artificial Intelligence and Machine Learning is that they are always changing. Features, functionality, and pricing seem to change weekly, and consumers are left to wonder what to pursue. Intelligent Document Processing (IDP) has long been dominated by big players like OpenText, ABBYY, Kofax (now Tungsten), and Hyland. Over the last 10 years, legacy high-end document capture systems like Input Accel, Brainware, ReadSoft, KTM, DataCap, and others have been sold and resold between so many manufacturers that it’s hard to remember who owns what.

The general public’s perception of Artificial Intelligence has shifted greatly, leading to increased demand for Machine Learning and AI-powered solutions. This offers a massive opportunity for manufacturers and solution architects, as well as the challenge of how best to permeate the market. Businesses trying to integrate Machine Learning and AI into the handling of their transactional business documents find it difficult to know where to start. The number of options regarding functionality, cost, setup, infrastructure, and computing is staggering. Do they select the stability of a stalwart such as Tungsten or the ingenuity of a newcomer like Rossum? Does their document automation project call for pre-built or custom extraction?

How Do Legacy Providers Compete with Cloud Providers and Start-Ups?

The old guard struggles to keep up with a crop of newcomers such as Base64.ai, Datamatics, Parashift, Rossum, and Hyperscience. These vendors are quickly outpacing legacy organizations, and their collective key differentiator is a Cloud-first methodology with a more agile infrastructure and framework. While some have created their own OCR engines and large language models, most leverage engines from the large Cloud hyperscalers such as Microsoft Azure, Google, and AWS. For these manufacturers, it’s advantageous to hitch their wagons to Microsoft or Google for Generative AI rather than attempt to out-develop them. For example, Kofax TA now integrates with OpenAI, and products like Datamatics and FormX.Ai let users choose OCR engines from either AWS or Azure.

Legacy IDP providers and some niche IDP newcomers typically charge $0.20-$0.50 per page. Given that many also require large upfront page-volume purchases as well as professional services, they’ll find it hard to compete with the large Cloud hyperscalers.

Azure, AWS, and Google own the Cloud infrastructure and storage that these legacy IDP players and newcomers are hosted on. The hyperscalers can charge a penny per page while ABBYY Vantage charges $0.20 per page.

Offshore resources should also be given serious consideration in any IDP analysis, specifically when the use case is reducing or avoiding data entry. At times, it can be more cost effective to offshore the work at $0.10 per page than to process it in house at $0.20 per page (not including infrastructure and IT labor costs). Even with an IDP solution there is always some level of human intervention, because automation rates fall short of what human data entry labor delivers.
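To make the per-page comparison concrete, here is a minimal Python sketch using the rates quoted above. The monthly volume of 50,000 pages is a hypothetical figure chosen for illustration, and the function names are made up for this example. The calculation deliberately ignores upfront volume purchases, professional services, infrastructure, and IT labor, all of which would change the totals in practice.

```python
# Per-page rates quoted in the article (USD). The dictionary keys are
# illustrative labels, not product names.
RATES_PER_PAGE = {
    "cloud_prebuilt": 0.01,       # large Cloud provider, "a penny per page"
    "offshore_data_entry": 0.10,  # offshored manual data entry
    "legacy_idp_low": 0.20,       # low end of the $0.20-$0.50 range
    "legacy_idp_high": 0.50,      # high end of the $0.20-$0.50 range
}


def monthly_cost(pages: int, rate_per_page: float) -> float:
    """Return the page-processing cost only, rounded to cents."""
    return round(pages * rate_per_page, 2)


def compare(pages: int) -> dict[str, float]:
    """Compute the monthly cost of each option for the same page volume."""
    return {name: monthly_cost(pages, rate) for name, rate in RATES_PER_PAGE.items()}


if __name__ == "__main__":
    pages = 50_000  # hypothetical monthly volume, an assumption for this sketch
    # Print the options from cheapest to most expensive.
    for name, cost in sorted(compare(pages).items(), key=lambda kv: kv[1]):
        print(f"{name:<22} ${cost:>10,.2f}")
```

At this assumed volume the gap between a penny per page and $0.20 per page is the difference between hundreds and thousands of dollars a month, which is why the per-page rate alone can dominate an early-stage evaluation.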

Solution Architects are in a unique position. Trends constantly change, features update as soon as they’re created, and others become deprecated or replaced. A recent example is Microsoft Azure AI Document Intelligence: Microsoft introduced a number of new services and deprecated others so quickly that it became difficult for architects to even use them. When architects design solutions around Cloud products that change this frequently, delivering the end result may require costly change orders to take advantage of new AI features.

How Do AI Data Model Methodologies Play into Cost and Architecting a Solution?

A growing trend in consumer AI is the shift, in both discussion and the technology landscape, from Large Language Models (LLMs) to Small Language Models (SLMs) and Large Action Models (LAMs). Both SLMs and LAMs are newer concepts that are gaining momentum for a handful of reasons. For one, LLMs are expensive: processing through these large models requires a great deal of costly infrastructure and energy.

For example, Microsoft Azure AI Document Intelligence charges $0.01 per page for pre-built models like invoices, receipts, contracts, business cards, and tax forms, and $0.05 per page for custom models. The reason is that smaller, use-case-specific, context-based knowledge bases or models with a defined result set are cheaper to train and run than custom models whose output can apply to anything. Cost is always going to be a driving factor for organizations assessing IDP and AI solutions in their business processes.

Using Small Language Models (SLMs) for use-case-specific requirements allows for swarm sourcing, in which Microsoft leverages its global footprint to build the models. This lets it offer a very competitive processing cost, making AI technology more accessible to organizations.

Designing a solution with these tools means you have to pivot quickly if models are discontinued. With each version Microsoft publishes, a new API version is released for the custom or pre-built models. Solution architects need to be aware of these changes, as they often require annual model rebuilding to support new features and can bring the deprecation of existing features.
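One way an architect might keep track of this is a simple inventory of which model is pinned to which API version, checked against known retirement dates. The sketch below is only an illustration of that idea: the `ModelPin` class, the `needs_rebuild` function, the version labels, and every date are hypothetical placeholders, not Microsoft's actual versions or retirement schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ModelPin:
    """One deployed model and the API version it was built against."""
    model_id: str      # hypothetical internal name for the model
    api_version: str   # placeholder label, not a real version string
    retires_on: date   # hypothetical retirement date for that API version


def needs_rebuild(pins: list[ModelPin], today: date, warn_days: int = 90) -> list[ModelPin]:
    """Return pins whose API version retires within `warn_days` of `today`.

    Pins that have already passed their retirement date are included too.
    """
    cutoff = today + timedelta(days=warn_days)
    return [pin for pin in pins if pin.retires_on <= cutoff]


if __name__ == "__main__":
    # Illustrative inventory; all values are made up for this example.
    inventory = [
        ModelPin("invoice-custom", "version-A", date(2025, 2, 1)),
        ModelPin("receipt-prebuilt", "version-B", date(2026, 1, 1)),
    ]
    for pin in needs_rebuild(inventory, today=date(2025, 1, 1)):
        print(f"Plan a rebuild for {pin.model_id} (API {pin.api_version})")
```

Running a check like this on a schedule gives the project team lead time to budget for a rebuild instead of discovering a retired version in production.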

Just as one needs a realtor who understands the nuances of the housing market to help sell a house, organizations require an expert IDP architect who understands the nuances and changes in the industry. Hyperscalers, for example, are leading the charge in driving many of these changes. Microsoft, AWS, and Google have a global footprint, can gain access to more document samples than anyone, and run their own Clouds, which gives them storage and computing resources at much lower cost than they offer their commercial accounts.

This may be an unfair assessment, given that Microsoft’s offering is an Application Programming Interface (API) first Software Development Kit (SDK) rather than a fully functional IDP capture solution with UI/UX, auto learning, and document manipulation features. But IDP platforms, like Kodak Alaris Info Input, have figured out how to wrap an industrial-strength batch capture tool around engines from Microsoft, AWS, Google, Hyperscience, and others. More and more IDP manufacturers are declining to develop their own neural AI and Machine Learning, opting instead for Original Equipment Manufacturer (OEM) engines from the hyperscalers.

Hyperscalers have significantly impacted pricing in the industry as well. In the last few weeks alone, the Microsoft Azure AI Document Intelligence price list has grown from three items to nine. Older services like custom extraction have been reduced by 40%, from $0.05/page to $0.03/page, while newly priced items such as model training have gone from free to $3/hour.
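The 40% figure follows directly from the two per-page prices, and the same arithmetic shows how a per-page reduction and a new hourly training charge net out. The following Python sketch works through it; the 100,000-page volume and 10 training hours are hypothetical inputs chosen for illustration, and the function names are made up for this example.

```python
# Prices quoted in the article (USD).
OLD_CUSTOM_RATE = 0.05    # custom extraction, per page, before the change
NEW_CUSTOM_RATE = 0.03    # custom extraction, per page, after the change
TRAINING_RATE = 3.00      # model training, per hour (previously free)


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means a price cut."""
    if old == 0:
        # Going from free to any price has no meaningful percentage.
        raise ValueError("percent change is undefined when the old price is zero")
    return round((new - old) / old * 100, 2)


def net_monthly_change(pages: int, training_hours: float) -> float:
    """Change in monthly spend: per-page savings offset by training charges."""
    page_delta = pages * (NEW_CUSTOM_RATE - OLD_CUSTOM_RATE)
    training_delta = training_hours * TRAINING_RATE
    return round(page_delta + training_delta, 2)


if __name__ == "__main__":
    print(percent_change(OLD_CUSTOM_RATE, NEW_CUSTOM_RATE))
    # Hypothetical workload: 100,000 custom pages and 10 training hours.
    print(net_monthly_change(pages=100_000, training_hours=10))
```

For a high-volume workload the per-page cut outweighs the training charge, but a low-volume project that retrains often could see its bill rise, so the net effect depends on the mix.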

The shift towards Cloud-first methodologies and the integration of robust AI engines from hyperscalers like Microsoft, Google, and AWS are transforming the industry, offering new opportunities and challenges. As the competition between legacy systems and innovative newcomers intensifies, organizations must stay informed about the latest developments and pricing changes to make strategic decisions that align with their goals. The expertise of skilled IDP architects becomes invaluable in navigating this ever-evolving market, ensuring that businesses leverage the most effective and cost-efficient AI solutions available.

About the Author

Brent Wesler is the VP of Strategic Technology and Digital Automation at PiF Technologies and focuses on strategic business consulting practices such as Machine Learning, Artificial Intelligence, robotic process automation, and workflow automation. He has spent the last 25 years in solution consulting, presales, architecture, and professional services leadership roles, implementing services around Cloud, document, and workflow automation software.

His experience includes VP of Professional Services at Westbrook Technologies (acquired by DocuWare), VP of Business Development and Professional Services for Square 9 Softworks, a manufacturer of ECM, web forms, and IDP solutions, and Global Worldwide Presales Solutions Engineer for Kodak Alaris.

Through his two decades within the document automation space, both on the manufacturer and value-added reseller side, Brent has seen the industry go through an extreme technology inflection. As a thought leader, speaker, podcaster and regular contributor to industry publications, Brent focuses on business process re-engineering with customers and the technology that allows for its automation.

