By Ed Wingate, Head of Strategic Partnerships at Aluma
Capturing information from business documents is an old and established exercise. We have used technology in this space for over 100 years, yet as we progress through the second decade of the 21st century, document capture is still a task that requires more manual, or human, intervention than it should.
But technology is coming to the rescue. Modern systems let us use AI and machine learning to go beyond merely reading text: to understand it, place it in context, and learn from mistakes. This new capability is what analyst firm Deep Analysis calls “cognitive capture.”
The challenge with capturing information
The traditional route to capturing information from paper-based documents was a technology known as optical character recognition (OCR). The first OCR systems were developed in the early 1900s and focused on recognizing the curves and shapes of printed letters. These techniques were later embedded in the OCR applications that many organizations around the world use today. The challenge with OCR, though, is that it only works well in very specific, limited circumstances. It is essentially a dumb operator.
OCR doesn’t know that “Z010” is not a valid year without help from a set of associated business rules, and defining all of the potential rules to manage so-called “exceptions” is a task that rarely gets finished.
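To make the idea of a business rule concrete, here is a minimal Python sketch of the kind of check OCR needs before it can reject “Z010.” The function names, the 1900 to 2100 year range, and the look-alike character table are illustrative assumptions, not part of any particular capture product.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical look-alike substitutions; a real rule set would need many more.
LOOKALIKES = str.maketrans({"Z": "2", "O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})

@dataclass
class RuleResult:
    value: str
    valid: bool
    reason: str

def check_year(raw: str, min_year: int = 1900, max_year: int = 2100) -> RuleResult:
    """Apply two hand-written business rules to an OCR'd year field."""
    # Rule 1: a year must be exactly four ASCII digits.
    if not re.fullmatch(r"[0-9]{4}", raw):
        return RuleResult(raw, False, "not four digits")
    # Rule 2: the year must fall inside an assumed plausible business range.
    if not min_year <= int(raw) <= max_year:
        return RuleResult(raw, False, "outside plausible range")
    return RuleResult(raw, True, "ok")

def suggest_fix(raw: str) -> Optional[str]:
    """Swap look-alike characters and keep the result only if it passes the rules."""
    candidate = raw.translate(LOOKALIKES)
    return candidate if check_year(candidate).valid else None

print(check_year("Z010"))   # RuleResult(value='Z010', valid=False, reason='not four digits')
print(suggest_fix("Z010"))  # 2010
```

Even this tiny example needs two rules plus a correction table for a single field, and every new quirk means another rule. That is why the rule list for a real document set rarely gets finished.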
As a result, OCR often suffers from a lack of trust: users know it will get things wrong, they just don’t know what or when. So, in the end, everything the OCR engine processes gets checked again by a human. This paradox wipes out many of the productivity gains that OCR could otherwise deliver.
Traditional Capture - Six Steps to Do One Job
But OCR is only part of the traditional capture process. The broader process of converting human-readable documents into machine-readable data uses OCR alongside other technologies to capture, store, categorize, and index documents.
According to Deep Analysis, this manifests itself in a six-step capture process, as seen below.
Six-step capture process
Each part of this process has a distinct requirement.
- Input – Initiates document processing using various methods such as document scanning, hot-folder watching, etc.
- Classify – Identifies different document types based on organizational definitions.
- Recognize – Converts the document image into text (for example, with OCR) and detects its core attributes.
- Extract – Locates and extracts all structured data within the document.
- Verify – Validates the extracted data to ensure accuracy.
- Output – Passes the document image and associated metadata to other applications.
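The hand-offs between the six steps can be sketched as a minimal Python pipeline. Every step here is a toy stand-in and an assumption: classification is a keyword check, “recognition” simply decodes bytes instead of running a real OCR engine, and the field patterns are hypothetical. What the sketch does show is the structure: each step runs in a fixed order and sees only the output of the step before it.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Document:
    source: str
    image: bytes
    doc_type: str = "unknown"
    text: str = ""
    fields: dict = field(default_factory=dict)
    errors: list = field(default_factory=list)

def input_step(source: str, image: bytes) -> Document:
    """1. Input: wrap raw bytes from a scanner, hot folder, etc."""
    return Document(source=source, image=image)

def classify(doc: Document) -> Document:
    """2. Classify: a toy keyword rule standing in for layout-based classification."""
    doc.doc_type = "invoice" if b"INVOICE" in doc.image.upper() else "unknown"
    return doc

def recognize(doc: Document) -> Document:
    """3. Recognize: decoding bytes stands in for an OCR engine."""
    doc.text = doc.image.decode("utf-8", errors="replace")
    return doc

# Hypothetical field patterns (this toy applies them to every document).
PATTERNS = {"invoice_no": r"Invoice No:\s*(\S+)", "date": r"Date:\s*(\S+)"}

def extract(doc: Document) -> Document:
    """4. Extract: pull structured fields with fixed patterns."""
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, doc.text)
        if match:
            doc.fields[name] = match.group(1)
    return doc

def verify(doc: Document) -> Document:
    """5. Verify: flag missing fields for a human to review."""
    doc.errors = [name for name in PATTERNS if name not in doc.fields]
    return doc

def output_step(doc: Document) -> dict:
    """6. Output: hand metadata to a downstream system (here, a plain dict)."""
    return {"source": doc.source, "doc_type": doc.doc_type,
            "fields": doc.fields, "errors": doc.errors}

def run_pipeline(source: str, image: bytes) -> dict:
    # Each step runs in a fixed order and only sees the previous step's output.
    doc = input_step(source, image)
    for step in (classify, recognize, extract, verify):
        doc = step(doc)
    return output_step(doc)
```

Calling `run_pipeline("scanner-01", b"INVOICE\nInvoice No: A-1001\nDate: 2010-03-04")` classifies the document as an invoice and extracts both fields. A document without those lines comes back with both field names in `errors`, which is where a human reviewer would step in.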
The interesting thing about traditional capture is that most vendors tried to cover the whole process. That choice required many discrete capabilities and spread technical and development resources across a wide range of different “tools,” so vendors often failed to deliver best-in-class capabilities for any of the six components.
Cognitive Capture - Smarter By Half
The new era of capture is known as cognitive capture, and it halves the number of steps in the capture process, from six to three. Instead of breaking the challenge down into discrete technical chunks, each of which follows blindly on from the one before, cognitive capture works in a way more akin to how humans work: it acquires content, (repeatedly) understands it, and then figures out what to do with it.
The cognitive capture approach has several advantages over traditional document capture techniques:
- The approach combines many of the original capture functions, which streamlines how they work together, allows for greater automation, and keeps the focus on solving the business problem rather than the technical one.
- The same combining principle applies to understanding the content. Why rely on just OCR or just machine vision? Cognitive capture uses multiple “understanding” tools to deliver the best results for any given use case. In addition, the advanced models in this new phase constantly change and improve by learning from feedback.
- The simplified approach hides the technical complexity from users, which suits most of them. As a result, the overall process requires minimal manual intervention, a vast improvement over traditional capture.
- Integrating the downstream tools and processes that use the output from capture can be complicated. The simpler structure of cognitive capture provides a single point of integration, offering more fluid and robust interaction with business processes, content repositories, and business users alike.
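The acquire, understand, and integrate stages, together with the idea of running several understanding tools and keeping the best answer, can be sketched in a few lines of Python. This is an illustrative assumption, not Aluma’s implementation: both tools are hypothetical toys, “best” simply means the highest self-reported confidence, and the 0.9 review threshold is arbitrary.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Guess:
    value: str
    confidence: float  # self-reported score between 0.0 and 1.0
    tool: str

# An "understanding tool" maps text to per-field guesses.
Tool = Callable[[str], Dict[str, Guess]]

def acquire(raw: bytes) -> str:
    """Acquire: normalize content from any source to text (decoding stands in for OCR)."""
    return raw.decode("utf-8", errors="replace")

def text_reader(content: str) -> Dict[str, Guess]:
    """Hypothetical generalist tool: reads every 'key: value' line at moderate confidence."""
    guesses = {}
    for line in content.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            guesses[key.strip().lower()] = Guess(value.strip(), 0.8, "text_reader")
    return guesses

def date_model(content: str) -> Dict[str, Guess]:
    """Hypothetical specialist tool: confident, but only about ISO-style dates."""
    match = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", content)
    return {"date": Guess(match.group(1), 0.95, "date_model")} if match else {}

def understand(content: str, tools: List[Tool]) -> Dict[str, Guess]:
    """Understand: run every tool and keep the most confident guess per field."""
    best: Dict[str, Guess] = {}
    for tool in tools:
        for name, guess in tool(content).items():
            if name not in best or guess.confidence > best[name].confidence:
                best[name] = guess
    return best

def integrate(fields: Dict[str, Guess], threshold: float = 0.9) -> dict:
    """Integrate: one payload, one hand-off point for downstream systems."""
    return {
        "metadata": {name: g.value for name, g in fields.items()},
        "needs_review": sorted(n for n, g in fields.items() if g.confidence < threshold),
    }

def cognitive_capture(raw: bytes, tools: List[Tool]) -> dict:
    return integrate(understand(acquire(raw), tools))

result = cognitive_capture(b"Invoice No: A-1001\nDate: 2010-03-04", [text_reader, date_model])
print(result)
```

With the sample input, the date comes from the specialist `date_model`, while the invoice number, read only by the generalist `text_reader`, falls below the threshold and is flagged for review. Downstream systems consume one dictionary regardless of how many tools ran, which is the single point of integration in miniature.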
All or Nothing - Or Maybe Something in the Middle
What defines the new world of capture is focus. End-users are focused on getting better results from capture, with less manual effort, more consistently. Vendors can focus on one or more of the three newly defined areas (acquire, understand, and integrate), which brings clarity and the ability to deliver results rather than the “speeds and feeds” metrics that capture solutions often emphasize.
At Aluma, we focus very much on understanding content. We receive “raw” content from several sources, which we then read, classify, and ultimately understand. These content sources include:
- Leading hardware vendors, so we can receive content directly from scanners and MFPs
- “Traditional” capture tools, which send TIFF or PDF files for further enrichment
- Email systems, archives, ECM/DM systems, FTP, and of course, regular folder structures.
All of these provide the raw materials for understanding. We will leave the intricacies of “understanding” to another blog post, but automated classification, entity extraction, and the creation of documents and metadata for downstream consumption are what we do at Aluma.
The clear separation between the functional areas of cognitive capture allows Aluma to focus on understanding, and we believe this is vital to ensure the best possible results.
Think of an operation in a hospital.
You have different specialists to deliver the anesthetic, do the operation, and provide post-operative care. Would you want the same person doing all three parts of the process? Probably not. Personally, I want an expert for each particular job, not someone who can “sort of” do all of them.
That is the unwritten but vital benefit that cognitive capture is providing.
In the same way that the Content Services Platforms movement in the wider information management space is redefining how traditional ECM and DM vendors operate, cognitive capture will do the same for arguably the most critical part of the information lifecycle.
And at Aluma we can’t wait!