By Ed Wingate

Data security professionals have traditionally focused their attention and effort on structured data. Because the information is stored in a database, and identified using a predefined and fixed schema, it’s been relatively straightforward to devise computer systems and business rules that govern access and use. A suite of database tools has made the deployment of all the right security best practices relatively easy.

But not all enterprise information is structured. A great deal of an organization’s intellectual property and sensitive information is stored in documents, which are inherently unstructured. Documents don't adhere to a set structure and are typically text-heavy. Their irregularities and ambiguities make traditional security protocols harder to apply than they are to data stored in fielded form in databases. Yet contracts, letters, and corporate memos all contain sensitive or proprietary information, such as names, numbers, and other facts, that demands the same security acumen as other types of corporate data.


New Approaches to Document Automation and Security

The answer lies in new and developing approaches to document automation. Advancements in AI and ML make it much easier to train an engine to classify documents and pinpoint sensitive data. Using the latest Natural Language Processing algorithms, training can be accomplished with a mere handful of samples of each document type.
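To make the idea of training on a handful of samples concrete, here is a minimal sketch in Python. It is a toy stand-in, not the NLP engines described above: it assumes a simple bag-of-words model with cosine similarity, and the sample texts, labels, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(samples):
    """Build one summed word-count profile per document type."""
    profiles = {}
    for label, texts in samples.items():
        counts = Counter()
        for text in texts:
            counts.update(tokenize(text))
        profiles[label] = counts
    return profiles

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text, profiles):
    """Return the document type whose profile is most similar to the text."""
    vec = Counter(tokenize(text))
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))

# Hypothetical training data: two samples per document type.
SAMPLES = {
    "contract": [
        "This agreement is entered into by the parties. The term of this agreement is one year.",
        "The parties agree to the terms and conditions of this contract.",
    ],
    "memo": [
        "Memo to all staff: the quarterly meeting is moved to Friday.",
        "Internal memo: please submit your expense reports to the finance team.",
    ],
}

profiles = train(SAMPLES)
print(classify("The parties signed the agreement and its terms.", profiles))
```

A production engine would rely on far richer language models than word counts, but the workflow is the same: a few labeled samples per document type, a training step, and a classifier that routes new documents.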

Searching and Finding Sensitive Data

Identifying potentially sensitive information used to be a very cumbersome process based on applying simplistic regular expression code against text files. But now, low-code environments and microservices architectures allow far faster development of far more intelligent PII and CI rules engines for documents. With the right samples, a classification engine can be trained in minutes to recognize sensitive documents. And within a few hours of configuration, extraction engines can be trained to identify, highlight, and potentially redact sensitive or private information anywhere in any process.
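For reference, the older regular-expression approach mentioned above might look like the following Python sketch. The three patterns and the bracketed redaction labels are illustrative assumptions only. They cover a few US-style formats and would miss most real-world variations, which is exactly the limitation that trained extraction engines aim to overcome.

```python
import re

# Hypothetical patterns for illustration; real PII rules need far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_pii(text):
    """Return (label, matched_text, start, end) tuples sorted by position."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(), match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[2])

def redact(text):
    """Replace every match with its bracketed label, e.g. [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Contact Jane at jane.doe@example.com or 555-123-4567. SSN 123-45-6789."
print(redact(sample))
```

Note that the name "Jane" passes through untouched: fixed patterns can catch rigidly formatted values but not names or context-dependent facts.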

New Tools Address the Governance Shortfall

Until now, data security professionals have not had the tools to easily address the information governance shortfall inherent in unstructured documents. But today is a new day in document security. The appropriate application of AI to document recognition and data extraction will go a long way toward closing the gaps. It starts with accepting the free-wheeling chaos of unstructured data and adopting document automation technologies that find it, classify it, and protect it.

Ed Wingate is Head of Strategic Partnerships for aluma. Before that he was a member of HP's Security Advisor Board. Ed earned an engineering degree from Princeton and an MBA from Harvard Business School.

