By Ed Wingate, Head of Strategic Partnerships at Aluma

Data security is a never-ending evolution of risk, technology and strategy. In my last blog, I discussed the need to treat unstructured data found in documents as a pressing information governance concern. Data security professionals have traditionally focused their attention on structured data, but a great deal of sensitive information is stored in documents, which are inherently unstructured. Don’t let this be an open back door to increased risk.

Digital Documents, Digital Security

Digital documents flow through any enterprise in a variety of business and content management systems. They often pile up in private repositories within SharePoint or other Cloud storage services. Users print, copy and scan documents both at the office and while working from home. As a result, a great deal of unstructured and highly sensitive information is allowed to fly under the radar of data security.

Advanced Tools for Document Security

Applying Artificial Intelligence and Machine Learning techniques can help pinpoint potential risks. New document automation tools make it much easier to classify digital documents than it used to be. Low-code and microservices architectures now allow much more intelligent data privacy rules engines to be developed, deployed and applied to documents rapidly. Natural Language Processing applications learn as they go, and can be trained in just a few minutes to recognize sensitive data and expose vulnerabilities.

In the Real World

These techniques are needed today. Most business leaders feel their cybersecurity risks are increasing and point to back doors and application vulnerabilities as one of their top worries. Concerned about leakage through email attachments? Pass messages through a Personally Identifiable Information (PII) filter service that interrogates not just the text of the email, but also the PDFs and other attachments. Concerned about remote workers using their home printer and putting private information at risk? Apply a simple document classifier in the print driver or spooler to verify whether the document should be printed. Concerned about sharing the contents of a Document Management system with a subcontractor or vendor? Automatically maintain a sanitized version of every file in a mirrored system, so that each file added gets a sanitized copy right away.

In all these cases, just insert a call to a properly trained document classification service and extend your information governance practices to those everyday interactions that represent the real threat to information security.
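As a rough illustration of what such a call might look like, here is a minimal Python sketch of a PII check that covers both an email body and the text of its attachments, plus a sanitizing step for the mirrored-copy scenario. Everything in it is an assumption made for the example: the names `find_pii`, `sanitize` and `check_message` are hypothetical, the three regular expressions stand in for a properly trained classification service, and the attachment text is assumed to have been extracted from the PDFs already by some other tool.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately simple patterns. A trained document
# classification service would replace these in a real deployment.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


@dataclass
class Finding:
    kind: str  # which pattern matched, e.g. "us_ssn"
    text: str  # the matched text itself


def find_pii(text):
    """Return a list of Finding objects for every pattern match in text."""
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(kind, match.group()))
    return findings


def sanitize(text):
    """Return a copy of text with every match replaced by a redaction tag."""
    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {kind.upper()}]", text)
    return text


def check_message(body, attachments):
    """Scan an email body and its attachments.

    attachments maps a file name to text already extracted from that file.
    Returns a dict of part name -> findings, listing only parts with hits.
    """
    report = {}
    parts = {"body": body, **attachments}
    for name, text in parts.items():
        found = find_pii(text)
        if found:
            report[name] = found
    return report
```

A mail gateway could hold any message for which `check_message` returns a non-empty report, a print spooler could run the same check before releasing a job, and a mirroring job could write `sanitize(text)` to the shared copy of each new file. Pattern matching of this kind misses a great deal, which is why the sketch treats it only as a placeholder for a trained classifier.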

Shut the Door

It’s not easy to apply traditional security protocols to the unstructured data within documents. The answer lies in applying emerging machine learning and natural language processing approaches to create the structure required to systematically close the “document back door” to information security.

