Last Updated on January 8, 2019 by Ashok Kumar
It is crucial that companies across the globe understand the difference between structured and unstructured data if they want to remain compliant with the many data protection laws and regulations that govern them.
Structured data, as the term suggests, is data organized according to a predefined schema. An obvious example of structured data is a database, where each record has a key that can be used to quickly find and retrieve it. Each key is associated with fields such as name, address, date of birth, Social Security number, and so on.
Unstructured data, on the other hand, is data that doesn’t fit into a particular schema, such as documents, applications, media files, and so on. Naturally, unstructured data is harder to secure, as we have less understanding of what it is, where it is, who it relates to, how it is shared, and so on. Not only that, but unstructured data is primarily managed and maintained by humans, who make mistakes and often cut corners instead of adhering to company policies.
While there has been some dispute over the percentage of unstructured data companies store on average, let’s just say that it is a lot. To add to this, a lot of structured data can end up being stored in an unstructured format. This happens when, for example, an administrator copies and pastes some information from a database into a file and shares that file with a colleague. What we need is an innovative and multi-pronged approach to securing our unstructured data.
Security Starts with Data!
Naturally, in order to protect our sensitive data, we need to know exactly where it is located. Not only that, but identifying where our sensitive data resides is a core requirement of the GDPR and other data protection regulations. Fortunately, technologies exist which can automatically discover, classify and encrypt sensitive data, such as PII, PHI, and payment card (PCI) data. They are able to scan emails, spreadsheets, PDFs, and so on, and create a map of where the sensitive data resides by adding metadata to the documents, which acts as a type of fingerprint.
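To make the idea of discovery and classification more concrete, here is a minimal sketch in Python. It assumes the documents have already been converted to plain text, and the patterns, labels and function names (classify_text, build_data_map) are hypothetical and deliberately simplistic. Commercial tools use far more robust detection than a handful of regular expressions.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify_text(text):
    """Return the sorted list of sensitive-data labels found in the text."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))


def build_data_map(documents):
    """Map each document name to its labels, skipping documents with no matches."""
    data_map = {}
    for name, text in documents.items():
        labels = classify_text(text)
        if labels:
            data_map[name] = labels
    return data_map
```

The resulting map plays the role of the metadata described above: it records which documents contain which kinds of sensitive data, so that policies can later be attached to them.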
It is also a good idea to implement a solution which can manage duplicate data sets. As you can imagine, having multiple copies of the same data located in different parts of our network will make it even harder to secure. Data de-duplication solutions scan for duplicates and replace the duplicate data with a reference/pointer to the original data.
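As a rough illustration of how de-duplication can work, the sketch below identifies duplicates by hashing file contents with SHA-256 and keeps a single copy per unique hash. The deduplicate function and its in-memory input are assumptions made for the example. Real solutions typically operate on blocks or chunks at the storage layer rather than on whole files.

```python
import hashlib


def deduplicate(files):
    """Take a dict of path -> bytes and return (store, references).

    store holds one copy of each unique content, keyed by its digest.
    references maps every path to the digest of the retained copy.
    """
    store = {}
    references = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        # Keep the first copy seen; later duplicates only get a reference.
        store.setdefault(digest, content)
        references[path] = digest
    return store, references
```

Two files with identical contents end up pointing to the same stored copy, which leaves fewer places where sensitive data has to be found and protected.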
Data discovery and classification not only allow us to track the movements of unstructured data but also allow us to set up policies which dictate how this data can be accessed, moved, modified or deleted, and by whom. We can then use those policies to assign access privileges to the data, and then audit changes to those privileges. While it is theoretically possible to keep track of such changes using the native server logs, doing so would be an arduous task.
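One way to picture such policies is as a lookup from a classification label to the roles allowed to perform each action, with every decision recorded for later auditing. The sketch below is only an illustration under that assumption. The Policy and PolicyEngine classes, the role names and the labels are all hypothetical, and anything not explicitly permitted is denied.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Maps an action ("read", "move", "modify", "delete") to the roles allowed to do it.
    allowed: dict


@dataclass
class PolicyEngine:
    policies: dict  # classification label -> Policy
    audit_log: list = field(default_factory=list)

    def is_allowed(self, role, action, label):
        """Decide whether a role may perform an action on data with this label."""
        policy = self.policies.get(label)
        # Deny by default: unknown labels and unlisted actions are refused.
        decision = policy is not None and role in policy.allowed.get(action, set())
        # Record every decision so that access can be audited later.
        self.audit_log.append((role, action, label, decision))
        return decision
```

For example, a policy for data labeled "pii" might let the "hr" role read and modify it while reserving deletion for "admin".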
It would make a lot more sense to monitor files, folders, user accounts and mailbox accounts using a sophisticated DCAP (Data-Centric Audit & Protection) solution. These solutions also provide real-time alerts and customizable reports, which businesses can use to satisfy compliance requirements. Finally, Data Loss Prevention (DLP) solutions can also help us keep track of our unstructured data by identifying unencrypted sensitive data as it leaves the network, and then blocking, quarantining or encrypting the data before it is allowed to be forwarded to the intended recipient.
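The DLP decision described above can be sketched as a small function. This is a simplified illustration, not how any particular product works. The single SSN pattern stands in for a real detector, and the rule for choosing between blocking, quarantining and encrypting (based on hypothetical lists of trusted and blocked recipient domains) is an assumption made for the example.

```python
import re

# Hypothetical stand-in for a real sensitive-data detector.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def dlp_decision(payload, is_encrypted, recipient_domain, trusted_domains, blocked_domains):
    """Return 'allow', 'block', 'encrypt' or 'quarantine' for an outbound message."""
    # Already-encrypted traffic and traffic without sensitive data pass through.
    if is_encrypted or not SSN_PATTERN.search(payload):
        return "allow"
    if recipient_domain in blocked_domains:
        return "block"
    if recipient_domain in trusted_domains:
        # Sensitive data may go to a trusted party, but only once encrypted.
        return "encrypt"
    # Unknown recipient: hold the message for manual review.
    return "quarantine"
```

In practice the choice between the three actions would be driven by the organization's own policies rather than a fixed rule like this one.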