The purpose of data classification is to ensure that we know exactly what data we have, where it is located, and how sensitive it is. Yet, despite how crucial this knowledge is, it is an area of data security that is often overlooked. Data Loss Prevention (DLP), in turn, refers to the methods we use to prevent the unauthorized disclosure of internal data, especially data that is considered sensitive. Data classification and DLP can work together to keep our confidential data out of the wrong hands.
What Does Data Loss Prevention (DLP) Software Do?
DLP solutions differ from vendor to vendor. Some include an Intrusion Detection System (IDS) to monitor network traffic for malicious activity, while others may be a full-blown Security Information and Event Management (SIEM) solution, which can detect and correlate events from applications and network hardware.
However, given that a large share of data breaches involve either malicious or negligent employees, we’ve seen a greater focus on User Behavior Analytics (UBA) in recent years. It’s also worth noting that some DLP solutions come with data discovery and classification tools out of the box. Of course, in an ideal world, you will have complete control and visibility over all parts of your network, including the data, applications, hardware, and any cloud services you use.
For many organizations, though, this is not a realistic goal, as most security teams operate on a limited budget, and have a limited number of trained personnel. As such, it is important to research different vendors, to see what they have to offer, and make a decision based on the resources that are available to you. Below are some of the key features offered by DLP solutions.
Securing data at rest: DLP can be used to monitor access controls and ensure that all sensitive data is encrypted.
Securing data in transit: This will include analyzing network traffic to look for any violations of security policies, such as the transfer of unencrypted sensitive data to an unauthorized location.
Securing data in use: A DLP solution can be used to monitor user behavior. A UBA solution will establish typical usage patterns for each user, and automatically detect and respond to events that deviate from these patterns. A sophisticated UBA solution can detect and manage inactive user accounts, failed login attempts, and potential ransomware attacks, and can instantly generate pre-defined reports that can be used to satisfy regulatory compliance requirements. It can also detect and respond to changes made on a variety of cloud platforms. UBA solutions are a good choice for many organizations, as they tend to be relatively affordable and easy to implement and use, and they are among the more effective ways of protecting sensitive data.
Endpoint security: In this case, a DLP solution will essentially act as a gateway, to ensure that all endpoints connecting to the network are secure. It can also be used to detect, block or quarantine unencrypted sensitive data leaving the network. Examples of endpoint security include antivirus software, firewalls, Intrusion Prevention Systems (IPS), and Security Information and Event Management (SIEM) solutions.
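To make the idea of a usage baseline more concrete, here is a minimal sketch of how a deviation from a user's typical activity might be flagged. It assumes the only signal is a daily count of file accesses per user, and the `is_anomalous` function, the sample history, and the threshold of three standard deviations are all hypothetical choices for illustration, not how any particular UBA product works.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Return True when today's count deviates strongly from the baseline.

    history:   past daily file-access counts for one user (hypothetical data)
    today:     the count observed today
    threshold: how many standard deviations count as unusual (an assumption)
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is a deviation.
        return today != baseline
    return abs(today - baseline) / spread > threshold


history = [40, 42, 38, 45, 41, 39, 43]
print(is_anomalous(history, 44))   # close to the usual range
print(is_anomalous(history, 400))  # far outside the usual range
```

A real solution would look at many more signals (time of day, location, type of operation), but the principle of comparing current behavior against a learned baseline is the same.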
How Data Classification Works
Fortunately, data classification is not something you need to do manually. There are a number of affordable, easy-to-use tools that can automatically discover and classify a wide range of data types, such as Personally Identifiable Information (PII), Protected Health Information (PHI), payment card data covered by the Payment Card Industry Data Security Standard (PCI DSS), and any other information that you are legally required to protect.
In fact, most data classification solutions will come with default settings that are optimized to meet the requirements of a wide range of data privacy laws.
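One common approach to automatic discovery is pattern matching followed by a validation step. The sketch below is a simplified illustration under that assumption: it looks for email addresses (labeled PII) and for card-like numbers that pass the Luhn checksum (labeled PCI). The regular expressions, the `discover` function, and the labels are illustrative only, and commercial tools use far richer rule sets.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number):
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def discover(text):
    """Return the set of sensitivity labels found in a piece of text."""
    labels = set()
    if EMAIL_RE.search(text):
        labels.add("PII")
    for match in CARD_RE.finditer(text):
        if luhn_valid(match.group()):
            labels.add("PCI")
    return labels


print(discover("Contact jane.doe@example.com about card 4111 1111 1111 1111"))
```

The checksum step matters because a bare digit pattern would also match order numbers and other long identifiers, which leads to false positives.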
Data classification solutions work by attaching metadata to the data we want to classify. This metadata can include the level of sensitivity, the file format, the author, the time of creation, and any other relevant information that can be used by our chosen DLP solution. In order for the DLP solution to use this metadata, the administrator must first set up access controls.
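As a rough sketch of what such a tag might contain, the example below builds a small record holding the four fields mentioned above. The `ClassificationTag` class, the `build_tag` helper, and the sample file name are hypothetical, and real products store tags in their own formats (for example in file properties or a central index).

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from pathlib import PurePath


@dataclass
class ClassificationTag:
    sensitivity: str   # e.g. "Public", "Private", "Restricted"
    file_format: str   # derived from the file extension
    author: str
    created: str       # ISO 8601 timestamp in UTC


def build_tag(path, author, sensitivity, created):
    """Build a classification tag for a file (illustrative helper)."""
    extension = PurePath(path).suffix.lstrip(".").lower()
    return ClassificationTag(
        sensitivity=sensitivity,
        file_format=extension or "unknown",
        author=author,
        created=created.astimezone(timezone.utc).isoformat(),
    )


tag = build_tag(
    "reports/payroll_2023.XLSX",
    author="j.smith",
    sensitivity="Restricted",
    created=datetime(2023, 4, 1, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(asdict(tag), indent=2))
```

Serializing the tag to JSON is just one convenient way to hand it to another system; the point is that the sensitivity label travels with the data.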
Authentication and Authorization
DLP solutions rely on protocols that manage authentication and authorization. Authentication is used to determine that the user is who they say they are, and authorization is used to determine which resources the user has access to.
Authentication and authorization are tightly coupled to ensure that the server knows who is trying to access the requested resource. Multi-user operating systems, such as Ubuntu, Windows, and Unix, allow the administrator to set up access controls.
Windows Server comes with Active Directory (AD), a directory service that manages authentication and authorization and allows the administrator to define Access Control Lists (ACLs) and set up logical groupings of domains, users, devices, and so on.
A DLP solution essentially acts as a layer between the client (the user who is requesting access to the resource), and the server (or multi-user operating system) that will authenticate the request.
Of course, setting up access controls is just the beginning. Your chosen DLP solution, which mediates between the client and the server, will need to use your ACL to detect and respond to anomalous events, preferably in real time.
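The relationship between authentication, authorization, and the mediating layer can be sketched in a few lines. This is a toy model under stated assumptions: the ACL is an in-memory dictionary, the `authenticated` flag stands in for whatever the directory service has already verified, and the names `ACL`, `authorize`, and `mediate` are hypothetical.

```python
# Hypothetical ACL: resource -> user -> set of permitted actions.
ACL = {
    "finance/payroll.xlsx": {"alice": {"read", "write"}, "bob": {"read"}},
}


def authorize(user, resource, action, acl=ACL):
    """Authorization: is this user allowed to perform this action?"""
    return action in acl.get(resource, {}).get(user, set())


def mediate(user, resource, action, authenticated, events):
    """Allow or deny a request and record every denial for later review."""
    if not authenticated:
        events.append((user, resource, action, "unauthenticated"))
        return False
    if not authorize(user, resource, action):
        events.append((user, resource, action, "not authorized"))
        return False
    return True


events = []
print(mediate("alice", "finance/payroll.xlsx", "write", True, events))
print(mediate("bob", "finance/payroll.xlsx", "write", True, events))
print(events)
```

Recording each denied request is what turns a plain access check into something a DLP solution can alert and report on.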
You are going to want a dashboard that provides intuitive reports on current access permissions, including a detailed historical account of all changes that have taken place, in order to carry out a forensic analysis in the event of a security incident and to comply with the relevant data privacy regulations. You need to know exactly what data has been accessed, when, and by whom.
Guidelines for Data Classification
We classify data to help us to determine what baseline security controls are appropriate for safeguarding that data. The process of classification starts by carrying out a risk assessment, which may take some time, depending on the types, volume, and location of the data you store.
Essentially, we need to know what the impact would be, were a particular piece of data to be disclosed to the public. While there isn’t a strict set of rules to follow, a common classification schema would include Public, Private and Restricted. Naturally, the disclosure of data classified as Public would result in little to no harm to the organization.
The unauthorized disclosure of Private data would result in a moderate level of risk, while the disclosure of Restricted data could be potentially catastrophic. Whichever classification schema you choose to adopt, make sure that you keep it simple so that employees can apply it correctly. That said, some sophisticated data classification solutions can automatically classify the data at the point of creation.
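The three-level schema described above can be sketched as an ordered enumeration, so that levels can be compared directly. The `Level` and `IMPACT` names are hypothetical, and the rule that a document inherits the highest level of anything it contains is an assumption made for this example, not something the schema itself prescribes.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 1
    PRIVATE = 2
    RESTRICTED = 3


# Expected impact of unauthorized disclosure at each level.
IMPACT = {
    Level.PUBLIC: "little to no harm",
    Level.PRIVATE: "moderate risk",
    Level.RESTRICTED: "potentially catastrophic",
}


def overall_level(levels):
    """Assumed rule: a document takes the highest level found in it."""
    return max(levels, default=Level.PUBLIC)


doc_level = overall_level([Level.PUBLIC, Level.RESTRICTED, Level.PRIVATE])
print(doc_level.name, "->", IMPACT[doc_level])
```

Keeping the schema to a handful of ordered levels is what makes a rule like this easy to state and easy for employees to apply.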
How Data Classification Helps with Data Loss Prevention
If your ultimate objective with your data security strategy is to prevent breaches involving sensitive data, then the logical first step should be to determine exactly where that data resides. You cannot secure what you cannot see. Data discovery should be coupled with a robust and logical data classification strategy so that your sensitive data is grouped into categories that will help you prioritize risk.
Data classification can help your security team determine where they need to focus their monitoring and security efforts, and there are numerous methods of categorization that can help you do this. For example, data can be categorized by how it relates to specific compliance regulations that your organization has to adhere to. The driver’s license of a French national, for instance, could be filed away under GDPR. By doing this, you can ensure that you can quickly identify all of the data within your data stores that relates to GDPR and apply the appropriate security controls.
Another method of data categorization could be focused specifically on risk: how would a breach involving this data affect your organization from a compliance and security standpoint? The impact to your business of losing a healthcare record of a US citizen, for example, could be devastating, likely resulting in a breach of HIPAA compliance and a hefty monetary penalty if you are an enterprise.
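Both methods of categorization can be sketched together. The example below tags records with the regulations they fall under and then sorts them by a risk score. It is a deliberate simplification that follows the two examples above: the rules, the partial `EU_COUNTRIES` set, the `RISK` weights, and the sample records are all hypothetical, and real compliance scoping depends on far more than a country code.

```python
# Illustrative subset only; a real rule set would cover every relevant country.
EU_COUNTRIES = {"FR", "DE", "ES", "IT"}

# Hypothetical risk weights per data type (higher means more damaging).
RISK = {"health_record": 9, "drivers_license": 7, "newsletter_signup": 2}


def regulations_for(record):
    """Return the regulations a record is filed under (simplified rules)."""
    tags = []
    if record["subject_country"] in EU_COUNTRIES:
        tags.append("GDPR")
    if record["data_type"] == "health_record" and record["subject_country"] == "US":
        tags.append("HIPAA")
    return tags


def prioritize(records):
    """Sort records so the highest-risk data comes first."""
    return sorted(records, key=lambda r: RISK.get(r["data_type"], 0), reverse=True)


records = [
    {"id": 1, "data_type": "newsletter_signup", "subject_country": "DE"},
    {"id": 2, "data_type": "drivers_license", "subject_country": "FR"},
    {"id": 3, "data_type": "health_record", "subject_country": "US"},
]
for record in prioritize(records):
    print(record["id"], record["data_type"], regulations_for(record))
```

With tags and a ranking in place, the security team can start with the records at the top of the list when deciding where to apply controls.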
Data classification, therefore, serves as a starting point for your data loss prevention strategies. Once you have located and categorized your most sensitive and most at-risk data, you can then determine who has access to it and what changes are being made to it. This will help ensure that you can reduce risk where it matters most.
How Lepide Helps with Data Classification and Data Loss Prevention
Lepide enables organizations to discover and classify their sensitive data across a variety of data stores. Data can be classified according to a list of predefined classification rules (such as a specific compliance mandate or data type), and custom rules can be created to suit specific requirements.
However, data discovery and classification alone are just the first step of data loss prevention, as we touched on above. Lepide’s Data Classification software also allows you to determine who has access to your most sensitive data and what your users are doing with it: are they copying, moving or modifying files that contain important information? Should they even have access to these files in the first place?
If you would like to see how the Lepide Data Security Platform helps with Data Classification and Data Loss Prevention, you can schedule a demo with one of our engineers today or download a free trial.