Data Compliance
To solve privacy issues, companies must take a more thoughtful, proactive approach to protect data before it's ever shared across their organizations or accessed by outside sources.
Eric Schrock
Mar 11, 2019
This article originally appeared on Forbes.com as part of Delphix CTO Eric Schrock’s ongoing column. See the original post here.
Too many companies have experienced firsthand that insufficient enterprise data security has serious consequences. From Facebook's stream of data lapses to last month's massive "Collection #1" breach, the importance of data and the need to protect it have skyrocketed from an afterthought to a top priority for executives across industries. (Full disclosure: Facebook is a customer of Delphix.)
Leaders have rightly turned their attention to strengthening data security controls that ensure only the right people have access to sensitive data. But they still struggle with managing the inherent risk carried by data should those security controls fail — especially when data is shared with third parties.
When data flows outside the company, you lose the chain of custody, and the risk of breaches or misuse rises substantially. Relying on legal contracts is insufficient in today's world of heightened privacy and increasing regulation. Although restricting access to data or shutting down third-party integrations seems like the best course of action, it's ultimately just a band-aid. To solve privacy issues, companies must take a more thoughtful, proactive approach to protect data before it's ever shared across their organizations or accessed by outside sources.
First, define the nature of the controls, people and risk associated with the data you're gathering. When MoviePass faced backlash over its use of location data within its mobile app, it raised the question: What reviews and controls exist to ensure that data is being secured and used appropriately? While the General Data Protection Regulation (GDPR) has forced companies to give users more control over how their data is shared, companies shouldn't wait for the next regulation to drop. Instead, businesses must learn to transparently communicate their data practices, gather only the minimum data required and ensure that data is secured and used appropriately.
Second, understand who has access to what types of data within your company and what risk that access presents. Data is always changing in content and structure. It moves and transforms as it's split apart, combined and distributed to those who need it. Unless you think proactively about where data should be, it will inevitably end up where it's not supposed to be. Before blindly making data available, make sure you answer key questions such as: Who needs access and why? What sensitive information is contained within the data? What would be the impact if security controls failed or an authorized user acted unethically? Once you open Pandora's box, there's no going back.
Finally, follow where data flows outside the company. Once data leaves your hands, you lose chain of custody forever to a third party. That third party may lack the same security controls you've worked so hard to put in place. Start by understanding the purposes for which that data is to be used and how long the data should live there. Figure out how data needs to be obfuscated and redacted, but know that data privacy is not black and white. Don't assume that your data is risk-free just because you've eliminated standard personal information like credit card and Social Security numbers. Think carefully about all the ways sensitive information could be derived from the dataset.
When providing data to third parties, start by ensuring that you're only sending the minimal information necessary for them to get their job done. For many types of behavioral analysis, for example, reams of personal data can simply be replaced with a token that uniquely identifies a user within the dataset. But this can be challenging when you’re not sure what data might be relevant to the analysis at hand. When in doubt, less is more. Start small and gradually add more data as you determine what is really needed.
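As a rough illustration of the token idea, the sketch below swaps a user identifier for a keyed hash and passes along only an allow-list of fields. Everything here is an assumption made for the example: the function name `pseudonymize`, the field names, and the choice of HMAC-SHA256 as the tokenization scheme. It is one simple way to do this, not a description of any particular product.

```python
import hashlib
import hmac


def pseudonymize(records, secret_key, id_field="user_id",
                 keep_fields=("event", "timestamp")):
    """Return copies of records that carry a token instead of the identifier.

    secret_key is bytes and stays inside your company; without it, the
    third party cannot recompute tokens from known identifiers.
    Field names are hypothetical placeholders for this sketch.
    """
    shared = []
    for record in records:
        # The same user always maps to the same token, so behavior can
        # still be analyzed per user without exposing who the user is.
        token = hmac.new(
            secret_key,
            str(record[id_field]).encode("utf-8"),
            hashlib.sha256,
        ).hexdigest()
        row = {"user_token": token}
        # Allow-list: anything not explicitly named is left behind.
        for field in keep_fields:
            if field in record:
                row[field] = record[field]
        shared.append(row)
    return shared
```

The allow-list reflects the "less is more" advice: a new field reaches the third party only after someone deliberately adds it to `keep_fields`.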
While strong security controls are a given, redacting and obfuscating sensitive data is equally critical to effectively managing risk. Data masking is an important method for protecting sensitive data from malicious actors. With data masking, you can still have realistic data points that are anonymized. That means wrongdoers can't disrupt a business even when they bypass other security measures and extract your data. When data has been masked, it's like stealing a wallet that has no cash inside.
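To make "realistic but anonymized" concrete, here is a minimal sketch of format-preserving masking: digits are replaced with other digits and letters with other letters, while punctuation and length stay the same, so a masked Social Security number still looks like one. The function name `mask_value` and the approach of seeding a random generator from a keyed hash are assumptions for illustration. Python's `random` module is not a cryptographic generator, so treat this as a teaching example rather than a production masking algorithm.

```python
import hashlib
import hmac
import random
import string


def mask_value(value, secret_key):
    """Replace each letter and digit in value with a substitute of the same kind.

    The substitution is deterministic for a given secret_key (bytes), so the
    same input masks to the same output wherever it appears.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(rng.choice(pool))
        else:
            # Keep separators such as "-" or " " so the format stays realistic.
            masked.append(ch)
    return "".join(masked)
```

Because the output is deterministic, the same customer number masks to the same value in every table, which keeps joins working in test and analytics environments. The trade-off is that anyone who obtains the key can recompute masked values for known inputs, so the key needs the same protection as the original data.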
But masking is only part of the solution because even anonymized data carries the risk of reverse-engineering sensitive information. For example, let’s say you’re sending data to an HR agency to do a salary analysis. This may only require job titles and annual pay figures for employees, which you’ve shared in a way that protects employee privacy, or so you thought. With only one CEO, it’s trivial for someone to learn that person’s salary from this data. Additional context opens the floodgates to de-anonymize data, as researchers have shown with real-world municipal data sets. What might appear as innocuous anonymized location data can be combined with outside knowledge (someone’s daily routine, for example) to then derive private information, such as where they went last Friday night. Ensure that datasets are sufficiently limited or redacted to mitigate the risk of reverse engineering.
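One common way to catch the single-CEO problem before data goes out is a k-anonymity-style check: count how many rows share each combination of identifying attributes and flag any combination that appears fewer than k times. The sketch below is an illustration under assumptions, not a technique named in the original column; the function names, the `job_title` field and the default threshold of 5 are all hypothetical.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k=5):
    """Return attribute combinations shared by fewer than k rows.

    quasi_identifiers lists the fields an outsider could match against
    other knowledge, such as a job title.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def suppress_small_groups(rows, quasi_identifiers, k=5):
    """Drop every row whose attribute combination is too rare to share safely."""
    risky = small_groups(rows, quasi_identifiers, k)
    return [
        row for row in rows
        if tuple(row[q] for q in quasi_identifiers) not in risky
    ]
```

In the salary example, `small_groups(rows, ["job_title"])` would flag the CEO row, which could then be dropped or merged into a broader category such as "Executive". A check like this reduces the risk but does not eliminate it: if everyone in a large group has the same salary, the group still reveals it.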
At the heart of the issue is a single question: How do we keep sensitive information safe? Following recent data breaches and mismanagement, we can and should continue to increase security controls such as data encryption and access control. However, this only addresses one part of the data privacy story, and going too far can easily stifle innovation when the users of data can’t get the access they need.
Strong data security controls must be combined with data privacy techniques that mitigate the inherent risk that data can carry at every turn. Proper data obfuscation and redaction must be done before data leaves your hands because humans and systems will inevitably fail, putting you on the front page — with users and regulators calling for blood.