How to Choose the Right Data Masking Tool for Salesforce Data

Three critical questions to ask when selecting an enterprise-scale solution for masking Salesforce sandboxes.

Tess Maggio

Jan 24, 2022

At most businesses, Salesforce data is the centerpiece of the customer record. With cybersecurity threats and data breaches on the rise, this data (which often contains critical PII) can also be a major compliance and privacy risk.

Companies rely on Salesforce data to test applications and deliver the innovations that grow their business. As a result, many people across a business need access to Salesforce data and sandboxes. External consultants are brought in to build out complex Salesforce functions. Analysts require it for BI purposes. Salesforce developers need to validate that new processes function. QA testers need to check that a form going live to all customers works. Application developers across the org need to use it in a range of applications.

Test data management and security professionals need strong masking solutions to safeguard this data. Having worked with hundreds of companies to solve this type of problem, we put together a quick guide on what to look for when selecting a Salesforce masking solution.

A Quick Intro to Data Masking

Data masking renders sensitive customer data useless to anyone who gets hold of it without authorization, while preserving the value of the data for legitimate uses. It’s a proven method for obfuscating data in test environments.

Masking tools replace real data with fake yet realistic values for use in testing, demos, or analytics. Such solutions change the values themselves but preserve the format and relationships of the original data. Algorithms are used to alter this data in ways that cannot be reverse-engineered. By masking data, you can give everyone in the business access to realistic data without increasing your security risk.

Three Critical Questions to Ask

Here are three critical questions to ask when selecting an enterprise-scale solution for masking Salesforce sandboxes:

1. Will the solution work across your enterprise to facilitate integration testing?

Many applications rely on Salesforce data along with other data sources. The solution you choose needs to work for all of these data sources; otherwise, your integration testing scenarios will be incomplete. It’s also essential that referential integrity is retained between these sources.

Referential integrity means that a specific record is masked the same way wherever it occurs. For example, whether a record for Robert occurs in Salesforce or Oracle, Robert must always be masked to Steven, and Robert’s SSN must be masked the same way in all data sets. In addition to preserving the primary and foreign keys that are needed to use and integrate the data sets effectively, this consistency ensures that in an integration testing scenario you are actually testing how all the pieces fit together.
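To make the idea concrete, here is a minimal Python sketch of consistent masking across two sources. It is an illustration only, not a description of how any particular product works. The key, the name pool, the field names, and the helper functions are all hypothetical. It assumes a keyed hash (HMAC-SHA256) is used to pick a replacement, so the same input always yields the same masked value regardless of which system it came from.

```python
import hashlib
import hmac

# Hypothetical secret key; a real deployment would keep this in a secrets manager.
MASKING_KEY = b"example-key-not-for-production"

# Hypothetical pool of realistic replacement first names.
FIRST_NAMES = ["Steven", "Maria", "Alex", "Priya", "Jordan", "Elena", "Marcus", "Noor"]


def _digest(value: str, field: str) -> bytes:
    """Keyed hash of a normalized value, namespaced by field type."""
    message = f"{field}:{value.strip().lower()}".encode("utf-8")
    return hmac.new(MASKING_KEY, message, hashlib.sha256).digest()


def mask_first_name(name: str) -> str:
    """Map a name to a pool entry; the same input always gives the same output."""
    index = int.from_bytes(_digest(name, "first_name")[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def mask_ssn(ssn: str) -> str:
    """Replace an SSN with a fictitious value in the same NNN-NN-NNNN format."""
    digits = "".join(ch for ch in ssn if ch.isdigit())
    number = int.from_bytes(_digest(digits, "ssn")[:8], "big") % 10**9
    masked = f"{number:09d}"
    return f"{masked[:3]}-{masked[3:5]}-{masked[5:]}"


if __name__ == "__main__":
    # The same person stored in two systems with slightly different formatting.
    salesforce_contact = {"FirstName": "Robert", "SSN__c": "123-45-6789"}
    oracle_customer = {"first_name": "ROBERT ", "ssn": "123456789"}

    same_name = mask_first_name(salesforce_contact["FirstName"]) == mask_first_name(
        oracle_customer["first_name"]
    )
    same_ssn = mask_ssn(salesforce_contact["SSN__c"]) == mask_ssn(oracle_customer["ssn"])
    print(same_name, same_ssn)
```

Because both values are normalized before hashing, the two records mask identically and can still be joined after masking. A small pool like this one can occasionally map a name to itself, and a keyed hash is only as safe as its key, so treat this as a sketch of the consistency property rather than a production design.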

When choosing a solution, think about the applications that you test and deploy that rely on multiple data sources. Do you have a way to ensure that records are masked the same way between those data sources? Otherwise, your testing scenarios will be more time-consuming and riskier, since you can’t see how the whole system works before pushing to live.

If the solution you are looking at can only mask Salesforce, you will not be able to ensure that your masked Salesforce data can work with SQL Server, Postgres, or any other data source or application that interacts with it. Watch out for this, as it’s common for point-solution vendors to add a masking component that will work only in the Salesforce realm. If masking isn’t the vendor’s core competency, you may also be exposing yourself to more risk, which gets us to our next critical question.

2. Does the solution offer strong, irreversible masking?

There are two key points to consider here. One, the solution must replace production data with realistic data values, and two, it must mask in a manner that can’t be reverse-engineered.

To the first point, make sure the solution you choose doesn’t simply scramble the data, but rather provides fictitious yet realistic, business-specific data. The resulting masked values should provide no value to hackers, but should be functional for any non-production use case. This is an essential time-saver across QA and user acceptance testing (UAT). Imagine pushing an application to live without being able to see that the first and last name fields are indeed names, or what order they appear in!

To the second point, ask about the masking algorithms themselves - can they be reverse-engineered? Make sure the vendor can explain in sufficient detail how its algorithms operate. There is no point in masking the data if, once it’s obfuscated, a hacker could simply back out the original value or break through the algorithm. A masking algorithm should be designed to be irreversible, purposely destroying information so the original data is not retrievable from the masked dataset.
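The difference between reversible scrambling and information-destroying masking can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the letter-shift scrambler stands in for any weak, reversible transform, and the keyed lookup into a small name pool stands in for a lossy substitution. Neither is claimed to be what a specific masking product does.

```python
import hashlib
import hmac

# Hypothetical key and replacement pool, for illustration only.
KEY = b"example-key-not-for-production"
POOL = ["Steven", "Maria", "Alex", "Priya"]


def shift_scramble(text: str, shift: int = 3) -> str:
    """Weak 'masking': shift each ASCII letter by a fixed amount."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


def unshift(text: str, shift: int = 3) -> str:
    """Anyone who guesses the shift recovers the original exactly."""
    return shift_scramble(text, -shift)


def lossy_mask(name: str) -> str:
    """Many-to-one substitution: many inputs share each output."""
    digest = hmac.new(KEY, name.encode("utf-8"), hashlib.sha256).digest()
    return POOL[digest[0] % len(POOL)]


if __name__ == "__main__":
    scrambled = shift_scramble("Robert Smith")
    print(unshift(scrambled))  # the original value comes straight back

    names = ["Robert", "Alice", "Chen", "Fatima", "Diego", "Yuki"]
    print(sorted({lossy_mask(n) for n in names}))  # at most four distinct outputs
```

With the shift transform, the masked value carries all of the original information, so it can be backed out. With the many-to-one substitution, several original names collapse onto each replacement, so the masked value alone does not identify which name it came from.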

3. Does the solution offer automated sensitive data discovery?

Masking is not a one-time process. Your Salesforce data will be constantly changing and your testers will need fresh compliant test data to ensure that the test environment they are working with reflects reality. Therefore, data discovery and masking both need to be fast and automatic - you don’t want to be manually sifting through all of your data to find what fields are sensitive. Choosing a solution with automated data discovery capabilities helps you quickly ensure compliance (allowing you to complete the first data mask faster), and makes it easier to keep your masked data up to date.

Some companies work with test data that is one to two years out of date because it’s so tedious to go through and find sensitive data after a large schema change. This common tradeoff between speed and compliance can lead to very time-consuming production incidents down the line!

The Salesforce schema changes every time a new field is added (e.g., each time the business decides to collect a new customer detail). Similarly, it changes every time a new AppExchange solution is installed - these bring their own objects and fields, which often contain PII.

You need an automated approach that scans for sensitive data and supports frequent refreshes. You don’t want to go through an arduous manual review process or purchase a separate tool to handle sensitive data discovery at scale.
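As a rough illustration of what automated discovery involves, the Python sketch below flags fields as sensitive either because the field name looks sensitive or because most sampled values match a known pattern. The patterns, the threshold, the category names, and the sample field names are all assumptions made for this example. Real discovery tooling would use far richer profiling than a handful of regular expressions.

```python
import re

# Hypothetical hints based on field names.
NAME_HINTS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social[-_ ]?security", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
}

# Hypothetical patterns applied to sampled values, checked in this order.
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?[\d\s().-]{10,}$"),
}


def discover_sensitive_fields(records, threshold=0.8):
    """Return {field_name: category} for fields that look sensitive."""
    findings = {}
    field_names = sorted({name for record in records for name in record})
    for field in field_names:
        # First pass: does the field name itself suggest sensitive content?
        for category, hint in NAME_HINTS.items():
            if hint.search(field):
                findings[field] = category
                break
        if field in findings:
            continue
        # Second pass: do most non-empty sampled values match a pattern?
        values = [str(r[field]) for r in records if r.get(field) not in (None, "")]
        if not values:
            continue
        for category, pattern in VALUE_PATTERNS.items():
            hits = sum(1 for value in values if pattern.match(value))
            if hits / len(values) >= threshold:
                findings[field] = category
                break
    return findings


if __name__ == "__main__":
    sample = [
        {"Name": "Acme", "Email": "a@example.com", "Tax_Ref__c": "123-45-6789", "Stage": "Open"},
        {"Name": "Globex", "Email": "b@example.com", "Tax_Ref__c": "987-65-4321", "Stage": "Closed"},
    ]
    print(discover_sensitive_fields(sample))
```

Note that the custom field in the sample is caught by its values rather than its name, which is the case a manual review of field labels tends to miss. Re-running a scan like this after each schema change is what keeps newly added fields from slipping into test environments unmasked.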

More and more companies realize that to keep up with regulations, they need a robust way to find all their sensitive data so that nothing leaks through into test environments. The global sensitive data discovery market is projected to grow from $5.1B in 2020 to $12.4B by 2026, with a CAGR of 16.1%, and the highest CAGR is expected in Cloud (vs on-prem). But such tools are costly and stop at discovering sensitive data. What you need is a solution for masking PII that can also handle sensitive data discovery and that, as part of the discovery process, assigns the right masking algorithm to keep your data safe.

For more information on data masking and how Delphix can help you mask your Salesforce data, download our solution brief here.