Data Compliance
Discover how to efficiently mask SQL data at scale. Explore the best practices based on our extensive experience with large enterprises.
Ilker Taskaya
Dec 30, 2024
Share
Almost every enterprise has SQL databases, which means they need SQL data masking. Some SQL databases have built-in masking options, such as Microsoft SQL Server dynamic data masking. But is that the right approach? And what about all of your other databases and data sources?
In this blog, we’ll break down the best practices for SQL data masking at scale and share recommendations based on our experience working with large enterprises.
SQL data masking is the practice of replacing sensitive data from SQL data sources with fictitious but realistic values. The masked data retains its usability for use cases like testing, but it prevents the proliferation of sensitive data, such as personally identifiable information (PII), from non-production environments.
Dynamic data masking limits sensitive data usage by limiting access to production data to some users. It changes the data served to the user, whereas the data in the database is still sensitive. Dynamic data masking is the built-in option for masking with databases like Microsoft SQL Server.
Static data masking persists sensitive data with fictitious values at the data source level. It ensures the data's syntax is the same. It is even possible to retain some semantic richness, such as keeping a customer's age the same while changing their Date of Birth. For SQL masking, static data masking must be done with an additional toolset, such as Delphix masking.
Compare static vs. dynamic data masking >>
Native SQL Server masking is dynamic, which isn’t enough to truly protect sensitive data. That’s why adding static masking for SQL is important.
Using static SQL masking further reduces your risk because even the data in the database is fictitious. Thus, you can create zero-trust environments and ensure data privacy in compliance with regulations such as GDPR and HIPAA. And you can consistently mask different data sources to protect your entire non-production environment. Not just the SQL server databases.
35% of organizations surveyed in the 2024 State of Data Compliance and Security Report cited Microsoft as a tool frequently used for data masking. But is built-in masking the right choice? Find out what else 250+ enterprises said about masking and compliance in the report.
Get the Data Compliance Report
The data is protected even if hackers perform a spear phishing attack targeting developers, QA, and DBA resources. In addition to UI access, these individuals can have full-stack access from the database to the application logs.
Users can perform like-for-like testing in production and non-production environments. Dynamic data masking has a slight overhead on the database server.
Applications often don’t support multiple database users. This support for multiple database users is an underlying requirement for dynamic data masking, creating challenges.
Dynamic data masking methods are minimal. They usually involve redaction, such as (xxxx-xxxx-xxxx-1234), or dates, such as 1900-01-01. These kinds of transformations can create issues with testing because of the rules in the application to ensure the data is valid.
You can consistently mask different data sources such as a loan document MongoDB as well as a SQL Server customer table. Because a native SQL server dynamic data masking solution is not available for other data sources, it doesn’t reduce the risk in environments with multiple data sources.
A property and casualty insurance company I worked with had longitude and latitude information in their SQL Server databases. Native dynamic data masking didn’t offer an option to protect addresses defined as longitude and latitude. But Delphix did.
This customer also had large XML documents embedded in a CLOB column in the database. They needed to consistently mask the XML attributes containing sensitive data, such as policyholder demographics, with the varchar columns in the customer dimension table.
Delphix made it possible. This customer was able to mask millions of XML documents in the database consistently.
The success of data protection schemes in non-production succeeds or fails on the following factors:
Ability to discover sensitive data across data sources and apply a consistent security strategy.
Choosing fictitious yet realistic masked values. Test cases and analytical reports work seamlessly with Delphix masked data to avoid exceptions to your security policies.
Scalability of your data masking solutions to handle hundreds of terabytes of data.
While SQL server dynamic data masking provides limited data protection capabilities, the solution can be dead on arrival without data discovery capabilities.
An average database schema today has ~500 user tables. These tables in an OLTP design can have an average of 30 columns per table. Thus, the user masking the data must identify the sensitive columns out of 15,000 potential columns.
Sometimes, the metadata is physical, and the column name doesn't relay that it contains sensitive data. Now, the user has to scan not only the metadata but also the data in these columns. This becomes an onerous task quickly. Furthermore, once users identify a sensitive column, they must assign the correct masking algorithm to protect it.
Delphix provides out-of-the-box capabilities to scan metadata and data across hundreds of data domains. Then, based on your company's security policy, the correct masking algorithm is automatically assigned. These scans can also be integrated with a data governance product such as Collibra or BigId to ensure a single location for data security audits.
Delphix products are all designed with publicly available APIs, so our customers create closed-loop data governance. Gain policy-level controls for regulations like GDPR and HIPAA. And assign a masking algorithm based on your specific security policy.
This assignment is data source agnostic. You can consistently assign a security policy to an Oracle database, a MongoDB document database, data files on an AWS S3 bucket, and many more. This allows the creation of a consistent approach to data security wherever the data resides.
Related blog >> Oracle Data Masking
Data privacy regulations mandate that sensitive data must be protected. Masking sensitive data — whether it’s in SQL or other databases — is critical to fulfilling compliance requirements.
Delphix masking product algorithms are flexible — without user programming. Our masking algorithms easily handle different data types, whether CHAR, VARCHAR, DATE, TIMESTAMP, CLOB, BLOB, or index-organized tables.
Get unique synthetic data for primary keys or unique indexes from out-of-the-box algorithms. We have customers worldwide and provide masking algorithms in different languages and countries out of the box. Our masking algorithms are used in various industry verticals, such as healthcare, financial services, and defense.
So, you can easily mask a diagnosis code for HIPAA compliance, a CUSIP within a SWIFT app, or a helicopter part ID for ITAR compliance.
Due to the size and number of databases that a typical enterprise has, it’s important to have data masking that can scale across multiple SQL databases and to other data sources.
At Delphix, we’ve provided solutions in data masking for the last two decades. When we started, our products only masked a few hundred million rows using JDBC drivers to connect and mask. The JDBC driver to mask data is available from Delphix for smaller databases.
Customers focused on developer productivity can find it challenging to mask billions of rows of data in an SQL Server database in less than a day. Delphix provides multiple solutions to handle scale. We offer Hyperscale masking solutions to mask a billion rows of data per hour. Our customers can also mask data using our Delphix compliance solutions for Microsoft Fabric via an API on any data source on Azure.
For scale, Delphix solutions also provide the capability to change the database server before masking and return to the original state after masking is completed:
Change the database to ‘Simple Recovery’ to minimize the logging level.
Create identity columns when missing to increase the performance of updates.
Use logical keys to avoid a full table scan for updates.
Insert sorted rows based on the key for clustered indexes is beneficial for data warehouses.
Drop constraints and indexes across the tables with sensitive data, thus ensuring consistent transformations and putting the database objects as they are in production after masking.
Execute bulk update statements to avoid committing after each row.
Drop and create triggers.
Delphix is the best solution for the enterprise's SQL data masking at scale. We have masked petabytes of SQL data on-premises and on Azure. Our solutions have been truly tested with real data at scale across different data models and industry verticals. Our customers are large and global, and thus, they have protected data with different semantics, languages, and PII.
Delphix helps enterprises solve masking challenges. That’s because Delphix was built for global enterprises.
All Delphix compliance products are based on security policy. Companies can reuse the policy across multicloud and on-premises resources. The compliance products are built to be data source agnostic when transforming data. They can read and write data natively to all the data sources you need.
Meanwhile, transformations are always consistent, so you end up with referentially integrated data. Our customers choose the compliance product for the right scale and ease-of-use for their customers.
Sensitive data volumes are increasing in non-production environments, according to 75% of enterprises surveyed in the 2024 State of Data Compliance and Security Report. And the growth of sensitive data leads to an expanded exposure footprint, worrying 91% of enterprises.
Delphix helps you eliminate the risks of ever-growing sensitive data in SQL databases and beyond. That’s because Delphix provides pre-built and customizable masking algorithms that you can use to mask production data for non-production environments. This ensures that you meet regulatory requirements (e.g., GDPR, HIPAA) for detecting PII, protected health information (PHI), and other sensitive data.
Manually masking data and delivering it to lower environments is time-consuming. This slows down the ability to innovate. One of the top concerns in the 2024 State of Data Compliance and Security Report was the impact of compliance on development speed, with 32% citing it as a top challenge.
Delphix automates masking workflows. With Delphix, you can mask and deliver virtualized SQL databases in minutes, not hours. This helps you ensure compliance while achieving faster development speeds.
Enterprises can’t sacrifice quality to go faster — they need it all: compliance, quality, and speed. That’s why 36% of enterprises surveyed in the 2024 State of Data Compliance and Security Report cited quality as a top concern.
When enterprises use poor-quality test data, it weakens the quality of the software they are delivering. With Delphix, masked production data is quality test data. Using masked test data — that has referential integrity with its original source — in Delphix helps you ensure consistent outcomes in test environments.
Discover more >> What Is Delphix?
SQL databases aren’t the only data source used in large enterprises. Most businesses we work with also have some combination of Oracle databases, mainframe, PaaS data sources, and/or analytical sources like Snowflake and Databricks.
Many of the customers I work with use Delphix to mask both SQL databases and other data sources. Here are a few recent examples.
Boeing Employee Credit Union (BECU) had a breadth of supported databases and flat files. They needed a solution that could mask SQL Server — and Oracle, plus 100+ flat files across nine key applications including CRM, loan origination, and member portal systems.
With Delphix, they get a solution that can handle their data masking needs at scale. They masked 680 million data rows in just 15 hours. And, they gained speed and efficiency across the board, enabling 200+ developers with self-service data.
“Not only does Delphix reduce our risk footprint by masking sensitive data, but we can also give developers realistic, production-like environments."
Kyle Welsh, CISO, BECU
Gain Capital needed to accelerate data delivery times for their SQL databases. It would take up to 4 hours to deliver data environments for a single project. This led to slower development cycles and delays.
With Delphix, they get a solution that can deliver SQL database copies in just 3 minutes. Faster data delivery and self-service access for developers means they can accelerate development.
“Delphix has increased our output to the business by 20%, leaving more time for innovation, which in turn drives business growth."
Anup Anand, Database Director, Gain Capital
Are you ready to take SQL data masking to the next level?
Find out how Delphix can help you achieve SQL data masking at scale and enable your enterprise to innovate faster.
Reach out to our team to learn more.