The Mystery of Data Masking for Business Managers
Data Masking has recently become a well-recognized buzzword. All major IT players have released products to support it; Oracle, IBM and Camouflage are just a few vendors with smart and flexible masking products. However, there are not many stories of successful masking adoption (not just implementation!). There is a good reason for that.
Data Masking is a non-technical challenge.
Data Masking, according to Wikipedia, is a “process of obscuring (masking) specific data elements within data stores”. This sounds very technocratic, so it is no wonder that Data Masking projects are often driven and executed by the IT organization. But IT does not benefit from the implementation at all! Masking procedures create extra workload for administrators, testers and developers and give them no technical reason to adopt them.
On the other hand, Data Masking is mandated by compliance regulations and security policies and must (not merely should) be handled by the business. Compliance, quality and security offices are the well-known watchdogs that lead and supervise corporate processes and vital business initiatives. These offices should own Data Masking programs and play a very active role in masking implementation.
Data Masking should never be enforced blindly.
Compliance standards such as PCI DSS and HIPAA, along with rules on personally identifiable information (PII), define data categories that should be handled with extra caution. This does not mean that such data fields should be masked all the time. For example, it is impossible to mask production data used for production purposes. It may sound strange, but imagine a call center using masked data to serve its customers: the company would instantly lose credibility. Enron was yet another example of creative “data masking” applied to accounting records, and we all remember how that saga ended.
However, customer loyalty will also suffer if private information (like credit card numbers) leaks into the public domain. Data protection does not equal Data Masking. There are many other security methods to ensure data safety.
Data Masking is a destructive, irreversible transformation intended to break even a loose association with real production data. It makes masked data sets safe (in theory) for distribution (for example, to an offshore development team), but totally useless for production purposes.
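For readers who want to see what “destructive and irreversible” means in practice, here is a minimal sketch. It assumes a customer record with hypothetical `name`, `card_number` and `region` fields; the `FAKE_NAMES` list and both function names are invented for illustration and do not come from any vendor product. Real values are replaced with random ones, and no key or lookup table is kept.

```python
import secrets

# Hypothetical pool of replacement names; a real project would use a far larger list.
FAKE_NAMES = ["Alex Morgan", "Jamie Lee", "Sam Carter", "Robin Diaz"]


def mask_digits(value: str) -> str:
    """Replace every digit with a random digit, keeping length and separators."""
    return "".join(
        secrets.choice("0123456789") if ch.isdigit() else ch
        for ch in value
    )


def mask_record(record: dict) -> dict:
    """Return a masked copy of the record.

    No key or mapping back to the original values is stored,
    which is what makes the transformation one-way.
    """
    masked = dict(record)  # copy, so the original record is left untouched
    masked["name"] = secrets.choice(FAKE_NAMES)
    masked["card_number"] = mask_digits(record["card_number"])
    return masked


customer = {
    "name": "Jane Doe",
    "card_number": "4111-1111-1111-1111",
    "region": "North-East",
}
masked_customer = mask_record(customer)
print(masked_customer)
```

The masked record still looks like a customer record (same card number layout, same region), so an application can be tested against it, but a call center could not use it to serve the real Jane Doe.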
Use “old-fashioned” security for production instead
If Data Masking should not be applied to production data used for production purposes, what should be? There are a few simple rules to follow to stay out of trouble (and, in some cases, to stay compliant):
- Don’t be Google. Collect and store only the data that you really need to run your business, not everything that flies by. This may eliminate the need to protect and mask sensitive data, since you don’t have it.
- Apply a “need to know” access rule to all sensitive information. Not everyone needs to know all salaries in the company. Use the same approach for other data types.
- Limit export, printing and screenshot capabilities. Export files (like Excel spreadsheets) often get lost along with the laptop; discarded printouts get blown out of the dumpster by a strong wind, etc.
- Encrypt sensitive data fields in the database. This will protect you from some “successful” intruders.
- Restrict physical access to the application servers. Server access is an Achilles’ heel in any defense strategy. Never keep a “server” under your desk. Servers belong in data centers, not offices.
- Encrypt backups and destroy backup media no longer in use. Never give your backup media to strangers.
- Test all security procedures frequently. Paper processes have no value. Random audits are helpful as well.
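The “need to know” rule from the list above can be sketched in a few lines. This is only an illustration under assumed names: the roles, the field names and the `view_for_role` helper are hypothetical, and a real system would enforce the rule in the database or application access layer rather than in a script.

```python
# Hypothetical mapping of roles to the fields each role may see.
VISIBLE_FIELDS = {
    "hr_manager": {"name", "department", "salary"},
    "team_lead": {"name", "department"},
}


def view_for_role(record: dict, role: str) -> dict:
    """Return only the fields the role is allowed to see.

    Unknown roles get an empty view: access is denied by default.
    """
    allowed = VISIBLE_FIELDS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


employee = {"name": "Jane Doe", "department": "Finance", "salary": 85000}

print(view_for_role(employee, "hr_manager"))  # full record, including salary
print(view_for_role(employee, "team_lead"))   # salary is filtered out
print(view_for_role(employee, "contractor"))  # unknown role sees nothing
```

The design choice worth noting is the default: a role that is not listed sees nothing, so forgetting to configure a role fails safe.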
Data Masking for development
Development data sets are frequently stored outside the production security perimeter and are not covered by strict security rules. Always assume that these data sets may go “beyond the event horizon” and reappear on Wikileaks. Developers (in 99% of cases) do not need real production data to test the application. A masked data set is as good as production data for most development phases, but not for all. Here are some ideas to consider when preparing development data sets:
- Generating a development data set does not require a high-performance masking tool. Who cares if data preparation takes a day or two? Need it on Friday? Then launch it on Wednesday. Developers will use the same data for several weeks or months.
- Engage Business Analysts and Testers to define the transformation logic. If a new application is designed to work with North-East user accounts, then California data in the masked set will be useless. Business Analysts should know what to mask and should define a masking approach that preserves all data patterns expected by the application.
- There is always a risk of generating a “non-representative” data set. Business Analysts are human beings and may omit some “minor” details when defining masking rules. A “fully” tested application may still fail when it meets real production data; this is just a fact of life. It is not feasible to test every application behavior, so there is always a risk of the system malfunctioning in production. The best workaround is to generate multiple data sets using different masking rules. This should lower the probability of hidden defects slipping into production.
- Production issues (Tier 3 support) may require developers to connect to the real production data set, and this situation cannot be eliminated. The security procedure for it should be thought through in advance, and some equipment should be reserved within the security perimeter to replicate production data for troubleshooting.
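Two of the ideas above, preserving the patterns the application expects and generating several data sets with different masking rules, can be sketched as follows. Everything here is an assumption made for illustration: the field names, the two rule sets and the helper functions are hypothetical. The seeded `random.Random` is used only so that each variant can be regenerated for testing; it is not a statement about how strong the randomness of a real masking tool should be.

```python
import random


def random_digits(rng, value):
    """Replace each digit with a random one, keeping the original layout."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in value)


def fixed_placeholder(rng, value):
    """Replace the value with a constant string of the same length."""
    return "X" * len(value)


def keep(rng, value):
    """Leave the value untouched, e.g. the region the application filters on."""
    return value


# Two hypothetical rule sets that a Business Analyst might define.
RULE_SETS = {
    "variant_a": {"name": fixed_placeholder, "phone": random_digits, "region": keep},
    "variant_b": {"name": fixed_placeholder, "phone": fixed_placeholder, "region": keep},
}


def build_masked_set(rows, rules, seed):
    """Apply one rule per field to every row and return the masked rows."""
    rng = random.Random(seed)  # seeded so the same variant can be rebuilt later
    return [
        {field: rules[field](rng, value) for field, value in row.items()}
        for row in rows
    ]


rows = [
    {"name": "Jane Doe", "phone": "617-555-0142", "region": "North-East"},
    {"name": "John Roe", "phone": "212-555-0187", "region": "North-East"},
]

masked_sets = {
    variant: build_masked_set(rows, rules, seed=index)
    for index, (variant, rules) in enumerate(RULE_SETS.items())
}

for variant, masked_rows in masked_sets.items():
    print(variant, masked_rows)
```

Both variants keep the North-East region intact, so the application still finds the accounts it expects, while each variant exercises the phone field differently and may expose defects the other one hides.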
Data Masking procedural guidelines
Data Masking is designed to be irreversible, but nothing is perfect. There is always a chance that access to multiple masked data sets may reveal corporate secrets. Even a single data set may give a clue to salary ranges and trigger unnecessary tensions in the company. Here are some ideas on how to treat masked data:
- Never consider masked data to be fully sanitized. Apply reasonable production security procedures when working with masked data.
- Limit the number of masked data sets generated and require authorization to generate them. Excessive data volumes make intelligence work easier.
- Always include masked data sets in the corporate “Wikileaks” avoidance strategy.
These suggestions should help you adjust your data masking approach and avoid obvious pitfalls during implementation.
(originally posted on CIO.com)