Blog

How to anonymize sensitive data

  • 21 January 2019
  • 0 replies
  • 885 views
How to anonymize sensitive data
Userlevel 3
Badge +5
With the introduction of Europe’s General Data Protection Regulation, it is more important than ever for companies that their IT environment is secure and meets the legal requirements for privacy protection. To help our customers comply with this new legislation, we added a new feature “Data sensitivity” to the Thinkwise Platform to automate the anonymization of personal data in any Thinkwise application. This feature can not only be used for personal data, but also for other sensitive data like passwords or credit card numbers.

With this new feature, we enable you to set the sensitivity for each data column of your application in the “Data sensitivity” modeler. View-columns, calculated fields, and identity fields are not data columns and therefore are not shown. A prefilter is provided to suggest which columns are sensitive, based on keywords or because columns with the same name or domain are also marked sensitive.



For sensitive columns, it is mandatory to select an anonymization type. You can choose between:
  • Sample data set
  • Random value
  • Expression
Foreign key columns are automatically anonymized based on the settings of the source column. They can be recognized by the icon below and are hidden by the undecided columns prefilter.


Sample data set

Thanks to the guys at Software Modernization, default sample data sets are available that you can use to anonymize your data. Upon anonymization, a random value from the set will be picked, taking into account the domain of the column. For example, if you have a sample data set with random dates and numbers, only valid dates will be used to anonymize a “birthday” column.

Random value

When you use this option to anonymize your data, a random value is generated that meets the domain specifications. This is not user-friendly, therefore I would not recommend using this for acceptance test data. Note that when you select a random date, this date can be in the future or in the past. If there are business rules that ensure a date in the past is used, then it is better to select “Expression” and write your own query or use “Sample data set” and add a set with random dates in the past.

Expression

In certain cases, a value is dependent on the value of other columns or has to meet specific requirements. In that case, the option “Expression” can be used. For instance, if end_date is mandatory when is_ended is true, then a random end_date can be generated with the following query:

code:
case
when t1.is_ended = 1 then dateadd(day, abs(checksum(newid()) % 65530), 0)
else null
end


When the expression is dependent on a column that needs to be anonymized first, use the checkbox “Delay expression”. For example, if you have an email address that is based on a username:

code:
t1.user_id + "@thinkwisesoftware.com"

The easiest way to corrupt your data is to execute the anonymization-task on production. To comply with the legislation, we use techniques where it is not possible to restore the original data. Therefore, the task must always be executed on a backup! There is no rollback once the task has started.

To prevent mistakes, we added two checks:

  1. Database name check. The database name that is provided as a parameter must match the current database.
  2. Single user mode. The database needs to be in single user mode. Not only as a check but also because we disable some constraints and triggers, and don’t want anyone else to work in the database during anonymization. (Note, this check is only relevant for SQL server, and not for DB2)

Anonymization

Based on the data sensitivity-settings, the Software Factory will generate two stored procedures:
1. Stored procedure tsf_test_data_anonymization to test the data sensitivity-settings and sample data:

code:
-- Use this statement to test the data sensitivity-settings
exec tsf_test_data_anonymization 'INSIGHTS_BACKUP'
go

2. Stored procedure tsf_anonymize_data to actually anonymize the data:

code:
-- Use this statement to anonymize the data
exec tsf_anonymize_data 'INSIGHTS_BACKUP'
go

To be continued…

To get this feature to you as soon as possible, we have decided to postpone some functions that will be added soon.

  1. DB2 and Oracle support. Anonymization is now only possible for SQL server databases. Setting up the sensitive columns for DB2 and Oracle is possible but anonymization-support will come later. (DB2 support is released in hotfix “20180615 – DB2 data anonymization” from 06/18/2018)
  2. Executing the anonymization tasks from the Software Factory. For now, these tasks are grayed-out.

0 replies

Be the first to reply!

Reply