With this new feature, we enable you to set the sensitivity for each data column of your application in the “Data sensitivity” modeler. View-columns, calculated fields, and identity fields are not data columns and therefore are not shown. A prefilter is provided to suggest which columns are sensitive, based on keywords or because columns with the same name or domain are also marked sensitive.
For sensitive columns, it is mandatory to select an anonymization type. You can choose between:
- Sample data set
- Random value
- Expression
Foreign key columns are automatically anonymized based on the settings of the source column. They can be recognized by the icon below and are hidden by the undecided columns prefilter.
Sample data set
Thanks to the guys at Software Modernization, default sample data sets are available that you can use to anonymize your data. Upon anonymization, a random value from the set will be picked, taking into account the domain of the column. For example, if you have a sample data set with random dates and numbers, only valid dates will be used to anonymize a “birthday” column.Random value
When you use this option to anonymize your data, a random value is generated that meets the domain specifications. This is not user-friendly, therefore I would not recommend using this for acceptance test data. Note that when you select a random date, this date can be in the future or in the past. If there are business rules that ensure a date in the past is used, then it is better to select “Expression” and write your own query or use “Sample data set” and add a set with random dates in the past.Expression
In certain cases, a value is dependent on the value of other columns or has to meet specific requirements. In that case, the option “Expression” can be used. For instance, if end_date is mandatory when is_ended is true, then a random end_date can be generated with the following query:code:
case
when t1.is_ended = 1 then dateadd(day, abs(checksum(newid()) % 65530), 0)
else null
end
When the expression is dependent on a column that needs to be anonymized first, use the checkbox “Delay expression”. For example, if you have an email address that is based on a username:
code:
t1.user_id + "@thinkwisesoftware.com"
The easiest way to corrupt your data is to execute the anonymization-task on production. To comply with the legislation, we use techniques where it is not possible to restore the original data. Therefore, the task must always be executed on a backup! There is no rollback once the task has started.
To prevent mistakes, we added two checks:
- Database name check. The database name that is provided as a parameter must match the current database.
- Single user mode. The database needs to be in single user mode. Not only as a check but also because we disable some constraints and triggers, and don’t want anyone else to work in the database during anonymization. (Note, this check is only relevant for SQL server, and not for DB2)
Anonymization
Based on the data sensitivity-settings, the Software Factory will generate two stored procedures:1. Stored procedure tsf_test_data_anonymization to test the data sensitivity-settings and sample data:
code:
-- Use this statement to test the data sensitivity-settings
exec tsf_test_data_anonymization 'INSIGHTS_BACKUP'
go
2. Stored procedure tsf_anonymize_data to actually anonymize the data:
code:
-- Use this statement to anonymize the data
exec tsf_anonymize_data 'INSIGHTS_BACKUP'
go
To be continued…
To get this feature to you as soon as possible, we have decided to postpone some functions that will be added soon.- DB2 and Oracle support. Anonymization is now only possible for SQL server databases. Setting up the sensitive columns for DB2 and Oracle is possible but anonymization-support will come later. (DB2 support is released in hotfix “20180615 – DB2 data anonymization” from 06/18/2018)
- Executing the anonymization tasks from the Software Factory. For now, these tasks are grayed-out.