We can never have the right test data, secured/masked, delivered to the right group, and ready at the right time. (CIO)

Is there any multi-purpose tool to help with purging & archiving, or with masking data while migrating to the cloud? …anything better than scripts! (CTIO)

Use Cases & Pain Points Addressed

This tool solves the following pain points, or greatly reduces their impact:

Reduces the complex and time-consuming work of identifying, creating and generating test data inside IT organizations, an effort that grows rapidly with the number of data systems involved in testing and the data integration between those systems.

Reduces the time & cost of critical resources needed to define & extract suitable test data (e.g. DBAs, senior testers & developers), as well as execution time:
  • It eliminates the need for complex scripts and their ongoing maintenance effort.
  • Jobs are repeatable across multiple environments and can truncate & recreate data on demand (e.g. for Agile teams).
  • Can also be used as a tool for consistently purging & archiving related sets of records from a database.
  • Cross-platform migration support means it can also migrate data to the Cloud (including masking where appropriate).

This tool provides a repeatable & reliable approach to systematically applying data-privacy and security restrictions in data replication scenarios and/or when granting access to production data (see also GDPR, DSGVO, SOX, GLBA, PCI DSS, HIPAA, FIPA, KVKK & BDDK compliance requirements).

Data/Information Security: additional privacy can be achieved by applying any masking/scrambling rule to any of the data being copied, whether table-to-table, schema-to-schema, or the result of a query. Data can also be synthesized (i.e. generated) where more is needed.
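
The idea of in-line masking during a copy can be illustrated with a minimal sketch (this is an illustration only, not the tool's actual implementation; the `mask_value` helper, the salt, and the SQLite demo schema are all hypothetical):

```python
import hashlib
import sqlite3

def mask_value(value, salt: str = "demo-salt") -> str:
    """Deterministically mask a value: the same input always yields the same token."""
    digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
    return "MASKED_" + digest[:10]

def copy_with_masking(src: sqlite3.Connection, dst: sqlite3.Connection,
                      table: str, masked_cols: set) -> None:
    """Copy a table row by row, masking the listed columns in memory
    (no staging area on disk)."""
    cur = src.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    dst.execute(f"CREATE TABLE {table} ({','.join(cols)})")
    placeholders = ",".join("?" for _ in cols)
    for row in cur:
        out = [mask_value(v) if c in masked_cols else v
               for c, v in zip(cols, row)]
        dst.execute(f"INSERT INTO {table} VALUES ({placeholders})", out)
    dst.commit()

# Demo: copy a toy customers table, masking only the name column.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customers (id, name, city)")
src.executemany("INSERT INTO customers VALUES (?,?,?)",
                [(1, "Alice", "Bonn"), (2, "Bob", "Kiel")])
dst = sqlite3.connect(":memory:")
copy_with_masking(src, dst, "customers", {"name"})
print(dst.execute("SELECT * FROM customers").fetchall())
```

The masking happens entirely in memory during the copy, which mirrors the "no staging area" point made under Key Features below.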

Operations support: the tool can also be used for ad-hoc analysis of production data for operational purposes, not only for testing, e.g. one-time securing of data on a production system with in-line masking (for hot analysis of production issues).

Integration with ODI (where ODI was used to create test environments): the tool reads ODI jobs and imports them automatically, a time saver for organizations managing their environments with ODI.

Hardware cost savings by working with reduced data sets (consistent samples of data) in multiple environments, while still guaranteeing the consistency and meaningfulness of the data.
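
How a consistent sample can be carved out of a large database is easiest to see on a toy schema; the following sketch (hypothetical tables and foreign-key chain, not the tool's real algorithm) walks the parent-to-child references so the extract stays referentially consistent:

```python
# Hypothetical toy schema: customers -> orders -> order_items.
customers = {1: "Alice", 2: "Bob", 3: "Carol"}
orders = {10: 1, 11: 1, 12: 3}             # order_id -> customer_id (FK)
order_items = {100: 10, 101: 10, 102: 12}  # item_id  -> order_id    (FK)

def subset(sample_customer_ids: set):
    """Given a sample of customer ids, keep only the orders of those
    customers and only the items of those orders, so every foreign key
    in the reduced set still resolves."""
    kept_orders = {o for o, c in orders.items() if c in sample_customer_ids}
    kept_items = {i for i, o in order_items.items() if o in kept_orders}
    return sample_customer_ids, kept_orders, kept_items

cust, ords, items = subset({1})
print(cust, ords, items)
```

A real database would need the full foreign-key graph (possibly with cycles), but the principle is the same: start from the sample list and pull every related record, nothing more.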

Key Features & Differentiators

The following features and advantages should be noted:

  • Installed on separate servers between the source and target databases; the solution scales horizontally across as many VMs as needed. It supports load balancing and multi-instance, multi-process and multi-threaded operation.
  • All transformation is done in memory.
  • As a result, performance is much higher than with other products (some of which also require additional space for a staging area, which is itself a security loophole).

Very rich feature coverage. Synthetic data generation is supported.

Referential integrity is preserved even if masking is applied on a column that is also used as a key.
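
One common way to preserve referential integrity under key masking is deterministic masking: the same original key always maps to the same token, so parent and child tables remain joinable. A minimal sketch (hypothetical `mask_key` helper and toy tables, assuming a hash-based approach rather than the tool's actual mechanism):

```python
import hashlib

def mask_key(value, salt: str = "demo") -> str:
    # Deterministic: the same key always maps to the same token,
    # so foreign-key relationships survive the masking.
    return "K_" + hashlib.sha256(f"{salt}:{value}".encode()).hexdigest()[:8]

customers = [{"cust_id": 101, "name": "Alice"},
             {"cust_id": 102, "name": "Bob"}]
orders = [{"order_id": 1, "cust_id": 101},
          {"order_id": 2, "cust_id": 101},
          {"order_id": 3, "cust_id": 102}]

# Mask the key column in both the parent and the child table.
masked_customers = [{**c, "cust_id": mask_key(c["cust_id"]), "name": "MASKED"}
                    for c in customers]
masked_orders = [{**o, "cust_id": mask_key(o["cust_id"])} for o in orders]

# Every masked order still points at an existing masked customer.
parent_keys = {c["cust_id"] for c in masked_customers}
print(all(o["cust_id"] in parent_keys for o in masked_orders))
```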

Has its own SQL editor: any extraction query triggered from within the editor is bound to the masking/security rules defined in the tool. The editor also blocks commands for which authorization has not been granted (e.g. it prevents DELETE or TRUNCATE).
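
Conceptually, such a restriction is a per-statement authorization check before the SQL reaches the database. A minimal sketch of the idea (the `is_allowed` helper, the blocked-verb list and the grant set are all hypothetical, not the editor's real rule engine):

```python
import re

# Hypothetical set of statement verbs that require an explicit grant.
RESTRICTED = {"DELETE", "TRUNCATE", "DROP"}

def is_allowed(sql: str, granted: set) -> bool:
    """Allow a statement unless its leading verb is restricted
    and the user holds no grant for it."""
    match = re.match(r"\s*(\w+)", sql)
    if not match:
        return False
    verb = match.group(1).upper()
    return verb not in (RESTRICTED - granted)

print(is_allowed("SELECT * FROM customers", set()))      # reads pass
print(is_allowed("TRUNCATE TABLE customers", set()))     # blocked without grant
```

A real editor would parse the full statement (sub-queries, batches, comments) rather than just the first keyword, but the gating principle is the same.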

Supported platforms:
  • On premise: Oracle, SQL Server, PostgreSQL, Ab Initio, Teradata, Hive, Hadoop HDFS.
  • On premise, with limited features: DB2 (LUW & z/OS), SAP HANA.
  • In the Cloud (all): Oracle, SQL Server.
  • Deployment in the Cloud: AWS only.

Interoperability - supports copying & migrating data across multiple platforms:
  • Oracle to SQL Server (and vice-versa).
  • Oracle & SQL Server (incl. Cloud) to Hadoop & Hive (and vice-versa).

Costs/Expenses: comes at a fraction of the cost of international vendor tools (list-price comparison).

Performance benchmarks from large corporations:
  • 10 million records were copied from a single table in 264 seconds.
  • 70 TB of data distributed across 1,035 tables was reduced to 70 GB in 4 hours, resulting in a consistent data set extracted from a given sample list of 10,000 customers.
  • Test data was generated out of 927 tables in 20 minutes, from a given sample list of 300 customers. (The same test with Micro Focus took 3 hours!)