IBM Data Privacy for Cloud Pak for Data

 

IBM® Data Privacy for IBM Cloud Pak for Data®

Data scientists, testers, and business analysts need realistic, production-like data to deliver high-quality releases and discover key business insights. However, sensitive, personally identifiable information must be protected before data citizens use it. As compliance requirements grow under laws such as the General Data Protection Regulation (GDPR), protecting data so that it can still be used is becoming both harder and more important. In 2020 alone, breaches that contained personally identifiable information incurred average costs of over $150 per record, for a total average cost of $3.6M per breach¹.

With IBM Data Privacy for IBM Cloud Pak for Data, personally identifiable information is masked so the data retains its utility and relationships. Data scientists, analysts, testers, and developers alike are able to use this data for actionable insights and high-quality releases, without putting the organization at risk. Combining protection with policy enforcement and discovery through IBM Watson Knowledge Catalog, data can be automatically masked with pre-defined techniques, allowing data citizens to get the data they need while reducing risks from breaches and non-compliance.

IBM Data Privacy for IBM Cloud Pak for Data is slated to launch June 2021.

¹ Cost of a Data Breach – Ponemon Institute, 2020

Leadership

I am the Lead User Researcher on my product team. I work closely with the designers and product managers to guide the team with user feedback. My research ranges from early-stage design concept validation to usability testing of nuanced UI functionality. I also work closely with the product managers to bring the user voice to the table when planning the product roadmap.

 

Key Personas

Data Providers

The Data Providers are the users who prepare data for Data Consumers downstream. In the context of Data Privacy, the Data Engineer and Data Steward work to mask all PII, PHI, and any other sensitive data, and to create curated subsets of data that Data Consumers can access through self-service.

In the context of Data Privacy, Muneiza queries and publishes subsets and full copies of data that reflect and act like production data for the Data Consumers to use. Her primary goal is to give the people who depend on her self-service access to the data she has prepared.

In the context of Data Privacy, Stuart builds and applies data masking rules that comply with the data governance framework set forth by the Chief Governance Office. His primary goal is to ensure that company policies and practices comply with local and international data privacy regulations.

Data Consumers

The Data Consumers are the users who receive the masked full copies and subsets of data for their various use cases. The primary use cases are testing (test data management) and analytics (data science, machine learning).

In the context of Data Privacy, Thierry is responsible for testing their software applications with data that mimics production data. His main goal is to ensure all software applications continue to run smoothly without interruption.

In the context of Data Privacy, Beatrice uses masked subsets and full copies of production data to provide actionable answers to the business's most pressing questions. Her main goal is to provide value to stakeholders by supporting whatever decisions or predictions they want to make.

In the context of Data Privacy, Sicily uses masked subsets and full copies of production data to build and train her data models. To solve business problems, she creates models that predict the answers needed for decision-making.

In the context of Data Privacy, Maxwell utilizes subsets of masked data to build and execute machine learning models. He strives to bring high-quality data models that enhance decision-making for the business.

 

The Challenge

Data is the lifeblood of many modern, competitive organizations. It educates businesses on their users, it informs every business decision, and it helps to predict the future. There is huge interest and incentive in collecting data, but protecting data privacy is no longer optional; it’s the law. Data protection regulations are mounting both locally and globally. Leaving personally identifiable information (PII) unmasked exposes companies to data breaches and regulatory risk.

How might we create a space for Data Providers to ensure their data is compliant with all local and global data privacy regulations while still retaining utility for Data Consumers? This domain is ever-changing and evolves daily. Our goal is to become the most trusted one-stop shop for all data privacy needs and solutions.

In an enterprise environment, we constantly face complex problems that cannot be solved on the spot. Systematically influencing the ecosystem and understanding evolving user goals are the challenges that a successful product design process solves.

 

Research

When I first joined the team in March 2020, our product scope was only a fraction of what it is now. Through heavy, iterative user research, our team quickly deepened its understanding of the domain, our customers, and the everyday pains and needs of our users. That research let us grow the product scope from a test data management tool to an all-encompassing data masking tool for both testing and analytics use cases. Through conversations with data experts, we uncovered their most pressing questions, which informed our product direction:

  • How might a Data Steward mask data while still retaining relevance and utility?

  • How might a Data Engineer ensure timely, self-service access to compliant data for Data Consumers downstream?

  • How might a Data Scientist or Machine Learning Engineer know if the data they are using is compliant with data regulations?

  • How might an Application Tester synthesize additional data if they do not have enough for their testing purposes?

I continuously work to surface research questions from my design team, product management team, and development team. I conduct user research with both customers and non-customers to gather the most-relevant data from multiple perspectives.

 

Outcome and Impact

Since the inception of the product proposal in December 2019, we have launched a successful beta (November 2020). A second beta is planned for May 2021, followed by a GA release in June 2021.

Throughout the product development, we have gained the trust of several early customers through continuous feedback sessions on the user experience and product roadmap.

“I’ve been thinking about how you guys would capture the MAGEN flexibility from a UI point of view, and the end result [is] even better than I hoped for.”

– Technical Sales Specialist