IBM® Data Privacy for IBM Cloud Pak for Data®
Data scientists, testers and business analysts need realistic, production-like data to deliver high quality releases and discover key business insights. However, sensitive, personally identifiable information must be protected before use by data citizens. With compliance requirements growing in laws like the General Data Protection Regulation (GDPR) and other regulations, protecting data so it can be used is becoming more of a challenge and more important. In 2020 alone, breaches that contained personally identifiable information incurred average costs of over $150 per record, for a total average cost of $3.6M per breach¹.
With IBM Data Privacy for IBM Cloud Pak for Data, personally identifiable information is masked so the data retains its utility and relationships. Data scientists, analysts, testers, and developers alike are able to use this data for actionable insights and high-quality releases, without putting the organization at risk. Combining protection with policy enforcement and discovery through IBM Watson Knowledge Catalog, data can be automatically masked with pre-defined techniques, allowing data citizens to get the data they need while reducing risks from breaches and non-compliance.
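To illustrate what it means for masked data to retain its utility and relationships, here is a minimal sketch in Python. It is not the product's implementation or API: the key, the `mask_value` and `mask_rows` helpers, and the sample records are all hypothetical. It assumes a simple keyed-hash (HMAC) pseudonymization, one common way to make the same input always map to the same masked value so that joins across tables still work.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would manage this secret securely.
SECRET_KEY = b"demo-key-not-for-production"


def mask_value(value: str, prefix: str = "user") -> str:
    """Replace a sensitive string with a deterministic pseudonym."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:10]}"


def mask_rows(rows, pii_fields):
    """Mask only the listed fields; leave every other field untouched."""
    return [
        {k: mask_value(v, prefix=k) if k in pii_fields else v for k, v in row.items()}
        for row in rows
    ]


# Hypothetical sample data: two tables that share an email column.
customers = [
    {"email": "ana@example.com", "plan": "gold"},
    {"email": "raj@example.com", "plan": "basic"},
]
orders = [
    {"email": "ana@example.com", "total": 42},
    {"email": "raj@example.com", "total": 17},
]

masked_customers = mask_rows(customers, {"email"})
masked_orders = mask_rows(orders, {"email"})
```

Because the masking is deterministic, a masked order can still be matched to its masked customer, while the original email address no longer appears in either table. Non-sensitive fields such as `plan` and `total` are left as-is, which is what keeps the data useful for testing and analytics.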
IBM Data Privacy for IBM Cloud Pak for Data is slated to launch June 2021.
¹ Cost of a Data Breach – Ponemon Institute, 2020
Leadership
David Townsend – Director of Design, IBM Cloud and Cognitive Software
John Bailey – Program Director of Design, DataOps
Ashwin Umathay – Design Manager
Rashmi Kaushik – Lead Product Manager
Jessie Yang – Product Manager
Peter Costigan – Product Manager
Yuffie Zhang – Product Manager
Victor Lara – UX Designer Lead
Perry Ting – Visual Designer Lead
Madeline Goulet – User Research Lead
I am the Lead User Researcher on my product team. I work closely with the designers and product managers to guide our team with user feedback. My research ranges from early-stage design concept validation to usability testing of nuanced UI functionality. I also work closely with the product managers to bring the user voice to the table when planning the product roadmap.
Key Personas
Data Providers
The Data Providers are the users who prepare data for Data Consumers downstream. In the context of Data Privacy, the Data Engineer and Data Steward work to mask all PII, PHI, and any other sensitive data, and to create curated subsets of data that Data Consumers can access through self-service.
Data Consumers
The Data Consumers are the users who end up with the masked full copies and subsets of data to use for their various use cases. The primary use cases are testing (test data management) and analytics (data science, machine learning).
The Challenge
Data is the lifeblood of many modern, competitive organizations. It educates businesses on their users, it informs every business decision, and it helps to predict the future. There is a huge interest and incentive in collecting data, but protecting data privacy is no longer optional: it’s the law. Data protection regulations are mounting both locally and globally. Leaving personally identifiable information (PII) unmasked leaves companies vulnerable to data breaches and regulatory risk.
How might we create a space for Data Providers to ensure their data is compliant with all local and global data privacy regulations while still retaining utility for Data Consumers? This domain is ever-changing and evolving daily. Our goal is to become the most-trusted one-stop-shop for all data privacy needs and solutions.
In an enterprise environment, we are always facing complex problems that cannot be solved on the spot. Systematically influencing the ecosystem and understanding evolving user goals are the challenges solved through a successful product design process.
Research
When I first joined the team in March 2020, our product scope was only a fraction of what it is now. Our team managed to accelerate our learning in understanding the domain, our customers, and the everyday pains and needs of our users through heavy, iterative user research. Through user research, we were able to grow the product scope from a test data management tool to an all-encompassing data masking tool for both testing and analytics use cases. Through conversations with data experts, we were able to uncover their most-pressing questions in order to inform our product direction:
How might a Data Steward mask data while still retaining relevancy and utility?
How might a Data Engineer ensure timely, self-service access to compliant data for Data Consumers downstream?
How might a Data Scientist or Machine Learning Engineer know if the data they are using is compliant with data regulations?
How might an Application Tester synthesize additional data if they do not have enough for their testing purposes?
I continuously work to surface research questions from my design team, product management team, and development team. I conduct user research with both customers and non-customers to gather the most-relevant data from multiple perspectives.
Outcome and Impact
Since the inception of the product proposal in December 2019, we have launched a successful beta (November 2020) and will soon launch a second beta in May 2021, followed by a GA release in June 2021.
Throughout product development, we have gained the trust of several early customers through continuous feedback sessions on the user experience and product roadmap.