Data utility can be preserved while enhancing data privacy

Organizations must strike a difficult balance when launching strategic AI initiatives: data needs to be accessible without compromising compliance with privacy regulations or slowing business innovation. Customer trust and brand reputation are key competitive advantages, so accelerated digital transformation and growth rely on businesses protecting sensitive customer data while still preserving its utility for AI and analytics teams.

Three questions organizations need to confront when it comes to leveraging customer data are:

How can initiatives inside and outside my organization work securely with personal information (PI) and sensitive data?
How can I remove PI from datasets without affecting the integrity of the data or accuracy of my projects’ results?
How can I actively protect PI and sensitive data whenever they are accessed, wherever they reside?

When organizations cannot readily answer these questions, artificial intelligence projects often stall and collaboration on meaningful data is limited. Gartner predicts that by 2024, the use of data protection techniques will increase industry collaborations on AI projects by 70%.

Today I will expand on the advanced data protection use case, one of the key capabilities in the IBM AutoPrivacy framework, whose key use cases are delivered via IBM Cloud Pak® for Data.

Data protection and de-identification of sensitive data are not new concepts. Although they have been well known for many years, most enterprises did not apply them consistently. The enforcement of GDPR drastically changed that: in the post-GDPR era, enterprises are acutely aware of the data protection regulations they must adhere to. With the enforcement of GDPR (Europe), CCPA (California), LGPD (Brazil) and many other data protection laws in recent months, consumers are now well aware of their privacy rights and are demanding that enterprises provide transparent privacy protection approaches.

Historically, enterprises have used many methods of sensitive data protection, including redaction and various forms of masking such as substitution, shuffling or randomization. However, with the use of deep learning neural networks in AI, data science and analytical modeling, the risk of re-identification has been increasing. Hence, there is a need for newer data protection techniques and robust encryption algorithms that enhance privacy while preserving the utility of the data.
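
To make two of these classic methods concrete, here is a minimal Python sketch of redaction and shuffling. The function names `redact` and `shuffle_column` are hypothetical and chosen for illustration only; they are not part of any IBM product API.

```python
import random


def redact(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask every character except the last `keep_last` ones."""
    if keep_last <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - keep_last, 0)
    return mask_char * hidden + value[-keep_last:]


def shuffle_column(values: list, seed: int) -> list:
    """Return the same values in a different order.

    The column keeps its overall distribution, but the link between a
    row and its original value is broken.
    """
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    print(redact("4111111111111111"))
    print(shuffle_column(["Alice", "Bob", "Carol", "Dave"], seed=7))
```

Both methods are simple, but neither keeps values consistent across data sources, which is one reason they fall short for analytics that join several datasets.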

By far, the most important requirement from IBM customers has been the consistent enforcement of data protection policies, regardless of where the data resides.

Data cannot simply be de-identified randomly; important relationships must be maintained. Format preservation is a fundamental requirement. Values must be de-identified consistently across the enterprise, respecting relationships across multiple data assets. For example, de-identification of a credit card number, personal first and last names, or any other entity identifier must be consistent and repeatable across data sources in on-premises and hybrid cloud environments.
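
One way to picture repeatable, format-preserving de-identification is a keyed, deterministic digit substitution. The sketch below is an assumption for illustration, not the algorithm used by any IBM product. It is also not true format-preserving encryption: it is one-way, and two different inputs could in principle map to the same output. The name `pseudonymize_digits` and the key value are hypothetical.

```python
import hashlib
import hmac


def _keystream(key: bytes, message: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        block = message + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def pseudonymize_digits(value: str, key: bytes) -> str:
    """Replace every ASCII digit with a keyed pseudo-random digit.

    Separators such as '-' or ' ' stay where they are, so the output has
    the same format as the input. The same digits and key always produce
    the same replacement digits, so masked values still join across sources.
    """
    digits = "".join(c for c in value if "0" <= c <= "9")
    stream = iter(_keystream(key, digits.encode("ascii"), len(digits)))
    return "".join(
        str(next(stream) % 10) if "0" <= c <= "9" else c for c in value
    )


if __name__ == "__main__":
    key = b"demo-key-keep-this-secret"  # hypothetical key for illustration
    print(pseudonymize_digits("4111-1111-1111-1111", key))
    print(pseudonymize_digits("4111 1111 1111 1111", key))
```

Because the replacement depends only on the digits and the key, the same card number masks to the same digits in every system that shares the key, even if one source stores it with dashes and another with spaces.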

In addition, I have often encountered industry-specific use cases that call for special treatment of certain data elements. For example, in financial services and healthcare, the time intervals between certain dates should be the same whether the data is masked or unmasked. The accuracy of disease treatment dates in healthcare is critical for biomedical research, so when shifting dates, it’s very important to maintain the right intervals. Similarly, the interval between a date of birth and the date of an auto policy agreement (in other words, the customer’s age) may make a very big difference in the cost and available features of auto insurance.
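
Interval-preserving date shifting can be sketched as follows: every date belonging to one person is moved by the same offset, and that offset is derived from a keyed hash of the person's identifier. This is a minimal illustration under stated assumptions. The names `shift_for` and `shift_dates`, the one-year window and the key are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta


def shift_for(entity_id: str, key: bytes, max_days: int = 365) -> timedelta:
    """Derive a stable offset in [-max_days, +max_days] for one entity."""
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * max_days + 1
    offset = int.from_bytes(digest[:4], "big") % span - max_days
    return timedelta(days=offset)


def shift_dates(entity_id: str, dates: list, key: bytes) -> list:
    """Shift all dates of one entity by the same offset.

    Because the offset is identical for every date, the interval between
    any two dates of that entity is unchanged.
    """
    delta = shift_for(entity_id, key)
    return [d + delta for d in dates]


if __name__ == "__main__":
    key = b"demo-key-keep-this-secret"  # hypothetical key for illustration
    diagnosis, treatment = date(2020, 1, 15), date(2020, 3, 1)
    masked = shift_dates("patient-001", [diagnosis, treatment], key)
    print(masked)
    print(masked[1] - masked[0] == treatment - diagnosis)
```

An offset derived this way can occasionally be zero or small, so a real deployment would need to decide on a minimum shift and on how to handle dates near calendar boundaries.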

Most customers require support for custom de-identification involving complex, multi-field computation, delivered through a low-code or no-code approach. There are also several use cases that require the addition of statistical noise to hide individual data and surface only group-level information for analytics.
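
As an illustration of the statistical noise idea, the sketch below adds Laplace noise to per-group counts, the approach commonly used in differential privacy. The source does not say which noise mechanism is used, so the Laplace choice, the function names and the `epsilon` parameter are all assumptions.

```python
import random
from collections import Counter


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_group_counts(groups: list, epsilon: float, rng: random.Random) -> dict:
    """Count rows per group and add Laplace noise to each count.

    A count changes by at most 1 when one person is added or removed
    (sensitivity 1), so the noise scale is 1 / epsilon. Smaller epsilon
    means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    counts = Counter(groups)
    return {g: n + laplace_noise(scale, rng) for g, n in counts.items()}


if __name__ == "__main__":
    rng = random.Random()  # unseeded here; a test might pass a seeded one
    regions = ["north", "north", "south", "east", "south", "north"]
    print(noisy_group_counts(regions, epsilon=0.5, rng=rng))
```

Analysts see only approximate group totals, and `epsilon` becomes the dial for trading accuracy against privacy.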

These rich data protection and consistent policy enforcement capabilities are available via IBM Watson® Knowledge Catalog Enterprise Edition to address a wide range of use cases.

The future is bright as the latest privacy enhancing technologies such as differential privacy, synthetic data fabrication and more are brought into the solution. These technologies, paired with the power of IBM Cloud Pak for Data, will allow data science teams to make choices along the privacy-utility spectrum and continue to push the boundaries of AI initiatives.

Published on July 29, 2021
