Is Anonymous Data Really Anonymous?

Cyber security professionals are also often involved in ensuring the privacy of organizational and customer data. One important category of that data is Personally Identifiable Information, or PII: data from which the person it belongs to can be determined.

One example of PII is stored credit card information. A database may include a record that contains a cardholder name, address, card number, and other information. If that data were disclosed, it would be relatively easy for a bad actor to use the information and to know to whom it belongs.

Another example is medical information, where the name of a patient is stored along with medical details such as disease history, medications prescribed, and so forth. If this information were disclosed, it could be harmful to the patient. It may be advantageous to researchers, however, to be able to access the data without the direct tie to the patient.

The process of removing the name and other patient-specific information is often referred to as anonymizing the data. Bad actors, however, may be able to tie the data back to the individual if they have additional information. The issue is that so much data is collected by so many companies that information from multiple sources can be aggregated and connections discovered.
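
As a rough illustration of what "removing the name and other patient-specific information" can look like in practice, here is a minimal Python sketch. The field names, the DIRECT_IDENTIFIERS set, and the sample record are all made up for this example and do not come from any real system. Notice that fields such as birth year and ZIP code survive the process, and those are exactly the kind of details that can later be matched against other data.

```python
# Hypothetical field names; a real patient schema would differ.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the direct-identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}

# Made-up sample record, for illustration only.
patient = {
    "name": "Jane Roe",
    "address": "1 Example Street",
    "birth_year": 1980,
    "zip_code": "20052",
    "diagnosis": "asthma",
}

research_row = strip_direct_identifiers(patient)
print(research_row)
```

The original record is left untouched, so the organization keeps its full copy while researchers only see the reduced one.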

Using an example from the linked paper:

[A] seemingly anonymous child might have a profile at a social network website, such as Facebook:

Name: Billy Doe
Age: 13
Location: I live in Washington, DC
Narrative: I love to build things with Legos. I love Snickers bars. I recently saw the Batman movie and thought it was the coolest movie ever!

Another database might have the following information:
Name: William Doe
Date of Birth: 04-04-1996
Address: 2000 H Street, NW, Washington, DC 20052

Piecing together these pieces of information, one can link the anonymized record to William Doe.

The example is fairly simplistic but illustrates the point that even basic ties between two or more sets of data can allow one to link the entries. Here the links were the last name, the birth date/age, and the city. Had he watched the movie on a streaming service, that data could also be linked. The more data a bad actor has, the more likely it is that a connection can be made.
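
The linking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two Doe records come from the example, while the decoy row, the AS_OF date (when the profile was supposedly read), the default city string, and all function names are invented here. A real attacker would use fuzzier matching across far larger data sets.

```python
from datetime import date

# The "anonymous" profile from the example.
profile = {"name": "Billy Doe", "age": 13, "location": "I live in Washington, DC"}

# A second data set; the second row is a made-up decoy.
registry = [
    {"name": "William Doe", "date_of_birth": date(1996, 4, 4),
     "address": "2000 H Street, NW, Washington, DC 20052"},
    {"name": "Alice Smith", "date_of_birth": date(1996, 4, 4),
     "address": "10 Main Street, Springfield"},
]

AS_OF = date(2009, 6, 1)  # assumed date on which the profile was read

def age_on(dob: date, as_of: date) -> int:
    """Age in whole years on the given date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

def last_name(full_name: str) -> str:
    return full_name.split()[-1].lower()

def matches(profile: dict, row: dict, city: str = "Washington, DC") -> bool:
    """True when last name, age, and city all agree between the two records."""
    return (
        last_name(profile["name"]) == last_name(row["name"])
        and age_on(row["date_of_birth"], AS_OF) == profile["age"]
        and city in profile["location"]
        and city in row["address"]
    )

linked = [row["name"] for row in registry if matches(profile, row)]
print(linked)
```

None of the three fields identifies the child alone; it is the combination that narrows the candidates down.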

Is There A Solution?

We live in a world where there is an abundance of data about individuals. Some is held by medical organizations, some is held by companies to help manage their customers, and much is held by marketing organizations.

Governments are beginning to recognize the need for organizations to protect the data they hold. Europe's GDPR (General Data Protection Regulation) is one of the best-known examples. Most web users see its impact in the ubiquitous "cookie" notices on websites. It goes further, however, and covers many types of stored data. US states and other countries are also implementing data privacy regulations.

Cyber security professionals are often responsible for securing data that is transmitted (in motion) and stored (at rest). Learning Tree's cyber security curriculum has multiple courses addressing this, including the foundational Course 468, System and Network Security Introduction, which I co-wrote. Two basic concepts are at the core of this protection: encrypt the data, and ensure that only those who genuinely need it are able to access it.
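
The second concept, limiting access to those who genuinely need it, might be sketched as follows. The role names, the ALLOWED_FIELDS table, and the sample record are hypothetical and chosen only for illustration. This is not taken from the course, and a production system would rely on its database or identity platform for enforcement, alongside encryption.

```python
# Hypothetical roles mapped to the fields each one genuinely needs.
ALLOWED_FIELDS = {
    "billing": {"name", "card_number"},
    "researcher": {"birth_year", "diagnosis"},
}

def view_for(role: str, record: dict) -> dict:
    """Return only the fields the role may see; unknown roles see nothing."""
    allowed = ALLOWED_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}

# Made-up record, for illustration only.
record = {
    "name": "Jane Roe",
    "card_number": "0000-0000-0000-0000",
    "birth_year": 1980,
    "diagnosis": "asthma",
}

print(view_for("researcher", record))
print(view_for("intern", record))
```

Defaulting to an empty set means a role that nobody has explicitly approved gets no data at all.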