Sunday, August 20, 2017

GDPR & Personal Data - Context is Key and (Foreign) Key is Context

A logical data model is one of the important milestones on the road to GDPR (General Data Protection Regulation) compliance. Being the blueprint of an organization's semantic data and the relationships among them, the logical data model serves as the virtual hub between the existing physical data stores and the future implementation of a GDPR-compliant data architecture.

The logical data model even offers a GDPR-related bonus, as it teaches that being 'personal' (or non-'personal') is not an absolute characteristic of data, but depends on the context in which these data are made available.

To illustrate the latter, let's look at an example of a logical data model which presumably represents the business of a B2C online retailer. This model may have been obtained as the result of the process described in my previous post "GDPR - How to Discover and Document Personal Data" or through any other modeling approach.

[Figure: logical data model of a B2C online retailer]

Which of these tables contain records with personal data?  As per the definition of 'personal data' imposed by the GDPR ('personal data' means any information relating to an identified or identifiable natural person), the answer is: All of them! 

Why? Because all tables are 'related' to the table 'person', i.e. there is a path from each table to 'person' (and vice versa).
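The "path to 'person'" argument can be checked mechanically by treating tables as nodes and foreign keys as undirected edges. The following is only a minimal sketch: 'person', 'address' and 'order' are named in this post, while 'order_line', 'product' and the foreign key from 'order' to 'person' are assumptions made for illustration.

```python
from collections import deque

# Foreign keys as (referencing table, referenced table) pairs.
# 'order_line', 'product' and the order -> person link are hypothetical.
FOREIGN_KEYS = [
    ("person", "address"),      # residential address id
    ("order", "person"),        # assumed link from an order to its customer
    ("order", "address"),       # delivery address id, billing address id
    ("order_line", "order"),
    ("order_line", "product"),
]

def tables_related_to(start, foreign_keys):
    """Return every table connected to `start`, following foreign keys in both directions."""
    neighbours = {}
    for child, parent in foreign_keys:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    seen = {start}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for other in neighbours.get(table, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen

related = tables_related_to("person", FOREIGN_KEYS)
print(sorted(related))
```

In this sketch every table ends up in the result, which is the table-level version of "All of them!". A table with no foreign key path to 'person' would simply be missing from the returned set.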

This does not mean that every record in every table shown here contains personal data. Only those records qualify that can be reached from a 'person id' (or that lead to a 'person id') through a chain of links between foreign key values and primary key values.

In other words, the existence of relationships (foreign keys) provides the context that categorizes records of data as 'personal' or 'non-personal'. For example, if we isolate the table 'address', its content simply constitutes a list of addresses which may exist in public reference databases such as Google Maps and therefore cannot be considered to contain personal data. But in the context shown in the above model, those records of the table 'address' that are identified by the value of the foreign key 'residential address id' in table 'person' (or by values of the foreign keys 'delivery address id' and 'billing address id' in table 'order') become personal data.
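At record level, the same idea can be sketched with an in-memory SQLite database. The table and foreign key names follow the post; the remaining columns, the 'person id' column in 'order' and all sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name TEXT,
        residential_address_id INTEGER REFERENCES address(address_id)
    );
    CREATE TABLE "order" (
        order_id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(person_id),  -- assumed column
        delivery_address_id INTEGER REFERENCES address(address_id),
        billing_address_id INTEGER REFERENCES address(address_id)
    );
    -- Made-up sample rows: address 3 is referenced by nobody.
    INSERT INTO address VALUES (1, '1 Main St'), (2, '2 Side St'), (3, '3 Empty Rd');
    INSERT INTO person VALUES (10, 'Alice', 1);
    INSERT INTO "order" VALUES (100, 10, 2, 1);
""")

# An address record becomes personal data once a foreign key value points at it.
rows = conn.execute("""
    SELECT a.address_id
    FROM address AS a
    WHERE a.address_id IN (SELECT residential_address_id FROM person)
       OR a.address_id IN (SELECT delivery_address_id FROM "order")
       OR a.address_id IN (SELECT billing_address_id FROM "order")
    ORDER BY a.address_id
""").fetchall()
personal_address_ids = [row[0] for row in rows]
print(personal_address_ids)
```

Addresses 1 and 2 are returned because a person or an order refers to them. Address 3 stays a plain reference record until some foreign key value links it to a person.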

Still, the need for protection and the degree of protection required may vary from table to table and from column to column. The sensitivity of personal data must be evaluated, and the risk that processing them poses to the rights and freedoms of natural persons must be assessed. Sensitivity and processing risk, for each personal data element in isolation but more importantly for their combination and in context, will influence the physical design of data stores, including measures for encrypting, pseudonymizing and anonymizing personal data to achieve GDPR compliance. But that will be the subject of another post...

Wednesday, August 16, 2017

GDPR - How to Discover and Document Personal Data

One of the first steps for organizations on the journey to GDPR compliance is to find out which 'personal data' (i.e. any information relating to an identified or identifiable natural person) are stored where. For many organizations, this can be a tedious, cumbersome process, since very often the complete 'list' of all metadata describing personal data is not at hand right from the start. To complicate matters, the metadata of personal data (like any metadata) may be found under a variety of synonyms in different data stores.

To streamline the process for data discovery as much as possible, I suggest a sequence of 5 steps which may need to be repeated several times. With each pass, additional personal data and/or their locations may be discovered based on the names of columns / fields added in a previous iteration. The process can be stopped once a consolidated, structurally sound logical data model has been obtained. 

[Figure: the five data discovery steps, connected by red arrows]
The steps include:
  1. Create an inventory of all data stores. Record their name, purpose and physical location (device type, country!). Important: Include locations where potential 'processors' (contractors) store business data on behalf of the 'controlling' organization!
  2. Select the (subset of) data stores that are already known to contain personal data. (In a first iteration, start searching data stores using typical metadata of personal data! In later iterations, search data stores using additional metadata of personal data based on the logical data model previously created (see step 5).)
  3. Capture / reverse engineer the physical model of the selected data stores.
  4. For each selected data store, identify metadata (field names) of personal data and of objects relating to personal data. Assign business meaning to those fields by linking them to semantic items from your business data dictionary. (If you do not have a business data dictionary, create one in parallel by using existing documentation and involving subject matter experts!) 
  5. Create / enrich (partial) logical data model using the business data dictionary.
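The metadata search in steps 2 to 4 can be sketched as a scan of column names against a list of terms and their synonyms. This is only an illustration against SQLite: `PERSONAL_TERMS`, `find_candidate_columns` and the sample tables are hypothetical, and other database systems expose their catalog differently.

```python
import sqlite3

# Hypothetical starter vocabulary: dictionary term -> column-name fragments (synonyms).
PERSONAL_TERMS = {
    "person name": ["name", "surname"],
    "email address": ["email", "mail"],
    "date of birth": ["birth", "dob"],
}

def find_candidate_columns(conn, terms):
    """Return (table, column, term) triples whose column name contains a known fragment."""
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        for col in conn.execute(f"PRAGMA table_info({quoted})"):
            column = col[1]
            for term, fragments in terms.items():
                if any(fragment in column.lower() for fragment in fragments):
                    hits.append((table, column, term))
    return hits

# Made-up data store with two tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER, surname TEXT, e_mail TEXT);
    CREATE TABLE product (sku TEXT, title TEXT);
""")
hits = find_candidate_columns(conn, PERSONAL_TERMS)
for hit in hits:
    print(hit)
```

Substring matching is deliberately crude and will produce false positives (a column such as 'filename' would match 'name'), so the hits are candidates for review by subject matter experts rather than a final answer. Synonyms confirmed during that review can be added to the vocabulary before the next pass.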
Although this is only the beginning of the journey, professional data (and process) modeling tools are obviously necessary on the road to GDPR compliance. (Note: The red arrows in the above image indicate not only the step sequence, but also ought to represent links among the related artifacts in the modeling tools' metadata repository.) Already having a business data dictionary in place and/or having logical and physical data models documented in a tool will greatly facilitate the process.

Stay tuned and read part 2, "GDPR & Personal Data - Context is Key and (Foreign) Key is Context", where I will demonstrate how important context is in determining whether data are to be considered personal or not with respect to the GDPR.