Monday, July 15, 2019

Personal Data - a Universal Definition Applying in the USA and Canada (and Elsewhere)

In my previous post "A Privacy Regulation Applying in the USA and Canada (and Elsewhere)", I alluded to the worldwide impact of the General Data Protection Regulation (GDPR)*. In this post, I would like to clarify** the GDPR’s definitional scope of personal data.

The GDPR considers data to be personal if they relate to an identified or identifiable natural person (“data subject”). Although the term “identifiable” tends to attract all the attention here, the term “relate” actually carries more weight, as it widens the definition in a way that resonates with common sense.

So while, for example, a Social Security Number, an International Bank Account Number (IBAN) or a passport number uniquely identifies an individual and is therefore unquestionably personal data, the GDPR also considers non-identifying attributes such as weight (at a given point in time), height or eye color to be personal data if they can be assigned to an identifiable individual.

But the term “relate” pertains to even more than a natural person’s attributes. A prominent example is a civic address register, which certainly is public and in no way constitutes personal data in and by itself. However, if a register entry, e.g. “123 Main Street, Newcastle, Fantasyland”, relates to an individual “Jane Doe” through the relationship “is residential address of”, that entry, by common sense and by the GDPR, becomes part of Jane Doe’s personal data.

You will find a more complete approach in my post "GDPR & Personal Data - Context is Key and (Foreign) Key is Context", which underscores that data modeling is a mandatory discipline for any medium or large organization that needs to get its head around personal data.

** Legal disclaimer: This blog post is not intended to be legal advice, but to raise awareness that it is recommended to consult a lawyer.

Friday, July 12, 2019

A Privacy Regulation Applying in the USA and Canada (and Elsewhere)

Although the General Data Protection Regulation (GDPR)* was passed more than three years ago and became applicable more than one year ago, there is still confusion and misconception about the regulation’s sphere of impact.

In a new series of posts, I will share some advice** drawn from experience in projects that I have conducted since the inception of the regulation. This post is meant to reemphasize the territorial impact (please also see "GDPR - Repeated Misconceptions").

GDPR is a regulation to protect personal data of individuals (“data subjects”) who are physically present, even if only temporarily, in the EEA (European Economic Area, which includes all EU member states plus Iceland, Liechtenstein and Norway).

Protection rights are not dependent on the data subjects’ citizenship or residency!

All organizations worldwide that process personal data of data subjects who are in the EEA are obligated to comply with the GDPR.

Organizations established in the EEA have additional obligations: They must comply with GDPR regardless of where in the world data subjects are located.

The following matrix shows under which conditions GDPR applies:
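As a minimal sketch, the conditions can be derived from the two rules stated above: the location of the data subject and the place of establishment of the organization. The function and parameter names below are hypothetical, the rule is the simplified one given in this post, and the output is not a copy of the original figure.

```python
from itertools import product


def gdpr_applies(org_established_in_eea: bool, data_subject_in_eea: bool) -> bool:
    """Simplified rule from this post (an illustration, not legal advice).

    GDPR applies if the organization is established in the EEA (regardless of
    where the data subject is) or if the data subject is in the EEA
    (regardless of where the organization is established).
    """
    return org_established_in_eea or data_subject_in_eea


def applicability_matrix() -> list[tuple[bool, bool, bool]]:
    """Return all four combinations together with the resulting applicability."""
    return [
        (org_in_eea, subject_in_eea, gdpr_applies(org_in_eea, subject_in_eea))
        for org_in_eea, subject_in_eea in product([True, False], repeat=2)
    ]


if __name__ == "__main__":
    print(f"{'Org. established in EEA':<26}{'Data subject in EEA':<22}GDPR applies")
    for org_in_eea, subject_in_eea, applies in applicability_matrix():
        print(f"{str(org_in_eea):<26}{str(subject_in_eea):<22}{applies}")
```

Under this simplified reading, the only combination in which GDPR does not apply is an organization established outside the EEA processing data of data subjects who are outside the EEA.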

Organizations established in the USA or Canada (or elsewhere) may still believe that GDPR does not apply to them if they do not have any "EU customers". Again, the decisive factor is whether their data subjects, of any nationality, are within the borders of the EEA. And in a world of international trade and high mobility of individuals, organizations obviously cannot control the data subjects’ whereabouts while a business transaction takes place or thereafter. (Even data subjects’ IP addresses captured during electronic communications do not prove presence in or absence from a certain territory, since individuals can legitimately – and will increasingly – hide their real geographical location by employing Virtual Private Networks.)

That being said, any medium or large organization worldwide should presume, by default, that GDPR applies to them. Conversely, only businesses that operate with limited geographical, commercial and technical reach could be exempt from GDPR compliance, e.g. a brick-and-mortar shop that provides all goods / services on the spot without processing personal data and in direct exchange for cash.

* Official publication of GDPR at
** Legal disclaimer: This blog post is not intended to be legal advice, but to raise awareness that it is recommended to consult a lawyer.

Wednesday, November 29, 2017

Value of a Data Management Office

Inspired by a LinkedIn discussion here, below I respond to the two initial questions that Dylan Jones (@DylanJonesUK) posed:

Question 1: What is the value of a data management office and why do we need one?

To state the obvious first: People with an "office job" exclusively process data (emails, telephone calls (audio data), electronic documents, paper documents, personal communication with co-workers etc.). Presuming that the employing organization is profitable, (in a simplified view) the total value of the data processed by those employees must be higher than the total of their salaries.

To see how other corporate assets are typically managed in an organization, let's look at Finances (summarizing everything that is included in a balance sheet) and People. They find their organizational representation in departments for Finance (usually headed by a CFO) and Human Resources (usually headed by a CHRO). Notwithstanding that any business department manages its particular financial targets / budgets as well as its employees, on the corporate level the departments Finance and Human Resources fulfill a central role which includes the following tasks (as mentioned in my post "Pondering on Data and CDO (Chief Data Officer)"):

"In their respective realm, Finance and Human Resources a.o.
  • Develop corporate target scenarios and related strategies
  • Ensure that the organization follows legal and regulatory obligations
  • Advise business departments regarding strategic and legal aspects
  • Perform tasks that are not assigned to the department level, but to the corporate level (e.g. declare taxes, report to regulatory authorities, compose the balance sheet, negotiate with the workers' union)
  • Provide standard templates / procedures that operational departments can / must apply (e.g. standardize expense reports)

Since any item of the above list is abstractly applicable to the resource Data, I suggest that medium and large enterprises implement a central unit headed by a CDO (Chief Data Officer) who directly reports to the CEO."

More precisely, applying the above to the resource Data, the central Data Management Office's tasks include e.g.
  • Develop a High-Level Enterprise Information Management Map (also see my post here)
  • Ensure that the organization follows worldwide-applicable regulations such as the GDPR (General Data Protection Regulation) and industry-specific regulations such as HIPAA, Solvency II, Basel III etc.
  • Derive measures that respond to international, national and corporate requirements of Data Governance and advise business areas accordingly
  • Develop a detailed data model for the intersection of business areas (Master Data)
  • Conceive standard interfaces for Master Data Management and related hubs
  • Build a corporate Business Data Dictionary

Question 2: What are the pros and cons of having a centralised DMO versus separate DMOs per business area?

Central and decentralized Data Management Offices are not mutually exclusive, but should complement each other in a collaborative climate (following the principle "Decentralize as much as possible, centralize as much as necessary"). While the obligations of the central DMO are mentioned above, each business area ought to have its own DMO with Subject Matter Experts / Data Stewards representing their realms and performing tasks such as:
  • Develop a business-area-specific data model that details the High-Level Enterprise Information Management Map
  • Contribute to the corporate Business Data Dictionary
  • Enforce the rules of Data Governance in their respective business areas

Saturday, October 14, 2017

Some Basic Recommendations for Data Quality

Inspired by the initiative of Prash Chandramohan (@mdmgeek) here, below please find some basic notes and recommendations for Data Quality.

1. Create a business data model while limiting its scope to data which
  • You are legally entitled to collect
  • Have a clear business purpose
  • Have a purpose that you can explain to the respective target group (customers, employees, suppliers etc.)
while avoiding the re-creation of entities / attributes that are already rightfully defined within the organization.

2. Define all business metadata regarding
  • Their (business) meaning
  • Format (length, data type)
  • Nullability
  • Range of values (where meaningful and possible).

3. Define use cases and related rules that serve a purpose-specific data quality.

4. As much as meaningful / possible: In business processes, programmatically
  • Enforce the rules for business data (quality)
  • At least, suggest a use-case-specific selection of values.
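Recommendations 2 and 4 can be illustrated with a minimal sketch in which business metadata (data type, length, nullability, range of values) are declared once and then enforced programmatically. The class `AttributeRule`, the function `validate` and the sample attributes are hypothetical names chosen for this illustration; a real implementation would draw its rules from the organization's business data dictionary.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AttributeRule:
    """Business metadata for one attribute (hypothetical structure)."""
    name: str
    data_type: type
    max_length: Optional[int] = None
    nullable: bool = False
    allowed_values: Optional[frozenset] = None


def validate(record: dict[str, Any], rules: list[AttributeRule]) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None:
            if not rule.nullable:
                errors.append(f"{rule.name}: value is required")
            continue
        if not isinstance(value, rule.data_type):
            errors.append(f"{rule.name}: expected {rule.data_type.__name__}")
            continue
        if rule.max_length is not None and len(str(value)) > rule.max_length:
            errors.append(f"{rule.name}: longer than {rule.max_length} characters")
        if rule.allowed_values is not None and value not in rule.allowed_values:
            errors.append(f"{rule.name}: '{value}' is not an allowed value")
    return errors


# Illustrative rules only; attribute names and value ranges are assumptions.
CUSTOMER_RULES = [
    AttributeRule("last_name", str, max_length=50),
    AttributeRule("middle_name", str, max_length=50, nullable=True),
    AttributeRule("country_code", str, max_length=2,
                  allowed_values=frozenset({"CA", "US", "DE"})),
]

if __name__ == "__main__":
    print(validate({"last_name": "Doe", "country_code": "CA"}, CUSTOMER_RULES))
    print(validate({"last_name": None, "country_code": "XX"}, CUSTOMER_RULES))
```

The `allowed_values` set can also feed the use-case-specific selection of values mentioned in the second bullet, e.g. to populate a drop-down list.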

5. Educate business staff according to their role and responsibility in business processes about the purpose / use cases of data, in particular about the impact of
  • Their choice of values when creating or updating data
  • Deleting data.

6. Monitor the quality of data on a regular basis while applying / interpreting (use-case-specific) rules, e.g. using the Friday Afternoon Measurement (even if it's not Friday!).
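As commonly described, the Friday Afternoon Measurement takes the most recent 100 records, marks obvious errors in their critical attributes and counts the records without any error. The sketch below assumes this reading of the method; the function name, the `find_errors` callback and the sample data are hypothetical.

```python
from typing import Callable, Iterable, Mapping, Sequence

Record = Mapping[str, object]


def friday_afternoon_score(
    records: Sequence[Record],
    find_errors: Callable[[Record], Iterable[str]],
    sample_size: int = 100,
) -> float:
    """Percentage of error-free records among the most recent `sample_size` records.

    `find_errors` stands for the use-case-specific rules; it returns the
    errors found in one record (an empty result means "no error").
    """
    sample = records[-sample_size:]
    if not sample:
        raise ValueError("no records to measure")
    error_free = sum(1 for record in sample if not list(find_errors(record)))
    return 100.0 * error_free / len(sample)


def missing_email(record: Record) -> list[str]:
    """A deliberately trivial rule, for illustration only."""
    return [] if record.get("email") else ["email is missing"]


if __name__ == "__main__":
    records = [{"email": "a@example.com"}] * 3 + [{"email": ""}]
    print(friday_afternoon_score(records, missing_email))  # 75.0
```

Repeating the measurement at regular intervals with the same rules makes the scores comparable over time.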

7. Provide feedback to business staff and / or business analysts.

Monday, September 25, 2017

GDPR - Repeated Misconceptions

A considerable number of articles and comments repeatedly and inaccurately convey the notion that the GDPR (General Data Protection Regulation) only applies to organizations
  • processing PII* 
  • of prospects / clients 
  • who are EU citizens.

Let's keep it simple:
Every organization worldwide needs to be GDPR-compliant!
Too simple? No - unless an organization's business mission is to never be in contact with anyone that breathes the air of the European Union, the provisions of the GDPR apply. More precisely:
  • "Be in contact" means holding / processing Personal Data** (which accordingly is a superset of PII) about that "anyone",
  • "Anyone" means a natural person in any role, e.g. being or representing a prospect, customer, job applicant, employee, cooperation partner, supplier, ...,
  • "Breathes the air of the European Union" means physically is within the borders of the EEA*** regardless of their citizenship.

That being said, I believe it is safe to assume that excluding contacts with the EEA is not a sustainable business model. Moreover, a business does not necessarily have any control over where, for example, its customers choose to be, temporarily or permanently (see also my post here).

Complementarily phrased: An organization is exempted from the provisions of the GDPR only if its business is established outside of the EEA, is local by nature and does not process any data relating to natural persons with whom it is in contact for business reasons.

* PII (commonly: "personally identifiable information", but more precisely: "person-identifying information")

** Personal Data ... means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person [GDPR Art. 4 (1)].

*** EEA (European Economic Area / Since July 20th, 2018, the territorial scope of the GDPR includes Iceland, Liechtenstein and Norway - in addition to the EU member states.)

[Legal disclaimer: This blog post is not intended to be legal advice, but to raise awareness that it is recommended to consult a lawyer.]

Sunday, August 20, 2017

GDPR & Personal Data - Context is Key and (Foreign) Key is Context

A logical data model is one of the important milestones on the road to GDPR (General Data Protection Regulation) compliance. Being the blueprint of an organization's semantic data and the relationships among them, the logical data model serves as the virtual hub between the existing physical data stores and the future implementation of a GDPR-compliant data architecture.

The logical data model even offers a GDPR-related bonus, as it teaches that being 'personal' (or non-'personal') is not an absolute characteristic of data, but depends on the context in which these data are made available.

To illustrate the latter, let's look at an example of a logical data model which presumably represents the business of a B2C online retailer. This model may have been obtained as the result of the process described in my previous post "GDPR - How to Discover and Document Personal Data" or through any other modeling approach.

[Figure: example logical data model of a B2C online retailer]

Which of these tables contain records with personal data?  As per the definition of 'personal data' imposed by the GDPR ('personal data' means any information relating to an identified or identifiable natural person), the answer is: All of them! 

Why? Because all tables are 'related' to the table 'person', i.e. there is a path from each table to 'person' (and vice versa).

This does not mean that all records of all tables shown here contain personal data. Only those records do that can be reached through a chain of foreign-key-value to primary-key-value links (or vice versa) from a 'person id' or to a 'person id'.

In other words, the existence of relationships (foreign keys) provides the context that categorizes records of data as 'personal' or 'non-personal'. For example, if we isolate the table 'address', its content simply constitutes a list of addresses which may exist in public reference databases such as Google Maps and therefore cannot be considered to contain personal data. But in the context shown in the above model, those records of the table 'address' that are identified by the value of the foreign key 'residential address id' in table 'person' (or by values of the foreign keys 'delivery address id' and 'billing address id' in table 'order') become personal data.
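The reachability argument can be sketched as a graph traversal: each record is a node, each foreign-key-value to primary-key-value link is an edge that may be followed in either direction, and a record counts as 'personal' if it can be reached from a 'person' record. This is a minimal illustration under that assumption; the function name and the sample key values are hypothetical, while the table names follow the example above.

```python
from collections import defaultdict, deque

# A record is identified by (table name, primary key value).
RecordKey = tuple[str, int]


def personal_records(
    fk_links: list[tuple[RecordKey, RecordKey]],
    person_keys: list[RecordKey],
) -> set[RecordKey]:
    """Return all records reachable from the given 'person' records.

    Each link is (referencing record, referenced record); links are followed
    in both directions, mirroring "foreign key to primary key or vice versa".
    """
    neighbours: dict[RecordKey, set[RecordKey]] = defaultdict(set)
    for referencing, referenced in fk_links:
        neighbours[referencing].add(referenced)
        neighbours[referenced].add(referencing)

    reached = set(person_keys)
    queue = deque(person_keys)
    while queue:
        current = queue.popleft()
        for candidate in neighbours[current]:
            if candidate not in reached:
                reached.add(candidate)
                queue.append(candidate)
    return reached


if __name__ == "__main__":
    links = [
        (("person", 1), ("address", 10)),   # residential address id
        (("order", 100), ("person", 1)),    # person id
        (("order", 100), ("address", 12)),  # delivery address id
    ]
    personal = personal_records(links, [("person", 1)])
    for address_id in (10, 11, 12):
        print(address_id, ("address", address_id) in personal)
```

In this sample, addresses 10 and 12 become personal data through their links to person 1, whereas address 11, which no record references, remains a plain address.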

Still, the necessity and degree of protection for personal data may vary from table to table and from column to column. The sensitivity of personal data must be evaluated, and the risk of processing personal data with respect to the rights and freedoms of natural persons must be assessed. Sensitivity and processing risk for each personal data element in isolation, but more importantly for their combination and in context, will influence the physical design of data stores, including measures of encrypting, pseudonymizing and anonymizing personal data to achieve GDPR compliance. But that will be the subject of another post...

Wednesday, August 16, 2017

GDPR - How to Discover and Document Personal Data

One of the first steps for organizations on the journey to GDPR compliance is to find out what 'personal data' (i.e. any information relating to an identified or identifiable natural person) are stored where. For many organizations, this can be a tedious, cumbersome process, since very often the complete 'list' of all metadata describing personal data is not at hand right from the start. Making matters more complex, personal data's metadata (like any metadata) may be found under a variety of synonyms in different data stores. 

To streamline the process for data discovery as much as possible, I suggest a sequence of 5 steps which may need to be repeated several times. With each pass, additional personal data and/or their locations may be discovered based on the names of columns / fields added in a previous iteration. The process can be stopped once a consolidated, structurally sound logical data model has been obtained. 

[Figure: the five-step process for discovering and documenting personal data]
The steps include:
  1. Create an inventory of all data stores. Record their name, purpose and physical location (device type, country!). Important: Include locations where potential 'processors' (contractors) store business data on behalf of the 'controlling' organization!
  2. Select (a subset of) data stores that are already known to contain personal data. (In a first iteration, start searching data stores using typical metadata of personal data! In later iterations, search data stores using additional metadata of personal data based on the logical data model previously created (see step 5).)
  3. Capture / reverse engineer the physical model of the selected data stores.
  4. For each selected data store, identify metadata (field names) of personal data and of objects relating to personal data. Assign business meaning to those fields by linking them to semantic items from your business data dictionary. (If you do not have a business data dictionary, create one in parallel by using existing documentation and involving subject matter experts!) 
  5. Create / enrich a (partial) logical data model using the business data dictionary.
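The search in steps 2 and 4 can be sketched as a scan of reverse-engineered column names against patterns that cover typical synonyms of personal data. The patterns, the function name and the sample physical model below are assumptions made for illustration; in practice, the pattern list would grow with each iteration as the logical data model is enriched.

```python
import re

# Business meaning -> pattern covering typical column-name synonyms (illustrative only).
PERSONAL_DATA_HINTS = {
    "name": r"(first|last|sur|full)[_ ]?name",
    "birth date": r"(dob|birth)",
    "email": r"e[-_ ]?mail",
    "phone": r"(phone|mobile|tel)",
    "address": r"(street|zip|postal|city|addr)",
}


def find_candidate_columns(
    physical_model: dict[str, list[str]],
    hints: dict[str, str] = PERSONAL_DATA_HINTS,
) -> list[tuple[str, str, str]]:
    """Return (table, column, suggested business meaning) for every pattern match."""
    hits = []
    for table, columns in physical_model.items():
        for column in columns:
            for meaning, pattern in hints.items():
                if re.search(pattern, column, flags=re.IGNORECASE):
                    hits.append((table, column, meaning))
    return hits


if __name__ == "__main__":
    # Hypothetical physical model as it might come out of step 3.
    model = {
        "CUST": ["CUST_ID", "SURNAME", "E_MAIL", "ZIP_CD"],
        "PRODUCT": ["SKU", "PRICE"],
    }
    for table, column, meaning in find_candidate_columns(model):
        print(f"{table}.{column} -> {meaning}")
```

Name matching only yields candidates. Columns such as `CUST_ID` carry no telling name, so assigning business meaning still requires subject matter experts, as described in step 4.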
Although this is only the beginning of the journey, professional data (and process) modeling tools are obviously necessary on the road to GDPR compliance. (Note: The red arrows in the above image indicate not only the step sequence, but ought also to represent links among the related artifacts in the modeling tools' metadata repository.) Already having a business data dictionary in place and/or logical and physical data models documented in a tool will greatly facilitate the process.

Stay tuned and read part 2 "GDPR & Personal Data - Context is Key and (Foreign) Key is Context" where I will demonstrate how context is important to determine whether data are to be considered personal or not with respect to the GDPR.