One of the first steps for organizations on the journey to GDPR compliance is to find out which 'personal data' (i.e. any information relating to an identified or identifiable natural person) are stored where. For many organizations, this can be a tedious, cumbersome process, since the complete 'list' of all metadata describing personal data is very often not at hand from the start. To make matters more complex, the metadata of personal data (like any metadata) may appear under a variety of synonyms in different data stores.
To streamline the data discovery process as much as possible, I suggest a sequence of 5 steps, which may need to be repeated several times. With each pass, additional personal data and/or their locations may be discovered based on the names of columns / fields added in a previous iteration. The process can be stopped once a consolidated, structurally sound logical data model has been obtained.
[Image: diagram of the 5-step data discovery process]
1. Create an inventory of all data stores. Record their name, purpose and physical location (device type, country!). Important: Include locations where potential 'processors' (contractors) store business data on behalf of the 'controlling' organization!
2. Select the (subset of) data stores that are already known to contain personal data. (In the first iteration, start by searching data stores for typical metadata of personal data! In later iterations, search data stores for additional metadata of personal data taken from the logical data model previously created (see step 5).)
3. Capture / reverse engineer the physical model of the selected data stores.
4. For each selected data store, identify the metadata (field names) of personal data and of objects relating to personal data. Assign business meaning to those fields by linking them to semantic items from your business data dictionary. (If you do not have a business data dictionary, create one in parallel by using existing documentation and involving subject matter experts!)
5. Create / enrich the (partial) logical data model using the business data dictionary.
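To make the steps more tangible, here is a minimal Python sketch of how the capture and matching steps might be automated for a single relational data store. It is an illustration under stated assumptions, not a substitute for a modeling tool: the in-memory SQLite database stands in for a real data store, and the tables, the synonym lists in `PERSONAL_DATA_TERMS` and the helper names (`normalize`, `physical_model`, `find_candidates`) are all hypothetical. A plain name match like this only produces candidates; subject matter experts still have to confirm the business meaning.

```python
import re
import sqlite3

# Hypothetical starter dictionary: business term -> known column-name synonyms.
PERSONAL_DATA_TERMS = {
    "email address": {"email", "e_mail", "mail_addr"},
    "date of birth": {"dob", "birth_date", "date_of_birth"},
    "family name": {"last_name", "surname", "family_name"},
}


def normalize(name):
    """Lower-case a name and collapse every separator run to one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def physical_model(conn):
    """Reverse engineer {table: [column, ...]} from the SQLite catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    model = {}
    for table in tables:
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")')
        model[table] = [row[1] for row in rows]
    return model


def find_candidates(model, terms):
    """Return (table, column, business term) for every synonym match."""
    lookup = {normalize(s): term for term, syns in terms.items() for s in syns}
    hits = []
    for table, columns in model.items():
        for column in columns:
            term = lookup.get(normalize(column))
            if term is not None:
                hits.append((table, column, term))
    return hits


# Stand-in data store with made-up tables (assumption for this example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER, Surname TEXT, "E-Mail" TEXT, tier TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    CREATE TABLE newsletter (mail_addr TEXT, subscribed_on TEXT);
""")

model = physical_model(conn)

# First iteration: search with typical metadata of personal data.
hits = find_candidates(model, PERSONAL_DATA_TERMS)

# Later iteration: a term added while building the logical data model
# (hypothetical) widens the search to objects relating to personal data.
terms_v2 = {**PERSONAL_DATA_TERMS, "customer reference": {"customer_id"}}
hits_v2 = find_candidates(model, terms_v2)
```

In this sketch the first pass flags `customer.Surname`, `customer."E-Mail"` and `newsletter.mail_addr`. The second pass additionally flags `orders.customer_id`, which mirrors how each iteration can surface further locations once the dictionary has grown.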
Although this is only the beginning of the journey, professional data (and process) modeling tools are obviously necessary on the road to GDPR compliance. (Note: The red arrows in the image above not only indicate the step sequence but should also represent links among the related artifacts in the modeling tools' metadata repository.) Already having a business data dictionary in place and/or logical and physical data models documented in a tool will greatly facilitate the process.
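As a rough illustration of what such links among artifacts buy you, the following Python sketch models the chain from a logical entity through dictionary terms down to physical columns. The class names, the store names (`crm_db`, `mailing_saas`) and the function `stores_for` are assumptions made up for this example; a real modeling tool's repository will have its own, much richer structure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalColumn:
    store: str   # data store from the inventory
    table: str
    column: str


@dataclass
class DictionaryTerm:
    name: str
    # Links from the business term to the physical columns that carry it.
    columns: list = field(default_factory=list)


@dataclass
class LogicalEntity:
    name: str
    # Links from the logical entity to the dictionary terms describing it.
    attributes: list = field(default_factory=list)


def stores_for(entity):
    """Follow the links from a logical entity down to the data stores."""
    return sorted({col.store
                   for term in entity.attributes
                   for col in term.columns})


# Hypothetical repository content.
email = DictionaryTerm("email address", [
    PhysicalColumn("crm_db", "customer", "e_mail"),
    PhysicalColumn("mailing_saas", "subscribers", "mail_addr"),
])
person = LogicalEntity("Person", [email])

print(stores_for(person))  # ['crm_db', 'mailing_saas']
```

With the links in place, the question "where do we hold data about a person?" becomes a simple traversal instead of a new search.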
Stay tuned for part 2, "GDPR & Personal Data - Context is Key and (Foreign) Key is Context", where I will demonstrate why context matters in determining whether data are to be considered personal or not with respect to the GDPR.