Wednesday, October 6, 2021

The Importance of Data Literacy

Since EVERY person performing tasks at a (virtual or real) desk for professional reasons works with data, Data Literacy is a basic qualification for EVERYONE with an “office job” and not only required for Data Analysts, Data Engineers, Data Scientists etc. and their managers. (Note: ... as reading and writing are basic qualifications for everyone, not only for those who want to pursue an academical career.)

This been said, I suggest the following definition:
 
Data Literacy is the ability of an individual or a group within their rights and responsibilities in an organization to
  • understand the definition and meaning of data
  • interpret data in their respective context
  • apply data terms correctly and communicate clearly and concisely about data to other individuals or groups inside and outside of the organization
  • select, extract, compose, transform, create, and delete data under corporate rules & standards
  • judge the quality of data for the potential impact on the purpose of subsequent processes in the organization.
In brief, Data Literacy is “knowing what (data) you are talking about, what you are doing with those data and why”.
 
A Data Literacy program must emphasize educating and sensibilizing particularly rank and file employees in business units who lay the "data groundwork" by collecting data and entering them into the respective systems, often from (unstructured) sources outside the organization such as online forms, letters, emails, messages, phone calls etc.
 
Organizations need to train employees based on 
  • a Business Data Glossary related to the subject area Data Model that corresponds to the employee’s respective responsibility
  • a subject-area / job-task-related Process Model with its technical and organizational measures showing
    • the potentially available sources of data
    • what to do with those data
    • the potential recipients of data inside and outside the organization
and make those artifacts available for reference.
 

Sunday, September 26, 2021

Managing Business-Driven Data Models

In my previous post "Managing Data Models" I emphasized the Logical Enterprise Model as the center piece of model management.

Since the Logical Enterprise Model builds over time, a CDO suggested to me that she would like to integrate a “Corporate Map” into the model management process. This map would be developed in moderated sessions with C-suite members to provide guidance to corporate leaders and stakeholders how business areas are related, which Data Strategy to choose and which projects to prioritize. 

Our conversation brought us quickly to what I introduced as an “Enterprise Information Management Map” in my past posts using an example for the insurance industry:
 
 
Enterprise Information Management Map

Click to enlarge

 
Adding the development of an Enterprise Information Management Map to the model management process for a more business-driven top-down approach:
 
Business-Driven Data Model Management

Click to enlarge




 

Wednesday, September 8, 2021

Managing Data Models


A functioning data modeling environment could look like this:
 
 
Too complex? Not if you consider that most organizations need to maintain legacy systems in parallel to developing new applications – and both in a flexible or even agile manner.
 
Though delimitations between model levels may vary among readers, this image suggests the principle steps and resulting artifacts to proceed from institutional knowledge in  business departments to structures of files and databases.
 
While I may revisit this approach in future discussions and explanations, I will restrict myself here to a few key comments prioritized by recent discussions with clients.
 
1. First and foremost, using a structure like this effectively and efficiently obviously necessitates a professional tool set of well connected components. (Though I am biased* regarding the choice of tools, the approach outlined here should not impose a particular technical solution.)
 
2. In overcoming departmental silos, the Logical Enterprise Model plays a central role connecting business attributes with potentially multiple physical occurrences in different application systems and databases (Data Catalog). As opposed to the (failing) approach of creating a “complete” enterprise data model first, I suggest to build the Logical Enterprise Model by integrating partial Logical Data Models from ongoing business projects. Due to its central role, this integration process and the maintenance of that model needs to be assigned to a central BUSINESS unit, e.g. the Chief Data Office. Since naming and definitions are a mandatory part of business data modeling, a Business Glossary is an implicit component of the Logical Enterprise Model
 
3. It is not uncommon that organizations operate production databases which were developed in another (incompatible) project environment 20 or more years ago, i.e. those legacy databases may be insufficiently documented and/or the last knowledgeable programmer / database administrator has meanwhile left the organization. To simply shut down those databases may not on be an option, if they still support valuable business processes, at least not without first migrating the enclosed data to a new storage system. Both maintaining or migrating a database require to know its structure (tables, columns, keys, etc.). Fortunately, professional data modeling tools support “reverse engineering”, i.e. they can recreate the physical data model usually directly from a database.
 
However, the reverse engineering process may leave expectant users with frustration as cryptic, abbreviated physical names in databases are common and do not reveal the semantics necessary to understand and document the purpose of tables and columns. Also, the sheer number of tables can be overwhelming.
 
The best strategy “forward” to identify the relevant submodel (“Physical Business Data Model”) is to find the meaning of core tables (i.e. tables that have multiple relationships) and from the subset of interest to navigate to neighboring objects.
 
The necessary business semantics to enrich the model with logical names and definitions can be obtained
  • as part of the reverse engineering process if the developers of the database used in-line documentation features (as e.g. supported by Oracle or SQL Server).
  • from database-external sources (e.g. ERP systems often hold the necessary business information in a separate dictionary).
In absence of these tool-supported options, the reverse engineering approach is limited to tap some business users' knowledge to semanticize the reversed model. While this can be a tedious process including trial and error, it is inevitable to do it, if neither shutting down the database nor continuing to operate it as-is are viable options. Legal obligations (e.g. to comply with applicable data privacy laws) may cause additional pressure to follow this inconvenient, but necessary path to reduce technical debt.
 
I welcome your questions and comments.
 
[* In the spirit of full disclosure: I represent Grandite, the supplier of the SILVERRUN Business Architecture Tools]