This is a question I am frequently asked, as data modeling seems to be
associated with SQL databases, while for so-called NoSQL databases such
support appears to be unavailable (or, depending on whom you ask,
impossible or unnecessary).
To put a possible short answer into perspective, I would like to offer some preliminary thoughts.
I strongly believe that (business) data modeling is the single most
important discipline for any medium-sized or large organization. To proceed
effectively and efficiently, and to honor the principles of data
governance, master data management, data quality etc., each organization
should have one standardized method to create all of
its conceptual and logical data models, and should employ a data modeling tool
that allows it to link and manage all of its models under one roof. I
think it is safe to say that, over the past decades and across all
organizations that have practiced data modeling, relational modeling
has proven to be the most relevant and viable method to represent
business data models.
On the other hand, most medium-sized and large organizations
cannot avoid dealing with multiple software suppliers, and
requirements regarding data volume, velocity, views, human-machine
interfaces etc. may each dictate (or at least strongly favor) a
particular software solution. As a result, organizations find themselves exposed to a
myriad of storage technologies, including SQL, NoSQL, NewSQL (and
whatever-type-of) databases.
Despite the attempts of non-RDBMS suppliers to differentiate
themselves from the competition by introducing new jargon,
all storage technologies apparently rest on the concepts of database, table and column, i.e.
the physical models for RDBMS- as well as non-RDBMS-based storage systems can
be obtained by denormalizing logical models. An organization's data
modeling tool of choice should make it possible to derive the physical model for
the respective application and storage technology from the existing
business data model, while maintaining the link between the objects
(tables, columns) of the business model and those of the physical storage model.
Doing so not only serves application development, but also constitutes an
important measure to ensure that the organization stays in control of
the definitions, lineage, usage etc. of all its data elements, in order to
satisfy legal obligations, achieve regulatory compliance, or, more generally,
to be able to (re)act flexibly when business and/or technical requirements
change.
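To make this concrete, here is a minimal sketch (in Python, with purely hypothetical names and structures, not taken from any particular modeling tool) of how a denormalized physical table can carry, per physical column, a reference back to the logical entity and attribute it was derived from, so that lineage questions remain answerable after denormalization:

```python
# Hypothetical sketch: a physical model that records, for each physical
# column, which logical (business-model) attribute it was derived from.
# All names and structures are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class LogicalAttribute:
    entity: str      # logical entity, e.g. "Customer"
    attribute: str   # logical attribute, e.g. "customer_id"

@dataclass
class PhysicalColumn:
    name: str                  # column name in the storage system
    type_: str                 # storage-specific data type
    source: LogicalAttribute   # lineage link back to the business model

# A denormalized table "orders by customer" (typical for a wide-column
# store) that merges attributes of the logical entities Customer and Order:
orders_by_customer = [
    PhysicalColumn("customer_id", "uuid", LogicalAttribute("Customer", "customer_id")),
    PhysicalColumn("customer_name", "text", LogicalAttribute("Customer", "name")),
    PhysicalColumn("order_id", "timeuuid", LogicalAttribute("Order", "order_id")),
    PhysicalColumn("order_total", "decimal", LogicalAttribute("Order", "total")),
]

# Lineage question: "Where is the logical attribute Customer.name used?"
used_in = [c.name for c in orders_by_customer
           if c.source == LogicalAttribute("Customer", "name")]
print(used_in)  # -> ['customer_name']
```

In a real modeling tool these links live in the model repository rather than in application code, of course; the point is merely that the mapping itself is simple data that can be queried in both directions.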
As long as a storage technology can be reduced to the three-level container principle of database-table-column
(and potentially to other common constructs such as indexes), the
(denormalized) physical model already contains the primary ingredients
for the related DDL. The data modeling tool in use should be
extensible so that the CREATE / DROP statements (or their
equivalents) for the respective target storage system can be generated from a physical
model. (For example, the current version of Grandite's SILVERRUN modeling tools can also generate DDL for Cassandra and Neo4j.)
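As a rough illustration of what such an extension boils down to (a sketch under simplified assumptions, not SILVERRUN's actual mechanism): once a physical model is reduced to databases, tables and columns, generating DDL for a given target is largely a matter of per-target type mappings and statement templates.

```python
# Illustrative sketch only: generate CREATE TABLE statements for two
# targets from one database-table-column model. Type mappings and syntax
# are simplified; real generators handle far more (keys, indexes,
# storage options, quoting rules, ...).

physical_model = {
    "table": "orders_by_customer",
    "columns": [
        ("customer_id", "uuid"),
        ("order_id", "timeuuid"),
        ("customer_name", "text"),
        ("order_total", "decimal"),
    ],
    "primary_key": ["customer_id", "order_id"],
}

# Per-target type mappings (simplified):
TYPE_MAP = {
    "postgresql": {"uuid": "uuid", "timeuuid": "uuid",
                   "text": "text", "decimal": "numeric"},
    "cassandra":  {"uuid": "uuid", "timeuuid": "timeuuid",
                   "text": "text", "decimal": "decimal"},
}

def create_table_ddl(model: dict, target: str) -> str:
    types = TYPE_MAP[target]
    cols = ",\n  ".join(f"{name} {types[t]}" for name, t in model["columns"])
    pk = ", ".join(model["primary_key"])
    return (f"CREATE TABLE {model['table']} (\n"
            f"  {cols},\n  PRIMARY KEY ({pk})\n);")

for target in ("postgresql", "cassandra"):
    print(f"-- {target}\n{create_table_ddl(physical_model, target)}\n")
```

The design point is that the physical model stays target-neutral; only the mapping tables and templates differ per storage system, which is exactly what makes such generators extensible to new targets.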
If you would like to discuss this in more detail or to challenge me on the
above, please feel invited to comment here or to contact me by email (axel
. troike [at] grandite . com).
[In the spirit of full disclosure: I represent Grandite, a supplier of data modeling tools]