Data Demythed

Debunking the "conventional wisdom" of data modelling, normalization, etc.

Data Modelling

Normalize for Better Code

Normalization, Programming

My second early-career formative experience came while spending a week advising an organisation on a data strategy. To illustrate normalization, we looked at a small, self-contained project which they had in the design phase. Their data model had 5-6 tables. Over a couple of days of discussion, we restructured it into around 20 tables.

At the end they acknowledged the reasons for all the tables, while still being somewhat perplexed at the idea of so many tables for such a small project.

Some months later I was talking to someone from the organisation, and asked about that project. The person had some interesting observations.

The first was that they had reservations about trying something that was unfamiliar to them. But they had decided it was a manageable risk for such a small project, so they went with the 20-odd tables.

The second was that for the first couple of weeks the programmers grumbled about all the tables they had to learn their way around. This didn’t surprise me. I contend that the main reason poor models are used, whatever the stated justification (“stop at 3NF”, “denormalize for efficiency”, …), is that models reflect what programmers will tolerate instead of what the data represents. And programmers generally think, wrongly (per the observations that follow), that fewer tables are somehow better for them.

The third was that, after a couple of weeks, the grumbling from the programmers shifted: the programming was too simple and straightforward! What a query or update function needed to do was clear and obvious. There were none of the “special cases” that require extra, “interesting” programming, because normalization had worked those out of the data model. Bugs were easier to resolve because it was more obvious where to find and fix them.
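To make the “special cases” point concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not that project's schema: the tables (person_flat, person, phone) and the function names are hypothetical, invented only for illustration. It compares a design that keeps phone numbers in repeating columns with one that gives them their own table.

```python
import sqlite3

# Hypothetical schema for illustration only; not taken from the project described.
SCHEMA = """
    -- Fewer tables: phone numbers stored as repeating columns.
    CREATE TABLE person_flat (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone1 TEXT, phone2 TEXT
    );
    -- More tables: one row per phone number.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE phone (
        person_id INTEGER NOT NULL REFERENCES person(id),
        number    TEXT NOT NULL,
        PRIMARY KEY (person_id, number)
    );
    INSERT INTO person_flat VALUES (1, 'Ada', '555-0100', NULL);
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO phone VALUES (1, '555-0100');
"""


def build_demo():
    """Create an in-memory database holding both designs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def phones_flat(conn, person_id):
    # Special case: the code must know every phone column and skip empty ones.
    row = conn.execute(
        "SELECT phone1, phone2 FROM person_flat WHERE id = ?", (person_id,)
    ).fetchone()
    return [p for p in row if p is not None] if row else []


def add_phone_flat(conn, person_id, number):
    # Special case: find the first free slot, and fail when there is none.
    row = conn.execute(
        "SELECT phone1, phone2 FROM person_flat WHERE id = ?", (person_id,)
    ).fetchone()
    if row is None:
        raise LookupError("no such person")
    for column, value in zip(("phone1", "phone2"), row):
        if value is None:
            # Column name comes from the fixed tuple above, never from user input.
            conn.execute(
                f"UPDATE person_flat SET {column} = ? WHERE id = ?",
                (number, person_id),
            )
            return
    raise ValueError("no free phone slot")


def phones_normalized(conn, person_id):
    # One row per fact: a plain query, with nothing to filter out.
    rows = conn.execute(
        "SELECT number FROM phone WHERE person_id = ? ORDER BY number",
        (person_id,),
    ).fetchall()
    return [r[0] for r in rows]


def add_phone_normalized(conn, person_id, number):
    # One row per fact: a plain insert, with no slot hunting and no limit.
    conn.execute("INSERT INTO phone VALUES (?, ?)", (person_id, number))
```

The two normalized functions are each a single obvious statement. The flat versions carry the slot-hunting and the “what if both columns are full?” branch, which is the kind of extra, “interesting” programming the programmers no longer had to write.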

The fourth, and final, observation gave it a “bottom line”: the project went into production ahead of schedule, under budget, and with the lowest bug rate that they had ever experienced in a project. They even included functionality that they had expected to be part of a “phase 2”!

Who doesn’t want all of that?

As valuable as my first formative experience had been, this one was much more potent. It became the essence of how I approached data analysis and modelling, and why, for the rest of my career.