Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
User Journal

einhverfr's Journal: Good Normalization, Bad Normalization 4

Journal by einhverfr

I have recently been in a discussion on the LedgerSMB development list over the value of normalization of our database and where I feel that we need to go. This lead to a fairly heated debate between those of us who work on the database back-end in part because a lot of databases are sometimes normalized in ways which are not practical. My view, which I still hold, is that normalization is always a good thing if done properly, but that sometimes people try to do recipe-book normalization without understanding the process and create horrible messes.

For people who don't know what normalization of a relational database schema involves, this is the process by which data dependencies are analyzed and tables broken down in such a way as to duplicate as little data as possible. As one progresses through the normal forms (First Normal Form, Second Normal Form, Third Normal Form, Boyce/Codd Normal Form, Forth Normal Form, and Fifth Normal Form) one breaks up tables into smaller entities so that single facts are stored in one place only (and therefore are less likely to get out of date). Just as good object-oriented design breaks up data structures along lines on dependencies of program flow (so that proper encapsulation is possible), good relational design breaks up data relations along the lines of dependencies on data (so that central data management is possible).

In my view, relational data modelling is a fairly straight-forward mathematical exercise. And in general, normalizing to third normal form does not cause people to make too many mistakes. However, further normalization is fraught with hazards because many database engineers try to apply some method of arriving at a model of data which does not take into account its own structure. In many cases (key-value modelling, for example) these approaches are foreign to the relational model and hence result in problems. I call these messes "misnormalized databases." In most of these cases, though, the best option is to avoid such ideas and stick with table structures that best represent the structure and dependencies of your data.

When one progresses beyond third normal form, one of the problems that one encounters is the fact that it is not possible to know how normalized a table is without knowing how the table is used and what real-world constraints are on the data (data may exist independent of any given program flow, but it does not exist independent of its use). In general the places where one runs into trouble are:

  1. Tables where the information in them is no longer sufficient to derive any authoritative statement from the record and those records noted via foreign keys specifically in that table.
  2. Tables normalized in a way which makes it more difficult to enforce real-world data constraints (this usually means you are missing real data dependencies).
  3. Tables normalized for the sole purpose of reducing duplicate information without addressing internal data semantics.

Additionally, relational modeling allows for easier enforcement of data constraints in data not subject to prior data structure constraints (accounting data in particular does not allow for all constraints to be specified in relational terms in the current generation of RDBMS's). Hence things like addresses, location data, and customer information are more easily stored and checked than financial data (one area where I see room for improvement in RDMBS's).

In general, normalization is always a win if it is done well. If it is done badly without proper analysis of real world data constraints, you end up with a mess.

This discussion has been archived. No new comments can be posted.

Good Normalization, Bad Normalization

Comments Filter:
  • When one progresses beyond third normal form
    Can you give me an example where you think it's appropriate to progress beyond 3NF? Which part of LedgerSMB its database has been normalized further than that?

    Oh and keep up the good work :-)
    • by einhverfr (238914)
      In the areas of the database I would like to see further normalized. There are some open questions about addresses which have not been addressed or fully reviewed/discussed. In short addresses probably could be more normalized but it is not clear yet how this should be done. My guess is that we will see a gradual normalization in this area as data dependencies become more fully understood and addressed. Part of the problem with addresses is that there are a lot of corner cases which need to be consider
      • Heh yeah, addresses. It sounds simple until you try to gather _all) requirements. I've seen the way the Oracle eBusiness suite stores addresses. It's about 20 tables or so. :-) Thanks for the answer by the way.
    • by einhverfr (238914)
      Sorry, forgot to give an example of something where one should go beyond 3NF.

      The basic issue is that normal forms tend to handle questions of atomic facts-- if a row isn't an atomic fact, the table structure can be broken up into smaller tables.

      Here is an example of 3NF which should probably be considered to be separate:

      CREATE TABLE street_address (
      address text,
      city text,
      state text,
      country text,
      prim

PLUG IT IN!!!

Working...