Just like with any design – whether graphical or in programming – you’ve really got to make yourself a roadmap to be successful. Databases aren’t that different. After all, just like with graphics design, they fall into information architecture. That is, the most successful databases are the ones which effectively organize your information.
In this post, I will discuss relationships between information, as well as how to effectively normalize your database so that each table avoids redundancy between each datum. Before we begun, let’s define a few key terms!
Vocabulary for the day!
E-R Model: That is, the entity-relationship model which shows the relationship between entities. For example, M cats have relationship with N cities in which they live. Let’s explore some terminology specific to the E-R model:
One-to-One Relationship: This means that one datum is related to one datum, such as if Spunky were the only cat to live on First Street.
One-to-Many Relationship: This relationship would mean that one entity is related to many other entities. An example of this might be that Tom has 5 cats. So, Tom > Owns > 5 Cats
Many-to-Many Relationships: This means that the relationship involves more than one object relating to a group of other objects. That is, Spunky and Boots both attend 5 different stores in his home town, so we can visualize this as Spunk + Spot > Visit > Petsland, Pets More, Pets Mall, Catland, KittenKorner.
Normalization
Normalization is the process of tabulating data from our real-world problem into several simple tables for our relational database. This is important because it allows the database to be managed much easier. In my previous post, I touched on labeling a primary key for our database. The primary key is our identifier that allows us to find specific data because it is unique to each field – no two fields will have the same primary key.
The unnormalized form of our table will be one in which items appear more than once, thus we would see quite a bit of redundancy. Obviously, this makes it very difficult to manage the information. I mean, if we wanted to change how many Domestic Shorthair cats were in our database, we would have to scan each line and row for any that have Domestic Shorthairs rather than go to one specific line and edit the field there. This is why we have to organize the data!
From our unnormailzed form, we generally move into what is referred to as the first normal form. This is usually a simple, two-dimensional table resulting from a division of the original data into one or two tables (or more!) But we aren’t done – this table will have to also be divided so that no items will appear more than once.
From the first normal form, we then would generally move into the second normal form. This refers to a table in which we can now use our primary key to identify data. That is, our unique key can determine the values in the other columns. This is referred to being functionally dependent – meaning that the primary key value determines the values in other columns. Part of a functionally dependent operation would the state of our data being transitively dependent – that is, if a value indirectly determines the values in other columns. If we have any transitively dependent values, we will have to do another division of our data.
If our second form still has unrelated data still being determined by data which is not a primary key, we would then do a third division – which would be our third normal form. Basically, we are wanting our values to be determined by one very unique value so that it is easy for us to find and edit information – the entire point of having a database in the first place.
So, by this point you should be able to understand that databases need to be simplified. Even if you have many different tables, the major point is to have your data simplified so that only a few values are actually on it but that interlink into the big picture.



