Database Normalization Part 001

Filed Under (Visual FoxPro, work.BLOG) by WildFire on 15-09-2004

This is a screenshot of the prototype of a new project of ours.

To a give you a short overview it is a database program that monitors the children records in a certain place where parents can 'safely' leave their children so they can shop with comfort and peace of mind.

These children will then unite with the other 'temporarily abandoned' children in that place to battle the forces of evil and squash the invading twelve-headed-iodine-squirting aliens while their parents comfortably shop for shoes and perfumes.

(How I wish someone would come up with the idea for something like this where instead of children, the husbands are the main clients while their lovely wives do the shopping thing... (I would even wholeheartedly create a database program for free if that business pops up))

Some of these children obviously do not win in this enormous battle so a database program is needed to monitor who came back 'alive' and who came back 'replaced'. The TimeIn and TimeOut records of these brave kids are necessary to compute the wormhole coordinates use for their battle portals.

Let us leave those alien battle-invasion details for now and proceed.

If you can remember I posted this yesterday:

If you are to choose between the second level of data normalization or an easier usable interface, what would you prioritize?

Going back to the prototype screenshot above, you see a very simple interface. There are textboxes and combo boxes for the regular information this application needs: ChildIDNO, ChildName, Nickname, Birthdate, FatherName, MotherName, Guardian, ParentsAddress, ParentsPhoneNumber to name a few.

Easy flowing, right?

Probably yes... IF you're creating something and don't give an aliens arse after you get paid. If you're one of those irresponsible 'database dudes' out there who deserve to be devoured by sex-cravin'-tentacle-infested aliens.

If you choose the easy path you'll get a lame database that looks like this (let's just select the important fields):

_KIDS DATABASE
ChildIDNO ChildName Father Mother Etc...
0000001 X FatherofX+Y MotherofX+Y  
0000002 Y (X's bro) FatherofX+Y MotherofX+Y  
0000003 Z FatherofZ MotherofZ  

That is even the picture in the best case scenario. Re-entering the parent data again for another kid with the same parents would yield possible errors... extra spaces and more open for typo-related errors.

For example: Foo, John R.
Can be entered as: Foo, John
  Foo, John R
  Foo, John _R. (an extra space)
  Foo, Jonh R. (and countless versions of typos)

If Child_001 and Child_002 are even encoded to the database a number of days apart, the possibility of an error to occur increases. (Let's say after a month when the second child, learning that one of his sibling was abducted, pledges to go on in a quest to rescue his big brother from aliens.)

(By the way... one of the target readers for these blogs of mine are the ComSci/IT students of our country who needs 'database enlightenment'; for those who have been experts of this field already, you are free to move your arse out of here (for now) and read other more advanced stuff instead... probably about Source Control HowTo or How windows exploit the hyperthreading technology.)

Now if you'll look back to the fields above, ParentID, FatherName and MotherName, ParentAddress, ParentsPhone and the other Parents-related information are redundant. A definite no no in the area of database structure planning.

So the first 'level' of improvement on this is to create a separate database for Kids and a different database for Parents. Something like this:

_KIDS DATABASE
ChildIDNO ChildName Etc...    
0000001 X      
0000002 Y (X's bro)      
0000003 Z      
_PARENTS DATABASE
ChildIDNO Father Mother Etc...  
0000001 FatherofX MotherofX    
0000002 FatherofX MotherofX    
0000003 FatherofZ MotherofZ    

The KIDS database holds fields that are related to... well kids and the PARENTS database for PARENTS-related information such as contact number and the likes.

Now... is that enough improvement and normalization already..?

While it is better than the first table it is not that good enough yet. You can still see redundant records in the PARENTS table.

You can improve this further by creating another field PARENTS_ID on the KIDS table linked to the same field in the PARENTS table.

_KIDS DATABASE
ChildIDNO ChildName Parents_ID Etc...  
0000001 X 0000001    
0000002 Y (X's bro) 0000001    
0000003 Z 0000002    
_PARENTS DATABASE
ChildIDNO Parents_ID Father Mother Etc...
0000001 0000001 FatherofX MotherofX  
0000003 0000002 FatherofZ MotherofZ  

Now is this fine already..?

Better than the first and second approaches but you can still improve things more.

Also... notice that Child_002 disappeared in this scheme.

But I'll end things here for now and we'll continue this series on another post which will include the solution.

I also have to tackle yet the issue I posted above, the issue between Data Normalization and a more usable interface. That too on the continuation of this post.