Relational Database Design & Normalization

Relational Database Design

Whenever you are working with a relational database management system (RDBMS), the first step in creating and using a database is to establish the database’s structure (also called the database schema). Database design, aka data modeling, is crucial for successful long-term management of information. Using a process called normalization, you carefully eliminate redundancies and other problems that will undermine the integrity of your database.

Database Normalization

Normalization was developed by an IBM researcher named E.F. Codd in the early 1970s (he also invented the relational database). A relational database is merely a collection of data, organized in a particular manner, and Dr. Codd created a series of rules called normal forms that help define that organization. There are many forms for database normalization however the first three of the normal forms are sufficient for most database designs. Before you begin normalizing your database, you must define the role of the application being developed. Whether it means that you thoroughly discuss the subject with a client or figure it out for yourself, understanding how the information will be accessed dictates the modeling. Thus, this process will require paper and pen rather than the database software.

One of the best ways to determine what information should be stored in a database is to think about what questions will be asked of the database and what data would be included in the answers.

Keys

Keys are integral part of normalized databases. There are two types of keys: primary and foreign. A primary key is a unique identifier that has to
abide by certain rules. They must

Always have a value (they cannot be NULL)
Have a value that remains the same (never changes)
Have a unique value for each record in a table

The best real-world example of a primary key is the U.S. Social Security number: each individual has a unique Social Security number, and that number never changes. Just as the Social Security number is an artificial construct used to identify people, you’ll frequently find creating an arbitrary primary key for each table to be the best design practice.

The second type of key is a foreign key. Foreign keys are the representation in Table B of the primary key from Table A. If you have a shopping database with a customer table and a products table, the primary key from customers would be linked as a foreign key in products. Forgeing keys help to maintain relations between database tables.

Note: As a rule of thumb, it is better to name the primary keys using at least part of the table’s name (e.g. customer) and the word ’id’ to name it customer_id. Some database developers like to add the abbreviation ’pk’ to the name as well. Some RDBMS allows for only one primary key per table such as MySQL. Ideally, your primary key should always be an integer, which results in better performance.