Database Introduction

Databases are involved in almost all facets and activities of our daily lives: from school, to work, to medical care, government, nonprofit organizations, and houses of worship. In today's world, data is ubiquitous (abundant, global, everywhere) and pervasive (unescapable, prevalent, persistent). From birth to death, we generate and consume data, starting with a birth certificate and continuing throughout our lives.

Data is not only ubiquitous and pervasive, it is essential for organizations to survive and prosper. Telecommunications companies, such as Sprint and AT&T, process data on trillions of phone calls with new data being added at speeds up to 70,000 calls per second. Google responds to over 91 million searches per day across several terabytes of data, with results available almost instantly.

What is a Database?

A database is a shared, integrated computer structure that stores a collection of:

The metadata describes the data characteristics and the set of relationships that links the data found within the database. For example, the metadata component stores information such as the name of each data element, the type of values (numeric, dates, or text) stored on each data element, and whether the data element can be left empty. Because of this, a database is often described as a "collection of self-describing data."

Databases are specialized structures that allow computer-based systems to store, manage, and retrieve data very quickly. Virtually all modern business systems rely on databases.

Data versus Information

To understand what drives database design, you must understand the difference between data and information.

Data consists of raw facts. The word "raw" indicates that the facts have not yet been processed to reveal their meaning. For example, a list of faculty names, departments, and employment types is raw data.

Information is the result of processing raw data to reveal its meaning. Information requires context. For example, an average temperature reading of 105 degrees does not mean much unless you also know its context: Is this in degrees Fahrenheit or Celsius? Is this a machine temperature, a body temperature, or an outside air temperature?

Knowledge is the body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.

Data → Information → Knowledge
Data:    100, 98, 102, 105, 99 (raw numbers)

Information: Average temperature = 100.8 degrees
            (requires context: Fahrenheit, body temperature)

Knowledge: Understanding that 100.8°F indicates a fever
           and requires medical attention

Key Points:

Why Use Databases?

Databases provide several important advantages over traditional file systems:

Types of Databases

There are several types of database systems:

1. Relational Databases (RDBMS)

Data is organized in tables (relations) with rows and columns. Tables can be linked through relationships using keys. Examples include MySQL, PostgreSQL, Oracle, and SQL Server.

2. NoSQL Databases

Non-relational databases designed for specific data models and scaling needs. Types include:

3. Object-Oriented Databases

Store data as objects, similar to object-oriented programming concepts.

4. Hierarchical Databases

Data is organized in a tree-like structure with parent-child relationships.

5. Network Databases

Similar to hierarchical but allows multiple parent-child relationships.

Key Database Concepts

Database Management System (DBMS)

A Database Management System (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. The DBMS serves as the intermediary between the user and the database. It presents the end user with a single, integrated view of the data in the database and hides much of the database's internal complexity.

DBMS Functions

A DBMS performs several important functions that guarantee the integrity and consistency of data:

Tables

Tables are the fundamental storage units in relational databases. A table consists of:

Primary Key

A primary key is a unique identifier for each row in a table. It ensures that each record can be uniquely identified and cannot be duplicated.

Foreign Key

A foreign key is a column that references the primary key of another table, establishing relationships between tables.

Problems with File System Data Processing

Understanding file system limitations helps explain why databases are necessary. File systems suffer from several critical problems:

1. Structural and Data Dependence

Structural dependence means that access to a file is dependent on its structure. Adding a field to a file structure requires modifying all programs that use that file. Structural independence exists when you can change the file structure without affecting the application's ability to access the data.

Data dependence means that data representation and manipulation are dependent on physical data storage characteristics. Changing a field from integer to decimal requires changes in all programs that access the file. Data independence exists when you can change data storage characteristics without affecting the program's ability to access the data.

2. Data Redundancy

Data redundancy exists when the same data is stored unnecessarily at different places. This creates:

3. Data Anomalies

Data redundancy leads to data anomalies - abnormalities in the database:

The Database System Environment

A database system refers to an organization of components that define and regulate the collection, storage, management, and use of data. The database system is composed of five major parts:

  1. Hardware: Physical devices including computers, storage devices, printers, network devices
  2. Software: Operating system software, DBMS software, and application programs/utilities
  3. People: System administrators, database administrators (DBAs), database designers, system analysts/programmers, and end users
  4. Procedures: Instructions and rules that govern the design and use of the database system
  5. Data: The collection of facts stored in the database

Why Database Design is Important

Database design refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data. A database that meets all user requirements does not just happen; its structure must be designed carefully.

Data is one of an organization's most valuable assets. Thorough planning to ensure that data is properly used and leveraged is just as important as proper financial planning. A well-designed database facilitates data management and generates accurate and valuable information. A poorly designed database is likely to become a breeding ground for difficult-to-trace errors that may lead to poor decision making.

Designing appropriate data repositories involves decomposition - breaking integrated data into its constituent parts, with each part stored in its own table. The relationships between these tables must be carefully considered and implemented so the integrated view of the data can be recreated later.

Database Design Process

Designing a database involves several steps:

  1. Requirements Analysis: Understanding what data needs to be stored and how it will be used
  2. Conceptual Design: Creating Entity-Relationship (ER) diagrams to model data relationships
  3. Logical Design: Converting ER diagrams to relational model with tables and relationships
  4. Normalization: Organizing data to reduce redundancy and eliminate anomalies
  5. Physical Design: Implementing the database with specific DBMS, considering performance and storage

Next Steps

Now that you understand the basics of databases, you're ready to learn about Data Models and how different database models organize information.