Home / Database / Introduction

Database Introduction

Databases are involved in almost all facets and activities of our daily lives: from school, to work, to medical care, government, nonprofit organizations, and houses of worship. In today's world, data is ubiquitous (abundant, global, everywhere) and pervasive (unescapable, prevalent, persistent). From birth to death, we generate and consume data, starting with a birth certificate and continuing throughout our lives.

Data is not only ubiquitous and pervasive, it is essential for organizations to survive and prosper. Telecommunications companies, such as Sprint and AT&T, process data on trillions of phone calls with new data being added at speeds up to 70,000 calls per second. Google responds to over 91 million searches per day across several terabytes of data, with results available almost instantly.

What is a Database?

A database is a shared, integrated computer structure that stores a collection of:

End-user data: Raw facts of interest to the end user
Metadata: Data about data, through which the end-user data is integrated and managed

The metadata describes the data characteristics and the set of relationships that links the data found within the database. For example, the metadata component stores information such as the name of each data element, the type of values (numeric, dates, or text) stored on each data element, and whether the data element can be left empty. Because of this, a database is often described as a "collection of self-describing data."

Databases are specialized structures that allow computer-based systems to store, manage, and retrieve data very quickly. Virtually all modern business systems rely on databases.

Data versus Information

To understand what drives database design, you must understand the difference between data and information.

Data consists of raw facts. The word "raw" indicates that the facts have not yet been processed to reveal their meaning. For example, a list of faculty names, departments, and employment types is raw data.

Information is the result of processing raw data to reveal its meaning. Information requires context. For example, an average temperature reading of 105 degrees does not mean much unless you also know its context: Is this in degrees Fahrenheit or Celsius? Is this a machine temperature, a body temperature, or an outside air temperature?

Knowledge is the body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.

Data → Information → Knowledge

Data:    100, 98, 102, 105, 99 (raw numbers)

Information: Average temperature = 100.8 degrees
            (requires context: Fahrenheit, body temperature)

Knowledge: Understanding that 100.8°F indicates a fever
           and requires medical attention

Key Points:

Data constitutes the building blocks of information
Information is produced by processing data
Information is used to reveal the meaning of data
Accurate, relevant, and timely information is the key to good decision making
Good decision making is the key to organizational survival in a global environment

Why Use Databases?

Databases provide several important advantages over traditional file systems:

Improved data sharing: The DBMS helps create an environment in which end users have better access to more and better-managed data
Improved data security: A DBMS provides a framework for better enforcement of data privacy and security policies
Better data integration: Wider access to well-managed data promotes an integrated view of the organization's operations
Minimized data inconsistency: The probability of data inconsistency is greatly reduced in a properly designed database
Improved data access: The DBMS makes it possible to produce quick answers to ad hoc queries
Improved decision making: Better-managed data and improved data access make it possible to generate better-quality information
Increased end-user productivity: The availability of data, combined with tools that transform data into usable information, empowers end users

Types of Databases

There are several types of database systems:

1. Relational Databases (RDBMS)

Data is organized in tables (relations) with rows and columns. Tables can be linked through relationships using keys. Examples include MySQL, PostgreSQL, Oracle, and SQL Server.

2. NoSQL Databases

Non-relational databases designed for specific data models and scaling needs. Types include:

Document databases: Store data as documents (MongoDB)
Key-value stores: Simple data model with key-value pairs (Redis)
Column-family stores: Store data in columns (Cassandra)
Graph databases: Store data as nodes and relationships (Neo4j)

3. Object-Oriented Databases

Store data as objects, similar to object-oriented programming concepts.

4. Hierarchical Databases

Data is organized in a tree-like structure with parent-child relationships.

5. Network Databases

Similar to hierarchical but allows multiple parent-child relationships.

Key Database Concepts

Database Management System (DBMS)

A Database Management System (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. The DBMS serves as the intermediary between the user and the database. It presents the end user with a single, integrated view of the data in the database and hides much of the database's internal complexity.

DBMS Functions

A DBMS performs several important functions that guarantee the integrity and consistency of data:

Data dictionary management: The DBMS stores definitions of data elements and their relationships (metadata) in a data dictionary, providing data abstraction and removing structural and data dependence
Data storage management: The DBMS creates and manages complex structures required for data storage, handling performance tuning and concurrent data access
Data transformation and presentation: The DBMS transforms entered data to conform to required structures and formats data for presentation based on user needs
Security management: The DBMS creates a security system that enforces user security and data privacy through authentication and authorization
Multiuser access control: The DBMS uses sophisticated algorithms to ensure multiple users can access the database concurrently without compromising integrity
Backup and recovery management: The DBMS provides backup and data recovery to ensure data safety and integrity after failures
Data integrity management: The DBMS promotes and enforces integrity rules, minimizing redundancy and maximizing consistency
Database access languages and APIs: The DBMS provides data access through query languages like SQL and application programming interfaces
Database communication interfaces: The DBMS accepts end-user requests via multiple network environments, including web browsers

Tables

Tables are the fundamental storage units in relational databases. A table consists of:

Rows (Records): Individual data entries
Columns (Fields): Attributes that describe the data

Primary Key

A primary key is a unique identifier for each row in a table. It ensures that each record can be uniquely identified and cannot be duplicated.

Foreign Key

A foreign key is a column that references the primary key of another table, establishing relationships between tables.

Problems with File System Data Processing

Understanding file system limitations helps explain why databases are necessary. File systems suffer from several critical problems:

1. Structural and Data Dependence

Structural dependence means that access to a file is dependent on its structure. Adding a field to a file structure requires modifying all programs that use that file. Structural independence exists when you can change the file structure without affecting the application's ability to access the data.

Data dependence means that data representation and manipulation are dependent on physical data storage characteristics. Changing a field from integer to decimal requires changes in all programs that access the file. Data independence exists when you can change data storage characteristics without affecting the program's ability to access the data.

2. Data Redundancy

Data redundancy exists when the same data is stored unnecessarily at different places. This creates:

Poor data security: Multiple copies increase chances for unauthorized access
Data inconsistency: Different versions of the same data appear in different places
Data-entry errors: More likely when complex entries are made in several files
Data integrity problems: Difficulty ensuring data accuracy and validity

3. Data Anomalies

Data redundancy leads to data anomalies - abnormalities in the database:

Update anomalies: Changing a field value requires updates in many places
Insertion anomalies: Adding new data may require adding dummy data or restructure
Deletion anomalies: Deleting data may unintentionally remove other needed data

The Database System Environment

A database system refers to an organization of components that define and regulate the collection, storage, management, and use of data. The database system is composed of five major parts:

Hardware: Physical devices including computers, storage devices, printers, network devices
Software: Operating system software, DBMS software, and application programs/utilities
People: System administrators, database administrators (DBAs), database designers, system analysts/programmers, and end users
Procedures: Instructions and rules that govern the design and use of the database system
Data: The collection of facts stored in the database

Why Database Design is Important

Database design refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data. A database that meets all user requirements does not just happen; its structure must be designed carefully.

Data is one of an organization's most valuable assets. Thorough planning to ensure that data is properly used and leveraged is just as important as proper financial planning. A well-designed database facilitates data management and generates accurate and valuable information. A poorly designed database is likely to become a breeding ground for difficult-to-trace errors that may lead to poor decision making.

Designing appropriate data repositories involves decomposition - breaking integrated data into its constituent parts, with each part stored in its own table. The relationships between these tables must be carefully considered and implemented so the integrated view of the data can be recreated later.

Database Design Process

Designing a database involves several steps:

Requirements Analysis: Understanding what data needs to be stored and how it will be used
Conceptual Design: Creating Entity-Relationship (ER) diagrams to model data relationships
Logical Design: Converting ER diagrams to relational model with tables and relationships
Normalization: Organizing data to reduce redundancy and eliminate anomalies
Physical Design: Implementing the database with specific DBMS, considering performance and storage

Next Steps

Now that you understand the basics of databases, you're ready to learn about Data Models and how different database models organize information.

← Database Home Data Models →