Database Introduction
Databases are involved in almost all facets and activities of our daily lives: from school, to work, to medical care, government, nonprofit organizations, and houses of worship. In today's world, data is ubiquitous (abundant, global, everywhere) and pervasive (unescapable, prevalent, persistent). From birth to death, we generate and consume data, starting with a birth certificate and continuing throughout our lives.
Data is not only ubiquitous and pervasive, it is essential for organizations to survive and prosper. Telecommunications companies, such as Sprint and AT&T, process data on trillions of phone calls with new data being added at speeds up to 70,000 calls per second. Google responds to over 91 million searches per day across several terabytes of data, with results available almost instantly.
What is a Database?
A database is a shared, integrated computer structure that stores a collection of:
- End-user data: Raw facts of interest to the end user
- Metadata: Data about data, through which the end-user data is integrated and managed
The metadata describes the data characteristics and the set of relationships that links the data found within the database. For example, the metadata component stores information such as the name of each data element, the type of values (numeric, dates, or text) stored on each data element, and whether the data element can be left empty. Because of this, a database is often described as a "collection of self-describing data."
Databases are specialized structures that allow computer-based systems to store, manage, and retrieve data very quickly. Virtually all modern business systems rely on databases.
Data versus Information
To understand what drives database design, you must understand the difference between data and information.
Data consists of raw facts. The word "raw" indicates that the facts have not yet been processed to reveal their meaning. For example, a list of faculty names, departments, and employment types is raw data.
Information is the result of processing raw data to reveal its meaning. Information requires context. For example, an average temperature reading of 105 degrees does not mean much unless you also know its context: Is this in degrees Fahrenheit or Celsius? Is this a machine temperature, a body temperature, or an outside air temperature?
Knowledge is the body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.
Data: 100, 98, 102, 105, 99 (raw numbers)
Information: Average temperature = 100.8 degrees
(requires context: Fahrenheit, body temperature)
Knowledge: Understanding that 100.8°F indicates a fever
and requires medical attention
Key Points:
- Data constitutes the building blocks of information
- Information is produced by processing data
- Information is used to reveal the meaning of data
- Accurate, relevant, and timely information is the key to good decision making
- Good decision making is the key to organizational survival in a global environment
Why Use Databases?
Databases provide several important advantages over traditional file systems:
- Improved data sharing: The DBMS helps create an environment in which end users have better access to more and better-managed data
- Improved data security: A DBMS provides a framework for better enforcement of data privacy and security policies
- Better data integration: Wider access to well-managed data promotes an integrated view of the organization's operations
- Minimized data inconsistency: The probability of data inconsistency is greatly reduced in a properly designed database
- Improved data access: The DBMS makes it possible to produce quick answers to ad hoc queries
- Improved decision making: Better-managed data and improved data access make it possible to generate better-quality information
- Increased end-user productivity: The availability of data, combined with tools that transform data into usable information, empowers end users
Types of Databases
There are several types of database systems:
1. Relational Databases (RDBMS)
Data is organized in tables (relations) with rows and columns. Tables can be linked through relationships using keys. Examples include MySQL, PostgreSQL, Oracle, and SQL Server.
2. NoSQL Databases
Non-relational databases designed for specific data models and scaling needs. Types include:
- Document databases: Store data as documents (MongoDB)
- Key-value stores: Simple data model with key-value pairs (Redis)
- Column-family stores: Store data in columns (Cassandra)
- Graph databases: Store data as nodes and relationships (Neo4j)
3. Object-Oriented Databases
Store data as objects, similar to object-oriented programming concepts.
4. Hierarchical Databases
Data is organized in a tree-like structure with parent-child relationships.
5. Network Databases
Similar to hierarchical but allows multiple parent-child relationships.
Key Database Concepts
Database Management System (DBMS)
A Database Management System (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. The DBMS serves as the intermediary between the user and the database. It presents the end user with a single, integrated view of the data in the database and hides much of the database's internal complexity.
DBMS Functions
A DBMS performs several important functions that guarantee the integrity and consistency of data:
- Data dictionary management: The DBMS stores definitions of data elements and their relationships (metadata) in a data dictionary, providing data abstraction and removing structural and data dependence
- Data storage management: The DBMS creates and manages complex structures required for data storage, handling performance tuning and concurrent data access
- Data transformation and presentation: The DBMS transforms entered data to conform to required structures and formats data for presentation based on user needs
- Security management: The DBMS creates a security system that enforces user security and data privacy through authentication and authorization
- Multiuser access control: The DBMS uses sophisticated algorithms to ensure multiple users can access the database concurrently without compromising integrity
- Backup and recovery management: The DBMS provides backup and data recovery to ensure data safety and integrity after failures
- Data integrity management: The DBMS promotes and enforces integrity rules, minimizing redundancy and maximizing consistency
- Database access languages and APIs: The DBMS provides data access through query languages like SQL and application programming interfaces
- Database communication interfaces: The DBMS accepts end-user requests via multiple network environments, including web browsers
Tables
Tables are the fundamental storage units in relational databases. A table consists of:
- Rows (Records): Individual data entries
- Columns (Fields): Attributes that describe the data
Primary Key
A primary key is a unique identifier for each row in a table. It ensures that each record can be uniquely identified and cannot be duplicated.
Foreign Key
A foreign key is a column that references the primary key of another table, establishing relationships between tables.
Problems with File System Data Processing
Understanding file system limitations helps explain why databases are necessary. File systems suffer from several critical problems:
1. Structural and Data Dependence
Structural dependence means that access to a file is dependent on its structure. Adding a field to a file structure requires modifying all programs that use that file. Structural independence exists when you can change the file structure without affecting the application's ability to access the data.
Data dependence means that data representation and manipulation are dependent on physical data storage characteristics. Changing a field from integer to decimal requires changes in all programs that access the file. Data independence exists when you can change data storage characteristics without affecting the program's ability to access the data.
2. Data Redundancy
Data redundancy exists when the same data is stored unnecessarily at different places. This creates:
- Poor data security: Multiple copies increase chances for unauthorized access
- Data inconsistency: Different versions of the same data appear in different places
- Data-entry errors: More likely when complex entries are made in several files
- Data integrity problems: Difficulty ensuring data accuracy and validity
3. Data Anomalies
Data redundancy leads to data anomalies - abnormalities in the database:
- Update anomalies: Changing a field value requires updates in many places
- Insertion anomalies: Adding new data may require adding dummy data or restructure
- Deletion anomalies: Deleting data may unintentionally remove other needed data
The Database System Environment
A database system refers to an organization of components that define and regulate the collection, storage, management, and use of data. The database system is composed of five major parts:
- Hardware: Physical devices including computers, storage devices, printers, network devices
- Software: Operating system software, DBMS software, and application programs/utilities
- People: System administrators, database administrators (DBAs), database designers, system analysts/programmers, and end users
- Procedures: Instructions and rules that govern the design and use of the database system
- Data: The collection of facts stored in the database
Why Database Design is Important
Database design refers to the activities that focus on the design of the database structure that will be used to store and manage end-user data. A database that meets all user requirements does not just happen; its structure must be designed carefully.
Data is one of an organization's most valuable assets. Thorough planning to ensure that data is properly used and leveraged is just as important as proper financial planning. A well-designed database facilitates data management and generates accurate and valuable information. A poorly designed database is likely to become a breeding ground for difficult-to-trace errors that may lead to poor decision making.
Designing appropriate data repositories involves decomposition - breaking integrated data into its constituent parts, with each part stored in its own table. The relationships between these tables must be carefully considered and implemented so the integrated view of the data can be recreated later.
Database Design Process
Designing a database involves several steps:
- Requirements Analysis: Understanding what data needs to be stored and how it will be used
- Conceptual Design: Creating Entity-Relationship (ER) diagrams to model data relationships
- Logical Design: Converting ER diagrams to relational model with tables and relationships
- Normalization: Organizing data to reduce redundancy and eliminate anomalies
- Physical Design: Implementing the database with specific DBMS, considering performance and storage
Next Steps
Now that you understand the basics of databases, you're ready to learn about Data Models and how different database models organize information.