3 Chapter 3 Characteristics and Benefits of a Database
Fred Strickland
Original Author: Adrienne Watt
Rewrite: Fred Strickland
Learning Outcomes
Review and Direction of Chapter 3
Chapter 2 explored the history of databases and examined some of the database models. A brief listing of the advantages and the disadvantages was provided. Chapter 2 ended with a brief introduction to the concept of a database management system. Chapter 3 will go deeper.
Why Use a Database Management System?
A database management system (DBMS) can be used to collect data. If the DBMS is configured as a centralized repository of data, then users can route their data to the DBMS. This arrangement helps to avoid problems such as misplaced data, duplication of data, and accidental loss. The result is that users can use the data to perform various tasks. The following are examples of possible tasks that the collected data could be used for:
- Creating mailing lists
- Writing management reports
- Generating lists of selected news stories
- Identifying various client needs
The actual DBMS contains utilities that can manipulate the stored data:
- Sorting
- Pattern matching
- Aggregating
- Calculating
- Arranging
- Updating
- Deleting
A DBMS could be linked to support the following:
- A website that is capturing registered users
- A client-tracking application for social service organizations
- A medical record system for a health care facility
- A personal address book in your e-mail client
- A collection of word-processed documents
- A system that issues airline reservations
Characteristics and Benefits of a Database
As noted in chapter 2, there are several characteristics that distinguish the database approach from the file-based system or approach. This chapter describes the benefits (and features) of the database system.
Self-Describing Nature of a Database
A database is self-describing since it not only contains the database itself, but also has metadata that defines and describes the data and relationships between the tables in the database. This information is used by the DBMS software and by the database users as needed.
This separation of data and information about the data makes a database totally different from the traditional file-based system in which the data definition is part of the application programs.
Insulation Between Program and Data
In a file-based system, the structure of the data files is defined in the application programs. If a user wants to change the structure of a file, all the programs that access that file might need to be changed as well.
In the database approach, the data structure, the databases tables, the database views, and other vital pieces of information are stored in the system catalogue and not in the programs. This insulation between the programs and data is called program-data independence or simply data independence. In other words, the system data descriptions or data describing data (metadata) are separated from the application programs. This is possible since changes to the data structure are handled by the DBMS and are not embedded in the program itself.
Support for Multiple Views of Data
A database supports multiple views of data. A view is a subset of the database that is defined and dedicated for a particular set of users. Multiple users in the system might have different views. Each view would contain only the data of interest to a user or group of users.
Sharing of Data and Multi-user System
Collecting the organization’s data in one location has many advantages.
- It allows employees and others who have access to the system to share data.
- It gives users the ability to gain insights by viewing data from different entities within the organization.
Current DBMSs are designed for multiple users and DBMSs allow many users to access the same data at the same time. This access is achieved through a feature called concurrency control strategies. These strategies ensure that the data access is always correct and that data integrity is maintained. Data integrity is the assurance that an organization’s data is accurate, complete, and consistent at any point in its lifecycle. This involves safeguarding an organization’s data against loss, leaks, and corrupting influences.
Enforcement of Integrity Constraints
Database constraints or better known as integrity constraints are used to maintain the quality of the data. Integrity constraints ensure that the actions of insertion, of updating, of deleting, and of other actions are done in a way that the actual data is not impacted incorrectly.
There are four types of integrity constraints:
- Domain Constraints
- Entity Integrity Constraints
- Key Constraints
- Referential Integrity Constraints
Student ID |
First Name |
Last Name |
City |
Two-Letter Postal Code |
1 |
James |
Smith |
Vancouver |
BC |
2 |
John |
Johnson |
Grand Falls |
NB |
3 |
Robert |
Williams |
Toronto |
ON |
4 |
Michael |
Brown |
Caribou |
ME |
5 |
William |
Jones |
Presque Isle |
ME |
Figure 3.1 Example of a database table
Domain constraints define the legal values for a column. The following are examples:
- An ID must be a number.
- A student’s grade must be a number.
- Set of two-letter postal codes for the United States
- Set of two-letter postal codes for Canada
Figure 3.1 shows a valid table. The student ID values are only integers (whole numbers). These are drawn from the universe of integers ranging from 1 through the highest support number. The first name, the last name, and the city columns are populated with strings. These could be drawn from the universe of all possible strings. The two-letter postal code values contain only two letters. These could be drawn from the universe of two-letter codes ranging from AA through to ZZ.
Entity integrity constraints pertain to the primary key. A primary key is a unique value for locating a row of data in a database table. Entity integrity constraints prevent a null or empty value or undefined value from being used.
Figure 3.1 shows a valid table. The student ID column does not contain any null entries.
Key constraints pertain to the primary key. The primary key must be unique.
Figure 3.1 shows a valid table. The student ID values are unique for each student. A student ID number is not used again for a different student.
Referential integrity constraints refer to the linkage between two tables. If a table uses a foreign key (a primary key appearing in another table) to reference another table, then that key value must be present in that table.
Advisor ID |
First Name |
Last Name |
Student ID |
1 |
David |
Garcia |
1 |
1 |
David |
Garcia |
3 |
2 |
Richard |
Miller |
2 |
2 |
Richard |
Miller |
4 |
3 |
Charles |
Davis |
5 |
Figure 3.2 Example of a database table with a foreign key
Figure 3.2 shows a valid table. The last column contains the student IDs of students listed in the Figure 3.1 table. If student ID 6 appeared in this table, but not in Figure 3.1 table, then referential integrity has been violated.
Control of Data Redundancy
Ideally, each data item is stored in only one place in the database and thus avoids data redundancy, which is the situation where a piece of data is stored in two or more places. Relational databases use the normalization rules to eliminate data redundancy.
In some cases, data redundancy may be retained in order to improve system performance.
Figure 3.1 has no data redundancy. Figure 3.2 does have data redundancy. See Figure 2.6 for an example that does not have any data redundancy.
Enforcement of Access and Permissions
Not all users of a database system will have the same access and the same privileges. The combination could be done for an individual or for a work group or for a work role.
Access pertains to what can be viewed. A user might have access to inventory data, but not to employee salary data. Another individual might have access to employee salary data, but not to inventory. The human resource group would have access to employees’ personal information. The sales group would have access to sales plans. The person filling the role as a chief executive officer would have access to personnel data, to planning documents, to budgeting documents, and so on. The person filling the role as the network administrator would have greater access.
Permissions pertain to what a person or a group or a role can do. Read-only permission is the ability to read a file but not make changes. Write permission is the ability to make changes to a file or to a resource. Typically, if write permission is being granted, then read permission is being granted at the same time (read and write privileges). There are situations when write permission is granted without read permission. For example, a log file could be write-only and thus prevents users from reading what others have written. This is an interesting topic, but is beyond the scope of this textbook. If you wish to learn more, use a search engine and look for POSIX file permissions or UNIX file permissions or Linux file permissions.
Transaction Processing
A database management system must include concurrency control subsystems. This feature ensures that the data remains consistent and valid during transaction processing even if several users update the same data.
Backup and Recovery Facilities
Backup and recovery are methods that protect the data from loss. The DBMS provides a separate process for backing up and recovering data. This is different and independent of a network backup system.
If a hard drive fails and the database is stored on that hard drive, then it would not be accessible. The only way to recover the database is from a backup.
If a computer system fails in the middle of a complex update process, the recovery subsystem is responsible for making sure that the database is restored to its original state.
Key Terms
access: This pertains to what can be viewed by a user.
concurrency control strategies: Strategies that ensure the data access is always correct and that data integrity is maintained.
data independence: The insulation between the programs and data is also called program-data independence.
data integrity: The assurance that an organization’s data is accurate, complete, and consistent at any point in its lifecycle.
data redundancy: This is the situation where a piece of data is stored in two or more places.
database constraint or integrity constraints: Steps that ensure that the actions of insertion, of updating, of deleting, and of other actions are done in a way that the actual data is not impacted incorrectly.
domain constraint: This defines the legal values for a column.
entity integrity constraint: This pertains to the primary key. Entity integrity constraints prevent a null or empty value from being used.
foreign key: This is a primary key appearing in another table.
key constraint: This pertains to the primary key. The primary key must be unique.
metadata: Defines and describes the data and relationships between the tables in the database.
normalization: Relational databases use a set of rules for eliminating data redundancy.
null: This is the absence of a user-defined value. This is not the same thing as zero.
permission: This pertains to what a person or a group or a role can do.
primary key: This is a unique value for locating a row of data in a database table.
program-data independence: See data independence.
read and write privileges: This is the ability to both read and modify a file.
read-only permission: This is the ability to read a file but not make changes.
referential integrity constraint: This refers to the linkage between two tables. If a table uses a foreign key (a primary key appearing in another table) to reference another table, then that key value must be present in that table.
self-describing: A database is self-describing because it not only contains the database itself, but also metadata that defines and describes the data and relationships between the tables in the database.
system catalogue (system catalog): In the database approach, the data structure, the databases tables, the database views, and other vital pieces of information are stored.
view: A subset of the database that is defined and dedicated for a particular set of users.
write permission: This is the ability to make changes to a file or to a resource.
Exercises
- List three reasons why a database management system should be used.
- Explain what is meant by the phrase that a database is self-describing?
- Explain the concept of data independence and its importance in a database system. (CS2018 IM/DS 6)
- Explain how data can be kept private from some users through access and through views. (CE2016 CE-PPP 9)
- Explain concurrency control strategies.
- Why is it important to have database constraints? (IT2017 ITE-IMA-03c)
- Explain the concept of domain constraints and provide two examples.
- Explain the concept of entity integrity constraints and provide two examples. Be sure to address the concept of a primary key in your answer. (CS2013 IM/RD 2 and IT2017 ITE-IMA-05d)
- Explain the concept of referential integrity constraints. Be sure to address the concept of a foreign key in your answer. Provide two examples. (CS2013 IM/RD 2 and IT2017 ITE-IMA-05d)
- Explain data redundancy.
- Explain access and permissions.
- Explain how database backups and database recoveries can provide database security. (IT2017 ITE-IMA-07d)
- What is the importance of metadata in a database environment? (IT2017 ITE-IMA-07e)
A Running Project
A running project was introduced in Chapter 2. Review what you have collected. Have you overlooked anything?
If you are doing this as a service project or as an internship, continue to chat with the organization. If possible, chat with others besides the main contact person.
Use a word processor to create tables with columns for the entity pieces. DO NOT USE SQL COMMANDS FROM A DATABASE MANAGEMENT SYSTEM! It is too early to think about doing this.
Attribution
This chapter of Database Design is a derivative copy of Database System Concepts by Nguyen Kim Anh licensed under Creative Commons Attribution License 3.0 license
The following material was written by Adrienne Watt for the second edition:
- Introduction
- Key Terms
- Exercises
The whole chapter was completely revised by Fred Strickland for the third edition.
References
“DBMS Integrity Constraints,” Geeks for Geeks, February 28, 2024. https://www.geeksforgeeks.org/dbms-integrity-constraints/
“NULL values in SQL,” Geeks for Geeks, April 24, 2023. https://www.geeksforgeeks.org/sql-null-values/
“Write access without read access,” Unix & Linux, n.d. https://unix.stackexchange.com/questions/22577/write-access-without-read-access
Mayank Dham. “Types of Constraints in DBMS,” PrepBytes Blog, July 21, 2023. https://www.prepbytes.com/blog/dbms/constraints-in-dbms-and-types/
Falguni Thakker. “SQL Domain Constraints (NOT NULL, Check, UNIQUE),” GoLinuxCloud, November 4, 2022. https://www.golinuxcloud.com/sql-domain-constraints/#google_vignette
- This contains example SQL commands.
“What is the most common last name in the United States?” Mongabay, n.d. https://names.mongabay.com/data/1000.html
“300 Most Common Male Names in the U.S.” Mongabay, November 20, 2005. https://names.mongabay.com/male_names.htm