"

4 Chapter 4 Data Modelling (Data Modeling)

Fred Strickland

Original Chapter 4 Author: Adrienne Watt and Nelson Eng

Original Chapter 5 Author: Adrienne Watt

Rewrite: Fred Strickland

Learning Outcomes

Computing Sub Discipline

Document Code, Reference Code, and Page Number

Text

Computer Science

CS2013

IM/Relational Databases (RD)

(Page 115-116)

1. Prepare a relational schema from a conceptual model developed using the entity- relationship model. [Usage]

CS2023

DM Modeling: Data Modeling

(Pages 116-117)

CS Core 1. Data modeling

CS Core 2. Relational data model

KA Core 3. Conceptual models (e.g., entity-relationship, UML diagrams).

ILO KA Core 4. Model a given environment using a conceptual data model.

CS2023

DM-NoSQL: NoSQL Systems

(Pages 120-121)

KA Core 1. Why NoSQL? (e.g., Impedance mismatch between Application [CRUD] and RDBMS)

ILO KA Core 1. Develop a use case for the use of NoSQL over RDBMS

Information Systems

IS2020

A3.2 Data / Information Competency Realm

(Page 101)

A3.2.1 Competency Area – Data / Information Management

(Page 101)

2. Design relational databases (Page 103)

Information Technology

IT2017

ITE-IMA-03 Data modeling

(Page 92)

b. Describe the relationship between a logical model and a physical model.

Introduction to Chapter 4

The second edition Chapter 4 was a short chapter that explored at a high level the types of data models. The second edition Chapter 5 explored data modelling. For the third edition, these two chapters have been combined.

Why Use a Model or a Framework?

We humans need something to help us to see the big picture. We could use a model or we could use a framework.

A model is a simplified representation of a system or of a process. It is a tool for breaking the complex into smaller and more manageable parts. Models can be used to help us to understand how something works. It could help us to predict how something might behavior in the future. A good example would be the Open Systems Interconnection (OSI) model, which is used to explain how computers communicate over a network.

A framework is a more structured approach for solving a problem. The framework may have a set of guidelines or rules for completing a task. Software engineers have the waterfall approach and the numerous agile approaches. The United States National Institute of Standards and Technology (NIST) has a cybersecurity framework that industry, government, and other organizations may use for reducing their cybersecurity risks. (See https://nvlpubs.nist.gov/nistpubs/CSWP/NIST.CSWP.29.pdf)

A data model provides a tool for communicating among stakeholders. It helps to understand what is important, what are the business rules, and what is the approach for implementing these items. Reviewing the models will reveal missing pieces and will help to correct errors.

There are trade-offs for using data models. It takes a long time. As a result, there is the temptation or push to jump from one model to the next model. This is especially true for the concept model. And there is the temptation of adding pieces early to the diagrams. Experts fall into this trap by adding line notations. In the concept data model figures, ignore the extra markings.

What Are the Models Used for Creating Databases?

In Chapter 2, we explored six database models. In this chapter, we will look at the data models for going from a concept to a fully working database management system (DBMS). We will look at the concept data model, the logical data model, and the physical data model.

The Concept Data Model — The “What” Model

We are at a high level. We are collecting data. Our great interest should be on the nouns and later the verbs. Look at the “Running Project” at the end of Chapter 2. There is a list of entities or objects.

Once you have a good understanding of the nouns, then you can start to look at the relationship between the objects. Figure 4.1 shows a conceptual diagram that contains travel routes with associated schedules and airlines.

image

Figure 4.1 An example of a conceptual data model diagram. Source of image: https://www.couchbase.com/blog/conceptual-physical-logical-data-models/

Notice that we are not considering the data types. We are not considering how the data would be stored. We are not thinking about relationships. (Ignore the extra lines at the “Airline” box and at the “Schedule” box. We will address this topic elsewhere.) At this point, you could be thinking about a pile of papers on your desk or about a spreadsheet or about a database. At this point, you could be brainstorming on a whiteboard!

image

Figure 4.2 A large whiteboard. Source of image: https://www.magnatag.com/img/products/WW/WWcover1170-2.png

Do spend a good amount of time in this model. Once the stakeholders have agreed that everything has been identified, then you can advance to the next model.

Consider a database that has data on a school and on students. We will be focusing on entities and in time the relationships. In the relational DBMS approach, the term “Entity-Relationship Diagrams” (ERDs) is used.

We are interested in identifying the entities and the relationships. In an educational institution, we are looking at students, courses, instructors, and departments. For the relationships, we are looking at the following:

  • Students enroll in courses.
  • Instructors teach courses.
  • Departments management courses.

Figure 4.3 illustrates the entities and the relationships.

Students

Courses

Instructors

Departments

Students take courses.

Instructors teach courses.

Departments manage courses.

Figure 4.3 Entities and relationships.

We could add some details. For the school, we would need to track the name and a description. For the student, we would need to track the name and the date of birth. Figure 4.4 shows the two entities. (Ignore the extra lines at the “School” box and at the “Student” box. We will address this topic elsewhere.) However, the temptation is to jump to the logical model too early.

image

Figure 4.4 The Conceptual Model Source of image: https://guides.visual-paradigm.com/navigating-the-three-levels-of-database-design-conceptual-logical-and-physical/

The Logical Data Model — The “How of The Details” Model

We are adding more details to the conceptual model. You would look at the business rules and at the constraints. Look at the section entitled “Enforcement of Integrity Constraints” in Chapter 3.

You would define the data types, the data sizes, the lengths, and other details. We are not ready to select a DBMS. So use the generic term of “string” instead of “varchar” or “nvarchar.”

A Logical Model for a Future Relational Database

A Logical Model for a Future Document Database

image

image

Figure 4.5 Two examples of a logical model. Source of image: https://www.couchbase.com/blog/conceptual-physical-logical-data-models/

It is a bit of the cart-before-the-horse situation. In the pure database work, you should not be thinking about the actual DBMS. But in order to draw the logical model, you need to have some foreknowledge into the nature of the future DBMS.

In the right-hand side of Figure 4.5, the Schedule table has a fairly small and finite footprint. So it makes sense to embed it into the same box or collection. This is bounding the Schedule table to the Route table.

In the left-hand side of Figure 4.5, the arrangement is slightly different.

For the educational institution example, we will use Figure 4.6 to illustrate the additional of the details.

Students

Courses

Instructors

Departments

StudentID

CourseID

InstructorID

DepartmentID

FirstName

CourseName

FirstName

DepartmentName

LastName

Credits

LastName

DateOfBirth

Students take courses.

Instructors teach courses.

Departments manage courses.

Figure 4.6 Adding details to the entities.

Once done, we need to think about the data types. We need an identification number (ID) as a number, the name fields as a string, and the date of birth as a date. Figure 4.7 will illustrate this addition.

Students

Courses

Instructors

Departments

StudentID

A number

CourseID

A number

InstructorID

A number

DepartmentID

A number

FirstName

A string

CourseName

A string

FirstName

A string

DepartmentName

A string

LastName

A string

Credits

A number

LastName

A string

DateOfBirth

A date

Students take courses.

Instructors teach courses.

Departments manage courses.

Figure 4.7 Adding details to the entities.

Here, we have the cart-before-the-horse situation. We know that the DBMS will be a relational DBMS. With this foreknowledge, we can state that the IDs will be integers (a whole number). We can state that the name fields will be varchar(255). We can state that the date of birth field will be a date data type. Figure 4.8 will illustrate this addition.

Students

Courses

Instructors

Departments

StudentID

integer

CourseID

integer

InstructorID

integer

DepartmentID

integer

FirstName

varchar(255)

CourseName

varchar(255)

FirstName

varchar(255)

DepartmentName

varchar(255)

LastName

varchar(255)

Credits

Integer

LastName

varchar(255)

DateOfBirth

date

Students take courses.

Instructors teach courses.

Departments manage courses.

Figure 4.8 Revising the data types.

Also with this foreknowledge, we can start to think about relationships. This could result in the need to add more fields to a table. Imagine that all students must be assigned to a department. Figure 4.9 will illustrate this addition.

Students

Courses

Instructors

Departments

StudentID

integer

CourseID

integer

InstructorID

integer

DepartmentID

integer

FirstName

varchar(255)

CourseName

varchar(255)

FirstName

varchar(255)

DepartmentName

varchar(255)

LastName

varchar(255)

Credits

Integer

LastName

varchar(255)

DateOfBirth

date

DepartmentID

Students take courses.

Instructors teach courses.

Departments manage courses.

Figure 4.9 Adding a connecting field to the Student table.

The StudentID, the CourseID, the InstructorID, and the DepartmentID are known as primary keys. A primary key is unique and cannot be null, because it takes us to one row and only one row. The DepartmentID in the Student table is a foreign key, because it points back to the Department table.

Figure 4.10 is from database effort where the user wishes to track schools and students. Figure 4.10 illustrates the use of “SchoolID” in the second field in the Student table.

image

Figure 4.10 The Logical Model Source of image: https://guides.visual-paradigm.com/navigating-the-three-levels-of-database-design-conceptual-logical-and-physical/

Also, we need to look at constraints and at business rules. Students may be limited to taking no more than five courses in a semester. An instructor may be limited to teaching no more than four courses.

Are we done? No. Does your model cover the following?

  • How to document an instructor teaching more than one session of a course?
  • How to document the room assignment for a course?
  • How to document the days and time for a course?
  • How to document grades assigned to a student?

As you think through the issues, you may need to add more fields and to add more tables. There may be a need to add special tables that connect two tables.

The Physical Data Model — The “How of The Implementation” Model

Once we have enough details worked out on the logical data model, then it is time to actually implement it into a real DBMS. Which implementation makes sense?

If your modeling process reveals that your data model is likely to change frequently to adapt to new requirements, you might consider going with a document database. This might be a “NoSQL document database.

Other DBMS schemes have been covered in earlier chapters. You may review those for insights for choosing one of those.

If you decided on a relational model, then there are many options such as:

  • Microsoft SQL Server Management Studio
  • Oracle
  • PostgreSQL
  • MySQL
  • Amazon Relational Database Service
  • Azure SQL Database.

The physical data model may include:

  • The selected DBMS.
  • How and where the data would be stored.
  • Details on how the data would be replicated.

The work of creating the database may be done by the database administrator or by the database developer.

A Few Words

This chapter introduced one approach for creating a database. The second edition used a different approach. It had the following:

  • External model
  • Conceptual model
  • Internal model
  • Physical model

The point of the approach is to capture the details. It does not matter whether one uses the three-model approached presented in this chapter or the four-model approach that was used in the second edition.

The rest of this textbook will go deeper into the relational DBMS design. Do not skip those chapters.

 

Key Terms

data type: This is how data is defined. It could be an integer, a string, a date, or something else. There are differences from DBMS to DBMS.

data models: Tools for going from a concept to a fully working database management system.

entities: The objects or the nouns that make up a database.

Entity-Relationship Diagram (ERD): This is used in relational DBMS work for modeling data.

framework: This is a more structured approach for solving a problem. The framework may have a set of guidelines or rules for completing a task.

integer: A whole number.

model: This is a simplified representation of a system or of a process. It is a tool for breaking the complex into smaller and more manageable parts.

foreign key: This is a primary key appearing in another table. (This was covered in Chapter 3.)

null: This is the absence of a user-defined value. This is not the same thing as zero. (This was covered in Chapter 3.)

primary key: This is a unique value for locating a row of data in a database table. (This was covered in Chapter 3.)

Exercises

  1. Why do we use models when working on a database?
  2. What is a data model?
  3. What is a high-level conceptual data model?
  4. You are creating a conceptual data model for an animal hospital. What are three possible entities?
  5. For question 4, what are some possible fields associated with the three entities?
  6. What are the possible relationships between the three entities?
  7. Using your answers to previous questions, create a logical model. (CS2013 IM/RD 1 and IS2020 A3.2.1.2)
  8. Describe the relationship between a logical model and a physical model. (IT2017 ITE-IMA-03b)
  9. Which model does the database administrator work with?
  10. As noted earlier in this chapter, models help to uncover errors. Do you see an error in Figure 4.11?

image

Figure 4.11 The Physical Model Source of image: https://guides.visual-paradigm.com/navigating-the-three-levels-of-database-design-conceptual-logical-and-physical/

A Running Project

A running project was introduced in Chapter 2. Review what you have collected. Have you overlooked anything?

If you are doing this as a service project or as an internship, continue to chat with the organization. If possible, chat with others besides the main contact person.

In Chapter 3, you were told to use a word processor to create tables with columns for the entity pieces. With the insights from this chapter, add the data types to each entity piece.

 

Attribution

This chapter of Database Design is a derivative copy of Database System Concepts by Nguyen Kim Anh licensed under Creative Commons Attribution License 3.0 license

Information from second edition Chapter 4 was not used. Question 1 and 2 were used and appear as questions 2 and 3.

The following material was written by Adrienne Watt for the second edition:

  • Introduction
  • Key Terms
  • Exercises

The whole chapter was completely revised by Fred Strickland for the third edition.

References

Matthew Groves. “Data Modeling Explained: Conceptual, Physical, Logical,” Couchbase, October 7, 2022. https://www.couchbase.com/blog/conceptual-physical-logical-data-models/

Shawn Manaher. “Model vs Framework: When And How Can You Use Each One? The Content Authority, n.d. https://thecontentauthority.com/blog/model-vs-framework

License

Database Design - 3rd Edition Copyright © by Fred Strickland. All Rights Reserved.

Share This Book