11 Chapter 11 Under the Hood of the SQL Server
Fred Strickland
Original Material to the textbook: Fred Strickland
Learning Outcomes
Computing Sub Discipline |
Document Code, Reference Code, and Page Number |
Text |
Computer Engineering |
CE2016 CE-SWD-10 Database systems (Page 103) |
Describe the major components of a modern database system. |
Computer Science |
CS2013 IM/Database Systems (Pages 113-114) |
2. Describe the most common designs for core database system components including the query optimizer, query executor, storage manager, access methods, and transaction processor. [Familiarity] 4. Describe the components of a database system and give examples of their use. [Familiarity] 5. Identify major DBMS functions and describe their role in a database system. [Familiarity] |
CS2023 DM-Core: Core Database Systems Concepts (Pages 115-116)
|
CS Core 2. Components of database systems. CS Core 3. Design of Core DBMS functions (e.g., query mechanisms, transaction management, buffer management, access methods) CS Core 4. Database architecture, data independence, and data abstraction CS Core 5. Transaction management ILO CS Core 2. Enumerate the components of a (relational) database system. ILO CS Core 3. Follow a query as it is processed by the components of a (relational) database system. ILO CS Core 6. Enumerate the four properties of a correct transaction manager. |
|
CS2023 DM-Internals: DBMS Internals (Pages 119-120) |
KA Core 1. DB Buffer Management. KA Core 2. Transaction Management KA Core 2a. Isolation Levels KA Core 2b. ACID KA Core 2c. Serializability KA Core 2d. Distributed Transactions KA Core 3. Concurrency Control KA Core 3a. 2-Phase Locking KA Core 3b. Deadlocks handling strategies KA Core 4. Recovery Manager KA Core 4a. Relation with Buffer Manager Non-core 5. Concurrency Control Non-core 5a. Optimistic concurrency control Non-core 5b. Timestamp concurrency control Non-core 6. Recovery Manager Non-core 6a. Write-Ahead logging
KA Core 1. Describe how a DBMS manages its Buffer Pool. KA Core 2. Describe the four properties for a correct transaction manager. |
|
Introduction to Chapter 11
We have been looking at relational DBMS from a hands-on or application viewpoint. And we have touched upon the different DBMSs. We have discovered that SQL-92 is not fully implemented by these DBMSs and that there are extensions.
In Chapter 10, we looked at a common way of expressing a query. We looked at some depth into the expressions of relational algebra and relational calculus.
In this chapter, we will look under the hood. You would think that the ACM would require information systems (IS) workers to have a passing understanding of this topic. And for sure, you would think that the ACM would require information technology (IT) workers to have a deeper understanding of the supported DBMS tool. That is not the case. The ACM requires IS workers to understand the design models, the programming interface to a DBMS, the security, and a few other items. The ACM requires IT workers to understand information management, the data modeling, the query languages, the managing of the database environment, and a few other items. Instead, the ACM requires computer engineers and computer scientists to have a deeper understanding of the inner workings of a DBMS.
So this chapter will examine what makes the Microsoft SQL Server work. The concepts apply in some degree to other DBMS. The coverage will be at a high level.
Chapter 12 will explore how to protect the DBMS.
What are the DBMS Hidden Components?
Microsoft SQL Server and perhaps other DBMS will have the following components:
- Query optimizer
- Query executor
- Storage manager
- Access methods
- Transaction processor
Figure 11.1 shows how Microsoft built the SQL Server.
Figure 11.1 Block diagram of the Microsoft SQL Server. Source of image: https://www.guru99.com/sql-server-architecture.html
Oracle’s documentation shows the Oracle Database Server architecture as a series of illustrations. Figure 11.2 shows the highest-level view.
Figure 11.2 The highest-level view of the Oracle Database Server architecture. Source of image: https://www.oracletutorial.com/oracle-administration/oracle-database-architecture/
A functioning DBMS needs to provide support for
- Access controls
- Privilege levels
- Security principles
- Facilities for sequence of data
- Storing and processing large volumes of data
For this chapter, we will briefly look at access controls, privilege levels, security principles, and the means for sequencing data. We will not look at the processing of large volumes of data. Chapter 12 will go into greater detail concerning access control and protecting the DBMS.
What is Access Control?
Access control is a method for restricting access to certain resources by certain individuals. Physical access controls are physical means such as fences, locked doors, and guards. Figure 11.3 shows such an example whereby a device is used to open a door.
Figure 11.3 An example of a physical device for gaining access to a room. Source of image: https://landing.verkada.com/access-demo
We are interested in access controls that are provided via software. These are known as logical access controls and these support letting certain individuals or groups to gain access to sensitive information. A working access control system will help to prevent unauthorized persons1 from accessing the data.
There are two parts to this function2:
- Authentication
- A way to verify the identity of a person.
- Authorization
- A way of determining whether or not that a person should be allowed to access the data or to make a change.
Authentication can be provided via the following mechanisms:
- Two-factor authentication
- Multi-factor authentication
- one-time password
- Three-factor authentication
- Biometrics3
- Hard Tokens
- Soft Tokens
- Contextual Authentication
- Device identification
We will not go further with authentication, because this topic is beyond the scope of this chapter. Our interest is in authorization to an organization’s resources.
There are different ways of providing access control services. These will be addressed in Chapter 12.
The Microsoft SQL Server DBMS Hidden Components
The SQL Server architecture has three major components with at least two subcomponents:
- Protocol Layer
- The SNI
- The TDS
- Relational Engine
- CMD Parser
- Optimizer
- Query Executor
- Storage Engine
- Transaction Manager
- Access Methods
- Buffer Manager
These components and subcomponents may interact with other components and subcomponents.
- The Protocol Layer uses the TDS to interact with the SQL Server Network Interface.
- The Storage Engine component interacts with the Transaction Log, the Data File, and the Plan Cache.
Contrasting with the Oracle Database Architecture
The Oracle Database architecture consists of at least one database instance and one database. The database instance handles the memory and the processes. In some fashion, this database instance connects to the database data files and to the database system files. The Oracle document drills down into the subcomponents of the database instance.
The database instance has the System Global Area, the Program Global Areas, the Background Processes and the Server Processes. See Figure 11.4.
Figure 11.4 Oracle’s Database Instance’s subcomponents. Source of image: https://www.oracle.com/webfolder/technetwork/tutorials/architecture-diagrams/19/pdf/db-19c-architecture.pdf
Each of these subcomponents contains several pieces. In some fashion, the Client Process connects to the Server Process and that in turn connects to the Program Global Area.
The Oracle documentation contains 40 pages and from this brief exploration, it is clear that Oracle has taken a completely different approach from Microsoft. Since we assumed that the running project was done with one of the Microsoft SQL Server versions, we will end our exploration of the Oracle architecture. Before we move on, I wish to point out a feature that students of operating systems may find interesting. Inside the System Global Area is the Database Buffer Cache. It uses the Least Recently Used (LRU) approach for freeing up the buffer. See Figure 11.5.
Figure 11.5 Oracle Database Buffer Cache uses the Least Recently Used (LRU) approach. Source of image: https://www.oracle.com/webfolder/technetwork/tutorials/architecture-diagrams/19/pdf/db-19c-architecture.pdf
The Microsoft SQL Server DBMS Hidden Components: Microsoft SQL Server Protocol Layer
Figure 11.6 Illustrating the location of the Protocol Layer. Source of image: https://www.guru99.com/sql-server-architecture.html
The Protocol Layer comes after the SQL Server Network Interface (see Figure 11.6). As explained in the previous section, access to the database is handled in many places within the SQL Server.
The Protocol Layer is before the Relational Engine. The relational engine has the CMD Parser the Query Optimizer, and the Query Executor.
The Protocol Layer supports three types of client server architecture:
- shared memory
- TCP/IP
- shared pipes
Figure 11.7 This shows the three protocols used by the SQL Server Configuration Manager.
Shared Memory: The client and the SQL Sever run on the same machine. The instructions on installing the Microsoft Developer edition assumed this configuration. The “Server Name” option could be any of the following:
- “.” (a period)
- “localhost”
- “127.0.0.1”
- “Machine\Instance”
The following is an image of where these would appear:
Figure 11.8 Where the name of the shared memory server name would appear in the floating dialog box. Source of image: https://www.guru99.com/sql-server-architecture.html
Transmission Control Protocol/Internet Protocol (TCP/IP): The client and the SQL Server run on the different machines in different locations. This is where we need to have the IP address and the port number. It is possible to use TCP/IP in the previous example. The drawback is that should the Internet go down, you might not be able to access the SQL Server!
The “Server Name” option would be “Machine\Instance of the server” and the port number is 1433. SQL server always uses port 1433 as its default. (Oracle databases use port 1521. MySQL databases use port 3306.)
The following is an image of how this would appear:
Figure 11.9 Where the name of the TCP/IP server name would appear. Source of image: https://www.guru99.com/sql-server-architecture.html
Named Pipes: The client and the SQL Sever run on the different machines in the same location. This uses the local area network (LAN) for the communications. The drawback is that should the Intranet go down, you might not be able to use SQL Server!
Strange as it may sound, this third option is disabled by default. You would use the SQL Configuration Manger to turn this option on.
Figure 11.10 The steps for finding where to enable the named pipes.
In addition to the three protocols for connecting to the SQL Server, there is the Tabular Data Stream (TDS) protocol. This is a protocol for transferring data between the client machine and the server machine. This was developed by Sybase and now is wholly owned by Microsoft4.
The Microsoft SQL Server DBMS Hidden Components: Microsoft SQL Server Relational Engine
CMD Parser
The data travels from the Protocol Layer to the CMD Parser. Its primary task is to check the query for errors in syntactics and in semantics.
- A syntactic error is when the query does not follow the grammar rules.
- A semantic error is when the query references a non-existent column or table.
- Not shown in Figure 11.6 is the Normalizer. It checks for the presence of the columns and tables. If these exist, then the Normalizer would bind the resource to the query.
If there are no errors, then the CMD Parser generates a Query Tree. More than one Query Tree could be generated.
Figure 11.11 The CMD Parser’s steps. Source of image: https://www.guru99.com/sql-server-architecture.html
Optimizer
The Query Trees are sent from the CMD Parser to the Optimizer. The Optimizer’s role is to find the cheapest execution plan. This may not be the best, cost-effective execution plan.
- Optimization is done for DML commands such as SELECT, INSERT, DELETE, and UPDATE.
- DDL commands such as CREATE and ALTER are not optimized.
The actual Optimizer algorithms are propriety5. The following are the known details.
Figure 11.12 The released details on the Optimizer steps. Source of image: https://www.guru99.com/sql-server-architecture.html
Phase 0: Search for the Trivial Plan
The Query Tree may contain only one practical and workable plan.
Phase 1: Search for Transaction processing Plans
This phase is visited when there is not a Trivial Plan available. The first effort is for a Simple Plan. If that effort fails, then the second effort is for a Complex Plan. These efforts look at table indexes.
Phase 2: Parallel Processing and Optimization
This phase is visited when Phase 1 failed to find an execution plan. For the first effort, it searches for Parallel Processing possibilities. If this effort fails, then using the Microsoft propriety algorithms, it will find all other possible options for executing the query in the best way.
Query Executor
The Query Plan goes from the Optimizer to the Query Executor. The Query Executor has a two-way link to the Access Methods inside the Storage Engine and a one-way connection to the Protocol Layer. Figure 11.13 illustrates these connections.
Figure 11.13 The Query Executor’s connections. Source of image: https://www.guru99.com/sql-server-architecture.html
The Query Executer provides an execution plan to the Access Methods module. The execution plan is used for creating the data fetching logic. One the data is received from the Storage Engine, then the results are sent to the Protocol Layer. The Protocol Layer routes the desired data to the end user.
The Microsoft SQL Server DBMS Hidden Components: Microsoft SQL Server Storage Engine
The Storage Engine has
- Access Methods. This is connected to the Query Executor in the Relational Engine, to the Transaction Manager, and to the Buffer Manager.
- Transaction Manager. This is connected to the Transaction Log and to the Access methods.
- Buffer Manager. This is connected to the Data File, to Access Methods and to Dirty Pages in the Data Cache.
These components are illustrated in Figure 11.14.
Figure 11.14 The components in the Storage Engine. Source of image: https://www.guru99.com/sql-server-architecture.html
Access Methods
As noted in the section on the Query Execute, the Access Methods module receives the execution plan. If there is a SELECT statement, then the query is passed to the Buffer Manager for further processing. Otherwise, the query is routed to the Transaction Manager. Figure 11.15 illustrates this activity.
Figure 11.15 The two paths for a SQL statement from the Access Methods module. Source of image: https://www.guru99.com/sql-server-architecture.html
Buffer Manager
The Buffer Manager has a two-way connection to the Access Methods module, a two-way connection to the data file, and a one-way connection to the Dirty Page module inside the Data Cache. The Buffer Manager manages three functions:
- Plan Cache
- Data Parsing: Buffer cache and data storage
- Dirty Page
Plan Cache
After the first execution of an execution plan, then the execution plan is stored for possible reuse. The Buffer Manager checks to see if the desired execution plan is present. The benefit of checking is that the server can execute faster.
Data Parsing
The Buffer Manager uses two approaches for accessing the desired data.
Hard Parsing
If the data is not present in the Buffer Manager, then access is made to the data storage. A copy is stored in the data cache.
Soft Parsing
If later queries seek the same data, then this data would be present in the data cache. The Query Executor snags a copy of this data. This approach improves the performance, because input/output operations are reduced.
Transaction Manager
The Transaction Manager is connected to the Transaction Log and to the Access Methods module. It is invoked when a non-select statement is being executed. (Refer to Figure 11.19.) There are three components:
- Log Manager
- Lock Manager
- Execution Process
A Quick Tangent on ACID
In several chapters, we addressed how to structure database tables so that data is not lost and so that the data is kept current. In Chapter 8, ACID was introduced. For this chapter, we have not addressed how a database crash could create similar problems. A proper database needs to support the ACID properties. We will review the ACID concept.
Figure 11.16 ACID Properties in a Relational DBMS. Source of image: https://www.geeksforgeeks.org/acid-properties-in-dbms/
Atomicity
A transaction is a single unit. It completes fully (Commit) or it fails (Abort).
Consistency
The integrity constraints must be maintained so that the database is consistent before and after a transaction. We are ensuring that the data is correct.
Isolation
Transactions do not interfere with other transactions.
Durability
The changes must persist. These changes need to be written to the storage medium. So if the power is lost, the changes are present and are not lost.
Drawbacks
ACID has three disadvantages.
- Performance. Additional processing is needed in order to ensure data consistency and integrity.
- Scalability. In large distributed systems, the support for multiple transactions while ensuring ACID may result in the need to cap the size in order to have a responsive system.
- Complexity. Overhead is needed for supporting ACID. This may require more resources. In addition, individuals supporting the database need to have significant expertise.
All in all, the advantages of ACID outweigh the disadvantages.
Log Manager
The Log Manager uses the Transaction Logs for storing all changes to the database. Each entry has a sequence number, a transaction ID, and details on the changes. These items make it possible for the database to support the atomicity requirement in the ACID.
Lock Manager
While one user is making changes, no one else can access the same data. This action makes it possible for the database to support data consistency and isolation requirements in the ACID.
Execution Process
The Execution Process uses the Log Manager and the Lock Manager. In addition, a copy of the requested data is stored in the Buffer cache. Any data that is to be changed has a copy stored in the Log Buffer.
Pages that are storing the data is known as “Dirty Pages.” The Checkpoint and the Write-Ahead Logging process marks all of the Dirty Pages that will be written to the permanent storage. Data from the Dirty Pages are copied to the data page of the log file. This is known as Write Ahead Logging. When the buffer is full, the least recently used (LRU) algorithm is used.
A Quick Tangent on Pages
Computer scientists use different terms to talk about memory and storage.
- Bits
- The ones and zeros. The on state and the off state.
- Nibble
- 4 bits equal 1 nibble.
- Bytes6
- 8 bits equal 1 byte.
- 2 nibbles equal 1 byte
- 1 kilobyte (KB) equals 1,024 Bytes
- 1 megabyte (MG) equals 1,024 KB
- 1 gigabyte or giga Byte (GB) equals 1,024 MB
- 1 terabyte or tera Byte (TB) equals 1,024 GB
- 1 petabyte or peta Byte (PB) equals 1,024 TB
- Word
- This is a grouping of bits. But each computer has a different word size or length.
- 8-bit
- 32-bit
- 64-bits
- 128-bits
- This term is used when talking about CPU registers and generally the length is 16-bits
- Page
- The physical memory is divided into fixed -size blocks called page frames. When a process requests memory, the operating system allocates one or more page frames to the process and maps the process’ logical pages to the actual physical page frames. The term “paging” refers to this action.
Microsoft’s Viewpoint of Pages
A data page is 8 KB in size. Eight data pages form one Extent. The total size is 64 KB. See Figure 11.17 for an illustration of one Extent.
Figure 11.17 A Microsoft Extent. Source of image: https://www.guru99.com/sql-server-architecture.html
Each page has a header, which holds the following metadata:
- Page Type
- Page Number
- Size of Used Space
- Size of Free Space
- Pointer to the next page
- Pointer to the previous page
There are three Page Types.
- Primary file
- Every database contains one Primary file.
- This store all important data related to tables, views,7 Triggers, etc.
- Extension is .mdf usually but can be of any extension.
- Secondary file
- Database may or may not contains multiple Secondary files.
- This is optional and contains user-specific data.
- Extension is .ndf usually but can be of any extension.
- Log file
- Also known as Write ahead logs.
- Extension is .ldf
- Used for Transaction Management.
- This is used to recover from any unwanted instances. Perform important task of Rollback to uncommitted transactions.
Key Terms
access control: This is a method for restricting access to certain resources by certain individuals.
ACID: This stands for Atomicity, Consistency, Isolation, and Durability. [This was repeated from Chapter 8.]
Atomicity: A transaction is a single unit. It completes fully (Commit) or it fails (Abort).
authentication: This is a way to verify the identity of a person.
authorization: This a way of determining whether or not that a person should be allowed to access the data or to make a change.
Consistency: The integrity constraints must be maintained so that the database is consistent before and after a transaction. We are ensuring that the data is correct.
Durability: The changes must persist. These changes need to be written to the storage medium. So if the power is lost, the changes are present and are not lost.
encryption: This makes a text unreadable and thus limits the loss of data should a hacker is able to penetrate the previous layers.
Extent: This is a Microsoft approach to data pages. A data page is 8 KB in size. Eight data pages form one Extent.
firewall: This is a separator or a restrictor of network traffic.
Identity-Based Access Control (IBAC): This grants access based on the identity of the user and on the granted credentials.
Isolation: Transactions do not interfere with other transactions.
logical access controls: These limit connections to computer networks, system files, and data. These are provided via software and these support letting certain individuals or groups to gain access to sensitive information
pages: This is an approach for storing data in memory or on a hard drive.
permission: In the context of Microsoft usage of RBAC, this is an access that is granted on a securable for a specific principal.
physical access controls: These are physical means such as fences, locked doors, and guards.
principal: In the context of Microsoft usage of RBAC, this is an entity that can be authenticated.
Principle of Least Privilege (PLP): This directs that users can only see what they need for their job.
securable: In the context of Microsoft usage of RBAC, this is a SQL Server resource that can be accessed by a principal.
SQL Server Application Database Roles: These are used by an application instead of by individuals. This adds another layer of security.
SQL Server Application Roles: These are used by an application instead of by individuals. The intent is to keep regular users and application users separate and safe.
Tabular Data Stream (TDS) protocol: This is a protocol for transferring data between the client machine and the server machine.
view: In the context of Microsoft usage, this limits what a person can see and can do.
Exercises
1. Describe the generic major components of a modern database system. [CE2016 CE-SWD-10]
2. Describe the following components for your selected DBMS:
- Query Optimizer
- Query Executor
- Storage Manager
- Transaction Processor
Note: This chapter focused on Microsoft SQL Server. There was a brief mention of the Oracle Database architecture. Answer this question based on the DBMS that your class is using.
[CS2013 IM/Database Systems 2 and 4]
3. Describe DBMS access controls, privilege levels, and security principles. Chapter 12 will ask this question again. [CS2013 IM/Database Systems 2 and National Security Agency DMS 4]
4. Define or explain the following terms:
- access control
- ACID
- Atomicity
- authentication
- authorization
- Include in your answer five examples.
- Consistency
- Durability
- encryption
- Extent
- firewall
- functions
- Isolation
- logical access controls
- pages
- permission
- physical access controls. These are physical means such as fences, locked doors, and guards.
- principal
- Principle of Least Privilege (PLP)
- securable
- SQL Server Application Database Roles
- SQL Server Application Roles
- Tabular Data Stream (TDS) protocol
- User-Defined SQL Server Database Roles
- User-Defined SQL Server Roles
- view
5. Microsoft SQL Server, Oracle database, and others use a paging scheme. What algorithm is used for dropping pages?
6. List and explain the five security layers used by Microsoft for securing SQL Server.
7. What is the TCP/IP port number for your DBMS?
8. What are the two types of errors that the CMD Parser would flag?
9. What is the output from the CMD parser?
10 Why does the Access Methods module not route SELECT queries to the Transaction Manager?
11. Explain ACID.
12. Explain the purpose of having a log manager and a lock manager in the Transaction Manager.
13. What are Dirty Pages?
14. Each page has a metadata. List and explain the items.
15. Explain the three database page types.
16. Which page type have a variable number of files?
Bonus Questions
BQ1. Microsoft SQL Server has the Optimizer component. There are many articles about how to write optimized queries. Explain why the Optimizer does not negate the need to write better queries.
A Running Project
Continue to work on your project.
Attribution
This chapter of Database Design is a brand-new addition.
This chapter drew from many sources.
Image Attribution
No second edition images were used.
References
“ACID Properties in DBMS,” Geeks for Geeks, September 19, 2024. https://www.geeksforgeeks.org/acid-properties-in-dbms/
Fiona Brown. “SQL Server Architecture (Explained),” GURU99, June 28, 2024. https://www.guru99.com/sql-server-architecture.html
“Oracle Database 19c Technical Architecture,” Oracle, 2019. https://www.oracle.com/webfolder/technetwork/tutorials/architecture-diagrams/19/pdf/db-19c-architecture.pdf
Oracle does have a tutorial that explains the architecture in understandable terms. See “Oracle Database Architecture,” Oracle Tutorial, n.d. https://www.oracletutorial.com/oracle-administration/oracle-database-architecture/
Robert Sheldon. “SQL Server Access Control: The Basics,” redgate, August 2, 2016. https://www.red-gate.com/simple-talk/databases/sql-server/database-administration-sql-server/sql-server-access-control-basics/
Nivritti Suste. “SQL Server Database and Server Roles for Security and Permissions,” MS SQL Tips, August 13, 2024. https://www.mssqltips.com/sqlservertip/8045/sql-server-database-and-server-roles-for-security-and-permissions/
Van To, William Assaf, Dennis Rea, et al. “Securing SQL Server,” Learn Microsoft, August 25, 2023. https://learn.microsoft.com/en-us/sql/relational-databases/security/securing-sql-server
- Authentication
- Authorization
- Access
- Manage
- Audit
This viewpoint seems unique to this Geeks for Geeks writer.
- https://pages.nist.gov/800-63-FAQ/#q-b13
- https://pages.nist.gov/800-63-3/sp800-63b.html
- https://pages.nist.gov/800-63-3/sp800-63-3.html
- Authentication and channel encryption negotiations
- Specification of requests in SQL (including Bulk Insert)
- Invocation of a stored procedure (also known as Remote Procedure Call or RPC)
- Returning of data
- Transaction Manager Request.
As of October 2024, the current version is 36.0. To learn more, see https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-tds/b46a581a-39de-4745-b076-ec4dbb7d13ec