Relational Database Management System
Introduction to Relational Databases (RDBMS)

Chapter 1

Overview of Database Management System

Learning Objectives. This chapter provides an overview of database management systems, including concepts related to data, databases, and database management systems. After completing this chapter, the reader should be familiar with the following concepts:

  • Data, information, database, database management system
  • Need and evolution of DBMS.
  • File management vs. database management system
  • ANSI/SPARC data model
  • Database architecture: two-, three-, and multitier architecture

1.1 Introduction

Science, business, education, economics, law, culture: all areas of human development “work” with the constant aid of data. Databases play a crucial role within science research: the body of scientific and technical data and information in the public domain is massive, and factual data are fundamental to the progress of science. But the progress of science is not the only process affected by the way people use databases. Stock exchange data are absolutely necessary to any analyst; access to comprehensive large-scale databases is an everyday activity of a teacher, an educator, an academic, or a lawyer. There are databases collecting all sorts of different data: nuclear structure and radioactive decay data for isotopes (the Evaluated Nuclear Structure Data File), gene sequences (the Human Genome Database), prisoners’ DNA data (“DNA offender database”), names of people accused of drug offenses, telephone numbers, legal materials, and many others. In this chapter, the basic ideas behind database management systems, their evolution, their advantages over the conventional file system, and the structure of a database system are discussed.

1.2 Data and Information

Data are raw facts that constitute the building blocks of information. Data are the heart of the DBMS. It should be noted that not all data convey useful information. Useful information is obtained from processed data. In other words, data have to be interpreted in order to obtain information. Good, timely, relevant information is the key to decision making. Good decision making is the key to organizational survival.
Data are a representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or automatic means. The data in DBMS can be broadly classified into two types, one is the collection of information needed by the organization and the other is “metadata” which is the information about the database. The term “metadata” will be discussed in detail later in this chapter.

Data are the most stable part of an organization’s information system. A company needs to save information about employees, departments, and salaries. These pieces of information are called data. Data stored permanently are referred to as persistent data. Generally, we perform operations on data or data items to supply some information about an entity. For example, a library keeps a list of members, books, due dates, and fines.

1.3 Database

A database is a well-organized collection of data that are related in a meaningful way and can be accessed in different logical orders. Database systems are systems in which the interpretation and storage of information are of primary importance. The database should contain all the data needed by the organization; as a result, a huge volume of data, the need for long-term storage of the data, and access to the data by a large number of users generally characterize database systems. The simplified view of a database system is shown in Fig. 1.1. From this figure, it is clear that several users can access the data in an organization while the integrity of the data is still maintained. A database is integrated when the same information is not recorded in two places.

Fig. 1.1. Simplified database view

1.4 Database Management System

A database management system (DBMS) consists of a collection of interrelated data and a set of programs to access that data. It is software that is helpful in maintaining and utilizing a database.

A DBMS consists of:

  • A collection of interrelated and persistent data. This part of DBMS is referred to as database (DB).
  • A set of application programs used to access, update, and manage data. This part constitutes data management system (MS).
  • A DBMS is general-purpose software, i.e., it is not application specific. The same DBMS (e.g., Oracle, Sybase, etc.) can be used in a railway reservation system, library management, a university, etc.
  • A DBMS takes care of storing and accessing data, leaving only application specific tasks to application programs.

DBMS is a complex system that allows a user to do many things to data, as shown in Fig. 1.2. From this figure, it is evident that a DBMS allows the user to input data, share the data, edit the data, manipulate the data, and display the data in the database. Because a DBMS allows more than one user to share the data, the complexity extends to its design and implementation.

1.4.1 Structure of DBMS

An overview of the structure of database management system is shown in Fig. 1.3. A DBMS is a software package, which translates data from its logical representation to its physical representation and back.
The DBMS uses an application-specific database description to define this translation. The database description is generated by a database designer from his or her conceptual view of the database, which is called the Conceptual Schema. The translation from the conceptual schema to the database description is performed using a data definition language (DDL) or a graphical or textual design interface.

Fig. 1.2. Capabilities of database management system
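
For instance, a small fragment of a conceptual schema might be declared to the DBMS with DDL statements along the following lines; this is a minimal sketch, and the table name STUDENT and its columns are illustrative rather than taken from any particular product:

    CREATE TABLE STUDENT (
        Roll_no    INTEGER      PRIMARY KEY,
        Name       VARCHAR(50)  NOT NULL,
        Dept_code  CHAR(4)
    );
    -- The DBMS stores this description in its catalog and uses it to
    -- translate logical requests on STUDENT into physical file accesses.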

1.5 Objectives of DBMS

The main objectives of database management system are data availability, data integrity, data security, and data independence.

1.5.1 Data Availability

Data availability refers to the fact that the data are made available to a wide variety of users in a meaningful format and at a reasonable cost, so that the users can easily access the data.

1.5.2 Data Integrity

Data integrity refers to the correctness of the data in the database. In other words, the data available in the database are reliable.

1.5.3 Data Security

Data security refers to the fact that only authorized users can access the data. Data security can be enforced by passwords. If two separate users are accessing a particular data item at the same time, the DBMS must not allow them to make conflicting changes.

1.5.4 Data Independence

DBMS allows the user to store, update, and retrieve data in an efficient manner. DBMS provides an “abstract view” of how the data is stored in the database.
In order to store the information efficiently, complex data structures are used to represent the data. The system hides certain details of how the data are stored and maintained.

1.6 Evolution of Database Management Systems

The file-based system was the predecessor to the database management system. The Apollo moon-landing project was launched in the 1960s. At that time, there was no system available to handle and manage the large amount of information involved. As a result, North American Aviation, which later became Rockwell International, developed software known as the Generalized Update Access Method (GUAM). In the mid-1960s, IBM joined North American Aviation to develop GUAM into the Information Management System (IMS). IMS was based on the hierarchical data model. In the mid-1960s, General Electric released the Integrated Data Store (IDS). IDS was based on the network data model. Charles Bachman was mainly responsible for the development of IDS. The network database was developed to fulfil the need to represent more complex data relationships than could be modeled with hierarchical structures. The Conference on Data Systems Languages (CODASYL) formed the Data Base Task Group (DBTG) in 1967. The DBTG specified three distinct languages for standardization: a schema Data Definition Language (DDL), which would enable the database administrator to define the schema; a subschema DDL, which would allow application programs to define the parts of the database they use; and a Data Manipulation Language (DML) to manipulate the data.

The network and hierarchical data models developed during that time had the drawbacks of minimal data independence, minimal theoretical foundation, and complex data access. To overcome these drawbacks, in 1970, Codd of IBM published a paper titled “A Relational Model of Data for Large Shared Data Banks” in Communications of the ACM, vol. 13, no. 6, pp. 377-387, June 1970. As an impact of Codd’s paper, the System R project was developed during the late 1970s by the IBM San Jose Research Laboratory in California. The project was developed to prove that the relational data model was implementable. An outcome of the System R project was the development of the Structured Query Language (SQL), which became the standard language for relational database management systems. In the 1980s IBM released two commercial relational database management systems, known as DB2 and SQL/DS, and Oracle Corporation released Oracle. Codd himself attempted to address some of the failings of his original work with extended versions of the relational model called RM/T (1979) and RM/V2 (1990). The attempts to provide a data model that represents the “real world” more closely have been loosely classified as Semantic Data Modeling.

In recent years, two approaches to DBMS have become more popular: Object-Oriented DBMS (OODBMS) and Object-Relational DBMS (ORDBMS).

The chronological order of the development of DBMS is as follows:

— Flat files: 1960s-1980s
— Hierarchical: 1970s-1990s
— Network: 1970s-1990s
— Relational: 1980s-present
— Object-oriented: 1990s-present
— Object-relational: 1990s-present
— Data warehousing: 1980s-present
— Web-enabled: 1990s-present

Early 1960s. Charles Bachman at GE created the first general-purpose DBMS, the Integrated Data Store. It created the basis for the network model, which was standardized by CODASYL (Conference on Data Systems Languages).
Late 1960s. IBM developed the Information Management System (IMS). IMS used an alternative model, called the hierarchical data model.
1970. Edgar Codd, from IBM, created the relational data model. In 1981 Codd received the Turing Award for his contributions to database theory. Codd passed away in April 2003.
1976. Peter Chen presented the Entity-Relationship model, which is widely used in database design.
1980s. SQL, developed by IBM, became the standard query language for databases. SQL was standardized by ANSI and ISO.
1980s and 1990s. IBM, Oracle, Informix, and others developed powerful DBMSs.

1.7 Classification of Database Management System

The database management system can be broadly classified into (1) Passive Database Management System and (2) Active Database Management System:

  1. Passive Database Management System. Passive Database Management Systems are program-driven. In passive database management system, the users query the current state of database and retrieve the information currently available in the database. Traditional DBMSs are passive in the sense that they are explicitly and synchronously invoked by user or application program-initiated operations. Applications send requests for operations to be performed by the DBMS and wait for the DBMS to confirm and return any possible answers. The operations can be definitions and updates of the schema, as well as queries and updates of the data.
  2. Active Database Management System. Active Database Management Systems are data-driven or event-driven systems. In an active database management system, the users specify to the DBMS the information they need. If the information of interest is not currently available, the DBMS actively monitors the arrival of the desired information and provides it to the relevant users. The scope of a query in a passive DBMS is limited to past and present data, whereas the scope of a query in an active DBMS additionally includes future data. An active DBMS reverses the control flow between applications and the DBMS: instead of only applications calling the DBMS, the DBMS may also call applications.
Active databases contain a set of active rules that consider events representing database state changes, test whether a condition (a database predicate or query) evaluates to TRUE or FALSE, and take an action via a data manipulation program embedded in the system. Alert is an extension architecture developed at the IBM Almaden Research Center for experimentation with active databases.
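
As a rough illustration of such an event-condition-action rule, a trigger of the following form could be declared; the syntax loosely follows standard SQL triggers (details vary between products), and the ACCOUNT, Balance, and LOW_BALANCE_ALERT names are purely hypothetical:

    CREATE TRIGGER low_balance_alert
    AFTER UPDATE OF Balance ON ACCOUNT            -- event: a database state change
    REFERENCING NEW ROW AS n
    FOR EACH ROW
    WHEN (n.Balance < 300)                        -- condition: a predicate on the data
    INSERT INTO LOW_BALANCE_ALERT (Acc_no, Balance)   -- action
        VALUES (n.Acc_no, n.Balance);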

1.8 File-Based System

Prior to DBMS, file system provided by OS was used to store information. In a file-based system, we have collection of application programs that perform services for the end users. Each program defines and manages its own data.

Consider a university database. The university database contains details about students, faculty, the list of courses offered, the duration of courses, etc. In file-based processing, for each file there is a separate application program, as shown in Fig. 1.4.

One group of users may be interested in knowing the courses offered by the university. Another group of users may be interested in the faculty information. The information is stored in separate files, and separate application programs are written for each.

1.9 Drawbacks of File-Based System

The limitations of file-based approach are duplication of data, data dependence, incompatible file formats, separation, and isolation of data.

1.9.1 Duplication of Data

Duplication of data means the same data being stored more than once. This can also be termed data redundancy. Data redundancy is a problem in the file-based approach due to its decentralized nature. The main drawbacks of duplication of data are:

  • Duplication of data leads to wastage of storage space. Wasted storage space has a direct impact on cost: the cost increases.
  • Duplication of data can lead to loss of data integrity: the data are no longer consistent. Assume that an employee’s details are stored both in the department and in the main office. Now the employee changes his contact address. The changed address is recorded in the department alone and not in the main office. If some important information has to be sent to his contact address from the main office, that information will be lost. This is due to the lack of a centralized approach.

1.9.2 Data Dependence

Data dependence means the application program depends on the data. If some modifications have to be made in the data, then the application program has to be rewritten. If the application program is independent of the storage structure of the data, then it is termed as data independence. Data independence is generally preferred as it is more flexible. But in file-based system there is program-data dependence.

1.9.3 Incompatible File Formats

As the file-based system lacks program-data independence, the structure of a file depends on the application programming language. For example, the structure of a file generated by a FORTRAN program may be different from the structure of a file generated by a “C” program. The incompatibility of such files makes them difficult to process jointly.

1.9.4 Separation and Isolation of Data

In the file-based approach, data are isolated in separate files; hence it is difficult to access data. The application programmer must synchronize the processing of two files to ensure that the correct data are extracted. This difficulty increases if data have to be retrieved from more than two files.
The drawbacks of the conventional file-based approach are summarized below:

  1. We have to store the information in a secondary memory such as a disk. If the volume of information is large, it will occupy more memory space.
  2. We have to depend on the addressing facilities of the system. If the database is very large, then it is difficult to address the whole set of records.
  3. For each query, for example the address of the student and the list of electives that the student has chosen, we have to write separate programs.
  4. While writing several programs, lot of variables will be declared, and it will occupy some space.
  5. It is difficult to ensure the integrity and consistency of the data when more than one program accesses some file and changes the data.
  6. In case of a system crash, it becomes hard to bring back the data to a consistent state.
  7. “Data redundancy” occurs when identical data are distributed over various files.
  8. Data distributed in various files may be in different formats hence it is difficult to share data among different application (Data Isolation).

1.10 DBMS Approach

DBMS is software that provides a set of primitives for defining, accessing, and manipulating data. In DBMS approach, the same data are being shared by different application programs; as a result, data redundancy is minimized. The DBMS approach of data access is shown in Fig. 1.5.

Fig. 1.5. Data access through DBMS

1.11 Advantages of DBMS

There are many advantages of a database management system. Some of the advantages are listed below:

  1. Centralized data management.
  2. Data Independence.
  3. System Integration.

1.11.1 Centralized Data Management

In DBMS all files are integrated into one system thus reducing redundancies and making data management more efficient.

1.11.2 Data Independence

Data independence means that programs are isolated from changes in the way the data are structured and stored. In a database system, the database management system provides the interface between the application programs and the data. Physical data independence means the applications need not worry about how the data are physically structured and stored. Applications should work with a logical data model and declarative query language.
Without data independence, if major changes were to be made to the data, the application programs might need to be rewritten. In a DBMS, when changes are made to the data representation, the data maintained by the DBMS change, but the DBMS continues to provide data to application programs in the previously used way.
Data independence is the immunity of application programs to changes in storage structures and access techniques. For example, if we add a new attribute or change an index structure, then in a traditional file processing system the applications are affected. But in a DBMS environment these changes are reflected in the catalog, and as a result the applications are not affected. Data independence can be physical data independence or logical data independence.

Physical data independence is the ability to modify physical schema without causing the conceptual schema or application programs to be rewritten.
Logical data independence is the ability to modify the conceptual schema without having to change the external schemas or application programs.
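
Both kinds of independence can be illustrated with ordinary SQL; the EMPLOYEE table, the index, and the view below are illustrative names only:

    -- Physical data independence: adding an index changes how EMPLOYEE is
    -- stored and accessed, but no application query has to be rewritten.
    CREATE INDEX emp_name_idx ON EMPLOYEE (Name);

    -- Logical data independence: applications written against this external
    -- view keep working even if further columns are later added to EMPLOYEE,
    -- or if EMPLOYEE is reorganized and the view redefined over the new tables.
    CREATE VIEW EMP_PUBLIC AS
        SELECT Emp_id, Name, Dept
        FROM   EMPLOYEE;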

1.11.3 Data Inconsistency

Data inconsistency means different copies of the same data will have different values. For example, consider a person working in a branch of an organization. The details of the person will be stored both in the branch office as well as in the main office. If that particular person changes his address, then the “change of address” has to be recorded in the main office as well as the branch office.
If, for example, the “change of address” is recorded in the branch office but not in the main office, then the data about that person are inconsistent.
DBMS is designed to have data consistency. Some of the qualities achieved in DBMS are:


  1. Data redundancy: reduced in DBMS.
  2. Data independence: achieved in DBMS.
  3. Data inconsistency: avoided in DBMS.
  4. Centralizing the data: achieved in DBMS.
  5. Data integrity: necessary for efficient transactions.
  6. Support for multiple views: necessary for security reasons.
  • Data redundancy means duplication of data. Data redundancy will occupy more space hence it is not desirable.
  • Data independence means independence between application program and the data. The advantage is that when the data representation changes, it is not necessary to change the application program.
  • Data inconsistency means different copies of the same data will have different values.
  • Centralizing the data means data can be easily shared between the users but the main concern is data security.
  • The main threat to data integrity comes from several different users attempting to update the same data at the same time. For example, “The number of booking made is larger than the capacity of the aircraft/train.”
  • Support for multiple views means DBMS allows different users to see different “views” of the database, according to the perspective each one requires. This concept is used to enhance the security of the database.
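
As a minimal sketch of this idea (the STAFF table, its columns, and the user name are assumptions for illustration), a view can expose only the columns a particular group of users is allowed to see, and access is then granted on the view rather than on the base table:

    CREATE VIEW STAFF_PHONEBOOK AS
        SELECT Name, Dept, Phone          -- Salary is deliberately excluded
        FROM   STAFF;

    GRANT SELECT ON STAFF_PHONEBOOK TO reception_clerk;
    -- reception_clerk is given no privileges on the base table STAFF itself.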

1.12 ANSI/SPARC Data Model (American National Standards Institute/Standards Planning and Requirements Committee)

The distinction between the logical and physical representation of data was recognized in 1978 when the ANSI/SPARC committee proposed a generalized framework for database systems. This framework provided a three-level architecture: three levels of abstraction at which the database could be viewed.

1.12.1 Need for Abstraction

The main objective of a DBMS is to store and retrieve information efficiently; all the users should be able to access the same data. The designers use complex data structures to represent the data so that data can be stored and retrieved efficiently, but it is not necessary for the users to know the physical details of database storage. The developers hide the complexity from users through several levels of abstraction.

1.12.2 Data Independence

Data independence means that the logical structure of the database should be unaffected by changes to the physical aspects of storage. Because of data independence, the database administrator can change the database storage structures without affecting the users’ view.
The different levels of data abstraction are:

  1. Physical level or internal level
  2. Logical level or conceptual level
  3. View level or external level

Physical Level
It is concerned with the physical storage of the information. It provides the internal view of the actual physical storage of data. The physical level describes complex low-level data structures in detail.

Logical Level
Logical level describes what data are stored in the database and what relationships exist among those data.
The logical level describes the entire database in terms of a small number of simple structures. Although the implementation of the simple structures of the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators use the logical level of abstraction.

View Level
View level is the highest level of abstraction. It is the view that the individual user of the database has. There can be many view level abstractions of the same data. The different levels of data abstraction are shown in Fig. 1.6.

Database Instances
Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database.

Database Schema
The overall design of the database is called the database schema. A schema is a collection of named objects. Schemas provide a logical classification of objects in the database. A schema can contain tables, views, triggers, functions, packages, and other objects.

Fig. 1.6. ANSI/SPARC data model

A schema is also an object in the database. It is explicitly created using the CREATE SCHEMA statement with the current user recorded as the schema owner. It can also be implicitly created when another object is created, provided the user has IMPLICIT_SCHEMA authority.
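
A small sketch of explicit schema creation is given below; the schema, owner, and table names are illustrative, and the exact clauses accepted vary between products:

    CREATE SCHEMA library AUTHORIZATION librarian;

    CREATE TABLE library.BOOK (
        Book_id  INTEGER       PRIMARY KEY,
        Title    VARCHAR(100)  NOT NULL
    );
    -- BOOK, its columns, and its constraints are catalogued as objects
    -- belonging to the schema "library", owned by user "librarian".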

1.13 Data Models

A data model is a collection of conceptual tools for describing data, relationships between data, and consistency constraints. Data models help in describing the structure of the data in the database at the logical level. A data model is the set of conceptual constructs available for defining a schema. The data model is a language for describing the data and the database; it may consist of abstract concepts, which must be translated by the designer into the constructs of the data definition interface, or it may consist of constructs that are directly supported by the data definition interface. The constructs of the data model may be defined at many levels of abstraction.

1.13.1 Early Data Models

Three historically important data models are the hierarchical, network, and relational models. These models are relevant for their contributions in establishing the theory of data modeling and because they were all used as the basis of working and widely used database systems. Together they are often referred to as the “basic” data models. The hierarchical and network models, developed in the 1960s and 1970s, were based on organizing the primitive data structures in which the data were stored in the computer by adding connections or links between the structures. As such they were useful in presenting the user with a well-defined structure, but they were still highly coupled to the underlying physical representation of the data. Although they did much to assist in the efficient access of data, the principle of data independence was poorly supported.

1.14 Components and Interfaces of Database Management System

A database management system involves five major components: data, hardware, software, procedures, and users. These components and the interfaces between them are shown in Fig. 1.7.

1.14.1 Hardware

The hardware can range from a single personal computer, to a single mainframe, to a network of computers. The particular hardware depends on the requirements of the organization and the DBMS used. Some DBMSs run only on particular operating systems, while others run on a wide variety of operating systems. A DBMS requires a minimum amount of main memory and disk space to run, but this minimum configuration may not necessarily give acceptable performance.

1.14.2 Software

The software includes the DBMS software, application programs together with the operating systems including the network software if the DBMS is being used over a network. The application programs are written in third-generation programming languages like “C,” COBOL, FORTRAN, Ada, Pascal, etc. or using fourth-generation language such as SQL, embedded in a third-generation language. The target DBMS may have its own fourth-generation tools which allow development of applications through the provision of nonprocedural query languages, report generators, graphics generators, and application generators. The use of fourth-generation tools can improve productivity significantly and produce programs that are easier to maintain.

1.14.3 Data

A database is a repository for data which, in general, is both integrated and shared. Integration means that the database may be thought of as a unification of several otherwise distinct files, with any redundancy among those files partially or wholly eliminated. The sharing of a database refers to the sharing of data by different users, in the sense that each of those users may have access to the same piece of data and may use it for different purposes. Any given user will normally be concerned with only a subset of the whole database. The main features of the data in the database are listed below:

  1. The data in the database is well organized (structured)
  2. The data in the database is related.
  3. The data are accessible in different orders without great difficulty.

The data in the database is persistent, integrated, structured, and shared.

Integrated Data

A database can be considered to be a unification of several distinct data files; when any redundancy among those files is eliminated, the data are said to be integrated.

Shared Data

A database contains data that can be shared by different users for different applications simultaneously. It is important to note that by sharing data in this way, the redundancy of data is reduced, and since repetitions are avoided, the possibility of inconsistencies is also reduced.

Persistent Data

Persistent data are data that cannot be removed from the database as a side effect of some other process. Persistent data have a life span that is not limited to a single execution of the programs that use them.

1.14.4 Procedure

Procedures are the rules that govern the design and the use of the database. The procedures may contain information on how to log on to the DBMS, how to start and stop the DBMS, how to identify a failed component, how to recover the database, how to change the structure of a table, and how to improve performance.

1.14.5 People Interacting with Database

Here, people refers to those who manage the database (database administrators), those who design the application programs (database designers), and those who interact with the database (database users).

A DBMS is typically run as a back-end server in a local or global network, offering services to clients directly or to Application Servers.

Database Administrator

Database Administrator is a person having central control over data and programs accessing that data. The database administrator is a manager whose responsibilities are focused on management of technical aspects of the database system. The objectives of database administrator are given as follows:

  1. To control the database environment
  2. To standardize the use of database and associated software
  3. To support the development and maintenance of database application projects
  4. To ensure all documentation related to standards and implementation is up-to-date.

The summarized objectives of database administrator are shown in Fig. 1.8.

The control of the database environment should exist from the planning right through to the maintenance stage. During application development the database administrator should carry out the tasks that ensure proper control of the database when an application becomes operational. This includes review of each design stage to see if it is feasible from the database point of view. The database administrator should be responsible for developing standards to apply to development projects. In particular these standards apply to system analysis, design, and application programming for projects which are going to use the database. These standards will then be used as a basis for training systems analysts and programmers to use the database management system efficiently.

Responsibilities of Database Administrator (DBA)

The responsibility of the database administrator is to maintain the integrity, security, and availability of data. A database must be protected from accidents, such as input or programming errors, from malicious use of the database, and from hardware or software failures that corrupt data. Protection from accidents that cause data inaccuracy is a part of maintaining data integrity. Protecting the database from unauthorized or malicious use is termed database security. The responsibilities of the database administrator are summarized as follows:

  1. Authorizing access to the database.
  2. Coordinating and monitoring its use.
  3. Acquiring hardware and software resources as needed.
  4. Backup and recovery. The DBA has to ensure regular backups of the database; in case of damage, suitable recovery procedures are used to bring the database back up with as little downtime as possible.

Database Designer

A database designer can be either a logical database designer or a physical database designer. The logical database designer is concerned with identifying the data, the relationships between the data, and the constraints on the data that are to be stored in the database. The logical database designer must have a thorough understanding of the organization’s data and its business rules.

The physical database designer takes the logical data model and decides the way in which it can be physically implemented. The physical database designer is responsible for mapping the logical data model into a set of tables and integrity constraints, selecting specific storage structures, and designing the security measures required on the data. In a nutshell, the database designer is responsible for:

  1. Identifying the data to be stored in the database.
  2. Choosing appropriate structure to represent and store the data.

Database Manager

Database manager is a program module which provides the interface between the low level data stored in the database and the application programs and queries submitted to the system:

  • The database manager would translate DML statements into low-level file system commands for storing, retrieving, and updating data in the database.
  • Integrity enforcement. The database manager enforces integrity by checking consistency constraints, for example that the bank balance of a customer must be maintained at a minimum of Rs. 300 (see the sketch after this list).
  • Security enforcement. Unauthorized users are prohibited from viewing the information stored in the database.
  • Backup and recovery. Backup and recovery of the database are necessary to ensure that the database remains consistent despite failures.
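
The minimum-balance rule mentioned in the list above could be declared to the DBMS roughly as follows; the ACCOUNT table and its columns are illustrative names, not part of any particular system:

    CREATE TABLE ACCOUNT (
        Acc_no   INTEGER        PRIMARY KEY,
        Cust_id  INTEGER        NOT NULL,
        Balance  DECIMAL(10,2)  NOT NULL,
        CONSTRAINT min_balance CHECK (Balance >= 300)
    );
    -- Any INSERT or UPDATE that would leave Balance below Rs. 300 is
    -- rejected by the database manager, keeping the data consistent.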

Database Users

Database users are the people who need information from the database to carry out their business responsibilities. Database users can be broadly classified into two categories: application programmers and end users.

Sophisticated End Users

Sophisticated end users interact with the system without writing programs. They form requests by writing queries in a database query language. These are submitted to query processor. Analysts who submit queries to explore data in the database fall in this category.

Specialized End Users

Specialized end users write specialized database applications that do not fit into the traditional data-processing framework. Such applications involve knowledge-base and expert systems, environment modeling systems, etc.

Naive End Users

Naive end users interact with the system by using permanent application programs. Example: a query made by a student, namely the number of books borrowed, in a library database.

System Analysts

System analysts determine the requirements of end users and develop specifications for canned transactions that meet these requirements.

Canned Transaction

Ready-made programs through which naive end users interact with the database are called canned transactions.

1.14.6 Data Dictionary

A data dictionary, also known as a “system catalog,” is a centralized store of information about the database. It contains information about the tables, the fields the tables contain, data types, primary keys, indexes, the joins which have been established between those tables, referential integrity, cascade update, cascade delete, etc. The information stored in the data dictionary is called “Metadata.” Thus, a data dictionary can be considered a file that stores Metadata. The data dictionary is a tool for recording and processing information about the data that an organization uses. The data dictionary is a central catalog for Metadata. The data dictionary can be integrated within the DBMS or kept separate. The data dictionary may be referenced during system design, programming, and by actively executing programs. One of the major functions of a true data dictionary is to enforce the constraints placed upon the database by the designer, such as referential integrity and cascade delete.
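
Many relational products expose the data dictionary through tables or views that can themselves be queried with ordinary SQL. As a sketch, where the standard INFORMATION_SCHEMA views are supported, the columns of a (hypothetical) STUDENT table can be listed as follows:

    SELECT column_name, data_type, is_nullable
    FROM   information_schema.columns
    WHERE  table_name = 'STUDENT';
    -- The result is Metadata: it describes the STUDENT table rather than
    -- containing any student records.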

Metadata

The information (data) about the data in a database is called Metadata. The Metadata are available for query and manipulation, just as other data in the database.

The functional components of database system structure are:
1. Storage manager.
2. Query processor.

Storage Manager
Storage manager is responsible for storing, retrieving, and updating data in the database. Storage manager components are:
1. Authorization and integrity manager.
2. Transaction manager.
3. File manager.
4. Buffer manager.

Transaction Management

  • A transaction is a collection of operations that performs a single logical function in a database application.
  • Transaction-management component ensures that the database remains in a consistent state despite system failures and transaction failure.
  • Concurrency control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.



Authorization and Integrity Manager

Checks the integrity constraints and authority of users to access data.

Transaction Manager

It ensures that the database remains in a consistent state despite system failures. The transaction manager manages the execution of database manipulation requests. The transaction manager function is to ensure that concurrent access to data does not result in conflict.
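
For example, a transfer between two accounts can be grouped into one transaction, sketched below with illustrative table and column names (the exact statement for starting a transaction varies between products):

    START TRANSACTION;
    UPDATE ACCOUNT SET Balance = Balance - 500 WHERE Acc_no = 101;
    UPDATE ACCOUNT SET Balance = Balance + 500 WHERE Acc_no = 202;
    COMMIT;
    -- If a failure occurs before COMMIT, the transaction manager rolls both
    -- updates back, so the database never shows a half-finished transfer.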

File Manager
The file manager manages the allocation of space on disk storage. Files are used to store collections of similar data. A file management system manages independent files, helping to enter and retrieve information records. The file manager establishes and maintains the list of structures and indexes defined in the internal schema. The file manager can:


  • Create a file
  • Delete a file
  • Update the record in the file
  • Retrieve a record from a file

Buffer

The area into which a block from the file is read is termed a buffer. The management of buffers has the objective of maximizing the performance or the utilization of the secondary storage systems, while at the same time keeping the demand on CPU resources tolerably low. The use of two or more buffers for a file allows the transfer of data to be overlapped with the processing of data.

Buffer Manager

Buffer manager is responsible for fetching data from disk storage into main memory. Programs call on the buffer manager when they need a block from disk. The requesting program is given the address of the block in main memory, if it is already present in the buffer. If the block is not in the buffer, the buffer manager allocates space in the buffer for the block, replacing some other block, if required, to make space for new block. Once space is allocated in the buffer, the buffer manager reads in the block from the disk to the buffer, and passes the address of the block in main memory to the requester.

Indices

Indices provide fast access to data items that hold particular values. An index is a list of numerical values which gives the order of the records when they are sorted on a particular field or column of the table.
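
As a brief sketch (the table, column, and index names are illustrative), an index is declared in SQL as shown below; queries that filter on the indexed column can then locate matching rows without scanning the whole table:

    CREATE INDEX student_name_idx ON STUDENT (Name);

    -- A query such as the following can now use the index:
    SELECT Roll_no, Name
    FROM   STUDENT
    WHERE  Name = 'Anita';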

1.15 Database Architecture

Database architecture essentially describes the location of all the pieces of information that make up the database application. The database architecture can be broadly classified into two-, three-, and multitier architecture.

1.15.1 Two-Tier Architecture

The two-tier architecture is a client—server architecture in which the client contains the presentation code and the SQL statements for data access. The database server processes the SQL statements and sends query results back to the client. The two-tier architecture is shown in Fig. 1.9. Two-tier client/server provides a basic separation of tasks. The client, or first tier, is primarily responsible for the presentation of data to the user and the “server,” or second tier, is primarily responsible for supplying data services to the client.

Presentation Services

“Presentation services” refers to the portion of the application which presents data to the user. In addition, it also provides for the mechanisms in which the user will interact with the data. More simply put, presentation logic defines and interacts with the user interface. The presentation of the data should generally not contain any validation rules.

Business Services/objects

“Business services” are a category of application services. Business services encapsulate an organization’s business processes and requirements. These rules are derived from the steps necessary to carry out day-to-day business in an organization. These rules can be validation rules, used to make sure that the incoming information is of a valid type and format, or they can be process rules, which ensure that the proper business process is followed in order to complete an operation.

Application Services

“Application services” provide other functions necessary for the application.

Data Services

“Data services” provide access to data independent of their location. The data can come from legacy mainframe, SQL RDBMS, or proprietary data access systems. Once again, the data services provide a standard interface for accessing data.

Advantages of Two-tier Architecture

The two-tier architecture is a good approach for systems with stable requirements and a moderate number of clients. The two-tier architecture is the simplest to implement, due to the number of good commercial development environments.

Drawbacks of Two-tier Architecture

Software maintenance can be difficult because PC clients contain a mixture of presentation, validation, and business logic code. To make a significant change in the business logic, code must be modified on many PC clients. Moreover the performance of two-tier architecture can be poor when a large number of clients submit requests because the database server may be overwhelmed with managing messages. With a large number of simultaneous clients, three-tier architecture may be necessary.

1.15.2 Three-tier Architecture

A “Multitier,” often referred to as “three-tier” or “N-tier,” architecture provides greater application scalability, lower maintenance, and increased reuse of components. Three-tier architecture offers a technology-neutral method of building client/server applications with vendors who employ standard interfaces which provide services for each logical “tier.” The three-tier architecture is shown in Fig. 1.10. From this figure, it is clear that in order to improve performance an intermediate tier is included between the client and the server.

Through standard tiered interfaces, services are made available to the application. A single application can employ many different services, which may reside on dissimilar platforms or be developed and maintained with different tools. This approach allows a developer to leverage investments in existing systems while creating new applications which can utilize existing resources.

Although the three-tier architecture addresses performance degradations of the two-tier architecture, it does not address division-of-processing concerns. The PC clients and the database server still contain the same division of code although the tasks of the database server are reduced. Multiple-tier architectures provide more flexibility on division of processing.

1.15.3 Multitier Architecture

A multi-tier, three-tier, or N-tier implementation employs a three-tier logical architecture superimposed on a distributed physical model. Application Servers can access other Application Servers in order to supply services to the client application as well as to other Application Servers. The multiple-tier architecture is the most general client-server architecture. It can be the most difficult to implement because of its generality. However, a good design and implementation of multiple-tier architecture can provide the most benefits in terms of scalability, interoperability, and flexibility.

For example, in the diagram shown in Fig. 1.11, the client application looks to Application Server #1 to supply data from a mainframe-based application. Application Server #1 has no direct access to the mainframe application, but it does know, through the development of application services, that Application Server #2 provides a service to access the data from the mainframe application which satisfies the client request. Application Server #1 then invokes the appropriate service on Application Server #2 and receives the requested data which is then passed on to the client.

Application Servers can take many forms. An Application Server may be anything from custom application services, Transaction Processing Monitors, Database Middleware, Message Queue to a CORBA/COM based solution.

1.16 Situations where DBMS is not Necessary

It is also necessary to specify situations where it is not necessary to use a DBMS. If the traditional file processing system is working well, and if it would take more money and time to design a database, it is better not to go for a DBMS. Moreover, if only one person maintains the data and that person is neither skilled in designing a database nor comfortable in using a DBMS, then it is not advisable to go for a DBMS.

DBMS is undesirable under following situations:

  • DBMS is undesirable if the application is simple, well-defined, and not expected to change.
  • Runtime overheads are not feasible because of real-time requirements.
  • Multiple accesses to data are not required.

Compared with file systems, databases have some disadvantages:

1. High cost of DBMS, which includes:

  • Higher hardware costs
  • Higher programming costs
  • High conversion costs

2. Slower processing of some applications

3. Increased vulnerability

4. More difficult recovery

1.17 DBMS Vendors and their Products

Some of the popular DBMS vendors and their corresponding products are given in Table 1.1.

Summary

The main objective of a database management system is to store and manipulate data in an efficient manner. A database is an organized collection of related data. Not all data give useful information; only processed data give useful information, which helps an organization to take important decisions. Before DBMSs, computer file processing systems were used to store, manipulate, and retrieve large files of data. Computer file processing systems have limitations such as data duplication, limited data sharing, and no program-data independence. In order to overcome these limitations, the database approach was developed. The main advantages of the DBMS approach are program-data independence, improved data sharing, and minimal data redundancy. In this chapter we have seen the evolution of DBMS and a broad introduction to DBMS. The responsibilities of the database administrator, the ANSI/SPARC model, and two-tier and three-tier architectures were also analyzed in this chapter.

Review Questions

1.1. What are the drawbacks of file processing system?

The drawbacks of file processing system are:

  • Duplication of data, which leads to wastage of storage space and data inconsistency.
  • Separation and isolation of data, because of which data cannot be used together.
  • No program data independence.

1.2. What is meant by Metadata?

Metadata are data about data but not the actual data.

1.3. Define the term data dictionary?

Data dictionary is a file that contains Metadata.


1.4. What are the responsibilities of database administrator?
1.5. Mention three situations where it is not desirable to use DBMS?
The situations where it is not desirable to use DBMS are:

  • The database and applications are simple, well defined, and not expected to change.
  • Data are not accessed by multiple users.
  • Stringent real-time requirements make the runtime overheads of a DBMS unacceptable.

1.6. What is meant by data independence?

Data independence renders application programs (e.g., SQL scripts) immune to changes in the logical and physical organization of data in the system. Logical organization refers to changes in the schema; for example, adding a column or tuples does not stop existing queries from working. Physical organization refers to changes in indices, file organizations, etc.


1.7. What is meant by Physical and Logical data independence?


In logical data independence, the conceptual schema can be changed without changing the external schema. In physical data independence, the internal schema can be changed without changing the conceptual schema.


1.8. What are some disadvantages of using a DBMS over flat file system?

  • DBMS initially costs more than flat file system
  • DBMS requires skilled staff

1.9. What are the steps to design a good database?

  • First find out the requirements of the user
  • Design a view for each important application
  • Integrate the views giving the conceptual schema, which is the union of all views
  • Map to the data model provided by the DBMS (usually relational)
  • Design external views
  • Choose physical structures (indexes, etc.)

1.10. What is Database? Give an example.

A database is a collection of related data. Here, the term “data” means known facts that can be recorded. Examples of databases are a library information system and bus, railway, and airline reservation systems.

1.11. Define – DBMS

DBMS is a collection of programs that enables users to create and maintain a database.


1.12. Mention various types of databases?


The different types of databases are:

  • Multimedia database
  • Spatial database (Geographical Information system Database)
  • Real-time or Active Database
  • Data Warehouse or On-line Analytical Processing Database

1.13. Mention the advantages of using DBMS?

The advantages of using DBMS are:

  • Controlling Redundancy
  • Enforcing Integrity Constraints so as to maintain the consistency of the database
  • Providing Backup and recovery facilities
  • Restricting unauthorized access
  • Providing multiple user interfaces
  • Providing persistent storage of program objects and data structures

1.14. What is “Snapshot” or “Database State”?


The data in the database at a particular moment is known as “Database State” or “Snapshot” of the Database.


1.15. Define Data Model.


It is a collection of concepts that can be used to describe the structure of a database.


The data model provides the necessary means to achieve abstraction, i.e., hiding the details of data storage.


1.16. Mention the various categories of Data Model.


The various categories of data model are:


  • High Level or Conceptual Data Model (Example: ER model)
  • Low Level or Physical Data Model
  • Representational or Implementational Data Model
  • Relational Data Model
  • Network and Hierarchical Data Model
  • Record-based Data Model
  • Object-based Data Model

1.17. Define the concept of “database schema.” Describe the types of schemas that exist in a database complying with the three levels ANSI/SPARC architecture.


A database schema is nothing but the description of the database. The types of schemas that exist in a database complying with the three-level ANSI/SPARC architecture are:


  • External schema
  • Conceptual schema
  • Internal schema

Chapter 2

Entity-Relationship Model

Learning Objectives. This chapter presents a top-down approach to data modeling. This chapter deals with ER and Enhanced ER (EER) model and conversion of ER model to relational model. After completing this chapter the reader should be familiar with the following concepts:

  • Entity, Attribute, and Relationship.
  • Entity classification-Strong entity, Weak entity, and Associative entity.
  • Attribute classification – Single value, Multivalue, Derived, and Null attribute.
  • Relationship – Unary, binary, and ternary relationship.
  • Enhanced ER model – Generalization, Specialization.
  • Mapping ER model to relation model or table.
  • Connection traps.

2.1 Introduction

Peter Chen first proposed modeling databases using a graphical technique that humans can relate to easily. Humans can easily perceive entities and their characteristics in the real world, and the relationships entities have with one another. The objective of modeling graphically is even more profound than simply representing these entities and relationships. The database designer can use tools to model these entities and their relationships and then generate database vendor-specific schemas automatically. The Entity Relationship (ER) model gives the conceptual model of the world to be represented in the database. The ER model is based on a perception of a real world that consists of a collection of basic objects called entities and relationships among these objects. The main motivation for defining the ER model is to provide a high-level model for conceptual database design, which acts as an intermediate stage prior to mapping the enterprise being modeled onto a conceptual level. The ER model achieves a high degree of data independence, which means that the database designer does not have to worry about the physical structure of the database. A database schema in the ER model can be pictorially represented by an Entity-Relationship diagram.

2.2 The Building Blocks of an Entity-Relationship Diagram

The ER diagram is a graphical modeling tool to standardize ER modeling. The modeling can be carried out with the help of pictorial representation of entities, attributes, and relationships. The basic building blocks of an Entity-Relationship diagram are Entity, Attribute, and Relationship.

2.2.1 Entity

An entity is an object that exists and is distinguishable from other objects. In other words, the entity can be uniquely identified.

The examples of entities are:


  • A particular person, for example Dr. A.P.J. Abdul Kalam is an entity.
  • A particular department, for example Electronics and Communication Engineering Department.
  • A particular place, for example Coimbatore city can be an entity.

2.2.2 Entity Type

An entity type or entity set is a collection of similar entities. Some examples of entity types are:

  • All students in PSG, say STUDENT.
  • All courses in PSG, say COURSE.
  • All departments in PSG, say DEPARTMENT.

An entity may belong to more than one entity type. For example, a staff working in a particular department can pursue higher education as part-time. Hence the same person is a LECTURER at one instance and STUDENT at another instance.

2.2.3 Relationship

A relationship is an association of entities where the association includes one entity from each participating entity type whereas relationship type is a meaningful association between entity types.

The examples of relationship types are:


  • Teaches is the relationship type between LECTURER and STUDENT.
  • Buying is the relationship between VENDOR and CUSTOMER.
  • Treatment is the relationship between DOCTOR and PATIENT.

2.2.4 Attributes

Attributes are properties of entity types. In other words, entities are described in a database by a set of attributes.

The following are example of attributes:


  • Brand, cost, and weight are the attributes of CELLPHONE.
  • Roll number, name, and grade are the attributes of STUDENT.
  • Data bus width, address bus width, and clock speed are the attributes of MICROPROCESSOR.

2.2.5 ER Diagram

The ER diagram is used to represent database schema. In ER diagram:

  • A rectangle represents an entity set.
  • An ellipse represents an attribute.
  • A diamond represents a relationship.
  • Lines represent linking of attributes to entity sets and of entity sets to relationship sets.

Example of ER diagram

Let us consider a simple ER diagram as shown in Fig. 2.1.

In the ER diagram the two entities are STUDENT and CLASS. Two simple attributes associated with STUDENT are Roll number and Name. The attributes associated with the entity CLASS are Subject Name and Hall Number. The relationship between the two entities STUDENT and CLASS is Attends.
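
Looking ahead to the mapping rules covered later in this chapter, the ER diagram of Fig. 2.1 could eventually be turned into tables roughly as follows; this is only a sketch, and the key and column choices are illustrative:

    CREATE TABLE STUDENT (
        Roll_number  INTEGER      PRIMARY KEY,
        Name         VARCHAR(50)
    );

    CREATE TABLE CLASS (
        Hall_number   INTEGER      PRIMARY KEY,
        Subject_name  VARCHAR(50)
    );

    -- The Attends relationship becomes a table of key pairs.
    CREATE TABLE ATTENDS (
        Roll_number  INTEGER REFERENCES STUDENT (Roll_number),
        Hall_number  INTEGER REFERENCES CLASS (Hall_number),
        PRIMARY KEY (Roll_number, Hall_number)
    );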

2.3 Classification of Entity Sets

Entity sets can be broadly classified into:

  1. Strong entity.
  2. Weak entity.
  3. Associative entity.

2.3.1 Strong Entity

Strong entity is one whose existence does not depend on other entity.

Example
Consider the example, student takes course. Here student is a strong entity.

In this example, course is considered as weak entity because, if there are no students to take a particular course, then that course cannot be offered. The COURSE entity depends on the STUDENT entity.

2.3.2 Weak Entity
A weak entity is one whose existence depends on another entity. In many cases, a weak entity does not have a primary key of its own.

Example
Consider the example, customer borrows loan. Here loan is a weak entity. For every loan, there should be at least one customer. Here the entity loan depends on the entity customer; hence loan is a weak entity.
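
In relational terms, a weak entity typically borrows the key of its owner. A sketch for the customer-loan example is given below; the table and column names are assumptions for illustration:

    CREATE TABLE CUSTOMER (
        Cust_id  INTEGER      PRIMARY KEY,
        Name     VARCHAR(50)
    );

    -- LOAN has no independent primary key: it is identified by the owning
    -- customer together with a loan number unique only within that customer.
    CREATE TABLE LOAN (
        Cust_id  INTEGER REFERENCES CUSTOMER (Cust_id),
        Loan_no  INTEGER,
        Amount   DECIMAL(12,2),
        PRIMARY KEY (Cust_id, Loan_no)
    );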

2.4 Attribute Classification

Attribute is used to describe the properties of the entity. This attribute can be broadly classified based on value and structure. Based on value the attribute can be classified into single value, multivalue, derived, and null value attribute. Based on structure, the attribute can be classified as simple and composite attribute.

2.4.1 Symbols Used in ER Diagram

The elements in ER diagram are Entity, Attribute, and Relationship. The different types of entities like strong, weak, and associative entity, different types of attributes like multivalued and derived attributes and identifying relationship and their corresponding symbols are shown later.

Single Value Attribute
Single value attribute means, there is only one value associated with that attribute.

Example

The examples of single value attribute are age of a person, Roll number of the student, Registration number of a car, etc.
Representation of Single Value Attribute in ER Diagram

Multivalued Attribute
In the case of multivalue attribute, more than one value will be associated with that attribute.

Representation of Multivalued Attribute in ER Diagram



Examples of Multivalued Attribute
  1. Consider an entity EMPLOYEE. An employee can have many skills; hence the skills associated with an employee form a multivalued attribute.

  2. A hotel employs several chefs and offers a variety of food items; hence the chefs and the food items associated with the entity HOTEL are examples of multivalued attributes.

  3. Applications associated with an IC (Integrated Circuit). An IC can be used for several applications; hence the applications of an IC form a multivalued attribute.

  4. Subjects handled by a staff member. A staff member can handle more than one subject in a particular semester; hence it is an example of a multivalued attribute. Moreover, a staff member can be an expert in more than one area, hence area of specialization is also considered a multivalued attribute.

Derived Attribute

The value of the derived attribute can be derived from the values of other related attributes or entities.

In ER diagram, the derived attribute is represented by dotted ellipse.

Representation of Derived Attribute in ER Diagram

Example of Derived Attribute

  1. Age of a person can be derived from the date of birth of the person. In this example, age is the derived attribute.

  2. Experience of an employee in an organization can be derived from the date of joining of the employee.

  3. CGPA (Cumulative Grade Point Average) of a student can be derived from the GPAs of the individual semesters.

Null Value Attribute

In some cases, a particular entity may not have any applicable value for an attribute. For such situations, a special value called the null value is used.

Example


In application forms, there is a column called phone number. If a person does not have a phone, then a null value is entered in that column.


Composite Attribute


Composite attribute is one which can be further subdivided into simple attributes.


Example


Consider the attribute "address," which can be further subdivided into Street name, City, and State.

As another example of a composite attribute, consider the degrees earned by a particular scholar, which can range from undergraduate and postgraduate to doctorate degrees. Hence degree can be considered a composite attribute.

2.5 Relationship Degree

Relationship degree refers to the number of associated entities. The relationship degree can be broadly classified into unary, binary, ternary, and quaternary relationships.

2.5.1 Unary Relationship

The unary relationship is otherwise known as a recursive relationship. In the unary relationship the number of participating entity types is one: an entity type related to itself forms a recursive relationship.

Roles and Recursive Relation

When an entity set appears in a relationship more than once, it is useful to add labels to the connecting lines. These labels are called roles.

Example

In this example, Husband and Wife are referred to as roles.

2.5.2 Binary Relationship

In a binary relationship, two entities are involved. Consider the example; each staff will be assigned to a particular department. Here the two entities are STAFF and DEPARTMENT.

2.5.3 Ternary Relationship

In a ternary relationship, three entities are simultaneously involved. Ternary relationships are required when binary relationships are not sufficient to accurately describe the semantics of an association among three entities.

Example

Consider the example of an employee assigned to a project. Here we are considering three entities: EMPLOYEE, PROJECT, and LOCATION. The relationship is "assigned-to." Many employees may be assigned to one project; hence it is also an example of a one-to-many relationship.

2.5.4 Quaternary Relationships

Quaternary relationships involve four entities. An example of a quaternary relationship is "A professor teaches a course to students using slides." Here the four entities are PROFESSOR, SLIDES, COURSE, and STUDENT, and the relationship among them is "Teaches."

2.6 Relationship Classification

Relationship is an association among one or more entity types. This relationship can be broadly classified into one-to-one, one-to-many, many-to-many, and recursive relationships.

2.6.1 One-to-Many Relationship Type

The relationship that associates one entity to more than one entity is called a one-to-many relationship. An example of a one-to-many relationship is a country having states: for one country there can be more than one state, hence it is an example of a one-to-many relationship. Another example of a one-to-many relationship is the parent-child relationship: for one parent there can be more than one child.

2.6.2 One-to-One Relationship Type

One-to-one relationship is a special case of one-to-many relationship. True one-to-one relationship is rare. The relationship between the President and the country is an example of one-to-one relationship. For a particular country there will be only one President. In general, a country will not have more than one President hence the relationship between the country and the President is an example of one-to-one relationship. Another example of one-to-one relationship is House to Location. A house is obviously in only one location.

2.6.3 Many-to-Many Relationship Type

The relationship between the EMPLOYEE entity and the PROJECT entity is an example of a many-to-many relationship. Many employees will be working on many projects; hence the relationship between EMPLOYEE and PROJECT is a many-to-many relationship.

2.6.4 Many-to-One Relationship Type

The relationship between EMPLOYEE and DEPARTMENT is an example of many-to-one relationship. There may be many EMPLOYEES working in one DEPARTMENT. Hence relationship between EMPLOYEE and DEPARTMENT is many-to-one relationship. The four relationship types are summarized and shown in Table 2.1.

2.7 Reducing ER Diagram to Tables

To implement the database, it is necessary to use the relational model. There is a simple way of mapping from ER model to the relational model. There is almost one-to-one correspondence between ER constructs and the relational ones.

2.7.1 Mapping Algorithm

The mapping algorithm gives the procedure to map an ER diagram to tables.

The rules in mapping algorithm are given as:

  • For each strong entity type say E, create a new table. The columns of the table are the attribute of the entity type E.
  • For each weak entity W that is associated with only one 1-1 identifying owner relationship, identify the table T of the owner entity type. Include as columns of T, all the simple attributes and simple components of the composite attributes of W.
  • For each weak entity W that is associated with a 1-N or M-N identifying relationship, or participates in more than one relationship, create a new table T and include as its columns, all the simple attributes and simple components of the composite attributes of W. Also form its primary key by including as a foreign key in T, the primary key of its owner entity.
  • For each binary 1-1 relationship type R, identify the tables S and T of the participating entity types. Choose S, preferably the one with total participation. Include as foreign key in S, the primary key of T. Include as columns of S, all the simple attributes and simple components of the composite attributes of R.
  • For each binary 1-N relationship type R, identify the table S, which is at N side and T of the participating entities. Include as a foreign key in S, the primary key of T. Also include as columns of S, all the simple attributes and simple components of composite attributes of R.
  • For each M-N relationship type R, create a new table T and include as columns of T, all the simple attributes and simple components of composite attributes of R. Include as foreign keys, the primary keys of the participating entity types. Specify as the primary key of T, the list of foreign keys.
  • For each multivalued attribute, create a new table T and include as columns of T, the simple attribute or simple components of the attribute A. Include as foreign key, the primary key of the entity or relationship type that has A. Specify as the primary key of T, the foreign key and the columns corresponding to A.

Regular Entity

Regular entities are entities that have an independent existence and generally represent real-world objects such as persons and products. Regular entities are represented by rectangles with a single line.

2.7.2 Mapping Regular Entities

  • Each regular entity type in an ER diagram is transformed into a relation. The name given to the relation is generally the same as the entity type.
  • Each simple attribute of the entity type becomes an attribute of the relation.
  • The identifier of the entity type becomes the primary key of the corresponding relation.

Example 1

Mapping regular entity type tennis player

This diagram is converted into corresponding table as

Here,

- Entity name = Name of the relation or table.

In our example, the entity name is PLAYER, which is the name of the table.

- Attributes of the ER diagram = Column names of the table.

In our example, Name, Nation, Position, and Number of Grand slams won form the columns of the table.
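As a concrete illustration, the PLAYER relation above can be created directly as a table. The following is a minimal sketch using Python's built-in sqlite3 module; the column names follow the attributes listed above, while the data types are assumptions, since the ER diagram does not specify them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The regular entity PLAYER becomes a table: its identifier becomes the
# primary key and every simple attribute becomes a column.
con.execute("""
    CREATE TABLE PLAYER (
        Name            TEXT PRIMARY KEY,   -- identifier of the entity type
        Nation          TEXT,
        Position        TEXT,
        Grand_Slams_Won INTEGER
    )
""")
```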

2.7.3 Converting Composite Attribute in an ER Diagram to Tables

When a regular entity type has a composite attribute, only the simple component attributes of the composite attribute are included in the relation.

Example

In this example the composite attribute is the Customer address, which consists of Street, City, State, and Zip

When the regular entity type contains a multivalued attribute, two new relations are created.

The first relation contains all of the attributes of the entity type except the multivalued attribute.

The second relation contains two attributes that form the primary key of the second relation. The first of these attributes is the primary key from the first relation, which becomes a foreign key in the second relation. The second is the multivalued attribute.

2.7.4 Mapping Multivalued Attributes in ER Diagram to Tables

A multivalued attribute has more than one value. One way to map a multivalued attribute is to create two tables.

Example

In this example, the Skill attribute associated with EMPLOYEE is a multivalued attribute, since an employee can have more than one skill, such as fitter, electrician, turner, etc.
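A minimal sketch of this two-relation mapping, again using Python's sqlite3 module; apart from Skill, the column names (Employee_ID, Name) and the sample data are assumptions introduced only for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    -- First relation: every attribute of EMPLOYEE except the multivalued one.
    CREATE TABLE EMPLOYEE (
        Employee_ID INTEGER PRIMARY KEY,
        Name        TEXT
    );

    -- Second relation: the key of EMPLOYEE plus one skill per row.
    CREATE TABLE EMPLOYEE_SKILL (
        Employee_ID INTEGER REFERENCES EMPLOYEE(Employee_ID),
        Skill       TEXT,
        PRIMARY KEY (Employee_ID, Skill)
    );
""")

# An employee with several skills occupies several rows of EMPLOYEE_SKILL.
con.execute("INSERT INTO EMPLOYEE VALUES (1, 'Kumar')")
con.executemany("INSERT INTO EMPLOYEE_SKILL VALUES (1, ?)",
                [("fitter",), ("electrician",), ("turner",)])
```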

2.7.5 Converting “Weak Entities” in ER Diagram to Tables

Weak entity type does not have an independent existence and it exists only through an identifying relationship with another entity type called the owner.

For each weak entity type, create a new relation and include all of the simple attributes as attributes of the relation. Then include the primary key of the identifying relation as a foreign key attribute to this new relation.

The primary key of the new relation is the combination of the primary key of the identifying relation and the partial identifier of the weak entity type. In this example DEPENDENT is the weak entity.
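The following sketch shows one possible pair of tables for an owner EMPLOYEE and the weak entity DEPENDENT; the attribute names (Employee_ID, Dependent_Name, Date_of_Birth) are assumptions, since the accompanying figure is not reproduced here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE EMPLOYEE (
        Employee_ID INTEGER PRIMARY KEY,
        Name        TEXT
    );

    -- Weak entity: primary key = owner's key + partial identifier.
    CREATE TABLE DEPENDENT (
        Employee_ID    INTEGER REFERENCES EMPLOYEE(Employee_ID),
        Dependent_Name TEXT,
        Date_of_Birth  TEXT,
        PRIMARY KEY (Employee_ID, Dependent_Name)
    );
""")
```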

2.7.6 Converting Binary Relationship to Table

A relationship which involves two entities can be termed a binary relationship. This binary relationship can be one-to-one, one-to-many, many-to-one, or many-to-many.

Mapping one-to-Many Relationship

For each 1-M relationship, first create a relation for each of the two entity types participating in the relationship.

Example

One customer can give many orders; hence the relationship between the two entities CUSTOMER and ORDER is a one-to-many relationship. In a one-to-many relationship, include the primary key attribute of the entity on the one side of the relationship as a foreign key in the relation that is on the many side of the relationship.

Here we have two entities CUSTOMER and ORDER. The relationship between CUSTOMER and ORDER is one-to-many. For two entities CUSTOMER and ORDER, two tables namely CUSTOMER and ORDER are created as shown later. The primary key CUSTOMER ID in the CUSTOMER relation becomes the foreign key in the ORDER relation.
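A hedged sketch of this one-to-many mapping in sqlite3 follows; only Customer ID is taken from the text, and the remaining columns are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE CUSTOMER (
        Customer_ID INTEGER PRIMARY KEY,
        Name        TEXT
    );

    -- The relation on the "many" side carries the foreign key
    -- referencing the "one" side.
    CREATE TABLE ORDERS (                 -- ORDER is a reserved word in SQL
        Order_ID    INTEGER PRIMARY KEY,
        Order_Date  TEXT,
        Customer_ID INTEGER REFERENCES CUSTOMER(Customer_ID)
    );
""")
```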

Binary one-to-one relationship can be viewed as a special case of one-to-many relationships.

The process of mapping one-to-one relationship requires two steps. First, two relations are created, one for each of the participating entity types. Second, the primary key of one of the relations is included as a foreign key in the other relation.

2.7.7 Mapping Associative Entity to Tables

Many-to-many relationship can be modeled as an associative entity in the ER diagram.

Example 1. (Without Identifier)

Here the associative entity is ORDERLINE, which is without an identifier; that is, the associative entity ORDERLINE does not have any key attribute of its own.

The first step is to create three relations, one for each of the two participating entity types and a third for the associative entity. The relation formed from the associative entity is called an associative relation.
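The sketch below shows how the associative relation might look for the ORDERLINE case; the Quantity column and the key names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE ORDERS  (Order_ID   INTEGER PRIMARY KEY);
    CREATE TABLE PRODUCT (Product_ID INTEGER PRIMARY KEY);

    -- Associative relation: its primary key combines the primary keys
    -- of the two participating entity types.
    CREATE TABLE ORDERLINE (
        Order_ID   INTEGER REFERENCES ORDERS(Order_ID),
        Product_ID INTEGER REFERENCES PRODUCT(Product_ID),
        Quantity   INTEGER,
        PRIMARY KEY (Order_ID, Product_ID)
    );
""")
```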

Example 2. (With Identifier)

Sometimes data models will assign an identifier (surrogate identifier) to the associative entity type on the ER diagram. There are two reasons to motivate this approach:

  1. The associative entity type has a natural identifier that is familiar to end user.
  2. The default identifier may not uniquely identify instances of the associative entity.

(a) Shipment-No is a natural identifier to end user.

(b) The default identifier consisting of the combination of Customer-ID and Vendor-ID does not uniquely identify the instances of SHIPMENT.

2.7.8 Converting Unary Relationship to Tables

Unary relationships are also called recursive relationships. The two most important cases of unary relationship are one-to-many and many-to-many.

One-to-many Unary Relationship

Each employee has exactly one manager. A given employee may manage zero to many employees. The foreign key in the relation is named Manager-ID. This attribute has the same domain as the primary key Employee-ID.
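A small sketch of this recursive foreign key; the non-key columns are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("""
    CREATE TABLE EMPLOYEE (
        Employee_ID INTEGER PRIMARY KEY,
        Name        TEXT,
        -- Unary (recursive) relationship: Manager_ID refers back to the
        -- same relation and shares the domain of Employee_ID.
        Manager_ID  INTEGER REFERENCES EMPLOYEE(Employee_ID)
    )
""")
```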

2.7.9 Converting Ternary Relationship to Tables

A ternary relationship is a relationship among three entity types. The three entities given in this example are PATIENT, PHYSICIAN, and TREATMENT. The PATIENT–TREATMENT is an associative entity.

The primary key attributes – Patient ID, Physician ID, and Treatment Code – become foreign keys in PATIENT TREATMENT. These attributes are components of the primary key of PATIENT TREATMENT.
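A possible set of tables for this ternary mapping is sketched below; the descriptive column Treatment_Date is an assumption added to make the example concrete.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE PATIENT   (Patient_ID     INTEGER PRIMARY KEY);
    CREATE TABLE PHYSICIAN (Physician_ID   INTEGER PRIMARY KEY);
    CREATE TABLE TREATMENT (Treatment_Code TEXT    PRIMARY KEY);

    -- The ternary relationship becomes an associative relation whose
    -- primary key combines the keys of all three participants.
    CREATE TABLE PATIENT_TREATMENT (
        Patient_ID     INTEGER REFERENCES PATIENT(Patient_ID),
        Physician_ID   INTEGER REFERENCES PHYSICIAN(Physician_ID),
        Treatment_Code TEXT    REFERENCES TREATMENT(Treatment_Code),
        Treatment_Date TEXT,
        PRIMARY KEY (Patient_ID, Physician_ID, Treatment_Code)
    );
""")
```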

2.8 Enhanced Entity-Relationship Model (EER Model)

The basic concepts of ER modeling are not powerful enough for some complex applications; hence some additional semantic modeling concepts are required, which are provided by the Enhanced ER model. The Enhanced ER model is the extension of the original ER model with new modeling constructs. The new modeling constructs introduced in the EER model are supertype (superclass)/subtype (subclass) relationships. The supertype allows us to model a general entity type, whereas the subtype allows us to model specialized entity types.

Enhanced ER model = ER model + hierarchical relationships.

EER modeling is especially useful when the domain being modeled is object-oriented in nature and the use of inheritance reduces the complexity of the design. The extended ER model extends the ER model to allow various types of abstraction to be included and to express constraints more clearly.

2.8.1 Supertype or Superclass

Supertype or superclass is a generic entity type that has a relationship with one or more subtypes. For example PLAYER is a generic entity type which has a relationship with one or more subtypes like CRICKET PLAYER, FOOTBALL PLAYER, HOCKEY PLAYER, TENNIS PLAYER, etc.

2.8.2 Subtype or Subclass

A subtype or subclass is a subgrouping of the entities in an entity type that is meaningful to the organization. A subclass entity type is a specialized type of superclass entity type. A subclass entity type represents a subset or subgrouping of superclass entity type’s instances. Subtypes inherit the attributes and relationships associated with their supertype.

Consider the entity type ENGINE, which has two subtypes PETROL ENGINE and DIESEL ENGINE.

Consider the entity type STUDENT, which has two subtypes UNDERGRADUATE and POSTGRADUATE.

2.9 Generalization and Specialization

Generalization and specialization are two words for the same concept, viewed from two opposite directions. Generalization is the bottom-up process of defining a generalized entity type from a set of more specialized entity types. Specialization is the top-down process of defining one or more subtypes of a supertype.

Generalization is the process of minimizing the differences between entities by identifying common features. It can also be defined as the process of defining a generalized entity type from a set of entity types.

Specialization is a process of identifying subsets of an entity set (the superset) that share some distinguishing characteristics. In specialization the superclass is defined first and the subclasses are defined next. Specialization is the process of viewing an object as a more refined, specialized object. Specialization emphasizes the differences between objects.

For example consider the entity type STUDENT, which can be further classified into FULLTIME STUDENT and PARTTIME STUDENT. The classification of STUDENT into FULLTIME STUDENT and PARTTIME STUDENT is called Specialization.

2.10 ISA Relationship and Attribute Inheritance

IS_A relationship supports attribute inheritance and relationship participation. In the EER diagram, the subclass relationship is represented by ISA relationship. Attribute inheritance is the property by which subclass entities inherit values for all attributes of the superclass.

Consider the example of EMPLOYEE entity set in a bank. The EMPLOYEE in a bank can be CLERK, MANAGER, CASHIER, ACCOUNTANT, etc. It is to be observed that the CLERK, MANAGER, CASHIER, ACCOUNTANT inherit some of the attributes of the EMPLOYEE.

In this example the superclass is EMPLOYEE and the subclasses are CLERK, MANAGER, and CASHIER. The subclasses inherit the attributes of the superclass. Since each member of the subclass is an ISA member of the superclass, the circle below the EMPLOYEE entity set represents ISA relationship.

2.11 Multiple Inheritance

A subclass with more than one superclass is called a shared subclass. A subclass inherits attributes not only of its direct superclass, but also of all its predecessor superclasses; that is, it has multiple inheritance from its superclasses. In multiple inheritance a subclass can be a subclass of more than one superclass.

Example of Multiple Inheritance

Consider a person in an educational institution. The person can be employee, alumnus, and student. The employee entity can be staff or faculty. The student can be a graduate student or a postgraduate student. The postgraduate student can be a teaching assistant. If the postgraduate student is a teaching assistant, then he/she inherits the characteristics of the faculty as well as student class. That is the teaching assistant subclass is a subclass of more than one superclass (faculty, student). This phenomenon is called multiple inheritance and is shown in the Fig. 2.2.

2.12 Constraints on Specialization and Generalization

The constraints on specialization and generalization can be broadly classified into disjointness and completeness. The disjointness constraint allows us to specify whether an instance of a supertype may simultaneously be a member of two or more subtypes. In disjointness we have two categories (1) Overlap and (2) Disjoint. In completeness we have two categories (1) Total and (2) Partial. The completeness constraint addresses the question whether an instance of a supertype must also be a member of at least one subtype.

2.12.1 Overlap Constraint

Overlap refers to the fact that the same entity instance may be a member of more than one subclass of the specialization.

Example of Overlap Constraint

Consider the example of the ANIMAL entity, which can be further subdivided into LAND ANIMAL and WATER ANIMAL. Frogs and crocodiles can live both on land and in water; hence the division of ANIMAL into LAND and WATER animals is an example of an overlap constraint.

2.12.2 Disjoint Constraint

Disjoint refers to the fact that the same entity instance may be a member of only one subclass of the specialization.

Example of Disjointness Constraint

Consider the example of CATALOGUE. The CATALOGUE is a superclass, which can be further subdivided into BOOKS, JOURNALS, and PERIODICALS. This falls under disjointness because a BOOK entity can be neither JOURNAL nor PERIODICAL.

2.12.3 Total Specialization

Total completeness refers to the fact that every entity instance in the superclass must be a member of some subclass in the specialization. With total specialization, an instance of the supertype must be a member of at least one subtype.

Example of Total Specialization

Consider the example of TEACHER; the teacher is a general term, which can be further specialized into LECTURER, TUTOR, and DEMONSTRATOR. Here every member in the superclass participates as a member of a subclass, hence it is an example of total specialization.

2.12.4 Partial Specialization

Partial completeness refers to the fact that an entity instance in the superclass need not be a member of any subclass in the specialization. With partial specialization, an instance of a supertype may or may not be a member of any subtype.

Example of Partial Specialization

Consider the PERSON specialization into EMPLOYEE and STUDENT. This is an example of partial specialization because there can be a person who is unemployed and does not study.

2.13 Aggregation and Composition

Relationships among relationships are not supported by the ER model. Groups of entities and relationships can be abstracted into higher level entities using aggregation. Aggregation represents a “HAS-A” or “IS-PART-OF” relationship between entity types. One entity type is the whole, the other is the part. Aggregation allows us to indicate that a relationship set participates in another relationship set.

Consider the example of a driver driving a car. The car has various components like tires, doors, engine, and seats, which vary from one car to another. The relationship "drives" alone is insufficient to model the complexity of this system; "part-of" relationships allow abstraction into higher-level entities. In this example engine, tires, doors, and seats are aggregated into car.

Composition is a stronger form of aggregation where the part cannot exist without its containing whole entity type and the part can only be part of one entity type.

Consider the example "DEPARTMENT has PROJECT." Each project is associated with a particular DEPARTMENT, and there cannot be a PROJECT without a DEPARTMENT. Hence "DEPARTMENT has PROJECT" is an example of composition.

2.14 Entity Clusters

EER diagrams are difficult to read when there are many entities and relationships. One possible solution is to group entities and relationships into entity clusters. Entity cluster is a set of one or more entity types and associated relationships grouped into a single abstract entity type. Entity cluster behaves like an entity type; hence entity clusters and entity types can be further grouped to form a higher level entity cluster. Entity clustering is a hierarchical decomposition of a macrolevel view of the data model into finer and finer views, eventually resulting in the full detailed data model.

To understand entity clusters, consider the example of hospital management. In a hospital, the DOCTORS treat the PATIENTS. The DOCTORS are paid by the MANAGEMENT, which builds buildings. The DOCTORS can be either general physicians or specialists, such as those with an MS or MD. The patients can be either inpatients or outpatients; it is to be noted that only inpatients are allotted beds. If we have to represent the earlier ideas, it can be done using the EER diagram shown in Fig. 2.3. The EER diagram is found to be complex; the same idea is represented using entity clusters as shown in Fig. 2.4. Here the DOCTOR specialization is clustered into a DOCTORS entity and the PATIENT specialization is clustered into simply PATIENT. At first glance it may look like a reduction of the EER model to the ER model, but it is not so: here the entities as well as the relationships are clustered into a single entity set.

2.15 Connection Traps

A connection trap is the misinterpretation of the meaning of certain relationships. Connection traps can be broadly classified into fan and chasm traps. Any conceptual model will contain potential connection traps. An error in the interpretation of the meaning of a relationship may cause the database to be incapable of storing certain information. Both the fan and chasm traps arise when relationships appear to exist between entity types, but the links between occurrences may be ambiguous or may not exist. Related groups of entities could become clusters.

2.15.1 Fan Trap

A fan trap occurs when the model represents a relationship between entity types but the pathway between certain entity occurrences is ambiguous. A fan trap occurs when two 1-M relationships fan out from a single entity. In order to understand the concept of a fan trap, consider the following example:

Contractor works in a team. (Statement 1)

Team develops projects. (Statement 2)

Statement (1) represents M–1 relationship. Statement (2) represents 1–M relationship. But the information about which contractors are involved in developing which projects is not clear.

Consider another example of Fan trap.

Department is on Site. (Statement 1)

Site employs Staff. (Statement 2)

Statement (1) represents an M–1 relationship, because many departments may be on a single site. Statement (2) represents a 1–M relationship. However, which staff work in which particular department is ambiguous. The fan trap is resolved by restructuring the original ER model to represent the correct association.

2.15.2 Chasm Trap

A chasm trap occurs when a model suggests the existence of a relationship between entity types, but the pathway does not exist between certain entity occurrences. It occurs where there is a relationship with partial participation, which forms part of the pathway between entities that are related. Consider the relationship shown later.

A single branch may be allocated to many staff who oversee the management of properties for rent. It should be noted that not all staff oversee property and not all property is managed by a member of staff. Hence there exists a partial participation of Staff and Property in the relation "oversees," which means that some properties cannot be associated with a branch office through a member of staff. Hence the model has to be modified as shown later.

2.16 Advantages of ER Modeling

An ER model is derived from business specifications. ER models separate the information required by a business from the activities performed within the business. Although businesses can change their activities, the type of information tends to remain constant; therefore, the data structures also tend to be constant. The advantages of ER modeling are summarized later:

  1. The ER modeling provides an easily understood pictorial map for the database design.
  2. It is possible to represent the real world problems in a better manner in ER modeling.
  3. The conversion of ER model to relational model is straightforward.
  4. The enhanced ER model provides more flexibility in modeling real world problems.
  5. The symbols used to represent entity and relationships between entities are simple and easy to follow.

Summary

This chapter has described the fundamentals of ER modeling of data. An ER model is a logical representation of data. The ER model was introduced by Peter Chen in 1976. An ER model is usually expressed in the form of an ER diagram. The basic constructs of the ER model are entity types, relationships, and attributes. This chapter also described types of entities such as strong and weak entities, and types of relationships such as one-to-one, one-to-many, and many-to-many relationships. Attributes can be classified as single valued, multivalued, and derived attributes. In this chapter different types of entities, attributes, and relationships were explained with simple examples.

Review Questions

2.1. Construct an ER diagram of tennis player.

2.2. Construct an ER diagram of Indian cricket team.

One way of constructing ER diagram for Indian cricket team is shown later.

Here skills refers to player’s skill which may be batting, bowling, and fielding. All-rounders can have many skills.

2.3. What is Weak entity type?

Entity types that do not have a key attribute of their own are called weak entity types.

2.4. Define entity with example?

An entity is an object with a physical existence.

Examples of entities are a person, a car, an organization, a house, etc.

2.5. Define Entity type, Entity set?

An entity type defines a collection of entities that have the same attributes.

Entity Set

An entity set is the collection of entities of a particular entity type grouped together.

2.6. Should a real world object be modeled as an entity or as an attribute?

An object should be modeled as an entity if a number of attributes could be associated with it for proper identification and description, either now or later. An object should be an attribute if it has an atomic nature. For example, Color should be an attribute, unless we identify Color either as a process (e.g., painting) where a number of attribute codes are to be recorded (e.g., type, shade, gray-scale, manufacturer), or as an object with properties (e.g., car-color with details).

2.7. When is a composite attribute preferred over a set of simple attributes?

A composite attribute is chosen when a meaningful name can be assigned to the set of attributes, e.g., date, address. Otherwise a set of simple attributes should be chosen.

2.8. Distinguish between strong and weak entity?

Strong entity:
  • Exists independently of other entities.
  • Has its own unique identifier.
  • Represented by a single-line rectangle in the ER diagram.

Weak entity:
  • Dependent on a strong entity; cannot exist on its own.
  • Does not have a unique identifier of its own.
  • Represented by a double-line rectangle in the ER diagram.

2.9. What is inheritance in generalization hierarchies?

Inheritance is a data modeling feature that supports sharing of attributes between a supertype and a subtype. Subtypes inherit attributes from their supertype.

2.10. Give an example of supertype/subtype relationship where the overlap rule applies?

Overlap refers to the fact that the same entity instance may be a member of more than one subclass of the specialization. Consider the example of CRICKET PLAYER. Here CRICKET PLAYER is the supertype; the subtypes can be BOWLER and BATSMAN.

Same player can be both batsman and bowler. Hence overlap rule holds good in this example.

2.11. Give an example of supertype/subtype relationship where the disjoint rule applies?

Let us consider the example of CRICKET PLAYER again. Here the supertype is CRICKET PLAYER. The subtypes are BOWLER and WICKETKEEPER. We know that the same cricket player cannot be both bowler and wicket keeper; hence the disjoint rule applies to this example.


Chapter 3

Relational Model

Learning Objectives. This chapter is dedicated to the relational model, which has been in use since the late 1970s. Various operations in relational algebra and relational calculus are given in this chapter. After completing this chapter the reader should be familiar with the following concepts:

  • Evolution and importance of relational model
  • Terms in relational model like tuple, domain, cardinality, and degree of a relation
  • Operations in relational algebra and relational calculus
  • Relational algebra vs relational calculus
  • QBE and various operations in QBE

3.1 Introduction

E.F. Codd (Edgar Frank Codd) of IBM published the article "A Relational Model of Data for Large Shared Data Banks" in June 1970 in Communications of the ACM, the journal of the Association for Computing Machinery (ACM). His work triggered people to work on the relational model. One of the most significant implementations of the relational model was "System R," which was developed by IBM during the late 1970s. System R was intended as a "proof of concept" to show that relational database systems could really be built and work efficiently. It gave rise to major developments such as the structured query language SQL, which has since become an ISO standard and the de facto standard relational language. Various commercial relational DBMS products were developed during the 1980s, such as DB2, SQL/DS, and Oracle. In the relational data model the data are stored in the form of tables.

3.2 CODD’S Rules

In 1985, Codd published a list of rules that became a standard way of evaluating a relational system. After publishing the original article Codd stated that there are no systems that will satisfy every rule. Nevertheless the rules represent relational ideal and remain a goal for relational database designers.

Note: The rules are numbered from 1 to 12; in each rule, the statement following the colon is an interpretation of Codd's rule:

  1. The Information Rule. All information in a relational database is represented explicitly at the logical level and in exactly one way, by values in tables: Data should be presented to the user in tabular form.
  2. Guaranteed Access Rule. Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name: Every data element should be unambiguously accessible.
  3. Systematic Treatment of Null Values. Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
  4. Dynamic On-line Catalog Based on the Relational Model. The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data: The database description should be accessible to the users.
  5. Comprehensive Data Sublanguage Rule. A relational system may support several languages and various modes of terminal use (for example the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and whose ability to support all the following is comprehensive: data definition, view definition, data manipulation (interactive and by program), integrity constraints, and transaction boundaries: A database supports a clearly defined language to define the database, view the definition, manipulate the data, and restrict some data values to maintain integrity.
  6. View Updating Rule. All views that are theoretically updatable are also updatable by the system: Data should be able to be changed through any view available to the user.
  7. High-level Insert, Update, and Delete. The capacity of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data: All records in a file must be able to be added, deleted, or updated with singular commands
  8. Physical Data Independence. Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods: Changes in how data are stored or retrieved should not affect how a user accesses the data.
  9. Logical Data Independence. Application programs and terminal activities remain logically unimpaired whenever information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables: A user’s view of data should be unaffected by its actual form in files.
  10. Integrity Independence. Integrity constraints specific to a particular relational database must be definable in a relational data sublanguage and storable in the catalog, not in the application programs. Constraints on user input should exist to maintain data integrity.
  11. Distribution Independence. A relational DBMS has distribution independence. Distribution independence implies that users should not have to be aware of whether a database is distributed. A database design should allow for distribution of data over several computer sites.
  12. Nonsubversion Rule. If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple-records-at-a-time): Data fields that affect the organization of the database cannot be changed.

There is one more rule called Rule Zero, which states that "for any system that is claimed to be a relational database management system, that system must be able to manage databases entirely through its relational capabilities."

3.3 Relational Data Model

The relational model uses a collection of tables to represent both data and the relationships among those data. Tables are logical structures maintained by the database manager. The relational model is a combination of three components: the structural, integrity, and manipulative parts.

3.3.1 Structural Part

The structural part defines the database as a collection of relations.

3.3.2 Integrity Part

The database integrity is maintained in the relational model using primary and foreign keys.

3.3.3 Manipulative Part

The relational algebra and relational calculus are the tools used to manipulate data in the database. Thus relational model has a strong mathematical background. The key features of relational data model are as follows:

  • Each row in the table is called a tuple.
  • Each column in the table is called an attribute.
  • The intersection of a row with a column holds a data value.
  • In the relational model rows can be in any order.
  • In the relational model attributes can be in any order.
  • By definition, all rows in a relation are distinct. No two rows can be exactly the same.
  • Relations must have a key. A key can be a set of attributes.
  • For each column of a table there is a set of possible values called its domain. The domain contains all possible values that can appear under that column.
  • Domain is the set of valid values for an attribute.
  • Degree of the relation is the number of attributes (columns) in the relation.
  • Cardinality of the relation is the number of tuples (rows) in the relation.

The terms commonly used by user, model, and programmers are given later.

3.3.4 Table and Relation

A common doubt that arises when one reads about the relational model is the difference between a table and a relation. For a table to be a relation, the following rules hold good:

  • The intersection of a row with a column should contain a single (atomic) value.
  • All entries in a column are of the same type.
  • Each column has a unique name (column order is not significant).
  • No two rows are identical (row order is not significant).

Example of Relational Model

Representation of Movie data in tabular form is shown later.

In the earlier relation:

The degree of the relation (i.e., the number of columns in the relation) = 4.

The cardinality of the relation (i.e., the number of rows in the relation) = 3.

3.4 Concept of Key

Key is an attribute or group of attributes, which is used to identify a row in a relation. Key can be broadly classified into (1) Superkey (2) Candidate key, and (3) Primary key

3.4.1 Superkey

A superkey is a subset of attributes of an entity-set that uniquely identifies the entities. Superkeys represent a constraint that prevents two entities from ever having the same value for those attributes.

3.4.2 Candidate Key

Candidate key is a minimal superkey. A candidate key for a relation schema is a minimal set of attributes whose values uniquely identify tuples in the corresponding relation.

Primary Key

The primary key is a designated candidate key. It is to be noted that the primary key should not be null.

Example

Consider the EMPLOYEE relation, which is characterized by the attributes employee ID, employee name, employee age, employee experience, employee salary, etc. In this employee relation:

Superkeys include {employee ID}, {employee ID, employee name}, {employee ID, employee age}, and so on: any set of attributes that contains employee ID identifies a tuple uniquely.

The candidate key is employee ID, since it is a minimal superkey, and the primary key is the designated candidate key, here employee ID.

Note: If we declare a particular attribute as the primary key, then that attribute value cannot be NULL. Also it has to be distinct.

3.4.3 Foreign Key

A foreign key is a set of fields or attributes in one relation that is used to "refer" to a tuple in another relation.

3.5 Relational Integrity

Data integrity constraints refer to the accuracy and correctness of data in the database. Data integrity provides a mechanism to maintain data consistency for operations like INSERT, UPDATE, and DELETE. The different types of data integrity constraints are Entity, NULL, Domain, and Referential integrity.

3.5.1 Entity Integrity

Entity integrity implies that a primary key cannot accept null value. The primary key of the relation uniquely identifies a row in a relation. Entity integrity means that in order to represent an entity in the database it is necessary to have a complete identification of the entity’s key attributes.

Consider the entity PLAYER; the attributes of the entity PLAYER are Name, Age, Nation, and Rank. In this example, let us consider PLAYER’s name as the primary key even though two players can have same name. We cannot insert any data in the relation PLAYER without entering the name of the player. This implies that primary key cannot be null.

3.5.2 Null Integrity

Null implies that the data value is not known temporarily. Consider the relation PERSON. The attributes of the relation PERSON are name, age, and salary. The age of the person cannot be NULL.

3.5.3 Domain Integrity Constraint

Domains are used in the relational model to define the characteristics of the columns of a table. Domain refers to the set of all possible values that attribute can take. The domain specifies its own name, data type, and logical size. The logical size represents the size as perceived by the user, not how it is implemented internally. For example, for an integer, the logical size represents the number of digits used to display the integer, not the number of bytes used to store it. The domain integrity constraints are used to specify the valid values that a column defined over the domain can take. We can define the valid values by listing them as a set of values (such as an enumerated data type in a strongly typed programming language), a range of values, or an expression that accepts the valid values. Strictly speaking, only values from the same domain should ever be compared or be integrated through a union operator. The domain integrity constraint specifies that each attribute must have values derived from a valid range.

Example 1

The age of the person cannot have any letter from the alphabet. The age should be a numerical value.

Example 2

Consider the relation APPLICANT. Here APPLICANT refers to the person who is applying for job. The sex of the applicant should be either male (M) or female (F). Any entry other than M or F violates the domain constraint.

3.5.4 Referential Integrity

In the relational data model, associations between tables are defined through the use of foreign keys. The referential integrity rule states that a database must not contain any unmatched foreign key values. It is to be noted that referential integrity rule does not imply a foreign key cannot be null. There can be situations where a relationship does not exist for a particular instance, in which case the foreign key is null. A referential integrity is a rule that states that either each foreign key value must match a primary key value in another relation or the foreign key value must be null.

3.6 Relational Algebra

The relational algebra is a theoretical language with operations that work on one or more relations to define another relation without changing the original relation. Thus, both the operands and the results are relations; hence the output from one operation can become the input to another operation. This allows expressions to be nested in the relational algebra. This property is called closure. Relational algebra is an abstract language, which means that the queries formulated in relational algebra are not intended to be executed on a computer. Relational algebra consists of group of relational operators that can be used to manipulate relations to obtain a desired result. Knowledge about relational algebra allows us to understand query execution and optimization in relational database management system.

3.6.1 Role of Relational Algebra in DBMS

The role of relational algebra in DBMS is shown in Fig. 3.1. From the figure it is evident that when an SQL query has to be converted into executable code, it first has to be parsed into a valid relational algebraic expression; then there should be a proper query execution plan to speed up data retrieval. The query execution plan is given by the query optimizer.

3.7 Relational Algebra Operations

Operations in relational algebra can be broadly classified into set operation and database operations.

3.7.1 Unary and Binary Operations

Unary operation involves one operand, whereas binary operation involves two operands. The selection and projection are unary operations. Union, difference, Cartesian product, and Join operations are binary operations:

Three main database operations are SELECTION, PROJECTION, and JOIN.

Selection Operation

The selection operation works on a single relation R and defines a relation that contains only those tuples of R that satisfy the specified condition (Predicate). Selection operation can be considered as row wise filtering. This is pictorially represented in Fig. 3.2

Syntax of Selection Operation

The syntax of selection operation is: σPredicate (R). Here R refers to relation and predicate refers to condition.

Illustration of Selection Operation

To illustrate the SELECTION operation consider the STUDENT relation with the attributes Roll number, Name, and GPA (Grade Point Average).

Example

Consider the relation STUDENT shown later:

Query 1: List the Roll. No, Name, and GPA of those students who are having GPA of above 8.0

Query expressed in relational algebra as σGPA > 8 (Student).

The result of the earlier query is:

Query 2: Give the details of the first four students in the class. Relational algebra expression is σ Roll.No ≤ 4 (STUDENT).

Table as a result of query 2 is
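Selection can be imitated in ordinary Python by filtering a collection of tuples. The sketch below represents each tuple of STUDENT as a dictionary; the sample rows are assumptions, since the table itself appears only in a figure.

```python
# Each tuple of the STUDENT relation is represented as a dictionary.
students = [
    {"RollNo": 1, "Name": "Anand",  "GPA": 8.4},
    {"RollNo": 2, "Name": "Banu",   "GPA": 7.2},
    {"RollNo": 3, "Name": "Charan", "GPA": 9.1},
]

def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples satisfying the predicate."""
    return [row for row in relation if predicate(row)]

# Query 1: sigma_{GPA > 8}(STUDENT)
print(select(students, lambda row: row["GPA"] > 8))
```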

Projection Operation

The projection operation works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates. The projection operation can be considered as column-wise filtering. The projection operation is pictorially represented in Fig. 3.3.

Syntax of Projection Operation

The syntax of the projection operation is given by:

∏ a1, a2, ..., an (R)

where a1, a2, ..., an are attributes and R stands for a relation.

Illustration of Projection Operation

To illustrate projection operation consider the relation STAFF, with the attributes Staff number, Name, Gender, Date of birth, and Salary.

Query 1: Produce a list of salaries for all staff, showing only the Name and Salary details. Relational algebra expression: ∏ Name, Salary (STAFF).

Output for the Query 1

Query 2: Give the name and Date of birth of the all the staff in the STAFF relation.

Relational algebra expression for query 2: ∏ Name, Date of birth (STAFF).
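Projection can be sketched in the same style as selection; the STAFF rows below are assumptions used only to make the example runnable.

```python
staff = [
    {"StaffNo": 101, "Name": "Priya", "Gender": "F", "DOB": "1985-04-12", "Salary": 52000},
    {"StaffNo": 102, "Name": "Ravi",  "Gender": "M", "DOB": "1979-11-02", "Salary": 61000},
]

def project(relation, attributes):
    """pi_{attributes}(relation): keep the named columns and eliminate duplicates."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Query 1: pi_{Name, Salary}(STAFF)
print(project(staff, ["Name", "Salary"]))
```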

3.7.2 Rename operation (ρ)

The rename operator returns an existing relation under a new name. ρA(B) is the relation B with its name changed to A. The results of operation in the relational algebra do not have names. It is often useful to name such results for use in further expressions later on. The rename operator can be used to name the result of relational algebra operation.

Example of Rename Operation

Consider the relation BATSMAN with the attributes name, nation, and BA.

The attribute BA of the relation BATSMAN can be renamed as batting average using ρ name, nation, batting average (BATSMAN), so that the relation BATSMAN after the rename operation is as shown later.

From the earlier operation it is clear that rename operation changes the schema of the database and it does not change the instance of the database.

Union Compatibility

In order to perform the Union, Intersection, and Difference operations on two relations, the two relations should be union compatible. Two relations are union compatible if they have the same number of attributes and the corresponding attributes belong to the same domain. Mathematically, union compatibility is given as:

Let R(A1, A2, ..., An) and S(B1, B2, ..., Bn) be two relations. The relation R has the attributes A1, A2, ..., An and the relation S has the attributes B1, B2, ..., Bn. The two relations R and S are union compatible if dom(Ai) = dom(Bi) for i = 1 to n.

3.7.3 Union Operation

The union of two relations R and S defines a relation that contains all the tuples of R or S or both R and S, duplicate tuples being eliminated.

Relational Algebra Expression

The union of two relations R and S is denoted by R ∪ S. R ∪ S is pictorially represented in Fig. 3.4.

Illustration of UNION Operation

To illustrate the UNION operation consider the two relations Customer 1 and Customer 2 with the attributes Name and city.

Query: Determine Customer 1 ∪ Customer 2.

Result of Customer 1 ∪ Customer 2:

3.7.4 Intersection Operation

The intersection operation defines a relation consisting of the set of all tuples that are in both R and S.

Relational Algebra Expression

The intersection of two relations R and S is denoted by R ∩ S.

Illustration of Intersection Operation

The intersection between the two relations R and S is pictorially shown in Fig. 3.5.

Example

Find the intersection of Customer 1 with Customer 2 in the following table.

3.7.5 Difference Operation

The set difference operation defines a relation consisting of the tuples that are in relation R but not in S.

Relational Algebra Expression

The difference between two relations R and S is denoted by R–S.

Illustration of Difference Operation

The difference between two relations R and S is pictorially shown in Fig. 3.6.

Example

Compute R–S for the relation shown in the following table.
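Since union-compatible relations are simply sets of tuples over the same attributes, Python's built-in set type expresses union, intersection, and difference directly; the customer data below are assumptions.

```python
# Union-compatible relations: both have the attributes (Name, City),
# and each relation is modelled as a set of tuples.
customer1 = {("Anand", "Chennai"), ("Banu", "Madurai")}
customer2 = {("Banu", "Madurai"), ("Charan", "Coimbatore")}

print(customer1 | customer2)   # union: tuples in either relation, duplicates removed
print(customer1 & customer2)   # intersection: tuples present in both relations
print(customer1 - customer2)   # difference: tuples in customer1 but not in customer2
```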

3.7.6 Division Operation

The division of the relation R by the relation S is denoted by R ÷ S. Informally, R ÷ S contains those values, over the attributes of R that are not in S, that are paired in R with every tuple of S.

To illustrate division operations consider two relations STUDENT and MARK. The STUDENT relation has the attributes Student Name and the mark in particular subject say mathematics. The MARK relation consists of only one column mark and only one row.

Case (1)

If we divide the STUDENT relation by the MARK relation, the resultant relation is shown as:

Case (2)

Now modify the relation MARK that is change the mark to be 98. So that the entry in the MARK relation is modified as 98.

If we divide the relation STUDENT by MARK relation then the resultant relation is given by ANSWER

Case (3)

Now the MARK relation is modified in such a way that the entry in the MARK relation is 99. If we divide the STUDENT relation with the MARK relation, the result is NULL. Because there is no student in the STUDENT relation with the mark 99.

The division of the STUDENT relation with the MARK relation is given by the ANSWER relation.

The division operation extracts records and fields from one table on the basis of data in the second table.
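Because division is the least intuitive of the operators, a small sketch may help. It follows the informal definition given above; the relation contents are assumptions loosely modeled on the STUDENT and MARK example.

```python
def divide(r, s):
    """R / S for R(x, y) and S(y): return the x values that are paired
    in R with every y value appearing in S."""
    s_values = {row[0] for row in s}
    result = set()
    for x in {row[0] for row in r}:
        paired = {y for (rx, y) in r if rx == x}
        if s_values <= paired:          # x is paired with every value of S
            result.add(x)
    return result

# STUDENT(Student_Name, Mark) divided by MARK(Mark)
student = {("Anand", 97), ("Banu", 98), ("Charan", 97), ("Charan", 98)}
print(divide(student, {(97,)}))          # {'Anand', 'Charan'}
print(divide(student, {(97,), (98,)}))   # {'Charan'}
print(divide(student, {(99,)}))          # set() -- no student has the mark 99
```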

3.7.7 Cartesian Product Operation

The Cartesian product operation defines a relation that is the concatenation of every tuples of relation R with every tuples of relation S. The result of Cartesian product contains all attributes from both relations R and S.

Relational Algebra Symbol for Cartesian Product:

The Cartesian product between the two relations R and S is denoted by R × S.

Note: If there are n1 tuples in relation R and n2 tuples in S, then the number of tuples in R × S is n1*n2.

Example

If there are 5 tuples in relation "R" and 2 tuples in relation "S" then the number of tuples in R × S is 5 × 2 = 10.

Illustration of Cartesian Product

To illustrate Cartesian product operation, consider two relations R and S as given later:

Determine R × S:

Note:

No. of tuples in R × S = 2 × 3 = 6

No. of attributes in R × S = 2
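The same result can be reproduced with a short Python sketch; the tuple values are assumptions.

```python
from itertools import product

r = [("a1",), ("a2",)]             # 2 tuples, 1 attribute
s = [("b1",), ("b2",), ("b3",)]    # 3 tuples, 1 attribute

# R x S concatenates every tuple of R with every tuple of S.
cartesian = [rt + st for rt, st in product(r, s)]
print(cartesian)                   # 2 x 3 = 6 tuples, each with 2 attributes
```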

3.7.8 Join Operations

Join operation combines two relations to form a new relation. The tables should be joined based on a common column. The common column should be compatible in terms of domain.

Types of Join Operation

Natural Join

The natural join performs an equi join of the two relations R and S over all common attributes. One occurrence of each common attribute is eliminated from the result. In other words, a natural join will remove duplicate attributes. In most systems a natural join will require that the attributes have the same name to identify the attributes to be used in the join. This may require a renaming mechanism. Even if the attributes do not have the same name, we can perform the natural join provided that the attributes are of the same domain.

Example of Natural Join Operation

Consider two relations EMPLOYEE and DEPARTMENT. Let the common attribute to the two relations be DEPTNUMBER. The two relations are shown later:

It is worth noting that the natural join operation is associative, i.e., if R, S, and T are three relations then (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T).
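A minimal Python sketch of a natural join over a single common attribute, using the EMPLOYEE and DEPARTMENT relations mentioned above; the row values are assumptions.

```python
employee = [
    {"EmpName": "Kumar", "DEPTNUMBER": 10},
    {"EmpName": "Priya", "DEPTNUMBER": 20},
]
department = [
    {"DEPTNUMBER": 10, "DeptName": "Design"},
    {"DEPTNUMBER": 20, "DeptName": "Testing"},
]

def natural_join(r, s):
    """Join r and s on all of their common attributes,
    keeping a single occurrence of each common attribute."""
    common = set(r[0]) & set(s[0])
    return [{**a, **b}
            for a in r
            for b in s
            if all(a[c] == b[c] for c in common)]

print(natural_join(employee, department))
```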

Equi Join

A special case of condition joins where the condition C contains only equality.

Example of Equi Join

Given the two relations STAFF and DEPT, produce a list of staff and the departments they work in.

Theta Join

A conditional join in which we impose a condition other than equality; if an equality condition is imposed, the theta join becomes an equi join. The symbol θ stands for the comparison operator, which could be >, <, >=, or <=.

Expression of Theta Join

A theta join can be expressed in terms of selection and Cartesian product as R ⋈θ S = σθ(R × S).
Illustration of Theta Join

To illustrate theta join consider two relations FRIENDS and OTHERS with the attributes Name and age.

Outer Join

In an outer join, matched pairs are retained, and unmatched values from the other table are left null.

Types of Outer Join

The pictorial representation of the left and the right outer join of two relations R and S are shown in Fig. 3.7:

  1. Left Outer Join. Left outer join is a join in which tuples from R that do not have matching values in the common column of S are also included in the result relation.
  2. Right Outer Join. Right outer join is a join in which tuples from S that do not have matching values in the common column of R are also included in the result relation.
  3. Full Outer Join. Full outer join is a join in which tuples from R that do not have matching values in the common columns of S still appear and tuples in S that do not have matching values in the common columns of R still appear in the resulting relation.

Example of Full Outer Left Outer and Right Outer Join

Consider two relations PEOPLE and MENU; determine the full outer, left outer, and right outer joins.

  1. The left outer join of PEOPLE and MENU on Food is represented as PEOPLE ⟕ PEOPLE.Food = MENU.Food MENU. The result of the left outer join is shown in Table 3.1. From this table, it is to be noted that all the tuples from the left table (in our case the PEOPLE relation) appear in the result; if there is an unmatched value, a NULL value is returned.

  2. The right outer join of PEOPLE and MENU on Food is represented in the relational algebra as PEOPLE ⟖ PEOPLE.Food = MENU.Food MENU. The result of the right outer join is shown in Table 3.2.

From this table, it is clear that all tuples from the right-hand relation (in our case MENU) appear in the result.

  3. The full outer join of PEOPLE and MENU on Food is represented in the relational algebra as PEOPLE ⟗ PEOPLE.Food = MENU.Food MENU. The result of the full outer join is shown in Table 3.3. From this table, it is clear that tuples from both the PEOPLE and the MENU relations appear in the result.
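A minimal Python sketch of the left outer join convention described above, using assumed PEOPLE and MENU rows; unmatched attributes are filled with None, playing the role of NULL.

```python
people = [{"Name": "Arun",  "Food": "Dosa"},
          {"Name": "Meena", "Food": "Pasta"}]
menu   = [{"Food": "Dosa",  "Price": 40}]

def left_outer_join(left, right, column):
    """Keep every tuple of the left relation; attributes from the right
    relation that have no match are filled with None (NULL)."""
    right_cols = [c for c in right[0] if c != column]
    result = []
    for l in left:
        matches = [r for r in right if r[column] == l[column]]
        if matches:
            result.extend({**l, **m} for m in matches)
        else:
            result.append({**l, **{c: None for c in right_cols}})
    return result

print(left_outer_join(people, menu, "Food"))
# Meena's row appears with Price = None because "Pasta" is not on the MENU.
```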

Semi-Join

The semi-join of a relation R, defined over the set of attributes A, by relation S, defined over the set of attributes B, is the subset of the tuples of R that participate in the join of R with S. The advantage of semi-join is that it decreases the number of tuples that need to be handled to form the join. In centralized database system, this is important because it usually results in a decreased number of secondary storage accesses by making better use of the memory. It is even more important in distributed databases, since it usually reduces the amount of data that needs to be transmitted between sites in order to evaluate a query.

Expression for Semi-Join

The semi-join of R by S over their common attributes is written R ⋉ S.

Example of Semi-Join

In order to understand semi-join consider two relations EMPLOYEE and PAY

From the result of the semi-join it is clear that a semi-join is half of a join: the rows of one table that match with at least one row of another table. Only the rows of the first table appear in the result.
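A semi-join has no keyword of its own in SQL (covered in Chapter 4); it is usually written with EXISTS or IN. A sketch, assuming EMPLOYEE and PAY share a common column empid:

-- rows of EMPLOYEE that match at least one row of PAY;
-- only EMPLOYEE columns appear in the result
SELECT e.*
FROM employee e
WHERE EXISTS (SELECT 1 FROM pay p WHERE p.empid = e.empid);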

3.8 Advantages of Relational Algebra

The relational algebra has solid mathematical background. The mathematical background of relational algebra is the basis of many interesting developments and theorems. If we have two expressions for the same operation and if the expressions are proved to be equivalent, then a query optimizer can automatically substitute the more efficient form. Moreover, the relational algebra is a high level language which talks in terms of properties of sets of tuples and not in terms of for-loops.

3.9 Limitations of Relational Algebra

The relational algebra cannot do arithmetic. For example, computing the price of 10 litres of petrol after assuming a 10% increase in the petrol price cannot be expressed in relational algebra.

The relational algebra cannot sort or print results in various formats. For example, arranging product names in increasing order of their price cannot be done using relational algebra.

Relational algebra cannot perform aggregates. For example, finding how many staff are working in a particular department cannot be expressed in relational algebra.

The relational algebra cannot modify the database. For example, increasing the salary of all employees by 10% cannot be done using relational algebra.

The relational algebra cannot compute “transitive closure.” In order to understand the term transitive closure consider the relation RELATIONSHIP, which describes the relationship between persons.

Consider the query: find all direct and indirect relatives of Gopal. It is not possible to express such a query in relational algebra. Here transitive means that if person A is related to person B and person B is related to person C, then indirectly person A is related to person C. Relational algebra cannot express this transitive closure.

3.10 Relational Calculus

The purpose of relational calculus is to provide a formal basis for defining declarative query languages appropriate for relational databases. Relational Calculus comes in two flavors (1) Tuple Relational Calculus (TRC) and (2) Domain Relational Calculus (DRC). The basic difference between relational algebra and relational calculus is that the former gives the procedure of how to evaluate the query whereas the latter gives only the query without giving the procedure of how to evaluate the query:

  • – The variable in tuple relational calculus formulae range over tuples.
  • – The variable in domain relational calculus formulae range over individual values in the domains of the attributes of the relations.
  • – Relational calculus is nonoperational, and users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness.)

Relational Calculus and Relational Algebra:

The major difference between relational calculus and relational algebra is summarized later:

  • – A relational calculus query specifies what information is retrieved
  • – A relational algebra query specifies how information is retrieved

3.10.1 Tuple Relational Calculus

Tuple relational calculus is a logical language with variables ranging over tuples. The general form of a tuple relational calculus expression is:

{t | COND(t)}

Here t is the tuple variable, which stands for tuples of a relation, and COND(t) is a formula that describes t. The meaning of the above expression is to return all tuples t that satisfy the condition COND:

  • – {T | R(T)} means return all tuples T such that T is a tuple in relation R.
  • – For example, {T.name | FACULTY(T)} means return the names of all the faculty in the relation FACULTY.
  • – {T.name | FACULTY(T) AND T.deptid = ‘EEE’} means return the names of the faculty who are working in the EEE department.

Quantifiers

Quantifiers are words that refer to quantities such as “some” or “all” and tell for how many elements a given predicate is true. A predicate is a sentence that contains a finite number of variables and becomes a statement when specific values are substituted for the variables. Quantifiers can be broadly classified into two types (1) Universal Quantifier and (2) Existential Quantifier.

Existential Quantifier

The existential quantifier, written ∃, states that the predicate is true for at least one element of the domain considered.

Universal Quantifier

The universal quantifier, written ∀, states that the predicate is true for every element of the domain considered.

Free Variable

Any variable that is not bound by a quantifier is said to be free.

Bound Variable

Any variable which is bounded by universal or existential quantifier is called bound variable.

Example of selection operation in TRC: {T | FACULTY(T) AND T.deptid = ‘EEE’} selects the complete tuples of the faculty working in the EEE department.

Quantifier Example

Client(ID, fName, lName, Age)
Matches(Client1, Client2, Type)

– List the first and last names of clients that appear as client1 in a match of any type.
RAlg: π(fName, lName)(Client ⋈ID=Client1 Matches)
RCalc: {c.fName, c.lName | CLIENT(c) AND (∃m)(MATCHES(m) AND
c.ID = m.Client1)}
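The same request can also be written in SQL (introduced in Chapter 4); a sketch using the Client and Matches schemas given above:

-- first and last names of clients that appear as Client1 in a match of any type
SELECT DISTINCT c.fName, c.lName
FROM Client c, Matches m
WHERE c.ID = m.Client1;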

Joins in Relational Calculus

Consider the two relations Client and Matches as

Client(ID, fName, lName, Age)
Matches(Client1, Client2, Type)
– List all information about clients and the corresponding matches that appear as client1 in a match of any type.

The earlier query can be expressed both in Relational Algebra and Tuple relational Calculus as:

– RAlg: Client ⋈ID=Client1 Matches
RCalc:
{c, m | CLIENT(c) AND MATCHES(m) AND c.ID = m.Client1}

3.10.2 Set Operators in Relational Calculus

The set operations like Union, Intersection, difference, and Cartesian Product can be expressed in Tuple Relational Calculus as:

Union

– R1(A,B,C) ∪ R2(A, B, C)
– {r | R1(r) OR R2(r)}

Intersection

– R1(A,B,C) ∩ R2(A, B, C)
– {r | R1(r) AND R2(r)}

Cartesian Product

– R(A, B, C) × S(D, E, F)
– {r, s | R(r) AND S(s)} // same as join without the select condition

Subtraction
– R1(A,B,C) − R2(A, B, C)
– {r | R1(r) AND NOT R2(r)}
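These set operations correspond directly to SQL set operators (Chapter 4). A sketch, assuming R1 and R2 are type-compatible tables:

SELECT * FROM R1 UNION SELECT * FROM R2;      -- union
SELECT * FROM R1 INTERSECT SELECT * FROM R2;  -- intersection
SELECT * FROM R1 MINUS SELECT * FROM R2;      -- difference (MINUS in Oracle; EXCEPT in standard SQL)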

Queries and Tuple Relational Calculus Expressions

Some of the queries and the corresponding relational calculus and their explanations are given later. Here we have given set of queries like SET 1, SET 2, and SET 3.

  • – Query set 1 deals with Railway Reservation Management
  • – Query set 2 deals with Library Database Management
  • – Query set 3 deals with Hostel Database Management

Query Set1: Query set 1 deals with railway reservation system.

Query 1: Find all the train details for the trains where starting place is “Chennai.”

Relational calculus expression: {t | t ∈ train_details ∧ t[start_place] = “Chennai”}

Explanation: Set of all tuples “t” that belong to the relation “train details” and also the starting place is “Chennai” is found by the query.

Query 2: Find all train names whose destination is “Salem.”

Relational calculus expression

{t | ∃ s ∈ train_details (t[train_no] = s[train_no] ∧ s[destination] = “Salem”)}

Explanation: There exist a tuple “t” in the relation “r” such that the predicate is true.

The set of all tuples “t” such that, there exists a tuple “s” in relation train details for which the values of “t” and “s” for the train no attribute are equal and the value of “s” for the destination is “Salem.”

Query 3: Find the names of all passengers who have canceled the ticket and whose age is above 40.

Relational calculus expression {t | ∃ s ∈ cancel (t[train_no] = s[train_no] ∧ ∃ u ∈ passen_details (u[name] = s[name] ∧ u[age] > 40))}

Explanation: Set of all passenger names tuples for which the age is above 40 and the ticket is canceled. The tuple variable “s” ensures that the passenger canceled the ticket. The tuple “u” is restricted to having the same passenger name as “s.”

Query 4: List the train numbers of all trains which has no cancelation and only reservation.

Relational Calculus Expression

{t | ∃ s ∈ reserve (t[train_no] = s[train_no]) ∧ ¬∃ u ∈ cancel (t[train_no] = u[train_no])}

Explanation: Set of all tuples “t” such that there exists a tuple “s” in reserve for which the values of “t” and “s” for the train_no attribute are equal, and there does not exist a tuple “u” in cancel whose train_no value equals that of “t.”

Query 5: List all female passengers name who are traveling by the train “Blue Mountain.”

Relational Calculus Expression

{t | ∃ s ∈ passen_details (t[p_name] = s[p_name] ∧ s[sex] = “female” ∧ s[train_name] = “Blue mountain”)}.

Explanation: Set of all tuples “t” such that there exists a tuple “s” that belongs to passen_details for which the values of “t” and “s” for the p_name attribute is same and the sex attribute = “female” and train_name attribute = “Blue mountain.”

Query Set 2: Query set 2 deals with frequent queries in library database management.

Query 1: Find the acc_no of each book whose price is greater than 1000.

Relational Calculus Expression

{t | ∃ s ∈ book (t[acc_no] = s[acc_no] ∧ s[price] > 1000)}

Explanation: The set of all tuples “t” such that there exists a tuple “s” in relation book for which the values of “t” and “s” for the acc_no attribute are equal and the value of “s” for the price attribute is greater than 1000.

Query 2: Find the name of all the students who have borrowed a book and price of those book is greater than 1000.

Relational Calculus Expression

{t | ∃ s ∈ books_borrowed (t[std_name] = s[std_name] ∧ ∃ u ∈ book (u[acc_no] = s[acc_no] ∧ u[price] > 1000))}

Explanation: The set of all tuples “t” such that there exists a tuple “s” in relation books_borrowed for which the values “t” and “s” for the student name attribute are equal and “u” tuple variable on book relation for which “u” and “s” for the acc_no/- attribute are equal and the value of “u” for the price attribute is greater than 1000.

Query 3: Find the names of the students who have borrowed a book, have a book remaining in their account, or both.

Relational Calculus Expression

{t | ∃ s ∈ books_borrowed (t[std_name] = s[std_name]) ∨ ∃ u ∈ books_remaining (t[std_name] = u[std_name])}

Explanation: The set of all tuples “t” such that there exists a tuple “s” in relation books borrowed for which the values “t” and “s” for the student name attribute are equal and “u” tuple variable on books remaining relation for which “u” and “s” for the stud_name attribute are equal.

Query 4: Find the names of only those students who have both books remaining in their account and books borrowed.

Relational Calculus Expression

{t | ∃ s ∈ books_borrowed (t[std_name] = s[std_name]) ∧ ∃ u ∈ books_remaining (t[std_name] = u[std_name])}

Explanation: The set of all tuples “t” such that there exists a tuple “s” such that in relation books_borrowed for which the values “t” and “s” for the student name attribute are equal and “u” tuple variable on books_remaining relation for which “u” and “s” for the student name attribute are equal.

Query 5: Query that uses the implication symbol (p ⇒ q): find all students belonging to the EEE department who have borrowed books.

Relational Calculus Expression

{t | ∃ r ∈ books_borrowed (r[std_name] = t[std_name] ∧ ∃ u ∈ department (u[dept_name] = “EEE”))} = {t | ∃ r ∈ books_borrowed (r[std_name] = t[std_name] ∧ ∃ w ∈ student (w[roll_no] = r[roll_no] ∧ w[dept_name] = u[dept_name]))}

Explanation: The set of all tuples “t” such that there exists a tuple “s” such that in relation books_borrowed for which the values “t” and “s” for the student name attribute are equal and “u” tuple variable on department relation must be equal to “EEE.” And this must be equal to the set of all tuple “t” such that there exists a tuple “r” in relation books_borrowed for which the values “r” and “t” for the student name attribute are equal and “w” the variable on relation student for which “w” and “r” are equal for the roll_no/- attribute and “w” and “u” are equal for the dept_name.

Query Set 3: Query set 3 deals with hostel management.

Query 1: Find all the students id who are staying in hostel.

Tuple Relational Calculus Expression

{t | ∃ s ∈ student_detail (t[roll_no] = s[roll_no])}

Explanation: Here t is the set of tuples in the relation student_detail such that there exists a tuple s which consists of students ID who are staying in the hostel.

Query 2: Find all the details of the student who are belonging to EEE branch.

Tuple Relational Calculus Expression

{t | t ∈ student_detail ∧ t[course_name] = “EEE”}

Explanation: Here t is the set of tuples in the relation student_detail such that it consists of all the details of the student who are belonging to the “EEE” branch.

Query 3: Find all the third semester BE-EEE students.

Tuple Relational Calculus Expression

{t | t ∈ student_detail ∧ t[course_name] = “EEE” ∧ t[semester] = 3}

Explanation: Here t is the set of tuples in the relation student_detail such that it consists of all the details of the student who belongs to the third semester BE-EEE branch.

Query 4: Find all the lecturers name belonging to the EEE department.

Tuple Relational Calculus Expression

{t | ∃ s ∈ staff_detail (t[staffname] = s[staffname])}

Explanation: Here t is the set of tuples in the relation staff detail and there exists a tuple s which consists of lecturers name who belongs to the “EEE” department.

Query 5: Find all the staff who are having leisure period at third hour on Monday.

Tuple Relational Calculus Expression

{t | ∃ s ∈ staff_detail (t[staffname] = s[staffname] ∧ ∃ u ∈ lecturerschedule_monday (s[staffid] = u[staffid] ∧ u[third_hour] = “free”))}

Explanation: Here t is the set of tuples in the relation staff detail and there exists a tuple s which consists of staff name who are all having leisure period at third hour on Monday for every week.

Safety of Expression

It is possible to write tuple calculus expressions that generate infinite relations. For example {t | ¬(t ∈ R)} results in an infinite relation if the domain of any attribute of relation R is infinite. To guard against the problem, we restrict the set of allowable expressions to safe expressions. An expression {t | P(t)} in the tuple relational calculus is safe if every component of t appears in one of the relations, tuples, or constants that appear in P (here P refers to the predicate or condition).

Limitations of TRC

TRC cannot express queries involving:

  • – Aggregations.
  • – Groupings.
  • – Orderings.

3.11 Domain Relational Calculus (DRC)

Domain relational calculus is a nonprocedural query language equivalent in power to tuple relational calculus. In domain relational calculus each query is an expression of the form:

{<x1, x2, ..., xn> | COND(x1, x2, ..., xn)}

where x1, x2, ..., xn are domain variables and COND is a condition (formula) over these variables.

Domain variable: A domain variable is a variable whose value is drawn from the domain of an attribute.

3.11.1 Queries in Domain Relational Calculus:

Consider the ER diagram:

3.11.2 Queries and Domain Relational Calculus Expressions

Some of the queries and the corresponding relational calculus and their explanations are given later. Here we have given set of queries like SET 1, SET 2, and SET 3:

  • – Query set 1 deals with Railway Reservation Management
  • – Query set 2 deals with Library Database Management
  • – Query set 3 deals with Department Database Management

Query Set 1: Query set 1 deals with railway reservation system.

Query 1: List the details of the passengers traveling by the train “Intercity express.”

Domain Relational Calculus Expression

{<name, age, sex, train_no, train_name> | <name, age, sex, train_no, train_name> ∈ passen_details ∧ train_name = “Intercity express”}

Explanation: The attributes of the passen_details are listed where the train_name attribute = “Intercity express.”

Query 2: Select names of passengers whose sex = “female” and age > 20.

Domain Relational Calculus Expression

{<p_name> | ∃ p_age, p_sex, p_trainno (<p_name, p_age, p_sex, p_trainno> ∈ passen_details ∧ p_sex = “female” ∧ p_age > 20)}

Explanation: Lists the names of passengers from the relation passenger_details where there are two constraints which are sex=female and age > 20.

Query 3: Find all the names of passengers who have “Salem” as start place and find their train names.

Domain Relational Calculus Expression

{<p_name, t_name> | ∃ p_age, p_sex, p_trainno (<p_name, p_age, p_sex, p_trainno, t_name> ∈ passen_details ∧ ∃ t_no, t_start, t_dest, t_route (<t_name, t_no, t_start, t_dest, t_route> ∈ train_details ∧ t_start = “Salem”))}

Explanation: Two relations – passen_details and train_details are involved in this query. The train names and the passenger names whose start place = Salem is displayed.

Query 4: Find all train names which has reservation and no cancelation.

Domain Relational Calculus Expression

{<t_name> | ∃ t_no, p_name, p_source, p_dest (<t_name, t_no, p_name, p_source, p_dest> ∈ reserve ∧ ¬∃ tick_no, s_no (<t_name, t_no, tick_no, p_name, s_no> ∈ cancel))}

Explanation: The reserve and cancel relations are involved here. The train names which satisfies both the conditions are displayed.

Query 5: Find names of all trains whose destination is “CHENNAI” and source is “COIMBATORE.”

Domain Relational Calculus Expression

{<t_name> | ∃ t_no, t_start, t_dest, t_route (<t_name, t_no, t_start, t_dest, t_route> ∈ train_details ∧ t_start = “Coimbatore” ∧ t_dest = “Chennai”)}

Explanation: The name of the trains that start from Coimbatore and reach Chennai are listed from the relations train details.

Query Set 2:

Query set 2 deals with Library Management.

Query 1: Find the student name, roll_no. for those belongs to “EEE” department.

Domain Relational Calculus Expression

{<std_name, roll_no> | ∃ dept_name (<std_name, roll_no, dept_name> ∈ student ∧ dept_name = “EEE”)}

Explanation: Student relation is involved in this. Std_name, roll_no are the attribute belongs to the student relation whose department name is “EEE.”

Query 2: Find the acc_no, books_cal_no, and author name for the books of price >120.

Domain Relational Calculus Expression

{<acc_no, call_no, author_name> | ∃ book_name, price (<book_name, acc_no, call_no, author_name, price> ∈ books ∧ price > 120)}

Explanation: Books relation is involved here. In this expression acc_no, book_call_no, and author name are selected for the book for which the price is greater than 120.

Query 3: Find the roll_no of all the students who have borrowed a book from the library, along with the number of books they borrowed, where the books belong to the “EEE” department.

Domain Relational Calculus Expression

{<roll_no> | ∃ std_name, book_acc_no, no_of_books_borrowed (<std_name, roll_no, book_acc_no, no_of_books_borrowed> ∈ books_borrowed ∧ ∃ name, dept_name (<name, roll_no, dept_name> ∈ student ∧ dept_name = “EEE”))}

Explanation: Here two relations are involved (1) books_borrowed and (2) student. The roll_no/- of the students who borrowed “EEE” department book involves both the earlier relations. Roll_no/- are selected from the both the relation of the student who borrowed book from library which belongs to “EEE” department.

Query 4: Find the std_name and dept_name of the students who have borrowed fewer than two books.

Domain Relational Calculus Expression

{<dept_name, name> | ∃ roll_no, book_acc_no, no_of_books_borrowed (<roll_no, book_acc_no, no_of_books_borrowed, std_name> ∈ books_borrowed ∧ no_of_books_borrowed < 2 ∧ ∃ roll_no (<roll_no, name, dept_name> ∈ student))}

Explanation: Here two relations are involved (1) books_borrowed and (2) student. For student name the relation involved is books_borrowed and for depart_name the relation involved is student and the constraint is no/- of books_borrowed is less than two.

Query 5: Find the names of all the students in the EEE department who have borrowed books, have books remaining in their account, or both.

Domain Relational Calculus Expression

{<name> | ∃ roll_no, book_acc_no, no_of_books_borrowed (<name, roll_no, book_acc_no, no_of_books_borrowed> ∈ books_borrowed ∧ ∃ dept_name (<name, roll_no, dept_name> ∈ student ∧ dept_name = “EEE”)) ∨ ∃ roll_no, no_of_books_remaining (<name, roll_no, no_of_books_remaining> ∈ books_remaining ∧ ∃ dept_name (<name, roll_no, dept_name> ∈ student ∧ dept_name = “EEE”))}

Explanation: Here three relations are involved (1) books remaining, (2) books_borrowed, and (3) student. Name is an attribute belonging to books_borrowed and books_remaining relations, dept_name belongs to student relation. The student borrowed books or having books in his account or both which belongs to “EEE” department is selected.

Query Set 3: Query set 3 deals with the Department Database Management system.

Query 1: Find all the student names belonging to the fifth semester of the ECE branch.

Domain Relational Calculus Expression

{<sn> | ∃ r, cn, s, h, dob, pn, b (<sn, r, cn, s, h, dob, pn, b> ∈ student_detail ∧ s = “V” ∧ b = “ECE”)}

Explanation: Students name domain is formed from relation V semester “ECE” branch.

Domain variables used:

r – roll no.; cn – course name; s – semester; h – hosteller

dob – date of birth; pn – phone no.; b – branch name

Query 2: Find all the details of students belonging to CSE branch.

Domain Relational Calculus Expression

{<sn, r, cn, s, h, dob, pn, b> | <sn, r, cn, s, h, dob, pn, b> ∈ student_detail ∧ b = “CSE”}

Explanation: All domain variables included from student-detail table which consists of all details about students belonging to the CSE branch.

Query 3: Find all the students id whose date of birth is above 1985.

Domain Relational Calculus Expression

{<r> | ∃ sn, cn, s, h, dob, pn, b (<r, sn, cn, b, s, h, dob, pn> ∈ student_detail ∧ dob > “1985”)}

Explanation: Domain variable r (roll no) is included from student detail relation, which consists of students ID whose date of birth is above 1985.

Query 4: Find all the lecturers id belonging to production dept.

Domain Relational Calculus Expression

{<sid> | ∃ sn, dob, desg, y, foi, e, d (<sid, sn, dob, desg, y, foi, e, d> ∈ staff_detail ∧ d = “prod”)}

Explanation: Domain variables from staff_detail:

sid – staff ID; dob – date of birth; sn – staff name; desg – designation

y – year since serving; foi – field of interest; e – email id; d – department

The sid (staff id) from staff detail belonging to production department.

Query 5: Find all the lecturers’ names who are having fifth period as leisure period on Friday.

Domain Relational Calculus Expression

{<sn> | ∃ sid, dob, desg, y, foi, e, d (<sn, sid, dob, desg, y, foi, e, d> ∈ staff_detail ∧ ∃ i, ii, iii, iv, v, vi, vii (<sid, i, ii, iii, iv, v, vi, vii> ∈ lecture_schedule_friday ∧ v = “free”))}

Explanation: Staff name domain variable from staff detail relation with fifth period as leisure which is checked using lecture schedule relation on Friday. Thus, in this, we have used two relations: staff detail and lecture schedule for Friday.

3.12 QBE

QBE stands for Query By Example. QBE uses a terminal display with attribute names as table headings for queries. This looks a little strange in textbooks, but people like it when they have worked with it for a while on a terminal screen. It is very easy to list the entire schema, simply by scrolling information on the screen. QBE was developed originally by IBM in the 1970s to help users in their retrieval of data from a database. QBE represents a visual approach for accessing data in a database through the use of query templates. QBE can be considered as GUI (Graphical User Interface) based on domain calculus. QBE allows users to key in their input requests by filling in empty tables on the screen, and the system will also display its response in tabular form. QBE is user-friendly because the users are not required to formulate sentences for query requests with rigid query-language syntax. In QBE the request is entered in the form of tables whose skeletons are initially constructed by QBE.

Some of the QBE query template examples:

Example 1. Projection operation

In this template P. means “Print.” The meaning is: print the PLAYER_ADDRESS of the players who belong to the country INDIA. To make a projection, put P. in the column(s) to be projected. QBE will enforce uniqueness of projections automatically.

Example 2. Selection operation


To make a selection, put quantifiers in the columns of the attributes in the question. To print a whole record, put P. in the column with the name of the record.

The meaning is to print the PLAYER_ADDRESS who belong to the country INDIA.


Example 3. AND condition


To understand the AND condition consider the following template.

The meaning of the earlier template is: Print the PLAYER_ADDRESS who live in INDIA and belong to the city CHENNAI.

Example 4. OR condition


To understand the OR condition consider the following template:

The meaning of the earlier template is “Print the name of the Player who belongs to the country INDIA and city either CHENNAI or DELHI”.

Example 5. Query involving more than one table

Let us consider a query which involves data from more than one table. Let us consider two tables PLAYER_ADDRESS and PLAYER_RANK. Here we have two tables PLAYER ADDRESS and PLAYER RANK, the template meaning is: Print the name of the player who belong to the country INDIA and rank less than 50. The clue for understanding the query is the fact the variable NAME is the same in all rows of the display.

Example 6. Comparison operation

Consider the EMPLOYEE table with the columns EMPLOYEE_ID, EMPLOYEE_NAME, SALARY, and MANAGER_ID. If one wants to know the name of the employees who make more money than their managers, it can be shown in QBE as:

Example 7. Ordering of records

The records can be arranged either in the ascending order or in the descending order using the operator AO. and DO., respectively.

AO. implies arrange the records in ascending order.

DO. implies arrange the records in descending order.

AO.ALL. implies arrange the records in ascending order by preserving duplicate records.

DO.ALL. implies arrange the records in descending order by preserving duplicate records.

Both the AO. and DO. operators automatically eliminate duplicate responses. However, if one wishes to have all duplicate records displayed, an ALL. operator must be added.

Consider the relation VEGETABLE which has three attributes VEGETABLENAME, QUANTITY, and PRICE.

The QBE template to print the VEGETABLE in the increasing order of price is given later:

The QBE template to print the VEGETABLE in the decreasing order of price is given later:

Example 8. Retrieval using Negation


The symbol used for negation is ¬. For example, the template to print the quantity and price of the vegetables other than Brinjal is given by:

Condition Box:
The condition box is used to store logical conditions that are not easily expressed in the table skeleton. A condition box can be obtained by pressing a special function key.

Example 9. Retrieval using condition box:


For example, if we want to print the quantity and price of the VEGETABLE, which is either Ladies Finger or Carrot, the condition box is used.

CONDITIONS
VN=Ladies Finger OR Carrot


Example 10. QBE Built-In Functions

QBE provides MIN, MAX, CNT, SUM, and AVG built-in functions:

MIN.ALL implies the computation of minimum value of an attribute.
MAX.ALL implies the computation of maximum value of an attribute.
CNT.ALL implies COUNT the number of tuples in the relation.
SUM.ALL implies the computation of sum of an attribute.
AVG.ALL implies the computation of average value of an attribute.


Note: UNQ. which stands for unique operator is used to eliminate duplicates. For example, CNT.UNQ.ALL computes the number of tuples in the relation by eliminating duplicate values.


Example 10.1. MIN and MAX command


The QBE template to get the minimum and maximum vegetable price is given later:

Example 10.2. AVG command


The QBE template to get the average price of the vegetable is given later.

Example 10.3. CNT command


The QBE template to count the number of unique vegetables in the VEGETABLE relation is shown later.

Example 11. Update operation


The QBE template to increase the price of all vegetables by 10% is given as:


Here U. implies Update. The price UX of the vegetable is increased by 10% which is denoted by 1.1 * UX

Example 12. Record deletion
The QBE template to delete the record of all vegetables is shown later:

Here D. implies deletion of the entire relation.
Single Record Deletion
The QBE form to delete the record of the vegetable “Brinjal” is shown later:

Summary
In relational model, the data are stored in the form of tables or relations. Each table or relation has a unique name. Tables consist of a given number of columns or attributes. Every column of a table must have a name and no two columns of the same table may have identical names. The rows of the table are called tuples. The total number of columns or attributes that comprises a table is known as the degree of the table. The chapter has introduced the basic terminology used in relational model. Specific importance is given to E.F. Codd’s rule.
This chapter also introduced different integrity rules. Relational algebra concepts, different operators like SELECTION, PROJECTION, UNION, INTERSECTION, and JOIN operators were discussed with suitable examples. Relational calculus and its two branches, tuple relational calculus and domain relational calculus, were discussed in this chapter.
Finally, graphical user interface QBE, its relative advantage, different operations in QBE, concept of condition box in QBE, and aggregate functions in QBE were explained with suitable examples.

Review Questions


3.1. What is the degree and cardinality of the “Tennis Player” relation shown later:

3.2. A relation has a degree of 5 and cardinality of 7. How many attributes and tuples does the relation have?


3.3. A relation R has a degree of 3 and cardinality of 2 and the relation S has a degree of 2 and cardinality of 3, then what will be the degree and cardinality of the Cartesian product of R and S?


Ans: Cardinality = 6, Degree = 5.


3.4. What is the key of the following EMPLOYEE table?

Ans: In the earlier table, EMPLOYEE NUMBER is the primary key, because a key uniquely identifies each row and ensures that no two rows are identical.


3.5. Define the operators in the core relational algebra?


3.6. Explain the following concepts in relational databases:


(a) Entity integrity constraint
(b) Foreign key and how it can specify a referential integrity constraint between two relations
(c) Semantic integrity constraint

3.7. Mention the pros and cons of relational data model?

Pros of relational data model:

  1. The relational data model is a well-formed and data-independent model which is easy to use for applications that fit well into the model.
  2. The data used by most business applications fit this model; the fact that business applications were the first large customers of database systems explains the popularity of the model.

Cons of relational data model:

  1. The simplicity of the model restricts the amount of semantics, which can be expressed directly by the database.
  2. Different groups of information, or tables, must be joined in many cases to retrieve data.

3.8. Bring out the reasons, why relational model became more popular?

  1. Relational model was based on strong mathematical background.
  2. Relational model used the power of mathematical abstraction. Operations do not require user to know storage structures used.
  3. Strong mathematical theory provides tool for improving design.
  4. Basic structure of the relation is simple, easy to understand and implement.

3.9. A union, intersection or difference can only be performed between two relations if they are type compatible. What is meant by type compatibility? Give an example of two type compatible and two nontype compatible relations?

Two relations are type compatible if they have the same set of attributes, defined over the same domains. An example of two type-compatible relations is:

Men {<name:varchar>, <dob:date>, <address:varchar>}
Women {<name:varchar>, <dob:date>, <address:varchar>}

An example of two relations which are not type compatible is:

Husband {<name:varchar>, <dob:date>, <salary: number>}
Wife {<name:varchar>, <dob:date>, <address:varchar>}

3.10. What are the advantages of QBE?

QBE can be considered as GUI (Graphical User Interface) based on domain calculus. QBE allows users to key in their input requests by filling in empty tables on the screen, and the system will also display its response in tabular form. QBE is user-friendly because the users are not required to formulate sentences for query requests with rigid query-language syntax.

3.11. What do you understand by domain integrity constraint?

The domain integrity constraints are used to specify the valid values that a column defined over the domain can take. We can define the valid values by listing them as a set of values (such as an enumerated data type in a strongly typed programming language), a range of values, or an expression that accepts the valid values.

3.12. What do you understand by “safety of expressions”?

It is possible to write tuple calculus expressions that generate infinite relations. For example {t | ¬(t ∈ R)} results in an infinite relation if the domain of any attribute of relation R is infinite. To guard against the problem, we restrict the set of allowable expressions to safe expressions.

3.13. What are “quantifiers”? How will you classify them?

Quantifiers are words that refer to quantities such as “some” or “all” and tell for how many elements a given predicate is true. A predicate is a sentence that contains a finite number of variables and becomes a statement when specific values are substituted for the variables. Quantifiers can be broadly classified into two types (1) Universal Quantifier and (2) Existential Quantifier.

Chapter 4

Structured Query Language

Learning Objectives. This chapter focuses on how to access the data within a DBMS. An introduction to SQL, an international standard language for manipulating relational database is given in this chapter. After completing this chapter the reader should be familiar with the following concepts in SQL.

  • – Evolution and benefits of SQL
  • – Datatypes in SQL
  • – SQL commands to create a table, inserting records into the table, and extracting information from the table
  • – Aggregate functions, GROUP BY clause
  • – Implementation of constraints in SQL using CHECK, PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE commands
  • – Concepts of sub query, view, and trigger

4.1 Introduction

SQL stands for “Structured Query Language.” The Structured Query Language is a relational database language. By itself, SQL does not make a DBMS. SQL is a medium which is used to communicate to the DBMS. SQL commands consist of English-like statements which are used to query, insert, update, and delete data. English-like statements mean that SQL commands resemble English language sentences in their construction and use and therefore are easy to learn and understand.

SQL is referred to as nonprocedural database language. Here nonprocedural means that, when we want to retrieve data from the database it is enough to tell SQL what data to be retrieved, rather than how to retrieve it. The DBMS will take care of locating the information in the database.

Commercial database management systems allow SQL to be used in two distinct ways. First, SQL commands can be typed at the command line directly. The DBMS interprets and processes the SQL commands immediately, and the results are displayed. This method of SQL processing is called interactive SQL. The second method is called programmatic SQL. Here, SQL statements are embedded in a host language such as COBOL, FORTRAN, C, etc. SQL needs a host language because SQL is not a really complete computer programming language as such because it has no statements or constructs that allow branch or loop. The host language provides the necessary looping and branching structures and the interface with the user, while SQL provides the statements to communicate with the DBMS.

Some of the features of SQL are:

  • – SQL is a language used to interact with the database.
  • – SQL is a data access language.
  • – SQL is based on relational tuple calculus.
  • – SQL is a standard relational database management language.
  • – The first commercial DBMS that supported SQL was Oracle in 1979.
  • – SQL is a “nonprocedural” or “declarative” language.

4.2 History of SQL Standard

The origin of the SQL language dates back to a research project conducted by IBM at their research laboratories in San Jose, California in the early 1970s. The aim of the project was to develop an experimental RDBMS which would eventually lead to a marketable product. At that time, there was a lot of interest in the relational model for databases at the academic level, in conferences and seminars. IBM, which already had a large share of the commercial database market with hierarchical and network model DBMSs, realized that the relational model would dominate future database products. The project at IBM’s San Jose labs was started in 1974 and was named System R. A language called SEQUEL (Structured English QUEry Language) was chosen as the relational database language for System R. A version of SEQUEL was developed at the IBM San Jose research facilities and tested with college students.

In November 1976, specifications for SEQUEL2 were published. In 1980 minor revisions were made to SEQUEL, and it was renamed “SQL” because the name SEQUEL had already been used for a hardware product; the rename avoided confusion and legal problems. In the first phase of the System R project, researchers concentrated on developing a basic version of the RDBMS. The main aim at this stage was to verify that the theories of the relational model could be translated into a working, commercially viable product. This first phase was successfully completed by the end of 1975, and resulted in a single-user DBMS based on the relational model. The System R project was completed in 1979. The theoretical work of the System R project resulted in the development and release of IBM’s first commercial relational database management system in 1981. The product was called SQL/DS (Structured Query Language/Data Store) and ran under the DOS/VSE operating system environment. Two years later, IBM announced a version of SQL/DS for the VM/CMS operating system.

In 1983, IBM released a second SQL-based RDBMS called DB2, which ran under the MVS operating system. DB2 quickly gained widespread popularity and even today, versions of DB2 form the basis of many database systems found in large corporate data-centers. During the development of System R and SQL/DS, other companies were also at work creating their own relational database management systems. Some of them, Oracle being an example, even implemented SQL as the relational database language for their DBMSs concurrently with IBM. Later on, SQL language was standardized by ANSI and ISO. The ANSI SQL standards were first published in 1986 and updated in 1989, 1992, and 1999.

4.2.1 Benefits of Standardized Relational Language

The main advantages of standardized language are given below.

  1. Reduced training cost
  2. Enhanced productivity
  3. Application portability
    Application portability means applications can be moved from machine to machine when each machine uses SQL.
  4. Application longevity
    A standard language tends to remain so for a long time, hence there will be little pressure to rewrite old applications.
  5. Reduced dependence on a single vendor

SQL language development is given in a nutshell below:

  1. In 1970 E.F. Codd of IBM released a paper “A relational model of data for large shared data banks.” IBM started the project System R to demonstrate the feasibility of implementing the relational model in a database management system. The language used in system R project was SEQUEL. SEQUEL was renamed SQL during the project, which took place from 1974 to 1979.
  2. The first commercial RDBMS from IBM was SQL/DS. It was available in 1981.
  3. Oracle, from Relational Software (now Oracle Corporation), was on the market before SQL/DS, i.e., in 1979.
  4. Other products included INGRES from Relational Technology, Sybase from Sybase, Inc. (1986), and DG/SQL from Data General Corporation (1984).

4.3 Commands in SQL

SQL commands can be classified into three types:

  1. Data Definition Language commands (DDL)
  2. Data Manipulation Language commands (DML)
  3. Data Control Language commands (DCL)

DDL

DDL commands are used to define a database, including creating, altering, and dropping tables and establishing constraints.

DML

DML commands are used to maintain and query a database, including updating, inserting, modifying, and querying data.

DCL

DCL commands are used to control a database including administering privileges and saving of data. DCL commands are used to determine whether a user is allowed to carry out a particular operation or not. The ANSI standard groups these commands as being part of the DDL.

The classification of commands in SQL is shown below.

4.4 Datatypes in SQL

In relational model the data are stored in the form of tables. A table is composed of rows and columns. When we create a table we must specify a datatype for each of its columns. These datatypes define the domain of values that each column can take. Oracle provides a number of built-in datatypes as well as several categories for user-defined types that can be used as datatypes. Some of the built-in datatypes are string datatype to store characters, number datatype to store numerical value, and date and time datatype to store when the event happened (history, date of birth, etc.).

STRING

In string we have CHAR and VARCHAR datatypes. Character datatype store data which are words and free-form text, in the database character set.

CHAR Datatype

The CHAR datatype specifies a fixed-length character string. The syntax of CHAR datatype declaration is:

CHAR (n) – Fixed length character data, “n” characters long.

Here “n” specifies the character length. If we insert a value that is shorter than the column length, then Oracle blank-pads the value to the column length. If we try to insert a value that is too long for the column, then Oracle returns an error message.

VARCHAR2 Datatype

The VARCHAR2 datatype specifies a variable-length character string. The syntax of VARCHAR2 datatype declaration is:

VARCHAR2 (n) – Variable length character of “n” length.

Here “n” specifies the character length.

VARCHAR vs. VARCHAR2

The VARCHAR datatype behaves like VARCHAR2 datatype in the current version of Oracle.

In order to justify the above statement, let us create a table CHAMPION, which refers to Wimbledon Champions. The attributes of the table CHAMPION are Name, Nation, Year (the year in which the sportsman has won the title). For our example, let us use the datatype VARCHAR for the attribute Name and VARCHAR2 for the datatype Nation. The SQL command to create CHAMPION is shown in Fig. 4.1.
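Since only the figure carries the command, a minimal sketch of what Fig. 4.1 might contain is given below; the column lengths and the NUMBER datatype for Year are assumptions:

CREATE TABLE champion
(name   VARCHAR(12),   -- declared VARCHAR, stored internally as VARCHAR2
 nation VARCHAR2(12),
 year   NUMBER(4));    -- year in which the title was won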

Now let us try to see the description of the table. The description of the table is shown in Fig. 4.2.

From Fig. 4.2, it is clear that both name and nation are stored as VARCHAR2(12). This means that VARCHAR datatype in the Oracle 8i version behaves the same as VARCHAR2.

NUMBER Datatype

The NUMBER datatype stores zero, positive, and negative fixed and floating point numbers.

The syntax to store fixed-point number is NUMBER (p, q) where “p” is the total number of digits and “q” is the number of digits to the right of decimal point.

The syntax to specify an integer is NUMBER (p).

DATE Datatype

The DATE datatype is used to store the date and time information. For each DATE value, Oracle stores the century, year, month, date, hour, minute, and second information. The ANSI date literal contains no time portion, and must be specified in YYYY-MM-DD format where Y stands for year, M for month, and D for day.

TIME STAMP Datatype

The TIME STAMP datatype is used to store both date and time. It stores the year, month, and day of the DATE datatype, and also hour, minute, and second values.

LOB Datatype

Multimedia data like sound, picture, and video need more storage space. The LOB datatypes such as BLOB, CLOB, and BFILE allow us to store large blocks of data.

BLOB Datatype

The BLOB datatype stores unstructured binary data in the database. BLOBs can store up to 4GB of binary data.

CLOB Datatype

The CLOB datatype can store up to 4GB of character data in the database.

BFILE Datatype

The BFILE datatype stores unstructured binary data in operating system files outside the database. A BFILE can store up to 4GB of data.

4.5 Data Definition Language (DDL)

The Data Definition Language is

  • – Used to define schemas, relations, and other database structures
  • – Also used to update these structures as the database evolves

Examples of Structure Created by DDL

The different structures that are created by DDL are Tables, Views, Sequences, Triggers, Indexes, etc.

  1. Tables
    The main features of table are:
    • – It is a relation that is used to store records of related data. It is a logical structure maintained by the database manager.
    • – It is made up of columns and rows.
    • – At the intersection of every column and row there is a specific data item called a value.
    • – A base table is created with the CREATE TABLE statement and is used to hold persistent user data.
  2. Views
    The basic concepts of VIEW are:
    • – It is a stored SQL query used as a “Virtual table.”
    • – It provides an alternative way of looking at the data in one or more tables.
    • – It is a named specification of a result table. The specification is a SELECT statement that is executed whenever the view is referenced in an SQL statement. Consider a view to have columns and rows just like a base table. For retrieval, all views can be used just like base tables.
    • – When the column of a view is directly derived from the column of a base table, that column inherits any constraints that apply to the column of the base table. For example, if a view includes a foreign key of its base table, INSERT and UPDATE operations using that view are subject to the same referential constraints as the base table. Also, if the base table of a view is a parent table, DELETE and UPDATE operations using that view are subject to the same rule as DELETE and UPDATE operations on the base table.
  3. Sequences
    • – A sequence is an integer that varies by a given constant value. Typically used for unique ID assignment
  4. Triggers
    • – Trigger automatically executes certain commands when given conditions are met.
  5. Indexes
    • – Indexes are basically used for performance tuning. Indexes play a crucial role in fast data retrieval.

Create Table Command

  • – The CREATE TABLE command is used to implement the schemas of individual relations.

Steps in Table Creation

  1. Identify datatypes for attributes
  2. Identify columns that can and cannot be null
  3. Identify columns that must be unique
  4. Identify primary key–foreign key mates
  5. Determine default values
  6. Identify constraints on columns (domain specifications)
  7. Create the table

Syntax

CREATE TABLE table name

(column-name1 data-type-1 [constraint],
column-name2 data-type-2 [constraint],
column-nameN data-type-N [constraint]
);

Example Table

See Table 4.1.

Syntax to Create the Table

The general syntax to create the table is given below. Here the key words are shown in bold and capital letters.

CREATE TABLE table name
(column name1 data type (size),
column name2 data type (size),
column name N data type (size));

Example

The SQL command to define Table 4.1 is shown in Fig. 4.3. In this example the name of the table is peaks. The table has five columns which are serial number, name of the mountain (peak), height, place where the mountain is situated, range of the mountain.
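A sketch of the command in Fig. 4.3 is given below; the column names, datatypes, and sizes are assumptions based on the description of the table:

CREATE TABLE peaks
(serialno NUMBER(2),        -- serial number
 peak     VARCHAR2(20),     -- name of the mountain
 height   NUMBER(5),        -- height of the peak
 place    VARCHAR2(20),     -- place where the mountain is situated
 mrange   VARCHAR2(20));    -- range of the mountain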

To see the description of the table

To see the description of the table we have created we have the command DESC. Here DESC stands for description of the table. The syntax of DESC command is:

Syntax: DESC table name;

The DESC command returns the attributes (columns) of the table, the datatype associated with the column, and also any constraint (if any) imposed on the column. Figure 4.4 shows the description of the table PEAKS.

To insert values into the table

Syntax: Insert into <tablename> values (‘&columnname1’, ‘&columnname2’, &col3,. . . );

(e.g.) The SQL syntax and the corresponding output are shown in Fig. 4.5. Now to insert the next set of values, use the slash as shown in Fig. 4.6.
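A sketch of such an insert, written with literal values instead of the &-substitution variables (the row shown is illustrative, not the data of Fig. 4.5):

INSERT INTO peaks VALUES (1, 'Everest', 8848, 'Nepal', 'Himalayas');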

To view the entire table

The SQL syntax to see all the columns of the table is:

SELECT * FROM table name;

Here the asterisk symbol indicates the selection of all the columns of the table.

Example

The SQL command to see all the columns of the table PEAKS and the corresponding output are shown in Fig. 4.7.

SQL> select * from peaks;

4.6 Selection Operation

Selection operation can be considered as row wise filtering. We can select specific row(s) using condition.

Syntax of SELECTION Operation

SELECT * FROM table name
WHERE condition;

Example of SELECTION operation

In the example Table 4.1, there are three rows. Let us filter two rows so that only one row will appear in the result. Here the condition used to filter the rows is the “height” of the PEAKS. The SQL command to implement the SELECTION operation and the corresponding output are shown in Fig. 4.8.
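A sketch of the kind of command shown in Fig. 4.8, assuming the filtering condition is a height threshold that only one of the three rows satisfies:

SELECT * FROM peaks
WHERE height > 8000;   -- row-wise filtering: only rows meeting the condition appear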

From Fig. 4.8 it is clear that even though there are three rows in the Table 4.1, it is reduced to one using the condition the height of the peaks. This operation which filters the rows of the relation is called SELECTION.

4.7 Projection Operation

The projection operation performs column wise filtering. Specific columns are selected in projection operation.

Syntax of PROJECTION Operation

SELECT column name1, column name2, Column name N FROM table name;

If all the columns of the table are selected, then it cannot be considered as PROJECTION.

The SQL command to perform PROJECTION operation on the relation PEAKS and the corresponding results are shown in Fig. 4.9.

From Fig. 4.9, it is clear that only three columns are selected in the result, even though there are five columns in the Table 4.1.
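A sketch of such a projection, assuming the three columns kept in Fig. 4.9 are peak, height, and place:

SELECT peak, height, place
FROM peaks;            -- column-wise filtering: only three of the five columns appear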

SELECTION and PROJECTION Operation

We can perform both the selection and projection operations on a relation. If we combine selection and projection, we are naturally restricting both the number of rows and the columns of the relation.

Syntax for Selection and Projection

SELECT column name1, column name 2. . . . . column name N
FROM table name
WHERE condition;

The selection and projection operation applied to the peaks relation is shown in Fig. 4.10.

From Fig. 4.10, we can observe that the selection operation is based on the “place” of the peaks. As a result only one row is obtained as the result. Because of projection operation only three columns are obtained in the result as shown in Fig. 4.10.
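A sketch of the combined operation, assuming the condition in Fig. 4.10 is on the place column (the value used is illustrative):

SELECT peak, height, place
FROM peaks
WHERE place = 'Nepal'; -- restricts the rows by place and the columns by the select list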

4.8 Aggregate Functions

SQL provides seven built-in functions to facilitate query processing. The seven built-in functions are COUNT, MAX, MIN, SUM, AVG, STDDEV, and VARIANCE. The uses of the built-in functions are shown in Table 4.2.

4.8.1 COUNT Function

The COUNT built-in function returns the number of rows of the table. There are variations of the COUNT function. First let us consider the COUNT (*) function. In order to understand the COUNT (*) function, consider the relation PERSON SKILL shown in Table 4.3. The relation has only two columns, the name of the person and the skill associated with the person. It is to be noted that some persons may have more than one skill and some persons may not have any skill.

From Table 4.3, we can observe that the table PERSON SKILL has six rows and two columns and the person Ashok has more than one skill and Sam has no skill hence a NULL is inserted against Sam.

(A) COUNT (*) Function

The syntax of Count (*) function is:

SELECT COUNT (*)
FROM table name;

Now let us try to view the table PERSON, and the contents of the table PERSON as shown in Fig. 4.11. From this figure, it is clear that the number of rows of the table is six.

Now let us use the COUNT (*) function to view the number of rows of the relation PERSON. The SQL command and the corresponding output are shown in Fig. 4.12.

From Fig. 4.12, we can observe that the number of rows returned is six, which means that the COUNT(*) function takes into account the NULL values.
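A minimal sketch of the command behind Fig. 4.12, assuming the relation is stored as a table named person:

SELECT COUNT(*)
FROM person;   -- returns 6: COUNT(*) counts every row, including the row whose skill is NULL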

(B) COUNT (attribute name) Function

A slight variation of COUNT (*) function is COUNT (attribute name) function. The syntax of this function is given by:

SELECT COUNT (attribute name)
FROM table name;

The application of COUNT (attribute name) to the PERSON table and the corresponding output are shown in Fig. 4.13.

From Fig. 4.13, the number of rows returned is six because the name column contains no NULL values. Note that, unlike COUNT (*), the COUNT (attribute name) function does not count rows in which that attribute is NULL.

(C) COUNT (DISTINCT attribute name)

The COUNT (DISTINCT attribute name) command returns the number of rows of the relation, by eliminating duplicate values. The syntax of COUNT (DISTINCT attribute name) is:

SELECT COUNT (DISTINCT attribute name)
FROM table name;

The usage of COUNT (DISTINCT attribute name) in the table PERSON and the corresponding output is shown in Fig. 4.14.

It is worthwhile to note that the DISTINCT command will not take the NULL value into consideration. In order to prove this, let us select the attribute skill rather than the attribute name. The result of choosing the attribute skill is shown in Fig. 4.15.

4.8.2 MAX, MIN, and AVG Aggregate Function

In order to understand MAX, MIN, and AVG aggregate function consider the relation CONSUMER PRODUCTS. The relation CONSUMER PRODUCTS has two attributes, the name of the product and the price associated with the product as shown in Table 4.4.

(A) MAX Command

The MAX command stands for maximum value. The MAX command returns the maximum value of an attribute. The syntax of MAX command is:

SELECT MAX (attribute name)
FROM table name;

Let us apply the MAX command to Table 4.4 to get the maximum price of the product, the SQL command and the corresponding output are shown in Fig. 4.16.

Let us try to find the name of the product which has maximum price by using PROJECTION operation and the IN operator as shown in Fig. 4.17.
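A sketch of the two commands, assuming the relation is stored as consumerproduct with columns product_name and price (the names are assumptions):

-- maximum price (Fig. 4.16)
SELECT MAX(price) FROM consumerproduct;
-- name of the product having that maximum price (Fig. 4.17)
SELECT product_name
FROM consumerproduct
WHERE price IN (SELECT MAX(price) FROM consumerproduct);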

(B) MIN Command

The MIN command is used to return the minimum value of an attribute. The syntax of MIN command is same as MAX command.

Syntax of MIN Command is

SELECT MIN (attribute name)
FROM table name;

The use of MIN command and the corresponding result are shown in Fig. 4.18.

From Table 4.4 the minimum price of the product is 3,500, which is returned as the result.

To know the name of the product which has minimum price, we can use IN operator as shown in Fig. 4.19.

From Fig. 4.19, it is clear that we can use IN operator along with PROJECTION operation to get the name of the product with minimum price.

(C) AVG Command

The AVG command is used to get the average value of an attribute. The syntax of AVG command is:

SELECT AVG (attribute name)
FROM table name;

Let us apply the AVG command to Table 4.4 to get the average price of the products. The result of applying the AVG command is shown in Fig. 4.20. The average price of the products is (15,000 + 10,000 + 17,000 + 3,500)/4, which is 11,375 as shown in Fig. 4.20.

(D) STDDEV Function

The STDDEV function is used to compute the standard deviation of the attribute values. The syntax of the standard deviation function is:

SELECT STDDEV (attribute name)
FROM table name;

The STDDEV function applied to the relation CONSUMERPRODUCT (Table 4.4) is shown in Fig. 4.21.

(E) VARIANCE Function

The variance function is used to get the variance of the attribute values. The syntax of VARIANCE function is:

SELECT VARIANCE (attribute name)
FROM table name;

Let us apply the VARIANCE to the consumer product table; the result is shown in Fig. 4.22. We know that the variance is the square of the standard deviation. We have obtained the standard deviation from Fig. 4.21 as 6019.0669; the square of this value is approximately 36229167 which is obtained in Fig. 4.22.

(F) GROUP BY Function

The GROUP BY clause is used to group rows to compute group-statistics. It is to be noted that when the GROUP BY clause is present, then the SELECT clause may include only the columns that appear in the GROUP BY clause and aggregate functions.

In order to understand the GROUP BY function, let us consider the table PLACEMENT, shown in Table 4.5, which records the number of students placed in different companies. The table PLACEMENT consists of three attributes (columns): company name, department name, which refers to the curriculum stream, and strength, which refers to the number of students placed.

Now we want to know the total number of students placed in each branch. For this we can use the GROUP BY command. The syntax of GROUP BY command is:

SELECT attribute name, aggregate function
FROM table name
GROUP BY attribute name;

It is to be noted that the attribute name after the SELECT keyword should match the attribute name after the GROUP BY keyword. The GROUP BY command used to find the total number of students placed in each branch is shown in Fig. 4.23.
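As a sketch of such a statement (the column names company, dept, and strength are assumptions, since Fig. 4.23 is not reproduced here), the query might look like:

SELECT dept, SUM(strength)
FROM PLACEMENT
GROUP BY dept;
-- one output row per department, with the total strength of that department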

(G) HAVING Command

The HAVING command is used to select the group. In other words HAVING restricts the groups according to a specified condition. The syntax of HAVING command is:

SELECT attribute name, aggregate function
FROM table name
GROUP BY attribute name
HAVING condition;

Let us use the HAVING command as shown in Fig. 4.24 to find the details of the department in which more than 90 students got placement.

From Fig. 4.24, we are able to get the details of the department where more than 90 students were placed.
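A sketch of the kind of statement shown in Fig. 4.24, under the same assumed column names as above, is:

SELECT dept, SUM(strength)
FROM PLACEMENT
GROUP BY dept
HAVING SUM(strength) > 90;
-- HAVING filters whole groups; only departments with more than 90 placements remain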

(H) SORTING of Results

The SQL command ORDER BY is used to sort the result in ascending or descending order.

The table used to understand ORDER BY command is BESTCRICKETER. The table BESTCRICKETER as shown in Table 4.6 gives the details of best batsman of the world. The attributes of the BESTCRICKETER are the name of the batsman, the country they belong to, and the number of centuries they scored.

Case 1: The syntax of ORDER BY command to arrange the result in ascending order is:

SELECT *
FROM table name
ORDER BY attribute name ASC;

Here ASC stands for ascending order.

Let us apply the command to the Table 4.6, the result of using ORDER BY command and the corresponding results are shown in Fig. 4.25.

Case 2: The syntax to arrange the result in descending order is:

SELECT *
FROM table name
ORDER BY attribute name DESC;

Here DESC stands for descending order.

Let us apply this DESC keyword to arrange the centuries in descending order. The SQL command and the corresponding output are shown in Fig. 4.26.

Case 3: If we do not specify ASC or DESC after the ORDER BY keyword, then by default the results will be arranged in ascending order.

From Fig. 4.27, it is evident that if neither ASC nor DESC is specified, then by default the results are displayed in ascending order.

(I) Range Queries Using Between

SQL has a built-in operator BETWEEN which is used to perform range queries.

Let us try to find the details of the batsmen who have scored more than 20 and fewer than 30 centuries. The SQL command to accomplish this task and the corresponding output are shown in Fig. 4.28.
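A sketch of such a range query, assuming the column name centuries in the table BESTCRICKETER (Fig. 4.28 itself is not reproduced here), is shown below; note that BETWEEN includes both end points.

SELECT *
FROM BESTCRICKETER
WHERE centuries BETWEEN 20 AND 30;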

4.9 Data Manipulation Language

The data manipulation language is used to add, update, and delete data in the database. The SQL command INSERT is used to add data into the database, the SQL command UPDATE is used to modify the data in the database, and the SQL command DELETE is used to delete data in the database. Here the term database refers to the table.

4.9.1 Adding a New Row to the Table

The INSERT command is used to add a new row to the table. The syntax of the INSERT command is:

INSERT INTO table name
VALUES ('&column1-name', '&column2-name', . . . , &columnN-name);

It is to be noted that apostrophes are not required for numeric datatypes. Let us try to insert a new row into Table 4.6 (which already has six rows) to include the little master Sachin Tendulkar. The SQL command and the corresponding output are shown in Fig. 4.29.
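A minimal sketch of such an insertion (the value 34 for the number of centuries is purely illustrative, since the actual figure is not reproduced here) is:

INSERT INTO BESTCRICKETER
VALUES ('Sachin Tendulkar', 'India', 34);
-- character values are quoted; the numeric centuries value is not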

To verify whether the new row has been added to the Table 4.6 which had six rows before inserting the new row, let us issue SELECT command as shown in Fig. 4.30.

From Fig. 4.30, it is clear that the record of the little master Sachin Tendulkar has been added to the BESTCRICKETER table, so that the total number of rows is now seven.

4.9.2 Updating the Data in the Table

The data in the table can be updated by using UPDATE command. The syntax of the UPDATE command is:

UPDATE table name
SET attribute name = new value
WHERE condition;

Let us apply this UPDATE command to the table BESTCRICKETER. The motive is to modify the number of centuries hit by Sachin Tendulkar to 35. The corresponding SQL command and the output are shown in Fig. 4.31.
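A sketch of the update shown in Fig. 4.31, assuming the column names name and centuries, is:

UPDATE BESTCRICKETER
SET centuries = 35
WHERE name = 'Sachin Tendulkar';
-- only the row(s) satisfying the WHERE condition are modified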

To see whether the table has been updated or not use SELECT statement to view the content of the table BESTCRICKETER. The updated table is shown in Fig. 4.32.

4.9.3 Deleting Row from the Table

The DELETE command in SQL is used to delete row(s) from the table. The syntax of DELETE command is

DELETE FROM table name
WHERE condition;

Let us delete the record of a particular player (say Gooch) from the table BESTCRICKETER. The SQL command to delete a particular row and the corresponding output are shown in Fig. 4.33.
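A sketch of the deletion shown in Fig. 4.33, assuming the column name name, is:

DELETE FROM BESTCRICKETER
WHERE name = 'Gooch';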

To verify whether the player Gooch record has been deleted, let us use SELECT command to view the content of the table as shown in Fig. 4.34. From this figure it is evident that the player Gooch record has been successfully deleted.

4.10 Table Modification Commands

We can use the ALTER command to alter the structure of the table, that is, we can add a new column to the table. It is also possible to delete a column from the table using the DROP COLUMN command.

4.10.1 Adding a Column to the Table

We can add a column to the table by using ADD command. The syntax to add a new column to the table is:

ALTER TABLE table name
ADD column name datatype;

Example to Add a New Column

Let us consider the Table 4.6 BESTCRICKETER, which has three columns: the name of the player, the country the player belongs to, and centuries, which refers to the number of centuries scored by the player. Now let us try to add one more column to the table BESTCRICKETER. The new column to be added is age, which refers to the player's age. The SQL command to add the new column age and the corresponding output are shown in Fig. 4.35.
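A minimal sketch of the command in Fig. 4.35 is given below; the datatype NUMBER(3) is an assumption consistent with the width of three mentioned later in this section.

ALTER TABLE BESTCRICKETER
ADD age NUMBER(3);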

To see the description of the table after adding the new column age to the table bestcricketer, let us use DESC command as shown in Fig. 4.36.

From Fig. 4.36 we can observe that a new column age of datatype number has been added to the table bestcricketer.

After successfully adding the column age, we will be interested to know the content of the table, to see whether any value is assigned to the column age. Figure 4.37 shows the content of the table after adding the new column.

From Fig. 4.37, it is clear that when a column is added to a table that already contains rows, the new column age is initially NULL for all the rows.

To Insert Values into the New Column

Data can be inserted to the newly added column (in our example it is age) by using UPDATE command.

For example, we want to insert the age of Sachin Tendulkar to be 33. This is done using UPDATE command as shown in Fig. 4.38.

To verify whether the age of sachin tendulkar has been added as 33, see Fig. 4.39.

4.10.2 Modifying the Column of the Table

We can modify the width of the datatype of a column by using the ALTER TABLE and MODIFY commands. The syntax to change the datatype of the column is:

ALTER TABLE table name
MODIFY column-name datatype;

Example to Modify the Width of the Datatype of the Column

For example, we want to modify the width of the datatype of the column age from three (as shown in Fig. 4.36) to four. The SQL command and the corresponding output are shown in Fig. 4.40.

From Fig. 4.40 we can observe that the width of the datatype of the column age has been modified to four, whereas it was three earlier, as shown in Fig. 4.36.

4.10.3 Deleting the Column of the Table

The DROP COLUMN command can be used along with the ALTER table command to delete the column of the table. The syntax to delete the column from the table is:

ALTER TABLE table name
DROP COLUMN column name;

Example

Let us try to delete the column age from the table BESTCRICKETER by using the DROP COLUMN command. The command to drop the column and the corresponding output are shown in Fig. 4.41.

After dropping the column age, the description of the table will be as shown in Fig. 4.42.

From Fig. 4.42, it is evident that the column age is not included in the table description.

The content of the table after dropping the column age is shown in Fig. 4.43.

4.11 Table Truncation

The TRUNCATE TABLE command removes all the rows from the table. The TRUNCATE TABLE command also releases the storage space used by the table. The syntax of the TRUNCATE command is:

TRUNCATE TABLE table name;

Example

Let us try to delete all the rows of the table bestcricketer by issuing TRUNCATE TABLE command. The SQL command and the corresponding output are shown in Fig. 4.44.

After table truncation, if we try to select the rows, what will be the output? To answer this question, let us try to see the content of the table by using SELECT command as shown in Fig. 4.45.

From Fig. 4.45, it is clear that all the rows have been deleted by issuing the TRUNCATE TABLE command. After the TRUNCATE TABLE command, let us try to see the description of the table by issuing the DESC command, as shown in Fig. 4.46.

From Fig. 4.46, it is clear that the TRUNCATE TABLE command deletes the content (all rows) of the table but not the table definition.

Note Another way to delete all the rows of the table is to use DELETE command. The syntax is:

DELETE FROM table name;

4.11.1 Dropping a Table

The definition of the table as well as the contents of the table are deleted by issuing the DROP TABLE command. The syntax of the DROP TABLE command is:

DROP TABLE table name;

Example

Let us issue the DROP TABLE command to the table BESTCRICKETER as shown in Fig. 4.47.

After issuing the DROP TABLE command if we try to see the description of the table, we will get the result as shown in Fig. 4.48.

From Fig. 4.48 it is clear that DROP TABLE command deletes both the content and the descriptions of the table.

4.12 Imposition of Constraints

Constraints are basically used to impose rules on the table whenever a row is inserted, updated, or deleted from the table. Constraints prevent the deletion of a table if there are dependencies. The different types of constraints that can be imposed on the table are NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK.

Whenever an attribute is declared as NOT NULL then it specifies that the attribute cannot contain a NULL value.

The UNIQUE constraint specifies that whenever an attribute or set of attributes are specified as UNIQUE, then the values of the attribute should be unique for all the rows of the table. For example, consider the Roll number of the student in the class, every student should have UNIQUE roll number.

PRIMARY KEY constraint is used to identify each row of the table uniquely.

The FOREIGN KEY constraint specifies that the value of an attribute in one table must match a value of the referenced primary key attribute in another table.

CHECK constraint defines a condition that each row must satisfy. Also there is no limit to the number of CHECK constraints that can be imposed on a column.

4.12.1 NOT NULL Constraint

If we are particular that a column should not take a NULL value, then we can impose the NOT NULL constraint on that column. The syntax of the NOT NULL constraint is:

CREATE TABLE table name
(column name1 data-type of column1 NOT NULL,
column name2 data-type of column2,
column nameN data-type of columnN);

The above syntax indicates that column1 is declared as NOT NULL.

Example

Consider the relation PERSON1, which has the attributes name, salary, and phone number of the person. Let us try to declare the column name as NOT NULL. This implies that every person should have a name. The syntax to declare the column name as NOT NULL is shown in Fig. 4.49.
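Since Fig. 4.49 is not reproduced here, a minimal sketch of such a table creation is given below; the datatypes and widths are assumptions.

CREATE TABLE PERSON1
(name VARCHAR2(20) NOT NULL,
 salary NUMBER(10),
 phone VARCHAR2(15));
-- only the column name carries the NOT NULL constraint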

From Fig. 4.49, it is clear that the attribute name is declared as NOT NULL. Now let us try to insert NOT NULL values and NULL value to the attribute name.

Case 1: Inserting a NOT NULL value to the attribute name.

From Fig. 4.50, it is clear that when we try to insert a NOT NULL name into the name attribute, the name is included in the relation PERSON1.

Case 2: A NULL value to the attribute name.

From Fig. 4.51, it is clear that when we try to insert a NULL value into the PERSON1 relation, we get the error message as shown in Fig. 4.51 since the attribute name is declared as NOT NULL.

4.12.2 UNIQUE Constraint

The UNIQUE constraint imposes that every value in a column or set of columns be unique. It means that no two rows of a table can have duplicate values in a specified column or set of columns.

Example

In order to understand unique constraint, let us create the table CELLPHONE, which has three attributes. The three attributes are model of the cellphone, make which refers to manufacturer, and the price.

The relation CELLPHONE is created as shown in Fig. 4.52 with unique constraint on model. When a unique constraint is imposed on the attribute model, then no two models should have same number.
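A sketch of the table creation in Fig. 4.52 is given below; the datatypes are assumptions.

CREATE TABLE CELLPHONE
(model NUMBER(6) UNIQUE,
 make VARCHAR2(20),
 price NUMBER(8));
-- the UNIQUE constraint prevents two rows from having the same model number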

The values are inserted into the table CELLPHONE. The resulting tables after inserting the values are shown in Fig. 4.53.

From Fig. 4.53, we can observe that the table CELLPHONE has three rows.

Case 1: Now let us try to insert a row into the relation CELLPHONE by violating the UNIQUE constraint, i.e., we are trying to insert a row with model number 1100, which already exists. The insertion and the corresponding result are shown in Fig. 4.54. From this figure, we can observe that there is an error message "unique constraint (SCOTT.SYS_C00820) violated." The reason for this error message is that we tried to enter a model (1100) which already exists in the CELLPHONE relation, as shown in Fig. 4.53.

Case 2: Insertion of NULL Value to the Model Attribute. Let us try to insert a null value to the attribute model. The SQL command to insert a null value to the attribute model and the corresponding result are shown in Fig. 4.55.

Difference Between NOT NULL and UNIQUE Constraint

The unique constraint accepts NULL value as shown in Fig. 4.55, whereas the NOT NULL constraint will not accept NULL values.

Note The NOT NULL constraint accepts duplicate values but does not accept NULL values, whereas the UNIQUE constraint does not accept duplicate values but does accept NULL values. In other words, an attribute with a UNIQUE constraint can still take NULL values, whereas an attribute with a NOT NULL constraint cannot.

4.12.3 Primary Key Constraint

When an attribute or set of attributes is declared as the primary key, the attribute will not accept NULL values; moreover, it will not accept duplicate values. It is to be noted that "only one primary key can be defined for each table."

Example

Consider the relation EMPLOYEE with the attributes ID which refers to Employee identity, NAME of the employee, and SALARY of the employee. Each employee will have unique ID hence ID is declared as the primary key as shown in Fig. 4.56.
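A sketch of the table creation in Fig. 4.56 is given below; the datatypes are assumptions (a character ID such as e101 is used later in this section).

CREATE TABLE EMPLOYEE
(id VARCHAR2(6) PRIMARY KEY,
 name VARCHAR2(20),
 salary NUMBER(10));
-- the primary key column id rejects both NULL and duplicate values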

From Fig. 4.56, it is clear that the attribute employee ID is declared as the primary key.

Case 1: Insertion of NULL Value to the Primary Key Attribute.

It is to be noted that the primary key will not take any NULL value. This is called entity integrity. Now let us try to insert a NULL value into the employee ID. The SQL syntax and the corresponding output are shown in Fig. 4.57. From Fig. 4.57, it is evident that an attribute or set of attributes declared as primary key will not accept NULL values.

Case 2: Insertion of Duplicate Values into an Attribute Declared as Primary Key.

When an attribute is declared as primary key, all the values of the attribute should be UNIQUE. The primary key attribute will not accept duplicate values.

Let us try to insert duplicate values to the attribute employee ID which is declared as primary key. The SQL command and the corresponding output are shown in Fig. 4.58.

We got an error message in Fig. 4.58, because we have tried to insert the employee ID e101 twice. From this we can understand that when an attribute is declared as primary key, the values of the attribute should be UNIQUE.

Difference Between UNIQUE and NOT NULL Constraints

The difference between the UNIQUE and NOT NULL constraints is given in tabular form as

Difference Between UNIQUE and PRIMARY KEY Constraint

The difference between UNIQUE and PRIMARY KEY is given in tabular form as

4.12.4 CHECK Constraint

The CHECK constraint is added to the declaration of an attribute. The condition may use the name of the attribute being declared, but any other relation or attribute name must appear in a subquery. An attribute-based CHECK constraint is checked only when the value of the attribute is inserted or updated.

Syntax of CHECK Constraint

In order to understand check constraint, consider the relation VOTERS. In India, only those who have completed the age of 19 are eligible to vote. Let us impose this constraint on age in our relation VOTERS. The VOTERS relation has the attributes name, which refers to the name of the voter, age of the voter, address of the voter.

The creation of the table VOTERS with CHECK constraint imposed on age is shown in Fig. 4.59.
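A minimal sketch of such a table creation is given below; the datatypes and the exact bound in the condition are assumptions, since Fig. 4.59 is not reproduced here.

CREATE TABLE VOTERS
(name VARCHAR2(20),
 age NUMBER(3) CHECK (age >= 19),
 address VARCHAR2(40));
-- the CHECK condition is evaluated whenever age is inserted or updated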

From Fig. 4.59, we can observe that CHECK constraint is imposed on the attribute age.

Case 1: Insertion of Data Without Violating the Constraint.

Let us try to insert the values into the table VOTERS without violating the constraint, that is the age of the voter is greater than 19. The SQL syntax and the corresponding output are shown in Fig. 4.60. From this figure, it is evident that the data are successfully inserted into the table VOTERS because the age of the voter is greater than 19.

Case 2: Insertion of Data into the Table VOTERS by Violating the CHECK Constraint.

Now let us try to insert data into the table VOTERS by violating the CHECK constraint, that is inserting the record of the voter with age less than 19. The SQL command to insert the data and the corresponding output are shown in Fig. 4.61.

From Fig. 4.61, we can observe that when we try to insert a value which violates the CHECK constraint, we get an error message.

Case 3: CHECK Constraint During Updation of Record.

The content of the VOTER table is given in Fig. 4.62.

For simplicity, there is only one record in the VOTERS table. Now let us try to update the record by changing the age of the voter to less than 19, as shown in Fig. 4.63.

From Fig. 4.63, we can observe that it is not possible to update the record by violating the CHECK constraint.

4.12.5 Referential Integrity Constraint

According to the referential integrity constraint, when a foreign key in one relation references the primary key in another relation, the foreign key value must match the primary key value. In other words, referential integrity says that the "pointed to" information must exist.

Example

In order to understand the referential integrity constraint, consider two relations, DEPARTMENT and EMPLOYEE. Here the DEPARTMENT relation forms the parent table; that is, the DEPARTMENT table contains the primary key. The relation EMPLOYEE forms the child table; that is, the relation EMPLOYEE has a foreign key which references the primary key in the DEPARTMENT table. Figure 4.64 shows the parent–child relationship.

In our example, the relation DEPARTMENT is the parent table, which holds the primary key, and the relation EMPLOYEE forms the child table, which has a foreign key that references the primary key in the DEPARTMENT table. It is to be noted that the parent table should be created first, then the child table.

The SQL syntax to create the two relations DEPARTMENT and EMPLOYEE with primary key and foreign key constraints is shown in Fig. 4.65 and Fig. 4.66, respectively.
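Since Figs. 4.65 and 4.66 are not reproduced here, a minimal sketch of the two table creations is given below; apart from DeptID and DID, which are mentioned in the text, the column names and datatypes are assumptions.

CREATE TABLE DEPARTMENT
(deptid VARCHAR2(5) PRIMARY KEY,
 deptname VARCHAR2(20));

CREATE TABLE EMPLOYEE
(empid VARCHAR2(5) PRIMARY KEY,
 empname VARCHAR2(20),
 did VARCHAR2(5) REFERENCES DEPARTMENT (deptid));
-- did in EMPLOYEE is the foreign key referencing deptid in DEPARTMENT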

Case 1: Now let us try to insert a value into DepartmentID of the employee table which is not in department table. The department relation has only three department IDs D100, D101, D102. Now we are trying to insert D103 in the DID (which stands for department ID) of employee table. The SQL command and the corresponding output are shown in Fig. 4.67.

From Fig. 4.67, it is evident that the values could not be inserted into the employee table. The reason is that we have tried to insert a DID (department ID) into the EMPLOYEE table (child table) which does not match any DeptID (department ID) in the DEPARTMENT table (parent table). In other words, the foreign key value in the child table does not match any primary key value in the parent relation.

The referential integrity rule says that the foreign key value should match with the primary key value.

Case 2: NULL Value into Foreign Key Attribute.

Now let us try to insert a null value into the foreign key attribute. The SQL command and the corresponding output are shown in Fig. 4.68.

From Fig. 4.68, it is evident that a NULL value cannot be inserted into the foreign key attribute, since it does not match any primary key value.

4.12.6 ON DELETE CASCADE

When the clause ON DELETE CASCADE is included in the child table, then whenever a row is deleted from the parent table, the corresponding referencing rows in the child table are also deleted.

Example

Let us consider the DEPARTMENT (parent table) and EMPLOYEE (child table) relation. The employee relation is modified as shown in Fig. 4.69. From this figure, it is clear that we have included the clause ON DELETE CASCADE in the child table.
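A sketch of the modified child-table definition in Fig. 4.69, under the same assumed column names as before, is:

CREATE TABLE EMPLOYEE
(empid VARCHAR2(5) PRIMARY KEY,
 empname VARCHAR2(20),
 did VARCHAR2(5) REFERENCES DEPARTMENT (deptid) ON DELETE CASCADE);
-- deleting a DEPARTMENT row now also deletes the EMPLOYEE rows that reference it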

The content of the table DEPARTMENT and EMPLOYEE are shown below.

Now let us try to delete the department “Civil” in the DEPARTMENT table. If we delete the row “civil” in the DEPARTMENT table, what will be the impact in the EMPLOYEE table?

First the content of employee table is shown in Fig. 4.70. The number of tuples in the EMPLOYEE relation is three.

Now we are going to delete the department “civil” in the table DEPARTMENT. The SQL command and the corresponding output are shown in Fig. 4.71.

Now let us see the impact of deleting the record “civil” in the child table which is EMPLOYEE in our case. The modified table EMPLOYEE is shown in Fig. 4.72.

By carefully analyzing Figs. 4.71 and 4.72, we can observe that the record referencing "civil" in the child table (EMPLOYEE) has been deleted.

If the ON DELETE CASCADE clause is included in the child table, then whenever a record is deleted from the parent table, the corresponding records in the child table are also deleted.

4.12.7 ON DELETE SET NULL

If the ON DELETE SET NULL clause is included in the child table, then whenever a row in the parent table is deleted, the corresponding referenced value in the child table is set to NULL.

Example

Let us consider the parent table as DEPARTMENT and the child table as EMPLOYEE as before. The child table is created with ON DELETE SET NULL as shown in Fig. 4.73.

The EMPLOYEE table before modification is shown below.

Now modify the table DEPARTMENT by deleting the “electrical” department record. The SQL command to delete the record “electrical” and the corresponding output are shown in Fig. 4.74.

The impact of deleting the record “electrical” in parent table DEPARTMENT on the child table EMPLOYEE is shown in Fig. 4.75.

From Fig. 4.75, we can observe that there is a NULL value corresponding to the ID of the "electrical" department. This is due to the inclusion of the clause ON DELETE SET NULL in the child table (EMPLOYEE).

4.13 Join Operation

The join operation is used to retrieve data from more than one table. Before proceeding to the JOIN operation, let us first discuss the Cartesian product. A Cartesian product with suitable selection and projection operations forms the different types of join.

Cartesian Product

If we have two tables A and B, then the Cartesian product combines each row of table A with each row of table B. If n1 is the number of rows in table A and n2 is the number of rows in table B, then the Cartesian product of A and B will have n1 × n2 rows.

Example

In order to understand the Cartesian product, let us consider two relations, doctor and nurse. The relation doctor has the attributes ID, which refers to the identity of the doctor, name, and department. Similarly, the relation nurse has three attributes: NID, which refers to the nurse identity, name, and department. The doctor relation is shown in Fig. 4.76.

Similarly the nurse relation is shown in Fig. 4.77.

From Figs. 4.76 and 4.77 we can observe that the number of rows in the doctor relation is 4 and the number of rows in the nurse relation is 3. Now let us try to find the Cartesian product of the two relations doctor and nurse. The Cartesian product should return 4 × 3 = 12 rows. The SQL command to perform the Cartesian product of the two relations doctor and nurse and the corresponding output are shown in Fig. 4.78. From this figure, it is evident that the Cartesian product of the two relations has 12 tuples (rows).
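A minimal sketch of the Cartesian product command in Fig. 4.78 is:

SELECT *
FROM doctor, nurse;
-- listing two tables without a join condition produces their Cartesian product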

4.13.1 Equijoin

In an equijoin, the join condition is based on equality between values in the common columns. Moreover, the common columns appear redundantly in the result. Equijoins are also called simple joins or inner joins. The equijoin of the two relations doctor and nurse (shown in Figs. 4.76 and 4.77, respectively) is shown in Fig. 4.79.

From Fig. 4.79, it is evident that the join condition is equality condition on the attribute department. We can also observe that the common columns appear redundantly in the result.
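A sketch of such an equijoin, assuming the common column is named department in both relations, is:

SELECT *
FROM doctor, nurse
WHERE doctor.department = nurse.department;
-- the equality condition on department restricts the Cartesian product to matching rows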

4.14 Set Operations

The UNION, INTERSECTION, and MINUS (difference) operations are considered set operations. Of these three set operations, the UNION and INTERSECTION operations are commutative, whereas the MINUS (difference) operation is not commutative. All three operations are binary operations. The relations that we are going to consider for the UNION, INTERSECTION, and MINUS operations are IBM DESKTOP and DELL DESKTOP, as shown in Figs. 4.80 and 4.81, respectively.

4.14.1 UNION Operation

If we have two relations R and S, then the set UNION operation contains the tuples that occur in R, in S, or in both.

Case 1: UNION command.

The union of the two relations IBM DESKTOP and DELL DESKTOP is given in Fig. 4.82. From Fig. 4.82, it is clear that the UNION command eliminates duplicate values.

Case 2: UNION ALL command.

The UNION command removes duplicate values. In order to get the duplicate values, we can use UNION ALL command. The use of UNION ALL command and the corresponding results are shown in Fig. 4.83.

By carefully looking into Figs. 4.82 and 4.83, we see that the number of tuples in Fig. 4.82 is four, whereas the number of tuples in Fig. 4.83 is five. The difference in the two results is due to the fact that the UNION command rejects duplicate values, whereas the UNION ALL command includes duplicate values.
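Since Figs. 4.82 and 4.83 are not reproduced here, a sketch of the two statements, assuming the two tables are named IBM_DESKTOP and DELL_DESKTOP and are union compatible, is:

SELECT * FROM IBM_DESKTOP
UNION
SELECT * FROM DELL_DESKTOP;

SELECT * FROM IBM_DESKTOP
UNION ALL
SELECT * FROM DELL_DESKTOP;
-- UNION removes duplicate tuples; UNION ALL keeps them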

4.14.2 INTERSECTION Operation

The intersection operation returns the tuples that are common to the two relations. The intersection of the two relations IBM DESKTOP and DELL DESKTOP is shown in Fig. 4.84.

4.14.3 MINUS Operation

If R and S are two union-compatible relations, then R − S returns the tuples that are present in R but not in S, and S − R returns the tuples that are present in S but not in R. It is to be noted that the MINUS operation is not commutative, that is, R − S ≠ S − R.

Case 1: IBM_DESKTOP–DELL_DESKTOP.

Let us first determine IBM_DESKTOP–DELL_DESKTOP. The SQL command and the corresponding output are shown in Fig. 4.85.

From Fig. 4.85, we can observe that the result contains the tuples that are present in IBM_DESKTOP and not in DELL_DESKTOP.
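A sketch of the Oracle MINUS statement of Case 1, under the same table-name assumptions as above, is:

SELECT * FROM IBM_DESKTOP
MINUS
SELECT * FROM DELL_DESKTOP;
-- returns tuples of IBM_DESKTOP that do not appear in DELL_DESKTOP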

Case 2: DELL_DESKTOP–IBM_DESKTOP.

Let us try to compute DELL_DESKTOP–IBM_DESKTOP. The SQL command and the corresponding output are shown in Fig. 4.86. From Fig. 4.86, it is clear that the result contains the tuples that are present in DELL_DESKTOP but not in IBM_DESKTOP.

Note From Figs. 4.85 and 4.86 it is clear that MINUS operation is not commutative.

4.15 View

A view is a pseudotable or virtual table. A view is called a "pseudotable" because it does not actually store data; it just displays data derived from one or more base tables. A view can be used like any other table for querying. A view can be considered as a window to the database, or as a customized presentation of data from one or more tables. It is to be noted that not all views are updatable.

The Syntax of VIEW is given as

CREATE VIEW view name
AS SELECT attribute list
FROM table(s)
WHERE condition(s);

Case 1: VIEW from a Single Table.

Consider the base table RECORD which gives the record of the student such as his/her Roll Number, Age, GPA (Grade Point Average), and institution which refers to the institution where he/she has got the degree (Fig. 4.87).

The base table RECORD is shown below.

Now we want to create a view by name PLACED, which gives the list of students placed in a particular organization (say IBM). The attributes associated with the view PLACED are Name, Age, and Institution. The view PLACED is shown below.

From the table PLACED, it is obvious that only those students with GPA greater than nine are placed. The SQL command to create the view PLACED from the base table RECORD and the output are shown in Fig. 4.88. From Fig. 4.88, it is clear that the view PLACED has only three columns Name, Age, and Institution.
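Since Fig. 4.88 is not reproduced here, a minimal sketch of such a view definition is given below; the column names are assumptions, and the actual predicate in the figure may also restrict the rows by company.

CREATE VIEW PLACED
AS SELECT name, age, institution
FROM RECORD
WHERE gpa > 9;
-- the view exposes only three columns and only the rows satisfying the condition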

4.15.1 Nonupdatable View

Case 1: A view created using DISTINCT clause is usually nonupdatable.

Example

To prove that a view created using the DISTINCT clause is nonupdatable, consider the base relation SAMPLE, which has two attributes, Name and Age. Let us create a view UPSAMPLE from the base relation SAMPLE using the DISTINCT clause. The base relation SAMPLE and the view UPSAMPLE are shown below:

The SQL command to create the view UPSAMPLE from the base relation SAMPLE using DISTINCT clause is shown in Fig. 4.89.

The created view UPSAMPLE is shown in Fig. 4.90. Now let us try to update the view UPSAMPLE, the SQL command to update the view and the corresponding output are shown in Fig. 4.91.

From Fig. 4.91, it is clear that the view defined by DISTINCT clause is nonupdatable.

Case 2: It is not possible to update the view if it contains group function or if it contains group by clause.

Example

In order to prove that the view is nonupdatable if it contains group function or group by clause, let us consider the base relation BOOKS. The attributes of the relation BOOKS are author, title, price. The content of the base relation BOOKS is shown in Fig. 4.92. Now let us define the view COUNTS, which gives the number of books written by the author. The SQL syntax to create the view is shown in Fig. 4.93. The contents of the view COUNTS are shown in Fig. 4.94.

Try1

First let us try to delete a row from the view COUNTS. The SQL command to delete a row from the view COUNTS and the corresponding output are shown in Fig. 4.95. From Fig. 4.95, it is clear that it is not possible to delete a row from the view if it is created using group function or group by clause.

Try2

Now let us try to update the view COUNTS by modifying the name malvino to malvinoleech. The SQL command to modify the name in the view COUNTS and the corresponding output are shown in Fig. 4.96.

From Fig. 4.96, it is clear that it is not possible to update the view if it contains a group function or a GROUP BY clause.

4.15.2 Views from Multiple Tables

Views from multiple tables are termed as complex views, whereas views from single table are termed as simple views. View from multiple tables is illustrated as follows:

Example

Let us try to create a view from two tables. Here one table is COURSE and the other table is STAFF. The attributes of the COURSE table are courseID, course name, and LectID (which refers to the Lecturer Identity number). The attributes of the STAFF table are name, LectID, and position.

The view COURSE STAFF is created by selecting course name from course and Name from staff as shown in Fig. 4.97.

The SQL command to create the view COURSE STAFF from COURSE and STAFF is shown in Fig. 4.98. From Fig. 4.98 it is evident that the view COURSE STAFF is created from two tables COURSE and STAFF.

Let us try to see the contents of the view COURSE STAFF by using SELECT command as shown in Fig. 4.99.

Note The view COURSE STAFF is created from two tables; hence it can be considered a complex view. Complex views are in general not updatable. Let us check whether the view COURSE STAFF is updatable or not by trying to insert tuples into the view COURSE STAFF, as shown in Fig. 4.100.

From Fig. 4.100, it is clear that it is not possible to insert tuples into complex view (COURSE STAFF). Now let us try to update the view COURSE STAFF by modifying the name Rajan as Siva as shown in Fig. 4.101.

From Fig. 4.101, it is clear that the complex view (view created from more than one table) is usually nonupdatable.

4.15.3 View From View

It is possible to create view from another view. This is diagrammatically shown in Fig. 4.102. From Fig. 4.102, it is clear that the view2 is created from view1 and not from the base table. View1, View2 can be queried similar to the base table.

Example

Let us consider base table STAFF as shown in Fig. 4.103, the view ITSTAFF is created from the base table STAFF (Fig. 4.104). Then the view YOUNGITSTAFF is created from the view ITSTAFF (Fig. 4.105). The view ITSTAFF is shown in Fig. 4.106 and the view YOUNGITSTAFF is shown in Fig. 4.107.

Figure 4.104 shows the SQL command to create the view ITSTAFF from the base table STAFF. The view ITSTAFF contains only the details of the staff who belong to the IT department, as shown in Fig. 4.106.

The contents of the view YOUNGITSTAFF is shown in Fig. 4.107. We can observe that the view YOUNGITSTAFF contains only the details of IT staff whose age is less than 30.

Doubt 1: Can the view YOUNGITSTAFF, which is created from another view ITSTAFF, be queried like the base table?

Answer : Yes. The view YOUNGITSTAFF, which is created from another view ITSTAFF can be queried like the base table.

Example

Let us consider the query: What is the pay offered to the YOUNGITSTAFF Radha? The SQL command to answer the query is shown in Fig. 4.108.

From Fig. 4.108, it is clear that the view YOUNGITSTAFF, which is created from another view ITSTAFF, can be queried similar to the base table STAFF.

Doubt 2: If we make a change in the view ITSTAFF, which was created from the base table STAFF, will the change be reflected in the base table STAFF?

Answer : Yes. If it is possible to make a change in the view which was derived from the base table, then the change will be reflected in the base table.

Example

Let us modify the view ITSTAFF by including one row. Before modification the contents of the base table STAFF is shown in Fig. 4.109.

From Fig. 4.109, we can observe that there are eight rows in the base table STAFF.

Now let us update the view ITSTAFF by including one row in the view ITSTAFF. The SQL command to insert the row in the view ITSTAFF is shown in Fig. 4.110.

Contents of the ITSTAFF after inserting a row are shown in Fig. 4.111. From Fig. 4.111, we can observe that the new row has been included in the ITSTAFF view.

Now let us see the content of the base table STAFF to find whether the change made in the view ITSTAFF is reflected in the base table STAFF. The content of the base table STAFF is shown in Fig. 4.112.

Comparing Fig. 4.109 with Fig. 4.112, it is clear that one new row has been included in the base table STAFF. This means that the change in the view is reflected in the base table.

Doubt 3: If the view ITSTAFF is dropped, then is it possible to get the content of the view YOUNGITSTAFF which is derived from ITSTAFF?

Answer : The contents of the view YOUNGITSTAFF are derived from another view, ITSTAFF. Hence, if ITSTAFF is dropped, it is not possible to get the contents of the view YOUNGITSTAFF.

Example

Let us drop the view ITSTAFF as shown in Fig. 4.113. Figure 4.114 ensures that the view ITSTAFF is successfully dropped.

Now let us try to see the content of the view YOUNGITSTAFF which is derived from the view ITSTAFF. The SQL command to retrieve the contents of the view YOUNGITSTAFF is shown in Fig. 4.115.

From Fig. 4.115, it is clear that once the view ITSTAFF is dropped, it is not possible to retrieve the contents of the view YOUNGITSTAFF, which is derived from the view ITSTAFF.

4.15.4 VIEW with CHECK Constraint

It is possible to create view with CHECK constraint. If we create a view with CHECK constraint, then it is not possible to update the view if the CHECK constraint is violated.

Example of View with CHECK Constraint

Let us consider the base relation CITIZEN, which has the attributes name, age, and address. Now let us create the view VOTER from the base relation CITIZEN. We know that a citizen of India becomes an eligible voter on attaining the age of 18. The base relation CITIZEN is shown in Fig. 4.116. The view VOTER created from the base relation CITIZEN is shown in Fig. 4.117.

Case 1: Now let us try to insert into the view VOTER a person who is eligible to vote, that is, the age of the voter is greater than 18. The SQL command and the corresponding output are shown in Fig. 4.118. From Fig. 4.118, it is clear that the value is successfully inserted into the view VOTER.

Case 2: Let us try to insert a row into the view VOTER by violating the check constraint (age of the voter is less than 18). The SQL command and the corresponding output are shown in Fig. 4.119.

4.15.5 Views with Read-only Option

A view can be created with read only option. Such views cannot be modified using INSERT, DELETE, and UPDATE commands.

Example

Consider the base table STAFF as shown in Fig. 4.120. Let us create the view electronicsstaff from the base table staff with readonly option as shown in Fig. 4.121.

From Fig. 4.121 it is clear that the view electronicsstaff is created with the read-only option. Now we have to check whether the view electronicsstaff is updatable, that is, whether it is possible to INSERT, DELETE, and UPDATE values in the view electronicsstaff. The content of the view electronicsstaff is shown in Fig. 4.122.

Case 1: INSERTING Values into the Read-Only View.

Let us try to insert values into the view “electronicsstaff.” The SQL command and the corresponding output are shown in Fig. 4.123.

From Fig. 4.123, it is clear that it is not possible to insert values into a read-only view.

Case 2: Deleting value from a read-only view.

Let us try to delete a value (in our case deleting the record of the electronicsstaff “usha”) from the read-only view “electronicsstaff,” the SQL command and the corresponding output are shown in Fig. 4.124.

From Fig. 4.124, it is evident that it is not possible to delete value from the read-only view.

Case 3: Updating the record of read-only view.

Let us try to update the record of the read-only view “electronicsstaff” by modifying the age of “usha” to 30. The SQL command to modify the age of the staff “usha” and the corresponding output are shown in Fig. 4.125.

From Fig. 4.125, it is clear that it is not possible to update the view since it is read-only.

4.15.6 Materialized Views

A materialized view is a physical copy of the base table, with the query results stored as a separate schema object. Materialized views are also called snapshots, because they are a kind of photograph of the base table.

Advantage of VIEW

The main advantages of view are improved security, less complexity, better convenience, and customization.

  1. Improved security. We can restrict the user to access on the data that are appropriate for the user. Hence views provide improved security.
  2. Less complexity. A view can simplify queries by gathering data from several tables into a single virtual table, thus transforming multitable queries into single-table queries.
  3. Convenience. A database may contain much information. All the information will not be useful to the users. The users are provided with only the part of the database that is relevant to them rather than the entire database; hence views provide great convenience to the users.
  4. Customization. Views provide a method to customize the appearance of the database so that the users need not see full complexity of database. View creates the illusion of a simpler database customized to the needs of a particular category of users.

Drawback of VIEW

  1. If the base table is modified by adding one or more columns then the columns added will not be available in the view unless it is recreated.
  2. When a view is created from the base table, it is to be noted that all the views are not updatable. Views created from multiple tables are in general not updatable when there is a group function, a GROUP BY clause, or restriction operators.

4.16 Subquery

Subquery is query within a query. A SELECT statement can be nested inside another query to form a subquery. The query which contains the subquery is called outer query.

Scalar subquery

A scalar subquery returns single row, single column result.

Example of Scalar Subquery

A scalar subquery returns a single-row, single-column result. To understand the scalar subquery, consider two relations, STUDENT and COURSE. The attributes of the STUDENT relation are SID, SNAME, AGE, and GPA. The attributes of the COURSE relation are CID (Course ID), CNAME (Course Name), SID (Student ID), and INSTRUCTOR (name of the instructor). The two relations are shown below.

Query 1: Find the name of the student who has opted for the course RDBMS?

Solution. From the STUDENT and COURSE table, it is clear that only one student has opted for RDBMS (just for example). We can get the name of the student using scalar subquery. The SQL command and the corresponding output are shown in Fig. 4.126.
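Since Fig. 4.126 is not reproduced here, a minimal sketch of such a scalar subquery, assuming the column names listed above, is:

SELECT sname
FROM STUDENT
WHERE sid = (SELECT sid
             FROM COURSE
             WHERE cname = 'RDBMS');
-- valid only because the subquery returns exactly one row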

Query 2: Find the Names of the Student who have Opted for DSP Course

Solution. From the STUDENT and COURSE tables, we can observe that more than one student has opted for the DSP course. Here we cannot use a scalar subquery, because a scalar subquery gives a single-row, single-column result, but our result has more than one row. First, let us try using a scalar subquery. The SQL command and the corresponding output are shown in Fig. 4.127.

From Fig. 4.127, it is clear that scalar subquery cannot be used to retrieve multiple rows or multiple column result.

The solution to get the names of the students who have opted for the DSP course is to use the IN operator. The IN operator is true if the value exists in the result of the subquery. The SQL command using the IN operator and the corresponding output are shown in Fig. 4.128.

4.16.1 Correlated Subquery

In the case of correlated subquery, the processing of subquery requires data from the outer query.

EXISTS Operator in Correlated Subquery

The EXISTS operator is used in correlated subquery to find whether a value retrieved by the outer query exists in the results set of the values retrieved by the inner query or subquery.

Example of EXISTS Command

Let us consider two tables ORDER1 and PRODUCT. The attributes (columns) of the table ORDER1 are orderID, quantity, productID. The attributes of the table PRODUCT are productID, productname, and price. The contents of the two table ORDER1 and PRODUCT are shown in Figs. 4.129 and 4.130.

The orderID which gives the order for the car “Maruti Esteem” can be found using the SQL command EXISTS. The SQL command and the corresponding output are shown in Fig. 4.131.

From Fig. 4.131, we can observe that the inner query requires data from the outer query.
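A sketch of such a correlated subquery with EXISTS, assuming the column names given above, is:

SELECT orderid
FROM ORDER1 o
WHERE EXISTS (SELECT *
              FROM PRODUCT p
              WHERE p.productid = o.productid
                AND p.productname = 'Maruti Esteem');
-- the inner query references o.productid from the outer query, making it correlated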

Example of NOT EXISTS Operator

In order to understand NOT EXISTS clause, let us consider two relations EMPLOYEE and DEPENDENT. Here DEPENDENT refers to those who are dependent on EMPLOYEE. The attributes of EMPLOYEE relation are eid (employee ID), ename (employee name). The attributes of the DEPENDENT relation are name (which refers to dependent name) and eid (employee ID).

The contents of the tables EMPLOYEE and DEPENDENT are shown in Figs. 4.132 and 4.133.

Query: Find the name of the employee who is not having any dependent?

Solution. The SQL command to get the name of the employee who is not having any dependent and the corresponding output are shown in Fig. 4.134.

The NOT EXISTS clause is used to retrieve the name of the employee who is not having dependent.
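A sketch of such a NOT EXISTS query, assuming the column names eid and ename listed above, is:

SELECT ename
FROM EMPLOYEE e
WHERE NOT EXISTS (SELECT *
                  FROM DEPENDENT d
                  WHERE d.eid = e.eid);
-- returns employees for whom no matching row exists in DEPENDENT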

Comparison Operator ALL

The comparison operators that are used in multiple-row subqueries are IN, ANY, and ALL. In this section let us discuss the use of the ALL comparison operator. The ALL comparison operator compares a value to every value returned by the subquery.

Example

In order to understand the ALL comparison operator, let us consider the relation STAFF. The attributes of the staff relation are shown in table STAFF.

Query: Find the name of the employee in Electronics Department who is getting the maximum salary?

Solution: The SQL command ALL can be used to find the name of the STAFF in the Electronics who is getting maximum salary. The SQL command and the corresponding output are shown in Fig. 4.135.

Here the ALL comparison operator is used to retrieve the name of the staff from a particular department who is getting maximum salary.
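A sketch of such a query, assuming the STAFF table has columns name, dept, and salary (the exact names are assumptions, since the table and Fig. 4.135 are not reproduced here), is:

SELECT name
FROM STAFF
WHERE dept = 'electronics'
  AND salary >= ALL (SELECT salary
                     FROM STAFF
                     WHERE dept = 'electronics');
-- the condition holds only for the row(s) whose salary is not exceeded by any other row of the department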

Comparison Operator ANY

The ANY operator compares a value to each value returned by a subquery.

Here <ANY means less than the maximum, and >ANY means more than the minimum.

Case 1: <ANY. Let us use the operator <ANY to retrieve the names of the staff who are getting salary less than the staff who is getting the maximum salary (in our case it is “ramakrishnan”).

The SQL command to retrieve the names of the staff who are getting salary less than the staff who is getting the maximum salary is shown in Fig. 4.136.

The SQL command <ANY is used to retrieve the name of the staff who are getting the salary less than the staff who is getting the maximum salary. In our case staff “ramakrishnan” of electronics is getting the maximum salary (refer STAFF table). Our query should return the name of the staff who are getting salary less than the staff “ramakrishnan.” From Fig. 4.136, it is evident the name returned by the query are the staff who are getting salary less than the staff “ramakrishnan.”

Case 2: >ANY Clause. The operator >ANY returns values that are greater than the minimum value.

Example

Query: Retrieve the name of the staff who are getting salary greater than the staff who is getting the least salary?

Solution: The SQL operator >ANY can be used to get answer for the query mentioned above. The SQL command and the corresponding output are shown in Fig. 4.137.

From the table STAFF it is clear that the staff who is getting the least salary is “Bhaskar.” We have to get the names of the staff who are getting salary greater than “Bhaskar.”

From Fig. 4.137, it is clear that the operator >ANY has returned the names of the staff who are getting salary greater than “Bhaskar.”

Dual Table

The dual table contains one row and one column. The datatype associated with the dual table is varchar2(1). In order to know about dual table, we can issue DESC command as shown in Fig. 4.138.

From Fig. 4.138, it is clear that the name of the column is DUMMY. If we want to know how many rows the DUAL table returns, we can issue the SELECT command as shown in Fig. 4.139. From Fig. 4.139, it is clear that the DUAL table returns a single row. The DUAL table can be used to compute a constant expression.

Determining System Date from Dual

It is possible to determine system date from the dual table. The SQL command and the corresponding output are shown in Fig. 4.140.

We have evaluated the system date from the dual table. It is also possible to evaluate constant expression using the dual table.

Evaluation of Constant Expression Using DUAL

It is possible to evaluate constant expressions using DUAL table. Some of the examples of evaluation of constant expressions are shown in Fig. 4.141.

From Fig. 4.141 it is clear that DUAL table can be used to evaluate constant expressions which will give single row output. For our example we have taken simple mathematical operations like addition, multiplication, division, and subtraction.
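Sketches of the kind of statements shown in Figs. 4.140 and 4.141 are given below; the particular arithmetic expressions are illustrative.

SELECT SYSDATE FROM DUAL;

SELECT 2 + 3, 6 * 7, 12 / 4, 9 - 5 FROM DUAL;
-- each statement returns a single row, since DUAL has exactly one row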

4.17 Embedded SQL

SQL can be used in conjunction with a general purpose programming language such as PASCAL, C, C++, etc. The programming language is called the host language. Embedded SQL statements are SQL statements written within application programming languages such as C and Java. The embedded SQL statement is distinguished from programming language statements by prefixing it with a special character or command so that a preprocessor can extract the SQL statements. These statements are preprocessed by an SQL precompiler before the application program is compiled. There are two types of embedded SQL, Static SQL, and Dynamic SQL. Embedded SQL provides the 3GL (Third Generation Language) with a way to manipulate a database. Embedded SQL supports highly customized applications. It also supports background applications running without user intervention.

SQL Precompiler

A precompiler is used to translate SQL statements embedded in a host language into DBMS library calls, which can be implemented in the host language. The function of the precompiler is shown below:

Sharing Variables

Variables to be shared between the embedded SQL code and the host language have to be specified in the program.

EXEC SQL BEGIN DECLARE SECTION;
varchar userid[10], password[10], cname[15];
int cno;
EXEC SQL END DECLARE SECTION;

We should also declare a link to the DBMS so that database status information can be accessed:

EXEC SQL INCLUDE sqlca;

This allows access to a structure sqlca, of which the most common element, sqlca.sqlcode, has the value 0 (operation OK), >0 (no data found), or <0 (an error).

Connecting to the DBMS

Before operations can be performed on the database, a valid connection has to be established. A model is shown below:
EXEC SQL connect :userid identified by :password;

  • – In all SQL statements, variables with the “:” prefix refer to shared host variables, as opposed to database variables.
  • – This assumes that userid and password have been properly declared and initialized.

When the program is finished using the DBMS, it should disconnect using:

EXEC SQL commit release;

Queries Producing a Single Row

A single piece of data (or row) can be queried from the database so that the result is accessible from the host program.

EXEC SQL SELECT custname
INTO :cname
FROM customers
WHERE cno = :cno;

Thus the custname with the unique identifier :cno is stored in :cname.

However, a selection query may generate many rows, and a way is needed for the host program to access results one row at a time.

SELECT with a Single Result

The syntax to select with a single result is shown below:

Static SQL

The source form of a static SQL statement is embedded within an application program written in a host language such as COBOL. The statement is prepared before the program is executed and the operational form of the statement persists beyond the execution of the program.

A source program containing static SQL statements must be processed by an SQL precompiler before it is compiled. The precompiler turns the SQL statements into host language comments, and generates host language statements to invoke the database manager. The syntax of the SQL statements is checked during the precompile process.

The preparation of an SQL application program includes precompilation, the binding of its static SQL statements to the target database, and compilation of the modified source program.

Dynamic SQL

Programs containing embedded dynamic SQL statements must be precompiled like those containing static SQL, but unlike static SQL, the dynamic SQL statements are constructed and prepared at run time. The SQL statement text is prepared and executed using either the PREPARE and EXECUTE statements, or the EXECUTE IMMEDIATE statement. The statement can also be executed with cursor operations if it is a SELECT statement.

Summary

This chapter has introduced the most popular relational database language SQL (Structured Query Language). SQL has become the de facto standard language for interacting with all major database programs. The three main divisions in SQL are DDL, DML, and DCL. The data definition language (DDL) commands of SQL are used to define a database which includes creation of tables, indexes, and views. The data manipulation commands (DML) are used to load, update, and query the database through the use of the SELECT command. Data control language (DCL) is used to establish user access to the database.

This chapter has focused on how to create the table, how to insert data into the table. Examples are shown to understand the table creation and manipulation process. The subset of SELECT command described in this chapter allows the reader to formulate problems involving the project, restrict, join, union, intersection, and difference operators of relational algebra.

Review Questions

4.1. Prove the statement “When the column of a view is directly derived from a column of a base table, that column inherits any constraints that apply to the column of the base table” by using suitable example.

To prove this statement, let us create a base table by name t1. The base table t1 has two columns, name and age. A constraint is imposed on age: age should be greater than 18. The syntax to create the base table t1 with the constraint on age is shown below:

Step 1: Base table creation with the name t1 and constraint on age (Fig. 4.142).

SQL> create table t1
2 (name varchar(12),
3 age number(3),
4 check(age>18));
Table created.

Step 2: Create a view by name t2 from the base table t1. The SQL command to create the view t2 is shown in Fig. 4.143.

Step 3: Now try to insert values into the view t2, first without violating the constraint and then by violating the check constraint (Fig. 4.144).

Note: Since the age is greater than 18, the values are inserted into the view t2. Now insert a value into t2 by violating the constraint (by inserting an age less than or equal to 18).

If we violate the constraint on the column of the base table, we get an error message.

4.2. What is the difference between the two SQL commands DROP TABLE and TRUNCATE TABLE?

The DROP TABLE command deletes the definition as well as the contents of the table, whereas the TRUNCATE TABLE command deletes only the contents of the table but not its definition.

Example

We have a table by name t1. The contents of the table are seen by issuing the select command as shown in Fig. 4.145.

Step 1: Now issue the truncate table command. The syntax is:
TRUNCATE TABLE table name; as shown in Fig. 4.146.

Step 2: After issuing the truncate table command try to see the contents of the table. You will get the message as no rows selected as shown in Fig. 4.147.

Step 3: Now we have the table t2. See the contents of the table by issuing select command as shown in Fig. 4.148.

Step 4: Now use the drop command, to drop the table t2 as shown in Fig. 4.149.

Step 5: Now see the effect of the drop command by using the select command as shown in Fig. 4.150.

Note: If we issue the drop command, the definition as well as the contents of the table is deleted and we get the error message as shown in Fig. 4.150.

4.3. Is it possible to create a table from another table? If so, give an example.

Yes, it is possible to create a table from another table using SQL. Consider the table t1 as shown in Fig. 4.151. We can create another table t2 from the table t1. The SQL command to create the table t2 from the table t1 is shown in Fig. 4.152.

Now let us try to view the content of the table t2. The content of the table t2 is shown in Fig. 4.153.

From Fig. 4.153, it is clear that the contents of the table t2 match the contents of the table t1 (refer Fig. 4.151). Hence it is possible to create a table from another table.
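Since Fig. 4.152 is not reproduced here, a minimal sketch of such a statement, using Oracle's CREATE TABLE ... AS SELECT form, is:

CREATE TABLE t2
AS SELECT * FROM t1;
-- t2 is created with the same columns as t1 and populated with its rows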

4.4. What is the difference between COUNT, COUNT DISTINCT, and COUNT (*) in SQL?

The command COUNT (attribute name) counts the number of rows in a table by ignoring all null values of that attribute. The command COUNT (*) counts the number of rows in a table, including the rows that contain null values. COUNT (DISTINCT attribute name) counts the number of rows in the table by ignoring duplicate values.

4.5. If we want to delete all the rows in the table, it can be done in two ways (1) Issue the command DELETE FROM table name (2) TRUNCATE TABLE table name. What is the difference between these two commands?

We have a table by name BOOKS. The content of the table BOOKS are shown in the Fig. 4.154.

Step 1: The contents of the table BOOKS are deleted by using DELETE command as shown in Fig. 4.155.

Step 2: The table BOOKS is again populated with the data and the command TRUNCATE is used to delete the contents of the table which is shown in Fig. 4.156.

The advantage offered by the TRUNCATE command is speed. When Oracle executes this command, it does not evaluate the existing records within the table; it basically chops them off. In addition to speed, the TRUNCATE command provides the added benefit of automatically freeing up the table space that the truncated records previously occupied.

When the table contents are deleted by using DELETE command, it forces Oracle to read every row before deleting it. This can be extremely time consuming.
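The two commands, sketched here for the BOOKS table used above:

SQL> DELETE FROM BOOKS;      -- removes rows one by one; can be rolled back before COMMIT
SQL> TRUNCATE TABLE BOOKS;   -- DDL command; fast, and cannot be rolled back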

4.6. What are subqueries? How will you classify them?

A subquery is a query within a query. A SELECT statement can be nested inside another query to form a subquery. The query which contains the subquery is called the outer query. Subqueries can be classified as (a) scalar subqueries, (b) correlated subqueries, and (c) uncorrelated subqueries.
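A small illustration, assuming an emp table with columns ename and sal (hypothetical names); the inner SELECT is an uncorrelated subquery whose result feeds the outer query:

SQL> SELECT ename, sal
  2  FROM emp
  3  WHERE sal > (SELECT AVG(sal) FROM emp);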

Chapter 5

PL/SQL

Learning Objectives. This chapter focuses on the shortcomings of SQL and how it is overcome by PL/SQL. An introduction to PL/SQL is given in this chapter. After completing this chapter the reader should be familiar with the following concepts in PL/SQL.

  • – Structure of PL/SQL
  • – PL/SQL language elements
  • – Control structure in PL/SQL
  • – Steps to create PL/SQL program
  • – Concept of CURSOR
  • – Basic concepts related to Procedure, Functions
  • – Basic concept of Trigger

5.1 Introduction

PL/SQL stands for Procedural Language/Structured Query Language, which is provided by Oracle as a procedural extension to SQL. SQL is a declarative language. In SQL, statements provide no procedural control over program flow and can be executed in any order. PL/SQL, on the other hand, is a procedural language that makes up for all the missing elements in SQL. PL/SQL arose from the desire of programmers to have a language structure that was more familiar than SQL's purely declarative nature.

5.2 Shortcomings in SQL

We know, SQL is a powerful tool for accessing the database but it suffers from some deficiencies as follows:

  • (a) SQL statements can be executed only one at a time. Every time a SQL statement is executed, a call is made to the Oracle engine, which results in an increase in database overheads.
  • (b) While processing an SQL statement, if an error occurs, Oracle generates its own error message, which is sometimes difficult to understand. If a user wants to display some other meaningful error message, SQL does not have provision for that.
  • (c) SQL cannot perform conditional processing on an RDBMS; that is, one cannot use conditions like if . . . then in a SQL statement. Also, a looping facility (repeating a set of instructions) is not provided by SQL.

5.3 Structure of PL/SQL

PL/SQL is a 4GL (fourth-generation) programming language. It offers all the features of an advanced programming language, such as portability, security, data encapsulation, information hiding, etc. A PL/SQL program may consist of more than one SQL statement, while execution of the whole PL/SQL program makes only one call to the Oracle engine, which helps in reducing database overheads. With PL/SQL, one can use SQL statements together with control structures (like if . . . then) for data manipulation. Besides this, the user can define his/her own error messages to display. Thus we can say that PL/SQL combines the data manipulation power of SQL with the data processing power of a procedural language.

PL/SQL is a block-structured language. This means a PL/SQL program is made up of blocks, where a block is the smallest piece of PL/SQL code having logically related statements and declarations. A block consists of three sections, namely Declare, Begin, and Exception, followed by an End statement. We will now see the different sections of a PL/SQL block.

Declare Section

The Declare section declares the variables, constants, cursors, procedures, functions, etc., to be used in the other parts of the program. It is an optional section.

Begin Section

It is the executable section. It consists of a set of SQL and PL/SQL statements, which are executed when the PL/SQL block runs. It is a compulsory section.

Exception Section

This section handles the errors that occur during execution of the PL/SQL block. It allows the user to define his/her own error messages and executes only when an error occurs. It is an optional section.

End Section

This section indicates the end of the PL/SQL block. Every PL/SQL program must consist of at least one block, which may consist of any number of nested sub-blocks. Figure 5.1 shows a typical PL/SQL block.
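A minimal sketch of such a block (the layout shown in Fig. 5.1):

DECLARE
   -- declarations of variables, constants, and cursors (optional)
   v_msg VARCHAR2(20) := 'Hello';
BEGIN
   -- executable SQL and PL/SQL statements (compulsory)
   DBMS_OUTPUT.PUT_LINE(v_msg);
EXCEPTION
   -- error handlers (optional)
   WHEN OTHERS THEN
      NULL;
END;
/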

5.4 PL/SQL Language Elements

Let us start from the basic elements of PL/SQL language. Like other programming languages PL/SQL also have specific character sets, operators, indicators, punctuations, identifiers, comments, etc. In the following sections we will discuss about various language elements of PL/SQL.

Character Set

A PL/SQL program consists of text having specific set of characters. Character set may include the following characters:

  • – Alphabets, both in upper case [A–Z] and lower case [a–z]
  • – Numeric digits [0–9]
  • – Special characters ( ) + - * / < > = ! ~ ^ ; : . ' @ % , " # $ & _ | { } ? [ ]
  • – Blank spaces, tabs, and carriage returns.

PL/SQL is not case sensitive, so lowercase letters are equivalent to corresponding uppercase letters except within string and character literals.

Lexical Units

A line of PL/SQL program contains groups of characters known as lexical units, which can be classified as follows:

  • – Delimiters
  • – Identifiers
  • – Literals
  • – Comments

Delimiters

A delimiter is a simple or compound symbol that has a special meaning to PL/SQL. A simple symbol consists of one character, while a compound symbol consists of more than one character. For example, to perform the addition and exponentiation operations in PL/SQL, the simple symbol delimiter + and the compound symbol delimiter ** are used, respectively. PL/SQL supports the following simple symbol delimiters:

+ - * / = > < ; % ' , ( ) @ : "

Compound symbol delimiters legal in PL/SQL are as follows:

<> != ~= ^= <= >= := ** .. || << >>

In the following sections we will discuss these delimiters.

Identifiers

Identifiers are used in the PL/SQL programs to name the PL/SQL program items as constants, variables, cursors, cursor variables, subprograms, etc.

Identifiers can consist of alphabets, numerals, dollar signs, underscores, and number signs only. Any other characters, like hyphens, slashes, blank spaces, etc., are illegal. An identifier must begin with an alphabetic letter, optionally followed by one or more characters permissible in an identifier. An identifier cannot contain more than 30 characters.

Example

Some of the valid identifiers are as follows:

A – Identifier may consist of a single character
A1 – identifier may consist of numerals after first character
Share$price – dollar sign is permitted
e_mail – under score is permitted
phone# – number sign is permitted

The following identifiers are illegal:

mine&yours – ampersand is illegal
debit-amount – hyphen is illegal
on/off – slash is illegal
user id – space is illegal

However, if the identifier is enclosed within double quotes, PL/SQL allows any character except the double quote itself, including spaces, slashes, and hyphens. Thus, the following identifiers are valid:

"A&B"
"TATA INFOTECH"
"True/false"
"Student(s)"
"*** BEGIN ***"

However, the maximum length of a quoted identifier cannot exceed 30 characters, excluding double quotes.

An identifier can consist of lower-, upper-, or mixed-case letters. PL/SQL is not case sensitive except within string and character literals. So, if the only difference between identifiers is the case of corresponding letters, PL/SQL considers the identifiers to be the same. Take, for example, the identifier HUMAN; it will be equivalent to each of the following identifiers:

Human
human
hUMAN
hUmAn.

An identifier cannot be a reserved word, i.e., a word that has special meaning for PL/SQL. For example, the word DECLARE, which is used for declaring variables or constants, and the words BEGIN and END, which enclose the executable part of a block or subprogram, are reserved words. An attempt to redefine a reserved word gives an error.

Literals

A literal is an explicitly defined character, string, numeric, or Boolean value, which is not represented by an identifier. In the following sections we will discuss about each of these literals in detail:

Numeric Literals

A numeric literal is an integer or a real value. An integer literal may be a positive, negative, or unsigned whole number without a decimal point. Some examples of integer numeric literals are as follows:
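Typical examples (illustrative values, not an exhaustive list) are: 030, 6, -14, 0, +32767.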

A real literal is a positive, negative, or unsigned whole or fractional number with a decimal point. Some examples of real numeric literals are as follows:
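Typical examples (illustrative values) are: 6.6667, 0.0, -12.0, 3.14159, .5.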

PL/SQL treats a number with a decimal point as a real numeric literal, even if the number does not have any numeral after the decimal point. Besides integer and real literals, numeric literals can also be exponential numbers (an optionally signed number suffixed with an E (or e) followed by an optionally signed integer). Some examples of exponential numeric literals are as follows:
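Typical examples (illustrative values) are: 2E5, 1.0E-7, 3.14159e0, -1E38.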

where E stands for "times ten to the power of." For example, the exponential literal 7E3 is equivalent to the numeric literal 7000, and the exponential literal 8.3e2 is equivalent to the numeric literal 830.

An exponential numeric literal cannot be smaller than 1E-130 and cannot be greater than 10E125. Note that numeric literals cannot contain dollar signs or commas.

Character Literals

A character literal is an individual character enclosed by single quotes (apostrophes). Character literals include all the printable characters in the PL/SQL character set: letters, numerals, spaces, and special symbols. Some examples of character literals are as follows:
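Typical examples (illustrative values) are: 'Z', '%', '7', ' ', 'z', '('.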

PL/SQL is case sensitive within character literals. For example, PL/SQL considers the literals 'A' and 'a' to be different. Also, the character literals '0'. . .'9' are not equivalent to integer literals but can be used in arithmetic expressions because PL/SQL implicitly converts them to integers.

String Literals

A character string can be represented by an identifier or explicitly written as a string literal. A string literal is enclosed within single quotes and may consist of one or more characters. Some examples of string literals are as follows:

'Good Morning!'
'TATA INFOTECH LTD'
'04-MAY-00'
'$15,000,000'

All string literals are of character data type.
PL/SQL is case sensitive within string literals. For example, PL/SQL considers the following literals to be different:

'HUMAN'
'Human'

Boolean Literals

Boolean literals are the predefined values TRUE, FALSE, and NULL. Keep in mind that Boolean literals are values, not strings. For example, the condition (x = 10) is TRUE only when the value of x equals 10; for any other value of x it is FALSE, and when x is NULL the condition itself evaluates to NULL.

Comments

Comments are used in the PL/SQL program to improve the readability and understandability of a program. A comment can appear anywhere in the program code. The compiler ignores comments. Generally, comments are used to describe the purpose and use of each code segment. A PL/SQL comment may be a single-line or multiline.

Single-Line Comments

Single-line comments begin with a double hyphen (--) anywhere on a line and extend to the end of the line.

Example

-- start calculations

Multiline Comments

Multiline comments begin with a slash-asterisk (/*) and end with an asterisk-slash (*/), and can span multiple lines.

Example

/* Hello World! This is an example of multiline comments in PL/SQL */

Variables and Constants

Variables and constants can be used within a PL/SQL block, in procedural statements and in SQL statements. They are used to store values. As the program executes, the values of variables can change, but the values of constants cannot. However, variables and constants must be declared before they are used in the executable portion of PL/SQL. Let us see how to declare variables and constants in PL/SQL.

Declaration

Variables and constants are declared in the Declare section of the PL/SQL block. They can be of any of the SQL data types like CHAR, NUMBER, DATE, etc.

I. Variables Declaration

The syntax for declaring a variable is as follows:
identifier datatype;

Example

To declare the variables name, age, and joining_date with the datatypes VARCHAR2(10), NUMBER(2), and DATE, respectively, the declaration statement is as follows:

DECLARE
Name VARCHAR2(10);
Age NUMBER(2);
Joining_date DATE;

Initializing the Variable

By default, variables are initialized to NULL at the time of declaration. If we want to initialize the variable with some other value, the syntax would be as follows:
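identifier datatype := value;
(or)
identifier datatype DEFAULT value;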

Example

Suppose a number of employees have the same joining date, say 01-JULY-99. It is better to initialize the joining date rather than entering the same value individually; any of the following declarations can be used:

Joining_date DATE := '01-JULY-99'; (or)
Joining_date DATE DEFAULT '01-JULY-99';

Constraining a Variable

Variables can be constrained NOT NULL at the time of declaration; for example, to constrain the joining date as NOT NULL, the declaration statement would be as follows:

Joining_date DATE NOT NULL := '01-JULY-99';

(A NOT NULL constraint must be followed by an initialization clause.)
Thus the following declaration will give an error:

Joining_date DATE NOT NULL; -- illegal

Declaring Constants

Declaration of a constant is similar to declaration of a variable, except that the keyword CONSTANT precedes the datatype and the constant must be initialized with some value. The syntax for declaring a constant is as follows:

identifier CONSTANT datatype := value;

Example

To define the age limit as a constant having the value 30, the declaration statement would be as follows: Age_limit CONSTANT NUMBER := 30;

Restrictions

PL/SQL imposes some restrictions on declaration as follows:

  • (a) A list of variables that have the same datatype cannot be declared on the same line.
    Example
    A, B, C NUMBER (4,2); -- illegal
    It should be declared in separate lines as follows:

    A NUMBER (4,2);
    B NUMBER (4,2);
    C NUMBER (4,2);
  • (b) A variable can reference another variable only if that variable has been declared earlier. The following declaration is illegal:

    A NUMBER(2) := B;
    B NUMBER(2) := 4;

    Correct declaration would be as follows:

    B NUMBER(2) := 4;
    A NUMBER(2) := B;
  • (c) Within a block, the same identifier cannot be declared twice, even with a different datatype. The following declaration is illegal:

    DECLARE
    X NUMBER(4,2);
    X CHAR(4); -- illegal

5.5 Data Types

Every constant and variable has a datatype. A datatype specifies the space to be reserved in the memory, type of operations that can be performed, and valid range of values. PL/SQL supports all the built-in SQL datatypes. Apart from those datatypes, PL/SQL provides some other datatypes. Some commonly used PL/SQL datatypes are as follows:

BOOLEAN

One of the most commonly used datatypes is BOOLEAN. A BOOLEAN datatype is assigned to those variables that are required for logical operations. A BOOLEAN variable can store only logical values, i.e., TRUE, FALSE, or NULL. A BOOLEAN variable value cannot be inserted in a table; also, table data cannot be selected or fetched into a BOOLEAN variable.

%Type

The %TYPE attribute provides the datatype of a variable or database column. In the following example, %TYPE provides the datatype of a variable:

balance NUMBER(8,2);
minimum_balance balance%TYPE;

In the above example, PL/SQL will treat minimum_balance as having the same datatype as balance, i.e., NUMBER(8,2). The next example shows that a %TYPE declaration can include an initialization clause:

balance NUMBER(7,2);
minimum_balance balance%TYPE := 500.00;

The %TYPE attribute is particularly useful when declaring variables that refer to database columns. Column in a table can be referenced by %TYPE attribute.

Example

To declare a variable my_empno of the same datatype as the empno column of the emp table in the scott/tiger schema, the declaration statement would be as follows:

my_empno scott.emp.empno%TYPE;

Using %TYPE to declare my_empno has two advantages. First, knowledge of the exact datatype of empno is not required. Second, if the database definition of empno changes, the datatype of my_empno changes accordingly at run time. However, %TYPE variables do not inherit the NOT NULL column constraint: even though the database column empno is defined as NOT NULL, one can assign a null to the variable my_empno.

%Rowtype

The %ROWTYPE attribute provides a record type that represents a row in a table (or view). The record can store an entire row of data selected from the table.

Example

emp_rec is declared as a record datatype of emp table. emp_rec can store a row selected from the emp table.

emp_rec emp%ROWTYPE;
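A minimal sketch of how such a record might be filled and used (the columns empno and ename of the standard emp demo table are assumed):

DECLARE
   emp_rec emp%ROWTYPE;
BEGIN
   SELECT * INTO emp_rec FROM emp WHERE empno = 7369;
   DBMS_OUTPUT.PUT_LINE(emp_rec.ename);
END;
/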

Expressions

Expressions are constructed using operands and operators. PL/SQL supports all the SQL operators; in addition to those operators it has one more operator, named exponentiation (symbol is **). An operand is a variable, constant, literal, or function call that contributes a value to an expression. An example of simple expression follows:
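A = B**3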

where A, B, and 3 are operands; = and ** are operators. B**3 is equivalent to multiplying B by itself three times, i.e., B*B*B.

Operators may be unary or binary. Unary operators such as the negation operator (-) operate on one operand; binary operators such as the division operator (/) operate on two operands. PL/SQL evaluates (finds the current value of) an expression by combining the values of the operands in ways specified by the operators. This always yields a single value and datatype. PL/SQL determines the datatype by examining the expression and the context in which it appears.

5.6 Operators Precedence

The operations within an expression are done in a particular order depending on their precedence (priority). Table 5.1 lists the operator’s level of precedence from top to bottom. Operators listed in the same row have equal precedence.

Operators with higher precedence are applied first, but if parentheses are used, the expression within the innermost parentheses is evaluated first. For example, the expression 8 + 4/2**2 results in the value 9, because exponentiation has the highest priority, followed by division and addition. Now, if we put parentheses in the same expression, the expression 8 + ((4/2)**2) results in the value 12, not 9, because the expression within the innermost parentheses is evaluated first.

5.7 Control Structure

Control structure is an essential part of any programming language. It controls the flow of process. Control structure is broadly divided into three categories:

  • – Conditional control,
  • – Iterative control, and
  • – Sequential control

In the following sections we will discuss about each of these control structures in detail.

Conditional Control

A conditional control structure tests a condition to find out whether it is true or false and accordingly executes different blocks of SQL statements. Conditional control is generally performed by the IF statement. There are three forms of the IF statement: IF-THEN, IF-THEN-ELSE, and IF-THEN-ELSIF.

IF-THEN

It is the simplest form of IF condition. The syntax for this statement is as follows:
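IF <condition> THEN
   <sequence of statements>;
END IF;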

Example

Suppose we want to compare the values of two variables A and B and assign the value of A to HIGH if A is greater than B. The IF construct for this is as follows:

IF A > B THEN
HIGH := A;
END IF;

The sequence of statements is executed only if the condition is true. If the condition is FALSE or NULL, the sequence of statements is skipped and processing continues from the statements following the END IF statement.

IF-THEN-ELSE

As is clear from the IF-THEN construct, if the condition is FALSE, control passes to the next statement after the IF-THEN clause. To execute some other set of statements when the condition evaluates to FALSE, the second form of the IF statement is used; it adds the keyword ELSE followed by an alternative sequence of statements, as follows:
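IF <condition> THEN
   <sequence of statements 1>;
ELSE
   <sequence of statements 2>;
END IF;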

Example

To make this clear, take the previous example of comparing the values of A and B and assigning the greater value to HIGH. The IF construct for this is as follows:

IF A > B THEN
HIGH := A;
ELSE
HIGH := B;
END IF;

The sequence of statements in the ELSE clause is executed only if the condition is FALSE or NULL.

IF-THEN-ELSIF

In the previous constructs of IF, we can check only one condition, whether it is true or false. There is no provision to check further conditions when the first condition evaluates to FALSE; for this purpose the third form of the IF statement is used. It selects an action from several mutually exclusive alternatives. The third form of the IF statement uses the keyword ELSIF (not ELSEIF) to introduce additional conditions, as follows:
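IF <condition 1> THEN
   <sequence of statements 1>;
ELSIF <condition 2> THEN
   <sequence of statements 2>;
ELSE
   <sequence of statements 3>;
END IF;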

5.8 Steps to Create a PL/SQL Program

  1. First, a Notepad file can be created by typing the file-creation command in the Oracle SQL editor. Figure 5.2 shows the command to create a file.
  2. A Notepad window will then appear and, at the same time, the Oracle session in the background is suspended. This is shown in Fig. 5.3.
  3. We can write our PL/SQL program in that file, save the file, and execute the program in the Oracle editor as in Fig. 5.4. In this program the cursor (current set of records) concept is used, which we will see in the following pages. Here the contents of the EMP table are opened by the cursor and displayed using the DBMS_OUTPUT package. The IF command is used to check whether the cursor has been opened successfully, using the %FOUND attribute.
  4. Then we can execute that file as follows in Fig. 5.5

5.9 Iterative Control

In iterative control, a group of statements is executed repeatedly as long as a certain condition is true, and control exits from the loop to the next statement when the condition becomes false. There are mainly three types of loop statements:

LOOP, WHILE-LOOP, FOR-LOOP.

LOOP

LOOP is the simplest form of iterative control. It encloses a sequence of statements between the keywords LOOP and END LOOP. The general syntax for LOOP control is as follows:
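LOOP
   <sequence of statements>;
END LOOP;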

With each iteration of the loop, the sequence of statements is executed, and then control returns to the top of the loop. However, a control structure like this can become trapped in an infinite loop. To avoid this, the keywords EXIT or EXIT WHEN must be used.

LOOP – EXIT
An EXIT statement within a LOOP forces the loop to terminate unconditionally and passes control to the statements following the loop. The general syntax for this is as follows:
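LOOP
   <sequence of statements>;
   IF <condition> THEN
      EXIT;
   END IF;
END LOOP;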

LOOP – EXIT WHEN
The EXIT-WHEN statement terminates a loop conditionally. When the EXIT statement is encountered, the condition in the WHEN clause is evaluated. If the condition is true, the loop terminates and control passes to the next statement after the loop. The syntax for this is as follows:
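LOOP
   <sequence of statements>;
   EXIT WHEN <condition>;
END LOOP;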

Example
Figures 5.4 and 5.5 are also examples of LOOP – EXIT WHEN. The condition used here is that the cursor does not return anything, checked using the %NOTFOUND attribute.


WHILE-LOOP


The WHILE statement with LOOP checks the condition. Only if the condition is true does the sequence of statements enclosed within the loop get executed. Then control resumes at the top of the loop and checks the condition again; the process is repeated as long as the condition is true. Control passes to the next statement outside the loop when the condition is FALSE or NULL.
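The general form is:

WHILE <condition> LOOP
   <sequence of statements>;
END LOOP;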

FOR-LOOP


FOR loops iterate over a specified range of integers. The range is part of iteration scheme, which is enclosed by the keywords FOR and LOOP. A double dot (..) serves as the range operator. The syntax is as follows:
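FOR <counter> IN [REVERSE] <lower_bound> .. <upper_bound> LOOP
   <sequence of statements>;
END LOOP;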

The range is evaluated when the FOR loop is first entered and is never re-evaluated. The sequence of statements is executed once for each integer in the range. After every iteration, the loop counter is incremented.


Example


To find the sum of natural numbers up to 10, the following program can be used as in Fig. 5.6.
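A minimal sketch of such a program (the exact code appears in Fig. 5.6):

DECLARE
   total NUMBER := 0;
BEGIN
   FOR i IN 1..10 LOOP
      total := total + i;
   END LOOP;
   DBMS_OUTPUT.PUT_LINE('Sum = ' || total);
END;
/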

Sequential Control


Sequential control unconditionally passes control to a specified unique label; the jump can be in the forward or the backward direction. For sequential control, the GOTO statement is used. Overuse of the GOTO statement may increase complexity, so as far as possible avoid the use of GOTO.

The syntax is as follows:
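GOTO <label_name>;
. . .
<<label_name>>
<statement>;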

5.10 Cursors

The number of rows returned by a query can be zero, one, or many, depending on the query search conditions. In PL/SQL, a SELECT . . . INTO statement cannot return more than one row, so in such cases we can use cursors. A cursor is a mechanism that can be used to process a multiple-row result set one row at a time.


In other words, cursors are constructs that enable the user to name a private memory area to hold a specific statement for access at a later time. Cursors are an inherent structure in PL/SQL. Cursors allow users to easily store and process sets of information in PL/SQL program.


Figure 5.7 shows a simple example of a cursor where two rows are selected by the query and pointed to by the cursor, namely All Lifetime.

There are two types of cursors in Oracle

  1. Implicit cursors
  2. Explicit cursors

5.10.1 Implicit Cursors


PL/SQL implicitly declares a cursor for every SQL DML statement, such as an INSERT, DELETE, UPDATE, or SELECT statement that is not part of an explicitly declared cursor, even if the statement processes a single row. PL/SQL allows referencing the most recent cursor, or the cursor associated with the most recently executed SQL statement, as the "SQL" cursor. Cursor attributes are used to access information about the most recently executed SQL statement through this SQL cursor.

Implicit Cursor Attributes

In PL/SQL every cursor, implicit or explicit, has four attributes: %NOTFOUND, %FOUND, %ROWCOUNT, and %ISOPEN. These cursor attributes can be used in procedural statements (PL/SQL), but not in SQL statements. These attributes let user access information about the most recent execution of INSERT, UPDATE, SELECT INTO, and DELETE commands. These attributes are associated with the implicit “SQL” cursor and can be accessed by appending the attribute name to the implicit cursor name (SQL). Syntax to use cursor attribute is as follows:

SQL%<attribute_name>


%Notfound

This attribute is used to determine if any rows were processed by a SQL DML statement. It evaluates to TRUE if an INSERT, UPDATE, or DELETE affected no rows or a SELECT INTO returned no rows; otherwise, it returns FALSE. The %NOTFOUND attribute can be useful in reporting or processing when no data is affected. If a SELECT statement does not return any data, the predefined exception NO_DATA_FOUND is automatically raised and program control is sent to an exception handler, if one is present in the program. A check made on the %NOTFOUND attribute after such a SELECT statement is therefore skipped entirely when the statement returns no data, because control has already passed to the exception handler.


Example


Figures 5.8 and 5.9 show the example of all the implicit cursor attributes. The program will return the status of each cursor attribute depending on the previously executed DML statement.

%Found


This attribute is used to determine if any rows were processed by a SQL DML statement. In fact, %FOUND works just the opposite of the %NOTFOUND attribute. Until a SQL DML statement is executed, this attribute evaluates to NULL. It equates to TRUE if an INSERT, UPDATE, or DELETE affects one or more rows or a SELECT returns one row. If a SELECT statement returns more than one row, the predefined exception TOO_MANY_ROWS is automatically raised and the %FOUND attribute is set to FALSE.


%Rowcount


This attribute is used to determine the number of rows that are processed by an SQL statement. It returns the number of rows affected by an INSERT, UPDATE, or DELETE statement or returned by a SELECT INTO statement. %ROWCOUNT returns zero if the SQL statement affects or returns no rows. If a SELECT statement returns more than one row, the predefined exception TOO_MANY_ROWS is raised automatically. In such a case the %ROWCOUNT attribute is set to 1 and not to the actual number of rows that satisfy the query.


Example

Figures 5.8 and 5.9 show this example.


%Isopen


%ISOPEN is used to determine if a cursor is already open. It always equates to FALSE in an implicit cursor. Oracle automatically closes implicit cursor after executing its associated SQL statements.


Example


Figures 5.8 and 5.9 show this example.


5.10.2 Explicit Cursor


Explicit cursors are declared by the user and are used to process query results that return multiple rows. Multiple rows returned from a query form a set called an active set. PL/SQL defines the size of the active set as the number of rows that have met the search criteria. Inherent in every cursor is a pointer that keeps track of the multiple rows being accessed, enabling the program to process the rows one at a time. An explicit cursor points to the current row in the active set. This allows the program to process one row at a time.


Multirow query processing is somewhat like file processing. For example, a program opens a file to process records, and then closes the file. Likewise, a PL/SQL program opens a cursor to process rows returned by a query, and then closes the cursor. Just as a file pointer marks the current position in an open file, a cursor marks the current position in an active set.


After a cursor is declared and opened, the user can FETCH, UPDATE, or DELETE the current row in the active set. The cursor can be CLOSED to disable it and free up any allocated system resources. Three commands are used to control the cursor – OPEN, FETCH, and CLOSE. First the cursor is initialized with an OPEN statement, which identifies the active set. Then, the FETCH statement is used to retrieve the first row. FETCH statement can be executed repeatedly until all rows have been retrieved. When the last row has been processed, the cursor can be released with the CLOSE statement. Figure 5.10 shows the memory utilization by a cursor when each of these statements is given.

5.11 Steps to Create a Cursor

Following are the steps to create a cursor:


5.11.1 Declare the Cursor


In PL/SQL a cursor, like a variable, is declared in the DECLARE section of a PL/SQL block or subprogram. A cursor must be declared before it can be referenced in other statements. A cursor is defined in the declarative part by naming it and specifying a SELECT query to define the active set.

CURSOR <cursor_name> IS
SELECT. . .


The SELECT statement associated with a cursor declaration can reference previously declared variables.


Declaring Parameterized Cursors


PL/SQL allows declaration of cursors that can accept input parameters which can be used in the SELECT statement with WHERE clause to select specified rows. Syntax to declare a parameterized cursor:


CURSOR <cursor_name> [(parameter, . . .)] IS
SELECT . . .
WHERE <column_name> = parameter;


Parameter is an input parameter defined with the syntax:


<variable_name> [IN] <datatype> [{:= | DEFAULT} value]


The formal parameters of a cursor must be IN parameters. As in the example above, cursor parameters can be initialized to default values. That way, different numbers of actual parameters can be passed to a cursor, accepting or overriding the default values.

Moreover, new formal parameters can be added without having to change every reference to the cursor. The scope of a cursor parameter is local only to the cursor. A cursor parameter can be referenced only within the SELECT statement associated with the cursor declaration. The values passed to the cursor parameters are used by the SELECT statement when the cursor is opened.


5.11.2 Open the Cursor

After declaration, the cursor is opened with an OPEN statement for processing rows in the cursor. The SELECT statement associated with the cursor is executed when the cursor is opened, and the active set associated with the cursor is created.


The active set is defined when the cursor is declared, and is created when cursor is opened.


The active set consists of all rows that meet the SELECT statement criteria. Syntax of OPEN statement is as follows.
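OPEN <cursor_name>;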

5.11.3 Passing Parameters to Cursor


Parameters to a parameterized cursor can be passed when the cursor is opened. For example, given the cursor declaration


CURSOR Mem_detail (MType VARCHAR2) IS SELECT. . .


Any of the following statements opens the cursor.


OPEN Mem_detail('L');
OPEN Mem_detail(Mem); where Mem is another variable.


Unless default values are to be accepted, each formal parameter in the cursor declaration must have a corresponding actual parameter in the OPEN statement. Formal parameters declared with a default value need not have a corresponding actual parameter. They can simply assume their default values when the OPEN statement is executed. The formal parameters of a cursor must be IN parameters. Therefore, they cannot return values to actual parameters. Each actual parameter must belong to a datatype compatible with the datatype of its corresponding formal parameter.


5.11.4 Fetch Data from the Cursor


After a cursor has been opened, the SELECT statement associated with the cursor is executed and the active set is created. To retrieve the rows in the active set one row at a time, the rows must be fetched individually from the cursor. After each FETCH statement, the cursor advances to the next row in the active set and retrieves it. Syntax of FETCH is:


FETCH <cursor_name> INTO <variable_name>, <variable_name>, . . . .


where variable name is the name of a variable to which a column value is assigned. For each column value returned by the query associated with the cursor, there must be a corresponding variable in the INTO list. This variable datatype must be compatible with the corresponding database column.

5.11.5 Close the Cursor


After processing the rows in the cursor, it is released with the CLOSE statement. To change the active set in a cursor or the values of the variables referenced in the cursor SELECT statement, the cursor must first be released with the CLOSE statement. Once a cursor is CLOSEd, it can be reOPENed. The CLOSE statement disables the cursor, and the active set becomes undefined. For example, to close Mem_detail, the CLOSE statement will be:
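CLOSE Mem_detail;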

Example


Figures 5.4 and 5.5 show the example of declaring, opening, and fetching the cursor called SALCUR.


Explicit Cursor Attributes


Explicit cursor attributes are used to access useful information about the status of an explicit cursor. Explicit cursors have the same set of cursor attributes: %NOTFOUND, %FOUND, %ROWCOUNT, and %ISOPEN. These attributes can be accessed in PL/SQL statements only, not in SQL statements. The syntax to access an explicit cursor attribute is:
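<cursor_name>%<attribute_name>

For example, Mem_detail%NOTFOUND.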

%Notfound


When a cursor is OPENed, the rows that satisfy the associated query are identified and form the active set. Before the first fetch, %NOTFOUND evaluates to NULL. Rows are FETCHed from the active set one at a time. If the last fetch returned a row, %NOTFOUND evaluates to FALSE. If the last fetch failed to return a row because the active set was empty, %NOTFOUND evaluates to TRUE. FETCH is expected to fail eventually, so when that happens, no exception is raised.


Example


Figures 5.4 and 5.5 show the example for this attribute. In this example, it is used for checking whether all the rows have been fetched or not.


%Found


%FOUND is the logical opposite of %NOTFOUND. After an explicit cursor is open but before the first fetch, %FOUND evaluates to NULL. Thereafter, it evaluates to TRUE if the last fetch returned a row or to FALSE if no row was returned. If a cursor is not open, referencing it with %FOUND raises INVALID_CURSOR exception.

Example


Figures 5.4 and 5.5 show the example for this attribute. In this example, it is used for checking whether the cursor has been opened successfully or not.


%Rowcount


When you open a cursor, %ROWCOUNT is initialized to zero. Before the first fetch, %ROWCOUNT returns a zero. Thereafter, it returns the number of rows fetched so far. The number is incremented if the latest fetch returned a row.

Example


Figures 5.8 and 5.9 show the example of this attribute where cursor updatcur is used.


%Isopen


%ISOPEN evaluates to TRUE if the cursor is open; otherwise, %ISOPEN evaluates to FALSE.


Example


Figures 5.11 and 5.12 show the example of this attribute where cursor updatcur is used.

Using FOR UPDATE and CURRENT OF

The FOR UPDATE clause is used to specify that the rows in the active set of a cursor are to be locked for modification. Locking allows the rows in the active set to be modified exclusively by your program. This prevents simultaneous modification by other transactions until the update by one transaction is complete.

CURSOR <cursor_name> IS SELECT <column_name> […..] FROM…..
FOR UPDATE [OF <column_name> . . . . . .];

FOR UPDATE specifies that the rows of the active set are to be exclusively locked when the cursor is opened and specifies the column names that can be updated. The FOR UPDATE clause must be used in the cursor declaration statement whenever UPDATE or DELETE are to be used after the rows are FETCHed from a cursor.

Syntax of the CURRENT OF clause with the UPDATE statement is:

UPDATE <table_name> SET <column_name> = expression […..]
WHERE CURRENT OF <cursor_name>;

Syntax of CURRENT OF Clause with DELETE Statement is:

DELETE table_name WHERE CURRENT OF cursor_name;

Example

Figures 5.11 and 5.12 show this example, where the row with id E101 is locked for update and the name of the employee is changed to Karthikeyan.

Cursor FOR Loop

PL/SQL provides FOR loop to manage cursors effectively in situations where the rows in the active set of cursor are to be repeatedly processed in a looping manner. A cursor FOR loop simplifies all aspects of processing a cursor. Cursor FOR loop can be used instead of the OPEN, FETCH, and CLOSE statements.

A cursor FOR loop implicitly declares its loop index as a %ROWTYPE record, opens a cursor, repeatedly fetches rows of values from the active set into fields in the record, and closes the cursor when all rows have been processed. Syntax to declare and process a cursor in a cursor FOR loop is:

FOR <record_name> IN <cursor_name> LOOP
. . . . . . . . .
END LOOP;

where record name is the cursor FOR loop index implicitly declared as a record of type %ROWTYPE. Cursor is assumed to be declared in the DECLARE section. In the FOR loop declaration, the FOR loop index is uniquely named and implicitly declared as a record of type %ROWTYPE. This RECORD variable consists of columns referenced in the cursor SELECT statement.

In the FOR loop, the cursor is implicitly opened for processing. No explicit OPEN statement is required. Inside the FOR loop, the column values for each row in the active set can be referenced by the FOR loop index with dot notation in any PL/SQL or SQL statement. Before any iteration of the FOR loop, PL/SQL fetches into the implicitly declared record, which is equivalent to a record declared explicitly. At the end of the active set, the FOR loop implicitly closes the cursor and exits the FOR loop. No explicit CLOSE statement is required. A COMMIT statement is still required to complete the operation. We can pass parameters to a cursor used in a cursor FOR loop. The record is defined only inside the loop. We cannot refer to its fields outside the loop. The sequence of statements inside the loop is executed once for each row that satisfies the query associated with the cursor. On leaving the loop, the cursor is closed automatically. This is true even if an EXIT or GOTO statement is used to leave the loop prematurely or if an exception is raised inside the loop.

Example

Figures 5.13 and 5.14 show the example of cursor execution using FOR loop.
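A minimal sketch of such a loop (the table emp and the columns ename and sal are assumed; the actual program appears in Figs. 5.13 and 5.14):

DECLARE
   CURSOR emp_cur IS SELECT ename, sal FROM emp;
BEGIN
   FOR emp_rec IN emp_cur LOOP
      DBMS_OUTPUT.PUT_LINE(emp_rec.ename || ' ' || emp_rec.sal);
   END LOOP;
END;
/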

5.12 Procedure

A procedure is a subprogram that performs some specific task and is stored in the data dictionary. A procedure must have a name, so that it can be invoked or called by any PL/SQL program that appears within an application. Procedures can take parameters from the calling program and perform the specific task. Before the procedure or function is stored, the Oracle engine parses and compiles it. When a procedure is created, Oracle automatically performs the following steps:

  1. Compiles the procedure
  2. Stores the procedure in the data dictionary

If an error occurs during creation of a procedure, Oracle displays a message that the procedure was created with compilation errors, but it does not display the errors. To see the errors, the following statement is used:

SELECT * FROM user_errors;

When the procedure is invoked, Oracle loads the compiled procedure into the memory area called the system global area (SGA). Once it is loaded in the SGA, other users can also access the same procedure, provided they have been granted permission for this.

Benefits of Procedures and Functions

Stored procedures and functions have many benefits in addition to modularizing application development.

  1. Modifying one routine affects multiple applications.
  2. Modifying one routine eliminates duplicate testing.
  3. It ensures that related actions are performed together, or not at all, by funneling the activity through a single path.
  4. It avoids PL/SQL parsing at runtime by parsing at compile time.
  5. It reduces the number of calls to the database and decreases database network traffic by bundling the commands.

Defining and Creating Procedures

A procedure consists of two parts: the specification and the body. The specification starts with the keyword PROCEDURE and ends with the parameter list or the procedure name. A procedure may or may not accept parameters. Procedures that do not accept parameters are written without parentheses.

The procedure body starts with the keyword IS and ends with keyword END. The procedure body is further subdivided into three parts:

  1. Declarative part which consists of local declarations placed between keywords IS and BEGIN.
  2. Executable part, which consists of actual logic of the procedure, included between keywords BEGIN and EXCEPTION. At least one executable statement is a must in the executable portion of a procedure. Even a single NULL statement will do the job.
  3. Error/Exception handling part, an optional part placed between EXCEPTION and END.

The syntax for creating a procedure is as follows:
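CREATE [OR REPLACE] PROCEDURE [schema.]<procedure_name>
   [(argument [IN | OUT | IN OUT] datatype, . . .)] IS
   [local declarations]
BEGIN
   <executable statements>;
[EXCEPTION
   <exception handlers>]
END [<procedure_name>];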

Create: Creates a new procedure; if a procedure of the same name already exists, it gives an error.

Replace: Creates a procedure; if a procedure of the same name already exists, it replaces the older one with the new procedure definition.

Schema: If the schema is not specified then procedure is created in user’s current schema.

Figure 5.15 shows the procedure to raise the salary of the employee. The name of the procedure is raise_sal.

Argument: It is the name of the argument to the procedure.

IN: Specifies that a value for the argument must be specified when calling the procedure.

OUT: Specifies that the procedure passes a value for this argument back to its calling environment after execution.

IN OUT: Specifies that a value for the argument must be specified when calling the procedure and that the procedure passes a value for this argument back to its calling environment after execution. If no mode is specified, the default mode IN is assumed.

Datatype: It is the unconstrained datatype of an argument. It supports any data type supported by PL/SQL. No constraints, like size constraints or NOT NULL constraints, can be imposed on the data type. However, a size constraint can be imposed indirectly.

Example

To raise the salary of an employee, we can write a procedure as follows.
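A minimal sketch of such a procedure (the exact code appears in Fig. 5.15; the table emp, the columns empno and sal, and the parameter names are assumed here):

CREATE OR REPLACE PROCEDURE raise_sal
   (p_empno IN NUMBER, p_amount IN NUMBER) IS
BEGIN
   UPDATE emp
   SET sal = sal + p_amount
   WHERE empno = p_empno;
END raise_sal;
/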

Declaring Subprograms

Subprograms can be declared inside any valid PL/SQL block. The only thing to be kept in mind is the declaration of programs must be the last part of declarative section of any PL/SQL block; all other declarations should precede the subprogram declarations.

Like any other programming language, PL/SQL requires that any identifier used in a PL/SQL program be declared before its use. To avoid problems arising when a subprogram is referenced before it is declared, forward declarations are used.

System and Object Privileges for Procedures

The creator of a procedure must have the CREATE PROCEDURE system privilege to create a procedure in his own schema. To create a procedure in another user's schema, the creator must have the CREATE ANY PROCEDURE system privilege.

To create a procedure without errors (i.e., to compile it without errors), the creator of the procedure must have the required privileges on all the objects he refers to from his procedure. It must be noted that the owner does not get the required privileges through roles; he must be granted those privileges explicitly.

As soon as the privileges granted to the owner of procedure change, the procedure must be reauthenticated in order to bring into picture the new privileges of the owner. If a necessary privilege to an object referenced by a procedure is revoked/withdrawn from the owner of the procedure, the procedure cannot be run.

To execute any procedure, a user must have the EXECUTE ANY PROCEDURE privilege. With this privilege he can execute a procedure which belongs to some other user.

Executing/Invoking a Procedure

The syntax used to execute a procedure depends on the environment from which the procedure is being called. From within SQLPLUS, a procedure can be executed by using the EXECUTE command, followed by the procedure name. Any arguments to be passed to the procedure must be enclosed in parentheses following the procedure name.

Example

Figure 5.16 shows the execution of procedure raise_sal.
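For instance, assuming the two-argument version of raise_sal sketched above, the call from SQL*Plus might look like:

SQL> EXECUTE raise_sal(7369, 500);
PL/SQL procedure successfully completed.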

Removing a Procedure

To remove a procedure completely from the database, the following command is used:
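DROP PROCEDURE <procedure_name>;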

To remove a procedure, one must own the procedure he is dropping or he must have DROP ANY PROCEDURE privilege.

Example

To drop the procedure raise_sal, see Fig. 5.17, which shows the dropping of the procedure.

5.13 Function

A function is similar to a procedure except that it must return one and only one value to the calling program. Besides this, a function can be used as part of an SQL expression, whereas a procedure cannot.

Difference Between Function and Procedure

Before we look at functions in depth, let us first discuss the major differences between a function and a procedure.

  1. A procedure never returns a value to the calling portion of code, whereas a function returns exactly one value to the calling program.
  2. As functions are capable of returning a value, they can be used as elements of SQL expressions, whereas the procedures cannot. However, user-defined functions cannot be used in CHECK or DEFAULT constraints and cannot manipulate database values, to obey function purity rules.
  3. It is mandatory for a function to have at least one RETURN statement, whereas for procedures there is no restriction. A procedure may have a RETURN statement or may not. In case of procedures with RETURN statement, simply the control of execution is transferred back to the portion of code that called the procedure.

The exact syntax for defining a function is given below:
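CREATE [OR REPLACE] FUNCTION [schema.]<function_name>
   [(argument [IN] datatype, . . .)]
RETURN datatype IS
   [local declarations]
BEGIN
   <executable statements>;
   RETURN <expression>;
[EXCEPTION
   <exception handlers>]
END [<function_name>];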

where RETURN datatype is the datatype of the function’s return value. It can be any PL/SQL datatype.

Thus a function has two parts: function specification and function body. The function specification begins with keyword FUNCTION and ends with RETURN clause which indicates the datatype of the value returned by the function. Function body is enclosed between the keywords IS and END. Sometimes END is followed by function name, but this is optional. Like procedure, a function body also is composed of three parts: declarative part, executable part, and an optional error/exception handling part.

At least one return statement is a must in a function; otherwise PL/SQL raises PROGRAM_ERROR exception at the run time. A function can have multiple return statements, but can return only one value. In procedures, return statement cannot contain any expression, it simply returns control back to the calling code. However in functions, return statement must contain an expression, which is evaluated and sent to the calling code.

Example

To get the salary of an employee, Fig. 5.18 shows a function.
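A minimal sketch of such a function (the exact code appears in Fig. 5.18; the function name, the table emp, and the columns empno and sal are assumed here):

CREATE OR REPLACE FUNCTION get_sal (p_empno IN NUMBER)
RETURN NUMBER IS
   v_sal NUMBER(8,2);
BEGIN
   SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
   RETURN v_sal;
END get_sal;
/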
Figure 5.19 shows how calling a function is different from calling a procedure.

Purity of a Function

For a function to be eligible for being called in SQL statements, it must satisfy the following requirements, which are known as Purity Rules.

  1. When called from a SELECT statement or a parallelized INSERT, UPDATE, or DELETE statement, the function cannot modify any database tables.
  2. When called from an INSERT, UPDATE, or DELETE statement, the function cannot query or modify any database tables modified by that statement.

  3. When called from a SELECT, INSERT, UPDATE, or DELETE statement, the function cannot execute SQL transaction control statements (such as COMMIT), session control statements (such as SET ROLE), or system control statements (such as ALTER SYSTEM). Also, it cannot execute DDL statements (such as CREATE) because they are followed by an automatic commit.

If any of the above rules is violated, the function is said to be not following the Purity Rules and the program using such functions receives run time error.

Removing a Function

To remove a function, use the following command:
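DROP FUNCTION <function_name>;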

Example

Figure 5.20 illustrates the dropping of a function.

To remove a function, one must own the function to be dropped or he must have DROP ANY FUNCTION privilege.

Parameters

Parameters are the link between a subprogram's code and the code calling the subprogram. A lot depends on how the parameters are passed to a subprogram. Hence it is absolutely necessary to know more about parameters, their modes, their default values, and how subprograms can be called without passing all the parameters.

Parameter Modes

Parameter modes define the behavior of formal parameters of subprograms. There are three types of parameter modes: IN, OUT, IN/OUT.

IN Mode

IN mode is used to pass values to the called subprogram. In short this is an input to the called subprogram. Inside the called subprogram, an IN parameter acts like a constant and hence it cannot be assigned a new value.

The IN parameter in actual parameter list can be a constant, literal, initialized variable, or an expression. IN parameters can be initialized to default values, which is not the case with IN/OUT or OUT parameters.

It is important to note that IN mode is the default mode of the formal parameters. If we do not specify the mode of a formal parameter it will be treated as an IN mode parameter.

OUT Mode

An OUT parameter returns a value back to the caller subprogram. Inside the subprogram, the parameter specified with OUT mode acts just like any locally declared variable. Its value can be changed or referenced in expressions, just like any other local variables.

The points to be noted for an OUT parameter are:

  1. The parameter (in the actual argument list) corresponding to an OUT parameter must be a variable; it cannot be a constant or literal.
  2. Formal OUT parameters are by default initialized to NULL, so we cannot constrain formal OUT parameters with a NOT NULL constraint.
  3. The parameter (in the actual argument list) corresponding to an OUT parameter can have a value before a call to the subprogram, but the value is lost as soon as a call is made to the subprogram.

IN/OUT

An IN/OUT parameter performs the duty of both an IN parameter and an OUT parameter. It first passes an input value (through the actual argument) to the called subprogram, and then inside the subprogram it receives a new value, which is finally assigned to the actual parameter. In short, inside the called subprogram, the IN/OUT parameter behaves just like an initialized local variable.

Like an OUT parameter, the parameter in the actual argument list that corresponds to an IN/OUT parameter must be a variable; it cannot be a constant or an expression. If the subprogram exits successfully, PL/SQL assigns values to the actual parameters; however, if the subprogram exits with an unhandled exception, PL/SQL does not assign values to the actual parameters.

5.14 Packages

A package can be defined as a collection of related program objects such as procedures, functions, and associated cursors and variables, stored together as a unit in the database. In simpler terms, a package is a group of related procedures and functions stored together and sharing common variables, as well as local procedures and functions. A package contains two separate parts: the package specification and the package body. The package specification and package body are compiled separately and stored in the data dictionary as two separate objects. The package body is optional and need not be created if the package specification does not contain any procedures or functions. Applications or users can call packaged procedures and functions explicitly, similar to standalone procedures and functions.

Advantages of Packages

Packages offer a lot of advantages. They are as follows.

  1. Stored packages allow us to group logically related stored procedures, variables, datatypes, and so forth in a single named, stored unit in the database. This provides for better organization during the development process. In other words, a package and its modules are easily understood because of their logical grouping.
  2. Grouping of related procedures, functions, etc. in a package also make privilege management easier. Granting the privilege to use a package makes all components of the package accessible to the grantee.
  3. Package helps in achieving data abstraction. Package body hides the details of the package contents and the definition of private program objects so that only the package contents are affected if the package body changes.
  4. An entire package is loaded into memory when a procedure within the package is called for the first time. This load is completed in one operation, as opposed to the separate loads required for standalone procedures. Therefore, when calls to related packaged procedures occur, no disk I/O is necessary to execute the compiled code already in memory. This results in faster and efficient operation of programs.
  5. Packages provide better performance than standalone stored procedures and functions because public package variables persist in memory for the duration of a session, so they can be accessed by all procedures and functions that reference them.
  6. Packages allow overloading of their member modules. More than one function in a package can have the same name. The functions are differentiated depending upon the type and number of parameters they take.

Units of Packages

As described earlier, a package is used to store together the logically related PL/SQL units. In general, the following units constitute a package.

  • – Procedures
  • – Functions
  • – Triggers
  • – Cursors
  • – Variables

Parts of Package

A Package has two parts. They are:

  • – Package specification
  • – Package body

Package Specification

The specification declares the types, variables, constants, exceptions, cursors, and subprograms that are public and thus available for use outside the package. If the package specification declares only types, constants, exceptions, or variables, there is no need for a package body, because the specification is sufficient for them. A package body is required when the specification declares subprograms such as cursors, functions, and procedures.

Package Body

The package body fully defines subprograms such as cursors, functions, and procedures. All the private declarations of the package are included in the package body. It implements the package specification. A package specification and the package body are stored separately in the database. This allows calling objects to depend on the specification only, not on both. This separation enables one to change the definition of a program object in the package body without causing Oracle to invalidate other objects that call or reference the program object. Oracle invalidates the calling object only if the package specification is changed.

Creating a Package

A package consists of package specification and package body. Hence creation of a package involves creation of the package specification and then creation of the package body.

The package specification is declared using the CREATE PACKAGE command.

The syntax for package specification declaration is as follows.
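CREATE [OR REPLACE] PACKAGE <package_name> IS
   -- public type, constant, variable, exception, cursor,
   -- and subprogram declarations
END [<package_name>];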

All the procedures, subprograms, and cursors declared in the CREATE PACKAGE command are described and implemented fully in the package body, along with the private members. The syntax for declaring a package body is as follows:
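CREATE [OR REPLACE] PACKAGE BODY <package_name> IS
   -- private declarations
   -- full definitions of the cursors and subprograms
   -- declared in the package specification
END [<package_name>];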

Member functions and procedures declared in a package can be made either public or private. Use of all the private members of the package is restricted to within the package, while the public members of the package can be accessed and used outside the package.

Referencing Package Subprograms

Once the package is created, its public members can be accessed from outside the package. To access these members outside the package we have to use the dot operator, prefixing the member name with the package name. The syntax for referencing any member object is as follows:
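Here package_name and member_name are placeholders for the actual names:

package_name.member_name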

To reference procedures we have to use the syntax as follows:
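For example, from SQL*Plus (package_name, procedure_name, and the arguments are placeholders):

EXECUTE package_name.procedure_name (argument1, argument2, ...);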

Within the package itself, however, a member can be referenced by its name alone. Moreover, the EXECUTE command is not required when procedures are called from within PL/SQL. Functions are referenced from outside the package in the same way as procedures, using the dot operator.

Public and Private Members of a Package

A package can consist of public as well as private members. Public members are those members which are accessible outside the package, whereas the private members are accessible only from within the package. Private members are just like local members, which are not visible outside the enclosing code block (in this case, the package).

The place where a package member is declared, also matters in deciding the visibility of that member. Those members whose declaration is found in the package specification are the public members. The package members that are not declared in the package specification but directly defined in the package body become the private members.

Viewing Existing Procedural Objects

The source code for the existing procedures, functions, and packages can be queried from the following data dictionary views.
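The relevant views are USER_SOURCE (objects owned by the current user), ALL_SOURCE (objects accessible to the current user), and DBA_SOURCE (all objects in the database). For example, to list the source of the hypothetical package body created earlier:

SELECT line, text
FROM   user_source
WHERE  name = 'EMP_PKG'
AND    type = 'PACKAGE BODY'
ORDER  BY line;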

Removing a Package

A package can be dropped from the database just like any other database object. The exact syntax of the command used for dropping a package is:
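The first form drops both the specification and the body; the second, optional form drops only the body and leaves the specification in place:

DROP PACKAGE <package_name>;
DROP PACKAGE BODY <package_name>;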

To drop a package, a user must either own the package or have the DROP ANY PACKAGE privilege.

5.15 Exceptions Handling

During execution of a PL/SQL block of code, Oracle executes every SQL sentence within the PL/SQL block. If an error occurs or an SQL sentence fails, Oracle considers this an exception. The Oracle engine immediately tries to handle the exception and resolve it by invoking a built-in exception handler.

Introduction to Exceptions

One can define an EXCEPTION as any error or warning condition that arises during runtime. The main intention of the EXCEPTION technique is to continue the processing of a program even when it encounters a runtime error or warning, and to display suitable messages on the console so that the user can handle those conditions next time.

In the absence of exceptions, a program will exit abnormally whenever a runtime error occurs (unless error checking is disabled). With exceptions, when an error situation does occur, the exception handler flags an appropriate error or warning message, continues the execution of the program, and finally exits the program successfully.

An exception handler is a code block in memory that attempts to resolve the current exception condition. To handle very common and repetitive exception conditions, Oracle has about 20 named exception handlers. In addition, for other exception conditions, Oracle has about 20,000 numbered exception handlers, which are identified by an error number preceded by a hyphen. Each exception handler, irrespective of how it is identified (by name or by number), has code attached to it that attempts to resolve the exception condition. This is how Oracle's internal exception handling strategy works.

Oracle's internal exception handling code can be overridden. When this is done, Oracle's internal exception handling code is not executed; instead, the code block that takes care of the exception condition in the exception section of the PL/SQL block is executed. When an exception is raised, control passes to the exception section of the PL/SQL block from which the exception condition was raised, if such a section exists. The exception section is scanned for a WHEN clause whose name matches the raised exception; if a match is found, the code contained in the THEN part of that construct is executed. The syntax follows:
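In outline, the exception section looks like this (exception_name stands for whichever named exception is being handled; WHEN OTHERS is an optional catch-all):

EXCEPTION
   WHEN exception_name THEN
      -- statements that attempt to resolve the exception condition
   WHEN OTHERS THEN
      -- statements executed for any exception not named above
END;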

Exceptions can be internally defined (by the run-time system) or user defined. Internally defined exceptions are raised implicitly (automatically) by the run-time system. User-defined exceptions must be raised explicitly by RAISE statements, which can also raise internally defined exceptions. Raised exceptions are handled by separate routines called exception handlers. After an exception handler runs, the current block stops executing and the enclosing block resumes with the next statement. If there is no enclosing block, control returns to the host environment.

Advantages of Using Exceptions

  1. Control over abnormal exits of executing programs on encountering error conditions, hence the behavior of application becomes more reliable.
  2. Meaningful messages can be flagged so that the developer can become aware of error and warning conditions and act upon them.
  3. In a traditional error checking system, if the same error is to be checked at several places, you are required to code the same error check at all those places. With the exception handling technique, we write the exception for that particular error only once in the entire code; whenever that type of error occurs at any place in the code, the exception handler will automatically raise the defined exception.
  4. Being a part of PL/SQL, exceptions can be coded at suitable places and kept isolated, like procedures and functions. This improves the overall readability of a PL/SQL program.
  5. Oracle's internal exception mechanism, combined with user-defined exceptions, considerably reduces the development effort required for cumbersome error handling.

Predefined and User-Defined Exceptions

As discussed earlier, there are some predefined or internal exceptions, and a developer can also code user-defined exceptions according to his requirements. The next sections look closely at these two types of exceptions.

Internally (Predefined) Defined Exceptions

An internal exception is raised implicitly whenever a PL/SQL program violates an Oracle rule or exceeds a system-dependent limit. Every Oracle error has a number, but exceptions must be handled by name. So, PL/SQL predefines a name for some common errors to raise them as exception. For example, if a SELECT INTO statement returns no rows, PL/SQL raises the predefined exception NO_DATA_FOUND, which has the associated Oracle error number ORA-01403.

Example

Figure 5.21 shows the internally defined exception NO_DATA_FOUND, raised when we try to get the salary of an employee who is not in the employee table.

If we execute this query with some employee name, say "XYZ," as input and the emp_name column of the employee table does not contain the value "XYZ," Oracle's internal exception handling mechanism will raise the NO_DATA_FOUND exception even though we have not coded for it.
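A minimal sketch of this kind of block is shown below; the table name employee and the column names emp_name and salary are taken from the surrounding description, and the exact code in Fig. 5.21 may differ:

DECLARE
   v_salary employee.salary%TYPE;
BEGIN
   SELECT salary INTO v_salary
   FROM   employee
   WHERE  emp_name = 'XYZ';
   DBMS_OUTPUT.PUT_LINE('Salary: ' || v_salary);
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      DBMS_OUTPUT.PUT_LINE('No employee with that name exists');
END;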

PL/SQL declares predefined exceptions globally in the package STANDARD, which defines the PL/SQL environment. Some of the commonly used exceptions are as follows:

ACCESS_INTO_NULL: Your program attempts to assign values to the attributes of an uninitialized (atomically null) object.
COLLECTION_IS_NULL: Your program attempts to apply collection methods other than EXISTS to an uninitialized (atomically null) nested table or varray, or attempts to assign values to the elements of an uninitialized nested table or varray.
CURSOR_ALREADY_OPEN: Your program attempts to open an already open cursor. A cursor must be closed before it can be reopened. A cursor FOR loop automatically opens the cursor to which it refers, so your program cannot open that cursor inside the loop.
DUP_VAL_ON_INDEX: Your program attempts to store duplicate values in a database column that is constrained by a unique index.
INVALID_CURSOR: Your program attempts an illegal cursor operation such as closing an unopened cursor.
INVALID_NUMBER: In a SQL statement, the conversion of a character string into a number fails because the string does not represent a valid number. (In procedural statements, VALUE_ERROR is raised.)
LOGIN_DENIED: Your program attempts to log on to Oracle with an invalid username and/or password.
NO_DATA_FOUND: A SELECT INTO statement returns no rows, or your program references a deleted element in a nested table or an uninitialized element in an index-by table. SQL aggregate functions such as AVG and SUM always return a value or a null, so a SELECT INTO statement that calls an aggregate function never raises NO_DATA_FOUND. The FETCH statement is expected to return no rows eventually, so when that happens, no exception is raised.
NOT_LOGGED_ON: Your program issues a database call without being connected to Oracle.
ROWTYPE_MISMATCH: The host cursor variable and PL/SQL cursor variable involved in an assignment have incompatible return types. For example, when an open host cursor variable is passed to a stored subprogram, the return types of the actual and formal parameters must be compatible.
PROGRAM_ERROR: PL/SQL has an internal problem.
SELF_IS_NULL: Your program attempts to call a MEMBER method on a null instance. That is, the built-in parameter SELF (which is always the first parameter passed to a MEMBER method) is null.
STORAGE_ERROR: PL/SQL runs out of memory or memory has been corrupted.
SUBSCRIPT_BEYOND_COUNT: Your program references a nested table or varray element using an index number larger than the number of elements in the collection.
SUBSCRIPT_OUTSIDE_LIMIT: Your program references a nested table or varray element using an index number (-1, for example) that is outside the legal range.
SYS_INVALID_ROWID: The conversion of a character string into a universal rowid fails because the character string does not represent a valid rowid.
TIMEOUT_ON_RESOURCE: A time-out occurs while Oracle is waiting for a resource.
TOO_MANY_ROWS: A SELECT INTO statement returns more than one row.
VALUE_ERROR: An arithmetic, conversion, truncation, or size-constraint error occurs. For example, when your program selects a column value into a character variable, if the value is longer than the declared length of the variable, PL/SQL aborts the assignment and raises VALUE_ERROR. In procedural statements, VALUE_ERROR is raised if the conversion of a character string into a number fails. (In SQL statements, INVALID_NUMBER is raised.)
ZERO_DIVIDE: Your program attempts to divide a number by zero.

User Defined Exceptions

Unlike internally defined exceptions, user-defined exceptions must be declared and raised explicitly by RAISE statements. Exceptions can be declared only in the declarative part of a PL/SQL block, subprogram, or package. An exception is declared by introducing its name, followed by the keyword EXCEPTION.

The syntax is as follows:
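In its simplest form the declaration is a single line, where <exception_name> is any identifier chosen by the developer:

<exception_name> EXCEPTION;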

Exceptions are declared in the same way as variables. However, exceptions are not data items, so they cannot be used in assignments or in SQL statements. The visibility of exceptions is governed by the same scope rules that apply to variables.

Raising User-Defined and Internal Exceptions

A statement of the form "RAISE Exception1" is used to explicitly raise the exception "Exception1." Unlike internally defined exceptions, which are raised automatically by Oracle's run-time engine, user-defined exceptions have to be raised explicitly using the RAISE statement. However, it is always possible to RAISE predefined (internally defined) exceptions, if needed, in the same way as user-defined exceptions, as illustrated in Fig. 5.22.

Example

Create a table as follows,

CREATE TABLE ROOM_STATUS (
   ROOM_NO     NUMBER(5) PRIMARY KEY,
   CAPACITY    NUMBER(2),
   ROOMSTATUS  VARCHAR2(20),
   RENT        NUMBER(4),
   CHECK (ROOMSTATUS IN ('VACANT', 'BOOKED')));
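Using this table, a block along the lines of Fig. 5.22 might declare and raise a user-defined exception as sketched below; the exception name room_booked and the room number 101 are illustrative assumptions, and the exact code in the figure may differ:

DECLARE
   room_booked EXCEPTION;
   v_status    room_status.roomstatus%TYPE;
BEGIN
   SELECT roomstatus INTO v_status
   FROM   room_status
   WHERE  room_no = 101;
   IF v_status = 'BOOKED' THEN
      RAISE room_booked;                     -- user-defined exception raised explicitly
   END IF;
   DBMS_OUTPUT.PUT_LINE('Room Available');
EXCEPTION
   WHEN room_booked THEN
      DBMS_OUTPUT.PUT_LINE('The room is already booked');
   WHEN NO_DATA_FOUND THEN                   -- a predefined exception handled in the same block
      DBMS_OUTPUT.PUT_LINE('No such room exists');
END;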

User-Defined Error Reporting – Use of Raise_Application_Error

RAISE_APPLICATION_ERROR lets us display the messages we want whenever an error condition occurs. RAISE_APPLICATION_ERROR associates an Oracle error number with the message we define. The syntax for RAISE_APPLICATION_ERROR is as follows:
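Here error_number, error_message, and the optional third argument are placeholders whose meanings are explained below:

RAISE_APPLICATION_ERROR (<error_number>, <error_message> [, TRUE/FALSE]);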

Figures 5.23 and 5.24 show the output for the two conditions 'Room Available' and 'Vacant'.

The error number is the user-defined Oracle error number (in the range -20000 to -20999) that we want to associate with the message (at most 2,048 bytes). TRUE/FALSE indicates whether to place the error message on the previous error stack (TRUE) or to replace all the errors with this message (FALSE).

RAISE_APPLICATION_ERROR can be called only from an executing subprogram. As soon as the subprogram encounters RAISE_APPLICATION_ERROR, the subprogram returns control back to the calling PL/SQL code thereby displaying the error message. We can handle the exception raised in the calling portion of PL/SQL block.

Example

Figure 5.25 illustrates the use of the RAISE_APPLICATION_ERROR command in a procedure named get_emp_name.
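A procedure along those lines might be sketched as follows; an EMP table with EMPNO and ENAME columns is assumed for illustration, and the exact code in Fig. 5.25 may differ:

CREATE OR REPLACE PROCEDURE get_emp_name (p_empno IN NUMBER) IS
   v_name emp.ename%TYPE;
BEGIN
   SELECT ename INTO v_name
   FROM   emp
   WHERE  empno = p_empno;
   DBMS_OUTPUT.PUT_LINE('Employee name: ' || v_name);
EXCEPTION
   WHEN NO_DATA_FOUND THEN
      RAISE_APPLICATION_ERROR(-20010, 'No employee exists with number ' || p_empno);
END get_emp_name;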

5.16 Database Triggers

A database trigger is a stored PL/SQL program unit associated with a specific database table. It can perform the role of a constraint, enforcing the integrity of data, and it is a practical way to implement routines that guarantee data integrity. Unlike stored procedures or functions, which have to be invoked explicitly, triggers fire implicitly whenever the table is affected by a SQL operation. For any event that causes a change in the contents of a table, a user can specify an associated action that the DBMS should carry out. Triggers follow the Event-Condition-Action (ECA) scheme.

Privileges Required for Triggers

Creation or alteration of a TRIGGER on a specific table requires TRIGGER privileges as well as table privileges. They are:

  1. To create a trigger in one's own schema, one must have the CREATE TRIGGER privilege. To create a trigger in another user's schema, one must have the CREATE ANY TRIGGER system privilege.
  2. To create a trigger on a table, one must own the table, or have the ALTER privilege for that table, or have the ALTER ANY TABLE privilege.
  3. To alter a trigger, one must own that trigger or have the ALTER ANY TRIGGER privilege. Also, since the trigger operates on some table, one also requires the ALTER privilege on that table or the ALTER ANY TABLE privilege.
  4. To create a TRIGGER on any database level event, one must have ADMINISTER DATABASE TRIGGER system privilege.

Context to Use Triggers

Following are the situations to use the triggers efficiently:

  • – Use triggers to guarantee that when a specific operation is performed, related actions are performed.
  • – Do not define triggers that duplicate the functionality already built into Oracle. For example, do not define triggers to enforce data integrity rules that can be easily enforced using declarative integrity constraints.
  • – Limit the size of triggers. If the logic for our trigger requires much more than 60 lines of PL/SQL code, then it is better to include most of the code in a stored procedure and call the procedure from the trigger.
  • – Use triggers only for centralized, global operations that should be fired for the triggering statement, regardless of which user or database application issues the statement.
  • – Do not create recursive triggers which cause the trigger to fire recursively until it has run out of memory.
  • – Use triggers on DATABASE judiciously. They are executed for every user every time the event occurs on which the trigger is created.

Uniqueness of Trigger

Different types of integrity constraints provide a declarative mechanism to associate “simple” conditions with a table such as a primary key, foreign keys, or domain constraints. Complex integrity constraints that refer to several tables and attributes cannot be specified within table definitions. Triggers, in contrast, provide a procedural technique to specify and maintain integrity constraints.

Triggers even allow users to specify more complex integrity constraints since a trigger essentially is a PL/SQL procedure. Such a procedure is associated with a table and is automatically called by the database system whenever a certain modification (event) occurs on that table.

Simply we can say that trigger is less declarative and more procedural type constraint enforcement. Triggers are used generally to implement business rules in the database. It is the major difference between Triggers and Integrity Constraints.

Create Trigger Syntax

The Create trigger syntax is as follows:

CREATE [OR REPLACE] TRIGGER <trigger_name>
[BEFORE/AFTER/INSTEAD OF]
[INSERT/UPDATE/DELETE [OF column, ...]] ON <table_name>
[REFERENCING [OLD [AS] <old_name>] [NEW [AS] <new_name>]]
[FOR EACH STATEMENT/FOR EACH ROW]
[WHEN <condition>]
[BEGIN
-- PL/SQL block
END];

This syntax can be explained as follows.

Parts of Trigger

A trigger has three basic parts:

  • – A triggering event or statement
  • – A trigger restriction
  • – A trigger action

Trigger Event or Statement

A triggering event or statement is the SQL statement, database event, or user event like update, delete, insert, etc. that causes a trigger to be fired. It also specifies the table to which the trigger is associated. Trigger statement or an event can be any of the following:

  1. INSERT, UPDATE, or DELETE on a specific table or view.
  2. CREATE, ALTER, or DROP on any schema object.
  3. Database activities like startup and shutdown.
  4. User activities like logon and logoff.
  5. A specific error message or any error message.

Figure 5.26 shows a database application with some SQL statements that implicitly fire several triggers stored in the database. It shows three triggers, which are associated with the INSERT, UPDATE, and DELETE operation in the database table. When these data manipulation commands are given, the corresponding trigger gets automatically fired performing the task described in the corresponding trigger body.

Trigger Restriction

A trigger restriction is any logical expression whose outcome is TRUE/FALSE/UNKNOWN. For a trigger to fire, this logical expression must evaluate to TRUE. Typically, a restriction is a part of trigger declaration that follows the keyword WHEN.

Trigger Action

A trigger action is the PL/SQL block that contains the SQL statements and code to be executed when a triggering statement is issued and the trigger restriction evaluates to TRUE. It is also called the trigger body. Like stored procedures, a trigger action can contain SQL and PL/SQL.

Following statements will explain the various keywords used in the syntax.

BEFORE and AFTER keyword indicates whether the trigger should be executed before or after the trigger event, where a triggering event can be INSERT, UPDATE, or DELETE. Any combination of triggering events can be included in the same database trigger.

When referring to the old and new values of columns, we can use the defaults (OLD and NEW) or we can use the REFERENCING clause to specify other names. The FOR EACH ROW clause causes the trigger to fire once for each record created, deleted, or modified by the triggering statement. When working with row triggers, the WHEN clause can be used to restrict the records for which the trigger fires.

We can use INSTEAD OF triggers to tell the database what to do instead of performing the actions that invoked the trigger. For example, we can use it on a VIEW to redirect the inserts into a table or to update multiple tables that are parts of the view.

5.17 Types of Triggers

The type of trigger firing, the level at which a trigger is executed, and the types of events form the basis for classifying triggers into different categories. This section describes the different types of triggers. The broad classification of triggers is shown below.

On the Basis of Type of Events

  • – Triggers on System events
  • – Trigger on User events

On the Basis of the Level at which Triggers are Executed

  • – Row Level Triggers
  • – Statement Level Triggers

On the Basis of Type of Trigger/Firing or Triggering Transaction

  • – BEFORE Triggers
  • – AFTER Triggers
  • – INSTEAD OF

Triggers on System Events

System events that can fire triggers are related to instance startup and shutdown and error messages. Triggers created on startup and shutdown events have to be associated with the database; triggers created on error events can be associated with the database or with a schema.

BEFORE Triggers

BEFORE triggers execute the trigger action before the triggering statement is executed. They are used to derive specific column values before a triggering DML or DDL statement completes, or to determine whether the triggering statement should be allowed to complete.

Example

We can define a BEFORE trigger on the passenger_det table that fires before the deletion of any row. The trigger checks the system date, and if the day is Sunday, it does not allow any deletion on the table.

The trigger can be created in Oracle as shown in Fig. 5.27.

The trigger action can be shown as in Fig. 5.28.

As soon as we try to delete a record from the passenger_det table, the above trigger fires; because the SUNDAY_EXP exception is raised, all the changes are rolled back and the record is not deleted.
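A trigger along the lines described above might be written as follows; the table name passenger_det and the exception name sunday_exp are assumed from the text, and the code shown in Fig. 5.27 may differ:

CREATE OR REPLACE TRIGGER passenger_bef_del
BEFORE DELETE ON passenger_det
DECLARE
   sunday_exp EXCEPTION;
BEGIN
   IF TO_CHAR(SYSDATE, 'DY') = 'SUN' THEN
      RAISE sunday_exp;
   END IF;
EXCEPTION
   WHEN sunday_exp THEN
      RAISE_APPLICATION_ERROR(-20001, 'Deletions are not allowed on Sunday');
END;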

AFTER Triggers

AFTER triggers execute the trigger action after the triggering statement is executed. AFTER triggers are used when we want the triggering statement to complete before executing the trigger action, or to execute some additional logic to the before trigger action.

Example

We can define an AFTER trigger on the reserv_det table that fires every time a row is deleted from the table. This trigger determines the passenger_id of the deleted row and subsequently deletes the corresponding row from the passenger_det table with the same passenger_id.

Trigger can be created as shown in Fig. 5.29

The trigger action is shown in Fig. 5.30. In this figure, the contents of the relations passenger_det and reserv_det are shown before and after the triggering event.
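A sketch of such a trigger, assuming the table and column names used in the text (reserv_det, passenger_det, and passenger_id), is:

CREATE OR REPLACE TRIGGER reserv_aft_del
AFTER DELETE ON reserv_det
FOR EACH ROW
BEGIN
   DELETE FROM passenger_det
   WHERE  passenger_id = :OLD.passenger_id;
END;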

Triggers on LOGON and LOGOFF Events

LOGON and LOGOFF triggers can be associated with the database or with a schema. Their attributes include the system event and username, and they can specify simple conditions on USERID and USERNAME.

  • – LOGON triggers fire after a successful logon of a user.
  • – LOGOFF triggers fire at the start of a user logoff.

Example

Let us create a trigger on the LOGON event called pub_log, which stores the number of logins, the date, and the user name for each user who logs in to that particular database. The trigger stores this information in a table called log_detail. The table log_detail must be created before the trigger, by logging in as the administrator. The trigger can be created as shown in Fig. 5.31.
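A sketch of such a trigger is given below; the log_detail column names (log_user, log_date, log_times) and the counting logic are assumptions made for illustration, and Fig. 5.31 may differ:

CREATE OR REPLACE TRIGGER pub_log
AFTER LOGON ON DATABASE
BEGIN
   UPDATE log_detail
   SET    log_times = log_times + 1,
          log_date  = SYSDATE
   WHERE  log_user  = USER;
   IF SQL%ROWCOUNT = 0 THEN
      INSERT INTO log_detail (log_user, log_date, log_times)
      VALUES (USER, SYSDATE, 1);
   END IF;
END;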

After logging in as another user, the content of the relation log_detail shows who has logged in to the database. The value of the attribute log_times goes on increasing with every login into the database, as indicated in Fig. 5.32.

Note: The log_detail relation is visible only in the administrator login.

Triggers on DDL Statements

This trigger gets fired when a DDL statement such as CREATE, ALTER, or DROP is issued. DDL triggers can be associated with the database or with a schema. Moreover, depending on the time of firing, this trigger can be classified into BEFORE and AFTER. Hence the triggers on DDL statements can be as follows:

  • – BEFORE CREATE and AFTER CREATE triggers fire when a schema object is created in the database or schema.
  • – BEFORE ALTER and AFTER ALTER triggers fire when a schema object is altered in the database or schema.
  • – BEFORE DROP and AFTER DROP triggers fire when a schema object is dropped from the database or schema.

Example

Let us create a trigger called "no_drop_pass" that fires before dropping any object in the schema of the user with username "skk." It checks the object type and name; if the object name is "passenger_det" and the object type is TABLE, it raises an application error and prevents the dropping of the table. The syntax for creating the trigger is as follows (remember to create the trigger by logging in as the administrator in the database). The trigger can be created as shown in Fig. 5.33.
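A sketch of such a trigger, using the DDL event attribute functions ORA_DICT_OBJ_NAME and ORA_DICT_OBJ_TYPE, is given below; the code in Fig. 5.33 may differ in detail:

CREATE OR REPLACE TRIGGER no_drop_pass
BEFORE DROP ON skk.SCHEMA
BEGIN
   IF ORA_DICT_OBJ_NAME = 'PASSENGER_DET' AND ORA_DICT_OBJ_TYPE = 'TABLE' THEN
      RAISE_APPLICATION_ERROR(-20002, 'The table PASSENGER_DET cannot be dropped');
   END IF;
END;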

The trigger is executed as shown in Fig. 5.34.

Triggers on DML Statements

This trigger gets fired when DML statement such as INSERT, UPDATE, or DELETE command is issued. DML triggers can be associated with the database or with a schema. Depending on the time of firing of trigger, this trigger can be classified into BEFORE and AFTER. Moreover, when we define a trigger on a DML statement, we can specify the number of times the trigger action is to be executed: once for every row or once for the triggering statement.

Row Level Triggers

A row level trigger, as its name suggests, is fired for each row that is affected by the SQL statement which fires the trigger. For example, if an UPDATE statement updates N rows of a table, a row level trigger defined for this UPDATE on that particular table is fired once for each of those N affected rows. If a triggering SQL statement affects no rows, a row trigger is not executed at all. To specify a trigger of row type, the FOR EACH ROW clause is used after the name of the table.

In row level triggers, the statements in a trigger action have access to the column values (new and old) of the current row being processed by the trigger. The names of the new and old values are called correlation names; they allow access to the new and old values of each column. By means of new, one refers to the new value with which the row in the table is updated or inserted. On the other hand, by means of old, one refers to the old value, which is being updated or deleted. Row level triggers are useful if the code in the trigger action depends on data provided by the triggering statement or on the rows that are affected.

Example

The AFTER trigger on the reserv_det table that deletes the corresponding rows from the passenger_det table with the same passenger_id is a row level trigger, as shown in Figs. 5.29 and 5.30, respectively.

Statement Level Triggers

Unlike row level trigger, a statement level trigger is fired only once on behalf of the triggering SQL statement, regardless of the number of rows in the table that the triggering statement affects. Even if the triggering statement affects no rows, the statement level trigger will execute exactly once. For example, if a DELETE statement deletes several rows from a table, a statement-level DELETE trigger is fired only once, regardless of how many rows are deleted from the table. Default type of any trigger is Statement level trigger. Statement level triggers are useful if the code in the trigger action does not depend on the data provided by the triggering statement or the rows affected.

Example

The BEFORE trigger on the passenger_det table that ensures no row is deleted on Sunday is a statement level trigger, as shown in Figs. 5.27 and 5.28, respectively.

INSTEAD-OF Triggers

INSTEAD-OF triggers tell Oracle what to do instead of performing the action that invoked the trigger. They are applicable to both object views and standard relational views. An INSTEAD-OF trigger can be used to redirect inserts on a view into a different table, or to update the multiple tables that make up the view. In other words, the trigger body is executed in place of the triggering statement itself. INSTEAD-OF triggers are defined on views rather than on tables, and they are commonly used to manipulate the underlying tables through the views.

Enabling and Disabling a Trigger

By default, a trigger is enabled when it is created. Only an enabled trigger gets fired whenever the trigger restriction evaluates to TRUE. Disabled triggers do not get fired even when the triggering statement is issued. Thus a trigger can be in either of two distinct modes:

  • – Enabled (an enabled trigger executes its trigger action if a triggering statement is issued and the trigger restriction (if any) evaluates to TRUE).
  • – Disabled (a disabled trigger does not execute its trigger action, even if a triggering statement is issued and the trigger restriction (if any) would evaluate to TRUE).

The need to disable a trigger arises in situations such as a heavy data load or a partially succeeded load operation. In the case of a heavy data load, disabling the trigger may dramatically improve performance; after the load, one has to perform manually the data operations that the trigger would otherwise have done. In the case of a partially succeeded load, the triggers have already executed for the part of the load that succeeded. If the same load is then restarted from the beginning, the same trigger could execute twice, which may cause undesirable effects. So the best approach is to disable the trigger and do the operations manually after the entire load is successful.

For enabled triggers, Oracle automatically does the following:

  • – Prepares a definite plan for execution of triggers of different types.
  • – Decides time for integrity constraint checking for each type of trigger and ensures that none of the triggers is violating integrity constraints.
  • – Manages the dependencies among triggers and schema objects referenced in the code of the trigger action.
  • – Fires multiple triggers of the same type in no definite order.

Syntax

ALTER TRIGGER <Trigger name> ENABLE/DISABLE;

Example

The passenger_bef_del trigger can be disabled and enabled as shown in Fig. 5.35, which shows how Oracle behaves for enabled and disabled triggers.
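For example, for the passenger_bef_del trigger created earlier:

ALTER TRIGGER passenger_bef_del DISABLE;
ALTER TRIGGER passenger_bef_del ENABLE;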

Replacing Triggers

Triggers cannot be altered explicitly. Triggers have to be replaced with a new definition using OR REPLACE option with CREATE TRIGGER command. In such case the old definition of the trigger is dropped and the new definition is entered in the data dictionary.

The exact syntax for replacing the trigger is as follows:

Syntax

CREATE OR REPLACE TRIGGER <trigger_name>
<trigger_definition>;

The trigger_definition should be as shown in the definition for creating trigger. Alternately the trigger can be dropped and re-created. On dropping a trigger all grants associated with the trigger are dropped as well.

Dropping Triggers

Triggers can be dropped like tables using the DROP TRIGGER command. The DROP TRIGGER command removes the trigger structure from the database. The user needs to have the DROP ANY TRIGGER system privilege to drop a trigger. The exact syntax for dropping a trigger is as follows.

Syntax

DROP TRIGGER <trigger_name>

Example

We drop the trigger passenger_bef_del as shown in Fig. 5.36.
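The corresponding command is:

DROP TRIGGER passenger_bef_del;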

Summary

This chapter has introduced the concept of PL/SQL. The shortcomings of SQL and the need for PL/SQL are given in detail. PL/SQL combines the data manipulation power of SQL with data processing power of procedural language. The PL/SQL language elements like character sets, operators, indicators, punctuation, identifiers, comments, etc. are introduced with examples in this chapter. The different types of iterative control like FOR loop, WHILE loop, their syntax and concepts are given through examples.

A cursor is a mechanism that can be used to process the multiple row result sets one row at a time. Cursors are an inherent structure in PL/SQL. Cursors allow users to easily store and process sets of information in PL/SQL program. The concept of cursor and different types of cursors like implicit cursor, explicit cursor are given through examples.

A procedure is a subprogram that performs some specific task and is stored in the data dictionary. The concepts of procedure and function, and the difference between them, are discussed in this chapter.

A package is a collection of related program objects such as procedures, functions, and associated cursors and variables together as a unit in the database. In simpler term, a package is a group of related procedures and functions stored together and sharing common variables, as well as local procedures and function. In this chapter, the package body and how to create a package are explained with examples.

An EXCEPTION is any error or warning condition that arises during runtime. The main intention of the EXCEPTION technique is to continue the processing of a program even when it encounters a runtime error or warning, and to display suitable messages on the console so that the user can handle those conditions next time. The advantages of using EXCEPTIONs and the different types of EXCEPTIONs are given through examples in this chapter.

A database trigger is a stored PL/SQL program unit associated with a specific database table. It can perform the role of a constraint, which forces the integrity of data. The concept of trigger, the uniqueness of trigger, and the use of trigger are explained with examples in this chapter.

Review Questions

5.1. Mention the key difference between SQL and PL/SQL?

SQL is a declarative language. PL/SQL is a procedural language that makes up for all the missing elements in SQL.

5.2. Mention two drawbacks of SQL?

  • – SQL statements can be executed only one at a time. Every time to execute a SQL statement, a call is made to Oracle engine, thus it results in an increase in database overheads.
  • – While processing an SQL statement, if an error occurs, Oracle generates its own error message, which is sometimes difficult to understand. If a user wants to display some other meaningful error message, SQL does not have provision for that.

5.3. Identify which one is not included in PL/SQL Character Set?

(a) * (b)> (c)! (d)

Answer: (d)

5.4. What are Lexical units related with PL/SQL?

A line of PL/SQL program contains groups of characters known as lexical units, which can be classified as follows:

  • – Delimiters
  • – Identifiers
  • – Literals
  • – Comments

5.5. What is Delimiter?

A delimiter is a simple or compound symbol that has a special meaning to PL/SQL.

5.6. Identify which identifier is not permitted in PL/SQL?

(a) Bn12 (b) Girt–1 (c) Hay# (d) I am

Answer: (d)

5.7. Give the syntax for single-line comments and multiline comments?

Single-line comment: --
Multiline comment: /* ... some text ... */

5.8. How you declare a record type variable in PL/SQL?

We can declare record type variable for particular table by using the syntax.

<Variable Name> <Table name>%ROWTYPE.
ROWTYPE is a keyword for defining record type variables.

5.9. Find out the error in the following PL/SQL statement?

IF condition THEN
sequence_of_statements1
ELSE
sequence_of_statements2
END IF;

Answer : No Error in the Statement.

5.10. Mention the facilities available for iterating the statements in PL/SQL?

(a) For-loop
(b) While-loop
(c) Loop-Exit

5.11. What is cursor and mention its types in Oracle?

A cursor is a mechanism that can be used to process the multiple row result sets one row at a time.

In other words, cursors are constructs that enable the user to name a private memory area to hold a specific statement for access at a later time. Cursors are an inherent structure in PL/SQL. Cursors allow users to easily store and process sets of information in PL/SQL program.

There are two types of cursors in Oracle

(a) Implicit and
(b) Explicit cursors.

5.12. Mention the syntax for opening and closing a cursor.

For Opening: Open <cursor name>
For Closing: Close <cursor name>

5.13. Mention some implicit and explicit cursor attributes.

Implicit:
%NOTFOUND, %FOUND, %ROWCOUNT, and %ISOPEN

Explicit:
Similar to Implicit.
%NOTFOUND, %FOUND, %ROWCOUNT, and %ISOPEN

5.14. What is Procedure in PL/SQL?

A procedure is a subprogram that performs some specific task and is stored in the data dictionary. A procedure must have a name so that it can be invoked or called by any PL/SQL program that appears within an application. Procedures can take parameters from the calling program and perform the specific task. Before the procedure or function is stored, the Oracle engine parses and compiles the procedure or function.

5.15. Mention any four advantages of procedures and function?

  1. It modifies one routine to affect multiple applications.
  2. It modifies one routine to eliminate duplicate testing.
  3. It ensures that related actions are performed together, or not at all, by doing the activity through a single path.
  4. It avoids PL/SQL parsing at runtime by parsing at compile time.

5.16. What is the syntax used in PL/SQL for dropping a procedure?

DROP PROCEDURE <PROCEDURE NAME>

5.17. Mention three differences between functions and procedures?

  1. A procedure never returns a value to the calling portion of code, whereas a function returns exactly one value to the calling program.
  2. As functions are capable of returning a value, they can be used as elements of SQL expressions, whereas procedures cannot. However, user-defined functions cannot be used in CHECK or DEFAULT constraints and cannot manipulate database values, in order to obey the function purity rules.
  3. It is mandatory for a function to have at least one RETURN statement, whereas for procedures there is no restriction. A procedure may have a RETURN statement or may not. In case of procedures with RETURN statement, simply the control of execution is transferred back to the portion of code that called the procedure.

5.18. What is Purity rule for functions in PL/SQL?

For a function to be eligible for being called in SQL statements, it must satisfy following requirements, which are known as Purity Rules.

1. When called from a SELECT statement or a parallelized INSERT, UPDATE, or DELETE statement, the function cannot modify any database tables.

2. When called from an INSERT, UPDATE, or DELETE statement, the function cannot query or modify any database tables modified by that statement.

3. When called from a SELECT, INSERT, UPDATE, or DELETE statement, the function cannot execute SQL transaction control statements (such as COMMIT), session control statements (such as SET ROLE), or system control statements (such as ALTER SYSTEM). Also, it cannot execute DDL statements (such as CREATE) because they are followed by an automatic commit.

5.19. What is a syntax for deleting a function in PL/SQL?

DROP FUNCTION <FUNCTION NAME>

5.20. What are parameters?

Parameters are the link between a subprogram's code and the code calling the subprogram. A lot depends on how the parameters are passed to a subprogram.

5.21. What are Packages?

A package can be defined as a collection of related program objects such as procedures, functions, and associated cursors and variables together as a unit in the database. In simpler term, a package is a group of related procedures and functions stored together and sharing common variables, as well as local procedures and functions.

5.22. Mention any two advantages of Packages?

  1. Stored packages allow you to sum up (group logically) related stored procedures, variables, and datatypes, and so forth in a single-named, stored unit in the database. This provides for better orderliness during the development process. In other words packages and its modules are easily understood because of their logical grouping.
  2. Grouping of related procedures, functions, etc. in a package also make privilege management easier. Granting the privilege to use a package makes all components of the package accessible to the grantee.

5.23. Mention how exception handling is done in Oracle?

During execution of a PL/SQL block of code, Oracle executes every SQL sentence within the PL/SQL block. If an error occurs or an SQL sentence fails, Oracle considers this as an Exception. Oracle engine immediately tries to handle the exception and resolve it, by raising a built-in Exception handler.

5.24. Mention two advantages of using exceptions in Oracle?

  1. Control over abnormal exits of executing programs on encountering error conditions, hence the behavior of application becomes more reliable.
  2. In a traditional error checking system, if the same error is to be checked at several places, you are required to code the same error check at all those places. With the exception handling technique, you write the exception for that particular error only once in the entire code; whenever that type of error occurs at any place in the code, the exception handler will automatically raise the defined exception.

Chapter 6

Database Design

Learning Objectives. This chapter deals with various phases in database design, objectives of database design, database design tools. The important concept in database design like functional dependency and normalization are also discussed in this chapter. After completing this chapter the reader should be familiar with the following concepts:

  • – Various phases in database design
  • – Database design tools
  • – Identify modification anomalies in tables
  • – Functional dependency, Armstrong's axioms
  • – Concept of normalization and different normal forms
  • – Denormalization

6.1 Introduction

The database design process integrates relevant data in such a manner that it can be processed through a mechanism for recording the facts. A database of an organization is an information repository that represents facts about the organization. It is manipulated by some software to incorporate the changes that take place in the organization. Database design is a complex process. The complexity arises mainly because identifying the relationships among individual components, and representing them so that correct functionality is maintained, is a highly involved task. The degree of complexity increases if there are many-to-many relationships among individual components. The process of database design usually requires a number of steps, which are shown in Fig. 6.1.

Feasibility Study

When designing a database, the purpose for which the database is being designed must be clearly defined. In other words the objective of creating the database must be crystal clear.

Requirement Collection and Analysis

In requirement collection, one has to decide what data are to be stored, and to some extent, how that data will be used. The people who are going to use the database must be interviewed repeatedly. Assumptions about the stated relationships between various parts of the data must be questioned again and again. For example, in designing the database about medical records of a patient, the following queries must be clearly defined.

Does a patient have more than one doctor? Is there a separate billing number for each drug ordered by a patient?

Prototyping and Design

Design implies a procedure for analyzing and organizing data into a form suitable to support business requirements and makes use of strategic technology. The three phases in relational database design are conceptual design, logical design, and physical design.

Implementation

Database implementation involves the development of code for database processing and also the installation of new database contents, usually from existing data sources.

6.2 Objectives of Database Design

The objectives of database design vary from implementation to implementation. Some of the important factors like efficiency, integrity, privacy, security, implementability, flexibility have to be considered in the design of the database.

Efficiency

Efficiency is generally considered to be the most important of these objectives. Given a piece of hardware on which the database will run and a piece of software (DBMS) to run it, the design should make full and efficient use of the facilities provided. If the database is made online, then the users should be able to interact with the database without any noticeable time delay.

Integrity

The term integrity means that the database should be as accurate as possible. The problem of preserving the integrity of data in a database can be viewed at a number of levels. At a low level it concerns ensuring that the data are not corrupted by hardware or software errors. At a higher level, the problem of preserving database integrity concerns maintaining an accurate representation of the real world.

Privacy

The database should not allow unauthorized access to files. This is very important in the case of financial data. For example the bank balance of one customer should not be revealed to other customers.

Security

The database, once loaded, should be safe from physical corruption whether from hardware or software failure or from unauthorized access. This is a general requirement of most databases.

Implementation

The conceptual model should be simple and effective so that mapping from the conceptual model to the logical model is easy. Moreover, while designing the database, care has to be taken that application programs can interact effectively with the database.

Flexibility

The database should not be implemented in a rigid way that assumes the business will remain constant forever. Changes will occur and the database must be capable of responding readily to such change. In addition to the factors mentioned above, the design of the database should ensure that data redundancy is avoided.

6.3 Database Design Tools

Once the objectives of the database design and the various steps in database design are known, it is essential to know the database design tools which are used to automate the task of designing a business system. Using automated design tools is the process of using a GUI tool to assist in the design of a database or database application. Many database design tools are available with a variety of features. The design tools are vendor-specific. CASE tools are software that provides automated support for some portion of the systems development process. Database drawing tools are used in enterprise modeling, conceptual data modeling, logical database design, and physical data modeling.

6.3.1 Need for Database Design Tool

The database design tools increase overall productivity because the manual tasks are automated; less time is spent performing tedious tasks and more time is spent thinking about the actual design of the database. The quality of the end product is also improved when database design tools are used, because the design tool automates much of the design process and the time taken to design a database is reduced. As a result, more time is available to interview the customer and conduct user feedback sessions, and naturally the quality of the product is improved.

6.3.2 Desired Features of Database Design Tools

The database design tools should help the developer to complete the database model of a database application in a timely fashion. Some of the features of the database design tools are given below:

  • – The database design tool should capture the user needs.
  • – The capability to model the flow of data in an organization.
  • – The database design tool should have the capability to model entities and their relationships.
  • – The database design tool should have the capability to generate Data Definition Language (DDL) to create database objects.


  • – The database design tool should support full life cycle database support.
  • – Database and application version control.
  • – The database design tool should generate reports for documentation and user-feedback sessions.

6.3.3 Advantages of Database Design Tools

Some of the advantages of using database design tools for system design or application development are given below:

  • – The amount of code to be written is reduced; as a result the database design time is reduced.
  • – Chances of errors because of manual work are reduced.
  • – Easy to convert the business model to a working database model.
  • – Easy to ensure that all business requirements are met.
  • – A higher quality, more accurate product is produced.

6.3.4 Disadvantages of Database Design Tools

Some of the disadvantages of database design tools are given below:

  • – More expense is involved for the tool itself.
  • – Developers might require special training to use the tool.

6.3.5 Commercial Database Design Tools

The database design tools which are commercially popular are given below along with their websites.

  1. CASE Studio 2 – Powerful database modeling, management, and reporting tool.
     http://www.casestudio.com/enu/default.aspx
  2. Design for Databases V3 – Database development tool using an entity relationship diagram.
     http://www.datanamic.com/dezign
  3. DBDesigner4 – Visual database design system that integrates database design, modeling.
  4. ER/Studio – Multilevel data modeling application for logical and physical database design and construction.
     http://www.embarcadero.com/products/erstudio/index.html