Relational Integrity | Database Management System

Relational Integrity: We noted at the beginning of the chapter that the relational model has three main components: data structure, data integrity and data manipulation. The aim of data integrity is to specify rules that implicitly or explicitly define a consistent database state or consistent changes of state. These rules may include facilities like those provided by most programming languages for declaring data types, which prevent the user from comparing data of different types or assigning a value of one type to a variable of another. The purpose is to stop the user from doing things that generally do not make sense.

The integrity of an RDBMS rests on a set of rules proposed by E.F. Codd, together with a few constraints that Codd also proposed. We first discuss Codd's rules and then turn to the other constraints.

Codd’s Rules

Rule 0. The system must qualify as relational, as a database and as a management system.

“For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities, no matter what additional capabilities the system may support.”

Rule 1. The information Rule.

“All information in a relational database is represented explicitly at the logical level and in exactly one way – by values in tables.”

Everything within the database exists in tables and is accessed via table access routines.

All information in the database is to be represented in one and only one way, namely by values in column positions within rows of tables.

Rule 2. Guaranteed access Rule.

“Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.”

To access any data item you specify the table, the row (by its primary key value) and the column in which it exists. There is no reading of characters 10 to 20 of a 255-byte string.
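As a minimal sketch of this rule, the following uses Python's built-in sqlite3 module as a stand-in RDBMS. The student table, its columns and its rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],  # hypothetical sample rows
)

# One datum is addressed by table name (student),
# primary key value (2) and column name (city).
city = conn.execute(
    "SELECT city FROM student WHERE student_no = ?", (2,)
).fetchone()[0]
print(city)
```

No knowledge of byte offsets or storage layout is needed to reach the value.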

Rule 3. Systematic treatment of null values.

“Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.”

If a data item does not exist or does not apply, it is given the value NULL, which the RDBMS understands to mean missing or inapplicable data.
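The distinction between NULL, the empty string and zero can be illustrated with a small sketch using Python's sqlite3 module. The employee table and its rows are assumptions made for the example, and SQLite stands in for any RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, phone TEXT, bonus INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, None, None),  # phone and bonus are missing (NULL)
     (2, "", 0)],      # phone is an empty string, bonus is zero
)

# NULL is found with IS NULL, for a text column and an integer column alike.
missing_phone = conn.execute(
    "SELECT emp_no FROM employee WHERE phone IS NULL").fetchall()
missing_bonus = conn.execute(
    "SELECT emp_no FROM employee WHERE bonus IS NULL").fetchall()

# The empty string and zero are ordinary values, not NULL.
empty_phone = conn.execute(
    "SELECT emp_no FROM employee WHERE phone = ''").fetchall()
zero_bonus = conn.execute(
    "SELECT emp_no FROM employee WHERE bonus = 0").fetchall()

# Comparing NULL with NULL yields NULL (unknown), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
```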

Rule 4. Active on-line catalog based on the relational model (Data Dictionary)

“The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.”

The data dictionary is held within the RDBMS, so there is no need for off-line volumes to describe the structure of the database.
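As a sketch of what querying the catalog looks like, SQLite keeps its schema in a table named sqlite_master that can be read with an ordinary SELECT. Other systems expose their catalog under different names; the two tables created here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_no INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT)")

# The catalog is itself a table, queried with the same language as user data.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```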

Rule 5. Comprehensive data sub-language Rule.

“A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items:

  • Data definition
  • View definition
  • Data manipulation (interactive and by program)
  • Integrity constraints
  • Authorization”

Every RDBMS should provide a language that allows the user to query and manipulate the contents of the RDBMS.

Rule 6. View updating Rule

“All views that are theoretically updatable are also updatable by the System.”

If a view could in principle be updated, the RDBMS must allow the user to update the underlying data through that view.

Views are virtual tables. They appear to behave like conventional tables, except that they are built dynamically when the query is run, which means that a view is always up to date. It is not always theoretically possible to update a view. Codd himself did not completely understand this. One problem arises when a view covers part of a table but does not include a candidate key: potential updates through such a view would violate the entity integrity rule.
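The "always up to date" behaviour of a view can be sketched with Python's sqlite3 module. The student table and the pune_student view are hypothetical. Note that SQLite is used only as a convenient stand-in: its views are read-only, so it does not itself satisfy Rule 6, and the last step below merely records that limitation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute(
    "CREATE VIEW pune_student AS "
    "SELECT student_no, name FROM student WHERE city = 'Pune'"
)
view_query = "SELECT name FROM pune_student ORDER BY student_no"

conn.execute("INSERT INTO student VALUES (1, 'Asha', 'Pune')")
before = conn.execute(view_query).fetchall()

# A new base-table row shows up in the view without any refresh step,
# because the view is evaluated when it is queried.
conn.execute("INSERT INTO student VALUES (2, 'Meera', 'Pune')")
after = conn.execute(view_query).fetchall()

# SQLite refuses direct modification of a view.
try:
    conn.execute("INSERT INTO pune_student VALUES (3, 'Kiran')")
    view_insert_rejected = False
except sqlite3.Error:
    view_insert_rejected = True
```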

Rule 7. High-level insert, update and delete.

“The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update and deletion of data.”

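Insertion, update and deletion must work on whole sets of rows at a time, against base tables and views alike, rather than on one record at a time. The user should also be able to modify several tables by modifying a view for which they act as base tables.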
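Treating a relation as a single operand means one statement can change many rows. The sketch below shows this with Python's sqlite3 module; the employee table and its rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Sales", 100), (2, "Sales", 200), (3, "HR", 300)],  # hypothetical rows
)

# One UPDATE changes every qualifying row; there is no row-by-row loop.
updated = conn.execute(
    "UPDATE employee SET salary = salary + 10 WHERE dept = 'Sales'"
).rowcount

# One DELETE removes every qualifying row.
deleted = conn.execute("DELETE FROM employee WHERE dept = 'HR'").rowcount

salaries = conn.execute(
    "SELECT emp_no, salary FROM employee ORDER BY emp_no"
).fetchall()
```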

Rule 8. Physical data independence.

“Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.” Changes to the physical level (how the data is stored, whether in arrays, linked lists and so on) must not require a change to an application based on the structure. The user should not need to know where, or on which media, the data files are stored.

Rule 9. Logical data independence.

“Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit un-impairment are made to the base tables.” User programs and users should not be affected by changes to the structure of the tables (such as the addition of extra columns). Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on that structure. Logical data independence is more difficult to achieve than physical data independence.
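A small sketch of the extra-column case, using Python's sqlite3 module: an existing query keeps working after a column is added. The student table and the email column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

# The "application" query names only the columns it needs.
app_query = "SELECT name FROM student WHERE student_no = 1"
before = conn.execute(app_query).fetchone()

# An information-preserving change to the logical structure.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")

# The unchanged application query still returns the same answer.
after = conn.execute(app_query).fetchone()
```

A query written as SELECT * would be more exposed to such a change, which is one reason applications usually name their columns explicitly.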

Rule 10. Integrity independence.

“Integrity constraints specific to a particular relational database must be definable in the relational data sub-language and storable in the catalog, not in the application programs.” If a column accepts only certain values, it is the RDBMS that enforces the constraint, not the user program. An invalid value can therefore never be entered into the column, whereas if constraints were enforced by programs there would always be a chance that a buggy program lets incorrect values into the system.
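The following sketch shows a constraint that is declared in SQL, stored in the catalog and enforced by the database rather than by the calling program. It uses Python's sqlite3 module; the enrolment table and the set of allowed grades are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrolment ("
    " student_no INTEGER,"
    " course_no INTEGER,"
    " grade TEXT CHECK (grade IN ('A', 'B', 'C', 'F')))"
)

# The constraint text is stored in the catalog along with the table definition.
catalog_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'enrolment'"
).fetchone()[0]

# Any program that tries to store an invalid grade is stopped by the database.
try:
    conn.execute("INSERT INTO enrolment VALUES (1, 10, 'Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
```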

Rule 11. Distribution independence.

“A relational DBMS has distribution independence.” The RDBMS may be spread across more than one system and across several networks; to the end user, however, the tables should appear no different from local ones. Applications should still work in a distributed database (DDB).

Rule 12. Non-subversion Rule.

“If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple-records-at-a-time).” The RDBMS should prevent users from reaching the data without going through its own data-access routines. If the system provides a low-level (record-at-a-time) interface, that interface cannot be used to subvert the system, for example by bypassing a relational security or integrity constraint.

In Rule 5 Codd stated that an RDBMS requires a query language. He did not, however, state that SQL should be that language, only that there should be one, and many of the early products had their own: Oracle had UFI (User Friendly Interface), Ingres had QUEL (QUEry Language), and the never-released DB1 had a language called SEQUEL. The acronym SQL is often pronounced “sequel” because SEQUEL provided the core functionality of SQL. Even when the vendors eventually all started offering SQL, the flavours were, and still are, markedly different in syntax. The situation was partly resolved in the late 1980s, when ANSI published its first definition of the SQL syntax. This has since been upgraded to version 2, and all vendors now offer a standard core SQL. ANSI SQL is somewhat limited, however, so all RDBMS providers offer extensions to SQL, which may differ from vendor to vendor.

Integrity constraints (integrity Rules)

In a DBMS, integrity constraints play a role similar to that of data type declarations in a programming language. Integrity constraints are necessary to avoid situations like the following:

  1. Some data has been inserted in the database but it cannot be identified (that is, it is not clear which object or entity the data is about).
  2. A student is enrolled in a course but no data about him is available in the relation that has information about students.
  3. During a query processing, a student number is compared with a course number (this should never be required).
  4. A student quits the university and is removed from the student relation but is still enrolled in a course.

Constraints are a way of implementing business rules in the database. For instance, a constraint can restrict an integer attribute to values between 1 and 10. Constraints restrict the data that can be stored in relations. They are usually defined using expressions that yield a boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to a single attribute, to a tuple (restricting combinations of attributes) or to an entire relation. Constraints are not formally part of the relational model, but because of the integral role they play in organizing data, they are usually discussed together with relational concepts.
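The three scopes mentioned above can be sketched with Python's sqlite3 module. The rating table, its columns and the sample rows are hypothetical; the primary key is used here as an example of a constraint that applies to the relation as a whole, since it compares a new row against all existing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rating (
    rating_no INTEGER PRIMARY KEY,                  -- relation level: unique across all rows
    score     INTEGER CHECK (score BETWEEN 1 AND 10),  -- single attribute
    start_day INTEGER,
    end_day   INTEGER,
    CHECK (start_day <= end_day)                    -- tuple level: combination of attributes
)""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO rating VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((1, 5, 1, 2)),   # satisfies every constraint
    try_insert((2, 11, 1, 2)),  # score outside 1..10
    try_insert((3, 5, 9, 2)),   # start_day after end_day
    try_insert((1, 7, 1, 2)),   # duplicate rating_no
]
print(results)
```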

Integrity Rules

The following are the two integrity rules to be satisfied by any relation.

  • Entity integrity: No component of the primary key can be null.
  • Referential integrity: The database must not contain any unmatched foreign key values. This is called the referential integrity rule.
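Both rules can be sketched with Python's sqlite3 module. The student and enrolment tables are hypothetical. Two SQLite-specific details are assumptions of this sketch rather than part of the rules: foreign keys are enforced only after PRAGMA foreign_keys = ON, and NOT NULL is written explicitly on the primary key because SQLite otherwise tolerates NULL in most primary key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.execute(
    "CREATE TABLE student (student_no TEXT PRIMARY KEY NOT NULL, name TEXT)"
)
conn.execute(
    "CREATE TABLE enrolment ("
    " student_no TEXT NOT NULL REFERENCES student(student_no),"
    " course_no TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES ('S1', 'Asha')")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: a row that cannot be identified is refused.
null_key_rejected = rejected("INSERT INTO student VALUES (NULL, 'Nobody')")

# Referential integrity: an enrolment for an unknown student is refused.
orphan_rejected = rejected("INSERT INTO enrolment VALUES ('S9', 'C1')")

# A matching foreign key value is accepted.
valid_rejected = rejected("INSERT INTO enrolment VALUES ('S1', 'C1')")
```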

1. Can the Foreign Key accept nulls?

Ans. Yes, if the application business rule allows this.

How do we explain this?

Unlike the case of Primary Keys, there is no integrity rule saying that no component of the foreign key can be null. This can be logically explained with the help of the following example: Consider the relations Employee and Account as given below.

EmpACC# in the Employee relation is a foreign key that creates a reference from Employee to Account. A null value in the EmpACC# attribute is logically possible if an employee does not have a bank account. If the business rules allow an employee to exist in the system without opening an account, null can be allowed for EmpACC# in the Employee relation. In the case example, by contrast, Cust# in Ord_Aug cannot accept null if the business rule insists that the customer number be stored for every order placed.

The next issue related to foreign key references is how to handle deletes and updates of the parent record. In the case example, can we delete the record with Cust# value 002, 003 or 005? The default answer is no, as long as there is a foreign key reference to these records from some other table. Here, the records are referenced from the order records in the Ord_Aug relation, so deletion of the parent record is restricted (the Restrict strategy). Deletion can still be carried out if we use the Cascade or Nullify strategy.
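A sketch of a nullable foreign key, a non-nullable one and the Restrict strategy, using Python's sqlite3 module. The lower-case table and column names are stand-ins for the relations discussed above, and the employee name and all rows other than customer 002 and order 101 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE account  (acc_no INTEGER PRIMARY KEY);
CREATE TABLE employee (
    emp_name   TEXT PRIMARY KEY NOT NULL,
    emp_acc_no INTEGER REFERENCES account(acc_no)   -- nullable foreign key
);
CREATE TABLE customer (cust_no TEXT PRIMARY KEY NOT NULL);
CREATE TABLE ord_aug (
    ord_no  INTEGER PRIMARY KEY,
    cust_no TEXT NOT NULL                           -- business rule: always required
            REFERENCES customer(cust_no) ON DELETE RESTRICT
);
INSERT INTO customer VALUES ('002');
INSERT INTO ord_aug  VALUES (101, '002');
""")

def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# An employee without a bank account is allowed.
employee_without_account_rejected = rejected(
    "INSERT INTO employee VALUES ('Mina', NULL)")

# An order without a customer number is not allowed.
order_without_customer_rejected = rejected(
    "INSERT INTO ord_aug VALUES (105, NULL)")

# Restrict: the parent cannot be deleted while an order references it.
parent_delete_rejected = rejected("DELETE FROM customer WHERE cust_no = '002'")
```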

Cascade: Delete or update all the referencing records successively, in cascaded fashion, and finally delete or update the parent record. In the case example, the Customer record with Cust# 002 can be deleted after deleting the order records with Ord# 101 and 104. These order records, in turn, can be deleted only after deleting the records with Ord# 101 and 104 from the Ord_Items relation.
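The Cascade strategy can be sketched with ON DELETE CASCADE in Python's sqlite3 module. Customer 002 and orders 101 and 104 come from the case example; customer 003's order 102, the item numbers and the lower-case table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE customer (cust_no TEXT PRIMARY KEY NOT NULL);
CREATE TABLE ord_aug (
    ord_no  INTEGER PRIMARY KEY,
    cust_no TEXT NOT NULL REFERENCES customer(cust_no) ON DELETE CASCADE
);
CREATE TABLE ord_items (
    ord_no  INTEGER NOT NULL REFERENCES ord_aug(ord_no) ON DELETE CASCADE,
    item_no INTEGER NOT NULL,
    PRIMARY KEY (ord_no, item_no)
);
INSERT INTO customer  VALUES ('002'), ('003');
INSERT INTO ord_aug   VALUES (101, '002'), (104, '002'), (102, '003');
INSERT INTO ord_items VALUES (101, 1), (104, 1), (102, 1);
""")

# Deleting the parent removes its orders, which in turn removes their items.
conn.execute("DELETE FROM customer WHERE cust_no = '002'")

remaining_orders = conn.execute(
    "SELECT ord_no FROM ord_aug ORDER BY ord_no").fetchall()
remaining_items = conn.execute(
    "SELECT ord_no, item_no FROM ord_items ORDER BY ord_no").fetchall()
```

The database performs the child deletions itself, so the application issues a single DELETE.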

Nullify: Update the referencing foreign key to null and then delete or update the parent record. In the above example of the Employee and Account relations, an account record may have to be deleted when the account is closed. For example, if employee Raj decides to close his account, the Account record with ACC# 120002 has to be deleted. This deletion is not possible as long as Raj's Employee record references it. Hence, the strategy can be to update the EmpACC# field in Raj's employee record to null and then delete the parent account record 120002. After the deletion the data in the tables will be as follows:
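As an illustration only, the Nullify strategy can be sketched with ON DELETE SET NULL in Python's sqlite3 module, which performs the update and the delete in one step. Raj and account 120002 come from the text; the second employee, account 120001 and the lower-case table and column names are hypothetical and are not taken from the original tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE account (acc_no INTEGER PRIMARY KEY);
CREATE TABLE employee (
    emp_name   TEXT PRIMARY KEY NOT NULL,
    emp_acc_no INTEGER REFERENCES account(acc_no) ON DELETE SET NULL
);
INSERT INTO account  VALUES (120001), (120002);
INSERT INTO employee VALUES ('Anita', 120001), ('Raj', 120002);
""")

# Closing Raj's account: the reference is set to NULL, then the parent is deleted.
conn.execute("DELETE FROM account WHERE acc_no = 120002")

employees = conn.execute(
    "SELECT emp_name, emp_acc_no FROM employee ORDER BY emp_name").fetchall()
accounts = conn.execute("SELECT acc_no FROM account").fetchall()
print(employees)
print(accounts)
```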