RELATIONAL DATA STRUCTURE Database Management System

RELATIONAL DATA STRUCTURE Database Management System

RELATIONAL DATA STRUCTURE

We have already seen examples of relations, let us now get into detail concepts. Often the information that an organization wishes to store in a computer and process is complex and unstructured. For example, we may know that a department in a university has 200 students, most are full-time with an average age of 22 years, and most are females. Since natural language is not a good language for machine processing, the information must be structured for efficient processing. In the relational model the information is structures in a very simple way.

Relation Basics

The beauty of the relational model is its simplicity of structure. Its fundamental property is that all information about the entities and their attributes as well as the relationships is presented to the user as tables (called relations) and nothing buttables. The rows of the tables may be considered records and the Columns as fields. Each row therefore consists of an entity occurrence or a relationship occurrence. Each Column refers to an attribute. The model is called relational and not tabular because tables are a lower level of abstractions than the mathematical concept of relation. Tables give the impression that positions of rows and Columns are important. In the relational model, it is assumed that no ordering of rows and Columns is defined.

We consider the following database to illustrate the basic concepts of the relational data model.

RELATIONAL DATA STRUCTURE

A simple student database

We have not included the attributes in the above E-R diagram. The attributes are studentid, student name and student address for the entity set student and subject number, Subject name and department for the entity set subject.

The above database could be mapped into the following relational Schema which consists of three relation schemes. Each relation scheme presents the structure of a relation by specifying its name and the names of its attributes enclosed in parenthesis. Often the primary key of a relation is marked by underlining.

Student(studentid, student name, address)

enrolment (student_id, Subject_id)

Subject(subject id, subject name, department)

An example of a database based on the above relational model is:

Table 1. The student relation/table

 

RELATIONAL DATA STRUCTURE

RELATIONAL DATA STRUCTURE

RELATIONAL DATA STRUCTURE

Properties of Relations

  • Each relation contains only one record type.
  • Each relation has a fixed number of columns that are explicitly named. Each attribute name within a relation is unique.
  • No two rows in a relation are the same.
  • Each item or element in the relation is atomic, that is, in each row, every attribute has only one value that Cannot be decomposed and therefore no repeating groups are allowed.
  • Rows have no ordering associated with them.
  • Columns have no ordering associated with them (although most Commercially available Systems do).

 

The above properties are simple and based on practical considerations. The first property ensures that only one type of information is stored in each relation. The second property involves naming each Column uniquely. This has several benefits. The names can be chosen to convey what each column is and the names enable one to distinguish between the column and its domain. Furthermore, the names are much easier to remember than the position of the position of each column if the number of Columns is large.

The third property of not having duplicate rows appears obvious but is not always accepted by all users and designers of DBMS. The property is essential since no sensible context free meaning can be assigned to a number of rows that are exactly the same.

The next property requires that each element in each relation be atomic that cannot be decomposed into smaller pieces. In the relation model, the only composite or compound type (data that can be decomposed into smaller pieces) is a relation. This simplicity of Structure leads to relatively simple query and manipulative languages.

A relation is a set of tuples. As in other types of sets, there is no ordering associated with the tuples in a relation. Just like the second property, the ordering of the rows should not be significant since in a large database no user should be expected to have any understanding of the positions of the rows of a relation. Also, in some situations, the DBMS should be permitted to reorganise the data and change row orderings if that is necessary for efficiency reasons. The Columns are identified by their names. The tuples on the other hand are identified by their contents, possibily the attributes that are unique for each tuple.

The relation is a set of tuples and is closely related to the concept of relation in mathematics. Each row in a relation may be viewed as an assertion. For example, the relation student asserts that a Student by the name of Reena Rani has student_id 8654321 and lives at 88, Long Hall. Similarly, the relation subject asserts that one of the subjects offered by the Department of Computer Science is CP302 Database Management.

Each row also may be viewed as a pointina n-dimensional space (assuming nattributes). For example, the relation enrolment may be considered a 2-dimensional space with one dimension being the student id and the other being subject id. Each tuple may then be looked at as a point in this 2-dimensional space.

In the relational model, a relation is the only compound data structure since relation do not allow repeating groups or pointers.

Formal Definition of a Relation

Basic Structure

  1. Following Figure shows the deposit and customer tables for our banking example.

Deposit Relation (Table)

RELATIONAL DATA STRUCTURE

 

The deposit and customer relations.

  • Deposit ewation has four attributes.
  • For each attribute there is a permitted set of values, called the domain of that attribute.
  • g., the domain of bname is the set of all branch names.

Let D denote the domain of bname, and D, D; and D, and the remaining attributes’ domains respectively.

 

Then, any row of deposit consists of a four-tuple (V1,V2, V3, V4) where

V1  ED1, V2 ED2, V3 ED3, V4ED4

In general, deposit contains a subset of the set of all possible rows.

That is, deposit is a subset of

D1 X D, X D3 X D4 or abbreviated to, X4i=1Di

 

In general, a table of n Columns must be a subset of

Xni=1Di  (all possible rows)

  1. Mathematicians define a relation to be a subset of a Cartesian product of á list of domains. You can see the Correspondence with our tables.

We will use the terms relation and tuple in place of table and row from now on.

  1. We’ll also require that the domains of all attributes be indivisible units.
  • A domain is atomic if its elements are indivisible units.
  • For example, the set of integers is an atomic domain.
  • The set of all sets of integers is not.
  • Why? Integers do not have subparts, but sets do – the integers comprising them.
  • We could consider integers non-atomic if we thought of them as ordered lists of digits.

 

Database Scheme

  1. We distinguish between a database Scheme (logical design) and a database instance (data in the database at a point in time).
  2. A relation scheme is a list of attributes and their corresponding domains.
  3. The text uses the following conventions :
  • italics for all names
  • lowercase names for relations and attributes
  • names beginning with an uppercase for relation schemes

These notes will do the same.

For example, the relation scheme for the deposit relation :

Deposit-scheme = (bname, account#, Cname, balance)

RELATIONAL DATA STRUCTURE

We may state that deposit is a relation on Scheme Deposit-scheme by writing deposit(Deposit-scheme). If we wish to specify domains, we can write:

(bname: String, accountf: integer, Cname: String, balance: integer).

Note that customers are identified by name. In the real World, this would not be allowed, as two or more customers might share the same name.

Following Figure shows the E-R diagram for a banking enterprise.

  1. The relation schemes for the banking example used throughout the text are:
  • Branch-scheme = (bname, assets, bcity)
  • Customer-scheme = (cname, street, CCity)
  • Deposit-scheme = (bname, accountil, Cname, balance)
  • Borrow-scheme = (bname, loanh, Cname, amount)

 

Note :  Some attributes appear in several relation Schemes (e.g., bname, Cname). This is legal, and provides a way of relating tuples of distinct relations.

  1. Why not put all attributes in one relation?

Suppose we use one large relation instead of Customer and deposit :

 

  • Account-Scheme = (bname, account#, Cname, balance, Street, ccity)
  • ifa Customer has several accounts, we must duplicate her or his address for each account.
  • If a Customer has an account but no current address, we cannot build a tuple, as we have no values for the address.
  • We would have to use null values for these fields.
  • Null values cause difficulties in the database.
  • By using two separate relations, we can do this without using null values.

 

Relational Keys

Definition

According to E.F. Codd, Date and all other experts, a key has only one meaning in relational theory: “it is a set of one or more columns whose combined values are unique among all occurrences in a given table. A key is the relational means of specifying uniqueness.”

An key is an attribute or collection of attributes that may be used to identify or retrieve one or more records.

The word “key” is much used in the context of relational database.

There are many types of keys in RDBMS :

(1) Candidate key

A candidate key is any set of one or more columns whose combined values are unique among all OCCurrences (i.e., tuples or rows). There can be any number of candidate keys in a table A candidate key is an key that can be used to uniquely identify a record i.e., it may be used to retrieve One specific record So, it becomes a “Candidate” for becoming a primary key.

In Simple words, a candidate key is a attribute (or a set of attribute) which can be selected as a primary key.

Each such attribute (or a set of attributes) is called a candidate key of the relation if it satisfies the following properties:

(a) the attribute or the set of attributes uniquely identifies each tuple in the relation (called uniqueness), and

(b) if the key is a set of attributes then no subset of these attributes has property (a) (called minimality).

 (2) Primary key

There may be several distinct set of attributes that may serve as candidate keys. One of the candidate keys is arbitrarily chosen as the primary key of the relation.

 

“The primary key of a relation is a candidate key that has been designated as the main key.”

The three relations above student, enrolment and subject have degree 3, 2 and 3 respectively and Cardinality 4, 6 and 5 respectively. The primary key of the relation student is student id, of relation enrolment is (student id, subject id), and finally the primary key of relation subject is Subject id. The relation student probably has another candidate key. If we can assume the names to be unique than the student name is a candidate key. If the names are not unique but the names and address together are unique.then the two attributes (student id, address) is a candidate key. Note that both student id and (student id, address) cannot be candidate keys, only one can. Similarly, for the relation subject, subject name would be a candidate key if the subject names are unique.

The definition of a relation presented above is not precise. A more precise definition is now presented. Let D1, D2, …, Dn be natomic domains, not necessarily distinct. R is then a degreen relation on these domains, if it is a subset of the cartesian product of the domains, that is, D1xD2x.xDn. If we apply this definition to the relation student, we note that student is a relation on domains student id (that is, all permitted student id’s), student name (all permitted student names) and address (all permitted values of addresses). Each instance of the relation student is then a subset of the product of these three domains.

 

We will often use symbols like R or Sto denote relations and rands to denote tuples. When we writer C R, we assert that tupler is in relation R.

In the relational model, information about entities and relationships is represented in the same way i.e., by relations. Since the structure of all information is the same, the same operators may be applied to them. We should note that not all relationship types need to be represented by separate relations. For 1:1 and 1:m relationships, key propagation may be used. Key propagation involves placing the primary key of one entity within the relation of another entity. Many-to-many relationships however do require a separate relation to represent the association.

We illustrate the use of key propagation by considering the following E-R model :

 

RELATIONAL DATA STRUCTURE

Let the relationship occupies be many-to-one, that is, one staff may occupy only one room but a room may be occupied by more than one staff. Let the attributes of the entity staff be staff numbers num), staff name (name) and Status. Attributes of the entity room are room number (r_num), capacity, building. occupies has no attributes. Snum and rinum are the primary keys of the two entities staff and room.

 

One way to represent the above database is to have the following three relations :

staff (s. num, name, status)

occupies (s_num, r_num)

room (r_num, capacity, building)

 

There is ofcourse another possibility. Since the relationship occupies is many-to-one, that is, each Staff occupies only one room, we can represent the relationship by including the primary key of relation room in the entity staff. This is propagation of the key of the entity room to entity staff. We then obtain the following database:

staff (s_num, name, status, r_num)

room (r_num, capacity, building)

 

We should note that propagating the key of the entity staff to the entity room would not be appropriate since some rooms may be associated with more than one staff member. Also note that key propagation would not work if the relationship occupies is many-to-many.

In relational databases, the data is not arranged in any particular order in tables. The data in tables requires keys for identification of rows. Each table has rows and columns. Sets of values in a row may describe a particular item such as customers. Consider the following table:

RELATIONAL DATA STRUCTURE

Columns include Customer ID, Name, Street Address, City and so forth. Each column has a different set of information about all customers such as their phone number. This information could be considered Customer characteristics or attributes. Each row has all information about one particular customer. Each row must have a unique means of identification. We could have used the customer name to identify the Customer, but it is possible that more than one customer may have the same name. Therefore, we are using unique Customer identifiers. The Customer ID is the key used to identify individual customers. The column to be used as the key must be identified to the relational database. This is called the primary key.

(3) Foreign Keys

A foreign key is a reference to a key in another relation, meaning that the referencing tuple has, as one of its attributes, the values of a key in the referenced tuple. Foreign keys need not have unique values in the referencing relation.

In the Context of relational database, a foreign key is a referential constraints between two tables. The foreign key identifies a Column or a set of Columns in one (referencing) table that refers to a column or set of Columns in another (referenced) table. The Columns in the referencing table must be the primary key or other candidate key in the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot contain values that don’t existin the referenced table (except potentially NULL). This way references can be made to link information together and it is an essential part of database normalization. Multiple rows in the referencing table may refer to the same row in the referenced table. Most of the time, it reflects the one (master table, or referenced table) to many (child table, or referencing table) relationship.

Let’s say we’re running a store with the customers listed above. The first table below lists items for Sale. The second table lists orders that have been placed in the store, and the thirdtable lists items ordered for an individual Order.

RELATIONAL DATA STRUCTURE Database Management System

 

RELATIONAL DATA STRUCTURE Database Management System

 

RELATIONAL DATA STRUCTURE Database Management System

If we want to print out what items customer 234902 ordered we do the following:

  1. Go to the orders table and get all orders customer 234902 ordered.
  2. For each order found (1 order #190389575). The order number is considered to be a foreign key to another table and Column which is to table “Individual Order 1903895.75” and Item ID.
  3. Go to the “Individual Order 1903895.75” table and get each itemID that was ordered. Each item ID is a foreign key to the “Store items” table and Item ID Column.
  4. Go to the store items table, find each itemID and print out each item attribute such as price, description, and quantity purchased (from the individual order table).

Therefore, foreign keys are used to access data in related tables.

 

(4) Other types of Keys

Super key

A Super key is a column or set of columns that uniquely identifies a row within a table.

Example :

Given table: EMPLOYEES{employee id, firstname, surname, salary}

Possible superkeys are:

  • employee id}
  • employeeid, firstname}
  • (employee_id, firstname, surname, salary)

Only the the minimal Superkey – employee id}- will be considered as a candidate key.

A Superkey is a Combination of attributes that can be uniquely used to identify a database record. A table might have many Superkeys. Candidate keys are a special subset of Superkeys that do not have any extraneous information in them.

Examples: Imagine a table with the fields <Name>, <Ages, and <Phone Extension>. This table has many possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those listed, only is a candidate key, as the others contain information not necessary to uniquely identify records

A Superkey is defined in the relational model as a set of attributes of a relation variable (relvar) for which it holds that in all relations assigned to that variable there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent.

  1. The notions of Superkey, Candidate key and primary key all apply to the relational model.
  2. For example, in Branch-scheme,
  • {bname} is a superkey.
  • {bname, bcity} is a superkey.
  • {bname, bcity) is not a candidate key, as the Superkey (bname} is contained in it.
  • {bname} is a candidate key.
  • {bcity} is not a superkey, as branches may be in the same city.
  • We will use {bname as our primary key.
  1. The primary key for Customer-scheme is {Cname}.
  2. More formally, if we say that a subset K of R is a superkey for R, we are restricting consideration to relations r(R) in which no two distinct tuples have the same values on all attributes in K. In other words,
  • lft1 and t2 are in r, and
  • t1 # t2,
  • then t1 [K] # t2 [K]

Alternate key

An alternate key is any candidate key which is not selected to be the primary key.

Compound key

A Compound key (also called a composite key or concatenated key) is a key that consists of 2 or more attributes.

Secondary key

A secondary key is a data field that is used for data searches and retrieval.

A secondary key is any column (or a set of columns) which is used for searching records from the database. For example: in a STUDENT table (Roll No, Name, DOB, City) Roll No is primary key and we want Search students belonging to “Meerut” city then in the SOL query only city field will be used in “where clause” and there will be no use of rollino. So, for this query “City” is secondary key.