Data Fragmentation | Database Management System

Data Fragmentation | Database Management System

Data Fragmentation : Data Fragmentation implies dividing a Relation r into n Fragments r1,r2,…, ra. These fragments contain sufficient information to allow reconstruction of the original relation r. There are two schemes of fragmentation:

(a) Horizontal Fragmentation: A relation r is partitioned into a number of subsets such that each tuple of the relation belongs to at least one of the subsets. Ex. Relation account may be divided into fragments based on branch

 

And so on…..

Horizontal Fragmentation is usually used to keep the tuples of a relation at sites where they are used most, to minimize data transfer. It is defined as a SELECTION on the Global relation r, i.e., r = o(r)

Vertical Fragmentation: Vertical Fragmentation of relation r(R) into n fragments implies: (i) Defining n subsets of attributes i.e., R1, R2,…, Rn, of the schema R such that R= R1 U R2 U R3 U… U Rn and

(ii) Defining in fragments such that fragment r is defined as ri = TRi= (r)

The Fragmentation is done in such a way that original relation r can be reconstructed by the natural join of the fragments i.e.,

r = r1 x r2 x r3 x … rn

One way of ensuring that r can be reconstructed is by including primary key attributes in each of the Sub-Schema. OR a special attribute called ‘tuple-id” is added to the schema, which has a unique value for each tuple and forms part of all Sub-Schema. Tuple-id enables join of the sub-schema.

Transparency: The user of a distributed database system should not be required to know where a data item is stored physically and how that is to be accessed at a local site. Transparency can be of various types like:

(a) Fragmentation Transparency: Users are not required to know how a relation is fragmented.

(b) Replication Transparency: User is not concerned about replication of relations and where all replicas are stored.

(c) Location Transparency: Users are not required to know the physical location of data. Distributed Transactions: Transactions in a Distributed Database System can be classified as:

  • (a) local Transaction : A transaction, which only the database stored at the site where the transaction was initiated, is called a Local Transaction.
  • (b) Global Transaction : A transaction, which accesses database at sites which are different than the site where the transaction was initiated, or which accesses the database at Several sites, is called Global Transaction.

System Structure for Distributed Transaction Processing

Each site has a local Transaction Manager, whose function is to ensure ACID properties of those transactions that execute at that site. Various Transaction Managers cooperate to execute Global Transactions. Abstract model for this has a Transaction Manager and Transactions Coordinator with the broad functions as follows:

(a) Transaction Manager : It manages execution of those transactions (or sub transactions) that access data stored in a local site. It is responsible for:

  • Maintaining a log for recovery purposes.
  • Participating in an appropriate concurrency-Control Scheme to coordinate the concurrent execution of transactions executing at that site.

(b) Transaction Coordinator : it coordinates the execution of transactions (both local and global) initiated at that site. For each Such transaction, the Coordinator is responsible for:

  • Starting the execution of the transaction.
  • Breaking the transaction into a number of sub-transactions and distributing these Sub-transactions to the appropriate sites for execution.
  • Coordination the termination of the transaction, which may result in the transaction being Committed at all sites or aborted at all sites.

Distributed Database System Failure Modes

A distributed System will have all failure modes which are common to centralized systems. In addition, it will have following types of failures:

(a) Failure of a Site.

(b) Loss of messages.

(c) Failure of a Communication Link.

(d) Network Partition. A System is said to be partitioned if due to some failures it splits into two or more Subsystems called partitions.