ALLOCATION FRAGMENTS OF THE DISTRIBUTED DATABASE

This paper describes the allocation of database fragments using a mathematical model. The model's criterion (objective) function accounts for the influence of transaction and concurrency processing in database systems. By setting constraints in the model, it can also solve variants with replicated fragments. The approach is intended for revising an existing distribution using real values from the database, such as table cardinalities, referential integrity and important requests.


Introduction
The design of a distributed database system involves deciding where to place data and programs across the sites of a computer network. In distributed database systems, the central design problem is the distribution of the data.
The Database Allocation Problem (DAP) model dates back to the mid-1970s, to the work of Eswaran (1974) [Eswaran75], Levin and Morgan (1975) [Levin75], and others. One of the best-known formulations is described in detail in [Ozsu91]. DAP has been studied in many specialized settings. In 1975 Eswaran [Eswaran75] proved that the simple file allocation problem is NP-complete, and all known solutions of the allocation problem therefore rely on heuristic algorithms. An allocation model requires database information, site information, network information and a set of constraints. Each of these defines a set of parameters for the model. Costs are expressed in units of time.

Database information
We need to know [Matiasko02]:
• the set of fragments,
• the size of each fragment,
• the selectivity of each fragment,
• the read access,
• the update access,
• the read polarization,
• the update polarization.
Size of the fragment. The size of fragment F_j is given by
$$size(F_j) = card(F_j) \cdot length(F_j)$$
where length(F_j) is the length in bytes of one tuple of F_j and card(F_j) is the cardinality of F_j, that is, the number of tuples in the fragment.
Selectivity of the fragment. The selectivity of fragment F_j with respect to query q_i, written sel_i(F_j), is the number of tuples of F_j that must be accessed to process q_i.
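The following is a minimal sketch of the size formula. It also computes how many bytes a query touches when it accesses sel_i(F_j) tuples. The class `Fragment`, the function `accessed_bytes` and the sample numbers are hypothetical illustrations and are not part of the original model.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    length: int  # length(F_j): bytes per tuple
    card: int    # card(F_j): number of tuples

    def size(self) -> int:
        # size(F_j) = card(F_j) * length(F_j), in bytes
        return self.card * self.length


def accessed_bytes(fragment: Fragment, sel: int) -> int:
    # Assumption: a query accessing sel_i(F_j) tuples touches sel * length(F_j) bytes.
    if not 0 <= sel <= fragment.card:
        raise ValueError("sel_i(F_j) must lie between 0 and card(F_j)")
    return sel * fragment.length


f1 = Fragment("StudentZA", length=200, card=5000)  # hypothetical values
print(f1.size(), accessed_bytes(f1, 50))  # 1000000 10000
```

The selectivity check guards against a query claiming more tuples than the fragment contains.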

Read access
Read access f^r_ij is the number of read accesses (request frequency) that query q_i makes to fragment F_j during its execution [Matma99a, Matma99b].

Update access
Update access f^w_ij is the number of update accesses (request frequency) that query q_i makes to fragment F_j during its execution.

Polarization read access
Read polarization r_ij records whether a query uses a fragment: r_ij = 1 if query q_i reads from fragment F_j, and r_ij = 0 if it does not.
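Because r_ij only records whether q_i reads F_j at all, it can be derived from the read-access frequencies. Applied to the update frequencies, the same function yields the update polarization. This is a sketch under that assumption, and the function name and sample matrix are hypothetical.

```python
def polarization(freq: list[list[int]]) -> list[list[int]]:
    """Map a frequency matrix freq[i][j] (accesses of query q_i to fragment F_j)
    to a 0/1 polarization matrix: 1 if q_i accesses F_j at least once, else 0."""
    return [[1 if f > 0 else 0 for f in row] for row in freq]


f_read = [[3, 0, 1],   # q_1 reads F_1 three times and F_3 once
          [0, 5, 0]]   # q_2 reads only F_2
print(polarization(f_read))  # [[1, 0, 1], [0, 1, 0]]
```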

Site information
For each site of the computer network in Slovakia we need to know: • the set of client computers C_jk and the set of queries q_i running on these client computers, • the storage capacity, • the processing capacity.
The unit cost of storing data at site S_k is CM_k, and the cost of processing one unit of work at site S_k is CP_k. The unit of work should be the same unit used for read and update accesses.

Network information
For the network we need to specify the communication cost: c_ij denotes the communication cost between sites S_i and S_j. This cost depends on protocol overhead, distances between sites, channel capacities, etc.
Each query q_i must first go through a simple decomposition.

Decision variables
The decision variable x_jk is binary: x_jk = 1 if fragment F_j is allocated to site S_k, and x_jk = 0 otherwise.
The objective function is the total cost
$$N = \sum_{i} ND_i + \sum_{k}\sum_{j} NM_{jk}$$
where ND_i is the query processing cost of application q_i and NM_jk is the cost of storing fragment F_j at site S_k. The storage costs are given by
$$NM_{jk} = CM_k \cdot size(F_j) \cdot x_{jk}$$
and the two summations give the total storage cost over all sites and all fragments of the computer network.
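The decision variables and the storage part of the objective can be sketched as follows. Several details are assumptions made for illustration: the layout x[j][k] = x_jk, the function names, and the constraint that every fragment needs at least one copy (exactly one copy in the non-replicated case).

```python
def is_valid_allocation(x: list[list[int]], one_level: bool = False) -> bool:
    """x[j][k] == 1 if fragment F_j is stored at site S_k.
    Assumed constraint: every fragment has at least one copy, and exactly
    one copy when one_level (no replication) is requested."""
    for row in x:
        if any(v not in (0, 1) for v in row):
            return False
        copies = sum(row)
        if copies == 0 or (one_level and copies != 1):
            return False
    return True


def storage_cost(size: list[float], cm: list[float], x: list[list[int]]) -> float:
    """Sum over all fragments j and sites k of NM_jk = CM_k * size(F_j) * x_jk."""
    return sum(cm[k] * size[j] * x[j][k]
               for j in range(len(size))
               for k in range(len(cm)))


def total_cost(nd: list[float], size: list[float], cm: list[float],
               x: list[list[int]]) -> float:
    """Objective: sum of query processing costs ND_i plus total storage cost."""
    return sum(nd) + storage_cost(size, cm, x)


x = [[1, 0],   # F_1 stored only at S_1
     [1, 1]]   # F_2 replicated at S_1 and S_2
print(is_valid_allocation(x), is_valid_allocation(x, one_level=True))  # True False
print(storage_cost([100, 50], [1, 2], x))  # 1*100 + 1*50 + 2*50 = 250
```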
The query processing cost is given by
$$ND_i = NDB_i + NT_i$$
where NDB_i is the database processing cost of application q_i and NT_i is its transmission cost. The database processing cost is given by
$$NDB_i = NRW_i + NIC_i$$
where NRW_i is the access cost of query q_i to the fragments it references and NIC_i is the integrity and concurrency enforcement cost of q_i. The access cost is given by
$$NRW_i = \sum_{k}\sum_{j} \left( u_{ij} f^w_{ij} + r_{ij} f^r_{ij} \right) x_{jk}\, CP_k$$
where u_ij is the update polarization, defined analogously to the read polarization r_ij. The summation gives the total number of update and read accesses to all fragments referenced by query q_i. Multiplying by CP_k gives the cost of these accesses at site S_k.
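The sketch below computes the access cost NRW_i and NDB_i = NRW_i + NIC_i for one query. It assumes the polarization-weighted form used in [Ozsu91]: each update or read access of q_i to F_j is charged CP_k at every site S_k that holds a copy. The argument names and layouts (f_r[i][j], x[j][k], and so on) are hypothetical. NIC_i is passed in as a plain number because its form depends on the actual system.

```python
def access_cost(i, f_r, f_w, r, u, x, cp):
    """NRW_i = sum_k sum_j (u_ij * f^w_ij + r_ij * f^r_ij) * x_jk * CP_k."""
    return sum((u[i][j] * f_w[i][j] + r[i][j] * f_r[i][j]) * x[j][k] * cp[k]
               for j in range(len(x))
               for k in range(len(cp)))


def db_processing_cost(i, f_r, f_w, r, u, x, cp, nic_i):
    """NDB_i = NRW_i + NIC_i; NIC_i (integrity + concurrency) is supplied by the caller."""
    return access_cost(i, f_r, f_w, r, u, x, cp) + nic_i


f_r = [[4, 0]]          # q_1 reads F_1 four times
f_w = [[1, 2]]          # q_1 updates F_1 once and F_2 twice
r = [[1, 0]]
u = [[1, 1]]
x = [[1, 0], [1, 1]]    # F_1 at S_1; F_2 at S_1 and S_2
cp = [2, 3]             # CP_1, CP_2
print(access_cost(0, f_r, f_w, r, u, x, cp))  # (1+4)*2 + 2*2 + 2*3 = 20
```

This formula charges each access at every copy, reads included. Only the transmission part of the model distinguishes a single copy for reads.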
The integrity cost NI and the concurrency cost NC, which together make up NIC_i, can be specified much like the processing component. They depend on the actual computer, operating system and database system, and on the set of queries executed at the given site of the computer network.

The transmission cost
Transmission costs differ for read and update accesses. An update must be performed at every site that holds a replica of the fragment, whereas a read needs to access only one of the copies.
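The rule above can be sketched for a single query issued at site o. An update must reach every replica and be confirmed, while a read needs only one copy. The following are assumptions made for illustration:
- message times w[a][b], with w[o][o] = 0 for local messages;
- an answer of sel_i(F_j) * length(F_j) bytes, sent at transfer rate speed[k][o];
- all function and argument names, which are hypothetical.

```python
def transmission_cost(o, u, r, x, w, sel, length, speed):
    """Transmission cost of one query issued at site o.

    u[j], r[j]   -- 1 if the query updates / reads fragment F_j
    x[j][k]      -- 1 if F_j has a copy at site S_k
    w[a][b]      -- time to send a request or confirmation message from a to b
    sel[j]       -- number of tuples of F_j the query needs
    length[j]    -- bytes per tuple of F_j
    speed[a][b]  -- transfer rate (bytes per time unit) from a to b
    """
    sites = range(len(w))
    # Updates: send the request to every replica and receive a confirmation back.
    update = sum(w[o][k] + w[k][o]
                 for j in range(len(x)) if u[j]
                 for k in sites if x[j][k])
    # Reads: only one copy is needed, so charge the cheapest replica.
    # min() raises ValueError if a read fragment has no copy at all.
    retrieve = sum(min(w[o][k] + sel[j] * length[j] / speed[k][o]
                       for k in sites if x[j][k])
                   for j in range(len(x)) if r[j])
    return update + retrieve


w = [[0, 2], [2, 0]]             # message times; local messages cost 0
speed = [[100, 10], [10, 100]]   # transfer rates between the two sites
x = [[1, 1]]                     # F_1 replicated at S_1 and S_2
print(transmission_cost(0, u=[1], r=[1], x=x, w=w,
                        sel=[10], length=[100], speed=speed))
# update: (0 + 0) + (2 + 2) = 4; read from the local copy: 0 + 1000/100 = 10 -> 14.0
```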
The update component of the transmission cost for query q_i is given by
$$\sum_{k}\sum_{j} u_{ij}\, x_{jk}\, w_{z(i),k} + \sum_{k}\sum_{j} u_{ij}\, x_{jk}\, w_{k,z(i)}$$
where u_ij = 1 if q_i updates fragment F_j (update polarization). The first term covers sending the update message from the originating site z(i) of q_i to all replicas of the fragments that must be updated. The second term covers the confirmations [Matgr98]. The value w_{z(i),k} is the transmission time for sending a request or answer message from the origin site of query q_i to site S_k.
For w_{z(i),k} we assume w_{z(i),k}(F_j) = length(F_j)/V_{z(i),k}, where z(i) denotes the origin site of query q_i and V_{z(i),k} is the transmission speed between sites z(i) and S_k. Following [Ozsu91], the retrieval component NTR_i of the transmission cost charges only the cheapest copy of each fragment read by q_i:
$$NTR_i = \sum_{j} \min_{k:\,x_{jk}=1} \left( r_{ij}\, w_{z(i),k} + r_{ij}\, sel_i(F_j)\, w_{k,z(i)}(F_j) \right)$$

Experiments
The allocation experiments used the data model and data of our university's information system. The computational data sample consisted of 20 real applications from this system, working on five database relations, with fragments allocated to five nodes of the university network. Two of the nodes were at the remote campuses in Prievidza and Ružomberok, and the others were within the campus in Žilina.
The set of fragments F = {F_j} was defined so that each fragment corresponds to a relation or to a fragment of a relation under the following data model:
• Relation Student is horizontally fragmented by study town into F_1 (StudentZA), F_2 (StudentPD) and F_3 (StudentRB).
• Relation Person is fragmented by derived horizontal fragmentation, by joining it with relation Student and using the study town, into F_4 (PersonZA), F_5 (PersonPD) and F_6 (PersonRB).
• Relation Education is fragmented by derived horizontal fragmentation, by joining it with relation Student and using the study town, into F_7 (EducationZA), F_8 (EducationPD) and F_9 (EducationRB).
• Relation Course is not fragmented. It forms fragment F_10 and represents the static part of the database.

Applications:
As the set of applications A = {a_i} we prepared the 10 most typical selections and the 10 most typical destructive (update) operations of our university information system. Together they formed the experimental base for verifying the allocation in the various computed variants:
• a_1, a_2, a_3, a_5: selection from F_1, F_2, F_3
• a_6: selection from F_4, F_5, F_6
• a_7: selection from F_3, F_7, F_9
• a_8: selection from F_1 * F_4 ∪ F_2 * F_5 ∪ F_3 * F_6
• a_9: selection from F_7 * F_10 ∪ F_8 * F_10 ∪ F_9 * F_10
• a_10: selection from F_10
• a_11 to a_20: updates in fragments F_1 to F_10
where ∪ denotes the UNION operation.
The values of the monitored features were measured during normal operation of the information system. These features were the frequencies of nondestructive operations, the selectivities of particular fragments, response times between workplaces of the network, sizes of the relations of particular fragments, and durations of elementary operations.
The first experiment presents the basic variant. Its main goal is to find a suboptimal solution for one-level fragmentation, in which each fragment is allocated only once. The best allocation of the fragments is illustrated in Fig. 1.
The objective function for this variant has the value 878202. This result shows that most fragments are allocated to the workplaces that give minimal cost with respect to the transmission speed of the network.
We also prepared an intuitive allocation related to the Best Fit method [Ceri84], in which every fragment is placed at the workplace with its maximal query frequency. If we assume no destructive operations, the objective function improves to 783035 and a different fragment allocation results (Fig. 2). According to the results, the best centralized variant allocates all fragments to node S_4, with objective function value 953792. In the nonfragmented variant, fragments F_1, F_2, F_3 are merged into one fragment that is always allocated to a single node, and likewise fragments F_4, F_5, F_6 and F_7, F_8, F_9. The objective function then has the value 1000908 (Tab. 3).
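The intuitive allocation described above places each fragment at the workplace with its maximal query frequency, and can be sketched as follows. The frequency layout freq[j][k] and the tie-breaking rule (lowest site index wins) are assumptions. The sketch illustrates the idea only and is not the exact procedure of [Ceri84].

```python
def best_fit(freq: list[list[int]]) -> list[int]:
    """freq[j][k]: number of requests to fragment F_j issued at site S_k.
    Returns, for each fragment, the index of the site with the highest frequency.
    max() keeps the first maximum it meets, so ties go to the lowest index."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in freq]


def to_allocation(choice: list[int], n_sites: int) -> list[list[int]]:
    """Turn the chosen site per fragment into a one-level 0/1 matrix x[j][k]."""
    return [[1 if k == c else 0 for k in range(n_sites)] for c in choice]


freq = [[5, 9, 1],
        [7, 7, 0],
        [0, 0, 3]]
choice = best_fit(freq)
print(choice)                    # [1, 0, 2]
print(to_allocation(choice, 3))  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The heuristic looks only at request frequencies, so it ignores storage and update-propagation costs. This is one reason its objective value can differ from the model's optimum.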

Tab. 3: The result for the nonfragmented variant
Compared with the result obtained for the fragmented variant, this value differs from the optimal value by 12 percent. In another case of this variant we observed how the objective function value (N2) changes when the number of destructive operations is kept constant and the number of nondestructive operations is changed by 10 percent in each step, as in the previous variant. The objective function value is improved by 50 percent of the number of nondestructive operations. DN is the difference between the variant costs and the optimal costs.
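The 12 percent figure can be checked against the reported objective values. This rests on two assumptions: the optimal value meant here is 878202 from the basic variant, and the difference DN is expressed relative to the nonfragmented cost. The function name is hypothetical.

```python
def relative_gap(variant: float, optimal: float) -> tuple[float, float, float]:
    """Return DN = variant - optimal and DN as a percentage of each value."""
    dn = variant - optimal
    return dn, 100 * dn / variant, 100 * dn / optimal


dn, pct_of_variant, pct_of_optimal = relative_gap(1_000_908, 878_202)
print(dn, round(pct_of_variant, 1), round(pct_of_optimal, 1))  # 122706 12.3 14.0
```

Relative to the nonfragmented cost, the gap is about 12 percent, which agrees with the text. Measured against the optimal value instead, it would be about 14 percent.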

Conclusion
Advances in information technology allow information systems to be developed effectively and in harmony with the organizational structure of firms, and distributed database systems are helpful tools for building such systems. However, designing the data model for a distributed database system always remains a challenge, regardless of the available conditions. The challenge spans everything from fragmenting the database to allocating the fragments or whole databases.