HEPATITIS B DISEASE DIAGNOSIS USING ROUGH SET

The Rough Set approach can be very useful for the analysis of decision problems concerning objects described in a data table by a set of condition attributes as well as a set of decision attributes. To demonstrate efficient data analysis and suggestive prediction, the data of patients suffering from viral hepatitis were used to predict the probability of their death or serious disability. This paper also demonstrates an extension of the Rough Set methodology that reduces the number of input attributes in order to increase prediction accuracy without loss of knowledge.


Introduction
The Rough Set theory was developed in the early eighties by Zdzislaw Pawlak as a way of dealing with incomplete sets of information. It has led to many interesting applications and extensions, is widely used in the scientific world, and is now one of the fastest growing methods of artificial intelligence. As the author of the theory stated [1], the Rough Set approach seems to be fundamentally important in artificial intelligence and machine learning, especially in research areas such as pattern recognition, cognitive sciences, mereology, decision analysis, intelligent systems, expert systems, inductive reasoning and knowledge discovery [2].
The Rough Sets, as the name suggests, are sets defined on a discretised space. The space is discretised by the definition of the elementary set, whose size depends on the level of space approximation. The items in an elementary set have an interesting feature: they are indiscernible among themselves, and each of them has all the characteristic properties typical of the whole set. The membership function takes a set of values corresponding to the number of groups to which an item is assigned: 1 if the element belongs to class 1, 2 if the item belongs to class 2, and so on; the value 0 is assigned to those items which are not classified, that is, to those for which we cannot determine the group they belong to.
Basic operations on Rough Sets [1] are the same as the operations on classical sets. The information system is defined as I = (U, A), where U is a finite, non-empty set of objects called the universum and A is a finite, non-empty set of attributes such that ∀a ∈ A : a : U → V_a, where V_a is the set of values that attribute a may take. The information table assigns a value a(x) from V_a to each attribute a and object x in the universum U.
The indiscernibility relation of x and y is written as x IND(B) y (x is in the indiscernibility relation to y with respect to the set of attributes B), which means that the elements x and y have the same values of the attributes in B. In other words, with respect to the attributes in B, the elements x and y cannot be distinguished from each other.
For each subset of attributes B ⊆ A there is an associated indiscernibility relation:

IND(B) = {(x, y) ∈ U × U : ∀a ∈ B, a(x) = a(y)}.
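As an illustration, the equivalence classes of IND(B) can be computed directly from the information table. The sketch below uses hypothetical attribute names ('fever', 'age') and is not part of the paper's implementation:

```python
from collections import defaultdict

def indiscernibility_classes(U, B):
    """Partition the universe U into equivalence classes of IND(B).

    U is a list of objects, each a dict mapping attribute name -> value;
    B is the subset of attribute names considered.
    """
    classes = defaultdict(list)
    for i, x in enumerate(U):
        key = tuple(x[a] for a in B)   # objects equal on B fall into one class
        classes[key].append(i)
    return list(classes.values())

# Toy table: objects 0 and 1 are indiscernible on {'fever'} but not on {'fever', 'age'}
U = [{'fever': 'yes', 'age': 30},
     {'fever': 'yes', 'age': 45},
     {'fever': 'no',  'age': 30}]
print(indiscernibility_classes(U, ['fever']))        # [[0, 1], [2]]
print(indiscernibility_classes(U, ['fever', 'age'])) # [[0], [1], [2]]
```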

The lower approximation B_(X) = {x ∈ U : B(x) ⊆ X} is the complete set of objects in U which can be certainly classified as elements of X by using the set of attributes B. It is the largest union of B-elementary sets contained in X.
The upper approximation B¯(X) = {x ∈ U : B(x) ∩ X ≠ ∅} is the set of elements in U that can possibly be classified as elements of X.
The B-boundary of X in the information system I is defined as BN_B(X) = B¯(X) − B_(X). The most important properties of the Rough Set are shown in Fig. 1.
The reduct is a minimal attribute subset that preserves the degree of dependency of the decision attributes on the conditional attributes. It is a subset R ⊆ B ⊆ A such that B_(X) = R_(X) and B¯(X) = R¯(X). The intersection of all reducts is called the core. The core cannot be removed from the information system without deteriorating the basic knowledge of the system; thus, none of its elements can be removed without affecting the classification power of the attributes. The set of all indispensable attributes of B is called the X-core. The parameter characterizing a Rough Set numerically is the accuracy of approximation, which measures how rough the set is. If a set satisfies B_(X) = B¯(X) = X, it is called crisp, and for its every element the relationship x ∈ X ⊆ U is valid. The accuracy is represented by the formula:

μ_B(X) = Card(B_(X)) / Card(B¯(X)),

where Card(X) denotes the cardinality of X, X ≠ ∅.
Always 0 ≤ μ_B(X) ≤ 1, and if μ_B(X) = 1 then X is crisp with respect to B.
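The lower and upper approximations and the accuracy of approximation can be illustrated with a small sketch (toy elementary sets, not the paper's data):

```python
def approximations(B_classes, X):
    """B-lower and B-upper approximations of a concept X (a set of object
    indices), given the equivalence classes of IND(B)."""
    lower, upper = set(), set()
    for cls in B_classes:
        cls = set(cls)
        if cls <= X:        # elementary set entirely inside X -> certainly in X
            lower |= cls
        if cls & X:         # elementary set overlaps X -> possibly in X
            upper |= cls
    return lower, upper

# Elementary sets of IND(B) over U = {0, ..., 4} and a target concept X
B_classes = [[0, 1], [2], [3, 4]]
X = {0, 1, 3}
low, up = approximations(B_classes, X)
accuracy = len(low) / len(up)     # mu_B(X) = Card(lower) / Card(upper)
print(low, up, accuracy)          # {0, 1} {0, 1, 3, 4} 0.5
```

The boundary region here is {3, 4}: object 3 belongs to X but shares its elementary set with object 4, which does not.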
Additionally, several new concepts were introduced by Ziarko and Shan [3], [4]. They distinguish in an information system two disjoint classes of attributes, called condition and decision attributes.
The number supp_x(C, D) = |A(x)| = |C(x) ∩ D(x)| is called the support of the decision rule C →_x D, and the number σ_x(C, D) = supp_x(C, D) / |U| is called its strength, where |X| denotes the cardinality of X.
The certainty factor of the decision rule is denoted cer_x(C, D) and defined as follows:

cer_x(C, D) = supp_x(C, D) / |C(x)|, where C(x) ≠ ∅.

The certainty factor may be interpreted as a conditional probability that y belongs to D(x) given that y belongs to C(x), symbolically π_x(D|C). If cer_x(C, D) = 1, then C →_x D is called a certain decision rule; if 0 < cer_x(C, D) < 1, the decision rule is referred to as an uncertain decision rule.
The coverage factor of the decision rule is denoted cov_x(C, D) and defined as follows:

cov_x(C, D) = supp_x(C, D) / |D(x)|, where D(x) ≠ ∅.
The inverse decision rule is denoted D →_x C and it is the inversion of the decision rule C →_x D. It can be used to give an explanation (reason) for a decision.
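The support, certainty and coverage factors can be computed directly from the sets C(x) and D(x); the sketch below uses made-up sets of object indices for illustration:

```python
def rule_factors(C_x, D_x):
    """Support, certainty and coverage of a decision rule C ->_x D,
    where C_x and D_x are the sets of objects matching the condition
    and the decision part, respectively."""
    supp = len(C_x & D_x)
    cer = supp / len(C_x)   # P(y in D(x) | y in C(x)); 1.0 means a certain rule
    cov = supp / len(D_x)   # coverage, used when reading the inverse rule D ->_x C
    return supp, cer, cov

# 5 objects match the condition, 4 match the decision, 3 match both
C_x = {0, 1, 2, 3, 4}
D_x = {1, 2, 3, 9}
print(rule_factors(C_x, D_x))  # (3, 0.6, 0.75)
```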

Data Pre-processing
Attribute reduction is very important in rough set-based data analysis: according to Smolinski [5], [6], it improves the efficiency of the predictor itself and cuts down the time needed for future data processing.

A. Attributes filtering
The way to improve the predictor is to select the most important attributes. Following Aboul Ella Hassanien [7], in common practice a domain expert's opinion is required to set the data importance, but sometimes the problem is too complicated for a single expert. In that case, according to Slezak et al. [8], a filter (selecting algorithm) is needed to check each attribute's significance and influence on the whole result, and then to create a new set of attributes as a linear combination of the weighted sum of the selected ones. The original aim of the presented method is to create other attributes to adjust the classification result. Here I suggest using my own algorithm, which filters the attribute set and leaves only those attributes whose weight exceeds a specified level, similar to Wroblewski's Classification Algorithms [9]. The algorithm repeatedly checks the accuracy and coverage of rules by calculating the results for different cuts. The rules are created by the LEM2 algorithm, which proved to give the best results on the experimental data. As stated by Polkowski [10], a test of the exhaustive selection algorithm showed significantly lower accuracy of the created rules, and it was therefore omitted in further experiments.

Fig. 1 A graphical representation of a Rough Set environment

B. Attributes reduction
Attribute reduction is done in two steps. The first step reconstructs the decision table; the second computes the optimal reduct for data analysis. The knowledge is thus extracted by continuous dataset discretisation. Discretisation of the dataset is the process of reducing the domain of a continuous attribute to an irreducible and optimal set of cuts, while preserving the consistency of the dataset classification. The basic idea of the Quick Reduct Algorithm (QRA) is based on the fact that the intersection of any entry of the discernibility matrix (table) DM with the reduct Red cannot be empty: objects i and j would be indiscernible with respect to the reduct if the intersection of an entry c_ij with the reduct were empty, which contradicts the definition of the reduct as the minimal attribute set discerning all objects. As X. Hu notices [11], the frequency of an attribute is used as a heuristic, which makes the algorithm applicable to optimal rule generation. QRA starts with an empty reduct set Red = ∅, then sorts the entries |c_ij| of the discernibility matrix and examines each of them. If the intersection of an entry with Red is empty, the shortest and most frequent attribute is picked and inserted into Red; otherwise the entry is skipped. Shorter and more frequent attributes contribute more classification power to the reduct. If there is only one element in |c_ij|, it must be a member of the reduct. The procedure is repeated until all entries of the discernibility matrix have been examined. Finally, QRA returns the optimal reduct in Red. According to Thangavel et al. [12], the discretisation improves the classification of unseen objects. The algorithm used for data reduction is presented below.
The input is I = (U, B ∪ {d}), B = ∪ b_i, i = 1 … n. The count(b_i) sums up the frequency of the attribute computed by f(b_i), DM is the discernibility matrix, |c| is the cardinality of c, and d is the decision. The output is the optimal reduct Red.
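A greedy sketch of the Quick Reduct idea described above (build discernibility entries, cover each uncovered entry with its most frequent attribute); the tiny decision table and attribute names are hypothetical, and the paper's exact implementation is not reproduced:

```python
from collections import Counter
from itertools import combinations

def quick_reduct(U, B, d):
    """Greedy covering of discernibility-matrix entries, preferring
    shorter entries and more frequent attributes, as outlined above."""
    # Discernibility entries: attributes telling apart each pair with d(x) != d(y)
    entries = []
    for x, y in combinations(U, 2):
        if x[d] != y[d]:
            c = frozenset(a for a in B if x[a] != y[a])
            if c:
                entries.append(c)
    freq = Counter(a for c in entries for a in c)
    entries.sort(key=len)            # singleton entries first: they must enter Red
    red = set()
    for c in entries:
        if not (c & red):                        # entry not yet covered by Red
            red.add(max(c, key=lambda a: freq[a]))  # pick the most frequent attribute
    return red

U = [{'bilirubin': 'high', 'albumin': 'low',  'class': 'die'},
     {'bilirubin': 'low',  'albumin': 'low',  'class': 'live'},
     {'bilirubin': 'high', 'albumin': 'high', 'class': 'live'}]
print(sorted(quick_reduct(U, ['bilirubin', 'albumin'], 'class')))
# ['albumin', 'bilirubin'] -- both attributes are needed to discern all pairs
```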

A. Data
To present the proposed method of data processing, let me consider an example from the UCI repository [13]: a dataset of patient records donated by the Josef Stefan Institute in Ljubljana. The word hepatitis derives from the Greek 'hepat-', meaning 'liver', and the suffix '-itis', denoting 'inflammation'; inflammation of the liver may be due to infectious or non-infectious causes. As stated by Worman [14], the five types of hepatitis viruses are common infectious causes of liver inflammation, and some of them, such as hepatitis A (HAV), B (HBV) and C (HCV), are the most frequently seen infectious agents. The inflammation may lead to death of the liver cells (hepatocytes), which severely compromises normal liver function. An acute HBV infection (less than 6 months) may resemble the flu: fever, muscle aches, joint pains and a general feeling of being unwell. The symptoms specifying this state are: dark urine, loss of appetite, nausea, vomiting, jaundice and pain over the liver. Chronic hepatitis B is an infection persisting more than 6 months; the clinical features of that state correspond to liver dysfunction, so the following signs may be noticed: enlarged liver, splenomegaly, hepatosplenomegaly, jaundice, weakness, abdominal pain, confusion and abdominal swelling.
The dataset of patients' probability of survival is used in the given example. The dataset contains 155 records, of which 32 patients died and 123 survived. There are 20 attributes (including the class attribute): 14 nominal and 6 numerical. The symptoms found in a patient's record are listed in the dataset documentation; the last attribute is 20. HISTOLOGY: {no, yes}.

B. Application of the Rough Set Theory
During the experiment, the data was divided randomly into two datasets at a 50 : 50 ratio by the Orthogonal Array-Based Latin Hypercubes (OABLH) method [15]. In orthogonal sampling, the sample space is divided into equally probable subspaces. All sample points are then chosen simultaneously, making sure that the total ensemble of sample points is a Latin Hypercube sample [16] and that each subspace is sampled with the same density. The first dataset (T) is used to train the algorithms and the second one (C) is used for classification and rule estimation. The results do not depend on the dataset division. The test dataset (C) has 77 records, of which 12 patients died and 65 survived. The training dataset (T) has 78 records, of which 20 patients died and 58 survived.
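The OABLH sampling itself is beyond the scope of a short sketch; as a simplified stand-in, a stratified half split that reproduces the partition sizes (but not the paper's exact per-class counts) can be written as:

```python
import random

def stratified_split(records, label, seed=0):
    """Simplified stand-in for OABLH sampling: a stratified 50:50 split
    keeping class proportions similar in both halves."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label], []).append(r)
    train, test = [], []
    for cls_records in by_class.values():
        rng.shuffle(cls_records)
        half = len(cls_records) // 2
        train += cls_records[half:]   # larger halves go to the training set
        test += cls_records[:half]
    return train, test

# 155 dummy records with the dataset's class balance: 32 died, 123 survived
records = [{'class': 'die'}] * 32 + [{'class': 'live'}] * 123
T, C = stratified_split(records, 'class')
print(len(T), len(C))  # 78 77
```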

C. Number of attributes reduction
The result of the approximate reduct is {BILIRUBIN, ALK_PHOSPHATE, SGOT, ALBUMIN, PROTEIN}. In this case we have numeric attributes only, but the approximate reduct can be a combination of any available attributes. The attribute reduction is (20 − 5) / 20 = 75%. The training dataset (T) was used for the selection of classification rules, and then the rules were applied to the test dataset (C). The result is shown in Table II. We can observe an increase in accuracy after reducing the number of attributes. The same is shown in the confusion matrix in Table 1.
The confusion matrix [17] is a specific table layout that allows visualization of the performance of an algorithm. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The matrix also shows the overall accuracy of the classifier as the number of correctly classified patterns in a given class divided by the total number of classified patterns. The overall coverage is the number of classified patterns divided by the total number of patterns. Specificity measures the proportion of cases classified as negative among all cases that are actually negative; sensitivity is the proportion of cases classified as positive among all cases that are actually positive. In this paper, sensitivity denotes the accuracy on the negative class and specificity the accuracy on the positive class.
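These measures can be read off a confusion matrix; the sketch below computes them from paired labels, treating unclassified (uncovered) cases as affecting only coverage. The labels and the standard positive/negative convention used here are illustrative:

```python
def confusion_metrics(actual, predicted, positive='live'):
    """Sensitivity, specificity, accuracy and coverage from paired
    actual/predicted labels; None marks an unclassified (uncovered) case."""
    tp = fn = tn = fp = covered = 0
    for a, p in zip(actual, predicted):
        if p is None:                  # rule set did not cover this case
            continue
        covered += 1
        if a == positive:
            tp += (p == positive); fn += (p != positive)
        else:
            tn += (p != positive); fp += (p == positive)
    sensitivity = tp / (tp + fn)       # accuracy on actual positives
    specificity = tn / (tn + fp)       # accuracy on actual negatives
    accuracy = (tp + tn) / covered
    coverage = covered / len(actual)
    return sensitivity, specificity, accuracy, coverage

actual    = ['live', 'live', 'die', 'die', 'live']
predicted = ['live', 'die',  'die', 'die', None]
print(confusion_metrics(actual, predicted))  # (0.5, 1.0, 0.75, 0.8)
```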

D. Decomposition of attributes value
After the subset of attributes was created, another algorithm is used to generate decompositions of attribute value sets. As Bazan et al. suggested [18], the decomposition may be done by discretisation of numerical attributes or by grouping (quantisation) of nominal attributes. The decomposition algorithm indicates, among others, a division of BILIRUBIN into intervals. The result of the test of the decision rules created after discretisation can be found in Table III. The global accuracy increased by about 2.85%; however, the global coverage decreased by about 2.57%. This means that the rules can better classify unseen cases. The confusion matrix after decomposition is shown in Table 2.
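Discretisation by cuts maps a continuous value to the index of its interval; a minimal sketch with hypothetical cuts (the paper's actual BILIRUBIN intervals are not reproduced here):

```python
import bisect

def discretise(value, cuts):
    """Map a continuous value to an interval index given a sorted list of
    cut points, as in the decomposition of a numerical attribute."""
    return bisect.bisect_right(cuts, value)

# Hypothetical cut points for BILIRUBIN, defining intervals
# (-inf, 1.0], (1.0, 2.0], (2.0, 4.0], (4.0, +inf)
cuts = [1.0, 2.0, 4.0]
print([discretise(v, cuts) for v in (0.7, 1.5, 3.0, 6.1)])  # [0, 1, 2, 3]
```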

E. Results and discussion
In this study, hepatitis disease diagnosis was conducted with a novel medical decision support system based on the Rough Set Theory and a modification of the filtering / reducing algorithm. The obtained maximal diagnostic accuracy is 94% (effectively 93.75%), using the LEM2 algorithm to generate decision rules. Sensitivity and specificity for the hepatitis disease dataset were obtained as 0, 100, 77.78, and 87.50%, respectively. Only half of the data was used as a training set, which shows the still underestimated power of this solution for future use in medical data analysis. Moreover, the sensitivity and specificity values for the hepatitis disease dataset were obtained as 92.50%, 92.31% and 95.83%, respectively. These values are shown in Table 3, which contains four columns indicating the accuracy and coverage of four different steps of the algorithm: initially, after filtering, after reduction and after decomposition. Similar misclassification occurs after filtering (94%) and after decomposition (93.75%). Table 2 presents the confusion matrix, which shows the misclassification of the rules; each cell contains the raw number of examples classified for the corresponding combination of desired and actual network outputs. By combining the Rough Set approach with the modified filter / reduce algorithm, the obtained classification accuracy is the highest among the classifier results reported by Polat and Gune [19] in the literature. In view of classification accuracy, Table IV compares my classification methods with other methods.
The new medical diagnosis system gives very promising results in classifying healthy and ill patients suffering from hepatitis. I propose a complementary system that can be implemented in medical diagnostic devices. The benefit of the system is to assist the physician in making the final decision without hesitation.

Conclusion
It was proved that the proposed algorithm is capable of identifying people suffering from viral hepatitis based on real biometric data. Further work can lead to increasing the overall algorithm accuracy as well as to deeper data analysis. Combining the Rough Set Theory with the modified pre-processing algorithm revealed possibilities for their use in many other domains.
Table IV Literature examples of classification accuracies