DIAGNOSTIC RULE MINING BASED ON ARTIFICIAL IMMUNE SYSTEM FOR A CASE OF UNEVEN DISTRIBUTION OF CLASSES IN SAMPLE

The problem of automating the synthesis of classification rules on the basis of negative selection in the case of an uneven distribution of classes in the sample is solved. A method for the synthesis of classification rules on the basis of negative selection in the case of an uneven distribution of class instances in the sample is proposed. This method uses a priori information about instances of all classes of the sample. Software implementing the proposed method has been developed. Experiments on the practical problem of diagnosing gas turbine aircraft engine blades have been conducted.

Let us assume that there is a training set $S = \langle P, T \rangle$, where $P$ is a set of input parameters (features) of the objects and $T$ is a set of values of the output parameter. The set $P$ is represented as a matrix $P = (p_{qm})_{Q \times M}$, where $p_{qm}$ is the value of the $m$-th feature of the $q$-th instance in the set $S$. The index $m$ is the number of a feature of the object ($m = 1, 2, \ldots, M$), and the index $q$ is the number of an instance (object) in the sample $S$ ($q = 1, 2, \ldots, Q$). The value $M$ is the number of features in the set $S$; $Q$ is the number of instances in the set $S$. The set of values of the output parameter is represented as a vector $T = (t_q)_Q$, where $t_q \in T'$ is the value of the output parameter of the $q$-th instance.


Introduction
Building decision models for non-destructive testing, technical or medical diagnostics, and pattern recognition is a topical task [1][2][3][4]. A situation in which most data of a training set belong to one class is typical for such tasks [5, 6]. It requires new models for formalizing objects or describing processes. One of the promising approaches to developing such models is based on the concept of artificial immune systems [7][8][9]. Such a model can be built from instances of a single class. When the difference between the numbers of instances belonging to different classes is significant, the use of artificial immune systems with negative selection is proposed in [10][11][12][13]. These systems involve the construction of a set of detectors (computational elements) that are capable of recognizing unknown instances [14][15][16]. This approach makes it possible to detect anomalies or random variations in diagnosed objects [7, 10] and to recognize instances of "non-self" classes (classes of objects that are not represented in the training set) [8, 12, 15]. There are well-known methods for the synthesis of artificial immune systems based on negative selection [8][9][10][11][12][13][14][15][16]. These methods generate an excessive number of detectors (possible solutions) and employ instances of one class only; instances of other classes are not taken into account. Moreover, these methods place high demands on computing resources.
Consequently, the development of methods for the synthesis of artificial immune systems on the basis of negative selection that are free from these disadvantages is a topical problem. In addition, diagnostic models based on artificial immune systems have a low level of generalization. The individual detectors (rules) of the immune system are easy to understand. However, because of the low level of generalization, a detector system has a large dimension. Such a system is difficult for a human to understand and analyze, which generally reduces the interpretability of the diagnostic model.
The purpose of this paper is therefore to develop a method for synthesizing classification rules on the basis of a set of detectors. These rules must handle training sets with a significant difference in the number of instances belonging to different classes.

Problem statement
Let us assume that there is a training set $S = \langle P, T \rangle$, where $P$ is a set of input parameters (features) and $T$ is a set of values of the output parameter. In the proposed method, a hypercube is used as the form of a detector. This is in contrast to known methods of negative selection, in which a hypersphere is used as the form of a detector. The hypercube eliminates the need to solve the resource-intensive problem of searching for the optimal radius of the detector hyperspheres.
Evaluation of the significance of the features $p_m$ with respect to the output parameter $T$ is the initial stage of the proposed method. It makes it possible to identify irrelevant features and exclude them from further consideration, thereby reducing the search space and the running time of the method.
As noted above, in this paper we consider the problem in which the output parameter $T$ of the initial sample $S = \langle P, T \rangle$ takes discrete values. Therefore, to estimate the significance $V_m$ of the features $p_m$, it is advisable to apply criteria that assess the significance of features with respect to a discrete output parameter $T$ [2, 4, 17-22]. We propose to use entropy as the criterion [4, 17]. Entropy reflects the degree of uncertainty of the state of the object. This criterion is calculated by (3), which uses the probability that the value of the output parameter $T$ is equal to $t_l$ (falls into the $l$-th interval $t_l$) provided that the $m$-th feature $p_m$ falls into the $n$-th interval $p_{mn}$, and the number of instances of the sample $S$ whose value of the output parameter $T$ is equal to $t_l$ (belongs to the $l$-th interval).

In the problem statement, $t_q \in T'$ is the value of the output parameter of the $q$-th instance, and $T'$ is the set of possible values of the output parameter. In problems of non-destructive quality control and pattern recognition, the set $T'$ usually consists of two elements, $T' = \{t'_0, t'_1\}$, which determine the suitability class of the object. The number of instances of one class (for example, instances of the class $t_q = t'_1$) is significantly different from the number of instances of the other class, as defined by (1) and (2).
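The entropy-based significance estimate described above can be illustrated with a minimal Python sketch. It computes the conditional entropy of a discrete output given a feature split into intervals. The function name `conditional_entropy`, the equal-width splitting, and the weighted form of the sum are assumptions made for illustration; the exact form of criterion (3) may differ.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(feature, labels, n_intervals=4):
    """Entropy of the labels given the interval that the feature value falls into.

    Hypothetical helper: equal-width intervals are an assumption of this sketch.
    """
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / n_intervals or 1.0  # constant feature: use a dummy width
    groups = defaultdict(list)
    for x, t in zip(feature, labels):
        n = min(int((x - lo) / width), n_intervals - 1)  # interval index of x
        groups[n].append(t)
    total = len(labels)
    h = 0.0
    for members in groups.values():
        weight = len(members) / total          # share of instances in the interval
        for count in Counter(members).values():
            p = count / len(members)           # class frequency inside the interval
            h -= weight * p * math.log2(p)
    return h
```

In this sketch a lower value means that knowing the feature interval removes more uncertainty about the class. How such a value is mapped to the significance $V_m$ is not fixed here.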

The method of classification rule mining based on negative selection
As noted above, the known methods of negative selection [8][9][10][11][12][13][14][15][16] have such disadvantages as the generation of an excessive number of detectors, the use of information about instances of one class only, and the low interpretability of the synthesized set of detectors. In addition, most methods based on the principles of negative selection use hyperspheres with a fixed radius as detectors. This radius determines the area of the feature space covered by the detector. The choice of the radius of the hypersphere detector is a very complex task: for large values of the radius the recognition accuracy is reduced, and for small values the number of generated detectors increases. This lowers the generalization properties of the synthesized model, which has the form of a set of detectors of an artificial immune network.
These disadvantages substantially increase the requirements for computing resources. They decrease the speed of the search for solutions and in some cases do not allow an acceptable solution to be found. To eliminate these drawbacks, it is advisable to use a method of classification rule synthesis on the basis of negative selection for the case of an uneven distribution of instances among the classes of the sample. This method uses a priori information about instances of all classes of the sample, information about the individual significance of features, and a hypercube of maximum possible volume as the form of a detector.

The value $rnd$ is a randomly generated number in the interval $[0; 1)$: $rnd = rand[0; 1)$. This transformation decreases the volume of the hypercube $Ab_k$, so that the instance $s_q$ is located outside the space described by the detector candidate $Ab_k$. The ratio $\eta$ is defined by the user as a parameter of the method, in the range $\eta \in (0; 1]$. This ratio directly influences the distance between the instances of the sample $S_1$ and the hypercube detector $Ab_k$: a higher value of the ratio corresponds to a greater distance.
After the boundary values of one of the features have been transformed, the detector candidate $Ab_k$ is re-checked against every instance in the sample $S_1$ by condition (5). If condition (5) is true, the boundary values of one feature of the candidate $Ab_k$ have to be transformed again. This process is repeated as long as condition (5) holds.
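The additional training loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact transformation of the method: condition (5) is taken to mean "the instance lies strictly inside the hypercube", and the boundary closer to the instance is moved past it by a margin proportional to `eta * rnd`. The names `contains`, `shrink` and `train_detector` are hypothetical.

```python
import random

def contains(detector, instance):
    """Assumed form of condition (5): the instance is strictly inside the hypercube."""
    return all(lo < x < hi for (lo, hi), x in zip(detector, instance))

def shrink(detector, instance, eta, rng):
    """Move one boundary of one randomly chosen feature past the instance."""
    m = rng.randrange(len(detector))
    lo, hi = detector[m]
    x = instance[m]
    rnd = rng.random()  # rnd in [0; 1)
    if x - lo <= hi - x:
        lo = x + eta * rnd * (hi - x)  # raise the lower boundary to x or beyond
    else:
        hi = x - eta * rnd * (x - lo)  # lower the upper boundary to x or below
    detector[m] = (lo, hi)

def train_detector(detector, self_instances, eta=0.5, seed=0):
    """Shrink the candidate until no 'self' instance lies inside it."""
    rng = random.Random(seed)
    detector = list(detector)
    changed = True
    while changed:  # re-check all instances after every round of changes
        changed = False
        for inst in self_instances:
            while contains(detector, inst):
                shrink(detector, inst, eta, rng)
                changed = True
    return detector
```

Because every transformation only shrinks the hypercube, an instance that has been excluded once stays excluded, so the loop terminates. A larger `eta` tends to leave a larger gap between the detector and the "self" instances, in line with the role of the ratio $\eta$ described above.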
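We search for examples $s_q$ in a class-specific subset of the sample $S = \langle P, T \rangle$. The set of detectors is created by using the principles of negative selection. This set makes it possible to determine with high accuracy whether instances $s_q$ belong to the same class [10][11][12][13][14][15][16]. So, we use expressions (6) and (7) as criteria for estimating the ability of a detector to generalize the data. (In the significance criterion, the instances counted are those whose output value belongs to the $l$-th interval of the range of $t_l$, provided that the value of the $m$-th feature belongs to the $n$-th interval of $p_m$.)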
Features $p_m$ whose individual significance is below the minimum ($V_m < V_{\min}$) are considered uninformative and are excluded from the sample $S$. We propose to estimate the relationship between features as the significance of one of them in relation to another. This makes it possible to identify groups of interdependent features. Only one highly informative feature is kept in each of these groups, and the other features are excluded from further consideration, because redundant features complicate the synthesis of diagnostic models and reduce their interpretability. The mutual significance $V_{md}$ is evaluated using the entropy in (3). For this purpose, one of the features, $p_d$, is treated as the output parameter $T$ (the range of values of the feature $p_d$ is split into $N_{int}(T)$ discrete intervals). After that, the redundant features are excluded from the sample $S$ (if the value of the mutual significance exceeds the maximum permissible value, $V_{md} > V_{\max}$).
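The two filtering stages described above can be sketched in Python. The sketch assumes that the individual significances $V_m$ and the mutual significances $V_{md}$ have already been computed and that larger values mean higher significance. The function name `select_features` and the greedy visiting order (most significant feature first) are assumptions of this illustration.

```python
def select_features(v, v_mutual, v_min, v_max):
    """Return the indices of the features kept after both filtering stages.

    v        -- list of individual significances V_m
    v_mutual -- square matrix of mutual significances V_md
    """
    # Stage 1: drop features whose individual significance is below v_min.
    candidates = [m for m in range(len(v)) if v[m] >= v_min]
    # Stage 2: visit the most significant features first, so that each group
    # of interdependent features keeps its most informative member.
    candidates.sort(key=lambda m: v[m], reverse=True)
    kept = []
    for m in candidates:
        if all(v_mutual[m][d] <= v_max for d in kept):
            kept.append(m)
    return sorted(kept)
```

For example, if features 0 and 1 are strongly interdependent and feature 2 is uninformative, only features 0 and 3 remain in the test case below.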
Next, a set of detectors (structures that can determine whether the estimated instance belongs to a particular class) is built. Using the principles of negative selection, detectors built for the class $t'_1$ can detect unknown instances that do not belong to this class [9, 11, 13].
So, we form a set of detectors that distinguishes two values of the output parameter: $t'_1$ (class "self") and $t'_0$ (class "non-self"). For this purpose, the samples $S_0$ and $S_1$ are formed from the sample $S$ using the instances belonging to the classes $t'_0$ and $t'_1$, respectively. Here $m = 1, 2, \ldots, M$, and $Q_1$ is the number of instances in the sample $S_1$. The first detector candidate $Ab_1$ is presented in the form of a hypercube. The set $AB_1$ of detectors $Ab_k$ is formed on the basis of the set of "self" instances $S_1$ and makes it possible to detect "non-self" instances, i.e., those instances that do not belong to the class $t'_1$.
We check every instance $s_q$ of the sample against the detector candidate. As the metric for estimating the coordinates of the detectors $Ab_k^{(0)}$, we use the average detector length, which can be regarded as a normalized Manhattan distance. The sizes of their hypercube edges correspond to the sizes of the detectors created on the basis of the sample $S_1$.
Consequently, the coordinates of the detector are determined using (8) and (9). Then the generated detectors $Ab_k^{(0)}$ are compared with the instances of the sample $S_1$ by (4). If condition (5) is true, the detectors $Ab_k^{(0)}$ are transformed in the same way as at the additional training stage described above. Then the value of one of the criteria $G(Ab_k^{(0)})$ is calculated. This criterion estimates the ability of the detector to generalize the data. If its value is above the threshold, the detector $Ab_k^{(0)}$ is accepted. In a rule $PR_r: P_r \rightarrow T_r$, the right part $T_r$ contains the value of the output parameter $T$ that is assigned when the $r$-th set of conditions $P_r$ holds (11).
When generating the set of rules $PR$, only those detector boundaries that affect the quality of recognition are included in the antecedents $P_r$. Criteria (6) and (7) evaluate the ability of the detector $Ab_k$ to generalize the data.

Problems with the creation of detectors that adequately reflect the space of the instances $S_0$ may occur in such tasks. In particular, hypercube detectors $Ab_k$ with too large a volume can be generated. Such detectors cannot generalize the data adequately. This is due to the insufficient number of instances in the sample $S_0$ ($Q_0 \ll Q_1$). Therefore, we propose to calculate information about the size of the detectors built on the basis of the sample $S_1$. This information is then used to generate detectors for the instances of $S_0$.
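The idea of reusing the detector sizes obtained from $S_1$ when building detectors for $S_0$ can be sketched as follows. This illustration assumes that a detector for an instance of $S_0$ is a hypercube centred on that instance whose edge along each feature equals the average edge length of the $S_1$-based detectors; the exact expressions (8) and (9) may differ. The function names are hypothetical.

```python
def average_edges(detectors):
    """Per-feature mean edge length over a list of hypercube detectors.

    Each detector is a list of (lower, upper) boundary pairs, one per feature.
    """
    n_features = len(detectors[0])
    return [
        sum(det[m][1] - det[m][0] for det in detectors) / len(detectors)
        for m in range(n_features)
    ]

def detector_around(instance, edges):
    """Hypercube centred on the instance with the given edge lengths (assumed form)."""
    return [(x - e / 2, x + e / 2) for x, e in zip(instance, edges)]
```

A detector built this way always contains the $S_0$ instance it was generated from, and its volume is bounded by the typical size of the $S_1$-based detectors rather than by the sparse sample $S_0$.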
These detectors indicate the presence of instances of the sample $S_0$ inside the hypercube rather than their absence. This is an essential difference between the proposed method and the approach based on classic negative selection. These detectors are fully consistent with the search space based on these characteristics. As a result of the reduction, a set of 80 artificial features was obtained.
The resulting sample was used for the experimental investigation, in which the characteristics of the proposed method were compared with those of other methods of negative selection.
The first part of the experimental results is given in Table 1.

When the antecedent of such a rule holds, the instance corresponds to the "non-self" class, i.e., it does not belong to the class $t'_1$.
The upper limits of the features $p_2$ and $p_3$ are not included in the rule explicitly, because the corresponding values of the detector do not affect the quality of recognition. Excluding such values from the rule $PR_k$ reduces its complexity and, at the same time, increases the interpretability and comprehensibility of the rule. In this way, a classification rule is constructed from each detector $Ab_k$. As a result, a set $PR$ of $N_{Ab}$ classification rules $PR_r: P_r \rightarrow T_r$ is created by the proposed approach.
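The conversion of a hypercube detector into a readable rule can be sketched in Python. The sketch assumes that a boundary "does not affect the quality of recognition" when it coincides with (or lies beyond) the limit of the feature range, which is only one possible interpretation. The function name `detector_to_rule` and the textual rule format are hypothetical.

```python
def detector_to_rule(detector, feature_ranges, label="non-self"):
    """Build an if-then rule from a hypercube detector.

    detector       -- list of (lower, upper) boundaries, one pair per feature
    feature_ranges -- list of (min, max) limits of each feature in the sample
    """
    conditions = []
    pairs = zip(detector, feature_ranges)
    for m, ((lo, hi), (fmin, fmax)) in enumerate(pairs, start=1):
        if lo > fmin:  # the lower boundary actually restricts the feature
            conditions.append(f"p{m} >= {lo}")
        if hi < fmax:  # the upper boundary actually restricts the feature
            conditions.append(f"p{m} <= {hi}")
    antecedent = " and ".join(conditions) if conditions else "true"
    return f"if {antecedent} then T = {label}"
```

In the test case below, the upper limits of the second and third features coincide with the feature range, so they are omitted from the antecedent, which mirrors the example discussed above.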
The proposed method for the synthesis of classification rules is based on the negative selection approach. It is oriented to the case of an uneven distribution of class instances in the sample when generating a set of detectors. The proposed method uses known information about instances of all classes of the sample. It also takes into account information about the individual significance of features, which makes it possible to exclude irrelevant and redundant features from the sample, thereby reducing the search space and the running time of the method. A hypercube of maximum possible volume is used as the form of a detector. As a result, a set of detectors with high approximation and generalization capability is formed.
The proposed method increases the generalizing properties of the synthesized model by reducing the number of detectors and the number of conditions in the antecedents. This improves the interpretability of the model and reduces its dimension (structural and parametric complexity) and the amount of memory used. All of these improvements increase the performance of the model in sequential computation.

Experiments and results
A computer program has been developed to implement the proposed method of classification rule synthesis based on negative selection. The software is intended for the verification and analysis of different characteristics of this method. It was applied to the diagnosis of aircraft engine gas turbine blades [23]. The blades were characterized by the values of the power spectra of damped oscillations after impact excitation, and these values were used as input features. The classes of blade quality were defined with the help of experts: undamaged and defective (potentially dangerous). Each blade was described by 10240 characteristics of the power spectrum of damped oscillations. Artificial features were constructed from these characteristics to reduce the search space.

A part of the sample ($S_1 \subset S$) was used for constructing the third model, based on a set of detectors, using the method MMD, because this method deals with instances of one class only. Several criteria of these models were calculated for the analysis; the results are given in Table 2. We analyzed the following criteria of the classification models:
• $N_{param}$ is a criterion determining the parametric complexity of the model. It is calculated as the number of model parameters: the total number of parameters $Ab_{km}^{\min}$ and $Ab_{km}^{\max}$ for the first and the third models, and the total number of adjustable parameters (weight coefficients) for the second model;
• the criteria $E$, $E_t$, $P(t_q = t'_1 / t_q = t'_0)$ and $P(t_q = t'_0 / t_q = t'_1)$, which are described in Table 1.

Table 1 shows the misclassification error values $E$ produced by the method MMD [16] (defective instances).

Discussion
The second part of the experimental investigation compared the proposed method with other classification methods. The above-described problem of diagnosing gas turbine blades of aircraft engines [23] was solved by these methods, and several different models were obtained. We analyzed the following classification models:
• a model in the form of classification rules synthesized by the proposed method MPRSBNS;
• a feed-forward neural network trained by error back-propagation. This network consists of three layers of neurons: the first layer has five neurons, the second has three neurons, and the third has a single neuron. The neurons in the first and second layers have a logistic sigmoid activation function, and the single neuron in the third layer has a threshold activation function;
• a model in the form of a set of detectors constructed by the method MMD [16].
The whole training set of 114 instances was used in the experimental investigation of the first and the second models. This training set included (a) a subset of 42 instances characterizing defective blades and (b) a subset of 72 instances representing undamaged blades.

Table 1. Results of the first part of experiments (by method)

Table 2. Results of the second part of experiments (by model)

The number of parameters $N_{param}$ of the model synthesized by the proposed method ($N_{param} = 652$) is smaller than that of the similar model constructed by the method MMD [16] ($N_{param} = 804$) (see Table 2). This is because the average size of a generated detector is smaller when the proposed method is used. This reduction can be explained by the use of a priori information about feature significance during negative selection. It makes it possible to exclude from further consideration irrelevant and redundant features, which complicate the synthesis of diagnostic models and reduce their interpretability. Thus, the model synthesized by the proposed method MPRSBNS is simpler and more straightforward than the model created by the method of [16]. The approximation and generalization capabilities of the model synthesized by the method MPRSBNS are also higher. This is confirmed by the values of the criteria $E$, $E_t$ and the error probabilities.

The comparison of the model synthesized by the MPRSBNS method with the neural network model leads to the following conclusions. The model constructed by the proposed method has higher generalizing and approximation abilities (criteria $E$, $E_t$ and the error probabilities). This can be explained by the representation of the neural network as a set of neurons that are interconnected in a certain way and characterized by weight coefficients as adjustable parameters, where each neuron corresponds to a function of many arguments. At the same time, the neural network model is rather difficult for human perception. The model in the form of a set of classification rules synthesized by the proposed method is more intuitive than the neural network model.
Indeed, classification rules of the form "if condition, then action" are much more understandable and human-readable than a set of coefficients that reflect the strength of the connections between neurons in the neural network model.
Thus, the experiments showed that the proposed method, by using a priori information and excluding irrelevant and redundant features of the sample, makes it possible to reduce the search space and the execution time. The proposed method synthesizes classification models in the form of a set of detectors with high approximation and generalization capabilities. By reducing the number of detectors and the number of conditions in the antecedents, it also increases the interpretability of the model and reduces its dimension and, therefore, the amount of memory used.

Conclusions
In this paper we solve the problem of automating classification rule synthesis based on negative selection for the case of an uneven class distribution in the sample.

The criteria $E$, $E_t$, $P(t_q = t'_1 / t_q = t'_0)$ and $P(t_q = t'_0 / t_q = t'_1)$ were used to analyze the properties and characteristics of the investigated methods. These criteria describe the misclassification error and the probability of making a wrong decision on test data. The misclassification errors of the models synthesized by the proposed method MPRSBNS and by the methods [13][14][15][16] are shown in Table 1. These errors $E_t$ were calculated on test data. The misclassification error of the proposed method MPRSBNS is significantly lower than the error of the other known methods ($E_t = 0.136$, $E_t = 0.077$ and $E_t = 0.055$ for the methods [13], [15] and [16], respectively). This can be explained by the use of the characteristics $G(Ab_k)$, which evaluate the ability of a detector to generalize the data. The proposed method MPRSBNS reached a misclassification error of $E_t = 0.037$ (using a part of the sample, $S_1 \subset S$) and $E_t = 0.011$ (using the full sample $S = \langle P, T \rangle$).

It is important to note the specificity of the blade diagnosis problem. Assigning an instance to the "non-self" class ($t_q = t'_0$) when it actually belongs to the "self" class ($t_q = t'_1$) has a very high cost, because classifying a defective blade as undamaged can cost human lives. This error is evaluated by the criterion $P(t_q = t'_0 / t_q = t'_1)$. On the test data, the proposed method MPRSBNS has a zero error probability $P(t_q = t'_0 / t_q = t'_1)$ (see Table 1). This indicates the high efficiency of the proposed method for solving such problems.
The zero level of the error probability $P(t_q = t'_0 / t_q = t'_1)$ is explained by the following:
• the properties of the synthesized set of detectors. Note that this set of detectors was obtained using a priori information about the importance of features;
• the high generalizing ability of the synthesized set of detectors.
The latter is caused by the use of criteria (6) and (7), which make it possible to estimate the ability of a detector to generalize the data.
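The evaluation criteria discussed above can be illustrated with a short Python sketch. It assumes that the misclassification error is the share of wrongly classified instances and that the conditional error probabilities are estimated as relative frequencies on the evaluated data. The function name `evaluate` and the label encoding (1 for "self", 0 for "non-self") are assumptions of this illustration.

```python
def evaluate(actual, predicted, self_label=1, non_self_label=0):
    """Return (E, P(assigned non-self / actual self), P(assigned self / actual non-self))."""
    pairs = list(zip(actual, predicted))
    errors = sum(1 for a, p in pairs if a != p)
    n_self = sum(1 for a, _ in pairs if a == self_label)
    n_non = sum(1 for a, _ in pairs if a == non_self_label)
    # "Self" instances that were wrongly assigned to the "non-self" class.
    miss_self = sum(1 for a, p in pairs if a == self_label and p == non_self_label)
    # "Non-self" instances that were wrongly assigned to the "self" class.
    miss_non = sum(1 for a, p in pairs if a == non_self_label and p == self_label)
    e = errors / len(pairs)
    p_non_given_self = miss_self / n_self if n_self else 0.0
    p_self_given_non = miss_non / n_non if n_non else 0.0
    return e, p_non_given_self, p_self_given_non
```

Reporting the two conditional probabilities separately matters here because the two kinds of error have very different costs in the blade diagnosis problem.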
The developed method of classification rule synthesis based on negative selection uses a priori information about instances of all classes in the sample when generating the set of detectors. It also takes into account information about the individual significance of features. A hypercube of maximum possible volume is used as the form of a detector. This makes it possible to exclude irrelevant and redundant features from the sample, thereby reducing the search space and the execution time of the method, and to generate a set of detectors with high approximation and generalization capability. The proposed method improves the interpretability of the model, reduces its dimension and the amount of memory used, and improves the performance of the model in sequential computation. This is achieved by increasing the generalizing properties of the synthesized model through a reduction in the number of detectors and the number of conditions in the antecedents. An experimental study of the proposed method and its comparison with known analogues has been performed. The practical task of diagnosing the blades of gas turbines of aircraft engines has been solved. The mathematical approach proposed in [24, 25] can be used for the reliability analysis of the proposed solution.