CLASSIFICATION ACCURACY ENHANCEMENT BASED MACHINE LEARNING MODELS AND TRANSFORM ANALYSIS

Hanan A. R. Akkar1, Wael A. H. Hadi2, Ibraheem H. Al-Dosari3,*, Saadi M. Saadi4, Aseel Ismael Ali4
© 2021 University of Zilina, Communications 23 (2) C44-C53

Resume
The problem of leak detection in a water pipeline network can be solved by a wireless sensor network driven by an intelligent algorithm. A novel denoising process is proposed in this work, and a comparison study evaluates it using several performance indices. Hardy-rectified thresholding with the universal threshold selection rule gives the best results among the thresholding methods used in the work, with an enhanced signal to noise ratio (SNR) of 10.38 and a normalized mean squared error (NMSE) of 0.1344. Machine learning methods are used to create models that simulate a pipeline leak detection system. A combined feature vector of wavelet and statistical features is used to improve the performance of the proposed system.

improve the denoised signal after transforming the thresholded wavelet coefficients back into the time domain [2]. In the last decade, researchers have made many proposals for monitoring water pipeline networks. Some of these methods are based on distributed sensor motes and a wireless sensor network that read the status of the fluid inside the pipeline and monitor the network accordingly. Other methods take statistical features of the gathered data and use them to diagnose the system status under different scenarios [3]. In this work, different features are used to construct the feature vector: some are based on statistical values and others on wavelet decomposition coefficients. The work also uses classifiers that exploit the correlation between two features to enhance the classification accuracy. The real-time signals are gathered from wireless pressure sensors installed at different locations along a water pipeline. Artificial leak valves are inserted in the pipeline to create manual leaks of different sizes, ranging from a small leak (less than 12.7 mm) through a medium leak (between 12.7 mm and 25.4 mm, i.e. 0.5'' to 1'') up to a large leak (more than 25.4 mm) [4]. The measured pressure signals are processed, and their extracted features are fed to different classifier models to diagnose the existence of a system leak. Three main classes are considered for this problem (large, medium, and small), and the classification confusion matrix is used for the classification comparison test.
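The combined feature vector and the three leak-size classes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Haar wavelet, the choice of mean, standard deviation, and range as statistical features, the detail-energy wavelet features, the function names, and the handling of the exact class boundaries are all assumptions made for illustration.

```python
import math
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail


def wavelet_energies(signal, levels):
    """Energy of the detail coefficients at each level (assumed wavelet features)."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies


def feature_vector(signal, levels=2):
    """Concatenate statistical features with wavelet detail energies."""
    stats = [
        statistics.fmean(signal),      # mean pressure
        statistics.pstdev(signal),     # population standard deviation
        max(signal) - min(signal),     # peak-to-peak range
    ]
    return stats + wavelet_energies(signal, levels)


def leak_class(diameter_mm):
    """Map a leak diameter to the three classes named in the text.

    Assumption: the boundary values 12.7 mm and 25.4 mm are counted as medium.
    """
    if diameter_mm < 12.7:
        return "small"
    if diameter_mm <= 25.4:
        return "medium"
    return "large"


# Hypothetical pressure trace, used only to exercise the functions.
example_trace = [4.0, 4.0, 2.0, 2.0, 4.0, 4.0, 2.0, 2.0]
example_features = feature_vector(example_trace)
```

Each row of a training matrix would then be one such vector paired with its `leak_class` label.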

Introduction
Many real-time signals need filtering prior to feature extraction or further processing. Wavelet analysis tools play an important role in signal processing theory, and wavelet transform coefficients can also serve as signal features for classifying and recognizing classes or objects [1]. In this work, a dataset from a water pipeline network is used to train several machine learning models. A comparison study of different classifiers is established, and several performance measures are used to validate the comparison test. Although popular wavelet thresholding methods such as soft and hard thresholding exist, researchers still seek new thresholding methods to improve wavelet-based denoising and compression algorithms. As a contribution to the state of the art, a novel thresholding method is proposed that tries to improve the denoised signal by increasing the improved signal to noise ratio (ISNR) or minimizing the mean squared error (MSE). The core of the proposed wavelet rectified thresholding method is to cut the detail wavelet coefficients in a ripple manner rather than in the popular flat manner. Based on frequency domain analysis, this new thresholding technique preserves some of the energy in the detail wavelet coefficients and hence two methods from machine learning science. The dataset for the problem is obtained from real-time measurements on a water pipeline network. Artificial leaks of different sizes are created at various locations along the pipeline. After the data are gathered from the wireless pressure sensor nodes, they are conditioned by a denoising process based on the wavelet transform.
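A wavelet denoising stage of this kind can be sketched as follows. This is a hedged illustration, not the method of the paper: it swaps in the Haar wavelet for simplicity, uses only the conventional hard and soft rules (the rectified variants are not reproduced), and assumes the common universal threshold with the noise level estimated from the median absolute value of the finest detail coefficients. All function names are hypothetical.

```python
import math
import statistics


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT: returns (approximation, details per level)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    s = 1.0 / math.sqrt(2.0)
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) * s for a, b in pairs])
        approx = [(a + b) * s for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from coarsest to finest level."""
    s = 1.0 / math.sqrt(2.0)
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) * s, (a - d) * s])
        approx = rebuilt
    return approx


def universal_threshold(details, n):
    """Assumed rule: sigma * sqrt(2 ln n), sigma = median(|finest details|) / 0.6745."""
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def hard(w, lam):
    """Keep the coefficient if it exceeds the threshold, otherwise zero it."""
    return w if abs(w) > lam else 0.0


def soft(w, lam):
    """Zero small coefficients and shrink the rest toward zero by lam."""
    return math.copysign(max(abs(w) - lam, 0.0), w)


def denoise(signal, levels=2, rule=hard):
    """Decompose, threshold the detail coefficients, and reconstruct."""
    approx, details = haar_dwt(signal, levels)
    lam = universal_threshold(details, len(signal))
    kept = [[rule(d, lam) for d in level] for level in details]
    return haar_idwt(approx, kept)
```

A different thresholding function can be tried by passing it as `rule`, which is the point where a new rule would plug into the pipeline.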
The authors propose a new Hardy-rectified thresholding; a universal threshold and five decomposition levels are used to improve the SNR at the input of the proposed classifier. Features from both domains (statistical and wavelet) are extracted and correlated, and the new correlated features are used to construct a feature matrix for training different classifier models. A comparison test is carried out between different classifier types to summarize the results obtained for about 23 classifiers with different criteria and kernels [6]. The Zigbee protocol and its transceiver modules are adopted for data transmission between the wireless sensor nodes in the network. A comparison study between different modulation types is established; the results show a sample of the signal between the transmitter and receiver of two nodes with a proper choice of modulation scheme and signal to noise ratio.

Proposed wavelet thresholding mathematical models
Denoising of signals and images can be achieved by different methods. The wavelet-based method is one of the best known and can be summarized in three main steps: first, the signal or image is decomposed by the wavelet transform into two sets of coefficients, known as approximations and details; then the detail coefficients are thresholded with a proper threshold; finally, the approximations and the thresholded detail coefficients are used to reconstruct the denoised signal or image. Many thresholding functions are used in the literature, such as soft and hard thresholding. In this work, however, a new thresholding function, known as the rectified sine thresholding function, is proposed in two models (Softy and Hardy) in order to enhance the denoising process. The models of the thresholding techniques are given below:
Hardy-rectified thresholding
Hard thresholding

Theoretical background

Support vector machine
The support vector machine (SVM) is a machine learning tool that can be used to solve problems that require minimizing a certain quantity, such as the MSE. The principal idea in SVM theory is to transform the given data into another space using a transformation kernel. In the new representation of the data, a hyperplane is used to separate the data into classes with different proposed intervals. Classification algorithms are a form of supervised learning that can create a new model from a given dataset. This predictive model can then be used to estimate the response when new test data are applied to it.
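The idea of transforming the data into another space where a hyperplane separates the classes can be shown on a toy example. The sketch below is an illustration only and does not train an SVM: the one-dimensional data, the feature map phi(x) = (x, x^2), and the hand-picked hyperplane are all assumptions chosen to keep the example small.

```python
def feature_map(x):
    """Explicit map phi(x) = (x, x^2) into a two-dimensional space."""
    return (x, x * x)


def kernel(x, y):
    """Kernel equal to the inner product phi(x) . phi(y), computed without phi."""
    return x * y + (x * x) * (y * y)


def decision(x, w=(0.0, 1.0), b=-2.5):
    """Sign of w . phi(x) + b for a hand-picked hyperplane (not a trained SVM)."""
    phi = feature_map(x)
    score = w[0] * phi[0] + w[1] * phi[1] + b
    return 1 if score > 0 else -1


# Toy 1-D data: class -1 lies between the two clusters of class +1, so no
# single threshold on x separates the classes in the original space.
samples = [-2.0, -1.0, 1.0, 2.0]
labels = [1, -1, -1, 1]

# After the map, the hyperplane x^2 = 2.5 separates the two classes.
predictions = [decision(x) for x in samples]
```

In an actual SVM the hyperplane is found by training, and the kernel function lets the optimizer work with inner products such as `kernel(x, y)` without ever forming the mapped vectors.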

Nearest neighbor
This technique is based on calculating the distance between a given data point and the closest points in the dataset to make the prediction. Many distance measures have been suggested, such as the Euclidean, Spearman, Jaccard, correlation, cosine, and Chebyshev distances. The k-nearest neighbor (KNN) method can be used to solve different kinds of machine learning problems, such as classification, clustering, and regression. It has wide application in areas such as signal processing, data mining, and database analysis.
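A minimal sketch of nearest-neighbor classification with a pluggable distance measure is given below. It covers three of the distances listed above; the majority-vote rule, the default k = 3, and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Largest absolute difference along any single coordinate."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn_predict(train, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to the query.

    train is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Switching the `distance` argument changes which neighbors count as closest, which is why the same dataset can give different accuracies under different distance measures.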

Zigbee protocol
The IEEE Std 802.15.4 standard defines the physical layer (PHY) and medium access control (MAC) sublayer specifications for low-data-rate wireless connectivity with fixed, portable, and moving devices that have no battery or very limited battery consumption requirements, typically operating in a personal operating space of 10 m. It is foreseen that, depending on the application, a longer range at a lower data rate may be an acceptable tradeoff. The IEEE 802.15.4 standard (2003) defines the device types that can be used in a low-rate wireless personal area network (LR-WPAN): the full function device (FFD) and the reduced function device (RFD). An RFD is intended for simple applications that do not need to transmit large amounts of data and need to communicate only with a particular FFD. An FFD can serve as a personal area network (PAN) coordinator, as a coordinator, or as a simple device. It can communicate with either another FFD or an RFD [5].
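The communication rules for the two device types can be captured in a small model. This is only an illustrative sketch of the rules stated above, not part of any Zigbee stack; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceType(Enum):
    FFD = "full function device"
    RFD = "reduced function device"


@dataclass(frozen=True)
class Node:
    name: str
    kind: DeviceType


def can_communicate(a, b):
    """An FFD may talk to an FFD or an RFD; an RFD may talk only to an FFD."""
    return a.kind is DeviceType.FFD or b.kind is DeviceType.FFD


def can_be_pan_coordinator(node):
    """Only an FFD can take the PAN coordinator role."""
    return node.kind is DeviceType.FFD


# Hypothetical nodes for a pipeline deployment.
coordinator = Node("coordinator", DeviceType.FFD)
sensor_a = Node("sensor_a", DeviceType.RFD)
sensor_b = Node("sensor_b", DeviceType.RFD)
```

Under these rules two pressure sensors built as RFDs cannot exchange data directly; their traffic has to pass through an FFD.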

Proposed system
In this work, a new technique for leak detection and classification is suggested; this technique combines which is divided into three parts: 60% for training the model, 30% for validating it, and 10% for testing. The confusion matrix is used for comparison between the different classifiers [7]. Referring to Table 4, the best classification accuracy is obtained with the linear discriminant model; however, the fine tree classifier has the highest prediction speed and the minimum training time. Figure 1 shows the novel wavelet thresholding methods (hardy-rectified and softy-rectified) compared with the conventional methods (hard and soft). In total, 22 machine learning models are trained with the same dataset (see Table 4), and Figures 2 through 6 illustrate the performance of the best classifier model (linear discriminant). Figure 2 shows the confusion matrix between the true and predicted classes for 100 samples of different leak sizes. Figure 3 shows the percentage confusion matrix with the positive predictive value and false discovery rate for each class. Figure 4 shows the scatter plot for the prediction model with three leak sizes (large, medium, and small), where a cross mark indicates a misclassified point. From the receiver operating characteristic (ROC) curve in Figure 5, the classifier shows a 98% true positive rate with a 2% false positive rate [8]. Table 5 compares the proposed method with different wavelet thresholding methods from recent research, such as the mixed method [9], the improved method [10], the hierarchical method [11], and the adaptive method [12]. Finally, Figure 6 shows the classifier model in parallel coordinates for the four predictors and different standard deviation ranges.
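The 60/30/10 partition and the confusion-matrix quantities mentioned above can be sketched as follows. The shuffling, the fixed seed, the class order, and the function names are assumptions for illustration; the paper does not state how the partition was drawn.

```python
import random

CLASSES = ("small", "medium", "large")


def split_dataset(samples, seed=0):
    """Shuffle, then split into 60% training, 30% validation, and 10% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 30 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def confusion_matrix(true_labels, predicted_labels, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def ppv_and_fdr(matrix):
    """Per predicted class: (positive predictive value, false discovery rate)."""
    result = []
    for j in range(len(matrix)):
        column_total = sum(row[j] for row in matrix)
        if column_total == 0:
            result.append((0.0, 0.0))   # class never predicted
        else:
            ppv = matrix[j][j] / column_total
            result.append((ppv, 1.0 - ppv))
    return result


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total
```

The positive predictive value of a class is the share of its predictions that were correct, and the false discovery rate is the complement of that share.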

Conclusions
In this work, different classifiers from the machine learning toolbox in MATLAB are used to solve the problem of leak detection and size prediction in a water pipeline network. Pressure sensors are installed along the water pipelines for data gathering prior to feature extraction. Different features have been used for the classification problem; statistical features and wavelet-based features are the most popular
Soft thresholding
Hardy-sawtooth thresholding
where: $W_j$ is the input signal to wavelet thresholding at level j, $\lambda$ is the threshold, and $Q_j$ is the output signal from wavelet thresholding at level j.
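For reference, the two conventional thresholding functions have well-known standard forms, written here in the notation of the definitions above. These are the textbook definitions of hard and soft thresholding; the expressions for the rectified variants are not recoverable from the text at this point and are not reproduced.

```latex
% Hard thresholding: keep coefficients above the threshold, zero the rest.
Q_j =
\begin{cases}
W_j, & |W_j| > \lambda \\
0,   & |W_j| \le \lambda
\end{cases}

% Soft thresholding: zero small coefficients and shrink the rest by lambda.
Q_j =
\begin{cases}
\operatorname{sgn}(W_j)\,\bigl(|W_j| - \lambda\bigr), & |W_j| > \lambda \\
0, & |W_j| \le \lambda
\end{cases}
```

Hard thresholding leaves a jump at $|W_j| = \lambda$, while soft thresholding is continuous but biases every retained coefficient by $\lambda$.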

Results and discussion
Tables 1 through 3 summarize the procedure followed in the wavelet denoising process. Table 1 compares different mother wavelets from the coiflet family; the coif2 mother wavelet with hardy-rectified thresholding performs best, with the other parameters set as follows: the universal thresholding rule and 5 decomposition levels, with two thresholding methods (hard and hardy-rectified). After its success in the first stage, coif2 is then used (as shown in Table 2) with different decomposition levels to search for the optimal level. Level 5 is the best level, owing to its highest ISNR and lowest MSE. In Table 3, different thresholding methods (soft, hard, softy-rectified, and hardy-rectified) are used with various threshold selection rules (heursure, sqtwolog, rigrsure, and minimaxi). Hardy-rectified thresholding with sqtwolog (the universal threshold selection rule) gives the best results in Table 3. Different classifiers are used with the same dataset, Figure 7. So, it can be used for wireless sensor network (WSN) deployment, especially with OQPSK at 2450 MHz, which shows excellent transceiver communication and minimum absolute error at the receiver side (see Figure 8).
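The selection criteria used above (highest ISNR, lowest MSE) can be made concrete with a short sketch. The text does not spell out the formulas, so the definitions below are commonly used ones and are assumptions here, as are the function names and the tie-breaking rule.

```python
import math


def mse(reference, estimate):
    """Mean squared error between a clean reference and an estimate."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def nmse(reference, estimate):
    """Squared error normalized by the energy of the reference (assumed definition)."""
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return error / sum(r * r for r in reference)


def isnr_db(reference, noisy, denoised):
    """Assumed definition: 10 log10(error energy before / error energy after), in dB."""
    before = sum((r - n) ** 2 for r, n in zip(reference, noisy))
    after = sum((r - d) ** 2 for r, d in zip(reference, denoised))
    return 10.0 * math.log10(before / after)


def best_setting(results):
    """Pick the setting with the highest ISNR, breaking ties on the lowest MSE.

    results maps a setting name to an (isnr, mse) pair.
    """
    return max(results, key=lambda name: (results[name][0], -results[name][1]))
```

With these definitions a positive ISNR means the denoised signal is closer to the reference than the noisy one, so a search over wavelets, levels, or threshold rules reduces to calling `best_setting` on the measured pairs.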