MULTISPECTRAL SATELLITE IMAGERY CLASSIFICATION USING A FUZZY DECISION TREE



Introduction
A reliable land cover classification is the key data for a variety of remote sensing applications, such as natural resources prospecting, environmental management and geospatial planning [1]. The CORINE Land Cover (CLC) implementation over the whole territory is a mandatory requirement for all countries of the European Union. In practice, land cover classification accuracy rarely achieves the commonly recommended 85% target [2].
In remote sensing image classification, useful information is determined by the physical fields of the landscape. Different classes have different spectral reflectance or self-radiance. Multispectral imagery, and hyperspectral imagery in particular, provides more accurate registration of class spectra, which improves the accuracy of classification [3]. Well-known land cover classification methods are based on interpretation of multidimensional imagery by class identification features [4].
Modern satellite imagery processing includes different methods for supervised and unsupervised land cover classification. However, known algorithms developed during the 1960s-1980s on the basis of the statistical or object-oriented paradigm do not provide the required classification accuracy without additional tweaks [5]. Such algorithms include the maximum likelihood (ML) classifier, the k nearest neighbors (kNN) classifier, the support vector machine (SVM), and object-based image analysis (OBIA) [6].
The basic approach to classification of multispectral imagery describes the optical signal in every pixel as a discrete function E(λ), λ = 1, ..., m, in a multidimensional spectral space Λ = { λ }. The average classification accuracy of m-dimensional radiometric fields increases asymptotically as the number of spectral samples m grows [7].
An important problem is extrapolating limited a priori knowledge of the scene to a detailed pixel-level classification. Usually this is done by assigning a learning sample to each class. Classification theory requires that classes be fundamentally discriminable in feature space, i.e., from a mathematical point of view, the probability density distribution should be a separable mixture of the probability densities of the classes [8].
Known algorithms used in multispectral imagery classification evaluate some quantitative similarity of the current pixel with each of the defined classes. For classical statistical algorithms it is the class probability or the likelihood ratio; for others it is a more or less reasonable similarity measure [9].
Land surface remote imagery contains a significant fundamental stochasticity caused by variations in the composition and material of the objects observed, by temporal changes in imaging conditions and the radiation transfer environment, and by sensor noise [10]. Therefore, a crisp data analysis for remote imagery classification is not always reasonable under such stochasticity. In some cases, better results can be obtained by fuzzy-logic processing.
This paper is structured as follows. Section 2 describes a fuzzy decision tree model implemented in remote sensing imagery classification. Section 3 presents two processing modes of the algorithm, for multispectral (without dimensionality reduction) and hyperspectral imagery classification with the aforementioned model. Section 4 discusses preliminary results obtained from test

Sergey Stankevich - Vitaly Levashenko - Elena Zaitseva *
A land cover classification system is very important nowadays for various remote sensing applications and many sectors of the economy. Therefore, development of algorithms for multi- and hyperspectral imagery classification is an urgent task. In this paper we present a new efficient algorithm for multi- and hyperspectral imagery classification based on a fuzzy decision tree approach. The spectral bands of multispectral imagery are used as fuzzy data source attributes, and the cumulative mutual information between them and the resulting fuzzy classification serves as the decision tree induction criterion. The proposed algorithm ensures good classification accuracy.
Keywords: Remote sensing, multispectral imagery classification, fuzzy decision trees, classification accuracy, spectral band selection.
where $\mu_k(B_j)$ is the k-th value of the membership function of the fuzzy attribute $B_j$, and $w$ is the number of possible values in the support set of attribute $B_j$.
For the spectral band attributes of multispectral imagery, the support set is simply the gradation range of the registered signal in each band, usually uniform over the entire image. Criterion (1) provides a recurrent selection of the spectral band that contains the most information about the target classes. To avoid an exhaustive search over all spectral bands, growth of the tree should be limited by pruning low-information branches.
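The recurrent band selection described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that joint fuzzy memberships are formed with the product t-norm, that information is measured as Shannon entropy of normalized fuzzy cardinalities, that mutual information follows the standard identity I(C; B) = H(C) + H(B) - H(C, B), and that the cost is the number of bands selected so far. The function names `fuzzy_entropy`, `cumulative_mutual_information` and `rank_bands` are hypothetical, and the sketch builds a single ranked sequence rather than a branching tree.

```python
import numpy as np

def fuzzy_entropy(table):
    """Shannon entropy (bits) of a table of fuzzy cardinalities."""
    p = table / table.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def cumulative_mutual_information(classes, bands):
    """I(C; B_1..B_m) = H(C) + H(B_1..B_m) - H(C, B_1..B_m).

    classes: (N, n) class memberships of N training pixels.
    bands:   list of (N, w) band memberships of the same pixels.
    Joint memberships use the product t-norm (an assumption of this sketch).
    """
    n_pixels = classes.shape[0]
    joint_b = np.ones((n_pixels, 1))
    for b in bands:
        # Per-pixel outer product, flattened: membership in every
        # combination of values of the bands considered so far.
        joint_b = (joint_b[:, :, None] * b[:, None, :]).reshape(n_pixels, -1)
    h_c = fuzzy_entropy(classes.sum(axis=0))
    h_b = fuzzy_entropy(joint_b.sum(axis=0))
    h_cb = fuzzy_entropy(classes.T @ joint_b)
    return h_c + h_b - h_cb

def rank_bands(classes, band_memberships, max_bands):
    """Greedily pick bands maximizing information per selected band."""
    selected = []
    remaining = list(range(len(band_memberships)))
    while remaining and len(selected) < max_bands:
        def score(j):
            chosen = [band_memberships[k] for k in selected + [j]]
            return cumulative_mutual_information(classes, chosen) / len(chosen)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the joint table grows as w^m, a sketch like this is practical only for short band sequences, which is consistent with the need to prune low-information branches.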

Algorithm
Practical implementation of the proposed model in classification of remote sensing images can be performed in two ways. For a small number of spectral bands (the multispectral case), all of them are quite informative, so only band ranking is performed. For a sufficiently large number of spectral bands (the hyperspectral case), a fully featured decision tree is built with branch pruning. A decision tree node is transformed into a leaf node if the relative frequency of any alternative solution exceeds some a priori specified level β:
where $I(C, B_1, \ldots, B_m)$ is determined by (3) and
An important stage of the algorithm is to bring the source multispectral crisp data into the fuzzy form required for FDT operations. In remote sensing, Gaussian distribution functions are usually used for this purpose [16], but in our research histogram-based membership functions in spectral bands were formed. The distributions obtained are smoothed by a sliding Gaussian window. Pixel fuzzification was carried out within a Gaussian-weighted two-dimensional window. This incorporates some positive properties of OBIA into the image classification result [17].
The computation takes into account the distribution of possible states of the input fuzzy attributes with different membership degrees. A certain weighting factor, associated with the tree branches leading to the current leaf node, is assigned to each spectral band. Similarly, the output image classification contains membership function values for all allowable classes. To obtain a crisp (hard) classification, the class with the maximum value of the output membership function is selected.
processing of actual multi- and hyperspectral satellite imagery. Section 5 presents conclusions drawn from this research.
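A histogram-based membership function of the kind mentioned above might be built as in the following sketch. The details are assumptions made for illustration only: one histogram per class is taken from the training pixels of a single band, smoothed with a one-dimensional Gaussian window of width `sigma`, and scaled so that its peak equals 1. The names `gaussian_kernel` and `histogram_membership` are hypothetical, and the two-dimensional Gaussian-weighted pixel window is not covered here.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian window truncated at three sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def histogram_membership(train_values, train_labels, n_classes, n_levels, sigma=2.0):
    """Return mu[i, k]: membership of gray level k of one band in class i.

    train_values: integer gray levels (0 .. n_levels - 1) of training pixels.
    train_labels: class index of each training pixel.
    n_levels is assumed to be larger than the kernel length.
    """
    kernel = gaussian_kernel(sigma)
    mu = np.zeros((n_classes, n_levels))
    for i in range(n_classes):
        hist = np.bincount(train_values[train_labels == i],
                           minlength=n_levels).astype(float)
        smooth = np.convolve(hist, kernel, mode="same")
        peak = smooth.max()
        if peak > 0:
            mu[i] = smooth / peak  # scale so the best-supported level has membership 1
    return mu
```

Fuzzifying a whole band then reduces to a table lookup, for example `mu[:, band]` for an integer-valued image array `band`.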
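The step from fuzzy output to crisp classification can be sketched as follows. This is an illustration under assumptions, not the published procedure: each pixel is taken to reach every leaf with some weight, each leaf stores a class membership vector, and the output membership is the weight-averaged leaf content. The function name `classify_pixels` and both array layouts are hypothetical.

```python
import numpy as np

def classify_pixels(path_weights, leaf_class_memberships):
    """Combine leaf outputs into fuzzy and crisp classifications.

    path_weights:           (N, L) degree to which each pixel reaches each leaf.
    leaf_class_memberships: (L, n) class memberships stored in each leaf.
    Returns (fuzzy, crisp): (N, n) normalized memberships and (N,) class indices.
    """
    weights = np.asarray(path_weights, dtype=float)
    leaves = np.asarray(leaf_class_memberships, dtype=float)
    fuzzy = weights @ leaves
    totals = fuzzy.sum(axis=1, keepdims=True)
    # Normalize per pixel; pixels that reach no leaf keep all-zero memberships.
    fuzzy = np.divide(fuzzy, totals, out=np.zeros_like(fuzzy), where=totals > 0)
    crisp = fuzzy.argmax(axis=1)  # hard label: class with maximum membership
    return fuzzy, crisp
```

Keeping the fuzzy array alongside the hard labels preserves a per-pixel confidence that can be inspected where classes are poorly separated.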

Model
A fuzzy logic approach is quite suitable for multispectral imagery classification. Many fuzzy classification algorithms are commonly used in remote sensing, including fuzzy c-means (FCM), fuzzy k nearest neighbors (FNN), semi-supervised fuzzy cluster labeling (SFCL), and the object-oriented fuzzy classifier (OOFC) [11]. However, all these algorithms are generalizations of the corresponding crisp statistical algorithms and have inherited their drawbacks in hyperspectral imagery classification. The major drawbacks include equalization of any quantitative estimations on high-dimensional data and computational instability.
In most cases of hyperspectral data processing, optimal selection of spectral bands is used. This reduces the strong information redundancy of hyperspectral imagery and enhances the overall informativity for the actual remote sensing application [12]. A fuzzy decision tree (FDT) is thus a well-suited tool for hyperspectral data classification with simultaneous dimensionality reduction [13]. In terms of the FDT, the fuzzy data source attributes are the spectral bands of a multispectral image, and an image-classifying FDT is built using an information-theoretic approach. So far, a simple unordered FDT has been implemented in remote imagery classification. In this case, the choice of the next fuzzy attribute depends on the results of estimating the previous attributes [14, 15].
The next spectral band $B_{j+1}$ is selected for each branch of the FDT so as to provide the maximum amount of information about the target fuzzy classification $C = \{C_i\}$, $i = 1, \ldots, n$, at minimal cost, using the already known sequence of estimates of the previous spectral bands $B_j$, $j = 1, \ldots, m$. The amount of such information $I(C; B_1, \ldots, B_m)$ is the cumulative mutual information between the resulting fuzzy attribute and the source ones. The number of spectral bands already selected can be used as the cost. The band selection criterion then takes the following form:

$$\frac{I(C; B_1, \ldots, B_m)}{m} \to \max \qquad (1)$$

Cumulative mutual information in (1) is calculated according to the rule [15]:

$$I(C; B_1, \ldots, B_m) = \ldots\; I(B_1, \ldots, B_m) - I(C, B_1, \ldots, B_m), \qquad (2)$$

where $I(C, B_1$ ...

Fig. 1a). Supervised classification was performed using the FDT. A conventional crisp classification using the ML method was also carried out as a reference. The outputs are shown in Fig. 1b and 1c.
A full-featured processing (a limited FDT with branch pruning) was done over the EO-1/Hyperion hyperspectral 220-band satellite image fragment (Kiev suburb, Ukraine, March 17, 2012, see Fig. 3a). In the source full-band hyperspectral image, 78 of 220 spectral bands with a better signal-to-noise ratio were preselected for further analysis. A fuzzy decision tree was formed in the first stage of the algorithm, as shown in Fig. 2.

Fig. 2 Fuzzy decision tree for a test hyperspectral satellite image classification
In Fig. 2, decision nodes with the codes of the spectral bands being evaluated are presented as rectangles, and leaf nodes with class numbers are presented as ovals. Clearly, a different decision tree will be induced for another hyperspectral image (HSI) or for a different composition of classes.
The result of the FDT supervised classification is given in Fig. 3c. A spectral angle mapper (SAM) classification was used as a reference (see Fig. 3b) because the generic ML classifier is inoperable over high-dimensional data due to singularity of the covariance matrices.
Analysis of the classification results leads us to conclude that the FDT algorithm provides a better classification in general. In the Sich-2/MSU multispectral image, arable lands are classified more accurately, and sand and concrete are discriminated better, but the water surface is identified with less confidence. Although ground-based validation of the test images was not carried out, visual assessment using high-resolution imagery gives a rough estimate of classification accuracy of 75% for the ML algorithm and 80% for the FDT one. In the SAM-based classification of the EO-1/Hyperion hyperspectral test image, a substantial percentage of the land cover was incorrectly classified as water. Correct classification was not observed in the vicinity of areas with high spectral reflectance (snow). Artificial cover was falsely detected among deep woodlands, etc. Classification accuracy was estimated as 65% for the SAM algorithm and as 75% for the FDT algorithm.
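For readers unfamiliar with the reference method, a minimal spectral angle mapper can be sketched as below. It assigns each pixel to the class whose reference spectrum makes the smallest angle with the pixel spectrum. The array layouts and the function name `spectral_angle_mapper` are assumptions of this sketch, and no angle threshold for leaving pixels unclassified is applied.

```python
import numpy as np

def spectral_angle_mapper(image, references):
    """Classify pixels by the smallest spectral angle to a reference spectrum.

    image:      (H, W, m) reflectance cube with m spectral bands.
    references: (n, m) one reference spectrum per class.
    Returns (labels, angles): (H, W) class indices and the matching angles in radians.
    """
    pixels = np.asarray(image, dtype=float).reshape(-1, image.shape[-1])
    refs = np.asarray(references, dtype=float)
    dots = pixels @ refs.T
    norms = np.linalg.norm(pixels, axis=1, keepdims=True) * np.linalg.norm(refs, axis=1)
    # Guard against zero-length spectra, then clip rounding errors before arccos.
    cosines = np.clip(dots / np.maximum(norms, 1e-12), -1.0, 1.0)
    angles = np.arccos(cosines)
    shape = image.shape[:2]
    return angles.argmin(axis=1).reshape(shape), angles.min(axis=1).reshape(shape)
```

Since the angle depends only on the direction of the spectral vector, the result is insensitive to uniform brightness scaling, and no covariance matrix has to be inverted, which is why the method remains usable with many bands.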

Results and discussion
The developed algorithm for remote sensing imagery classification using the FDT was applied to test land cover classification on real satellite images. All data processing procedures were coded and debugged with the Free Pascal compiler. All source DN data bands of the satellite images, calibrated through values of at-sensor radiance, were converted into land surface spectral reflectance to avoid the influence of solar irradiance and the atmosphere. The first type of processing (ranking bands only) was applied to the Sich-2/MSU multispectral 3-band satellite image fragment. The results obtained are preliminarily supported by experiments in test processing and analysis of actual multi- and hyperspectral satellite imagery.
The problem of achieving the required performance of the developed algorithms and demonstration software modules has not been solved yet. This shows up in difficulties with processing large images. Moreover, the positive results obtained so far must be validated through in situ observations.
We look forward to continuing our joint research on improving models for FDT-based remote sensing image classification, in particular by implementing more sophisticated decision strategies and introducing new application-oriented informativity metrics into the algorithms.
In addition, the number of spectral bands selected for processing was reduced from 78 to 20, which should significantly cut computation costs without any degradation in classification accuracy. The decrease in classification accuracy in comparison with the multispectral image can most likely be explained by the low spatial resolution of the hyperspectral image. No serious misclassification effect intrinsically inherent in traditional pixel-based classification was found in the FDT-classified images.

Conclusions
A new algorithm for remote sensing multi- and hyperspectral imagery classification based on a fuzzy decision tree approach has been developed. In some cases, this algorithm provides higher classification accuracy than traditional ones, by approximately 5-10%. In addition, a significant reduction of data dimensionality (more than 3.5-fold) for hyperspectral imagery processing is ensured by selecting informative spectral bands through branch pruning of the decision tree. A proper and productive reduction in data dimensionality is very important for large-scale high-performance practical analysis of remote sensing hyperspectral imagery.