APPLICATION OF DATA MINING ALGORITHM TO INVESTIGATE THE EFFECT OF INTELLIGENT TRANSPORTATION SYSTEMS ON ROAD ACCIDENTS REDUCTION BY DECISION TREE

Because a large amount of data is available, this study uses data mining algorithms, in particular the decision tree, to process the data and extract information that can help increase road safety, identify the factors affecting it and reveal the patterns that lead to traffic accidents. The subject of the study is the effective use of intelligent transportation tools and their role in controlling the number of driving accidents, examined with the help of data mining algorithms. The results show that when the number of roadside assistance stations increases beyond 41, the number of driving accidents (with fatalities) is not significantly different; hence, one of the proposed strategies is to make relief stations intelligent and to coordinate them with the other intelligent transportation tools. Intelligent transportation system utilities comprise monitoring, guidance and enforcement tools, plus service tools such as rescue, driver assistance and road improvement.


Introduction
More than twenty thousand people die annually on the roads of Iran because of car accidents. This is a social health problem not only for Iran but for other countries as well. For example, more than 40,000 people die on European roads annually, and every year road accidents cost the European economy 200 million euros [1]. The cost of road accidents in Iran is unprecedented and accounts for about 7 % of the country's GDP [2][3][4][5][6].
Police presence and monitoring at accident hotspots, modifying the geometric design of these locations, using warning signs, increasing vehicle safety and using seat belts are some of the solutions that can reduce road casualties. Besides these solutions, the application of intelligent transportation systems is another method that the majority of developed countries and many developing countries follow today, and by applying these systems they have been able to increase the level of road safety [4][5][6][7][8][9].
Currently, road conditions in Iran are such that, in spite of the efforts made by the relevant organizations and bodies over the past few years, the country is still considered one of the most unsafe in terms of traffic problems. Despite the new telecommunication and communication technologies, the intelligent transportation system
and 2010, was due to overturning accidents, followed by collisions between vehicles. During these years, deaths due to vehicle overturning were approximately 59 % of the total casualties in suburban road accidents in Semnan province, more than twice the national average. Semnan province ranked first in the country in 2011 for casualties from overturning vehicles. It is worth mentioning that, according to the regulations of the country's roads and transportation organization, points or sections of road with a p number (a measure of the accident rate of a point) equal to or greater than 30 are considered high-priority accident points, for which solutions should be provided over two horizons: short term and medium term. The changes after installing the intelligent transportation systems are as follows. Currently, 29 points on the province's roads have online monitoring cameras, 40 critical and important axes have online traffic counting systems, 6 sections of the province have variable message signs (VMS), 28 points have variable speed limits (VSL) and 34 sites have speed control cameras. Site selection for installing weigh-in-motion (WIM) systems in two important sections of the province (Semnan-Sarkhe and Shahrud-Sabzevar) has been completed.
The results of the mentioned research show that, considering the installation of intelligent transportation systems together with the casualty and traffic statistics of the Shahrud-Sabzevar road, the number of casualties on this road had a significant decreasing trend from 2011 to 2013 (after the intelligent systems were installed), even though traffic increased by 12 % over the same period [1]. Another case study concerns one of the intelligent transportation systems implemented in the city of Karaj. Besides introducing the parameters that are effective in selecting the location of variable message signs, with the aim of making these tools highly efficient in distributing traffic over the city's passages, this study used ARC GIS software.
In this regard, by simulating the passages of Karaj in the mentioned software and defining the parameters involved in location selection, some points were proposed that offered favorable visibility in the passages, taking into account traffic volumes as well as the users' decisions to select other routes after passing the sign. Then, after the favorable passages in the urban network had been defined, the variable message signs (VMS) were located along these passages using visual and geometric parameters from the existing regulations, so that the sight distances and the distances needed for changing direction were reasonable [5].
Data mining has been used extensively in safety research, for example in studying the effects of variables. For this aim, a summary of these studies is presented here and discussed in the following. Finally, the decision tree is described as an effective method for analyzing road transportation accidents.

Reviewing the references
One of these studies introduces some intelligent systems that are effective in road safety and reviews several case studies on the issue. Its results show that these systems can significantly decrease the number of accidents or the time needed for assistance to arrive on roads. Video monitoring systems are the main tools of traffic management, and their advantage is that they provide video information for decision making and law enforcement. Speed control cameras play a significant role in reducing the speed of passing vehicles; this speed reduction reduces accidents and, ultimately, road casualties [3][4][5][6][7][8][9]. In this regard, the location of these tools is essential for making them more practical and optimizing their performance. In that research, after the layers that are effective in locating speed control cameras had been identified, the required information and maps were collected; the layers were then weighted using the analytic hierarchy process and overlaid in ARCGIS, and maps of appropriate zones for installing speed control cameras were extracted [10][11][12][13][14][15][16][17].
Another study, based on a descriptive-analytical method, tried to determine the correct locations for intelligent road transportation equipment on the Shahrud-Sabzevar road in Semnan province by analyzing and identifying accident-prone places; the intelligent transportation systems in Shahrud were then located by means of a GIS-based method. ARC GIS, Google Earth, RMG and MAP Source software were used to find optimal locations for the mentioned systems. The results of this research can provide a scheme for installing intelligent systems in all provinces and approaches for their optimal use [4].
The empirical Bayes (EB) method is commonly used by transportation safety analysts for different safety analyses, such as before-and-after studies and hotspot analysis. Until now, most implementations of the EB method have relied on the negative binomial (NB) model. Recent studies have shown that a finite mixture of NB models with K components (GFMNB-K) can be used to model overdispersed accident data and, overall, provides better statistical performance than the NB model [9][10][11][12][13]. A group of authors, in a case study in Semnan province, evaluated the conditions of the Shahroud-Sabzevar road before and after the installation of the intelligent systems. The study on the share of different accidents on the roads shows that the highest number

KHABIRI et al.
in 2004 to 2.10 % in 2007 [14]. In before-and-after studies of the installation of surveillance cameras in Jordan, Naqvi et al. (2018) reported that, between 2011 and 2014, speed control cameras were installed at eight locations and traffic accidents decreased by an average of 59 to 63 %, although at one location traffic accidents increased to 35 % [10][11][12][13]. A summary of related studies and their subjects is presented in Table 2.

Research method and data gathering
To analyze the data in this research, a statistical decision tree model was built in SPSS. There are different techniques for building a decision tree, based on different algorithms; selecting an algorithm appropriate to the analyzed database is essential.
different parameters, the effects of manhole covers on the movement of motorcyclists were analyzed with a decision tree; that research points out that more than 50 % of motorcyclists make a dangerous sudden change of position when they encounter these covers [9][10][11][12][13]. Table 1 compares the effectiveness of intelligent transportation systems in the country and in different areas, as proposed by Sheikh Zeinodin et al. [2].
According to the analyses by Niazi et al. (2018) [14], the reviewed references make clear that installing speed control cameras decreased accidents by 29 %; it is also reported that, in the UK, all accidents decreased by 19 % and dangerous accidents by 44 % after speed control cameras were installed as a tool of intelligent transportation systems. With the practical implementation of intelligent transportation tools in China, the death statistics went from 2.17 %

Table 1 Comparison of the effectiveness of intelligent transportation systems in road safety [2]
Tool type | Comparison of the safety performance of intelligent transportation system tools in different places

Table 2 Overall summary of studies related to the current research [3][4][5][6][7][8][9]
No. | Subject of study | Results
1 | The severity of driver injury in single-car accidents | The safety of new cars influences the driver's reaction.
2 | Analyzing the effectiveness of human factors in predicting and categorizing accident severity in Iran | Seat belt use, age and gender are human factors that influence the severity of road accidents in Iran.
3 | Defining the relation between the injury severity of accidents and driver behavior, vehicle and environment | The vehicle type variable is important and has the highest effect.
4 | Investigating the effect of vehicle weight and size on the severity of accidents | Differences in vehicle weight and axle distance increase the accidents.
5 | Analyzing the factors affecting accident severity | Speed is a very important factor in increasing the severity of injury; at high speeds, age and gender are less important than speed.
The decision tree is a supervised classification algorithm: the data are divided into training and test sets, and in this kind of algorithm better classification performance and accuracy correspond to a lower entropy index [15][16][17][18]. Presenting the resulting knowledge in the shape and structure of a tree makes it comprehensible. Each branch of the tree comprises a combination of different variables that have similar properties [15].
The optimum number of clusters in each model is calculated using the Dunn index, Equation (6). The purpose of this index is to maximize the distance between clusters and minimize the distance within clusters [15][16][17][18]:
$$D = \frac{\min_{i \neq j} d(c_i, c_j)}{\max_{j} \operatorname{diam}(c_j)} \qquad (6)$$
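As a rough illustration of how a Dunn index of this kind can be evaluated, the following Python sketch divides the smallest distance between records of different clusters by the largest cluster diameter. The Euclidean distance, the function name `dunn_index` and the sample records are assumptions made for the example; they are not taken from the study's data or software.

```python
from itertools import combinations


def dunn_index(clusters):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    def dist(x, y):
        # Euclidean distance between two records (assumed metric).
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    # Smallest distance between records that lie in different clusters.
    min_between = min(
        dist(x, y)
        for ci, cj in combinations(clusters, 2)
        for x in ci
        for y in cj
    )
    # Largest distance between two records of the same cluster (diameter).
    max_diameter = max(dist(x, y) for c in clusters for x in c for y in c)
    return min_between / max_diameter


# Hypothetical two-dimensional records grouped into two candidate clusterings.
compact = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
spread = [[(0, 0), (0, 4)], [(5, 0), (5, 4)]]
print(dunn_index(compact))
print(dunn_index(spread))
```

A larger value indicates compact, well-separated clusters, so the compact grouping scores higher than the spread one. The sketch assumes that at least one cluster holds two distinct records; otherwise the largest diameter is zero and the ratio is undefined.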
In Equation (6), $d(c_i, c_j)$ is the minimum distance between the records in branches $i$ and $j$, calculated from Equation (7):
$$d(c_i, c_j) = \min_{x \in c_i,\; y \in c_j} d(x, y) \qquad (7)$$
and $\operatorname{diam}(c_j)$ is the maximum distance between the records in branch $j$, calculated from Equation (8):
$$\operatorname{diam}(c_j) = \max_{x, y \in c_j} d(x, y) \qquad (8)$$
where $d(x, y)$ is the distance between two records $x$ and $y$ [15][16][17][18].
To evaluate the performance of the algorithm used, convergent validity is applied. This index measures how correctly the clustering algorithm performs: if a record is placed correctly, it is recorded as a "positive correct index", and if it is placed in an unrelated branch it is recorded as a "negative correct index". Finally, the convergent validity is calculated from Equation (9) [15][16][17][18], which combines the terms positive data, negative data, positive integer and negative integer, where positive data are data with a positive sign and negative data are data with a negative sign [15][16][17][18]. Equation (9) shows the degree of reliability of the data segmentation and classification. The convergent validity parameter expresses the share of validly classified data in the total data. The larger this parameter, the more reliable the decision tree algorithm. However, if a large amount of data is not properly segmented (negative data), the level of reliability is reduced.
The decision tree is one of the non-parametric methods of data classification. Tree classification in audit analysis methods is based on logistic regression. In this research, the decision tree algorithm has been used. The advantages of the decision tree are:
• Using human knowledge.
• Easy perception of training and test data.
• Easy perception of categories.
• Applicability to low-volume training data and easy problems [15][16][17][18][19].
• Ease of understanding the relationships between variables; despite its simplicity, it can work with complex data and make decisions from it.
• Ability to use a massive amount of data and a large number of variables and to work with complex data.
• It can be combined with other decision-making techniques to achieve better results.
To determine the appropriate branches when categorizing the data, a criterion is defined for calculating the impurity of each group:
$$i(N) = -\sum_{j} P(\omega_j) \log_2 P(\omega_j)$$
where $i(N)$ is the entropy impurity criterion and $P(\omega_j)$ is the fraction of samples at node $N$ that belong to group $j$.
When $i(N)$ is at its maximum, the samples are uniformly distributed over all categories, which means that every group or sub-branch has equal probability [15].
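The behavior described above, with the entropy impurity largest when all groups are equally likely, can be checked with a short Python sketch. It assumes the usual base-2 entropy of the group fractions at a node; the function name `entropy_impurity` and the sample counts are illustrative only.

```python
from math import log2


def entropy_impurity(counts):
    """Entropy impurity of a node, given the number of samples per group."""
    total = sum(counts)
    # Empty groups contribute nothing (0 * log 0 is taken as 0).
    probs = [c / total for c in counts if c > 0]
    # sum of p * log2(1/p) equals -sum of p * log2(p).
    return sum(p * log2(1 / p) for p in probs)


# Hypothetical nodes holding 100 samples spread over four groups.
print(entropy_impurity([25, 25, 25, 25]))  # uniform distribution
print(entropy_impurity([70, 10, 10, 10]))  # one dominant group
print(entropy_impurity([100, 0, 0, 0]))    # pure node
```

The uniform node gives the largest value, the pure node gives zero, and the skewed node falls in between, which is why a split that lowers this quantity is preferred.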
One of the criteria for categorization in the decision tree is variance impurity, which for the two-group case is obtained from:
$$i(N) = P(\omega_1) P(\omega_2)$$
The Gini impurity $i_j(N)$ is the generalization of variance impurity to the multi-group case and is obtained from:
$$i_j(N) = \sum_{k \neq l} P(\omega_k) P(\omega_l) = 1 - \sum_{k} P^2(\omega_k)$$
Another impurity criterion is misclassification impurity, which gives the minimum probability of misclassification and is largest when all groups are equally probable; it is obtained from:
$$mi(N) = 1 - \max_{k} P(\omega_k)$$
where $mi$ is the index of misclassification impurity. In the multi-branch case with a branching rate of $B$, which gives the number of possible branches in the classification, the decrease in impurity is normalized according to Equation (5).
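For comparison with the entropy criterion, the Gini and misclassification impurities can be sketched in Python as follows. The sketch assumes the standard textbook forms (one minus the sum of squared group fractions, and one minus the largest group fraction); the function names and counts are hypothetical.

```python
def group_fractions(counts):
    """Fraction of the node's samples that fall in each group."""
    total = sum(counts)
    return [c / total for c in counts]


def gini_impurity(counts):
    """One minus the sum of squared group fractions."""
    return 1.0 - sum(p * p for p in group_fractions(counts))


def misclassification_impurity(counts):
    """One minus the fraction of the largest group."""
    return 1.0 - max(group_fractions(counts))


# Hypothetical nodes: evenly mixed, skewed and pure.
for counts in ([50, 50], [30, 70], [100, 0]):
    print(counts, gini_impurity(counts), misclassification_impurity(counts))
```

Both measures are zero for a pure node and largest for an evenly mixed one. With two groups, the Gini form used here equals twice the product of the two group fractions, so it differs from the two-group variance impurity only by a constant factor.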

Introduction of statistical population
The statistical population is the complete collection of recorded data, or of possible measurements of a qualitative characteristic or property, for the full set of units about which inferences are to be made. This research seeks to link safety variables with the applications of intelligent transportation system tools; a summary of the statistical data, extracted from the statistical yearbook of the Road Transport Organization, is presented in Table 4 in the form of descriptive statistics produced by statistical software [20].
To check the normality of the data, the kurtosis is tested first; the corresponding histogram is drawn in Figure 1. In Table 4, the kurtosis statistics lie in the range (-2, +2), which indicates a normal distribution.
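A kurtosis screen of this kind can be sketched in Python as below. The sketch assumes the bias-corrected sample excess kurtosis that statistical packages commonly report (zero for a normal distribution) and the (-2, +2) rule of thumb mentioned above; the function names and the sample values are hypothetical and are not the study's data.

```python
def excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    g2 = m4 / m2 ** 2 - 3.0                        # uncorrected excess kurtosis
    # Small-sample correction applied to the uncorrected estimate.
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def within_rule_of_thumb(values, bound=2.0):
    """True when the excess kurtosis lies strictly inside (-bound, +bound)."""
    return -bound < excess_kurtosis(values) < bound


# Hypothetical yearly counts: one evenly spread series, one with an outlier.
even_series = [1, 2, 3, 4, 5]
outlier_series = [1, 1, 1, 1, 1, 1, 1, 1, 1, 100]
print(excess_kurtosis(even_series), within_rule_of_thumb(even_series))
print(excess_kurtosis(outlier_series), within_rule_of_thumb(outlier_series))
```

The evenly spread series passes the screen, while the series with a single extreme value does not. A kurtosis screen alone is a coarse check, so it is usually read together with the histogram.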

Decision tree model
A model is a mental, non-physical representation of the internal components and connections of a phenomenon pattern of traffic accidents.
In the Chi-square Automatic Interaction Detector (CHAID) decision tree algorithm, one node can be divided into more than two nodes, whereas the Classification and Regression Trees (CRT) algorithm is used for binary trees (that is, each node is divided into two other nodes).
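To give a feel for the chi-square test from which CHAID takes its name, the following Python sketch computes the Pearson chi-square statistic of a contingency table that crosses the categories of a candidate splitting variable with the classes of the target. It assumes a categorical (or binned) target; the function name and the two tables are hypothetical, and the conversion of the statistic to a p-value is omitted.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic of a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Rows: two categories of a candidate predictor.
# Columns: counts of "low" and "high" accident levels (hypothetical).
unrelated = [[10, 10], [20, 20]]
related = [[10, 20], [20, 10]]
print(chi_square_statistic(unrelated))
print(chi_square_statistic(related))
```

A predictor whose categories show the same target distribution yields a statistic of zero, while a predictor associated with the target yields a larger value and is therefore a stronger candidate for the split.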

Introduction of research variables
In this study, the research variables include the number of accidents (death or injury) and, as independent variables, the intelligent control and surveillance by the road police and other intelligent transportation systems (ITS) in the provinces from 2008 to 2017 (10 years of statistics). First, using the data available from the 2018 summary of management statistics of roads and transportation [20], the statistics were entered in Excel; the next step was to model the decision tree in order to obtain an appropriate graphical display of the relationship between the independent and dependent data. The input data for the research are presented in Table 3. As a decision support tool, this graphical tool for displaying classified and clustered data has the following advantages:
• Training samples are placed in the correct category as far as possible.
that shows the relationships between the data and the different variables of that phenomenon. Models are used for projecting future phenomena and future situations. In the following, two types of statistical models are used: the multiple linear model and the decision tree data mining model.
in these categories, which are the roadside assistance (emergency) stations. It is observed that, in this node, when the number of roadside assistance stations increases to more than 41 bases, the number of accidents (deaths) changes dramatically. The number of vehicles is also influential: as the number of crossings and the volume of vehicles increase by 50 %, the number of accidents (deaths) increases. Based on this graph, the importance of relief stations and of attention to the nature of their services is emphasized, especially the possibility of making these relief services intelligent so that they are present in time at traffic accidents. Another measure, which reduces traffic volume in relative terms, is widening road lanes, which reduces the severity and number of accidents in proportion to the passing traffic. In addition, it is recommended that higher priority be given to locating intelligent traffic and transportation equipment on routes with higher passing traffic. In Figure 3, the diagram of the extended decision tree model is shown with the accidents in 2015 as the dependent variable; this pattern is based on the CHAID data clustering model. An important variable affecting the data clustering in the first open node of these statistical data is the number of roadside
• Training samples are presented in such a way that unseen samples are categorized with high accuracy.
• When new training samples become available, the tree can be updated easily.
• The structure is as simple as possible.
In the decision tree, the first node is divided into two leaves with different numbers of samples and different predictions. The pattern used in this modeling is the CHAID technique. Each category, except for the leaves themselves, is divided into further categories with different numbers of samples and different predictions. These steps continue until the final nodes, called leaves, are reached; the best branch and its final result (the leaf) are selected according to the percentage importance of the independent variables. Then, constructed branches of the tree are removed, or pruned, until the stopping criteria or the desired level of complexity are reached.
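The idea of dividing a node into two groups with different predictions can be illustrated with a minimal Python sketch. Note that it uses a CRT-style binary split, which picks the threshold that minimizes the squared error around each side's mean, and not the CHAID procedure applied in this study; the function name, the predictor and the accident counts are all hypothetical.

```python
def best_threshold(x, y):
    """Return (threshold, cost) of the split x <= t with the lowest cost.

    The cost is the squared error of y around the mean of each side.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    xs = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical predictor values and accident counts for six provinces.
predictor = [10, 20, 30, 40, 50, 60]
accidents = [100, 110, 105, 300, 310, 305]
print(best_threshold(predictor, accidents))
```

Applying the same search again inside each resulting group grows the tree further; pruning then removes branches whose reduction in cost is too small to justify the added complexity.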
The first decision tree constructed from the 13 variables in this study is shown in Figure 2, based on the CHAID pattern. As can be seen, this decision tree follows a binary classification pattern and has 5 main leaves, or data groups, that describe variables
• Dependent variables, such as network length and volume of passing traffic, were of great importance in constructing the accident prediction model, which, given how self-evident this relationship is, supports the accuracy of the constructed models.
• Compared to more important variables, such as passing volume and route length, the variables related to intelligent transportation did not have a significant impact on the model construction; hence it is essential to pay attention to developing appropriate primary network infrastructures.
• The influential factor in the accident count models was the relief situation: equipping the relief stations and coordinating them with the other equipment of the intelligent transportation system can be effective in reducing the severity of accidents.

Recommendations for increasing the traffic safety
According to the research results, general approaches for increasing the effectiveness of intelligent tools in roads and road transport are provided.
assistance stations. The effectiveness of the roadside assistance parameter in this data set is such that, in the first node, three separate leaves are made for categorizing the data. The second factor in the separation and categorization of the data is the road network. Of course, the number of accidents obviously increases as the length of the route network increases, but this result confirms that the graphical model was constructed correctly.

Summary of results
Analyzing accidents on suburban roads, with the aim of identifying the parameters that increase the frequency of accidents, can support the decisions of those involved in improving road safety and reducing road accidents. The purpose of this study was holistic; in other words, minor indicators such as speed were not included. Based on general data published over the last ten years on the status of the road network and intelligent transportation system tools, and on statistical modeling techniques, clustering models were used. In summary, the main results of this study include:
of telecommunication and internal electronic systems of the cars used in the country, such as car-to-car and car-to-infrastructure systems, needs more attention.
Finally, to complete the present study, several directions for future research are suggested:
• Providing an appropriate data mining framework for implementing suitable models that determine the influence of effective parameters on the performance of intelligent transportation tools.
• Using other data mining and neural network models, such as the CART and K-MEANS methods and fuzzy neural networks, to calculate the effectiveness of intelligent transportation tools on the severity and number of events.
• Despite the widespread use of smart devices in road transport, there are still many shortcomings; for example, informing rescue forces about accidents in time, within the golden time of rescue, can significantly decrease the number of accidents and casualties and the severity of injuries.
• Decentralization and proportional distribution of intelligent transportation system infrastructures between the provinces and accident-prone places with high transit volume, together with the related tools, are essential for increasing the efficiency of this system.
• Completing the primary infrastructures and then equipping them with intelligent transportation is an obligation, but along with it, the development