A COMPARISON OF MACHINE LEARNING-BASED INDIVIDUAL MOBILITY CLASSIFICATION MODELS DEVELOPED ON SENSOR READINGS FROM LOOSELY ATTACHED SMARTPHONES

General mobility estimation is required for the development and operation of strategies, policies, systems and services in transport, urban development and telecommunications. This study proposes a privacy-preserving method for collecting individual motion readings from loosely attached smartphones. The method is novel in relying solely on the inertial sensors of commercial-grade smartphones for data collection across a wide population, without the need for new infrastructure or attachment devices. It is shown that statistical learning-based models classifying individual mobility by means of transport are capable of overcoming the variance introduced by the proposed data collection method. The success of the proposed methodology is demonstrated in a small-scale experiment in which an Individual Mobility Classification Model is developed using selected statistical learning methods.

a discipline addressing discovery, identification, exploration and exploitation of common patterns in location (i.e. context-related) dynamics. The term localisation will comprise a set of methods for determination of location relation classes. This may include identification of the mobility patterns (classes of location-related behaviour). Formal description of human mobility has been addressed from biological and medical [7], to mathematical [8] perspectives.
A number of studies addressed the exploitation of position and location for location intelligence, increasingly utilising smartphone sensor readings and records of telecommunication activity as data sources. The authors of the now landmark manuscript [9] defined the statistical model of human mobility in the physical (material) world. Hausmann outlined the mathematical framework for studying human mobility [10]. Research has shown that position awareness is not a necessary requirement for (human) mobility classification. Gustafsson established the framework and demonstrated the approach to position estimation through sensor data fusion for requirements-defined classes of Location-Based Services [5]. A generalised framework for human mobility classification based on various sets of motion activity observations is defined in [3], while methodologies for position determination using wearable sensors were surveyed in [11]. A smartphone sensor-based position estimation method for constrained indoor environments is presented in [12]. The selected machine learning methods performance in recognition of individual human motion detection were

Introduction
Mobility may be seen as the level of one's ability to move physically, easily and without restrictions within a given framework of individual and collective transportation infrastructure that involves different means (modes) of transport as well as walking. Mobility estimation is a mathematical process of evidence-based estimation of the level of mobility within given spatio-temporal constraints. It is considered a result of the integration of individual mobility observations. Modern information and communications systems provide the foundation for provision of information and services in relation to user whereabouts. The mobile telecommunications Location-Based Services (LBS) started to exploit this opportunity [1]. The trend has since expanded to other disciplines, including Intelligent Transport Systems, general human activity recognition [2][3], mobile health, medical diagnostics and convalescence [4]. The provision of information and services relies upon determination of the user's position [5][6], followed by sub-setting the contextual information to the estimated position of the user served. The authors of [1] proposed the now accepted LBS model that facilitates the position-location duality, through recognition of position as a place of existence determined in the physical world, and location as a place of existence in the world of context (the world of information). The model clarified the need for context recognition, as an alternative to position determination and as a contribution to the establishment of location intelligence. Location intelligence is understood as integration with context. Palaghias reviewed and deployed machine learning methods for interpretation of human behaviour that depend on a complex set of various on-body or external infrastructure-related sensors [26].
Authors of [27] presented a method for Origin-Destination Matrices (ODMs) estimation, as an indicator of general mobility, from the mobile communications activity records, without utilisation of the smartphone inertial sensors and without consideration of individual mobility, for privacy reasons.
The studies surveyed here rely upon the physical attachment of a motion sensor to the individual's body using a dedicated additional attaching device. Such an assemblage should be operated in a prescribed manner by trained users. Observations are then utilised for mobility classification as inputs to model development methods, selected according to the statistical properties of the observations [28]. The complexity of deployment reduces the prospects of wide application across a more representative population sample and a broader range of usage scenarios. Dependence on position determination renders many methods inconsistent and unusable without high-quality position estimation.
This study aimed at recognition and classification of the means of travel based on observations from smartphone sensors, utilising machine learning-based classification model development methods for experimental data-based individual mobility classification [28]. The means of travel are defined as the manner in which an individual using a smartphone roams within the given transportation infrastructure (including pedestrian areas), utilising either a transport device or simply walking. This research hypothesises that: (i) the novel method for individual mobility data collection provides observations of quality and resolution sufficient for the mobility classification task, and (ii) methods for individual mobility classification model development exist that, utilising observations from the proposed data collection method, yield models of sufficient performance. The objectives of this study are as follows: (i) development of the motion data collection method based on loosely attached smartphone inertial sensors (accelerometers, gravity sensors, magnetometers, gyroscopes) with embedded reference frames alignment, to serve the mobility classification (estimation) model development, (ii) selection of the mobility classification methods that will exploit the potential of the loosely attached smartphone sensor observations, and (iii) comparison of the mobility estimation and classification models developed using the presented methodology, to provide the best practice for mobility estimation studies of different scales. This research does not aim at development of the overall mobility model. It proposes the methodology for collection of a representative massive data set and the statistical learning-based methods for development of the individual mobility classification model.
The results have the potential of utilisation in Location-Based Services (LBS), urban and transport strategic planning, location intelligence, medical diagnostics and conditions' observation, (urban) surveyed in [2]. The group mobility has been addressed as well in a number of studies, as outlined in [13].
Smartphones come equipped with a set of high-precision motion sensors and have introduced them into everyday use in modern society. Disciplines ranging from location intelligence and telecommunications to medicine benefit from the application of data science methods to massive collections of position- and location-related observations. The smartphone opportunistic sensing approach, which integrates physical sensors with the context, is on the rise [14]. Ahmed and Song developed and demonstrated a Bayesian method for human motion classification based on smartphone accelerometers, which includes a process for learning new classes of motion unknown to the system [15]. The performance of four machine learning methods in identifying the human motion class, based on bespoke inertial sensors set on human limbs (i.e. not smartphone ones), was examined in [16], with the k-Nearest Neighbour method presenting the best performance for the set-up. The authors of [17] gave a general overview of the machine learning methods for classification of human motion using smartphone-based sensors, along with recommendations on the design of experimental studies. Sama demonstrated an approach to human activity recognition using observations from smartphone sensors [18], while the application of feature selection methods for human activity recognition using bespoke wearable sensors is discussed in [19]. Smartphone sensor data were associated with driver environment descriptors to accomplish machine learning-based driver behaviour profiling in [20]. A method for seamless rover tracking based on internet activity data collection is demonstrated in [21]. The benefits of context awareness for indoor localisation were examined in [22].
Smartphone-based opportunistic sensing has so far failed to expand beyond devices physically attached to the body and beyond the infrastructure-related context (availability and quality of network communications signals, or a number of infrastructure-based sensors such as CCTV). The majority of applications rely on satellite navigation, at least partially, which limits the approach to spaces where GNSS signals are available in the required quality.
Numerous studies consider smartphone sensor readings in medical applications and services, especially as a diagnostic tool in neurology, orthopaedics and sports medicine. Haines et al. addressed global health observations using smartphone sensor readings and modern internet-based communication technology [23], while a case of mobility characterisation using wearable sensors in healthy, elderly and stroke patients was examined in [24]. The authors of [25] assessed mobility sensors, including those in smartphones, as sources of diagnostic and status information, requiring physical attachment to the patient's body for the alignment of the reference systems.
Machine learning methods have been used for navigation and mobility modelling scenarios. Still, the traditional approaches failed to loosen requirements on smartphone sensor deployment and to consider a wider acceptable in classification modelling scenarios, since the model development methods anticipated the existence of variance in observations and addressed them in the model development process. However, the measurement methodology must ensure that the reference frames (coordinate systems) of all the components of the system comply with each other [31]. The mobility estimation problem concerns a measurement environment that comprises the components depicted in Figure 1, as follows: (i) measuring unit (smartphone equipped with inertial motion sensors), (ii) measurement object (mobile individual), (iii) transport device (if a mobile individual is assisted in his or her mobility by train, tram, car, bicycle or some other device), (iv) stationary mobility infrastructure (road, pavement).
Each component of the mobility estimation environment utilises its own spatial reference frame (co-ordinate system) $K_l$, $l = 1, \dots, 4$, respectively. Measurements taken by the measuring unit may be considered in relation to the other components of the mobility estimation environment only if at least one of the following presumptions is fulfilled. P1: the spatial reference frames of the measuring unit ($K_1$), measurement object ($K_2$), transport device ($K_3$) and mobility infrastructure ($K_4$) correspond (are equal). P2: a set of transformations $f_{i,j}$, $i = 1, \dots, 4$, $j = 1, \dots, 4$, $i \neq j$, exists that transform positions and measurements taken in one spatial reference frame into another.
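The P2 presumption can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each transformation between two reference frames is a rotation about the vertical axis; the angles, the function names and the sample acceleration vector are hypothetical and are not taken from the study.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation matrix about the vertical (z) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """Product of two 3x3 matrices (composition of two transformations)."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def mat_vec(m, v):
    """Apply a 3x3 transformation to a 3-component measurement."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]

def transpose(m):
    """Transpose; for a rotation matrix this is also its inverse."""
    return [[m[c][r] for c in range(3)] for r in range(3)]

# Hypothetical misalignment angles between frames (illustrative values only).
f_12 = rotation_z(math.radians(10.0))    # measuring unit -> individual
f_23 = rotation_z(math.radians(-25.0))   # individual -> transport device

# Chaining transformations gives measuring unit -> transport device,
# and the inverse maps the measurement back again.
f_13 = mat_mul(f_23, f_12)
f_31 = transpose(f_13)

accel_phone = [1.0, 0.0, 9.81]           # hypothetical reading, m/s^2
accel_vehicle = mat_vec(f_13, accel_phone)
accel_back = mat_vec(f_31, accel_vehicle)
```

In this simplified picture, P1 corresponds to all transformations being the identity, while the loose attachment considered in this study corresponds to small, unknown misalignment angles.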
The P1 presumption is addressed with a tailored physical fitting of a mobile device to the body of the measurement object. The attachment ensures that the sensors in the mobile unit measure the same motion as that of the individual. The need for additional, often costly, infrastructure (fitting device and method) and for user training in its operation emerges as a drawback of this approach, preventing its wider use and affecting how well a targeted population is represented. Here, a novel method for motion readings collection, using the inexpensive and accurate inertial sensors in widespread and frequently used smartphones, is proposed. The proposal results from the statistical operational mobility management, Intelligent Transport Systems and emergency relief.
The manuscript is structured as follows. Section 1 (this section) states the problem, surveys the research state-of-the-art, and outlines the aims and objectives of the research presented in the manuscript. Section 2 formalises the problem of reference frames alignment, proposes the novel method for loosely fit smartphone data collection, and describes the experimental smartphone motion data collected using the proposed method in Krakow, Poland, and Zagreb, Croatia. The description includes the exploratory statistical analysis. Finally, a selection of statistical learning methods for the individual mobility classification model development is introduced, based on the statistical properties of the experimental motion data. Section 3 outlines the individual mobility classification models developed using the selected statistical learning methods and experimental data, and examines their performance to decide on the most suitable individual mobility classification model. The concluding Section 4 summarises the study's results and contributions, identifies benefits and shortcomings, and outlines subjects of future research.

Method and material
Inertial sensors measure motion properties very accurately at a low cost of measurement [6]. The smartphone inertial sensors allow for low-cost, high-precision (i.e. consistently repeatable) observations of the motion variables (linear acceleration, direction of movement etc.). The high accuracy is accomplished by simple calibration procedures, already conducted by users without the need for further education, or with sensor information fusion processes [29]. The accuracy levels differ slightly, depending on the actual smartphone realisation [29][30]. The slight differences in the motion sensor accuracy are considered A comprehensive overview of the methods involved, their characteristics, ranges of applications and details of implementation in the open-source R framework for statistical computing [33] may be found elsewhere [28,34,35].
The classification statistical learning methods for the model development were deployed using a resampling approach [28,35] to mitigate randomness (variance) in the data split and in the methods themselves. The cross-validation-based approach was deemed necessary after significant differences in model performance were found, resulting from the particular division of the original data set into training and test sets. Dedicated software was developed in the open-source R environment for statistical computing, using the R libraries caret [35] and forecast [36] and their dependencies. The software integrated statistical learning methods for model development with those for assessing the performance of the developed models. Model performance was examined through performance metrics comprising the Confusion Matrix, Classification Accuracy and the Kappa parameter [28,35].
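The performance metrics named above can be sketched in a few lines. The example below is written in Python rather than in the R/caret software used in the study, and the six labelled observations are invented for illustration; it shows how Classification Accuracy and Cohen's Kappa follow from a confusion matrix.

```python
def confusion_matrix(actual, predicted, labels):
    """Count matrix with rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy_and_kappa(matrix):
    """Classification accuracy and Cohen's Kappa from a square count matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Invented toy labels (not data from the experiment).
labels = ["walk", "bus", "tram"]
actual = ["walk", "walk", "bus", "bus", "tram", "tram"]
predicted = ["walk", "walk", "bus", "tram", "tram", "tram"]

matrix = confusion_matrix(actual, predicted, labels)
accuracy, kappa = accuracy_and_kappa(matrix)
```

Kappa discounts the agreement that would be expected by chance from the class frequencies alone, which makes it more informative than accuracy for an unbalanced data set.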
The smartphone inertial sensor readings were collected in the identified application usage pose from smartphone inertial sensors, as follows: analysis of inertial sensors readings and common practices in smartphone utilisation.
Modern smartphone application usage often requires the end-user's attention and machine-to-individual communication through an application interface. The common practice of smartphone application usage results in a common pose in which the smartphone is loosely fixed to the individual. A loose (approximate) compliance between the mobile measurement device (inertial sensor in a smartphone) and the measurement object (an end-user) is thus accomplished without the need for physical fitting or training of the end-user. The proposed method assures at least a loose compliance between the spatial reference frames of the measured object (an individual using a smartphone) and of a transport device, considering that an end-user would use his or her smartphone while standing or being seated in a transport device (car, taxi, train, tram etc.).
A simple algorithm is developed to identify the common pose of smartphone usage, which involves comparison of the rate of change of the inertial sensor readings. Inertial sensor readings are recorded while the user interacts with the smartphone in the characteristic pose. The proposed method opens prospects for widespread anonymised individual mobility data collection using commercial-grade smartphones without additional equipment or operational training.
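The text does not give the details of the pose identification algorithm, so the following Python sketch is only one plausible reading of "comparison of the rate of change of the inertial sensor readings". The choice of a three-axis gravity-like signal, the sampling interval, the threshold `max_rate` and both sample windows are assumptions made for illustration.

```python
def rate_of_change(samples, dt):
    """Mean absolute per-axis change per second between successive samples."""
    rates = []
    for prev, curr in zip(samples, samples[1:]):
        change = sum(abs(c - p) for p, c in zip(prev, curr))
        rates.append(change / (len(curr) * dt))
    return sum(rates) / len(rates)

def in_usage_pose(samples, dt, max_rate):
    """Treat a window as the usage pose when readings change slowly enough.

    max_rate is a hypothetical threshold, not a value from the study.
    """
    return rate_of_change(samples, dt) <= max_rate

# Hypothetical windows of three-axis readings sampled every 0.02 s.
steady_window = [(0.1, 4.9, 8.5), (0.1, 5.0, 8.4), (0.2, 4.9, 8.5)]
swinging_window = [(0.1, 4.9, 8.5), (3.0, 1.0, 9.0), (-2.0, 6.0, 5.0)]

steady = in_usage_pose(steady_window, dt=0.02, max_rate=10.0)
swinging = in_usage_pose(swinging_window, dt=0.02, max_rate=10.0)
```

A window that passes the check would be kept for model development, and a window that fails it (for example a phone swinging in a hand) would be discarded.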
The proposed motion data collection method was demonstrated in the solution of the individual mobility classification problem. The individual motion smartphone data sets were collected by team members volunteering to use the proposed loose-fit smartphone method in a small-scale experiment. The experiment was intended to serve as a proof-of-principle and as motivation for a wider data collection for mobility estimation in targeted regions in the future. The volunteering data collectors followed the proposed methodology and behaved like ordinary smartphone users, frequently utilising various applications. A dedicated smartphone application and the data post-processing identified scenarios in which attention was given to interactions with smartphones, and recorded the inertial sensor readings for the individual mobility classification model development. The proposed methodology was used in three common cases of urban mobility, with overlapping nature of descriptors: (i) walking (walk), (ii) travelling by bus (bus), (iii) travelling by tram (tram). The cases selected in this research are to be expanded in the future to further means (modes) of transport. Overlapping statistical characteristics arise inevitably from the similarity of movement in urban traffic, especially during the rush hours, thus reflecting the variance added to the original data.
The problem of the added variance in data was addressed by utilisation of supervised machine learning-based methods for classification model development, selected for their nominal capacity in addressing data with statistical properties of the same nature as those collected with our experiment [28,32] of the process modelled. That fact leads to selection of model development methods capable of encompassing the complexity of variances, including those generated by the loosely attached smartphone sensor data collection method [32]. The original data set was split into training (80%) and test (20%) sets in a randomised manner, utilising the common statistical learning practice based on the Pareto An intentionally unbalanced set of readings was collected (Figure 2), to examine the robustness of the statistical learning classification methods in the specific scenario under consideration. Overlapping of the statistical distribution density functions of predictors is observed, and a wide range of statistical properties of predictors is evident from the exploratory data analysis. The variety of individual predictors variances results in the complexity between the reference frames) and (ii) introduction of complex parameter-overlapping classification scenarios. It was intended to tackle the problems with the dedicated statistical and statistical learning methods for model development, as outlined in Section 2. Despite the differences in statistical properties observed in data, model development proceeded without deployment of any data preparation activity (transforms, feature reduction, or normalisation). Introduction of the resampling procedure was the only modification of the original method implementation, used to mitigate randomness introduced by the methods themselves and variance inherent in the data. The caret R package allowed for targeted model development and assessment of model performance.
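The randomised 80/20 split and the resampling mentioned above can be sketched as follows. This is a Python illustration under stated assumptions, not the R/caret implementation used in the study: the split is a plain (non-stratified) shuffle, the number of folds is arbitrary, and the row identifiers are placeholders.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Randomised split of the rows into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n_rows, k, seed=0):
    """Partition row indices 0..n_rows-1 into k folds for cross-validation."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

# Placeholder row identifiers standing in for sensor observations.
rows = list(range(100))
train_rows, test_rows = train_test_split(rows, train_fraction=0.8, seed=1)

# Hypothetical choice of 5 folds over the training rows.
folds = k_fold_indices(len(train_rows), k=5, seed=1)
```

Each fold serves once as the validation set while the remaining folds are used for fitting, so that the reported performance depends less on one particular random split.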
The resampling procedure was optimised using the Kappa parameter [35]. Statistical analysis of model performance parameters is presented in Figure 5 and box-plots of the same performance parameters in Figure 6.
Two model development strategies were examined for their difference in classification approach, and the results of their models are addressed and presented here in more detail. The Support Vector Machine (SVM) approach is recognised widely as a very robust method for the principle [28]. Cross-validation was utilised to compensate for the randomised original data split and for the optimised model development, as another common statistical learning practice [32].

Research results
This research aimed at developing a method for massive data collection in different scenarios across the population, and at justifying the utilisation of statistical learning methods in mitigating the additional variance imposed by the process. The experimental data set collected exhibits a wide variety of statistical properties, as evident in the estimates of experimental density functions depicted in Figure 3. The predictors (features) in the experimental data set were mostly uncorrelated, as evident from the correlation matrix in Figure 4.
The proposed loose-fitted smartphone procedure is more contextually oriented, compared with the traditional approach of tight physical fitting. As a result, the experimental results (readings) encompassed a larger variance, due to: (i) loose fitting (approximate equality Table 2. The RF Mobility Classification Model reached a better accuracy of more than 97% and improved the Kappa parameter value. Its confusion matrix contains just four instances of bus-tram misclassification, apparently in very similar dynamical conditions. Walk is identified correctly. Table 3 outlines the overall statistics (model performance parameters) for the two competing models exhibiting the best performance.
The z-statistic shows the accuracy difference between the SVM and the RF models to be statistically significant, with p = 0.02811 (α = 0.05). classification model development. With a range of kernel functions, the SVM allows for fine-tuning of the models developed and yields well-behaving models [28]. The Random Forest (RF) approach easily tackles large-variance data, encompassing the extended variance well. Additionally, the RF models are not prone to overfitting [28]. Both approaches were exploited, with the resampling procedure deployed in both cases.
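The text does not state which z-test produced the reported significance of the SVM and RF accuracy difference. As a hedged illustration, the Python sketch below uses a pooled two-proportion z-test with a two-sided p-value; the counts of correct classifications are hypothetical and are not the study's results.

```python
import math

def two_proportion_z_test(correct_a, correct_b, n):
    """Pooled two-proportion z-test for two accuracies on n test cases each.

    Returns the z statistic and the two-sided p-value.
    """
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (2.0 / n))
    z = (p_a - p_b) / standard_error
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - cdf)
    return z, p_value

# Hypothetical counts: model A correct on 190 of 200, model B on 176 of 200.
z, p_value = two_proportion_z_test(correct_a=190, correct_b=176, n=200)
significant = p_value < 0.05
```

This test treats the two accuracies as independent samples. When both models are scored on the same test set, a paired test such as McNemar's may be the more appropriate choice.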
The Support Vector Machine with the Radial Kernel Function (SVM) model performed as depicted in Table 1. While providing good accuracy and a fair Kappa parameter, the SVM model struggles to recognise the means of transport in low-level dynamics conditions, as seen from the confusion matrix. The SVM model failed in recognition of tram travel and even walk, during the low-speed and was demonstrated. In comparison of the various classification model approaches examined, it was found that the Random Forest Individual Mobility Classification Model is the best performer, with the development and performance assessment conducted with the tailored software developed in the R environment for statistical computing. The RF Individual Mobility Classification Model returned the 95% confidence interval of 0.0305 ± 0.0290 for the N = 135 testing sub-set of the sparse and unbalanced original data. The methodology proposed and the models developed here form a foundation for expanding the crowdsourcing efforts in experimental data collection for individual mobility classification, based on a widespread set of scenarios of smartphone usage while travelling. General nation-wide, regional, or city-wide mobility estimation will emerge from information fusion of the individual mobility classification models. The intention is to pursue this research by assembling a database of loosely attached smartphone sensor observations covering different mobility scenarios and a wider range of individual mobility means (transportation modes), and by examining the potential of refined statistical learning-based classification model development methods.
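The reported interval of 0.0305 ± 0.0290 for N = 135 is numerically consistent with a normal-approximation interval for a classification error rate of 0.0305. Assuming that this is how the interval was obtained (the text does not say), the half-width can be reproduced with the short Python sketch below; the function name is illustrative.

```python
import math

def error_rate_interval(error_rate, n, z=1.96):
    """Normal-approximation 95% interval for an error rate on n test cases."""
    half_width = z * math.sqrt(error_rate * (1.0 - error_rate) / n)
    return error_rate - half_width, error_rate + half_width, half_width

# Values quoted in the text: error rate 0.0305 on the N = 135 test sub-set.
low, high, half_width = error_rate_interval(0.0305, 135)
print(f"{0.0305:.4f} +/- {half_width:.4f}")
```

With such a small test set the half-width is almost as large as the estimate itself, which is consistent with the stated need for a wider data collection.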

Conclusion
The problem of general mobility estimation is addressed here with the introduction of an inexpensive method for individual mobility data collection that exploits the widespread use of commercial-grade smartphones. With the loose attachment of the smartphone to the user in a common pose of smartphone usage, the correspondence of all the reference frames involved was accomplished. This accomplishment allows for utilisation of the smartphone inertial sensor readings for individual mobility classification. The use of inertial sensor readings establishes a foundation for anonymised mobility data collection, thus preserving the privacy of individual users. Additionally, the proposed methodology does not rely upon high-precision absolute position determination, often unavailable in a number of high-population scenarios (indoors, city centres) where the individual mobility assessment is particularly needed.
The acceptable extent of additional variance in data resulting from the loose-attachment approach was shown and the effectiveness of the statistical learning approach in the Individual Mobility Classification Model development under the circumstances of the enlarged variance in data