USING DATA ON BIKE-SHARING SYSTEM USER STOPOVERS IN SMART TOURISM: A CASE STUDY

Bike-sharing systems are an important element in the development of smart cities, and datasets from these systems are one way to obtain large amounts of information on bicycle traffic. These usually contain data on the origin and destination of each trip, as well as its time and duration. Alongside the basic data, some operators also provide information on the exact route taken by each user. This allows researchers to study stopovers, which may serve as a source of interesting information on human behaviour in public spaces and, as a consequence, help improve the analysis and design of such spaces. However, using the raw data may lead to serious errors because most stops occur in the vicinity of bike stations or are related to traffic problems, as evidenced by the case study of Cracow. The data filtering method proposed below opens up the possibility of using such datasets for further research on bike user behaviour and public spaces.

Keywords: bicycle traffic, bike-sharing, transport geography, stopover behaviours

Thanks to bike-sharing datasets, it is now possible to model bicycle traffic more accurately and to examine the individual elements that shape the demand for such services. Over the last few years, a number of studies have focused on analysing the factors that affect traffic within city bike systems. These studies usually aim to identify potential locations for new stations and to estimate traffic flows and bike use; they include social and demographic variables, as well as data on spatial organization.
In contrast, this article analyses datasets from a single city bike system to look at the issue of stopovers; no previous study of this kind was identified in the literature review. Bike-sharing systems are an interesting source of data on human behaviour in public spaces, which can be useful for spatial analysis and design, including the development of bike infrastructure. The analysis is focused on a case study of Cracow. Once the dataset obtained from the city bike system had been cleansed, user stopovers were visualized to create a map of local stop concentrations, which largely overlapped with rental stations. Analysing the data in this form would be ineffective. A method of filtering data on city bike user stopovers therefore had to be developed, which is the main objective of this article: to help understand stopover behaviour in the bike-sharing system and to obtain more precise data for delimiting attractive public spaces in the city. The visualization of the final data sample demonstrated the effectiveness of the method and enabled further analysis, which may prove useful for urban planners and sociologists, allowing them to better understand cyclists' behaviour and user preferences regarding the areas in which their stopovers occur.

Introduction
The advent of the bicycle forever changed the way we travel and increased our freedom. The growth of bike traffic was only halted by the rise of the automotive industry; however, because of their many advantages, bicycles are once again becoming popular as a means of daily commuting. The vision of "smart cities" rests on the central idea that investment in social capital, technology and infrastructure should fuel the growth of the city and continually improve the quality of life of its residents. One of the pillars of the "smart" philosophy is to provide modern and integrated transportation, a pivotal element of which is the development of bike-sharing systems. These have been growing in importance in recent years, not only in the largest metropolises, but also in medium-sized towns and even the smallest municipalities [1]. A better understanding of the users of bike-sharing systems can help in smart tourism management in urban areas. Cities around the world are increasingly recognizing the smart tourism city concept and related strategies as a means of optimizing sustainable environments. Particularly for cities facing emerging issues of residents' negative perceptions of tourism, the smart tourism city concept empowers a city to rise to this challenge by creating urban spaces that residents and visitors can enjoy together [2].
Apart from becoming a convenient means of last-mile transportation, city bikes have become a valuable source of information on trips taken by hundreds of thousands of cyclists, a data pool impossible to collect with classical traffic measurement methods. Until not so long ago, exact figures on bike demand were extremely difficult to come by because of insufficient data; today, however, we observe a rapid rise in the availability of relevant statistics. The new, exhaustive datasets that have come to complement classical methods rely on automated meters or publicly available GPS traces [8]. One way to obtain large quantities of data on bike traffic today is to analyse records from short-term bike-sharing systems [9]. This data source meets the definition criteria for big datasets and has enormous potential for the study of urban dynamics and aggregate human behaviour [10]. Recently, bicycle sharing has become increasingly popular around the world. As a potential travel mode for both "first mile" and "last mile" transportation, bicycle sharing is usually used at the beginning or end of the trip chain; it plays an important role in bridging the gaps among existing transportation networks and is useful for recreation and tourism-related activities [11].
The Global Positioning System (GPS) has been gaining importance for travel surveys since the 1990s. While it is successfully used to collect accurate information about travelled routes and travel times, little is known about extracting additional information such as transport modes and trip purposes [12]. City bike systems store multiple records of trips by thousands of cyclists, who, by using the service, consent to the collection of their GPS traces, along with data on each individual trip. This makes it possible to improve the accuracy of bike traffic modelling and to study the factors that shape the demand for city bike services.
Compared to other public travel modes, bicycle-sharing travel data usually include the starting or ending point of the original trip chains, which makes them more useful for land-use analysis [11]. Over the last few years, a number of studies have been devoted to analysing the factors that affect traffic within city bike systems. These studies usually aim to identify potential locations for new stations and to estimate traffic flows and bike use; they include social and demographic variables, data on spatial organization (such as population and job density), as well as topological and meteorological parameters for all the proposed spots [13][14][15]. Data from city bike systems have also been used to study systems already in place [16], e.g. in Turin, Italy, to estimate passenger transport flows for the development of transport models in urban contexts [17]. Such studies look at how bike trips are affected by factors such as the number of retail stores and business offices near bike stations, various demographic features, job density, the type and purpose of buildings in the area, spatial organization, meteorological conditions, the closest bike infrastructure, or the type of available transport solutions [13][18][19][20][21][22][23][24][25][26]. The spatio-temporal usage patterns of dockless bike-sharing services linked to metro stations have been analysed, e.g. in Shanghai, China [27].
The paper is organised as follows. In the next section, the background to city bike systems, the data collected and their usage are introduced. In section 3, a generic method to identify and filter the stopovers in mobility traces is introduced and applied to the city bike data. In section 4, the method is illustrated using the example of Cracow. Finally, in section 5, the results are synthesized and the potential applications and limitations of the proposed method are discussed.

Literature review
With the acceleration of urbanization, a number of cities face the challenge of designing and developing a better city to live in [2]. The concept of the smart city is to optimize infrastructure in order to ensure the quality of citizens' lives: transportation, water and power supply, waste management, IT connectivity, efficient urban mobility, e-governance and citizen participation [3]. Cycling has strategic importance for the sustainable development of cities and has become one of the fundamental parts of urban mobility strategies. The use of bikes as a transportation solution in urban and/or tourist contexts is universally recognized as positive due to the lack of polluting emissions, the reduction of traffic congestion and the improvement of users' health [4]. Transportation, along with accommodation, gastronomy, attractions and ancillary services, is one of the essential components of smart tourism [2]. There are six main tourist-related elements of smart cities: smart mobility, smart government, smart economy, smart people, smart living and smart environment [5]. The presence of an effective bike-sharing service, as part of smart mobility, can indeed make a city more attractive and easier to visit, strongly motivating tourists to choose it as a holiday destination. Bike trips can also become an integral part of the tourist experience, even when implemented to connect specific points of interest to the city centre [4].
The growth in bike traffic has attracted the attention of researchers and spawned multiple studies of its various aspects. Bicycle traffic has long been appreciated for its health benefits. However, it has important advantages not only for health and physical fitness, but also for the natural environment. Choosing the bike as a means of transportation may be dictated by various individual and environmental factors, including age and gender [6]. Because of its seasonal nature, randomness of choice and dependence on weather conditions, however, bike traffic is far from easy to forecast and model. Reliable studies on the subject are also difficult to carry out. The most extensive data on pedestrian and bicycle traffic usually come from Complex Traffic Studies, conducted on a local and regional level in individual households. However, the most popular way to collect bike traffic data continues to consist of counting cyclists in the field, despite its rather limited value for understanding human transport behaviour [7].

The final sample was subsequently studied to analyse the remaining stopovers. A stopover is a set of consecutive trip segments characterized by zero covered distance, where $TS_{idle}$ is the set of all the segments within the trip that have zero travel distance, $TS_k$ is the k-th trip segment of the journey track, and $N_{TS}$ is the total number of trip segments. The first step was to identify the locations where users started their trips, which did not always overlap with the bike rental stations, since, in Cracow, the operator allowed users to leave the bike anywhere within the city for an extra charge. The corresponding heatmap, however, shows that almost all the trips began at bike stations.
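The stopover definition given above can be illustrated with a short sketch. It is not the authors' code: the `Segment` and `Stopover` records, the `find_stopovers` function and the optional tolerance `eps_m` are hypothetical names introduced here, and the sketch assumes that each trip has already been split into segments with a known duration and covered distance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start_s: float     # segment start, seconds from the beginning of the trip
    duration_s: float  # time spent on the segment
    distance_m: float  # distance covered on the segment


@dataclass
class Stopover:
    start_s: float      # start time of the first idle segment in the run
    duration_s: float   # total duration of the run
    first_segment: int  # index k of the first idle segment
    n_segments: int     # number of consecutive idle segments in the run


def _close_run(segments: List[Segment], first: int, end: int) -> Stopover:
    """Merge the idle segments segments[first:end] into one stopover."""
    run = segments[first:end]
    return Stopover(run[0].start_s, sum(s.duration_s for s in run), first, len(run))


def find_stopovers(segments: List[Segment], eps_m: float = 0.0) -> List[Stopover]:
    """Group consecutive zero-distance segments into stopovers.

    eps_m is an assumed tolerance (not part of the definition in the text)
    that lets GPS jitter below eps_m metres count as zero distance.
    """
    stopovers: List[Stopover] = []
    run_start = None
    for k, seg in enumerate(segments):
        idle = seg.distance_m <= eps_m
        if idle and run_start is None:
            run_start = k  # a new run of idle segments begins
        elif not idle and run_start is not None:
            stopovers.append(_close_run(segments, run_start, k))
            run_start = None
    if run_start is not None:  # the trip ended while the bike was idle
        stopovers.append(_close_run(segments, run_start, len(segments)))
    return stopovers
```

With the default `eps_m = 0.0` the function follows the strict zero-distance definition; a small positive tolerance may be more practical for noisy GPS fixes, but that is a design choice outside the text.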
First, the number of stops within a radius of 1 to 20 m from the trip origin was studied. The percentage drops in the number of stopovers in the vicinity of the origin site were examined to identify the point of stability, beyond which the number could be described by a linear function with a coefficient of determination $R^2$ of approximately 0.99. The data thus filtered then underwent another round of filtering.
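One possible way to automate the search for the point of stability is sketched below. It is only an interpretation of the procedure described above: for each candidate radius, a straight line is fitted to the counts of stops beyond that radius and all larger radii, and the smallest radius whose fit reaches the required $R^2$ is returned. The function names, the 0.99 default threshold and the `min_points` parameter are assumptions, and the numbers in the usage note are synthetic, not data from the study.

```python
from typing import Optional, Sequence, Tuple


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination of a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        return 1.0  # constant counts are described perfectly by a flat line
    if sxx == 0:
        return 0.0  # a single distinct radius cannot define a line
    # For a simple linear fit with intercept, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


def stability_radius(
    radii: Sequence[float],
    counts_beyond: Sequence[float],
    threshold: float = 0.99,
    min_points: int = 3,
) -> Optional[Tuple[float, float]]:
    """Return (radius, R^2) for the smallest radius from which the counts are linear.

    counts_beyond[i] is the number of stops farther than radii[i] from the origin.
    Returns None when no tail of at least min_points values reaches the threshold.
    """
    for start in range(len(radii) - min_points + 1):
        r2 = r_squared(radii[start:], counts_beyond[start:])
        if r2 >= threshold:
            return radii[start], r2
    return None
```

For the synthetic series `radii = [1, ..., 10]` and `counts = [1000, 600, 400, 300, 290, 280, 270, 260, 250, 240]`, the steep initial drop is skipped and the function reports the radius of 4 as the point from which the decline is linear.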
The second stage served to eliminate stops related to traffic problems. In order to include stopovers by cyclists who used bicycle lanes and sidewalks, as well as those cycling on public roads, it was decided to focus on areas around pedestrian crossings and traffic lights. This stage relied on vector data from OpenStreetMap, which included information on their geographical coordinates. The analysis focused on stopovers that occurred within a radius of 5 to 50 m from a pedestrian crossing. Once again, percentage drops were studied to identify the point of stability, but manual corrections had to be introduced at this stage. A number of control points at which long waiting lines of bicycles had been identified at pedestrian crossings were examined, and the radius was adjusted accordingly so as to discard data from such spots. The same approach was adopted at the third stage for railway crossings.
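The spatial part of this stage can be sketched as a simple distance filter. The example assumes that stopovers and crossings are available as (latitude, longitude) pairs in decimal degrees and uses the haversine great-circle distance; the function names are hypothetical, and the text does not state how distances were computed in the study.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in metres."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def drop_stops_near(
    stopovers: Sequence[Point],
    obstacles: Sequence[Point],
    radius_m: float,
) -> List[Point]:
    """Keep only stopovers farther than radius_m from every obstacle.

    obstacles would be, e.g., pedestrian or railway crossings taken from
    OpenStreetMap. This is a brute-force comparison of every pair.
    """
    return [
        stop
        for stop in stopovers
        if all(haversine_m(stop, obs) > radius_m for obs in obstacles)
    ]
```

The pairwise comparison is easy to follow but scales with the product of both list lengths; for city-wide datasets a spatial index would likely be preferable.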
The last step focused on stopover duration. It was decided to discard very short stops, which could be related e.g. to brief stops at uncontrolled intersections of neighbourhood roads that did not have a pedestrian crossing (and which, for this reason, had not been eliminated at the second filtering stage). A fixed time threshold was calibrated at several known locations where cyclists typically stopped and then adjusted to effectively filter out the traffic-related stopovers. The data sample obtained after the last filtering stage was also visualized as a heatmap.
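The duration stage reduces to a threshold on stopover length. The sketch below assumes durations in seconds and uses a 30-second default, which is the cut-off reported for the Cracow case study; the function name is hypothetical, and the threshold would have to be recalibrated for any other city.

```python
from typing import List, Optional, Sequence, Tuple


def filter_by_duration(
    durations_s: Sequence[float],
    min_duration_s: float = 30.0,
) -> Tuple[List[float], Optional[float]]:
    """Keep stopovers lasting longer than min_duration_s.

    Returns the retained durations together with their mean, or None as
    the mean when nothing remains after filtering.
    """
    kept = [d for d in durations_s if d > min_duration_s]
    mean = sum(kept) / len(kept) if kept else None
    return kept, mean
```

Reporting the mean duration alongside the filtered sample makes it easy to track how each stage shifts the sample towards longer stops.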
In addition, the filtering effectiveness factor F was checked after every stage of filtering. It is described by the formula:

$$F_i = \frac{x_n}{x_i} \cdot 100\%$$

where $x_i$ is the number of stopovers after the i-th stage of the filtering procedure and $x_n$ is the total number of stopovers obtained after the final stage of filtering.

Using data from the bike-sharing system in Beijing, China, Wang et al. applied the geographically weighted regression model to carry out a spatiotemporal characteristic analysis of the relationship between bike-sharing usage in railway-station service areas and its determinants, including the passenger flow in stations, land use, bus lines and road-network characteristics [28]. However, the literature review has not identified any in-depth analyses of city bike user stopovers aimed at identifying and delimiting attractive public spaces. Attractive spaces have usually been identified on the basis of arbitrary expert knowledge. Other traditional methods rely on data from accommodation providers and guest surveys, which are time-consuming and expensive. Data can also be taken from official guides, but these fail to adapt to the rapidly changing tastes of tourists and their actual preferences [29]. Today, everyone leaves digital footprints on the internet, which can be used as data essential to smart tourism. The literature contains works that aimed to identify tourist hotspots based on user activity on recommendation websites and social media such as Flickr or Twitter [29][30][31]. A similar idea is followed in this paper, which explores how bike-sharing system data may help reveal the spatial patterns of tourist cities. Mobility traces from public bikes have been broadly employed in transport and spatial analyses, yet, so far, have not been used to identify tourist hotspots [32].
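As a minimal sketch of the filtering effectiveness factor F described in this section, the function below takes the number of stopovers recorded after each stage and relates the final count to each of them. The function name is hypothetical, and the factor is returned as a fraction rather than a percentage.

```python
from typing import List, Sequence


def filtering_effectiveness(stopovers_per_stage: Sequence[int]) -> List[float]:
    """Return F for every stage: final stopover count divided by the stage count.

    stopovers_per_stage lists the number of stopovers in the raw sample and
    after each successive filtering stage; the last entry is the final sample.
    """
    if not stopovers_per_stage:
        raise ValueError("at least one stage count is required")
    if any(x <= 0 for x in stopovers_per_stage):
        raise ValueError("stage counts must be positive")
    final = stopovers_per_stage[-1]
    return [final / x for x in stopovers_per_stage]
```

Using only the two counts reported in the case study, 54,143 stopovers in the cleansed sample and 5,791 after the last stage, `filtering_effectiveness([54143, 5791])` gives about 0.107 for the raw sample, in line with the 11% quoted in the results, and 1.0 for the final sample.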

Methodology
This article presents a data filtering method which allows the dataset to be trimmed down so that it only includes user stopovers that were not related to traffic problems or to the technical activities involved in locking and unlocking the bike at the rental station or checking its technical condition before the trip. The dataset obtained after the filtering process can be used for further analysis of attractive public spaces.
The dataset first needs to be cleansed in order to discard invalid data, for instance records where location tracking failed or where the bike was rented only to be returned moments later due to, e.g., a technical glitch. The datasets from the GPS transmitters are stored in GPX files (GPX is an XML schema designed as a common GPS data format). To read the data in these GPX files, a Python script was developed that presents them as a set of trip parameters: the trip identification number, the number of segments, trip duration, idle times, total distance and mean velocity. Accordingly, the dataset obtained from the city bike system was first cleansed of corrupt records related to signal failures in the GPS transmitters. The trip data were eliminated if the GPS outage lasted at least five minutes. After the first filtering stage, the sample continued to contain many trips with an average speed of 0 km/h. Therefore, it was also decided to eliminate all the trips with a duration or distance of 0, most likely related to situations where a bike was unlocked, but not taken out of the stand, and then locked again, e.g. because of a technical problem. At the last stage, the records of very short trips, which could correspond to situations where a technical problem was discovered soon after the bike was rented or the user decided not to continue with the trip, were removed [33].
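The cleansing steps above can be sketched with the Python standard library. This is not the authors' script. It assumes GPX 1.1 files with timestamped track points, treats a GPS outage as a gap of at least 300 seconds between consecutive fixes, counts a segment as idle when both fixes have identical coordinates, and borrows the 50 m short-trip limit from the case study; all function and key names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Dict, List, Tuple

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}  # GPX 1.1 namespace
Fix = Tuple[datetime, float, float]  # (time, latitude, longitude)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(math.sqrt(h))


def parse_trackpoints(gpx_text: str) -> List[Fix]:
    """Read all timestamped track points from a GPX 1.1 document."""
    root = ET.fromstring(gpx_text)
    fixes: List[Fix] = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        time_el = pt.find("gpx:time", GPX_NS)
        if time_el is None or not time_el.text:
            continue  # points without a timestamp cannot form a segment
        stamp = datetime.fromisoformat(time_el.text.strip().replace("Z", "+00:00"))
        fixes.append((stamp, float(pt.get("lat")), float(pt.get("lon"))))
    fixes.sort(key=lambda fix: fix[0])
    return fixes


def trip_parameters(fixes: List[Fix]) -> Dict[str, float]:
    """Summarise a trip: segments, duration, idle time, distance, mean speed."""
    distance = idle = longest_gap = 0.0
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = (t1 - t0).total_seconds()
        d = haversine_m(la0, lo0, la1, lo1)
        distance += d
        longest_gap = max(longest_gap, dt)
        if d == 0.0:  # assumed idle criterion: identical coordinates
            idle += dt
    duration = (fixes[-1][0] - fixes[0][0]).total_seconds() if len(fixes) > 1 else 0.0
    return {
        "n_segments": max(len(fixes) - 1, 0),
        "duration_s": duration,
        "idle_s": idle,
        "distance_m": distance,
        "mean_kmh": distance / duration * 3.6 if duration > 0 else 0.0,
        "longest_gap_s": longest_gap,
    }


def keep_trip(params: Dict[str, float], max_gap_s: float = 300.0, min_distance_m: float = 50.0) -> bool:
    """Apply the three cleansing stages to one trip summary."""
    if params["longest_gap_s"] >= max_gap_s:  # stage 1: GPS outage
        return False
    if params["duration_s"] == 0 or params["distance_m"] == 0:  # stage 2: empty trips
        return False
    return params["distance_m"] >= min_distance_m  # stage 3: very short trips
```

A trip would then be processed as `keep_trip(trip_parameters(parse_trackpoints(text)))`, with `text` holding the contents of one GPX file. Real transmitters may report small coordinate jitter while the bike is stationary, so the identical-coordinates test is a simplification.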

Case study: stopovers in the bike-sharing system of Cracow
The filtering effectiveness factor F relates the number of stopovers after the i-th stage of the filtering procedure, $x_i$, to the total number of stopovers obtained after the final stage of filtering, $x_n$, and allows the numerical assessment of the results obtained at the corresponding stage of filtering. To sum up, the adopted data filtering methodology makes it possible to obtain, from the raw dataset, information about those stops of city bike users which:
- were not related to renting a bike at a station and checking its technical condition,
- were not related to road traffic obstructions, e.g. in the vicinity of pedestrian crossings, railway crossings and traffic lights,
- were not short stops related to difficulties in general traffic.

The filtering method was tested using the example of Cracow, in a case study based on datasets obtained from the local bike-sharing system, the first of its kind in Poland. Known as "Wavelo", the system was established in 2008 and, in different guises, continued in place until 2019. The records in question covered one week of the high tourist season in 2017, i.e. the period between 31 May and 7 June. According to data provided by the Institute of Meteorology and Water Management, weather conditions at the time were favourable for bike traffic and recreation (Table 1). The sample represented the total population of city bike users in Cracow over the analysed period and was judged sufficient for analysis, containing a total of 34,969 tracks. The data on routes were obtained from the GPS transmitters attached to every Wavelo bike; apart from the origin and destination of each trip, the devices also recorded its itinerary.
The data on each trip were presented as a list of points, with specific locations (geographical latitude and longitude) and readout times.
The first step was to cleanse the dataset provided by the city bike system. First, all the data corrupted by GPS signal failures were discarded, followed by trips with a duration or distance of 0. The third stage, which involved filtering out short trips, identified 421 trips with a distance shorter than 50 m. Once these were eliminated, the final sample consisted of 27,927 routes. The number of stopovers in the cleansed sample was 54,143, with a mean duration of 79.17 seconds. A heatmap in Figure 1 shows the stopovers included in the cleansed dataset before filtering.
The map indicates that most stopovers were concentrated around the city bike stations, i.e. the origin and destination of each trip, which are shown in Figure 2.
In accordance with the adopted methodology, the first filtering stage discarded all stopovers in the vicinity of the trip origin. The stopovers within a radius of 20 m, at intervals of 1 m, were considered (Table 2).
As evidenced by the table, the drop in the number of stopovers beyond radius r stabilizes at a distance of 7 m. The number of stops beyond radii of 1 to 20 m can be described by a linear function with $R^2 \approx 0.94$; for radii of 7 to 20 m, the coefficient is even higher and equals $R^2 \approx 0.99$. A histogram showing the number of stops beyond radius r (from 1 to 20 m) is presented in Figure 3.

Figure 3 The number of stops at a radius greater than r from the trip origin

After the first filtering stage, the sample included all the stopovers that occurred at a distance of more than 7 m from the trip origin. The second step studied stops in the vicinity of pedestrian crossings, analysing those within a radius of 50 m, at 5 m intervals (Table 3).
The table indicates that the drop in the number of stopovers beyond radius r stabilized at 15 m. The number of stops beyond radii of 5 to 50 m could be described by a linear function with $R^2 \approx 0.96$; for radii of 15 to 50 m, however, the coefficient was even higher and equalled $R^2 \approx 0.99$. A histogram showing the number of stops beyond radius r (from 5 to 50 m) is presented in Figure 4.
However, an analysis performed at selected control points, i.e. pedestrian crossings situated along main bike traffic corridors, showed that the 15 m radius was not sufficient to eliminate all the stops related to the presence of pedestrian crossings, since cyclists tended to cluster in long lines at such spots.
To discard all the stops related to lines of cyclists waiting at pedestrian crossings, a 30 m cut-off radius had to be adopted for the purposes of the study. After the second filtering stage, the sample thus included all the stopovers that occurred at a distance of more than 7 m from the trip origin and 30 m from pedestrian crossings. The third step was to discard all the data from the vicinity of railway crossings. To do so, the stopovers that occurred within a radius of 50 m, at 5 m intervals, were studied (Table 4).
In this case, the number of stopovers beyond radius r remained nearly constant at distances of 30, 35, 40, 45 and 50 m, which made it easy to determine the upper distance limit. This 30 m cut-off radius was also confirmed at control points, where the lines in which cyclists tended to cluster did not extend beyond it. A histogram showing the number of stops beyond radius r is presented in Figure 5.
After the third filtering stage, the sample thus contained stopovers that occurred at a distance of more than 7 m from the trip origin, 30 m from a pedestrian crossing and 30 m from a railway crossing. An analysis performed at control points, however, revealed that the data still included some traffic-related stops, e.g. those on uncontrolled neighbourhood roads without a pedestrian crossing (which had not been eliminated at the previous filtering stages). The fourth stage thus focused on stopover duration. A span of 30 seconds was confirmed at selected control points as a reliable cut-off point for stopovers unrelated to traffic. A table was then drawn up to illustrate the number of stopovers and their duration (Table 5).
A histogram showing stopovers longer than the minimum duration T is presented in Figure 6.
The fourth stage was the last in the filtering process. The final sample contained all the stopovers that occurred at a distance of more than 7 m from the trip origin, 30 m from a pedestrian crossing and 30 m from a railway crossing, and that lasted more than 30 seconds. The number of stopovers in the filtered sample was 5,791, with a mean duration of a little over 6 minutes; only 25% of all the stops, however, were longer than 5 minutes 35 seconds. Table 6 shows the basic sample statistics obtained after each filtering stage. The largest drop in the number of stopovers was recorded after the first step, which involved eliminating all those within a radius of 7 m from the trip origin. Further decreases were less steep, but a clear relationship could still be observed between the increase in the mean duration and the lower number of stopovers in the sample.
The stopovers that remained after the filtering process were visualized in the form of a heatmap shown in Figure 7.
Analysis of the obtained data shows that city bike user stopovers are concentrated in the areas indicated by guidebooks as the most attractive in Cracow, i.e. in the Old Town area, around the Main Market Square, the Old Jewish District of Kazimierz and the Vistula Boulevards. A visual analysis of the heatmap shows that the areas of Blonia Common Green, the canoeing track and the monastery in Tyniec are also attractive for bike-sharing system users.
At the end of each filtering stage, the filtering effectiveness factor F was checked (Figure 8). It shows that only 11% of stopovers in the raw sample were not connected with the rental stations or traffic. The most visible difference is observed after the first stage, when 60% of stopovers were connected with recreation or services (not with rental or traffic). After the second stage the figure was 92% and, after the whole procedure, none of the remaining stopovers were related to rental or traffic.

Conclusion
The article contributes to a better understanding and use of data on city bike travel. Research thus far has focused mainly on the number and spatial structure of such trips, looking into the various factors that affect the use of city bikes. In contrast, this study draws attention to the possibility of employing data on user stopovers for the purposes of designing bike infrastructure; it could also help to improve the understanding of urban sociology, determine the patterns of urban mobility and identify the sites that are attractive for city residents and tourists. This, in turn, may help municipal decision-makers to assess the potential of various public spaces and plan urban development accordingly.
The proposed method has made it possible to use a new kind of data source to delimit attractive public spaces. The analysed example of Cracow showed that a large proportion of stopovers occurred in places widely recognized as tourist attractions, including the Vistula Boulevards and the Main Market Square. The data, however, would not have been useful without the proposed data filtering method, which discarded all the traffic-related stopovers from the sample. Using the raw data sample could have led to wrong conclusions, since, for instance, 82% of stopovers were connected with the bike-sharing stations. The proposed method makes it possible to use bike-sharing big datasets in the process of identifying attractive public spaces. It is generally applicable to any city where city bike data are available.
As the case study of the bike-sharing system in Cracow, Poland, shows, big data analytics is a technology with the potential to develop smart city services. These new data contribute substantially to understanding the consumption of space within urban tourist destinations and therefore make it possible to differentiate overcrowded places from those with the potential to grow. This allows decision-makers to imagine new ways of planning and managing towards a sustainable "smart" future.