APPLICATION OF UNIVARIATE TIME SERIES THEORY TO PASSENGER DEMAND FORECASTING

The methods currently used by Slovak transportation companies for passenger demand forecasting are considerably simplified and are no longer considered accurate. These limitations might be caused by insufficient research in this area in recent years. The purpose of this paper is to identify a statistical model of passenger demand for suburban bus transport that satisfies the requirements of statistical significance of its parameters and randomness of its residuals. Three different methodologies (exponential smoothing, multiple linear regression and autoregressive models) were used in order to identify a statistical model that is more accurate and reliable than those currently in use.


Introduction
Statistical modelling and forecasting of passenger demand using univariate time series theory is probably one of the most common approaches to forecasting periodic time series data. This methodology has been successfully applied in the sphere of urban transport [1,2] and in recently published models of passenger demand for suburban bus transport, both for passengers carried at the school reduced fare [3,4] and at the normal fare [3,5,6]. The main goal of this paper is to introduce a method of statistical modelling of passenger demand (passengers carried at the school reduced fare) using univariate time series theory, which appears to be a more accurate and reliable alternative to the automated forecasting procedures published in the literature [3]. In accordance with this goal, a statistical model was designed that is suitable for short-term forecasting (forecast horizon h ≤ 1 year) of passenger demand (school reduced fare) for suburban bus transport in the Zilina region. Most of the analyses, modelling and forecasting procedures mentioned in this paper were carried out using SAS LE 4.1 [7] and SAS 9.3.1 [8] software.

Materials and Methods
The properties of the data used and the methods of their analysis, modelling and testing are briefly described in this section.

Properties and Adjustments of Input Data
The input data of the experiments presented in this paper were counts of carried pupils and students collected by the cooperating carrier. These values were aggregated by summation, so that the output of the aggregation process was a monthly time series of passenger demand (passengers carried at the school reduced fare) {Q_P(t); 1 ≤ t ≤ 96} for the period 1/2000-12/2007 in the Zilina region.
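The aggregation step can be illustrated with a minimal Python sketch. The record layout (a date paired with a passenger count) and the function name `aggregate_monthly` are assumptions made for illustration only; the carrier's actual data format is not described in the paper, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date


def aggregate_monthly(records):
    """Sum passenger counts per calendar month.

    `records` is an iterable of (date, count) pairs (assumed layout).
    Returns the monthly totals in chronological order.
    """
    totals = defaultdict(int)
    for day, count in records:
        totals[(day.year, day.month)] += count
    # Sorting the (year, month) keys gives chronological order.
    return [totals[key] for key in sorted(totals)]


# Hypothetical sample records, not data from the paper.
sample = [
    (date(2000, 1, 3), 120),
    (date(2000, 1, 4), 130),
    (date(2000, 2, 1), 90),
]
monthly = aggregate_monthly(sample)
```

Applied to eight full years of records, such a routine would yield a series of 96 monthly values.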
The values of the time series Q_P(t) constructed in this manner were considered spatially and substantively homogeneous, because within the specified period the carrier changed neither its geographic scope nor its transportation technology to an extent that would affect the substantive and spatial aspects of the analysed time series. "Trading day effects" were eliminated by the author's own modification [9], which respects the properties of passenger demand, of the calendar adjustment procedures described by Cipra [10]. The output of the calendar adjustment process was a fully homogeneous time series of passenger demand (school reduced fare) for suburban bus transport {Q(t); 1 ≤ t ≤ 96}.
In the pre-forecasting analyses [9], a constant trend and monthly additive seasonality of the Q(t) time series were first identified by subjective methods and later confirmed by objective methods. The models presented in this paper fully respect these properties.
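A constant trend with additive monthly seasonality means that each observation can be thought of as a fixed level plus a month-specific offset plus a random component. The following Python sketch shows one simple way to estimate such a profile from a series. The estimator (overall mean and monthly means) and the name `additive_seasonal_profile` are assumptions chosen for illustration; they are not the identification procedures used in [9].

```python
def additive_seasonal_profile(series, period=12):
    """Estimate a constant level and additive seasonal offsets.

    Assumes the series starts at the first season and follows
    Q(t) = level + offset[season of t] + noise (illustrative model).
    """
    level = sum(series) / len(series)
    offsets = []
    for season in range(period):
        # Every `period`-th value belongs to the same season.
        values = series[season::period]
        offsets.append(sum(values) / len(values) - level)
    return level, offsets


# Hypothetical quarterly-style toy data (period 3), not from the paper.
toy = [10.0, 20.0, 30.0] * 4
level, offsets = additive_seasonal_profile(toy, period=3)
```

When the series length is a whole multiple of the period, the estimated offsets sum to zero, which is the usual normalisation for additive seasonality.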

Methods
Multiple regression, exponential smoothing and autoregressive models were used for statistical modelling of the Q(t) time series. The seasonal exponential smoothing model (method A) was developed and fitted using exponential smoothing methodology. The smoothing state of the model at time t = 0 was obtained by Chatfield's backcasting method [11]. The smoothing weights (level α, seasonal δ) were determined so as to minimize the sum of squared one-step-ahead prediction errors:

$$\min_{\alpha,\delta} \sum_{t=1}^{n} \left( Q(t) - \hat{Q}(t) \right)^2 \qquad (1)$$

where $\hat{Q}(t)$ denotes the one-step-ahead prediction of Q(t) and n is the number of observations.

Multiple regression was used in combination with Box-Jenkins methodology. The multiple regression model (constant term with seasonal dummies) was combined first with an autoregressive process of order p = 1 (AR(1), method B) and then with an autoregressive moving average process (ARMA(1,1), method C). The practices and principles of designing linear stochastic models [10,12] were followed when developing and fitting the Q(t) time series models by Box-Jenkins methodology. Using this methodology, three seasonal autoregressive integrated moving average models were designed, all without an intercept parameter: ARIMA(1,0,1)(0,1,1)_12 (method D), ARIMA(1,0,1)(2,1,0)_12 (method E) and ARIMA(1,0,1)(1,1,0)_12 (method F).
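The idea behind method A can be sketched in Python as an additive seasonal exponential smoothing recursion whose weights are chosen by a grid search over the sum of squared one-step-ahead errors. This is only an illustration under stated assumptions: the initial state is taken from the first season rather than from Chatfield's backcasting, the grid search stands in for whatever optimiser the SAS software uses, and the function names and toy data are hypothetical.

```python
def seasonal_smoothing_sse(series, alpha, delta, period=12):
    """Sum of squared one-step-ahead errors for additive seasonal smoothing.

    Initial level and seasonal factors come from the first season
    (a simplification; the paper uses backcasting instead).
    """
    first = series[:period]
    level = sum(first) / period
    seasonal = [y - level for y in first]
    sse = 0.0
    for t in range(period, len(series)):
        s_prev = seasonal[t % period]
        error = series[t] - (level + s_prev)  # one-step-ahead error
        sse += error * error
        new_level = alpha * (series[t] - s_prev) + (1 - alpha) * level
        seasonal[t % period] = delta * (series[t] - new_level) + (1 - delta) * s_prev
        level = new_level
    return sse


def fit_weights(series, period=12, steps=20):
    """Grid search for (sse, alpha, delta) with the smallest SSE."""
    grid = [i / steps for i in range(1, steps)]
    best = None
    for alpha in grid:
        for delta in grid:
            sse = seasonal_smoothing_sse(series, alpha, delta, period)
            if best is None or sse < best[0]:
                best = (sse, alpha, delta)
    return best


# Hypothetical toy series with period 4, not data from the paper.
toy = [10.0, 20.0, 30.0, 40.0, 12.0, 21.0, 33.0, 41.0, 11.0, 19.0, 31.0, 42.0]
best_sse, best_alpha, best_delta = fit_weights(toy, period=4)
```

A finer grid or a numerical optimiser would give more precise weights; the grid is used here only to keep the sketch dependency-free.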
The statistical models presented in this paper were tested for compliance with the requirements of mutual linear independence, stationarity and normality of the probability distribution of their standardized residuals (ε_t; t = 1, …, 96). Mutual linear independence of the residuals ε_t was tested by Bartlett's test for autocorrelation [13] and the Ljung-Box χ² statistic [14]. Stationarity of the residual components was evaluated by augmented Dickey-Fuller tests (ADF tests) [15] and Dickey-Fuller unit root tests for seasonal time series (SDF tests) [16]. Normality of the distribution of the standardized residuals was tested by the Shapiro-Wilk (SW) test [17] and by the Kolmogorov-Smirnov (K-S), Anderson-Darling (A-D) and Cramér-von Mises (C-M) tests as described by D'Agostino [18], Prins [19] and Filliben [13]. Statistical significance of the estimated model parameters was tested by Student's t-test [20].
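As an illustration of the independence check, the Ljung-Box statistic can be computed directly from the residual autocorrelations as Q = n(n + 2) Σ r_k² / (n − k), summed over lags k = 1, …, m. The Python sketch below uses hypothetical function names and toy residuals; it does not reproduce the SAS output reported in the paper.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    num = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean)
        for t in range(lag, n)
    )
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# Hypothetical, strongly autocorrelated toy residuals.
toy_residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(toy_residuals, max_lag=1)
```

Under the hypothesis of independent residuals, Q is compared with a χ² critical value whose degrees of freedom equal the number of lags minus the number of estimated ARMA parameters; a large Q, as in this toy case, indicates remaining autocorrelation.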

Empirical results
The outputs of the forecasting procedures (analyses, modelling, testing) presented in this paper are goodness-of-fit statistics (Tab. 1), outputs of the randomness tests (Tab. 2) and an evaluation of the statistical significance of the model parameters (Tab. 3). Owing to the large volume of computational output, the results are presented here only in considerably reduced form. Full outputs for all models, including estimates of model parameters and the evaluation of their statistical significance, goodness-of-fit statistics, and point and interval forecasts, are part of the dissertation thesis [9] and, for the model estimated by method E, are also given in Perner's Contacts [5].
In order to measure how well the different models (methods A-F) fit the data, traditional goodness-of-fit statistics (root mean square error, RMSE; mean absolute percent error, MAPE), penalty statistics (Akaike's information criterion, AIC [21]; Schwarz Bayesian information criterion, SBIC [22]) and extrapolation statistics (MAPE_3, MAPE_12) were computed. The computed values of these measures are given in Tab. 1.
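The traditional and penalty statistics can be sketched in Python as follows. The AIC and SBIC expressions use one common least-squares form, n·ln(SSE/n) + 2k and n·ln(SSE/n) + k·ln(n); it is an assumption that this matches the exact definitions applied in the paper, and the function name and sample values are hypothetical.

```python
import math


def fit_statistics(actual, fitted, n_params):
    """Return RMSE, MAPE (in percent), AIC and SBIC for a fitted model.

    `n_params` is the number of estimated model parameters (k).
    Actual values must be non-zero for MAPE to be defined.
    """
    n = len(actual)
    errors = [a - f for a, f in zip(actual, fitted)]
    sse = sum(e * e for e in errors)
    return {
        "RMSE": math.sqrt(sse / n),
        "MAPE": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual)),
        "AIC": n * math.log(sse / n) + 2 * n_params,
        "SBIC": n * math.log(sse / n) + n_params * math.log(n),
    }


# Hypothetical values, not results from the paper.
stats = fit_statistics(
    actual=[100.0, 200.0, 300.0, 400.0],
    fitted=[110.0, 190.0, 310.0, 390.0],
    n_params=2,
)
```

Lower values indicate a better fit for all four measures; the penalty criteria additionally discourage models with many parameters.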
Based on the results of the tests for mutual linear independence, stationarity and normality of the probability distribution of the residuals, and for statistical significance of the estimated parameters, method E appears to be the only one suitable for forecasting (ex-post, ex-ante) of Q(t). Compared with the other models, the model (2) estimated by method E fitted the actual data very well. The estimated values of its parameters, with standard errors and the outputs of their statistical significance tests, are given in Tab. 3. The graphical output of modelling and forecasting by method E is shown in Fig. 1, where the estimated values are drawn as a smooth curve and the empirical values as black points. The graphical comparison of the actual (empirical) and forecasted values shows that this model accurately describes the variability of the empirical values of Q(t). This is also supported by the low levels of the model residuals (displayed by the bar diagram in Fig. 1).
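For orientation, the general form of a seasonal ARIMA(1,0,1)(2,1,0) model with period 12 and no intercept can be written in backshift notation, where B Q(t) = Q(t − 1). The symbols φ₁, Φ₁, Φ₂ and θ₁ are generic placeholders introduced here, and the sign convention is the common Box-Jenkins one; this is a sketch of the model class, not a restatement of the estimated model (2).

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12} - \Phi_2 B^{24})\,(1 - B^{12})\, Q(t)
  = (1 - \theta_1 B)\, \varepsilon_t
```

The factor (1 − B¹²) is the single seasonal difference, the two seasonal autoregressive terms act at lags 12 and 24, and the non-seasonal part is an ARMA(1,1) structure.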
It was shown that the confidence interval (3) around the estimator $\hat{Q}(t)$; t = n + 1, …, n + h (at the confidence level of 0.95) can be reduced from ±200 to ±16 thousand passengers carried, compared with the outputs of computations published by Konečný [3].

$$P\left( L_{95t} \le \hat{Q}(t) \le U_{95t} \right) = 1 - \alpha \qquad (3)$$

where $L_{95t}$ is the lower limit of the confidence interval, $U_{95t}$ is the upper limit of the confidence interval, 1 − α is the given probability, called the confidence level of the interval, and $\hat{Q}(t)$ is the estimated value of passenger demand.
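A minimal Python sketch of how such interval limits can be obtained from a point forecast is given below. It assumes normally distributed forecast errors and a known forecast standard error; the function name and the numerical values are hypothetical and do not come from the paper.

```python
from statistics import NormalDist


def forecast_interval(point, std_error, level=0.95):
    """Lower and upper limits of a two-sided interval around a forecast.

    Assumes normally distributed forecast errors (illustrative assumption).
    """
    # Two-sided quantile, e.g. about 1.96 for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * std_error
    return point - half_width, point + half_width


# Hypothetical forecast of 500 thousand passengers with standard error 8.
lower, upper = forecast_interval(500.0, 8.0)
```

With these illustrative inputs the half-width is roughly 16 thousand passengers, which shows how the width of the interval scales directly with the forecast standard error.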
A more detailed comparison of the forecasting abilities and statistical properties of the method presented in this paper with the statistical model designed by Konečný [3] was not possible, because the goodness-of-fit statistics of that model are not available. The increase in reliability of the statistical model (method E), expressed by the reduced confidence interval (3), evidently accompanies its increased interpolation accuracy.

Note: "+" denotes satisfactory, "-" unsatisfactory and "+/-" borderline (satisfactory) results of the statistical tests.

Conclusion
The outputs of the statistical tests of randomness of the standardized residuals and the values of the goodness-of-fit statistics showed that the seasonal autoregressive integrated moving average model ARIMA(1,0,1)(2,1,0)_12 without intercept parameter (method E) fulfils the requirements of statistical significance of its parameters and, moreover, of mutual linear independence, stationarity and normality of the probability distribution of its standardized residuals. For these reasons the model presented in this paper is also a very good alternative to forecasting methodologies for non-periodic passenger demand time series [23,24], and it additionally provides more detailed monthly multi-step-ahead forecasts. Because of cross-regional differences, this model cannot be considered universally applicable throughout the Slovak Republic, but only in the Zilina region.
Despite the above-mentioned restriction, the ARIMA(1,0,1)(2,1,0)_12 model without intercept parameter presented in this paper is a more reliable and more accurate passenger demand forecasting method than those used to date. Applying the described model in the management of the relevant transport company reduces the uncertainty of managers' decisions and, moreover, can result in an increase in the company's revenues.