RE-AGGREGATION HEURISTICS FOR THE LARGE LOCATION PROBLEMS WITH LEXICOGRAPHIC MINIMAX OBJECTIVE

We propose a new heuristic algorithm that provides solutions to the discrete lexicographic minimax location problem and is applicable to large instances of the problem. The lexicographic minimax location problem is known to be NP-hard; therefore, large instances of the problem are not computable in reasonable time. Aggregation is a valuable tool that allows the size of the problem to be adjusted, approximating the problem by a smaller one that can be solved. An inevitable consequence of aggregation is a loss of precision. Typically, an aggregation method is used only once, in the initial phase of the solving process. Here, we propose an iterative re-aggregation approach which adapts the aggregated problem to achieve more precise locations of facilities. To test the efficiency of the proposed method, we compute large location problems reaching 67 000 aggregated customers. The proposed approach is versatile and can be used for a wide range of location problems.


Introduction
A location problem consists of finding a suitable set of facility locations from which services can be efficiently distributed to customers [1, 2 and 3]. Many location problems are known to be NP-hard. Consequently, the ability of algorithms to compute the optimal solution quickly decreases as the problem size grows. There are two basic approaches to dealing with this difficulty. The first is to use a heuristic method, which, however, does not guarantee that the optimal solution is found. The second is to use aggregation, which lowers the number of customers and candidate locations. The aggregated location problem (ALP) can then be solved by exact methods or by heuristics. Aggregation, however, induces various types of errors. There is a strong stream of literature studying aggregation methods and the corresponding errors [4 and 5]. Various sources of aggregation errors and approaches to minimising them are discussed in [4, 6 and 7].
Here, we are specifically interested in finding a fair design of a public service system serving a spatially large geographical area with many customers. Customers are modelled by a set of demand points (DP) representing their spatial locations [5]. Including all possible locations of customers as DPs is often impossible and also unnecessary. In such situations, aggregation is a valuable tool for obtaining an ALP of computable size.
It is well known that the solution provided by a heuristic method using more detailed data is often better than a solution achieved by an exact method solving the aggregated problem [8 and 9]. Often, aggregation is used only in the initial phase of the solving process, to match the problem size with the performance of the chosen solving method. In this paper, we propose a simple re-aggregation heuristic, where the solved problem is modified to minimise the aggregation error in the following iterations. The aim of this article is to assess whether the re-aggregation approach can be used for solving large location problems with a lexicographic minimax objective. Our results show that re-aggregation may provide better solutions than an exact or heuristic method applied to a large location problem where aggregation is used only once.
The paper is organised as follows: section 2 briefly introduces the data processing procedure. In section 3 we summarise the lexicographic minimax problem. The re-aggregation heuristics is explained in section 4. Results of numerical experiments are reported in section 5. We conclude in section 6.

Data model
The OpenStreetMap (OSM) provides all necessary data to generate DP locations and to extract the road network. To estimate the positions of demand points we use OSM layers describing positions of buildings, roads, and residential, industrial and commercial areas. To generate DPs we use a simple procedure.
The lexicographically minimal solution in the set Q is the solution that corresponds to the lexicographically minimal vector [B1, B2, ..., Bkmax] [12].
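As a small illustration with hypothetical data (unit positions, invented weights and distances), the vector [B1, ..., Bkmax] of two feasible solutions can be built and compared directly; Python's tuple ordering coincides with the lexicographic order used here, and the lexicographically smaller vector is the fairer solution:

```python
# Toy illustration with hypothetical data: build the vector [B_1, ..., B_kmax]
# for two feasible solutions and compare them lexicographically. b[j] is the
# weight of customer j; u[j] is its distance to the closest located facility.

def b_vector(u, b, D):
    """B_k = sum of the weights b_j of customers served at exactly distance D[k]."""
    return tuple(sum(bj for uj, bj in zip(u, b) if uj == Dk) for Dk in D)

b = [10, 20, 5, 8, 7]                  # customers per demand point
u_a = [9, 4, 4, 2, 2]                  # closest-facility distances, solution A
u_b = [7, 7, 4, 2, 2]                  # closest-facility distances, solution B
D = sorted(set(u_a) | set(u_b), reverse=True)   # unique distances, descending

Ba = b_vector(u_a, b, D)               # (10, 0, 25, 15)
Bb = b_vector(u_b, b, D)               # (0, 30, 5, 15)
fairer = min(Ba, Bb)                   # solution B: nobody travels the worst distance 9
```

Solution B wins although more customers travel the second-worst distance, which is exactly the lexicographic minimax preference: the worst-off customers dominate the comparison.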
To solve this problem we use the algorithm A-LEX [13], which, similarly to the algorithm of [11], solves optimisation problems in stages corresponding to individual distance values. The correctness and finiteness of the algorithm A-LEX, including the optimality conditions, are analysed in more detail in reference [13]. Alternatively, there is also another fairness criterion known as proportional fairness. Solutions provided with respect to this criterion tend to achieve higher system efficiency; for more details see reference [14].

Re-aggregation heuristics
In this section we describe our re-aggregation approach. The main goal is to re-aggregate the solved problem in each iteration, to achieve more precise locations of facilities in the following iterations. Aggregation is an essential part of the heuristic and it leads to location errors [4 and 5]. To minimise the effect of aggregation errors we need to understand their possible sources. Therefore, we start with a brief summary of the known sources of aggregation errors related to the input data. These sources of errors are denoted as A, B, C and D in the literature.

Aggregation errors
Aggregation errors are caused by the loss of information when DPs are replaced by aggregated demand points (ADP). Source errors A, B and C were introduced in [7]. Elimination of the source A and B errors was studied in [6]. Minimisation of the source C error was analysed in [15]. The source D error and the possibilities of minimising it were studied in [16]. Source errors A and B can be eliminated by pre-processing the input data, while source errors C and D can be eliminated by post-processing the results. We summarise the source errors in Table 1.
Some errors are also often made by designers or decision makers who are preparing the input data or evaluating the aggregation errors. Examples of such errors are: use of uniform demand distribution, aggregation method that ignores population clusters, or incorrect methods used to measure the aggregation error [4].
First, we generate a spatial grid consisting of uniform square cells with a size of 100 metres. For each cell, we extract the elements of the OSM layers that are situated inside it. Second, DPs are located at the centroids of the cells with non-empty content. Third, the generated DPs are connected to the road network and we compute the shortest-path distances between them. Finally, we calculate Voronoi diagrams using the DPs as generating points, and we associate a demand with each DP by intersecting the Voronoi polygons with the residential population grids produced by [10].
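The first two steps of this procedure can be sketched as follows; the coordinates are illustrative planar metres, not real OSM data, and the road-network and Voronoi steps are omitted:

```python
# Minimal sketch of the DP-generation step: snap extracted map elements to a
# uniform 100 m grid and place one demand point at the centroid of each
# non-empty cell. Coordinates are illustrative, not real OSM data.

CELL = 100.0  # cell size in metres

def generate_dps(points, cell=CELL):
    """Centroids of all grid cells that contain at least one extracted element."""
    occupied = {(int(x // cell), int(y // cell)) for x, y in points}
    return sorted((cx * cell + cell / 2, cy * cell + cell / 2)
                  for cx, cy in occupied)

elements = [(12, 40), (95, 60), (130, 40), (410, 250)]
dps = generate_dps(elements)   # three non-empty cells -> three demand points
```

Two of the four elements fall into the same cell, so only three DPs are produced, each at its cell centroid.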

Problem formulation
We consider the set I of potential facility locations and the set J of customers. Each customer j ∈ J is characterised by a unique geographical position and by an integer weight b_j, which represents the number of individual customers situated at location j. The decisions to be made can be represented by a set of binary variables. The variable y_i equals 1 if the location i ∈ I is used as a facility location and 0 otherwise. Allocation decisions are modelled by variables x_ij for i ∈ I and j ∈ J, where x_ij = 1 if the location i serves the customer j and x_ij = 0 otherwise. In order to obtain a feasible solution, the decision variables have to satisfy the following set of constraints:

Σ_{i∈I} y_i = p (1)
Σ_{i∈I} x_ij = 1 for each j ∈ J (2)
x_ij ≤ y_i for each i ∈ I, j ∈ J (3)
y_i ∈ {0, 1} for each i ∈ I (4)
x_ij ∈ {0, 1} for each i ∈ I, j ∈ J (5)

Equation (1) specifies that the number of located facilities equals p, constraints (2) ensure that each customer is assigned to exactly one facility, and constraints (3) allow a customer to be assigned only to located facilities. Following reference [11], we use the symbol Q to denote the set of all feasible location patterns satisfying the constraints (1)-(5).
By the symbol d_ij we denote the distance from customer j ∈ J to the potential facility location i ∈ I. We identify all unique distance values D_k, k = 1, ..., kmax, in the set of all feasible distance values d_ij, where kmax is the number of unique d_ij values, and we order the D_k values into a descending sequence. Thus D_1 denotes the maximal possible distance between a customer and a facility. Each feasible solution in the set Q can be associated with a sequence of subsets [J_1, J_2, ..., J_kmax] and with a vector [B_1, B_2, ..., B_kmax]. Each customer j ∈ J is assigned to one of the subsets J_1, J_2, ..., J_kmax; the distance between the customers in the set J_k and their assigned facility is exactly D_k. The component B_k is defined as B_k = Σ_{j∈J_k} b_j, which denotes the number of individual customers situated in the subset J_k. If a set J_k is empty, then the associated value B_k is zero. Source errors A and B cannot be fully eliminated by pre-processing the input data; therefore, we eliminate only source errors C and D. The re-aggregation algorithm is composed of phases that are executed in the following order:

Phase 0: Initialisation:
Set i = 1, prepare an ALP of a size corresponding to the reduction coefficient α(1) by aggregating the input data, and compute the distance matrix.

Phase 1: Location of facilities:
Solve ALP using A-LEX algorithm. As a result, we obtain p located facilities.

Phase 2: Elimination of source C and D errors:
• To minimise the source C errors, reallocate the DPs to their closest facilities.
• To minimise the source D errors, decompose the problem into p location problems, each consisting of one facility location and all of its associated DPs. The solution of each problem is given by locating the single facility so that it minimises the maximal distance to the associated DPs. As a result, we obtain p new locations of facilities.
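The two corrections above can be sketched on toy planar data; Euclidean distances and the use of the DPs themselves as candidate facility sites are simplifying assumptions, not the paper's road-network setting:

```python
# Sketch of phase 2 on toy planar data (all coordinates hypothetical):
# (i) reallocate every DP to its closest facility (source C correction), then
# (ii) within each cluster, move the facility to the candidate minimising the
# maximum distance to its DPs (source D correction).

from math import dist

def phase2(facilities, dps):
    clusters = {f: [] for f in facilities}
    for p in dps:                                  # (i) source C: closest facility
        clusters[min(facilities, key=lambda f: dist(f, p))].append(p)
    refined = []
    for f, members in clusters.items():            # (ii) source D: 1-centre step
        candidates = members or [f]
        refined.append(min(candidates,
                           key=lambda c: max(dist(c, q) for q in candidates)))
    return refined

facs = [(0.0, 0.0), (10.0, 0.0)]
dps = [(1, 0), (2, 0), (3, 0), (9, 0), (12, 0)]
new_facs = phase2(facs, dps)    # facilities move to (2, 0) and (9, 0)
```

Both facilities move to positions that halve the maximum distance within their clusters, which is exactly the effect the source D correction aims for.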

Phase 3: Re-aggregation:
If every facility is established in a DP that cannot be further de-aggregated, or if i > imax, then terminate and output the best solution found. Otherwise, de-aggregate the ADPs in the close neighbourhood of the located facilities into new ADPs, set i = i + 1, and return to Phase 1.
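The overall loop can be outlined as follows. Every helper here is an illustrative stub, not the paper's implementation: the instance is one-dimensional, the initial aggregation simply keeps every second DP, the A-LEX solver is replaced by an evenly spaced placement, phase 2 is omitted, and the stop rule uses only a fixed iteration count:

```python
# Skeleton of the re-aggregation loop (phases 0-3) on a toy 1-D instance.
# All helpers are stubs standing in for the components described in the text.

def solve_alp(dps, p):
    """Stub for phase 1: place p facilities at evenly spaced aggregated DPs."""
    step = max(1, len(dps) // p)
    return sorted(dps)[::step][:p]

def deaggregate_near(dps, all_dps, facilities, radius=1.0):
    """Stub for phase 3: restore original DPs close to a located facility."""
    extra = {d for d in all_dps if any(abs(d - f) <= radius for f in facilities)}
    return sorted(set(dps) | extra)

def reaggregation(all_dps, p, i_max=3):
    dps = sorted(all_dps)[::2]               # phase 0: crude initial aggregation
    for _ in range(i_max):
        facilities = solve_alp(dps, p)       # phase 1 (phase 2 omitted)
        dps = deaggregate_near(dps, all_dps, facilities)   # phase 3
    return facilities
```

The key structural point is that the aggregated problem grows only near the current facility locations, so detail is added exactly where the aggregation error matters most.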

Aggregation method
To aggregate DPs, we use the row-column aggregation method proposed in [8 and 17]. We slightly modified this approach by applying it to each administrative zone separately. This allows us to approximate population clusters more precisely, and thus helps to minimise the aggregation error.
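A simplified, grid-based stand-in for this idea can be sketched as follows; it is not the exact row-column method of [8 and 17], and the cell size and zone labels are illustrative assumptions. DPs falling into the same coarse cell of a zone are merged into one ADP at their demand-weighted centroid:

```python
# Simplified per-zone aggregation sketch: DPs in the same coarse grid cell of
# an administrative zone are merged into one ADP at the demand-weighted
# centroid. Cell size and the zone labels are illustrative assumptions.

from collections import defaultdict

def aggregate_zone_wise(dps, coarse=200.0):
    """dps: list of (zone, x, y, weight). Returns ADPs as (zone, x, y, weight)."""
    cells = defaultdict(list)
    for zone, x, y, w in dps:
        cells[(zone, int(x // coarse), int(y // coarse))].append((x, y, w))
    adps = []
    for (zone, _, _), members in sorted(cells.items()):
        W = sum(w for _, _, w in members)
        cx = sum(x * w for x, _, w in members) / W
        cy = sum(y * w for _, y, w in members) / W
        adps.append((zone, cx, cy, W))
    return adps

dps = [("A", 10, 10, 1), ("A", 190, 10, 3), ("A", 210, 10, 1), ("B", 10, 10, 2)]
adps = aggregate_zone_wise(dps)   # two "A" DPs merge; zone "B" stays separate
```

Because the grid is applied per zone, DPs from different administrative zones are never merged, which is the modification described in the text.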

Re-aggregation algorithm
In order to characterise the size of the aggregated problem, we define the relative reduction coefficient

α = (n − n′) / n, (6)

where n is the number of DPs in the original problem and n′ is the number of ADPs in the aggregated problem. Thus, for the unaggregated problem we recover the value α = 0.
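As a concrete check (assuming α is the relative decrease in the number of demand points), the Zilina figures quoted later, 9 945 original DPs reduced to 1 368 ADPs, give α of roughly 86%:

```python
# Relative reduction coefficient, assuming alpha = (n - n_prime) / n,
# i.e. the relative decrease in the number of demand points.

def reduction(n_original, n_aggregated):
    return (n_original - n_aggregated) / n_original

alpha = reduction(9945, 1368)          # Zilina figures quoted in the results
assert reduction(9945, 9945) == 0.0    # unaggregated problem: alpha = 0
```
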
Used notation and main parameters of the algorithm are described in Table 2.
Implementation of the algorithm A-LEX requires that the distance matrix has zero values on the diagonal [13]. This limitation does not allow us to fully eliminate source errors A and B by pre-processing the input data.

Table 1: Types of source errors.

Source A error. Description: this error results from a wrongly estimated distance between ADPs a and b, when the distance is measured only between the corresponding centroids. Elimination: replace the distance by the sum of distances from all DPs aggregated in the ADP a to the centroid of the ADP b.

Source B error. Description: a specific case of the source A error. If ADP a is a candidate location for a facility and at the same time it represents a customer, the distance between the facility location a and the customer a is incorrectly set to zero. Elimination: replace the zero distance by the sum of all distances from the DPs aggregated in the ADP a to the centroid of the ADP a.

Source C error. Description: all DPs aggregated in the same ADP are assigned to the same facility. Elimination: re-aggregate the ADPs and find the closest facility for all DPs.

Source D error. Description: a consequence of establishing facilities in ADPs and not in DPs. Elimination: find the facility location by de-aggregating the ADPs in the close neighbourhood of the located facilities.
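The source A and B corrections from Table 1 amount to replacing a centroid-to-centroid value by a sum of DP-to-centroid distances, which can be sketched as follows (Euclidean distances and the toy coordinates are assumptions):

```python
# Sketch of the source A/B corrections from Table 1: the ADP-to-ADP distance
# is replaced by the sum of distances from every DP aggregated in ADP a to the
# centroid of ADP b; for a == b (source B) the same rule replaces the zero
# diagonal entry.

from math import dist

def corrected_distance(dps_in_a, centroid_b):
    """Summed DP-to-centroid distance used instead of the centroid-to-centroid value."""
    return sum(dist(p, centroid_b) for p in dps_in_a)

a_members = [(0.0, 0.0), (2.0, 0.0)]       # DPs aggregated in ADP a
centroid_a = (1.0, 0.0)
centroid_b = (5.0, 0.0)

d_ab = corrected_distance(a_members, centroid_b)   # source A: 5 + 3 = 8
d_aa = corrected_distance(a_members, centroid_a)   # source B: 1 + 1 = 2, not 0
```

The non-zero self-distance d_aa is exactly what the A-LEX diagonal requirement prevents, which is why these two errors are not fully eliminated here.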
Table 2: Notations used in the description of the re-aggregation algorithm.

We define the relative error ΔMAX between two solutions as

ΔMAX = (f(y) − f(xα)) / f(xα), (7)

where f(·) is the value of the maximum distance from an unaggregated DP to the closest facility, xα is the optimal solution of the ALP that corresponds to the relative reduction α, and y is the solution provided by our re-aggregation algorithm. Thus, x0 denotes the optimal solution of the original (unaggregated) problem. Further, we define the relative difference in equity between two solutions as

ΔG = (g(y) − g(xα)) / g(xα), (8)

where g(·) is the value of the Gini coefficient computed by taking all distances from the unaggregated DPs to their closest facility. The Gini coefficient is commonly used in economics to gauge the level of equity, and [11] use it to evaluate the fairness of a facility location. We denote the distance from the customer j to the closest facility in the solution x by u_j, and the mean of all u_j values by μ. For the set (u_j, j = 1, 2, ..., n), the Gini coefficient G may be estimated as

G = Σi Σj |u_i − u_j| / (2 n² μ).

The Gini coefficient is used as a measure of inequality because G approaches 1 for a sample where all the travel is concentrated in a single non-zero element, whereas G = 0 when all data points have the same value. Finally, using similar notation, we define the relative time efficiency δ as

δ = t(y) / t(xα),

where t(·) is the time spent computing the solution.
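The Gini estimate can be checked numerically with the pairwise form G = Σi Σj |u_i − u_j| / (2 n² μ); the distance values below are illustrative:

```python
# Numerical check of the Gini estimator over closest-facility distances u_j,
# using the pairwise form G = sum_i sum_j |u_i - u_j| / (2 * n^2 * mu).
# The data values are illustrative.

def gini(u):
    n = len(u)
    mu = sum(u) / n
    return sum(abs(a - b) for a in u for b in u) / (2 * n * n * mu)

g_equal = gini([4, 4, 4])              # perfect equity -> 0
g_concentrated = gini([0, 0, 0, 12])   # one customer bears all the travel
```

For a sample of size n with a single non-zero value the estimator gives (n − 1)/n, here 0.75, which approaches 1 as n grows, matching the interpretation in the text.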
It is important to note that the aim of this paper is to test the performance of the re-aggregation heuristic, not to investigate in detail the impact of the parameter values.

Medium sized location problems
First, we evaluate the quality of the solutions provided by our algorithm by comparing them with the solutions provided directly by the algorithm A-LEX. In Table 4, we present results where the initial size of the ALP is

Results
To evaluate the proposed heuristic, we analyse the optimality error and the computational time consumed when it is applied to three real geographical areas. More details about these areas are given in Table 3.
Table 3 presents basic information about the selected geographical areas that constitute our benchmarks. To quantify the importance of the elimination of the source errors in phase 2, we formulate two different versions of the algorithm, denoted V1 (including phases 0, 1 and 3) and V2 (including phases 0, 1, 2 and 3).
We start by investigating the performance of the re-aggregation algorithms using the benchmarks Partizanske and Kosice, which can also be solved directly by the algorithm A-LEX. To evaluate the efficiency of the proposed heuristic on extremely large problems, we use the benchmark Zilina. This benchmark is too large to be solved directly by the algorithm A-LEX.

Performance analysis
We investigate the relation between the quality of the solution and the computational time that is consumed by individual phases of the re-aggregation algorithm by means of numerical experiments.
Because we need to compare the degree of fairness of two solutions, evaluating an objective function value is not possible. Therefore, adopting the formulation described in [4], we define the relative error ΔMAX comparing the equity of two solutions as a proxy measure.

Large location problem
Here, we compute the large location problem Zilina using versions V1 and V2 of the re-aggregation algorithm. In the experiments, we fixed the parameter αmax to the value corresponding to 85%, and set m = 4 and f = 0. Furthermore, we do not apply the parameter imax.
In these experiments, we also investigate how the solutions improve and how the computational time grows as the number of iterations increases. The size of the Zilina problem does not allow us to compute the solution by the algorithm A-LEX without aggregating the data. Therefore, instead of the solution x0, we use in formulas (7) and (8) the solutions provided by the algorithm A-LEX on the ALPs. We prepared three versions of the problem Zilina corresponding to three values of the reduction coefficient α: 95%, 90% and 85%, and we denote the corresponding solutions as x95, x90 and x85. Here, we also used the initialisation phase 0.
The results summarised in Table 5 confirm that versions V1 and V2 result in equitable service systems for the unaggregated problem Zilina. Using only 1 368 ADPs, version V1 can find a solution where the maximum distance of a customer to the closest facility is shorter by 2.2 kilometres compared to the solution obtained using 9 945 ADPs. In other words, it is possible to find about 4% better solutions for the original unaggregated problem by considering only 14% of the 9 945 customers and candidate locations. In each iteration, the area within a radius of 1 kilometre from each located facility is de-aggregated into m new ADPs (see phase 3 of the algorithm).
In most of the cases, our heuristic algorithm finds a solution with the same maximum distance to the closest facility as the solution x0 provided by the algorithm A-LEX on the unaggregated problem. The Gini coefficient is also similar to that of the solution x0; thus the level of equity is very similar. It is important to note that the reduction of the original problem is between 78% and 89%, which is a significant value.
There are cases in which our heuristic algorithm performs slightly worse than the algorithm A-LEX (p = 10 for Kosice and p = 5, 10 for Partizanske). The price for the extra elimination of the source errors is a drop of the reduction coefficient to 45%-82%, and the computational time is then larger, but it is still smaller than for the algorithm A-LEX, except for the case of Kosice with p = 20. We observe that the time efficiency of the heuristic algorithm is more significant for smaller values of p.
Based on the results, we can conclude that the heuristic is able to provide equally fair solutions using only 11%-25% of the DPs of the original problem, and with better time efficiency.
In the next subsection, we present results obtained for large benchmarks derived from the problem instance Zilina. Here, in contrast to the small problems, we compute the shortest-path distances during the computational process, while doing the re-aggregations. With a large computer memory this is not necessary, but it has to be done whenever the size of the problem does not allow storing the complete distance matrix in the computer memory. This leads to larger computational times and makes a comparison of the computational times between small and large problem instances impossible. Our heuristic algorithm provides solutions with a smaller maximum distance (by about 3%-7%) and a better Gini coefficient (by about 1%-4%) using only 3% of the DPs in the case of a very large lexicographic minimax problem, when compared to the classical method with a fixed aggregated problem containing 5%, 10% and 15% of the DPs. This approach allows us to find high-quality solutions of very large location problems that are not computable to optimality.
We have shown that the re-aggregation approach may bring significant improvements of the solution also for a very large lexicographic minimax problem. Both versions V1 and V2 of the algorithm provide solutions with a very good level of fairness. It is also important to underline that, to our knowledge, there is no previous study exploring the role of aggregation when solving the lexicographic minimax problem. We demonstrated that a better understanding of aggregation errors can, in this case, help to provide more appropriate solutions with a relatively large reduction coefficient α.
It is also important to note that this is an initial study of the proposed approach, and additional investigations using other types of location problems will be done in the near future.

Acknowledgements
This work was supported by the research grants VEGA 1/0339/13 Advanced microscopic modeling and complex data sources for designing spatially large public service systems and APVV-0760-11 Designing Fair Service Systems on Transportation Networks.
The time complexity of version V2 is higher than that of version V1, mainly due to phase 2, where the distance matrices are computed. Version V2 provides solutions equal to those of version V1 in terms of the value ΔMAX; however, the value of the Gini coefficient is significantly smaller.

Conclusions
When a location problem is too large to be solved by the available solving method, aggregation is commonly used to lower the size of the problem. Typically, solving methods do not re-adjust the input data: the aggregation is done at the beginning of the process and is kept separate from the solving method. In this paper, we proposed a method which adapts the granularity of the input data in each iteration of the solving process, so as to aggregate less in the areas where facilities are being located and in the neighbourhood of the most distant DPs. The proposed method is versatile and can be used for a wide range of location problems.
To test the proposed method we used large real-world problems derived from geographical areas that consist of many municipalities. It is worth noting that it is not very common to work with such large problems in location analysis. We found only two examples in the literature where problems with approximately 80 000 DPs were solved [18 and 19]. In contrast to our study, these references used only randomly generated benchmarks and, in addition, solved a simpler location problem than the lexicographic minimax.