A New Traffic Congestion Prediction Model for Advanced Traveler Information and Management Systems

The increase in wasted time and pollution due to vehicular traffic has paved the way for many different countermeasures, ranging from the enforcement of congestion tolls to the commercialization of vehicles powered by low-emission hybrid engines. Advanced Traveler Information Systems (ATISs), which are capable of supplying updated traffic information to all citizens driving through city roads, represent a prominent approach to combating vehicular congestion. In brief, ATISs are concerned with collecting, processing and disseminating traffic information, providing data that an on-board navigation system can exploit to compute the most convenient route to a given destination. Their role becomes progressively more relevant as their accuracy and reliability increase, thus encouraging more and more people to use them while driving. With this in mind, we devised a new congestion detection model, which accurately estimates and forecasts the short-term congestion state of a road without requiring any prior knowledge of its parameters. Such a model can be easily integrated within an ATIS and usefully applied to any given road. The efficacy of our model is demonstrated by the results of several experiments, which support the validity of our approach.


INTRODUCTION
Vehicular congestion in urban areas has grown steadily over the years to become one of the primary problems tackled by city administrators. In the United States, for example, statistics collected from the fifteen most populated cities show that the amount of time wasted by daily commuters has more than doubled in the past thirty years, reaching more than fifty hours per year, on average [1]. In developing countries, such as China, this trend is even more dramatic, as the rate at which motor vehicle ownership increases is orders of magnitude higher than the rate at which new roads are being constructed [2]. The constant increase of vehicular congestion in many cities throughout the world has led to the development of a plethora of approaches that aim at stopping and, possibly, reversing such [...] example, aid city traffic managers in taking operational decisions concerning the diversion of traffic flows, when congestion levels in given areas exceed certain pre-defined thresholds. An efficient ATIS infrastructure could also offer a valuable tool to detect, or even foresee, accidents: for example, by raising an alert when traffic builds up very rapidly, thus indicating a non-recurrent event (e.g., an accident), or when traffic flows stop following known patterns (e.g., vehicles entering a one-way street in the wrong direction), thus signaling that a dangerous event might take place.
Clearly, the successful deployment of the aforementioned systems heavily depends on the reliability with which congestion estimates and forecasts are performed. Wrong estimates and forecasts could, in fact, divert traffic flows towards congested areas, thus prolonging the time required to reach a given destination.
While the vehicular congestion phenomenon has been thoroughly studied for highway scenarios, where identifying traffic conditions is typically simple because a limited number of factors (e.g., incoming traffic flows, obstacles, etc.) contribute to the traffic situation found at any given time (possible causes of congestion are marked with flames in the leftmost part of Figure 2), the same cannot usually be said for urban scenarios [12,13]. In urban areas, in fact, traffic jams can have several underlying causes (rightmost part of Figure 2); they can hence appear and disappear more quickly and, consequently, be more difficult to detect and predict. Hence, the first step in engineering an ATIS that is effective in urban areas is to devise an operative definition that clearly indicates when vehicular congestion appears on any given road segment. Many different definitions have been presented in the literature but, to the best of our knowledge, none provides results that are not open to interpretation while, at the same time, being independent of specific road parameters. For this reason, we propose a new traffic congestion definition, which draws its inspiration from a relevant line of research that has studied how the capacity and the bandwidth occupied by an Internet connection can be efficiently determined by observing the end-to-end delay of IP packets [14,15].
Our definition starts from the observation that any two cars that traverse the same street will probably experience the same traffic conditions, provided that they traverse it not too far apart in time. This is due to the inertia of vehicular queues, which causes a road to remain congested for a given period, say S, even if at a certain point in time the ingress flow to that road rapidly drops. This observation allows us to define congestion as follows: congestion is a state of a road that lasts for at least S units of time, during which travel times exceed the time $T^*$ normally incurred under light or free-flow travel conditions.
Our aim in this paper is to introduce a new congestion detection and short-term forecasting algorithm which, being based on the congestion definition given above, can be readily put to good use within ATISs.
The strength of our algorithm is that it can provide its results without requiring any prior knowledge regarding a road.
In fact, our algorithm is able to return short-term congestion forecasts by simply processing the information collected by the PND units of vehicles. In order to verify the validity of our approach, we devised a wide set of experiments that led us to drive for over 450 miles on urban sections of Los Angeles (CA) and Pisa (Italy) roads.
This paper is organized as follows. In Section 2 we survey the schemes that fall closest to ours in distinguishing congested from non-congested roads. In Section 3 we briefly describe the model underlying our algorithm, and in Section 4 we verify its robustness. In Section 5 we demonstrate how it can be beneficially employed within an ATIS. The results of our experimental assessment are provided in Section 6. Section 7, finally, concludes our paper.

RELATED WORK
Although a wealth of work exists in the area of congestion detection and forecasting algorithms, for the sake of brevity we only survey here three different approaches that can most easily be integrated with an ATIS [13,14,15,16,17,18,19,20,21,22].
The first one, known as Surface Street Traffic Estimation, was introduced to recognize congestion situations on streets that are controlled by traffic lights [20]. Succinctly, according to this approach, vehicles are considered to undergo a congestion event if they fall into one of two situations that slow their flow: moving in a stop-and-go pattern, or standing in a queue for a full red light cycle or more. The shortcoming of this approach is that it lacks any short-term forecasting power and, to the best of our knowledge, it has only been defined and tested on streets that end at signalized intersections, thus leaving a wide class of urban streets out of its scope.
The second approach we tested is drawn from the Highway Capacity Manual (HCM) delay formula for signalized intersections [21]. This formula computes the average traversal time $T_{HCM}$ a vehicle experiences on a given street, which varies with the street length, speed limit and capacity (a function of its number of lanes and of the length of its traffic light phases), and with the average number of cars that enter it during a given time span. When computed with the ingress traffic volume set equal to the road's capacity, $T_{HCM}$ returns useful information to recognize whether a given section of road is congested. Although very interesting, this approach is limited by its dependence on prior information, such as given road parameters, which jeopardizes its ability to adapt to new situations.
The third proposal is the service provided by Google Traffic, a widely used service that provides drivers with traffic information before they begin their journey (through the web) and while driving (through navigation units installed on their cellphones) [22]. Although this system is gaining an increasing role in the management of urban platoons of vehicles, little information has been published so far on how it really works and on how it actually detects a congested state on a given road. Moreover, it generally lacks any information concerning the short-term trend of the traffic situation on a given road, thus omitting one of the pieces of the puzzle that may most influence the routing decisions taken by drivers. From the point of view of a user, a color is assigned to each road, depending on the average speed that has been measured on it and on its speed limit. For highways, for example, green represents an average speed that exceeds 80 km/h, yellow an average speed that lies between 40 and 80 km/h, and red an average speed below 40 km/h.
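The highway color coding just described can be written as a small lookup. The sketch below is only an illustration of the thresholds quoted in the text; the function name `highway_color` is hypothetical, and treating exactly 40 km/h and exactly 80 km/h as yellow is an assumption, since the text does not say how boundary values are handled.

```python
def highway_color(avg_speed_kmh: float) -> str:
    """Map an average highway speed (km/h) to the color described in the text.

    Assumption: the boundary values 40 and 80 km/h are treated as yellow.
    """
    if avg_speed_kmh > 80:
        return "green"   # average speed exceeds 80 km/h
    if avg_speed_kmh >= 40:
        return "yellow"  # average speed between 40 and 80 km/h
    return "red"         # average speed below 40 km/h


if __name__ == "__main__":
    for speed in (95, 60, 25):
        print(speed, highway_color(speed))
```

Note that such a mapping only describes the present state of a road; it says nothing about how long that state is expected to last.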
For the sake of completeness, we list the main characteristics (pitfalls) of the schemes discussed in this Section in Table I.
Unlike any of the proposals described here, we devised an algorithm that can simultaneously detect and forecast, in the short term, congestion on any given road, without needing any prior information regarding any of its parameters.

DETECTING AND FORECASTING TRAFFIC CONGESTION: A NEW APPROACH
Based on the traffic congestion definition anticipated in Section 1, we now provide a method to compute the congestion threshold $T^*$, as well as the time S for which congested or non-congested conditions last on any given road R. The scheme that computes these values is as follows.
We consider a road R as congested if it is possible to find a value of $T^*$ for which, when a vehicle spends at least time $T^*$ on it, the majority of subsequent cars (e.g., 80%) that later enter R (say within a time span S) still spend at least $T^*$ units of time to leave it. If, instead, only a low percentage of the cars that later enter that road experience a traversal time above $T^*$ (e.g., 20%), this entails that R is leaving a state of congestion.
In a similar manner, R is non-congested when, if a car takes less than $T^*$ units of time to leave it, the majority of vehicles that later enter it (e.g., again 80%) still require less than that time. Obviously, if only a low ratio of the vehicles that later enter that road experience a traversal time below $T^*$ (e.g., 20%), this means that R is transitioning into a congested state. The 80% value is drawn from the literature, but it may depend on the specific road under consideration [24].
We now translate the above considerations into a more formal modeling setting, which will then allow us to derive our congestion detection and short-term forecasting algorithm.
To this aim, we now introduce four sets of pairs of vehicles: a) $HC(T^*_1)$, the pairs of vehicles that suffer from high congestion; b) $N1(T^*_1)$, the pairs of vehicles that are leaving a congested situation; c) $NC(T^*_2)$, the pairs of vehicles that do not suffer from congestion; and, finally, d) $N2(T^*_2)$, the pairs of vehicles that are entering a congestion state. Specifically:

Definition A (High Congestion Set). Consider a group P of vehicles entering a street R, with the first vehicle of the group entering R at a given time and the last one entering R S units of time later than the first one. $HC(T^*_1)$ is defined as the set of all the pairs of vehicles, (i, j), in P for which both their traversal times, say $T^*_i$ and $T^*_j$, exceed the congestion threshold $T^*_1$. We also define as $N1(T^*_1)$ the set of all the pairs of vehicles, say (h, k), in P for which the traversal time $T^*_h$ of only the first vehicle h exceeds $T^*_1$.

Definition B (Low Congestion Set). Take the same group of cars P entering R within a time span of the same length as before. $NC(T^*_2)$ is defined as the set of all the pairs of cars, say (i, j), in P for which both their traversal times, say $T^*_i$ and $T^*_j$, are below the congestion threshold $T^*_2$. We also define as $N2(T^*_2)$ the set of all the pairs of cars, say (h, k), in P for which the traversal time $T^*_h$ of only the first vehicle h is below $T^*_2$.

We now need to measure the size of the four aforementioned sets. To this aim, we provide an indicator function for each of the four sets of interest, as specified in Tables II, III, IV and V. Each of these functions can be used to check whether a given pair of vehicles i and j is in one of the four congestion states (high congestion, no congestion, leaving, entering). For example, if two given vehicles i and j both experience a delay exceeding the congestion threshold $T^*_1$, the indicator function $I_{HC}(i, j)$ will be equal to 1. It is now possible to compute the ratio between the number of pairs of vehicles that traversed a given road both taking more than $T^*_1$ and the total number of pairs of vehicles that traversed that road with the first vehicle experiencing a traversal time higher than $T^*_1$. This value, when high, indicates a state of stable congestion for a given road R (left term of the top equation of Table VI).
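To make the four sets concrete, the following minimal sketch counts them for a list of probe samples. It is an illustration under stated assumptions, not the indicator functions of Tables II to V: the names `classify_pairs` and `congestion_ratio` are hypothetical, a single threshold is used in place of $T^*_1$ and $T^*_2$, a sample is an `(entry_time, traversal_time)` tuple in seconds, and a traversal time exactly equal to the threshold is treated as not exceeding it.

```python
from itertools import combinations


def classify_pairs(samples, threshold, span):
    """Count the pairs falling into HC, N1, NC and N2.

    samples: list of (entry_time, traversal_time) tuples, in seconds.
    A pair (i, j) is considered only if j enters the road after i and
    no more than `span` seconds later.
    """
    counts = {"HC": 0, "N1": 0, "NC": 0, "N2": 0}
    ordered = sorted(samples)  # sort by entry time
    for (t_i, d_i), (t_j, d_j) in combinations(ordered, 2):
        if t_j - t_i > span:
            continue  # the second vehicle entered too late
        first_high = d_i > threshold
        second_high = d_j > threshold
        if first_high and second_high:
            counts["HC"] += 1  # both above the threshold: high congestion
        elif first_high:
            counts["N1"] += 1  # only the first above: leaving congestion
        elif second_high:
            counts["N2"] += 1  # only the first below: entering congestion
        else:
            counts["NC"] += 1  # both below the threshold: no congestion
    return counts


def congestion_ratio(counts):
    """Share of pairs that stay congested among those whose first vehicle was congested."""
    total = counts["HC"] + counts["N1"]
    return counts["HC"] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical samples: one fast traversal among slow ones.
    samples = [(0, 70), (60, 75), (120, 80), (180, 40), (240, 72)]
    counts = classify_pairs(samples, threshold=50, span=300)
    print(counts, round(congestion_ratio(counts), 3))
```

With these hypothetical samples the ratio is 6/9, below the 80% level discussed in the text, so the road would not be labeled as stably congested for this threshold and span.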

Entering Congestion State (Table V): $I_{N2(T^*_2)} : (P \times P) \to \{0, 1\}$.

Similarly, the left term of the equation at the bottom of Table VI gives the ratio between the number of pairs of vehicles that traversed a given road both taking less than $T^*_2$ units of time and the total number of pairs where the first vehicle took less than $T^*_2$. This value, when high, indicates a state of no congestion at all. With the tools provided so far, we can verify whether a given road R is congested (or not), for a given period S, simply by finding the values of $T^*_1$ and $T^*_2$ for which the formulas of Table VI are satisfied. Road R is congested if:

$$\frac{\sum_{(i,j) \in P \times P} I_{HC(T^*_1)}(i,j)}{\sum_{(i,j) \in P \times P} I_{HC(T^*_1)}(i,j) + \sum_{(i,j) \in P \times P} I_{N1(T^*_1)}(i,j)} \times 100\% \geq 80\%.$$
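By symmetry with the congestion condition, the prose description of the bottom equation of Table VI suggests the following form for the non-congested check. This is a reconstruction from the text rather than a copy of the table, and the 80% bound is assumed to mirror the congested case.

```latex
\frac{\sum_{(i,j) \in P \times P} I_{NC(T^*_2)}(i,j)}
     {\sum_{(i,j) \in P \times P} I_{NC(T^*_2)}(i,j)
      + \sum_{(i,j) \in P \times P} I_{N2(T^*_2)}(i,j)}
\times 100\% \geq 80\%
```

Here the numerator counts the pairs in which both vehicles stay below $T^*_2$, and the denominator counts all pairs whose first vehicle stays below $T^*_2$.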
To find such thresholds, we devised an algorithm based on the idea of finding the values of $T^*_1$ and $T^*_2$ that maximize the size of the $HC(T^*_1)$ and $NC(T^*_2)$ sets (congested and non-congested states, respectively) and minimize the size of the $N1(T^*_1)$ and $N2(T^*_2)$ sets (noisy states), while avoiding the situations where this clustering activity provides spurious results due to mathematical inconsistencies. This can be done by solving the optimization problem described in Table VII.
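The threshold search can be sketched as a brute-force scan over candidate values. The objective of Table VII is not reproduced in this text, so the sketch below substitutes a stand-in: the product of the two ratios of Table VI, set to zero when either the congested or the non-congested side is empty (a crude guard against degenerate thresholds). The names `pair_counts`, `score` and `find_threshold`, the single shared threshold, and the choice of midpoints between observed traversal times as candidates are all assumptions made for illustration.

```python
from itertools import combinations


def pair_counts(samples, threshold, span):
    """Count HC, N1, NC and N2 pairs for (entry_time, traversal_time) samples."""
    counts = {"HC": 0, "N1": 0, "NC": 0, "N2": 0}
    for (t_i, d_i), (t_j, d_j) in combinations(sorted(samples), 2):
        if t_j - t_i > span:
            continue
        if d_i > threshold:
            counts["HC" if d_j > threshold else "N1"] += 1
        else:
            counts["N2" if d_j > threshold else "NC"] += 1
    return counts


def score(counts):
    """Stand-in objective (assumption): product of the two Table VI ratios."""
    high = counts["HC"] + counts["N1"]
    low = counts["NC"] + counts["N2"]
    if high == 0 or low == 0:
        return 0.0  # degenerate split: every first vehicle on the same side
    return (counts["HC"] / high) * (counts["NC"] / low)


def find_threshold(samples, span):
    """Return the candidate threshold with the highest score, or None."""
    times = sorted({d for _, d in samples})
    candidates = [(a + b) / 2 for a, b in zip(times, times[1:])]
    return max(candidates,
               key=lambda t: score(pair_counts(samples, t, span)),
               default=None)


if __name__ == "__main__":
    # Hypothetical data: six fast traversals followed by six slow ones,
    # one vehicle entering every 60 seconds.
    durations = [30, 32, 31, 29, 33, 28, 70, 72, 69, 71, 73, 68]
    samples = [(60 * k, d) for k, d in enumerate(durations)]
    print(find_threshold(samples, span=60))
```

On this hypothetical data the scan settles on 50.5 s, the midpoint of the gap between the fast and the slow traversals, because any threshold inside either cluster produces more noisy pairs.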

ASSESSING THE METHODOLOGY OF THE STUDY
To assess the validity of our methodology, we fed the equation of Table VII one hundred pairs of vehicles at a time, observing the congestion thresholds it returned as the balance between congested and non-congested pairs of vehicles changed.
In essence, we computed ($T^*_1$, $T^*_2$) for a given road when the numbers of pairs of vehicles that fell into its congested and non-congested sets were as follows (NC identifies the set of pairs of traversal times both below the threshold of 50 [...] VII, as a function of the previously defined combinations. Observing $T^*$, we verify that it is equal to 50 when the average value of the Objective Function reached its maximum and there was an equal balance between the number of pairs of vehicles that fell in the two sets (50-50). Remarkably, even when the number of pairs of vehicles that fell into the two sets became rather unbalanced (90 congested vs. 10 non-congested, and vice versa), the equation of Table VII found a $T^*$ value that was near the correct one.
We also ran two further experiments where, keeping fixed the number of pairs of traversal times corresponding to congested and non-congested vehicles (50-50), in the first we gradually increased the number of pairs of traversal times corresponding to vehicles that fell in the N2 set, while in the second we gradually increased the number of pairs of vehicles that fell in both of the noisy sets (N1 and N2). In the first experiment, we expected that our algorithm would compute a threshold of $T^*$ = 50 seconds until the amount of introduced noise became too large. This is what occurred, as reported in Figure 4, where, again, we plotted the average $T^*$ values (seconds) along with the average of the Objective Function. In the second experiment, instead, we found that, although the value of the Objective Function steeply decreased as the number of noisy samples increased (Figure 5), the balance between the noisy samples prevented $T^*$ from departing from the value of 50 seconds.
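A toy version of this robustness check can be run on hand-made pairs. The sketch below is not the authors' experiment: the pair values (30, 40, 60 and 70 seconds around a nominal threshold of 50), the helper names `make_pairs`, `pair_score` and `best_threshold`, and the stand-in objective (product of the two Table VI ratios, zero when one side is empty) are all assumptions chosen for illustration.

```python
def pair_score(pairs, threshold):
    """Stand-in objective (assumption) evaluated on explicit (first, second) pairs."""
    hc = n1 = nc = n2 = 0
    for first, second in pairs:
        if first > threshold:
            if second > threshold:
                hc += 1  # both congested
            else:
                n1 += 1  # leaving congestion
        elif second > threshold:
            n2 += 1      # entering congestion
        else:
            nc += 1      # both non-congested
    if hc + n1 == 0 or nc + n2 == 0:
        return 0.0
    return (hc / (hc + n1)) * (nc / (nc + n2))


def best_threshold(pairs):
    """Scan midpoints between observed traversal times and keep the best one."""
    values = sorted({t for pair in pairs for t in pair})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: pair_score(pairs, t))


def make_pairs(n_congested, n_free, n_entering=0):
    """Build hypothetical pairs around a nominal threshold of 50 seconds."""
    congested = [(60, 70), (70, 60)] * (n_congested // 2)
    free = [(30, 40), (40, 30)] * (n_free // 2)
    entering = [(30, 70)] * n_entering  # noisy pairs of the N2 kind
    return congested + free + entering


if __name__ == "__main__":
    for mix in ((50, 50), (90, 10), (10, 90)):
        print(mix, best_threshold(make_pairs(*mix)))
    noisy = make_pairs(50, 50, n_entering=20)
    print(best_threshold(noisy), round(pair_score(noisy, 50), 3))
```

In this toy setting the recovered threshold stays at 50 for balanced and unbalanced mixes, while adding entering-congestion pairs lowers the objective value without moving the threshold. The toy data are too clean to reproduce the breakdown under heavy noise reported in Figure 4.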

DEPLOYMENT WITHIN AN ATIS
The model above can be put to good use within an ATIS implementation by adopting the following procedure.
Initially, a central entity collects traversal time data concerning a given road R, as returned by a set of vehicles that act as traffic probes. On receiving traversal time samples from vehicles, this entity keeps adding them to an internal data structure (line 2, Table VIII) until a sufficient number of samples has been collected and that road has been observed for a sufficient length of time, for example for half a day (lines 3 and 4). At the end of this initial process, the entity stops collecting information concerning the street (line 5) and builds its picture of the congestion states characterizing that street using the CTDF() function (line 6), CTDF() being the implementation of the formula of Table VII. If at any later time, be it one hour or one month, a vehicle traverses that road exceeding the computed congestion threshold $T^*_1$, the entity can exploit this information to, for example, send a message alerting all those vehicles that are approaching its area (line 11).

We now describe how the CTDF() function works (Table IX). As already said, it implements the mechanism described in Table VII. Namely, it searches for the values $T^*_1$ and $T^*_2$ that simultaneously maximize the size of the high congestion set $HC(T^*_1)$ and the size of the no congestion set $NC(T^*_2)$, while minimizing the size of the remaining two sets $N1(T^*_1)$ and $N2(T^*_2)$ (lines 2 and 5). After computing the size of the two sets $HC(T^*_1)$ and $NC(T^*_2)$, a check is performed to evaluate whether they satisfy the conditions given in Table VI (i.e., the 80% check). If so, the function ends successfully, returning the values of S, $T^*_1$ and $T^*_2$.

Unfortunately, the checks could fail because too large a duration S was chosen for the state of congestion of interest. This would mean that the following holds for many pairs of subsequent cars that traverse a given road: the congested (or non-congested) state that a first vehicle incurs does not last in time, as a second vehicle no longer finds the same state. However, this problem could solely concern the value of the period S that we have chosen. In fact, a smaller value of S could exist, in principle, for which both of the subsequent cars that compose a vehicle pair incur the same state of congestion. The idea is hence to look for such a value, by gradually reducing S until a situation is captured where both vehicles of the pair experience a similar state of congestion (or no congestion). This motivates the iterative structure of the CTDF() function.

As a final note, our experiments show that the difference between $T^*_1$ and $T^*_2$ is always confined within 3%. This is reasonable and largely expected, and justifies the fact that from now on we will use a single congestion threshold value $T^*$, obtained as $T^* = T^*_1 \approx T^*_2$.
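The iterative structure just described can be sketched as a loop that shrinks S until the 80% checks pass. Tables VIII and IX are not reproduced in this text, so this is only an outline under stated assumptions: the function name `ctdf`, the halving step, the minimum span, the single shared threshold and the stand-in objective (product of the two Table VI ratios) are all hypothetical choices.

```python
from itertools import combinations


def pair_counts(samples, threshold, span):
    """Count HC, N1, NC and N2 pairs for (entry_time, traversal_time) samples."""
    counts = {"HC": 0, "N1": 0, "NC": 0, "N2": 0}
    for (t_i, d_i), (t_j, d_j) in combinations(sorted(samples), 2):
        if t_j - t_i > span:
            continue
        if d_i > threshold:
            counts["HC" if d_j > threshold else "N1"] += 1
        else:
            counts["N2" if d_j > threshold else "NC"] += 1
    return counts


def ratios(counts):
    """Return the congested and non-congested ratios of Table VI (0 if undefined)."""
    high = counts["HC"] + counts["N1"]
    low = counts["NC"] + counts["N2"]
    return (counts["HC"] / high if high else 0.0,
            counts["NC"] / low if low else 0.0)


def find_threshold(samples, span):
    """Pick the candidate threshold maximizing the product of the two ratios."""
    times = sorted({d for _, d in samples})
    candidates = [(a + b) / 2 for a, b in zip(times, times[1:])]

    def objective(t):
        r_high, r_low = ratios(pair_counts(samples, t, span))
        return r_high * r_low

    return max(candidates, key=objective, default=None)


def ctdf(samples, initial_span, min_span=60.0, shrink=0.5, required=0.8):
    """Shrink S until the best threshold passes both 80% checks.

    Returns (span, threshold), or None if no span down to min_span works.
    The halving schedule is an assumption; the text only says that S is
    gradually reduced.
    """
    span = float(initial_span)
    while span >= min_span:
        threshold = find_threshold(samples, span)
        if threshold is not None:
            r_high, r_low = ratios(pair_counts(samples, threshold, span))
            if r_high >= required and r_low >= required:
                return span, threshold
        span *= shrink
    return None


if __name__ == "__main__":
    # Hypothetical probe data: one vehicle every 60 seconds.
    durations = [30, 32, 31, 29, 33, 28, 70, 72, 69, 71, 73, 68]
    samples = [(60 * k, d) for k, d in enumerate(durations)]
    print(ctdf(samples, initial_span=660))
```

On this hypothetical data the checks fail for spans of 660, 330 and 165 seconds, because too many pairs straddle the transition from free flow to congestion, and succeed once the span has shrunk to 82.5 seconds. An ATIS could then compare each new traversal time against the returned threshold and alert approaching vehicles when it is exceeded.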

TEST-BED RESULTS
We carried out a set of nine different experiments in 2008 and 2009, with a fleet of cars driving through traffic, to verify the effectiveness of our mobility congestion detection and forecasting algorithm. Eight of these experiments were run in Los Angeles, CA, and one in Pisa, Italy. All the main information concerning these roads is listed in Table X (name, section, length, free flow traversal time, full and green traffic light cycles and number of traversals performed) [23,24]. Each vehicle carried an onboard system consisting of a laptop, a GPS receiver and an EVDO interface. Upon each traversal of a given road section R, a car sent its traversal time to an ATIS which, in turn, computed an estimate of $T^*$ when a sufficient amount of data was available. Our results are briefly described in the following subsection. [...] seen in Figure 6), thus proving that cars almost always enjoy a smooth drive, due to the existence of a green wave. Second, the very small values of N (Table XI) confirm that no stable congestion was visible over those streets. Finally, street #9 requires a separate discussion. A high value of N and a small value of H seem to reveal a stable high congestion state (Table XI). Despite this fact, T is greater than $T^*$ (as seen in Figure 6). This paradox can be explained by observing that, as the traffic light at the junction with Sepulveda Blvd. permits turning right on red, our cars only very rarely stood waiting for a full red light time. To prove that this was the correct explanation, we performed a few more laps with cars going straight at that intersection. As expected, during such laps the value of $T^*$ always exceeded that of T.
A direct comparison of $T^*$ and $T_{HCM}$ shows that the two methodologies, interestingly, return converging results on most roads. In fact, roads 5, 6, 8 and 9 give almost perfect matches (as seen in Figure 6). This can be easily explained by observing that those are the roads for which the most accurate information regarding relevant road parameters, such as peak road capacities, was available from the Los Angeles Department of Transportation [25]. When, instead, this type of information was not available and default values taken from the Highway Capacity Manual were consequently used, we observe that the two thresholds can differ substantially (roads 1, 2 and 4).
As a final, additional experiment, Figure 7 reports the state of road 3, as provided by Google Traffic, during our field experiments. In particular, the three different colors (white, gray and black) correspond to the three different traffic states (non-congested, mildly congested and congested) that Google Traffic reported at that time. Note also that the average speed below which that road was considered congested by Google Traffic was 22 km/h.
Interestingly, comparing the results provided by Google Traffic to ours, we observed a rather good match, both in terms of congestion threshold speeds (16 km/h) and in terms of the expected duration of congestion events. In fact, we observe in Figure 7 that periods of congestion and of no congestion alternate, lasting approximately fifteen to eighteen minutes each, while our results revealed that congested and non-congested states last for at least sixteen minutes (S = 987 seconds).
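The two comparisons above rest on simple unit conversions, sketched below. The conversion of S = 987 seconds into minutes uses the value reported in the text; the road length of 1000 m and the traversal time of 225 s in the speed example are hypothetical values, chosen only to show how a time threshold maps onto a threshold speed, and are not taken from Table X.

```python
def seconds_to_minutes(seconds: float) -> float:
    """Convert a duration in seconds to minutes."""
    return seconds / 60


def threshold_speed_kmh(length_m: float, t_star_s: float) -> float:
    """Average speed (km/h) of a vehicle that covers length_m metres in t_star_s seconds."""
    return length_m / t_star_s * 3.6


if __name__ == "__main__":
    # S = 987 s, as reported in the text.
    print(round(seconds_to_minutes(987), 2), "minutes")
    # Hypothetical road: 1000 m traversed in exactly the threshold time of 225 s.
    print(round(threshold_speed_kmh(1000, 225), 1), "km/h")
```

The first conversion gives about 16.45 minutes, which falls inside the fifteen-to-eighteen-minute alternation observed in Figure 7.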
In conclusion, our algorithm met our expectations during our experiments, detecting when congestion occurred and estimating its minimum persistence in time. As such, and thanks to its simplicity, we believe it is an ideal candidate for integration into modern ATISs.

CONCLUSION
We have presented an intuitive, general-purpose traffic congestion detection and short-term forecasting scheme, which has been validated on a real test-bed by driving for over 450 miles along urban streets. The central part of our contribution lies in the proposal of a novel definition of congestion, where a street is defined as congested only in those cases where it has a high chance of remaining in that state for a given amount of time in the near future. This has allowed us to turn our definition into an operative algorithm that can be easily integrated into modern ATISs, while proving effective in providing relevant results of practical interest.

Figure 3. Congestion Threshold and Objective Function, as a function of the number of pairs of vehicles that fall into the congested and uncongested sets, respectively.

Figure 4. Congestion Threshold and Objective Function, as a function of the number of pairs of vehicles that fall into the entering congestion set.

Figure 5. Congestion Threshold and Objective Function, as a function of the number of pairs of vehicles that fall into the entering congestion and leaving congestion sets, respectively.

Figure 7. Google Traffic results for the road section of experiment #3.

Table I. Congestion detection tools: characteristics.

Table II. Congestion state indicator functions: High Congestion.

Table III. Congestion state indicator functions: Leaving Congestion.

Table IV. Congestion state indicator functions: No Congestion.

Table V. Congestion state indicator functions: Entering Congestion.

Table VI. Congestion vs. No Congestion.

Table VII. Congestion detection formula.

Table VIII. ATIS Algorithm. Input: traversal time T of a vehicle that traverses a given road R.

Table IX. Congestion Threshold Detection Function.

Table X. Experiment information: location, road section, road length, free flow traversal time, traffic light cycle time and green time, and number of traversals performed.

Table XI. Results: congestion threshold $T^*$ [s], S [s], N and H.