Intersection monitoring from video using 3D reconstruction

Researchers Yuting Yang, Camillo Taylor and Daniel Lee have developed a system to turn surveillance cameras into traffic counters. Traffic information can be collected from existing inexpensive roadside cameras but extracting it often entails manual work or costly commercial software. Against this background the Delaware Valley Regional Planning Commission (DVRPC) was looking for an efficient and user-friendly solution to extract traffic information from videos captured from road intersections.
March 9, 2016
Table 1 Accuracy of the various counting methods.

We proposed a method that tracks and counts vehicles in the video, then uses the tracking information to compute a 3D model of the vehicles and to visualise the 2D road scene in 3D. The 3D model can in turn provide feedback to the tracking and counting model in future research.

The proposed method aims to solve two difficulties in tracking and counting from video. The first is that, unlike normal highway traffic, vehicles approaching an intersection can go straight ahead or turn in either direction, which makes their movement much less constrained and predictable. The second is the perspective deformation in the image, caused by the low position of the camera and by lens distortion, which makes it difficult to recognise true relative distances on the road and to reconstruct the 3D model.

To deal with the variety of movement, the system was trained to predict a vehicle’s movement over the next few frames based on its current position and velocity. This component, known as a classifier, categorises each vehicle according to whether it is predicted to go straight, turn left or turn right, on the assumption that vehicles use the dedicated lanes for turning or going straight. This classification makes it easier to separate adjacent vehicles with different driving directions, and can help improve the performance of tracking and counting.
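The article does not say which classifier was used. As a minimal sketch of the idea, the example below uses a nearest-centroid classifier (a stand-in technique, not necessarily the authors’ choice) on a feature vector of position and velocity. The training samples, feature layout and function names are all hypothetical.

```python
import math

# Hypothetical labelled samples: (x, y, vx, vy) in image units, paired with
# the direction the vehicle eventually took. Real training data would come
# from manually annotated tracks at the intersection.
TRAINING = [
    ((10.0, 50.0, 0.0, -5.0), "straight"),
    ((12.0, 48.0, 0.2, -4.8), "straight"),
    ((4.0, 50.0, -1.5, -3.0), "left"),
    ((5.0, 47.0, -2.0, -2.5), "left"),
    ((16.0, 50.0, 1.5, -3.0), "right"),
    ((17.0, 47.0, 2.0, -2.5), "right"),
]


def train_centroids(samples):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = train_centroids(TRAINING)
print(classify(centroids, (4.5, 49.0, -1.8, -2.8)))
```

Because the labelled samples depend on lane layout and camera viewpoint, a classifier of this kind would have to be retrained for each intersection.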

As no camera information was provided with the video, solving the perspective deformation problem required manually calibrating the camera. This meant calibrating the principal point position in the x and y directions and the radial distortion parameter, then adjusting these parameters until the parallel lines in the image intersected at a single point (the vanishing point). The parallel lines are drawn according to the road markings. With the camera calibrated and the intersection of the parallel lines known, projecting a nadir view (the equivalent of viewing the scene from directly above) became a simple geometric problem. The nadir view in Figure 1 is not a perfect real-world restoration because the height of the vehicles was not taken into account; its purpose is to give distances at the same scale across the image, which serve as a threshold for vehicle tracking. Since a vehicle’s height is low compared with the camera’s height, this approximation was good enough for the initial stage, leaving the vehicle’s height to be considered in the 3D reconstruction.
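The vanishing-point step can be illustrated with homogeneous coordinates: the line through two image points is their cross product, and the intersection of two lines is the cross product of the lines. The sketch below assumes radial distortion has already been removed, and the lane-marking coordinates are invented for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(line_a, line_b):
    """Image point where two homogeneous lines meet, or None if parallel."""
    x, y, w = cross(line_a, line_b)
    if abs(w) < 1e-12:
        return None  # the lines are parallel in the image itself
    return (x / w, y / w)


# Hypothetical endpoints of two lane markings that are parallel on the road.
left_marking = line_through((100.0, 400.0), (250.0, 100.0))
right_marking = line_through((500.0, 400.0), (350.0, 100.0))

vanishing_point = intersection(left_marking, right_marking)
print(vanishing_point)
```

With more than two markings, the pairwise intersections will not coincide exactly; how tightly they cluster gives a rough indication of how well the calibration parameters have been tuned.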

For tracking and counting, we established a feature-based tracking system that uses a Kanade-Lucas-Tomasi (KLT) tracker to track ‘feature points’ detected in the image. Each feature point’s relative distance was then rectified according to the nadir view, and the points were grouped into individual vehicles according to properties such as proximity, distance, colour, speed and direction. The 3D reconstruction is then added on top of this.
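The grouping step can be sketched as connected-component clustering. The example below uses only two of the properties mentioned (nadir-view distance and predicted direction) and a union-find structure; the threshold, the data and the function name are assumptions for illustration, not the authors’ implementation.

```python
import math


def group_points(points, max_distance):
    """Group feature points that are close together and share a direction.

    points: list of (x, y, direction) tuples in nadir-view coordinates.
    Returns a sorted list of groups, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            xi, yi, di = points[i]
            xj, yj, dj = points[j]
            if di == dj and math.hypot(xi - xj, yi - yj) <= max_distance:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical feature points: two adjacent vehicles and one distant vehicle.
points = [
    (0.0, 0.0, "straight"),
    (1.0, 0.5, "straight"),
    (1.5, 0.2, "left"),
    (2.0, 0.6, "left"),
    (9.0, 9.0, "straight"),
]
vehicles = group_points(points, max_distance=2.0)
print(len(vehicles), vehicles)
```

In this toy data the first four points are all within the distance threshold of a neighbour, so a proximity-only rule would merge them into one vehicle; the direction label is what separates them.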

In effect, the vehicle’s 3D position is approximated by the 3D positions of the feature points that represent it, calculated on the assumption that the vehicle is a rigid body. This means the distance between each pair of points on the vehicle does not vary, and the orientation of each pair follows the direction of the vehicle. The positions are obtained by solving an optimisation problem in which the variables are the 3D positions of the feature points and the objective is to minimise the reprojection error between the points’ observed positions in the image and their estimated positions in the image. In this system, the optimisation is solved directly using the MATLAB function lsqnonlin.
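One common way to write a reprojection objective of this kind is given below. The notation is an assumption rather than the authors’ own: X_i is the 3D position of feature point i in the vehicle’s frame (fixed over time because the vehicle is rigid), R_t and T_t are the vehicle’s rotation and translation in frame t, K is the calibrated camera matrix, π divides by depth to give image coordinates, and x_{i,t} is the tracked image position. The article names only the point positions as variables, so whether the motion terms are optimised or derived from the tracking is not specified here.

```latex
\min_{X_1,\dots,X_N} \; \sum_{t=1}^{F} \sum_{i=1}^{N}
  \left\lVert \, x_{i,t} - \pi\!\left( K \left( R_t X_i + T_t \right) \right) \right\rVert^{2},
\qquad
\pi\!\left( (u, v, w)^{\top} \right) = \left( \frac{u}{w}, \frac{v}{w} \right)^{\top}
```

A nonlinear least-squares solver such as lsqnonlin is given the stacked residuals (the differences inside the norm) and squares and sums them itself.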

One issue with the structure-from-motion method is that it can only solve for the point positions up to a scale factor: it cannot differentiate between a 1m cube 1m from the camera and a 2m cube 2m from the camera. To overcome this, we needed to assign a scale factor to the set of points. On the assumption that the vehicle is on the ground and not flying, this can be done by scaling the result until its lowest point reaches zero height (that is, the wheels are touching the ground).
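The scale ambiguity and the ground-contact fix can be shown with a small numerical sketch. It assumes an ideal pinhole camera and, for the second part, a known camera height above the ground; the focal length, camera height and point coordinates are made up for illustration.

```python
import math


def project(point, focal_length=1000.0):
    """Pinhole projection of a camera-frame point (X, Y, Z), Z being depth."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def scale_points(points, s):
    """Scale every point away from the camera centre by the factor s."""
    return [(s * x, s * y, s * z) for x, y, z in points]


# Three corners of a 1 m cube face, 1 m in front of the camera.
near = [(0.5, 0.5, 1.0), (-0.5, 0.5, 1.0), (0.5, -0.5, 1.0)]
# The same corners on a 2 m cube, 2 m away.
far = scale_points(near, 2.0)

# Both sets land on the same pixels, so the image alone cannot fix the scale.
ambiguous = all(
    math.isclose(a, b)
    for p, q in zip(near, far)
    for a, b in zip(project(p), project(q))
)


def ground_scale(offsets, camera_height):
    """Scale factor that puts the lowest point at height zero.

    offsets: (x, y, z) positions relative to the camera with z pointing up,
    so points below the camera have negative z.
    """
    lowest = min(z for _, _, z in offsets)
    if lowest >= 0:
        raise ValueError("expected at least one point below the camera")
    return -camera_height / lowest


# Unscaled reconstruction of two vehicle points, relative to the camera.
offsets = [(1.0, 4.0, -2.0), (1.2, 4.1, -1.5)]
s = ground_scale(offsets, camera_height=6.0)
heights = [6.0 + s * z for _, _, z in offsets]
print(ambiguous, s, heights)
```

After scaling, the lowest point sits on the ground plane and every other point has a positive height, which matches the assumption that the wheels touch the road.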

Since we calculate the 3D positions of only certain feature points, the vehicle is represented as a bounding box moving in the calculated direction.

This method has been tested on six different 10-second video clips covering different trajectories, and the cumulative accuracy in counting the vehicles is shown in Table 1. To isolate the contribution of each part of the method to the overall performance, some subsystems were removed and the results compared with those of the original method. Table 1 shows the results of the original method in the blue column; results in the red column are without camera calibration; and the green column shows the results where the grouping algorithm has been replaced with a strategy that simply grouped nearby points together.

In the original method, over-counting occurs mostly in extreme cases, such as a large truck with relatively few feature points, which the system sometimes mistakenly interprets as two or more vehicles. Missed vehicles are mostly due to occlusion (vehicles blocked by or overlapping others).

The results were not badly affected by removing the camera calibration, as this is mostly needed for 3D reconstruction. However, without calibration it can be difficult to determine a distance threshold for grouping nearby feature points, because the ratio between pixel distance and real-world distance varies across the image. As the grouping technique was able to differentiate nearby vehicles travelling in different directions (straight ahead/turn left/turn right), the distance threshold was set at a high level, since feature points at some distance from each other could belong to the same vehicle. Setting a high threshold reduces the number of over-counted vehicles but slightly increases the number of missed vehicles, because the large threshold causes the system to recognise several vehicles as one.

Removing the grouping technique had a far greater impact on the correct counting of vehicles. The increase in over-counted vehicles was caused by the lack of an algorithm to combine small groups of feature points. While the increase in missed vehicles was only small, the actual result is worse when viewed frame by frame. This is because, without the grouping algorithm, adjacent vehicles travelling in different directions could not be differentiated until they were physically separated, which is less efficient.

As this technique only requires video information from the cameras with the computing work carried out at the control centre, we believe this system could be added to the feed from other existing cameras with a minimum of manual setup. However, as the speed of the system is constrained by the 3D reconstruction process, the system can only do off-line ‘snapshots’. Manual camera calibration may not be needed when the user has the camera’s precise location and height.

The training process will be needed at each new intersection because, in order to predict which direction a vehicle will move, we first need to manually give the classifier some ground truth values. This enables it to ‘learn’ the correlation between a vehicle’s current speed and position and its likely trajectory.

Currently the 3D reconstruction is being used to show how the road situation would look in 3D, not to assist the counting system. It is planned to feed the 3D vehicle reconstruction information back into the counting system, where information such as whether a vehicle has a reasonable size and position will help indicate if the counting is reliable. With a known camera position, the 3D model can also be used to predict occlusion and to devise a way to detect masked vehicles.

Having shown that the proposed method can, after initial training, count vehicles and compute their 3D positions using video from cameras installed at road intersections, we plan to feed the 3D model back into the counting system to increase accuracy.

ABOUT THE AUTHORS:
Yuting Yang is a graduate of the University of Pennsylvania; Camillo Taylor and Daniel Lee are professors at the university.
