
Intersection monitoring from video using 3D reconstruction

Researchers Yuting Yang, Camillo Taylor and Daniel Lee have developed a system to turn surveillance cameras into traffic counters. Traffic information can be collected from existing inexpensive roadside cameras but extracting it often entails manual work or costly commercial software. Against this background the Delaware Valley Regional Planning Commission (DVRPC) was looking for an efficient and user-friendly solution to extract traffic information from videos captured from road intersections.
March 9, 2016
Table 1 Accuracy of the various counting methods.

We proposed a method that tracks and counts vehicles in the video, then uses the tracking information to compute a 3D model of each vehicle and visualise the 2D road scene in 3D. The 3D model can in turn provide feedback to the tracking and counting system in future research.

The proposed method aims to solve two difficulties in tracking and counting from video. Firstly, unlike vehicles in normal highway traffic, vehicles approaching an intersection can go straight ahead or turn in either direction, which makes their motion far less constrained and predictable. Secondly, perspective deformation in the image, caused by the camera's low mounting position and lens distortion, makes it difficult to recognise true relative distances on the road and to reconstruct the 3D model.

To deal with this variety of movement, the system was trained to predict each vehicle's movement over the next few frames from its current position and velocity. This classifier categorises each vehicle according to whether it is predicted to go straight, turn left or turn right, based on the assumption that vehicles use the dedicated lanes for turning or going straight. The classification makes it easier to separate adjacent vehicles heading in different directions, which improves the performance of both tracking and counting.
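The article does not specify the form of the classifier, so the sketch below uses a simple nearest-centroid rule on hypothetical (position, velocity) features with manually labelled directions, just to illustrate the idea of predicting a vehicle's turning intention from its current state:

```python
import numpy as np

# Hypothetical training data: each row is (x, y, vx, vy) for a tracked
# vehicle, with a manually labelled direction (0 = straight, 1 = left,
# 2 = right).  The real system's features and classifier are not
# detailed in the article; nearest-centroid is one minimal choice.
X_train = np.array([
    [10.0, 5.0,  0.0, 2.0],   # straight lane, moving ahead
    [11.0, 6.0,  0.1, 2.1],
    [ 4.0, 5.0, -1.5, 1.0],   # left-turn lane, drifting left
    [ 3.5, 6.0, -1.6, 1.1],
    [16.0, 5.0,  1.5, 1.0],   # right-turn lane, drifting right
    [16.5, 6.0,  1.6, 1.1],
])
y_train = np.array([0, 0, 1, 1, 2, 2])

def fit_centroids(X, y):
    """Mean feature vector per direction class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, feature):
    """Assign the class whose centroid is nearest in feature space."""
    return min(centroids, key=lambda c: np.linalg.norm(feature - centroids[c]))

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, np.array([10.5, 5.5, 0.05, 2.0])))  # → 0 (straight)
```

Because each turning movement has its own dedicated lane, position alone already carries much of the signal; velocity direction disambiguates vehicles near lane boundaries.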

As no camera information was provided with the video, solving the perspective deformation problem required manually calibrating the camera. The principal point position in the x and y directions and the radial distortion parameter must be calibrated, then adjusted until parallel lines in the image intersect at the same point (the vanishing point). The parallel lines are drawn according to the road markings. Once the camera was calibrated and the intersection of the parallel lines found, projecting a nadir view – the equivalent of viewing the scene from directly above – became a simple geometric problem. The nadir view in Figure 1 is not a perfect restoration of the real world, as the height of the vehicles is not taken into account; its purpose is to provide a same-scale distance across the image as a threshold for vehicle tracking. Since a vehicle's height is low compared with the camera's height, this approximation was good enough for the initial stage, leaving vehicle height to be considered in the 3D reconstruction.
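One common way to realise such a ground-plane rectification (not necessarily the authors' exact construction, which works from the vanishing point) is a planar homography estimated from four road-marking points whose real-world layout is known. The coordinates below are assumed for illustration:

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: homography mapping src -> dst (4+ point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A (last right-singular vector).
    _, _, Vt = np.linalg.svd(np.array(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def warp(H, pt):
    """Apply the homography to a single image point."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# Hypothetical correspondences: image positions of lane-marking corners
# whose ground positions (in metres) are known from the marking layout.
image_pts  = [(120, 400), (520, 400), (200, 250), (440, 250)]
ground_pts = [(0, 0), (7, 0), (0, 20), (7, 20)]

H = homography_dlt(image_pts, ground_pts)
```

Mapping every tracked point through `H` yields the same-scale nadir coordinates the article uses for distance thresholds.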

For tracking and counting, we established a feature-based tracking system that uses a Kanade-Lucas-Tomasi tracker to follow 'feature points' detected in the image. Each feature point's relative distance was then rectified according to the nadir view, and the points were grouped into individual vehicles according to properties such as proximity, colour, speed and direction. The 3D reconstruction stage is then added.
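The grouping step can be sketched as a greedy clustering of rectified feature points: two points join the same vehicle when they are close in the nadir view and moving in a similar direction. The thresholds below are illustrative, and the article's actual cue set also includes colour:

```python
import numpy as np

def group_points(positions, velocities, dist_thresh=3.0, angle_thresh=0.3):
    """Greedy grouping of tracked feature points into vehicles.

    positions  : (n, 2) nadir-view coordinates
    velocities : (n, 2) per-point velocity vectors
    Returns an integer group label per point.
    """
    n = len(positions)
    labels = [-1] * n
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:                       # flood-fill over compatible points
            j = stack.pop()
            for k in range(n):
                if labels[k] != -1:
                    continue
                close = np.linalg.norm(positions[j] - positions[k]) < dist_thresh
                a_j = np.arctan2(velocities[j][1], velocities[j][0])
                a_k = np.arctan2(velocities[k][1], velocities[k][0])
                ang = abs(a_j - a_k)
                if close and min(ang, 2 * np.pi - ang) < angle_thresh:
                    labels[k] = labels[j]
                    stack.append(k)
        current += 1
    return labels

pos = np.array([[0, 0], [1, 0.5], [1.2, 0.2], [10, 10], [10.5, 10.2]])
vel = np.array([[0, 2], [0, 2.1], [0.1, 2], [2, 0], [2.1, 0.1]])
print(group_points(pos, vel))  # → [0, 0, 0, 1, 1]
```

The direction test is what lets two nearby vehicles in adjacent turning lanes stay separate even before they physically diverge.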

In effect, the vehicle's 3D position is approximated by the 3D positions of the feature points that represent it, calculated on the assumption that the vehicle is a rigid body. This means the distance between any pair of points on the vehicle does not vary, and the orientation between them therefore follows the direction of the vehicle. The positions are found by solving an optimisation problem in which the variables are the 3D positions of the feature points and the objective is to minimise the reprojection error between each point's observed position in the image and its estimated position. In this system, the optimisation is solved directly using MATLAB's built-in function lsqnonlin.
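A minimal sketch of this kind of reprojection-error minimisation, using SciPy's `least_squares` in the role of MATLAB's `lsqnonlin`, recovers one synthetic feature point from two camera viewpoints (camera rotation is omitted and the focal length is an assumed value; the real system optimises many points under the rigid-body constraint):

```python
import numpy as np
from scipy.optimize import least_squares

f = 800.0  # assumed focal length in pixels

def project(X, cam_t):
    """Pinhole projection of 3D point X seen from a camera translated by cam_t."""
    Xc = X - cam_t
    return f * Xc[:2] / Xc[2]

# Synthetic ground truth: one feature point seen from two camera positions
# (e.g. the same camera at two instants, with the vehicle's motion folded
# into an equivalent camera translation).
X_true = np.array([0.5, 0.2, 4.0])
cams = [np.zeros(3), np.array([1.0, 0.0, 0.0])]
observations = [project(X_true, t) for t in cams]

def residuals(X):
    """Reprojection error: predicted minus observed image coordinates."""
    return np.concatenate([project(X, t) - obs
                           for t, obs in zip(cams, observations)])

X_hat = least_squares(residuals, x0=np.array([0.0, 0.0, 2.0])).x
```

Stacking one residual pair per (point, frame) observation and letting the solver adjust all point coordinates at once gives the multi-point version the article describes.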

One issue with the structure-from-motion method is that it can only solve for point positions up to a scale factor – it cannot differentiate between a 1m cube 1m from the camera and a 2m cube 2m from the camera. To overcome this we needed to fix a scale factor for each set of points. By assuming that the vehicle is on the ground rather than flying, this can be done by scaling the result until its lowest point reaches the ground plane (that is, the wheels touch the road).
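The grounding constraint amounts to one line of arithmetic. In the sketch below the points are in a camera frame whose y axis points down, and the camera height is an assumed value:

```python
import numpy as np

def fix_scale(points, camera_height=6.0):
    """Resolve the structure-from-motion scale ambiguity by assuming the
    lowest reconstructed point sits on the road surface, i.e. at
    y = camera_height in a y-down camera frame (camera_height is an
    assumed value here)."""
    lowest = points[:, 1].max()        # point closest to the ground
    s = camera_height / lowest         # scale so that it touches the road
    return points * s

pts = np.array([[0.0, 1.0, 4.0],
                [0.2, 1.5, 4.0],
                [0.1, 2.0, 4.0]])      # reconstruction at arbitrary scale
scaled = fix_scale(pts)
print(scaled[:, 1].max())              # → 6.0
```

Any point on the wheels or lower bodywork serves as the anchor; the error from using a point slightly above the road is small compared with the camera height.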

Since we calculate the 3D positions of only certain feature points, the vehicle is represented as a bounding box around those points, moving in the calculated direction.

This method has been tested on six different 10-second video clips covering different trajectories, and the cumulative accuracy in counting vehicles is shown in Table 1. To gauge each part of the method's contribution to the overall performance, individual subsystems were removed and the results compared with the original findings. Table 1 shows the results of the full method in the blue column; the red column shows results without camera calibration; and the green column shows results where the grouping algorithm was replaced with a strategy that simply grouped nearby points together.

In the full method, over-counting occurs mostly in extreme cases, such as a large truck with relatively few feature points which the system sometimes mistakenly interprets as two or more vehicles. Missed vehicles are mostly due to occlusion (blocked or overlapping vehicles).

The results were not badly affected by removing the camera calibration, as it is mostly needed for 3D reconstruction. Without calibration, however, it is difficult to set a distance threshold for grouping nearby feature points, because the ratio between pixel distance and real-world distance varies across the image. As the grouping technique could differentiate nearby vehicles travelling in different directions (straight ahead/turn left/turn right), the distance threshold was set high, since feature points some distance apart could still belong to the same vehicle. The high threshold reduces over-counting but slightly increases the number of missed vehicles, because it sometimes causes the system to merge several vehicles into one.

Removing the grouping technique had a far greater impact on counting accuracy. The increase in over-counted vehicles was caused by the lack of an algorithm to combine small groups of feature points. While the increase in missed vehicles was only small, the per-frame results are worse: without the grouping algorithm, adjacent vehicles travelling in different directions could not be differentiated until they were physically separated.

As this technique only requires video from the cameras, with the computing work carried out at the control centre, we believe it could be applied to feeds from other existing cameras with minimal manual setup. However, as the speed of the system is constrained by the 3D reconstruction process, it can currently only produce off-line 'snapshots'. Manual camera calibration may not be needed when the camera's precise location and height are known.

The training process will be needed at each new intersection because, to predict which direction a vehicle will move, we must first manually supply ground-truth labels to the classifier. This enables it to learn the correlation between a vehicle's current speed and position and its likely trajectory.

Currently the 3D model is used only to show how the road situation would look in 3D, not to support the counting system. We plan to feed the 3D reconstruction information back into the counting system: checks such as whether a vehicle has a reasonable size and position will help indicate whether the count is reliable. With a known camera position, the 3D model can also be used to predict occlusion and devise a way to detect masked vehicles.

Having shown that the proposed method can count vehicles and compute their 3D positions using video from cameras installed at road intersections after initial training, we plan to feed the 3D model back into the counting system to increase accuracy.

ABOUT THE AUTHORS:
Yuting Yang is a graduate of the University of Pennsylvania; Camillo Taylor and Daniel Lee are professors at the university.
