Driving intelligence tests are critical to the development and deployment of autonomous vehicles. The prevailing approach tests autonomous vehicles in life-like simulations of the naturalistic driving environment. However, because of the high dimensionality of the environment and the rareness of safety-critical events, hundreds of millions of miles would be required to demonstrate the safety performance of autonomous vehicles, which is prohibitively inefficient.
We discover that sparse but adversarial adjustments to the naturalistic driving environment, which yield a naturalistic and adversarial driving environment, can significantly reduce the required test miles without compromising the unbiasedness of the evaluation. By training the background vehicles to learn when to execute which adversarial maneuver, the proposed environment becomes an intelligent environment for driving intelligence testing. We demonstrate the effectiveness of the proposed environment in a highway-driving simulation. Compared with the naturalistic driving environment, the proposed environment can accelerate the evaluation process by multiple orders of magnitude.
Autonomous vehicles (AVs) have attracted significant attention in recent years because of their potential to revolutionize transportation safety and mobility. One critical step in the development and deployment of AVs is to test and evaluate their driving intelligence, which indicates whether an AV can operate safely and efficiently without human intervention.
However, current testing procedures for human-driven vehicles, such as the Federal Motor Vehicle Safety Standards (FMVSS)1 and ISO 26262, only regulate safety-related automobile components, systems, and design features, without considering the driving intelligence needed to complete driving tasks.
To the best of the authors’ knowledge, there is to date neither a consensus nor a standard procedure on how to test and evaluate AVs. Although the problem of AV testing has been investigated extensively over the past few years by AV developers, government agencies, professional organizations, and academic institutions, the theory and methods to support such testing and evaluation are still lacking.
The prevailing state-of-the-art approach for AV testing uses the agent-environment framework, through a combination of software simulation, closed-track testing, and on-road testing. The basic philosophy is to test AV driving agents in a realistic driving environment, observe their performance, and compare it statistically with human driver performance.
The challenges for AV testing, however, arise from three aspects, shown in Fig. 1b. First, the driving agent in an AV is commonly developed based on statistics or artificial intelligence (AI) algorithms. The AI-based agent, which is usually a black box to external users, limits the use of traditional logic-based software verification and validation techniques.
Second, the driving environment is usually complex and stochastic. To capture the full complexity and variability of the environment, the variables that define it must be high-dimensional, which leads to the “curse of dimensionality”. The stochasticity of the environment can also defeat traditional formal methods that aim to guarantee absolute safety. Third, the events of interest for driving intelligence testing (e.g., accidents) rarely happen, and this rareness makes testing intolerably inefficient. Therefore, constructing an intelligent testing environment that can test AV driving intelligence accurately and efficiently, while accounting for high dimensionality and the rareness of events, is the key to the AV testing problem.
Most existing methods use the naturalistic driving environment (NDE) for driving intelligence testing of AVs. For example, on-road methods test AVs in the real-world NDE, while most simulation methods test high-fidelity AV models in life-like simulations of the NDE, such as Intel’s CARLA6, Microsoft’s AirSim7, NVIDIA’s Drive Constellation8, Google/Waymo’s CarCraft9, and Baidu’s AADS10. However, all these methods suffer from inefficiency because of the “curse of dimensionality” and the rareness of events in the NDE, as discussed above.
It has been argued that hundreds of millions, and sometimes hundreds of billions, of miles would be required to demonstrate the safety performance of AVs at the level of human-driven vehicles11. Moreover, a brand-new testing process may be required whenever the configuration of an AV changes. Such mileage is impractical even under aggressive simulation schemes: Waymo has simulated only 15 billion miles in total over the years, which is the world’s longest simulation test. To a certain extent, this inefficiency has hindered AV development and deployment.
To address the inefficiency issue, scenario-based approaches have been proposed. Based on importance sampling (IS) theory, critical scenarios can be purposely designed to accelerate AV evaluation12,13,14,15,16,17. However, existing scenario generation methods can only be applied to scenarios that involve simple maneuvers of a very limited number of vehicles over a very short duration, for instance, a cut-in maneuver by a background vehicle lasting a few seconds.
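The core IS idea behind such scenario-based methods can be shown on a one-dimensional toy problem (our own illustration, not one of the cited methods): estimate P(X > 4) for X ~ N(0, 1) by sampling from a proposal shifted onto the rare region and re-weighting each hit by the likelihood ratio p(x)/q(x), which keeps the estimator unbiased. The names `normal_pdf` and `is_tail_estimate`, the threshold, and the sample size are hypothetical choices for this sketch.

```python
import math
import random


def normal_pdf(x: float, mean: float = 0.0) -> float:
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def is_tail_estimate(threshold: float, n: int, seed: int = 0) -> float:
    """Importance-sampling estimate of P(X > threshold) for X ~ N(0, 1).

    Samples come from the shifted proposal q = N(threshold, 1); every sample
    that lands in the rare event is weighted by p(x) / q(x).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += normal_pdf(x) / normal_pdf(x, threshold)
    return total / n


exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # exact tail probability
estimate = is_tail_estimate(4.0, n=20_000)
print(exact, estimate)
```

With 20,000 samples, CMC would expect fewer than one hit (the exact probability is about 3.2 × 10^-5), whereas the IS estimate lands within a few percent of the exact value. This works well here because the problem has a single variable.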
Such scenarios are far from representing the full complexity and variability of the real-world driving environment. For example, an AV driving in a highway environment can encounter various maneuvers (e.g., lane changing, car following, and overtaking) by hundreds of vehicles over hours. Such a driving environment contains numerous distinct spatiotemporal combinations of scenarios, which existing scenario-based approaches cannot handle.
Our approach to constructing a simulation- or test-track-based AV testing environment makes three contributions. First, it generates a driving environment that provides spatiotemporally continuous testing scenarios for AVs. For example, to test an AV in an urban environment, our approach lets the AV drive continuously for miles during a single test, interacting with multiple background vehicles and experiencing different adversarial scenarios.
Second, the generated environment provides statistically accurate testing results: our approach ensures that the testing results of AVs in the generated environment (such as the accident rates of different accident types) are unbiased with respect to the NDE. Third, the generated environment addresses the inefficiency of the NDE: compared with the NDE, our approach reduces the testing time by multiple orders of magnitude for the same evaluation accuracy.
To achieve evaluation efficiency without loss of accuracy, our approach starts from the NDE and applies sparse but intelligent adjustments to it. The resulting driving environment is both naturalistic and adversarial: most background vehicles (more generally, road users) follow naturalistic behaviors most of the time, and only at selected moments do selected vehicles execute specifically designed adversarial maneuvers. The key to creating the naturalistic and adversarial driving environment (NADE) is to train the background vehicles in the NDE to learn when to execute which adversarial maneuver, improving efficiency while preserving unbiasedness. The learning process is guided by the theoretical finding described below.
In essence, AV driving intelligence testing can be considered a rare-event estimation problem with high-dimensional variables. However, few existing methods can handle both the rareness of events and high dimensionality. Testing AVs in the NDE is an application of Crude Monte Carlo (CMC) theory18, which is inefficient for rare events. IS theory was developed to address rare events, but it can only be applied in low-dimensional situations19: its efficiency has been proved to decrease exponentially as dimensionality increases.
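A standard toy calculation (our own illustration, not necessarily the argument of ref. 19) shows where the exponential loss comes from. Take a nominal distribution p = N(0, I_d) and an IS proposal q = N(μ1, I_d) that shifts each of the d coordinates by μ. The likelihood ratio w is a product of d one-dimensional factors, and its second moment under q is:

```latex
w(\mathbf{x}) = \prod_{i=1}^{d} \frac{p(x_i)}{q(x_i)}
             = \prod_{i=1}^{d} \exp\!\left(-\mu x_i + \tfrac{\mu^2}{2}\right),
\qquad
\mathbb{E}_q\!\left[w^2\right] = \mathbb{E}_p\!\left[w\right]
  = \prod_{i=1}^{d} \int \frac{p(x_i)^2}{q(x_i)}\,\mathrm{d}x_i
  = \prod_{i=1}^{d} e^{\mu^2}
  = e^{d\mu^2}.
```

Because E_q[w] = 1, the weight variance is e^{dμ²} - 1. With a modest shift μ = 0.5 and d = 100 variables, this is about e^{25} ≈ 7 × 10^{10}, so a handful of samples dominate the estimate and the rest carry negligible weight.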
Therefore, both CMC and IS have limitations for rare-event estimation with high-dimensional variables. However, an advantage of CMC has received little attention: its estimation variance does not depend on the number of variables, so it is unaffected by high dimensionality. We discover that, if a small subset of variables is critical to the rare events, applying IS to that small subset while applying CMC to the remaining variables can overcome the challenges of both the rareness of events and high dimensionality.
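The decomposition can be illustrated on a toy rare event (our own construction, not the general setting of Theorem 1 in Methods). With x ~ N(0, I_d), we estimate P(x_0 + 0.2·x_1 > 4). The sketch treats x_0 as the single critical variable and applies IS only to it, while every other coordinate, including x_1, which still affects the event, is sampled from its naturalistic distribution, that is, by CMC. The function `hybrid_estimate`, the 0.2 coupling, and the dimension are hypothetical choices.

```python
import math
import random


def hybrid_estimate(threshold: float, dim: int, n: int, seed: int = 0) -> float:
    """Estimate P(x0 + 0.2 * x1 > threshold) for x ~ N(0, I_dim), dim >= 2.

    Toy assumption: x0 is the only critical variable. It is drawn from the
    shifted proposal N(threshold, 1) (importance sampling); x1..x_{dim-1}
    are drawn from their nominal N(0, 1) distributions (crude Monte Carlo).
    Only x0 enters the likelihood ratio, so the weight does not degrade
    as dim grows.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x0 = rng.gauss(threshold, 1.0)                        # IS on the critical variable
        rest = [rng.gauss(0.0, 1.0) for _ in range(dim - 1)]  # CMC on all other variables
        if x0 + 0.2 * rest[0] > threshold:
            # Likelihood ratio N(0, 1) / N(threshold, 1) evaluated at x0.
            total += math.exp(-threshold * x0 + 0.5 * threshold ** 2)
    return total / n


threshold = 4.0
# x0 + 0.2 * x1 ~ N(0, 1.04), so the exact probability is a normal tail.
exact = 0.5 * math.erfc(threshold / math.sqrt(1.04) / math.sqrt(2.0))
estimate = hybrid_estimate(threshold, dim=10, n=50_000)
print(exact, estimate)
```

Shifting all `dim` coordinates instead would make the weight a product of `dim` likelihood-ratio factors, whose variance grows exponentially with `dim`; restricting IS to the critical variable avoids that cost while the CMC part remains indifferent to dimension.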
We prove this result as Theorem 1 in Methods. The result is significant because it applies to a general class of problems with these characteristics. For safety-critical performance tests of AVs, such small but critical sets of variables fortunately exist, because most vehicle accidents involve only a small number of vehicles over a short period20. According to the Fatality Analysis Reporting System (FARS), about 91.5% of fatal injuries suffered in motor vehicle traffic crashes in the United States in 2018 involved only one or two vehicles21.
Because NADE is constructed from the NDE, we propose a data-driven approach that reproduces the naturalistic behavioral patterns of background vehicles to generate the NDE. The basic idea is to model the NDE as a Markov decision process, calculate the naturalistic distributions of vehicle maneuvers from naturalistic driving data, and sample vehicle maneuvers from these distributions.
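A minimal sketch of the "calculate distributions, then sample" step, assuming that each background vehicle's state has already been discretized into a hashable key and that the maneuver depends only on the current state (the Markov assumption). The state encoding, maneuver labels, toy records, and function names are all hypothetical; real distributions would be estimated from naturalistic driving data such as the datasets used in this work.

```python
import random
from collections import Counter, defaultdict


def fit_naturalistic_distributions(records):
    """Estimate P(maneuver | state) by counting (state, maneuver) pairs.

    `records` is an iterable of (state, maneuver) tuples, where `state` is a
    hashable, discretized description of a vehicle's surroundings
    (hypothetical encoding here: (speed_bin, gap_bin)).
    """
    counts = defaultdict(Counter)
    for state, maneuver in records:
        counts[state][maneuver] += 1
    dists = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        dists[state] = {m: c / total for m, c in counter.items()}
    return dists


def sample_maneuver(dists, state, rng):
    """Draw one maneuver from the naturalistic distribution for `state`."""
    maneuvers = list(dists[state])
    weights = [dists[state][m] for m in maneuvers]
    return rng.choices(maneuvers, weights=weights, k=1)[0]


# Hypothetical toy records for a single discretized state.
records = ([((3, 2), "keep_lane")] * 90
           + [((3, 2), "left_change")] * 6
           + [((3, 2), "right_change")] * 4)
dists = fit_naturalistic_distributions(records)
print(dists[(3, 2)])
print(sample_maneuver(dists, (3, 2), random.Random(0)))
```

In a full simulator, this sampling step would run for every background vehicle at every time step, with the state updated by the simulated dynamics between steps.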
The NDE provides the foundation and benchmark for generating and evaluating NADE. To identify the small but critical variables for generating NADE, we propose a reinforcement learning approach that learns the challenge each background-vehicle maneuver poses to the AV under test. This is similar to the value network in AlphaGo22, because the challenge of a background vehicle's maneuver at any moment depends on the AV's maneuvers in the following time steps. In addition, because the specifics of the behavior model of the AV under test are usually unknown, we propose using surrogate models (SMs) during the learning process.
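The quantity being learned can be illustrated with plain Monte Carlo rollouts in place of reinforcement learning. Here a cut-in maneuver's challenge is taken to be the fraction of rollouts, driven by a surrogate AV model, that end in a collision; a learned value function would approximate this quantity across states instead of re-simulating it each time. The surrogate below (a random reaction delay followed by constant braking), its parameter ranges, the constant-speed background vehicle, and the function name are all hypothetical assumptions for this sketch.

```python
import random


def maneuver_challenge(gap, rel_speed, n_rollouts=2000, seed=0, dt=0.05):
    """Monte Carlo estimate of a cut-in maneuver's challenge: the fraction of
    rollouts under a hypothetical surrogate AV model that end in a collision.

    gap: initial gap (m) between the AV and the cutting-in background vehicle.
    rel_speed: AV speed minus background-vehicle speed (m/s); the background
        vehicle is assumed to keep a constant speed.
    Surrogate (assumption): the AV reacts after a random delay in [0.5, 1.5] s,
    then brakes at a random deceleration in [3, 8] m/s^2 until it is no longer
    closing in on the background vehicle.
    """
    rng = random.Random(seed)
    collisions = 0
    for _ in range(n_rollouts):
        delay = rng.uniform(0.5, 1.5)
        decel = rng.uniform(3.0, 8.0)
        g, dv, t = gap, rel_speed, 0.0
        while dv > 0.0:
            g -= dv * dt           # the gap shrinks while the AV is faster
            if g <= 0.0:
                collisions += 1
                break
            t += dt
            if t >= delay:
                dv -= decel * dt   # braking reduces the closing speed
    return collisions / n_rollouts


for gap in (5.0, 20.0, 40.0):
    print(gap, maneuver_challenge(gap, rel_speed=10.0))
```

Because every call with the same seed reuses the same surrogate draws, the estimated challenge is non-increasing in the initial gap, which makes comparisons between candidate maneuvers less noisy.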
The construction of SMs provides an elegant way to leverage prior knowledge, such as testing results for previous AV models. Based on the maneuver challenge, the principal other vehicles (POVs) can be identified among all surrounding background vehicles, and their maneuvers can be adjusted at critical moments. In this manner, only the distributions of a small but critical set of variables are twisted according to IS theory, while the remaining variables follow their naturalistic distributions. This sparse but intelligent adjustment of the NDE yields the NADE.
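A minimal sketch of one NADE time step for a single POV, assuming a discrete set of maneuvers with naturalistic probabilities p(m) and estimated challenges. The twisted distribution used here, proportional to p(m) × challenge(m) and mixed with a fraction ε of p so that every naturalistic maneuver stays possible, is our own assumption rather than the exact construction in Methods. Vehicles that are not POVs keep sampling from p and leave the weight unchanged. The running likelihood ratio is what keeps the final estimate unbiased.

```python
import random


def twisted_distribution(p, challenge, epsilon=0.1):
    """Importance distribution q for one POV at one time step (sketch).

    p: naturalistic maneuver probabilities {maneuver: prob}.
    challenge: estimated maneuver challenge {maneuver: value in [0, 1]}.
    Assumption: q is proportional to p * challenge (the criticality), mixed
    with a fraction epsilon of p so that q(m) > 0 wherever p(m) > 0.
    """
    crit = {m: p[m] * challenge[m] for m in p}
    total = sum(crit.values())
    if total == 0.0:
        return dict(p)  # nothing critical: stay naturalistic
    return {m: epsilon * p[m] + (1.0 - epsilon) * crit[m] / total for m in p}


def nade_step(p, challenge, rng, weight=1.0):
    """Sample the POV's maneuver from q and update the episode's likelihood
    ratio: weight *= p(m) / q(m)."""
    q = twisted_distribution(p, challenge)
    maneuvers = list(q)
    m = rng.choices(maneuvers, weights=[q[k] for k in maneuvers], k=1)[0]
    return m, weight * p[m] / q[m]


# Hypothetical numbers: a rare but challenging cut-in.
p = {"keep_lane": 0.9, "cut_in": 0.1}
challenge = {"keep_lane": 0.0, "cut_in": 0.3}
print(twisted_distribution(p, challenge))
print(nade_step(p, challenge, random.Random(0)))
```

Over an episode, the per-step factors multiply into a single weight. Averaging accident indicator × weight over many NADE episodes then estimates the NDE accident rate, since E_q[p(m)/q(m)] = 1 at every twisted step.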
We demonstrated the effectiveness of our method for AV testing in a highway driving environment, using a high-fidelity simulation platform, CARLA6, and a highway traffic simulator23, although our method is also applicable to other driving environments, such as city driving. We used naturalistic driving data (NDD) from the Safety Pilot Model Deployment (SPMD) program24 and the Integrated Vehicle-Based Safety System (IVBSS)25 at the University of Michigan, Ann Arbor.
To validate the generated NADE, we constructed two representative AV agents, based on driving behavior models and deep reinforcement learning techniques, respectively. The accident rates of the AVs were used to measure driving intelligence. We tested the AVs in both the NDE and NADE. Simulation results show that, compared with the NDE-based method, NADE can accelerate the evaluation process by multiple orders of magnitude while achieving the same accuracy.