Autonomous driving has become a major research focus in both industry and academia. According to statistics, in China alone vehicle accidents cause more than 100,000 deaths and over 2 million injuries each year, along with more than 6 million traffic disputes and conflicts.
About 90% of these accidents result from driver error: for example, roughly 30% are caused by drunk driving and 10% by driver distraction. Autonomous driving has the potential to reduce accidents caused by such errors, and it also makes driving possible for people who cannot drive for physical reasons. Moreover, it can free labor from simple, repetitive driving tasks and thereby increase productivity.
Autonomous driving has attracted great attention since its inception. The first DARPA unmanned vehicle challenge was held in 2004: vehicles had to complete a 150-mile course in the Mojave Desert, but none of the 15 participating vehicles finished. The following year, 5 of the 23 entrants successfully completed the entire route. In 2007, the DARPA Urban Challenge required unmanned vehicles to drive autonomously in a simulated urban environment, and six vehicles completed the race. Beyond these challenges, many research institutions and commercial companies have joined autonomous driving research: Google's driverless cars have driven more than 1 million kilometers with zero accidents, and Tesla, Baidu, Tencent, Alibaba, and traditional car manufacturers are all developing driverless cars.
The degree of automation of a car can be classified by how much it relies on human operation. The SAE J3016 standard defines six levels, 0 through 5: level 0 means the human driver is responsible for all driving operations; level 1 adds basic driving assistance, such as anti-lock braking systems; level 2 adds advanced assistance, such as risk-minimizing longitudinal control (typically computed from a set of risky states using control theory); level 3 is conditional automation, in which the vehicle drives itself but a human must take over when necessary; level 4 means the vehicle can operate fully autonomously in certain environments; level 5 is full automation in all environments. The figure below shows the grading standards for autonomous driving.
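As a rough illustration (not official SAE wording), the level definitions above can be summarized in a simple lookup table; the descriptions and the helper function are paraphrases for illustration only:

```python
# Paraphrased summary of the SAE J3016 levels (illustrative, not official text).
SAE_LEVELS = {
    0: "No automation: the human driver performs all driving operations",
    1: "Driver assistance: basic assist features, e.g. anti-lock braking",
    2: "Partial automation: advanced assistance such as risk-minimizing longitudinal control",
    3: "Conditional automation: the vehicle drives itself, but a human must take over on request",
    4: "High automation: fully autonomous within certain operating environments",
    5: "Full automation: autonomous in all environments",
}

def requires_human_fallback(level: int) -> bool:
    """Levels 0-3 still rely on a human driver as the fallback."""
    return level <= 3

print(requires_human_fallback(3))  # True
print(requires_human_fallback(4))  # False
```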
Autonomous driving is a complex system combining hardware and software. An unmanned vehicle uses cameras, radars, and other sensors to perceive its surroundings, makes decisions and judgments based on the acquired information, and formulates strategies with appropriate models, for example predicting the future trajectories of the ego vehicle, other vehicles, and pedestrians. After the path and behavior are planned, the next step is to control the vehicle to drive along the desired trajectory, mainly through lateral control (steering) and longitudinal control (speed). All of these actions operate under local path planning based on environmental information acquired in real time by sensors, and must be combined with a global path based on complete environmental information (e.g., GPS maps). The following figure shows the framework of a basic autonomous driving system.
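The perceive-decide-plan-control loop described above can be sketched as a minimal pipeline. All class and function names here are illustrative placeholders, not a real autonomous-driving API, and the "logic" is a toy stand-in:

```python
from dataclasses import dataclass

@dataclass
class VehicleCommand:
    steering: float  # lateral control (steering angle, radians)
    speed: float     # longitudinal control (target speed, m/s)

def perceive(sensor_frames):
    """Fuse camera/radar/lidar frames into a local environment model (stub)."""
    return {"obstacles": sensor_frames.get("obstacles", [])}

def decide_and_plan(env, global_route):
    """Pick a local trajectory following the global (GPS-based) route while
    reacting to perceived obstacles (stub: slow down when anything is seen)."""
    target_speed = 5.0 if env["obstacles"] else 15.0
    return {"waypoint": global_route[0], "speed": target_speed}

def control(plan):
    """Convert the planned trajectory into lateral and longitudinal commands."""
    return VehicleCommand(steering=0.0, speed=plan["speed"])

# One tick of the loop: sensors -> perception -> decision/planning -> control.
frames = {"obstacles": ["pedestrian"]}
cmd = control(decide_and_plan(perceive(frames), global_route=[(10.0, 0.0)]))
print(cmd.speed)  # 5.0 (slows down because an obstacle was perceived)
```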
Machine learning is widely used in autonomous driving, mainly in environmental perception (vehicle vision) and in the decision making of unmanned vehicles. Its application to environmental perception falls under supervised learning: for example, object detection on camera images requires a large number of images with annotated entities as training data, so that deep learning methods can learn to recognize objects in new images. Its application to behavioral decision making generally falls under reinforcement learning, in which an agent must interact with the environment: every action of the agent affects the environment, and changes in the environment in turn affect the agent's behavior. Reinforcement learning learns the mapping from environment states to actions from a large number of interaction samples, so that whenever the agent perceives the environment it can act “intelligently”.
The collection and processing of environmental information is the basis and prerequisite of autonomous driving. Unmanned vehicles perceive the environment mainly through sensor technologies, typically cameras, millimeter-wave radars, and lidars. To perceive the environment safely and accurately, an unmanned vehicle uses multiple sensors together: millimeter-wave radar and lidar are mainly responsible for medium- and long-range ranging and environmental perception, while cameras are mainly used to recognize traffic signals and other objects.
The main research direction of traditional computer vision is the vision problem of visible-light cameras. To evaluate whether a solution is feasible, a standard benchmark is needed. The KITTI dataset, jointly developed by the Karlsruhe Institute of Technology (KIT) and the Toyota Technological Institute at Chicago (TTIC), is one of the most authoritative and influential autonomous driving vision datasets in the world. We therefore analyze and compare several state-of-the-art algorithms on this dataset.
In autonomous driving, real-time and accurate detection of surrounding objects is crucial. Object detection is also one of the central topics of computer vision, and the recent rise of the Convolutional Neural Network (CNN) has greatly improved detection performance.
Initially, CNNs were simply integrated into the sliding-window method (Sermanet et al., 2013), but this does not precisely localize objects in the image. Girshick et al. (2014) therefore proposed R-CNN (Region-based CNN) to solve the localization problem in object recognition. They use selective search to generate many region proposals, extract a feature vector from each region with a CNN, and classify the features with linear support vector machines (linear SVMs) to pick out the candidate regions that contain entities. Although R-CNN solves the localization problem, its time complexity is too high, so a series of works (ResNet, Fast R-CNN, Faster R-CNN) optimized the model and finally realized an end-to-end framework. One key limitation they addressed is that the original CNN could only process images of a fixed size and could not handle candidate bounding boxes of different sizes.
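The R-CNN pipeline described above (region proposals, then CNN features, then a linear SVM per class) can be sketched schematically. Here `selective_search`, `cnn_features`, and the SVM weights are toy stand-ins, not the original implementation:

```python
def selective_search(image):
    """Stand-in for selective search: return candidate boxes as (x, y, w, h)."""
    return [(0, 0, 64, 64), (32, 32, 64, 64)]

def cnn_features(image, box):
    """Stand-in for the CNN feature extractor. R-CNN warps every proposal to a
    fixed size, because the original CNN only accepts fixed-size inputs."""
    x, y, w, h = box
    return [float(x + w), float(y + h)]  # toy 2-d "feature vector"

def linear_svm_score(features, weights, bias):
    """Per-class linear SVM applied to the extracted feature vector."""
    return sum(f * w for f, w in zip(features, weights)) + bias

image = object()  # placeholder for an input image
weights, bias = [0.01, 0.02], -1.0
scored = [(box, linear_svm_score(cnn_features(image, box), weights, bias))
          for box in selective_search(image)]
# Keep proposals the SVM scores as containing an entity.
detections = [box for box, score in scored if score > 0]
print(len(detections))  # 2
```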
However, the above methods perform less well on the KITTI dataset, mainly because the scales of the entities in the dataset vary widely and occlusion or truncation is severe. Moreover, those methods only improve the detection network; the selective search used to screen candidate boxes is not suitable for this scenario. Ren et al. (2015) therefore proposed region proposal networks (RPN) to replace selective search. An RPN is a CNN variant that slides a window over the image's convolutional features and predicts, for each candidate box, the type and probability of the entities it contains. Combining the RPN with Fast R-CNN realizes end-to-end learning.
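The anchor mechanism behind an RPN can be illustrated by generating the candidate boxes at each sliding-window position on the convolutional feature map. The stride, scales, and aspect ratios below are illustrative choices, not the paper's exact configuration:

```python
def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(64, 128), ratios=(0.5, 1.0, 2.0)):
    """At every feature-map cell, place len(scales) * len(ratios) anchor
    boxes centred on the corresponding image location, as (cx, cy, w, h)."""
    anchors = []
    for j in range(feat_h):
        for i in range(feat_w):
            cx, cy = i * stride + stride / 2, j * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w = s * (r ** 0.5)   # wider box for larger ratio
                    h = s / (r ** 0.5)
                    anchors.append((cx, cy, w, h))
    return anchors

# A 3x2 feature map yields 3 * 2 * 2 * 3 = 36 anchors; the RPN head then
# predicts an objectness score and a box refinement for each anchor.
anchors = generate_anchors(3, 2)
print(len(anchors))  # 36
```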
Decision making plays the role of the “driver” in the unmanned vehicle system. This layer gathers all important information about the vehicle's surroundings, including not only the position, speed, heading, and lane of the unmanned vehicle itself, but also all important perceived obstacles and their predicted trajectories within a certain distance. The problem the behavioral decision-making layer must solve is to determine the vehicle's driving strategy on the basis of this information.
The behavioral decision-making module is where information converges. Because it must weigh many different types of information while respecting traffic rules, behavioral decision making is often difficult to solve with a pure mathematical model. A more common approach is to use software engineering methods to build rule-engine systems. For example, in the DARPA unmanned vehicle competition, Stanford's unmanned vehicle “Junior” used cost design and a finite state machine to generate its trajectories and control instructions. In recent unmanned vehicle planning, reinforcement learning based on the Markov Decision Process has also been increasingly applied to the behavioral decision-making algorithms of unmanned vehicles.
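A rule-engine decision layer of the kind used on “Junior” can be sketched as a small finite state machine. The states and transition rules below are simplified illustrations, not Junior's actual state set:

```python
# Minimal finite-state-machine sketch of a behavioural decision layer.
# (state, event) -> next state; unknown events leave the state unchanged.
TRANSITIONS = {
    ("CRUISE", "obstacle_ahead"): "FOLLOW",
    ("FOLLOW", "lane_clear"): "OVERTAKE",
    ("FOLLOW", "obstacle_gone"): "CRUISE",
    ("OVERTAKE", "pass_complete"): "CRUISE",
}

def step(state, event):
    """Return the next driving state for a perceived event."""
    return TRANSITIONS.get((state, event), state)

state = "CRUISE"
for event in ["obstacle_ahead", "lane_clear", "pass_complete"]:
    state = step(state, event)
print(state)  # CRUISE (back to cruising after overtaking)
```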
The essence of reinforcement learning is learning a policy (a mapping from states to actions) that maximizes return. As in most machine learning settings, the learner is not told which actions to take; it must discover which actions yield the greatest return by trying them. In many scenarios, however, an action affects not only the immediate reward but all subsequent rewards as well. Trial-and-error search and delayed reward are therefore the two most important features of reinforcement learning. At the same time, reinforcement learning is a very broad concept: any method that solves problems of this kind can be called reinforcement learning.
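The trial-and-error and delayed-reward ideas can be made concrete with the tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The tiny two-state environment below is purely illustrative, and exploration is kept uniform-random for simplicity:

```python
import random

# Tabular Q-learning on a toy two-state problem (purely illustrative).
# Action 1 in state 0 moves to state 1; action 1 in state 1 pays the only
# reward, so the value of acting in state 0 is entirely a delayed reward.
def toy_env(state, action):
    if state == 0:
        return (1, 0.0) if action == 1 else (0, 0.0)
    return (0, 1.0) if action == 1 else (1, 0.0)

alpha, gamma = 0.5, 0.9
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
random.seed(0)
state = 0
for _ in range(2000):
    action = random.choice((0, 1))  # uniform exploration, for simplicity
    next_state, reward = toy_env(state, action)
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[(next_state, a)] for a in (0, 1))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# The action that only eventually leads to reward ends up preferred:
print(Q[(0, 1)] > Q[(0, 0)])  # True
```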
With the rise of deep learning, the powerful representation and function-fitting capabilities of deep neural networks have injected new vitality into reinforcement learning, and deep reinforcement learning has become a new research hotspot.
Isele et al. (2017) studied unmanned driving in the intersection scenario, again using the Deep Q-Network (DQN) approach. They proposed two DQN architectures: Sequential Action DQN and Time-to-Go DQN. The sequential action network is a three-layer fully connected network whose outputs correspond to 3 actions (decelerate, accelerate, maintain speed) at 4 time scales (1, 2, 4, 8 time steps). The Time-to-Go DQN uses a CNN, and its outputs correspond to 5 actions: 1 “go” action and a “wait” action at each of the 4 time scales (1, 2, 4, 8 time steps). The reward follows the paper's own rules: +1 for success, -10 for a collision, and -0.01 as a per-step cost.
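The action spaces and reward scheme reported in that paper can be written down directly; the snippet below only restates those numbers and is not the authors' code:

```python
import itertools

# Sequential Action DQN: 3 actions x 4 time scales = 12 discrete outputs.
ACTIONS = ("decelerate", "accelerate", "maintain")
TIME_SCALES = (1, 2, 4, 8)  # time steps
SEQUENTIAL_OUTPUTS = list(itertools.product(ACTIONS, TIME_SCALES))

# Time-to-Go DQN: one "go" action plus a "wait" action at each time scale.
TIME_TO_GO_OUTPUTS = ["go"] + [("wait", t) for t in TIME_SCALES]

def reward(outcome):
    """Reward scheme from the paper: +1 success, -10 collision, -0.01 per step."""
    return {"success": 1.0, "collision": -10.0, "step": -0.01}[outcome]

print(len(SEQUENTIAL_OUTPUTS), len(TIME_TO_GO_OUTPUTS))  # 12 5
```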
Shalev-Shwartz et al. (2016) deal with more complex scenarios in actual driving, such as merging onto a two-way road. Rather than applying deep reinforcement learning end to end, their method divides the problem into two parts, a learnable part and a non-learnable part. Compared with earlier work such as CARMA, this approach is more complete and more practical, combining the advantages of reinforcement learning and dynamic programming: reinforcement learning judges the environment and makes behavioral decisions, while dynamic programming plans routes and executes actions to guarantee driving safety.
The application of machine learning in autonomous driving is concentrated in two modules: environmental perception and behavioral decision making. In perception, the development of deep learning has greatly advanced computer vision, with significantly better results on object recognition and road detection. However, compared with high-precision lidar, a camera has a limited sensing range and is strongly affected by the environment, so perception in autonomous driving mostly relies on multiple types of sensors working together. In decision making, the combination of reinforcement learning and deep learning makes it possible to map complex, changing environments to actions. Most current research, however, is based on computer-simulated road scenes (where all environmental information is available), which is still some distance from real scenes. Machine learning is data-driven, so providing a relatively complete dataset that includes both perception of the real environment and the driver's operation records is of great significance and would further promote the development of autonomous driving.