HD map change detection technology on roads using deep learning
Just as we figure out our location on a map and find our way to a destination, machines on the road need one kind of data before anything else in order to get from place to place: the HD map, or high-definition spatial data. It is essential for self-driving machines, enabling them to localize themselves, perceive the current situation, make decisions, and plan routes to the destination.
At the end of the day, an HD map is still a “map,” bound to the problem all maps intrinsically have: it must be continuously updated and kept current. Moreover, because a self-driving machine does not possess human-level perception and judgment, it demands even greater accuracy from that information.
Technology to keep the HD map up to date
So how does NAVER Maps keep its maps up to date? Because the time and resources available to rebuild maps on a regular basis are limited, most map makers detect and update changed data with the help of many map users. That is, the people who actually use the map report changes to the map maker, helping keep it current.
The same goes for HD maps. In practice, it is difficult to continuously commission aerial photography or dispatch mobile mapping system (MMS) vehicles equipped with high-precision sensors. So instead, we aimed to collect information about changes from ordinary vehicles, which are effectively the users of the HD map, detecting changes and performing updates from many cars equipped with low-cost sensors. This is the ACROSS Project.
Related article: ACROSS Project, the technology that keeps the HD map for on-road self-driving up to date
Sensors such as cameras, an IMU, and GPS are used to estimate the vehicle's approximate position (localization), judge whether the HD map at the current location has changed (change detection), and apply the update (map update). Because relatively low-cost sensors are used, localization accuracy suffers, but we compensate for this by aggregating results from multiple vehicles and by using deep learning. In this post, we take a look at the deep learning technology behind change detection.
Approach: Extension from indoor change detection research
Earlier, NAVER LABS introduced the self-updating map, a store change detection algorithm for indoor maps. It applies deep metric learning so that the model learns its own criteria for what counts as a change from the relationships between data pairs, and detects changes between a pair of images taken at the same location at two different times.
Related article: Self-updating map, the technology by which AI and robots find changed store names
Explanatory diagram of POI change detection
Through the self-updating map research, we verified that change detection using deep metric learning (1) properly ignores unimportant changes (pedestrians, store interiors, etc.) and (2) is robust to viewpoint changes across multiple locations. In the example on the left below, the model correctly focuses on the store sign, the important piece of information, as the region for detecting changes, while the example on the right shows that two images shot from different angles and positions are recognized as the same place by the deep learning model.
So can this indoor change detection algorithm, robust to such problematic situations, be extended and applied as-is to the outdoor HD road map? The same principles apply to the HD map case, where images come from cameras mounted on vehicles: just as in the indoor research, (1) unimportant changes such as pedestrians and vehicles on the road will be present, and (2) viewpoint changes will arise from slight localization errors caused by the low-cost sensors. However, a few issues had to be resolved before deep metric learning could be applied to HD map change detection.
Challenge 1. Applying adversarial learning
To train metric learning for change detection, data triplets of a base (anchor) image, a relevant (positive) image, and an irrelevant (negative) image are generally used. The objective function pulls the base image and the relevant image together and pushes the base image and the irrelevant image apart, so that the model learns a metric for changes. But the first problem we encountered is that, in the outdoor setting, the two data sources being compared were very different.
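To make this concrete, here is a minimal sketch of such a triplet objective in PyTorch. The `encoder` and the margin value are illustrative assumptions, not the actual ACROSS model:

```python
import torch.nn.functional as F

def triplet_loss(encoder, base, relevant, irrelevant, margin=0.2):
    """Pull base/relevant embeddings together, push base/irrelevant apart."""
    z_b = F.normalize(encoder(base), dim=1)        # base (anchor) image
    z_p = F.normalize(encoder(relevant), dim=1)    # relevant (positive) image
    z_n = F.normalize(encoder(irrelevant), dim=1)  # irrelevant (negative) image
    d_pos = (z_b - z_p).pow(2).sum(dim=1)          # squared distance to positive
    d_neg = (z_b - z_n).pow(2).sum(dim=1)          # squared distance to negative
    return F.relu(d_pos - d_neg + margin).mean()   # margin-based triplet loss
```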
Let's look at the diagram below. The triplet example on the left is the kind of data we used in the earlier indoor map research; the base, relevant, and irrelevant images were all acquired from cameras. When trained on such triplets, the deep learning model makes its judgments primarily based on the signs on which store names are written in order to detect store changes. (Ideal, isn't it?)
In the outdoor HD map example on the right, on the other hand, the base image and the relevant image, which should be treated as the same, look very different, whereas the relevant image and the irrelevant image, which should be treated as dissimilar, look extremely similar to each other. The HD map is vector data, so when it is projected onto the image plane, everything except the road layout data is simply blank. In contrast, the base images acquired from vehicle cameras are ordinary photographs in which not only road signs but also cars and buildings appear.
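For intuition, this is roughly what projecting the vector HD map onto the image plane could look like, assuming camera intrinsics `K` and a world-to-camera pose `(R, t)` obtained from localization; the function and its parameters are illustrative, not our production pipeline:

```python
import numpy as np

def project_map_points(points_world, K, R, t, image_shape):
    """Project 3D HD-map points (N, 3) into the camera image as a binary mask."""
    pts_cam = points_world @ R.T + t          # world -> camera coordinates
    pts_cam = pts_cam[pts_cam[:, 2] > 0]      # keep points in front of the camera
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]               # perspective division -> pixel coords
    h, w = image_shape
    mask = np.zeros((h, w), dtype=np.uint8)
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    mask[uv[valid, 1].astype(int), uv[valid, 0].astype(int)] = 1   # rasterize map layout
    return mask
```

Everything that is not part of the projected layout stays zero, which is exactly why the projected map looks nothing like a camera photograph.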
When the characteristics of the two data sources being compared, or the ways they were acquired, differ significantly, we say there is a difference in data domains. If training is conducted on data from different domains, the model will base its judgments on spurious cues instead of our intended targets (e.g., road signs, lanes), much like an unreliable rule insisting that "there is always a certain tree where a left-turn road sign exists."
To compare a pair of images from different data domains, we applied adversarial learning, a technique that reduces the gap between two domains by alternately training (a) a discriminator that classifies which domain a sample came from and (b) a generator that encodes the data into a shared feature space. The discriminator learns to judge which domain each input came from, while the generator learns to make the encoded data increasingly similar so that the discriminator can no longer tell them apart.
As training continues, the generator's outputs become so similar that it is no longer possible to tell which domain the two inputs came from. Thus, even when the data pair spans different domains, changes on the HD map can be properly detected by combining metric learning with this adversarial training.
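A minimal sketch of this kind of alternating adversarial training in PyTorch is shown below. The two encoders, the discriminator, and all sizes and hyperparameters are illustrative placeholders; in the full model the generator step would be trained jointly with the metric-learning loss sketched above:

```python
import torch
import torch.nn as nn

# Illustrative encoders: one for camera images, one for projected HD-map masks.
# In practice these would be CNNs; small MLPs keep the sketch self-contained.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 256), nn.ReLU())
map_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU())
discriminator = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))

d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
g_opt = torch.optim.Adam(
    list(image_encoder.parameters()) + list(map_encoder.parameters()), lr=1e-4
)
bce = nn.BCEWithLogitsLoss()

def adversarial_step(image_batch, map_batch):
    f_img = image_encoder(image_batch)   # camera-image features
    f_map = map_encoder(map_batch)       # HD-map features

    # (a) Discriminator: learn to label camera features 1 and map features 0.
    d_loss = bce(discriminator(f_img.detach()), torch.ones(len(f_img), 1)) + \
             bce(discriminator(f_map.detach()), torch.zeros(len(f_map), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # (b) Generator (encoders): make map features indistinguishable from camera features.
    g_loss = bce(discriminator(f_map), torch.ones(len(f_map), 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

An alternative to explicitly alternating the two steps is a gradient reversal layer, which achieves the same adversarial effect in a single backward pass; the sketch above simply mirrors the alternating description in the text.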
Challenge 2. Overcoming occlusions & localization noise
Another issue to resolve for HD map change detection is projection error caused by occlusion from objects and by localization error.
There are many cars on the road, and they cover the very road surface whose changes we need to detect. Occlusion by vehicles can lead the deep learning model to misjudge that an existing road sign has disappeared or that a new one has appeared: from the machine's point of view, a left-turn or go-straight sign should be at a certain location, but because it is hidden by cars, the machine concludes the sign has been removed. Moreover, if a localization error occurs, the HD map projected onto the image plane will not align well with the image, which can also lead the model to wrong conclusions. Accordingly, we aimed to train the model to handle these cases on its own, without additional post-processing.
First, we trained the model to ignore regions occluded by obstacles. Semantic segmentation is used to identify obstacles such as pedestrians and cars, and data in which the change at a particular location is caused by such an obstacle is excluded from the training data; as a result, the model no longer regards occlusions as changes. We also injected random localization errors during training so that the deep learning model learns to overcome them.
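As a rough illustration, the two ideas could look like the following sketch, assuming a per-pixel semantic segmentation map and a localization pose `(R, t)`; the class names and noise magnitudes are hypothetical:

```python
import numpy as np

OBSTACLE_CLASSES = {"car", "bus", "truck", "pedestrian"}  # illustrative label set

def occlusion_mask(segmentation, class_names):
    """Return a weight mask that is 0 where an obstacle hides the road surface,
    so those pixels can be ignored when computing the change-detection loss."""
    mask = np.ones_like(segmentation, dtype=np.float32)
    for class_id, name in enumerate(class_names):
        if name in OBSTACLE_CLASSES:
            mask[segmentation == class_id] = 0.0   # exclude occluded pixels
    return mask

def jitter_pose(R, t, max_trans=0.5, max_rot_deg=1.0, rng=np.random):
    """Perturb the localization pose with random noise before projecting the HD map,
    so the model sees (and learns to tolerate) slightly misaligned projections."""
    t_noisy = t + rng.uniform(-max_trans, max_trans, size=3)     # translation noise (m)
    yaw = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))     # small heading error
    Rz = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                   [np.sin(yaw),  np.cos(yaw), 0],
                   [0,            0,           1]])
    return Rz @ R, t_noisy
```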
Results
Building on the earlier indoor map change detection research, we added these ideas for change detection on the HD map, a map made for cars. So how did it turn out?
Results of judgment in various situations of HD map changes
The upper row shows visualizations in which the camera image and the HD map are overlaid, and the lower row shows the regions the deep learning model judged to have changed. We can see that it judges changes well even when road signs have disappeared or been modified into different shapes. What is more interesting is that it also detected changes such as white markings in the middle of the road and temporarily installed crosswalks, types of change that were not included in the training data.
With localization error
The model is also robust to projection errors caused by localization errors. The graph at the top of the image above shows the similarity between the image and the HD map; the model judges that a change has occurred when this similarity is low. We examined the judgments the model makes when arbitrary localization errors are injected. Although some spurious responses appear elsewhere, the model consistently flags a change at the location of the go-straight-and-left-turn road sign, where a change actually occurred.
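The decision rule itself is simple; conceptually it could be sketched like this, where the similarity scores come from the trained model and the threshold value is purely illustrative:

```python
def detect_changes(similarities, threshold=0.5):
    """Flag map elements whose image-vs-HD-map similarity falls below a threshold.
    `similarities` maps a map element id to the model's similarity score."""
    return [element_id for element_id, s in similarities.items() if s < threshold]
```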
We also confirmed that occlusion by obstacles such as cars does not keep the deep learning model from carrying out change detection the way we intended: it still detects a change at a location from which a road sign has been removed, and while that area is blocked by a car it simply does not flag the occluded moment as a change.
(Left) Projected HD map mask, (Right) Section where change was detected
Finally, here is a video of change detection results for a section of road where many changes have occurred. Despite some minor errors, most of the changed regions are detected well without any serious problems.
Conclusion
We have taken a brief look at how deep learning can be used to detect changes on the HD road map. The research so far has focused on extending indoor change detection to a map meant for the outdoors and for cars. For the HD map to stay up to date and serve as essential data for self-driving machines, we plan to research newer and more diverse methodologies. NAVER LABS always welcomes anyone who is passionate about defining new problems, solving them, and growing together with us.