Last week, Ren Xiaofeng, chief scientist of Alibaba's AutoNavi (Gaode Maps), held a technical exchange at an online live event on the development of computer vision technologies and their application in maps and travel. The live stream was very lively, especially during the Q&A session, where attendees asked about vision applications, AR navigation, positioning technology, 5G, career development, and other topics of interest, and Ren Xiaofeng answered them in detail. We have compiled the Q&A content and share it here.
Visual technology development and application
Question: What are the applications of computer vision in the construction of high-precision maps?
Ren Xiaofeng: Visual algorithms are a core technology for high-precision map construction. They are mainly used for data alignment and accuracy assurance, recognition and automatic generation of map data, visual positioning, and high-precision map updates.
Question: Do you think the current level of basic research and hardware can sustain the rapid development of vision technology? Will vision technology hit bottlenecks that are difficult to break through in the near future?
Ren Xiaofeng: After several years of rapid progress in deep learning across the various areas of vision, the basic technologies of deep learning and vision have now, to some extent, hit a bottleneck. In other words, progress is no longer as fast as it was at the beginning. Many problems remain to be solved, and new techniques may need to be invented. For applications, I think the current basic technology and hardware are generally sufficient; what matters more is how to make good use of the technology and break through specific technical bottlenecks in a targeted way.
Question: Single-object tracking (SOT: tracking a single target from a given template, category-independent and cross-domain) has made remarkable progress in the past two years and has the potential to solve fast tracking. What are the prospects for applying it in current map businesses such as visual positioning (VO, tracking road signs) or AR navigation (short-term tracking)? If there are any, what requirements (robustness, speed, etc.) need to be addressed?
Ren Xiaofeng: Tracking is a basic vision technology that is used in many scenarios. For navigation and travel, it can indeed play a core role in AR navigation and positioning: it reduces the computation needed for recognition (detection) and increases robustness and smoothness. However, in many practical applications, the way tracking is used and what is required of it differ from the single-object tracking setting studied in academia.
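To make the point about tracking reducing detection cost concrete, here is a minimal sketch, not AutoNavi's method. It assumes a hypothetical `detect` function standing in for an expensive detector, and it uses plain constant-velocity extrapolation in place of a real visual tracker. The detector runs only on every fifth frame; the frames in between are filled in by the cheap predictor.

```python
def detect(true_pos):
    """Hypothetical stand-in for an expensive detector (returns the position)."""
    return true_pos


def run(true_positions, detect_every=5):
    """Detect on key frames; extrapolate at constant velocity in between."""
    estimates, detector_calls = [], 0
    pos, vel, last_det = 0.0, 0.0, None
    for i, truth in enumerate(true_positions):
        if i % detect_every == 0:
            # Key frame: pay for a full detection and refresh the velocity.
            pos = detect(truth)
            detector_calls += 1
            if last_det is not None:
                vel = (pos - last_det) / detect_every
            last_det = pos
        else:
            # Cheap frame: no detector call, just predict forward.
            pos += vel
        estimates.append(pos)
    return estimates, detector_calls


# A target moving 2 units per frame for 20 frames.
truth = [2.0 * i for i in range(20)]
estimates, calls = run(truth, detect_every=5)
print(calls, "detector calls for", len(truth), "frames")
```

The first interval has no velocity estimate yet, so the prediction lags until the second detection arrives. That kind of start-up and robustness behavior is one place where practical requirements can differ from benchmark settings.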
Question: Can visual features be combined with semantics to bring a better experience to map navigation and travel services?
Ren Xiaofeng: Vision can provide high-precision positioning as well as a semantic understanding of the scene, which will definitely bring a better navigation and travel experience. But the specific product experience and technical implementation still need further exploration and accumulated experience.
Question: What are the next key challenges in computer vision? What are the future prospects?
Ren Xiaofeng: Computer vision is a general-purpose sensing modality that carries a large amount of information, can serve a variety of sensing tasks, and works at a distance. Its application prospects are very broad and promising. The next challenges lie partly in the basic technology, which needs progress and breakthroughs. Others include: how to find application scenarios where vision can play a core role, how to design an overall solution around the actual problem, how to integrate multiple algorithms, how to make better use of limited computing resources, and how to combine vision with other sensors and prior knowledge.
Question: Is AR navigation computed from real-time images? Can the device's computing power keep up?
Ren Xiaofeng: AR navigation is computed from images in real time, and it delivers navigation and driver-assistance functions on low computing power. We also precompute as much as possible, calculating some elements of the environment in advance to complement the real-time computation.
Question: What does AR navigation ultimately use to display content: a screen or a HUD?
Ren Xiaofeng: AR navigation comes in a variety of product forms: the central control screen, HUD, rearview mirror, and dashboard. All of these are display methods that are in use or could be used.
Question: A non-technical question: will AR navigation draw too much of the driver's attention and cause him or her to ignore the traffic on both sides of the vehicle?
Ren Xiaofeng: This is a good product design question, and one where we have been refining the product and seeking a balance. A well-designed AR navigation product takes care not to draw too much of the driver's attention.
Question: Will driving-safety assistance include driver fatigue detection?
Ren Xiaofeng: AutoNavi's AR navigation currently uses only an outward-facing monocular camera, so it does not support fatigue detection. In-vehicle monitoring, including fatigue detection, is an important application of vision technology in driving-safety assistance.
Question: What are the current mainstream implementation technologies for indoor positioning? Is the prospect of indoor navigation based on acoustic signals good?
Ren Xiaofeng: There are many sensor-based technologies for indoor positioning, including WiFi, Bluetooth, RFID, Ultra-Wideband (UWB), and acoustic signals. I think that when sensors have to be deployed, the development of indoor positioning depends to a large extent not on the technology or its positioning accuracy, but on whether there are good applications. WiFi positioning is popular because indoor networks need WiFi anyway. The iPhone 11 is equipped with a UWB chip that can be used for close-range file transfer.
Question: What causes such large errors in GPS positioning? Is it multipath effects?
Ren Xiaofeng: There are many reasons for inaccurate GPS positioning, which occurs mainly in "urban canyon" scenes (among high-rise buildings). Multipath is the most important factor: signals reflected by the environment (especially highly reflective materials such as glass) make the GPS position calculation inaccurate. Other causes include a reduced number of observable satellites due to obstruction by buildings and viaducts, and atmospheric interference (especially from charged ions and water vapor).
Question: How does AutoNavi solve the problem of GPS drift?
Ren Xiaofeng: This is a complicated issue. Working from mobile phone sensors, we have made many optimizations for real driving and walking scenarios, including GPS confidence analysis, fusion with the IMU, and fusion with the road network. Visual positioning is a new direction we are pioneering to address inaccurate positioning.
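As a rough illustration of how GPS confidence and motion sensing can work together, here is a minimal sketch of a scalar Kalman-style filter along a one-dimensional road axis. It is not AutoNavi's algorithm. The names `fuse` and `track`, the fixed speed standing in for IMU integration, and all the numbers are assumptions made for the example; road-network matching is not covered.

```python
def fuse(pred, pred_var, gps, gps_var):
    """One scalar Kalman update: blend a motion prediction with a GPS fix."""
    gain = pred_var / (pred_var + gps_var)
    est = pred + gain * (gps - pred)
    est_var = (1.0 - gain) * pred_var
    return est, est_var


def track(start, speed_mps, dt, fixes, process_var=1.0):
    """fixes: list of (gps_position_m, gps_sigma_m) along a 1-D road axis."""
    est, var = start, 1.0
    out = []
    for gps, sigma in fixes:
        # Predict from motion (a stand-in for IMU / odometry integration).
        est, var = est + speed_mps * dt, var + process_var
        # Correct with GPS, weighted by its reported uncertainty.
        est, var = fuse(est, var, gps, sigma ** 2)
        out.append(est)
    return out


# The third fix drifts by 40 m but reports low confidence (sigma = 50 m).
fixes = [(10.0, 3.0), (20.0, 3.0), (70.0, 50.0), (40.0, 3.0)]
estimates = track(0.0, 10.0, 1.0, fixes)
print([round(e, 2) for e in estimates])
```

Because the drifting fix carries a large variance, its gain is close to zero and the estimate stays near the motion prediction of 30 m instead of jumping to 70 m. This only works if the confidence assigned to each fix is trustworthy, which is why confidence analysis matters.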
Basic map technology
Question: What map layers does AutoNavi currently have? Are they semantic high-precision maps?
Ren Xiaofeng: AutoNavi Maps has several forms of map data, from standard maps (as seen in the AutoNavi app) to lane-level maps to high-precision maps. They differ in accuracy and in the applications they serve. Many of these maps contain semantic information, but its content and accuracy vary.
Question: What is the difference between a depth camera and an ordinary camera?
Ren Xiaofeng: An ordinary camera captures a two-dimensional RGB image with no three-dimensional information. In addition to RGB color, a depth camera also obtains depth (distance) for each pixel, generally using an active method (time-of-flight, structured light, etc.). Many mainstream mobile phones are now equipped with depth cameras.
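To show what per-pixel depth adds, here is a small sketch that back-projects a pixel into a 3D point using the standard pinhole camera model. The intrinsics (`FX`, `FY`, `CX`, `CY`) are made-up values for a hypothetical 640x480 sensor, and lens distortion is ignored.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Map pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


# Hypothetical intrinsics: focal lengths and principal point, in pixels.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

# Two pixels on the same image row, both measured at 2 m depth.
center = backproject(320, 240, 2.0, FX, FY, CX, CY)
right = backproject(420, 240, 2.0, FX, FY, CX, CY)
print(center, right)
```

With an RGB image alone, the pixel at (420, 240) fixes only a viewing ray. The depth value selects a single point on that ray, here about 0.4 m to the right of the optical axis.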
Question: How does AutoNavi Map collect road information? Will the map be updated in real time if there is a road change?
Ren Xiaofeng: AutoNavi Maps draws road information from multiple sources, relying mainly on low-cost vehicle-mounted video data. Road-related information changes all the time. We continuously collect the latest information, produce updated map data, and release it to the app promptly.
Question: What are the difficulties in mapping indoor three-dimensional spaces (such as multi-storey commercial buildings)?
Ren Xiaofeng: The biggest difficulty in building indoor 3D maps is data collection. 3D reconstruction methods require images from multiple angles, and modeling with a moving depth camera may not be accurate enough to meet requirements.
Newcomer's career growth
Question: When moving from academic research in vision and imaging to developing commercial computer vision applications at a company, what knowledge needs to be added?
Ren Xiaofeng: I think the main point is not to add specific knowledge but to develop your abilities in several respects: (1) the ability to analyze and solve practical problems; (2) hands-on ability; (3) the ability to learn quickly and broaden your knowledge.
Question: How should one plan a career in computer vision?
Ren Xiaofeng: Career planning here is not essentially different from that in other industries and technical directions. You need to weigh your own strengths, weaknesses, and interests to find the direction that suits you, gradually improve your technical depth, breadth, and perspective along with your overall ability, and deliver real results step by step to advance your career.
Question: Is it necessary to have deep learning skills to work in the field of vision now?
Ren Xiaofeng: Computer vision now uses deep learning extensively, so I think deep learning knowledge and skills are necessary. Some geometry-related subfields, such as 3D reconstruction and SLAM/VIO, do not yet use much deep learning, but (1) more deep learning applications are expected there in the future; and (2) to broaden your technical range and perspective, you still need to understand deep learning to some extent.
Industry hot spots and others
Question: Will 5G technology be used in autonomous driving?
Ren Xiaofeng: At present, it looks as though 5G will have many applications in autonomous driving, but for L4/L5 autonomous driving, I do not think 5G can fundamentally solve the problem of safety (and comfort).
Question: How do on-device computing and the cloud work together in tracking and positioning?
Ren Xiaofeng: Generally speaking, tasks that need high real-time performance and are closely tied to the sensors are done on the device; tasks that are closely tied to the map and need a large amount of reference data are done in the cloud.
Question: Google Maps has a street view map module that uses many image recognition technologies. How is the street view map assembled? And what is the development trend of Street View?
Ren Xiaofeng: The street view imagery in Google Maps comes mainly from Google's own street view collection vehicles, which are equipped with high-quality cameras and integrated inertial navigation sensors. Building a street view map is mainly a stitching process. Street view maps are very interesting, but they have not yet brought fundamental changes to the navigation and travel experience. Google's recent AR pedestrian navigation (which is different from AutoNavi's in-vehicle AR navigation) is a new application based on street view maps.
Question: How can wearable devices (similar to glasses, smart assistants, etc.) be better implemented and commercialized in terms of visual technology?
Ren Xiaofeng: Hardware (AR display, computing power) and user experience are the main obstacles to wearable devices being truly deployed and widely adopted. As a pioneering product, Google Glass was too limited by its hardware. At present, AR glasses are used mainly in enterprise scenarios. I personally think wearable devices have very good prospects as personal assistants (including navigation, information display, etc.), but the hardware may not yet be mature.