Tencent Audio and Video Lab: Using AI black technology to achieve ultra-low bit rate HD real-time video chat


1. Introduction


Since Apple introduced the Retina display with the iPhone 4, mobile phone screen resolutions have advanced by leaps and bounds, and 1920x1080 or higher is now standard. Real-time audio and video chat, however, is still constrained by uplink bandwidth. A considerable number of users can only send low-resolution video streams at bitrates below 250 kbps, which wastes their high-definition screens.
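To make the bandwidth constraint concrete, the sketch below computes how many encoded bits are available per pixel at 250 kbps for several resolutions. Only the 250 kbps figure comes from the text above; the 15 fps frame rate and the list of resolutions are assumptions chosen for illustration.

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of encoded bits available for each pixel of each frame."""
    return bitrate_bps / (width * height * fps)


BITRATE_BPS = 250_000  # uplink bitrate mentioned in the article
ASSUMED_FPS = 15       # assumption: the article does not state a frame rate

# Assumed example resolutions, from a small stream up to a full-HD screen.
RESOLUTIONS = [(320, 240), (640, 480), (1280, 720), (1920, 1080)]

budget = {
    (w, h): bits_per_pixel(BITRATE_BPS, w, h, ASSUMED_FPS)
    for (w, h) in RESOLUTIONS
}

for (w, h), bpp in budget.items():
    print(f"{w}x{h} @ {ASSUMED_FPS} fps: {bpp:.4f} bits per pixel")
```

At a fixed bitrate the per-pixel budget shrinks in proportion to the pixel count, which is why a sender on a weak uplink falls back to a small resolution.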

How can we let users see higher-definition real-time video at the receiving end without increasing the uplink bitrate?



Super-resolution is an image magnification technique that has developed rapidly in recent years. It raises the resolution of an original image through hardware or software: the process of obtaining a high-resolution image from one or more low-resolution images is called super-resolution reconstruction. Can super-resolution be applied to real-time video, so that viewers see higher-resolution video at low or even ultra-low bitrates?

Traditional super-resolution analyzes the texture of an image to determine its direction and enhances the image on that basis. In the past year, using machine learning for super-resolution has increasingly become the trend. A neural network can better learn the residual between low-resolution and high-resolution images and repair distortion to restore a higher resolution. Machine-learning-based network models such as SRCNN, SRResNet, and VDSR show clear advantages in super-resolution quality.
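The residual idea can be sketched as follows: a cheap interpolation produces the enlarged image, and the network only has to predict the difference from the true high-resolution image. This is a minimal illustration, not the lab's model. Nearest-neighbor enlargement stands in for the interpolation step, and the `residual` values are hand-written stand-ins for a network output.

```python
def upscale_nearest(img, factor):
    """Enlarge a 2D grayscale image (list of rows) by repeating pixels."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def apply_residual(upscaled, residual):
    """Add a predicted residual to the enlarged image, clamped to [0, 255]."""
    return [
        [max(0, min(255, u + r)) for u, r in zip(u_row, r_row)]
        for u_row, r_row in zip(upscaled, residual)
    ]


low_res = [[10, 200],
           [60, 120]]
base = upscale_nearest(low_res, 2)

# Hypothetical network output: one correction value per high-resolution pixel.
residual = [[0, 0, 60, -5],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 60, 0]]
high_res = apply_residual(base, residual)
```

Because most residual values are close to zero, the network can spend its capacity on edges and fine detail rather than on reproducing the whole image.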

The Tencent Audio and Video Lab team and Dai Yurong of Youtu Lab's X-lab have jointly developed a neural-network-based real-time video super-resolution technology. With an extremely small network model, it brings machine-learning-based super-resolution to real-time video calls on mobile phones and subjectively raises the resolution by one level. This technology will soon be applied to real-time video chat in the iOS version of mobile QQ 7.3.5.

2. Technical background


Many readers may already know that neural-network-based super-resolution is widely used on still pictures. For example, the super-resolution technology from Youtu Lab's X-lab has been deployed in QQ Space albums. For real-time video chat, however, the Audio and Video Lab's work on two-person video chat in mobile QQ should be the company's first product to use super-resolution.

Unlike picture super-resolution, super-resolution in real-time video chat has more stringent requirements on both performance and quality: it must subjectively raise the resolution by one level within a small performance budget. Let's first look at the difference between picture super-resolution and video super-resolution.



Precisely because adjacent video frames are similar, video super-resolution can exploit the correlation between preceding and following frames and use its statistical characteristics to further improve the result. Especially at magnifications above 2x, multi-frame video super-resolution models based on this inter-frame correlation significantly outperform traditional magnification algorithms.
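The difference between single-frame and multi-frame input can be sketched as a sliding window over the frame sequence. The function name `frame_windows` and the one-frame radius are hypothetical choices for illustration; the article does not say how many neighboring frames a multi-frame model uses.

```python
def frame_windows(frames, radius=1):
    """Yield a window of neighboring frames around each frame.

    Frames at the start and end of the sequence are repeated so that
    every window has the same length (2 * radius + 1).
    """
    last = len(frames) - 1
    for t in range(len(frames)):
        yield [
            frames[min(max(t + d, 0), last)]
            for d in range(-radius, radius + 1)
        ]


frames = ["f0", "f1", "f2", "f3"]   # placeholders for decoded frames

single_frame_inputs = [[f] for f in frames]        # frame-by-frame model
multi_frame_inputs = list(frame_windows(frames))   # multi-frame model
```

A multi-frame model receives several times more input per output frame, which is one reason its computational cost is higher than that of a frame-by-frame model.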

3. Overall plan


Although multi-frame video super-resolution models have clear advantages, most mobile phones at this stage cannot handle their computational load. To bring black technology such as video super-resolution to more phones, we ran repeated experiments, weighed the trade-offs, kept optimizing performance and quality, and finally chose a frame-by-frame super-resolution scheme for real-time video chat. Below is the flow chart of our super-resolution scheme.



We train the neural network on video before and after encoding and decoding, so that the network learns to restore both kinds of distortion, the one caused by the codec and the one caused by scaling. Edges become clear without amplifying the distortion introduced by encoding. At the same time, we optimized the network structure and the performance of the feedforward library to achieve real-time frame-by-frame super-resolution on mobile phones. It is also worth mentioning that we innovatively apply a classic enhancement algorithm after super-resolution. The reason is that once the super-resolution algorithm has eliminated aliasing, a simple enhancement algorithm can make edges clearer and details richer. Together, these steps give the subjective effect of raising the resolution by one level.
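A rough sketch of two ideas from this section: building a training pair whose input carries both scaling and coding distortion, and sharpening after super-resolution. Everything here is a stand-in chosen for illustration. 2x2 averaging plays the role of downscaling, coarse quantization plays the role of codec loss, and an unsharp mask on a single scanline plays the role of the classic enhancement algorithm, which the article does not name.

```python
def degrade(frame, step=16):
    """Halve a 2D grayscale frame by 2x2 averaging, then quantize coarsely.

    Quantization is only a crude stand-in for real codec distortion.
    Frame height and width are assumed to be even.
    """
    small = []
    for y in range(0, len(frame), 2):
        row = []
        for x in range(0, len(frame[0]), 2):
            avg = (frame[y][x] + frame[y][x + 1]
                   + frame[y + 1][x] + frame[y + 1][x + 1]) // 4
            row.append(min(255, (avg + step // 2) // step * step))
        small.append(row)
    return small


def make_training_pair(original):
    """Return (network input, training target) for one frame."""
    return degrade(original), original


def unsharp_mask(scanline, amount=0.5):
    """Sharpen one row of pixels by adding back its difference from a 3-tap blur."""
    last = len(scanline) - 1
    out = []
    for i, p in enumerate(scanline):
        left = scanline[max(i - 1, 0)]
        right = scanline[min(i + 1, last)]
        blurred = (left + p + right) / 3
        sharpened = p + amount * (p - blurred)
        out.append(max(0, min(255, round(sharpened))))
    return out


original = [[10, 20, 200, 210],
            [30, 40, 220, 230]]
net_input, target = make_training_pair(original)

edge = [50, 50, 50, 110, 110, 110]   # a soft step edge
sharper = unsharp_mask(edge)
```

Training on the degraded input rather than on a cleanly downscaled one is what lets the network avoid sharpening coding artifacts along with real edges.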



The following figure compares screenshots of a decoded H.265 video stream after bicubic enlargement and after super-resolution enlargement:

The performance and quality comparison of our network model with other models is as follows:



4. Performance optimization


The biggest difficulty in using neural-network-based super-resolution in real-time video calls is that the amount of computation is too large. Optimizing performance is a systems engineering effort that touches every level. In addition to optimizing the neural network structure, we also implemented the feedforward network library on the iPhone's GPU.
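To give a feel for why the amount of computation is the bottleneck, the sketch below counts multiply-accumulate operations for stacks of 3x3 convolution layers. All numbers are assumptions for illustration: the small network is hypothetical and is not the model described here, the deep layout assumes the commonly described VDSR structure of 20 layers with 64 channels, and the 640x360 output size and 15 fps are arbitrary choices.

```python
def conv_macs_per_pixel(layers, kernel=3):
    """Multiply-accumulates per output pixel for a stack of conv layers.

    `layers` is a list of (input_channels, output_channels) pairs; every
    layer is assumed to apply a kernel x kernel filter at full resolution.
    """
    return sum(c_in * c_out * kernel * kernel for c_in, c_out in layers)


def macs_per_second(layers, width, height, fps):
    """Total multiply-accumulates needed for one second of video."""
    return conv_macs_per_pixel(layers) * width * height * fps


# Hypothetical small network: NOT the model described in this article.
small_net = [(1, 16), (16, 16), (16, 1)]

# Assumed VDSR-like layout: 20 layers of 3x3 convolutions, 64 channels.
vdsr_like = [(1, 64)] + [(64, 64)] * 18 + [(64, 1)]

WIDTH, HEIGHT, FPS = 640, 360, 15  # assumed output size and frame rate

small_cost = macs_per_second(small_net, WIDTH, HEIGHT, FPS)
deep_cost = macs_per_second(vdsr_like, WIDTH, HEIGHT, FPS)

print(f"small net: {small_cost / 1e9:.2f} GMAC/s")
print(f"VDSR-like: {deep_cost / 1e9:.2f} GMAC/s")
```

The cost grows with the product of input and output channels in every layer, so shrinking channel counts and depth reduces computation far faster than shrinking the frame size does.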

The following are the performance optimizations we have done at various levels:


Through these optimizations, we benchmarked our model against VDSR, the best-performing model for frame-by-frame super-resolution. With the enlarged image quality aligned, our model is more than 10 times faster than VDSR.

The following is the performance data of the final online version:

5. Running effect in mobile QQ


The figure below compares the performance data of mobile QQ 7.3.5 with super-resolution switched on and off.

With the same frame rate and bitrate and almost no increase in latency, the clarity score measured with a dedicated clarity test tool improves significantly:

The following video shows the effect of super-resolution in a mobile QQ 7.3.5 video call:


6. Future prospects


The Audio and Video Lab's real-time video super-resolution technology is a first bold attempt to use a large-scale neural network in a scenario with a tight performance budget, such as real-time video on a mobile client, and it has achieved good results so far. Besides improving video quality, video super-resolution can also save uplink and downlink bandwidth while maintaining the same video quality.

The next stage of the video super-resolution project focuses on the following two aspects:

  • 1) The multi-frame super-resolution model for real-time video is still being optimized, and good progress has been made. It will further improve the super-resolution effect at the same performance cost. The first version is expected to launch in the next mobile QQ release;
  • 2) Continue to optimize the performance of the neural network feedforward library and cover more platforms.

To sum up, we will continue to optimize real-time video super-resolution, and we believe that this kind of black technology will empower more scenarios.

(Original link: mp.weixin.qq.com/s/krxgdOmE_... )
