How generating images from a sketch helps to improve your Zoom call

Hello, wonderful person,

Last week I promised you a paper on video encoding with neural networks. To be frank, I cannot deliver that paper, as it does not exist in public :/ The technology I had in mind was NVIDIA's AI Video Compression, but sadly there is no public paper on that topic, as (I assume) NVIDIA considers it a trade secret. After digging through the internet, however, I found a public paper by NVIDIA that is thought to contain the preprocessor for the video compression product they plan to sell. The paper describes how to generate images from a drawing, which is a problem similar to video encoding: create a good-looking image from incomplete information.


We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers. We show that this is suboptimal as the normalization layers tend to "wash away" semantic information. To address the issue, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned transformation. Experiments on several challenging datasets demonstrate the advantage of the proposed method over existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows user control over both semantic and style.
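The core trick is easy to sketch: normalize the activations as usual, but then scale and shift them with per-pixel parameters derived from the semantic layout, rather than with a single scalar pair per channel. Below is a minimal, hypothetical NumPy sketch of that modulation step; in the actual paper, the scale and shift maps (gamma and beta) are produced by small convolutional networks applied to the segmentation map, whereas here they are simply passed in as arrays.

```python
import numpy as np

def spade_norm(features, gamma_map, beta_map, eps=1e-5):
    """Sketch of spatially-adaptive normalization (not the official implementation).

    features:  (C, H, W) activation tensor from some layer of the generator
    gamma_map: (C, H, W) per-pixel scale, assumed to be predicted from the layout
    beta_map:  (C, H, W) per-pixel shift, assumed to be predicted from the layout
    """
    # Normalize each channel to zero mean / unit variance,
    # as a standard normalization layer would.
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # Modulate with spatially varying scale and shift instead of one
    # scalar pair per channel -- this is what keeps the semantic
    # information from being "washed away" by the normalization.
    return gamma_map * normalized + beta_map

# Toy usage: identity modulation leaves the normalized activations untouched.
x = np.random.randn(3, 4, 4)
y = spade_norm(x, np.ones_like(x), np.zeros_like(x))
```

Because the modulation parameters vary per pixel, regions belonging to different semantic classes (sky, grass, water, ...) receive different affine transforms, which is how the layout information survives the normalization.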

Download Link:

Subscribe to the Weekly CS Paper Newsletter to get a computer science paper every weekend
No risk. One click unsubscribe.