SIGGRAPH 2018 Day 3

Today was a mixed-up adventure in time and space. In the morning I tried to get into the VR session to hear two state-of-the-art face papers, one from TUM and the other from Facebook Reality Labs. Both tackle the problem of showing a person’s genuine full-face expression in VR while both participants wear headsets.

On one side, Matthias Niessner and his star face synthesis team approach the problem by building on their Face2Face work. The advantage is that, because they use a generic face model, the representation is not strongly subject dependent, so no calibration or pre-capture is necessary. However, since only an infrared camera inside the headset is used (for eye-gaze tracking), the upper face’s expression may not be preserved.

Facebook, by contrast, uses a high-quality subject-dependent model for this work and applies deep learning to composite the teeth. The quality looks better; however, it requires a pre-capture session for each subject.


And thanks to my friends from Pixar: this year we noticed there was no booth for the animation studio, so we didn’t know where to pick up the RenderMan teapot. It turned out they hand them out after their one-hour RenderMan 22 demo talk. It was actually a really good talk, covering 30 years of RenderMan development, from a scanline renderer to ray tracing and then path tracing. They gave up the old infrastructure in favor of physically correct, simpler models. I’m glad to see that, at this stage, ray-traced lighting can be achieved at interactive speeds. With the help of Nvidia’s RTX, I think production time for every stage of animation can shrink, and we could see more ideas in movies, since trying out new storylines, cameras, actions, etc. becomes cheaper. But the most important thing: I got my teapot!

The Real-Time Live! demo session was also crazy. The combined Nvidia RTX, ILMxLAB, and Unreal VR virtual movie shoot demo is a total game changer for how we can make movie-quality shots in real time with everyone inside a virtual environment. I can imagine that in the near future, individual shots may be captured in this kind of real-time ray-tracing environment. The director could then cut the movie for review and, if necessary, hand the shot to the offline renderer for the final movie images.

CVPR Experiences: Conference Session 2

On Tuesday, the main event was the Face session! However, we have to admit that face-related research is not a main focus of CVPR, as the session name indicates: “Computational Photography and Faces”. Sure, there are a lot of posters on face modeling and expression detection, but the limited number of oral sessions shows where the trend is right now. No worries, though: at SIGGRAPH, the Face session is always packed!

The afternoon oral session had 5 presentations, which definitely showcased the state-of-the-art work in this area. I loved this session because this is my own field, and of course, our lab contributed to one of the papers!

13. Recurrent Face Aging: a cool 2D dataset containing many faces that cover a large range of ages is created and used to predict how a person’s face will look when aged.

14. Face2Face: Real-Time Face Capture and Reenactment of RGB Videos: What can I say? It is the jaw-dropping demo video we have been seeing since the last SIGGRAPH Asia. This time they updated the model to work with only 2D RGB video. The presentation was cool because it ended with a live demo with Putin as the target actor! The Basel Face Model (a 3D morphable model) is used for identity morphing, which requires the user to show a frontal face and then rotate left and right to build a subject-dependent 3D face model. This initialization takes about 30 seconds, after which we obtain a fully controllable avatar. The albedo texture is also estimated. From my own test of the demo, the system is pretty nice and smooth; however, don’t expect it to handle directional light, since the lighting seems to be treated more like a global light source. Even without a tongue model, it still models lip movement accurately, so normal speech should be fine. There are more interesting stories behind this. For me, it was such a nice experience to meet the authors here at CVPR!

15. Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions: a stable face region is located within the overall face image, and the model can then estimate the heart rate directly from the image data. I was so glad to see that the demo and illustration videos/images actually come from our database!


16. Visually Indicated Sounds: MIT always has the guts to do cool stuff. The authors noticed that humans can predict what a material sounds like pretty well from images alone. So they spent a lot of time hitting “A LOT” of objects with a drumstick while recording with a video camera. Then they trained a deep learning model so that the machine can pick up the motion of the drumstick hitting certain “objects” and synthesize the corresponding audio.
We know that in movies, sound designers sometimes cannot record the real sound of a scene for various reasons and need to create the audio effect with other objects. This CVPR paper is like an automatic way to do this.

17. Image Style Transfer Using Convolutional Neural Networks: Want to transfer Van Gogh’s painting style to your image automatically? This is the paper for you.
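For the Face2Face item (14) above: at its core, the shape part of a 3D morphable model is a linear combination, a mean face plus weighted identity and expression basis vectors. The sketch below uses a tiny made-up model (hypothetical values; the real Basel Face Model has tens of thousands of vertices and learned PCA bases) just to show how changing coefficients deforms the mesh. It illustrates the general morphable-model idea, not Face2Face’s actual fitting code.

```python
from typing import List

Vector = List[float]


def morph(mean: Vector, id_basis: List[Vector], alpha: Vector,
          exp_basis: List[Vector], delta: Vector) -> Vector:
    """Return mean + sum_i alpha_i * id_basis_i + sum_j delta_j * exp_basis_j.

    Each vector stores flattened vertex coordinates (x0, y0, z0, x1, ...).
    """
    if len(id_basis) != len(alpha) or len(exp_basis) != len(delta):
        raise ValueError("coefficient count must match basis count")
    shape = list(mean)
    for coeff, basis in zip(alpha, id_basis):
        for k, b in enumerate(basis):
            shape[k] += coeff * b
    for coeff, basis in zip(delta, exp_basis):
        for k, b in enumerate(basis):
            shape[k] += coeff * b
    return shape


# Hypothetical toy model with 2 vertices (6 coordinates).
mean_face = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
identity = [[0.0, 0.1, 0.0, 0.0, 0.1, 0.0]]     # e.g. a "face height" direction
expression = [[0.0, 0.0, 0.0, 0.0, -0.2, 0.0]]  # e.g. a "jaw drop" direction

neutral = morph(mean_face, identity, [1.0], expression, [0.0])
open_mouth = morph(mean_face, identity, [1.0], expression, [1.0])
print(neutral)
print(open_mouth)
```

In a setup like the one described, the identity coefficients (alpha) would be estimated once during the roughly 30-second initialization, and per-frame tracking would then mainly update expression (delta), pose, and lighting.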
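For the heart-rate item (15) above: the paper’s self-adaptive matrix completion is beyond a blog-sized example, but the raw signal such methods start from can be sketched. A common baseline averages one color channel (often green) over a stable face region in each frame, then picks the dominant frequency in a plausible heart-rate band. The code below is that generic baseline, not the paper’s method; the 0.7 to 4 Hz band (42 to 240 bpm), the 30 fps frame rate, and the synthetic input are all assumptions.

```python
import math
from typing import List


def estimate_bpm(green_means: List[float], fps: float,
                 low_hz: float = 0.7, high_hz: float = 4.0) -> float:
    """Estimate heart rate (beats per minute) from per-frame mean green values.

    Removes the mean, evaluates a naive DFT for every bin inside the band,
    and returns the strongest bin's frequency converted to bpm.
    """
    n = len(green_means)
    avg = sum(green_means) / n
    signal = [v - avg for v in green_means]
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2):
        freq = k * fps / n  # frequency of DFT bin k in Hz
        if not (low_hz <= freq <= high_hz):
            continue
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


# Synthetic 10 s clip at 30 fps: a 1.2 Hz (72 bpm) pulse plus a slow linear drift.
fps = 30.0
frames = [100.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) + 0.01 * i
          for i in range(300)]
print(round(estimate_bpm(frames, fps)))
```

The frequency resolution is fps / n, so a 10-second clip gives 0.1 Hz (6 bpm) steps; under realistic conditions (head motion, lighting changes) this naive baseline degrades quickly, which is the problem the paper targets.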
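For the style transfer item (17) above: the key idea is to describe style by correlations between CNN feature channels, collected in a Gram matrix G = F Fᵀ, where each row of F is one channel’s activations flattened over spatial positions. A minimal sketch of the Gram matrix and the per-layer style loss E = 1/(4 N² M²) Σ (G − A)² (N channels, M positions) follows. The tiny feature maps are made-up numbers standing in for real VGG activations, which this sketch does not compute.

```python
from typing import List

Matrix = List[List[float]]


def gram(features: Matrix) -> Matrix:
    """Gram matrix G[i][j] = sum_k F[i][k] * F[j][k].

    `features` has one row per channel (N rows); each row holds that channel's
    activations flattened over the M spatial positions.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


def style_layer_loss(generated: Matrix, style: Matrix) -> float:
    """Per-layer style loss: 1 / (4 N^2 M^2) * sum_ij (G_ij - A_ij)^2."""
    n = len(generated)     # number of channels N
    m = len(generated[0])  # number of spatial positions M
    g, a = gram(generated), gram(style)
    sq = sum((g[i][j] - a[i][j]) ** 2 for i in range(n) for j in range(n))
    return sq / (4.0 * n * n * m * m)


# Hypothetical 2-channel, 3-position feature maps.
style_feats = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
gen_feats = [[1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
print(gram(style_feats))  # [[5.0, 2.0], [2.0, 2.0]]
print(style_layer_loss(gen_feats, style_feats))
```

Because the Gram matrix sums over positions, it discards where features occur and keeps only which features co-occur, which is why it captures brush-stroke texture rather than scene layout.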

Here are some more photos covering the topics of the second day.

CVPR Experiences: Conference Session 1

The first day of CVPR was packed with good talks that show the current trends in computer vision research. Day one was dominated by object detection work, especially using convolutional neural networks (CNNs, i.e., the deep learning approach).

Here I report just some of the interesting work:

Matching and Alignment:

    1. Learning to Assign Orientations to Feature Points: implicitly including CNN-learned orientations in 3D reconstruction helps recover missing parts of the alignment, so you get fewer holes. It seems the orientation of an image patch can play a key role in image alignment.
    2. Learning Dense Correspondence via 3D-Guided Cycle Consistency: applied directly to cars, this paper addresses how to find dense matches between two images. The similarity needs to hold at the component level. In this way, you can reconstruct image B from image A’s pixels while still maintaining the structure and orientation of image B. It also shows how to align a 3D model to a 2D image. To handle possible occlusion, they learn matchability. A possible extension would be to extend the patch to the entire target so that even under occlusion we can fully recover the image.
    3. The Global Patch Collider: finds matching patches across different images using forest voting.
    4. Joint Probabilistic Matching Using m-Best Solutions: a small optimization that uses a weighted sampling function to choose several sub-optimal solutions.
    5. Face Alignment Across Large Poses: A 3D Solution. Traditionally, face alignment relies on the assumption that all tracked landmark points are visible, which is too strong. For training on large-pose data, we normally don’t have this kind of labeled data. In this paper, the authors synthesize training data by fitting a morphable model to faces in arbitrary poses with known pose information, then obtain the 3D positions together with the corresponding 2D intensities and the pose. A CNN can then be trained to locate the correspondences.
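Why synthesized data helps for item 5 can be seen from the geometry: once a 3D face model is fitted and the pose is known, the 2D position of every landmark, including self-occluded ones, follows from projecting the 3D points. Below is a minimal weak-perspective projection sketch (2D = scale × first two rows of R·X + translation), a common camera model in face alignment. The yaw-only rotation, the landmark coordinates, and the scale and translation values are hypothetical, chosen for illustration; this is not the paper’s data or its exact pipeline.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def yaw_matrix(yaw_deg: float) -> List[List[float]]:
    """Rotation matrix about the vertical (y) axis by `yaw_deg` degrees."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def weak_perspective(points: List[Point3], rot: List[List[float]],
                     scale: float, tx: float, ty: float) -> List[Point2]:
    """Project 3D points: p2d = scale * (first two rows of R) @ X + (tx, ty)."""
    out = []
    for x, y, z in points:
        u = rot[0][0] * x + rot[0][1] * y + rot[0][2] * z
        v = rot[1][0] * x + rot[1][1] * y + rot[1][2] * z
        out.append((scale * u + tx, scale * v + ty))
    return out


# Hypothetical 3D landmarks in model coordinates: left eye, right eye, nose tip.
landmarks = [(-30.0, 40.0, 0.0), (30.0, 40.0, 0.0), (0.0, 0.0, 25.0)]
# A large 70-degree yaw, the kind of pose that is hard to label by hand.
labels = weak_perspective(landmarks, yaw_matrix(70.0), scale=2.0, tx=128.0, ty=128.0)
for u, v in labels:
    print(f"({u:.1f}, {v:.1f})")
```

Because the labels are computed rather than hand-annotated, every landmark gets a 2D position regardless of visibility, which is exactly the large-pose labeled data that is otherwise missing.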

The spotlight session covered Segmentation and Contour Detection.

  1. Affinity CNN: Learning Pixel-Centric Pairwise Relations for Figure/Ground Embedding: worth looking into.

Then I basically went to the poster session and took photos of the posters I was interested in. One poster on a real-time (80 fps) CNN, with lower accuracy, caught my attention. With low memory bandwidth, full code, and a “How to run” tutorial, this could be a very good way to try out some cool ideas. The details can be found at

Here are some poster photos:

At the end of the first day, the best paper award and related work were announced. MSRA’s new deep learning model, “Deep Residual Learning for Image Recognition”, shows Microsoft’s position in this deep learning battle. It won all the major competitions in 2015; the model may not sound very elegant, but it works. Giving the best paper award to this paper set the tone of this CVPR: still “Deep Learning”. Later during CVPR we heard that one of the paper’s authors, Jian Sun, had been recruited away from Microsoft Research Asia to Face++ with a super good salary (reportedly an 8-digit figure in Chinese yuan). As far as I know, good PhD students focusing on deep learning now normally don’t worry about jobs or salary at all. The market is tilted in their favor because there is so much data but so few people know how to mine it.
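The core idea of the residual network is simple to state: instead of learning a mapping H(x) directly, each block learns a residual F(x) and outputs F(x) + x through an identity shortcut, with a ReLU after the addition. Here is a toy sketch with a hypothetical two-layer fully connected F on plain Python lists; real ResNets use convolutions and batch normalization, which are omitted here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """y = relu(F(x) + x), with the residual branch F(x) = w2 @ relu(w1 @ x)."""
    f = linear(w2, relu(linear(w1, x)))
    return relu([fi + xi for fi, xi in zip(f, x)])  # identity shortcut


x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
# With all-zero weights F(x) = 0, so the block reduces to relu(x).
print(residual_block(x, zero, zero))  # [1.0, 0.0]
```

This illustrates the intuition behind why very deep stacks stay trainable: if extra layers are not useful, a block only has to push F toward zero to pass its input through, rather than learn an identity mapping from scratch.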