Publication List

2016
Zhang, Zheng; Girard, Jeff M.; Wu, Yue; Zhang, Xing; Liu, Peng; Ciftci, Umur; Canavan, Shaun; Reale, Michael; Horowitz, Andy; Yang, Huiyuan; Cohn, Jeffrey F.; Ji, Qiang; Yin, Lijun: Multimodal spontaneous emotion corpus for human behavior analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), pp. 3438-3446, 2016. URL: http://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Zhang_Multimodal_Spontaneous_Emotion_CVPR_2016_paper.html
Tags: 3D facial expression, Spontaneous expression
Abstract: Emotion is expressed in multiple modalities, yet most research has considered at most one or two. This stems in part from the lack of large, diverse, well-annotated, multimodal databases with which to develop and test algorithms. We present a well-annotated, multimodal, multidimensional spontaneous emotion corpus of 140 participants. Emotion inductions were highly varied. Data were acquired from a variety of sensors of the face that included high-resolution 3D dynamic imaging, high-resolution 2D video, and thermal (infrared) sensing, and contact physiological sensors that included electrical conductivity of the skin, respiration, blood pressure, and heart rate. Facial expression was annotated for both the occurrence and intensity of facial action units from 2D video by experts in the Facial Action Coding System (FACS). The corpus further includes derived features from 3D, 2D, and IR (infrared) sensors and baseline results for facial expression and action unit detection. The entire corpus will be made available to the research community.
2015
Zhang, Xing; Ciftci, Umur A.; Yin, Lijun: Mouth Gesture based Emotion Awareness and Interaction in Virtual Reality. In: SIGGRAPH '15: ACM SIGGRAPH 2015 Posters, Article 26, ACM, 2015. ISBN: 978-1-4503-3632-1. DOI: 10.1145/2787626.2787635
Tags: 3D facial expression, GPU, Virtual reality
Abstract: In recent years, Virtual Reality (VR) has become a new medium that offers users an immersive experience. Events happening in VR connect more closely to our emotions than those in other interfaces, and these emotional variations are reflected in our facial expressions. However, current VR systems concentrate on "giving" information to the user while ignoring the user's emotional status, even though this information clearly contributes to media content rating and the user experience. At the same time, traditional controllers become difficult to use because the headset obscures the user's view. Hand and head gesture based control is an option [Cruz-Neira et al. 1993], but it requires wearing sensor devices to assure control accuracy, and users tire easily. Although face tracking achieves accurate results in both 2D and 3D scenarios, current state-of-the-art systems fail when half of the face is occluded by the VR headset, because their shape models are trained on data from the whole face.
Zhang, Xing; Yin, Lijun; Cohn, Jeffrey F.: Three dimensional binary edge feature representation for pain expression analysis. In: Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on, vol. 1, 2015. ISBN: 978-1-4799-6026-2. DOI: 10.1109/FG.2015.7163107. URL: http://ieeexplore.ieee.org/document/7163107/
Tags: Emotion, facial expression, latent-dynamic conditional random field (LDCRF), pain
Abstract: Automatic pain expression recognition is a challenging task for pain assessment and diagnosis. Conventional 2D-based approaches to automatic pain detection lack robustness to the moderate to large head pose variation and changes in illumination that are common in real-world settings, and with few exceptions they omit potentially informative temporal information. In this paper, we propose an innovative 3D binary edge feature (3D-BE) to represent high-resolution 3D dynamic facial expression. To exploit temporal information, we apply a latent-dynamic conditional random field approach with the 3D-BE. The resulting pain expression detection system shows that the 3D-BE represents pain-related facial features well, and illustrates the potential of noncontact pain detection from 3D facial expression data.
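The core idea of this pipeline, a binarized edge representation computed per frame followed by a temporal sequence model, can be sketched as follows. This is a minimal illustration and not the paper's 3D-BE: it uses a plain Sobel edge map on a 2D depth image with assumed parameters (threshold, pooling grid), and a generic sequence model would stand in for the LDCRF.

```python
import numpy as np
from scipy import ndimage

def binary_edge_feature(depth_frame, threshold=0.1, grid=(4, 4)):
    """Toy per-frame feature: a binarized edge map of a depth image,
    pooled into a coarse grid of edge-density values."""
    gx = ndimage.sobel(depth_frame, axis=1)
    gy = ndimage.sobel(depth_frame, axis=0)
    magnitude = np.hypot(gx, gy)
    edges = (magnitude > threshold * magnitude.max()).astype(np.float32)

    h, w = edges.shape
    gh, gw = grid
    feature = [
        edges[r * h // gh:(r + 1) * h // gh, c * w // gw:(c + 1) * w // gw].mean()
        for r in range(gh) for c in range(gw)
    ]
    return np.asarray(feature)

# A sequence of per-frame features would then be fed to a temporal sequence
# classifier (the paper uses an LDCRF; any sequence model is a stand-in here).
frames = [np.random.rand(128, 128) for _ in range(30)]  # fake depth frames
sequence = np.stack([binary_edge_feature(f) for f in frames])
print(sequence.shape)  # (30, 16)
```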
2014
Zhang, Xing; Yin, Lijun; Cohn, Jeffrey F.; Canavan, Shaun; Reale, Michael; Horowitz, Andy; Liu, Peng; Girard, Jeffrey M.: BP4D-Spontaneous: a high-resolution spontaneous 3D dynamic facial expression database. In: Image and Vision Computing, 32(10), pp. 692-706, 2014. DOI: 10.1016/j.imavis.2014.06.002. URL: http://www.sciencedirect.com/science/article/pii/S0262885614001012
Tags: 3D facial expression, Dynamic facial expression database, FACS, Spontaneous expression
Abstract: Facial expression is central to human experience. Its efficient and valid measurement is a challenge that automated facial image analysis seeks to address. Most publicly available databases are limited to 2D static images or video of posed facial behavior. Because posed and un-posed (aka "spontaneous") facial expressions differ along several dimensions including complexity and timing, well-annotated video of un-posed facial behavior is needed. Moreover, because the face is a three-dimensional deformable object, 2D video may be insufficient, and therefore 3D video archives are required. We present a newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains. To the best of our knowledge, this new database is the first of its kind for the public. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action.
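Frame-level FACS ground truth of the kind described here pairs each video frame with action unit (AU) occurrence and intensity codes. The sketch below shows one possible in-memory representation; the class, field names, and three-frame excerpt are illustrative assumptions, not the database's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FrameAnnotation:
    frame: int
    au_occurrence: Dict[int, int]   # AU number -> 0/1 (absent/present)
    au_intensity: Dict[int, int]    # AU number -> 0 (absent) to 5 (maximum), FACS A-E scale

def au12_events(annotations: List[FrameAnnotation]) -> List[int]:
    """Frames in which AU12 (lip corner puller, the core smile action) is coded present."""
    return [a.frame for a in annotations if a.au_occurrence.get(12, 0) == 1]

# Hypothetical three-frame excerpt with AU6 (cheek raiser) and AU12
clip = [
    FrameAnnotation(1, {6: 0, 12: 0}, {6: 0, 12: 0}),
    FrameAnnotation(2, {6: 1, 12: 1}, {6: 2, 12: 3}),
    FrameAnnotation(3, {6: 1, 12: 1}, {6: 3, 12: 4}),
]
print(au12_events(clip))  # [2, 3]
```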
2013
Reale, Michael; Zhang, Xing; Yin, Lijun: Nebula Feature: A Space-Time Feature for Posed and Spontaneous 4D Facial Behavior Analysis. In: Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pp. 1-8, IEEE, 2013. ISBN: 978-1-4673-5545-2. DOI: 10.1109/FG.2013.6553746. URL: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6553746&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6553746
Tags: 4D facial expression analysis, AU recognition, spatio-temporal feature, Spontaneous expression
Abstract: In this paper, we propose a new, compact, 4D spatio-temporal "Nebula" feature to improve expression and facial movement analysis performance. Given a spatio-temporal volume, the data is voxelized and fit to a cubic polynomial. A label is assigned based on the principal curvature values, and the polar angles of the direction of least curvature are computed. The labels and angles for each feature are used to build a histogram for each region of the face. The concatenated histograms from each region give us our final feature vector. This feature description is tested on the posed expression database BU-4DFE and on a new 4D spontaneous expression database. Various region configurations, histogram sizes, and feature parameters are tested, including a non-dynamic version of the approach. The LBP-TOP approach on the texture image as well as on the depth image is also tested for comparison. The onsets of the six canonical expressions are classified for 100 subjects in BU-4DFE, while the onset, offset, and non-existence of 12 Action Units (AUs) are classified for 16 subjects from our new spontaneous database. For posed expression recognition, the Nebula feature approach shows improvement over LBP-TOP on the depth images and significant improvement over the non-dynamic 3D-only approach. Moreover, the Nebula feature performs better for AU classification than the compared approaches for 11 of the AUs tested, in terms of both accuracy and Area Under the Receiver Operating Characteristic Curve (AUC).
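The histogram-building step of this descriptor, binning a per-sample curvature label together with the two polar angles of the least-curvature direction within each face region and then concatenating the regions, can be sketched as follows. The bin counts, label count, and region count are assumptions for illustration; the voxelization and cubic-polynomial fitting that produce the labels and angles are not shown.

```python
import numpy as np

def region_histogram(labels, theta, phi, n_labels=9, n_theta=4, n_phi=4):
    """Joint histogram over (curvature label, theta bin, phi bin) for one face region.

    labels : integer curvature-shape label per sample (e.g. peak, ridge, saddle, ...)
    theta, phi : polar angles of the least-curvature direction per sample, in radians
    """
    t_bin = np.clip((theta / np.pi * n_theta).astype(int), 0, n_theta - 1)
    p_bin = np.clip((phi / (2 * np.pi) * n_phi).astype(int), 0, n_phi - 1)
    index = (labels * n_theta + t_bin) * n_phi + p_bin
    hist = np.bincount(index, minlength=n_labels * n_theta * n_phi).astype(np.float64)
    return hist / max(hist.sum(), 1.0)  # normalize so regions are comparable

# One descriptor per face region; the concatenation is the final feature vector.
rng = np.random.default_rng(0)
regions = []
for _ in range(6):  # assume 6 face regions for illustration
    n = 500
    regions.append(region_histogram(rng.integers(0, 9, n),
                                    rng.uniform(0, np.pi, n),
                                    rng.uniform(0, 2 * np.pi, n)))
feature_vector = np.concatenate(regions)
print(feature_vector.shape)  # (864,) with these bin counts
```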
Zhang, Xing; Yin, Lijun; Cohn, Jeffrey F.; Canavan, Shaun; Reale, Michael; Horowitz, Andy; Liu, Peng: A high-resolution spontaneous 3D dynamic facial expression database. In: Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, 2013. ISBN: 978-1-4673-5544-5. DOI: 10.1109/FG.2013.6553788. URL: http://ieeexplore.ieee.org/document/6553788/
Tags: 3D facial expression, Dynamic facial expression database, FACS, Spontaneous expression
Abstract: Facial expression is central to human experience. Its efficient and valid measurement is a challenge that automated facial image analysis seeks to address. Most publicly available databases are limited to 2D static images or video of posed facial behavior. Because posed and un-posed (aka "spontaneous") facial expressions differ along several dimensions including complexity and timing, well-annotated video of un-posed facial behavior is needed. Moreover, because the face is a three-dimensional deformable object, 2D video may be insufficient, and therefore 3D video archives are needed. We present a newly developed 3D video database of spontaneous facial expressions in a diverse group of young adults. Well-validated emotion inductions were used to elicit expressions of emotion and paralinguistic communication. Frame-level ground-truth for facial actions was obtained using the Facial Action Coding System. Facial features were tracked in both 2D and 3D domains using both person-specific and generic approaches. The work promotes the exploration of 3D spatiotemporal features in subtle facial expression, better understanding of the relation between pose and motion dynamics in facial action units, and deeper understanding of naturally occurring facial action.
2010
Zhang, Xing; Yin, Lijun; Gerhardstein, Peter; Hipp, Daniel: Expression-driven Salient Features: Bubble-based Facial Expression Study by Human and Machine. In: Multimedia and Expo (ICME), 2010 IEEE International Conference on, pp. 1184-1189, IEEE, 2010. ISBN: 978-1-4244-7491-2. DOI: 10.1109/ICME.2010.5583081. URL: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=5583081&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5583081
Tags: bubble, emotion recognition, facial expression recognition, human computer interaction
Abstract: Humans are able to recognize facial expressions of emotion from faces displaying a large set of confounding variables, including age, gender, ethnicity, and other factors. Much work has been dedicated to characterizing the process by which this highly developed capacity functions. In this paper, we propose to investigate local expression-driven features important to distinguishing facial expressions using the so-called "Bubbles" technique, a form of Gaussian masking that reveals the information contributing to human perceptual categorization. We conducted experiments on factors from both human and machine: observers view bubble-masked expression images and identify the expression category, and by collecting their responses and analyzing them statistically we can find the facial features that humans employ to identify different expressions. Humans appear to extract and use localized information specific to each expression for recognition. Additionally, we verify the findings by selecting the resulting features for expression classification using a conventional expression recognition algorithm with a public facial expression database.
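A "bubble" stimulus is an image viewed through a random set of Gaussian apertures, so that only a few local regions are visible on any given trial. The sketch below generates such a mask and applies it to a face image; the number of bubbles, the aperture width, and the gray background are illustrative assumptions, not the study's parameters.

```python
import numpy as np

def bubble_mask(shape, n_bubbles=10, sigma=12.0, rng=None):
    """Random mask of Gaussian apertures ('bubbles') with values in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape, dtype=np.float64)
    for _ in range(n_bubbles):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    return np.clip(mask, 0.0, 1.0)

# Reveal only the bubbled regions of a grayscale face image;
# masked-out pixels fall back to a neutral gray background.
face = np.random.rand(256, 256)            # stand-in for a face image
mask = bubble_mask(face.shape, n_bubbles=15)
stimulus = mask * face + (1.0 - mask) * 0.5

# Across many trials, averaging the masks of correctly categorized stimuli
# (relative to the overall mask average) highlights the regions that are
# diagnostic of each expression.
```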