Highlight

Predicting Human Gaze

Achievement/Results

NSF-funded researchers Garrison Cottrell, Christopher Kanan, and Matthew Tong at the University of California, San Diego, working together with Lingyun Zhang of ID Analytics, have developed a measure of salience that incorporates task knowledge and accurately predicts human gaze during a target-counting task.

The Saliency Using Natural Statistics (SUN) model uses three kinds of statistical knowledge about the world to choose which areas of a scene should be fixated: which features are rare, the visual appearance of particular objects of interest, and the locations in a scene likely to contain such objects. Previous work with SUN emphasized the role of novelty as a form of bottom-up salience: in addition to achieving state-of-the-art performance in predicting gaze during free viewing of images and video, SUN's bottom-up component can explain a number of search asymmetries reported in the literature, ranging from the simple (a tilted bar surrounded by vertical bars is faster to find than a vertical bar among tilted bars) to the complex (for most Caucasians, finding an African American face among Caucasian faces is faster than finding a Caucasian face among African American faces). However, when we view the world, we typically do so with a purpose, and so bottom-up saliency may often be dominated by task-driven concerns.
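
This highlight does not spell out SUN's equations, but the combination it describes is naturally expressed in log space: feature rarity contributes a novelty term of the form -log p(F), target appearance contributes log p(F | C), and likely target locations contribute log p(C | L). The short Python sketch below illustrates that combination; the function name and the toy probability maps are hypothetical stand-ins, not the released implementation:

    import numpy as np

    # Minimal sketch of combining SUN's three statistical components into a
    # single saliency map, working in log space. All arrays here are toy
    # stand-ins; the real model learns these terms from natural-image
    # statistics and labeled examples of the target class.
    def sun_saliency(log_p_f, log_p_f_given_c, log_p_c_given_l):
        """log saliency = -log p(F) + log p(F|C) + log p(C|L).

        log_p_f:          log p(F = f) at each pixel (feature rarity; the
                          bottom-up term that drives free viewing)
        log_p_f_given_c:  log p(F = f | C = 1) (target appearance)
        log_p_c_given_l:  log p(C = 1 | L = l) (likely target locations)
        """
        # Rare features raise saliency, so p(F) enters with a minus sign.
        return -log_p_f + log_p_f_given_c + log_p_c_given_l

    # Toy 4x4 "image": flat appearance and location terms, one rare feature.
    log_p_f = np.full((4, 4), np.log(0.25))
    log_p_f[1, 2] = np.log(0.01)              # a novel feature at (1, 2)
    flat = np.full((4, 4), np.log(0.1))
    saliency = sun_saliency(log_p_f, flat, flat)
    print(np.unravel_index(saliency.argmax(), saliency.shape))  # -> (1, 2)

Dropping the appearance and location terms leaves only -log p(F), which is exactly the novelty-based bottom-up component described above.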

The recent work took SUN beyond simulated free viewing of images to a more complex counting task: determining how many objects of a particular target class were present in each scene. The full version of the SUN model accounted for the fixations of humans performing this task much better than the bottom-up portion alone. Appearance played a key role in predicting fixations both to objects of interest and to objects that merely looked similar to the target class, so the system made many of the same "mistakes" that people make. SUN's predictive power was also compared with that of Torralba and colleagues' Contextual Guidance model; SUN achieved improved predictions and shed light on the roles of appearance, context, and prior experience. No existing model predicts where someone will look as well as knowing where other people have looked, so much work remains to be done in understanding what attracts human gaze. However, SUN shows that knowledge of feature rarity, object appearance, and likely object locations accounts for many of the fixations humans make.
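
The metric behind the comparison with the Contextual Guidance model is not given in this highlight, but one common way to score such models is the area under the ROC curve: how reliably a saliency map assigns higher values to pixels that humans actually fixated than to the rest of the image. The sketch below, with hypothetical inputs and names, shows how such a comparison might be run:

    import numpy as np

    def fixation_auc(saliency_map, fixations):
        """ROC area: the probability that a fixated pixel outscores a
        randomly chosen non-fixated pixel.

        fixations is a list of (row, col) human fixation locations."""
        values = saliency_map.ravel()
        fixated = np.zeros(values.shape, dtype=bool)
        cols = saliency_map.shape[1]
        for r, c in fixations:
            fixated[r * cols + c] = True
        pos, neg = values[fixated], values[~fixated]
        wins = (pos[:, None] > neg[None, :]).sum()
        ties = (pos[:, None] == neg[None, :]).sum()
        return (wins + 0.5 * ties) / (len(pos) * len(neg))

    # Hypothetical usage: score two models' maps on the same fixation data.
    # auc_sun = fixation_auc(sun_map, human_fixations)
    # auc_cg  = fixation_auc(contextual_guidance_map, human_fixations)

A score of 0.5 means the map is no better than chance at separating fixated from non-fixated locations; knowing where other people looked sets the practical ceiling against which models like SUN are judged.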

Christopher Kanan and Matthew Tong are both NSF IGERT (Integrative Graduate Education and Research Traineeship) fellows in the Vision and Learning in Humans and Machines Traineeship program at UCSD, run by Professors Virginia de Sa and Garrison Cottrell. Portions of this work will appear in Visual Cognition this year and have been presented at CoSyNe (the Computational and Systems Neuroscience meeting) and the Annual Meeting of the Vision Sciences Society.

Address Goals

This model is currently the best predictive model of human eye movements. Being able to model where humans look has many important applications (e.g., locating people in public areas). It may also be important in the study of disorders that involve abnormal gaze patterns (see, for example, our previous highlight). The model may prove a valuable research tool for others studying visual attention and for those interested in applying it in their own systems.