Research
Philosophy
Each time we engage in a moderately complex task,
we likely enlist the help of an untold number of simpler visuo-motor
operations that exist largely outside of our conscious awareness.
Consider for instance the steps involved in preparing a cup of
coffee. For the sake of simplicity, assume that the coffee has
already been brewed and is waiting in the pot, and that all of the
essential accessories, an empty cup, a spoon, a carton of cream, and
a tin of sugar, are sitting on a countertop in front of you. What is
your first step toward accomplishing this goal? The very first thing
that you might do is to move your eyes to the handle of the coffee
pot, followed shortly thereafter by the much slower movement of your
preferred hand to the same target. Because the coffee pot is hot and
the handle is relatively small, this change in fixation is needed to
guide your hand to a safe and useful place in which to grasp the
object. After lifting the pot, your eye may then dart over to the
cup. This action is needed, not only to again guide the pot to a
very specific point in space directly over the cup, but also to
provide feedback to the pouring operation so as to avoid a spill.
After setting the pot back on the counter (an act that may or may
not require another eye movement), your gaze will likely shift to
the spoon. Lagging shortly behind this behavior may be simultaneous
movements of your hands, with your dominant hand moving toward the
sugar tin and your non-preferred hand moving to the spoon. The spoon
is a relatively small and slender object that again requires
assistance from foveal vision for grasping; the tin is a rather
bulky and indelicate object that does not require precise visual
information to inform the grasping operation. Once the spoon is in
hand and the lid to the tin is lifted, gaze can then be directed to
the tin in order to help scoop out the correct measure of sugar. To
ensure that the spoon is kept level, a tracking operation may be
used to keep your gaze on the loaded spoon as it moves slowly to the
cup. After receiving the sugar, and following a few quick turns of
the spoon, your coffee would finally be ready to drink (see Land et
al., 1998, for a similarly framed example).
Projects
Developing a neurocomputational model of eye movements during visual search
Developing a computationally explicit model of eye movements in a
search task is an important step toward understanding the visual
routines and base representations underlying search behavior.
Ongoing work in our lab is attempting to
extend a well-established saliency map conception of search (meaning
that items in a search display are processed in proportion to their
similarity to the target) to include real-world objects and eye
movement behavior. Such extensions to real-world search are not
trivial and require an interdisciplinary effort to be successful.
The computer vision community has a great deal of experience in
representing real-world objects, but far less experience in the
behavioral techniques needed to test these representational schemes.
The cognitive psychological community has elaborate methods for
describing complex behavior, but far less experience in the formal
representation of real-world objects. As a result of these mutual
limitations, no computationally explicit theory of eye movements
during real-world search had been validated by behavioral data, and
no behaviorally explicit theory of oculomotor search had been
implemented as a computational model.
In a collaboration with Rajesh Rao, Mary Hayhoe, and Dana Ballard,
we developed a computational model of visual search to explain the
pattern of oculomotor behavior reported in Zelinsky et al. (1997).
By combining image processing techniques from computer vision with
biological constraints identified in the computational neuroscience
community, this interdisciplinary model represents arbitrarily
complex visual patterns as high-dimensional vectors of feature
properties (e.g., colors, orientations, and spatial scales). A
simple visual routine consisting of the sequential coarse-to-fine
application of spatial filters then causes simulated gaze to move
toward the target. We tested this model by collecting eye movement
data from human observers searching for real-world targets, then
inputting these same scenes to the model and comparing the simulated
sequence of saccades and fixations to the human behavioral data. The
results revealed a qualitative similarity between the Zelinsky et
al. (1997) pattern of results and the simulated gaze patterns
generated by the model (Rao et al., 1996, 2002).
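The coarse-to-fine routine described above can be illustrated with a toy sketch. This is not the Rao et al. implementation: the function name `coarse_to_fine_fixations`, the dictionary layout of the display, the squared-distance comparison, and all feature values are assumptions chosen only for illustration. Each location holds one short feature list per spatial scale, ordered coarsest first, and each step adds the next finer scale to the comparison with the target before choosing the best-matching location as the next simulated fixation.

```python
import math


def coarse_to_fine_fixations(features, target, n_scales):
    """Return one simulated fixation per filtering step.

    features: dict mapping (row, col) -> list of per-scale feature lists,
              ordered coarsest first (hypothetical layout).
    target:   per-scale feature lists for the search target, same order.
    """
    fixations = []
    for k in range(1, n_scales + 1):
        best_loc, best_dist = None, math.inf
        for loc, per_scale in features.items():
            # Squared Euclidean distance over the k coarsest scales only.
            dist = sum(
                (a - b) ** 2
                for s in range(k)
                for a, b in zip(per_scale[s], target[s])
            )
            if dist < best_dist:
                best_loc, best_dist = loc, dist
        fixations.append(best_loc)
    return fixations


# Toy display (made-up values): three locations, three scales, two features each.
target = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.9]]
display = {
    (0, 0): [[1.0, 0.0], [0.0, 0.0], [0.9, 0.1]],  # matches only at the coarsest scale
    (2, 3): [[0.9, 0.1], [0.5, 0.5], [0.2, 0.9]],  # the target, with slight coarse noise
    (4, 1): [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]],  # unrelated object
}

print(coarse_to_fine_fixations(display, target, 3))
# prints [(0, 0), (2, 3), (2, 3)]
```

In this toy case the first simulated fixation lands on a distractor that resembles the target only at the coarsest scale, and gaze then moves to the target once finer scales enter the comparison.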
More recent work conducted at SBU has modified and extended this
model in several key respects. First, the base representation used
by Rao et al. (2002) assumed a uniform clarity to the scene being
viewed regardless of where gaze was positioned in the image. Humans,
however, have a fovea that limits high visual acuity to only the
region of the image that we are looking at directly with our eyes.
In order to bring the model and human representational constraints
into closer agreement, we created for the model a simplified
simulated retina. The information available from each fixation is
therefore acuity constrained much like human vision, requiring the
model now to move its simulated fovea over the scene to acquire new
information as it searches for a target. We also abandoned the visual
routine used in the Rao et al. (2002) model in favor of a more
dynamic method of driving gaze to the search target. As in the
earlier model, this approach also uses filter-based image processing
techniques to represent real-world targets and search displays, then
compares these target and display representations to derive a
salience map indicating likely target candidates. However, rather
than applying a hard-wired coarse-to-fine filtering scheme, the
target of a simulated saccade is now determined by the spatial
average of activity on this map, with this average changing over
time as a moving threshold removes those salience map points
offering the least evidence for the target. As a result of this
threshold pruning points from the salience map, a sequence of eye
movements is produced that eventually aligns simulated gaze with the
model's best guess as to the target's location. We are currently
testing this routine by comparing the simulated oculomotor scanpaths
to the scanpaths of human observers viewing the same displays and
searching for the same targets. Preliminary findings reveal
considerable spatio-temporal agreement between these gaze patterns,
both at an aggregate level (e.g., general tradeoffs between saccade
latency and accuracy) and in the behavior of individual
observers (Zelinsky, 1999a, 2000a, 2000b, 2002, 2003a, 2003b).
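The moving-threshold routine can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lab's model: the function names `centroid` and `simulate_scanpath`, the activity-weighted form of the spatial average, the threshold schedule (stepping through each distinct salience value in ascending order), and the toy map values are all hypothetical. The simulated retina and the timing of saccades are left out.

```python
def centroid(salience, threshold):
    """Activity-weighted mean (row, col) of map points at or above threshold."""
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(salience):
        for c, value in enumerate(row):
            if value >= threshold:
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0.0:
        return None  # no activity left on the map
    return (row_sum / total, col_sum / total)


def simulate_scanpath(salience):
    """Raise the threshold through each distinct salience value (an assumed
    schedule) and record every new centroid as a simulated fixation."""
    levels = sorted({value for row in salience for value in row})
    scanpath = []
    for threshold in levels:
        point = centroid(salience, threshold)
        if point is not None and (not scanpath or point != scanpath[-1]):
            scanpath.append(point)
    return scanpath


# Toy salience map (made-up values); the strongest target evidence is at (1, 3).
salience = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 4.0],
    [1.0, 0.0, 1.0, 0.0],
]

print(simulate_scanpath(salience))
# prints [(1.25, 1.75), (1.0, 2.0), (1.0, 3.0)]
```

Each pruning step pulls the spatial average away from weak candidates, so the simulated fixations step toward the map's strongest point, (1, 3) in this example.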
