Part of ECCV2010


Organizers

Sven Dickinson
University of Toronto
Toronto, Canada


Zygmunt Pizlo
Purdue University
West Lafayette, USA

Third International Workshop on Shape Perception in Human and Computer Vision (SPHCV) @ ECCV2010

 

September 11, 2010
Hersonissos, Crete, Greece
 
8:45 am - 6:00 pm
Topic and Motivation

Shape has been one of the most important aspects of computer vision since the field was established. This is hardly surprising, considering that the shape of an object carries rich information about the object’s identity and function. However, while shape formed the backbone of most early recognition systems, the advent of appearance models in the 1990s drew the spotlight away from shape. Only recently has shape begun to make a comeback in the mainstream recognition community, with contours once again starting to play a prominent role. Many classical topics, including shape hierarchies, contour-based representations, perceptual grouping, and shape priors, are being revisited by today’s researchers, often without the benefit of earlier foundational work in human and/or computer vision. This workshop will evaluate what we know about shape now and what should be done next. It will bring together prominent speakers from both the human and computer vision shape perception communities to reflect on their experience and identify future research directions. This will be the third such multidisciplinary workshop devoted specifically to shape perception. The first workshop was held in October 2008 as part of the European Conference on Computer Vision (ECCV):

http://viper.psych.purdue.edu/workshops/iwsphcv08/

There, students and researchers in computer vision had the opportunity to learn of the progress made by high-profile shape perception researchers in human vision and the challenges they face. The workshop culminated in a lively panel discussion that explored what the two communities can learn from each other and what the common issues are. The workshop was a major success, drawing a large audience to an outstanding set of invited speakers and an engaging discussion. The second workshop was held in August 2009 as part of the European Conference on Visual Perception (ECVP):

http://viper.psych.purdue.edu/workshops/iwsphcv09/

There, the situation was reversed: students and researchers in human vision had the opportunity to learn of the progress made by high-profile shape perception researchers in computer vision and the challenges they face. Following the same format as the first, the second workshop was again a major success.

 

Organization and Workshop Format

The one-day workshop will feature 12 invited speakers (six from human vision, six from computer vision). Each talk will last 25 minutes, plus 5 minutes for discussion. The speakers have been chosen to represent a broad cross-section of shape perception research, covering the major paradigms in both the human and computer vision communities. Speakers will be encouraged to reflect on their experience and identify critical challenges, rather than present snapshots of their latest research results.

 

Location of the Workshop

The workshop is part of ECCV2010 and will be held on September 11, 2010. For updated information about the location of the workshop, please refer to the webpage of the main conference.

 

Workshop Participation

The workshop is open to all ECCV participants.


List of Speakers

Human Vision

James Elder, York University
Donald D. Hoffman, University of California, Irvine
Phil Kellman, University of California, Los Angeles
Zoe Kourtzi, University of Birmingham
Bosco S. Tjan, University of Southern California
Christopher Tyler, Smith-Kettlewell Institute

Computer Vision

Doug DeCarlo, Rutgers University
Lena Gorelick, University of Western Ontario
Ian Jermyn, Ariana Group, INRIA
Bernt Schiele, MPI Informatics, Saarbrücken
Anuj Srivastava, Florida State University
Chris Taylor, University of Manchester

 

Program of the Workshop 

The workshop will run from 8:45 am to 6:00 pm.

Time Speaker
8:45 - 9:00 Introduction
9:00 - 9:30 Lena Gorelick, University of Western Ontario

Using Poisson Based Shape Representation for Object and Action Recognition

9:30 - 10:00 Christopher Tyler, Smith-Kettlewell Institute

3D Shape as the Natural Operating Domain of Object Processing

10:00 - 10:30 Doug DeCarlo, Rutgers University

Visual Explanations

10:30 - 11:00 Coffee Break
11:00 - 11:30 Phil Kellman, University of California, Los Angeles

Perceiving and Representing Contours and Objects: Segmentation, Grouping and Shape

11:30 - 12:00 Bernt Schiele, MPI Informatics, Saarbrücken

Back to the Future: Rediscovering the Power of Shape Models

12:00 - 12:30 James Elder, York University

On Growth and Formlets: Toward a generative model of shape

12:30 - 2:00 Lunch Break
2:00 - 2:30 Chris Taylor, University of Manchester

What can we learn to see?

2:30 - 3:00 Zoe Kourtzi, University of Birmingham

Visual learning for perceptual decisions in the human brain

3:00 - 3:30 Anuj Srivastava, Florida State University

Statistical Modeling of Shapes of Elastic Curves and Surfaces

3:30 - 4:00 Coffee Break
4:00 - 4:30 Donald Hoffman, University of California, Irvine

The Evolution of Shape Perception

4:30 - 5:00 Ian Jermyn, Ariana Group, INRIA

Shape as an emergent property of long-range interactions

5:00 - 5:30 Bosco Tjan, University of Southern California

Recognizing objects in clutter

5:30 - Panel Discussion

Abstracts of the Workshop
Doug DeCarlo

Visual Explanations

Human perceptual processes organize visual input to make the structure of the world explicit. Successful techniques for automatic depiction in computer graphics, meanwhile, create images whose structure clearly matches the visual information to be conveyed. I will discuss how analyzing these structures and realizing them in formal representations can allow computer graphics to engage with vision science, to mutual benefit. We call these representations visual explanations: their job is to account for patterns in two dimensions as evidence of a visual world. I will situate this discussion using recent work in computer graphics, and computational depiction in particular.

James Elder

On Growth and Formlets: Toward a generative model of shape

Appearance-based computer vision algorithms for object detection and recognition can often be made to run very fast. While more detailed contour shape analysis can improve performance, this typically comes at a computational cost. One might expect the human visual system to show a similar performance-time tradeoff. Surprisingly, we find that for the task of animal detection, contour shape cues appear to be the most rapidly exploited. This raises the question of how the brain represents shape information so that it can be exploited so efficiently. Physiological results suggest that at intermediate levels of the object pathway, shapes are represented as sparse population codes over localized shape components. To be most effective, these shape codes would be generative, allowing feedback to earlier visual areas to control grouping and segmentation. A major stumbling block in forming such a generative model has been the requirement of closure: that a sample drawn from the model always constitutes a valid shape. For example, any boundary of a planar shape should be a simple, closed curve, with no self-intersections. Here I will outline a sparse representation of shape that overcomes this problem. Under this formlet model, every shape is considered to be the outcome of a growth process applied to the same simple embryonic shape (e.g., a circle) embedded in the plane. This embryonic shape is then deformed through repeated application of simple, localized diffeomorphic remappings of the plane called formlets. By preserving the topology of the embedded curve, the formlet model guarantees that the set of valid shapes is closed under this deformation process, and a matching pursuit strategy allows partially- or wholly-observed shapes to be sparsely represented to arbitrary precision. The collection of formlets over scale and space constitutes a shape code that may be identified with neural representations at intermediate stages of the object pathway. We demonstrate the effectiveness of the model on the problem of perceptual completion of partially-occluded shapes.
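
To make the deformation step concrete, here is a toy sketch in Python/NumPy of one localized radial remapping in the spirit of a formlet; the Gaussian-windowed form and the parameter values are illustrative assumptions, not the authors' exact formulation:

    import numpy as np

    def formlet(points, center, sigma, alpha):
        """One localized radial deformation of the plane: points are pushed
        away from (alpha > 0) or toward (alpha < 0) `center`, with a Gaussian
        falloff of scale `sigma`. Keeping |alpha| small keeps the map smooth
        and injective, so the topology of an embedded curve is preserved."""
        d = points - center
        r2 = np.sum(d ** 2, axis=1, keepdims=True)
        gain = 1.0 + alpha * np.exp(-r2 / (2.0 * sigma ** 2))
        return center + gain * d

    # Embryonic shape: a circle, "grown" by two formlets applied in sequence.
    t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
    shape = np.stack([np.cos(t), np.sin(t)], axis=1)
    shape = formlet(shape, center=np.array([1.0, 0.0]), sigma=0.5, alpha=0.4)
    shape = formlet(shape, center=np.array([-0.8, 0.3]), sigma=0.7, alpha=-0.3)
    # `shape` remains a simple closed curve after every application.

In this picture, matching pursuit amounts to greedily selecting, at each step, the formlet (center, scale, gain) that best reduces the residual to an observed contour.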

Lena Gorelick

Using Poisson Based Shape Representation for Object and Action Recognition

I will discuss methods for visual object recognition that are fundamentally based on explicit shape information present in the images and video sequences. In order to use shape for recognition we need three main components: (1) a representation of shapes, (2) a method for proposing shape hypotheses from images, and (3) a method for efficiently combining and evaluating ensembles of shape hypotheses for recognition. I will focus on Poisson-based shape representation, where each internal point of a silhouette is assigned the expected time to hit the boundaries by a random walk beginning at the point. This function can be computed by solving Poisson's equation, with the silhouette contours providing boundary conditions, and used to reliably extract many useful properties of a silhouette. Next, I will present a recent shape-based detection and figure-ground segmentation algorithm. This method relies mainly on the shape of the object as reflected in the bottom-up segmentation. In practice, bottom-up segmentation algorithms often fail to extract complete object silhouettes. Therefore, our method applies to partial shape hypotheses (silhouettes) formed by segments at intermediate scales of the bottom-up segmentation, possibly with incomplete boundaries. We employ probabilistic shape modeling and use top-down statistical tests to evaluate ensembles of partial shape hypotheses to detect objects in the image and sharply delineate them from their background. Finally, I will briefly show how we can use the Poisson-based shape representation in the task of action recognition.
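
As a minimal sketch of this representation, assuming a binary silhouette on a regular grid (the plain Jacobi solver and the ellipse below are for illustration, not the authors' implementation):

    import numpy as np

    def poisson_shape(mask, n_iter=2000):
        """Poisson-based shape representation of a binary silhouette:
        solve Laplace(U) = -1 inside the mask with U = 0 on the boundary,
        here by plain Jacobi iteration (grid spacing h = 1). U grows with
        the expected time for a random walk to reach the boundary."""
        U = np.zeros(mask.shape, dtype=float)
        inside = mask.astype(bool)
        for _ in range(n_iter):
            # Four-neighbour average plus the source term h^2/4;
            # points outside the silhouette stay clamped at zero.
            avg = 0.25 * (np.roll(U, 1, 0) + np.roll(U, -1, 0)
                          + np.roll(U, 1, 1) + np.roll(U, -1, 1))
            U = np.where(inside, avg + 0.25, 0.0)
        return U

    # Toy silhouette: a filled ellipse. U peaks deep inside the shape,
    # so level sets and derivatives of U expose its gross structure.
    yy, xx = np.mgrid[0:64, 0:64]
    ellipse = ((xx - 32) / 24.0) ** 2 + ((yy - 32) / 12.0) ** 2 < 1.0
    U = poisson_shape(ellipse)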

Donald Hoffman

The Evolution of Shape Perception

Does vision estimate true properties of an objective world? Most vision researchers assume that it does. We assume, for instance, that our perceptions of three-dimensional shapes are good estimates of the true shapes of physical objects, and that these true shapes are objective in the sense that they exist even when unobserved. We often justify this assumption on evolutionary grounds, claiming that truer perceptions are fitter, and that natural selection therefore shapes perceptions toward truth. This assumption can be tested using evolutionary game theory. It turns out, in general, to be false. True perceptions are too expensive in time and energy. Moreover, natural selection shapes perceptions to reflect utility, not truth. Our visual perceptions of objects in space-time are best understood as a user interface, much like the windows interface of a computer. Space-time is our desktop, and physical objects are icons of this desktop. An interface is useful because it hides the truth. The colors, shapes and positions of icons on a desktop do not accurately depict the true colors, shapes and positions of the files they represent; indeed, files have no colors or shapes. Similarly, the colors and shapes of objects we see in space-time are not accurate depictions of objective truths, but instead reflect a Homo sapiens-specific interface, shaped by natural selection to help us have kids, not to see the truth. This evolutionary perspective provides a new framework for understanding the visual perception of shapes and the role of Bayesian estimation in visual perception. For more, see http://www.cogsci.uci.edu/~ddhoff/PerceptualEvolution.pdf

Ian Jermyn

Shape as an emergent property of long-range interactions

Why do we model shape? To summarize the geometric properties of entities in the real world around us, to be used for a variety of purposes. What is a shape? A subset of some manifold, usually Euclidean space, 'corresponding' in some way to the entity. But shape really refers not to a single such subset but to an ensemble of subsets, or more precisely, to a probability distribution on the set of subsets. Not all probability distributions correspond to shapes, however. Some are far too generic. To correspond to a shape, a probability distribution must involve long-range dependencies between points of the subset that, given a relatively small proportion of the points, enable quite accurate prediction of the rest. Many ways of introducing such long-range dependencies are described in the literature, the most popular being the use of a reference or template shape around which variations are authorized.

In this talk, I will describe an alternative framework in which long-range dependencies are introduced explicitly. The subsets involved may be represented by their bounding contour ('higher-order active contour'), a smoothed version of their characteristic function ('phase field'), or by a binary Markov random field, each with its own characteristic advantages. The framework permits the modelling of shapes of unknown and potentially arbitrary topology, in particular the case of an arbitrary number of instances of a given shape. I will briefly describe the main applications of this framework so far, which have been to the segmentation of objects from remote sensing images. I will also describe current and future work aimed at generalizing the method to arbitrary shapes. More broadly, the notion of shape emerging from the long-range interactions of a set of bivalent variables may be of general interest.
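
As a deliberately simplified illustration of an energy with explicit long-range interactions (the kernel, signs, and weights below are assumptions for illustration, not a specific published model), consider a phase-field energy with a local smoothing-plus-double-well part and a pairwise coupling between gradients:

    import numpy as np

    def phase_field_energy(phi, D=1.0, beta=0.1, d0=5.0):
        """Evaluate a phase-field shape energy with an explicit long-range
        term on a 2D grid. Local part: gradient smoothing plus a double-well
        potential holding phi near -1 (outside) or +1 (inside). Nonlocal
        part: gradients at pairs of points interact through a kernel of
        range d0, introducing the long-range dependencies between points."""
        gy, gx = np.gradient(phi)
        local = np.sum(0.5 * D * (gx**2 + gy**2) + 0.25 * (phi**2 - 1.0)**2)

        # Long-range interaction: sum over all pairs of grid points.
        ys, xs = np.mgrid[0:phi.shape[0], 0:phi.shape[1]]
        pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
        g = np.stack([gy.ravel(), gx.ravel()], axis=1)
        dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
        psi = np.exp(-dist / d0)          # smooth, monotone interaction kernel
        dots = g @ g.T                    # gradient dot products for all pairs
        return local - 0.5 * beta * np.sum(psi * dots)

    # Toy configuration: a soft disk. Small grids only -- the pair sum is O(N^2).
    yy, xx = np.mgrid[0:24, 0:24]
    phi = np.tanh((8.0 - np.hypot(yy - 12.0, xx - 12.0)) / 2.0)
    print(phase_field_energy(phi))

With suitable choices of kernel and weights, low-energy configurations are biased toward particular families of regions, which is the sense in which shape emerges from the interactions rather than from a template.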

Phil Kellman

Perceiving and Representing Contours and Objects: Segmentation, Grouping and Shape

Perceiving and representing the shapes of contours and objects are among the most crucial tasks for biological and artificial vision systems. In this talk I consider and connect two important issues of shape processing. First, in ordinary environments, obtaining useful shape descriptions depends on interpolation processes that identify physically connected units despite fragmented input. I will describe recent progress in understanding spatiotemporal object formation in human vision, in which objects and shape are recovered from information that is fragmentary in both space and time. Considering the representations obtained by interpolation motivates the second part of the talk: What kinds of shape representations are used in human perception? I will use sub-symbolic outputs of early visual filtering, on one hand, and precise polynomial approximations of contour shape, on the other, as examples of what human shape representations cannot be like. Rather, computational and psychophysical considerations suggest representations that are symbolic, simpler than precise mathematical descriptions, and geared toward representing ecologically useful classifications and similarity relations. I will describe psychophysical and modeling work in which contour shape is approximated in terms of constant curvature segments as an example that provides a plausible account of some aspects, and more generally, illustrates the kinds of properties needed for successful accounts of shape perception.
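
As a rough, toy illustration of one such constant-curvature code (my construction for concreteness, not the psychophysical or modeling work described in the talk), the sketch below splits a contour into even pieces and summarizes each piece by the curvature of its best-fitting circle:

    import numpy as np

    def fit_circle(pts):
        """Algebraic (Kasa) least-squares circle fit: returns center and
        radius; 1/radius is the constant curvature of the segment."""
        x, y = pts[:, 0], pts[:, 1]
        A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
        b = x**2 + y**2
        (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
        return np.array([cx, cy]), np.sqrt(c + cx**2 + cy**2)

    def constant_curvature_code(contour, n_segments):
        """Describe a contour as a sequence of constant-curvature pieces:
        split into equal arcs and record each piece's curvature 1/r. A
        real account would choose segment boundaries more carefully."""
        pieces = np.array_split(contour, n_segments)
        return [1.0 / fit_circle(p)[1] for p in pieces]

    # Toy contour: an ellipse, summarized by 8 curvature values.
    t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
    ellipse = np.stack([2.0 * np.cos(t), np.sin(t)], axis=1)
    print(np.round(constant_curvature_code(ellipse, 8), 2))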

Zoe Kourtzi

Visual learning for perceptual decisions in the human brain

Detecting and recognizing meaningful objects in complex environments is a critical skill for successful interactions. Despite the ease and speed with which we recognize objects, the computational challenges of visual recognition are far from trivial. Long-term experience through development and evolution and shorter-term training in adulthood have both been suggested to contribute to the optimization of visual functions that mediate our ability to interpret complex scenes. Here, we focus on the role of learning in shaping processes related to the detection and categorization of objects in cluttered scenes. We propose that the brain learns to exploit flexibly the statistics of the environment, extract the image features relevant for perceptual decisions and assign objects into meaningful categories in an adaptive manner. Further, we provide evidence for two different routes to visual learning in clutter with discrete brain plasticity signatures. Specifically, learning regularities typical in natural scenes can occur simply through frequent exposure and shapes processing in occipitotemporal regions implicated in the representation of global forms. In contrast, learning new perceptual organization rules requires task-specific training and enhances processing in intraparietal regions implicated in attention-gated learning. This work provides the first insights into how long-term experience and short-term training interact to shape the optimization of visual recognition processes.

Bernt Schiele

Back to the Future: Rediscovering the Power of Shape Models

Recognizing 3D objects from arbitrary viewpoints is one of the most fundamental problems in computer vision. A major challenge lies in the transition between the 3D geometry of objects and 2D representations that can be robustly matched to natural images. Most approaches thus rely on 2D natural images, either as the sole source of training data for building an implicit 3D representation, or by enriching 3D models with natural image features. This talk discusses the recent revival of shape models that have been shown to enable 3D object recognition. While many of the inherent and favorable properties of such models were already discussed in the 1980s, it appears that only now are we able to take full advantage of those properties. We discuss, for example, a shape-based model that allows knowledge to be transferred easily and explicitly at different levels, such as the individual parts’ shape and appearance, local symmetry between parts, and part topology. We also discuss recent work that learns such shape models from 3D CAD data as the only source of information for building a multi-view object class detector.

Anuj Srivastava

Statistical Modeling of Shapes of Elastic Curves and Surfaces

Interest in the shapes of 2D and 3D objects naturally leads to shape analysis of curves and surfaces. While past research has developed some impressive tools using indirect representations of objects -- landmarks, level sets, diffeomorphic embeddings, medial axes, etc. -- we are interested in a direct shape analysis of parameterized curves and surfaces. The main goal here is to develop tools for shape comparison, optimal deformation, shape averaging, and statistical shape modeling. In this setting, the analysis must be invariant to re-parameterizations in addition to the standard shape-preserving transformations (rigid motions and global scalings). The key idea is to use mathematical representations (of curves and surfaces) and Riemannian metrics such that re-parameterization groups act by isometries, i.e. re-parameterizations preserve distances between elements. For such representations, geodesics are computed using a path-straightening algorithm, and re-parameterizations are removed using either dynamic programming (for curves) or a gradient method (for surfaces). This framework allows us to compute central moments, e.g. means and covariances, of observed shapes, and further to define Gaussian-type distributions on shape spaces. These models are useful in statistical shape classification, hypothesis testing, and Bayesian shape extraction from images. The framework is general enough to incorporate other information, such as landmarks, colors, or other annotations, along with the shapes in the analysis. I will demonstrate these ideas using examples from vision, biometrics, bioinformatics, and medical image analysis.
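
One concrete representation with the stated isometry property is the square-root velocity function (SRVF), under which the elastic metric between curves reduces to an ordinary L2 distance; a minimal sketch, with the discretization details assumed:

    import numpy as np

    def srvf(curve):
        """Square-root velocity function of a discretized curve:
        q = x' / sqrt(|x'|). Under this representation, re-parameterizations
        act by isometries, so the elastic metric becomes a plain L2 metric.
        The final step normalizes scale (unit L2 norm of q)."""
        v = np.gradient(curve, axis=0)                 # velocity x'(t)
        speed = np.linalg.norm(v, axis=1, keepdims=True)
        q = v / np.sqrt(np.maximum(speed, 1e-12))
        return q / np.sqrt(np.sum(q**2) / len(q))

    def preshape_distance(c1, c2):
        """L2 distance between SRVFs: an elastic distance up to rotation
        and re-parameterization, which a full implementation would still
        optimize over (e.g. by dynamic programming for curves)."""
        q1, q2 = srvf(c1), srvf(c2)
        return np.sqrt(np.sum((q1 - q2)**2) / len(q1))

    t = np.linspace(0, 2 * np.pi, 200)
    circle = np.stack([np.cos(t), np.sin(t)], axis=1)
    ellipse = np.stack([1.5 * np.cos(t), np.sin(t)], axis=1)
    print(preshape_distance(circle, ellipse))

A full implementation would additionally optimize over rotations and re-parameterizations (the dynamic programming step mentioned above) and enforce closure for closed curves.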

Bosco S. Tjan

Recognizing objects in clutter

The visual environment is cluttered. Computational techniques that seem adequate for recognizing objects in an uncluttered image often fail miserably in a cluttered scene. In human vision, object recognition is impeded by clutter if the target object is in the peripheral visual field. This phenomenon of crowding, thought to be a key limitation of peripheral vision, is ubiquitous and cannot be explained by the lower spatial resolution in the periphery. Crowding in the periphery serves as a biological model for the failure of object recognition in clutter. At present, no single model can account for the multitude of empirical findings on crowding. We have recently proposed that crowding is due to the improper encoding of image statistics in peripheral V1 (Nandy & Tjan, 2009 SfN; Nandy & Tjan, 2010 VSS; Tjan & Nandy, 2010 VSS). Specifically, the image statistics of natural scenes are distorted due to a temporal overlap between spatial attention, which enables the acquisition of image statistics, and the saccadic eye movement it elicits. In terms of the mutual information between contour orientations at adjacent spatial locations, this distortion turns veridical image statistics dominated by smooth continuation into ones biased towards repetition. By fixing all but one parameter in our model with well-known anatomical and behavioral data unrelated to crowding, we found that the spatial extent of the distortion in orientation statistics precisely reproduces the shape and spatial extent of crowding, with all its tell-tale characteristics: Bouma's Law, radial-tangential anisotropy, and inward-outward asymmetry. The reproduction is robust to the sole free parameter of the model (the hypothesized temporal overlap between attention and saccade). We predicted that a change in image statistics or saccade patterns would lead to a change in the shape and spatial extent of crowding, and found empirical evidence that supports both of these predictions. The success of our computational model of crowding suggests that low-level image statistics are critical for object recognition in clutter.

Joint work with Anirvan S. Nandy. Support: NIH EY016093, EY017707.
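
For concreteness, the pairwise statistic referred to above can be estimated generically as follows (a schematic plug-in estimator, not the authors' model):

    import numpy as np

    def orientation_mi(theta_a, theta_b, n_bins=8):
        """Estimate the mutual information (in bits) between contour
        orientations observed at adjacent spatial locations, from paired
        samples, via a joint histogram. Orientation is taken modulo pi."""
        bins = np.linspace(0.0, np.pi, n_bins + 1)
        joint, _, _ = np.histogram2d(theta_a % np.pi, theta_b % np.pi,
                                     bins=[bins, bins])
        p = joint / joint.sum()
        pa, pb = p.sum(axis=1), p.sum(axis=0)
        nz = p > 0
        return np.sum(p[nz] * np.log2(p[nz] / (pa[:, None] * pb[None, :])[nz]))

    # Smooth continuation: adjacent orientations nearly equal -> high MI.
    rng = np.random.default_rng(0)
    a = rng.uniform(0, np.pi, 10000)
    print(orientation_mi(a, a + rng.normal(0, 0.1, a.size)))  # high
    print(orientation_mi(a, rng.uniform(0, np.pi, a.size)))   # near zero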

Chris Taylor

What can we learn to see?

Shape (and appearance) learning has proved a powerful approach in computer vision. The ability to model the common characteristics and systematic variation in a group of images is often important in its own right, whilst exploiting such prior knowledge to achieve robust interpretation of unseen images has 'moved the goal posts' in terms of what is practically achievable. The talk will review some of the basic ideas and discuss the current state-of-the-art, highlighting the limitations as well as the strengths of the approach. There are classes of problem that cannot currently be addressed properly in the standard framework - an important generic example will be posed as an open issue.
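
A minimal sketch of the kind of shape learning referred to here, assuming aligned landmark data and synthetic training shapes (a linear model of variation learned by PCA, in the spirit of point distribution models):

    import numpy as np

    def learn_shape_model(shapes, n_modes=3):
        """Learn a linear shape model from aligned training shapes, each a
        flattened vector of 2D landmarks: a mean shape plus the principal
        modes of variation. New shapes are generated as x = mean + P @ b."""
        X = shapes - shapes.mean(axis=0)
        U, S, Vt = np.linalg.svd(X, full_matrices=False)
        P = Vt[:n_modes].T                       # modes (columns)
        var = (S[:n_modes] ** 2) / (len(shapes) - 1)
        return shapes.mean(axis=0), P, var

    # Synthetic training set: ellipses with varying aspect ratio and noise.
    rng = np.random.default_rng(1)
    t = np.linspace(0, 2 * np.pi, 32, endpoint=False)
    shapes = np.stack([
        np.stack([(1 + 0.3 * rng.standard_normal()) * np.cos(t),
                  np.sin(t)], axis=1).ravel() + 0.01 * rng.standard_normal(64)
        for _ in range(50)])
    mean, P, var = learn_shape_model(shapes)
    b = P.T @ (shapes[0] - mean)                 # project a shape onto the modes
    print(np.round(var, 3))                      # first mode captures aspect ratio

Interpreting an unseen image then amounts to searching over pose and mode coefficients b for the instance that best explains the image, which is what constrains solutions to plausible shapes.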

Christopher W. Tyler

3D Shape as the Natural Operating Domain of Object Processing

Models of shape processing generally begin with the identification of the 1D curve of the boundary of an object, from which interior properties and depth structure are often inferred. This approach will completely fail for objects presented in the form of a random-dot stereogram, which is specifically designed to eliminate all such boundary information and provide only the depth structure information through binocular disparity cues. Nevertheless, objects and their surface structure can readily be perceived in random-dot stereograms, and the object boundaries readily inferred from interpolation across the disparity cues. The random-dot stereogram paradigm thus reveals that human visual processing is fully capable of 3D object perception without prior boundary identification, and can do so under the sparse sampling conditions of the random-dot format. Sparse sampling is important because it is a typical feature of the visual environment. Many objects are represented only by sparse samples with intervening spaces across their surfaces, particularly in the natural world of faces and the manufactured world of cars, refrigerators and china plates. To perceive the 3D surface shape requires interpolation of the depth structure across spaces devoid of depth cues (an inherently different process from the oft-mentioned one of luminance and color interpolation across a frontoparallel surface). One interesting feature of the sparse sampling paradigm is that it allows testing of the domain of interpolation of object structure. Our data reveal that interpolation in human vision is limited to the generic depth domain (and is not possible within the domains of the individual cues in the absence of depth structure). Since object perception and manipulation in sparse-sampled conditions depends on surface interpolation in depth, it follows that the natural operating domain of object processing is 3D shape. Objects by their nature are 3D and have to be understood in 3D in order to be effectively perceived and manipulated from many angles. Thus, computer vision equally needs to operate on 3D shape rather than 2D profiles, which cannot be achieved without a full 3D representation of all objects of interest in the scene. A variety of approaches to achieving such a representation from the available visual information will be considered.
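
To make the stimulus concrete, here is a standard Julesz-style random-dot stereogram construction (a generic sketch, not the speaker's stimuli); each half-image alone is pure noise, and the central square is defined only by binocular disparity:

    import numpy as np

    def random_dot_stereogram(size=128, square=48, disparity=4, seed=0):
        """Build a left/right image pair in which a central square floats
        in depth. Each image alone is pure noise; only the binocular
        disparity (a horizontal shift of the square region) defines the
        shape, so no monocular boundary information exists."""
        rng = np.random.default_rng(seed)
        left = (rng.random((size, size)) < 0.5).astype(np.uint8)
        right = left.copy()
        lo, hi = (size - square) // 2, (size + square) // 2
        # Shift the square region leftward in the right eye's image...
        right[lo:hi, lo - disparity:hi - disparity] = left[lo:hi, lo:hi]
        # ...and fill the uncovered strip with fresh random dots.
        right[lo:hi, hi - disparity:hi] = rng.random((square, disparity)) < 0.5
        return left, right

    left, right = random_dot_stereogram()
    # Fused binocularly, a square appears in front; monocularly, no contour exists.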