Chapter Review Summary

Visual perception is a highly active process in which the perceiver goes beyond the information given in organizing and interpreting the visual input. The process must specify a figure/ground organization for the input and how the figure is organized in depth. The interpretive process is guided by certain principles, including those catalogued by the Gestalt psychologists. Importantly, though, the interpretation seems to be guided simultaneously by the input’s features and its overall configuration.

We easily recognize a wide range of objects in a wide range of circumstances. Our recognition is heavily influenced by context, which can determine how or whether we recognize an object. To study these achievements, investigators have often focused on the recognition of printed language, using this case as a microcosm within which to study how object recognition in general might proceed.

Many investigators have proposed that recognition begins with the identification of features in the (organized) input pattern. Crucial evidence for this claim comes from neuroscience studies showing that the detection of features is separate from the processes needed to assemble these features into more complex wholes.

To study word recognition, investigators often use tachistoscopic presentations, in which a stimulus is displayed very briefly. In these studies, words that appear frequently in the language are easier to identify, and so are words that have been recently viewed, an effect known as repetition priming. The data also show a pattern known as the word-superiority effect: letters are more readily recognized when they appear within a word than when they appear in isolation. In addition, well-formed nonwords are more readily perceived than letter strings that do not conform to the rules of normal spelling. Another reliable pattern is that recognition errors, when they occur, are quite systematic, with the input typically perceived as being more regular than it actually is. These findings together indicate that recognition is influenced by the regularities that exist in our environment (e.g., the regularities of spelling patterns).

These results can be understood in terms of a network of detectors. Each detector collects input and fires when the input reaches a threshold level. A network of these detectors can accomplish a great deal; for example, it can interpret ambiguous inputs, recover from its own errors, and make inferences about barely viewed stimuli.
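
The idea of a detector that collects input and fires at threshold can be illustrated with a minimal Python sketch. The class name, the numeric values, and the notion of a nonzero starting activation (used here as one possible way to capture the advantage for frequent or recently viewed words) are assumptions made for illustration, not details given in this summary.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    """A unit that accumulates input and fires once it reaches threshold."""
    name: str
    threshold: float = 1.0   # assumed value, arbitrary units
    activation: float = 0.0  # assumed starting level; higher means "primed"

    def receive(self, amount: float) -> None:
        # Collect one piece of input.
        self.activation += amount

    def fires(self) -> bool:
        # The detector responds only when activation reaches threshold.
        return self.activation >= self.threshold


def present(detector, inputs):
    """Feed a sequence of input strengths to a detector; report whether it fires."""
    for amount in inputs:
        detector.receive(amount)
    return detector.fires()


# Two hypothetical word detectors that differ only in starting activation.
rare = Detector("rare word")
primed = Detector("frequent or recently viewed word", activation=0.5)

weak_input = [0.3, 0.3]  # stands in for a brief, degraded presentation

rare_fired = present(rare, weak_input)      # total stays below threshold
primed_fired = present(primed, weak_input)  # head start pushes it over threshold
```

In this sketch the same weak input fails to fire the unprimed detector but does fire the primed one, which is one way a network of such units could produce frequency and repetition-priming effects.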

The feature net seems to “know” the rules of spelling and “expects” the input to conform to these rules. However, this knowledge is distributed across the entire network and emerges only through the network’s parallel processing. This setup leads to enormous efficiency in our commerce with the world because it allows us to recognize patterns and objects with relatively little input and under highly diverse circumstances. But these gains come at the cost of occasional error. This trade-off may be necessary, though, if we are to cope with the informational complexity of our world.
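
The trade-off between efficiency and occasional error can be made concrete with a small sketch. It assumes, purely for illustration, a layer of letter-pair detectors in which a common pair ("CO") starts closer to threshold than a rare one ("CQ"); the pairs, the numbers, and the function name are all hypothetical. No spelling rule is written anywhere in the code; the apparent "knowledge" lies only in the detectors' starting levels.

```python
# Hypothetical starting activation for letter-pair detectors: a frequent
# pair begins closer to threshold than a rare one (values are made up).
RESTING = {"CO": 0.6, "CQ": 0.0}
THRESHOLD = 1.0


def perceived_pair(letter_evidence):
    """Return the most active letter pair, or None if none reaches threshold.

    letter_evidence maps each candidate second letter to the strength of the
    visual evidence for it. The first letter is assumed to be a clearly
    seen "C".
    """
    totals = {
        "C" + letter: RESTING.get("C" + letter, 0.0) + strength
        for letter, strength in letter_evidence.items()
    }
    best = max(totals, key=totals.get)
    return best if totals[best] >= THRESHOLD else None


# Brief glimpse of "CO": recognized despite weak input, thanks to the head start.
brief_co = perceived_pair({"O": 0.5, "Q": 0.1})

# Brief glimpse of "CQ": the evidence slightly favors Q, yet "CO" wins.
# This is an error, and it makes the input look more regular than it was.
brief_cq = perceived_pair({"O": 0.45, "Q": 0.55})

# Clear view of "CQ": strong evidence overrides the head start.
clear_cq = perceived_pair({"O": 0.1, "Q": 1.2})
```

Under these assumed values, the head start lets the net identify the common pair from very little input, but the same bias misreads a briefly seen irregular pair as the regular one. With stronger input the irregular pair is perceived correctly.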

A feature net can be implemented in different ways—with or without inhibitory connections, for example. With some adjustments (e.g., the addition of geon detectors), the net can also recognize three-dimensional objects. However, some stimuli—for example, faces—probably are not recognized through a feature net but instead require a different sort of recognition system, one that is sensitive to relationships and configurations within the stimulus input.

The feature net also needs to be supplemented to accommodate top-down influences on object recognition. These influences can be detected in the benefits of larger contexts in facilitating recognition and in forms of priming that are plainly concept-driven rather than data-driven. These other forms of priming demand an interactive model, which merges bottom-up and top-down processes.