Exploiting Core Knowledge for Visual Object Recognition

Mark Schurgin & Jonathan Flombaum (forthcoming; please do not cite or distribute without permission)

Abstract

Humans recognize thousands of objects, and they do so with relative tolerance to variable retinal inputs. The acquisition of this ability is not fully understood, and it remains one area in which artificial systems have yet to surpass people. We sought to investigate the learning process that supports object recognition. Specifically, we investigated learning through the association of inputs that co-occur over short periods of time. We tested the hypothesis that human perception exploits expectations about object kinematics in order to limit the scope of association to inputs that are likely to have the same token as their source. In several experiments we exposed participants to images of objects, and we then tested recognition sensitivity. Using motion, we manipulated whether successive encounters with an image took place through kinematics that implied the same token or a different token as the source of those encounters. Images were injected with noise or shown at varying orientations, and we included two manipulations of motion kinematics. Across all experiments, memory performance was better for images that had been previously encountered with kinematics that implied a single token. A model-based analysis similarly showed greater memory strength when images were shown via kinematics that implied a single token. These results suggest that constraints from physics are built into the mechanisms that support learning about objects. Such constraints, often characterized as 'Core Knowledge,' are known to support perception and cognition broadly, even in young infants. But they have never been considered as a mechanism for learning with respect to recognition.

Download full model outputs: click here

The manuscript includes memory strength models for all but the forced-choice experiment (Experiment 2). For simplicity of reporting, full parameter values are shown in the paper only for the model applied to Experiment 1a. The Excel file linked here and above includes the full set of parameter outputs. Feel free to write to us with questions (flombaum@jhu.edu or maschurgin@jhu.edu).
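For readers unfamiliar with model-based memory analyses, the sketch below illustrates the general flavor of a memory-strength estimate using a standard signal-detection measure (d'). This is a generic example only, not the specific model fit in the manuscript; the hit rates, false-alarm rates, and trial counts are made up for illustration.

```python
# Generic signal-detection sketch of a memory-strength estimate (d').
# Not the model reported in the manuscript; inputs are illustrative.
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate, n_old, n_new):
    """d' with a log-linear correction so that rates of 0 or 1
    do not produce infinite z-scores."""
    h = (hit_rate * n_old + 0.5) / (n_old + 1)
    f = (false_alarm_rate * n_new + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf
    return z(h) - z(f)

# Stronger memory shows up as a larger d'.
print(d_prime(0.80, 0.20, 60, 60))
```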

Demos

Apparent Motion: Experiments 1-3

The strength of this effect may depend on the size of your monitor and on your viewing distance. Enlarge the video to fill your display, fixate, and you should see two objects moving past one another in opposite horizontal directions. Each moving object is made up of transients that quickly appear and disappear in four successive locations. In each movie, two of the transients are images: pictures of recognizable real-world objects. Note that in Experiment 1 the images were noisy, with pixels randomly scrambled. In Experiment 2, they were images of objects shown at different orientations.
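To make the structure of these displays concrete, here is a minimal sketch of how one might schedule the transients for two streams moving in opposite directions. All parameters (four steps, 100 ms per step, the coordinates) are illustrative assumptions, not the actual stimulus values used in the experiments.

```python
# Hypothetical schedule for two apparent-motion streams moving in
# opposite horizontal directions, each built from four brief transients.
# Step count, timing, and coordinates are assumptions for illustration.

def make_streams(n_steps=4, screen_width=800, y=300, step_dur_ms=100):
    """Return (onset_ms, x, y) tuples for each stream's transients."""
    xs = [int(screen_width * (i + 1) / (n_steps + 1)) for i in range(n_steps)]
    left_to_right = [(i * step_dur_ms, x, y) for i, x in enumerate(xs)]
    right_to_left = [(i * step_dur_ms, x, y)
                     for i, x in enumerate(reversed(xs))]
    return left_to_right, right_to_left

if __name__ == "__main__":
    stream_a, stream_b = make_streams()
    for (t, xa, _), (_, xb, _) in zip(stream_a, stream_b):
        print(f"t={t:4d} ms   stream A x={xa:3d}   stream B x={xb:3d}")
```

In this scheme, placing both real-world images within one stream corresponds to the continuous condition described below, while placing one image in each stream corresponds to the discontinuous condition.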

Movie 1: Experiment 1, Continuous Motion — In this demo, the two real-world images appear in the same object stream.

Movie 2: Experiment 1, Discontinuous Motion — In this demo, the two real-world images appear in different object streams.

Test Phase

The test phase asked participants to judge images, one at a time, as 'Old,' 'Similar,' or 'New.' Test images were drawn from the same three categories (old, similar, and new). Old images were recognized as old more accurately when they had previously been seen under continuous as opposed to discontinuous motion.
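As a rough illustration of how such responses might be scored, the sketch below computes the proportion of old images called 'Old,' split by the motion condition under which each image was studied. The trial format and field names are hypothetical.

```python
# Hypothetical scoring of old/similar/new test responses by study
# condition. Trial format and field names are assumptions.
from collections import defaultdict

def old_hit_rates(trials):
    """Proportion of old images judged 'Old', by study-motion condition."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t in trials:
        if t["status"] == "old":           # only old images count as hits
            totals[t["motion"]] += 1
            hits[t["motion"]] += (t["response"] == "Old")
    return {cond: hits[cond] / totals[cond] for cond in totals}

trials = [
    {"status": "old", "motion": "continuous",    "response": "Old"},
    {"status": "old", "motion": "discontinuous", "response": "Similar"},
    {"status": "new", "motion": None,            "response": "New"},
]
print(old_hit_rates(trials))  # {'continuous': 1.0, 'discontinuous': 0.0}
```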

Theoretical Replication: Experiment 4. Integrating Noisy Inputs through Occlusion and Disocclusion

Participants were instructed to fixate. When a red disc appeared with the two guidelines at the end of a trial, they reported by keypress whether the disc was misaligned to the left or to the right.

Movie 3: Experiment 4, Continuous Motion — In this demo, the two images emerge from behind the same occluder.

Movie 4: Experiment 4, Discontinuous Motion — In this demo, the two images emerge from behind different occluders.