Dr. Yael Niv describes how attention, learning and reinforcement are intertwined, and why extinction protocols don’t work well
Imagine you’re on your daily commute to work, walking briskly to catch an important meeting. You arrive at that dreaded intersection whose sole purpose is to test your patience every morning. In an attempt to find the silver lining, you think to yourself “standing amidst people with no regard for their auditory acuity is how I keep up with pop music.” Of course, your valiant effort comes to a halt as you realize you’re actually listening to five different playlists all at once. “To think I can’t tell apart the hottest new songs from the dissonant sound of five songs playing simultaneously…”
Most people would remain in place as long as the pedestrian light is red. In some cities, however, jaywalking rules are loosely enforced, so for some people crossing against the light is a viable alternative. These two scenarios involve quite distinct cognitive processes. In the first, we attend to a relevant stimulus (i.e., the pedestrian light) in a stimulus-rich environment. In the second, we make an inference (e.g., about the probability of getting a ticket) because relevant information is lacking. According to Dr. Yael Niv, the two scenarios also involve different neural computations.
Dr. Niv’s lab used a “dimensions task” to study the first scenario, namely how we attend to task-relevant information in a noisy environment, in a laboratory setting. On every trial, participants were shown three compound stimuli, each of which was itself made up of three features: a face, a place and a tool. Through trial and error, subjects had to figure out which stimulus dimension (e.g., tools) was the relevant one. Within that dimension, choosing the target feature (e.g., a hammer) yielded a large reward, whereas choosing a non-target feature (e.g., a drill or a wrench) yielded a smaller one. Subjects had 25 trials to work out which dimension and target feature yielded the best rewards, after which the rules changed, so they had to relearn which cues were relevant and which were irrelevant.
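To make the structure of the task concrete, here is a rough simulation of it in Python. It is a sketch under stated assumptions, not the lab’s actual task code: the face and place labels are placeholders, and the reward values (HIGH_REWARD and LOW_REWARD) are hypothetical numbers standing in for the “large” and “smaller” rewards described above. Each trial deals out three compounds so that every face, place and tool appears exactly once, and the relevant dimension and target feature are redrawn every 25 trials.

```python
import random

# Hypothetical feature sets: the lecture names a hammer, a drill and a wrench;
# the face and place labels below are placeholders.
FEATURES = {
    "face": ["face_1", "face_2", "face_3"],
    "place": ["place_1", "place_2", "place_3"],
    "tool": ["hammer", "drill", "wrench"],
}
HIGH_REWARD, LOW_REWARD = 10, 1  # assumed payoffs; the source only says "large" vs "smaller"
TRIALS_PER_GAME = 25


def make_trial(rng):
    """Deal out three compound stimuli so each has exactly one face, one place and one tool."""
    shuffled = {dim: rng.sample(feats, len(feats)) for dim, feats in FEATURES.items()}
    return [{dim: shuffled[dim][i] for dim in FEATURES} for i in range(3)]


def new_game(rng):
    """Pick the relevant dimension and its target feature for the next game."""
    dim = rng.choice(list(FEATURES))
    return dim, rng.choice(FEATURES[dim])


def reward(choice, relevant_dim, target):
    """Large reward if the chosen compound contains the target feature, smaller otherwise."""
    return HIGH_REWARD if choice[relevant_dim] == target else LOW_REWARD


def play(n_games, policy, rng):
    """Run several games; the rule (dimension and target) changes at the start of each one."""
    total = 0
    for _ in range(n_games):
        dim, target = new_game(rng)
        for _ in range(TRIALS_PER_GAME):
            stimuli = make_trial(rng)
            total += reward(policy(stimuli), dim, target)
    return total
```

A policy that always picks the first compound, for example play(4, lambda stimuli: stimuli[0], random.Random(1)), gives a chance-level baseline against which a learning agent could be compared.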
Dr. Niv’s group used eye-tracking to assess subjects’ overt attention, that is, where they were looking. In addition, they used functional magnetic resonance imaging (fMRI) and multi-voxel pattern analysis (MVPA) to investigate covert attention. Specifically, a classifier was trained to predict, from the fMRI signal, which kind of stimulus (face, place or tool) a subject was attending to. This allowed the authors to assess how attention shapes valuation and prediction error signals.
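The decoding idea can be illustrated with a toy nearest-centroid classifier in plain Python. This is not the classifier the group used: the synthetic “voxel” patterns, the number of voxels, the noise level and the softmax temperature are all assumptions chosen for illustration. Each category gets a template averaged from training patterns, and a new pattern is scored by its distance to each template, giving a graded probability that the subject is attending to faces, places or tools.

```python
import math
import random

CATEGORIES = ("face", "place", "tool")


def simulate(rng, n_voxels=20, n_per_category=30, noise=1.0):
    """Synthetic data: each category has a random prototype pattern plus Gaussian noise per trial."""
    prototypes = {c: [rng.gauss(0, 1) for _ in range(n_voxels)] for c in CATEGORIES}
    patterns, labels = [], []
    for c in CATEGORIES:
        for _ in range(n_per_category):
            patterns.append([p + rng.gauss(0, noise) for p in prototypes[c]])
            labels.append(c)
    return prototypes, patterns, labels


def train_centroids(patterns, labels):
    """Average the training patterns of each category into one template."""
    sums, counts = {}, {}
    for x, y in zip(patterns, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def decode(pattern, centroids, temperature=10.0):
    """Softmax over negative squared distances: closer templates get higher probability."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(pattern, c)) / temperature
              for y, c in centroids.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}  # subtract max for numerical stability
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}
```

Real MVPA pipelines typically use regularized linear classifiers and cross-validation; nearest-centroid is used here only because it fits in a few lines without external libraries.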
Standard reinforcement learning models could not explain subjects’ performance in the task. For example, if subjects received a reward for a compound stimulus made up of Madonna, the Taj Mahal and a hammer, they did not necessarily keep choosing compounds containing those features. Instead, subjects shifted their attention from one dimension (e.g., tools) to another (e.g., faces) at different points in the game. They were essentially testing hypotheses, giving more weight to certain dimensions and closely tracking the feedback they received after each choice. Overall, the study suggested a bidirectional relationship between attention and learning that facilitates learning in stimulus-rich (or multidimensional) environments: attention biased what subjects learned about stimulus values, while what they learned from feedback (i.e., updated stimulus values) in turn shaped what they attended to. Dr. Niv referred to this as attention-weighted reinforcement learning (AWRL).
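The attention-weighted idea can be sketched as a small variation on a standard prediction-error update. This is a minimal illustration under stated assumptions, not the published AWRL model: a compound’s value is taken to be the attention-weighted sum of its features’ values, and after feedback each feature is updated in proportion to the attention its dimension received. The attention weights below are hypothetical numbers rather than measured ones, and the learning rate and softmax inverse temperature (beta) are placeholder parameters.

```python
import math
import random
from collections import defaultdict


def compound_value(stimulus, values, attention):
    """Predicted value of a compound: attention-weighted sum of its feature values."""
    return sum(attention[dim] * values[feat] for dim, feat in stimulus.items())


def choose(stimuli, values, attention, beta, rng):
    """Softmax choice among compounds; beta (inverse temperature) is a free parameter."""
    prefs = [beta * compound_value(s, values, attention) for s in stimuli]
    top = max(prefs)
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices(stimuli, weights=weights, k=1)[0]


def awrl_update(stimulus, reward, values, attention, learning_rate=0.3):
    """One prediction error for the chosen compound; each feature's value moves
    in proportion to the attention its dimension received."""
    delta = reward - compound_value(stimulus, values, attention)
    for dim, feat in stimulus.items():
        values[feat] += learning_rate * attention[dim] * delta
    return delta


values = defaultdict(float)  # unseen features start at a value of 0
attention = {"face": 0.1, "place": 0.1, "tool": 0.8}  # hypothetical: attention mostly on tools
chosen = {"face": "Madonna", "place": "Taj Mahal", "tool": "hammer"}
delta = awrl_update(chosen, 1.0, values, attention)
# values["hammer"] is now about 0.24, values["Madonna"] only about 0.03
```

The same reward thus teaches the agent much more about the hammer than about Madonna, which is one way attention can bias learning; letting the attention weights themselves depend on the learned values would close the loop in the other direction.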
Among the group’s findings was that value computations in the ventromedial prefrontal cortex (vmPFC) and reward prediction errors in the ventral striatum were biased by what the subject was attending to. This is consistent with previous work by Hare et al. showing that the vmPFC responds differently to food when subjects are primed to think about its healthiness. Moreover, activity in a fronto-parietal attention circuit was pronounced during attentional shifts between dimensions (e.g., from faces to tools), and the circuit’s connectivity with the vmPFC was enhanced at these time points. This heightened connectivity may mediate the behavioural phenomenon described above: attention biases learning about stimulus value, and learning in turn shapes attention.
What about the opposite scenario, when information is sparse and decisions must rest on inference? Dr. Niv’s group hypothesized that the orbitofrontal cortex (OFC) would be involved, based on previous work showing that animals with OFC lesions did not exhibit the canonical reward prediction error spikes upon reward receipt, or at least not to the same extent as sham-lesioned animals. This can be interpreted as an inability to encode a cognitive map of task states, that is, a mental representation of the distinct conditions or phases that make up a task. If the OFC houses such a map, then it is arguably unsurprising that lesions would blunt reward prediction errors, since these animals would lack a coherent picture of the various task states. The question, then, was whether OFC activity would differ in a distinguishable way depending on the task state.
Dr. Niv’s group devised a task in which subjects had to take both observable and unobservable information into account to get a reward. They trained a classifier to associate different task states (e.g., unobservable rule change vs. observable stimulus change) with OFC activity patterns. The question was whether the classifier could then identify which task state a subject was in, in real time, from a read-out of OFC activity. This was indeed the case! Notably, this could not be accomplished when activity from other brain regions was fed into the classifier instead.
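The logic of that comparison (state decodable from one region but not from another) can be sketched with synthetic data and leave-one-out cross-validation. Everything here is an assumption for illustration: the two state labels, the voxel counts and noise level, and the nearest-centroid classifier are hypothetical stand-ins, not the group’s data or method. The “informative” region has a different mean pattern for each task state; the control region draws every trial from the same distribution.

```python
import random

STATES = ("rule_change", "stimulus_change")  # hypothetical labels for two task states


def simulate_region(rng, informative, n_voxels=20, n_per_state=40, noise=1.0):
    """Synthetic trial-by-voxel patterns. If informative, each task state has its own
    mean pattern; otherwise all trials share one mean pattern."""
    shared = [rng.gauss(0, 1) for _ in range(n_voxels)]
    means = {s: ([rng.gauss(0, 1) for _ in range(n_voxels)] if informative else shared)
             for s in STATES}
    X, y = [], []
    for s in STATES:
        for _ in range(n_per_state):
            X.append([m + rng.gauss(0, noise) for m in means[s]])
            y.append(s)
    return X, y


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def loo_accuracy(X, y):
    """Leave-one-out cross-validation: each trial is decoded by a nearest-centroid
    classifier trained on all the other trials."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {s: centroid([x for x, lab in train if lab == s]) for s in STATES}
        pred = min(cents, key=lambda s: sq_dist(X[i], cents[s]))
        correct += pred == y[i]
    return correct / len(X)


rng = random.Random(1)
ofc_acc = loo_accuracy(*simulate_region(rng, informative=True))
other_acc = loo_accuracy(*simulate_region(rng, informative=False))
```

In this toy, the informative region is decoded almost perfectly while the control region stays near chance (50% for two states); holding out each trial before decoding it is what keeps the accuracy estimate honest.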
The lecture ended on an intriguing note: Dr. Niv argued that extinction paradigms do not work well because they are too dissimilar to the original conditioning, so the two phases are encoded as separate task states, i.e., as distinct clusters of experience. This is consistent with the notion that new memory traces can compete with old ones. If this account truly explains spontaneous recovery, then an extinction paradigm more similar to the initial conditioning should reduce retrieval of the original memory (i.e., spontaneous recovery). To test this, they used a gradual extinction protocol in which the unconditioned stimulus (US) still followed the conditioned stimulus (CS), albeit less and less often over time. I was fascinated to learn that gradually weaning animals off the CS-US association seemed to be more effective than classical extinction paradigms, in which the US is absent from the outset. This is in line with the idea that the mammalian brain clusters similar experiences into a single representation.
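The clustering intuition can be illustrated with a deliberately crude toy. This is not the model used in the research: reinforcement is summarised per block as the fraction of CS presentations followed by the US, and a block joins an existing cluster only if that rate is close to the cluster’s current expectation. The threshold, update rate and block counts are arbitrary assumptions. An abrupt drop to zero opens a second cluster and leaves the acquisition cluster’s high CS-US expectation untouched, whereas a gradual decline is absorbed into one cluster whose expectation is driven toward zero.

```python
def schedule(n_acq=5, n_ext=10, gradual=False):
    """Per-block CS-US reinforcement rates: acquisition at 1.0, then extinction that
    either drops straight to 0.0 or steps down linearly to 0.0."""
    acq = [1.0] * n_acq
    if gradual:
        ext = [1.0 - (k + 1) / n_ext for k in range(n_ext)]
    else:
        ext = [0.0] * n_ext
    return acq + ext


def cluster_blocks(rates, threshold=0.3, alpha=0.5):
    """Toy sequential clustering: each block joins the cluster whose expectation is
    closest, unless all clusters are further than threshold away, in which case it
    starts a new cluster. Only the assigned cluster's expectation is updated."""
    clusters = []  # one US expectation per cluster
    for r in rates:
        if clusters:
            best = min(range(len(clusters)), key=lambda i: abs(clusters[i] - r))
            if abs(clusters[best] - r) <= threshold:
                clusters[best] += alpha * (r - clusters[best])
                continue
        clusters.append(r)
    return clusters


abrupt = cluster_blocks(schedule())
gradual = cluster_blocks(schedule(gradual=True))
```

With these settings the abrupt schedule ends with two clusters, one still expecting the US on every trial, while the gradual schedule ends with a single cluster expecting it roughly 10% of the time. In this toy, that surviving high-expectation cluster is what a return of the original response would draw on.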
Lecture title: “Carving the World Into Useful Task Representations”.
Presenter: Dr. Yael Niv