A look into Reinforcement Learning (RL) and learning new skills — how to help the brain function more effectively
Learning is defined as an act, process, or experience of gaining knowledge or skill. Reinforcement Learning (RL) is a theory of how an organism learns through action-outcome associations. Learning from mistakes along a path that offers reinforcement and expectation can enhance learning: outcomes that are better than expected produce stronger learning than outcomes of low expected value. The outcomes encountered along the path are thought to be coded by midbrain dopamine neurons, which increase their activity when outcomes are better than expected. This reinforcement signal is relayed to the anterior cingulate cortex, where it produces a measurable signal on the scalp: the Feedback-Related Negativity (FRN). When these signals occur, they serve as indicators of the RL process, and a high-amplitude FRN should indicate an updating of the action-outcome association. Before this study, evidence was limited that acquisition of new learning was contingent on the amplitude of this negative event-related potential (ERP). RL theory predicts that an increase in FRN amplitude during feedback should be associated with good performance on future encounters with the same item.
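The action-outcome learning described above is often modeled with a simple prediction-error rule: an outcome better than expected strengthens the association, a worse-than-expected outcome weakens it. A minimal sketch of that idea (the function, learning rate, and reward values are illustrative, not from the study):

```python
# Minimal prediction-error (delta-rule) update, the core idea behind RL theory.
# A better-than-expected outcome yields a positive prediction error and
# strengthens the action-outcome association; a worse-than-expected outcome
# yields a negative error and weakens it. All names and values are illustrative.

def update_value(value, reward, learning_rate=0.3):
    """Return the updated value estimate and the prediction error."""
    prediction_error = reward - value        # better than expected -> positive
    value += learning_rate * prediction_error
    return value, prediction_error

value = 0.0
for reward in [1, 1, 0, 1]:                  # a mix of good and bad outcomes
    value, pe = update_value(value, reward)
```

After a few rewarded trials the value estimate rises, so the same reward produces a smaller prediction error, mirroring the idea that fully expected outcomes drive less learning.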
This study consisted of 19 individuals between the ages of 17 and 23; eight were men and twelve were right-handed. The subjects were asked to choose among four response buttons as quickly as possible after an item appeared on a screen. If they took too long to respond (more than 1500 ms), they received feedback indicating "too late". If they chose the correct button, the screen turned blue, then after 1000 ms turned green, indicating they could go on to the next item. If they chose incorrectly, the screen turned red for 1000 ms and the sequence started again. The button choices formed a sequence of 12 items, which subjects learned by trial and error: a correct choice advanced them to the next item in the sequence, while an incorrect choice restarted the whole sequence. The sequences were also manipulated: for three of the 12 items, the response was not accepted as correct until a predetermined attempt (the first, second, third, or fourth), so the number of attempts needed to receive positive feedback on those items was fixed in advance. Brain activity was recorded with an electroencephalograph using 61 channels mounted on an electrode cap, and three types of feedback were examined: positive ("correct"), negative ("incorrect"), and "too late".
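The trial-and-error procedure (advance on a correct choice, restart the whole sequence on an error) can be sketched as a small simulation. This assumes an idealized learner that remembers every outcome, and it omits the attempt manipulation applied to three of the items; all names are illustrative:

```python
import random

# Sketch of the trial-and-error sequence task: at each of 12 steps the learner
# must find the correct button among 4; an error restarts the sequence from
# step 1. Assumes an idealized learner that never repeats a known error and
# always reuses a response that earned positive feedback.

def run_task(sequence, n_buttons=4, rng=None):
    rng = rng or random.Random(0)
    wrong = [set() for _ in sequence]    # buttons that drew negative feedback
    right = [None] * len(sequence)       # buttons that drew positive feedback
    presses = 0
    step = 0
    while step < len(sequence):
        if right[step] is not None:
            choice = right[step]         # reuse a previously rewarded response
        else:
            options = [b for b in range(n_buttons) if b not in wrong[step]]
            choice = rng.choice(options)
        presses += 1
        if choice == sequence[step]:
            right[step] = choice
            step += 1                    # positive feedback: advance
        else:
            wrong[step].add(choice)      # "don't do that again"
            step = 0                     # negative feedback: restart
    return presses

sequence = [2, 0, 3, 1, 1, 0, 2, 3, 0, 2, 1, 3]   # an arbitrary 12-item sequence
total_presses = run_task(sequence)
```

Even this perfect learner needs many more than 12 presses, since every error sends it back to the start of the sequence, which is why completing a sequence requires a substantial number of attempts.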
Because the feedback subjects received was manipulated, the lowest number any subject could attain on a particular sequence was 18; on average, each person made 12.4 errors. Negative RL failures consisted of choosing the same incorrect response button both on the immediate next encounter of a test item and on later encounters of that item. Positive RL failures were "true" failures: an incorrect response to an item for which positive feedback had been given on a previous encounter. The remaining failures were mistakes on items for which the subject had already produced the positively reinforced response repeatedly. Overall, the data showed that subjects gradually made fewer errors.
Negative feedback elicited a negative deflection in the event-related potential (ERP) that was significantly different from zero. The researchers were able to distinguish negative feedback following a novel response from negative feedback following a response that had been tried previously. The Feedback-Related Negativity (FRN) was present in both cases, but it was significantly enhanced when good negative RL trials were compared with bad negative RL trials. The "learning difference wave" was non-zero between roughly 150 and 500 ms, indicating that a large FRN following negative feedback predicted not repeating the previously tried response (a "don't do that again" reaction) on the subsequent encounter. The FRN on good positive RL trials differed dramatically depending on the number of attempts, an effect driven by a reduced positivity when positive feedback was given on the fourth attempt.
Learning from mistakes is very important for success in future behavior. Learning signals sent from the midbrain to the anterior cingulate cortex, reflected in a high-amplitude FRN, predict learning after negative reinforcement and indicate good performance when a subject is confronted with the same choices on a future occasion. The results of this study support the RL account of the FRN: following negative feedback, the amplitude was more negative when subjects learned from the feedback and subsequently tried a response they had never tried before (chose a different option).
This signal is also adjusted and updated for future correct and incorrect responses. There was no difference among the FRNs on the first, second, and third attempts in the sequence, even though negative feedback is less informative after a first attempt than after a third. The signal was less positive on the fourth attempt than on the other attempts, which was attributed to expectation (the subjects had run out of viable options) rather than to having more options from which to pick. In this study, the researchers showed that FRN amplitude predicts whether an association was learned, and that one stimulus could be differentially more rewarding than another. Although difficult to interpret, the FRN reflects outcomes that are better than expected (a positive outcome when a subject is expecting a negative one) but does not reflect outcomes that are worse than expected (just different).
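The reduced positivity on the fourth attempt is consistent with a simple expectation argument: once three of the four buttons have been ruled out, positive feedback is fully expected, so the prediction error it carries shrinks to zero. A hypothetical back-of-the-envelope computation (not from the paper):

```python
# On attempt n (1-4), n-1 wrong buttons have already been eliminated, so a
# random choice among the remaining options is correct with probability
# 1/(4 - n + 1). The prediction error carried by positive feedback (reward 1)
# therefore shrinks as attempts accumulate, reaching 0 on the fourth attempt,
# when only one option is left. Illustrative computation, not from the paper.

def positive_feedback_pe(attempt, n_buttons=4):
    remaining = n_buttons - (attempt - 1)   # options not yet ruled out
    expected = 1.0 / remaining              # chance of being correct
    return 1.0 - expected                   # prediction error if correct

pes = [positive_feedback_pe(n) for n in range(1, 5)]
```

On the fourth attempt the correct answer is certain, so the prediction error is zero, matching the reduced positivity the study attributes to expectation.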
FRN amplitudes reflect the process of learning and skill acquisition after we make a mistake and are indicative of whether we learned from the mistake or will repeat it.
Essentially, it boils down to having buy-in in the learning process. If there is an expectation of a positive outcome, then there is the potential to try a new way of thinking. If no new way of thinking or solution is applied, then nothing was learned and the mistake is doomed to be repeated. When all options have been exhausted, expectation falls off, as only one alternative is left. Learning involves having some element of "risk" in the game and a willingness to try new attempts to find clarity or acquire the skill to be learned.