Chapter 5: Direct Learning and Human Potential
Control Learning and Human Potential
Just as predictive learning had a pioneer at the turn of the 20th century in Ivan Pavlov, control learning had its own in Edward Thorndike. Whereas Pavlov would probably be considered a behavioral neuroscientist today, Thorndike would most likely be considered a comparative psychologist. He studied several different species of animals including chicks, cats, and dogs and published his doctoral dissertation (1898) as well as a book (1911) entitled Animal Intelligence.
Figure 5.6 Thorndike’s puzzle box.
Thorndike created mazes and puzzle boxes for use with a variety of species, including fishes, cats, dogs, and chimpanzees (Imada and Imada, 1983). Figure 5.6 provides a sketch of a cat in a puzzle box that could be rigged up in a variety of ways. A sequence of responses was required in order to open the door leading to visible food (e.g., pulling a string or pressing a latch). A reduction in the amount of time taken to open the door indicated that control learning had occurred.
Figure 5.7 B. F. Skinner.
B.F. Skinner (1938) was a second pioneer in the study of control learning. Similar to Pavlov and Thorndike, he developed an iconic apparatus (see Figure 5.8), the operant chamber, more popularly referred to as a Skinner box. In predictive learning, there is usually a connection between a biologically significant stimulus (e.g., food or shock) and the response being studied (salivation or an increase in heart rate). In control learning, the connection between the response and food is arbitrary. There is not a genetic relationship between completing a maze or pressing a bar and the occurrence of food; there is a genetic relationship between food and salivation.
Skinner boxes have many applications and are in widespread use not only to study adaptive learning, but also to study perception, motivation, animal cognition, psychophysiology, and psychopharmacology. Perhaps their best-known application is the study of different schedules of reinforcement. Unlike in a maze or puzzle box, the subject can repeatedly make a response in a Skinner box. As we will see, the pattern of an individual’s rate of responding is sensitive to the pattern of consequences over an extended period of time.
Figure 5.8 Rat in a Skinner box.
As described above, research in predictive learning involves detecting the correlation between environmental events. Individuals acquire the ability to predict the occurrence or non-occurrence of appetitive or aversive stimuli. According to Skinner (1938), research in control learning (i.e., instrumental or operant conditioning) involves detecting contingencies between one’s behavior and subsequent events (i.e., consequences). This is the same distinction made in Chapter 1 between correlational and experimental research. Correlational research involves systematic observation of patterns of events as they occur in nature. Experimental research requires active manipulation of nature in order to determine if there is an effect. It is as though the entire animal kingdom is comprised of intuitive scientists detecting correlations among events and manipulating the environment in order to determine cause and effect. This capability enables adaptation to our diverse environmental niches. It is impossible for humans to rely upon genetic evolution to adapt to their modern conditions. We are like rats in a Skinner box, evaluating the effects of our behavior in order to adapt.
Exercise
Describe the apparatuses and procedures used for studying operant conditioning (control learning).
Skinner’s Contingency Schema
Another major contribution made by Skinner was the schema he developed to organize contingencies between behavior and consequences. He described four basic contingencies based on two considerations: whether the consequence involved adding or removing a stimulus, and whether it resulted in an increase or decrease in the frequency of the preceding behavior (see Figure 5.9). The four possibilities are: positive reinforcement, in which adding a (presumably appetitive) stimulus increases the frequency of behavior; positive punishment, in which adding a (presumably aversive) stimulus decreases the frequency of behavior; negative reinforcement, in which removing a (presumably aversive) stimulus increases the frequency of behavior; and negative punishment, in which removing a (presumably appetitive) stimulus decreases the frequency of behavior.
Figure 5.9 Operant conditioning contingencies.
From a person’s perspective, positive reinforcement is what we ordinarily think of as receiving a reward for a particular behavior (If I do this, something good happens). Positive punishment is what we usually think of as punishment (If I do this, something bad happens). Negative reinforcement is often confused with punishment but, by definition, results in an increase in behavior. It is what we usually consider to be escaping or avoiding an aversive event (If I do this, something bad is removed, or If I do this, something bad does not happen). Examples of negative punishment would be response cost (e.g., a fine) or time out (If I do this, something good is removed, or If I do this, something good does not happen). Everyday examples would be: a child is given a star for cleaning up after playing and keeps cleaning up (positive reinforcement); a child is yelled at for teasing and the behavior decreases (positive punishment); a child raises an umbrella after (escape) it starts to rain, or before (avoidance) stepping out into the rain (both negative reinforcement); a child’s allowance is taken away for fighting (response cost), or a child is placed in the corner for fighting while others are permitted to play (time out) and fighting decreases (both negative punishment).
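The two questions behind Skinner's schema (was a stimulus added or removed, and did the behavior increase or decrease) can be sketched as a small classifier. This is only an illustrative sketch: the names `StimulusChange`, `BehaviorChange`, and `classify_contingency` are hypothetical labels introduced here, not terminology from Skinner.

```python
from enum import Enum


class StimulusChange(Enum):
    """Hypothetical label: was a stimulus added or removed after the response?"""
    ADDED = "added"
    REMOVED = "removed"


class BehaviorChange(Enum):
    """Hypothetical label: did the preceding behavior become more or less frequent?"""
    INCREASES = "increases"
    DECREASES = "decreases"


def classify_contingency(stimulus: StimulusChange, behavior: BehaviorChange) -> str:
    """Name the contingency from the two considerations in the schema."""
    # "Positive" means a stimulus was added; "negative" means one was removed.
    sign = "positive" if stimulus is StimulusChange.ADDED else "negative"
    # "Reinforcement" means behavior increases; "punishment" means it decreases.
    effect = "reinforcement" if behavior is BehaviorChange.INCREASES else "punishment"
    return f"{sign} {effect}"


# A child is given a star for cleaning up and keeps cleaning up.
star_example = classify_contingency(StimulusChange.ADDED, BehaviorChange.INCREASES)
# A child's allowance is taken away for fighting and fighting decreases.
allowance_example = classify_contingency(StimulusChange.REMOVED, BehaviorChange.DECREASES)
```

Note that the classification depends only on what happens to the stimulus and to the behavior. Whether the stimulus is appetitive or aversive is presumed from the outcome, as in the schema itself.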
Video
Watch the following video for descriptions and examples of positive and negative reinforcement and punishment:
Skinner’s hedonic motivation “carrots and sticks” schema of contingencies between behaviors and consequences is familiar and intuitive. These principles are arguably the most powerful explanatory tools the discipline of psychology has provided for human behavior. They have been applied with a diversity of individuals and groups (e.g., autistic children, schizophrenic adults, and normal school children) in a diversity of settings (e.g., hospitals, schools, and industry) for a wide range of behaviors (e.g., toilet training, academic performance, and wearing seat belts). We will consider examples of control learning applications later.
Exercise
Describe Skinner’s schema for categorizing contingencies between behavior and consequences.
Skinner’s control learning schema may be expanded to include predictive learning, thus forming a more comprehensive adaptive learning overview (Levy, 2013). Individuals acquire the ability to predict and control the occurrence and non-occurrence of appetitive and aversive events. This adaptive learning overview provides an intuitively plausible, if simplistic, portrayal of the human condition. Some things feel good (like food) and some things feel bad (like shock). We are constantly trying to maximize feeling good and minimize feeling bad. This requires being able to predict, and where possible control, events in our lives. This is one way of answering the existential question: What’s it all about?
Basic Control Learning Phenomena
Acquisition
Acquisition of a control response is different from acquisition of a predictive response. In predictive learning, two correlated events are independent of the individual’s behavior. In control learning, a specific response is required in order for an event to occur. In predictive learning, the response that is acquired is related to the second event (e.g., a preparatory response such as salivation for food). In control learning, the required response is usually arbitrary. For example, there is no “natural” relationship between bar-pressing and food for a rat or between much of our behavior and its consequences (e.g., using knives and forks when eating). This raises the question: how does the individual “discover” the required behavior?
From an adaptive learning perspective, a Skinner box has much in common with Thorndike’s puzzle box. The animal is in an enclosed space and a specific arbitrary response is required to obtain an appetitive stimulus. Still, the two apparatuses pose different challenges and were used in different ways by the investigators. Thorndike’s cats and dogs could see and smell large portions of food outside the box. The food in the Skinner box is tiny and released from a mechanical device hidden from view. Thorndike was interested in acquisition of a single response and recorded the amount of time it took for it to be acquired. Skinner developed a way to speed up acquisition of the initial response and then recorded how different variables influenced its rate of occurrence.
Thorndike’s and Skinner’s subjects were made hungry by depriving them of food before placing them in the apparatus. Since Thorndike’s animals could see and smell the food outside the puzzle box, they were immediately motivated to determine how to open the door to get out. It usually took about 2 minutes for one of Thorndike’s cats to initially make the necessary response. Unless there is residue of a food pellet in the food magazine in a Skinner box, there is no reason for a rat to engage in food-related behavior. One would need to be extremely patient to wait for a rat to discover that pressing the bar on the wall will result in food being delivered in the food magazine.
In order to speed up this process, the animal usually undergoes magazine training in which food pellets are periodically dropped into the food chamber (magazine). This procedure accomplishes two important objectives: rats have an excellent sense of smell, so they are likely to immediately discover the location of food; there is a distinct click associated with the operation of the food delivery mechanism that can be associated with the availability of food in the magazine. This makes it much easier for the animal to know when food is dispensed. Magazine training is completed when the rat, upon hearing the click, immediately goes to the food. Once magazine training is completed it is possible to use the shaping procedure to “teach” bar pressing. This involves dispensing food after successive approximations to bar pressing. One would first wait until the rat is in the vicinity of the bar before providing food. Then the rat would need to be closer, center itself in front of the bar, lift its paw, touch the bar, and finally press the bar. Common examples of behaviors frequently established through shaping with humans are: tying shoes, toilet training, bike riding, printing, reading, and writing.
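The logic of shaping by successive approximations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of real training: the step labels paraphrase the sequence described above, the function name `run_shaping` is hypothetical, and the criterion advances after a single reinforced response, whereas an actual trainer would usually reinforce each approximation several times before raising the requirement.

```python
# Successive approximations to bar pressing, ordered from crudest to final response.
# The labels paraphrase the text; they are illustrative, not standard terminology.
SHAPING_STEPS = [
    "near bar",
    "closer to bar",
    "centered in front of bar",
    "lifts paw",
    "touches bar",
    "presses bar",
]


def run_shaping(observed_behaviors, steps=SHAPING_STEPS):
    """Return one (behavior, reinforced) pair per observed behavior.

    A behavior is reinforced only if it meets or exceeds the current criterion.
    After each reinforced response the criterion moves one step past it,
    capped at the final step so bar pressing keeps being reinforced.
    """
    criterion = 0  # index of the crudest approximation that still earns food
    log = []
    for behavior in observed_behaviors:
        level = steps.index(behavior) if behavior in steps else -1
        reinforced = level >= criterion
        if reinforced:
            criterion = min(level + 1, len(steps) - 1)
        log.append((behavior, reinforced))
    return log


session = run_shaping([
    "near bar", "near bar", "closer to bar", "grooming",
    "centered in front of bar", "lifts paw", "touches bar",
    "presses bar", "touches bar", "presses bar",
])
reinforced_flags = [reinforced for _, reinforced in session]
```

In this sketch the second "near bar" earns nothing because the criterion has already moved on, and once bar pressing has been reinforced, merely touching the bar is no longer enough.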
In applied settings and the lab, it is possible to accelerate the shaping process by prompting the required behavior. A prompt is any stimulus that increases the likelihood of a desired response. It can be physical, gestural, or verbal. It is often effective to use these in sequence. For example, if you were trying to teach a dog to roll over on command, you might start by saying “roll over” followed by physically rolling the dog. Then you might gradually eliminate the physical prompt (referred to as fading), saying “roll over” and using less force. This would be continued until you were no longer touching the dog, but simply gesturing. Imitative prompts, in which the gesture matches the desired response, are particularly common and effective with children (e.g., the game “peek-a-boo”). Getting back to our dog example, eventually you could use fading on the gesture. Then it would be sufficient to simply say the words “roll over.” The combination of shaping, prompting, and fading is a very powerful teaching strategy for non-verbal individuals. Once words have been acquired for all the necessary components of a skill, it can be taught exclusively through the use of language. For example, “Please clean up your room by putting your toys in the chest and your clothes in the dresser.” Skinner (1986) describes the importance of speech to human accomplishments and considers plausible environmental contingencies favoring the evolutionary progression from physical to gestural to verbal prompts. He emphasizes that “sounds are effective in the dark, around corners, and when listeners are not looking.” In the following chapter we will consider speech and language in greater depth.
Exercise
Describe an example incorporating prompting and fading within the shaping procedure to establish verbal control.
Learned and Unlearned Appetitive and Aversive Stimuli
We share with the rest of the animal kingdom the need to eat and survive long enough to reproduce if our species is to continue. Similar to the distinction made in the previous chapter between primary and secondary drives, and Pavlov’s distinction between unconditioned stimuli (biologically significant events) and conditioned stimuli, Skinner differentiated between unconditioned reinforcers and punishers and conditioned reinforcers and punishers. Things related to survival such as food, water, sexual stimulation, removal of pain, and temperature regulation are reinforcing as the result of heredity. We do not need to learn to “want” to eat, although we need to learn what to eat. However, what clearly differentiates the human condition from that of other animals, and our lives from the lives of the Nukak, is the number and nature of our conditioned (learned) reinforcers and punishers.
Early in infancy, children see smiles and hear words paired with appetitive events (e.g., nursing). We saw earlier how this would lead to visual and auditory stimuli acquiring meaning. These same pairings will result in the previously neutral stimuli becoming conditioned reinforcers. That is, children growing up in the Colombian rainforest or cities in the industrialized world will increase behaviors followed by smiles and pleasant sounds. The lives of children growing up in these enormously different environments will immediately diverge. Even the feeding experience will be different, with the Nukak child being nursed under changing, sometimes dangerous, and uncomfortable conditions while the developed world child is nursed or receives formula under consistent, relatively safe, and comfortable conditions.
In Chapter 1, we considered how important caring what grade one receives is to success in school. Grades became powerful reinforcers that played a large role in your life, but not the life of a Nukak child. Grades and money are examples of generalized reinforcers. They are paired with or exchangeable for a variety of other extrinsic and social reinforcers. Grades probably have been paired with praise and perhaps extrinsic rewards as you grew up. They also provide information (feedback) concerning how well you are mastering material. Your country’s economy is a gigantic example of the application of generalized reinforcement.
It was previously mentioned that low-performing elementary-school students who receive tangible rewards after correct answers score higher on IQ tests than students simply instructed to do their best (Edlund, 1972; Clingman & Fowler, 1976). High-performing students do not demonstrate this difference. These findings were related to Tolman and Honzik’s (1930) latent learning study described previously. Obviously, the low-performing students had the potential to perform better on the tests but were not sufficiently motivated by the instructions. Without steps taken to address this motivational difference, it is likely that these students will fall further and further behind and not have the same educational and career opportunities as those who are taught when they are young to always do their best in school.
Parents can play an enormous role in helping their children acquire the necessary attitudes and skills to succeed in and out of school. It is not necessary to provide extrinsic rewards for performance, although such procedures definitely work when administered appropriately. We can use language and reasoning to provide valuable lessons such as “You get out of life what you put into it” and “Anything worth doing is worth doing well.” Doing well in school, including earning good grades, is one application of these more generic guiding principles. In Chapter 8, we will review Kohlberg’s model of moral development and discuss the importance of language as a vehicle to provide reasons for desired behavior.
Discriminative Stimuli and Warning Stimuli
The applied Skinnerian operant conditioning literature (sometimes called Applied Behavior Analysis or ABA, not to be confused with the reversal design with the same acronym) often refers to the ABCs: antecedents, behaviors, and consequences. Adaptation usually requires not only learning what to do, but under what conditions (i.e., the antecedents) to do it. The very same behavior may have different consequences in different situations. For example, whereas your friends may pat you on the back and cheer as you jump up and down at a ball game, reactions will most likely be different if you behave in the same way at the library. A discriminative stimulus signals that a particular behavior will be reinforced (i.e., followed by an appetitive stimulus), whereas a warning stimulus signals that a particular behavior will be punished (followed by an aversive event). In the example above, the ballpark is a discriminative stimulus for jumping up and down whereas the library is a warning stimulus for the same behavior.
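The ABCs can be sketched as a lookup from an (antecedent, behavior) pair to its consequence. The table below encodes only the ball game and library example from the text. The names `CONSEQUENCES` and `antecedent_role`, and the fallback label "no learned signal", are hypothetical choices made for this sketch.

```python
# (antecedent, behavior) -> kind of consequence, taken from the example in the text.
CONSEQUENCES = {
    ("ball game", "jumping up and down"): "appetitive",  # friends cheer
    ("library", "jumping up and down"): "aversive",      # disapproving reactions
}


def antecedent_role(antecedent, behavior, consequences=CONSEQUENCES):
    """Say what the antecedent signals for this particular behavior."""
    outcome = consequences.get((antecedent, behavior))
    if outcome == "appetitive":
        return "discriminative stimulus"
    if outcome == "aversive":
        return "warning stimulus"
    # Hypothetical fallback for pairs with no known consequence.
    return "no learned signal"


at_game = antecedent_role("ball game", "jumping up and down")
at_library = antecedent_role("library", "jumping up and down")
```

Keying the table on the pair rather than on the behavior alone reflects the point that the same behavior can have different consequences in different situations.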
Stimulus-Response Chains
Note that the procedures that establish discriminative and warning stimuli are the same procedures that establish stimuli as conditioned reinforcers and punishers. Thus, the same stimulus may have more than one function. This is most apparent in a stimulus-response chain: a sequence of behaviors in which each response alters the environment, producing the discriminative stimulus for the next response.
Our daily routines consist of many stimulus-response chains. For example:
- Using the phone: sight of phone – pick up receiver; if dial tone – dial, if busy signal – hang up; ring – wait; sound of voice – respond.
- Driving a car: sight of seat – sit; sight of keyhole – insert key; feel of key in ignition – turn key; sound of engine – put car in gear; feel of engaged gear – put foot on accelerator.
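The driving example above can be sketched as a chain in which each response produces the stimulus for the next link. This is an illustrative sketch only: `DRIVING_CHAIN` copies the links from the list, while the function name `run_chain` and the assumption that each response reliably produces the next stimulus are introduced here.

```python
# Each link pairs a discriminative stimulus with the response it sets the occasion for.
DRIVING_CHAIN = [
    ("sight of seat", "sit"),
    ("sight of keyhole", "insert key"),
    ("feel of key in ignition", "turn key"),
    ("sound of engine", "put car in gear"),
    ("feel of engaged gear", "put foot on accelerator"),
]


def run_chain(chain, first_stimulus):
    """Follow the chain from a starting stimulus and return the responses performed."""
    respond_to = {stimulus: response for stimulus, response in chain}
    # Assumption: each response alters the environment so that the next link's
    # stimulus appears (e.g., turning the key produces the sound of the engine).
    produces = {chain[i][1]: chain[i + 1][0] for i in range(len(chain) - 1)}

    performed = []
    stimulus = first_stimulus
    while stimulus in respond_to:
        response = respond_to[stimulus]
        performed.append(response)
        stimulus = produces.get(response)  # None after the final response
    return performed


full_drive = run_chain(DRIVING_CHAIN, "sight of seat")
from_engine = run_chain(DRIVING_CHAIN, "sound of engine")
```

Starting partway through, as in `from_engine`, runs only the remaining links, while a stimulus that is not part of the chain produces no responses at all.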
Video
Watch the following video for a description and example of a short-term stimulus-response chain:
We can also describe larger units of behavior extending over longer time intervals as consisting of stimulus-response chains. For example:
- Graduating college: Studying this book – doing well on exam; doing well on all exams and assignments – getting good course grade; getting good grades in required and elective courses – graduating.
- Getting into college: preparing for kindergarten; passing kindergarten; passing 1st grade, etc.
- Life: getting fed; getting through school; getting a job; etc., etc.
Whew, that was fast! If only it were so simple!
Exercise
Attributions
Figure 5.6 “Puzzle box” by Jacob Sussman is in the Public Domain
Figure 5.7 “B. F. Skinner” by Silly rabbit is licensed under CC BY 3.0
Figure 5.8 “Skinner box” by U3144362 is licensed under CC BY-SA 4.0
Figure 5.9 “Operant contingencies” by Box73 is licensed under CC BY 3.0
Glossary
Control learning: acquiring the ability to change the environment (also referred to as instrumental or operant conditioning)
Positive reinforcement: consequence in which following a response by an appetitive stimulus results in an increase in the frequency of behavior
Positive punishment: consequence in which following a response by an aversive stimulus results in a decrease in the frequency of behavior
Negative reinforcement: consequence in which removing (escape) or preventing (avoidance) an aversive stimulus results in an increase in the frequency of behavior
Negative punishment: consequence in which removing (response cost) or preventing (time out) an appetitive stimulus results in a decrease in the frequency of behavior
an increase in the frequency of a behavior as the result of a consequence
Magazine training: pairing of the sound of the food delivery mechanism in a Skinner box with food; this enables food to be used as a reinforcer with the shaping procedure
Shaping: reinforcing successive approximations to a desired response
Prompting: use of a stimulus to increase the likelihood of a desired response
Unconditioned reinforcers and punishers: reinforcers and punishers that acquire their effectiveness through genetic mechanisms (e.g., food, water, painful stimuli, etc.)
Conditioned reinforcers and punishers: reinforcers and punishers that acquire their effectiveness through experience, either being paired with or exchangeable for other reinforcers or punishers
Generalized reinforcers: conditioned reinforcers paired with or exchangeable for a variety of other reinforcers (e.g., tokens and money)
Applied Behavior Analysis (ABA): learning-based approach to assessing and treating behavioral problems
ABCs: antecedents, behaviors, and consequences
Discriminative stimulus: stimulus that signals a particular behavior will be reinforced (i.e., followed by an appetitive stimulus)
Warning stimulus: stimulus that signals a behavior will be punished (followed by an aversive event)
Stimulus-response chain: a sequence of behaviors in which each response alters the environment producing the discriminative stimulus for the next response