Logan: Basic Principles of Learning

PRINCIPLE: RATIO SCHEDULES OF REINFORCEMENT

If an operant response must be repeated some number of times before reinforcement is obtained, the requisite number of responses will be emitted in a rapid burst, with a post-reinforcement pause if the response ratio is constant (fixed).

The operant/instrumental response that we observe and record is typically only one component of a series of responses comprising a behavior chain. A rat running in a maze takes a number of steps and makes a number of turns en route to the goal box. Similarly, a rat in a Skinner box must approach the bar, rise up on his haunches, place his paws on the bar, press downward, release the bar, and turn toward the reward dispenser. It is not always easy to determine when such a behavior chain begins and when it ends, but it is clear that each link of the chain depends importantly on the feedback cues from the preceding response in the chain. That is to say, as each link of a BEHAVIOR CHAIN is performed, the response-produced feedback provides stimuli that set the occasion for the emission of the next response in the chain.

When the responses comprising a behavior chain are topographically different, we refer to a heterogeneous behavior chain. Baking a cake from scratch, assembling a bicycle, and even performing a dance step are examples of heterogeneous chains that require following directions. In many cases, however, the same response must be repeated some number of times before reinforcement is received. These are called homogeneous behavior chains; pounding a nail with a hammer, whipping cream, and even engaging in sexual activity are commonplace examples. In the laboratory, a rat might be reinforced only after he has pressed the bar several times. The number of responses required to obtain reinforcement is called the RESPONSE RATIO. If the same number of responses is required each time to obtain reinforcement, it is a fixed ratio (FR).
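The fixed-ratio contingency can be sketched as a simple counter: every response increments the count, and reinforcement is delivered whenever the count reaches the ratio requirement, at which point the counter resets. The following is an illustrative sketch only; the function name and parameters are mine, not the author's:

```python
def run_fixed_ratio(ratio, n_responses):
    """Simulate a fixed-ratio (FR) schedule: reinforcement is
    delivered after every `ratio`-th response."""
    count = 0
    reinforcers = 0
    for _ in range(n_responses):
        count += 1                # one bar press
        if count == ratio:        # ratio requirement met
            reinforcers += 1      # reinforcement delivered
            count = 0             # counter resets; the post-reinforcement
                                  # pause would occur at this point
    return reinforcers

# On an FR-5 schedule, 100 responses earn exactly 20 reinforcers.
print(run_fixed_ratio(5, 100))  # -> 20
```

Note that the schedule itself never withholds reinforcement; how quickly reinforcers accumulate depends entirely on how fast the organism emits responses, which is why ratio schedules favor high response rates.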
If the number of responses varies irregularly from time to time, it is a variable ratio (VR). In either case, reinforcement is always available; it is simply a matter of how rapidly the organism can emit the ratio requirement in order to obtain reinforcement.

Ratio schedules of reinforcement generate high response rates. With a fixed ratio, the organism will likely pause after each reinforcement before taking off on the next run through the ratio, and the longer the ratio, the longer the pause. When the ratio is varied so that the next reinforcement is as likely to require only one or a few responses as it is to require many, there is little pausing.

Fine-grain analysis of performance on a fixed-ratio homogeneous behavior chain often reveals a GOAL GRADIENT, an increase in speed or rate of responding as the goal is approached. This has been attributed to the fact that the delay before reinforcement is received grows shorter the closer one gets to completing the ratio. In actual fact, there is also a slight slowing down right near the end of the ratio, presumably because the organism must stop after the ratio is complete, and stopping becomes anticipatory.

There is, of course, a limit to how long a ratio can be and still maintain behavior, but that limit is remarkably long. Indeed, animals in the laboratory have been induced to perform on ratio schedules of reinforcement that do not provide sufficient nourishment to replenish what was used in emitting the ratio. This fact, combined with the high response rates these schedules generate, makes ratio schedules (especially variable ratio schedules) potentially diabolical ones. A familiar instance of a VR schedule is the slot machine . . . Otherwise intelligent people can be induced to pay for the opportunity to pull on a lever that is reinforced on a sparse variable-ratio schedule.

TERMS: Behavior chain, feedback, goal gradient, schedule of reinforcement (fixed ratio, variable ratio).
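A variable-ratio contingency can be sketched the same way, except that after each reinforcement a new requirement is drawn irregularly from a pool of ratios whose mean gives the schedule its name (e.g., VR-5). This is a minimal sketch under my own assumptions about the ratio pool; the names and values are illustrative, not from the text:

```python
import random

def run_variable_ratio(ratios, n_responses, seed=0):
    """Simulate a variable-ratio (VR) schedule: the ratio requirement
    is drawn irregularly from `ratios`; their mean names the schedule."""
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    requirement = rng.choice(ratios)   # first run may be short or long
    count = 0
    reinforcers = 0
    for _ in range(n_responses):
        count += 1
        if count >= requirement:       # this ratio run is complete
            reinforcers += 1
            count = 0
            requirement = rng.choice(ratios)  # next run is unpredictable
    return reinforcers

# A VR-5 schedule built from ratios averaging 5: over many responses,
# reinforcers per response approaches 1/5, but because any single run
# may require only one response, there is little post-reinforcement pausing.
print(run_variable_ratio([1, 3, 5, 7, 9], 1000))
```

The unpredictability of the next requirement is the point: since the very next response might complete the ratio, pausing is never "safe," which is the same contingency the slot machine exploits.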