Logan: Basic Principles of Learning

PRINCIPLE: INTERVAL SCHEDULES OF REINFORCEMENT

If reinforcement for an operant response is made available according to a temporal schedule, a steady rate of responding will occur if the interval between reinforcements is varied; responding will show a temporal discrimination if the interval is constant (fixed).

An operant response is defined as a bit of behavior that operates on or affects the environment in some way and that is freely available to the organism, at least for some period of time in a particular situation. A common experimental arrangement uses a hungry pigeon as the subject, with a few seconds' access to grain as the reinforcement. The pigeon is placed in an enclosed chamber (commonly called a Skinner box), on one wall of which is an illuminated key about the size of a half-dollar that the pigeon can peck. If every peck is reinforced, a continuous reinforcement (CRF) schedule is in force and the pigeon will peck every time the grain hopper is withdrawn in order to have it returned; response latency will gradually increase as the pigeon becomes satiated.

An interval schedule of reinforcement is in force if key-pecking is reinforced only at particular times. When these times occur more or less randomly, so that there is no way for the pigeon to know when the next peck will be reinforced, pecks are emitted at a reasonably constant rate over time. On such a variable interval (VI) schedule, response rate depends most importantly on the average interreinforcement interval (the density of reinforcement). When, however, reinforcements are made available at regular times (fixed interval, FI), the pigeon pauses after each reinforcement and then begins to peck at an accelerating rate as the time for the next scheduled reinforcement draws near.

The importance of operant schedules of reinforcement was first fully recognized by Skinner. Whereas earlier experimental studies of learning had relied almost exclusively on runways and mazes, Skinner showed that the fundamental Principle of Positive Reinforcement could be studied with an operant response that is freely available and, what is more, that the rate at which the response is emitted varies systematically with the reinforcement schedule. (It is important to note that in this context the response is emitted rather than elicited; there is no clearly identifiable stimulus that instigates each occurrence of the response.) In effect, however, Skinner changed the emphasis from the learning process itself to the steady-state performance of a learned response. That is to say, the initial acquisition of the operant response and the rate of adjustment to the reinforcement schedule are of secondary interest. The records shown in the figure reflect the performance of pigeons that have not only been shaped to peck the key but have also been exposed to the indicated reinforcement schedules for a very long time. In the process, Skinner disclaimed any interest in trying to "explain" the observed behavior. He correctly contended that we never really explain anything in an ultimate sense by the methods of science, and there is nothing to gain by speculating about why the organism does what it does when you only end up where you began.
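The interval contingency itself is simple enough to state as an algorithm: a timer starts at each reinforcement, and only the first response after the timer runs out is reinforced; responding faster does not make food come sooner. The sketch below, in Python, is illustrative rather than anything from Logan's text: the function names and the 600-second session are invented, and drawing the variable intervals from an exponential distribution is just one common way of making them unpredictable.

```python
import random

def fixed_interval(interval):
    """FI schedule: the next reinforcement becomes available a
    constant time after the previous one."""
    return lambda: interval

def variable_interval(mean_interval):
    """VI schedule: the delay is unpredictable (here exponential);
    only its mean, the density of reinforcement, is controlled."""
    return lambda: random.expovariate(1.0 / mean_interval)

def run_session(next_delay, response_times):
    """Return the response times that are reinforced.

    A response is reinforced only if it is the first one to occur
    after the scheduled delay has elapsed; all earlier responses go
    unreinforced, which is what makes this an interval schedule."""
    reinforced = []
    available_at = next_delay()              # when reinforcement is first set up
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + next_delay()  # start the next timer
    return reinforced

# A steady pecker: 200 pecks spread over a 600-second session.
pecks = [random.uniform(0, 600) for _ in range(200)]
print("FI 60:", len(run_session(fixed_interval(60), pecks)), "reinforcements")
print("VI 60:", len(run_session(variable_interval(60), pecks)), "reinforcements")
```

Because the schedule, not the response rate, determines when reinforcement becomes available, pecking faster affects only how promptly an available reinforcement is collected; this is why a steady, moderate rate is an efficient adjustment to VI, and why pausing after reinforcement costs the pigeon nothing on FI.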
You may think that you "understand" the FI scallop shown in the figure by reasoning that the pigeon has learned about how long it is between reinforcements, that responses get started early in anticipation of the next reinforcement, and that they then occur at an increasing pace as the pigeon gets more and more excited about the impending food. But you have not actually said anything that you did not already know from the data themselves. Skinner's approach is pure empiricism: We can predict behavior to the extent that we know the variables of which behavior is a function, and we can control behavior to the extent that we control those variables.

In this light, much of our everyday behavior can be seen as being controlled by operant schedules of reinforcement. For example, you can turn on your television set at any time of the day or night, and the reinforcement schedule will depend on what kinds of programs you find enjoyable. If there are many such programs, you are essentially on a VI schedule because something you find interesting is just as likely to be on at any time. If, however, you enjoy only some particular programs, you are on an FI schedule determined by the frequency with which they are televised. Most of us are on a complex combination of these two extremes.

TERMS: Empiricism, operant level, operant response, positive reinforcement, schedule of reinforcement (continuous, fixed interval, variable interval), shaping.