Logan: Basic Principles of Learning

PRINCIPLE: INTERVAL SCHEDULES OF REINFORCEMENT

If reinforcement for an operant response is made available according to a temporal schedule, a steady rate of responding will occur if the interval between reinforcements is varied; responding will show a temporal discrimination if the interval is constant (fixed).

An operant response is defined as a bit of behavior that operates on or affects the environment in some way and that is freely available to the organism, at least for some period of time in a particular situation. A common experimental arrangement uses a hungry pigeon as the subject, with a few seconds' access to grain as the reinforcement. The pigeon is placed in an enclosed chamber (commonly called a Skinner box), on one wall of which is an illuminated key about the size of a half-dollar that the pigeon can peck. If every peck is reinforced, a continuous reinforcement (CRF) schedule is in force and the pigeon will peck every time the grain hopper is withdrawn in order to have it returned; response latency will gradually increase as the pigeon becomes satiated.

An interval schedule of reinforcement is in force if key-pecking is reinforced only at particular times. When these times occur more or less randomly, so that there is no way for the pigeon to know when the next peck will be reinforced, pecks are emitted at a reasonably constant rate over time. On such a variable interval (VI) schedule, response rate depends most importantly on the average interreinforcement interval (the density of reinforcement). When, however, reinforcements are made available at regular times (fixed interval, FI), the pigeon pauses after each reinforcement and then begins to peck at an accelerating rate as the time for the next scheduled reinforcement draws near.

The importance of operant schedules of reinforcement was first fully recognized by Skinner. Whereas earlier experimental studies of learning had relied almost exclusively on runways and mazes, Skinner showed that the fundamental Principle of Positive Reinforcement could be studied with an operant response that is freely available and, what is more, that the rate at which the response is emitted varies systematically with the reinforcement schedule. (It is important to note that in this context the response is emitted rather than elicited; there is no clearly identifiable stimulus that instigates each occurrence of the response.) In effect, however, Skinner changed the emphasis from the learning process itself to the steady-state performance of a learned response. That is to say, the initial acquisition of the operant response and the rate of adjustment to the reinforcement schedule are of secondary interest. The records shown in the figure reflect the performance of pigeons that have not only been shaped to peck the key but have also been exposed to the indicated reinforcement schedules for a very long time. In the process, Skinner disclaimed any interest in trying to "explain" the observed behavior. He correctly contended that we never really explain anything in an ultimate sense by the methods of science, and there is nothing to gain by speculating about why the organism does what it does when you only end up where you began.
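The interval contingency itself is simple enough to state as an algorithm: a timer starts at each reinforcement, and only the first response after the timer runs out is reinforced; responding faster does not make food come sooner. The sketch below, in Python, is illustrative rather than anything from Logan's text: the function names and the 600-second session are invented, and drawing the variable intervals from an exponential distribution is just one common way of making them unpredictable.

```python
import random

def fixed_interval(interval):
    """FI schedule: the next reinforcement becomes available a
    constant time after the previous one."""
    return lambda: interval

def variable_interval(mean_interval):
    """VI schedule: the delay is unpredictable (here exponential);
    only its mean, the density of reinforcement, is controlled."""
    return lambda: random.expovariate(1.0 / mean_interval)

def run_session(next_delay, response_times):
    """Return the response times that are reinforced.

    A response is reinforced only if it is the first one to occur
    after the scheduled delay has elapsed; all earlier responses go
    unreinforced, which is what makes this an interval schedule."""
    reinforced = []
    available_at = next_delay()              # when reinforcement is first set up
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + next_delay()  # start the next timer
    return reinforced

# A steady pecker: 200 pecks spread over a 600-second session.
pecks = [random.uniform(0, 600) for _ in range(200)]
print("FI 60:", len(run_session(fixed_interval(60), pecks)), "reinforcements")
print("VI 60:", len(run_session(variable_interval(60), pecks)), "reinforcements")
```

Because the schedule, not the response rate, determines when reinforcement becomes available, pecking faster affects only how promptly an available reinforcement is collected; this is why a steady, moderate rate is an efficient adjustment to VI, and why pausing after reinforcement costs the pigeon nothing on FI.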
You may think that you "understand" the FI scallop shown in the figure by reasoning that the pigeon has learned about how long it is between reinforcements, that responses get started early in anticipation of the next reinforcement, and that they then occur at an increasing pace as the pigeon gets more and more excited about the impending food. But you have not actually said anything that you did not already know from the data themselves. Skinner's approach is pure empiricism: We can predict behavior to the extent that we know the variables of which behavior is a function, and we can control behavior to the extent that we control those variables.

In this light, much of our everyday behavior can be seen as being controlled by operant schedules of reinforcement. For example, you can turn on your television set at any time of the day or night, and the reinforcement schedule will depend on what kinds of programs you find enjoyable. If there are many such programs, you are essentially on a VI schedule because something you find interesting is just as likely to be on at any time. If, however, you enjoy only some particular programs, you are on an FI schedule determined by the frequency with which they are televised. Most of us are on a complex combination of these two extremes.

TERMS: Empiricism, operant level, operant response, positive reinforcement, schedule of reinforcement (continuous, fixed interval, variable interval), shaping.