\" cat head file_name|tbl|xeqn|xroff -ms .bp .EH '\fBScientific Computation''Computers\fR %' .OH '\fBScientific Computation''Computers\fR %' .sp 5 .ps 14 .ce .B 1. Computers .R .NL .sp 3 .NH Introduction .PP In this chapter we will explore what computers are and what they can do, on two extreme levels, the very general and the quite specific in terms of the simplest practical devices. The discussion in terms of prototype computers serves to introduce a variety of concepts in computing relevant to more sophisticated language and programming concepts to be developed in the following chapters. .PP Understanding how humans may communicate with computers first requires understanding how computers "think" (or work). Much of this chapter is therefore devoted to the ways computers handle information. .SH Models .PP There are some who believe that it is possible to create truly isolated systems. Those who work with abstract concepts, such as mathematicians, tend to fall into this group. The truth is, nothing is totally independent. Even in mathematics a look into history shows that major mathematical structures developed out of practical considerations; the origins of arithmetic are lost in the dawn of economic trade, geometry developed as a language to describe the shapes of objects, algebra developed as a symbolic representation of geometry, calculus has been traced to attempts to settle land boundary disputes among the farmers along the ever-changing Nile River. Differential equations and tensor calculus were invented by Newton and Einstein, respectively, to describe theoretical models of mechanical systems. Euclidean vs fractal geometry. .PP We will be interested in \fBrepresentations\fR of actual or possible physical systems, called \fBmodels\fR. Mathematics is the language of choice of scientists to describe models because it is concise, precise and manipulatable. The problem is that only the simplest models lend themselves to explicit mathematical analysis. More realistic models many times become "intractable" mathematically, due to their inherent complexity (e.g. high dimensionality), necessitating some degree of approximation in their investigation. We will not be interested so much in the origin or development of models as in their exploration or consequences. Once a model is defined, it becomes necessary to explore its confirmatory and predictive power. In this sense, computers may be thought of anthropomorphically as mental prosthetic devices which can enhance human mentality. .NH Types of Computers .PP Not only is there a great variety of computers, but we will see there are several ways to view a given computer, and even differing concepts of what computers really are. .PP One definition of a \fBcomputer\fR is \fIany system that responds to its environment\fR. According to this very broad definition, computers range from subatomic particles to the universe itself. In between are humans and other life forms and manufactured machines. We will focus in on man-made physical devices (\fBhardware\fR) which perform tasks according to deterministic prescriptions which may be encoded in fixed or variable (\fBsoftware\fR) hardware components. .SH Virtual Computers .PP Functionally, computers process information (\fBdata\fR) received as \fBinput\fR to produce more information as \fBoutput\fR. From this view, \fIcomputers communicate, store and process information\fR. Information may be categorized into two general classes, \fBdata\fR and \fBinstructions\fR. 
.PP
In addition to the "hardware" perspective of computers as machines or
mechanisms, there is an internal view and an external view of computers.
The internal view is how the computer would describe itself (if it could).
The "consciousness" of a computer could range from mere recognition of an
external environment, to a biological psyche, to the software operating
system of the electronic programmable computer.
\fBOperating systems\fR are the automated managers of the activity of
computers and are the subject of the next chapter.
The external view is an observer's description of the computer, and admits
a similar range of views, depending on the system and the observer.
The external observer's description may be illusory, in which case the
system is said to be a \fBvirtual\fR system.
The traditional external view of computers is that of the \fBuser\fR
(usually a human) communicating through some transducer \fBdevice\fR to a
\fBmachine\fR, usually an electronic computer.
Users don't ordinarily see what goes on "inside" the machine, but just as
with watches and personalities, there can be an advantage in understanding
the internal workings of the system to avoid misunderstanding.
.SH
Digital Electronic Computers
.PP
\fBAnalogue\fR computers work in continuous space (described by real, or
fractional, numbers), while \fBdigital\fR computers operate in discrete
space (described by integer, or whole, numbers).
Examples of analogue computers include slide rules (with logarithmic
scales for table look-up, and multiplication corresponding to addition of
logs), speedometers, and airport wind socks.
Electrons are the working medium of \fBelectronic computers\fR, and their
storage, retrieval and directed movement comprise basic computer
operations.
Digital computers ultimately represent \fBinformation\fR in terms of two
electronic states, "on" or "off", through current or magnetization in one
direction or another, called \fBbinary\fR states.
Examples of digital computers include "pocket" calculators and PCs
(Personal Computers), as well as supercomputers.
The information is understood through \fBencoding\fR conventions (see the
section on Coding, below).
.PP
The essential hardware components of man-made digital computers store
information in physical \fBmemory\fR, communicate it through
\fBinput/output (I/O) devices\fR and manipulate it through
\fBprocessors\fR.
There are several categories of memory hardware, depending on the physical
\fImedium\fR, such as integrated circuit (IC) chip, magnetic tape,
"compact" disk (CD), "floppy" (flexible) and "hard" (inflexible) disk;
physical \fIlocation\fR, such as internal (core) or external to the main
housing;
\fIcapability\fR, such as read/write programmable (changeable), ROM
("read-only"), and non-permanent (volatile);
and \fIfunction\fR, such as supportive of processing (cache/register),
throughput (buffer), and storage.
External I/O devices include keyboards and video displays ("terminals"),
printers, telephone modems, and network transceivers.
Information is manipulated by a variety of processor modules, or "units",
including the central processing unit (\fBCPU\fR), the arithmetic/logic
unit (\fBALU\fR), the memory management unit (\fBMMU\fR), and coprocessors
and peripheral processing units (\fBPPU\fRs).
.SH
Computers as Tools
.PP
Another perception of computers is as tools to solve problems.
In terms of the structure of traditional digital computers, they may be
thought of as systems capable of answering questions put to them.
The communications can be expressed in a variety of languages, including
mathematical, procedural (programming) and "natural" (human speech)
languages.
.SH
Computer Architectures
.PP
The type and arrangement of components (both hardware and software) in a
computer is referred to as the computer's \fBarchitecture\fR.
What an external user communicates with usually has a number of layers of
virtual computers beneath the surface, each with its own architecture,
eventually arriving at the hardware, or "bare metal".
.PP
Digital electronic computers with a single CPU are \fBsequential\fR
machines.
CPUs, which contain millions of microelectronic components (to encode and
decode instructions, move and operate on data, and maintain program
control), are much more complicated than memory chips.
Traditional architectures therefore evolved from one CPU, with possibly a
specialized floating-point (real number) coprocessor and a number of
device and memory managing PPUs, together with much memory; to a small
number of centrally managed CPUs; to larger numbers (array processors,
hypercube topologies); to massively parallel architectures of many CPUs,
each with a small associated memory.
There has been an accompanying evolution from \fBsequential\fR (linear or
one-dimensional scalar) programming to \fBvector\fR (multicomponent) to
\fBparallel\fR (multidimensional) program design.
.PP
Limitations of space and time are important considerations for digital
computers.
Computers have finite space (memory), which limits the number of possible
data and instruction states.
Since it requires more than one parameter to describe the state of any
physical system (e.g. the position of a particle in space), no computer is
capable of describing even its own state entirely.
On a more practical level, if a finite-state computer ever returns to a
previous state, it will repeat the following states in cyclic fashion.
Consequently, finite word sizes imply that truly random values cannot be
generated indefinitely.
.SH
How Simple Can it Be?
.PP
What is the simplest computer we can think of?
Imagine a machine whose only components are a simple (but of infinite
capacity) external memory device, such as a magnetic tape with places to
store values from a finite alphabet of symbols; a device ("head") which
can move only one position at a time along the tape and can read or write
(replace) a single symbol; and some mechanism to recognize a finite set of
instructions and decide, given what is written on the tape at the current
position, what to rewrite on the tape at that position and which direction
to move.
The current instruction and the contents of the tape determine the "state"
of the machine, and the instructions themselves are called "transitions"
because they cause changes in the state of the machine.
Initially the tape head is positioned at one end of the "input" string
stored on the tape.
The machine reads the symbol at that point and begins to take action based
on the symbol and the first instruction.
Reading, writing and moving continue along the tape until a "halt"
instruction is encountered in the instruction set.
At that point the contents of the tape contain the results, or "output",
of the process.
.PP
This simple device is named for its inventor, the mathematician Alan
Turing, who discussed its virtues in 1936 (before any digital computers
were invented).
Can such a simple machine compute anything at all?
Can it compute anything of value?
Directly it cannot even add two numbers without being programmed.
It is an \fBabstract\fR (virtual) machine, or automaton.\**
.FS
Early computing devices were called \fBautomata\fR after a 19th Century
"automatic" chess playing box with a chess set on top, which actually
concealed a human inside.
.FE
However, not only is a Turing machine able to perform useful tasks, it is
claimed that \fIa Turing Machine is capable of performing any conceivable
computable task!\fR
Such a machine is termed a \fBuniversal computing machine\fR.
Of course, complex processes might be enormously tedious to program on
such a simple device (which is why we use more complicated computers and
higher-level languages), but it is encouraging to realize that the most
sophisticated supercomputer is no more powerful than the humblest home
computer (although it is much more efficient).
In addition, all programming languages are equivalent in power to solve
problems, the only substantial differences being convenience and
efficiency.
And, of course, the (finite) alphabet whose symbols are used to encode
information can be as simple as the binary alphabet, with no loss in
generality.
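.PP
To make this concrete, the following minimal sketch simulates a Turing
machine in the Python programming language (an illustrative choice; the
particular transition set shown, which complements every bit of its input
and halts on the blank symbol "_", is our own, not a standard example):
.DS L
# A minimal Turing machine simulation.  The transition table maps
# (state, symbol) -> (symbol to write, head movement, next state).
def run_turing_machine(tape, transitions, state="start"):
    tape, head = list(tape), 0
    while state != "halt":
        symbol = tape[head]
        new_symbol, move, state = transitions[(state, symbol)]
        tape[head] = new_symbol      # write (replace) a single symbol
        head += move                 # move one position along the tape
    return "".join(tape)

# Complement each bit, moving right; halt on the blank symbol.
flip = {("start", "0"): ("1", +1, "start"),
        ("start", "1"): ("0", +1, "start"),
        ("start", "_"): ("_", 0, "halt")}

print(run_turing_machine("1100101_", flip))   # prints 0011010_
.DE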
.PP
Now consider another simple machine.
This one consists of a string of symbols, possibly binary, with only one
simple rule for changing the symbols.
The state of the machine is determined solely by the symbols and the rule.
At each position in the string, the new state of that position is
determined only by its current state and some combination of the states of
the nearby surrounding positions.
Such a machine is called a \fBcellular automaton\fR, and was the subject
of study by the "father of computing", the mathematician John von Neumann.
von Neumann was investigating the intriguing question whether a machine
could be constructed that could make an exact copy of itself, a mechanical
system analogous to reproduction in biology.
It has been shown that cellular automata are also universal computing
machines, equivalent to Turing Machines.
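.PP
Again a few lines of Python suffice to sketch the idea (the particular
updating rule below, known as rule 90, is an illustrative choice; each
cell's new state depends only on itself and its two nearest neighbours):
.DS L
# One step of an elementary (one-dimensional, binary) cellular automaton.
def step(cells, rule=90):
    n, out = len(cells), []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * center + right   # neighbourhood pattern, 0..7
        out.append((rule >> index) & 1)         # look up the rule's bit
    return out

cells = [0] * 15
cells[7] = 1                                    # one "on" cell in the middle
for _ in range(8):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
.DE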
.PP
The Turing Machine (TM) and the Cellular Automaton (CA) are examples of
\fBfinite state machines\fR, and are prototypes of linear and parallel
computer architectures.
They are complementary in design, in that the one (TM) has few states and
many rules in the instruction set, while the other has many states and few
rules.
Sometimes computers which are designed according to the TM concept are
referred to as "von Neumann architectures", and cellular automata as
"non-von Neumann architectures", but von Neumann actually worked on the
development of both.
.NH
Information Representation
.SH
Language
.PP
Information is stored, communicated and processed in symbolic or
\fBcoded\fR form, such as speech, text or electronic signals.
The arrangement of the symbols (grammar) and their meaning (semantics),
which are agreed to by convention or definition, is called \fBlanguage\fR.
.PP
Written (or displayed) information is constructed from the simplest
symbols, called the \fBalphabet\fR of the language.
Two types of coding between humans and computers use the English (or some
other "natural", or human, language) alphabet, consisting of 52 symbols
{a-z,A-Z}, and the number alphabet, consisting of digits, {0-9} in decimal
systems.
Punctuation conveys additional contextual meaning; the "dot" can mean end
of sentence (period) or beginning of fraction (decimal point).
(Assigning multiple interpretations to the same symbol is called
\fBoverloading\fR.)
.PP
Combining symbols into sequences, or strings, called \fBwords\fR generates
variety and extends the capacity of the alphabet symbols to convey
information.
For example a, ad, add, ... zebra each have a defined interpretation in
English (stored in dictionaries), and 1, 12, 123, ... have well-defined
interpretations in mathematics (stored in a formula, see below).
.SH
Bits and Words
.PP
At the lowest levels, information in digital computers is coded in an
alphabet of binary digits, or \fBbits\fR (represented with the numerals 0
and 1), and ordered collections, or \fBstrings\fR, of bits called
\fBwords\fR, usually of a fixed number (length).
It is like an alien culture that has only one digit on each of two hands.
However, just as a person who types on a typewriter (or keyboard) with two
fingers can type anything a typist who uses all ten digits can, a binary
system can represent any information any other number system can.
Our task is to understand how the two-digit aliens (digital computers)
think and how to communicate with them.
.PP
As with the two-finger typist, there is a trade-off between efficiency and
flexibility.
A binary digit can represent only two different patterns, while a letter
can represent 26 different possibilities; it therefore takes more bits to
represent the same information than it would take using a richer alphabet.
.PP
IBM introduced the 8-bit \fBbyte\fR in the 1960s, with the result that
words are commonly defined in multiples of 8-bit bytes, such as 16, 24 and
32 bits.
.PP
As we shall see, the contents of a word, consisting of a sequence of bits,
such as 1100101, can have several different interpretations depending on
the context, including the binary number equivalent to 101 in decimal
notation, the alphabetic character e, the decimal integer \-4123, the real
number $6.02~x~10 sup 23$, or even an instruction, such as ADD two
numbers.
.SH
Information
.PP
The term "bit" connotes a "bit of information", formalized in the
definition of information content from the field of information theory:
.EQ
I~=~lg (n)
.EN
where I is the information content, lg denotes the base-2 logarithm, and n
is the number of states, or possible patterns, in a string of bits, or
word.
Thus one-bit words have 2 states and information content $lg(2)~=~1$.
They can store the answer to a single question with possible answers of
"yes" or "no", "true" or "false", "right" or "wrong", "heads" or "tails",
"on" or "off", "0" or "1", etc.
Note that the logarithm function converts a product argument into a sum,
so that information is additive for combinations of multiple independent
states.
2-bit words have $2 sup 2~=~4$ states and information content
$lg ( 2 sup 2 )~=~2$, and thus can store the answers to 2 simultaneous
yes/no questions, while 8-bit words have $2 sup 8~=~256$ states and
information content $lg ( 2 sup 8 )~=~8$, and can encode the possible
sequence of answers to 8 simultaneous yes/no questions.
A connection between information theory and probability theory may be made
by assigning a probability $p sub i$ to each pattern or state i of a word.
The information gained by knowing that (any) one state has been achieved
is
.EQ (1.1)
DELTA I~=~- sum from {i=1} to n {p sub i lg ( p sub i )}
.EN
where n is the total number of possible patterns.
Information theory provides a foundation for the field of statistical
thermodynamics, which assumes equal a priori probabilities of states and
identifies entropy with information and thermal equilibrium with maximum
entropy.
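.PP
These definitions are easily checked numerically; here is a short Python
sketch (the function name is ours) evaluating the information content of
equal and unequal probability distributions:
.DS L
from math import log2

# I = lg(n) for words of 1, 2 and 8 bits.
for bits in (1, 2, 8):
    states = 2 ** bits
    print(bits, "bits:", states, "states,", log2(states), "of information")

# Equation (1.1): average information gained over a distribution p_i.
def information(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(information([0.5, 0.5]))     # 1.0  bit:  a fair coin
print(information([0.25] * 4))     # 2.0  bits: four equally likely states
print(information([0.9, 0.1]))     # 0.47 bits: a biased coin tells us less
.DE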
.SH
Natural Numbers and Bases
.PP
Humans naturally have 10 fingers ("digits"), which led to a natural number
"alphabet" of 10 symbols {0,1,...9}, or the \fBdecimal system\fR.
The number of different digit symbols in a number "alphabet" is called the
\fBbase\fR of the number system.
Decimal numbers are base-10.
.PP
To represent numbers larger than the largest single digit, written
language uses strings of digits with the "positional convention", meaning
that a digit at a given position, or place, in the string multiplies a
certain power of the base.
The power of the base starts at 0 at the extreme right and increases by
one for each shift in position to the left.
For example, in base-10, or decimal, notation,
.EQ
123 sub 10~=~(3~x~10 sup 0~+~2~x~10 sup 1~+~1~x~10 sup 2 ) sub 10~=~(3~x~1~+~2~x~10~+~1~x~100) sub 10~=~(3~+~20~+~100) sub 10
.EN
where the subscript 10 reminds us of the base.
Note that these numbers are written (recorded) and read (scanned) from
left to right (by our convention), but the positional interpretation
decodes the number from right to left.
The right-most digits of a number are called the \fBleast significant
digits\fR and those on the left are the \fBmost significant digits\fR.
Practice permits rapid human scanning of smaller numbers, but larger
numbers are not so easy to interpret.
This is why they are sometimes separated into groups of three digits using
commas.
Fractional numbers can be represented by separating the fractional part
from the "whole" number part with a dot ("decimal" point) and continuing
with negative powers of the base,
.EQ
123.456 sub 10~=~( 3~x~10 sup 0~+~2~x~10 sup 1~+~1~x~10 sup 2~+~4~x~10 sup -1~+~5~x~10 sup -2~+~6~x~10 sup -3 ) sub 10
.EN
.EQ
=~(3~+~20~+~100~+~.4~+~.05~+~.006) sub 10
.EN
.PP
If humans naturally had \fIeight\fR fingers on each of two hands, they
might have naturally developed the base-16, or \fBhexadecimal\fR, system.
We can use the first 10 decimal symbols for the first ten digits in the
base-16 system, but need six more to complete the hexadecimal alphabet.
The first six letters of the writing alphabet are commonly used to
represent the remaining numeric digits in base-16, so the entire alphabet
consists of {0-9,A-F}.
The number $123 sub 16$ has a different decimal value from $123 sub 10$:
.EQ
123 sub 16~=~(3~x~16 sup 0~+~2~x~16 sup 1~+~1~x~16 sup 2 ) sub 10~=~(3~x~1~+~2~x~16~+~1~x~256) sub 10~=~(3~+~32~+~256) sub 10~=~291 sub 10
.EN
Note that a procedure for converting a hexadecimal number into a decimal
number has been demonstrated here: convert each number in the expansion to
decimal and complete the arithmetic operations in decimal.
.PP
Digital computers ultimately use \fBbinary\fR numbers from the set {0,1},
similar to an alien culture that has only one digit on each of two hands.
Note that the aliens can count beyond 1 (or $1 sub 2$) the same way we
count beyond 9, namely using the positional convention.
It would be incorrect to write $123 sub 2$ because there are no 2 or 3
symbols in the binary alphabet.
For the same reason, 12F would be a legitimate hexadecimal number, but an
invalid decimal number.
Here is an example of a binary number and its decimal equivalent:
.EQ
101 sub 2~=~(1~x~2 sup 0~+~0~x~2 sup 1~+~1~x~2 sup 2 ) sub 10~=~(1~x~1~+~0~x~2~+~1~x~4) sub 10~=~(1~+~0~+~4) sub 10~=~5 sub 10
.EN
.PP
By way of summary, the left-to-right positional convention notation of a
sequence (string) of n+m digits, $d sub i ,~i~=~n-1,~...,~-m$, in an
arbitrary base b means
.EQ (1.2)
( d sub n-1 d sub n-2 ... d sub 0 . d sub -1 d sub -2 ... d sub -m ) sub b~=~sum from {i=-m} to {n-1} {d sub i b sup i}
.EN
.SH
Base Conversion
.PP
Conversions from a non-decimal base to a decimal base are straightforward.
The reverse conversion, from decimal to non-decimal, requires more effort,
as the factor of each power of the non-decimal base must be determined.
A general way to convert between bases expands the number first in the
given base, then converts each value to the new base, and finally carries
out the arithmetic operations of the expansion to construct the positional
convention representation of the number in the new base.
For example,
.EQ
123 sub 10~=~(3~x~10 sup 0~+~2~x~10 sup 1~+~1~x~10 sup 2 ) sub 10
.EN
.EQ
=~(11~x~1010 sup 0~+~10~x~1010 sup 1~+~1~x~1010 sup 10 ) sub 2~=~(11~x~1~+~10~x~1010~+~1~x~1100100) sub 2
.EN
.EQ
=~(11~+~10100~+~1100100) sub 2~=~1111011 sub 2
.EN
.EQ
=~(3~x~A sup 0~+~2~x~A sup 1~+~1~x~A sup 2 ) sub 16~=~(3~x~1~+~2~x~A~+~1~x~64) sub 16
.EN
.EQ
=~(3~+~14~+~64) sub 16~=~7B sub 16
.EN
Note that $10 sub 10$ becomes $1010 sub 2$ (in base 2) or $A sub 16$ (in
base sixteen), etc.
A more efficient computational method finds the largest power of the base
that will divide the decimal number, identifies the whole-number quotient
with the digit of that power, and repeats the procedure with the
remainder, until the remainder is less in value than the base.
.PP
As a final example, here are the binary, hexadecimal and decimal
representations of the maximum value an 8-bit binary number can have,
which is also the maximum value of a 2-digit hexadecimal number:
.EQ
11111111 sub 2~=~255 sub 10~=~FF sub 16
.EN
Note that the larger the base, the fewer the digits required to represent
the same value.
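.PP
The conversion procedures are easily mechanized; the following Python
sketch (function names ours) decodes a digit string in any base up to 16
according to equation (1.2), and performs the reverse conversion by the
repeated-division variant of the method just described:
.DS L
DIGITS = "0123456789ABCDEF"

def to_decimal(s, base):
    """Decode a positional digit string in the given base."""
    value = 0
    for d in s:                    # most significant digit first
        value = value * base + DIGITS.index(d)
    return value

def from_decimal(n, base):
    """Encode a non-negative decimal integer by repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)     # remainders give the digits,
        digits.append(DIGITS[r])   # least significant first
    return "".join(reversed(digits))

print(to_decimal("123", 16))       # 291
print(from_decimal(123, 2))        # 1111011
print(from_decimal(255, 16))       # FF
.DE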
.SH
Integer Arithmetic
.PP
Digital computers ordinarily support integer arithmetic operations in the
hardware of the CPU.
Binary arithmetic uses the same procedures as decimal arithmetic, with
"carries" for addition, multiple additions or subtractions for
multiplication and division, etc.
(The rules of arithmetic are independent of the base.)
For example, we add 2 to 5 and multiply 2 by 5 in "longhand" with pencil
and paper (using the binary number representation for $5 sub 10$ above) as
follows:
.TS
center tab(|);
r l l.
$101 sub 2$|$=~(1~x~2 sup 0~+~0~x~2 sup 1~+~1~x~2 sup 2 )$|$=~5 sub 10$
$+010 sub 2$|$=~(0~x~2 sup 0~+~1~x~2 sup 1~+~0~x~2 sup 2 )$|$=~2 sub 10$
_
$111 sub 2$|$=~(1~x~2 sup 0~+~1~x~2 sup 1~+~1~x~2 sup 2 )$|$=~7 sub 10$
.TE
.EQ
2 sub 10~x~5 sub 10~=~010 sub 2~x~101 sub 2~=~(0~x~2 sup 0~+~1~x~2 sup 1~+~0~x~2 sup 2 )~x~(1~x~2 sup 0~+~0~x~2 sup 1~+~1~x~2 sup 2 ) sub 2
.EN
.EQ
=~( 1~x~2 sup 1 ) sub 2~x~(1~x~2 sup 0~+~1~x~2 sup 2 ) sub 2~=~( 1~x~2 sup 1~+~1~x~2 sup 3 ) sub 2
.EN
.EQ
=~1010 sub 2~=~( 2~+~8 ) sub 10 ~=~10 sub 10
.EN
.PP
Note that multiplication causes the word size of the result to grow beyond
that of the operands.
Also notice that multiplication by 2 in base 2 can be performed simply by
\fIshifting\fR all the binary digits of the word one place to the
\fIleft\fR (just as multiplication by 10 in decimal shifts left by one
place).
Similarly, division by two in binary can be performed with a right shift.
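.PP
The correspondence between shifting and multiplication or division by the
base is easy to verify in Python, which writes binary literals with a 0b
prefix:
.DS L
a, b = 0b101, 0b010      # 5 and 2, written in binary
print(bin(a + b))        # 0b111  : 5 + 2 = 7
print(bin(a * b))        # 0b1010 : 5 x 2 = 10
print(bin(a << 1))       # 0b1010 : a left shift multiplies by 2
print(bin(a >> 1))       # 0b10   : a right shift divides by 2 (quotient)
.DE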
.SH
Integer Number Coding Schemes
.PP
Although numbers are ultimately represented in the natural language of
digital computers, namely binary, there are a number of alternative
representations, each with certain advantages and disadvantages.
.PP
The number of different integers that can be represented in a computer
with a limited or fixed word length architecture is limited by the size of
the word.
A k-bit word can store $2 sup k$ different values.
Thus a 16-bit word can store a maximum of $2 sup 16~=~65,536$ different
integers.
Which \fIrange\fR of integers the 65,536 different binary codes represent
(e.g. 1 to 65,536, 0 to 65,535, \-32,768 to +32,767, etc) is ultimately
defined by the internal architecture.
.PP
To illustrate alternative binary code representations of integers, we will
consider k = 3-bit words, having $2 sup 3~=~8$ unique values.
Table 1.1 below shows the eight different bit patterns for a 3-bit word,
listed in increasing binary value, along with five different decimal
interpretations.
The \fInatural\fR interpretation corresponds to ordinary binary positional
representation, where each digit is the multiplier of a different power of
two, as described in the section Natural Numbers and Bases above.
The disadvantage of the natural interpretation is that it is restricted to
positive integers; negative numbers are excluded.
The remaining four interpretations include negative numbers (at the
expense of reducing the range of the absolute magnitudes that can be
represented).
A procedure for calculating a negative number from the corresponding
positive number with the same magnitude is also given in the table.
The \fIoffset\fR representation simply wraps the numbers around to include
negative values.
The \fIsigned\fR representation has the advantage of encoding the sign of
the integer in the most significant bit, but has the disadvantage of
having two representations for the number zero (which has no positive or
negative property).
In \fIone's-complement\fR representation, positive numbers are converted
to negative numbers with the same magnitude and \fIvice versa\fR simply by
changing all zeros to ones and all ones to zeros, a very efficient
hardware process called \fBcomplementation\fR (or inversion, symbolized
NOT in logic and denoted by the ~ symbol in the table).
The redundancy in the number zero is removed in \fItwo's-complement\fR
representation, thereby extending the range of numbers that can be
represented.
.sp
.ce
.B "Table 1.1 Integer Data Types"
.TS
center box tab(|);
c s s s s s
l c c c c c.
\fBDecimal Equivalent Codes for Binary Integers\fR
\fIBit pattern\fR|\fINatural\fR|\fIOffset\fR|\fISigned\fR|\fI1-comp.\fR|\fI2-comp.\fR
_
$000 sub 2$|0|\-4|+0|+0|+0
$001 sub 2$|1|\-3|+1|+1|+1
$010 sub 2$|2|\-2|+2|+2|+2
$011 sub 2$|3|\-1|+3|+3|+3
$100 sub 2$|4|+0|\-0|\-3|\-4
$101 sub 2$|5|+1|\-1|\-2|\-3
$110 sub 2$|6|+2|\-2|\-1|\-2
$111 sub 2$|7|+3|\-3|\-0|\-1
=
$\-n sub 10~=$|\-|~$n+1$|~$MSB$|~$n$|~$n+1$
_
$(MAX+1) sub 10$|0|\-4|\-0|\-3|\-4
$(MIN\-1) sub 10$|7|+3|\-2|+3|+3
.TE
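.PP
The five interpretations of Table 1.1 can be expressed compactly in Python
(a sketch; the scheme names follow the table's column headings):
.DS L
def decode(bits, scheme, k=3):
    """Interpret a k-bit pattern under one of Table 1.1's schemes."""
    n = int(bits, 2)                         # the natural interpretation
    sign, magnitude = n >> (k - 1), n & (2 ** (k - 1) - 1)
    if scheme == "natural": return n
    if scheme == "offset":  return n - 2 ** (k - 1)
    if scheme == "signed":  return -magnitude if sign else magnitude
    if scheme == "1-comp":  return n - (2 ** k - 1) if sign else n
    if scheme == "2-comp":  return n - 2 ** k if sign else n

for bits in ("011", "100", "111"):
    print(bits, [decode(bits, s)
                 for s in ("natural", "offset", "signed", "1-comp", "2-comp")])
# 011 [3, -1, 3, 3, 3]
# 100 [4, 0, 0, -3, -4]     (signed "-0" prints as 0)
# 111 [7, 3, -3, 0, -1]     (one's-complement "-0" prints as 0)
.DE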
.PP
Operations on limited-size words can result in values outside the range of
representation, called \fBunderflow\fR (too small to represent) and
\fBoverflow\fR (too large to represent).
For example, a 32-bit word with one sign bit can accommodate integers
between $- 2 sup 31~=~-2147483648 sub 10$ and
$+ 2 sup 31 -1~=~+2147483647 sub 10$ (0 is one of the numbers) in
two's-complement representation.
Arithmetic operations (including division by 0) resulting in numbers
larger than a few billion in magnitude will be out of range.
What happens when underflow or overflow occurs depends on the design of
the system; the overflow bit may be truncated, or allowed to overwrite the
adjacent word in memory.
Usually an error flag is set, which may or may not have further
consequences depending on how the operating system of the computer handles
such errors.
(Single-user computers, such as pocket calculators, may display
\f4ERROR\fR and halt on overflow until "reset", which would be
inappropriate for multitasking computers.)
We will demonstrate how surprising results may arise if overflow is
\fIignored\fR.
Consider the effect of adding 1 to the maximum integer, MAX, allowed
according to a given representation.
The value resulting from the addition will depend on how the integers are
represented in binary, and may or may not exceed the fixed word size.
For a 3-bit word, Table 1.1 shows the interpretation of the result of the
operation MAX + 1 in the various coding schemes.
Similar surprising values may result if underflow is ignored on
subtraction of 1 from the minimum valid integer, MIN; refer to Table 1.1.
(Subtraction is often performed using the efficient hardware operations of
complementation followed by incrementation by unity: A \- B = A + ~B + 1.)
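.PP
The wrap-around can be simulated directly (Python integers are unbounded,
so the fixed word size is imitated here by masking to k bits):
.DS L
def twos_decode(n, k=3):
    """Interpret the low k bits of n as a two's-complement integer."""
    n &= 2 ** k - 1                 # keep k bits; overflow is ignored
    return n - 2 ** k if n >> (k - 1) else n

MAX, MIN = 3, -4                    # range of a 3-bit two's-complement word
print(twos_decode(MAX + 1))         # -4 : MAX + 1 wraps around to MIN
print(twos_decode(MIN - 1))         # 3  : MIN - 1 wraps around to MAX
.DE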
.SH
Real Number Coding Schemes
.PP
Not only is the range of numbers restricted in integer representation, but
fractional, or \fBreal\fR, numbers have no representation at all.
(It may be possible to write 3/4 as a ratio of two integers, but unless
fractional (rational) arithmetic is supported, an integer division
operation would truncate this to a quotient of 0.)
.PP
There are a number of ways to represent real (fractional) numbers in a
binary coding system.
If a fixed accuracy can be assumed, then a binary word can be divided into
whole and fractional parts, separated by an assumed point at a given bit
in the word.
This is called \fBfixed-point representation\fR.
For example, with three digits on each side of the point, $5.625 sub 10$
has the decimal interpretation
.EQ
5.625 sub 10 ~=~5 x 10 sup 0~+~6 x 10 sup -1~+~2 x 10 sup -2~+~5 x 10 sup -3
.EN
and the binary representation
.EQ
101.101 sub 2 ~=~1 x 2 sup 2~+~0 x 2 sup 1~+~1 x 2 sup 0~+~1 x 2 sup -1~+~0 x 2 sup -2~+~1 x 2 sup -3
.EN
.PP
Clearly, fixed-point representation limits the range of numbers that can
be represented.
A "floating" point representation, as used in "scientific notation", is
more flexible.
Decimal scientific numbers are expressed as fractional decimal numbers,
called the \fBmantissa\fR, M, multiplying the base to some power, called
the \fBcharacteristic\fR (or "exponent"), C, to reposition the decimal
point, of the form $~x~=~+- W.F cdot 10 sup +- C$.
A similar notation is used for binary floating point numbers.
.PP
To save storage space, the fractional part, whole number part, and
characteristic are assumed to occupy fixed locations in computer words,
and the numbers are shifted ("normalized") so that the floating point
immediately precedes the mantissa: $~x~=~+- M cdot 2 sup C$, with binary M
and C.
.PP
Only positive characteristics need be stored if the characteristics are
shifted ("biased") so that the minimum possible value maps to zero.
For example, given a k-bit characteristic, there are $2 sup k$ values,
which as signed values may range from $- 2 sup {k-1}$ to
$2 sup {k-1} - 1$ (e.g. $k~=~8~=>$ \-128 to +127).
Mapping to the offset ("excess-N") representation by shifting by
$N~=~2 sup {k-1}$ changes to the natural representation, with an apparent
range of 0 to $2 sup k - 1$ (e.g. $k~=~8~=>$ 0 to 255).
When the number is decoded, however, the bias is removed again.
.PP
Normalized mantissas are left "justified", with range
$(.100 cdot cdot cdot ) sub 2 ~<=~M~<=~(.111 cdot cdot cdot ) sub 2$.
.PP
Many variations exist for representing the components of real numbers in
floating-point coding.
A typical storage structure for k-bit words is:
.TS
center box tab(|);
l|c|c|c.
Component:|Sign|(.)Mantissa|Characteristic
_
Number of bits:|1 bit|k\-1\-c bits|c bits
Mode:||left justified|excess-N
.TE
Given a fixed-length word size, there is an obvious trade-off between
range (number of characteristic bits) and accuracy (number of mantissa
bits).
A 32-bit word is typically divided into a 24-bit mantissa, which includes
the sign bit, and an 8-bit characteristic.
The accuracy of a 23-bit mantissa is limited to one part in
$2 sup 23~=~8388608 sub 10$, or about 7 significant digits ("figures").
An 8-bit characteristic can accommodate exponents ranging between
$2 sup -128$ and $2 sup +127$, i.e. roughly $10 sup -38$ to $10 sup +38$.
.PP
Floating-point arithmetic can be carried out by performing integer
arithmetic operations separately on mantissas and characteristics.
Multiplication involves multiplying mantissas and adding characteristics;
addition first requires equalizing the characteristics to the same value.
.PP
Underflow and overflow are defined differently for real numbers than for
integers.
Underflow occurs when the mantissa and characteristic fall below their
smallest binary values, e.g. for the above example,
$2 sup -23~x~2 sup -128~=~1/8388608 sub 10~x~10 sup -38~approx~10 sup -45$,
irrespective of the sign.
Overflow occurs when the largest mantissa and characteristic are exceeded,
independent of the sign of the number, about $10 sup 38$ in the above
example.
.PP
The restriction of mantissa values results in truncation or possibly
rounding of the results of operations, called \fBround-off error\fR, which
accumulates with multiple operations.
Round-off errors may be minimized by adding more bits to the mantissa from
additional words, resulting in "multiple" (double, quadruple, etc)
precision.
The range of values may be increased when the number of characteristic
bits is increased in multiple precision.
Given a finite number of words available, there is obviously a trade-off
between the total number of numbers that can be stored and the accuracy of
the numbers that are stored.
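.PP
Modern hardware uses a standardized variant of this scheme (IEEE 754,
with an excess-127 characteristic and an implicit leading mantissa bit).
The following Python sketch unpacks the fields of a 32-bit floating-point
word to exhibit the sign/characteristic/mantissa structure:
.DS L
import struct

def unpack_float32(x):
    """Return (sign, unbiased characteristic, mantissa bits) of a float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    characteristic = (bits >> 23) & 0xFF    # 8 bits, stored excess-127
    mantissa = bits & 0x7FFFFF              # low 23 bits of the mantissa
    return sign, characteristic - 127, mantissa

# 5.625 = 101.101 in binary, which normalizes to 1.01101 x 2^2
print(unpack_float32(5.625))                # (0, 2, 3407872)
.DE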
.SH
Character Coding
.PP
Strings of binary bits may be used to represent other symbols, called
\fBcharacters\fR, according to accepted conventions.
In this way the symbols on a terminal keyboard may be translated, or
\fBencoded\fR, into binary words recognizable by a computer, and
conversely, binary words may cause characters to be shown on a display
screen.
A common character set encoding is the ASCII (American Standard Code for
Information Interchange) code, consisting of the 128 different possible
binary patterns of 7-bit words, and associated characters corresponding to
the Roman alphabet, Arabic numerals, punctuation and "special" character
codes to communicate graphics and common actions, such as interpreting the
BREAK key to terminate processing, and the (carriage) RETURN key as
end-of-input and begin processing.
Combinations of keyboard symbols may have meaning just as single
characters do.
The SHIFT, CTRL (control), and ESC (escape) keys in combination with other
keys can be used to enhance a keyboard character set.
.PP
Character codes may have certain conveniences.
Seven-digit ASCII binary codes have equivalent two-digit \fBhexadecimal\fR
(base-sixteen) codes (0-9,A-F), facilitating a two-field representation,
which generates a matrix form of the ASCII table for convenient human
lookup (see the ASCII code table).\**
.FS
Actually, the lower 4 bits of the 7-binary-digit ASCII word require a
hexadecimal code to translate to an equivalent single digit, but the upper
3 bits require only a smaller equivalent octal (base 8) single-digit code.
Since the first eight symbols of the hexadecimal digits are identical to
those for the octal digits (and to the decimal digits for that matter),
{0-7}, it is still correct to say that a 7-bit binary word is equivalent
to 2 hexadecimal characters.
.FE
Non-printable (non-displayable) ASCII characters all have either 000 or
001 for their most significant bits (\fBMSB\fR) (leftmost in mathematical
value/positional notation for integers).
Upper-case alphabetic ASCII character codes are identical to the
corresponding lower-case alphabetic codes if the sixth bit (from the
right, where the least significant bit, \fBLSB\fR, is found) is ignored.
Finally, an eighth bit can be added to the 7-bit ASCII code which stores
the "parity" (even = 0, odd = 1) of the remaining 7 bits, to check for
single-bit transmission errors.
.PP
The principal operations on character codes are encoding and decoding,
but for symbols that are in "collating" sequence, such as numerical digits
and alphabetic letters, codes like ASCII can take advantage of hardware
integer arithmetic to perform simple operations efficiently, such as
comparisons (2 > 1?, R < M?).
This can be used to facilitate efficient alphabetic or numeric
("alphanumeric") sorting.
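.PP
These properties of the ASCII code are easy to check in Python, where
ord() and chr() convert between characters and their codes:
.DS L
for ch in "aA1":
    print(repr(ch), format(ord(ch), "07b"))   # 7-bit binary ASCII codes

# Clearing the sixth bit (value 32) converts lower case to upper case.
print(chr(ord("a") & ~0b0100000))             # prints A

def parity_bit(code):
    """Even-parity convention: 1 if the code has an odd number of 1s."""
    return bin(code).count("1") % 2

print(parity_bit(ord("e")))   # 'e' = 1100101 contains four 1s: parity 0

# Collating sequence: comparisons work directly on the codes.
print(ord("R") < ord("M"))    # False: R sorts after M
.DE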
.NH
Computer Languages
.SH
Data Types
.PP
Information is conveyed in computer languages as combinations of "letters"
of an "alphabet" into "words".
Computer languages provide for a variety of representations of
information, including numerical and semantic, called \fBdata types\fR,
with individual examples (realizations) called \fBdata objects\fR.
It is helpful to visualize a segment of memory storing a word at a given
location as a mailbox (data object), with an \fBaddress\fR and contents
equal to the \fBvalue\fR of the word.
Defined data types, together with the possible operations that manipulate
data objects, comprise a \fBdata structure\fR.
Most computer hardware architectures support a set of elementary data
structures, including boolean, integer (fixed-point), real
(floating-point) and character data types, together with simple operations
such as logical and arithmetic operations.
There may also be a number of operating system data structures transparent
to the user, such as storage "buffers", run-time (temporary) storage
"stacks" and lists, and subroutine communication "activation records", as
well as multiprogramming/multitasking management structures.
.SH
Machine Languages
.PP
We will not say much about CPU languages and instructions other than to
point out that the processors of general purpose digital electronic
computers receive instructions in binary code, called \fBmachine
language\fR.
The set of acceptable instructions is defined by the hardware design of
the microcircuit chip, and can vary from thousands, covering every
conceivable combination of basic operations on every conceivable type and
location of data, to a minimum in "reduced" instruction set configurations
(RISC architectures).
For example, it can be shown that any known logical operation can be
reduced to a (possibly quite complex) combination of just two logical
operations, (unary) inversion (NOT) and binary intersection (AND) (or,
alternatively, union, OR); a sketch appears at the end of this section.
.PP
Since data and instructions are stored in memory, all architectures
support hardware operations to "move" (copy) data and instructions and to
manipulate the data according to the instructions.
Data is moved between external memory or peripherals and internal CPU
memory (registers), and between the registers and the arithmetic logic
unit, via input/output communications lines, called the \fBI/O bus\fR.
The arithmetic logic unit (ALU), as its name implies, performs the basic
operations of logic (AND, OR, NOT, COMPARE, SHIFT) and arithmetic (integer
ADDition) on words and on bits within words.
As discussed above, binary SUBtraction can be performed with inversion and
addition, and MULtiplication and DIVision with shifting and addition.
.PP
The CPU also controls the flow of instructions (program control) by
maintaining the location of the current instruction (in a "program
counter"), and can perform the fundamental operation of branching with a
TEST and (conditional) JUMP to a computed address.
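.PP
As promised, here is a sketch of the reduction of other logical operations
to NOT and AND, written in Python with 0 and 1 as the truth values:
.DS L
def NOT(a):    return 1 - a
def AND(a, b): return a & b

# OR by De Morgan's law; XOR from OR and AND.
def OR(a, b):  return NOT(AND(NOT(a), NOT(b)))
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, " OR:", OR(a, b), " XOR:", XOR(a, b))
.DE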
.SH
Programming Languages
.PP
Humans communicate in more complicated languages than the simple languages
based on a binary alphabet.
Although all tasks given to computers can be (and actually are) reduced to
the simple operations performed by the CPU, constructing the set of
instructions, or \fBprogram\fR, to perform a complicated task in such
simple terms is tedious and error-prone.
To improve communications between humans and computers, the simple
components of basic computing are organized into higher-level
combinations, similar to the way multiple additions are composed into
multiplication.
Through special translating programs, computers are able to read symbols
written in "high-level" languages and translate them into lower-level
instructions.
.PP
High-level programs consist of strings of symbols that are constructed
according to defined conventions to communicate the structure of data and
instructions; the rules that apply to the symbols constitute the
\fBprogramming language\fR.
The translation process is referred to as \fBcompilation\fR of the
program.
.PP
Computer languages allow for user-defined combinations of elementary data
types and operations, called \fBstructured data types\fR.
If the language does not automatically provide for character "strings"
(natural language words), sets and arrays, for example, it may be possible
for the programmer to define symbols to represent them and their
operations.
.PP
In general, languages have two attributes: grammatical rules, called
\fBsyntax\fR, and symbol interpretation, or meaning, called
\fBsemantics\fR.
Language conventions are critical to clear communications.
For example, the + symbol could represent the two-operand ("binary")
operation of integer addition, vector addition, set addition, or word
addition, depending on the context and language convention, and by
convention or definition, the character string ADD could have the same
meaning as the + symbol.
In addition, the + symbol is used to represent the one-operand ("unary")
operation of indicating a positive number.
Because of this "overloading" of the + symbol, the compiler for a language
which allows it (not all do) must be able to distinguish the meaning of
the + symbol from the context of the expression.
.PP
Computer \fBprograms\fR, which are collections of statements communicating
data structures and instructions (commands) in a language understood by
computers, are the subject of Chapters 3 and 4.
.bp
.SH
Exercises
.IP 1.
What are some advantages and disadvantages of analogue \fIversus\fR
digital computers?
.IP 2.
What are some advantages and disadvantages of parallel \fIversus\fR
sequential computers?
.IP 3.
Discuss what is needed to model human thought.
.IP 4.
Write down the instructions to follow to convert a general non-decimal
(non-base-10) number into a decimal (base-10) number.
.IP 5.
Convert $123 sub 10$ to binary notation (base-2).
.IP 6.
Look up the ASCII codes for a, A, 1, CR and space.
.IP 7.
Compare the ASCII and IBM EBCDIC character codes.
Speculate what some of the features of the next standard character code
might be.
.IP 8.
What is the one-to-one (unique) mapping of the character set {A-Z}
(i.e. A through Z) to the digits {0-9} which translates the following
encoded (encrypted) statement into a valid decimal integer addition?
.TS
center;
r.
SEND
+MORE
_
MONEY
.TE
.OH ''''
.EH ''''