\" cat head file_name|tbl|xeqn|xroff -ms .bp .EH '\fBScientific Computation''Computers\fR %' .OH '\fBScientific Computation''Computers\fR %' .sp 5 .ps 14 .ce .B 1. Computers .R .NL .sp 3 .NH Introduction .PP In this chapter we will explore what computers are and what they can do, on two extreme levels, the very general and the quite specific in terms of the simplest practical devices. The discussion in terms of prototype computers serves to introduce a variety of concepts in computing relevant to more sophisticated language and programming concepts to be developed in the following chapters. .PP Understanding how humans may communicate with computers first requires understanding how computers "think" (or work). Much of this chapter is therefore devoted to the ways computers handle information. .SH Models .PP There are some who believe that it is possible to create truly isolated systems. Those who work with abstract concepts, such as mathematicians, tend to fall into this group. The truth is, nothing is totally independent. Even in mathematics a look into history shows that major mathematical structures developed out of practical considerations; the origins of arithmetic are lost in the dawn of economic trade, geometry developed as a language to describe the shapes of objects, algebra developed as a symbolic representation of geometry, calculus has been traced to attempts to settle land boundary disputes among the farmers along the ever-changing Nile River. Differential equations and tensor calculus were invented by Newton and Einstein, respectively, to describe theoretical models of mechanical systems. Euclidean vs fractal geometry. .PP We will be interested in \fBrepresentations\fR of actual or possible physical systems, called \fBmodels\fR. Mathematics is the language of choice of scientists to describe models because it is concise, precise and manipulatable. The problem is that only the simplest models lend themselves to explicit mathematical analysis. More realistic models many times become "intractable" mathematically, due to their inherent complexity (e.g. high dimensionality), necessitating some degree of approximation in their investigation. We will not be interested so much in the origin or development of models as in their exploration or consequences. Once a model is defined, it becomes necessary to explore its confirmatory and predictive power. In this sense, computers may be thought of anthropomorphically as mental prosthetic devices which can enhance human mentality. .NH Types of Computers .PP Not only is there a great variety of computers, but we will see there are several ways to view a given computer, and even differing concepts of what computers really are. .PP One definition of a \fBcomputer\fR is \fIany system that responds to its environment\fR. According to this very broad definition, computers range from subatomic particles to the universe itself. In between are humans and other life forms and manufactured machines. We will focus in on man-made physical devices (\fBhardware\fR) which perform tasks according to deterministic prescriptions which may be encoded in fixed or variable (\fBsoftware\fR) hardware components. .SH Virtual Computers .PP Functionally, computers process information (\fBdata\fR) received as \fBinput\fR to produce more information as \fBoutput\fR. From this view, \fIcomputers communicate, store and process information\fR. Information may be categorized into two general classes, \fBdata\fR and \fBinstructions\fR. 
.PP
In addition to the "hardware" perspective of computers as machines or
mechanisms, there is an internal view and an external view of computers.
The internal view is how the computer would describe itself (if it could).
The "consciousness" of a computer could range from mere recognition of an
external environment, to a biological psyche, to the software operating
system of the electronic programmable computer.
\fBOperating systems\fR are the automated managers of the activity of
computers and are the subject of the next chapter.
The external view is an observer's description of the computer, and admits
a similar range of views, depending on the system and the observer.
The external observer's description may be illusory, in which case the
system is said to be a \fBvirtual\fR system.
The traditional external view of computers is that of the \fBuser\fR
(usually a human) communicating through some transducer \fBdevice\fR to a
\fBmachine\fR, usually an electronic computer.
Users don't ordinarily see what goes on "inside" the machine, but just as
with watches and personalities, there can be an advantage in understanding
the internal workings of the system to avoid misunderstanding.
.SH
Digital Electronic Computers
.PP
\fBAnalogue\fR computers work in continuous space (described by real, or
fractional, numbers), while \fBdigital\fR computers operate in discrete
space (described by integer, or whole, numbers).
Examples of analogue computers include slide rules (with logarithmic
scales for table look-up, and multiplication corresponding to addition of
logs), speedometers, and airport wind socks.
Electrons are the working medium of \fBelectronic computers\fR, and their
storage, retrieval and directed movement comprise basic computer
operations.
Digital computers ultimately represent \fBinformation\fR in terms of two
electronic states, "on" or "off", through current or magnetization in one
direction or another, called \fBbinary\fR states.
Examples of digital computers include "pocket" calculators and PCs
(Personal Computers), as well as supercomputers.
The information is understood through \fBencoding\fR conventions (see the
section on Coding, below).
.PP
The essential hardware components of man-made digital computers store
information in physical \fBmemory\fR, communicate it through
\fBinput/output (I/O) devices\fR and manipulate it through
\fBprocessors\fR.
There are several categories of memory hardware, depending on the physical
\fImedium\fR, such as integrated circuit (IC) chip, magnetic tape,
"compact" disk (CD), "floppy" (flexible) and "hard" (inflexible) disk;
physical \fIlocation\fR, such as internal (core) or external to the main
housing;
\fIcapability\fR, such as read/write programmable (changeable), ROM
("read-only"), and non-permanent (volatile);
and \fIfunction\fR, such as supportive of processing (cache/register),
throughput (buffer), and storage.
External I/O devices include keyboards and video displays ("terminals"),
printers, telephone modems, and network transceivers.
Information is manipulated by a variety of processor modules, or "units",
including the central processing unit (\fBCPU\fR), the arithmetic/logic
unit (\fBALU\fR), the memory management unit (\fBMMU\fR), and coprocessors
and peripheral processing units (\fBPPU\fRs).
.SH
Computers as Tools
.PP
Another perception of computers is as tools to solve problems.
In terms of the structure of traditional digital computers, they may be
thought of as systems capable of answering questions put to them.
The communications can be expressed in a variety of languages, including
mathematical, procedural (programming) and "natural" (human speech)
languages.
.SH
Computer Architectures
.PP
The type and arrangement of components (both hardware and software) in a
computer is referred to as the computer's \fBarchitecture\fR.
What an external user communicates with usually has a number of layers of
virtual computers beneath the surface, each with its own architecture,
eventually arriving at the hardware, or "bare metal".
.PP
Digital electronic computers with a single CPU are \fBsequential\fR
machines.
CPUs, which contain millions of microelectronic components (to encode and
decode instructions, move and operate on data, and maintain program
control), are much more complicated than memory chips.
Traditional architectures therefore evolved from one CPU, with possibly a
specialized floating-point (real number) coprocessor and a number of
device and memory managing PPUs, together with much memory; to a small
number of centrally managed CPUs; to larger numbers (array processors,
hypercube topologies); to massively parallel architectures of many CPUs,
each with a small associated memory.
There has been an accompanying evolution from \fBsequential\fR (linear or
one-dimensional scalar) programming to \fBvector\fR (multicomponent) to
\fBparallel\fR (multidimensional) program design.
.PP
Limitations of space and time are important considerations for digital
computers.
Computers have finite space (memory), which limits the number of possible
data and instruction states.
Since it requires more than one parameter to describe the state of any
physical system (e.g. the position of a particle in space), no computer is
capable of describing even its own state entirely.
On a more practical level, if a finite-state computer ever returns to a
previous state, it will repeat the following states in cyclic fashion.
Consequently, finite word sizes imply that truly random values cannot be
generated indefinitely.
.SH
How Simple Can it Be?
.PP
What is the simplest computer we can think of?
Imagine a machine whose only components are a simple (but of infinite
capacity) external memory device, such as a magnetic tape with places to
store values from a finite alphabet of symbols; a device ("head") which
can move only one position at a time along the tape and can read or write
(replace) a single symbol; and some mechanism to recognize a finite set of
instructions and decide, given what is written on the tape at the current
position, what to rewrite on the tape at that position and which direction
to move.
The current instruction and the contents of the tape determine the "state"
of the machine, and the instructions themselves are called "transitions"
because they cause changes in the state of the machine.
Initially the tape head is positioned at one end of the "input" string
stored on the tape.
The machine reads the symbol at that point and begins to take action based
on the symbol and the first instruction.
Reading, writing and moving continue along the tape until a "halt"
instruction is encountered in the instruction set.
At that point the contents of the tape contain the results, or "output",
of the process.
.PP
This simple device is named for its inventor, the mathematician Alan
Turing, who discussed its virtues in 1936 (before any digital computers
were invented).
Can such a simple machine compute anything at all?
Can it compute anything of value?
Directly it cannot even add two numbers without being programmed.
It is an \fBabstract\fR (virtual) machine, or automaton.\**
.FS
Early computing devices were called \fBautomata\fR after a 19th Century
"automatic" chess playing box with a chess set on top, which actually
concealed a human inside.
.FE
However, not only is a Turing machine able to perform useful tasks, it is
claimed that \fIa Turing Machine is capable of performing any conceivable
computable task!\fR
Such a machine is termed a \fBuniversal computing machine\fR.
Of course, complex processes might be enormously tedious to program on
such a simple device (which is why we use more complicated computers and
higher-level languages), but it is encouraging to realize that the most
sophisticated supercomputer is no more powerful than the humblest home
computer (although it is much more efficient).
In addition, all programming languages are equivalent in power to solve
problems, the only substantial differences being convenience and
efficiency.
And, of course, the (finite) alphabet whose symbols are used to encode
information can be as simple as the binary alphabet, with no loss in
generality.
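.PP
To make this concrete, the following minimal sketch simulates a Turing
machine in the Python programming language (an illustrative choice; the
particular transition set shown, which complements every bit of its input
and halts on the blank symbol "_", is our own, not a standard example):
.DS L
# A minimal Turing machine simulation.  The transition table maps
# (state, symbol) -> (symbol to write, head movement, next state).
def run_turing_machine(tape, transitions, state="start"):
    tape, head = list(tape), 0
    while state != "halt":
        symbol = tape[head]
        new_symbol, move, state = transitions[(state, symbol)]
        tape[head] = new_symbol      # write (replace) a single symbol
        head += move                 # move one position along the tape
    return "".join(tape)

# Complement each bit, moving right; halt on the blank symbol.
flip = {("start", "0"): ("1", +1, "start"),
        ("start", "1"): ("0", +1, "start"),
        ("start", "_"): ("_", 0, "halt")}

print(run_turing_machine("1100101_", flip))   # prints 0011010_
.DE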
.PP
Now consider another simple machine.
This one consists of a string of symbols, possibly binary, with only one
simple rule for changing the symbols.
The state of the machine is determined solely by the symbols and the rule.
At each position in the string, the new state of that position is
determined only by its current state and some combination of the states of
the nearby surrounding positions.
Such a machine is called a \fBcellular automaton\fR, and was the subject
of study by the "father of computing", the mathematician John von Neumann.
von Neumann was investigating the intriguing question whether a machine
could be constructed that could make an exact copy of itself, a mechanical
system analogous to reproduction in biology.
It has been shown that cellular automata are also universal computing
machines, equivalent to Turing Machines.
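.PP
Again a few lines of Python suffice to sketch the idea (the particular
updating rule below, known as rule 90, is an illustrative choice; each
cell's new state depends only on itself and its two nearest neighbours):
.DS L
# One step of an elementary (one-dimensional, binary) cellular automaton.
def step(cells, rule=90):
    n, out = len(cells), []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * center + right   # neighbourhood pattern, 0..7
        out.append((rule >> index) & 1)         # look up the rule's bit
    return out

cells = [0] * 15
cells[7] = 1                                    # one "on" cell in the middle
for _ in range(8):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)
.DE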
.PP
The Turing Machine (TM) and the Cellular Automaton (CA) are examples of
\fBfinite state machines\fR, and are prototypes of linear and parallel
computer architectures.
They are complementary in design, in that the one (TM) has few states and
many rules in the instruction set, while the other has many states and few
rules.
Sometimes computers which are designed according to the TM concept are
referred to as "von Neumann architectures", and cellular automata as
"non-von Neumann architectures", but von Neumann actually worked on the
development of both.
.NH
Information Representation
.SH
Language
.PP
Information is stored, communicated and processed in symbolic or
\fBcoded\fR form, such as speech, text or electronic signals.
The arrangement of the symbols (grammar) and their meaning (semantics),
which are agreed to by convention or definition, is called \fBlanguage\fR.
.PP
Written (or displayed) information is constructed from the simplest
symbols, called the \fBalphabet\fR of the language.
Two types of coding between humans and computers use the English (or some
other "natural", or human, language) alphabet, consisting of 52 symbols
{a-z,A-Z}, and the number alphabet, consisting of digits, {0-9} in decimal
systems.
Punctuation conveys additional contextual meaning; the "dot" can mean end
of sentence (period) or beginning of fraction (decimal point).
(Assigning multiple interpretations to the same symbol is called
\fBoverloading\fR.)
.PP
Combining symbols into sequences, or strings, called \fBwords\fR generates
variety and extends the capacity of the alphabet symbols to convey
information.
For example a, ad, add, ... zebra each have a defined interpretation in
English (stored in dictionaries), and 1, 12, 123, ... have well-defined
interpretations in mathematics (stored in a formula, see below).
.SH
Bits and Words
.PP
At the lowest levels, information in digital computers is coded in an
alphabet of binary digits, or \fBbits\fR (represented with the numerals 0
and 1), and ordered collections, or \fBstrings\fR, of bits called
\fBwords\fR, usually of a fixed number (length).
It is like an alien culture that has only one digit on each of two hands.
However, just as a person who types on a typewriter (or keyboard) with two
fingers can type anything a typist who uses all ten digits can, a binary
system can represent any information any other number system can.
Our task is to understand how the two-digit aliens (digital computers)
think and how to communicate with them.
.PP
As with the two-finger typist, there is a trade-off between efficiency and
flexibility.
A binary digit can represent only two different patterns, while a letter
can represent 26 different possibilities; it therefore takes more bits to
represent the same information than it would take using a richer alphabet.
.PP
IBM introduced the 8-bit \fBbyte\fR in the 1960s, with the result that
words are commonly defined in multiples of 8-bit bytes, such as 16, 24 and
32 bits.
.PP
As we shall see, the contents of a word, consisting of a sequence of bits,
such as 1100101, can have several different interpretations depending on
the context, including the binary number equivalent to 101 in decimal
notation, the alphabetic character e, the decimal integer \-4123, the real
number $6.02~x~10 sup 23$, or even an instruction, such as ADD two
numbers.
.SH
Information
.PP
The term "bit" connotes a "bit of information", formalized in the
definition of information content from the field of information theory:
.EQ
I~=~lg (n)
.EN
where I is the information content, lg denotes the base-2 logarithm, and n
is the number of states, or possible patterns, in a string of bits, or
word.
Thus one-bit words have 2 states and information content $lg(2)~=~1$.
They can store the answer to a single question with possible answers of
"yes" or "no", "true" or "false", "right" or "wrong", "heads" or "tails",
"on" or "off", "0" or "1", etc.
Note that the logarithm function converts a product argument into a sum,
so that information is additive for combinations of multiple independent
states.
2-bit words have $2 sup 2~=~4$ states and information content
$lg ( 2 sup 2 )~=~2$, and thus can store the answers to 2 simultaneous
yes/no questions, while 8-bit words have $2 sup 8~=~256$ states and
information content $lg ( 2 sup 8 )~=~8$, and can encode the possible
sequence of answers to 8 simultaneous yes/no questions.
A connection between information theory and probability theory may be made
by assigning a probability $p sub i$ to each pattern or state i of a word.
The information gained by knowing that (any) one state has been achieved
is
.EQ (1.1)
DELTA I~=~- sum from {i=1} to n {p sub i lg ( p sub i )}
.EN
where n is the total number of possible patterns.
Information theory provides a foundation for the field of statistical
thermodynamics, which assumes equal a priori probabilities of states and
identifies entropy with information and thermal equilibrium with maximum
entropy.
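.PP
These definitions are easily checked numerically; here is a short Python
sketch (the function name is ours) evaluating the information content of
equal and unequal probability distributions:
.DS L
from math import log2

# I = lg(n) for words of 1, 2 and 8 bits.
for bits in (1, 2, 8):
    states = 2 ** bits
    print(bits, "bits:", states, "states,", log2(states), "of information")

# Equation (1.1): average information gained over a distribution p_i.
def information(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

print(information([0.5, 0.5]))     # 1.0  bit:  a fair coin
print(information([0.25] * 4))     # 2.0  bits: four equally likely states
print(information([0.9, 0.1]))     # 0.47 bits: a biased coin tells us less
.DE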
.SH
Natural Numbers and Bases
.PP
Humans naturally have 10 fingers ("digits"), which led to a natural number
"alphabet" of 10 symbols {0,1,...9}, or the \fBdecimal system\fR.
The number of different digit symbols in a number "alphabet" is called the
\fBbase\fR of the number system.
Decimal numbers are base-10.
.PP
To represent numbers larger than the largest single digit, written
language uses strings of digits with the "positional convention", meaning
that a digit at a given position, or place, in the string multiplies a
certain power of the base.
The power of the base starts at 0 at the extreme right and increases by
one for each shift in position to the left.
For example, in base-10, or decimal, notation,
.EQ
123 sub 10~=~(3~x~10 sup 0~+~2~x~10 sup 1~+~1~x~10 sup 2 ) sub 10~=~(3~x~1~+~2~x~10~+~1~x~100) sub 10~=~(3~+~20~+~100) sub 10
.EN
where the subscript 10 reminds us of the base.
Note that these numbers are written (recorded) and read (scanned) from
left to right (by our convention), but the positional interpretation
decodes the number from right to left.
The right-most digits of a number are called the \fBleast significant
digits\fR and those on the left are the \fBmost significant digits\fR.
Practice permits rapid human scanning of smaller numbers, but larger
numbers are not so easy to interpret.
This is why they are sometimes separated into groups of three digits using
commas.
Fractional numbers can be represented by separating the fractional part
from the "whole" number part with a dot ("decimal" point) and continuing
with negative powers of the base,
.EQ
123.456 sub 10~=~( 3~x~10 sup 0~+~2~x~10 sup 1~+~1~x~10 sup 2~+~4~x~10 sup -1~+~5~x~10 sup -2~+~6~x~10 sup -3 ) sub 10
.EN
.EQ
=~(3~+~20~+~100~+~.4~+~.05~+~.006) sub 10
.EN
.PP
If humans naturally had \fIeight\fR fingers on each of two hands, they
might have naturally developed the base-16, or \fBhexadecimal\fR, system.
We can use the first 10 decimal symbols for the first ten digits in the
base-16 system, but need six more to complete the hexadecimal alphabet.
The first six letters of the writing alphabet are commonly used to
represent the remaining numeric digits in base-16, so the entire alphabet
consists of {0-9,A-F}.
The number $123 sub 16$ has a different decimal value from $123 sub 10$:
.EQ
123 sub 16~=~(3~x~16 sup 0~+~2~x~16 sup 1~+~1~x~16 sup 2 ) sub 10~=~(3~x~1~+~2~x~16~+~1~x~256) sub 10~=~(3~+~32~+~256) sub 10~=~291 sub 10
.EN
Note that a procedure for converting a hexadecimal number into a decimal
number has been demonstrated here: convert each number in the expansion to
decimal and complete the arithmetic operations in decimal.
.PP
Digital computers ultimately use \fBbinary\fR numbers from the set {0,1},
similar to an alien culture that has only one digit on each of two hands.
Note that the aliens can count beyond 1 (or $1 sub 2$) the same way we
count beyond 9, namely using the positional convention.
It would be incorrect to write $123 sub 2$ because there are no 2 or 3
symbols in the binary alphabet.
For the same reason, 12F would be a legitimate hexadecimal number, but an
invalid decimal number.
Here is an example of a binary number and its decimal equivalent:
.EQ
101 sub 2~=~(1~x~2 sup 0~+~0~x~2 sup 1~+~1~x~2 sup 2 ) sub 10~=~(1~x~1~+~0~x~2~+~1~x~4) sub 10~=~(1~+~0~+~4) sub 10~=~5 sub 10
.EN
.PP
By way of summary, the left-to-right positional convention notation of a
sequence (string) of n+m digits, $d sub i ,~i~=~n-1,~...,~-m$, in an
arbitrary base b means
.EQ (1.2)
( d sub n-1 d sub n-2 ... d sub 0 . d sub -1 d sub -2 ... d sub -m ) sub b~=~sum from {i=-m} to {n-1} {d sub i b sup i}
.EN
.SH
Base Conversion
.PP
Conversions from a non-decimal base to a decimal base are straightforward.
The reverse conversion, from decimal to non-decimal, requires more effort,
as the factor of each power of the non-decimal base must be determined.
A general way to convert between bases expands the number first in the
given base, then converts each value to the new base, and finally carries
out the arithmetic operations of the expansion to construct the positional
convention representation of the number in the new base.
For example,
.EQ
123 sub 10~=~(3~x~10 sup 0~+~2~x~10 sup 1~+~1~x~10 sup 2 ) sub 10
.EN
.EQ
=~(11~x~1010 sup 0~+~10~x~1010 sup 1~+~1~x~1010 sup 10 ) sub 2~=~(11~x~1~+~10~x~1010~+~1~x~1100100) sub 2
.EN
.EQ
=~(11~+~10100~+~1100100) sub 2~=~1111011 sub 2
.EN
.EQ
=~(3~x~A sup 0~+~2~x~A sup 1~+~1~x~A sup 2 ) sub 16~=~(3~x~1~+~2~x~A~+~1~x~64) sub 16
.EN
.EQ
=~(3~+~14~+~64) sub 16~=~7B sub 16
.EN
Note that $10 sub 10$ becomes $1010 sub 2$ (in base 2) or $A sub 16$ (in
base sixteen), etc.
A more efficient computational method finds the largest power of the base
that will divide the decimal number, identifies the whole-number quotient
with the digit of that power, and repeats the procedure with the
remainder, until the remainder is less in value than the base.
.PP
As a final example, here are the binary, hexadecimal and decimal
representations of the maximum value an 8-bit binary number can have,
which is also the maximum value of a 2-digit hexadecimal number:
.EQ
11111111 sub 2~=~255 sub 10~=~FF sub 16
.EN
Note that the larger the base, the fewer the digits required to represent
the same value.
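.PP
The conversion procedures are easily mechanized; the following Python
sketch (function names ours) decodes a digit string in any base up to 16
according to equation (1.2), and performs the reverse conversion by the
repeated-division variant of the method just described:
.DS L
DIGITS = "0123456789ABCDEF"

def to_decimal(s, base):
    """Decode a positional digit string in the given base."""
    value = 0
    for d in s:                    # most significant digit first
        value = value * base + DIGITS.index(d)
    return value

def from_decimal(n, base):
    """Encode a non-negative decimal integer by repeated division."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, base)     # remainders give the digits,
        digits.append(DIGITS[r])   # least significant first
    return "".join(reversed(digits))

print(to_decimal("123", 16))       # 291
print(from_decimal(123, 2))        # 1111011
print(from_decimal(255, 16))       # FF
.DE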
.SH
Integer Arithmetic
.PP
Digital computers ordinarily support integer arithmetic operations in the
hardware of the CPU.
Binary arithmetic uses the same procedures as decimal arithmetic, with
"carries" for addition, multiple additions or subtractions for
multiplication and division, etc.
(The rules of arithmetic are independent of the base.)
For example, we add 2 to 5 and multiply 2 by 5 in "longhand" with pencil
and paper (using the binary number representation for $5 sub 10$ above) as
follows:
.TS
center tab(|);
r l l.
$101 sub 2$|$=~(1~x~2 sup 0~+~0~x~2 sup 1~+~1~x~2 sup 2 )$|$=~5 sub 10$
$+010 sub 2$|$=~(0~x~2 sup 0~+~1~x~2 sup 1~+~0~x~2 sup 2 )$|$=~2 sub 10$
_
$111 sub 2$|$=~(1~x~2 sup 0~+~1~x~2 sup 1~+~1~x~2 sup 2 )$|$=~7 sub 10$
.TE
.EQ
2 sub 10~x~5 sub 10~=~010 sub 2~x~101 sub 2~=~(0~x~2 sup 0~+~1~x~2 sup 1~+~0~x~2 sup 2 )~x~(1~x~2 sup 0~+~0~x~2 sup 1~+~1~x~2 sup 2 ) sub 2
.EN
.EQ
=~( 1~x~2 sup 1 ) sub 2~x~(1~x~2 sup 0~+~1~x~2 sup 2 ) sub 2~=~( 1~x~2 sup 1~+~1~x~2 sup 3 ) sub 2
.EN
.EQ
=~1010 sub 2~=~( 2~+~8 ) sub 10 ~=~10 sub 10
.EN
.PP
Note that multiplication causes the word size of the result to grow beyond
that of the operands.
Also notice that multiplication by 2 in base 2 can be performed simply by
\fIshifting\fR all the binary digits of the word one place to the
\fIleft\fR (just as multiplication by 10 in decimal shifts left by one
place).
Similarly, division by two in binary can be performed with a right shift.
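.PP
The correspondence between shifting and multiplication or division by the
base is easy to verify in Python, which writes binary literals with a 0b
prefix:
.DS L
a, b = 0b101, 0b010      # 5 and 2, written in binary
print(bin(a + b))        # 0b111  : 5 + 2 = 7
print(bin(a * b))        # 0b1010 : 5 x 2 = 10
print(bin(a << 1))       # 0b1010 : a left shift multiplies by 2
print(bin(a >> 1))       # 0b10   : a right shift divides by 2 (quotient)
.DE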
.SH
Integer Number Coding Schemes
.PP
Although numbers are ultimately represented in the natural language of
digital computers, namely binary, there are a number of alternative
representations, each with certain advantages and disadvantages.
.PP
The number of different integers that can be represented in a computer
with a limited or fixed word length architecture is limited by the size of
the word.
A k-bit word can store $2 sup k$ different values.
Thus a 16-bit word can store a maximum of $2 sup 16~=~65,536$ different
integers.
Which \fIrange\fR of integers the 65,536 different binary codes represent
(e.g. 1 to 65,536, 0 to 65,535, \-32,768 to +32,767, etc) is ultimately
defined by the internal architecture.
.PP
To illustrate alternative binary code representations of integers, we will
consider k = 3-bit words, having $2 sup 3~=~8$ unique values.
Table 1.1 below shows the eight different bit patterns for a 3-bit word,
listed in increasing binary value, along with five different decimal
interpretations.
The \fInatural\fR interpretation corresponds to ordinary binary positional
representation, where each digit is the multiplier of a different power of
two, as described in the section Natural Numbers and Bases above.
The disadvantage of the natural interpretation is that it is restricted to
positive integers; negative numbers are excluded.
The remaining four interpretations include negative numbers (at the
expense of reducing the range of the absolute magnitudes that can be
represented).
A procedure for calculating a negative number from the corresponding
positive number with the same magnitude is also given in the table.
The \fIoffset\fR representation simply wraps the numbers around to include
negative values.
The \fIsigned\fR representation has the advantage of encoding the sign of
the integer in the most significant bit, but has the disadvantage of
having two representations for the number zero (which has no positive or
negative property).
In \fIone's-complement\fR representation, positive numbers are converted
to negative numbers with the same magnitude and \fIvice versa\fR simply by
changing all zeros to ones and all ones to zeros, a very efficient
hardware process called \fBcomplementation\fR (or inversion, symbolized
NOT in logic and denoted by the ~ symbol in the table).
The redundancy in the number zero is removed in \fItwo's-complement\fR
representation, thereby extending the range of numbers that can be
represented.
.sp
.ce
.B "Table 1.1 Integer Data Types"
.TS
center box tab(|);
c s s s s s
l c c c c c.
\fBDecimal Equivalent Codes for Binary Integers\fR
\fIBit pattern\fR|\fINatural\fR|\fIOffset\fR|\fISigned\fR|\fI1-comp.\fR|\fI2-comp.\fR
_
$000 sub 2$|0|\-4|+0|+0|+0
$001 sub 2$|1|\-3|+1|+1|+1
$010 sub 2$|2|\-2|+2|+2|+2
$011 sub 2$|3|\-1|+3|+3|+3
$100 sub 2$|4|+0|\-0|\-3|\-4
$101 sub 2$|5|+1|\-1|\-2|\-3
$110 sub 2$|6|+2|\-2|\-1|\-2
$111 sub 2$|7|+3|\-3|\-0|\-1
=
$\-n sub 10~=$|\-|~$n+1$|~$MSB$|~$n$|~$n+1$
_
$(MAX+1) sub 10$|0|\-4|\-0|\-3|\-4
$(MIN\-1) sub 10$|7|+3|\-2|+3|+3
.TE
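.PP
The five interpretations of Table 1.1 can be expressed compactly in Python
(a sketch; the scheme names follow the table's column headings):
.DS L
def decode(bits, scheme, k=3):
    """Interpret a k-bit pattern under one of Table 1.1's schemes."""
    n = int(bits, 2)                         # the natural interpretation
    sign, magnitude = n >> (k - 1), n & (2 ** (k - 1) - 1)
    if scheme == "natural": return n
    if scheme == "offset":  return n - 2 ** (k - 1)
    if scheme == "signed":  return -magnitude if sign else magnitude
    if scheme == "1-comp":  return n - (2 ** k - 1) if sign else n
    if scheme == "2-comp":  return n - 2 ** k if sign else n

for bits in ("011", "100", "111"):
    print(bits, [decode(bits, s)
                 for s in ("natural", "offset", "signed", "1-comp", "2-comp")])
# 011 [3, -1, 3, 3, 3]
# 100 [4, 0, 0, -3, -4]     (signed "-0" prints as 0)
# 111 [7, 3, -3, 0, -1]     (one's-complement "-0" prints as 0)
.DE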
.PP
Operations on limited-size words can result in values outside the range of
representation, called \fBunderflow\fR (too small to represent) and
\fBoverflow\fR (too large to represent).
For example, a 32-bit word with one sign bit can accommodate integers
between $- 2 sup 31~=~-2147483648 sub 10$ and
$+ 2 sup 31 -1~=~+2147483647 sub 10$ (0 is one of the numbers) in
two's-complement representation.
Arithmetic operations (including division by 0) resulting in numbers
larger than a few billion in magnitude will be out of range.
What happens when underflow or overflow occurs depends on the design of
the system; the overflow bit may be truncated, or allowed to overwrite the
adjacent word in memory.
Usually an error flag is set, which may or may not have further
consequences depending on how the operating system of the computer handles
such errors.
(Single-user computers, such as pocket calculators, may display
\f4ERROR\fR and halt on overflow until "reset", which would be
inappropriate for multitasking computers.)
We will demonstrate how surprising results may arise if overflow is
\fIignored\fR.
Consider the effect of adding 1 to the maximum integer, MAX, allowed
according to a given representation.
The value resulting from the addition will depend on how the integers are
represented in binary, and may or may not exceed the fixed word size.
For a 3-bit word, Table 1.1 shows the interpretation of the result of the
operation MAX + 1 in the various coding schemes.
Similar surprising values may result if underflow is ignored on
subtraction of 1 from the minimum valid integer, MIN; refer to Table 1.1.
(Subtraction is often performed using the efficient hardware operations of
complementation followed by incrementation by unity: A \- B = A + ~B + 1.)
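.PP
The wrap-around can be simulated directly (Python integers are unbounded,
so the fixed word size is imitated here by masking to k bits):
.DS L
def twos_decode(n, k=3):
    """Interpret the low k bits of n as a two's-complement integer."""
    n &= 2 ** k - 1                 # keep k bits; overflow is ignored
    return n - 2 ** k if n >> (k - 1) else n

MAX, MIN = 3, -4                    # range of a 3-bit two's-complement word
print(twos_decode(MAX + 1))         # -4 : MAX + 1 wraps around to MIN
print(twos_decode(MIN - 1))         # 3  : MIN - 1 wraps around to MAX
.DE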
.SH
Real Number Coding Schemes
.PP
Not only is the range of numbers restricted in integer representation, but
fractional, or \fBreal\fR, numbers have no representation at all.
(It may be possible to write 3/4 as a ratio of two integers, but unless
fractional (rational) arithmetic is supported, an integer division
operation would truncate this to a quotient of 0.)
.PP
There are a number of ways to represent real (fractional) numbers in a
binary coding system.
If a fixed accuracy can be assumed, then a binary word can be divided into
whole and fractional parts, separated by an assumed point at a given bit
in the word.
This is called \fBfixed-point representation\fR.
For example, with three digits on each side of the point, $5.625 sub 10$
has the decimal interpretation
.EQ
5.625 sub 10 ~=~5 x 10 sup 0~+~6 x 10 sup -1~+~2 x 10 sup -2~+~5 x 10 sup -3
.EN
and the binary representation
.EQ
101.101 sub 2 ~=~1 x 2 sup 2~+~0 x 2 sup 1~+~1 x 2 sup 0~+~1 x 2 sup -1~+~0 x 2 sup -2~+~1 x 2 sup -3
.EN
.PP
Clearly, fixed-point representation limits the range of numbers that can
be represented.
A "floating" point representation, as used in "scientific notation", is
more flexible.
Decimal scientific numbers are expressed as fractional decimal numbers,
called the \fBmantissa\fR, M, multiplying the base to some power, called
the \fBcharacteristic\fR (or "exponent"), C, to reposition the decimal
point, of the form $~x~=~+- W.F cdot 10 sup +- C$.
A similar notation is used for binary floating point numbers.
.PP
To save storage space, the fractional part, whole number part, and
characteristic are assumed to occupy fixed locations in computer words,
and the numbers are shifted ("normalized") so that the floating point
immediately precedes the mantissa: $~x~=~+- M cdot 2 sup C$, with binary M
and C.
.PP
Only positive characteristics need be stored if the characteristics are
shifted ("biased") so that the minimum possible value maps to zero.
For example, given a k-bit characteristic, there are $2 sup k$ values,
which as signed values may range from $- 2 sup {k-1}$ to
$2 sup {k-1} - 1$ (e.g. $k~=~8~=>$ \-128 to +127).
Mapping to the offset ("excess-N") representation by shifting by
$N~=~2 sup {k-1}$ changes to the natural representation, with an apparent
range of 0 to $2 sup k - 1$ (e.g. $k~=~8~=>$ 0 to 255).
When the number is decoded, however, the bias is removed again.
.PP
Normalized mantissas are left "justified", with range
$(.100 cdot cdot cdot ) sub 2 ~<=~M~<=~(.111 cdot cdot cdot ) sub 2$.
.PP
Many variations exist for representing the components of real numbers in
floating-point coding.
A typical storage structure for k-bit words is:
.TS
center box tab(|);
l|c|c|c.
Component:|Sign|(.)Mantissa|Characteristic
_
Number of bits:|1 bit|k\-1\-c bits|c bits
Mode:||left justified|excess-N
.TE
Given a fixed-length word size, there is an obvious trade-off between
range (number of characteristic bits) and accuracy (number of mantissa
bits).
A 32-bit word is typically divided into a 24-bit mantissa, which includes
the sign bit, and an 8-bit characteristic.
The accuracy of a 23-bit mantissa is limited to one part in
$2 sup 23~=~8388608 sub 10$, or about 7 significant digits ("figures").
An 8-bit characteristic can accommodate exponents ranging between
$2 sup -128$ and $2 sup +127$, i.e. roughly $10 sup -38$ to $10 sup +38$.
.PP
Floating-point arithmetic can be carried out by performing integer
arithmetic operations separately on mantissas and characteristics.
Multiplication involves multiplying mantissas and adding characteristics;
addition first requires equalizing the characteristics to the same value.
.PP
Underflow and overflow are defined differently for real numbers than for
integers.
Underflow occurs when the mantissa and characteristic fall below their
smallest binary values, e.g. for the above example,
$2 sup -23~x~2 sup -128~=~1/8388608 sub 10~x~10 sup -38~approx~10 sup -45$,
irrespective of the sign.
Overflow occurs when the largest mantissa and characteristic are exceeded,
independent of the sign of the number, about $10 sup 38$ in the above
example.
.PP
The restriction of mantissa values results in truncation or possibly
rounding of the results of operations, called \fBround-off error\fR, which
accumulates with multiple operations.
Round-off errors may be minimized by adding more bits to the mantissa from
additional words, resulting in "multiple" (double, quadruple, etc)
precision.
The range of values may be increased when the number of characteristic
bits is increased in multiple precision.
Given a finite number of words available, there is obviously a trade-off
between the total number of numbers that can be stored and the accuracy of
the numbers that are stored.
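.PP
Modern hardware uses a standardized variant of this scheme (IEEE 754,
with an excess-127 characteristic and an implicit leading mantissa bit).
The following Python sketch unpacks the fields of a 32-bit floating-point
word to exhibit the sign/characteristic/mantissa structure:
.DS L
import struct

def unpack_float32(x):
    """Return (sign, unbiased characteristic, mantissa bits) of a float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    characteristic = (bits >> 23) & 0xFF    # 8 bits, stored excess-127
    mantissa = bits & 0x7FFFFF              # low 23 bits of the mantissa
    return sign, characteristic - 127, mantissa

# 5.625 = 101.101 in binary, which normalizes to 1.01101 x 2^2
print(unpack_float32(5.625))                # (0, 2, 3407872)
.DE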
.SH
Character Coding
.PP
Strings of binary bits may be used to represent other symbols, called
\fBcharacters\fR, according to accepted conventions.
In this way the symbols on a terminal keyboard may be translated, or
\fBencoded\fR, into binary words recognizable by a computer, and
conversely, binary words may cause characters to be shown on a display
screen.
A common character set encoding is the ASCII (American Standard Code for
Information Interchange) code, consisting of the 128 different possible
binary patterns of 7-bit words, and associated characters corresponding to
the Roman alphabet, Arabic numerals, punctuation and "special" character
codes to communicate graphics and common actions, such as interpreting the
BREAK key to terminate processing, and the (carriage) RETURN key as
end-of-input and begin processing.
Combinations of keyboard symbols may have meaning just as single
characters do.
The SHIFT, CTRL (control), and ESC (escape) keys in combination with other
keys can be used to enhance a keyboard character set.
.PP
Character codes may have certain conveniences.
Seven-digit ASCII binary codes have equivalent two-digit \fBhexadecimal\fR
(base-sixteen) codes (0-9,A-F), facilitating a two-field representation,
which generates a matrix form of the ASCII table for convenient human
lookup (see the ASCII code table).\**
.FS
Actually, the lower 4 bits of the 7-binary-digit ASCII word require a
hexadecimal code to translate to an equivalent single digit, but the upper
3 bits require only a smaller equivalent octal (base 8) single-digit code.
Since the first eight symbols of the hexadecimal digits are identical to
those for the octal digits (and to the decimal digits for that matter),
{0-7}, it is still correct to say that a 7-bit binary word is equivalent
to 2 hexadecimal characters.
.FE
Non-printable (non-displayable) ASCII characters all have either 000 or
001 for their most significant bits (\fBMSB\fR) (leftmost in mathematical
value/positional notation for integers).
Upper-case alphabetic ASCII character codes are identical to the
corresponding lower-case alphabetic codes if the sixth bit (from the
right, where the least significant bit, \fBLSB\fR, is found) is ignored.
Finally, an eighth bit can be added to the 7-bit ASCII code which stores
the "parity" (even = 0, odd = 1) of the remaining 7 bits, to check for
single-bit transmission errors.
.PP
The principal operations on character codes are encoding and decoding,
but for symbols that are in "collating" sequence, such as numerical digits
and alphabetic letters, codes like ASCII can take advantage of hardware
integer arithmetic to perform simple operations efficiently, such as
comparisons (2 > 1?, R < M?).
This can be used to facilitate efficient alphabetic or numeric
("alphanumeric") sorting.
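.PP
These properties of the ASCII code are easy to check in Python, where
ord() and chr() convert between characters and their codes:
.DS L
for ch in "aA1":
    print(repr(ch), format(ord(ch), "07b"))   # 7-bit binary ASCII codes

# Clearing the sixth bit (value 32) converts lower case to upper case.
print(chr(ord("a") & ~0b0100000))             # prints A

def parity_bit(code):
    """Even-parity convention: 1 if the code has an odd number of 1s."""
    return bin(code).count("1") % 2

print(parity_bit(ord("e")))   # 'e' = 1100101 contains four 1s: parity 0

# Collating sequence: comparisons work directly on the codes.
print(ord("R") < ord("M"))    # False: R sorts after M
.DE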
.NH
Computer Languages
.SH
Data Types
.PP
Information is conveyed in computer languages as combinations of "letters"
of an "alphabet" into "words".
Computer languages provide for a variety of representations of
information, including numerical and semantic, called \fBdata types\fR,
with individual examples (realizations) called \fBdata objects\fR.
It is helpful to visualize a segment of memory storing a word at a given
location as a mailbox (data object), with an \fBaddress\fR and contents
equal to the \fBvalue\fR of the word.
Defined data types, together with the possible operations that manipulate
data objects, comprise a \fBdata structure\fR.
Most computer hardware architectures support a set of elementary data
structures, including boolean, integer (fixed-point), real
(floating-point) and character data types, together with simple operations
such as logical and arithmetic operations.
There may also be a number of operating system data structures transparent
to the user, such as storage "buffers", run-time (temporary) storage
"stacks" and lists, and subroutine communication "activation records", as
well as multiprogramming/multitasking management structures.
.SH
Machine Languages
.PP
We will not say much about CPU languages and instructions other than to
point out that the processors of general purpose digital electronic
computers receive instructions in binary code, called \fBmachine
language\fR.
The set of acceptable instructions is defined by the hardware design of
the microcircuit chip, and can vary from thousands, covering every
conceivable combination of basic operations on every conceivable type and
location of data, to a minimum in "reduced" instruction set configurations
(RISC architectures).
For example, it can be shown that any known logical operation can be
reduced to a (possibly quite complex) combination of just two logical
operations, (unary) inversion (NOT) and binary intersection (AND) (or,
alternatively, union, OR); a sketch appears at the end of this section.
.PP
Since data and instructions are stored in memory, all architectures
support hardware operations to "move" (copy) data and instructions and to
manipulate the data according to the instructions.
Data is moved between external memory or peripherals and internal CPU
memory (registers), and between the registers and the arithmetic logic
unit, via input/output communications lines, called the \fBI/O bus\fR.
The arithmetic logic unit (ALU), as its name implies, performs the basic
operations of logic (AND, OR, NOT, COMPARE, SHIFT) and arithmetic (integer
ADDition) on words and on bits within words.
As discussed above, binary SUBtraction can be performed with inversion and
addition, and MULtiplication and DIVision with shifting and addition.
.PP
The CPU also controls the flow of instructions (program control) by
maintaining the location of the current instruction (in a "program
counter"), and can perform the fundamental operation of branching with a
TEST and (conditional) JUMP to a computed address.
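.PP
As promised, here is a sketch of the reduction of other logical operations
to NOT and AND, written in Python with 0 and 1 as the truth values:
.DS L
def NOT(a):    return 1 - a
def AND(a, b): return a & b

# OR by De Morgan's law; XOR from OR and AND.
def OR(a, b):  return NOT(AND(NOT(a), NOT(b)))
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, " OR:", OR(a, b), " XOR:", XOR(a, b))
.DE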
.SH
Programming Languages
.PP
Humans communicate in more complicated languages than the simple languages
based on a binary alphabet.
Although all tasks given to computers can be (and actually are) reduced to
the simple operations performed by the CPU, constructing the set of
instructions, or \fBprogram\fR, to perform a complicated task in such
simple terms is tedious and error-prone.
To improve communications between humans and computers, the simple
components of basic computing are organized into higher-level
combinations, similar to the way multiple additions are composed into
multiplication.
Through special translating programs, computers are able to read symbols
written in "high-level" languages and translate them into lower-level
instructions.
.PP
High-level programs consist of strings of symbols that are constructed
according to defined conventions to communicate the structure of data and
instructions; the rules that apply to the symbols constitute the
\fBprogramming language\fR.
The translation process is referred to as \fBcompilation\fR of the
program.
.PP
Computer languages allow for user-defined combinations of elementary data
types and operations, called \fBstructured data types\fR.
If the language does not automatically provide for character "strings"
(natural language words), sets and arrays, for example, it may be possible
for the programmer to define symbols to represent them and their
operations.
.PP
In general, languages have two attributes: grammatical rules, called
\fBsyntax\fR, and symbol interpretation, or meaning, called
\fBsemantics\fR.
Language conventions are critical to clear communications.
For example, the + symbol could represent the two-operand ("binary")
operation of integer addition, vector addition, set addition, or word
addition, depending on the context and language convention, and by
convention or definition, the character string ADD could have the same
meaning as the + symbol.
In addition, the + symbol is used to represent the one-operand ("unary")
operation of indicating a positive number.
Because of this "overloading" of the + symbol, the compiler for a language
which allows it (not all do) must be able to distinguish the meaning of
the + symbol from the context of the expression.
.PP
Computer \fBprograms\fR, which are collections of statements communicating
data structures and instructions (commands) in a language understood by
computers, are the subject of Chapters 3 and 4.
.bp
.SH
Exercises
.IP 1.
What are some advantages and disadvantages of analogue \fIversus\fR
digital computers?
.IP 2.
What are some advantages and disadvantages of parallel \fIversus\fR
sequential computers?
.IP 3.
Discuss what is needed to model human thought.
.IP 4.
Write down the instructions to follow to convert a general non-decimal
(non-base-10) number into a decimal (base-10) number.
.IP 5.
Convert $123 sub 10$ to binary notation (base-2).
.IP 6.
Look up the ASCII codes for a, A, 1, CR and space.
.IP 7.
Compare the ASCII and IBM EBCDIC character codes.
Speculate what some of the features of the next standard character code
might be.
.IP 8.
What is the one-to-one (unique) mapping of the character set {A-Z}
(i.e. A through Z) to the digits {0-9} which translates the following
encoded (encrypted) statement into a valid decimal integer addition?
.TS
center;
r.
SEND
+MORE
_
MONEY
.TE
.OH ''''
.EH ''''