Mercurial > hg > Members > kono > nitros9-code
view docs/ccguide/chap1.chapter @ 2106:c4c7facbd082
Fixed up some warnings
author | boisy |
---|---|
date | Sat, 07 Oct 2006 12:11:22 +0000 |
parents | 1b5656126ac6 |
children |
line wrap: on
line source
<chapter> <title>The C Compiler System</title> <section> <title>Introduction</title> <para> The "C" programming language is rapidly growing in popularity and seems destined to become one of the most popular programming languages used for microcomputers. The rapid rise in the use of C is not surprising. C is an incredibly versatile and efficient language that can handle tasks that previously would have required complex assembly language programming. </para> <para> C was originally developed at Bell Telephone Laboratories as an implementation language for the UNIX operating system by Brian Kernighan and Dennis Ritchie. They also wrote a book titled <quote>The C Programming Language</quote> which is universally accepted as the standard for the language. It is an interesting reflection on the language that although no formal industry-wide <quote>standard</quote> was ever developed for C, programs written in C tend to be far more portable between radically different computer systems as compared to so-called <quote>standardized</quote> languages such as BASIC, COBOL, and PASCAL. The reason C is so portable is that the language is so inherently expandable that is some special function is required, the user can create a portable extension to the language, as opposed to the common practice of adding additional statements to the language. For example, the number of special-purpose BASIC dialects defies all reason. A lesser factor is the underlying UNIX operating system, which is also sufficiently versatile to discourage bastardization of the language. Indeed, standard C compilers and Unix are intimately related. </para> <para> Fortunately, the 6809 microprocessor, the OS-9 operating system, and the C language form an outstanding combination. The 6809 was specifically designed to efficiently run high-level languages, and its stack-oriented instruction set and versatile repertoire of addressing modes handle the C language very well. As mentioned previously, UNIX and C are closely related, and because OS-9 is derived from UNIX, it also supports C to the degree that almost any application written in C can be transported from a UNIX system to an OS-9 system, recompiled, and correctly executed. </para> </section> <section> <title>The Language Implementation</title> <para> OS-9 C is implemented almost exactly as described in 'The C Programming Language' by Kernighan and Ritchie (hereafter referred to as K&R). </para> <para> Allthough this version of C follows the specification faithfully, there are some differences. The differences mostly reflect parts of C that are obsolete or the constraints imposed by memory size limitations. </para> </section> <section> <title>Differences from the K & R Specification</title> <itemizedlist spacing="compact"> <listitem><para> Bit fields are not supported. </para></listitem> <listitem><para> Constant expressions for initializers may include arithmetic operators only if all the operands are of type INT or CHAR. </para></listitem> <listitem><para> The older forms of assignment operators, '=+' or '=*', which are recognized by some C compilers, are not supported. You must use the newer forms '+=','*=' etc. </para></listitem> <listitem><para> "#ifdef (or #ifndef) ...[#else...] #endif" is supported but "#if <constant expression>" is not. </para></listitem> <listitem><para> It is not possible to extend macro definitions or strings over more than one line of source code. </para></listitem> <listitem><para> The escape sequence for new-line '\n' refers to the ASCII carriage return character (used by OS-9 for end-of-line), not linefeed. (hex 0A). Programs which use '\n' for end-of-line (which includes all programs in K & R), will still work properly. </para></listitem> </itemizedlist> </section> <section> <title>Enhancements and Extensions</title> <section> <title>The <quote>Direct</quote> Storage Class</title> <para> The 6809 microprocessor instructions for accessing memory via an index register or the stack pointer can be relatively short and fast when they are used in C programs to access "auto" (function local) variables or function arguments. The instructions for accessing global variables are normally not so nice and must be four bytes long and correspondingly slow. However, the 6809 has a nice feature which helps considerably. Memory, anywhere in a single page (256 byte block), may be accessed with fast, two byte instructions. This is called the "direct page", and at any time its location is specified by the contents of the "direct page register" within the processor. The linkage editor sorts out where this could be, and it need not concern the programmer, who only needs to specify for the compiler which variables should be in the direct page to give the maximum benefit in code size and execution speed. </para> <para> To this end, a new storage class specifier is recognized by the compiler. In the manner of K & R page 192, the sc-specifier list is extended as follows: <informaltable frame="none"> <tgroup cols="3"> <colspec colwidth="1.0in"/> <colspec colwidth="1.0in"/> <colspec colwidth="1.0in"/> <tbody> <row> <entry>Sc-specifier:</entry> <entry>auto</entry> <entry></entry> </row> <row> <entry></entry> <entry>static</entry> <entry></entry> </row> <row> <entry></entry> <entry>extern</entry> <entry></entry> </row> <row> <entry></entry> <entry>register</entry> <entry></entry> </row> <row> <entry></entry> <entry>typedef</entry> <entry></entry> </row> <row> <entry></entry> <entry>direct</entry> <entry>(extension)</entry> </row> <row> <entry></entry> <entry>extern direct</entry> <entry>(extension)</entry> </row> <row> <entry></entry> <entry>static direct</entry> <entry>(extension)</entry> </row> </tbody> </tgroup> </informaltable> The new key word may be used in place of one of the other sc-specifiers, and its effect is that the variable will be placed in the direct page. "DIRECT" creates a global direct page variable. "EXTERN DIRECT" references an EXTERNAL-type direct page variable; and "STATIC DIRECT" creates a local direct page variable. These new classed may not be used to declare function arguments. "Direct" variables can be initialized but will, as with other variables not explicitly initialized, have the value zero at the start of program execution. 255 bytes are available in the direct page (the linker requires one byte). If all the direct variables occupy less than the full 255 bytes, the remaining global variables will occupy the balance and memory above if necesary. If too many bytes or storage are requested in the direct page, the linkage editor will report an error, and the programmer will have to reduce the use of DIRECT-type variables to fit the 256 bytes addressable by the 6809. </para> <para> It should be kept in mind that "direct" is unique to this compiler, and it may not be possible to transport programs written using "direct" to other environments without modification. </para> </section> <section> <title>Embedded Assembly Language</title> <para> As versatile as C is, occasionally there are some things that can only be done (or done at maximum speed) in assembly language. The OS-9 C compiler permits user-supplied assebly-language statements to be directly embedded in C source programs. </para> <para> A line beginning with "#asm" switches the compiler into a mode which passes all subsequent lines directly to the assembly-language output, until a line beginning with "#endasm" is encountered. "#endasm" switches the mode back to normal. Care should be exercised when using this directive so that the correct code section is adhered to. Normal code from the compiler is in the PSECT (code) section. If your assembly code uses the VSECT (variable) section, be sure to put a ENDSECT directive at the end to leave the state correct for following compiler generated code. </para> </section> <section> <title>Control Character Escape Sequences</title> <para> The escape sequences for non-printing characters in character constants and strings (see K & R page 181) are extended as follows: <programlisting> linefeed (LF): \l (lower case 'ell') </programlisting> This is to distinguish LF (hex 0A) from \n which on OS-9 is the same as \r (hex 0D). <programlisting> bit patterns: \NNN (octal constant) \dNNN (decimal constant) \xNN (hexadecimal constant) </programlisting> For example, the following all have a value of 255 (decimal): <programlisting> \377 \xff \d255 </programlisting> </para> </section> </section> <section> <title>Implementation-dependent Characteristics</title> <para> K & R frequently refer to characteristics of the C language whose exact operations depend on the architacture and instruction set of the computer actually used. This section contains specific information regarding this version of C for the 6809 processor. </para> <section> <title>Data Representation and Storage Requirements</title> <para> Each variable type requires a specific amount of memory for storage. The sizes of the basic types in bytes are as follows: </para> <informaltable frame="none"> <tgroup cols="3"> <colspec colwidth="0.8in"/> <colspec colwidth="0.4in"/> <colspec colwidth="3.0in"/> <thead> <row> <entry>Data Type</entry> <entry>Size</entry> <entry>Internal Representation</entry> </row> </thead> <tbody> <row> <entry>CHAR</entry> <entry>1</entry> <entry>two's complement binary</entry> </row> <row> <entry>INT</entry> <entry>2</entry> <entry>two's complement binary</entry> </row> <row> <entry>UNSIGNED</entry> <entry>2</entry> <entry>unsigned binary</entry> </row> <row> <entry>LONG</entry> <entry>4</entry> <entry>two's complement binary</entry> </row> <row> <entry>FLOAT</entry> <entry>4</entry> <entry>binary floating point (see below)</entry> </row> <row> <entry>DOUBLE</entry> <entry>8</entry> <entry>binary floating point (see below)</entry> </row> </tbody> </tgroup> </informaltable> <para> This compiler follows the PDP-1 implementation and format in that CHARs are converted to INTs by sign extension, "SHORT" or "SHORT INT" means INT, "LONG INT" means LONG and "LONG FLOAT" means DOUBLE. The format for DOUBLE values is as follows: </para> <screen> (low byte) (high byte) +-+---------------------------------------+----------+ ! ! seven byte ! ! ! ! mantissa ! ! +-+---------------------------------------+----------+ ^ sign bit </screen> <para> The for of the mantissa is sign and magnitude with an implied "1" bit at the sign bit position. The exponent is biased by 128. The format of a FLOAT is identical, except that the mantissa is only three bytes long. Conversion from DOUBLE to FLOAT is carried out by truncating the least significant (right-most) four bytes of the mantissa. The reverse conversion is done by padding the least significant four mantissa bytes with zeros. </para> </section> <section> <title>Register Variables</title> <para> One register variable may be declared in each function. The only types permitted for register variables are int, unsigned and pointer. Invalid register variable declarations are ignored; i.e. the storage class is made auto. For further details see K & R page 81. </para> <para> A considerable saving in code size and speed can be made by judicious use of a register variable. The most efficient use is made of it for a pointer or a counter for a loop. However, if a register variable is used for a complex arithmetic expression, there is no saving. The "U" register is assigned to register variables. </para> </section> <section> <title>Access To Command Line Parameters</title> <para> The standard C arguments "argc" and "argv" are available to "main" as described in K & R page 110. The start-up routine for C programs ensures that the parameter string passed to it by the parent process is converted into null-terminated strings as expected by the program. In addition, it will run together as a single argument any strings enclosed between single or double quotes ("'" or '"'). If either is part of the string required, then the other should be used as a delimiter. </para> </section> </section> <section> <title>System Calls and the Standard Library</title> <section> <title>Operating System Calls</title> <para> The system interface supports almost all the system calls of both OS-9 and UNIX. In order to facilitate the portability of programs from UNIX, some of the calls use UNIX names rather than OS-9 names for the same function. There are a few UNIX calls that do not have exactly equivalent OS-9 calls. In these cases, the library function simulates the function of the corresponding UNIX call. In cases where there are OS-9 calls that do not have UNIX equivalents, the OS-9 names are used. Details of the calls and a name cross-reference are provided in the "C System Calls" section of this manual. </para> </section> <section> <title>The Standard Library</title> <para> The C compiler includes a very complete library of standard functions. It is essential for any program which uses functions from the standard library to have the statement: <programlisting> "#include <stdio.h> </programlisting> See the "C Standard Library" section of this manual for details on the standard library functions provided. </para> <para> IMPORTANT NOTE: If output via printf(), fprintf() or sprintf() of long integers is required, the program MUST call "pflinit()" at some point; this is necessary so that programs not involving LONGS do not have the extra LONGs output code appended. Similarly, if FLOATs or DOUBLEs are to be printed, "pffinit()" MUST be called. These functions do nothing; existence of calls to them in a program informs the linker that the relevant routines are also needed. </para> </section> </section> <section> <title>Run-time Arithmetic Error Handling</title> <para> K & R leave the treatment of various arithmetic errors open, merely saying that it is machine dependent. This implementation deal with a limited number of error conditions in a special way; it should be assumed that the results of other possible errors are undefined. </para> <para> Three new system error numbers are defined in <errno.h>: <programlisting> #define EFPOVR 40 /* floating point overflow of underflow */ #define EDIVERR 41 /* division by zero */ #define EINTERR 42 /* overflow on conversion of floating point to long integer */ </programlisting> </para> <para> If one of these conditions occur, the program will send a signal to itself with the value of one of these errors. If not caught or ignored, the will cause termination of program with an error return to the parent process. However, the program can catch the interrupt using "signal()" or "intercept()" (see C System Calls), and in this case the service routine has the error number as its argument. </para> </section> <section> <title>Achieving Maximum Program Performance</title> <section> <title>Programming Considerations</title> <para> Because the 6809 is an 8/16 bit microprocessor, the compiler can generate efficient code for 8 and 16 bit objects (CHARs, INTs, etc.). However, code for 32 and 64 bit values (LONGs, FLOATs, DOUBLEs) can be at least four times longer and slower. Therefore don't use LONG, FLOAT, or DOUBLE where INT or UNSIGNED will do. </para> <para> The compiler can perform extensive evaluation of constant expressions provided they involve only constants of type CHAR, INT, and UNSIGNED. There is no constant expression evaluation at compile-time (except single constants and "casts" of them) where there are constants of type LONG, FLOAT, or DOUBLE, therefore, complex constant expressions involving these types are evaluated at run time by the compiled program. You should manually compute the value of constant expressions of these types if speed is essential. </para> </section> <section> <title>The Optimizer Pass</title> <para> The optimizer pass automatically occurs after the compilation passes. It reads the assembler source code text and removes redundant code and searches for code sequences that can be replaced by shorter and faster equivalents. The optimizer will shorten object code by about 11% with a significant increase in program execution speed. The optimizer is recommended for production versions of debugged programs. Because this pass takes additional time, the "-O" compiler option can be used to inhibit it during error-checking-only compilations. </para> </section> <section> <title>The Profiler</title> <para> The profiler is an optional method used to determine the frequency of execution of each function in a C program. It allows you to identify the most-frequently used functions where algorithmic or C source code programming improvements will yield the greatest gains. </para> <para> When the "-P" compiler option is selected, code is generated at the beginning of each function to call the profiler module (called "_prof"), which counts invocations of each function during program execution. When the program has terminated, the profiler automatically prints a list of all functions and the number of times each was called. The profiler slightly reduces program execution speed. See "prof.c" source for more information. </para> </section> </section> <section> <title>C Compiler Component Files and File Usage</title> <para> Compilation of a C program by cc requires that the following files be present in the current execution directory (CMDS). </para> <table frame="none"> <title>OS-9 Level I Systems</title> <tgroup cols="2"> <colspec colwidth="1.0in"/> <colspec colwidth="3.0in"/> <tbody> <row> <entry>cc1</entry> <entry>compiler executive program</entry> </row> <row> <entry>c.prep</entry> <entry>macro pre-processor</entry> </row> <row> <entry>c.pass1</entry> <entry>compiler pass 1</entry> </row> <row> <entry>c.pass2</entry> <entry>compiler pass 2</entry> </row> <row> <entry>c.opt</entry> <entry>assembly code optimizer</entry> </row> <row> <entry>c.asm</entry> <entry>relocating assembler</entry> </row> <row> <entry>c.link</entry> <entry>linkage editor</entry> </row> </tbody> </tgroup> </table> <table frame="none"> <title>OS-9 Level II Systems</title> <tgroup cols="2"> <colspec colwidth="1.0in"/> <colspec colwidth="3.0in"/> <tbody> <row> <entry>cc2</entry> <entry>compiler executive program</entry> </row> <row> <entry>c.prep</entry> <entry>macro pre-processor</entry> </row> <row> <entry>c.comp</entry> <entry>compiler proper</entry> </row> <row> <entry>c.opt</entry> <entry>assembly code optimizer</entry> </row> <row> <entry>c.asm</entry> <entry>relocating assembler</entry> </row> <row> <entry>c.link</entry> <entry>linkage editor</entry> </row> </tbody> </tgroup> </table> <para> In addition a file called "clib.l" contains the standard library, math functions, and systems library. The file "cstart.r" is the setup code for compiled programs. Both of these files must be located in a directory named "LIB" on the system's default mass storage device, which is specified in the OS-9 "INIT" module and is usually the disk drive the system is booted from. </para> <para> If, when specifying "#include" files for the pre-processor to read in, the programmer uses angle brackets, "<" and ">", instead of parentheses, the file will be sought starting at the "DEFS" directory on whichever drive is the default system drive for the system running. </para> <section> <title>Temporary Files</title> <para> A number of temporary files are created in the current data directory during compilation, and it is important to ensure that enough space is available on the disk drive. As a rough guide, at least three times the number of blocks in the largest source file (and its included files) should be free. </para> <para> The identifiers "etext", "edata", and "end" are predefined in the linkage editor and may be used to establish the addresses of the end of executable text, initialized data, and uninitialized data respectively. </para> </section> </section> <section> <title>Running the Compiler</title> <para> The are two commands which invoke distinct versions of the compiler. "cc1" is for OS-9 Level I which uses a two pass compiler, and, "cc2" is for Level II which causes a single pass version. Both versions of the compiler works identically, the main difference is that cc1 has been divided into two passes to fit the smaller memory size of OS-9 Level I systems. In the following text, "cc" refers to either "cc1" or "cc2" as appropiate for your system. The syntax of the command line which calls the compiler is: </para> <cmdsynopsis> <command>cc</command> <arg>option-flags</arg> <arg rep="repeat" choice="plain"><replaceable>file</replaceable></arg> </cmdsynopsis> <para> One file at a time can be compiled, or a number of files may be compiled together. The compiler manages the compilation up to four stages: pre-processor, compilation to assembler code, assembly to relocatable code, and linking to binary executable code (in OS-9 memory module format). </para> <para> The compiler accepts three types of source files, provided each name on the command line has the relevant postfix as shown below. Any of the above file types may be mixed on the command line. </para> <table frame="none"> <title>File Name Suffix Conventions</title> <tgroup cols="2"> <colspec colwidth="0.5in"/> <colspec colwidth="3.0in"/> <thead> <row> <entry>Suffix</entry> <entry>Usage</entry> </row> </thead> <tbody> <row> <entry>.c</entry> <entry>C source file</entry> </row> <row> <entry>.a</entry> <entry>assembly language source file</entry> </row> <row> <entry>.r</entry> <entry>relocatable module</entry> </row> <row> <entry>none</entry> <entry>executable binary (OS-9 memory module)</entry> </row> </tbody> </tgroup> </table> <para> There are two modes of operation: multible source file and single source file. The compiler selects the mode by inspecting the command line. The usual mode is single source and is specified by having only one source file name on the command line. Of course, more than one source file may be compiled together by using the "#include" facility in the source code. In this mode, the compiler will use the name obtained by removing the postfix from the name supplied on the command line, and the output file (and the memory module produced) will have this name. For example: <screen> cc prg.c </screen> will leave an executable file called "prg" in the current execution directory. </para> <para> The multiple source mode is specified by having more than one source file name on the command line. In this mode, the object code output file will have the name "output" in the current execution directory, unless a name is given using the "-f=" option (see below). Also, in multiple source mode, the relocatable modules generated as intermediate files will be left in the same directories as their corresponding source files with the postfixes changed to ".r". For example: <screen> cc prg1.c /d0/fred/prg2.c </screen> will leave an executable file called "output" in the current execution directory, one file called "prg1.r" in the current data directory, and "prg2.r" in "/d0/fred". </para> </section> <section> <title>Compiler Option Flags</title> <para> The compiler recognizes several command-line option flags which modify the compilation process where needed. All flags are recognized before compilation commences so the flags may be placed anywhere on the command line. Flags may be ran together as in "-ro", except where a flag is followed by something else; see "-f=" and "-d" for examples. </para> <para> -A suppresses assembly, leaving the output as assembler code in a file whose name is postfixed ".a". </para> <para> -E=<number> Set the edition number constant byte to the number given. This is an OS-9 convention for memory modules. </para> <para> -O inhibits the assembly code optimizer pass. The optimizer will shorten object code by about 11% with a comparable increase in speed and is recommended for production versions of de-bugged programs. </para> <para> -P invokes the profiler to generate function frequency statistics after program execution. </para> <para> -R suppresses linking library modules into an executable program. Outputs are left in files with postfixes ".r". </para> <para> -M=<memory size> will instruct the linker to allocate <memory size> for data, stack, and parameter area. Memory size may be expressed in pages (an integer) or in kilobytes by appending "k" to an integer. For more details of the use of this option, see the "Memory Management" section of this manual. </para> <para> -L=<filename> specifies a library to be searched by the linker before the Standard Library and system interface. </para> <para> -F=<path> overrides the above output file naming. The output file will be left with <filename> as its name. This flag does not make sense in multiple source mode, and either the -a or -r flag is also present. The module will be called the last name in <path>. </para> <para> -C will output the source code as comments with the assembler code. </para> <para> -S stops the generation of stack-checking code. -S should only be used with great care when the appication is extremely time-critical and when the use of the stack by compiler generated code is fully understood. </para> <para> -D<identifier> is equivalent to "#define <identifier>" written in the source file. -D is useful where different versions of a program are maintained in one source file and differentiated by means of the "#ifdef" of "#ifndef" pre-processor directives. If the <identifier> is used as a macro for expansion by the pre-processor, "1"(one) will be the expanded "value" unless the form "-d<identifier>=<string>" is used in which case the expansion will be <string>. </para> <table frame="none"> <title>Command Line and Option Flag Examples</title> <tgroup cols="3"> <colspec colwidth="1.5in" colname="c1"/> <colspec colwidth="1.5in" colname="c2"/> <colspec colwidth="1.5in" colname="c3"/> <thead> <row> <entry>command line</entry> <entry>action</entry> <entry>output file(s)</entry> </row> </thead> <tbody> <row> <entry>cc prg.c</entry> <entry>compile to an executable program</entry> <entry>prg</entry> <entry></entry> </row> <row> <entry>cc prg.c -a</entry> <entry>compile to assembly language source code</entry> <entry>prg.a</entry> </row> <row> <entry>cc prg.c -r</entry> <entry>compile to relocatable module</entry> <entry>prg.r</entry> </row> <row> <entry>cc prg1.c prg2.c prg3.c</entry> <entry>compile to executable program</entry> <entry>prg1.r, prg2.r, prg3.r, output</entry> </row> <row> <entry>cc prg1.c prg2.a prg3.r</entry> <entry>compile prg1.c, assemble prg2.a and combine all into and executable program</entry> <entry>prg1.r, prg2.r</entry> </row> <row> <entry>cc prg1.c prg2.c -a</entry> <entry>compile to assembly language source code</entry> <entry>prg1.a, prg2.a</entry> </row> <row> <entry>cc prg1.c prg2.c -f=prg</entry> <entry>compile to executable program</entry> <entry>prg</entry> </row> </tbody> </tgroup> </table> </section> </chapter>