NOTE: This Technical Note has been
retired. Please see the Technical Notes
page for current documentation.
The Macintosh Quadra computers are the first Apple products to use the Motorola
68040 microprocessor, which has an on-chip floating-point unit (FPU). This
feature enables the Quadra to perform basic floating-point operations much
faster than a Macintosh that pairs an MC68020/030 microprocessor with an
MC68881/2 floating-point coprocessor. This Note addresses compatibility and
performance issues for Quadra computers executing FPU instructions, whether
programmed explicitly in assembly language or generated by compilers.
While all currently available 68040 processors have an on-chip FPU, it is important to use Gestalt to verify that a floating-point unit exists before attempting to use any FPU instructions. Motorola has announced a variant of the 68040 without an FPU. This chip has most of the caching characteristics of the current 68040, but it does not support the MC68881/2 floating-point opcode set.
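The Gestalt check above reduces to interpreting one response value. The following C sketch is a hedged illustration, not Apple sample code. It assumes the `gestaltFPUType` ('fpu ') response values as they appear in Apple's Gestalt.h: 0 for no FPU, 1 for MC68881, 2 for MC68882, and 3 for the 68040 on-chip FPU. Those values are reproduced here from memory and should be checked against your headers. The helper names are hypothetical. On a real system you would first call `Gestalt(gestaltFPUType, &response)` and check for `noErr`. That call is omitted so the sketch compiles anywhere.

```c
#include <stdbool.h>

/* Local mirrors of the gestaltFPUType response values (assumed to match
   gestaltNoFPU, gestalt68881, gestalt68882, gestalt68040FPU in Gestalt.h). */
enum {
    FPU_TYPE_NONE   = 0,  /* gestaltNoFPU: no FPU instructions allowed */
    FPU_TYPE_68881  = 1,  /* gestalt68881 */
    FPU_TYPE_68882  = 2,  /* gestalt68882 */
    FPU_TYPE_68040  = 3   /* gestalt68040FPU: on-chip FPU plus FPSP */
};

/* Hypothetical helper: true if FPU opcodes may be executed at all. */
bool fpu_available(long gestalt_response)
{
    return gestalt_response != FPU_TYPE_NONE;
}

/* Hypothetical helper: true only for the 68040 on-chip FPU, where some
   MC68881/2 instructions are emulated by the FPSP. */
bool fpu_uses_fpsp(long gestalt_response)
{
    return gestalt_response == FPU_TYPE_68040;
}
```

A response of 0 (for example, on an FPU-less 68040 variant) must route the program to non-FPU code paths.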
Unfortunately, the FPU circuitry in the 68040 does not by itself support the full functionality of the MC68881/2. Motorola has provided a floating-point software package (FPSP) which emulates all of the MC68881/2 functionality that is not provided by the 68040. This package resides in the operating system of the Quadra. When the 68040 requires emulation services in the course of executing an FPU instruction, it traps to the FPSP via one of several exception vectors, depending on the type of emulation that is needed. The combination of the 68040 and FPSP enables Quadra computers to run old user code without modification unless the code uses floating-point exception handlers.
If user code includes floating-point exception handlers, the handlers must be
modified to reflect the differences in exception handling and FSAVE state
frames between the 68040 and the MC68881/2.
Whenever the 68040 in a Quadra invokes the FPSP, performance inevitably will suffer relative to an MC68881/2 platform because the software emulation of complex algorithms involving floating-point calculations and exception state simply cannot outperform dedicated hardware and microcode. In addition, the instruction cache must cope with many instructions of emulation code to accomplish what the MC68881/2 does in a single FPU instruction. Finally, FPSP intervention flushes the FPU pipeline, thus negating any performance enhancements achievable through overlapping execution of FPU instructions.
FPU Instructions Provided by the 68040
The following FPU instructions are supported by the 68040:
* Precision-constraining operation not provided by the MC68881/2; the rounding precision specified by the instruction supersedes the precision set in the FP control register (FPCR).
+ Privileged instruction
Processing of these FPU instructions is usually handled entirely by the 68040. The FPSP is invoked if an unsupported data type or format is involved or if an exceptional condition is generated that requires fix-up of FPU state by emulation.
The FPSP provides three basic emulation services for the 68040. First, it emulates many MC68881/2 instructions, including all transcendental functions and some arithmetic instructions. Second, the FPSP handles instructions that involve certain data classes (unnormalized and denormal floating-point numbers) or the packed decimal data format, which are not supported by the 68040 hardware. Finally, the FPSP provides exception handlers for certain floating-point exception conditions in order to emulate MC68881/2 behavior when user traps are either disabled or enabled. In the latter case, after completing its exception processing, the FPSP passes control to the user-provided handler.
On Macintosh Quadra platforms executing MC68881/2 instructions, entry to the FPSP occurs automatically by trapping via one of several low-memory exception vectors, depending on which emulation service is required. The system installs the exception vector entries to the FPSP at boot time, and applications should not tamper with these vectors. Because the FPSP preempts the exception vectors for certain user-provided handlers in the MC68881/2 model, compatibility is a problem for old user code that contains floating-point exception handlers. Later sections will address the issues of compatibility in more detail.
Emulation of Unimplemented FPU Instructions
The following MC68881/2 arithmetic instructions are emulated by the FPSP, which produces results and exceptions identical to MC68881/2 platforms:
The algorithms used by the FPSP to calculate transcendental functions are both accurate and fast. Results will not always agree with those of the MC68881/2. When they disagree, the FPSP is generally more precise. The performance of the 68040 FPSP on transcendental functions is roughly equivalent to that of a similarly clocked MC68030/MC68882 combination.
When the 68040 in a Quadra attempts to execute any of the unimplemented
MC68881/2 instructions, it traps to the FPSP via vector number 11, the
unimplemented F-Line opcode exception vector, which is stored in low memory.
If the code executing in a Quadra contains an F-Line opcode that is undefined by the instruction sets of both the 68040 and MC68881/2, trapping to the FPSP via vector 11 also applies. In this case, the handler recognizes that no emulation is necessary, and it passes control to the system F-Line exception handler via a secondary vector stored in low memory.
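If an application, such as a development or debugging environment, needs to
install its own F-Line exception handler on Quadra platforms, it must not
overwrite vector 11. Instead, it must install the handler via the secondary
vector that the FPSP uses to pass control to the system F-Line exception handler.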
Unimplemented Data Type/Format Support in the FPSP
The FPU in the 68040 does not support all of the floating-point data types and formats of the MC68881/2. The following data types require FPSP support:
denormalized single (S), double (D), or extended (X) precision operand to an FPU instruction; and unnormalized X operand to an FPU instruction.
The following data format requires FPSP support:
packed decimal real (P) format as source or destination for an FPU instruction.
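A denormalized S or D operand is one whose biased exponent field is zero while its fraction is nonzero. Standard C99 can detect this directly. The sketch below uses `fpclassify` from `<math.h>` to flag the single- and double-precision values that the 68040 hands to the FPSP. It is illustrative only. Unnormalized X operands have no portable C representation (`long double` is not the 80-bit X format on every host), so they are not covered.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* True if x is a denormalized (subnormal) single-precision value:
   an S operand class the 68040 hardware does not handle. */
bool is_denormal_single(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Same test for the double-precision (D) format. */
bool is_denormal_double(double x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

For example, `DBL_MIN / 2.0` is denormal and would trap to the FPSP as an operand, while `DBL_MIN` itself is the smallest normalized double and would not.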
When the 68040 encounters an unimplemented data type or format in the course of
executing a hardware-supported FPU instruction, it traps to the FPSP via
exception vector 55, the FP unimplemented data type exception vector, which is
stored in low memory.
For denormal S, denormal D, and all P format source operands, the FPSP converts
the values to the normalized X equivalents, restores FPU state, and restarts
the operation. If a source operand is an unnormalized X that can be converted
to a normalized X, the instruction is also completed as described. If the
instruction is a move out to P format in memory (
For denormal X operands or unnormalized X operands that reduce to denormal X values, the FPSP converts such operands to an internal normalized format that contains an extra exponent bit, restores state to the FPU, and restarts the operation if no exponent wrap condition will occur (for example, division of a denormal value by another denormal value). Otherwise, the FPSP emulates the entire instruction.
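Normalizing a denormal shifts its fraction left until the leading 1 reaches the integer-bit position and lowers the exponent once per shift. This drives the exponent below the smallest value the format's exponent field can hold for a normalized number. The C sketch below shows the arithmetic for a double, because C exposes that format portably. The function name is hypothetical, and the sketch assumes a positive denormal input. The same shift-and-decrement applied to a denormal X operand pushes the exponent below the 15-bit field's normalized minimum of -16382. This illustrates why the FPSP's internal format needs an extra exponent bit.

```c
#include <stdint.h>
#include <string.h>

/* Normalizes a positive denormal double: shifts the fraction left until
   bit 52 (the hidden-bit position) is set, decrementing the exponent on
   each shift. Returns the unbiased exponent of the normalized value and
   stores the 53-bit mantissa in *mant. Assumes x is a positive denormal
   (exponent field zero, fraction nonzero); other inputs are not handled. */
int normalize_denormal(double x, uint64_t *mant)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);              /* reinterpret IEEE 754 bits */
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    int exp = -1022;                              /* exponent of a zero field */
    while (!(frac & (UINT64_C(1) << 52))) {       /* until leading 1 is bit 52 */
        frac <<= 1;
        exp--;
    }
    *mant = frac;
    return exp;
}
```

For the smallest denormal double, 2^-1074, the result exponent is -1074, well below -1022, the smallest exponent an 11-bit field can encode for a normalized double.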
Denormalized values resulting from instructions executed by the 68040 hardware do not generate the unimplemented data type exception. Instead, a non-maskable underflow exception occurs, which invokes a handler in the FPSP. This handler rounds the internal result according to the specified rounding precision and direction and delivers the result.
In the case of instructions that are emulated by the FPSP, the processing of unimplemented data type/format operands is handled within the confines of the emulation process. That is, the 68040 traps to the FPSP's unimplemented instruction handler, which is capable of recognizing and dealing with such operands.
Instructions, whether emulated or not, that use the P format as either source or destination have relatively poor performance because they require emulation of binary-to-decimal or decimal-to-binary conversions.
Idiosyncrasies
Binary operations (in which the source and destination operands are both inputs) with P format source operands should avoid using FP1 as the destination operand, because a bug in the FPSP causes spurious results in this case. If an unimplemented data type or format occurs as input to an operation, the 68040 posts the exception when the next FPU instruction is attempted. Because of this deferral, in a debugging environment that sets a breakpoint before that next FPU instruction, the first instruction may appear not to have delivered the correct result.
FPSP Exception Handlers
Certain floating-point exception conditions on the 68040 require intervention by the FPSP in order to fix up results or other state. Some of the FPSP exception handlers are non-maskable in the sense that they are executed regardless of whether or not the exception is trap-enabled by the user. All of the FPSP floating-point exception handlers, whether non-maskable or not, are vectored via Motorola-designated locations in low-memory supervisor address space. If a user-enabled exception occurs, the FPSP exception handler is executed first before vectoring occurs to the user handler via a secondary vector maintained by the Macintosh Quadra system. The user code must not modify the primary floating-point exception vectors to FPSP exception handlers. A later section will describe installation of user exception handlers.
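The ordering described above is a chain. The primary vector always reaches the FPSP. The FPSP does its fix-up and then, only for a user-enabled exception, calls the user handler through a secondary vector. The C sketch below models that chain with function pointers. It is a conceptual model, not Quadra system code, and every name in it (`secondary_vector`, `install_user_handler`, `fpsp_handler`, `sample_user_handler`) is hypothetical. Real installation goes through the system-maintained secondary vector or the MPW library routines described later.

```c
#include <stddef.h>

/* Model only: a "state" word stands in for FPU state; bit 0 records the
   FPSP fix-up and bit 1 records that the user handler ran. */
typedef void (*fp_handler)(int *state);

static fp_handler secondary_vector = NULL;  /* hypothetical secondary slot */

/* Hypothetical installer: user code writes only the secondary slot,
   never the primary vector that points at the FPSP. */
void install_user_handler(fp_handler h)
{
    secondary_vector = h;
}

/* Stand-in for an FPSP exception handler: always fix up first, then
   chain to the user handler only if the exception is user-enabled. */
void fpsp_handler(int *state, int user_enabled)
{
    *state |= 1;                         /* FPSP fix-up done */
    if (user_enabled && secondary_vector != NULL)
        secondary_vector(state);         /* user handler runs second */
}

/* Example user handler for the model. */
void sample_user_handler(int *state)
{
    *state |= 2;
}
```

Because the FPSP runs regardless of whether a user handler is installed, overwriting the primary vector would skip the fix-up entirely. This is why the primary vectors must be left alone.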
The following is a brief description of FPSP exception handlers:
Branch/Set on Unordered (BSUN)
This maskable handler is invoked only if the user has enabled the BSUN
exception. Entry to this handler is via vector number 48.
Inexact Result (INEX1/INEX2)
No FPSP handler is required. When enabled, INEX1 or INEX2 exceptions invoke the
user's handler via vector number 49.
Divide by Zero (DZ)
No FPSP handler is required. When enabled, the user's DZ handler is invoked via
vector number 50.
Underflow (UNFL)
This non-maskable handler is entered via vector number 51.
Operand Error (OPERR)
This non-maskable handler is entered via vector number 52.
Overflow (OVFL)
This non-maskable handler is entered via vector number 53.
Signaling Not-a-Number (SNAN)
This non-maskable handler is entered via vector number 54.
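Each 68000-family exception vector is a 4-byte entry, so a vector's offset from the vector base is its number times 4. The C sketch below tabulates the vectors this Note names. It assumes the vector base is at address 0, which is what the Note's "low-memory address" wording implies. The table and helper names are illustrative and are not part of any Apple interface.

```c
#include <stddef.h>

/* Offset of an exception vector from the vector base (4 bytes each). */
unsigned vector_offset(unsigned vector_number)
{
    return vector_number * 4u;
}

struct fp_vector {
    unsigned    number;
    const char *name;
};

/* The vectors named in this Note. */
static const struct fp_vector fp_vectors[] = {
    { 11, "Unimplemented F-Line opcode" },
    { 48, "BSUN" },  { 49, "INEX1/INEX2" }, { 50, "DZ" },
    { 51, "UNFL" },  { 52, "OPERR" },       { 53, "OVFL" },
    { 54, "SNAN" },  { 55, "Unimplemented data type" },
};

/* Returns the exception name for a vector number, or NULL if this Note
   does not mention that vector. */
const char *fp_vector_name(unsigned vector_number)
{
    for (size_t i = 0; i < sizeof fp_vectors / sizeof fp_vectors[0]; i++)
        if (fp_vectors[i].number == vector_number)
            return fp_vectors[i].name;
    return NULL;
}
```

For instance, vector 11 falls at offset $2C, vector 48 at $C0, and vector 55 at $DC. On a Quadra these primary slots belong to the FPSP and should be read, not written, by application code.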
If a program enables no floating-point exceptions in the FPCR, compatibility is not an issue. In this case, no user exception handlers need be installed. The program traps to non-maskable FPSP handlers as required for any fix-up of exceptional results or FPU state and then resumes execution.
Performance degradation by non-maskable FPSP floating-point exception handling is minimal in most cases because such intervention is rarely needed. The most common exception, INEX2, requires no FPSP support. Underflows and overflows are infrequent when the default extended rounding precision is employed. OPERR occurrences are also rare, unless many out-of-range conversions occur from floating-point to integer formats.
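Whether any exception is enabled depends only on the FPCR's exception enable byte. The C sketch below encodes the FPCR layout from the Motorola MC68881/2 and 68040 user's manuals. Bits 15 through 8 enable BSUN, SNAN, OPERR, OVFL, UNFL, DZ, INEX2, and INEX1. Bits 7 and 6 select rounding precision, and bits 5 and 4 select rounding direction. The helper names are hypothetical. The check captures the "no exceptions enabled" case in which compatibility is not an issue.

```c
#include <stdint.h>

/* FPCR exception enable byte (bits 15..8). */
enum {
    FPCR_BSUN  = 1u << 15, FPCR_SNAN = 1u << 14, FPCR_OPERR = 1u << 13,
    FPCR_OVFL  = 1u << 12, FPCR_UNFL = 1u << 11, FPCR_DZ    = 1u << 10,
    FPCR_INEX2 = 1u << 9,  FPCR_INEX1 = 1u << 8,
    FPCR_ENABLE_MASK = 0xFF00u
};

enum prec { PREC_X = 0, PREC_S = 1, PREC_D = 2 };          /* bits 7..6 */
enum rnd  { RND_N = 0, RND_Z = 1, RND_M = 2, RND_P = 3 };  /* bits 5..4 */

/* Builds an FPCR value from enable bits, precision, and direction. */
uint32_t make_fpcr(uint32_t enables, enum prec p, enum rnd r)
{
    return (enables & FPCR_ENABLE_MASK) | ((uint32_t)p << 6) | ((uint32_t)r << 4);
}

/* True when no exception is trap-enabled, so no user handlers are needed. */
int no_exceptions_enabled(uint32_t fpcr)
{
    return (fpcr & FPCR_ENABLE_MASK) == 0;
}
```

With this layout, the IEEE default mode is $00000000, and extended precision with round-to-zero is $00000010. That is the value the round-to-zero conversion sequence later in this Note loads into the FPCR.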
User Floating-Point Exception Handlers
Users who require floating-point exception handlers in their applications
running on Macintosh Quadra platforms must exercise some care in both the
writing and the installation of such handlers. Moreover, if an application also
targets Macintosh computers with MC68881/2 coprocessors and intends to resume
processing via an
Each floating-point exception on the 68040 is reported by either the conversion unit (CU) or normalization unit (NU) pipeline stage of the FPU. Exceptions reported by the CU are called E1 exceptions; they are detected relatively early in the execution of an FPU instruction. Exceptions reported by the NU are called E3 exceptions; they are detected late in the execution of FPU instructions as the NU attempts to normalize and round the intermediate result for storage in a destination FP register. E1 exceptions include all floating-point exception types. The only E3 exceptions are OVFL, UNFL, and INEX2 occurring on opclass 0 (register-to-register) and opclass 2 (memory-to-register) instructions. If both E3 and E1 exceptions exist at the same time, the E3 exception should be handled first, allowing the 68040 to subsequently trap to handle the pending E1 exception.
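The E1/E3 rules above can be stated as two small predicates. The C sketch below encodes them as written: only OVFL, UNFL, and INEX2 on opclass 0 or opclass 2 instructions can be E3 exceptions, and a pending E3 is handled before a pending E1. The names are hypothetical. The sketch restates the text and is not a decoder for real 68040 FSAVE frames.

```c
#include <stdbool.h>

enum fp_exc { BSUN, SNAN, OPERR, OVFL, UNFL, DZ, INEX1, INEX2 };

/* True if exception e on an instruction of the given opclass may be
   reported late, by the normalization unit, as an E3 exception. Any
   exception type can also occur as an E1 exception. */
bool can_be_e3(enum fp_exc e, int opclass)
{
    bool late_kind = (e == OVFL || e == UNFL || e == INEX2);
    return late_kind && (opclass == 0 || opclass == 2);
}

enum pending { NONE, HANDLE_E3, HANDLE_E1 };

/* When both are pending, handle E3 first; the 68040 then traps again
   for the remaining E1 exception. */
enum pending next_to_handle(bool e3_pending, bool e1_pending)
{
    if (e3_pending)
        return HANDLE_E3;
    if (e1_pending)
        return HANDLE_E1;
    return NONE;
}
```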
There are two
Both 68040 floating-point exception
As a minimum, user floating-point exception handlers on 68040 platforms must
Minimum Floating-Point Exception Handler for the MC68881/2 and Quadra
The following code sequence serves as a minimum handler for all enabled floating-point exceptions except BSUN, on both MC68881/2 platforms and Quadra computers. This handler simply clears the exceptional condition in the FPU and resumes execution without attempting to modify any other FPU state. A minimal BSUN handler would require additional intervention (via one of the four methods outlined in the user manuals for the 68040 and the MC68881/2) to prevent infinite looping on the BSUN trap.
Installation of User Floating-Point Exception Handlers
Current MPW language libraries (MPW 2.0.2 or later releases and Language
Systems FORTRAN version 3.0) provide for the vectoring of user floating-point
exception handlers in a consistent and portable fashion for both Quadra and
MC68881/2 Macintosh platforms. The C functions
    FINTRZ   FPn,FPm    ; truncate to integral value
    FMOVE.L  FPm,<ea>   ; convert to integral format
If the application is running in (IEEE 754) default mode (FPCR = $00000000: no exceptions are enabled, rounding precision is extended, rounding direction is round-to-nearest), the following code sequence will accomplish the same conversion with optimal performance on a Quadra and with minimal performance degradation on an MC68881/2 platform:
    FMOVE.L  #$00000010,FPCR   ; set round-to-zero mode
    FMOVE.L  FPn,<ea>          ; truncate to integral format
    FMOVE.L  #$00000000,FPCR   ; restore default modes
If the user's FPCR setting is not the default, the last sequence must be
modified to save and restore the user's FPCR setting, at the cost of several
instructions and some temporary storage. When an application converts an
array of floating-point values, throughput can be improved, because the FPCR
then needs to be modified only once before and once after all of the
FMOVE.L FPn,<ea> conversions.
Out-of-range source values degrade performance on Quadra computers because
they cause non-maskable vectoring to the OPERR handler in the FPSP.
Workarounds for conversions from floating-point values to the unsigned integer formats of C are more complicated and of necessity slower than those to signed integer formats.
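One common workaround, shown here as a hedged sketch rather than the method used by the MPW libraries, converts to an unsigned 32-bit integer using only signed conversions. Values at or above 2^31 are shifted down by 2^31 before the signed conversion, and the bias is added back as an integer. The extra comparison and subtraction are why these conversions are slower. The function name is hypothetical, and the sketch assumes an input in [0, 2^32).

```c
#include <stdint.h>

/* Converts x in [0, 2^32) to uint32_t via signed 32-bit conversions only.
   C casts truncate toward zero. Out-of-range input is not handled. */
uint32_t double_to_u32(double x)
{
    const double two31 = 2147483648.0;           /* 2^31 */
    if (x < two31)
        return (uint32_t)(int32_t)x;             /* fits a signed conversion */
    /* x - 2^31 is exact here and lies in [0, 2^31) */
    return (uint32_t)(int32_t)(x - two31) + UINT32_C(0x80000000);
}
```

For example, 3000000000.0 becomes 852516352 after the shift, and adding back 2^31 yields 3000000000.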
In order to minimize trapping to the FPSP for handling of exceptional conditions, data types, or data formats, the following hints may prove useful:
Applications should run with extended rounding precision set in the FPCR.
Temporary storage for intermediate floating-point results should be in extended format and preferably in FP registers.
Applications should avoid the generation of unnormalized extended format values via integer operations with subsequent reliance on the FPU to normalize the results.
Applications should avoid the extensive use of the Motorola packed decimal (P) data format.
The MPW QR6 folder in the E.T.O. #6 Developers CD contains C and Pascal
libraries that have been performance-tuned. In particular, some of the
-mc68881 mode implementations have been modified to obtain better
performance on Quadra platforms. Included among the new implementations are
conversions from floating-point to the unsigned integer formats of C.
Unfortunately, conversions to signed integer formats are generated in-line by
the C compiler and thus still include the FINTRZ instruction, which is
emulated by the FPSP on Quadra platforms.
FPU operations on Macintosh Quadra platforms are performed by a combination of circuitry in the 68040 microprocessor and emulation code in the FPSP. The 68040 provides very fast implementations of most of the basic floating-point arithmetic functions in the MC68881/2 instruction set. The FPSP emulates all transcendental functions and some arithmetic functions. In addition, the FPSP handles instructions that involve certain data types/formats that are unsupported by the 68040 hardware and fixes up state when certain exceptional conditions arise during processing.
Compatibility of results relative to MC68881/2 platforms holds for all FPU arithmetic instructions, whether or not they are emulated on Quadra computers. Results for transcendental FPU instructions may differ, and they are generally more precise on the Quadra.
FPU applications that run with no floating-point exceptions enabled in the FPCR
and that do not install an unimplemented F-Line opcode handler will run without
modification on both MC68881/2 and Quadra platforms. User unimplemented F-Line
exception handlers are installed via vector 11 on MC68881/2 platforms and via a
secondary vector on Quadra platforms. Similarly, installation of user
floating-point exception handlers for enabled exceptions must take care not to
overwrite entry points to the FPSP on Quadra platforms. MPW libraries provide
high-level installation procedures for user floating-point exception handlers.
If such handlers are to run on all FPU platforms, they must take into account
the differences in FSAVE state frames for Quadra and MC68881/2 platforms.
Optimizing FPU performance on Quadra computers is largely a matter of understanding the conditions under which the FPSP is invoked and then avoiding such conditions via workarounds whenever possible. Code sequences thus optimized for Quadra computers will often provide less than optimal performance on MC68881/2 platforms.
MC68881/MC68882 Floating-Point Coprocessor User's Manual
MC68040 32-Bit Microprocessor User's Manual
MC68040 Designer's Manual, Section 3: Floating-Point Emulation
M68000 Family Programmer's Reference Manual
IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985)
Copyright © 2002 Apple Computer, Inc. All rights reserved.