About PCI Performance on the Power Macintosh
A good place to start addressing PCI performance on Power Macintosh CPUs is the PCI standard itself. The PCI Local Bus Specification, Revision 2.0, features a 32-bit data path -- upgradeable to 64 bits -- with synchronous bus operation up to 33 MHz, and the ability to transfer a data object on the rising edge of each PCI clock cycle. Assuming that neither the initiator nor the target inserts wait states during any data phase, the maximum theoretical bandwidth over a 32-bit bus is 132 Mbytes/second. This also assumes continuous bursting with a 32-bit data object transferred on each PCI clock cycle. (Apple's implementation incorporates a 32-bit data bus.)
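As a quick check of the 132 Mbytes/second figure, the arithmetic can be written out in C. This is only a sketch: the function name `pci_peak_bytes_per_sec` is invented for illustration, and a flat 33,000,000 Hz clock with one data object per clock is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Peak PCI bandwidth in bytes per second for the ideal case described
 * above: one data object per clock, no wait states, continuous bursting.
 * The name and parameters are illustrative, not part of any Apple API. */
static uint64_t pci_peak_bytes_per_sec(uint32_t clock_hz, uint32_t bus_width_bits)
{
    assert(bus_width_bits % 8u == 0);            /* whole bytes only */
    return (uint64_t)clock_hz * (bus_width_bits / 8u);
}
```

With a 33 MHz clock and a 32-bit bus this gives 132,000,000 bytes per second; a 64-bit bus at the same clock would give 264,000,000.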
Since the IB chip (the PowerPC processor to PCI bridge) competes for system memory along with other system devices, continuous PCI bursting is not possible. Therefore, the achievable PCI bandwidth on Power Macintosh computers -- a significant improvement over NuBus -- will be less than the PCI theoretical maximum. The bandwidth also depends on the PCI target's hardware design and the architecture of the driver software.
A PCI burst transfer is defined as one PCI bus transaction with a single address phase followed by two or more data phases. One may ask, how can the bus master transfer a data object on each PCI clock cycle? To initiate a bus transaction, the PCI master only has to arbitrate for ownership of the bus one time. The master then issues the start address and transaction type during the address phase. It's the responsibility of the target device to latch the start address into an address counter and increment the address from data phase to data phase. (A single-beat read or write transaction is defined as a single address phase followed by only one data phase.)
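The address-counter behavior described above can be modeled in a few lines of C. This is a hypothetical software model, not hardware or driver code: the type and function names are invented, the counter is assumed to advance by one 32-bit data object per data phase, and the clock count ignores arbitration, turnaround cycles, and wait states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a PCI target's address counter. */
typedef struct {
    uint32_t addr;                /* address used for the next data phase */
} pci_target_counter;

/* Address phase: the target latches the start address from the master. */
static void target_latch(pci_target_counter *t, uint32_t start_addr)
{
    t->addr = start_addr;
}

/* Data phase: return the address for this phase, then advance the
 * counter by one 32-bit (4-byte) data object. */
static uint32_t target_next(pci_target_counter *t)
{
    uint32_t current = t->addr;
    t->addr += 4u;
    return current;
}

/* Bus clocks for one transaction: one address phase plus the data
 * phases (idealized: no wait states, arbitration, or turnaround). */
static unsigned transaction_clocks(unsigned data_phases)
{
    assert(data_phases >= 1u);    /* every transaction has a data phase */
    return 1u + data_phases;
}
```

Under these assumptions, moving eight data objects takes 9 clocks as one eight-beat burst, against 16 clocks as eight single-beat transactions, which is why bursting matters.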
For data to be transferred between the PowerPC Processor and the PCI Target, or between the PCI Target and system memory, one of the commands shown in Table 1 is initiated.
Table 1. Commands between PowerPC Processor and PCI Bus
The Configuration Read and Configuration Write commands are used to transfer data between the Processor and the PCI target's Configuration registers during system initialization.
The Memory Read and Memory Write commands are used to transfer data between the PCI Master and the Target's memory space.
The Memory Read Line command is used by the PCI Master to transfer a cache line of data from the PCI Target's memory space.
The Memory Read Multiple command is used by the PCI Master to transfer more than one cache line of data from the PCI Target's memory space.
The Memory Write and Invalidate command is used by the PCI Master to transfer one or more complete cache lines of data to the PCI Target's memory space.
About the PowerPC Processor and PCI Commands
Now with the basics of the PCI bus under our belt, let's move on to important details regarding PCI Power Macintosh computers. The PowerPC processor has a 64-bit data bus and its system memory space defaults to write-back cache mode, while the PCI bus is 32 bits wide and the PPC processor sets PCI address space to cache-inhibited mode. For PPC-initiated read and write transactions to PCI memory space, the IB chip (the PowerPC Processor to PCI Bridge) will initiate one of the three following types of PCI commands:
As per the PCI Specification Revision 2.0, PCI Power Macintosh Computers support PCI I/O space. PCI I/O commands and Mac OS services available for them are addressed later in this Technote.
With the basics of the PCI bus described and details of the Power Macintosh PCI implementation outlined, this should be ample background to describe the functionality of the IB chip. In particular, under what circumstance will it perform what type of PCI command?
Bursting from PowerPC to PCI
Provided software is written to use floating-point load and store instructions -- as opposed to integer operations -- the IB chip will burst a two-beat Memory Read or Memory Write command (two 4-byte data phases in one PCI transaction). PowerPC floating-point data is 8 bytes wide and integer data is 4 bytes wide. Using floating-point instructions in effect nearly doubles the PCI bandwidth over single-beat PCI Memory Read or Write commands. This is worth investigating for solutions where the PCI hardware does not support cache line bursting.
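A minimal sketch of the idea in C follows. The function name `copy_by_doubles` is invented for illustration, and the sketch rests on assumptions: that the compiler emits 8-byte floating-point loads and stores for `double` accesses (worth confirming in the generated code), that both buffers are 8-byte aligned, and that routing data through `double` is acceptable for the payload. Accessing non-floating-point data through a `double` pointer is outside what the C standard guarantees, so treat this as a starting point rather than a finished routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 8-byte units from src to dst using double-width accesses.
 * Assumption: the compiler turns each access into one 8-byte
 * floating-point load or store. `volatile` keeps the compiler from
 * merging or eliding the individual accesses. */
static void copy_by_doubles(volatile double *dst,
                            const volatile double *src,
                            size_t count)
{
    /* Both pointers must sit on an 8-byte boundary. */
    assert(((uintptr_t)(const volatile void *)dst % 8u) == 0);
    assert(((uintptr_t)(const volatile void *)src % 8u) == 0);

    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];
    }
}
```

In a driver, `dst` or `src` would point into the card's PCI memory space; how that logical address is obtained is outside this sketch.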
If the PCI target's address space is set to write-through cache mode, the IB chip will perform an eight-beat burst read on PCI with the Memory Read Line command. This corresponds to one cache line: eight 4-byte long words, i.e., 32 bytes.
If the PCI target's address space is set to write back cache mode, the IB chip will perform an eight-beat burst write on PCI with the Memory Write and Invalidate command.
Bursting from PCI to PowerPC
If the address is aligned on an 8-byte boundary, the IB chip will respond to PCI Memory Read and Memory Write commands by a two-beat PCI transaction to align two 32-bit PCI data words to the 64-bit PowerPC bus. On non-8-byte-aligned addresses, single-beat transactions are implemented.
The PCI Memory Write and Invalidate command will perform an 8-beat transaction if the address is aligned on a 32-byte boundary.
The PCI Memory Read Line or Memory Read Multiple commands will perform an eight-beat transaction if the starting address is at least 8 bytes below the next 32-byte boundary. The IB chip treats the PCI Memory Read Line and Memory Read Multiple commands the same way: in either case it will disconnect after an eight-beat transaction -- one 32-byte cache line.
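The alignment conditions in this section can be expressed as simple address checks. The C sketch below uses invented names and follows a literal reading of the rules above; in particular, the read-line check assumes "at least 8 bytes below the next 32-byte boundary" means an offset of 24 or less within the cache line. What the IB chip does when a condition is not met is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BEAT_BYTES        4u    /* one 32-bit PCI data phase */
#define CACHE_LINE_BYTES 32u    /* one PowerPC cache line    */

/* A cache line is eight 4-byte beats. */
static_assert(CACHE_LINE_BYTES == 8u * BEAT_BYTES, "cache line is eight beats");

/* Memory Read / Memory Write: two-beat transaction on 8-byte alignment,
 * single-beat otherwise. */
static bool two_beat_eligible(uint32_t addr)
{
    return (addr % 8u) == 0;
}

/* Memory Write and Invalidate: eight-beat transaction on 32-byte alignment. */
static bool write_invalidate_eight_beat(uint32_t addr)
{
    return (addr % CACHE_LINE_BYTES) == 0;
}

/* Memory Read Line / Memory Read Multiple: eight-beat transaction when the
 * address is at least 8 bytes below the next 32-byte boundary
 * (literal reading of the text; an assumption of this sketch). */
static bool read_line_eight_beat(uint32_t addr)
{
    uint32_t offset = addr % CACHE_LINE_BYTES;
    return offset <= CACHE_LINE_BYTES - 8u;
}
```

A driver or card designer could use checks like these when laying out buffers, so that transfers start on addresses that qualify for the longer transactions.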
As mentioned earlier, 132 Mbytes/sec is the maximum theoretical bandwidth across a 32-bit PCI bus at 33 MHz. Table 2 and Table 3 show the maximum achievable bandwidth that can be expected, depending on the type of PCI transaction performed. Please note these values are not guaranteed but are realistic ranges that have been measured moving large buffers (many thousands of bytes) -- to average out PCI arbitration and PCI wait states -- across a Power Macintosh Computer's PCI bus.
The numbers in Tables 2 and 3 are based on the following assumptions:
PCI Target responses during PowerPC Processor to PCI transactions:
PCI Master requirements during PCI Master with System Memory transactions:
Table 2. PowerPC Processor to PCI Maximum Bandwidth Summary
Table 3. PCI Master to System Memory Maximum Bandwidth Summary
About the Mac OS & Services to Maximize PCI Throughput
Now that the hardware level basics have been examined for PCI Power Macintosh Computers, let's move up to the Mac OS level and review services available to maximize PCI throughput. It's important to mention first that for second generation PCI Power Macintosh Computers, there is a new PCI driver environment -- or I/O architecture -- available in the reference release Mac OS version 7.5.2. Refer to Designing PCI Cards and Drivers for Power Macintosh Computers.
With this reference release, Apple starts to distinguish between APIs (application programming interfaces) and SPIs (system programming interfaces). In this Mac OS release, and in future directions such as Copland, APIs and Toolbox services are no longer available to driver software. Mac OS version 7.5.2 provides a DSL (Driver Services Library) that implements all SPI services available to drivers; these are documented in Designing PCI Cards and Drivers for Power Macintosh Computers, Chapter 9.
To coordinate I/O operations that transfer buffers between system memory and PCI address space, the Macintosh OS provides two functions with the DSL (Driver Services Library):
Remembering that PCI address space defaults to cache-inhibited mode, to enable the PowerPC to burst to areas of PCI memory space, that area must be set to a cacheable setting. This can be done with the
Be advised that the
As an example, if two cards (card x and card y) have addresses mapped into segment 8, one at 0x80800000 and another at 0x80801000, the first call to
Extensions to the
Table 4 lists the different BlockMove functions provided in the DSL
Table 4. BlockMove functions provided in the DSL
The difference between
To summarize the
A common question from PCI developers is, how to initiate a PCI burst of a cache line? Provided the PCI address space is marked cacheable as explained
To read or write PCI I/O space, the Expansion Bus Manager provides routines to transfer data -- byte, word, or long word (8, 16, or 32 bits, respectively) -- using PCI I/O Read and I/O Write commands. The Expansion Bus Manager is part of the ROM firmware in PCI Power Macintosh CPUs. These routines also perform appropriate byte swapping. For a further description, refer to Designing PCI Cards and Drivers for Power Macintosh Computers, Chapter 10. PCI cards that are limited to I/O space, and do not incorporate PCI memory space, are limited to PCI I/O Read and I/O Write commands to transfer data between the PPC and the PCI target. If PCI I/O data needs to be processed quickly, note that there is a significant performance hit when using the Expansion Bus Manager routines. These routines are intended for PCI targets that have I/O registers or low-bandwidth I/O buffers. The IB chip does not burst PCI I/O Read or PCI I/O Write commands.
As described in Chapter 10 of Designing PCI Cards and Drivers for Power Macintosh Computers, along with sample code, the PCI property "assigned-addresses" provides vector entries that represent physical addresses on PCI cards. Using the "APPL,address" property, a driver can locate the logical address of a physical I/O resource. When the driver accesses the logical I/O address, the IB chip generates the appropriate PCI I/O command. A driver can therefore generate PCI I/O commands without using the Expansion Bus Manager routines, in the same way that it accesses PCI memory space. This provides the fastest way to access I/O space, but note that it does not perform byte swapping as the Expansion Bus Manager routines do.
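Because direct access skips the byte swapping that the Expansion Bus Manager routines perform, a driver taking this route has to swap bytes itself where the card expects it. The C sketch below shows one way to do that for 32-bit registers. All names are invented for illustration; it assumes a big-endian PowerPC host, a card with little-endian registers, and a register pointer already derived from the logical address the driver located. No Apple routines are called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Read a 32-bit little-endian device register through its logical
 * address and return the value in host (big-endian) order. */
static uint32_t io_read32_swapped(const volatile uint32_t *reg)
{
    assert(reg != NULL);
    return swap32(*reg);
}

/* Write a host-order value to a 32-bit little-endian device register. */
static void io_write32_swapped(volatile uint32_t *reg, uint32_t value)
{
    assert(reg != NULL);
    *reg = swap32(value);
}
```

Each access is a single load or store through a `volatile` pointer, so the cost over a plain access is only the swap itself.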
Also note, the Expansion Bus Manager provides OS services to generate PCI Configuration Read, Configuration Write, Interrupt Acknowledge, and Special Cycle commands.
The PCI bus on Power Macintosh computers delivers higher I/O performance with lower cost and complexity than the previous NuBus architecture. PCI also represents an emerging standard in the desktop PC industry. To maximize bus performance, use the services available in the Driver Services Library, and pay close attention to PCI chip selection -- in particular, choose chips that can execute cache line burst transactions with Memory Read Line, Memory Read Multiple, and Memory Write and Invalidate commands. Finally, consider Designing PCI Cards and Drivers for Power Macintosh Computers essential documentation for successful PCI development on the Mac platform.
Designing PCI Cards and Drivers For Power Macintosh Computers
Creating PCI Device Drivers, develop, The Apple Technical Journal, Issue 22
The New Device Drivers: Memory Matters, develop, The Apple Technical Journal, Issue 24
Copyright © 2002 Apple Computer, Inc. All rights reserved.