N64 Programming Manual Chapter 11

RSP Overview

This document describes the graphics state machine of the RCP, with a particular focus on the RSP (please see Section 3.3, “RSP: Reality Signal Processor”).

The RSP is an R4000-like CPU with an 8-element vector unit, featuring a small instruction memory, IMEM (4K bytes or 1K instructions) and small data memory, DMEM (4K bytes). Software running on this processor implements a large portion of the geometry display pipeline.

In addition, the RSP provides visibility for all of the RCP functionality, through a variety of software conventions and hardware exposure. All “display lists” for the RCP graphics features must pass through the RSP. There are several important features which require the application programmer to be consciously aware of the distinctions between the RSP and the RDP (and program each of them separately), but for the most part, the RSP serves as the single interface between the application program and the graphics pipeline:

Figure 11.1.1 Nintendo 64 Graphics Pipeline

Topics covered in this document include:

RSP overview

display list processing

matrix state

vertex state

vertex lighting state

texture state

clipping and culling

primitives

controlling the RDP state

RSP Overview
A program which runs on the RSP is called a task; the application is completely responsible for scheduling and invoking tasks on the RSP.

The interface between the application and the RSP task is accomplished with a series of operating system calls and a structure called the task list (or task header), which is of type OSTask (defined in sptask.h). The task list contains all the information necessary to begin task execution, including pointers to the microcode to run. This structure is filled in by the application program.

A detailed description of invocation of a task on the RSP is beyond the scope of this section (please see Section 4.7, “RCP Task Management”), but the essential procedure is straightforward:

The RSP is assumed to be halted (or the R4300 halts it).
The R4300 DMAs the boot microcode into the RSP IMEM.
The R4300 DMAs the ‘task header’ into the RSP DMEM.
The R4300 sets the RSP PC to 0.
The R4300 clears the RSP halt status (allowing it to run).

From this point, the boot microcode takes over, loading the task microcode (and data) specified in the task list, and jumping to the beginning of the task.

One item in the task header is a pointer to the initial data to process (in the case of a graphics task, this is a display list pointer).
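
The five start-up steps above can be sketched as a toy model in C. This is only an illustration of the ordering, not the libultra implementation (where, as far as we know, the sequence is normally hidden behind osSpTaskStart()). The ToyRsp type, the toy_rsp_start() helper, and the header_offset parameter are all hypothetical names introduced here; the real location of the task header in DMEM is fixed by the boot microcode and is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_IMEM_SIZE 4096u /* 4K bytes of instruction memory (from the text) */
#define TOY_DMEM_SIZE 4096u /* 4K bytes of data memory (from the text) */

/* Hypothetical stand-in for the RSP as seen from the R4300. */
typedef struct {
    uint8_t  imem[TOY_IMEM_SIZE];
    uint8_t  dmem[TOY_DMEM_SIZE];
    uint32_t pc;
    int      halted;
} ToyRsp;

/* Hypothetical helper: performs the five steps in order.
 * Returns 0 on success, -1 if the data does not fit in IMEM/DMEM. */
static int toy_rsp_start(ToyRsp *rsp,
                         const void *boot_ucode, size_t boot_size,
                         const void *task_header, size_t header_size,
                         size_t header_offset)
{
    assert(rsp != NULL);
    if (boot_size > TOY_IMEM_SIZE)
        return -1;
    if (header_offset > TOY_DMEM_SIZE ||
        header_size > TOY_DMEM_SIZE - header_offset)
        return -1;

    rsp->halted = 1;                                  /* 1. make sure the RSP is halted */
    memcpy(rsp->imem, boot_ucode, boot_size);         /* 2. boot microcode -> IMEM      */
    memcpy(rsp->dmem + header_offset,
           task_header, header_size);                 /* 3. task header -> DMEM         */
    rsp->pc = 0;                                      /* 4. set the RSP PC to 0         */
    rsp->halted = 0;                                  /* 5. clear halt; the RSP now runs */
    return 0;
}
```

The memcpy() calls stand in for the DMA transfers; on real hardware those transfers are subject to the alignment rules described later in this chapter.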

Display List Format
The display list which the gspFast3D, gspF3DNoN, or gspLine3D microcode running on the RCP interprets is defined as a stream of 64-bit commands.

Applications written in C will usually use the interface from the file gbi.h, which is included via ultra64.h. Although the construction of a display list looks like a familiar series of function calls, the calls are actually just bit-packing macros. These macros are described in detail in their individual man pages.

Each macro has two forms, e.g. gSPTexture() and gsSPTexture(). The difference between ‘g’ and ‘gs’ is that the ‘g’ form is an in-line form which requires an additional argument (a pointer to the display list being constructed). The display list pointer must be of the form “ptr++”, in order for the macros to work properly.

The ‘gs’ form is for static declarations, and generates the appropriate C structure initialization sequence.

Throughout this document, only the ‘gs’ form is mentioned; however, the ‘g’ form also applies and could always be substituted.
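
The difference between the two forms can be sketched with a simplified stand-in for the gbi.h macros. Everything named Toy or toy below is hypothetical, and the bit layout is invented for illustration; it is not the real command encoding. The sketch only shows why the ‘gs’ form can appear in a static initializer while the ‘g’ form writes through a “ptr++” argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a 64-bit display list command. */
typedef union {
    struct { uint32_t w0, w1; } words;
    long long int force_structure_alignment;
} ToyGfx;

#define TOY_OP_TEXTURE 0x01u /* hypothetical opcode, not the real value */

/* Hypothetical bit packing shared by both forms. */
#define TOY_TEXTURE_W0(level, on) \
    (((uint32_t)TOY_OP_TEXTURE << 24) | ((uint32_t)(level) << 8) | (uint32_t)(on))

/* 'gs' form: expands to a C structure initializer. */
#define gsToyTexture(level, on) {{ TOY_TEXTURE_W0(level, on), 0u }}

/* 'g' form: stores one command through the pointer argument.
 * The argument is evaluated exactly once, so "ptr++" advances by one slot. */
#define gToyTexture(pkt, level, on)                  \
    do {                                             \
        ToyGfx *g_ = (pkt);                          \
        g_->words.w0 = TOY_TEXTURE_W0(level, on);    \
        g_->words.w1 = 0u;                           \
    } while (0)

/* Static display list, built at compile time with the 'gs' form. */
static const ToyGfx toy_static_list[] = {
    gsToyTexture(0, 1),
    gsToyTexture(2, 0),
};

/* Dynamic display list, built at run time with the 'g' form.
 * Returns the number of commands written. */
static size_t toy_build_dynamic(ToyGfx *buf)
{
    ToyGfx *glistp = buf;
    assert(buf != NULL);
    gToyTexture(glistp++, 0, 1);
    gToyTexture(glistp++, 2, 0);
    return (size_t)(glistp - buf);
}
```

Both lists end up holding identical command words; only the time at which the bits are packed differs.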

All of the display list building macros also embed an ‘SP’ or a ‘DP’ to describe the functional unit of the RCP which will operate on this command. This is certainly confusing, especially to application programmers familiar with higher-level graphics APIs such as OpenGL. In order to achieve maximum performance, it is necessary to expose the two major units of the RCP to the application programmer. The primary reason for this is resource constraints; there is simply not enough RSP IMEM to build a display list processor that is rich enough to hide these details from the application programmer. In addition, given the dedicated application of the RCP (video games), any CPU cycles spent “gift-wrapping” the graphics API are a waste of time. The binary encoding of most of the display list commands is the lowest possible level: they are the bits that control the hardware.

Exposing the two functional units of the RCP also limits the amount of state shared between them. The major drawback of this design decision is that you must often tell the same thing to the RSP and the RDP. For example, in order to “turn on texture mapping” you must turn it on in the RSP and turn it on in the RDP. This may seem clumsy at first, and indeed this is a common source of display list bugs, but the parallel execution of the RSP and RDP, plus the lean display list processing machine make this trade-off worthwhile.

Segmented Memory and the RSP Memory Map
All DRAM addresses in the display list are segmented addresses. The mapping of segments and their base addresses is provided using the gsSPSegment() macro. It is the responsibility of the application to maintain this mapping and inform the RSP via the display list.

The RSP maintains an associative table of up to 16 segment IDs and their base addresses. Any DRAM address in the display list is ‘physical-ized’ using this table.

The RDP only uses physical addresses, and one of the chores of the RSP is to do the address translation necessary for the RDP.

Note: By convention, segment table entry 0 is reserved for physical addressing, and should be set to 0x0.
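
The translation described above can be sketched in C as follows. This is a minimal model, assuming the segment ID sits in bits 24 to 27 of a 32-bit address and the offset in the low 24 bits; that layout, along with the ToySegmentTable type and the toy_ function names, is an assumption made for this illustration and is not stated in this section.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_SEGMENTS 16u /* up to 16 segment IDs (from the text) */

/* Hypothetical model of the segment table kept by the RSP. */
typedef struct {
    uint32_t base[TOY_NUM_SEGMENTS];
} ToySegmentTable;

/* Models the effect of a gsSPSegment() command: record a base address. */
static void toy_set_segment(ToySegmentTable *t, uint32_t seg, uint32_t base)
{
    assert(seg < TOY_NUM_SEGMENTS);
    t->base[seg] = base;
}

/* Assumed packing: segment ID in bits 24..27, offset in bits 0..23. */
static uint32_t toy_segment_addr(uint32_t seg, uint32_t offset)
{
    assert(seg < TOY_NUM_SEGMENTS);
    return (seg << 24) | (offset & 0x00FFFFFFu);
}

/* 'Physical-ize' a segmented address: look up the base, add the offset. */
static uint32_t toy_physical(const ToySegmentTable *t, uint32_t seg_addr)
{
    uint32_t seg    = (seg_addr >> 24) & 0x0Fu;
    uint32_t offset = seg_addr & 0x00FFFFFFu;
    return t->base[seg] + offset;
}
```

With segment 0 set to 0x0, as the note above recommends, an address whose segment field is 0 translates to itself, which is why that entry behaves as physical addressing.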

The RSP software can only access DMEM. All data must first be transferred into DMEM using DMA operations, which must be 64-bit aligned. Invocation of the DMA engine is handled by the RSP software, but the application programmer needs to be aware of the boundary requirements. Any data structure that is to be passed to the RSP must be aligned to a 64-bit boundary. The structures in gbi.h use C unions to guarantee this.
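
The union technique mentioned above can be sketched as follows. ToyMtx and the toy_ helpers are hypothetical names for this illustration; the approach relies on an ABI in which long long int is 8-byte aligned, which is an assumption about the target compiler rather than a guarantee of the C language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload: a 4x4 matrix of 32-bit values (64 bytes). */
typedef struct {
    int32_t m[4][4];
} ToyMtxData;

/* The extra union member is never read; it exists only so the compiler
 * gives the whole union the alignment of a 64-bit integer. */
typedef union {
    ToyMtxData    data;
    long long int force_structure_alignment;
} ToyMtx;

/* True if an address lies on a 64-bit (8-byte) boundary. */
static int toy_is_dma_aligned(uintptr_t addr)
{
    return (addr & 7u) == 0;
}

/* Round an address up to the next 64-bit boundary, e.g. when carving
 * DMA-able objects out of a raw byte pool. */
static uintptr_t toy_align_up(uintptr_t addr)
{
    return (addr + 7u) & ~(uintptr_t)7u;
}

/* Debug check an application might place before handing data to the RSP. */
static const void *toy_require_dma_aligned(const void *p)
{
    assert(toy_is_dma_aligned((uintptr_t)p));
    return p;
}
```

Statically declared objects of a union type like this are aligned by the compiler; memory obtained from a custom allocator would still need a check such as toy_is_dma_aligned().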

Since the DMA engine is shared between the R4300 and the RSP, the application program should also avoid unnecessary DMA activity while the RSP is running.

Interaction Between the RSP and R4300 Memory Caching
The most prevalent example of communication between the CPU and the RSP is that of the CPU creating a display list in DRAM for eventual interpretation by the RSP. The display list data is read from DRAM via a DMA mechanism. Unfortunately, DRAM locations may be “stale” with respect to newer data being held in the R4300’s data cache. The R4300 cache implements a “write-back” policy, which means that individual stores are not immediately written to DRAM. To update the memory contents with more recent cached data, the CPU must first write back the cached data to DRAM. Then, and only then, will the RSP be able to DMA the correct data for display list processing.

Conversely, the contents of memory may be more recent than cached data in some situations when the RSP modifies memory (an obvious example is updating the color frame buffer). In this case, the CPU’s cache may contain stale data and the CPU should invalidate the cached data to force an access directly to DRAM and get the most recent data. As a practical note, this second scenario only arises in advanced applications.
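
Both scenarios can be illustrated with a toy model of a write-back cache. This is a sketch only: ToyMem and the toy_ functions are hypothetical, the model tracks single words rather than real cache lines, and it is not how the R4300 cache is implemented. In a libultra application the corresponding operations would be requested with calls such as osWritebackDCache() and osInvalDCache().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WORDS 8u

/* Hypothetical model: DRAM plus a one-word-per-entry CPU data cache. */
typedef struct {
    uint32_t dram[TOY_WORDS];
    uint32_t cache[TOY_WORDS];
    uint8_t  valid[TOY_WORDS];
    uint8_t  dirty[TOY_WORDS];
} ToyMem;

/* CPU store: write-back policy, so only the cache is updated. */
static void toy_cpu_store(ToyMem *m, unsigned i, uint32_t v)
{
    assert(i < TOY_WORDS);
    m->cache[i] = v;
    m->valid[i] = 1;
    m->dirty[i] = 1;
}

/* CPU load: served from the cache when the entry is valid. */
static uint32_t toy_cpu_load(ToyMem *m, unsigned i)
{
    assert(i < TOY_WORDS);
    if (!m->valid[i]) {
        m->cache[i] = m->dram[i];
        m->valid[i] = 1;
        m->dirty[i] = 0;
    }
    return m->cache[i];
}

/* RSP DMA bypasses the CPU cache and touches DRAM directly. */
static uint32_t toy_rsp_dma_read(const ToyMem *m, unsigned i)
{
    assert(i < TOY_WORDS);
    return m->dram[i];
}

static void toy_rsp_dma_write(ToyMem *m, unsigned i, uint32_t v)
{
    assert(i < TOY_WORDS);
    m->dram[i] = v;
}

/* Write every dirty entry back to DRAM. */
static void toy_writeback(ToyMem *m)
{
    for (unsigned i = 0; i < TOY_WORDS; i++) {
        if (m->valid[i] && m->dirty[i]) {
            m->dram[i] = m->cache[i];
            m->dirty[i] = 0;
        }
    }
}

/* Discard all cached entries so the next load goes to DRAM. */
static void toy_invalidate(ToyMem *m)
{
    for (unsigned i = 0; i < TOY_WORDS; i++) {
        m->valid[i] = 0;
        m->dirty[i] = 0;
    }
}
```

In this model, a value stored by the CPU is invisible to toy_rsp_dma_read() until toy_writeback() runs, and a value written by toy_rsp_dma_write() is invisible to a CPU that has already cached the location until toy_invalidate() runs.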