Past Microcode

This section contains information about older versions of the microcode.

Note: Fast3D and Line3D microcode have migrated to F3DEX/L3DEX.

The following microcode is not currently supported:


gspFast3D
This microcode comprises the following six object files:

This is the optimized, high-quality, full-featured 3D polygonal geometry RSP microcode. It supports 3D clipping, lighting, texture coordinate generation, fog, and a matrix stack.

The gspFast3D, gspFast3D.dram, and gspFast3D.fifo versions of the microcode are equivalent to the gspF3DNoN, gspF3DNoN.dram, and gspF3DNoN.fifo versions (respectively), with the difference that near clipping is performed at the Near Clipping plane in the former 3 versions, and performed at the eyepoint in the latter 3 versions.

All computations are performed with as much precision as is practical, in order to create the highest quality images. (Please see "Note for Near Clipping" for additional details.)

GBI
The following GBI command is not supported by the microcode:

gSPLine3D

Unsupported GBI Macros
All gSPLine3D macros are compiled as no-ops, so they have no effect.

Performance
In order from the fastest to the slowest, the following types of triangles can be generated by this microcode:

Triangle attribute computation is heavily vectorized, so generation of Gouraud shading attributes is essentially free, if you are generating any other attributes.

Vertex transformations and lighting calculations are heavily vectorized, so it is best to operate on as many vertices as possible. Loads of an even number of vertices are more efficient because vertices are processed in groups of two.

When lighting is enabled and any vertices are clipped, the clipping and lighting code is swapped in as microcode overlays, using a most-recently-used algorithm. Lighting happens at vertex load time and clipping happens at triangle draw time, so this division of the microcode is acceptable. However, a display list that loads only a few vertices at a time and then draws a small number of triangles would not amortize the overlay-swapping overhead very effectively.

The RCP is designed to draw high quality textured primitives. Where possible, use texture-mapping to achieve visual complexity rather than additional geometry.

Notes on Different Versions
There are some differences when calling the DRAM and fifo versions of this microcode.

gspFast3D
The flags field of any task followed by this task should have OS_TASK_DP_WAIT set. If more than one task using this microcode is called in the same frame, then only the last task should contain a gDPFullSync in its display list. This microcode takes care of sending all output to the RDP. When using this microcode, it is not necessary to specify output_buff or output_buff_size. (These fields of the task header can be set to 0.)

gspFast3D.dram
Tasks using this microcode need to set the OS_TASK_DP_WAIT flag only if they follow a task using gspFast3D or gspFast3D.fifo. This microcode sends its data to a buffer in DRAM and not to the RDP. The CPU must then cause the buffer to be sent to the RDP. The buffer is pointed to by output_buff in the task header. It must be at least as big as the largest RDP display list that the task can generate. Remember that RDP lists expand when geometry is clipped, so leave extra room. If the buffer is not large enough to hold the entire RDP display list, other memory areas will be overwritten. After the RSP finishes processing, the buffer can be sent to the RDP using the osDpSetNextBuffer command. The length of the data in the buffer, which is needed for osDpSetNextBuffer, is written to the address specified by rdp_output_len in the task header.

While the display list is being sent to the RDP, the RSP can execute audio tasks or other DRAM microcode tasks that use a different output_buff. If the display list being sent does not contain gDPFullSync, further RDP display lists (written by RSP tasks to other buffers) can be sent with additional osDpSetNextBuffer commands. If the display list does contain gDPFullSync, do not send other RDP display lists with osDpSetNextBuffer, and do not start gspFast3D or gspFast3D.fifo tasks, until the display list has completed.

gspFast3D.fifo
A task that uses this microcode and is followed by a gspFast3D task or an osDpSetNextBuffer command needs to set the OS_TASK_DP_WAIT flag. This microcode manages the transfer of the display list to the RDP itself, using the buffer specified by output_buff in the task header. The buffer must be cache aligned. output_buff_size must point to the byte that follows the last byte of the buffer. The larger the buffer, the more smoothly the interface between the RSP and the RDP works. When several tasks in a frame use fifo microcode, only the last task in the frame should include gDPFullSync. When several consecutive tasks use fifo microcode, they can all share the same output_buff buffer. (Each task can use a different buffer, but it is more efficient to use one large buffer for all tasks.)
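The buffer requirements above can be sketched as follows. This is only an illustration: FifoTaskFields is a hypothetical stand-in for the two task-header fields, the 16-byte cache line and the 64 KB size are assumptions made for the sketch, and the alignment attribute is GCC-style syntax.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two task-header fields discussed
 * above; a real program fills in the SDK's task header instead. */
typedef struct {
    uint64_t *output_buff;      /* first byte of the FIFO buffer     */
    uint64_t *output_buff_size; /* byte following the buffer's end   */
} FifoTaskFields;

#define CACHE_LINE_BYTES 16          /* assumed data cache line size */
#define FIFO_BYTES       (64 * 1024) /* arbitrary size for the sketch */
#define FIFO_WORDS       (FIFO_BYTES / sizeof(uint64_t))

/* One buffer shared by every fifo task; the GCC-style attribute
 * requests cache-line alignment. */
static uint64_t fifo_buffer[FIFO_WORDS]
    __attribute__((aligned(CACHE_LINE_BYTES)));

/* Point a task's output fields at the shared FIFO buffer. Note that
 * output_buff_size is an end pointer, not a byte count. */
void fifo_task_set_output(FifoTaskFields *t)
{
    t->output_buff      = fifo_buffer;
    t->output_buff_size = fifo_buffer + FIFO_WORDS;
}
```

Calling fifo_task_set_output for each consecutive fifo task gives them all the same buffer, which is the more efficient arrangement described above.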

Note for Near Clipping
Near Clipping removes geometry that is either behind the viewer or between the viewer and the Near Clipping plane. In the real world, an object never disappears as it gets closer to the viewpoint, so this should not happen in an N64 program either. One way to avoid it is to place the near plane very close to the viewpoint. (When calling guPerspective, make the near value small.) However, this does not always work, because a smaller near/far ratio reduces the accuracy of Z and of texture mapping.

Another way to accomplish this is to use the gspF3DNoN microcode (or its DRAM or fifo version), which does not clip at the near plane. Objects behind the viewer are still clipped, and objects beyond the near plane are visible as usual, but objects between the near plane and the viewer are also visible. In this way, the near value can be increased without geometry disappearing between the viewpoint and the near plane.

Z-buffering does not function in the area between the viewpoint and the near plane. As a result, objects between the near plane and the viewer do not hide each other correctly. For example, in an asteroid-type game, when one asteroid comes closer to the viewpoint than the near plane, it is drawn correctly (and hides objects beyond the near plane). However, when two asteroids come closer than the near plane, they may not hide each other correctly.

Default RDP State
Whenever a graphics task is first started, some of the RDP state is initialized to default values. The rest of the state keeps its previous values. After a task resumes from a yield, the RDP state is restored to what it was at the time of the yield. The following are the RDP default settings:


gspLine3D
This is the optimized, high-quality, full-featured 3D line RSP microcode.

The gspLine3D microcode supports 3D clipping, a matrix stack, and Gouraud shading.

The .dram version writes its RDP display list output to a memory buffer (RDRAM) instead of sending it to the RDP.

All processing is accurate enough to create high quality images.

Each gSP1Triangle command draws the three edges of its triangle. Note that when two triangles are adjacent, the edge they share is drawn twice, which takes additional processing time. Because gSP1Triangle is inefficient in this microcode, the line microcode should be used only for debugging.

Line attribute computation is heavily vectorized, so generating Gouraud shading attributes is essentially free if any other attributes are being generated.

The default RDP state is the same as the default state described under gspFast3D.


gspSprite2D
This is the optimized, high-quality, full-featured 2D sprite geometry microcode. It automatically subdivides and loads images of any size, in all of the supported texture format sizes and types, and sends them directly to the RDP. Additionally, images can be scaled up or flipped in the X or Y direction.

Commands
The sprite microcode is accessed through the following functions/macros:

Simple Code for Displaying a Sprite

#include "gu.h"
#include "gbi.h"

uSprite MySprite;

/* guSprite2DInit takes a pointer to the sprite structure */
guSprite2DInit(&MySprite, ImagePointer,
    TlutPointer, ImageWidth,
    RectangleWidth, RectangleHeight,
    ImageType, ImageSize,
    TextureStartS, TextureStartT);

gSPSprite2DBase(glistp++,
    OS_K0_TO_PHYSICAL(&MySprite));

gSPSprite2DScaleFlip(glistp++, ScaleX, ScaleY,
    FlipTextureX, FlipTextureY);

gSPSprite2DDraw(glistp++, PScreenX, PScreenY);

typedef struct {
     void *SourceImagePointer;
     void *TlutPointer;
     short Stride;
     short SubImageWidth;
     short SubImageHeight;
     char  SourceImageType;
     char  SourceImageBitSize;
     short SourceImageOffsetS;
     short SourceImageOffsetT;
     /* 20 bytes for above */

     /* padding to bring structure size to
        64-bit alignment */
     char dummy[4];

} uSprite_t;

typedef union {
     uSprite_t  s;
     /* Ensure this is 64-bit aligned */
     long long int force_structure_alignment[3];
} uSprite;

void guSprite2DInit(uSprite *SpritePointer,
     void *SourceImagePointer,
     void *TlutPointer,
     int Stride,
     int SubImageWidth,
     int SubImageHeight,
     int SourceImageType,
     int SourceImageBitSize,
     int SourceImageOffsetS,
     int SourceImageOffsetT);

Arguments

GBI
The following GBI commands are not supported by this microcode:

Note Regarding Z-Buffering
The sprite microcode does not directly support Z-Buffering. This is unnecessary as Z-Buffering can be accomplished outside of the sprite microcode by setting up the proper rendering mode and making use of the hardware primitive depth registers. Following is a code fragment that does Z-Buffering.

gDPSetRenderMode(glistp++,
   G_RM_AA_ZB_OPA_SURF,
   G_RM_AA_ZB_OPA_SURF2);

gDPSetDepthSource(glistp++,
   G_ZS_PRIM);

gDPSetCombineMode(glistp++,
   G_CC_DECALRGB, G_CC_DECALRGB);

gDPSetPrimDepth(glistp++,
   ZBufferValue, 0);

guSprite2DInit(&MySprite, ImagePointer,
   TlutPointer, ImageWidth,
   RectangleWidth, RectangleHeight,
   ImageType, ImageSize,
   TextureStartS, TextureStartT);

gSPSprite2DBase(glistp++,
   OS_K0_TO_PHYSICAL(&MySprite));

gSPSprite2DScaleFlip(glistp++,
   ScaleX, ScaleY, 
   FlipTextureX, FlipTextureY); 

gSPSprite2DDraw(glistp++, PScreenX, PScreenY);

Warnings, Limitations, and Workarounds
Images that are scaled by a non-unit amount and flipped about the Y axis may not move smoothly in the vertical direction, depending on the subpixel position; at certain positions the image jumps. The workaround is to move non-unit-scaled images by whole-unit amounts in the vertical direction.

The sprite microcode was designed to scale images up by any amount. Images can also be scaled down, with some attendant artifacts. Note that although the TextureScaleX and TextureScaleY parameters are s5.10 fixed-point numbers, they are restricted to positive values. Consequently, the largest usable scale value is 32767, which corresponds to a texel-to-pixel ratio of 31.999. Texture images that are either scaled in the Y axis or placed on a subpixel scanline boundary require filtering by the hardware texture filter unit. This filtering requires at least one extra line of the image to be loaded into texture memory.
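The s5.10 arithmetic above can be sketched as a pair of conversion helpers. The function and macro names are hypothetical and are not part of the SDK, and clamping out-of-range input to the nearest positive value is a choice made for this sketch only.

```c
#include <stdint.h>

#define S5_10_ONE 1024   /* 1.0 in s5.10: ten fractional bits      */
#define S5_10_MAX 32767  /* largest positive value, about 31.999   */

/* Hypothetical helper: convert a texel-to-pixel ratio to s5.10,
 * rounding to nearest and clamping to the positive range. */
int16_t ratio_to_s5_10(double ratio)
{
    double fixed = ratio * S5_10_ONE + 0.5;

    if (fixed < 1.0)
        return 1;               /* smallest positive value */
    if (fixed > S5_10_MAX)
        return S5_10_MAX;       /* largest usable value    */
    return (int16_t)fixed;
}

/* Hypothetical helper: convert an s5.10 value back to a ratio. */
double s5_10_to_ratio(int16_t fixed)
{
    return (double)fixed / S5_10_ONE;
}
```

With these helpers, a ratio of 1.0 becomes 1024 and any ratio of 32.0 or more is clamped to 32767.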

The texture memory is limited to 4K bytes, so there are some restrictions:



gspTurbo3D
The gspTurbo3D microcode is a reduced-feature, reduced-precision microcode that delivers significantly faster performance.

All three subtypes (.o, .dram.o, and .fifo.o) are low-accuracy, simplified 3D polygon geometry RSP microcodes that work well for characters and objects that are always displayed near the center of the view area. All processing is done with reduced accuracy to increase speed, and this reduced accuracy is visible in the rendered objects.

The DRAM version writes its output (RDP display list) into a memory buffer instead of transferring it to the RDP.

The FIFO version transfers data to the RDP by using DRAM FIFO.

Features Not Supported by gspTurbo3D

Turbo Display List
The turbo display list is a linear list of object structures that ends with a NULL object (an object whose object state pointer is NULL).
#include "gt.h"

typedef struct {
  gtGlobState *gstatep; // global state, usually NULL

  gtState *statep;      // when NULL, object
                        // processing is finished

  Vtx *vtxp;            // when NULL, point in
                        // buffer is used

  gtTriN *trip;         // when NULL,
                        // nothing is drawn
} gtGfx_t;

typedef union {
  gtGfx_t obj;
  long long int force_structure_alignment;
} gtGfx;
Each object structure includes 4 pointers (global state, object state, vertex list, and triangle list) for a total of 16 bytes.

When the global state pointer or the vertex list pointer is NULL, the one currently in DMEM is used. When the triangle list pointer is NULL, no triangles are generated. When the object state pointer is NULL, the end of the display list is assumed.
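The termination rule can be sketched as follows. The type definitions are stand-ins so that the fragment compiles without gt.h, and turbo_list_end and turbo_list_count are hypothetical helpers, not SDK functions.

```c
#include <stddef.h>

/* Stand-in types so this sketch compiles without gt.h.
 * The real definitions come from the SDK header. */
typedef struct gtGlobState gtGlobState;
typedef struct { int placeholder; } gtState;
typedef struct Vtx Vtx;
typedef struct gtTriN gtTriN;

typedef struct {
    gtGlobState *gstatep; /* NULL: keep the global state in DMEM */
    gtState     *statep;  /* NULL: end of the display list       */
    Vtx         *vtxp;    /* NULL: keep the vertices in DMEM     */
    gtTriN      *trip;    /* NULL: draw nothing                  */
} gtGfx_t;

typedef union {
    gtGfx_t obj;
    long long int force_structure_alignment;
} gtGfx;

/* Hypothetical helper: write the terminating NULL object. */
void turbo_list_end(gtGfx *entry)
{
    entry->obj.gstatep = NULL;
    entry->obj.statep  = NULL;
    entry->obj.vtxp    = NULL;
    entry->obj.trip    = NULL;
}

/* Hypothetical helper: count the objects before the terminator,
 * using the same rule as above (a NULL object state pointer). */
size_t turbo_list_count(const gtGfx *list)
{
    size_t n = 0;

    while (list[n].obj.statep != NULL)
        n++;
    return n;
}
```

A list built this way always needs one more array element than the number of objects, to hold the terminator.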

Turbo Global State
Following is the turbo global state structure.

Because it is specific to this microcode, you must change the microcode when you change the structure.

#include "gt.h" 

typedef struct {
     u16 perspNorm; // normalization of perspective
     u16 pad0;
     u32 flag;
     Gfx rdpOthermode;
     u32 segBases[16]; // segment base address
     Vp viewport;  // view-port
     Gfx *rdpCmds; // RDP data block, processed when not
                   // NULL; ended by gDPEndDisplayList
} gtGlobState_t;

/* Note: Although there are 16 segment 
 * table entries, the first segment (segment 0) 
 * is reserved for physical memory mapping. 
 * Therefore, segment 0 cannot be used. */

typedef union {
     gtGlobState_t sp;
     long long int force_structure_alignment;
} gtGlobState;
The global state holds data that is unlikely to change and that is common to every object. The layout of the global state structure is exactly the same as its layout in DMEM, so the structure is simply copied to DMEM.

The perspNorm field is used while transforming a vertex (see gSPPerspNormalize).

The rdpOthermode field holds the DP SetOtherMode command, which is sent before any other DP commands.

The segBases array holds the 16 segment base addresses. Entry 0 is reserved for physical memory mapping, so it cannot be used.

The viewport is used while transforming a vertex.

The rdpCmds field points to a DP command block. When this pointer is not NULL, the commands in the DP command block are transferred to the RDP. The list of DP macros in the DP command block must end with the gDPEndDisplayList macro. Some DP macros (given later on this page) cannot be used in the DP command block.

Turbo Object State
The turbo object state structure is shown below.

This is the 'state' structure that is associated with each object to be rendered. It is specific to this microcode: if you change the structure, you must also change the gtoff.c tool and the microcode.

#include "gt.h"

typedef struct {
     u32 renderState;  // render state
     u32 textureState; // texture state
     u8 vtxCount; // number of vertices
     u8 vtxV0; // vertex load address
     u8 triCount; // number of triangles
     u8 flag;
     Gfx *rdpCmds;
     Gfx rdpOthermode;
     Mtx transform; // transformation matrix
} gtState_t;

typedef union {
     gtState_t sp;
     long long int force_structure_alignment;
} gtState;

// gtStateL: same as gtState, but without
// the matrix (see flag). Its fields must
// match the corresponding fields of gtState.

typedef struct {
   u32 renderState; // render state
   u32 textureState; // texture state
   u8 vtxCount; // number of vertices
   u8 vtxV0; // vertex load address
   u8 triCount; // number of triangles
   u8 flag;
   Gfx *rdpCmds; // pointer for RDP DL
                   // (segment address)
   Gfx rdpOthermode;
} gtStateL_t;

typedef union {
   gtStateL_t sp;
   long long int force_structure_alignment;
} gtStateL;
The gtStateL version of the state structure can be used when a new matrix is not necessary. This is useful for large objects that need to be split among several turbo objects, because the same transformation matrix can be used for all of the parts. You must set the GT_FLAG_NOMTX flag when using the gtStateL version of the state structure.

The renderState field is similar to the geometry mode in gbi.h. It uses the following flags, which are bitwise OR'd together:

The textureState field holds a texture tile number in its lower three bits. All primitives in an object are drawn using the same tile.
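As a minimal sketch of the tile-number layout, the lower three bits of textureState can be written and read with a mask. The helper names and TILE_MASK are hypothetical; only the three-bit position comes from the description above, and the meaning of the remaining bits is not assumed here.

```c
#include <stdint.h>

#define TILE_MASK 0x7u  /* lower three bits hold the tile number */

/* Hypothetical helper: store a tile number (0-7) in textureState
 * while leaving the upper bits unchanged. */
uint32_t texture_state_set_tile(uint32_t textureState, unsigned tile)
{
    return (textureState & ~TILE_MASK) | (tile & TILE_MASK);
}

/* Hypothetical helper: read the tile number back. */
unsigned texture_state_get_tile(uint32_t textureState)
{
    return (unsigned)(textureState & TILE_MASK);
}
```

Because the tile number is masked to three bits, a value outside 0 to 7 wraps rather than disturbing the upper bits.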

Turbo Vertex
The vertex list is an aggregation of vertex structures. It uses the same format as the vertex format in gbi.h. Please see gSPVertex for details.

The vertex cache in the turbo microcode can hold 64 vertices. Vertices are transformed when they are loaded.

Turbo Triangle List
The triangle list is an aggregation of the following structure.

The following structure represents a single triangle, which is one element of the list of triangles to be rendered for an object. The triangle list must be aligned to an 8-byte boundary. Because this structure is only 4 bytes, it is assumed to be an element of an array, and the array is assumed to be laid out in 8-byte units.

#include "gt.h"

typedef struct {
   u8 v0, v1, v2, flag; // flag for flat shading
} gtTriN;
This array must be aligned to an 8-byte boundary.
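One way to meet the alignment requirement is sketched below. It reuses the union idiom that the structures on this page use, and it relies on long long int being 8-byte aligned on the target. The gtTriN definition is a stand-in for the one in gt.h, and the other names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the 4-byte structure defined in gt.h. */
typedef struct {
    uint8_t v0, v1, v2, flag;
} gtTriN;

#define TRI_COUNT 3                      /* triangles actually drawn */
#define TRI_SLOTS ((TRI_COUNT + 1) & ~1) /* rounded up to even, so   */
                                         /* the array fills 8-byte   */
                                         /* units                    */

/* The long long int member forces 8-byte alignment of the array. */
static union {
    gtTriN tris[TRI_SLOTS];
    long long int force_structure_alignment;
} example_tris = {
    {
        { 0, 1, 2, 0 },
        { 0, 2, 3, 0 },
        { 4, 5, 6, 0 }
        /* the unused fourth slot is zero-filled padding */
    }
};
```

The padding slot is never drawn as long as the triangle count given to the microcode is TRI_COUNT rather than TRI_SLOTS.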

GBI DL Command
The turbo microcode uses a completely different display list format, so the GBI DL command is not supported.

However, the DP command blocks of the global state and the object state are supported. These commands use the same format (and the same microcode) as the ones in gbi.h. Some DP commands are not supported because their DP state operations do not suit the interface between turbo geometry processing and turbo display list processing.

Unsupported DP GBI Macros
The turbo microcodes do not support the following DP GBI commands:

Most of these can be set by using the gtStateSetOthermode interface.

Performance
This microcode generates the following triangle types in order of speed, beginning with the fastest:

Triangle attribute computation is heavily vectorized, so generating Gouraud shading attributes is essentially free if any other attributes are being generated.

Z-buffering a triangle requires some additional processing.

Vertex transformation is heavily vectorized, so it is best to operate on as many vertices as possible. Loading vertices in multiples of four is the most efficient.

The RCP is designed to draw high-quality textured primitives. Where possible, use texture mapping rather than additional geometry to achieve visual complexity.

Caution
This is the first release of this microcode. Its functions and display list format are subject to change in the future.

Cracks and tears sometimes appear because the calculation of the edge slope is simplified.

Copyright © 1999
Nintendo of America Inc. All Rights Reserved
Nintendo and N64 are registered trademarks of Nintendo
Last Updated January, 1999