Elysian Shadows

DMA Rendering and Store Queues with KOS on the Sega Dreamcast

Here's how I initialize the PVR for DMA rendering:

#include <kos.h>

// Vertex buffer sizes
#define GLOBAL_VBUFSIZE  (1024 * 1024)
#define OP_POLY_VBUFSIZE (1024 * 1024)
#define OP_MOD_VBUFSIZE  (256 * 1024)
#define TR_POLY_VBUFSIZE (512 * 1024)
#define TR_MOD_VBUFSIZE  (256 * 1024)
#define PT_POLY_VBUFSIZE (512 * 1024)

// List buffers (32-byte aligned, one per PVR list)
static uint8 op_poly_buf[OP_POLY_VBUFSIZE] __attribute__((aligned(32)));
static uint8 op_mod_buf[OP_MOD_VBUFSIZE]   __attribute__((aligned(32)));
static uint8 tr_poly_buf[TR_POLY_VBUFSIZE] __attribute__((aligned(32)));
static uint8 tr_mod_buf[TR_MOD_VBUFSIZE]   __attribute__((aligned(32)));
static uint8 pt_poly_buf[PT_POLY_VBUFSIZE] __attribute__((aligned(32)));

int init_pvr() {
  // KOS sets up a 640x480 video mode for us automatically so we don't
  // bother with that. Instead, initialize the pvr.
  // Zero the struct first so any field not set below starts at 0.
  pvr_init_params_t pvr_params = { 0 };
  pvr_params.vertex_buf_size = GLOBAL_VBUFSIZE;
  pvr_params.dma_enabled = 1;
  pvr_params.fsaa_enabled = 0;
  pvr_params.opb_sizes[PVR_LIST_OP_POLY] = PVR_BINSIZE_32;
  pvr_params.opb_sizes[PVR_LIST_OP_MOD]  = PVR_BINSIZE_8;
  pvr_params.opb_sizes[PVR_LIST_TR_POLY] = PVR_BINSIZE_16;
  pvr_params.opb_sizes[PVR_LIST_TR_MOD]  = PVR_BINSIZE_8;
  pvr_params.opb_sizes[PVR_LIST_PT_POLY] = PVR_BINSIZE_16;

  if ( pvr_init( &pvr_params ) < 0 )
    return -1;

  // Set up pointers to list buffers
  pvr_set_vertbuf( PVR_LIST_OP_POLY, op_poly_buf, OP_POLY_VBUFSIZE );
  pvr_set_vertbuf( PVR_LIST_OP_MOD,  op_mod_buf,  OP_MOD_VBUFSIZE  );
  pvr_set_vertbuf( PVR_LIST_TR_POLY, tr_poly_buf, TR_POLY_VBUFSIZE );
  pvr_set_vertbuf( PVR_LIST_TR_MOD,  tr_mod_buf,  TR_MOD_VBUFSIZE  );
  pvr_set_vertbuf( PVR_LIST_PT_POLY, pt_poly_buf, PT_POLY_VBUFSIZE );

  pvr_set_pal_format( PVR_PAL_ARGB8888 );
  return 0;
}

With that set up, we have two new KOS functions to play with.

void * pvr_vertbuf_tail(pvr_list_t list);

Returns a pointer to the tail of the given list.

void pvr_vertbuf_written(pvr_list_t list, uint32 amt);

Tells KOS we've written some data to the given list.

Together, those two let us write to the vertex buffers manually, which is what we want.
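The contract between pvr_vertbuf_tail and pvr_vertbuf_written can be tried out on a host machine. The following is a minimal sketch of that contract, not KOS's implementation: mock_list_t, mock_vertbuf_tail, mock_vertbuf_written and mock_append32 are hypothetical names invented for illustration, and the checks on size and overrun are assumptions of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for one PVR list buffer. */
typedef struct {
    uint8_t *base;  /* start of the buffer handed to the list       */
    size_t   size;  /* capacity in bytes                            */
    size_t   used;  /* bytes reported as written so far this frame  */
} mock_list_t;

/* Models pvr_vertbuf_tail(): where the next write should go. */
static void *mock_vertbuf_tail(mock_list_t *l) {
    return l->base + l->used;
}

/* Models pvr_vertbuf_written(): advance the tail by amt bytes. */
static void mock_vertbuf_written(mock_list_t *l, size_t amt) {
    assert(amt % 32 == 0);             /* this model assumes whole 32-byte blocks */
    assert(l->used + amt <= l->size);  /* and refuses to overrun the buffer       */
    l->used += amt;
}

/* Append one 32-byte block: ask for the tail, write, then report it. */
static void mock_append32(mock_list_t *l, const uint32_t words[8]) {
    memcpy(mock_vertbuf_tail(l), words, 32);
    mock_vertbuf_written(l, 32);
}
```

The point of the pattern is that the tail never moves on its own: data written past it only becomes part of the list once the byte count is reported.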

I've written a small API for transferring strip headers and verts directly to the list buffers using store queues. It is used like this:

pvrBeginScene();
pvrBeginPrimitives( list );
pvrSendStripHeader( &header );
pvrSendVertexXX( ... );
pvrSendVertexXX( ... );
pvrSendVertexXX( ... );
pvrEndPrimitives();
pvrEndScene();

pvrBeginScene and pvrEndScene are the same as they've always been and should be self-explanatory:

void pvrBeginScene( void ) {
  vid_border_color(255, 0, 0);
  pvr_wait_ready();
  vid_border_color(0, 255, 0);
  pvr_scene_begin();
}

void pvrEndScene( void ) {
  vid_border_color(0, 0, 255);
  pvr_scene_finish();
}

pvrBeginPrimitives sets up the store queues for burst transfer to the list buffers.

_start_ptr points to the list buffer tail at the time pvrBeginPrimitives was called.

_current_ptr points to the list buffer tail we're currently writing to and increments with each write.

pvrEndPrimitives uses the difference between the two pointers to determine how much was written and reports this to KOS.

static pvr_list_t _current_list = PVR_LIST_OP_POLY;
static uint32* _start_ptr = NULL;
static uint32* _current_ptr = NULL;

void pvrBeginPrimitives( pvr_list_t list ) {
  _current_list = list;
  uint32* ptr = (uint32*)pvr_vertbuf_tail( _current_list );

  // Point both store queues at the memory area containing the list tail
  QACR0 = ((((uint32)ptr) >> 26) << 2) & 0x1c;
  QACR1 = ((((uint32)ptr) >> 26) << 2) & 0x1c;

  // Store queue address that maps onto the list tail
  _start_ptr = _current_ptr = (uint32 *) (0xe0000000 | (((uint32)ptr) & 0x03ffffe0));
}

void pvrEndPrimitives( void ) {
  uint32 bytes = (uint32)_current_ptr - (uint32)_start_ptr;

  if ( bytes != 0 )
    pvr_vertbuf_written( _current_list, bytes );
}
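The address arithmetic in pvrBeginPrimitives and pvrEndPrimitives is plain integer math, so it can be checked off the console. Here is a small sketch with the same expressions pulled out into functions. The function names are made up for this example, and 0x8c100040 is only a sample main RAM address, not something the code above produces. sq_burst_target reflects my understanding of how the SH4 rebuilds the destination: bits 4..2 of QACR become address bits 28..26, and bits 25..5 come from the store queue address.

```c
#include <assert.h>
#include <stdint.h>

/* Value for QACR0/QACR1: bits 28..26 of the destination, moved to bits 4..2. */
static uint32_t sq_qacr_for(uint32_t dest) {
    return ((dest >> 26) << 2) & 0x1c;
}

/* Store queue address that stands in for dest. */
static uint32_t sq_alias_for(uint32_t dest) {
    /* The mask would silently round an unaligned tail down, so check it here. */
    assert((dest & 31u) == 0);
    return 0xe0000000u | (dest & 0x03ffffe0u);
}

/* Physical address a burst lands at, rebuilt from QACR and the alias. */
static uint32_t sq_burst_target(uint32_t qacr, uint32_t alias) {
    return ((qacr & 0x1cu) << 24) | (alias & 0x03ffffe0u);
}

/* Byte count as computed in pvrEndPrimitives. */
static uint32_t sq_bytes_written(uint32_t start_alias, uint32_t current_alias) {
    return current_alias - start_alias;
}
```

For the sample address 0x8c100040, sq_qacr_for gives 0x0c and sq_alias_for gives 0xe0100040, and feeding both into sq_burst_target gives back 0x0c100040, the physical address of the original pointer. Three 32-byte vertices move the alias forward by 96 bytes, which is the amount pvrEndPrimitives would report.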

To actually transfer something, you grab _current_ptr and write your data to the location it's pointing to. For every 32 bytes you write, you have to issue a store queue burst transfer. This is done by performing a prefetch on the store queue address you just wrote to.

#define PREFETCH(addr) __asm__ __volatile__("pref @%0" : : "r" (addr))
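To make the "write 32 bytes, then burst" rhythm concrete, here is a rough host-side model. It is a sketch under assumptions, not a description of the hardware: sq_model_t and its functions are hypothetical, sq_model_burst stands in for PREFETCH, and the real SH4 has two store queues where this model uses a single staging buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQ_WORDS 8  /* one store queue holds 8 x 32-bit words = 32 bytes */

/* Hypothetical model: a staging buffer plus the place the next burst lands. */
typedef struct {
    uint32_t  queue[SQ_WORDS];  /* stands in for the store queue         */
    uint32_t *dest;             /* destination of the next burst         */
    unsigned  filled;           /* words staged since the previous burst */
} sq_model_t;

/* Stage one word. Nothing reaches the destination yet. */
static void sq_model_write(sq_model_t *sq, uint32_t word) {
    assert(sq->filled < SQ_WORDS);
    sq->queue[sq->filled++] = word;
}

/* Stands in for PREFETCH(): send the 32 staged bytes out in one go. */
static void sq_model_burst(sq_model_t *sq) {
    assert(sq->filled == SQ_WORDS);  /* this model only bursts full queues */
    memcpy(sq->dest, sq->queue, sizeof sq->queue);
    sq->dest += SQ_WORDS;
    sq->filled = 0;
}
```

In this model, writing eight words changes nothing at the destination until sq_model_burst runs, which mirrors why every 32 bytes written through _current_ptr need their own prefetch.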

As for submitting strip headers... It's tightly coupled with my strip header library, and that's a whole different issue. You could just transfer the headers KOS creates for you, but you'd be better off doing it yourself (or using my lib).

There are 18 different vertex formats supported by the DC, and posting them all would probably result in way too much code. So I'll show the two you're most used to. 03 is the standard pvr_vertex_t structure you've seen in KOS and 16 is for textured sprites.

void pvrSendVertex03( uint32 flags, float x, float y, float z, float u, float v, uint32 argb, uint32 oargb ) {
  register uint32* _ptr = _current_ptr;

  // Fill the 32-byte vertex from the last word down to the first
  _ptr += 8;
  *--_ptr = oargb;
  *--_ptr = argb;
  *(float*)--_ptr = v;
  *(float*)--_ptr = u;
  *(float*)--_ptr = z;
  *(float*)--_ptr = y;
  *(float*)--_ptr = x;
  *--_ptr = flags;
  PREFETCH((void *)_ptr);

  _ptr += 8;
  _current_ptr = _ptr;
}

void pvrSendVertex16( uint32 flags, float ax, float ay, float az, float bx, float by, float bz, float cx, float cy, float cz, float dx, float dy, uint32 auv, uint32 buv, uint32 cuv ) {
  register uint32* _ptr = _current_ptr;

  // Upper 32 bytes of the 64-byte sprite vertex
  _ptr += 16;
  *--_ptr = cuv;
  *--_ptr = buv;
  *--_ptr = auv;
  *--_ptr = 0;
  *(float*)--_ptr = dy;
  *(float*)--_ptr = dx;
  *(float*)--_ptr = cz;
  *(float*)--_ptr = cy;
  PREFETCH((void *)_ptr);

  // Lower 32 bytes
  *(float*)--_ptr = cx;
  *(float*)--_ptr = bz;
  *(float*)--_ptr = by;
  *(float*)--_ptr = bx;
  *(float*)--_ptr = az;
  *(float*)--_ptr = ay;
  *(float*)--_ptr = ax;
  *--_ptr = flags;
  PREFETCH((void *)_ptr);

  _ptr += 16;
  _current_ptr = _ptr;
}

The reason for the decrement writes is that the SH4 has a pre-decrement store addressing mode (mov.l Rm, @-Rn), so each write is done in one instruction. It has no post-increment store, so increment writes would result in two.
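If you want to convince yourself that writing back to front still produces the usual layout, the packing can be tested on a PC. This is a sketch only: pack_vertex03 and f2u are hypothetical helpers, the destination is an ordinary array instead of the store queue area, and floats are copied with memcpy to keep the example portable (it assumes 32-bit IEEE 754 floats).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as a 32-bit word without aliasing trouble. */
static uint32_t f2u(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Fill one 32-byte vertex at dst, last word first, and return the
   address of the next vertex slot. */
static uint32_t *pack_vertex03(uint32_t *dst, uint32_t flags,
                               float x, float y, float z,
                               float u, float v,
                               uint32_t argb, uint32_t oargb) {
    uint32_t *p = dst + 8;
    *--p = oargb;
    *--p = argb;
    *--p = f2u(v);
    *--p = f2u(u);
    *--p = f2u(z);
    *--p = f2u(y);
    *--p = f2u(x);
    *--p = flags;
    assert(p == dst);  /* eight decrements bring us back to the start */
    return dst + 8;
}
```

After one call, word 0 holds the flags, words 1 to 3 hold x, y and z, words 4 and 5 hold u and v, and words 6 and 7 hold the two colors, which is the same order as the fields in the parameter list.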

Last time I tested the speed of this, I had those functions as inline class member functions, and they turned out to be slightly faster than the asm code I wrote in the past (probably due to the inlining). I know SEGA themselves used macros, and that may be the way to go if you're going to do this in C (like I've done). To sum it up, it's probably the same speed as the asm stuff you've used, but it's a lot more flexible.

Questions?


Falco Girgis
Falco Girgis is the founder and lead software architect of the Elysian Shadows project. He was previously employed in the telecom industry before taking a chance on Kickstarter and quitting his job to live the dream. He is currently pursuing his masters in Computer Engineering with a focus on GPU architecture.