Elysian Shadows

Rect API, Triangle Strip API, and Tech Demo

Upon graduation, I set out on a sidequest to rewrite libGyro. I had learned a lot from my own low-level academic programming that I thought could greatly benefit the library, and I was completely burned out on academic software projects and saw this as a nice "healing" project. Today I stand before you with the BASIC libGyro2.0 library completed, and I'm ready to get the show back on the road with Elysian Shadows. I certainly couldn't have done it without my two low-level compadres in the dark driver-level brotherhood, MDK and Tvspelsfreak.

[b]Finalized Video API[/b]

The largest, most complex, and most important aspect of LibGyro2.0's design has been the video API. It has undergone many, MANY iterations, but I think I've found one that is fucking bulletproof. The agonizing part has been arriving at a design that

1) lends itself extremely well to hardware acceleration without favoring one platform too much

2) can easily be reimplemented on additional platforms

3) offers a PRETTY, easy-to-use C API for use in an existing C or C++ codebase

Functionality of the video API is split into two basic groups:

[b]Rectangle API[/b]

The rectangle API is designed to offer a quick (optimized) method of rendering rectangles. This is most useful in 2D games (almost all of our primitives in ES will use it), and it also handles sprite-sheet rendering for the user. The API works similarly to OpenGL's immediate mode, but it's far more compact and efficient (since you aren't submitting one vertex property at a time).

    gyVidTexBind(tex);
    gyVidAlphaLevel(GY_ALPHA_FULL);
    gyVidRectListBegin(&color);
    gyVidRect(spriteFrame1);
    gyVidRect(spriteFrame2);
    gyVidRect(spriteFrame3);
    gyVidRect(spriteFrame4);
    gyVidRectListEnd();

gyVidTexBind() and gyVidAlphaLevel() are two global state functions that affect both the Rect API and the Triangle Strip API. For rectangles, you begin a list by specifying a color, then render rects by specifying a frame from the sheet. Finally, you end with gyVidRectListEnd(). Texture coordinates, vertices, and sprite sheets are all handled internally with this API. Setting the texture to NULL renders untextured polygons.
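To give a feel for what "handled internally" might involve, here is a minimal sketch of turning a frame index into texture coordinates. It is not libGyro's actual code: the type and function names are hypothetical, and it assumes frames are packed row-major (left to right, top to bottom) in a sheet whose size is a whole multiple of the frame size.

```c
#include <assert.h>

/* Hypothetical types for illustration only; not libGyro's definitions. */
typedef struct { float u, v; } TexCoord;
typedef struct { TexCoord topLeft, botRight; } TexCoordFrame;

/* Compute normalized texture coordinates for one frame of a sprite sheet.
 * Assumes frames are laid out row-major: left to right, then top to bottom. */
static TexCoordFrame frame_tex_coords(unsigned sheetW, unsigned sheetH,
                                      unsigned frameW, unsigned frameH,
                                      unsigned frame)
{
    unsigned columns, rows, col, row;
    TexCoordFrame out;

    assert(frameW > 0 && frameH > 0 && frameW <= sheetW && frameH <= sheetH);
    columns = sheetW / frameW;
    rows    = sheetH / frameH;
    assert(frame < columns * rows);

    col = frame % columns;  /* horizontal position of the frame in the sheet */
    row = frame / columns;  /* vertical position of the frame in the sheet */

    out.topLeft.u  = (float)(col * frameW) / (float)sheetW;
    out.topLeft.v  = (float)(row * frameH) / (float)sheetH;
    out.botRight.u = (float)((col + 1) * frameW) / (float)sheetW;
    out.botRight.v = (float)((row + 1) * frameH) / (float)sheetH;
    return out;
}
```

With a 256x256 sheet of 32x32 frames, frame 3 comes out as the rectangle from (0.375, 0.0) to (0.5, 0.125). Computing this table once per texture would let each rect call do nothing more than a lookup.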

This is going to be the preferred rendering method for almost all 2D applications for both textured and untextured quads. It's fast and easy.

From a low-level Dreamcast standpoint, only one header is submitted when you begin the list; the gyVidRect() calls then simply commit vertices to the store queues. gyVidRectListEnd() then reports the bytes written to KOS. That's WAY less reested than how I was handling it previously, eh Tvspelsfreak? :)

[b]Triangle Strip API[/b]

The triangle strip API is slightly more complex than the rectangle API, but offers a much higher level of control (and also supports 3D rendering). It is modeled completely after OpenGL's vertex array API, so 1) it's familiar and 2) there's minimal translation to OpenGL. Vertices, colors, and texture coordinates are all specified as pointers to arrays. 

    gyVidColorArray(colorPtr);
    gyVidTexCoordArray(texCoordPtr);
    gyVidTriangleStrip(vertexPtr, startIndex, vertexAmount);

The color and texture coordinate arrays are specified before rendering (as they can be omitted). When you're ready to render, the vertex array is specified with starting index and vertex amount. This is also affected by the transparency and texture state. I plan to later add support for an "index array" that is functionally equivalent to OpenGL's. 
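For anyone who hasn't worked with strips before, the sketch below shows how a start index and vertex count expand into individual triangles, following OpenGL's GL_TRIANGLE_STRIP convention. The names are hypothetical and this is only an illustration of the primitive, not part of libGyro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: three vertex indices forming one triangle. */
typedef struct { unsigned a, b, c; } Triangle;

/* Expand a strip of vertexCount vertices beginning at startIndex into
 * individual triangles. Every vertex after the second adds one triangle.
 * Odd triangles swap their first two indices so all triangles keep the
 * same winding. Returns the number of triangles written. */
static size_t strip_to_triangles(unsigned startIndex, unsigned vertexCount,
                                 Triangle *out, size_t outCapacity)
{
    size_t count, t;

    if (vertexCount < 3)
        return 0;

    count = (size_t)vertexCount - 2;
    assert(out != NULL && outCapacity >= count);

    for (t = 0; t < count; ++t) {
        unsigned base = startIndex + (unsigned)t;
        if (t % 2 == 0) {
            out[t].a = base;
            out[t].b = base + 1;
        } else {
            out[t].a = base + 1;
            out[t].b = base;
        }
        out[t].c = base + 2;
    }
    return count;
}
```

A four-vertex strip therefore yields two triangles, (0, 1, 2) and (2, 1, 3), which is why a quad costs four vertices as a strip rather than six as separate triangles.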

Anyway, this is a little more difficult to use, but it offers complete control, is still very efficient, and is fully suited to 3D rendering.

[b]Dreamcast Implementation[/b]

The Dreamcast implementation required the most work, but in the end I am most proud of it (and it is the most badass). On the GLAL platforms, the libGyro video driver is in many ways just calling into the OpenGL video driver. In the case of the Dreamcast, we wrote the driver from scratch (all the way down to the hardware level). Using Tvspelsfreak's SH library, the store queues, and DMA rendering, we've essentially been able to create a vertex-array-based API on the Dreamcast that closely mimics OpenGL's for triangle strips.

    void gyVidTriangleStrip(const GYVector4 *const vertexArray, const unsigned int startIndex, const unsigned int amount) {
        stripheader_t header;
        unsigned int i;
        register uint32_t *ptr = (uint32_t *)pvr_vertbuf_tail(list);

        //Initialize SH4 store-queues to bypass cache
        QACR0 = ((((uint32_t)ptr) >> 26) << 2) & 0x1c;
        QACR1 = ((((uint32_t)ptr) >> 26) << 2) & 0x1c;
        _start_ptr = _current_ptr = (uint32_t *) (0xe0000000 | (((uint32_t)ptr) & 0x03ffffe0));
        ptr = _current_ptr;

        assert(amount <= GYRO_WORLD_VERT_BUFSIZE); //that would suck for you...

        //transform local vertices by internal SH4 matrix accumulator
        //(results land in _worldVert[0] through _worldVert[amount-1])
        gyMatMultVec(vertexArray+startIndex, _worldVert, amount);

        //Commit polygon strip header
        shInit(&header, (_shTex)? 3 : 0, _list, _shTex, NULL);
        ptr += shCommit(&header, ptr);

        //Commit vertices
        //branch avoidance optimization here? Copy everything into one buffer with no branches?
        //Is serial iteration here reested?
        for(i = 0; i < amount; ++i) {
            //color/texcoord arrays are indexed from startIndex, _worldVert from 0
            const unsigned int src = startIndex + i;

            ptr += 8; //32-byte vertices, filled back to front
            *--ptr = 0; //offset color
            *--ptr = (_colorArray)? PVR_PACK_COLOR(_colorArray[src].a, _colorArray[src].r, _colorArray[src].g, _colorArray[src].b) : 0xffffffff; //default to opaque white
            if(_texCoordArray) {
                //v follows u in the vertex, so it is written first when going backwards
                *(float*)--ptr = _texCoordArray[src].v;
                *(float*)--ptr = _texCoordArray[src].u;
            }
            else {
                ptr -= 2;
            }
            *(float*)--ptr = _worldVert[i].z;
            *(float*)--ptr = _worldVert[i].y;
            *(float*)--ptr = _worldVert[i].x;
            *--ptr = (i == amount - 1)? PVR_CMD_VERTEX_EOL : PVR_CMD_VERTEX; //flag last strip vertex
            PREFETCH((void *)ptr);
            ptr += 8;
        }
        _current_ptr = ptr;

        //report number of bytes written to KOS
        uint32_t bytes = (uint32_t)_current_ptr - (uint32_t)_start_ptr;
        if(bytes != 0) pvr_vertbuf_written(list, bytes);

        //arrays must be respecified before the next strip
        _texCoordArray = NULL;
        _colorArray = NULL;
    }
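The store-queue setup at the top of that function is just address bit-twiddling, and it can be tried out on any machine. The sketch below pulls the two expressions into standalone helpers with hypothetical names. It relies on the SH4 convention that the store queues are mapped at 0xE0000000 through 0xE3FFFFFF and that bits 4:2 of QACR0/QACR1 supply bits 28:26 of the external destination address; treat it as an illustration rather than driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Value to program into QACR0/QACR1 for a destination address:
 * external address bits 28:26 are placed in register bits 4:2. */
static uint32_t sq_qacr_value(uint32_t dest)
{
    return ((dest >> 26) << 2) & 0x1cu;
}

/* Address inside the store-queue window that corresponds to dest.
 * Bits 25:5 are kept; the low 5 bits are dropped because each
 * store queue flushes one 32-byte block at a time. */
static uint32_t sq_write_address(uint32_t dest)
{
    return 0xe0000000u | (dest & 0x03ffffe0u);
}

/* Inverse mapping: rebuild the 32-byte-aligned external address
 * from a QACR value and a store-queue window address. */
static uint32_t sq_external_address(uint32_t qacr, uint32_t sqAddr)
{
    assert((sqAddr & 0xfc000000u) == 0xe0000000u); /* must lie in the SQ window */
    return ((qacr & 0x1cu) << 24) | (sqAddr & 0x03ffffe0u);
}
```

For a destination of 0x10000000 this gives a QACR value of 0x10 and a store-queue address of 0xE0000000, and the inverse helper maps that pair back to 0x10000000.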

The rectangle API, on the other hand, was designed primarily with the Dreamcast in mind. It uses optimized "hardware sprite" primitives under the hood. Creating a quad from a triangle strip requires four 32-byte vertices (128 bytes). Creating a "sprite" on the Dreamcast takes a single 64-byte vertex, half the data. So the rectangle API has an optimized implementation on the DC.

    void gyVidRect(const unsigned int frame) {
        static GYVector4 localVert[4] = {
            {-0.5f, -0.5f, 1.0f, 1.0f},
            {0.5f, -0.5f, 1.0f, 1.0f},
            {0.5f, 0.5f, 1.0f, 1.0f},
            {-0.5f, 0.5f, 1.0f, 1.0f}
        };
        register uint32_t *ptr = _current_ptr;
        GYTexCoordFrame *texCoordFrame = (_texCoordSheet)? &_texCoordSheet->texCoord[frame] : NULL;

        //transform local vertices by internal SH4 matrix accumulator
        gyMatMultVec(localVert, _worldVert, 4);

        //commit hardware sprite vertex
        ptr += 16; //64-byte vertex
        if(texCoordFrame) {
            *--ptr = PVR_PACK_16BIT_UV(texCoordFrame->botRight.u, texCoordFrame->botRight.v); //prepacking these bitches is
            *--ptr = PVR_PACK_16BIT_UV(texCoordFrame->botRight.u, texCoordFrame->topLeft.v); //a potential optimization
            *--ptr = PVR_PACK_16BIT_UV(texCoordFrame->topLeft.u, texCoordFrame->topLeft.v);
        }
        else ptr -= 3;
        *--ptr = 0;
        *(float*)--ptr = _worldVert[3].y;
        *(float*)--ptr = _worldVert[3].x;
        *(float*)--ptr = _worldVert[2].z;
        *(float*)--ptr = _worldVert[2].y;
        PREFETCH((void*)ptr);
        *(float*)--ptr = _worldVert[2].x;
        *(float*)--ptr = _worldVert[1].z;
        *(float*)--ptr = _worldVert[1].y;
        *(float*)--ptr = _worldVert[1].x;
        *(float*)--ptr = _worldVert[0].z;
        *(float*)--ptr = _worldVert[0].y;
        *(float*)--ptr = _worldVert[0].x;
        *--ptr = PVR_CMD_VERTEX_EOL;
        PREFETCH((void*)ptr);
        ptr += 16;
        _current_ptr = ptr;
    }
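Part of why the sprite vertex fits in 64 bytes is that each UV pair is squeezed into a single 32-bit word. The sketch below shows the idea as I understand KOS's PVR_PACK_16BIT_UV macro to work: keep the top 16 bits of each float, with u in the high half and v in the low half. The function names are hypothetical, and the exact bit layout should be checked against the KOS headers before relying on it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as an integer without aliasing trouble. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Pack two texture coordinates into one word by truncating each
 * 32-bit float to its upper 16 bits (sign, exponent, 7 mantissa bits).
 * Assumed layout: u in bits 31:16, v in bits 15:0. */
static uint32_t pack_16bit_uv(float u, float v)
{
    return (float_bits(u) & 0xffff0000u) | (float_bits(v) >> 16);
}
```

Packing (1.0f, 0.5f) gives 0x3F803F00. The truncation throws away mantissa precision, which is fine for coordinates like 0.125 but is worth remembering for very large sheets with tiny frames.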

libGyroDC also uses Tvspelsfreak's "strip header library" to submit vertex headers to the Dreamcast. For those of you who didn't know, it's a very impressive little library (whose code looks like nothing but a clusterfuck of bit manipulation) capable of painlessly producing headers for just about every vertex type the DC has to offer.

Debugging in Tvspelsfreak's library is also channeled through libGyro's debugging system (which will also be channeled through the engine). 

    //Debug callback supplied to Tvspelsfreak's SH library
    static void shErrorCallback(SHERROR error, const char *fname) {
        _gyLog(GY_DEBUG_CRITICAL, "DC_SHLIB ERROR! code: %d, fname: %s\n", error, fname);
    }

    ...

    //register debug function pointer to Tvspelsfreak's SHlib
    shErrorHandler(&shErrorCallback);

Congratulations on outdoing KOS there, friend. We're proud to have shlib in LibGyro. :D

One of my biggest motivators for writing libGyro2.0 has also been to optimize the shit out of our matrix/vector transformation pipeline. I'm proud to finally say that, at least on the Dreamcast, this is completely hardware accelerated. Each 4x4 matrix or 4x1 vector calculation now uses the SH4's SIMD instructions for lightning fucking fast transformations. This will be extremely beneficial for physics as well. I hope to eventually get around to doing this on the other consoles.
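For reference, here is what that transformation computes, written as a plain portable C loop. This is only a sketch of the math (a 4x4 matrix times a 4-component vector, which is what a single SH4 ftrv instruction performs in hardware); the types and names are hypothetical and the column-major storage is an assumption, not libGyro's actual layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4; /* m[column][row], column-major */

/* Transform count vectors by mat. Each output component is the dot
 * product of one matrix row with the input vector. */
static void mat4_transform(const Mat4 *mat, const Vec4 *in, Vec4 *out, size_t count)
{
    size_t i;
    assert(mat != NULL && in != NULL && out != NULL);

    for (i = 0; i < count; ++i) {
        const Vec4 v = in[i]; /* copy first so in and out may alias */
        out[i].x = mat->m[0][0]*v.x + mat->m[1][0]*v.y + mat->m[2][0]*v.z + mat->m[3][0]*v.w;
        out[i].y = mat->m[0][1]*v.x + mat->m[1][1]*v.y + mat->m[2][1]*v.z + mat->m[3][1]*v.w;
        out[i].z = mat->m[0][2]*v.x + mat->m[1][2]*v.y + mat->m[2][2]*v.z + mat->m[3][2]*v.w;
        out[i].w = mat->m[0][3]*v.x + mat->m[1][3]*v.y + mat->m[2][3]*v.z + mat->m[3][3]*v.w;
    }
}
```

That is sixteen multiplies and twelve adds per vertex when done in scalar code, which is exactly the work the hardware path collapses into one instruction per vector.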

 ! Transform zero or more sets of vectors using the current internal ! matrix. Each vector is three floats long. ! Number of cycles in the loop in the best case: ~38 cycles. ! Number of vertices per second: ~15,000,000 ! Minimum number of vertices: 1. ! ! r4: Input vectors ! r5: Output vectors ! r6: Number of vectors ! r7: Vector stride (bytes between vectors) ! .globl _mat_transform _mat_transform:         ! Save registers and setup pointers.         pref            @r4         mov             r7,r2 ! r2=Stride.         fmov            fr12,@-r15         add             #-8,r2 ! r2=Stride-12=read stride.         fmov            fr13,@-r15         mov             r7,r3 ! r3=Stride.         fmov            fr14,@-r15         add             #12,r3 ! r3=Stride+12=write stride.         ! Load the first vertex.         fmov            @r4+,fr0         add             #12,r5 ! End of the first destination vector.         fmov            @r4+,fr1 !       nop         fmov            @r4,fr2         add             r2,r4         fldi1           fr3         dt                      r6 fmul            fr12,fr1         fmul            fr12,fr0 !       nop         ! Store a vector.         ftrv            xmtrx,fv8         fmov            fr12,@-r5         fmov            fr1,@-r5 !       nop         fmov            fr0,@-r5         bt/s            .firstInLoop          add            r3,r5         pref            @r5         fldi1           fr14 !       nop         ! Load a vector.         fmov            @r4+,fr0 !       nop         fmov            @r4+,fr1 !       nop         fmov            @r4,fr2         add             r2,r4         fldi1           fr3 !       nop         pref            @r4         fdiv            fr11,fr14         dt                      r6         fmul            fr13,fr5         fmul            fr13,fr4 nop         ! Store a vector.         ftrv            xmtrx,fv0         fmov            fr13,@-r5         fmov            fr5,@-r5 !       
nop         fmov            fr4,@-r5         bt/s            .secondInLoop          add            r3,r5         pref            @r5         fldi1           fr12 !       nop         ! Load a vector.         fmov            @r4+,fr4 !       nop         fmov            @r4+,fr5 !       nop         fmov            @r4,fr6         add             r2,r4         fldi1           fr7 !       nop         pref            @r4         fdiv            fr3,fr12         dt                      r6  fmul            fr14,fr9         fmul            fr14,fr8 !       nop         ! Store a vector.         ftrv            xmtrx,fv4         fmov            fr14,@-r5         fmov            fr9,@-r5 !       nop         fmov            fr8,@-r5         bf/s            .loop          add            r3,r5 .loopEnd:         pref            @r5         fldi1           fr13         fdiv            fr7,fr13         fmul            fr12,fr1 !       nop         fmul            fr12,fr0         fmov            fr12,@-r5 !       nop nop !       nop nop         fmov            fr1,@-r5 !       nop         fmov            fr0,@-r5 !       nop nop !       nop nop !       nop nop !       nop nop !       nop nop !       nop nop         fmov            @r15+,fr14         fmul            fr13,fr5         add             r3,r5         fmul            fr13,fr4         fmov            fr13,@-r5 !       nop         fmov            @r15+,fr13 !       nop         fmov            @r15+,fr12 !       nop         fmov            fr5,@-r5         rts          fmov           fr4,@-r5 

(SH4 assembly excerpt from KOS)

PLEASE NOTE FOR FUTURE REFERENCE: Despite the comments on the above function, the r7 argument is NOT a fucking vector stride in the sense of a gap between vectors. After hours of beating on the Dreamcast, it turns out that this argument is actually the TOTAL SIZE OF EACH VECTOR in bytes, i.e. the distance from the start of one vector to the start of the next, not the padding between them.

[b]x86 Implementation[/b]

With the rise of the iPad build of Elysian Shadows comes the need for ES to run in HD (720p). The x86 implementation now allows the user to specify flags with regard to window size/resolution (and title/icon). You can set the viewport to match the window size or to be stretched to the window. You can also specify a min/max viewport size (beyond these limits the viewport begins stretching rather than resizing). Gone are the days of ES being a fuck-ugly, typical 640x480 PC application. We're going HD, bitches.

Also, the resizing is going to be VERY useful when we need to test resizable UI/camera elements that must behave the same way across the varying resolutions of each platform. ES x86 will have a maximum resolution of 720p (to match the iPad) and a minimum resolution to match the PSP.
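The min/max behavior boils down to clamping the window size and stretching whenever the clamp kicks in. Here is a rough sketch of that rule; the types and function names are made up for the example and are not libGyro's actual API.

```c
#include <assert.h>

/* Hypothetical types for illustration only. */
typedef struct { unsigned int width, height; } Size2D;
typedef struct { Size2D viewport; int stretched; } ViewportFit;

static unsigned int clamp_uint(unsigned int value, unsigned int lo, unsigned int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Pick a viewport size for a window. Inside the min/max range the
 * viewport simply matches the window. Outside it, the viewport stays
 * pinned at the limit and has to be stretched to fill the window. */
static ViewportFit fit_viewport(Size2D window, Size2D minSize, Size2D maxSize)
{
    ViewportFit fit;
    assert(minSize.width <= maxSize.width && minSize.height <= maxSize.height);

    fit.viewport.width  = clamp_uint(window.width,  minSize.width,  maxSize.width);
    fit.viewport.height = clamp_uint(window.height, minSize.height, maxSize.height);
    fit.stretched = (fit.viewport.width != window.width) ||
                    (fit.viewport.height != window.height);
    return fit;
}
```

With limits of 320x240 and 1280x720, an 800x600 window gets an 800x600 viewport, while a 1920x1080 window gets a 1280x720 viewport that is stretched to fit.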

To go along with these features, libGyro users must be able to query libGyro for the projection size in order to render to variable-sized viewports. The following accessor returns the projection size:

void gyVidResolution(unsigned int *const width, unsigned int *const height)

[b]GLAL Implementation[/b]

"GLAL," as I've stated before is the OpenGL(ES)/OpenAL framework for video and audio that is shared among Windows, OSX, Linux, iPhone, and iPad (everything that isn't the Dreamcast or PSP). The GLAL implementation very closely resembles the libGyro API, so we are able to interface extremely easily with the underlying vertex array functionality anyway. We also must be careful to not use any OpenGL features that are not compatible with OpenGLES platforms.

    void _gyVidTriangleStrip(const struct _GYVertex4 *const vertexArray, const unsigned int startIndex, const unsigned int amount) {
        GYMatrix4 modelViewMat;

        gyMatStore(&modelViewMat);
        glLoadMatrixf((float *)modelViewMat);

        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(4, GL_FLOAT, 0, vertexArray);
        glDrawArrays(GL_TRIANGLE_STRIP, startIndex, amount);

        glDisableClientState(GL_COLOR_ARRAY);
        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
    }

[b]iOS Implementation[/b]

The iOS implementation has been MDK's pet secret for a very long time now, and all credit for the underlying framework goes to him. I FINALLY got around to working with it, writing some code for it, and developing on my iPad the other day. The iOS implementation introduces an additional layer of complexity, as we've been forced to use Objective-C to interface with the Apple iOS API. By design, libGyro's abstraction allows the Objective-C back-end to be completely abstracted away from the C++ engine through libGyro's C API. This means that while the engine will technically be "Objective-C++" from the iPhone's perspective, the user gets to remain unaware of the fact (they will thank us).

I've taken the liberty of writing some reesty-ass ObjC code to abstract Apple's iOS application "bundles" away from the rest of the program. All asset paths should be given relative to the executable. LibGyro then prepends the bundle's root location to this string before loading assets. I've decided to go this route so that we DON'T have to write our own filesystem abstraction and can continue to assume that the working directory is the directory the executable resides in.
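The path handling itself is simple string work once the bundle root is known. The sketch below shows the joining step in plain C; the function name and the example paths are hypothetical, and fetching the real bundle root is the Objective-C part that is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Join a bundle root and an executable-relative asset path into out.
 * Adds a '/' between them only if the root does not already end in one.
 * Returns 0 on success, or -1 if the result does not fit in outSize. */
static int resolve_asset_path(const char *bundleRoot, const char *relPath,
                              char *out, size_t outSize)
{
    size_t rootLen;
    int needSlash, written;

    assert(bundleRoot != NULL && relPath != NULL && out != NULL && outSize > 0);

    rootLen   = strlen(bundleRoot);
    needSlash = (rootLen > 0 && bundleRoot[rootLen - 1] != '/');
    written   = snprintf(out, outSize, "%s%s%s",
                         bundleRoot, needSlash ? "/" : "", relPath);

    return (written < 0 || (size_t)written >= outSize) ? -1 : 0;
}
```

Doing this inside the library means the engine can keep passing names like "tizer.png" on every platform and never has to know a bundle exists.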

MDK, our UIApplicationMain() problem with "MainWindow" was resolved when I used

NSStringFromClass ([self class])

instead of passing it as @"MainWindow". I know that the two should be equivalent, but my programmer's senses were tingling, and I think "MainWindow" may be a reserved word for .nib/.xib bullshit in Xcode (fuck knows how their reflective classes work)... and voila, it works.

Then the next step was to enable MDK's iPhone code to work in full HD on the iPad... OR SO I THOUGHT! It turns out the bastard was one step ahead of me, and it was already iPad ready... :D

[b]Tech Demo[/b]

...And of course, here's the tech demo running on 5 platforms. It really should be 6, but since my iPhone isn't working, the only iOS platform I can show it on is the iPad:

Tech demo running on Windows 7:

Tech demo running on OSX:

Tech demo running on Linux:

Tech demo running on Sega Dreamcast:

Tech demo running on iPad:

SOURCE CODE:

    #include "gyro.h"
    #include <stdlib.h> //rand()
    #include "math.h"
    #include "time.h"

    GYTexture *tex = NULL;
    GYInput input;
    float offset = 240.0f;

    #define OBJECT_COUNT 1000

    typedef struct _Object {
        float x, y, z;
        float sx, sy, sz;
        float orient;
        float xvel, yvel;
        float rotVel;
        int frame;
    } Object;

    Object object[OBJECT_COUNT];

    void render() {
        GYColor4 color = {1.0f, 1.0f, 1.0f, 0.5f};
        unsigned int i = 0;

        gyVidSceneBegin();
        gyVidTexBind(tex);
        gyVidAlphaLevel(GY_ALPHA_FULL);
        gyVidRectListBegin(&color);
        for(i = 0; i < OBJECT_COUNT; ++i) {
            //build each sprite's transform, then submit its frame
            gyMatLoadIdentity();
            gyMatTranslate(object[i].x, object[i].y, object[i].z);
            gyMatScale(object[i].sx, object[i].sy, object[i].sz);
            gyMatRotateZ(object[i].orient);
            gyVidRect(object[i].frame);
        }
        gyVidRectListEnd();
        gyVidSceneEnd();
    }

    void initExample() {
        GYVidInitParams params = {
            #if GYRO_PLATFORM == GYRO_PLATFORM_x86
            GY_VID_1280x720, GY_WIN_RESIZABLE | GY_WIN_STRETCH, 320, 240, 1280, 720, "TESTZOR", "icon.BMP"
            #endif
        };
        int i = 0;

        gySysInit();
        gyVidInit(&params);
        gyInputInit();

        for(;i < GY_KEY_LAST;++i) {
            input.keyDown[i] = input.keyTapped[i] = 0;
        }

        tex = gyVidTexCreate("tizer.png", 256, 256, 32, 32);
        if(!tex) {
            gyPrintf("TEX NOT FOUND");
        }
        else {
            gyPrintf("TEX OKAY");
        }

        for(i = 0; i < OBJECT_COUNT; ++i) {
            object[i].x = rand()%630;
            object[i].y = rand()%470;
            object[i].z = 1.0f;
            object[i].orient = (rand()%180)/3.141f;
            object[i].sx = rand()%90+10;
            object[i].sy = rand()%90+10;
            object[i].sz = 1.0f;
            object[i].frame = 3;//rand()%10+2;
            object[i].xvel = rand()%9+1;
            object[i].yvel = rand()%9+1;
            object[i].rotVel = (rand()%2)/3.141f;
        }
    }

    void init_render(unsigned int width, unsigned int height) {
        //do nothing right now!
    }

    int update() { //ios version of input
        unsigned int i = 0;
        unsigned int screenWidth, screenHeight;

        gyInputPoll(&input);
        gyVidResolution(&screenWidth, &screenHeight);

        if(input.keyDown[GY_KEY_DPAD_D])
            offset += 1.0f;
        if(input.keyDown[GY_KEY_ESC])
            return 1;

        for(i = 0; i < OBJECT_COUNT; ++i) {
            object[i].x += object[i].xvel;
            object[i].y += object[i].yvel;
            object[i].orient += object[i].rotVel;
            //bounce off the edges of the current projection
            if(object[i].x >= screenWidth || object[i].x <= 0.0f) object[i].xvel = -object[i].xvel;
            if(object[i].y >= screenHeight || object[i].y <= 0.0f) object[i].yvel = -object[i].yvel;
        }
        return 0;
    }

    void destroy() {
        gyVidTexDestroy(tex);
        gyInputUninit();
        gyVidUninit();
        gySysUninit();
    }

    int main() {
        initExample();
        while(1)
        {
            if(update()) break;
            render();
        }
        destroy();
        return 0;
    }

Note: The Dreamcast image looks different from the others because, for the OpenGL builds, I enabled a blend mode that made the demo look really badass there.

The demo is basically a shitload of sprite frames from a sprite sheet being rendered with the rectangle API. That is 1000 sprites being transformed and rendered with libGyro's internal matrix stack. Each sprite is bouncing, rotating, and scaling at 60fps.

[b]Future Library Development--READ: ANDROID BUILD[/b]

For those of you who haven't seen MDK's "I need money" thread, I have decided to pay him $150 (to help with rent) to port libGyro to the [b]GOOGLE ANDROID[/b] platform. This is happening completely independently of my development and will require a layer of Java abstraction much like the iOS's ObjC abstraction... I'm sure MDK can tell you all about it...

There's still looots for me to do on the library in the way of optimization, features, and the PSP build, but these will be coming incrementally. I'll always work on libGyro on the side (because it's my baby) as my driver-layer C escape from the OO world of C++... but the time has come to push onward with Elysian Shadows, and I have an Engine/Toolkit to return to. With the release of the next dev video, I'm also hoping to recruit slaves to write PS3 and Wii implementations. ;)

But yeah, the three dark knights (MDK, Tvspels, me) will continue to support the library.

[b]Outro[/b]

Well, there's not much left to say... I'm ITCHING to get back to some object-oriented C++ development and see the fruits of our low-level labor reflected in Elysian Shadows. 

ON BEHALF OF TVSPELSFREAK, M_D_K, AND MYSELF, FEEL FREE TO GET THE FUCK ON OUR LEVEL, GENTLEMEN!

(Going clockwise: Dreamcast, Linux, OSX, iPad (not enough monitors for Win7))

PS: Oh yeah... recording of the next dev video will commence next weekend. Feels fucking good to be back on top of the world.

Falco Girgis
Falco Girgis is the founder and lead software architect of the Elysian Shadows project. He was previously employed in the telecom industry before taking a chance on Kickstarter and quitting his job to live the dream. He is currently pursuing his masters in Computer Engineering with a focus on GPU architecture.