Optimizations and Drastic Performance Improvements

by Falco Girgis, January 9, 2012

On Friday, I decided that I was completely fucking sick of the load times and lack of responsiveness on the Windows build. I spent literally the entire day doing absolutely nothing but performance profiling and optimizations... the results are what you see in the repositories. The load times are 100% eliminated, BUT there are still a few small issues with moving the selection around the scene on certain platforms. My changes required some SERIOUS internal rewriting, and we are still testing and resolving those issues. We are aware of them. :D

[b]Tested Platforms[/b]:

1) Falco's work i7 (Windows) - works flawlessly

2) All of our MacBook Pros - works flawlessly

3) Jarrod's PC (Windows) - works flawlessly

4) Tyler's laptop (Windows) - serious redraw issues with tile selection

5) Our server (Kubuntu) - serious redraw issues with tile selection

* - Even the two with redraw issues had absolutely no load times and a drastic improvement in performance.

[size=20]So what the fuck did you do?[/size]

Wheeeeeeell... gather round, gentlemen. After a bit of profiling, it turned out that the unforgivable load times on Windows were due to two contributing issues...

[b]1) Tile Cutting Algorithm[/b]

I had a pretty farfetched theory with this one, and it turns out that I was right.

[u]Fig 1. Original Code[/u]

QImageReader reader;
reader.setFileName(loc);
QRect clipRect(0,0,spriteWidth,spriteHeight);
int id = 0;
if(!reader.canRead()) {
	Debug::Logf(Debug::CRITICAL, "Could not read image file %s!", loc.toStdString().c_str());
	return false;
}
for(; clipRect.y() < (int)sheetHeight; clipRect.translate(0,spriteHeight)) {
	for(clipRect.moveTo(0,clipRect.y()); clipRect.x() < (int)sheetWidth; clipRect.translate(spriteWidth,0)) {
		reader.setFileName(loc);
		reader.setClipRect(clipRect);
		QImage image = reader.read();
		if(image.isNull()) {
			QString error = reader.errorString();
			Debug::Log(Debug::CRITICAL, "TileManager::DivideTileSheet(): There was an error reading the image! Error message: " + error);
			return false;
		}
		VisualTile *tile=new VisualTile;
		tile->image=QPixmap::fromImage(image);
		tile->id=id++;
		container.push_back(tile);
	}
}
return true;

It turns out that our original QImageReader implementation works like this:

1) fetch image from hard drive

2) seek to location

3) read in small portion of image

4) close image

(REPEAT FOR EVERY TILE IN THE SHEET)

The shit performance was due to the opening and closing of the sheet from the drive for each tile loaded...

[u]Fig 2. Pristine Code[/u]

QImage image(loc);
QRect clipRect(0,0,spriteWidth,spriteHeight);
int id = 0;
if(image.isNull()) {
	Debug::Logf(Debug::CRITICAL, "Could not read image file %s!", loc.toStdString().c_str());
	return false;
}
for(; clipRect.y() < (int)sheetHeight; clipRect.translate(0,spriteHeight)) {
	for(clipRect.moveTo(0,clipRect.y()); clipRect.x() < (int)sheetWidth; clipRect.translate(spriteWidth,0)) {
		VisualTile *tile=new VisualTile;
		tile->image = QPixmap::fromImage(image.copy(clipRect));
		tile->id=id++;
		if(tile->image.isNull()) {
			Debug::Log(Debug::CRITICAL, "TileManager::DivideTileSheet(): There was an error reading the image!");
			delete tile;
			return false;
		}
		container.push_back(tile);
	}
}
return true;

My new algorithm reads the entire picture into RAM, then copies subsections of the image. There is only one read from the hard drive.
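If you want to poke at the idea without Qt, here's a minimal sketch in plain C++. It assumes the sheet has already been decoded into RAM once; `Sheet`, `Tile`, and `sliceSheet` are made-up names for illustration, not Toolkit code, and one byte per pixel is an assumption to keep things short. In Fig 2, `QImage::copy()` is what does the per-tile copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded sprite sheet held in RAM
// (one byte per pixel to keep the sketch short).
struct Sheet {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;  // row-major, width * height bytes
};

// Hypothetical stand-in for VisualTile: an id plus its own pixel copy.
struct Tile {
    int id = 0;
    std::vector<std::uint8_t> pixels;  // row-major, tileW * tileH bytes
};

// Cut the sheet into tileW x tileH tiles, left to right, top to bottom.
// Everything here reads from memory; there is no file I/O per tile.
// Partial tiles at the right/bottom edge are skipped, which is an
// assumption of this sketch.
std::vector<Tile> sliceSheet(const Sheet& sheet, int tileW, int tileH) {
    assert(tileW > 0 && tileH > 0);
    assert(sheet.pixels.size() ==
           static_cast<std::size_t>(sheet.width) * sheet.height);

    std::vector<Tile> tiles;
    int id = 0;
    for (int y = 0; y + tileH <= sheet.height; y += tileH) {
        for (int x = 0; x + tileW <= sheet.width; x += tileW) {
            Tile tile;
            tile.id = id++;
            tile.pixels.reserve(static_cast<std::size_t>(tileW) * tileH);
            // Copy the tile one row at a time out of the sheet buffer.
            for (int row = 0; row < tileH; ++row) {
                const std::uint8_t* src = sheet.pixels.data() +
                    static_cast<std::size_t>(y + row) * sheet.width + x;
                tile.pixels.insert(tile.pixels.end(), src, src + tileW);
            }
            tiles.push_back(std::move(tile));
        }
    }
    return tiles;
}
```

The expensive part (getting the sheet off the drive and decoding it) happens before `sliceSheet` is ever called, so cutting more tiles only costs memory copies.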

The effect on load times was fucking insane. [i]We went from 5.2+ seconds to about 50ms.[/i] Yes, bitches. That's over 100x faster.

[b]2) QGraphicsScene Population[/b]

This has been a known issue for a while... I just haven't been able to devise a workable solution. The problem is that for every tile, we used a QGraphicsItem to render its image... That's exactly how Qt's Graphics View framework is meant to be used. No big deal, right? Well, we have a 200x200 map at 4 layers... 200x200x4 = 160,000 QGraphicsItems. Amazingly enough, this seemed fine on Linux and OSX... but for some reason, Qt's Windows implementation just couldn't handle the sheer number. That was out of our control.

I finally had an epiphany on Friday... Rather than having an entire set of QGraphicsItems to represent each layer, I could use a single grid of QGraphicsItems whose paint() functions were overridden to [i]render all four layers[/i]. For our 200x200 map, that cuts the scene from 160,000 items down to 200x200 = 40,000. Boom! Windows can populate a scene with 60k items easily. All of the scene population happens the instant you load the Toolkit. It will never need to do that again.
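Here's roughly what the single-grid idea looks like, stripped of Qt so it stands on its own. In the Toolkit the cell would be a QGraphicsItem subclass drawing pixmaps from its paint() override; in this sketch `GridCell`, `Canvas`, `makeGrid`, `kLayerCount`, and `kEmptyTile` are all hypothetical stand-ins, and "drawing" just records which tile id got drawn.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr int kLayerCount = 4;  // layers per map cell, as in the post
constexpr int kEmptyTile = -1;  // hypothetical marker for "nothing on this layer"

// Hypothetical stand-in for a painter: remembers the draw calls it received.
struct Canvas {
    std::vector<int> drawnTileIds;
    void drawTile(int tileId) { drawnTileIds.push_back(tileId); }
};

// One scene item per map cell. It holds the tile id for every layer and
// paints them bottom to top, so upper layers are drawn over lower ones.
struct GridCell {
    std::array<int, kLayerCount> layerTileIds;

    GridCell() { layerTileIds.fill(kEmptyTile); }

    void paint(Canvas& canvas) const {
        for (int tileId : layerTileIds) {
            if (tileId != kEmptyTile) {
                canvas.drawTile(tileId);
            }
        }
    }
};

// Build the single grid: width * height items, regardless of layer count.
std::vector<GridCell> makeGrid(int width, int height) {
    return std::vector<GridCell>(static_cast<std::size_t>(width) * height);
}
```

The point of the design is that the item count depends only on the map dimensions. Adding a fifth layer would make each paint() call a little heavier, but the scene wouldn't gain a single item.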

[b]IN ADDITION TO THE ABOVE[/b]

I have discovered a way to hardware accelerate Qt's QGraphicsView by using OpenGL. On various platforms, this results in a huuuuuuuuuge performance increase. On some (with shitty OpenGL drivers), this is actually slower. Jarrod is going to be implementing a Toolbar UI at the top of the Toolkit with handy icons for common actions (flip/rotate tiles, reload sheets, invoke engine, etc). A "toggle hardware acceleration" button will certainly be up there as well. You guys will love it.

So there you have it. Between the two optimizations, there are no longer any load times in the Toolkit. My top priority is to resolve the redraw issues. Once this is done, I will feel very comfortable encouraging everybody to start using the "optimized-as-shit" build. :D

Falco Girgis

Falco Girgis is the founder and lead software architect of the Elysian Shadows project. He was previously employed in the telecom industry before taking a chance on Kickstarter and quitting his job to live the dream. He is currently pursuing his masters in Computer Engineering with a focus on GPU architecture.