Optimizing several glDrawElements calls

Nokurn
Chaos Rift Regular
Posts: 164
Joined: Mon Jan 31, 2011 12:08 pm
Favorite Gaming Platforms: PC, SNES, Dreamcast, PS2, N64
Programming Language of Choice: Proper C++
Location: Southern California
Contact:

Optimizing several glDrawElements calls

Post by Nokurn »

I am currently working on the font system for my game's engine, and I've come up with this great design that gives me great frame rates and a shitload of flexibility. However, there is some optimization that I think could be done with my rendering code, and I've come up against a bit of a snag in getting it to work. The original code works (the #if 0 code), but the "optimized" code does not!

DISCLAIMER: This is the first time I've really used vertex buffer objects, so I am probably missing something crucial. I've checked the documentation for the relevant functions several times and can't seem to find anything.

So here's how it works:
I've got a FontRenderable class which generates several VBOs when it's constructed. Each of these VBOs corresponds to a different texture (each font face in my engine is a single texture, and this class can render using multiple faces, styles, colors, etc.), with the exception of an optional VBO that's generated when a part of the text given to the FontRenderable constructor is flagged to be underlined. This VBO does not use texture coordinates, because it's a solid color line.

These VBOs are then rendered when the FontRenderable object's Render() method is called. My current working version of the function involves a lot of (what I perceive to be) unnecessary state changes. It enables/disables the client states for each VBO, sets the vertex/color/texture pointers each iteration, enables/disables the GL_TEXTURE_2D state, etc. This seems like a lot of stuff that doesn't need to be done for every single VBO. What I would like to do is set this stuff beforehand, do a quick loop over my VBOs, render them, and then disable texturing and render the underline VBO separately. Basically, I want to take as much out of the for loop as possible. This might not even be possible for all I know--glDrawElements() might clobber the fuck out of the client states, or maybe you can't call glEnableClientState() or gl*Pointer() before glBindBuffer(). I don't know.

Code is below (I know it looks long; it's because there are two different versions of the function)

Code:

    void FontRenderable::Render()
    {
#if 0
        // Draw the vertex buffers.
        std::map<std::string, GLuint>::const_iterator iterator;
        for (iterator = mHandles.begin()
            ; iterator != mHandles.end()
            ; ++iterator)
        {
            glDisable(GL_TEXTURE_2D);

            glBindBuffer(GL_ARRAY_BUFFER, iterator->second);

            // Configure the vertex state.
            glEnableClientState(GL_VERTEX_ARRAY);
            glVertexPointer(2, GL_FLOAT, sizeof(Vertex), reinterpret_cast<GLvoid*>(0));

            // Configure the color state.
            glEnableClientState(GL_COLOR_ARRAY);
            glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), reinterpret_cast<GLvoid*>(8));

            // Configure the texture state if this is not the underline buffer.
            if (iterator->first != "__underline__")
            {
                glEnable(GL_TEXTURE_2D);
                mFamily.Face(iterator->first).Texture().Active(true);
                glEnableClientState(GL_TEXTURE_COORD_ARRAY);
                glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), reinterpret_cast<GLvoid*>(12));
            }
            }

            glDrawElements(GL_QUADS, mIndices[iterator->first].size(), GL_UNSIGNED_INT, &mIndices[iterator->first][0]);

            // Disable the states.
            glDisableClientState(GL_TEXTURE_COORD_ARRAY);
            glDisableClientState(GL_COLOR_ARRAY);
            glDisableClientState(GL_VERTEX_ARRAY);
        }

        glEnable(GL_TEXTURE_2D);
#else
        // Configure the vertex state.
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(2, GL_FLOAT, sizeof(Vertex), reinterpret_cast<GLvoid*>(0));

        // Configure the color state.
        glEnableClientState(GL_COLOR_ARRAY);
        glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), reinterpret_cast<GLvoid*>(8));

        // Configure the texture state.
        glEnable(GL_TEXTURE_2D);
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);
        glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), reinterpret_cast<GLvoid*>(12));

        // Draw the text buffers.
        std::map<std::string, GLuint>::const_iterator iterator;
        for (iterator = mHandles.begin()
            ; iterator != mHandles.end()
            ; ++iterator)
        {
            // The underline buffer is drawn separately.
            if (iterator->first == "__underline__")
            {
                continue;
            }

            mFamily.Face(iterator->first).Texture().Active(true);
            glBindBuffer(GL_ARRAY_BUFFER, iterator->second);
            glDrawElements(GL_QUADS, mIndices[iterator->first].size(), GL_UNSIGNED_INT, &mIndices[iterator->first][0]);
        }

        // Disable the texture state.
        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
        glDisable(GL_TEXTURE_2D);

        // Draw the underline buffer.
        iterator = mHandles.find("__underline__");
        if (iterator != mHandles.end())
        {
            glBindBuffer(GL_ARRAY_BUFFER, iterator->second);
            glDrawElements(GL_QUADS, mIndices[iterator->first].size(), GL_UNSIGNED_INT, &mIndices[iterator->first][0]);
        }

        // Disable the color and vertex states.
        glDisableClientState(GL_COLOR_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
#endif
    }
Edit: I forgot to actually mention what doesn't work--only the underline VBO is rendered.
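
The byte offsets passed to the gl*Pointer calls above (0, 8 and 12) imply a particular interleaved vertex layout. The Vertex struct itself is never shown in this thread, so the definition below is an assumption reconstructed from those offsets, and BufferOffset is a hypothetical helper. It is only a sketch of how the layout could be pinned down at compile time:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout inferred from the offsets 0, 8 and 12 used in Render();
// the real Vertex struct is not shown in the post.
struct Vertex
{
    float X, Y;                // position, 2 x GL_FLOAT         -> offset 0
    unsigned char R, G, B, A;  // color,    4 x GL_UNSIGNED_BYTE -> offset 8
    float U, V;                // texcoord, 2 x GL_FLOAT         -> offset 12
};

// Fail the build if the compiler lays the struct out differently than the
// hard-coded offsets assume (this presumes a 4-byte float).
static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");
static_assert(offsetof(Vertex, X) == 0, "position must start at byte 0");
static_assert(offsetof(Vertex, R) == 8, "color must start at byte 8");
static_assert(offsetof(Vertex, U) == 12, "texcoord must start at byte 12");
static_assert(sizeof(Vertex) == 20, "stride must be 20 bytes");

// Turns a byte offset into the pointer form the gl*Pointer functions take
// when a buffer object is bound.
inline const void* BufferOffset(std::size_t bytes)
{
    assert(bytes < sizeof(Vertex)); // an attribute must start inside one vertex
    return reinterpret_cast<const void*>(bytes);
}
```

With something like this in place, the literals 8 and 12 could be replaced by offsetof(Vertex, R) and offsetof(Vertex, U), so a change to the struct cannot silently desynchronize the pointer setup.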
Last edited by Nokurn on Sat Aug 13, 2011 4:30 pm, edited 1 time in total.
qpHalcy0n
Respected Programmer
Posts: 387
Joined: Fri Dec 19, 2008 3:33 pm
Location: Dallas
Contact:

Re: Optimizing several glDrawElements calls

Post by qpHalcy0n »

I always offer to people to first profile the code. Is this code actually a problem for you?

Now from what I can tell; no, state is not what you need to worry about. What IS going to hurt is all of this texture swapping you're doing. Texture binds are one of the costlier things you can do with any graphics API. I might advise that you stick all of your fonts into one large texture as a pre-processing step. Then, you can have one very large VBO that represents all possible fonts. You then use your index list to index into the font. So you don't have to bind vertex buffers over and over, you don't have to bind textures over and over. You just bind a large vertex buffer, and a large index buffer, then you submit multiple draw calls using the font as a starting index into the index buffer.

Multiple draw calls won't kill you. Binding textures and VBO's over and over and over might.

Is mIndices a local asset? Or is it a hardware buffered index buffer? I would highly consider that the index buffer be as temporally local as it can be to the vertex buffer (at least in the same memory pool).

Work with some of those ideas and see what happens. I would not worry about state in this case until those issues are cleared up.
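
A rough sketch of the "starting index" idea, assuming every font's indices already refer to vertices in one shared vertex buffer. IndexRange, BuildSharedIndices and ByteOffset are hypothetical names, and only the bookkeeping is shown, not the GL calls themselves:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Where one font's indices live inside the shared index list.
struct IndexRange
{
    std::size_t First; // position of the font's first index
    std::size_t Count; // number of indices belonging to the font
};

// Concatenates the per-font index lists into one list and records the range
// each font occupies in it.
inline std::map<std::string, IndexRange> BuildSharedIndices(
    const std::map<std::string, std::vector<unsigned> >& perFont,
    std::vector<unsigned>& shared)
{
    std::map<std::string, IndexRange> ranges;
    std::map<std::string, std::vector<unsigned> >::const_iterator it;
    for (it = perFont.begin(); it != perFont.end(); ++it)
    {
        assert(it->second.size() % 4 == 0); // GL_QUADS consumes four indices per quad
        IndexRange range = { shared.size(), it->second.size() };
        shared.insert(shared.end(), it->second.begin(), it->second.end());
        ranges[it->first] = range;
    }
    return ranges;
}

// Byte offset of a range, for use as the last argument of glDrawElements when
// the shared list is stored in a bound GL_ELEMENT_ARRAY_BUFFER of
// GL_UNSIGNED_INT indices (assumes unsigned has the same size as GLuint).
inline std::size_t ByteOffset(const IndexRange& range)
{
    return range.First * sizeof(unsigned);
}
```

Each font would then be drawn with one glDrawElements call using its Count and its byte offset, with the vertex and index buffers bound only once.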

Post by Nokurn »

I've been trying to find a good profiler for free (I am using Visual Studio 2010 Professional--just one step below having the Visual Studio profiler :(). If you have a recommendation, that would be great.

I wanted to have all of my fonts in one texture. That's how I was doing it originally. Then my textures started exceeding the maximum texture size for pretty much any card. Given the number of fonts that my game requires, and the number of different sizes of each font, the font texture would be reaching sizes in excess of 2^16 in both directions. This wouldn't work on any machine. The maximum texture size for my EVGA GTX 480 SuperClocked card is 16384 x 16384. If the number of fonts my game uses permits this method at some point, I will definitely use it.
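
The arithmetic behind that limit can be sketched roughly as follows. The uniform grid of square cells and the function names are assumptions made for illustration, and the glyph counts are made-up inputs; in a real program the limit would come from glGetIntegerv(GL_MAX_TEXTURE_SIZE, ...) rather than a hard-coded number.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power-of-two side length of a square atlas that can hold
// glyphCount cells of cellSize x cellSize pixels on a uniform grid.
inline std::size_t AtlasSide(std::size_t glyphCount, std::size_t cellSize)
{
    assert(cellSize > 0);
    std::size_t side = 1;
    // Grow until at least one cell fits per row and rows * columns covers
    // every glyph.
    while (side < cellSize || (side / cellSize) * (side / cellSize) < glyphCount)
    {
        side *= 2;
    }
    return side;
}

// True when the whole atlas fits into one texture of the given maximum size.
inline bool FitsInOneTexture(std::size_t glyphCount, std::size_t cellSize,
                             std::size_t maxTextureSize)
{
    return AtlasSide(glyphCount, cellSize) <= maxTextureSize;
}
```

With these hypothetical numbers, a million 64-pixel cells need a 65536-pixel atlas, which does not fit under a 16384 limit, while 256 cells of 32 pixels fit easily in 512.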

mIndices was a local asset, yes. I've done my research and offloaded it to the GPU. The quick tutorial I read to get up and running did NOT use an IBO. Thanks for mentioning this.

Thank you for the suggestions. I just can't see a way to make them work at the moment. I will write a note reminding myself to look for a way again if font rendering becomes a performance issue. After reading about IBOs, I understand how OpenGL buffers work a bit better, and can see why my "optimized" code didn't work. For completeness' sake, here's the code I am using now:

Code:

struct Buffer
{
    GLuint IndexCount;
    GLuint VertexHandle;
    GLuint IndexHandle;
    GLuint TextureHandle;
};

glEnable(GL_TEXTURE_2D);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
for (std::map<std::string, Buffer>::const_iterator iterator = mBuffers.begin()
    ; iterator != mBuffers.end()
    ; ++iterator)
{
    glBindTexture(GL_TEXTURE_2D, iterator->second.TextureHandle);

    // The gl*Pointer calls latch the buffer bound to GL_ARRAY_BUFFER at the
    // time they are made, so they have to follow the bind for every buffer.
    glBindBuffer(GL_ARRAY_BUFFER, iterator->second.VertexHandle);
    glVertexPointer(2, GL_FLOAT, sizeof(Vertex), Pointer(0));
    glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), Pointer(8));
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), Pointer(12));

    // With an index buffer bound, the last argument is a byte offset into it.
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, iterator->second.IndexHandle);
    glDrawElements(GL_QUADS, iterator->second.IndexCount, GL_UNSIGNED_INT, Pointer(0));
}
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_COLOR_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);
glDisable(GL_TEXTURE_2D);
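
For anyone who finds this later, here is why hoisting the gl*Pointer calls out of the loop failed: glVertexPointer, glColorPointer and glTexCoordPointer record whichever buffer is bound to GL_ARRAY_BUFFER at the moment they are called, and a later glBindBuffer does not retarget them. The following is a toy model in plain C++, not real OpenGL. MockContext and everything in it are hypothetical names made up to show the ordering issue:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the relevant slice of OpenGL state (hypothetical).
struct MockContext
{
    unsigned BoundArrayBuffer;   // what glBindBuffer(GL_ARRAY_BUFFER, ...) sets
    unsigned VertexSourceBuffer; // buffer latched by the last glVertexPointer

    MockContext() : BoundArrayBuffer(0), VertexSourceBuffer(0) {}

    void BindArrayBuffer(unsigned handle) { BoundArrayBuffer = handle; }

    // Models glVertexPointer: latches the buffer that is bound right now.
    void VertexPointer() { VertexSourceBuffer = BoundArrayBuffer; }

    // Models a draw call: reports which buffer the vertices are read from.
    unsigned Draw() const { return VertexSourceBuffer; }
};

// Pointer set once before the loop, like the "optimized" Render().
inline std::vector<unsigned> DrawHoisted(MockContext& gl,
                                         const std::vector<unsigned>& handles)
{
    std::vector<unsigned> sources;
    gl.VertexPointer();
    for (std::size_t i = 0; i < handles.size(); ++i)
    {
        gl.BindArrayBuffer(handles[i]);
        sources.push_back(gl.Draw());
    }
    return sources;
}

// Pointer set after every bind, like the working loop.
inline std::vector<unsigned> DrawPerBuffer(MockContext& gl,
                                           const std::vector<unsigned>& handles)
{
    std::vector<unsigned> sources;
    for (std::size_t i = 0; i < handles.size(); ++i)
    {
        gl.BindArrayBuffer(handles[i]);
        gl.VertexPointer();
        assert(gl.VertexSourceBuffer == handles[i]); // pointer now refers to this buffer
        sources.push_back(gl.Draw());
    }
    return sources;
}
```

In this model the hoisted version reads every draw from whatever buffer happened to be bound when the pointer was set. That would be consistent with only the underline VBO showing up if the underline buffer was the last one bound in the previous frame, though that part is a guess.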

Post by qpHalcy0n »

No problem at all. Eeek, yea you need VS Premium for the profiler :( Either way, I still believe they're giving away licenses of gDebugger for free for a year. It's far from the most amazing GPU profiler around, but will work. That is certainly a ton of fonts, the size thing should be able to be mitigated, though. I mean, I just don't see a font running far too large that a line detection algorithm couldn't fix, nor one so small that even linear minification couldn't fix.

That being the case, though, you're basically stuck where you are. The best you can hope for is to hash out the characters in the fonts, render all A's at one time, all B's at one time, etc etc. Which it appears that you're doing.

Post by Nokurn »

Oh, no. I was doing something like that but with display lists in my last font renderer. The performance was absolutely awful. What I am doing this time is creating one texture for each face/size, so it goes like this:

Code:

struct FontGlyph
{
    // This contains the glyph's coordinates within the face's texture
};

class FontFace
{
public:
    uint GlyphIndex(uint characterCode);
    FontGlyph& Glyph(uint glyphIndex);

protected:
    std::map<uint, uint> mGlyphIndices; // This maps character codes to glyph indices
    std::map<uint, FontGlyph> mGlyphs;
    Texture* mTexture;
};

class FontManager
{
public:
    // This method gets a face from mFaces, loading it if it doesn't exist
    // id is formatted like this: <name>:<size>
    FontFace& Face(std::string const& id);

protected:
    std::map<std::string, FontFace> mFaces;
};
So there aren't quite as many texture/buffer switches as you would think. Just three for each face being rendered.
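
To make the lookup concrete, here is a minimal sketch of how Face() could work, assuming the id is split at its last colon. The FontFace members, LoadCount and the placeholder "load" step are stand-ins invented for this example, not the real engine code:

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <utility>

// Minimal stand-in for the FontFace class above (members are hypothetical).
struct FontFace
{
    std::string Name;
    unsigned Size;
};

class FontManager
{
public:
    FontManager() : mLoadCount(0) {}

    // Returns the cached face for "<name>:<size>", creating it on first use.
    FontFace& Face(std::string const& id)
    {
        std::map<std::string, FontFace>::iterator found = mFaces.find(id);
        if (found != mFaces.end())
        {
            return found->second;
        }

        // Split the id at the last colon into name and size.
        std::string::size_type colon = id.rfind(':');
        assert(colon != std::string::npos);

        FontFace face;
        face.Name = id.substr(0, colon);
        face.Size = static_cast<unsigned>(std::strtoul(id.c_str() + colon + 1, 0, 10));

        // A real implementation would rasterize the glyphs and build the
        // face's texture here; this sketch only counts the loads.
        ++mLoadCount;
        return mFaces.insert(std::make_pair(id, face)).first->second;
    }

    unsigned LoadCount() const { return mLoadCount; }

protected:
    std::map<std::string, FontFace> mFaces;
    unsigned mLoadCount;
};
```

Because std::map never moves its elements, the returned reference stays valid while other faces are loaded, so asking for the same id twice hands back the same object without loading again.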

Post by qpHalcy0n »

I'm sorry, I'm not sure what you're talking about. I was not alluding to anything like this in my post :\ *confused*

Post by Nokurn »

Perhaps I misunderstood what you meant by this:
qpHalcy0n wrote:The best you can hope for is to hash out the characters in the fonts, render all A's at one time, all B's at one time, etc etc. Which it appears that you're doing.
I am not rendering each character in batch. Each of the buffers that I am rendering in my for loop corresponds to one font face, which has a single texture, containing all of the glyphs for that font, rendered at a specific size. So every time a new face/size combo is used, a new buffer is generated. The buffers store all of the vertices for all of the characters that use that buffer's face/size.

So if I used two different sizes of the same font in a single text string, I would have two VBOs, like this:

Text: text using font 1; text using font 2
The first VBO would contain the vertices representing the string "text using font 1; ", and the second would contain the vertices for "text using font 2".
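
As a sketch of that grouping, assuming one quad (four vertices) per character, spaces included: TextRun and VerticesPerFace are hypothetical names, and the real code would build actual vertex data rather than just counting.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One stretch of text that uses a single face/size combination.
struct TextRun
{
    std::string FaceId; // e.g. "<name>:<size>"
    std::string Text;
};

// Counts how many vertices each face's buffer would hold. Runs that share a
// face/size accumulate into the same buffer.
inline std::map<std::string, std::size_t> VerticesPerFace(
    const std::vector<TextRun>& runs)
{
    std::map<std::string, std::size_t> counts;
    for (std::size_t i = 0; i < runs.size(); ++i)
    {
        assert(!runs[i].FaceId.empty());
        counts[runs[i].FaceId] += runs[i].Text.size() * 4; // four vertices per quad
    }
    return counts;
}
```

For the example string above, the first buffer would get 19 quads (76 vertices) and the second 17 quads (68 vertices), and a later run using font 1 again would be appended to the first buffer instead of creating a third.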

Post by qpHalcy0n »

Well what I'm saying is: I don't understand why size is such a huge issue or why combinations of size REQUIRE a completely separate vertex buffer. Are you rendering fonts to RTT's then using them later? Either way aside from the CLEARLY obvious minification/magnification issues, is there another reason? Smoothing out errors due to texture minification and magnification can be rectified in a shader pass very very quickly.

I'm just trying to get back to the original question. Understanding why a buffer is generated on every combination of size is going to be critical to answer it as it could possibly reduce a lot of overhead being incurred by the API. And again, it's one of those things where you have to evaluate whether or not it's really worth it for you. I can sit here and spin around for weeks optimizing a ton of code that isn't likely to make a difference :)

If you're not terribly worried about code privacy and you can set up a very rigorous test case, I'd be happy to profile it on several profilers. Or if you'd like you can send me a PM with a skype name and I'd be happy to lend further assistance if I can.