C vs C++ style of interface for hardware abstraction layer
-
- Chaos Rift Newbie
- Posts: 3
- Joined: Sat Jan 12, 2013 8:26 pm
- Current Project: Yes.
- Favorite Gaming Platforms: PC, PS1, SNES but I forgot how to controller :P
- Programming Language of Choice: C++
- Location: Ireland
C vs C++ style of interface for hardware abstraction layer
Hey,
I've recently started a new game project and have begun working on the hardware abstraction layer. I can't decide between a C-style and a C++ (OOP) style interface for the rendering part. (The engine is in C++ and will use DX11 for rendering on PC.)
While writing the vector library for the engine, I came across this article, which shows that a SIMD-optimized vector library gained a 62.74ms speed increase (with the Microsoft compiler, the one I'm using) on a cloth simulation just by giving the library a C-style interface instead of encapsulating the vector in a class!
Will giving my rendering engine a C-style interface give a notable speed increase? I prefer OOP, but I don't want to throw away performance I may need later when working on much less powerful consoles.
Thanks.
Last edited by CAR145 on Tue Jan 15, 2013 1:51 pm, edited 1 time in total.
-
- Chaos Rift Newbie
- Posts: 41
- Joined: Tue Jun 21, 2011 5:39 am
- Programming Language of Choice: C++
Re: C vs C++ style of interface for hardware abstraction lay
If I were you, I would not worry about performance: just get it working, and then later you can optimize if you need to. You should not optimize prematurely; optimize when you have working code that you think could be faster. Note that this doesn't mean you should write dumb code. Read this: http://prog21.dadgum.com/106.html
Also... why reinvent the wheel when there are plenty of pre-written math libraries aimed directly at computer graphics? For example GLM, http://glm.g-truc.net/: although it is aimed at OpenGL, it can be used with DirectX. Unless you're doing this for educational purposes? (Vector/matrix math is a pretty easy topic, so I don't see why you'd want to make your own classes/structures/operations when there are portable solutions.)
-
- Respected Programmer
- Posts: 387
- Joined: Fri Dec 19, 2008 3:33 pm
- Location: Dallas
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
Well, I see the point, and I always advocate learning from the ground up. I think the analytic approach imparts the knowledge necessary to build the technology of 5 years from now, not just the technology of the last few years. On some systems this is pretty crucial. On the PS2 it was exceptionally evident, as the PS2 had built-in aligned 128-bit float types even before SSE2 was a "thing". It really was a form of optimization that wasn't new for architects and people who developed systems but WAS new for game programmers accustomed to developing on the prototypical Intel 32-bit architecture. Learn all you can, I say.
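For anyone who hasn't met these aligned types: a minimal sketch, in standard C++11, of a 16-byte-aligned four-float type like the 128-bit types mentioned above. The names `Float4` and `isAligned16` are hypothetical; this only shows the memory layout 128-bit SIMD units expect and is not the PS2's actual type.

```cpp
#include <cstdint>

// Hypothetical 128-bit vector type: four 32-bit floats, 16-byte aligned,
// which is the layout 128-bit SIMD registers are loaded from.
struct alignas(16) Float4 {
    float x, y, z, w;
};

// Checked at compile time: exactly one 128-bit register's worth of data.
static_assert(sizeof(Float4) == 16, "Float4 must be 128 bits");
static_assert(alignof(Float4) == 16, "Float4 must be 16-byte aligned");

// True if p sits on a 16-byte boundary (what aligned SIMD loads require).
inline bool isAligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

Because `sizeof` equals the alignment, every element of a `Float4` array starts on a 16-byte boundary, so aligned loads such as SSE's `_mm_load_ps` can be used on it directly.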
That said, there is a level of practicality you have to be willing to accept, and it's completely application dependent. For real-time graphics, you're probably not going to be doing THAT much CPU-level computation unless you're running pure Monte Carlo simulations (which definitely have their applications). The thing about these numbers is this: first off, the ~62ms is for computing the entire data set, not per tick. Processors execute instructions in NANOseconds, six orders of magnitude faster than milliseconds. For simulations that difference may be relevant. In the real-time domain, with a ~16ms frame budget, if you spend 1ms on CPU floating-point ops you're abusing the CPU. At that point the time slice is so fine that the two numbers start to converge: when you force the simulation to run at that fine a granularity, the differences tend to disappear.
There are cases on some architectures where this can be a BIG stumbling block, and it really requires a more intimate knowledge of that architecture and its toolchain. For typical Intel platforms, I can say it really comes down to just writing better C++ code. I always say, "What runs faster than C++ code?"... well, "Better C++ code." Similarly, "How can I make Java code run faster?"... "Write better Java."
In this case, the author misses a few things that the compiler (especially in the VS 2012 case) is not doing, and arguably still should not be doing on its own. If the method is inlined, the intrinsic's operands can be passed in an XMM register directly, avoiding needless overhead. The compiler has no reason to inline these methods, and overloading them only reinforces that. It's pretty critical to either provide a direct pathway there or inline the method. I think a few comments alluded to this, and they were spot on. Of course, you wouldn't know this unless you profiled your code and identified it as a bottleneck. In a pipelined architecture you only run as fast as your slowest code, so it goes without saying that some things can be "optimized" without ever making a difference.
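A rough sketch of the "inline it so the value stays in a register" idea, assuming an x86 target with SSE (there is a plain-float fallback so it still compiles elsewhere). `Vec4` and the `vec4_*` functions are hypothetical names, not taken from the article. The helpers are small inline functions that take and return the SIMD type by value, so once the compiler inlines them the operands can stay in XMM registers and the add is a single `_mm_add_ps`. Whether inlining actually happens is still the compiler's call, which is why profiling matters.

```cpp
// Use SSE intrinsics where available (GCC/Clang define __SSE__; MSVC defines
// _M_X64 on x64 and _M_IX86_FP on 32-bit x86 with SSE enabled).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VEC4_HAVE_SSE 1
#else
#define VEC4_HAVE_SSE 0
#endif

// Hypothetical 4-float vector. With SSE it wraps a single __m128, so once the
// helpers below are inlined, a Vec4 value can live entirely in an XMM register.
struct Vec4 {
#if VEC4_HAVE_SSE
    __m128 v;
#else
    float v[4];  // scalar fallback for non-SSE targets
#endif
};

// Small inline helpers taking and returning Vec4 by value.
inline Vec4 vec4_set(float x, float y, float z, float w) {
    Vec4 r;
#if VEC4_HAVE_SSE
    r.v = _mm_set_ps(w, z, y, x);  // _mm_set_ps lists lanes from highest to lowest
#else
    r.v[0] = x; r.v[1] = y; r.v[2] = z; r.v[3] = w;
#endif
    return r;
}

inline Vec4 vec4_add(Vec4 a, Vec4 b) {
    Vec4 r;
#if VEC4_HAVE_SSE
    r.v = _mm_add_ps(a.v, b.v);  // one packed add of all four lanes
#else
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

inline void vec4_store(float out[4], Vec4 a) {
#if VEC4_HAVE_SSE
    _mm_storeu_ps(out, a.v);  // unaligned store, so `out` needs no special alignment
#else
    for (int i = 0; i < 4; ++i) out[i] = a.v[i];
#endif
}
```

For example, `vec4_store(out, vec4_add(vec4_set(1, 2, 3, 4), vec4_set(10, 20, 30, 40)))` fills `out` with 11, 22, 33, 44.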
I'm also not sure what "hardware abstraction" you're referring to. DX11 *IS* a hardware abstraction layer.
- Falco Girgis
- Elysian Shadows Team
- Posts: 10294
- Joined: Thu May 20, 2004 2:04 pm
- Current Project: Elysian Shadows
- Favorite Gaming Platforms: Dreamcast, SNES, NES
- Programming Language of Choice: C/++
- Location: Studio Vorbis, AL
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
qpHalcy0n wrote:
Well I see the point and I always advocate learning from the ground up. I think the analytic approach imparts a knowledge necessary to build the technology of 5 years from now, not the technology of a few years ago to now. On some systems, this is pretty crucial. On the PS2 this was exceptionally evident as they had built in types for aligned 128bit float types even before SSE2 was a "thing". It really was a new form of optimization that wasn't new for architects and people who developed systems but WAS a different thing for game programmers accustomed to developing on prototypical Intel 32-bit architecture. Learn all you can, I say.
Same goes for the Dreamcast. The SH4 has SIMD instructions for loading and storing 4x4 matrices and transforming 4x4 matrices and 4x1 vectors. That's why they called it the first 128-bit console.
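For reference, this is the arithmetic of a 4x4 matrix times 4x1 vector transform written out in plain C++; the SH4 does the same work with its vector-transform instruction (FTRV). `Mat4`, `Vec4f` and `transform` are hypothetical names, and the row-major layout is an assumption of this sketch, not the SH4's register layout.

```cpp
// Hypothetical types: row-major 4x4 matrix (m[row][col]) and a 4x1 vector.
struct Mat4 { float m[4][4]; };
struct Vec4f { float v[4]; };

// out = M * in: 16 multiplies and 12 adds that SIMD hardware can do in one
// instruction, but that a scalar FPU has to issue one by one.
inline Vec4f transform(const Mat4 &M, const Vec4f &in) {
    Vec4f out;
    for (int r = 0; r < 4; ++r) {
        out.v[r] = M.m[r][0] * in.v[0] + M.m[r][1] * in.v[1]
                 + M.m[r][2] * in.v[2] + M.m[r][3] * in.v[3];
    }
    return out;
}
```

With a translation matrix (identity plus 5 in the top-right entry), the point (1, 2, 3, 1) transforms to (6, 2, 3, 1).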
I think qp really covered the majority of it from an optimization standpoint, but I will at least mention the C vs C++ performance part. I am all about writing low-level drivery shit in C, but that's admittedly my own personal taste (especially from my fucking around with drivers as Linux kernel modules). There is no reason that a C++ driver could not be as fast as a C one. Nothing intrinsic to the C language makes it faster, and nothing intrinsic to C++ makes it slower. With just a little understanding of how C++ works under the hood, you can easily get C-like performance with the OO organization C++ offers.
Probably the most common overhead from C++ is associated with run-time polymorphism and vtable lookups from invoking virtual member functions. I would highly recommend not using the "virtual" keyword for anything time-critical. I get pissy when static drivers implement virtual interfaces just because it looks pretty, when they will never ever be polymorphed to or from any other datatype at runtime. Don't EVER use virtual inheritance for a performance-critical class either.
Templates have overhead associated with them, but that usually tends to be with code size/bloat rather than execution time (templates used for compile-time polymorphism can be much faster than using virtuals for run-time polymorphism and even sometimes faster than C-style function pointers).
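To make the virtual-versus-template point concrete, here is a sketch of the same trivial driver written both ways. All the names (`VirtualDriver`, `StaticDriver`, `drawAll`, ...) are hypothetical, and the template version uses the curiously recurring template pattern (CRTP), which is one common way to get compile-time polymorphism, not necessarily the one meant above.

```cpp
// Run-time polymorphism (hypothetical driver names). A call through a
// base-class reference is an indirect call via the vtable, which the
// compiler usually cannot inline.
struct VirtualDriver {
    virtual ~VirtualDriver() {}
    virtual int draw(int vertexCount) = 0;
};

struct NullVirtualDriver : VirtualDriver {
    int draw(int vertexCount) override { return vertexCount / 3; }  // "triangles drawn"
};

// Compile-time polymorphism via CRTP: the base knows the concrete type,
// so draw() is a direct call that can be inlined, and the object carries
// no vtable pointer.
template <typename Derived>
struct StaticDriver {
    int draw(int vertexCount) {
        return static_cast<Derived *>(this)->drawImpl(vertexCount);
    }
};

struct NullStaticDriver : StaticDriver<NullStaticDriver> {
    int drawImpl(int vertexCount) { return vertexCount / 3; }
};

// Generic code written against the CRTP base; one copy is instantiated per
// driver type, which is the code-size cost of templates mentioned above.
template <typename Driver>
int drawAll(StaticDriver<Driver> &driver, int vertexCount) {
    return driver.draw(vertexCount);
}
```

Both versions return the same result; the difference is only in how the call is dispatched.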
Accessor methods (that aren't inlined) invoked on multiple layers of encapsulated C++ objects are another great way to quickly add function-call overhead in C++. When calling set on object A calls set on the object B it encapsulates, which in turn calls set on the object C encapsulated by B, you are adding a shitload of pushes/pops to the stack to achieve almost nothing... That's why simple accessor methods should ALWAYS be inlined in C++.
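A sketch of the A-holds-B-holds-C case from the paragraph above, with the classes named after that description (hypothetical). Defining the accessors inside the class body makes them implicitly inline and visible to every caller, so an optimizing compiler can collapse `a.setValue(5)` into a single store to the innermost `int`; whether it actually does is up to the compiler and optimization level.

```cpp
// Innermost object: owns the actual data.
class C {
public:
    void setValue(int v) { value_ = v; }   // defined in-class: implicitly inline
    int value() const { return value_; }
private:
    int value_ = 0;
};

// B encapsulates a C and just forwards.
class B {
public:
    void setValue(int v) { c_.setValue(v); }
    int value() const { return c_.value(); }
private:
    C c_;
};

// A encapsulates a B and just forwards.
class A {
public:
    void setValue(int v) { b_.setValue(v); }
    int value() const { return b_.value(); }
private:
    B b_;
};
```

If these bodies instead lived out of line in a separate .cpp file, each forwarding call would generally stay a real call unless link-time optimization is enabled.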
But anyway, not to get into too much gory detail... C makes it easier to write very fast code by the nature of the language. You can still achieve these speeds with C++ with a little bit of understanding of the language. It's up to you whether you prefer to write in C or C++. There is no "better" option based solely on speed.
I prefer C-style drivers because 1) they can be used from C or C++, 2) it is easier for me to write fast C code, and 3) I feel like a static, global, C-style API more adequately represents something like a driver. Hardware truly IS global state.
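A sketch of what such a static, global, C-style driver API might look like. Every name here (`gfx_init`, `GfxColor`, ...) is hypothetical, and a real backend would issue DX11 calls where this sketch only records state. The `extern "C"` guard is what lets the same header be included from both C and C++ code.

```cpp
/* ---- gfx.h (hypothetical header): a C-style, global-state driver API ---- */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct GfxColor { float r, g, b, a; } GfxColor;

int      gfx_init(int width, int height);  /* returns 1 on success, 0 on failure */
void     gfx_set_clear_color(float r, float g, float b, float a);
GfxColor gfx_get_clear_color(void);
void     gfx_shutdown(void);

#ifdef __cplusplus
}
#endif

// ---- gfx.cpp: one static block of state, because the hardware is global ----
namespace {
struct GfxState {
    int width, height;
    GfxColor clear;
    int initialized;
};
GfxState g_gfx = {0, 0, {0.0f, 0.0f, 0.0f, 1.0f}, 0};
}  // namespace

// The definitions inherit C linkage from the declarations above.
int gfx_init(int width, int height) {
    if (width <= 0 || height <= 0) return 0;
    g_gfx.width = width;
    g_gfx.height = height;
    g_gfx.initialized = 1;
    return 1;
}

void gfx_set_clear_color(float r, float g, float b, float a) {
    GfxColor c = {r, g, b, a};
    g_gfx.clear = c;
}

GfxColor gfx_get_clear_color(void) { return g_gfx.clear; }

void gfx_shutdown(void) { g_gfx.initialized = 0; }
```

There is no object to construct or pass around: callers just use the functions, which mirrors the "there is only one GPU" view of the hardware.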
- Falco Girgis
- Elysian Shadows Team
- Posts: 10294
- Joined: Thu May 20, 2004 2:04 pm
- Current Project: Elysian Shadows
- Favorite Gaming Platforms: Dreamcast, SNES, NES
- Programming Language of Choice: C/++
- Location: Studio Vorbis, AL
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
CAR145 wrote:
When writing the vector library for the engine I came across this article that shows when writing a SIMD optimized vector library you can gain a 62.74ms per tick speed increase (Microsoft compiler, the one I'm using) on a cloth simulation just by giving the library a C style interface instead of encapsulating the vector in a class!
Let me comment directly on this.
Using overloaded operators in C++ is a VERY rookie mistake for time-critical code. You can look all around game development boards like this and see Vector2/3/4 classes and Matrix classes with overloaded operators. These are going to introduce a good amount of overhead by nature of temporary variable creation.
Let's use the common example:
Code:
class Vector2 {
private:
    int _x, _y;
public:
    inline Vector2(const int x, const int y): _x(x), _y(y) {}
    inline Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(_x + rhs._x, _y + rhs._y);
    }
};
Code:
Vector2 vec = vec1 + vec2 + vec3 + vec4;
But what you don't realize is the completely unnecessary run-time overhead you just introduced so that you can do cute shit like that. Notice that the overloaded '+' operator returns a temporary object by value. For every addition you make, you create a new temporary object that the compiler must store intermediately on the stack. For the above code, you created and initialized 3 temporary variables on the stack behind the scenes, just in that one line! That is absolutely unjustifiable for a math class like this. Note: Actual overhead from that statement may vary slightly based on how smart your compiler is, but it's still not justifiable.
The most efficient way to do that is definitely C-style:
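Code:
/* C-style: Vector2 is a plain struct with public members */
typedef struct Vector2 { int _x, _y; } Vector2;

static inline void Vec2Add(Vector2 *const dest, const Vector2 *const src1, const Vector2 *const src2) {
    dest->_x = src1->_x + src2->_x;
    dest->_y = src1->_y + src2->_y;
}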
Code:
Vector2 vec;
Vec2Add(&vec, &vec1, &vec2);
Vec2Add(&vec, &vec, &vec3);
Vec2Add(&vec, &vec, &vec4);
You can still achieve that same thing in (uglier) C++ too:
Code:
class Vector2 {
private:
    int _x, _y;
public:
    inline Vector2(): _x(0), _y(0) {}
    inline Vector2(const int x, const int y): _x(x), _y(y) {}
    inline void add(const Vector2 &src1, const Vector2 &src2) {
        _x = src1._x + src2._x;
        _y = src1._y + src2._y;
    }
};
Code:
Vector2 vec;
vec.add(vec1, vec2);
vec.add(vec, vec3);
vec.add(vec, vec4);
Now this is not "pretty" C++. This is not the way of doing things that the OOphiles and Javacunts like. But then again, those guys aren't reaping the benefits of efficiency or hardware acceleration... A more important question to ask yourself is would you rather write a "pretty" C API or an "ugly" C++ API?
The fast approach is "just the way you do it" in C.
The fast approach is "an ugly way of doing it" in C++.
- Falco Girgis
- Elysian Shadows Team
- Posts: 10294
- Joined: Thu May 20, 2004 2:04 pm
- Current Project: Elysian Shadows
- Favorite Gaming Platforms: Dreamcast, SNES, NES
- Programming Language of Choice: C/++
- Location: Studio Vorbis, AL
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
Pornomag wrote:
If I were you, I would not worry about performance, just get it working, and then later you can optimize if you need to. You should not premature optimize, optimize when you have working code that you think could be faster. Do note that this doesn't mean write dumb code. Read this: http://prog21.dadgum.com/106.html
The truth is that what he is asking about is not a nitty-gritty detail that is encapsulated somewhere and can be fixed later. This is a question of high-level API design, which needs to be addressed before he even begins writing the code.
-
- Chaos Rift Newbie
- Posts: 3
- Joined: Sat Jan 12, 2013 8:26 pm
- Current Project: Yes.
- Favorite Gaming Platforms: PC, PS1, SNES but I forgot how to controller :P
- Programming Language of Choice: C++
- Location: Ireland
Re: C vs C++ style of interface for hardware abstraction lay
Awesome, thanks Falco!
I think I'll take the C route, because I know that if I write ugly code, I will end up rewriting it later when I have to change something in that section of code (and I'm a bit OCD about my code).
I'll definitely read everything else posted here too.
Thanks guys! Maybe I'll post my progress when I get something worth showing
-
- Chaos Rift Cool Newbie
- Posts: 85
- Joined: Thu Jun 23, 2011 11:12 am
Re: C vs C++ style of interface for hardware abstraction lay
Falco Girgis wrote:
These are going to introduce a good amount of overhead by nature of temporary variable creation.
Why does this happen, and is this done by compilers? What is causing the variables to be created?
- dandymcgee
- ES Beta Backer
- Posts: 4709
- Joined: Tue Apr 29, 2008 3:24 pm
- Current Project: https://github.com/dbechrd/RicoTech
- Favorite Gaming Platforms: NES, Sega Genesis, PS2, PC
- Programming Language of Choice: C
- Location: San Francisco
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
Rebornxeno wrote:
Falco Girgis wrote:
These are going to introduce a good amount of overhead by nature of temporary variable creation.
Why does this happen and is this done by compilers? What is causing the variables to be created?
class Vector2 {
private:
    int _x, _y;
public:
    inline Vector2(const int x, const int y): _x(x), _y(y) {}
    inline Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(_x + rhs._x, _y + rhs._y); // This line creates a new instance of the Vector2 class
    }
};
Falco Girgis wrote:It is imperative that I can broadcast my narcissistic commit strings to the Twitter! Tweet Tweet, bitches!
-
- Chaos Rift Newbie
- Posts: 41
- Joined: Tue Jun 21, 2011 5:39 am
- Programming Language of Choice: C++
Re: C vs C++ style of interface for hardware abstraction lay
dandymcgee wrote:
class Vector2 {
private:
    int _x, _y;
public:
    inline Vector2(const int x, const int y): _x(x), _y(y) {}
    inline Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(_x + rhs._x, _y + rhs._y); // This line creates a new instance of the Vector2 class
    }
};
To be fair, the compiler can optimize away any temporary variables from ever being created. This isn't a guarantee, but any sane compiler will do so (if it can).
Just in case anyone doesn't believe me:
http://en.wikipedia.org/wiki/Return_value_optimization
http://stackoverflow.com/questions/6658 ... 854#665854
- Falco Girgis
- Elysian Shadows Team
- Posts: 10294
- Joined: Thu May 20, 2004 2:04 pm
- Current Project: Elysian Shadows
- Favorite Gaming Platforms: Dreamcast, SNES, NES
- Programming Language of Choice: C/++
- Location: Studio Vorbis, AL
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
Pornomag wrote:
To be fair, the compiler can optimize away any temporary variables from ever being created, this isn't a guarantee but any sane compiler will do so (if it can).
Just in case anyone doesn't believe me:
http://en.wikipedia.org/wiki/Return_value_optimization
http://stackoverflow.com/questions/6658 ... 854#665854
Which is exactly why I said this:
Falco Girgis wrote:
Note: Actual overhead from that statement may vary slightly based on how smart your compiler is, but it's still not justifiable.
The compiler "may" use the Return Value Optimization (RVO), but with a scenario as complex as this:
Code:
Vector2 vec = vec1 + vec2 + vec3;
This is far more than a simple "get" method initializing a single object.
Well, let's see then, shall we? The following program should test the compiler's optimization. Note that I have been extremely strict with my consting, and that everything is inlined. The members _x and _y couldn't even be made const in the real world, but I wanted to see the compiler's best shot. Build and run the following program with GCC (it uses Qt's qDebug() for output) on any optimization setting.
Code:
#include <QDebug>

struct Vector2 {
    const int _x, _y;
    Vector2(const int x, const int y): _x(x), _y(y) {
        qDebug() << "Constructing <" << x << ", " << y << "> at " << this;
    }
    Vector2(const Vector2 &vec): _x(vec._x), _y(vec._y) {
        qDebug() << "Copy Constructing <" << _x << ", " << _y << "> from " << &vec;
    }
    Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(_x+rhs._x, _y+rhs._y);
    }
};
int main(int argc, char *argv[]) {
    const Vector2 vec1(1, 1);
    const Vector2 vec2(2, 2);
    const Vector2 vec3(3, 3);
    const Vector2 vec = vec1 + vec2 + vec3;
    return 0;
}
I am getting this output:
Code:
Constructing < 1 , 1 > at 0x28fe30
Constructing < 2 , 2 > at 0x28fe28
Constructing < 3 , 3 > at 0x28fe20
Constructing < 3 , 3 > at 0x28fe10
Constructing < 6 , 6 > at 0x28fe18
Since we see no copy constructor being invoked anywhere, we can assume the compiler has optimized away copying the return value (rvalue) to the lvalue.
But even then, we see 5 objects being constructed here, rather than just the three we explicitly constructed. This statement still requires temporary storage allocated (and initialized) on the stack for every addition that is not immediately assigned to an lvalue. For n additions, that's n-1 temporary variables allocated on the stack.
So no, even the RVO with overloaded operators cannot defeat the C-style math API. Assuming any compiler out there could have the audacity to completely optimize away the temporary storage here, what have you achieved? You have code that in the best case will run as fast as the C code, but will usually fall short on other compilers.
For some high-level C++ API that is not being called thousands of times per frame, go ahead and return by value. For very time-critical math operations like this, it is simply poor coding practice or ignorance to not embrace a C-style API.
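The experiment above can be repeated without Qt by counting constructor calls in globals (the names `g_constructed`, `g_copied` and `countSum` are hypothetical). Incrementing a counter is a side effect, just like the qDebug() output, so the converting constructor must really run for the intermediate `a + b`, while the copy from the returned value into `sum` may be elided (C++17 guarantees that elision for this form; earlier standards permit it). With GCC or Clang at default settings I'd expect 2 constructions and 0 copies, matching the two extra "Constructing" lines and the absence of copy-constructor output above.

```cpp
// Global counters (hypothetical names). Incrementing them is a side effect,
// just like the qDebug() calls in the program above.
int g_constructed = 0;
int g_copied = 0;

struct Vector2 {
    int x, y;
    Vector2(int x_, int y_) : x(x_), y(y_) { ++g_constructed; }
    Vector2(const Vector2 &v) : x(v.x), y(v.y) { ++g_copied; }
    Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(x + rhs.x, y + rhs.y);  // one Vector2(int, int) call per '+'
    }
};

// Counts the constructions and copies performed by one "a + b + c" expression.
void countSum(int &constructed, int &copied) {
    const Vector2 a(1, 1), b(2, 2), c(3, 3);
    g_constructed = 0;  // ignore the three explicit constructions above
    g_copied = 0;
    const Vector2 sum = a + b + c;
    (void)sum;  // only the counters matter here
    constructed = g_constructed;
    copied = g_copied;
}
```

One of the two constructions is `sum` itself, built directly in place; the other is the genuine temporary holding `a + b`.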
-
- Chaos Rift Newbie
- Posts: 41
- Joined: Tue Jun 21, 2011 5:39 am
- Programming Language of Choice: C++
Re: C vs C++ style of interface for hardware abstraction lay
Ah yes, I see, my bad.
Re: C vs C++ style of interface for hardware abstraction lay
Hi,
there is a problem with the example provided. You effectively prevent the compiler from optimizing away all temporaries by making your constructors non-trivial (because of the debug output).
The last published draft of the current C++ Standard dictates that a non-trivial constructor of a temporary must be called.
In Section 12.2, Page 259:
"When an implementation introduces a temporary object of a class that has a non-trivial constructor (12.1, 12.8), it shall ensure that a constructor is called for the temporary object."
The only exception is non-trivial copy or move constructors, which may be omitted for copy elision.
In Section 12.7, Page 283:
"When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects. In such cases, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization."
This code (essentially the same as above, but without most of the consts and without any debug output):
Code:
struct Vector2 {
    int _x, _y;
    Vector2() {}
    Vector2(int x, int y): _x(x), _y(y) {}
    Vector2(const Vector2 &vec): _x(vec._x), _y(vec._y) {}
    Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(_x + rhs._x, _y + rhs._y);
    }
    void add(const Vector2 &src1, const Vector2 &src2) {
        _x = src1._x + src2._x;
        _y = src1._y + src2._y;
    }
};
int main(int argc, char *argv[]) {
    Vector2 vec1(1, 1);
    Vector2 vec2(2, 2);
    Vector2 vec3(3, 3);
    Vector2 vec; //= vec1 + vec2 + vec3;
    vec.add(vec1, vec2);
    vec.add(vec, vec3);
    return 0;
}
produces (on my laptop, g++ 4.4.5, lowest optimization level) the exact same assembly as the version using the overloaded operator+:
Code:
struct Vector2 {
    int _x, _y;
    Vector2() {}
    Vector2(int x, int y): _x(x), _y(y) {}
    Vector2(const Vector2 &vec): _x(vec._x), _y(vec._y) {}
    Vector2 operator+(const Vector2 &rhs) const {
        return Vector2(_x + rhs._x, _y + rhs._y);
    }
    void add(const Vector2 &src1, const Vector2 &src2) {
        _x = src1._x + src2._x;
        _y = src1._y + src2._y;
    }
};
int main(int argc, char *argv[]) {
    Vector2 vec1(1, 1);
    Vector2 vec2(2, 2);
    Vector2 vec3(3, 3);
    Vector2 vec = vec1 + vec2 + vec3; /* vec.add(vec1, vec2); vec.add(vec, vec3); */
    return 0;
}
The assembly (generated with "g++ -O -S main.cpp") is the following:
Code:
.file "main.cpp"
.text
.globl main
.type main, @function
main:
.LFB11:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl %ebp
.cfi_def_cfa_offset 8
movl %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
movl $0, %eax
popl %ebp
ret
.cfi_endproc
.LFE11:
.size main, .-main
.ident "GCC: (Debian 4.4.5-8) 4.4.5"
.section .note.GNU-stack,"",@progbits
Of course, the fact that side effects in the constructor significantly reduce the efficiency of a simple addition is still pretty unintuitive and quite undesirable behaviour. But instead of falling back to a C-like method, one could simply overload operator+=, which would retain a bit of the beauty of the C++ version whilst being as efficient as the C-like add method in pretty much all cases.
I am sorry if I offended anyone; that is not my intention. I merely wanted to point out that if you know what you are doing, overloading operators doesn't necessarily slow down your code (and IMHO it really does make the code look more beautiful).
Fillius
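A sketch of the operator+= approach suggested above, using a stripped-down hypothetical `Vector2` with public members. `+=` modifies the left operand in place and returns a reference, so a chain of `+=` creates no intermediate objects, mirroring the `Vec2Add`/`add` calls earlier in the thread. Writing `operator+` in terms of `+=` is a common idiom that keeps the two operators consistent; `+` still produces a new value, so it remains the one to avoid in hot loops if the temporaries are a concern.

```cpp
struct Vector2 {
    int x, y;

    // Compound assignment: updates *this in place, no new object is created.
    Vector2 &operator+=(const Vector2 &rhs) {
        x += rhs.x;
        y += rhs.y;
        return *this;
    }
};

// Conventional operator+ built on +=. lhs is taken by value, so this copy
// is the result; it is kept for code where readability matters more.
inline Vector2 operator+(Vector2 lhs, const Vector2 &rhs) {
    lhs += rhs;
    return lhs;
}
```

Usage in the time-critical style: `Vector2 vec = vec1; vec += vec2; vec += vec3;`.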
Re: C vs C++ style of interface for hardware abstraction lay
Falco Girgis wrote:
I am getting this output:
Code:
Constructing < 1 , 1 > at 0x28fe30
Constructing < 2 , 2 > at 0x28fe28
Constructing < 3 , 3 > at 0x28fe20
Constructing < 3 , 3 > at 0x28fe10
Constructing < 6 , 6 > at 0x28fe18
Since we see no copy constructor being invoked anywhere, we can assume the compiler has optimized away copying the return value (rvalue) to the lvalue.
But even then, we see 5 objects being constructed here, rather than just the three we explicitly constructed. This statement still requires temporary storage allocated (and initialized) on the stack for every addition that is not immediately assigned to an lvalue. For n additions, that's n-1 temporary variables allocated on the stack.
Well, that's because you use a non-trivial constructor with side effects (printing something), so those constructor calls had to be generated. An actual vector class wouldn't have any side effects in its constructor.
Code:
#include <cstdio>   // printf
#include <cstdlib>  // rand

struct vec2 {
    int x, y;
    vec2(int xx, int yy) : x(xx), y(yy) {}
    vec2(const vec2 &v) : x(v.x), y(v.y) {}
    vec2 operator+(const vec2 &v) {
        return vec2(x+v.x, y+v.y);
    }
};
int main()
{
    vec2 a = vec2(rand(), rand());
    vec2 b = vec2(rand(), rand());
    vec2 c = vec2(rand(), rand());
    vec2 d = a + b + c;
    printf("%d,%d\n", d.x, d.y);
    return 0;
}
Code:
call rand
movl %eax, %r14d
call rand
movl %eax, %ebp
call rand
movl %eax, %r13d
call rand
movl %eax, %ebx
addl %r14d, %r13d
call rand
addl %ebp, %ebx
movl %eax, %r12d
call rand
leal (%r12,%r13), %edx
leal (%rax,%rbx), %esi
movl $.LC0, %edi
xorl %eax, %eax
call printf
- Falco Girgis
- Elysian Shadows Team
- Posts: 10294
- Joined: Thu May 20, 2004 2:04 pm
- Current Project: Elysian Shadows
- Favorite Gaming Platforms: Dreamcast, SNES, NES
- Programming Language of Choice: C/++
- Location: Studio Vorbis, AL
- Contact:
Re: C vs C++ style of interface for hardware abstraction lay
Fillius wrote:
there is a problem with the example provided. You effectively prevent the compiler from optimizing away all temporaries by making your constructors non-trivial (because of the debug output).
The last published draft of the current C++ Standard dictates that a non-trivial constructor of temporaries must be called:
In Section 12.2, Page 259:
"When an implementation introduces a temporary object of a class that has a non-trivial constructor (12.1, 12.8), it shall ensure that a constructor is called for the temporary object."
Well goddamn. Thanks, Fillius and moreson. I was unaware of this.
Is this new to C++11, or did the old standards also work this way?