Right, C++ treats whitespace as optional in most places. Still, we all know that whitespace in C++ isn't always so innocent: try sticking a space in the middle of a variable name, reserved word, or data type :P. A C++ lexer stops reading a token when it reaches whitespace, just as it stops when it reaches a character that cannot legally be part of the token it's reading.
Ex: Say you have:
After the "if" and the "(" are processed as tokens, the first "i" in "i_var1" tells the lexer to jump to a state that reads tokens allowed to begin with a letter (reserved words, variables, data types, etc.). The underscore, the v, a, r, and 1 are all legal characters for such a token, so the lexer reads them in without a problem. When the lexer reaches the "<" operator, it stops, because this character cannot be part of a token like "i_var1". "i_var1" is registered as a token, and the lexer resets to start reading the next token, which begins with "<". If there were a space (or a thousand spaces) between "i_var1" and the "<" operator, the lexer would jump to a state that reads whitespace characters and stay there until it hit a non-whitespace character, at which point it would begin reading the next token, starting with that character.
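Here's a rough sketch of that walk-through as a tiny hand-written state machine in Python. It's a toy, not how any real C++ compiler is written: the function name `lex` and the input `if(i_var1<10)` are made up for illustration, and everything that isn't a word or a number is treated as a one-character token.

```python
def lex(source):
    """Split a C++-like string into tokens with a tiny state machine (toy sketch)."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Whitespace state: stay here until a non-whitespace character shows up.
            while i < len(source) and source[i].isspace():
                i += 1
        elif ch.isalpha() or ch == "_":
            # Word state: letters, digits, and underscores may continue the token.
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(source[start:i])
        elif ch.isdigit():
            # Number state: keep reading digits.
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(source[start:i])
        else:
            # Assumption for this toy: anything else is a one-character token.
            tokens.append(ch)
            i += 1
    return tokens


print(lex("if(i_var1<10)"))
print(lex("if ( i_var1      <   10 )"))
```

Both calls print the same list, `['if', '(', 'i_var1', '<', '10', ')']`, since the extra spaces only ever put the lexer into the whitespace-skipping branch.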
If you had something like this:
(and we all know whitespace is just as illegal inside a token like "i_var1" as the "<" operator is), the lexer would read the "i", register it as a legal token, skip whitespace until the next non-whitespace character (here there is only the one space separating "i" and "var1"), read "var1", register it as a legal token, and so on. That leaves "i var1" as two independent tokens, "i" and "var1", rather than one. The lexer, of course, thinks it has done its job and passes this right on to the parser as completely legit, with no idea there's even the slightest hint of a problem. The parser would tear it apart, though, because that statement is nowhere near syntactically correct.
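You can see the split with a few lines of Python. This is only a sketch: the regex is a crude stand-in for a real C++ lexer, and `toy_tokens` and the two input strings are names and examples I made up.

```python
import re

# Toy token pattern: a word, a run of digits, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def toy_tokens(source):
    # findall skips over the whitespace between matches, like a lexer would.
    return TOKEN.findall(source)


print(toy_tokens("if(i_var1<10)"))  # one name token: 'i_var1'
print(toy_tokens("if(i var1<10)"))  # two name tokens: 'i' and 'var1'
```

Notice that nothing here raises an error for the second string. The tokenizer happily hands back "i" and "var1" as separate tokens; complaining about it is the parser's job.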
In response to the question at hand: a C++ lexer finishes a punctuation token like "(" or ";" the moment it reads it (multi-character operators such as "<=" or "->" are matched greedily, taking the longest operator that fits). So it doesn't matter whether whitespace or the beginning of a variable or data type follows it, because that punctuation token has already been registered and set aside. If whitespace follows, the lexer just keeps reading characters until it finds one that isn't whitespace; otherwise it starts reading the next token right away.
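Operators longer than one character (like "<=" or "->") are the one spot where this gets interesting: a lexer typically takes the longest operator that matches, so a space inside one does change the tokens. A rough Python sketch, where the operator list is a small hand-picked subset of C++'s and `lex_ops` is a made-up name:

```python
import re

# Longer operators are listed before their shorter prefixes so the regex
# tries them first. This is only a subset of the C++ operators.
OPERATORS = ["<<=", ">>=", "<=", ">=", "==", "!=", "->", "::", "<<", ">>"]
PATTERN = re.compile(
    r"[A-Za-z_]\w*|\d+|" + "|".join(re.escape(op) for op in OPERATORS) + r"|\S"
)


def lex_ops(source):
    return PATTERN.findall(source)


print(lex_ops("a<=b"))    # ['a', '<=', 'b']
print(lex_ops("a< =b"))   # ['a', '<', '=', 'b']
print(lex_ops("(x)->y"))  # ['(', 'x', ')', '->', 'y']
```

Whitespace around "(" changes nothing, but a space in the middle of "<=" splits it into "<" and "=", the same way a space splits a variable name.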
Some languages do "care about" whitespace, though. Take Python, for example. Rather than using curly braces { } to indicate a code block, Python uses indentation:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-1) + fib(n-2)
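If you want to see Python's lexer caring about indentation, the standard library's `tokenize` module will show you the tokens it produces. A small sketch (the two-line snippet being tokenized is just an example I picked):

```python
import io
import tokenize

source = (
    "if n == 0:\n"
    "    return 0\n"
)

# generate_tokens wants a readline-style callable that yields lines of text.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Keep only the tokens that come purely from leading whitespace.
layout = [
    tokenize.tok_name[tok.type]
    for tok in tokens
    if tok.type in (tokenize.INDENT, tokenize.DEDENT)
]
print(layout)  # ['INDENT', 'DEDENT']
```

So where a C++ lexer throws leading whitespace away, Python's turns it into real INDENT and DEDENT tokens that the parser uses the way a C++ parser uses { and }.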
Hope this helps and wasn't too long :P I have a tendency to ramble.