More interesting than the split vote on this test are the arguments in the comments. What actually is the correct behavior? C99 is better than K&R C, but as the article says, it's vague enough that it's not a chip-agnostic assembler.
I'm not sure about this, but isn't the struct bitfield value uninitialised, and therefore random? Or is it default-initialised to 0 in C because it's declared at global scope?
It's initialised to 0 because it has static storage duration (in this case, it has static storage duration because it's declared at file scope, but there are other ways to get static storage duration too).
It is a requirement of the C standard that objects with static storage duration are initialised to a "zero of the appropriate type" (which e.g. means NULL for pointers). In most implementations this can take advantage of the BSS section (which is why the requirement came about in the first place), but it's required to work even on e.g. MS-DOS.
For the second part of your comment, comparisons are always done between the same type. In this case, either the -3 is promoted to unsigned (which gives the result UINT_MAX - 2, a large positive number) or the bitfield is promoted to int. It is the rules for this type promotion that are at the heart of the problem.
I think the problem is that -3 < 1 but -3 > 1U because the -3 is "promoted" (quietly corrupted) to UINT_MAX - 2 (edit: C99 guarantees this value). So the question is whether a small unsigned bitfield is handled like a signed int (which can represent all its possible values) or an unsigned int (which is more like what you asked for).
It's just astounding how many developer-years have been wasted by the industry's insistence on producing and accepting wrong answers when commodity hardware has been maintaining under/overflow flags for literally my entire career.
-3 > 1U is guaranteed to be true. Conversion of an out-of-range value to an unsigned type is well-defined in C: it is taken modulo one greater than the maximum value representable in the target type. (unsigned)-3 must always be equal to UINT_MAX - 2, there is no choice for the implementation here.
Conversion of out-of-range values to signed types is, however, implementation-defined (and also allows an implementation-defined signal to be raised, which in practice makes it something to avoid in correct programs).
Thank you, I wasn't aware that C99 now mandates the two's-complement result; I think it had been implementation-defined in C89 (which is all I ever used seriously).
"A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type."
But in general, uninitialized variables aren't even random in C. Accessing them is undefined. That means you aren't guaranteed to get a bogus value, you could just as well get a nasal demon.
The comma operator evaluates its left argument, introduces a sequence point, then evaluates its right argument. The result has the same type and value as the right argument. It is useful in cases like:
On second thought, let's not go to Camelot. It is a silly place.
It's useful shorthand in a few specific cases, but I would argue that this sort of code is bad style and if I was in charge (which I am not) it would be in the "never do this" section.
That this edge case apparently took decades to find implies (to me) that the comma operator is not to be used in production code. I'll always choose a few extra lines of code over ambiguity, because someday other people will have to read my code when there's some kind of a problem with it, and I'd rather make the problems obvious, not subtle. :)
The problem is that if you try to rewrite that example to avoid the comma operator, you either have to duplicate the update_thing(&foo) call, as in:
update_thing(&foo);
while (foo != 0) {
    /* loop body */
    update_thing(&foo);
}
Unnecessary duplication like this is itself a potential source of bugs - it's all too easy to update one but not the other. The other alternative is to hack up the loop to exit in a strange place:
while (1) {
    update_thing(&foo);
    if (foo == 0)
        break;
    /* loop body */
}
This is arguably even worse - the actual loop termination condition is not where you expect to find it anymore. Personally, I find the formulation using the comma operator to be completely clear.
Note that the bug referenced here is more about the subtleties of bitfield type promotion in expressions, and the interplay of bitfields with operators that evaluate to the type of one of their arguments, than it is about the comma operator. You can show the same bug using the assignment operator instead of the comma operator.
If you're going to consign anything involved here to the "never do this" section, I'd start with any use of bitfields, and maybe also include mixing unsigned and signed types in expressions without explicit conversion.
We're getting into the dangerous territory of personal preference, but after decades of writing and maintaining code, I certainly prefer the "exit in a strange place" variant, as it clearly better allows the code to "grow" with smaller differences between revisions - for example, when adding a few more lines between update_thing and the if. And if no revisions are expected, then how it's written is in my opinion really irrelevant. (Disclaimer: I won't discuss this subject more here.)
A little ugly but you can get it to work without duplicate code by:
bool second_loop = false;
do
{
    if (second_loop) {...}
    update_thing(&foo);
    second_loop = true;
} while (foo != 0);
The real question is which is more clear when someone else looks at the code. I suspect that even if it's shorter and faster, while (foo, bar) is less clear to most people than the above code.
I completely agree with everything you've said except for "any use of bitfields." The comma operator is ok in this circumstance, mixing unsigned and signed is a big no-no, but what have you got against bitfields? They're very useful when it comes to pulling sub-byte fields out of network packets.
They're not any use for pulling sub-byte fields out of network packets in portable code, because everything about the precise layout of bitfields is entirely implementation-defined. Portable code has to use masking-and-shifting.
The only portable uses of bitfields are to potentially save a little memory, and to get "modulo-power-of-2" behaviour. Bitfields are a classic "not as useful as they first appear" feature.
For what it's worth, I use bitfields on occasion, but only to save memory and get better warnings when storing range-limited values. It's nice to have the compiler warn about a comparison always being true or false due to the limited range of a type. The performance of bitfields is probably worse than just tossing all my boolean values into an int32_t or int8_t.
Aha, thank you. As most of my C code has been written for a single compiler and a single platform I was less aware of the pitfalls than I should have been. I will know now to be extra vigilant if I ever have to port that code.
Well, the problem here is (assuming I'm understanding everything correctly) pretty specifically the combination of the comma operator and a bitfield. Of those two, I'd say the latter, even if superficially more intuitive, is probably the more obscure and advisable to avoid.
The comma operator, while not all that common, can be pretty useful in certain situations, and can easily achieve both clarity and conciseness. For example, to iterate over a singly-linked list while keeping a pointer to the previous item:
for (prev = NULL, this = first; this != NULL; prev = this, this = this->next) {
    ...
}
I think this is a perfectly reasonable way to do things. Disallowing the comma operator due to an underspecified interaction with bitfields would be throwing out the baby with the bathwater, in my opinion.