More interesting than the split vote on this test are the arguments in the comments. What actually is the correct behavior? C99 is better than K&R C, but as the article says, it's vague enough that it's not a chip-agnostic assembler.
I'm not sure about this, but isn't the struct bitfield value uninitialised, and therefore random? Or is it default-initialised to 0 in C because it's declared at global scope?
It's initialised to 0 because it has static storage duration (in this case, it has static storage duration because it's declared at file scope, but there are other ways to get static storage duration too).
It is a requirement of the C standard that objects with static storage duration are initialised to a "zero of the appropriate type" (which e.g. means NULL for pointers). In most implementations this can take advantage of the BSS section (which is why the requirement came about in the first place), but it's required to work even on e.g. MS-DOS.
For the second part of your comment, comparisons are always done between the same type. In this case, either the -3 is promoted to unsigned (which gives the result UINT_MAX - 2, a large positive number) or the bitfield is promoted to int. It is the rules for this type promotion that are at the heart of the problem.
I think the problem is that -3 < 1 but -3 > 1U because the -3 is "promoted" (quietly corrupted) to UINT_MAX - 2 (edit: C99 guarantees this value). So the question is whether a small unsigned bitfield is handled like a signed int (which can represent all its possible values) or an unsigned int (which is more like what you asked for).
It's just astounding how many developer-years have been wasted by the industry's insistence on producing and accepting wrong answers when commodity hardware has been maintaining under/overflow flags for literally my entire career.
-3 > 1U is guaranteed to be true. Conversion of an out-of-range value to an unsigned type is well-defined in C: it is taken modulo one greater than the maximum value representable in the target type. (unsigned)-3 must always be equal to UINT_MAX - 2, there is no choice for the implementation here.
Conversion of out-of-range values to signed types is, however, implementation-defined (and also allows an implementation-defined signal to be raised, which in practice makes it something to avoid in correct programs).
Thank you, I wasn't aware that C99 now mandates the two's-complement result; I think it had been implementation-defined in C89 (which is all I ever used seriously).
"A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type."
But in general, uninitialized variables aren't even random in C. Accessing them is undefined. That means you aren't guaranteed to get a bogus value, you could just as well get a nasal demon.
The comma operator evaluates its left argument, introduces a sequence point, then evaluates its right argument. The result has the same type and value as the right argument. It is useful in cases like:
On second thought, let's not go to Camelot. It is a silly place.
It's useful shorthand in a few specific cases, but I would argue that this sort of code is bad style and if I was in charge (which I am not) it would be in the "never do this" section.
That this edge case apparently took decades to find implies (to me) that the comma operator is not to be used in production code. I'll always choose a few extra lines of code over ambiguity, because someday other people will have to read my code when there's some kind of a problem with it, and I'd rather make the problems obvious, not subtle. :)
The problem is that if you try to rewrite that example to avoid the comma operator, you either have to duplicate the update_thing(&foo) call, as in:
update_thing(&foo);
while (foo != 0) {
    /* loop body */
    update_thing(&foo);
}
Unnecessary duplication like this is itself a potential source of bugs - it's all too easy to update one but not the other. The other alternative is to hack up the loop to exit in a strange place:
while (1) {
    update_thing(&foo);
    if (foo == 0)
        break;
    /* loop body */
}
This is arguably even worse - the actual loop termination condition is not where you expect to find it anymore. Personally, I find the formulation using the comma operator to be completely clear.
Note that the bug referenced here is more about the subtleties of bitfield type promotion in expressions, and the interplay of bitfields with operators that evaluate to the type of one of their arguments, than it is about the comma operator. You can show the same bug using the assignment operator instead of the comma operator.
If you're going to consign anything involved here to the "never do this" section, I'd start with any use of bitfields, and maybe also include mixing unsigned and signed types in expressions without explicit conversion.
We're getting into the dangerous territory of personal preference, but after decades of writing and maintaining code, I certainly prefer the "exit in a strange place" variant, as it clearly better allows the code to "grow" with smaller differences between revisions - for example, when adding a few more lines between update_thing and the if. And if no revisions are expected, then how it's written is in my opinion really irrelevant. (Disclaimer: I won't discuss this subject more here.)
A little ugly but you can get it to work without duplicate code by:
bool second_loop = false;
do
{
    if (second_loop) {...}
    update_thing(&foo);
    second_loop = true;
} while (foo != 0);
The real question is which is more clear when someone else looks at the code. I suspect that even if it's shorter and faster, while (foo, bar) is less clear to most people than the above code.
I completely agree with everything you've said except for "any use of bitfields." The comma operator is ok in this circumstance, mixing unsigned and signed is a big no-no, but what have you got against bitfields? They're very useful when it comes to pulling sub-byte fields out of network packets.
They're not any use for pulling sub-byte fields out of network packets in portable code, because everything about the precise layout of bitfields is entirely implementation-defined. Portable code has to use masking-and-shifting.
The only portable uses of bitfields are to potentially save a little memory, and to get "modulo-power-of-2" behaviour. Bitfields are a classic "not as useful as they first appear" feature.
For what it's worth, I use bitfields on occasion, but only to save memory and get better warnings when storing range-limited values. It's nice to have the compiler warn about a comparison always being true or false due to the limited range of a type. The performance of bitfields is probably worse than just tossing all my boolean values into an int32_t or int8_t.
Aha, thank you. As most of my C code has been written for a single compiler and a single platform I was less aware of the pitfalls than I should have been. I will know now to be extra vigilant if I ever have to port that code.
Well, the problem here is (assuming I'm understanding everything correctly) pretty specifically the combination of the comma operator and a bitfield. Of those two, I'd say the latter, even if superficially more intuitive, is probably the more obscure and advisable to avoid.
The comma operator, while not all that common, can be pretty useful in certain situations, and can easily achieve both clarity and conciseness. For example, to iterate over a singly-linked list while keeping a pointer to the previous item:
for (prev = NULL, this = first; this != NULL; prev = this, this = this->next) {
    ...
}
I think this is a perfectly reasonable way to do things. Disallowing the comma operator due to an underspecified interaction with bitfields would be throwing out the baby with the bathwater, in my opinion.