
The only thing that is unspecified is comparing a pointer one past the end of an object to a pointer to the very beginning of a different top-level object. Apart from this rule, pointers of course do not need to be derived from the same object in order to be compared with == and !=.

&a + 1 == &b is unspecified: it may produce 0 or 1, and it need not produce the same result each time it is evaluated.

Similarly, if both the char pointers p and q were obtained with malloc(10), after they have been tested for NULL, all these operations are valid:

  p == q (false)
  p + 1 == q (false)
  p + 1 == q + 1 (false)
  p + 10 == q + 1 (false)
Only p+10 == q and p == q+10 are unspecified (of the comparisons that can be built without invoking UB during the pointer arithmetic itself).

I have no idea what led that person to (apparently) write that &a==&b is undefined. This is plain wrong. I do not see any ambiguity in the relevant clause (https://port70.net/~nsz/c/c11/n1570.html#6.5.9p6 ). Yes, the standard is in English and natural languages are ambiguous, but you might as well claim that a+b is undefined because the standard does not define what the word “sum” means (https://port70.net/~nsz/c/c11/n1570.html#6.5.6p5 ).



That’s quite precise, can you give a sense of why it’s useful to have? Does it translate as “you can never know whether two mallocs are adjacent, so don’t even try merging them”?


One concrete reason why “unspecified” means “anything and not always the same thing” is to enable as many optimizations as possible.

Write a function c that compares pointers in one compilation unit, and in another compilation unit, define:

    int a, b;
    int X1 = (&a == &b + 1);
    int X2 = c(&a, &b + 1);
The compiler can optimize the computation of X1 on the basis that comparing an offset of &a to an offset of &b will always:

  - be false
  - or invoke undefined behavior
  - or be unspecified
But the optimization will not apply to the computation of X2, so the two variables X1 and X2 can receive different values when you execute this example, although they appear to compute the same thing.


I get why unspecified means that and it’s good to know what the limit is for applying an optimisation, but I was asking about why the specific comparison of “one past the end” with the beginning of another being unspecified would be useful. It’s cool you can optimise it out, but what does a compiler gain from being able to do that?

Imagine a standard that stated that > and < character comparisons involving '%' were unspecified. Why would this be good? It wouldn’t, so it’s not in any standard. But specifically it wouldn’t because (a) nobody writes ch < '%', and (b) if they did, the rule’s inclusion couldn’t make compilers produce faster or more portable programs.

I guessed above that this is kinda like having hashmaps iterate in a random order: compilers do spooky things when you try to check whether two allocas/mallocs are adjacent, so don’t do it. Is that accurate? Or does it mean that compilers can move things around on the stack if they want, without worrying about updating the registers or locations that store the pointers, i.e. this is mainly to make compilers easier to write? If it’s that, I imagine I would want some other pointer comparisons on the list. The reason it’s in there is what I wanted you to shed some light on.


Oh, that was your question. In this case, the reason why &a + 1 == &b is unspecified is that:

- it is generally false: there is no reason for b to be just after a in memory, so the two addresses compare unequal.

- it is sometimes true: when addresses are implemented as integers, the compiler uses exactly sizeof(T) bytes to represent an object of type T, no precious integers are wasted by leaving gaps between objects, and == between pointers is implemented as the assembly instruction that compares integers, then that instruction sometimes produces true for &a + 1 == &b, because b was placed just after a in memory.

In short, &a + 1 == &b was made unspecified so that compilers could implement pointer == by the integer equality instruction, and could place objects in memory without having to leave gaps between them. Anything more specific (such as “&a + 1 == &b is always false”) would have forced compilers to take additional measures against providing the wrong answer.


Why is this undefined if it’s all just pointers to addresses in memory, regardless if the memory is valid for that object or not?


Here is an example I have at hand that shows that when you are using an optimizing compiler, there is no such thing as “just pointers to addresses in memory”. There are plenty more examples, but I do not have the other ones at hand.

https://gcc.godbolt.org/z/Budx3n


Please correct me if I am wrong, but I think the optimization is possible here because "*p = 2" is UB: after the call, the compiler can assume that "p" points to invalid memory. For this assumption, the compiler must know that "realloc" invalidates its first argument.

How does it know that? The definition of "realloc" lives in the source of "libc.so", so the compiler should not be able to see into it. Its declaration in "malloc.h" does not have any special attributes. Do the standard and/or the compiler handle "realloc" differently from other functions?

edit:

It looks like clang adds a "noalias" attribute to the declaration of "realloc" in the LLVM IR, so it does seem to handle "realloc" specially.

    declare dso_local noalias i8* @realloc(i8* nocapture, i64) local_unnamed_addr #3


I would guess that it is because it gives some freedom to the compiler: if you have two pointers 'foo' and 'bar' that point to two separate structures (e.g. two arrays of ints), the compiler can always assume that the pointers, even after some adds and subtracts, will never 'collide', i.e. foo + i will never equal bar + j, regardless of their relative memory positions.



