
Some constructive feedback:

> Here are the absolute essential flags you may need.

I highly recommend including `-fsanitize=address,undefined` in there (docs: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.h...).

(Edit: But probably not in release builds, as @rmind points out.)

> The closest thing to a convention I know of is that some people name types like my_type_t since many standard C types are like that

Beware that names beginning with "int"/"uint" and ending with "_t" are reserved in <stdint.h>.

[Edited; I originally missed the part about "beginning with int/uint", and wrote the following incorrect comment: "That shouldn't be recommended, because names ending with "_t" are reserved. (As of C23 they are only "potentially reserved", which means they are only reserved if an implementation actually uses the name: https://en.cppreference.com/w/c/language/identifier. Previously, defining any typedef name ending with "_t" technically invokes undefined behaviour.)"]

The post never mentions undefined behaviour, which I think is a big omission (especially for programmers coming from languages with array index checking).
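For anyone coming from a bounds-checked language, here is a minimal sketch of the difference. C does no index checking on `a[i]`, so an out-of-range index is undefined behaviour rather than an exception. `get_checked` is a made-up helper name for illustration, not a standard function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: performs the bounds check that C does not do for you.
 * Returns false instead of touching memory outside the array. */
bool get_checked(const int *a, size_t len, size_t i, int *out) {
    if (i >= len) {
        return false;   /* a[i] here would be undefined behaviour */
    }
    *out = a[i];
    return true;
}
```

A plain `a[len]` would compile without complaint and might even appear to work; building with `-fsanitize=address,undefined` is one way to get a runtime report for that kind of access.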

> void main() {

As @vmilner mentioned, this is non-standard (reference: https://en.cppreference.com/w/c/language/main_function). The correct declaration is either `int main(void)` or the argc+argv version.

(I must confess that I am guilty of using `int main()`, which is valid in C++ but technically not in C: https://stackoverflow.com/questions/29190986/is-int-main-wit...).

> You can cast T to const T, but not vice versa.

This is inaccurate. You can implicitly convert T* to const T*, but you need to use an explicit cast to convert from const T* to T*.
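A small sketch of what I mean (the function names are invented for illustration):

```c
#include <stddef.h>

/* Reads through a pointer-to-const; callers may pass a plain int* here. */
int sum(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Writes through a pointer-to-non-const. */
void set_first(int *values, int v) {
    values[0] = v;
}

int demo(void) {
    int data[3] = {1, 2, 3};
    const int *ro = data;   /* int* -> const int*: implicit, always fine */
    int *rw = (int *)ro;    /* const int* -> int*: needs an explicit cast */
    set_first(rw, 10);      /* OK only because data itself is not const */
    return sum(ro, 3);
}
```

Note that the cast only silences the compiler. Writing through `rw` is fine here because `data` was not declared const; if the underlying object were const, the write would be undefined behaviour.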



UPDATE regarding "_t" suffix:

POSIX reserves "_t" suffix everywhere (not just for identifiers beginning with "int"/"uint" from <stdint.h>); references: https://www.gnu.org/software/libc/manual/html_node/Reserved-..., https://pubs.opengroup.org/onlinepubs/9699919799/functions/V....

So I actually stand by my original comment that the convention of using "_t" suffix shouldn't be recommended. (It's just that the reasoning is for conformance with POSIX rather than with ISO C.)


Well, semantically, "size_t" makes sense to me ("the type of a size variable"), while "uint_t" does not ("the type of a uint variable"), because "uint" is already a type, obviously - just like "int".


> -fsanitize=address,undefined

In addition, I recommend -fsanitize=integer. This adds checks for unsigned integer overflow which is well-defined but almost never what you want. It also checks for truncation and sign changes in implicit conversions which can be helpful to identify bugs. This doesn't work if you pepper your code base with explicit integer casts, though, which many have considered good practice in the past.
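A minimal sketch of the kind of well-defined-but-wrong arithmetic I have in mind (function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Looks harmless, but if used > capacity the subtraction wraps around
 * to a huge value. This is well-defined for unsigned types, so
 * -fsanitize=undefined alone stays silent about it. */
size_t remaining(size_t capacity, size_t used) {
    return capacity - used;
}

/* One possible fix: handle the underflow case explicitly. */
size_t remaining_checked(size_t capacity, size_t used) {
    return used > capacity ? 0 : capacity - used;
}
```

With Clang's `-fsanitize=integer`, I would expect the wrapping subtraction in `remaining` to be reported at runtime, since that group includes the unsigned overflow check.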


Good one, thanks. Note that it requires Clang; GCC 12.2 doesn't have it.


Wow nice, I didn't know about this one. I can add some more which are less known. This is my current sanitize invocation (minus the addition of "integer" which I'll be adding, unless one of these other ones covers it):

  -fsanitize=address,leak,undefined,cfi,function

CFI has checks for unrelated casts and mismatched vtables which is very useful. It requires that you pass -flto or -flto=thin and -fvisibility=hidden.

You can read a comparison with -fsanitize=function here:

https://clang.llvm.org/docs/ControlFlowIntegrity.html#fsanit...

There's also TypeSanitizer, which isn't officially released, but is really interesting and should be able to be applied via a patch from the branch:

https://www.youtube.com/watch?v=vAXJeN7k32Y

https://reviews.llvm.org/D32199

  $ curl -L 'https://reviews.llvm.org/D32199?download=1' | patch -p1


I think "leak" is always enabled by "address". It's only useful if you want to run LeakSanitizer in stand-alone mode. "integer" is only enabled on demand because it warns about well-defined (but still dangerous) code. You can also enable "unsigned-integer-overflow" and "implicit-conversion" separately. See https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#...


Ah I wasn't sure if LSAN was always enabled with ASAN, good to know -- ty!


Why the hell was "potentially reserved" introduced? How is it different from simply "reserved" in practice except for the fact such things can be missing? How do you even use a "potentially reserved" entity reliably? Write your own implementation for platforms where such an entity is not provided, and then conditionally not link it on the platforms where it actually is provided? Is the latter even possible?

Also, apparently, "function names [...] beginning with 'is' or 'to' followed by a lowercase letter" are reserved if <ctype.h> and/or <wctype.h> are included. So apparently I can't have a function named "touch_page()" or "issue_command()" in my code. Just lovely.


From https://www.open-std.org/JTC1/sc22/wg14/www/docs/n2625.pdf:

> The goal of the future language and library reservations is to alert C programmers of the potential for future standards to use a given identifier as a keyword, macro, or entity with external linkage so that WG14 can add features with less fear of conflict with identifiers in user’s code. However, the mechanism by which this is accomplished is overly restrictive – it introduces unbounded runtime undefined behavior into programs using a future language/library reserved identifier despite there not being any actual conflict between the identifier chosen and the current release of the standard. ...

> Instead of making the future language/library identifiers be reserved identifiers, causing their use to be runtime unbounded undefined behavior per 7.1.3p1, we propose introducing the notion of a potentially reserved identifier to describe the future language and library identifiers (but not the other kind of reservations like __name or _Name). These potentially reserved identifiers would be an informative (rather than normative) mechanism for alerting users to the potential for the committee to use the identifiers in a future release of the standard. Once an identifier is standardized, the identifier stops being potentially reserved and becomes fully reserved (and its use would then be undefined behavior per the existing wording in C17 7.1.3p2). These potentially reserved identifiers could either be listed in Annex A/B (as appropriate), Annex J, or within a new informative annex. Additionally, it may be reasonable to add a recommended practice for implementations to provide a way for users to discover use of a potentially reserved identifier. By using an informative rather than normative restriction, the committee can continue to caution users as to future identifier usage by the standard without adding undue burden for developers targeting a specific version of the standard.


So... instead of mandating implementations to warn about (re)defining a reserved identifier, they introduce another class of "not yet reserved identifiers" and advise implementations to warn about defining such identifiers in user code (even though it's completely legal) until the moment the implementation itself actually uses/defines such an identifier, at which point warning about such redefinition in user code (now illegal and UB) is no longer necessary or advised.

Am I completely misreading this or is this actually insane? Besides, there is already a huge swath of reserved identifiers in C, why do they feel the need to make an even larger chunk of names unavailable to the programmers?


The problem is that the traditional wording of C meant that any variable named 'top' was technically UB, because it begins with `to'.

In practical terms, what compilers will do is, if C2y adds a 'togoodness' function, they will add a warning to C89-C2x modes saying "this is now a library function in C2y," or maybe even have an extension to use the new thing in earlier modes. This is what they already do in large part; it's semantic wording to make this behavior allowable without resorting to the full unlimited power of UB.


> Besides, there is already a huge swath of reserved identifiers in C, why do they feel the need to make an even larger chunk of names unavailable to the programmers?

The C23 change was mostly to downgrade some of the existing reserved identifiers from "reserved" to "potentially reserved". (It also added some new reserved and potentially reserved identifiers, but they seem reasonable to me.)


I still fail to see any practical difference between these two categories, except that the implementations are recommended to diagnose illegal-in-the-future uses of potentially reserved identifiers but are neither required nor recommended to diagnose actually illegal uses of reserved identifiers. There is also no way to distinguish p.r.i from r.i.

It also means that if an identifier becomes potentially reserved in C23 and reserved in C3X, then compiling a valid C11 program that uses it as C23 will give you a warning, which you can fix and then compile the resulting valid C23 program as C3X without any problem; but compiling such a C11 program straight up as C3X will give you no warning and a program with UB.

Seriously, it boggles my mind. Just a) require diagnostics for invalid uses of reserved identifiers starting from C23, b) don't introduce new reserved identifiers, there is already a huge amount of them.


How can a (badly chosen) typedef name trigger _undefined behavior_, and not just, say, a compilation error...?

I find it difficult to imagine what that would even mean.


You can declare a type without (fully) defining it, like in

    typedef struct foo foo_t;

and then have code that (for example) works with pointers to it (foo_t *). If you include a standard header containing such a forward declaration, and also declare foo_t yourself, no compilation error might be triggered, but other translation units might use differing definitions of struct foo, leading to unpredictable behavior in the linked program.
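To illustrate how far code can get with only the forward declaration, here is a rough sketch. The `counter` type and its functions are invented for this example, and both "files" are shown in one listing.

```c
#include <stdlib.h>

/* What a header would expose: the type is declared but not defined,
 * so users can only hold pointers to it. */
typedef struct counter counter;
counter *counter_new(void);
int counter_next(counter *c);
void counter_free(counter *c);

/* What the implementation file would contain: the one real definition. */
struct counter {
    int value;
};

counter *counter_new(void) {
    return calloc(1, sizeof(counter));  /* zero-initialised, may be NULL */
}

int counter_next(counter *c) {
    return ++c->value;
}

void counter_free(counter *c) {
    free(c);
}
```

Nothing in the declarations at the top forces every translation unit to agree on what `struct counter` contains. That is why a clash with a same-named type from a standard header can go undiagnosed at compile time.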


One potential issue would be that the compiler is free to assume any type with the name `foobar_t` is _the_ `foobar_t` from the standard (if one is added), it doesn't matter where that definition comes from. It may then make incorrect assumptions or optimizations based on specific logic about that type which end up breaking your code.


The problem being that to trigger a compile error the compiler would have to know all its reserved type names ahead of time.

It is not required to do so, hence undefined behavior. You might get a wrong underlying type under that name.


But wouldn't one be required to include a particular header in such case (i.e. the correct header for defining a particular type)?

I mean, no typedef names are defined in the global scope without including any headers right? Like I find it really weird that a type ending in _t would be UB if there is no such typedef name declared at all.

Or is this UB stuff merely a way for the ISO C committee to enforce this without having to define <something more complicated>?


[Note: What I originally wrote in my top-level comment was inaccurate; I edited that comment, but later posted another update: https://news.ycombinator.com/item?id=33773043#33775630.]

The purpose of this particular naming rule is to allow adding new typedefs such as int128_t. The "undefined behaviour" part is for declaration of any reserved identifier (not specifically for this naming rule). I don't know why the standard uses "undefined behaviour" instead of the other classes (https://en.cppreference.com/w/cpp/language/ub); I suspect because it gives compilers the most flexibility.


[Edit: My link to the behaviour classes was wrong (it was for C++ instead of C), it should have been https://en.cppreference.com/w/c/language/behavior]


Doesn’t the compiler need to know all of the types to do the compilation anyway?


I'm not sure, but in general having incompatible definitions for the same name is problematic.


Thank you so much! I will definitely be amending a few things. WRT no section on undefined behaviour - you're so right, how could I forget?


Certainly yes, but for debug builds and tests. It can be heavyweight for production.


C spec:

>That shouldn't be recommended, because names ending with "_t" are reserved.

Also C spec naming new things:

>_Atomic _Bool

I'm glad to see the C folks have a sense of humor.


Not all reserved names are reserved for all purposes. _t is reserved only for type names (typedefs), whereas _Atomic and _Bool are keywords.


The standard reserves several classes of identifiers, "_t" suffix [edit: with also "int"/"uint" prefix] is just one of several rules. Another rule is "All identifiers that begin with an underscore followed by a capital letter or by another underscore" (and also "All external identifiers that begin with an underscore").


That's only because bool was commonly already defined as an alias for int in old code. It's defined as an alias for _Bool in stdbool.h, which I highly recommend.
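A quick sketch of why the distinction matters. `legacy_bool` stands in for the kind of int alias old code used, and all names here are made up.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an old-style int alias. */
typedef int legacy_bool;

/* With an int alias, any nonzero value is passed through unchanged. */
legacy_bool legacy_flag(int x) {
    return x;
}

/* With the real type, conversion normalises nonzero values to 1. */
bool real_flag(int x) {
    return x;
}
```

So a comparison like `flag == true` behaves as expected with the stdbool.h type, but can fail with an int alias that happens to hold 2.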



