Hacker News

£ is still just a byte


With latin-1/latin-9 (iso-8859-1/iso-8859-15) or cp1252, that statement is true. In utf-8 it is two bytes (c2 a3). In utf-16 or ucs-2 it is also two bytes (00 a3, big-endian), and in utf-32 it is four.
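For anyone who wants to check those byte counts, here is a minimal Python sketch. The name `pound_bytes` is made up for illustration; the codec names are the ones Python's standard library accepts.

```python
# Encode the pound sign under several codecs and collect the bytes as hex.
POUND = "\u00a3"  # £

CODECS = ["latin-1", "iso-8859-15", "cp1252", "utf-8", "utf-16-be", "utf-32-be"]

# Hypothetical helper mapping: codec name -> hex string of the encoded bytes.
pound_bytes = {codec: POUND.encode(codec).hex() for codec in CODECS}

for codec, hexed in pound_bytes.items():
    # Two hex digits per byte, so the byte count is half the string length.
    print(f"{codec:12} {len(hexed) // 2} byte(s): {hexed}")
```

The big-endian variants are used so that no byte order mark is prepended, which would otherwise inflate the counts.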


And yet it is reasonably common to see "Â£" when the UTF-8 is misinterpreted as a single-byte encoding.
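That misinterpretation is easy to reproduce. A small sketch, assuming the reader decodes the UTF-8 bytes as latin-1 or cp1252 (the variable names are illustrative only):

```python
POUND = "\u00a3"  # £

# Correct UTF-8 bytes for the pound sign: c2 a3.
utf8 = POUND.encode("utf-8")

# Decoding those two bytes with a single-byte codec yields two characters.
as_latin1 = utf8.decode("latin-1")
as_cp1252 = utf8.decode("cp1252")

# If nothing else has mangled the text, reversing the wrong step repairs it.
repaired = as_latin1.encode("latin-1").decode("utf-8")

print(repr(as_latin1), repr(as_cp1252), repr(repaired))
```

The stray first character comes from byte c2, which is a letter in its own right in both single-byte encodings.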


Not in any modern encoding and certainly not in ASCII either. Having the highest order bit set makes that kind of problematic.
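A short sketch of why the high bit matters, assuming the single byte in question is a3 as in latin-1 (names here are illustrative):

```python
POUND_BYTE = 0xA3  # the latin-1 byte for £

# ASCII only covers 0x00-0x7f, so any byte with bit 7 set is outside it.
high_bit_set = bool(POUND_BYTE & 0x80)

# The pound sign has no ASCII encoding at all.
try:
    "\u00a3".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# A lone a3 byte is not valid UTF-8 either: it looks like a continuation
# byte with no lead byte in front of it.
try:
    bytes([POUND_BYTE]).decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(high_bit_set, ascii_ok, utf8_ok)
```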


'(u32_pound & 0xff) == u32_pound' happens to be true, yes. That doesn't make it a byte. You need the leading 0s.
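To make the distinction concrete, here is a rough sketch that treats `u32_pound` as a 32-bit value holding the code point; the other names are assumptions for illustration:

```python
u32_pound = 0x000000A3  # code point of £ held as a 32-bit value

# The value fits in 8 bits, so masking changes nothing.
fits_in_8_bits = (u32_pound & 0xFF) == u32_pound

# Serialized as a 32-bit integer it still occupies four bytes,
# leading zeros included.
as_u32 = u32_pound.to_bytes(4, "big")

# Only when explicitly narrowed to one byte is it a single byte.
as_u8 = u32_pound.to_bytes(1, "big")

print(fits_in_8_bits, as_u32.hex(), as_u8.hex())
```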


My mistake


Nope, says UTF-8.



