(interesting) Using bzip2, compression is better when the following files are first encoded with base64 or hex: bing.png googlelogo.png peppers_color.jpg
Useless takeaways:
- prefer base64 over hex when encoding already-compressed images before further compression
- prefer hex over base64 when encoding plain text or other low-entropy data before further compression
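A minimal sketch of that measurement using Python's bz2 module. Random bytes stand in for the image files named above (I don't have them here), so this shows the methodology, not the exact result:

```python
import base64
import binascii
import bz2
import os

# Hypothetical stand-in for an already-compressed image like bing.png;
# the actual experiment used real PNG/JPEG files.
data = os.urandom(100_000)

raw_size = len(bz2.compress(data))
b64_size = len(bz2.compress(base64.b64encode(data)))
hex_size = len(bz2.compress(binascii.hexlify(data)))

print(f"raw: {raw_size}  base64: {b64_size}  hex: {hex_size}")
```

For genuinely random input all three should land near the original size; the surprising wins reported above appear to be specific to those image files' byte statistics.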
Well, the encoded and unencoded versions will never compress to exactly the same size, since the compressor has to learn and encode the set of 64 characters used, and passing along that information has some cost. In practice that cost is either sending the symbol probabilities up front, for a compressor that works that way, or the ramp-up cost of using the wrong probabilities at first, for an adaptive compressor that treats the frequencies seen so far as implicit probabilities.
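One rough way to see that ramp-up cost, sketched with Python's bz2 module: compress base64-encoded random data at two sizes and compare compressed bytes per original byte. The fixed cost of learning the 64-symbol alphabet (plus stream headers) should be amortized away as the input grows:

```python
import base64
import bz2
import os

def ratio(n: int) -> float:
    """Compressed size per original byte for base64-encoded random data."""
    data = os.urandom(n)
    return len(bz2.compress(base64.b64encode(data))) / n

small, large = ratio(1_000), ratio(1_000_000)
# The per-byte cost of learning the alphabet is fixed, so the large
# input's ratio should be lower, approaching 1.0 (base64's 4/3 expansion
# undone by coding each character in ~6 bits).
print(f"1 KB: {small:.3f}  1 MB: {large:.3f}")
```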
Otherwise we could pass along some information "for free" in any base-64 encoding scheme: choose some set of 64 characters (there are many choices, and whichever set we choose encodes a message), encode the original data with it, and then compress the result back to the original size. Iterating that would give "infinite" compression, which is impossible.
Other reasons base64 can't be reconstructed exactly from the decoded bytes include arbitrary line wrapping and trailing padding characters.
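A quick illustration with Python's base64 module: the same payload has more than one valid base64 spelling, so a compressor that decoded base64 down to raw bytes could not reproduce the exact original text.

```python
import base64

payload = b"x" * 100

one_line = base64.b64encode(payload)   # no line breaks
wrapped = base64.encodebytes(payload)  # MIME-style: newline every 76 chars

assert one_line != wrapped             # different byte sequences...
# ...but both decode to the same payload (b64decode skips the newlines)
assert base64.b64decode(one_line) == base64.b64decode(wrapped) == payload
```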
This doesn't matter in practice when compressing one large base64-encoded file, where the overheads go to approximately zero, but for a larger non-base64 file (e.g., an HTML file) that contains embedded base64 chunks it is a real problem: the compressor has to delimit the base64-encoded regions and somehow communicate the new symbol probabilities for each one.
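A hypothetical sketch of the delimiting half of that problem: scan an HTML byte string for long runs of base64-alphabet characters. The regex and the 40-character threshold are illustrative choices, not taken from any real compressor.

```python
import re

# Illustrative heuristic: 40+ consecutive base64-alphabet characters,
# optionally followed by padding, probably form an embedded base64 region.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{40,}={0,2}")

html = b'<img src="data:image/png;base64,' + b"iVBORw0KGgo" * 8 + b'==">'
regions = [(m.start(), m.end()) for m in B64_RUN.finditer(html)]
print(regions)  # spans the compressor would model with base64 statistics
```

A real implementation would also need to verify the run decodes cleanly and then signal each region's boundaries and symbol statistics to the decoder.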