So the fact that 96% of the characters you are sending are markup is not a downside?
S-expressions solved this problem a long time ago. (a b c) is only 57% markup. I don't think you can get much more succinct than that and still express the idea of a list.
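The 57% figure is easy to check by counting. A minimal Python sketch, assuming "markup" means every character that is not a letter or digit (the helper name `markup_ratio` is invented for this illustration):

```python
def markup_ratio(text: str) -> float:
    """Fraction of characters that are not letters or digits."""
    if not text:
        return 0.0
    markup = sum(1 for ch in text if not ch.isalnum())
    return markup / len(text)


# "(a b c)" has 7 characters: 2 parens, 2 spaces, 3 letters.
print(f"{markup_ratio('(a b c)'):.0%}")  # 4 of 7 characters are markup
```

The same function can be pointed at any XML fragment to compare, though what counts as "markup" in a tag name is a judgment call this sketch does not try to settle.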
XML has redundancy by design. It's a deliberate trade-off to make it easier to read and write by hand, at the cost of size.
If message size is an issue, gzip the data. Or, if you have very specific needs for processing speed, look into something like Google's Protocol Buffers.
You are optimizing at the wrong level if you are concerned about a few extra characters in a human-readable data exchange format.
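The gzip suggestion can be tried with the standard library alone. A rough sketch, assuming a made-up payload of 200 items written once as XML and once as an S-expression (the element names and the `sizes` helper are hypothetical, and the numbers it prints depend entirely on the payload):

```python
import gzip

# Hypothetical payload: the same 200 items in both notations.
items = [f"item{i}" for i in range(200)]
xml_doc = "<list>" + "".join(f"<item>{x}</item>" for x in items) + "</list>"
sexp_doc = "(" + " ".join(items) + ")"


def sizes(text: str) -> tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for a UTF-8 encoded string."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


for name, doc in (("xml", xml_doc), ("sexp", sexp_doc)):
    raw, packed = sizes(doc)
    print(f"{name}: {raw} bytes raw, {packed} bytes gzipped")
```

Repetitive tags are exactly what a compressor handles well, so the gap between the two notations should shrink after compression; how much it shrinks for real data is something to measure rather than assume.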
But is it really easier to read and write? Properly-indented S-expressions are just as readable. Generating XML and then gzip-ing it is a lot more work (and requires a lot more libraries) than generating S-expressions.
Perhaps the real problem is that too many people use terrible text editors. Paren-matching and auto-indentation make writing S-expressions orders of magnitude easier, and at least a constant factor easier than writing XML.
You are right about text editors. XML was designed to be reasonably easy to write and edit by humans without specialized software. The redundant end tag helps to catch errors and makes the structure more explicit.
Sure, everyone could just use a fancy specialized editor with paren-matching and auto-indentation. But one of the goals of XML was precisely that it should not rely on specialized software to be able to read and write.
Your example with the table is a lot clearer with sexpr syntax because you don't actually have any content in the table. Try again with a few sentences of mixed content, some bolded words, a link, and so on, and you will get my point.
Note that you would also need to gzip your s-expressions if you are concerned about size.
The HTML of your comment, with some formatting added:
<span class="comment">
<font color=#000000>
You are right about text-editors. XML was designed to be reasonable
easy to write and edit by humans without specialized software. The
redundant end-tag helps to catch errors and make structure more
explicit.
<p>
Sure everyone could just use a fancy specialized editor with paren
matching auto-indentation. But one of the goals of XML was
precisely that it should not rely on specialized software to be
able to read and write.
<p>
Your example with the table is a lot clearer with sexpr syntax
<b>because you don't actually have any content in the table.</b>
Try again with a few sentences of mixed content, some bolded
words, a link, and so on, and you will get my point.
<p>
Note that you would also need to gzip your s-expressions if you are
concerned about size.
</font>
</span>
The same thing in S-expressions (an invented syntax):
(span (class . comment)
(font (color . #000000)
You are right about text-editors. XML was designed to be reasonable
easy to write and edit by humans without specialized software. The
redundant end-tag helps to catch errors and make structure more
explicit.
(p)
Sure everyone could just use a fancy specialized editor with paren
matching auto-indentation. But one of the goals of XML was
precisely that it should not rely on specialized software to be
able to read and write.
(p)
Your example with the table is a lot clearer with sexpr syntax
(b because you don't actually have any content in the table). Try
again with a few sentences of mixed content, some bolded words, a
link, and so on, and you will get my point.
(p)
Note that you would also need to gzip your s-expressions if you are
concerned about size.))
It's really not a lot different. Of course, parens would need to be escaped, but this is no different from needing to escape < and >.
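To make the escaping point concrete, here is a small Python sketch. The XML side uses the standard library's `xml.sax.saxutils.escape`; the S-expression side uses a backslash convention that is purely an assumption for this illustration, since the invented syntax above does not define one:

```python
from xml.sax.saxutils import escape as escape_xml


def escape_sexp_text(text: str) -> str:
    """Backslash-escape the backslash itself and both parens.

    This convention is an assumption made for this sketch; the
    invented syntax above does not define one.
    """
    out = []
    for ch in text:
        if ch in "\\()":
            out.append("\\")
        out.append(ch)
    return "".join(out)


sample = "if (a < b) then stop"
print(escape_xml(sample))        # angle brackets become entities
print(escape_sexp_text(sample))  # parens get a leading backslash
```

Either way, free text has to pass through one small function before it is embedded, which is the sense in which the two are "no different".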
But one of the goals of XML was precisely that it should not rely on specialized software to be able to read and write.
But this is the problem with XML... it does rely on special libraries to validate and parse into reasonable data structures. It requires special heuristics to describe how to recover nicely in the event that markup isn't valid. It requires a document describing exactly what the XML needs to look like.
S-expressions are easier to parse, less verbose, and can accomplish all the same tasks and more, all while being more flexible in general.
What you have done in your example is reinvent XML with round parentheses instead of pointy brackets. Why is this better? The only difference is that you leave out the redundant end tags, which are there for good reason.
Yes, XML requires a library to parse - so do s-expressions! The reason XML seems more complex than sexprs is that it defines a higher-level syntax, e.g. with element/attribute distinctions. You have reinvented that yourself in your example, so you need a spec for it and you need the parser to support it. Also, the rules for encodings and character sets have to be specified (e.g. how do you detect the encoding of a file? Which characters count as whitespace?). You will end up with a spec much like XML, except with round parentheses. (OK, XML is also complex because of DTDs, but that is an optional part. If you want something like DTDs for sexprs, again, you have to specify it, and you get something like XML.)
Btw, there are no heuristics for recovery in XML. XML parsers must fail when encountering malformed syntax. This is one of the major (and controversial) differences between XML and HTML.
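The fail-on-error behaviour is easy to see with Python's standard library, which ships both a strict XML parser and a lenient HTML parser. A minimal sketch (the names `is_well_formed` and `TagCollector` are invented here):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed(xml_text: str) -> bool:
    """True if the standard-library XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records start tags; the HTML parser does not reject tag soup."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


broken = "<p>first<p>second"   # no end tags, like the quoted HTML
print(is_well_formed(broken))  # the XML parser refuses it
collector = TagCollector()
collector.feed(broken)
collector.close()
print(collector.tags)          # the HTML parser still reports both tags
```

The XML parser raises on the first well-formedness error instead of guessing, while the HTML parser keeps going and leaves recovery to the caller.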
I appreciate s-expressions as a syntax for a programming language. But code is a very different use case than documents. I wouldn't like to program in XML syntax either! E.g. programs (hopefully) don't have deeply nested structures covering several pages. That is common in documents, hence the importance of the redundant end tag.
I like sexprs for code and data, but for documents they are only simpler if you ignore a lot of real-world issues.
BTW, the HTML is not valid XML, so your example is a bit misleading. The P elements contain the paragraphs rather than delimit them. The XML would be more verbose, since it needs end tags for P:
<p>Note that you would also need to gzip your
s-expressions if you are concerned about size.</p>
The s-expr OTOH would be more confusing, because there isn't a clear distinction between element-name and content:
(p Note that you would also need to gzip your
s-expressions if you are concerned about size.)
You might want to choose a different syntax to make the distinction clearer:
(p "Note that you would also need to gzip your
s-expressions if you are concerned about size.")
or:
((p) Note that you would also need to gzip your
s-expressions if you are concerned about size.)
In the end, you have to make some of the same trade-off decisions that the designers of SGML and XML did. Just saying that s-expressions are simpler than XML is like saying ASCII is simpler than s-expressions: True, but kind of missing the point.
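One way to see the trade-off is to write both serializers for the same mixed-content tree. This is only a sketch in Python: the `Element` class is invented for the illustration, and the S-expression side picks the quoted-string variant from above, which is one choice among several:

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Element:
    tag: str
    children: list = field(default_factory=list)  # each child is a str or an Element


def to_xml(node) -> str:
    """Serialize with explicit end tags; text is entity-escaped."""
    if isinstance(node, str):
        return escape(node)
    inner = "".join(to_xml(child) for child in node.children)
    return f"<{node.tag}>{inner}</{node.tag}>"


def to_sexp(node) -> str:
    """Serialize with the tag first and text as quoted strings."""
    if isinstance(node, str):
        quoted = node.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{quoted}"'
    parts = [node.tag] + [to_sexp(child) for child in node.children]
    return "(" + " ".join(parts) + ")"


para = Element("p", ["Try again with ", Element("b", ["mixed"]), " content."])
print(to_xml(para))
print(to_sexp(para))
```

Quoting the text makes the tag/content distinction unambiguous, at the price of quote characters around every text run, which is the kind of decision the SGML and XML designers also had to make.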
Well, in the case of using XML as markup (what it was designed for), it is clearer than the s-exp. The only time I like XML editing is DocBook, because when you end a tag, you never have to bounce back up (which may be more than a screen away) to know what tag you are in.
That does not seem to me to be typical XML, and if it is, it's really being stretched to do something it's not intended to do, IMO. XML's strength is in representing tree-based structures, but that appears to be an attempt to represent an associative structure. With Sexps, this is just as easy:
(sizes (dress . 5) (pants . 7) (shoes . 11))
But in doing that, the structure really looks off, even though it's almost exactly mirroring the XML. I think this is a clue that the XML is a bit of a stretch. Much better (in Lisp code) is:
(let ((sizes '((dress . 5) (pants . 7) (shoes . 11))))
; do something with sizes
...)
But I guess the real question is what this is trying to represent. If it's the sizes of various people, then the Sexps are quite simple:
But now we're getting away from the structure we defined using S-expressions, and besides, name='...' seems to be distinctly different information from the sizes themselves, so something else is strange. Perhaps
Well, that looks nice, and closer to what we are trying to represent, but of course it's impossible to validate (at least from what I know of XML), since the person names are not part of our schema. We'll have to do something like:
Great! Now we have something that matches our desired structure and is easy to validate. Of course, it's much more verbose, but that made it easier to read and write, right?
Part of the problem with XML is that it causes these huge debates about how to structure and name the data. Another problem is that attributes don't nest nicely; that was the main problem in this instance. In other words, XML can be used nicely to represent a tree structure and reasonably well for lists or simple associative structures. But as soon as those associative elements need to map to something more complicated, you start having issues with how best to structure everything.
With S-expressions easily able to express assoc-lists while also being trivially nestable, these issues don't come up.
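As a rough illustration of that nesting claim, a Python sketch that renders nested dictionaries as dotted pairs. The function name `alist_to_sexp` and the person names are hypothetical, not taken from the thread:

```python
def alist_to_sexp(name, value) -> str:
    """Render a key and its value; dicts nest, anything else becomes a dotted pair."""
    if isinstance(value, dict):
        inner = " ".join(alist_to_sexp(k, v) for k, v in value.items())
        return f"({name} {inner})"
    return f"({name} . {value})"


sizes = {"dress": 5, "pants": 7, "shoes": 11}
print(alist_to_sexp("sizes", sizes))

# Hypothetical names, not from the thread: one more level of nesting.
people = {"alice": sizes, "bob": {"pants": 9, "shoes": 12}}
print(alist_to_sexp("people", people))
```

Adding a level only wraps the existing output in another list; nothing has to be restructured the way an attribute-based layout would.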
"That does not seem to me to be typical XML, and if it is, it's really being stretched to do something it's not intended to do, IMO. XML's strength is in representing tree-based structures, but that appears to be an attempt to represent an associative structure."
It's fairly typical of the XML I've used, and well within what XML was intended to do. That some (many?) people end up with needlessly verbose markup is not the fault of XML. Some people write verbose Scheme. Go figure.
A major point of XML is not just simply tree data, but meta-data. You first showed a basic, non-annotated list; I showed a list with meta-data. Seems that you didn't like how the s-exp version of that XML looked, so you changed that use case.
"But in doing that, the structure really looks off, even though it's almost exactly mirroring the XML. I think this is a clue that the XML is a bit of a stretch. "
Or just maybe it's an example where XML differs from s-exps.
"Part of the problem with XML is that it causes these huge debates about how to structure and name the data."
Not really. I mean, some people like that stuff (I see it as a bike shed thing; it's a chance to show off how complex people can make something), but many other folks find a sparse, good-enough structure and move on. Quite honestly, the way you exaggerated the initial example is pure strawman. And you can have the same arguments about representation using s-expressions.
Don't blame a syntax because it allows people to be dopey.
I can see how nicely s-exp can work for markup, but I'm still curious how name-spaces, schema, ID + IDREF, transclusion, and other XML features are handled in s-expressions.
I mostly get the feeling that the only real gripe about XML is the duplication in the closing tags. (The W3C has explained why they dropped the short-form of XML and went with explicit end tags.)
It's not just about the amount of markup, though, it's about the unnecessary complexity of the markup. The software that generates and parses S-expressions is much simpler than that which generates and parses XML. Of course, in Lisp, it's just
(let ((list (read data)))
...)
But even in Python, you could easily hack together (not recommended) something like
list = ", ".join(data.split())
...
Of course, that's not robust, but a library that is robust is still much simpler than one that requires the use of a C SAX parser just to be usably fast.
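For a sense of scale, here is a sketch of a slightly more careful reader in Python. It is an assumption-laden toy, not a library: atoms are whitespace-separated tokens, and there is no support for strings, escapes, comments, or dotted pairs.

```python
def tokenize(text: str) -> list[str]:
    """Split into parens and atoms (no strings, escapes, or dotted pairs)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str], pos: int = 0):
    """Parse one expression starting at tokens[pos]; return (value, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[pos]
    if token == ")":
        raise ValueError("unexpected ')'")
    if token != "(":
        return token, pos + 1  # a bare atom
    items = []
    pos += 1
    while pos < len(tokens) and tokens[pos] != ")":
        value, pos = parse(tokens, pos)
        items.append(value)
    if pos >= len(tokens):
        raise ValueError("missing ')'")
    return items, pos + 1  # skip the closing paren


def read(text: str):
    """Read exactly one S-expression from text."""
    tokens = tokenize(text)
    value, pos = parse(tokens)
    if pos != len(tokens):
        raise ValueError("trailing input")
    return value


print(read("(a b c)"))
print(read("(sizes (dress 5) (pants 7))"))
```

Nested lists come back as nested Python lists and unbalanced input raises `ValueError`. Everything it leaves out (strings, escapes, encodings) is where a real specification would start to grow.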
If the argument is that XML is more human-readable, that is implying that it's being human-modified, and then XML creates more work since it's so verbose. If the verbosity is not an issue because it's auto-generated, that implies that it's not being read/modified by humans, and the whole point of using XML in the first place is lost. I just can't see any problem that XML solves that S-expressions didn't already solve in a simpler way.
S-expressions are nice but not superior to XML for all use cases. S-expression syntax is optimized for lists of names and numbers. XML syntax is optimized for structured documents. Since XML is used just as much for data as for documents these days, s-expressions would perhaps be just as good as XML for a common data exchange meta-format. But that train left the station a decade ago.
I suspect a reason XML caught on and s-expressions didn't (outside of the Lisp niche) is that XML tackled difficult internationalization issues like different encodings and character sets head-on.
XML solved the problem of me having to write a parser from scratch for whatever terms of transfer you come up with.
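The encoding point can be illustrated with Python's standard XML parser, which reads the encoding from the XML declaration when it is given raw bytes. A small sketch with a made-up one-element document (the element name `w` is arbitrary):

```python
import xml.etree.ElementTree as ET

# The same one-element document stored in two encodings. Each byte
# string declares its own encoding in the XML declaration.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><w>café</w>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><w>café</w>'.encode("utf-8")

for raw in (latin1_doc, utf8_doc):
    root = ET.fromstring(raw)  # the parser reads the declaration itself
    print(len(raw), root.text)
```

The two byte strings differ, yet both decode to the same text without the caller naming an encoding. An S-expression format would need its own rule for this before it could be exchanged as safely.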
And I don't really see the downside for most applications.