Friday, 1 May 2015

XML is good, really!

It's long been fashionable to dislike XML. I have never really understood why. XML was designed to solve many problems with data formats - sadly, problems that are being re-introduced by newer alternatives to XML. Let's take a look at XML and see what it really can offer.

XML is an extensible mark-up language. It's designed so that new elements and attributes can be added to an XML format without breaking existing usage. So many legacy formats have become unusable because they were inflexible: extending them broke assumptions about factors such as record sizes.
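As a rough illustration of that point, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The "person" format, its two versions and the read_name helper are all made up for this example. The reader was written for the first version, yet it still copes with the second, because it only looks up the element it needs and ignores everything else.

```python
import xml.etree.ElementTree as ET

# Version 1 of a hypothetical record format.
V1 = "<person><name>Ada</name></person>"

# Version 2 adds an attribute and an element the old reader never knew about.
V2 = '<person id="7"><name>Ada</name><email>ada@example.org</email></person>'


def read_name(document: str) -> str:
    """A reader written for version 1: it asks only for the <name> element."""
    root = ET.fromstring(document)
    return root.findtext("name")


print(read_name(V1))  # name read from the original format
print(read_name(V2))  # same reader, extended format
```

A fixed-width record format would not survive the same change, because adding a field shifts every offset the old reader depends on.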

XML has namespaces. Data from different origins can be combined into a single XML document without their names conflicting. This allows for things like data objects embedded in documents.
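Here is a small sketch of how that looks in practice, again in Python with xml.etree.ElementTree. The namespace URIs, the "report" and "invoice" vocabularies and the sample text are invented for illustration. Both vocabularies define a "title" element, and the namespaces keep the two apart.

```python
import xml.etree.ElementTree as ET

# A hypothetical report that embeds an invoice from a different vocabulary.
# Both vocabularies define <title>, but each lives in its own namespace.
DOC = """
<report xmlns="http://example.org/report"
        xmlns:inv="http://example.org/invoice">
  <title>Quarterly summary</title>
  <inv:invoice>
    <inv:title>Invoice 42</inv:title>
  </inv:invoice>
</report>
"""

# Prefixes used in the queries below. They are local to this script and
# need not match the prefixes used inside the document.
NS = {
    "rep": "http://example.org/report",
    "inv": "http://example.org/invoice",
}

root = ET.fromstring(DOC)
report_title = root.findtext("rep:title", namespaces=NS)
invoice_title = root.findtext("inv:invoice/inv:title", namespaces=NS)

print(report_title)
print(invoice_title)
```

The prefix is only shorthand. What identifies each element is the namespace URI together with the local name, so the two titles never collide.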

XML is human readable.   XML was designed so that archived data stored as XML would always be readable by at least a human, and so data would never be irretrievable.  XML marks up all aspects of data - there are no invisible assumptions such as column sizes or column meanings that are so often present in other formats.

XML explicitly starts and terminates all items of data.  There are no assumed data separators such as tabs or line ends.

XML is easy for software to validate and process. Any XML document from any source can be checked for correct structure, because the rules for tags and attributes are the same for every XML format.
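A minimal sketch of such a structural check, assuming Python's standard xml.etree.ElementTree parser is an acceptable stand-in for any conforming XML parser. The is_well_formed helper is a name made up for this example. It needs no knowledge of the particular format being checked.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the text parses as XML, whatever format it describes."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # tags properly nested and closed
print(is_well_formed("<a><b></a>"))    # <b> is never closed
print(is_well_formed("<a x=1/>"))      # attribute value is not quoted
```

This only checks general document structure. Checking a document against a particular format's DTD or schema is a separate step that the standard library parser does not perform.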

XML can include a formal description of a specific format: a DTD (Document Type Definition) or a schema reference. This allows a document to be validated against that format, in addition to the general validation of document structure.

XML is verbose because it was specifically designed to be readable - it's not a flaw, it's a design feature. Compare a well-designed XML specification to typical JSON content - it should be clear which is the more intelligible format. For example, a JSON document doesn't contain information about its semantics (as XML can), and so key names can be arbitrary. CSV (comma-separated values) format is a horror - just consider the countless legacy CSV documents that are now useless because their meaning has been lost.

XML is a valuable way to transmit and store information, with major benefits for data integrity and longevity.  It should be even more widely used than it already is.
