Do you know how to write “well-formed” XML?

The Astronomer’s Telegram will soon require all submitted posts to be consistent with “well-formed” XML.

The driver behind this is the goal of distributing ATels as machine-readable XML documents, through ATELstream and other such systems. Because XML documents must be machine-readable, they place slightly tighter constraints on their content than ATel has placed on submissions until now.

Once this is enforced, every submitted ATel will have to meet three new, specific requirements:

  1. Element tags are case sensitive, and every opening tag must have an end tag. Thus, when authors open a paragraph with <p>, they must close it with a matching </p> (not </P>, and not omit it altogether; both are acceptable in standard HTML, but not in XML). Empty elements such as a line break must close themselves: type <br /> rather than <br>.
  2. Element tags must be properly nested. An element opened within another element must be closed within that same element. You can write “<p> I had a nice <b>day today </b> </p>”, but not “<p> I had a nice <b>day today </p> </b>”.
  3. Special syntax characters such as “<” and “&” must not appear except in their markup-delimiting roles. To include them as ordinary text, write the entity references &lt; and &amp; instead.
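As an illustration only (this is not the ATel parser), Python's standard-library `xml.sax.saxutils.escape` replaces “&”, “<” and “>” with entity references. The sketch below, which assumes nothing beyond the standard library, shows a phrase being rejected by an XML parser in raw form and accepted once escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Plain text an author wants to appear literally in the ATel
raw = "Flux < 3 mJy & fading"

# Unescaped, "<" and "&" are read as markup, so the parser rejects the text
try:
    ET.fromstring("<p>" + raw + "</p>")
    raw_ok = True
except ET.ParseError as err:
    raw_ok = False
    print("rejected:", err)

# escape() turns & into &amp;, < into &lt;, and > into &gt;
safe = escape(raw)
print(safe)  # Flux &lt; 3 mJy &amp; fading

# The escaped text parses, and the parser restores the original characters
para = ET.fromstring("<p>" + safe + "</p>")
print(para.text)  # Flux < 3 mJy & fading
```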

That’s it. There are other rules for “well-formed” XML, but authors either already comply with them or need not worry about them. Specifically:

  1. Only valid Unicode characters are allowed. ATel has required UTF-8 encoded text (which consists of valid Unicode) in all submissions for about six months now, and the website is programmed to reject anything that is not UTF-8.
  2. There must be a single “root” element that contains all other elements. Authors can ignore this rule: when ATels are published in an XML context, The Astronomer’s Telegram will enclose them within such a root element.
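A short Python sketch of why that wrapping matters, using only the standard library: a run of paragraphs is not itself a well-formed document, because it has two top-level elements, but it becomes one once enclosed in a single root. The element name `atel` and the helper `parses` are hypothetical placeholders, not necessarily what The Astronomer’s Telegram will use.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if text is a well-formed XML document (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Two paragraphs as an author might submit them: each is well-formed,
# but there is no single element enclosing both.
body = "<p>First paragraph.</p><p>Second paragraph.</p>"

print(parses(body))                          # False: two top-level elements
print(parses("<atel>" + body + "</atel>"))   # True: one root, "atel" is hypothetical
```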

How will authors know whether their text meets the above requirements? We will install a parser that checks each submitted ATel and returns a descriptive error message pointing to where in the text it fails to meet them. Initially, we will include a toggle switch that lets authors submit non-XML-compliant text, although we advise authors against using it, to ensure that their ATels can be widely read on XML-compliant systems in the future.
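The actual ATel parser is not described here, but a check of this kind can be sketched with Python's standard library. The function name `check_atel` and the wrapper element `atel` below are hypothetical. The sketch places the wrapper tags on their own lines so that the line numbers expat reports line up with the author's text, and it converts expat's zero-based column into a one-based one.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import ErrorString

# Hypothetical root element; the wrapper ATel actually uses may differ.
ROOT_OPEN, ROOT_CLOSE = "<atel>\n", "\n</atel>"

def check_atel(text):
    """Return None if text is well-formed XML once wrapped in a root
    element; otherwise return a message locating the first problem."""
    try:
        ET.fromstring(ROOT_OPEN + text + ROOT_CLOSE)
        return None
    except ET.ParseError as err:
        line, column = err.position
        line -= 1                      # the opening wrapper tag occupies line 1
        reason = ErrorString(err.code) # expat's description, e.g. "mismatched tag"
        if line > text.count("\n") + 1:
            # The error was only detected at the closing wrapper tag
            return f"{reason} at end of text (an element was left open?)"
        return f"{reason} at line {line}, column {column + 1}"
```

With this sketch, `check_atel("<p>I had a nice <b>day today</b></p>")` returns `None`, while the badly nested version from requirement 2 returns a “mismatched tag” message on line 1. A tag left open until the end of the text is reported at the end of the text, since the parser only notices it when the wrapper closes.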

Why are we doing this?

The Astronomer’s Telegram is for humans. ATels are meant to be read by humans and reacted to by humans. But in the future, automated processes will play an increasing role in reacting to the unfolding observational context of astrophysical transients: computer reactions, mechanical reactions, telescopes being repointed, analyses being performed, much of it without a human in the loop. When this occurs, it will be important that ATels can be carried by the same processes that carry raw data.

One such process will be the distribution of XML documents. An example of an XML standard developed for astrophysical transients is the VOEvent standard, which has been in development for almost a decade. For ATels to be distributed via XML documents, they will have to be XML compliant. Because compliance depends on the content of the ATel, making a submitted ATel XML compliant is the author’s responsibility.

If this is going to happen in the future, why enforce this standard now?

We have to start enforcing an XML-compliant standard now because XML offers no backwards compatibility: XML-enabled systems do not tolerate the looser markup that HTML accepts. Even today, any past ATel that is not XML compliant will be unreadable by future XML-enabled systems. (The vast majority of ATels are already XML compliant, but a few aren’t.) For example, if an author included a table containing a line like “<tr><td> object <td> flux <td> observing time </TR>”, and we attempted to distribute it to an XML system, the behavior would be undefined: it could break the system, or cause the system to throw away the whole document.
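To make the table example concrete, here is a minimal Python sketch using the standard library's ElementTree parser (any strict XML parser would behave similarly). Rather than guessing at the author's intent, as an HTML browser would, it rejects the legacy row outright; with every cell closed and the end tag in matching case, the same row parses.

```python
import xml.etree.ElementTree as ET

# The row from the example: unclosed <td> cells and an upper-case </TR>
legacy = "<tr><td> object <td> flux <td> observing time </TR>"
# The same row with every cell closed and the end tag matching <tr>
fixed = "<tr><td> object </td><td> flux </td><td> observing time </td></tr>"

results = {}
for label, row in (("legacy", legacy), ("fixed", fixed)):
    try:
        ET.fromstring(row)
        results[label] = "parses"
    except ET.ParseError as err:
        results[label] = f"rejected ({err})"
    print(label, results[label])

# The corrected row yields the three cell values
cells = [td.text.strip() for td in ET.fromstring(fixed)]
print(cells)  # ['object', 'flux', 'observing time']
```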

We want all our authors’ ATels to be useful in the rapidly developing observational context of tracking down astrophysical transients, now and in the future.

 
