[tei-council] Numeric entity references in Guidelines text
Martin Holmes
mholmes at uvic.ca
Sat Apr 13 23:09:43 EDT 2013
Hi all,
I'm going to be doing the conversion of list/@type to list/@rend by
processing the Guidelines and Specs through XSLT. I have to use XSLT
rather than regex search/replace because I need to leave alone all the
instances in <egXML>s (meaning in a different namespace) because I need
to change all those manually later, since the explanatory text will have
to be changed at the same time. In the meantime, I'm going to do all the
ones in the regular TEI namespace with XSLT.
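As a rough illustration of why a namespace-aware tool is needed here, the following is a minimal sketch in Python's standard-library ElementTree rather than the XSLT actually planned. The sample document is hypothetical. It assumes the examples namespace is http://www.tei-c.org/ns/Examples, so that a <list> inside an <egXML> has a different qualified name from a regular TEI <list> and is skipped without any special handling.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
EG = "http://www.tei-c.org/ns/Examples"  # assumed namespace of <egXML> content

# Hypothetical sample: one regular TEI list and one list inside an <egXML>.
sample = (
    f'<div xmlns="{TEI}">'
    '<list type="gloss"><item>regular TEI list</item></list>'
    f'<egXML xmlns="{EG}"><list type="gloss"><item>example list</item></list></egXML>'
    '</div>'
)

root = ET.fromstring(sample)

# Only elements whose qualified name is {TEI}list are visited; lists in the
# examples namespace do not match and are left for manual editing.
for lst in root.iter(f"{{{TEI}}}list"):
    if "type" in lst.attrib:
        lst.set("rend", lst.attrib.pop("type"))

tei_list = root.find(f"{{{TEI}}}list")
eg_list = root.find(f"{{{EG}}}egXML/{{{EG}}}list")
print(tei_list.attrib)  # the TEI list now carries rend instead of type
print(eg_list.attrib)   # the egXML list still carries type
```

A plain regex search/replace has no notion of the namespace in scope, which is why it cannot make this distinction reliably.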
One side-effect of XSLT processing is the resolution of character entity
references. So where the Guidelines code has this:
<formula>n&#xD7;(n-1)</formula>
the output will resolve the numeric entity like this:
<formula>n×(n-1)</formula>
I would like to preserve the entity references in their original state,
but the only way to do this is to specify the output encoding as
us-ascii, and that means that ALL non-ASCII characters would become
entities -- obviously not what we want. There's no actual way to
preserve only the existing entities as entities; they're resolved to
their codepoints during the XML parse before the XSLT transform is done.
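The same behaviour can be seen outside XSLT. The sketch below uses Python's standard-library ElementTree, not an XSLT processor, on a hypothetical fragment that mixes one numeric reference with one literal non-ASCII character. After parsing, the two are indistinguishable, and the serializer can only write both as characters or both as references.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the multiplication sign is written as a numeric
# reference, while the e-acute in "cafe" is a literal character.
source = '<p><formula>n&#xD7;(n-1)</formula> caf\u00e9</p>'

root = ET.fromstring(source)

# The parser has already resolved the reference to U+00D7.
formula_text = root.find("formula").text
print(formula_text == "n\u00d7(n-1)")

# Serializing as Unicode text: both characters are written literally.
utf8_out = ET.tostring(root, encoding="unicode")

# Serializing as us-ascii: every non-ASCII character becomes a numeric
# reference, including the e-acute that was never one in the source.
ascii_out = ET.tostring(root, encoding="us-ascii").decode("ascii")

print(utf8_out)
print(ascii_out)
```

Neither output reproduces the original mix, since the parsed tree keeps no record of which characters were written as references.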
But I don't really think there's any reason to maintain the character
entities, is there? There are only 13 in the Guidelines text, and 53 in
the specs. Does anyone have any objection if they just get resolved to
their characters? Most of them are uncontroversial things like accented
e's and degree signs that will display in most fonts anyway.
Silence means "by all means go ahead, and I promise not to complain
later when I notice what you did."
Cheers,
Martin