[tei-council] Numeric entity references in Guidelines text

Martin Holmes mholmes at uvic.ca
Sat Apr 13 23:09:43 EDT 2013


Hi all,

I'm going to do the conversion of list/@type to list/@rend by 
processing the Guidelines and Specs through XSLT. I have to use XSLT 
rather than regex search/replace because I need to leave alone all the 
instances inside <egXML>s (which are in a different namespace). I'll 
change those manually later, since the explanatory text around them 
will have to be changed at the same time. In the meantime, I'm going to 
do all the ones in the regular TEI namespace with XSLT.
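To show why namespace-aware processing makes this easy, here is a rough Python analogue using the standard xml.etree.ElementTree module instead of XSLT. It is a sketch only. The fragment is hypothetical, and copying @type's value unchanged into @rend is an assumption: the real conversion may map values differently. Because ElementTree matches on the fully qualified name, a <list> inside an <egXML> (Examples namespace) is never visited.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"       # TEI namespace
EG = "http://www.tei-c.org/ns/Examples"   # namespace used inside <egXML>

# Hypothetical fragment: one TEI-namespace list, plus an egXML example
# whose contents (including its own <list>) are in the Examples namespace.
source = f"""<div xmlns="{TEI}">
  <list type="ordered"><item>a</item></list>
  <egXML xmlns="{EG}"><list type="ordered"><item>b</item></list></egXML>
</div>"""

root = ET.fromstring(source)

# iter() matches "{namespace}localname", so only TEI lists are visited;
# the Examples-namespace list is left exactly as it was.
for lst in root.iter(f"{{{TEI}}}list"):
    value = lst.attrib.pop("type", None)
    if value is not None:
        # Assumption: the value is carried over as-is.
        lst.set("rend", value)
```

The lists inside <egXML> keep their @type, ready for the manual pass alongside their explanatory text.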

One side-effect of XSLT processing is the resolution of character entity 
references. So where the Guidelines code has this:

   <formula>n&#x00D7;(n-1)</formula>

and the output will contain the resolved character instead:

   <formula>n×(n-1)</formula>

I would like to preserve the entity references in their original state, 
but the only way to do this is to specify the output encoding as 
us-ascii (<xsl:output encoding="us-ascii"/>), and that would mean ALL 
non-ASCII characters become entities -- obviously not what we want. There's no actual way to 
preserve only the existing entities as entities; they're resolved to 
their codepoints during the XML parse before the XSLT transform is done.
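The same behaviour can be seen outside XSLT. Below is a minimal sketch using Python's standard xml.etree.ElementTree as a stand-in for an XSLT processor (a substitution; a given XSLT engine may differ in detail, for example in whether it writes hex or decimal references). The fragment is hypothetical: one numeric character reference and one literal accented e.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: &#x00D7; written as a reference, and "é"
# typed directly as a literal character.
source = "<p>n&#x00D7;(n-1) and caf\u00e9</p>"

elem = ET.fromstring(source)

# The parser has already replaced &#x00D7; with U+00D7; the tree holds
# only the character, with no record of how it was written.
print(ascii(elem.text))  # 'n\xd7(n-1) and caf\xe9'

# Serializing as Unicode writes both characters out literally.
as_unicode = ET.tostring(elem, encoding="unicode")

# Serializing as US-ASCII turns EVERY non-ASCII character into a
# reference, including the é that never was one, and in decimal form.
as_ascii = ET.tostring(elem, encoding="us-ascii").decode("ascii")
print(as_ascii)  # <p>n&#215;(n-1) and caf&#233;</p>
```

Note that even the ASCII route does not round-trip the original spelling: &#x00D7; comes back as &#215;.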

But I don't really think there's any reason to maintain the character 
entities, is there? There are only 13 in the Guidelines text, and 53 in 
the specs. Does anyone have any objection if they just get resolved to 
their characters? Most of them are uncontroversial things like accented 
e's and degree signs that will display in most fonts anyway.

Silence means "by all means go ahead, and I promise not to complain 
later when I notice what you did."

Cheers,
Martin

