[tei-council] Numeric entity references in Guidelines text

Syd Bauman Syd_Bauman at Brown.edu
Sun Apr 14 01:07:02 EDT 2013


1) I don't care at all whether you change numeric character
   references to their characters or vice versa. Feel free.

2) I'm not convinced that you really need to leave the lists in
   <eg:egXML> to be changed manually later. (This is an academic
   point, though; I don't care how you do it.) Only 5 of them
   (i.e., <eg:list> elements) have @rend, and I think all 5 of those
   also have @type. So couldn't you do those 5 by hand first, and
   then change all the rest globally?
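The "by hand first, then globally" approach can be done as plain text processing. One practical difference from the XSLT route is that a regex pass never parses the XML, so numeric character references such as `&#x00D7;` come through untouched. The Python sketch below is a hypothetical illustration, not the procedure actually used on the Guidelines. It assumes the conversion is a straight rename of `type=` to `rend=` on `<list>` and prefixed `<eg:list>` start tags, and that no attribute value contains `>`. Tags that already carry `@rend` (presumably why those 5 matter) are set aside for hand-editing, since renaming there would produce a duplicate attribute. The name `rename_type_to_rend` is invented for this example.

```python
import re

# Start tag of <list> or a prefixed <eg:list>; group 2 holds the attributes.
# Assumption: no attribute value contains ">".
LIST_START_TAG = re.compile(r"<((?:[\w.-]+:)?list)(\s[^>]*)?>")
HAS_REND = re.compile(r"\srend\s*=")
TYPE_ATTR = re.compile(r"(\s)type(\s*=)")


def rename_type_to_rend(text):
    """Hypothetical textual pass: rename @type to @rend on list start tags.

    Tags that already carry @rend are left unchanged and returned for
    hand-editing, since renaming would give them two @rend attributes.
    Everything else, including numeric character references, is copied
    through unchanged because the XML is never parsed.
    """
    for_hand_editing = []

    def fix(match):
        name, attrs = match.group(1), match.group(2) or ""
        if HAS_REND.search(attrs):
            for_hand_editing.append(match.group(0))
            return match.group(0)
        new_attrs = TYPE_ATTR.sub(r"\1rend\2", attrs)
        return "<" + name + new_attrs + ">"

    return LIST_START_TAG.sub(fix, text), for_hand_editing


src = ('<list type="ordered"><item>n&#x00D7;(n-1)</item></list>\n'
       '<eg:list type="gloss" rend="inline"/>')
converted, skipped = rename_type_to_rend(src)
print(converted)  # &#x00D7; is still there, in its original hex form
print(skipped)    # the <eg:list> that already has @rend
```

Because `list` must follow `<` (or a `prefix:`) directly and be followed by whitespace or `>`, elements such as `<listBibl>` and end tags are not touched.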

> I'm going to convert list/@type to list/@rend by processing the
> Guidelines and Specs through XSLT. I have to use XSLT rather than
> regex search/replace because I need to leave alone all the
> instances inside <egXML>s (i.e., those in a different namespace):
> I'll change those manually later, since their explanatory text has
> to be changed at the same time. In the meantime, I'm going to use
> XSLT for all the ones in the regular TEI namespace.
> 
> One side effect of XSLT processing is that numeric character
> references get resolved. So where the Guidelines source has this:
> 
>    <formula>n&#x00D7;(n-1)</formula>
> 
> the output will resolve the numeric entity like this:
> 
>    <formula>n×(n-1)</formula>
> 
> I would like to preserve the references in their original form,
> but the closest I can get is to specify the output encoding as
> us-ascii, and that means that ALL non-ASCII characters would become
> references, which is obviously not what we want. There's no way to
> keep only the existing references as references: the XML parser
> resolves them to their code points before the XSLT transformation
> is run.
> 
> But I don't really think there's any reason to keep these numeric
> references, is there? There are only 13 in the Guidelines text and
> 53 in the specs. Does anyone object to simply resolving them to their
> characters? Most of them are uncontroversial things like accented
> e's and degree signs, which will display in most fonts anyway.
> 
> Silence means "by all means go ahead, and I promise not to complain 
> later when I notice what you did."
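The side effect described in the quoted message can be reproduced outside XSLT. This Python sketch uses the standard library's `xml.etree.ElementTree` in place of an XSLT processor (a substitution for illustration only; the message concerns XSLT). It moves `@type` to `@rend` only on `<list>` elements in the TEI namespace (`http://www.tei-c.org/ns/1.0`), leaving those in the Examples namespace (`http://www.tei-c.org/ns/Examples`) used inside `<eg:egXML>` alone, and then shows that `&#x00D7;` is already gone once the document is parsed. The sample document and the function name `convert_tei_lists` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
EG_NS = "http://www.tei-c.org/ns/Examples"  # namespace of <eg:egXML> content

# Serialize with the usual prefixes rather than ElementTree's ns0:, ns1:.
ET.register_namespace("", TEI_NS)
ET.register_namespace("eg", EG_NS)


def convert_tei_lists(root):
    """Hypothetical pass: move @type to @rend on TEI-namespace <list> only.

    iter() matches the namespace-qualified name, so <eg:list> elements
    inside <eg:egXML> are never visited and keep their @type.
    """
    for lst in root.iter(f"{{{TEI_NS}}}list"):
        if "type" in lst.attrib and "rend" not in lst.attrib:
            lst.set("rend", lst.attrib.pop("type"))


source = (
    f'<div xmlns="{TEI_NS}" xmlns:eg="{EG_NS}">'
    "<formula>n&#x00D7;(n-1)</formula>"
    '<list type="ordered"/>'
    '<eg:egXML><eg:list type="ordered"/></eg:egXML>'
    "</div>"
)

root = ET.fromstring(source)
convert_tei_lists(root)

# The parser resolved &#x00D7; to U+00D7 before the tree was built.
formula = root.find(f"{{{TEI_NS}}}formula").text
print(formula == "n\u00d7(n-1)")  # True

# Unicode/UTF-8 output writes the character itself.
as_unicode = ET.tostring(root, encoding="unicode")

# us-ascii output escapes every non-ASCII character, here as decimal &#215;,
# not the hex &#x00D7; that was in the source.
as_ascii = ET.tostring(root, encoding="us-ascii").decode("ascii")
print(as_ascii)
```

Even the us-ascii route does not restore the original spelling: ElementTree writes decimal references, and it has no record of which characters were references in the source.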

