[tei-council] Numeric entity references in Guidelines text
Syd Bauman
Syd_Bauman at Brown.edu
Sun Apr 14 01:07:02 EDT 2013
1) I don't care at all if you change numeric character entities to
their characters or vice-versa. Feel free.
2) I'm not convinced that you really need to change those in
<eg:egXML> later manually. (Although this is an academic point --
I don't care how you do it.) There are only 5 of them (i.e.,
<eg:list>) with @rend (and I think all 5 of those also have
@type). So couldn't you do those by hand first, then globally
change all?
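Point 2 depends on a count that is easy to re-check before anything is converted. Below is a minimal Python sketch. It assumes the Guidelines source is parsed with the standard library's xml.etree.ElementTree and that egXML content is in the TEI Examples namespace (http://www.tei-c.org/ns/Examples). The function name `eg_lists_with_rend` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI Examples namespace: the one the "eg:" prefix above stands for.
EG_NS = "http://www.tei-c.org/ns/Examples"

# Hypothetical fragment: one ordinary TEI list and two example lists.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <list type="ordered"><item/></list>
  <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <list type="ordered" rend="numbered"><item/></list>
    <list type="gloss"><item/></list>
  </egXML>
</div>"""


def eg_lists_with_rend(root):
    """Return (rend, type) pairs for every eg:list that carries @rend.

    Only elements in the Examples namespace match, so ordinary TEI
    <list> elements are skipped automatically.
    """
    return [
        (lst.get("rend"), lst.get("type"))
        for lst in root.iter("{" + EG_NS + "}list")
        if lst.get("rend") is not None
    ]


pairs = eg_lists_with_rend(ET.fromstring(SAMPLE))
print(pairs)
# True only if every eg:list with @rend also has @type.
print(all(t is not None for _, t in pairs))
```

Run against the real sources, the length of `pairs` would show whether there really are only 5 such lists, and the second check would show whether all of them also have @type.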
> I'm going to be doing the conversion of list/@type to list/@rend by
> processing the Guidelines and Specs through XSLT. I have to use
> XSLT rather than regex search/replace because I need to leave alone
> all the instances inside <egXML>s (i.e., in a different namespace):
> I'll change those manually later, since the surrounding
> explanatory text will have to be changed at the same time. In the
> meantime, I'm going to do all the ones in the regular TEI namespace
> with XSLT.
>
> One side-effect of XSLT processing is the resolution of character
> entity references. So where the Guidelines code has this:
>
> <formula>n&#215;(n-1)</formula>
>
> the output will resolve the numeric entity like this:
>
> <formula>n×(n-1)</formula>
>
> I would like to preserve the entity references in their original
> state, but the only way to do this is to specify the output encoding
> as us-ascii, and that means that ALL non-ASCII characters would
> become entities -- obviously not what we want. There's no actual way
> to preserve only the existing entities as entities; they're resolved
> to their codepoints during the XML parse before the XSLT transform is
> done.
>
> But I don't really think there's any reason to maintain the character
> entities, is there? There are only 13 in the Guidelines text, and 53
> in the specs. Does anyone have any objection if they just get
> resolved to their characters? Most of them are uncontroversial things
> like accented e's and degree signs that will display in most fonts
> anyway.
>
> Silence means "by all means go ahead, and I promise not to complain
> later when I notice what you did."
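The parse-time resolution described in the quoted message takes only a few lines of Python to demonstrate. This sketch uses the standard library's xml.etree.ElementTree in place of an XSLT processor. That is a swapped-in tool, assumed to behave the same way on this particular point, and the fragment is hypothetical. It shows three things:

* The parsed tree never contains `&#215;`.
* A us-ascii serialization escapes every non-ASCII character, not just the characters that were references in the source.
* Counting the references, as with the 13 and 53 above, means scanning the raw source text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment: one numeric character reference (&#215; is
# U+00D7 MULTIPLICATION SIGN) and one literal non-ASCII character (é)
# that was never a reference in the source.
SOURCE = "<p><formula>n&#215;(n-1)</formula> caf\u00e9</p>"

# 1. The parser resolves the reference, so the tree holds only "×".
root = ET.fromstring(SOURCE)
formula = root.find("formula")

# 2. A us-ascii serialization turns *every* non-ASCII character into a
#    reference, including the é that was typed literally.
ascii_out = ET.tostring(root, encoding="us-ascii").decode("ascii")

# 3. A Unicode serialization writes both as literal characters, which
#    is what resolving the references amounts to.
utf8_out = ET.tostring(root, encoding="unicode")

# 4. References are gone after parsing, so counting them means a regex
#    over the raw text (a rough count: it would also match in comments).
NUMERIC_REF = re.compile(r"&#(?:[0-9]+|x[0-9A-Fa-f]+);")
ref_count = len(NUMERIC_REF.findall(SOURCE))

print(ascii_out)  # <p><formula>n&#215;(n-1)</formula> caf&#233;</p>
print(ref_count)  # 1
```

Output 2 shows why us-ascii output is not a way to preserve only the original references. The serializer cannot tell which characters came from references, because the parser has already discarded that information.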