[tei-council] Numeric entity references in Guidelines text

Martin Holmes mholmes at uvic.ca
Sun Apr 14 07:39:38 EDT 2013


Hi Syd,

I want to change our usage first, before changing the examples, because 
I want to take time to look at the examples -- there's explanatory text 
to rewrite there.

Cheers,
Martin

On 13-04-14 01:07 AM, Syd Bauman wrote:
> 1) I don't care at all if you change numeric character entities to
>     their characters or vice-versa. Feel free.
>
> 2) I'm not convinced that you really need to change those in
>     <eg:egXML> later manually. (Although this is an academic point --
>     I don't care how you do it.) There are only 5 of them (i.e.,
>     <eg:list>) with @rend (and I think all 5 of those also have
>     @type). So couldn't you do those by hand first, then globally
>     change all?
>
>> I'm going to be doing the conversion of list/@type to list/@rend by
>> processing the Guidelines and Specs through XSLT. I have to use
>> XSLT rather than regex search/replace because I need to leave alone
>> all the instances in <egXML>s (meaning in a different namespace)
>> because I need to change all those manually later, since the
>> explanatory text will have to be changed at the same time. In the
>> meantime, I'm going to do all the ones in the regular TEI namespace
>> with XSLT.
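
For illustration only, here is a minimal sketch of the namespace-aware selection described above. It uses Python's standard library in place of the XSLT, so it is not the stylesheet actually used. The helper name `retag_lists`, the sample document, and the straight copy of the @type value into @rend are all assumptions made for the example; the real conversion may map values differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
EG_NS = "http://www.tei-c.org/ns/Examples"  # namespace of <egXML> content


def retag_lists(root):
    """Move @type to @rend on TEI-namespace <list> elements only.

    Hypothetical helper: the value is copied unchanged, which is an
    assumption, not something the original message specifies.
    """
    changed = 0
    # Matching on the namespace-qualified tag skips lists inside <egXML>,
    # because those live in the Examples namespace.
    for el in root.iter(f"{{{TEI_NS}}}list"):
        if "type" in el.attrib and "rend" not in el.attrib:
            el.set("rend", el.attrib.pop("type"))
            changed += 1
    return changed


# Made-up sample: one regular TEI list and one list inside an example.
SAMPLE = f"""<div xmlns="{TEI_NS}">
  <list type="ordered"><item>one</item></list>
  <egXML xmlns="{EG_NS}">
    <list type="ordered"><item>two</item></list>
  </egXML>
</div>"""

root = ET.fromstring(SAMPLE)
count = retag_lists(root)
tei_list = root.find(f"{{{TEI_NS}}}list")
eg_list = root.find(f"{{{EG_NS}}}egXML/{{{EG_NS}}}list")

print(count)            # 1
print(tei_list.attrib)  # {'rend': 'ordered'}
print(eg_list.attrib)   # {'type': 'ordered'}
```

A plain regex search/replace cannot make this distinction, since both lists look identical as text; only the namespace in scope tells them apart.
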
>>
>> One side-effect of XSLT processing is the resolution of character
>> entity references. So where the Guidelines code has this:
>>
>>     <formula>n&#x00D7;(n-1)</formula>
>>
>> the output will resolve the numeric entity like this:
>>
>>     <formula>n×(n-1)</formula>
>>
>> I would like to preserve the entity references in their original
>> state, but the only way to do this is to specify the output encoding
>> as us-ascii, and that means that ALL non-ASCII characters would
>> become entities -- obviously not what we want. There's no actual way
>> to preserve only the existing entities as entities; they're resolved
>> to their codepoints during the XML parse before the XSLT transform is
>> done.
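
The same effect can be reproduced outside XSLT. The sketch below again uses Python's standard library as a stand-in, with a made-up sample string, to show that the reference is already gone once the parser has run, and that serializing as us-ascii turns every non-ASCII character into a reference, not just the original one.

```python
import xml.etree.ElementTree as ET

# Made-up sample: one literal non-ASCII character (e-acute) and one
# numeric character reference (&#x00D7;, the multiplication sign).
SOURCE = "<p>caf\u00e9 <formula>n&#x00D7;(n-1)</formula></p>"

root = ET.fromstring(SOURCE)

# After parsing, the tree holds only the character; nothing records
# that it was written as a reference in the source.
formula_text = root.find("formula").text

# Serializing as Unicode text writes both characters literally.
as_unicode = ET.tostring(root, encoding="unicode")

# Serializing as us-ascii escapes every non-ASCII character, including
# the e-acute that was never a reference in the source.
as_ascii = ET.tostring(root, encoding="us-ascii").decode("ascii")

print(formula_text == "n\u00d7(n-1)")  # True
print("&#" in as_unicode)              # False
print(as_ascii)  # <p>caf&#233; <formula>n&#215;(n-1)</formula></p>
```

Note that this serializer writes decimal references, so even the original hexadecimal form `&#x00D7;` does not survive the round trip.
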
>>
>> But I don't really think there's any reason to maintain the character
>> entities, is there? There are only 13 in the Guidelines text, and 53
>> in the specs. Does anyone have any objection if they just get
>> resolved to their characters? Most of them are uncontroversial things
>> like accented e's and degree signs that will display in most fonts
>> anyway.
>>
>> Silence means "by all means go ahead, and I promise not to complain
>> later when I notice what you did."

