[tei-council] ampersand report

Syd Bauman s.bauman at neu.edu
Wed Nov 13 18:49:29 EST 2013


[Our Chair asked me to look for entity references, particularly
SYSTEM entity references, before 23:00 tonight. Sorry I'm a bit
late.]

Dear James --

Looking through all .xml files in P5/Source/Specs/ and in
P5/Source/Guidelines/*/ I found 127 candidate strings on 115 lines.
These are strings that match the production for an entity reference
but are not one of &amp;, &gt;, &lt;, &apos;, or &quot;.[1] (They are
candidates because they are not entity references if they occur in a
comment or CDATA marked section.)
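
For illustration only, here is a minimal Python sketch of the kind of
scan described above. It is not the procedure actually used for this
report. The names ENTITY_REF, BUILT_IN, and find_candidates are
hypothetical, and the regular expression only approximates the XML
EntityRef production ('&' Name ';'), since it accepts ASCII name-start
characters only.

```python
import re

# Approximation of the XML EntityRef production: '&' Name ';'.
# Assumption: ASCII name-start characters only. Character references
# such as &#x26; never match, because '#' cannot start a Name.
ENTITY_REF = re.compile(r"&([A-Za-z_:][\w.:-]*);")

# The five predefined entities, which are not of interest here.
BUILT_IN = {"amp", "gt", "lt", "apos", "quot"}


def find_candidates(lines):
    """Return (line_number, entity_name) pairs for non-built-in references."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for match in ENTITY_REF.finditer(line):
            if match.group(1) not in BUILT_IN:
                hits.append((number, match.group(1)))
    return hits


sample = [
    "<p>Fish &amp; chips &duck;</p>",
    "<!-- &goose; --> &swan;",
    "<p>&#x26; is a character reference</p>",
]
hits = find_candidates(sample)
candidate_lines = {number for number, _ in hits}
print(len(hits), "candidate strings on", len(candidate_lines), "lines")
```

Note that &goose; is still reported here even though it sits inside a
comment; that is exactly why the results are only candidates.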

After knocking off those that were obviously inside comments (i.e.,
had both "<!--" before the candidate string and "-->" after it, all on
the same line -- and yes, I remembered to check for cases like "-->
&duck; <!--", although I only did that by eye), only 79 candidate
strings on 75 lines remained.

After knocking off those that were obviously inside CDATA marked
sections, only 73 candidates on 71 lines remained, spread over files
as follows:
   1 Source/Guidelines/en/ND-NamesDates.xml
   1 Source/Guidelines/fr/ND-NamesDates.xml
   1 Source/Guidelines/en/SA-LinkingSegmentationAlignment.xml
   1 Source/Guidelines/fr/SA-LinkingSegmentationAlignment.xml
   1 Source/Specs/re.xml
  66 Source/Guidelines/fr/HD-Header.xml

Those that are in ND and SA (which are being found in both en/ and
fr/, but of course there's really only 1 file) are just in comments
that my procedure didn't catch.[2] The one in re.xml is also in a
comment.[3]
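
A sketch of one way to avoid the multi-line misses, again in Python
and again only an illustration, not what was run for this report:
blank out comments and CDATA marked sections across the whole file
before scanning, keeping the newlines so line numbers still match.
The names COMMENT_OR_CDATA, blank_out, and real_references are
hypothetical, and the sketch assumes no DOCTYPE internal subset or
other constructs that would need a real XML parser.

```python
import re

# Comments and CDATA marked sections, possibly spanning several lines.
# Leftmost, non-greedy matching means whichever construct opens first
# is consumed up to its own closing delimiter.
COMMENT_OR_CDATA = re.compile(r"<!--.*?-->|<!\[CDATA\[.*?\]\]>", re.DOTALL)

# Same approximation of '&' Name ';' (ASCII name-start characters only).
ENTITY_REF = re.compile(r"&([A-Za-z_:][\w.:-]*);")
BUILT_IN = {"amp", "gt", "lt", "apos", "quot"}


def blank_out(match):
    """Replace everything except newlines with spaces."""
    return re.sub(r"[^\n]", " ", match.group(0))


def real_references(text):
    """Return (line_number, name) pairs outside comments and CDATA."""
    cleaned = COMMENT_OR_CDATA.sub(blank_out, text)
    found = []
    for number, line in enumerate(cleaned.split("\n"), start=1):
        for match in ENTITY_REF.finditer(line):
            if match.group(1) not in BUILT_IN:
                found.append((number, match.group(1)))
    return found


sample = (
    "<p>&duck;</p>\n"
    "<!-- spread over\n"
    "&goose; lines -->\n"
    "<eg><![CDATA[&swan;]]></eg> &heron;\n"
)
found = real_references(sample)
print(found)
```

Because the match is non-greedy, a line like "--> &duck; <!--" that
falls between two comments keeps its reference.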

A few of the 66 in fr/HD are in comments, but most are real entity
references. To be honest, I'm not sure how French P5 builds, as I was
not able to find where these entities are declared. (Although I only
looked for a few minutes.)

Notes
-----
[1] There were 684 occurrences of these 5 "built-in" entity references
    which are not of interest.
[2] Because "<!--" and "-->" were on different lines.
[3] A comment that was spread over several lines. Interestingly, I
    wrote it in January of 2002, and it describes a change that to
    this day I don't know whether I got right.

