[tei-council] Constraints on anyURI

Martin Holmes mholmes at uvic.ca
Mon Nov 19 13:17:05 EST 2012


I was just addressing myself to some attribute-abuse I've been knowingly 
perpetrating in one of my projects, where I'm using @sameAs with a sort 
of key-like thing:

<m sameAs="n-CTL">...</m>

and I was intending to switch these values to a private URI scheme using 
a prefix. However, in the process I discovered that our encoders have 
been abusing the attribute in a broader sense, by putting multiple 
values in there, occasionally separated by commas:

<m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m> (full example is below).

This validates against RNG schemas. The datatype of @sameAs is a single 
data.pointer, which is xsd:anyURI, so both the spaces and the comma 
should ideally trigger an error. But the schema doesn't seem to attempt 
to check anyURI values as far as I can see.

Would it be practical to make this happen? In other words, could we 
(perhaps through Schematron) enforce compliance with RFC 3986 and 3987 
for xsd:anyURI values?
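
To make the question concrete, here is a minimal sketch of what such a Schematron check might look like. It is not a full RFC 3986/3987 validator: it only catches the two symptoms described above, and only on @sameAs. The pattern id and the message wording are made up for illustration, and it assumes an XSLT 2.0 query binding so that matches() is available. Note also that RFC 3986 itself permits commas inside a URI reference, so the comma test is written as a project-level warning rather than as a syntax error.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
            queryBinding="xslt2">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <!-- Hypothetical pattern id; any unique name would do. -->
  <sch:pattern id="sameAs-single-pointer">
    <!-- Fires on every TEI element that carries @sameAs. -->
    <sch:rule context="tei:*[@sameAs]">
      <!-- Internal whitespace means more than one token, which a
           single data.pointer should not allow. -->
      <sch:assert test="not(matches(normalize-space(@sameAs), '\s'))">
        @sameAs takes a single pointer, but "<sch:value-of select="@sameAs"/>" contains whitespace.
      </sch:assert>
      <!-- Assumption: commas are banned by project policy only;
           RFC 3986 allows them as sub-delimiters. -->
      <sch:report test="contains(@sameAs, ',')" role="warning">
        @sameAs value "<sch:value-of select="@sameAs"/>" contains a comma, which probably indicates a list of values.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Run against the entry below, a rule like this would flag sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ" on both counts and leave sameAs="DIST" alone. Extending it to every attribute typed as data.pointer would mean listing those attributes explicitly, since Schematron cannot see the schema datatype.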

I did try validating the file with tei_all.xsd:

<http://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd>

but it doesn't seem to be working properly; I get lots of error messages 
like this:

"Engine name: Xerces
Severity: error
Description: src-resolve: Cannot resolve the name 'xml:base' to a(n) 
'attribute declaration' component.
Start location: 926:35
URL: http://www.w3.org/TR/xmlschema-1/#src-resolve"


[Full entry from which example above comes]
<entry xml:id="ḥəyḥuyn_lx">

                <form>
                   <pron>
                      <seg type="p" subtype="i">ḥəyḥúyn lx</seg>
                      <bibl corresp="psn:ECH">ECH</bibl>
                      <seg type="n">ḥəyḥóyənlᵊx</seg>
                      <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
                   </pron>
                   <hyph>
                      <m sameAs="DIST">CəC</m>+√<m sameAs="ḥuy">ḥúy</m>-<m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m>
                      <m sameAs="lx">lx</m>
                   </hyph>
                </form>

                <sense>
                   <def>
                      <seg>
                         <gloss>annoy</gloss>; <gloss>bother</gloss> someone; <gloss>disturb</gloss>
                      </seg>
                      <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
                   </def>
                </sense>

</entry>

Cheers,
Martin
-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)

