[tei-council] Constraints on anyURI

Martin Holmes mholmes at uvic.ca
Mon Nov 19 14:19:16 EST 2012


Further to this: the XML Schema Datatypes Recommendation says:

"Note:  Each URI scheme imposes specialized syntax rules for URIs in 
that scheme, including restrictions on the syntax of allowed fragment 
identifiers. Because it is impractical for processors to check that a 
value is a context-appropriate URI reference, this specification follows 
the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such 
rules and restrictions are not part of type validity and are not checked 
by ·minimally conforming· processors. Thus in practice the above 
definition imposes only very modest obligations on ·minimally 
conforming· processors. "

Looking at RFC 3987 (IRIs), I think it's probably impractical to do 
anything approaching real validation, but it might be possible to catch 
some obvious errors (such as spaces in single data.pointer values, or 
percent characters not followed by two hexadecimal digits). Is this worth 
pursuing?
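
As a rough illustration of the kind of "obvious error" check meant here, 
the following Python sketch flags whitespace and malformed percent-escapes 
in a single pointer value. The function and pattern names are hypothetical 
and not part of any TEI or schema tooling; it is nowhere near full RFC 
3986/3987 validation, and it deliberately does not flag commas, which RFC 
3986 permits as sub-delimiters.

```python
import re

# Hypothetical names for illustration only; not part of any TEI tooling.
# A percent sign must be followed by exactly two hexadecimal digits.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")
# Any whitespace inside a single pointer value is suspicious.
_WHITESPACE = re.compile(r"\s")


def obvious_pointer_errors(value: str) -> list[str]:
    """Return a list of obvious problems in a single data.pointer value.

    This only catches the two error classes mentioned above; an empty
    list does NOT mean the value is a valid URI or IRI.
    """
    problems = []
    if _WHITESPACE.search(value):
        problems.append("contains whitespace")
    if _BAD_PERCENT.search(value):
        problems.append("percent sign not followed by two hex digits")
    return problems


if __name__ == "__main__":
    for sample in ["psn:ECH", "n-CTL t-TR Ø-OBJ, n-SUBJ", "a%2Gb"]:
        print(repr(sample), obvious_pointer_errors(sample))
```

The same two tests could presumably be expressed as regular-expression 
assertions in Schematron; Python is used here only to keep the sketch 
self-contained.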

Cheers,
Martin

On 12-11-19 10:17 AM, Martin Holmes wrote:
> I was just addressing myself to some attribute-abuse I've been knowingly
> perpetrating in one of my projects, where I'm using @sameAs with a sort
> of key-like thing:
>
> <m sameAs="n-CTL">...</m>
>
> and I was intending to switch these values to a private URI scheme using
> a prefix. However, in the process I discovered that our encoders have
> been abusing the attribute in a broader sense, by putting multiple
> values in there, occasionally separated by commas:
>
> <m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m> (full example is below).
>
> This validates against RNG schemas. The datatype of @sameAs is a single
> data.pointer, which is xsd:anyURI, so both the spaces and the comma
> should ideally trigger an error. But the schema doesn't seem to attempt
> to check anyURI values as far as I can see.
>
> Would it be practical to make this happen? In other words, could we
> (perhaps through Schematron) enforce compliance with RFC 3986 and 3987
> for xsd:anyURI values?
>
> I did try validating the file with tei_all.xsd:
>
> <http://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd>
>
> but it doesn't seem to be working properly; I get lots of error messages
> like this:
>
> "Engine name: Xerces
> Severity: error
> Description: src-resolve: Cannot resolve the name 'xml:base' to a(n)
> 'attribute declaration' component.
> Start location: 926:35
> URL: http://www.w3.org/TR/xmlschema-1/#src-resolve"
>
>
> [Full entry from which example above comes]
> <entry xml:id="ḥəyḥuyn_lx">
>
>                  <form>
>                     <pron>
>                        <seg type="p" subtype="i">ḥəyḥúyn lx</seg>
>                        <bibl corresp="psn:ECH">ECH</bibl>
>                        <seg type="n">ḥəyḥóyənlᵊx</seg>
>                        <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
>                     </pron>
>                     <hyph>
>                        <m sameAs="DIST">CəC</m>+√<m
> sameAs="ḥuy">ḥúy</m>-<m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m>
>                        <m sameAs="lx">lx</m>
>                     </hyph>
>                  </form>
>
>                  <sense>
>                     <def>
>                        <seg>
>                           <gloss>annoy</gloss>; <gloss>bother</gloss>
> someone; <gloss>disturb</gloss>
>                        </seg>
>                        <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
>                     </def>
>                  </sense>
>
>               </entry>
>
> Cheers,
> Martin
>

-- 
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)

