[tei-council] Constraints on anyURI
Martin Holmes
mholmes at uvic.ca
Mon Nov 19 14:19:16 EST 2012
Further to this: the XML Schema Datatypes rec says:
"Note: Each URI scheme imposes specialized syntax rules for URIs in
that scheme, including restrictions on the syntax of allowed fragment
identifiers. Because it is impractical for processors to check that a
value is a context-appropriate URI reference, this specification follows
the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such
rules and restrictions are not part of type validity and are not checked
by ·minimally conforming· processors. Thus in practice the above
definition imposes only very modest obligations on ·minimally
conforming· processors."
Looking at RFC 3987 (IRIs), I think it's probably impractical to do
anything approaching real validation, but it might be possible to catch
some obvious errors (such as spaces in single data.pointer values, or
percent characters not followed by hexadecimal numbers). Is this worth
pursuing?
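
As a rough sketch of what catching those obvious errors could look like
(Python purely for illustration; the function name obvious_pointer_errors
is hypothetical, and this is nowhere near real RFC 3987 validation). It
assumes leading and trailing whitespace is collapsed before checking, as
the xsd:anyURI whitespace facet does:

```python
import re

# A '%' must be followed by two hexadecimal digits (RFC 3986 pct-encoded).
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def obvious_pointer_errors(value):
    """Return a list of obvious problems in a single data.pointer value.

    Hypothetical helper: it only looks for internal whitespace and
    malformed percent-escapes, nothing scheme-specific.
    """
    problems = []
    # Collapse whitespace so that only internal whitespace is reported.
    normalized = " ".join(value.split())
    if " " in normalized:
        problems.append("contains whitespace")
    if BAD_PERCENT.search(normalized):
        problems.append("percent sign not followed by two hex digits")
    return problems


if __name__ == "__main__":
    for candidate in ["n-CTL", "n-CTL t-TR Ø-OBJ, n-SUBJ", "a%2", "a%20b"]:
        print(repr(candidate), obvious_pointer_errors(candidate))
```

Note that a check like this would flag the spaces in "n-CTL t-TR Ø-OBJ,
n-SUBJ" but not the comma, since a comma is a legal character in a URI
reference under RFC 3986.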
Cheers,
Martin
On 12-11-19 10:17 AM, Martin Holmes wrote:
> I was just addressing myself to some attribute-abuse I've been knowingly
> perpetrating in one of my projects, where I'm using @sameAs with a sort
> of key-like thing:
>
> <m sameAs="n-CTL">...</m>
>
> and I was intending to switch these values to a private URI scheme using
> a prefix. However, in the process I discovered that our encoders have
> been abusing the attribute in a broader sense, by putting multiple
> values in there, occasionally separated by commas:
>
> <m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m> (full example is below).
>
> This validates against RNG schemas. The datatype of @sameAs is a single
> data.pointer, which is xsd:anyURI, so both the spaces and the comma
> should ideally trigger an error. But the schema doesn't seem to attempt
> to check anyURI values as far as I can see.
>
> Would it be practical to make this happen? In other words, could we
> (perhaps through Schematron) enforce compliance with RFC 3986 and 3987
> for xsd:anyURI values?
>
> I did try validating the file with tei_all.xsd:
>
> <http://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd>
>
> but it doesn't seem to be working properly; I get lots of error messages
> like this:
>
> "Engine name: Xerces
> Severity: error
> Description: src-resolve: Cannot resolve the name 'xml:base' to a(n)
> 'attribute declaration' component.
> Start location: 926:35
> URL: http://www.w3.org/TR/xmlschema-1/#src-resolve"
>
>
> [Full entry from which example above comes]
> <entry xml:id="ḥəyḥuyn_lx">
>
> <form>
> <pron>
> <seg type="p" subtype="i">ḥəyḥúyn lx</seg>
> <bibl corresp="psn:ECH">ECH</bibl>
> <seg type="n">ḥəyḥóyənlᵊx</seg>
> <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
> </pron>
> <hyph>
> <m sameAs="DIST">CəC</m>+√<m
> sameAs="ḥuy">ḥúy</m>-<m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m>
> <m sameAs="lx">lx</m>
> </hyph>
> </form>
>
> <sense>
> <def>
> <seg>
> <gloss>annoy</gloss>; <gloss>bother</gloss>
> someone; <gloss>disturb</gloss>
> </seg>
> <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
> </def>
> </sense>
>
> </entry>
>
> Cheers,
> Martin
>
--
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)