[tei-council] Constraints on anyURI
James Cummings
James.Cummings at it.ox.ac.uk
Mon Nov 19 15:28:29 EST 2012
Hi Martin,
I think that, if possible, we should try to validate that
attributes which contain a single data.pointer do not have any
spaces in them. More complex validation is certainly possible,
but I would say it is lower priority and much more error-prone.
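
As a rough illustration of the two "obvious error" checks mentioned in the
thread (whitespace in a value that should hold one pointer, and a percent
sign not followed by two hexadecimal digits), here is a minimal sketch in
Python. The function name pointer_problems and the message strings are
hypothetical, not part of any TEI tooling, and this is nowhere near real
RFC 3986/3987 validation. A comma is deliberately not flagged, since
RFC 3986 permits it in several URI components.

```python
import re
from typing import List

# A '%' that is NOT followed by exactly two hex digits is a broken escape.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def pointer_problems(value: str) -> List[str]:
    """Return obvious problems in a value meant to hold ONE pointer.

    Hypothetical helper: it only catches the two cheap cases discussed
    in the thread and does not attempt full URI/IRI validation.
    """
    problems = []
    # Leading/trailing whitespace is ignored (it would be collapsed
    # anyway); internal whitespace means more than one token is present.
    tokens = value.split()
    if len(tokens) > 1:
        problems.append("contains whitespace")
    if _BAD_PERCENT.search(value):
        problems.append("percent sign not followed by two hex digits")
    return problems


if __name__ == "__main__":
    for sample in ["n-CTL", "n-CTL t-TR Ø-OBJ, n-SUBJ", "psn:100%"]:
        print(repr(sample), pointer_problems(sample))
```

The same two tests could presumably be expressed as Schematron assertions
using XPath string functions; the Python version is only meant to show how
small the check is.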
-James
On 19/11/12 19:19, Martin Holmes wrote:
> Further to this: the XML Schema Datatypes rec says:
>
> "Note: Each URI scheme imposes specialized syntax rules for URIs in
> that scheme, including restrictions on the syntax of allowed fragment
> identifiers. Because it is impractical for processors to check that a
> value is a context-appropriate URI reference, this specification follows
> the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such
> rules and restrictions are not part of type validity and are not checked
> by ·minimally conforming· processors. Thus in practice the above
> definition imposes only very modest obligations on ·minimally
> conforming· processors. "
>
> Looking at RFC 3987 (IRIs), I think it's probably impractical to do
> anything approaching real validation, but it might be possible to catch
> some obvious errors (such as spaces in single data.pointer values, or
> percent characters not followed by hexadecimal numbers). Is this worth
> pursuing?
>
> Cheers,
> Martin
>
> On 12-11-19 10:17 AM, Martin Holmes wrote:
>> I was just addressing myself to some attribute-abuse I've been knowingly
>> perpetrating in one of my projects, where I'm using @sameAs with a sort
>> of key-like thing:
>>
>> <m sameAs="n-CTL">...</m>
>>
>> and I was intending to switch these values to a private URI scheme using
>> a prefix. However, in the process I discovered that our encoders have
>> been abusing the attribute in a broader sense, by putting multiple
>> values in there, occasionally separated by commas:
>>
>> <m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m> (full example is below).
>>
>> This validates against RNG schemas. The datatype of @sameAs is a single
>> data.pointer, which is xsd:anyURI, so both the spaces and the comma
>> should ideally trigger an error. But the schema doesn't seem to attempt
>> to check anyURI values as far as I can see.
>>
>> Would it be practical to make this happen? In other words, could we
>> (perhaps through Schematron) enforce compliance with RFC 3986 and 3987
>> for xsd:anyURI values?
>>
>> I did try validating the file with tei_all.xsd:
>>
>> <http://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd>
>>
>> but it doesn't seem to be working properly; I get lots of error messages
>> like this:
>>
>> "Engine name: Xerces
>> Severity: error
>> Description: src-resolve: Cannot resolve the name 'xml:base' to a(n)
>> 'attribute declaration' component.
>> Start location: 926:35
>> URL: http://www.w3.org/TR/xmlschema-1/#src-resolve"
>>
>>
>> [Full entry from which example above comes]
>> <entry xml:id="ḥəyḥuyn_lx">
>>
>> <form>
>> <pron>
>> <seg type="p" subtype="i">ḥəyḥúyn lx</seg>
>> <bibl corresp="psn:ECH">ECH</bibl>
>> <seg type="n">ḥəyḥóyənlᵊx</seg>
>> <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
>> </pron>
>> <hyph>
>> <m sameAs="DIST">CəC</m>+√<m
>> sameAs="ḥuy">ḥúy</m>-<m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m>
>> <m sameAs="lx">lx</m>
>> </hyph>
>> </form>
>>
>> <sense>
>> <def>
>> <seg>
>> <gloss>annoy</gloss>; <gloss>bother</gloss>
>> someone; <gloss>disturb</gloss>
>> </seg>
>> <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
>> </def>
>> </sense>
>>
>> </entry>
>>
>> Cheers,
>> Martin
>>
>
--
Dr James Cummings, James.Cummings at it.ox.ac.uk
Academic IT Services, University of Oxford