[tei-council] Constraints on anyURI

James Cummings James.Cummings at it.ox.ac.uk
Mon Nov 19 15:28:29 EST 2012


Hi Martin,

I think that, if possible, we should try to validate that 
attributes which contain a single data.pointer do not have any 
spaces in them. More complex validation is certainly possible, 
but it is lower priority, I would say, and much more error-prone.
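Below is a minimal sketch of the two cheap checks suggested in this thread: no whitespace inside a single data.pointer value, and every "%" followed by two hexadecimal digits (the percent-encoded octet form in RFC 3986). It is written in Python purely for illustration. The function name `pointer_problems` is hypothetical. In TEI itself the same tests would more likely live in the schema, e.g. as Schematron rules. It is not a full RFC 3986/3987 validator.

```python
import re

# Hypothetical helper, not part of any TEI tooling.
# Only the XML whitespace characters (space, tab, CR, LF) are flagged.
XML_WHITESPACE = re.compile(r"[ \t\r\n]")
# A '%' that is NOT immediately followed by two hex digits.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def pointer_problems(value: str) -> list[str]:
    """Return the problems found in a value meant to hold ONE data.pointer."""
    # xsd:anyURI uses whiteSpace="collapse", so leading/trailing
    # whitespace is ignored here; internal whitespace is still an error.
    value = value.strip(" \t\r\n")
    problems = []
    if XML_WHITESPACE.search(value):
        problems.append("contains whitespace")
    if BAD_PERCENT.search(value):
        problems.append("'%' not followed by two hexadecimal digits")
    return problems


print(pointer_problems("n-CTL"))                     # []
print(pointer_problems("n-CTL t-TR Ø-OBJ, n-SUBJ"))  # ['contains whitespace']
print(pointer_problems("a%2Gb"))  # ["'%' not followed by two hexadecimal digits"]
```

Non-ASCII characters such as "Ø" or "ḥ" are deliberately not flagged, since RFC 3987 allows them in IRIs.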

-James

On 19/11/12 19:19, Martin Holmes wrote:
> Further to this: the XML Schema Datatypes rec says:
>
> "Note:  Each URI scheme imposes specialized syntax rules for URIs in
> that scheme, including restrictions on the syntax of allowed fragment
> identifiers. Because it is impractical for processors to check that a
> value is a context-appropriate URI reference, this specification follows
> the lead of [RFC 2396] (as amended by [RFC 2732]) in this matter: such
> rules and restrictions are not part of type validity and are not checked
> by ·minimally conforming· processors. Thus in practice the above
> definition imposes only very modest obligations on ·minimally
> conforming· processors. "
>
> Looking at RFC 3987 (IRIs), I think it's probably impractical to do
> anything approaching real validation, but it might be possible to catch
> some obvious errors (such as spaces in single data.pointer values, or
> percent signs not followed by two hexadecimal digits). Is this worth
> pursuing?
>
> Cheers,
> Martin
>
> On 12-11-19 10:17 AM, Martin Holmes wrote:
>> I was just addressing myself to some attribute-abuse I've been knowingly
>> perpetrating in one of my projects, where I'm using @sameAs with a sort
>> of key-like thing:
>>
>> <m sameAs="n-CTL">...</m>
>>
>> and I was intending to switch these values to a private URI scheme using
>> a prefix. However, in the process I discovered that our encoders have
>> been abusing the attribute in a broader sense, by putting multiple
>> values in there, occasionally separated by commas:
>>
>> <m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m> (full example is below).
>>
>> This validates against RNG schemas. The datatype of @sameAs is a single
>> data.pointer, which is xsd:anyURI, so the spaces at least should ideally
>> trigger an error (a comma on its own is a legal URI character). But the
>> schema doesn't seem to attempt
>> to check anyURI values as far as I can see.
>>
>> Would it be practical to make this happen? In other words, could we
>> (perhaps through Schematron) enforce compliance with RFC 3986 and 3987
>> for xsd:anyURI values?
>>
>> I did try validating the file with tei_all.xsd:
>>
>> <http://www.tei-c.org/release/xml/tei/custom/schema/xsd/tei_all.xsd>
>>
>> but it doesn't seem to be working properly; I get lots of error messages
>> like this:
>>
>> "Engine name: Xerces
>> Severity: error
>> Description: src-resolve: Cannot resolve the name 'xml:base' to a(n)
>> 'attribute declaration' component.
>> Start location: 926:35
>> URL: http://www.w3.org/TR/xmlschema-1/#src-resolve"
>>
>>
>> [Full entry from which example above comes]
>> <entry xml:id="ḥəyḥuyn_lx">
>>    <form>
>>       <pron>
>>          <seg type="p" subtype="i">ḥəyḥúyn lx</seg>
>>          <bibl corresp="psn:ECH">ECH</bibl>
>>          <seg type="n">ḥəyḥóyənlᵊx</seg>
>>          <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
>>       </pron>
>>       <hyph>
>>          <m sameAs="DIST">CəC</m>+√<m sameAs="ḥuy">ḥúy</m>-<m sameAs="n-CTL t-TR Ø-OBJ, n-SUBJ">n</m>
>>          <m sameAs="lx">lx</m>
>>       </hyph>
>>    </form>
>>
>>    <sense>
>>       <def>
>>          <seg>
>>             <gloss>annoy</gloss>; <gloss>bother</gloss> someone; <gloss>disturb</gloss>
>>          </seg>
>>          <bibl corresp="psn:JM psn:AM">Y21.96</bibl>
>>       </def>
>>    </sense>
>> </entry>
>>
>> Cheers,
>> Martin
>>
>


-- 
Dr James Cummings, James.Cummings at it.ox.ac.uk
Academic IT Services, University of Oxford

