[tei-council] <content> vs <mixedContent>
Martin Holmes
mholmes at uvic.ca
Fri Oct 3 12:15:34 EDT 2014
On 14-10-03 08:55 AM, Lou Burnard wrote:
> On 03/10/14 16:47, Martin Holmes wrote:
>> I agree, on the assumption that there's nothing intrinsically
>> different between a content model which has some structure but allows
>> <p> anywhere in it, and a model with some structure which allows a
>> text node anywhere in it.
>
> There is a major difference: I don't know of any schema language that
> allows you to say "allow <p> anywhere" now that XML got rid of inclusion
> exceptions.
No, but we can achieve the same effect with alternations between <p> and
members of another model, can't we?
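
To make the alternation idea concrete, here is a minimal sketch in Python
that treats a content model as a regular expression over child element
names. It is only an illustration, not an ODD or schema implementation:
the helper names (with_element_anywhere, is_valid) and the sample element
names "head" and "body" are hypothetical, and the model is assumed to be a
plain sequence of required elements.

```python
import re

def with_element_anywhere(required, extra):
    """Build a pattern for `required` in order, with `extra` allowed
    zero or more times before, between, and after the required names.

    Each child name is encoded as the name plus a trailing space, so
    "p" can never be confused with a longer name that starts with "p".
    """
    filler = f"(?:{re.escape(extra)} )*"
    body = "".join(f"{filler}{re.escape(name)} " for name in required)
    return re.compile(body + filler)

def is_valid(pattern, children):
    """Check a list of child element names against the pattern."""
    encoded = "".join(f"{name} " for name in children)
    return pattern.fullmatch(encoded) is not None

# Hypothetical model: (head, body), with <p> allowed anywhere in it.
model = with_element_anywhere(["head", "body"], "p")
```

Under those assumptions, is_valid(model, ["p", "head", "p", "body"]) holds
while is_valid(model, ["body", "head"]) does not, so the "anywhere" effect
comes purely from repeating the optional <p> slot around each required item.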
>> Are we sure we need to make a distinction at all? In one sense,
>> there's no reason to treat a text node any differently from an element
>> node: mention it when it can appear, and that's the end of it.
>
> The reason for treating it differently is that a serialization can have
> a zero length textnode, which is not quite the same as an empty element.
Could you unpack that for me? Are you saying that this:
<foo></foo>
is different from this:
<bar/>
in a context in which <foo> is allowed to have text node children and
<bar> is not? If so, I'm not quite sure it's true to say that
<foo></foo> has an empty text node (unless the text node is mandatory).
If that were true, then if you specify that <foo> must have a text node
followed by a <thing> node, then this:
<foo><thing/></foo>
would be valid because you could claim that there was an empty text node
lurking in there. That feels strange to me. How do I know there aren't
invalid empty text nodes lurking all over my XML where they're not
supposed to be?
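
For what it's worth, here is a quick check of what one parser reports, as
a minimal sketch using Python's standard xml.dom.minidom (my choice of
tool for illustration; other parsers and data models may differ). The
helper name child_summary is hypothetical.

```python
from xml.dom.minidom import parseString

def child_summary(xml):
    """Return (nodeName, nodeValue) pairs for the root element's children."""
    root = parseString(xml).documentElement
    return [(child.nodeName, child.nodeValue) for child in root.childNodes]

# Start-tag/end-tag pair and self-closed form: minidom reports no child
# nodes for either, so no zero-length text node shows up in the tree.
empty_pair = child_summary("<foo></foo>")
self_closed = child_summary("<foo/>")

# Only the <thing> element is reported here; no text node precedes it.
only_thing = child_summary("<foo><thing/></foo>")

# With real character data, a "#text" child appears before <thing>.
with_text = child_summary("<foo>x<thing/></foo>")
```

In this parser, at least, <foo></foo> and <foo/> come out identical, and
<foo><thing/></foo> has exactly one child, which is the element.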
>> On the other hand, it's a bit special in that all sequences of text
>> nodes collapse to a single node;
>
> eh? I don't think a content model can have a sequence containing just
> <textNode>s. That would be nonsense.
That's not what I meant; I'm thinking of the problem that could arise
from content models in which various collections of elements and text
are interleaved such that a valid sequence of nodes includes more than
one text node together.
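
Here is a minimal sketch of the collapsing behaviour I have in mind, again
using Python's standard xml.dom.minidom purely as an illustration. Two
adjacent text nodes can exist in a tree built by hand, but they do not
survive serialization and reparsing, or an explicit normalize().

```python
from xml.dom.minidom import Document, parseString

# Build <foo> by hand with two adjacent text nodes.
doc = Document()
foo = doc.createElement("foo")
doc.appendChild(foo)
foo.appendChild(doc.createTextNode("one "))
foo.appendChild(doc.createTextNode("two"))
before = len(foo.childNodes)          # two separate text nodes in memory

# Serialize and parse again: the boundary between the nodes is not
# recorded in the markup, so the reparsed tree has a single text node.
serialized = foo.toxml()
reparsed = parseString(serialized).documentElement
after_roundtrip = len(reparsed.childNodes)

# normalize() merges adjacent text nodes in place.
foo.normalize()
after_normalize = len(foo.childNodes)
```

So a model that nominally permits "text, text" can never be told apart,
in a serialized document, from one that permits a single text node.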
>
>> I can imagine content models (especially those relying on pre-defined
>> models) which end up allowing sequences of multiple text nodes; would
>> an ODD processor have to be aware of that?
>
> An ODD processor already has to resolve sequences of identical elements
> resulting from adaptation of pre-defined models (we saw one just the
> other day on tei-l) so I think this is a known problem.
In that case, isn't it just cleaner all round to specify text nodes
explicitly, rather than treating them separately from element nodes via
<mixedContent> or @allowText?
Cheers,
Martin