[tei-council] <content> vs <mixedContent>

Fri Oct 3 12:15:34 EDT 2014

On 14-10-03 08:55 AM, Lou Burnard wrote:
> On 03/10/14 16:47, Martin Holmes wrote:
>> I agree, on the assumption that there's nothing intrinsically
>> different between a content model which has some structure but allows
>> <p> anywhere in it, and a model with some structure which allows a
>> text node anywhere in it.
>
> There is a major difference: I don't know of any schema language that
> allows you to say "allow <p> anywhere" now that XML has got rid of
> inclusion exceptions.

No, but we can achieve the same effect with alternations between <p> and 
members of another model, can't we?
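
To make that concrete, here is a minimal sketch in Python. It treats a
content model as a regular expression over child element names. The
model itself (<head> then <body>, with <p> allowed in every gap) and
the names MODEL and is_valid are assumptions made up for illustration;
they are not taken from any TEI schema or ODD processor.

```python
import re

# Hypothetical content model: <head> followed by <body>, with <p>
# allowed anywhere. Without inclusion exceptions, "anywhere" has to be
# spelled out as an optional run of <p> in every gap of the sequence.
MODEL = re.compile(r"(p )*head (p )*body( p)*")


def is_valid(child_names):
    """Return True if the list of child element names fits MODEL."""
    return MODEL.fullmatch(" ".join(child_names)) is not None


if __name__ == "__main__":
    print(is_valid(["head", "body"]))
    print(is_valid(["p", "head", "p", "p", "body", "p"]))
    print(is_valid(["body", "head"]))
```

The same spelling-out would apply if "p" were replaced by a text node,
which is the parallel I have in mind.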

>> Are we sure we need to make a distinction at all? In one sense,
>> there's no reason to treat a text node any differently from an element
>> node: mention it when it can appear, and that's the end of it.
>
> The reason for treating it differently is that a serialization can have
> a zero length textnode, which is not quite the same as an empty element.

Could you unpack that for me? Are you saying that this:

<foo></foo>

is different from this:

<bar/>

in a context in which <foo> is allowed to have text node children and 
<bar> is not? If so, I'm not quite sure it's true to say that 
<foo></foo> has an empty text node (unless the text node is mandatory). 
If that were true, then if you specify that <foo> must have a text node 
followed by a <thing> node, then this:

<foo><thing/></foo>

would be valid because you could claim that there was an empty text node 
lurking in there. That feels strange to me. How do I know there aren't 
invalid empty text nodes lurking all over my XML where they're not 
supposed to be?
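
One quick way to test the intuition, as a sketch only: parse the
examples with Python's standard xml.dom.minidom and list the child
nodes it reports. The helper name child_kinds is hypothetical, and
this shows what one DOM implementation does, not what any schema
language is obliged to assume.

```python
from xml.dom.minidom import parseString


def child_kinds(xml):
    """Return the nodeName of each child of the root element.

    Text nodes show up as "#text"; elements show up under their name.
    """
    root = parseString(xml).documentElement
    return [node.nodeName for node in root.childNodes]


if __name__ == "__main__":
    print(child_kinds("<foo></foo>"))          # no children reported
    print(child_kinds("<foo/>"))               # no children reported
    print(child_kinds("<foo><thing/></foo>"))  # only the element
    print(child_kinds("<foo>x<thing/></foo>"))
```

In this parser, at least, <foo></foo> and <foo/> both come back with
no children at all, and <foo><thing/></foo> has no text node before
<thing>.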

>> On the other hand, it's a bit special in that all sequences of text
>> nodes collapse to a single node;
>
> eh? I don't think a content model can have a sequence containing just
> <textNode>s. That would be nonsense.

That's not what I meant; I'm thinking of the problem that could arise 
from content models in which various collections of elements and text 
are interleaved such that a valid sequence of nodes includes two or 
more adjacent text nodes.
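
A small sketch of why adjacent text nodes are awkward, again using
Python's xml.dom.minidom purely as an illustration (the function name
count_text_children is hypothetical): two text nodes can sit side by
side in a tree built in memory, but they do not survive a round trip
through serialization.

```python
from xml.dom.minidom import Document, parseString


def count_text_children():
    """Build <foo> with two adjacent text nodes, then round-trip it.

    Returns (children before serializing, children after reparsing).
    """
    doc = Document()
    foo = doc.createElement("foo")
    doc.appendChild(foo)
    foo.appendChild(doc.createTextNode("some "))
    foo.appendChild(doc.createTextNode("text"))
    before = len(foo.childNodes)

    # Serialize and parse again: the boundary between the two text
    # nodes is not representable in the serialized form.
    reparsed = parseString(doc.toxml()).documentElement
    after = len(reparsed.childNodes)
    return before, after


if __name__ == "__main__":
    print(count_text_children())
```

Here the in-memory tree has two children and the reparsed one has a
single merged text node, which is the collapsing I mean.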

>
>> I can imagine content models (especially those relying on pre-defined
>> models) which end up allowing sequences of multiple text nodes; would
>> an ODD processor have to be aware of that?
>
> An ODD processor already has to resolve sequences of identical elements
> resulting from adaptation of pre-defined models (we saw one just the
> other day on tei-l), so I think this is a known problem.

In that case, isn't it just cleaner all round to specify text nodes 
explicitly, rather than treating them separately from element nodes via 
<mixedContent> or @allowText?

Cheers,
Martin