[tei-council] list types and rends: bug 460
Martin Holmes
mholmes at uvic.ca
Mon Dec 23 12:37:23 EST 2013
This makes perfect sense; but the chapter does appear to be making the
claim that @n is, or can be, transcribed from the text:
"If the numbering or other identification for the items in a list is
unremarkable and may be reconstructed by any processing program, no
enumerator need be specified. If however an enumerator is retained in
the encoded text, it may be supplied either by using the n attribute on
the item element, or by using a label element."
The chapter claims that these two examples are equivalent:
I will add two facts, which have seldom occurred in
the composition of six, or even five quartos.
<list rend="runon" type="ordered">
<label>(1)</label>
<item>My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<label>(2)</label>
<item>Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>
I will add two facts, which have seldom occurred in
the composition of six, or even five quartos.
<list rend="runon" type="ordered">
<item n="1">My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<item n="2">Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>
However, the first example transcribes the numerals with parentheses
(presumably as they originally appeared in the text), while the second
provides @n attributes where the numerals do not have parentheses.
Processing #2 will require magic knowledge to restore the parentheses.
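
To make the difference concrete, here is a minimal Python sketch (not anything from the Guidelines) that reads the enumerators back out of both encodings using the standard library's xml.etree.ElementTree. The function names and the ASSUMED_FORMAT string are my own illustrative assumptions, and the snippets are treated as un-namespaced fragments.

```python
import xml.etree.ElementTree as ET

# The two encodings from the chapter, as un-namespaced fragments.
LABELLED = """<list rend="runon" type="ordered">
<label>(1)</label>
<item>My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<label>(2)</label>
<item>Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>"""

NUMBERED = """<list rend="runon" type="ordered">
<item n="1">My first rough manuscript, without any
intermediate copy, has been sent to the press.</item>
<item n="2">Not a sheet has been seen by any human
eyes, excepting those of the author and the printer:
the faults and the merits are exclusively my own.</item>
</list>"""


def enumerators_from_labels(list_xml):
    """Return the transcribed enumerators held in <label> children."""
    root = ET.fromstring(list_xml)
    return [label.text for label in root.findall("label")]


def enumerators_from_n(list_xml):
    """Return the enumerators held in @n on <item> children."""
    root = ET.fromstring(list_xml)
    return [item.get("n") for item in root.findall("item")]


labels = enumerators_from_labels(LABELLED)
numbers = enumerators_from_n(NUMBERED)

# Hypothetical: the processor has to be told the source used parentheses,
# because nothing in the @n encoding records that.
ASSUMED_FORMAT = "({})"
restored = [ASSUMED_FORMAT.format(n) for n in numbers]

print(labels, numbers, restored)
```

The two encodings only yield the same output once ASSUMED_FORMAT is supplied from outside the document, which is the "magic knowledge" in question.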
It's very handy to have normalized versions of chapter, page and list
item numbers or symbols (although if they're regular, in that they
accord with a straightforward count, they can be calculated). But it's
not true to say that these are equivalent to textual transcription; and
I think the chapter prose gives that impression.
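
As a rough illustration of "can be calculated", the following Python sketch checks whether the @n values on a list's items simply follow a straightforward count starting at 1. The function name and both sample lists are hypothetical; the irregular one imitates a source that repeats a number.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one regular list, one with a repeated number.
REGULAR = """<list type="ordered">
<item n="1">first item</item>
<item n="2">second item</item>
<item n="3">third item</item>
</list>"""

IRREGULAR = """<list type="ordered">
<item n="1">first item</item>
<item n="2">second item</item>
<item n="2">third item</item>
</list>"""


def n_follows_simple_count(list_xml):
    """True if the item @n values are exactly "1", "2", ... in order."""
    root = ET.fromstring(list_xml)
    values = [item.get("n") for item in root.findall("item")]
    expected = [str(i) for i in range(1, len(values) + 1)]
    return values == expected


print(n_follows_simple_count(REGULAR))
print(n_follows_simple_count(IRREGULAR))
```

When the check succeeds, @n adds nothing a processor could not derive by counting; when it fails, the value is carrying information about the source, which is exactly where the question of transcription arises.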
Cheers,
Martin
On 13-12-23 08:51 AM, Paul Schaffner wrote:
> This conversation took place completely over the weekend (my offline
> period), so I am late catching up. But my answer would be that to
> expect the contents of all attributes never to be based on something
> in the text is silly. No one said that @n was *transcribed* from the
> text: simply that it is, or can be, based on the text. The extent to
> which you normalize the value is up to you. E.g. we normalize chapter numbers
> into arabic numerals (from roman, or roman/arabic mixtures, etc.)
> when assigning @n values to chapter divs. And we remove all
> printing accidents when capturing page numbers in pb @ns, but
> retain the roman/arabic distinction there. The attributes whose values
> we similarly base in part on a normalized transcription of the text
> are:
>
> pb @n
> p @n
> lg @n
> l @n
> div @n
> item @n
> milestone @n
> milestone @unit
>
> and maybe
>
> gap @rend
>
> In some, but not all of these, cases, label is available:
> but we use label *alongside* @n, not as alternatives.
>
> <div type="chapter" n="1">
> <head>Incipit capitulum primum</head>
>
> <div type="reign" n="Edward IV">
> <head>Here begins the reign of Edward, fourth of that name.</head>
>
> <milestone unit="fol." n="23v"/>
>
> <pb n="42 [recte 43]" facs="..."/>
>
> <p n="3"><label>Reason III.</label> ... </p>
>
>
> pfs
>
> On Sun, Dec 22, 2013, at 22:59, Martin Holmes wrote:
>> I agree completely with Kevin on this. Not only is it hard to correct
>> errors etc., it's also impossible to record styling information for the
>> text in attributes -- what if your page number happens to be in italics,
>> for instance?
>>
>> I've always assumed that the tolerance for transcribed text in @n was a
>> kindness for those moving over from P4, and that it would eventually be
>> phased out. Despite the recent discussion with Peter Robinson et al
>> about the convenience of the old <orig reg="thing">wosname</orig>, I
>> really don't like it.
>>
>> Cheers,
>> Martin
>>
>> On 13-12-22 05:05 PM, Kevin Hawkins wrote:
>>> While it's convenient to put transcribed text in certain attribute
>>> values, like @n and @orig, you can run into problems when doing so if
>>> the text you want to transcribe isn't a valid attribute value or
>>> contains gaiji, or if you want to point out an error.
>>>
>>> So if there was an error in the page numbering, I might capture this by
>>> doing:
>>>
>>> <fw type="pageNum" place="top-right">
>>> <choice>
>>> <sic>29</sic>
>>> <corr>30</corr>
>>> </choice>
>>> </fw>
>>>
>>> For an error in a chapter number, I would have:
>>>
>>> <div>
>>> <head>Chapter
>>> <choice>
>>> <sic>III</sic>
>>> <corr>IV</corr>
>>> </choice>
>>> </head>
>>> </div>
>>>
>>> Similarly, for an error in a list numbering, I would have:
>>>
>>> <list>
>>>
>>> <label>1</label>
>>> <item>first item</item>
>>>
>>> <label>2</label>
>>> <item>second item</item>
>>>
>>> <label>
>>> <choice>
>>> <sic>2</sic>
>>> <corr>3</corr>
>>> </choice>
>>> </label>
>>> <item>third item</item>
>>>
>>> </list>
>>>
>>> On 12/22/13 4:34 PM, Lou Burnard wrote:
>>>> But would you use <label> to capture a page number or chapter number?
>>>> (see my comment earlier in this thread)?
>>>>
>>>>
>>>>
>>>> On 22/12/13 19:13, Kevin Hawkins wrote:
>>>>> Yes, it might be worth exploring this distinction in the Guidelines,
>>>>> giving contrasting examples of it done both ways and noting that the
>>>>> <label> approach is required if you want to capture errors in the original.
>>>>>
>>>>> I've also been thinking about how the question of whether a list can
>>>>> really be ordered depends in part on whether you are transcribing a
>>>>> source document (as we generally assume in TEI) or you are composing
>>>>> something brand new in TEI (as we've tried to support since P5 was
>>>>> released). In other words, when you identify a list in a source
>>>>> document, it has inherent ordering, but when you create your own list,
>>>>> you may want to assert that the list you are creating has no order (even
>>>>> though the act of writing requires that you impose an order when putting
>>>>> it in a document).
>>>>>
>>>>> --Kevin
>>>>>
>>>>> On 12/22/2013 2:03 PM, Martin Holmes wrote:
>>>>>> Thinking more about this, there is some apparent inconsistency in my
>>>>>> position:
>>>>>>
>>>>>> On the one hand, I'm arguing that "1", "2", "3" etc. shouldn't appear in
>>>>>> @n if they appear in the original text, because transcribed text
>>>>>> shouldn't be put into attributes;
>>>>>>
>>>>>> On the other hand, I'm arguing that <list rend="numbered"> should be
>>>>>> used to represent a list which appears with numbers in front of the items.
>>>>>>
>>>>>> But there is some method in this. If the transcriber's view is that the
>>>>>> numerical or bullet-like symbols decorating the items are textual --
>>>>>> in other words, part of the transcription -- then they can use <label>
>>>>>> to capture them. If they believe that the decorations are non-textual
>>>>>> (in the same way that indents, margins, italics and other such features
>>>>>> are non-textual -- maybe supra-textual?), and that they are
>>>>>> typographically consistent, then they can be represented using @rend.
>>>>>> This is a useful distinction. It's interesting that if you create a list
>>>>>> in HTML and set it to list-style-type: decimal, then copy-paste the list
>>>>>> from your browser, the numbers will not be included in the paste.
>>>>>>
>>>>>> Cheers,
>>>>>> Martin
>>>>>>
>>>>>> On 13-12-22 10:45 AM, Martin Holmes wrote:
>>>>>>> On 13-12-22 10:12 AM, Sebastian Rahtz wrote:
>>>>>>>> On 22 Dec 2013, at 17:54, Martin Holmes <mholmes at uvic.ca> wrote:
>>>>>>>>> I see nothing in the definition of @n which suggests it's intended for transcribing things that actually appear in the text:
>>>>>>>>>
>>>>>>>>> <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.global.html#tei_att.n>
>>>>>>>>>
>>>>>>>>> Are there other instances in which we ask people to put transcribed text into attributes? I thought the war on attributes was supposed to eliminate this sort of thing entirely. It seems especially bad when <label> is sitting there for precisely this purpose.
>>>>>>>> if you want a glorious example of our madness, look at att.global.xml:
>>>>>>>>
>>>>>>>> <bibl n=" 1">
>>>>>>>> <bibl n=" 2">
>>>>>>>> <bibl n=" 3">
>>>>>>>>
>>>>>>>> what on earth are those spaces/tabs doing in @n, I wonder??
>>>>>>> That is very hideous. I couldn't bear it so I've removed them. But even
>>>>>>> more amusing is the following French example, in which although the
>>>>>>> nasty @n attributes remain, the @xml:base attribute which is supposed to
>>>>>>> be the point of the example has been deleted. Urg. Should I make up a
>>>>>>> phony @xml:base for that one?
>>>>>>>
>>>>>>>
>>>>>>>> but consider these:
>>>>>>>>
>>>>>>>> <divGen n="Index Nominum" type="NAMES"/>
>>>>>>>> <divGen n="Index Rerum" type="THINGS"/>
>>>>>>>>
>>>>>>>> what is “Index Rerum” if not literal text? mind you, that suggests to me that <divGen> should support <head>.
>>>>>>>
>>>>>>> I've always assumed that divGen is most likely to be used to create a
>>>>>>> modern, external list of contents, rather than to hopefully reconstruct
>>>>>>> programmatically something that appears in the original text; my
>>>>>>> experience with original TOCs is that they're inevitably inconsistent or
>>>>>>> idiosyncratic, and it would be impractical to try to reconstruct them
>>>>>>> mechanically.
>>>>>>>
>>>>>>>> @n "gives a number (or other label) for an element", which surely is something that should have been killed in the Attribute War.
>>>>>>> I have no objection to its being used to provide a label, but not when
>>>>>>> that label is in the original text.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Martin
>>>>>>>
>>>>>>>> --
>>>>>>>> Sebastian Rahtz
>>>>>>>> Director (Research) of Academic IT
>>>>>>>> University of Oxford IT Services
>>>>>>>> 13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
>>>>>>>>
>>>>
--
Martin Holmes
University of Victoria Humanities Computing and Media Centre
(mholmes at uvic.ca)