[tei-council] Report from Berlin
Lou Burnard
lou.burnard at retired.ox.ac.uk
Tue Oct 23 16:07:08 EDT 2012
*EIT MMI Meeting, Berlin, 22 Oct 2012*
As noted at the last FTF, Laurent Romary in his capacity as ISO TC7 WG3
chair has proposed a new ISO/TEI joint activity in the area of speech
transcription, which comes with the slightly obscure label of EIT MMI:
the last part of which is short for “multimodal interaction”, although
it seems the activity is really only concerned with speech
transcription. I was invited to attend the third EIT MMI workshop, held
at the DIN's offices in Berlin. Prime movers in the activity, apart from
Laurent, appear to be Thomas Schmidt and Andreas Witt from the Institut
für Deutsche Sprache in Mannheim, but a number of other European
research labs, mostly concerned with analysis of corpora of human
computer interaction, were also represented; specifically: Nadia Mana
from FBK (Trento, Italy); Tatjana Scheffler (DFKI, Germany); Khiet
Truong (Univ of Twente); Benjamin Weiss (TU Berlin); Mathias Wilhelm
(DAI Labor); Bertrand Gaiffe (ATILF, Nancy). This being an ISO activity,
the real world of commerce and industry was also represented by Felix
Burkhardt from Deutsche Telekom's Innovation Lab.
Related ISO activity mentioned by Laurent included the work on Discourse
Relations led by Harry Bunt, and the long-awaited MAF (morpho-syntactic
annotation framework) which are both due to appear Real Soon Now. A
quick tour de table confirmed my impression that most of the attendees
were primarily researchers in Human Computer Interaction with little
direct experience of the construction or encoding of spoken corpora, but
Thomas Schmidt more than made up for that. The main business of the day
was to go through his preliminary draft working document, the objective
of which is to confer ISO authority on a subset of the existing TEI
proposals for spoken text transcription, with some possible
modification. The underlying work is well described in Schmidt's recent
excellent article in TEIJ, so I won't repeat it: essentially, it
consists of a close look at the majority of transcription formats used
by the relevant research community/ies and tools, a synthesis of what
they have in common, and suggestions of how that synthesis maps to TEI.
This is to a large extent motivated by concerns about preservation and
migration of data in “legacy” formats.
The discussion began by establishing boundaries: despite my proposal to
the contrary, it seems there was little appetite to extend the work into
the area of truly multimodal transcriptions, which was still generally
felt to be insufficiently understood for a practice-based standard to be
appropriate. Concern was expressed that we should not make ad hoc
premature suggestions. So the document really only concerns transcribed
speech. There was no disagreement with the general approach, which is to
distinguish a small number of macro-structural features and to provide
guidelines about how to mark up specific units of analysis at the
micro-structural level, using a subset of the TEI.
I was also much cheered by two further remarks he made:
 - the graph-based “annotation framework” formalisation proposed by Bird
   and Liberman was theoretically complete but so generic as to be
   practically useless (I paraphrase);
 - at the micro level, everything you need is there in the TEI (I quote).
Discussion focussed on the following points raised by the working document:
*Tiers*
Many existing tools organise transcriptions into “tiers” of annotation.
These seem to be purely technical artefacts, which can be addressed more
exactly by use of XML markup. Unlike “levels” of annotation, they have
no semantics. It's doubtful that we need a <tier> element.
*Metadata-1*
How many of the (very rich) TEI proposals should be included, or
mentioned? And how should the three things Thomas had found missing be
supplied? I suggested that <appInfo> was an appropriate way to record
information about the transcription tool used; that the definition of
the transcription system used belonged in the <encodingDesc>; and agreed
that there was nothing specifically provided for recording pointers or
links to the original video or audio transcribed. In the meeting, I
speculated that maybe there was scope for extending (or misusing)
<facsimile> for this last purpose; another possibility which occurs to
me as I type these notes is that one could also extend <recordingDesc>.
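As a rough sketch of the first two suggestions, a header fragment might look something like the following. The tool name, version number and descriptive text are invented placeholders, and the use of <editorialDecl> to hold the description of the transcription system is only one possible choice within <encodingDesc>. Nothing is shown for linking to the recording, since no agreed mechanism exists.

```xml
<!-- Sketch only: tool name, version and wording are invented placeholders -->
<encodingDesc>
  <appInfo>
    <!-- the tool used to produce the transcription -->
    <application ident="ExampleTranscriber" version="1.0">
      <label>Example transcription editor</label>
    </application>
  </appInfo>
  <!-- one possible home for the description of the transcription system -->
  <editorialDecl>
    <p>Transcribed according to the (hypothetical) Example conventions.</p>
  </editorialDecl>
</encodingDesc>
```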
*Timing*
The timeline is fundamental to the macrostructure of a transcript.
Thomas' examples all used absolute times for the timeline's <when>s, but I
suggested that relative ones might be easier. The document ordering both
of <when>s and of transcribed speech should reflect the temporal order
as far as possible; this would allegedly facilitate interoperability.
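For illustration, a timeline using relative times might be sketched as below: only the origin carries an absolute time, and every other point is given as an offset from it. The identifiers and time values are invented; an all-absolute version would instead put @absolute on every <when>.

```xml
<!-- Sketch only: ids and time values are invented for illustration -->
<timeline unit="s" origin="#T0">
  <!-- the origin may carry an absolute time -->
  <when xml:id="T0" absolute="14:00:00"/>
  <!-- later points are offsets in seconds from the origin, in temporal order -->
  <when xml:id="T1" interval="1.5" since="#T0"/>
  <when xml:id="T2" interval="3.2" since="#T0"/>
</timeline>
```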
*Metadata-2*
What metadata was needed, required, recommended for the description of
participants? (@sex raised its ugly head here). Could we use <person> to
refer to artificial respondents in MMI experiments? (yes, if they have
person-like characteristics; no otherwise)
It was noted that almost any personal trait or state might be crucial to
the analysis of some corpora. We noted that CMDI now recommended using
the ISOCAT data category registry as an independent way of defining
metadata terminology; also that ISOCAT was now available within the TEI
scheme (though whether it fits into personal metadata I am less sure).
There was (I think) general agreement that we'd reference the various
options available in the TEI but not incorporate all of them.
We agreed that the principles underlying a given transcription should be
clearly documented, either in associated articles, in the formal
specification for an encoding, or in the header of individual documents.
*Utterances*
Several people disliked the expanded name of the <u> element
(“utterance”) and its definition, for various theoretical reasons. Its
definition should be
modified to remove the implication that it necessarily followed a
silence, though we seemed to agree that a <u> could only contain a
stretch of speech from a single speaker.
The temporal alignment of a <u> can be indicated either by @start and
@end or by nested <anchor/>s: the standard should probably recommend
use of one method or the other, but not both. We discussed whether or
not the fact that existing tools did not support the (even simpler) use
of @trans to indicate overlap should lead us not to recommend it.
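The alternatives discussed can be sketched roughly as follows. Speakers, words and identifiers are invented, and #T0 to #T2 are assumed to point at <when> elements in a <timeline> elsewhere in the document.

```xml
<!-- Sketch only: speakers, words and ids are invented;
     #T0..#T2 are assumed to be <when> points in a <timeline> -->

<!-- Option 1: alignment by @start and @end on the <u> -->
<u who="#SPK0" start="#T0" end="#T2">good morning everyone</u>

<!-- Option 2: alignment by nested <anchor/>s -->
<u who="#SPK0"><anchor synch="#T0"/>good morning <anchor synch="#T1"/>everyone<anchor synch="#T2"/></u>

<!-- The simpler @trans approach: overlap is asserted but not aligned to the timeline -->
<u who="#SPK1" trans="overlap">morning</u>
```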
*U-plus*
Thomas wanted some method of associating with a <u> the whole block of
annotations made on it (represented as one or more <interpGrp>s). His
document suggested using <div> for this purpose. A lighter-weight
solution might be to include <interpGrp> within <u>, or to propose a new
wrapper <annotatedU> element.
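A minimal sketch of the <div>-based grouping might look like this. The @type values, identifiers and annotation content are invented, and the use of @inst to point back at the <u> is an assumption rather than anything taken from the working document.

```xml
<!-- Sketch only: @type values, ids and annotation content are invented -->
<div type="annotatedU">
  <u xml:id="u1" who="#SPK0">good morning</u>
  <!-- the block of annotations made on the utterance above -->
  <interpGrp type="speech-act">
    <interp inst="#u1">greeting</interp>
  </interpGrp>
</div>
```

The two lighter-weight alternatives would either move the <interpGrp> inside the <u> or replace the <div> with a new wrapper element.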
*Tokenization*
Laurent noted that MAF recommended use of <w> for individual tokens; we
didn't need to take a stand on the definition of “word” but could simply
refer to MAF. We needed some way of signalling the things that older
transcription formats had found important, e.g. words considered
incomplete, false starts, repetitions, abbreviations etc. so we needed
to choose an appropriate TEI construct for them, even if we thought the
concept was not useful or ill-defined. The general purpose <seg> element
might be the simplest solution, but some diplomacy would be needed about
how to define its application and its possible @type or @function values.
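By way of illustration, a tokenized utterance along these lines might be sketched as below. The @type values on <seg> are invented placeholders, since no vocabulary was agreed.

```xml
<!-- Sketch only: the @type values on <seg> are hypothetical, not agreed -->
<u who="#SPK0">
  <w>I</w>
  <!-- an incomplete word (false start) -->
  <seg type="incomplete">wa</seg>
  <!-- a repeated word -->
  <seg type="repetition"><w>I</w></seg>
  <w>want</w>
  <w>tea</w>
</u>
```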
*Conclusions*
This workgroup will probably produce a useful document describing an
important use case for the TEI recommendations on spoken language. It is
currently a Google Doc which the group has agreed to share with the
Council. I undertook to help turn this into an ODD, which could
eventually become one of our Exemplars. Work on standardising other
aspects of transcribed multimodal interactions probably needs to be
deferred to a later stage.