Humanist Discussion Group, Vol. 14, No. 374.
Centre for Computing in the Humanities, King's College London
<http://www.princeton.edu/~mccarty/humanist/>
<http://www.kcl.ac.uk/humanities/cch/humanist/>

Date: Wed, 18 Oct 2000 09:42:41 +0100
From: Wendell Piez <wapiez@mulberrytech.com>
Subject: Re: 14.0368 markup, encoding, content-modelling, primitives

At 10:30 AM 10/17/00 +0100, Francois wrote:
>Would a third term help?
>
> Content modeling
>... Encoding
>... Markup
>...
>Encoding would cover working out the relationships between the various
>elements, attributes, entities of a markup scheme.
>...

Mm, I was with you up to this point. In particular I agreed with the "intuitions" of Fotis and Thierry (since the lexicographers haven't tackled this to my knowledge, I agree that intuition and usage are what we have to go on). But I think Francois steps a bit too far from currently recognized semantics into a distinction that may be useful, but isn't at all common. I also think the containment relation is backward.

Remember that a text can be "encoded" without being marked up. In fact, any electronic ("machine-readable") text must be, ipso facto, encoded. Standard text encodings include US-ASCII, EUC-JP, ISO 8859 in its variants, etc., including, now, Unicode (ISO/IEC 10646). These all provide mappings from written characters into bit-sequences of known lengths, enabling a digital processor to handle them internally. More broadly, however, Morse code is an encoding in this sense. At its loosest, I'd suppose a code to be a representation of one type of information in another form, either to facilitate or to obfuscate its transmission.

Markup is an addition of code to code: a layering of an encoding practice following a different protocol, over and above an initial layer. All markup is encoding, but not all encoding is markup. The super-added protocol must include a way to distinguish which encoded sequences are "data" and which are "markup."
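A small illustration of the two layers described above may help. It is written in Python only for convenience, and the sample string, the two-letter Morse table, and the element names `chapter` and `title` are all invented for the purpose. First, one string is encoded under two character encodings, which yields different byte sequences. Second, Morse code appears as an encoding in the looser sense. Third, XML markup is laid over already-encoded text, and the markup protocol reserves delimiters such as `<` and `>` so that a parser can separate "markup" from "data":

```python
from xml.etree import ElementTree as ET

# Layer 1: a character encoding maps written characters to byte sequences.
# The same text produces different bytes under different encodings.
text = "Café"
utf8_bytes = text.encode("utf-8")      # 'é' becomes two bytes, C3 A9
latin1_bytes = text.encode("latin-1")  # 'é' becomes one byte, E9 (ISO 8859-1)

# Morse code is also an encoding in the looser sense: the same characters
# represented in another form. This table is deliberately partial.
MORSE = {"S": "...", "O": "---"}

def to_morse(s: str) -> str:
    """Encode a string of letters using the (partial) Morse table above."""
    return " ".join(MORSE[c] for c in s)

# Layer 2: markup is code added to code. The string below is still encoded
# text, but the XML protocol reserves '<', '>' and friends so that a parser
# can tell which sequences are markup (tags, attributes) and which are data.
marked_up = "<chapter n='1'><title>Chapter One</title></chapter>"
root = ET.fromstring(marked_up)

markup_names = [el.tag for el in root.iter()]  # the markup layer: element names
data_only = "".join(root.itertext())           # the data layer: character content

# All markup is encoding: the marked-up document is itself just encoded bytes.
marked_up_bytes = marked_up.encode("utf-8")
```

Here the chapter boundary is explicit in the code itself, so a program can find it with `root.iter("chapter")`. It does not have to infer the boundary from white space or capitalization.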
Think of your favorite plain-text transcription of a literary work. Much of the difficulty in processing even a good, clean, well-edited plain text arises from the fact that so much information in it (say, the boundaries between chapters) is not explicit in its code. We might call the creative use of white space, all caps for CHAPTER headings, etc., a kind of "passive" or "implicit" markup -- but since it's not explicit, it's relatively difficult to program a machine to handle it.

Cheers,
Wendell

======================================================================
Wendell Piez                            mailto:wapiez@mulberrytech.com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD 20850                                  Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
This archive was generated by hypermail 2b30 : 10/18/00 EDT