Archive for the ‘OMDoc’ Category

TEI Guidelines mention MathML, OpenMath, and OMDoc

Saturday, July 31st, 2010

Someone in the humanities must be interested in OMDoc. I was really surprised to find a reference to OMDoc in the section “Formulæ and Mathematical Expressions” of the guidelines (a.k.a. specification) for TEI. TEI (Text Encoding Initiative) is the standard semantic markup language for the humanities, social sciences and linguistics, much like DocBook for technical manuals. All TEI itself provides is an element <formula notation="…"/>, where notation names the language in which the formula is represented. But the guidelines point to several mathematical markup languages, among which the document author is asked to “make an informed choice”:

  • TeX – the obvious candidate, also used in some examples
  • MathML – the obvious candidate when XML is desired. They give one Presentation MathML example but also mention Content MathML.
  • OpenMath – much less expected. Nice to see that here. On the other hand, the links to the OpenMath standard are outdated. I should probably report that.
  • OMDoc – I didn’t expect that at all.

DITA/OMDoc Compatibility (or topic-based writing in OMDoc)

Friday, April 9th, 2010

When I was at the WritersUA conference before Easter, the compatibility (and transformation) between DITA (as a topic-centered format) and DocBook (as a narrative one) was one of the topics of wide interest. In OMDoc we have always maintained that we can follow both the topic-centered approach (which is quite natural for mathematical texts, and indeed for wiki-based approaches like the one in SWiM) and the narrative one. So I got to thinking about how we would really do the topic-centered approach in OMDoc.

When I was reading Christine Müller’s Ph.D. thesis, which looks at the integration of topic-based and narrative writing styles, I noticed that she says that OMDoc does not support topic-style writing. I think that this is wrong. Take her example (slightly simplified):

<concept id="A.dita">
  <title>Natural Numbers</title>
  <conbody>
    <p>The set of <term>natural numbers</term>
      defined <cite>here</cite> or in <xref href="nat.dita#nat1"/>.
    </p>
    <p conref="topic/p2"/>
  </conbody>
  <related-links>
    <link href="http://example.com/nats.html"/>
  </related-links>
</concept>

it is directly expressible in OMDoc as

<omdoc>
  <omgroup type="concept" xml:id="A.dita">
    <metadata>
      <dc:title>Natural Numbers</dc:title>
      <link rel="dita:related-link" resource="http://example.com/nats.html"/>
    </metadata>
    <omgroup type="conbody">
      <omtext>
        <CMP>
          <p>The set of <phrase role="term">natural numbers</phrase>
            defined <cite>here</cite> or in <ref type="cite" href="nat.dita#nat1"/>.
          </p>
        </CMP>
      </omtext>
      <ref href="topic/p2" type="include"/>
    </omgroup>
  </omgroup>
</omdoc>

(again slightly simplified; I am leaving out the relevant namespace declarations). It should be obvious that we can define an OMDoc sublanguage that is isomorphic to DITA. Indeed, I think that this is an exercise worth doing. After all, there was a message from Bryce Nordgren about opening up a Math domain in DITA (see http://openmath.org/pipermail/om/2009-February/001203.html for details), which could use this isomorphism as a guiding light.
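To give a feel for how mechanical this isomorphism could be, here is a minimal Python sketch (standard library only) that maps a simplified DITA concept onto the OMDoc skeleton shown above. The mapping rules are an assumption read off the example, not an existing tool: a `<p conref="…">` becomes an include `<ref>`, `<related-links>/<link>` entries become metadata links, and inline DITA markup such as `<term>` and `<xref>` is copied unchanged rather than rewritten to `<phrase>`/`<ref>`. The function name `dita_concept_to_omdoc` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"              # Dublin Core namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

def dita_concept_to_omdoc(concept: ET.Element) -> ET.Element:
    """Map a simplified DITA <concept> onto an OMDoc <omgroup> skeleton.

    Hypothetical mapping: inline markup inside paragraphs (term, cite, xref)
    is copied as-is; renaming it to OMDoc elements is not attempted.
    """
    omdoc = ET.Element("omdoc")
    group = ET.SubElement(omdoc, "omgroup", {"type": "concept", XML_ID: concept.get("id", "")})
    metadata = ET.SubElement(group, "metadata")
    ET.SubElement(metadata, f"{{{DC}}}title").text = concept.findtext("title", "")
    # <related-links><link href="..."/></related-links> -> metadata links
    for link in concept.iterfind("related-links/link"):
        ET.SubElement(metadata, "link", {"rel": "dita:related-link", "resource": link.get("href", "")})
    body = ET.SubElement(group, "omgroup", {"type": "conbody"})
    for p in concept.iterfind("conbody/p"):
        if p.get("conref"):
            # content reuse by reference becomes an OMDoc include reference
            ET.SubElement(body, "ref", {"href": p.get("conref"), "type": "include"})
        else:
            cmp_element = ET.SubElement(ET.SubElement(body, "omtext"), "CMP")
            cmp_element.append(copy.deepcopy(p))
    return omdoc
```

A real translation would also have to rename the inline elements and handle namespaces on the DITA side; the sketch only covers the structural part.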

Of course, DITA has not only topics but also DITA maps; let me again use an example from Christine’s thesis:

<map title="title">
  <topichead navtitle="navi-title" audience="math"/>
  <topicref href="A.dita" collection-type="sequence">
    <topicref href="A1.dita"/>
    <topicref href="A2.dita"/>
  </topicref>
  <reltable>
    <relrow>
      <relcell><topicref href="A.dita"/></relcell>
      <relcell><topicref href="B.dita"/></relcell>
    </relrow>
  </reltable>
</map>

The first part of this map is just what we have always thought of as a narrative structure in our NarCon approach in OMDoc. So we can directly represent it as something like

<omdoc>
  <metadata>
    <dc:title>title</dc:title>
    <link rel="dita:audience" resource="something:math"/>
    <link rel="dita:navtitle" resource="navi-title"/>
  </metadata>
  <omgroup xml:id="A.narrative" type="sequence">
    <ref type="include" href="A1.omdoc"/>
    <ref type="include" href="A2.omdoc"/>
  </omgroup>
</omdoc>

I must confess that I do not really understand what the href on the top-level topicref means, so I have left it out. Note that I am only interested in the general compatibility of the formats, not in the details of the translation, which will have to be worked out. That leaves us with the reltable, which (as far as I understand it) is a way to specify cross-references that is a better alternative to <related-links>, since it is more portable and attached to DITA maps (which we can think of as a discourse-level presentation of the content structure given by the graph of DITA topics). So I would just add the following metadata section to the <omgroup> element:

<metadata>
  <link rel="dita:related-link" resource="B.omdoc"/>
</metadata>
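A minimal Python sketch of the map side of this translation, under stated assumptions, not an existing converter: every top-level `<topicref>` becomes an `<omgroup>` of include references (typed by its `collection-type`, if any), the `.dita` to `.omdoc` renaming and the `.narrative` id suffix simply follow the example, and reltable rows become `dita:related-link` metadata, assuming DITA's default behaviour of relating topics that sit in different cells of the same row. The `href` of the top-level `<topicref>` is only used to derive the group id and to find its reltable partners; the `<topichead>` is ignored. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def omdoc_name(href: str) -> str:
    # hypothetical naming convention from the example: A1.dita -> A1.omdoc
    return href.removesuffix(".dita") + ".omdoc" if href.endswith(".dita") else href

def cell_topics(cell: ET.Element) -> List[str]:
    # valid DITA puts <topicref href="..."/> into a <relcell>; fall back to bare text
    hrefs = [t.get("href", "") for t in cell.iter("topicref")]
    text = (cell.text or "").strip()
    return hrefs or ([text] if text else [])

def dita_map_to_omdoc(dita_map: ET.Element) -> ET.Element:
    omdoc = ET.Element("omdoc")
    meta = ET.SubElement(omdoc, "metadata")
    ET.SubElement(meta, f"{{{DC}}}title").text = dita_map.get("title", "")
    # reltable: topics in different cells of the same row are related to each other
    related: Dict[str, List[str]] = {}
    for row in dita_map.iter("relrow"):
        cells = [cell_topics(c) for c in row.findall("relcell")]
        for i, here in enumerate(cells):
            others = [t for j, cell in enumerate(cells) if j != i for t in cell]
            for topic in here:
                related.setdefault(topic, []).extend(others)
    # each top-level topicref becomes a narrative omgroup of include references
    for topicref in dita_map.findall("topicref"):
        href = topicref.get("href", "")
        attrs = {XML_ID: href.removesuffix(".dita") + ".narrative"}
        if topicref.get("collection-type"):
            attrs["type"] = topicref.get("collection-type")
        group = ET.SubElement(omdoc, "omgroup", attrs)
        if related.get(href):
            group_meta = ET.SubElement(group, "metadata")
            for other in related[href]:
                ET.SubElement(group_meta, "link", {"rel": "dita:related-link", "resource": omdoc_name(other)})
        for child in topicref.findall("topicref"):
            ET.SubElement(group, "ref", {"type": "include", "href": omdoc_name(child.get("href", ""))})
    return omdoc
```

Applied to the map above, this yields one `omgroup` with `xml:id="A.narrative"` and `type="sequence"`, include references to `A1.omdoc` and `A2.omdoc`, and a related link to `B.omdoc`. The sketch needs Python 3.9 or later for `str.removesuffix`.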

OK, that ends our little comparison exercise. There are a couple of conclusions I would like to draw from this:

  1. OMDoc can do topic-oriented writing quite nicely
  2. the OMDoc 1.3-style metadata help significantly
  3. rather than develop a DITA ontology (hinted at with the dita: namespace prefixes), we should develop ontologies that describe the various aspects of topic-based writing in general and find the respective markup primitives. For instance, dita:audience seems weird; there must be an ontology in the eLearning realm that already formalizes this.
  4. The OMDoc 1.6 idea of leaving out the <metadata> element and freely intermixing the metadata elements <link>, <resource> and <meta> with the OMDoc content will make the translation much simpler and more direct, e.g. for the <reltable> and <related-links> elements from DITA, which are situated at the end in the original.

OK, that is all I have to say at the moment; please give me feedback.

Microdata vs. RDFa – What does it mean to us?

Wednesday, October 28th, 2009

Only today did I become aware of microdata, the proposed way of embedding semantic annotations into HTML5. (Yes, they adopted the syntax that Michael also prefers for OMDoc, and which I personally hate, but I will get used to it.) Microdata are not to be confused with microformats, a poor man’s way of annotation that (ab)uses CSS classes and is thus compatible with HTML 4. Microdata are something like RDFa, but they

  1. are slightly easier to use for people who don’t understand XML namespaces
    • granted, RDFa’s excessive reliance on XML namespaces makes it hard to parse, and makes it unbearably complex to copy/paste a fragment, which is an important use case for HTML5
  2. allow for ad hoc pseudo-semantic markup when you do not use an ontology
    • What’s the point in annotating at all, then?
  3. are compatible with the non-XML syntax of HTML5 (which should have been ditched IMHO, but, well, in the interest of reactionary users and software, they decided otherwise)

The fight over the future of RDFa in HTML is ongoing, but what does it mean for KWARC? We have incorporated RDFa into OMDoc as a means of extending the metadata vocabularies. RDFa, originally designed for XHTML, is prepared for being integrated into any XML language, including OMDoc. HTML5 microdata are an integral part of the HTML5 specification and would not work in other XML languages. OK, but we present OMDoc documents as HTML to make them human-readable. In this output, we want to preserve the semantics of the OMDoc markup, and for that we have always been thinking about using RDFa. (We know exactly how to do it; we just have not implemented that step yet.) We could use HTML5 microdata instead, but:

  1. RDFa has little software support so far, but microdata have none (beyond proofs of concept)
  2. We generate XML-compliant HTML. The non-XML syntax of HTML5 supports embedded MathML, but I doubt that it will support parallel OpenMath markup, where elements from yet another namespace are embedded into the MathML formulae.
  3. We generate HTML. The embedded annotations need not be authored manually, so they do not have to be easy to author.
  4. We are interested in using well-defined ontologies to express semantics, so we don’t need ad hoc “semantic” markup.
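To make the contrast concrete, here is a small Python sketch that renders the same statement, a Dublin Core title for an OMDoc theory, once as RDFa and once as microdata. The theory URI and the `itemtype` value are hypothetical; the attribute names are those of RDFa 1.0 in XHTML (`about`, `property`, `xmlns:dc`) and of the current microdata specification (`itemscope`, `itemtype`, `itemid`, `itemprop`). Note that the RDFa `<span>` only means something together with the `xmlns:dc` declaration on its ancestor, which is exactly the copy/paste problem mentioned above.

```python
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

def as_rdfa(subject: str, title: str) -> str:
    # RDFa 1.0: the dc: prefix is bound via xmlns:dc on an ancestor element;
    # a <span> copied out on its own loses that binding.
    return (f'<div xmlns:dc="{DC_NS}" about="{escape(subject)}">'
            f'<span property="dc:title">{escape(title)}</span></div>')

def as_microdata(subject: str, item_type: str, title: str) -> str:
    # Microdata: no prefixes; the property is named by an absolute URL.
    # itemid is only allowed on an element that also has itemtype.
    return (f'<div itemscope="itemscope" itemtype="{escape(item_type)}" itemid="{escape(subject)}">'
            f'<span itemprop="{DC_NS}title">{escape(title)}</span></div>')
```

Both outputs are well-formed XML, so either could be emitted by an XML-based presentation pipeline; the difference lies in how the vocabulary is identified.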

What do you think?

Google likes us (2)

Wednesday, February 18th, 2009

I wanted to explain “parallel markup” to a colleague and was too lazy to look it up in the MathML specification, so I googled it. It turned out that KWARC ranks quite well on that topic, far ahead of the MathML spec. The first hit was a www-math mailing list thread following up on a question about parallel markup that I once asked. A Trac ticket on parallel markup support in our JOMDoc library ranks #5.

Designing a complex ontology and reusing others

Wednesday, December 3rd, 2008

This description of the Music Ontology provides an excellent, easy-to-understand example of how existing ontologies were reused, and other small ontologies designed, to build a larger, integrated ontology. This is similar to our current development of a versioning ontology for OMDoc, which we are specifying at the moment and which will be made up of a versioning ontology, a change ontology, an event ontology (the one that is part of the Music Ontology), and an ontology for people.

Submitting content to OMBase and logging

Monday, March 24th, 2008

While I was reading up on the REST papers mentioned in my last post, I stumbled upon a best practice for making sure that material is only submitted once to a RESTful application. This is something we should adopt in OMBase as well, just to be safe.
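The article's exact recipe is not reproduced here. One common way to get "submit once" semantics, which may or may not be the practice meant above, is to let the client first obtain a fresh resource URI with POST and then upload the content there with PUT, which HTTP defines as idempotent. A minimal in-memory Python sketch of that pattern; `SubmissionStore` and the `/submissions/` path are hypothetical, not part of OMBase:

```python
import uuid
from typing import Dict, Optional

class SubmissionStore:
    """In-memory sketch of 'POST for a fresh URI, then PUT the content'."""

    def __init__(self) -> None:
        self.resources: Dict[str, Optional[bytes]] = {}

    def post(self) -> str:
        # POST mints a new, still empty resource and returns its URI
        # (in HTTP this would go into the Location header).
        uri = f"/submissions/{uuid.uuid4()}"
        self.resources[uri] = None
        return uri

    def put(self, uri: str, content: bytes) -> int:
        # PUT is idempotent: repeating it after a lost response stores no duplicate.
        if uri not in self.resources:
            return 404
        created = self.resources[uri] is None
        self.resources[uri] = content
        return 201 if created else 200

    def stored(self) -> int:
        # number of resources that actually hold content
        return sum(content is not None for content in self.resources.values())
```

A retried POST at worst leaves an unused empty slot, which a server could expire; a retried PUT does no harm.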

Another thing we should think about in this arena is a RESTful logging facility, so that users can find out what happened to the content. The best-suited technology seems to be RSS or Atom syndication (probably the latter). The nice thing is that we could log all the changes to any URI we use in the system. I am not sure under which URL we would address the log; one idea is to just use the MIME type application/atom+xml, as for the XHTML presentation suggested in my last post. That would at least spare us the choice of a URL.
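A minimal sketch of such a per-resource change log as an Atom feed, using only Python's standard library. Everything OMBase-specific is assumed: the feed id scheme (`#changes` appended to the resource URI), the author name, and the idea that the feed is served under the resource's own URI when a client sends `Accept: application/atom+xml`. The Atom elements used (`feed`, `id`, `title`, `updated`, `author`, `link`, `entry`) follow RFC 4287.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Tuple

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace

def atom(tag: str) -> str:
    return f"{{{ATOM}}}{tag}"

def change_log_feed(resource_uri: str, changes: List[Tuple[str, str, datetime]]) -> str:
    """Build an Atom feed for one resource; changes are (entry_id, summary, timestamp)."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = resource_uri + "#changes"  # hypothetical id scheme
    ET.SubElement(feed, atom("title")).text = f"Changes to {resource_uri}"
    newest = max((ts for _, _, ts in changes), default=datetime.now(timezone.utc))
    ET.SubElement(feed, atom("updated")).text = newest.isoformat()
    author = ET.SubElement(feed, atom("author"))
    ET.SubElement(author, atom("name")).text = "OMBase"  # hypothetical author name
    # same URI as the content, distinguished only by the requested media type
    ET.SubElement(feed, atom("link"), {"rel": "self", "type": "application/atom+xml", "href": resource_uri})
    for entry_id, summary, ts in changes:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = entry_id
        ET.SubElement(entry, atom("title")).text = summary
        ET.SubElement(entry, atom("updated")).text = ts.isoformat()
        # entries without <content> need an alternate link (RFC 4287)
        ET.SubElement(entry, atom("link"), {"rel": "alternate", "href": resource_uri})
    return ET.tostring(feed, encoding="unicode")
```

Timestamps should be timezone-aware so that `isoformat()` produces the offset Atom's date format requires.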

Integrating Presentation into OMBase

Monday, March 24th, 2008

I have just been reading up on REST again, since I found a very palatable pair of articles (REST intro and practices). This got me thinking about the state of OMBase and the integration of our presentation pipeline into the OMBase interface. The interface is RESTful, since we have implemented MMT addressing via URIs: you just use a GET to retrieve the content.

What I have talked about with Florian, but maybe not with the OMBase team, is how to integrate presentation. That should be very simple from the interface point of view: we just take the same URLs, but a different HTTP Accept header.

GET /arith1/lcm
Host: cds.omdoc.org
Accept: application/omdoc+xml

gives you the OMDoc file and

GET /arith1/lcm
Host: cds.omdoc.org
Accept: application/xhtml+xml

gives you the presented version (with the standard options). Now, we have written a paper about presentation and submitted it to MKM, and Christine has spent a lot of ingenuity on defining user options for the presentation process. This should be easy to integrate with the URI query interface:

GET /arith1/lcm?ext=foo.ntn&int=lang:ntn;style:physics
Host: cds.omdoc.org
Accept: application/xhtml+xml

That should do the trick.
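Server-side, the dispatch could look roughly like the following Python sketch. It is an illustration under assumptions, not OMBase code: the renderer names are hypothetical, q-values and wildcards in `Accept` are ignored, and the meaning of the `ext` and `int` parameters is not interpreted; the sketch only splits `int` into `key:value` pairs at the semicolons, as the example URL suggests. It relies on `urllib.parse.parse_qs` separating parameters only at `&`, which is the default in current Python 3 releases.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Media types from the example requests, mapped to hypothetical renderer names.
RENDERERS = {
    "application/omdoc+xml": "omdoc-source",
    "application/xhtml+xml": "xhtml-presentation",
}

def negotiate(accept: str) -> Optional[str]:
    """Return the renderer for the first supported media type in an Accept header.

    Simplification: q-values and wildcards such as */* are ignored.
    """
    for part in accept.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in RENDERERS:
            return RENDERERS[media_type]
    return None

def presentation_options(target: str) -> dict:
    """Split a request target into path, 'ext' values and 'int' key:value pairs."""
    parts = urlsplit(target)
    query = parse_qs(parts.query)  # splits on '&' only, so ';' stays inside 'int'
    int_options = {}
    for pair in ";".join(query.get("int", [])).split(";"):
        key, sep, value = pair.partition(":")
        if sep:
            int_options[key] = value
    return {"path": parts.path, "ext": query.get("ext", []), "int": int_options}
```

For the example request above, `presentation_options` yields `{"path": "/arith1/lcm", "ext": ["foo.ntn"], "int": {"lang": "ntn", "style": "physics"}}`, and `negotiate("application/xhtml+xml")` selects the presentation renderer.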

SCOOP workshop (Communities of Practice)

Thursday, August 30th, 2007

I am sitting in the SCOOP workshop of the JEM Network, which is really shaping up nicely: we have the MKM people meeting with the education and social software guys. I will blog a couple of impressions from the KWARC angle.

The discussion is quite stimulating.

Ralf Klamma (RWTH Aachen) gives an intro to Community Information Systems and claims that the constitutive features of CoPs are:

Mutual Engagement (ME): “You have to know which community you are belonging to”; I am a little sceptical whether this is really true for CoPs in science, which are very distributed and may even be disconnected.

Joint Enterprise (JE): There is something you want to do together, and you want to learn to do it better. This is at the center of Wenger's CoP definition. We have been neglecting this in our KWARC models here, or taking it for granted. We need to think more about this.

Joint Resources (JR): This is really where our MKM paper sits, and I have the feeling that we have something to bring to the table here. Klamma is interested in multi-medial theories. I must say that with the OMDoc approach, we are interested in an omni-medial approach (OMDoc as an omnipresent semantic medium that covers all). The idea here is that content markup allows us to generate multiple media representations from this source, and that any medium can be marked up in OMDoc. So maybe this is compatible.

Klamma also talks about a cross-media theory of transcription that sounds interesting (Jäger, Stanitzek: Transkribieren – Medien/Lektüre, 2002). The gist of it seems to be that events (e.g. historical ones) and objects are transcribed across media (e.g. to OMDoc or SciML). So we only have access to the media trace, not the event itself (it is long gone). I wonder what this theory predicts; it seems compatible with what we are doing.

A great example: the Babylonian Talmud has been transcribed into an XML markup in which you can annotate relations. The text can then be accessed as semantic hypertext. One effect was that Talmud students were asking tougher questions earlier. That is very encouraging. I wonder whether the sources for this are available, how an OMDoc version of the Talmud would fare, and how much of the structure could be transferred into the CD-based structure we claim to be so essential.

Disambiguation of Mathematical Text

Friday, August 17th, 2007

Oooops, this is a left-over draft from MKM

…..

Claudio Sacerdoti Coen (HELM group in Bologna) is talking about disambiguation. It seems that he has really nailed down most of the practical aspects of the problem.

When types do not help, we have to ask the user, and he tries to do this with the least nuisance. He defines the notion of a spurious error (most errors are spurious), as opposed to the “real errors”. As always, it is great to hear him talk. I wonder what information his algorithm needs: is what we have in the new OMDoc presentation system enough?

He even has a correctness proof. I want a demo.

A radical new referencing scheme for OpenMath and MathML (and OMDoc)

Monday, July 2nd, 2007

We are thinking about how to reference theory-constitutive elements in content dictionaries. We had distinguished “reference by location” (via usual URIs) and “reference by context” (via the OMDoc theories and their constitutive elements) in OMDoc 1.1. It was very hard to explain the latter, and the encoding was a little weird, so I dropped it again from OMDoc 1.2. But the concept is valid and important, so here we go again.

This topic is important, since we are thinking about OpenMath3 and we are adding CDs to MathML3. And I guess that it will be quite a while until we can change these two again, so we had better get it right. Moreover, the referencing scheme had better be compatible with both.

Here is the idea: we have nested theories in OMDoc 1.2, and we need to reference symbols from them. Now, symbols are referenced by their name, which need not be document-unique (and we do not want it to be, since we want to compose theories in documents). That is why their names have three components: the theory name (the cd name in OpenMath, which is document-unique at least in OMDoc), the symbol name, which is theory/cd-unique, and, to disambiguate, the URI of the cd in the cdbase attribute.

We would like to generalize this in OMDoc 1.8 so that theory names only need to be unique in their context (which might be the document context or a theory). So far so good, but then we need a path-like referencing scheme, at least for the cd names. So we can really combine them in one path/URI, as described in a post on MathML/OM referencing.
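One plausible reading of such path-like references is lexical scoping: look for the first theory name in the innermost enclosing context that knows it, then walk down the path. The Python sketch below implements only that reading; the `Theory` class and the `resolve` function are hypothetical and merely show that two theories called `nat` in different contexts no longer clash.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

@dataclass(eq=False)
class Theory:
    """Hypothetical nested theory: its name only has to be unique within its parent."""
    name: str
    parent: Optional["Theory"] = field(default=None, repr=False)
    symbols: Set[str] = field(default_factory=set)
    subtheories: Dict[str, "Theory"] = field(default_factory=dict)

    def add_theory(self, name: str) -> "Theory":
        if name in self.subtheories:
            raise ValueError(f"{name!r} is not unique in {self.name!r}")
        child = Theory(name, parent=self)
        self.subtheories[name] = child
        return child

def resolve(context: Theory, pref: str) -> Tuple[Theory, str]:
    """Resolve 'th1/.../thn/sym' from `context`, trying the innermost scope first."""
    *theory_path, symbol = pref.split("/")
    scope: Optional[Theory] = context
    while scope is not None:
        target: Optional[Theory] = scope
        for name in theory_path:
            target = target.subtheories.get(name) if target is not None else None
        if target is not None and symbol in target.symbols:
            return target, symbol
        scope = scope.parent  # not found here: retry from the enclosing context
    raise KeyError(f"cannot resolve {pref!r} from {context.name!r}")
```

With theories `arith/nat` and `other/nat` in one document, `nat/zero` resolves to a different theory depending on where it is used, while the full path `arith/nat/zero` works from anywhere in the document.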

The next step in OMDoc would be to allow any content element to be theory-like, and to allow it to import. Here is a somewhat extreme example of what we would be able to do:

<!-- all statements are theories, so this is also one -->
<symbol name="nat"/>

<!-- this symbol declaration imports from theory "nat" -->
<symbol name="zero">
  <imports from="nat"/>
  <type><csymbol pref="nat/nat"/></type>
</symbol>

<!-- this one also needs a function type, so we import it -->
<symbol name="suc">
  <imports from="nat"/>
  <imports from="simple-types"/>
  <type>
    <apply>
      <csymbol pref="simple-types/funtype"/>
      <csymbol pref="nat/nat"/>
      <csymbol pref="nat/nat"/>
    </apply>
  </type>
</symbol>

<!-- the third Peano Axiom (1&2 are about types) is only about suc -->
<axiom name="peano3">
  <imports from="suc"/>
  <imports from="quant1"/>
  <bind>
    <csymbol pref="quant1/forall"/>
    <bvar><ci>a</ci></bvar>
    <bvar><ci>b</ci></bvar>
    <apply>
      <equivalent/>
      <apply><eq/><ci>a</ci><ci>b</ci></apply>
      <apply><eq/>
        <apply><csymbol pref="suc/suc"/><ci>a</ci></apply>
        <apply><csymbol pref="suc/suc"/><ci>b</ci></apply>
      </apply>
    </apply>
  </bind>
</axiom>
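With imports on every statement, a simple check suggests itself: every `csymbol` reference `T/s` should point into a theory `T` that the enclosing statement imports (or that is the statement itself). The Python sketch below applies this rule to fragments like the one above; the rule, the `<imports from="…"/>` form (as used in the `peano3` axiom) and the function name are assumptions for illustration, not part of any OMDoc specification.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def unimported_references(fragment: str) -> List[Tuple[str, str]]:
    """Return (statement name, pref) pairs whose theory part is not imported.

    Hypothetical rule: inside a statement, a csymbol pref "T/s" (or just "T")
    is fine if the statement imports T via <imports from="T"/> or is named T.
    """
    # Wrap the fragment, since the example has several top-level elements;
    # ElementTree's default parser drops XML comments, so only statements remain.
    root = ET.fromstring(f"<fragment>{fragment}</fragment>")
    problems = []
    for statement in root:
        name = statement.get("name", "")
        visible = {imp.get("from") for imp in statement.iter("imports")} | {name}
        for csymbol in statement.iter("csymbol"):
            pref = csymbol.get("pref", "")
            if pref.split("/", 1)[0] not in visible:
                problems.append((name, pref))
    return problems
```

For the example above the check finds nothing to complain about; dropping `<imports from="simple-types"/>` from `suc` would flag `("suc", "simple-types/funtype")`.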