If you are developing mashups – lightweight web applications that offer new functionality by combining, aggregating and transforming web resources and services – this is a great opportunity to win a prize.
Archive for the ‘semantic documents’ Category
SePublica@ESWC Workshop on Semantic Publication (May 30, Crete), LNCS Post-proceedings, Best Paper Award by Elsevier
Sunday, January 16th, 2011
I am a chair of the following workshop (and Michael is on the PC), which is closely related to KWARC’s research interests (specifically KWARC-relevant topics highlighted below):
1st International Workshop on Semantic Publication (SePublica 2011)
at the 8th Extended Semantic Web Conference (ESWC 2011)
May 30th, Hersonissos, Crete, Greece
Keynote by Steve Pettifer, Manchester University, UK: “Utopia Documents and The Semantic Biochemical Journal experiment”
SUBMISSION DEADLINE (extended) March 4
- Best Paper Award sponsored by Elsevier: US$ 750+250 for the most innovative and feasible proposal concerning semantic publishing
- Springer LNCS post-proceedings: A selection of revised versions of the best submissions will be published in the “ESWC 2011 Workshop Highlights” LNCS volume.
The MISSION of the SePublica workshop is to bring together researchers and practitioners dealing with different aspects of Semantic Technologies in the Publishing Industry. How is the Semantic Web impacting the publishing industry? How is our experience of publications changing because of Semantic Web technologies being applied to the publishing industry?
Someone in the humanities must be interested in OMDoc. I was really surprised to find a reference to OMDoc in the section “Formulæ and Mathematical Expressions” of the guidelines (a.k.a. specification) for TEI. TEI (Text Encoding Initiative) is the standard semantic markup language for humanities, social sciences and linguistics, much like DocBook for technical manuals. All that TEI itself has is an element <formula notation="…"/>, where notation refers to the language in which the formula is represented. But the guidelines refer to some mathematical markup languages, from which the document author is asked to “make an informed choice”:
- TeX – the obvious candidate, also used in some examples
- MathML – the obvious candidate when XML is desired. They give one Presentation MathML example but also mention Content MathML.
- OpenMath – much less expected. Nice to see that here. On the other hand, the links to the OpenMath standard are outdated. I should probably report that.
- OMDoc – I didn’t expect that at all.
When I was at the WritersUA conference before Easter, the compatibility (and transformation) between DITA (as a topic-centered format) and DocBook (as a narrative one) was one of the topics of wider interest. In OMDoc we have always maintained that we can follow both the topic-centered approach (which is quite natural for mathematical texts and indeed for wiki-based approaches like the one in SWiM) and the narrative one. So I got thinking about how we would really do the topic-centered approach in OMDoc.
When I was reading Christine Müller’s Ph.D. thesis, which looked at the integration of topic-based and narrative writing styles, I noticed that she says that OMDoc does not have support for topic-style writing. I think that this is wrong. Taking her example (slightly simplified)
<concept id="A.dita">
  <title>Natural Numbers</title>
  <conbody>
    <p>The set of <term>natural numbers</term> defined <cite>here</cite> or in <xref href="nat.dita#nat1"/>. </p>
    <para conref="topic/p2"/>
  </conbody>
  <related-link>http://example.com/nats.html</related-link>
</concept>
it is obviously directly expressible in OMDoc as
<omdoc>
  <omgroup type="concept" xml:id="A.dita">
    <metadata>
      <dc:title>Natural Numbers</dc:title>
      <link rel="dita:related-link" resource="http://example.com/nats.html"/>
    </metadata>
    <omgroup type="conbody">
      <omtext>
        <CMP>
          <p>The set of <phrase role="term">natural numbers</phrase> defined <cite>here</cite> or in <ref type="cite" href="nat.dita#nat1"/>. </p>
        </CMP>
      </omtext>
      <ref href="topic/p2" type="include"/>
    </omgroup>
  </omgroup>
</omdoc>
(again slightly simplified; I am leaving out the relevant namespace declarations). It should be directly obvious that we can define an OMDoc sublanguage that is isomorphic to DITA. Indeed I think that this is an exercise that would be worth doing. After all, there was a message from Bryce Nordgren about opening up a Math domain in DITA (see http://openmath.org/pipermail/om/2009-February/001203.html for details), which could use this isomorphism as a guiding light.
Of course, DITA has not only topics but also topic maps. Let me again use an example from Christine’s thesis.
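To make the claimed correspondence concrete, here is a minimal sketch of the translation in Python, assuming only the handful of elements that occur in the example above. The function names (`convert_inline`, `dita_concept_to_omdoc`) and the rule "any body element with a conref becomes an include-ref" are my own assumptions for illustration, not an established DITA-to-OMDoc mapping.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("dc", DC_NS)

# The simplified DITA concept from the example above.
DITA_CONCEPT = """<concept id="A.dita">
  <title>Natural Numbers</title>
  <conbody>
    <p>The set of <term>natural numbers</term> defined <cite>here</cite>
       or in <xref href="nat.dita#nat1"/>.</p>
    <para conref="topic/p2"/>
  </conbody>
  <related-link>http://example.com/nats.html</related-link>
</concept>"""


def convert_inline(elem):
    """Copy an element recursively, renaming the inline DITA elements
    used in the example (assumed mapping: term -> phrase, xref -> ref)."""
    if elem.tag == "term":
        new = ET.Element("phrase", {"role": "term"})
    elif elem.tag == "xref":
        new = ET.Element("ref", {"type": "cite", "href": elem.get("href", "")})
    else:
        new = ET.Element(elem.tag, dict(elem.attrib))
    new.text, new.tail = elem.text, elem.tail
    for child in elem:
        new.append(convert_inline(child))
    return new


def dita_concept_to_omdoc(dita_xml):
    """Translate one DITA concept into the nested-omgroup form shown above."""
    concept = ET.fromstring(dita_xml)
    omdoc = ET.Element("omdoc")
    group = ET.SubElement(
        omdoc, "omgroup", {"type": "concept", XML_ID: concept.get("id", "")}
    )

    # Title and related links go into the metadata block.
    metadata = ET.SubElement(group, "metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = concept.findtext("title", "")
    for link in concept.findall("related-link"):
        ET.SubElement(
            metadata,
            "link",
            {"rel": "dita:related-link", "resource": (link.text or "").strip()},
        )

    # Body: conref elements become include-refs, everything else an omtext/CMP.
    body = ET.SubElement(group, "omgroup", {"type": "conbody"})
    conbody = concept.find("conbody")
    for child in conbody if conbody is not None else []:
        if child.get("conref") is not None:
            ET.SubElement(body, "ref", {"href": child.get("conref"), "type": "include"})
        else:
            cmp_elem = ET.SubElement(ET.SubElement(body, "omtext"), "CMP")
            cmp_elem.append(convert_inline(child))
    return omdoc
```

Calling `ET.tostring(dita_concept_to_omdoc(DITA_CONCEPT), encoding="unicode")` yields a serialization with the same shape as the OMDoc fragment above; a real converter would of course need the full DITA element inventory.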
<map title="title">
  <topichead navtitle="navi-title" audience="math"/>
  <topicref href="A.dita" collection-type="sequence">
    <topicref href="A1.dita"/>
    <topicref href="A2.dita"/>
  </topicref>
  <reltable>
    <relrow>
      <relcell>A.dita</relcell>
      <relcell>B.dita</relcell>
    </relrow>
  </reltable>
</map>
The first part of this map is just what we have always thought of as a narrative structure in our NarCon approach in OMDoc. So we can directly represent it as something like
<omdoc>
  <metadata>
    <dc:title>title</dc:title>
    <link rel="dita:audience" resource="something:math"/>
    <link rel="dita:navtitle" resource="navi-title"/>
  </metadata>
  <omgroup xml:id="A.narrative" type="sequence">
    <ref type="include" href="A1.omdoc"/>
    <ref type="include" href="A2.omdoc"/>
  </omgroup>
</omdoc>
I must confess that I do not really understand what the href on the top-level topicref means, so I have left it out. Note that I am only interested in the general compatibility of the formats and not the details of the translation, which will have to be worked out. That leaves us with the reltable, which (as far as I understand it) is a way to specify cross-references that is a better alternative to <related-links>, since it is more portable and attached to DITA maps (which we can think of as a discourse-level presentation of the content structure given by the graph of DITA topics). So I would just add the following metadata section to the <omgroup> element:
<metadata> <link rel="dita:related-link" resource="http://example.com/nats.html"/> </metadata>
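The map translation can be sketched in the same spirit. This is a rough illustration under my own assumptions: the helper names are hypothetical, the `xml:id` is derived from the top-level href by swapping the `.dita` suffix for `.narrative` (mirroring the example), the audience value is passed through unchanged rather than given a `something:` prefix, and the reltable is not handled at all.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("dc", DC_NS)

# The simplified DITA map from the example above.
DITA_MAP = """<map title="title">
  <topichead navtitle="navi-title" audience="math"/>
  <topicref href="A.dita" collection-type="sequence">
    <topicref href="A1.dita"/>
    <topicref href="A2.dita"/>
  </topicref>
  <reltable>
    <relrow><relcell>A.dita</relcell><relcell>B.dita</relcell></relrow>
  </reltable>
</map>"""


def swap_suffix(href, new_suffix):
    """Replace a trailing '.dita' by new_suffix (hypothetical naming rule)."""
    stem = href[: -len(".dita")] if href.endswith(".dita") else href
    return stem + new_suffix


def dita_map_to_omdoc(map_xml):
    """Translate the narrative part of a DITA map into OMDoc omgroups."""
    dita_map = ET.fromstring(map_xml)
    omdoc = ET.Element("omdoc")

    # Map title and topichead attributes become document-level metadata.
    metadata = ET.SubElement(omdoc, "metadata")
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = dita_map.get("title", "")
    head = dita_map.find("topichead")
    if head is not None:
        for attr in ("audience", "navtitle"):
            if head.get(attr) is not None:
                ET.SubElement(
                    metadata, "link", {"rel": f"dita:{attr}", "resource": head.get(attr)}
                )

    # Each top-level topicref becomes a narrative group of include-refs.
    for topicref in dita_map.findall("topicref"):
        attrs = {XML_ID: swap_suffix(topicref.get("href", ""), ".narrative")}
        if topicref.get("collection-type") is not None:
            attrs["type"] = topicref.get("collection-type")
        group = ET.SubElement(omdoc, "omgroup", attrs)
        for child in topicref.findall("topicref"):
            ET.SubElement(
                group,
                "ref",
                {"type": "include", "href": swap_suffix(child.get("href", ""), ".omdoc")},
            )
    # The <reltable> is deliberately ignored in this sketch.
    return omdoc
```

Applied to `DITA_MAP`, this produces one `omgroup` with `xml:id="A.narrative"` and two include-refs, as in the hand-written OMDoc version.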
OK, that ends our little comparison exercise. There are a couple of conclusions I would like to draw from this:
- OMDoc can do topic-oriented writing quite nicely
- the OMDoc1.3-style metadata help significantly
- rather than develop a DITA ontology (hinted at with the dita: namespace prefixes) we should develop ontologies that describe the various aspects of topic-based writing in generality and find the respective markup primitives. For instance dita:audience seems weird, there must be an ontology in the eLearning realm that already formalizes this.
- The OMDoc-1.6 idea of leaving out the <metadata> element and freely intermixing the metadata <link>, <resource> and <meta> with the OMDoc content will make the translation much simpler and more direct, e.g. for the <reltable> and <related-link> elements from DITA which are situated at the end in the original.
OK, that is all I have to say at the moment, please give me feedback.
Only today I became aware of microdata, the proposed way of embedding semantic annotations into HTML5. (Yes, they adopted the syntax that Michael also prefers for OMDoc, and which I personally hate, but I will get used to it.) Microdata are not to be confused with microformats, a poor man’s way of annotation that (ab)uses CSS classes and thus is compatible with HTML 4. Microdata are something like RDFa but
- are slightly easier to use for people who don’t understand XML namespaces
  - granted, RDFa’s excessive reliance on XML namespaces makes it hard to parse, and makes it unbearably complex to copy/paste a fragment, which is an important use case for HTML5
- allow for ad hoc pseudo-semantic markup when you do not use an ontology
  - What’s the point in annotating at all, then?
- are compatible with the non-XML syntax of HTML5 (which should have been ditched IMHO, but, well, in the interest of reactionary users and software, they decided differently)
The fight for the future of RDFa in HTML is going on, but what does that mean for KWARC? We have incorporated RDFa into OMDoc as a means of extending the metadata vocabularies. RDFa, originally designed for XHTML, is prepared for being integrated into any XML language, including OMDoc. HTML5 microdata are an integral part of the HTML5 specification and would not work in other XML languages. OK, but we present OMDoc documents as HTML to make them human-readable. In this output, we want to preserve the semantics of the OMDoc markup, and for that we had always been thinking about using RDFa. (We know exactly how to do it, but have not yet implemented that step.) We could use HTML5 microdata instead, but:
- RDFa has little software support so far, but microdata have none (beyond proofs of concept)
- We generate XML-compliant HTML. The non-XML syntax of HTML5 supports embedded MathML, but I doubt that it will support parallel OpenMath markup, where elements from yet another namespace are embedded into the MathML formulae.
- We generate HTML. The embedded annotations need not be authored manually, so they do not have to be easy to author.
- We are interested in using well-defined ontologies to express semantics, so we don’t need ad hoc “semantic” markup.
What do you think?
While I was reading up on the REST papers in my last post, I stumbled upon the following best practice for making sure that material is only submitted once to a RESTful application. This is something we should adopt in OMBase as well, just to be safe.
Another thing that we should think of in this arena is to enable some form of RESTful logging facility, so that users can find out what happened to the content. The technology that seems best suited for that is RSS or Atom Syndication (probably the latter). The nice thing is that we could log all the changes to any URI we use in the system. I am not sure under which URL we would address the log; one idea is to just make use of the MIME type application/atom+xml, just as for the XHTML presentation suggested in my last post. That would at least alleviate the choice of URL.
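As a rough idea of what such a log could look like, here is a minimal Python sketch that renders a list of changes to one resource as an Atom feed. Everything specific in it is an assumption for illustration: the function name `build_change_feed`, the example URI, and the `#change-N` scheme for entry ids are hypothetical and not part of OMBase.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def build_change_feed(resource_uri, changes):
    """Build an Atom feed logging changes to resource_uri.

    changes is a list of (timezone-aware datetime, summary) pairs.
    """
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = resource_uri
    ET.SubElement(feed, atom("title")).text = f"Change log for {resource_uri}"
    # The feed-level timestamp is the most recent change (or now, if empty).
    latest = max((when for when, _ in changes), default=datetime.now(timezone.utc))
    ET.SubElement(feed, atom("updated")).text = latest.isoformat()

    for index, (when, summary) in enumerate(changes, start=1):
        entry = ET.SubElement(feed, atom("entry"))
        # Hypothetical id scheme: one fragment identifier per change.
        ET.SubElement(entry, atom("id")).text = f"{resource_uri}#change-{index}"
        ET.SubElement(entry, atom("title")).text = f"Change {index}"
        ET.SubElement(entry, atom("updated")).text = when.isoformat()
        ET.SubElement(entry, atom("summary")).text = summary
    return feed
```

A server could return `ET.tostring(feed, encoding="unicode")` whenever a client asks for the resource with the Atom media type, so the log needs no URL of its own.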
I have just been reading up on REST again, since I found a very palatable pair of articles (REST intro, and practices). This got me thinking about the state of OMBase, and the integration of our presentation pipeline into the OMBase interface. It is RESTful, since we have MMT addressing via URIs implemented. You just use a GET to retrieve them.
What I have talked with Florian about, but maybe not with the OMBase team, is how to integrate presentation. That should be very simple from the interface point of view: we just take the same URLs, but a different HTTP header.
gives you the OMDoc file and
gives you the presented version (with the standard options). Now, we have written a paper about presentation and submitted it to MKM, and Christine has spent a lot of ingenuity on defining user options to the presentation process. This should be easy to integrate with the URI query interface:
That should do the trick.
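To illustrate the idea of "same URL, different header", here is a small client-side sketch in Python. It only builds the requests and does not send them. The host, path, the option name `notation`, and the media type `application/omdoc+xml` are placeholders I made up for the example; only `application/xhtml+xml` is a media type I am sure about.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(uri, media_type, options=None):
    """Build a GET request for uri that asks for media_type via Accept.

    Presentation options, if any, are appended as URI query parameters.
    """
    if options:
        separator = "&" if "?" in uri else "?"
        uri = uri + separator + urlencode(options)
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical MMT-style URI; the same URI is used for both requests.
THEORY_URI = "http://example.org/ombase/some/theory"

# Source view: ask for the OMDoc file (media type is an assumption).
source_request = build_request(THEORY_URI, "application/omdoc+xml")

# Presented view with one user option (option name is an assumption).
presented_request = build_request(
    THEORY_URI, "application/xhtml+xml", {"notation": "infix"}
)
```

Passing either object to `urllib.request.urlopen` would perform the actual GET; the server then decides between source and presentation by looking only at the Accept header.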
I am just sitting in the CIAO workshop, and Alan Bundy and Michael Chan are talking about a very nice topic: the evolution of ontologies in Physics. They are applying this to historical examples like the latent heat problem and the MOND theory that is hot in Physics at the moment. The idea is that when experiments contradict theory, there is a clash between the theory ontology Ot and the sensory ontology Os, which they solve by renaming apart selected concepts between the ontologies to resolve the contradiction. So they change the ontologies by renaming. The nice thing is that they can interpret the operation of renaming as a conservative theory extension, which gives a nice interpretation of minimal theory change/repair.
You can find the details here.
Even though I totally buy into their observations, I think that it would be better to keep the theories as they are and interpret the repair operations as theory morphisms. That would be a non-destructive operation, and the operations would become very natural theory morphisms.