Archive for the ‘Krextor’ Category

Microdata vs. RDFa – What does it mean to us?

Wednesday, October 28th, 2009

Only today I became aware of microdata, the proposed way of embedding semantic annotations into HTML5. (Yes, they adopted the syntax that Michael also prefers for OMDoc, and which I personally hate, but I will get used to it.) Microdata are not to be confused with microformats, a poor man’s way of annotation that (ab)uses CSS classes and thus is compatible with HTML 4. Microdata are something like RDFa but

  1. are slightly easier to use for people who don’t understand XML namespaces
    • granted, RDFa’s excessive reliance on XML namespaces makes it hard to parse, and makes it unbearably complex to copy/paste a fragment, which is an important use case for HTML5
  2. allow for ad hoc pseudo-semantic markup when you do not use an ontology
    • What’s the point in annotating at all, then?
  3. are compatible with the non-XML syntax of HTML5 (which should have been ditched IMHO, but, well, in the interest of reactionary users and software, they decided differently)

The fight for the future of RDFa in HTML is going on, but what does that mean to KWARC? We have incorporated RDFa into OMDoc as a means of extending the metadata vocabularies. RDFa, originally designed for XHTML, is prepared for being integrated into any XML language, including OMDoc. HTML5 microdata are an integral part of the HTML5 specification and would not work in other XML languages. OK, but we present OMDoc documents as HTML to make them human-readable. In this output, we want to preserve the semantics of the OMDoc markup, and for that we had always been thinking about using RDFa. (We know exactly how to do it; we just have not implemented that step yet.) We could use HTML5 microdata instead, but:

  1. RDFa has little software support so far, but microdata have none (beyond proofs of concept)
  2. We generate XML-compliant HTML. The non-XML syntax of HTML5 supports embedded MathML, but I doubt that it will support parallel OpenMath markup, where elements from yet another namespace are embedded into the MathML formulae.
  3. We generate HTML. The embedded annotations need not be authored manually, so they do not have to be easy to author.
  4. We are interested in using well-defined ontologies to express semantics, so we don’t need ad hoc “semantic” markup.

What do you think?

Krextor Publicity

Monday, July 20th, 2009

I was surprised to find the following search result for Krextor:

The document “Krextor – An extensible XML→RDF extraction framework.pdf” is no longer available on docstoc.
It has either been removed by the original owner of the document or by the docstoc staff due to copyrighted or inappropriate content.

Isn’t that actually a proof of success, in this new age of the Pirate Party? ;-)

Here is where it was stolen from, and here is the paper.

Google likes us

Monday, January 12th, 2009

I’m really not the only one who has ever implemented compact URIs (CURIEs), but when I googled for “safe curie” today, Krextor was the first hit, far ahead of the RDFa specification. Cool!
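
For readers who have not met them: a safe CURIE is a compact URI wrapped in square brackets, so that it cannot be mistaken for an ordinary URI in an attribute that accepts both. The following is only a minimal Python sketch of that idea, not Krextor's actual (XSLT) implementation; the prefix table and the function name `expand` are assumptions made up for illustration.

```python
import re

# Hypothetical prefix table; a real document declares its own prefixes.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# A safe CURIE is "[prefix:reference]"; the brackets are what makes it "safe".
SAFE_CURIE = re.compile(r"^\[([^:\[\]]*):([^\[\]]*)\]$")


def expand(value, prefixes=PREFIXES):
    """Expand a safe CURIE such as "[dc:title]"; return anything else unchanged."""
    match = SAFE_CURIE.match(value)
    if match is None:
        return value  # not bracketed, so treat it as an ordinary URI
    prefix, reference = match.groups()
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + reference
```

With this table, `expand("[dc:title]")` yields the full Dublin Core title URI, while the unbracketed `expand("mailto:someone@example.org")` is passed through untouched, even though `mailto` looks like a prefix.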

Reinventing the XML→RDF wheel?

Sunday, January 11th, 2009

While researching related work for Krextor, I discovered this paper about XSDL (XML Semantics Definition Language). (Note that by XSDL the authors do not mean the new name of W3C XML Schema, which was renamed only recently.) XSDL is a language for solving problems very similar to the ones Krextor addresses: extracting RDF in terms of some ontology from XML documents. I had always been looking for a nice declarative way of doing so, and there it is.


XML Pattern Matching and Functional Programming

Tuesday, December 2nd, 2008

I’m currently reconsidering whether it was a good idea to implement my XML→RDF extraction library Krextor in XSLT. Writing down my actual requirements, I realized that I need a language that supports

  • pattern matching on XML elements and attributes, using a syntax that is close to literal XML or to XPath (so that extraction rules are easy to write, also for other developers in the future)
  • functional programming (in some way), as the whole idea of mapping XML to RDF (and thus XML nodes to URIs) can be modeled most elegantly using a functional approach. (This is mainly a requirement for me when implementing the generic core of Krextor, but it also matters to extraction module developers once the XML input language gets a bit more complex.)

Having looked a bit into XQuery, Scala, and JavaScript (and a little bit into Haskell), I decided to stick to XSLT for now. Functional programming is awkward but possible, and XML pattern matching is awkward or non-intuitive in most other languages.
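
To make the two requirements concrete, here is a rough sketch of how they combine, written in Python only for illustration (Krextor itself stays in XSLT). Each rule pairs a path pattern with a pure function from a matched element to RDF triples. The element names, the `BASE` and `EX` namespaces, and the helpers `uri_of` and `extract` are all hypothetical and not taken from Krextor.

```python
import xml.etree.ElementTree as ET

# Hypothetical base URI and ontology namespace, for illustration only.
BASE = "http://example.org/doc#"
EX = "http://example.org/onto#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri_of(element):
    """Map an XML node to a URI (here: base URI plus its id attribute)."""
    return BASE + element.get("id")


# Each rule: (path pattern, pure function from a matched element to triples).
RULES = [
    (".//theorem", lambda e: [(uri_of(e), RDF_TYPE, EX + "Theorem")]),
    (".//proof[@for]", lambda e: [(uri_of(e), EX + "proves", BASE + e.get("for"))]),
]


def extract(root):
    """Apply every rule to every element its pattern matches; collect the triples."""
    return [
        triple
        for pattern, rule in RULES
        for element in root.findall(pattern)
        for triple in rule(element)
    ]


doc = ET.fromstring('<doc><theorem id="t1"/><proof id="p1" for="t1"/></doc>')
triples = extract(doc)
```

Here `triples` holds one type statement for the theorem and one `proves` link from the proof to it. ElementTree only understands a small XPath subset, which is part of why this stays a sketch: XSLT's template matching covers far more patterns out of the box.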

Documenting XSLT

Wednesday, November 26th, 2008

A considerable part of the implementation of my research prototype(s) is done in XSLT. Now that the extraction of RDF from semantic markup is turning more and more into a project of its own, more software engineering is needed, including proper documentation.

It turned out that XSLTdoc is a really nice solution for that: Just put a few additional XML elements in front of every template or function and run a special XSLT to generate the documentation. Works like javadoc and looks nice.