KiWi Programming Camp

March 23rd, 2009

I spent all of last week at the first Programming Camp of the KiWi project (“Knowledge in a Wiki”), which is developing the successor of the IkeWiki system that SWiM has been based on so far. My plan is to port SWiM from the abandoned IkeWiki to KiWi, which will be under development in the namesake project for two years from now, and further on by the community that is now starting to grow. Version 0.1 of SWiM as a KiWi extension is not yet out, but the KiWi members, particularly those from Salzburg Research, managed to give me a good understanding of the next steps that I need to take. Some preliminary conclusions so far:

  • KiWi’s strength as a scalable and extensible platform for social software (not just as a semantic wiki!) is its architecture. Based on EJB3 and Seam, it has an incredibly steep learning curve – EJB was one of the topics that I tried my best to avoid when studying, and now I regret that. On the other hand, it also took the people at Salzburg Research several months to come up with that elaborate architecture.
  • Openness attracts the community: The KiWis decided to open their programming camp to external developers, as a first effort to start community-building. They were also really committed to teaching me, the visitor, how to use their system (thanks, Mihai, Rolf, Sebastian, Stephanie, Szaby, Thomas – and their colleagues from Aalborg, Brno and Munich as well!). Even before the programming camp, they did not jealously lock away their sources, but gave interested external people access. And now, with further adoption of the software in mind, they are switching to one of the most liberal licenses, namely BSD.
  • With our Subversion and Trac infrastructure, we have taken great steps towards more productive development. Still, the KiWis use more professional tools, which really make life easier. OK, most of them are not open source, but they require considerably less hacking to become productive: Jira (bug tracker), Fisheye (repository browser), Crucible (code review system, not yet used), and Hudson (continuous integration server).

“KiWi knows” what else I will be able to report in the near future – stay tuned!

Google likes us (2)

February 18th, 2009

I wanted to explain “parallel markup” to a colleague and was too lazy to look it up in the MathML specification, so I googled it. It turned out that KWARC ranks quite well on that topic, far ahead of the MathML spec. The first hit was a www-math mailing list thread following up on a question about parallel markup that I once asked. A Trac ticket on parallel markup support in our JOMDoc library ranks #5.

Google likes us

January 12th, 2009

I’m really not the only one who has ever implemented compact URIs (CURIEs), but when I googled for “safe curie” today, Krextor was the first hit, far ahead of the RDFa specification. Cool!

Reinventing the XML→RDF wheel?

January 11th, 2009

When researching related work for Krextor, I discovered this paper about XSDL (XML Semantics Definition Language). (Note that by XSDL the authors do not mean the new name of W3C XML Schema, as the latter has only been renamed recently.) XSDL is a language for solving problems very similar to those that Krextor addresses: extracting RDF in terms of some ontology from XML documents. I had always been looking for a nice declarative way of doing so, and there it is.


Designing a complex ontology and reusing others

December 3rd, 2008

This description of the Music Ontology provides an excellent and easy-to-understand example of how existing ontologies were reused, and other small ontologies designed, to contribute to the development of a larger, integrated ontology. The case is similar to our current development of a versioning ontology for OMDoc, which we are specifying at the moment and which will be made up of a versioning ontology, a change ontology, an event ontology (the one that is part of the Music Ontology), and an ontology for people.

XML Pattern Matching and Functional Programming

December 2nd, 2008

I’m currently reconsidering whether it was a good idea to implement my XML→RDF extraction library Krextor in XSLT. Writing down my actual requirements, I realized that I need a language that supports

  • pattern matching on XML elements and attributes, using a syntax that is close to literal XML or to XPath (for easily writing extraction rules, which should also be done by other developers in future)
  • functional programming (in some way), as the whole idea of mapping XML to RDF (and thus XML nodes to URIs) can be modeled most elegantly using a functional approach. (This is rather a requirement for me implementing the generic core of Krextor, but also for extraction module developers once the XML input language is a bit more complex.)

Having looked a bit into XQuery, Scala, and JavaScript (and a little bit into Haskell), I decided to stick to XSLT for now. Functional programming is awkward but possible, and XML pattern matching is awkward or non-intuitive in most other languages.
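To make the two requirements a bit more concrete, here is a minimal sketch in Python. It is not how Krextor is implemented (Krextor is written in XSLT). The element names, the example.org URIs, and the names uri_of, RULES and extract are all made up for illustration. Extraction rules are keyed by element name, a crude stand-in for XPath-like pattern matching, and the mapping from XML nodes to URIs is a pure function.

```python
import xml.etree.ElementTree as ET

# All URIs below are hypothetical placeholders, not Krextor's vocabulary.
BASE = "http://example.org/doc#"
ONT = "http://example.org/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri_of(element):
    """Map an XML node to a URI (pure function); None if it has no id."""
    ident = element.get("id")
    return BASE + ident if ident is not None else None


# Extraction rules: element name -> RDF class of the resource it denotes.
RULES = {
    "theory": ONT + "Theory",
    "symbol": ONT + "Symbol",
}


def extract(element, parent_uri=None):
    """Recursively yield (subject, predicate, object) triples."""
    subject = uri_of(element)
    rdf_class = RULES.get(element.tag)
    if subject is not None and rdf_class is not None:
        yield (subject, RDF_TYPE, rdf_class)
        if parent_uri is not None:
            # Link the nearest matched ancestor to this resource.
            yield (parent_uri, ONT + "contains", subject)
        parent_uri = subject
    for child in element:
        yield from extract(child, parent_uri)


SAMPLE = '<theory id="arith"><symbol id="plus"/><note>ignored</note></theory>'
triples = list(extract(ET.fromstring(SAMPLE)))
for triple in triples:
    print(triple)
```

Even in this toy version the rule table stays readable, but anything beyond matching on the element name (attributes, context, nesting) would have to be coded by hand, which is exactly where XSLT's template patterns are more convenient.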

Three types of mathematicians

November 26th, 2008

Cristian Calude and I discussed the costs and gains of content mark-up. He emphasized that unless we provide interesting features, mathematicians will see neither the value of content mark-up nor a reason for the additional effort (e.g. when using sTeX instead of LaTeX).

Below you find three groups of mathematicians that most likely need different amounts of persuasion to be convinced (please note that I am not quoting Cris; I simply present what I remember from our discussion, and I made up the names of the groups):

  • Pen-and-Paper guys: They only use computers for publishing, but all mathematics is actually developed on paper. The publishing process is seen as a tedious and inconvenient activity that takes time away from the actual job that a mathematician wants to do. The digitization is annoying (proofreading). In earlier times, this was done by secretaries and the publisher, but nowadays publishers only accept LaTeX, which is really seen as a burden by this group of mathematicians.
  • LaTeX Lovers: There are mathematicians who think in LaTeX. They use it a lot for developing their ideas and incrementally revising proofs with colleagues. This group seems to have an increasing influence on scientific publishing, as most publishers (in mathematics) will nowadays reject a submission if it is not provided in LaTeX.
  • Innovators: The third group wants even more. (We didn’t really talk about this group for long.) For example, this group promotes semantic technologies and aims at making mathematics machine-processable as well as bringing mathematics to the web. I assume that includes vast parts of the MKM community.

Maybe we need to start asking ourselves: Would we use our own tools and services? (Who is using sTeX?) And if so, for which activities? Think of the very early steps towards a new topic. Would you like to be forced to use content mark-up? Although we provide full flexibility in switching between concepts, simply having to establish theories and mark up structure really slows down creative thinking. So when is a good time to use content-based techniques? Do we restrict them to the very last stage of scientific work, i.e. the publishing process, or to teaching (the latter not even being recognized as a scientific activity)?

I am collecting arguments on the gains and burdens of content mark-up (in the panta rhei Trac), with a particular focus on the technologies and services provided by KWARC. I’d appreciate your feedback and comments!

Documenting XSLT

November 26th, 2008

A considerable part of the implementation of my research prototype(s) is done in XSLT. Now that the extraction of RDF from semantic markup is more and more turning into a project of its own, more software engineering was needed, including proper documentation.

It turned out that XSLTdoc is a really nice solution for that: Just put a few additional XML elements in front of every template or function and run a special XSLT to generate the documentation. Works like javadoc and looks nice.

Growing the Semantic Web with Inverse Semantic Search

November 5th, 2008

I recently read an interesting paper that was presented at the INSEMTIVE workshop (incentives for the semantic web). Hans-Jörg Happel addresses the problem that a lot of metadata (tags, annotations, etc.) exist in private knowledge spaces but are not shared on the semantic web – maybe because of privacy concerns, maybe because nobody has ever asked for them, whatever. His new idea is to design a search engine that knows which private metadata could improve the result set – and then asks their authors to publish them: i.e. the opposite of the “publish first, retrieve then” process of traditional IR.

Versioning Ontology

October 24th, 2008

Cory Casanave, a participant in today’s Ontolog conference call, mentioned a versioning ontology, as versioning (of data and metadata!) was identified as an important feature of semantic wikis.

Here is their ontology for versioning and management of change; somewhat different from our group’s notion of management of change, though.