Archive for August, 2007

More Scoop Musing

Thursday, August 30th, 2007

In the second invited talk, Toby White is talking about SciSpace, an experiment in social-software-mediated collaborative scientific research.

The main thrust of the intro is that a new kind of scientific practice is emerging, e.g. in the environmental sciences. This involves massive cross-institutional collaboration of scientists and programs. The problem in collaboration is not a lack of communication: we have giant bandwidth, but making sense of what comes through it is the problem. Just managing e-mail discussions across multiple interlocutors is almost unworkable (think of adding a person to a long thread). In particular, you are interested in the history of the project, and that is extremely hard to extract from the discussions, since they are multi-threaded and distributed.

Toby and some colleagues decided that they need something like a scientific Facebook. SciSpace is like MySpace, but for scientists. The logic is trivial to implement as a system, but it is very hard to make it look nice and be easy to use. They are using an open-source social networking framework called ELGG out of Oxford. SciSpace has about 100-200 users, about 30 of them active.

Toby claims that the nice thing about SciSpace is that you kind of know which people you are blogging to; you can just keep up with what your colleagues or boss are doing, and what you may contribute to.

This would be a great thing to integrate with Panta-Rhei; maybe we can even re-implement that system in ELGG. I really wonder whether they have some kind of repository feature. Toby tells us that there are wikis, but they are not very well integrated, at least not in the same cool way.

SCOOP 2

Thursday, August 30th, 2007

This is about the talk of German Nemirovskij (Fachhochschule Albstadt-Sigmaringen) about Semantic Document Annotation for Global Search on Study Programs (e.g. semester abroad). In the SWAPS project they are looking at the Bologna module descriptions of the European Documents; they have a similar structure, so they can be screen-scraped into a database. Applications: Module Search, Personalized Search, and Comparison. For CoPs the second is interesting, since (German claims) CoPs can be used to tweak this.

They want to reach semantic search by doing “search wrt. Ontologies”.

Problems:

  1. how to populate Ontologies (is this only ABox population?)
  2. how to index Ontologies for search

ad 1: From the layout, scrape attribute-value pairs, then semantically annotate the document fragments (the values of the attributes). It seems that this is ABox population, and that the work is still quite at the beginning.
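
To make the scraping step concrete, here is a minimal sketch of what such ABox population could look like, assuming the scraped module description arrives as plain "Attribute: value" lines. The sample text, the attribute names, the individual name `module_42`, and the `populate_abox` helper are all hypothetical and not taken from the SWAPS project.

```python
def populate_abox(individual, scraped_text):
    """Turn 'Attribute: value' lines into (individual, property, value) assertions."""
    assertions = []
    for line in scraped_text.splitlines():
        if ":" not in line:
            continue  # skip layout lines that carry no attribute-value pair
        attribute, value = line.split(":", 1)
        attribute, value = attribute.strip(), value.strip()
        if attribute and value:
            # Hypothetical naming scheme: "ECTS credits" -> "ects_credits"
            prop = attribute.lower().replace(" ", "_")
            assertions.append((individual, prop, value))
    return assertions


# Hypothetical scraped module description
sample = """Module title: Linear Algebra
ECTS credits: 6
Language: German
"""
abox = populate_abox("module_42", sample)
```

Each resulting triple is one assertion about an individual; nothing here touches the class hierarchy, which is why this counts as ABox rather than TBox population.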

SCOOP workshop (Communities of Practice)

Thursday, August 30th, 2007

I am sitting in the SCOOP workshop of the JEM Network, which is really shaping up nicely: we have the MKM people meeting with education and social software guys. I will blog a couple of impressions from the KWARC angle.

The discussion is quite stimulating.

Ralf Klamma (RWTH Aachen) gives an intro to Community Information Systems and claims that the constitutive features of CoPs are:

Mutual Engagement (ME): “You have to know which community you are belonging to”; I am a little sceptical whether this is really true for CoPs in Science which are very distributed, and may even be disconnected.

Joint Enterprise (JE): There is something you want to do together, and you want to learn to do it better. This is at the center of Wenger's definition of CoPs. We have been neglecting this in our KWARC models here, or taking it for granted. We need to think more about this.

Joint Resources (JR): This is really where our MKM paper sits, and I have the feeling that we have something to bring to the table here. Klamma is interested in multi-medial theories. I must say that with the OMDoc approach, we are interested in an omni-medial approach (OMDoc as an omnipresent semantic medium that covers all). The idea here is that content markup allows us to generate multiple medial representations from one source, and any medium can be marked up in OMDoc. So maybe this is compatible.

Klamma also talks about a cross-media theory of transcription that sounds interesting (Jäger, Stanitzek: Transkribieren – Medien/Lektüre, 2002). The gist of it seems to be that events (e.g. historical) and objects are transcribed across media (e.g. to OMDoc or SciML). So we only have access to the media trace, not the event itself (it is long gone). I wonder what this theory predicts; it seems compatible with what we are doing.

A great example: The Babylonian Talmud has been transcribed to an XML markup, where you can annotate relations. Then the text can be accessed as semantic hypertext. One effect was that Talmud students were asking tougher questions earlier. That is very encouraging. I wonder if the sources are available for this, how an OMDoc version of the Talmud would fare, and how much of the structure could be transferred into the CD-based structure we claim to be so essential.

SWiM development roadmap established

Saturday, August 25th, 2007

Hello, World!

Although I’m not (yet?) so much into blogging, I should briefly introduce my project. SWiM is a semantic wiki for mathematics, targeted both at researchers and students. For further information, see…

Welcome to panta rhei!

Friday, August 24th, 2007

panta rhei: everything flows. These two words were used by Plato to sum up the teachings of his predecessor Heraclitus, who argued that the world is a permanent becoming and vanishing, in which everything constantly changes.

This is the vision for the collaborative and interactive reader panta rhei, which facilitates the continuous discussion and assessment of arbitrary material.

Applications of Structured Documents

Wednesday, August 22nd, 2007

Transferred from Sundries of Anything, posted by Jochem Liem

I met Normen during the poster session at KI2006 while he was explaining his research to Christine. Basically, he uses a structured markup language (in this case LaTeX) to describe the “logical” structure of parts of the document (called infoms).
Currently, an ontology about types of infoms and their relations developed in a European project is used, but custom ontologies can be used to structure documents.
Structuring the document using the infom types defined in the ontology should allow versioning as the changes to infoms (and their relations to other infoms) are formally defined.

I did not really understand why the killer application for structured documents would be versioning, and argued that there are other applications which seem at least as sexy.

1. Collaboration. The infoms could describe the flow of an argument. When writing papers together, the changed infoms could be highlighted. Metadata (data about the changes) could be added to describe why the infom was changed. This seems like something which would be extremely handy to use. Because you structure your content using infoms, the purpose of specific parts of text is also made explicit. This makes it far easier for co-authors to understand the reasoning of the author. Collaboration is a topic which fits in the AIED conference series.

2. Text analysis, suggesting improvements. If the structure of a document in terms of content is known, it is probably possible to say something about the flow of the document. For example, if the support for a statement is extremely far away from the statement itself, the document probably has to be restructured. If the knowledge about the document is good, you might even suggest where certain infoms have to be placed.

3. Text editing on a meta-level. It might be possible for a perfectly annotated document to be restructured. For example, the text could be visualised using boxes for the infoms, and arrows for the relations between them. Moving the infoms on the screen also changes their position in the document. This way the flow of a document (an argumentation for example) can be changed on a meta-level. Visualising these relations makes it intuitive for the user to see whether his new flow of infoms makes sense.

Somehow, I think this kind of research also fits in the semantic web vision. By structuring documents you might be able to make them machine accessible. Currently, it is not clear how text documents and ontologies should be integrated. We have markup languages like XHTML for text and ontology languages like OWL for concepts, but they each serve a different purpose. So, how do we use the semantic web accomplishments to annotate text documents in such a way that they can be used by semantic search/service discovery/brokering/etc?

Solving these problems is probably worth multiple PhDs. Normen, from what I’ve written, have I understood what you are working on? Also, please let us know when we can use your work. ;P

Disambiguation of Mathematical Text

Friday, August 17th, 2007

Oooops, this is a left-over draft from MKM

…..

Claudio Sacerdoti Coen (HELM group in Bologna) is talking about disambiguation. It seems that he has really nailed down most of the practical aspects of the problem.

When types do not help, we have to ask the user, and he is trying to do this with the least nuisance. He defines the notion of a spurious error (most errors are spurious), as opposed to the “real errors”. As always it is great to hear him talk. I wonder what information he needs for the algorithm; is what we have in the new OMDoc presentation system enough?

He even has a correctness proof. I want a demo.

Grokking Conflicts in Management of Change

Friday, August 17th, 2007

We have been working on a semantics-based management of change system (sCMS) for a while (see our locutor project at KWARC).

This project builds on three main intuitions. In a nutshell: if we know more about the semantics of a document type,

  1. then we know what the meaning-atomic document fragments are (that have an explicit contribution and dependency on the document context, we call them “information items” or “infom”s) [infom]
  2. then we can determine less intrusive differences (some differences don’t really change the document) [mDiff]
  3. then we know when changes in document (fragment) A will affect document (fragment) B, even if A and B are far apart. [long-range effects]

Today I am thinking some more about 2. and 3. The main application of this is the notion of conflicts in change management, which I had not fully grokked (at least to my own satisfaction) in the past. Here goes my new-found understanding.

  1. conflicts are about focus (maybe the word focus is not ideal, but I will use it for now).
  2. you focus on an infom, if you write or change it, or if you explicitly set a focus on it.
  3. If there is a long-range effect on an infom A from an infom B, and B changes, then there is a conflict from A to B (interestingly, conflicts are directed, I claim).

Now, let us see whether this concept is enough to understand Subversion (SVN), our paradigmatic CMS. For lack of anything else, SVN considers lines to be infoms and has no long-range effects, only line/infom-local ones: a change affects its whole line/infom. Furthermore, focus is set on an infom exactly when it is changed. As a consequence, we have a conflict exactly when there are two changes to a line (one from the update and one in the local copy).
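
The line-based reading can be written down in a few lines of Python. This is a sketch of the model described in this post, not of Subversion's actual merge algorithm (which works on hunks and handles inserted and deleted lines); the function name `line_conflicts` and the sample texts are made up for illustration.

```python
def line_conflicts(base, local, incoming):
    """Indices of lines changed both locally and by the update.

    Simplifying assumption: all three versions have the same number of
    lines, i.e. changes only rewrite lines and never insert or delete them.
    """
    if not (len(base) == len(local) == len(incoming)):
        raise ValueError("this sketch only handles in-place line changes")
    return [
        i
        for i, (b, mine, theirs) in enumerate(zip(base, local, incoming))
        if mine != b and theirs != b  # both sides put a focus on line i
    ]


base = ["intro", "definition", "proof"]
local = ["intro", "definition (reworded)", "proof"]
incoming = ["intro", "definition (extended)", "proof, corrected"]
svn_conflicts = line_conflicts(base, local, incoming)
```

Here only line 1 is reported: line 2 was changed by the update alone, so under this model nobody holds a focus on it locally and there is no conflict.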

For a more semantic document type like OMDoc we have non-trivial infoms (usually statements and paragraphs) and long-range effects given by the dependency relation (e.g. a definition depends on all the concepts in the definiens, or a theorem depends on its proof, which in turn depends on all theorems it uses). If we assume a focus on everything we have ever written, then we come to a very interesting notion of conflict. If A depends on B, B changes, and I focus on both, then I get a conflict from A to B, and will be notified by the sCMS.
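
The dependency-based notion can be sketched in the same style. The infom names, the `depends_on` table, and both helper functions are hypothetical; following dependencies transitively is my reading of the theorem/proof example, not something the locutor project is claimed to implement.

```python
def affected_by(changed, depends_on):
    """All infoms that depend on `changed`, directly or transitively."""
    affected = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for infom, deps in depends_on.items():
            if current in deps and infom not in affected:
                affected.add(infom)
                frontier.append(infom)
    return affected


def conflicts(changed, depends_on, focus):
    """Directed conflicts (A, B): A depends on B, B changed, both are in focus."""
    if changed not in focus:
        return []
    return sorted(
        (a, changed) for a in affected_by(changed, depends_on) if a in focus
    )


# Hypothetical dependency relation: infom -> infoms it depends on
depends_on = {
    "theorem": {"proof"},
    "proof": {"lemma"},
    "lemma": {"definition"},
    "remark": set(),
}
focus = {"theorem", "definition"}
found = conflicts("definition", depends_on, focus)
```

With these inputs a change to the definition yields the single conflict from "theorem" to "definition", even though the two are three dependency steps apart. The lemma and the proof are affected as well, but they are not in focus and so produce no notification.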

As I said, I am not sure that focus is exactly the right concept; we might have to think of “read focus” and “write focus” to account for the directionality of conflicts. But I am pretty sure that I understand more about CMS now. I have not really checked the literature; if this is all well-known, then please tell me.