In my last post I wrote about a new search engine I had just learned about. We should really have a competition and an example library for math search engines. We talked about this some years back, but we really need to get our act together, probably for the next MKM.
I can see five tasks that we have to accomplish for a competition:
- collect a search corpus. The arXiv seems like the right place to start: it is big enough that we can pick competition examples at random.
- cooperate on an analysis pipeline and pre-analyzed corpora. This would allow people to take part without having to build a full analysis pipeline themselves.
- collect a corpus of search queries. This may be the biggest hurdle, since we need a gold standard of what we expect the hits to be.
- come up with “divisions”. Not all engines can do the same things, so we should only let comparable engines compete; multiple divisions will also allow us to award multiple trophies.
- build a competition harness, so that tests can be automated. This will also require, and thus lead to, general search APIs.
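To make the harness idea a bit more concrete, here is a rough sketch in Python of how the pieces could fit together. Everything in it is an assumption for illustration only: the "search API" is just a function from a query string to a ranked list of document ids, `GoldQuery` pairs a query with the hits we expect, and precision and recall at `k` stand in for whatever scoring we eventually agree on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical common search API: query string in, ranked document ids out.
SearchEngine = Callable[[str], List[str]]


@dataclass(frozen=True)
class GoldQuery:
    """One entry of the query corpus: a query plus its gold-standard hits."""
    query: str
    relevant: FrozenSet[str]


def precision_recall(hits: Sequence[str], relevant: FrozenSet[str],
                     k: int = 10) -> Tuple[float, float]:
    """Precision and recall over the first k distinct hits."""
    top = list(dict.fromkeys(hits))[:k]  # drop duplicates, keep ranking order
    found = sum(1 for doc in top if doc in relevant)
    precision = found / len(top) if top else 0.0
    recall = found / len(relevant) if relevant else 0.0
    return precision, recall


def run_division(engines: Dict[str, SearchEngine],
                 queries: Sequence[GoldQuery],
                 k: int = 10) -> Dict[str, Tuple[float, float]]:
    """Run every engine of one division on all queries; average the scores."""
    scores = {}
    for name, engine in engines.items():
        results = [precision_recall(engine(q.query), q.relevant, k)
                   for q in queries]
        n = len(results)
        if n == 0:
            scores[name] = (0.0, 0.0)
        else:
            scores[name] = (sum(p for p, _ in results) / n,
                            sum(r for _, r in results) / n)
    return scores


# Toy usage with made-up engines and document ids.
toy_queries = [GoldQuery("x^2 + y^2", frozenset({"d1", "d4"}))]
toy_engines = {
    "engine_a": lambda q: ["d1", "d2"],
    "engine_b": lambda q: ["d3"],
}
toy_scores = run_division(toy_engines, toy_queries)
```

A real harness would call the engines over whatever general search API we settle on and would need a richer notion of a hit (for instance a formula occurrence rather than a whole document), but the overall shape of corpus, gold standard, divisions and scoring loop would probably stay the same.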
This is all I can think of at the moment, so please give me your feedback.