Thursday, October 21, 2010 - 02:34

Fun with Apache Solr: deep content indexing of PDF, DOC and whatever you please

I recently upgraded the search here to Apache Solr. The search function itself is far from finished, and since I wasn't deep indexing before, I'm a little short of proper documents to index at the moment. The bottom line: it's awesome and then some. And the best part: once you have it configured properly, it's pretty much a self-employed document indexer. You attach a document to a page, blog or $_whatever, and a bit later it's automagically indexed. Provided, of course, that you can spare some resources for a servlet container, because Solr is a Java webapp. Tomcat works fine, and Jetty shouldn't cause problems either. I'm going to write a tutorial about it in a bit, since it has some quirks.
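For the curious, here is a rough idea of what "deep indexing" a file can look like at the HTTP level. This is only a minimal sketch, not the setup running on this site: it assumes Solr's ExtractingRequestHandler (Solr Cell) is enabled at /update/extract in solrconfig.xml, and the base URL, document id and file path are made-up placeholders.

```python
import urllib.parse
import urllib.request


def build_extract_url(base_url, doc_id, commit=True):
    """Build the URL for Solr's extracting handler (assumed at /update/extract)."""
    # literal.id sets the "id" field of the resulting Solr document
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit right away so the document becomes searchable
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def index_file(base_url, doc_id, path, content_type="application/pdf"):
    """POST the raw file bytes to Solr, which extracts the text on its side."""
    with open(path, "rb") as fh:
        data = fh.read()
    req = urllib.request.Request(
        build_extract_url(base_url, doc_id),
        data=data,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # HTTP status code of Solr's response


# Hypothetical usage, assuming a local Solr instance:
# index_file("http://localhost:8983/solr", "blog-solr-demo", "demo.pdf")
```

In a CMS setup you normally don't call this yourself; the search module does the equivalent in the background whenever a new attachment shows up.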

The two attached documents are only a demonstration. They are this blog post and another one in PDF format. If you want to see what the output looks like, try "solr" as the search phrase.