For my projects I use the awesome deployment tool Capistrano. In a project that uses Solr, I want to update the Solr core as a deployment step if the Solr configuration of that core has changed. After reloading the configuration, I want to do a full import with the Solr Data Import Handler.
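Such a deployment step could look roughly like the minimal shell sketch below. It is not the author's Capistrano task. It compares the core's conf directory of the previous and the new release and, only if they differ, reloads the core through the CoreAdmin RELOAD action and then starts a Data Import Handler full-import. The base URL http://localhost:8080/solr (Jetty's default port), the core name "flight", the /dataimport handler path, and the function names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical deploy hook: reload a Solr core and run a DIH full import,
# but only when the core's configuration changed between two releases.
# Assumed: Solr on Jetty at localhost:8080/solr, core "flight", and DIH
# registered under /dataimport in that core's solrconfig.xml.

SOLR_URL=${SOLR_URL:-http://localhost:8080/solr}
CORE=${CORE:-flight}

# Succeeds (exit 0) when the two conf directories differ. diff exits 2 on
# trouble (e.g. no previous release yet), which is also treated as "changed".
config_changed() {
  ! diff -rq "$1" "$2" >/dev/null 2>&1
}

# CoreAdmin RELOAD re-reads solrconfig.xml and schema.xml of the core.
reload_url() { printf '%s/admin/cores?action=RELOAD&core=%s' "$SOLR_URL" "$1"; }

# Start a full import through the (assumed) /dataimport request handler.
full_import_url() { printf '%s/%s/dataimport?command=full-import' "$SOLR_URL" "$1"; }

main() {
  if config_changed "$1" "$2"; then
    # -f makes curl fail on HTTP errors, so the deploy step fails as well
    curl -fsS "$(reload_url "$CORE")" >/dev/null &&
      curl -fsS "$(full_import_url "$CORE")" >/dev/null
  else
    echo "Solr config of core $CORE unchanged; skipping reload and import"
  fi
}

# Only act when called with two directories, e.g.
#   sh solr_deploy.sh /path/to/previous/solr/conf /path/to/current/solr/conf
if [ "$#" -eq 2 ]; then main "$1" "$2"; fi
```

The Data Import Handler runs a full-import in a background thread, so this step returns as soon as the import has been started; it does not wait for the import to finish.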
If you have a large import, this can take a while. But it is possible to do this without any downtime! The trick is to work with two cores: a “live” core and an “on deck” core. The …
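The reason for the second core: a DIH full-import cleans the index before indexing by default (clean=true), so importing straight into the live core leaves searches with an empty or partial index until it finishes. One way to sketch the two-core idea in shell, assuming hypothetical core names "live" and "ondeck": import into the on-deck core, poll the DIH status until it reports idle again, then exchange the cores with the CoreAdmin SWAP action. The base URL, the /dataimport path, and the exact XML status string matched by dih_is_idle are assumptions.

```shell
#!/bin/sh
# Sketch of a zero-downtime import with a "live" and an "on deck" core.
# Assumed: Solr at localhost:8080/solr, cores "live" and "ondeck", and DIH
# registered under /dataimport, answering in Solr's default XML format.

SOLR_URL=${SOLR_URL:-http://localhost:8080/solr}
LIVE=${LIVE:-live}
ONDECK=${ONDECK:-ondeck}

# Reads a DIH status response on stdin; succeeds when it reports "idle".
dih_is_idle() {
  grep -q '<str name="status">idle</str>'
}

# CoreAdmin SWAP exchanges the names of two cores.
swap_url() { printf '%s/admin/cores?action=SWAP&core=%s&other=%s' "$SOLR_URL" "$1" "$2"; }

main() {
  base="$SOLR_URL/$ONDECK/dataimport"
  curl -fsS "$base?command=full-import" >/dev/null || return 1
  sleep 5  # give the import thread a moment to report "busy"
  # Poll until the import is done (this sketch has no timeout).
  until curl -fsS "$base?command=status" | dih_is_idle; do
    sleep 10
  done
  # After the swap, requests for "live" hit the freshly imported index.
  curl -fsS "$(swap_url "$LIVE" "$ONDECK")" >/dev/null
}

# Only act when explicitly asked to: sh solr_swap.sh run
if [ "${1:-}" = "run" ]; then main; fi
```

After a swap, the on-deck name points at the previous index, so the next import rebuilds that one. The sketch does not check whether the import succeeded before swapping; a real deploy step should inspect the status messages first.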
This tutorial requires that Jetty is installed as described at http://pietervogelaar.nl/ubuntu-12-04-install-jetty-9.
In this tutorial we use an example project named “airport” and a core named “flight”.
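cd /opt
# Download and unpack Solr 4.0.0
wget http://apache.hippo.nl/lucene/solr/4.0.0/apache-solr-4.0.0.tgz
tar -xvf apache-solr-4.0.0.tgz
# Deploy the Solr web application to Jetty
cp apache-solr-4.0.0/dist/apache-solr-4.0.0.war /opt/jetty/webapps/solr.war
# Use the example Solr home as a starting point (ends up in /opt/solr)
cp -R apache-solr-4.0.0/example/solr /opt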
Optionally, you can copy the “dist” and “contrib” folders as well, for example if you want to use the Data Import Handler:
cp -r /opt/apache-solr-4.0.0/dist /opt/solr
cp -r /opt/apache-solr-4.0.0/contrib /opt/solr
Add this line to the bottom of /etc/default/jetty:
Add a Solr core; for example, copy the …
By default, the Solr schema.xml has the “string” fieldType.
Add the following fieldType to your schema.xml and use it as the field type for your field:
By default, the Solr schema.xml already includes the field type “string”.
With this field type, searching with a string gives you an exact match instead of the “contains” match that the “text_general” field type gives you. Another difference, however, is that “text_general” is case-insensitive by default, whereas “string” is case-sensitive.
To perform a case-insensitive exact match search, you’ll have to add a custom field type …
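The difference between the three behaviors can be modeled with a toy shell script. This is not Solr code: it only mimics, under simplifying assumptions, what each analysis chain does to an indexed value and a single-word query. The custom type is assumed to follow the common recipe of solr.KeywordTokenizerFactory (keep the whole value as one token) plus solr.LowerCaseFilterFactory; the post's own fieldType is not shown here. All function names are hypothetical.

```shell
#!/bin/sh
# Toy model (not Solr code) of how three field types compare an indexed
# value ($1) with a query value ($2).

# Lowercase a value, like a lowercase filter would.
lc() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# "string": the whole value is kept as-is -> exact, case-sensitive match.
match_string() { [ "$1" = "$2" ]; }

# Assumed custom type (keyword tokenizer + lowercase filter): the whole value
# stays one token but is lowercased -> exact, case-insensitive match.
match_exact_ci() { [ "$(lc "$1")" = "$(lc "$2")" ]; }

# "text_general"-like: the value is split into lowercased word tokens and a
# single-word query matches any of them -> "contains", case-insensitive.
match_text() {
  q=$(lc "$2")
  for tok in $(lc "$1" | tr -cs '[:alnum:]' ' '); do
    [ "$tok" = "$q" ] && return 0
  done
  return 1
}
```

For example, "Schiphol" matches the query "SCHIPHOL" only with match_exact_ci and match_text, while "Amsterdam Schiphol" matches the query "schiphol" only with match_text.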
Stop word files for several languages can be downloaded from http://wiki.apache.org/solr/LanguageAnalysis. The English list there is also more extensive than the default one shipped with Solr.
However, if you enable them with solr.StopFilterFactory, the stop words are still not removed. This is caused by the “|” pipe characters after each word: Solr expects every word on its own line with nothing else on it. Replacing | with # doesn’t work either. This problem can be hard to discover if you expect that information provided by the Solr website should just work.
So, to still contain the …
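The conversion described above (one bare word per line) can be sketched as a small shell filter. It assumes the Snowball-style layout of the wiki files: a word at the start of a line, optionally followed by "| comment", and lines starting with "|" being pure comments. The function name snowball_to_solr and the file names in the usage line are hypothetical; this is not necessarily how the full post solves it.

```shell
#!/bin/sh
# Hypothetical helper: convert a Snowball-style stop word list on stdin into
# one bare word per line on stdout, the format described for
# solr.StopFilterFactory above.
snowball_to_solr() {
  # 1. strip "|" and everything after it (comments, incl. whole-line ones)
  # 2. turn spaces/tabs into newlines, squeezing repeats, so a line that
  #    lists several words yields one word per line
  # 3. delete the empty lines that remain
  sed 's/|.*$//' | tr -s ' \t' '\n\n' | sed '/^$/d'
}

# Usage: sh convert.sh < english.txt > stopwords_en.txt
if [ "${1:-}" = "run" ]; then snowball_to_solr; fi
```

Usage with the function sourced: `snowball_to_solr < english.txt > stopwords_en.txt`.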
As of Solr 1.3, it’s possible to use multiple stop word files. You can define multiple files as a comma-separated list, like this: