Apache Solr and Ubuntu: Multiple Instances
At CalArts, I have wanted to move the search functionality on our Drupal-powered websites into something better for a long time. We have been using the Lucene API module (which is Lucene search ported to PHP) on most of them since September of last year but, even though I am a big fan of the module, we truly wanted a way to offload the search services onto another vps (or server; basically, something more flexible). Over the past year, I have had the Apache Solr module running on our photo archive and the results have been nothing short of phenomenal: very fast (over 20x the content of the main calarts website, yet over 20x the performance, on hardware that is nearly 4 years old), fantastic results, and faceted searching that provides a way to find content you wouldn't otherwise. However, another Drupal module for Solr has since come along: Search API. As I want to start evaluating the two modules and to start moving all of our search capabilities over to Solr, I am left with one question: how do I get new instances of Solr up and running?
I could use something like Chef (ultimately, I most likely will) and replicate a Solr server, but that would also mean each vps would need to have a certain number of resources allocated to it to function nicely (which was not a guarantee). What I really wanted to do was create a solr server that could have multiple search instances (think of it like virtualhosts in apache but with solr search). We could provide it with a nice number of resources and *then* replicate our solr server configuration should the need arise in the future. Apache Solr actually supports such a concept with what are called multiple cores. I had a lot of trouble finding online resources that tackle creating such a setup until I came across a wonderful article by Dustin on setting up multiple cores. So I took what Dustin did and expanded on it with my own script for creating, reloading, unloading, and removing cores.
Installing Apache Solr
In the past, installing solr has been a pain (installing solr on freebsd is a particularly painful point) but on Ubuntu the command to install solr and get it up and running is:
apt-get install solr-tomcat
That's it! It'll probably take some time to install (java, tomcat, solr and a whole bunch of other dependencies need to get downloaded and installed) but at this stage, if you wanted a single solr server, you are ready to start setting it up with either the apachesolr module or the search api module. You should be able to see your solr search up and running at http://localhost:8080/solr (replace localhost with your domain or ip address if solr is not on your local machine).
Enable Multicore capability
Using your favorite text editor create a file called solr.xml at /usr/share/solr with the following contents:
<?xml version='1.0' encoding='UTF-8'?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores">
  </cores>
</solr>
Next, you need to ensure that Tomcat is able to write out new versions of the solr.xml file. As cores are added or removed, this file is updated. The following commands ensure Tomcat has write permissions to the needed directory and file:
chown tomcat6:tomcat6 /usr/share/solr/solr.xml
chown tomcat6:tomcat6 /usr/share/solr
And we can restart tomcat (/etc/init.d/tomcat6 restart). We are now ready to start setting up multiple cores.
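To confirm that multicore mode is active after the restart, one option is to query the CoreAdmin STATUS action on the adminPath configured above. This is only a minimal sketch, assuming solr answers on localhost:8080 as in the rest of this post; the helper name core_status_url is made up for illustration and is not part of solr or of my script.

```shell
#!/bin/sh
# Hypothetical helper: build the CoreAdmin STATUS URL.
# Assumes solr is served by Tomcat at localhost:8080 (adjust SOLR_BASE otherwise).
SOLR_BASE="http://localhost:8080/solr"

core_status_url() {
    # With no argument, STATUS reports on every registered core;
    # with a core name, it reports on that core only.
    if [ -n "$1" ]; then
        echo "$SOLR_BASE/admin/cores?action=STATUS&core=$1"
    else
        echo "$SOLR_BASE/admin/cores?action=STATUS"
    fi
}
```

Running curl "$(core_status_url)" on the solr machine should then return an XML response from the core admin handler rather than an error page.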
Managing Cores
Before we can start creating multiple cores, we need to create config files, directories, set permissions, etc. To make the process easier, we'll first create a template config directory:
cp -av /etc/solr/conf /etc/solr/conftemplate
Next, we edit the solrconfig.xml file in the template directory, changing the dataDir option from:
<dataDir>/var/lib/solr/data</dataDir>
to
<dataDir>/var/lib/solr/data/CORENAME</dataDir>
NOTE
Since we are using drupal and its solr modules, I would recommend also copying the solrconfig.xml file they provide into the template directory (you could name it drupal.solrconfig.xml) and making the same change. In the future you would simply need to overwrite the default solrconfig.xml file with yours and you'd be good to go :)
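The CORENAME placeholder is there so that a simple substitution can fill in the real core name later. Here is a small sketch of that substitution, run against a throwaway file rather than the real template; the function name fill_corename and the core name "photos" are just examples.

```shell
#!/bin/sh
# Sketch: fill in the CORENAME placeholder for a hypothetical core named "photos".
# Works on a temporary file so the real template is never modified.
fill_corename() {
    # $1: core name, $2: path to a solrconfig.xml containing the placeholder
    sed "s/CORENAME/$1/" "$2"
}

tmp=$(mktemp)
echo '<dataDir>/var/lib/solr/data/CORENAME</dataDir>' > "$tmp"
fill_corename photos "$tmp"   # prints <dataDir>/var/lib/solr/data/photos</dataDir>
rm -f "$tmp"
```

Because the name is dropped straight into the sed expression, a core name containing a slash would break it, so it is probably best to stick to letters, digits, dashes and underscores.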
Now that we have our template directory ready, we need a way to do a few things as a start:
- Create a new core. This involves letting solr know there is a new core and creating a copy of the configuration (these per-core configs will be in /etc/solr/conf/
- Reload a core. This would reload the settings for a particular core so that tomcat (and your other search cores) do not need to restart
- Unload a core. This is to stop using a particular core. Note that this is *not* to remove the index, settings, etc for a particular core from the server.
- Remove a core. This would remove the core (index, settings, everything) from the server.
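To make the difference between these four operations concrete, here is a rough dry-run sketch that only prints which paths each one would create or delete on disk. The function paths_touched is hypothetical (it is not part of my script), and the paths assume the directory layout used in this post.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the paths each operation would create or delete.
# Nothing on disk is touched; the paths follow the layout used in this post.
paths_touched() {
    op=$1
    name=$2
    case "$op" in
        create)
            echo "create /var/lib/solr/data/$name"
            echo "create /etc/solr/conf/$name/conf"
            ;;
        remove)
            echo "delete /var/lib/solr/data/$name"
            echo "delete /etc/solr/conf/$name"
            ;;
        reload|unload)
            # Both only talk to solr; the index and config stay on disk.
            :
            ;;
        *)
            echo "unknown operation: $op" >&2
            return 1
            ;;
    esac
}
```

For example, paths_touched unload photos prints nothing, which is the point: unloading leaves the index and settings in place, and only remove deletes them.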
For this piece, I actually wrote a script which is available at http://pastebin.com/MWaqe7xi (I'm also pasting the current code below). The script does all of the above. What you want to do is download the script to a file on your server (in my case, I call it 'solr-admin.sh'; you can place it in your /usr/sbin folder or /home//bin). Now I issue the following commands:
- Create new core:
solr-admin create <CORENAME>
- Reload existing core settings:
solr-admin reload <CORENAME>
- Unload existing core:
solr-admin unload <CORENAME>
- Remove existing core:
solr-admin remove <CORENAME>
There are a number of other functions that can also be created (such as merging the indexes of multiple cores into one, renaming a core, etc) which are not yet handled by the script. Everyone is welcome to contribute and help flesh this out :D
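As a starting point for renaming, the CoreAdmin handler has a RENAME action that takes the current name in the core parameter and the new name in the other parameter. Below is a minimal sketch, assuming the same host and port as the rest of this post; rename_core_url is a hypothetical helper, not something the script provides. Keep in mind that RENAME only changes the name solr uses: the directories under /var/lib/solr/data and /etc/solr/conf would keep the old name, so the script's existence checks would need to be adjusted to match.

```shell
#!/bin/sh
# Hypothetical helper: build the CoreAdmin RENAME URL (not part of solr-admin.sh).
rename_core_url() {
    # $1: current core name, $2: new core name
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: rename_core_url OLD NEW" >&2
        return 1
    fi
    echo "http://localhost:8080/solr/admin/cores?action=RENAME&core=$1&other=$2"
}
```

The resulting URL can be passed to curl the same way the script does for its other actions.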
Contents of solr-admin.sh
#!/bin/sh
# This file handles creating / reloading / unloading / deleting solr cores

SOLR_URL="http://localhost:8080/solr/admin/cores"
DATA_DIR="/var/lib/solr/data"
CONF_DIR="/etc/solr/conf"
TEMPLATE_DIR="/etc/solr/conftemplate"

# Set $name from the first argument, or prompt for it
# arg 1: core name (may be empty)
# arg 2: verb to show in the prompt
get_core_name() {
    name=$1
    if [ -z "$name" ]; then
        printf "Name of core to %s: " "$2"
        read -r name
    fi
    # Refuse an empty name: otherwise remove would delete every core's data
    if [ -z "$name" ]; then
        echo "No core name given" >&2
        exit 1
    fi
}

# Create a new core
# arg 1: core name
create_solr_core() {
    get_core_name "$1" create
    if [ -d "$DATA_DIR/$name" ]; then
        echo "Cannot create $name: core with same name already exists"
        exit 1
    fi
    mkdir "$DATA_DIR/$name"
    chown tomcat6:tomcat6 "$DATA_DIR/$name"
    mkdir -p "$CONF_DIR/$name/conf"
    cp -a "$TEMPLATE_DIR"/* "$CONF_DIR/$name/conf/"
    # Point dataDir at this core's own data directory
    sed -i "s/CORENAME/$name/" "$CONF_DIR/$name/conf/solrconfig.xml"
    curl "$SOLR_URL?action=CREATE&name=$name&instanceDir=$CONF_DIR/$name"
    echo "Please read status from solr - by all accounts (pending a proper name), core $name was created"
}

# Reload existing core
# arg 1: core name
reload_solr_core() {
    get_core_name "$1" reload
    if [ ! -d "$DATA_DIR/$name" ]; then
        echo "Core $name does not exist"
        exit 1
    fi
    curl "$SOLR_URL?action=RELOAD&core=$name"
    echo "Core $name has been reloaded"
}

# Unload existing core (index and settings stay on disk)
# arg 1: core name
unload_solr_core() {
    get_core_name "$1" unload
    if [ -d "$DATA_DIR/$name" ]; then
        curl "$SOLR_URL?action=UNLOAD&core=$name"
        echo "Core $name has been unloaded"
    else
        echo "Core $name does not exist"
    fi
}

# Remove existing core (index, settings, everything)
# arg 1: core name
remove_solr_core() {
    get_core_name "$1" remove
    if [ -d "$DATA_DIR/$name" ]; then
        unload_solr_core "$name"
        rm -rf "$DATA_DIR/$name"
        rm -rf "$CONF_DIR/$name"
        echo "Deleted configuration settings for $name"
    else
        echo "Core $name does not exist"
    fi
}

case "$1" in
    create) create_solr_core "$2" ;;
    reload) reload_solr_core "$2" ;;
    unload) unload_solr_core "$2" ;;
    remove) remove_solr_core "$2" ;;
    *)
        echo "Usage: $0 {create|reload|unload|remove} [CORENAME]" >&2
        exit 1
        ;;
esac
exit 0