Website search with Apache Solr, at STC Summit
This week I’m attending STC Summit 2019, the annual conference of the Society for Technical Communication (STC). I’m blogging my notes from the sessions that I attend. Thanks and all credit go to the speakers. Any mistakes are my own.
Scott Prentice’s session was titled Website Search with Apache Solr. The presentation covered an open source search platform, Apache Solr, introducing its features and showing us how to install the platform.
Solr is a wrapper around the Lucene indexing and search technology. It has a REST API and some native client APIs.
While Apache Solr is a vast and complex system, it’s not too hard to get in and get started.
A quick bit about search in general
Why add a search to your website? Having your own search helps you keep visitors on your site. You can allow people to use Google Search, but having your own means you can curate the search to your own requirements. Having your own search also gives you insights into what people are searching for, and thus into your content.
Types of search:
- Remote search service through a web form or API
- A custom search platform, which is what Solr is.
You can set up Solr in a standalone mode, or you can use SolrCloud, which is a collection of search engines spread across multiple servers. Scott showed us how to set up the standalone search.
The process is:
- Download and extract the installation file
- Start the server
- Test the server
Scott walked us through the process in more detail, which involved creating an installation directory and a data directory, editing a config file, and moving some files around.
Then he started the server from the command line (solr start), and accessed the Solr admin page at a localhost address (http://localhost:8983/solr by default) in a web browser.
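As a rough illustration of the "test the server" step, here is a minimal Python sketch that asks a running Solr instance for its system info and checks that the response reports success. It was not part of Scott's talk. The base URL assumes Solr's default port on a local machine, and the function names are my own, so adjust them to suit your setup.

```python
import json
from urllib.request import urlopen

# Assumption: a local standalone install listening on Solr's default port.
SOLR_BASE = "http://localhost:8983/solr"


def system_info_url(base=SOLR_BASE):
    """Build the URL of Solr's system info endpoint, asking for JSON."""
    return base.rstrip("/") + "/admin/info/system?wt=json"


def looks_healthy(payload):
    """Return True if a parsed Solr response reports status 0 (success)."""
    return payload.get("responseHeader", {}).get("status") == 0


def check_server(base=SOLR_BASE, timeout=5):
    """Fetch the system info endpoint and report whether Solr answered OK.

    Raises an error from urllib if the server cannot be reached.
    """
    with urlopen(system_info_url(base), timeout=timeout) as response:
        return looks_healthy(json.load(response))
```

Calling check_server() after starting Solr should return True if the server is up. If nothing is listening on that address, urlopen raises an error instead.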
The next steps involved copying the default schema to create a collection (basically, an index), and adding some example docs as data for indexing. The default schema works, but it’s very broad since it’s designed to handle a wide variety of content types.
Scott walked us through the syntax of a Solr query. You’ll use the query syntax when constructing a search and also if you set up a panel of faceted search results for display as a navigation aid. The default response is in JSON format, but you can request XML or CSV instead.
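To give a feel for what a query looks like, here is a small Python sketch that builds the URL for Solr's select handler, with optional facet fields and a choice of response format. This is my own illustration rather than an example from the session. The collection name (mysite) and the field names (title, category) are hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(base, collection, query, facet_fields=(), wt="json", rows=10):
    """Build a Solr select URL.

    q is the query itself, rows limits the number of results, and wt
    picks the response format (for example json, xml, or csv).
    """
    params = [("q", query), ("rows", rows), ("wt", wt)]
    if facet_fields:
        # Turn faceting on and request counts for each listed field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return f"{base.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical collection and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "mysite", "title:solr", facet_fields=["category"]
)
print(url)
```

The query title:solr searches the title field for the term solr. Passing wt="xml" or wt="csv" instead changes the format of the response.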
Customising your search
After testing the search, you need to:
- Customise your schema to suit your content and your website’s needs. Your schema defines the fields for the index. Scott showed us how to create a very simple schema, and how to apply it to your Solr installation.
- Generate a JSON or XML feed from your content, based on the schema. There are various web crawlers available to generate the feed, such as Apache Nutch, Heritrix, GNU Wget, and more.
- Upload the feed to a Solr collection.
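The last two steps above can be sketched in Python as follows. This is a simplified illustration under my own assumptions, not the approach Scott demonstrated: the schema fields (id, title, url, content), the collection name, and the function names are all hypothetical. The sketch trims each page to the schema's fields, serialises the result as a JSON feed, and prepares a POST request to the collection's update handler.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical schema: the fields that the index is assumed to define.
SCHEMA_FIELDS = ("id", "title", "url", "content")


def to_feed(pages):
    """Convert page dictionaries into a JSON feed that matches the schema."""
    docs = []
    for page in pages:
        if "id" not in page:
            raise ValueError("every document needs a unique id")
        # Keep only the fields that the schema knows about.
        docs.append({name: page[name] for name in SCHEMA_FIELDS if name in page})
    return json.dumps(docs, indent=2)


def build_update_request(base, collection, feed_json):
    """Prepare a POST of the feed to the collection's update handler."""
    url = f"{base.rstrip('/')}/{collection}/update?commit=true"
    return Request(
        url,
        data=feed_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload(base, collection, feed_json):
    """Send the feed to Solr and return the HTTP status code."""
    with urlopen(build_update_request(base, collection, feed_json)) as response:
        return response.status
```

The commit=true parameter asks Solr to make the new documents searchable straight away, which is convenient for testing. For large feeds you may prefer to commit less often.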
Scott mentioned CORS (Cross-Origin Resource Sharing), which you’ll run into when a web page tries to read data from a server on a different origin. The server owner has to explicitly allow that content to be read, so you need to enable CORS on your Solr server by adding a config file. Scott recommends this blog post for help with setting up CORS.
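To make the CORS check a little more concrete: a browser lets a page read a cross-origin response only if the server sends an Access-Control-Allow-Origin header that matches the page's origin (or is a wildcard). The Python sketch below mimics that check on a set of response headers. It is my own simplified illustration, and the function name and example origin are hypothetical. It does not configure Solr itself.

```python
def allows_origin(response_headers, origin):
    """Return True if the response headers permit the given origin.

    Simplified check: looks only at Access-Control-Allow-Origin, which
    may be a wildcard (*) or a single origin.
    """
    for name, value in response_headers.items():
        # Header names are case-insensitive.
        if name.lower() == "access-control-allow-origin":
            return value.strip() in ("*", origin)
    return False


# Hypothetical origin, for illustration only.
print(allows_origin({"Access-Control-Allow-Origin": "*"}, "https://docs.example.com"))
```

If the header is missing, as in an unconfigured server, the function returns False and a browser would block the page from reading the search results.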
Scott also gave us some tips on securing and scaling your Solr server before taking it to production. You can also consider using SolrCloud.
Thank you, Scott, for a useful quick introduction to Apache Solr.