Website search with Apache Solr, at STC Summit

This week I’m attending STC Summit 2019, the annual conference of the Society for Technical Communication (STC). I’m blogging my notes from the sessions that I attend. Thanks and all credit go to the speakers. Any mistakes are my own.

Scott Prentice’s session was titled Website Search with Apache Solr. The presentation introduced Apache Solr, an open source search platform, covering its features and showing us how to install it.

Solr is a wrapper around the Lucene indexing and search technology. It has a REST API and some native client APIs.

While Apache Solr is a vast and complex system, it’s not too hard to get in and get started.

A quick bit about search in general

Why add a search to your website? Having your own search helps you keep visitors on your site. You can allow people to use Google Search, but having your own means you can curate the search to your own requirements. Having your own search also gives you insights into what people are searching for, and thus into your content.

Types of search:

  • A remote search service, accessed through a web form or API.
  • A static JavaScript search, which provides a precompiled static index accessed via JavaScript.
  • A custom search platform, which is what Solr is.

Installing Solr

You can set up Solr in a standalone mode, or you can use SolrCloud, which is a collection of search engines spread across multiple servers. Scott showed us how to set up the standalone search.

The process is:

  • Download and extract the installation file
  • Install
  • Start the server
  • Test the server

Scott walked us through the process in more detail, which involved creating an installation directory and a data directory, editing a config file, and moving some files around.

Then he started the server from the command line (solr start), and opened the Solr admin page at a localhost address in a web browser (by default, http://localhost:8983/solr).
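If you’d rather test the server from a script than from the browser, something like the following might do. This is my own minimal sketch, not something Scott showed. It assumes a local install on Solr’s default port 8983, and it uses Solr’s system-info endpoint as the thing to poke.

```python
import json
import urllib.error
import urllib.request

# Assumption: a standalone Solr on this machine, on the default port.
SOLR_BASE = "http://localhost:8983/solr"


def system_info_url(base: str = SOLR_BASE) -> str:
    """Build the URL of Solr's system-info endpoint, asking for JSON."""
    return f"{base.rstrip('/')}/admin/info/system?wt=json"


def solr_is_up(base: str = SOLR_BASE, timeout: float = 3.0) -> bool:
    """Return True if the server answers and reports status 0 (success)."""
    try:
        with urllib.request.urlopen(system_info_url(base), timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that is not JSON.
        return False
    return body.get("responseHeader", {}).get("status") == 0
```

Calling solr_is_up() after solr start should return True once the server has finished starting. If your server runs elsewhere, pass its base URL as the first argument.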

The next steps involved copying the default schema to create a collection (basically, an index), and adding some example docs as data for indexing. The default schema works, but it’s very broad since it’s designed to handle a wide variety of content types.

Scott walked us through the syntax of a Solr query. You’ll use the query syntax when constructing a search and also if you set up a panel of faceted search results for display as a navigation aid. The default response is in JSON format, but you can request XML or CSV instead.
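To give a feel for what a query looks like, here’s a rough sketch of my own in Python. It builds a URL for Solr’s select handler and reads facet counts out of a JSON response. The collection name (docs) and the field names (title, category) are made up for illustration, and I’m assuming a local server on the default port.

```python
from urllib.parse import urlencode


def build_select_url(base, collection, query, *, filters=(),
                     facet_fields=(), rows=10, wt="json"):
    """Build a URL for the /select handler of one collection.

    query        goes into the q parameter, for example "title:solr"
    filters      each entry becomes an fq (filter query) parameter
    facet_fields each entry becomes a facet.field parameter
    wt           the response format: "json", "xml" or "csv"
    """
    params = [("q", query), ("rows", rows), ("wt", wt)]
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return f"{base.rstrip('/')}/{collection}/select?{urlencode(params)}"


def facet_counts(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    flat = response.get("facet_counts", {}).get("facet_fields", {}).get(field, [])
    return dict(zip(flat[::2], flat[1::2]))


# Hypothetical collection and fields, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr", "docs", "title:solr",
    filters=["category:guide"], facet_fields=["category"], rows=5,
)
print(url)
```

The facet counts are what you’d use to build the navigation panel: each value becomes a link that adds a matching filter to the next query.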

Customising your search

After testing the search, you need to:

  • Customise your schema to suit your content and your website’s needs. Your schema defines the fields for the index. Scott showed us how to create a very simple schema, and how to apply it to your Solr installation.
  • Generate a JSON or XML feed from your content, based on the schema. There are various web crawlers available to generate the feed, such as Apache Nutch, Heritrix, GNU Wget, and more.
  • Upload the feed to a Solr collection.
  • Develop a search UI, typically in JavaScript. Scott showed us a simple UI that he’d developed using jQuery. Examples of search UI types include a search form with a list of results, highlighting of search terms, faceting, autocomplete, and so on.
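The schema, feed and upload steps can be sketched in Python along these lines. This is my own illustration under a few assumptions, not Scott’s method: I’m using Solr’s Schema API to add fields (Scott may well have edited the schema file by hand), and the collection name, field names and page records are all hypothetical. The code only builds the requests. Nothing is sent until you pass a request to urllib.request.urlopen.

```python
import json
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local install
COLLECTION = "docs"                       # hypothetical collection name


def add_field_command(name, field_type="text_general", stored=True):
    """Schema API command that adds one field to the collection's schema."""
    return {"add-field": {"name": name, "type": field_type, "stored": stored}}


def build_feed(pages):
    """Turn page records into a JSON array of Solr documents (as bytes).

    Each page is a dict with hypothetical keys "url", "title" and "text".
    """
    docs = [{"id": p["url"], "title": p["title"], "content": p["text"]}
            for p in pages]
    return json.dumps(docs, ensure_ascii=False).encode("utf-8")


def json_post(url, body):
    """Prepare (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def schema_request(command):
    """Request that applies one Schema API command."""
    return json_post(f"{SOLR_BASE}/{COLLECTION}/schema",
                     json.dumps(command).encode("utf-8"))


def update_request(feed):
    """Request that uploads the feed and commits it to the index."""
    return json_post(f"{SOLR_BASE}/{COLLECTION}/update?commit=true", feed)


# To run against a real server, send each request, for example:
#   urllib.request.urlopen(schema_request(add_field_command("title")))
#   urllib.request.urlopen(update_request(build_feed(pages)))
```

In practice the page records would come from your crawler or from your content source, rather than being typed in by hand.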

Scott mentioned CORS (Cross-Origin Resource Sharing), which you’ll run into when your search UI, running in a browser, tries to read data from a server on a different origin. The server owner has to allow such reads explicitly. So you need to enable CORS on your Solr server, by adding a config file. Scott recommends this blog post for help with setting up CORS.

Scott also gave us some tips on securing and scaling your Solr server before taking it to production. You can also consider using SolrCloud.

For a server, consider linode.com and websolr.com as affordable options.

In conclusion

Thank you, Scott, for a useful quick introduction to Apache Solr.

About Sarah Maddox

Technical writer, author and blogger in Sydney

Posted on 8 May 2019, in STC, technical writing.
