giger.dev


Spelling Suggestions in Lucene

Introduction

Users often mistype their queries, which leads to empty result lists. Lucene's SpellChecker can check a query against an index
and offer the user possible alternative spellings.

Gradle Config

We will be using the following libraries:

implementation group: 'org.apache.lucene', name: 'lucene-core', version: '8.2.0'
implementation group: 'org.apache.lucene', name: 'lucene-suggest', version: '8.2.0'

SpellChecker

The solution is built on the SpellChecker functionality provided by lucene-suggest.

Like any other Lucene index, the SpellChecker needs its own Directory. Optionally, a distance metric (a StringDistance)
can be passed in; it is used to calculate how close the query is to a word in the spellchecker index.

SpellChecker spellChecker = new SpellChecker(FSDirectory.open(Paths.get("./index/spellcheck/")), new LevenshteinDistance());

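The available StringDistance implementations (for example LevenshteinDistance, LuceneLevenshteinDistance,
JaroWinklerDistance and NGramDistance) are listed in the Lucene documentation.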
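To get a feel for what the distance metric contributes, here is a small, dependency-free sketch of a normalized Levenshtein similarity. Lucene's StringDistance implementations return a float between 0 and 1, where 1 means the two strings are identical; dividing the edit distance by the length of the longer word is one common way to get there and, as far as we know, close to what LevenshteinDistance does. The class NormalizedLevenshtein is hypothetical and not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: a normalized Levenshtein similarity
// that mimics the 0..1 range of Lucene's StringDistance implementations.
final class NormalizedLevenshtein {

    /** Classic edit distance (insertions, deletions, substitutions) using two DP rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting all i chars of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // the finished row becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Maps the edit distance to [0, 1]; 1 means the strings are identical. */
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0f; // two empty strings are identical
        }
        return 1.0f - (float) editDistance(a, b) / maxLen;
    }

    public static void main(String[] args) {
        System.out.println(similarity("lucene", "lucene")); // identical words
        System.out.println(similarity("lucene", "lucine")); // 1 edit over 6 chars, roughly 0.83
    }
}
```

SpellChecker drops candidates whose score falls below its accuracy threshold, which can be changed with setAccuracy(float) if the suggestions are too strict or too loose.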

Building the SpellChecker Index

Afterwards, the index can be filled from dictionaries. For our use case it makes sense to fill the spellchecker
from an existing Lucene index. The LuceneDictionary class helps here: it creates a Dictionary from the terms of a single
field of an existing index.

Directory indexDirectory = [...]; // Search index we are searching
IndexReader indexReader = DirectoryReader.open(indexDirectory);
Analyzer analyzer = [...]; // Same analyzer used for the search index
IndexWriterConfig config = new IndexWriterConfig(analyzer);
// The third argument (fullMerge = true) fully merges the spellchecker index after adding the words
spellChecker.indexDictionary(new LuceneDictionary(indexReader, "name"), config, true);

spellChecker.indexDictionary can be called multiple times, for example with other fields or other indexes, to add more
words to the spellchecker dictionary.
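Conceptually, filling the index means storing each dictionary word together with its character n-grams, so that candidates for a misspelled query can later be found by n-gram overlap and then ranked with the distance metric. The toy class below (ToyGramIndex, hypothetical, not Lucene code) models only that idea with fixed trigrams in a HashMap; the real SpellChecker stores the grams as fields of a Lucene index and differs in the details, such as gram sizes and scoring.

```java
import java.util.*;

// Hypothetical toy model, not Lucene code: words are indexed by character
// trigrams, and lookup collects words sharing at least one trigram with the query.
final class ToyGramIndex {
    private final Map<String, Set<String>> wordsByGram = new HashMap<>();

    /** Character n-grams of a word, e.g. "lucene" -> luc, uce, cen, ene for n = 3. */
    static List<String> grams(String word, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    /** Adds every word of a "dictionary"; calling it again just adds more words. */
    void indexDictionary(Iterable<String> words) {
        for (String word : words) {
            for (String gram : grams(word, 3)) {
                wordsByGram.computeIfAbsent(gram, g -> new TreeSet<>()).add(word);
            }
        }
    }

    /** Candidate words sharing at least one trigram with the query, sorted alphabetically. */
    SortedSet<String> candidates(String query) {
        SortedSet<String> result = new TreeSet<>();
        for (String gram : grams(query, 3)) {
            result.addAll(wordsByGram.getOrDefault(gram, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyGramIndex index = new ToyGramIndex();
        index.indexDictionary(List.of("lucene", "lucent", "search"));
        index.indexDictionary(List.of("luster")); // a second dictionary is simply added
        System.out.println(index.candidates("lucine")); // prints [lucene, lucent]
    }
}
```

The second indexDictionary call mirrors why spellChecker.indexDictionary can be called repeatedly: each call only adds words to the existing gram lookup.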

Finally: Checking the Spelling of a Query

Simple version: just get up to five alternative spellings from the spellchecker index.

String queryString = [...]; // Query entered by the user
String[] suggestions = spellChecker.suggestSimilar(queryString, 5);

This returns up to five suggestions, even if the query is already spelled correctly. An improved version only looks for
alternatives if the query word does not occur in the given field of the search index:

Directory indexDirectory = [...]; // Search index
IndexReader indexReader = DirectoryReader.open(indexDirectory);
String field = "name"; // Field of indexDirectory that should be used to check if the word is spelled correctly
String[] alternativeSpelling = spellChecker.suggestSimilar(queryString, 5, indexReader, field, SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX);
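With SUGGEST_WHEN_NOT_IN_INDEX, the "is it in the index" check compares the query string against the terms indexed in field, so the query should be normalized the same way the field was analyzed; otherwise "Lucene" may be treated as missing even though "lucene" is indexed. The hypothetical helper below (DidYouMean, not part of Lucene) lowercases the query, assuming a lowercasing analyzer such as StandardAnalyzer, and turns the returned array into an optional "Did you mean ...?" hint. Since the returned array may be empty or may just contain the query word itself, both cases are treated as "no correction needed".

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Lucene: turns the String[] returned by
// SpellChecker.suggestSimilar(...) into an optional "Did you mean ...?" hint.
final class DidYouMean {

    /**
     * Trims and lowercases the query so it matches terms produced by a lowercasing
     * analyzer (an assumption: adapt this to the analyzer actually used for the field).
     */
    static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    /**
     * Returns the first suggestion that differs from the query, or empty if there is
     * none (no suggestions at all, or only the query word itself).
     */
    static Optional<String> pick(String normalizedQuery, String[] suggestions) {
        for (String suggestion : suggestions) {
            if (!suggestion.equals(normalizedQuery)) {
                return Optional.of(suggestion); // suggestions come sorted best match first
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String query = normalize("Lucine ");
        // In real code this array would come from spellChecker.suggestSimilar(query, 5, ...).
        String[] suggestions = {"lucene", "lucent"};
        pick(query, suggestions).ifPresent(s -> System.out.println("Did you mean: " + s + "?"));
    }
}
```

Keeping the normalization in one place makes it easy to reuse the exact same step for the search query itself, so the spelling check and the search agree on what a term looks like.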

— Nov 16, 2019