Exact searches with Lucene (and searches with special characters)

There are of course thousands of ways to solve this issue. I decided to go old-school 🙂

So what’s the problem? Well, sometimes you really want to search for an exact value. Perhaps it contains strange characters (as in formulas or algorithms), or maybe it’s too short to be indexed correctly, and I’m sure there are more reasons.

What can you do? One way to address this is to construct a unique string, store it in the index and search that column instead of the column containing the clear text value.

An example: let’s say you have values like A(b)-c in a column, and you really only want to find exact matches. Parentheses and hyphens make this hard, and if you replace them you risk getting other matches as well. What you can do is pass the value to a base64 encoder, which gives you a unique string built only from letters and digits, plus the occasional +, / or =. In this case you’ll get: QShiKS1j

This has to be done when you create your index. So now you have two columns, side by side: one with the actual value and one with the encoded value.

So, when a user enters A(b)-c in the search field, all you have to do is base64-encode it before using it in your search query, and of course search in the encoded field rather than the clear text one.
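The round trip described above can be sketched without committing to a particular Lucene version. In this minimal stand-in a plain map plays the role of the index, and the class and method names (ExactLookupSketch, add, findExact) are made up for illustration. The point is only that the same encoder runs at index time and at search time, and that the encoded token is compared as a whole:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Stand-in for a real index: a map from encoded token to clear-text value.
// All names are hypothetical; a Lucene document would carry two fields instead.
final class ExactLookupSketch {
    private final Map<String, String> encodedToValue = new HashMap<>();

    private static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // "Index time": store the clear-text value under its encoded token.
    void add(String value) {
        encodedToValue.put(encode(value), value);
    }

    // "Search time": encode the user input the same way, then look the token up as-is.
    String findExact(String userInput) {
        return encodedToValue.get(encode(userInput)); // null when nothing matches exactly
    }

    public static void main(String[] args) {
        ExactLookupSketch index = new ExactLookupSketch();
        index.add("A(b)-c");
        index.add("A(b)");
        System.out.println(index.findExact("A(b)-c")); // A(b)-c
        System.out.println(index.findExact("A b c"));  // null
    }
}
```

Stripping the parentheses and hyphen would turn the input into something like "A b c", which no longer identifies the original value. The encoded token avoids that.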

The code to encode and decode strings:
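A minimal sketch of such a helper in Java, using the standard java.util.Base64 class. The class name ExactMatchCodec and the choice of UTF-8 are assumptions made for this example, not something the approach depends on:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; the class and method names are illustrative only.
final class ExactMatchCodec {
    private ExactMatchCodec() {}

    // Encode the clear-text value as standard Base64 over its UTF-8 bytes.
    static String encode(String clearText) {
        return Base64.getEncoder().encodeToString(clearText.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of encode(): turn a stored token back into the clear-text value.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String token = encode("A(b)-c");
        System.out.println(token);         // QShiKS1j
        System.out.println(decode(token)); // A(b)-c
    }
}
```

Two things to keep in mind. Base64 is case-sensitive, so the encoded field should be stored untokenized and without lowercasing (in the Lucene Java API that typically means a StringField, or a keyword-style analyzer). The output can also contain +, / and =; for instance "ab~" encodes to YWJ+. If the token is passed through a query parser rather than used directly as a term, those characters probably need escaping.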

The code to create an index

And here’s one way to search the index


