Documentation

Introduction

Letarette is a modular full-text search system that indexes text documents and makes them available for fast retrieval using a familiar query language.

The distributed nature of Letarette makes it scalable to a large number of processes, but it runs just as well on a single developer machine.

Letarette provides core search features such as stemming in 25 languages, synonyms and stop word handling, all aimed at getting your search solution up and running fast with a minimum of configuration.

Letarette documents

Letarette is designed as a complement to a primary document storage, the single source of truth, where documents can be of any shape. The primary storage is often a SQL database, but can be any kind of storage.

Documents in Letarette have a simple structure of two indexable fields, title and text.

They also have a unique string ID, an updated timestamp field and an alive field to keep track of deleted documents.

Documents are stored in named spaces in the index, within which document IDs must be unique. IDs must never be reused, and documents are marked as "not alive" instead of being deleted, to enable robust distributed handling.

The exact format of the ID is up to the client. It could be a row ID from the primary storage, a UUID or something else.
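As an illustration, the document model described above can be sketched in a few lines of Python. The class, method and field names below (Document, Space, add, delete) are assumptions made for this sketch; they do not reflect Letarette's internal schema or wire format.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class Document:
    # Field names are illustrative assumptions, not Letarette's own.
    id: str              # unique within its space, never reused
    title: str           # indexable field
    text: str            # indexable field
    updated: datetime    # time of the last change
    alive: bool = True   # False marks a deleted document


class Space:
    """A named space in which document IDs must be unique."""

    def __init__(self, name):
        self.name = name
        self._docs = {}

    def add(self, doc):
        # An ID stays reserved even after its document was marked "not alive".
        if doc.id in self._docs:
            raise ValueError(f"ID {doc.id!r} already used in space {self.name!r}")
        self._docs[doc.id] = doc

    def delete(self, doc_id, when):
        # Keep the record and flag it instead of removing it.
        self._docs[doc_id] = replace(self._docs[doc_id], alive=False, updated=when)

    def get(self, doc_id):
        return self._docs[doc_id]
```

Keeping the "not alive" record around means that a component that missed the deletion can still learn about it later, and that the same ID can never silently come to mean a different document.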

Service components

Letarette uses an active index, meaning that it keeps itself updated. Each index periodically requests information from the primary document storage.

A Document Manager handles these requests by presenting documents from the primary storage.
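The idea of an index that keeps itself updated can be sketched as a polling loop over the primary storage. This is only a conceptual illustration and not Letarette's actual protocol: the recipes table, its columns and the functions changed_documents and poll_once are all hypothetical, and the primary storage is assumed to be a SQLite database with an integer updated column.

```python
import sqlite3

# Hypothetical primary-storage table, invented for this sketch.
SCHEMA = """
CREATE TABLE recipes (
    id      TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    body    TEXT NOT NULL,
    updated INTEGER NOT NULL,
    alive   INTEGER NOT NULL DEFAULT 1
);
"""


def changed_documents(conn, cursor, limit=100):
    """Return up to `limit` rows changed after `cursor`, oldest first.

    `cursor` is an (updated, id) pair; the ID breaks ties between rows
    that share a timestamp, so no row is skipped at a page boundary.
    """
    after_ts, after_id = cursor
    return conn.execute(
        "SELECT id, title, body, updated, alive FROM recipes "
        "WHERE updated > ? OR (updated = ? AND id > ?) "
        "ORDER BY updated, id LIMIT ?",
        (after_ts, after_ts, after_id, limit),
    ).fetchall()


def poll_once(conn, index, cursor, limit=100):
    """Run one update round: copy changed rows into `index`, return the new cursor."""
    for doc_id, title, body, updated, alive in changed_documents(conn, cursor, limit):
        index[doc_id] = {"title": title, "text": body,
                         "updated": updated, "alive": bool(alive)}
        cursor = (updated, doc_id)
    return cursor
```

Calling poll_once repeatedly with the returned cursor, starting from (0, ""), walks through all changes in order. Deletions arrive as ordinary updates with alive set to false.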

For SQL-based primary storages, there is a SQL Document Manager that makes it easy to connect to Letarette. If more control is needed, one of the client libraries can be used to connect to any kind of primary storage.

A Search Agent sends queries to the Letarette index and collects the responses. These responses contain the IDs of the matching documents along with short snippets from them. To provide richer responses, clients can dress up the search results with data from the primary storage.

As part of dressing up the search result, clients can apply further filtering based on the most recent state in the primary storage. This keeps search results accurate even when data or permissions change rapidly.
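A minimal sketch of dressing up and filtering, assuming the primary storage is a SQLite database and that search hits arrive as (ID, snippet) pairs. The recipes table, its owner, public and deleted columns and the dress_up function are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical primary-storage table, invented for this sketch.
SCHEMA = """
CREATE TABLE recipes (
    id      TEXT PRIMARY KEY,
    title   TEXT NOT NULL,
    owner   TEXT NOT NULL,
    public  INTEGER NOT NULL DEFAULT 1,
    deleted INTEGER NOT NULL DEFAULT 0
);
"""


def dress_up(conn, hits, user):
    """Attach primary-storage data to hits and drop hits `user` may not see."""
    results = []
    for doc_id, snippet in hits:
        row = conn.execute(
            "SELECT title, owner FROM recipes "
            "WHERE id = ? AND deleted = 0 AND (public = 1 OR owner = ?)",
            (doc_id, user),
        ).fetchone()
        if row is None:
            # Missing, deleted or hidden in the primary storage: skip the hit.
            continue
        title, owner = row
        results.append({"id": doc_id, "snippet": snippet,
                        "title": title, "owner": owner})
    return results
```

Because the check runs against the primary storage at query time, a document that was deleted or made private after it was indexed never reaches the user, even if the index has not caught up yet.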

The component that makes Letarette a distributed system is the high-performance messaging system NATS. NATS provides redundancy, load distribution and pub/sub features to Letarette. A Letarette installation requires at least one NATS server to connect the different components.


Getting Started

Starting with Docker

The fastest way to get a tiny Letarette system up and running is by using docker-compose.

Download the docker-compose.yml file used to demonstrate the SQL Document Manager and start it up:

$ docker-compose up -d
Pulling lrsql (letarette/sql:0.1.1)...

Now, the index can be queried using the lrcli command line tool in the letarette image:

$ docker-compose exec letarette ./lrcli search -i docs
search>carrots -celery
Query executed in 0.000564746 seconds with status "found in index"
Returning 2 of 2 total hits, capped: false

[135] …and a carrot; boil until soft. When done, take out the…
[303] …pot, 1 carrot, 1 onion, thyme, bay leaf, salt and pepper, 2 cloves…

Starting without Docker

Install and launch a NATS Server instance:

$ nats-server

Download Letarette and a test index DB file. Then start the service (Linux/macOS example):

$ LETARETTE_INDEX_DISABLE=true ./letarette

This assumes that an index DB file called letarette.db is in the current directory. The LETARETTE_INDEX_DISABLE=true variable is just to suppress warning messages since we are running without a document manager.

Now, the index can be queried using the lrcli command line tool:

$ ./lrcli search -i docs
search>cat -black
...

Search syntax

The Letarette search syntax is simple and similar to other search interfaces.

Semi-formal definition:

<phrase> ::= string | quotedstring
<query> ::= [-] <phrase> [*]
<query> ::= <query> <query>

The - prefix denotes exclusions and the * suffix denotes wildcard searches. All parts of the query must match for a document to be returned as a hit. There are no parentheses and there is no way to write "OR"-type expressions.

Examples:

animal -dog -cat

horse* -"horse head"
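To make the grammar concrete, here is a small parser sketch that splits a query into its parts. It is an illustration of the semi-formal definition only, not Letarette's own parser; the Term type and parse_query function are hypothetical, and details the definition leaves open (such as hyphens inside words) are handled in an arbitrary way.

```python
import re
from typing import List, NamedTuple


class Term(NamedTuple):
    phrase: str      # the string or the contents of the quoted string
    exclude: bool    # True when prefixed with -
    wildcard: bool   # True when followed by *


# [-] <phrase> [*], where <phrase> is a quoted string or a run of
# characters without whitespace, quotes or asterisks.
_TERM = re.compile(r'\s*(-)?(?:"([^"]*)"|([^\s"*]+))(\*)?')


def parse_query(query: str) -> List[Term]:
    """Split a query into terms; all terms must match for a hit."""
    terms = []
    pos = 0
    end = len(query.rstrip())
    while pos < end:
        match = _TERM.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse query at position {pos}")
        minus, quoted, plain, star = match.groups()
        phrase = quoted if quoted is not None else plain
        terms.append(Term(phrase, minus is not None, star is not None))
        pos = match.end()
    return terms
```

With this sketch, parse_query("animal -dog -cat") yields one required term and two exclusions, and a query such as horse* -"horse head" yields a wildcard term for horse and an excluded phrase "horse head". An unterminated quote raises ValueError.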


Tech stuff

The Letarette main service is a single binary (letarette) that loads its configuration from the environment. There is a command line tool for interacting with the index (lrcli), and tools for monitoring (lrmon) and load testing (lrload).

Letarette is written in Go and a bit of C. It relies heavily on SQLite FTS5, NATS and the Snowball stemmer.

Next steps