Search for Static Sites with SQLite FTS5

How I built a fast, multilingual search system for Hugo sites using SQLite FTS5, a PHP API, and vanilla JavaScript—no Elasticsearch, no Algolia, no client-side index.

When I migrated my sites from WordPress to Hugo, search was the one thing that gave me pause. WordPress has search plugins. Static sites have… nothing. You generate HTML files and dump them on a web server. There’s no runtime, no database, no query engine.

The common solutions didn’t excite me. Client-side search libraries like Lunr.js or Fuse.js ship your entire search index as a JSON blob to the browser. For a blog with ten posts, fine. For kommandozeile.org with well over a hundred articles or help.pdf-pro.net with hundreds of pages across eleven languages? You’d be forcing users to download megabytes of index data before they can type a single character. Hosted solutions like Algolia work great but cost money and add a dependency I didn’t need.

So I built my own. A custom indexer creates a SQLite FTS5 database at build time. A tiny PHP script serves as the search API. Vanilla JavaScript handles the frontend with debounced input and keyboard navigation. The whole thing is maybe 100 lines of PHP and 100 lines of JavaScript, and I use the same pattern across all three sites I migrated.

The Architecture

The search system has three components:

  1. Indexer — a CLI tool that reads Hugo content and creates a SQLite FTS5 database
  2. API — a PHP script that queries the database and returns JSON
  3. Frontend — JavaScript that calls the API as you type and renders results

The indexer runs at build time, alongside Hugo. The API and frontend run at request time on the web server. The only runtime dependency is PHP with SQLite support—which is enabled by default on virtually every PHP installation. No Elasticsearch, no Redis, no background processes—just a single binary at build time and a PHP script plus a database file at runtime.

SQLite FTS5: The Engine

FTS5 is SQLite’s full-text search extension. If you’ve never heard of it, here’s the short version: it creates an inverted index inside a SQLite database. The same data structure that powers Elasticsearch, Lucene, and every search engine you’ve ever used—but in a single file with zero configuration.

An FTS5 virtual table looks like a regular SQLite table, but it’s backed by a full-text index:

CREATE VIRTUAL TABLE posts USING fts5(
    slug,
    title,
    excerpt,
    categories,
    content
);

Every column is full-text indexed by default. You can query it with the MATCH operator:

SELECT slug, title, excerpt, categories
FROM posts
WHERE posts MATCH 'sqlite'
ORDER BY rank
LIMIT 10;

The rank column is implicit—FTS5 generates it using a BM25-like scoring algorithm. Results that match in the title naturally rank higher than matches buried deep in the content body. I didn’t have to configure any of this. It just works.

FTS5 also supports prefix matching out of the box. Append * to your query and “migr” matches “migration,” “migrating,” and “migrate”:

WHERE posts MATCH 'migr*'

And phrase matching:

WHERE posts MATCH '"static site generator"'

And boolean operators:

WHERE posts MATCH 'hugo AND search NOT wordpress'

I only use prefix matching in my implementation—it covers the instant-search-as-you-type use case perfectly. But the full query syntax is there if you need it.

The Indexer

I wrote a CLI tool called hugo-search-indexer that reads a Hugo content directory, parses the YAML front matter and Markdown body of each file, and inserts the fields into an FTS5 database.

hugo-search-indexer \
    --content content/posts/ \
    --output website/data/search.sqlite \
    --fields "title,excerpt,categories,content"

The --fields flag controls what gets indexed. For my blog, I index the title, excerpt, categories, and full post content. For kommandozeile.org, articles don’t have excerpts, so I index title, content, and categories instead:

hugo-search-indexer \
    --content content/articles/ \
    --output website/data/search.sqlite \
    --fields "title,content,categories"

The indexer runs as part of the build script, between the Hugo build step and the rsync step:

# Step 1: Build Hugo site
hugo --minify --source hugo/ --destination "$TEMP_DIR"

# Step 2: Index content for search
hugo-search-indexer \
    --content content/posts/ \
    --output "$TEMP_DIR/data/search.sqlite"

# Step 3: Sync to output directory
rsync -a --checksum --delete "$TEMP_DIR/" "website/"

Because the indexer reads the raw Markdown source—not the generated HTML—it doesn’t need Hugo to finish first. But I run it after Hugo anyway so the database lands in the temp directory and gets synced alongside everything else.

The resulting databases are tiny—a few hundred kilobytes for my blog, under a megabyte even for sites with over a hundred articles.

The Search API

This is the entire backend. One PHP file, about 30 lines of logic. I’ll walk through it piece by piece, then show the complete code.

Input Sanitization

$query = isset($_GET['q']) ? trim($_GET['q']) : '';

if (strlen($query) < 2) {
    echo '[]';
    exit;
}

Minimum two characters. Anything shorter returns an empty array immediately. This prevents single-character queries from hammering the index and returning hundreds of useless results.

Query Cleaning

$searchQuery = preg_replace('/[^\p{L}\p{N}\s\-]/u', '', $query);
$searchQuery = trim($searchQuery) . '*';

This is the most important line in the entire API. The regex strips everything except Unicode letters (\p{L}), numbers (\p{N}), whitespace (\s), and hyphens (\-). The /u flag enables Unicode mode.

Why does this matter? FTS5 has metacharacters—*, ", AND, OR, NOT, NEAR, parentheses—that change query semantics. If a user types "hello world", the quotes would trigger a phrase search. If they type NOT password, it becomes a negation query. Stripping these characters turns any user input into a safe, literal search.

But here’s the subtle part: the \p{L} character class preserves letters from every Unicode script. German umlauts (ü, ö, ä), French accents (é, è, ê), Japanese hiragana, Chinese hanzi, Korean hangul, Cyrillic characters—they all survive the sanitization. A naive [^a-zA-Z0-9] filter would destroy any non-Latin query, which would be a disaster for help.pdf-pro.net where users search in eleven different languages.

The * appended at the end enables prefix matching. User types “conf” and gets results for “configuration,” “configure,” “conflict.”

The Query

$db = new SQLite3($dbPath, SQLITE3_OPEN_READONLY);

$stmt = $db->prepare('
    SELECT slug, title, excerpt, categories
    FROM posts
    WHERE posts MATCH :query
    ORDER BY rank
    LIMIT 10
');

$stmt->bindValue(':query', $searchQuery, SQLITE3_TEXT);
$results = $stmt->execute();

Read-only mode. SQLITE3_OPEN_READONLY means the connection physically cannot write to the database. Even if someone somehow injected a DROP TABLE through the sanitization—which the regex prevents—SQLite would reject it at the connection level.

Prepared statement. The :query parameter is bound separately from the SQL, preventing injection. Belt and suspenders.

ORDER BY rank. FTS5’s built-in ranking. Higher rank means better match. Title matches outweigh content matches. Exact matches outweigh partial matches. No tuning required.

LIMIT 10. Search results beyond the first ten are rarely useful in an instant-search dropdown. I use 15 for kommandozeile.org where articles are shorter and users scan more results.

The Complete API

Here’s the full file. It lives at /api/search.php relative to the site root:

<?php
header('Content-Type: application/json');
header('Cache-Control: no-cache');

$query = isset($_GET['q']) ? trim($_GET['q']) : '';

if (strlen($query) < 2) {
    echo '[]';
    exit;
}

$dbPath = __DIR__ . '/../data/search.sqlite';
if (!file_exists($dbPath)) {
    echo '[]';
    exit;
}

try {
    $db = new SQLite3($dbPath, SQLITE3_OPEN_READONLY);

    $searchQuery = preg_replace('/[^\p{L}\p{N}\s\-]/u', '', $query);
    $searchQuery = trim($searchQuery) . '*';

    $stmt = $db->prepare('
        SELECT slug, title, excerpt, categories
        FROM posts
        WHERE posts MATCH :query
        ORDER BY rank
        LIMIT 10
    ');

    $stmt->bindValue(':query', $searchQuery, SQLITE3_TEXT);
    $results = $stmt->execute();

    $posts = [];
    while ($row = $results->fetchArray(SQLITE3_ASSOC)) {
        $posts[] = [
            'slug'       => $row['slug'],
            'title'      => $row['title'],
            'excerpt'    => $row['excerpt'] ?? '',
            'categories' => $row['categories'] ?? ''
        ];
    }

    $db->close();
    echo json_encode($posts);

} catch (Exception $e) {
    error_log("Search API error: " . $e->getMessage());
    echo '[]';
}

If anything fails—missing database, corrupt file, invalid query—the API returns an empty JSON array and logs the error server-side. The user sees “No results found.” No stack traces, no 500 errors, no information leakage.

The Frontend JavaScript

The frontend needs to do three things: call the API as the user types, render results, and handle keyboard navigation. Here’s the full implementation:

(function() {
    'use strict';

    const DEBOUNCE_MS = 300;
    const MIN_QUERY_LENGTH = 2;

    function initSearch(inputId, resultsId) {
        const searchInput = document.getElementById(inputId);
        const searchResults = document.getElementById(resultsId);

        if (!searchInput || !searchResults) return;

        let debounceTimer = null;
        let selectedIndex = -1;

        async function performSearch(query) {
            if (query.length < MIN_QUERY_LENGTH) {
                hideResults();
                return;
            }

            try {
                const response = await fetch(
                    `/api/search.php?q=${encodeURIComponent(query)}`
                );
                const results = await response.json();
                renderResults(results);
            } catch (error) {
                console.error('Search error:', error);
                hideResults();
            }
        }

        function renderResults(results) {
            if (!results || results.length === 0) {
                searchResults.innerHTML =
                    '<div class="search-no-results">No results found</div>';
                showResults();
                return;
            }

            let html = '';
            for (const result of results) {
                const excerpt = result.excerpt
                    ? `<div class="result-excerpt">${escapeHtml(result.excerpt)}</div>`
                    : '';
                html += `
                    <a href="/${result.slug}/" class="search-result-item">
                        <span class="result-title">${escapeHtml(result.title)}</span>
                        ${excerpt}
                    </a>
                `;
            }

            searchResults.innerHTML = html;
            showResults();
        }

        function showResults() {
            searchResults.classList.add('active');
        }

        function hideResults() {
            searchResults.classList.remove('active');
            selectedIndex = -1;
        }

        function escapeHtml(text) {
            const div = document.createElement('div');
            div.textContent = text;
            return div.innerHTML;
        }

        // Debounced input handler
        searchInput.addEventListener('input', function(e) {
            const query = e.target.value.trim();
            clearTimeout(debounceTimer);
            selectedIndex = -1;

            if (query.length < MIN_QUERY_LENGTH) {
                hideResults();
                return;
            }

            debounceTimer = setTimeout(() => performSearch(query), DEBOUNCE_MS);
        });

        // Keyboard navigation
        searchInput.addEventListener('keydown', function(e) {
            const items = searchResults.querySelectorAll('.search-result-item');
            const resultsVisible = searchResults.classList.contains('active');

            if (e.key === 'ArrowDown' && resultsVisible && items.length > 0) {
                e.preventDefault();
                selectedIndex = Math.min(selectedIndex + 1, items.length - 1);
                items.forEach(item => item.classList.remove('selected'));
                items[selectedIndex].classList.add('selected');
                items[selectedIndex].scrollIntoView({ block: 'nearest' });

            } else if (e.key === 'ArrowUp' && resultsVisible && items.length > 0) {
                e.preventDefault();
                selectedIndex = Math.max(selectedIndex - 1, -1);
                items.forEach(item => item.classList.remove('selected'));
                if (selectedIndex >= 0) {
                    items[selectedIndex].classList.add('selected');
                    items[selectedIndex].scrollIntoView({ block: 'nearest' });
                }

            } else if (e.key === 'Enter' && selectedIndex >= 0) {
                e.preventDefault();
                window.location.href = items[selectedIndex].href;

            } else if (e.key === 'Escape') {
                hideResults();
                searchInput.blur();
            }
        });

        // Dismiss on outside click
        document.addEventListener('click', function(e) {
            if (!searchInput.contains(e.target) && !searchResults.contains(e.target)) {
                hideResults();
            }
        });

        // Re-search on focus with existing query
        searchInput.addEventListener('focus', function() {
            if (searchInput.value.trim().length >= MIN_QUERY_LENGTH) {
                performSearch(searchInput.value.trim());
            }
        });
    }

    // Initialize for desktop and mobile inputs
    initSearch('search-input-desktop', 'search-results-desktop');
    initSearch('search-input-mobile', 'search-results-mobile');
})();

Let me walk through the key decisions.

Debouncing

debounceTimer = setTimeout(() => performSearch(query), DEBOUNCE_MS);

Without debouncing, every keystroke fires an API request. Type “migration” and you’d send eight requests: “m”, “mi”, “mig”, “migr”… Most would be discarded before the response even arrives. With a 300ms debounce, the request only fires after the user stops typing. Fast typists generate one request instead of eight.

300 milliseconds feels instant to the user but dramatically reduces server load. I experimented with 100ms (too aggressive—still sends multiple requests) and 500ms (feels laggy). 300ms is the sweet spot.

HTML Escaping

function escapeHtml(text) {
    const div = document.createElement('div');
    div.textContent = text;
    return div.innerHTML;
}

This is the XSS prevention layer. Every piece of text from the API—titles, excerpts—passes through this function before being inserted into the DOM. The trick is using the browser’s own DOM API: setting textContent automatically escapes HTML entities, and reading innerHTML gives you the escaped string.

It’s safer than manual string replacement (str.replace(/</g, '&lt;')) because the browser handles edge cases you’d never think of—character encoding, Unicode normalization, zero-width characters. Let the browser do what the browser does best.

Keyboard Navigation

Arrow keys move through results, Enter opens the selected one, Escape dismisses the dropdown. This is table-stakes UX for a search component, but it’s surprising how many implementations skip it. The key detail is e.preventDefault() on arrow keys—without it, the cursor jumps to the start or end of the input while you’re trying to navigate results.

The scrollIntoView({ block: 'nearest' }) call keeps the selected result visible when the dropdown is scrollable. Without it, users arrow-key their way off the bottom of the visible area and lose track of what’s selected.

Dual Search Inputs

initSearch('search-input-desktop', 'search-results-desktop');
initSearch('search-input-mobile', 'search-results-mobile');

My blog has two search inputs: one in the desktop navbar and one in a mobile-specific collapsible panel. Same API, same logic, separate DOM elements. The initSearch function is a closure—each instance gets its own debounce timer, its own selection state, its own results container. They don’t interfere with each other.

On kommandozeile.org, there’s a single search input in a sub-header bar. Same JS, just one initSearch call instead of two.

Focus Restore

searchInput.addEventListener('focus', function() {
    if (searchInput.value.trim().length >= MIN_QUERY_LENGTH) {
        performSearch(searchInput.value.trim());
    }
});

Small detail, big impact on usability. You search for something, click a result to read it, hit the back button, click back into the search box—and your results reappear without retyping. The search state doesn’t survive page navigation (it’s not cached), but the input value does (browser form restoration), so we just re-execute the search on focus.

For help.pdf-pro.net with eleven languages, the single-database approach doesn’t work. A German user searching for “Datei” doesn’t want English results about “File.” Each language needs its own index.

Per-Language Databases

The build script loops through all languages and creates a separate database for each:

LANGUAGES=("en" "de" "es" "fr" "it" "ja" "ko" "nl" "pt" "ru" "zh")

for lang in "${LANGUAGES[@]}"; do
    if [ "$lang" = "en" ]; then
        content_dir="content/articles"
    else
        content_dir="content/articles.$lang"
    fi

    hugo-search-indexer \
        --content "$content_dir" \
        --output "$TEMP_DIR/data/search-$lang.sqlite"
done

This creates eleven databases: search-en.sqlite, search-de.sqlite, search-es.sqlite, and so on. Each one indexes only the content in that language.

Language-Aware API

The PHP API takes a lang parameter and loads the matching database:

$lang = isset($_GET['lang'])
    ? preg_replace('/[^a-z]/', '', strtolower($_GET['lang']))
    : 'en';

$validLangs = ['en', 'de', 'es', 'fr', 'it', 'ja', 'ko', 'nl', 'pt', 'ru', 'zh'];
if (!in_array($lang, $validLangs)) {
    $lang = 'en';
}

$dbPath = __DIR__ . "/../data/search-$lang.sqlite";

The language parameter is sanitized aggressively—preg_replace('/[^a-z]/', '', ...) strips everything that isn’t a lowercase letter, then validated against a whitelist. This prevents path traversal attacks. Someone sending lang=../../etc/passwd gets cleaned to langetchpasswd, which doesn’t match the whitelist, and falls back to English.

The rest of the API is identical to the single-language version. Same FTS5 query, same result format, same error handling.

Language Detection on the Frontend

The JavaScript reads the current language from the <html lang> attribute and passes it to the API:

function getCurrentLanguage() {
    return document.documentElement.lang || 'en';
}

async function performSearch(query) {
    const lang = getCurrentLanguage();
    const response = await fetch(
        `/api/search.php?q=${encodeURIComponent(query)}&lang=${lang}`
    );
    // ...
}

Result URLs also need the language prefix. English results link to /article-slug/, German results to /de/article-slug/:

function displayResults(results) {
    const lang = getCurrentLanguage();
    const langPrefix = lang === 'en' ? '' : `/${lang}`;

    results.forEach(result => {
        html += `<a href="${langPrefix}/${result.slug}/">...`;
    });
}

The “no results found” text is also localized—stored in a data-no-results attribute on the results container and read by the JavaScript. Hugo’s i18n system fills in the translated string at build time:

<div id="search-results" data-no-results="{{ i18n "search_no_results" }}"></div>

No runtime translation library. No JavaScript i18n framework. Just a data attribute set at build time.

Performance

Some numbers, because “it’s fast” is vague and unhelpful.

The SQLite FTS5 query itself takes less than a millisecond. The PHP overhead—opening the database, executing the query, encoding JSON—adds a couple of milliseconds. Network latency dominates at that point.

From the user’s perspective, the debounce adds 300ms of intentional delay, then results appear within a single frame. It feels instant—which is what matters. WordPress search, for comparison, required a full page reload and a round trip through PHP and MySQL just to render results.

The database files themselves are small enough to fit in the OS page cache after the first query. Subsequent searches don’t even hit disk.

The Pattern

I’ve been calling this a “search system,” but it’s really a pattern. The same architecture—indexer, API, frontend—works for any static site that runs on a PHP-capable server. I use it across three sites: this blog, kommandozeile.org, and help.pdf-pro.net. The PHP API, the JavaScript, and the build-time indexer are the same everywhere. What changes between sites is minor—which fields get indexed, how many results to show, whether there’s one search input or two. The multi-language version adds a lang parameter and per-language databases. That’s the only architectural difference.

No Elasticsearch cluster. No Algolia subscription. No client-side index download. Just SQLite doing what SQLite does best: being reliable and simple.

The next and final post in this series covers the broader challenge of building a multi-language Hugo site—the content organization, localized URLs, taxonomy translations, and language switcher that make help.pdf-pro.net work across eleven languages.