From redaxo-search-it
Generates and manages Search It indexes in REDAXO: reindex articles, DB columns, files/PDFs; console commands like search_it:reindex/clearCache; cronjobs; plaintext/PDF conversion. Use for stale results or custom indexing.
npx claudepluginhub friendsofredaxo/claude-marketplace --plugin redaxo-search-it
Search It stores all searchable content in database tables (`rex_tmp_search_it_index`, `rex_tmp_search_it_keywords`). Content must be indexed before it appears in search results.
By default, PDF text extraction relies on `pdftotext`, which must be installed on the server.

To generate the index from the backend, go to Search It > Generate index and click the "Generate index" button.
```shell
php redaxo/bin/console search_it:reindex     # full reindex
php redaxo/bin/console search_it:clearCache  # clear search cache only
```
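For deploy scripts or data imports it can help to chain both console commands in one step. The following is a minimal sketch and not part of Search It: the function name `search_it_refresh` and the `PHP_BIN` and `REDAXO_CONSOLE` variables are assumptions made for illustration, and the defaults presume the script runs from the project root. Clearing the cache after a full reindex may be redundant; it is included only to show how the commands can be combined.

```shell
#!/bin/sh
# Hypothetical helper, not part of Search It.
# PHP_BIN and REDAXO_CONSOLE are assumed overrides; the defaults match
# the console commands when run from the project root.
search_it_refresh() {
    php_bin="${PHP_BIN:-php}"
    console="${REDAXO_CONSOLE:-redaxo/bin/console}"

    # Rebuild the full index; stop here if it fails.
    "$php_bin" "$console" search_it:reindex || return 1

    # Then drop cached search results so they are not served stale.
    "$php_bin" "$console" search_it:clearCache || return 1
}
```

Calling `search_it_refresh` at the end of an import returns a non-zero status as soon as one of the two commands fails, so a broken reindex is not masked by a successful cache clear.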
```php
use FriendsOfRedaxo\SearchIt\SearchIt;

$search = new SearchIt();
$search->generateIndex(); // full reindex
```
```php
use FriendsOfRedaxo\SearchIt\SearchIt;

$search = new SearchIt();

// Single article (optionally for a specific language)
$search->indexArticle(42);                          // all languages
$search->indexArticle(42, rex_clang::getCurrentId()); // current language only

// Database column
$search->indexColumn(rex::getTable('my_table'), 'description');

// File
$search->indexFile('document.pdf');
```
```php
use FriendsOfRedaxo\SearchIt\SearchIt;

$search = new SearchIt();
$search->deleteIndex();    // drop entire index (requires regeneration)
$search->deleteCache();    // clear cached search results
$search->deleteKeywords(); // clear similarity keyword index
```
Search It hooks into REDAXO extension points via EventHandler and automatically re-indexes articles when they are saved, published or deleted. This happens for:
- `ART_ADDED`, `ART_UPDATED`, `ART_DELETED`
- `ART_STATUS` (online/offline toggle)
- `SLICE_ADDED`, `SLICE_UPDATED`, `SLICE_DELETED`
- `MEDIA_ADDED`, `MEDIA_UPDATED`

No manual action is needed for article content changes. But if you change backend settings (e.g. add a new DB column source), you must trigger a full reindex.
In the backend, the Cronjob addon offers two Search It cronjob types. They are useful for sites with frequently changing external data sources (e.g. DB columns filled by imports).
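If reindexing is instead triggered from the system scheduler via the console command, overlapping runs are worth guarding against, since a full reindex can take a while. The sketch below is one possible approach, not something Search It provides: `run_locked` is a hypothetical helper name, and the lock directory path is whatever you choose.

```shell
#!/bin/sh
# Hypothetical helper: run a command unless a previous run is still active.
# mkdir either creates the directory or fails, so it works as a simple lock.
run_locked() {
    lock_dir="$1"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}
```

A scheduled job could then call `run_locked /tmp/search_it.lock php redaxo/bin/console search_it:reindex`. Note that a run killed mid-way leaves the lock directory behind, which has to be removed by hand.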
Articles are fetched via HTTP (or socket), rendered, then converted to plaintext. The PlaintextConverter strips HTML, applies CSS selector exclusions, runs regex replacements and optionally parses Textile.
Configure in backend: Settings > Plaintext settings:
- CSS selectors to exclude, e.g. `nav, .no-search, footer` (content in these elements is not indexed)

To adjust the text in code, register a callback on the `SEARCH_IT_PLAINTEXT` extension point:

```php
rex_extension::register('SEARCH_IT_PLAINTEXT', function (rex_extension_point $ep) {
    $text = $ep->getSubject();
    // Remove specific content before indexing
    $text = preg_replace('/<div class="no-index">.*?<\/div>/s', '', $text);
    return $text;
});
```
Return an array to control further processing:
```php
return ['text' => $cleanedText, 'process' => true];
// process = true: standard plaintext conversion still runs after your hook
// process = false: use your text as-is, skip built-in conversion
```
Requires pdftotext (from poppler-utils) on the server:
```shell
apt-get install poppler-utils  # Debian/Ubuntu
```
Search It uses PdfConverter to extract text from PDF files in the media pool. Enable file indexing and add pdf to the allowed extensions in backend settings.
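Before enabling PDF indexing, it can be worth confirming on the server that `pdftotext` is reachable and can read a sample file. The following is a minimal sketch under stated assumptions: `check_pdftotext` is a hypothetical helper name and `PDFTOTEXT_BIN` an assumed override for non-standard install paths; neither is part of Search It.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of Search It.
# PDFTOTEXT_BIN is an assumed override for non-standard install paths.
check_pdftotext() {
    pdf="$1"
    bin="${PDFTOTEXT_BIN:-pdftotext}"

    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "$bin not found; install poppler-utils" >&2
        return 1
    fi
    if [ ! -r "$pdf" ]; then
        echo "cannot read $pdf" >&2
        return 2
    fi
    # A text-file argument of "-" sends the extracted text to stdout.
    "$bin" "$pdf" -
}
```

Running `check_pdftotext path/to/sample.pdf` prints the extracted text when everything is in place. An empty result for a scanned PDF usually means the file contains images only, which `pdftotext` cannot read.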
| Table | Purpose |
|---|---|
| `rex_tmp_search_it_index` | Main index (plaintext, metadata per article/column/file) |
| `rex_tmp_search_it_keywords` | Keywords for similarity search |
| `rex_tmp_search_it_cache` | Cached search results |
| `rex_tmp_search_it_cacheindex_ids` | Links cache entries to index entries |
| `rex_tmp_search_it_stats_searchterms` | Search term statistics |
Tables use the tmp_ prefix because they are regenerable – the index can always be rebuilt from source content.
- Calling `generateIndex()` on every page load: this is an expensive operation. Only call it from the console, a cronjob, or the backend.
- Not clearing the cache after `indexArticle()`: if the search cache contains stale results, call `deleteCache()` as well.
- Missing `poppler-utils` on the server: PDF files are silently skipped if `pdftotext` is not available.