Install: `npx claudepluginhub frank-luongt/faos-skills-marketplace --plugin faos-architect`
Slash command: `/faos-architect:enterprise-search`
<!-- AUTO-GENERATED by export-plugins.py — DO NOT EDIT -->
Business-oriented framework for designing cross-tool knowledge retrieval, architecting enterprise search systems, and tuning relevance models. Focused on strategy and requirements — for technical implementation, see hybrid-search-implementation and similarity-search-patterns in the ai-ml domain.
| Pattern | How It Works | Pros | Cons | Best For |
|---|---|---|---|---|
| Federated | Query multiple sources in real-time, merge results | No data duplication, real-time freshness | Slower, limited cross-source ranking | Small orgs (<500 people), few sources |
| Centralized | Ingest all content into single search index | Best relevance, fastest queries, unified ranking | Data duplication, sync complexity, stale content risk | Large orgs, search-critical workflows |
| Hybrid | Centralized index for primary sources + federated for long-tail | Balanced cost vs. quality | Most complex to maintain | Mid-to-large orgs with diverse source landscape |
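
As a rough illustration of the federated pattern, the sketch below queries several sources and merges their ranked lists. Because raw scores from different systems are not comparable, it uses reciprocal rank fusion, which is a technique chosen here for illustration and not something the table prescribes. The `SourceSearch` adapters and the constant `k = 60` are assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical adapter type: takes a query, returns document IDs in ranked order.
SourceSearch = Callable[[str], List[str]]


def federated_search(query: str, sources: Dict[str, SourceSearch], k: int = 60) -> List[str]:
    """Query every source, then merge the ranked lists with reciprocal rank fusion."""
    fused = defaultdict(float)
    for search in sources.values():
        for rank, doc_id in enumerate(search(query), start=1):
            # Each source contributes 1 / (k + rank); higher ranks contribute more.
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document ID.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


if __name__ == "__main__":
    demo_sources = {
        "wiki": lambda q: ["a", "b"],
        "chat": lambda q: ["b", "c"],
    }
    print(federated_search("deploy", demo_sources))
```

A production version would typically call the sources concurrently with timeouts, since the slowest source otherwise sets the latency of the whole query.
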
| Source | Connector Type | Sync Method | Typical Latency |
|---|---|---|---|
| Confluence / Wiki | REST API | Incremental (webhook + poll) | Near real-time |
| Slack / Teams | Events API | Streaming | Real-time |
| Google Drive | Drive API + Changes API | Incremental | 5-15 min |
| SharePoint | Graph API | Delta query | 5-15 min |
| GitHub | Webhooks + REST API | Event-driven | Near real-time |
| Jira / Linear | REST API + Webhooks | Incremental | Near real-time |
| Email | Graph API / Gmail API | Incremental | 15-30 min |
| Database | CDC (Change Data Capture) | Streaming | Near real-time |
Critical requirement: Search results must respect source-level permissions.
| Approach | How It Works | Trade-off |
|---|---|---|
| Early binding | Filter at index time (only index what user can access) | Secure but requires per-user indices or ACL tagging |
| Late binding | Filter at query time (check permissions on each result) | Simpler indexing but slower queries at scale |
| Hybrid | Group-based ACL at index + user-level check at query | Best balance for most orgs |
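
The hybrid approach can be sketched as a two-stage filter: a cheap group check against ACL tags stored with each document, followed by a per-user check for the survivors. The `Hit` type and the `can_read` callback (standing in for a call to the source system's permission API) are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class Hit:
    doc_id: str
    access_groups: Set[str]  # group ACL tags written at index time


def filter_hits(
    hits: Iterable[Hit],
    user_groups: Set[str],
    can_read: Callable[[str], bool],
) -> List[Hit]:
    """Keep hits that pass both the group-level and the user-level check."""
    visible = []
    for hit in hits:
        # Stage 1: group ACL overlap (in practice pushed down into the index query).
        if not (hit.access_groups & user_groups):
            continue
        # Stage 2: user-level check at query time (assumed to ask the source system).
        if can_read(hit.doc_id):
            visible.append(hit)
    return visible
```

Running the group filter first keeps the number of expensive user-level checks small, which is the main reason this combination tends to scale better than pure late binding.
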
Every indexed document should carry these metadata fields:
| Field | Type | Purpose | Example |
|---|---|---|---|
| `title` | string | Primary display and search field | "Q4 Revenue Report" |
| `source` | enum | Origin system | confluence, slack, drive, github |
| `content_type` | enum | Document classification | document, conversation, code, ticket |
| `team` | string | Owning team or department | "Engineering", "Sales" |
| `created_at` | datetime | For freshness scoring | 2026-01-15T10:30:00Z |
| `updated_at` | datetime | For freshness and deduplication | 2026-02-28T14:00:00Z |
| `author` | string | For personalization and credibility | "jane.doe@company.com" |
| `access_groups` | list[string] | For permission filtering | ["engineering", "all-staff"] |
| `tags` | list[string] | For faceted navigation | ["architecture", "adr", "database"] |
| `status` | enum | Content lifecycle | draft, published, archived |
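
One way to make the schema above concrete is a typed record. The sketch below mirrors the table's fields and example enum values; the class names are illustrative, and the enum members include only the values shown in the table, so a real deployment would likely need more.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List


class Source(str, Enum):
    CONFLUENCE = "confluence"
    SLACK = "slack"
    DRIVE = "drive"
    GITHUB = "github"


class ContentType(str, Enum):
    DOCUMENT = "document"
    CONVERSATION = "conversation"
    CODE = "code"
    TICKET = "ticket"


class Status(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


@dataclass
class IndexedDocument:
    title: str
    source: Source
    content_type: ContentType
    team: str
    created_at: datetime
    updated_at: datetime
    author: str
    status: Status
    access_groups: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
```
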
| Rule | Rationale |
|---|---|
| Use controlled vocabulary (not free-text tags) | Prevents tag proliferation and inconsistency |
| Max 5 tags per document | Forces specificity over over-tagging |
| Tags use kebab-case | Consistency with URLs and search queries |
| Review tag taxonomy quarterly | Remove unused tags, merge synonyms |
| Auto-tag where possible | Use classification models to suggest tags on creation |
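
The first three rules are mechanical enough to enforce at write time. The following is a minimal validator sketch; the function name, error strings, and the way the controlled vocabulary is passed in are assumptions for illustration.

```python
import re
from typing import List, Set

# Lowercase alphanumeric words joined by single hyphens, e.g. "load-balancer".
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_TAGS = 5


def validate_tags(tags: List[str], vocabulary: Set[str]) -> List[str]:
    """Return a list of rule violations; an empty list means the tags are valid."""
    errors = []
    if len(tags) > MAX_TAGS:
        errors.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for tag in tags:
        if not KEBAB_CASE.match(tag):
            errors.append(f"not kebab-case: {tag}")
        elif tag not in vocabulary:
            errors.append(f"not in controlled vocabulary: {tag}")
    return errors
```
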
| Content Type | Freshness Target | Stale Threshold | Action When Stale |
|---|---|---|---|
| Documentation | Updated quarterly | >6 months | Flag for review |
| Meeting notes | Permanent | N/A | Reduce ranking weight over time |
| Code / PRs | Always current (live sync) | N/A | N/A |
| Tickets / Issues | Live sync | N/A | Archive closed items after 12 months |
| Policies / Runbooks | Updated semi-annually | >12 months | Alert content owner |
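
The stale thresholds above can be checked with a small lookup. In this sketch the content type keys and action names are hypothetical, and the month thresholds are approximated as 183 and 365 days, which is an assumption. Content types with no stale threshold (N/A in the table) return `None`.

```python
from datetime import datetime, timedelta
from typing import Optional

# (stale threshold, action) per content type; day counts approximate the table's months.
STALE_RULES = {
    "documentation": (timedelta(days=183), "flag_for_review"),
    "policy_or_runbook": (timedelta(days=365), "alert_content_owner"),
}


def stale_action(content_type: str, updated_at: datetime, now: datetime) -> Optional[str]:
    """Return the action to take if the document is stale, otherwise None."""
    rule = STALE_RULES.get(content_type)
    if rule is None:
        return None  # no stale threshold defined for this content type
    threshold, action = rule
    return action if now - updated_at > threshold else None
```
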
| Factor | Weight | Description |
|---|---|---|
| Text relevance (BM25) | 40% | Keyword match quality — title, body, tags |
| Freshness | 20% | More recent content ranked higher (decay function) |
| Popularity | 15% | View count, link count, citation count |
| Personalization | 15% | User's team, recent searches, frequently accessed sources |
| Source authority | 10% | Official docs > Slack messages > personal notes |
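
The weights above combine into a single score as a weighted sum. The sketch below assumes every signal has already been normalized to the range 0 to 1, and it uses an exponential decay with a 90-day half-life for freshness. Both the normalization and the half-life are assumptions; the table only states that a decay function is used.

```python
WEIGHTS = {
    "text_relevance": 0.40,
    "freshness": 0.20,
    "popularity": 0.15,
    "personalization": 0.15,
    "source_authority": 0.10,
}


def freshness_score(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(signals: dict) -> float:
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    signals = {
        "text_relevance": 0.8,
        "freshness": freshness_score(age_days=30),
        "popularity": 0.4,
        "personalization": 0.5,
        "source_authority": 1.0,
    }
    print(round(combined_score(signals), 3))
```
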
| Field | Boost Factor | Rationale |
|---|---|---|
| Title | 3.0x | Titles are the strongest relevance signal |
| Tags | 2.0x | Curated metadata is high-signal |
| Headings (H1-H3) | 1.5x | Section headers indicate topic boundaries |
| Body text | 1.0x | Baseline — full content match |
| Comments | 0.5x | Noisy, often tangential |
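
If the search platform is Elasticsearch (one of several options this document lists), the boosts above map onto the `field^boost` syntax of a `multi_match` query. The sketch below only builds the request body as a Python dict. The index field names (`title`, `tags`, `headings`, `body`, `comments`) are assumptions about the mapping.

```python
from typing import Dict

# Boost factors from the table; keys are hypothetical index field names.
FIELD_BOOSTS: Dict[str, float] = {
    "title": 3.0,
    "tags": 2.0,
    "headings": 1.5,
    "body": 1.0,
    "comments": 0.5,
}


def build_query(text: str, boosts: Dict[str, float] = FIELD_BOOSTS) -> dict:
    """Build an Elasticsearch-style multi_match body with per-field boosts."""
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


if __name__ == "__main__":
    print(build_query("deploy to staging"))
```
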
| Technique | Purpose | Example |
|---|---|---|
| Synonym expansion | Match equivalent terms | "deploy" → "deploy, release, ship" |
| Spell correction | Handle typos | "kuberntes" → "kubernetes" |
| Intent classification | Route to specialized search | "how do I deploy" → tutorial filter |
| Entity recognition | Boost specific entities | "John's PR for auth" → person + code filter |
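
Synonym expansion, the first technique in the table, can be sketched as a per-term lookup. The synonym map below contains only the table's own example; in practice it would come from a curated list, and the whitespace tokenizer is a simplifying assumption.

```python
from typing import Dict, List

# Only the example from the table; a real map would be curated per organization.
SYNONYMS: Dict[str, List[str]] = {"deploy": ["release", "ship"]}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[List[str]]:
    """Return one group per query term: the term followed by its synonyms.

    Terms within a group are alternatives (OR); groups are combined (AND).
    """
    return [[term] + synonyms.get(term, []) for term in query.lower().split()]


if __name__ == "__main__":
    print(expand_query("how do I deploy"))
    # [['how'], ['do'], ['i'], ['deploy', 'release', 'ship']]
```
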
| Metric | Formula | Target | How to Measure |
|---|---|---|---|
| MRR (Mean Reciprocal Rank) | Average of 1/rank of first relevant result | >0.6 | Relevance judgments on sample queries |
| NDCG@10 | Normalized discounted cumulative gain at position 10 | >0.7 | Graded relevance judgments |
| Precision@5 | % of top 5 results that are relevant | >60% | Binary relevance judgments |
| Zero-Result Rate | % of queries returning no results | <5% | Log analysis |
| Click-Through Rate | % of searches that result in a click | >40% | Click tracking |
| Query Reformulation Rate | % of searches followed by a refined query | <20% | Session analysis |
| Time to Result | p50 and p95 query latency | p50 <200ms, p95 <1s | Infrastructure monitoring |
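
The three judgment-based metrics can be computed directly from rater output. The sketch below makes two assumptions: NDCG uses linear gain (the grade itself, not `2^grade - 1`), and the ideal ordering is derived from the judged results only.

```python
import math
from typing import List


def reciprocal_rank(relevant: List[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: List[List[bool]]) -> float:
    """Average reciprocal rank across queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def precision_at_k(relevant: List[bool], k: int = 5) -> float:
    """Fraction of the top k results that are relevant."""
    return sum(relevant[:k]) / k


def dcg(grades: List[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(grades[:k], start=1))


def ndcg_at_k(grades: List[float], k: int = 10) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0
```
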
1. Sample 100 queries weekly from search logs
2. Have 2+ raters judge relevance of top 10 results (0-3 scale)
3. Calculate MRR, NDCG@10, Precision@5
4. Identify failure patterns (categories of bad results)
5. Adjust relevance model (boosting, synonyms, freshness weights)
6. A/B test changes against baseline
7. Repeat monthly
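
Steps 1 and 2 of the loop above can be sketched as two helpers: one draws the query sample, the other merges grades from multiple raters. Sampling from distinct queries (not weighted by frequency) and averaging the rater grades are both assumptions; a team might prefer frequency-weighted sampling or a formal agreement measure.

```python
import random
from statistics import mean
from typing import List, Optional


def sample_queries(log: List[str], n: int = 100, seed: Optional[int] = None) -> List[str]:
    """Draw up to n distinct queries from a search log."""
    unique = sorted(set(log))  # sorted so a fixed seed gives a reproducible sample
    rng = random.Random(seed)
    return rng.sample(unique, min(n, len(unique)))


def average_grades(ratings: List[List[int]]) -> List[float]:
    """Average the 0-3 grades position by position across raters.

    ratings holds one list per rater, each with one grade per result.
    """
    return [mean(grades) for grades in zip(*ratings)]
```
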
| Pattern | Purpose | Implementation Notes |
|---|---|---|
| Autocomplete | Reduce typing, guide to known content | Suggest from titles, tags, and popular queries |
| Faceted navigation | Filter by source, type, team, date | Show counts per facet; update dynamically |
| Snippets / Highlights | Show matching content in context | Highlight query terms in 2-3 sentence excerpts |
| Related queries | Help users refine or explore | "People also searched for..." based on co-occurrence |
| Source badges | Indicate content origin | Confluence icon, Slack icon, etc. |
| Freshness indicator | Show content age | "Updated 2 days ago" vs. "Updated 2 years ago" |
| "Did you mean?" | Handle typos gracefully | Only suggest when confidence >80% |
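
The "Did you mean?" rule can be sketched with the standard library. Here the similarity ratio from `difflib.SequenceMatcher` stands in for confidence, which is an assumption; a production system would more likely use a language model or query-log statistics. The vocabulary is assumed to come from indexed titles and tags.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def did_you_mean(term: str, vocabulary: Iterable[str], min_confidence: float = 0.8) -> Optional[str]:
    """Suggest the closest vocabulary term, but only above the confidence threshold."""
    best, best_score = None, 0.0
    for candidate in vocabulary:
        score = SequenceMatcher(None, term, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    # No suggestion if nothing matched, the term is already known, or confidence is too low.
    if best is None or best == term or best_score <= min_confidence:
        return None
    return best


if __name__ == "__main__":
    print(did_you_mean("kuberntes", ["kubernetes", "terraform"]))
```
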
# Enterprise Search Requirements — [Project Name]
## Current State
- **Content sources:** [list with estimated volumes]
- **Current search tools:** [what people use today]
- **Top pain points:** [from user interviews]
## Architecture Decision
- **Pattern:** [Federated / Centralized / Hybrid]
- **Rationale:** [why this pattern]
- **Search platform:** [Elasticsearch, Typesense, Algolia, Vespa, etc.]
## Scope (Phase 1)
- **Sources to index:** [list with priority]
- **Content types:** [documents, conversations, code, tickets]
- **Users:** [target audience and access model]
## Relevance Model
- **Scoring factors:** [weights per factor]
- **Field boosting:** [title, tags, headings, body]
- **Freshness decay:** [function and parameters]
## Quality Targets
| Metric | Baseline | Target |
|--------|----------|--------|
| MRR | [current] | [goal] |
| Zero-result rate | [current] | <5% |
| p95 latency | [current] | <1s |
## Roadmap
- Phase 1: [Core sources, basic search] — [timeline]
- Phase 2: [Additional sources, relevance tuning] — [timeline]
- Phase 3: [Personalization, AI-powered features] — [timeline]
Related skills: hybrid-search-implementation (ai-ml, technical implementation) and similarity-search-patterns (ai-ml, vector search).