From glean-pack
Enterprise architecture: Source Systems → Connectors (Cloud Run/Lambda, event-driven or scheduled) → Glean Indexing API → Glean Search Index → Client API (Search + Chat) → Your Apps (Slack bot, portal, internal tools). Trigger: "glean reference architecture", "reference-architecture".
npx claudepluginhub flight505/skill-forge --plugin glean-pack
Enterprise search integration architecture for connecting internal knowledge systems to Glean's indexing and search platform. Designed for organizations needing unified search across Confluence, Google Drive, Notion, Slack, Jira, and custom internal tools. Key design drivers: connector reliability for continuous indexing, permission synchronization to enforce source-system ACLs, incremental vs bulk indexing tradeoffs, and low-latency search aggregation across heterogeneous document types.
Source Systems       ──→ Connector Framework ──→ Queue (SQS) ──→ Glean Indexing API
(Confluence, Drive,      (Cloud Run)                  ↓          /indexing/documents
 Notion, Slack, Jira)         ↓                  Permission Sync /indexing/permissions
                         Schedule (cron) ──→ Bulk Reindexer      /indexing/datasources
                                                                         ↓
                                                                 Glean Search Index ──→ Client API ──→ Your Apps
                                                                                        /search        (Slack bot, portal)
                                                                                        /chat          (internal tools)
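// Helper methods (transformToGleanFormat, syncPermissions, fetchAllDocuments, chunk) are omitted here.
class ConnectorService {
  constructor(private glean: GleanIndexingClient, private cache: CacheLayer) {}

  // Index one document, then push its ACL so Glean enforces source-system permissions.
  async indexDocument(doc: SourceDocument): Promise<void> {
    const gleanDoc = this.transformToGleanFormat(doc);
    await this.glean.indexDocument(doc.datasource, gleanDoc);
    await this.syncPermissions(doc.id, doc.acl);
  }

  // Reindex a datasource in batches; pass `since` to limit the run to recently changed documents.
  async bulkReindex(datasource: string, since?: string): Promise<IndexReport> {
    const docs = await this.fetchAllDocuments(datasource, since);
    const batches = this.chunk(docs, 100); // Glean recommends batches of 100
    const failures: string[] = [];
    let indexed = 0;
    for (const batch of batches) {
      try {
        await this.glean.bulkIndex(datasource, batch);
        indexed += batch.length;
      } catch {
        // Record the failed document IDs and continue with the next batch.
        failures.push(...batch.map((d) => d.id));
      }
    }
    return { datasource, totalIndexed: indexed, failures, timestamp: new Date().toISOString() };
  }
}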
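The service above calls a `chunk` helper that is not shown. A minimal sketch of what such a helper might look like follows; the name and signature are assumptions taken from the call site, not from any Glean SDK.

```typescript
// Hypothetical stand-in for the `chunk` helper used by bulkReindex.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError(`batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// 250 documents at a batch size of 100 yield batches of 100, 100, and 50.
const ids = Array.from({ length: 250 }, (_, i) => `doc-${i}`);
const batchSizes = chunk(ids, 100).map((batch) => batch.length);
console.log(batchSizes); // [ 100, 100, 50 ]
```

The last batch is simply shorter than the rest, so the caller does not need special handling for datasources whose size is not a multiple of the batch size.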
const CACHE_CONFIG = {
  searchResults:  { ttl: 30,   prefix: 'search' },  // 30 s: freshness is critical for search
  permissions:    { ttl: 300,  prefix: 'perm' },    // 5 min: ACL changes are infrequent
  datasources:    { ttl: 3600, prefix: 'ds' },      // 1 hr: datasource config rarely changes
  connectorState: { ttl: 60,   prefix: 'conn' },    // 1 min: sync cursor freshness
  documentMeta:   { ttl: 120,  prefix: 'docmeta' }, // 2 min: title/author for search previews
};
// Webhook-driven invalidation: source system change events flush document cache immediately
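To make the TTL and invalidation behaviour concrete, here is a small in-memory sketch. `TtlCache` and its methods are hypothetical names; the `CacheLayer` referenced earlier is not defined in this document and would more likely wrap a shared store such as Redis. The clock is injected only so that expiry can be demonstrated without waiting.

```typescript
type CacheEntry<V> = { value: V; expiresAt: number };

// Hypothetical in-memory cache keyed as `${prefix}:${key}`, mirroring the prefixes in CACHE_CONFIG.
class TtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();

  // `now` returns milliseconds; it defaults to the wall clock.
  constructor(private now: () => number = () => Date.now()) {}

  set(prefix: string, key: string, value: V, ttlSeconds: number): void {
    this.entries.set(`${prefix}:${key}`, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }

  get(prefix: string, key: string): V | undefined {
    const fullKey = `${prefix}:${key}`;
    const entry = this.entries.get(fullKey);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(fullKey); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  // Intended to be called from a webhook handler when the source system reports a change.
  invalidate(prefix: string, key: string): boolean {
    return this.entries.delete(`${prefix}:${key}`);
  }
}

let clock = 0;
const cache = new TtlCache<string>(() => clock);
cache.set('docmeta', 'confluence:123', 'Onboarding Guide', 120);
const fresh = cache.get('docmeta', 'confluence:123');   // still within the 120 s TTL
clock = 121000;
const expired = cache.get('docmeta', 'confluence:123'); // undefined once the TTL has passed

cache.set('docmeta', 'confluence:456', 'Runbook', 120);
const flushed = cache.invalidate('docmeta', 'confluence:456'); // true: entry removed before its TTL
```

Explicit invalidation matters most for the longer TTLs: without it, a permission change could stay cached for up to five minutes under the configuration above.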
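import Bull from 'bull';

class IndexingPipeline {
  private queue = new Bull('glean-indexing', { redis: process.env.REDIS_URL });

  // Dependencies are injected so the handlers below can reach the Glean client and the connector.
  constructor(private glean: GleanIndexingClient, private connector: ConnectorService) {}

  // Enqueue every source change; failed jobs are retried up to 5 times with exponential backoff.
  async onSourceChange(event: SourceChangeEvent): Promise<void> {
    await this.queue.add(event.type, event, { attempts: 5, backoff: { type: 'exponential', delay: 3000 } });
  }

  // Deletions are propagated directly; any other change refetches and reindexes the document.
  async processDocumentChange(event: DocumentChangeEvent): Promise<void> {
    if (event.action === 'deleted') {
      await this.glean.deleteDocument(event.datasource, event.docId);
    } else {
      await this.connector.indexDocument(await this.fetchDoc(event.datasource, event.docId));
    }
  }

  async processPermissionChange(event: PermissionChangeEvent): Promise<void> {
    await this.glean.syncPermissions(event.datasource, event.docId, event.newAcl);
  }
}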
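The pipeline references `SourceChangeEvent`, `DocumentChangeEvent`, and `PermissionChangeEvent` without defining them, and does not show how a queued event reaches the right handler. The sketch below is one possible shape, assuming a `type` discriminator with the values `'document'` and `'permission'`; those values, the `action` options, and the `planCall` function are all hypothetical.

```typescript
// Hypothetical event shapes; field names follow how the pipeline above reads them.
interface DocumentChangeEvent {
  type: 'document';
  action: 'created' | 'updated' | 'deleted';
  datasource: string;
  docId: string;
}
interface PermissionChangeEvent {
  type: 'permission';
  datasource: string;
  docId: string;
  newAcl: { type: 'user' | 'group' | 'domain'; value: string; access: 'read' | 'write' }[];
}
type SourceChangeEvent = DocumentChangeEvent | PermissionChangeEvent;

type PlannedCall =
  | { op: 'deleteDocument'; datasource: string; docId: string }
  | { op: 'indexDocument'; datasource: string; docId: string }
  | { op: 'syncPermissions'; datasource: string; docId: string; aclSize: number };

// Pure routing step: decide which pipeline operation an event maps to.
function planCall(event: SourceChangeEvent): PlannedCall {
  if (event.type === 'permission') {
    return { op: 'syncPermissions', datasource: event.datasource, docId: event.docId, aclSize: event.newAcl.length };
  }
  // Narrowed to DocumentChangeEvent: deletions bypass the fetch-and-index path.
  return event.action === 'deleted'
    ? { op: 'deleteDocument', datasource: event.datasource, docId: event.docId }
    : { op: 'indexDocument', datasource: event.datasource, docId: event.docId };
}

const planned = planCall({ type: 'document', action: 'deleted', datasource: 'confluence', docId: '123' });
console.log(planned.op); // deleteDocument
```

Keeping the routing decision separate from the side effects makes it testable without a queue or a Glean client; a worker would then call the matching `process*` method for each planned operation.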
interface SourceDocument { id: string; datasource: string; title: string; body: string; url: string; author: string; updatedAt: string; acl: Permission[]; }
interface Permission { type: 'user' | 'group' | 'domain'; value: string; access: 'read' | 'write'; }
interface ConnectorState { datasource: string; lastSyncCursor: string; lastFullReindex: string; documentCount: number; status: 'healthy' | 'degraded' | 'failed'; }
interface IndexReport { datasource: string; totalIndexed: number; failures: string[]; timestamp: string; }
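Because `Permission` entries are plain values, comparing the ACL held by the source system with the ACL last pushed to Glean reduces to a set difference. The sketch below illustrates that idea; `AclDiff`, `aclKey`, and `diffAcl` are hypothetical helpers, and how the Glean-side ACL is obtained is left out.

```typescript
interface Permission { type: 'user' | 'group' | 'domain'; value: string; access: 'read' | 'write'; }

interface AclDiff { missingInGlean: Permission[]; extraInGlean: Permission[]; }

// Canonical string form so two Permission objects can be compared by value.
const aclKey = (p: Permission): string => `${p.type}:${p.value}:${p.access}`;

// Hypothetical reconciliation helper: reports grants present on only one side.
function diffAcl(source: Permission[], glean: Permission[]): AclDiff {
  const sourceKeys = new Set(source.map(aclKey));
  const gleanKeys = new Set(glean.map(aclKey));
  return {
    missingInGlean: source.filter((p) => !gleanKeys.has(aclKey(p))),
    extraInGlean: glean.filter((p) => !sourceKeys.has(aclKey(p))),
  };
}

const sourceAcl: Permission[] = [
  { type: 'group', value: 'eng@example.com', access: 'read' },
  { type: 'user', value: 'ana@example.com', access: 'write' },
];
const gleanAcl: Permission[] = [
  { type: 'group', value: 'eng@example.com', access: 'read' },
  { type: 'user', value: 'former-employee@example.com', access: 'read' },
];
const diff = diffAcl(sourceAcl, gleanAcl);
console.log(diff.missingInGlean.length, diff.extraInGlean.length); // 1 1
```

Entries in `extraInGlean` are the more urgent of the two, since they represent access that the source system has already revoked.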
| Component | Failure Mode | Recovery |
|---|---|---|
| Connector sync | Source API rate limit | Per-datasource backoff, degrade to hourly bulk sync |
| Document indexing | Glean 429 throughput limit | Queue retry with jitter, batch size reduction |
| Permission sync | ACL mismatch between source and Glean | Reconciliation job flags discrepancies for admin review |
| Bulk reindex | Timeout on large datasource | Checkpoint cursor, resume from last successful batch |
| Search aggregation | Stale index for one datasource | Degrade gracefully — return results from healthy sources, flag staleness |
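
The "retry with jitter, batch size reduction" recovery in the table can be sketched as two small pure functions. This is a hand-rolled illustration using the common "full jitter" scheme, separate from the queue's built-in backoff setting; `BackoffPolicy`, the cap, and the minimum batch size are assumed values, with only the 3000 ms base and the batch size of 100 taken from the code above.

```typescript
// Hypothetical policy values; tune them against the rate limits you actually observe.
interface BackoffPolicy { baseMs: number; capMs: number; minBatch: number; }

// Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
function retryDelayMs(attempt: number, policy: BackoffPolicy, random: () => number = Math.random): number {
  const ceiling = Math.min(policy.capMs, policy.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Halve the batch after each throttled attempt, never going below the floor.
function nextBatchSize(current: number, policy: BackoffPolicy): number {
  return Math.max(policy.minBatch, Math.floor(current / 2));
}

const policy: BackoffPolicy = { baseMs: 3000, capMs: 60000, minBatch: 10 };

// With the random source pinned to 0.5, each delay is the midpoint of its range.
const midpoints = [0, 1, 2, 3, 4, 5].map((attempt) => retryDelayMs(attempt, policy, () => 0.5));
console.log(midpoints); // [ 1500, 3000, 6000, 12000, 24000, 30000 ]

// Successive 429 responses shrink a batch of 100 down to the floor of 10.
const sizes: number[] = [100];
for (let i = 0; i < 5; i++) sizes.push(nextBatchSize(sizes[sizes.length - 1]!, policy));
console.log(sizes); // [ 100, 50, 25, 12, 10, 10 ]
```

Randomizing the delay spreads retries from several connectors over time, so they do not all hit the indexing endpoint again at the same instant.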
See glean-deploy-integration.