From daffy0208-ai-dev-standards
Design and build knowledge graphs. Use when modeling complex relationships, building semantic search, or creating knowledge bases. Covers schema design, entity relationships, and graph database selection.
Build structured knowledge graphs that improve AI system performance through explicit relational knowledge.
Knowledge graphs make implicit relationships explicit, enabling AI systems to reason about connections, verify facts, and avoid hallucinations.
Goal: Define entities, relationships, and properties for your domain
Entity Types (Nodes):
Relationship Types (Edges):
Properties (Attributes):
Example Ontology:
```turtle
# RDF/Turtle format
@prefix : <http://example.org/ontology#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Person a owl:Class ;
    rdfs:label "Person" .

:Organization a owl:Class ;
    rdfs:label "Organization" .

:worksFor a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range :Organization ;
    rdfs:label "works for" .
```
Validation:
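One automatable check before any data is loaded: confirm that every relationship type's domain and range name a declared entity type. A minimal sketch in plain Python (the type names and dict layout are illustrative, not a fixed schema format):

```python
# Declared entity types and relationship specs (illustrative names).
ENTITY_TYPES = {"Person", "Organization", "Location"}
RELATIONSHIP_TYPES = {
    "worksFor": {"domain": "Person", "range": "Organization"},
    "locatedIn": {"domain": "Organization", "range": "Location"},
}

def validate_ontology(entity_types, relationship_types):
    """Return a list of error strings; an empty list means the ontology is consistent."""
    errors = []
    for name, spec in relationship_types.items():
        if spec["domain"] not in entity_types:
            errors.append(f"{name}: unknown domain {spec['domain']}")
        if spec["range"] not in entity_types:
            errors.append(f"{name}: unknown range {spec['range']}")
    return errors

print(validate_ontology(ENTITY_TYPES, RELATIONSHIP_TYPES))  # []
```

The same idea extends to property checks (required attributes, value types) as the ontology grows.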
Decision Matrix:
Neo4j (Recommended for most):
Amazon Neptune:
ArangoDB:
TigerGraph:
Technology Stack:
```yaml
graph_database: 'Neo4j Community'      # or Enterprise for production
vector_integration: 'Pinecone'         # for hybrid search
embeddings: 'text-embedding-3-large'   # OpenAI
etl: 'Apache Airflow'                  # for data pipelines
```
Neo4j Schema Setup:
```cypher
// Create constraints for uniqueness
CREATE CONSTRAINT person_id IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;

CREATE CONSTRAINT org_name IF NOT EXISTS
FOR (o:Organization) REQUIRE o.name IS UNIQUE;

// Create indexes for performance
CREATE INDEX entity_search IF NOT EXISTS
FOR (e:Entity) ON (e.name, e.type);

CREATE INDEX relationship_type IF NOT EXISTS
FOR ()-[r:RELATED_TO]-() ON (r.type, r.confidence);
```
Goal: Extract entities and relationships from data sources
Data Sources:
Entity Extraction Pipeline:
```python
from typing import List

class EntityExtractionPipeline:
    def __init__(self):
        self.ner_model = load_ner_model()  # e.g. spaCy or Hugging Face NER
        self.entity_linker = EntityLinker()
        self.deduplicator = EntityDeduplicator()

    def process_text(self, text: str) -> List[Entity]:
        # 1. Extract named entities
        entities = self.ner_model.extract(text)
        # 2. Link to existing entities (entity resolution)
        linked_entities = self.entity_linker.link(entities)
        # 3. Deduplicate and resolve conflicts
        resolved_entities = self.deduplicator.resolve(linked_entities)
        return resolved_entities
```
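The deduplication step is where most graph quality is won or lost. A minimal sketch of name-based entity resolution (the normalization rules and suffix list are assumptions, not a production resolver):

```python
import re

# Corporate suffixes stripped during name normalization (illustrative list).
CORPORATE_SUFFIXES = {"inc", "corp", "computer", "ltd", "llc", "co"}

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip corporate suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    core = [t for t in tokens if t not in CORPORATE_SUFFIXES]
    return " ".join(core) or " ".join(tokens)

def deduplicate(names):
    """Group surface forms that normalize to the same key."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_name(name), []).append(name)
    return groups

groups = deduplicate(["Apple Inc", "Apple", "Apple Computer"])
# all three surface forms collapse to the single key "apple"
```

Real resolvers add embedding similarity, alias tables, and attribute matching on top of this kind of canonicalization.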
Relationship Extraction:
```python
class RelationshipExtractor:
    def extract_relationships(self, entities: List[Entity],
                              text: str) -> List[Relationship]:
        relationships = []
        # Use dependency parsing or an LLM for extraction
        doc = self.nlp(text)
        for sent in doc.sents:
            rels = self.extract_from_sentence(sent, entities)
            relationships.extend(rels)
        # Validate against the ontology
        valid_relationships = self.validate_relationships(relationships)
        return valid_relationships
```
LLM-Based Extraction (for complex relationships):
```python
def extract_with_llm(text: str) -> List[Relationship]:
    prompt = f"""
    Extract entities and relationships from this text:
    {text}
    Format: (Entity1, Relationship, Entity2, Confidence)
    Only extract factual relationships.
    """
    response = llm.generate(prompt)
    relationships = parse_llm_response(response)
    return relationships
```
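The `parse_llm_response` helper is left undefined above. One plausible implementation for the `(Entity1, Relationship, Entity2, Confidence)` line format, with defensive handling of malformed output (field names are assumptions):

```python
import re
from typing import List, NamedTuple

class Relationship(NamedTuple):
    source: str
    predicate: str
    target: str
    confidence: float

# Matches lines like: (Apple, headquartered_in, Cupertino, 0.95)
TUPLE_RE = re.compile(r"\(([^,]+),\s*([^,]+),\s*([^,]+),\s*([\d.]+)\)")

def parse_llm_response(response: str) -> List[Relationship]:
    """Parse tuple-formatted lines, silently skipping anything that doesn't match."""
    rels = []
    for match in TUPLE_RE.finditer(response):
        src, pred, tgt, conf = (g.strip() for g in match.groups())
        try:
            rels.append(Relationship(src, pred, tgt, float(conf)))
        except ValueError:
            continue  # malformed confidence value; drop the tuple
    return rels
```

Skipping unparseable lines rather than raising keeps one garbled tuple from discarding a whole batch; in production you would also log the rejects.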
Validation:
Goal: Combine structured graph with semantic vector search
Architecture:
```python
class HybridKnowledgeSystem:
    def __init__(self):
        self.graph_db = Neo4jConnection()
        self.vector_db = PineconeClient()
        self.embedding_model = OpenAIEmbeddings()

    def store_entity(self, entity: Entity):
        # Store structured data in the graph
        self.graph_db.create_node(entity)
        # Store embeddings in the vector database
        embedding = self.embedding_model.embed(entity.description)
        self.vector_db.upsert(
            id=entity.id,
            values=embedding,
            metadata=entity.metadata
        )

    def hybrid_search(self, query: str, top_k: int = 10) -> SearchResults:
        # 1. Vector similarity search (over-fetch, then rank via the graph)
        query_embedding = self.embedding_model.embed(query)
        vector_results = self.vector_db.query(
            vector=query_embedding,
            top_k=100
        )
        # 2. Graph traversal from vector results
        entity_ids = [r.id for r in vector_results.matches]
        graph_results = self.graph_db.get_subgraph(entity_ids, max_hops=2)
        # 3. Merge and rank results
        merged = self.merge_results(vector_results, graph_results)
        return merged[:top_k]
```
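`merge_results` is the crux of hybrid ranking. One simple scheme, sketched under assumed data shapes (the boost weight and tuple layout are illustrative), is to add a fixed bonus to the similarity score of entities that also appear in the retrieved subgraph:

```python
def merge_results(vector_hits, graph_entity_ids, graph_boost=0.2):
    """Re-rank (entity_id, similarity) pairs, boosting graph-connected entities.

    vector_hits: list of (entity_id, similarity in [0, 1]) tuples.
    graph_entity_ids: set of ids reachable in the retrieved subgraph.
    """
    connected = set(graph_entity_ids)
    scored = [
        (eid, sim + (graph_boost if eid in connected else 0.0))
        for eid, sim in vector_hits
    ]
    # Highest combined score first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = merge_results([("a", 0.9), ("b", 0.8)], {"b"})
# "b" overtakes "a": 0.8 + 0.2 = 1.0 vs 0.9
```

More sophisticated variants use reciprocal rank fusion or learn the weights, but the additive boost is a reasonable starting point.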
Benefits of Hybrid Approach:
Common Query Patterns:
1. Find Entity:

```cypher
MATCH (e:Entity {id: $entity_id})
RETURN e
```

2. Find Relationships:

```cypher
MATCH (source:Entity {id: $entity_id})-[r]-(target)
RETURN source, r, target
LIMIT 20
```

3. Path Between Entities:

```cypher
MATCH path = shortestPath(
  (source:Person {id: $source_id})-[*..5]-(target:Person {id: $target_id})
)
RETURN path
```

4. Multi-Hop Traversal:

```cypher
MATCH (p:Person {name: $name})-[:WORKS_FOR]->(o:Organization)-[:LOCATED_IN]->(l:Location)
RETURN p.name, o.name, l.city
```

5. Recommendation Query:

```cypher
// Find people similar to this person based on shared organizations
MATCH (p1:Person {id: $person_id})-[:WORKS_FOR]->(o:Organization)<-[:WORKS_FOR]-(p2:Person)
WHERE p1 <> p2
RETURN p2, COUNT(o) AS shared_orgs
ORDER BY shared_orgs DESC
LIMIT 10
```
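The shared-organization logic behind the recommendation query can be sketched in plain Python over an adjacency map, which is handy for unit-testing a ranking rule before writing the Cypher (the names and data below are illustrative):

```python
from collections import Counter

# person -> set of organizations they work for (illustrative data)
WORKS_FOR = {
    "alice": {"acme", "globex"},
    "bob": {"acme"},
    "carol": {"acme", "globex"},
}

def similar_people(person: str, top_k: int = 10):
    """Rank other people by how many organizations they share with `person`."""
    counts = Counter()
    for other, orgs in WORKS_FOR.items():
        if other != person:
            counts[other] = len(WORKS_FOR[person] & orgs)
    # Keep only people with at least one shared organization
    return [p for p, n in counts.most_common(top_k) if n > 0]

print(similar_people("alice"))  # ['carol', 'bob']
```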
Knowledge Graph API:
```python
class KnowledgeGraphAPI:
    def __init__(self, graph_db):
        self.graph = graph_db

    def find_entity(self, entity_name: str) -> Entity:
        """Find entity by name with fuzzy matching."""
        query = """
        MATCH (e:Entity)
        WHERE e.name CONTAINS $name
        RETURN e
        ORDER BY apoc.text.levenshteinDistance(e.name, $name)
        LIMIT 1
        """
        return self.graph.run(query, name=entity_name).single()

    def find_relationships(self, entity_id: str,
                           relationship_type: str = None,
                           max_hops: int = 2) -> List[Relationship]:
        """Find relationships within the specified number of hops."""
        query = f"""
        MATCH (source:Entity {{id: $entity_id}})
        MATCH path = (source)-[r*1..{max_hops}]-(target)
        RETURN path, relationships(path) AS rels
        LIMIT 100
        """
        return self.graph.run(query, entity_id=entity_id).data()

    def get_subgraph(self, entity_ids: List[str],
                     max_hops: int = 2) -> Subgraph:
        """Get the connected subgraph around multiple entities."""
        query = f"""
        MATCH (e:Entity)
        WHERE e.id IN $entity_ids
        CALL apoc.path.subgraphAll(e, {{maxLevel: {max_hops}}})
        YIELD nodes, relationships
        RETURN nodes, relationships
        """
        return self.graph.run(query, entity_ids=entity_ids).data()
```
Goal: Use knowledge graph to ground LLM responses and detect hallucinations
Knowledge Graph RAG:
```python
class KnowledgeGraphRAG:
    def __init__(self, kg_api, llm_client):
        self.kg = kg_api
        self.llm = llm_client

    def retrieve_context(self, query: str) -> str:
        # Extract entities from the query
        entities = self.extract_entities_from_query(query)
        # Retrieve the relevant subgraph
        subgraph = self.kg.get_subgraph(
            [e.id for e in entities],
            max_hops=2
        )
        # Format the subgraph for the LLM
        context = self.format_subgraph_for_llm(subgraph)
        return context

    def generate_with_grounding(self, query: str) -> GroundedResponse:
        context = self.retrieve_context(query)
        prompt = f"""
        Context from knowledge graph:
        {context}

        User query: {query}

        Answer based only on the provided context. Include source entities.
        """
        response = self.llm.generate(prompt)
        return GroundedResponse(
            response=response,
            sources=self.extract_sources(context),
            confidence=self.calculate_confidence(response, context)
        )
```
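`format_subgraph_for_llm` is left abstract above. Serializing the subgraph as one triple per line keeps the prompt compact and makes it easy for the model to cite sources; a minimal sketch, assuming the subgraph arrives as `(source, relationship, target, confidence)` tuples:

```python
def format_subgraph_for_llm(triples):
    """Render (source, relationship, target, confidence) tuples as prompt lines."""
    lines = [
        f"- {src} --[{rel}]--> {tgt} (confidence: {conf:.2f})"
        for src, rel, tgt, conf in triples
    ]
    return "\n".join(lines)

context = format_subgraph_for_llm([
    ("Alice", "WORKS_FOR", "Acme", 0.95),
])
# "- Alice --[WORKS_FOR]--> Acme (confidence: 0.95)"
```

Including the confidence in the rendered line lets the prompt instruct the model to prefer high-confidence facts.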
Hallucination Detection:
```python
class HallucinationDetector:
    def __init__(self, knowledge_graph):
        self.kg = knowledge_graph

    def verify_claim(self, claim: str) -> VerificationResult:
        # Parse the claim into (subject, predicate, object)
        parsed_claim = self.parse_claim(claim)
        # Query the knowledge graph for supporting evidence
        evidence = self.kg.find_evidence(
            parsed_claim.subject,
            parsed_claim.predicate,
            parsed_claim.object
        )
        if evidence:
            return VerificationResult(
                is_supported=True,
                evidence=evidence,
                confidence=evidence.confidence
            )
        # No support found; check for contradictory evidence
        contradiction = self.kg.find_contradiction(parsed_claim)
        return VerificationResult(
            is_supported=False,
            is_contradicted=bool(contradiction),
            contradiction=contradiction
        )
```
Define your schema before ingesting data. Changing the ontology later is expensive.
Deduplicate entities aggressively. "Apple Inc", "Apple", "Apple Computer" → same entity.
Every relationship should have a confidence score (0.0-1.0) and a source.
Don't try to model the entire domain at once. Start with core entities and expand.
Combine graph traversal (structured) with vector search (semantic) for best results.
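The confidence-and-provenance rule above maps naturally onto a small record type that rejects out-of-range scores at construction time (the field names are one possible convention, not a standard):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    source_id: str
    predicate: str
    target_id: str
    confidence: float   # 0.0-1.0, per the best practice above
    provenance: str     # document or pipeline that asserted this edge

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")

edge = Relationship("alice", "WORKS_FOR", "acme", 0.92, "hr_db_export_2024")
```

Validating at the edge type means no downstream query has to defend against a confidence of 1.5 or a missing source.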
1. Question Answering:
2. Recommendation:
3. Fraud Detection:
4. Knowledge Discovery:
5. Semantic Search:
For MVPs (<10K entities):
For Production (10K-1M entities):
For Enterprise (1M+ entities):
Related Skills:
- rag-implementer - For hybrid KG+RAG systems
- multi-agent-architect - For knowledge-graph-powered agents
- api-designer - For KG API design

Related Patterns:
- META/DECISION-FRAMEWORK.md - Graph DB selection
- STANDARDS/architecture-patterns/knowledge-graph-pattern.md - KG architectures (when created)

Related Playbooks:
- PLAYBOOKS/deploy-neo4j.md - Neo4j deployment (when created)
- PLAYBOOKS/build-kg-rag-system.md - KG-RAG integration (when created)