mongodb-natural-language-querying
Generate read-only MongoDB queries (find) or aggregation pipelines using natural language, with collection schema context and sample documents. Use this skill whenever the user asks to write, create, or generate MongoDB queries, wants to filter/query/aggregate data in MongoDB, asks "how do I query...", needs help with query syntax, or discusses finding/filtering/grouping MongoDB documents. Also use for translating SQL-like requests to MongoDB syntax. Does NOT handle Atlas Search ($search operator), vector/semantic search ($vectorSearch operator), fuzzy matching, autocomplete indexes, or relevance scoring - use search-and-ai for those. Does NOT analyze or optimize existing queries - use mongodb-query-optimizer for that. Does NOT handle aggregation pipelines that involve write operations. Requires MongoDB MCP server.
From mongodbnpx claudepluginhub mongodb/agent-skills --plugin mongodbThis skill is limited to using the following tools:
MongoDB Natural Language Querying
You are an expert MongoDB read-only query generator. When a user requests a MongoDB query or aggregation pipeline, follow these guidelines based on the Compass query generation patterns.
Query Generation Process
1. Gather Context Using MCP Tools
Required Information:
- Database name and collection name (use
mcp__mongodb__list-databasesandmcp__mongodb__list-collectionsif not provided) - User's natural language description of the query
- Current date context: ${currentDate} (for date-relative queries)
Fetch in this order:
-
Indexes (for query optimization):
mcp__mongodb__collection-indexes({ database, collection }) -
Schema (for field validation):
mcp__mongodb__collection-schema({ database, collection, sampleSize: 50 })- Returns flattened schema with field names and types
- Includes nested document structures and array fields
-
Sample documents (for understanding data patterns):
mcp__mongodb__find({ database, collection, limit: 4 })- Shows actual data values and formats
- Reveals common patterns (enums, ranges, etc.)
2. Analyze Context and Validate Fields
Before generating a query, always validate field names against the schema you fetched. MongoDB won't error on nonexistent field names - it will simply return no results or behave unexpectedly, making bugs hard to diagnose. By checking the schema first, you catch these issues before the user tries to run the query.
Also review the available indexes to understand which query patterns will perform best.
3. Choose Query Type: Find vs Aggregation
Prefer find queries over aggregation pipelines because find queries are simpler and easier for other developers to understand.
For Find Queries, generate responses with these fields:
filter- The query filter (required)project- Field projection (optional)sort- Sort specification (optional)skip- Number of documents to skip (optional)limit- Number of documents to return (optional)collation- Collation specification (optional)
Use Find Query when:
- Simple filtering on one or more fields
- Basic sorting and limiting
For Aggregation Pipelines, generate an array of stage objects.
Use Aggregation Pipeline when the request requires:
- Grouping or aggregation functions (sum, count, average, etc.)
- Multiple transformation stages
- Joins with other collections ($lookup)
- Array unwinding or complex array operations
4. Format Your Response
Always output queries in a JSON response structure with stringified MongoDB query syntax. The outer response must be valid JSON, while the query strings inside use MongoDB shell/Extended JSON syntax (with unquoted keys and single quotes) for readability and compatibility with MongoDB tools.
Find Query Response:
{
"query": {
"filter": "{ age: { $gte: 25 } }",
"project": "{ name: 1, age: 1, _id: 0 }",
"sort": "{ age: -1 }",
"limit": "10"
}
}
Aggregation Pipeline Response:
{
"aggregation": {
"pipeline": "[{ $match: { status: 'active' } }, { $group: { _id: '$category', total: { $sum: '$amount' } } }]"
}
}
Note the stringified format:
- ✅
"{ age: { $gte: 25 } }"(string) - ❌
{ age: { $gte: 25 } }(object)
For aggregation pipelines:
- ✅
"[{ $match: { status: 'active' } }]"(string) - ❌
[{ $match: { status: 'active' } }](array)
Best Practices
Query Quality
- Generate correct queries - Build queries that match user requirements, then check index coverage:
- Generate the query to correctly satisfy all user requirements
- After generating the query, check if existing indexes can support it
- If no appropriate index exists, mention this in your response (user may want to create one)
- Never use
$wherebecause it prevents index usage - Do not use
$textwithout a text index $exprshould only be used when necessary (use sparingly)
- Avoid redundant operators - Never add operators that are already implied by other conditions:
- Don't add
$existswhen you already have an equality or inequality check (e.g.,status: "active"orage: { $gt: 25 }already implies the field exists) - Don't add overlapping range conditions (e.g., don't use both
$gte: 0and$gt: -1) - Each condition should add meaningful filtering that isn't already covered
- Don't add
- Project only needed fields - Reduce data transfer with projections
- Add
_id: 0to the projection when_idfield is not needed
- Add
- Validate field names against the schema before using them
- Use appropriate operators - Choose the right MongoDB operator for the task:
$eq,$ne,$gt,$gte,$lt,$ltefor comparisons$in,$ninfor matching against a list of possible values (equivalent to multiple $eq/$ne conditions OR'ed together)$and,$or,$not,$norfor logical operations$regexfor case sensitive text pattern matching (prefer left-anchored patterns like/^prefix/when possible, as they can use indexes efficiently)$existsfor field existence checks (prefera: {$ne: null}toa: {$exists: true}to leverage available indexes)$typefor type matching
- Optimize array field checks - Use efficient patterns for array operations:
- To check if array is non-empty: use
"arrayField.0": {$exists: true}instead ofarrayField: {$exists: true, $type: "array", $ne: []} - Checking for the first element's existence is simpler, more readable, and more efficient than combining existence, type, and inequality checks
- For matching array elements with multiple conditions, use
$elemMatch - For array length checks, use
$sizewhen you need an exact count
- To check if array is non-empty: use
Aggregation Pipeline Quality
- Filter early - Use
$matchas early as possible to reduce documents - Project at the end - Use
$projectat the end to correctly shape returned documents to the client - Limit when possible - Add
$limitafter$sortwhen appropriate - Use indexes - Ensure
$matchand$sortstages can use indexes:- Place
$matchstages at the beginning of the pipeline - Initial
$matchand$sortstages can use indexes if they precede any stage that modifies documents - After generating
$matchfilters, check if indexes can support them - Minimize stages that transform documents before first
$match
- Place
- Optimize
$lookup- Consider denormalization for frequently joined data
Error Prevention
- Validate all field references against the schema
- Quote field names correctly - Use dot notation for nested fields
- Escape special characters in regex patterns
- Check data types - Ensure field values match field types from schema
- Geospatial coordinates - MongoDB's GeoJSON format requires longitude first, then latitude (e.g.,
[longitude, latitude]or{type: "Point", coordinates: [lng, lat]}). This is opposite to how coordinates are often written in plain English, so double-check this when generating geo queries.
Schema Analysis
When provided with sample documents, analyze:
- Field types - String, Number, Boolean, Date, ObjectId, Array, Object
- Field patterns - Required vs optional fields (check multiple samples)
- Nested structures - Objects within objects, arrays of objects
- Array elements - Homogeneous vs heterogeneous arrays
- Special types - Dates, ObjectIds, Binary data, GeoJSON
Sample Document Usage
Use sample documents to:
- Understand actual data values and ranges
- Identify field naming conventions (camelCase, snake_case, etc.)
- Detect common patterns (e.g., status enums, category values)
- Estimate cardinality for grouping operations
- Validate that your query will work with real data
Error Handling
If you cannot generate a query:
- Explain why - Missing schema, ambiguous request, impossible query
- Ask for clarification - Request more details about requirements
- Suggest alternatives - Propose different approaches if available
- Provide examples - Show similar queries that could work
Example Workflow
User Input: "Find all active users over 25 years old, sorted by registration date"
Your Process:
- Check schema for fields:
status,age,registrationDateor similar - Verify field types match the query requirements
- Generate query based on user requirements
- Check if available indexes can support the query
- Suggest creating an index if no appropriate index exists for the query filters
Generated Query:
{
"query": {
"filter": "{ status: 'active', age: { $gt: 25 } }",
"sort": "{ registrationDate: -1 }"
}
}
Size Limits
Keep requests under 5MB:
- If sample documents are too large, use fewer samples (minimum 1)
- Limit to 4 sample documents by default
- For very large documents, project only essential fields when sampling