From jarvis-onshape-mcp
Decomposes engineering reference images into structured feature trees for Onshape CAD modeling. Activates automatically on image uploads with modeling requests.
When the user shares an engineering reference image and asks you to build it, your first job is NOT to call create_document. It is to produce a rigorous, structured feature tree that the user can sanity-check and the build phase can execute against.
Why this matters: vision-to-CAD is the hard part of this loop, not CAD-to-CAD. If you skim the image and jump to building, you'll mis-read features (boss → pocket, complex pill outline → rounded rectangle, missed callouts) and burn a long iteration loop fixing errors you could have caught in 30 seconds of careful looking. A good decomposition makes the build trivial — show it to the user, get a quick confirmation or correction, then build with confidence.
Tools for this phase:
- mcp__onshape__load_local_image(imagePath) — cache an image (typically the user's reference, on disk) at native resolution.
- mcp__onshape__crop_image(imageId, x1, y1, x2, y2) — zoom into a region of any cached image. Crops are independently re-loadable; use them liberally to read small text or count features.

Produce a single structured response in this exact format:
## OVERVIEW
One sentence: what IS this part? (bracket / plate / flange / housing / etc.)
## ENVELOPE
Approximate overall dimensions in mm if readable from callouts. State each
axis: X_length × Y_width × Z_height. If unreadable, say "UNKNOWN" and the
build phase will need a reasonable default or ask the user.
## FEATURE TREE
F1: <short name>
type: base-plate | boss | through-hole | blind-hole | pocket | slot | fillet | chamfer | counterbore | countersink | shell | rib | taper | thread | other
role: primary | secondary | subtractive | cosmetic
size: approximate mm (diameter, length×width, radius, etc.)
position: fraction of envelope OR relative to another feature
face: which face of the part (top / bottom / front / back / left / right / +Z-face-of-F3 / etc.)
orientation: axis direction (e.g. "axis along +Z")
count: 1 if single, N if pattern (e.g. 4 for a 4-corner bolt pattern)
dim_source: drawing_callout | render_inferred
notes: anything unusual — tolerance, finish, special constraint
F2: ...
...
## RELATIONSHIPS
Which features are subtractive-on-top-of which, which are patterned from which.
List them as one-liners: "F4 (through-hole) is cut INTO F1 (base-plate)."
Critically: when one feature's outline is *derived* from another's
silhouette (e.g. "the inset pocket follows the pill outline minus the two
holes, offset inward 10 mm"), say so explicitly. Derived outlines are the
single biggest class of misread feature in single-image-to-CAD.
## UNCERTAINTIES
Anything you weren't sure about. Be explicit — list what you'd want the
user to confirm before you commit to a build.
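The feature tree is plain text, but each F-entry is effectively a fixed record. If a downstream script ever needs to parse or validate one, the fields map naturally onto a small data structure. A hypothetical Python sketch (the Feature class and the example values are illustrative, not part of the skill's API):

```python
from dataclasses import dataclass

@dataclass
class Feature:
    """One feature-tree entry (F1, F2, ...) with the fields listed above."""
    name: str
    type: str          # base-plate | boss | through-hole | pocket | ...
    role: str          # primary | secondary | subtractive | cosmetic
    size: str          # approximate mm, e.g. "Ø5" or "40×20, R3 corners"
    position: str      # fraction of envelope OR relative to another feature
    face: str          # top | bottom | front | back | left | right | ...
    orientation: str   # e.g. "axis along +Z"
    count: int = 1     # N for patterns
    dim_source: str = "render_inferred"  # or "drawing_callout"
    notes: str = ""

# Illustrative entry: a 4-corner bolt pattern cut into the base plate.
f4 = Feature(
    name="corner bolt holes",
    type="through-hole",
    role="subtractive",
    size="Ø5",
    position="10 mm inset from each corner of F1",
    face="top",
    orientation="axis along +Z",
    count=4,
    dim_source="drawing_callout",
)
```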
After producing the structured response, briefly check with the user: "Does this match what you intended? Any corrections before I start building?" The user has more context than the image alone — let them fix your read before you spend turns building wrong.
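A derived outline such as "offset inward 10 mm" means the child feature's dimensions are a function of the parent's, not independent numbers, so a correction to the parent propagates automatically. For the simplest case, an axis-aligned rectangular outline, the arithmetic is just this (a sketch; real silhouette offsets need a true offset curve in CAD):

```python
def inward_offset_rect(length, width, offset):
    """Dimensions of a rectangular outline shrunk inward by `offset` on
    every side. Only covers the axis-aligned rectangular case; pill or
    silhouette-minus-holes outlines need a proper offset curve."""
    new_l, new_w = length - 2 * offset, width - 2 * offset
    if new_l <= 0 or new_w <= 0:
        raise ValueError("offset too large for this outline")
    return new_l, new_w

# A pocket offset inward 10 mm from a 100 × 60 plate outline:
inward_offset_rect(100, 60, 10)  # -> (80, 40)
```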
Before every non-trivial tool call (crop_image, load_local_image), emit a short plan text (1-3 sentences, plain assistant output) saying WHY and WHAT YOU EXPECT TO SEE. Example:
"About to crop the top-left quadrant of the drawing — that's where the dimension callouts for the main bolt circle usually live on ASME title sheets. Expecting to see two Ø dimensions."
The observer watching the run uses these thought lines to follow your reasoning. Don't let your only visible output be tool-call JSON.
After each crop, also say in 1-2 sentences what you actually saw before moving on. If the crop didn't show what you expected, name the surprise explicitly — that's valuable signal.
Mandatory steps, in order:
1. load_local_image(imagePath=<path>) so you get a cached image_id.
2. Scan the full image for distinct features. Count them mentally. If you count ≤ 3 features on a part that fills most of the frame, you're missing things — non-trivial engineering parts have 6–12+ distinct features.
3. For each feature, crop_image(imageId=<id>, x1,y1,x2,y2) to zoom into its region at native resolution. State what you see in the crop: Ø25 means diameter 25, R3 means radius 3, and a 4X prefix means "this callout applies to 4 instances".

In this phase, the only tools you should reach for are load_local_image and crop_image (plus reading static files). Don't call build tools yet — finish describing first, then transition to the building phase once the user confirms the spec.
Trust drawing callouts over pixel estimates: if a callout reads 35.2, that's the dimension. Don't estimate from pixels.

Budget: 15–25 turns total. You have time to crop several regions. Don't rush, don't dawdle.
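crop_image takes absolute corner coordinates, so a reliable first pass is to sweep the four quadrants with a little overlap so callouts sitting on a boundary aren't clipped. A sketch of the coordinate arithmetic, assuming pixel coordinates with the origin at top-left (the 10% overlap is an arbitrary choice, not a tool requirement):

```python
def quadrant_boxes(width: int, height: int, overlap: float = 0.1):
    """Split an image into four crop boxes (x1, y1, x2, y2), each extended
    past the midline by `overlap` so boundary callouts appear in a crop."""
    ox = int(width * overlap)   # horizontal overlap in pixels
    oy = int(height * overlap)  # vertical overlap in pixels
    mx, my = width // 2, height // 2
    return {
        "top-left":     (0, 0, mx + ox, my + oy),
        "top-right":    (mx - ox, 0, width, my + oy),
        "bottom-left":  (0, my - oy, mx + ox, height),
        "bottom-right": (mx - ox, my - oy, width, height),
    }

boxes = quadrant_boxes(2000, 1500)
# Each tuple can be passed straight to crop_image(imageId, x1, y1, x2, y2).
```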
Output quality standard: a competent CAD designer reading only your output (not the original images) should be able to build the part. If your output lacks a size, position, or role for a feature, the downstream agent will guess — and may guess wrong. Be specific.