
LLM CAD generation: how large language models create geometry

Large language models can write CAD code, generate CAD operation sequences, and sometimes produce actual usable geometry. Here's how they do it and where they fall apart.

Quick answer

LLMs generate CAD geometry through three approaches: writing CAD scripting code (OpenSCAD, FreeCAD Python), generating CAD operation sequences (sketch→extrude→fillet), or driving CAD APIs through function calling. Code generation works best because LLMs understand programming syntax. Direct geometry generation requires specialized fine-tuning, as in the Text2CAD model.

I was sitting in front of Claude at about eleven at night, trying to get it to generate a FreeCAD Python script for a simple motor mounting plate. Four holes in a rectangular pattern, a center bore, some countersinks. The kind of part I could model in Fusion 360 in six minutes without thinking about it. Claude's first script used a FreeCAD API method that doesn't exist. The second script used the right methods but put the holes on the wrong face. The third script worked but forgot the countersinks. The fourth script added the countersinks in the wrong coordinate system. The fifth script was perfect. It took about forty minutes, two cups of tea, and enough frustrated backspace to wear out a key, but the geometry rendered correctly and I could export a STEP file.

That experience is a miniature version of the entire LLM CAD generation story: language models can produce geometry. The path from prompt to usable output is just a lot messier than the demos suggest. Understanding how LLMs actually create CAD geometry, the specific mechanisms, the failure modes, the architectural choices, helps explain both why the technology works at all and why it breaks in the ways it does.

Three approaches to the same problem

LLMs don't understand geometry the way a CAD kernel does. They don't have a spatial model. They don't reason about topology, or B-Rep faces, or surface normals. What they do have is an extremely good understanding of sequences, patterns, and code syntax. Every approach to LLM CAD generation exploits that strength, and the differences between approaches come down to what kind of sequence the LLM generates.

The first approach is code generation. The LLM writes a script in a CAD scripting language, OpenSCAD, CadQuery Python, FreeCAD Python, or Fusion 360's API, and a separate program executes the script to produce geometry. The LLM never touches the geometry directly. It writes instructions. A geometric kernel follows them.

The second approach is operation sequence generation. The LLM generates a structured sequence of CAD operations: create sketch on XY plane, draw rectangle with dimensions, extrude by 20mm, create sketch on top face, draw circle at center, cut-extrude through all. This sequence gets parsed and executed by a CAD engine or a custom interpreter. The Text2CAD model works this way, generating sketch-and-extrude sequences from a fine-tuned transformer.

The third approach is API driving through function calling or tool use. The LLM connects to a running CAD application via an API bridge (typically MCP, the Model Context Protocol) and issues commands one at a time, receiving feedback between each step. CADAgent and the Fusion 360 MCP bridges work this way. The LLM isn't generating the full sequence in advance. It's interacting with the CAD tool in real time, seeing results, and adjusting.

Each approach has different strengths, different failure modes, and different implications for the quality of what comes out the other end. The how text-to-CAD works post covers the conceptual pipeline. This post is about the mechanics, the places where the mechanism matters for the output.

Code generation: the one that works best

The most reliable LLM CAD generation in 2026 is code generation. This is partly because LLMs have been trained on enormous amounts of programming data, and partly because CAD scripting languages are constrained enough that the probability of generating valid syntax is high.

OpenSCAD is the sweet spot. Its scripting language is small, well-documented, and deterministic. A cube([30, 20, 10]) always produces the same box. The language has clear error messages. The rendering is fast.
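The code-generation pattern is easy to see in miniature. The sketch below is a plain Python function that emits OpenSCAD source for a rectangular plate with four corner holes, the same shape of task as the motor mount above. The function name and hole layout are illustrative, not taken from any particular project; the point is that the "LLM" side of the pipeline only ever produces text, and OpenSCAD does the geometry.

```python
# Sketch of the code-generation pattern: Python emits OpenSCAD source,
# and the OpenSCAD kernel (not shown) turns it into geometry. The
# function and layout here are illustrative, not from a real project.

def plate_scad(width, depth, thickness, hole_d, margin):
    """Return OpenSCAD code for a plate with four corner holes."""
    holes = []
    for x in (margin, width - margin):
        for y in (margin, depth - margin):
            # Cylinders slightly taller than the plate so the boolean
            # difference cuts cleanly through both faces.
            holes.append(
                f"translate([{x}, {y}, -1]) "
                f"cylinder(h={thickness + 2}, d={hole_d}, $fn=32);"
            )
    hole_block = "\n        ".join(holes)
    return (
        "difference() {\n"
        f"    cube([{width}, {depth}, {thickness}]);\n"
        f"    union() {{\n        {hole_block}\n    }}\n"
        "}\n"
    )

code = plate_scad(50, 30, 5, hole_d=4, margin=6)
print(code)
```

Saving that string to a `.scad` file and opening it in OpenSCAD renders the plate; every failure mode described below happens on the text side, before the kernel ever runs.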

Several projects have formalized pipelines around this. PromptSCAD uses DeepSeek to generate OpenSCAD code and renders in-browser. The OpenSCAD MCP Server gives the LLM visual feedback. For simple to moderate parts, these pipelines produce usable geometry more reliably than any other LLM-based approach.

CadQuery Python is the next step up. CadQuery wraps OpenCascade and produces real B-Rep geometry, STEP-exportable. The API is larger and less forgiving, so scripts fail more often. But when they work, the output is manufacturing-grade. Recent research projects like FutureCAD and CADSmith use CadQuery as their target language, combining code generation with validation loops where one agent generates code, another checks dimensional accuracy, and a vision model evaluates the result visually.

The pattern across all of these: the LLM generates text (code), a separate system interprets the text (a compiler or runtime), and a proper geometric kernel produces the geometry. The LLM never reasons about geometry directly. It generates instructions in a language it was trained on, and the geometry is a downstream consequence.

Why this works: LLMs are good at code. They've been trained on millions of code examples. OpenSCAD and Python are in the training data. The mapping from natural language description to code is the kind of task transformers handle well, translating from one structured sequence to another.

Why it breaks: the LLM doesn't have spatial reasoning. It can write translate([10, 0, 0]) without understanding that 10mm to the right means the feature will overlap with an existing wall. It can generate a boolean subtraction that produces invalid geometry without knowing the result is non-manifold. Every failure where the code is syntactically correct but geometrically wrong traces back to this: the model understands the language, not the space. And in CAD, the space is what matters.

Operation sequence generation: the research approach

Instead of writing code in an existing language, some systems train a model to generate CAD operations directly. The model outputs a structured sequence, something like: create_sketch(plane=XY), draw_line(0,0,50,0), draw_line(50,0,50,30), close_sketch(), extrude(distance=10). A custom interpreter parses this sequence and builds the geometry.
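To make the interpreter side concrete, here is a toy version in pure Python. It parses the operation strings from the example above and tracks just enough state, a 2D polygon and an extrusion, to compute the resulting prism's volume via the shoelace formula. The operation names mirror the example; a real system would build B-Rep geometry instead of a number.

```python
# A toy interpreter for the operation-sequence format sketched above.
# It tracks a 2D polygon and an extrude distance; real interpreters
# drive a geometric kernel instead of computing a volume.
import re

def run_sequence(ops):
    points, volume = [], 0.0
    for op in ops:
        name, args = re.match(r"(\w+)\((.*)\)", op).groups()
        if name == "create_sketch":
            points = []
        elif name == "draw_line":
            x1, y1, x2, y2 = map(float, args.split(","))
            if not points:
                points.append((x1, y1))
            points.append((x2, y2))
        elif name == "close_sketch":
            pass  # polygon is implicitly closed by the shoelace formula
        elif name == "extrude":
            dist = float(args.split("=")[1])
            # Shoelace formula for polygon area, times extrude distance.
            area = 0.5 * abs(sum(
                xa * yb - xb * ya
                for (xa, ya), (xb, yb) in zip(points, points[1:] + points[:1])
            ))
            volume = area * dist
    return volume

seq = ["create_sketch(plane=XY)", "draw_line(0,0,50,0)",
       "draw_line(50,0,50,30)", "draw_line(50,30,0,30)",
       "close_sketch()", "extrude(distance=10)"]
print(run_sequence(seq))  # → 15000.0 (a 50x30 rectangle extruded 10mm)
```

The fragility is visible even at this scale: a single malformed token anywhere in the sequence breaks the parse, which is why these systems constrain the output format so tightly.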

The Text2CAD model from NeurIPS 2024 is the most prominent example: a transformer fine-tuned on the DeepCAD dataset of roughly 178,000 parametric CAD models. Given a text prompt, it generates a sequence of sketch-and-extrude operations that an interpreter converts to geometry. NURBGen takes a different approach, generating NURBS surface parameters as structured output, directly convertible to B-Rep.

The advantage: the model learns domain-specific patterns. Text2CAD has learned that brackets tend to have certain proportions, that holes appear in regular patterns. The disadvantage: the training data bottleneck. DeepCAD has 178,000 models. Image generation models train on billions of images. The gap shows in the output: only simple prismatic shapes, nothing complex. Most real-world CAD data is proprietary, and that data bottleneck is the single biggest obstacle to better LLM CAD generation.

API driving: the real-time approach

The third approach doesn't generate a complete sequence up front. Instead, the LLM connects to a running CAD application and issues commands one at a time, receiving feedback after each step. This is how CADAgent works with Fusion 360, and how the various MCP bridge projects connect language models to CAD tools.

The workflow looks like this: the LLM says "create a sketch on the XY plane," the CAD tool does it and reports success. The LLM says "draw a rectangle, 50mm by 30mm, centered at the origin," the CAD tool confirms. Each step includes feedback, often a screenshot or model state, so the LLM can adjust. If a fillet fails, the AI can try a different radius. If a sketch lands on the wrong plane, the AI can delete it and start over.
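The retry-on-failure behavior is the core of this approach, and it can be sketched without any real CAD tool. Below, a fake bridge rejects fillet radii larger than the part can accommodate, and the driver backs off until the command succeeds. Every name here is invented for illustration; real bridges (MCP servers, CADAgent) expose far richer command and feedback channels.

```python
# A stubbed feedback loop: a fake CAD bridge that rejects oversized
# fillet radii, and a driver that shrinks the radius until one works.
# All class and method names are invented for illustration.

class FakeCadBridge:
    MAX_FILLET = 3.0  # pretend the part's edges only take a 3mm fillet

    def fillet(self, radius):
        if radius > self.MAX_FILLET:
            return {"ok": False, "error": f"radius {radius} too large"}
        return {"ok": True}

def fillet_with_retry(bridge, radius, step=0.5, min_radius=0.5):
    """Issue a fillet command, shrinking the radius on each failure."""
    while radius >= min_radius:
        result = bridge.fillet(radius)
        if result["ok"]:
            return radius  # the radius that finally succeeded
        radius -= step  # feedback received: try a smaller radius
    raise RuntimeError("no workable fillet radius found")

applied = fillet_with_retry(FakeCadBridge(), radius=5.0)
print(applied)  # → 3.0
```

Each loop iteration in a real system is an LLM round trip, which is exactly where the cost and latency discussed below come from.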

This iterative process is closer to how a human uses a CAD tool, and it produces more reliable results for complex models than generating the entire sequence blind.

The disadvantages: it's slow (each operation requires an API round trip) and expensive (each round trip costs tokens). It's also dependent on the quality of the CAD tool's API. FreeCAD's API, for example, is extensive but inconsistent, and a wrong parameter type can fail silently rather than raising an error. The feedback loop helps, but it doesn't solve the gap between understanding syntax and understanding geometry.

Where this all breaks down

Across all three approaches, the failure modes cluster around the same issues.

Spatial reasoning. LLMs don't have it. They generate coordinates and transforms from learned patterns, but they don't understand that two features will interfere, that a wall is too thin to machine, or that a chamfer will remove material needed for a mating surface. Every approach compensates differently: vision models, screenshots, spatial training data. The compensation works for simple parts and breaks down as complexity increases.

Manufacturing awareness. No LLM CAD generation system understands manufacturing constraints. The AI generates geometry in a mathematical vacuum. It doesn't know about draft angles, tool access, or minimum wall thickness. A human designer carries these constraints in their head. An LLM doesn't know they exist unless you put them in the prompt, and even then it applies them inconsistently.

Dimensional precision. LLMs produce the most likely next token, not the geometrically correct next dimension. Ask for a hole at 25.4mm from the edge and you might get 25mm or 26mm. For concept models, this doesn't matter. For production parts, it's the difference between a hole that aligns and one that doesn't.
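The precision problem is at least easy to catch after the fact: measure the generated geometry and compare against what was asked for. The checker below is schematic, the measurement step would come from a geometric kernel in practice, and the dimension names are invented for the example.

```python
# A schematic dimension check: compare requested vs. measured values
# and report anything outside tolerance. Dimension names are invented;
# "measured" would come from querying the kernel in a real pipeline.

def check_dimensions(requested, measured, tol=0.01):
    """Return the names of dimensions outside tolerance (values in mm)."""
    return [name for name, want in requested.items()
            if abs(measured.get(name, float("inf")) - want) > tol]

requested = {"hole_offset": 25.4, "hole_diameter": 6.0}
measured = {"hole_offset": 25.0, "hole_diameter": 6.0}  # 25.4 rounded to 25
print(check_dimensions(requested, measured))  # → ['hole_offset']
```

A check this simple is the seed of the validation loops described in the next section; the hard part is extracting reliable measurements from the generated model, not the comparison itself.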

Where this is going

The most promising direction isn't any single approach. It's the combination.

CADSmith and FutureCAD point toward the likely architecture: an LLM generates CadQuery code, a geometric kernel executes and measures it, a validation agent checks against dimensional requirements, and the system iterates until the geometry passes. Code generation provides kernel reliability. Validation loops compensate for the LLM's lack of spatial reasoning.
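Stripped to its skeleton, that architecture is a generate-validate-iterate loop. In the sketch below the generator, executor, and validator are stand-ins: in a CADSmith-style system the generator would be an LLM emitting CadQuery, the executor a geometric kernel, and the validator a dimensional or vision check. The stand-in generator deliberately makes a rounding error on its first attempt to show the loop doing its job.

```python
# Skeleton of the generate-validate-iterate architecture. Generator,
# model, and validator are stand-ins for an LLM, a kernel-built part,
# and a dimension check; names and behavior are invented to illustrate.

def generate(spec, feedback):
    # Stand-in for the LLM: makes a typical rounding error first,
    # then "corrects" once the validator's feedback names the dimension.
    if "hole_offset" in feedback:
        return {"hole_offset": spec["hole_offset"]}  # fixed on retry
    return {"hole_offset": round(spec["hole_offset"])}  # 25.4 -> 25

def validate(spec, model, tol=0.01):
    return [k for k, v in spec.items() if abs(model[k] - v) > tol]

def generate_until_valid(spec, max_iters=5):
    feedback = []
    for attempt in range(1, max_iters + 1):
        model = generate(spec, feedback)
        feedback = validate(spec, model)
        if not feedback:
            return model, attempt
    raise RuntimeError("validation never passed")

model, attempts = generate_until_valid({"hole_offset": 25.4})
print(model, attempts)  # → {'hole_offset': 25.4} 2
```

The design point is that correctness lives in `validate`, not `generate`: the loop converges even though the generator has no idea why its first attempt was wrong.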

The practical implication: LLM-generated geometry will get more reliable, not because the LLMs understand space, but because the validation systems improve. The LLM remains a text machine generating text instructions. The geometric validity comes from the kernels and feedback loops wrapped around it.

For now, if you want to use LLM CAD generation in your work, code generation via OpenSCAD or CadQuery is the most reliable path. The text-to-CAD open source post covers the tools. If you want the convenience of a polished interface, Zoo.dev wraps the whole pipeline into a single prompt box. If you want parametric output inside Fusion 360, CADAgent uses the API-driving approach with real-time feedback.

And if you want to understand the research foundations, the Text2CAD paper and the how text-to-CAD works post lay it out. The technology is real. The geometry it produces is getting better. The gap between "generated" and "production-ready" is still wide, and closing it is going to take better validation systems more than better language models. The LLMs already know how to write the code. They just don't know yet whether the code they wrote makes something you can actually build.
