Does text-to-CAD work offline?
Almost no text-to-CAD tool works offline. The models are too large, the inference too expensive, and the vendors need their API meters running. Here's the full picture.
Quick answer
No mainstream text-to-CAD tool works offline in 2026. Zoo.dev, AdamCAD, and CADScribe all require cloud API access. The AI models are too large for local inference on consumer hardware. The closest offline option is running OpenSCAD with a local LLM, which generates code rather than direct geometry. True offline text-to-CAD doesn't exist yet.
No mainstream text-to-CAD tool works offline in 2026. Not Zoo.dev, not AdamCAD, not CADScribe, not any of the browser-based generators. Every one of them sends your prompt to a cloud server and waits for geometry to come back. If your internet drops, your text-to-CAD workflow stops. I found this out the hard way during a site visit last fall, sitting in a manufacturing client's conference room with decent coffee and terrible Wi-Fi, trying to generate a quick bracket concept to show during the meeting. The prompt sat there spinning while everyone watched. Eventually I just sketched it on a notepad like it was 2005.
The question comes up a lot, especially from people working in environments where internet access is restricted, unreliable, or forbidden. Factory floors without public Wi-Fi. Classified facilities with air-gapped networks. Field offices in places where the nearest reliable connection is a forty-minute drive. Remote workshops where the satellite internet works between weather events. For all of these, the answer is the same: text-to-CAD in its current form is a cloud service, and cloud services need clouds.
Why offline doesn't work yet
The core problem is model size. The AI models that generate CAD geometry from text prompts are large neural networks. Zoo.dev's KittyCAD system runs on GPU clusters with significant compute resources. The models that power AdamCAD and CADScribe run on similar cloud infrastructure. These aren't lightweight algorithms you can run on a laptop CPU during a flight.
A typical large language model capable of generating competent code (which is essentially what text-to-CAD does under the hood, generating sequences of geometric operations from natural language) has tens of billions of parameters. Running a 70-billion-parameter model locally requires at minimum a workstation-class GPU with 40GB or more of VRAM, or multiple consumer GPUs. Running the smaller models that fit on a single consumer GPU (7B to 13B parameters) produces noticeably worse results, because smaller models are less capable at the complex reasoning needed to turn "flanged bracket with four M5 holes" into correct geometry.
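As a rough back-of-the-envelope check (my own approximation, not a vendor benchmark), you can estimate the VRAM a model needs from its parameter count and quantization level:

```python
def vram_gb(params_billion: float, bits_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter bytes times a ~20% overhead factor
    for KV cache and activations. A crude rule of thumb, not a benchmark."""
    bytes_total = params_billion * 1e9 * (bits_per_param / 8)
    return bytes_total * overhead / 1e9  # decimal GB

# 70B at 4-bit quantization: ~35 GB of weights, ~42 GB with overhead,
# which is why a single 24 GB consumer card can't hold it.
print(round(vram_gb(70, 4)))   # ~42
print(round(vram_gb(13, 4)))   # ~8, fits an 8-16 GB consumer GPU
print(round(vram_gb(70, 16)))  # ~168, multi-GPU territory
```

The overhead factor is a guess that varies with context length and runtime, but the weights-only term alone shows why 70B-class models live on workstation or server GPUs.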
Then there's the geometry kernel. Zoo.dev's KittyCAD is a proprietary GPU-native kernel. It's not something you can download and run locally. The kernel itself is part of the cloud service. Even if you could run the AI model locally, you'd still need a geometry kernel to execute the operations and produce a valid solid, and the best one available is a cloud service.
What each tool requires
Zoo.dev requires an internet connection and an API key. All generation happens server-side. The web interface and the Python SDK both connect to Zoo's cloud infrastructure. No connection, no geometry. There's no offline mode, no local caching of the generation capability, and no announced plans for one.
AdamCAD runs in the browser. The generation happens on AdamCAD's servers. You need an active internet connection for every prompt. The parametric sliders work in the browser after generation, but the initial creation is fully cloud-dependent.
CADScribe is cloud-based. Same pattern: prompt goes up, geometry comes down, nothing happens without a connection.
Vondy, HP's text-to-3D tools, and every other browser-based generator follow the same architecture. The browser is just a thin client for a remote service.
CADAgent for Fusion 360 is an interesting partial exception. The add-in itself runs locally inside Fusion, and the geometry creation happens through Fusion's own local kernel. But the AI inference, the part where your text prompt gets turned into a sequence of Fusion operations, requires an Anthropic API call. So you still need internet for the AI reasoning step, even though the geometry creation step happens on your machine.
The OpenSCAD workaround
The closest thing to offline text-to-CAD that actually works today is running OpenSCAD with a local LLM. The setup: install a local language model using Ollama or llama.cpp, run it on your machine, feed it prompts, and have it generate OpenSCAD scripts. Then run those scripts through OpenSCAD, which is free and runs locally, to produce 3D geometry. Everything stays on your machine. No network call leaves your system.
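A minimal sketch of that pipeline, assuming Ollama is serving a model on its default localhost port and OpenSCAD is installed. The model name, prompt wording, and file paths are my own illustrative choices, not part of any tool's documented workflow:

```python
import json
import re
import subprocess
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def extract_openscad(reply: str) -> str:
    """Pull the first fenced code block out of an LLM reply, or fall back
    to the raw text if the model didn't use a fence."""
    match = re.search(r"```(?:openscad|scad)?\s*\n(.*?)```", reply, re.DOTALL)
    return (match.group(1) if match else reply).strip()

def prompt_to_scad(prompt: str, model: str = "llama3:70b") -> str:
    """Ask the local model for an OpenSCAD script. The request goes to
    localhost only; nothing leaves the machine."""
    body = json.dumps({
        "model": model,
        "prompt": f"Write an OpenSCAD script for: {prompt}. Reply with only the code.",
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return extract_openscad(json.load(resp)["response"])

def render_stl(scad_code: str, out_path: str = "part.stl") -> None:
    """Render the script to STL with the locally installed OpenSCAD binary."""
    with open("part.scad", "w") as f:
        f.write(scad_code)
    subprocess.run(["openscad", "-o", out_path, "part.scad"], check=True)

if __name__ == "__main__":
    render_stl(prompt_to_scad("a 60x40x5 mm plate with four 5 mm corner holes"))
```

The `openscad -o` invocation renders headlessly, so the whole loop can run from a script with no GUI and no network.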
I've tested this on a workstation with an RTX 4090 running Llama 3 70B quantized. The experience is usable for simple parts. A rectangular plate with holes, a basic bracket, a cylindrical spacer. The model generates syntactically correct OpenSCAD about 70% of the time for simple geometry. For more complex prompts, the failure rate climbs quickly, and you end up debugging OpenSCAD code, which is its own kind of afternoon.
The quality gap between this local setup and what Zoo.dev produces is significant. The local model doesn't have the geometric reasoning depth of a purpose-built text-to-CAD system. It's generating code, not geometry, and the code generation quality of a local 70B model is noticeably below what you get from cloud models like Claude or GPT-4. The output is STL, not STEP, because that's what OpenSCAD produces. And the geometry is limited to what OpenSCAD's CSG (Constructive Solid Geometry) approach can express, which excludes freeform surfaces, fillets with variable radius, and other features you'd take for granted in Fusion 360.
It's a workaround, not a solution. But if you absolutely need text-to-geometry on an air-gapped machine, it's the most practical option that exists today. The self-hosted text-to-CAD post covers the setup and alternatives in more detail.
Hardware for local inference
If you're thinking about running AI models locally for offline text-to-CAD, here's what the hardware picture looks like.
For a small model (7B to 13B parameters, quantized to 4-bit): a consumer GPU with 8-16GB VRAM. An RTX 3060 or 4060 can handle this. Generation is fast, quality is poor. These models struggle with even moderately complex geometry descriptions.
For a medium model (30B to 70B parameters, quantized): you need 24-48GB of VRAM. An RTX 3090, 4090, or an A6000 workstation GPU. Generation is slower (maybe 20-60 seconds for a response), quality is acceptable for simple parts.
For a large model (70B+ at full precision): you're looking at multiple GPUs or a professional setup with A100/H100 cards. This is server-room hardware, not desktop hardware. The quality approaches what cloud models offer, but the cost and complexity approach "just buy a server" territory.
Apple Silicon Macs with large unified memory (M2 Ultra, M3 Max/Ultra with 96GB or more) can run quantized 70B models through llama.cpp, using system memory instead of VRAM. It's slower than dedicated GPU inference but it works, and for a single-user offline setup it's surprisingly practical. I've tested it on an M3 Max with 64GB and the results for simple OpenSCAD generation are tolerable, if you're patient with the generation speed.
The bottom line: you can run a local AI model for text-to-CAD on high-end consumer hardware, but the output quality scales directly with the model size, and the model size scales directly with the hardware cost. There's no free lunch here.
When offline text-to-CAD might become practical
Three things need to happen for real offline text-to-CAD to work.
First, local AI models need to get better at code generation in the 7B to 30B parameter range, so that a single consumer GPU can run a model capable of generating reliable CAD scripts. This is happening, slowly. Each generation of open-source models improves, and specialized fine-tuning for CAD code generation could accelerate it.
Second, an open-source geometry kernel needs to be integrated into the local pipeline so that the output is B-Rep STEP files, not just OpenSCAD STL. The pieces exist: OpenCascade is open source, build123d wraps it in Python, and projects like CAD Agent have demonstrated the architecture. What's missing is a packaged, reliable system that connects a local LLM to a B-Rep kernel with error correction and visual feedback, all running without a network connection.
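The missing glue is essentially a repair loop: generate a script, try to execute it against the kernel, and feed any error back to the model for another attempt. A minimal sketch of that loop, with the local LLM and the geometry kernel as stand-in callables (both names are placeholders, not a real API from any of the projects mentioned):

```python
from typing import Callable

def generate_with_repair(
    prompt: str,
    llm: Callable[[str], str],          # local model: request text -> CAD script
    run_kernel: Callable[[str], None],  # geometry kernel: script -> solid, raises on failure
    max_attempts: int = 3,
) -> str:
    """Generate a CAD script, execute it, and on failure re-prompt the
    model with the kernel's error message. Returns the first script that
    executes cleanly."""
    request = prompt
    last_error = None
    for _ in range(max_attempts):
        script = llm(request)
        try:
            run_kernel(script)
            return script
        except Exception as err:
            last_error = err
            # Feed the failure back so the model can correct itself.
            request = f"{prompt}\nThe previous script failed with: {err}\nFix it."
    raise RuntimeError(f"No valid script after {max_attempts} attempts: {last_error}")
```

In a real pipeline, `run_kernel` would be something like build123d executing against OpenCascade and exporting STEP, but the loop structure is the same regardless of which kernel sits behind it.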
Third, the hardware needs to come down. When a $1,500 workstation GPU can run a model that generates reliable geometry from text, offline text-to-CAD becomes practical for individual engineers and small firms. At current trajectories, that might be two to three years away, but hardware timelines are hard to predict.
Until all three of those pieces land, offline text-to-CAD remains a compromise: possible in a limited way, useful for simple parts, and significantly worse than what you get from cloud tools. If your work demands offline operation and cloud-quality results, you're stuck. The text-to-CAD guide covers the full range of what's available today, and the data safety post explains why some users need offline options in the first place.
My honest take: if you need offline CAD, use offline CAD. Fusion 360 with a downloaded cache, FreeCAD, SolidWorks on a laptop. Model the part yourself. It's not as fast as typing a prompt, but it works in a conference room with bad Wi-Fi, on a factory floor, or in a classified facility. Text-to-CAD is a cloud convenience, and treating it as anything else in 2026 means planning around a technology that isn't ready to meet you where you actually work.