
On-Device AI for Enterprise: Architecture and Security Model

Enterprise AI doesn't have to mean cloud AI. On-device architectures offer stronger security, simpler compliance, and genuine data sovereignty. Here's the full technical picture.


Enterprise AI has a location problem

Most enterprise AI solutions follow a simple pattern: your data goes to their cloud, their models process it, results come back. For the vendor, this is ideal -- centralized infrastructure, predictable scaling, maximum control. For the enterprise, it's a trade-off they've accepted as inevitable.

But it's not inevitable. The hardware in a modern enterprise laptop -- fast SSDs, multi-core CPUs, 16-32GB of RAM -- is more than capable of running document parsing, text extraction, and workflow orchestration. The language model inference, which does require significant compute, can be accessed via API without sending raw files. The missing piece was the software layer to orchestrate this.

On-device AI is the architecture where the agent runtime, file processing, and tool execution happen on the endpoint device. The cloud is used for model inference, not file storage. This post covers the architecture, security model, and enterprise implications in detail.

Architecture layers

An on-device AI system for document processing has four layers:

Layer 1: Local file access

The agent reads files directly from the local file system. No upload, no sync, no staging area. PDFs, DOCX files, XLSX spreadsheets, images -- whatever is in the project folder is accessible immediately.

This layer uses format-specific parsers. A PDF parser handles text extraction, layout analysis, and table reconstruction. A DOCX parser unpacks the OOXML archive and extracts semantic content (paragraphs, headings, lists, tables). An XLSX parser reads cell values, shared strings, and structure.

In docrew, these parsers are written in Rust and compiled into the application binary. There's no dependency on external libraries, Python environments, or system-installed tools. The parsing capability is self-contained.
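To make the DOCX case concrete, here is a minimal sketch of OOXML unpacking in Python (docrew's actual parsers are Rust and handle far more structure; this only pulls paragraph text, and all names here are illustrative):

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def extract_docx_text(data: bytes) -> list[str]:
    """Unpack the OOXML zip archive and return one string per paragraph."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{W}p"):
        # A paragraph's visible text lives in <w:t> elements inside runs
        text = "".join(t.text or "" for t in p.iter(f"{W}t"))
        if text:
            paragraphs.append(text)
    return paragraphs
```

The point of the sketch: a .docx file is just a zip of XML parts, so extraction can happen entirely on the endpoint with no external service involved.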

Layer 2: Agent runtime

The agent runtime is the orchestration layer. It receives a user task ("extract all payment terms from these contracts"), plans the execution steps, and coordinates tools to accomplish the goal.

The runtime manages:

  • Session state: What's been done, what's pending, what the user asked for
  • Tool selection: Which tool to use for each step (file read, text extraction, code execution, model call)
  • Context management: What information the model needs for the next step
  • Error recovery: What to do when a step fails
  • Safety limits: Maximum tool calls, cost ceilings, loop detection

docrew's agent runtime is a Rust binary compiled into the Tauri desktop application. It's not a separate server process. It's not a container. It's native code running as part of the application, with the same lifecycle and security context.
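The responsibilities above can be sketched as a loop with explicit safety limits. This is a toy illustration in Python (docrew's runtime is Rust, and the class and method names here are hypothetical), showing session state, error recovery, a tool-call ceiling, and naive loop detection:

```python
class BudgetExceeded(Exception):
    """Raised when the tool-call ceiling is reached."""

class AgentRuntime:
    """Toy orchestration loop: session state plus safety limits."""

    def __init__(self, tools, max_tool_calls=20):
        self.tools = tools              # tool name -> callable
        self.max_tool_calls = max_tool_calls
        self.history = []               # session state: (tool, args, result)

    def run_step(self, tool_name, args):
        if len(self.history) >= self.max_tool_calls:
            raise BudgetExceeded("tool-call ceiling reached")
        # Loop detection: refuse to repeat an identical call
        if any(h[:2] == (tool_name, args) for h in self.history):
            raise RuntimeError(f"loop detected: {tool_name}{args}")
        try:
            result = self.tools[tool_name](*args)
        except Exception as exc:
            result = f"error: {exc}"    # error recovery: record, don't crash
        self.history.append((tool_name, args, result))
        return result
```

A real runtime interleaves model calls between steps to plan the next action; the structural point is that every limit is enforced locally, before any tool runs.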

Layer 3: Sandboxed execution

When the agent needs to process data -- run calculations, transform formats, generate reports -- it writes code (typically Python) and executes it in an OS-level sandbox.

The sandbox is the critical security boundary for enterprise deployment. It restricts what the executed code can do:

  • File access: Limited to the project folder. Cannot read files outside the workspace.
  • Network access: Disabled by default. Executed code cannot make network calls.
  • System access: Cannot modify system files, install software, or access other applications.
  • Process isolation: Runs as a restricted subprocess with minimal privileges.

On macOS, the sandbox uses Apple's Seatbelt framework -- the same technology that sandboxes App Store applications. On Linux, it uses bubblewrap (bwrap), the same tool that Flatpak uses for application sandboxing. These are kernel-level enforcement mechanisms, not software promises.
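On Linux, the shape of such a confinement is easy to show. The sketch below builds a bubblewrap command line for running a Python script with a read-only system, a single writable workspace, and no network (this is a generic bwrap illustration, not docrew's actual sandbox profile):

```python
def bwrap_argv(project_dir: str, script: str) -> list[str]:
    """Build a bwrap command line confining a script to the project folder."""
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",          # system binaries/libs, read-only
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--bind", project_dir, "/workspace",  # the only writable path
        "--chdir", "/workspace",
        "--unshare-net",                      # no network namespace
        "--unshare-pid",                      # private PID namespace
        "--die-with-parent",                  # sandbox dies with the app
        "python3", script,
    ]
```

Because the namespaces and bind mounts are set up by the kernel, code running inside the sandbox cannot opt out of them -- which is the sense in which this is enforcement, not a promise.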

Layer 4: Model inference

The extracted text is sent to a language model for analysis. This is the only network-dependent layer. The text travels to the model API endpoint, the model processes it, and the response returns.

Key characteristics of this layer:

  • Text only: Raw files never leave the device. The model receives extracted text.
  • Transient: The model processes the text and returns results. No persistent storage of document content.
  • Regional: docrew routes requests to model endpoints in the user's region (EU users to EU endpoints).
  • Authenticated: Requests are authenticated and authorized through the proxy layer.
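The text-only property is a matter of what the request body contains. A hedged sketch (the endpoint URL, field names, and regional routing scheme here are invented for illustration):

```python
import json

API_URL = "https://api.example.com/v1/infer"  # hypothetical endpoint

def build_inference_request(extracted_text: str, task: str,
                            region: str = "eu") -> dict:
    """Assemble the only payload that leaves the device: task + extracted
    text. No file bytes, file paths, or document metadata are included."""
    return {
        "url": API_URL.replace("api.", f"api-{region}."),  # regional routing
        "body": json.dumps({
            "task": task,
            "text": extracted_text,  # text only, never raw files
        }),
    }
```

Auditing this boundary is straightforward: inspect outbound requests and confirm they carry text, not file uploads.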

The security model

Enterprise security teams evaluate AI tools on several dimensions. Here's how on-device architecture addresses each one.

Data at rest

Cloud AI: Your documents exist on the provider's storage infrastructure. Encrypted at rest (usually AES-256), but the provider holds the keys. The encryption protects against physical theft of drives, not against the provider's own systems or employees.

On-device: Your documents exist on your endpoint device. Encrypted by the device's full-disk encryption (FileVault on macOS, BitLocker on Windows). The encryption keys are managed by the device owner, not a third party.

The security difference: on-device, the encryption is yours. You manage the keys, the policies, and the access controls. With cloud AI, you trust the provider to manage encryption correctly.

Data in transit

Cloud AI: Files are uploaded via HTTPS (TLS 1.2/1.3). The encrypted connection protects against interception, but the full file content is in transit.

On-device: Only extracted text is transmitted, via HTTPS to the model API. The volume of data in transit is smaller (text content vs. full files), and the content is less sensitive (no file metadata, embedded images, or binary data).

The security difference: smaller attack surface. Less data in transit means less exposure in the event of a TLS vulnerability or man-in-the-middle attack.

Data in use

Cloud AI: Your document content is processed on the provider's infrastructure. During processing, it exists in memory on their servers. Access controls depend on their infrastructure design.

On-device: Document parsing and text extraction happen in your process, on your hardware. Only the model inference happens remotely, and it processes text, not files.

The security difference: the processing of your actual files happens on hardware you control. The remote processing is limited to text analysis.

Insider threat

Cloud AI: Provider employees with system access could potentially view uploaded documents. Most providers implement access controls and audit logging, but the possibility exists.

On-device: No one at the AI vendor can view your documents because your documents never reach their systems. The model API processes text transiently, and text content in API logs (if any) is subject to the API provider's access controls -- but it's text, not your original documents.

Supply chain

Cloud AI: Your documents pass through the provider's full infrastructure stack: load balancers, API servers, processing pipelines, storage layers, logging systems. Each component is a potential attack vector.

On-device: The processing stack is local: your application, your OS, your file system. The supply chain is limited to the application binary (which can be verified via code signing) and the model API endpoint.

Enterprise deployment considerations

Device requirements

On-device AI doesn't require specialized hardware. The document processing -- file parsing, text extraction, code execution -- runs on standard enterprise hardware. The minimum practical configuration is:

  • Any modern x86_64 or ARM processor
  • 8GB RAM (16GB recommended for large batch operations)
  • SSD storage (for fast file reads)
  • Internet connection (for model API calls)

This is below the spec of most enterprise laptops deployed in 2024-2026. No GPU required, no high-end workstation needed.

Distribution and updates

docrew distributes as a standard desktop application. On macOS, it's a signed and notarized .app bundle. On Windows, it's an MSIX or MSI package. On Linux, it's an AppImage or .deb package.

Enterprise IT can distribute through existing software management tools (Jamf, SCCM, Intune). Updates are delivered through the application's built-in auto-updater, which verifies code signatures before applying.

Network requirements

The only outbound network connections are:

  • HTTPS to the model API proxy (for language model inference)
  • HTTPS for application update checks
  • HTTPS for authentication (Supabase auth)

No inbound connections are required. No persistent WebSocket connections. No special firewall rules beyond standard HTTPS outbound.

Authentication and access control

docrew uses Supabase authentication, supporting email/password and SSO providers. The authentication token authorizes API access and determines the user's subscription and credit balance.

For enterprise deployments, SSO integration means users authenticate with their existing corporate identity provider. No additional credentials to manage.

Audit trail

Every action the agent takes is logged locally: files read, tools executed, model calls made, results generated. This audit trail exists on the device and can be exported or forwarded to a SIEM system.

The proxy layer maintains usage logs for billing: model calls, token counts, credit deductions. These logs don't contain document content -- they contain metadata about model usage.
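An exportable local audit trail can be as simple as append-only JSON Lines. A minimal sketch (the format and field names are assumptions, not docrew's actual log schema):

```python
import json
import time
from pathlib import Path

class AuditLog:
    """Append-only local audit trail in JSON Lines, exportable to a SIEM."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, action: str, **details):
        """Append one event: timestamp, action name, arbitrary metadata."""
        entry = {"ts": time.time(), "action": action, **details}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def export(self) -> list[dict]:
        """Read the full trail back, one dict per event."""
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

JSON Lines is convenient here because most SIEM forwarders ingest it directly, and appending is crash-safe: a partial last line corrupts at most one event.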

Data Loss Prevention integration

Because files are processed locally, existing DLP tools on the endpoint continue to function. If an enterprise has DLP policies that monitor file access, those policies apply to docrew's file reads the same way they apply to any other application.

Cloud AI tools typically bypass endpoint DLP because the file leaves the device via browser upload -- a path that DLP tools may not monitor for AI-specific content.

Comparison with air-gapped and hybrid models

On-device AI sits between fully cloud-based and fully air-gapped architectures.

Fully cloud-based: Files leave the device. All processing happens remotely. Maximum convenience, minimum data control.

On-device (hybrid): Files stay on the device. Processing happens locally. Model inference uses a cloud API that receives extracted text only. Strong data control with full model capability.

Fully air-gapped: Everything runs locally, including the language model. Maximum data control, but limited to models that can run on local hardware (which are significantly less capable than cloud models for complex reasoning tasks).

For most enterprises, on-device hybrid is the optimal trade-off. It keeps files local (satisfying security and compliance requirements) while leveraging frontier cloud models (satisfying quality requirements). The text sent to the model is the minimum necessary for analysis, and it's processed transiently.

The enterprise buyer's checklist

When evaluating on-device AI for your organization:

1. Where do files go? Verify that raw files never leave the endpoint. Text extraction should be local; only extracted text should reach the cloud.

2. What's the sandbox model? Code execution should be sandboxed at the OS level, not just at the application level. Ask whether the sandbox uses kernel-level enforcement (Seatbelt, bwrap) or application-level restrictions.

3. How is the binary distributed? The application should be code-signed with a verifiable developer certificate. Updates should be signed and verified.

4. What network connections are required? The fewer outbound connections, the smaller the attack surface. Expect HTTPS to the model API, auth service, and update server -- and nothing else.

5. Where does model inference happen? Regional routing matters for data residency. Verify that your users' text content is processed in the appropriate region.

6. What audit logging exists? Both local (agent actions, file access) and remote (API usage, billing) should be logged and exportable.

7. How does it integrate with existing security tools? DLP, SIEM, EDR -- the application should work alongside your existing security stack, not require exceptions.

8. What happens offline? File reading and local processing should work without an internet connection. Model inference requires connectivity, but the application should degrade gracefully.

On-device AI isn't just a privacy feature. It's an enterprise architecture that keeps sensitive data where it belongs -- on the devices your security team already manages, under policies you already control, auditable by tools you already use.

The cloud was never the only option. It was just the first one most vendors built.
