
Subagents and Parallel Processing: How docrew Analyzes Faster

Subagent delegation and parallel document processing dramatically reduce analysis time. Here is how the pattern works and when it matters.


The sequential bottleneck

Most AI agents process work the way a single employee would: one thing at a time. Read document one. Analyze it. Move to document two. Analyze it. Repeat until done.

This is fine when you have one document. It becomes a problem when you have twenty.

A typical contract review takes 30 to 90 seconds per document, depending on length and complexity. The agent reads the file, extracts relevant text, sends it to the language model, receives the analysis, and formats the output. Twenty contracts at 60 seconds each is twenty minutes of sequential processing.

But here is the thing: those twenty contracts have nothing to do with each other. The analysis of contract number seven does not depend on the analysis of contract number three. They are independent tasks sharing a common instruction set. Sequential processing is not a requirement of the work -- it is a limitation of the architecture.

This is where subagents come in.

The subagent pattern

A subagent is exactly what it sounds like: a secondary agent spawned by the primary agent to handle a specific subtask. The primary agent acts as a coordinator. It understands the overall objective, breaks it into independent pieces, delegates each piece to a subagent, and merges the results when they return.

The pattern is not new in software engineering. It is the same principle behind thread pools, worker queues, and map-reduce pipelines. What makes it interesting in the context of AI agents is that the coordinator itself is an AI. It decides how to split the work, what instructions to give each worker, and how to synthesize the results.

A subagent is not a separate application or a different model. It is another instance of the same agent runtime, running in its own context, with its own conversation history and tool access. It receives a focused task -- "analyze this specific document and return these specific data points" -- and executes it independently.

The primary agent does not wait for each subagent to finish before spawning the next one. It launches them concurrently. Five subagents analyzing five documents simultaneously means the total time is roughly the time of the slowest single analysis, not the sum of all five.
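A minimal sketch of that idea in Python, using `asyncio` tasks as stand-ins for real subagents (the document names and delays are invented for illustration). Because the tasks run concurrently, the wall-clock time tracks the slowest task rather than the sum of all of them.

```python
import asyncio
import time

async def analyze(doc: str, seconds: float) -> str:
    # Stand-in for a subagent analyzing one document.
    await asyncio.sleep(seconds)
    return f"analysis of {doc}"

async def run_parallel() -> tuple[list[str], float]:
    start = time.monotonic()
    # Launch all subagents at once; gather waits for every result.
    results = await asyncio.gather(
        analyze("contract-1.pdf", 0.3),
        analyze("contract-2.pdf", 0.2),
        analyze("contract-3.pdf", 0.1),
    )
    elapsed = time.monotonic() - start
    return list(results), elapsed

results, elapsed = asyncio.run(run_parallel())
# elapsed lands near the slowest task (0.3 s), not the sum (0.6 s).
```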

How docrew uses subagents

In docrew, the subagent tool is a first-class capability of the desktop agent runtime. When the agent encounters a task that can be decomposed into independent subtasks, it spawns subagents to handle them in parallel.

The mechanics work like this:

The user asks the agent to analyze a folder of contracts and extract key terms from each one. The primary agent reads the directory listing, identifies the relevant files, and determines that each file can be analyzed independently. It then calls the subagent tool once per file, passing each subagent the file path and the extraction instructions.

Each subagent runs in its own execution context. It reads its assigned document using the local file tools -- the DOCX parser for Word documents, the XLSX parser for spreadsheets, or the PDF extraction pipeline for PDFs. It sends the extracted text to the language model with the specific analysis instructions. It receives the results and returns them to the primary agent.

The primary agent collects all subagent results, merges them into a coherent output, and presents the combined analysis to the user. If the user asked for a comparison table, the primary agent builds the table from the individual results. If they asked for a summary of findings across all documents, the primary agent synthesizes the individual analyses into a unified narrative.

The key architectural decision is that subagents run locally on the same machine. They share the same file system access, the same sandbox constraints, and the same security boundaries as the primary agent. There is no network overhead between the coordinator and the workers. The parallelism happens at the process level, not across distributed systems.

Splitting the work

The hardest part of parallel processing is not the parallelism itself. It is deciding how to split the work.

Some decompositions are obvious. Twenty files in a folder, each needing the same analysis? One subagent per file. That is straightforward.

Others require more thought. Consider this request: "Review these ten contracts and identify any clauses that conflict with each other." The individual contract analysis can be parallelized -- each subagent reads one contract and extracts all clauses. But the conflict detection cannot be parallelized because it requires comparing clauses across documents. That comparison step must happen sequentially, after all the individual analyses complete.

This is the split-merge pattern. The parallel phase extracts and analyzes. The sequential phase compares and synthesizes. The primary agent handles the sequential work; the subagents handle the parallel work.
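The split-merge pattern can be sketched as follows. The hard-coded contract data and the `extract_terms` and `find_conflicts` functions are hypothetical stand-ins; real extraction would parse the file and call the language model. The point is the shape: extraction fans out across workers, while conflict detection runs once, sequentially, over the combined results.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical stand-in data; a real subagent would produce this
# by parsing the document and querying the model.
CONTRACTS = {
    "vendor_a.pdf": {"payment_terms": "net-30", "governing_law": "NY"},
    "vendor_b.pdf": {"payment_terms": "net-90", "governing_law": "NY"},
    "vendor_c.pdf": {"payment_terms": "net-30", "governing_law": "DE"},
}

def extract_terms(name: str) -> tuple[str, dict]:
    # Parallel phase: each worker analyzes one contract independently.
    return name, CONTRACTS[name]

def find_conflicts(extracted: dict[str, dict]) -> list[tuple[str, str, str]]:
    # Sequential phase: comparison needs every contract's results at once.
    conflicts = []
    for (a, terms_a), (b, terms_b) in combinations(sorted(extracted.items()), 2):
        for key in terms_a.keys() & terms_b.keys():
            if terms_a[key] != terms_b[key]:
                conflicts.append((key, a, b))
    return conflicts

# Fan out the extraction, then merge sequentially.
with ThreadPoolExecutor() as pool:
    extracted = dict(pool.map(extract_terms, CONTRACTS))

conflicts = find_conflicts(extracted)
```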

The agent makes these decomposition decisions automatically. When a user says "analyze these contracts," the agent evaluates whether the task has independent subtasks, determines the right granularity for splitting, and decides what work to delegate versus what to handle directly.

Not every task benefits from splitting. If the user asks "summarize this one report," spawning a subagent adds overhead for no benefit. The primary agent handles it directly. Subagents are a tool in the agent's toolkit, not a default behavior.

A concrete example: twenty vendor contracts

Let's walk through a real scenario.

A procurement team has twenty vendor contracts from different suppliers. They need a standardized comparison: payment terms, liability caps, termination conditions, renewal clauses, and indemnification provisions from each contract. The contracts range from 15 to 80 pages, in a mix of PDF and DOCX formats.

Without subagents, the agent processes them sequentially. Each contract takes roughly 45 seconds: 5 seconds to read and parse the file locally, 35 seconds for the language model to analyze the extracted text, and 5 seconds to format the output. Twenty contracts at 45 seconds each is 15 minutes.

With subagents, the primary agent spawns workers to process contracts in parallel. The limiting factor becomes the language model's concurrent request capacity and the local machine's ability to parse files simultaneously.

In practice, docrew runs multiple subagents concurrently. The exact number depends on the model's throughput limits and the complexity of each document. For a batch of twenty contracts, the agent might process five to eight simultaneously, with new subagents launching as earlier ones complete.

The total processing time drops from 15 minutes to roughly 3 to 4 minutes. Not a perfect 20x speedup -- there is overhead for coordination, and the language model has throughput limits -- but a substantial improvement over sequential processing.
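A common way to implement this launch-as-earlier-ones-complete behavior is a semaphore-bounded worker pool. This is a sketch of the general technique, not docrew's actual implementation; `MAX_CONCURRENT` and the simulated task body are assumptions for illustration.

```python
import asyncio

MAX_CONCURRENT = 5  # assumed ceiling; the real runtime tunes this dynamically

async def analyze_with_limit(sem: asyncio.Semaphore, doc: str) -> str:
    async with sem:
        # Only MAX_CONCURRENT subagents hold the semaphore at once;
        # the next one starts as soon as an earlier one finishes.
        await asyncio.sleep(0.01)  # stand-in for the real analysis
        return f"terms from {doc}"

async def analyze_batch(docs: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return list(await asyncio.gather(*(analyze_with_limit(sem, d) for d in docs)))

docs = [f"contract_{i:02d}.pdf" for i in range(1, 21)]
results = asyncio.run(analyze_batch(docs))
```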

Once all subagents return their results, the primary agent builds the comparison table. It aligns the extracted terms across all twenty contracts, flags missing data points, and highlights outliers -- the contract with a 60-day payment term when the rest are net-30, or the one with uncapped liability when every other contract caps it.

The user gets a single, structured deliverable. They do not see the subagent orchestration. They asked a question and got a comprehensive answer, faster than they expected.

Merging results

The merge phase is where the primary agent earns its keep.

Individual subagent results are independent analyses. They do not share context. Subagent number three does not know what subagent number seven found. The primary agent is the only entity that sees all results.

This makes the merge phase more than simple concatenation. The primary agent must:

Normalize terminology. One contract might call it "termination for convenience," another "early termination," and a third "discretionary cancellation." The primary agent recognizes these as the same concept and presents them under a unified label.

Handle missing data. Some contracts might not have explicit liability caps. The subagent reports "no liability cap found." The primary agent flags this as a risk item in the comparison, rather than leaving a blank cell.

Detect conflicts and outliers. With all results in front of it, the primary agent can identify patterns and exceptions. "Nineteen contracts have 30-day payment terms. Contract with Vendor X specifies 90 days." This cross-document insight is impossible at the subagent level.

Produce coherent output. The final deliverable needs to read like a single analysis, not twenty stapled-together summaries. The primary agent weaves individual results into a unified narrative or structured table.
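A toy illustration of these merge responsibilities, with an assumed synonym table and hard-coded subagent results standing in for real model output. The `merge` function normalizes terminology, turns missing liability caps into explicit flags, and marks payment-term outliers against the majority value.

```python
from collections import Counter

# Assumed synonym table; a real agent would infer equivalences from context.
TERM_ALIASES = {
    "early termination": "termination for convenience",
    "discretionary cancellation": "termination for convenience",
}

def normalize(term: str) -> str:
    return TERM_ALIASES.get(term.lower(), term.lower())

def merge(results: dict[str, dict]) -> dict:
    rows = {}
    payment_counts = Counter()
    for contract, data in results.items():
        rows[contract] = {
            "termination": normalize(data["termination"]),
            # Missing data becomes an explicit risk flag, not a blank cell.
            "liability_cap": data.get("liability_cap", "NOT FOUND - review"),
            "payment": data["payment"],
        }
        payment_counts[data["payment"]] += 1
    # Outliers: contracts whose payment term differs from the majority.
    majority, _ = payment_counts.most_common(1)[0]
    outliers = [c for c, row in rows.items() if row["payment"] != majority]
    return {"rows": rows, "payment_outliers": outliers}

merged = merge({
    "vendor_x.pdf": {"termination": "Early Termination", "payment": "net-90"},
    "vendor_y.pdf": {"termination": "termination for convenience",
                     "liability_cap": "12 months fees", "payment": "net-30"},
    "vendor_z.pdf": {"termination": "Discretionary Cancellation",
                     "liability_cap": "USD 1M", "payment": "net-30"},
})
```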

The merge phase is inherently sequential -- it requires seeing all results. But because the analysis phase ran in parallel, the overall time is still dramatically shorter than fully sequential processing.

Speedup characteristics and diminishing returns

Parallel processing does not scale linearly forever. There are real constraints.

Language model throughput. The language model can handle a finite number of concurrent requests. Beyond that limit, requests queue on the server side. Adding more subagents past the concurrency ceiling does not reduce total time -- it just moves the queue from the client to the server.

Local file parsing. Reading and parsing documents happens on the user's machine. File I/O is fast, but parsing complex DOCX or PDF files consumes CPU. On a modern machine, parsing ten documents simultaneously is fine. Parsing a hundred simultaneously might cause slowdowns.

Coordination overhead. Each subagent has startup cost: initializing the execution context, loading the task instructions, and establishing the language model connection. For very small tasks (analyzing a two-paragraph document), the coordination overhead might exceed the time saved by parallelization.

Memory. Each subagent maintains its own conversation context. Twenty subagents each holding a 50-page document in context consumes more memory than one agent processing them sequentially. For typical desktop machines and typical document sizes, this is not a problem. For extreme cases -- hundreds of large documents -- it could become one.

The practical sweet spot for most document analysis tasks is 5 to 10 concurrent subagents. Beyond that, the gains diminish and the overhead increases. The agent runtime manages this automatically, throttling concurrency based on available resources and model capacity.

The speedup curve flattens quickly. Going from 1 to 5 concurrent subagents provides a near-5x improvement. Going from 5 to 10 provides maybe a 1.5x additional gain. Going from 10 to 20 yields diminishing returns with increasing resource pressure.
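Those numbers fall out of a simple wave model: documents run in batches of `concurrency`, followed by a fixed sequential merge phase. The 45-second per-document time matches the earlier example; the 30-second merge overhead is an assumption for illustration.

```python
import math

def wall_time(n_docs: int, per_doc: float, concurrency: int,
              merge: float = 30.0) -> float:
    # Documents run in waves of `concurrency`; the merge phase is sequential.
    waves = math.ceil(n_docs / concurrency)
    return waves * per_doc + merge

base = wall_time(20, 45.0, 1)    # 20 waves * 45 s + 30 s = 930 s
five = wall_time(20, 45.0, 5)    # 4 waves * 45 s + 30 s = 210 s
ten = wall_time(20, 45.0, 10)    # 2 waves * 45 s + 30 s = 120 s
```

Under this model, 1 to 5 workers gives a roughly 4.4x speedup, while 5 to 10 adds only about 1.75x more; real coordination overhead shaves those further.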

When parallel processing matters

Parallel processing is not universally beneficial. It matters most in specific scenarios.

Batch document analysis. The clearest use case. You have N documents, each needing the same or similar analysis. The work decomposes naturally into N independent tasks. This is where subagents deliver the most dramatic time savings.

Multi-source research. You need to gather information from multiple files to answer a complex question. The agent can dispatch subagents to search different files simultaneously, then synthesize the findings.

Cross-referencing. Comparing data across multiple spreadsheets or reports. Each subagent extracts the relevant data from its assigned file, and the primary agent performs the comparison.

Due diligence reviews. Legal or financial due diligence involves reviewing dozens or hundreds of documents against a checklist. Each document review is independent. Parallel processing compresses the review timeline from days to hours.

When parallel processing does not matter

Some tasks are inherently sequential, and forcing parallelism adds complexity without benefit.

Single document analysis. If you are analyzing one contract, one report, or one spreadsheet, there is nothing to parallelize. The agent reads it, analyzes it, and responds. Subagents would add overhead for zero gain.

Dependent analysis chains. "Read this report, then based on its findings, analyze the data in this spreadsheet, then write a summary." Each step depends on the previous step's output. You cannot parallelize a chain of dependencies.

Small tasks. Extracting one data point from one file takes a few seconds. The coordination overhead of spawning a subagent exceeds the analysis time. The agent handles it directly.

Conversational interaction. When the user is iterating -- asking follow-up questions, refining the analysis, exploring results -- the interaction is naturally sequential. The agent processes one request at a time because that is how the conversation flows.

The agent makes this assessment automatically. It does not spawn subagents for a single-file analysis any more than a project manager would convene a team meeting to answer one email.

The architectural advantage of local execution

Cloud-based AI agents face a fundamental challenge with parallel processing: network latency multiplied by concurrency.

Each subagent in a cloud system needs to download the file, process it remotely, and return the result. Five concurrent subagents means five concurrent file transfers. The upload bandwidth becomes a bottleneck, especially with large documents.

Desktop agents avoid this entirely. The files are already on the local disk. Subagents read them directly from the file system. The only network traffic is between the agent and the language model -- the extracted text, which is typically a small fraction of the size of the original files.

For a 50-page DOCX file, the raw file might be 2 MB. The extracted text is 100 KB. The agent sends 100 KB to the model, not 2 MB to a cloud processing service. Multiply that by twenty files running in parallel, and the bandwidth difference is significant.

This is why docrew's architecture -- local file processing with remote language model inference -- is particularly well-suited to parallel document analysis. The expensive part (file I/O) is local and fast. The latency-sensitive part (model inference) uses minimal bandwidth because only extracted text crosses the network.

Beyond document analysis

Subagents are not limited to document processing. The pattern applies to any task that can be decomposed into independent subtasks.

Code analysis. Reviewing a codebase for security vulnerabilities? Each source file can be analyzed independently by a subagent. The primary agent synthesizes the findings into a security report.

Data processing. Processing multiple CSV exports from different systems? Each subagent handles one file, extracts and transforms the data, and the primary agent merges the results.

Research tasks. Gathering information from multiple sources to answer a complex question? Each subagent handles one source, and the primary agent combines the insights.

The subagent pattern is a general-purpose parallelism tool. Documents are simply the most common and most obvious application.

The broader point

Sequential processing is a legacy of single-threaded thinking. When the only tool was a human reading documents one at a time, sequential was the only option.

AI agents don't have that constraint. They can spawn workers, delegate tasks, and process multiple streams simultaneously. The question is not whether parallel processing is possible -- it is whether the architecture supports it.

Most AI chat interfaces do not. They are built around a single conversation thread, processing one message at a time. Parallel execution is not part of their design.

Agent architectures that support subagent delegation treat parallelism as a first-class capability. The agent decides when to parallelize, how to split the work, and how to merge the results. The user does not manage the concurrency. They ask for the analysis and receive the output.

The twenty-contract review that takes fifteen minutes sequentially and four minutes with subagents is not just a speed improvement. It changes the calculus of what is worth doing. Tasks that were impractical due to time constraints become routine. Batch analyses that would take an afternoon are completed during a coffee break.

That is the promise of the subagent pattern: not just faster processing, but a shift in what you can reasonably ask an AI agent to do.
