By Pratik Agarwal, Director of Onboard Autonomy
Achieving safe, scalable autonomy is existential for the AV industry. The architectural choices developers make—modular pipelines, pure end-to-end learning, or a structured middle path—shape everything from data efficiency and interpretability to safety validation and public trust. Compound AI is emerging as an architecture that balances those demands: shaping autonomy by blending data-driven scale with safety and interpretability. While here we discuss our approach to solving 99% of everyday autonomous driving, in a related post, we also discuss research and emerging technologies to solve the long tail required for full general autonomy.
The autonomy architecture dilemma
Building systems that can drive at scale, explain their decisions, and meet stringent safety bars means balancing three tensions: data efficiency (how much data you need to reach a given performance level), interpretability (how well you can inspect and debug behavior), and safety performance (how well you can validate and certify the system). Historically, no single approach has delivered all three. That’s why the industry is converging on a new paradigm—Compound AI—that keeps the scalable benefits of end-to-end learning while reintroducing structure where it matters most.
The autonomy landscape: modular, end-to-end, and the gap
Three main paradigms define the recent evolution of autonomous driving stacks.
Modular systems
Classical autonomy stacks split the problem into explicit stages: perception and prediction (what is in the scene and how it will evolve), motion planning (where to go), and control (how to actuate). Each stage can be introspected and validated on its own.
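The staged pipeline above can be sketched as a chain of small, independently testable functions. This is an illustrative toy, not a real stack: the types, the constant-velocity prediction, and the straight-ahead planner are all simplifying assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified types; real stacks use far richer scene representations.
@dataclass
class Track:
    x: float
    y: float
    vx: float
    vy: float

@dataclass
class Trajectory:
    waypoints: List[tuple]  # (x, y) points the vehicle should follow

def perceive(sensor_frame: dict) -> List[Track]:
    """Perception: detect and track agents in the scene (stubbed)."""
    return [Track(**obj) for obj in sensor_frame.get("objects", [])]

def predict(tracks: List[Track], horizon_s: float = 2.0) -> List[Track]:
    """Prediction: constant-velocity rollout of each track (toy model)."""
    return [Track(t.x + t.vx * horizon_s, t.y + t.vy * horizon_s, t.vx, t.vy)
            for t in tracks]

def plan(predicted: List[Track]) -> Trajectory:
    """Planning: go straight, but stop short of any agent ahead (toy logic)."""
    ahead = [t for t in predicted if t.x > 0 and abs(t.y) < 1.5]
    reach = min((t.x for t in ahead), default=50.0)
    return Trajectory(waypoints=[(min(x, reach), 0.0) for x in (10.0, 20.0, 30.0)])

def control(traj: Trajectory) -> dict:
    """Control: convert the first waypoint into an actuation command (toy)."""
    x, y = traj.waypoints[0]
    return {"throttle": 0.3 if x > 5.0 else 0.0, "steer": y}

# Each intermediate output can be logged, inspected, and validated on its own.
frame = {"objects": [{"x": 12.0, "y": 0.5, "vx": -1.0, "vy": 0.0}]}
tracks = perceive(frame)
future = predict(tracks)
traj = plan(future)
cmd = control(traj)
```

The point of the structure is visible in the last five lines: every arrow in the pipeline produces an artifact you can assert on, which is exactly what makes module-level validation tractable.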
Modular designs have low data requirements, making them feasible even when large datasets are unavailable. Introspection is straightforward, since the intermediate representations are interpretable and easy to analyze. Validation is also relatively simple, since individual components can be tested at the module level, and safety assurance is strong because the system relies on certifiable modules that can be independently verified. However, generalizability is limited: scaling to new domains, broader scenarios, and the long tail remains relatively hard.
End-to-end systems
Pure end-to-end systems map sensors directly to control (or trajectory) with a single learned model. There is no information bottleneck—just a black box that scales with data.
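The contrast with the modular pipeline can be made concrete with a minimal sketch: one learned function from raw sensor features straight to a trajectory, with nothing in between to inspect. The random linear weights here are a stand-in for a trained network; the feature and waypoint counts are arbitrary assumptions.

```python
import random

random.seed(0)

N_FEATURES = 8   # flattened sensor input (stand-in for camera/lidar features)
N_WAYPOINTS = 3  # output trajectory length

# Stand-in for trained parameters of a single monolithic model.
weights = [[random.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
           for _ in range(2 * N_WAYPOINTS)]  # (x, y) per waypoint

def end_to_end_policy(sensor_features):
    """Single learned map: sensors -> trajectory. No intermediate representation."""
    out = [sum(w * f for w, f in zip(row, sensor_features)) for row in weights]
    return list(zip(out[0::2], out[1::2]))  # [(x1, y1), (x2, y2), (x3, y3)]

traj = end_to_end_policy([1.0] * N_FEATURES)
```

There is no `tracks` or `predicted` variable to log or unit-test here; if the trajectory is wrong, debugging means probing the model itself, which is the introspection and validation cost the section describes.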
The pure end-to-end approach offers high generalizability, as it can scale effectively with large volumes of unlabeled trajectory data and avoids the traditional perception-to-behavior bottleneck. However, it has high data requirements, relying heavily on large amounts of training data to perform effectively. Introspection is difficult because the internal representations are often opaque, making it challenging to interpret how decisions are formed. Validation is also difficult, as the system lacks clear component boundaries, which complicates systematic testing and debugging. Consequently, safety assurance is challenging under current validation frameworks, since traditional methods for verifying system reliability are harder to apply.
End-to-end with safety guardrails
A middle step adds constraints (e.g., physics, rules of the road) around an end-to-end model. This improves safety relative to pure end-to-end, but introspection, validation, and safety assurance remain difficult: the model's internal processes are still opaque, which complicates understanding how decisions are made, systematic testing and debugging, and verification within existing safety frameworks. The high data demand needed to meet performance bars also remains.
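The guardrail idea can be sketched as a thin wrapper: the learned model proposes a trajectory, and hand-written checks veto it in favor of a fallback when it breaks a physics or road-rule constraint. The function names, limits, and trajectory encoding are illustrative assumptions, not any real stack's API.

```python
MAX_ACCEL = 3.0         # m/s^2, illustrative physics/comfort bound
LANE_HALF_WIDTH = 1.75  # m, illustrative lane boundary

def violates_physics(speeds, dt=0.1):
    """Reject plans whose implied acceleration exceeds the limit."""
    return any(abs(v2 - v1) / dt > MAX_ACCEL for v1, v2 in zip(speeds, speeds[1:]))

def violates_lane(lateral_offsets):
    """Reject plans that cross the lane boundary."""
    return any(abs(y) > LANE_HALF_WIDTH for y in lateral_offsets)

def guarded(proposal, fallback):
    """Return the model's proposal only if it passes every guardrail."""
    if violates_physics(proposal["speeds"]) or violates_lane(proposal["offsets"]):
        return fallback
    return proposal

model_out = {"speeds": [10.0, 10.2, 10.4], "offsets": [0.0, 0.3, 2.5]}  # drifts out of lane
safe_stop = {"speeds": [10.0, 9.8, 9.6], "offsets": [0.0, 0.0, 0.0]}
chosen = guarded(model_out, safe_stop)  # guardrail rejects the drifting plan
```

Note what this does and does not buy: the output is bounded by explicit rules, but the model that produced the rejected plan is no easier to interpret or validate than before, which is the residual gap the next section addresses.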
To address these gaps, we need an architecture that keeps end-to-end-style scalability, while restoring sufficient structure for safety, interpretability, and a higher performance floor with less data. That architecture is Compound AI.
Compound AI
Compound AI keeps the end-to-end learning that scales, but explicitly adds introspection, constraints, basic physics, and rules of the road into the loop. It is the same continuous-learning story as end-to-end, but with structure where it matters for safety and interpretability. In short, it lowers data demand while maintaining safety and scalability.
How structure and learning work together
Compound AI is an integrated design where structure and learning reinforce each other at runtime and in training.
Trajectory generation receives both perception decoder outputs (e.g., decoded tracks and map) and embeddings. Reinforcement learning (RL) or other cost/reward terms use perception for train-time alignment—e.g., rewarding progress and safety, or penalizing collisions and boundary violations. This raises the floor and keeps behavior aligned with explicit rules while still learning from data.
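The train-time alignment terms can be sketched as a reward computed from decoded perception outputs. The specific terms, thresholds, and weights below are illustrative assumptions; a production reward would be far richer.

```python
# Reward shaping over decoded perception: progress is rewarded, near-collisions
# and lane-boundary violations are penalized. All weights are illustrative.

def min_distance_to_tracks(waypoints, tracks):
    """Closest approach between any waypoint and any decoded agent track."""
    return min(((wx - tx) ** 2 + (wy - ty) ** 2) ** 0.5
               for wx, wy in waypoints for tx, ty in tracks)

def reward(waypoints, tracks, lane_half_width=1.75,
           w_progress=1.0, w_collision=10.0, w_boundary=5.0):
    """Reward forward progress; penalize near-collisions and boundary violations."""
    progress = waypoints[-1][0] - waypoints[0][0]  # forward distance gained
    collision = 1.0 if min_distance_to_tracks(waypoints, tracks) < 2.0 else 0.0
    boundary = sum(1.0 for _, y in waypoints if abs(y) > lane_half_width)
    return w_progress * progress - w_collision * collision - w_boundary * boundary

tracks = [(15.0, 0.0)]                          # decoded agent directly ahead
safe = [(0.0, 0.0), (5.0, 0.5), (10.0, 0.5)]    # yields before the agent
risky = [(0.0, 0.0), (8.0, 0.0), (14.5, 0.0)]   # ends 0.5 m from the agent
```

Because the penalty terms are computed from interpretable perception outputs rather than raw pixels, each term can be inspected and tuned on its own, which is how explicit rules stay in the loop while the trajectory generator itself remains learned.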
Training can be extended further by incorporating world models and vision-language-action (VLA) style models: world models for prediction and rollout; VLA for grounding high-level goals and language in perception and action. Modern reasoning models and multimodal foundation models can be leveraged in training (e.g., for better scene understanding, planning, or distillation) to push the performance ceiling higher while keeping the same Compound AI structure at deployment.
Why Compound AI is different from modular and end-to-end
Advantages of Compound AI
One generalized system: Distilling strategy into product lines
A natural question after adopting a Compound AI architecture is how it maps to a product portfolio—from entry-level Advanced Driver-Assistance Systems (ADAS) through eyes-off and mind-off autonomy products. The answer is to treat the stack as one generalized system and distill all product lines from it, rather than building separate systems per tier or use case.
The strategy is to distill all tiers of ADAS and Autonomous Driving (AD) products from the generalized system.
In practice, the same Compound AI architecture—sensor encoders, shared backbone, perception decoder, trajectory generation, and guardrails—can power everything from basic ADAS to driverless personal autonomous vehicles (PAVs). What changes per product is the level of capability exposed, the ODD, and the validation and safety bar, not the core architecture.
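The "one system, many products" idea reduces, in practice, to per-product configuration over a shared stack. The tier names, ODD fields, and capability flags below are hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductConfig:
    tier: str
    driver_required: bool   # eyes-on vs eyes-off vs mind-off
    max_speed_kph: float    # part of the operational design domain (ODD)
    highway_only: bool

    def within_odd(self, speed_kph: float, on_highway: bool) -> bool:
        """Same architecture everywhere; only the exposed envelope differs."""
        if speed_kph > self.max_speed_kph:
            return False
        if self.highway_only and not on_highway:
            return False
        return True

# Two hypothetical product tiers distilled from the same generalized stack.
basic_adas = ProductConfig("ADAS", driver_required=True,
                           max_speed_kph=130.0, highway_only=True)
driverless = ProductConfig("PAV", driver_required=False,
                           max_speed_kph=80.0, highway_only=False)

# The same query yields different answers purely from configuration:
urban_ok_adas = basic_adas.within_odd(50.0, on_highway=False)  # highway-only tier
urban_ok_pav = driverless.within_odd(50.0, on_highway=False)
```

The design point is that nothing in the stack branches on the product tier; the tier only narrows the envelope (ODD, exposed capability, validation bar) around one shared architecture.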
Compound AI is the state-of-the-art architecture for autonomy: it keeps the scalability of end-to-end learning while promoting safety, interpretability, and a higher performance floor. It enables faster iteration through structure and component-level work, and continued scaling with data. By combining learned components with explicit structure (introspection, constraints, physics, and rules of the road), Compound AI supports deployment at scale with explainable, validated behavior intended to achieve stronger safety.