Autonomous vehicles (AVs) rely on complex software architectures to process sensor data, make real-time decisions, and control vehicle motion. These tech stacks are designed to handle perception, planning, and actuation, ensuring the vehicle can navigate safely in diverse environments. In this blog, we'll explore the core components of an AV tech stack and then transition into our hands-on project—building a Finite State Machine (FSM) in ROS 2 to simulate decision-making in an autonomous system.
A modern autonomous vehicle stack consists of several key components:
At the heart of an autonomous vehicle lies the central computer, which processes vast amounts of data from sensors, nodes, and various vehicle subsystems in real time. This computer acts as the brain of the AV, integrating perception, planning, control, and connectivity into a single cohesive system.
- Perception
  - Integrates real-time feeds from LiDAR, cameras, radar, and GPS.
  - Uses Kalman filters, deep learning models, and SLAM to extract meaningful insights (see the short sketch after this list).
  - Identifies obstacles, lane markings, pedestrians, and traffic signals.
- Planning & Decision-Making
  - Implements hierarchical decision layers, from behavior planning (e.g., overtaking, lane changes) to motion planning (smooth path execution).
  - Uses Finite State Machines (FSMs), reinforcement learning, and deep neural networks to optimize vehicle behavior.
- Control
  - Converts high-level decisions into precise control signals for acceleration, braking, and steering.
  - Ensures low-latency execution to maintain real-time responsiveness.
- System Management
  - Manages ROS 2 nodes, orchestrating message passing between subsystems.
  - Implements fault tolerance and redundancy to prevent failures from affecting vehicle operation.
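To make the perception bullet a little more concrete, here is a minimal 1-D Kalman filter sketch in Python. It is purely illustrative: the constant-position model, the noise values, and the simulated radar readings are assumptions for this example, not something pulled from a production AV stack.

```python
import random

# Minimal 1-D Kalman filter: fuse noisy range readings (e.g., from a radar)
# into a smoother distance estimate. All numbers here are illustrative.
def kalman_1d(measurements, process_var=0.1, meas_var=4.0):
    estimate = measurements[0]  # initial state: trust the first reading
    error = 1.0                 # initial estimate uncertainty
    for z in measurements:
        error += process_var                 # predict: uncertainty grows over time
        gain = error / (error + meas_var)    # Kalman gain: how much to trust z
        estimate += gain * (z - estimate)    # update: pull estimate toward z
        error *= (1 - gain)                  # uncertainty shrinks after the update
    return estimate

if __name__ == "__main__":
    true_range = 50.0  # metres to a stationary obstacle
    readings = [true_range + random.gauss(0, 2.0) for _ in range(25)]
    print(f"last raw reading: {readings[-1]:.2f} m, filtered: {kalman_1d(readings):.2f} m")
```

Real perception pipelines run multi-dimensional filters over fused LiDAR, radar, and camera tracks, but the predict/update loop is the same idea.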
- Compute Hardware: High-performance GPUs, TPUs, FPGAs, or specialized automotive AI chips (like NVIDIA Orin, Intel Mobileye, Tesla’s FSD computer).
- Real-Time OS: Ensures low-latency processing using RTOS (Real-Time Operating Systems) or Linux-based ROS 2 distributions.
- AI & ML Frameworks: TensorFlow, PyTorch, OpenCV, and YOLO for vision-based tasks.

- Tesla's FSD Computer: Custom AI chips handle camera-based perception and planning.
- Waymo’s AI Stack: Uses multiple GPUs and FPGAs for sensor fusion and decision-making.
- NVIDIA DRIVE AGX: A full-stack AI computing platform for AVs.
I am interested in Rivian as a company, so I did a little research on their plans.
Currently, Rivian's Gen 2 vehicles come equipped with the Rivian Autonomy Platform, which offers features like Highway Assist, Adaptive Cruise Control, and Lane Change Assist. These systems require continuous driver attention and control. Source
The upcoming hands-free system aims to allow drivers to remove their hands from the steering wheel on certain highways, similar to systems like Ford's BlueCruise and GM's Super Cruise. The subsequent "eyes-off" system, targeted for 2026, aspires to enable drivers to divert their attention from the road under specific conditions, aligning with Level 3 autonomous driving capabilities.
I was not familiar with the levels of autonomy prior to this article. Level 3 is defined by these key features:
- Conditional Automation – The system can handle driving tasks, including acceleration, braking, and steering, in well-defined environments (like highways).
- Hands-Off & Eyes-Off (Temporarily) – The driver can take their hands off the wheel and even divert their attention from the road, but must be ready to intervene if the system asks.
- Situational Awareness Required – The system monitors its environment and decides when it needs the driver to take over.
- Restricted Operating Domain – Works only in specific conditions (e.g., clear weather, mapped highways, or designated urban zones).
This article describes all 6 levels of autonomous driving defined by SAE International.
Tesla’s autonomy stack is powered by deep learning models that process video feeds from the camera system in real time. The key components include:
- Tesla Neural Networks → AI models trained on millions of real-world driving miles to improve perception, path planning, and decision-making.
- Full Self-Driving (FSD) Beta → A driver-assist system that provides lane navigation, automatic lane changes, and recognition of traffic signs and lights.
- Dojo Supercomputer → Tesla’s custom-built AI training system designed to process massive datasets for self-driving improvements.
Unlike competitors like Waymo and Cruise, Tesla rejects LiDAR in favor of vision-based autonomy. Elon Musk believes LiDAR is unnecessary and that self-driving systems should rely on camera-based AI, just as human drivers do.
✅ No expensive LiDAR sensors, reducing production costs.
✅ Vision-based AI mimics human perception.
✅ Constant improvements via AI model updates.

⚠️ Depth perception is harder without radar.
⚠️ Struggles in fog, heavy rain, and low-visibility conditions.
⚠️ Requires massive amounts of real-world training data.
Older Tesla sensor suite (before Tesla Vision):

- Cameras: 8 cameras (for 360-degree vision)
- Ultrasonic Sensors: 12 ultrasonic sensors (for close-range object detection)
- Radar: 1 front-facing radar (for detecting vehicles and obstacles at long range)
- GPS & IMU: Used for localization and movement tracking

Current Tesla sensor suite (Tesla Vision, camera-only):

- Cameras: 8 cameras (primary sensing method)
- No Radar: Tesla removed radar sensors in favor of vision-based processing.
- No Ultrasonic Sensors: Newer Teslas no longer include ultrasonic sensors, relying solely on camera-based perception.
- GPS & IMU: Still included for localization and movement tracking.
The 8 cameras break down as follows:

- Main Forward Camera (High-resolution)
- Wide-Angle Forward Camera (Detects pedestrians and nearby objects)
- Narrow Forward Camera (Long-range detection)
- Left and Right Side Cameras (Monitor adjacent lanes for lane changes and objects)
- Rearward-Facing Side Cameras (Detect vehicles in blind spots)
- Rear View Camera (Covers the area directly behind the vehicle)
Tesla continues to refine its Full Self-Driving (FSD) system, aiming for Level 4/5 autonomy where the vehicle no longer requires human supervision. However, regulatory challenges, AI safety concerns, and unpredictable edge cases remain obstacles before Tesla can achieve full autonomy.
If you have not heard of Waymo, they run fully self-driving robotaxis in certain US cities, and I have heard good things about them. Let's compare them to Tesla and see how the two line up.
Tesla and Waymo both aim to achieve full self-driving capabilities, but they take radically different approaches in their hardware, software, and philosophy of autonomy. While Tesla relies solely on vision-based AI, Waymo takes a sensor fusion approach using LiDAR, radar, and cameras to navigate the world.
Yes, Waymo is currently operating fully self-driving vehicles (Level 4 autonomy) in specific areas, such as Phoenix and San Francisco. Unlike Tesla’s driver-assist system (FSD Beta), Waymo’s robotaxis do not require human supervision and can operate without a driver in defined geofenced locations.
Waymo is "geofenced" → It works in pre-mapped areas but can't drive everywhere.
Tesla is "generalized" → It aims to work anywhere but still requires human supervision.
Tesla believes that a camera-only system (Tesla Vision) trained on real-world data will eventually surpass LiDAR-based autonomy. However, Tesla's FSD is still in beta and requires human intervention.
Waymo, on the other hand, heavily relies on LiDAR (Light Detection and Ranging), radar, and HD maps to create a detailed 3D representation of the world. This allows Waymo cars to drive without any human input in geofenced locations.
Elon Musk has publicly stated:
"LiDAR is a crutch. Anyone relying on LiDAR is doomed."
Tesla’s philosophy is that humans drive using vision alone, so AI should be able to do the same with deep learning and cameras.
Waymo argues that LiDAR provides precise depth perception and is necessary for safety-critical scenarios. Unlike cameras, LiDAR works in total darkness and poor weather.
✔ Waymo is fully self-driving today → But only in controlled, mapped areas.
✔ Tesla has broader potential → But still needs human oversight.
Waymo operates robotaxis in cities (Phoenix, SF, and soon LA). Tesla’s FSD isn't truly self-driving yet—it requires drivers to keep their hands on the wheel.
Waymo will expand to more cities, but it will take time to scale.
Tesla aims for full autonomy anywhere, but it must solve the edge cases and improve vision-based AI.
So, in an effort to learn more about the systems used in autonomous vehicles, I have started building a finite state machine simulation in Python. It uses a Docker container to stand up a small set of ROS 2 nodes that talk to each other over the pub/sub pattern via rclpy, the ROS 2 Python client library.
FSMs are crucial in AV decision-making, helping vehicles transition between states like "Moving," "Stopping," and "Slowing Down." We built a ROS 2-based FSM that simulates this logic.
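Before the node-by-node breakdown, here is a stripped-down sketch of that transition logic in plain Python, with no ROS 2 involved. The state names and the two boolean inputs are simplifications I chose for illustration, not the project's exact interface.

```python
from enum import Enum, auto

class State(Enum):
    MOVING = auto()
    SLOWING_DOWN = auto()
    STOPPED = auto()

def next_state(current: State, obstacle_detected: bool, obstacle_close: bool) -> State:
    """Pure transition function: compute the next state from the current state
    and the latest sensor flags."""
    if obstacle_detected and obstacle_close:
        return State.STOPPED          # hard stop when an obstacle is close
    if obstacle_detected:
        # Don't start creeping forward again while something is still ahead.
        return State.STOPPED if current is State.STOPPED else State.SLOWING_DOWN
    return State.MOVING               # clear road: move (or resume moving)

# Feed a short sequence of simulated sensor readings through the FSM.
state = State.MOVING
for detected, close in [(True, False), (True, True), (True, False), (False, False)]:
    state = next_state(state, detected, close)
    print(state.name)  # SLOWING_DOWN, STOPPED, STOPPED, MOVING
```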
Our FSM-based ROS 2 system consists of:

- A sensor simulator node that publishes whether an obstacle is detected (True or False).
- A planner FSM node that subscribes to that flag and decides whether the vehicle should MOVE or STOP.
- An actuator logger node that logs the commands the planner publishes.
- A brake controller node that acts on those commands.

To run the project, follow these steps:
# Build and run the Docker container
cd finite-state-machine
docker build -t ros2_fsm_demo -f docker/Dockerfile .
docker run -it --rm ros2_fsm_demo
Then, open additional terminals into the running container (e.g., with docker exec) and start each ROS 2 node:
ros2 run fsm_demo sensor_simulator # Terminal 1
ros2 run fsm_demo planner_fsm # Terminal 2
ros2 run fsm_demo actuator_logger # Terminal 3
ros2 run fsm_demo brake_controller # Terminal 4
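To give a feel for how the pub/sub wiring looks in rclpy, here is a minimal sketch of a planner-style node. The topic names (/obstacle_detected, /planner_command) and the message types are placeholders I chose for illustration; the actual node, topic, and message definitions live in the repository linked below.

```python
import rclpy
from rclpy.node import Node
from std_msgs.msg import Bool, String

class PlannerFSM(Node):
    """Two-state FSM: listen for a boolean obstacle flag, publish MOVE or STOP."""

    def __init__(self):
        super().__init__("planner_fsm")
        self.state = "MOVE"
        # Topic names below are placeholders for this sketch.
        self.create_subscription(Bool, "/obstacle_detected", self.on_obstacle, 10)
        self.cmd_pub = self.create_publisher(String, "/planner_command", 10)

    def on_obstacle(self, msg: Bool) -> None:
        # Any detected obstacle forces a STOP; otherwise keep (or resume) moving.
        self.state = "STOP" if msg.data else "MOVE"
        self.cmd_pub.publish(String(data=self.state))
        self.get_logger().info(f"obstacle={msg.data} -> state={self.state}")

def main() -> None:
    rclpy.init()
    node = PlannerFSM()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```

The project's planner may handle more states than this two-state version, but the subscribe, decide, publish loop is the heart of the pattern.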
All the source code for this project is available here: GitHub Repository
Building a Finite State Machine (FSM) in ROS 2 provides a simplified yet insightful representation of how decision-making works in real autonomous vehicles. While our Python project only simulates basic states like "Moving" and "Stopping", real-world AVs rely on much more complex state machines, deep learning models, and sensor fusion techniques to navigate unpredictable environments.
In actual autonomous vehicles, the FSM-like logic is part of a broader system that integrates real-time sensor data, behavior planning, and vehicle control. Instead of simple boolean obstacle detection, AVs analyze millions of data points per second from LiDAR, cameras, and radar to make informed decisions. The central computing unit processes this data, much like our ROS 2 nodes communicate through publish-subscribe topics, ensuring that all components work in harmony.
Our project serves as a starting point for understanding decision-making in autonomous vehicles. Expanding this simulation could involve integrating real sensor feeds, visualizing decisions in rviz2, or testing more advanced FSM behaviors like yielding, lane changes, or obstacle avoidance.
Ultimately, autonomous vehicle development is an interdisciplinary challenge, requiring expertise in robotics, artificial intelligence, and systems engineering. By simulating an FSM in ROS 2, we've taken a small but meaningful step toward grasping the complexity of self-driving systems.
🚀 I plan to continue to expand on this project. There are many considerations like integrating sensor fusion algorithms, reinforcement learning, or real-time control systems to bridge the gap between simulation and real-world applications. Stay tuned.
Will