For years, Elon Musk has talked about Dojo, the AI supercomputer that will serve as the foundation for Tesla's AI ambitions. It's important enough to Musk that he recently said the company's AI team will "double down" on Dojo as Tesla prepares to unveil its robotaxi in October.
So, what exactly is Dojo? And why is it so important to Tesla's long-term strategy?
In short, Dojo is Tesla's custom-built supercomputer, designed to train its "Full Self-Driving" neural networks. Improving Dojo goes hand in hand with Tesla's goal of achieving full self-driving and bringing a robotaxi to market. FSD, which is installed in about 2 million Tesla vehicles today, can perform some automated driving tasks, but it still requires an attentive human behind the wheel.
Tesla delayed the reveal of its robotaxi, which was slated for August, to October, but both Musk’s public rhetoric and information from sources inside Tesla tell us that the goal of autonomy isn’t going away.
And Tesla appears poised to spend big on AI and Dojo to achieve that goal.
Musk does not want Tesla to be just an automaker, or even a supplier of solar panels and energy storage systems. Instead, he envisions Tesla as an AI company that has cracked the code for self-driving cars by mimicking human perception.
Most other companies developing autonomous vehicle technology rely on a combination of sensors to perceive the world, such as lidar, radar, and cameras, as well as high-definition maps to localize the vehicle. Tesla believes it can achieve fully autonomous driving by relying solely on cameras to capture visual data, which is then processed by advanced neural networks that make quick decisions about how the car should behave.
Tesla's former head of AI, Andrej Karpathy, said at the automaker's first AI Day in 2021 that the company is essentially trying to build "a synthetic animal from the ground up." (Musk had been teasing Dojo since 2019, but Tesla officially announced it at AI Day.)
Companies such as Alphabet's Waymo have commercialized Level 4 autonomous vehicles, which the SAE defines as systems that can drive themselves without human intervention under certain conditions, using a more traditional sensor and machine learning approach. Tesla has yet to produce an autonomous system that does not require a human behind the wheel.
Approximately 1.8 million customers have paid the hefty subscription price for Tesla's FSD, which currently costs $8,000 and has been priced as high as $15,000. The pitch is that Dojo-trained AI software will eventually be pushed out to Tesla customers via over-the-air updates. The scale of FSD's deployment also means Tesla has been able to collect millions of miles' worth of video footage to use in training.
The theory is that the more data Tesla collects, the closer it will get to achieving true full self-driving. However, some industry experts warn that there may be a limit to the brute-force approach of throwing more data at a model and expecting it to get smarter.
“First of all, there’s an economic constraint, and soon it will just get too expensive to do that,” Anand Raghunathan, Purdue University’s Silicon Valley professor of electrical and computer engineering, told TechCrunch. Further, he said, “Some people claim that we might actually run out of meaningful data to train the models on. More data doesn’t necessarily mean more information, so it depends on whether that data has information that is useful to create a better model, and if the training process is able to actually distill that information into a better model.”
Raghunathan said that despite these doubts, the trend toward more data appears to be here to stay, at least in the short term. And more data means more compute power is needed to store and process it all to train Tesla’s AI models. That is where Dojo, the supercomputer, comes in.
Dojo is Tesla’s supercomputer system that’s designed to function as a training ground for AI, specifically FSD. The name is a nod to the space where martial arts are practiced.
A supercomputer is made up of thousands of smaller computers called nodes. Each of those nodes has its own CPU (central processing unit) and GPU (graphics processing unit). The former handles overall management of the node, and the latter does the complex stuff, like splitting tasks into multiple parts and working on them simultaneously. GPUs are essential for machine learning operations like those that power FSD training in simulation. They also power large language models, which is why the rise of generative AI has made Nvidia the most valuable company on the planet.
Even Tesla buys Nvidia GPUs to train its AI (more on that later).
Tesla’s vision-only approach is the main reason Tesla needs a supercomputer. The neural networks behind FSD are trained on vast amounts of driving data to recognize and classify objects around the vehicle and then make driving decisions. That means that when FSD is engaged, the neural nets have to collect and process visual data continuously at speeds that match the depth and velocity recognition capabilities of a human.
In other words, Tesla means to create a digital duplicate of the human visual cortex and brain function.
To get there, Tesla needs to store and process all the video data collected from its cars around the world and run millions of simulations to train its model on the data.
Tesla appears to rely on Nvidia to power its current Dojo training computer, but it doesn’t want to put all its eggs in one basket, not least because Nvidia chips are expensive. Tesla also hopes to build something better that increases bandwidth and decreases latency. That’s why the automaker’s AI division decided to develop its own custom hardware program, which aims to train AI models more efficiently than traditional systems.
At that program’s core are Tesla’s proprietary D1 chips, which the company says are optimized for AI workloads.
Tesla is of a similar opinion to Apple, in that it believes hardware and software should be designed to work together. That’s why Tesla is working to move away from the standard GPU hardware and design its own chips to power Dojo.
Tesla unveiled its D1 chip, a silicon square the size of a palm, on AI Day in 2021. The D1 has been in production since at least May of this year. The Taiwan Semiconductor Manufacturing Company (TSMC) is manufacturing the chips on a 7-nanometer process node. The D1 has 50 billion transistors and a large die size of 645 square millimeters, according to Tesla. This is all to say that the D1 promises to be extremely powerful and efficient and to handle complex tasks quickly.
“We can do compute and data transfers simultaneously, and our custom ISA, which is the instruction set architecture, is fully optimized for machine learning workloads,” said Ganesh Venkataramanan, former senior director of Autopilot hardware, at Tesla’s 2021 AI Day. “This is a pure machine learning machine.”
The D1 is still not as powerful as Nvidia’s A100 chip, though, which is also manufactured by TSMC on a 7-nanometer process. The A100 packs 54 billion transistors onto a die of 826 square millimeters, and it performs slightly better than Tesla’s D1.
To get higher bandwidth and more compute power, Tesla’s AI team fused 25 D1 chips together into one tile that functions as a unified computer system. Each tile delivers 9 petaflops of compute and 36 terabytes per second of bandwidth, and contains all the hardware necessary for power, cooling and data transfer. You can think of the tile as a self-sufficient computer made up of 25 smaller computers. Six of those tiles make up one rack, and two racks make up a cabinet. Ten cabinets make up an ExaPOD. At AI Day 2022, Tesla said Dojo would scale by deploying multiple ExaPODs. All of this together makes up the supercomputer.
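To see how those building blocks add up, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes compute scales linearly from tile to ExaPOD, which ignores any interconnect or utilization overhead. The function and constant names are illustrative, not Tesla's.

```python
# Packaging hierarchy as described in the article.
CHIPS_PER_TILE = 25
TILES_PER_RACK = 6
RACKS_PER_CABINET = 2
CABINETS_PER_EXAPOD = 10

# Per-tile compute figure quoted in the article.
PETAFLOPS_PER_TILE = 9


def exapod_totals():
    """Return tile, chip and compute totals for one ExaPOD.

    Assumption: compute adds up linearly across tiles; real-world
    throughput would depend on interconnect and utilization.
    """
    tiles = TILES_PER_RACK * RACKS_PER_CABINET * CABINETS_PER_EXAPOD
    chips = tiles * CHIPS_PER_TILE
    petaflops = tiles * PETAFLOPS_PER_TILE
    return {
        "tiles": tiles,
        "chips": chips,
        "petaflops": petaflops,
        "exaflops": petaflops / 1000,  # 1 exaflop = 1,000 petaflops
    }


if __name__ == "__main__":
    for name, value in exapod_totals().items():
        print(f"{name}: {value}")
```

Under those assumptions, one ExaPOD works out to 120 tiles, 3,000 D1 chips and roughly 1.08 exaflops of compute, which is where the "Exa" in the name comes from.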
Tesla is also working on a next-gen D2 chip that aims to solve information flow bottlenecks. Instead of connecting the individual chips, the D2 would put the entire Dojo tile onto a single wafer of silicon.