
NVIDIA Jetson AGX Thor vs DGX Spark Benchmarks

About a month ago, I sat down and put pencil to paper to predict the performance of the announced NVIDIA DGX Spark versus the newly introduced Jetson AGX Thor. The DGX Spark is now shipping, so it’s time to revisit the predictions. This is a multi-part story. I’m working on the corresponding video explaining the results. Here I’m sharing just the benchmark numbers for comparison.

The folks over at llama.cpp ran sweep benchmarks of several LLMs on the Spark, along with instructions on how to run the benchmarks independently. Their published benchmarks are a living document and have changed a bit over time as llama.cpp performance has improved.

However, I saved the example results I tested against: the llama.cpp results from October 15, 2025, from build b6767 (5acd45546).
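If you want a feel for what those sweep runs look like, here is a minimal sketch that drives llama.cpp's llama-bench tool from Python. The binary location, model file, and flag values are placeholders chosen for illustration; the published sweeps use their own configuration, so follow the llama.cpp instructions for the exact setup.

# Minimal sketch: invoking llama.cpp's llama-bench from Python.
# Paths and flag values are illustrative placeholders, not the exact
# configuration used in the published sweep benchmarks.
import subprocess

cmd = [
    "./llama-bench",      # path to the llama-bench binary (adjust for your build)
    "-m", "model.gguf",   # GGUF model to benchmark (placeholder path)
    "-p", "2048",         # prompt tokens, which exercises prefill (pp) throughput
    "-n", "32",           # generated tokens, which exercises decode (tg) throughput
    "-ngl", "99",         # offload all model layers to the GPU
]
# llama-bench prints a table of prompt-processing and token-generation tokens/sec.
subprocess.run(cmd, check=True)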

Here are the Jetson AGX Thor versus DGX Spark benchmarks.

Here are the raw DGX Spark benchmarks.

Here are the raw Jetson AGX Thor benchmarks.

The Prediction

In retrospect, this prediction is more straightforward than I originally presented. If I were a little smarter, I would have realized that the Spark's GB10 CPU would have performance roughly equivalent to an Apple M4, give or take. Both use the same chip manufacturing process and have similar clock speeds. I'd give Apple the speed edge; not all the smart people work at NVIDIA.

The Spark has twice as many Tensor Cores as the Thor. That is a big clue about how it performs in prefill during LLM inference. We know that prefill is compute bound and consists mostly of matrix multiplication and accumulation. That is exactly what Tensor Cores are designed for, so you would expect the Spark to be roughly twice as fast.
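As a back-of-the-envelope check on that reasoning, here is a minimal sketch. The numbers are normalized placeholders rather than official specs; the only assumption is the one above, that compute-bound prefill scales roughly with Tensor Core math throughput.

# Rough first-order model: prefill is compute bound (mostly matrix
# multiply-accumulate), so the expected prefill speedup is roughly the
# ratio of Tensor Core math throughput. Values are normalized placeholders.

def prefill_throughput(tensor_cores: float, clock_ghz: float) -> float:
    """Relative dense matrix-math throughput, in arbitrary units."""
    return tensor_cores * clock_ghz

thor = prefill_throughput(tensor_cores=1.0, clock_ghz=1.0)   # normalized baseline
spark = prefill_throughput(tensor_cores=2.0, clock_ghz=1.0)  # "twice the Tensor Cores"

print(f"Expected prefill speedup, Spark over Thor: {spark / thor:.1f}x")  # ~2x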

That leads to token generation, which is where the real work in prediction comes in. I don't think there is an easy shortcut, because there are so many factors to take into account. We know memory bandwidth plays a role, but people generally misinterpret why it is important. It takes a while to break the results down and understand where the underlying performance bottlenecks are, and that's the subject of the video and the next article!
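To give a flavor of the kind of analysis that article will walk through, here is the usual first-order bound people reach for, as a minimal sketch with illustrative numbers: if decode is memory bound, every generated token has to stream the model weights and KV cache from memory, so tokens per second cannot exceed bandwidth divided by bytes read per token. The bandwidth and sizes below are placeholders, and the point of the next article is precisely that this simple bound is not the whole story.

# Memory-bound decode estimate: each generated token streams the quantized
# weights plus the KV cache from DRAM, so
#     tokens/s  <=  memory_bandwidth / bytes_read_per_token
# All numbers below are illustrative placeholders, not measured values.

def decode_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float, kv_cache_gb: float) -> float:
    """Upper bound on decode rate if memory bandwidth were the only limit."""
    return bandwidth_gb_s / (weights_gb + kv_cache_gb)

# Example: ~270 GB/s of bandwidth, a 7 GB quantized model, 1 GB of KV cache read per token.
print(f"Upper bound: {decode_tokens_per_sec(270.0, 7.0, 1.0):.0f} tokens/s")
# Real results land below this because bandwidth is never fully utilized
# and compute and kernel-launch overheads also matter.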


