NVIDIA Jetson Orin Nano Guide: From Unboxing to Running Model Benchmarks

- Domain: Embedded AI and infrastructure
- Focus: Jetson Orin Nano setup, performance tuning, and benchmarking
- Scope: JetPack, Docker, vision benchmarks, Ollama, and vLLM benchmarking

The NVIDIA Jetson Orin Nano Developer Kit is one of the more approachable ways to do modern edge AI work locally. It gives you an ARM Linux system, CUDA support, TensorRT acceleration, camera interfaces, GPIO, NVMe support, and enough GPU performance to run real computer vision workloads plus smaller local LLM and VLM experiments.
NVIDIA's current Jetson Orin Nano Super Developer Kit positioning highlights 67 sparse INT8 TOPS, a 1024-core Ampere GPU, 32 Tensor Cores, a 6-core Arm Cortex-A78AE CPU, 8GB LPDDR5, 102 GB/s memory bandwidth, support for microSD and NVMe storage, and operation across 7W to 25W modes depending on configuration (NVIDIA Technical Blog).
This guide covers the full setup flow from unboxing to benchmarking, including microSD preparation, firmware update considerations, first boot, system verification, performance mode configuration, Docker validation, computer vision benchmarking, Ollama testing, and vLLM-style LLM benchmarking.

Required hardware
The Jetson Orin Nano Developer Kit package typically includes:
- Jetson Orin Nano module
- reference carrier board
- preinstalled Wi-Fi/Bluetooth module
- Wi-Fi antenna
- 19V power supply
- quick-start material
Additional required items:
- 64GB or larger microSD card
- DisplayPort monitor, or a known-good DisplayPort-to-HDMI adapter
- USB keyboard
- USB mouse
- Ethernet or Wi-Fi network access
- host computer for writing the microSD card
Optional but strongly recommended:
- M.2 NVMe SSD
- better airflow or active cooling
- USB or CSI camera
The microSD card is fine for first boot and basic learning. For real benchmarking, Docker images, Hugging Face caches, and larger model files, NVMe storage is the better long-term path.
Recommended software version
JetPack 6.x is the practical target for Jetson Orin Nano development.
JetPack 6.2 includes Jetson Linux 36.4.3, Linux kernel 5.15, an Ubuntu 22.04 root filesystem, CUDA 12.6, TensorRT 10.3, cuDNN 9.3, VPI 3.2, and support for new Super modes on Jetson Orin Nano and Orin NX (NVIDIA Technical Blog).
Jetson Linux 36.4.4, which ships as part of JetPack 6.2.1, is a product-quality release for Jetson Orin Nano and includes fixes related to SDK Manager flashing and Docker 28 compatibility (NVIDIA Docs).
JetPack 7 exists, but current JetPack 7 releases are aimed primarily at newer Thor-class hardware. For Jetson Orin Nano, JetPack 6.x remains the safer target unless NVIDIA explicitly documents newer support for this board.
Firmware compatibility warning
Fresh Jetson Orin Nano Developer Kits may ship with older factory firmware. Older firmware can create a frustrating setup failure: the board may boot JetPack 5.x, but fail to boot JetPack 6.x cleanly.
Common pattern:
Fresh Jetson Orin Nano
|
+-- Firmware already supports JetPack 6.x
|   |
|   +-- Flash JetPack 6.x SD card and boot normally
|
+-- Firmware is too old
    |
    +-- JetPack 6.x may show black screen, UEFI shell, or boot failure
    +-- Boot JetPack 5.x first
    +-- Apply the QSPI bootloader update
    +-- Reboot with JetPack 6.x SD card
NVIDIA's JetPack installation docs explicitly note that using JetPack 6.x SD card images for the first time may require updating the QSPI bootloaders first (NVIDIA Docs).
The practical rule is: reflashing the same JetPack 6.x microSD card over and over will not fix an outdated QSPI firmware problem.
Setup path options
There are two normal setup paths.
Path A: microSD card method
Best for:
- Windows hosts
- macOS hosts
- Linux hosts where the simplest setup is preferred
- first-time board bring-up
- initial validation
This is the cleanest starting point for most users.
Path B: SDK Manager method
Best for:
- x86 Ubuntu 20.04 or 22.04 host systems
- direct flashing through NVIDIA SDK Manager
- NVMe-first installation
- recovery-mode workflows
- more advanced maintenance
A realistic workflow is:
Initial bring-up: microSD
Development storage: NVMe
Advanced flashing: SDK Manager
Production-style setup: SDK Manager or manual flashing
Flashing the microSD card
Download the Jetson Orin Nano Developer Kit SD card image from NVIDIA's JetPack download page.
If the board already has JetPack 6-compatible firmware, write the JetPack 6.x image directly. If the board fails to boot JetPack 6.x, complete the older-firmware update path first.
Windows host
- Insert the microSD card.
- Open Balena Etcher.
- Select the Jetson Orin Nano image.
- Select the microSD card.
- Start flashing.
- Safely eject the card.
macOS host
- Insert the microSD card.
- Open Balena Etcher.
- Select the Jetson Orin Nano image.
- Select the microSD card.
- Start flashing.
- Eject the card.
Linux host
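GUI tools such as Balena Etcher, Raspberry Pi Imager, or GNOME Disks work well. A command-line path also works. If the download is a .zip archive, extract the .img file first, because dd writes whatever file it is given byte for byte: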
lsblk
# Replace /dev/sdX with the real microSD device.
# Do not use a partition path like /dev/sdX1.
sudo dd if=jetson-orin-nano-image.img of=/dev/sdX bs=64M status=progress conv=fsync
sync
Warning: dd can destroy the wrong disk if you choose the wrong device path.
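Before running dd, it can help to list only the whole disks that the kernel marks as removable. The sketch below is one way to do that, assuming a util-linux lsblk that supports JSON output (-J). The helper names removable_disks and list_removable_disks are hypothetical, and some USB card readers do not set the removable flag, so treat an empty result as "check manually" rather than "no card present".

```python
import json
import subprocess


def removable_disks(lsblk_json: str) -> list[str]:
    """Return /dev paths of whole disks flagged removable in `lsblk -J` output."""
    devices = json.loads(lsblk_json).get("blockdevices", [])
    found: list[str] = []
    for dev in devices:
        # Older lsblk versions emit "1"/"0" strings, newer ones emit booleans.
        is_removable = dev.get("rm") in (True, "1")
        if dev.get("type") == "disk" and is_removable:
            found.append("/dev/" + dev["name"])
    return found


def list_removable_disks() -> list[str]:
    """Run lsblk on this host and filter its output."""
    # -d lists whole devices only, so partitions such as sdX1 never appear.
    result = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,TYPE,RM,SIZE"],
        capture_output=True, text=True, check=True,
    )
    return removable_disks(result.stdout)
```

Calling list_removable_disks() on the host and comparing the result with the device you intend to pass to of= is a cheap sanity check. It does not replace reading the lsblk output yourself.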
First boot
Insert the flashed microSD card into the slot on the underside of the module.
Connect:
- DisplayPort monitor
- USB keyboard
- USB mouse
- Ethernet, if available
- 19V barrel power supply
The board powers on when power is connected.
During first boot, Ubuntu asks for:
- EULA acceptance
- language
- keyboard layout
- time zone
- network setup
- username and password
- login configuration
If the board drops into a UEFI shell, shows a black screen, or fails to boot Ubuntu, outdated firmware is one of the first things to suspect.
System verification
After login:
cat /etc/os-release
cat /etc/nv_tegra_release
dpkg-query -W nvidia-l4t-core
uname -a
lsblk
df -h
free -h
Open a second terminal and monitor runtime behavior:
sudo tegrastats
tegrastats is one of the most useful Jetson tools because it shows CPU activity, GPU activity, memory, thermals, and runtime behavior in one place.
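For logging during a benchmark run, the tegrastats output can be reduced to a few numbers per line. The parser below is a rough sketch, not an official format description: it assumes each line contains fields shaped like "RAM used/totalMB", "GR3D_FREQ N%", and "VDD_IN NmW", which should be checked against the output of your own L4T release because the layout changes between versions. The function name parse_tegrastats is hypothetical.

```python
import re
from typing import Optional

# Assumed field shapes; verify against real tegrastats output on your board.
_RAM = re.compile(r"RAM (\d+)/(\d+)MB")
_GPU = re.compile(r"GR3D_FREQ (\d+)%")
_VDD_IN = re.compile(r"VDD_IN (\d+)mW")


def parse_tegrastats(line: str) -> dict[str, Optional[int]]:
    """Extract RAM use, GPU load, and board input power from one line."""
    ram = _RAM.search(line)
    gpu = _GPU.search(line)
    vdd = _VDD_IN.search(line)
    # Missing fields become None instead of raising, since formats vary.
    return {
        "ram_used_mb": int(ram.group(1)) if ram else None,
        "ram_total_mb": int(ram.group(2)) if ram else None,
        "gpu_util_pct": int(gpu.group(1)) if gpu else None,
        "vdd_in_mw": int(vdd.group(1)) if vdd else None,
    }
```

Piping tegrastats into a small loop that calls this function per line gives a time series that can be stored next to the benchmark results.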
System update
Run normal package updates:
sudo apt update
sudo apt upgrade -y
sudo reboot
After reboot:
cat /etc/nv_tegra_release
dpkg-query -W nvidia-l4t-core
Avoid mixing random desktop CUDA, TensorRT, PyTorch, or container packages with JetPack-managed Jetson packages unless you know exactly how they line up with the installed L4T release.
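Because so much depends on the installed L4T release, it is convenient to read it programmatically before picking wheels or container tags. This sketch assumes the first line of /etc/nv_tegra_release has the usual "# R36 (release), REVISION: 4.3, ..." shape. The function names are hypothetical.

```python
import re
from typing import Optional

# Matches e.g. "# R36 (release), REVISION: 4.3, GCID: ..."
_L4T = re.compile(r"R(\d+) \(release\), REVISION: (\d+(?:\.\d+)*)")


def l4t_version(release_text: str) -> Optional[str]:
    """Turn the release line into a dotted version such as '36.4.3'."""
    match = _L4T.search(release_text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def read_l4t_version(path: str = "/etc/nv_tegra_release") -> Optional[str]:
    """Read the version from disk; returns None on non-Jetson hosts."""
    try:
        with open(path, encoding="utf-8") as handle:
            return l4t_version(handle.readline())
    except OSError:
        return None
```

The resulting string can be compared against the L4T tag of a container image or the release a wheel was built for.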
Basic developer tools
Install a practical baseline:
sudo apt install -y \
build-essential git curl wget htop nano vim \
python3-pip python3-venv python3-dev cmake pkg-config unzip
Optional but useful:
sudo apt install -y net-tools openssh-server tmux
Enable SSH:
sudo systemctl enable ssh
sudo systemctl start ssh
ip addr
Connect from another machine:
ssh username@jetson_ip_address
Performance mode setup
For benchmarking, use the official power supply, adequate cooling, and the highest supported performance mode that your thermal setup can actually sustain.
Check the current mode:
sudo nvpmodel -q
sudo nvpmodel -q --verbose
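JetPack 6.2 introduced new Super modes for supported Orin Nano configurations. Mode IDs differ between modules and JetPack releases, so read the ID of the mode you want from the verbose query instead of assuming that -m 0 is the highest-performance mode on every image (NVIDIA Technical Blog).
Set the desired high-performance mode:
# Replace 0 with the mode ID that nvpmodel reports for the mode you want.
sudo nvpmodel -m 0
sudo reboot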
After reboot:
sudo nvpmodel -q
Lock clocks for benchmark runs:
sudo jetson_clocks
sudo jetson_clocks --show
For reproducible results, always record the output of:
sudo nvpmodel -q
sudo jetson_clocks --show
cat /etc/nv_tegra_release
Jetson Stats
jtop is a convenient higher-level monitor for Jetson devices.
Install:
sudo -H pip3 install -U jetson-stats
sudo reboot
Run:
jtop
Useful values include CPU, GPU, RAM, swap, temperatures, power mode, fan behavior, and throttling state.
Optional NVMe setup
NVMe storage is strongly recommended for model files, Docker images, and benchmark datasets.
Check the drive:
lsblk
Assuming the drive appears as /dev/nvme0n1 and holds no data you need (the commands below erase it):
sudo parted /dev/nvme0n1 --script mklabel gpt
sudo parted /dev/nvme0n1 --script mkpart primary ext4 0% 100%
sudo mkfs.ext4 -F /dev/nvme0n1p1
sudo mkdir -p /mnt/nvme
sudo mount /dev/nvme0n1p1 /mnt/nvme
Get the UUID:
sudo blkid /dev/nvme0n1p1
Add it to /etc/fstab:
UUID=YOUR_UUID_HERE /mnt/nvme ext4 defaults,noatime 0 2
Test the mount:
sudo umount /mnt/nvme
sudo mount -a
df -h
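If the mount fails silently, anything written to /mnt/nvme ends up on the microSD card instead. A small guard like the following can run at the top of a download or benchmark script. It is a minimal sketch using only the Python standard library, and the function name mount_report is hypothetical.

```python
import os
import shutil


def mount_report(path: str) -> str:
    """Describe whether path is a mount point and how much space is free."""
    if not os.path.ismount(path):
        return f"{path} is not a mount point"
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    return f"{path} is mounted, {free_gib:.1f} GiB free"
```

For example, print(mount_report("/mnt/nvme")) before pulling a large model makes a missing mount obvious.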
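Create working folders. A freshly formatted filesystem is owned by root, so take ownership of the mount point first:
sudo chown "$USER" /mnt/nvme
mkdir -p /mnt/nvme/models
mkdir -p /mnt/nvme/hf-cache
mkdir -p /mnt/nvme/docker-data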
Move Hugging Face caches:
echo 'export HF_HOME=/mnt/nvme/hf-cache' >> ~/.bashrc
# TRANSFORMERS_CACHE is deprecated in recent transformers releases; HF_HOME alone is enough there.
echo 'export TRANSFORMERS_CACHE=/mnt/nvme/hf-cache' >> ~/.bashrc
source ~/.bashrc
Docker GPU runtime verification
JetPack includes NVIDIA container runtime integration for Jetson.
Check Docker:
docker --version
sudo docker info | grep -i runtime
Run a Jetson L4T base container:
sudo docker run --rm -it --runtime nvidia nvcr.io/nvidia/l4t-base:r36.2.0
Inside the container:
cat /etc/os-release
exit
If Docker breaks after updates, check the installed L4T version and whether the board is on a release with the Docker 28 compatibility fixes noted for 36.4.4 (NVIDIA Docs).
Running benchmarks
Recommended benchmark groups:
1. System baseline
2. Computer vision / TensorRT-style model benchmarks
3. Ollama local LLM smoke test
4. vLLM serving benchmark
5. Optional custom PyTorch CUDA benchmark
For every result, record:
Board:
JetPack version:
Jetson Linux / L4T version:
Power mode:
jetson_clocks status:
Storage:
Cooling:
Ambient temperature:
Model:
Precision:
Batch size:
Input size:
Average latency:
Throughput:
Power:
Temperature:
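One way to keep these fields consistent across runs is to store each result as a structured record. The dataclass below is a sketch that mirrors the checklist above. The class name, field names, and units are choices made for this example, and the measurement fields stay empty until real numbers exist.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BenchmarkRecord:
    # Configuration fields: known before the run starts.
    board: str
    jetpack_version: str
    l4t_version: str
    power_mode: str
    jetson_clocks: bool
    storage: str
    cooling: str
    model: str
    precision: str
    batch_size: int
    input_size: str
    # Measurement fields: filled in after the run, None until then.
    ambient_temp_c: Optional[float] = None
    avg_latency_ms: Optional[float] = None
    throughput: Optional[float] = None
    avg_power_w: Optional[float] = None
    peak_temp_c: Optional[float] = None

    def to_json(self) -> str:
        """Serialize the record so it can be appended to a results file."""
        return json.dumps(asdict(self), indent=2)
```

Writing one JSON object per run makes it easy to spot a result that was taken in a different power mode or on different storage.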
Benchmark 1: system baseline
Monitoring terminal:
sudo tegrastats
Configuration terminal:
sudo nvpmodel -q
sudo jetson_clocks --show
cat /etc/nv_tegra_release
free -h
df -h
Install sysbench:
sudo apt install -y sysbench
CPU test:
sysbench cpu --threads=6 --time=30 run
Memory test:
sysbench memory --time=30 run
This does not measure AI acceleration directly, but it does catch unstable setups quickly.
Benchmark 2: NVIDIA Jetson computer vision benchmarks
NVIDIA's jetson_benchmarks repository supports common models such as Inception V4, ResNet-50, OpenPose, VGG-19, YOLO-V3, Super Resolution, and UNet, and includes a benchmark CSV for Orin Nano (GitHub).
cd ~
git clone https://github.com/NVIDIA-AI-IOT/jetson_benchmarks.git
cd jetson_benchmarks
mkdir -p models
sudo sh install_requirements.sh
Download Orin Nano models:
python3 utils/download_models.py \
--all \
--csv_file_path benchmark_csv/orin-nano-benchmarks.csv \
--save_dir "$(pwd)/models"
Set benchmark clocks:
sudo nvpmodel -q
sudo jetson_clocks
Run all Orin Nano benchmarks:
sudo python3 benchmark.py \
--all \
--csv_file_path benchmark_csv/orin-nano-benchmarks.csv \
--model_dir "$(pwd)/models" \
--jetson_clocks
Run a single model:
sudo python3 benchmark.py \
--model_name resnet \
--csv_file_path benchmark_csv/orin-nano-benchmarks.csv \
--model_dir "$(pwd)/models" \
--jetson_clocks
Benchmark 3: Ollama LLM smoke test
Ollama is a simple way to do a first local-LLM sanity check.
Install:
curl -fsSL https://ollama.com/install.sh | sh
Check service status:
systemctl status ollama
Run a small model:
ollama run llama3.2:1b
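Example controlled timing test (--verbose prints Ollama's own timing statistics after the response, which separates model load from generation):
time ollama run --verbose llama3.2:1b "Write a short explanation of TensorRT for edge AI."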
In another terminal:
sudo tegrastats
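The wall-clock time from the shell includes model loading and prompt processing. For a generation rate, the Ollama HTTP API reports eval_count (generated tokens) and eval_duration (nanoseconds) in its non-streaming response. The sketch below assumes the default local endpoint on port 11434. The helper names are hypothetical.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation rate from eval_count and eval_duration (duration is in ns)."""
    return response["eval_count"] / response["eval_duration"] * 1e9


def generate(prompt: str, model: str = "llama3.2:1b",
             host: str = "http://127.0.0.1:11434") -> dict:
    """Send one non-streaming request to a locally running Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With the service running, tokens_per_second(generate("Explain TensorRT briefly.")) gives an output tok/s figure that excludes load time. Run it a few times and discard the first call, which usually pays the model-load cost.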
For larger models, NVMe is strongly recommended. The Orin Nano's shared 8GB memory means quantization, context length, and runtime choice matter a lot.
Benchmark 4: vLLM LLM serving benchmark
Jetson AI Lab documents a vLLM-based GenAI benchmarking flow for Jetson.
Set high-performance mode (use the mode ID your board reports for its highest mode, which is not always 0):
sudo nvpmodel -m 0
sudo jetson_clocks
Pull the container:
sudo docker pull ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
Start the container:
sudo docker run --rm -it \
--network host \
--shm-size=16g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--runtime=nvidia \
--name=vllm \
-v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin
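Serve a quantized model in the first shell, inside the container. An 8B model with 4-bit weights is close to the limit of the Orin Nano's 8GB shared memory, so if the server fails to start, try a smaller model or cap the context with --max-model-len:
vllm serve "RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16" \
--gpu-memory-utilization 0.8
Once the server reports that it is ready, open a second shell into the same container for the benchmark client:
sudo docker exec -it vllm bash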
Warm-up or single-user benchmark:
vllm bench serve \
--dataset-name random \
--model RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 \
--num-prompts 50 \
--percentile-metrics ttft,tpot,itl,e2el \
--random-input-len 2048 \
--random-output-len 128 \
--max-concurrency 1
Multi-user benchmark:
vllm bench serve \
--dataset-name random \
--model RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 \
--num-prompts 50 \
--percentile-metrics ttft,tpot,itl,e2el \
--random-input-len 2048 \
--random-output-len 128 \
--max-concurrency 8
Higher concurrency usually raises total throughput while also increasing per-request latency. Record both.
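The trade-off can be checked with simple arithmetic. If the server keeps all request slots busy, aggregate output throughput is roughly concurrency times output length divided by mean end-to-end latency. The function below is a back-of-the-envelope sketch, and the numbers in the example are made up for illustration, not measurements from this board.

```python
def aggregate_output_tps(concurrency: int, output_tokens: int,
                         mean_e2e_s: float) -> float:
    """Approximate total output tok/s when `concurrency` requests run at once."""
    return concurrency * output_tokens / mean_e2e_s


if __name__ == "__main__":
    # Hypothetical values: latency grows 3.2x while throughput grows 2.5x.
    single = aggregate_output_tps(1, 128, 10.0)
    multi = aggregate_output_tps(8, 128, 32.0)
    print(f"1 user: {single:.1f} tok/s, 8 users: {multi:.1f} tok/s")
```

Comparing this estimate with the throughput that the benchmark reports is a quick way to notice when the server was not actually saturated.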
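Benchmark 5: simple PyTorch CUDA check
Save the following as torch_cuda_check.py:
import time
from typing import Final

import torch


def main() -> None:
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if not torch.cuda.is_available():
        return
    device: Final[torch.device] = torch.device("cuda")
    x: torch.Tensor = torch.randn((4096, 4096), device=device)
    y: torch.Tensor = torch.randn((4096, 4096), device=device)
    # Warm-up run so one-time CUDA initialization is not timed.
    _ = x @ y
    torch.cuda.synchronize()
    start: float = time.perf_counter()
    z: torch.Tensor = x @ y
    # CUDA kernels launch asynchronously; wait before stopping the clock.
    torch.cuda.synchronize()
    elapsed: float = time.perf_counter() - start
    print("Result shape:", z.shape)
    print(f"Matmul time: {elapsed * 1000:.1f} ms")


if __name__ == "__main__":
    main()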
Run:
python3 torch_cuda_check.py
If CUDA is unavailable, avoid random desktop CUDA packages. Prefer Jetson-specific wheels, containers, or JetPack-compatible package sources.
Result interpretation
For computer vision models, FPS alone is not enough. Also record:
- input resolution
- precision
- preprocessing cost
- postprocessing cost
- batch size
- camera or video I/O overhead
- power mode
- thermal behavior
For LLMs, the most useful metrics are:
TTFT: Time to first token
TPOT: Time per output token
ITL: Inter-token latency
Output tok/s: Generated token throughput
E2E latency: Full request latency
Peak memory: Whether the model fits comfortably
Power: Efficiency during inference
Temperature: Thermal stability during sustained load
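To make the timing metrics concrete, the sketch below derives them from the moment a request was sent and the arrival time of each streamed token. It is an illustration under simple assumptions rather than the definition any particular tool uses: output tok/s here is tokens divided by full request time, while some tools exclude the time to first token. The names RequestMetrics and request_metrics are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RequestMetrics:
    ttft_s: float       # time to first token
    mean_itl_s: float   # mean gap between consecutive tokens
    e2e_s: float        # send time to last token
    output_tps: float   # generated tokens per second over the whole request


def request_metrics(sent_at: float, token_times: list[float]) -> RequestMetrics:
    """Compute per-request latency metrics from token arrival timestamps."""
    if not token_times:
        raise ValueError("need at least one token timestamp")
    ttft = token_times[0] - sent_at
    e2e = token_times[-1] - sent_at
    # Gaps between neighbouring tokens; empty when only one token arrived.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_itl = sum(gaps) / len(gaps) if gaps else 0.0
    return RequestMetrics(ttft, mean_itl, e2e, len(token_times) / e2e)
```

Timestamps from time.perf_counter() taken in a streaming client loop are a suitable input.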
For edge-AI comparisons, report both performance and efficiency:
Performance:
  FPS
  tok/s
  latency
  throughput
Efficiency:
  FPS/W
  tok/s/W
  joules per inference
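The efficiency figures follow directly from throughput and average power. The helpers below are a minimal sketch: average power is assumed to be in watts (tegrastats reports milliwatts, so divide by 1000 first), and the example values are placeholders, not measured results.

```python
def fps_per_watt(fps: float, avg_power_w: float) -> float:
    """Frames per second delivered for each watt of average power."""
    return fps / avg_power_w


def joules_per_inference(fps: float, avg_power_w: float) -> float:
    """Energy spent on one inference."""
    # Watts are joules per second, so J/inference = (J/s) / (inferences/s).
    return avg_power_w / fps


if __name__ == "__main__":
    # Placeholder numbers: 100 FPS at 10 W average power.
    print(fps_per_watt(100.0, 10.0))          # 10.0
    print(joules_per_inference(100.0, 10.0))  # 0.1
```

The same functions work for LLM runs by passing output tok/s instead of FPS, which yields tok/s/W and joules per token.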
Troubleshooting
JetPack 6.x SD card does not boot
Most likely cause:
Outdated QSPI firmware
Recommended fix:
Boot a JetPack 5.x image first.
Apply the QSPI firmware update.
Then boot JetPack 6.x again.
Display does not work
Use DisplayPort when possible. If HDMI is required, use a known-good DisplayPort-to-HDMI adapter because the board uses DisplayPort, not native HDMI.
Docker GPU runtime fails
Check:
docker --version
sudo docker info | grep -i runtime
cat /etc/nv_tegra_release
Benchmark results look too slow
Check:
sudo nvpmodel -q
sudo jetson_clocks --show
sudo tegrastats
Common causes:
Not running in the intended high-performance mode
jetson_clocks not enabled
Thermal throttling
Slow microSD storage
Model too large for available memory
CPU fallback instead of GPU execution
Incorrect container or package version
Background desktop load
Ollama or vLLM falls back to CPU
Watch tegrastats. If the GPU stays quiet and CPU load spikes, the runtime probably is not using the accelerated path you expected.
Out of memory
Possible fixes:
Use a smaller model
Use a quantized model
Reduce context length
Reduce concurrency
Move caches and models to NVMe
Use swap only as a last-resort stability tool
If swap is used during benchmarking, document it clearly because it can make the results misleading.
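A rough memory estimate shows why quantization and context length matter on an 8GB board. The sketch below counts only weights and KV cache and ignores runtime overhead, activations, and quantization metadata, so treat the result as a lower bound. The Llama 3.1 8B shape used in the example (32 layers, 8 KV heads, head dimension 128) is an assumption that should be checked against the model's config.json.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a given parameter count and width."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


def kv_cache_gib(tokens: int, layers: int, kv_heads: int, head_dim: int,
                 bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GiB for `tokens` cached positions."""
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * tokens / 2**30


if __name__ == "__main__":
    print(f"8B weights at 4 bits: {weight_gib(8, 4):.2f} GiB")
    print(f"KV cache, 2048 tokens, FP16: {kv_cache_gib(2048, 32, 8, 128):.2f} GiB")
```

Since the CPU, GPU, desktop, and runtime all share the same 8GB, an estimate that already approaches the total is a sign to pick a smaller model or a shorter context before reaching for swap.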
Recommended benchmark report format
# Jetson Orin Nano Benchmark Report
## System
| Item | Value |
| --- | --- |
| Board | Jetson Orin Nano Developer Kit |
| Memory | 8GB LPDDR5 |
| JetPack | TBD |
| L4T / Jetson Linux | TBD |
| Storage | microSD / NVMe |
| Power Mode | MAXN SUPER / 25W / 15W |
| jetson_clocks | Enabled / Disabled |
| Cooling | Stock / modified |
| Ambient Temp | TBD |
## Computer Vision Benchmarks
| Model | Input | Precision | FPS | Avg Power | FPS/W | Peak Temp |
| --- | ---: | --- | ---: | ---: | ---: | ---: |
| ResNet-50 | 224x224 | TBD | TBD | TBD | TBD | TBD |
| YOLO | TBD | TBD | TBD | TBD | TBD | TBD |
| UNet | 256x256 | TBD | TBD | TBD | TBD | TBD |
## LLM Benchmarks
| Model | Runtime | Quantization | Concurrency | Output tok/s | TTFT | ITL | Peak RAM | Peak Temp |
| --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| llama3.2:1b | Ollama | default | 1 | TBD | TBD | TBD | TBD | TBD |
| Llama 3.1 8B | vLLM | W4A16 | 1 | TBD | TBD | TBD | TBD | TBD |
| Llama 3.1 8B | vLLM | W4A16 | 8 | TBD | TBD | TBD | TBD | TBD |
Final setup flow
1. Flash microSD card.
2. Update firmware if JetPack 6.x does not boot.
3. Boot JetPack 6.x.
4. Update system packages.
5. Install developer tools.
6. Enable SSH.
7. Configure the desired high-performance mode.
8. Enable jetson_clocks for benchmarks.
9. Verify behavior with tegrastats or jtop.
10. Move models and caches to NVMe if available.
11. Validate Docker GPU runtime.
12. Run jetson_benchmarks.
13. Run an Ollama smoke test.
14. Run a vLLM benchmark for serving metrics.
15. Record latency, throughput, power, and thermal data.
The most important setup requirement on Jetson is software-stack consistency: firmware, JetPack, CUDA, TensorRT, Docker, PyTorch, and model runtime versions need to line up. Once the board is stable, the most useful comparisons are sustained performance per watt under realistic thermal and memory constraints, not just isolated peak FPS or token throughput.