Let’s Backup!

Once you have your Jetson all set up the way you like it, back it up!

Nuts and Bolts

If you are only interested in a backup method, we’ve written some scripts that use the Linux rsync command line utility to back up the root directory of a Jetson to another directory. Typically the other directory is on a different drive.

You will need some backup medium: network attached storage, another computer, or an external drive. A relatively inexpensive way to get started is to use a USB drive:

Western Digital 8TB (other sizes available): https://amzn.to/2ZLSijf
Seagate 8TB (other sizes available): https://amzn.to/2OLXtZW

These are 5400 RPM disk drives generally meant to be used for archiving data.

The scripts are located in the JetsonHacks account on GitHub in the backupJetson repository. There are basic instructions there, though you may find it useful to watch the video. These are deliberately simple scripts, so you may want to tailor them to your needs. rsync provides a very large number of configuration parameters, and you may want to choose different ones than those used in the scripts.
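
As a rough illustration of what such a script does, here is a minimal Python sketch that assembles an rsync command line for backing up the root directory. This is not the code from the backupJetson repository. The function name, the destination path and the exclude list are assumptions made up for this example. The flags used (-a for archive mode, -A for ACLs, -X for extended attributes, --delete and --exclude) are standard rsync options, and you may well want a different set.

```python
import shlex

# Hypothetical exclude list: virtual filesystems, temp files and mount points.
# This is an assumption for illustration, not the list the backupJetson scripts use.
DEFAULT_EXCLUDES = [
    "/proc/*", "/sys/*", "/dev/*", "/tmp/*",
    "/run/*", "/mnt/*", "/media/*", "/lost+found",
]

def build_rsync_command(source, destination, excludes=DEFAULT_EXCLUDES, delete=True):
    """Return the rsync argument list for copying source into destination."""
    command = ["rsync", "-aAX"]
    if delete:
        # Remove files from the destination that no longer exist in the source.
        command.append("--delete")
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    command += [source, destination]
    return command

if __name__ == "__main__":
    # The destination below is a made-up mount point for an external drive.
    cmd = build_rsync_command("/", "/media/backup/jetson-rootfs")
    print(shlex.join(cmd))
```

The sketch only prints the command. To actually run it you could pass the list to subprocess.run(cmd, check=True) from a process with root privileges, after checking that the destination really is the drive you intend to write to.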

As an alternative, you can use a GUI front end to rsync, such as Back In Time, as demonstrated in the video.

In any case, reading the rest of the article below will help put everything in context. Here’s the money quote:

If you cannot regenerate a system from scratch, you do not have a system. You have a pain train that is on the tracks coming towards you. When will it arrive? One of the few guarantees in life. The pain train will arrive when you least expect it, when it is the most costly, and when it will hurt the most. That is right before the big demo, or a big project is due, or some other time when your system absolutely has to work.

Background

Let’s go over some of the ways that people handle backups in a professional environment. The idea here is that if we understand why these backup procedures are in place, we can tailor them to suit our needs.

Backups are a mind-numbingly boring subject that can elicit sheer terror when they are not present, or don’t work. Most people who have used computers for any amount of time encounter an “Alt-Shift” moment when they accidentally delete important information or a system update puts their system into an unusable state.

Think of backups as insurance. The amount of insurance you take depends on what type of loss you are trying to protect yourself from. We usually think about backups in terms of either time or loss.

When we look through the time lens, we ask, “How long does it take me to get back to the point before I needed to restore my system?” Also, “Does restoring my system get me back to where I can start working again?” Sometimes backups bring you back to a point where the system will just corrupt itself again when restored, say in the case of a system upgrade.

When we look at backups through a loss lens, we think about data that is difficult or impossible to recreate or replace. Typically this is data that has been uniquely gathered or created, some common examples being the pictures on your phone, spreadsheets you create, documents that you have written, presentations you make and so on.

This idea helps us bin the data as to whether it is unique or if it is common. Unique data is what we have created, unique to us. On the other hand, common data is information that we can gather from other sources. While our system relies on common data, there are always copies of this data available so that we can get it from another source. Large data sets that are available on the Internet are examples of common data, a prime example being machine learning inferencing models.

Temporary Data

Another type of data that is stored on systems is temporary or cached data, which you can think of as working products. This can be a significant amount of data, but you do not think of it as ‘valuable’ in the sense that you can always recreate it from source material.

In more concrete terms, when creating the video above, the cache to render the video is around 100 gigabytes. As we will explain later, we actually need to keep 3 copies of the data, so that ends up being ~300GB per video. We’ve done about 250 videos on the JetsonHacks YouTube channel, so that would end up around 75 Terabytes of data.
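
The arithmetic behind that estimate can be written out as a short Python sketch. The figures are the approximate ones quoted above, and the conversion assumes 1 TB = 1000 GB.

```python
CACHE_GB_PER_VIDEO = 100  # approximate render cache size per video
COPIES = 3                # production copy plus two backups
VIDEOS = 250              # approximate number of videos on the channel

def total_cache_terabytes(cache_gb=CACHE_GB_PER_VIDEO, copies=COPIES, videos=VIDEOS):
    """Total storage needed in terabytes, assuming 1 TB = 1000 GB."""
    return cache_gb * copies * videos / 1000

print(total_cache_terabytes())  # 75.0
```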

But this data is not valuable in any meaningful sense; it is a by-product of rendering the video. That’s why people go through a multi-step process to archive information. When a project is done, they remove the caches and such, then archive their project on secondary storage.

We don’t actually save the caches from the video, of course. While people will tell you that “data storage is free”, ordering 75 Terabytes of disk storage from Amazon results in a good-sized bill. Programs that back up data provide ways to exclude directories such as temp directories or directories holding cache information.

Types of Backup

Most of the time we think about backups in a few different ways.

Full System
- You can think of this as a snapshot in time
- Desktop systems have dedicated programs to do this, for example Macintosh has Time Machine
- On a Jetson, this might be thought of as the base L4T system + the programs that you run on your machine, like machine learning and trained models
- Differential and Incremental backups
  - Differential backups contain the files that have changed since the last full backup
  - Incremental backups contain the files that have changed since the last backup of any kind, whether full or incremental
- This can be automated: back up at a set interval (minutes, hours, days, weeks) or at system startup
- Most expensive
Data backup
- You may have data that you want to keep safe and accessible
- Separate from the system software
- Generally unique to your system, for example saved images, videos or other gathered information
- May be irreplaceable
Developer backups (programming)
- Generally this includes the source code and associated build information, data and documentation
- Versioned, so you can keep track of changes that are made
  - Especially important in a group programming environment
- Usually has a separate formal system for this, such as Git or Subversion
- There are easy ways to do this on a personal level, such as GitHub
- Some programming environments have built in support
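
The difference between differential and incremental backups comes down to which reference time you compare against. Here is a minimal Python sketch of that idea. The file names and timestamps are hypothetical, and a real backup tool tracks far more than modification times.

```python
def changed_since(files, reference_time):
    """Return the sorted names of files modified after reference_time."""
    return sorted(name for name, mtime in files.items() if mtime > reference_time)

def differential(files, last_full_backup):
    """Files changed since the last FULL backup."""
    return changed_since(files, last_full_backup)

def incremental(files, last_backup_of_any_kind):
    """Files changed since the most recent backup, full or incremental."""
    return changed_since(files, last_backup_of_any_kind)

# Hypothetical modification times in arbitrary units.
files = {"model.onnx": 5, "notes.txt": 12, "capture.mp4": 18}
last_full = 10         # full backup taken at t=10
last_incremental = 15  # incremental backup taken at t=15

print(differential(files, last_full))        # ['capture.mp4', 'notes.txt']
print(incremental(files, last_incremental))  # ['capture.mp4']
```

A differential set grows until the next full backup, while each incremental set stays small. The trade-off is that restoring from incrementals needs the full backup plus every incremental taken since.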

3-2-1 Rule

In most professional environments, the physical aspect of storing backup information is referred to as “The 3-2-1 rule”:

Keep at least three copies of your data
- Original copy and at least two backups
Keep the backed-up data on two different storage types
- The data is less likely to be corrupted when on two different types of storage
Keep at least one copy of the data offsite
- A local disaster (like a fire!) could ruin your backups
- This is easier now because of cloud backup
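
The rule is simple enough to express as a checklist. The following Python sketch tests a backup plan against the three conditions. The class, field names and media labels are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class BackupCopy:
    name: str
    media: str     # hypothetical label, e.g. "nvme", "usb-hdd", "cloud"
    offsite: bool

def satisfies_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 storage types, with >= 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite

plan = [
    BackupCopy("production", "nvme", offsite=False),
    BackupCopy("local backup", "usb-hdd", offsite=False),
    BackupCopy("cloud backup", "cloud", offsite=True),
]
print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: only two copies, none offsite
```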

Each approach has a different cost in terms of storage space, time and effort. Here are the bullet points:

Storage Space
- Money attached – Drives cost money!
- Hardware – Takes up physical space, and you need to wire it
- Different types: Local drives, network storage, cloud storage
- Organization – keeping track of data can be a challenge in and of itself
- You need to make at least 3 copies – Production (the data on the computer), Local (a copy of production), and Offsite (another copy of production). Offsite probably means cloud storage
Time
- How much of our time is needed, and how much computer time?
- Our time – Initial Setup – backup programs/commands/tests
- Computer time – How long does it take to make a backup, or an incremental backup?
- Manual or automated? If it’s not automated, it may be skipped
Effort
- How much do you need to know to make a backup?
- What do you have to do to start a backup?
- If it’s too hard, you won’t make backups frequently

Jetson Backup

Why embedded systems are a little different than a desktop:

- More susceptible to hardware failures or experiments that go bad
- Usually depend on flash memory (eMMC or SD card) which can be relatively unreliable (SD cards especially)
- Jetson in particular has a different drive layout than other systems, with several different partitions

Different Categories of Users

Developers

If you are a developer you always just assume that something will break catastrophically, and that you will need to regen a system. Most developers have enough experience working at a system level that they know if you make a mistake, the system can become unstable.

As part of the testing regimen, developers typically will have different versions of an operating environment that they need to support. For example on the Jetson, they may have an environment for the JetPack 4.3 and another for the JetPack 4.4 version. That’s why professional developers don’t get excited about new releases, because it means more things that they have to keep track of and more work to bring everything forward.

Typically for a major release, the developers will gen up a new system and rebuild the system from scratch just to make sure everything works as expected.

With that said, developers will usually create a base system with their environment modifications (like their programming environment, data sets and so on), and then make a full system backup. The system is then backed up periodically (depending on the place, usually once every day or two). Some places will make local backups more frequently (let’s say every hour), and then ripple that to more permanent storage less frequently.

Typically a developer will have some plan for backing up their work so they won’t lose more than half a day’s worth of work or so if things go terribly wrong.

Remember this is for active developers, people who are making changes to their system day in and day out. Also, they may be changing the way that the overall system works.

To restore the system, it’s pretty simple. Find the last backup snapshot, restore the system and try to piece back everything together since that point in time.

Normal People

For normal people who are not actively trying to destroy, I mean, improve their system like a developer, backups are usually thought of in a different manner. In a business that is collecting data, let’s say an accounting system, the accounting software will organize the data collection so that it is backed up as part of the process of collection. This is typical of most database software applications, where you will hear terms like audit trails and journaling. In most cases, the data is gathered over a network type of application with the actual data being stored on network attached storage. People have begun running these types of applications over the Internet, with most of the data being stored in the cloud.

Even if data is stored in the cloud, remember the 3-2-1 rule. The information is downloaded on a schedule so that the information can be stored locally.

Usually in this situation there is a full system backup that can restore all of the application software and configurations on the local computer. In a separate step, you then retrieve the data from a backup data store and then you’re ready to get back to work. In most places, there is an IT or system administrator person that handles this procedure.

Yeah, but what’s a happy balance?

That’s great and everything, but what do we do on something like a Jetson? At JetsonHacks we are developers. Most of our development and programming changes go into a version control system, Git. Typically we create a “system environment” which has a base L4T version along with the programming tools and data sets we need. Then we make a backup, so that if things go south then we have a stable base to work from.

We also make backups of the data sets we are working on regularly. Thus, restoring the system consists of restoring the base system environment that we backed up, and then adding the data sets and the Git repositories of the source code and scripts for our project. Remember, these data sets and the Git repositories are also from backups. We pull from three backup silos so to speak.

To be clear, whenever a new L4T is released, we build a new “system environment” from scratch. The “system environment” gets backed up. We then add in our data sets and Git source. There are some inevitable hiccups in this procedure, usually due to library version mismatches. However, it is reliable and it is rare that we have an “Alt-Shift” moment.

A Recipe

If you are not a developer, you will benefit from organizing your system before backups. Create a base “system environment” which includes all of the applications and libraries that you want to use. Then make a backup.

Be sure to keep track of what your system environment contains, because you will need to recreate it from scratch at some point.

Whenever you add another essential program that you know you want in your system environment, make another backup.

Now, if you keep the specialized data you use in specific directories, then you only need to back up those directories on a regular basis. Even if you have a catastrophic failure, you only need to restore a system environment and add your last data backup. Then you are up and running again.
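
As one way to picture the data-only step, here is a minimal Python sketch that copies a list of data directories to a backup location while skipping cache and temp entries. It uses shutil.copytree from the standard library rather than rsync, so it does not preserve ownership or extended attributes. The function name, directory names and ignore patterns are assumptions for illustration.

```python
import shutil
import tempfile
from pathlib import Path

def back_up_data_dirs(data_dirs, backup_root, ignore=("cache", "tmp", "*.tmp")):
    """Copy each data directory into backup_root, skipping cache and temp entries."""
    backup_root = Path(backup_root)
    copied = []
    for src in map(Path, data_dirs):
        dst = backup_root / src.name
        # dirs_exist_ok lets a later run copy over an existing backup (Python 3.8+).
        shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*ignore), dirs_exist_ok=True)
        copied.append(dst)
    return copied

if __name__ == "__main__":
    # Demonstration in a throwaway directory; all names here are made up.
    with tempfile.TemporaryDirectory() as scratch:
        data = Path(scratch) / "datasets"
        (data / "images").mkdir(parents=True)
        (data / "cache").mkdir()
        (data / "images" / "frame0.jpg").write_text("image data")
        (data / "cache" / "thumbs.bin").write_text("regenerable")
        back_up_data_dirs([data], Path(scratch) / "backup")
        backed_up = Path(scratch) / "backup" / "datasets"
        print(sorted(p.name for p in backed_up.iterdir()))  # ['images']
```

Note that copying over an existing backup this way never deletes files that were removed from the source. rsync with its --delete option, or a tool like Back In Time, handles that case.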

Just to be clear …

Backups are a surprisingly deep subject. The above are just some suggestions on handling backing up your data. In fact there is an entire industry that has sprung up around saving computer data.

It is not possible to cover in a short article “what you should do” when making computer backups, as everyone has their own special situation. You will do best by reading some background material, and then deciding what best fits your situation.

The more organized you are, the easier the task becomes. If your data is spread out all over the place, programs and libraries added without much thought and so on, you steer yourself towards a full backup solution. That’s not bad, but it requires more resources in physical drive space and time.

On the other hand, if you only need data backups and make full system backups when you add your programs, you cut down a lot on the number of backups that you make and the resources required.

Backup when you place your rootfs on a USB drive

Looky here:

7 Responses

Kaisar Khatak says:
July 29, 2020 at 6:42 pm
Hi. Good Post.
I was looking at the backupJetson/backup-rootfs.sh script and saw that the date was appended to the backup filename. I think this would make rsync delete parameter ineffective for future backups. I think that a cron job with rsync should suffice.
Also, I am not sure how many backups are necessary. If we keep 2 backups, the next time the backup script runs (the 3rd time), it might want to delete the oldest backup before creating the latest backup snapshot.
Thoughts?
1. kangalow says:
  July 31, 2020 at 11:55 am
  The scripts are meant for a one time backup of a system with the rootfs on a separate drive such as NVMe or USB. This is a pretty typical use case in embedded systems.
  If you are using cron jobs, there is a different protocol for using rsync effectively. To be clear, if you already know about how to use cron jobs, you also are capable of using rsync “for real” in a scheduled manner. This article is more along the lines of, “Hey, backup your stuff. Don’t really care how, but here are some breadcrumbs to get started. Here’s a script for making a backup today. Also, Google rsync to figure out how to use it.”
  One thing to think about is that embedded systems are usually treated differently than desktop systems. Desktops usually have a backup strategy to be handled by IT staffs at most companies. Most embedded systems are handled by the developers. It’s easy if you’re developing for the embedded system on a desktop machine, it gets taken care of automagically. If you are actually developing on the embedded system, then it’s a different game.
  Part of the equation is resource management. If you are doing incremental backup of everything, the storage requirements are not too bad. If you are gathering large amounts of data, a better strategy may be to off board/backup the data rather than the entire system if it remains relatively static. If you look in Back in Time it allows you to select how many backups you want.
  Backups are a surprisingly deep subject; every place has a different approach. The major point tends to be how much time people are willing to spend to get from a known restoration point back to even. Typically small shops tend to settle on something less than half a day, in other words they’re willing to lose something like a morning’s worth of work.
  Large shops (imagine backing up a few hundred/thousand machines) tend to have much more involved procedures. Because it’s real money they may take a different approach. Some are willing to let developers struggle to get back to even (usually working off of a 1 week old backup), others are more generous and pretty much journal everything so that the developer is up and running quickly.
  This usually means that there is a prepared image with the apps/development tools/setup from a restore point. Once that is restored, then the developer updates from when the restore point was taken. That typically means pulling from a code versioning system, (something like git or subversion), recompiling the code, loading the latest data set, and you’re on your way. Most developers do some type of automated commit to the versioning system, so that they don’t lose a whole lot if things go bad.
  Like I said, deep subject. Very dependent on how you use your system. It’s also clear why there are multi-million dollar companies that have sprung up to help with this task.
  For people not doing development, it basically comes down to data changes which usually use a different backup method. Collect the data, process the data, and then distribute the data in some type of form that gets backed up.
John Console says:
July 29, 2020 at 11:37 pm
this is great. when would you cover restore procedures?
gino says:
November 20, 2021 at 8:58 pm
hi sir,
I have some of the jetson computers. I have to install my custom programs to all of the jetsons.
Is it correct to use this script to make the copy of the os so all the computers would have same programs and settings?
1. kangalow says:
  November 20, 2021 at 10:04 pm
  No. This is for backing up one machine. For multiple Jetsons, you should use the NVIDIA Jetson tools provided with the SDK Manager. You can ask for more help on the official NVIDIA Jetson forums, where a large group of developers and NVIDIA engineers share their experience: https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70
Torben Andersen says:
June 30, 2025 at 8:13 am
This is an old post, so maybe the original poster is not reading it anymore, but I’ll take the chance. I am a beginner, when it comes to Jetsons. I imagine two possible types of backups. One is a backup of all the files found in the rootfs and (if I understand it correctly) is useful if someone deletes something by mistake or needs an earlier version of one or more files. The other type of backup, is a backup of everything, including system etc. This is what you will need, if a lightning hits your house and the Jetson gets toasted.
I think that the approach described above is of the first type, whereas I (and many others?) need to be able to restore everything after a major catastrophe. I have spent much time installing stuff on my Jetson Nano Orin, and I would hate to begin all over again after, say, a lightning strike. I can’t help thinking of Windows computers: Just keep a clone of your HD and you are always good to go again. Any suggestions?
1. kangalow says:
  June 30, 2025 at 1:53 pm
  I believe that the two scenarios you lay out are accurate. However, I think the perspective might need adjustment. The Jetson is an embedded system, so it is different than a PC in the way that the drives are laid out. Additionally, there is firmware (which is usually referred to as QSPI) which is in chips on the device itself, not on the drive.
  There are ~ 15 partitions on a Jetson drive. The APP partition, which is where the rootfs resides, is typically the only interesting part of the drive for backup purposes. The other 14 partitions hold a variety of information, mostly having to do with the embedded nature of the device. You could make a backup of those partitions (a true “disk image”) but it doesn’t buy you much.
  Now the thing to know is that the backup is keyed to that particular hardware. It’s unlikely you would be able to get a new Jetson and simply copy over the APP partition from the old machine and have it work reliably. There’s a lot of security concerns and all that which make it very tricky to rely on.
  Typically embedded engineers handle this by making sure that their code is version controlled (Github or whatever), and create scripts to build the system from scratch. If you can’t recreate an embedded development system from scratch, you don’t have a system.
  The backup can still be used for recovering data, or trying to figure out why your new system isn’t quite right. Of course, a better path is to have a plan to actually back up the data periodically. You could always use the built-in Ubuntu tools too.
  It’s a different process for production engineering. I’ll skip that, because it’s painful to even talk about it.
  In short, it’s more difficult than on a PC or Mac. As usual, the first step is to know you have a problem, and then figure out how much pain you are willing to endure. For me, what I usually do is create and build my code in Github. On an active project, I copy the directories I’m working in to a NAS, and possibly to cloud storage if it’s really important. If the drive goes down, which will happen occasionally because development, then it’s usually pretty simple to re-flash to the base and copy over the work directories or clone the Github repositories, etc. If you know you’re about to kill an install, the full APP backup is comforting so you’ll only lose an hour or so.
  But that’s the difference between PC programming and embedded systems. Embedded system programming generally tends to break the entire system. With embedded systems you have to assume that something bad happens and the entire system needs to be recreated. To make up for it, assume that it will happen at the most inconvenient time. I know it’s not a very rosy picture, but that’s my experience. Thanks for reading!

Disclaimer

Some links here are affiliate links. If you purchase through these links I will receive a small commission at no additional cost to you. As an Amazon Associate, I earn from qualifying purchases.
