Once you have your Jetson all set up the way you like it, back it up!
Nuts and Bolts
If you are only interested in a backup method, we’ve written some scripts that use the Linux rsync command line utility to back up the root directory of a Jetson to another directory. Typically the other directory is on a different drive.
You will need some backup medium: network attached storage, another computer, or an external drive. A relatively inexpensive way to get started is to use a USB drive:
- Western Digital 8TB (other sizes available): https://amzn.to/2ZLSijf
- Seagate 8TB (other sizes available): https://amzn.to/2OLXtZW
These are 5400 RPM disk drives generally meant to be used for archiving data.
The scripts are located in the JetsonHacks account on GitHub in the backupJetson repository. There are basic instructions there, though you may find it useful to watch the video. These are overly simple scripts; you may want to tailor them to your needs. rsync provides a very large number of configuration parameters, and you may want to choose different ones than those in the scripts.
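To give a feel for the kind of rsync parameters involved, here is a minimal Python sketch that only assembles an rsync command line; it does not run anything. This is not the content of the backupJetson scripts. The destination path, the exclude list, and the function name are assumptions made for illustration, and the flags (`-aAXH`, `--numeric-ids`, `--one-file-system`, `--delete`, `--dry-run`, `--exclude`) are standard rsync options that you should check against `man rsync` before relying on them.

```python
import shlex

# Hypothetical exclude list: runtime and temporary directories that are
# recreated at boot, plus mount points where the backup drive may live.
DEFAULT_EXCLUDES = (
    "/proc/*", "/sys/*", "/dev/*", "/run/*",
    "/tmp/*", "/mnt/*", "/media/*", "/lost+found",
)


def build_rsync_command(source="/", destination="/mnt/backup/rootfs",
                        excludes=DEFAULT_EXCLUDES, dry_run=True):
    """Return an rsync argument list that mirrors source into destination."""
    cmd = [
        "rsync",
        "-aAXH",              # archive mode, plus ACLs, xattrs and hard links
        "--numeric-ids",      # keep numeric uid/gid instead of mapping names
        "--one-file-system",  # do not cross into other mounted filesystems
        "--delete",           # remove files from the copy that left the source
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, copy nothing
    cmd += [f"--exclude={pattern}" for pattern in excludes]
    # A trailing slash tells rsync to copy the directory contents.
    src = source if source.endswith("/") else source + "/"
    cmd += [src, destination]
    return cmd


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it by hand.
    print(shlex.join(build_rsync_command()))
```

The sketch defaults to a dry run on purpose: `--delete` removes files from the destination, so it is worth reading the printed command and trying it with `--dry-run` first.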
As an alternative you can use a GUI front end to rsync such as Back In Time as demonstrated in the video.
In any case, reading the rest of the article below will help put everything in context. Here’s the money quote:
Let’s go over some of the ways that people handle backups in a professional environment. The idea here is that if we understand why these backup procedures are in place, we can tailor them to suit our needs.
Backups are a mind-numbingly boring subject that can elicit sheer terror when they are not present or don’t work. Most people who have used computers for any amount of time encounter an “Alt-shift” moment when they accidentally delete important information or a system update puts their system into an unusable state.
Think of backups as insurance. The amount of insurance you take depends on what type of loss you are trying to protect yourself from. We usually think about backups in terms of either time or loss.
If we are looking through the time lens, we ask, “How long does it take me to get back to the point before I needed to restore my system?” and also, “Does restoring my system get me back to where I can start working again?” Sometimes a backup brings you back to a point where the system will just corrupt itself again when restored, say in the case of a system upgrade.
When we look at backups through a loss lens, we think about data that is difficult or impossible to recreate or replace. Typically this is data that has been uniquely gathered or created, some common examples being the pictures on your phone, spreadsheets you create, documents that you have written, presentations you make and so on.
This idea helps us bin the data as to whether it is unique or if it is common. Unique data is what we have created, unique to us. On the other hand, common data is information that we can gather from other sources. While our system relies on common data, there are always copies of this data available so that we can get it from another source. Large data sets that are available on the Internet are examples of common data, a prime example being machine learning inferencing models.
Another type of data that is stored on systems is temporary or cached data, which you can think of as working products. This can be a significant amount of data, but you do not think of it as ‘valuable’ in the sense that you can always recreate it from source material.
In more concrete terms, when creating the video above, the cache to render the video is around 100 gigabytes. As we will explain later, we actually need to make 3 copies of the data, so that ends up being ~300GB. We’ve done about 250 videos on the JetsonHacks YouTube channel, so that would end up around 75 Terabytes of data.
But this data is not valuable in any meaningful sense; it is a by-product of rendering the video. That’s why people go through a multi-step process to archive information. When a project is done, they remove the caches and such, then archive their project on secondary storage.
We don’t actually save the caches from the video, of course. While people will tell you that “data storage is free”, ordering 75 Terabytes of disk storage from Amazon ends up in a good sized bill. Programs that back up data provide ways to exclude directories such as temp directories or directories holding cache information.
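The cache arithmetic is easy to redo for your own projects. Here is a small Python sketch using the figures from the text (100 GB of cache per video, 3 copies, about 250 videos) and assuming decimal units, 1 TB = 1000 GB. The function and variable names are ours, purely for illustration.

```python
CACHE_GB_PER_VIDEO = 100  # render cache for one video, from the text
COPIES = 3                # production, local and offsite copies
VIDEOS = 250              # approximate number of videos, from the text


def total_storage_tb(cache_gb, copies, videos, gb_per_tb=1000):
    """Total storage in terabytes if every cache were kept in every copy."""
    return cache_gb * copies * videos / gb_per_tb


per_video_gb = CACHE_GB_PER_VIDEO * COPIES
total_tb = total_storage_tb(CACHE_GB_PER_VIDEO, COPIES, VIDEOS)
print(f"{per_video_gb} GB per video, {total_tb:g} TB overall")
# prints: 300 GB per video, 75 TB overall
```

Plugging in your own cache sizes makes it obvious which directories are worth excluding from a backup.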
Types of Backup
Most of the time we think about backups in a few different ways.
- Full System
  - You can think of this as a snapshot in time
  - Desktop systems have dedicated programs to do this, for example Macintosh has Time Machine
  - On a Jetson, this might be thought of as the base L4T system + the programs that you run on your machine, like machine learning and trained models
  - Differential and Incremental backups
    - Differential backups are files that have changed since the last full backup
    - Incremental backups are files that have changed since the last backup, be it a full or an incremental backup
  - This can be automated: back up every X amount of time (minutes, hours, days, weeks) or at system startup
  - Most expensive
- Data backup
  - You may have data that you want to keep safe and accessible
  - Separate from the system software
  - Generally unique to your system, for example saved images, videos or other gathered information
  - May be irreplaceable
- Developer backups (programming)
  - Generally this includes the source code and associated build information, data and documentation
  - Versioned, so you can keep track of changes that are made
  - Especially important in a group programming environment
  - Usually has a separate formal system for this, such as Git or Subversion
  - There are easy ways to do this on a personal level, such as GitHub
  - Some programming environments have built in support
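The difference between differential and incremental backups comes down to which point in time you compare against. Here is a minimal Python sketch of that idea, assuming we decide purely by file modification time (real backup tools may also use checksums or snapshots). The file names and dates are invented for illustration.

```python
from datetime import datetime


def changed_since(files, cutoff):
    """Return the names of files modified after cutoff, sorted."""
    return sorted(name for name, mtime in files.items() if mtime > cutoff)


def differential(files, last_full):
    """Everything changed since the last FULL backup."""
    return changed_since(files, last_full)


def incremental(files, last_backup):
    """Everything changed since the most recent backup of any kind."""
    return changed_since(files, last_backup)


# Hypothetical files and their last-modified times.
files = {
    "notes.txt": datetime(2020, 6, 1, 9, 0),
    "model.onnx": datetime(2020, 6, 3, 14, 0),
    "capture.mp4": datetime(2020, 6, 5, 8, 30),
}
last_full = datetime(2020, 6, 2)         # full backup taken on June 2
last_incremental = datetime(2020, 6, 4)  # incremental taken on June 4

print(differential(files, last_full))        # ['capture.mp4', 'model.onnx']
print(incremental(files, last_incremental))  # ['capture.mp4']
```

A differential set keeps growing until the next full backup but needs only two pieces to restore (the full backup plus the latest differential). An incremental set stays small, but a restore has to replay every increment since the full backup.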
In most professional environments, the physical storage of backups follows what is referred to as “The 3-2-1 rule”:
- Keep at least three copies of your data
  - Original copy and at least two backups
- Keep the backed-up data on two different storage types
  - The data is less likely to be corrupted when on two different types of storage
- Keep at least one copy of the data offsite
  - A local disaster (like a fire!) could ruin your backups
  - This is easier now because of cloud backup
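The 3-2-1 rule is simple enough to write down as a checklist in code. Here is a hedged Python sketch: the `Copy` class, the media labels, and the simplification of counting storage types across all copies (including the original) are our own assumptions, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    media: str     # free-form label, e.g. "emmc", "usb-hdd", "cloud"
    offsite: bool  # True if the copy lives away from the original


def check_321(copies):
    """Return a list of 3-2-1 rule violations (empty when the rule is met)."""
    problems = []
    if len(copies) < 3:
        problems.append("need at least three copies")
    if len({c.media for c in copies}) < 2:
        problems.append("need at least two storage types")
    if not any(c.offsite for c in copies):
        problems.append("need at least one offsite copy")
    return problems


# Hypothetical setup: the Jetson itself, a USB drive, and cloud storage.
plan = [
    Copy("production", "emmc", offsite=False),
    Copy("local backup", "usb-hdd", offsite=False),
    Copy("cloud backup", "cloud", offsite=True),
]
print(check_321(plan))      # []
print(check_321(plan[:2]))  # two complaints: copy count and offsite
```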
Each approach has a different cost in terms of storage space, time and effort. Here are the bullet points:
- Storage Space
  - Money attached – Drives cost money!
  - Hardware – Takes up physical space, and you need to wire it
  - Different types: Local drives, network storage, cloud storage
  - Organization – keeping track of data can be a challenge in and of itself
  - You need to make at least 3 copies – Production (the data on the computer), Local (a copy of production), and Offsite (another copy of production). Offsite probably means cloud storage
- How much of our time is needed, and how much computer time?
  - Our time – Initial Setup – backup programs/commands/tests
  - Computer time – How long does it take to make a backup, or an incremental backup?
  - Manual or automated? If it’s not automated, it may be skipped
- How much do you need to know to make a backup?
  - What do you have to do to start a backup?
  - If it’s too hard, you won’t make backups frequently
Why embedded systems are a little different than a desktop:
- More susceptible to hardware failures or experiments that go bad
- Usually depend on flash storage (eMMC or SD card), which can be relatively unreliable (SD cards especially)
- Jetson in particular has a different drive layout than other systems with several different partitions
Different Categories of Users
If you are a developer you always just assume that something will break catastrophically, and that you will need to regen a system. Most developers have enough experience working at a system level that they know if you make a mistake, the system can become unstable.
If you cannot regenerate a system from scratch, you do not have a system. You have a pain train that is on the tracks coming towards you. When will it arrive? One of the few guarantees in life. The pain train will arrive when you least expect it, when it is the most costly, and when it will hurt the most. That is right before the big demo, or a big project is due, or some other time when your system absolutely has to work.
As part of the testing regimen, developers typically will have different versions of an operating environment that they need to support. For example on the Jetson, they may have an environment for JetPack 4.3 and another for JetPack 4.4. That’s why professional developers don’t get excited about new releases, because it means more things that they have to keep track of and more work to bring everything forward.
Typically for a major release, the developers will gen up a new system and rebuild the system from scratch just to make sure everything works as expected.
With that said, developers will usually create a base system with their environment modifications (like their programming environment, data sets and so on), and then make a full system backup. The system is then backed up periodically (depending on the place, usually once every day or two). Some places will make local backups more frequently (let’s say every hour), and then ripple that to more permanent storage less frequently.
Typically a developer will have some plan for backing up their work so they won’t lose more than a half day’s worth of work or so if things go terribly wrong.
Remember this is for active developers, people who are making changes to their system day in and day out. Also, they may be changing the way that the overall system works.
To restore the system, it’s pretty simple. Find the last backup snapshot, restore the system and try to piece back everything together since that point in time.
For normal people who are not actively trying to destroy, I mean, improve their system like a developer, backups are usually thought of in a different manner. In a business that is collecting data, let’s say an accounting system, the accounting software will organize the data collection so that it is backed up as part of the process of collection. This is typical of most database types of software applications where you will hear terms like audit trails and journaling. In most cases, the data is gathered over a network type of application with the actual data being stored on network attached storage. People have begun running these types of applications over the Internet, with most of the data being stored in the cloud.
Even if data is stored in the cloud, remember the 3-2-1 rule. The information is downloaded on a schedule so that the information can be stored locally.
Usually in this situation there is a full system backup that can restore all of the application software and configurations on the local computer. In a separate step, you then retrieve the data from a backup data store and then you’re ready to get back to work. In most places, there is an IT or system administrator person that handles this procedure.
Yeah, but what’s a happy balance?
That’s great and everything, but what do we do on something like a Jetson? At JetsonHacks we are developers. Most of our development and programming changes go into a version control system, Git. Typically we create a “system environment” which has a base L4T version along with the programming tools and data sets we need. Then we make a backup, so that if things go south then we have a stable base to work from.
We also make backups of the data sets we are working on regularly. Thus, restoring the system consists of restoring the base system environment that we backed up, and then adding the data sets and the Git repositories of the source code and scripts for our project. Remember, these data sets and the Git repositories are also from backups. We pull from three backup silos so to speak.
To be clear, whenever a new L4T is released, we build a new “system environment” from scratch. The “system environment” gets backed up. We then add in our data sets and Git source. There are some inevitable hiccups in this procedure, usually due to library version mismatches. However, it is reliable and it is rare that we have an “Alt-Shift” moment.
If you are not a developer, you will benefit from organizing your system before backups. Create a base “system environment” which includes all of the applications and libraries that you want to use. Then make a backup.
Certainly keep track of what your system environment contains; you will need to recreate it from scratch at some point.
Whenever you add another essential program that you know you want in your system environment, make another backup.
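One low-effort way to keep track of what a system environment contains is to save a package list next to each backup and compare lists over time. Here is a minimal Python sketch, assuming the list was captured with `dpkg-query -W -f='${Package} ${Version}\n'` (L4T is Ubuntu-based, so dpkg is available). The function names and the sample package versions are made up for illustration.

```python
def parse_manifest(text):
    """Parse lines of 'package version' into a dict; skip malformed lines."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_manifests(old, new):
    """Return (added, removed, changed) package names between two manifests."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed


# Hypothetical manifests saved alongside two different backups.
old = parse_manifest("""\
curl 7.58.0
python3-numpy 1.13.3
nano 2.9.3
""")
new = parse_manifest("""\
curl 7.58.1
python3-numpy 1.13.3
htop 2.1.0
""")

added, removed, changed = diff_manifests(old, new)
print("added:", added)      # added: ['htop']
print("removed:", removed)  # removed: ['nano']
print("changed:", changed)  # changed: ['curl']
```

This only covers packages installed through the package manager. Anything built from source or installed with other tools would need its own notes.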
Now, if you keep the specialized data you use in specific directories, then you only need to back up those directories on a regular basis. Even if you have a catastrophic failure, you only need to restore a system environment and add your last data backup. Then you are up and running again.
Just to be clear …
Backups are a surprisingly deep subject. The above are just some suggestions on handling backing up your data. In fact there is an entire industry that has sprung up around saving computer data.
It is not possible to cover in a short article “what you should do” when making computer backups, as everyone has their own special situation. You will do best by reading some background material, and then deciding what best fits your situation.
The more organized you are, the easier the task becomes. If your data is spread out all over the place, with programs and libraries added without much thought and so on, you steer yourself towards a full backup solution. That’s not bad, but it requires more resources in physical drive space and time.
On the other hand, if you only need data backups and make full system backups when you add your programs, you cut down a lot on the number of backups that you make and the resources required.
Backup when you place your rootfs on a USB drive
Hi. Good Post.
I was looking at the backupJetson/backup-rootfs.sh script and saw that the date was appended to the backup filename. I think this would make rsync delete parameter ineffective for future backups. I think that a cron job with rsync should suffice.
Also, I am not sure how many backups are necessary. If we keep 2 backups, the next time the backup script runs (the 3rd time), it might want to delete the oldest backup before creating the latest backup snapshot.
The scripts are meant for a one time backup of a system with the rootfs on a separate drive such as NVMe or USB. This is a pretty typical use case in embedded systems.
If you are using cron jobs, there is a different protocol for using rsync effectively. To be clear, if you already know about how to use cron jobs, you also are capable of using rsync “for real” in a scheduled manner. This article is more along the lines of, “Hey, backup your stuff. Don’t really care how, but here are some breadcrumbs to get started. Here’s a script for making a backup today. Also, Google rsync to figure out how to use it.”
One thing to think about is that embedded systems are usually treated differently than desktop systems. Desktops usually have a backup strategy that is handled by IT staff at most companies. Most embedded systems are handled by the developers. It’s easy if you’re developing for the embedded system on a desktop machine; it gets taken care of automagically. If you are actually developing on the embedded system, then it’s a different game.
Part of the equation is resource management. If you are doing incremental backups of everything, the storage requirements are not too bad. If you are gathering large amounts of data, a better strategy may be to offload and back up the data rather than the entire system, if the system remains relatively static. If you look in Back In Time, it allows you to select how many backups you want.
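If you do keep several dated backups by hand, deciding which ones to remove is a small bookkeeping problem. Here is a minimal Python sketch that only picks the names to delete and touches nothing on disk. The `rootfs-YYYY-MM-DD` naming scheme is an assumption for illustration and is not necessarily what the backupJetson scripts produce.

```python
from datetime import date


def backups_to_delete(names, keep=2, prefix="rootfs-"):
    """Return the oldest dated backup names beyond the newest `keep`."""
    dated = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            stamp = date.fromisoformat(name[len(prefix):])
        except ValueError:
            continue  # not one of our dated backups, leave it alone
        dated.append((stamp, name))
    dated.sort()  # oldest first
    surplus = max(len(dated) - keep, 0)
    return [name for _, name in dated[:surplus]]


# Hypothetical contents of a backup drive.
names = ["rootfs-2020-06-01", "rootfs-2020-06-08",
         "rootfs-2020-06-15", "notes.txt"]
print(backups_to_delete(names, keep=2))  # ['rootfs-2020-06-01']
```

To make room before a new snapshot is written, call it with `keep` set to one less than the number of backups you want to end up with.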
Backups are a surprisingly deep subject, and every place has a different approach. The major point tends to be how much time people are willing to spend to get from a known restoration point back to even. Typically small shops tend to settle on something less than half a day; in other words, they’re willing to lose something like a morning’s worth of work.
Large shops (imagine backing up a few hundred/thousand machines) tend to have much more involved procedures. Because it’s real money they may take a different approach. Some are willing to let developers struggle to get back to even (usually working off of a 1 week old backup), others are more generous and pretty much journal everything so that the developer is up and running quickly.
This usually means that there is a prepared image with the apps/development tools/setup from a restore point. Once that is restored, the developer updates from when the restore point was taken. That typically means pulling from a code versioning system (something like Git or Subversion), recompiling the code, loading the latest data set, and you’re on your way. Most developers do some type of automated commit to the versioning system, so that they don’t lose a whole lot if things go bad.
Like I said, deep subject. Very dependent on how you use your system. It’s also clear why there are multi-million dollar companies that have sprung up to help with this task.
For people not doing development, it basically comes down to data changes which usually use a different backup method. Collect the data, process the data, and then distribute the data in some type of form that gets backed up.
this is great. when would you cover restore procedures?
I have some of the jetson computers. I have to install my custom programs to all of the jetsons.
Is it correct to use this script to make the copy of the os so all the computers would have same programs and settings?
No. This is for backing up one machine. For multiple Jetsons, you should use the NVIDIA Jetson tools provided with the SDK Manager. You can ask for more help on the official NVIDIA Jetson forums, where a large group of developers and NVIDIA engineers share their experience: https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70