Back in October 2014, Google’s Pete Warden wrote an interesting article: How to run the Caffe deep learning vision library on Nvidia’s Jetson mobile GPU board. At the time, I thought, “What fun!” However, the article noted that there were issues running Caffe on CUDA 6.5, which was just being introduced in L4T 21.1.
After the holiday break, I realized that enough time had passed that most of those issues had probably been worked out, since we are now on L4T 21.2, and that I should be able to run Caffe in all its CUDA 6.5 goodness. In fact, I could, and with even better results than the original! Looky here:
Install Caffe
Here’s the install script on Github: installCaffe.sh. You may want to season it to taste, as it runs the examples towards the end of the script. After you download the script, set its permissions to ‘Allow executing file as program’ in the file ‘Properties’ dialog before executing it, or do the same from a Terminal as shown below.
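For Terminal fans, here’s a minimal sketch of those same steps (this assumes installCaffe.sh has already been downloaded to the current directory):

# Make the downloaded install script executable, then run it
chmod +x installCaffe.sh
./installCaffe.sh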
NOTE: This installation was done on a clean system flashed with JetPack, which installed L4T 21.2, CUDA 6.5, and OpenCV.
NOTE (6-15-2015): Aaron Schumacher found an issue with some of the later versions of Caffe. From his article: The NVIDIA Jetson TK1 with Caffe on MNIST
Unfortunately master has a really large value for LMDB_MAP_SIZE in src/caffe/util/db.cpp, which confuses our little 32-bit ARM processor on the Jetson, eventually leading to Caffe tests failing with errors like MDB_MAP_FULL: Environment mapsize limit reached. Caffe GitHub issue #1861 has some discussion about this and maybe it will be fixed eventually, but for the moment if you manually adjust the value from 1099511627776 to 536870912, you’ll be able to run all the Caffe tests successfully.
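A quick way to make that change is a one-line substitution; this is just a sketch, using the value and file path from Aaron’s note (run it from the caffe source directory and rebuild afterwards):

# Shrink the LMDB map size from 1 TB to 512 MB in the Caffe source
sed -i 's/1099511627776/536870912/' src/caffe/util/db.cpp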
NOTE (7-10-2015): Corey Thompson also adds:
To get the LMDB portion of tests to work, make sure to also update examples/mnist/convert_mnist_data.cpp as well:
examples/mnist/convert_mnist_data.cpp:89:56: warning: large integer implicitly truncated to unsigned type [-Woverflow]
CHECK_EQ(mdb_env_set_mapsize(mdb_env, 1099511627776), MDB_SUCCESS) // 1TB
^
Adjust the value from 1099511627776 to 536870912.
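The same sort of substitution covers this file too; again, just a sketch, run from the caffe source directory before rebuilding:

# Apply the same map size reduction to the MNIST conversion example
sed -i 's/1099511627776/536870912/' examples/mnist/convert_mnist_data.cpp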
Just as a reminder, Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
So why is that an interesting topic? The quick answer is it’s currently all the rage in the “in the know” community. You’ll see it talked about in various terms, but here’s some high level background from Wikipedia: Deep Learning.
Caffe comes with a pre-trained ‘AlexNet’ model that recognizes 1,000 different kinds of objects.
The reason that this is an interesting topic on the Jetson is the speed at which images can be recognized once a model has been trained, taking advantage of the Jetson’s CUDA-capable GPU. In the example, recognition takes around 24 milliseconds per image. And what does that mean? As an example, it means that when you’re driving your car, on-board cameras can recognize traffic signs better and faster. Deep learning also means better speech recognition is possible, and better alignment of audio and video with transcripts. All sorts of things, too numerous to list here.
Note: In the video, you see the results posted as ~235 ms for the ‘Average Forward Pass’. The forward pass processes a batch of 10 images, so the timing for a single image recognition works out to around ~24 ms.
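For reference, those numbers come from Caffe’s built-in timing tool; the invocation looks something like this (a sketch, run from the caffe directory; the model path and flags are from stock BVLC Caffe and may differ in other versions):

# Benchmark the AlexNet forward/backward passes on the GPU
build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0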
I’ll note here that Deep Learning also makes it possible to find cute kitten and puppy pictures faster than ever before; I just pray that the servers hosting this blog don’t get crushed or melted from all the web traffic. (Sorry, that just seems like a gratuitous mention to increase traffic.)
I did play around with some of the parameters of the Jetson board when running the example. In Pete Warden’s test, he was able to get 34 ms per recognition using CUDA 6.0. Not surprisingly, CUDA 6.5 is mo’ better and faster (the Caffe code is probably better now too), and I was able to get around 27 ms per recognition. But I also used another trick, which is to clock the CPUs into performance mode; that lowered the recognition time to 24 ms. The maxCPU script sets the appropriate flags (a rough sketch is below). Overall, that’s quite a difference in speed. Note that I haven’t used cuDNN or reclocked the GPU just yet. This is all within a power budget of around 10 watts!
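For the curious, here’s a rough sketch of the sort of settings a script like maxCPU applies (the exact sysfs paths can vary between L4T releases, and the commands need root):

# Disable the cpuquiet hotplug governor so the CPU cores stay online
sudo sh -c 'echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable'
# Lock the CPU frequency governor to performance
sudo sh -c 'echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'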
Way cool.
7 Responses
Hi,
While installing Caffe using the installCaffe.sh script, an error occurred. The message is:
fatal error: cublas_v2.h: No such file or directory
#include <cublas_v2.h>
^
Pls help
Thanks
Hi,
Which version of L4T is installed on your Jetson? Which version of CUDA do you have installed? cublas_v2.h is part of CUDA; is that file on your machine?
I just ran through this install. I also needed to install the libboost-filesystem-dev package to compile.
Hello, thanks for your great install! Caffe runs smoothly on my Jetson now. However, I am trying to create an LMDB database larger than the 536870912-byte (≈537 MB) value used, but changing the mapsize setting in LMDB doesn’t seem to let me. Do you know of any possible workaround?
Thank you
Hi jerpint,
I don’t know of any at the moment, thanks for reading!
When running the install script I get “There is no branch or tag named dev”, along with the following error:
error: pathspec ‘dev’ did not match any file(s) known to git.
Someone else commented on the same issue, but I don’t understand whether I need to fix anything or not. The script ran, installed, and the test section of the script ran and passed all of the tests that it ran. I am wondering if I am missing code as a result of the above issue. Will I have problems later because of it?