Back in February, we installed Caffe on the TX1. At the time, the TX1 was running a 32-bit version of L4T 23.1. With the advent of the 64-bit L4T 24.2, this seems like a good time to do a performance comparison of the two. The TX1 can now do an image recognition in about 8 ms! For the install and test, Looky Here:
Background
As you recall, Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
The L4T 23.1 Operating System release was a 64-bit kernel supporting a 32-bit user space. For the L4T 24.2 release, both the kernel and the user space are 64-bit.
Caffe Installation
A script is available in the JetsonHacks GitHub repository which installs the dependencies for Caffe, downloads the source files, configures the build system, compiles Caffe, and then runs a suite of tests. Passing the tests indicates that Caffe is installed correctly.
This installation demonstration is for an NVIDIA Jetson TX1 running L4T 24.2, an Ubuntu 16.04 variant. L4T 24.2 was installed using JetPack 2.3, which also installs OpenCV4Tegra, CUDA 8.0, and cuDNN 5.1.
Before starting the installation, you may want to set the CPU and GPU clocks to maximum by running the script:
$ sudo ./jetson_clocks.sh
The script is in the home directory, and is also included in the installCaffeJTX1 repository for convenience.
In order to install Caffe:
$ git clone https://github.com/jetsonhacks/installCaffeJTX1.git
$ cd installCaffeJTX1
$ ./installCaffe.sh
Installation should not require intervention; in the video, installing the dependencies and compiling Caffe took about 10 minutes. Running the unit tests takes about 45 minutes. While not strictly necessary, running the unit tests helps ensure that the installation is correct.
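If you want to re-run the test suite later, the standard Caffe make targets can be invoked directly from the source tree. A sketch, assuming the install script checked the sources out to `~/caffe` (adjust the path to wherever `installCaffe.sh` placed them):

```shell
# Re-run the Caffe unit tests from the source tree.
# Note: ~/caffe is an assumed checkout location.
cd ~/caffe
make -j4 all     # rebuild anything that is out of date
make runtest     # builds and runs the gtest suite (~45 minutes on the TX1)
```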
Test Results
At the end of the video, there are a couple of timed tests which can be compared against the Jetson TK1 and the previous installation:
Jetson TK1 vs. Jetson TX1 Caffe GPU Example Comparison (10 iterations, times in milliseconds)

| Machine | Average FWD | Average BACK | Average FWD-BACK |
|---|---|---|---|
| Jetson TK1 (32-bit OS) | 234 | 243 | 478 |
| Jetson TX1 (32-bit OS) | 179 | 144 | 324 |
| Jetson TX1 with cuDNN support (32-bit OS) | 103 | 117 | 224 |
| Jetson TX1 (64-bit OS) | 110 | 122 | 233 |
| Jetson TX1 with cuDNN support (64-bit OS) | 80 | 119 | 200 |
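These numbers come from Caffe's built-in benchmarking tool. A sketch of the invocation, assuming a `~/caffe` checkout and the stock AlexNet model that ships with the Caffe sources:

```shell
# Benchmark the stock AlexNet model on the GPU for 10 iterations.
# ~/caffe is an assumed checkout location.
cd ~/caffe
./build/tools/caffe time \
    --model=models/bvlc_alexnet/deploy.prototxt \
    --gpu=0 \
    --iterations=10
# The report includes Average Forward pass, Average Backward pass,
# and Average Forward-Backward times in milliseconds.
```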
There is definitely a performance improvement between the 32-bit and 64-bit releases, with a couple of contributing factors. One is the change from a 32-bit to a 64-bit operating system. Another is the improvement of the deep learning libraries, CUDA and cuDNN, between the releases. Considering that the tests run on exactly the same hardware, the performance boost is impressive. Using cuDNN provides a particularly large gain in the forward pass tests.
The tests run 50 iterations of the recognition pipeline, and each iteration analyzes 10 different crops of the input image, so take the 'Average Forward pass' time and divide by 10 to get the time per recognition result. For the 64-bit version, that means an image recognition takes about 8 ms.
NVCaffe
It is worth mentioning that NVCaffe is a special branch of Caffe for the TX1 that includes FP16 support. The tests above use FP32. In many cases FP16 gives results very similar to FP32, but runs faster. For example, with FP16 the Average Forward Pass test finishes in about 60 ms, a result of 6 ms per image recognition!
Conclusion
Deep learning is in its infancy, and as people explore its potential, the Jetson TX1 seems well positioned to take the lessons learned and deploy them in the embedded computing ecosystem. There are several different deep learning platforms in development; the improvement in Caffe on the Jetson Dev Kits over the last couple of years is quite impressive.
Notes
The installation in this video was done directly after flashing L4T 24.2 onto the Jetson TX1 with CUDA 8.0, cuDNN r5.1, and OpenCV4Tegra. Git was then installed:
$ sudo apt-get install git
The latest Caffe commit used in the video is: 80f44100e19fd371ff55beb3ec2ad5919fb6ac43
14 Responses
Thanks for the great manual!
Build worked flawlessly; however, the tests get stuck.
Sometimes after just a few tests, sometimes after a few tens, but they always get stuck. No errors.
The board is brand new, full clean L4T using latest JetPack. jetson_clocks.sh executed.
Any suggestions on how to debug the issue?
What version of L4T are you using?
Also, do you have any idea how much memory is being used? You can try to open up the System Monitor while it is running and make sure that memory pressure isn’t causing the issue.
Another thing to try is to turn off the jetson_clocks.sh script, some people have reported issues there.
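One way to watch memory while the tests run, without opening the System Monitor GUI, is to poll `free` from a second terminal; L4T also ships a `tegrastats` script (in the home directory on this release, path assumed) that reports RAM and GPU utilization:

```shell
# Poll memory usage once per second while the tests run.
watch -n 1 free -m

# Alternatively, the tegrastats script (assumed to be in the home
# directory on L4T 24.x) reports RAM, EMC, and GPU utilization:
sudo ~/tegrastats
```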
ubuntu@tegra-ubuntu:~/caffe$ head -n 1 /etc/nv_tegra_release
# R24 (release), REVISION: 2.1, GCID: 8028265, BOARD: t210ref, EABI: aarch64, DATE: Thu Nov 10 03:51:59 UTC 2016
When it gets stuck, I seem to have plenty of memory:
ubuntu@tegra-ubuntu:~$ free
total used free shared buff/cache available
Mem: 4090604 2197844 521628 42648 1371132 2154284
Swap: 0 0 0
Tried disabling jetson_clocks.sh and running the test without -j4.
Same result.
One more data point: it freezes on different tests, but always on some variant of gradient testing…
[ RUN ] CuDNNConvolutionLayerTest/0.TestGradientGroupCuDNN
or
[ RUN ] ConvolutionLayerTest/2.Test1x1Gradient
or
[ RUN ] InnerProductLayerTest/2.TestGradientTranspose
or
[ RUN ] RNNLayerTest/3.TestGradientNonZeroContBufferSize2
etc.
Any way to see what it gets stuck on exactly?
Unfortunately I don’t see any obvious error, or have any idea how to go about fixing your issue. I haven’t encountered anything along those lines.
You might try the NVCaffe version:
https://github.com/dusty-nv/jetson-inference/blob/master/docs/building-nvcaffe.md
and see if that works.
nvcaffe got quite a few issues:
1) I had to fix include path
2) Had to create links to some of the libraries, so it finds them at linking
3) Some of tests fail
4) One of the tests runs out of memory and aborts “make runtest”…
So I am back to your version and trying to debug it.
According to GDB, the hanging tests are spinning around “usleep” inside
cuMemcpy, probably waiting for the transfer to finish. Forever.
Are there any non-caffe, Cuda tests that I can run on TX1?
Maybe it's just that this specific board/CPU is faulty and I should RMA it..
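On the question of non-Caffe CUDA tests: the CUDA samples installed by JetPack can serve as a basic sanity check of the GPU and of host/device memory transfers. A sketch, assuming CUDA 8.0 in its default install location:

```shell
# Build and run two stock CUDA samples as a GPU sanity check.
# /usr/local/cuda-8.0/samples is the default JetPack install location.
cp -r /usr/local/cuda-8.0/samples ~/cuda-samples
cd ~/cuda-samples/1_Utilities/deviceQuery
make
./deviceQuery       # reports device properties; should end with Result = PASS

cd ../bandwidthTest
make
./bandwidthTest     # exercises host<->device copies, the path the tests hang in
```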
Hi Max,
I don’t have enough experience with your issue to be of much help. It’s worth asking the same question on the Jetson TX1 forum: https://devtalk.nvidia.com/default/board/164/jetson-tx1/
The NVCaffe implementation should work out of the box; it's probably worth filing issues on GitHub describing the problems you encountered.
Running sudo ./jetson_clocks.sh gives: Can't access Fan!
Which version of L4T are you using?
Hello, I have met the same problem. Have you solved it?
The latest version, R24.3.
This article and the code are for L4T 24.2. You should try the jetson_clocks.sh file located in the home directory that a JetPack install provides.
OK, I will try it later. Thx!
Has anyone tried to install Caffe on the TX1 with the latest JetPack 3.1 and L4T 28.2?
I keep getting errors about a missing cblas.h, and I am unable to install libopenblas-dev via either apt-get or aptitude, for whatever reason.
I am not very keen on doing another full reset of the TX1, probably back to some older JetPack version.
Any advice is welcome.