Building TensorFlow Update – Jetson TX1

kangalow

9 years ago

Here’s a quick update on building TensorFlow for the NVIDIA Jetson TX1. Late last month, we looked at installing TensorFlow on the Jetson TX1. Of course, we are talking about the installTensorFlowTX1 repository on the JetsonHacks GitHub account.

As is not atypical in developing open source software, a week goes by and it doesn’t build anymore. “Double, double toil and trouble.” The good news is that none of the code in the TensorFlow repository changed for the tag v0.11. It can be particularly annoying when the people controlling releases go back and change a tagged version. That doesn’t stop people from doing it (mostly because they don’t know any better) …

Library Relocation

Fortunately, the problem had already been raised in the TensorFlow GitHub issues list: the zlib library had been updated. The link in the TensorFlow code base points to the ‘latest version’ of zlib instead of a particular version at a permanent location, so a new zlib release is enough to break the build. This isn’t quite best practice either, but it is understandable how it happens.

Typically, a project with as many dependencies as TensorFlow starts out rather simply, built by a handful of people. Most of them will grab a link to the latest and greatest version of each dependency. You’ll see things like links to the master branch of a GitHub repository, or to the latest version of a compressed library or binary. Time passes, many more people start working on the project, and invariably the code of the dependencies changes. Open source advocates say this is a good thing: everything is always getting better. For people actually working on the projects, it means that things can break unexpectedly in strange ways.
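One common guard against a moving ‘latest’ link is to pin both the URL and a checksum of the expected archive, so that a silently replaced file fails loudly at download time instead of breaking the build in strange ways later. Bazel’s `http_archive` rule, for instance, accepts a `sha256` attribute for this purpose. The Python sketch below shows the idea in isolation; the helper names `sha256_hex` and `verify_archive` and the sample archive bytes are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_archive(data: bytes, pinned_sha256: str) -> None:
    """Raise ValueError if the downloaded bytes do not match the pinned checksum.

    A 'latest version' link can start serving different bytes at any time;
    comparing against a digest recorded when the dependency was pinned turns
    that silent change into an immediate, explicit error.
    """
    actual = sha256_hex(data)
    if actual != pinned_sha256:
        raise ValueError(f"checksum mismatch: expected {pinned_sha256}, got {actual}")


if __name__ == "__main__":
    # Hypothetical archive contents standing in for a downloaded tarball.
    original = b"zlib source, the version the project was tested with"
    pinned = sha256_hex(original)  # recorded once, when the dependency is pinned

    verify_archive(original, pinned)  # same bytes as when pinned: passes
    print("original archive: ok")

    updated = b"zlib source, a newer upstream release"
    try:
        verify_archive(updated, pinned)
    except ValueError as err:
        print("updated archive rejected:", err)
```

The checksum does not fix a broken link by itself, but it makes the failure point at the real cause (the dependency changed) rather than at some unrelated compile error.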

Anyway, a patch was added that points to the permanent location of the zlib library version used by TensorFlow v0.11. The patch is applied when the TensorFlow repository is cloned with git.
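Conceptually, the fix swaps a floating URL for a pinned one in TensorFlow’s build configuration right after cloning. A minimal Python sketch of that substitution follows. The file excerpt, the zlib version 1.2.8, and both URLs are assumptions for illustration (zlib.net does keep older releases under a `fossils/` directory, but check the actual patch in the installTensorFlowTX1 repository for the exact change), and `pin_url` is a hypothetical helper, not part of the repository.

```python
def pin_url(build_file_text: str, floating_url: str, pinned_url: str) -> str:
    """Replace a 'latest version' URL with a permanent one.

    Raises ValueError if the floating URL is absent, so a patch that no
    longer applies (for example, after upstream edits the file) is
    noticed instead of silently doing nothing.
    """
    if floating_url not in build_file_text:
        raise ValueError(f"URL not found, patch does not apply: {floating_url}")
    return build_file_text.replace(floating_url, pinned_url)


if __name__ == "__main__":
    # Hypothetical one-line excerpt of a build file; the real file differs.
    excerpt = 'url = "http://zlib.net/zlib-1.2.8.tar.gz",\n'

    # Assumed URLs: the top-level link goes away when zlib ships a new
    # release, while the fossils/ directory keeps older releases around.
    patched = pin_url(
        excerpt,
        "http://zlib.net/zlib-1.2.8.tar.gz",
        "http://zlib.net/fossils/zlib-1.2.8.tar.gz",
    )
    print(patched, end="")
```

Failing when the expected URL is missing is a deliberate choice: a patch that quietly stops applying would put you right back where this post started.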

Incremental Compilation

Another issue addressed in the latest update is incremental building with Bazel. TensorFlow is a big project with lots of dependencies and such, and Bazel is a build system for putting together such beasts. As is the case for most build tools for large systems, the build configuration itself is nearly as complicated as the system being built.
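The payoff of incremental building is that a failed or interrupted build does not have to start over. As a toy model only (this is not how Bazel is implemented; its real caching is far more involved), the Python sketch below records a digest of each target’s inputs after a successful compile and skips any target whose inputs have not changed since. `ToyIncrementalBuilder` and its methods are hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List


class ToyIncrementalBuilder:
    """Toy model: rebuild a target only if it never built successfully
    or if the digest of its inputs changed since its last good build."""

    def __init__(self) -> None:
        # target name -> digest of the inputs used for its last successful build
        self.cache: Dict[str, str] = {}

    @staticmethod
    def digest(inputs: str) -> str:
        return hashlib.sha256(inputs.encode()).hexdigest()

    def build(self, targets: Dict[str, str], compile_fn: Callable[[str], bool]) -> List[str]:
        """Compile out-of-date targets and return the names that were attempted.

        compile_fn returns True on success. Failed targets are not cached,
        so the next run retries exactly those.
        """
        attempted = []
        for name, inputs in targets.items():
            key = self.digest(inputs)
            if self.cache.get(name) == key:
                continue  # up to date: reuse the previous result
            attempted.append(name)
            if compile_fn(name):
                self.cache[name] = key
        return attempted


if __name__ == "__main__":
    sources = {f"file{i}.cc": f"source {i}" for i in range(10)}
    failing = {"file3.cc", "file5.cc", "file8.cc"}  # pretend these fail the first time

    builder = ToyIncrementalBuilder()
    first = builder.build(sources, lambda name: name not in failing)
    print("first run attempted", len(first), "files")
    second = builder.build(sources, lambda name: True)
    print("second run attempted", second)
```

Running it, the second pass touches only the three files that failed the first time, which is exactly the behavior you want after a two-hour build falls over near the end.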

In the case of TensorFlow, there’s an issue in the CUDA configuration file tensorflow/third_party/gpus/cuda_configure.bzl where the rule:

    cuda_configure = repository_rule(
        implementation = _cuda_autoconf_impl,
        local = True,
    )

basically tells the build system that an incremental build cannot be used. When it takes over two hours to build TensorFlow on the Jetson TX1 (and the build can fail for a variety of trivial reasons), this is particularly annoying. For example, say you perform a build and everything except three files compiles. Without incremental building, you have to start again from scratch. Once the ‘local = True,’ line is removed, incremental building goes into effect, so in that example all you have to do is run ‘buildTensorFlow.sh’ again; the build continues where it left off and compiles the three missing files. This patch is applied to the TensorFlow code base when the repository is cloned using ‘cloneTensorFlow.sh’.
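The change itself is a one-line deletion. According to Bazel’s documentation for `repository_rule`, `local = True` marks a rule that fetches everything from the local system and should be re-evaluated at every fetch. The repository applies the deletion as a patch when cloneTensorFlow.sh runs; the Python sketch below only illustrates the same edit on the rule quoted above. `drop_local_flag` is a hypothetical helper, and because it removes every line that is exactly `local = True,` after stripping whitespace, it assumes the file has no other rule using that attribute.

```python
def drop_local_flag(bzl_text: str) -> str:
    """Remove standalone 'local = True,' lines from Starlark source text.

    Only lines whose stripped content is exactly 'local = True,' are
    removed; all other lines, including their indentation, are kept.
    Raises ValueError if no such line exists, i.e. the patch does not apply.
    """
    lines = bzl_text.splitlines(keepends=True)
    kept = [line for line in lines if line.strip() != "local = True,"]
    if len(kept) == len(lines):
        raise ValueError("no 'local = True,' line found; patch does not apply")
    return "".join(kept)


if __name__ == "__main__":
    # The rule as quoted from tensorflow/third_party/gpus/cuda_configure.bzl.
    original = (
        "cuda_configure = repository_rule(\n"
        "    implementation = _cuda_autoconf_impl,\n"
        "    local = True,\n"
        ")\n"
    )
    print(drop_local_flag(original), end="")
```

After the edit, the rule keeps only its `implementation` attribute; everything else about the CUDA configuration is unchanged.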

Hopefully this helps.