Facebook AI Research (FAIR) has open sourced Deep Learning CUDA extensions. The extensions are for Torch, a scientific computing framework with wide support for machine learning algorithms.
In the Facebook blog announcement, the researchers claim a speedup of over 20x compared to the fastest publicly available code when training popular architectures, such as a typical deep convolutional net for object recognition.
For a deep dive, read the paper: Fast Convolutional Nets With fbfft: A GPU Performance Evaluation. The paper gives timings for the Facebook code (called fbfft) and compares them with an implementation built on NVIDIA’s cuFFT library and with NVIDIA’s cuDNN. fbfft be fast!
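For readers wondering why an FFT shows up in a convolutional net at all, here is a minimal NumPy sketch of the underlying idea, the convolution theorem: a convolution in the signal domain becomes an elementwise product in the frequency domain. This is not fbfft code and says nothing about its performance; the function names `direct_conv1d` and `fft_conv1d` are made up for illustration, and the example is 1-D on the CPU, whereas the paper deals with batched 2-D convolutions on the GPU.

```python
import numpy as np


def direct_conv1d(signal, kernel):
    """Full linear convolution computed directly with two nested loops."""
    n, k = len(signal), len(kernel)
    out = np.zeros(n + k - 1)
    for i in range(n):
        for j in range(k):
            out[i + j] += signal[i] * kernel[j]
    return out


def fft_conv1d(signal, kernel):
    """The same convolution computed via the convolution theorem."""
    # Zero-pad both inputs to the full output length so the circular
    # convolution implied by the FFT matches the linear one.
    size = len(signal) + len(kernel) - 1
    spectrum = np.fft.rfft(signal, size) * np.fft.rfft(kernel, size)
    return np.fft.irfft(spectrum, size)


rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
kernel = rng.standard_normal(9)

# Both routes should agree up to floating-point error.
assert np.allclose(direct_conv1d(signal, kernel), fft_conv1d(signal, kernel))
```

The direct version does work proportional to the signal length times the kernel length, while the FFT version costs roughly n log n regardless of kernel size, which is the general reason FFT-based convolution can pay off.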
The source code itself is available on GitHub: fbcunn
A little bit geeky, but really good stuff and fun to read.