Building a standalone C++ Tensorflow program on Windows

June 21, 2017

In the last post we built a static C++ Tensorflow library on Windows. Here we’ll write a small Tensorflow program in Visual Studio that lives outside the Tensorflow repository and links to that library. The tutorials I have been able to find about writing a new Tensorflow C++ program all seem to require that the new C++ project live within the Tensorflow repository itself. This is not practical for many projects, and it turns out not to be necessary. If you haven’t built a static Tensorflow library, do that first.

This program will simply read some data that has been hard-coded into memory, and then feed it into a graph that just multiplies it by another hard-coded matrix.

A simple C++ Tensorflow program

Create a new solution with the following code:

// matmul.cpp

#include <iostream>
#include <memory>
#include <vector>
#include <eigen/Dense>

#include "matmul.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/cc/ops/standard_ops.h"

using namespace tensorflow;

// Build a computation graph that takes a tensor of shape [?, 2] and
// multiplies it by a hard-coded matrix.
GraphDef CreateGraphDef()
{
  Scope root = Scope::NewRootScope();

  auto X = ops::Placeholder(root.WithOpName("x"), DT_FLOAT, 
                            ops::Placeholder::Shape({ -1, 2 }));
  auto A = ops::Const(root, { { 3.f, 2.f },{ -1.f, 0.f } });

  auto Y = ops::MatMul(root.WithOpName("y"), A, X, 
                       ops::MatMul::TransposeB(true));

  GraphDef def;
  TF_CHECK_OK(root.ToGraphDef(&def));

  return def;
}

int main()
{
  GraphDef graph_def = CreateGraphDef();

  // Start up the session
  SessionOptions options;
  std::unique_ptr<Session> session(NewSession(options));
  TF_CHECK_OK(session->Create(graph_def));

  // Define some data.  This needs to be converted to an Eigen Tensor to be
  // fed into the placeholder.  Note that this will be broken up into two
  // separate vectors of length 2: [1, 2] and [3, 4], which will separately
  // be multiplied by the matrix.
  std::vector<float> data = { 1, 2, 3, 4 };
  auto mapped_X_ = Eigen::TensorMap<Eigen::Tensor<float, 2, Eigen::RowMajor>>
                     (&data[0], 2, 2);
  auto eigen_X_ = Eigen::Tensor<float, 2, Eigen::RowMajor>(mapped_X_);

  Tensor X_(DT_FLOAT, TensorShape({ 2, 2 }));
  X_.tensor<float, 2>() = eigen_X_;

  std::vector<Tensor> outputs;
  TF_CHECK_OK(session->Run({ { "x", X_ } }, { "y" }, {}, &outputs));

  // Get the result and print it out
  Tensor Y_ = outputs[0];
  std::cout << Y_.tensor<float, 2>() << std::endl;
  
  session->Close();
}

As a digression, I would write something a little fancier, but the C++ API is fairly limited relative to the Python API, and seems to be designed with inference in mind more than training. It is fairly straightforward to load a computation graph that has already been built in Python, but building a new graph from scratch in C++ is harder. In particular, in Python it is easy to add training operations to the graph: the minimize method of the Optimizer class will automatically add gradient Ops that use backprop to calculate the gradient for every Op in the graph. The C++ API does not yet have this functionality. In theory one could go through the graph and add these gradients oneself, but at that point it would probably be best just to write an Optimizer class in C++. (This is currently an open issue.) This may change as the C++ API improves, but it is (justifiably) unlikely to be a high priority for the Tensorflow team. For the time being it is probably best to build the graphs in Python and then load them in C++.

Next, in the header file, put the following:

// matmul.h

#pragma once

#define COMPILER_MSVC
#define NOMINMAX

If you omit the COMPILER_MSVC definition, you will run into an error saying “You must define TF_LIB_GTL_ALIGNED_CHAR_ARRAY for your compiler.” If you omit the NOMINMAX definition, you will run into a number of errors saying “‘(’: illegal token on right side of ‘::’”. (The reason is that <Windows.h> gets included somewhere, and it defines macros named min and max that clobber any later use of those names, such as std::min and std::max. Defining NOMINMAX before the include disables these macros.)
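To make the NOMINMAX problem concrete, here is a minimal sketch that does not need <Windows.h> at all. It assumes hand-written stand-ins for the Windows min and max macros, and the function names LargestInt, ClampToRange and LargestIntPlain are made up for illustration. While the macros are active, the usual spelling std::numeric_limits<int>::max() does not compile, but wrapping the name in parentheses stops the preprocessor from treating it as a macro call.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Stand-ins for the function-like macros that <Windows.h> defines when
// NOMINMAX is not set (an assumption: simulated so this builds anywhere).
#define max(a, b) (((a) > (b)) ? (a) : (b))
#define min(a, b) (((a) < (b)) ? (a) : (b))

// Writing std::numeric_limits<int>::max() here would fail to compile,
// because the preprocessor sees "max(" and tries to expand the macro.
// Parenthesising the name means "max" is no longer followed by "(".
int LargestInt() { return (std::numeric_limits<int>::max)(); }

// The same trick keeps std::min and std::max usable.
int ClampToRange(int value, int lo, int hi) {
  assert(lo <= hi);
  return (std::max)(lo, (std::min)(value, hi));
}

// Removing the macros is roughly what NOMINMAX achieves up front.
#undef max
#undef min

// With the macros gone, the ordinary spelling works again.
int LargestIntPlain() { return std::numeric_limits<int>::max(); }
```

The parenthesised form is handy when you cannot control who includes <Windows.h>, but since the errors here come from inside the Tensorflow headers, defining NOMINMAX is the practical fix.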

Setting the project properties

The main trick to linking to the C++ Tensorflow library is getting all the project properties right. The following settings assume that you downloaded the Tensorflow repository to C:\Users\%USERNAME%\bin\tensorflow. If you put it somewhere else, substitute your location wherever you see that path. And naturally, substitute your own username for %USERNAME%.

Include Directories

First, the compiler needs to find all the appropriate Tensorflow header files. In your Additional Include Directories, add:

C:\Users\%USERNAME%\bin\tensorflow
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\external\eigen_archive
C:\Users\%USERNAME%\bin\tensorflow\third_party\eigen3
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\protobuf\src\protobuf\src

It is important to put the eigen_archive directory above the eigen3 directory — otherwise you’ll get the C1014 “too many include files” error.

Linker Settings

Next, in your Additional Dependencies setting, add the following:

zlib\install\lib\zlibstatic.lib
gif\install\lib\giflib.lib
png\install\lib\libpng12_static.lib
jpeg\install\lib\libjpeg.lib
lmdb\install\lib\lmdb.lib
jsoncpp\src\jsoncpp\src\lib_json\$(Configuration)\jsoncpp.lib
farmhash\install\lib\farmhash.lib
fft2d\src\lib\fft2d.lib
highwayhash\install\lib\highwayhash.lib
libprotobuf.lib
tf_protos_cc.lib
tf_cc.lib
tf_cc_ops.lib
tf_cc_framework.lib
tf_core_cpu.lib
tf_core_direct_session.lib
tf_core_framework.lib
tf_core_kernels.lib
tf_core_lib.lib
tf_core_ops.lib

If any of these is missing, you will get “unresolved external symbol” errors. Of course, the linker has to be able to find all these libraries, so add the following to your Additional Library Directories:

C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\protobuf\src\protobuf\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_cc.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_cc_ops.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_cc_framework.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_core_cpu.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_core_direct_session.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_core_framework.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_core_lib.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\tf_core_ops.dir\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build\Release
C:\Users\%USERNAME%\bin\tensorflow\tensorflow\contrib\cmake\build

Additional Command Line Options

At this point your code will successfully compile, but if you try to run it, you will get the following error when you try to create the scope: “Non-OK-status: status status: Not found: Op type not registered ‘NoOp’ in binary running on MACBOOK. Make sure the Op and Kernel are registered in the binary running in this process.”

The issue here (as I understand it) is that Visual Studio will by default only link to the objects that it thinks it needs in the libraries that it has been given. Tensorflow ends up using more of these objects internally, however, so all objects in the library need to be explicitly linked with the /WHOLEARCHIVE option. Unfortunately, you cannot simply use the /WHOLEARCHIVE option on its own because you will get linker error LNK1000: “Internal error during CImplib::EmitThunk”, so you have to explicitly call /WHOLEARCHIVE only on the tensorflow libraries that you want.
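The reason the linker's pruning matters is that Tensorflow registers its Ops through static objects whose constructors run before main. The following is a much-simplified sketch of that pattern, not Tensorflow's actual registration code: OpRegistry, OpRegistrar, NoOpImpl and IsOpRegistered are all hypothetical names. Nothing else in the program ever mentions register_no_op by name, so if it lived in its own object file inside a static library, the linker would have no reason to pull that object file in, and the Op would silently never be registered.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical op signature: takes an int, returns an int.
using OpFn = int (*)(int);

// A function-local static sidesteps static-initialization-order problems.
std::map<std::string, OpFn>& OpRegistry() {
  static std::map<std::string, OpFn> registry;
  return registry;
}

// Constructing one of these adds an entry to the registry as a side effect.
struct OpRegistrar {
  OpRegistrar(const std::string& name, OpFn fn) {
    assert(fn != nullptr);
    OpRegistry()[name] = fn;
  }
};

// In a real library this pair would sit alone in its own source file.
// No other code refers to register_no_op; it exists only for its constructor.
static int NoOpImpl(int x) { return x; }
static OpRegistrar register_no_op("NoOp", NoOpImpl);

// Lookup by name, which is all the rest of the program ever does.
bool IsOpRegistered(const std::string& name) {
  return OpRegistry().count(name) != 0;
}
```

In a single source file like this one the registration always happens. The failure only appears once the registrar is compiled into a static library, which is the situation /WHOLEARCHIVE works around by forcing every object file in the named library into the link.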

To do this, add the following to your command line options:

/machine:x64 
/ignore:4049 /ignore:4197 /ignore:4217 /ignore:4221
/WHOLEARCHIVE:tf_cc.lib 
/WHOLEARCHIVE:tf_cc_framework.lib
/WHOLEARCHIVE:tf_cc_ops.lib 
/WHOLEARCHIVE:tf_core_cpu.lib
/WHOLEARCHIVE:tf_core_direct_session.lib 
/WHOLEARCHIVE:tf_core_framework.lib
/WHOLEARCHIVE:tf_core_kernels.lib 
/WHOLEARCHIVE:tf_core_lib.lib
/WHOLEARCHIVE:tf_core_ops.lib   
/WHOLEARCHIVE:tf_stream_executor.lib
/WHOLEARCHIVE:libjpeg.lib   

The /ignore and /machine options are not strictly necessary, but they are a good idea.

Running the program

At this point everything should compile and link correctly, and you will have an executable called matmul.exe that uses Tensorflow to perform a matrix multiplication. On my system the executable comes to about 35 MB. The output should look like:

 7 17
-1 -3
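As a sanity check on that output, here is a small sketch in plain C++, with no Tensorflow or Eigen involved, of what the graph computes. The Matrix type and the functions MultiplyByTranspose and ExpectedOutput are hypothetical helpers written for this check. The flat vector { 1, 2, 3, 4 } is read row-major as the rows [1, 2] and [3, 4], and because the MatMul was built with TransposeB(true), the result is A times the transpose of X.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A dense matrix stored row-major in a flat buffer (hypothetical helper).
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> values;  // element (r, c) lives at values[r * cols + c]

  float at(std::size_t r, std::size_t c) const { return values[r * cols + c]; }
};

// Computes a * transpose(b), mirroring MatMul with TransposeB(true).
Matrix MultiplyByTranspose(const Matrix& a, const Matrix& b) {
  assert(a.cols == b.cols);  // inner dimensions agree once b is transposed
  Matrix out{a.rows, b.rows, std::vector<float>(a.rows * b.rows, 0.0f)};
  for (std::size_t i = 0; i < a.rows; ++i) {
    for (std::size_t j = 0; j < b.rows; ++j) {
      for (std::size_t k = 0; k < a.cols; ++k) {
        out.values[i * out.cols + j] += a.at(i, k) * b.at(j, k);
      }
    }
  }
  return out;
}

// Uses the same hard-coded values as matmul.cpp.
Matrix ExpectedOutput() {
  Matrix a{2, 2, {3.f, 2.f, -1.f, 0.f}};  // the constant A
  Matrix x{2, 2, {1.f, 2.f, 3.f, 4.f}};   // rows are [1, 2] and [3, 4]
  return MultiplyByTranspose(a, x);
}
```

Each column of the result is A applied to one input row: [1, 2] gives (7, -1) and [3, 4] gives (17, -3), which matches the matrix printed above.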