Update 2019-11-11: Part 2 has been published here, describing the solution that works for me, which is to use the Open Neural Network eXchange (ONNX) format and the open source ONNX Runtime.
Background (or life sob story)
Several months ago I worked through the arduous task of compiling TensorFlow’s C++ interface and writing an application that would take a trained neural network and use it for inference. That was back in the days of TensorFlow 1.8. Now, TensorFlow 2.0 is around the corner and many things have changed. Googling how to save models in Python showed a few new methods, but then how does one load those models in C++? So, I pieced together a process using deprecated APIs, fixed some earlier mistakes, and moved on. Now I’m going back to figure out what the right solution is with the latest release of TensorFlow, version 1.13, that will also work with TensorFlow 2.0, which is currently in beta.
This is part 1, which summarises my frustration so far, because of the API changes between TensorFlow 1.x and 2.0. I was going to include how to use the TensorFlow C API available at https://www.tensorflow.org/install/lang_c to load the saved neural network, but this post is already getting quite long and the C API looks like an overly complicated way to tackle the problem, plus I’m not sure what parts of the C API will be supported in TensorFlow 2.0. Part 2 will tackle loading the model. My new plan is to try and convert the neural network into the ONNX format and load it using the ONNX Runtime. This runtime has a C API with an example here.
The problem with using TensorFlow’s C++ API is that you either have to force your project into TensorFlow’s build system, or try building TensorFlow using CMake. To any TensorFlow developers reading this, please expose the C++ API in a nice little library that does not force me to build my project with Bazel and does not leave dependencies scattered around like in the CMake process (which isn’t officially supported anyway).
The old way to tackle the problem of saving a TensorFlow graph to a single *.pb file was to use the tf.graph_util.convert_variables_to_constants function to convert all weight and bias tensors into constants, followed by using tf.gfile.GFile to write the graph definition to file. An example of this is shown here. The problem is that convert_variables_to_constants is now deprecated and will be removed in TensorFlow 2.0.
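For reference, the old freezing approach can be sketched roughly as follows. This is a minimal sketch, not the code from the example linked above: the toy y = x*w + b graph, the tensor names x and y, and the helper names freeze_to_pb and run_frozen are all made up for illustration. It goes through tf.compat.v1 on the assumption that this alias is available in your TensorFlow build (it should be from 1.13 onwards).

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: the 1.x API surface is reachable through this alias


def freeze_to_pb(pb_path):
    """Build a toy y = x*w + b graph and write it to one frozen .pb file."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 1], name="x")
        w = tf1.get_variable("w", initializer=[[2.0]])
        b = tf1.get_variable("b", initializer=[1.0])
        tf.identity(tf.matmul(x, w) + b, name="y")
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            # Deprecated call: replaces every variable with a constant
            # holding its current value in the session.
            frozen = tf1.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), ["y"])
    with tf1.gfile.GFile(pb_path, "wb") as f:
        f.write(frozen.SerializeToString())


def run_frozen(pb_path, value):
    """Load the frozen GraphDef and evaluate y for a single input value."""
    graph_def = tf1.GraphDef()
    with tf1.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf1.import_graph_def(graph_def, name="")
    with tf1.Session(graph=graph) as sess:
        return sess.run("y:0", feed_dict={"x:0": [[value]]})


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "frozen.pb")
    freeze_to_pb(path)
    print(run_frozen(path, 3.0))
```

Because the weights are baked into the file as constants, loading needs no checkpoint and no variable initialisation, which is what made this route attractive for C++ inference in the first place.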
If you’re looking to serve your trained model from a web server, TensorFlow Serving combined with saving the model using tf.saved_model is a good bet. TF Serving will support TensorFlow 2.0, so the only change will be in the code that saves the model, because the API in tf.saved_model is changing with TensorFlow 2.0.
If, like me, you’re looking for a C/C++ method and think that TF Serving is overkill, I couldn’t find an absolutely guaranteed route to success. However, the best option seems to be to convert to ONNX format and use an ONNX runtime to run the model for inference. Part 2 of this series of posts will cover my attempts to create a tutorial on how to do this.
Many ways to save (skin?) a TensorFlow model (cat?)
There are several ways to save and load/serve TensorFlow models. For each one, I’ll cover saving and loading in Python followed by moaning, mostly because some part of the process has been deprecated or the method doesn’t fit my needs.
I’ve written code to use some of the methods, until I reached the point of realising that it’s all deprecated. You can find the code for this part, and later also part 2, on BitBucket. At least some of the code using tf.saved_model will be re-used for part 2.
The simplest way of using the tf.saved_model module is through the function tf.saved_model.simple_save. Oops, that one’s also deprecated. The more complex way is to construct an instance of the tf.saved_model.Builder class. It’s only after struggling through the code to use this class that I saw the note “This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.builder.SavedModelBuilder or tf.compat.v1.saved_model.Builder. Tensorflow 2.0 will introduce a new object-based method of creating SavedModels.” This means the SavedModel protocol buffer and tf.saved_model will continue to be used, but there’s no single saving API that works in TensorFlow v1.x as well as 2.0. Below is the code showing how to use tf.saved_model.Builder, for what it’s worth.
One thing I haven’t done is implement a proper method to retrieve the inputs directly from the SavedModel. I believe this is supposed to be possible, but I’ll leave that for part 2.
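As a rough illustration of the builder route, here is a minimal sketch that uses the tf.compat.v1 names quoted in the deprecation note. It is not the code from the repository: the toy one-weight graph, the "serving_default" signature key, and the helper names export_saved_model and load_and_run are assumptions made for the example. The loading half also shows one possible way to look up the input and output tensor names from the signature stored in the SavedModel instead of hard-coding them.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: v1 compatibility API is available


def export_saved_model(export_dir):
    """Save a toy y = x*w graph as a SavedModel. export_dir must not exist yet."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 1], name="x")
        w = tf1.get_variable("w", initializer=[[2.0]])
        y = tf.matmul(x, w, name="y")
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            # The signature records which tensors are the inputs and outputs.
            signature = tf1.saved_model.signature_def_utils.predict_signature_def(
                inputs={"x": x}, outputs={"y": y})
            builder = tf1.saved_model.builder.SavedModelBuilder(export_dir)
            builder.add_meta_graph_and_variables(
                sess,
                tags=[tf1.saved_model.tag_constants.SERVING],
                signature_def_map={"serving_default": signature})
            builder.save()


def load_and_run(export_dir, value):
    """Load the SavedModel and find the tensor names through its signature."""
    with tf1.Session(graph=tf.Graph()) as sess:
        meta_graph = tf1.saved_model.loader.load(
            sess, [tf1.saved_model.tag_constants.SERVING], export_dir)
        signature = meta_graph.signature_def["serving_default"]
        x_name = signature.inputs["x"].name
        y_name = signature.outputs["y"].name
        return sess.run(y_name, feed_dict={x_name: [[value]]})


if __name__ == "__main__":
    export_dir = os.path.join(tempfile.mkdtemp(), "model")
    export_saved_model(export_dir)
    print(load_and_run(export_dir, 3.0))
```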
In TensorFlow 2.0, there are two useful functions, tf.saved_model.save and tf.saved_model.load, for working with SavedModel protocol buffers. The examples in the TensorFlow 2.0 documentation here show that tf.keras models will be easy to save, while custom models will be a little bit more effort as you’ll have to wrap them in a class. On the bright side, the tool for converting TensorFlow models to ONNX supports, and even recommends, using SavedModel. This tool, tf2onnx, is located here. At present, tf2onnx only supports TensorFlow up to 1.13, not 2.0.
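To give an idea of what wrapping a custom model in a class might look like, here is a minimal sketch that assumes a TensorFlow 2.x installation. The Linear class and its single weight and bias are invented for the example; only tf.Module, tf.function, tf.saved_model.save and tf.saved_model.load come from TensorFlow itself.

```python
import tempfile

import tensorflow as tf  # assumption: TensorFlow 2.x


class Linear(tf.Module):
    """Toy custom model: y = x*w + b."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable([[2.0]])
        self.b = tf.Variable([1.0])

    # A fixed input signature lets the function be traced and exported.
    @tf.function(input_signature=[tf.TensorSpec([None, 1], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w) + self.b


export_dir = tempfile.mkdtemp()
tf.saved_model.save(Linear(), export_dir)

# The restored object exposes the traced function, not the Python class.
restored = tf.saved_model.load(export_dir)
result = restored(tf.constant([[3.0]])).numpy()
print(result)
```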
The tf.io module provides the tf.io.gfile.GFile class (the new home of the GFile mentioned above) and the tf.io.write_graph function. Both achieve the same result: they write a GraphDef protocol buffer to a file. It is easier to use write_graph, which also provides an as_text option for controlling whether the GraphDef is stored in plain text or a binary format. There is one problem: this approach only saves the neural network structure, not the weights. Bibhu Pala provides a tutorial showing how to use tensorflow.python.tools.freeze_graph to save the weights and combine them with the network structure in a single .pb file. Unfortunately, digging into the code for freeze_graph shows that the deprecated convert_variables_to_constants makes an unwanted return! So, indirectly, this approach is also deprecated.
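The write_graph call described above can be sketched as follows. The toy graph and the file names model.pbtxt and model.pb are made up for the example, and the graph is built through tf.compat.v1 on the assumption that this alias is available. Note that no session is involved, which is a reminder that the trained variable values never reach these files.

```python
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: v1 compatibility API is available

graph = tf.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, shape=[None, 1], name="x")
    w = tf1.get_variable("w", initializer=[[2.0]])
    tf.matmul(x, w, name="y")

out_dir = tempfile.mkdtemp()
# Human-readable protocol buffer text format.
text_path = tf.io.write_graph(graph, out_dir, "model.pbtxt", as_text=True)
# Compact binary format; a GraphDef can be passed instead of a Graph.
binary_path = tf.io.write_graph(
    graph.as_graph_def(), out_dir, "model.pb", as_text=False)

# Only the structure is stored: list the node names and op types.
for node in graph.as_graph_def().node:
    print(node.name, node.op)
```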
There is an additional complication that I couldn’t get around. Loading the GraphDef, followed by calling tf.import_graph_def should, as I understand the documentation, create the variables. It doesn’t! Attempting to create a new tf.train.Saver object in order to use its restore method results in an error about no variables being present. It seems that, although the GraphDef got imported somewhere, it isn’t usable.
The tf.train.Saver method has also been deprecated in Python, YAY. It also seems to be fragile, as it depends on the naming of the restore operation nodes in the graph. You’ll see why in the discussion of the C API below. Saving a MetaGraphDef and the weights is as simple as constructing a tf.train.Saver object and calling the save method with the current session and the path to the checkpoint file. This creates a *.meta file that contains the MetaGraphDef, a *.data file that contains the variables and a *.index file. Reloading this model in Python is almost as simple as saving it. First call tf.train.import_meta_graph with the path to the *.meta file above. This imports the MetaGraphDef and leaves one with all the variables nicely defined. Now, creating a new tf.train.Saver object works, so the final step is to call its restore method with the same path used when calling save. This loads all the variables and everything works again. There is one limitation though. Opening the meta file in Netron doesn’t show the weights, probably because Netron doesn’t know that this is a checkpoint. Skipping the call to tf.train.import_meta_graph will result in the tf.train.Saver constructor throwing an error about there not being any variables defined in the current session. Sounds like there should have been a Loader class, but let’s not obsess over weird APIs. The code below shows how to do everything I’ve described above.
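A minimal sketch of the checkpoint round trip described above could look like this. It is not the code from the repository: the toy graph, the tensor names x and y, and the helper names are invented, and everything goes through tf.compat.v1 on the assumption that the alias is available. One small difference from the description: tf.train.import_meta_graph itself returns a Saver, so the sketch uses that object rather than constructing a new one.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: v1 compatibility API is available


def save_checkpoint(prefix):
    """Save a toy y = x*w graph and its variable as a checkpoint."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf1.placeholder(tf.float32, shape=[None, 1], name="x")
        w = tf1.get_variable("w", initializer=[[2.0]])
        tf.matmul(x, w, name="y")
        saver = tf1.train.Saver()
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            # Writes <prefix>.meta, <prefix>.index and the <prefix>.data-* shard.
            return saver.save(sess, prefix)


def restore_checkpoint(prefix, value):
    """Rebuild the graph from the .meta file, then restore the variables."""
    graph = tf.Graph()
    with graph.as_default():
        # import_meta_graph recreates the graph and returns a ready Saver.
        saver = tf1.train.import_meta_graph(prefix + ".meta")
        with tf1.Session(graph=graph) as sess:
            saver.restore(sess, prefix)
            return sess.run("y:0", feed_dict={"x:0": [[value]]})


if __name__ == "__main__":
    prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")
    save_checkpoint(prefix)
    print(restore_checkpoint(prefix, 3.0))
```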
There is a demo showing how to use the TensorFlow C API here. The graph for the saved model contains additional nodes for saving and restoring the graph. There is a string tensor named save/Const that you put the path to the checkpoint into. The output tensor is named save/RestoreV2 in modern versions, and save/restore_all in old versions, hence my comment on the fragility of this method. Restoring the saved model is then a matter of running the restore operation in a session, just like any other operation in TensorFlow.
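The same mechanism can be tried from Python before committing to C. The sketch below is an illustration under assumptions, not the linked demo: it reads the filename tensor name and the restore operation name from the Saver’s SaverDef instead of hard-coding them, then restores by feeding the checkpoint path and running the restore operation by name. The toy variable, the w_value identity node and the helper names are invented, and tf.compat.v1 is assumed to be available.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1  # assumption: v1 compatibility API is available


def save_with_names(prefix):
    """Save a checkpoint and report the node names needed to restore it."""
    graph = tf.Graph()
    with graph.as_default():
        w = tf1.get_variable("w", initializer=[[2.0]])
        tf.identity(w, name="w_value")  # plain tensor that is easy to fetch
        saver = tf1.train.Saver()
        with tf1.Session(graph=graph) as sess:
            sess.run(tf1.global_variables_initializer())
            saver.save(sess, prefix)
        saver_def = saver.as_saver_def()
    return saver_def.filename_tensor_name, saver_def.restore_op_name


def restore_by_name(prefix, filename_tensor_name, restore_op_name):
    """Restore by running the restore op directly, as a C client would."""
    graph = tf.Graph()
    with graph.as_default():
        tf1.train.import_meta_graph(prefix + ".meta")
        with tf1.Session(graph=graph) as sess:
            # Feed the checkpoint path into the string tensor, run the op.
            sess.run(restore_op_name,
                     feed_dict={filename_tensor_name: prefix})
            return sess.run("w_value:0")


if __name__ == "__main__":
    prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")
    names = save_with_names(prefix)
    print(names)
    print(restore_by_name(prefix, *names))
```

Recording these two names at save time and passing them to the inference side is one way to avoid depending on whichever naming a particular TensorFlow version happens to use.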
I was going to explore TensorFlow Serving further because I was thinking, how bad could a web server serving a SavedModel be? Then I saw that the recommended way of installing it is to use Docker. So, on Windows, I’ll have to have a whole virtual machine just to run my TensorFlow model! While it looks pretty easy in the code, and the C++ side could make use of the C++ REST SDK to call the API, this is probably the least efficient solution when your model is going to reside on a single computer serving only a few requests. This is definitely a solution targeted at web servers. On the positive side, SavedModel files can be served easily using tensorflow_model_server as shown here. This is documented in the v1.x and v2.0 guides, so this is the only method that is guaranteed to work in TensorFlow 2.0. It also appears that TensorFlow 2.0 support has landed in TensorFlow Serving and, according to this closed issue, TensorFlow 2.0 SavedModel should have been mostly working in TF Serving because the format hasn’t changed much.
The SavedModel format and tf.saved_model seem to be relatively reliable methods of saving your models. In part 2, I’ll try to put together code that is portable between TensorFlow 1.x and 2.0 (not sure how well that will work, but I’ll try). In terms of creating inference-side code in C or C++ that will work with both TensorFlow 1.x and 2.0, the best bet seems to be converting to ONNX. The tool for this is tf2onnx, which can work with checkpoints created using tf.train.Saver as well as SavedModel. Once converted to ONNX format, Microsoft’s ONNX Runtime can be used for loading it in C (as well as C#). There is a Python API for tf2onnx as well. One thing I’ve noticed in the description of how tf2onnx works is that it relies on freeze_graph.py, which currently uses the deprecated convert_variables_to_constants function. I expect that this will get fixed, otherwise ONNX support for TensorFlow 2.0 will not exist, which may be a major hit for ONNX supporters, so I’ll pin my hopes on them. In part 2 of this sequence of posts, I’ll try putting together a demo that covers saving in Python, converting using tf2onnx, and finally running the model using the ONNX Runtime.