Saving and loading TensorFlow neural networks Part 2: ONNX to the rescue

Welcome back to my attempts to save a trained TensorFlow model in Python and load it in C/C++. Part 1 documented how I kept running into that word deprecated in the TensorFlow library. The conclusion was that the SavedModel format was going to remain in TensorFlow 2.0, but all the functions in TensorFlow 1.x for creating a SavedModel, or saving the model in some other way, were deprecated. The most future-proof way of working was to serve the SavedModel using TensorFlow Serving. While the code for saving the model will be quite different in TensorFlow 2.0, at least the client-side code won’t have to change. The disadvantage is that one needs a virtual machine to use TF Serving on Windows, which is overkill for my use case. That’s why this post is going to explore using ONNX and the ONNX Runtime to run a SavedModel.

Changes between TF 1.x and 2.0

Before diving into the process for saving a neural network, it’s worth covering some differences between the two versions. A big one for me is that the session-based interface was deprecated. Now, the default is eager execution mode. The tf.layers module is also gone, so one either has to use tf.nn directly or, the much better route in most cases, tf.keras. tf.variable_scope is also gone, so any code relying on it to share layer weights between networks will have to change. There is the official migration guide as well as this Colab notebook by François Chollet to help in getting started with TensorFlow 2.0.
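As a small illustration of the shift away from sessions, here's a sketch (assuming TensorFlow 2.0 is installed) of the same computation in eager mode versus the old session style:

```python
import tensorflow as tf

# TF 1.x (deprecated): build a graph first, then evaluate it in a session:
#   with tf.Session() as sess:
#       result = sess.run(y)
# TF 2.0: operations execute eagerly and return concrete tensors.
x = tf.constant([[1.0, 2.0]])
w = tf.constant([[3.0], [4.0]])
y = tf.matmul(x, w)  # runs immediately, no session required
print(y.numpy())  # [[11.]]
```

The result is available via `.numpy()` as soon as the line runs, which is what makes quick experiments so much more pleasant in 2.0.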

With regards to saving TensorFlow computation graphs, the good news is that SavedModel is the one format that is supposed to be compatible with both version 1.x and 2.0. This even means that a SavedModel generated by TF 2.0 code should be usable in TF 1.x code, as long as it doesn’t use features that aren’t available in that earlier version.

How to save your network

I’ll only cover this briefly as the TensorFlow documentation has many examples. Firstly, tf.keras.Model supports saving networks either using Keras’s format with an HDF5 file, or the SavedModel format, as shown here in the documentation. The same function, tf.keras.Model.save, gets used in both cases; the difference seems to depend on whether the file name ends in .h5. If it does, you’re generating a Keras file; if it doesn’t, you’re using TensorFlow’s SavedModel. There’s also tf.saved_model.save, which is useful in combination with the tf.function decorator. For a Keras model, the two are equivalent. However, tf.saved_model.save also works in cases where the model isn’t a Keras model, e.g. if it uses the tf.nn module. Now for the obligatory piece of code to show saving a model:

import numpy as np
import tensorflow as tf

def create_model():
    X = tf.keras.Input(shape=(10,), name='input')
    h = tf.keras.layers.Dense(10, kernel_initializer=tf.constant_initializer(1), bias_initializer=tf.constant_initializer(1))(X)
    y = tf.keras.layers.Dense(10, kernel_initializer=tf.constant_initializer(1), bias_initializer=tf.constant_initializer(1), name='output')(h)
    model = tf.keras.models.Model(inputs=[X], outputs=[y])
    return model

def run_model(model: tf.keras.Model):
    output = model.predict(np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]))
    print(output)

if __name__ == "__main__":
    model = create_model()
    run_model(model)
    tf.saved_model.save(model, './data/save_model_v2')
A simple demo of TensorFlow 2’s SavedModel.

The next step is to convert the saved model, stored in ./data/save_model_v2, into an ONNX file. I originally wanted to use the tf2onnx utility, but it doesn’t yet support TensorFlow 2.0: the gs/tf20 branch isn’t seeing any progress and doesn’t run on TF 2.0 either. Running tf2onnx under TensorFlow 1.13 fails to convert a TF 2.0 SavedModel to ONNX, although it works fine on a TF 1.13 SavedModel, as expected, even one generated from a tf.keras.Model. So, time for a new plan. Since the network was defined using tf.keras, it should be possible to save it as a Keras model in HDF5 and then use keras2onnx to convert it to ONNX. Fortunately, this does work. Install keras2onnx by running pip install keras2onnx in an environment with TensorFlow 1.13 or 1.14; unfortunately, keras2onnx doesn’t yet support TensorFlow 2.0 either (how terribly surprising). There’s no CLI for keras2onnx, so I wrote the following little script to convert my Keras model generated in TensorFlow 2.0 to ONNX:

import tensorflow as tf
import keras2onnx as k2o
import onnx

if __name__ == "__main__":
    model = tf.keras.models.load_model('./data/save_model_v2.h5')
    onnx_model = k2o.convert_keras(model, model.name)
    onnx.save_model(onnx_model, './data/save_model_v2.onnx')
Simple script showing how to convert a Keras model to ONNX using keras2onnx.

There is one thing to modify in the code for saving the model, just replace tf.saved_model.save(model, './data/save_model_v2') with model.save('./data/save_model_v2.h5'). That will give the Keras file needed above.

Installing the ONNX Runtime

Now we need the ONNX Runtime in order to use these ONNX models. If you’re using Windows and have Visual Studio installed, there’s a NuGet package for you. If you’re running Linux and haven’t installed NuGet, there are also binary releases available here. At the time of writing this post, the latest version with binaries was v1.0.0, which is the version used here. Download the appropriate archive and extract the files. This will give an include folder with the main header file onnxruntime_c_api.h. There are archives for CPU and GPU use. If you’re intending to use your CUDA-capable GPU, download the GPU package. You’ll also see in the code how to enable CUDA use. I’ve placed the contents of these archives in ~/opt, so remember to do the same or modify the CMakeLists.txt file. That’s it, you’re ready to start using the C API. There are also C++, C# and Python APIs.
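For reference, the relevant part of such a CMakeLists.txt might look like the following sketch. The default ONNX_PATH value and the target name are my assumptions here, not necessarily what the repository uses:

```cmake
# Hypothetical sketch: point CMake at an ONNX Runtime extracted to ~/opt.
set(ONNX_PATH "$ENV{HOME}/opt/onnxruntime-linux-x64-1.0.0" CACHE PATH
    "Root of the extracted ONNX Runtime archive")
option(ENABLE_CUDA "Use the CUDA execution provider" OFF)

add_executable(onnx_demo main.c)
target_include_directories(onnx_demo PRIVATE "${ONNX_PATH}/include")
target_link_directories(onnx_demo PRIVATE "${ONNX_PATH}/lib")
target_link_libraries(onnx_demo PRIVATE onnxruntime)
if(ENABLE_CUDA)
    target_compile_definitions(onnx_demo PRIVATE USE_CUDA)
endif()
```

Note that target_link_directories needs CMake 3.13 or newer; older CMake versions would use link_directories instead.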

Let’s run some code

Grab the code for this demo:

git clone https://bitbucket.org/williamjshipman/tensorflow-save-and-serve-blog-post.git

The C++ code is located in the folder cpp. To build the code, run the following commands:

mkdir build
cd build
cmake .. -DENABLE_CUDA=[ON|OFF] -DONNX_PATH=/path/to/onnxruntime
make

To run the code, do the following:

cd ..
./build/onnx_demo

This will output a whole lot of debugging messages. If everything works, you’ll see a line Verifying results: PASS. If you haven’t put the ONNX Runtime folder in ~/opt, use the CMake option ONNX_PATH to provide the path to the folder. This folder must contain the include and lib folders for the runtime. If you have CUDA and cuDNN installed, you can enable CUDA support. By default this is off.

Getting into the code

The neural network is represented in the ONNX Runtime by a session (OrtSession). Session options include the optimization level as well as registering additional execution providers, like CUDA.
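A minimal sketch of that setup with the v1.0 C API might look like this. Error handling is omitted for brevity (each call actually returns an OrtStatus* you should check), the model path is the one generated earlier, and the CUDA line assumes you downloaded the GPU package and have its cuda_provider_factory.h header:

```c
#include <stdio.h>
#include <onnxruntime_c_api.h>
#ifdef USE_CUDA
#include <cuda_provider_factory.h>
#endif

int main(void) {
    /* The v1.0 C API is accessed through a table of function pointers. */
    const OrtApi* ort = OrtGetApiBase()->GetApi(ORT_API_VERSION);

    OrtEnv* env = NULL;
    ort->CreateEnv(ORT_LOGGING_LEVEL_VERBOSE, "onnx_demo", &env);

    OrtSessionOptions* opts = NULL;
    ort->CreateSessionOptions(&opts);
    ort->SetSessionGraphOptimizationLevel(opts, ORT_ENABLE_ALL);
#ifdef USE_CUDA
    /* Register the CUDA execution provider on device 0. */
    OrtSessionOptionsAppendExecutionProvider_CUDA(opts, 0);
#endif

    /* Loading the ONNX model creates the session. */
    OrtSession* session = NULL;
    ort->CreateSession(env, "./data/save_model_v2.onnx", opts, &session);
    printf("session created\n");

    ort->ReleaseSession(session);
    ort->ReleaseSessionOptions(opts);
    ort->ReleaseEnv(env);
    return 0;
}
```

With verbose logging enabled, this is where most of the debugging output mentioned below comes from.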

The next step is to create the input tensor/array of data. I’ve used CreateTensorWithDataAsOrtValue to create the tensor and fill it with data from an array. You don’t have to create the output tensors, those get created automatically when running the neural network. In order to create the tensor, one has to first create an OrtMemoryInfo object. As far as I can tell, this specifies what method will be used to allocate the data. For example, if you’re using CUDA, this allows one to allocate the tensor using CUDA pinned memory. It seems that even if you use CreateCpuMemoryInfo, which doesn’t do anything special for CUDA AFAIK, it works fine with the CUDA provider. You can probably optimize this part of the code further, but since I’m having trouble using my laptop’s Nvidia GPU, I’m not going to worry about it.
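Continuing in that vein, creating the 1×10 input tensor for the demo network could be sketched like this, assuming `ort` is the `const OrtApi*` obtained from `OrtGetApiBase()` and with status checks again left out:

```c
/* Fragment, not a full program: assumes ort is a valid const OrtApi*. */
OrtMemoryInfo* memory_info = NULL;
ort->CreateCpuMemoryInfo(OrtArenaAllocator, OrtMemTypeDefault, &memory_info);

float input_data[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
int64_t shape[2] = {1, 10};  /* batch of 1, 10 features */

/* The tensor wraps input_data in place; it does not copy the array,
 * so input_data must outlive the tensor. */
OrtValue* input_tensor = NULL;
ort->CreateTensorWithDataAsOrtValue(
    memory_info, input_data, sizeof(input_data),
    shape, 2, ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT, &input_tensor);

/* The memory info can be released once the tensor exists. */
ort->ReleaseMemoryInfo(memory_info);
```

Because the tensor only wraps the existing array, there's no copy on the way in, which is part of why the choice of OrtMemoryInfo matters for performance.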

Now comes the Run function call. What’s important to note here is that you have to provide the names of the inputs and outputs in the neural network. I’ve included code that iterates through all of the inputs and outputs, gets their names and stores them for later use. Once Run has finished, use GetTensorMutableData to get the contents of the output tensor. It takes care of allocating the array for the data, so don’t delete this array later.
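The final step could be sketched as follows. The literal names "input" and "output" are assumptions based on the names given to the Keras layers; in practice you should query them via GetInputName/GetOutputName as described above, since the converter may decorate them:

```c
/* Fragment: assumes ort, session and input_tensor set up as before. */
const char* input_names[] = {"input"};    /* assumed; query GetInputName */
const char* output_names[] = {"output"};  /* assumed; query GetOutputName */

OrtValue* output_tensor = NULL;
ort->Run(session, NULL,
         input_names, (const OrtValue* const*)&input_tensor, 1,
         output_names, 1, &output_tensor);

/* The runtime owns this buffer: read it, but don't free it yourself. */
float* results = NULL;
ort->GetTensorMutableData(output_tensor, (void**)&results);
printf("first output value: %f\n", results[0]);

ort->ReleaseValue(output_tensor);
ort->ReleaseValue(input_tensor);
```

Passing NULL for the OrtRunOptions argument just means "run with the defaults".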

And there you go, you now have almost all the code needed to go from SavedModel and Keras models to ONNX, load and use those neural networks. The only shortcoming is that tf2onnx needs to be upgraded to support converting a TensorFlow 2.0 SavedModel to .onnx.
