Efficient Image Classification at the Edge

A manufacturer of card shuffling machines for casinos asked us to develop a fast algorithm capable of classifying poker cards with inference time below 150ms per card.

First attempt: Fine-tuning

Image classification on a Raspberry Pi can be quite simple, since the board is rather powerful, so I hoped that a pre-trained, general-purpose classifier would do with a bit of fine-tuning.

A quick Google search took us to Kaggle, which, luckily, had a fine-tuned model based on EfficientNetB0 and a dataset (yay!).

Borrowing the model and the dataset, we reran the fine-tuning loop, this time adding an enriched dataset provided by the client.
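A fine-tuning setup of this kind can be sketched as follows. This is a minimal example, not the Kaggle notebook's actual code: the frozen backbone, the dropout rate, the learning rate, the 224x224 input size and the helper name build_finetune_model are all assumptions made for illustration. Only the 53 output classes come from this post.

```python
import tensorflow as tf


def build_finetune_model(n_classes=53, img_size=224, weights="imagenet"):
    """EfficientNetB0 backbone with a new classification head (illustrative sketch)."""
    # Keras' EfficientNet includes its own rescaling, so it expects pixels in [0, 255].
    base = tf.keras.applications.EfficientNetB0(
        include_top=False,
        weights=weights,
        input_shape=(img_size, img_size, 3),
        pooling="avg",
    )
    base.trainable = False  # assumption: train only the new head at first

    inputs = tf.keras.Input(shape=(img_size, img_size, 3))
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Hypothetical usage, with train_ds and val_ds being tf.data datasets of (image, label):
# model = build_finetune_model()
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Freezing the backbone keeps the first training pass cheap; unfreezing the top layers with a lower learning rate is a common second step, though whether it helps here was not measured.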

Second attempt: Model from scratch

Sadly, this model was not only far too slow for our purposes, but its accuracy also dropped when it was converted from TensorFlow to TensorFlow Lite, despite several attempts at tuning the pruning and quantization needed to reduce the model size. Part of the problem, I believe, is that the fine-tuned model relied on color features, which were irrelevant in our context (black and white images). A simpler, classical architecture trained from scratch did the trick better:

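import tensorflow as tf

# img_size and n_channels are hyperparameters set elsewhere; the results table uses img_size = 112 and 64.
# Preprocessing is part of the model: resize to img_size x img_size, then scale pixels to [0, 1].
# In TensorFlow 2.6+, these layers are also available as tf.keras.layers.Resizing and tf.keras.layers.Rescaling.
resize_and_rescale = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.Resizing(img_size, img_size),
    tf.keras.layers.experimental.preprocessing.Rescaling(1.0/255)
])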

model = tf.keras.Sequential([
    resize_and_rescale,
    tf.keras.layers.Conv2D(filters=32, kernel_size=7, input_shape=[img_size, img_size, n_channels], padding="same",
                           activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Conv2D(filters=32, kernel_size=7, padding="same", activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(filters=64, kernel_size=5, padding="same", activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Conv2D(filters=64, kernel_size=5, padding="same", activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding="same", activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=128, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(units=53, activation="softmax")
])
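For reference, the TensorFlow to TensorFlow Lite conversion mentioned earlier can be sketched like this. It is a minimal example that assumes post-training dynamic-range quantization (tf.lite.Optimize.DEFAULT); the exact pruning and quantization settings we tried are not reproduced here, and build_toy_model and convert_to_tflite are hypothetical helpers, with a tiny stand-in network instead of the real one.

```python
import tensorflow as tf


def build_toy_model(img_size=64, n_channels=1, n_classes=53):
    """Tiny stand-in CNN, used only to demonstrate the conversion."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(img_size, img_size, n_channels)),
        tf.keras.layers.Conv2D(8, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])


def convert_to_tflite(model, quantize=True):
    """Return the model as a TFLite flatbuffer (bytes)."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    if quantize:
        # Dynamic-range quantization: weights are stored as 8-bit integers.
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()


if __name__ == "__main__":
    toy = build_toy_model()
    float_model = convert_to_tflite(toy, quantize=False)
    quant_model = convert_to_tflite(toy, quantize=True)
    print(f"float: {len(float_model)} bytes, quantized: {len(quant_model)} bytes")
    # The flatbuffer can then be written to disk, e.g. to a hypothetical cards.tflite:
    # with open("cards.tflite", "wb") as f:
    #     f.write(quant_model)
```

Evaluating both the Keras model and the converted flatbuffer on the same validation set is what exposes an accuracy gap like the one reported for the fine-tuned model.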

To improve inference time, we tried different values for img_size. Unsurprisingly, smaller images gave faster but slightly less accurate results, so we looked for a compromise between speed and accuracy.
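A rough way to see why img_size matters so much is to count the multiply-accumulate operations (MACs) of the architecture above. The sketch below is a back-of-the-envelope estimate, not a profiler: count_macs is a hypothetical helper, n_channels=1 is an assumption based on the black and white images, and MAC count is only a proxy for real latency.

```python
def count_macs(img_size, n_channels=1, n_classes=53):
    """Approximate multiply-accumulates for one forward pass of the CNN above."""
    side = img_size
    channels = n_channels
    macs = 0
    # Three stages of two "same"-padded convolutions, each followed by 2x2 max pooling.
    for kernel, filters in [(7, 32), (5, 64), (3, 128)]:
        for _ in range(2):
            macs += side * side * kernel * kernel * channels * filters
            channels = filters
        side //= 2  # MaxPool2D() halves height and width
    macs += side * side * channels * 128  # Flatten -> Dense(128)
    macs += 128 * n_classes               # Dense(128) -> Dense(n_classes)
    return macs


ratio = count_macs(112) / count_macs(64)
print(f"MAC ratio, 112x112 vs 64x64: {ratio:.2f}")
```

Almost all of the work scales with the number of pixels, so going from 64x64 to 112x112 costs roughly (112/64)^2, about 3 times more compute. That is in the same ballpark as the measured C++ timings reported at the end of this post (115 ms vs 40 ms, about 2.9x); the Python timings grow less steeply (about 2.2x).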

Deploying the trained model

The Raspberry Pi supports the TensorFlow Lite runtime in Python. However, we wanted to keep the flexibility to move later to a less powerful microcontroller. The model is, as usual, the easy part. The real pain was making it work in C++ without having to install the whole internet. The official TensorFlow Lite documentation requires installing Bazel, which is a ridiculous thing to do on a microprocessor. Even on the Raspberry Pi, installing Bazel can be painful.

Luckily, a nice blog post came to help: Install TensorFlow 2 Lite on Raspberry 64 OS – Q-engineering (qengineering.eu)

After fighting a little with the C++ API, we were finally able to deploy the model and test it.
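For the Python side of the comparison, per-card latency can be measured with a small harness like the one below. It is a sketch under stated assumptions: mean_latency_ms is a hypothetical helper that works with any object exposing the TensorFlow Lite Interpreter methods (allocate_tensors, get_input_details, get_output_details, set_tensor, invoke, get_tensor), and it feeds random data rather than real card images.

```python
import time

import numpy as np


def mean_latency_ms(interpreter, n_runs=50, n_warmup=5):
    """Average time per inference, in milliseconds, for a TFLite-style interpreter."""
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Random input with the shape and dtype the model expects.
    dummy = np.random.random_sample(tuple(inp["shape"])).astype(inp["dtype"])

    # Warm-up runs are excluded from the measurement.
    for _ in range(n_warmup):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()

    start = time.perf_counter()
    for _ in range(n_runs):
        interpreter.set_tensor(inp["index"], dummy)
        interpreter.invoke()
        interpreter.get_tensor(out["index"])
    return (time.perf_counter() - start) * 1000.0 / n_runs


# Hypothetical usage on the Raspberry Pi (cards.tflite is a placeholder file name):
# from tflite_runtime.interpreter import Interpreter
# print(mean_latency_ms(Interpreter(model_path="cards.tflite")))
```

Timing set_tensor, invoke and get_tensor together is a deliberate choice here, since the copy in and out is part of what the application pays per card.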

The performance of the different models, measured on a Raspberry Pi 4B with 4 GB of RAM, is shown below.

Model Name	Inference time (C++)	Inference time (Python)	Accuracy (tflite)	Accuracy (tf)
EfficientNetB0 + Fine Tuning	170ms	220ms	0.87	0.96
Vanilla NN (112X112)	115ms	140ms	0.94	0.95
Vanilla NN (64X64)	40ms	65ms	0.91	0.91
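To read the table against the 150 ms requirement, the figures can be checked with a few lines of Python. The numbers are copied from the table above; summarize and the field names are hypothetical, introduced only for this illustration.

```python
BUDGET_MS = 150  # per-card latency requirement from the client

# (model, C++ ms, Python ms, TFLite accuracy, TF accuracy), copied from the table above
RESULTS = [
    ("EfficientNetB0 + Fine Tuning", 170, 220, 0.87, 0.96),
    ("Vanilla NN (112X112)", 115, 140, 0.94, 0.95),
    ("Vanilla NN (64X64)", 40, 65, 0.91, 0.91),
]


def summarize(results, budget_ms=BUDGET_MS):
    """Flag which models meet the latency budget and how much accuracy TFLite conversion costs."""
    rows = []
    for name, cpp_ms, py_ms, acc_tflite, acc_tf in results:
        rows.append({
            "model": name,
            "cpp_within_budget": cpp_ms < budget_ms,
            "python_within_budget": py_ms < budget_ms,
            "accuracy_drop": round(acc_tf - acc_tflite, 2),
        })
    return rows


for row in summarize(RESULTS):
    print(row)
```

Only the two from-scratch models stay under the budget, and they lose at most 0.01 accuracy in the conversion, whereas the fine-tuned EfficientNetB0 loses 0.09.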