At the time of this writing, the TensorFlow Object Detection API is still under active development and constantly evolving, so it's not surprising to find missing pieces that would make the library more robust for production applications.
If you’ve worked in the field before, you are probably familiar with mAP (mean average precision), a metric that measures the accuracy of object detectors. You can find a great introduction to mAP here, but in short, mAP represents the average of the maximum precisions at different recall values.
The TensorFlow Object Detection API provides several methods to evaluate a model, and all of them are centered around mAP. Unfortunately for those looking for a more conventional confusion matrix, TensorFlow doesn’t offer a solution at this time.
To fill that void, I put together a small script that generates a confusion matrix after running a dataset of images through a model capable of detecting multiple classes of objects in an image. The output matrix has the following format:
The rows represent the target values (what the model should have predicted, that is, the ground-truth).
The columns represent the predicted values (what the model actually predicted).
Each row and column corresponds to one of the classes supported by the model.
The final row and column correspond to the class “nothing”, which is used to indicate either that an object of a specific class was not detected, or that a detected object wasn’t part of the ground-truth.
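To make that layout concrete, here is a minimal sketch of how such a matrix could be built and updated; the class names and counts are hypothetical, not taken from the actual script:

```python
import numpy as np

# Hypothetical example: a model that detects three classes.
classes = ["car", "person", "dog"]
n = len(classes)

# One extra row and column for the "nothing" class.
matrix = np.zeros((n + 1, n + 1))

# A ground-truth "car" correctly detected as "car":
matrix[0, 0] += 1
# A ground-truth "person" misclassified as "dog":
matrix[1, 2] += 1
# A ground-truth "dog" the model missed entirely (last column):
matrix[2, n] += 1
# A spurious "car" detection with no matching ground-truth (last row):
matrix[n, 0] += 1
```

Every event lands in exactly one cell: correct detections on the diagonal, misclassifications off-diagonal, and missed or spurious objects in the extra “nothing” column and row.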
With this information, the script can easily compute the precision and recall for each class. It would be equally simple, though I leave it to the reader, to compute accuracy or any other metric derived from the confusion matrix.
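As a sketch of how precision and recall fall out of such a matrix (rows are ground-truth, columns are predictions, and the last row and column are the “nothing” class); this is an illustration under those assumptions, not the script’s actual code:

```python
import numpy as np

def precision_recall(matrix):
    """Per-class precision and recall from a confusion matrix whose rows
    are ground-truth classes, columns are predicted classes, and whose
    last row/column is the extra "nothing" class."""
    n = matrix.shape[0] - 1  # number of real classes
    results = {}
    for i in range(n):
        true_positives = matrix[i, i]
        predicted = matrix[:, i].sum()  # everything predicted as class i
        actual = matrix[i, :].sum()     # everything that truly is class i
        precision = true_positives / predicted if predicted > 0 else 0.0
        recall = true_positives / actual if actual > 0 else 0.0
        results[i] = (precision, recall)
    return results

# Toy 2-class matrix (plus the "nothing" row/column):
m = np.array([[8.0, 1.0, 1.0],
              [2.0, 6.0, 2.0],
              [1.0, 1.0, 0.0]])
print(precision_recall(m))
```

Note that the “nothing” row and column participate in the sums (they represent false positives and false negatives) but get no precision or recall of their own.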
You need a couple of things to run the script:
The label map used by your model — This is the protobuf text file (.pbtxt) that you created in order to train your model.
A detection record file — This is the file generated by the /object_detection/inference/infer_detections.py script, which runs a TFRecord file through your model and saves the results in a detection record file.
Here is an example of running the script:
python confusion_matrix.py --detections_record=testing_detections.record --label_map=label_map.pbtxt
The script will print the confusion matrix along with precision and recall information to the standard output.
In case you missed the link to the code before, here it is again.
How is the confusion matrix computed?
Here is a quick outline of the algorithm to compute the confusion matrix:
For each detection record, the algorithm extracts from the input file the ground-truth boxes and classes, along with the detected boxes, classes, and scores.
Only detections with a score greater than or equal to 0.5 are considered; anything below this value is discarded.
For each ground-truth box, the algorithm computes the IoU (Intersection over Union) with every detected box. A match is found if both boxes have an IoU greater than or equal to 0.5.
The list of matches is pruned to remove duplicates (ground-truth boxes that match more than one detection box, or vice versa). If there are duplicates, the best match (greatest IoU) is always selected.
The confusion matrix is updated to reflect the resulting matches between ground-truth and detections.
Objects that are part of the ground-truth but weren’t detected are counted in the last column of the matrix (in the row corresponding to the ground-truth class). Objects that were detected but aren’t part of the ground-truth are counted in the last row of the matrix (in the column corresponding to the detected class).
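The outline above can be sketched roughly as follows. This is a simplified illustration rather than the actual script, and it assumes boxes are given as [ymin, xmin, ymax, xmax] (the convention the Object Detection API uses):

```python
import numpy as np

IOU_THRESHOLD = 0.5
CONFIDENCE_THRESHOLD = 0.5

def compute_iou(box_a, box_b):
    """Intersection over Union of two [ymin, xmin, ymax, xmax] boxes."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    intersection = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def update_matrix(matrix, gt_boxes, gt_classes,
                  det_boxes, det_classes, det_scores):
    """Update the confusion matrix in place for one image.
    The last row/column of `matrix` is the "nothing" class."""
    nothing = matrix.shape[0] - 1

    # Step 1: keep only confident detections.
    keep = [i for i, s in enumerate(det_scores) if s >= CONFIDENCE_THRESHOLD]
    det_boxes = [det_boxes[i] for i in keep]
    det_classes = [det_classes[i] for i in keep]

    # Step 2: collect every (iou, gt, det) pair above the IoU threshold...
    matches = []
    for g, gt_box in enumerate(gt_boxes):
        for d, det_box in enumerate(det_boxes):
            iou = compute_iou(gt_box, det_box)
            if iou >= IOU_THRESHOLD:
                matches.append((iou, g, d))

    # Step 3: ...then keep only the best match per ground-truth/detection.
    matches.sort(reverse=True)
    matched_gt, matched_det = set(), set()
    for iou, g, d in matches:
        if g in matched_gt or d in matched_det:
            continue  # duplicate: a better match was already taken
        matched_gt.add(g)
        matched_det.add(d)
        matrix[gt_classes[g], det_classes[d]] += 1  # step 4

    # Step 5: ground-truth objects that were never detected...
    for g in range(len(gt_boxes)):
        if g not in matched_gt:
            matrix[gt_classes[g], nothing] += 1
    # ...and detections that match no ground-truth object.
    for d in range(len(det_boxes)):
        if d not in matched_det:
            matrix[nothing, det_classes[d]] += 1
```

The greedy pass over the sorted match list is what implements the pruning step: since matches are visited from highest IoU down, any ground-truth box or detection seen a second time is necessarily a worse duplicate and gets skipped.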
A good next step would be to integrate this script into the evaluation framework that ships with the Object Detection API. I'll try to get around to that at some point.