Mask RCNN (Instance segmentation)

Instance segmentation methods identify not only the object classes in an image but also the individual instances of each class. Mask-RCNN does this by predicting a segmentation mask for each Region of Interest (RoI) in a pixel-to-pixel manner. In this project, Mask-RCNN is implemented from scratch on a subset of the COCO dataset. Objects are segmented into only three categories: humans, animals, and vehicles.
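To illustrate the three-category setup, here is a minimal sketch of how COCO class names could be collapsed into coarse groups. It assumes the grouping follows COCO's own "person", "animal", and "vehicle" supercategories; the names `GROUPS`, `NAME_TO_GROUP`, and `to_group` are hypothetical, and the project's actual subset and label encoding may differ.

```python
# Hypothetical grouping, assumed to follow COCO's "person", "animal"
# and "vehicle" supercategories. The project's real subset may differ.
GROUPS = {
    "human": {"person"},
    "animal": {"bird", "cat", "dog", "horse", "sheep", "cow",
               "elephant", "bear", "zebra", "giraffe"},
    "vehicle": {"bicycle", "car", "motorcycle", "airplane",
                "bus", "train", "truck", "boat"},
}

# Invert the table so each COCO class name maps to its coarse group.
NAME_TO_GROUP = {name: group
                 for group, names in GROUPS.items()
                 for name in names}


def to_group(coco_name):
    """Return the coarse category, or None if the class is not in the subset."""
    return NAME_TO_GROUP.get(coco_name)
```

Annotations whose class maps to `None` would simply be skipped when building the training subset.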

Region proposal networks (RPN)

An RPN acts as an attention mechanism: it proposes the regions of an image that are most likely to contain an object and estimates a bounding box for each.

Ground truth bounding box (Red) and regions proposed by RPN (Blue)

The proposals from the RPN cluster densely around the ground truth bounding box.
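One common way to quantify how closely proposals sit around a ground truth box is intersection over union (IoU). The sketch below is illustrative only: the function name `box_iou`, the `(x1, y1, x2, y2)` box format, and the example coordinates are assumptions, not values taken from this project.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up example: one ground truth box and three candidate proposals.
gt = (50.0, 50.0, 150.0, 150.0)
proposals = [
    (55.0, 45.0, 160.0, 155.0),    # large overlap with the ground truth
    (40.0, 60.0, 140.0, 140.0),    # partial overlap
    (200.0, 200.0, 260.0, 260.0),  # no overlap
]
ious = [box_iou(p, gt) for p in proposals]
```

Proposals with a high IoU against some ground truth box are typically treated as positives during training, and those with a low IoU as background.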

Box Head

The Box Head refines the proposals generated by the RPN and predicts objectness scores for them. It also classifies each proposal into one of three categories: humans, animals, or vehicles.

Left - ground truth bounding box, Middle - proposals from Box Head (before refining), Right - refined proposals from Box Head. Categories - humans (blue), animals (green), and vehicles (red).
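The refinement step can be sketched with the box parameterization commonly used in the R-CNN family, where the head predicts offsets `(tx, ty, tw, th)` relative to each proposal. This is an assumption about the encoding; the project's code may scale or order the offsets differently, and `decode_box` is a hypothetical helper name.

```python
import math


def decode_box(proposal, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to a proposal (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    tx, ty, tw, th = deltas
    # Shift the centre by a fraction of the proposal size,
    # and rescale width and height in log space.
    new_cx = cx + tx * w
    new_cy = cy + ty * h
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)

    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


# Zero offsets leave the proposal unchanged.
unchanged = decode_box((0.0, 0.0, 10.0, 20.0), (0.0, 0.0, 0.0, 0.0))
# Shift right by 10% of the width and double the width.
refined = decode_box((0.0, 0.0, 10.0, 20.0), (0.1, 0.0, math.log(2.0), 0.0))
```

Predicting width and height in log space keeps the decoded sizes positive regardless of the raw network output.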

Mask Head

The Mask Head turns the Box Head’s bounding box predictions into pixel-level segmentation. It takes the Box Head’s bounding box and object class predictions as input and produces a pixel-level mask for each detected object.
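As a rough illustration of the last step, the sketch below places a small per-box mask of probabilities into full image coordinates and binarizes it. It assumes the mask is a fixed-size grid predicted per box, uses nearest-neighbour resizing for simplicity (implementations often use bilinear interpolation), and a 0.5 threshold; `paste_mask` and all values are hypothetical.

```python
def paste_mask(mask, box, image_h, image_w, threshold=0.5):
    """Resize a soft mask to `box` (nearest neighbour) and binarize it.

    mask: 2D list of probabilities for one object.
    box:  (x1, y1, x2, y2) in image pixel coordinates.
    Returns an image_h x image_w grid of 0/1 values.
    """
    m_h, m_w = len(mask), len(mask[0])
    bx1, by1, bx2, by2 = (int(round(v)) for v in box)
    box_w, box_h = max(bx2 - bx1, 1), max(by2 - by1, 1)

    canvas = [[0] * image_w for _ in range(image_h)]
    # Visit only pixels that lie inside both the box and the image.
    for y in range(max(by1, 0), min(by2, image_h)):
        my = min((y - by1) * m_h // box_h, m_h - 1)
        for x in range(max(bx1, 0), min(bx2, image_w)):
            mx = min((x - bx1) * m_w // box_w, m_w - 1)
            canvas[y][x] = 1 if mask[my][mx] >= threshold else 0
    return canvas


# Toy 2x2 mask pasted into a 4x4 box inside a 6x8 image.
soft_mask = [[0.9, 0.1],
             [0.2, 0.8]]
canvas = paste_mask(soft_mask, (2, 1, 6, 5), image_h=6, image_w=8)
```

In a multi-class setting, the mask channel to paste would be selected using the class predicted by the Box Head.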

The final mask predictions show that the pixel-wise segmentation from our MaskHead pipeline remains stable when the same image contains multiple objects of different classes and sizes.

The code is provided here