Mask R-CNN (Instance Segmentation)
This is an unofficial implementation of the paper "Mask R-CNN" by He et al.
Instance segmentation methods identify not only the object classes in an image but also the number of instances of each class. Mask R-CNN segments an object mask for each Region of Interest (RoI) in a pixel-to-pixel manner. In this project, Mask R-CNN is implemented from scratch on a subset of the COCO dataset. We segment images into only three categories: humans, animals, and vehicles.
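As an illustration of how a COCO subset could be collapsed into the three categories, here is a minimal sketch. The category names follow COCO's person, vehicle, and animal supercategories; the dictionary names, the `group_label` helper, and the integer label order are assumptions for this example and may not match the project's actual preprocessing.

```python
# Hypothetical grouping of COCO category names into the three classes
# used in this project; the subset actually used by the code may differ.
COCO_TO_GROUP = {
    "person": "human",
    **{name: "vehicle" for name in (
        "bicycle", "car", "motorcycle", "airplane",
        "bus", "train", "truck", "boat")},
    **{name: "animal" for name in (
        "bird", "cat", "dog", "horse", "sheep",
        "cow", "elephant", "bear", "zebra", "giraffe")},
}

# Assumed label order; 0 is reserved for background.
GROUP_TO_LABEL = {"vehicle": 1, "human": 2, "animal": 3}


def group_label(coco_name):
    """Return the integer class label, or None if the category is outside the subset."""
    group = COCO_TO_GROUP.get(coco_name)
    return None if group is None else GROUP_TO_LABEL[group]
```

Annotations whose category maps to `None` would simply be dropped when building the training subset.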
Region proposal networks (RPN)
RPNs are attention mechanisms that propose regions in an image that are most likely to contain an object. They estimate bounding box regions in an image.
The dense bounding box estimates produced by the RPN cluster around the ground-truth bounding box.
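One common way to quantify how closely a proposal surrounds a ground-truth box is intersection over union (IoU). The sketch below is pure Python and assumes boxes in `(x1, y1, x2, y2)` format; the function names and the 0.5 threshold are illustrative choices, not values taken from this repository.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_close_proposals(proposals, gt_box, iou_threshold=0.5):
    """Keep the proposals whose IoU with the ground-truth box meets the threshold."""
    return [p for p in proposals if box_iou(p, gt_box) >= iou_threshold]
```

For example, `keep_close_proposals([(0, 0, 10, 10), (50, 50, 60, 60)], (1, 1, 11, 11))` keeps only the first box, since the second does not overlap the ground truth at all.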
Box Head
The Box Head refines the proposals generated by the RPN and predicts objectness scores for them. It also classifies each proposal into one of the three categories: humans, animals, or vehicles.
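The refinement and classification steps can be sketched as follows. This assumes the standard R-CNN box parameterization, where the head predicts center offsets and log-scale size changes `(dx, dy, dw, dh)` relative to the proposal, and a softmax over class logits. The function names are hypothetical, and the repository's Box Head may normalize or order its outputs differently.

```python
import math


def decode_box(proposal, deltas):
    """Apply predicted deltas (dx, dy, dw, dh) to a proposal (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = deltas

    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    # Shift the center by a fraction of the box size, rescale in log space.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


def predict_class(logits):
    """Return (class index, probability) of the most likely class under a softmax."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]
```

With all-zero deltas the proposal is returned unchanged, which is a convenient sanity check when wiring the head into a pipeline.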
Mask Head
The Mask Head refines the Box Head's bounding box predictions into pixel-level segmentation. It takes the Box Head's bounding box predictions together with the object class predictions and produces a pixel-level mask for each detected object.
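To show how a fixed-size mask prediction can become a pixel-level segmentation, here is a minimal sketch that resizes a small probability map to its bounding box and thresholds it. It uses nearest-neighbor resizing in pure Python for brevity, whereas the Mask R-CNN paper resizes the mask with smoother interpolation; the `paste_mask` name, integer box coordinates, and the 0.5 threshold are assumptions for this example.

```python
def paste_mask(mask_probs, box, image_h, image_w, threshold=0.5):
    """Paste an m x m probability mask into a binary image-sized canvas.

    mask_probs: square list of lists with per-pixel probabilities for the
                class predicted by the Box Head.
    box:        (x1, y1, x2, y2) in integer pixel coordinates.
    """
    m = len(mask_probs)
    x1, y1, x2, y2 = box
    box_w = max(x2 - x1, 1)
    box_h = max(y2 - y1, 1)

    canvas = [[0] * image_w for _ in range(image_h)]
    # Visit only the part of the box that lies inside the image.
    for y in range(max(y1, 0), min(y2, image_h)):
        src_row = min((y - y1) * m // box_h, m - 1)
        for x in range(max(x1, 0), min(x2, image_w)):
            src_col = min((x - x1) * m // box_w, m - 1)
            if mask_probs[src_row][src_col] >= threshold:
                canvas[y][x] = 1
    return canvas
```

Pixels outside the box always stay 0, so masks for several detections can be pasted onto separate canvases and combined per instance.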
The final mask predictions show that the pixel-wise segmentation performed by our Mask Head pipeline is stable and robust to multiple objects of different classes and sizes in the same image.
The code is provided here