I want to locate the positions of differently colored cubes in a given image. I have used CNN regression with a VGG16 architecture, which gives reasonably good accuracy. But during testing, the predicted coordinates have errors ranging from 0.3 to 20 cm. During training, each sample consists of an image with 6 cubes of the same shape and different colors, and its label is a vector of 18 coordinate variables. Is there any other way to solve this problem? The sample images have labels of the form:
(x1, y1, z1, x2, y2, z2, ..., x6, y6, z6)
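
To make the 0.3 to 20 cm range easier to diagnose, it may help to break the error down per cube instead of looking at all 18 outputs at once. Below is a minimal sketch, assuming NumPy and that predictions and labels are arrays of shape (N, 18) in cm, laid out cube by cube as in the label format above. The function name `per_cube_errors` and the sample numbers are made up for illustration and are not from my actual data.

```python
import numpy as np

def per_cube_errors(pred, true, n_cubes=6):
    """Euclidean distance (same unit as the inputs) per cube, per sample.

    pred, true: array-like of shape (N, n_cubes * 3), laid out as
    (x1, y1, z1, x2, y2, z2, ...). Returns an array of shape (N, n_cubes).
    """
    pred = np.asarray(pred, dtype=float).reshape(-1, n_cubes, 3)
    true = np.asarray(true, dtype=float).reshape(-1, n_cubes, 3)
    return np.linalg.norm(pred - true, axis=-1)

# Hypothetical data: 2 samples, all true coordinates at the origin.
true = np.zeros((2, 18))
pred = true.copy()
pred[0, 0:3] = [3.0, 4.0, 0.0]    # cube 1 of sample 0 is off by 5 cm
pred[1, 15:18] = [0.0, 0.0, 2.0]  # cube 6 of sample 1 is off by 2 cm

errors = per_cube_errors(pred, true)
print(errors.shape)         # (samples, cubes)
print(errors.mean(axis=0))  # mean error per cube across samples
print(errors.max(axis=0))   # worst-case error per cube
```

If the large errors turn out to be concentrated in one or two cubes (for example, a particular color), that would point to a different problem than errors spread evenly over all six.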