Y.L.
I worked on a similar project in college. Let's work through your example: you have 2 objects in your image, and you can identify them whenever one of them moves. Once you have identified an object, find its centroid (or the position of any pixel that changed in that object) and "remember" it, i.e. save it. On the next iteration, when you search for the objects again, the one closest to the saved value is the same object you located before. That's the simplest way I can think of right now.
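To make the idea concrete, here is a rough sketch in plain Python rather than IMAQ code. It assumes you already have the centroids of the objects found in each frame; the function name `match_objects`, the labels "A" and "B", and the sample coordinates are all made up for illustration.

```python
import math

def match_objects(previous, detections):
    """Assign each remembered object to the nearest centroid in the new frame.

    previous:   dict of object label -> (x, y) centroid saved from the last frame
    detections: list of (x, y) centroids found in the current frame
    Returns a dict of object label -> (x, y) for the current frame.
    """
    updated = {}
    unclaimed = list(detections)
    for label, last_pos in previous.items():
        if not unclaimed:
            break  # fewer detections than remembered objects
        # The detection closest to the saved centroid is taken to be the same object.
        nearest = min(unclaimed, key=lambda c: math.dist(c, last_pos))
        updated[label] = nearest
        unclaimed.remove(nearest)  # each detection is used at most once
    return updated

# Hypothetical data: two objects, detections arrive in arbitrary order.
tracks = {"A": (10.0, 10.0), "B": (50.0, 40.0)}
frames = [
    [(52.0, 41.0), (12.0, 11.0)],
    [(15.0, 13.0), (55.0, 43.0)],
]
for detections in frames:
    tracks = match_objects(tracks, detections)  # remember the new positions
    print(tracks)
```

This only holds up while the objects move less between frames than the distance separating them; if they cross paths or jump far, the nearest match can swap the labels.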
Good luck!
Nestor Sanchez
IMAQ/Motion Support
National Instruments