Marco,
You can only capture an image from one camera at a time unless you use a separate board for each camera. By cycling through the cameras, you can get just over 7 frames per second from each camera (30 fps total across all of them). For this to work, the cameras need to be synchronized, which requires some wiring and adjusting the switches on the cameras.
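As a rough sketch of the arithmetic: the four-camera count below is an assumption inferred from the numbers above (30 fps shared by four cameras gives 7.5 fps each), and the function names are made up for illustration. It also assumes the board grabs from exactly one camera per frame slot in a simple round-robin order.

```python
def per_camera_fps(total_fps, n_cameras):
    """Frame rate each camera gets when one board cycles through all of them."""
    if n_cameras < 1:
        raise ValueError("need at least one camera")
    return total_fps / n_cameras


def round_robin_schedule(n_cameras, n_frames):
    """Camera index to grab from for each successive frame slot."""
    return [slot % n_cameras for slot in range(n_frames)]


# Assumed setup: 4 cameras on one 30 fps board.
print(per_camera_fps(30, 4))        # 7.5
print(round_robin_schedule(4, 8))   # [0, 1, 2, 3, 0, 1, 2, 3]
```

Any time lost switching between cameras would lower the per-camera rate further, which is why the cameras need to be synchronized.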
Tracking the body as it moves can be a difficult task. Background subtraction can give you the silhouette of the person, but it can be tricky to get right. I think the best method I have seen involves attaching a reference point to the body (a target pattern, a bright LED, etc.) and tracking that point. You could track a pattern using pattern recognition, or for an LED you could threshold the image to a binary image and locate the bright spot.
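Here is a minimal sketch of the two image-based ideas, written in Python with NumPy rather than any particular vision package. The function names, the threshold values, and the synthetic test frame are all assumptions for illustration; real images would need thresholds tuned to the lighting, and the centroid only makes sense if the LED is the only thing brighter than the threshold.

```python
import numpy as np


def led_centroid(gray, threshold=200):
    """Threshold a grayscale frame to a binary image and return the
    (row, col) centroid of the bright pixels, or None if there are none."""
    mask = gray > threshold              # binary image: True where bright
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())


def silhouette_mask(frame, background, min_diff=30):
    """Simple background subtraction: True where the frame differs from a
    stored background image by more than min_diff gray levels."""
    # Widen to int16 so the uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > min_diff


# Synthetic example: dark 120x160 frame with a 3x3 bright "LED".
background = np.zeros((120, 160), dtype=np.uint8)
frame = background.copy()
frame[40:43, 100:103] = 255

print(led_centroid(frame))                            # (41.0, 101.0)
print(int(silhouette_mask(frame, background).sum()))  # 9
```

Running the centroid step on each frame gives one tracked point per frame, which is much less work than extracting and interpreting a full silhouette.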
Bruce
Bruce Ammons
Ammons Engineering