Thanks for the help, dug9000. Your explanation made it clear.
Just to clarify some points about our implementation: we use a pool of semaphores to manage access to our machines. Several machines are available, but each one can be used by only one thread at a time. We haven't figured out a way to cover all of them with a single semaphore, so we use one semaphore per machine.
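A minimal sketch of this kind of setup, assuming Python's `threading` module (the language is an assumption here). The machine names and the `acquire_any`/`release` helpers are hypothetical, not the actual implementation. It also shows why an extra release is a problem with a plain semaphore:

```python
import threading

# Hypothetical machine names; one binary semaphore per machine.
MACHINES = ["machine-a", "machine-b", "machine-c"]
pool = {name: threading.Semaphore(1) for name in MACHINES}

def acquire_any():
    """Return the name of a free machine, or None if all are busy."""
    for name, sem in pool.items():
        if sem.acquire(blocking=False):
            return name
    return None

def release(name):
    """Hand a machine back to the pool."""
    pool[name].release()

first = acquire_any()
second = acquire_any()

release(first)
release(first)  # extra release: a plain Semaphore accepts it silently

# The counter for `first` is now 2, so two threads could hold it at once.
a = pool[first].acquire(blocking=False)
b = pool[first].acquire(blocking=False)
```

Here both `a` and `b` succeed, which breaks the "one thread per machine" rule without any error being raised.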
I believe that fixing the extra releases or switching to a Lock would be the best option now.
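As a rough sketch of the Lock option, again assuming Python's `threading` module and hypothetical machine names: releasing a `threading.Lock` that is not held raises `RuntimeError`, so an extra release shows up immediately instead of quietly letting a second thread in. (`threading.BoundedSemaphore` gives a similar check by raising `ValueError` on too many releases.)

```python
import threading

# Hypothetical machine names; one Lock per machine instead of a Semaphore.
MACHINES = ["machine-a", "machine-b"]
locks = {name: threading.Lock() for name in MACHINES}

def acquire_any():
    """Return the name of a free machine, or None if all are busy."""
    for name, lock in locks.items():
        if lock.acquire(blocking=False):
            return name
    return None

name = acquire_any()
locks[name].release()  # normal release

try:
    locks[name].release()  # extra release of an unlocked Lock
    extra_release_error = None
except RuntimeError as exc:
    # The bug surfaces here rather than corrupting the pool state.
    extra_release_error = exc
```

Wrapping each use in `with locks[name]:` would pair every acquire with exactly one release, which may remove the extra releases altogether.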
Thanks again!