Basically, it is running "properly". The effect you're seeing probably has to do with NT's thread scheduling algorithm. It is different from the one in Win95/98/Me, and it even differs from one version of NT to the next. NT schedules threads using a variation of multi-level feedback queues, with a background priority boost on top.

What the panel allows you to do is alter the ***relative*** thread priority of each thread within the demo process, to one of seven levels. The process itself has a base priority (called its priority class). The relative thread priority is added to the process base priority to produce a final thread priority, which is one of 32 levels and is what the scheduler actually uses. The idle and time-critical settings are the exception: rather than adding an offset, they pin the thread to the bottom or top of its class's range. So at most, the demo threads can occupy seven of the 32 possible priorities.

Note that the threads you create are scheduled along with every other thread in existence on the system, and there may be hundreds of them at any one time. Depending on the demo's process priority and the number of other threads on the system, the relative differences between the demo threads may be getting "washed out".

Bear in mind also that whenever a thread performs I/O, regardless of its priority, it is likely to be suspended until the I/O completes. Another thread of equal or lower priority will then run. And the demo threads do a lot of I/O. To top it all off, NT periodically boosts the priority of background threads that have been starved of CPU time, in an attempt to avoid priority inversion. The cumulative effect of all this is that the differences between thread priorities seem small.

If the two systems you're comparing differ greatly in processor speed, that too could affect thread scheduling. On the faster machine, the time spent executing code is smaller compared to the time spent suspended for I/O. That again may minimize the observable differences in the threads' performance.
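Here is a rough illustration of the arithmetic. It is a small Python lookup model of how a priority class and a relative thread priority combine into a base priority. The names mirror the Win32 constants, but this is only a table and not a call into the Win32 API. The values follow the documented Win32 scheduling-priorities table for the four classes NT 4 had, and dynamic boosts and foreground adjustments are ignored. `base_thread_priority` and `RELATIVE_SETTINGS` are hypothetical names.

```python
# Base priority of each NT 4-era process priority class.
PRIORITY_CLASS_BASE = {
    "IDLE_PRIORITY_CLASS": 4,
    "NORMAL_PRIORITY_CLASS": 8,
    "HIGH_PRIORITY_CLASS": 13,
    "REALTIME_PRIORITY_CLASS": 24,
}

# The five "ordinary" relative settings are plain offsets from the class base.
RELATIVE_OFFSET = {
    "THREAD_PRIORITY_LOWEST": -2,
    "THREAD_PRIORITY_BELOW_NORMAL": -1,
    "THREAD_PRIORITY_NORMAL": 0,
    "THREAD_PRIORITY_ABOVE_NORMAL": 1,
    "THREAD_PRIORITY_HIGHEST": 2,
}

# All seven relative settings a thread can be given (hypothetical helper list).
RELATIVE_SETTINGS = (
    ["THREAD_PRIORITY_IDLE"]
    + list(RELATIVE_OFFSET)
    + ["THREAD_PRIORITY_TIME_CRITICAL"]
)


def base_thread_priority(priority_class: str, relative: str) -> int:
    """Return the thread's base priority (0-31), ignoring dynamic boosts."""
    realtime = priority_class == "REALTIME_PRIORITY_CLASS"
    # IDLE and TIME_CRITICAL are not offsets: they pin the thread to the
    # bottom or top of the range (1-15 for normal classes, 16-31 for realtime).
    if relative == "THREAD_PRIORITY_IDLE":
        return 16 if realtime else 1
    if relative == "THREAD_PRIORITY_TIME_CRITICAL":
        return 31 if realtime else 15
    return PRIORITY_CLASS_BASE[priority_class] + RELATIVE_OFFSET[relative]


if __name__ == "__main__":
    # A demo process in the normal class can only reach these seven levels.
    print(sorted(base_thread_priority("NORMAL_PRIORITY_CLASS", r)
                 for r in RELATIVE_SETTINGS))  # [1, 6, 7, 8, 9, 10, 15]
```

In the normal class, five of the seven settings cluster within two levels of 8. Only idle and time-critical reach further.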
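Here is a deliberately crude, hypothetical simulation of why I/O washes out priority differences. It assumes one CPU and a strict "highest-priority ready thread runs" rule. Two threads alternate between a burst of computation and a fixed I/O wait, measured in abstract ticks. It ignores quanta, round-robin among equal priorities and NT's boosts, so treat it as a sketch of the effect rather than of NT itself. `SimThread`, `simulate` and `compare` are made-up names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimThread:
    """Hypothetical model thread: `burst` ticks of CPU work, then `io_wait` ticks blocked."""
    name: str
    priority: int
    burst: int
    io_wait: int
    remaining: int = 0    # CPU ticks left before the next I/O request
    blocked_for: int = 0  # ticks left until the pending I/O completes
    cpu_ticks: int = 0    # total CPU time received

    def __post_init__(self) -> None:
        self.remaining = self.burst


def simulate(threads: list[SimThread], total_ticks: int) -> dict[str, int]:
    """Strict priority scheduling on one CPU; returns CPU ticks per thread."""
    for _ in range(total_ticks):
        ready = [t for t in threads if t.blocked_for == 0]
        # The highest-priority ready thread gets this tick (first wins on ties).
        running = max(ready, key=lambda t: t.priority) if ready else None
        # Pending I/O makes progress whether or not anything is running.
        for t in threads:
            if t.blocked_for > 0:
                t.blocked_for -= 1
        if running is not None:
            running.cpu_ticks += 1
            running.remaining -= 1
            if running.remaining == 0:
                # Burst finished: issue I/O and block for io_wait ticks.
                running.remaining = running.burst
                running.blocked_for = running.io_wait
    return {t.name: t.cpu_ticks for t in threads}


def compare(burst: int, io_wait: int, ticks: int = 1000) -> dict[str, int]:
    """Run a priority-10 and a priority-8 thread with identical workloads."""
    high = SimThread("high", priority=10, burst=burst, io_wait=io_wait)
    low = SimThread("low", priority=8, burst=burst, io_wait=io_wait)
    return simulate([high, low], ticks)


if __name__ == "__main__":
    print(compare(burst=5, io_wait=0))  # pure CPU:      {'high': 1000, 'low': 0}
    print(compare(burst=8, io_wait=2))  # slower CPU:    {'high': 800, 'low': 200}
    print(compare(burst=2, io_wait=2))  # 4x faster CPU: {'high': 500, 'low': 500}
```

Even in this toy model, the lower-priority thread goes from starving to getting an equal share of the CPU. That happens once each thread spends as much time waiting on I/O as computing. The same shift occurs when a faster processor shrinks the compute bursts while the I/O wait stays fixed.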
There are several good books on this topic, and the Win32 SDK shipped with Measurement Studio has a good section on processes and threads.