06-23-2008 12:12 PM
06-25-2008 08:46 PM
06-25-2008 10:38 PM
Hi Ryan,
I spent 28 hours straight Monday/Tuesday troubleshooting and rebuilding the affected server on a Win2k8 VM hosted by Hyper-V RC1 (it was previously on a Win2k3 VM hosted by MS Virtual Server 2005). I also updated the server and clients to DSC 8.5. Below is what I discovered in the process:
Clients: Clients run ~10 different screens, about half of which have at least one Hypertrend (HT) display with a single trace. Some have multiple HTs with a single trace each, and others have multi-trace or multi-trace, multi-HT displays.
Normally, client CPU usage is minimal, with nicitdl5.exe and lookout.exe alternating for small chunks of CPU time. As the problem got worse, lookout.exe took more and more CPU. When first noticed, only the more complex (data-intensive) HT screens would noticeably hog client CPU and make screen redraws sluggish. By the time I rebuilt the server, the clients had deteriorated to the point that the screen became totally unresponsive as soon as ANY screen containing a Hypertrend trace was popped up! It was so bad that I had to remove view permissions for all screens containing Hypertrends so that other displays could still be used without crashing the Lookout display function entirely.
Server: Normally, the same low CPU utilization as the clients. As the problem worsened, nicitdl5.exe on the server began taking more and more CPU time, all the time, not just after a restart. So really, I knew it was a database problem, but I had been planning a hardware/OS upgrade anyway, so I did the deed.
Nicitdl5.exe behavior during and after upgrade: Before the upgrade, I made several attempts to back up the 365-day database (~5GB, including ~700MB of SQL files). They all failed using both the Citadel control and NiMax, usually starting out fine and then jumping erroneously to 100% completion, or just stopping with no error (nicitdl5.exe simply disappeared from Task Manager in the middle of the backups!). I decided to just copy the failing server database manually (after shutting down all NI database and SQL services) for further study after the hardware/OS upgrade. Note: the problem of nicitdl5.exe crashing silently under the Citadel control and NiMax while backing up (possibly corrupt) databases exists even in the newest versions on a clean install.
Using NiMax on the new hardware/OS, I tried backing up chunks of the database. I discovered that there was corruption between June 10 and June 19 that I could not back up. I also recorded the size of each 2-week chunk of backup prior to June 10, and the sizes grew bigger and bigger as I approached the corrupt area. For example, a month of backup was normally (Dec07 through May08) between 250 and 300 MBytes. As I tested a few days here and there closer to the corrupt area, I found that a single day was 1/4 the size of a normal month. I ran out of time (consciousness) isolating the bad date or dates, and NiMax had to be closed forcibly each time the corrupt and/or data runaway area was hit, because the timeouts were so long and sometimes no error would appear (if a silent nicitdl5.exe crash occurred).
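For anyone repeating this kind of chunk-by-chunk check, the size comparison above can be sketched in a few lines of Python. This is only an illustration: the function name, the threshold factor of 3, and the sample sizes are assumptions (the sizes are made up to fit the 250 to 300 MB/month range and the roughly quarter-of-a-month day described above), and nothing here reads a Citadel database or calls any NI tool.

```python
from statistics import median


def flag_runaway_chunks(chunk_sizes_mb, days_per_chunk, factor=3.0):
    """Return indexes of backup chunks whose MB/day rate is far above normal.

    chunk_sizes_mb: size of each backup chunk in MB (recorded by hand).
    days_per_chunk: number of days each chunk covers.
    factor: hypothetical threshold; a chunk is flagged when its MB/day
            rate exceeds factor times the median rate of all chunks.
    """
    per_day = [size / days for size, days in zip(chunk_sizes_mb, days_per_chunk)]
    baseline = median(per_day)
    return [i for i, rate in enumerate(per_day) if rate > factor * baseline]


# Illustrative numbers only: four normal ~30-day chunks, then a single
# day that is about 1/4 the size of a normal month.
sizes_mb = [250, 270, 300, 280, 70]
days = [30, 30, 30, 30, 1]
suspect = flag_runaway_chunks(sizes_mb, days)
```

With these sample numbers the normal rate is around 9 MB/day, so a 70 MB day is roughly 7 to 8 times normal and is the only chunk flagged.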
Nothing changed in any of the server or client process files during this period! Somehow, the server database was logging data out of control and either trying to feed it to the clients, or bogging the clients down because nicitdl5.exe was bogged down.
I would like to attempt a repair of the affected database, mostly just to find out what happened and how best to recover if it happens again. (I would like to use your repair tool if possible.) I could also try to narrow the database down and send it to you if you are interested.
Result: All is back to normal. The running database now holds 6 months of history, is missing 24 days in June, and is logging new data.
Is there a practical size limit for Citadel 5 databases?
Could you include Citadel problem detection/reporting in new versions?
Cheers and Best....
Ed
06-25-2008 10:52 PM - edited 06-25-2008 10:53 PM
Interesting sidebar: Some time ago I requested that you folks at NI include a smooth Lookout shutdown when one is requested by the operating system. Each time Windows Update forces a reboot (usually once a month), I'm assuming this is a possible time for Citadel database corruption. Any other ideas to detect and/or limit Citadel problems? Thanks!
06-26-2008 11:49 AM
06-26-2008 04:31 PM
We have seen this behaviour recently too on Lookout V6.0.2 / WinXP SP2. This machine doesn't have an internet connection, but it apparently dialed Microsoft through the pager modem. The plant manager came in the next morning to find Lookout not running. I went to Event Viewer and saw the restart command after updates. I deleted the database, since it was a new server install.
It was my fault for not defeating automatic updates when I deployed it.
Roger
07-02-2008 03:30 AM
07-02-2008 05:37 AM
Ryan,
Some clarification:
Windows updates force the shutdown on server OSes just fine. I was wondering whether these forced shutdowns contribute to Citadel corruption. If Citadel can handle relatively routine "shutdown -r -f" commands without damage to the database, all is good as it stands now. I also sometimes use the -f switch when performing remote machine maintenance, so it would be good to know that Citadel "does not mind".
Ed