Alert on buffer queue corruption
Detect buffer queue corruption and alert me when it occurs. Original post in NOC Services: https://feedback.osisoft.com/forums/596665-noc-services/suggestions/18788656-alert-on-buffer-queue-corruption
PI Data Archive 2016 R2 provides auto-recovery of corrupt queues. Have you been able to upgrade to this version? Does this solve your problem?
Hi Nebojsa Krstic,
While I agree with @Steve Kwan that we should try to get to the root of why your buffers are becoming corrupt, I also think it would be helpful to provide a health incident if that happens. I'll do some research on my end to learn how we might be able to surface that.
How often are your queues becoming corrupt? Have you been able to figure out what might be causing this (a power outage, perhaps, or a bad sector on a disk)? I would like to know more about how frequently this happens for you, and what might be causing it.
I would like to understand why you're getting buffer queue corruption. It would be best to correct the root cause of the buffer queue corruption. Have you contacted tech support?
And we can't update to the latest 2017 version, because the PI Data Archive issues found since its release are not fixed yet, and we want to update everything at once. Is there any news on the release date of the fixed 2017 PI Data Archive? We checked with support too, and they say we can upgrade to the latest AF and Notifications, but then there was a warning about possible deletion of event frames...
We have 2016 R2. And no, we still have the same issue. Once a queue gets corrupted, we have to stop thousands of analyses and notifications, then stop the service and rename the folder. Then we start the service, the analyses and the notifications again. Why? Only because, if it is not done that way, users get false notifications, and that is not allowed to happen.
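The manual recovery order described above could be scripted roughly as follows. This is only a minimal sketch: the function name `recover_corrupt_queue`, the `queue_dir` path, and the stop/start callables are all hypothetical placeholders, since the thread does not name the actual service, folder, or the way analyses and notifications are stopped.

```python
import os
import time
from typing import Callable


def recover_corrupt_queue(
    queue_dir: str,
    stop_steps: list[Callable[[], None]],
    start_steps: list[Callable[[], None]],
) -> str:
    """Run the stop steps in order, rename the queue folder, run the start steps.

    All arguments are assumptions for illustration: the callables stand in
    for whatever stops and starts analyses, notifications and the buffer
    service at your site. Returns the new path of the renamed folder.
    """
    # 1. Stop analyses, notifications and the service, in the given order,
    #    so nothing evaluates against a half-recovered queue.
    for step in stop_steps:
        step()

    # 2. Move the corrupt queue folder aside under a timestamped name.
    renamed = f"{queue_dir}.corrupt-{time.strftime('%Y%m%d-%H%M%S')}"
    os.rename(queue_dir, renamed)

    # 3. Restart everything, in the given order.
    for step in start_steps:
        step()
    return renamed
```

If the rename fails, the exception propagates and nothing is restarted, which matches the goal of not letting notifications fire against a corrupt queue.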
A couple of comments on your last response:
PI System Health will monitor the health of the PI Data Archive in a few ways: we will use some Windows performance counters, and we will also be able to talk to the Data Archive directly and query any health incidents that it is able to publish. As time goes on and we release new versions, I expect the Data Archive to add to the list of incident types that it detects and reports to PI System Health.
As to the comment about ports: unfortunately, that is not something I can change. The software needs to communicate on a port, and it can't use one that's already in use, such as 5450. Several pieces of software are required to enable PI System Health. PI Web API is one; the good news is that its port is configurable, so you can set it yourself. We will also rely on a Connector, which uses Relay technology. A specific port is needed for that, from the relay to the Data Archive and to AF. I believe they are using 5672, but you can double-check the documentation.
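To see which of the ports mentioned here are actually reachable across a firewall or DMZ boundary, a plain TCP connection test is often enough. The sketch below uses only the Python standard library. The host name in the usage note is a hypothetical placeholder, and 5672 is only the port believed above to be used by the relay, so check the documentation before relying on it.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports: list[int]) -> dict[int, bool]:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_reachable(host, port) for port in ports}
```

For example, `check_ports("pi-archive.example.local", [5450, 5672])` would report whether each of the two ports accepts connections from the machine running the script. It only shows that something is listening, not which product is behind the port.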
All of the port requirements will be clearly documented and communicated when we have our first release, at the end of 2017. Thank you so much for your interest!
Yes, the PI System Health product must be able to monitor all internal processes running within the PI System, but it should also avoid using additional ports if possible!
Imagine a scenario where the PI System archive and AF are in a DMZ layer. Opening additional ports for monitoring is a risk from an IT security point of view; the DMZ should be a safe, secured place. We do have a SharePoint with Coresight / Vision, WebParts etc. in the corporate (user) segment.
At the moment we can't use the PI System Health product, as we would have to open more ports. Maybe a point to think about? Happy to discuss more.