Make more videos about the theoretical side of PI
It's great to see videos about statistical quality control and the swinging door algorithm. I think that having more videos of this nature will really help PI administrators understand PI's capabilities rather than just its features.
This is also a chance for OSIsoft to explain some of their design decisions, especially those that naïvely seem like bad ideas but actually have sound reasoning behind them.
Lastly, the process of making these videos will likely reveal insights and improvements, since it forces an in-depth look at PI and a justification for its design.
I will post video ideas in the comments.
Kenneth Barber commented
# Tag Theory (part 2) #
• Uses and properties of Boolean tags (tags whose values are only 0 or 1 and whose step attribute is on)
- The time-weighted average and the integration over time within a time range produce the same value, but they mean slightly different things. The time-weighted average is unitless. The integration over time represents the number of days in the time range that the tag was 1.
- Multiplying a Boolean tag by a tag that represents a rate over time reproduces the rate tag only when the Boolean tag is 1, and it is 0 otherwise. This effectively produces a conditional signal, and we can, say, integrate this new tag over time to simulate conditional integration on the original rate tag.
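A minimal sketch of the two Boolean-tag properties above, in plain Python rather than any PI API. The event format (lists of `(timestamp_in_days, value)` pairs), the function names, and the assumption that a tag reads 0 before its first event are all made up for illustration.

```python
def step_integral(events, start, end):
    """Integrate a step = on signal over [start, end].

    events: list of (timestamp_in_days, value), sorted by timestamp.
    Each value is held until the next event (step interpolation).
    """
    total = 0.0
    for i, (t, v) in enumerate(events):
        t_next = events[i + 1][0] if i + 1 < len(events) else end
        seg_start = max(t, start)
        seg_end = min(t_next, end)
        if seg_end > seg_start:
            total += v * (seg_end - seg_start)
    return total


def step_value_at(events, t):
    """Value of a step = on signal at time t (assumed 0 before the first event)."""
    value = 0.0
    for ts, v in events:
        if ts > t:
            break
        value = v
    return value


def multiply_step(a, b):
    """Product of two step = on signals, evaluated at every event time of either."""
    times = sorted({t for t, _ in a} | {t for t, _ in b})
    return [(t, step_value_at(a, t) * step_value_at(b, t)) for t in times]


# Hypothetical data: a Boolean "running" tag and a rate tag in units per day.
running = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
rate = [(0.0, 10.0), (2.0, 20.0)]

days_on = step_integral(running, 0.0, 4.0)          # days the tag was 1
fraction_on = days_on / (4.0 - 0.0)                 # time-weighted average
conditional_total = step_integral(multiply_step(rate, running), 0.0, 4.0)
```

With this data the Boolean tag is 1 for 2 of the 4 days, so the integral is 2 (days) and the time-weighted average is 0.5 (unitless). Integrating the product counts the rate only while the Boolean tag is 1.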
• When to use digital states on a single tag versus using multiple Boolean tags.
Details: As long as only 1 of the states can occur at any given time, then digital states are appropriate, otherwise the overlapping states should be separated into their own Boolean tags.
• Maximizing the use of a tag-limited PI system (trade off simplicity and calculation time for cost savings and reduced disk space)
- Combine up to 64 Boolean tags into a single Int64 tag. Use bit masking to extract the individual Boolean values.
- If you get very desperate trying to pack multiple signals into a single tag, consider using a string or BLOB tag.
- Instead of using 1 tag for each test frequency, apply all frequencies at once. Separate the output into signals by frequency by applying the Fourier transform.
- If historical forecast values are not important, then instead of having 1 tag per time period in the future (e.g. 1 day, 2 days, 1 week), just use a single future data tag and overwrite its values.
- If a signal has a fractional part and requires more binary digits of precision than Float64 allows (52 stored significand bits, 53 effective), then scale the signal to an integer and save as Int64. If even more digits of precision are required, break the signal into 2 tags: one for the upper digits and one for the lower digits.
- A totalizer tag can afford to reset to 0 at most once in its lifetime without needing to keep track of when the resets occurred. The reset would ideally occur when the value reaches the maximum value that the tag's point type supports. When doing a subtraction over a time range, if the result is negative, add back this maximum value to get the correct value.
- Write a program to alternate 2 sets of tags between scan = on and scan = off to break the tag limit (the tag limit limits the number of scan = on tags, not the number of tags) at the expense of a much lower scan rate for both sets.
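The bit-packing idea from the list above (up to 64 Boolean tags in a single Int64 tag) could look roughly like this. It is only a sketch in plain Python: the function names are hypothetical, and it assumes the packed value is stored as a signed 64-bit integer in two's complement.

```python
def pack_booleans(bits):
    """Pack up to 64 Boolean values into one signed 64-bit integer.

    bits[0] goes into the least significant bit.
    """
    if len(bits) > 64:
        raise ValueError("an Int64 can hold at most 64 Boolean values")
    packed = 0
    for index, bit in enumerate(bits):
        if bit:
            packed |= 1 << index
    # Python integers are unbounded, so wrap into the signed Int64 range
    # (two's complement) when bit 63 is set.
    if packed >= 1 << 63:
        packed -= 1 << 64
    return packed


def unpack_boolean(packed, index):
    """Extract one Boolean value with a shift and a bit mask."""
    if not 0 <= index < 64:
        raise ValueError("index must be between 0 and 63")
    return (packed >> index) & 1 == 1


# Hypothetical example: three Boolean signals stored in one value.
value = pack_booleans([True, False, True])   # bits 0 and 2 set
pump_running = unpack_boolean(value, 0)
valve_open = unpack_boolean(value, 1)
```

The cost is the one mentioned in the bullet: every read needs a masking step, and a change in any one Boolean signal writes a new event for the whole packed tag.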
# PI Asset Framework Theory #
• Easy confusions and poor design, and how to fix them.
- Choosing the primary parent.
- Instead of having unused attributes for certain equipment that follows an element template, separate the template into a base one and a derived one.
- When to use a child element as opposed to more attributes or child attributes.
- When to use child attributes as opposed to leaving the attributes "flat".
- When to use categories versus container elements. Compare and contrast. Pros and cons.
- When to make a whole new database versus using more container elements to group elements into "pseudo-databases". Compare and contrast. Pros and cons.
- When to use layers of container elements versus leaving the element list "flat". For example, if I have 2 sites, 2 areas within each site, and 2 models of equipment, what are the pros and cons of creating multiple arrangements of hierarchy (e.g. \Site\Area\Model\ and \Area\Model\Site\, etc., all co-existing) versus not nesting site, area, and model within each other (e.g. \Site\Equipment and \Area\Equipment and \Model\Equipment all co-existing)?
• Explaining and justifying the cardinality of the relationships between databases, elements, child elements, element templates, derived templates, attributes, child attributes, categories, analyses, tags, notifications, AD groups, PI Identities, event frames, event frame templates, child event frames, etc. A giant diagram encapsulating all of the cardinalities would probably help.
• PI Asset Framework as a directed graph, not necessarily a hierarchy. Weak references create the illusion of a hierarchy (tree).
• Compare and contrast PI Asset Framework against a traditional, multi-table relational database.
Kenneth Barber commented
These are only examples of possible video topics. OSIsoft knows PI much better than I do, so they can probably think of more useful topics.
# Tag Theory #
• Translating from relational database thinking to time series database thinking.
- Arrange your table of values so that you only have 2 types of columns: the value column and columns that comprise the compound primary key. The columns that comprise the compound primary key, minus the timestamp column, identify the PI Point. The timestamp column and the value column identify an event that is saved to that PI Point.
- A relational database treats all dimensions (columns) as discrete, but time series databases have a continuous time dimension represented by discrete samples, and a discrete "tag" dimension.
- Disadvantages of a relational database for time series data.
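The first point above (key columns identify the PI Point, timestamp plus value identify an event) can be sketched in plain Python. The column names, the dot-separated tag naming scheme, and the function name are assumptions for illustration only, not a PI convention.

```python
from collections import defaultdict


def rows_to_points(rows, key_columns, time_column="timestamp", value_column="value"):
    """Regroup relational rows into one event list per tag.

    key_columns: the compound primary key minus the timestamp column.
    Returns {tag_name: [(timestamp, value), ...]} with events sorted by time.
    """
    points = defaultdict(list)
    for row in rows:
        # The non-timestamp key columns together identify the tag.
        tag = ".".join(str(row[column]) for column in key_columns)
        # The timestamp and value together form one event on that tag.
        points[tag].append((row[time_column], row[value_column]))
    for events in points.values():
        events.sort(key=lambda event: event[0])
    return dict(points)


# Hypothetical table with key columns (site, pump, timestamp) and one value column.
rows = [
    {"site": "A", "pump": "P1", "timestamp": 2, "value": 5.0},
    {"site": "A", "pump": "P1", "timestamp": 1, "value": 4.0},
    {"site": "B", "pump": "P1", "timestamp": 1, "value": 7.0},
]
points = rows_to_points(rows, key_columns=["site", "pump"])
```

Here the three rows collapse into two tags, `A.P1` with two events and `B.P1` with one.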
• Different ways to handle signals over discrete time or signals over a time range
- Consider the mass of rock for a single load into a haul truck, where the haul truck can take multiple loads before driving away to dump them, and where the weight scale is on the loader. The mass is only recorded at precise times, and the times that the loader is empty are not recorded. Interpolation between these values is not appropriate. Possible approaches: avoid using interpolated values or create a new tag that tracks the mass being held in the haul truck, which can be interpolated.
- A line on a monthly invoice applies to a time range (the month) and cannot be broken down accurately into smaller pieces. Possible approaches: avoid using interpolated values or convert the cost to cost per day, which would produce a continuous signal over time.
• Defining named constants in the PI Data Archive.
Details: Make a tag named after the constant. Save a single event to it, where the timestamp is the beginning of time (January 1, 1970) and the value is the constant.
• Why use linear interpolation between events on a PI Point, as opposed to say, a cubic spline?
Details: Linear interpolation will produce a line segment whose slope equals the average of all slopes in the same time period in the original signal. See the mean value theorem. Also, the interpolation method used between events should not produce values that fall outside the range spanned by the two events being interpolated between, otherwise false alarms might be generated.
• Creating a tag that accurately represents the product of 2 tags (e.g. mass flow rate × % of particles within a certain size, rate × Boolean tag) if both are step = on, both are step = off, or one is step = on and the other is step = off.
• Maximum changes in accuracy and precision compared to the "true" signal after different operations.
Examples: sampling, step interpolation, linear interpolation, exception, compression, changing scan rate, numerical integration over time, event-weighted derivative with respect to time, combining signals (e.g. sum or product)
• Uses and properties of totalizers.
- Totalizers are the definite integral with respect to time of a rate over time. As such they have all of the properties of integrals. For viewers that are less math-inclined, totalizers can be thought of as a cumulative sum of measurements over small time ranges, starting from some arbitrary starting time.
- A totalizer is continuous over time, as opposed to the measurements over time ranges mentioned earlier, which are not.
- You can find a "sum" in a time range by subtracting the values at the time range's end points. This is much faster than summing every small measurement in a time range.
- The starting time of the totalizer is irrelevant, since only the difference between totalizer values matters, and the differences are not affected by the starting time. Use visuals to prove this!
- Even though a linearly interpolated totalizer is just an approximation of the "true" totalizer, an exact derivative with respect to time of the interpolated totalizer can be calculated. Since the totalizer is a series of line segments, the derivative is a series of flat lines (i.e. step = on). Similarly, a tag that represents a rate over time and that has step = on can be integrated over time to produce an exact totalizer. However, taking the derivative of a step = on tag produces all 0 or undefined, and integrating a step = off tag produces parabolic pieces that require more saved events than the original tag to reproduce accurately.
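The totalizer properties above can be illustrated with a short plain-Python sketch. The event format (lists of `(timestamp, value)` pairs) and all function names are hypothetical, and the code assumes strictly increasing timestamps and an end time later than the last rate event.

```python
def totalize(rate_events, end):
    """Integrate a step = on rate tag into totalizer events (step = off)."""
    total = 0.0
    totalizer = []
    for i, (t, rate) in enumerate(rate_events):
        totalizer.append((t, total))
        t_next = rate_events[i + 1][0] if i + 1 < len(rate_events) else end
        total += rate * (t_next - t)
    totalizer.append((end, total))
    return totalizer


def interpolate(events, t):
    """Linearly interpolate between the two events surrounding time t."""
    for (t0, v0), (t1, v1) in zip(events, events[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the range of the events")


def amount_between(totalizer, start, end):
    """The 'sum' over a time range is one subtraction of end-point values."""
    return interpolate(totalizer, end) - interpolate(totalizer, start)


def derivative(totalizer):
    """Exact derivative of a piecewise-linear totalizer: a step = on rate."""
    return [
        (t0, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(totalizer, totalizer[1:])
    ]


# Hypothetical rate tag: 10 units/day for 2 days, then 20 units/day.
rate = [(0.0, 10.0), (2.0, 20.0)]
totalizer = totalize(rate, end=4.0)

# Shifting every totalizer value by a constant mimics a different starting time.
shifted = [(t, v + 100.0) for t, v in totalizer]
same_amount = amount_between(totalizer, 1.0, 3.0) == amount_between(shifted, 1.0, 3.0)
```

With this data, differentiating the totalizer returns the original step = on rate events, and the amount between day 1 and day 3 is the same whether or not the totalizer is offset by a constant.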