Automatically remove data from continuous views
As an engineer using PI View or SQL writers for a report, I would like option to remove data older than a certain cutoff. For example, a continuous view is used to generate data for a report on the last 8 hours. Data older than that are no longer needed for the report and should be removed to avoid unnecessarily large tables.
One workaround with Text and Hadoop writers is the Append Timestamp option, but there's no ability to remove data from other targets.
For supporters of this idea, what is the downstream client tool that consumes the data prepared by the PI Integrator? Is the concern about disk space or how to only display data for a specified time range?
For the former, this seems to be a different type of “rolling” reporting use case that we do not currently have a solution for. In the mean time, as Rory mentioned below, a stored procedure on SQL would work if this idea is intended for SQL writers.
For the latter, is it possible to create “rolling timespans” for your graphs, reports, tables, etc.? For example, here’s a video on how to create calculated measures for a table based on last day or month of data: youtube.com/watch?v=NevFWL2JoMU
Drew Brubaker commented
I have a view that writes a text file every 10 minutes. My client has an automated system to pull in the data every 10 minutes. After that, the file/data is not needed. I would like there to only be one file and for Integrator to overwrite the existing file/data.
I currently have it set to not append timestamp and there is only one file. However, it keeps adding data to the file.
Kelsey Bobeck commented
This feature would be beneficial to minimize storage needs (and therefore cost) in Redshift, as the Redshift endpoint we publish to is a "buffer" storage point before the data gets picked up by a glue job. Once it's picked up, we don't need the data anymore. However, applying purge processes on a large number of tables could be challenging and it would much easier if this was done through the Integrator.
Brent Bregenzer commented
Most customers who've asked about this are concerned with the former. They would prefer the Integrator didn't produce ever-growing tables when they only need the last 6mo, 1y, etc. of data online, and they consider it an added burden to manage stored procedures to trim the tables.
Gonzalo Merciel commented
As a PII4BA user I need to maintain on the target database only the data corresponding to the specified time range, older data should be deleted when new data is inserted.
Rory Byrne-Dugan commented
I agree with this use case. For SQL targets, you could create a stored procedure to delete data older than a certain time, and then you can run it with SQL Agent. Not the most ideal scenario, but it works. Stack Overflow has a good example for deleting data here: https://stackoverflow.com/questions/1588722/delete-items-older-than-a-day-sql-server
Hans Bleijendaal commented
Thanks for bringing this in.