In situations where your database is not the ultimate owner of the data—where
corrections to data can flow in from various sources and at various times—use
of transaction-time is inappropriate for historical queries.
Imagine you have a financial trading system and you want to perform
calculations based on the official 'end of day', which occurs each day at
17:00. Does all the data arrive in your database at exactly 17:00? Or does
it arrive from an upstream source, where you have to allow for data
arriving out of order, and where some of it may always arrive after 17:00?
This is often the case with high-throughput systems, where clusters of
processing nodes enrich the data before it reaches our store.
In this example, we want our queries to include the straggling data in
our calculations, and this is where valid-time comes in. When data
arrives in our database, it can carry an arbitrary timestamp (its
valid-time) that we can use for querying purposes.
We can tolerate data arriving out of order, as we’re not completely
dependent on transaction-time.
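
The difference can be illustrated with a minimal sketch. This is plain Python over an in-memory list, not the API of any particular database; the `Trade` record, its field names, the dates, and the amounts are all hypothetical and chosen only to show how a 17:00 cut-off behaves under each notion of time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trade:
    """Hypothetical record carrying both notions of time."""
    trade_id: str
    amount: int
    valid_time: datetime  # when the trade happened, as reported upstream
    tx_time: datetime     # when our store actually recorded it


END_OF_DAY = datetime(2024, 1, 15, 17, 0)

# Listed in arrival (transaction-time) order. "t2" is a straggler: it was
# traded before 17:00 but reached the store after "t3", which was traded
# after the cut-off.
trades = [
    Trade("t1", 100, datetime(2024, 1, 15, 16, 30), datetime(2024, 1, 15, 16, 30, 2)),
    Trade("t3", 75, datetime(2024, 1, 15, 17, 10), datetime(2024, 1, 15, 17, 10, 1)),
    Trade("t2", 250, datetime(2024, 1, 15, 16, 58), datetime(2024, 1, 15, 17, 12)),
]


def total_by_tx_time(records, cutoff):
    """Sum trades the store had recorded by the cut-off."""
    return sum(t.amount for t in records if t.tx_time <= cutoff)


def total_by_valid_time(records, cutoff):
    """Sum trades that took place by the cut-off, whenever they arrived."""
    return sum(t.amount for t in records if t.valid_time <= cutoff)


by_tx = total_by_tx_time(trades, END_OF_DAY)
by_valid = total_by_valid_time(trades, END_OF_DAY)
print(by_tx, by_valid)
```

Filtering on `tx_time` misses the straggler `t2`, while filtering on `valid_time` includes it and still excludes `t3`, which was traded after the cut-off. The arrival order of the records has no effect on the valid-time result.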