In situations where your database is not the ultimate owner of the data—where
corrections to data can flow in from various sources and at various times—use
of transaction-time alone is inappropriate for historical queries.
Imagine you have a financial trading system and you want to perform
calculations based on the official 'end of day', which occurs each day
at 17:00. Does all the data arrive in your database at exactly
17:00? Or does the data in fact arrive from an upstream source,
so that we have to allow for some data to arrive out of order, and some
to arrive after 17:00?
This is often the case with high-throughput systems, where clusters of
processing nodes enrich the data before it gets to our store.
In this example, we want our queries to include the straggling bits of
data for our calculation purposes, and this is where valid-time
comes in. When data arrives in our database, it can come with an
arbitrary timestamp that we can use for querying purposes.
We can tolerate data arriving out of order, as we’re not completely
dependent on transaction-time.
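
To make the difference concrete, here is a minimal sketch of the
end-of-day example in plain Python. It is an illustration only, not the
API of any particular database: the `Trade` record, its `valid_time` and
`tx_time` fields, the sample dates and amounts, and the two helper
functions are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Trade:
    """Hypothetical record carrying both timestamps."""
    trade_id: str
    amount: int
    valid_time: datetime  # when the trade took effect (set upstream)
    tx_time: datetime     # when our store actually recorded it


# Assumed end-of-day cutoff for one illustrative day.
END_OF_DAY = datetime(2024, 1, 15, 17, 0)

# Listed in arrival (transaction-time) order.
trades = [
    Trade("t1", 100, datetime(2024, 1, 15, 16, 30), datetime(2024, 1, 15, 16, 31)),
    # Arrives out of order: took effect before t1, recorded after it.
    Trade("t2", 200, datetime(2024, 1, 15, 16, 20), datetime(2024, 1, 15, 16, 58)),
    # Straggler: took effect before 17:00, recorded after 17:00.
    Trade("t3", 50, datetime(2024, 1, 15, 16, 59), datetime(2024, 1, 15, 17, 2)),
    # Genuinely belongs to the period after end of day.
    Trade("t4", 75, datetime(2024, 1, 15, 17, 10), datetime(2024, 1, 15, 17, 11)),
]


def total_by_tx_time(records, cutoff):
    """Sum trades the store had recorded by the cutoff."""
    return sum(t.amount for t in records if t.tx_time <= cutoff)


def total_by_valid_time(records, cutoff):
    """Sum trades that had taken effect by the cutoff."""
    return sum(t.amount for t in records if t.valid_time <= cutoff)


tx_total = total_by_tx_time(trades, END_OF_DAY)        # misses the straggler t3
valid_total = total_by_valid_time(trades, END_OF_DAY)  # includes t3, excludes t4

print(tx_total, valid_total)
```

Filtering on `tx_time` drops the straggler `t3`, while filtering on
`valid_time` picks it up and still excludes `t4`. A real bitemporal
store would typically let a query fix both times at once; this sketch
filters on one at a time only to show the contrast.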