The Activity Schema™ data model is an open standard for data modeling and transformation.
It enables simpler, faster, and more reliable data preparation
The Activity Schema was designed assuming that data is messy: foreign keys don't exist, duplication in the source, multiple identifiers, etc...
One table allows everyone to know where the data is. It is simple to find and simple to discover all new data sources
Using the table is super easy since it is a standard structure. You can easily query the data without learning what all the columns mean
Raw data goes into the Activity Stream and then that is used to generate any table. Everything you need is built on top of this one table making data more reliable and trusted.
Data in the Activity Stream can be related to each other using time and customer. No need to pre-define all the joins
Building it only requires you to define your business logic. This is usually on one source and is really easy.
Not only is building anything possible using the Activity Schema, but changing, adding and deleting is done with ease.
Having worked in data for many years, I found that every question we answered started from scratch. We constantly had to think about what we need to build, the columns that we should add, the ways a table would be used.
I needed a standard that can work across all companies, that can answer any question, that can capture any data structure. I needed the Activity Schema.
Data is modeled using independent activities.
All warehouse data is in a single time-series table
All plots and analyses for BI run against a single table
An Activity Schema data model structures all data in the warehouse as a single time series table.
Data is built from independent activities via temporal joins, instead of from staging tables via standard SQL joins.
Any activity can be combined with any other by using relationships in time instead of foreign keys, allowing for true ad hoc queries.
Existing data modeling approaches, such as a star schema, have many layers of dependencies.
These are difficult to manage and maintain. The source of truth is not always clear, they are harder to debug, and require more documentation to use.
One business concept per activity means fewer models to manage, understand, and maintain
No joins between models means no need to tie disparate source systems together
Changes to source data typically only affect a single activity
A single data layer makes tracing data provenance and debugging far easier
Time-series modeling means incremental updates (rather than full rebuilds) by default
Fewer models, with one concept each, makes them vastly easier to document
Each activity represents a single concept (like a 'page view' or 'completed order'), so it's always clear which to use
Time-based joins means any activity can be queried and combined with another without defining foreign keys
A standard data model means that any analysis can be reused across companies. A customer acquisition cost calculation for one company can be shared with another.
A standard data model means that queries don't have to be written by hand
Because all activities are related in time, swapping one activity for another requires no structural changes to queries.
Queries run substantially faster against an activity stream table, which has fewer columns, requires fewer joins, and can be easily partitioned by activity or time