Exploratory Sprint: Derived Aggregates in Meltano Models
As we are trying to support Facebook Ads (#1572 (closed)), we are facing a wall with aggregates that need multiple attributes in order to be calculated.
Check my comments on the related issue (#1572 (closed)) for more context.
Assume that you want to calculate the CPC: The average cost for each click.
This can simply calculated as total amount spend / total clicks
over a specific period.
Our issue is that we have data per day with both the amount spend and total clicks per day, but also the cpc per day.
As this is a weighted average, we can not just average the daily CPC values:
Let's say you have 2 days of Ad insights:
Date | Total $ Spent | Clicks | cpc ----------------------------------------------- 2020/01/15 | 1 | 10 | 0.10 2020/01/16 | 2 | 100 | 0.02
The cpc for the 2 days is:
(Total Cost) / (Total Clicks) = 3 / 110 = 0,02727
But if you average the CPC values for the 2 rows you get 0,06
Which, of course, is wrong as you have to do a weighted average: the first value is for 10 clicks while the second is for 100 clicks (10 times more). The same holds if you have 2 or 100 days.
So what we need is to be able to create an aggregate that is the SUM(daily_spent) / SUM(total_daily_clicks)
, but we do not have such an option right now in Meltano.
This is more generic, but most of the time we get this issue with similar averages.
An initial idea on how we could approach this:
We could add a new type for aggregates
:
type = derived
- sql --> references other aggregates by name in
{{}}
- It is not treated as a real aggregate
For example:
{
version = 1
sql_table_name = ads_insights
name = ads_insights
columns {
... ...
}
aggregates {
total_cost {
label = Total Cost
description = Total Cost
type = sum
sql = "{{table}}.spend"
}
total_clicks {
label = Total Clicks
description = Total Clicks
type = sum
sql = "{{table}}.clicks"
}
cpc {
label = Cost per Click (CPC)
description = Average Cost per click
type = derived
sql = "1.0 * {{total_cost}} / NULLIF({{total_clicks}}, 0)"
}
... ...
}
}
We can then use that to expand the last clause of our generated queries by adding those derived clauses only there, referencing the attributes generated for the other aggregates.
Considerations:
-
Make sure to include the referenced aggregates in the query
-
Update the generation of the final select clause to not select everything (*) from the result clause but only the requested aggregates
This solves the problem of adding things not requested
-
Add the derivative aggregate directly in the final select clause
-
If there are filters for it, just add theme directly there
For example, if I update the final SELECT * FROM "result" "result" ...
in any HDA query generated by Meltano, I can achieve this pretty easily (and also filter the derived aggregate):
... ... ...
SELECT
*,
1.0 * "ads_insights.total_cost" / NULLIF("ads_insights.total_clicks", 0) "ads_insights.cpc"
FROM
"result" "result"
WHERE
1.0 * "ads_insights.total_cost" / NULLIF("ads_insights.total_clicks", 0) is not NULL
ORDER BY
"ads_insights.date_start" ASC
LIMIT
50;