RFM.rst 19.2 KB
Newer Older
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
1 2 3 4
.. _rfm:


.. |catalogicon| image:: ../../images/CatalogIcon.png
5
 :target: `RFM Sample Aginity Catalog Assets`_
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65


Lesson #5: Learn to Create Basic Recency, Frequency and Monetary Segments
**************************************************************************

RFM (Recency, Frequency, Monetary) analysis is a simple, easy to comprehend, highly used marketing model for behavior based customer segmentation. It groups customers based on their transaction history – how recently did they transact, how often did they transact and how much did they purchase.

RFM models helps divide customers into various categories or clusters to identify customers who are more likely to respond to promotions and also for future personalization services.


.. seealso::

  As with all of our lessons if you want to catalog this code in `Aginity Pro <https://www.aginity.com/products/aginity-pro/>`_ or `Aginity Team <https://www.aginity.com/products/aginity-team/>`_  click the |catalogicon| to skip right to the queries to add to your catalog.


Step 1 - The Required Data
===========================

For the sake of this example we want to make it simple but let you expand it against any data you have available to you.

We are going to mock up some data using a basic SQL **UNION ALL** command.  We will use two different tables which are very typical in RFM modeling, **TRANSACTION_HEADER** and **TRANSACTION_DETAIL**.

Here is the structure of each table.


TRANSACTION_HEADER
++++++++++++++++++++

  +------------------+---------------------------------+----------------------+
  | Column name      | Column Description              | Data Type            |
  |                  |                                 |                      |
  +==================+=================================+======================+
  | customer_id      | This is a unique identifier of  | Integer              |
  |                  | a customer that purchased.      |                      |
  +------------------+---------------------------------+----------------------+
  | transaction_id   | A unique identifier of a        | Integer              |
  |                  | transaction.                    |                      |
  +------------------+---------------------------------+----------------------+
  | transaction_date | The date on which the           | Date                 |
  |                  | transaction occurred.           |                      |
  +------------------+---------------------------------+----------------------+


TRANSACTION_DETAIL
+++++++++++++++++++++

  +------------------+---------------------------------+----------------------+
  | Column name      | Column Description              | Data Type            |
  |                  |                                 |                      |
  +==================+=================================+======================+
  | transaction_id   | A unique identifier of a        | Integer              |
  |                  | transaction. FK to TRANSACTION  |                      |
  |                  | HEADER.                         |                      |
  +------------------+---------------------------------+----------------------+
  | quantity         | The quantity of items purchased.| Integer              |
  +------------------+---------------------------------+----------------------+
  | net_amount       | The total amount of items       | Decimal(14,3)        |
  |                  | purchased.                      |                      |
  +------------------+---------------------------------+----------------------+

66 67
Sample RFM Data Scripts
+++++++++++++++++++++++++
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222

The following SQL Scripts are portable and used to simulate the RFM model.

**TRANSACTION_HEADER**

.. code-block:: sql
  :linenos:

    -- transaction_header data

    select cast(123456 as integer) as customer_id, cast(11111 as integer) as transaction_id, cast('2019-01-01' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11112 as integer) as transaction_id, cast('2019-01-04' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11113 as integer) as transaction_id, cast('2019-01-07' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11114 as integer) as transaction_id, cast('2019-01-10' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11116 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11117 as integer) as transaction_id, cast('2019-01-20' as date) as transaction_date
    union all
    select cast(123456 as integer) as customer_id, cast(11118 as integer) as transaction_id, cast('2019-01-27' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21116 as integer) as transaction_id, cast('2019-01-15' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21117 as integer) as transaction_id, cast('2019-01-16' as date) as transaction_date
    union all
    select cast(234567 as integer) as customer_id, cast(21118 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date



**TRANSACTION_DETAIL**

.. code-block:: sql
  :linenos:

        --transaction_detail data

        select cast(11111 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11112 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11112 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.25 as decimal(13,2)) as net_amount
        union all
        select cast(11113 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11112 as integer) as transaction_id, cast(15 as integer) as quantity, cast(1.10 as decimal(13,2)) as net_amount
        union all
        select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(25.34 as decimal(13,2)) as net_amount
        union all
        select cast(11114 as integer) as transaction_id, cast(2 as integer) as quantity, cast(14.32 as decimal(13,2)) as net_amount
        union all
        select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.10 as decimal(13,2)) as net_amount
        union all
        select cast(11115 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11116 as integer) as transaction_id, cast(1 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
        union all
        select cast(11117 as integer) as transaction_id, cast(2 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
        union all
        select cast(11118 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(21115 as integer) as transaction_id, cast(14 as integer) as quantity, cast(4.10 as decimal(13,2)) as net_amount
        union all
        select cast(21116 as integer) as transaction_id, cast(16 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
        union all
        select cast(21117 as integer) as transaction_id, cast(4 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
        union all
        select cast(21118 as integer) as transaction_id, cast(1 as integer) as quantity, cast(43.10 as decimal(13,2)) as net_amount

Step 2 - The RFM Query
===========================

  We are using a **WITH** statement to collapse two passes into one required SQL statement.  Let's start with the statement itself that returns the aggregated analytics and the segment each customer falls in.

.. code-block:: postgresql
  :linenos:

      with customer_metrics as
      (
      select
       th.customer_id,
       count(distinct th.transaction_id) as trips_per_period,
       sum(td.quantity * td.net_amount) as total_spend_per_period,
       datediff(DAY,current_date, max(th.transaction_date)) AS days_since_last_transaction
      from  TRANSACTION_HEADER th
      join  TRANSACTION_DETAIL td on th.transaction_id = td.transaction_id
      where th.transaction_date > dateadd(day, cast ($lookback_days as integer)*-1,current_date)
      AND td.quantity > 0             -- returns ignored
      group by th.customer_id),
      rfm as
      (
      select customer_id,
        ntile($buckets) over (order by days_since_last_transaction desc) as r,
        ntile($buckets) over (order by trips_per_period desc) as f,
        ntile($buckets) over (order by total_spend_per_period desc) as m,
        trips_per_period,
        total_spend_per_period,
        days_since_last_transaction
      from customer_metrics
      )
      select customer_id
           , r
           , f
           , m
           , trips_per_period
           , total_spend_per_period
           , days_since_last_transaction
           , ntile($buckets) over (order by 1.0*r+1.0*f+1.0*m)
      from rfm;  -- weights on final calculation - default to 1.0

Breakdown of the SQL
+++++++++++++++++++++

    There are two queries embedded in the **WITH** statement:  **customer_metrics** and **rfm**.

    The **customer_metrics** query will aggregate:

     #. The trips_per_period by counting distinct transaction id's (F or Frequency).
     #. The total_spend_per_period by summing net_sales and the quantity sold (M or Monetary).
     #. The days_since_last_transaction by finding the difference between the current date and the last purchase date (R or Recency).

     In this query we have two parameters:  $lookback_days which tells you how long from current date do you want to segment customers purchases by and $buckets which signifies the number of segments you want the query to return.

     The **rfm** query then uses the windowing function, ntile, which will take the ordered data from the **customer_metrics** query and segment them into equal size (number of rows per group).

     The final query as shown below brings together all the information from the **WITH** queries and displays it along with a final ntile of the RFM calculation.

  .. code-block:: postgresql
    :linenos:

        select customer_id
              , r
              , f
              , m
              , trips_per_period
              , total_spend_per_period
              , days_since_last_transaction
              , ntile($buckets) over (order by 1.0*r+1.0*f+1.0*m)
         from rfm;  -- weights on final calculation - default to 1.0

 .. note::

       The 1.0 you see in the query above represents equal weight to the R, F and M calculation.  In some cases an organization may want to weight each measure differently to perform the final segmentation.  For instance, you may choose to apply a weight of 2 to monetary and 1 and 1 to frequency and recency.


       Ignore this answer, it does not work: Better use the answer from Louis

       For anchor, you may define "short" anchor names like this:

223
RFM Sample Aginity Catalog Assets
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
224 225
======================================

226
There are six assets you can add to your catalog.  I chose to add them as shown below.
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
227 228 229

.. image:: ../../images/RFMCatalog.png

230
These queries are written using ANSII standard SQL so should work across most database platforms.  Just select a connection in the Pro/Team Editor and either double click the catalog item and execute or drag and drop the catalog item which will expose the code and run them.
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
231

232
DATA-transaction_header
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261
+++++++++++++++++++++++++
  .. code-block:: postgresql
    :linenos:


        (select cast(123456 as integer) as customer_id, cast(11111 as integer) as transaction_id, cast('2019-01-01' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11112 as integer) as transaction_id, cast('2019-01-04' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11113 as integer) as transaction_id, cast('2019-01-07' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11114 as integer) as transaction_id, cast('2019-01-10' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11116 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11117 as integer) as transaction_id, cast('2019-01-20' as date) as transaction_date
        union all
        select cast(123456 as integer) as customer_id, cast(11118 as integer) as transaction_id, cast('2019-01-27' as date) as transaction_date
        union all
        select cast(234567 as integer) as customer_id, cast(21115 as integer) as transaction_id, cast('2019-01-14' as date) as transaction_date
        union all
        select cast(234567 as integer) as customer_id, cast(21116 as integer) as transaction_id, cast('2019-01-15' as date) as transaction_date
        union all
        select cast(234567 as integer) as customer_id, cast(21117 as integer) as transaction_id, cast('2019-01-16' as date) as transaction_date
        union all
        select cast(234567 as integer) as customer_id, cast(21118 as integer) as transaction_id, cast('2019-01-17' as date) as transaction_date)

262
DATA-transaction_detail
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299
+++++++++++++++++++++++++++
.. code-block:: postgresql
  :linenos:

        (select cast(11111 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11112 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11112 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.25 as decimal(13,2)) as net_amount
        union all
        select cast(11113 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11112 as integer) as transaction_id, cast(15 as integer) as quantity, cast(1.10 as decimal(13,2)) as net_amount
        union all
        select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(25.34 as decimal(13,2)) as net_amount
        union all
        select cast(11114 as integer) as transaction_id, cast(2 as integer) as quantity, cast(14.32 as decimal(13,2)) as net_amount
        union all
        select cast(11114 as integer) as transaction_id, cast(1 as integer) as quantity, cast(7.10 as decimal(13,2)) as net_amount
        union all
        select cast(11115 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(11116 as integer) as transaction_id, cast(1 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
        union all
        select cast(11117 as integer) as transaction_id, cast(2 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
        union all
        select cast(11118 as integer) as transaction_id, cast(3 as integer) as quantity, cast(3.10 as decimal(13,2)) as net_amount
        union all
        select cast(21115 as integer) as transaction_id, cast(14 as integer) as quantity, cast(4.10 as decimal(13,2)) as net_amount
        union all
        select cast(21116 as integer) as transaction_id, cast(16 as integer) as quantity, cast(8.10 as decimal(13,2)) as net_amount
        union all
        select cast(21117 as integer) as transaction_id, cast(4 as integer) as quantity, cast(23.10 as decimal(13,2)) as net_amount
        union all
        select cast(21118 as integer) as transaction_id, cast(1 as integer) as quantity, cast(43.10 as decimal(13,2)) as net_amount)


300 301
FML - Days Since Last Transaction (Recency)
+++++++++++++++++++++++++++++++++++++++++++++
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
302

303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335
  This asset is a reusable formula that calculates the days between execution run time (current date) and the maximum transaction date for each customer.

  .. code-block:: SQL
    :linenos:

    current_date - max(th.transaction_date)

FML - Total Spend in Period (Monetary)
++++++++++++++++++++++++++++++++++++++++

  This asset is a reusable formula that calculates the aggregation (sum) of net sales, defined as quantity multiplied by net_amount over the specified time period.

  .. code-block:: SQL
    :linenos:

    sum(td.quantity * td.net_amount)

FML - Trips per Period (Frequency)
+++++++++++++++++++++++++++++++++++++

  This asset is a reusable formula that counts the number of distinct transactions within the specified time period.

  .. code-block:: SQL
    :linenos:

      count(distinct th.transaction_id)

Sample - RFM Query
++++++++++++++++++++++++

  This asset uses the formulas above and then calculates the segementation using the windowed analytic function ntile.

.. code-block:: postresql
Jeffrey Schlitt's avatar
Jeffrey Schlitt committed
336 337
  :linenos:

338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370
        with customer_metrics as
        (
        select
        th.customer_id,
        @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/FML - Trips per Period (Frequency)} as trips_per_period,
        @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/FML - Total Spend in Period (Monetary)} as total_spend_per_period,
        @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/FML - Days Since Last Transaction (Recency)} AS days_since_last_transaction
        from  @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/DATA-transaction_header}
        join  (SELECT * FROM @{/Samples/Sample Data Science Queries - All Platforms/Recency-Frequency-Monetary Samples/DATA-transaction_detail}) td
        on th.transaction_id = td.transaction_id
        where td.quantity > 0             -- returns ignored
        --and th.transaction_date > dateadd(day, cast($lookback_days as integer)*-1,current_date)  -- consider the past 365 days - customization opportunity (1)
        group by th.customer_id),
        rfm as
        (
        select customer_id,
        ntile($buckets) over (order by days_since_last_transaction desc) as r,  -- split into 10 bins - customization opportunity (2)
        ntile($buckets) over (order by trips_per_period desc) as f,
        ntile($buckets) over (order by total_spend_per_period desc) as m,
        trips_per_period,
        total_spend_per_period,
        days_since_last_transaction
        from customer_metrics
        )
        select customer_id
           , r
           , f
           , m
           , trips_per_period
           , total_spend_per_period
           , days_since_last_transaction
           , ntile($buckets) over (order by 1.0*r+1.0*f+1.0*m)
        from rfm