Data-Driven Marketing: Wie die kundenzentrierte Ansprache in Zukunft gelingt, Teil 3

Neukunden erfolgreich binden: Analyse des Verhalten nach dem Erstkauf

Zur Blog-Serie https://www.shi-gmbh.com/herausforderung-customer-centricity/

Inhalt:

Grundlegendes Verhalten der Wiederkäufer

1. Wie viele Besuche mit Kauf macht ein Kunde im Durchschnitt?

2. Wie viel Zeit vergeht zwischen zwei Transaktionen eines Wiederkäufers?

3. "Churn"- oder Abwanderungsrate

Verhalten nach dem Erstkauf - Kohortenanalyse

1. Über welchen Kanal kommen meine Kunden zurück?

2. Wann werden Erstkäufer wieder aktiv?

3. Welche Produkte bewegen zum Zweitkauf?

Merkmale von Wiederkäufer

1. Used Device

2. Browser

3. Herkunftsland

In [2]:
import findspark
findspark.init('/usr/local/Cellar/apache-spark/2.3.2/libexec/')

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.window import Window
In [3]:
import plotly.plotly as py
import plotly.graph_objs as go
import pandas as pd
import requests
from plotly.offline import iplot, init_notebook_mode
from plotly import tools
init_notebook_mode()
In [4]:
sc = SparkSession.builder.appName("TestGA").getOrCreate()

ga_data = sc.read.json('/Users/eduardoschumann/Documents/Projekte/E-CommerceAnalytics/ga_data/ga_sessions_*.json.gz')

Grundlegendes Verhalten der Wiederkäufer

1. Wie viele Besuche mit Kauf macht ein Kunde im Durchschnitt?

Wir generieren zunächst eine Sicht der Daten, der alle Besuche mit einer Transaktion (Kauf) darstellt und mitteln dann

In [5]:
transactions = ga_data.select('fullVisitorId', to_date("date", "yyyyMMdd").alias("pdate"), expr("totals.transactions").cast('integer')).fillna(0).filter("transactions > 0")

transactions.groupBy('fullVisitorId').count().agg(avg(col("count"))).show()
+------------------+
|        avg(count)|
+------------------+
|1.1526641388944323|
+------------------+

2. Wie viel Zeit vergeht zwischen zwei Transaktionen eines Wiederkäufers?

Hierfür nutzen wir eine Window-Funktion, um alle Besuche eines Nutzers zu betrachten und die Anzahl der Tage zu berechnen, die zwischen zwei aufeinanderfolgenden Käufe vergehen. Dafür setzen wir die Funktion lag ein.

In [6]:
timeintertrans = transactions.withColumn("time_intertrans",datediff(transactions.pdate, lag(transactions.pdate, 1).over(Window.partitionBy("fullVisitorId").orderBy("pdate")))).fillna(0).filter("time_intertrans > 0")
timeintertrans.show()
+-------------------+----------+------------+---------------+
|      fullVisitorId|     pdate|transactions|time_intertrans|
+-------------------+----------+------------+---------------+
|6332431001963915083|2016-08-15|           1|             14|
|8851822767477687842|2016-09-07|           1|              6|
|8851822767477687842|2016-09-27|           1|             20|
|8851822767477687842|2016-09-30|           1|              3|
|2086918960702630134|2017-04-24|           1|             14|
|2864376493455301677|2017-04-08|           1|            155|
|2864376493455301677|2017-05-02|           1|             24|
| 419161784882208907|2017-03-14|           1|             88|
|5149788969578895545|2017-02-26|           1|             61|
| 768998327849146700|2016-12-11|           1|              6|
|4286676770640078553|2016-12-02|           1|             21|
|9657742098396819155|2017-04-11|           1|             28|
|0608915197735218105|2017-01-10|           1|             62|
|0608915197735218105|2017-01-13|           1|              3|
|0608915197735218105|2017-01-26|           1|             13|
|0608915197735218105|2017-03-13|           1|             46|
|0608915197735218105|2017-03-15|           1|              2|
|0608915197735218105|2017-03-17|           1|              2|
|0608915197735218105|2017-03-31|           1|             14|
|0608915197735218105|2017-04-11|           1|             11|
+-------------------+----------+------------+---------------+
only showing top 20 rows

In [7]:
timeintertrans.agg(avg(col("time_intertrans"))).show()
+--------------------+
|avg(time_intertrans)|
+--------------------+
|   27.72128851540616|
+--------------------+

3. "Churn"- oder Abwanderungsrate

Die Abwanderungsrate gibt an, wie viele Kunden am Ende eines Zeitintervalls, bspw. der Vertraugslaufzeit oder eines Zeifensters, verloren gehen. Da bei dem Shop keine Verträge mit fester Laufzeit abgeschlossen werden, betrachten wir Kalendermonate für die Berechnung der Abwanderungsrate

Mathematische Definition:

Churn(Zeitintervall) = 1 - ( Kunden(Ende) / Kunden(Beginn) )

Kunden(Beginn): Anzahl der Kunden, die im lezten Zeitintervall was gekauft haben

Kunden(Ende): Anzahl der Kunden, die zu Kunden(Beginn) zählten und im aktuellen Zeitintervall was gekauft haben (keine reine Neukunden)

In [12]:
buyersInPeriod = transactions.withColumn("month", month("pdate")).withColumn("year", year("pdate")).select("fullVisitorId",  expr( "month + ((year - 2016)*12)").alias("period")).distinct().withColumn("previousPeriod", expr("period  - 1"))
buyersInPeriod.show()
+-------------------+------+--------------+
|      fullVisitorId|period|previousPeriod|
+-------------------+------+--------------+
|4061971552460255356|     8|             7|
|0155890066027533778|     8|             7|
|6236875467673822779|     8|             7|
|5241810559534801001|     8|             7|
|8885298272066714457|     9|             8|
|3924372865099736100|     8|             7|
|7180222975879135190|     8|             7|
| 627462873100311863|     8|             7|
| 051257374560700342|     8|             7|
|0097216552247524030|     9|             8|
|2174401229555413552|     9|             8|
| 589967396396729491|     8|             7|
|4499382041484859324|     8|             7|
|0363878417378060116|    19|            18|
|8719867302995619824|    19|            18|
|2522954230696043073|    19|            18|
|4180896449051062828|    11|            10|
|2981325201816358942|    19|            18|
|2377126524987232945|    19|            18|
|4930078129398561333|    12|            11|
+-------------------+------+--------------+
only showing top 20 rows

In [13]:
totBuyersPeriod = buyersInPeriod.groupBy("period").count().orderBy("period")
totBuyersPeriod.show()
+------+-----+
|period|count|
+------+-----+
|     8| 1046|
|     9|  793|
|    10|  818|
|    11|  864|
|    12| 1318|
|    13|  662|
|    14|  666|
|    15|  809|
|    16|  870|
|    17| 1055|
|    18|  878|
|    19|  964|
|    20|   42|
+------+-----+

In [14]:
current = buyersInPeriod.select("fullVisitorId", expr("previousPeriod").alias("current"))
previous = buyersInPeriod.select("fullVisitorId", expr("period").alias("current"))

comparePeriods = previous.join(current, ["fullVisitorId", "current"])
comparePeriods.show()
+-------------------+-------+
|      fullVisitorId|current|
+-------------------+-------+
|7184569784224376022|      9|
|2969957690756250633|     15|
|7284466025557220497|     13|
|2402527199731150932|     16|
|2193251253036865639|     11|
| 803888563485194008|     12|
|5954999562906219668|      8|
|0608915197735218105|     16|
|4116720743881459837|     15|
|2252819581596939503|      8|
|1956307607572137989|     11|
|3891707253957252733|     19|
|1160204963389257110|     18|
|8197879643797712877|     12|
|0014262055593378383|     18|
|6820200829240883942|     17|
|9952616174324085427|     10|
|2402527199731150932|     12|
|3857043812510146001|     17|
|9029794295932939024|     13|
+-------------------+-------+
only showing top 20 rows

In [15]:
totRemainingBuyersPeriod = comparePeriods.groupBy("current").count().orderBy("current").withColumnRenamed("current","period").withColumnRenamed("count","remaining")
totRemainingBuyersPeriod.show()
+------+---------+
|period|remaining|
+------+---------+
|     8|       46|
|     9|       34|
|    10|       48|
|    11|       52|
|    12|       44|
|    13|       29|
|    14|       45|
|    15|       51|
|    16|       54|
|    17|       40|
|    18|       45|
|    19|        6|
+------+---------+

In [16]:
totBuyersPeriod.join(totRemainingBuyersPeriod, "period").select(expr("period").alias("afterPeriod"), expr("1 - (remaining / count)").alias("churnRate")).show()
+-----------+------------------+
|afterPeriod|         churnRate|
+-----------+------------------+
|          8|0.9560229445506692|
|          9| 0.957124842370744|
|         10| 0.941320293398533|
|         11|0.9398148148148148|
|         12|0.9666160849772383|
|         13|0.9561933534743202|
|         14|0.9324324324324325|
|         15|0.9369592088998764|
|         16|0.9379310344827586|
|         17|0.9620853080568721|
|         18|  0.94874715261959|
|         19|0.9937759336099585|
+-----------+------------------+

Verhalten nach dem Erstkauf - Kohortenanalyse

1. Über welchen Kanal kommen meine Kunden zurück?

In [18]:
transactionsInfo = ga_data.select('fullVisitorId','channelGrouping','visitId', 'visitNumber', to_date("date", "yyyyMMdd").alias("pdate"), expr("totals.transactions").cast('integer')).fillna(0).filter("transactions > 0")
transactionsInfo.show()
+-------------------+---------------+----------+-----------+----------+------------+
|      fullVisitorId|channelGrouping|   visitId|visitNumber|     pdate|transactions|
+-------------------+---------------+----------+-----------+----------+------------+
|2324511444137196326| Organic Search|1472681213|          1|2016-08-31|           1|
|6810807631474905667|         Direct|1472682771|          1|2016-08-31|           1|
|0973104829844612521| Organic Search|1472670538|          1|2016-08-31|           1|
|3797990038447668987| Organic Search|1472693370|          1|2016-08-31|           1|
|1704269846042149545|       Referral|1472681704|          1|2016-08-31|           1|
|1124125006906712847|         Direct|1472659016|          3|2016-08-31|           1|
| 330289116549575054|       Referral|1472669071|         92|2016-08-31|           1|
|9972043774359472649|       Referral|1472627689|          3|2016-08-31|           1|
|3906774598490786925|       Referral|1472670219|          4|2016-08-31|           1|
|3616014667867180410|         Direct|1472665638|          2|2016-08-31|           1|
|8445777031468826793| Organic Search|1472668347|          2|2016-08-31|           1|
|4332670999936880007| Organic Search|1472652183|          3|2016-08-31|           1|
|0460646718150298450|       Referral|1472649010|          3|2016-08-31|           1|
|3028610092281426822|       Referral|1472684946|          3|2016-08-31|           1|
|1235024616645642362|       Referral|1472674007|          1|2016-08-31|           1|
|4785716484578798827|       Referral|1472658354|          4|2016-08-31|           1|
|8961785775115413973|       Referral|1472663016|          2|2016-08-31|           1|
|2070184068182824137|       Referral|1472674189|          1|2016-08-31|           1|
|2022590988318962074|       Referral|1472707485|          2|2016-08-31|           3|
|   5791622267997002|        Display|1472681820|          2|2016-08-31|           1|
+-------------------+---------------+----------+-----------+----------+------------+
only showing top 20 rows

In [46]:
userTransactionWindow = Window.partitionBy(transactionsInfo['fullVisitorId']).orderBy(transactionsInfo['pdate'].asc(),transactionsInfo['visitNumber'].asc())

transactionOrder = row_number().over(userTransactionWindow) 

orderedTransactions = transactionsInfo.select('fullVisitorId', 'channelGrouping', 'pdate', 'visitId', transactionOrder.alias('rank'))
orderedTransactions.show()
+-------------------+---------------+----------+----------+----+
|      fullVisitorId|channelGrouping|     pdate|   visitId|rank|
+-------------------+---------------+----------+----------+----+
|0205723721674321991|       Referral|2017-04-28|1493433632|   1|
|0535797652559788641| Organic Search|2017-07-21|1500695352|   1|
|0829523931932991940|         Direct|2016-11-18|1479505183|   1|
| 085830972238265715| Organic Search|2017-04-12|1492013037|   1|
| 192269026404761198|         Direct|2017-02-14|1487104486|   1|
|1944533426362788043|         Direct|2016-09-21|1474514449|   1|
|2266837403735683845| Organic Search|2017-07-26|1501136533|   1|
|3090631969460248576|       Referral|2016-10-27|1477592443|   1|
|3213840074316400693|       Referral|2016-08-01|1470057237|   1|
|3220580369738167789|       Referral|2016-12-20|1482254794|   1|
|3575773830913742237| Organic Search|2017-07-08|1499575490|   1|
|3745754200111893080|       Referral|2016-08-09|1470754745|   1|
|4100254762425335988|       Referral|2017-03-10|1489160019|   1|
|4100254762425335988|       Referral|2017-03-10|1489164958|   2|
|4132503829043338043|       Referral|2016-08-15|1471281754|   1|
|4186017304980916856|       Referral|2016-12-11|1481482661|   1|
|4389802785459743912|       Referral|2016-10-02|1475412093|   1|
| 439449392549956950|       Referral|2016-09-20|1474407251|   1|
| 449754892287493650|         Direct|2017-05-22|1495461812|   1|
|4625952167494304032|         Social|2017-04-28|1493419483|   1|
+-------------------+---------------+----------+----------+----+
only showing top 20 rows

In [21]:
firstChannel = orderedTransactions.select('fullVisitorId','channelGrouping').where("rank = 1")

firstChannel.groupBy('channelGrouping').count().show()
+---------------+-----+
|channelGrouping|count|
+---------------+-----+
|        Display|  113|
|     Affiliates|    9|
|         Social|   98|
|         Direct| 1715|
|       Referral| 4555|
|        (Other)|    1|
|    Paid Search|  418|
| Organic Search| 3113|
+---------------+-----+

In [23]:
secondChannel = orderedTransactions.select('fullVisitorId','channelGrouping').where("rank > 1").groupBy('fullVisitorId','channelGrouping').count().withColumnRenamed('channelGrouping','followChannel')

secondChannel.groupBy('followChannel').count().show()
+--------------+-----+
| followChannel|count|
+--------------+-----+
|       Display|   15|
|        Social|    6|
|        Direct|  157|
|      Referral|  557|
|   Paid Search|   42|
|Organic Search|  225|
+--------------+-----+

In [25]:
channelsByUser = firstChannel.join(secondChannel, 'fullVisitorId')
channelsByUser.show()
+-------------------+---------------+--------------+-----+
|      fullVisitorId|channelGrouping| followChannel|count|
+-------------------+---------------+--------------+-----+
|4100254762425335988|       Referral|      Referral|    1|
|6332431001963915083|       Referral|      Referral|    1|
|8851822767477687842|       Referral|      Referral|    5|
|2086918960702630134|       Referral|      Referral|    1|
|2864376493455301677|         Direct|Organic Search|    2|
| 419161784882208907|         Direct|        Direct|    1|
|5149788969578895545|       Referral|      Referral|    1|
| 768998327849146700|       Referral|      Referral|    1|
|7838650270550176403|    Paid Search|   Paid Search|    1|
|4286676770640078553|         Direct|Organic Search|    1|
|9657742098396819155|       Referral|      Referral|    1|
|0608915197735218105|       Referral|      Referral|   12|
|1285137808569536764|         Direct|        Direct|    5|
|2295240953686921101|       Referral|      Referral|    1|
|3345163376310988493|         Direct|        Direct|    3|
| 387849079599046538|       Referral|      Referral|    1|
|5492367789301420845|       Referral|      Referral|    1|
|9890182105631590481|       Referral|      Referral|    2|
|0342282266188517272|       Referral|      Referral|    1|
| 280738376597848400|       Referral|      Referral|    2|
+-------------------+---------------+--------------+-----+
only showing top 20 rows

In [27]:
channelMatrix = channelsByUser.groupBy('channelGrouping','followChannel').agg(sum('count').alias('refs')).orderBy('channelGrouping','followChannel')
channelMatrix.show()
+---------------+--------------+----+
|channelGrouping| followChannel|refs|
+---------------+--------------+----+
|         Direct|        Direct| 334|
|         Direct|       Display|  17|
|         Direct|Organic Search|  26|
|         Direct|   Paid Search|  10|
|         Direct|      Referral|  27|
|         Direct|        Social|   1|
|        Display|        Direct|   1|
|        Display|       Display|   7|
|        Display|Organic Search|   6|
|        Display|   Paid Search|   1|
|        Display|      Referral|  10|
| Organic Search|        Direct|   6|
| Organic Search|       Display|   4|
| Organic Search|Organic Search| 267|
| Organic Search|   Paid Search|  17|
| Organic Search|      Referral|  12|
| Organic Search|        Social|   1|
|    Paid Search|       Display|   1|
|    Paid Search|Organic Search|  13|
|    Paid Search|   Paid Search|  19|
+---------------+--------------+----+
only showing top 20 rows

In [29]:
inChannelW = Window.partitionBy(channelMatrix['channelGrouping'])
tottr = (channelMatrix['refs'] / sum('refs').over(inChannelW))*100

kohChannel = channelMatrix.select('channelGrouping','followChannel', 'refs', tottr.alias('quote')).toPandas()
kohChannel 
Out[29]:
channelGrouping followChannel refs quote
0 Display Direct 1 4.000000
1 Display Display 7 28.000000
2 Display Organic Search 6 24.000000
3 Display Paid Search 1 4.000000
4 Display Referral 10 40.000000
5 Social Social 4 100.000000
6 Direct Direct 334 80.481928
7 Direct Display 17 4.096386
8 Direct Organic Search 26 6.265060
9 Direct Paid Search 10 2.409639
10 Direct Referral 27 6.506024
11 Direct Social 1 0.240964
12 Referral Direct 5 0.671141
13 Referral Display 1 0.134228
14 Referral Organic Search 18 2.416107
15 Referral Paid Search 4 0.536913
16 Referral Referral 717 96.241611
17 Paid Search Display 1 2.941176
18 Paid Search Organic Search 13 38.235294
19 Paid Search Paid Search 19 55.882353
20 Paid Search Referral 1 2.941176
21 Organic Search Direct 6 1.954397
22 Organic Search Display 4 1.302932
23 Organic Search Organic Search 267 86.970684
24 Organic Search Paid Search 17 5.537459
25 Organic Search Referral 12 3.908795
26 Organic Search Social 1 0.325733
In [30]:
# X-Achse stellt den Reengaging-Channel
fig = go.Figure(data=[go.Heatmap(
                   z=kohChannel['quote'],
                   x=kohChannel['followChannel'],
                   y=kohChannel['channelGrouping'])])

iplot(fig)

2. Wann werden Erstkäufer wieder aktiv?

In [37]:
firstMonth = orderedTransactions.withColumn("month", month("pdate")).withColumn("year", year("pdate")).select('fullVisitorId',expr( "month + ((year - 2016)*12)").alias("period")).where("rank = 1")

followMonth = orderedTransactions.withColumn("month", month("pdate")).withColumn("year", year("pdate")).select('fullVisitorId',expr( "month + ((year - 2016)*12)").alias("follow_period")).where("rank > 1").groupBy('fullVisitorId',"follow_period").count()
In [38]:
joinedm = firstMonth.join(followMonth, 'fullVisitorId')

monthMatrix = joinedm.groupBy('period','follow_period').agg(sum('count').alias('refs')).orderBy('period','follow_period')
In [40]:
monthW =Window.partitionBy(monthMatrix['period'])
totm = (monthMatrix['refs'] / sum('refs').over(monthW))*100

kohMonat = monthMatrix.select('period','follow_period', 'refs', totm.alias('quote')).toPandas()
kohMonat
Out[40]:
period follow_period refs quote
0 12 12 68 47.552448
1 12 13 24 16.783217
2 12 14 18 12.587413
3 12 15 18 12.587413
4 12 16 3 2.097902
5 12 17 6 4.195804
6 12 18 3 2.097902
7 12 19 3 2.097902
8 13 13 21 35.593220
9 13 14 17 28.813559
10 13 15 13 22.033898
11 13 16 2 3.389831
12 13 17 4 6.779661
13 13 18 2 3.389831
14 16 16 41 41.000000
15 16 17 44 44.000000
16 16 18 12 12.000000
17 16 19 3 3.000000
18 20 20 1 100.000000
19 19 19 60 90.909091
20 19 20 6 9.090909
21 15 15 38 36.893204
22 15 16 36 34.951456
23 15 17 11 10.679612
24 15 18 15 14.563107
25 15 19 3 2.912621
26 9 9 58 50.434783
27 9 10 34 29.565217
28 9 11 8 6.956522
29 9 12 7 6.086957
... ... ... ... ...
46 8 17 2 0.847458
47 8 18 2 0.847458
48 10 10 43 20.283019
49 10 11 52 24.528302
50 10 12 33 15.566038
51 10 13 19 8.962264
52 10 14 18 8.490566
53 10 15 18 8.490566
54 10 16 12 5.660377
55 10 17 10 4.716981
56 10 18 5 2.358491
57 10 19 2 0.943396
58 11 11 36 18.947368
59 11 12 43 22.631579
60 11 13 34 17.894737
61 11 14 13 6.842105
62 11 15 31 16.315789
63 11 16 12 6.315789
64 11 17 13 6.842105
65 11 18 2 1.052632
66 11 19 5 2.631579
67 11 20 1 0.526316
68 14 14 33 26.612903
69 14 15 33 26.612903
70 14 16 24 19.354839
71 14 17 14 11.290323
72 14 18 13 10.483871
73 14 19 7 5.645161
74 18 18 47 53.409091
75 18 19 41 46.590909

76 rows × 4 columns

In [41]:
iplot(go.Figure(data=[go.Heatmap(
                   z=kohMonat['quote'],
                   x=kohMonat['follow_period'],
                   y=kohMonat['period'])]))

3. Welche Produkte bewegen zum Zweitkauf?

In [43]:
transactionHits = ga_data.select('fullVisitorId', 'visitId',expr("totals.transactions").cast('integer'), explode("hits").alias("hit"), expr("hit.product"), expr("hit.transaction.transactionRevenue")).fillna(0).filter("transactions > 0 AND transactionRevenue > 0")
In [44]:
productVisit = transactionHits.select('visitId', explode("product").alias('productext'), expr("productext.v2ProductName"))
In [48]:
firstProds = orderedTransactions.select('visitId').where("rank == 1").join(productVisit, 'visitId').groupBy('v2ProductName').count().orderBy(expr('count').desc())
firstProds.show(truncate=False)
+----------------------------------------------------+-----+
|v2ProductName                                       |count|
+----------------------------------------------------+-----+
|Google Sunglasses                                   |1080 |
|Google Laptop and Cell Phone Stickers               |681  |
|Google Men's 100% Cotton Short Sleeve Hero Tee White|631  |
|Google 22 oz Water Bottle                           |610  |
|Google Men's 100% Cotton Short Sleeve Hero Tee Black|563  |
|Google Men's 100% Cotton Short Sleeve Hero Tee Navy |485  |
|Google Men's  Zip Hoodie                            |391  |
|BLM Sweatshirt                                      |377  |
|Engraved Ceramic Google Mug                         |336  |
|26 oz Double Wall Insulated Bottle                  |333  |
|Google Twill Cap                                    |327  |
|Google Men's Vintage Badge Tee Black                |322  |
|YouTube Custom Decals                               |310  |
|Red Shine 15 oz Mug                                 |307  |
|Google Men's 100% Cotton Short Sleeve Hero Tee Red  |296  |
|Recycled Paper Journal Set                          |282  |
|Recycled Mouse Pad                                  |281  |
|Google Men's Bike Short Sleeve Tee Charcoal         |251  |
|Foam Can and Bottle Cooler                          |248  |
|Google Women's Short Sleeve Hero Tee White          |241  |
+----------------------------------------------------+-----+
only showing top 20 rows

In [50]:
followProds = orderedTransactions.select('visitId').where("rank > 1").join(productVisit, 'visitId').groupBy('v2ProductName').count().orderBy(expr('count').desc())
followProds.show(truncate=False)
+----------------------------------------------------+-----+
|v2ProductName                                       |count|
+----------------------------------------------------+-----+
|Google Sunglasses                                   |264  |
|Google 22 oz Water Bottle                           |165  |
|Recycled Paper Journal Set                          |147  |
|Google Men's 100% Cotton Short Sleeve Hero Tee Black|139  |
|Google Men's 100% Cotton Short Sleeve Hero Tee White|114  |
|Google Laptop and Cell Phone Stickers               |100  |
|Maze Pen                                            |86   |
|Windup Android                                      |78   |
|Google Men's 100% Cotton Short Sleeve Hero Tee Navy |73   |
|1 oz Hand Sanitizer                                 |69   |
|Sport Bag                                           |67   |
|Google Men's 100% Cotton Short Sleeve Hero Tee Red  |66   |
|Google Twill Cap                                    |65   |
|Google Men's  Zip Hoodie                            |64   |
|Badge Holder                                        |63   |
|Google Men's Vintage Badge Tee Black                |63   |
|Recycled Mouse Pad                                  |62   |
|Engraved Ceramic Google Mug                         |62   |
|Google Kick Ball                                    |62   |
|Foam Can and Bottle Cooler                          |58   |
+----------------------------------------------------+-----+
only showing top 20 rows

Merkmale von Wiederkäufer

In [51]:
transactionFeatures = ga_data.select('fullVisitorId',"device","geoNetwork","socialEngagementType", "trafficSource", 'visitId', 'visitNumber', to_date("date", "yyyyMMdd").alias("pdate"), expr("totals.transactions").cast('integer')).fillna(0).filter("transactions > 0")

transactionOrder = row_number().over(Window.partitionBy(transactionFeatures['fullVisitorId']).orderBy(transactionFeatures['pdate'].asc(),transactionFeatures['visitNumber'].asc()))
transactionFeatures = transactionFeatures.withColumn("rank", transactionOrder)

1. Used Device

In [52]:
firstDev = transactionFeatures.where("rank = 1").groupBy("device.deviceCategory").count().orderBy(desc("count")).toPandas()
iplot(go.Figure(layout=dict(title='Erstkäufer: Geräteart') , data= [go.Pie(labels=firstDev["deviceCategory"], values=firstDev["count"])]))
In [53]:
secondDev = transactionFeatures.where("rank > 1").groupBy("device.deviceCategory").count().orderBy(desc("count")).toPandas()
iplot(go.Figure(layout=dict(title='Zweitkäufer: Geräteart') , data= [go.Pie(labels=secondDev["deviceCategory"], values=secondDev["count"])]))

2. Browser

In [54]:
frstBrowser = transactionFeatures.where("rank = 1").groupBy("device.browser").count().orderBy(desc("count")).toPandas()
iplot(go.Figure(layout=dict(title='Erstkäufer: Browser') , data= [go.Pie(labels=frstBrowser["browser"], values=frstBrowser["count"])]))
In [55]:
ndBrowser = transactionFeatures.where("rank > 1").groupBy("device.browser").count().orderBy(desc("count")).toPandas()
iplot(go.Figure(layout=dict(title='Zweitkäufer: Browser') , data= [go.Pie(labels=ndBrowser["browser"], values=ndBrowser["count"])]))

3. Herkunftsland

In [57]:
transactionFeatures.where("rank == 1").groupBy("geoNetwork.country").count().orderBy(desc("count")).show()
+--------------+-----+
|       country|count|
+--------------+-----+
| United States| 9523|
|        Canada|  173|
|     Venezuela|   40|
|        Mexico|   19|
|         Japan|   17|
|        Taiwan|   15|
|United Kingdom|   15|
|     Australia|   12|
|   Puerto Rico|   11|
|         India|   10|
|     Singapore|   10|
|        Brazil|   10|
|     Hong Kong|    9|
|     Indonesia|    9|
|   Switzerland|    9|
|   South Korea|    8|
|       Germany|    7|
|      Colombia|    7|
|         China|    6|
|       Ireland|    6|
+--------------+-----+
only showing top 20 rows

In [56]:
transactionFeatures.where("rank > 1").groupBy("geoNetwork.country").count().orderBy(desc("count")).show()
+--------------+-----+
|       country|count|
+--------------+-----+
| United States| 1466|
|     Venezuela|   23|
|        Canada|   18|
|        Taiwan|    4|
|     Indonesia|    2|
|       Belgium|    2|
|     (not set)|    2|
|       Germany|    1|
|     Argentina|    1|
|       Nigeria|    1|
|      Thailand|    1|
|     Nicaragua|    1|
|        Mexico|    1|
|         Italy|    1|
|     Australia|    1|
|      Colombia|    1|
|        Israel|    1|
|       Georgia|    1|
|         Kenya|    1|
|United Kingdom|    1|
+--------------+-----+