{"id":2004,"date":"2022-09-16T08:22:14","date_gmt":"2022-09-16T06:22:14","guid":{"rendered":"https:\/\/thedatastory.nl\/?p=2004"},"modified":"2025-09-18T15:29:00","modified_gmt":"2025-09-18T13:29:00","slug":"clustering-users-based-on-interaction-categories","status":"publish","type":"post","link":"https:\/\/thedatastory.nl\/nl\/data-stories\/clustering-users-based-on-interaction-categories\/","title":{"rendered":"Clustering users based on interaction categories"},"content":{"rendered":"\n<p>Once we know a user\u2019s interests, it is much easier to target them personally, likely also more effectively. With this goal in mind we made an attempt to cluster users based on the categories of their interactions with content using Principal Component Analysis (PCA). In this blogpost we will guide you through a Jupyter Notebook. Check out <a href=\"https:\/\/github.com\/The-Data-Story\/Blog\/tree\/main\/Clustering\">our GitHub repository<\/a> to follow along or try it for yourself! The code is split up into five segments:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Data collection<\/li>\n\n\n\n<li>Data preparation (feature scaling)<\/li>\n\n\n\n<li>Choosing a number of components for PCA<\/li>\n\n\n\n<li>Performing PCA and visualising it<\/li>\n\n\n\n<li>Assigning each user to a cluster<\/li>\n<\/ol>\n\n\n\n<p><\/p>\n\n\n\n<p>We will shortly get into the specifics of PCA but, before we do so, let\u2019s first look at some use case examples for the clustering we are going to discuss. Let\u2019s say you run a blog and you assign categories to each article you write. Or imagine you have a webshop in which all products belong to categories. It would be interesting to find if there are any patterns in user interest. 
If a noteworthy number of users interested in category A is also interested in category D, we can recommend category D to users that have only interacted with category A and vice versa.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"2560\" height=\"1707\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-scaled.jpg\" alt=\"Mise en place\" class=\"wp-image-3723\" style=\"width:840px;height:398px\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-scaled.jpg 2560w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-300x200.jpg 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-1024x683.jpg 1024w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-768x512.jpg 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-1536x1024.jpg 1536w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-2048x1365.jpg 2048w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Sliced-Meat-Photo-18x12.jpg 18w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Source: UNL Food<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><em>Clustering in Machine Learning &#8211; Principal Component Analysis<\/em><\/strong><\/h2>\n\n\n\n<p>When we don\u2019t know a whole lot about data yet, we can use a Machine Learning technique called Clustering to learn more about it. Just as books and music have genres, we can group unlabelled data points into clusters that we might understand more easily. So-called clustering algorithms have different methods, but all have one thing in common: their aim is to maximise similarity between data points, forming groups with maximum similarity. 
There are different ways to measure similarity, such as the distance between two points. In a two-dimensional plot, for instance, two points that lie close together are highly similar.<\/p>\n\n\n\n<p>Too many ingredients in a recipe make it a lot more difficult to follow. The same goes for clustering algorithms. As the number of dimensions increases, it becomes increasingly complex to measure similarity and thus harder to perform clustering. To illustrate this, imagine drawing a line of best fit in two dimensions: fairly doable. Now imagine trying to do so in three dimensions: harder but still doable. Keep increasing the number of dimensions and you get the point. To combat this there are techniques to reduce the number of dimensions, often used prior to clustering algorithms like K-Means to make them more effective.<\/p>\n\n\n\n<p>One of these techniques is PCA. In PCA, principal components are constructed: linear combinations of the initial variables. The point of these components is that they are uncorrelated and that the first few contain most of the information from the initial variables. PCA tries to put as much of the information as possible into the first component, as much of the remaining information as possible into the second and so forth. More specifically, it represents each data vector as a linear combination of the eigenvectors of the data\u2019s covariance matrix, chosen so that the mean squared reconstruction error is minimised. To learn more about the mathematics behind PCA, take a look at the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Principal_component_analysis#:~:text=Principal%20component%20analysis%20(PCA)%20is,components%20and%20ignoring%20the%20rest.\">Wikipedia page<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><em>Collecting and preparing our data for PCA<\/em><\/strong><\/h2>\n\n\n\n<p>As shown below, our input table should have one row per user and one column per category. Its values should indicate how many times a user has interacted with a category. 
In the Jupyter Notebook you will find that you can import this table either from BigQuery or from a .csv file. To learn more about the credentials needed for importing through BigQuery, read the <a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/authentication\/service-account-file\">following article<\/a> in the BigQuery docs. Once you have created a DataFrame, you can start the preparation.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"918\" height=\"195\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Screenshot-2022-08-16-at-13.50.10.png\" alt=\"Clustering users\" class=\"wp-image-2018\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Screenshot-2022-08-16-at-13.50.10.png 918w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Screenshot-2022-08-16-at-13.50.10-300x64.png 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Screenshot-2022-08-16-at-13.50.10-768x163.png 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/Screenshot-2022-08-16-at-13.50.10-18x4.png 18w\" sizes=\"(max-width: 918px) 100vw, 918px\" \/><\/figure>\n\n\n\n<p>First, we will clean up the imported DataFrame using the methods in the Jupyter Notebook. We then perform feature scaling, also known as data normalisation. PCA, like many other machine learning algorithms, is very sensitive to the scale of its features: if one category has far larger values than the others, for example because of a few extreme interaction counts, the result will be dominated by that category, likely giving misleading results. Therefore, we normalise the range of all features to make sure they each contribute proportionately to the result. To learn more about feature scaling, take a look at the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Feature_scaling\">Wikipedia page<\/a>. 
The code and result of this preparation are shown below.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"402\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/2-screenshot-2022-08-23-at-10.45.46-1024x402.png\" alt=\"\" class=\"wp-image-2019\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/2-screenshot-2022-08-23-at-10.45.46-1024x402.png 1024w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/2-screenshot-2022-08-23-at-10.45.46-300x118.png 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/2-screenshot-2022-08-23-at-10.45.46-768x301.png 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/2-screenshot-2022-08-23-at-10.45.46-18x7.png 18w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/2-screenshot-2022-08-23-at-10.45.46.png 1292w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>We fetch the DataFrame from a CSV by calling csv_to_df() and passing it the filename, the value separator and a flag indicating whether the first row of the CSV contains the column names. We can then optionally remove columns using df_prep() and use scale() to perform feature scaling, passing it a list of column names to help us understand the results.<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><em>PCA<\/em><\/strong><\/h2>\n\n\n\n<p>We now finally have our \u2018mise en place\u2019 ready and can get cooking. Let\u2019s turn on the stove and start PCA. We first need to find out how hot the stove should be. What number of components do we want to reduce our dimensions to? This number should be as small as possible, while keeping in mind that we should maintain sufficient information from our original data. We don\u2019t want too many ingredients when cooking so we skip some, but we do want to keep as much of the original taste as possible. 
To do this, we calculate the explained variance for each component. In simple terms, the explained variance tells us what share of the total variance in our data is captured by a component.<\/p>\n\n\n\n<p>Opinions differ and the right threshold depends on your data, but a common rule of thumb is to keep around 80% of the variance or more. We plot the explained variance per component as shown below and, as you can see, we need at least 6 components to keep 80% of our variance. As the number of components gets higher, you can see that each additional component contributes less to the total variance. The variance a component adds indicates how significant that component is.<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"469\" height=\"327\" src=\"https:\/\/lh3.googleusercontent.com\/laETImEbmm__GxcA3FUKGOm_CFVMSkF7KTLUh5KtoFE3iPMdloo_xcISb0qXb7qsHx6CSu0RVoO6wMdLvmCENtNjTx4P0VWsBCtwpkqeipxG1I7cyu2OSERFGkw_PsJoW6BgF1IsivFsDZgGEH0ZPxrAnhYv1Fdh2ctSBF7vdoYYk9A-zJpujgoA\"><\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"193\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/3Screenshot-2022-08-23-at-11.11.46-1024x193.png\" alt=\"Clustering users\" class=\"wp-image-2020\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/3Screenshot-2022-08-23-at-11.11.46-1024x193.png 1024w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/3Screenshot-2022-08-23-at-11.11.46-300x57.png 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/3Screenshot-2022-08-23-at-11.11.46-768x145.png 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/3Screenshot-2022-08-23-at-11.11.46-18x3.png 18w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/3Screenshot-2022-08-23-at-11.11.46.png 1299w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>We call a single function to get this chart. 
As you can see, in our case, we called it with n=12<\/em><\/figcaption><\/figure>\n\n\n\n<p>Now that we have chosen to perform PCA with 6 components, we can run our function. After doing so, we can plot a heatmap to visualise each component. The darker a cell, the more significantly its category is represented in that component. We can now see what links there are between categories. As principal component 2 below shows, someone interested in category D is likely also interested in category I.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"845\" height=\"684\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png\" alt=\"Clustering users\" class=\"wp-image-2021\" style=\"width:607px;height:490px\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png 845w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap-300x243.png 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap-768x622.png 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap-15x12.png 15w\" sizes=\"(max-width: 845px) 100vw, 845px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"206\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/5Screenshot-2022-08-23-at-11.14.30-1024x206.png\" alt=\"\" class=\"wp-image-2022\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/5Screenshot-2022-08-23-at-11.14.30-1024x206.png 1024w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/5Screenshot-2022-08-23-at-11.14.30-300x60.png 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/5Screenshot-2022-08-23-at-11.14.30-768x155.png 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/5Screenshot-2022-08-23-at-11.14.30-18x4.png 18w, 
https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/5Screenshot-2022-08-23-at-11.14.30.png 1295w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>Before plotting the heatmap, we run the above function. As you can see, in our case we selected n=6<\/em><\/figcaption><\/figure>\n\n\n\n<p>Now that we have our principal components, we can assign each user to a component and, by doing so, cluster them. To do this, we calculate the Mean Squared Error (MSE) between each component and the user\u2019s row of interactions, and assign the user to the component with the lowest MSE, i.e. the component that is most similar to that row. We also want to check how our users are spread over these clusters to see whether the result is useful at all.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"418\" src=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/6Screenshot-2022-08-23-at-11.16.27-1024x418.png\" alt=\"\" class=\"wp-image-2023\" srcset=\"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/6Screenshot-2022-08-23-at-11.16.27-1024x418.png 1024w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/6Screenshot-2022-08-23-at-11.16.27-300x122.png 300w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/6Screenshot-2022-08-23-at-11.16.27-768x313.png 768w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/6Screenshot-2022-08-23-at-11.16.27-18x7.png 18w, https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/6Screenshot-2022-08-23-at-11.16.27.png 1302w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\"><em>The two functions that are run to find the lowest MSE and assign each user a principal component.<\/em><\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><strong><em>What\u2019s next?<\/em><\/strong><\/h2>\n\n\n\n<p>Now that we have segmented our audience, we can target them more personally and start questioning 
where to go from here. Which user group is the most active? Which user group has the best conversion rate? Is the content consumed by this group better? What user group should we focus on? <\/p>\n\n\n\n<p><\/p>\n\n\n\n<p>Follow <a href=\"https:\/\/thedatastory.nl\/en\/\">The Data Story<\/a> on <a href=\"https:\/\/www.linkedin.com\/company\/the-data-story\">LinkedIn<\/a> to stay up to date on our blogs!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Once we know a user\u2019s interests, it is much easier to target them personally, likely also more effectively. With this goal in mind we made an attempt to cluster users [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":2021,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_price":"","_stock":"","_tribe_ticket_header":"","_tribe_default_ticket_provider":"","_tribe_ticket_capacity":"0","_ticket_start_date":"","_ticket_end_date":"","_tribe_ticket_show_description":"","_tribe_ticket_show_not_going":false,"_tribe_ticket_use_global_stock":"","_tribe_ticket_global_stock_level":"","_global_stock_mode":"","_global_stock_cap":"","_tribe_rsvp_for_event":"","_tribe_ticket_going_count":"","_tribe_ticket_not_going_count":"","_tribe_tickets_list":"[]","_tribe_ticket_has_attendee_info_fields":false,"footnotes":""},"categories":[2],"tags":[],"class_list":["post-2004","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-stories"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Clustering users based on interaction categories - The Data Story<\/title>\n<meta name=\"description\" content=\"In this blogpost we will guide you to clustering users through a Jupyter Notebook. 
The code is split up into five segments.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/thedatastory.nl\/nl\/data-stories\/clustering-users-based-on-interaction-categories\/\" \/>\n<meta property=\"og:locale\" content=\"nl_NL\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Clustering users based on interaction categories - The Data Story\" \/>\n<meta property=\"og:description\" content=\"In this blogpost we will guide you to clustering users through a Jupyter Notebook. The code is split up into five segments.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/thedatastory.nl\/nl\/data-stories\/clustering-users-based-on-interaction-categories\/\" \/>\n<meta property=\"og:site_name\" content=\"The Data Story\" \/>\n<meta property=\"article:published_time\" content=\"2022-09-16T06:22:14+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-18T13:29:00+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png\" \/>\n\t<meta property=\"og:image:width\" content=\"845\" \/>\n\t<meta property=\"og:image:height\" content=\"684\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"The Data Story\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Geschreven door\" \/>\n\t<meta name=\"twitter:data1\" content=\"The Data Story\" \/>\n\t<meta name=\"twitter:label2\" content=\"Geschatte leestijd\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minuten\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/\"},\"author\":{\"name\":\"The Data Story\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#\\\/schema\\\/person\\\/e218295bc2730947de16723cb69d18f3\"},\"headline\":\"Clustering users based on interaction categories\",\"datePublished\":\"2022-09-16T06:22:14+00:00\",\"dateModified\":\"2025-09-18T13:29:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/\"},\"wordCount\":1262,\"publisher\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/thedatastory.nl\\\/wp-content\\\/uploads\\\/2022\\\/09\\\/heatmap.png\",\"articleSection\":[\"Data stories\"],\"inLanguage\":\"nl-NL\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/\",\"url\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/\",\"name\":\"Clustering users based on interaction categories - The Data 
Story\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/thedatastory.nl\\\/wp-content\\\/uploads\\\/2022\\\/09\\\/heatmap.png\",\"datePublished\":\"2022-09-16T06:22:14+00:00\",\"dateModified\":\"2025-09-18T13:29:00+00:00\",\"description\":\"In this blogpost we will guide you to clustering users through a Jupyter Notebook. The code is split up into five segments.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#breadcrumb\"},\"inLanguage\":\"nl-NL\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#primaryimage\",\"url\":\"https:\\\/\\\/thedatastory.nl\\\/wp-content\\\/uploads\\\/2022\\\/09\\\/heatmap.png\",\"contentUrl\":\"https:\\\/\\\/thedatastory.nl\\\/wp-content\\\/uploads\\\/2022\\\/09\\\/heatmap.png\",\"width\":845,\"height\":684},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/data-stories\\\/clustering-users-based-on-interaction-categories\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data stories\",\"item\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/data-stories\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Clustering users based on interaction 
categories\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#website\",\"url\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/\",\"name\":\"The Data Story\",\"description\":\"Data Analyse, Visualisatie &amp; Automation\",\"publisher\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"nl-NL\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#organization\",\"name\":\"The Data Story\",\"url\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/thedatastory.nl\\\/wp-content\\\/uploads\\\/2021\\\/11\\\/Logo-negatief.svg\",\"contentUrl\":\"https:\\\/\\\/thedatastory.nl\\\/wp-content\\\/uploads\\\/2021\\\/11\\\/Logo-negatief.svg\",\"width\":250,\"height\":49,\"caption\":\"The Data Story\"},\"image\":{\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/thedatastory.nl\\\/en\\\/#\\\/schema\\\/person\\\/e218295bc2730947de16723cb69d18f3\",\"name\":\"The Data Story\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"nl-NL\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/3b7086714fe59d0954156d3cedeeaf914f3630d097c0c9587f9fb4793ce63818?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/3b7086714fe59d0954156d3cedeeaf914f3630d097c0c9587f9fb4793ce63818?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/3b7086714fe59d0954156d3cedeeaf914f3630d097c0c9587f9fb4793ce63818?s=96&d=mm&r=g\",\"caption\":\"The Data 
Story\"},\"url\":\"https:\\\/\\\/thedatastory.nl\\\/nl\\\/author\\\/the-data-story\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Clustering users based on interaction categories - The Data Story","description":"In this blogpost we will guide you to clustering users through a Jupyter Notebook. The code is split up into five segments.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/thedatastory.nl\/nl\/data-stories\/clustering-users-based-on-interaction-categories\/","og_locale":"nl_NL","og_type":"article","og_title":"Clustering users based on interaction categories - The Data Story","og_description":"In this blogpost we will guide you to clustering users through a Jupyter Notebook. The code is split up into five segments.","og_url":"https:\/\/thedatastory.nl\/nl\/data-stories\/clustering-users-based-on-interaction-categories\/","og_site_name":"The Data Story","article_published_time":"2022-09-16T06:22:14+00:00","article_modified_time":"2025-09-18T13:29:00+00:00","og_image":[{"width":845,"height":684,"url":"http:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png","type":"image\/png"}],"author":"The Data Story","twitter_card":"summary_large_image","twitter_misc":{"Geschreven door":"The Data Story","Geschatte leestijd":"8 minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#article","isPartOf":{"@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/"},"author":{"name":"The Data Story","@id":"https:\/\/thedatastory.nl\/en\/#\/schema\/person\/e218295bc2730947de16723cb69d18f3"},"headline":"Clustering users based on interaction 
categories","datePublished":"2022-09-16T06:22:14+00:00","dateModified":"2025-09-18T13:29:00+00:00","mainEntityOfPage":{"@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/"},"wordCount":1262,"publisher":{"@id":"https:\/\/thedatastory.nl\/en\/#organization"},"image":{"@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#primaryimage"},"thumbnailUrl":"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png","articleSection":["Data stories"],"inLanguage":"nl-NL"},{"@type":"WebPage","@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/","url":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/","name":"Clustering users based on interaction categories - The Data Story","isPartOf":{"@id":"https:\/\/thedatastory.nl\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#primaryimage"},"image":{"@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#primaryimage"},"thumbnailUrl":"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png","datePublished":"2022-09-16T06:22:14+00:00","dateModified":"2025-09-18T13:29:00+00:00","description":"In this blogpost we will guide you to clustering users through a Jupyter Notebook. 
The code is split up into five segments.","breadcrumb":{"@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#breadcrumb"},"inLanguage":"nl-NL","potentialAction":[{"@type":"ReadAction","target":["https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/"]}]},{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#primaryimage","url":"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png","contentUrl":"https:\/\/thedatastory.nl\/wp-content\/uploads\/2022\/09\/heatmap.png","width":845,"height":684},{"@type":"BreadcrumbList","@id":"https:\/\/thedatastory.nl\/data-stories\/clustering-users-based-on-interaction-categories\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/thedatastory.nl\/en\/"},{"@type":"ListItem","position":2,"name":"Data stories","item":"https:\/\/thedatastory.nl\/en\/data-stories\/"},{"@type":"ListItem","position":3,"name":"Clustering users based on interaction categories"}]},{"@type":"WebSite","@id":"https:\/\/thedatastory.nl\/en\/#website","url":"https:\/\/thedatastory.nl\/en\/","name":"The Data Story","description":"Data Analyse, Visualisatie &amp; Automation","publisher":{"@id":"https:\/\/thedatastory.nl\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/thedatastory.nl\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"nl-NL"},{"@type":"Organization","@id":"https:\/\/thedatastory.nl\/en\/#organization","name":"The Data 
Story","url":"https:\/\/thedatastory.nl\/en\/","logo":{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/thedatastory.nl\/en\/#\/schema\/logo\/image\/","url":"https:\/\/thedatastory.nl\/wp-content\/uploads\/2021\/11\/Logo-negatief.svg","contentUrl":"https:\/\/thedatastory.nl\/wp-content\/uploads\/2021\/11\/Logo-negatief.svg","width":250,"height":49,"caption":"The Data Story"},"image":{"@id":"https:\/\/thedatastory.nl\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/thedatastory.nl\/en\/#\/schema\/person\/e218295bc2730947de16723cb69d18f3","name":"The Data Story","image":{"@type":"ImageObject","inLanguage":"nl-NL","@id":"https:\/\/secure.gravatar.com\/avatar\/3b7086714fe59d0954156d3cedeeaf914f3630d097c0c9587f9fb4793ce63818?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/3b7086714fe59d0954156d3cedeeaf914f3630d097c0c9587f9fb4793ce63818?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/3b7086714fe59d0954156d3cedeeaf914f3630d097c0c9587f9fb4793ce63818?s=96&d=mm&r=g","caption":"The Data 
Story"},"url":"https:\/\/thedatastory.nl\/nl\/author\/the-data-story\/"}]}},"_links":{"self":[{"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/posts\/2004","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/comments?post=2004"}],"version-history":[{"count":6,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/posts\/2004\/revisions"}],"predecessor-version":[{"id":3724,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/posts\/2004\/revisions\/3724"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/media\/2021"}],"wp:attachment":[{"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/media?parent=2004"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/categories?post=2004"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/thedatastory.nl\/nl\/wp-json\/wp\/v2\/tags?post=2004"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}