E-commerce Prices Analysis

Demographic Approach

Project description

Situation

  The analyzed dataset was collected to better understand how customers are distributed by nationality, gender, age group and city. The data was extracted from Google BigQuery and contained no relevant noise, so the only change made was to add a new column classifying each customer into one of 6 age groups: teens, twenties, thirties, forties, fifties and elders.
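  The age classification can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the function name age_group and the bin edges (teens under 20, ten-year bands from 20 to 59, elders from 60) are assumptions, since the exact boundaries are not stated in this report.

```python
def age_group(age: int) -> str:
    """Classify an age into one of six groups.

    Assumed bin edges (not stated in the report): under 20 -> teens,
    20-59 in ten-year bands, 60 and over -> elders.
    """
    if age < 20:
        return "teens"
    if age >= 60:
        return "elders"
    # age // 10 is 2..5 here, so subtracting 2 indexes the four decade labels
    return ("twenties", "thirties", "forties", "fifties")[age // 10 - 2]


# Hypothetical sample rows standing in for the exported users table
users = [{"id": 1, "age": 17}, {"id": 2, "age": 34}, {"id": 3, "age": 65}]
for user in users:
    user["age_group"] = age_group(user["age"])

print([u["age_group"] for u in users])  # ['teens', 'thirties', 'elders']
```

  With pandas, the same binning would more typically be done in one call with pandas.cut and explicit bin edges and labels.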

Task

  Below are more details about the key questions analyzed, which aim to identify trends and patterns linking customers' behavior to their gender, nationality and other characteristics. The full analysis is organized around 15 questions, all covered in the complete report; this summary focuses on 5 of them. The complete report is available in my GitHub project.

Action

  The data was cleaned with Python, and the visualizations were built in Looker Studio. The main findings for 5 of the 15 questions are presented below.

Result

  Overall, men tend to spend more money and buy more expensive products than women. The 3 countries with the highest total spend are China, the United States and Brazil, while the highest spend per person is found in France for men and in the United Kingdom for women.

Data Design

  For this topic, the data design started with extracting all tables from the BigQuery database: distribution_centers, events, inventory_items, order_items, orders, products and users. The tables were queried with SQL in BigQuery and saved as .csv files, split into several files where needed to respect the maximum permitted file size. The .csv files were then analyzed in Python; no cleaning of missing or corrupted data was needed. The only addition was a new column, “age_group”, derived from each customer's age to simplify the analysis. Two main datasets were used: one joining the users table with the products table, and another joining the users table with the order_items table. Finally, specific datasets were derived from these and saved as .csv files to build the visualizations in Looker Studio.

Problem definition
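  The users/order_items join described above can be sketched with the standard library as below. This is a hedged illustration rather than the project's code: the sample rows are toy values, and the field names (id, user_id, country, gender, sale_price) and the helper functions join_users and spend_by_country are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical rows; column names are assumed to mirror the exported CSVs.
users = [
    {"id": 1, "country": "China", "gender": "M"},
    {"id": 2, "country": "Brazil", "gender": "F"},
    {"id": 3, "country": "China", "gender": "F"},
]
order_items = [
    {"order_id": 10, "user_id": 1, "sale_price": 50.0},
    {"order_id": 11, "user_id": 3, "sale_price": 20.0},
    {"order_id": 12, "user_id": 2, "sale_price": 35.0},
    {"order_id": 12, "user_id": 2, "sale_price": 15.0},
]


def join_users(items, users):
    """Inner join: attach each item's user attributes via user_id -> id."""
    by_id = {u["id"]: u for u in users}  # index users once for O(1) lookups
    joined = []
    for item in items:
        user = by_id.get(item["user_id"])
        if user is not None:  # items without a matching user are dropped
            joined.append({**item, "country": user["country"], "gender": user["gender"]})
    return joined


def spend_by_country(rows):
    """Sum sale_price per customer country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["sale_price"]
    return dict(totals)


joined = join_users(order_items, users)
print(spend_by_country(joined))  # {'China': 70.0, 'Brazil': 50.0}
```

  In a pandas workflow the same inner join would usually be written with DataFrame.merge, using left_on="user_id" and right_on="id", followed by a groupby on the country column.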

1. What is the distribution of nationality, age and gender among our customer base?

5. How are product categories distributed across genders and age groups?

7. Considering the 4 most popular categories among both genders, how much money was spent on these products?

11. How does the number of orders created vary over time for the 5 most popular countries?

15. What happens to the total retail price if we double the sales of Calvin Klein, Carhartt, Diesel, True Religion and 7 for All Mankind, or if we do the same for the most popular products among men and women?
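  Question 15 boils down to simple arithmetic. The sketch below assumes that "doubling the sales" means every sold item of a selected brand is counted twice; the row layout, the brand and retail_price field names and the helper doubled_total are hypothetical, and the figures are toy values, not project data.

```python
def doubled_total(items, brands):
    """Total retail price if every sale of the selected brands happened twice."""
    return sum(i["retail_price"] * (2 if i["brand"] in brands else 1) for i in items)


# Hypothetical sold items (toy values, not project data)
items = [
    {"brand": "Diesel", "retail_price": 100.0},
    {"brand": "Carhartt", "retail_price": 50.0},
    {"brand": "Other", "retail_price": 250.0},
]
selected = {"Calvin Klein", "Carhartt", "Diesel", "True Religion", "7 for All Mankind"}

base = sum(i["retail_price"] for i in items)   # 400.0
scenario = doubled_total(items, selected)      # 550.0
growth_pct = 100 * (scenario - base) / base    # 37.5
```

  Under this reading, the growth is simply the selected items' share of the baseline total (here 150 out of 400, i.e. 37.5%), so the same function can be reused for the "most popular products per gender" scenario by passing a different selection.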

Analysis and Conclusions

  After an in-depth exploration and analysis of the data, the main conclusions are the following:

    ● Customers are split almost equally between genders

    ● Regarding both total retail price and number of orders, China, the USA and Brazil are the top 3 countries

    ● Men and women behave clearly differently with respect to product categories and brands

    ● If we simulate doubling the sales of the most popular products, the total retail price grows by around 43% for men and 27% for women

    ● In general, elderly men are the customers who tend to buy the most expensive products