E-commerce Prices Analysis
Demographic Approach
Project description
Situation
The data set was collected to better understand how customers are distributed by nationality, gender, age group, and city. The data was extracted from Google BigQuery and contained no significant noise, so the only change made was to add a new column classifying each customer into one of six age groups: teens, twenties, thirties, forties, fifties, and elders.
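As an illustration of the age classification, here is a minimal sketch using pandas. The report names the six groups but not their boundaries, so the bin edges, the `age` column name, and the `add_age_group` helper are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical bin edges: the report lists the group names, not the cut-offs.
AGE_BINS = [0, 20, 30, 40, 50, 60, 200]
AGE_LABELS = ["teens", "twenties", "thirties", "forties", "fifties", "elders"]


def add_age_group(users: pd.DataFrame, age_col: str = "age") -> pd.DataFrame:
    """Return a copy of `users` with an extra `age_group` column."""
    out = users.copy()
    # right=False closes each bin on the left, e.g. [20, 30) -> "twenties".
    out["age_group"] = pd.cut(
        out[age_col], bins=AGE_BINS, labels=AGE_LABELS, right=False
    )
    return out


# Made-up customers, only to show the call.
users = pd.DataFrame({"id": [1, 2, 3, 4], "age": [17, 20, 45, 63]})
users = add_age_group(users)
print(users)
```

Using `pd.cut` keeps the group order (teens before twenties, and so on) as an ordered categorical, which is convenient when sorting charts by age group.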
Task
More details are given below on the key questions analyzed, which aim to identify possible trends and patterns relating customers' behavior to their gender, nationality, and other characteristics. The full report works through 15 questions; here we focus on 5 of them. To access the complete report, please check my GitHub project.
Action
To reach the final results, the data was cleaned using Python, and the visualizations were built in Looker Studio. The main findings for 5 of the 15 questions are summarized below.
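The Python preparation code itself is not part of this summary. A minimal sketch of what such a step could look like is shown here: it combines the parts of a table exported in several pieces, checks for missing values, and joins customers to their purchased items. The column names (`id`, `user_id`, `sale_price`, `gender`, `country`) and both helper functions are assumptions for illustration, and the small in-memory frames stand in for the exported files.

```python
import pandas as pd


def combine_parts(parts: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack the pieces of one table that was exported in several files."""
    return pd.concat(parts, ignore_index=True)


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column."""
    return df.isna().sum()


# Made-up stand-ins for two exported parts of a users table.
users_parts = [
    pd.DataFrame({"id": [1, 2], "gender": ["M", "F"], "country": ["A", "A"]}),
    pd.DataFrame({"id": [3], "gender": ["M"], "country": ["B"]}),
]
users = combine_parts(users_parts)

# Made-up stand-in for an order items table.
order_items = pd.DataFrame(
    {"user_id": [1, 1, 3], "sale_price": [10.0, 25.0, 40.0]}
)

print(missing_report(users))

# Each item belongs to exactly one user, so the join is many-to-one.
users_orders = order_items.merge(
    users, left_on="user_id", right_on="id", how="left", validate="many_to_one"
)
print(users_orders)
```

The `validate="many_to_one"` argument makes pandas raise an error if a user id is duplicated in the users table, which guards against silently inflating totals after the join.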
Result
Overall, men tend to spend more money and buy more expensive products. The three countries with the highest total spending are China, the United States, and Brazil, while France and the United Kingdom have the highest spending per person, for men and women respectively.
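The distinction between total spending and spending per person can be sketched as follows. This is only an illustration: the column names, the `spend_summary` helper, and the toy rows are assumptions, and the numbers are not taken from the report.

```python
import pandas as pd


def spend_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Total spend, distinct customers and spend per person by country and gender."""
    grouped = df.groupby(["country", "gender"]).agg(
        total_spent=("sale_price", "sum"),
        customers=("user_id", "nunique"),
    )
    grouped["spent_per_person"] = grouped["total_spent"] / grouped["customers"]
    return grouped.reset_index()


# Made-up purchases: one row per item bought.
purchases = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3, 4],
        "country": ["A", "A", "A", "B", "B"],
        "gender": ["M", "M", "F", "M", "M"],
        "sale_price": [10.0, 30.0, 20.0, 50.0, 70.0],
    }
)

summary = spend_summary(purchases)
print(summary.sort_values("spent_per_person", ascending=False))
```

Counting distinct customers with `nunique` matters here: dividing by the number of rows would give the average price per item rather than the spend per person.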
Data Design
The data design started with the extraction of all tables from the BigQuery database, namely: distribution_centers, events, inventory_items, order_items, orders, products and users. The extraction was performed with SQL in BigQuery, and the results were saved to .csv files, split according to the maximum file size permitted. The .csv files were then analyzed using Python; in this case, no cleaning of missing or corrupted data was needed. The only change was a new column, “age_group”, derived from the customer's age to facilitate the analysis. Two main datasets were used: one joining the users table with the products table, and another joining the users table with the order_items table. Finally, specific datasets were created and saved as .csv files to build the visualizations in Looker Studio.
Problem definition
The problem is easier to define through the questions to be answered. The main questions discussed in the report are listed below; they aim to reveal patterns and trends in customer behavior according to age, gender, and nationality:
1. What is the distribution of nationality, age and gender among our customer base?
5. How are product categories distributed across gender and age groups?
7. Considering the 4 most popular categories among both genders, how much money was spent on such products?
11. How does the number of orders vary over time for the 5 most popular countries?
15. What happens to the total retail price if we double the sales of Calvin Klein, Carhartt, Diesel, True Religion and 7 for All Mankind, or if we do the same for the most popular products among men and women?
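The what-if scenario in question 15 comes down to simple arithmetic: doubling the sales of a set of items adds their current retail total once more to the overall total. A minimal sketch, assuming a table with `brand` and `retail_price` columns (both names, the helper, and the toy rows are hypothetical and not the report's data):

```python
import pandas as pd

# Brands named in question 15.
TARGET_BRANDS = [
    "Calvin Klein", "Carhartt", "Diesel", "True Religion", "7 for All Mankind",
]


def total_after_doubling(df: pd.DataFrame, selected: pd.Series):
    """Return (baseline total, new total, growth in percent)."""
    baseline = float(df["retail_price"].sum())
    # Doubling the selected sales adds their retail total one more time.
    uplift = float(df.loc[selected, "retail_price"].sum())
    return baseline, baseline + uplift, 100.0 * uplift / baseline


# Made-up sold items, only to show the calculation.
sold = pd.DataFrame(
    {
        "brand": ["Diesel", "Carhartt", "Other", "Other"],
        "retail_price": [50.0, 30.0, 60.0, 60.0],
    }
)

baseline, new_total, growth_pct = total_after_doubling(
    sold, sold["brand"].isin(TARGET_BRANDS)
)
print(baseline, new_total, growth_pct)
```

The same function covers the second half of the question by passing a different boolean selection, for example one that marks the most popular products among men or among women.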
Analysis and Conclusions
After an intensive exploration and analysis of the data, the following main conclusions stand out:
● Genders are distributed almost equally
● Regarding the total retail price and number of orders, China, USA and Brazil are the top 3 countries
● There is a clear difference between the behavior of men and women regarding product categories and brands
● If sales of the most popular products are doubled, the total retail price is estimated to grow by around 43% for men and 27% for women
● In general, elderly men are the ones who tend to buy more expensive products
Recommended actions
Beyond the conclusions above, this data set clearly holds relevant information about the customers, but it would be interesting to collect more data to study, for example, each customer's social class or job title. Furthermore, some of the data could be studied in more detail, and it would be interesting to perform the actions listed below.
● It would be interesting to study the behavior of each city's population in more detail to find patterns
● The differences among age groups could be analyzed in more detail
● It is clear that, if we intend to increase sales and the total retail price earned, we should target increasing sales of the most popular products among elderly men and in the countries where people tend to buy more expensive products