
Software Development Process

June 26, 2024

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An alumnus of IIT and ISB with more than 17 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a sought-after IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, where for more than ten years he has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Introduction

Web analytics is the analysis of website visitor behaviour. To measure web activity, including the use of websites and their component parts such as web pages, photos, and videos, data must be tracked, reviewed, and reported.

Web analytics serves as a business indicator: it helps identify which items a given customer is most likely to purchase, so that marketing can target the customers most likely to buy a specific product. This improves the cost-to-revenue ratio for sales and marketing.

Web analytics also tracks user clicks and drill-downs on a website, identifies the pages that users visit most frequently, and records how visitors interact with the site through their browsers, so that this activity can be monitored and evaluated. The results are presented as tables, charts, and graphs.

There are two categories of web analytics.

Onsite Web Analytics

Onsite web analytics has a narrower focus: it tracks the activity of visitors to a particular website and measures that website's performance. The data collected is typically relevant to the website owner and may contain details about website engagement.

Offsite Web Analytics

Offsite web analytics tracks user behaviour on websites other than your company's own in order to estimate its prospective clientele. It offers sector-wide analytics that show how a business compares with its rivals. This kind of analysis draws on information gathered from the whole web, including online communities, search engines, and digital platforms.

Log file analysis, also known as log management, is the process of analyzing the data collected from log files to monitor, troubleshoot, and generate reports for website performance. Log files contain a record of almost every action taken on a network server, such as a web server, email server, database server, or file server.
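As a rough illustration of log file analysis, the sketch below counts page views from web server log lines. It assumes the lines follow the Common Log Format; the sample lines, the `LOG_PATTERN` expression, and the `count_page_views` helper are made up for this example and are not taken from any particular tool.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident authuser [date] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_page_views(lines):
    """Count successful (2xx) GET requests per requested path."""
    views = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match["method"] == "GET" and match["status"].startswith("2"):
            views[match["path"]] += 1
    return views


# Hypothetical sample lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [26/Jun/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.9 - - [26/Jun/2024:10:00:04 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.5 - - [26/Jun/2024:10:00:09 +0000] "GET /missing HTTP/1.1" 404 312',
    '203.0.113.7 - - [26/Jun/2024:10:00:12 +0000] "POST /login HTTP/1.1" 302 0',
]

page_views = count_page_views(sample_log)
print(page_views.most_common())
```

Only the two successful GET requests for `/index.html` are counted here; the 404 response and the POST request are ignored. A real report would usually also group by time window or visitor.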

Tagging is the process of using a tag management system to add code snippets to your website's hypertext markup language code to track the interaction between your website's visitors and your website. These code snippets are called tags. When a company adds these tags to their website, they can use them to track various metrics such as page views, unique visitors, and specific products displayed.

Mobile analytics gives companies previously unattainable insights into how people use their apps. Analytics is often offered as software that integrates with a company's existing websites and applications in order to gather, store, and analyse data. Teams in marketing, sales, and product management need this information to make better decisions. Without mobile analytics tools, businesses are flying blind: they cannot tell who their users are, what draws them to a website or app, or why they leave. Mobile analytics keeps track of visits, visitors, traffic sources, location, when users log in or out, device information, and so on.

Mobile analytics supports the reviews carried out by data, product, and marketing teams.

Many different people use mobile analytics, including the marketing team, UX/UI designers, and the product and technical teams.

When planning a tour with friends or family, or even a solo trip, we often search for places that are worth visiting. When we search by a name such as a state or a place, we see tourism websites or Google Maps results with location information, packages, reviews, ratings, comments, and so on.

Let's build a recommender system that suggests attractive locations according to our preferences. Google reviews are a good data source for this.

We'll build the recommender system by grouping Google review users into clusters with similar interests.

We'll start by importing the required packages, such as pandas, NumPy, matplotlib, and seaborn.

Explanation

The data comes from the Google travel review data set, so we are working with secondary data.

The data set covers 24 categories of places, reflecting the varied tastes of individual users. Google user ratings are given on a scale from 1 to 5, and each attribute holds a user's average rating for one category.

'A' refers to attributes
A1: Unique user-id
A2: Average ratings on churches
A3: Average ratings on resorts
A4: Average ratings on beaches
A5: Average ratings on parks
A6: Average ratings on theatres
A7: Average ratings on museums
A8: Average ratings on malls
A9: Average ratings on zoos
A10: Average ratings on restaurants
A11: Average ratings on pubs/bars
A12: Average ratings on local services
A13: Average ratings on burger/pizza shops
A14: Average ratings on hotels/other lodgings
A15: Average ratings on juice bars
A16: Average ratings on art galleries
A17: Average ratings on dance clubs
A18: Average ratings on swimming pools
A19: Average ratings on gyms
A20: Average ratings on bakeries
A21: Average ratings on beauty & spas
A22: Average ratings on cafes
A23: Average ratings on viewpoints
A24: Average ratings on monuments
A25: Average ratings on gardens

After extracting the data, we'll look at the first five rows of the dataset.
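A minimal sketch of this step is shown below. Because the original file is not included here, the code builds a synthetic stand-in with random ratings; the column names and the `make_sample_reviews` helper are hypothetical labels chosen to mirror attributes A1 to A25. With the real file, `pd.read_csv` would take the place of the helper.

```python
import numpy as np
import pandas as pd

# Hypothetical readable names for attributes A2..A25, in the order listed above.
CATEGORIES = [
    "churches", "resorts", "beaches", "parks", "theatres", "museums",
    "malls", "zoos", "restaurants", "pubs_bars", "local_services",
    "burger_pizza_shops", "hotels_lodgings", "juice_bars", "art_galleries",
    "dance_clubs", "swimming_pools", "gyms", "bakeries", "beauty_spas",
    "cafes", "viewpoints", "monuments", "gardens",
]


def make_sample_reviews(n_users=200, seed=0):
    """Build a synthetic stand-in for the review data (random ratings in [0, 5])."""
    rng = np.random.default_rng(seed)
    ratings = rng.uniform(0, 5, size=(n_users, len(CATEGORIES))).round(2)
    df = pd.DataFrame(ratings, columns=CATEGORIES)
    df.insert(0, "user_id", [f"User {i + 1}" for i in range(n_users)])
    return df


reviews = make_sample_reviews()
print(reviews.shape)   # one row per user: user_id plus 24 category columns
print(reviews.head())  # first five rows
```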

EDA:

Exploratory data analysis is a crucial phase that may help us understand the data and create hypotheses with the use of summaries and graphical representations.

Start with the summary statistics for each category, such as the mean, range, and standard deviation.

Look for any missing values, and if there are any, remove them.

Some values are missing, for example in the ratings for burger/pizza shops and gardens, so we remove them.
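The summary and missing-value checks could look roughly like the sketch below. It uses a small synthetic frame with a few gaps injected on purpose; the column names are hypothetical, and dropping whole rows with `dropna` is one possible reading of "remove", not necessarily the exact step used in the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in with three categories and a few injected gaps.
rng = np.random.default_rng(1)
reviews = pd.DataFrame(
    rng.uniform(0, 5, size=(100, 3)).round(2),
    columns=["burger_pizza_shops", "gardens", "parks"],
)
reviews.loc[[3, 10], "burger_pizza_shops"] = np.nan
reviews.loc[[42], "gardens"] = np.nan

# Mean, standard deviation, min, max and quartiles per column.
print(reviews.describe())

# Number of missing values in each column.
missing_per_column = reviews.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing rating.
clean = reviews.dropna().reset_index(drop=True)
print(len(reviews), "->", len(clean))
```

If losing rows is a concern, filling the gaps (for example with the column mean) would be an alternative, but that changes the data and should be a deliberate choice.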

Plotting will show us how our data is dispersed.

According to the scale, all ratings range from 0 to 5.

A few categories, such as restaurants and pubs/bars, have wide distributions, which suggests that tourists visit them frequently.

Others, such as bakeries, swimming pools, and gyms, receive the lowest scores.

Now perform k-means clustering

First, we cluster the extracted dataset, which has 24 attributes.

We use the elbow method to select the number of clusters.
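The elbow method can be sketched as follows, assuming scikit-learn's `KMeans` is used. The data here is synthetic (`make_blobs` with four groups in 24 dimensions), so the numbers only illustrate the procedure and say nothing about the real review data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in: 400 "users" with 24 features, generated around 4 centres.
X, _ = make_blobs(n_samples=400, n_features=24, centers=4, random_state=0)

# Fit k-means for a range of k and record the inertia
# (sum of squared distances of samples to their nearest cluster centre).
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Plotting inertia against k and looking for the point where the curve flattens (the "elbow") gives a candidate number of clusters. On real data the bend is often much less clear than on this synthetic example.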

K-means on the scaled, PCA-transformed data has lower inertia than on the original, unscaled data. In addition, with the original data it is very difficult to select the right number of clusters.
We therefore continue the k-means clustering analysis on the scaled, PCA-transformed data with 4 clusters.
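A minimal sketch of this pipeline is given below, assuming scikit-learn's `StandardScaler`, `PCA`, and `KMeans`, with two principal components. The input is synthetic, so the printed values are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 24 rating columns.
X, _ = make_blobs(n_samples=400, n_features=24, centers=4, random_state=0)

# 1. Standardise each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)

# 2. Project onto the first two principal components.
pca = PCA(n_components=2, random_state=0)
X_pca = pca.fit_transform(X_scaled)
explained = float(pca.explained_variance_ratio_.sum())

# 3. Cluster the projected data into 4 groups.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_pca)

print("variance explained by 2 PCs:", round(explained, 3))
print("inertia on scaled + PCA data:", round(float(kmeans.inertia_), 1))
```

Note that inertia depends on the scale and dimensionality of the feature space, so a lower value after scaling and PCA partly reflects the transformation itself rather than better-separated clusters.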

We've used 4 clusters, plotted in four colors: orange, pink, blue, and green. Each color indicates a different cluster.

⦁ The pink cluster contains gyms, bakeries, swimming pools, beauty & spas, dance clubs, and cafes. These are not loosely spread.
⦁ The orange cluster contains juice bars, hotels, art galleries, and burger/pizza shops. Tourists may be more interested in visiting these.
⦁ The blue cluster covers more commonly shared interests: local services, zoos, malls, restaurants, and pubs/bars.
⦁ The green cluster contains people who are interested in nature, such as parks, beaches, and resorts, and it also includes museums and theatres.

Let us check the distribution among the clusters.

Cluster 0 has more interest in parks, theatres, museums, and malls.
Cluster 1 has more interest in burger/pizza, hotels, juice bars, and art galleries.
Cluster 2 has more interest in malls, restaurants, and pubs/bars.
Cluster 3 has a similar interest in almost all the categories such as dance clubs, bakeries, beauty and spas, cafes, viewpoints, swimming pools, gyms, monuments, and gardens.
Based on the above observations, cluster 3 has a good environment and is feasible for visiting.
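One way to derive per-cluster interests like those above is to average the ratings within each cluster and look at the highest-rated categories. The sketch below uses a tiny hand-made table with hypothetical values and three categories only; it shows the mechanics, not the actual results.

```python
import pandas as pd

# Hypothetical ratings with a cluster label already assigned to each user.
reviews = pd.DataFrame({
    "parks":   [4.5, 4.0, 1.0, 1.5, 2.0, 2.5],
    "malls":   [1.0, 1.5, 4.8, 4.2, 2.0, 2.2],
    "gyms":    [0.5, 1.0, 1.0, 0.8, 2.2, 2.5],
    "cluster": [0, 0, 2, 2, 3, 3],
})

# Mean rating per category within each cluster.
profile = reviews.groupby("cluster").mean()
print(profile)

# Highest-rated category for each cluster.
top_category = profile.idxmax(axis=1)
print(top_category)
```

In this toy table cluster 0 leans towards parks and cluster 2 towards malls, while cluster 3 has similar averages across all three categories, which is the kind of pattern described for cluster 3 above.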

Conclusion:

The distribution of ratings in each category of spots was not uniform, according to a preliminary study of the data. Some of them are widely distributed, while others are not.

In addition, comparing the different data preparation approaches showed that the inertia was lower with standardised data and PCA. Users of Google Reviews can be grouped into 4 clusters. Although the two principal components together explain only a small share of the variance, each cluster has the following characteristics:

The pink cluster contains gyms, bakeries, cafes, swimming pools, beauty & spas, and dance clubs. These are not dispersed widely.

The orange cluster contains juice bars, hotels, art galleries, and burger/pizza shops. Tourists may be more interested in visiting these.

The blue cluster, which comprises local services, zoos, malls, restaurants, and pubs/bars, covers more commonly shared interests.

People who enjoy being outside, such as those who frequent parks, beaches, and resorts, may be found in the green cluster, which also contains theatres and museums.

We have taken the final cluster into consideration based on the cluster's distribution plot.

Cluster 0 is more interested in parks, theatres, museums, and malls. Cluster 1 is more interested in burger/pizza shops, hotels, juice bars, and art galleries. Cluster 2 is more interested in malls, restaurants, and pubs/bars. Cluster 3 shows a similar level of interest in almost all categories, including dance clubs, bakeries, beauty & spas, cafes, viewpoints, swimming pools, gyms, monuments, and gardens. These results suggest that cluster 3 has a good environment and is feasible for visiting.

By analysing the information from traveller reviews, we can therefore see that each group of users prefers a distinct kind of place, depending on its members' preferences and freedom of choice.
