Capstone Project - The Battle of Neighborhoods

Gary YOLO

A. Introduction

A.1. Background & Problem Description

New York City, the most populous city in the United States and one of the world's great metropolises, is a dream destination for gourmets seeking delicious cuisine. Its food culture includes an array of international cuisines influenced by the city's immigrant history. Central and Eastern European immigrants, especially Jewish immigrants from those regions, brought bagels, cheesecake, hot dogs, knishes, and delicatessens (or delis) to the city. Italian immigrants brought New York-style pizza and Italian cuisine into the city, while Jewish immigrants and Irish immigrants brought pastrami and corned beef, respectively. Chinese and other Asian restaurants, sandwich joints, trattorias, diners, and coffeehouses are ubiquitous throughout the city. Some 4,000 mobile food vendors licensed by the city, many immigrant-owned, have made Middle Eastern foods such as falafel and kebabs examples of modern New York street food. The city is home to "nearly one thousand of the finest and most diverse haute cuisine restaurants in the world," according to Michelin. As of 2019, there were 27,043 restaurants in the city, up from 24,865 in 2017[1].

As the figures show, New York City attracts many people who want to start a business in the food industry. Before they act, they need to decide where to open and what to consider when selecting a location. By exploring the regional characteristics of existing restaurants, I hope to determine whether a restaurant's neighborhood is an essential factor in its success.

As noted above, the city has tens of thousands of restaurants, and it is impractical to analyze every type. Because Pizza Place is the most frequent category in the collected data, I chose it for this report. Other types of restaurants can be studied with the same method.

A.2. Data Preparation

Data used in the analysis are listed below:

· Neighborhoods in New York City -- Wikipedia[2]. I cleaned the data and reduced it to the boroughs of NYC so that I could use it to find geographic locations for further venue analysis.

· Geopy, to get the geographic location for a given address.

· The Foursquare API, to get the most common venues in a given borough of New York City.

· The Foursquare API, to get the detailed records of given venues in New York City.

B. Methodology

I used BeautifulSoup to scrape the boroughs from Wikipedia and organized a table containing the Community Board, Area, Pop. Census, and Neighborhoods information for New York City.

I then used Geopy to get the geographic location of each community board. Because Geopy cannot recognize an address like 'Bronx CB 1', I used the first address in each community board's list of neighborhoods; if that was not found, I used the second.

I used the Foursquare API to explore the boroughs and segment them. I set a limit of 100 venues and a radius of 500 meters around each borough's latitude and longitude. The resulting table adds the venue id, venue name, category, latitude, and longitude from the Foursquare API.

The query returned 2555 records. I summarized the venues by category: among these 2555 records, Pizza Place is the most frequent category, with 117 venues. I therefore chose Pizza Place as the example restaurant type for further analysis.

I queried the Foursquare API again with the IDs of these pizza places to retrieve their detailed records, selected Rating, Price, Likes, Photos, and Tips into a data frame, and dropped the places without a rating.

Then I tried to find correlations among these variables:

The correlation matrix shows that Likes, Photos, and Tips are highly correlated with each other, but Likes is not highly correlated with Rating. One possible cause is that customers click Like for a specific reason yet give a lower rating to the overall experience. I therefore chose Rating to represent a restaurant's performance. Rating is only somewhat correlated with Price, which suggests that price does not significantly affect customers' impressions of a place.
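A correlation matrix of this kind can be produced with pandas. The sketch below is only an illustration: the column names follow the text, but the values are invented and do not reproduce the Foursquare data.

```python
import pandas as pd

# Toy values, invented for illustration; the real table comes from Foursquare.
venues = pd.DataFrame(
    {
        "Rating": [7.1, 8.4, 6.5, 9.0, 7.8],
        "Price": [1, 2, 1, 2, 1],
        "Likes": [12, 150, 8, 300, 40],
        "Photos": [5, 60, 3, 120, 20],
        "Tips": [3, 45, 2, 90, 11],
    }
)

# Pearson correlation between every pair of columns (the pandas default).
corr = venues.corr()
print(corr.round(2))
```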

I used the Foursquare API again, centered on each pizza place, to explore its surroundings within a 500-meter radius.

Based on the categories and counts of the venues surrounding each pizza place, I used k-means to cluster the pizza places into groups. To select the number of clusters, I used the "elbow" method, fitting the model for a range of values of k. The "elbow" (the point where the curve bends and further increases in k bring little improvement) is a good indication of the best-fitting k. In the visualizer, the elbow at k=6 is annotated with a dashed line.
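The elbow method can be illustrated with plain scikit-learn. This is a sketch under stated assumptions, not the notebook's code: the data are synthetic (three made-up groups rather than real venue counts), and it only computes the curve instead of drawing the annotated plot mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the venue-count matrix: rows are pizza places,
# columns are counts of nearby venue categories (values are made up).
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for a range of k and record the within-cluster sum of squares.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

On this toy data the curve drops sharply up to k=3 and flattens afterwards, which is the bend the method looks for.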

I merged the cluster label of each pizza place with its geographic location.

Then I used folium to visualize the distribution of these pizza places in NYC as below:

Let's see whether the performance of these pizza places differs across cluster labels:

Based on the medians, there seems to be a difference. But is it significant? I ran a one-way ANOVA:
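A one-way ANOVA of ratings grouped by cluster label can be run with SciPy's `f_oneway`. The sketch below uses invented ratings for three hypothetical clusters, so the numbers it prints are not the report's results.

```python
from scipy.stats import f_oneway

# Made-up ratings grouped by cluster label; the real groups come from the
# merged table of pizza places and their k-means labels.
ratings_by_cluster = {
    0: [8.6, 8.9, 8.4, 8.8],
    1: [6.9, 7.2, 7.0, 6.8],
    2: [7.1, 7.3, 6.9, 7.0],
}

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_value = f_oneway(*ratings_by_cluster.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A small p-value suggests that at least one group mean differs.
significant = p_value < 0.05
```

A significant result says only that the group means are not all equal; it does not say which clusters differ from which.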

As the table shows, the performance of these six groups of pizza places differs significantly. Places with cluster labels 0 and 4 perform best, while those with labels 1, 2, and 3 perform worst. The neighborhood may therefore affect customers' impressions of a restaurant, and restaurant owners may want to look for a similar location to start their business. So what is special about the locations around these pizza places?

I selected the pizza places with labels 0 and 4, dropped the categories with empty values, and sorted the venue types by the average number of venues around each pizza place, in descending order.

As a classic light meal, pizza is normally sold where other light-meal venues gather, such as cafes, bakeries, or bars. This explains why the places with higher ratings cluster around Manhattan. Does the pizza there truly taste better than elsewhere? The short time customers spend eating could be one factor that influences the ratings. Another interesting finding is that Italian restaurants outnumber other types of restaurants, especially around pizza places with label 1. These places are in the Bronx, which happens to be an Italian area. As a representative Italian food, pizza seems to match the market there.

D. Conclusion

In conclusion, pizza places that offer quick service and moderate flavor at a moderate price should consider locations in busy areas, close to other light-meal restaurants.

Those aiming to serve excellent pizza to the pickiest gourmets should open close to their target customers, or to customers with a particular background.

E. Discussion

For those who plan to operate a restaurant, location selection is only one of the fundamental problems to think over. This analysis assumes the type of restaurant has already been chosen, for example a pizza place. It cannot tell whether that type is the most popular or how many customers will visit each day. As for the location suggestion, it offers an opportunity analysis but lacks a risk analysis covering, for example, the cost of the location and the competition in the area.

Although this report demonstrates a relationship between location and ratings, ratings might not reflect how well a restaurant is operating. A restaurant with a high rating could still be unprofitable and therefore unsuccessful from a business perspective. The suggestion is thus relatively narrow. To offer more practical and profitable ideas, the relationship between customer reactions and financial performance should also be evaluated.

Only with all these analyses done would the report become truly constructive for a restaurant owner in the real business world.

F. References

[1] New York City, Wikipedia, https://en.wikipedia.org/wiki/New_York_City

[2] Neighborhoods in New York City, Wikipedia, https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City

Maninder Kaur

Jan 13, 2021

IBM Data Science Professional — Capstone Project “The Battle of Neighborhoods”

On January 1, 2020, I moved to Canada with dreams of working, travelling, and living my life. I never imagined that many of my plans would be cancelled by the surge of the coronavirus, which has profoundly affected life around the world. Given the life-threatening and economically challenging crisis people are facing right now, one can only hope to stay positive and test negative. The isolation and shutdown pushed me to devote my time to learning and upgrading my skills. Having research experience in data analysis using Python, I built on those basics by learning the fundamentals of data science and the tools and methods used in practice, such as exploratory data analysis and machine learning.

The IBM Data Science Professional Certificate course was very useful for me in building the foundation of concepts required in the data science profession. I would recommend this course to anyone interested in learning data science. The final assignment of the course is the so-called “Capstone Project”, in which many of the tools and methods learned during the course are applied in “The Battle of Neighborhoods” challenge. In this challenge, I was asked to choose a city and design a problem that can be solved using location data in addition to other datasets. I chose the city of Vancouver as my case. I hope you like my blog post on this capstone project.

Introduction/Business Problem

Canada has built a reputation for welcoming many immigrants every year and valuing multiculturalism. Vancouver is among the top cities offering new immigrants great opportunities in terms of work, education, lifestyle, and food choices. Vancouver is known for its sushi, fresh seafood, the finest house-made charcuterie, the most delicious tacos, and B.C. wines. For an immigrant like me, who prefers vegetarian food, finding the right place to eat can be a daunting challenge.

Thus, the goal of this project is to explore, segment, and cluster the neighborhoods of Vancouver and provide recommendations to my friend’s start-up, which is developing a food-delivery app, based on the common restaurant categories in different neighborhoods. The research questions specific to this business problem are: Which neighborhoods have a large number of restaurants? Which neighborhoods have the highest concentration of restaurants of each type? Where can you find vegan-friendly restaurants? Where can you find restaurants with Indian or Mediterranean cuisine? The target population for my project is new immigrants who recently moved to Vancouver and are looking for vegetarian food.

Description of the data

For the Vancouver neighborhood data, a Wikipedia page exists that has all the information needed to explore and cluster the 20 official neighborhoods in the city of Vancouver. The corresponding coordinates are obtained using geocoder.

As mandated in the challenge instructions, I will use Foursquare location data about restaurants in Vancouver. Foursquare is a US technology company whose location platform recommends places near a user’s specified location, based on the user’s previous browsing and check-in history. I will use Foursquare data such as the restaurant name, ID, location, and food category.

I will use the neighbourhoods of Vancouver, geocoder, and Foursquare data about the restaurants in these neighbourhoods to show the density of restaurants in each. It is worth noting that, at the time of this project, most Canadian cities are under strict lockdown measures and several restaurants are open only for take-away. This limitation may affect the sample size available for analysis. However, the same approach can be applied again in the future.

Methodology

In this section, I describe the procedure used to collect data from various sources and convert it into tabular form for pre-processing and exploratory data analysis. I started by scraping Wikipedia to create a dataframe of Vancouver's neighborhoods, using Python libraries such as pandas, requests, and BeautifulSoup. I cleaned the data to keep the required information, such as the neighborhood names, and discarded the rest. The next step was to get the coordinates (latitude and longitude) of each neighborhood using geocoder and Nominatim. I then appended these coordinates to the dataframe, which resulted in the table below:

I visualized the map of Vancouver, with neighborhoods denoted by blue circles, using the folium package and my dataframe, as shown in figure 3.

Then, I used the Foursquare API to explore the neighborhoods and segment them. I retrieved all Foursquare venues within a radius of 500 m of each neighborhood, limited to 100 venues per neighborhood, which resulted in 560 venues in total. I explored the number of venues in each neighborhood and then restricted the analysis to restaurant venues. There were 36 unique restaurant categories in total. Given the ethnically and culturally diverse population of the city, there is a wide variety of specialized cuisine restaurants. I then one-hot encoded the restaurant categories and listed the 10 most common restaurant types in each neighborhood.
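The one-hot encoding and "most common categories" steps can be sketched with pandas as below. The rows are invented, `top_categories` is a hypothetical helper, and the sketch keeps the top 2 instead of the top 10 only to stay small.

```python
import pandas as pd

# Toy venue list; the neighborhood names appear in the text, the rows are invented.
venues = pd.DataFrame(
    {
        "Neighborhood": ["Kitsilano", "Kitsilano", "Kitsilano", "Marpole", "Marpole"],
        "Category": [
            "Sushi Restaurant",
            "Sushi Restaurant",
            "Indian Restaurant",
            "Vietnamese Restaurant",
            "Sushi Restaurant",
        ],
    }
)

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(venues["Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of each indicator = share of that category within the neighborhood.
freq = onehot.groupby("Neighborhood").mean()


def top_categories(row, n=2):
    """Names of the n most frequent categories in one neighborhood row."""
    return row.sort_values(ascending=False).index[:n].tolist()


top = {name: top_categories(row) for name, row in freq.iterrows()}
print(top)
```

The per-neighborhood frequency table (`freq` here) is also the natural input for the clustering step, since each row describes a neighborhood by its mix of categories.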

Furthermore, I ran an unsupervised machine learning algorithm: I clustered the neighborhoods into 5 groups using k-means clustering and labelled the clusters according to the cuisines their neighborhoods have in common. I also tried k values of 3 and 6, but settled on k=5, which I considered the most general choice for this kind of clustering. These clusters are analyzed further to answer the research questions and are discussed in the subsequent sections.

From the Foursquare data, I found 36 unique categories among the restaurant venues in Vancouver. The 10 most frequently occurring restaurant categories are shown in figure 4.

Sushi restaurants top the list of the most frequently occurring restaurants in Vancouver. As my project focuses on Vegetarian/Vegan, Indian, and Italian restaurants, I was pleased to see these cuisines in the top 10 list. While this is interesting, I would like to dig deeper to learn which neighborhoods have the highest numbers of these cuisines.

Out of a total of 168 restaurant venues, Coal Harbour is the most popular neighborhood for restaurants, followed by West End, as seen in figure 5. Riley Park, Yaletown, and Mount Pleasant are almost equally popular, with 19, 18, and 16 restaurant venues, respectively. Kerrisdale has the fewest restaurants.

I sorted the list to display each neighborhood and its 10 most common restaurant venues, as shown in figure 6, which is then used for clustering. Coal Harbour topped the chart, with Japanese cuisine and seafood among its top 5 most common venues, supporting the claim that Vancouver is known for sushi and seafood. Other neighborhoods show a variety of other common cuisines. I used the k-means clustering algorithm to cluster all neighborhoods in the city.

As the table in figure 6 shows, the neighborhoods of Vancouver and their most common venues are assigned five cluster labels, from 0 to 4. These clusters are analyzed further and labelled according to the most common cuisine in their neighborhoods.

Marpole food cluster

Marpole is diverse in population density and residential areas because of its proximity to the airport. This cluster (in red) has most of its venues in Vietnamese and Korean cuisines. There are also some restaurants serving Italian and Indian cuisine.

Downtown food cluster

Cluster 2 (in purple) includes 7 neighborhoods, mostly in downtown and the surrounding areas. While Vietnamese and Japanese restaurants are among the most common venues, almost all of these neighborhoods have Vegetarian/Vegan or Indian restaurants, except Coal Harbour. Coal Harbour, as stated earlier, is more for sushi and seafood lovers.

Mixed food cluster

Cluster 3 (in blue), cluster 4 (in green), and cluster 5 (in orange) include neighborhoods with residential dwellings: houses, apartments, and single- and multi-family residences. Looking closely at the most common venues in these neighborhoods, fast food restaurants and cuisines from Asian countries top the chart, so this group could also be called the ‘Fast food cluster’.

Coal Harbour has the highest number of restaurants with unique categories. This makes sense, as it is the hub of Vancouver and close to downtown. There are not enough Vegetarian/Vegan-friendly options in Coal Harbour; instead, one can find many restaurants specialising in sushi and seafood, the food Vancouver is best known for. The areas close to downtown, including Yaletown, Kensington-Cedar Cottage, Mount Pleasant, and Riley Park, have a variety of restaurants specialising in Indian and Mediterranean cuisines. The residential areas fall into the mixed food cluster owing to the diversity of their population and their multiculturalism. Kerrisdale is the one exception, a neighborhood that stands out for Spanish cuisine. Vegetarian/Vegan restaurants are more common in Kensington-Cedar Cottage, Kitsilano, Riley Park, and Yaletown. As most immigrants would prefer to live in the residential areas in clusters 3, 4, or 5, I would recommend that the start-up develop ways to deliver Vegetarian/Vegan food to the neighborhoods in the mixed food cluster.

In this project, titled ‘The Battle of Neighborhoods’, my aim was to explore, segment, and cluster the neighborhoods of Vancouver. Vancouver is known as the “sushi capital of North America” and is famous for its sushi rolls. In recent years, Vegetarian/Vegan food has grown in popularity relative to meat-based and seafood dishes. In this project, I used the k-means clustering algorithm to group neighborhoods by their most common venues and was able to identify the neighborhoods with the most Vegan-friendly restaurants. The start-up can use this analysis to develop business ideas, with delivery options that target new immigrants looking for a specific cuisine.

Acknowledgement

I am grateful for the several cheat sheets on GitHub, Stack Overflow, and Medium that helped me work through this project and write the full report. I would also like to acknowledge this article for inspiring me to design my business problem. The IBM Data Science Professional Certificate course has supported me immensely in my journey of learning data science. For more details on this project, please see the GitHub repository. Feel free to contact me if you have any questions or comments. I look forward to talking to you about data science!

Capstone_the Battle Of The Neighborhoods

Capstone Project - The Battle of the Neighborhoods (Week 2)

Applied Data Science Capstone by IBM/Coursera

Table of contents: Introduction: Business Problem, Methodology, Results and Discussion

In this project we will try to find an optimal location for a breakfast spot. Specifically, this report is targeted at stakeholders interested in opening a breakfast spot near Richmond Circle in Bangalore.

Since there are lots of breakfast spots and eating joints near Richmond Circle, we will try to detect locations that are not already crowded with breakfast spots. We are also particularly interested in areas with no breakfast spots in the vicinity. We would also prefer locations as close to Richmond Circle as possible, assuming that the first two conditions are met.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

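Based on the definition of our problem, the factors that will influence our decision are the number of existing breakfast spots in a neighborhood, whether any breakfast spot lies in its vicinity, and the distance of the neighborhood from Richmond Circle.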

We decided to use a regularly spaced grid of circular locations, centered around Richmond Circle, to define our neighborhoods.

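The following data sources will be needed to extract or generate the required information: candidate area centers generated algorithmically, the Google Maps geocoding API for the coordinates of Richmond Circle and the approximate addresses of the candidate centers, and the Foursquare API for the breakfast spots in each neighborhood.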

Neighborhood Candidates

Let’s create latitude and longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approx. 12x12 kilometers centered around Richmond Circle, Bangalore.

Let’s first find the latitude and longitude of Richmond Circle, using a specific, well-known address and the Google Maps geocoding API.

Now let’s create a grid of equally spaced area candidates, centered around the city center and within ~3km of Richmond Circle. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart. We chose a 3 km vicinity because of the limit on the number of calls that can be made to the Foursquare API for the generated neighborhoods.

To accurately calculate distances we need to create our grid of locations in a Cartesian 2D coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we’ll project those coordinates back to latitude/longitude degrees to be shown on a Folium map. So let’s create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).
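The idea of working in meters and projecting back to degrees can be illustrated without any GIS library. Note that the sketch below does not implement UTM: it swaps in a simpler local equirectangular approximation, which is only reasonable over a few kilometers around a chosen origin. The function names and the origin coordinates (roughly central Bengaluru) are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def lonlat_to_xy(lon, lat, lon0, lat0):
    """Project degrees to meters on a plane centered at (lon0, lat0)."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def xy_to_lonlat(x, y, lon0, lat0):
    """Inverse of lonlat_to_xy: meters on the local plane back to degrees."""
    lat = lat0 + math.degrees(y / EARTH_RADIUS_M)
    lon = lon0 + math.degrees(x / (EARTH_RADIUS_M * math.cos(math.radians(lat0))))
    return lon, lat


# Approximate coordinates for central Bengaluru, used only as an origin.
LON0, LAT0 = 77.59, 12.97

# Round trip: degrees -> meters -> degrees.
x, y = lonlat_to_xy(77.60, 12.98, LON0, LAT0)
lon, lat = xy_to_lonlat(x, y, LON0, LAT0)
print(round(x), round(y))
```

For real UTM coordinates, a projection library would be the usual choice; the approximation here is meant only to show why distances are computed in a planar system.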

Let’s create a hexagonal grid of cells: we offset every other row, and adjust the vertical row spacing so that every cell center is equally distant from all its neighbors.
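A hexagonal grid of this kind can be generated in planar coordinates as sketched below. `hex_grid` is a hypothetical helper written for this illustration; it assumes the center and all distances are already expressed in meters.

```python
import math


def hex_grid(center_x, center_y, spacing, max_radius):
    """Centers of a hexagonal grid within max_radius of the center (meters)."""
    row_height = spacing * math.sqrt(3) / 2  # vertical distance between rows
    n = int(max_radius // row_height) + 1
    points = []
    for row in range(-n, n + 1):
        y = center_y + row * row_height
        # Shift every other row by half a cell to get the hexagonal layout.
        offset = spacing / 2 if row % 2 else 0.0
        for col in range(-n - 1, n + 2):
            x = center_x + col * spacing + offset
            if math.hypot(x - center_x, y - center_y) <= max_radius:
                points.append((x, y))
    return points


# 600 m spacing and a 3 km limit, as in the text.
grid = hex_grid(0.0, 0.0, spacing=600.0, max_radius=3000.0)
print(len(grid))
```

Scaling the row spacing by sqrt(3)/2 is what makes each interior cell center exactly `spacing` meters from its six nearest neighbors.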

Let’s visualize the data we have so far: city center location and candidate neighborhood centers:

OK, we now have the coordinates of the centers of the neighborhoods/areas to be evaluated, equally spaced (the distance from every point to its neighbors is exactly the same) and within ~3km of Richmond Circle.

Let’s now use the Google Maps API to get approximate addresses of those locations.

Looking good. Let’s now place all this into a Pandas dataframe.

…and let’s now save/persist this data to a local file.

Now that we have our location candidates, let’s use the Foursquare API to get info on breakfast spots in each neighborhood.

We’re interested in venues in the ‘nightlife’ category, the ‘shop and services’ category, and the ‘breakfast spot’ category, so we will include in our list only venues that have these in their category name. Knowing the number of spots in each of these categories will give stakeholders more confidence in the choice of business they want to start.

Foursquare credentials are defined in the hidden cell below.

Let’s now see all the collected breakfast spots in our area of interest on map.

So now we have all the breakfast spots in the area within a few kilometers of Richmond Circle, and we know their number! We also know exactly which breakfast spots are in the vicinity of every neighborhood candidate center.

This concludes the data gathering phase - we’re now ready to use this data for analysis to produce the report on optimal locations for a breakfast spot/point near Richmond Circle.

In this project we will direct our efforts toward detecting areas of Bengaluru that have low breakfast spot density. We will limit our analysis to the area ~3km around Richmond Circle.

In the first step we collected the required data: the location and type (category) of every restaurant within 3km of Richmond Circle (Under the Fly Over). We also identified breakfast spots (according to Foursquare categorization).

The second step in our analysis will be the calculation and exploration of ‘breakfast spot density’ across different areas of Bengaluru. We will use heatmaps to identify a few promising areas close to the center with a low number of breakfast spots in general (and few breakfast spots in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and within those create clusters of locations that meet some basic requirements established in discussion with stakeholders: we will take into consideration locations with no more than two breakfast spots within a radius of 250 meters, and we want locations without a breakfast spot within a radius of 400 meters. We will present a map of all such locations but also create clusters (using k-means clustering) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final ‘street level’ exploration and search for the optimal venue location by stakeholders.

Let’s perform some basic exploratory data analysis and derive some additional info from our raw data. First let’s count the number of breakfast spots, nightlife spots, and shop and services spots in every area candidate:

OK, now let’s calculate the distance to the nearest breakfast spot from every area candidate center (not only those within 300m - we want the distance to the closest one, regardless of how distant it is).
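The nearest-spot distance can be computed directly once everything is in planar coordinates. The sketch below uses hypothetical (x, y) positions in meters; `nearest_distance` is a helper name chosen for this example.

```python
import math


def nearest_distance(point, spots):
    """Distance in meters from point to the closest spot (planar coordinates)."""
    return min(math.dist(point, spot) for spot in spots)


# Hypothetical planar coordinates in meters (x, y).
candidates = [(0.0, 0.0), (600.0, 0.0), (300.0, 520.0)]
breakfast_spots = [(100.0, 0.0), (900.0, 400.0)]

distances = [nearest_distance(c, breakfast_spots) for c in candidates]
mean_distance = sum(distances) / len(distances)
print([round(d) for d in distances], round(mean_distance))
```

This brute-force scan compares every candidate with every spot, which is fine for a few hundred points; a spatial index would be worth considering for much larger sets.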

OK, so on average a breakfast spot can be found within ~600m of every area candidate center. That’s fairly close, so we need to filter our areas carefully!

Let’s create a map showing a heatmap / density of breakfast spots and try to extract some meaningful info from that. Also, let’s show the borders of the Richmond boroughs on our map and a few circles indicating distances of 1km, 2km and 3km from Richmond Circle.

Looks like a few pockets of low nightlife spot density closest to the city center can be found south, south-east and west of Richmond Circle.

Let’s create another heatmap showing the density of breakfast spots only.

This map is ‘hot’ (breakfast spots are quite densely distributed near Richmond Circle). It also indicates a higher density of existing breakfast spots directly north and west of Richmond Circle, with the closest pockets of low breakfast spot density positioned east, south-east and south of Richmond Circle.

Based on this we will now focus our analysis on areas south-west, south, south-east and east of Richmond Circle. We will move the center of our area of interest and reduce its size to a radius of 2.5km. This places our location candidates mostly in boroughs with large areas of low restaurant density south and south-west of Richmond Circle (however, this borough is less interesting to stakeholders, as it’s mostly residential and less popular with tourists).

Not bad - this nicely covers all the pockets of low breakfast spot density in nearby boroughs.

Let’s also create a new, denser grid of location candidates restricted to our new region of interest (let’s make our location candidates 100m apart).

OK. Now let’s calculate the two most important things for each location candidate: the number of breakfast spots in the vicinity (we’ll use a radius of 250 meters) and the distance to the closest breakfast spot.

OK. Let us now filter those locations: we’re interested only in locations with no more than two breakfast spots within a radius of 250 meters, and no breakfast spot within a radius of 400 meters.
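The two thresholds can be applied as in the sketch below, which uses hypothetical planar coordinates in meters and helper names invented for this example. Taken literally, the 400 m condition already implies the 250 m one when both refer to breakfast spots; both checks are kept here only to mirror the text.

```python
import math


def count_within(point, spots, radius):
    """Number of spots no farther than radius meters from point."""
    return sum(1 for spot in spots if math.dist(point, spot) <= radius)


def is_good_location(point, spots, max_nearby=2, near_radius=250.0, clear_radius=400.0):
    """Apply both thresholds from the text to one candidate location."""
    few_nearby = count_within(point, spots, near_radius) <= max_nearby
    none_close = count_within(point, spots, clear_radius) == 0
    return few_nearby and none_close


# Hypothetical planar coordinates in meters.
breakfast_spots = [(0.0, 0.0), (150.0, 0.0), (1000.0, 1000.0)]
candidates = [(100.0, 0.0), (500.0, 0.0), (600.0, 600.0), (2000.0, 0.0)]

# Keep only the candidates that satisfy both conditions.
good = [c for c in candidates if is_good_location(c, breakfast_spots)]
print(good)
```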

Let us see how this looks on a map.

Looking good. We now have a bunch of locations fairly close to Richmond Circle (mostly in the south and south-east of the Richmond borough), and we know that each of those locations has no more than two breakfast spots within a radius of 250m, and no breakfast spot closer than 400m. Any of those locations is a potential candidate for a new breakfast spot, at least based on nearby competition.

Let’s now show those good locations in the form of a heatmap:

Looking good. What we have now is a clear indication of zones with a low number of breakfast spots in the vicinity, and no breakfast spot at all nearby.

Let us now cluster those locations to create centers of zones containing good locations. Those zones, their centers and addresses will be the final result of our analysis.
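Turning the filtered locations into zones can be sketched with scikit-learn's k-means, using the fitted cluster centers as the zone centers. The points below are synthetic, and three zones are used only to keep the example small; the report ends up with 15.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic "good" locations in meters: three made-up pockets of candidates.
rng = np.random.default_rng(42)
pockets = np.array([[0.0, 0.0], [2000.0, 0.0], [0.0, 2000.0]])
good_xy = np.vstack([p + rng.normal(scale=100.0, size=(40, 2)) for p in pockets])

# The number of zones is a free choice made by the analyst.
n_zones = 3
kmeans = KMeans(n_clusters=n_zones, n_init=10, random_state=0).fit(good_xy)

zone_centers = kmeans.cluster_centers_  # one (x, y) center per zone
zone_labels = kmeans.labels_            # zone index for every location
print(np.round(zone_centers))
```

The centers are in the same planar units as the input, so they would still need to be projected back to latitude/longitude before reverse geocoding.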

Not bad - our clusters represent groupings of most of the candidate locations and cluster centers are placed nicely in the middle of the zones ‘rich’ with location candidates.

Addresses of those cluster centers will be a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.

Let’s see those zones on a city map without heatmap, using shaded areas to indicate our clusters:

Let’s zoom in on candidate areas in Richmond Town:

…and candidate areas near Richmond Circle:

Finally, let’s reverse geocode those candidate area centers to get the addresses which can be presented to stakeholders.

This concludes our analysis. We have created 15 addresses representing the centers of zones containing locations with a low number of breakfast spots and no breakfast spot nearby, all zones being fairly close to the city center (all less than 4km from Richmond Circle, and about half of those less than 2km from Richmond Circle). Although the zones are shown on the map with a radius of ~500 meters (green circles), their shape is actually very irregular, and their centers/addresses should be considered only as a starting point for exploring area neighborhoods in search of potential restaurant locations. Most of the zones are located around nearby boroughs, which we have identified as interesting due to being popular with tourists, fairly close to the city center and well connected by public transport.

Our analysis shows that although there is a great number of breakfast spots in Bengaluru, there are pockets of low breakfast spot density fairly close to Richmond Circle. The highest concentration of breakfast spots was detected north and east of Richmond Circle, so we focused our attention on areas south, south-west and west, corresponding to the nearby boroughs.

After directing our attention to this narrower area of interest (covering approx. 5x5km south-east of Richmond Circle) we first created a dense grid of location candidates (spaced 100m apart); those locations were then filtered so that those with more than two breakfast spots within a radius of 250m and those with a breakfast spot closer than 400m were removed.

Those location candidates were then clustered to create zones of interest which contain the greatest number of location candidates. Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.

The result of all this is 15 zones containing the largest number of potential new breakfast spot locations, based on the number of and distance to existing venues. This, of course, does not imply that those zones are actually optimal locations for a new breakfast spot! The purpose of this analysis was only to provide info on areas close to Richmond Circle but not crowded with existing breakfast spots. It is entirely possible that there is a very good reason for the small number of breakfast spots in any of those areas, reasons which would make them unsuitable for a new restaurant regardless of the lack of competition in the area. Recommended zones should therefore be considered only as a starting point for more detailed analysis, which could eventually result in a location which has not only no nearby competition but also other factors taken into account and all other relevant conditions met.

The purpose of this project was to identify Bengaluru areas close to the center with a low number of breakfast spots, in order to aid stakeholders in narrowing down the search for an optimal location for a new breakfast spot. By calculating the restaurant density distribution from Foursquare data, we first identified general boroughs that justify further analysis, and then generated an extensive collection of locations which satisfy some basic requirements regarding existing nearby breakfast spots. Clustering of those locations was then performed in order to create major zones of interest (containing the greatest number of potential locations), and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

The final decision on the optimal breakfast spot location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood, etc.

Vishal Chauhan

Oct 6, 2019

The Battle of Neighborhoods: Coursera Capstone Project

Opening a New Authentic Indian Restaurant in Queens, NY

1. Discussion and Background of the Business Problem:

Indian Restaurants

Introduction Section

This final project explores the best locations for Indian restaurants throughout the borough of Queens in New York City. New York is a major metropolitan area with more than 8.4 million (Quick Facts, 2018) people living within city limits. It is the largest city in the United States, with a long history of international immigration from many parts of the world. According to the 2007 American Community Survey estimates, New York City is home to approximately 315,000 people from the Indian subcontinent. With its diverse culture come diverse food items. There are many restaurants in New York City, each belonging to a different category such as Chinese, Indian, French, etc.

Target Audience

Data Section

For this project we need the following data: 1. New York City data that contains boroughs and neighborhoods along with their latitudes and longitudes

2. Indian restaurants in the Queens neighborhoods of New York City.

Problem Statement

2. Data Preparation:

I will use New York City data for this project.

After further analysis we will get data with coordinates in a data frame:

We will use geopy library for getting coordinates of Queens, NY for further use:

Using Foursquare Location Data:

Foursquare data is very comprehensive and it powers location data for Apple, Uber, etc. For this business problem I have used, as a part of the assignment, the Foursquare API to retrieve information about the Venue, Venue category with their longitudes and latitudes. The call returns a JSON file and we need to turn that into a data-frame. Here I’ve chosen 100 popular spots for each neighborhood with a radius of 500 meters. Below is the data-frame obtained from the JSON file that was returned by Foursquare —
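As an illustration of turning the returned JSON into a data frame, here is a minimal sketch. It assumes the response follows the nested layout of the legacy Foursquare "explore" endpoint (response, groups, items, venue); the sample payload, the venue names, the column labels and the helper name `venues_to_frame` are made up for the example.

```python
import pandas as pd

# Made-up sample shaped like the assumed legacy "explore" response layout.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Example Curry House",
                               "location": {"lat": 40.7282, "lng": -73.7949},
                               "categories": [{"name": "Indian Restaurant"}]}},
                    {"venue": {"name": "Example Cafe",
                               "location": {"lat": 40.7290, "lng": -73.7960},
                               "categories": [{"name": "Cafe"}]}},
                ]
            }
        ]
    }
}

def venues_to_frame(payload, neighborhood):
    """Flatten the nested venue items into one row per venue."""
    items = payload["response"]["groups"][0]["items"]
    rows = [
        {
            "Neighborhood": neighborhood,
            "Venue": item["venue"]["name"],
            "Venue Latitude": item["venue"]["location"]["lat"],
            "Venue Longitude": item["venue"]["location"]["lng"],
            "Venue Category": item["venue"]["categories"][0]["name"],
        }
        for item in items
    ]
    return pd.DataFrame(rows)

venues_df = venues_to_frame(sample_response, "Example Neighborhood")
```

Calling the helper once per neighborhood and concatenating the results with `pd.concat` would give a single table of venues for the whole borough.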

3. Exploratory Data Analysis:

There are 271 unique categories, of which Indian Restaurant is one. We will apply one-hot encoding to get dummy variables for the venue category, so that we can calculate the mean of every venue category by neighborhood.

After this we will extract only the Neighborhood and Indian Restaurant column for further analysis:
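The encoding and averaging steps can be sketched as below. The tiny table and its neighborhood names are placeholders; only the technique (one-hot encoding, then a per-neighborhood mean, then keeping a single category column) follows the text.

```python
import pandas as pd

# Placeholder venue table: one row per venue.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B"],
    "Venue Category": ["Indian Restaurant", "Cafe", "Cafe", "Park"],
})

# One-hot encode the category, then re-attach the neighborhood label.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of each dummy column is the share of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()

# Keep only the column of interest for the clustering step.
indian = grouped[["Neighborhood", "Indian Restaurant"]]
```

Each value in the `Indian Restaurant` column is then the fraction of a neighborhood's venues that are Indian restaurants, which is the single feature the clustering works on.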

Clustering the Neighborhoods:

We will extract Indian restaurant data from the above table and fit this into the code for finding the best value of K.

From the above image, we see that the best value of K will be 3 according to the Elbow method.
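The elbow check can be reproduced in outline as follows. This is a sketch on synthetic frequencies, not on the Queens data: the three groups of values are invented so that the bend in the curve is easy to see, and scikit-learn's `KMeans` is assumed as the clustering implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic "Indian Restaurant" frequencies for 60 imaginary neighborhoods.
freq = np.concatenate([
    rng.normal(0.00, 0.003, 40),   # almost none
    rng.normal(0.05, 0.003, 15),   # medium density
    rng.normal(0.12, 0.003, 5),    # high density
]).clip(min=0).reshape(-1, 1)

# Inertia (within-cluster sum of squares) for each candidate K.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(freq)
    inertias[k] = model.inertia_
```

Plotting `inertias` against K and looking for the point where the curve stops dropping sharply gives the elbow. The choice remains a visual judgment rather than a strict rule.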

We will merge the above table with our New York data frame so that we will get coordinates of all neighborhoods

We can see these 3 clusters in the Map using Folium Library.

Let’s Examine the Clusters:

Here, we have 3 clusters 0,1 and 2 respectively. In cluster 0 we have neighborhoods that have the least number of Indian Restaurants.

Cluster 0 has Red color on the map.

In cluster 1: We have all neighborhoods which have highly dense Indian Restaurants. In this dataset, we have only one neighborhood. Cluster 1 has a purple color on the map.

In cluster 2: We have all neighborhoods which have medium dense Indian Restaurants. Cluster 2 has a light green color on the map.

Visualization:

There are 5 boroughs in New York City in which Queens has the highest number of neighborhoods.

After that, we will see which neighborhood has the highest number of Indian restaurants.

In the above image, we see that Bayside has the highest number of Indian restaurants.

The results of the exploratory data analysis and clustering are summarized below:

According to the analysis, South Ozone Park will provide the least competition for an upcoming Indian restaurant. The international airport is close to this neighborhood, which makes it a convenient place for Indian immigrants to have lunch or dinner, and the frequency of Indian restaurants there is very low compared to other neighborhoods. Bayside has the highest number of Indian restaurants and Jamaica Estates is highly dense, so we will not open there. The analysis has some drawbacks. The clustering is based entirely on the data provided by the Foursquare API. Land price, the distance of venues from the closest station, and the number of potential customers could all play a major role, so this analysis is far from conclusive. However, it gives us some important preliminary information on the possibilities of opening restaurants in the Queens borough of New York City. Another pitfall of this analysis is that it considers only one borough of New York City; taking into account all the areas under the 5 major boroughs would give us an even more realistic picture. Furthermore, these results could potentially vary if we use other clustering techniques such as DBSCAN.

Finally, to conclude this project, we have had a small glimpse of what a real-life data science project looks like. I have used some frequently used Python libraries to handle JSON files, plot graphs, and perform other exploratory data analysis, and used the Foursquare API to explore the major boroughs of New York City and their neighborhoods. The potential for this kind of analysis in a real-life business problem has been discussed, along with some of the drawbacks and chances for improvement that would give a more realistic picture. As a final note, all of the above analysis depends on the adequacy and accuracy of Foursquare data. A more comprehensive analysis and future work would need to incorporate data from other external databases.

cahyati sangaji (cahya)

Jun 1, 2020

Capstone Project — The Battle of Neighborhoods (Week 1)

A Visual Approach to determine Strategic Locations for Masks and Medical Devices Distribution for COVID-19 treatment based on confirmed cases on May 28, 2020 at red zone areas to measure “new normal” readiness

Cahyati S. Sangaji

Applied Data Science Capstone by IBM/Coursera

Table of contents

Introduction: business problem, methodology, results and discussion.

Since the beginning of 2020, Jakarta and many other cities around the world have been under attack by an invisible army called ‘Novel Corona Virus’, also known as ‘Covid-19’. Every effort has been focused on solving or minimizing the resulting problems, including efforts by data scientists. Data scientists have assessed the situation in places around the world, such as the availability, amount, and geographical distribution (i.e. locations) of health infrastructure, such as virus testing centers and hospitals authorized to treat affected patients. In this article, we would like to present a simple analysis for determining strategic locations for the distribution of masks and medical devices for COVID-19 treatment, based on confirmed cases on May 28, 2020, and the red zone areas for “new normal” condition analysis.

A few Identified factors that influence our decision are:

The following data sources are needed to extract/generate the required information:

Let’s start the Project by importing necessary Python libraries.

Import necessary libraries

Make sure that we have created a Foursquare developer account and have our credentials handy.

Read and show all data used.

Read and show data Covid-19 cases per district.

Read and show the top 5 data rows from Covid-19 cases per district.

Read and show the bottom 5 data rows from Covid-19 cases per district

Read and show the total population data in DKI Jakarta 2020.

Total population in Jakarta.

Read and show the top 5 data rows from total population in DKI Jakarta, 2020.

Read and show the data from the 10 most populated districts in DKI Jakarta, 2020.

Read and show the top 5 data rows from 10 most populated areas in DKI Jakarta, 2020 per district.

According to the information update from Kompas.com (megapolitan.kompas.com), the following hospitals are the existing reference hospitals for Covid-19 testing in Jakarta area:

Construct a Pandas data frame for subsequent data analysis.

Read and show Hospital data that provide treatment Covid-19.

Read and show the top 5 data rows from Hospital data providing treatment Covid-19.

This sums up our data mining and data exploration section. In the following METHODOLOGY section, we will describe the process of how to do a ‘Visual’ approach to better understand our data using data science and data analytics tool kits.

First, we create a new dataset of only positive cases from the Covid-19 Case table on May 28, 2020.

Remove / drop irrelevant columns for this analysis.

Check if there are any missing or null values.
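These three preparation steps could look roughly like this in pandas. The column names and the numbers in the small table are placeholders, since the original case table is not reproduced here.

```python
import pandas as pd

# Placeholder case table; the real data has one row per district.
cases = pd.DataFrame({
    "district": ["District A", "District B", "District C"],
    "positive": [12, 0, 7],
    "recovered": [3, 1, 2],
    "deceased": [1, 0, 0],
})

# Keep only the positive-case figures by dropping the columns
# that are irrelevant for this analysis.
positive_cases = cases.drop(columns=["recovered", "deceased"])

# Count missing or null values per remaining column.
missing_per_column = positive_cases.isnull().sum()
```

A non-zero entry in `missing_per_column` would indicate districts that need to be fixed or excluded before mapping.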

From all these processes (data mining, preparation, and exploration), the total number of Covid-19 confirmed positive cases in Jakarta is 5,061 as of 28 May 2020, distributed across 6 main municipalities or cities in Jakarta and across 268 districts (or ‘Kelurahan’), out of just over 92.736 population of Jakarta.

East Jakarta (Jakarta Timur) has the highest number of total POSITIVE cases with 1162 confirmed positives. Just like any other city, each city/municipality has many neighborhoods that can be used to pinpoint the location of the new proposed Covid-19 testing center along with further analysis of the neighborhood using FourSquare API and Folium map visualization technique.

Need to get Latitude & Longitude of Jakarta city and the districts

To assist in the analysis, we will use the “free services” provided by OpenCage Geocoder (https://opencagedata.com/) to get the latitude and longitude of cities, districts, particular venues, or neighborhoods. We will start by opening an account and downloading the required dependencies for our analysis. Terms and conditions apply. Please refer to their website for further details.

Similarly, we can use the API service from OpenCage Geocoder to obtain the latitude and longitude of all districts in Jakarta.

Get the latitude and longitude Hospital

We also need to get the latitude and longitude of all Covid-19 testing centers in Jakarta that we identified from the source www.kompas.com.

We then need to know how to get a map of the city that we are interested in (i.e. Jakarta) to present our data to the stakeholders using a ‘Visualization’ approach.

We have downloaded all the required dependencies earlier in the report, and now we are ready to use the FOLIUM API service as described in the following section.

The map shows the main outer ring roads surrounding the city of Jakarta. It does NOT, however, show the official territorial boundary of the city concerning other administrative regions in the east, west, and south of Jakarta.

However, because the author is from Indonesia, we know roughly which neighborhoods belong to Jakarta and which do not. In this scenario, we want to propose strategic locations (i.e. neighborhoods) for the investing group within the Jakarta governmental area.

The chart below shows the population density in Jakarta.

The chart below shows the population density in Jakarta, per district.

Based on the graph results, the area that needs mask distribution the most is Central Jakarta (Jakarta Pusat), which has the most populated districts. The 5 districts that most need mask distribution are Kali Anyar, Kampung Rawa, Galur, Tanah Tinggi, and Kerendang.

To better understand and estimate the territories or areas that are within the administrative government of Jakarta city, we need to plot all the districts that we have downloaded from the riwayat-file-covid-19-dki-jakarta-jakartagis.hub.arcgis.com site together with their latitude and longitude values. The following lines of Python code will execute the task using Folium API.

As we can see from the above map, most of the districts are within the main outer ring roads surrounding the city, and others are situated outside the main ring roads. To solve our business challenge, we need to show the extent and distribution of medical devices needed for treating COVID-19 positive patients within the city of Jakarta, based on the numbers that we obtained from the government site. The following lines of Python code will achieve the task and present the data in a clear visual approach.

This is similar to the map plot published by the government task force for Covid-19 cases in Jakarta. Their graph can be seen at this link: https://corona.jakarta.go.id/id/peta-persebaran. As we can see, most of the regions in Jakarta are now in the ‘RED’ zone, with the radius of each circle representing the relative extent of Covid-19 distribution in the City of Jakarta.

A better presentation of the data would be to use a ‘slider’ in the map that shows the growth of the circles day by day, or simply an animation that shows the daily growth of Covid-19 cases in the city. An app developer might develop an app that alerts vehicles/road users that they are not allowed to pass through the RED zone within the city. This app could save lives! The next set of problems that we need to solve is to show the locations of existing and approved Covid-19 testing centers (or reference hospitals) and see how well they are distributed relative to each other within the city and in which regions of Jakarta. The following lines of Python code show how. We will first try to plot the hospitals WITHOUT the RED circles, as those might cause distraction.

As you can see, the hospitals are quite sparsely distributed relative to each other, except that the two hospitals in the south are relatively close to each other (i.e. Fatmawati and Pasar Minggu hospitals). Let’s see how strategic they are in accommodating the extent of positive-case patients in the city. We can do this by overlaying the two datasets on a single map, as shown in the following code:

We can see from the distribution of COVID-19 cases and the locations of hospitals that almost all hospitals require a lot of medical equipment for COVID-19 treatment. The exceptions are Fatmawati hospital and Pasar Minggu hospital, where the distribution of COVID-19 cases is not as extensive as around the other hospitals.

We will try to analyze locations in the red zone based on the location of the hospital in the middle of the red zone. We determine based on the location of the Tarakan Hospital, Central Jakarta.

Let’s begin by trying to get the top 100 venues that are within Tarakan Hospital neighborhood and are within a radius of 500 meters of our candidate Covid-19 testing center using FOURSQUARE API. First, let’s create the GET request URL. Name that URL, url.

Get URL for the API in Tarakan Hospital neighborhood.
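A request URL of this kind can be assembled as in the sketch below. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters; the credentials are placeholders, the helper name `build_explore_url` is hypothetical, and the coordinates are only a stand-in point in central Jakarta, not the hospital's verified position.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20200528", radius=500, limit=100):
    """Build a GET URL asking for up to `limit` venues within `radius` meters."""
    params = {
        "client_id": client_id,          # placeholder credential
        "client_secret": client_secret,  # placeholder credential
        "v": version,                    # API version date (assumed value)
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Stand-in coordinates; replace with the geocoded hospital location.
url = build_explore_url(-6.17, 106.81, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
```

Building the query string with `urlencode` keeps the parameters correctly escaped, which is easier to get right than concatenating strings by hand.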

Next, let’s make a request using REQUEST library, and name our query results for Tarakan Hospital area, results.

Next, we will use the above function (get_category_type) to extract information from the JSON file related to venues in the Tarakan Hospital neighborhood. The following line of code should do the trick:

Based on the results generated by the FOURSQUARE API, we can locate the business site around Tarakan hospital and identify affected business locations in the red zone.

The next set of challenges that we need to tackle is to gain slightly more insight into (a profile of) the Tarakan hospital area. To simplify our analysis, we will just use a Euclidean (distance-based) clustering technique, which is an unsupervised machine learning technique. In particular, we will use K-means clustering.

To start, we need to decide the best K-value for our analysis. We will let the K-means clustering algorithm calculate this for us. The following lines of code will carry out the task.

The X-axis of the plot shows the various K-values that we can use for our clustering analysis. As we can see from the chart, the curve starts flattening out at K=3. Therefore, we will use K=3 to cluster the neighborhoods surrounding our proposed Covid-19 testing center. The following lines of code assign a cluster label to all venues that are within a 500-meter radius of our Covid-19 testing center in the Tarakan Hospital area:

To better visualize the clustering of our neighborhood, we will need to create a custom function that we call ‘regioncolors’ that will assign a color to each area within a 500-meter radius of our proposed facility. The following line of code should help us with this task.
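One possible shape for such a helper is sketched below. Only the name `regioncolors` comes from the text; the palette, the fallback color and the sample labels are assumptions made for illustration.

```python
def regioncolors(cluster_label):
    """Map a K-means cluster label (0, 1 or 2) to a marker color."""
    palette = {0: "red", 1: "blue", 2: "green"}  # assumed palette
    # Fall back to gray for any label outside the expected range.
    return palette.get(cluster_label, "gray")

# Hypothetical labels as they might come out of a K=3 clustering.
labels = [0, 2, 1, 0]
colors = [regioncolors(label) for label in labels]
```

The resulting color names can then be passed as the marker color when each venue is drawn on the map.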

At this stage, we have assigned cluster labels to all of our neighborhood venues, and we have assigned unique colors to each cluster. Next, we can then visualize our clustering analysis to a Folium map to see how all of these venues are geographically distributed within the 500-meter radius that we specified surrounding the proposed facility.

Then we compiled a map of the results of this business location with a map of the distribution of COVID-19 cases.

The result of analysis is the location of the business which is in the Tarakan hospital neighborhood and is within a radius of 500 meters. Then, we also get the most congested cluster if businesses apply normal conditions in the red zone, potentially increasing cases of contracting the COVID-19 virus within the area.

The project aims to inform local residents who need to be cautious about leaving the house, based on the distribution of COVID-19 cases in Jakarta. It also aims to identify the areas that most need mask distribution, according to population density in the area.

Further, it provides information on which hospitals need the most medical equipment for COVID-19 treatment, and possibly even additional medical personnel (doctors and nurses). It also provides information on the business neighborhoods which should implement the Covid-19 health protocol with strict discipline when the “new normal” comes.

This project helps mask sellers understand potential distribution areas according to population density in Jakarta. It also helps with the distribution of medical devices for corona care to hospitals that are estimated to have a large number of patients, and helps analyze which hospitals need additional medical personnel (doctors and nurses).

It will also provide awareness to help business owners who run businesses surrounding the adjacent clusters to be better informed, with the density of people within the business neighborhood.

DEV Community

Jade Tran

Posted on May 30, 2020

Capstone Project - The Battle of Neighborhoods

Applied Data Science Capstone by IBM/Coursera

1. Introduction: COFFEE ETHNIC IN DA NANG

In a large city as rich in coffee culture as Da Nang, Viet Nam, starting a coffee business is competitive. In this case my client is a humble Vietnamese man who has contacted me to give advice and draw up essential lines of business prediction and back-up plans (in this part we will only discuss predicting hot spots).

2. Orientation

First of all, we need to collect data on all coffee shops in Da Nang, including their name, id, and location (address, latitude, longitude), and then pick the "hot" neighborhood where most of the venues are located. To acquire the data we use FourSquare, and we apply folium to visualize a particular neighborhood in which we will observe customer "traffic" and predict an appropriate location for a new coffee shop in town. You will find its temporary name on the folium map: "O Day Roi!" (meaning "Here It Is!" in Vietnamese).

3. Execution steps

We import all the tools we need.

Apply your credential ID on [FourSquare]

Get requests near Da Nang city.

Transform data into json then request geocode.

We start creating group including information which is recommended.

Creating items of objects coffee shop and their attributes - id, address, name, etc

From the output we can identify the factors that we will use later to consider the probability of launching our upcoming location.

Based on that we start to organize what we have got.

As we can see, many coffee shops have no address, so we need to use hasattr() to determine whether each object (coffee shop) has an attribute (address). In the next step we will execute a very important part: get the coordinates of Da Nang and create a folium map, which will help visualize what we have got from the data.
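The address check can be illustrated with a small sketch. The shop objects here are stand-ins built with `SimpleNamespace`, and the names, addresses and coordinates are invented; the helper `address_or_default` is hypothetical.

```python
from types import SimpleNamespace

# Stand-in coffee shop objects; the second one has no address attribute.
shops = [
    SimpleNamespace(id="1", name="Shop A", address="12 Example Street",
                    lat=16.06, lng=108.22),
    SimpleNamespace(id="2", name="Shop B", lat=16.07, lng=108.21),
]

def address_or_default(shop, default="(no address)"):
    """Return the shop's address if the attribute exists, else a default."""
    return shop.address if hasattr(shop, "address") else default

popup_labels = [f"{s.name}: {address_or_default(s)}" for s in shops]
```

The same result can be obtained with `getattr(shop, "address", default)`, which folds the check and the lookup into a single call.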

By spotting the clusters of items we can see which neighborhoods have a high density of coffee businesses.

4. Conclusion

We will need a location where we can catch customers from the "hot" location we have picked from the map, while staying at a certain distance so as to lessen the competition.

Here you can find the full notebook to try yourselves: Link to the notebook

Towards Data Science

Tony Xu

Jun 20, 2020

The Battle of the Neighborhoods — Open a Movie Theater in Montreal

The capstone project utilizing Folium and Foursquare APIs for the IBM Data Science Professional Certificate.

I followed the IBM Data Science Professional Certificate on Coursera, which is composed of 9 courses. The courses contain pretty good examples.

The final assignment is to finish a project called the Capstone project which requests you to leverage Foursquare APIs to fetch the data from API calls and utilize the folium map library to visualize data analysis. It’s quite a good opportunity to practice data science methodology and toolset in this project.

In this project, we will cover all phases in the data science life cycle to resolve a problem. And we will dive into the following tools/library in data science:

Ok, let’s get started.

Introduction: Business Problem

In this project, we are going to look for an optimal location to open a movie theater. Specifically, this report can provide a reference for stakeholders who are interested in opening a movie theater in Montreal, Quebec, Canada.

Montreal is the second-largest city in Canada and the largest city in the province of Quebec, located along the Saint Lawrence River at its junction with the Ottawa River. It sits on an island. In this report, we will focus on all areas on Montreal island. There are many movie theaters on Montreal island, and we will determine where the existing movie theaters are. Then we will use a clustering model to find similar areas on the island, considering demographic data of each borough and region. The preferred area shall be distant from existing movie theaters.

We will use data science tools to fetch the raw data, visualize it, and then generate a few of the most promising areas based on the above criteria. Meanwhile, we will also explain the advantages and traits of the candidates, so that stakeholders can make the final decision based on the analysis.

Based on the definition of our problem, factors that may impact our decision are:

We decided to use a regularly spaced grid of locations all around Montreal island to define our neighborhoods. Concretely, we will use the popular hexagonal honeycomb grid to define our neighborhoods.

In this project, we will fetch or extract data from the following data sources:

Montreal Island Shape File

To show the Montreal island boundary in the folium map, we need a geojson definition file for Montreal island. We downloaded this shapefile from the Carto website.

The file is in JSON format, containing boundary definition for every borough or municipality in Montreal island. We will visualize this geojson definition file with a folium map in the next step.

folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the leaflet.js library. Manipulate your data in Python, then visualize it in on a Leaflet map via folium . ¹

It’s not difficult to use folium; it requires just a few lines of code to show Montreal island with boundary data.

As the next step, we want to generate candidate cells on the map, more specifically only within Montreal island. It’s popular to use a honeycomb hexagon grid when dealing with map-related problems. Unlike circles, hexagons leave no gaps between them, which ensures that no area is missed. Furthermore, the distance between the centers of any two adjacent hexagons is the same.

Unfortunately, Folium doesn’t provide native support for drawing hexagons in the map view, so we have to write some code to support this feature.

We write a method to calculate the coordinates of a hexagon’s vertices, given the centroid coordinates and the side length.
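Such a method might look like the following sketch. It assumes planar (for example UTM) coordinates in meters and a pointy-top orientation. The function name and the sample center are illustrative, not the article's actual code.

```python
import math

def hexagon_vertices(cx, cy, side):
    """Six vertices of a regular pointy-top hexagon centered at (cx, cy).

    For a regular hexagon the distance from the center to each vertex
    equals the side length, so the vertices sit on a circle of radius `side`.
    """
    return [
        (cx + side * math.cos(math.radians(60 * i + 30)),
         cy + side * math.sin(math.radians(60 * i + 30)))
        for i in range(6)
    ]

# Illustrative center in planar coordinates, with a 300 m side.
vertices = hexagon_vertices(1000.0, 2000.0, 300.0)
```

To draw the cell on a folium map, the planar vertices would still have to be converted back to latitude and longitude, a step this sketch leaves out.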

After that, we generate a honeycomb hexagon grid throughout the island.
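The grid of hexagon centers can be generated along these lines. This sketch covers a plain bounding box in planar coordinates and omits the check that a center falls inside the island boundary; the function name and the box size are assumptions.

```python
import math

def honeycomb_centers(x_min, y_min, x_max, y_max, side):
    """Centers of pointy-top hexagons tiling a bounding box.

    Neighboring centers in a row are sqrt(3) * side apart, rows are
    1.5 * side apart, and every other row is shifted by half a step.
    """
    dx = math.sqrt(3) * side
    dy = 1.5 * side
    centers = []
    row = 0
    y = y_min
    while y <= y_max:
        x = x_min + (dx / 2 if row % 2 else 0.0)
        while x <= x_max:
            centers.append((x, y))
            x += dx
        y += dy
        row += 1
    return centers

# Illustrative 10 km x 10 km box with a 500 m hexagon side.
centers = honeycomb_centers(0.0, 0.0, 10_000.0, 10_000.0, 500.0)
```

With this layout every center is the same distance, sqrt(3) times the side, from each of its six neighbors, which is the property that makes the honeycomb attractive for distance-based features.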

Looks great! 😄

So far we have created a honeycomb grid on the island and generated the center coordinates for each hexagon. We will use the Google Geocoding API to reverse-geocode the corresponding addresses.

Google Geocoding API

The Google Geocoding API is a service that provides geocoding and reverse geocoding of addresses.²

A Google API key is required to use this set of APIs. It can be requested from the Google Developer Console.

Let’s put all the data in a Pandas DataFrame and show the first 10 items.

Each row contains the center address of a hexagon and the corresponding latitude and longitude in degrees, which are in the WGS84 spherical coordinate system. The X/Y columns are in the UTM Cartesian coordinate system, which uses common metric units (meters or kilometers).

Foursquare API

The Foursquare Places API offers real-time access to Foursquare’s global database of rich venue data and user content to power your location-based experiences in your app or website.³

Now that we have generated all the candidate neighborhoods on Montreal island, we will get information on all movie theaters using the Foursquare API.

From the Foursquare API documentation, we can find the corresponding movie theater category in Venue Categories. The ID of Movie Theater in the Foursquare API is 4bf58dd8d48988d17f941735, which is under the Arts & Entertainment main category. It contains several sub-categories:

Unlike coffee shops and restaurants, which are everywhere, there aren’t many movie theaters in the region. This makes sense, since we don’t expect a movie theater in every neighborhood.

Let’s fetch all the movie theaters on Montreal island first. To do so, we will fetch movie theaters data in each borough and municipality.

From the response of Foursquare APIs, there are a total of 44 movie theaters on Montreal island. Let’s plot it in a map view.

Let’s show it in heatmap using the positron style.

From heatmap, we can see the movie theaters are mainly concentrated in downtown areas and the center of the island. Usually, there are also a lot of shopping malls nearby, let’s pull out the shopping centers data on Montreal island using Foursquare APIs.

From Foursquare API documentation, there are several categories related to shopping malls or shopping centers.

We will fetch all shopping malls data in the above categories and show them on the map with movie theaters data.

From the map view, we can see that movie theaters are located near shopping malls in most cases.

Our target area shall have more shopping malls and fewer movie theaters nearby.

Before that, we need to cluster all the candidate hexagons based on certain information. In this project, we use census data as the major features for clustering.

Montreal Census information

Now we will fetch the census information of each borough or municipality on Montreal island. The latest data was collected in 2016. We can get it from the official Montreal city website.

It’s a pretty big Excel file containing a lot of data; I modified some sheets a bit to make it easier to extract data into a Pandas DataFrame.

We only focus on several basic census features: Population, Density, Age, Education and Income.

Next, we will show census data distribution on a choropleth map.

A Choropleth Map is a map composed of colored polygons. It is used to represent spatial variations of a quantity.⁴

We also show shopping centers and movie theaters’ locations on the same map.

From the above choropleth maps, we can see that movie theaters are mostly located in areas with a higher population. The same holds for shopping centers. Moreover, most movie theaters are located in areas with lower income. Regions with higher income have fewer shopping centers and movie theaters.

So far, we have retrieved all the necessary raw data and visualized it. In the following steps, we will manipulate these datasets, extract data, and generate new features for the machine learning algorithm. Finally, we will find the most suitable place to open a movie theater on Montreal island.

Methodology

The business purpose of this project is to find a suitable place on Montreal island to open a movie theater.

So far we have retrieved the following data:

We also generated a honeycomb grid of hexagons covering the whole of Montreal island.

Based on the above raw data, we will generate new features, e.g. census information for each candidate cell, and the number of movie theaters and shopping malls in the cell and nearby.

In the final step, we will focus on the most promising areas, those with more shopping malls and fewer movie theaters, and present the candidate hexagon cells in a map view so that stakeholders can make the final decision.

We have the basic census information of each borough and municipality, and we want the corresponding census information for each candidate hexagon cell. We calculate it from the boroughs and municipalities that intersect the cell.

If a hexagon lies completely inside one borough, we use that borough’s census values as the hexagon’s values. This means all hexagons inside one borough are treated the same for the census features.

Accordingly, if a hexagon overlaps two boroughs with 50% of its area in each, its census values are the average of the two boroughs’ values, weighted 50% each.

Based on this rule, we can calculate the census features for all hexagons.
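The weighting rule above can be sketched in a few lines of plain Python. This is only an illustration, not the notebook's code: the function name `weighted_census`, the borough names and all numbers are made up, and renormalising the overlap shares (for cells that partly fall outside every borough) is an assumption the article does not state.

```python
def weighted_census(overlaps, borough_census):
    """Blend borough census values by each borough's share of the hexagon.

    overlaps: {borough_name: fraction of the hexagon's area inside that borough}
    borough_census: {borough_name: {feature_name: value}}
    """
    total = sum(overlaps.values())
    if total == 0:
        raise ValueError("hexagon does not intersect any borough")
    features = {}
    for borough, share in overlaps.items():
        # Assumption: renormalise so the weights always sum to 1.
        weight = share / total
        for name, value in borough_census[borough].items():
            features[name] = features.get(name, 0.0) + weight * value
    return features


# Made-up values for two hypothetical boroughs.
census = {
    "Borough A": {"population": 100000, "median_income": 40000},
    "Borough B": {"population": 60000, "median_income": 60000},
}

inside_a = weighted_census({"Borough A": 1.0}, census)
half_half = weighted_census({"Borough A": 0.5, "Borough B": 0.5}, census)
print(inside_a)
print(half_half)
```

A hexagon fully inside one borough simply inherits that borough's values, while the half-and-half hexagon receives the midpoint of the two boroughs' values.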

Let’s merge this data frame with the previous location data frame to generate a new one, candidates_df, which contains the basic information for each hexagon. We print several rows of this data frame.

Looking good. Now we have census information for each hexagon area.

Next, we calculate the shopping mall and movie theater information for each hexagon area.

We will calculate the following features for shopping malls and movie theaters:

Now that we have prepared all the data we need, we can use the K-Means clustering algorithm to group similar candidate hexagon areas into clusters.

K-Means Clustering

We pick the census features, the number of shopping malls and the number of movie theaters as input features.

We first run an evaluation step to select the best K, the number of clusters in the algorithm.

We use two methods, the Sum of Squared Distance and the Silhouette Score, to evaluate the K-Means algorithm for different values of K.

The Sum of Squared Distance (SSE) measures the error between data points and their assigned clusters’ centroids. Smaller is better, but note that it always decreases as K grows.

The Silhouette Score compares, for each point, how close it is to the other points in its own cluster with how close it is to the points of the nearest neighboring cluster. It ranges from -1 to 1, and the larger the value, the better separated the clusters are for that K.

From the figure, we can see the Sum of Squared Distance going down as K becomes bigger. When K=2 or 3, the Silhouette Score is higher, but the SSE is still high. We choose K=10 for this project as a balance between the Sum of Squared Distance and the Silhouette Score. Let’s run the K-Means algorithm again with K=10.
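To make the two evaluation measures concrete, here is a minimal plain-Python sketch of both, applied to a fixed cluster assignment. It is not the notebook's code (which presumably relied on a library implementation); the helper names `sse` and `silhouette` and the toy points are illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def _group(points, labels):
    """Collect the points of each cluster under its label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def sse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in _group(points, labels).values():
        centroid = [sum(axis) / len(members) for axis in zip(*members)]
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient; expects at least two clusters."""
    clusters = _group(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two well-separated pairs of toy points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
two_clusters = [0, 0, 1, 1]
four_clusters = [0, 1, 2, 3]

print(sse(points, two_clusters), silhouette(points, two_clusters))
print(sse(points, four_clusters), silhouette(points, four_clusters))
```

Splitting the toy data into four single-point clusters drives the SSE to zero but also drops the silhouette to zero, which is why the SSE alone cannot be used to pick K.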

Let’s visualize the clustering results in the map view, with a different color for each cluster.

Let’s put everything together on one map view:

From the cluster plot in the above map view, we can see one cluster in light blue composed of 4 hexagons in downtown; this cluster is full of movie theaters and shopping malls.

The purple cluster contains areas with a lot of shopping malls. The light green cluster contains more shopping malls and movie theaters than any other cluster except the downtown one.

Let’s assign weights to the three movie theater features and combine them into one feature, and do the same for shopping malls. This makes sorting easier.

We calculate a weighted Mall Score and a weighted Cinema Score, then generate a new Score feature for sorting.

A higher final score means there are more shopping malls and fewer movie theaters.
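A minimal sketch of such a scoring step follows. The article does not give its weights or the exact way the two scores are combined, so everything here is an assumption for illustration: the ring names (`local`, `near`, `far`), the weights, the rule "Score = Mall Score minus Cinema Score", and the sample cells.

```python
# Hypothetical weights: venues closer to the cell count more.
WEIGHTS = {"local": 1.0, "near": 0.5, "far": 0.25}


def weighted_count(counts):
    """Combine the three per-distance counts into one weighted number."""
    return sum(WEIGHTS[ring] * counts[ring] for ring in WEIGHTS)


def score(cell):
    # Assumption: more malls raise the score, more cinemas lower it.
    return weighted_count(cell["malls"]) - weighted_count(cell["cinemas"])


# Made-up candidate cells.
cells = [
    {"id": "hex_b",
     "malls": {"local": 1, "near": 1, "far": 2},
     "cinemas": {"local": 1, "near": 2, "far": 2}},
    {"id": "hex_a",
     "malls": {"local": 2, "near": 3, "far": 4},
     "cinemas": {"local": 0, "near": 0, "far": 1}},
]

ranked = sorted(cells, key=score, reverse=True)
for cell in ranked:
    print(cell["id"], score(cell))
```

Sorting in descending order puts the cells with many malls and few cinemas first, which is the ordering used later to pick the top candidates.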

Cluster 7 has the highest score: it has more shopping malls and fewer movie theaters. Let’s explore more characteristics of Cluster 7.

There are 40 hexagons in Cluster 7, with an average of 0.77 Malls in local and 0.0 Cinemas in local. Let’s plot each feature for all clusters in a bar chart, using the matplotlib.pyplot library, for comparison. We highlight Cluster 7, which is our target cluster.

From the bar chart, we can see that Cluster 7 has the highest population and density among all the clusters. Furthermore, it has a fairly large number of shopping centers in the hexagon area or nearby and relatively few movie theaters.

Next, we sort all hexagons in Cluster 7 by Score in descending order and pick the first 5 hexagons. They are our first-choice locations for opening a movie theater.

According to the above statistics, each of them has 1 to 3 shopping malls in the cell and more shopping malls nearby, but no movie theater within 1 km. These look like good selections.

Let’s plot the Cluster 7 hexagons in the map view, gray out the other clusters and highlight our 5 choices.

This concludes our analysis. We have found the 5 most promising zones, with more shopping malls nearby and fewer movie theaters around the area. Each zone is a regular hexagon, a shape commonly used in map views. The zones in this cluster have the highest population and density compared with the other clusters.

Results and Discussion

We generated hexagon areas all over Montreal island and grouped them into 10 clusters according to census data, including population, density, age, education, and income. Shopping center information and existing movie theater information were also considered when running the clustering algorithm.

From data analysis and visualization, we can see that movie theaters are usually located near shopping malls, which inspired us to look for areas with more shopping malls and fewer movie theaters.

After running the K-Means clustering algorithm, we obtained the cluster with the most shopping malls nearby and the fewest movie theaters on average. We also examined the other characteristics of the cluster: it has the highest population and density, which suggests the highest traffic among all the clusters.

There are 40 hexagon areas in this cluster. We sorted them in descending order by their shopping mall and movie theater score, which favors cells with more shopping malls and fewer movie theaters in the cell or nearby.

We draw our conclusion with the 5 most promising hexagon areas satisfying all our conditions. These recommended zones should be a good starting point for further analysis. Other factors could also be taken into account, e.g. real traffic data, the revenue of every movie theater, and nearby parking lots. They would help to produce more accurate results.

The purpose of this project is to find an area on Montreal island to open a movie theater.

After fetching data from several data sources, processing it into a clean data frame and applying the K-Means clustering algorithm, we picked the cluster with more shopping malls and fewer movie theaters on average. By sorting all candidate areas in the cluster, we obtained the 5 most promising zones, which serve as starting points for final exploration by stakeholders.

The final decision on the optimal movie theater location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as parking at each location and the traffic and current revenue of existing movie theaters in the cluster.

https://github.com/kyokin78/Coursera_Capstone/blob/project/CapstoneProject_OpenCinemaInMontreal.ipynb

Project Report | Capstone Project – The Battle of Neighborhoods

1. Introduction

This project aims to find neighborhoods with better surroundings, such as pubs, parks or gyms. Using the map of Scarborough, Toronto, it helps people decide which neighborhood is the most beneficial place compared with the others.

Many people are migrating to Toronto. They need information and resources to balance housing prices against schools for their children. This project is for people choosing among neighborhoods based on, for example, access to cafes, schools, supermarkets and hospitals.

This project analyses features that help people migrating to Scarborough search for the best neighborhood. The features include median housing price, school quality, crime rates, road connectivity, management of emergency facilities, and recreational facilities.

People will get to know the area before moving to a new city.

Foursquare API Data:

Foursquare provides data about the venues in each neighborhood. This information includes venue names, locations, menus and even photos. The required information is obtained from the Foursquare platform through its API.

Once the list of neighborhoods is available, the Foursquare API is used to gather information about the venues in each of them. For each neighborhood, the radius is 100 meters.

The Foursquare data contains venues with their longitude, latitude and postcodes. The information obtained per venue is as follows:

Data Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In this project, I use the Scarborough dataset which we scraped from Wikipedia in Week 3. The dataset consists of latitude, longitude and zip codes.

3. Methodology Section

Clustering Approach:

To compare the similarities of two cities, we decided to explore neighborhoods, segment them, and group them into clusters to find similar neighborhoods in a big city like New York and Toronto. To be able to do that, we need to cluster data which is a form of unsupervised machine learning: k-means clustering algorithm.

With my Foursquare API credentials, features of the neighborhoods are gathered and used. Because of request limitations, the radius parameter for each neighborhood is set to 700 and the number of places per request is limited to 100.

5. Discussion

Problem solved:

The purpose of this project is to suggest better neighborhoods in Scarborough to people. Connectivity to the airport, bus stops, distance to downtown, markets and so on all count.

Sorted list of houses by housing price, in ascending or descending order

Sorted list of schools by location, fees, rating and reviews

6. Conclusion

With the help of the k-means clustering algorithm, the neighborhoods are separated into 10 clusters using 103 different latitude and longitude pairs from the dataset. The dataset contains similar neighborhoods close to one another. The charts show each neighborhood with its average house price and school rating.

I really appreciate this opportunity and the experience of working through all the tasks. This project is a practical application of data science tools to a real situation. Mapping with Folium is a useful way to consolidate information and visualize the analysis.

Improvement:

With further work, this project could be made more precise in finding the best house in Scarborough, based not only on price but also on other features of the surrounding area.

Libraries used:

Pandas: To create and edit dataframes.

Folium: To visualize the neighborhood clusters distribution.

Scikit Learn: To import clustering algorithms.

JSON: To handle JSON files.

XML: To separate data from presentation; XML stores data in plain text format.

Geocoder: To retrieve location from data.

Beautiful Soup and Requests: To extract data from HTML and XML.

Matplotlib: To draw plots.

Doug Marcum

May 30, 2019

The Battle of Neighborhoods — Coursera IBM Capstone Project

1 ) Introduction/Business Problem: This study is intended to help a small group of investors planning the first expansion of their U.S.-based brewery / restaurant into Toronto. Because Toronto is the most populous city in Canada and continually ranks as an important global city with a high quality of living, expanding into the market of the neighbor to the north was an easy choice for the investing group. However, with limited knowledge of the Toronto market, the investors have asked us to help select the areas of Toronto best suited to launching their brewery / restaurant expansion.

They are interested in building in an area that meets the following criteria:

With these criteria given by the investing group, based on previous success in other markets, the objective is to locate and recommend to the investors, the target audience, which neighborhood(s) of Toronto will be the best choice to start their international growth plan. The information gained will assist in choosing the right location by providing data about the population of each neighborhood, in addition to other established venues present in these areas.

Additionally, this information could be of interest to other potential investors looking to open a new restaurant or entertainment venue in Toronto.

2 ) Data: The information needed by the investing group will come from the following sources:

City of Toronto Neighborhood Profiles for providing an overview of the neighborhoods in Toronto

City of Toronto Open Data Catalogue : The Census of Population is held across Canada every five years (the last being in 2016), and collects data about age and sex, families and households, language, immigration and internal migration, ethnocultural diversity, Aboriginal peoples, housing, education, income, and labor. City of Toronto Neighborhood Profiles use this Census data to provide a portrait of the demographic, social and economic characteristics of the people and households in each City of Toronto neighborhood. The profiles present selected highlights from the data, but these accompanying data files provide the full data set assembled for each neighborhood.

The profiles cover the City of Toronto’s 140 social planning neighborhoods. These social planning neighborhoods were developed by the City of Toronto to help government and community organizations with local planning by providing socio-economic data at a meaningful geographic area. The boundaries of these social planning neighborhoods are consistent over time, allowing for comparison between Census years. Neighborhood level data from a variety of other sources are also available through the City’s Wellbeing Toronto mapping application and here on the Open Data portal.

Each data point in this file is presented for the City’s 140 neighborhoods, as well as for the City of Toronto as a whole. The data is sourced from several Census tables released by Statistics Canada. The general Census Profile is the main source table for this data, but other Census tables have also been used to provide additional information. CSV File

City of Toronto Neighborhood Shapes for mapping : GeoJSON File

Wikipedia for Toronto Neighborhood Borough Designation : Each of the 140 social planning neighborhoods of Toronto reside within a defined borough. While the City of Toronto is a singular municipality, the 140 neighborhoods are still grouped into six distinct boroughs.

Foursquare API to collect information on other venues/competitors in the neighborhoods of Toronto

3 ) Methodology: In order to establish the targeted neighborhood(s), we will explore the demographics of the neighborhoods in the city of Toronto by segmenting the data and conducting descriptive analysis using pandas. Additional data will be gathered by web scraping and through API calls.

Data Group 1
Stage A: Census Data
1. Data was pulled from the City of Toronto Neighborhoods Profile Census CSV File to create a dataframe.
2. This dataframe contains all the census data (2016) of the neighborhoods of Toronto that will be filtered.
3. Data is filtered into columns based on neighborhood population, male and female age groups, education level, and after-tax income.

Stage B: Web scraping to align neighborhoods with boroughs
1. Wikipedia page for Toronto Neighborhood Borough Designations is scraped using BeautifulSoup.
2. Scraped data is transformed to a dataframe.
3. Merge this dataframe with the Census Data dataframe.

Stage C: Pull Toronto shape file
1. Get the shape file.
2. Remove unnecessary data and merge to previous dataframe.

Data Group 2
Stage A: Establish medians and scoring system
1. Calculate medians of the demographic columns across the 140 neighborhoods.
   Median Population: 16749.5
   Median Higher Education: 4122.5
   Median Female: 1952.5
   Median Male: 1800.0
   Median After Tax Income: $36538.5
2. From the criteria delivered by the investor group, each category was given a standardized score: the category value divided by its median and then multiplied by a factor reflecting its given importance. The columns are then summed to create a total score for each neighborhood. The dataframe is then merged to create a dataframe with all necessary data.
3. From here, utilizing a choropleth folium map, a clearer picture of the neighborhoods of Toronto becomes apparent.
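The median-based scoring described in step 2 can be sketched as follows. This is an illustration only: the function name `total_scores`, the column names, the importance factors and the three sample neighborhoods are assumptions, since the article does not list the investors' actual factors.

```python
from statistics import median


def total_scores(rows, importance):
    """Score each neighborhood as sum(importance * value / column median).

    rows: {neighborhood: {column: value}}
    importance: {column: weighting factor}
    """
    medians = {
        col: median(values[col] for values in rows.values())
        for col in importance
    }
    return {
        name: sum(importance[col] * values[col] / medians[col]
                  for col in importance)
        for name, values in rows.items()
    }


# Made-up neighborhoods and hypothetical importance factors.
rows = {
    "N1": {"population": 10000, "income": 30000},
    "N2": {"population": 20000, "income": 40000},
    "N3": {"population": 30000, "income": 50000},
}
importance = {"population": 2.0, "income": 1.0}

scores = total_scores(rows, importance)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, scores[name])
```

Dividing by the median puts columns with very different units (people, dollars) on a comparable scale before the importance factors are applied, so a neighborhood sitting exactly at every median scores the sum of the factors.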

Data Group 3
Stage A: Toronto geographical data is used as input to the Foursquare API, which returns venues from all of Toronto
1. Using the geographical coordinates of each neighborhood in Toronto, calls are made to the Foursquare API to return the top 100 venues in a radius of 1610 meters, approximately a one-mile radius.
2. The data is then visualized via a folium map.
3. Foursquare is called again to narrow the list to the neighborhoods within the top fifteen (15) total scores (with the same parameters as the previous call), and this data is mapped as well.

Data Group 4
Stage A: Analyze the top 15 neighborhoods of Toronto
1. Use one-hot encoding to transform our list of established venues in the top 15 neighborhoods, returning a dataframe of shape (1198, 208).
2. Group by neighborhoods. Filter out venues related to retail and personal care categories to focus on activity-centered venues (i.e. bars, nightclubs, restaurants, attractions, etc.). Create a small dataframe to display totals for each neighborhood.
3. Drop the Rouge neighborhood due to extreme limitations on venues after the final filter.

4 ) Results: With the data now ready, we run k-means to cluster the neighborhoods into three (3) clusters. The cluster number was established after multiple samplings and iterations. With our clusters established, this dataframe is merged with the total scores data to provide the final criteria for selecting the appropriate neighborhood(s).

2. The clusters are visualized via a stamen toner folium map:

5 ) Discussion: From the results presented, the following observations and recommendations can be made:

6 ) Conclusion: In conclusion, the scope of this analysis is somewhat limited. The hospitality industry is ever changing, and the information available to us may be dated because it relies on user information via Foursquare. Overall, though, the model can easily be rerun with updated data from the Foursquare API and the data from the forthcoming census in 2021. With the data analyzed and the scoring system established by the investor group, we stand by the recommendations made.

Eshita Goel

Jul 29, 2020

Battle of the Neighborhoods

IBM Applied Data Science Professional Certificate: Capstone Project

As part of the 9-course IBM Applied Data Science Specialization offered by IBM on Coursera, the 9th course tests our ability to carry out an independent project.

This project requires the use of the Foursquare API for data analysis and allows us to apply the various techniques learnt during the specialization, including various data visualisation techniques.

Introduction

This report is part of the Capstone Project for IBM’s Applied Data Science Professional Certificate offered by Coursera. This is part of the final course in this 9-course series.

We will use several data visualization techniques. In particular, we will use the Foursquare API to retrieve location data for the city of Toronto in Canada and use this data to perform data analysis.

Introducing the Project

Toronto is a big city with numerous neighbourhoods. Each neighbourhood has its share of shops, restaurants, cafes, beaches etc. The different places add to the vibrance of the city and make it a great place to live in and to visit. The place also provides many opportunities for entrepreneurs, especially those who want to start afresh. As Canada’s largest city and the capital of Ontario, it receives its fair share of foot traffic from foreigners. In this project, we aim to analyze the various places in Toronto using location data imported from the Foursquare API.

2. The Problem

If a person wants to open a new restaurant in Toronto, which neighbourhood should they choose based on the restaurant type? And if they have a specific restaurant type in mind, which place would ensure a good amount of customer traffic while keeping the amount of competition in mind? We also cluster the neighbourhoods based on the most popular spots to visit, to make it easier for a new business person to choose the right neighbourhood for their restaurant or shop.

3. Interest

This report will be useful for those who want to start a new business in the city of Toronto. It will also be helpful for those who want to travel to Toronto and visit specific locations based on their interests. For example, if a tourist wants to visit multicultural restaurants in Toronto, which neighbourhoods are best? They should ideally visit those neighbourhoods where multicultural restaurants are popular among the people. The clustering of the neighbourhoods based on the most visited spots allows people to decide where to travel and explore more. Clusters tell us which neighbourhoods are fairly similar to each other, so a visitor can skip travelling to many near-identical neighbourhoods.

Data Acquisition and Cleaning

We make use of a few data sources to get the data required for this project. We get the neighbourhoods in Toronto, along with their boroughs and postal codes, from Wikipedia:

List of postal codes of Canada: M (en.wikipedia.org)

We get the Latitudes and Longitudes for each postal code through the CSV file provided to us by the Coursera Applied Data Science Capstone Week 3 Module. We get the various location-related data, like the kinds of places in a particular neighbourhood, using Foursquare API. This data will include the type of shops, restaurants, cafes, beaches etc in each neighbourhood.

2. Acquiring the Data

We acquire the data about the various neighbourhoods, boroughs and postal codes from the Wikipedia page using Beautiful Soup. We put it into a data frame. The latitudes and longitudes are in a CSV file that can be read using pandas. We will make calls to Foursquare API using our credentials to acquire the location-related data.

3. Cleaning the data

Once we have our data that includes the Postal Codes, Boroughs and Neighbourhoods in Canada, we drop all rows where the borough is unknown. We want to focus on the data that has been assigned a borough.

More than one neighbourhood can share the same postal code, so we combine such rows: we make a single row for each postal code, with the neighbourhoods associated with that postal code listed in the same row, separated by commas.

If a particular postal code does not have a neighbourhood assigned but that row has an assigned borough, then the neighbourhood is considered to be the same as the borough.

We sort the data by the postal codes.

Then we merge the data frame with the stored values of the latitudes and longitudes of each postal code.

Here we get our final data frame.
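The cleaning rules above can be sketched without any library as follows. This is an illustration rather than the project's code: the function name `clean_postal_rows` is hypothetical, the placeholder string "Not assigned" for unknown values is an assumption about how the scraped table marks them, and the sample rows are only illustrative. The final merge with latitudes and longitudes is left out.

```python
UNKNOWN = "Not assigned"  # assumed marker for missing values in the scraped table


def clean_postal_rows(rows):
    """rows: list of (postal_code, borough, neighbourhood) tuples as scraped."""
    grouped = {}
    for code, borough, hood in rows:
        if borough == UNKNOWN:
            continue  # drop rows without a borough
        if hood == UNKNOWN:
            hood = borough  # fall back to the borough name
        entry = grouped.setdefault(code, {"borough": borough, "hoods": []})
        if hood not in entry["hoods"]:
            entry["hoods"].append(hood)
    # One row per postal code, neighbourhoods joined by commas, sorted by code.
    return [
        (code, entry["borough"], ", ".join(entry["hoods"]))
        for code, entry in sorted(grouped.items())
    ]


# Illustrative sample rows.
rows = [
    ("M5A", "Downtown Toronto", "Regent Park"),
    ("M1A", UNKNOWN, UNKNOWN),
    ("M5A", "Downtown Toronto", "Harbourfront"),
    ("M7A", "Queen's Park", UNKNOWN),
    ("M3A", "North York", "Parkwoods"),
]

cleaned = clean_postal_rows(rows)
for row in cleaned:
    print(row)
```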

4. Feature Selection

We want to focus on neighbourhoods in Toronto. So we drop all rows that have Boroughs outside of Toronto. We keep the data from Toronto. Finally, our data frame is ready to use.

Exploratory Data Analysis

We can visualize the various neighbourhoods in Canada by drawing a map and plotting the neighbourhoods on top. This allows us to see what we are dealing with and how the neighbourhoods are scattered.

We need to focus on the neighbourhoods in Toronto. We have already made a separate data frame with data from Toronto (i.e. East, West, North and Downtown Toronto). We visualize this data using a map centred in Toronto but only plotting the neighbourhoods in Toronto.

2. Exploring the Neighborhoods

We now see how we can find out what venues are there in each neighbourhood in Toronto. We can do this by exploring any one neighbourhood. The process would then be similar for all subsequent neighbourhoods.

The first neighbourhood in our list of neighbourhoods in Toronto is “The Beaches”. We use the Foursquare API to get the top 100 venues in The Beaches within a radius of 700 metres. This gives us an idea of the types of locations that can be present in a neighbourhood.
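A request of this kind could be assembled roughly as shown below. This sketch only builds the request URL and makes no network call. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its parameter names as they were used in the Coursera labs at the time; check the current Foursquare documentation before relying on them. The function name `explore_url`, the credentials, the version date and the coordinates are placeholders.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; verify against current Foursquare documentation.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id, client_secret, lat, lng,
                radius=700, limit=100, version="20180605"):
    """Build the URL for one 'explore venues around a point' request."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (placeholder value)
        "ll": f"{lat},{lng}",    # latitude,longitude of the neighbourhood
        "radius": radius,        # search radius in metres
        "limit": limit,          # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials and rough placeholder coordinates.
url = explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 43.68, -79.29)
print(url)
```

The same function can be called once per neighbourhood, changing only the coordinates, which is how the single-neighbourhood exploration generalises to the whole list.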

The first 5 of these venues are :

We can see that around “The Beaches”, a few of the locations include a Trail, Gastropub, Bakery, Vegan Restaurant and Ice Cream Shop. Each venue has its assigned category along with its latitude and longitude.

Similarly, we can explore the other neighbourhoods and this information can be very useful for a potential business owner wanting to start a new business in any of the neighbourhoods.

Methodology

Our first step is making a dataset with all the neighbourhoods along with the different venues near that neighbourhood. This dataset will allow us to group the neighbourhoods together according to the similarity in the type of venues in each neighbourhood. For example, if two neighbourhoods are very popular for their beaches and cafes then they can be put into the same group.

Now we have a dataset of all the neighbourhoods and their corresponding venues along with the categories of the venues.

2. Grouping the neighbourhoods based on the topmost common venues

Using the above dataset, we can start to group the neighbourhoods based on the similarity of their topmost venues. If two neighbourhoods have the same top few venues then they can be grouped together in the same row. We make use of the mean of the frequency of the occurrence of each category and combine the neighbourhoods with similar venues. We make a data frame of the postal codes along with the grouped neighbourhoods and the top 10 most common venues.

The top 5 rows of this data frame will look like :

This clearly shows the similar neighbourhoods in the same row along with the top 10 most popular places in those neighbourhoods.
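The "mean frequency of each category" idea can be illustrated in plain Python. The sketch below is not the project's code: the function name `top_categories` and the sample venues are made up, and ties are broken alphabetically as an arbitrary choice. It relies on the fact that the mean of a one-hot encoded category column within a neighbourhood equals that category's share of the neighbourhood's venues.

```python
from collections import Counter, defaultdict


def top_categories(venues, n=10):
    """venues: list of (neighbourhood, category) pairs, one per venue.

    Returns {neighbourhood: [(category, relative frequency), ...]} with the
    n most common categories first.
    """
    counts = defaultdict(Counter)
    for hood, category in venues:
        counts[hood][category] += 1

    result = {}
    for hood, counter in counts.items():
        total = sum(counter.values())
        # Share of venues per category == mean of the one-hot column.
        freqs = {cat: count / total for cat, count in counter.items()}
        ranked = sorted(freqs, key=lambda cat: (-freqs[cat], cat))
        result[hood] = [(cat, freqs[cat]) for cat in ranked[:n]]
    return result


# Made-up venues for two hypothetical neighbourhoods.
venues = [
    ("A", "Cafe"), ("A", "Cafe"), ("A", "Trail"), ("A", "Bakery"),
    ("B", "Pub"),
]

top = top_categories(venues, n=2)
print(top)
```

The resulting frequency vectors are also what a clustering algorithm would take as input when grouping neighbourhoods by venue mix.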

3. Clustering the data

We will cluster the data using KMeans Clustering.

We already have our grouped data where similar neighbourhoods have been grouped together based on the top venues in these neighbourhoods. We can now perform KMeans Clustering to associate each group to a cluster. We cluster these neighbourhoods into 5 clusters and label them accordingly.

We will then analyze based on the clusters how similar the neighbourhoods are to each other. Neighbourhoods in the same cluster are likely to have similar categories of venues and thus opening a new branch for the business in the same cluster will not be ideal.

Most of the neighbourhoods in Toronto fall in the same cluster (Indicated in Blue). The neighbourhoods in the Red cluster are mainly on the outskirts of Toronto.

Two neighbourhoods each form a cluster of their own, and one cluster (indicated in purple) consists of only 2 neighbourhoods.

Since the neighbourhoods were clustered based on the similarity in the categories of popular venues, it can be observed that most neighbourhoods have the same category of venues.

Our aim was to help potential business owners and tourists in picking out the right neighbourhood to travel to or open a business in. For example, if a business owner wants to open a Vegan Cafe, he/she must choose the neighbourhood carefully, making sure that there aren’t any other popular Vegan Cafes in the same location. If there are, he/she can face a lot of competition from an already established place.

But they should also keep in mind the interest of the people. People in a particular neighbourhood should be interested in the entrepreneur’s business.

Also, when tourists visiting Toronto plan their holiday, they would want to visit different kinds of places. Visiting neighbourhoods that have almost the same characteristics would not be ideal.

This is why our aim was to cluster the neighbourhoods based on their similarity, i.e, their most popular locations. For example, if two neighbourhoods are popular for Indian restaurants, then both of them can be put into the same cluster. This alerts business owners that if they want to open a particular kind of restaurant and one neighbourhood is not ideal, then all neighbourhoods in that cluster are not ideal. This also gives them an idea about how they can scale profitably and open more branches in different clusters.

Conclusions

The clusters allow interested people to understand how similar neighbourhoods are in Toronto.

Using the data about the 10 most popular venues in each neighbourhood group allows people to choose the right neighbourhood for starting a new business or opening a new branch for their already existing business.

The similarity in neighbourhoods allows tourists to decide which places they should add to their to-visit list without making redundant choices. It also tells them which are the most popular places in each neighbourhood that they must visit.

This project has been done with help from the labs provided by the IBM Applied Data Science Capstone Course on Coursera. Certain ideas for this project have been taken from the labs. All pictures used are my own.

You can find my GitHub code for this project here :

eshitagoel/Coursera_Capstone

Capstone Project – The Battle of Neighborhoods | Finding a Better Place in Scarborough, Toronto

1. Introduction:

The purpose of this Capstone Project is to help people explore better facilities around their neighborhood. It will help people make a smart and efficient decision when selecting a great neighborhood out of the many neighborhoods in Scarborough, Toronto.

Many people are migrating to various provinces of Canada and need a lot of research to find good housing prices and reputable schools for their children. This project is for those people who are looking for better neighborhoods, with easy access to cafes, schools, supermarkets, medical shops, grocery shops, malls, theatres, hospitals, like-minded people, etc.

This Capstone Project aims to analyse features for people migrating to Scarborough, so that they can search for the best neighborhood through a comparative analysis between neighborhoods. The features include median housing price, school quality according to ratings, crime rates of the particular area, road connectivity, weather conditions, good emergency management, water resources (both fresh water and the waste water and excrement conveyed in sewers) and recreational facilities.

It will help people get to know the area and neighborhood before moving to a new city, state, country or place for their work or to start a fresh life.

2. Data Section

Data Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

We will use the Scarborough dataset that we scraped from Wikipedia in Week 3. The dataset consists of latitude, longitude and postal codes.

Foursquare API Data:

We will need data about different venues in different neighborhoods of that specific borough. To obtain that information we will use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest. Such information includes venue names, locations, menus and even photos. The Foursquare location platform will therefore be used as the sole data source, since all the required information can be obtained through the API.

After finding the list of neighborhoods, we connect to the Foursquare API to gather information about venues inside each neighborhood. For each neighborhood, we have chosen the radius to be 100 meters.

The data retrieved from Foursquare contained information on venues within a specified distance of the longitude and latitude of each postcode. The information obtained per venue is as follows:

Map of Scarborough

3. Methodology Section

Clustering approach:

To compare the similarities of two cities, we decided to explore neighborhoods, segment them, and group them into clusters to find similar neighborhoods in a big city like New York or Toronto. To do that, we cluster the data using the k-means clustering algorithm, a form of unsupervised machine learning.
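As an illustration of what k-means does, the following is a minimal, dependency-free sketch of the standard assign-and-update loop (Lloyd's algorithm). The project itself lists scikit-learn for k-means, so this is not the project's code. The function names, the seed and the sample feature vectors are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical neighborhood feature vectors (e.g. shares of two venue types).
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = k_means(points, k=2)
```

In this toy input the first two vectors and the last two vectors end up in separate clusters, which mirrors how neighborhoods with similar venue mixes are grouped.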

Using K-Means Clustering Approach | Most Common Venue

Most Common Venues near Neighborhood | Using Clustering

Using Foursquare API credentials, features of places near each neighborhood are mined. Due to HTTP request limitations, the number of places per neighborhood is set to 100 and the radius parameter is set to 500.
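The request described above can be sketched as a small URL builder. This assumes the legacy Foursquare v2 "explore" endpoint and parameter names as commonly used in the course labs; Foursquare's API has changed since then, so treat the endpoint, the version date, the function name and the placeholder credentials and coordinates as assumptions rather than the project's actual code.

```python
from urllib.parse import urlencode

# Assumed legacy endpoint; check current Foursquare documentation before use.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20180605"):
    """Build a venue-search URL around one neighborhood's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,           # API version date (assumed value)
        "ll": f"{lat},{lng}",   # latitude,longitude of the neighborhood
        "radius": radius,       # search radius, 500 as stated in the text
        "limit": limit,         # at most 100 places per neighborhood
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and coordinates, for illustration only.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", 43.7731, -79.2578)
```

Keeping the limit and radius as keyword arguments makes it easy to experiment with different values per neighborhood without touching the rest of the request.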


4. Results Section

Map of Clusters in Scarborough

Average Housing Price by Clusters in Scarborough

School Ratings by Clusters in Scarborough

The Location:

Scarborough is a popular destination for new immigrants to Canada. As a result, it is one of the most diverse and multicultural areas in the Greater Toronto Area, being home to various religious groups and places of worship. Although immigration has become a hot topic over the past few years, with more governments seeking more restrictions on immigrants and refugees, the general trend of immigration into Canada has been on the rise.

Foursquare API:

This Capstone project uses the Foursquare API as its prime data gathering source, as it has a database of millions of places. In particular, its Places API provides the ability to perform location search, location sharing and lookups of details about a business.

5. Discussion Section

Problem we tried to solve:

The major purpose of this project is to suggest a better neighborhood in a new city for a person who is moving there. Relevant factors include social presence in terms of like-minded people, and connectivity to the airport, bus stand, city center, markets and other daily needs nearby.

6. Conclusion Section

In this Capstone project, using the k-means clustering algorithm, I separated the neighborhoods into 10 (ten) different clusters across 103 different latitude and longitude pairs from the dataset, so that each cluster groups locations with very similar neighborhoods around them. Using the charts above, results for a particular neighborhood are presented based on average house prices and school ratings.

I feel rewarded for the effort and believe this course, with all the topics covered, is well worthy of appreciation. This project has shown me a practical application of data science tools to resolve a real situation that has personal and financial impact. Mapping with Folium is a very powerful technique for consolidating information and making analysis and decisions with more confidence.

Future Works:

This Capstone project can be continued to make it more precise at finding the best house in Scarborough. "Best" here means having all the required things (daily needs, or the things we need to live a better life) nearby while also being cost-effective.

Libraries Used to Develop the Project:

Pandas: For creating and manipulating dataframes.

Folium: Python visualization library used to visualize the distribution of neighborhood clusters on an interactive Leaflet map.

Scikit Learn: For importing k-means clustering.

JSON: Library to handle JSON files.

XML: To separate data from presentation; XML stores data in plain text format.

Geocoder: To retrieve location data.

Beautiful Soup and Requests: To scrape web pages and to handle HTTP requests.

Matplotlib: Python plotting module.

GitHub Link of Complete Project : https://github.com/roshangrewal/Coursera_Capstone

