Visualizing hotels in Senegal (two regions Dakar and Thies)
Visualizing Hotels in Senegal with Python
In this project, we extract data from Google Earth Pro in KML format, then convert it to an xlsx (Excel) file. We keep only the columns we need: hotel names, addresses, phone numbers and coordinates. The file has no geometry column at this stage; later on we will see how to convert the Excel or CSV file into GeoJSON or a shapefile and then visualize the locations.
Extracting data (scraping) from Google Earth Pro
Open Google Earth Pro, which is now completely free, and type your query in the search bar. The results may span many pages; in my case study I entered 'hotels in Senegal' and got a lot of pages. Go to the next panel and click the folder-like icon labelled 'copy the current research to a file'. Then scroll down and check the new entry: it carries the same title you typed in the search panel. Right-click it to rename it, for example to 'page1 hotels in Senegal', then right-click it again, select 'save place as' and choose 'kml'.
Next, you need to convert the KML file to an xlsx or CSV file.
Go to the folder where you saved the KML file, right-click the file and select 'properties'. Under 'open with', select 'change' and pick Notepad, or whichever text editor your computer offers.
Next, open an empty Excel workbook, select 'file', then 'open', find your file and open it. Some dialogs will appear; always accept the first option. You can then rename the columns, keep the ones you are interested in, delete the rest, and save the file as xlsx or CSV.
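The manual conversion above can also be scripted. The following is only a minimal sketch using Python's standard library, not the route taken in this post: it assumes the exported KML stores each hotel as a Placemark with name, address, phoneNumber and Point/coordinates tags, and the inline sample document stands in for the real file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Small inline KML standing in for the file saved from Google Earth Pro.
# The tag layout is an assumption about what the export contains.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Le Grand Hôtel de Thiés</name>
      <address>Thies, Senegal</address>
      <phoneNumber>+221 33 951 00 98</phoneNumber>
      <Point><coordinates>-16.9280314,14.7950366,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Mbind Siga</name>
      <address>Mbind Siga, Mbodiene, Senegal</address>
      <Point><coordinates>-16.924339,14.8047,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}
COLUMNS = ["hotel_names", "addresses", "phone_number", "coordinates"]
# Output column -> path of the KML tag inside each Placemark.
TAGS = {
    "hotel_names": "kml:name",
    "addresses": "kml:address",
    "phone_number": "kml:phoneNumber",
    "coordinates": "kml:Point/kml:coordinates",
}


def kml_to_rows(kml_text):
    """Return one dict per Placemark; missing tags become empty strings."""
    root = ET.fromstring(kml_text)
    rows = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        row = {}
        for column, tag in TAGS.items():
            text = placemark.findtext(tag, default="", namespaces=KML_NS)
            row[column] = text.strip()
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = kml_to_rows(SAMPLE_KML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

With a real export, the text of the saved KML file would be passed to kml_to_rows instead of SAMPLE_KML, and the CSV text written to disk.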
Loading the necessary libraries and file
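The loading step probably looks something like the sketch below. The original cell is not shown, so this is an assumption: the data is read here from a small inline CSV holding two of the rows from this post, whereas the real notebook would read the saved xlsx or CSV file. The name sen_hotels is the one used later in the post.

```python
import io

import pandas as pd

# Inline stand-in for the saved file (two rows taken from the data in this post).
CSV_TEXT = '''hotel_names,addresses,phone_number,coordinates
"Le Grand Hôtel de Thiés","Thies, Senegal",+221 33 951 00 98,"-16.9280314,14.7950366,0"
"Mbind Siga","Mbind Siga, Mbodiene, Senegal",,"-16.924339,14.8047,0"
'''

sen_hotels = pd.read_csv(io.StringIO(CSV_TEXT))

print(sen_hotels.head())          # first rows of the table
print(sen_hotels.columns.values)  # column names
print(sen_hotels.describe())      # count / unique / top / freq per column
```

For an Excel file, pd.read_excel with the path of the saved workbook would replace the read_csv call.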
hotel_names addresses phone_number coordinates
0 Radisson Blu Hotel, Dakar Sea Plaza Route de la Corniche O, Dakar 16868, Senegal +221 33 869 33 33 -17.4730893,14.6945826,0
1 Radisson Hotel Dakar Diamniadio Prolongement Autoroute a peage a cote du centr... +221 32 824 48 48 -17.1970069,14.7385608,0
2 Paradialaw Senegal, Toubab Dialao, toubab dialaw , prés d... NaN -17.14559,14.5963946,0
3 Hotel Club Royal Saly Route Saly-Ngaparou M'bour SN, 23002, Senegal +221 33 939 52 30 -17.0221955,14.4432474,0
4 Le Grand Hôtel de Thiés Thies, Senegal +221 33 951 00 98 -16.9280314,14.7950366,0
array(['hotel_names', 'addresses', 'phone_number', 'coordinates'], dtype=object)
hotel_names addresses phone_number coordinates
count 30 30 28 30
unique 30 30 28 30
top ibis Dakar Avenue Nelson Mandela, Dakar, Senegal +221 33 869 33 33 -17.0221955,14.4432474,0
freq 1 1 1 1
We can see there are two missing values in the phone_number column. If we really needed them, for a marketing campaign for instance, we could look up the hotels online and fill in the numbers. Since this is only a case study, let's just see which hotels are missing a phone number.
hotel_names addresses phone_number coordinates
2 Paradialaw Senegal, Toubab Dialao, toubab dialaw , prés d... NaN -17.14559,14.5963946,0
12 Mbind Siga Mbind Siga, Mbodiene, Senegal NaN -16.924339,14.8047,0
The two most populated regions of Senegal (which has 14 regions) are 'Dakar' and 'Thies'. These two regions are next to each other, which is to be expected: Dakar is the capital city of Senegal and Thies is the region closest to it. Let's do some text mining on our data: to extract only the hotels in Dakar, we check that 'Dakar' appears in the address.
hotel_names addresses phone_number coordinates
0 Radisson Blu Hotel, Dakar Sea Plaza Route de la Corniche O, Dakar 16868, Senegal +221 33 869 33 33 -17.4730893,14.6945826,0
5 Boutique Hôtel La Villa Racine 37, Senegal extension Immeuble Kébé, Rue Jules... +221 33 889 41 41 -17.436217,14.6663435,0
7 ibis Dakar Avenue Abdoulaye Fadiga, Dakar 18524, Senegal +221 33 829 59 59 -17.4268162,14.6687365,0
9 Terrou-Bi Resort Boulevard Martin Luther King Dakar, 11500, Sen... +221 33 839 90 39 -17.4660981,14.6769092,0
10 Le Ndiambour - Hôtel et Résidence Dakar Senegal SN, 121 Rue Carnot, Dakar 11000, Senegal +221 33 889 42 89 -17.4395282,14.6674174,0
15 International VDN Hotel Dakar cices foire 651 Dakar 38233, Senegal +221 78 436 91 17 -17.4677832,14.7452296,0
16 La Madrague Ngor Traversée vers l'ile de Ngor(2eme Plage), Daka... +221 33 820 02 23 -17.5104885,14.7500204,0
17 Le Djoloff Dakar, Senegal +221 33 889 36 30 -17.462291,14.679639,0
19 Pullman Dakar Teranga Place de l'Independance, 10 Rue PL 29, Dakar 1... +221 33 889 22 00 -17.4307642,14.667648,0
20 Le Lodge des Almadies SN, Route des Almadies, Dakar 29339, Senegal +221 33 869 03 45 -17.5150118,14.7398163,0
21 Hotel Sokhamon Avenue Nelson Mandela, Dakar, Senegal +221 33 889 71 00 -17.441901,14.663395,0
22 Hotel Jardin Savana Dakar Corniche Est Dakar Plateau, Dakar, Senegal +221 33 849 42 42 -17.4315413,14.6534857,0
23 Hotel Lagon Rte de la Corniche Estate, Dakar, Senegal +221 33 889 25 25 -17.4273289,14.667817,0
25 Hotel Cabourg SN, Dakar 12000, Senegal +221 33 860 07 04 -17.491721,14.7546323,0
28 La Demeure Route de la Corniche O, Dakar, Senegal +221 33 820 76 79 -17.507246,14.735484,0
The results above are all the hotels located in Dakar.
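Finding the rows with a missing phone number is a one-line filter in pandas. A minimal sketch, rebuilding three rows from the table above in memory (the original notebook would use the full sen_hotels table):

```python
import pandas as pd

# Three rows copied from the data shown in this post; None marks a missing phone.
sen_hotels = pd.DataFrame(
    {
        "hotel_names": ["Radisson Hotel Dakar Diamniadio", "Paradialaw", "Hotel Club Royal Saly"],
        "phone_number": ["+221 32 824 48 48", None, "+221 33 939 52 30"],
        "coordinates": [
            "-17.1970069,14.7385608,0",
            "-17.14559,14.5963946,0",
            "-17.0221955,14.4432474,0",
        ],
    }
)

# isna() is True where the value is missing; use it as a row mask.
missing_phone = sen_hotels[sen_hotels["phone_number"].isna()]
print(missing_phone)
```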
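The address filter can be sketched with pandas string methods. This is an illustration on four rows from the post, not necessarily the exact call used in the notebook; the result is only as good as the address text, since a hotel whose address does not spell out the city is missed.

```python
import pandas as pd

# Four rows copied from the data shown in this post.
sen_hotels = pd.DataFrame(
    {
        "hotel_names": [
            "Radisson Blu Hotel, Dakar Sea Plaza",
            "Hotel Club Royal Saly",
            "Le Grand Hôtel de Thiés",
            "Le Djoloff",
        ],
        "addresses": [
            "Route de la Corniche O, Dakar 16868, Senegal",
            "Route Saly-Ngaparou M'bour SN, 23002, Senegal",
            "Thies, Senegal",
            "Dakar, Senegal",
        ],
    }
)

# True where the address mentions 'Dakar'; na=False treats missing addresses as no match.
in_dakar = sen_hotels["addresses"].str.contains("Dakar", na=False)
dakar_hotels = sen_hotels[in_dakar]
print(dakar_hotels)
```

Replacing "Dakar" with "Thies" gives the equivalent filter for the other region.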
We can use the same code to extract hotels in 'Thies'
hotel_names addresses phone_number coordinates
4 Le Grand Hôtel de Thiés Thies, Senegal +221 33 951 00 98 -16.9280314,14.7950366,0
We have 1 hotel located in Thies in our data.
Let's check how many hotels we have in our data
30
15
What about the hotels located in the other Senegalese regions?
14
There are 14 hotels whose address mentions neither Dakar nor Thies.
When I opened the data in Excel, I noticed that the addresses column contains places such as M'bour and Saly, which are in the Thies region. On the other hand, places such as Toubab Dialao are located in the Dakar region. I think only someone who has visited Senegal or knows the country would be aware of that distinction.
Let's move on to the second part of today's data analysis: visualizing the addresses.
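The three counts can be derived from two boolean masks. A small sketch on four sample rows from the post (so the numbers differ from the 30, 15 and 14 obtained on the full data):

```python
import pandas as pd

# Four addresses copied from the data shown in this post.
sen_hotels = pd.DataFrame(
    {
        "hotel_names": [
            "Radisson Blu Hotel, Dakar Sea Plaza",
            "Le Grand Hôtel de Thiés",
            "Hotel Club Royal Saly",
            "Mbind Siga",
        ],
        "addresses": [
            "Route de la Corniche O, Dakar 16868, Senegal",
            "Thies, Senegal",
            "Route Saly-Ngaparou M'bour SN, 23002, Senegal",
            "Mbind Siga, Mbodiene, Senegal",
        ],
    }
)

is_dakar = sen_hotels["addresses"].str.contains("Dakar", na=False)
is_thies = sen_hotels["addresses"].str.contains("Thies", na=False)

n_total = len(sen_hotels)
n_dakar = int(is_dakar.sum())
n_thies = int(is_thies.sum())
# Rows whose address mentions neither region name.
n_other = int((~is_dakar & ~is_thies).sum())

print(n_total, n_dakar, n_thies, n_other)
```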
Senegal and Geospatial
The title 'Senegal and Geospatial' may sound strange, but the idea is to apply some geospatial functions to the hotels in Senegal.
There are two kinds of spatial data that we can use: vector and raster.
Vector data records features as points, lines, polygons and so on.
Raster data is mostly satellite imagery ....
Exploring the hotels in Senegal
Let's check the coordinates column in our data to do some processing.
Our data contains the coordinates of the hotels' point locations. We would also have liked it to contain the hotel category (1 to 5 stars); unfortunately, we don't have that. Let's create two columns, x and y: x will hold the longitude and y the latitude.
0 -17.4730893,14.6945826,0
1 -17.1970069,14.7385608,0
2 -17.14559,14.5963946,0
3 -17.0221955,14.4432474,0
4 -16.9280314,14.7950366,0
Name: coordinates, dtype: object
The numbers starting with - are the longitudes and the numbers starting with 14 are the latitudes. We need to remove the trailing ,0 from our data.
hotel_names addresses phone_number coordinates
0 Radisson Blu Hotel, Dakar Sea Plaza Route de la Corniche O, Dakar 16868, Senegal +221 33 869 33 33 -17.4730893,14.6945826
1 Radisson Hotel Dakar Diamniadio Prolongement Autoroute a peage a cote du centr... +221 32 824 48 48 -17.1970069,14.7385608
2 Paradialaw Senegal, Toubab Dialao, toubab dialaw , prés d... NaN -17.14559,14.5963946
3 Hotel Club Royal Saly Route Saly-Ngaparou M'bour SN, 23002, Senegal +221 33 939 52 30 -17.0221955,14.4432474
4 Le Grand Hôtel de Thiés Thies, Senegal +221 33 951 00 98 -16.9280314,14.7950366
hotel_names addresses phone_number coordinates x y
0 Radisson Blu Hotel, Dakar Sea Plaza Route de la Corniche O, Dakar 16868, Senegal +221 33 869 33 33 -17.4730893,14.6945826 -17.4730893 14.6945826
1 Radisson Hotel Dakar Diamniadio Prolongement Autoroute a peage a cote du centr... +221 32 824 48 48 -17.1970069,14.7385608 -17.1970069 14.7385608
2 Paradialaw Senegal, Toubab Dialao, toubab dialaw , prés d... NaN -17.14559,14.5963946 -17.14559 14.5963946
3 Hotel Club Royal Saly Route Saly-Ngaparou M'bour SN, 23002, Senegal +221 33 939 52 30 -17.0221955,14.4432474 -17.0221955 14.4432474
4 Le Grand Hôtel de Thiés Thies, Senegal +221 33 951 00 98 -16.9280314,14.7950366 -16.9280314 14.7950366
x is the longitude and y is the latitude.
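One way to drop the trailing ,0 (the altitude in KML's longitude,latitude,altitude order) and split the remainder into x and y is sketched below. The regular expression and the intermediate names are my own choices, not necessarily those of the original notebook.

```python
import pandas as pd

# Two coordinate strings copied from the data shown in this post.
sen_hotels = pd.DataFrame(
    {
        "hotel_names": ["Radisson Blu Hotel, Dakar Sea Plaza", "Paradialaw"],
        "coordinates": ["-17.4730893,14.6945826,0", "-17.14559,14.5963946,0"],
    }
)

# Remove the ',0' altitude suffix, anchored at the end of the string.
sen_hotels["coordinates"] = sen_hotels["coordinates"].str.replace(r",0$", "", regex=True)

# Split 'longitude,latitude' into two columns (still text at this point).
parts = sen_hotels["coordinates"].str.split(",", expand=True)
sen_hotels["x"] = parts[0]  # longitude
sen_hotels["y"] = parts[1]  # latitude

print(sen_hotels)
```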
hotel_names object
addresses object
phone_number object
coordinates object
x object
y object
dtype: object
This is not what we want: the values in x and y are numbers, so these columns should be numeric. Let's first try to remove the white space before trying another solution.
hotel_names object
addresses object
phone_number object
coordinates object
x object
y object
dtype: object
hotel_names object
addresses object
phone_number object
coordinates object
x float64
y float64
dtype: object
Now I will make a copy of my data, to keep as a backup in case of errors or anything unexpected.
The figure above shows the hotel locations according to their coordinates, but we need to see them on a map. Let's add a background map using the contextily library.
The image is not clear. This is due to the CRS (coordinate reference system): we have not yet converted our xlsx data to GeoJSON or another spatial format. Let's do that first.
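A minimal sketch of the clean-up: strip the surrounding white space, then cast the text to float. The sample values below include deliberate extra spaces to show the effect; the real columns may or may not contain any.

```python
import pandas as pd

# x and y as text, with stray spaces added on purpose for the demonstration.
sen_hotels = pd.DataFrame(
    {
        "x": [" -17.4730893", "-16.9280314 "],
        "y": ["14.6945826 ", " 14.7950366"],
    }
)

for column in ["x", "y"]:
    # Strip white space, then convert the strings to floating-point numbers.
    sen_hotels[column] = sen_hotels[column].str.strip().astype(float)

print(sen_hotels.dtypes)
```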
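The conversion to a spatial table can be sketched with GeoPandas as follows. This assumes the geopandas package is installed; the name geo_hotels is hypothetical, and no CRS is set at this stage.

```python
import geopandas as gpd
import pandas as pd

# Two hotels copied from the data shown in this post, with numeric x / y.
sen_hotels = pd.DataFrame(
    {
        "hotel_names": ["Radisson Blu Hotel, Dakar Sea Plaza", "Le Grand Hôtel de Thiés"],
        "x": [-17.4730893, -16.9280314],  # longitude
        "y": [14.6945826, 14.7950366],    # latitude
    }
)

# points_from_xy builds one Point per row from the longitude and latitude columns.
geo_hotels = gpd.GeoDataFrame(
    sen_hotels,
    geometry=gpd.points_from_xy(sen_hotels["x"], sen_hotels["y"]),
)

print(geo_hotels.head())
print(geo_hotels.columns)
```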
hotel_names addresses phone_number coordinates x y geometry
0 Radisson Blu Hotel, Dakar Sea Plaza Route de la Corniche O, Dakar 16868, Senegal +221 33 869 33 33 -17.4730893,14.6945826 -17.473089 14.694583 POINT (-17.47309 14.69458)
1 Radisson Hotel Dakar Diamniadio Prolongement Autoroute a peage a cote du centr... +221 32 824 48 48 -17.1970069,14.7385608 -17.197007 14.738561 POINT (-17.19701 14.73856)
2 Paradialaw Senegal, Toubab Dialao, toubab dialaw , prés d... NaN -17.14559,14.5963946 -17.145590 14.596395 POINT (-17.14559 14.59639)
3 Hotel Club Royal Saly Route Saly-Ngaparou M'bour SN, 23002, Senegal +221 33 939 52 30 -17.0221955,14.4432474 -17.022195 14.443247 POINT (-17.02220 14.44325)
4 Le Grand Hôtel de Thiés Thies, Senegal +221 33 951 00 98 -16.9280314,14.7950366 -16.928031 14.795037 POINT (-16.92803 14.79504)
We have a new column, 'geometry', in which each POINT is the vector point location of a hotel address.
Index(['hotel_names', 'addresses', 'phone_number', 'coordinates', 'x', 'y', 'geometry'], dtype='object')
Let's plot our spatial data.
<AxesSubplot:>
We get almost the same figure as with sen_hotels.plot() above.
None
<Geographic 2D CRS: EPSG:4201>
Name: Adindan
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: Eritrea; Ethiopia; South Sudan; Sudan.
- bounds: (21.82, 3.4, 47.99, 22.24)
Datum: Adindan
- Ellipsoid: Clarke 1880 (RGS)
- Prime Meridian: Greenwich
d:\programm files\python 3 8 6\lib\site-packages\contextily\tile.py:632: UserWarning: The inferred zoom level of 28 is not valid for the current tile provider (valid zooms: 0 - 18).
warnings.warn(msg)
Note that the area of use printed for EPSG:4201 does not include Senegal; coordinates exported from Google Earth as KML are in WGS 84 (EPSG:4326). Let's explore our spatial data further.
<class 'geopandas.geodataframe.GeoDataFrame'>
<class 'geopandas.geoseries.GeoSeries'>
Our data covers two main regions of Senegal, Dakar and Thies. Let's extract those two regions from OpenStreetMap and place our hotels on them.
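Before adding a basemap, the table needs a CRS. The sketch below declares the coordinates as WGS 84 (EPSG:4326), which is the system KML uses, and reprojects them to Web Mercator (EPSG:3857), the projection most web tile providers serve. This is a suggested approach rather than the notebook's exact code; plot_with_basemap is a hypothetical helper that needs contextily, matplotlib and an internet connection, so it is defined here but not called.

```python
import geopandas as gpd
from shapely.geometry import Point

# Two hotel locations from this post, created without any CRS.
geo_hotels = gpd.GeoDataFrame(
    {"hotel_names": ["Radisson Blu Hotel, Dakar Sea Plaza", "Le Grand Hôtel de Thiés"]},
    geometry=[Point(-17.4730893, 14.6945826), Point(-16.9280314, 14.7950366)],
)
print(geo_hotels.crs)  # no CRS has been assigned yet

# Declare what the numbers mean (longitude/latitude on WGS 84)...
geo_wgs84 = geo_hotels.set_crs(epsg=4326)
# ...then reproject to Web Mercator for use with tiled basemaps.
geo_web = geo_wgs84.to_crs(epsg=3857)
print(geo_web.crs)


def plot_with_basemap(gdf_web_mercator):
    """Plot points over OpenStreetMap tiles (hypothetical helper, needs network)."""
    import contextily as ctx

    ax = gdf_web_mercator.plot(figsize=(8, 8), color="red", markersize=20)
    ctx.add_basemap(ax, source=ctx.providers.OpenStreetMap.Mapnik)
    return ax
```

Calling plot_with_basemap(geo_web) would draw the hotels over the tiles. A likely cause of an invalid inferred zoom level is a mismatch between the CRS of the data and the one the basemap expects.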
Explore with GeoPandas
geometry place_name bbox_north bbox_south bbox_east bbox_west
0 POINT (-17.44794 14.69342) Dakar, 2084, Senegal 14.853425 14.533425 -17.287938 -17.607938
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
<networkx.classes.multidigraph.MultiDiGraph at 0x4c36f700>
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
Now we have the 'Dakar' and 'Thies' shapes; let's save them.
We can play around and find the network of the 'Thies' region.
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)
Let's also extract the GeoPandas data for 'Thies', as we did earlier with 'Dakar'.
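The OpenStreetMap extraction can be sketched with the OSMnx library, which returns GeoPandas tables and networkx graphs like the ones printed below. This is an assumption about the tooling: the query strings, helper names and network_type value are illustrative, and both helpers need the osmnx package and an internet connection, so they are defined but not called here.

```python
# Place queries are assumptions; adjust them to what the geocoder recognizes.
PLACES = ["Dakar, Senegal", "Thies, Senegal"]


def fetch_region(place):
    """Geocode a place name to a GeoDataFrame (hypothetical helper, needs network)."""
    import osmnx as ox

    return ox.geocode_to_gdf(place)


def fetch_network(place, network_type="drive"):
    """Download the street network of a place as a networkx MultiDiGraph."""
    import osmnx as ox

    return ox.graph_from_place(place, network_type=network_type)
```

A typical session would call fetch_region("Dakar, Senegal") to get the region table, then fetch_network on the same query and pass the graph to ox.plot_graph to draw it.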
pop_est continent name iso_a3 gdp_md_est geometry
0 920938 Oceania Fiji FJI 8374.0 MULTIPOLYGON (((180.00000 -16.06713, 180.00000...
1 53950935 Africa Tanzania TZA 150600.0 POLYGON ((33.90371 -0.95000, 34.07262 -1.05982...
2 603253 Africa W. Sahara ESH 906.5 POLYGON ((-8.66559 27.65643, -8.66512 27.58948...
3 35623680 North America Canada CAN 1674000.0 MULTIPOLYGON (((-122.84000 49.00000, -122.9742...
4 326625791 North America United States of America USA 18560000.0 MULTIPOLYGON (((-122.84000 49.00000, -120.0000...
Index(['pop_est', 'continent', 'name', 'iso_a3', 'gdp_md_est', 'geometry'], dtype='object')
pop_est continent name iso_a3 gdp_md_est geometry
51 14668522 Africa Senegal SEN 39720.0 POLYGON ((-16.71373 13.59496, -17.12611 14.373...
<AxesSubplot:>
We have visualized the hotels in Senegal: they are all on the western side of the country. Both 'Thies' and 'Dakar' are located in the same area.
Conclusion
Analyzing spatial data is fun: we can create our own real data from Google Earth, transform it into spatial data, visualize it and get a nice result.