Exploratory Data Analysis

As a disclaimer, some of this EDA explores our raw data and what it looks like. However, to run these notebooks, especially those that use the satellite data and school points, you will first need to run the Data Gathering and Feature Engineering scripts.

Satellite Data

The first thing we wanted to explore was how our countries look on a map and how our predictors are distributed across them. We used Google Earth Engine to create maps of nighttime imagery, the global human modification index, and the vegetation index. For the nighttime imagery and the vegetation index, we also wanted to show the change over time, since we use the rate of change as a predictor as well. Below you will find static images of the maps we created. If you click on them, you can also find an interactive version. Click here to see the Jupyter notebook with code included for replicating the maps below.
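
Since the rate of change is used as a predictor, here is a minimal sketch of one way such a value could be computed for a single school point. The helper name, the choice of a simple per-year linear rate, and the sample values are illustrative assumptions, not the project's actual feature engineering code.

```python
def annual_rate_of_change(start_value, end_value, start_year, end_year):
    """Average change per year between two observations (hypothetical helper)."""
    if end_year <= start_year:
        raise ValueError("end_year must be later than start_year")
    return (end_value - start_value) / (end_year - start_year)


# Made-up average radiance values for one school point in 2014 and 2019.
rate = annual_rate_of_change(2.0, 4.5, 2014, 2019)
print(rate)  # 0.5
```

A per-year rate keeps the feature comparable across predictors that cover different time spans, such as the nighttime imagery and the vegetation index.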

Satellite Images on a National Level for both Brazil and Thailand:

  1. Average Radiance Band

    Here we see that most of the light comes from the big cities in the south of both countries. This predictor later plays a big role in predicting internet connectivity. Click on this map to see a comparison between school points and the entire country average radiance in 2014 and in 2019.

    Brazil and Thailand Average Radiance

  2. Cloud Free Band

    This is a second band within the VIIRS satellite nighttime images. It measures light without clouds or solar illumination. In some ways, specifically in tropical rainforests, which both Brazil and Thailand have, it is a better measure of light emission than the average radiance band. We use both as predictors in our model. The maps also show that the light emission looks vastly different in this band. Click on this map to see a comparison between school points and the entire country cloud free coverage in 2014 and in 2019.

    Brazil and Thailand Cloud Free Coverage

  3. Global Human Modification Map

    In this map, we see the level of Global Human Modification in the last few years within both Brazil and Thailand. For more information on how this dataset was compiled, please see the Data Gathering page. Click on this Brazil Map to see the country level data.

    GHM_Map

    Brazil

  4. Normalized Difference Vegetation Index

    Here we see the difference in vegetation between Brazil and Thailand. Click here to see the map for Brazil; toggle between the layers to see the entire country or just the school point areas.

    NDVI

  5. Here we also see GIFs that show the time series change of vegetation from 2000 to 2021.

    Brazil_GIF

    Thailand_GIF
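
For readers unfamiliar with the vegetation index in item 4, the sketch below shows the standard NDVI definition, (NIR - Red) / (NIR + Red), applied to single reflectance values. It is independent of how the NDVI layer in the maps above was obtained; the function name, the zero-denominator handling, and the sample reflectances are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        # Assumption: report 0.0 for pixels with no signal instead of failing.
        return 0.0
    return (nir - red) / total


# Made-up reflectance values: vegetation reflects strongly in the near infrared.
dense_vegetation = ndvi(nir=0.5, red=0.1)
sparse_vegetation = ndvi(nir=0.3, red=0.25)
print(dense_vegetation > sparse_vegetation)  # True
```

Values range from -1 to 1, with higher values indicating denser green vegetation.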

Speedtest data

Here is an image of what the speed test data looks like when zoomed in on São Paulo, Brazil. Each square represents a tile, with red areas indicating higher internet speed.

speedtest

Open Cell ID data

The first image shows the Open Cell ID map visualized in Brazil and the second image shows the map zoomed in on Rio de Janeiro.

speedtest speedtest

Facebook data

Here, the first image shows a homogeneous-looking plot of monthly active Facebook users in Brazil, but the second image, zoomed in on São Paulo, shows how the city centre is a localised hot-spot of Facebook users.

facebook

facebook_2

Training Set EDA

We also did some exploratory data analysis once our training dataset was created. You will not be able to run this on your own until you have run the Data Gathering and Feature Engineering scripts. Click here for the full notebook of explanatory visualizations. Click here for the Jupyter Notebook .ipynb file.