In this notebook you will learn more about the MODIS products in the Data Cube and how to use the snow cover data and indices such as the 10-day NDSI.
This notebook requires a number of external Python modules to run, which are imported at the top of the notebook, as is the convention.
import datacube
import matplotlib.pyplot as plt
import numpy as np
import geopandas
import rasterstats
import pandas as pd
import xarray as xr
As shown in other notebooks, we need to initialise a Data Cube in order to have access to the Kyrgyzstan Data Cube (KDC), and this can be done as follows:
dc = datacube.Datacube()
Now let's see which MODIS products are in the Data Cube using the command below (already shown in a previous training session).
dc.list_products()
As you can see there are many products in the Data Cube, so let's look at the details of one of the MODIS products. For example, with the line below you can see what kind of data is in the product modis_nbar.
dc.list_measurements().loc['modis_nbar']
The results tell you the data inside the product, the type of data, the nodata value, and the measurement names with their aliases, which means you can load the data using either the measurement name or any of its aliases.
dc.list_measurements().loc['MOD10A1']
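For example, if the table above lists an alias such as 'NDSI_Snow_Cover' for the snow_cover band (the alias name here is an assumption; use whichever alias the table actually reports), the two loads below would return the same data:
# Load using the measurement name (other query parameters omitted for brevity)
ds_name = dc.load(product='MOD10A1', measurements=['snow_cover'], time='2020-01-01')
# Load using an alias instead (assumed alias name; take one from the table above)
ds_alias = dc.load(product='MOD10A1', measurements=['NDSI_Snow_Cover'], time='2020-01-01')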
With the following command you can check the description of a product. In this case 34 is the index of the product MOD10A1 found above from dc.list_products().
dc.list_products().loc[34,'description']
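Note that the numeric index can change as products are added or removed, so a more robust lookup filters by product name. A minimal sketch (assuming dc.list_products() returns a pandas DataFrame with a 'name' column, as in recent datacube versions):
# Look up a product description by name rather than by numeric index
products = dc.list_products()
products.loc[products['name'] == 'MOD10A1', 'description']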
The MODIS data have been downloaded from NASA; for more details you can check the documentation online.
query = {'time': ['2020-01-01', '2020-01-01'],
         'measurements': ['snow_cover'],
         'output_crs': 'EPSG:4326',
         'resolution': [-0.0041666733029590754, 0.0041666733029590754],
         'dask_chunks': {'time': -1, 'latitude': 782, 'longitude': 1744}
         }
# Loading the data
ds = dc.load(product='MOD10A1', **query)
print(ds)
# Select the Dataset in time, in this case the first satellite passage
ds = ds.isel(time=0)
print(ds)
Once we have selected a Dataset, we can find how many cloudy pixels are in our image. Remember that the value for cloudy pixels, documented at https://nsidc.org/data/MOD10A1/versions/6, is 250.
(ds.snow_cover.values==250).sum()
You can find the percentage of cloudy pixels in our raster. The .size attribute gives the total number of pixels in the snow_cover raster.
(ds.snow_cover.values==250).sum()/ds.snow_cover.size
This means that 45% of our image is covered by clouds.
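If you want a nicer readout, the same computation can be printed as a formatted percentage:
# Same computation as above, expressed as a percentage
cloud_fraction = (ds.snow_cover.values == 250).sum() / ds.snow_cover.size
print(f"Cloudy pixels: {cloud_fraction * 100:.1f}%")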
Now let's compute how many snow pixels are in our raster.
# Set the NDSI limit to 0.4 (snow_cover values are NDSI * 100)
ndsi_lim = 40
# Find pixels that have 0.4 <= NDSI <= 1 (snow pixels)
condition_snow_mask = (ds.snow_cover >= ndsi_lim) & (ds.snow_cover <= 100)
# Assign the value of 150 to all these pixels
snow_mask = xr.where(condition_snow_mask, 150, ds.snow_cover)
# Percentage of snow pixels
(snow_mask.values==150).sum()/ds.snow_cover.size
We can load the shapefile of the raions using geopandas (see the training session about geopandas):
raion_gdf = geopandas.read_file("data/kgz_admbnda_adm2_moes_20181119.shp", encoding='utf-8')
The following commands are used to plot the raster data (ds.snow_cover) within the geopandas dataframe raion_gdf.
fig, ax = plt.subplots(figsize=(25,10))
ds.snow_cover.plot(ax=ax, cmap='Blues_r')
raion_gdf.plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2)
If you are interested in knowing only where the snow is, it is possible to use the snow_mask computed before. We can assign a value of 1 to all the pixels with snow, and 0 to the remaining ones.
snow_mask = xr.where(condition_snow_mask, 1, 0)
fig, ax = plt.subplots(figsize=(25,10))
snow_mask.plot(ax=ax, cmap='Blues_r')
raion_gdf.plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2)
If we want to do some statistical analysis with the raster data combined with our shapefile, we can use the rasterstats module (see its documentation online). This module allows us to compute zonal statistics. For example:
# Convert all the snow pixels to 150
snow_mask = xr.where(condition_snow_mask, 150, ds.snow_cover)
# Convert all the no_snow pixels to 0, leaving the other values unchanged
snow_mask = xr.where(snow_mask < 40, 0, snow_mask)
# Zonal statistic from rasterstats
results = rasterstats.zonal_stats(raion_gdf, snow_mask.values, affine=ds.snow_cover.affine,
                                  nodata=255, stats='count', categorical=True)
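Before converting to a DataFrame, it can be useful to inspect what rasterstats returns: a list with one dictionary per polygon, whose keys are the pixel categories found plus the requested statistics. For example:
# Inspect the zonal statistics for the first raion
print(results[0])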
We copy all the results into a pandas DataFrame:
df = pd.DataFrame(results)
df = df.fillna(0)
df
Now we have the count of the pixels for the different classes (the numbers in the header) and the total number of pixels in each Raion (count). Then we add the data that we want into the geopandas DataFrame:
# Percentage of snow pixels
raion_gdf['snow_percentage'] = df[150.0]/df['count']
# Number of snow pixels
raion_gdf['n_snowpix'] = df[150.0]
raion_gdf
We can now plot the results in a nice-looking map!
Percentage of snow by Raion:
# set up the plot
fig, ax = plt.subplots(1, 1)
# set the size of the plot
fig.set_size_inches(18,10)
# plot with more attributes
raion_gdf.plot(column='snow_percentage',
               ax=ax,
               cmap='Blues',
               vmin=0,
               vmax=1,
               edgecolor='black',
               legend=True,
               legend_kwds={'label': "Percentage of Snow by Raion",
                            'orientation': "horizontal"})
Number of pixels that contain snow by Raion:
# set up the plot
fig, ax = plt.subplots(1, 1)
# set the size of the plot
fig.set_size_inches(18,10)
# plot with more attributes
raion_gdf.plot(column='n_snowpix',
               ax=ax,
               cmap='Blues',
               edgecolor='black',
               legend=True,
               legend_kwds={'label': "Number of Snow pixels by Raion",
                            'orientation': "horizontal"})
Further analyses can be done using the snow cover.
In the following section we will look into snow persistence and snow percentage by pixel.
# Loading again the Raion shapefiles
raion_gdf = geopandas.read_file("data/kgz_admbnda_adm2_moes_20181119.shp", encoding='utf-8')
raion_utm_gdf = raion_gdf.to_crs('EPSG:32643')
Let's get all the MODIS data for December 2019
start_date = '2019-12-01'
end_date = '2019-12-31'
query = {'time': [start_date, end_date],
         'measurements': ['snow_cover'],
         'output_crs': 'EPSG:32643',
         'resolution': [-500, 500],
         'dask_chunks': {'time': 1}
         }
# Loading and merging data for MODIS
terra = dc.load(product='MOD10A1', **query)
aqua = dc.load(product='MYD10A1', **query)
data = xr.concat([aqua, terra], dim='time')
crs = data.crs
print(data)
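Note that concatenating the Aqua and Terra datasets leaves the time axis unsorted, since the two satellites' passages are interleaved. If chronological order matters for your analysis, you can sort it (an optional step):
# Sort the combined observations chronologically
data = data.sortby('time')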
Now we need to apply an NDSI limit to find our snow pixels.
# Find pixels that have 0.4 <= NDSI <= 1 (snow pixels)
condition = (data.snow_cover >= 40) & (data.snow_cover <= 100)
snow_cover = xr.where(condition, 1, 0)
# Find pixels that could not be classified (clouds included)
nodata = (data.snow_cover >= 250) | (data.snow_cover == 200) | (data.snow_cover == 201)
snow_cover = xr.where(nodata, np.nan, snow_cover)
# All the rasters will have 1 for snow, 0 for no_snow and NaN for no_data
print(snow_cover)
# Dimensions of the Data Array
snow_cover.shape
We group all the data by day, so we have a snow mask (snow/no_snow) for each day.
# Find the max values of snow_cover for each day. It will include all the snow pixels.
snow_cover = snow_cover.groupby('time.day').max()
print(snow_cover)
# Dimensions of the Data Array
snow_cover.shape
(The first dimension makes sense because there are 31 days in December.)
Now we can compute the snow persistence. This will tell us the maximum number of consecutive days with snow in a pixel.
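To see how the update rule in the loop below works, consider a single pixel: with p_new = (p_old + s) * s, the counter increases by one on a snow day (s = 1) and resets to zero otherwise. A tiny standalone illustration with made-up values:
# Made-up daily snow/no_snow sequence for one pixel
p = 0
for s in [1, 1, 0, 1, 1, 1]:
    p = (p + s) * s
    print(p)  # prints 1, 2, 0, 1, 2, 3 -> the longest run of snow days is 3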
# Save all the nodata values
nodata = snow_cover.isnull()
# Keep track of pixels that are nodata (night or cloud) for the whole period
mask_nan = (nodata.sum(dim='day') / snow_cover.day.size) == 1
# Initialise the xarrays (no value equals -1, so this gives an array of zeros)
snow_persistence = xr.where(snow_cover.isel(day=0) != -1, 0, 1)
snow_persistence_max = snow_persistence.copy()
# For loop to find the snow persistence
for i in range(snow_cover.day.size):
    # For nodata (night and cloud) conditions keep the existing value,
    # otherwise update the running count of consecutive snow days
    snow_persistence = xr.where(nodata.isel(day=i),
                                snow_persistence,
                                (snow_persistence + snow_cover.isel(day=i)) * snow_cover.isel(day=i))
    # Keep the element-wise maximum seen so far
    snow_persistence_max = np.fmax(snow_persistence, snow_persistence_max)
# Keep the NaN values for pixels that were always nodata
snow_persistence_max = xr.where(mask_nan, np.nan, snow_persistence_max).compute()
# Plot the max snow persistence observed in the chosen time period
snow_persistence_max.name = 'snow_persistence_max'
fig, ax = plt.subplots(figsize=(25,10))
snow_persistence_max.plot.imshow(ax=ax)
raion_utm_gdf.plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2)
We can also compute the snow percentage, which tells us the number of days classified as snow over the number of valid observations (in this case, days).
First we count the invalid pixels, where a pixel could not be classified as either snow or no_snow, for example when it has been flagged as cloud, night, or missing data.
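As a quick worked example with made-up numbers: for a pixel with 31 daily observations, of which 10 are invalid and 15 are classified as snow, the snow percentage is 15 / (31 - 10) * 100 ≈ 71.4%.
# Quick check of the formula with the made-up numbers above
print(15 / (31 - 10) * 100)  # ≈ 71.43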
# To compute the percentage that a pixel has been a snow pixel, we need to remove the case
# when it has been detected as a nodata, night or cloud.
snow_count_valid = xr.where(nodata, 1, 0)
snow_count_sum = snow_count_valid.sum(dim='day')
print(snow_count_sum)
# Plot the number of nodata occurrences per pixel; at the edges it should equal snow_cover.day.size
snow_count_sum.name = 'number of occurrences for nodata'
fig, ax = plt.subplots(figsize=(25,10))
snow_count_sum.plot.imshow(ax=ax)
raion_utm_gdf.plot(ax=ax, facecolor='none', edgecolor='black', lw=2)
Now we can compute the snow percentage
# Compute the percentage that a pixel has been a snow pixel
snow_percentage = snow_cover.sum(dim='day') / (snow_cover.day.size - snow_count_sum) * 100.
# Plot the snow percentage in the chosen time period
snow_percentage.name = 'snow_percentage'
fig, ax = plt.subplots(figsize=(25,10))
snow_percentage.plot.imshow(ax=ax, vmax=100., vmin=0.)
raion_utm_gdf.plot(ax=ax, facecolor='none', edgecolor='black', lw=2)