This notebook explains how to access and manipulate data in the Data Cube. It gives you a basic understanding of the generic Data Cube functions before the next notebook explores more practical use cases. By that point, you should be able to write your own functions that use the Data Cube for your own analysis.
In a Python script, the datacube module is imported in the same way as any other module, using the import statement. The database which organises and stores all of the data is set up before any data is put into the Data Cube, so no further database setup is needed here. Importing the "datacube" module makes its functions available; the connection to the database itself is made when the Data Cube is initialised.
Once imported, we need to actually initialise the Data Cube, and this is done using the command below:
import datacube
dc = datacube.Datacube()
The variable dc is the initialised Data Cube instance. It is an object with many methods, which allow us to load data from the Data Cube into Python in order to do useful things with it. We can also use it to see what data is in the Data Cube, and which satellites or other sources of data are present. We will look at some of these methods to understand the data structures of the Data Cube, so that we are better prepared when it comes to loading data into Python.
The first thing to look at is the "products" contained within the Data Cube. A product is a type of data held in the Data Cube. For example, a product could be Sentinel-2 Level 2 data (atmospherically corrected data), or it could be decadal indices data, such as NDVI, NDSI or VHI.
The following command lists all the product types in the Data Cube, as well as some information about these products. Run the box below to do this:
dc.list_products()
As can be seen above, this lists all the different data products in the Data Cube. Each of these products has one or more data bands, and we can check which bands (or measurements) each product has using the command below:
dc.list_measurements()
This lists a lot of useful information, including the name of each band, which is needed when loading data from the Data Cube. Additional information, such as the data type of each band and its nodata value, is also listed.
Knowing all this, it is now possible to load some data from the Data Cube into Python. This is done using the dc.load() function. Some arguments must be specified in order to load data, and others are optional but useful to specify.
Required arguments are the product type (for example s2_10m for Sentinel-2 10m data), the output_crs of the data (the projection it is loaded in; the standard projection in the Data Cube is WGS84 UTM 43N, or EPSG:32643, but many other projections can be used) and the resolution of the data, given as the size of a pixel in x and y. If you are loading data in a UTM projection, the units are metres, so to load Sentinel data at 10m resolution, the argument would be resolution=[10,10].
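To see what a resolution in metres means for the size of the loaded array, here is a minimal sketch that converts an extent in a UTM projection into a number of rows and columns. The function grid_shape and the 20 km by 30 km extent are hypothetical and are not part of the datacube module; the grid actually returned by dc.load() may differ by a pixel or so because of how pixels are aligned.

```python
import math

def grid_shape(width_m, height_m, resolution_m):
    """Return (rows, columns) needed to cover an extent with square pixels."""
    if resolution_m <= 0:
        raise ValueError("resolution_m must be positive")
    rows = math.ceil(height_m / resolution_m)   # pixels along y
    cols = math.ceil(width_m / resolution_m)    # pixels along x
    return rows, cols

# Hypothetical area: 20 km wide (x) and 30 km tall (y), in a metre-based projection
shape_10m = grid_shape(20_000, 30_000, 10)
shape_100m = grid_shape(20_000, 30_000, 100)
print(shape_10m, shape_100m)
```

Making the pixels ten times larger reduces the number of pixels by a factor of one hundred, which is why a coarser resolution is a quick way to preview a large area.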
You can also supply the latitude and longitude of the area you are interested in, as well as the time period. Although this information is not required when loading data, supplying it is highly recommended: there is a lot of data in the Data Cube, so loading without these limits will either make your program run very slowly or, more likely, crash it completely.
Sometimes you might want to load a lot of data for processing, and there are ways to do this which will be explained in later notebooks. It has to be done very carefully, staying mindful of the memory and processing requirements at all times.
However, below is an example of loading just a small amount of data from the Data Cube:
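One way to stay mindful of memory is to estimate the size of a request before running it. The sketch below is a rough, back-of-the-envelope calculation, not a datacube feature: the function estimate_load_bytes is hypothetical, the example numbers are made up, and the default of 2 bytes per value assumes 16-bit integer bands (check the dtype column from dc.list_measurements() for your product).

```python
def estimate_load_bytes(n_times, n_bands, rows, cols, bytes_per_value=2):
    """Rough in-memory size of a loaded array, ignoring coordinates and metadata."""
    return n_times * n_bands * rows * cols * bytes_per_value

# Hypothetical request: 5 dates, 3 bands, 5000 x 5000 pixels, 16-bit values
size_bytes = estimate_load_bytes(n_times=5, n_bands=3, rows=5000, cols=5000)
size_gib = size_bytes / 1024**3
print(f"Estimated size: {size_gib:.2f} GiB")
```

If the estimate approaches the memory available on your machine, reduce the area, the time period, the number of bands or the resolution before loading.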
ds = dc.load(product='s2_10m',x=[69.0,69.5],y=[39.0,39.5],time=['2019-06-01','2019-06-10'],measurements=['red','green','blue'],output_crs='EPSG:32643',resolution=[10,10])
print(ds)
This is a very long and difficult-to-read way of loading the data into Python. A more readable approach is to define many of the arguments beforehand in something called a dictionary. The entire contents of this dictionary can then be passed as arguments to the dc.load() function by putting two asterisks (**) before the dictionary name. This can be done in the following way:
query = {'x': [69.0,69.5],
         'y': [39.0,39.5],
         'time': ['2018-06-01','2018-06-10'],
         'measurements': ['red','green','blue'],
         'output_crs': 'EPSG:32643',
         'resolution': [100,100],
         }
ds = dc.load(product='l8_30m', **query)
print(ds)
This is much easier to read than filling in all the arguments separately. It also makes it easier to change the area of interest or the data bands when you are loading data from multiple products, as you only need to change the values in the dictionary rather than in every dc.load() call.
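The following sketch illustrates how one dictionary can be reused across several products. So that it runs without a Data Cube connection, it uses fake_load, a hypothetical stand-in that simply returns the arguments it receives; in a real notebook you would call dc.load() instead. The per-product resolutions are illustrative assumptions.

```python
def fake_load(product, **kwargs):
    """Hypothetical stand-in for dc.load(): returns the arguments it was given."""
    return {'product': product, **kwargs}

# Settings shared by every request
query = {
    'x': [69.0, 69.5],
    'y': [39.0, 39.5],
    'time': ['2018-06-01', '2018-06-10'],
    'output_crs': 'EPSG:32643',
}

# Settings that differ between products (illustrative values)
per_product = {
    's2_10m': {'resolution': [10, 10]},
    'l8_30m': {'resolution': [30, 30]},
}

# Unpack the shared and the product-specific dictionaries into each call
requests = {
    product: fake_load(product=product, **query, **extra)
    for product, extra in per_product.items()
}

# Unpacking is equivalent to typing every keyword argument out by hand
explicit = fake_load(product='s2_10m', x=[69.0, 69.5], y=[39.0, 39.5],
                     time=['2018-06-01', '2018-06-10'],
                     output_crs='EPSG:32643', resolution=[10, 10])
print(requests['s2_10m'] == explicit)
```

Changing the area or the dates in query updates every request at once, while anything that differs between products stays in its own small dictionary.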