SAR data are not especially good for vegetation classification. However they have the great advantage of being independent of cloud cover. Here we investigate the use of S1 time series over a complete growth period for thematic mapping.
Since we have no ground truth data, we will use the Canada AAFC Annual Crop Inventory, which is also on the GEE archive. In particular the 2017 inventory for an area in southern Saskatchewan. This area consists of large agricultural fields, well-defined crops, and flat terrain (a big advantage for SAR measurement).
Multilook SAR image data are not normally distributed, rather they are gamma distributed. The GEE classifiers might be expected not to work so well, so we will use Tensorflow to program a more flexible neural network classifier.
First of all we grab a time series for the region of interest over the 2017 growing season (March to October):
%matplotlib inline
import ee
from auxil.eeWishart import omnibus
ee.Initialize()
poly = ee.Geometry({'geodesic': False, 'type': 'Polygon', 'coordinates': [[[-105.10328600000001, 50.24327999999998],
[-104.66649400000001, 50.24327999999998],
[-104.66649400000001, 50.46604134048255],
[-105.10328600000001, 50.46604134048255],
[-105.10328600000001, 50.24327999999998]]]})
poly = ee.Geometry(poly)
collection = ee.ImageCollection('COPERNICUS/S1_GRD') \
.filterBounds(poly) \
.filterDate(ee.Date('2017-03-01'), ee.Date('2017-11-01')) \
.filter(ee.Filter.eq('transmitterReceiverPolarisation', ['VV','VH'])) \
.filter(ee.Filter.eq('resolution_meters', 10)) \
.filter(ee.Filter.eq('instrumentMode', 'IW')) \
.filter(ee.Filter.eq('relativeOrbitNumber_start',5)) \
.filter(ee.Filter.eq('orbitProperties_pass', 'ASCENDING'))
collection = collection.sort('system:time_start')
collection.size().getInfo()
Next, we transform the collection into two multiband images with linear intensities, one pre-processed with the refined Lee filter and one not:
from auxil.eeRL import refinedLee
import math
def get_vvvh(image):
''' get 'VV' and 'VH' bands from sentinel-1 imageCollection and restore linear signal from db-values '''
return image.select('VV','VH').multiply(ee.Image.constant(math.log(10.0)/10.0)).exp()
timeseries_rl = collection \
.map(get_vvvh) \
.map(refinedLee) \
.toBands() \
.clip(poly) \
.float()
timeseries_raw = collection \
.map(get_vvvh) \
.toBands() \
.clip(poly) \
.float()
timeseries_rl.bandNames().length().getInfo()
The class lables are conveniently obtained from the GEE archive of the Canadian AAFC Annual Crop Inventory for the year 2017, and we append them as an additional band (band 39):
crop2017 = ee.ImageCollection('AAFC/ACI') \
.filter(ee.Filter.date('2017-01-01', '2017-12-01')) \
.first() \
.clip(timeseries_raw.geometry())\
.float()
labeled_timeseries_rl = ee.Image.cat(timeseries_rl,crop2017)
labeled_timeseries_raw = ee.Image.cat(timeseries_raw,crop2017)
labeled_timeseries_rl.bandNames().length().getInfo()
Now export both images to the Google drive (cloud storage would be better, but I don't have a billing account). Note that the export scale is 30m corrresponding to the AAFC/ACI resolution:
drexport1 = ee.batch.Export.image.toDrive(labeled_timeseries_rl,
description='driveExportTask',
folder = 'EarthEngineImages',
fileNamePrefix='labeled_timeseries_rl',scale=30,maxPixels=1e10)
drexport1.start()
drexport2 = ee.batch.Export.image.toDrive(labeled_timeseries_raw,
description='driveExportTask',
folder = 'EarthEngineImages',
fileNamePrefix='labeled_timeseries_raw',scale=30,maxPixels=1e10)
drexport2.start()
After downloading from the drive to a local directory, we have:
!ls -l imagery/regina
This displays three of the (filtered) VV bands and the last (label) band:
%run scripts/dispms -f imagery/regina/labeled_timeseries_rl.tif -p [21,23,25] \
-F imagery/regina/labeled_timeseries_rl.tif -E 2 -P [39,39,39]
Now read the labeled time series into two Numpy arrays, which we will use to train a Tensorflow NN classifier:
from osgeo import gdal
from osgeo.gdalconst import GA_ReadOnly, GDT_Byte
import numpy as np
gdal.AllRegister()
inDataset = gdal.Open('imagery/regina/labeled_timeseries_raw.tif',GA_ReadOnly)
cols = inDataset.RasterXSize
rows = inDataset.RasterYSize
bands = inDataset.RasterCount
labeled_timeseries_raw = np.zeros((rows*cols,bands))
for b in range(bands):
band = inDataset.GetRasterBand(b+1)
labeled_timeseries_raw[:,b]=band.ReadAsArray(0,0,cols,rows).ravel()
labeled_timeseries_raw = np.nan_to_num(labeled_timeseries_raw)
inDataset = gdal.Open('imagery/regina/labeled_timeseries_rl.tif',GA_ReadOnly)
labeled_timeseries_rl = np.zeros((rows*cols,bands))
for b in range(bands):
band = inDataset.GetRasterBand(b+1)
labeled_timeseries_rl[:,b]=band.ReadAsArray(0,0,cols,rows).ravel()
labeled_timeseries_rl = np.nan_to_num(labeled_timeseries_rl)
# for later
driver = inDataset.GetDriver()
m = labeled_timeseries_raw.shape[0]
inDataset = None
labeled_timeseries_raw.shape
The AAFC/ACI thematic maps have 68 different classes. This code generates a dictionary of class names:
classdict = {'0':'Nc'}
filepath = 'imagery/AAFC.txt'
with open(filepath) as fp:
line = fp.readline()
key = line[:3].replace('\t','')
value = line[10:].replace('\t',' ').replace('\n','')
classdict.update({key:value})
while line:
line = fp.readline()
key = line[:3].replace('\t','')
value = line[10:].replace('\t','').replace('\n','')
classdict.update({key:value})
del classdict['']
len(classdict)
Now we can see which class labels pertain to our region of interest:
classnums = np.unique(labeled_timeseries_rl[:,-1])
print(classnums)
classnames = str([classdict[str(int(cn))] for cn in classnums])
classnames
In order to train the neural network we have to renumber the labels consecutively from 0:
i=0
labels = labeled_timeseries_rl[:,-1]
for c in classnums:
labels = np.where(labels==c,i,labels)
i += 1
labels = np.array(labels,dtype=np.uint8)
n_classes = len(np.unique(labels))
print(np.unique(labels))
Write the labels as an image to disk and display them:
outDataset = driver.Create('imagery/regina/labels.tif',cols,rows,1,GDT_Byte)
outBand = outDataset.GetRasterBand(1)
outBand.WriteArray(np.reshape(labels,(rows,cols)))
outBand.FlushCache()
outDataset = None
%run scripts/dispms -f imagery/regina/labels.tif -c
Now we simulate ground truth by taking a random subset of training pixels, first for the unfiltered series:
# random subset for training
np.random.seed(27)
n = 50000
idx = np.random.permutation(m)[0:n]
# training vectors
Xs = labeled_timeseries_raw[idx,:-1]*100
# one hot encoded class labels
Ls = np.array(labels[idx],dtype=np.int)
ls = np.zeros((n,n_classes))
for i in range(n):
ls[i,Ls[i]] = 1
print(ls[0:5,:])
The module class auxil.dnn encapsulates a simple feed forward neural network using tf.keras.models.Sequential(). We use it with two hidden layers, each with 40 neurons, to classifiy the unfiltered time series:
%%time
import auxil.dnn as dnn
classifier = dnn.Dnn([40,40],n_classes,learning_rate=0.002)
classifier.train(Xs,ls,epochs=100)
The accuracy on the training and validation data is about 80%:
classifier.history()
Now we Train on the Lee filtered series:
np.random.seed(27)
n = 50000
idx = np.random.permutation(m)[0:n]
# training vectors
Xs = labeled_timeseries_rl[idx,:-1]*100
classifier = dnn.Dnn([40,40],n_classes,learning_rate=0.002)
classifier.train(Xs,ls,epochs=100)
And we have quite a substantial increase in classification accuracy (to about 86%):
classifier.history()
We'll use the filtered result to classify (predict) the entire image:
cls,probs = classifier.classify(labeled_timeseries_rl[:,0:-1]*100)
# for later display:
cls[0]=1
cls[1]=n_classes-1
probs.shape
Write the thematic map and the class probabilities images to disk:
# write the class image to disk
outDataset = driver.Create('imagery/regina/timeseries_rl_class.tif',cols,rows,1,GDT_Byte)
outBand = outDataset.GetRasterBand(1)
outBand.WriteArray(np.reshape(cls,(rows,cols)))
outBand.FlushCache()
outDataset = None
# write the class probabilities to disk
bands = probs.shape[1]
probs = np.byte(probs*255)
outDataset = driver.Create('imagery/regina/timeseries_rl_probs.tif',cols,rows,bands,GDT_Byte)
for b in range(bands):
outBand = outDataset.GetRasterBand(b+1)
outBand.WriteArray(np.reshape(probs[:,b],(rows,cols)))
outBand.FlushCache()
outDataset = None
Test the classifier with all of the training data (misclassification rate about 14.5%):
classifier.test(Xs,ls)
Compare the classified image with the AAFC/ACI thematic map:
%run scripts/dispms -f imagery/regina/timeseries_rl_class.tif -c -F imagery/regina/labels.tif -C
Looks good, but we can "pretty it up" some more with Probabilistic Label Relaxation (see Chapter 7 in my textbook):
%run scripts/plr imagery/regina/timeseries_rl_probs.tif
%run scripts/dispms -f imagery/regina/timeseries_rl_probs_plr.tif -c \
-F imagery/regina/labels.tif -C
And I guess we're done.