09-MLP-grid-search.py

This script performs the grid search for hyperparameter optimization. For each model, it saves loss statistics and other model information that will later be used to identify the “best” model. Individual model results are saved on the local filesystem as JSON files in a folder named according to the arguments to this script: output/mlp/<buffer-distance>m_cloudthr<cloud-thr>_<mask-method1><mask-method2>_masking_<n-folds>folds_seed<seed>/. Each output file is named with a shortened hash of that model’s hyperparameters.
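The folder naming pattern above can be sketched in a few lines of Python. This is only an illustration: the function names, the hyperparameter keys, the hash algorithm (SHA-256), and the 8-character hash length are all assumptions, since the script's actual hashing scheme is not described here.

```python
import hashlib
import json
from pathlib import Path


def output_dir(buffer_distance=500, cloud_thr=80, mask_method1="lulc",
               mask_method2="mndwi", n_folds=5, seed=123):
    """Build the results folder path from the script arguments."""
    name = (f"{buffer_distance}m_cloudthr{cloud_thr}_"
            f"{mask_method1}{mask_method2}_masking_"
            f"{n_folds}folds_seed{seed}")
    return Path("output") / "mlp" / name


def short_hash(hyperparams, length=8):
    """Hypothetical helper: shortened hash of a hyperparameter dict.

    Assumption: the real script may use a different algorithm or length.
    """
    # Sorting keys makes the hash independent of dict insertion order.
    canonical = json.dumps(hyperparams, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


# Illustrative hyperparameters only; not taken from the script.
params = {"hidden_layer_sizes": [64, 32], "learning_rate": 0.001}
result_path = output_dir() / f"{short_hash(params)}.json"
print(result_path.as_posix())
```

With the default arguments, the folder resolves to output/mlp/500m_cloudthr80_lulcmndwi_masking_5folds_seed123/.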

usage: 09-MLP-grid-search.py [-h] [--n-workers N_WORKERS]
                             [--cloud-thr CLOUD_THR]
                             [--buffer-distance BUFFER_DISTANCE]
                             [--mask-method1 {lulc,scl}]
                             [--mask-method2 {mndwi,ndvi,""}]
                             [--n-folds N_FOLDS] [--seed SEED]

Named Arguments

--n-workers

How many workers to use for fitting models in parallel (going over the number of physical cores is not recommended). If “nan”, the default, the number of workers will be set to the number of physical cores (recommended).

Default: nan
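One way the “nan” default could be resolved to a worker count is sketched below. The function names are hypothetical, and the use of the third-party psutil package to count physical cores (with a fallback to the logical count from the standard library) is an assumption, not necessarily what the script does.

```python
import math
import os


def count_physical_cores():
    """Best-effort physical core count (assumed approach)."""
    try:
        import psutil  # third-party and optional in this sketch
        cores = psutil.cpu_count(logical=False)
    except ImportError:
        cores = None
    # Fall back to the logical core count if the physical count is unknown.
    return cores or os.cpu_count() or 1


def resolve_n_workers(n_workers=math.nan):
    """Map the --n-workers value to a concrete number of workers."""
    if math.isnan(n_workers):
        return count_physical_cores()
    return max(1, int(n_workers))


print(resolve_n_workers())   # number of detected cores
print(resolve_n_workers(4))  # explicit value is used as given
```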

--cloud-thr

Percent of cloud cover acceptable in the Sentinel tile corresponding to the sample. If this threshold is surpassed, no Sentinel image chip will be collected for the sample.

Default: 80

--buffer-distance

Square search radius (in meters) to use for reflectance data aggregation. This determines the size of the image chip that will be extracted and processed.

Default: 500

--mask-method1

Possible choices: lulc, scl

Which data to use for masking (removing) non-water pixels in order to calculate aggregated reflectance values for only water pixels? Choose “scl” to use water pixels as identified by the Scene Classification Layer that accompanies the Sentinel tile, or “lulc” to use Impact Observatory’s Land-Use/Land-Cover layer to identify water and the Scene Classification Layer to remove clouds. Using “lulc” is strongly recommended.

Default: “lulc”

--mask-method2

Possible choices: mndwi, ndvi, “”

Which additional normalized index to use, if any, to update the mask to remove errors of commission (pixels classified as water that shouldn’t be) prior to calculating aggregated reflectance? If “ndvi”, then only pixels with an NDVI value less than 0.25 will be retained. If “mndwi” (recommended), then only pixels with an MNDWI value greater than 0 will be retained. If “”, then no secondary mask is used.

Default: “mndwi”
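The thresholds for the secondary mask can be illustrated with a per-pixel sketch. It uses the standard index definitions, NDVI = (NIR - Red) / (NIR + Red) and MNDWI = (Green - SWIR) / (Green + SWIR). The function names, the reflectance values, and the handling of a zero denominator (the pixel is dropped) are assumptions made for illustration; the script itself presumably works on whole image chips rather than single pixels.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else None


def keep_pixel(green, red, nir, swir, mask_method2="mndwi"):
    """Decide whether a pixel already classified as water is retained."""
    if mask_method2 == "":
        return True  # no secondary mask
    if mask_method2 == "ndvi":
        ndvi = normalized_difference(nir, red)
        return ndvi is not None and ndvi < 0.25
    if mask_method2 == "mndwi":
        mndwi = normalized_difference(green, swir)
        return mndwi is not None and mndwi > 0
    raise ValueError(f"unknown mask method: {mask_method2!r}")


# Made-up reflectance values for a water-like and a vegetation-like pixel.
water = dict(green=0.06, red=0.04, nir=0.02, swir=0.01)
vegetation = dict(green=0.08, red=0.05, nir=0.40, swir=0.20)

for method in ("mndwi", "ndvi", ""):
    print(repr(method), keep_pixel(**water, mask_method2=method),
          keep_pixel(**vegetation, mask_method2=method))
```

In this example the vegetation-like pixel is removed by either index and kept only when no secondary mask is applied.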

--n-folds

The number of folds to create for the training / validation set when fitting models using k-fold cross-validation.

Default: 5

--seed

The seed (an integer) used to initialize the pseudorandom number generator for use in partitioning data.

Default: 123
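To show how --n-folds and --seed interact, here is a minimal sketch of seeded k-fold partitioning using only the standard library. It is a stand-in for whatever partitioning routine the script actually uses, which is not described here; the function name and the sample count are hypothetical.

```python
import random


def make_folds(n_samples, n_folds=5, seed=123):
    """Shuffle sample indices with a fixed seed and split them into folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


folds = make_folds(n_samples=23)
for k, validation in enumerate(folds):
    # Each fold serves once as the validation set; the rest is training data.
    training = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(f"fold {k}: {len(training)} training, {len(validation)} validation")
```

Because the generator is seeded, rerunning with the same seed and fold count reproduces the same partition, which keeps results comparable across models in the grid.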