06a-download-chips-for-qa.py

This script downloads the chips generated in bin/05-prep-qa-chip-dataset.py to the local filesystem for sample QA/QC. PNGs depicting the Sentinel 2 image and the water mask corresponding to each training sample are written to a new directory: data/qa_chips/<buffer-distance>m_cloudthr<cloud-thr>_<mask-method1><mask-method2>_masking/.

Once the script has run, the data scientist can go through the saved PNG files and delete the images that correspond to poor-quality samples. Each PNG is named after the unique ID of the sample it represents, so in subsequent steps the high-quality training samples to keep for model training can be identified by which files remain on the local filesystem. It is recommended that this script be run on a local machine rather than on an Azure VM, since previewing and deleting files is easier locally.
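The directory naming and the keep-what-remains workflow can be sketched as below. This is a minimal illustration, not the script's own code: qa_chip_dir and kept_sample_ids are hypothetical helpers, and the sketch assumes each PNG is named exactly "<sample_id>.png".

```python
from __future__ import annotations

from pathlib import Path


def qa_chip_dir(buffer_distance: int, cloud_thr: int,
                mask_method1: str, mask_method2: str,
                root: str = "data/qa_chips") -> Path:
    """Build the QA output directory following the documented naming pattern."""
    # mask_method2 may be "" (no secondary mask); then only mask_method1
    # appears before the "_masking" suffix.
    name = f"{buffer_distance}m_cloudthr{cloud_thr}_{mask_method1}{mask_method2}_masking"
    return Path(root) / name


def kept_sample_ids(chip_dir: Path) -> set[str]:
    """Return IDs of samples whose PNGs survived manual review.

    Hypothetical helper: assumes each file is named '<sample_id>.png'.
    """
    return {p.stem for p in chip_dir.glob("*.png")}
```

After manual review, something like kept_sample_ids(qa_chip_dir(500, 80, "lulc", "mndwi")) would yield the set of IDs to carry forward into training.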

usage: 06a-download-chips-for-qa.py [-h] [--day-tolerance DAY_TOLERANCE]
                                    [--cloud-thr CLOUD_THR]
                                    [--buffer-distance BUFFER_DISTANCE]
                                    [--mask-method1 {lulc,scl}]
                                    [--mask-method2 {mndwi,ndvi,""}]
                                    [--composite {rgb,cir,swir}]
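The interface above can be reproduced with Python's standard argparse module. This is a sketch under stated assumptions, not the script's actual source: build_parser is a hypothetical name, and the numeric types (int) are assumed; the flags, choices, and defaults are the ones documented on this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; types are assumptions.
    parser = argparse.ArgumentParser(prog="06a-download-chips-for-qa.py")
    parser.add_argument("--day-tolerance", type=int, default=8,
                        help="Days of search around sample date for a matching Sentinel 2 image.")
    parser.add_argument("--cloud-thr", type=int, default=80,
                        help="Maximum acceptable percent cloud cover in the Sentinel 2 tile.")
    parser.add_argument("--buffer-distance", type=int, default=500,
                        help="Square search radius (in meters); sets the chip size.")
    parser.add_argument("--mask-method1", choices=["lulc", "scl"], default="lulc",
                        help="Primary source for identifying water pixels.")
    # "" is a valid choice and disables the secondary index mask.
    parser.add_argument("--mask-method2", choices=["mndwi", "ndvi", ""], default="mndwi",
                        help="Optional normalized index used to refine the water mask.")
    parser.add_argument("--composite", choices=["rgb", "cir", "swir"], default="rgb",
                        help="Color composite written to the QA PNGs.")
    return parser
```

For example, build_parser().parse_args(["--composite", "cir"]) keeps every other default and switches the QA images to color infrared. Passing "--mask-method2" followed by an empty string turns the secondary mask off.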

Named Arguments

--day-tolerance

Number of days around the sample date to search for a matching Sentinel 2 image.

Default: 8
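One plausible reading of the tolerance is a symmetric window of day_tolerance days on either side of the sample date. The sketch below assumes that reading; search_window and to_interval are hypothetical helpers, and the "start/end" string is the interval form accepted by, for example, STAC API datetime filters, not necessarily what the pipeline uses.

```python
from __future__ import annotations

from datetime import date, timedelta


def search_window(sample_date: date, day_tolerance: int = 8) -> tuple[date, date]:
    """Inclusive date range searched for a matching image (assumed symmetric)."""
    delta = timedelta(days=day_tolerance)
    return sample_date - delta, sample_date + delta


def to_interval(window: tuple[date, date]) -> str:
    # Render the window as an ISO 8601 "start/end" interval string.
    start, end = window
    return f"{start.isoformat()}/{end.isoformat()}"
```

With the default tolerance, a sample taken on 2021-06-15 would be matched against images acquired between 2021-06-07 and 2021-06-23.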

--cloud-thr

Maximum percentage of cloud cover acceptable in the Sentinel 2 tile corresponding to the sample. If this threshold is exceeded, no Sentinel 2 image chip is collected for the sample.

Default: 80
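The threshold acts as a pass/fail filter on candidate scenes. The sketch below assumes each candidate is a dict carrying an "eo:cloud_cover" percentage (the field name used by the STAC EO extension); select_scene is a hypothetical helper, and how the real script chooses among several acceptable scenes is not documented here, so this version simply keeps the least cloudy one.

```python
from __future__ import annotations

from typing import Optional


def select_scene(scenes: list[dict], cloud_thr: float = 80) -> Optional[dict]:
    """Return an acceptable scene, or None if every scene exceeds cloud_thr."""
    # A scene passes only if its cloud cover does not surpass the threshold.
    acceptable = [s for s in scenes if s["eo:cloud_cover"] <= cloud_thr]
    if not acceptable:
        return None  # no chip will be collected for this sample
    # Assumption: prefer the least cloudy of the acceptable scenes.
    return min(acceptable, key=lambda s: s["eo:cloud_cover"])
```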

--buffer-distance

Square search radius (in meters) to use for reflectance data aggregation. This determines the size of the image chip that will be extracted and processed.

Default: 500
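Read literally, a square "radius" of buffer_distance meters means the chip extends that far from the sample point in every direction, so the default of 500 m gives a chip 1,000 m on a side, or about 100 by 100 pixels at Sentinel 2's finest (10 m) resolution. The sketch assumes the point is expressed in a projected CRS with meter units (for example UTM); chip_bounds and chip_size_pixels are hypothetical helpers.

```python
def chip_bounds(x: float, y: float, buffer_distance: float = 500):
    """Square bounding box (minx, miny, maxx, maxy) centred on the sample point.

    Assumes x and y are in a projected CRS whose units are meters.
    """
    return (x - buffer_distance, y - buffer_distance,
            x + buffer_distance, y + buffer_distance)


def chip_size_pixels(buffer_distance: float = 500, resolution: float = 10) -> int:
    # Side length of the square chip in pixels at the given ground resolution.
    return int(round(2 * buffer_distance / resolution))
```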

--mask-method1

Possible choices: lulc, scl

Which data to use for masking (removing) non-water pixels so that aggregated reflectance values are calculated for water pixels only? Choose “scl” to identify water pixels using the Scene Classification Layer that accompanies the Sentinel 2 tile, or “lulc” to identify water with Impact Observatory’s Land-Use/Land-Cover layer and use the Scene Classification Layer only to remove clouds. Using “lulc” is strongly recommended.

Default: “lulc”
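The two options can be sketched with NumPy as boolean masks over co-registered rasters. The SCL codes below follow the Sentinel 2 Level-2A convention (6 = water; 3 = cloud shadow; 8 and 9 = medium- and high-probability cloud; 10 = thin cirrus). Two further assumptions: that the Impact Observatory LULC layer codes water as 1, and that the LULC raster has already been resampled onto the Sentinel 2 grid. water_mask is a hypothetical helper.

```python
from __future__ import annotations

from typing import Optional

import numpy as np

SCL_WATER = 6
SCL_CLOUD_CODES = (3, 8, 9, 10)  # cloud shadow, medium/high-probability cloud, cirrus
LULC_WATER = 1  # assumed water class code in the Impact Observatory LULC product


def water_mask(scl: np.ndarray, lulc: Optional[np.ndarray] = None,
               method: str = "lulc") -> np.ndarray:
    """Boolean mask that is True where a pixel is treated as water."""
    if method == "scl":
        # Water as labelled by the Scene Classification Layer itself.
        return scl == SCL_WATER
    if method == "lulc":
        if lulc is None:
            raise ValueError("method='lulc' requires an LULC raster")
        # Water from LULC, with cloudy pixels removed using the SCL.
        not_cloud = ~np.isin(scl, SCL_CLOUD_CODES)
        return (lulc == LULC_WATER) & not_cloud
    raise ValueError(f"unknown mask method: {method!r}")
```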

--mask-method2

Possible choices: mndwi, ndvi, “”

Which additional normalized index, if any, to use to refine the mask by removing errors of commission (pixels classified as water that should not be) before calculating aggregated reflectance? If “ndvi”, only pixels with an NDVI value less than 0.25 are retained. If “mndwi” (recommended), only pixels with an MNDWI value greater than 0 are retained. If “”, no secondary mask is used.

Default: “mndwi”
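The secondary mask can be sketched using the standard index definitions, MNDWI = (Green − SWIR1) / (Green + SWIR1) and NDVI = (NIR − Red) / (NIR + Red). For Sentinel 2 these are typically computed from B03, B11, B08, and B04. The sketch assumes all bands are already on a common grid, which is an assumption because B11 is natively 20 m while the others are 10 m. refine_mask is a hypothetical helper that applies the documented thresholds.

```python
import numpy as np


def normalized_difference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """(a - b) / (a + b); pixels where a + b == 0 come out as NaN."""
    a = a.astype("float64")
    b = b.astype("float64")
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a - b) / (a + b)


def refine_mask(mask, green, red, nir, swir1, method2="mndwi"):
    """Drop likely non-water pixels from an existing boolean water mask."""
    if method2 == "mndwi":
        # Keep pixels with MNDWI > 0 (NaN comparisons evaluate to False).
        return mask & (normalized_difference(green, swir1) > 0)
    if method2 == "ndvi":
        # Keep pixels with NDVI < 0.25.
        return mask & (normalized_difference(nir, red) < 0.25)
    if method2 == "":
        return mask  # no secondary mask
    raise ValueError(f"unknown secondary mask: {method2!r}")
```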

--composite

Possible choices: rgb, cir, swir

Which color composite to download for the images used in QA: true color (“rgb”), color infrared (“cir”), or short-wave infrared (“swir”).

Default: “rgb”
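A composite is three bands stacked into the red, green, and blue channels of the PNG. The band assignments below are common Sentinel 2 conventions, not confirmed choices of this script. The sketch also assumes surface reflectance scaled to 0-1 and a simple linear stretch that saturates at 0.3. COMPOSITE_BANDS and to_uint8_composite are hypothetical names.

```python
import numpy as np

# Commonly used Sentinel 2 band combinations (assumed; the script may differ).
COMPOSITE_BANDS = {
    "rgb": ("B04", "B03", "B02"),   # true color: red, green, blue
    "cir": ("B08", "B04", "B03"),   # color infrared: NIR, red, green
    "swir": ("B12", "B8A", "B04"),  # short-wave infrared: SWIR2, narrow NIR, red
}


def to_uint8_composite(bands: dict, composite: str = "rgb",
                       max_reflectance: float = 0.3) -> np.ndarray:
    """Stack three bands into an (H, W, 3) uint8 array suitable for PNG export."""
    stack = np.dstack([bands[name] for name in COMPOSITE_BANDS[composite]]).astype("float64")
    # Linear stretch: 0 maps to 0 and max_reflectance (or above) maps to 255.
    scaled = np.clip(stack / max_reflectance, 0.0, 1.0)
    return (scaled * 255).round().astype(np.uint8)
```

The resulting array could then be written with Pillow, e.g. Image.fromarray(img).save(f"{sample_id}.png"), where sample_id stands for the sample's unique ID.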