Classes

class src.fluvius.USGS_Water_DB(verbose=False)

A custom class for querying the http://nrtwq.usgs.gov data portal and storing the results in Pandas DataFrame format.

__init__(verbose=False)

Initializes the class by creating the web driver and setting the source URL.

Parameters

verbose (bool) – Sets the verbosity of the web scraping query.

class src.fluvius.USGS_Station(site_no, instantaneous=False, verbose=False, year_range=array([2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]))

A custom class for storing USGS Station data. Specific functions collect station instantaneous and modeled discharge and suspended sediment concentration.

__init__(site_no, instantaneous=False, verbose=False, year_range=array([2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]))

Initializes the USGS_Station class based on user-provided parameters.

Parameters
  • site_no (str) – The 8-digit, zero-padded USGS station site number.

  • instantaneous (bool) – Sets data query for instantaneous recorded data only.

  • verbose (bool) – Sets the query verbosity.

  • year_range (numpy int array) – Numpy array of the years to search.
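As a minimal sketch of preparing these arguments, the snippet below zero-pads a site number and builds the default year range. The helper normalize_site_no is hypothetical and not part of fluvius; the years are built as a plain list here, whereas the class expects a numpy array (numpy.arange(2013, 2022) would give the same values).

```python
def normalize_site_no(site_no):
    """Return the site number as a zero-padded string of at least 8 digits.

    Hypothetical helper for illustration; it is not part of fluvius.
    """
    text = str(site_no).strip()
    if not text.isdigit():
        raise ValueError(f"site number must contain only digits, got {site_no!r}")
    return text.zfill(8)


# Years 2013 through 2021 inclusive, matching the default shown in the signature.
years = list(range(2013, 2022))

# A 7-digit number gains one leading zero.
site_no = normalize_site_no(1646500)
print(site_no, years[0], years[-1])
```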

class src.fluvius.WaterData(data_source, container, buffer_distance, storage_options)

A custom class for collecting a database of water quality station data.

__init__(data_source, container, buffer_distance, storage_options)

Initializes the WaterData class to open the database of stored station data for data inspection and individual station data loading.

Parameters
  • data_source (str) – String denoting station source (‘usgs’, ‘usgsi’, ‘itv’, or ‘ana’).

  • container (str) – Name of Azure Blob Storage container where water station data is stored.

  • buffer_distance (int) – Buffer distance around station latitude and longitude in meters for data visualization and query.

  • storage_options (dict) – Azure Blob Storage permissions for data loading, in the format {‘account_name’: ‘storage account name’, ‘account_key’: ‘storage account key’}.
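The following sketch shows one way to assemble and check these arguments before constructing WaterData. The environment variable names and both helper functions are assumptions made for illustration; only the dictionary keys and the four data_source values come from the parameter descriptions above.

```python
import os

# The four station sources listed for data_source.
VALID_SOURCES = ("usgs", "usgsi", "itv", "ana")


def check_data_source(data_source):
    """Return data_source unchanged, or raise if it is not a documented value."""
    if data_source not in VALID_SOURCES:
        raise ValueError(f"data_source must be one of {VALID_SOURCES}, got {data_source!r}")
    return data_source


def build_storage_options(env=None):
    """Build the storage_options dict from a mapping of environment variables.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    return {
        "account_name": env["AZURE_STORAGE_ACCOUNT_NAME"],
        "account_key": env["AZURE_STORAGE_ACCOUNT_KEY"],
    }


# Placeholder values stand in for real credentials.
storage_options = build_storage_options(
    {
        "AZURE_STORAGE_ACCOUNT_NAME": "storage account name",
        "AZURE_STORAGE_ACCOUNT_KEY": "storage account key",
    }
)
data_source = check_data_source("usgs")
```

Reading the key from the environment keeps it out of source files and notebooks.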

get_available_station_list()

Searches the blob container and saves the list of directories containing field measurement data.

get_source_df()

Returns a dataframe <pandas.DataFrame> containing metadata for each station.

get_station_data(station=None)

Extracts measurement and metadata for a station. Gets all the station data if station is None.

Parameters

station (str, default=None) – The station for which to extract data. If None, iteratively extracts information for all stations and sets a list of the resulting WaterStations as an attribute.

class src.fluvius.WaterStation(site_no, lat, lon, buffer, container, storage_options, data_source)

A generalized water station data class. Each WaterStation contains SSC and other measurement data, as well as metadata about the sampling location itself.

__init__(site_no, lat, lon, buffer, container, storage_options, data_source)

Initialize a WaterStation based on a site that appears in WaterData.

Parameters
  • site_no (str) – The string identifier corresponding to the site for which a WaterStation instance will be created.

  • lat (double) – The latitude of the site’s coordinates.

  • lon (double) – The longitude of the site’s coordinates.

  • buffer (int) – The buffer used for generating a square AOI that will be used to extract image chips.

  • container (str) – The storage container in fluviusdata storage account that houses raw data for the site.

  • storage_options (dict) – Azure Blob Storage permissions for data loading, in the format {‘account_name’: ‘storage account name’, ‘account_key’: ‘storage account key’}.

  • data_source (str) – One of itv, ana, usgs, or usgsi, describing the original source of the data.

build_catalog(collection='sentinel-2-l2a')

Use pystac-client to search for Sentinel 2 L2A data.

Parameters

collection (str, default='sentinel-2-l2a') – The collection name (as it appears in the Planetary Computer data catalogue). The default value should be used.
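A rough sketch of what such a search can look like with pystac-client is shown below. It is not the fluvius implementation: the function names, the bounding box, and the date range are illustrative assumptions. The pystac_client import is deferred so the parameter-building part runs without the package or network access.

```python
# Public STAC endpoint of the Planetary Computer.
STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def build_search_params(bbox, start_date, end_date, collection="sentinel-2-l2a"):
    """Assemble keyword arguments accepted by pystac_client's Client.search."""
    return {
        "collections": [collection],
        "bbox": list(bbox),  # [min_lon, min_lat, max_lon, max_lat]
        "datetime": f"{start_date}/{end_date}",  # closed date interval
    }


def search_catalog(params):
    """Run the search and return the matching items (needs pystac-client and network)."""
    from pystac_client import Client  # deferred import, see lead-in

    client = Client.open(STAC_URL)
    return client.search(**params).item_collection()


# Illustrative bounding box and dates, not taken from fluvius.
params = build_search_params((-77.2, 38.9, -77.0, 39.1), "2013-01-01", "2021-12-31")
```

Calling search_catalog(params) would return the matching items; their asset URLs still need to be signed by the Planetary Computer before they can be read.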

chip_cloud_analysis(scl)

Calculate the total cloud cover percentage within an SCL image chip.

Parameters

scl (numpy array) – The numpy array containing Scene Classification Layer (SCL) data.
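The calculation can be sketched as the share of chip pixels that fall in cloud classes. Which SCL classes fluvius counts as cloud is not stated here, so the set below (8, 9, and 10) is an assumption. The function name is also hypothetical; it accepts nested lists or a 2D numpy array, since it only iterates over rows.

```python
# Assumption: SCL classes 8 (cloud medium probability), 9 (cloud high probability)
# and 10 (thin cirrus) are treated as cloud. fluvius may use a different set.
CLOUD_CLASSES = frozenset({8, 9, 10})


def cloud_cover_percent(scl):
    """Return the percentage of pixels in a 2D SCL chip that are in CLOUD_CLASSES."""
    values = [int(v) for row in scl for v in row]
    if not values:
        return 0.0  # an empty chip has no cloud by convention here
    cloudy = sum(1 for v in values if v in CLOUD_CLASSES)
    return 100.0 * cloudy / len(values)


# A 2 x 4 chip: mostly water (6), one vegetation pixel (4), three cloud pixels.
chip = [
    [6, 6, 8, 9],
    [6, 4, 10, 6],
]
pct = cloud_cover_percent(chip)  # 3 cloudy pixels out of 8
```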

drop_bad_usgs_obs()

Some stations from USGS have two measurements of instantaneous computed discharge. This method drops observations for which the two measurements are not equal. Note that the method only applies to “usgs” stations. If WaterStation.data_source is not equal to “usgs”, the method does nothing, so it can be safely applied to WaterStations from any data source with minimal performance impact.
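The behaviour described above can be sketched with plain records. The column names discharge_1 and discharge_2 and the function name are hypothetical, since the actual field names are not given here; with a pandas DataFrame the same filter would be a boolean comparison of the two columns.

```python
def drop_mismatched_discharge(rows, data_source, col_a="discharge_1", col_b="discharge_2"):
    """Keep only observations whose two discharge values agree.

    Rows are returned unfiltered unless data_source is exactly "usgs".
    Column names are placeholders for illustration.
    """
    if data_source != "usgs":
        return list(rows)
    return [row for row in rows if row[col_a] == row[col_b]]


observations = [
    {"date": "2019-06-10", "discharge_1": 120.0, "discharge_2": 120.0},
    {"date": "2019-06-11", "discharge_1": 95.0, "discharge_2": 101.0},
]

kept = drop_mismatched_discharge(observations, "usgs")      # second row is dropped
untouched = drop_mismatched_discharge(observations, "itv")  # other sources pass through
```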

get_area_of_interest()

Create and return a polygon delineating the desired chip boundary.
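One simple way to build such a polygon is sketched below. It treats the buffer as a half-width in meters and converts it to degrees with a spherical approximation; fluvius may instead project to a metric coordinate system, so the function and its constant are assumptions for illustration only.

```python
import math

# Approximate length of one degree of latitude in meters.
METERS_PER_DEGREE_LAT = 111_320.0


def square_aoi(lat, lon, buffer_m):
    """Return a GeoJSON-style Polygon for a square centered on (lat, lon).

    buffer_m is taken as the half-width of the square. The approximation
    degrades near the poles, where cos(latitude) approaches zero.
    """
    dlat = buffer_m / METERS_PER_DEGREE_LAT
    dlon = buffer_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    west, east = lon - dlon, lon + dlon
    south, north = lat - dlat, lat + dlat
    # Counterclockwise ring, closed by repeating the first vertex.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {"type": "Polygon", "coordinates": [ring]}


# Illustrative station location with a 500 m buffer.
aoi = square_aoi(38.95, -77.13, 500)
```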

get_chip_features(write_chips=False, local_save_dir='data/chips_default', mask_method1='lulc', mask_method2='')

Extract and aggregate reflectance and metadata features for each of the SSC measurements at the WaterStation.

Parameters
  • write_chips (bool, default=False) – Write chips to local storage? Useful for debugging.

  • local_save_dir (str, default="data/chips_default") – If write_chips=True, the directory to which image chips will be saved.

  • mask_method1 (str, default="lulc") – Which data to use for masking (removing) non-water pixels in order to calculate aggregated reflectance values for only water pixels? Choose “scl” to use water pixels as identified by the Scene Classification Layer that accompanies the Sentinel tile, or “lulc” to use Impact Observatory’s Land-Use/Land-Cover layer to identify water and the Scene Classification Layer to remove clouds. Using “lulc” is strongly recommended.

  • mask_method2 (str, default="") – Which additional normalized index to use, if any, to update the mask and remove errors of commission (pixels classified as water that shouldn’t be) prior to calculating aggregated reflectance? If “ndvi”, then only pixels with an NDVI value less than 0.25 will be retained. If “mndwi” (recommended), then only pixels with an MNDWI value greater than 0 will be retained. If “”, then no secondary mask is used.
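The secondary mask can be illustrated per pixel with the standard index definitions, NDVI = (NIR - Red) / (NIR + Red) and MNDWI = (Green - SWIR) / (Green + SWIR), together with the thresholds given above. The function names, the band keys, and the choice of SWIR band are assumptions; this is a sketch, not the fluvius code.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def mndwi(green, swir):
    """Modified Normalized Difference Water Index for one pixel."""
    total = green + swir
    return (green - swir) / total if total else 0.0


def keep_pixel(bands, mask_method2=""):
    """Decide whether a pixel already flagged as water survives the secondary mask."""
    if mask_method2 == "ndvi":
        return ndvi(bands["nir"], bands["red"]) < 0.25
    if mask_method2 == "mndwi":
        return mndwi(bands["green"], bands["swir"]) > 0
    if mask_method2 == "":
        return True  # no secondary mask
    raise ValueError(f"unknown mask_method2: {mask_method2!r}")


# Illustrative surface reflectances, not real measurements.
water = {"green": 0.08, "red": 0.06, "nir": 0.03, "swir": 0.01}
vegetation = {"green": 0.05, "red": 0.04, "nir": 0.40, "swir": 0.20}

water_kept = keep_pixel(water, "mndwi")            # MNDWI is positive
vegetation_kept = keep_pixel(vegetation, "mndwi")  # MNDWI is negative
```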

get_chip_metadata(signed_url)

Extract metadata (sensing time, and mean solar and viewing azimuths and zeniths) for an image chip.

Parameters

signed_url (str) – The Planetary-Computer-signed url for the Sentinel 2 image asset.

get_cloud_filtered_image_df(cloud_thr)

Get a dataframe with asset URLs corresponding to images whose cloud cover is no greater than cloud_thr, and set it as an attribute of the WaterStation.

Parameters

cloud_thr (float, 0-100) – The maximum cloud cover tolerance. The data frame will only include information on assets from Sentinel 2 tiles that have a total cloud cover percentage less than or equal to cloud_thr.
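A minimal sketch of this filter is shown below. It assumes each asset is a plain dictionary carrying the STAC eo:cloud_cover property alongside its URL; the function name and the example URLs are hypothetical.

```python
def filter_by_cloud_cover(assets, cloud_thr):
    """Keep assets whose tile-level cloud cover is at most cloud_thr percent."""
    if not 0 <= cloud_thr <= 100:
        raise ValueError("cloud_thr must be between 0 and 100")
    return [asset for asset in assets if asset["eo:cloud_cover"] <= cloud_thr]


# Placeholder records; real entries would come from the catalog search.
assets = [
    {"href": "https://example.com/tile_a.tif", "eo:cloud_cover": 3.2},
    {"href": "https://example.com/tile_b.tif", "eo:cloud_cover": 80.0},
    {"href": "https://example.com/tile_c.tif", "eo:cloud_cover": 80.1},
]

clear = filter_by_cloud_cover(assets, 80)  # tile_c exceeds the threshold
```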

get_io_lulc_chip(epsg)

Extract the Impact Observatory Land-Use/Land-Cover chip for the WaterStation AOI. Returns a numpy array.

Parameters

epsg (int) – The EPSG code to which the IO/LULC data should be projected.

get_scl_chip(signed_url, return_meta_transform=False)

Extract the Scene Classification Layer (SCL) chip for the WaterStation AOI. Returns a numpy array.

Parameters
  • signed_url (str) – The Planetary-Computer-signed url for the Sentinel 2 SCL image asset.

  • return_meta_transform (bool) – Return image transformation metadata? If True, then a 3-tuple is returned. The first element is the SCL chip, the second is the raster metadata, and the third is the raster transform.

get_spectral_chip(hrefs_10m, hrefs_20m, return_meta_transform=False)

Returns an image (as a numpy array) with one or more bands along the third dimension, read from signed URLs (one per band).

Parameters
  • hrefs_10m (list) – hrefs for the 10 meter Sentinel bands.

  • hrefs_20m (list) – hrefs for the 20 meter Sentinel bands.

get_visual_chip(signed_url, return_meta_transform=False)

Extract the visual chip (RGB) for the WaterStation AOI. Returns a Pillow Image.

Parameters
  • signed_url (str) – The Planetary-Computer-signed url for the Sentinel 2 image asset.

  • return_meta_transform (bool) – Return image transformation metadata? If True, then a 3-tuple is returned. The first element is the visual chip, the second is the raster metadata, and the third is the raster transform.

merge_image_df_with_samples(day_tolerance=8)

Match Sentinel 2 assets to in situ SSC measurements. Returns a data frame with asset information that is matched to in situ SSC measurements and sets it as an attribute for the WaterStation.

Parameters

day_tolerance (float, default=8) – The maximum allowable number of days between an SSC measurement and Sentinel 2 acquisition for the acquisition to be matched to the SSC measurement.
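The tolerance rule can be sketched with the standard library. The version below pairs each sample with every acquisition inside the window and treats the bound as inclusive; whether fluvius keeps all such pairs or only the nearest one is not stated here, so that choice and the function name are assumptions.

```python
from datetime import datetime, timedelta


def match_samples_to_images(sample_dates, image_dates, day_tolerance=8):
    """Pair each sample date with every acquisition within day_tolerance days."""
    tolerance = timedelta(days=day_tolerance)
    matches = []
    for sample in sample_dates:
        for image in image_dates:
            if abs(image - sample) <= tolerance:
                matches.append((sample, image))
    return matches


samples = [datetime(2019, 6, 10)]
images = [
    datetime(2019, 6, 1),   # 9 days before the sample, outside the window
    datetime(2019, 6, 5),   # 5 days before, inside
    datetime(2019, 6, 18),  # 8 days after, on the boundary
    datetime(2019, 6, 25),  # 15 days after, outside
]

pairs = match_samples_to_images(samples, images)
```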

perform_chip_cloud_analysis(quiet=False)

Get the percentage of clouds for each chip corresponding to the WaterStation SSC measurement dates.

Parameters

quiet (bool) – Print informational output during processing?

class src.utils.MultipleRegression(num_features, layer_out_neurons, activation_function)

A custom class for use in PyTorch that builds an MLP model based on user-supplied parameters.

__init__(num_features, layer_out_neurons, activation_function)

Initializes the MLP based on user-provided parameters.

Parameters
  • num_features (int) – The number of features for the model (i.e., the number of neurons in the first layer).

  • layer_out_neurons (list of int) – A list of length equal to the desired number of hidden layers in the MLP, with elements corresponding to the number of neurons desired for each layer.

  • activation_function (function) – The function (from torch.nn) to use for activation layers in the MLP.
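To show how these parameters could translate into layer dimensions, the sketch below computes the (inputs, outputs) pair for each linear layer. It assumes a single-output regression head after the hidden layers, which is not confirmed by this reference; the function name is hypothetical. In PyTorch each pair would correspond to one torch.nn.Linear(in_features, out_features), with the activation function applied between layers.

```python
def layer_shapes(num_features, layer_out_neurons, n_outputs=1):
    """Return (in_features, out_features) for each linear layer of the MLP.

    n_outputs=1 is an assumption: a single regression target after the hidden layers.
    """
    sizes = [num_features, *layer_out_neurons, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


# Ten input features feeding two hidden layers of 32 and 16 neurons.
shapes = layer_shapes(10, [32, 16])
for in_features, out_features in shapes:
    print(f"Linear({in_features}, {out_features})")
```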