
Build USGS Landsat Collection 2 Capability

Background

  • The USGS released Collection 2 in December 2020.
  • Imagery acquired from 1 January 2022 will only be available as Collection 2. https://www.usgs.gov/center-news/landsat-collection-1-forward-processing-cease-end-2021
  • So we are planning for the USGS to stop providing Collection 1 in December 2021.
  • Landsat 9 was successfully launched at the end of September; this task also includes preparing to ingest Landsat 9 into our systems.

There are many tasks to achieve this goal. The tasks can be divided into two main areas:

  1. the science
  2. the operations

On operations, this change presents a great opportunity to review and refresh the systems (database tables, scripts, code repository, etc)

Current status

18 October 2021

Work is progressing well. NSW have installed and will soon start to test more rigorously.

Landsat 9 launched successfully in late September; data is expected to start flowing around December.

Landsat 7 to be decommissioned.

13 July 2021

We are planning on making a 'minimal switch' to USGS Landsat Collection 2 to ensure we are on track for a December 2021 switch to Collection 2.

The minimal switch involves using the Collection 2, Level 1 (at-sensor radiance) as a drop-in replacement for Collection 1, Level 1.

https://gitlab.com/jrsrp/sys/usgslandsat/collection2-research/-/issues/2

Science tasks

Task: Comparisons of geometric registration of Level-1 imagery, between Collection-1 and Collection-2.
Who: NF
Details: I do not anticipate any serious differences, but it would be useful to know this before doing radiometric comparisons, as being a pixel off would add some noise which we would be better off without. I will use some parts of my existing autorectify code, so I think this should be fairly straightforward. I will do a check of one cloud-free date, for every path/row in Australia, as I believe that they rectify each path/row independently.
Status: Check done. Registration errors are typically on the order of a few metres, over all scenes, i.e. a negligible difference.

Task: Comparisons of radiometric agreement for Level-1 (i.e. at-sensor radiance), between Collection-1 and Collection-2.
Who: NF
Details: This will check whether they have changed any systematic stuff like the sensor calibration. In principle, there is a time component to the calibration, so I will probably do comparisons at a few times in the whole time series for each sensor, in a few different path/rows. I will also look at including the thermal bands, but I don't think we use those for much (Fmask?), so not too important. If there are no serious differences, then JRSRP should be able to use the C-2 radiance in place of the C-1 radiance without much fuss. If not, that will require more serious thought.
Status: Completed. The C-2 radiance matches C-1 very closely, quite sufficient to use interchangeably.

Task: Comparisons of Level-2 (i.e. surface reflectance) between C-2 and our existing JRSRP surface reflectance product.
Who: NF
Details: The C-1 dataset did not routinely include a surface reflectance product, so this is a new comparison for us. I do not expect major differences, but I also don't expect them to be identical; I would expect minor systematic differences between them. We will need to compare a range of land pixels, in a range of angular configurations and atmospheric conditions. If it turns out there is a lot of noise in this, then we may need to break this down a bit, to check their atmospheric corrections and their BRDF corrections separately, but I will see how that goes. If we are able to fit a systematic tweak to make them match, then we may consider using their surface reflectance in place of ours, but that is a long way down the track.
Status: Depressing. They don't seem to do BRDF correction at all. See below.

Task: Comparison of the Fmask cloud mask with the USGS cloud mask.

Task: Write up investigations.
Who: NF + ?
Details: I would expect to put these onto the JRSRP Figshare site. Will need to assign someone else to the write-up if Neil lacks the energy to do so.

Task: Landsat 9 preparations.
Details: Includes spectral response functions for atmospheric correction in 6S, file naming, and updating scripts.
Status: 8.12.2021. Neil has found the response curves and will update 6S. https://landsat.gsfc.nasa.gov/satellites/landsat-9/landsat-9-instruments/oli-2-design/oli-2-relative-spectral-response/

Rectification

16 June update from Neil: no significant shift between C1 and C2.

Radiance

16 June update from Neil: early days, but they look similar. However, it is too early to draw any conclusions.

23 June update from Neil: looking good. Quite optimistic that Collection 2 will be a drop-in replacement for Collection 1.

12 July 2021 from Neil: Proposes using Collection 2, Level 1 as a drop-in replacement for Collection 1, Level 1.

Reflectance

A comparison with JRSRP surface reflectance.

23 June update from Neil: in progress.

June 30 update: It appears, from my reading of their Level-2 doco, that they are not doing any BRDF correction at all, either for sun or view angle, or for terrain illumination. This is disappointing. So, as a result, I will need to include a full assessment of the BRDF effects in their imagery, and check that we can use existing methods to correct these on the Level-2 imagery.

Operational systems

Task: Review requirements.
Who: TG +
Status: In progress.

Task: Explore capabilities of the m2m interface.
Who: NF & SG
Status: Complete.

Task: Scope download rates and costs.
Details: What are they on Athena and SDC? Compare with the costs and rates of the requester-pays AWS S3 bucket. https://www.usgs.gov/core-science-systems/nli/landsat/landsat-commercial-cloud-data-access
Status: In progress. Need to review speeds on the NSW AARNet connection.

Task: Review existing tables.
Who: TG
Status: Complete.

Task: Update existing scripts in NSW to match updated tables.
Who: RA
Status: In progress.

Task: Design system.
Who: NF, SG, RA, TG
Status: Complete.

Task: Implement the bits.
Status: Pretty much complete, except for any final details.

Task: Create gitlab repository and deployment mechanisms.
Who: all
Status: In progress. See the proposed repository structure below.

Task: Should we use Level 1 or Level 2 products?
Status: Complete. Initial decision is to move ahead with Level 1. Neil is investigating the science on Level 2.

Task: Test python-fmask on C2.
Who: NF
Details: Xavier in Colombia reported that it does not work on C2, even when filenames are accounted for (https://github.com/ubarsc/python-fmask/issues/44). Xavier knows his stuff, so this needs to be investigated.
Status: After checks by NF, it seems it was a false alarm; it works fine.

Task. Design system

Database tables

Two tables are required:

  • landsat_list
  • usgslandsat_donotimport

landsat_list

Notes:

  • wwwname is removed from the NSW table
  • only retain records for the 're' product
  • tarfile is brought across from landsat_source, which is used exclusively by NSW
  • entityid is brought across from usgs_landsat_list (and usgs_landsat_list_c1), where it is sceneID; it matches the entityId used by the USGS in the m2m interface
  • geom is brought across from footprints, so as to consolidate all footprints in one table
Column Type
satellite character(2)
instrument character(2)
product character(2)
date text
scene text
scene_date text
hour real
id integer
path integer
row integer
data_source text
acqdate timestamp with time zone
tarfile text
entity_id text
geom Polygon
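
As a sketch, the landsat_list DDL can be generated from the column list above. The `create_table_ddl` helper is hypothetical, and the PostGIS geometry type used for geom is an assumption (the table above just says "Polygon").

```python
# Sketch: generate the CREATE TABLE statement for landsat_list from the
# column/type pairs listed above. PostgreSQL syntax assumed; the
# geometry(Polygon) type assumes PostGIS is installed.
LANDSAT_LIST_COLUMNS = [
    ("satellite", "character(2)"),
    ("instrument", "character(2)"),
    ("product", "character(2)"),
    ("date", "text"),
    ("scene", "text"),
    ("scene_date", "text"),
    ("hour", "real"),
    ("id", "integer"),
    ("path", "integer"),
    ("row", "integer"),
    ("data_source", "text"),
    ("acqdate", "timestamp with time zone"),
    ("tarfile", "text"),
    ("entity_id", "text"),
    ("geom", "geometry(Polygon)"),  # assumption: PostGIS geometry type
]

def create_table_ddl(name, columns):
    """Build a CREATE TABLE statement from (column, type) pairs."""
    cols = ",\n    ".join(f"{col} {typ}" for col, typ in columns)
    return f"CREATE TABLE {name} (\n    {cols}\n);"

if __name__ == "__main__":
    print(create_table_ddl("landsat_list", LANDSAT_LIST_COLUMNS))
```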

usgslandsat_donotimport

Notes:

  • No changes from existing table
Column Type
scene_id text
reason text

Downloading

Functionality:

  • determine what to download
  • download the files
  • optionally start the importer

Scripts:

  • usgs_findnewsceneids_m2m.py
  • usgs_downloadfromAWS.py

Modules:

  • usgsm2m.py
  • usgss3.py
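
A minimal sketch of what usgsm2m.py might wrap, assuming the USGS M2M JSON API's login/scene-search style endpoints. The dataset name (landsat_ot_c2_l1) and the exact payload field names are assumptions and should be checked against the M2M documentation before use.

```python
import json
import urllib.request

# Base URL of the USGS M2M JSON API (stable version).
M2M_URL = "https://m2m.cr.usgs.gov/api/api/json/stable/"

def build_scene_search(dataset, start_date, end_date, max_cloud=89):
    """Build a scene-search payload. Field names here follow my reading of
    the M2M API docs and should be verified; max_cloud defaults to our 89%
    cloud-cover constraint."""
    return {
        "datasetName": dataset,
        "sceneFilter": {
            "acquisitionFilter": {"start": start_date, "end": end_date},
            "cloudCoverFilter": {"min": 0, "max": max_cloud,
                                 "includeUnknown": False},
        },
        "maxResults": 10000,
    }

def post_m2m(endpoint, payload, token=None):
    """POST a JSON payload to an M2M endpoint (e.g. 'login', 'scene-search')
    and return the 'data' portion of the response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Auth-Token"] = token
    req = urllib.request.Request(M2M_URL + endpoint,
                                 data=json.dumps(payload).encode(),
                                 headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]
```

The catalogue comparison then reduces to a set difference between the entity IDs returned by scene-search and the entity_id column in landsat_list.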

Importing

Functionality:

  • imports files
  • updates database tables
  • rollback on fail

Scripts:

  • qv_importusgslandsat_c2.py

Modules:

  • usgslandsatmeta.py
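
The rollback-on-fail behaviour can be sketched as follows. `ImportTransaction` and `import_scene` are hypothetical names; the real importer's steps (untar, stage files, update tables) are only simulated here as do/undo pairs.

```python
# Sketch: each completed import step registers an undo action; if a later
# step raises, the undo actions run in reverse order, restoring the
# pre-import state before the exception propagates.
class ImportTransaction:
    def __init__(self):
        self._undo = []

    def did(self, undo_action):
        """Record how to undo a step that just completed."""
        self._undo.append(undo_action)

    def rollback(self):
        """Run recorded undo actions, most recent first."""
        while self._undo:
            self._undo.pop()()

def import_scene(steps):
    """Run (do, undo) import steps; roll back everything done so far
    if any step fails, then re-raise."""
    txn = ImportTransaction()
    try:
        for do, undo in steps:
            do()
            txn.did(undo)
    except Exception:
        txn.rollback()
        raise
```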

Expunging

Functionality:

  • remove raw files, imported stages, and downstream products
  • updates tables, including usgslandsat_donotimport if relevant

Scripts:

  • qv_expungelandsatimage.py

Modules:

  • ?
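
The table updates can be sketched as below, using an in-memory sqlite3 database as a stand-in for the real PostgreSQL tables. `expunge_scene` is a hypothetical helper, not the actual qv_expungelandsatimage.py logic; it only shows the two table updates and the transactional behaviour.

```python
import sqlite3

def expunge_scene(conn, scene_id, reason=None):
    """Remove a scene's record from landsat_list; if a reason is given,
    record it in usgslandsat_donotimport so the scene is not re-downloaded.
    The 'with conn' block commits on success and rolls back on exception."""
    with conn:
        conn.execute("DELETE FROM landsat_list WHERE scene = ?", (scene_id,))
        if reason is not None:
            conn.execute(
                "INSERT INTO usgslandsat_donotimport (scene_id, reason) "
                "VALUES (?, ?)",
                (scene_id, reason),
            )
```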

Task. Create gitlab repository and deployment mechanisms

The repository

Name: jrsrp-landsat

URL: https://gitlab.com/jrsrp/sys/usgslandsat/jrsrp-landsat

Scope

  • For managing Landsat data (download, ingestion, removal) on JRSRP HPC systems.
  • And, given all our work starts from a surface reflectance baseline, we may also include JRSRP-specific radiometric scripts in the future

Repository structure

  /jrsrp_landsat
    usgsm2m.py
    usgss3.py
    __init__.py
  /bin
    qv_importusgslandsat_c2.py
    qv_expungelandsatimage.py (to be moved from rscrepo)
    usgs_findnewsceneids_m2m.py
    usgs_downloadfromAWS.py
  README.md
  CONTRIBUTING.md
  CHANGELOG.md
  requirements.txt
  setup.py

For example: from jrsrp_landsat import usgsm2m

Note: usgslandsatmeta.py is a critical component. It currently lives in rscutils, and will continue to do so in the short term. We may consider creating a module for Level-2 metadata in this repo but will make that decision later.

Project dependencies

  • aws cli major version 1 (see notes on aws-cli below)
  • rscutils
  • other...?

aws-cli

The aws-cli is available in two major versions: 1 and 2. Tony and Neil tested with V1 on SDC and Athena. It can be installed using pip. https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html

Neil tested Version 2 on his laptop. It doesn't appear to have a pip installer, but installation looks fairly straightforward, and it does not require sudo to install outside of the system directories. https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html

The docs describe breaking changes from v1 to v2. I (TG) have not investigated what these are, nor tested aws v2 for downloading the Landsat images from s3. But the tests we've done to date indicate either version will work.

Deployment / installation options

  • maintain rscinstall capability for single-file installs (current NSW file's contents):
copyto $RSC_PKGS/jrsrp-landsat/bin --executable
    qv_importusgslandsat_c2.py
    qv_expungelandsatimage.py
    usgs_findnewsceneids_m2m.py
    usgs_downloadfromAWS.py
end
copyto $RSC_PKGS/jrsrp-landsat/jrsrp_landsat --executable
    __init__.py
    usgsm2m.py
    usgss3.py
end
  • use setup.py, for example:
python setup.py build --executable "/usr/bin/env python"
python setup.py install --prefix $JRSRP_LANDSAT_ROOT --install-purelib $JRSRP_LANDSAT_ROOT/lib
  • from an internal python package repository using pip to support containers (or could use setup.py if internal package repo not available)

Module file for jrsrp-landsat

Please note the use of the hyphen in the name of the module and top-level directory.

#%Module1.0

module-whatis "JRSRP scripts for downloading Landsat collection 2 data."

setenv JRSRP_LANDSAT_ROOT $env(RSC_PKG_ROOT)/jrsrp-landsat
prepend-path PATH $env(JRSRP_LANDSAT_ROOT)/bin
prepend-path PYTHONPATH $env(JRSRP_LANDSAT_ROOT)

Task. Review of requirements

See also the conversation under:

https://gitlab.com/jrsrp/sys/usgslandsat/collection2-research/-/issues/2

Questions:

  • What features of the existing system do we need in the new system?
  • What changes do we want to make?

Requirements of new system

  • Generate a list of missing files to download using the catalogue on the m2m interface
  • Download them from AWS S3
  • Import them
  • Process in overnight batch
  • include an on-demand capability to download RT products for short periods for specified scenes
  • We do not reacquire reprocessed products
  • Use the d-stage processing stage codes
  • In the first instance, the input stage is Collection 2, Level 1 (at-sensor radiance)

How is this different to the existing system?

All the general requirements are the same. The implementation is different:

  • remove the usgs_landsat_list_c1 table
  • appropriate changes to the landsat_list table so product ID, processing level, and collection can be obtained, e.g. for use with qv_expungelandsatimage.py
  • move both QLD and NSW onto same system, i.e. for NSW:
    • no longer retain the landsat_source table
    • use same load and expunge scripts as QLD
    • remove wwwname fields

Generate a list of missing files

For some extra background: a lot of effort was required in our existing system simply to determine the new scenes to download, as outlined below. Part of that was keeping our local copy of the USGS's catalogue. The USGS publishes that catalogue online, accessible via the m2m interface, so we can eliminate the need for a local copy.

  • We generate a list of missing files to download, with usgs_findnewsceneids.py, by comparing our holdings (in landsat_list) to the USGS's holdings (in usgs_landsat_list_c1)
  • The information about the USGS holdings in usgs_landsat_list_c1 is managed with usgs_manageBulkMetadata.py, which:
    • downloads a number of very large xml files, one for each satellite/collection (usgs_download_xml.py)
    • updates usgs_landsat_list_c1 (usgs_update_metadb.py)
  • our holdings, in landsat_list, are updated when a file is imported

We apply some additional constraints on what to download:

  • must have a processing level of L1T or L1TP
  • maximum permissible cloud cover of 89%
  • not a scene we've explicitly excluded in the usgslandsat_donotimport table (populated with qv_expungelandsatimage.py)
  • not a scene with a path/row in a given list
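
These constraints can be expressed as a single predicate. The record shape below is modelled on the usgs_landsat_list_c1 columns and is an assumption; an m2m search result will have a different structure, so the field names would need mapping.

```python
def should_download(record, donotimport, excluded_pathrows):
    """Apply the download constraints above to one catalogue record.

    record          -- dict modelled on usgs_landsat_list_c1 columns
    donotimport     -- set of scene IDs from usgslandsat_donotimport
    excluded_pathrows -- set of (path, row) tuples to skip
    """
    if not record["process_level"].endswith(("L1T", "L1TP")):
        return False  # must be precision/terrain corrected
    if record["cloud_pcnt"] > 89:
        return False  # maximum permissible cloud cover of 89%
    if record["scene_id"] in donotimport:
        return False  # explicitly excluded scene
    if (record["path"], record["row"]) in excluded_pathrows:
        return False  # excluded path/row
    return True
```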

Task. Explore capabilities of the m2m interface and scope download rates and costs.

Summary:

  • use the m2m interface to search the USGS catalogue (and compare to our own).
  • but download via the USGS's AWS S3 bucket, as it is more robust than the USGS download server and can supply data at a reasonable price (less than $2k p.a.)
  • download rates are variable, but no worse than current rates. We have the added option of using many more threads as S3 will scale

See:

  • https://gitlab.com/jrsrp/sys/usgslandsat/collection2-research/-/issues/2
  • https://gitlab.com/jrsrp/sys/usgslandsat/collection2-research/-/issues/1
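
For illustration, deriving the S3 object prefix from a Collection 2 product ID might look like the sketch below. The parsing follows the published Collection 2 product ID format; the bucket layout in `s3_prefix` (collection02/level-1/standard/oli-tirs/year/path/row/) is my reading of the usgs-landsat requester-pays bucket for Landsat 8/9 Level 1 and should be verified against the bucket before use.

```python
def parse_product_id(product_id):
    """Split a Collection 2 Landsat product ID, e.g.
    LC08_L1TP_090084_20211001_20211010_02_T1, into its fields."""
    sat, level, pathrow, acq, proc, coll, tier = product_id.split("_")
    return {
        "satellite": sat,
        "level": level,
        "path": int(pathrow[:3]),
        "row": int(pathrow[3:]),
        "acquired": acq,    # acquisition date, YYYYMMDD
        "processed": proc,  # processing date, YYYYMMDD
        "collection": coll,
        "tier": tier,
    }

def s3_prefix(product_id):
    """Construct an object prefix in the usgs-landsat requester-pays bucket.
    Layout assumed here is for Landsat 8/9 OLI/TIRS Level 1 only."""
    p = parse_product_id(product_id)
    return ("collection02/level-1/standard/oli-tirs/"
            f"{p['acquired'][:4]}/{p['path']:03d}/{p['row']:03d}/{product_id}/")
```

Downloads against such a prefix must set the requester-pays flag (e.g. `--request-payer requester` with the aws cli), which is where the per-annum cost estimate above comes from.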

Task. Review existing Tables

Does NSW need the wwwname column?

NO. It can be dropped.

Recommendations:

  • We need the ability to relate a qvf filename to a USGS tar file name and vice-versa. This could be achieved by adding a column to landsat_list, as proposed by Neil. The following scripts/modules change as a result:
    • update metadb_site.usgsTarname2qvfname to use landsat_list; retain databaseOnly= optional arg as it is used to construct path to raw file in qvf_site.qvladmrawsubdir
    • update metadb_site.qvfname2usgsTarname to use landsat_list; remove databaseOnly=True optional arg, as we should not have to search the filestore.
    • remove metadb_site.usgsTarname2wwwname (only used by metadb_site.usgsTarname2qvfname)
    • rewrite or remove metadb_site.findTarfile; use the new tarfile column in landsat_list instead; assume the tar file exists if it is in this table, so no filestore search is needed

See further details below.

Can we drop the landsat_footprints table?

YES. There are only three scripts that use it. These scripts are currently not used and can be changed if the landsat/ads comparison work is resurrected. The scripts are all in irsrepo and are:

  • ads_landsat_comparison.py, ads_landsat_comparison_de.py and ads_landsat_comparison_odr.py

The table landsat_footprints has records from 1986 to 2014. It was replaced by footprints.

Does NSW need the landsat_source table?

NO. The only useful functionality was the tarfile name in this table. We can replicate that by adding it to landsat_list.

Do we need usgs_landsat_list and usgs_landsat_list_c1 tables?

NO

We will query the USGS holdings via the m2m interface, using this to replace usgs_landsat_list_c1.

A grep through existing scripts in irsrepo and rscrepo shows that we don't use the other columns in this table.

Table usgs_landsat_list_c1; PRIMARY KEY (path, row, acqdate)

Column Example
path 106
row 66
acqdate 2013-12-31 12:23:24+11
sensor OLI_TIRS
process_level OLI_TIRS_L1TP
sat_id LANDSAT_8
cloud_pcnt 6
day_night DAY
sun_elevation 59.0964
sun_azimuth 121.172
scene_id LC81060662013365LGN01
product_id
product_category T1
product_generation_date 2017-04-27 10:00:00+10

Details to support the above recommendations on dropping wwwname

The following tables have the wwwname column.

Table name
landsat_source
landsat_list
landsat_footprints

The following scripts used these three tables to access the wwwname column.

Script: src/maint_tools/update_landsatlist.py
Table used: landsat_list
Comment: Checks that each record in landsat_list has a corresponding file on the filestore, i.e. it is a consistency check. There are separate records for pa, th and re products. Neil's proposal is to remove the 'pa' and 'th' records from landsat_list. We can reconstruct the wwwname on the fly. Consider only matching the 're' product on the filestore to the DB record.
Final comment: Updated. Now works with Collection 2. However, it is not backward compatible and only updates 're' records.

Script: src/satimport/irs_importusgslandsat.py
Table used: landsat_source
Comment: NSW's wrapper around qv_importusgslandsat.py, which updates the landsat_source table and wwwname columns. No impact if we drop landsat_source.
Final comment: Decommissioned/uninstalled.

Script: src/satimport/irs_unloadusgslandsat.py
Tables used: landsat_list, landsat_source
Comment: NSW's wrapper around qv_expungelandsatimage.py, which updates the landsat_source table and wwwname columns. No impact if we drop landsat_source and wwwname from landsat_list. Note: Neil has also added a --withwwwname option to the new importer if we want it.
Final comment: Decommissioned/uninstalled.

Script: src/site_overrides/rsc_site/utils/metadb_site.py
Table used: landsat_source
Comment: Function usgsTarname2wwwname (wwwname is read from landsat_source; only used by metadb_site.usgsTarname2qvfname). Function findTarFile (can take a wwwname or qvf name as an argument; only used by metadb_site.qvfname2usgsTarname). Function usgsTarname2qvfname (used by irs_unloadusgslandsat.py, qvf_site.qvladmrawsubdir and usgsname2qvf.py). Function qvfname2usgsTarname (takes a qvf or wwwname as an argument; used by update_landsatsource.py, irs_unload_usgslandsat.py and qvf2usgsname.py).
Final comment: Updated. Creates the wwwname from the fields in landsat_list.

Script: src/utils/qvf2usgsname.py
Table used: landsat_source (indirectly via metadb_site.py)
Comment: Uses metadb.qvfname2usgsTarname.
Final comment: Updated.

Script: src/spatialdb/check_landsat_footprints.py
Table used: landsat_list
Comment: A check that there is a footprint for every record in landsat_list. Can be retired if we move the footprints to the landsat_list table.
Final comment: Decommissioned/uninstalled.

Script: src/maint_tools/update_landsatsource.py
Table used: landsat_source
Comment: Can be retired if we remove landsat_source.
Final comment: Decommissioned/uninstalled. Removed from cron. Some parts will be recycled into a script checking for matches between records in the landsat_list table and files on the filestore.

Script: src/ads/ads_landsat_comparison.py
Table used: landsat_footprints
Comment: Owned by Sam for some research work; may not be needed. Tries to match ads image dates to the nearest-date Landsat image (a spatial and temporal comparison). Rewrite the script to use landsat_list if we move the footprints. This research might look different if resurrected. Recommend making notes at the top of the script to notify future developers of changes made to DB tables, and printing a message if anyone ever tries to run it.
Final comment: It appears that landsat_footprints is a test table for these three scripts; it only has records up to September 2014. The actual table used by the Landsat download system is "footprints", which holds footprints for ads, SPOT and Landsat together.

Script: src/ads/ads_landsat_comparison_odr.py
Table used: landsat_footprints
Comment: Ditto.
Final comment: Same as above; landsat_footprints is a test table.

Script: src/ads/ads_landsat_comparison_de.py
Table used: landsat_footprints
Comment: Ditto.
Final comment: Same as above; landsat_footprints is a test table.

Task. Should we use Level 1 or Level 2 products?

  • Collection 2, Level 1 is the input stage initially
  • More research into the Level 2 product will be conducted to see if we can switch over in future
  • are we going to add new fractional cover processing (this can be decided later)?
  • for later years: pending outcome of sen2-l2a cloud pilot, do we look to compute USGS Landsat data on cloud?

Task. Should we reacquire reprocessed products?

Our plan moving forward is to only download T1 products, except in emergency situations where we will acquire RT products for specific scenes for a short period of time.

We won't acquire the T1 product if we've already downloaded the T1. Our current practice is to download whatever tier product is available at the time of download. We do not currently reacquire the T1 equivalent. Consequently, most of our archive is RT.

See this discussion: https://gitlab.com/jrsrp/sys/usgslandsat/collection2-research/-/issues/2#note_623471518

Reference: jrsrp/sys%"Build USGS Landsat Collection 2 Capability"