Introduction
Earth Observation data from several data providers (ESA, NASA, USGS) are replicated (downloaded) for use in terrabyte. Because data providers sometimes reprocess or remove data (e.g., due to reduced quality), terrabyte needs to stay synchronized with the data holdings of each provider. For this data management task, a "terrabyte Inventory" has been implemented to support the ingestion of data into terrabyte and to regularly monitor the synchronization between the external data providers and terrabyte.
Available collections
The inventory is available for the following collections:
- Sentinel: Sentinel-1 GRD and SLC, Sentinel-2 Collection 2 L2A, Sentinel-3 OLCI L1 EFR, Sentinel-5P L1 and L2
- Landsat: Landsat 8-9 (OLI-TIRS) Collection 2 L2, Landsat 7 (ETM+) Collection 2 L2, Landsat 4-5 (TM) Collection 2 L2
- MODIS: 09GA, 09GQ, 10A1, 13A2, 13A3, 13Q1
- VIIRS: 09GA, 13A1, 15A2H
Inventory database
The terrabyte Inventory consists of a database with collections of all major Earth Observation datasets. For each collection provided on terrabyte, the database includes all scenes that the data provider has made available. To track the status of each scene, an "order:status" field has been introduced with the following values:
- orderable = not yet ingested into terrabyte
- pending = currently being ingested into terrabyte
- successful = available on terrabyte
- deleted = removed from terrabyte due to removal by the data provider
In addition, a scene can carry the flag deprecated. A deprecated scene has either been replaced by another scene or been removed from the catalog.
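The status values and the deprecated flag can be pictured as a small data model. The following is a minimal Python sketch, not the inventory's actual schema; the class names and the `scene_id` field are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(str, Enum):
    """Values of the "order:status" field (class name is hypothetical)."""
    ORDERABLE = "orderable"    # not yet ingested into terrabyte
    PENDING = "pending"        # currently being ingested into terrabyte
    SUCCESSFUL = "successful"  # available on terrabyte
    DELETED = "deleted"        # removed because the data provider removed it


@dataclass
class Scene:
    scene_id: str  # hypothetical identifier field
    status: OrderStatus = OrderStatus.ORDERABLE
    deprecated: bool = False  # replaced by another scene or removed from the catalog

    def is_available(self) -> bool:
        # Only successfully ingested scenes are available on terrabyte.
        return self.status is OrderStatus.SUCCESSFUL


scene = Scene("S2A_EXAMPLE")
scene.status = OrderStatus("successful")
print(scene.is_available())  # True
```

Keeping deprecated as a separate boolean rather than a fifth status mirrors the text, where it is described as an additional flag on top of the status.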
Monthly synchronization
The following steps are executed on a monthly basis:
- Download of metadata from the data provider
- Synchronization of scenes between data provider and terrabyte
  - Scenes available at the data provider but missing from the terrabyte inventory are inserted into the terrabyte inventory with status orderable
  - Scenes missing at the data provider but present in the terrabyte inventory are marked as deleted in the terrabyte inventory and then removed from storage and from the STAC API catalog
- Report deviations between data provider and terrabyte inventory
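The synchronization steps amount to two set differences between the scene identifiers at the data provider and those in the inventory. A minimal Python sketch, assuming the inventory can be represented as a mapping from a (hypothetical) scene ID to its order:status value; the actual removal from storage and the STAC API catalog is only marked by a comment.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReport:
    inserted: list[str] = field(default_factory=list)  # new at provider, now orderable
    deleted: list[str] = field(default_factory=list)   # gone at provider, now deleted


def synchronize(provider_ids: set[str], inventory: dict[str, str]) -> SyncReport:
    """Compare provider scene IDs with the inventory (scene ID -> order:status),
    update the inventory in place and return the deviations found."""
    report = SyncReport()
    known = set(inventory)
    # Available at the provider but missing from the inventory: insert as orderable.
    for scene_id in sorted(provider_ids - known):
        inventory[scene_id] = "orderable"
        report.inserted.append(scene_id)
    # Present in the inventory but missing at the provider: mark as deleted.
    for scene_id in sorted(known - provider_ids):
        if inventory[scene_id] != "deleted":
            # Removal from storage and from the STAC API catalog would be triggered here.
            inventory[scene_id] = "deleted"
            report.deleted.append(scene_id)
    return report


inventory = {"A": "successful", "B": "pending", "C": "deleted"}
report = synchronize({"A", "D"}, inventory)
print(report)  # SyncReport(inserted=['D'], deleted=['B'])
```

Skipping scenes that are already marked as deleted keeps repeated monthly runs from reporting the same removals again.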
Data providers
Metadata are downloaded from the following data providers to ensure a monthly synchronization.
Copernicus Data Space Ecosystem (CDSE)
Inventory - Copernicus CSV Catalogue: https://csv.dataspace.copernicus.eu/
The Copernicus Catalogue CSV dumps are generated every day at 3 AM UTC and cover the last 30 days. On the second day of each month, also at 3 AM UTC, a full catalogue dump is generated so that changes to and deletions of archival products (older than 30 days) are reflected in the .csv files.
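Since the daily dumps only cover the last 30 days, the full monthly dump is the one that can reveal deletions of older archival products. Reading product identifiers from such a dump could look like the following Python sketch; the column name `Name` and the sample rows are assumptions, so check the header of the actual files.

```python
import csv
import io


def read_product_names(csv_text: str, column: str = "Name") -> set[str]:
    """Collect product identifiers from one catalogue CSV dump.

    The default column name "Name" is an assumption, not a confirmed header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the identifier column is empty or missing.
    return {row[column] for row in reader if row.get(column)}


# Hypothetical sample content, not real catalogue data.
sample = "Name,ContentLength\nS1A_EXAMPLE_1,100\nS1A_EXAMPLE_2,200\n"
print(sorted(read_product_names(sample)))  # ['S1A_EXAMPLE_1', 'S1A_EXAMPLE_2']
```

The resulting set can serve as the provider side of the monthly comparison against the inventory.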
USGS Landsat
Inventory - Bulk Metadata Service: https://www.usgs.gov/landsat-missions/bulk-metadata-service
The Landsat-related metadata files contain the entire dataset. These files are updated daily and are quite large.
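Because these files are large, streaming them row by row avoids loading an entire file into memory. A Python sketch, assuming gzip-compressed CSV files with a `Display ID` column; both the compression and the column name are assumptions to verify against the actual downloads.

```python
import csv
import gzip
import io
from typing import Iterator


def iter_scene_ids(source, column: str = "Display ID") -> Iterator[str]:
    """Yield scene identifiers row by row from a gzip-compressed CSV.

    `source` may be a file path or a binary file object. The column name
    "Display ID" is an assumption about the bulk metadata header.
    """
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = row.get(column)
            if value:
                yield value


# Build a small in-memory gzip file with hypothetical content for demonstration.
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", newline="", encoding="utf-8") as out:
    out.write("Display ID,Cloud Cover\nLC08_EXAMPLE_1,10\nLC09_EXAMPLE_2,20\n")
buffer.seek(0)
ids = list(iter_scene_ids(buffer))
print(ids)  # ['LC08_EXAMPLE_1', 'LC09_EXAMPLE_2']
```

Using a generator keeps memory use roughly constant regardless of file size, which matters when the files cover the entire dataset.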
NASA MODIS
Inventory: No inventory service is available. A consistency check is only possible by querying the data discovery catalog.
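Without an inventory service, a consistency check for MODIS could issue catalog queries per product and time window and compare the returned granule identifiers with the inventory. The sketch below assumes NASA's Common Metadata Repository (CMR) granule search as the data discovery catalog, which this page does not name; it only builds the query URL, and paging through large result sets is left out.

```python
from urllib.parse import urlencode

# Assumption: CMR granule search is the data discovery catalog being queried.
CMR_GRANULES_URL = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_query(short_name: str, start: str, end: str, page_size: int = 2000) -> str:
    """Build a granule search URL for one product and one time window."""
    params = {
        "short_name": short_name,      # e.g. "MOD09GA"
        "temporal": f"{start},{end}",  # ISO 8601 start and end, comma-separated
        "page_size": page_size,        # larger result sets need multiple pages
    }
    return f"{CMR_GRANULES_URL}?{urlencode(params)}"


url = build_granule_query("MOD09GA", "2024-01-01T00:00:00Z", "2024-01-31T23:59:59Z")
print(url)
```

Splitting the check into monthly time windows keeps each query small and matches the monthly synchronization cycle.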