Overview

It is becoming the new normal for earth scientists to combine global, decadal observations recorded by multiple international satellite missions with cutting-edge Artificial Intelligence (AI) methods to answer the many pressing questions about climate change. The terrabyte platform gives these scientists the tools they need so that they can concentrate on finding the answers.

Design requirements

The HPDA terrabyte platform relies on effective interaction between the following four components, which have driven the design of the platform:

  1. Direct data access on all compute nodes: All relevant EO data sets are available online for analysis, stored centrally and in close proximity to the processing systems. Additional user data can also be conveniently imported. All data is accessible via a SpatioTemporal Asset Catalog (STAC) Application Programming Interface (API). Currently, global data from Sentinel, Landsat, MODIS, VIIRS, Meteosat, ENVISAT, and ERS satellites and sensors are available. Additionally, Analysis Ready Data (ARD) of Sentinel-1 and Sentinel-2 are produced and provided for further analysis.
  2. Close interaction between archive and online cache: Because the online data storage is limited in size (currently 50 Petabytes), data that is not currently needed is displaced by newer data, but it can be restored at any time from the DLR archive of the German Remote Sensing Data Center (currently 60 Petabytes of EO data) or from other archives (e.g., ESA Copernicus Dataspace Ecosystem, USGS, NASA). A dedicated 10 Gigabit/s connection between the LRZ in Garching near Munich and Oberpfaffenhofen has been activated to link both storage systems. Together, terrabyte’s online storage and DLR’s satellite data archive provide more than 100 Petabytes of relevant Earth Observation data for use in different applications.
  3. Ease of use: A comprehensive service package has been put together for DLR researchers so that they can use terrabyte conveniently and effectively. Higher-level services simplify the use of the infrastructure: JupyterLab (“bring your own environment”) and QGIS in the browser via the terrabyte Portal; Charliecloud and Apptainer as Docker alternatives; SLURM as workload manager; a STAC metadata API to discover and filter available data curated by terrabyte and by users (“bring your own data”); and data cube analyses based on Open Data Cube, xarray, and Dask.
  4. Hybrid computing resources: terrabyte connects traditional High-Performance Computing (HPC) and Kubernetes-based cloud services. It combines 44,000 virtual CPU cores and 188 NVIDIA GPUs in an HPC cluster close to the Earth Observation data storage. In addition, web services can be launched in a scalable Kubernetes cluster that provides central services as well as user-specific “application as a service” environments.
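
To make the data discovery step in components 1 and 3 more concrete, the following minimal sketch builds the JSON body of a STAC API Item Search request (`POST /search`). It is an illustration, not terrabyte's documented interface: the endpoint URL, the collection ID, and the helper name `build_stac_search` are hypothetical, and the cloud cover filter assumes the server supports the STAC API "query" extension.

```python
import json
from datetime import date


def build_stac_search(collections, bbox, start, end, max_cloud_cover=None, limit=100):
    """Build the JSON body for a STAC API Item Search (POST /search) request.

    Hypothetical helper for illustration; not part of any terrabyte client.
    """
    west, south, east, north = bbox
    if south > north:
        raise ValueError("bbox must be ordered [west, south, east, north]")

    body = {
        "collections": list(collections),
        "bbox": [west, south, east, north],
        # Closed RFC 3339 interval covering both days completely.
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }
    if max_cloud_cover is not None:
        # Assumption: the server implements the STAC API "query" extension.
        body["query"] = {"eo:cloud_cover": {"lt": max_cloud_cover}}
    return body


# Hypothetical endpoint and collection ID; replace with the real values.
STAC_SEARCH_URL = "https://stac.example.org/search"

request_body = build_stac_search(
    collections=["sentinel-2-l2a"],
    bbox=[11.0, 47.5, 12.0, 48.5],
    start=date(2023, 6, 1),
    end=date(2023, 6, 30),
    max_cloud_cover=20,
)
print(json.dumps(request_body, indent=2))
```

The resulting dictionary can be sent as the JSON payload of a POST request to the search endpoint. The items returned by a STAC server reference the matching assets, which could then be loaded into a data cube for analysis.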

terrabyte is DLR's specific response to the rapidly growing volume of Earth observation data and to the need to generate important information on societal challenges and global change. Together with international partners, DLR is already working on methods and systems to provide access to these ever-increasing volumes of data in an effective and resource-efficient manner.