Introduction

These are the file system resources available in the HPDA terrabyte environment:

  • Home directory
  • Geodata DSS
  • Software DSS
  • Project-specific DSS
  • Scratch file system

The abbreviation DSS refers to the Data Science Storage provided by LRZ. DSS implements a data-centric management approach that meets the demands and requirements of data-intensive science. All file systems are GPFS-based and reachable from the login node as well as from the worker nodes of our clusters.

  • Home: User's home directory for important data such as scripts or personal data.
  • Geodata: All terrabyte earth observation data, including auxiliary data, are provided here.
  • Software: The terrabyte software DSS space provides additional software modules curated by the terrabyte team. Many of the terrabyte portal services are based on these modules.
  • Project: Project-specific storage, e.g. for results and individual input data.
  • Scratch: Scratch file system for temporary files during processing, e.g. intermediate results and milestone files.

Data backup

It is your responsibility to save important data.
Considering that we maintain file systems of several hundred terabytes in the DSS and $SCRATCH_DLR spaces, it is not feasible, or far too expensive, to back up all these data automatically. Although the storage units are protected by RAID mechanisms, severe incidents might still lead to data loss. In most cases, however, it is users themselves who accidentally delete or overwrite files. It is therefore the user's responsibility to transfer data to safer places (e.g. $HOME) or to copy important data to external storage systems. Due to extended off-line times for dump and restore operations, we may not be able to recover data after any kind of outage or inconsistency of the scratch or DSS file systems. A file system lifetime that runs until the end of your project must not be mistaken for a guarantee that data stored there are safe.
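As a minimal sketch of copying important results from the scratch area to $HOME, the Python snippet below uses the $SCRATCH_DLR and $HOME environment variables mentioned above; the "results" subdirectory, the "*.nc" file pattern, and the "backup" target directory are placeholders that you would replace with your own project layout.

```python
import os
import shutil
from pathlib import Path

# Sketch: copy selected result files from the (non-backed-up) scratch area
# to the home directory. Directory and file names below are placeholders.
scratch = Path(os.environ["SCRATCH_DLR"])   # scratch file system, no backup
home = Path(os.environ["HOME"])             # home directory, a safer place

target = home / "backup"
target.mkdir(parents=True, exist_ok=True)

for src in (scratch / "results").glob("*.nc"):
    shutil.copy2(src, target / src.name)    # copy2 preserves timestamps
    print(f"copied {src} -> {target / src.name}")
```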

DSS file system structure

While for both Scratch and DSS the metadata performance (i.e., the performance for generating, accessing and deleting directories and files) is improved compared to previously used technologies, the capacity for metadata (e.g., the number of file entries in a directory) is limited. Therefore, please do not generate extremely large numbers of very small files in these areas; instead, aggregate the data into larger files and write into these, e.g. via direct access. Violation of this rule can lead to your access to the $SCRATCH_DLR or DSS area being blocked, since otherwise regular user operation on the cluster may be obstructed.
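One simple way to aggregate many small files is to bundle them into a single archive so that only one large file lands on the file system. The sketch below uses Python's standard tarfile module; the directory name "small_files" and the archive name "bundle.tar" are placeholders.

```python
import tarfile
from pathlib import Path

# Sketch: bundle many small files into one tar archive so that the scratch
# or DSS file system only has to hold a single (large) file.
src_dir = Path("small_files")   # placeholder: directory with many small files
archive = Path("bundle.tar")    # placeholder: aggregated output file

with tarfile.open(archive, "w") as tar:
    for f in sorted(src_dir.glob("*.txt")):
        tar.add(f, arcname=f.name)          # store without the directory prefix

# Individual files can later be read back directly from the archive,
# without extracting them to disk again:
with tarfile.open(archive, "r") as tar:
    first = tar.getnames()[0]
    data = tar.extractfile(first).read()
```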

Please also note that there is a per-directory limit for storing inode metadata (directory entries and file names); this limits the number of files that can be placed in a single directory. As a good practice, keep the number of files in a single directory below 1000.
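If your workflow has to write many output files, one way to respect this limit is to spread them over subdirectories. The following sketch counts directory entries and derives a deterministic subdirectory for each output file; the directory name "output", the shard count and the example file name are assumptions for illustration only.

```python
import hashlib
import os
from pathlib import Path

def count_entries(directory: str) -> int:
    """Count the entries of a single directory without building a full list."""
    with os.scandir(directory) as it:
        return sum(1 for _ in it)

def sharded_path(base: Path, filename: str, shards: int = 100) -> Path:
    """Map a file name to one of `shards` subdirectories below `base`."""
    bucket = int(hashlib.md5(filename.encode()).hexdigest(), 16) % shards
    subdir = base / f"{bucket:02d}"
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / filename

# Placeholder usage: write one output file into a hashed subdirectory,
# e.g. output/37/tile_0042.tif, keeping each directory well below 1000 entries.
out = sharded_path(Path("output"), "tile_0042.tif")
out.write_bytes(b"...")          # replace with the real file content
print(count_entries("output"))   # number of shard subdirectories created so far
```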

Check storage quota and usage

See the DSS monitoring section of our documentation on how to monitor your storage containers.

Manage storage containers

A command-line tool and a graphical user interface are available to manage DSS storage containers. The usage of both tools is explained in the DSS management section. This is only relevant for users who are either Data Curators (institute representatives and similar) or Container Managers (designated access managers of a single storage container). Everybody else can also log in to the service but has no management access.

Data transfers

Data transfers of up to a few GB to/from the file systems can be performed by using scp or rsync via the login node. For large-scale data transfers, we recommend using the Globus online facilities. See the Data Transfers section for more information.
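As a minimal sketch of such a small-scale transfer driven from a script, the snippet below wraps rsync over SSH in Python. The login node hostname, the user name and the remote target path are placeholders, not the actual terrabyte addresses; take those from the Data Transfers section of the documentation.

```python
import subprocess

# Sketch: push a local result directory to the cluster with rsync over SSH.
# Hostname, user name and remote path below are placeholders only.
source = "results/"
destination = "user@<login-node>:/dss/<project-container>/results/"

subprocess.run(
    ["rsync", "-avz", "--progress", source, destination],
    check=True,   # raise an error if the transfer fails
)
```

For data sets beyond a few GB, prefer the Globus online facilities mentioned above instead of looping such calls over many files.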