Introduction

These are the file system resources available in the HPDA terrabyte environment:

  • Home directory
  • Geodata DSS
  • Software DSS
  • Project-specific DSS
  • Scratch file system

The abbreviation DSS refers to the Data Science Storage provided by LRZ. DSS implements a data-centric management approach that meets the demands and requirements of data-intensive science. All file systems are GPFS-based and reachable from the login node as well as from the worker nodes of our clusters.

  • Home: User's home directory for important data such as scripts or personal data.
  • Geodata: All terrabyte earth observation data, including auxiliary data, are provided here.
  • Software: The terrabyte software DSS space provides additional software modules curated by the terrabyte team. Many of the terrabyte portal services are based on these modules.
  • Project: Project-specific storage, e.g. for results and individual input data.
  • Scratch: Scratch file system for temporary files during processing, e.g. intermediate results and milestone files.

Data backup

It is your responsibility to save important data.
Considering that we maintain file systems of several hundred terabytes in the DSS and $SCRATCH_DLR spaces, it is not feasible, or far too expensive, to back up all these data automatically. Although the storage units are protected by RAID mechanisms, severe incidents might still lead to data loss. In most cases, however, it is users themselves who accidentally delete or overwrite files. It is therefore the user's responsibility to transfer data to safer places (e.g. $HOME) or to copy important data to external storage systems. Due to extended off-line times for dump and restore operations, we may not be able to recover data after any kind of outage or inconsistency of the scratch or DSS file systems. A file system lifetime that runs until the end of your project must not be mistaken for a guarantee that data stored there are safe.
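As a minimal sketch of copying important results from the scratch area to $HOME, the Python snippet below uses the $SCRATCH_DLR and $HOME environment variables mentioned above; the "results" subdirectory, the "*.nc" file pattern, and the "backup" target directory are placeholders that you would replace with your own project layout.

```python
import os
import shutil
from pathlib import Path

# Sketch: copy selected result files from the (non-backed-up) scratch area
# to the home directory. Directory and file names below are placeholders.
scratch = Path(os.environ["SCRATCH_DLR"])   # scratch file system, no backup
home = Path(os.environ["HOME"])             # home directory, a safer place

target = home / "backup"
target.mkdir(parents=True, exist_ok=True)

for src in (scratch / "results").glob("*.nc"):
    shutil.copy2(src, target / src.name)    # copy2 preserves timestamps
    print(f"copied {src} -> {target / src.name}")
```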

DSS file system structure

While for both Scratch and DSS the metadata performance (i.e., the performance for generating, accessing and deleting directories and files) is improved compared to previously used technologies, the capacity for metadata (e.g., the number of file entries in a directory) is limited. Therefore, please do not generate extremely large numbers of very small files in these areas; instead, aggregate the data into larger files and write into these, e.g. via direct access. Violation of this rule can lead to your access to the $SCRATCH_DLR or DSS area being blocked, since otherwise regular user operation on the cluster may be obstructed.
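One simple way to aggregate many small files is to bundle them into a single archive so that only one large file lands on the file system. The sketch below uses Python's standard tarfile module; the directory name "small_files" and the archive name "bundle.tar" are placeholders.

```python
import tarfile
from pathlib import Path

# Sketch: bundle many small files into one tar archive so that the scratch
# or DSS file system only has to hold a single (large) file.
src_dir = Path("small_files")   # placeholder: directory with many small files
archive = Path("bundle.tar")    # placeholder: aggregated output file

with tarfile.open(archive, "w") as tar:
    for f in sorted(src_dir.glob("*.txt")):
        tar.add(f, arcname=f.name)          # store without the directory prefix

# Individual files can later be read back directly from the archive,
# without extracting them to disk again:
with tarfile.open(archive, "r") as tar:
    first = tar.getnames()[0]
    data = tar.extractfile(first).read()
```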

Please also note that there is a per-directory limit for storing inode metadata (directory entries and file names); this limits the number of files that can be placed in a single directory. As a good practice, keep the number of files in a single directory below 1000.
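If your workflow has to write many output files, one way to respect this limit is to spread them over subdirectories. The following sketch counts directory entries and derives a deterministic subdirectory for each output file; the directory name "output", the shard count and the example file name are assumptions for illustration only.

```python
import hashlib
import os
from pathlib import Path

def count_entries(directory: str) -> int:
    """Count the entries of a single directory without building a full list."""
    with os.scandir(directory) as it:
        return sum(1 for _ in it)

def sharded_path(base: Path, filename: str, shards: int = 100) -> Path:
    """Map a file name to one of `shards` subdirectories below `base`."""
    bucket = int(hashlib.md5(filename.encode()).hexdigest(), 16) % shards
    subdir = base / f"{bucket:02d}"
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / filename

# Placeholder usage: write one output file into a hashed subdirectory,
# e.g. output/37/tile_0042.tif, keeping each directory well below 1000 entries.
out = sharded_path(Path("output"), "tile_0042.tif")
out.write_bytes(b"...")          # replace with the real file content
print(count_entries("output"))   # number of shard subdirectories created so far
```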

Check storage quota and usage

See the DSS monitoring section of our documentation on how to monitor your storage containers.

Manage storage containers

A command-line tool and a graphical user interface are available to manage DSS storage containers. The usage of both tools is explained in the DSS management section. This is only relevant for users who are either Data Curators (institute representatives and similar) or Container Managers (designated access managers of a single storage container). Everybody else can also log in to the service but has no management access.

Data transfers

Data transfers of up to a few GB to/from the file systems can be performed by using scp or rsync via the login node. For large-scale data transfers, we recommend using the Globus online facilities. See the Data Transfers section for more information.
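As a minimal sketch of such a small-scale transfer driven from a script, the snippet below wraps rsync over SSH in Python. The login node hostname, the user name and the remote target path are placeholders, not the actual terrabyte addresses; take those from the Data Transfers section of the documentation.

```python
import subprocess

# Sketch: push a local result directory to the cluster with rsync over SSH.
# Hostname, user name and remote path below are placeholders only.
source = "results/"
destination = "user@<login-node>:/dss/<project-container>/results/"

subprocess.run(
    ["rsync", "-avz", "--progress", source, destination],
    check=True,   # raise an error if the transfer fails
)
```

For data sets beyond a few GB, prefer the Globus online facilities mentioned above instead of looping such calls over many files.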