Introduction
These are the file system resources available in the HPDA terrabyte environment:
- Home directory
- Geodata DSS
- Software DSS
- Project-specific DSS
- Scratch file system
The abbreviation DSS refers to the Data Science Storage provided by LRZ. DSS implements a data-centric management approach that meets the demands and requirements of data-intensive science. All file systems are GPFS-based and reachable from the login node as well as from the worker nodes of our clusters.
For each file system, its purpose, availability, access, size, backup, and lifetime are summarized in the tables below.
| File system | Purpose |
|---|---|
| Home | User's home directory for important data such as scripts or personal data |
| Geodata | All terrabyte Earth observation data, including auxiliary data, are provided here |
| Software | The terrabyte software DSS space provides additional software modules curated by the terrabyte team. Many of the terrabyte portal services are based on these modules. |
| Project | Project-specific storage, e.g. for results and individual input data |
| Scratch | Scratch file system for temporary files during processing, e.g. intermediate results and milestone files |
| File system | Availability |
|---|---|
| Home | Read-write access by default |
| Geodata | Read-only access by default |
| Software | Read-only access by default |
| Project | Upon request (see Apply for storage); typically read-write access |
| Scratch | Read-write access by default |
| File system | Access |
|---|---|
| Home | $HOME or /dss/dsshome1/lxc##/<userID> |
| Geodata | STAC API or /dss/dsstbyfs01/pn56su/pn56su-dss-0008/... |
| Software | To make the terrabyte modules available, run module use /dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/share/modules/files/ |
| Project | /dss/dsstbyfs02/<projectID>/<containerID>/<userID>. For convenient access, you may define your own variable (e.g. WORK) in your ~/.profile, ~/.bashrc, or similar. |
| Scratch | $SCRATCH_DLR |
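For example, to make the terrabyte software modules available and to set up a convenience variable for your project container, you could add lines like the following to your ~/.bashrc. This is only a sketch: projectID and containerID are placeholders for your actual IDs, and WORK is just a suggested variable name.

```bash
# Make the terrabyte-curated software modules visible to the module system
module use /dss/dsstbyfs01/pn56su/pn56su-dss-0020/usr/share/modules/files/

# Convenience variable pointing to your project container
# (projectID and containerID are placeholders; replace them with your own)
export WORK=/dss/dsstbyfs02/projectID/containerID/$USER
```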
| File system | Size |
|---|---|
| Home | 100 GB per user |
| Geodata | 50 PB |
| Software | x GB |
| Project | Individual; in general, a few TB per project can be granted |
| Scratch | 1 PB |
| File system | Backup |
|---|---|
| Home | YES, via snapshots (see note below) |
| Geodata | N/A |
| Software | N/A |
| Project | NO automatic backup |
| Scratch | NO automatic backup |
Your home directory is backed up by a nightly file system snapshot that will be kept for 7 days at most. In order to restore files, you can simply copy them from /dss/dsshome1/.snapshots/YYYY-MM-DD_HHMM/lxc##/<userID>.
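As a minimal sketch of such a restore, assuming a snapshot from 2024-05-01 at 03:00, the group directory lxc01, the user ID ab12cde, and a file called analysis.py (all of these are placeholder values):

```bash
# List the available nightly snapshots (kept for at most 7 days)
ls /dss/dsshome1/.snapshots/

# Copy a file back from a snapshot into your current home directory
# (date/time, lxc01, ab12cde and analysis.py are placeholders)
cp /dss/dsshome1/.snapshots/2024-05-01_0300/lxc01/ab12cde/analysis.py ~/analysis.py
```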
| File system | Lifetime |
|---|---|
| Home | Alive until expiration of the user ID |
| Geodata | N/A |
| Software | N/A |
| Project | Alive until expiration of the user ID / project ID |
| Scratch | Sliding-window file deletion (see note below); no guarantee of data integrity |
Any files and directories on the scratch file system that are older than typically 30 days are removed from the disk area; the interval may be shortened if the fill-up rate becomes very high. This deletion mechanism is invoked once a day.
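To see which of your scratch files are approaching this threshold, a simple find query can help; the 25-day cut-off below is just an illustrative value relative to the typical 30-day deletion interval:

```bash
# List your files on scratch that have not been modified for more than 25 days
# and are therefore close to the (typically) 30-day deletion threshold
find "$SCRATCH_DLR" -user "$USER" -type f -mtime +25 -ls
```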
Data backup
It is your responsibility to save important data.
Since we maintain file systems of several hundreds of terabytes in the DSS and $SCRATCH_DLR spaces, it is not feasible, or far too expensive, to back up all these data automatically. Although the storage units are protected by RAID mechanisms, severe incidents might still lead to data loss. In most cases, however, it is the user who accidentally deletes or overwrites files. It is therefore the user's responsibility to copy data to safer places (e.g. $HOME) or to transfer important data to external storage systems. Due to extended off-line times for dump and restore operations, we may not be able to recover data after an outage or inconsistency of the scratch or DSS file systems. A lifetime specified for a file system until the end of your project should not be mistaken for a guarantee that data stored there are safe.
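A minimal sketch of such a manual backup, assuming the placeholder directory myrun on scratch, the WORK variable suggested above, and a hypothetical external host backup.example.org:

```bash
# Copy finished results from scratch into the project container
rsync -av --progress "$SCRATCH_DLR/myrun/results/" "$WORK/results/"

# Push important data to an external storage system reachable via SSH
# (backup.example.org is a placeholder host name)
rsync -av --progress "$SCRATCH_DLR/myrun/results/" user@backup.example.org:/archive/results/
```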
DSS file system structure
While the metadata performance of both Scratch and DSS (i.e., the performance for creating, accessing and deleting directories and files) is improved compared to previously used technologies, the capacity for metadata (e.g., the number of file entries in a directory) is limited. Therefore, please do not generate extremely large numbers of very small files in these areas; instead, try to aggregate them into larger files and write data into these, e.g. via direct access. Violating this rule can lead to your access to the $SCRATCH_DLR or DSS area being blocked, since user operation on the cluster may otherwise be obstructed.
Please also note that there is a per-directory limit for storing i-node metadata (directory entries and file names); this limits the number of files that can be placed in a single directory. As good practice, keep the number of files in a single directory below 1000.
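One simple way to follow this advice is to aggregate many small files into a single archive before writing them to scratch or DSS, and to check how many entries a directory already contains; a sketch with placeholder paths:

```bash
# Aggregate many small files into one compressed archive
# instead of storing them individually (tiles/ is a placeholder directory)
tar -czf "$SCRATCH_DLR/tiles_2024.tar.gz" tiles/

# Quick check: how many entries does a directory already contain?
ls -1 "$SCRATCH_DLR/myrun/output" | wc -l
```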
Check storage quota and usage
See the DSS monitoring section of our documentation on how to monitor your storage containers.
Manage storage containers
A command line tool and a graphical user interface are available to manage DSS storage containers. The use of both tools is explained in the DSS management section. This is only relevant for users who are either Data Curators (Institute representatives and similar) or Container Managers (designated access managers of a single storage container). Everybody else can also log in to the service but has no management access.
Data transfers
Data transfers of up to a few GB to/from the file systems can be performed by using scp or rsync via the login node. For large-scale data transfers, we recommend using the Globus online facilities. See the Data Transfers section for more information.
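A sketch of such a small transfer, assuming the placeholder login host login.terrabyte.example, the placeholder user ID ab12cde, and placeholder paths (check the Data Transfers section for the actual host name):

```bash
# Copy a single local file to your home directory on the cluster
# (login.terrabyte.example and ab12cde are placeholders)
scp input.nc ab12cde@login.terrabyte.example:~/

# Synchronize a local results directory into a project container
rsync -av results/ ab12cde@login.terrabyte.example:/dss/dsstbyfs02/projectID/containerID/ab12cde/results/
```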