Commit 7644cfae authored by Brandon

Filesystems

parent a9eca5bd
# Data Management
NERSC provides its users with the means to store, manage, and share
their research data products.
In addition to systems specifically tailored for data-intensive
computations, we provide a variety of storage resources optimized for
different phases of the data lifecycle; tools to enable users to
manage, protect, and control their data; high-speed networks for
intra-site and inter-site (ESnet) data transfer; gateways and portals
for publishing data for broad consumption; and consulting services to
help users craft efficient data management processes for their
projects.
## Overview
| file system | space | inodes | purge time | backed up | access |
|-----------------|-------|--------|------------|-----------|-----------------|
| Project | 1 TB | 1 M | - | yes | repo |
| Global HOME | 40 GB | 1 M | - | yes | user |
| Global common | 10 GB | 1 M | - | no | repo |
| Cori SCRATCH | 20 TB | 10 M | 12 weeks | no | user |
| Edison SCRATCH | 10 TB | 5 M | 12 weeks | no | user |
| Edison SCRATCH3 | - | - | 8 weeks | no | special request |
## Quotas
!!! warning
When a quota is reached, writes to that filesystem may fail.
!!! note
If your `$SCRATCH` usage exceeds your quota, you will not be
able to submit batch jobs until you reduce your usage.
### Current usage
NERSC provides a `myquota` command which displays applicable quotas
and current usage.
To see current usage for home and available scratch filesystems:
```
nersc$ myquota
```
For project and global common, pass the full path to the directory:
```
nersc$ myquota --path=/project/projectdirs/<project_name>
```
or
```
nersc$ myquota --path=/global/common/software/<project_name>
```
### Increases
If you or your project needs additional space you may request it via
the
[Disk Quota Increase Form](https://nersc.service-now.com/nav_to.do?uri=catalog_home.do).
## Backups
!!! danger
All NERSC users should back up important files to HPSS on
a regular basis. **Ultimately, it is your responsibility to
protect yourself from data loss.**
### Snapshots
Global homes and project use a *snapshot* capability to provide users
a seven-day history of their directories. Every directory and
sub-directory in global homes contains a ".snapshots" entry.
* `.snapshots` is invisible to `ls`, `ls -a`, `find` and similar
commands
* Contents are visible through `ls -F .snapshots`
* Can be browsed normally after `cd .snapshots`
* Files cannot be created, deleted or edited in snapshots
* Files can *only* be copied *out* of a snapshot (see the example below)
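For example, a lost file can be recovered by copying it out of a
snapshot (the snapshot name and file name below are placeholders):
```
nersc$ cd $HOME
nersc$ ls -F .snapshots
nersc$ cp .snapshots/<snapshot_name>/myfile $HOME/myfile
```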
### Backup/Restore
Global homes are backed up to HPSS on a regular basis. If you require
a restoration of lost data that cannot be accomplished via the
snapshots capability, please contact NERSC Consulting with pathnames
and timestamps of the missing data. Such restore requests may take a
few days to complete.
## Purging
Files in `$SCRATCH` directories may be purged if they are older than
12 weeks (defined by last access time).
!!! warning
`$SCRATCH` directories are **not** backed up
## OSTP/Office of Science Data Management Requirements
Project Principal Investigators are responsible for meeting OSTP
(Office of Science and Technology Policy) and DOE Office of Science
data management requirements for long-term data sharing and
preservation. The OSTP has issued a memorandum on Increasing Access to
the Results of Federally Funded Scientific Research
(http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf)
and the DOE has issued a Statement on Digital Data Management.
NERSC resources are intended for users with active allocations, and as
described below, NERSC cannot guarantee long-term data access without
a prior, written, service-level agreement. Please carefully consider
these policies, including their limitations and restrictions, as you
develop your data management plan.
# Data sharing
## Unix file permissions
Standard [Unix permissions](https://en.wikipedia.org/wiki/File_system_permissions)
(user/group/other read, write, and execute bits) control access to
files and directories.
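As a minimal sketch (the group name `m0000` and directory path are
placeholders), standard `chgrp` and `chmod` commands can open a
directory tree to the members of a Unix group:
```
# Hand the directory tree to the group (placeholder name m0000)
nersc$ chgrp -R m0000 /project/projectdirs/m0000/shared_data
# Grant the group read access plus directory traversal (capital X)
nersc$ chmod -R g+rX /project/projectdirs/m0000/shared_data
```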
## give/take
NERSC provides two commands, `give` and `take`, for sharing data
between users.
To send a file or path to `<receiving_username>`:
```
nersc$ give -u <receiving_username> <file or directory>
```
To receive a file sent by `<sending_username>`:
```
nersc$ take -u <sending_username> <filename>
```
To take all files from `<sending_username>`:
```
nersc$ take -a -u <sending_username>
```
To see what files `<sending_username>` has sent to you:
```
nersc$ take -u <sending_username>
```
For a full list of options, pass the `--help` flag.
## project directories
The [project](/filesystems/project.md) filesystem allows sharing of
data within a project.
## Science Gateways
* [Science gateways](/services/science-gateways.md)
The High Performance Storage System (HPSS) is a modern, flexible,
performance-oriented mass storage system. It has been used at NERSC
for archival storage since 1998. HPSS is intended for long term
storage of data that is not frequently accessed.
HPSS is Hierarchical Storage Management (HSM) software developed by a
collaboration of DOE labs, of which NERSC is a participant, and
IBM. The HSM software enables all user data to be ingested onto high
performance disk arrays and automatically migrated to a very large
enterprise tape subsystem for long-term retention. The disk cache in
HPSS is designed to retain many days worth of new data and the tape
subsystem is designed to provide the most cost-effective long-term
scalable data storage available.
NERSC's HPSS system is named `archive.nersc.gov` and can be accessed
through a variety of clients such as `hsi`, `htar`, `ftp`, `pftp`, and
grid clients.
Some characteristics of the NERSC HPSS system (January 2018):
* Data stored in archive system: 120 PB, >220 million files
* Growth Rate: 1 PB/month
* Current Maximum capacity: 240 PB
* Buffer (disk) cache: >2 PB
* Average transfer rate: 100 MB/sec
* Peak measured transfer rate: 1 GB/sec
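For example (the file and directory names are placeholders), `htar`
can bundle a directory directly into the archive and `hsi` can list
and retrieve it later:
```
# Store a directory in HPSS as a single tar file
nersc$ htar -cvf results_2018.tar ./results
# List archive contents and retrieve the tar file
nersc$ hsi ls
nersc$ hsi get results_2018.tar
```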
The 1.8 PB NERSC Burst Buffer is based on
Cray [DataWarp](http://www.cray.com/products/storage/datawarp), which
uses flash (SSD) technology to significantly increase I/O performance
on Cori for all file sizes and access patterns. It sits within the
High Speed Network (HSN) on Cori and is accessible only from compute
nodes; the Burst Buffer provides per-job (or short-term) storage for
I/O-intensive codes.
The peak bandwidth performance is over 1.7 TB/s, with each Burst
Buffer node contributing up to 6.5 GB/s. The number of Burst Buffer
nodes depends on the granularity and size of the Burst Buffer
allocation. Performance also depends on access pattern, transfer
size, and access method (e.g. MPI I/O, shared files).
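As an illustrative sketch (the capacity, node count, and application
name are placeholders), a batch job can request a per-job Burst
Buffer allocation with a DataWarp directive and reference it through
the `DW_JOB_STRIPED` environment variable:
```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --constraint=knl
#SBATCH --time=00:30:00
#DW jobdw capacity=10GB access_mode=striped type=scratch
# The per-job allocation is mounted at the path held in $DW_JOB_STRIPED
srun ./my_app --output "$DW_JOB_STRIPED/checkpoint.dat"
```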
* [Examples](/jobs/examples.md)
Cori has one scratch file system named `/global/cscratch1` with 30 PB
disk space and >700 GB/sec I/O bandwidth. Cori scratch is a Lustre
filesystem designed for high performance temporary storage of large
files. It contains 10000+ disks and 248 I/O servers (OSSs/OSTs).
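For large shared files, striping across several OSTs can improve
bandwidth; a minimal sketch, with an illustrative stripe count and a
placeholder directory name:
```
# New files created in this directory will be striped across 8 OSTs
nersc$ mkdir -p $SCRATCH/large_output
nersc$ lfs setstripe -c 8 $SCRATCH/large_output
# Inspect the striping settings
nersc$ lfs getstripe $SCRATCH/large_output
```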
Edison has three local scratch file systems named /scratch1,
/scratch2, and /scratch3. Users are assigned to either /scratch1 or
/scratch2 in a round-robin fashion, so a user will be able to use one
or the other but not both. The third file system is reserved for users
who need high I/O bandwidth; access is granted by
[request](http://www.nersc.gov/users/computational-systems/edison/file-storage-and-i-o/edison-scratch3-directory-request-form/).
| Filesystem | Total disk space | Bandwidth |
|------------|:----------------:|:---------:|
| /scratch1 | 2.1 PB | 48 GB/s |
| /scratch2 | 2.1 PB | 48 GB/s |
| /scratch3 | 3.2 PB | 72 GB/s |
## Summary
The global common file system is a global file system available on all
NERSC computational systems. It offers a performant platform to
install software stacks and compile code. Directories are provided by
default to every MPP project. Additional global common directories can
be provided upon request.
| space quota | inode quota | purge time | backups |
|-------------|-------------|------------|---------|
| 10 GB | 1 M | none | no |
## Usage
Global common directories are created in
`/global/common/software`. The name of a "default" project directory
is the same as its associated MPP repository. There is also a Unix
group with the same name; all members of the repository are also
members of the group. Access to the global common directory is
controlled by membership in this group. Because this directory is
shared across all systems, you may want to install your software
stacks into separate subdirectories depending on the system or the
processing architecture. For some general programs you can use the
same installs across all systems, but for best performance, we
recommend separate installs for each system and architecture (e.g.
Edison vs. Cori KNL). Since it's mounted read-only on the compute
nodes, software installs should be done on the login nodes.
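A minimal sketch of such a layout (the project name `m0000`,
directory names, and build commands are placeholders), run from a
login node:
```
# One subdirectory per system/architecture under the project's common directory
nersc$ mkdir -p /global/common/software/m0000/cori-knl
nersc$ mkdir -p /global/common/software/m0000/edison
# Install a package into the Cori KNL tree (build steps are illustrative)
nersc$ ./configure --prefix=/global/common/software/m0000/cori-knl
nersc$ make && make install
```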
## Performance
The global common system is optimized for software installation. It
has a smaller block size and is mounted read-only on the
computes. This allows us to turn on client-side caching, which
dramatically reduces the time needed to read shared libraries across
many nodes.
## Backup/Restore
!!! warning
No managed backups of global common directories are done
by NERSC.
## Summary
Home directories provide a convenient means for a user to access
files such as dotfiles, source files, input files, and configuration
files, regardless of the platform.
| space quota | inode quota | purge time | backups |
|-------------|-------------|------------|---------|
| 40 GB | 1 M | none | yes |
## Usage
Refer to your home directory using the environment variable `$HOME`
whenever possible. The absolute path may change, but the value of
`$HOME` will always be correct.
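For example (the file and subdirectory are placeholders), a copy
into `$HOME` keeps working even if the absolute location of home
directories changes:
```
# Portable: relies on $HOME rather than a hard-coded absolute path
nersc$ cp results.txt "$HOME/analysis/"
```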
## Quotas
!!! warning
Quota increases in global homes are approved only in
*extremely* unusual circumstances.
## Performance
Performance of global homes is optimized for small files and is
suitable for compiling and linking executables. Global home
directories are not intended for large, streaming I/O. **User
applications that depend on high-bandwidth for streaming large files
should not be run in your `$HOME` directory.**
## Backups
All NERSC users should back up important files on a regular
basis. Ultimately, it is the user's responsibility to prevent data
loss. However, NERSC provides mechanisms to help users protect
against data loss.
### Snapshots
A *snapshot* capability is used to provide users a seven-day history
of their home directories. Every directory and sub-directory in
`$HOME` contains a `.snapshots` entry.
* `.snapshots` is invisible to `ls`, `ls -a`, `find` and similar commands
* Contents are visible through `ls -F .snapshots`
* Can be browsed normally after `cd .snapshots`
* Files cannot be created, deleted or edited in snapshots
* Files can *only* be copied *out* of a snapshot
### Archive
Global homes are backed up to [HPSS](archive.md) monthly.
If the snapshot capability does not meet your needs, contact
[NERSC Consulting](/help/index.md) with pathnames and timestamps of
the missing data.
!!! note
Restore requests may take several days to complete.
# Filesystem overview
## Summary
NERSC file systems can be divided into two categories: local and
global. Local file systems are only accessible on a single platform
and provide the best performance; global file systems are accessible
on multiple platforms, simplifying data sharing between platforms.
File systems are configured for different purposes. Each machine has
access to at least three different file systems with different levels
of performance, permanence and available space.
| file system | space | inodes | purge time | snapshots | backup | access |
|-----------------|-------|--------|------------|-----------|--------|-----------------|
| project | 1 TB | 1 M | - | yes | no | repository |
| home | 40 GB | 1 M | - | yes | yes | user |
| common | 10 GB | 1 M | - | no | no | repository |
| Cori scratch | 20 TB | 10 M | 12 weeks | no | no | user |
| Edison scratch | 10 TB | 5 M | 12 weeks | no | no | user |
| Edison scratch3 | - | - | 8 weeks | no | no | special request |
## Global storage
### [Home](global-homes.md)
Permanent, relatively small storage for data like source code and
shell scripts that you want to keep. This file system is not tuned
for high performance for parallel jobs. Referenced by the environment
variable `$HOME`.
### [Common](global-common.md)
A performant platform to install software stacks and compile
code. Mounted read-only on compute nodes.
Large, permanent, medium-performance file system. Project directories
are intended for sharing data within a group of researchers.
### [Archive](archive.md) (HPSS)
A high capacity tape archive intended for long term storage of
inactive and important data. Accessible from all systems at NERSC,
it is best suited for data that is not frequently accessed.
### Scratch
[Edison](/systems/edison/index.md) and [Cori](/systems/cori/index.md)
each have dedicated, large, local, parallel scratch file systems based
on Lustre. The scratch file systems are intended for temporary uses
such as storage of checkpoints or application input and output.
* [Cori scratch](/filesystems/cori-scratch.md)
* [Edison scratch](/filesystems/edison-scratch.md)
### [Burst Buffer](/filesystems/cori-burst-buffer.md)
Cori's [Burst Buffer](/filesystems/cori-burst-buffer.md) provides very
high performance I/O on a per-job or short-term basis. It is
particularly useful for codes that are I/O-bound, for example, codes
that produce large checkpoint files, or that have small or random I/O
reads/writes.
The project file system is a global file system available on all NERSC
computational systems. It allows sharing of data between users,
systems, and the "outside world".
## Usage
Every MPP repository has an associated project directory and unix
group. Project directories are created in `/project/projectdirs`.
All members of the project have access through their membership in the
unix group.
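For example (the repository name `m0000` is a placeholder), you can
check your group memberships and then work in the matching project
directory:
```
# List your unix group memberships; each MPP repository has a matching group
nersc$ groups
# The project directory carries the same name as the repository
nersc$ ls /project/projectdirs/m0000
```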
Occasionally there are cases where the above model is too
limiting. For example:
* large projects with multiple MPP repositories
* long-term projects which outlive specific MPP repositories
In these cases, a project directory administrator may request the
creation of a "designer" project directory with a specific name. This
will result in the creation of a new Unix group with that name,
consisting solely of the project directory administrator, followed by
the creation of the project directory itself. The project directory
administrator must then use NIM to add users to the newly-created Unix
group.
!!! info
If you are a _PI_ or a _PI Proxy_, you can request a designer project
directory in NIM.
1. Search for the MPP repository name you wish this designer
project directory to be attached to.
1. Scroll to the bottom of the "Project Information" tab and you
will see a link that says "Request a custom project directory".
## Quotas
| Type | Quota |
|--------|:-----:|
| Space | 1 TB |
| inodes | 1 M |
## Performance
The file system has a peak aggregate bandwidth of 130 GB/sec for
streaming I/O.
## Snapshots
The project file system uses a *snapshot* capability to provide users
a seven-day history of their project directories. Every directory and
sub-directory in project contains a `.snapshots` entry.
* `.snapshots` is invisible to `ls`, `ls -a`, `find` and similar commands
* Contents are visible through `ls -F .snapshots`
* Can be browsed normally after `cd .snapshots`
* Files cannot be created, deleted or edited in snapshots
* Files can *only* be copied *out* of a snapshot
## Lifetime
Project directories will remain in existence as long as the owning
project is active. Projects typically "end" at the end of a NERSC
Allocation Year. This happens when the PI chooses not to renew the
project, or DOE chooses not to provide an allocation for a renewal
request. In either case, the following steps will occur following the
termination of the project:
1. **-365 days** - The start of the new Allocation Year and no Project
renewal
The data in the project directory will remain available on the
project file system until the start of the next Allocation
Year. Archival process begins.
1. **+0 days** - The start of the following Allocation Year
Users are notified that the affected project directory will be
archived, and then removed from the file system in 90 days.
1. **+30 days**
The project directory will become read-only.
1. **+60 days**
The full pathname to the project directory will be
modified. Automated scripts will likely fail.
1. **+90 days**
User access to the directory will be terminated. The directory
will then be archived in HPSS, under ownership of the PI, and
subsequently removed from the file system.
## Overview
| file system | space | inodes | purge time |
|-----------------|-------|--------|------------|
| Project | 1 TB | 1 M | - |
| Global HOME | 40 GB | 1 M | - |
| Global common | 10 GB | 1 M | - |
| Cori SCRATCH | 20 TB | 10 M | 12 weeks |
| Edison SCRATCH | 10 TB | 5 M | 12 weeks |
| Edison SCRATCH3 | - | - | 8 weeks |
## Quotas
!!! warning
When a quota is reached, writes to that filesystem may fail.
!!! note
If your `$SCRATCH` usage exceeds your quota, you will not be
able to submit batch jobs until you reduce your usage.
### Current usage
NERSC provides a `myquota` command which displays applicable quotas
and current usage.
To see current usage for home and available scratch filesystems:
```
nersc$ myquota
```
For project and global common, pass the full path to the directory:
```
nersc$ myquota --path=/project/projectdirs/<project_name>
```
or
```
nersc$ myquota --path=/global/common/software/<project_name>
```
### Increases
If you or your project needs additional space you may request it via
the
[Disk Quota Increase Form](https://nersc.service-now.com/nav_to.do?uri=catalog_home.do).
hero: docs.nersc.gov is in beta
# NERSC Technical Documentation
## Quick links / FAQ
......
This example uses one MPI process per physical core.
??? example "Edison"
```bash
--8<-- "docs/jobs/examples/01-basic-mpi/edison/basic-mpi.sh"
--8<-- "docs/jobs/examples/basic-mpi/edison/basic-mpi.sh"
```
??? example "Cori Haswell"
```bash
--8<-- "docs/jobs/examples/01-basic-mpi/cori-haswell/basic-mpi.sh"
--8<-- "docs/jobs/examples/basic-mpi/cori-haswell/basic-mpi.sh"
```
??? example "Cori KNL"
```bash
--8<-- "docs/jobs/examples/01-basic-mpi/cori-knl/basic-mpi.sh"
--8<-- "docs/jobs/examples/basic-mpi/cori-knl/basic-mpi.sh"
```
## Hybrid MPI+OpenMP jobs
??? example "Edison"
```bash
--8<-- "docs/jobs/examples/02-hybrid-mpi-openmp/edison/hybrid-mpi-openmp.sh"
--8<-- "docs/jobs/examples/hybrid-mpi-openmp/edison/hybrid-mpi-openmp.sh"
```
??? example "Cori Haswell"
```bash
--8<-- "docs/jobs/examples/02-hybrid-mpi-openmp/cori-haswell/hybrid-mpi-openmp.sh"
--8<-- "docs/jobs/examples/hybrid-mpi-openmp/cori-haswell/hybrid-mpi-openmp.sh"
```
??? example "Cori KNL"
......
awarded the prize in Physiology or Medicine. Cori is comprised of
## Filesystems
### Cori scratch
* [Policy](data/policy)
### Burst Buffer
* [Examples](jobs/examples)
* [Cori scratch](/filesystems/cori-scratch.md)
* [Burst Buffer](/filesystems/cori-burst-buffer.md)
fast Intel Xeon processors, and 64 GB of memory per node.
## Filesystems
* [Edison scratch](/filesystems/edison-scratch.md)
* [Cori scratch](/filesystems/cori-scratch.md)[^1]
* [Policy](data/policy)
[^1]: Cori's scratch filesystem is also mounted on Edison login and
compute nodes
The global common file system is a global file system available on all NERSC computational systems. It offers a performant platform to install software stacks and compile code. Directories are provided by default to every MPP project. Additional global common directories can be provided upon request.
## Usage
Global common directories are created in `/global/common/software`. The name of a "default" project directory is the same as its associated MPP repository. There is also a Unix group with the same name; all members of the repository are also members of the group. Access to the global common directory is controlled by membership in this group. Because this directory is shared across all systems, you may want to install your software stacks into separate subdirectories depending on the system or the processing architecture. For some general programs you can use the same installs across all systems, but for best performance, we recommend separate installs for each system and architecture (e.g. Edison vs. Cori KNL). Since it's mounted read-only on the compute nodes, software installs should be done on the login nodes.
## Quotas
| Type | Quota |
|--------|:-----:|
| Space | 10 GB |
| inodes | 1 M |
!!! note "Purge policy"
This filesystem is not subject to purging.
## Performance
The global common system is optimized for software installation. It has a smaller block size and is mounted read-only on the computes. This allows us to turn on client-side caching, which dramatically reduces the time needed to read shared libraries across many nodes.
## Backup/Restore
No managed backups of global common directories are done by NERSC.
!!! warning
All NERSC users should back up important files to HPSS on a regular basis. Ultimately, it is your responsibility to protect yourself from data loss.
Global home directories provide a convenient means for a user to access files such as dotfiles, source files, input files, and configuration files, regardless of the platform.
## Usage
Wherever possible, you should refer to your home directory using the environment variable `$HOME`. The absolute path to your home directory may change, but the value of `$HOME` will always be correct.
For security reasons, you should never allow "world write" access to your `$HOME` directory or your `$HOME/.ssh` directory. NERSC scans for such security weaknesses and, if any are detected, will change the permissions on your directories.
## Quotas
| Type | Quota |
|--------|:-----:|
| Space | 40 GB |
| inodes | 1 M |
!!! warning
Quota increases in global homes are approved only in *extremely* unusual circumstances.
!!! note "Purge policy"
This filesystem is not subject to purging.
## Performance
Performance of global homes is optimized for small files. This is suitable for compiling and linking executables. Global home directories are not intended for large, streaming I/O. **User applications that depend on high-bandwidth for streaming large files should not be run in your `$HOME` directory.**
## Snapshots