Commit 7412cafd authored by Brandon

LINT

parent 47260210
......@@ -11,13 +11,9 @@ markdown_lint:
image: node
stage: lint
script:
- npm install markdownlint-cli
- ./node_modules/markdownlint-cli/markdownlint.js docs
- npm install -g markdownlint-cli
- markdownlint docs
allow_failure: true
cache:
key: markdownlint
paths:
- node_modules/
mkdocs_build:
stage: build
......
......@@ -3,7 +3,7 @@
"first-heading-h1": true,
"heading-style": { "style": "atx" },
"no-trailing-spaces": true,
"line-length": { "line_length": 80, "code_blocks": false, "tables": false},
"line-length": { "line_length": 130, "code_blocks": false, "tables": false},
"no-missing-space-atx": true,
"no-multiple-space-atx": true
}
# Accounts
## Obtaining an account
In order to use the NERSC facilities you need:
......
# BerkeleyGW
The BerkeleyGW Package is a set of computer codes that calculates the quasiparticle
properties and the optical responses of a large variety of materials from bulk periodic
crystals to nanostructures such as slabs, wires and molecules. The package takes as
input the mean-field results from various electronic structure codes such as the
Kohn-Sham DFT eigenvalues and eigenvectors computed with Quantum ESPRESSO, PARATEC,
PARSEC, Octopus, Abinit, Siesta etc.
The BerkeleyGW Package is a set of computer codes that calculates the
quasiparticle properties and the optical responses of a large variety
of materials, from bulk periodic crystals to nanostructures such as
slabs, wires, and molecules. The package takes as input mean-field
results, such as Kohn-Sham DFT eigenvalues and eigenvectors, computed
with electronic structure codes such as Quantum ESPRESSO, PARATEC,
PARSEC, Octopus, Abinit, and Siesta.
NERSC provides modules for [BerkeleyGW](https://www.berkeleygw.org).
......
......@@ -2,20 +2,28 @@
## ![Mathematica logo](./images/mathematica-spikey.png)<br/>Description and Overview
Mathematica is a fully integrated environment for technical computing. It performs symbolic manipulation of equations, integrals, differential equations, and most other mathematical expressions. Numeric results can be evaluated as well.
Mathematica is a fully integrated environment for technical
computing. It performs symbolic manipulation of equations, integrals,
differential equations, and most other mathematical
expressions. Numeric results can be evaluated as well.
## How to Use Mathematica
### Running in the Notebook Interface
To use the graphical interface to Mathematica, you will need to connect to a NERSC machine with X11.
To use the graphical interface to Mathematica, you will need to
connect to a NERSC machine with X11.
```shell
mylaptop$ ssh -X edison.nersc.gov
```
We highly recommend the use of [NX](https://www.nersc.gov/users/connecting-to-nersc/using-nx/) to invoke an X11 session and run the Mathematica notebook interface.
Next, you will need to load the Mathematica module. To use the default version of Mathematica, use
We highly recommend the use
of [NX](https://www.nersc.gov/users/connecting-to-nersc/using-nx/) to
invoke an X11 session and run the Mathematica notebook interface.
Next, you will need to load the Mathematica module. To use the default
version of Mathematica, use
```shell
nersc$ module load mathematica
......@@ -27,19 +35,37 @@ To start up Mathematica once the module is loaded, you can simply type
nersc$ mathematica
```
You should see the Mathematica logo ("Spikey") appear, followed by the application interface.
You should see the Mathematica logo ("Spikey") appear, followed by the
application interface.
### Mathematica Licensing
NERSC's Mathematica licenses are for a limited number of seats. Once you are finished using Mathematica, please be sure you disconnect your X11 session so that your seat becomes available to the next user. If you use NX, remember to exit Mathematica within your NX session before closing, as NX will otherwise keep the session alive, preventing other users from launching Mathematica.
NERSC's Mathematica licenses are for a limited number of seats. Once
you are finished using Mathematica, please be sure you disconnect your
X11 session so that your seat becomes available to the next user. If
you use NX, remember to exit Mathematica within your NX session before
closing, as NX will otherwise keep the session alive, preventing other
users from launching Mathematica.
When starting up a new Mathematica session, check to be sure that you don't already have an instance of Mathematica running. The most common issue with Mathematica licensing at NERSC is that another user is inadvertently using multiple seats.
When starting up a new Mathematica session, check to be sure that you
don't already have an instance of Mathematica running. The most common
issue with Mathematica licensing at NERSC is that another user is
inadvertently using multiple seats.
### Running Mathematica Scripts
To run Mathematica scripts, you can do so in interactive mode or in batch mode. Both approaches require the use of the job scheduler. On both Cori and Edison, this is [SLURM](https://www.nersc.gov/users/computational-systems/cori/running-jobs/slurm-at-nersc-overview/).
You can run Mathematica scripts in interactive mode or in batch
mode. Both approaches require the use of the job scheduler. On both
Cori and Edison, this is
[SLURM](https://www.nersc.gov/users/computational-systems/cori/running-jobs/slurm-at-nersc-overview/).
To run in interactive mode, use salloc to obtain access to a compute node. To avoid using multiple license seats at once (always a good thing), specify a single node and a single task per node. If you want to take advantage of parallelism in your script, you can specify more than one cpu per task. An allocation to run on four cores in the regular queue would be obtained with a command like the following:
To run in interactive mode, use salloc to obtain access to a compute
node. To avoid using multiple license seats at once (always a good
thing), specify a single node and a single task per node. If you want
to take advantage of parallelism in your script, you can specify more
than one cpu per task. An allocation to run on four cores in the
regular queue would be obtained with a command like the following:
```shell
nersc$ salloc -N 1 -n 1 -c 4 -p regular -t 00:10:00
......@@ -50,7 +76,9 @@ Running the script is then as simple as
nersc$ srun ./mymathscript.m
```
To run in batch mode, you will need to write a SLURM batch script and then use sbatch to put your job into the queue. An example batch script that makes 4 cores available to your script follows.
To run in batch mode, you will need to write a Slurm batch script and
then use sbatch to put your job into the queue. An example batch
script that makes 4 cores available to your script follows.
```shell
#! /bin/bash -l
......@@ -65,7 +93,15 @@ module load mathematica
srun ./mersenne-cori.m
```
If you want to take advantage of parallelism in Mathematica, you can use the application's built-in parallel commands. (See the Wolfram web site for more about [parallel commands in Mathematica](https://reference.wolfram.com/language/ParallelTools/tutorial/GettingStarted.html).) Following is an example script that works on Cori. With Cori, you can use up to 16 cores; with Edison, you can use up to 12 cores. Be sure the first line of your script points to the correct directory for the machine on which you're running.
If you want to take advantage of parallelism in Mathematica, you can
use the application's built-in parallel commands. (See the Wolfram web
site for more
about
[parallel commands in Mathematica](https://reference.wolfram.com/language/ParallelTools/tutorial/GettingStarted.html).)
Following is an example script that works on Cori. With Cori, you can
use up to 16 cores; with Edison, you can use up to 12 cores. Be sure
the first line of your script points to the correct directory for the
machine on which you're running.
```
#!/global/common/cori/software/mathematica/10.3.0/bin/MathematicaScript -script
......@@ -94,4 +130,3 @@ For Edison, the first line of this script would be
Extensive on-line documentation is available at the Wolfram web site.
[Mathematica Documentation](http://www.wolfram.com/products/mathematica/)
# MATLAB
![MATLAB logo](./images/matlablogo.png)<br/>
[MATLAB](https://www.mathworks.com/products/matlab.html) is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. MATLAB features a family of add-on application-specific solutions called toolboxes. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. These are the toolboxes installed in NERSC MATLAB, along with the number of licenses.
![MATLAB logo](./images/matlablogo.png)
[MATLAB](https://www.mathworks.com/products/matlab.html) is a
high-performance language for technical computing. It integrates
computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar
mathematical notation. MATLAB features a family of add-on
application-specific solutions called toolboxes. Toolboxes are
comprehensive collections of MATLAB functions (M-files) that extend
the MATLAB environment to solve particular classes of problems. These
are the toolboxes installed in NERSC MATLAB, along with the number of
licenses.
* Image Processing (2)
* Neural Networks (1)
......@@ -11,17 +21,30 @@
* Statistics (2)
* Compiler (1)
The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation.
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the
LINPACK and EISPACK projects. Today, MATLAB engines incorporate the
LAPACK and BLAS libraries, embedding the state of the art in software
for matrix computation.
## How to Use MATLAB on Edison and Cori
MATLAB is available at NERSC on Edison and Cori. The number of MATLAB licenses at NERSC is not very large (currently 16), so users should not be running a MATLAB session when it is not being actively used. If you use NX, it's particularly easy to think you've released your license when you haven't, since license checkouts persist between NX sessions.
MATLAB is available at NERSC on Edison and Cori. The number of MATLAB
licenses at NERSC is not very large (currently 16), so users should
not be running a MATLAB session when it is not being actively used. If
you use NX, it's particularly easy to think you've released your
license when you haven't, since license checkouts persist between NX
sessions.
MATLAB can be run interactively or in batch mode. MATLAB can run on a compute node with exclusive access to the full usable memory (about 62 GB on Edison nodes and 120 GB on Cori Phase 1 nodes) by submitting a batch job to the regular queue.
MATLAB can be run interactively or in batch mode. MATLAB can run on a
compute node with exclusive access to the full usable memory (about 62
GB on Edison nodes and 120 GB on Cori Phase 1 nodes) by submitting a
batch job to the regular queue.
### Running Interactively
To run MATLAB interactively on Edison, connect with ssh -X or via NX, and do the following:
To run MATLAB interactively on Edison, connect with ssh -X or via NX,
and do the following:
```
salloc -p regular -N 1 -c 24 -t 30:00
......@@ -38,7 +61,8 @@ matlab
### Batch Jobs
To run one instance of MATLAB non-intearctively through a batch job, you can use the following job script on Edison:
To run one instance of MATLAB non-interactively through a batch job,
you can use the following job script on Edison:
```
#!/bin/bash -l
......@@ -65,7 +89,9 @@ pp = parpool('edison_cluster', 24)
#### Running MATLAB Parallel Commands
The following program illustrates how MATLAB parallel commands can be used on Cori. NERSC's license currently limits parallel use to a single node and the number of threads that one node supports.
The following program illustrates how MATLAB parallel commands can be
used on Cori. NERSC's license currently limits parallel use to a
single node and the number of threads that one node supports.
```
% hello-world.m
......@@ -80,7 +106,9 @@ end
For loop-level parallelism, MATLAB provides the parfor construct.
In order to make MATLAB work in parallel on Edison, you need to make sure it will access the MATLAB libraries before the standard cray libraries. You can do this by running this command for bash:
In order to make MATLAB work in parallel on Edison, you need to make
sure it accesses the MATLAB libraries before the standard Cray
libraries. You can do this by running this command in bash:
```
export LD_PRELOAD="/global/common/sw/cray/cnl6/ivybridge/matlab/R2016b/bin/glnxa64/libmpichnem.so /global/common/sw/cray/cnl6/ivybridge/matlab/R2016b/bin/glnxa64/libmpichsock.so"
......@@ -92,13 +120,24 @@ or for csh / tcsh:
setenv LD_PRELOAD "/global/common/sw/cray/cnl6/ivybridge/matlab/R2016b/bin/glnxa64/libmpichnem.so /global/common/sw/cray/cnl6/ivybridge/matlab/R2016b/bin/glnxa64/libmpichsock.so"
```
However, if you have LD_PRELOAD set on a login node, this will interfere with the regular Edison and Cori environment. It's important that you only set the LD_PRELOAD variable in your parallel MATLAB job script (or interactive job).
However, if you have `LD_PRELOAD` set on a login node, this will
interfere with the regular Edison and Cori environment. It's important
that you only set the `LD_PRELOAD` variable in your parallel MATLAB job
script (or interactive job).
#### Parallelism with the MATLAB Compiler
Another way to run MATLAB in parallel is to run multiple instances of a compiled MATLAB program. By compiling, you create a stand-alone application that doesn't need to obtain a separate license from the NERSC license server to run. The MathWorks web site has details about using the [MATLAB Compiler](https://www.mathworks.com/products/compiler.html).
Another way to run MATLAB in parallel is to run multiple instances of
a compiled MATLAB program. By compiling, you create a stand-alone
application that doesn't need to obtain a separate license from the
NERSC license server to run. The MathWorks web site has details about
using
the
[MATLAB Compiler](https://www.mathworks.com/products/compiler.html).
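As a rough sketch (the module name, script name, and paths are
assumptions; see the MathWorks documentation for `mcc` details),
compiling and running a standalone program might look like:

```shell
nersc$ module load matlab                        # module name assumed
nersc$ mcc -m myscript.m                         # builds the standalone executable ./myscript
nersc$ ./run_myscript.sh <matlab_runtime_root>   # wrapper generated by mcc; takes the runtime location as its argument
```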
## Documentation
Extensive [on-line documentation](http://www.mathworks.com/) is available. You may subscribe to the MATLAB Digest, a monthly e-mail newsletter by sending e-mail to subscribe@mathworks.com.
Extensive [on-line documentation](http://www.mathworks.com/) is
available. You may subscribe to the MATLAB Digest, a monthly e-mail
newsletter, by sending e-mail to subscribe@mathworks.com.
# PARATEC
PARATEC is a parallel, plane-wave basis, density functional theory (DFT) code developed at Berkeley. PARATEC is one of the DFT packages supported by the BerkeleyGW code. PARATEC supports many traditional DFT features and exchange-correlation functionals. PARATEC uses norm-conserving pseudopotentials that can be generated with the FHI pseudopotential program.
PARATEC is a parallel, plane-wave basis, density functional theory
(DFT) code developed at Berkeley. PARATEC is one of the DFT packages
supported by the BerkeleyGW code. PARATEC supports many traditional
DFT features and exchange-correlation functionals. PARATEC uses
norm-conserving pseudopotentials that can be generated with the FHI
pseudopotential program.
NERSC provides modules for PARATEC.
......
# VASP
VASP is a package for performing ab initio quantum-mechanical
molecular dynamics (MD) using pseudopotentials and a plane wave basis
set. The approach implemented in VASP is based on a finite-temperature
......
## Introduction
# Introduction to MFA
NERSC provides users with the ability to use __Multi-Factor
Authentication__ (__MFA__) for logging into NERSC resources. MFA
......@@ -469,10 +469,8 @@ status of MFA on NERSC systems and services.
</tbody>
</table>
#### MFA Available in September, 2018
[comment]: <> (| Authentication | Host |)
[comment]: <> (|:---:|:---:|)
[comment]: <> (| Shibboleth | Online Help Desk ([https://help.nersc.gov/](https://help.nersc.gov)) |)
......@@ -501,7 +499,8 @@ status of MFA on NERSC systems and services.
<td style="text-align: center;"><a title="https://my.nersc.gov" href="https://my.nersc.gov">My NERSC</a></td>
</tr>
<tr>
<td style="text-align: center;"><a href="https://portal.nersc.gov">Science gateways</a> accepting NIM passwords not displaying the&nbsp; NERSC (Shibboleth)&nbsp; login banner</td>
<td style="text-align: center;">
<a href="https://portal.nersc.gov">Science gateways</a> accepting NIM passwords not displaying the&nbsp; NERSC (Shibboleth)&nbsp; login banner</td>
</tr>
<tr>
<td style="text-align: center; vertical-align: middle;" rowspan="3">Others</td>
......@@ -516,7 +515,6 @@ status of MFA on NERSC systems and services.
</tbody>
</table>
#### MFA Coming Soon
[comment]: <> (| Authentication | Host |)
......
......@@ -55,49 +55,104 @@ at [nxcloud01](https://nxcloud01.nersc.gov).
If you are having trouble connecting to NX, please try these steps:
1. Log into [NIM](https://nim.nersc.gov) to clear any login failures. Access to NX uses your NERSC user name and password. If your password is mistyped five times, NERSC will lock you out of their systems. Logging into NIM will automatically clear these failures. This will also let you know if your password is expired (which would prevent you from accessing NX, among many other things).
2. Re-download the [NX configuration file](https://portal.nersc.gov/project/mpccc/nx/Connection_to_NERSC_NX_service.nxs.gz). NX will often "update" the configuration file to try to save your settings and sometimes this file can get some bad settings. You must have the new NX player AND the new configuration file to connect to the NX service.
3. Try to ssh directly to the NX server. You can do this with the command "ssh <nersc_username>@nxcloud01.nersc.gov" and your NERSC user name and password. If your access to the NX server is blocked by a local firewall or something else and you can't connect via ssh, you will also not be able to connect with the NX client.
If you've tried these steps and still cannot connect, please open a help ticket. In this ticket, please include the following information:
1. The type of system you're trying to connect from (i.e. Mac, Windows, Linux, etc.).
1. Log into [NIM](https://nim.nersc.gov) to clear any login
failures. Access to NX uses your NERSC user name and password. If
your password is mistyped five times, NERSC will lock you out of
their systems. Logging into NIM will automatically clear these
failures. This will also let you know if your password is expired
(which would prevent you from accessing NX, among many other
things).
2. Re-download the
   [NX configuration file](https://portal.nersc.gov/project/mpccc/nx/Connection_to_NERSC_NX_service.nxs.gz).
   NX will often "update" the configuration file to try to save your
   settings, and sometimes this file can pick up bad settings. You
   must have the new NX player AND the new configuration file to
   connect to the NX service.
3. Try to ssh directly to the NX server. You can do this with the
   command `ssh <nersc_username>@nxcloud01.nersc.gov` (shown below),
   using your NERSC user name and password. If your access to the NX
   server is blocked by a local firewall or something else and you
   can't connect via ssh, you will also not be able to connect with
   the NX client.
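For reference, the direct ssh test from your local machine looks like
this (substitute your NERSC user name):

```shell
mylaptop$ ssh <nersc_username>@nxcloud01.nersc.gov
```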
If you've tried these steps and still cannot connect, please open a
help ticket. In this ticket, please include the following information:
1. The type of system you're trying to connect from (e.g., Mac,
   Windows, Linux).
1. A screen capture of the error you get (if possible).
1. A tarball of the NX logs. You can find instructions for how to bundle your NX logs on the [NoMachine website](https://www.nomachine.com/DT07M00098).
1. A tarball of the NX logs. You can find instructions for how to
bundle your NX logs on
the [NoMachine website](https://www.nomachine.com/DT07M00098).
### Configuring the NX Environment
#### Font size is too big or too small
To change the font size inside your terminal: In the menu of Konsole Application, choose "Settings"->"Manage Profiles", then click "Edit Profile...", now you can change the font size in the "Appearance" tab, after changing, click "OK" until you are back to the terminal. Now every new terminal window you open will have the new font size.
To change the font size inside your terminal: in the Konsole menu,
choose "Settings" -> "Manage Profiles", then click "Edit Profile...".
Change the font size in the "Appearance" tab and click "OK" until you
are back at the terminal. Every new terminal window you open will now
have the new font size.
To change the font size of your menu bars/window titles: Right click on an empty desktop then choose "Konsole", inside the Konsole, type "kcmshell4 fonts". Then you have a dialog box to change your font size.
To change the font size of your menu bars and window titles:
right-click on an empty desktop and choose "Konsole". Inside the
Konsole, type "kcmshell4 fonts". A dialog box will appear in which you
can change your font size.
#### Resizing the NX screen
With the latest NX Player (5.0.63 or later), the most efficient way is to enable "Remote Resize" in the NX menu:
With the latest NX Player (5.0.63 or later), the most efficient way is
to enable "Remote Resize" in the NX menu:
1. Connect to NX
1. From the desktop, bring up the NX player menu with a hotkey: Mac: Ctrl+Option+0, Windows: Ctrl+Alt+0, Linux: Ctrl+Alt+0
1. Choose the "Display" submenu, then toggle the "Remote Resize" button. You can also choose "Change Settings" to manually change the resolution.
1. From the desktop, bring up the NX player menu with a hotkey: Mac:
Ctrl+Option+0, Windows: Ctrl+Alt+0, Linux: Ctrl+Alt+0
1. Choose the "Display" submenu, then toggle the "Remote Resize"
button. You can also choose "Change Settings" to manually change
the resolution.
#### Emacs complains about missing fonts and shows white blocks
This is due to a problem with font server. Please use the following command instead: `emacs -font 7x14`
This is due to a problem with the font server. Please use the
following command instead: `emacs -font 7x14`
### Keypairs and NX
The NX server acts as a gateway to all other NERSC systems. Users access other NERSC systems via SSH with their NERSC user name and password. The global home directories are not mounted on the NX servers, so if you want to use SSH keys on NX, you will need to generate a separate keypair.
The NX server acts as a gateway to all other NERSC systems. Users
access other NERSC systems via SSH with their NERSC user name and
password. The global home directories are not mounted on the NX
servers, so if you want to use SSH keys on NX, you will need to
generate a separate keypair.
#### Generating a Keypair
SSH Agent requires a keypair to function. The first time you click an item on the "NERSC Systems" menu the keypair is created and installed. You need to provide a password to encrypt your private key. This password can be different from your NIM password. You can generate a keypair by selecting the "(Re)generate Key Pair" menu item. If you have an existing keypair, this option will overwrite it. Once you've generated a keypair, you will need to [upload the public key to NIM](https://www.nersc.gov/users/connecting-to-nersc/connecting-with-ssh/#toc-anchor-2). This keypair will be good for 12 hours. You'll need to refresh it if it expires. You can do this by selecting the "Renew NX SSH Keypair" from the menu at the lower left hand side.
SSH Agent requires a keypair to function. The first time you click an
item on the "NERSC Systems" menu the keypair is created and
installed. You need to provide a password to encrypt your private
key. This password can be different from your NIM password. You can
generate a keypair by selecting the "(Re)generate Key Pair" menu
item. If you have an existing keypair, this option will overwrite
it. Once you've generated a keypair, you will need
to [upload the public key to NIM](https://www.nersc.gov/users/connecting-to-nersc/connecting-with-ssh/#toc-anchor-2).
This keypair is good for 12 hours; refresh it when it expires by
selecting "Renew NX SSH Keypair" from the menu at the lower left-hand
side.
### Suspending or Terminating a NX Session
When you close the NX window (e.g., by clicking the "cross" button) a dialog box will appear providing the choice of either suspending or terminating the session.
When you close the NX window (e.g., by clicking the "cross" button) a
dialog box will appear providing the choice of either suspending or
terminating the session.
*Suspending* the session will preserve most running applications inside
the session intact and allow you to reconnect to the session at a
later time.
Suspending the session will preserve most running applications inside the session intact and allow you to reconnect to the session at a later time.
Terminating the session will kill all the running applications inside the session and all unsaved work will be lost.
*Terminating* the session will kill all the running applications inside
the session and all unsaved work will be lost.
If you lose your connection to the NX server (e.g., if your internet connection is lost) NX will automatically suspend the session allowing you to reconnect to the same session while keeping the running applications intact
If you lose your connection to the NX server (e.g., if your internet
connection is lost), NX will automatically suspend the session,
allowing you to reconnect to the same session while keeping the
running applications intact.
# SSH
All NERSC computers (except HPSS) are reached using either the Secure
Shell (SSH) communication and encryption protocol (version 2) or by
Grid tools that use trusted certificates.
## SSH
SSH (Secure Shell) is an encrypted network protocol used to log into
computers over an unsecured network. On UNIX/Linux/BSD-type systems,
SSH is also the name of a suite of software applications for
......@@ -22,12 +22,12 @@ connections.
edison$
```
### Passwordless logins and transfers
## Passwordless logins and transfers
!!! warning
All public keys must be stored in [NIM](https://nim.nersc.gov).
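For reference, a keypair can be generated on your local machine with
`ssh-keygen` before the public key is uploaded to NIM; the file names
below are the ssh defaults and may differ on your system:

```shell
mylaptop$ ssh-keygen -t rsa -b 4096   # writes ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
mylaptop$ cat ~/.ssh/id_rsa.pub       # copy this public key into NIM
```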
### Key fingerprints
## Key fingerprints
* Cori
```
......@@ -44,11 +44,11 @@ connections.
1024 3d:28:24:53:66:de:30:9e:eb:25:3b:03:b0:24:1c:77
```
### Host Keys
## Host Keys
These are the entries in `~/.ssh/known_hosts`.
#### Cori
### Cori
```
cori.nersc.gov ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCvoau+F7fGIHuvcDZZSG7dD2J7hgo3RupUL6Jaw978mb
......@@ -58,7 +58,7 @@ c0NoEs9IzyK2N4ywExwljpMs7vKwasz8qyjHB2aYaj6cHjV2ShCp+aevPdp1jfBtIgJUMkjMEa
+0K4zWM0aDzZEaj7vIlKpUCDAdQf/DsPoj808KOKLw0+Bs0qamX+D7+aXsPVG/jfBY5wSCgjlhqn
```
#### Edison
### Edison
```
edison.nersc.gov ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDpzjkAkaxZS7dCRQeGCDxcdJd
......
## OSTP/Office of Science Data Management Requirements
# OSTP/Office of Science Data Management Requirements
Project Principal Investigators are responsible for meeting OSTP
(Office of Science and Technology Policy) and DOE Office of Science
......
......@@ -63,7 +63,7 @@ nersc$ module load totalview
With most of the versions available on the systems, you will be able
to launch the debugger with a totalview command followed by the name
of the executable to debug, as you normally did before NERSC switched
to SLURM for batch scheduling.:
to Slurm for batch scheduling:
```shell
nersc$ totalview srun -a -n numTasks ./testTV_ex
......
## Backups
# Backups
!!! danger
All NERSC users should back up important files to HPSS on
......
# Cori SCRATCH
Cori has one scratch file system named `/global/cscratch1` with 30 PB
disk space and >700 GB/sec IO bandwidth. Cori scratch is a Lustre
filesystem designed for high performance temporary storage of large
......
# Edison SCRATCH
Edison has three local scratch file systems named /scratch1,
/scratch2, and /scratch3. Users are assigned to either /scratch1 or
/scratch2 in a round-robin fashion, so a user will be able to use one
......
## Summary
# Global common
The global common file system is a global file system available on all
NERSC computational systems. It offers a performant platform to
......
## Summary
# Global HOME
Home directories provide a convenient means for a user to have
access to files such as dotfiles, source files, input files,
......
# Project filesystem
The project file system is a global file system available on all NERSC
computational systems. It allows sharing of data between users,
systems, and the "outside world".
......
# Filesystem Quotas
## Overview
| file system | space | inodes | purge time |
......
## Brief Overview of Unix File Permissions
# Unix File Permissions
## Brief Overview
Every file (and directory) has an owner, an associated Unix group, and a set of
permission flags that specify separate read, write, and execute permissions for
......
......@@ -2,16 +2,31 @@
## Introduction
Process affinity (or CPU pinning) means to bind MPI process to a CPU or a range of CPUs on the node. It is important to spread MPI ranks evenly onto different NUMA nodes. Thread affinity forces each process or thread to run on a specific subset of processors, to take advantage of local process state. Correct process and thread affinity is essential for getting optimal performance.
Process affinity (or CPU pinning) means binding an MPI process to a
CPU or a range of CPUs on the node. It is important to spread MPI
ranks evenly onto different NUMA nodes. Thread affinity forces each
process or thread to run on a specific subset of processors, to take
advantage of local process state. Correct process and thread affinity
is essential for getting optimal performance.
Each Haswell node contains 2 processors, there is 1 socket per processor, thus 2 sockets per node. Each processor has 16 cores, and each core has 2 hyperthreads. See Figure 1 below.
Each Haswell node contains 2 processors (1 socket per processor, thus
2 sockets per node). Each processor has 16 cores, and each core has 2
hyperthreads. See Figure 1 below.
<a name="fig1"></a>
![haswell-layout](Affinity-haswell-layout1.png)
*Fig 1: CPUs, cores and sockets on a Haswell (Cori Phase 1) node.*
A Cori KNL node has 68 cores, 4 hyperthreads (cpus) per core and an additional level of hierarchy: each pair of cores share an L2 cache on a tile. The node is a single socket, but its tiles can be organized as 1, 2 or 4 NUMA nodes. Moreover, the MCDRAM can be used as an invisible-to-the-OS cache, or as one or more NUMA nodes. Figure 2 below illustrates a KNL node in "quad,flat" mode, in which all cpus share a single NUMA node with the DDR memory, and the MCDRAM is configured as a separate NUMA node. (Note that tile numbering here is illustrative, not accurate).
A Cori KNL node has 68 cores, 4 hyperthreads (cpus) per core and an
additional level of hierarchy: each pair of cores shares an L2 cache on
a tile. The node is a single socket, but its tiles can be organized as
1, 2 or 4 NUMA nodes. Moreover, the MCDRAM can be used as an
invisible-to-the-OS cache, or as one or more NUMA nodes. Figure 2
below illustrates a KNL node in "quad,flat" mode, in which all cpus
share a single NUMA node with the DDR memory, and the MCDRAM is
configured as a separate NUMA node. (Note that tile numbering here is
illustrative, not accurate).
<a name="fig2"></a>
![knl-core-places-quadflat](knl-cores-places-quadflat.png)
......@@ -21,41 +36,79 @@ A Cori KNL node has 68 cores, 4 hyperthreads (cpus) per core and an additional l
## Recommendations
1) Use the "-n" flag for srun. Set the value to total number of MPI tasks for the job.
2) Use the "-c" flag for srun. Set the value as "number of of logical cores (CPUs) per MPI task" for MPI and hybrid MPI/OpenMP jobs. The "-c" flag is optional for fully packed pure MPI jobs.
On Haswell, there are a total of 32 physical cores (each with 2 hyperthreads, making 64 logical cpus total), so the value of "-c" should be set to 64/#MPI_per_node. For example, to use 16 MPI tasks per node, the "-c" value should be set to 64/16=4. If the#MPI_per_node is not a divisor of 64, the "-c" value should be set to floor (32/#MPI_per_node)*2. For example, to run with 12 MPI tasks per node, the "-c" value should be set to floor (32/12)*2 = 4.
On KNL, there are a total of 68 physical cores (with 4 hyperthreads, it is 272 logical cpus total), so the value of "-c" should be set to 68*4/#MPI_per_node. For example, to use 34 MPI tasks per node, the "-c" value should be set to 68*4/34=8. If #MPI_per_node is not a divisor of 68, the "-c" value should be set to floor (68 / #MPI_per_node)*4. For example, to run with 8 MPI tasks per node, the "-c" value should be set to floor (68/8)*4 = 32, or just think as using a total of 64 physical cores only, and set -c value as 64*4/8=32.
3) If #MPI tasks per node is not a divisor of 64 on Haswell (meaning the node is not fully packed), need to add an srun flag "--cpu_bind=cores". Add "--cpu_bind=threads" instead if #MPI_per_node > 32. In most cases for KNL, when use only 64 cores out of 68 cores, --cpu_bind is needed.
4) Set OMP_NUM_THREADS environment variable to number of OpenMP threads per MPI task.
5) Recommend to set run time environment variable for hybrid MPI/OpenMP jobs: OMP_PROC_BIND (mostly set to "true" or "spread") and OMP_PLACES (mostly set to "threads" or "cores"). These are useful for fine tuning thread affinity.
6) Recommend to use the [Slurm bcast option](/jobs/best-practices/) for large jobs to copy executables to the compute nodes before jobs starting. See details here.
7) Recommend to use the [core specialization](/jobs/best-practices/) feature to isloate system overhead to specific cores.
1. Use the "-n" flag for srun. Set the value to total number of MPI
tasks for the job.
1. Use the "-c" flag for srun. Set the value as "number of of logical
cores (CPUs) per MPI task" for MPI and hybrid MPI/OpenMP jobs. The
"-c" flag is optional for fully packed pure MPI jobs.
   On Haswell, there are a total of 32 physical cores (each with 2
   hyperthreads, making 64 logical cpus total), so the value of "-c"
   should be set to 64/#MPI_per_node. For example, to use 16 MPI tasks
   per node, the "-c" value should be set to 64/16=4. If #MPI_per_node
   is not a divisor of 64, the "-c" value should be set to
   floor(32/#MPI_per_node)*2. For example, to run with 12 MPI tasks
   per node, the "-c" value should be set to floor(32/12)*2 = 4.
   On KNL, there are a total of 68 physical cores (with 4
   hyperthreads, 272 logical cpus total), so the value of "-c" should
   be set to 68*4/#MPI_per_node. For example, to use 34 MPI tasks per
   node, the "-c" value should be set to 68*4/34=8. If #MPI_per_node
   is not a divisor of 68, the "-c" value should be set to
   floor(68/#MPI_per_node)*4. For example, to run with 8 MPI tasks per
   node, the "-c" value should be set to floor(68/8)*4 = 32; or,
   equivalently, think of it as using only 64 physical cores and set
   the -c value to 64*4/8=32.
1. If the number of MPI tasks per node is not a divisor of 64 on
   Haswell (meaning the node is not fully packed), you need to add the
   srun flag "--cpu_bind=cores". Add "--cpu_bind=threads" instead if
   #MPI_per_node > 32. In most cases on KNL, when only 64 of the 68
   cores are used, --cpu_bind is needed.
1. Set the `OMP_NUM_THREADS` environment variable to the number of
   OpenMP threads per MPI task.
1. Set the runtime environment variables for hybrid MPI/OpenMP jobs:
   `OMP_PROC_BIND` (usually set to "true" or "spread") and
   `OMP_PLACES` (usually set to "threads" or "cores"). These are
   useful for fine-tuning thread affinity; a combined example is shown
   after this list.
1. Use the [Slurm bcast option](/jobs/best-practices/) for large jobs
   to copy executables to the compute nodes before the job starts.
1. Use the [core specialization](/jobs/best-practices/) feature to
   isolate system overhead to specific cores.
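As an illustration, here is a hypothetical hybrid MPI/OpenMP launch on
a single Haswell node that follows these recommendations for 16 MPI
tasks with 4 OpenMP threads per task; the executable name is a
placeholder:

```shell
# 16 MPI tasks per node on Haswell: -c = 64 logical cpus / 16 tasks = 4
export OMP_NUM_THREADS=4
export OMP_PROC_BIND=spread
export OMP_PLACES=threads

srun -n 16 -c 4 --cpu_bind=cores ./myapp.x
```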
## Job Script Generator
An interactive [Job Script Generator](https://my.nersc.gov/script_generator.php) is available at MyNERSC to provide some guidance on getting optimal process and thread binding on Edison, Cori Haswell, and Cori KNL.
An interactive
[Job Script Generator](https://my.nersc.gov/script_generator.php) is
available at MyNERSC to provide some guidance on getting optimal
process and thread binding on Edison, Cori Haswell, and Cori KNL.
## Methods to Check Process and Thread Affinity
Pre-built binaries from a small test code (xthi.c) with pure MPI or hybrid MPI/OpenMP can be used to check affinity. Binaries are in users default path, and named as such: check-mpi.<compiler>.<machine> (pure MPI), or check-hybrid.<compiler>.<machine> (hybrid MPI/OpenMP), for example: check-mpi.intel.cori, check-hybrid.intel.cori, check-mpi.gnu.cori, check-hybrid.gnu.cori, etc. Run one of the small test binaries using the same choices of number of nodes, MPI tasks, and OpenMP threads as what your application will use, and check if the desired binding is obtained. The Cori binaries can be used to check for both Haswell or KNL, since binaries are compatible.
Pre-built binaries from a small test code (xthi.c) with pure MPI or
hybrid MPI/OpenMP can be used to check affinity. The binaries are in
the users' default path and are named `check-mpi.<compiler>.<machine>`
(pure MPI) or `check-hybrid.<compiler>.<machine>` (hybrid MPI/OpenMP),
for example: check-mpi.intel.cori, check-hybrid.intel.cori,
check-mpi.gnu.cori, check-hybrid.gnu.cori, etc. Run one of the small
test binaries using the same choices of number of nodes, MPI tasks,
and OpenMP threads as your application will use, and check whether the
desired binding is obtained. The Cori binaries can be used to check
both Haswell and KNL, since the binaries are compatible.
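For example, to check the binding for a hypothetical KNL geometry of 8
MPI tasks per node with 4 OpenMP threads each (binary name and flag
values are illustrative; match them to your own job):

```shell
nersc$ export OMP_NUM_THREADS=4
nersc$ srun -n 8 -c 32 --cpu_bind=cores check-hybrid.intel.cori
```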
Alternatively an srun flag "--cpu_bind=verbose" can be added to report process and thread binding.
Alternatively an srun flag `--cpu_bind=verbose` can be added to report process and thread binding.
Or you can set the following run time environment to obtain affinity information as part of the job stdout:
Or you can set one of the following runtime environment variables to
obtain affinity information as part of the job stdout:
```
export KMP_AFFINITY=verbose (for Intel compiler)
export CRAY_OMP_CHECK_AFFINITY=TRUE (for CCE compiler)
```
# Best practices for jobs
## Time Limits
Due to backfill scheduling short and variable length jobs generally
......@@ -69,7 +71,14 @@ filesystem is detected.
## Running Large Jobs (over 1500 MPI tasks)
Large jobs may take a longer to start up, especially on KNL nodes. The srun option --bcast=<destination_path> is recommended for large jobs requesting over 1500 MPI tasks. By default SLURM loads the executable to the allocated compute nodes from the current working directory, this may take long time when the file system (where the executable resides) is slow. With the --bcast=/tmp/myjob, the executable will be copied to the /tmp/myjob directory. Since /tmp is part of the memory on the compute nodes, it can speed up the job startup time.
Large jobs may take longer to start up, especially on KNL nodes. The
srun option `--bcast=<destination_path>` is recommended for large jobs
requesting over 1500 MPI tasks. By default, Slurm loads the executable
onto the allocated compute nodes from the current working directory;
this may take a long time when the file system where the executable
resides is slow. With `--bcast=/tmp/myjob`, the executable will be
copied to the /tmp/myjob directory. Since /tmp is part of the memory
on the compute nodes, this can speed up the job startup time.
```bash
% sbcast --compress=lz4 ./mycode.exe /tmp/mycode.exe   # compress and broadcast the executable to /tmp on each compute node
......@@ -87,8 +96,15 @@ are in a single Aries dragonfly group.
## Core Specialization
Core specialization is a feature designed to isolate system overhead (system interrupts, etc.) to designated cores on a compute node. It is generally helpful for running on KNL, especially if the application does not plan to use all physical cores on a 68-core compute node. Set aside 2 or 4 cores for core specialization is recommended.
Core specialization is a feature designed to isolate system overhead
(system interrupts, etc.) to designated cores on a compute node. It is
generally helpful for running on KNL, especially if the application
does not plan to use all physical cores on a 68-core compute node.
Setting aside 2 or 4 cores for core specialization is recommended; a
sketch of a job script using it follows.
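A minimal sketch, assuming Slurm's `--core-spec` option is used to
reserve 2 cores on a KNL node; the partition, constraint, and
executable names are illustrative:

```shell
#!/bin/bash -l
#SBATCH -N 1
#SBATCH -p regular
#SBATCH -C knl
#SBATCH -t 01:00:00
#SBATCH --core-spec=2     # reserve 2 cores per node for system overhead

# run on the remaining 64 physical cores (hypothetical binary)
srun -n 64 -c 4 --cpu_bind=cores ./myapp.x
```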