Revision as of 17:03, 20 November 2010

General Information

Directory setup

home directory quota

There is a 10GB quota enforced on the $HOME directory (/global/home/users/username). Please keep your usage below this limit. There will be NetApp snapshots in place on this file system, so we suggest you store only your source code and scripts in this area and store all your data under /clusterfs/cortex (see below).

In order to see your current quota and usage, use the following command:

 quota -s
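As a rough sketch, the check above can be complemented with a small helper that reports how much of the 10GB limit a directory uses. The function names quota_percent and check_home_usage are hypothetical, and measuring with du is an assumption; the output of quota -s remains the authoritative number.

```shell
# Sketch only: the helper names are hypothetical, not part of the cluster setup.
QUOTA_KB=$((10 * 1024 * 1024))   # the 10GB limit expressed in KB

# Print the integer percentage of the quota that a usage figure (in KB) represents.
quota_percent() {
    echo $(( $1 * 100 / QUOTA_KB ))
}

# Measure a directory (default: $HOME) with du and report the percentage used.
check_home_usage() {
    local dir="${1:-$HOME}"
    local usage_kb
    usage_kb=$(du -sk "$dir" | cut -f1)
    echo "$dir uses $(quota_percent "$usage_kb")% of the 10GB quota"
}
```

Running check_home_usage without arguments inspects your home directory; du can take a while on large trees.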

data

For large amounts of data, please create a directory

 /clusterfs/cortex/scratch/username

and store the data inside that directory.
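A minimal sketch of creating that directory from the shell. The helper names scratch_dir and make_scratch_dir are hypothetical, and the sketch assumes your cluster username matches $USER.

```shell
# Print the scratch path for a user (default: the current user).
scratch_dir() {
    echo "/clusterfs/cortex/scratch/${1:-$USER}"
}

# Create the directory if it does not exist yet (only meaningful on the cluster).
make_scratch_dir() {
    mkdir -p "$(scratch_dir "$1")"
}
```

Calling make_scratch_dir once after your first login should be enough; mkdir -p does nothing if the directory already exists.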

Connect

get a password

press the PASS WORD button on your crypto card
enter password
press enter
the 7 digit password is given (without the dash)

ssh to the gateway computer (hadley)

note: please don't use the gateway for computations (e.g. matlab)!

 ssh -Y neuro-calhpc.berkeley.edu (or hadley.berkeley.edu)

and use your crypto password

Setup environment

put all your customizations into your .bashrc
for login shells, .bash_profile is used, which in turn loads .bashrc

Useful commands

Start interactive session on compute node

start interactive session:

 qsub -X -I

start interactive session on particular node (nodes n0000.cortex and n0001.cortex have GPUs):

 qsub -X -I -l nodes=n0001.cortex

Perceus commands

The perceus manual is here

listing available cluster nodes:

 wwstats

list cluster usage

 wwtop

to restrict the scope of these commands to cortex cluster, add the following line to your .bashrc

 export NODES='*cortex'

module list
module avail
module help

help pages are here

Resource Manager PBS

Job Scheduler MOAB
List running jobs:

 qstat -a

List the nodes allocated to a given job (here job 98):

 qstat -n 98

sample script

 #!/bin/bash
 
 #PBS -q cortex
 #PBS -l nodes=1:ppn=2:cortex
 #PBS -l walltime=01:00:00
 #PBS -o path-to-output
 #PBS -e path-to-error
 cd /global/home/users/kilian/sample_executables
 cat $PBS_NODEFILE
 mpirun -np 8 /bin/hostname
 sleep 60

submit script

 qsub scriptname
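qsub prints the identifier of the new job on standard output, so it can be captured and reused with qstat or qdel. The following is a sketch under that assumption; the helper names job_number and submit_and_show are hypothetical, and the identifier format (number followed by a server name) may differ on this cluster.

```shell
# Strip the server suffix from a PBS job identifier such as "98.server".
job_number() {
    echo "${1%%.*}"
}

# Submit a script and show the status of the new job (requires PBS).
submit_and_show() {
    local job_id
    job_id=$(qsub "$1") || return 1
    echo "submitted job $(job_number "$job_id")"
    qstat "$job_id"
}
```

For example, submit_and_show scriptname submits the script and immediately lists the new job.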

interactive session

 qsub -I -q cortex -l nodes=1:ppn=2:cortex -l walltime=00:15:00

flush STDOUT and STDERR to files in your home directory so you can tail the output of the job while it's running

 qsub -k oe scriptname
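A sketch of following the kept output while the job runs. It assumes the usual PBS naming scheme for kept files (job name, then ".o" and the job number, in your home directory); the helper name kept_stdout_file is hypothetical, so check the actual file name with ls if in doubt.

```shell
# Print the expected path of the kept STDOUT file for a job name and job number.
# Assumption: PBS default naming, <jobname>.o<number> in $HOME.
kept_stdout_file() {
    echo "$HOME/$1.o$2"
}

# Follow the output of a running job until interrupted with Ctrl-C.
follow_job_output() {
    tail -f "$(kept_stdout_file "$1" "$2")"
}
```

For a script submitted as scriptname that received job number 98, follow_job_output scriptname 98 would tail the corresponding file.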

remove a queued/running job (you can get the job_id from qstat)

 qdel job_id

list nodes that your job is running on

 cat $PBS_NODEFILE

run the program on several cores

 mpirun -np 4 -mca btl ^openib sample_executables/mpi_hello
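Instead of hard-coding the process count, it can be derived from $PBS_NODEFILE, which normally contains one line per allocated core. This is a sketch under that assumption; the helper names count_slots and run_on_all_slots are hypothetical.

```shell
# Count the entries (one per allocated core) in a PBS node file.
count_slots() {
    echo $(( $(wc -l < "$1") ))
}

# Launch a program with one MPI process per allocated core (inside a job).
run_on_all_slots() {
    mpirun -np "$(count_slots "$PBS_NODEFILE")" -mca btl '^openib' "$@"
}
```

Inside a job script, run_on_all_slots sample_executables/mpi_hello would then match the number of processes to the resources requested with -l nodes=...:ppn=....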

Finding out the list of occupants on each cluster node

You can find out which users are on a particular node by logging into the node with ssh, e.g.

 ssh n0000.cortex

After logging into the node, type

 top

This is useful if you believe someone is abusing the machine and would like to send him/her a friendly reminder.
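For a non-interactive summary, the process list can be counted per user. A minimal sketch, assuming the Linux ps on the nodes accepts -eo user=; the helper names summarize_users and node_users are hypothetical.

```shell
# Read user names (one per line) on stdin; print "user count", busiest first.
summarize_users() {
    sort | uniq -c | sort -rn | awk '{print $2, $1}'
}

# Count processes per user on the current node.
node_users() {
    ps -eo user= | summarize_users
}
```

From the gateway, something like ssh n0000.cortex 'ps -eo user=' | summarize_users gives the same summary without an interactive login. Note that this counts processes, not CPU usage, so top is still the better tool for spotting heavy jobs.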

Software

Matlab

note: remember to start an interactive session before starting matlab!

In order to use matlab, you have to load the matlab environment:

 module load matlab/R2010a
         -or-
 module load matlab/R2007a

Once the matlab environment is loaded, you can start a matlab session by running

 matlab -nojvm -nodesktop

An example PBS script for running matlab code is

 #!/bin/bash
 #PBS -q cortex
 # request 1 node with 2 CPUs
 #PBS -l nodes=1:ppn=2
 # reserve time on the selected cores
 #PBS -l walltime=01:00:00
 module load matlab
 matlab -nodisplay -nojvm << EOF
 test % here you should have whatever you would normally type at the Matlab prompt
 exit
 EOF

If you would like to see who is using matlab licenses, enter

 lmstat

Python

We have several Python Distributions installed: The Enthought Python Distribution (EPD), the Source Python Distribution (SPD) and Sage. The easiest way to get started is probably to use EPD (see below).

Enthought Python Distribution (EPD)

We have the Enthought Python Distribution 6.3.1 installed [EPD]. To use it, follow these steps:

log in to the gateway server using "ssh -Y" (see above)
start an interactive session using "qsub -I -X" (see above)
load the python environment module:

 module load python/epd

start ipython:

 ipython -pylab

run the following commands inside ipython to test the setup:

 from enthought.mayavi import mlab
 mlab.test_contour3d()

CUDA

CUDA is NVIDIA's toolkit for using the graphics processing unit (GPU) on a graphics card for general-purpose computing. We have a separate wiki page to collect information on how to do general-purpose computing on the GPU: GPGPU. We have installed the CUDA 3.0 driver and toolkit.

In order to use CUDA, you have to load the CUDA environment:

 module load cuda

CUDA SDK (Outdated since version change to 3.0)

You can install the CUDA SDK by running

 bash /clusterfs/cortex/software/cuda-2.3/src/cudasdk_2.3_linux.run

You can compile all the code examples by running

 module load X11
 module load Mesa/7.4.4
 cd ~/NVIDIA_GPU_Computing_SDK/C
 make

The compiled examples can be found in the directory

 ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release

note: The examples using graphics with OpenGL don't seem to run on a remote X server. In order to make them work, we probably need to install something like virtualgl.

PyCuda

PyCuda 0.93 is installed as part of the Source Python Distribution (SPD). This is how you run all unit tests:

 module load python/spd
 cd /clusterfs/cortex/software/src/pycuda-0.93/test/
 nosetests

If you are having trouble installing PyCuda, please note the following:

gcc 4.1.2 related issues with boost [1]
also, gcc 4.1.2 related [2]

Usage Tips

Here are some tips on how to effectively use the cluster.

Mounting Cluster File System

Mounting the cluster file system remotely allows you to easily access files on the cluster, and allows you to use local programs to edit code or examine simulation outputs locally (very useful). I often edit the remote code using a text editor running on my local machine. This allows you to take advantage of the niceties of a native editor without having to copy code back and forth before you run a simulation on the cluster.

On Linux distributions you can mount your cluster home directory locally using sshfs [3]:

 sshfs hadley.berkeley.edu: <mount-dir>
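A sketch of wrapping the mount and the matching unmount in two shell functions for your local .bashrc. The function names and the default mount point ~/cluster are hypothetical; fusermount -u is the usual way to unmount a FUSE file system on Linux.

```shell
# Print the local mount point (default: ~/cluster, a hypothetical choice).
cluster_mount_point() {
    echo "${1:-$HOME/cluster}"
}

# Mount the cluster home directory at the local mount point.
cluster_mount() {
    local dir
    dir=$(cluster_mount_point "$1")
    mkdir -p "$dir" && sshfs hadley.berkeley.edu: "$dir"
}

# Unmount it again when done.
cluster_umount() {
    fusermount -u "$(cluster_mount_point "$1")"
}
```

If your local username differs from your cluster username, prefix the host with it, as in username@hadley.berkeley.edu:.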

On Mac and Windows machines the program ExpanDrive works well (uses Fuse under the hood): [4]

Support Requests

If you have a problem that is not covered on this page, you can send an email to our user list:

 redwood_cluster@lists.berkeley.edu

If you need additional help from the LBL group, send an email to their email list. Please always cc our email list as well.

 scs@lbl.gov

In urgent cases, you can also email Krishna Muriki (LBL User Services) directly.

Cluster: Difference between revisions