What is the CRUG Cluster?

One of the goals of Carleton’s Computational Research Users Group is to create a shared, powerful, and expandable computation cluster that is usable by as many of our users as possible and funded by grants and faculty startup funds. Our users include faculty and students from all departments, and their needs are diverse. 

To meet these needs, the college used NSF grant money to purchase a large computer (56 cores, 768 GB RAM, and 19 TB disk).  The system is set up as a VMware ESXi server so that we can spin up individual, one-of-a-kind servers when needed, but in practice as much of the system as possible is dedicated to running a SLURM server and compute node.  SLURM is a workload manager that allows users to submit jobs to a queue; those jobs are then dispatched to run on compute nodes.  VMware gives us the flexibility of running unique Linux or Windows virtual machines, and SLURM gives us the ability to add computation nodes when additional funding is found.  Note that while all the cores and RAM are available, the majority of the disk space is set aside for redundancy.

...

The system is designed to take advantage of the SLURM workload manager.  We expect that the majority of jobs will use multiple cores through some type of parallel processing; roughly 95% of our needs appear to be embarrassingly parallel (made up of independent tasks that do not need to communicate with one another).  Users log in with ssh to command.dmz.carleton.edu and submit SLURM jobs from the Linux command line.  An example of using SLURM can be found at https://wiki.carleton.edu/pages/viewpage.action?pageId=57837534.  Some useful SLURM commands can be found at https://wiki.carleton.edu/display/carl/Useful+slurm+commands.  If you are new to this, please contact our technical staff for help.
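
This is a minimal sketch of the embarrassingly parallel pattern in Python. It is not an official CRUG script. A SLURM job array splits the input among array tasks, and each task then uses exactly the cores SLURM allocated to it. The sketch assumes a submission along the lines of `sbatch --array=0-3 --cpus-per-task=4 run_worker.sh`. The names `run_worker.sh`, `worker.py`, `simulate`, and `chunk_for_task` are hypothetical. `SLURM_ARRAY_TASK_ID`, `SLURM_ARRAY_TASK_MIN`, `SLURM_ARRAY_TASK_COUNT`, and `SLURM_CPUS_PER_TASK` are environment variables that SLURM sets for array jobs and when `--cpus-per-task` is given.

```python
"""Hypothetical worker.py for an embarrassingly parallel SLURM job array.

Assumed submission (run_worker.sh is a hypothetical wrapper that runs
`python3 worker.py`):
    sbatch --array=0-3 --cpus-per-task=4 run_worker.sh
"""
import os
from multiprocessing import Pool


def simulate(seed: int) -> int:
    # Placeholder for real work: each call is independent of the others,
    # which is what makes the problem embarrassingly parallel.
    total = 0
    for i in range(1000):
        total = (total + seed * i) % 1_000_003
    return total


def chunk_for_task(items, index, num_tasks):
    # Give array task `index` every num_tasks-th item, starting at `index`,
    # so the tasks together cover every item exactly once.
    return items[index::num_tasks]


def main():
    # SLURM sets these variables; the defaults let the script also run
    # outside SLURM (e.g. for a quick test on a login node).
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    task_min = int(os.environ.get("SLURM_ARRAY_TASK_MIN", "0"))
    num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", "1"))
    cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
    index = task_id - task_min  # assumes a contiguous array like --array=0-3

    seeds = list(range(100))
    my_seeds = chunk_for_task(seeds, index, num_tasks)

    # Use only the cores SLURM allocated to this task, not every core on the node.
    with Pool(processes=cpus) as pool:
        results = pool.map(simulate, my_seeds)

    print(f"task {task_id}: processed {len(results)} items")


if __name__ == "__main__":
    main()
```

The sketch reads the core count from `SLURM_CPUS_PER_TASK` rather than `os.cpu_count()`. This keeps each job within its allocation, so it does not compete with other users' jobs running on the same node.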

...

More information about X forwarding can be found on the wiki pages X Session Forwarding for Windows and X Session Forwarding for OSX and Linux.

Who gets access to the system? 

Faculty have priority.  If the system has idle cycles, it can also be used by students working with faculty on research projects, and after that by students wanting to use it for course work.

Is there a time limit for jobs? 

At the moment there is no official time limit; however, if you want to run a job that you anticipate taking longer than two weeks, please contact Mike Tie before submitting the job.

Eventually, there will be two SLURM queues: one for jobs expected to finish in 48 hours or less, and one for jobs expected to finish in less than two weeks.  If your jobs can’t finish in two weeks, you will need to write your code so that it periodically saves its state and can be restarted from that saved state.  At some point there may also be a queue for students.  Because the intention is to give all users access, anyone hogging or abusing the system will have their jobs terminated.
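
One common way to make a long job restartable is to save its state every few steps and reload that state on startup. Below is a minimal Python sketch of this pattern. It assumes the job's progress fits in a small picklable dictionary. The file name `checkpoint.pkl` and the functions `load_state`, `save_state`, and `run` are hypothetical, and the loop body is a placeholder for real work.

```python
"""Minimal checkpoint/restart sketch (an assumed pattern, not a CRUG-provided tool)."""
import os
import pickle
import tempfile

CHECKPOINT = "checkpoint.pkl"  # hypothetical checkpoint file name


def load_state(path):
    # Resume from a previous run if a checkpoint exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path):
    # Write to a temporary file in the same directory, then atomically
    # rename it, so a job killed mid-write never leaves a corrupt checkpoint.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, path=CHECKPOINT, every=1000):
    state = load_state(path)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # placeholder for one unit of real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_state(state, path)  # periodic checkpoint
    save_state(state, path)  # final checkpoint once the work is done
    return state
```

Suppose the job is terminated at a queue's time limit. Resubmitting the same script then resumes from the last saved step, and only the work done since that checkpoint is repeated.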

What if my program needs more cores, memory, disk space, or GPUs than the cluster currently has?

If you need more of the above, please find funding!  We will happily add the new resources, and you will have priority use of the resources that you add to the system.

...

The following is a graphical representation of how we envision the initial system will be configured:

[Diagram: envisioned initial configuration of the CRUG cluster]

Note as of June 27, 2019: The command and compute nodes are drastically smaller than envisioned, and only one of the HHMI nodes has been converted to a SLURM compute node.  There is also a very large VM running, summer18.dmz.carleton.edu; this node was initially set up as a place for people to work while SLURM was being configured.  The current plan is to dramatically reduce the size (RAM and core count) of summer18 on July 29, 2019 and to transition users to command.dmz.carleton.edu.  summer18 will be removed by the end of 2019.

...

As of July 27, 2019, none of the compute nodes are equipped with GPUs. Please help us find funding.

What if I want to run a Windows application, some other version of Linux, or Mac OSX?

We can spin up small virtual machines for other versions of Linux or for Windows applications; please contact one of our technical staff.  Unfortunately, Mac OSX is not supported in a virtual environment.

How do I get software installed?

Speak with our technical staff.

Other questions?

Speak with Mike Tie.