Installing OpenPBS

Environment

Management server
reki(CentOS 7.5)
Compute servers
shiki(CentOS 8.2), ruli(CentOS 8.2)
These hostnames must be resolvable on every server, either through DNS or /etc/hosts.
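A quick way to confirm this on each machine is to ask the system resolver, which consults both /etc/hosts and DNS. The sketch below assumes getent is available (it ships with glibc on CentOS); check_hosts is a name made up for this example.

```shell
# check_hosts NAME...
# Report whether each hostname resolves through the system resolver
# (/etc/hosts, DNS, ...). Returns non-zero if any name fails.
check_hosts() {
    rc=0
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: NOT resolvable"
            rc=1
        fi
    done
    return "$rc"
}

# Example: check_hosts reki shiki ruli
```

Run it on the management server and on both compute servers, since each side has to resolve the other.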

Installation

An installation guide is available in the OpenPBS GitHub repository.
See INSTALL.
In my case, the prefix is /usr/local/openpbs.
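INSTALL is the authoritative reference, and it also lists the prerequisite packages, which are not covered here. As far as I remember, the build steps it describes boil down to the sequence below. Treat this as a sketch under that assumption: build_openpbs, run, and the DRY_RUN switch are helpers invented for this example, not part of OpenPBS.

```shell
# run CMD...
# Execute CMD, or only print it when DRY_RUN=1 (useful for reviewing first).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_openpbs [PREFIX]
# Run from the top of the OpenPBS source tree. Stops at the first failure.
build_openpbs() {
    prefix=${1:-/usr/local/openpbs}
    run ./autogen.sh &&
    run ./configure --prefix="$prefix" &&
    run make &&
    run sudo make install &&
    run sudo "$prefix/libexec/pbs_postinstall" &&
    run sudo chmod 4755 "$prefix/sbin/pbs_iff" "$prefix/sbin/pbs_rcp"
}

# Example: DRY_RUN=1 build_openpbs /usr/local/openpbs
```

The dry run prints the six commands so they can be compared with INSTALL before anything is built.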

Config files

Management server

/etc/pbs.conf
PBS_SERVER=reki
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/usr/local/openpbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

Compute servers

/etc/pbs.conf
PBS_SERVER=reki
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/usr/local/openpbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
/var/spool/pbs/mom_priv/config
$clienthost reki
$restrict_user_maxsysid 999
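
The two pbs.conf files differ only in the PBS_START_* flags, so they can be generated from one template. This is a minimal sketch, not something OpenPBS provides: the function name render_pbs_conf and its arguments are assumptions for illustration, and the values are the ones used in this article.

```shell
# render_pbs_conf ROLE [SERVER]
# Print a pbs.conf for ROLE ("server" or "compute") to stdout.
# SERVER defaults to reki, the management server in this article.
render_pbs_conf() {
    role=$1
    server=${2:-reki}
    case "$role" in
        server)  daemons=1; mom=0 ;;   # server, scheduler and comm run here
        compute) daemons=0; mom=1 ;;   # only the MoM runs here
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    cat <<EOF
PBS_SERVER=$server
PBS_START_SERVER=$daemons
PBS_START_SCHED=$daemons
PBS_START_COMM=$daemons
PBS_START_MOM=$mom
PBS_EXEC=/usr/local/openpbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
EOF
}

# Example: render_pbs_conf compute | sudo tee /etc/pbs.conf
```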

Firewall

Management server

Open 15001-15004/tcp and 17001/tcp.
sudo firewall-cmd --zone=public --add-port=17001/tcp --permanent
sudo firewall-cmd --zone=public --add-port=15001-15004/tcp --permanent
sudo firewall-cmd --reload

Compute servers

Open 15001-15004/tcp.
sudo firewall-cmd --zone=public --add-port=15001-15004/tcp --permanent
sudo firewall-cmd --reload
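
To confirm that the rules took effect, sudo firewall-cmd --zone=public --list-ports should list the ports on each machine. Reachability can also be probed from another host. The sketch below relies on bash's /dev/tcp redirection and the coreutils timeout command; port_open is a hypothetical helper name. A failure can also mean that no PBS daemon is listening on that port yet, so it is most useful once the service is running.

```shell
# port_open HOST PORT
# Succeeds if a TCP connection to HOST:PORT can be opened within 2 seconds.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (run on a compute server):
#   for p in 15001 17001; do
#       if port_open reki "$p"; then echo "$p open"; else echo "$p closed"; fi
#   done
```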

Start PBS service

After installation, run the command below on the management server and on every compute server.

sudo systemctl enable --now pbs.service

Compute servers registration

Run these commands on the management server.

sudo qmgr -c 'create node shiki'
sudo qmgr -c 'create node ruli'
When the registration succeeds,
pbsnodes -a
shows information about the registered nodes.
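
The full pbsnodes -a output is long, so it can help to condense it to one line per node. The sketch below assumes the usual layout, in which each node name starts at the first column and its attributes follow as indented "key = value" lines; node_states is a made-up helper name.

```shell
# node_states: read `pbsnodes -a` output on stdin and print "<node> <state>".
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }          # unindented line: a node name
        /^[ \t]+state = / {              # indented "state = ..." attribute
            sub(/^[ \t]+state = /, "")
            print node, $0
        }
    '
}

# Example: pbsnodes -a | node_states
```

A healthy node should report a state such as free rather than a value containing down.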

SSH config

If you don't use shared storage such as NFS, you need to set up passwordless SSH authentication so that the stdout/stderr files of jobs can be copied back from the compute servers. Set up .ssh/config on the compute servers.
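
As a rough sketch of that setup for one user on a compute server: create a key without a passphrase, install it on the host that receives the output files, and add a matching entry to ~/.ssh/config. The key file name, the choice of ed25519, the assumption that jobs are submitted from reki, and the ssh_config_entry helper are all choices made for this example, not something PBS prescribes.

```shell
# ssh_config_entry HOST KEYFILE
# Print a ~/.ssh/config stanza that makes non-interactive scp to HOST use KEYFILE.
ssh_config_entry() {
    cat <<EOF
Host $1
    IdentityFile $2
    BatchMode yes
EOF
}

# Typical use on a compute server, as the user who submits jobs:
#   ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519_pbs
#   ssh-copy-id -i ~/.ssh/id_ed25519_pbs.pub reki
#   ssh_config_entry reki ~/.ssh/id_ed25519_pbs >> ~/.ssh/config
#   chmod 600 ~/.ssh/config
```

Afterwards, scp from the compute server to reki should succeed without any prompt.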

When something is wrong

Check the PBS logs.
The default log location is /var/spool/pbs/XXX_logs/. For example, when pbsnodes -a shows state = state-unknown,down, look at /var/spool/pbs/server_logs/XXXXXXXX, where you may find an error like this:

TPP;Server@momo86(Thread 0);sd 3, Received noroute to dest yyy.yyy.yyy.yyy:15003, msg="pbs_comm:xxx.xxx.xxx.xxx:17001: Dest not found at pbs_comm"
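
To find such entries without opening each file, the log directory can be searched for the noroute keyword taken from the message above. A minimal sketch: find_noroute is a hypothetical helper name, and the default directory is the server_logs path mentioned earlier. Depending on file permissions it may have to be run as root.

```shell
# find_noroute [LOGDIR]
# Print every "noroute" line found in the PBS server logs, prefixed by file name.
find_noroute() {
    logdir=${1:-/var/spool/pbs/server_logs}
    grep -rH 'noroute' "$logdir"
}

# Example: find_noroute | tail -n 5
```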

Date:2020-10-14