Environment
- Management server
  - reki (CentOS 7.5)
- Compute servers
  - shiki (CentOS 8.2), ruli (CentOS 8.2)
Installation
An installation guide is available in the OpenPBS GitHub repository.
See INSTALL.
In my case, the prefix is /usr/local/openpbs.
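As a rough sketch, the build steps from INSTALL look something like this with my prefix. The function name build_openpbs is just for illustration, and the source directory argument is an assumption; build dependencies are not covered here, so check INSTALL for the package list for your distribution.

```shell
PREFIX=/usr/local/openpbs   # prefix used in this article

# Build and install OpenPBS from a source checkout given as $1.
# build_openpbs is a hypothetical helper name; each step runs only
# if the previous one succeeded.
build_openpbs() {
    (
        cd "$1" &&
        ./autogen.sh &&
        ./configure --prefix="$PREFIX" &&
        make &&
        sudo make install &&
        # Post-install setup and setuid bits, as described in INSTALL
        sudo "$PREFIX/libexec/pbs_postinstall" &&
        sudo chmod 4755 "$PREFIX/sbin/pbs_iff" "$PREFIX/sbin/pbs_rcp"
    )
}
```

For example, build_openpbs ~/src/openpbs, where ~/src/openpbs is wherever you cloned the repository.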
Config files
Management server
- /etc/pbs.conf
    PBS_SERVER=reki
    PBS_START_SERVER=1
    PBS_START_SCHED=1
    PBS_START_COMM=1
    PBS_START_MOM=0
    PBS_EXEC=/usr/local/openpbs
    PBS_HOME=/var/spool/pbs
    PBS_CORE_LIMIT=unlimited
    PBS_SCP=/bin/scp
Compute servers
- /etc/pbs.conf
    PBS_SERVER=reki
    PBS_START_SERVER=0
    PBS_START_SCHED=0
    PBS_START_COMM=0
    PBS_START_MOM=1
    PBS_EXEC=/usr/local/openpbs
    PBS_HOME=/var/spool/pbs
    PBS_CORE_LIMIT=unlimited
    PBS_SCP=/bin/scp
- /var/spool/pbs/mom_priv/config
    $clienthost reki
    $restrict_user_maxsysid 999
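The two /etc/pbs.conf files differ only in the PBS_START_* flags, so they can be generated from one template. The following is a minimal sketch; the function name pbs_conf and the role names "management" and "compute" are my own, not part of OpenPBS.

```shell
# Print an /etc/pbs.conf body for the given role.
# $1: "management" or "compute" (hypothetical role names)
pbs_conf() {
    case "$1" in
        management) server=1 sched=1 comm=1 mom=0 ;;
        compute)    server=0 sched=0 comm=0 mom=1 ;;
        *) echo "usage: pbs_conf management|compute" >&2; return 1 ;;
    esac
    cat <<EOF
PBS_SERVER=reki
PBS_START_SERVER=$server
PBS_START_SCHED=$sched
PBS_START_COMM=$comm
PBS_START_MOM=$mom
PBS_EXEC=/usr/local/openpbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp
EOF
}
```

On a compute server you could then write the file with pbs_conf compute | sudo tee /etc/pbs.conf.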
Firewall
Management server
Open 15001-15004/tcp and 17001/tcp.
    sudo firewall-cmd --zone=public --add-port=17001/tcp --permanent
    sudo firewall-cmd --zone=public --add-port=15001-15004/tcp --permanent
    sudo firewall-cmd --reload
Compute servers
Open 15001-15004/tcp.
    sudo firewall-cmd --zone=public --add-port=15001-15004/tcp --permanent
    sudo firewall-cmd --reload
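To confirm the ports are really open after the reload, you can compare the output of firewall-cmd --list-ports against the ports you expect. This is only a sketch; missing_ports is a helper name I made up, and it does a plain string match on the port specs rather than expanding ranges.

```shell
# Print each required port spec that is absent from the given port list.
# $1: output of `firewall-cmd --zone=public --list-ports` (space-separated)
# remaining args: required port specs such as 17001/tcp
missing_ports() {
    open_ports=" $1 "
    shift
    for port in "$@"; do
        case "$open_ports" in
            *" $port "*) ;;          # spec is listed, nothing to report
            *) echo "$port" ;;       # spec is not listed
        esac
    done
}
```

On the management server, for example: missing_ports "$(sudo firewall-cmd --zone=public --list-ports)" 15001-15004/tcp 17001/tcp. No output means both specs are listed.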
Start PBS service
After installation, run the command below.
    sudo systemctl enable --now pbs.service
Compute servers registration
Run these commands on the management server.
    sudo qmgr -c 'create node shiki'
    sudo qmgr -c 'create node ruli'
If the registration succeeds, pbsnodes -a shows the node information.
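The pbsnodes -a output is fairly long, so a small filter that prints only each node's state can help. This sketch assumes the usual layout where the node name starts in the first column and attributes such as "state = free" are indented below it; node_states is a hypothetical helper name.

```shell
# Read `pbsnodes -a` output on stdin and print "<node> <state>" per node.
# Assumes node names start in column 0 and attribute lines are indented.
node_states() {
    awk '
        /^[^ \t]/         { node = $1 }        # unindented line: node name
        /^[ \t]+state = / { print node, $3 }   # indented "state = ..." line
    '
}
```

Use it as pbsnodes -a | node_states. A healthy node should normally report free.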
SSH config
If you don't use shared storage such as NFS, you need to set up SSH authentication so that the stdout/stderr files can be gathered from the compute servers. Set up .ssh/config on the compute servers.
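As a minimal sketch of that .ssh/config entry, assuming job output is copied back to reki with scp (PBS_SCP=/bin/scp in pbs.conf) and that you use a passwordless key. The key path ~/.ssh/id_pbs and the helper name ssh_config_stanza are assumptions for illustration only.

```shell
# Print a ~/.ssh/config stanza for the host that receives job output.
# $1: host name (reki in this article)
# $2: private key path (hypothetical, e.g. ~/.ssh/id_pbs)
ssh_config_stanza() {
    cat <<EOF
Host $1
    IdentityFile $2
    BatchMode yes
EOF
}

# Typical use on each compute server, as the user who submits jobs:
#   ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_pbs
#   ssh-copy-id -i ~/.ssh/id_pbs.pub reki
#   ssh_config_stanza reki ~/.ssh/id_pbs >> ~/.ssh/config
```

BatchMode yes makes ssh fail instead of prompting for a password, which is what you want for a non-interactive copy.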
When something is wrong
See PBS logs.
By default, log files are located in /var/spool/pbs/XXX_logs/.
For example, when pbsnodes -a reports state = state-unknown,down for a node, check /var/spool/pbs/server_logs/XXXXXXXX. You may find an error like this:
TPP;Server@momo86(Thread 0);sd 3, Received noroute to dest yyy.yyy.yyy.yyy:15003, msg="pbs_comm:xxx.xxx.xxx.xxx:17001: Dest not found at pbs_comm".
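To look for this kind of message across all server log files at once, a simple grep is enough. The sketch below assumes the default log directory mentioned above; find_noroute is a made-up helper name.

```shell
# Print TPP "noroute" lines from every file in a PBS log directory.
# $1: log directory (defaults to /var/spool/pbs/server_logs)
find_noroute() {
    grep -h 'Received noroute to dest' "${1:-/var/spool/pbs/server_logs}"/*
}
```

Run find_noroute with no arguments on the management server; the destination address in each line tells you which host could not be reached.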
Date:2020-10-14