purplejack posted on 2013-5-3 13:02:49

Installation problems with the PBS job submission system

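For the past two days I have been stuck installing torque, the job submission system. I downloaded the source tarballs (*.tar.gz) of several torque versions from the official website and first tried installing version 4.1.2, following the method at http://blog.chinaunix.net/uid-7726704-id-2045398.html, but ran into problems. Later I downloaded the official English documentation and followed its installation steps, and hit new problems. I copied and saved the terminal output from the attempts that failed, shown below: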
$ echo "sleep 30" | qsub
5.localhost.localdomain
$ qstat
Job id                  Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
5.localhost                STDIN            peng                   0 C batch         
$ qsub run_model.pbs
qsub: Unknown queue MSG=cannot locate queue
=====================================================================
# pbsnodes -a
pbsnodes: Server has no node list MSG=node list is empty - check 'server_priv/nodes' file
# pbs_mom
pbs_mom: LOG_ERROR::Resource temporarily unavailable (11) in pbs_mom, cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
#
=====================================================================
# ./torque.setup peng
initializing TORQUE (admin: [email protected])
PBS_Server localhost.localdomain: Create mode and server database exists,
do you wish to continue y/(n)?y
root      5418   10 11:33 ?      00:00:00 pbs_server -t create
Max open servers: 10239
set server operators += [email protected]
Max open servers: 10239
set server managers += [email protected]
======================================================================
$ qsub run_model.pbs
qsub: Unknown queue MSG=cannot locate queue
You have new mail in /var/spool/mail/peng
*******************************************************************
From [email protected] May 3 10:57:20 2013
Return-Path: <[email protected]>
X-Original-To: [email protected]
Delivered-To: [email protected]
Received: by localhost.localdomain (Postfix, from userid 0)
      id CC589282ECD; Fri, 3 May 2013 10:57:20 +0800 (CST)
To: [email protected]
Subject: PBS JOB 1.localhost.localdomain
Precedence: bulk
Message-Id: <[email protected]>
Date: Fri, 3 May 2013 10:57:20 +0800 (CST)
From: [email protected] (root)

PBS Job Id: 1.localhost.localdomain
Job Name:   STDIN
job deleted
Job deleted at request of [email protected]
Job could never run
******************************************************************
=====================================================================
# vim /var/spool/torque/server_priv/nodes
# pbs_server
PBS_Server: LOG_ERROR::Unknown node(15064) in process_host_name_part, host master not found
PBS_Server: LOG_ERROR::process_host_name_part, host master not found
PBS_Server: LOG_ERROR::Unknown node(15064) in process_host_name_part, host node01 not found
PBS_Server: LOG_ERROR::process_host_name_part, host node01 not found
pbs_server: network: Address already in use
PBS_Server: LOG_ERROR::PBS_Server, init_network failed dis
# pbs_sched
pbs_sched: LOG_ERROR::Address already in use (98) in main, bind
# pbs_mom
pbs_mom: LOG_ERROR::Resource temporarily unavailable (11) in pbs_mom, cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
cannot lock '/var/spool/torque/mom_priv/mom.lock' - another mom running
#
=====================================================================
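The ================ separators mark places where I did some unrelated or unproblematic operations in between.
Does anyone know what is causing these errors and how to fix them? Any pointers would be much appreciated. Many thanks!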
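For anyone hitting the same errors, here is a minimal diagnostic sketch, not an official fix. Reading the log, "Address already in use" and "another mom running" most likely mean the daemons were already running (note the `pbs_server -t create` process left by `./torque.setup`), "host master not found" suggests the names in `server_priv/nodes` do not resolve on this machine, and "Unknown queue" suggests `run_model.pbs` requests a queue that was never created, while the default `batch` queue does exist. The last point is an assumption, since the job script is not shown. The function names `report_running_daemons`, `check_nodes_file` and `create_queue` are hypothetical helpers, not TORQUE tools; the sketch assumes bash plus `pgrep`, `getent` and `qmgr` on the PATH, and the default TORQUE home `/var/spool/torque` seen in the log.

```bash
#!/usr/bin/env bash
# Hypothetical TORQUE troubleshooting helpers (a sketch, not TORQUE tools).
# Assumes bash, plus pgrep, getent and qmgr on PATH when the functions are used.

# "Address already in use" / "another mom running" usually mean the daemon
# is already up: report which ones are running before starting any.
report_running_daemons() {
  local d
  for d in pbs_server pbs_sched pbs_mom; do
    if pgrep -x "$d" >/dev/null; then
      echo "running: $d"
    else
      echo "stopped: $d"
    fi
  done
}

# "node list is empty" / "host ... not found": the nodes file must be
# non-empty and every host name in it must resolve on this machine.
# Prints one status line per host; returns 1 if anything is wrong.
check_nodes_file() {
  local file="$1" host rc=0
  if [[ ! -s "$file" ]]; then
    echo "EMPTY: $file"
    return 1
  fi
  # "|| [[ -n $host ]]" keeps a last line that lacks a trailing newline.
  while read -r host _ || [[ -n "$host" ]]; do
    # Skip blank lines and comments; the host name is the first field.
    [[ -z "$host" || "$host" == \#* ]] && continue
    # getent uses the system resolver (NSS: /etc/hosts, DNS, ...).
    if getent hosts "$host" >/dev/null; then
      echo "ok: $host"
    else
      echo "UNRESOLVED: $host"
      rc=1
    fi
  done < "$file"
  return "$rc"
}

# "Unknown queue": create, enable and start an execution queue via qmgr.
# Set QMGR=echo to print the qmgr arguments instead of running them.
create_queue() {
  local q="$1" qmgr="${QMGR:-qmgr}"
  "$qmgr" -c "create queue $q queue_type=execution"
  "$qmgr" -c "set queue $q enabled=true"
  "$qmgr" -c "set queue $q started=true"
}
```

To use it, save it under some name such as `torque_check.sh` (made up here), `source` it in a root shell, then run `report_running_daemons` and `check_nodes_file /var/spool/torque/server_priv/nodes`; `create_queue <name>` takes whatever queue name `run_model.pbs` asks for. A daemon reported as running should be stopped (for `pbs_server`, with `qterm`) rather than started a second time, and an unresolved host presumably needs an `/etc/hosts` or DNS entry before `pbs_server` will accept the nodes file.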

luoyanchun posted on 2021-4-21 17:15:22

pbs is really old by now. These days I use slurm.