This is a note.

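While porting CESM, I figured my server was powerful enough to run two cases at the same time.
When I launched the second case with "./case.submit", it failed with the following error.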

The case I created and the compiler I used:

./create_newcase --case 1850CLM50Bgc_gnu_cesm --res f19_g16 --compset I1850Clm50Bgc --run-unsupported --compiler gnu --mach mygnu

Output of "./case.submit":

2021-08-08 12:21:36 MODEL EXECUTION BEGINS HERE
run command is mpirun  -np 4  /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/bld/cesm.exe  >> cesm.log.$LID 2>&1  
ERROR: RUN FAIL: Command 'mpirun  -np 4  /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/bld/cesm.exe  >> cesm.log.$LID 2>&1 ' failed
See log file for details: /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/run/cesm.log.210808-122133

Run "cat /home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/run/cesm.log.210808-122133" to view the CESM log:

 Invalid PIO rearranger comm max pend req (comp2io),            0
 Resetting PIO rearranger comm max pend req (comp2io) to           64
 PIO rearranger options:
   comm type     =p2p                                                                             
   comm fcd      =2denable                                                                        
   max pend req (comp2io)  =           0
   enable_hs (comp2io)     = T
   enable_isend (comp2io)  = F
   max pend req (io2comp)  =          64
   enable_hs (io2comp)    = F
   enable_isend (io2comp)  = T
(seq_comm_setcomm)  init ID (  1 GLOBAL          ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_setcomm)  init ID (  2 CPL             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_setcomm)  init ID (  5 ATM             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID (  6 CPLATM          ) join IDs =     2     5       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID (  3 ALLATMID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID (  4 CPLALLATMID     ) join IDs =     2     3       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID (  9 LND             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 10 CPLLND          ) join IDs =     2     9       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID (  7 ALLLNDID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID (  8 CPLALLLNDID     ) join IDs =     2     7       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID ( 13 ICE             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 14 CPLICE          ) join IDs =     2    13       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID ( 11 ALLICEID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID ( 12 CPLALLICEID     ) join IDs =     2    11       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID ( 17 OCN             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 18 CPLOCN          ) join IDs =     2    17       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID ( 15 ALLOCNID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID ( 16 CPLALLOCNID     ) join IDs =     2    15       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID ( 21 ROF             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 22 CPLROF          ) join IDs =     2    21       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID ( 19 ALLROFID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID ( 20 CPLALLROFID     ) join IDs =     2    19       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID ( 25 GLC             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 26 CPLGLC          ) join IDs =     2    25       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID ( 23 ALLGLCID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID ( 24 CPLALLGLCID     ) join IDs =     2    23       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID ( 29 WAV             ) pelist   =     0     3     1 ( npes =     4) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 30 CPLWAV          ) join IDs =     2    29       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID ( 27 ALLWAVID        ) join multiple comp IDs       ( npes =     4) ( nthreads =  1)
(seq_comm_joincomm) init ID ( 28 CPLALLWAVID     ) join IDs =     2    27       ( npes =     4) ( nthreads =  1)
(seq_comm_setcomm)  init ID ( 33 ESP             ) pelist   =     0     0     1 ( npes =     1) ( nthreads =  1)( suffix =)
(seq_comm_joincomm) init ID ( 34 CPLESP          ) join IDs =     2    33       ( npes =     4) ( nthreads =  1)
(seq_comm_jcommarr) init ID ( 31 ALLESPID        ) join multiple comp IDs       ( npes =     1) ( nthreads =  1)
(seq_comm_joincomm) init ID ( 32 CPLALLESPID     ) join IDs =     2    31       ( npes =     4) ( nthreads =  1)
(seq_comm_printcomms)     1     0     4     1  GLOBAL:
(seq_comm_printcomms)     2     0     4     1  CPL:
(seq_comm_printcomms)     3     0     4     1  ALLATMID:
(seq_comm_printcomms)     4     0     4     1  CPLALLATMID:
(seq_comm_printcomms)     5     0     4     1  ATM:
(seq_comm_printcomms)     6     0     4     1  CPLATM:
(seq_comm_printcomms)     7     0     4     1  ALLLNDID:
(seq_comm_printcomms)     8     0     4     1  CPLALLLNDID:
(seq_comm_printcomms)     9     0     4     1  LND:
(seq_comm_printcomms)    10     0     4     1  CPLLND:
(seq_comm_printcomms)    11     0     4     1  ALLICEID:
(seq_comm_printcomms)    12     0     4     1  CPLALLICEID:
(seq_comm_printcomms)    13     0     4     1  ICE:
(seq_comm_printcomms)    14     0     4     1  CPLICE:
(seq_comm_printcomms)    15     0     4     1  ALLOCNID:
(seq_comm_printcomms)    16     0     4     1  CPLALLOCNID:
(seq_comm_printcomms)    17     0     4     1  OCN:
(seq_comm_printcomms)    18     0     4     1  CPLOCN:
(seq_comm_printcomms)    19     0     4     1  ALLROFID:
(seq_comm_printcomms)    20     0     4     1  CPLALLROFID:
(seq_comm_printcomms)    21     0     4     1  ROF:
(seq_comm_printcomms)    22     0     4     1  CPLROF:
(seq_comm_printcomms)    23     0     4     1  ALLGLCID:
(seq_comm_printcomms)    24     0     4     1  CPLALLGLCID:
(seq_comm_printcomms)    25     0     4     1  GLC:
(seq_comm_printcomms)    26     0     4     1  CPLGLC:
(seq_comm_printcomms)    27     0     4     1  ALLWAVID:
(seq_comm_printcomms)    28     0     4     1  CPLALLWAVID:
(seq_comm_printcomms)    29     0     4     1  WAV:
(seq_comm_printcomms)    30     0     4     1  CPLWAV:
(seq_comm_printcomms)    31     0     1     1  ALLESPID:
(seq_comm_printcomms)    32     0     4     1  CPLALLESPID:
(seq_comm_printcomms)    33     0     1     1  ESP:
(seq_comm_printcomms)    34     0     4     1  CPLESP:
 (t_initf) Read in prof_inparm namelist from: drv_in
 (t_initf) Using profile_disable=          F
 (t_initf)       profile_timer=                      4
 (t_initf)       profile_depth_limit=                4
 (t_initf)       profile_detail_limit=               2
 (t_initf)       profile_barrier=          F
 (t_initf)       profile_outpe_num=                  1
 (t_initf)       profile_outpe_stride=               0
 (t_initf)       profile_single_file=      F
 (t_initf)       profile_global_stats=     T
 (t_initf)       profile_ovhd_measurement= F
 (t_initf)       profile_add_detail=       F
 (t_initf)       profile_papi_enable=      F

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x151e7878232a
#1  0x151e78781503
#2  0x151e77dff03f
#3  0x55d58d7cc3ed
#4  0x55d58d7c29da
#5  0x55d58d7bf712
#6  0x55d58d67cb71
#7  0x55d58d6e40d0
#8  0x55d58cf41888
#9  0x55d58cf3bab7
#10  0x55d58cec4b2e
#11  0x55d58ceb5d8f
#12  0x55d58cec20e0
#13  0x151e77de1bf6
#14  0x55d58cea84d9
#15  0xffffffffffffffff
#0  0x155370fb532a
#1  0x155370fb4503
#2  0x15537063203f
#3  0x5590caf743c1
#4  0x5590caf6a9da
#5  0x5590caf67712
#6  0x5590cae2619f
#7  0x5590cae8c0d0
#8  0x5590ca6e9888
#9  0x5590ca6e3ab7
#10  0x5590ca66cb2e
#11  0x5590ca65dd8f
#12  0x5590ca66a0e0
#13  0x155370614bf6
#14  0x5590ca6504d9
#15  0xffffffffffffffff
#0  0x14d229d6232a
#1  0x14d229d61503
#2  0x14d2293df03f
#3  0x55bbac8b93c1
#4  0x55bbac8af9da
#5  0x55bbac8ac712
#6  0x55bbac769b71
#7  0x55bbac7d10d0
#8  0x55bbac02e888
#9  0x55bbac028ab7
#10  0x55bbabfb1b2e
#11  0x55bbabfa2d8f
#12  0x55bbabfaf0e0
#13  0x14d2293c1bf6
#14  0x55bbabf954d9
#15  0xffffffffffffffff
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node ubuntu exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
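The log above is long, and the lines that matter are the SIGSEGV messages and the final mpirun notice. Below is a minimal sketch of pulling just those lines out of the newest cesm.log.* in a run directory. It assumes the run directory layout shown in this post. The marker strings and the helper names latest_cesm_log and error_lines are hypothetical choices for illustration, not part of CESM/CIME.

```python
import glob
import os

# Assumption: the run directory from this post; change it for your own case.
RUN_DIR = "/home/ubuntu/cesm/scratch/1850CLM50Bgc_gnu_cesm/run"

# Hypothetical set of markers worth grepping for in a failed run log.
MARKERS = ("ERROR", "SIGSEGV", "Segmentation fault", "exited on signal")


def latest_cesm_log(run_dir):
    """Return the most recently modified uncompressed cesm.log.* file, or None."""
    logs = [p for p in glob.glob(os.path.join(run_dir, "cesm.log.*"))
            if not p.endswith(".gz")]  # skip gzipped logs; they are not plain text
    if not logs:
        return None
    return max(logs, key=os.path.getmtime)


def error_lines(log_path, markers=MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    with open(log_path, errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if any(m in line for m in markers):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    path = latest_cesm_log(RUN_DIR)
    if path is None:
        print("no cesm.log.* found in", RUN_DIR)
    else:
        print(path)
        for number, text in error_lines(path):
            print(f"{number:6d}: {text}")
```

On the log above, this would surface the three "Program received signal SIGSEGV" lines and the "exited on signal 11" line, which is enough to tell that several MPI ranks crashed.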

Solution

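Nothing was actually wrong with the setup. On this workstation, only one case could be submitted and run at a time. After I stopped the first run, the second case ran fine. I wrote this post for two reasons: as a note for myself, and so that others don't waste time hunting for a fix the way I did, only to come away without one.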
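The fix here was to avoid starting a second run while the first is still active, so a quick check before "./case.submit" can save some head-scratching. Below is a minimal sketch under these assumptions: the workstation runs Linux (it reads /proc), and the model executable is named cesm.exe, as in the run command above. The function running_processes is a hypothetical helper, not part of CIME.

```python
import os


def running_processes(pattern):
    """Return (pid, cmdline) for processes whose command line contains pattern.

    Linux-only: reads /proc/<pid>/cmdline. The calling process is skipped.
    """
    me = os.getpid()
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit() or int(entry) == me:
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited meanwhile or is not readable; ignore it.
            continue
        # Arguments in cmdline are NUL-separated; join them with spaces.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            found.append((int(entry), cmdline))
    return found


if __name__ == "__main__":
    busy = running_processes("cesm.exe")
    if busy:
        print("another CESM run appears to be active; wait for it or stop it first:")
        for pid, cmd in busy:
            print(f"  {pid}: {cmd}")
    else:
        print("no running cesm.exe found")
```

Matching on the command line is a rough heuristic. If a batch scheduler is configured, its queue is the more reliable place to look.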
