System Updates Archive

  • Omega Unscheduled Outage

    Thursday, July 16, 2015 - 5:15am to 7:30am

    At approximately 5:15am today, a circuit breaker tripped, cutting power to about half of the nodes on the Omega cluster. Power was restored to the affected nodes at approximately 7:30am.

    The root cause remains under investigation by Data Center Engineering.

    The following nodes and jobs were impacted; a note on checking and resubmitting affected jobs follows the listing:

    compute-45-2 Down  cpu 0:16   load   jobname=flvc36mrun2.pbs   user=mjr92 q=gputest
    compute-45-3 Down  cpu 0:16   load   jobname=flvc36mrun2.pbs   user=mjr92 q=gputest
    compute-45-4 Down  cpu 0:16   load   jobname=flvc36mrun2.pbs   user=mjr92 q=gputest
    compute-46-1 Down  cpu 0:12   load   jobname=flvc36mrun2.pbs   user=mjr92 q=gputest
    compute-46-2 Down  cpu 0:12   load   jobname=flvc36mrun2.pbs   user=mjr92 q=gputest
    compute-46-4 Down  cpu 0:12   load  jobnum=4803721 jobname=...Se-3_b3lypecp   user=wd89 q=esi
    compute-46-5 Down  cpu 0:12   load  jobnum=4803705 jobname=2_CdSe-3_pw91ecp   user=wd89 q=esi
    compute-46-8 Down  cpu 0:12   load  jobnum=4803703 jobname=2_CdSe-3_m06lecp   user=wd89 q=esi
    compute-47-1 Down  cpu 0:12   load   jobname=2_CdSe-3_m06lecp   user=wd89 q=esi
    compute-47-3 Down  cpu 0:12   load  jobnum=4803700 jobname=3_CdSe-3_pw91ecp   user=wd89 q=esi
    compute-47-4 Down  cpu 0:12   load   jobname=3_CdSe-3_pw91ecp   user=wd89 q=esi
    compute-47-7 Down  cpu 0:12   load   jobname=3_CdSe-3_pw91ecp   user=wd89 q=esi
    compute-48-2 Down  cpu 0:12   load  jobnum=4803491 jobname=20_41_TS_qm   user=ma583 q=esi
    compute-48-4 Down  cpu 0:12   load   jobname=20_41_TS_qm   user=ma583 q=esi
    compute-48-5 Down  cpu 0:12   load   jobname=20_41_TS_qm   user=ma583 q=esi
    compute-48-7 Down  cpu 0:12   load   jobname=20_41_TS_qm   user=ma583 q=esi
    compute-49-3 Down  cpu 0:12   load  jobnum=4803027 jobname=..._Thiophene.sh   user=br287 q=esi
    compute-49-6 Down  cpu 0:12   load   jobname=CdSe270-2   user=wd89 q=esi
    compute-49-8 Down  cpu 0:12   load  jobnum=4803091 jobname=..._sec_D61A____   user=ma583 q=esi
    compute-50-2 Down  cpu 0:12   load   jobname=..._sec_D61A____   user=ma583 q=esi
    compute-50-3 Down  cpu 0:12   load   jobname=..._sec_D61A____   user=ma583 q=esi
    compute-50-4 Down  cpu 0:12   load   jobname=..._sec_D61A____   user=ma583 q=esi
    compute-50-6 Down  cpu 0:12   load  jobnum=4803581 jobname=...e_Opt_BS1.pbs   user=ky254 q=esi
    compute-50-7 Down  cpu 0:12   load  jobnum=4803581 jobname=...e_Opt_BS1.pbs   user=ky254 q=esi
    compute-50-8 Down  cpu 0:12   load  jobnum=4803189 jobname=..._Part_Opt.pbs   user=ky254 q=esi
    compute-51-1 Down  cpu 0:12   load  jobnum=4797311 jobname=CdSe270   user=wd89 q=esi
    compute-51-3 Down  cpu 0:12   load  jobnum=4803326 jobname=...qc_restart.sh   user=br287 q=esi
    compute-51-8 Down  cpu 0:12   load  jobnum=4797430 jobname=CdSe270-2   user=wd89 q=esi
    compute-52-1 Down  cpu 0:12   load  jobnum=4797430 jobname=CdSe270-2   user=wd89 q=esi
    compute-52-3 Down  cpu 0:12   load   jobname=CdSe270-2   user=wd89 q=esi
    compute-53-1 Down  cpu 0:12   load  jobnum=4797430 jobname=CdSe270-2   user=wd89 q=esi
    compute-53-2 Down  cpu 0:12   load  jobnum=4800703 jobname=...y2_opt_xqc.sh   user=br287 q=esi
    compute-53-3 Down  cpu 0:12   load   jobname=...y2_opt_xqc.sh   user=br287 q=esi
    compute-53-4 Down  cpu 0:12   load  jobnum=4803305 jobname=...A__imp_41____   user=ma583 q=esi
    compute-53-5 Down  cpu 0:12   load  jobnum=4800641 jobname=corr   user=jh943 q=esi
    compute-53-6 Down  cpu 0:12   load  jobnum=4797430 jobname=CdSe270-2   user=wd89 q=esi
    compute-53-7 Down  cpu 0:12   load  jobnum=4800639 jobname=corr   user=jh943 q=esi
    compute-53-8 Down  cpu 0:12   load   jobname=corr   user=jh943 q=esi
    compute-37-2 Down  cpu 0:8   load  jobnum=4802733 jobname=...12-part52.txt   user=fk65 q=fas_normal
    compute-37-3 Down  cpu 0:8   load  jobnum=4802539 jobname=...0.160-1850-dp   user=pd283 q=fas_normal
    compute-37-4 Down  cpu 0:8   load  jobnum=4802504 jobname=opp.2x10x1.1   user=md599 q=fas_normal
    compute-37-9 Down  cpu 0:8   load  jobnum=4801872 jobname=...hfb_dlnz_long   user=cng8 q=fas_very_long
    compute-37-10 Down  cpu 0:8   load  jobnum=4788165 jobname=...cavity_009.sh   user=awc24 q=fas_very_long
    compute-37-14 Down  cpu 0:8   load  jobnum=4802534 jobname=...0.270-1920-dp   user=pd283 q=fas_normal
    compute-37-15 Down  cpu 0:8   load  jobnum=4802345 jobname=SiLTO.12si.0.5O   user=ak688 q=fas_normal
    compute-38-6 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-7 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-8 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-9 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-10 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-11 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-13 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-14 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-38-15 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-3 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-4 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-8 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-10 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-12 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-13 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-14 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-44-16 Down  cpu 0:8   load   jobname=opp.2x12x1.2   user=md599 q=fas_normal
    compute-15-7 Down  cpu 0:8   load  jobnum=4802287 jobname=...tion_31111_17   user=jmm357 q=fas_normal
    compute-25-2 Down  cpu 0:8   load   jobname=...0.160-1850-dp   user=pd283 q=fas_normal
    compute-25-7 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-25-8 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-25-11 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-25-12 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-25-13 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-26-7 Down  cpu 0:8   load   jobname=7-14-15-aspect19   user=krv8 q=fas_normal
    compute-26-8 Down  cpu 0:8   load  jobnum=4802318 jobname=7-14-15-aspect19   user=krv8 q=fas_normal
    compute-26-9 Down  cpu 0:8   load  jobnum=4802345 jobname=SiLTO.12si.0.5O   user=ak688 q=fas_normal
    compute-26-15 Down  cpu 0:8   load  jobnum=4802235 jobname=H.2x2.neg.PTO   user=ak688 q=fas_normal
    compute-27-1 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-2 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-3 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-4 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-5 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-6 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-8 Down  cpu 0:8   load   jobname=openX_1.0_1   user=mw564 q=fas_long
    compute-27-9 Down  cpu 0:8   load   jobname=openX_1.0_1   user=mw564 q=fas_long
    compute-27-11 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-12 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-14 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-15 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-27-16 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-28-3 Down  cpu 0:8   load   jobname=7-14-15-aspect18   user=krv8 q=fas_normal
    compute-28-5 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-28-6 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-28-8 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-28-10 Down  cpu 0:8   load  jobnum=4802235 jobname=H.2x2.neg.PTO   user=ak688 q=fas_normal
    compute-28-15 Down  cpu 0:8   load   jobname=openX_0.35_2   user=mw564 q=fas_long
    compute-29-1 Down  cpu 0:8   load  jobnum=4800438 jobname=openX_0.35_2   user=mw564 q=fas_long
    compute-29-7 Down  cpu 0:8   load   jobname=7-14-15-aspect9   user=krv8 q=fas_normal
    compute-29-9 Down  cpu 0:8   load   jobname=7-14-15-aspect9   user=krv8 q=fas_normal
    compute-29-10 Down  cpu 0:8   load   jobname=7-14-15-aspect9   user=krv8 q=fas_normal
    compute-29-12 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-29-15 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-29-16 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-30-6 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-30-7 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-30-10 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-30-11 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-30-15 Down  cpu 0:8   load   jobname=7-14-15-aspect9   user=krv8 q=fas_normal
    compute-30-16 Down  cpu 0:8   load   jobname=7-14-15-aspect9   user=krv8 q=fas_normal
    compute-31-1 Down  cpu 0:8   load  jobnum=4802235 jobname=H.2x2.neg.PTO   user=ak688 q=fas_normal
    compute-31-2 Down  cpu 0:8   load   jobname=H.2x2.neg.PTO   user=ak688 q=fas_normal
    compute-31-4 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-31-5 Down  cpu 0:8   load  jobnum=4802379 jobname=SiSBTOLT.5mlO.2   user=ak688 q=fas_normal
    compute-31-7 Down  cpu 0:8   load  jobnum=4802379 jobname=SiSBTOLT.5mlO.2   user=ak688 q=fas_normal
    compute-31-9 Down  cpu 0:8   load   jobname=SiSBTOLT.5mlO.2   user=ak688 q=fas_normal
    compute-31-10 Down  cpu 0:8   load   jobname=SiSBTOLT.5mlO.2   user=ak688 q=fas_normal
    compute-31-11 Down  cpu 0:8   load  jobnum=4803002 jobname=...symP_CBS-APNO   user=vaccaro q=fas_normal
    compute-31-12 Down  cpu 0:8   load  jobnum=4803001 jobname=...symP_CBS-APNO   user=vaccaro q=fas_normal
    compute-31-14 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-31-15 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-32-4 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-32-5 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-32-6 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-32-10 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-32-11 Down  cpu 0:8   load  jobnum=4803083 jobname=plrs   user=olz3 q=fas_very_long
    compute-32-12 Down  cpu 0:8   load   jobname=plrs   user=olz3 q=fas_very_long
    compute-32-13 Down  cpu 0:8   load   jobname=plrs   user=olz3 q=fas_very_long
    compute-32-15 Down  cpu 0:8   load  jobnum=4802235 jobname=H.2x2.neg.PTO   user=ak688 q=fas_normal
    compute-32-16 Down  cpu 0:8   load  jobnum=4802235 jobname=H.2x2.neg.PTO   user=ak688 q=fas_normal
    compute-33-3 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-4 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-6 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-7 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-9 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-10 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-12 Down  cpu 0:8   load   jobname=SimpleQueue   user=sd566 q=fas_normal
    compute-33-13 Down  cpu 0:8   load  jobnum=4802777 jobname=...12-part92.txt   user=fk65 q=fas_normal
    compute-33-14 Down  cpu 0:8   load  jobnum=4802318 jobname=7-14-15-aspect19   user=krv8 q=fas_normal
    compute-33-15 Down  cpu 0:8   load  jobnum=4802318 jobname=7-14-15-aspect19   user=krv8 q=fas_normal
    compute-33-16 Down  cpu 0:8   load  jobnum=4802776 jobname=...12-part91.txt   user=fk65 q=fas_normal
    compute-34-1 Down  cpu 0:8   load  jobnum=4800448 jobname=openX_1.0_4   user=mw564 q=fas_long
    compute-34-2 Down  cpu 0:8   load  jobnum=4800448 jobname=openX_1.0_4   user=mw564 q=fas_long
    compute-34-3 Down  cpu 0:8   load  jobnum=4800448 jobname=openX_1.0_4   user=mw564 q=fas_long
    compute-34-5 Down  cpu 0:8   load   jobname=7-14-15-aspect18   user=krv8 q=fas_normal
    compute-34-8 Down  cpu 0:8   load  jobnum=4802534 jobname=...0.270-1920-dp   user=pd283 q=fas_normal
    compute-34-9 Down  cpu 0:8   load  jobnum=4802534 jobname=...0.270-1920-dp   user=pd283 q=fas_normal
    compute-34-12 Down  cpu 0:8   load  jobnum=4780106 jobname=...sd-apVDZ_freq   user=vaccaro q=fas_very_long
    compute-39-13 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-40-4 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-40-5 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-40-6 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-40-8 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-40-10 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-40-13 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-6 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-7 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-8 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-10 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-11 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-15 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-41-16 Down  cpu 0:8   load  jobnum=4802043 jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-42-1 Down  cpu 0:8   load   jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-42-2 Down  cpu 0:8   load   jobname=...0_CSF_256_512   user=jbb83 q=astro_prod
    compute-42-4 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-5 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-8 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-9 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-10 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-12 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-13 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-42-15 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-1 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-2 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-4 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-5 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-6 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-8 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-11 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-13 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-14 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
    compute-43-15 Down  cpu 0:8   load   jobname=L500_CSF_SFoff   user=etl28 q=astro_prod
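
    If your jobs appear in the listing above, they are likely lost and will need to be resubmitted. The sketch below assumes the Torque/PBS tools that produced this listing; the job ID, script name, and queue are examples taken from the listing, not a prescription.

      # Ask the scheduler about a specific job; an "Unknown Job Id" error
      # means the job is no longer known and must be resubmitted.
      qstat -f 4803721

      # List your own jobs that are still queued or running.
      qstat -u $USER

      # Resubmit the original job script to its queue (the script and
      # queue names here are examples from the listing above).
      qsub -q gputest flvc36mrun2.pbs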
    
  • GPFS Issues

    Saturday, August 1, 2015 - 12:00am

    We have recently experienced several outages of our GPFS filesystem, which primarily serves the Grace cluster but is also mounted on other clusters. We understand these outages have been particularly disruptive for some users, and we apologize for any inconvenience they have caused.

    To remedy the problem, the HPC team addressed the underlying networking issues during a recent Grace outage. The GPFS filesystem currently appears stable and is exported to Louise and BulldogN. The plan is to mount GPFS on Omega as well, but additional testing is needed and a timeline has yet to be defined. In the meantime, if you notice any continued issues, please report them to hpc@yale.edu with the time and a description of the problem.
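
    If you would like to verify GPFS availability from a login node before reporting a problem, a minimal check such as the following can help. This is a sketch using standard Linux commands, assuming the /gpfs mount point referenced elsewhere on this page.

      # Confirm the filesystem is mounted and reporting usage.
      df -h /gpfs

      # Verify the mount is present and of type gpfs.
      mount | grep -i gpfs

      # A listing that hangs or is unusually slow here is worth
      # reporting, along with the time it occurred.
      ls /gpfs > /dev/null && echo "GPFS responding"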

    As always, if you have any questions, concerns, or comments, please contact us at hpc@yale.edu.

  • Lustre on Omega was suspended briefly

    Tuesday, July 7, 2015 - 12:30pm

    The Lustre filesystem on Omega was briefly suspended. While jobs were not killed, any job performing I/O on Lustre was likely stalled for roughly one hour, from 12:30 to 1:30 pm. We recommend checking the output of any job that was running during this window.
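
    A minimal sketch of such a check, assuming standard Lustre client tools and GNU find (the scratch path below is a placeholder, not a documented location):

      # Confirm Lustre is mounted and its storage targets report usage.
      lfs df -h

      # Spot-check files written during the suspension window
      # (12:30 to 1:30 pm); output that stops abruptly in this window
      # may indicate a job that stalled mid-write.
      find /lustre/scratch/$USER -type f \
          -newermt "2015-07-07 12:30" ! -newermt "2015-07-07 13:30" -ls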

  • Brief Grace Cluster Outage

    Monday, June 29, 2015 - 6:30pm

    The Grace cluster experienced brief outages today, 6/29 at 1:20 pm and again at 6:30 pm. Any scheduled job running during these times was likely impacted and will need to be restarted.  

    The HPC team is working with the vendor to understand the root cause and updates will be posted here.

    We apologize for any inconvenience this may have caused. As always, if you have any questions, concerns, or comments, please contact us at hpc@yale.edu.

  • Brief Outage on Grace

    Tuesday, June 23, 2015 - 3:15pm

    The Grace cluster experienced a brief outage today at 3:15 pm.  Any scheduled job running on Grace at this time was likely impacted and will need to be restarted.   The HPC team will continue to monitor the situation and updates will be posted here as more information becomes available.

    We apologize for any inconvenience this may have caused. As always, if you have any questions or concerns, please don’t hesitate to contact us at hpc@yale.edu.

  • hpc@yale.edu email address bouncing

    We have discovered that our support email address, hpc@yale.edu, is bouncing messages back to the sender. We are currently working with the email team to resolve the issue, and it should be working shortly. In the meantime, if you have an urgent issue, please contact the research support team directly (Andy, Steve, Rob, or Jason); their email addresses can be found to the right of this message. Apologies for any inconvenience this may have caused.

  • GPFS / Grace Cluster back up

    Friday, June 12, 2015 - 10:00pm

    The GPFS filesystem is up and running normally. Any jobs previously running on Grace, as well as any jobs on other clusters using data from /gpfs, were likely impacted and will need to be restarted.

    We apologize for any inconvenience this may have caused. As always, if you have any questions, concerns, or comments, please don’t hesitate to contact us at hpc@yale.edu.

  • Brief Network Outage - Sunday morning - Low Impact

    Sunday, June 14, 2015 - 5:30am

    Yale ITS will be performing network maintenance affecting all network connections to and from the ITS Data Centers on Sunday, June 14, between 5:30 a.m. and 6:30 a.m. The HPC clusters will be inaccessible during this time; however, all batch jobs will continue to run. Interactive jobs may experience a disruption.

    If you have any questions or concerns, please email the HPC team at hpc@yale.edu.

  • BulldogN Maintenance Postponed

    Friday, June 5, 2015 - 12:00pm

    We have decided to postpone the BulldogN maintenance window that was scheduled for Monday, June 8th. Another downtime will be required in a few weeks for electrical power work in our West Campus Data Center, and in an effort to avoid multiple interruptions we have chosen to consolidate the maintenance windows. The precise date has yet to be determined.

    In the meantime, we may contact individual users to arrange migration of their home directories while the cluster is operating.

    To see the latest updates, please visit the Status Page on the Yale Center for Research Computing website.
    If you have any questions, concerns or comments, please don’t hesitate to contact us at hpc@yale.edu.

  • Grace Cluster Back Online

    Saturday, May 23, 2015 - 8:00am

    All storage and nodes added as part of the Grace expansion are available for use: an additional 90 general-use nodes and an additional 500 TB of storage. Dedicated hardware is also available, and details have been communicated directly to the group leaders.

    We apologize for any inconvenience this may have caused. As always, if you have any questions, concerns, or comments, please don’t hesitate to contact us at hpc@yale.edu.
