In this real-life scenario, the NFS server was taken offline for a relocation without offlining the quorum disk first. This caused Cluster Ready Services (CRS) to hang when it tried to update the quorum disk, and the RAC nodes also hung whenever a command was issued against the stale NFS mount point. Due to ongoing network issues at the NFS server's new location, we currently have no access to the lost quorum disk. In this guide, we will see how to start CRS without the quorum disk.
Note: $GRID_HOME is the software location of the Grid Infrastructure (GI), and grid is the GI owner. The environment is an Oracle 12c Extended RAC.
Step 1 – On both RAC nodes, comment out the NFS mount in /etc/fstab so that the stale mount is not re-attempted when we reboot.
# nfssrv1:/votedisk /voting_disk nfs rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600 0 0
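If you prefer to script the change instead of editing the file by hand, something like the following works (the pattern matches the example entry above; the backup copy is just a precaution):
# cp -p /etc/fstab /etc/fstab.bak
# sed -i 's|^nfssrv1:/votedisk|#nfssrv1:/votedisk|' /etc/fstab
# grep votedisk /etc/fstab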
Step 2 – Reboot both RAC nodes to clear the stale NFS mounts.
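If an immediate reboot is not possible, a lazy unmount may detach the stale mount without one, although processes already hung on a hard NFS mount can still leave a reboot as the only clean option:
# umount -l /voting_disk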
Step 3 – As user grid, we determined that CRS failed to start.
$ $GRID_HOME/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
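To see how far the stack actually came up, the stack summary and the lower-level init resources can be checked as well (output not shown; in this scenario the lower stack and ASM were up while CRSD was not):
$ $GRID_HOME/bin/crsctl check crs
$ $GRID_HOME/bin/crsctl stat res -t -init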
This is because CRS has no access to the ASM diskgroup that contains the voting disks (i.e. CRS in this example), which is dismounted.
$ sqlplus / as sysasm
SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
FRA                            MOUNTED
DATA                           MOUNTED
CRS                            DISMOUNTED
Step 4 – Mount the CRS diskgroup with the force option.
If we try to mount it without the force option, ASM complains about the missing disk.
SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
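Before forcing the mount, we can confirm which disk ASM cannot find with a standard v$asm_disk query (a diagnostic sketch; group number 1 comes from the ORA-15042 message above):
SQL> select group_number, disk_number, path, mount_status, mode_status, state from v$asm_disk order by group_number, disk_number;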
This is a normal-redundancy diskgroup, so we should be able to force a mount with one missing disk, assuming at least one copy of every extent of every file is present on the remaining disks in the CRS diskgroup. After the following command forces the mount, the diskgroup will run at reduced redundancy until we add the quorum disk back.
SQL> alter diskgroup crs mount force;
Diskgroup altered.
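As part of the force mount, ASM takes the missing disk offline and will drop it permanently once its repair timer expires; the countdown can be watched in v$asm_disk (REPAIR_TIMER is in seconds; group number 1 is the CRS diskgroup from the error above):
SQL> select disk_number, mode_status, repair_timer from v$asm_disk where group_number = 1;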
Step 5 – As root, stop the CRS with the -f (force) flag on RAC node 1, then stop it on RAC node 2.
# $GRID_HOME/bin/crsctl stop crs -f
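Before restarting anything, it is worth confirming on each node that no clusterware daemons survived the forced stop:
# ps -ef | grep -E 'ohasd|crsd|ocssd' | grep -v grep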
Step 6 – As root, start the CRS on RAC node 1, then start it on RAC node 2.
# $GRID_HOME/bin/crsctl start crs
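The start command returns before the whole stack is up, so allow a few minutes and then poll the status; the alert log path shown is the pre-ADR 12.1 default and may differ on your system:
# $GRID_HOME/bin/crsctl check crs
# tail -f $GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log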
Step 7 – Verify all the services are up.
# $GRID_HOME/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.FRA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.asm
               ONLINE  ONLINE       rac1                     Started,STABLE
               ONLINE  ONLINE       rac2                     Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac1                     169.254.230.34 10.3.
                                                             39.129,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cvu
      1        ONLINE  ONLINE       rac1                     STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac1                     STABLE
ora.prod.acev.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.ads.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.cview.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.prod.dbfs.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.lfe.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.wfm.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------
# $GRID_HOME/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name                   Disk group
--  -----    -----------------                ---------                   ----------
 1. ONLINE   285184cb09664fe7bf98649dc3e20676 (/dev/oracleasm/disks/CRS1) [CRS]
 2. ONLINE   fe869b1cef364f44bfcedc8f1eade4d5 (/dev/oracleasm/disks/CRS2) [CRS]
Located 2 voting disk(s).
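Only two of the three voting files are online; the third was on the lost NFS quorum disk. Once the NFS server is reachable again, the quorum disk can be added back so the diskgroup returns to full redundancy. A sketch only: the failgroup name nfs_fg and the path /voting_disk/vote01 are assumptions based on the fstab entry in Step 1, and if ASM has not yet dropped the stale disk, it must be dropped (with force) first:
SQL> alter diskgroup crs add quorum failgroup nfs_fg disk '/voting_disk/vote01';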
# $GRID_HOME/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1712
         Available space (kbytes) :     407856
         ID                       : 1370413129
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
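As a final cross-check, cluvfy can validate the OCR across all nodes; run it as the grid user (output omitted):
$ $GRID_HOME/bin/cluvfy comp ocr -n all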