Starting an Oracle CRS/RAC cluster that hung after the loss of the NFS quorum disk

In this real-life scenario, the NFS server was taken offline for a relocation without first taking the quorum disk offline. This caused Cluster Ready Services (CRS) to hang when it tried to update the quorum disk. The RAC nodes also hung whenever a command was issued against the stale NFS mount point. Because of an ongoing network issue at the site where the NFS server is currently located, we have no access to the lost quorum disk. In this guide, we will see how to start CRS without the quorum disk.


Note: $GRID_HOME is the software location of the Grid Infrastructure (GI). grid is the GI owner. This is an Oracle 12c Extended RAC.


Step 1 – On both RAC nodes, comment out the NFS mount in /etc/fstab so when we reboot, it won’t cause a problem.


# nfssrv1:/votedisk /voting_disk nfs rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600 0 0
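The edit above can also be scripted so that it is applied identically on both nodes. The following is a minimal sketch, not part of the original procedure: the function name comment_out_mount is hypothetical, and it assumes a standard fstab layout where the mount point is the second whitespace-separated field. It writes a backup copy (fstab.bak) before changing anything.

```shell
#!/bin/sh
# Hypothetical helper: comment out every active fstab entry whose
# mount point (second field) equals the given path.
# Usage: comment_out_mount <mount_point> [fstab_file]
comment_out_mount() {
    mount_point=$1
    fstab=${2:-/etc/fstab}

    # Keep a backup so the entry can be restored once the NFS server is back.
    cp -p "$fstab" "$fstab.bak" || return 1

    # Prefix "# " to lines that are not already comments and whose
    # second field is the mount point; pass all other lines through.
    awk -v mp="$mount_point" '
        $0 !~ /^[[:space:]]*#/ && $2 == mp { print "# " $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"
}
```

Run as root, `comment_out_mount /voting_disk` would produce the commented line shown above. Running it a second time leaves the file unchanged, because lines that are already commented are skipped.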


Step 2 – Reboot both RAC nodes to clear the stale NFS mounts.
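After the reboot, it may be worth confirming that the NFS share really is no longer mounted before continuing. The sketch below is an assumption-laden illustration rather than part of the original steps: the function name nfs_is_mounted is hypothetical, and it reads the kernel mount table (/proc/mounts on Linux) instead of running a command such as df, which would touch the mount point itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if <mount_point> is currently
# mounted with an NFS filesystem type, fail (exit 1) otherwise.
# Usage: nfs_is_mounted <mount_point> [mounts_file]
nfs_is_mounted() {
    # Mount table fields: device, mount point, fs type, options, ...
    awk -v mp="$1" '
        $2 == mp && $3 ~ /^nfs/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "${2:-/proc/mounts}"
}
```

For example, `nfs_is_mounted /voting_disk || echo "NFS share is not mounted"` should print the message on both nodes once Step 1 and Step 2 are done.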


Step 3 – As user grid, confirm that CRS failed to start.


$ $GRID_HOME/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.


This is because CRS has no access to the ASM diskgroup that contains the voting disks (i.e., CRS in this example), which is dismounted.


$ sqlplus / as sysasm

SQL> select name, state from v$asm_diskgroup;

NAME              STATE
---------------   -----------------
FRA               MOUNTED
DATA              MOUNTED
CRS               DISMOUNTED


Step 4 – Mount the CRS diskgroup with the force option.


If we try to mount without the force option, it complains about the missing disk.


SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"


This is a normal redundancy diskgroup, so we should be able to force a mount with one missing disk, assuming at least one copy of every extent of every file is located on the remaining disks in the CRS diskgroup. After we force the CRS diskgroup to mount with the following command, it will run at reduced redundancy until we add the quorum disk back.


SQL> alter diskgroup crs mount force;

Diskgroup altered.


Step 5 – As root, stop the CRS with the -f (force) flag on RAC node 1, then stop it on RAC node 2.


# $GRID_HOME/bin/crsctl stop crs -f


Step 6 – As root, start the CRS on RAC node 1, then start it on RAC node 2.


# $GRID_HOME/bin/crsctl start crs


Step 7 – Verify all the services are up.


# $GRID_HOME/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.FRA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.asm
               ONLINE  ONLINE       rac1                     Started,STABLE
               ONLINE  ONLINE       rac2                     Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac1                     169.254.230.34 10.3.
                                                             39.129,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cvu
      1        ONLINE  ONLINE       rac1                     STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac1                     STABLE
ora.prod.acev.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.ads.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.cview.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.prod.dbfs.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.lfe.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.wfm.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------

# $GRID_HOME/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
1. ONLINE   285184cb09664fe7bf98649dc3e20676 (/dev/oracleasm/disks/CRS1) [CRS]
2. ONLINE   fe869b1cef364f44bfcedc8f1eade4d5 (/dev/oracleasm/disks/CRS2) [CRS]
Located 2 voting disk(s).
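
The voting disk check above can be turned into a quick scripted test, for example to run on both nodes. This is only a sketch: the function name count_online_votedisks is hypothetical, and it assumes the output layout shown above, where each voting disk line starts with a number followed by a period and then the state.

```shell
#!/bin/sh
# Hypothetical helper: read "crsctl query css votedisk" output on stdin
# and print how many voting disks are reported as ONLINE.
count_online_votedisks() {
    # Disk lines look like: " 1. ONLINE <id> (<path>) [<diskgroup>]"
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of `"$GRID_HOME/bin/crsctl" query css votedisk | count_online_votedisks`. In this scenario the expected count is 2, since the NFS quorum disk has not been added back yet.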

# $GRID_HOME/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1712
         Available space (kbytes) :     407856
         ID                       : 1370413129
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

                                    Device/File not configured

         Cluster registry integrity check succeeded

         Logical corruption check succeeded
