In this real-life scenario, the NFS server was taken offline for a relocation without offlining the quorum disk first. This caused Cluster Ready Services (CRS) to hang when it tried to update the quorum disk, and the RAC nodes also hung whenever a command was issued against the stale NFS mount point. Due to ongoing network issues at the NFS server's new location, we currently have no access to the lost quorum disk. In this guide, we will see how to start CRS without the quorum disk.
Note: $GRID_HOME is the software location of the Grid Infrastructure (GI), and grid is the GI owner. The environment is an Oracle 12c Extended RAC.
Step 1 – On both RAC nodes, comment out the NFS mount in /etc/fstab so that the stale mount is not re-attempted when we reboot.
# nfssrv1:/votedisk /voting_disk nfs rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600 0 0
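If you prefer to script the change instead of editing the file by hand, something like the following works (the pattern matches the example entry above; the backup copy is just a precaution):
# cp -p /etc/fstab /etc/fstab.bak
# sed -i 's|^nfssrv1:/votedisk|#nfssrv1:/votedisk|' /etc/fstab
# grep votedisk /etc/fstab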
Step 2 – Reboot both RAC nodes to clear the stale NFS mounts.
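If an immediate reboot is not possible, a lazy unmount may detach the stale mount without one, although processes already hung on a hard NFS mount can still leave a reboot as the only clean option:
# umount -l /voting_disk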
Step 3 – As user grid, we determined that CRS failed to start.
$ $GRID_HOME/bin/crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
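To see how far the stack actually came up, the stack summary and the lower-level init resources can be checked as well (output not shown; in this scenario the lower stack and ASM were up while CRSD was not):
$ $GRID_HOME/bin/crsctl check crs
$ $GRID_HOME/bin/crsctl stat res -t -init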
This is because CRS has no access to the ASM diskgroup that contains the voting disks (i.e. CRS in this example), which is dismounted.
$ sqlplus / as sysasm
SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
FRA                            MOUNTED
DATA                           MOUNTED
CRS                            DISMOUNTED
Step 4 – Mount the CRS diskgroup with the force option.
If we try to mount it without the force option, ASM complains about the missing disk.
SQL> alter diskgroup crs mount;
alter diskgroup crs mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "1"
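Before forcing the mount, we can confirm which disk ASM cannot find with a standard v$asm_disk query (a diagnostic sketch; group number 1 comes from the ORA-15042 message above):
SQL> select group_number, disk_number, path, mount_status, mode_status, state from v$asm_disk order by group_number, disk_number;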
This is a normal-redundancy diskgroup, so we should be able to force a mount with one missing disk, assuming at least one copy of every extent of every file is present on the remaining disks in the CRS diskgroup. After the following command forces the mount, the diskgroup will run at reduced redundancy until we add the quorum disk back.
SQL> alter diskgroup crs mount force;
Diskgroup altered.
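As part of the force mount, ASM takes the missing disk offline and will drop it permanently once its repair timer expires; the countdown can be watched in v$asm_disk (REPAIR_TIMER is in seconds; group number 1 is the CRS diskgroup from the error above):
SQL> select disk_number, mode_status, repair_timer from v$asm_disk where group_number = 1;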
Step 5 – As root, stop the CRS with the -f (force) flag on RAC node 1, then stop it on RAC node 2.
# $GRID_HOME/bin/crsctl stop crs -f
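Before restarting anything, it is worth confirming on each node that no clusterware daemons survived the forced stop:
# ps -ef | grep -E 'ohasd|crsd|ocssd' | grep -v grep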
Step 6 – As root, start the CRS on RAC node 1, then start it on RAC node 2.
# $GRID_HOME/bin/crsctl start crs
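The start command returns before the whole stack is up, so allow a few minutes and then poll the status; the alert log path shown is the pre-ADR 12.1 default and may differ on your system:
# $GRID_HOME/bin/crsctl check crs
# tail -f $GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log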
Step 7 – Verify all the services are up.
# $GRID_HOME/bin/crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CRS.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.DATA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.FRA.dg
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.asm
               ONLINE  ONLINE       rac1                     Started,STABLE
               ONLINE  ONLINE       rac2                     Started,STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.MGMTLSNR
      1        ONLINE  ONLINE       rac1                     169.254.230.34 10.3.
                                                             39.129,STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.cvu
      1        ONLINE  ONLINE       rac1                     STABLE
ora.oc4j
      1        ONLINE  ONLINE       rac1                     STABLE
ora.prod.acev.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.ads.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.cview.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.db
      1        ONLINE  ONLINE       rac1                     Open,STABLE
      2        ONLINE  ONLINE       rac2                     Open,STABLE
ora.prod.dbfs.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.lfe.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.prod.wfm.svc
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------
# $GRID_HOME/bin/crsctl query css votedisk
##  STATE    File Universal Id                File Name                   Disk group
--  -----    -----------------                ---------                   ----------
 1. ONLINE   285184cb09664fe7bf98649dc3e20676 (/dev/oracleasm/disks/CRS1) [CRS]
 2. ONLINE   fe869b1cef364f44bfcedc8f1eade4d5 (/dev/oracleasm/disks/CRS2) [CRS]
Located 2 voting disk(s).
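Only two of the three voting files are online; the third was on the lost NFS quorum disk. Once the NFS server is reachable again, the quorum disk can be added back so the diskgroup returns to full redundancy. A sketch only: the failgroup name nfs_fg and the path /voting_disk/vote01 are assumptions based on the fstab entry in Step 1, and if ASM has not yet dropped the stale disk, it must be dropped (with force) first:
SQL> alter diskgroup crs add quorum failgroup nfs_fg disk '/voting_disk/vote01';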
# $GRID_HOME/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          4
         Total space (kbytes)     :     409568
         Used space (kbytes)      :       1712
         Available space (kbytes) :     407856
         ID                       : 1370413129
         Device/File Name         :       +CRS
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check succeeded
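As a final cross-check, cluvfy can validate the OCR across all nodes; run it as the grid user (output omitted):
$ $GRID_HOME/bin/cluvfy comp ocr -n all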