Graceful Shutdown of One Node in 2-node Oracle 12c/18c/19c RAC Configuration with No Downtime

In this tutorial, we avoid application downtime while taking one RAC node offline for maintenance by making sure that every service keeps running on the remaining node.


Note:

-Try these procedures on a test system before attempting them on a production system.

-Usernames, instance names, the database name, node names, etc. are taken from my system. Replace them with values appropriate for your system.

-$GRID_HOME is the software location of the Grid Infrastructure (GI).

-grid user is the GI software owner.

-$ORACLE_HOME is the software location of the Oracle Database.

-oracle user is the DB software owner.

-rac1/rac2 are the node names of Oracle RAC.

-PROD is the name of my RAC DB. PROD1/PROD2 are the names of the RAC instances running on rac1/rac2.

-ASM1/ASM2 are the names of the ASM Instances.

-This guide assumes that TAF (Transparent Application Failover) is configured and working properly.

-The goal is to ensure that every service is either already running on rac2 or will fail over to rac2 when we take rac1 down for maintenance.


Step 1 – Record the current service configuration for reference.


As oracle user, run the following command to list all the services for the database:


[oracle@rac1 ~]$ srvctl config service -d PROD
Service name: CAT
Server pool:
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
Failover retries:
Failover delay:
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: PROD1PDB
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: PROD1
Available instances:
CSS critical: no
Service uses Java: false

Service name: DOG
Server pool:
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
Failover retries:
Failover delay:
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: PROD1PDB
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: PROD1
Available instances: PROD2
CSS critical: no
Service uses Java: false

Service name: PIG
Server pool:
Cardinality: 2
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
Failover retries:
Failover delay:
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: PROD1PDB
Hub service:
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout:
Stop option:
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: PROD1,PROD2
Available instances:
CSS critical: no
Service uses Java: false
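If you want to keep this record on disk, a small helper along these lines could save both the configuration and the current status to timestamped files before the maintenance starts. This is only a sketch: the function name snapshot_services and the default directory $HOME/srvctl_snapshots are hypothetical, and it assumes srvctl is on the oracle user's PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the service configuration and status of a RAC
# database to timestamped files so the original layout can be restored later.
# Assumes srvctl is on the PATH of the user running it (the oracle user).
snapshot_services() {
  local db="$1"                                # database name, e.g. PROD
  local outdir="${2:-$HOME/srvctl_snapshots}"  # hypothetical default directory
  local stamp
  stamp="$(date +%Y%m%d_%H%M%S)"
  mkdir -p "$outdir" || return 1

  # Full service definitions: preferred/available instances, policies, etc.
  srvctl config service -d "$db" > "$outdir/${db}_service_config_${stamp}.txt" || return 1
  # Where each service is running right now
  srvctl status service -d "$db" > "$outdir/${db}_service_status_${stamp}.txt" || return 1

  # Print the path of the configuration file for the caller
  echo "$outdir/${db}_service_config_${stamp}.txt"
}
```

After sourcing the file, running snapshot_services PROD would write both files and print the path of the configuration snapshot.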

Step 2 – Check where the services are running. We have three services (CAT, DOG and PIG), and PIG is running on both RAC instances:



[oracle@rac1 ~]$ srvctl status service -d PROD
Service CAT is running on instance(s) PROD1
Service DOG is running on instance(s) PROD1
Service PIG is running on instance(s) PROD1,PROD2

From the service configuration output in Step 1, note the following values for each service:


CAT:
Preferred instances: PROD1
Available instances:

DOG:
Preferred instances: PROD1
Available instances: PROD2

PIG:
Preferred instances: PROD1,PROD2
Available instances:

Step 3 – Let’s stop PROD1 and see how the services behave with the current configuration.


[oracle@rac1 ~]$ srvctl stop instance -d PROD -i PROD1 -failover -force

[oracle@rac1 ~]$ srvctl status service -d PROD
Service CAT is not running.
Service DOG is running on instance(s) PROD2
Service PIG is running on instance(s) PROD2

Note the following:

-CAT did not fail over to PROD2 because PROD2 was not configured as an available instance for it. To avoid losing the CAT service when we shut down the PROD1 instance, we must designate PROD2 as an available instance.

-DOG automatically failed over to its available instance, PROD2.

-PIG is already running on PROD2.

-The -failover option is required in 12c and higher for services to fail over automatically to their available instances.

-The -force option is required in Oracle RAC 12c and higher when the instance is hosting running services. Otherwise you get the following errors:


[oracle@rac1 ~]$ srvctl stop instance -d PROD -i PROD1
PRCD-1131 : Failed to stop database PROD and its services on nodes rac1
PRCR-1133 : Failed to stop database PROD and its running services
PRCR-1132 : Failed to stop resources using a filter
CRS-2974: unable to act on resource 'ora.prod.db' on server 'rac1' because that would require stopping or relocating resource 'ora.prod.dog.svc' but the -force option was not specified
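Instead of relying on -failover, you may prefer to move services off PROD1 yourself before stopping it. The sketch below turns srvctl status service output into the matching srvctl relocate service commands and prints them rather than running them, so you can review them first. The function name print_relocate_commands is hypothetical. Keep in mind that relocation only succeeds when the target is a configured preferred or available instance of the service, so with the current configuration the command generated for CAT would fail, as Step 4 shows.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) srvctl relocate commands for services
# that run on the instance being stopped but not yet on the target instance.
# Input: "srvctl status service" output in the format shown in this tutorial.
# Usage: srvctl status service -d PROD | print_relocate_commands PROD PROD1 PROD2
print_relocate_commands() {
  local db="$1" from="$2" to="$3"
  awk -v db="$db" -v from="$from" -v to="$to" '
    # Matches lines like: Service PIG is running on instance(s) PROD1,PROD2
    $1 == "Service" && $3 == "is" && $4 == "running" {
      n = split($NF, inst, ",")
      on_from = 0; on_to = 0
      for (i = 1; i <= n; i++) {
        if (inst[i] == from) on_from = 1
        if (inst[i] == to)   on_to = 1
      }
      # Services already running on the target (such as PIG) need no relocation
      if (on_from && !on_to)
        printf "srvctl relocate service -d %s -s %s -oldinst %s -newinst %s\n", db, $2, from, to
    }
  '
}
```

For the Step 2 status output, this prints one relocate command each for CAT and DOG; PIG is skipped because it already runs on PROD2.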

Step 4 – After bringing the services back to their original state, let’s see how to correct the problem of CAT not failing over to PROD2.


[oracle@rac1 ~]$ srvctl status service -d PROD
Service CAT is running on instance(s) PROD1
Service DOG is running on instance(s) PROD1
Service PIG is running on instance(s) PROD1,PROD2

This is our current configuration:


[oracle@rac1 ~]$ srvctl config service -d PROD | egrep "Service name|Pref|Avail"
Service name: CAT
Preferred instances: PROD1
Available instances:
Service name: DOG
Preferred instances: PROD1
Available instances: PROD2
Service name: PIG
Preferred instances: PROD1,PROD2
Available instances:

[oracle@rac1 ~]$ srvctl status db -d PROD
Instance PROD1 is running on node rac1
Instance PROD2 is running on node rac2

Let’s try to relocate CAT to PROD2:


[oracle@rac1 ~]$ srvctl relocate service -d PROD -s CAT -oldinst PROD1 -newinst PROD2
PRCD-1346 : failed to relocate services of database PROD
PRCR-1089 : Failed to relocate resource ora.prod.cat.svc.
CRS-2717: Server 'rac2' is not in any of the server pool(s) hosting resource 'ora.prod.cat.svc'

srvctl will not relocate CAT to PROD2 until PROD2 is configured as an available instance for the service. Let’s correct that:


[oracle@rac1 ~]$ srvctl modify service -modifyconfig -d PROD -s CAT -preferred PROD1 -available PROD2

[oracle@rac1 ~]$ srvctl config service -d PROD -s CAT | egrep "Service name|Pref|Avail"
Service name: CAT
Preferred instances: PROD1
Available instances: PROD2
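If several services need the same fix, the srvctl modify service commands can be generated from the configuration output. The sketch below (hypothetical function name print_modify_commands) prints, for each service that lists the surviving instance neither as preferred nor as available, a command that keeps the service's current preferred and available instances and adds the survivor as available. Because -modifyconfig makes srvctl use only the instances named on the command line, the existing instances are passed along explicitly. It assumes administrator-managed services and only prints the commands for review.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) srvctl modify commands that add the
# surviving instance as an available instance where it is missing.
# Usage: srvctl config service -d PROD | print_modify_commands PROD PROD2
# Assumes administrator-managed services (explicit instance lists).
print_modify_commands() {
  local db="$1" survivor="$2"
  awk -v db="$db" -v surv="$survivor" '
    function emit(   n, i, parts, found, newavail) {
      if (svc == "") return
      n = split(pref "," avail, parts, ",")
      found = 0
      for (i = 1; i <= n; i++) if (parts[i] == surv) found = 1
      if (found) return   # survivor already preferred or available
      # -modifyconfig keeps only the listed instances, so repeat the current ones
      newavail = (avail == "") ? surv : avail "," surv
      printf "srvctl modify service -modifyconfig -d %s -s %s -preferred %s -available %s\n", db, svc, pref, newavail
    }
    /^Service name:/        { emit(); svc = $3; pref = ""; avail = "" }
    /^Preferred instances:/ { pref = $3 }
    /^Available instances:/ { avail = $3 }
    END                     { emit() }
  '
}
```

For the configuration shown at the start of this step, the only command printed is the one used above for CAT.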

PROD2 is now configured as an available instance. We can either let CAT fail over to PROD2 automatically or relocate it to PROD2 manually before shutting down rac1:


[oracle@rac1 ~]$ srvctl relocate service -d PROD -s CAT -oldinst PROD1 -newinst PROD2

[oracle@rac1 ~]$ srvctl status db -d PROD
Instance PROD1 is running on node rac1
Instance PROD2 is running on node rac2

[oracle@rac1 ~]$ srvctl status service -d PROD
Service CAT is running on instance(s) PROD2
Service DOG is running on instance(s) PROD1
Service PIG is running on instance(s) PROD1,PROD2

[oracle@rac1 ~]$ srvctl stop instance -d PROD -i PROD1 -failover -force

[oracle@rac1 ~]$ srvctl status service -d PROD
Service CAT is running on instance(s) PROD2
Service DOG is running on instance(s) PROD2
Service PIG is running on instance(s) PROD2
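After the instance is down, you can confirm that nothing was left behind. The sketch below (hypothetical function name verify_services_on) reads srvctl status service output, prints every service that is not running on the given instance, and returns a non-zero status if it finds any. Fed the Step 3 output, it flags CAT; fed the output above, it prints nothing and succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report services that are not running on the target
# instance and return 1 if any are found, 0 otherwise.
# Usage: srvctl status service -d PROD | verify_services_on PROD2
verify_services_on() {
  local target="$1"
  awk -v target="$target" '
    $1 == "Service" {
      ok = 0
      # "Service X is running on instance(s) A,B": check the instance list
      if ($3 == "is" && $4 == "running") {
        n = split($NF, inst, ",")
        for (i = 1; i <= n; i++) if (inst[i] == target) ok = 1
      }
      # "Service X is not running." (or a list without target) is a problem
      if (!ok) { print "NOT on " target ": " $2; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}
```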

Summary:

In Oracle RAC, you can configure a service to run on one or more preferred instances and, in addition, designate one or more available instances. If a service runs on only one preferred instance and has no available instance, it cannot fail over to the surviving instance when its instance is shut down.