Other Failover Considerations

The following miscellaneous items should be noted when performing a failover:

Failover Security

A user needs to have Admin access for the CHANGE event on the RSM to initiate a failover.

See the SVCINFO Security Event for the applicable CygNet Service in the Security Reference topics for information about security access levels regarding change queue translations.

Failover and Full Data Synchronization

Note the following constraints related to full syncs:

Delay ARS Startup After Failover

If a failed redundant server is starting back up, a race condition may exist between the ARS doing a sync to get the latest settings and the starting services registering with the ARS. A keyword, WAIT_TIME_FOR_FIRST_SYNC, is available in the ARS service configuration file (Ars.cfg) to specify the number of seconds to delay the startup of the ARS, so that the service can synchronize with other ARS services on the same domain. Setting this keyword may prevent multiple services from starting up and running on the same domain for a short time after a hard failover.

For services in a redundant environment, we recommend starting with a value of 30 seconds and adjusting as needed. For non-redundant services, we recommend leaving the value at zero.
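
The recommendation above can be checked with a small script. The following is a minimal sketch, not a CygNet tool: it assumes, purely for illustration, that the configuration text holds one KEYWORD=VALUE pair per line (the real Ars.cfg syntax may differ) and that a missing keyword is equivalent to zero. The function names are hypothetical.

```python
def recommended_wait_seconds(is_redundant: bool) -> int:
    """Starting value suggested in the text: 30 for redundant services, 0 otherwise."""
    return 30 if is_redundant else 0


def read_wait_time(cfg_text: str, default: int = 0) -> int:
    """Return the WAIT_TIME_FOR_FIRST_SYNC value found in cfg_text.

    ASSUMPTION: one KEYWORD=VALUE pair per line; the real Ars.cfg
    syntax may differ. A missing keyword is treated as `default`.
    """
    for raw_line in cfg_text.splitlines():
        key, sep, value = raw_line.partition("=")
        if sep and key.strip().upper() == "WAIT_TIME_FOR_FIRST_SYNC":
            return int(value.strip())
    return default


def check_wait_time(cfg_text: str, is_redundant: bool) -> str:
    """Compare the configured delay with the suggested starting value."""
    configured = read_wait_time(cfg_text)
    suggested = recommended_wait_seconds(is_redundant)
    if configured == suggested:
        return "ok"
    return f"configured {configured}s, suggested starting value {suggested}s"
```

Because 30 is only a starting point, a mismatch reported by this check is a prompt to review the setting, not an error.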

Read-Only "Freezing" State

To prevent data loss during a soft (manual or planned) failover, the RSM triggers each active redundant service in its failover set to transition into a freezing state, in which internal processing begins to cease, and then into a read-only (frozen) state, in which all internal processing has stopped. Services cease all processing while in the read-only state to prevent further changes to the database. For example, the HSS stops all scripts, the OPCIS stops all processing, the SVCMON stops all timers, and the UIS stops all polling and response processing. All services are evaluated and any other internal processing is aborted.

The FAILOVER_STATUS info item (SVMFOSTATU) is available to monitor each service as it cycles through the various failover states. The RSM also handles the case where a service fails to fully enter the read-only state for any reason: FAILOVER_STATUS is set to "Error", which causes the failover to abort.

Note: In the case of a hard failover, the RSM will skip this step and the failover will proceed.

Each service will log these transition states (pre-failover, waiting, freezing, frozen, error, etc.) to its service log file. Any message sent to a read-only service will be denied and an error will be logged. The length of time each service takes to transition from “freezing” to “frozen” depends on the size of the service and the internal processing taking place at the time the failover is triggered.
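
The state progression described in this section can be pictured as a small state machine. The sketch below is a conceptual model only, not CygNet code: the class and method names are hypothetical, the linear ordering of states is an assumption based on the order in which the states are listed, and only the frozen (read-only) state is modeled as denying messages.

```python
from enum import Enum


class FailoverState(Enum):
    PRE_FAILOVER = "pre-failover"
    WAITING = "waiting"
    FREEZING = "freezing"
    FROZEN = "frozen"
    ERROR = "error"


# ASSUMPTION: a linear progression in the order the states are listed in
# the text, with "error" reachable from any state that is still in progress.
ALLOWED = {
    FailoverState.PRE_FAILOVER: {FailoverState.WAITING, FailoverState.ERROR},
    FailoverState.WAITING: {FailoverState.FREEZING, FailoverState.ERROR},
    FailoverState.FREEZING: {FailoverState.FROZEN, FailoverState.ERROR},
    FailoverState.FROZEN: set(),
    FailoverState.ERROR: set(),
}


class ServiceModel:
    """Hypothetical stand-in for one redundant service during a soft failover."""

    def __init__(self, name):
        self.name = name
        self.state = FailoverState.PRE_FAILOVER
        self.log = []  # stands in for the service log file

    def transition(self, new_state):
        """Move to new_state, logging the change; reject out-of-order moves."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.log.append(f"{self.name}: entered {new_state.value}")

    def handle_message(self, message):
        """Return True if the message is accepted, False if it is denied."""
        if self.state is FailoverState.FROZEN:
            # A read-only service denies the message and logs an error.
            self.log.append(f"{self.name}: ERROR denied {message!r} (read-only)")
            return False
        return True
```

In this model, a failover would abort as soon as any service reaches the error state instead of the frozen state.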

Auditing Failover

The master RSM in the redundancy environment audits and logs several details about each failover event, including:

The FailoverEventLog.log file, found in the RSM folder for every redundant site, logs the failover process and can be used to troubleshoot problems. This file is persisted between failover events, providing a historical record that exists beyond the last failover. A new file is created when FailoverEventLog.log reaches 1 MB, and a maximum of five files are retained.
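
The retention policy described above (roll over at 1 MB, keep at most five files) is a common size-based rotation scheme. As an analogy only, the following sketch reproduces it with Python's standard RotatingFileHandler; it does not describe how the RSM itself writes the file. The function name and logger name are hypothetical, and "five files" is interpreted here as the current file plus four older ones.

```python
import logging
from logging.handlers import RotatingFileHandler

ONE_MB = 1024 * 1024  # rollover threshold described in the text
MAX_FILES = 5         # total number of files retained, per the text


def make_rotating_logger(path, max_bytes=ONE_MB, max_files=MAX_FILES,
                         name="failover_event_sketch"):
    """Return a logger that rolls over at max_bytes and keeps max_files files.

    ASSUMPTION: "five files" means the current file plus four older ones,
    so backupCount is max_files - 1.
    """
    handler = RotatingFileHandler(path, maxBytes=max_bytes,
                                  backupCount=max_files - 1)
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep sketch output out of the root logger
    logger.addHandler(handler)
    return logger
```

With this scheme the oldest file is discarded on each rollover once the limit is reached, so the retained history covers roughly the most recent 5 MB of events.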

Failover Notifications

A GNS running on a Bastion host server can be configured to send notifications to alert users that a failover is occurring within the redundancy environment. Several failover-related attributes are passed from the redundant RSM to the reporting GNS, which resolve to several dynamic tokens. The tokens can be used to configure a notification message for a GNS Event record in the GNS message editor. See Sending Failover Notifications for more information.

Persisted Data After a Failover

As services shut down as part of a failover, many of them persist data that needs to be sent to another service once services restart. These services may hold data records that they cannot send immediately because of the shutdown; for example, any service may have cached AUD entries, or the CAS may have cached ELSALM entries. As these services restart, these data records are sent to the appropriate services on the domain where they were previously running.

Using CygNet Host Updater After a Failover

Users need to be cautious when running the CygNet Host Updater utility in an environment running multiple domains that have been fully or partially failed over. It is important to know that CygNet Host Updater is domain-aware and that awareness is of the domain originally installed on the host server, not the domain of the server that is currently running.

In addition to the CygNet Host Updater’s primary function of copying new CygNet files to the local host where the utility is running and dropping files into the APPS service folder, the utility also talks (via CygNet messaging) to any live APPS services in the CygNet environment.

After a full or partial failover, the domain on the server could be different from the one originally installed. CygNet Host Updater (and most other CygNet utilities) assumes that the domain on which it communicates is the original domain on which the server was installed. After a failover, the server is running on a different domain, and CygNet Host Updater assumes it is talking to services running on the local host when it may in fact be talking to services running on a different host.

As a result, CygNet Host Updater sends messages to one set of services while updating the local APPS files for a different set of services, which may not be what the user expects.
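
The mismatch can be made concrete with a small model. This is an illustrative sketch only: the Host class, the updater_targets function, the host names, and the domain numbers are all hypothetical and are not part of any CygNet utility. It assumes that messaging goes to whichever host currently runs the originally installed domain, while file updates always land on the local host.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    installed_domain: int  # domain originally installed on this host
    running_domain: int    # domain whose services currently run on this host


def updater_targets(local: Host, hosts: List[Host]) -> dict:
    """Show where files are updated versus where messages are delivered.

    ASSUMPTION: messaging targets the locally installed domain, wherever
    its services currently run; file updates always go to the local host.
    """
    messaged = [h.name for h in hosts
                if h.running_domain == local.installed_domain]
    return {
        "files_updated_on": local.name,
        "services_messaged_on": messaged,
        "mismatch": messaged != [local.name],
    }
```

For example, if two hosts have swapped domains after a failover, running this model for one host reports that its files are updated locally while the messaged services live on the other host.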

Measurement Redundancy

The Measurement service (FMS) supports redundancy only between data centers (data-center failover); it does not support local redundancy. Local redundancy can be achieved with Microsoft SQL Server replication and/or clustering.

A redundant RSM will not start an FMS on a domain defined in the local standby role. If you are configuring a local failover for a site that includes an FMS, the FMS can be included, but the local standby FMS service will not start. See Configuring FMS Redundancy for more information.

Monitoring Device Cryouts

During failover, the IP address of the CygNet server receiving a cryout message will change. Monitoring device cryouts may require additional configuration outside of the CygNet environment to accommodate failover. See TCP/IP Messaging and the Cryout Listen EIE for more details about TCP/IP messaging and cryouts.

Redundancy and VHS Data Forwarding

Both the active and standby VHS in a redundancy server pair can be configured for VHS data forwarding. This allows the active service in a redundancy pair to forward data to another destination VHS, keeping both services in sync. Although the standby VHS in the redundancy pair can be configured for data forwarding, its forwarding configuration is ignored and no data is pushed unless that VHS is the active service of the pair.

See VHS Data Forwarding for more information.

© 2020 Weatherford. All rights reserved.