High Availability Deployment Components
App Cluster: The application layer consists of two Chronicle SOAR application servers. Neither server stores persistent data, so the service can switch from one node to the other. A switch between application nodes can take up to a few minutes. In cloud environments the entry point is a load balancer; in on-prem topologies it is a virtual IP. In both cloud and on-prem deployments, the switch between cluster nodes is controlled by Corosync and Pacemaker.
DB Cluster: The database layer consists of two database servers that are replicated continuously using repmgr, an open-source replication management tool for PostgreSQL. Replication uses the same port the application uses to reach the database (e.g. 5432) as well as SSH (e.g. 22). Application failover relies on the /etc/hosts file to pass the IP address of the primary database to the application. The database layer includes automatic failover: when it detects that the primary server is down, it promotes the standby server to primary. Once promotion is complete, the “fallen” primary can be returned to the cluster, but only as a standby server and not as the primary; this prevents recurring errors on that server. The application uses a script (built into the application server) that always detects which server is the primary and connects to it.
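The /etc/hosts mechanism described above can be illustrated with a minimal Python sketch. It is not the script that ships with the application server, whose contents are not documented here. The hostname soar-db-primary, the IP addresses, and the function names pick_primary and set_hosts_entry are all hypothetical, and the sketch assumes the database hostname sits on its own line in the hosts file.

```python
from typing import Dict


def pick_primary(roles: Dict[str, str]) -> str:
    """Return the IP of the single node whose reported role is 'primary'.

    `roles` maps node IP -> role string. How the roles are obtained
    (for example from the replication tool) is outside this sketch.
    """
    primaries = [ip for ip, role in roles.items() if role == "primary"]
    if len(primaries) != 1:
        raise RuntimeError(f"expected exactly one primary, found {len(primaries)}")
    return primaries[0]


def set_hosts_entry(hosts_text: str, hostname: str, ip: str) -> str:
    """Return hosts-file text in which `hostname` maps only to `ip`.

    Assumption: `hostname` has a line of its own, so any line that
    names it can be dropped whole and replaced with a fresh entry.
    """
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments when tokenizing the entry.
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; the remaining fields are names.
        if len(fields) >= 2 and hostname in fields[1:]:
            continue  # stale mapping for the database hostname
        kept.append(line)
    kept.append(f"{ip}\t{hostname}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical state after a failover: 10.0.0.12 has been promoted.
    roles = {"10.0.0.11": "standby", "10.0.0.12": "primary"}
    hosts = "127.0.0.1 localhost\n10.0.0.11 soar-db-primary\n"
    print(set_hosts_entry(hosts, "soar-db-primary", pick_primary(roles)), end="")
```

Because the application connects by hostname, only the hosts entry has to change after a promotion; the application's own connection settings stay the same.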
Access to the app cluster goes through a load balancer or a virtual IP, depending on whether the deployment is cloud or on-prem. The master app node connects to the DB and performs data operations. If the master app node fails, the second app server becomes master and begins providing the service. If the primary DB fails, the secondary DB becomes the primary, and the master app node re-establishes its connection to the DB server and continues to provide the service.
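The failover flow can be modeled as a small state machine. This is an illustrative sketch only: the class TwoNodeCluster, its methods, and the node names db1 and db2 are hypothetical and do not correspond to any Chronicle SOAR, Pacemaker, or repmgr API. It captures one rule from the description above: a failed node may rejoin only as a standby, never directly as the primary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoNodeCluster:
    """Toy model of a two-node active/standby pair (app or DB layer)."""

    primary: str
    standby: Optional[str]

    def fail_primary(self) -> str:
        """Promote the standby and return the name of the failed node."""
        if self.standby is None:
            raise RuntimeError("no standby available to promote")
        failed = self.primary
        # The standby takes over; the cluster runs without a standby
        # until the failed node is repaired and rejoined.
        self.primary, self.standby = self.standby, None
        return failed

    def rejoin_as_standby(self, node: str) -> None:
        """Return a repaired node to the cluster in the standby role."""
        if self.standby is not None:
            raise RuntimeError("cluster already has a standby")
        self.standby = node


if __name__ == "__main__":
    db = TwoNodeCluster(primary="db1", standby="db2")
    fallen = db.fail_primary()    # db2 is promoted
    db.rejoin_as_standby(fallen)  # db1 comes back, but only as standby
    print(db)
```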