There has been a decision to not implement High Availability (HA) leadership or clustering for PayGate. Resiliency and availability requirements are recommended on the underlying database and deployment of Moov’s stack. This is done for a few reasons:
Transfer
objects for set of ABA routing numbers.Given these assumptions we’ve chosen to focus PayGate’s vertical scaling (add CPUs and memory) instead of clustering. Currently two instances of PayGate might be able to build and upload files for the same destination ABA routing numbers, but that setup is not officially supported. That coordination would rely on the database across multiple readers.
PayGate has two flavors of dependencies “CPU based in-memory” and “REST and database” servers along with a database (SQLite or MySQL).
The underling database PayGate is using will need to be deployed in an acceptable manor for replication, failure recovery, and backups. SQLite replication (possibly implemented via rqlite) has not been tested, but looks promising.
If we implemented HA for multiple PayGate instances we would likely elect a leader for each routing number configured. This would mean elections and heartbeats for each routing number along with stashing knowledge in PayGate when a leader was unresponsive. The implementation details of that seem a lot higher than adding CPU’s at the time of writing.