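A Script for Keepalived to Monitor the PostgreSQL Process and Switch the IP Automatically
keepalived uses the VRRP protocol to switch routes between hosts, in either preemptive or non-preemptive mode.
Within one network segment, it manages a virtual IP that is shared by several hosts and bound to a single public IP, so that the hosts run as one master with one or more backups.
If the current master fails, keepalived automatically rebinds the public IP to one of the backup hosts. Clients outside the network can still reach the service and the failure is invisible to them, which is how host-level failover is achieved. Take web servers as an example: each host runs nginx, and when nginx on the current master fails, keepalived automatically directs traffic to nginx on a backup host.
In a high-availability setup every service should have a standby so that the business stays reliable and stable. A PostgreSQL deployment, for example, natively supports one primary and several standbys. When the primary fails and can no longer serve requests, the business needs a standby to be put into service immediately and take over the primary's role.
This article uses scripts to monitor the working state of PostgreSQL. Once the PostgreSQL process on the primary stops, the deployed repmgrd process promotes another PostgreSQL standby to primary. The PostgreSQL state is monitored on the keepalived side as well, and the IP is switched to the new primary so that the service continues.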
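As a rough illustration of how a primary can be told apart from a standby, the sketch below parses the "Database cluster state" line printed by pg_controldata. The function name cluster_role and the sample text are made up for this example; on a real node the input would come from pg_controldata -D run against the data directory.

```shell
#!/bin/bash
# Hypothetical helper: read pg_controldata output on stdin and print
# "primary", "standby" or "unknown" based on the cluster state line.
cluster_role() {
    local state
    # Keep only the value after "Database cluster state:" and drop leading blanks
    state=$(sed -n 's/^Database cluster state:[[:space:]]*//p')
    case "${state}" in
        'in production')       echo primary ;;
        'in archive recovery') echo standby ;;
        *)                     echo unknown ;;
    esac
}

# Sample text standing in for real pg_controldata output (assumed values)
SAMPLE='pg_control version number:            1300
Database cluster state:               in production'

printf '%s\n' "${SAMPLE}" | cluster_role

# On a real node the call would look like this (paths are assumptions):
# /PostgreSQL/bin/pg_controldata -D /PostgreSQL/data | cluster_role
```

Two caveats apply to this kind of check. pg_controldata reads the control file on disk and does not verify that the server is actually running, and its messages may be translated when the node uses a non-English locale, so the matched strings may need adjusting.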
#!/bin/bash
#############################################################################
# Author: MaXiang
# Created: 2020/09/01
# ToDo:
# Daemon for monitoring keepalived failover scripts
#############################################################################
SCR_DIR='/home/postgres'
# Count the running instances of the monitor script (the grep process itself is excluded)
KEEP_MGR=$(ps -ef | grep 'keepalived_repmgr.sh' | grep -v grep | wc -l)
if [[ ${KEEP_MGR} -lt 1 ]]; then
    # The monitor relies on bash-only syntax ([[ ]], function, let), so start it with bash
    nohup /bin/bash "${SCR_DIR}/keepalived_repmgr.sh" >> "${SCR_DIR}/keepalived_repmgr.log" 2>&1 &
fi
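The PostgreSQL hosts must first be configured for mutual SSH trust, that is, running ssh <IP address> on one host must log in to the other without a password prompt.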
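Before starting the monitor it may be worth confirming that this trust is really in place. The sketch below is one possible check; the function name check_ssh_trust is hypothetical, and the node names in the example call are the ones used in the monitoring script.

```shell
#!/bin/bash
# Hypothetical helper: return 0 only if every given node accepts a
# key-based, non-interactive SSH login from this host.
check_ssh_trust() {
    local node failed=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        if ssh -n -o BatchMode=yes -o ConnectTimeout=3 "${node}" true; then
            echo "${node}: ok"
        else
            echo "${node}: no passwordless SSH"
            failed=1
        fi
    done
    return "${failed}"
}

# Example call (node names taken from the monitoring script):
# check_ssh_trust pg1 pg2 pg3
```

The trust itself is commonly set up by generating a key pair with ssh-keygen on each host and copying the public key to the other hosts with ssh-copy-id, run as the user that executes the scripts.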
#!/bin/bash
#############################################################################
# Author: MaXiang
# Created: 2020/09/01
# ToDo:
# Created for monitoring keepalived for PostgreSQL repmgr cluster
# When production shutdown with errors,it needs about 10s to failover.
#############################################################################
NODES='pg1 pg2 pg3'
VIP='192.168.56.204'
PGDATA=/PostgreSQL/data
PGHOME=/PostgreSQL
REPMGR_CONF='/etc/repmgr.conf'
KEEP_CONF='/etc/keepalived/keepalived.conf'
# Print a timestamp used as the prefix of every log line
function log_time(){
    date '+%Y-%m-%d %H:%M:%S '
}
echo "$(log_time) Repmgr and KeepAlived auto-failover monitoring start..."
while true
do
    # Clear the results of the previous pass so stale values are never compared
    KEEP_MASTER=''
    MGR_PRIMARY=''
    # The keepalived master is the node that currently holds the VIP
    for NODE in ${NODES}
    do
        KEEP_STA=$(ssh "${NODE}" "/usr/sbin/ip a | grep -cwF '${VIP}'")
        if [[ ${KEEP_STA} -eq 1 ]]; then
            KEEP_MASTER=${NODE}
            echo "$(log_time) KeepAlived Master is ${KEEP_MASTER}."
        fi
    done
    # The repmgr primary is the first node whose cluster state is "in production"
    for NODE in ${NODES}
    do
        MGR_STA=$(ssh "${NODE}" "${PGHOME}/bin/pg_controldata -D ${PGDATA} | grep -c 'in production'")
        if [[ ${MGR_STA} -eq 1 ]]; then
            MGR_PRIMARY=${NODE}
            echo "$(log_time) Repmgr Cluster Primary is ${MGR_PRIMARY}."
            break
        fi
    done
    sleep 1s
    if [[ -z ${MGR_PRIMARY} ]]; then
        # Without a known primary there is no target to move the VIP to
        echo "$(log_time) No Repmgr Primary found, skipping this check."
    elif [[ ${KEEP_MASTER} == "${MGR_PRIMARY}" ]]; then
        echo "$(log_time) High Available Architecture running normally."
    else
        echo "$(log_time) KeepAlived Master is ${KEEP_MASTER:-none}, but Repmgr Primary is ${MGR_PRIMARY}."
        # Give the new primary the highest priority, the old master the lowest
        for NODE in ${NODES}
        do
            case ${NODE} in
                "${MGR_PRIMARY}" )
                    ssh "${NODE}" "sed -i 's/priority .*/priority 100/g' ${KEEP_CONF}"
                    ssh "${NODE}" "sudo systemctl restart keepalived"
                    echo "$(log_time) Modify KeepAlived priority for ${NODE} to 100."
                    ;;
                "${KEEP_MASTER}" )
                    ssh "${NODE}" "sed -i 's/priority .*/priority 60/g' ${KEEP_CONF}"
                    ssh "${NODE}" "sudo systemctl restart keepalived"
                    echo "$(log_time) Modify KeepAlived priority for ${NODE} to 60."
                    ;;
                * )
                    ssh "${NODE}" "sed -i 's/priority .*/priority 80/g' ${KEEP_CONF}"
                    ssh "${NODE}" "sudo systemctl restart keepalived"
                    echo "$(log_time) Modify KeepAlived priority for ${NODE} to 80."
                    ;;
            esac
        done
        # Count down about 10 seconds while keepalived moves the VIP
        SECS=10
        while [[ ${SECS} -gt 0 ]]
        do
            echo "$(log_time) Waiting for failover to target: ${SECS}s"
            sleep 1s
            SECS=$((SECS - 1))
        done
        echo "$(log_time) Failover KeepAlived to ${MGR_PRIMARY} completed."
    fi
    sleep 2s
done
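To see what the priority rewrite does without touching a live node, the same sed expression can be tried on a throwaway file. The vrrp_instance fragment below is an assumed minimal example, not the author's actual keepalived.conf; the instance name VI_1, the interface eth0 and the virtual_router_id value are placeholders.

```shell
#!/bin/bash
# Write an assumed, minimal keepalived.conf fragment to a temporary file
CONF=$(mktemp)
trap 'rm -f "${CONF}"' EXIT   # remove the temporary file when the script ends
cat > "${CONF}" <<'EOF'
vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 80
    virtual_ipaddress {
        192.168.56.204
    }
}
EOF

# Same substitution the monitor runs on each node, here raising the priority to 100
sed -i 's/priority .*/priority 100/g' "${CONF}"

# Show the rewritten priority line
grep 'priority' "${CONF}"
```

Note that the expression rewrites every line in the file that contains "priority " followed by anything, so the approach relies on the configuration holding a single vrrp_instance and no other text that matches the pattern.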
That covers the main functionality described in this article. I hope it helps.