Testing Nomad Cluster High Availability
1. Setting up the Nomad cluster
This test uses three Ubuntu 18.04 virtual machines with the following IP addresses:
VM 1: 192.168.60.10
VM 2: 192.168.60.11
VM 3: 192.168.60.12
Each VM runs one Nomad server and one Nomad client; each client is paired with the server on the same VM, and one of the three servers is elected leader.
For the setup procedure, refer to the Nomad + Consul cluster setup guide and simply leave out the Consul parts.
Create /etc/nomad.d/nomad.hcl on each of the three VMs:
datacenter = "dc1"
data_dir = "/home/xxx/nomad/data" #自己修改路径
plugin "raw_exec" {
config {
enabled = true
}
}
server {
enabled = true
bootstrap_expect = 2
server_join {
retry_join = ["192.168.60.10:4648","192.168.60.11:4648","192.168.60.12:4648"]
}
}
client {
enabled = true
servers = ["192.168.60.10:4647"]
#虚拟机1为此值,虚拟机2为192.168.60.11:4647,虚拟机3为192.168.60.12:4647
}
bootstrap_expect = 2 is the minimum number of servers the cluster expects; the actual number of servers can be greater than or equal to it.
When the number of live servers falls below bootstrap_expect, the cluster simply waits for recovery: tasks that are already running are left alone, and nothing is rescheduled.
Note, however, that bootstrap_expect reflects the number of servers actually doing work; only the clients attached to those servers will be assigned tasks.
To test high availability we need to shut one server down, so with three servers in total bootstrap_expect must be set to 2 (no higher); otherwise the cluster may not be able to keep operating.
The comments in the config file may need to be removed.
Run sudo nomad agent -config /etc/nomad.d on all three VMs.
When modifying the hcl config file, remember to delete the data directory first or change the data_dir path.
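As a rough sketch, a sequence for re-deploying a changed config might look like this (the data_dir path here is the placeholder from the config above; substitute your own):
sudo rm -rf /home/xxx/nomad/data     # wipe old state after changing the config
sudo nomad agent -config /etc/nomad.d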
Check the cluster state; the server on ubuntu1 is the leader:
ubuntu1@ubuntu1:~$ nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
ubuntu1.global 192.168.60.10 4648 alive true 2 1.1.3 dc1 global
ubuntu2.global 192.168.60.11 4648 alive false 2 1.1.3 dc1 global
ubuntu3.global 192.168.60.12 4648 alive false 2 1.1.3 dc1 global
ubuntu1@ubuntu1:~$ nomad node status
ID DC Name Class Drain Eligibility Status
f340253b dc1 ubuntu1 <none> false eligible ready
d7dc7cb2 dc1 ubuntu2 <none> false eligible ready
a9470c7f dc1 ubuntu3 <none> false eligible ready
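Besides nomad server members, the Raft peer set can also be inspected on any server node; nomad operator raft list-peers shows which servers are voters and which one currently holds leadership (output omitted here, as it varies by environment):
ubuntu1@ubuntu1:~$ nomad operator raft list-peers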
2. Testing driver = docker
Using the http-echo Docker image as an example, create httpecho.nomad:
job "docs" {
datacenters = ["dc1"]
group "example" {
count = 1
network {
port "http" {
static = "5678"
}
}
restart {
attempts = 1
interval = "30m"
delay = "2s"
mode = "fail"
}
reschedule {
attempts = 15
interval = "1h"
delay = "5s"
delay_function = "constant"
unlimited = false
}
task "server" {
driver = "docker"
config {
image = "hashicorp/http-echo"
ports = ["http"]
args = [
"-listen",
":5678",
"-text",
"hello world",
]
}
}
}
}
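Optionally, before submitting the job, nomad job plan can be used as a dry run to preview where the allocation would be placed; this step is not required for the test:
ubuntu1@ubuntu1:~$ nomad job plan httpecho.nomad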
Run nomad job run httpecho.nomad on ubuntu1; the result is as follows:
ubuntu1@ubuntu1:~$ nomad job run httpecho.nomad
==> 2021-08-27T13:51:14+08:00: Monitoring evaluation "ac5b793f"
2021-08-27T13:51:14+08:00: Evaluation triggered by job "docs"
==> 2021-08-27T13:51:15+08:00: Monitoring evaluation "ac5b793f"
2021-08-27T13:51:15+08:00: Evaluation within deployment: "540742e4"
2021-08-27T13:51:15+08:00: Allocation "c1c145f8" created: node "a9470c7f", group "example"
2021-08-27T13:51:15+08:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-27T13:51:15+08:00: Evaluation "ac5b793f" finished with status "complete"
==> 2021-08-27T13:51:15+08:00: Monitoring deployment "540742e4"
✓ Deployment "540742e4" successful
2021-08-27T13:51:42+08:00
ID = 540742e4
Job ID = docs
Job Version = 4
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
example 1 1 1 0 2021-08-27T14:01:40+08:00
Using sudo docker ps, we can see the job is running on the client on ubuntu1:
ubuntu1@ubuntu1:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0ea066a3b0bc hashicorp/http-echo "/http-echo -listen …" 53 seconds ago Up 52 seconds 192.168.60.10:5678->5678/tcp, 192.168.60.10:5678->5678/udp server-c1c145f8-f3f9-efcf-14b2-13651aa2c6e2
On ubuntu2/3 you can run http 192.168.60.10:5678 (if http is not installed, run sudo apt-get install httpie):
ubuntu2@ubuntu2:~$ http 192.168.60.10:5678
HTTP/1.1 200 OK
Content-Length: 12
Content-Type: text/plain; charset=utf-8
Date: Fri, 27 Aug 2021 05:54:36 GMT
X-App-Name: http-echo
X-App-Version: 0.2.3
hello world
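If you would rather not install httpie, curl (if installed) gives an equivalent check:
ubuntu2@ubuntu2:~$ curl http://192.168.60.10:5678
hello world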
Now we terminate the terminal on ubuntu1 that is running sudo nomad agent -config /etc/nomad.d. The status of ubuntu1 changes to left and ubuntu3 becomes the leader.
ubuntu2@ubuntu2:~$ nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
ubuntu1.global 192.168.60.10 4648 left false 2 1.1.3 dc1 global
ubuntu2.global 192.168.60.11 4648 alive false 2 1.1.3 dc1 global
ubuntu3.global 192.168.60.12 4648 alive true 2 1.1.3 dc1 global
Checking Docker on ubuntu1, the http-echo container is still there and still works, which shows that shutting down the server does not affect Docker containers already running on that client.
ubuntu1@ubuntu1:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0ea066a3b0bc hashicorp/http-echo "/http-echo -listen …" 13 minutes ago Up 13 minutes 192.168.60.10:5678->5678/tcp, 192.168.60.10:5678->5678/udp server-c1c145f8-f3f9-efcf-14b2-13651aa2c6e2
However, looking at Docker on the other two VMs, another http-echo container has been started on ubuntu2, while ubuntu3 is unchanged:
ubuntu2@ubuntu2:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7c228d09421 hashicorp/http-echo "/http-echo -listen …" 2 minutes ago Up 2 minutes 192.168.60.11:5678->5678/tcp, 192.168.60.11:5678->5678/udp server-f87ef72a-3ee0-b5c8-6f1e-0469e66d77b1
This shows that when the server corresponding to the client running a job is shut down, the job is started on another client. It also shows that whether a server is the leader has no necessary relation to whether its client is chosen to run the job.
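To confirm the reschedule from Nomad's own view rather than via docker ps, the job and allocation status can be queried on any surviving node (allocation IDs will differ between runs; c1c145f8 is the original allocation from the output above):
ubuntu2@ubuntu2:~$ nomad job status docs
ubuntu2@ubuntu2:~$ nomad alloc status c1c145f8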
Now we start the server on ubuntu1 again with sudo nomad agent -config /etc/nomad.d, and find that the http-echo container that was still running on ubuntu1 is terminated, while those on ubuntu2 and ubuntu3 are unchanged.
ubuntu1@ubuntu1:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ubuntu2@ubuntu2:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e7c228d09421 hashicorp/http-echo "/http-echo -listen …" 13 minutes ago Up 13 minutes 192.168.60.11:5678->5678/tcp, 192.168.60.11:5678->5678/udp server-f87ef72a-3ee0-b5c8-6f1e-0469e66d77b1
ubuntu3@ubuntu3:~$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
This shows that after the stopped server is brought back, the jobs keep running where they are; the cluster does not revert to the pre-shutdown placement, and the now-redundant job instance is shut down.
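Before moving on to the raw_exec test, the echo job can be cleaned up if desired:
ubuntu1@ubuntu1:~$ nomad job stop docs    # add -purge to also remove it from the job list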
3. Testing driver = raw_exec
Write a simple program, main.c:
#include <stdio.h>
#include <unistd.h> /* for sleep() */

int main()
{
    while (1) {
        sleep(10);
    }
    return 0;
}
Compile it with gcc main.c -o main.out and copy it to /usr with sudo cp main.out /usr/. Complete the above steps on all three VMs.
Create exec.nomad:
job "exec" {
datacenters = ["dc1"]
type = "batch"
group "execg" {
count = 1
restart {
attempts = 1
interval = "30m"
delay = "0s"
mode = "fail"
}
reschedule {
attempts = 15
interval = "1h"
delay = "5s"
delay_function = "constant"
unlimited = false
}
task "exect" {
driver = "raw_exec"
config {
command = "/usr/main.out"
}
}
}
}
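Since this job uses the raw_exec driver, it may be worth double-checking that the plugin enabled in nomad.hcl is actually detected on the nodes; nomad node status -verbose prints a driver status section (f340253b is ubuntu1's node ID from the earlier output):
ubuntu1@ubuntu1:~$ nomad node status -verbose f340253b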
Run nomad job run exec.nomad on ubuntu1; the result is as follows:
ubuntu1@ubuntu1:~$ nomad job run exec.nomad
==> 2021-08-30T16:08:45+08:00: Monitoring evaluation "562ed36f"
2021-08-30T16:08:45+08:00: Evaluation triggered by job "exec"
==> 2021-08-30T16:08:46+08:00: Monitoring evaluation "562ed36f"
2021-08-30T16:08:46+08:00: Allocation "b798e296" created: node "d7dc7cb2", group "execg"
2021-08-30T16:08:46+08:00: Evaluation status changed: "pending" -> "complete"
==> 2021-08-30T16:08:46+08:00: Evaluation "562ed36f" finished with status "complete"
Using ps -ef, we can see the job is running on the client on ubuntu2:
ubuntu2@ubuntu2:~$ ps -ef
...
root 41559 2 0 16:07 ? 00:00:00 [kworker/u256:1-]
root 41599 40421 0 16:08 pts/0 00:00:00 /usr/bin/nomad logmon
root 41608 40421 0 16:08 ? 00:00:00 /usr/bin/nomad executor {"LogF
root 41616 41608 0 16:08 ? 00:00:00 /usr/main.out
ubuntu2 41633 40566 0 16:09 pts/3 00:00:00 ps -ef
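The same placement can also be confirmed from the server side; nomad job status lists the allocation and the node it landed on (output omitted, IDs vary per run):
ubuntu1@ubuntu1:~$ nomad job status exec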
Now we terminate the terminal on ubuntu2 that is running sudo nomad agent -config /etc/nomad.d, and the status of ubuntu2 changes to left.
ubuntu1@ubuntu1:~$ nomad server members
Name Address Port Status Leader Protocol Build Datacenter Region
ubuntu1.global 192.168.60.10 4648 alive false 2 1.1.3 dc1 global
ubuntu2.global 192.168.60.11 4648 left false 2 1.1.3 dc1 global
ubuntu3.global 192.168.60.12 4648 alive true 2 1.1.3 dc1 global
Checking the processes on ubuntu2, the main.out process is still there, which shows that shutting down the server does not affect raw_exec tasks already running on that client.
However, looking at the processes on the other two VMs, another main.out process has been started on ubuntu1, while ubuntu3 is unchanged:
ubuntu1@ubuntu1:~$ ps -ef
root 85269 75952 0 16:12 pts/1 00:00:00 /usr/bin/nomad logmon
root 85278 75952 0 16:12 ? 00:00:00 /usr/bin/nomad executor {"LogF
root 85286 85278 0 16:12 ? 00:00:00 /usr/main.out
root 85460 2 0 16:12 ? 00:00:00 [kworker/u256:1-]
ubuntu1 85576 76826 0 16:13 pts/3 00:00:00 ps -ef
This shows that when the server corresponding to the client running a task is shut down, the task is started on another client.
Now we start the server on ubuntu2 again with sudo nomad agent -config /etc/nomad.d, and find that the main.out process that was still running on ubuntu2 is terminated, while those on ubuntu1 and ubuntu3 are unchanged.
This shows that after the stopped server is brought back, the tasks keep running where they are; the cluster does not revert to the pre-shutdown placement, and the now-redundant task is shut down.