1. Requirements

In the living room at home I have three Linux hosts running virtual machines, and one Windows host running WSL2 (Ubuntu 20.04).

They are:

Hardware | Network | OS | IP | VMs
EUC i5 7250U 16G | Wi-Fi | Win10 | 10.0.1.223 | wsl2 - random IP
MoreFine S500 R7 5800H 64G | Ethernet | Zorin OS 16.2 (Ubuntu 20.04 LTS) | 10.0.1.198 | vm1 - 10.0.1.156, vm2 - 10.0.1.157, vm3 - 10.0.1.158
新创云 i7 5500U 8G | Ethernet | Ubuntu server 22.04 LTS | 10.0.1.107 | vm0 - 10.0.1.151, vm1 - 10.0.1.152
ThinkPad X230 i5 3210M 16G | Ethernet | Ubuntu server 22.04 LTS | 10.0.1.122 | vm0 - 10.0.1.154

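Keeping four servers on permanently is not economical. Most services are configured in WSL2, that is, on the EUC i5; the other three Linux servers are only there to run K8s projects and should normally stay powered off.

Electricity here costs 0.61 yuan per kWh. Assuming each Linux server draws 40 W at idle, the daily cost is 0.04 kW * 24 h * 3 * 0.61 = 1.76 yuan, or about 641 yuan per year!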
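
The estimate above can be re-run for other wattages or prices. This is a minimal sketch; the function name standby_cost is my own, and the 40 W idle draw is the assumption stated in the text, not a measured value.

```shell
#!/bin/sh
# Standby cost for a group of hosts.
# usage: standby_cost <watts-per-host> <host-count> <price-per-kWh>
standby_cost() {
  awk -v w="$1" -v n="$2" -v p="$3" 'BEGIN {
    kwh_day = w / 1000 * 24 * n   # energy used per day, in kWh
    printf "%.2f kWh/day, %.2f yuan/day, %.0f yuan/year\n",
           kwh_day, kwh_day * p, kwh_day * p * 365
  }'
}

standby_cost 40 3 0.61
# prints: 2.88 kWh/day, 1.76 yuan/day, 641 yuan/year
```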

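So the requirement is:

Create two jobs in the Jenkins instance running on WSL2, one to power on and one to shut down the other three Linux hosts. Before a host is shut down, all of its running VMs must be stopped first; after it is powered on, all of its VMs must be started.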



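2. Prerequisites

Remote wake-up (power-on) has two prerequisites:

  1. The host must be connected by Ethernet cable; wake-up does not work over Wi-Fi.

  2. The host that triggers the wake-up and the host being woken must be on the same subnet. This causes a problem: my Jenkins jobs run inside WSL2, and WSL2's IP cannot be on the same subnet as the Win10 host (normal access requires port forwarding). So we cannot wake the other hosts directly from WSL2; the wake-up command has to be executed by the Win10 system.

    In other words: client -> WSL2 Jenkins -> Win10 system -> wake the other Linux hosts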
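
The same-subnet rule follows from how Wake-on-LAN works: the sender broadcasts a "magic packet" on the local network, and broadcasts are normally not routed across subnets. The payload is 6 bytes of 0xff followed by the target MAC repeated 16 times. The sketch below only builds that payload as a hex string so the layout is visible; it does not send anything, and the function name magic_packet_hex is my own.

```shell
#!/bin/sh
# Print the Wake-on-LAN magic packet for a MAC address as a hex string:
# 6 bytes of 0xff, then the MAC repeated 16 times (102 bytes in total).
magic_packet_hex() {
  # strip separators and normalise to lower case
  mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
  packet="ffffffffffff"
  i=0
  while [ "$i" -lt 16 ]; do
    packet="$packet$mac"
    i=$((i + 1))
  done
  printf '%s\n' "$packet"
}

# Show the sync bytes and the first two MAC repetitions.
magic_packet_hex "70:70:fc:00:85:5b" | cut -c1-36
# prints: ffffffffffff7070fc00855b7070fc00855b
```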



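3. Check each Linux host's wired NIC settings, MAC address, and whether Wake-on-LAN is enabled

First, determine which wired interface the Linux host is using, and its MAC address: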

ifconfig
cat /etc/netplan/00-installer-config.yaml 

Then check whether Wake-on-LAN is enabled on that wired NIC:

gateman@MoreFine-S500:~$ sudo ethtool eno1
Settings for eno1:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	                        2500baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Full 
	                        2500baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Link partner advertised link modes:  10baseT/Half 10baseT/Full 
	                                     100baseT/Half 100baseT/Full 
	                                     1000baseT/Full 
	Link partner advertised pause frame use: Symmetric Receive-only
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported
	Speed: 1000Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	MDI-X: Unknown
	Supports Wake-on: pumbg
	Wake-on: d
	Link detected: yes

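Note that ethtool must be run as root (hence sudo); otherwise the Wake-on-LAN settings are not shown.
Supports Wake-on: pumbg --> if the letter g is present, this NIC supports wake-up by magic packet; otherwise Wake-on-LAN has to be enabled in the motherboard BIOS first.
Wake-on: d -> d means disabled, g means enabled.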
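
With several hosts to check, reading those two lines by eye gets tedious. Below is a small sketch that applies the rules just described to ethtool output read from stdin. The function name wol_status and its three result words are my own choices, not anything ethtool provides.

```shell
#!/bin/sh
# Read `ethtool <iface>` output on stdin and report the Wake-on-LAN state.
wol_status() {
  awk '
    $1 == "Supports" && $2 == "Wake-on:" { supports = $3 }
    $1 == "Wake-on:"                     { current = $2 }
    END {
      if (supports !~ /g/)    print "unsupported"   # no magic-packet support
      else if (current ~ /g/) print "enabled"
      else                    print "disabled"
    }'
}

# Sample lines copied from the output above; on a real host use:
#   sudo ethtool eno1 | wol_status
printf '\tSupports Wake-on: pumbg\n\tWake-on: d\n' | wol_status
# prints: disabled
```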



4. Enable Wake-on-LAN on the NIC

gateman@MoreFine-S500:~$ sudo ethtool -s eno1 wol g

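Note that the command above only enables wol g temporarily; after a reboot it will most likely revert to disabled.
So we have to change the NIC configuration so that wol g is set at every boot:
add wakeonlan: true to the netplan configuration file.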

gateman@MoreFine-S500:~$ sudo vi /etc/netplan/00-installer-config.yaml 
# it is the network config written by 'subiquity'
network:
  ethernets:
    eno1:
      dhcp4: false
      wakeonlan: true
    enp2s0:
      dhcp4: false
    wlp3s0:
      dhcp4: false
  bridges:
    br0:
      interfaces: [eno1]
      dhcp4: no
      addresses: [10.0.1.198/24]
      routes:
        - to: default
          via: 10.0.1.1
      nameservers:
        addresses: [119.29.29.29, 8.8.8.8]
  version: 2

Remember to apply the configuration:

sudo netplan apply



5. Install the wake-up tool Fing

After step 4, we should be able to wake a host by running the following command from another Linux host on the same subnet:

wakeonlan xx:xx:xx:xx:xx:xx # the NIC's MAC address

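Strangely, the command above could not wake my X230; it presumably needs some additional parameter.

However, there is a very handy network tool called Fing (basic features are free).
After installing it, I found that the X230 could be woken.

More importantly, Fing provides command-line tools (CLI) for both Windows and Linux, which makes integration easy.

URL:
https://www.fing.com/products/development-toolkit

Because the wake-up actually has to be issued from the Windows machine (WSL2 is not on the same subnet), download the Windows version.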
After installation, the following command can be used to test waking a machine:

fing --wol 70:70:fc:00:85:5b@10.0.1.198/24  # both the MAC address and the IP must be given



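6. Set up the Ansible host list

In fact, by the end of step 5 remote power-on and shutdown already work;
the remaining steps only set up one power-on job and one shutdown job in Jenkins.

My Jenkins runs on WSL2. First, put the IPs of the three physical Linux hosts into one group: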

vi /etc/ansible/hosts
[physical_servers]
10.0.1.107
10.0.1.122
10.0.1.198

Of course, do not forget to install the WSL2 SSH key on all three servers:

ssh-copy-id -i ~/.ssh/id_rsa.pub gateman@x230

After that, Ansible can operate on the whole group, with no need to handle each host separately.



7. Shutdown job – write the shell script and Ansible playbook that shut down all KVM VMs on a host

Shell script:

gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat shutdown_all_vms.sh
#!/bin/bash
echo "=== shutting down all kvm vms ==="
# list only the running domains, one name per line
for i in $(virsh list --name --state-running)
do
  echo "shutting down $i"
  virsh shutdown "$i"
done
sleep 10 # sleep 10 seconds
virsh list --all
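
A fixed 10-second sleep may be too short for a guest that shuts down slowly. One alternative is to poll until no domain is running. The sketch below assumes virsh is on the PATH and that LIBVIRT_DEFAULT_URI is set as in the playbook; the function name wait_for_vms_off and the default timeouts are my own.

```shell
#!/bin/sh
# Poll until no domain is running or the timeout (seconds) expires.
# usage: wait_for_vms_off [timeout-seconds] [poll-interval-seconds]
# Returns 0 when all VMs are off, 1 on timeout.
wait_for_vms_off() {
  timeout=${1:-120}
  interval=${2:-5}
  waited=0
  while [ -n "$(virsh list --name --state-running)" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out; still running:" >&2
      virsh list --name --state-running >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "all vms are shut off"
}

# Usage inside the script, in place of `sleep 10`:
#   wait_for_vms_off 120 5
```

A non-zero return value could be used to stop the job before the physical host is powered off.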

Playbook:
It takes just two steps: 1. upload the script, 2. run the script.

gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat shutdown_all_vms.yml
---
- hosts: "{{servers}}"
  remote_user: "{{ansible_user}}"
  gather_facts: false
  tasks:
  - name: print debug msg for parameters
    debug:
      msg:
        - "servers is: {{servers}}"
        - "ansible_user is: {{ansible_user}}"

  - name: copy shutdown vm shell script to remote server
    copy:
      src: ./shutdown_all_vms.sh
      dest: /tmp
      backup: no
      mode: 0775

  - name: execute the shutdown vm script
    script: ./shutdown_all_vms.sh
    args:
      chdir: /tmp
    environment: # https://stackoverflow.com/questions/59522902/vms-are-not-visible-to-virsh-command-executed-using-ansible-shell-task
      LIBVIRT_DEFAULT_URI: qemu:///system
    register: cmdresult

  - name: show stdout cmdresult
    debug:
      msg: "{{ cmdresult.stdout }}"

  - name: show stderr cmdresult
    debug:
      msg: "{{ cmdresult.stderr }}"



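8. Shutdown job – configure passwordless sudo

The shutdown command requires root.
There are two options. One is to run Ansible as root, which means installing the SSH key under the root account of all three hosts; that is too dangerous.
The other is to let a regular user run sudo without a password (provided the user is in the sudo group).

Here I choose the second option.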

sudo visudo   # edit the following line in /etc/sudoers
# Allow members of group sudo to execute any command
%sudo   ALL=(ALL:ALL) NOPASSWD:ALL

Now users in the sudo group can shut the machine down with sudo shutdown -h now.
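
If granting NOPASSWD for every command feels too broad, a narrower rule limited to the shutdown binary is one possible alternative. This is only a sketch: the function name shutdown_only_rule, the drop-in file name, and the /usr/sbin/shutdown path are assumptions, so check the path with `command -v shutdown` on each host.

```shell
#!/bin/sh
# Print a sudoers rule that lets one user run only the shutdown binary
# without a password. The default binary path is an assumption.
# usage: shutdown_only_rule <user> [path-to-shutdown]
shutdown_only_rule() {
  user=$1
  bin=${2:-/usr/sbin/shutdown}
  printf '%s ALL=(root) NOPASSWD: %s\n' "$user" "$bin"
}

shutdown_only_rule gateman
# prints: gateman ALL=(root) NOPASSWD: /usr/sbin/shutdown

# To install it (hypothetical file name), write a drop-in and validate it:
#   shutdown_only_rule gateman | sudo tee /etc/sudoers.d/90-shutdown
#   sudo chmod 0440 /etc/sudoers.d/90-shutdown
#   sudo visudo -cf /etc/sudoers.d/90-shutdown
```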



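9. Shutdown job – write the Ansible playbook that shuts down the physical hosts

It simply runs sudo shutdown -h now. Remember to add ignore_unreachable: true; otherwise Ansible treats the run as failed, because the host drops the connection while it shuts down.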

gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat shutdown_server.yml 
---
- hosts: "{{servers}}"
  remote_user: "{{ansible_user}}"
  gather_facts: false
  tasks:
  - name: print debug msg for parameters
    debug:
      msg:
        - "servers is: {{servers}}"
        - "ansible_user is: {{ansible_user}}"


  - name: execute the shutdown command on the physical server
    shell: sudo shutdown -h now
    ignore_errors: true
    ignore_unreachable: true # this will work for shutdown cases



10. Shutdown job – write the Jenkins jobs that shut down the hosts

First, write a common job that runs a given Ansible playbook:

pipeline {

    agent { node { label 'master' } }

    stages {
     
        stage('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
                echo "playbook_path is ${playbook_path}"
            }
        }

       stage('run playbook'){
            steps {
                script {
                   sh "ansible-playbook -e \"servers=${servers} ansible_user=${ansible_user}\" -vv ${playbook_path}"
                }
            }
        }
    }
    
    post {
       
       failure {
           emailext (
               subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
               body: """<p>FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':</p>
                   <p>Check console output at "<a href="${env.BUILD_URL}">${env.JOB_NAME} [${env.BUILD_NUMBER}]</a>"</p>""",
               to: "nvd11@163.com",
               from: "nvd11@163.com"
           )
       }        
    }
}

Next, write the shutdown job.
It has just two steps: first run the first playbook to shut down all the VMs, then shut down the physical machines.

def shut_vm_plybk = "/opt/apps/playbooks/remoteserver/shutdown_all_vms.yml"
def shut_server_plybk = "/opt/apps/playbooks/remoteserver/shutdown_server.yml"

pipeline {

    agent { node { label 'master' } }

    stages {

        stage ('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
            }
        }

        stage ('shutdown all vms in the servers') {
            steps {
                build job: 'common_run_playbook', 
                    parameters: [[$class: 'StringParameterValue', name: 'servers', value: "${servers}"],
                                 [$class: 'StringParameterValue', name: 'ansible_user', value: "${ansible_user}"],
                                 [$class: 'StringParameterValue', name: 'playbook_path', value: "${shut_vm_plybk}"]
                                ]
            }
        }
        
        stage ('shutdown physical server') {
            steps {
                build job: 'common_run_playbook', 
                    parameters: [[$class: 'StringParameterValue', name: 'servers', value: "${servers}"],
                                 [$class: 'StringParameterValue', name: 'ansible_user', value: "${ansible_user}"],
                                 [$class: 'StringParameterValue', name: 'playbook_path', value: "${shut_server_plybk}"]
                                ]
            }
        }
        
    }
    
    post {
       success {
           emailext (
               subject: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
               body: """<p>SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':</p>
                   <p>Check console output at "<a href="${env.BUILD_URL}">${env.JOB_NAME} [${env.BUILD_NUMBER}]</a>"</p>""",
               to: "nvd11@163.com",
               from: "nvd11@163.com"
           )
       }
       
       failure {
           emailext (
               subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
               body: """<p>FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':</p>
                   <p>Check console output at "<a href="${env.BUILD_URL}">${env.JOB_NAME} [${env.BUILD_NUMBER}]</a>"</p>""",
               to: "nvd11@163.com",
               from: "nvd11@163.com"
           )
       }        
    }
    
}



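11. Power-on job – write the Jenkins script that wakes the physical machines remotely

Next comes the power-on job.
Before the remote wake-up the servers are offline, so Ansible cannot be used to manage them.

This job therefore does three things:

  1. Use the ansible --list-hosts command to get the list of IPs in the physical_servers group defined in /etc/ansible/hosts
  2. Look up the MAC address for each IP (by defining a map of MAC addresses)
  3. Run fing.exe on the Windows side to wake each IP (in a loop)
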
def playbook_path="/opt/apps/playbooks"
def ip_list_str = ""
def ip_list = []
def mac_map = ["10.0.1.107":"00:e0:0a:f2:12:26",
               "10.0.1.122":"3c:97:0e:59:14:87",
               "10.0.1.198":"70:70:fc:00:85:5b", // MAC used in the fing test in step 5
              ]
def start_vm_plybk = "/opt/apps/playbooks/remoteserver/startup_all_vms.yml"


pipeline {

    agent { node { label 'master' } }

    stages {

       stage('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
            }
        }

        stage('use ansible --list host command to get the ip list'){
            steps {
                
                script {
                   ip_list_str = sh(returnStdout: true, script: 'ansible physical_servers --list-hosts | grep -v hosts')
                   ip_list = ip_list_str.split('\n') as List
                }
                echo "ip_list_str = ${ip_list_str}"
                 
                println ip_list
            }
        }
      
       stage('loop the ip list to start them'){
            steps {
                script {
                   for(ip in ip_list){
                       ip = ip.trim()
                       echo "get a ip --> ${ip}"
                       
                       def mac_addr = mac_map[ip]
                       echo "mac address --> ${mac_addr}"
                       
                       sh " \"/mnt/d/Program Files (x86)/Fing/bin/fing.exe\" --wol ${mac_addr}@${ip}/24"
                   }
                   
                   sleep(90) // the X230 boots slowly, so wait a minute and a half before running the VM startup commands
                   
                }
              
            }
        }
 ...
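
The three steps can also be tried outside Jenkins. The sketch below does the same lookup in plain shell and only prints the fing commands instead of running them, which is handy as a dry run. The function names are my own; the IP and MAC values are the ones that appear in this article.

```shell
#!/bin/sh
# Step 2: static IP -> MAC table (addresses taken from this article).
mac_for_ip() {
  case "$1" in
    10.0.1.107) echo "00:e0:0a:f2:12:26" ;;
    10.0.1.122) echo "3c:97:0e:59:14:87" ;;
    10.0.1.198) echo "70:70:fc:00:85:5b" ;;
    *) return 1 ;;
  esac
}

# Steps 1 and 3: read `ansible <group> --list-hosts` output on stdin and
# print one fing command per host; unknown IPs are reported and skipped.
wol_commands() {
  grep -v 'hosts' | while read -r ip; do
    [ -n "$ip" ] || continue
    if mac=$(mac_for_ip "$ip"); then
      echo "fing --wol $mac@$ip/24"
    else
      echo "no MAC known for $ip, skipping" >&2
    fi
  done
}

# Sample input in the shape ansible prints; on the Jenkins host use:
#   ansible physical_servers --list-hosts | wol_commands
printf '  hosts (2):\n    10.0.1.107\n    10.0.1.9\n' | wol_commands
```

Skipping unknown IPs avoids calling fing with an empty MAC when a host is added to the Ansible group but not to the table.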



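12. Power-on job – write the shell script and Ansible playbook that start all KVM VMs on a host

Once the hosts are powered on, Ansible can be used again.
This step is very similar to step 7.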
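
Ansible can only reach a host once it has finished booting. As an alternative to a fixed wait, a retry loop can poll until SSH answers. This is a generic sketch; the function name retry and the attempt counts are my own choices.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: wait up to roughly 3 minutes for SSH on the X230.
#   retry 36 5 ssh -o ConnectTimeout=3 -o BatchMode=yes gateman@10.0.1.122 true
retry 3 0 true && echo "reachable"
# prints: reachable
```

Ansible also ships a wait_for_connection module, which may be a better fit if the waiting is done inside the playbook itself.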

Shell script:

gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat startup_all_vms.sh
#!/bin/bash
echo "=== starting all kvm vms that contain the word vm ==="
for i in $(virsh list --all | grep vm | awk '{print $2}')
  do virsh start $i
done
sleep 10
virsh list --all

playbook:

gateman@DESKTOP-UIU9RFJ:/opt/apps/playbooks/remoteserver$ cat startup_all_vms.yml 
---
- hosts: "{{servers}}"
  remote_user: "{{ansible_user}}"
  gather_facts: false
  tasks:
  - name: print debug msg for parameters
    debug:
      msg:
        - "servers is: {{servers}}"
        - "ansible_user is: {{ansible_user}}"

  - name: copy startup vm shell script to remote server
    copy:
      src: ./startup_all_vms.sh
      dest: /tmp
      backup: no
      mode: 0775

  - name: execute the startup vm script
    script: ./startup_all_vms.sh
    args:
      chdir: /tmp
    environment: # https://stackoverflow.com/questions/59522902/vms-are-not-visible-to-virsh-command-executed-using-ansible-shell-task
      LIBVIRT_DEFAULT_URI: qemu:///system
    register: cmdresult

  - name: show stdout cmdresult
    debug:
      msg: "{{ cmdresult.stdout }}"

  - name: show stderr cmdresult
    debug:
      msg: "{{ cmdresult.stderr }}"

13. Power-on job – finish the Jenkins script that wakes the physical machines and starts the VMs on them

This simply completes the job from step 11:

def playbook_path="/opt/apps/playbooks"
def ip_list_str = ""
def ip_list = []
def mac_map = ["10.0.1.107":"00:e0:0a:f2:12:26",
               "10.0.1.122":"3c:97:0e:59:14:87",
               "10.0.1.198":"70:70:fc:00:85:5b", // MAC used in the fing test in step 5
              ]
def start_vm_plybk = "/opt/apps/playbooks/remoteserver/startup_all_vms.yml"


pipeline {

    agent { node { label 'master' } }

    stages {

       stage('display parameters') {
            steps {
                echo "servers is ${servers}"
                echo "ansible_user is ${ansible_user}"
            }
        }

        stage('use ansible --list host command to get the ip list'){
            steps {
                
                script {
                   ip_list_str = sh(returnStdout: true, script: 'ansible physical_servers --list-hosts | grep -v hosts')
                   ip_list = ip_list_str.split('\n') as List
                }
                echo "ip_list_str = ${ip_list_str}"
                 
                println ip_list
            }
        }
        
        stage('loop the ip list to start them'){
            steps {
                script {
                   for(ip in ip_list){
                       ip = ip.trim()
                       echo "get a ip --> ${ip}"
                       
                       def mac_addr = mac_map[ip]
                       echo "mac address --> ${mac_addr}"
                       
                       sh " \"/mnt/d/Program Files (x86)/Fing/bin/fing.exe\" --wol ${mac_addr}@${ip}/24"
                   }
                   
                   sleep(90)
                   
                }
              
            }
        }
        
        
        stage ('startup all vms in the servers') {
            steps {
                build job: 'common_run_playbook', 
                    parameters: [[$class: 'StringParameterValue', name: 'servers', value: "${servers}"],
                                 [$class: 'StringParameterValue', name: 'ansible_user', value: "${ansible_user}"],
                                 [$class: 'StringParameterValue', name: 'playbook_path', value: "${start_vm_plybk}"]
                                ]
            }
        }


        stage('completed') {
            steps {
                 println 'build is completed'
            }
        }
        
        
        
    }
    
    post {
       success {
           emailext (
               subject: "SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
               body: """<p>SUCCESSFUL: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':</p>
                   <p>Check console output at "<a href="${env.BUILD_URL}">${env.JOB_NAME} [${env.BUILD_NUMBER}]</a>"</p>""",
               to: "nvd11@163.com",
               from: "nvd11@163.com"
           )
       }
       
       failure {
           emailext (
               subject: "FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'",
               body: """<p>FAILED: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]':</p>
                   <p>Check console output at "<a href="${env.BUILD_URL}">${env.JOB_NAME} [${env.BUILD_NUMBER}]</a>"</p>""",
               to: "nvd11@163.com",
               from: "nvd11@163.com"
           )
       }        
    }
    
}

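At this point everything is done.
We now have two jobs for powering on and shutting down! To power the machines on or off, just open the Jenkins web page from the computer in my room and run the corresponding job; no more walking over to the rack to press power buttons by hand.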

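How can we quickly verify whether the physical machines and VMs have been shut down or started?
Two ways:
1. Use KVM's Virtual Machine Manager; unfortunately this KVM client has no Windows version.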

2. A better way? Use Grafana + Prometheus, of course.
Reference:
https://blog.csdn.net/nvd11/article/details/128030197
