
Docker Swarm in Practice

January 28, 2019 • Application Server

An old system of ours is being rebuilt. Its current runtime environment is jdk1.7 + resin-4.0.47 + activemq + redis + mysql, and that's it. It is about to go onto a test environment and a batch of servers has been bought. The base services needed include, but are not limited to, elasticsearch, logstash, kibana, filebeat, zookeeper, activemq, mongodb, redis and MySQL. Tomcat or resin is no longer needed: the project now uses Spring Boot, so the jar is run directly. I don't know much about that side of things; the test environment never gets monitoring, so zabbix is set aside for now. What's left is to build the runtime environment, so I went straight to Docker Swarm.

Preparation

docker-ce has already been installed everywhere, done in bulk with ansible, so I won't go over that. The base services needed are: an elasticsearch cluster, elasticsearch-head, logstash, kibana, a kafka cluster, a zookeeper cluster, activemq, mongodb, a redis cluster and MySQL.

The test environment can be a bit loose, convenience first, so everything goes into containers. Enable swarm and add the servers that will run the base services to the cluster.

Initializing the swarm

[root@docker-manager ~]# docker swarm init --advertise-addr 172.24.90.38
Swarm initialized: current node (bh950ji26br60or076cmwvmu3) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Joining the cluster

[root@docker-manager ~]# docker swarm join-token worker 
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377

[root@docker-manager ~]# ansible service -m shell -a "docker swarm join --token SWMTKN-1-4uk84scrsf1e0zbwy8mdt9ruub02ojmaeqe1z2igdj3hxv5k76-7yqzw1pw9s8j10qx9utz9gwce 172.24.90.38:2377"

All nodes have joined now. One side note: this swarm has no high availability yet, there is only a single manager. For production at least three managers are recommended, i.e. one Leader and two Reachable. Now let's create the services.
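Adding managers later is just a promotion away; a minimal sketch, assuming worker-1 and worker-2 are existing node names:

[root@docker-manager ~]# docker node promote worker-1 worker-2
[root@docker-manager ~]# docker node ls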

Creating the MySQL service

Configuration file

I create all of these services with yml files, i.e. as stacks; writing config files for these services just feels cleaner. I can't handle K8S fluently yet, so swarm it is for now. Also, my placement constraints pin services by hostname, which I don't really recommend: if the pinned server goes down the task throws a no suitable node error and waits forever for that server to come back. For single-instance services there isn't much choice, but for servers running many services it is better to use labels; do what suits you. MySQL is the simple one, as follows.
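For reference, a small sketch of the label approach (the node name and label value here are only examples; the constraint line would replace the hostname one in the yml):

[root@docker-manager ~]# docker node update --label-add regin=mysql worker-1
# then in the stack file: constraints: [node.labels.regin == mysql]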

[root@docker-manager /swarm/mysql]#  cat mysql.yml
version: '3.7'

services:
  mysql:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/mysql:5.7
    hostname: mysql
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == mysql]
    ports:
      - 3306:3306
    environment:
      MYSQL_ROOT_PASSWORD: passwd
    volumes:
      - /data/mysql:/var/lib/mysql
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

Creating the network

I referenced a network named recharge which doesn't exist yet, so create it first; after that the service can be started.

[root@docker-manager ~]# docker network create --driver overlay --subnet 13.14.15.0/24 --ip-range 13.14.15.0/24 --gateway 13.14.15.1 recharge
jnptx5xp3jhmcn8uw4owrzqv9

Creating the service

Remember to create the data directory.

[root@docker-manager /swarm/mysql]# ansible mysql -m file -a "path=/data/mysql state=directory"
[root@docker-manager /swarm/mysql]# docker stack deploy -c mysql.yml --with-registry-auth mysql
[root@docker-manager /swarm/mysql]# docker stack ps mysql

To verify it, just check whether the data directory has anything in it; if it does, the service started up properly.

[root@docker-manager /swarm/mysql]# ansible mysql -m shell -a "ls /data/mysql/"
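A more direct check, as a sketch, assuming a mysql client is installed on the node (the port is published on every swarm node through the routing mesh):

[root@docker-manager /swarm/mysql]# mysql -h 127.0.0.1 -P 3306 -uroot -ppasswd -e "SELECT VERSION();"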

No problems there, moving on. ActiveMQ is next.

Creating the ActiveMQ service

First, the Dockerfile.

Dockerfile

FROM webcenter/activemq
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone && \
     sed -i '/1G/d' /opt/activemq/bin/env

The sed line deletes a variable named ACTIVEMQ_OPTS_MEMORY from that script. It defines the broker's startup memory and defaults to 1G; it is now supplied as an environment variable instead, so adjust it yourself. Also, when we ran a concurrency test in production and ten thousand requests came in at once, MQ simply collapsed, so I'd recommend tweaking the config file as well: things like the maximum connection count and which transportConnectors actually need to be started. We only use 61616 plus the management port; the 6161{3..4} connectors aren't used at all, so I deleted them from the config file and the problem went away.

Configuration file

[root@docker-manager /swarm/activemq]# cat activemq.yml 
version: '3.7'

services:
  activemq:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3
    hostname: activemq
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == activemq]
    ports:
      - 8161:8161
      - 61616:61616
    environment:
      ACTIVEMQ_OPTS_MEMORY: -Xms2048M -Xmx2048M
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

Creating the service

[root@docker-manager /swarm/activemq]# docker stack deploy -c activemq.yml --with-registry-auth activemq
[root@docker-manager /swarm/activemq]# docker stack ps activemq 
ID                  NAME                  IMAGE                                                     NODE                DESIRED STATE       CURRENT STATE             ERROR               PORTS
jcmhuzekzm75        activemq_activemq.1   registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3   activemq            Running             Preparing 8 seconds ago                       

How to confirm this one was created successfully? Just hit port 8161, the web management port.

[root@docker-manager /swarm/activemq]# curl -I -u admin:admin 127.0.0.1:8161
HTTP/1.1 200 OK
Date: Mon, 21 Jan 2019 08:35:41 GMT
X-FRAME-OPTIONS: SAMEORIGIN
Content-Type: text/html
Content-Length: 6047
Server: Jetty(9.2.13.v20150730)

No problems there. MongoDB was originally next, but the ActiveMQ cluster section below was added later, so it comes first.

Creating the ActiveMQ cluster

This part was added later. MQ had been running as a single node the whole time and never had a problem, but it still made me a bit nervous, so I recently added an MQ cluster; for the general approach, see here. It is not a pseudo-cluster: I put it straight into containers, seven in total, three forming a zookeeper cluster and three forming the activemq cluster (one master, two slaves), plus an nginx layer-4 proxy. Only the key parts of the configuration are shown.

ActiveMQ cluster configuration

Dockerfile
Built on top of my earlier image.

FROM registry.cn-beijing.aliyuncs.com/rj-bai/activemq:5.14.3
COPY activemq.xml /opt/activemq/conf
COPY run.sh /app/

activemq.xml

        <persistenceAdapter>
                <replicatedLevelDB
                directory="${activemq.data}/leveldb"
                replicas="3"
                bind="tcp://0.0.0.0:22181"
                zkAddress="mqzoo1:2181,mqzoo2:2181,mqzoo3:2181"
                zkPath="/zookeeper/leveldb-stores"
                hostname="sedhostname"
                />
        </persistenceAdapter>

run.sh

#!/bin/sh
sed -i s#sedhostname#$HOSTNAME#g /opt/activemq/conf/activemq.xml

python /app/entrypoint/Init.py
exec /usr/bin/supervisord -n -c /etc/supervisor/supervisord.conf

Service configuration file

version: '3.7'

services:
  activemq1:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:cluster-5.14.3
    hostname: activemq1
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == activemq001]
    environment:
      ACTIVEMQ_OPTS_MEMORY: -Xms1024M -Xmx1024M
    networks:
      - rj-bai

  activemq2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:cluster-5.14.3
    hostname: activemq2
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == activemq002]
    environment:
      ACTIVEMQ_OPTS_MEMORY: -Xms1024M -Xmx1024M
    networks:
      - rj-bai

  activemq3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/activemq:cluster-5.14.3
    hostname: activemq3
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == activemq003]
    environment:
      ACTIVEMQ_OPTS_MEMORY: -Xms1024M -Xmx1024M
    networks:
      - rj-bai

networks: 
  rj-bai: 
    external: true
    name: rj-bai

zookeeper for activemq

version: '3.7'

services:
  mqzoo1:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: mqzoo1
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == activemq001]
    environment:
      ZOO_MY_ID: 1
      JVMFLAGS: -Xms512m -Xmx512m
      ZOO_SERVERS: server.1=mqzoo1:2888:3888 server.2=mqzoo2:2888:3888 server.3=mqzoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - rj-bai

  mqzoo2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: mqzoo2
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == activemq002]
    environment:
      ZOO_MY_ID: 2
      JVMFLAGS: -Xms512m -Xmx512m
      ZOO_SERVERS: server.1=mqzoo1:2888:3888 server.2=mqzoo2:2888:3888 server.3=mqzoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - rj-bai

  mqzoo3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: mqzoo3
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == activemq003]
    environment:
      ZOO_MY_ID: 3
      JVMFLAGS: -Xms512m -Xmx512m
      ZOO_SERVERS: server.1=mqzoo1:2888:3888 server.2=mqzoo2:2888:3888 server.3=mqzoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - rj-bai

networks: 
  rj-bai: 
    external: true
    name: rj-bai

MQ proxy service

nginx serves as a layer-4 proxy; the config file is below.

user  nginx;
worker_processes  auto;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
    worker_connections  1024;
}

  stream {
    log_format  main  '$remote_addr - [$time_local] ' '$status ' '$upstream_addr';
    access_log  /var/log/nginx/access.log  main;

    upstream activemq {
      server activemq1:61616 max_fails=1 fail_timeout=1s;
      server activemq2:61616 max_fails=1 fail_timeout=1s;
      server activemq3:61616 max_fails=1 fail_timeout=1s;
 }

  server {
    listen 61616;
    proxy_pass activemq;
    }

    upstream admin {
      server activemq1:8161 max_fails=1 fail_timeout=1s;
      server activemq2:8161 max_fails=1 fail_timeout=1s;
      server activemq3:8161 max_fails=1 fail_timeout=1s;
 }

  server {
    listen 8161;
    proxy_pass admin;
  }
}

Configuration file

version: '3.7'

services:
  activemq:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/nginx:1.15.9-mq
    hostname: nginx
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      endpoint_mode: vip
    ports:
      - 8161:8161
      - 61616:61616
    networks:
      - rj-bai

networks: 
  rj-bai: 
    external: true
    name: rj-bai

So when the application connects to mq, it just connects to this proxy. The creation order is zookeeper → mq cluster → mq proxy, and that's it.
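As a sketch of that order (the yml file names and stack names here are my own placeholders):

[root@docker-manager /swarm/activemq-cluster]# docker stack deploy -c mq-zookeeper.yml --with-registry-auth mq-zookeeper
[root@docker-manager /swarm/activemq-cluster]# docker stack deploy -c mq-cluster.yml --with-registry-auth mq-cluster
[root@docker-manager /swarm/activemq-cluster]# docker stack deploy -c mq-proxy.yml --with-registry-auth mq-proxy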

Creating the MongoDB service

This one uses mongo and mongo-express. mongo-express is a web management tool for mongodb, similar in function to phpMyAdmin, and optional. The Dockerfiles are as follows.

Dockerfile

mongo

FROM mongo
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone

mongo-express

FROM mongo-express
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone 

Configuration files

mongo is as follows.

[root@docker-manager /swarm/mongodb]# cat mongo.yml
version: '3.7'
services:
  mongo:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/mongodb:4.0.5
    hostname: mongodb
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == mongodb]
#    environment:
#      MONGO_INITDB_ROOT_USERNAME: root
#      MONGO_INITDB_ROOT_PASSWORD: passwd
    ports:
      - 27017:27017
    volumes:
      - /data/mongodb:/data/db
    networks: 
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

The commented-out lines define the username and password for logging in to mongodb. I asked the developers and they said no password is fine, so I didn't add one. Since there's no password, the access restrictions go without saying; sort those out yourself. Next is mongo-express.

mongo-express is as follows.

[root@docker-manager /swarm/mongodb]# cat mongo-express.yml
version: '3.7'
services:
  mongo-express:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/mongodb-express:0.12.0
    hostname: mongodb-express
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == mongodb]
    ports:
      - 8081:8081
    environment:
#      ME_CONFIG_MONGODB_ADMINUSERNAME: root
#      ME_CONFIG_MONGODB_ADMINPASSWORD: passwd
      ME_CONFIG_BASICAUTH_USERNAME: admin
      ME_CONFIG_BASICAUTH_PASSWORD: Sowhat?
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

As for why I split these two services: stack currently has no way to define service start order. Normally mongodb should start first and mongo-express second; if mongo-express starts first it fails on the first attempt and only comes up once mongo is running. That's about it, so I split them.

Creating the services

[root@docker-manager /swarm/mongodb]# ansible mongodb -m file -a "path=/data/mongodb state=directory"
[root@docker-manager /swarm/mongodb]# docker stack deploy -c mongo.yml --with-registry-auth mongo
[root@docker-manager /swarm/mongodb]# docker stack deploy -c mongo-express.yml --with-registry-auth mongo-express

Open the management page and check that it can connect to mongodb; the usual sanity checks.

No problems. I'm not sure where this page gets its time value from; the server and container times are both correct, so I'm ignoring it. Next, the redis cluster.

Creating the Redis cluster

I built this image by hand using the latest stable version; the config file is copied straight into the image and persistence is enabled. Roughly like this.

Dockerfile

First, the Dockerfile.

FROM registry.cn-beijing.aliyuncs.com/rj-bai/centos:7.5
RUN yum -y install wget make gcc && yum clean all && \
    wget http://download.redis.io/releases/redis-5.0.3.tar.gz && tar zxf redis-5.0.3.tar.gz && rm -f redis-5.0.3.tar.gz && \
    cd redis-5.0.3/ && make && make install
COPY start.sh /
COPY redis.conf /
CMD ["/bin/bash", "/start.sh"]

Contents of start.sh

#!/bin/bash
if [ -n "$DIR" ];
then
 sed -i s\#./\#$DIR\#g /redis.conf
fi

if [ ! -n "$REDIS_PORT" ];
then
  redis-server /redis.conf
else
 sed -i 's#6379#'$REDIS_PORT'#g' /redis.conf && redis-server /redis.conf
fi

Main contents of redis.conf

port 6379
save 900 1
save 300 10
save 60 10000
dbfilename "dump.rdb"
dir ./

The script substitutes two values, the port and the data storage directory. That's about it.
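As a standalone sketch of how those two variables drive the script (values arbitrary, image as above):

[root@docker-manager ~]# docker run -d --name redis-7000 --network host \
    -e DIR=/data/7000 -e REDIS_PORT=7000 \
    -v /data/7000:/data/7000 \
    registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3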

Configuration file

[root@docker-manager /swarm/redis]# cat redis.yml
version: '3.7'
services:
  redis1:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
    hostname: redis1
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == redis-1]
    environment:
      DIR: /data/7000
      REDIS_PORT: 7000
    volumes:
      - /data/7000:/data/7000
    networks:
      - host

  redis2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3 
    hostname: redis2
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == redis-1]
    environment:
      DIR: /data/7001
      REDIS_PORT: 7001
    volumes:
      - /data/7001:/data/7001
    networks:
      - host

  redis3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
    hostname: redis3
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == redis-2]
    environment:
      DIR: /data/7002
      REDIS_PORT: 7002
    volumes:
      - /data/7002:/data/7002
    networks:
      - host

  redis4:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
    hostname: redis4
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == redis-2]
    environment:
      DIR: /data/7003
      REDIS_PORT: 7003
    volumes:
      - /data/7003:/data/7003
    networks:
      - host

  redis5:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
    hostname: redis5
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == redis-3]
    environment:
      DIR: /data/7004
      REDIS_PORT: 7004
    volumes:
      - /data/7004:/data/7004
    networks:
      - host

  redis6:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/redis:cluster-5.0.3
    hostname: redis6
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == redis-3]
    environment:
      DIR: /data/7005
      REDIS_PORT: 7005
    volumes:
      - /data/7005:/data/7005
    networks:
      - host

networks: 
  host: 
    external: true
    name: host

The network I'm using here is host. Now create the service.

Creating the service

[root@docker-manager /swarm/redis]# docker stack deploy -c redis.yml --with-registry-auth redis
[root@docker-manager /swarm/redis]# docker stack ps redis

That's not the end of it: at this point redis is only started, the cluster hasn't been formed yet. One more step:

[root@docker-manager ~]# docker run --rm -it inem0o/redis-trib create --replicas 1 172.24.89.242:7000 172.24.89.242:7001 172.24.89.241:7002 172.24.89.241:7003 172.24.89.237:7004 172.24.89.237:7005

After running it you need to type yes once; seeing the success prompt means it worked.

Log in to a container to confirm.

[root@docker-manager /swarm/redis]# ssh redis-1 
Last login: Mon Jan 21 18:39:54 2019 from 172.24.90.38
[root@redis-1 ~]# docker exec -it redis_redis1.1.lhe09fkmcs2j5mnvj3ow7uo14 /bin/bash
[root@redis1 /]# redis-cli -c -p 7000
127.0.0.1:7000> cluster nodes
28aa26332cd2799fe7b615865fa0259b9154299a 172.24.89.241:7003@17003 slave 34f738b7eff022be6eeb5c4cceafd52a935b1fc6 0 1548067394712 4 connected
3a0ad1bc61b8e765e7548cc87db6b3bfe9a7f60f 172.24.89.237:7004@17004 master - 0 1548067394211 5 connected 10923-16383
09db8b01f9f9d0b38152af82a8c38fdf85e1a9b3 172.24.89.242:7001@17001 slave 37b6dec48b97e816431d7f2cbe71489c3afdc508 0 1548067395000 3 connected
34f738b7eff022be6eeb5c4cceafd52a935b1fc6 172.24.89.242:7000@17000 myself,master - 0 1548067394000 1 connected 0-5460
a7c5314086b0a7d57d09861da8af31c514f8a167 172.24.89.237:7005@17005 slave 3a0ad1bc61b8e765e7548cc87db6b3bfe9a7f60f 0 1548067396217 6 connected
37b6dec48b97e816431d7f2cbe71489c3afdc508 172.24.89.241:7002@17002 master - 0 1548067395214 3 connected 5461-10922
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1113
cluster_stats_messages_pong_sent:1068
cluster_stats_messages_sent:2181
cluster_stats_messages_ping_received:1063
cluster_stats_messages_pong_received:1113
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:2181

No problems, and that's roughly it. Next, the zookeeper cluster.

Creating the ZooKeeper cluster

Dockerfile

Based on the official image and its documentation, I rebuilt the image with a Dockerfile, shown below.

FROM zookeeper:latest
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone

Configuration file

[root@docker-manager /swarm/zookeeper]# cat zookeeper.yml 
version: '3.7'

services:
  zoo1:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: zoo1
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == zookeeper-1]
    environment:
      ZOO_MY_ID: 1
      JVMFLAGS: -Xms1024m -Xmx1024m
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - recharge

  zoo2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: zoo2
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == zookeeper-2]
    environment:
      ZOO_MY_ID: 2
      JVMFLAGS: -Xms1024m -Xmx1024m
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - recharge

  zoo3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: zoo3
    deploy:
      replicas: 1
      placement:
        constraints: [ node.hostname == zookeeper-3]
    environment:
      ZOO_MY_ID: 3
      JVMFLAGS: -Xms1024m -Xmx1024m
      ZOO_SERVERS: server.1=zoo1:2888:3888 server.2=zoo2:2888:3888 server.3=zoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

Creating the service

[root@docker-manager /swarm/zookeeper]# docker stack deploy -c zookeeper.yml --with-registry-auth zookeeper
[root@docker-manager /swarm/zookeeper]# docker stack ps zookeeper

Go and check whether it worked; normally one leader and two followers means it's right, something like this:
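One way to check each node's role, as a sketch (the docker ps filter assumes the default stack_service task naming; run it on the node hosting that task):

[root@zookeeper-1 ~]# docker exec -it $(docker ps -q -f name=zookeeper_zoo1) zkServer.sh status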

All good. This zookeeper cluster is mainly the project's registry center. Next up: the kafka cluster.

Creating the Kafka cluster

kafka depends on zookeeper, and the zookeeper cluster from before is used by the project, so another one is started just for kafka. The kafka Dockerfile is as follows.

Dockerfile

FROM wurstmeister/kafka
ENV LANG en_US.utf8
RUN apk add -U tzdata
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone

The zookeeper image is built the same way as before.

Configuration files

I also split the zookeeper and kafka config files, because kafka depends on zookeeper: zookeeper has to start first and kafka after it. The files are as follows.

zookeeper

[root@docker-manager /swarm/kafka]# cat kafka-zookeeper.yml 
version: '3.7'

services:
  kaf-zoo1:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: kaf-zoo1
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == kafka-1]
    environment:
      ZOO_MY_ID: 1
      JVMFLAGS: -Xms1024m -Xmx1024m
      ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - recharge

  kaf-zoo2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: kaf-zoo2
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == kafka-2]
    environment:
      ZOO_MY_ID: 2
      JVMFLAGS: -Xms1024m -Xmx1024m
      ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - recharge

  kaf-zoo3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/zookeeper:3.4.13
    hostname: kaf-zoo3
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == kafka-3]
    environment:
      ZOO_MY_ID: 3
      JVMFLAGS: -Xms1024m -Xmx1024m
      ZOO_SERVERS: server.1=kaf-zoo1:2888:3888 server.2=kaf-zoo2:2888:3888 server.3=kaf-zoo3:2888:3888
    healthcheck:
      test: ["CMD-SHELL", "zkServer.sh status || exit 1"]
      interval: 5s
      timeout: 3s
      retries: 3
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

kafka

[root@docker-manager /swarm/kafka]# cat kafka.yml
version: '3.7'

services:
  kafka1:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
    hostname: kafka1
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == kafka-1]
    environment:
      KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
      KAFKA_ADVERTISED_HOST_NAME: kafka1
      KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    networks:
      - recharge

  kafka2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
    hostname: kafka2
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == kafka-2]
    environment:
      KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
      KAFKA_ADVERTISED_HOST_NAME: kafka2
      KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
      KAFKA_BROKER_ID: 2
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    networks:
      - recharge

  kafka3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/kafka:2.1.0
    hostname: kafka3
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == kafka-3]
    environment:
      KAFKA_HEAP_OPTS: -Xmx1G -Xms1G
      KAFKA_ADVERTISED_HOST_NAME: kafka3
      KAFKA_ZOOKEEPER_CONNECT: "kaf-zoo1:2181,kaf-zoo2:2181,kaf-zoo3:2181"
      KAFKA_BROKER_ID: 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

Creating the services

Create zookeeper first, then kafka.

[root@docker-manager /swarm/kafka]# docker stack deploy -c kafka-zookeeper.yml --with-registry-auth kafka-zookeeper
[root@docker-manager /swarm/kafka]# docker stack deploy -c kafka.yml --with-registry-auth kafka

Whether it started properly can be seen from the kafka logs.

[root@docker-manager ~]# docker service logs <service-name>

Have a look yourself; if nothing throws errors you're fine. Next, the elasticsearch cluster.
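A slightly more direct check, as a sketch, assuming the Kafka CLI scripts are on the PATH inside this wurstmeister-based image (the topic name is arbitrary):

[root@kafka-1 ~]# docker exec -it $(docker ps -q -f name=kafka_kafka1) \
    kafka-topics.sh --zookeeper kaf-zoo1:2181 --create --topic smoke-test --partitions 3 --replication-factor 3
[root@kafka-1 ~]# docker exec -it $(docker ps -q -f name=kafka_kafka1) \
    kafka-topics.sh --zookeeper kaf-zoo1:2181 --describe --topic smoke-test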

Creating the Elasticsearch cluster

A quick overview first: here we run es*3, logstash*1, kibana*1, es-head*1 plus the kafka cluster. Kafka is needed because project logs are not pushed to es via filebeat and logstash; instead log4j pushes them to kafka, and logstash uses kafka as its input and the es cluster as its output. In plain terms, the project pushes logs to kafka, kafka feeds logstash, and logstash finally pushes to the es cluster. I've built this before with conventional binary installs, but that was too tedious to document. Anyway, step by step: first the es cluster, starting with the Dockerfile.

Dockerfile

First, the plugins enabled by default; if you need others, install or remove them in the Dockerfile yourself. Version 6.5.4 is used, which bundles the X-Pack plugin by default.

[root@695653b6515c elasticsearch]# elasticsearch-plugin list
ingest-geoip
ingest-user-agent

I use the two plugins above and nothing else, so those two are enough. The Dockerfile follows.

FROM docker.elastic.co/elasticsearch/elasticsearch:6.5.4
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone

Configuration file

Three es nodes make up the cluster, so the config file is as follows. Try not to change the service names; kibana depends on them.

[root@docker-manager /swarm/elasticsearch]# cat elasticsearch.yml 
version: '3.7'
services:
  elasticsearch:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == elasticsearch-1]
    environment:
      - cluster.name=es
      - node.name=es-1
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.fd.ping_timeout=120s
      - discovery.zen.fd.ping_retries=6
      - discovery.zen.fd.ping_interval=30s
      - "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
      - "ES_JAVA_OPTS=-Xms2G -Xmx2G"
    volumes:
      - /elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - recharge

  elasticsearch2:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == elasticsearch-2]
    environment:
      - cluster.name=es
      - node.name=es-2
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.fd.ping_timeout=120s
      - discovery.zen.fd.ping_retries=6
      - discovery.zen.fd.ping_interval=30s
      - "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
      - "ES_JAVA_OPTS=-Xms2G -Xmx2G"
    volumes:
      - /elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
    networks:
      - recharge

  elasticsearch3:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch:6.5.4
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == elasticsearch-3]
    environment:
      - cluster.name=es
      - node.name=es-3
      - http.cors.enabled=true
      - http.cors.allow-origin=*
      - discovery.zen.minimum_master_nodes=2
      - discovery.zen.fd.ping_timeout=120s
      - discovery.zen.fd.ping_retries=6
      - discovery.zen.fd.ping_interval=30s
      - "discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3"
      - "ES_JAVA_OPTS=-Xms2G -Xmx2G"
    volumes:
      - /elasticsearch:/usr/share/elasticsearch/data
    ports:
      - 9202:9200
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

That's it. Next, create the data directory and adjust a few system parameters.

Creating the service

[root@docker-manager ~/sh]# cat es.sh 
#!/bin/bash
cat >>/etc/security/limits.conf<<OEF
* soft nofile 65536
* hard nofile 65536
* soft nproc 2048
* hard nproc 4096
OEF

cat >>/etc/sysctl.conf<<OEF
vm.max_map_count=655360
fs.file-max=655360
OEF

/usr/sbin/sysctl -p
[root@docker-manager ~/sh]# ansible elasticsearch -m script -a "/root/sh/es.sh"
[root@docker-manager ~/sh]# ansible elasticsearch -m file -a "path=/elasticsearch state=directory owner=1000 group=1000 mode=755"
[root@docker-manager /swarm/elasticsearch]# docker stack deploy -c elasticsearch.yml --with-registry-auth elasticsearch
[root@docker-manager /swarm/elasticsearch]# docker stack ps elasticsearch

Take a look at the cluster status:

[root@docker-manager ~]# curl http://127.0.0.1:9200/_cat/health?v
epoch      timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1548133528 05:05:28  es      green           3         3      0   0    0    0        0             0                  -                100.0%

No problems. Next, bring up logstash.

Creating the Logstash service

Let's look at the Dockerfile first:

Dockerfile

FROM docker.elastic.co/logstash/logstash:6.5.4
USER root
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone
USER logstash
RUN logstash-plugin install logstash-input-kafka && logstash-plugin install logstash-output-elasticsearch
COPY kafka.conf /usr/share/logstash/config/
COPY start.sh /
CMD ["/bin/bash","/start.sh"]

Install whatever plugins you need. I copied the config file straight into the image, though a custom one can be used as well; the startup script is as follows.

#!/bin/bash
if [ -n "$CONFIG" ];
then
 logstash -f "$CONFIG"
else
 logstash -f ./config/kafka.conf
fi
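Once the service is running, a custom pipeline could be selected through CONFIG, for example (the host path, file name and service name here are hypothetical):

[root@docker-manager ~]# docker service update \
    --mount-add type=bind,src=/data/logstash/custom.conf,dst=/usr/share/logstash/pipeline/custom.conf \
    --env-add CONFIG=/usr/share/logstash/pipeline/custom.conf \
    logstash_logstash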

Contents of kafka.conf

This file is best written after asking the developers which topics are in use, and then creating those topics. Nothing has been defined yet, so I used default.

input{

        kafka{
        bootstrap_servers => ["kafka1:9092,kafka2:9092,kafka3:9092"]
        consumer_threads => 5
        topics => ["default"]
        decorate_events => true 
    type => "default"
      }
}

filter {

    grok {

        match => ["message", "%{TIMESTAMP_ISO8601:logdate}"]

    }

    date {

        match => ["logdate", "yyyy-MM-dd HH:mm:ss,SSS"]

        target => "@timestamp"

    }
    
    mutate {

        remove_tag => ["logdate"]

    }
}

output {
     elasticsearch {
        hosts => ["elasticsearch:9200","elasticsearch2:9200","elasticsearch3:9200"]
        index => "logstash-%{type}-%{+YYYY.MM.dd}"
    }
    stdout {
    codec => rubydebug {}
    }
}

Starting Logstash

Configuration file

[root@docker-manager /swarm/logstash]# cat logstash.yml 
version: '3.7'

services:
  logstash:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/logstash:6.5.4
    hostname: logstash
    deploy:
      replicas: 1
      placement:
        constraints: [node.hostname == logstash]
    environment:
      - "LS_JAVA_OPTS=-Xms1G -Xmx1G"
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

If you want to mount a custom config file, mount it as the root user.

Creating the service

[root@docker-manager /swarm/logstash]# ansible logstash -m script -a "/root/sh/es.sh"
[root@docker-manager /swarm/logstash]# docker stack deploy -c logstash.yml --with-registry-auth logstash
[root@docker-manager /swarm/logstash]# docker stack ps logstash
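To smoke-test the kafka → logstash → es path, one hedged option is to push a line into the default topic from a kafka container and watch the logstash output (again assuming the console scripts are on the PATH in that image):

[root@kafka-1 ~]# docker exec -it $(docker ps -q -f name=kafka_kafka1) bash -c \
    'echo "2019-01-28 12:00:00,000 hello from kafka" | kafka-console-producer.sh --broker-list kafka1:9092 --topic default'
[root@docker-manager ~]# docker service logs -f logstash_logstash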

No problems there. Next up, kibana.

Creating the Kibana service

The Dockerfile is as follows.

Dockerfile

FROM docker.elastic.co/kibana/kibana:6.5.4
USER root
ENV LANG en_US.utf8
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone

配置文件

version: '3.7'

services:
  kibana:
    image: registry.cn-beijing.aliyuncs.com/rj-bai/kibana:6.5.4
    hostname: kibana
    deploy:
      replicas: 1
      endpoint_mode: vip
      placement:
        constraints: [node.hostname == logstash]
    ports: 
      - 5601:5601
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

Creating the service

[root@docker-manager /swarm/kibana]# docker stack deploy -c kibana.yml --with-registry-auth kibana

Creating es-head

The Dockerfile is as follows.

FROM mobz/elasticsearch-head:5
RUN ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo Asia/Shanghai > /etc/timezone

That's about it; it can be started now.

Configuration file

version: '3.2'

services:
  es-head: 
    image: registry.cn-beijing.aliyuncs.com/rj-bai/elasticsearch-head:5
    deploy:
      placement:
        constraints:
          - node.hostname == logstash
    ports: 
      - 9100:9100
    networks:
      - recharge

networks: 
  recharge: 
    external:
      name: recharge

Ready to start.

Creating the service

[root@docker-manager /swarm/logstash]# docker stack deploy -c es-head.yml --with-registry-auth es-head

That wraps up the build-out. Finally, a look at all the services.

All the services

That's all of them. Now check kibana to see whether the es cluster is running properly; the home page looks like this:

Open Monitoring, enable it, and have a look for yourself.

The es cluster has no data yet and some services haven't been exercised, so after handing over the connection details I asked the developers for a project build. Looking it over, apart from mq and mysql it touches everything. The database needs no comment, it's been done N times and is definitely fine; mq they tested from a locally-run project and it was fine, so this build will do for testing.

Testing phase

There is one annoying issue: when creating redis I used the host network rather than the recharge network. Our projects reach the base services by DNS name, i.e. the connection strings are written as redis1, kafka1 and so on, so for redis the names have to be mapped by hand. The reason I used the host network is that when forming the cluster you can't point the nodes at each other by hostname; the support isn't great. If I ran it on the overlay I'd start six containers, have to look up each container's IP to build the cluster, and the IPs would change whenever a container died, which is a real pain. For production I'm considering not running redis in containers at all; for the test environment this will do.
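One hedged way to handle that name mapping for the application containers is swarm's host-to-IP flag; a sketch using the node IPs from the redis-trib command above (the service name is illustrative):

[root@docker-manager ~]# docker service update \
    --host-add redis1:172.24.89.242 \
    --host-add redis2:172.24.89.241 \
    --host-add redis3:172.24.89.237 \
    spring-boot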

Now I just start a service by hand on the manager node, no compose file this time. I pulled an openjdk 1.8 image and used a Dockerfile to copy the package in. The project's connection settings are these:

spring.dubbo.registry.address=zookeeper://zoo1:2181;zookeeper://zoo2:2181;zookeeper://zoo3:2181
redis.ipPorts=redis1:7000,redis1:7001,redis2:7002,redis2:7003,redis3:7004,redis3:7005
spring.data.mongodb.host=mongo
elk.kafka.urls=kafka1:9092,kafka2:9092,kafka3:9092

I won't write up the image build; making an image that runs a jar shouldn't be a problem for anyone. Here it's already built, so I start it directly.

[root@docker-manager ~]# docker service create --name spring-boot --network recharge --constraint node.role==manager --replicas 1 spring-boot
[root@docker-manager ~]# docker service logs -f spring-boot

It started normally without any errors, so logstash should now be receiving project logs. Let's check.

[root@docker-manager ~]# ssh logstash
[root@docker-manager ~]# docker service logs -f logstash_logstash

Here's one snippet of it; now a look at kibana.

It's there. After creating the index pattern:

There's data. Of course more than this one index is in use. While I'm at it, let's add charts for the nginx logs. nginx isn't running in a container, so shipping its logs needs filebeat; let's set that up now.

Nginx log analysis

Click the kibana icon → Logging → Add log data → Nginx logs and you'll see the detailed steps. All my systems are CentOS 7+, so I chose RPM. You can see two plugins are required, and both are already installed, so it's just a matter of logging in to the nginx server and running a few commands. For convenience I added hosts entries for es and kibana on every server, so the configuration looks like this.

[root@nginx ~]# rpm -Uvh https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-6.5.4-x86_64.rpm
[root@nginx ~]# vim /etc/filebeat/filebeat.yml
setup.kibana:
             host: "kibana:5601"
output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["elasticsearch:9200","elasticsearch2:9201","elasticsearch3:9202"]

I don't ship every log; this nginx only has two, so it is written like this.

[root@nginx ~]# filebeat modules enable nginx
Enabled nginx
[root@nginx ~]# vim /etc/filebeat/modules.d/nginx.yml
- module: nginx
  access:
    enabled: true
    var.paths: ["/usr/local/nginx/logs/yourlogfile.log","/usr/local/nginx/logs/yourlogfile.log"]
  error:
    enabled: true
    var.paths: ["/usr/local/nginx/logs/error.log"]

Finally, start filebeat.

[root@nginx ~]# filebeat setup
Loaded index template
Loading dashboards (Kibana must be running and reachable)
Loaded dashboards
Loaded machine learning job configurations
[root@nginx ~]# systemctl start filebeat.service

Afterwards there is an extra index named filebeat and the charts appear. On the add-data page you can see many log types that can be analyzed, such as redis, mysql and system logs; explore them yourself. I'm not adding more for now, that can wait for production.

Final test

According to the developers this system consists of 13 projects. Production will use Alibaba Cloud RDS for the database, and the test environment has ten application servers for now, so the cluster currently consists of these 27 servers.

Honestly, it's the first time I've done it this way and I don't know whether the base services can take the load, so I want to test by starting services, the more the better. So I wrote the file below.

version: '3.7'
services:
  spring-boot: 
    image: registry.cn-beijing.aliyuncs.com/rj-bai/spring-boot:latest
    deploy: 
      replicas: 50
      endpoint_mode: vip
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
        window: 10s
      update_config:
        parallelism: 10
        delay: 60s
        failure_action: rollback
      rollback_config: 
        parallelism: 10 
        delay: 60s
        failure_action: pause
    ports: 
      - 1116:1116
    networks:
      - recharge

networks: 
  recharge: 
    external: true
    name: recharge

For writing compose files I recommend the official reference; every option I used above is documented there. I simply start 50 spring-boot replicas with no placement constraint, so they can run on any server. It may not feel like a meaningful test, but it gives me some confidence about whether the servers above can handle 50 services starting at once. So, start it. This project build doesn't touch mysql or mq, but it touches everything else. Here we go.

[root@docker-manager /swarm/spring-boot]# docker stack deploy -c spring-boot.yml --with-registry-auth spring-boot
[root@docker-manager /swarm/spring-boot]# docker stack ps spring-boot

It looks roughly like this.

Then watch logstash: logs pour out like mad; once it quiets down the startup should be done. Let's see how many log lines were written.

Roughly 3943 log entries by my count. Everything has finished starting, so check whether any service went down.

[root@docker-manager ~]# for i in `docker stack ls | grep -vi "NAME" | awk {'print $1'}`;
> do
>  docker stack ps $i --filter desired-state=shutdown 
> done
nothing found in stack: activemq
nothing found in stack: elasticsearch
nothing found in stack: es-head
nothing found in stack: kafka
nothing found in stack: kafka-zookeeper
nothing found in stack: kibana
nothing found in stack: logstash
nothing found in stack: mongo-express
nothing found in stack: mongodb
nothing found in stack: mysql
nothing found in stack: redis
nothing found in stack: spring-boot
nothing found in stack: zookeeper

None. Everything is fine. One last glance at all the services:

[root@docker-manager ~]# docker stack ls
[root@docker-manager ~]# docker service ls

No real problems. Let's delete this spring-boot stack first, it's of no further use. zabbix can also be deployed with swarm now, preferably on the host network; I've written about that before, so I won't paste it here. Below, a quick note on jenkins.
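Removing that test stack is a single command:

[root@docker-manager ~]# docker stack rm spring-boot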

jenkins

Finally, a quick word about jenkins. It isn't in a container yet either, because the jenkins server has a lot of other work to do. Previously it boiled down to jenkins building the project package and calling a playbook to do the update, so having ansible installed was enough. Things are different now.

Updating a project now has to go through an image. I looked at the docker plugins and none seemed to do what I want, so for now my approach is: jenkins builds the package; on success a Dockerfile copies the package into a prepared jdk1.8 image; the image is pushed to the Alibaba Cloud registry; then jenkins remotely calls a script on the manager server to perform the update. It has been confirmed that every project exposes one port, without conflicts, and mounts a directory at /www/logs for project logs. So after a successful build jenkins runs the following; all the variables are jenkins built-ins.

#!/bin/bash

### Service properties, adjust per project
APP_FILE=absolute-path-to-the-jar
SERVER_NAME=service-name
APP_PORT=application-port
REPLICAS=replica-count
ENVIRONMENT=startup-environment
NODE_LABELS=node-label
REGISTRY=private-registry-address
JAVA_OPTS="-server -Xms1024M -Xmx1024M -XX:CompressedClassSpaceSize=128M -Djava.security.egd=file:/dev/./urandom" 

### Log in to the private registry
docker login --username=xxxx --password xxxx $REGISTRY

### Create a directory from the job name and build number
mkdir -p /data/docker/$JOB_NAME/$BUILD_NUMBER

### Copy the project package into the new directory
if [ -f $APP_FILE ];
 then
  cp $APP_FILE /data/docker/$JOB_NAME/$BUILD_NUMBER
   else
  echo "Project package not found, exiting"
 exit 1
fi

### Write the Dockerfile and build the image
cd /data/docker/$JOB_NAME/$BUILD_NUMBER 
cp /data/init/entrypoint.sh /data/docker/$JOB_NAME/$BUILD_NUMBER 

jar=`ls *.jar`
cat >>Dockerfile<<OEF
FROM $REGISTRY/oracle-jdk:1.8
ADD $jar /
ADD entrypoint.sh /
CMD ["/bin/bash","/entrypoint.sh"]
OEF
docker build -t $REGISTRY/$JOB_NAME:$BUILD_NUMBER .

### Push the image to the private registry
docker push $REGISTRY/$JOB_NAME:$BUILD_NUMBER 
sleep 5

### Trigger the update on the manager side
ssh swarm-manager "/scripts/deploy-service.sh" "$REGISTRY/$JOB_NAME:$BUILD_NUMBER" "$SERVER_NAME" "$APP_PORT" "$REPLICAS" "$ENVIRONMENT" "$NODE_LABELS" "$JAVA_OPTS"

The script on the manager is below; every project calls this one script, as long as the arguments aren't passed wrong.

#!/bin/bash
### Load environment variables
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin

### Read the arguments; only the argument count is checked, nothing more detailed
IMAGE=$1
SERVER_NAME=$2
APP_PORT=$3
REPLICAS=$4
ENVIRONMENT=$5
NODE_LABELS=$6
JAVA_OPTS=`echo ${@:7}`

if [ $# -lt 7 ] ; then
    echo "USAGE: $0 <image> <service-name> <port> <replicas> <environment> <node-label> <JAVA_OPTS...>"
    exit 1;
fi

### Log in to the registry and pull the image
docker login --username=xxxx --password xxxx $IMAGE 
docker pull "$IMAGE" > /dev/null 2>&1

### Check whether the pull succeeded
if [ "$?" -ne 0 ]
then
    echo "Pull "$IMAGE" Failed"
    exit 1
fi

### Check whether the service already exists; update it if it does, create it otherwise
docker service ps "$SERVER_NAME" > /dev/null 2>&1
if [ $? -eq 0 ]
then
    docker service update --with-registry-auth --image "$IMAGE" "$SERVER_NAME" > /tmp/"$SERVER_NAME"
    cat /tmp/"$SERVER_NAME" | grep "rollback" > /dev/null 2>&1
    if [ "$?" -eq 0 ];
        then 
        echo "Update "$SERVER_NAME" fail,executed rollback"
        exit 1
    else
        echo "Update "$SERVER_NAME" Success"
        exit 0
    fi
else
    docker service create --name "$SERVER_NAME" \
    --replicas "$REPLICAS" \
    --network recharge \
    --constraint node.labels.regin=="$NODE_LABELS" \
    --with-registry-auth \
    --endpoint-mode vip \
    --publish "$APP_PORT:$APP_PORT" \
    --update-parallelism 1 \
    --update-order start-first \
    --update-failure-action rollback \
    --rollback-parallelism 1 \
    --rollback-failure-action pause \
    --health-cmd "curl 127.0.0.1:"$APP_PORT" > /dev/null 2>&1 || exit 1" \
    --health-interval 30s \
    --health-start-period 10s \
    --health-timeout 3s \
    --health-retries 3 \
    --env JAVA_OPTS="${JAVA_OPTS}" \
    --env ENVIRONMENT=$ENVIRONMENT \
    --mount type=bind,src=/www/logs,dst=/www/logs \
    $IMAGE > /dev/null
    if [ "$?" -eq 0 ]
       then
       echo "Deploy "$SERVER_NAME" Success"
     else 
       echo "Deploy "$SERVER_NAME" fail"
    exit 1
  fi
fi

While we're here, the contents of entrypoint.sh, otherwise some parts look confusing and it's not clear what certain parameters are for.

#!/bin/bash
## Get the jar file name
Jar=`ls /*.jar`

## Other initialization work could go here

## Start the jar
java ${JAVA_OPTS} -jar "$Jar" --spring.profiles.active="$ENVIRONMENT"

That's roughly it. If you don't know what any of the flags used when creating the service do, look them up here, they're all documented. My update order is start-first, meaning new containers are started first; how many start at once depends on the maximum number of tasks updated in parallel, i.e. the --update-parallelism flag. I currently update one at a time, so one new task starts, and only after it is up is one old task shut down, and so on. If your servers are tight on resources, don't enable this.
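For a service that is already running, the same policy can be applied after the fact, for example (service name illustrative):

[root@docker-manager ~]# docker service update \
    --update-parallelism 1 \
    --update-order start-first \
    --update-failure-action rollback \
    upms-admin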

Everything shown here is already in use in production. As long as the arguments are passed correctly nothing goes sideways; update policy, rollback policy and health checks are all in place, adjust them as needed. No resource limits have been added to the containers though, and by default a container has no resource limits at all. One thing to watch out for: the health check command must be executable inside the container, i.e. your image has to contain it, otherwise the service waits forever at creation time, or during an update it fails the check the configured number of times within the window and gets rolled back.
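If limits are wanted later, swarm can set them per service without recreating it, e.g. (values arbitrary, service name illustrative):

[root@docker-manager ~]# docker service update --limit-cpu 1 --limit-memory 2G --reserve-memory 1G upms-admin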

This screenshot was posted before Chinese New Year; I'm leaving it in.

Issues encountered in later maintenance

Updating a project

Today my boss came to me about buying another server and asked whether all the back-office services could be migrated onto it. I said yes, they can be moved. He then asked how much startup memory the back-office java processes are configured with; I said 2G. He said that's too much, 1G is enough, just delete the current ones and recreate them. I said there's no need, updating the services is enough. There are three services related to the back office, each running one replica, so another dual-core 4G Alibaba Cloud server was bought. After buying it I did the usual steps to join it to the swarm, then started thinking through what had to be done.

This is production now. The project deployment method is described above and the base services are deployed the same way; at the moment all the applications run on nodes labeled server. My thinking was simple: the new server has already joined the cluster, so next give it a label, then update the service's constraint and ENV, and that should be it. Here's what I did.

First, label the new server.

[root@docker-manager ~]# docker node update --label-add regin=upms worker-11
worker-11
[root@docker-manager ~]# docker node inspect worker-11 | grep -i "label" -A2
            "Labels": {
                "regin": "upms"
            },

Then, before updating the service, I checked what it is currently set to:

[root@docker-manager ~]# docker service inspect upms-admin | egrep "regin|env" -A 3
                        "environment=prd",
                        "Xms=-Xms2048M",
                        "Xmx=-Xmx2048M"
                    ],
--
                        "node.labels.regin==server"
                    ],
                    "Platforms": [
                        {
[root@docker-manager ~]# docker service ps upms-admin 
ID                  NAME                IMAGE                                                NODE                DESIRED STATE       CURRENT STATE           ERROR               PORTS
k9ht7k0h1zwt        upms-admin.1        registry.cn-huhehaote.aliyuncs.com/xxx/xxx:xxx   worker-3             Running             Running 3 hours ago                       

It is running on the worker-3 node right now. What's needed is to change node.labels.regin==server to node.labels.regin==upms, -Xms2048M to -Xms1024M, and -Xmx2048M to -Xmx1024M. Update it with a command; after the update it should be running on worker-11.

[root@docker-manager ~]# docker service update \
> --env-add "Xms=-Xms1024M" \
> --env-add "Xmx=-Xmx1024M" \
> --constraint-rm node.labels.regin==server \
> --constraint-add node.labels.regin==upms \
> upms-admin
[root@docker-manager ~]# docker service inspect upms-admin | egrep "regin|env" -A 3
                        "environment=prd",
                        "Xms=-Xms1024M",
                        "Xmx=-Xmx1024M"
                    ],
--
                        "node.labels.regin==upms"
                    ],
                    "Platforms": [
                        {
--
                        "environment=prd",
                        "Xms=-Xms2048M",
                        "Xmx=-Xmx2048M"
                    ],
--
                        "node.labels.regin==server"
                    ],
                    "Platforms": [
                        {
[root@docker-manager ~]# docker service ps upms-admin
ID                  NAME                IMAGE                                                NODE                DESIRED STATE       CURRENT STATE                 ERROR               PORTS
v7m646mqanrh        upms-admin.1        registry.cn-huhehaote.aliyuncs.com/xxx/xxx:xxx   worker-11            Running             Running 37 seconds ago                            
ku3zy6b3cerd         \_ upms-admin.1    registry.cn-huhehaote.aliyuncs.com/xxx/xxx:xxx   worker-3             Shutdown            Shutdown about a minute ago                       

As you can see, it worked. But when I updated the very first project it failed. My steps were as follows; the reproduction below was done on my local setup.

Failure example

First create a service locally, mimicking how services are created in production. The local test setup only has two workers, so label them first.

[root@manager ~]# docker node update --label-add regin=server worker-1
[root@manager ~]# docker node update --label-add regin=upms worker-2
[root@manager ~]# docker node inspect worker-{1..2} | grep "regin"
                "regin": "server"
                "regin": "upms"

Then create a service constrained to the server label, i.e. worker-1, using the same policies as production; some of the variables are just typed by hand.

[root@manager ~]#     docker service create --name nginx \
>     --replicas 1 \
>     --network recharge \
>     --constraint node.labels.regin==server \
>     --with-registry-auth \
>     --endpoint-mode vip \
>     --publish 80:80 \
>     --update-parallelism 1 \
>     --update-order start-first \
>     --update-failure-action rollback \
>     --rollback-parallelism 1 \
>     --rollback-failure-action pause \
>     --health-cmd "curl 127.0.0.1:80 > /dev/null 2>&1 || exit 1" \
>     --health-interval 30s \
>     --health-start-period 30s \
>     --health-retries 3 \
>     --health-timeout 3s \
>     --env "environment="prd"" \
>     --env "Xms="-Xms2048M"" \
>     --env "Xmx="-Xmx2048M"" \
>     --mount type=bind,src=/www/logs,dst=/www/logs \
>     registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl
q0pi4boho3iqal51r5xq3wjnf
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service converged 
[root@manager ~]# docker service ps nginx 
ID                  NAME                IMAGE                                                NODE                DESIRED STATE       CURRENT STATE            ERROR               PORTS
bommsx7gbs6b        nginx.1             registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl   worker-1            Running             Running 27 seconds ago                                     

That's the service created. Now I want to move it to worker-2, the node with the upms label. When migrating the first project, this is what I did:

[root@manager ~]# docker service update --constraint-add node.labels.regin==upms nginx

And this was the result:

nginx
overall progress: 0 out of 1 tasks 
1/1: no suitable node (scheduling constraints not satisfied on 3 nodes) 

It reports that no suitable node can be found. The running service itself wasn't affected, because I specified starting the new task before stopping the old one. In another window, the current state:

[root@manager ~]# docker service ps nginx 
ID                  NAME                IMAGE                                                NODE                DESIRED STATE       CURRENT STATE                ERROR                              PORTS
ppzxj3gmx2cr        nginx.1             registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl                       Running             Pending about a minute ago   "no suitable node (scheduling …"   
bommsx7gbs6b         \_ nginx.1         registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl   worker-1            Running             Running 3 minutes ago                                       

So it just sits there unable to start, and the terminal hangs; the only way out is Ctrl+C. Then remove the previous constraint:

[root@manager ~]# docker service update --constraint-rm node.labels.regin==server nginx
nginx
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service converged 
[root@manager ~]# docker service ps nginx 
ID                  NAME                IMAGE                                                NODE                DESIRED STATE       CURRENT STATE             ERROR                              PORTS
wt4ni6n4r2ig        nginx.1             registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl   worker-2            Running             Running 28 seconds ago                                       
ppzxj3gmx2cr         \_ nginx.1         registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl                       Shutdown            Pending 16 minutes ago    "no suitable node (scheduling …"   
bommsx7gbs6b         \_ nginx.1         registry.cn-beijing.aliyuncs.com/rj-bai/nginx:curl   worker-1            Shutdown            Shutdown 24 seconds ago                                                                      

Now it starts normally. So, plainly put: if you set a placement constraint when creating a service and later want to move it, the constraint-rm and constraint-add have to be executed in the same update, otherwise it falls over. Luckily these were back-office services and the start-first policy was in place.
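In other words, the move should have been a single update, roughly (names as in the example above):

[root@manager ~]# docker service update \
    --constraint-rm node.labels.regin==server \
    --constraint-add node.labels.regin==upms \
    nginx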

Overall swarm is already very capable, but in the finer points of management it still lags well behind K8S. For my current needs swarm is enough: so far the largest deployment for a single line of business is around 50 servers, which swarm handles easily. I've been studying k8s recently too, and it really is complex; I probably won't run it in production any time soon, that would just be asking for trouble. That's it for now.

Last edited: September 2, 2019