A Detailed Guide to Using etcd

What is etcd? The official definition is:

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

etcd is a highly available, strongly consistent, distributed key-value store with the following features:

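  • Simple: an HTTP-based API that any standard HTTP client, such as curl, can read and write
  • Key-value storage: data is organized in hierarchical directories, like a standard file system (this is the v2 data model; the v3 API used below has a flat keyspace queried by prefix)
  • Watch: monitor specific keys or directories for changes and react to them
  • Secure: optional SSL/TLS client certificate authentication
  • Fast: benchmarked at thousands of writes per second per instance
  • Optional TTLs for key expiration
  • Reliable: distributed using the Raft protocol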
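
To make the HTTP access in the first feature concrete: in etcd v3 the HTTP/JSON interface is a gateway in front of the gRPC API, and keys and values travel base64-encoded. The sketch below only builds the request body. The helper names `b64` and `kv_put_body` are hypothetical, and the `localhost` endpoint and `/v3beta/kv/put` path in the comment are assumptions (the gateway path prefix depends on the etcd version; 3.3 is assumed here).

```shell
# Base64-encode one argument; tr strips the line wrapping that
# some base64 implementations add to long output.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Build the JSON body for a put through the HTTP/JSON gateway.
kv_put_body() {
  printf '{"key": "%s", "value": "%s"}\n' "$(b64 "$1")" "$(b64 "$2")"
}

# Example (assumed endpoint and gateway path, not run here):
#   curl -L http://localhost:2379/v3beta/kv/put -X POST -d "$(kv_put_body foo bar)"
kv_put_body foo bar
```

The last line prints the body for key foo and value bar, which can be inspected before sending anything to a real cluster.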

Installation

Assume the cluster is deployed in the following environment:

  • etcd1: 192.168.122.11
  • etcd2: 192.168.122.243
  • etcd3: 192.168.122.41
  • OS: CentOS 7.x
  • etcd version: 3.3.18

First download etcd. The prebuilt binary package is sufficient; the download steps are omitted here.

Start a standalone (single-member) cluster with the following command:

etcd \
--data-dir=/opt/etcd/data \
--advertise-client-urls=http://192.168.122.11:2379 \
--initial-advertise-peer-urls=http://192.168.122.11:2380 \
--initial-cluster=etcd1=http://192.168.122.11:2380 \
--listen-client-urls=http://127.0.0.1:2379,http://192.168.122.11:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--listen-peer-urls=http://192.168.122.11:2380 \
--initial-cluster-state new \
--initial-cluster-token etcd-test-cluster-1 \
--name=etcd1

Once it is running, use the etcdctl client to interact with the etcd cluster:

export ETCDCTL_API=3
etcdctl put foo bar
etcdctl put foo1 bar1
etcdctl get foo

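A standalone cluster is a single point of failure for the whole infrastructure, so next we add two members (etcd2 and etcd3) for high availability. Start with etcd2: a new member must first be registered with the cluster through the client. Run the following on etcd1: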

etcdctl member add etcd2 --peer-urls "http://192.168.122.243:2380"

Then start the etcd service on etcd2 with the following command:

etcd \
--data-dir /opt/etcd/data \
--advertise-client-urls http://192.168.122.243:2379 \
--initial-advertise-peer-urls http://192.168.122.243:2380 \
--initial-cluster-state existing \
--initial-cluster "etcd1=http://192.168.122.11:2380,etcd2=http://192.168.122.243:2380" \
--listen-client-urls http://127.0.0.1:2379,http://192.168.122.243:2379 \
--listen-metrics-urls http://127.0.0.1:2381 \
--listen-peer-urls http://192.168.122.243:2380 \
--name etcd2

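Note the --initial-cluster-state and --initial-cluster parameters here: the state is existing because the member joins a cluster that is already running, and --initial-cluster lists both the current member and the new one.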
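
The --initial-cluster value gains one name=URL pair with every member added, so it is easy to mistype. As a small sketch, a helper (hypothetically named `initial_cluster`) can assemble it from name=IP pairs. The peer port 2380 is taken from the commands above; the helper itself is not part of etcd.

```shell
# Build an --initial-cluster value from a URL scheme and name=IP pairs.
# The function body runs in a subshell so its variables do not leak.
initial_cluster() (
  scheme=$1
  shift
  out=
  for member in "$@"; do
    name=${member%%=*}   # text before the first "="
    ip=${member#*=}      # text after the first "="
    out="${out:+$out,}${name}=${scheme}://${ip}:2380"
  done
  printf '%s\n' "$out"
)

initial_cluster http etcd1=192.168.122.11 etcd2=192.168.122.243
```

The output can be passed directly, for example as --initial-cluster "$(initial_cluster http etcd1=192.168.122.11 etcd2=192.168.122.243)".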

Next add etcd3 to the cluster. As before, first register the member:

etcdctl member add etcd3 --peer-urls "http://192.168.122.41:2380"

Start the service on etcd3:

etcd \
--data-dir /opt/etcd/data \
--advertise-client-urls http://192.168.122.41:2379 \
--initial-advertise-peer-urls http://192.168.122.41:2380 \
--initial-cluster-state existing \
--initial-cluster "etcd2=http://192.168.122.243:2380,etcd1=http://192.168.122.11:2380,etcd3=http://192.168.122.41:2380" \
--listen-client-urls http://127.0.0.1:2379,http://192.168.122.41:2379 \
--listen-metrics-urls http://127.0.0.1:2381 \
--listen-peer-urls http://192.168.122.41:2380 \
--name etcd3

This completes the etcd cluster deployment. The same procedure also serves as the steps for converting a standalone instance into a cluster.
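
Why three members rather than two follows from Raft's majority rule: a write needs a quorum of floor(n/2) + 1 members, so a cluster of n members keeps working with at most floor((n - 1)/2) of them down. The arithmetic can be sketched as follows; the function names are hypothetical.

```shell
# Members that must agree before a write is committed.
quorum() { echo $(( $1 / 2 + 1 )); }

# Members that can fail while a quorum still remains.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

With two members the quorum is still two, so the intermediate state after adding only etcd2 tolerates no failure. Only the third member makes the cluster survive the loss of one node.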

TLS

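etcd can encrypt its traffic with TLS, both between etcd peers and between clients and etcd. If the two sides of a connection need to authenticate each other, the certificates for the etcd instances and the clients must be issued by a common certificate authority (CA). Here we use cfssl, a PKI toolkit from CloudFlare, to manage the whole public key infrastructure.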

First create the Certificate Authority and its configuration file, which will be used to sign the TLS certificates that follow:

cat > ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "8760h"
    },
    "profiles": {
      "etcd": {
        "usages": ["signing", "key encipherment", "server auth", "peer auth", "client auth"],
        "expiry": "8760h"
      }
    }
  }
}
EOF

cat > ca-csr.json <<EOF
{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Xiamen",
      "O": "Corp",
      "OU": "CA",
      "ST": "Fujian"
    }
  ]
}
EOF

cfssl gencert -initca ca-csr.json | cfssljson -bare ca

This generates the CA certificate and private key:

ca-key.pem
ca.pem

Next create certificates for the three nodes of the etcd cluster. The CSR configuration file for each node is shown below.
etcd1-csr.json

{
  "CN": "etcd1",
  "hosts": [
    "etcd1",
    "192.168.122.11",
    "127.0.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Xiamen",
      "O": "Corp",
      "OU": "CA",
      "ST": "Fujian"
    }
  ]
}

etcd2-csr.json

{
  "CN": "etcd2",
  "hosts": [
    "etcd2",
    "192.168.122.243",
    "127.0.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Xiamen",
      "O": "Corp",
      "OU": "CA",
      "ST": "Fujian"
    }
  ]
}

etcd3-csr.json

{
  "CN": "etcd3",
  "hosts": [
    "etcd3",
    "192.168.122.41",
    "127.0.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Xiamen",
      "O": "Corp",
      "OU": "CA",
      "ST": "Fujian"
    }
  ]
}
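
The three CSR files differ only in the node name and IP address, so they can also be generated rather than written by hand. The following is a minimal sketch; the function name `gen_csr` is hypothetical, and the subject fields are copied from the files above.

```shell
# Print a cfssl CSR document for one etcd node.
# $1 = node name (used as CN and as a host entry), $2 = node IP address
gen_csr() {
  cat <<EOF
{
  "CN": "$1",
  "hosts": [
    "$1",
    "$2",
    "127.0.0.1"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Xiamen",
      "O": "Corp",
      "OU": "CA",
      "ST": "Fujian"
    }
  ]
}
EOF
}

# Example: gen_csr etcd1 192.168.122.11 > etcd1-csr.json
gen_csr etcd1 192.168.122.11
```

Generating the files this way keeps the hosts list consistent with the node addresses, which matters because TLS verification fails when a node is reached under a name or IP that is missing from its certificate.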

Create the certificates:

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd1-csr.json | cfssljson -bare etcd1
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd2-csr.json | cfssljson -bare etcd2
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd3-csr.json | cfssljson -bare etcd3

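This produces the files listed below. Copy the certificate and key of etcd2 and of etcd3, together with ca.pem, to the corresponding server. The examples assume all certificates are placed under /opt/etcd: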

etcd1.pem
etcd1-key.pem
etcd2.pem
etcd2-key.pem
etcd3.pem
etcd3-key.pem

Start etcd1:

etcd \
--name etcd1 \
--data-dir=/opt/etcd/data1 \
--initial-advertise-peer-urls=https://192.168.122.11:2380 \
--listen-peer-urls=https://192.168.122.11:2380 \
--listen-client-urls=https://127.0.0.1:2379,https://192.168.122.11:2379 \
--advertise-client-urls=https://192.168.122.11:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--initial-cluster "etcd1=https://192.168.122.11:2380,etcd2=https://192.168.122.243:2380,etcd3=https://192.168.122.41:2380" \
--initial-cluster-state new \
--initial-cluster-token etcd-test-cluster-1 \
--client-cert-auth \
--trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/etcd1.pem \
--key-file=/opt/etcd/etcd1-key.pem \
--peer-client-cert-auth \
--peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/etcd1.pem \
--peer-key-file=/opt/etcd/etcd1-key.pem

Start etcd2:

etcd \
--name etcd2 \
--data-dir=/opt/etcd/data1 \
--initial-advertise-peer-urls=https://192.168.122.243:2380 \
--listen-peer-urls=https://192.168.122.243:2380 \
--listen-client-urls=https://127.0.0.1:2379,https://192.168.122.243:2379 \
--advertise-client-urls=https://192.168.122.243:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--initial-cluster "etcd1=https://192.168.122.11:2380,etcd2=https://192.168.122.243:2380,etcd3=https://192.168.122.41:2380" \
--initial-cluster-state new \
--initial-cluster-token etcd-test-cluster-1 \
--client-cert-auth \
--trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/etcd2.pem \
--key-file=/opt/etcd/etcd2-key.pem \
--peer-client-cert-auth \
--peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/etcd2.pem \
--peer-key-file=/opt/etcd/etcd2-key.pem

Start etcd3:

etcd \
--name etcd3 \
--data-dir=/opt/etcd/data1 \
--initial-advertise-peer-urls=https://192.168.122.41:2380 \
--listen-peer-urls=https://192.168.122.41:2380 \
--listen-client-urls=https://127.0.0.1:2379,https://192.168.122.41:2379 \
--advertise-client-urls=https://192.168.122.41:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--initial-cluster "etcd1=https://192.168.122.11:2380,etcd2=https://192.168.122.243:2380,etcd3=https://192.168.122.41:2380" \
--initial-cluster-state new \
--initial-cluster-token etcd-test-cluster-1 \
--client-cert-auth \
--trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/etcd3.pem \
--key-file=/opt/etcd/etcd3-key.pem \
--peer-client-cert-auth \
--peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/etcd3.pem \
--peer-key-file=/opt/etcd/etcd3-key.pem
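
The TLS part of the three start commands follows one pattern: four client-facing flags and the same four with a peer- prefix, differing per node only in the certificate file names. A sketch that prints this set for a given member is shown below. The function name `etcd_tls_flags` is hypothetical, and /opt/etcd is the certificate directory assumed earlier.

```shell
# Print the TLS flags for one etcd member, one flag per line.
# The client-facing and peer-facing sets differ only by the "peer-" prefix.
etcd_tls_flags() (
  name=$1
  cert_dir=${2:-/opt/etcd}   # certificate directory assumed in the text
  for prefix in "" "peer-"; do
    printf '%s\n' \
      "--${prefix}client-cert-auth" \
      "--${prefix}trusted-ca-file=${cert_dir}/ca.pem" \
      "--${prefix}cert-file=${cert_dir}/${name}.pem" \
      "--${prefix}key-file=${cert_dir}/${name}-key.pem"
  done
)

etcd_tls_flags etcd2
```

Because none of the paths contain spaces, the output could be appended to a start command with an unquoted $(etcd_tls_flags etcd2).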

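After the cluster starts, check the logs for errors. You can also run a health check with curl, for example on the etcd1 server (replace domain-name with a host name or IP address listed in the hosts field of the server certificate):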

curl --cacert ca.pem --cert etcd1.pem --key etcd1-key.pem https://domain-name:2379/health

If the cluster is healthy, it returns:

{"health":"true"}
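
For scripted checks, the response body can be tested instead of read by eye. A minimal sketch, assuming the body has the form shown above; the function name `is_healthy` is hypothetical.

```shell
# Succeed only when a /health response body reports "health":"true".
is_healthy() {
  printf '%s' "$1" | grep -Eq '"health"[[:space:]]*:[[:space:]]*"true"'
}

# Example with the response shown above:
if is_healthy '{"health":"true"}'; then
  echo "cluster healthy"
else
  echo "cluster unhealthy"
fi
```

In practice the argument would be the output of the curl command, for example is_healthy "$(curl -s ...)" with the same certificate options as before.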

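In this example we started a brand-new cluster (--data-dir points to a new path). What if we want to convert the original non-TLS cluster into a TLS cluster instead?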

Converting from non-TLS to TLS

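First create the certificates as described above. Then, working on the original cluster, update the peerURLs in the cluster configuration. Assuming we operate from etcd1, list the members: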

export ETCDCTL_API=3
etcdctl member list

The output looks like this:

13acd07d0ffd081a, started, etcd2, http://192.168.122.243:2380, http://192.168.122.243:2379
4bb6fb458796057d, started, etcd3, http://192.168.122.41:2380, http://192.168.122.41:2379
95291b72fbb71ec3, started, etcd1, http://192.168.122.11:2380, http://192.168.122.11:2379

Update the peerURLs of members etcd2 and etcd3 to use TLS:

etcdctl member update 13acd07d0ffd081a --peer-urls="https://192.168.122.243:2380"
etcdctl member update 4bb6fb458796057d --peer-urls="https://192.168.122.41:2380"
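
The member IDs in these commands come from the member list output and differ in every cluster. Rather than copying them by hand, they can be looked up by member name. A sketch, assuming the comma-separated output format shown above; the function name `member_id` is hypothetical.

```shell
# Read `etcdctl member list` output on stdin and print the ID of one member.
# Fields are separated by ", ": ID, status, name, peer URLs, client URLs.
member_id() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Example with two lines of the output shown earlier:
printf '%s\n' \
  '13acd07d0ffd081a, started, etcd2, http://192.168.122.243:2380, http://192.168.122.243:2379' \
  '4bb6fb458796057d, started, etcd3, http://192.168.122.41:2380, http://192.168.122.41:2379' |
  member_id etcd3
```

Against a live cluster the lookup would read from the real command, for example etcdctl member list | member_id etcd2.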

After this change the whole etcd cluster becomes unavailable. Restart etcd2 and etcd3 with TLS enabled, using the following commands:
etcd2

etcd \
--name etcd2 \
--data-dir=/opt/etcd/data \
--initial-advertise-peer-urls=https://192.168.122.243:2380 \
--listen-peer-urls=https://192.168.122.243:2380 \
--listen-client-urls=https://127.0.0.1:2379,https://192.168.122.243:2379 \
--advertise-client-urls=https://192.168.122.243:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--initial-cluster "etcd1=https://192.168.122.11:2380,etcd2=https://192.168.122.243:2380,etcd3=https://192.168.122.41:2380" \
--client-cert-auth \
--trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/etcd2.pem \
--key-file=/opt/etcd/etcd2-key.pem \
--peer-client-cert-auth \
--peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/etcd2.pem \
--peer-key-file=/opt/etcd/etcd2-key.pem

etcd3

etcd \
--name etcd3 \
--data-dir=/opt/etcd/data \
--initial-advertise-peer-urls=https://192.168.122.41:2380 \
--listen-peer-urls=https://192.168.122.41:2380 \
--listen-client-urls=https://127.0.0.1:2379,https://192.168.122.41:2379 \
--advertise-client-urls=https://192.168.122.41:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--initial-cluster "etcd1=https://192.168.122.11:2380,etcd2=https://192.168.122.243:2380,etcd3=https://192.168.122.41:2380" \
--client-cert-auth \
--trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/etcd3.pem \
--key-file=/opt/etcd/etcd3-key.pem \
--peer-client-cert-auth \
--peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/etcd3.pem \
--peer-key-file=/opt/etcd/etcd3-key.pem

Once etcd2 and etcd3 are up, the cluster becomes healthy again, but etcd1 has not yet rejoined. Now update the peerURLs of etcd1:

etcdctl --endpoints="192.168.122.243:2379" --cacert=./ca.pem --cert=./etcd1.pem --key=./etcd1-key.pem member list
etcdctl --endpoints="192.168.122.243:2379" --cacert=./ca.pem --cert=./etcd1.pem --key=./etcd1-key.pem member update 95291b72fbb71ec3 --peer-urls="https://192.168.122.11:2380"
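
Repeating the endpoint and certificate flags on every call is error-prone. etcdctl v3 also reads these settings from ETCDCTL_-prefixed environment variables, so a session can set them once. The values below are the ones used in the two commands above.

```shell
# Connection settings for etcdctl v3, set once instead of repeated as flags.
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS="192.168.122.243:2379"
export ETCDCTL_CACERT=./ca.pem
export ETCDCTL_CERT=./etcd1.pem
export ETCDCTL_KEY=./etcd1-key.pem

# The two commands above then shorten to (not run here):
#   etcdctl member list
#   etcdctl member update 95291b72fbb71ec3 --peer-urls="https://192.168.122.11:2380"
```

Some etcdctl releases refuse to run when the same option is given both as an environment variable and as a flag, so it is safest to use one form or the other.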

After the update, restart the etcd service on etcd1 with the following command:

etcd \
--name etcd1 \
--data-dir=/opt/etcd/data \
--initial-advertise-peer-urls=https://192.168.122.11:2380 \
--listen-peer-urls=https://192.168.122.11:2380 \
--listen-client-urls=https://127.0.0.1:2379,https://192.168.122.11:2379 \
--advertise-client-urls=https://192.168.122.11:2379 \
--listen-metrics-urls=http://127.0.0.1:2381 \
--initial-cluster "etcd1=https://192.168.122.11:2380,etcd2=https://192.168.122.243:2380,etcd3=https://192.168.122.41:2380" \
--client-cert-auth \
--trusted-ca-file=/opt/etcd/ca.pem \
--cert-file=/opt/etcd/etcd1.pem \
--key-file=/opt/etcd/etcd1-key.pem \
--peer-client-cert-auth \
--peer-trusted-ca-file=/opt/etcd/ca.pem \
--peer-cert-file=/opt/etcd/etcd1.pem \
--peer-key-file=/opt/etcd/etcd1-key.pem

Check the logs again; etcd1 has now rejoined the cluster.

etcd in Kubernetes

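In a Kubernetes cluster, etcd instances can run on the master nodes or be deployed separately on their own hosts. The two topologies are:

[Figure: etcd deployed on the master nodes]

[Figure: etcd deployed separately]

[Figure: sequence diagram of the components involved when a pod is created, and of the interaction between the apiserver and etcd]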

Dump all key/value data from etcd:

ADVERTISE_URL="https://192.168.122.12:2379"

kubectl exec etcd-node-01 -n kube-system -- sh -c \
"ETCDCTL_API=3 etcdctl \
--endpoints $ADVERTISE_URL \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--key /etc/kubernetes/pki/etcd/server.key \
--cert /etc/kubernetes/pki/etcd/server.crt \
get \"\" --prefix=true -w json" > etcd-kv.json
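
In the JSON output of etcdctl, keys and values are base64-encoded, so etcd-kv.json is not directly readable. The sketch below decodes only the key names, since the values Kubernetes stores are typically binary (protobuf) rather than text. The function name `list_keys` is hypothetical, and the pattern assumes the compact JSON that etcdctl emits, with no spaces around the colon.

```shell
# Print the decoded key names found in `etcdctl get ... -w json` output (stdin).
list_keys() {
  grep -o '"key":"[^"]*"' | cut -d'"' -f4 | while IFS= read -r encoded; do
    printf '%s' "$encoded" | base64 -d
    printf '\n'
  done
}

# Example with a one-entry sample (base64 of "foo" and "bar"):
printf '%s\n' '{"kvs":[{"key":"Zm9v","value":"YmFy"}]}' | list_keys
```

For the dump created above, list_keys < etcd-kv.json lists every key. In a Kubernetes cluster these typically start with /registry/.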

References