Apache Mesos?

Apache Mesos began at UC Berkeley as a project named Nexus and was later open-sourced through the Apache Foundation under the name Mesos. It is a resource-management project built to manage the resources of cloud infrastructure and compute engines in a unified way.

Companies using Mesos include Twitter, Facebook, eBay, and Riot Games. In Twitter's case, as usage of the social network surged, its engineers responded by restructuring all of Twitter's services to run independently, and they adopted Mesos as the tool to manage those independent services.
Riot Games, the studio behind League of Legends (LoL), uses Mesos together with Docker and Marathon to run a highly available (HA), fault-tolerant service at web scale, and eBay is known to run its services on Docker, Mesos, Marathon, and Jenkins.

 

Mesos Architecture

Looking at the architecture, it consists of a Mesos master, standby masters that take over when the leader fails, a ZooKeeper quorum configured to perform automatic failover, and the Mesos slave servers.


Figure 1. Mesos Architecture

 

Resource Allocation Example

The figure below shows how Mesos manages resources and runs jobs.

1. Slave 1 reports to the master that it has 4 CPUs and 4 GB of memory free. The master invokes its allocation policy module, which decides that these resources should be offered to Framework 1.
2. The master sends Framework 1 a resource offer describing what is available on Slave 1.
3. Framework 1 replies to the master, telling it to run Job 1 and Job 2 using those resources.
4. The master forwards the tasks to Slave 1, which receives them and runs the jobs.


 Figure 2. Example of resource offer

Hands-on Setup

To make things more concrete, we will build the same layout as the Mesos architecture above, using Marathon as the framework.
This setup was tested on Ubuntu 14.04.

The server IPs are listed below. Steps common to both masters and slaves are shown with the prompt “root@mesos:~#”, steps common to the master servers with “root@mesos-master:~#”, and steps common to the slave servers with “root@mesos-slave:~#”.

  Hostname             IP
mesos-master-1     172.17.1.7
mesos-master-2     172.17.1.8
mesos-master-3     172.17.1.9
mesos-slave-1      172.17.1.10
mesos-slave-2      172.17.1.11
mesos-slave-3      172.17.1.12
mesos-marathon     172.17.1.13
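
As a convenience (this step is optional and nothing later in this guide depends on it), you can register these names in /etc/hosts on every server so the hostnames resolve:

root@mesos:~# cat >> /etc/hosts << 'EOF'
172.17.1.7   mesos-master-1
172.17.1.8   mesos-master-2
172.17.1.9   mesos-master-3
172.17.1.10  mesos-slave-1
172.17.1.11  mesos-slave-2
172.17.1.12  mesos-slave-3
172.17.1.13  mesos-marathon
EOF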

 

1. Common Setup
This section covers the prerequisites for building Mesos from source. If you would rather install Mesos from packages, visit
http://mesosphere.io/downloads/, download the package that matches your distribution, and install that instead.

– Install required packages

root@mesos:~# apt-get install -y git curl autoconf libtool build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev

 

– Install the JDK

root@mesos:~# curl -LO "http://download.oracle.com/otn-pub/java/jdk/8u11-b12/jdk-8u11-linux-x64.tar.gz" \
-H 'Cookie: oraclelicense=accept-securebackup-cookie'
root@mesos:~# tar xzf jdk-8u11-linux-x64.tar.gz && rm -f jdk-8u11-linux-x64.tar.gz
root@mesos:~# mv jdk1.8.0_11 /usr/local/jdk
root@mesos:~# echo '# JDK' >> /etc/profile
root@mesos:~# echo 'export JAVA_HOME=/usr/local/jdk' >> /etc/profile
root@mesos:~# echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile

 

– Install Maven

root@mesos:~# curl -LO "http://www.us.apache.org/dist/maven/maven-3/3.2.2/binaries/apache-maven-3.2.2-bin.tar.gz"
root@mesos:~# tar xzf apache-maven-3.2.2-bin.tar.gz && rm -f apache-maven-3.2.2-bin.tar.gz
root@mesos:~# mv apache-maven-3.2.2 /opt/maven-3.2.2
root@mesos:~# echo '# Maven' >> /etc/profile
root@mesos:~# echo 'export M2_HOME=/opt/maven-3.2.2' >> /etc/profile
root@mesos:~# echo 'export M2=$M2_HOME/bin' >> /etc/profile
root@mesos:~# echo 'export PATH=$PATH:$M2' >> /etc/profile

 

– Apply the environment variables

root@mesos:~# source /etc/profile
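
As a quick sanity check that both tools are now on the PATH (the version output will of course vary with what you installed):

root@mesos:~# java -version
root@mesos:~# mvn -version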

 

– Build Mesos

# clone into /opt/mesos so that the MESOS_HOME path set below is valid
root@mesos:~# git clone https://github.com/apache/mesos /opt/mesos
root@mesos:~# cd /opt/mesos
root@mesos:~# ./bootstrap
root@mesos:~# mkdir build && cd build
root@mesos:~# ../configure && make && make install
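
Optionally, you can run Mesos's test suite before installing; the build tree provides a standard make check target, though it takes quite a while:

root@mesos:~# make check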

 

– Set and apply the Mesos environment variables

root@mesos:~# echo '# Mesos' >> /etc/profile
root@mesos:~# echo 'export MESOS_HOME=/opt/mesos/build' >> /etc/profile
root@mesos:~# echo 'export PATH=$PATH:$MESOS_HOME/bin' >> /etc/profile
root@mesos:~# source /etc/profile

 

2. Master Setup
– Configure ZooKeeper

root@mesos-master:~# echo '# ZooKeeper ' >> /etc/profile
root@mesos-master:~# echo 'export ZK_HOME=$MESOS_HOME/3rdparty/zookeeper-3.4.5' >> /etc/profile
root@mesos-master:~# echo 'export PATH=$PATH:$ZK_HOME/bin' >> /etc/profile
root@mesos-master:~# source /etc/profile

 

Copy ZooKeeper's sample configuration file and change the directory used for its data.

root@mesos-master:~# cd $ZK_HOME/conf
root@mesos-master:~# cp zoo_sample.cfg zoo.cfg
root@mesos-master:~# sed -i '/^dataDir/ s:.*:dataDir=/opt/zk-data:' zoo.cfg

 

Now configure these identical standalone ZooKeeper instances to run in replicated mode (this configuration is called a quorum).
Enter the IP of each master server as shown below.

# server.[myid]=[zookeeper ip or host]:[follower-to-leader connection port]:[leader election port]
root@mesos-master:~# echo 'server.1=172.17.1.7:2888:3888' >> $ZK_HOME/conf/zoo.cfg
root@mesos-master:~# echo 'server.2=172.17.1.8:2888:3888' >> $ZK_HOME/conf/zoo.cfg
root@mesos-master:~# echo 'server.3=172.17.1.9:2888:3888' >> $ZK_HOME/conf/zoo.cfg
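
Assuming the defaults that ship in zoo_sample.cfg, the finished zoo.cfg should now look roughly like this:

root@mesos-master:~# cat $ZK_HOME/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zk-data
clientPort=2181
server.1=172.17.1.7:2888:3888
server.2=172.17.1.8:2888:3888
server.3=172.17.1.9:2888:3888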

 

Assign each ZooKeeper server its myid number:
myid 1 for mesos-master-1, 2 for mesos-master-2, and 3 for mesos-master-3.

root@mesos-master:~# mkdir /opt/zk-data
root@mesos-master:~# echo '1' > /opt/zk-data/myid

 

– Start ZooKeeper

root@mesos-master:~# zkServer.sh start

JMX enabled by default
Using config: /opt/mesos/build/3rdparty/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
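
Once ZooKeeper has been started on all three masters, you can confirm that the quorum actually formed: one node should report itself as leader and the other two as followers (the Mode line below is from one of the followers):

root@mesos-master:~# zkServer.sh status
JMX enabled by default
Using config: /opt/mesos/build/3rdparty/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower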

 

3. Mesos

One caveat before running Mesos: if no slave is ready to report its resources to the master within about a minute of the master starting, the master process will commit suicide. For that reason, start the slave servers first, then start the masters.

– Start the Mesos slaves
Enter the masters' ZooKeeper IPs, port, and znode path, then run:

root@mesos-slave:~# mesos-slave.sh --master=zk://172.17.1.7:2181,172.17.1.8:2181,172.17.1.9:2181/mesos
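
By default the slave auto-detects its resources, as the log below shows. If you would rather offer a fixed amount, mesos-slave also accepts a --resources flag; the values here are purely illustrative:

root@mesos-slave:~# mesos-slave.sh \
--master=zk://172.17.1.7:2181,172.17.1.8:2181,172.17.1.9:2181/mesos \
--resources='cpus:2;mem:2048;disk:20480;ports:[31000-32000]'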

 

As soon as it starts, you can see that CPU and memory isolation is enabled and the slave's resources are collected.

I0804 11:46:24.547829   153 main.cpp:126] Build: 2014-08-01 15:18:57 by
I0804 11:46:24.548681   153 main.cpp:128] Version: 0.20.0
I0804 11:46:24.548992   153 main.cpp:135] Git SHA: 3047bbe41c92978ed14547070142c7b6a3a6ea9c
I0804 11:46:24.549372   153 containerizer.cpp:124] Using isolation: posix/cpu,posix/mem
I0804 11:46:24.549829   153 main.cpp:149] Starting Mesos slave
2014-08-04 11:46:24,550:153(0x7fa06d9ab700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2014-08-04 11:46:24,550:153(0x7fa06d9ab700):ZOO_INFO@log_env@716: Client environment:host.name=mesos-slave-1
2014-08-04 11:46:24,550:153(0x7fa06d9ab700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2014-08-04 11:46:24,550:153(0x7fa06d9ab700):ZOO_INFO@log_env@724: Client environment:os.arch=3.8.0-39-generic
2014-08-04 11:46:24,550:153(0x7fa06d9ab700):ZOO_INFO@log_env@725: Client environment:os.version=#57~precise1-Ubuntu SMP Tue Apr 1 20:04:50 UTC 2014
2014-08-04 11:46:24,550:153(0x7fa06d9ab700):ZOO_INFO@log_env@733: Client environment:user.name=root
2014-08-04 11:46:24,551:153(0x7fa06d9ab700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2014-08-04 11:46:24,551:153(0x7fa06d9ab700):ZOO_INFO@log_env@753: Client environment:user.dir=/root
2014-08-04 11:46:24,551:153(0x7fa06d9ab700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=172.17.1.7:2181,172.17.1.8:2181,172.17.1.9:2181 sessionTimeout=10000 watcher=0x7fa073028270 sessionId=0 sessionPasswd=<null> context=0x7fa04c000c40 flags=0
I0804 11:46:24.552538   153 slave.cpp:169] Slave started on 1)@172.17.1.10:5051
I0804 11:46:24.553022   153 slave.cpp:280] Slave resources: cpus(*):2; mem(*):2930; disk(*):63045; ports(*):[31000-32000]
I0804 11:46:24.553424   153 slave.cpp:325] Slave hostname: mesos-slave-1
I0804 11:46:24.553668   153 slave.cpp:326] Slave checkpoint: true
2014-08-04 11:46:24,555:153(0x7fa067fff700):ZOO_INFO@check_events@1703: initiated connection to server [172.17.1.7:2181]
I0804 11:46:24.555593   174 state.cpp:33] Recovering state from '/tmp/mesos/meta'
2014-08-04 11:46:24,558:153(0x7fa067fff700):ZOO_INFO@check_events@1750: session establishment complete on server [172.17.1.7:2181], sessionId=0x1479c5040d2000e, negotiated timeout=10000
I0804 11:46:24.559201   171 group.cpp:313] Group process (group(1)@172.17.1.10:5051) connected to ZooKeeper
I0804 11:46:24.559284   171 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0804 11:46:24.559373   171 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I0804 11:46:24.561661   176 status_update_manager.cpp:193] Recovering status update manager
I0804 11:46:24.561902   175 containerizer.cpp:287] Recovering containerizer
I0804 11:46:24.562355   153 slave.cpp:3129] Finished recovery

 

If you wait a little longer, the message below appears.
It means the slave is ready to report its resources to the master.

I0804 11:47:24.555347   171 slave.cpp:2984] Current usage 38.55%. Max allowed age: 3.601643301718310days

 

– Start the Mesos masters
A brief rundown of the mesos-master options used below:

--cluster : the cluster name for the Mesos masters.
--quorum : a value greater than half the total number of masters (2 for our three masters).
--zk : the ZooKeeper IPs, port, and znode path.

 

Start the Mesos master servers.

root@mesos-master:~# mesos-master.sh \
--cluster=ruo91-cluster \
--quorum=2 \
--zk=zk://172.17.1.7:2181,172.17.1.8:2181,172.17.1.9:2181/mesos \
--work_dir=/var/lib/mesos

 

After startup, you can see that the master registers the slaves and invokes the allocation policy module.

I0804 11:58:31.566506   569 master.cpp:2803] Registered slave 20140804-115806-117510572-5050-551-2 at slave(1)@172.17.1.10:5051 (mesos-slave-1)
I0804 11:58:31.566519   569 master.cpp:3973] Adding slave 20140804-115806-117510572-5050-551-2 at slave(1)@172.17.1.10:5051 (mesos-slave-1) with cpus(*):2; mem(*):2930; disk(*):63045; ports(*):[31000-32000]
I0804 11:58:31.566720   569 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 16
I0804 11:58:31.566792   569 hierarchical_allocator_process.hpp:444] Added slave 20140804-115806-117510572-5050-551-2 (mesos-slave-1) with cpus(*):2; mem(*):2930; disk(*):63045; ports(*):[31000-32000] (and cpus(*):2; mem(*):2930; disk(*):63045; ports(*):[31000-32000] available)
I0804 11:58:31.566947   569 replica.cpp:508] Replica received write request for position 16
I0804 11:58:31.573060   569 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took 6.075445ms
I0804 11:58:31.573268   569 replica.cpp:676] Persisted action at 16
I0804 11:58:31.589575   570 replica.cpp:655] Replica received learned notice for position 16
I0804 11:58:31.597354   570 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 6.502205ms
I0804 11:58:31.597897   570 leveldb.cpp:401] Deleting ~2 keys from leveldb took 27097ns
I0804 11:58:31.598217   570 replica.cpp:676] Persisted action at 16
I0804 11:58:31.598479   570 replica.cpp:661] Replica learned TRUNCATE action at position 16

 

On the slave side, you can see that the 172.17.1.7 server was elected leader and the slave registers with it, and that the slave checkpoints its own info under the /tmp/mesos directory.

I0804 11:58:06.491782   200 detector.cpp:138] Detected a new leader: (id='10')
I0804 11:58:06.492204   200 group.cpp:658] Trying to get '/mesos/info_0000000010' in ZooKeeper
I0804 11:58:06.493782   200 detector.cpp:382] A new leading master (UPID=master@172.17.1.7:5050) is detected
I0804 11:58:06.493978   200 slave.cpp:601] New master detected at master@172.17.1.7:5050
I0804 11:58:06.494534   199 status_update_manager.cpp:167] New master detected at master@172.17.1.7:5050
I0804 11:58:06.494439   200 slave.cpp:637] No credentials provided. Attempting to register without authentication
I0804 11:58:06.494740   200 slave.cpp:650] Detecting new master
I0804 11:58:31.567167   200 slave.cpp:768] Registered with master master@172.17.1.7:5050; given slave ID 20140804-115806-117510572-5050-551-2
I0804 11:58:31.567409   200 slave.cpp:781] Checkpointing SlaveInfo to '/tmp/mesos/meta/slaves/20140804-115806-117510572-5050-551-2/slave.info'
root@mesos-slave-1:~# strings /tmp/mesos/meta/slaves/20140804-115806-117510572-5050-551-2/slave.info
mesos-slave-1
mesos-slave-1
cpus
disk
ports
$20140804-115806-117510572-5050-551-28

 

The Mesos master provides a Web UI on port 5050.

http://172.17.1.7:5050/
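
The same information is also available as JSON from the command line; Mesos releases of this era expose the master's state at the state.json endpoint (the path may differ in later versions):

root@mesos-master:~# curl -s http://172.17.1.7:5050/master/state.json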


Figure 3. Mesos Master Web UI

You can see the slave servers' resource information.

Figure 4. Mesos Master Web UI: Slave Status

4. Framework
Using Marathon as the framework, let's verify that a job ordered by the framework is executed just as in the “Resource Allocation Example” above.

– Install the JDK

root@mesos-marathon:~# curl -LO "http://download.oracle.com/otn-pub/java/jdk/8u11-b12/jdk-8u11-linux-x64.tar.gz" \
-H 'Cookie: oraclelicense=accept-securebackup-cookie'
root@mesos-marathon:~# tar xzf jdk-8u11-linux-x64.tar.gz && rm -f jdk-8u11-linux-x64.tar.gz
root@mesos-marathon:~# mv jdk1.8.0_11 /usr/local/jdk
root@mesos-marathon:~# echo '# JDK' >> /etc/profile
root@mesos-marathon:~# echo 'export JAVA_HOME=/usr/local/jdk' >> /etc/profile
root@mesos-marathon:~# echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile

 

– Install Scala

root@mesos-marathon:~# curl -LO "http://www.scala-lang.org/files/archive/scala-2.11.2.deb"
root@mesos-marathon:~# dpkg -i scala-2.11.2.deb && apt-get -f install

 

– Install sbt

root@mesos-marathon:~# curl -LO "http://scalasbt.artifactoryonline.com/scalasbt/sbt-native-packages/org/scala-sbt/sbt/0.13.1/sbt.deb"
root@mesos-marathon:~# dpkg -i sbt.deb

 

– Install Marathon

root@mesos-marathon:~# git clone https://github.com/mesosphere/marathon.git /opt/marathon
root@mesos-marathon:~# cd /opt/marathon
root@mesos-marathon:~# sbt assembly

 

– Start Marathon
Specify the masters' ZooKeeper IPs, port, and path when starting:

root@mesos-marathon:~# /opt/marathon/bin/start \
--master zk://172.17.1.7:2181,172.17.1.8:2181,172.17.1.9:2181/mesos \
--zk zk://172.17.1.7:2181,172.17.1.8:2181,172.17.1.9:2181/marathon
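
Before creating any jobs, it is worth checking that Marathon came up; it exposes a simple health-check endpoint that answers with “pong”:

root@mesos-marathon:~# curl http://172.17.1.13:8080/ping
pong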

 

Let's create a job through Marathon's Web UI.

http://172.17.1.13:8080/

We will create an application with the ID “yongbok-ping-test-1” that runs the command “ping localhost” on 2 instances with 1 CPU and 16 MB of memory each.
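
If you prefer the command line to the Web UI, the same application can be created through Marathon's v2 REST API; this is a sketch, so double-check the field names against your Marathon version:

root@mesos-marathon:~# curl -X POST http://172.17.1.13:8080/v2/apps \
-H 'Content-Type: application/json' \
-d '{"id": "yongbok-ping-test-1", "cmd": "ping localhost", "cpus": 1, "mem": 16, "instances": 2}'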


Figure 5. Marathon Framework

Here is the application after it has been created.


Figure 6. Marathon Framework 2

Opening the application ID shows the command running on two instances.

Figure 7. Marathon Framework 3

Now let's check the log messages to see what happens on the master and slave servers once the application above has been created.

– Slave
The slave is assigned the task by the master, launches it inside a container via an executor, and reports TASK_RUNNING status updates back to the master.

I0804 13:39:51.023449   198 slave.cpp:1004] Got assigned task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a for framework 20140804-115806-117510572-5050-551-0000
I0804 13:39:51.024117   198 gc.cpp:84] Unscheduling '/tmp/mesos/slaves/20140804-115806-117510572-5050-551-2/frameworks/20140804-115806-117510572-5050-551-0000' from gc
I0804 13:39:51.024225   198 slave.cpp:1114] Launching task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a for framework 20140804-115806-117510572-5050-551-0000
I0804 13:39:51.026095   198 slave.cpp:1224] Queuing task 'yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a' for executor yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a of framework '20140804-115806-117510572-5050-551-0000
I0804 13:39:51.026212   205 containerizer.cpp:427] Starting container 'b986b1d1-a0d4-4698-a8f2-d6486d03f364' for executor 'yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a' of framework '20140804-115806-117510572-5050-551-0000'
I0804 13:39:51.027415   205 launcher.cpp:137] Forked child with pid '407' for container 'b986b1d1-a0d4-4698-a8f2-d6486d03f364'
I0804 13:39:51.031493   205 containerizer.cpp:537] Fetching URIs for container 'b986b1d1-a0d4-4698-a8f2-d6486d03f364' using command '/opt/mesos/build/src/mesos-fetcher'
I0804 13:39:52.001512   204 slave.cpp:2471] Monitoring executor 'yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a' of framework '20140804-115806-117510572-5050-551-0000' in container 'b986b1d1-a0d4-4698-a8f2-d6486d03f364'
I0804 13:39:52.050065   204 slave.cpp:1735] Got registration for executor 'yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a' of framework 20140804-115806-117510572-5050-551-0000
I0804 13:39:52.050689   204 slave.cpp:1854] Flushing queued task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a for executor 'yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a' of framework 20140804-115806-117510572-5050-551-0000
I0804 13:39:52.054731   201 slave.cpp:2089] Handling status update TASK_RUNNING (UUID: a556d671-5adb-4e76-9db3-0ddcf067426c) for task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a of framework 20140804-115806-117510572-5050-551-0000 from executor(1)@172.17.1.10:47717
I0804 13:39:52.054900   201 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: a556d671-5adb-4e76-9db3-0ddcf067426c) for task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a of framework 20140804-115806-117510572-5050-551-0000
I0804 13:39:52.054975   201 status_update_manager.cpp:373] Forwarding status update TASK_RUNNING (UUID: a556d671-5adb-4e76-9db3-0ddcf067426c) for task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a of framework 20140804-115806-117510572-5050-551-0000 to master@172.17.1.7:5050
I0804 13:39:52.055044   201 slave.cpp:2253] Sending acknowledgement for status update TASK_RUNNING (UUID: a556d671-5adb-4e76-9db3-0ddcf067426c) for task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a of framework 20140804-115806-117510572-5050-551-0000 to executor(1)@172.17.1.10:47717
I0804 13:39:52.062878   199 status_update_manager.cpp:398] Received status update acknowledgement (UUID: a556d671-5adb-4e76-9db3-0ddcf067426c) for task yongbok-ping-test-1.5f3f2894-1b91-11e4-a55c-f61bafe44c8a of framework 20140804-115806-117510572-5050-551-0000

 

– Master
Meanwhile, the master keeps sending resource offers to the framework and processes the framework's replies, filtering out slaves whose offers were declined for a few seconds.

I0804 13:40:33.049464   566 hierarchical_allocator_process.hpp:588] Framework 20140804-115806-117510572-5050-551-0000 filtered slave 20140804-115806-117510572-5050-551-0 for 5secs
I0804 13:40:35.841415   569 master.cpp:3282] Performing explicit task state reconciliation for 2 tasks of framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:39.044380   568 master.cpp:3452] Sending 3 offers to framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:39.048141   568 master.cpp:2126] Processing reply for offers: [ 20140804-115806-117510572-5050-551-627 ] on slave 20140804-115806-117510572-5050-551-1 at slave(1)@172.17.1.12:5051 (mesos-slave-3) for framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:39.048331   568 master.cpp:2126] Processing reply for offers: [ 20140804-115806-117510572-5050-551-628 ] on slave 20140804-115806-117510572-5050-551-2 at slave(1)@172.17.1.10:5051 (mesos-slave-1) for framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:39.048477   568 master.cpp:2126] Processing reply for offers: [ 20140804-115806-117510572-5050-551-629 ] on slave 20140804-115806-117510572-5050-551-0 at slave(1)@172.17.1.11:5051 (mesos-slave-2) for framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:39.048704   568 hierarchical_allocator_process.hpp:588] Framework 20140804-115806-117510572-5050-551-0000 filtered slave 20140804-115806-117510572-5050-551-1 for 5secs
I0804 13:40:39.048810   568 hierarchical_allocator_process.hpp:588] Framework 20140804-115806-117510572-5050-551-0000 filtered slave 20140804-115806-117510572-5050-551-2 for 5secs
I0804 13:40:39.048893   568 hierarchical_allocator_process.hpp:588] Framework 20140804-115806-117510572-5050-551-0000 filtered slave 20140804-115806-117510572-5050-551-0 for 5secs
I0804 13:40:45.047854   569 master.cpp:3452] Sending 3 offers to framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:45.050928   569 master.cpp:2126] Processing reply for offers: [ 20140804-115806-117510572-5050-551-630 ] on slave 20140804-115806-117510572-5050-551-1 at slave(1)@172.17.1.12:5051 (mesos-slave-3) for framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:45.051033   569 hierarchical_allocator_process.hpp:588] Framework 20140804-115806-117510572-5050-551-0000 filtered slave 20140804-115806-117510572-5050-551-1 for 5secs
I0804 13:40:45.051241   569 master.cpp:2126] Processing reply for offers: [ 20140804-115806-117510572-5050-551-631 ] on slave 20140804-115806-117510572-5050-551-2 at slave(1)@172.17.1.10:5051 (mesos-slave-1) for framework 20140804-115806-117510572-5050-551-0000
I0804 13:40:45.051374   569 hierarchical_allocator_process.hpp:588] Framework 20140804-115806-117510572-5050-551-0000 filtered slave 20140804-115806-117510572-5050-551-2 for 5secs

 

Automatic Failover

Let's look at how, when the currently acting leader master fails and can no longer serve, a new leader is automatically elected and the cluster recovers.

– Before the failure
Everything is healthy.


Figure 8. Mesos Automatic Failover 1

– The failure

Master-1 fails for an unknown reason, and ZooKeeper notifies the standby servers of this fact.
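
To reproduce this in a test environment, you can simply kill the leading master process on Master-1 (pkill -f matches against the full command line, so make sure nothing else on the host matches the pattern):

root@mesos-master-1:~# pkill -f mesos-master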

 


Figure 9. Mesos Automatic Failover 2

– Recovery

On learning of the failure, the standby servers elect and register a new leader so that normal operation continues.

Once the failed server comes back up, it automatically re-registers as a standby.

I0804 15:00:12.016825   382 detector.cpp:138] Detected a new leader: (id='11')
I0804 15:00:12.017132   382 group.cpp:658] Trying to get '/mesos/info_0000000011' in ZooKeeper
I0804 15:00:12.018980   382 detector.cpp:382] A new leading master (UPID=master@172.17.1.8:5050) is detected
I0804 15:00:12.019160   382 master.cpp:1129] The newly elected leader is master@172.17.1.8:5050 with id 20140804-115813-134287788-5050-363
I0804 15:00:12.019610   382 master.cpp:1142] Elected as the leading master!
I0804 15:00:12.020633   382 master.cpp:960] Recovering from registrar
I0804 15:00:12.021252   382 registrar.cpp:313] Recovering registrar
I0804 15:00:12.023794   382 log.cpp:656] Attempting to start the writer
I0804 15:00:12.026499   382 replica.cpp:474] Replica received implicit promise request with proposal 3
I0804 15:00:12.025737   380 network.hpp:423] ZooKeeper group memberships changed
I0804 15:00:12.028565   380 group.cpp:658] Trying to get '/mesos/log_replicas/0000000010' in ZooKeeper
I0804 15:00:12.028961   380 group.cpp:658] Trying to get '/mesos/log_replicas/0000000011' in ZooKeeper
I0804 15:00:12.029736   380 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@172.17.1.8:5050, log-replica(1)@172.17.1.9:5050 }
I0804 15:00:12.033460   382 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took 5.486814ms
I0804 15:00:12.033679   382 replica.cpp:342] Persisted promised to 3

 


Figure 10. Mesos Automatic Failover 3

Conclusion

Slave: Master, here is my resource information.

Master: Framework, here is what the slaves have available.

Framework: Slave, get to work!

Slave: Yes, sir... ㅠ.ㅠ

Top of the pecking order: Mesos Framework
Middle: Mesos Master
Bottom: Mesos Slave

References

– Apache Mesos
http://mesos.apache.org/gettingstarted/

– Introduction to Apache Mesos
http://www.slideshare.net/tomasbart/introduction-to-apache-mesos

– Cloud Computing: Past, Present, and Future
http://www.slideshare.net/butest/cloud-computing-3859651

– Apache Mesos at Twitter (Texas LinuxFest 2014)
http://www.slideshare.net/caniszczyk/apache-mesos-at-twitter-texas-linuxfest-2014

– Datacenter Computing with Apache Mesos – BigData DC
http://www.slideshare.net/pacoid/datacenter-computing-with-apache-mesos

– YouTube
Twitter: Building Distributed Frameworks on Mesos
https://www.youtube.com/watch?v=n5GT7OFSh58

Riot Games: Building Webscale Apps with Marathon, Docker and Mesos
https://www.youtube.com/watch?v=6y86sw7wj_A

eBay: Delivering eBay’s CI solution with Apache Mesos & Docker – Dockercon
https://www.youtube.com/watch?v=VZPbLUJnR68