Flume?
Flume consists of three main modules: Source, Channel, and Sink.

Source: the logical node that collects data, such as log data, streams, sockets, databases, Avro, IRC, files, and so on.

Channel: buffers the data received from a source before it is handed off to the sink (target). Put simply, it is a staging area that guarantees transactional delivery.

Sink: delivers the data to its destination; for example, storing log data, streams, sockets, databases, Avro, IRC, or file data into HDFS.
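These three modules are wired together in an agent's properties file. A minimal sketch of the naming pattern (the agent and component names below are hypothetical; a concrete configuration follows later in this post):

```properties
# Declare the components that make up the agent
agent.sources = src
agent.channels = ch
agent.sinks = snk

# A source writes its events into one or more channels
agent.sources.src.channels = ch
# A sink drains events from exactly one channel
agent.sinks.snk.channel = ch
```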
[root@ruo91 ~]# su - hadoop
[hadoop@ruo91 ~]$ wget http://mirror.apache-kr.org/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
[hadoop@ruo91 ~]$ tar xzvf apache-maven-3.1.1-bin.tar.gz
[hadoop@ruo91 ~]$ nano ~/.bash_profile
# Maven
export M2_HOME=$HOME/apache-maven-3.1.1
export M2=$M2_HOME/bin
export MAVEN_OPTS="-Xms512m -Xmx1024m -XX:PermSize=256m -XX:MaxPermSize=512m"
PATH=$PATH:$M2
[hadoop@ruo91 ~]$ source ~/.bash_profile
[hadoop@ruo91 ~]$ git clone https://git-wip-us.apache.org/repos/asf/flume.git flume-dev
[hadoop@ruo91 ~]$ cd flume-dev
[hadoop@ruo91 flume-dev]$ mvn install -DskipTests
[hadoop@ruo91 flume-dev]$ cp flume-ng-dist/target/apache-flume-1.5.0-SNAPSHOT-bin.tar.gz $HOME
[hadoop@ruo91 flume-dev]$ cd ~
[hadoop@ruo91 ~]$ tar xzvf apache-flume-1.5.0-SNAPSHOT-bin.tar.gz
[hadoop@ruo91 ~]$ mv apache-flume-1.5.0-SNAPSHOT-bin/ flume
[hadoop@ruo91 ~]$ nano ~/.bash_profile
# Flume
export FLUME_HOME=$HOME/flume
export PATH=$PATH:$FLUME_HOME/bin
[hadoop@ruo91 ~]$ source ~/.bash_profile
[hadoop@ruo91 ~]$ cp $FLUME_HOME/conf/flume-env.sh.template $FLUME_HOME/conf/flume-env.sh
[hadoop@ruo91 ~]$ echo 'JAVA_HOME="$HOME/jdk"' >> $FLUME_HOME/conf/flume-env.sh
[hadoop@ruo91 ~]$ chmod +x $FLUME_HOME/conf/flume-env.sh
[hadoop@ruo91 ~]$ nano $FLUME_HOME/conf/yongbok.conf
YongbokAgent.sources = Yongbok
YongbokAgent.channels = MemoryChannel
YongbokAgent.sinks = HDFS
# For each one of the sources, the type is defined
YongbokAgent.sources.Yongbok.type = exec
YongbokAgent.sources.Yongbok.command = tail -F /storage/logs/mirror-access.json
# The channel can be defined as follows.
YongbokAgent.sources.Yongbok.channels = MemoryChannel
# Each sink's type must be defined
YongbokAgent.sinks.HDFS.type = hdfs
# example : hdfs://localhost:port/path/to/directory or /logs
YongbokAgent.sinks.HDFS.hdfs.path = /logs
#YongbokAgent.sinks.HDFS.hdfs.writeFormat = Text
YongbokAgent.sinks.HDFS.hdfs.batchSize = 1000
YongbokAgent.sinks.HDFS.hdfs.rollSize = 0
YongbokAgent.sinks.HDFS.hdfs.rollCount = 10000
#Specify the channel the sink should use
YongbokAgent.sinks.HDFS.channel = MemoryChannel
# Each channel's type is defined.
YongbokAgent.channels.MemoryChannel.type = memory
# Other config values specific to each type of channel (sink or source)
# can be defined as well.
# In this case, it specifies the capacity of the memory channel.
YongbokAgent.channels.MemoryChannel.capacity = 100000
YongbokAgent.channels.MemoryChannel.transactionCapacity = 10000
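The hdfs.path above is a bare path, so the sink falls back to the default file system from the Hadoop configuration. The HDFS sink also accepts a full NameNode URI and time-based escape sequences for partitioning output by date (the hostname and port below are hypothetical); escape sequences need a timestamp in the event header, which useLocalTimeStamp supplies:

```properties
# Write into a per-day directory on an explicit NameNode
YongbokAgent.sinks.HDFS.hdfs.path = hdfs://namenode:9000/logs/%Y-%m-%d
# Stamp events with the local time so the %Y-%m-%d escapes resolve
YongbokAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
```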
[hadoop@ruo91 ~]$ start-all.sh
[hadoop@ruo91 ~]$ flume-ng agent --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/yongbok.conf -Dflume.root.logger=DEBUG,console -n YongbokAgent
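With the agent running, events can be produced simply by appending lines to the file the exec source tails. A quick smoke test (the temporary path below is for illustration only; in the real setup you would append to /storage/logs/mirror-access.json from the configuration above):

```shell
#!/bin/sh
# Append a few JSON test events to the tailed log file.
# LOG points at a temporary file here; substitute the path
# configured for the exec source in yongbok.conf.
LOG=/tmp/mirror-access.json
for i in 1 2 3; do
  printf '{"seq":%d,"msg":"test event"}\n' "$i" >> "$LOG"
done
# Show what was written
cat "$LOG"
```

Each appended line is picked up by `tail -F`, turned into a Flume event, staged in the memory channel, and flushed to HDFS once batchSize or a roll condition is reached.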
5. Verify that the logs were stored
[hadoop@ruo91 ~]$ hadoop fs -ls -h /logs
You can also verify this from the HDFS administration page.
Thanks 😀
References