HttpFS was created by Cloudera to replace HDFSProxy. It exposes HDFS through an HTTP REST API, so HDFS can be accessed easily over HTTP.
This article covers how to access HDFS using HttpFS together with Red Gate's HDFS Explorer tool.
(Tested: Apache Hadoop 2.2.0)
It is assumed that Hadoop is already installed.
1. Hadoop configuration
– core-site.xml
In the [userid] portion below, put an actual account that exists on the system. In this example, hadoop (the user that runs Hadoop) is used.
hadoop.proxyuser.[userid].hosts
hadoop.proxyuser.[userid].groups
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://your-cluster:port</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
– hdfs-site.xml
Enable WebHDFS. If it is not enabled, HDFS cannot be accessed this way.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/storage/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/storage/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!-- DFS block size -->
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <!-- WebHDFS -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
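Once dfs.webhdfs.enabled is in place and HDFS has been restarted, WebHDFS can be sanity-checked directly against the NameNode before HttpFS is involved. This is only a quick check, assuming the default NameNode web port for this release (50070) and the hadoop user from above:

hadoop@ruo91:~$ curl -i "http://your-cluster:50070/webhdfs/v1/?op=GETFILESTATUS&user.name=hadoop"

A 200 response with a FileStatus JSON body means WebHDFS itself is reachable.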
2. HttpFS configuration
– httpfs-site.xml
Specify the directory where the Hadoop configuration files are located.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- HttpFS -->
  <property>
    <name>httpfs.hadoop.config.dir</name>
    <value>/home/hadoop/2.2.0/etc/hadoop</value>
  </property>
</configuration>
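The HttpFS startup script also sources httpfs-env.sh from this same configuration directory (as visible in the startup output in step 3). If you need to bind HttpFS to a specific hostname or move it off the default ports, a minimal sketch of that file could look like the following; the hostname is a placeholder and the ports shown are the defaults:

# /home/hadoop/2.2.0/etc/hadoop/httpfs-env.sh (sketch; adjust to your environment)
export HTTPFS_HTTP_HOSTNAME=your-cluster-hostname   # hostname HttpFS binds to
export HTTPFS_HTTP_PORT=14000                       # HTTP port (default: 14000)
export HTTPFS_ADMIN_PORT=14001                      # embedded Tomcat admin port (default: 14001)
export HTTPFS_LOG=/home/hadoop/2.2.0/logs           # log directory
export HTTPFS_TEMP=/home/hadoop/2.2.0/temp          # temp directory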
3. Starting Hadoop and HttpFS
Start Hadoop.
hadoop@ruo91:~$ start-all.sh
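(Optional) You can confirm the daemons actually came up with jps, assuming the JDK's jps is on the PATH; on a single-node setup you would expect to see at least NameNode, DataNode and SecondaryNameNode in the output.

hadoop@ruo91:~$ jps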
Next, start HttpFS.
hadoop@ruo91:~$ httpfs.sh start
Setting HTTPFS_HOME:          /home/hadoop/2.2.0
Setting HTTPFS_CONFIG:        /home/hadoop/2.2.0/etc/hadoop
Sourcing:                     /home/hadoop/2.2.0/etc/hadoop/httpfs-env.sh
  setting HTTPFS_HTTP_HOSTNAME=your-cluster-hostname
Setting HTTPFS_LOG:           /home/hadoop/2.2.0/logs
Setting HTTPFS_TEMP:          /home/hadoop/2.2.0/temp
Setting HTTPFS_HTTP_PORT:     14000
Setting HTTPFS_ADMIN_PORT:    14001
Using   HTTPFS_HTTP_HOSTNAME: your-cluster-hostname
Setting CATALINA_BASE:        /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Setting HTTPFS_CATALINA_HOME: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Setting CATALINA_OUT:         /home/hadoop/2.2.0/logs/httpfs-catalina.out
Setting CATALINA_PID:         /tmp/httpfs.pid
Using CATALINA_OPTS:
Adding to CATALINA_OPTS:      -Dhttpfs.home.dir=/home/hadoop/2.2.0 -Dhttpfs.config.dir=/home/hadoop/2.2.0/etc/hadoop -Dhttpfs.log.dir=/home/hadoop/2.2.0/logs -Dhttpfs.temp.dir=/home/hadoop/2.2.0/temp -Dhttpfs.admin.port=14001 -Dhttpfs.http.port=14000 -Dhttpfs.http.hostname=your-cluster-hostname
Using CATALINA_BASE:   /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Using CATALINA_HOME:   /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Using CATALINA_TMPDIR: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat/temp
Using JRE_HOME:        /home/hadoop/jdk
Using CLASSPATH:       /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat/bin/bootstrap.jar
Using CATALINA_PID:    /tmp/httpfs.pid
Check that it is working.
hadoop@ruo91:~$ curl -i "http://your-cluster:14000/webhdfs/v1?op=gethomedirectory&user.name=hadoop"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=hadoop&p=hadoop&t=simple&e=1391172244468&s=9+3kbPD5KPnYKSezu+YoEmsSTJg="; Version=1; Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Date: Fri, 31 Jan 2014 02:44:04 GMT

{"Path":"\/user\/hadoop"}
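Beyond GETHOMEDIRECTORY, any WebHDFS operation can be issued against HttpFS the same way. The examples below are only illustrations with hypothetical paths (/user/hadoop and /user/hadoop/test.txt); substitute your own:

# List a directory
hadoop@ruo91:~$ curl -i "http://your-cluster:14000/webhdfs/v1/user/hadoop?op=LISTSTATUS&user.name=hadoop"

# Read a file (-L follows a redirect if one is returned)
hadoop@ruo91:~$ curl -i -L "http://your-cluster:14000/webhdfs/v1/user/hadoop/test.txt?op=OPEN&user.name=hadoop"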
4. Red Gate – HDFS Explorer
Download it from http://bigdata.red-gate.com/hdfs-explorer.html and install it.
After installation, launch it and enter the cluster address and the user ID specified in core-site.xml.
Once connected, you can browse HDFS as shown in the figure below; files can be uploaded and downloaded by drag & drop, and can also be deleted from within the program.
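For reference, the same kinds of operations HDFS Explorer performs through its GUI can also be issued to HttpFS as plain WebHDFS calls. The paths below are hypothetical examples:

# Create a directory
hadoop@ruo91:~$ curl -i -X PUT "http://your-cluster:14000/webhdfs/v1/user/hadoop/newdir?op=MKDIRS&user.name=hadoop"

# Delete a file
hadoop@ruo91:~$ curl -i -X DELETE "http://your-cluster:14000/webhdfs/v1/user/hadoop/test.txt?op=DELETE&user.name=hadoop"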