HttpFS was created by Cloudera as a replacement for HDFSProxy. It exposes an HTTP REST API, so HDFS can be accessed easily over HTTP.

This article covers how to access HDFS using HttpFS and Red Gate's HDFS Explorer tool.
(Tested: Apache Hadoop 2.2.0)

It assumes that Hadoop is already installed.

1. Hadoop configuration
– core-site.xml
Replace the [userid] part below with an account that actually exists on the system. In this example it is hadoop, the user that runs Hadoop.

hadoop.proxyuser.[userid].hosts
hadoop.proxyuser.[userid].groups

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://your-cluster:port</value>
    <final>true</final>
  </property>

  <!-- Allow the hadoop user to proxy requests on behalf of other users (used by HttpFS) -->
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>

  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
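The wildcard * lets the hadoop user impersonate requests coming from any host and any group, which is convenient for testing but very permissive. A more restrictive sketch, assuming HttpFS runs on a single host and that allowed users belong to the hadoop group (both values below are placeholders):

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>httpfs-host.example.com</value>
</property>

<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>hadoop</value>
</property>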

 

– hdfs-site.xml

Enable WebHDFS. If it is not enabled, HDFS cannot be accessed through the REST API.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/storage/hadoop/dfs/name</value>
    <final>true</final>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/storage/hadoop/dfs/data</value>
    <final>true</final>
  </property>

  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>

  <!-- DFS block size -->
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>

  <!-- WebHDFS -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
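After restarting the NameNode with this setting, you can check that WebHDFS itself responds before involving HttpFS. The command below is a quick sketch that assumes the NameNode web UI is on the default port 50070 (dfs.namenode.http-address); adjust it to your cluster.

hadoop@ruo91:~$ curl -i "http://your-cluster:50070/webhdfs/v1/?op=LISTSTATUS&user.name=hadoop"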

 

2. HttpFS configuration

– httpfs-site.xml

Point HttpFS to the directory that contains the Hadoop configuration files.

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
  <!-- HttpFS -->
  <property>
    <name>httpfs.hadoop.config.dir</name>
    <value>/home/hadoop/2.2.0/etc/hadoop</value>
  </property>
</configuration>
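The startup log in the next section also sources httpfs-env.sh from the same configuration directory. A minimal sketch of that file, assuming you only want to override the hostname and ports that appear in the log (the hostname is a placeholder):

export HTTPFS_HTTP_HOSTNAME=your-cluster-hostname
export HTTPFS_HTTP_PORT=14000
export HTTPFS_ADMIN_PORT=14001
export HTTPFS_LOG=${HTTPFS_HOME}/logs
export HTTPFS_TEMP=${HTTPFS_HOME}/temp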

 

3. Starting Hadoop and HttpFS

Start Hadoop:

hadoop@ruo91:~$ start-all.sh
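In Hadoop 2.x start-all.sh is deprecated; it simply calls start-dfs.sh and start-yarn.sh, so the daemons can also be started separately:

hadoop@ruo91:~$ start-dfs.sh
hadoop@ruo91:~$ start-yarn.sh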

 

Then start HttpFS:

hadoop@ruo91:~$ httpfs.sh start
Setting HTTPFS_HOME: /home/hadoop/2.2.0
Setting HTTPFS_CONFIG: /home/hadoop/2.2.0/etc/hadoop
Sourcing: /home/hadoop/2.2.0/etc/hadoop/httpfs-env.sh
setting HTTPFS_HTTP_HOSTNAME=your-cluster-hostname
Setting HTTPFS_LOG: /home/hadoop/2.2.0/logs
Setting HTTPFS_TEMP: /home/hadoop/2.2.0/temp
Setting HTTPFS_HTTP_PORT: 14000
Setting HTTPFS_ADMIN_PORT: 14001
Using HTTPFS_HTTP_HOSTNAME: your-cluster-hostname
Setting CATALINA_BASE: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Setting HTTPFS_CATALINA_HOME: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Setting CATALINA_OUT: /home/hadoop/2.2.0/logs/httpfs-catalina.out
Setting CATALINA_PID: /tmp/httpfs.pid

Using CATALINA_OPTS:
Adding to CATALINA_OPTS: -Dhttpfs.home.dir=/home/hadoop/2.2.0 -Dhttpfs.config.dir=/home/hadoop/2.2.0/etc/hadoop -Dhttpfs.log.dir=/home/hadoop/2.2.0/logs -Dhttpfs.temp.dir=/home/hadoop/2.2.0/temp -Dhttpfs.admin.port=14001 -Dhttpfs.http.port=14000 -Dhttpfs.http.hostname=your-cluster-hostname
Using CATALINA_BASE: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Using CATALINA_HOME: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat
Using CATALINA_TMPDIR: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat/temp
Using JRE_HOME: /home/hadoop/jdk
Using CLASSPATH: /home/hadoop/2.2.0/share/hadoop/httpfs/tomcat/bin/bootstrap.jar
Using CATALINA_PID: /tmp/httpfs.pid

 

Check that it is working:

hadoop@ruo91:~$ curl -i "http://your-cluster:14000/webhdfs/v1?op=gethomedirectory&user.name=hadoop"
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=hadoop&p=hadoop&t=simple&e=1391172244468&s=9+3kbPD5KPnYKSezu+YoEmsSTJg="; Version=1; Path=/
Content-Type: application/json
Transfer-Encoding: chunked
Date: Fri, 31 Jan 2014 02:44:04 GMT

{"Path":"\/user\/hadoop"}

 

4. Red Gate – HDFS Explorer

Download it from http://bigdata.red-gate.com/hdfs-explorer.html and install it.
Once installed, launch the program, enter the cluster address and the user ID configured in core-site.xml, and connect.

[Image: HDFS Explorer – Add an HDFS connection]

After connecting, you can browse HDFS as shown below, upload and download files by drag & drop, and delete files from within the program.

[Image: HDFS Explorer browsing HDFS]