ElasticSearch

A summary of ElasticSearch, a distributed search engine built on Lucene.

Homepage : http://www.elasticsearch.org/, http://elasticsearch.kr/

Manual : http://www.elasticsearch.org/guide/
Korean morphological analyzer : https://github.com/chanil1218/elasticsearch-analysis-korean
Korean user community : https://www.facebook.com/groups/elasticsearch.kr/

Download : http://www.elasticsearch.org/download/, https://github.com/elasticsearch/elasticsearch/

Arirang download : https://lucenekorean.svn.sourceforge.net/svnroot/lucenekorean/

License : Apache 2.0
Platform : Java

ElasticSearch Overview

Architecture

ElasticSearch Architecture

_index : index 이름
_type : type 이름
_id : Document ID
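_score : relevance score of the document for the query
_source : the original document as stored (JSON)
properties : field definitions of the type (mapping)

field name (field)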

type : string

Basic Concepts

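Term	Description
Cluster	A set of nodes, identified by a unique name
Node	A server that is part of a cluster
Index	A collection of documents with similar characteristics, made up of terms, counts, and docs. Similar to a database in a DBMS
Shard	A partition of an index; each shard is a Lucene index that holds part of the data
Replica	A copy of a shard; keeps the data available if a node fails and can also serve search requests
Type	A logical category/partition within an index. Similar to a table in a DBMS
Mapping	Similar to a table schema in a DBMS
Document	The basic unit of stored information, expressed in JSON (JavaScript Object Notation). Similar to a record in a DBMS
Field	An item of a document, consisting of a name and a value
Gateway	Stores persistent information such as the cluster state and index settings
Query	A search request
TermQuery	One kind of query
Term	An item of a query
Token	An element that makes up a query term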

Installing ElasticSearch on CentOS

Installing ElasticSearch

Requires JDK 1.7 or later.

cd install
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.0.tar.gz
tar -xvzf elasticsearch-1.2.0.tar.gz
chown -R hduser:hdgroup elasticsearch-1.2.0
mv elasticsearch-1.2.0 /nas/appl/elasticsearch

Environment settings

vi ~hduser/.bash_profile

### ----------------------------------------------------------------------------
###     ELASTICSEARCH settings
### ----------------------------------------------------------------------------
export ELASTICSEARCH_HOME=/nas/appl/elasticsearch
export PATH=$PATH:$ELASTICSEARCH_HOME/bin

ElasticSearch configuration

Create the data and log directories

cd /nas/appl/elasticsearch
mkdir data
mkdir logs
chown hduser:hdgroup data logs

vi /nas/appl/elasticsearch/config/elasticsearch.yml

cluster.name: elasticsearch
node.name: "node201"
path.data: /nas/appl/elasticsearch/data
path.logs: /nas/appl/elasticsearch/logs
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node201:9300"]    # transport port (9300), not the HTTP port (9200)
bootstrap.mlockall: true

Start and verify the service

su - hduser
elasticsearch                                 #--- run in the foreground
elasticsearch -d                              #--- run as a daemon

curl localhost:9200                           #--- check the service
http://node201.hadoop.com:9200/               #--- check the service (browser)
http://node201.hadoop.com:9200/_status
http://node201.hadoop.com:9200/_plugin/head/  #--- only if the elasticsearch-head plugin is installed

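Installing an ElasticSearch load balancer

ElasticSearch load balancer configuration

A node with node.master: false and node.data: false holds no data and is never elected master. It only receives requests and forwards them to the data nodes, so it can act as a load balancer (client node).

vi /nas/appl/elasticsearch/config/elasticsearch.yml

node.master: false
node.data: false
network.bind_host: 192.168.0.1

Install plugins for the load balancer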

plugin -install mobz/elasticsearch-head
plugin -install lukas-vlcek/bigdesk

Install the Korean morphological analyzer plugin

plugin -install chanil1218/elasticsearch-analysis-korean/1.3.0
#plugin -url https://dl-web.dropbox.com/spa/grpekzky9x5y6mc/elastic-analysis-korean/public/elasticsearch-analysis-korean-1.3.0.zip -install analysis-korean

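Configuration

Environment variables

Environment variables read by bin/elasticsearch
JAVA_OPTS
ES_JAVA_OPTS, ES_HEAP_SIZE (ES_HEAP_SIZE sets both ES_MIN_MEM and ES_MAX_MEM)
ES_MIN_MEM=256m, ES_MAX_MEM=1g (defaults)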

Ways to configure

In the configuration file

vi /nas/appl/elasticsearch/config/elasticsearch.yml
  index:
    store:
      type: memory

With a command-line option

elasticsearch -Des.index.store.type=memory

Through the REST API (per index, when the index is created)

curl -XPUT 'node201.hadoop.com:9200/customer/' -d '
  index:
    store:
      type: memory
'

Check file descriptors

Check max_file_descriptors in the output of:

curl 'node201.hadoop.com:9200/_nodes/process?pretty'

memory settings : disable swap

Temporarily (until the next reboot)

swapoff -a

Permanently

vi /etc/fstab
  #--- comment out the swap entries

Through ElasticSearch settings (lock the JVM memory in RAM)

ulimit -l unlimited            #--- run as root
mkdir /tmp/tmpJna
vi config/elasticsearch.yml
   bootstrap.mlockall: true
elasticsearch -Djna.tmpdir=/tmp/tmpJna

Running as a service

Configuration variables

ES_USER, ES_GROUP
ES_HEAP_SIZE, ES_HEAP_NEWSIZE, ES_DIRECT_SIZE
MAX_OPEN_FILES
MAX_LOCKED_MEMORY, MAX_MAP_COUNT
LOG_DIR, DATA_DIR, WORK_DIR
CONF_DIR, CONF_FILE
ES_JAVA_OPTS, RESTART_ON_UPGRADE

#--- init script: /etc/init.d/elasticsearch
#--- variable settings: /etc/sysconfig/elasticsearch
/sbin/chkconfig --add elasticsearch

Korean Analysis for ElasticSearch

Installation

bin/plugin -install chanil1218/elasticsearch-analysis-korean/1.3.0
#--- or install from a URL
bin/plugin -url https://dl-web.dropbox.com/spa/grpekzky9x5y6mc/elastic-analysis-korean/public/elasticsearch-analysis-korean-1.3.0.zip -install analysis-korean

Files created after installation

/nas/appl/elasticsearch/plugins/analysis-korean/elasticsearch-analysis-korean-1.3.0.jar

Verifying the plugin

Delete the korea index

curl -XDELETE  'node201.hadoop.com:9200/korea?pretty'

Create the korea index

#curl -XDELETE 'node201.hadoop.com:9200/korea?pretty'
curl -XPUT 'node201.hadoop.com:9200/korea?pretty' -d '{
  "settings": { 
    "index": {
      "analysis": {
        "analyzer": {
          "kr_analyzer": {
            "type": "org.elasticsearch.index.analysis.KoreanAnalyzerProvider",
            "tokenizer": "KoreanTokenizer",
            "filter": [ "trim", "lowercase", "KoreanFilter" ]
          }   
        }   
      }  
    }   
  } 
}'

Test the KoreanAnalyzer (the text parameter must be URL-encoded; the commented-out line shows the original Korean text)

### curl -XGET 'node201.hadoop.com:9200/korea/_analyze?pretty&analyzer=kr_analyzer&text=이전 글에서 ElasticSearch와 Arirang 형태소 분석기를 살펴 보았습니다.'
curl -XGET 'node201.hadoop.com:9200/korea/_analyze?pretty&analyzer=kr_analyzer&text=%EC%9D%B4%EC%A0%84%20%EA%B8%80%EC%97%90%EC%84%9C%20ElasticSearch%EC%99%80%20Arirang%20%ED%98%95%ED%83%9C%EC%86%8C%20%EB%B6%84%EC%84%9D%EA%B8%B0%EB%A5%BC%20%EC%82%B4%ED%8E%B4%20%EB%B3%B4%EC%95%98%EC%8A%B5%EB%8B%88%EB%8B%A4.'


http://node201.hadoop.com:9200/korea/_analyze?pretty&analyzer=kr_analyzer&text=이전%20글에서%20ElasticSearch와%20Arirang%20형태소%20분석기를%20살펴%20보았습니다.
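The text parameter must be percent-encoded as UTF-8, as in the encoded curl command above. The following is a minimal Java sketch that builds such a URL using only the JDK. AnalyzeUrl is a hypothetical helper name. java.net.URLEncoder performs HTML form encoding, which turns a space into +, so the sketch rewrites + as %20 to match the command above. A literal + in the input is already encoded as %2B at that point, so the rewrite does not affect it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the /{index}/_analyze URL with a UTF-8,
// percent-encoded "text" parameter.
final class AnalyzeUrl {
    private AnalyzeUrl() {}

    // URLEncoder uses form encoding (space -> "+"); a literal "+" is already
    // "%2B" at this point, so replacing "+" with "%20" only touches spaces.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static String build(String hostPort, String index, String analyzer, String text) {
        return "http://" + hostPort + "/" + index + "/_analyze?pretty"
                + "&analyzer=" + encode(analyzer)
                + "&text=" + encode(text);
    }
}
```

For example, AnalyzeUrl.build("node201.hadoop.com:9200", "korea", "kr_analyzer", sentence) produces a URL that can be passed to curl or a browser unchanged.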

Structure of elasticsearch-analysis-korean-1.3.0.jar

es-plugin.properties

plugin=org.elasticsearch.plugin.analysis.kr.AnalysisKoreanPlugin

packages

org.apache.lucene.analysis.kr
   dic/
   KoreanAnalyzer.java
   KoreanFilter.java
   KoreanTokenizer.java
 
org.apache.solr.analysis.kr
   KoreanFilterFactory.java
   KoreanTokenizerFactory.java

org.elasticsearch.index.analysis
  KoreanAnalysisBinderProcessor.java
  KoreanAnalyzerProvider.java
  KoreanFilterFactory.java
  KoreanTokenizerFactory.java

org.elasticsearch.plugin.analysis.kr
  AnalysisKoreanPlugin.java

Call structure

AnalysisKoreanPlugin.java : registers KoreanAnalysisBinderProcessor with the AnalysisModule
KoreanAnalysisBinderProcessor.java : registers the analyzer, tokenizer, and filter

KoreanAnalyzerProvider.java (kr_analyzer) -> KoreanAnalyzer
KoreanTokenizerFactory.java (kr_tokenizer) -> KoreanTokenizer
KoreanFilterFactory.java (kr_filter) -> KoreanFilter

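Directory structure

home : path.home
bin
config : path.conf : configuration files
data : path.data : data files

index.store.distributor : least_used (default), random : how files are spread across multiple data paths (least_used picks the path with the most free space)

path.data: ["/mnt/first", "/mnt/second"]

work : path.work : temporary working files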
logs : path.logs
plugins : path.plugins

Setting up a Java development environment

ElasticSearch Java setup

Download elasticsearch-1.2.1.zip from the ElasticSearch download site.

lib/elasticsearch-1.2.1.jar

Download elasticsearch-master.zip from the ElasticSearch GitHub site.

Use the source files under src/main/java/.

Lucene Java setup

Click the "DOWNLOAD" button on the Lucene site and download lucene-4.8.1.zip.

core/lucene-core-4.8.1.jar

Click the "DOWNLOAD" button on the Lucene site and download lucene-4.8.1-src.tgz.

Use the source files under core/src/java/.

Arirang Java setup

Check out the source from the SVN repository.

Check out arirang.morph first and run mvn install on it.

Dictionary setup and usage

File: Arirang 사전.zip (Arirang dictionary) http://www.jopenbusiness.com/mediawiki/images/5/54/Arirang_사전.zip

REST API

Basic structure

REST API format

http://node201.hadoop.com:9200/index/type/id
http://node201.hadoop.com:9200/[index/][type/]action
curl -X<REST Verb> <Node>:<Port>/<Index>/<Type>/<ID>

curl 'node201.hadoop.com:9200/index/type/id'
curl 'node201.hadoop.com:9200/[index/][type/]action'

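To specify several indices, types, or IDs, separate them with ","; wildcards (*) are allowed.
Common parameters

pretty : pretty-print the JSON response, if there is one
v : verbose; show the header row (_cat APIs)
help : list the available columns
h=column1,column2 : headers; show only these columns
bytes=b : show byte values as plain numbers (1024 instead of 1kb)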
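The URL pattern above can also be assembled in code. The following is a minimal sketch using a hypothetical helper class, RestPath, which is not part of any ElasticSearch client. Indices and types are joined with commas, and parameters without a value (such as pretty or v) are written as bare flags.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds http://host:port/[index/][type/]action?params
final class RestPath {
    private RestPath() {}

    static String build(String hostPort, List<String> indices, List<String> types,
                        String action, Map<String, String> params) {
        StringBuilder sb = new StringBuilder("http://").append(hostPort);
        // Several indices or types are joined with ","; wildcards such as "cust*" pass through.
        // A type is only meaningful after an index, as in the URL pattern above.
        if (!indices.isEmpty()) sb.append('/').append(String.join(",", indices));
        if (!types.isEmpty()) sb.append('/').append(String.join(",", types));
        if (action != null && !action.isEmpty()) sb.append('/').append(action);
        // A null value means a bare flag such as "pretty" or "v".
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(sep).append(p.getKey());
            if (p.getValue() != null) sb.append('=').append(p.getValue());
            sep = '&';
        }
        return sb.toString();
    }
}
```

Passing a LinkedHashMap for params keeps the parameters in insertion order.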

curl REST verbs (curl -X???)

Create (POST, PUT)
  Create the customer index
    curl -XPUT 'node201.hadoop.com:9200/customer?pretty'
    curl -XGET 'node201.hadoop.com:9200/_cat/indices?v'
  Add a document of type external (the document ID is generated automatically)
    curl -XPOST 'node201.hadoop.com:9200/customer/external?pretty' -d '{ "name": "Mountain Lover" }'
    curl -XGET 'node201.hadoop.com:9200/customer/external/1lz2jL6CQui07FnZGd_R9w?pretty'    #--- use the _id returned by the previous command
  Add document 1 of type external
    curl -XPUT 'node201.hadoop.com:9200/customer/external/1?pretty' -d '{ "name": "Mountain Lover" }'
    curl -XGET 'node201.hadoop.com:9200/customer/external/1?pretty'

Update (POST, PUT)
  Partially update document 1 (only the fields inside "doc" are changed or added)
    curl -XPOST 'node201.hadoop.com:9200/customer/external/1/_update?pretty' -d '{ "doc": { "name": "Mountain Lover!", "age": 20 } }'
    curl -XGET 'node201.hadoop.com:9200/customer/external/1?pretty'
  Replace document 1 (PUT re-indexes the whole document)
    curl -XPUT 'node201.hadoop.com:9200/customer/external/1?pretty' -d '{ "name": "Mountain Lover!" }'
    curl -XGET 'node201.hadoop.com:9200/customer/external/1?pretty'

Delete (DELETE)
  Delete a document
    curl -XDELETE 'node201.hadoop.com:9200/customer/external/1?pretty'
    curl -XGET 'node201.hadoop.com:9200/customer/external/1?pretty'
  Delete the documents that match a query
    curl -XDELETE 'node201.hadoop.com:9200/customer/external/_query?pretty' -d '{ "query": { "match": { "name": "Mountain Lover!" } } }'
    curl -XGET 'node201.hadoop.com:9200/customer/external/1?pretty'
  Delete the customer index
    curl -XDELETE 'node201.hadoop.com:9200/customer?pretty'
    curl -XGET 'node201.hadoop.com:9200/_cat/indices?v'

Read (GET)
  curl -XGET 'node201.hadoop.com:9200/_cat/indices?v'
  curl -XGET 'node201.hadoop.com:9200/customer/external/1?pretty'
  Structure of the returned data
    _index
    _type
    _id
    _version : 1, 2, 3, ...
    _source : { name1: value1, name2: value2 }

Search (GET, POST)
  REST request URI
    curl -XGET 'node201.hadoop.com:9200/customer/_search?q=*&pretty'
  REST request body
    curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '{ "query": { "match_all": {} } }'

Document API

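index api : -XPUT : index a document with the given ID, -XPOST : index a document with an automatically generated ID

/_create : same as ?op_type=create

?op_type=create : fail if a document with that ID already exists

?routing=~ : choose the target shard by hashing this value instead of the document ID

?version=n
?parent=~
?timestamp=2014-11-15T14%3A12%3A12
?ttl=34 : time to live (milliseconds)
?consistency=one, quorum, all
?replication=async, sync
?refresh=true
?timeout=5m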
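The routing parameter works by hashing the routing value (the document _id by default) and taking the result modulo the number of primary shards. Documents with the same routing value therefore always land on the same shard. The following is a minimal sketch of that idea. It uses String.hashCode() purely for illustration, which is an assumption: ElasticSearch uses its own internal hash function, so the shard numbers produced here will not match a real cluster.

```java
// Illustrative only: shard = hash(routing) % number_of_primary_shards.
// String.hashCode() stands in for ElasticSearch's internal hash function.
final class RoutingSketch {
    private RoutingSketch() {}

    static int shardFor(String routing, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("number of primary shards must be > 0");
        }
        // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }
}
```

This is also why the number of primary shards cannot be changed after an index is created: changing the divisor would send existing routing values to different shards.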

get api : -XGET

?fields=~,~
?routing=~ : choose the target shard by hashing this value instead of the document ID

?version=n
?realtime=false
/_source : return only the _source field (-XHEAD can also be used)
?_source=false : do not return the _source field

?_source_include, _source_exclude

?preference=_primary, _local, ~
?refresh=true

delete api : -XDELETE

?routing=~ : choose the target shard by hashing this value instead of the document ID

?version=n
?parent=~
?consistency=one, quorum, all
?replication=async, sync
?refresh=true
?timeout=5m

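update api : -XPOST

/_update
Using the "script" field

ctx._source.fieldname

"script" : "ctx._source.counter += count",
"params" : {                           #--- arguments passed to the script
  "count" : 4
}

ctx._source.remove(\"text\")           #--- remove a field
ctx._source.tags.contains(tag) ? (ctx.op = \"delete\") : (ctx.op = \"none\")    #--- delete the document if it has the tag, otherwise do nothing
if (ctx._source.tags.contains(tag)) { ctx.op = \"none\" } else { ctx._source.tags += tag }    #--- add the tag only if it is missing

"upsert" : document to index if the document does not exist yet (if it exists, the script or doc is applied instead)
"doc_as_upsert": true : use "doc" as the upsert document (update the document if it exists, index "doc" if it does not)
?routing=~ : choose the target shard by hashing this value instead of the document ID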
?parent=~
?replication=async, sync
?timeout=5m
?consistency=one, quorum, all
?refresh=true
?fields=~,~
?version=n
?version_type
?timestamp=2014-11-15T14%3A12%3A12

ctx._timestamp

?ttl=34 : time to live (milliseconds)

ctx._ttl
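The difference between a partial "doc" update, "upsert", and "doc_as_upsert" can be modeled with an in-memory map. The following is a minimal sketch under that assumption. UpdateSketch is hypothetical and ignores scripts, versions, and nested objects: ElasticSearch merges nested objects recursively, while this sketch merges only top-level fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the _update API semantics (top-level fields only).
final class UpdateSketch {
    final Map<String, Map<String, Object>> store = new HashMap<>();

    // doc: partial document to merge; upsert: document to index when the id is
    // missing (may be null); docAsUpsert: when true, doc itself is the upsert document.
    void update(String id, Map<String, Object> doc, Map<String, Object> upsert, boolean docAsUpsert) {
        Map<String, Object> existing = store.get(id);
        if (existing != null) {
            existing.putAll(doc);                   // document exists: merge the partial doc
        } else if (docAsUpsert) {
            store.put(id, new HashMap<>(doc));      // missing + doc_as_upsert: index doc as-is
        } else if (upsert != null) {
            store.put(id, new HashMap<>(upsert));   // missing + upsert: index the upsert document
        } else {
            // ElasticSearch reports a document-missing error in this case.
            throw new IllegalStateException("document missing: " + id);
        }
    }
}
```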

multi get api : -XGET, /_mget

curl -XGET 'node201.hadoop.com:9200/_mget' -d '{
  "docs": [
    {
      "_index": "~",
      "_type": "~",
      "_id": "~"
    }
  ]
}'

curl -XGET 'node201.hadoop.com:9200/customer/external/_mget' -d '{
  "ids": ["~", "~"]
}'

"_source"

"_source": false
"_source": [ "field1", "field2" ]
"_source": {
  "include": [ "~" ],
  "exclude": [ "~", "~" ]
}

"fields": [ "~", "~" ]
"_routing": "~"

bulk api : /_bulk

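requests file

{ "index": { "_index": "~", "_type": "~", "_id": "~" } }
{ "field1": "value1" }
{ "update": { "_index": "~", "_type": "~", "_id": "~" } }
{ "doc": { "field1": "value1" }, "doc_as_upsert": true }

Supported actions: index, create, update, delete (delete has no source line). The update action also accepts the upsert, doc_as_upsert, script, params, and lang parameters. The file must contain only JSON lines (no comments) and must end with a newline.

bulk api

curl -s -XPOST 'node201.hadoop.com:9200/_bulk' --data-binary @requests    #--- --data-binary keeps the newlines (-d @file strips them)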

_version, _routing, _parent, _timestamp, _ttl, _consistency
?refresh=true
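The bulk body is newline-delimited JSON. Each action line may be followed by a source line, and the body must end with a newline. The following is a minimal sketch of building such a body. BulkBody is a hypothetical helper that takes the source as an already-serialized JSON string (no JSON library is assumed). It does not escape index, type, or id values, so they are assumed to contain no quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for a _bulk request body (newline-delimited JSON).
final class BulkBody {
    private final List<String> lines = new ArrayList<>();

    // index action: action line followed by the document source line.
    BulkBody index(String index, String type, String id, String sourceJson) {
        lines.add("{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\",\"_id\":\"" + id + "\"}}");
        lines.add(sourceJson);
        return this;
    }

    // delete action: action line only, no source line.
    BulkBody delete(String index, String type, String id) {
        lines.add("{\"delete\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\",\"_id\":\"" + id + "\"}}");
        return this;
    }

    String build() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');  // every line, including the last, ends with \n
        }
        return sb.toString();
    }
}
```

Write the result to a file and send it with curl --data-binary @file, as above.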

delete by query api : -XDELETE, /_query

curl -XDELETE 'node201.hadoop.com:9200/customer/external/_query?q=user:~'
#--- q : query
#--- df : default field
#--- analyzer : query analyzer
#--- default_operator : OR (default), AND

curl -XDELETE 'node201.hadoop.com:9200/customer/external/_query' -d '{
  "query": {
    "term": { "user": "~" }
  }
}'

?routing=~ : choose the target shard by hashing this value instead of the document ID
?replication=async, sync
?consistency=one, quorum, all

bulk udp api

Settings

bulk.udp.enabled: true
bulk.udp.bulk_actions: 1000
bulk.udp.bulk_size: 5m                 #--- 5 MB
bulk.udp.flush_interval: 5s
bulk.udp.concurrent_requests: 4
bulk.udp.host:                         #--- defaults to the value of network.host
bulk.udp.port: 9700-9800
bulk.udp.receive_buffer_size: 10mb

Usage

cat requests | nc -w 0 -u node201.hadoop.com 9700

term vectors api

curl -XGET 'node201.hadoop.com:9200/customer/external/1/_termvector?pretty=true&fields=~,~'

multi termvectors api

curl -XGET 'node201.hadoop.com:9200/_mtermvectors' -d '{
  "docs": [
    {
      "_index": "~",
      "_type": "~",
      "_id": "~",
      "term_statistics": true
    }
  ]
}'

Search API

REST request uri search

q : Query String Query
analyzer
default_operator : OR (default), AND
_source = false, _source_include, _source_exclude
df : default field
fields : fields to return
sort : field:asc, field:desc
explain
track_scores=true : compute and return scores even when sorting on a field
timeout
from : offset of the first hit to return (0, 1, 2, ...)
size : number of hits to return (default 10)
search_type : query_then_fetch (default), dfs_query_then_fetch, dfs_query_and_fetch, query_and_fetch, count, scan
lowercase_expanded_terms
analyze_wildcard : false (default), true
scroll=5m
preference : _primary, _primary_first, _local, _only_node:xyz, _prefer_node:xyz, _shards:2,3

curl -XGET 'node201.hadoop.com:9200/customer/_search?q=*&pretty'
curl -XGET 'node201.hadoop.com:9200/customer/_search?pretty&q=user:kimchi'

REST request body search

_all : reserved name that refers to all indices
?routing=~ : search only the shard(s) selected by hashing this value

"from" : 0
"size" : 10
"sort" : [ { "post_date" : {"order" : "asc"}}, "_score" ]
"_source": false
"_source": { "include": [ "obj1.*", "obj2.*" ], "exclude": [ "*.description" ] }
"fields" : ["user", "postDate"]
"script_fields" : new fields computed by a script
"fielddata_fields" : ["test1", "test2"]
"post_filter" : { "term" : { "tag" : "green" } }
"highlight" : add highlighted snippets to the results
"rescore" : re-score the top hits with an additional query
"explain": true
"version": true

curl -XGET 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } },
  "from": 10,
  "size": 10,
  "_source": [ "account_number", "balance" ]
}'
#--- sort : by balance, descending
#--- from : skip the first 10 hits
#--- size : return 10 hits
#--- _source : return only these fields
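from and size work as an offset and a page length, so page n of a result list starts at from = (n - 1) * size. The following is a minimal sketch using a hypothetical Paging helper that renders the request body as a plain string. For deep pages, every shard still has to collect from + size hits, which is why the scan and scroll search types are intended for walking large result sets.

```java
// Hypothetical helper: converts a 1-based page number into the "from"/"size"
// values of a search request body; "from" is the number of hits to skip.
final class Paging {
    private Paging() {}

    static int from(int page, int size) {
        if (page < 1 || size < 1) throw new IllegalArgumentException("page and size must be >= 1");
        return (page - 1) * size;
    }

    static String body(int page, int size) {
        return "{\"query\":{\"match_all\":{}},\"from\":" + from(page, size) + ",\"size\":" + size + "}";
    }
}
```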

query

curl -XGET 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": { 
    "term": { "user": "kimchi" }
  }
}'

match_all query

curl -XGET 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": { "match_all": {} }
}'

match query

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": { "match": { "account_number": 20 } }
}'

match_phrase query

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": { "match_phrase": { "address": "mill lane" } }
}'

bool query

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}'
#--- "must" : AND, "should" : OR, "must_not" : NOT (excludes documents that match any of its clauses)
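The bool query combines clauses like a boolean expression. Every must clause has to match and no must_not clause may match. When there are no must clauses, at least one should clause has to match. The following is a minimal sketch of just that matching logic, using Java predicates over hypothetical in-memory records. It ignores scoring and analysis; "match" is approximated by a crude case-insensitive word check, which is an assumption and not how analyzers work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of bool-query matching (no scoring, no analyzers).
final class BoolSketch {
    final List<Predicate<Map<String, String>>> must = new ArrayList<>();
    final List<Predicate<Map<String, String>>> should = new ArrayList<>();
    final List<Predicate<Map<String, String>>> mustNot = new ArrayList<>();

    boolean matches(Map<String, String> doc) {
        for (Predicate<Map<String, String>> p : must) if (!p.test(doc)) return false;    // AND
        for (Predicate<Map<String, String>> p : mustNot) if (p.test(doc)) return false;  // NOT
        if (must.isEmpty() && !should.isEmpty()) {                                        // OR
            for (Predicate<Map<String, String>> p : should) if (p.test(doc)) return true;
            return false;
        }
        return true;
    }

    // Crude stand-in for a single-word match query: lowercases and splits on whitespace.
    static Predicate<Map<String, String>> matchWord(String field, String word) {
        return doc -> {
            String value = doc.get(field);
            if (value == null) return false;
            return Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+"))
                    .contains(word.toLowerCase(Locale.ROOT));
        };
    }
}
```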

filtered query

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}'
#--- range filter : keeps documents with 20000 <= balance <= 30000 (gte/lte are inclusive); filters do not affect the score

aggregation

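curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state"
      }
    }
  }
}'
#--- size 0 : return only the aggregation results, no hits
#--- group_by_state : number of documents per state (the 10 states with the most documents)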

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}'
#--- group_by_state : average balance per state, buckets sorted by average_balance in descending order
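The second aggregation corresponds roughly to SELECT state, AVG(balance) FROM customer GROUP BY state ORDER BY AVG(balance) DESC, limited to the top 10 buckets (the default terms size). The following is a minimal in-memory sketch with Java streams over hypothetical Account records. It only illustrates what the buckets contain, not how ElasticSearch computes them across shards, where terms bucket counts can be approximate.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of: terms(state) ordered by avg(balance) desc.
final class AvgByState {
    static final class Account {
        final String state;
        final double balance;
        Account(String state, double balance) { this.state = state; this.balance = balance; }
    }

    static Map<String, Double> averageBalanceByState(List<Account> accounts, int size) {
        // One bucket per state, with the average balance as the sub-aggregation value.
        Map<String, Double> avg = accounts.stream()
                .collect(Collectors.groupingBy(a -> a.state, Collectors.averagingDouble(a -> a.balance)));
        // Order buckets by the sub-aggregation, descending, and keep the top "size".
        return avg.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(size)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (x, y) -> x, LinkedHashMap::new));
    }
}
```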

search shards api : returns the nodes and shards a search request would be executed on

curl -XGET 'node201.hadoop.com:9200/customer/_search_shards'

search template : build the search request from a template (Mustache {{...}} placeholders filled in from params)

curl -XGET 'node201.hadoop.com:9200/customer/_search/template?pretty' -d '{
  "template" : {
    "query": { "match" : { "{{my_field}}" : "{{my_value}}" } },
    "size" : "{{my_size}}"
  },
  "params" : {
    "my_field" : "foo",
    "my_value" : "bar",
    "my_size" : 5
  }
}'
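Search templates use Mustache: each {{name}} placeholder in the template is replaced with the value of name from params before the query runs. The following is a minimal sketch of that substitution for simple placeholders only (no sections, no nesting). TemplateSketch is hypothetical and is not the Mustache engine ElasticSearch uses. Unlike Mustache, which renders a missing variable as an empty string, this sketch fails on a missing parameter.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified {{name}} substitution (no sections, no escaping rules).
final class TemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{\\s*([\\w.]+)\\s*\\}\\}");

    private TemplateSketch() {}

    static String render(String template, Map<String, ?> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = params.get(m.group(1));
            if (value == null) throw new IllegalArgumentException("missing param: " + m.group(1));
            // quoteReplacement stops "$" or "\" in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```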

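facets

Aggregation and statistics over the search results (facets are deprecated as of ElasticSearch 1.0 in favor of aggregations)

terms : statistics on the terms of a field

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty=true' -d '{
  "query" : { ~ },
  "facets": { "myFacet": { "terms": { "field": "~" } } }
}'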

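Facet scope (global setting)

main (default) : computed over the documents matched by the query
global : computed over all documents in the searched indices, ignoring the query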

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty=true' -d '{
  "facets": {
    "myFacets": {
      "terms": { "field": "~" },
      "global": true
    }
  }
}'

facet filter

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty=true' -d '{
  "facets": {
    "myFacets": {
      "terms": { "field": "~" },
      "facet_filter": {
        "term": { "user": "kimchi" }
      }
    }
  }
}'

terms facet : returns the most frequent terms (10 by default)

"all_terms" : true                          (also return terms that match no documents, with count 0)
"exclude" : ["term1", "term2"]
"regex" : "_regex expression here_", "regex_flags" : "DOTALL"
"script" : "term + 'aaa'"                   (transform each term)
"script" : "term == 'aaa' ? true : false"   (return false to drop a term)
"script_field" : "_source.my_field"

curl -XPOST 'node201.hadoop.com:9200/customer/_search?pretty=true' -d '{
  "query" : { ~ },
  "facets": {
    "myFacet": {
      "terms": {
        "field": "~",
        "size": 10,
        "order": "count"
      }
    }
  }
}'
#--- order : count (default), term, reverse_count, reverse_term

APIs

_cat API : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cat.html
  Cluster health check : http://node201.hadoop.com:9200/_cat/health?v
  Node information : http://node201.hadoop.com:9200/_cat/nodes?v
  Index information : http://node201.hadoop.com:9200/_cat/indices?v, http://node201.hadoop.com:9200/_cat/indices/indexname?v
  Master information : http://node201.hadoop.com:9200/_cat/master?v
  Shards information : http://node201.hadoop.com:9200/_cat/shards?v, http://node201.hadoop.com:9200/_cat/shards/indexname?v
  Alias information : http://node201.hadoop.com:9200/_cat/aliases?v
  Disk allocation per node : http://node201.hadoop.com:9200/_cat/allocation?v
  Document count of the whole cluster : http://node201.hadoop.com:9200/_cat/count?v
  Document count of an index : http://node201.hadoop.com:9200/_cat/count/indexname?v
  Field data loaded per node : http://node201.hadoop.com:9200/_cat/fielddata?v, http://node201.hadoop.com:9200/_cat/fielddata/field1,field2?v, http://node201.hadoop.com:9200/_cat/fielddata?v&fields=field1,field2
  Pending tasks information : http://node201.hadoop.com:9200/_cat/pending_tasks?v
  Plugin information : http://node201.hadoop.com:9200/_cat/plugins?v
  Recovery information : http://node201.hadoop.com:9200/_cat/recovery?v
  Thread pool information : http://node201.hadoop.com:9200/_cat/thread_pool?v

_nodes API : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster.html
  Ways to select nodes
    _nodes/_local : the local node
    _nodes/IP1,IP2
    _nodes/nodename
    _nodes/nodeattribute

_cluster API : http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-health.html
  Cluster health check : http://node201.hadoop.com:9200/_cluster/health?pretty=true, http://node201.hadoop.com:9200/_cluster/health/index1,index2?pretty=true

References

Basic API

Node

curl -X GET http://node201.hadoop.com:9200/_status                     #--- check status

Index management (database)

_all : applies to all indices

curl -X POST http://node201.hadoop.com:9200/index001                   #--- create the index
curl -X DELETE http://node201.hadoop.com:9200/index001                 #--- delete the index

curl -X GET http://node201.hadoop.com:9200/index001/_mapping           #--- get the mapping
curl -X GET http://node201.hadoop.com:9200/index001/_status            #--- check status
curl -X GET http://node201.hadoop.com:9200/index001/_search            #--- search
curl -X GET http://node201.hadoop.com:9200/_all/_search                #--- search all indices

Type management (table)

#--- index a document into a new type; the _id is generated automatically
curl -X POST http://node201.hadoop.com:9200/index001/type001 -d '{ "title": "Greeting", "body": "Hello World!" }'
curl -X DELETE http://node201.hadoop.com:9200/index001/type001         #--- delete the type (its mapping and documents)

curl -X GET http://node201.hadoop.com:9200/index001/type001/_mapping   #--- get the mapping
curl -X GET http://node201.hadoop.com:9200/index001/type001/_status    #--- check status
curl -X GET http://node201.hadoop.com:9200/index001/type001/_search    #--- search
http://node201.hadoop.com:9200/index001/type001/_search?q=title:Gre*ting

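Mapping management (table schema)

#--- create the mapping
#--- an existing field cannot be changed to a different mapping (e.g. analyzed -> not_analyzed); create it before indexing documents
curl -X PUT http://node201.hadoop.com:9200/index001/type001/_mapping -d '{
  "type001": {
    "properties": {
      "title": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}'
curl -X GET http://node201.hadoop.com:9200/index001/type001/_mapping   #--- get the mapping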

Document management (record)

#--- create document data001
curl -X POST http://node201.hadoop.com:9200/index001/type001/data001 -d '{ "title": "Greeting", "body": "Hello World!" }'
#--- partially update document data001 (the changed fields go inside "doc")
curl -X POST http://node201.hadoop.com:9200/index001/type001/data001/_update -d '{ "doc": { "title": "Greeting", "body": "Hello World!" } }'
curl -X DELETE http://node201.hadoop.com:9200/index001/type001/data001 #--- delete document data001

curl -X GET http://node201.hadoop.com:9200/index001/type001/data001    #--- get document data001
#--- search documents
curl -X GET http://node201.hadoop.com:9200/index001/type001/_search -d '{ "query": { "match": { "_all": "Hello" } } }'

Search

q : query string, fieldName:fieldValue
default_operator=OR : default operator, AND or OR
fields=_source : fields to return
sort : sort order, field:asc, field:desc
timeout : search timeout; no timeout by default
size=10 : number of hits to return

http://node201.hadoop.com:9200/index001/type001/_search?q=title:Gre*ting
curl -X POST http://node201.hadoop.com:9200/index001/type001/_search -d '{ "query": { "term": { "title": "Greeting" } } }'
curl -X POST http://node201.hadoop.com:9200/index001/type001/_search -d '{ "query": { "bool": { "must": { "match": { "title": "Greeting" } } } } }'

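Prefix query

The rewrite parameter controls how the prefix is expanded into the matching terms:

scoring_boolean : rewrite into a boolean query and compute a score for each term
constant_score_boolean : like scoring_boolean, but no scores are computed
constant_score_filter : rewrite into a filter; no scores are computed
top_terms_n : like scoring_boolean, but only the top n scoring terms are kept
top_terms_boost_n : like top_terms_n, but only the boost is used as the score

curl -X GET 'http://node201.hadoop.com:9200/index001/type001/_search?pretty' -d '{
  "query": {
    "prefix": {
      "name": "j",
      "rewrite": "constant_score_boolean"
    }
  }
}'
#--- searches for terms in name that start with j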

Rescore

{
  "fields" : ["title", "available"],
  
  "query" : {
    "match_all" : {}
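  },
  
  "rescore" : {
    "query" : {
      "rescore_query" : {
        "function_score" : {
          "query" : {
            "match_all" : {}
          },
          "script_score" : {
            "script" : "doc['year'].value"
          }
        }
      }
    }
  }
}

(custom_score was removed in ElasticSearch 1.0; function_score with script_score replaces it)

window_size : number of top hits on each shard to rescore
query_weight : weight of the original query score (default 1)
rescore_query_weight : weight of the rescore query score (default 1)
score_mode : total (default), max, min, avg, multiply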

total : original_query_score * query_weight + rescore_query_score * rescore_query_weight
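The score_mode values differ only in how they combine two quantities for each hit in the rescore window: the weighted original score and the weighted rescore score. The following is a minimal sketch of that combination. It assumes both weights are applied before combining, as the total formula shows, and that avg is the arithmetic mean of the two weighted scores. RescoreSketch is a hypothetical name, not an ElasticSearch class.

```java
// Hypothetical sketch of combining the original and rescore scores for one hit.
final class RescoreSketch {
    enum ScoreMode { TOTAL, MULTIPLY, AVG, MAX, MIN }

    private RescoreSketch() {}

    static double combine(double originalScore, double rescoreScore,
                          double queryWeight, double rescoreQueryWeight, ScoreMode mode) {
        double primary = originalScore * queryWeight;          // weighted original score
        double secondary = rescoreScore * rescoreQueryWeight;  // weighted rescore score
        switch (mode) {
            case TOTAL:    return primary + secondary;
            case MULTIPLY: return primary * secondary;
            case AVG:      return (primary + secondary) / 2.0;
            case MAX:      return Math.max(primary, secondary);
            case MIN:      return Math.min(primary, secondary);
            default:       throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```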

Bulk API

Index, update, and delete documents in bulk

curl -XPOST 'node201.hadoop.com:9200/customer/external/_bulk?pretty' -d '
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
'

curl -XPOST 'node201.hadoop.com:9200/customer/external/_bulk?pretty' -d '
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
'

Bulk indexing from the documents.json file

curl -XPOST 'http://node201.hadoop.com:9200/customer/external/_bulk?pretty' --data-binary @documents.json

Multi Get

 curl http://node201.hadoop.com:9200/library/book/_mget?fields=title -d '{
  "ids" : [1,3]
}'

MultiSearch

curl http://node201.hadoop.com:9200/library/books/_msearch?pretty --data-binary '
  { "type" : "book" }
  { "filter" : { "term" : { "year" : 1936} }}
  { "search_type": "count" }
  { "query" : { "match_all" : {} }}
  { "index" : "library-backup", "type" : "book" }
  { "sort" : ["year"] }
'

Sort

{
  "query" : {
    "terms" : {
      "title" : [ "crime", "front", "punishment" ],
      "minimum_match" : 1
    }
  },
  "sort" : [
    { "section" : "desc" }
  ]
}
#--- for multi-valued fields, "mode" chooses which value to sort on, e.g.
#---   { "release_dates" : { "order" : "asc", "mode" : "min" } }
#---   mode : min, max, avg, sum

Indexing data

REST API

You can test with curl (http://curl.haxx.se).

curl  -XPUT  http://localhost:9200/blog/article/1  -d  '{~}'

bulk API
UDP bulk API
river plugin

User Query DSL

Lucene Query language

TF/IDF (Term Frequency / Inverse Document Frequency)

Document boost
Field boost
Coord
Inverse document frequency
Length norm
Term frequency
Query norm

q : Query
d : Document
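The factors listed above are computed by Lucene 4.x's default similarity (DefaultSimilarity, a TFIDFSimilarity) roughly as follows:

- tf = sqrt(freq)
- idf = 1 + ln(numDocs / (docFreq + 1))
- lengthNorm = fieldBoost / sqrt(number of terms in the field)
- coord = matching query terms / total query terms
- queryNorm = 1 / sqrt(sum of squared weights)

The following is a minimal sketch of these formulas as plain functions. It omits the index-time encoding of norms into a single byte, so real scores are slightly less precise. The class name is hypothetical.

```java
// Sketch of the Lucene 4.x DefaultSimilarity factors (norm byte-encoding omitted).
// q = query, d = document, t = term.
final class TfIdfSketch {
    private TfIdfSketch() {}

    static double tf(double freq) { return Math.sqrt(freq); }

    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    static double lengthNorm(int numTermsInField, double fieldBoost) {
        return fieldBoost / Math.sqrt(numTermsInField);
    }

    static double coord(int matchingQueryTerms, int totalQueryTerms) {
        return matchingQueryTerms / (double) totalQueryTerms;
    }

    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // One term's contribution in the practical scoring function:
    // tf(t in d) * idf(t)^2 * boost(t) * norm(t, d); the sum over all query terms
    // is then multiplied by coord(q, d) * queryNorm(q).
    static double termScore(double freq, long docFreq, long numDocs, double queryBoost, double norm) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * queryBoost * norm;
    }
}
```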

Query type

custom_boost_factor
constant_score
custom_score

JAVA API

Client

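import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.node.Node;

public class EsClient {
	private static final String CLUSTER_NAME = "elasticsearch";   //--- must match cluster.name in elasticsearch.yml
	private static final String HOST = "node201.hadoop.com";
	private static final int PORT = 9300;                        //--- transport port, not the HTTP port 9200

	private Client client;
	private Node node;

	//--- TransportClient: connects to the cluster remotely without joining it
	private Boolean getTransportClient() {
		Settings settings = ImmutableSettings.settingsBuilder().put("cluster.name", CLUSTER_NAME).build();
		client = new TransportClient(settings).addTransportAddress(new InetSocketTransportAddress(HOST, PORT));
		return true;
	}

	//--- Node client: joins the cluster as a client node that holds no data
	//--- (local(true) would only discover nodes inside the same JVM)
	//--- elasticsearch.yml
	//---   cluster.name=~
	private Boolean getNodeClient() {
		node = nodeBuilder().clusterName(CLUSTER_NAME).client(true).node();
		client = node.client();
		return true;
	}
}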

index java api

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

import org.elasticsearch.action.index.IndexResponse;

//--- client : the Client created above; jsonBuilder() and string() can throw IOException
String json = jsonBuilder().startObject()
        .field("name", "value")
        .endObject().string();
IndexResponse res = client.prepareIndex("index", "type", "id").setSource(json).execute().actionGet();

System.out.println("_index : " + res.getIndex());
System.out.println("_type : " + res.getType());
System.out.println("_id : " + res.getId());
System.out.println("_version : " + res.getVersion());

get java api

Administrator manual

elasticsearch.yml

index.query.bool.max_clause_count : maximum number of clauses allowed in a bool query (default 1024)

Troubleshooting

Running out of heap memory

vi /nas/appl/elasticsearch/bin/elasticsearch.in.sh

#ES_MIN_MEM=256m
#ES_MAX_MEM=1g

ES_MIN_MEM=4g
ES_MAX_MEM=4g

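Errors when many clients connect and the process runs out of file descriptors

Error message

org.elasticsearch.common.netty.channel.ChannelException: Failed to create a selector.
Caused by: java.io.IOException: Too many open files

Fix

ulimit -n                                  #--- check the current limit
vi  /etc/security/limits.conf              #--- raise the limit for hduser, then log in again and restart ElasticSearch
    hduser soft nofile 999999
    hduser hard nofile 999999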

References

https://www.found.no/tag/Elasticsearch/

http://jjeong.tistory.com/

http://cafe.naver.com/korlucene

Helloworld naver

POSTAG_SEJONG/K

https://www.found.no/foundation/writing-a-plugin/

ElasticSearch (http://guruble.wordpress.com/tag/elasticsearch/)

http://blog.naver.com/PostView.nhn?blogId=sung487&logNo=10164948506

MeCab (written in C++)

https://bitbucket.org/eunjeon/mecab-ko-lucene-analyzer/raw/master/elasticsearch-analysis-mecab-ko/ (latest)
https://github.com/bibreen/mecab-ko-lucene-analyzer (older version)

https://github.com/bibreen/mecab-ko-lucene-analyzer/tree/master/elasticsearch-analysis-mecab-ko
