Analyzer

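Stage 1: Character filter: zero or more character filters
Stage 2: Tokenizer: exactly one tokenizer
Stage 3: Token filter: zero or more token filters
Text goes through these three stages because Elasticsearch searches an inverted index: documents are looked up by term, so the text has to be broken into normalized terms before it is indexed or queried.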
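To make the inverted index idea concrete, here is a minimal sketch in Python. It is not how Elasticsearch or Lucene stores its index; the function name build_inverted_index and the two sample documents are made up for illustration, and "analysis" is reduced to lowercasing plus splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the sorted IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Stand-in for analysis: lowercase, then split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# Hypothetical sample documents.
docs = {1: "The road to success", 2: "The road to failure"}
index = build_inverted_index(docs)

print(index["road"])     # [1, 2]
print(index["success"])  # [1]
```

A search for "road" becomes a single dictionary lookup, but only if the query is normalized the same way as the indexed text. That is why the analysis steps matter.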
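Stage 1: Character filter
Converts the raw input text into the form needed for analysis, before any tokenization happens. The request below specifies no tokenizer, so the stripped text comes back as a single token.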
GET _analyze
{
  "char_filter": [
    "html_strip"
  ],
  "text": "<h3>The road to success and the road to failure are almost exactly the same.</h3>"
}
// Result
{
  "tokens" : [
    {
      "token" : """The road to success and the road to failure are almost exactly the same.""",
      "start_offset" : 0,
      "end_offset" : 81,
      "type" : "word",
      "position" : 0
    }
  ]
}
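As a rough Python approximation of what html_strip did to the text above: block-level tags are replaced with a newline, other tags are dropped, and HTML entities are decoded. The BLOCK_TAGS set and the regular expression are assumptions chosen for this sketch, not the rules the real filter uses, and the sketch does not track offsets.

```python
import html
import re

# Assumed set of block-level tags for this sketch only.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_html(text):
    """Replace block-level tags with a newline, drop other tags, decode entities."""
    def replace_tag(match):
        return "\n" if match.group(1).lower() in BLOCK_TAGS else ""

    return html.unescape(TAG_RE.sub(replace_tag, text))


stripped = strip_html("<h3>The road to success</h3>")
print(repr(stripped))  # '\nThe road to success\n'
```

The leading and trailing newlines are the reason the console prints the token in triple quotes in the result above.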
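Stage 2: Tokenizer
Splits the input into search-term tokens according to the configured rule. The whitespace tokenizer used here splits only on whitespace, so the last token keeps its period ("same.").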
GET _analyze
{
  "char_filter": [
    "html_strip"
  ],
  "tokenizer": "whitespace",
  "text": "<h3>The road to success and the road to failure are almost exactly the same.</h3>"
}
// Result
{
  "tokens" : [
    {
      "token" : "The",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "road",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "to",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "success",
      "start_offset" : 16,
      "end_offset" : 23,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "and",
      "start_offset" : 24,
      "end_offset" : 27,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "the",
      "start_offset" : 28,
      "end_offset" : 31,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "road",
      "start_offset" : 32,
      "end_offset" : 36,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "to",
      "start_offset" : 37,
      "end_offset" : 39,
      "type" : "word",
      "position" : 7
    },
    {
      "token" : "failure",
      "start_offset" : 40,
      "end_offset" : 47,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "are",
      "start_offset" : 48,
      "end_offset" : 51,
      "type" : "word",
      "position" : 9
    },
    {
      "token" : "almost",
      "start_offset" : 52,
      "end_offset" : 58,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "exactly",
      "start_offset" : 59,
      "end_offset" : 66,
      "type" : "word",
      "position" : 11
    },
    {
      "token" : "the",
      "start_offset" : 67,
      "end_offset" : 70,
      "type" : "word",
      "position" : 12
    },
    {
      "token" : "same.",
      "start_offset" : 71,
      "end_offset" : 76,
      "type" : "word",
      "position" : 13
    }
  ]
}
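The offsets in the result above point into the original text, tags included, which is why "The" starts at 4 rather than 0. The Python sketch below reproduces those numbers with a shortcut: it masks each tag with spaces of the same length before splitting on whitespace. This is an illustration only; the function name whitespace_tokenize is made up, and Elasticsearch keeps offsets correct in a different way.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace, reporting offsets relative to the original text."""
    # Mask tags with equal-length runs of spaces so offsets do not shift.
    masked = re.sub(r"<[^>]+>", lambda m: " " * len(m.group()), text)
    return [
        {
            "token": m.group(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        }
        for position, m in enumerate(re.finditer(r"\S+", masked))
    ]


text = "<h3>The road to success and the road to failure are almost exactly the same.</h3>"
tokens = whitespace_tokenize(text)

print(tokens[0])   # {'token': 'The', 'start_offset': 4, 'end_offset': 7, 'position': 0}
print(tokens[-1])  # {'token': 'same.', 'start_offset': 71, 'end_offset': 76, 'position': 13}
```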
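Stage 3: Token filter
Applies filters to the separated tokens to turn them into the terms that are actually used for search. Filters run in the order listed. Here "stop" runs before "lowercase", so the capitalized "The" is not recognized as a stop word and survives as "the" at position 0, while the lowercase "the", "to", "and" and "are" are removed.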
GET _analyze
{
  "char_filter": [
    "html_strip"
  ],
  "tokenizer": "whitespace",
  "filter": [
    "stop",
    "lowercase",
    "snowball"
  ],
  "text": "<h3>The road to success and the road to failure are almost exactly the same.</h3>"
}
// Result
{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "road",
      "start_offset" : 8,
      "end_offset" : 12,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "success",
      "start_offset" : 16,
      "end_offset" : 23,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "road",
      "start_offset" : 32,
      "end_offset" : 36,
      "type" : "word",
      "position" : 6
    },
    {
      "token" : "failur",
      "start_offset" : 40,
      "end_offset" : 47,
      "type" : "word",
      "position" : 8
    },
    {
      "token" : "almost",
      "start_offset" : 52,
      "end_offset" : 58,
      "type" : "word",
      "position" : 10
    },
    {
      "token" : "exact",
      "start_offset" : 59,
      "end_offset" : 66,
      "type" : "word",
      "position" : 11
    },
    {
      "token" : "same.",
      "start_offset" : 71,
      "end_offset" : 76,
      "type" : "word",
      "position" : 13
    }
  ]
}
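The effect of filter order can be checked with a small Python sketch. It assumes a case-sensitive stop filter with a reduced stop word list (STOP_WORDS below is a subset picked for this example) and leaves out stemming, so "failure" and "exactly" are not shortened the way snowball shortens them. The function names are made up for illustration.

```python
# Assumed subset of English stop words, matched case-sensitively.
STOP_WORDS = {"a", "an", "and", "are", "the", "to"}


def stop_filter(tokens):
    """Drop tokens whose term is a stop word; positions of the rest are kept."""
    return [(pos, term) for pos, term in tokens if term not in STOP_WORDS]


def lowercase_filter(tokens):
    """Lowercase every term."""
    return [(pos, term.lower()) for pos, term in tokens]


def apply_filters(tokens, filters):
    """Run the filters in the order given, like the "filter" array in the request."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


words = "The road to success and the road to failure are almost exactly the same.".split()
tokens = list(enumerate(words))

stop_first = apply_filters(tokens, [stop_filter, lowercase_filter])
lowercase_first = apply_filters(tokens, [lowercase_filter, stop_filter])

print([term for _, term in stop_first])
# ['the', 'road', 'success', 'road', 'failure', 'almost', 'exactly', 'same.']
print([term for _, term in lowercase_first])
# ['road', 'success', 'road', 'failure', 'almost', 'exactly', 'same.']
```

With "stop" first, the surviving positions are 0, 1, 3, 6, 8, 10, 11 and 13, the same as in the result above. Listing "lowercase" first removes the leading "the" as well.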
    
    
    