Understanding Elasticsearch Simple Contraction Synonyms

It appears you’re encountering an issue with simple contraction synonyms in your AWS Elasticsearch service (v6.2), where they seem to be acting as equivalent synonyms instead. Let’s delve into why this might be happening and how to ensure your synonyms function as intended.

You’re aiming to set up a simple contraction synonym so that a search for “vitamin” or “supplement” leads to results containing “peak performance,” as illustrated by your configuration:

"vitamin, supplement => peak performance"

However, the current behavior is that searching for “peak performance” also retrieves documents containing “vitamin” or “supplement,” which is characteristic of equivalent synonyms, not the simple contraction you’re aiming for.

According to the Elasticsearch documentation on explicit mappings:

Explicit mappings match any token sequence on the LHS of “=>” and replace with all alternatives on the RHS. These types of mappings ignore the expand parameter in the schema.
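To make the quoted rule concrete, here is a minimal sketch of what an explicit mapping does to a token stream. It is a toy model written for illustration, not Elasticsearch's implementation: the function names `parse_rule` and `apply_rule` are hypothetical, and the real filter also tracks token positions and offsets, which this sketch ignores.

```python
def parse_rule(rule):
    """Split an explicit mapping such as 'a, b => c d' into LHS and RHS alternatives."""
    lhs, rhs = rule.split("=>")

    def split_side(side):
        # Each comma-separated alternative becomes a list of whitespace-split tokens.
        return [alt.split() for alt in side.split(",") if alt.strip()]

    return split_side(lhs), split_side(rhs)


def apply_rule(tokens, rule):
    """Replace every LHS token sequence found in `tokens` with the RHS alternatives."""
    lhs_alternatives, rhs_alternatives = parse_rule(rule)
    out, i = [], 0
    while i < len(tokens):
        for lhs in lhs_alternatives:
            if tokens[i:i + len(lhs)] == lhs:
                # Toy simplification: emit RHS alternatives one after another.
                # The real filter stacks alternatives at the same position.
                for rhs in rhs_alternatives:
                    out.extend(rhs)
                i += len(lhs)
                break
        else:
            # No LHS alternative matched at this position: keep the token as-is.
            out.append(tokens[i])
            i += 1
    return out


RULE = "vitamin, supplement => peak performance"
print(apply_rule(["vitality", "vitamin", "d3"], RULE))
# -> ['vitality', 'peak', 'performance', 'd3']
print(apply_rule(["peak", "performance", "men"], RULE))
# -> ['peak', 'performance', 'men']
```

The mapping is one-way at the token level: “vitamin” is rewritten, while “peak performance” passes through untouched. Whether a search then looks one-way depends on where in the pipeline this rewriting happens.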

Despite setting "expand": "false" in your custom analyzer, the issue persists. Let’s examine your index configuration and query to pinpoint the cause.

Analyzer Settings:

    {
      "filter": {
        "english_keywords": {
          "keywords": [ "example" ],
          "type": "keyword_marker"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "synonyms_en": {
          "type": "synonym",
          "expand": "false",
          "synonyms": [
            "vitamin, supplement => peak performance",
            "soap => wash",
            "protein => access",
            "kids => koala"
          ]
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "custom_en": {
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "synonyms_en",
            "english_stop",
            "english_keywords",
            "english_stemmer"
          ],
          "tokenizer": "standard"
        }
      }
    }

POST Query Body:

To isolate the problem, the query searches only the name.regular.en subfield and returns only name.regular in _source.

    {
      "from": 0,
      "size": 10,
      "_source": [ "name.regular" ],
      "query": {
        "bool": {
          "must": {
            "multi_match": {
              "query": "peak performance",
              "fuzziness": "auto",
              "operator": "and",
              "type": "most_fields",
              "fields": [ "name.regular.en" ]
            }
          }
        }
      },
      "post_filter": {
        "term": { "channel_id": 1 }
      }
    }

Query Response (Unexpected):

The results include items with “vitamin” and “supplement” alongside “peak performance,” indicating equivalent synonym behavior instead of simple contraction.

    {
      "took": 7,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": {
        "total": 28,
        "max_score": 9.393171,
        "hits": [
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_2799_1", "_score": 9.393171, "_source": { "name": { "regular": "Vitality Vitamin D3" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_3068_1", "_score": 8.375387, "_source": { "name": { "regular": "Calmicid Antacid Supplement" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_861_1", "_score": 8.023193, "_source": { "name": { "regular": "Peak Performance Men Save $47.95" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_862_1", "_score": 7.4778767, "_source": { "name": { "regular": "Peak Performance Longevity 50+ Save $47.95" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_980_1", "_score": 7.229685, "_source": { "name": { "regular": "Peak Performance Brain Women Save $90.92" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_7798_1", "_score": 7.229685, "_source": { "name": { "regular": "Peak Performance Heart Men Save $93.92" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_2708_1", "_score": 6.8760505, "_source": { "name": { "regular": "Sei Bella Fortifying Vitamin Lotion" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_967_1", "_score": 6.7668524, "_source": { "name": { "regular": "Peak Performance Metabolic Pack Men Save $78.93" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_860_1", "_score": 6.45929, "_source": { "name": { "regular": "Peak Performance Women Save $47.95" } } },
          { "_index": "en_us_primary", "_type": "_doc", "_id": "product_592_1", "_score": 6.414261, "_source": { "name": { "regular": "Peak Performance Total Men Save $131.89" } } }
        ]
      }
    }

Possible Causes and Solutions

While your configuration seems correct based on the documentation for simple contraction synonyms, there are a few potential reasons why you might be observing this unexpected behavior:

  1. Analyzer Application: Ensure that the custom_en analyzer is correctly applied to the name.regular.en field in your index mapping. If the analyzer is not properly set for this field, the synonyms filter won’t be applied during indexing and querying. Double-check your index mappings to confirm the analyzer configuration.

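  2. Query Analyzer vs. Index Analyzer: Check which analyzer runs at index time and which runs at query time. A multi_match query analyzes its text with the field’s search_analyzer, which defaults to the field’s index-time analyzer. If custom_en (including synonyms_en) also runs at index time, every document containing “vitamin” or “supplement” is stored with the tokens for “peak performance” in their place, so a query for “peak performance” matches those documents even though the rule itself is one-way. This is the most likely explanation for the results above. To keep the mapping one-way at query time, index the field with an analyzer that omits synonyms_en, set custom_en as the field’s search_analyzer, and reindex the existing documents.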

  3. Synonym Format and Parsing: While less likely with the explicit “=>” syntax, ensure there are no hidden characters or formatting issues in your synonym rules that could be misinterpreting the contraction. Try simplifying the synonym rule to just "vitamin => peak performance" to see if the issue persists, ruling out any potential problems with the comma separation or the “supplement” term.

  4. Elasticsearch Version Specific Behavior: The documentation states that explicit mappings ignore the expand parameter, so "expand": "false" should have no effect on your “=>” rules. To rule it out on Elasticsearch 6.2, remove the setting from the synonym filter definition and re-test. If the results do not change, the parameter is not the cause.

  5. Testing with Analyze API: Use the Elasticsearch Analyze API to directly inspect how your analyzer processes terms. Analyze both “vitamin” and “peak performance” using the custom_en analyzer. This will show you the token stream and whether the synonyms are being applied as you expect during the analysis process. This is a critical step to debug analyzer behavior independently of queries. For example:

    POST your_index_name/_analyze
    {
      "analyzer": "custom_en",
      "text": "vitamin"
    }

    and

    POST your_index_name/_analyze
    {
      "analyzer": "custom_en",
      "text": "peak performance"
    }

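    Examine the output tokens. For “vitamin”, you should see the tokens for “peak performance” (possibly in stemmed form, since english_stemmer runs after the synonym filter) and no “vitamin” token. For “peak performance”, the tokens should pass through without any “vitamin” or “supplement” token being added. If both outputs look like this, the analyzer is contracting correctly and the unexpected matches come from how the documents were indexed.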
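One scenario worth checking against the points above is that the synonym filter also runs at index time. The sketch below is a toy in-memory model, not Elasticsearch: `analyze` and `search` are hypothetical helpers, stemming and stop words are left out, and the three product names are shortened from the response shown earlier. It only illustrates how index-time contraction alone can produce the observed matches.

```python
# One-way contraction table corresponding to "vitamin, supplement => peak performance".
CONTRACTIONS = {
    "vitamin": ["peak", "performance"],
    "supplement": ["peak", "performance"],
}


def analyze(text, use_synonyms):
    """Lowercase and whitespace-tokenize; optionally apply the contraction."""
    tokens = text.lower().split()
    if not use_synonyms:
        return tokens
    out = []
    for token in tokens:
        out.extend(CONTRACTIONS.get(token, [token]))
    return out


def search(docs, query, synonyms_at_index, synonyms_at_search):
    """Return docs whose indexed tokens contain every query token (operator: and)."""
    query_tokens = set(analyze(query, synonyms_at_search))
    return [
        doc for doc in docs
        if query_tokens <= set(analyze(doc, synonyms_at_index))
    ]


DOCS = ["Vitality Vitamin D3", "Calmicid Antacid Supplement", "Peak Performance Men"]

# Synonyms at index AND search time: "vitamin" docs were stored as "peak performance".
both = search(DOCS, "peak performance", True, True)
# Synonyms at search time only: stored tokens are the original words.
search_only = search(DOCS, "peak performance", False, True)
# Searching "vitamin" is rewritten to "peak performance" at query time.
vitamin_query = search(DOCS, "vitamin", False, True)

print(both)           # all three documents
print(search_only)    # ['Peak Performance Men']
print(vitamin_query)  # ['Peak Performance Men']
```

In this model, the same one-way rule looks two-way only when it is applied while indexing. With the rule applied at search time only, “peak performance” no longer pulls in the vitamin and supplement products, while “vitamin” still leads to the “peak performance” products.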

Refining Synonyms for Contraction Behavior

If, after these checks, the issue remains, consider a slightly different approach to ensure contraction behavior. While explicit mappings with “=>” should enforce contraction, in some scenarios, particularly with older versions or complex analysis chains, you might need to refine your strategy.

  • One-Way Synonyms: With the standard synonym filter, the explicit “=>” mapping is already the one-way form, so no separate option or plugin is needed. If the behavior still looks two-way, the cause is more likely where the filter runs than how the rule is written.

  • Careful Analyzer Order: Ensure the synonyms_en filter is placed before other filters like english_stemmer if stemming might be interfering with the synonym matching. In your current configuration, synonyms_en is correctly placed after lowercase and english_possessive_stemmer and before english_stop and english_stemmer, which is generally a good order.
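If the synonym filter turns out to be running at index time, one possible refinement is to split the analysis in two: an index-time analyzer without synonyms_en and a search-time analyzer with it. The sketch below only builds the JSON bodies for such a setup. The filter names come from the configuration above, `analyzer` and `search_analyzer` are standard mapping parameters, and everything else is an assumption: the analyzer name `custom_en_index` is hypothetical, and the subfield is assumed to be of type text because the original mapping is not shown.

```python
import json

# Filter chain of custom_en, copied from the analyzer settings above.
BASE_FILTERS = [
    "english_possessive_stemmer",
    "lowercase",
    "synonyms_en",
    "english_stop",
    "english_keywords",
    "english_stemmer",
]


def build_analyzers(filters, synonym_filter="synonyms_en"):
    """Return an index-time analyzer without synonyms and a search-time one with them."""
    return {
        # Hypothetical name: same chain as custom_en minus the synonym filter.
        "custom_en_index": {
            "tokenizer": "standard",
            "filter": [f for f in filters if f != synonym_filter],
        },
        # Unchanged search-time analyzer, synonyms included.
        "custom_en": {
            "tokenizer": "standard",
            "filter": list(filters),
        },
    }


# Assumed mapping for the name.regular.en subfield (type "text" is an assumption).
FIELD_MAPPING = {
    "type": "text",
    "analyzer": "custom_en_index",   # used when documents are indexed
    "search_analyzer": "custom_en",  # used when query text is analyzed
}

analyzers = build_analyzers(BASE_FILTERS)
print(json.dumps({"analyzer": analyzers}, indent=2))
print(json.dumps(FIELD_MAPPING, indent=2))
```

Documents that were indexed with the old analyzer already contain the contracted tokens, so they would need to be reindexed before a change like this shows any effect.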

By systematically checking these points and utilizing the Analyze API, you should be able to pinpoint why your simple contraction synonyms are behaving like equivalent synonyms and rectify the configuration to achieve the desired one-way mapping from “vitamin” and “supplement” to “peak performance”. Remember to test your changes thoroughly after each adjustment to confirm the intended behavior.
