Glen Mazza's Weblog

https://glenmazza.net/blog/date/20181027 Saturday October 27, 2018

ElasticSearch Notes: Complex Date Filtering, Bulk Updates

Some things learned this past week with ElasticSearch:

Advanced Date Searches: An event search page my company provides for its Pro customers allows filtering by start date and end date; however, some events do not have an end date defined. We decided to apply different business rules for what the start and end dates of the filter are compared against, depending on whether or not the event has an end date. Specifically:

  • If an event has both start and end dates:
    1. The start date of the range filter, if provided, must be before the end date of the event
    2. The end date of the range filter, if provided, must be after the start date of the event
  • If an event does not have an end date:
    1. The start date of the range filter, if provided, must be before the start date of the event
    2. The end date of the range filter, if provided, must be after the start date of the event
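
The rules above can be sketched as a plain-Java predicate, which is handy for checking edge cases before writing any query. This is only an illustration: the class and method names are hypothetical, null stands for "not provided", and the comparisons are assumed to be inclusive.

```java
import java.time.LocalDate;

// Plain-Java model of the filtering rules; class and method names are hypothetical.
class EventDateRules {

    /**
     * Returns true when an event passes the date filter.
     * eventEnd, filterStart and filterEnd may be null (meaning "not provided").
     * Comparisons are inclusive, which is an assumption of this sketch.
     */
    static boolean matches(LocalDate eventStart, LocalDate eventEnd,
                           LocalDate filterStart, LocalDate filterEnd) {
        // Rule 2 is the same in both cases: the filter end must not precede the event start.
        boolean endOk = filterEnd == null || !filterEnd.isBefore(eventStart);

        // Rule 1 compares the filter start against the event's end date when there is one,
        // and against the event's start date otherwise.
        LocalDate reference = (eventEnd != null) ? eventEnd : eventStart;
        boolean startOk = filterStart == null || !filterStart.isAfter(reference);

        return startOk && endOk;
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.of(2018, 9, 1);
        LocalDate to = LocalDate.of(2018, 10, 1);

        // Event with an end date that overlaps the filter range: matches.
        System.out.println(matches(LocalDate.of(2018, 8, 1), LocalDate.of(2018, 12, 1), from, to));
        // Event without an end date that started before the filter range: does not match.
        System.out.println(matches(LocalDate.of(2018, 8, 1), null, from, to));
    }
}
```

Picking the reference date first keeps the two cases in one place, so the only difference between them is which event date the filter's start is compared against.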

The above business logic had to be implemented in Java, but as an intermediate step I first worked it out as an ElasticSearch query using Kibana. Creating the query first helps immensely in the subsequent conversion to code. For the ElasticSearch query, this is what I came up with (using arbitrary sample dates to test the queries):

GET events-index/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "bool": {
                        "must": [
                            { "exists": { "field": "eventMeta.dateEnd" }},
                            { "range": { "eventMeta.dateStart": { "lte": "2018-09-01" }}},
                            { "range": { "eventMeta.dateEnd": { "gte": "2018-10-01" }}}
                        ]
                    }
                },
                {
                    "bool": {
                        "must_not": { "exists": { "field": "eventMeta.dateEnd" }},
                        "must": [
                            { "range": { "eventMeta.dateStart": { "gte": "2018-01-01", "lte": "2019-12-31" }}}
                        ]
                    }
                }
            ]
        }
    }
}

As can be seen above, I first used a nested Bool query to separate the two main cases, namely events with and without an end date. The should clause of the top-level bool acts as an OR, indicating that documents fitting either situation are desired. I then added the additional date requirements that need to hold for each specific case.

With the query now available, mapping it to Java code using ElasticSearch's QueryBuilders (API) was pleasantly straightforward. The code maps roughly 1-to-1 to the above query (the capitalized constants in the code refer to the relevant field names in the documents):

private QueryBuilder createEventDatesFilter(DateFilter filter) {

    BoolQueryBuilder mainQuery = QueryBuilders.boolQuery();

    // query modeled as a "should" (OR), divided by events with and without an end date,
    // with different filtering rules for each.
    BoolQueryBuilder hasEndDateBuilder = QueryBuilders.boolQuery();
    hasEndDateBuilder.must().add(QueryBuilders.existsQuery(EVENT_END_DATE));
    hasEndDateBuilder.must().add(fillDates(EVENT_START_DATE, null, filter.getStop()));
    hasEndDateBuilder.must().add(fillDates(EVENT_END_DATE, filter.getStart(), null));
    mainQuery.should().add(hasEndDateBuilder);

    BoolQueryBuilder noEndDateBuilder = QueryBuilders.boolQuery();
    noEndDateBuilder.mustNot().add(QueryBuilders.existsQuery(EVENT_END_DATE));
    noEndDateBuilder.must().add(fillDates(EVENT_START_DATE, filter.getStart(), filter.getStop()));
    mainQuery.should().add(noEndDateBuilder);

    return mainQuery;
}
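
The fillDates helper is not shown above. Here is a minimal sketch of what such a helper might do, assuming that a null bound means "not provided" and that bounds are inclusive, as in the query. To stay free of dependencies it builds the range clause as a nested Map mirroring the JSON instead of using ElasticSearch's builder classes, so the String parameters and the return type are assumptions of the sketch and not the real helper's signature.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the fillDates helper, which the post does not show.
class RangeClauseSketch {

    /**
     * Builds {"range": {field: {"gte": from, "lte": to}}} as nested maps,
     * leaving out any bound that is null.
     */
    static Map<String, Object> fillDates(String field, String from, String to) {
        Map<String, Object> bounds = new LinkedHashMap<>();
        if (from != null) {
            bounds.put("gte", from);
        }
        if (to != null) {
            bounds.put("lte", to);
        }
        Map<String, Object> fieldClause = new LinkedHashMap<>();
        fieldClause.put(field, bounds);
        Map<String, Object> range = new LinkedHashMap<>();
        range.put("range", fieldClause);
        return range;
    }

    public static void main(String[] args) {
        // Only an upper bound, as in the "has end date" case for the start date field.
        System.out.println(fillDates("eventMeta.dateStart", null, "2018-09-01"));
        // Both bounds, as in the "no end date" case.
        System.out.println(fillDates("eventMeta.dateStart", "2018-01-01", "2019-12-31"));
    }
}
```

How the real helper treats a filter with neither date set is not stated in the post, so the sketch simply produces a range clause with no bounds in that case.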

Bulk Updates: We use a "sortDate" field to indicate the specific date front ends should use for sorting results (whether ascending or descending, and regardless of the actual source of the date used to populate that field). For our news stories we wanted the sortDate to be the last update date for stories that have been updated since they were first published, and the published date otherwise. For certain older records that had been loaded, it turned out that the sortDate was still at the publishedDate when it should have been set to the updateDate. For research I used the following query to determine the extent of the problem:

GET news-index/_search
{
   "query": {
      "bool": {
         "must": [
            { "exists": { "field": "meta.updateDate" }},
            {
               "script": {
                  "script": "doc['meta.dates.sortDate'].value.getMillis() < doc['meta.updateDate'].value.getMillis()"
               }
            }
         ]
      }
   }
}

For the above query I used a two-part Bool query: the first clause checks for a non-null updateDate, and the second is a script clause that finds sortDates preceding updateDates. (I found I needed to use .getMillis() for the inequality check to work.)

Next, I used ES' Update by Query API to do an all-at-once update of the records. The API has two parts, an optional query element to indicate the documents I wish to have updated (strictly speaking, in ES, to be replaced with a document with the requested changes) and a script element to indicate the modifications I want to have done to those documents. For my case:

POST news-index/_update_by_query
{
   "script": {
      "source": "ctx._source.meta.dates.sortDate = ctx._source.meta.updateDate",
      "lang": "painless"
   },
   "query": {
      "bool": {
         "must": [
            { "exists": { "field": "meta.updateDate" }},
            {
               "script": {
                  "script": "doc['meta.dates.sortDate'].value.getMillis() < doc['meta.updateDate'].value.getMillis()"
               }
            }
         ]
      }
   }
}

When running your own updates, it is good to test first by making a do-nothing update in the script (e.g., set sortDate to sortDate) and restricting it to just one document, which can be done by adding a document-specific match requirement to the filter query (e.g., { "match": { "id": "...." }}). Kibana should report that just one document was "updated". If so, switch to the desired update to confirm that the single record is updated properly, and then finally remove the match filter to have all desired documents updated.
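
The stale-record condition and the assignment used in the requests above can also be modeled in plain Java, for instance to unit test the rule without a cluster. This is a sketch under assumptions: the NewsDates class and the method names are hypothetical, and java.time.Instant stands in for whatever date type the documents actually use.

```java
import java.time.Instant;

// In-memory model of the sortDate check and fix; all names here are hypothetical.
class SortDateSketch {

    static class NewsDates {
        Instant publishedDate;
        Instant updateDate;   // null when the story was never updated
        Instant sortDate;

        NewsDates(Instant publishedDate, Instant updateDate, Instant sortDate) {
            this.publishedDate = publishedDate;
            this.updateDate = updateDate;
            this.sortDate = sortDate;
        }
    }

    // Same condition as the query: updateDate exists and sortDate precedes it.
    static boolean needsFix(NewsDates d) {
        return d.updateDate != null
                && d.sortDate.toEpochMilli() < d.updateDate.toEpochMilli();
    }

    // Same assignment as the update script: sortDate takes the value of updateDate.
    static void fix(NewsDates d) {
        d.sortDate = d.updateDate;
    }

    public static void main(String[] args) {
        Instant published = Instant.parse("2018-01-01T00:00:00Z");
        Instant updated = Instant.parse("2018-03-01T00:00:00Z");

        // An updated story whose sortDate was left at the published date.
        NewsDates stale = new NewsDates(published, updated, published);
        System.out.println("needs fix before: " + needsFix(stale));
        fix(stale);
        System.out.println("needs fix after: " + needsFix(stale));
    }
}
```

Because the fix makes the condition false, running it a second time selects nothing, which is the same property that makes the update-by-query request safe to repeat.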

https://glenmazza.net/blog/date/20181007 Sunday October 07, 2018

Using functions with a single generic method to convert lists

For converting from a Java collection, say List<Foo>, to any of several other collections List<Bar1>, List<Bar2>, ..., a single generic FooListToBarList method plus a series of Foo->Bar1, Foo->Bar2, ... converter functions is more succinct than separate FooListToBar1List, FooListToBar2List, ... methods. The example below converts a highly simplified List of SaleData objects to separate Lists of Customer and Product information, using a common generic saleDataListToItemList(saleDataList, converterFunction) method along with passed-in converter functions saleDataToCustomerWithRegion and saleDataToProduct. Of particular note is how the converter functions are specified in the saleDataListToItemList calls. Because saleDataToCustomerWithRegion takes two arguments (the SaleData object and a region String), it is wrapped in a lambda expression, while the Product converter can be passed as a simple method reference because it has only one parameter (the SaleData object).

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Main {

    public static void main(String[] args) {

        List<SaleData> saleDataList = new ArrayList<>();
        saleDataList.add(new SaleData("Bob", "radio"));
        saleDataList.add(new SaleData("Sam", "TV"));
        saleDataList.add(new SaleData("George", "laptop"));

        // two-argument converter: wrap it in a lambda that supplies the region
        List<Customer> customerList = saleDataListToItemList(saleDataList, sd -> Main.saleDataToCustomerWithRegion(sd, "Texas"));
        System.out.println("Customers: ");
        customerList.forEach(System.out::println);

        // one-argument converter: a method reference is enough
        List<Product> productList = saleDataListToItemList(saleDataList, Main::saleDataToProduct);
        System.out.println("Products: ");
        productList.forEach(System.out::println);
    }

    private static <T> List<T> saleDataListToItemList(List<SaleData> sdList, Function<SaleData, T> converter) {
        // handling potentially null sdList:  https://stackoverflow.com/a/43381747/1207540
        return Optional.ofNullable(sdList).map(List::stream).orElse(Stream.empty()).map(converter).collect(Collectors.toList());
    }

    private static Product saleDataToProduct(SaleData sd) {
        return new Product(sd.getProductName());
    }

    private static Customer saleDataToCustomerWithRegion(SaleData sd, String region) {
        return new Customer(sd.getCustomerName(), region);
    }

    private static class SaleData {
        private String customerName;
        private String productName;

        SaleData(String customerName, String productName) {
            this.customerName = customerName;
            this.productName = productName;
        }

        String getProductName() {
            return productName;
        }

        String getCustomerName() {
            return customerName;
        }

    }

    private static class Product {
        private String name;

        Product(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return "Product{" +
                    "name='" + name + '\'' +
                    '}';
        }
    }

    private static class Customer {
        private String name;
        private String region;

        Customer(String name, String region) {
            this.name = name;
            this.region = region;
        }

        @Override
        public String toString() {
            return "Customer{" +
                    "name='" + name + '\'' +
                    ", region='" + region + '\'' +
                    '}';
        }
    }

}

Output from running:

Customers: 
Customer{name='Bob', region='Texas'}
Customer{name='Sam', region='Texas'}
Customer{name='George', region='Texas'}
Products: 
Product{name='radio'}
Product{name='TV'}
Product{name='laptop'}
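
As a variation on the lambda used for the two-argument converter, the extra argument can be bound by a small factory method that returns a one-argument Function. The sketch below is not part of the original example: the names withRegion and convertList are hypothetical, and plain Strings stand in for SaleData and Customer to keep it short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Variation on the example: bind the extra argument with a factory method.
// Strings stand in for SaleData (input) and Customer (output).
class ConverterFactorySketch {

    // Returns a one-argument converter with the region already bound.
    static Function<String, String> withRegion(String region) {
        return name -> name + " (" + region + ")";
    }

    // Generic list conversion driven by the passed-in converter function.
    static <S, T> List<T> convertList(List<S> source, Function<S, T> converter) {
        return source.stream().map(converter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Bob", "Sam", "George");
        // The call site reads like the method-reference case: no inline lambda needed.
        List<String> customers = convertList(names, withRegion("Texas"));
        customers.forEach(System.out::println);
    }
}
```

Whether this reads better than an inline lambda is a matter of taste. It pays off mainly when the same bound converter is reused in several calls.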