ES Floating-Point Values Being Recognized as Strings

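I was recently preparing to upgrade ES from 5.1.2 to 6.2.4, which meant adjusting the API calls in ESPersistor, our utility class that writes Kafka data into ES. In the 5.1.2 Java API, IndexRequest.source(String source) is used to set the JSON string to be written, but this method was removed in 6.2.4. The available replacements are the following (source has many other overloads, but they are outside the scope of this post):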

IndexRequest.source(String source, XContentType xContentType)
IndexRequest.source(Map source, XContentType contentType)

The first option works fine: passing XContentType.JSON gives exactly the same result as writing with the previous version.

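The second option produced some rather strange behavior. If the JSON string contains a floating-point value, such as {"value": 0.1}, the value is not written to ES as a float but as text. If the field value did not exist before, ES automatically creates it with type text, and from then on you can no longer run numeric computations on value. So why is a floating-point number treated as a string? Let's look at the code: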

public IndexRequest source(Map source, XContentType contentType) throws ElasticsearchGenerationException {
    try {
        XContentBuilder builder = XContentFactory.contentBuilder(contentType);
        builder.map(source);
        return source(builder);
    } catch (IOException e) {
        throw new ElasticsearchGenerationException("Failed to generate [" + source + "]", e);
    }
}

The Map source argument is actually converted into an XContentBuilder for processing. Next, look at builder.map(source):

private XContentBuilder map(Map<String, ?> values, boolean ensureNoSelfReferences) throws IOException {
    if (values == null) {
        return nullValue();
    }

    // checks that the map does not contain references to itself because
    // iterating over map entries will cause a stackoverflow error
    if (ensureNoSelfReferences) {
        CollectionUtils.ensureNoSelfReferences(values);
    }

    startObject();
    for (Map.Entry<String, ?> value : values.entrySet()) {
        field(value.getKey());
        // pass ensureNoSelfReferences=false as we already performed the check at a higher level
        unknownValue(value.getValue(), false);
    }
    endObject();
    return this;
}
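To see why the self-reference check has to run before the entries are walked, here is a simplified sketch in plain JDK Java. It is not Elasticsearch's CollectionUtils; the class and method names are hypothetical, and only maps are checked for brevity. It tracks the maps on the current path in an identity-based set, so a map that contains itself is rejected instead of recursing forever, while the same child reachable from two sibling keys is still accepted. The identity set matters: calling hashCode() on a HashMap that contains itself would overflow the stack on its own.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified sketch of a self-reference check before walking a map (not Elasticsearch's CollectionUtils).
class SelfReferenceCheckDemo {
    static void ensureNoSelfReferences(Object value) {
        // Identity-based set: uses ==, never hashCode(), which would itself recurse on a cyclic HashMap
        Set<Object> ancestors = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(value, ancestors);
    }

    // ancestors holds only the maps on the current path, so shared (non-cyclic) children are allowed
    private static void walk(Object value, Set<Object> ancestors) {
        if (!(value instanceof Map)) {
            return;
        }
        if (!ancestors.add(value)) {
            throw new IllegalArgumentException("map references itself");
        }
        for (Object child : ((Map<?, ?>) value).values()) {
            walk(child, ancestors);
        }
        ancestors.remove(value);
    }

    public static void main(String[] args) {
        Map<String, Object> shared = new LinkedHashMap<>();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("a", shared);
        doc.put("b", shared); // same child under two keys: not a cycle
        ensureNoSelfReferences(doc);
        System.out.println("doc accepted");

        Map<String, Object> loop = new LinkedHashMap<>();
        loop.put("self", loop);
        try {
            ensureNoSelfReferences(loop);
        } catch (IllegalArgumentException e) {
            System.out.println("loop rejected: " + e.getMessage());
        }
    }
}
```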

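map() first checks whether the JSON (map) contains any self-references, then iterates over all entries and writes each key/value pair into the XContentBuilder. Now let's see how a value is written, in unknownValue(value.getValue(), false):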

private void unknownValue(Object value, boolean ensureNoSelfReferences) throws IOException {
    if (value == null) {
        nullValue();
        return;
    }
    Writer writer = WRITERS.get(value.getClass());
    if (writer != null) {
        writer.write(this, value);
    } else if (value instanceof Path) {
        //Path implements Iterable<Path> and causes endless recursion and a StackOverFlow if treated as an Iterable here
        value((Path) value);
    } else if (value instanceof Map) {
        map((Map<String,?>) value, ensureNoSelfReferences);
    } else if (value instanceof Iterable) {
        value((Iterable<?>) value, ensureNoSelfReferences);
    } else if (value instanceof Object[]) {
        values((Object[]) value, ensureNoSelfReferences);
    } else if (value instanceof Calendar) {
        value((Calendar) value);
    } else if (value instanceof ReadableInstant) {
        value((ReadableInstant) value);
    } else if (value instanceof BytesReference) {
        value((BytesReference) value);
    } else if (value instanceof ToXContent) {
        value((ToXContent) value);
    } else {
        // This is a "value" object (like enum, DistanceUnit, etc) just toString() it
        // (yes, it can be misleading when toString a Java class, but really, jackson should be used in that case)
        value(Objects.toString(value));
    }
}

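It checks the type of value. For standard types that ES knows about, it looks up the corresponding Writer in WRITERS by the value's exact runtime class and writes with it; for Float, for example, it calls (builder, value) -> builder.value((Float) value). For the other listed types, it calls the matching value overload. If none of the listed types match, the value is treated as a string.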
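The lookup can be reduced to a small model in plain JDK Java. This is not the actual XContentBuilder, and the registered classes below are a hypothetical subset of WRITERS: writers are found by the value's exact runtime class, and anything without a registered writer falls through to toString() and is emitted as a JSON string. BigDecimal extends Number, but because it is not a key in the map, it lands in the fallback.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Simplified model of the exact-class writer lookup in XContentBuilder.unknownValue.
// Not Elasticsearch code: the registered classes are a hypothetical subset.
class WriterDispatchDemo {
    static final Map<Class<?>, BiConsumer<StringBuilder, Object>> WRITERS = new HashMap<>();

    static {
        WRITERS.put(Double.class, (sb, v) -> sb.append(v));
        WRITERS.put(Float.class, (sb, v) -> sb.append(v));
        WRITERS.put(Integer.class, (sb, v) -> sb.append(v));
        WRITERS.put(Long.class, (sb, v) -> sb.append(v));
        // Strings are quoted (no escaping here; illustration only)
        WRITERS.put(String.class, (sb, v) -> sb.append('"').append(v).append('"'));
    }

    // Writes a single value as a JSON fragment
    static String writeValue(Object value) {
        StringBuilder sb = new StringBuilder();
        // Exact runtime class, like WRITERS.get(value.getClass()): other Number subclasses do not match
        BiConsumer<StringBuilder, Object> writer = WRITERS.get(value.getClass());
        if (writer != null) {
            writer.accept(sb, value);
        } else {
            // Fallback: unknown types go through toString() and become a quoted JSON string
            sb.append('"').append(value).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeValue(0.1));                   // 0.1
        System.out.println(writeValue(new BigDecimal("0.1"))); // "0.1"
    }
}
```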

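Debugging showed that when the JSONObject {"value": 0.1} reached this point, value.getClass() was actually BigDecimal, which matches none of the listed types, so it was handled as a string and written to ES as {"value": "0.1"}. So why did the value's type become BigDecimal? Let's test: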

JSONObject data = new JSONObject();
data.put("value", 0.1);
System.out.println(data.get("value").getClass());

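This prints class java.lang.Double, so nothing is wrong there. That is the case where the JSONObject is created with new; now test the case where a string is parsed into a JSONObject: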

JSONObject data = JSON.parseObject("{\"value\":0.1}");
System.out.println(data.get("value").getClass());

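This prints class java.math.BigDecimal. Case solved: the culprit is fastjson, which parses floating-point values as BigDecimal in parseObject. Looking at the source, parseObject calls parse(String text, int features), where features is the constant JSON.DEFAULT_PARSER_FEATURE, computed by bitwise-ORing a series of Feature masks: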

public static int              DEFAULT_PARSER_FEATURE;
static {
    int features = 0;
    features |= Feature.AutoCloseSource.getMask();
    features |= Feature.InternFieldNames.getMask();
    features |= Feature.UseBigDecimal.getMask();
    features |= Feature.AllowUnQuotedFieldNames.getMask();
    features |= Feature.AllowSingleQuotes.getMask();
    features |= Feature.AllowArbitraryCommas.getMask();
    features |= Feature.SortFeidFastMatch.getMask();
    features |= Feature.IgnoreNotMatch.getMask();
    DEFAULT_PARSER_FEATURE = features;
}
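DEFAULT_PARSER_FEATURE is an ordinary bitmask: each Feature contributes one bit through getMask(), and the bits are combined with |. The sketch below uses a hypothetical DemoFeature enum whose getMask() returns 1 << ordinal() (an assumption modeled on this pattern, not fastjson's Feature class) to show how such a mask is built, how one feature is tested, and how clearing a bit with ^ differs from clearing it with & ~. XOR flips the bit, so it only clears it when the bit is known to be set; & ~mask clears it unconditionally.

```java
// Hypothetical sketch of fastjson-style feature bitmasks; DemoFeature is NOT fastjson's Feature enum.
class FeatureMaskDemo {
    enum DemoFeature {
        AUTO_CLOSE_SOURCE, USE_BIG_DECIMAL, ALLOW_SINGLE_QUOTES;

        // One bit per constant, assumed to follow the 1 << ordinal() pattern
        int getMask() {
            return 1 << ordinal();
        }
    }

    // Combine features with bitwise OR, like the DEFAULT_PARSER_FEATURE static block
    static int combine(DemoFeature... features) {
        int mask = 0;
        for (DemoFeature f : features) {
            mask |= f.getMask();
        }
        return mask;
    }

    static boolean isEnabled(int features, DemoFeature f) {
        return (features & f.getMask()) != 0;
    }

    public static void main(String[] args) {
        int bigDecimalBit = DemoFeature.USE_BIG_DECIMAL.getMask();
        int defaults = combine(DemoFeature.AUTO_CLOSE_SOURCE, DemoFeature.USE_BIG_DECIMAL);

        // Bit is set: XOR and AND-NOT both clear it
        System.out.println(isEnabled(defaults ^ bigDecimalBit, DemoFeature.USE_BIG_DECIMAL));  // false
        System.out.println(isEnabled(defaults & ~bigDecimalBit, DemoFeature.USE_BIG_DECIMAL)); // false

        // Bit already clear: XOR turns it back ON, AND-NOT leaves it off
        int custom = combine(DemoFeature.AUTO_CLOSE_SOURCE);
        System.out.println(isEnabled(custom ^ bigDecimalBit, DemoFeature.USE_BIG_DECIMAL));  // true
        System.out.println(isEnabled(custom & ~bigDecimalBit, DemoFeature.USE_BIG_DECIMAL)); // false
    }
}
```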

One of these features is UseBigDecimal, which makes DefaultJSONParser return floating-point literals as BigDecimal instead of Double:

case LITERAL_FLOAT:
    Object value = lexer.decimalValue(lexer.isEnabled(Feature.UseBigDecimal));
    lexer.nextToken();
    return value;
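The branch above comes down to one flag choosing the result type for the same literal. A hypothetical stand-in for decimalValue(boolean), not fastjson's lexer, could look like the sketch below. The last line shows the precision difference that is presumably why BigDecimal is the default: BigDecimal stores the literal 0.1 exactly, while the double nearest to 0.1 is slightly larger than 0.1.

```java
import java.math.BigDecimal;

// Hypothetical stand-in for lexer.decimalValue(boolean): NOT fastjson's implementation.
class DecimalLiteralDemo {
    // The same numeric literal becomes a BigDecimal or a Double depending on one flag
    static Number decimalValue(String literal, boolean useBigDecimal) {
        if (useBigDecimal) {
            return new BigDecimal(literal); // exact decimal representation of the text
        }
        return Double.valueOf(literal);     // nearest binary double
    }

    public static void main(String[] args) {
        System.out.println(decimalValue("0.1", true).getClass());  // class java.math.BigDecimal
        System.out.println(decimalValue("0.1", false).getClass()); // class java.lang.Double

        // new BigDecimal(double) exposes the exact value of the double nearest to 0.1
        System.out.println(new BigDecimal(0.1).compareTo(new BigDecimal("0.1"))); // 1
    }
}
```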

With the root cause found, the fix is straightforward: define custom features by removing Feature.UseBigDecimal from DEFAULT_PARSER_FEATURE with XOR, then pass these custom features to JSON.parse:

String json = "{\"value\":0.1}";
// UseBigDecimal is set in DEFAULT_PARSER_FEATURE, so XOR clears that bit
int features = JSON.DEFAULT_PARSER_FEATURE ^ Feature.UseBigDecimal.getMask();
JSONObject data = (JSONObject) JSON.parse(json, features);
System.out.println(data.get("value").getClass());

This prints class java.lang.Double. Problem solved.
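If changing the parser features is not an option (for example, because the parsed JSONObject is shared with code that relies on BigDecimal), another approach, not from the original post, is to convert BigDecimal values to Double just before the map is passed to IndexRequest.source(Map, XContentType). Since fastjson's JSONObject implements Map<String, Object> and JSONArray implements List<Object>, a recursive copy like the hypothetical helper below should work on them. Note that doubleValue() can lose precision for values a double cannot represent.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from the original post): recursively copies a parsed JSON tree,
// replacing BigDecimal values with Double so they are written as numbers.
class BigDecimalNormalizer {
    static Object normalize(Object value) {
        if (value instanceof BigDecimal) {
            // May lose precision for values a double cannot represent exactly
            return ((BigDecimal) value).doubleValue();
        }
        if (value instanceof Map) {
            Map<String, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                copy.put(String.valueOf(e.getKey()), normalize(e.getValue()));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(normalize(item));
            }
            return copy;
        }
        return value; // strings, booleans, Integer/Long, null, ... are kept as-is
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("value", new BigDecimal("0.1"));
        doc.put("tags", Arrays.asList(new BigDecimal("2.5"), "x"));

        Map<?, ?> out = (Map<?, ?>) normalize(doc);
        System.out.println(out.get("value").getClass()); // class java.lang.Double
        System.out.println(out);                          // {value=0.1, tags=[2.5, x]}
    }
}
```

In ESPersistor this might be called as request.source((Map<String, Object>) BigDecimalNormalizer.normalize(data), XContentType.JSON), where request and data stand for the IndexRequest and the parsed JSONObject.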