ES Floating-Point Values Recognized as Strings

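I recently started preparing to upgrade ES from 5.1.2 to 6.2.4, and ESPersistor, the utility class that writes Kafka data into ES, needed its API calls adjusted accordingly. In the 5.1.2 Java API, IndexRequest.source(String source) sets the JSON string to be written, but in 6.2.4 this method has been removed. The candidate replacements are the following (source has many other overloads, but they are out of scope here):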

IndexRequest.source(String source, XContentType xContentType)
IndexRequest.source(Map source, XContentType contentType)

The first form works fine: passing XContentType.JSON produces exactly the same result as in the previous version.

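The second form produced some rather strange behavior. If the JSON data (passed in as a Map) contains a floating-point value, for example {"value": 0.1}, its type after being written to ES is text rather than float. If the field value did not exist beforehand, ES automatically creates it with type text, and numeric computations on value are no longer possible afterwards. So why is a floating-point value treated as a string? Let's look at the code: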

public IndexRequest source(Map source, XContentType contentType) throws ElasticsearchGenerationException {
    try {
        XContentBuilder builder = XContentFactory.contentBuilder(contentType);
        builder.map(source);
        return source(builder);
    } catch (IOException e) {
        throw new ElasticsearchGenerationException("Failed to generate [" + source + "]", e);
    }
}

The Map source parameter is actually converted into an XContentBuilder for processing. Next, look at builder.map(source):

private XContentBuilder map(Map<String, ?> values, boolean ensureNoSelfReferences) throws IOException {
    if (values == null) {
        return nullValue();
    }

    // checks that the map does not contain references to itself because
    // iterating over map entries will cause a stackoverflow error
    if (ensureNoSelfReferences) {
        CollectionUtils.ensureNoSelfReferences(values);
    }

    startObject();
    for (Map.Entry<String, ?> value : values.entrySet()) {
        field(value.getKey());
        // pass ensureNoSelfReferences=false as we already performed the check at a higher level
        unknownValue(value.getValue(), false);
    }
    endObject();
    return this;
}

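It first checks whether the JSON (the map) contains a reference to itself, then iterates over all entries and writes each key/value pair into the XContentBuilder. Now let's see how the value is written by unknownValue(value.getValue(), false):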

private void unknownValue(Object value, boolean ensureNoSelfReferences) throws IOException {
    if (value == null) {
        nullValue();
        return;
    }
    Writer writer = WRITERS.get(value.getClass());
    if (writer != null) {
        writer.write(this, value);
    } else if (value instanceof Path) {
        //Path implements Iterable<Path> and causes endless recursion and a StackOverFlow if treated as an Iterable here
        value((Path) value);
    } else if (value instanceof Map) {
        map((Map<String,?>) value, ensureNoSelfReferences);
    } else if (value instanceof Iterable) {
        value((Iterable<?>) value, ensureNoSelfReferences);
    } else if (value instanceof Object[]) {
        values((Object[]) value, ensureNoSelfReferences);
    } else if (value instanceof Calendar) {
        value((Calendar) value);
    } else if (value instanceof ReadableInstant) {
        value((ReadableInstant) value);
    } else if (value instanceof BytesReference) {
        value((BytesReference) value);
    } else if (value instanceof ToXContent) {
        value((ToXContent) value);
    } else {
        // This is a "value" object (like enum, DistanceUnit, etc) just toString() it
        // (yes, it can be misleading when toString a Java class, but really, jackson should be used in that case)
        value(Objects.toString(value));
    }
}

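This checks the type of value. If it is one of the standard ES data types, the matching Writer is fetched from WRITERS and used directly; for a Float, for example, (builder, value) -> builder.value((Float) value) is called. For the other types, the corresponding value overload is called. If none of the listed types match, the value is treated as a string.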
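The dispatch described above can be imitated with plain JDK classes. The following is only a simplified stand-in, not the real Elasticsearch code: WriterDispatchSketch, render and quote are hypothetical names, the lookup table registers just a handful of types, and it assumes (as observed in this post for 6.2.4) that BigDecimal is not among the registered types. It shows why an exact-class lookup followed by a toString() fallback turns an unregistered numeric type into a quoted string.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Simplified stand-in for the lookup-then-fallback logic; NOT the actual ES implementation.
class WriterDispatchSketch {
    // Exact-class lookup table playing the role of WRITERS (only a few types, by assumption).
    private static final Map<Class<?>, Function<Object, String>> WRITERS = new HashMap<>();

    static {
        WRITERS.put(Double.class, v -> Double.toString((Double) v));
        WRITERS.put(Float.class, v -> Float.toString((Float) v));
        WRITERS.put(Integer.class, v -> Integer.toString((Integer) v));
        WRITERS.put(String.class, v -> quote((String) v));
    }

    // Wraps text in double quotes; escaping is omitted to keep the sketch short.
    private static String quote(String s) {
        return "\"" + s + "\"";
    }

    // Renders a single value as a JSON fragment.
    static String render(Object value) {
        if (value == null) {
            return "null";
        }
        Function<Object, String> writer = WRITERS.get(value.getClass());
        if (writer != null) {
            return writer.apply(value);
        }
        // No writer registered for this class: fall back to toString(), i.e. a JSON string.
        return quote(Objects.toString(value));
    }

    public static void main(String[] args) {
        System.out.println(render(0.1));                   // number: 0.1
        System.out.println(render(new BigDecimal("0.1"))); // string: "0.1"
    }
}
```

Because the lookup uses value.getClass() rather than an instanceof Number check, a numeric type that is not registered never reaches a numeric writer.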

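Debugging showed that when the JSONObject {"value": 0.1} reaches this point, value.getClass() is, surprisingly, BigDecimal. That matches none of the listed types, so it is handled as a string, and what gets written to ES is {"value": "0.1"}. So why did the value's type become BigDecimal? Let's test: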

JSONObject data = new JSONObject();
data.put("value", 0.1);
System.out.println(data.get("value").getClass());

This prints class java.lang.Double, so no problem there. That is the case of a JSONObject created with new; now test the case where a string is parsed into a JSONObject:

JSONObject data = JSON.parseObject("{\"value\":0.1}");
System.out.println(data.get("value").getClass());

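This prints class java.math.BigDecimal. Case closed: the real culprit is fastjson, which turns floating-point literals into BigDecimal in parseObject. Looking at the source, parseObject calls parse(String text, int features), where features is the constant JSON.DEFAULT_PARSER_FEATURE, computed as the bitwise OR of a series of Feature masks: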

public static int DEFAULT_PARSER_FEATURE;
static {
    int features = 0;
    features |= Feature.AutoCloseSource.getMask();
    features |= Feature.InternFieldNames.getMask();
    features |= Feature.UseBigDecimal.getMask();
    features |= Feature.AllowUnQuotedFieldNames.getMask();
    features |= Feature.AllowSingleQuotes.getMask();
    features |= Feature.AllowArbitraryCommas.getMask();
    features |= Feature.SortFeidFastMatch.getMask();
    features |= Feature.IgnoreNotMatch.getMask();
    DEFAULT_PARSER_FEATURE = features;
}

One of these features is UseBigDecimal, which makes DefaultJSONParser convert floating-point literals to BigDecimal:

case LITERAL_FLOAT:
    Object value = lexer.decimalValue(lexer.isEnabled(Feature.UseBigDecimal));
    lexer.nextToken();
    return value;
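The effect of that boolean argument can be imitated with the JDK alone. The sketch below is an assumption about the general shape of the behavior, not fastjson's actual lexer code: DecimalValueSketch and its decimalValue(String, boolean) helper are hypothetical names that only mirror the role of lexer.decimalValue(boolean).

```java
import java.math.BigDecimal;

// Hypothetical imitation of a "use BigDecimal or not" switch; NOT the fastjson lexer.
class DecimalValueSketch {
    // Parses a floating-point literal either exactly (BigDecimal) or as a binary double.
    static Number decimalValue(String literal, boolean useBigDecimal) {
        if (useBigDecimal) {
            return new BigDecimal(literal);
        }
        return Double.valueOf(literal);
    }

    public static void main(String[] args) {
        System.out.println(decimalValue("0.1", true).getClass());  // class java.math.BigDecimal
        System.out.println(decimalValue("0.1", false).getClass()); // class java.lang.Double
    }
}
```

The same literal therefore yields two different runtime classes, and only the second one is a type that the ES writer lookup shown earlier handles as a number.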

With the root cause found, the fix is not hard: define a custom features value that removes Feature.UseBigDecimal from DEFAULT_PARSER_FEATURE with XOR, then have JSON.parse use the custom features:

// XOR clears UseBigDecimal here because that bit is set in the default mask
int features = JSON.DEFAULT_PARSER_FEATURE ^ Feature.UseBigDecimal.getMask();
String json = "{\"value\":0.1}";
JSONObject data = (JSONObject) JSON.parse(json, features);
System.out.println(data.get("value").getClass());

This prints class java.lang.Double. Solved.
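One caveat about removing a flag with XOR: it toggles the bit, so it only clears the flag when the flag is currently set, which is the case for the default mask shown earlier. A more defensive variant is AND with the inverted mask, which clears the bit regardless of its current state. The sketch below illustrates the difference using made-up masks; FeatureMaskSketch and the bit positions are hypothetical and are not fastjson's real values.

```java
// Illustrates toggling (XOR) versus clearing (AND NOT) a flag bit; masks are hypothetical.
class FeatureMaskSketch {
    static final int USE_BIG_DECIMAL = 1 << 7;     // assumed bit position, for illustration only
    static final int ALLOW_SINGLE_QUOTES = 1 << 2; // assumed bit position, for illustration only

    // True when the given flag bit is set in features.
    static boolean isEnabled(int features, int mask) {
        return (features & mask) != 0;
    }

    // XOR flips the bit: set becomes unset, unset becomes set.
    static int toggle(int features, int mask) {
        return features ^ mask;
    }

    // AND with the inverted mask always leaves the bit unset.
    static int clear(int features, int mask) {
        return features & ~mask;
    }

    public static void main(String[] args) {
        int defaults = USE_BIG_DECIMAL | ALLOW_SINGLE_QUOTES;

        int toggledOnce = toggle(defaults, USE_BIG_DECIMAL);
        int toggledTwice = toggle(toggledOnce, USE_BIG_DECIMAL);
        System.out.println(isEnabled(toggledOnce, USE_BIG_DECIMAL));  // false
        System.out.println(isEnabled(toggledTwice, USE_BIG_DECIMAL)); // true: the flag came back

        int clearedTwice = clear(clear(defaults, USE_BIG_DECIMAL), USE_BIG_DECIMAL);
        System.out.println(isEnabled(clearedTwice, USE_BIG_DECIMAL));     // false
        System.out.println(isEnabled(clearedTwice, ALLOW_SINGLE_QUOTES)); // true: other flags untouched
    }
}
```

Applied to the fix in this post, that would read JSON.DEFAULT_PARSER_FEATURE & ~Feature.UseBigDecimal.getMask(), which gives the same result as the XOR version as long as the default mask includes UseBigDecimal.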