Practical Notes | Techniques for Quickly Constructing String Objects and Accessing Their Internal Members

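1. Background: the JDK String implementation

The String implementation differs substantially between JDK 8 and JDK 9 and later. In JDK 8, String is structured as follows: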

1.1 String implementation in JDK 8

class String {
    char[] value;

    // Public constructor: copies the array
    public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

    // Package-private constructor: no copy
    String(char[] value, boolean share) {
        // assert share : "unshared not supported";
        this.value = value;
    }
}

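1.2 String implementation in JDK 9 and later

class String {
    static final byte LATIN1 = 0;
    static final byte UTF16  = 1;
    byte coder;
    byte[] value;

    // Package-private constructor: no copy
    String(byte[] value, byte coder) {
        this.value = value;
        this.coder = coder;
    }
}

Since JDK 9, value is stored in a byte[] and the coder field records whether the content is LATIN1 or UTF16. In typical workloads most strings are LATIN1. By special-casing LATIN1 with a zero-copy path when constructing strings or encoding them to binary, we can get very high performance.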
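
To make the LATIN1/UTF16 distinction concrete, here is a minimal sketch that uses only public APIs. The class and method names (CoderSketch, coderOf, valueLength) are hypothetical; they only mimic the decision described above and are not JDK code.

```java
public class CoderSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Returns LATIN1 if every char fits in one byte (<= 0xFF), otherwise UTF16. */
    static byte coderOf(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Number of bytes a compact value array would need: 1 per char for LATIN1, 2 for UTF16. */
    static int valueLength(String s) {
        return s.length() << coderOf(s);
    }

    public static void main(String[] args) {
        System.out.println(valueLength("2022-12-18"));   // 10 chars, 1 byte each
        System.out.println(valueLength("\u4e2d\u6587")); // 2 chars, 2 bytes each
    }
}
```

A string containing a single character above 0xFF needs two bytes for every character, which is why the LATIN1 case is the one worth optimizing.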

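2. Background: Unsafe

The JDK provides sun.misc.Unsafe for low-level native operations. It performs better but is unsafe: an incorrect call can crash the JVM. Used correctly, it improves performance, and it lets you bypass the usual access restrictions.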

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeUtils {
    public static final Unsafe UNSAFE;

    static {
        Unsafe unsafe = null;
        try {
            Field theUnsafeField = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafeField.setAccessible(true);
            unsafe = (Unsafe) theUnsafeField.get(null);
        } catch (Throwable ignored) {
            // ignored: UNSAFE stays null if the field cannot be read
        }
        UNSAFE = unsafe;
    }
}
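
As an illustration of what Unsafe offers, the sketch below reads and writes a private field by its memory offset. The Point class and the helper names are hypothetical and exist only for this example; it assumes a JDK on which sun.misc.Unsafe memory access is still permitted (recent JDKs may print a deprecation warning).

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeFieldSketch {
    // Hypothetical holder class used only for this illustration.
    static class Point {
        private int x = 7;

        int getX() {
            return x;
        }
    }

    static final Unsafe UNSAFE;
    static final long X_OFFSET;

    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
            // Offset of the field within a Point instance
            X_OFFSET = UNSAFE.objectFieldOffset(Point.class.getDeclaredField("x"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Reads Point.x without going through a getter or reflection. */
    static int readX(Point p) {
        return UNSAFE.getInt(p, X_OFFSET);
    }

    /** Writes Point.x directly; a wrong offset here could corrupt memory. */
    static void writeX(Point p, int value) {
        UNSAFE.putInt(p, X_OFFSET, value);
    }
}
```

The offset is computed once in the static initializer, so each later access is a plain memory read or write.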

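3. Background: Trusted MethodHandles.Lookup

JDK 8 introduced lambdas. To map a Method to a lambda function conveniently and avoid reflection overhead, java.lang.invoke.LambdaMetafactory can be used. It is subject to the usual visibility rules, however, so it cannot call private methods. There is a trick: combined with Unsafe, a trusted MethodHandles.Lookup can be obtained on the different JDK versions, which bypasses the visibility checks and allows calling JDK-internal methods. For example: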

import static com.alibaba.fastjson2.util.UnsafeUtils.UNSAFE;

static final MethodHandles.Lookup IMPL_LOOKUP;
static {
    MethodHandles.Lookup implLookup = null;
    try {
        Class<?> lookupClass = MethodHandles.Lookup.class;
        Field field = lookupClass.getDeclaredField("IMPL_LOOKUP");
        long fieldOffset = UNSAFE.staticFieldOffset(field);
        implLookup = (MethodHandles.Lookup) UNSAFE.getObject(lookupClass, fieldOffset);
    } catch (Throwable ignored) {
        // IMPL_LOOKUP stays null if the field cannot be read
    }
    IMPL_LOOKUP = implLookup;
}

static MethodHandles.Lookup trustedLookup(Class<?> objectClass) throws Exception {
    return IMPL_LOOKUP.in(objectClass);
}

Note: on IBM OpenJ9 JDK 8/11 this approach is still subject to visibility restrictions and needs extra handling. See the FASTJSON2 JDKUtils#trustedLookup code:
https://github.com/alibaba/fastjson2/blob/fastcode_demo_20221218/core/src/main/java/com/alibaba/fastjson2/util/JDKUtils.java#L254
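
To show the LambdaMetafactory mechanism on its own, the following minimal sketch maps the public method String.length() to a ToIntFunction<String> with an ordinary MethodHandles.lookup(). The class name LambdaSketch is hypothetical. A trusted Lookup would be used the same way, only with access to non-public members.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.ToIntFunction;

import static java.lang.invoke.MethodType.methodType;

public class LambdaSketch {
    /** Builds a ToIntFunction<String> that calls String.length() without reflection. */
    @SuppressWarnings("unchecked")
    static ToIntFunction<String> lengthFunction() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle length = lookup.findVirtual(
                    String.class, "length", methodType(int.class));
            CallSite callSite = LambdaMetafactory.metafactory(
                    lookup,
                    "applyAsInt",                        // name of the interface method
                    methodType(ToIntFunction.class),     // factory type: () -> ToIntFunction
                    methodType(int.class, Object.class), // erased signature of applyAsInt
                    length,                              // implementation to call
                    methodType(int.class, String.class)  // signature as seen by callers
            );
            return (ToIntFunction<String>) callSite.getTarget().invokeExact();
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthFunction().applyAsInt("hello"));
    }
}
```

The returned object is an ordinary functional-interface instance, so the JIT can inline calls through it much like a hand-written lambda.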

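4. Constructing String objects with zero copy

The key to constructing strings quickly is to reduce copying, ideally to zero. The implementation differs across JDK 8, JDK 9~15, and JDK 16 and later.

4.1 Zero-copy String construction on JDK 8

On JDK 8, constructing a String with zero copy requires calling its constructor String(char[], boolean), for example:

BiFunction<char[], Boolean, String> stringCreatorJDK8
        = (value, share) -> new String(value, share);

Because String(char[], boolean) is not public, the code above does not compile. Instead, build a trusted MethodHandles.Lookup via reflection, then look up the internal constructor and map it to a BiFunction<char[], Boolean, String>, as follows: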
import com.alibaba.fastjson2.util.JDKUtils;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.BiFunction;
import static java.lang.invoke.MethodType.methodType;

MethodHandles.Lookup caller = JDKUtils.trustedLookup(String.class);
// Handle to the package-private constructor String(char[], boolean)
MethodHandle handle = caller.findConstructor(
        String.class,
        methodType(void.class, char[].class, boolean.class)
);
CallSite callSite = LambdaMetafactory.metafactory(
        caller,
        "apply",
        methodType(BiFunction.class),
        methodType(Object.class, Object.class, Object.class),
        handle,
        methodType(String.class, char[].class, boolean.class)
);
BiFunction<char[], Boolean, String> STRING_CREATOR_JDK8
        = (BiFunction<char[], Boolean, String>)
        callSite.getTarget().invokeExact();

4.2 Zero-copy String construction on JDK 9 and later

On JDK 9~JDK 15, we want a function like the following for zero-copy String construction:

BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11
        = (value, coder) -> new String(value, coder);

Likewise, String(byte[], byte) in JDK 9 is not public and cannot be called directly, so the code above does not compile. Use a trusted MethodHandles.Lookup to reach the internal constructor, as follows:

import com.alibaba.fastjson2.util.JDKUtils;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.BiFunction;
import static java.lang.invoke.MethodType.methodType;

MethodHandles.Lookup caller = JDKUtils.trustedLookup(String.class);
// Handle to the package-private constructor String(byte[], byte)
MethodHandle handle = caller.findConstructor(
        String.class,
        methodType(void.class, byte[].class, byte.class)
);
CallSite callSite = LambdaMetafactory.metafactory(
        caller,
        "apply",
        methodType(BiFunction.class),
        methodType(Object.class, Object.class, Object.class),
        handle,
        methodType(String.class, byte[].class, Byte.class)
);
BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11
        = (BiFunction<byte[], Byte, String>)
        callSite.getTarget().invokeExact();

Note: this approach does not work when the JVM is started with -XX:-CompactStrings.
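
The pieces above can be combined into one self-contained sketch. The class name Latin1StringSketch and the helper names are hypothetical, and the code assumes a HotSpot-style JDK where IMPL_LOOKUP can be read through Unsafe. If any step fails (for example on JDK 8, or with -XX:-CompactStrings, which the sanity check is meant to catch), it falls back to the public copying constructor.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.function.BiFunction;

import static java.lang.invoke.MethodType.methodType;

public class Latin1StringSketch {
    static final byte LATIN1 = 0;
    // null when the internal constructor cannot be reached or is not usable
    static final BiFunction<byte[], Byte, String> STRING_CREATOR = createOrNull();

    @SuppressWarnings("unchecked")
    static BiFunction<byte[], Byte, String> createOrNull() {
        try {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
            Field implField = MethodHandles.Lookup.class.getDeclaredField("IMPL_LOOKUP");
            MethodHandles.Lookup impl = (MethodHandles.Lookup) unsafe.getObject(
                    unsafe.staticFieldBase(implField),
                    unsafe.staticFieldOffset(implField));
            MethodHandles.Lookup lookup = impl.in(String.class);
            MethodHandle ctor = lookup.findConstructor(
                    String.class, methodType(void.class, byte[].class, byte.class));
            CallSite callSite = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",
                    methodType(BiFunction.class),
                    methodType(Object.class, Object.class, Object.class),
                    ctor,
                    methodType(String.class, byte[].class, Byte.class));
            BiFunction<byte[], Byte, String> creator =
                    (BiFunction<byte[], Byte, String>) callSite.getTarget().invokeExact();
            // Sanity check: reject the creator if a one-byte LATIN1 array does not yield "a"
            return "a".equals(creator.apply(new byte[] {'a'}, LATIN1)) ? creator : null;
        } catch (Throwable ignored) {
            return null;
        }
    }

    /** Wraps ISO-8859-1 bytes in a String; the caller must not modify bytes afterwards. */
    static String latin1String(byte[] bytes) {
        if (STRING_CREATOR != null) {
            return STRING_CREATOR.apply(bytes, LATIN1); // shares the array, no copy
        }
        return new String(bytes, StandardCharsets.ISO_8859_1); // fallback: copies
    }
}
```

Because the zero-copy path shares the array with the resulting String, mutating the array later would change the String's content, so the array should be treated as handed over.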

4.3 Example: fast String construction

static BiFunction<char[], Boolean, String> STRING_CREATOR_JDK8 = ...
static BiFunction<byte[], Byte, String> STRING_CREATOR_JDK11 = ...

// Formats a date as yyyy-MM-dd; assumes a four-digit year (0..9999)
static String formatYYYYMMDD(LocalDate date) {
    int year = date.getYear();
    int month = date.getMonthValue();
    int dayOfMonth = date.getDayOfMonth();
    int y0 = year / 1000 + '0';
    int y1 = (year / 100) % 10 + '0';
    int y2 = (year / 10) % 10 + '0';
    int y3 = year % 10 + '0';
    int m0 = month / 10 + '0';
    int m1 = month % 10 + '0';
    int d0 = dayOfMonth / 10 + '0';
    int d1 = dayOfMonth % 10 + '0';
    String str;
    if (STRING_CREATOR_JDK11 != null) {
        // JDK 9+: fill a LATIN1 byte[] and hand it to String without copying
        byte[] bytes = new byte[10];
        bytes[0] = (byte) y0;
        bytes[1] = (byte) y1;
        bytes[2] = (byte) y2;
        bytes[3] = (byte) y3;
        bytes[4] = '-';
        bytes[5] = (byte) m0;
        bytes[6] = (byte) m1;
        bytes[7] = '-';
        bytes[8] = (byte) d0;
        bytes[9] = (byte) d1;
        str = STRING_CREATOR_JDK11.apply(bytes, JDKUtils.LATIN1);
    } else {
        // JDK 8: fill a char[] instead
        char[] chars = new char[10];
        chars[0] = (char) y0;
        chars[1] = (char) y1;
        chars[2] = (char) y2;
        chars[3] = (char) y3;
        chars[4] = '-';
        chars[5] = (char) m0;
        chars[6] = (char) m1;
        chars[7] = '-';
        chars[8] = (char) d0;
        chars[9] = (char) d1;
        if (STRING_CREATOR_JDK8 != null) {
            str = STRING_CREATOR_JDK8.apply(chars, Boolean.TRUE);
        } else {
            // Fallback: public constructor, copies the array
            str = new String(chars);
        }
    }
    return str;
}
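
In the example above, depending on the JDK version, a char[] is filled directly on JDK 8 and a byte[] on JDK 9 and later, and the String is then constructed with zero copy. This formats a LocalDate to a String much faster than SimpleDateFormat or java.time.format.DateTimeFormatter.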
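
The digit layout itself does not depend on any internal API. As a minimal sketch (the class name DateFormatSketch is hypothetical, and it uses the public copying constructor rather than the zero-copy creators), the following reproduces the formatting logic so that its output can be compared with DateTimeFormatter.ISO_LOCAL_DATE. Years outside 0..9999 are delegated to the standard formatter, which is an assumption of this sketch and not something the original example handles.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateFormatSketch {
    /** Formats a date as yyyy-MM-dd by writing the digits directly. */
    static String formatYYYYMMDD(LocalDate date) {
        int year = date.getYear();
        if (year < 0 || year > 9999) {
            // The fixed 10-char layout only fits four-digit years
            return DateTimeFormatter.ISO_LOCAL_DATE.format(date);
        }
        int month = date.getMonthValue();
        int day = date.getDayOfMonth();
        char[] chars = new char[10];
        chars[0] = (char) (year / 1000 + '0');
        chars[1] = (char) ((year / 100) % 10 + '0');
        chars[2] = (char) ((year / 10) % 10 + '0');
        chars[3] = (char) (year % 10 + '0');
        chars[4] = '-';
        chars[5] = (char) (month / 10 + '0');
        chars[6] = (char) (month % 10 + '0');
        chars[7] = '-';
        chars[8] = (char) (day / 10 + '0');
        chars[9] = (char) (day % 10 + '0');
        return new String(chars); // public constructor: copies the array once
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2022, 12, 18);
        System.out.println(formatYYYYMMDD(date)); // 2022-12-18
        System.out.println(formatYYYYMMDD(date)
                .equals(DateTimeFormatter.ISO_LOCAL_DATE.format(date))); // true
    }
}
```

Checking against the standard formatter like this is a cheap way to validate a hand-written fast path before swapping in the zero-copy constructor.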

5. Accessing String's internal members directly

5.1 Fast access to value on JDK 8

static final Field FIELD_STRING_VALUE;
static final long FIELD_STRING_VALUE_OFFSET;
// Set to true once direct access fails, so later calls use the fallback
static volatile boolean FIELD_STRING_ERROR;

static {
    Field field = null;
    long fieldOffset = -1;
    try {
        field = String.class.getDeclaredField("value");
        fieldOffset = UnsafeUtils.UNSAFE.objectFieldOffset(field);
    } catch (Exception ignored) {
        FIELD_STRING_ERROR = true;
    }
    FIELD_STRING_VALUE = field;
    FIELD_STRING_VALUE_OFFSET = fieldOffset;
}

public static char[] getCharArray(String str) {
    if (!FIELD_STRING_ERROR) {
        try {
            // Returns the internal array itself; callers must not modify it
            return (char[]) UnsafeUtils.UNSAFE.getObject(
                    str,
                    FIELD_STRING_VALUE_OFFSET
            );
        } catch (Exception ignored) {
            FIELD_STRING_ERROR = true;
        }
    }
    // Fallback: public API, returns a copy
    return str.toCharArray();
}

5.2 Direct access to coder and value on JDK 9 and later

We want functions like the following:

ToIntFunction<String> stringCoder = (String str) -> str.coder();
Function<String, byte[]> stringValue = (String str) -> str.value();

However, String.coder() and String.value() are not public, so, as in 4.2, they have to be built through a trusted MethodHandles.Lookup, as follows:
import com.alibaba.fastjson2.util.JDKUtils;
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.Function;
import java.util.function.ToIntFunction;
import static java.lang.invoke.MethodType.methodType;

MethodHandles.Lookup lookup = JDKUtils.trustedLookup(String.class);

// String.coder() -> ToIntFunction<String>
MethodHandle coder = lookup.findSpecial(
        String.class,
        "coder",
        methodType(byte.class),
        String.class
);
CallSite applyAsInt = LambdaMetafactory.metafactory(
        lookup,
        "applyAsInt",
        methodType(ToIntFunction.class),
        methodType(int.class, Object.class),
        coder,
        methodType(byte.class, String.class)
);
ToIntFunction<String> STRING_CODER
        = (ToIntFunction<String>) applyAsInt.getTarget().invokeExact();

// String.value() -> Function<String, byte[]>
MethodHandle value = lookup.findSpecial(
        String.class,
        "value",
        methodType(byte[].class),
        String.class
);
CallSite apply = LambdaMetafactory.metafactory(
        lookup,
        "apply",
        methodType(Function.class),
        methodType(Object.class, Object.class),
        value,
        methodType(byte[].class, String.class)
);
Function<String, byte[]> STRING_VALUE
        = (Function<String, byte[]>) apply.getTarget().invokeExact();

5.3 Example: direct access

static final byte LATIN1 = 0;
static ToIntFunction<String> STRING_CODER = ...
static Function<String, byte[]> STRING_VALUE = ...
byte[] buf = ...;
int off;

void writeString(String str) {
    if (STRING_CODER != null && STRING_VALUE != null) {
        // improved for JDK 9 LATIN1
        int coder = STRING_CODER.applyAsInt(str);
        if (coder == LATIN1) {
            // equivalent to: str.getBytes(0, str.length(), buf, off);
            byte[] value = STRING_VALUE.apply(str);
            System.arraycopy(value, 0, buf, off, value.length);
            off += value.length;
            return;
        }
    }
    // normal logic
}
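
For a version that can be run end to end, the sketch below builds the two accessors itself and fills in the fallback path. The class name Latin1WriterSketch, the growing buffer, and the choice of ISO-8859-1 as the output encoding are all assumptions of this sketch. It also assumes a HotSpot-style JDK where IMPL_LOOKUP is readable through Unsafe; when that fails (for example on JDK 8, where String has no coder()), both accessors stay null and only the public API is used.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;
import java.util.function.ToIntFunction;

import static java.lang.invoke.MethodType.methodType;

public class Latin1WriterSketch {
    static final byte LATIN1 = 0;
    // Both stay null when the String internals cannot be reached.
    static ToIntFunction<String> STRING_CODER;
    static Function<String, byte[]> STRING_VALUE;

    static {
        try {
            Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
            Field implField = MethodHandles.Lookup.class.getDeclaredField("IMPL_LOOKUP");
            MethodHandles.Lookup impl = (MethodHandles.Lookup) unsafe.getObject(
                    unsafe.staticFieldBase(implField), unsafe.staticFieldOffset(implField));
            MethodHandles.Lookup lookup = impl.in(String.class);

            MethodHandle coder = lookup.findSpecial(
                    String.class, "coder", methodType(byte.class), String.class);
            CallSite coderSite = LambdaMetafactory.metafactory(
                    lookup, "applyAsInt",
                    methodType(ToIntFunction.class),
                    methodType(int.class, Object.class),
                    coder,
                    methodType(byte.class, String.class));
            @SuppressWarnings("unchecked")
            ToIntFunction<String> coderFn =
                    (ToIntFunction<String>) coderSite.getTarget().invokeExact();

            MethodHandle value = lookup.findSpecial(
                    String.class, "value", methodType(byte[].class), String.class);
            CallSite valueSite = LambdaMetafactory.metafactory(
                    lookup, "apply",
                    methodType(Function.class),
                    methodType(Object.class, Object.class),
                    value,
                    methodType(byte[].class, String.class));
            @SuppressWarnings("unchecked")
            Function<String, byte[]> valueFn =
                    (Function<String, byte[]>) valueSite.getTarget().invokeExact();

            STRING_CODER = coderFn;
            STRING_VALUE = valueFn;
        } catch (Throwable ignored) {
            STRING_CODER = null;
            STRING_VALUE = null;
        }
    }

    byte[] buf = new byte[16];
    int off;

    /** Appends str to buf as ISO-8859-1 bytes (the encoding is an assumption of this sketch). */
    void writeString(String str) {
        if (STRING_CODER != null && STRING_VALUE != null
                && STRING_CODER.applyAsInt(str) == LATIN1) {
            byte[] value = STRING_VALUE.apply(str); // internal array, read only
            ensureCapacity(value.length);
            System.arraycopy(value, 0, buf, off, value.length);
            off += value.length;
            return;
        }
        // normal logic: encode through the public API (allocates a temporary array)
        byte[] bytes = str.getBytes(StandardCharsets.ISO_8859_1);
        ensureCapacity(bytes.length);
        System.arraycopy(bytes, 0, buf, off, bytes.length);
        off += bytes.length;
    }

    private void ensureCapacity(int extra) {
        if (off + extra > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, off + extra));
        }
    }
}
```

For Latin-1 input both paths produce the same bytes; the fast path simply skips the temporary array that getBytes would allocate.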

5.4 Making use of String.getBytes

String has a deprecated getBytes method whose result is wrong when the string contains non-Latin-1 characters. When the coder is LATIN1, however, it can be used to copy the value directly:

class String {
    @Deprecated
    public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
        int j = dstBegin;
        int n = srcEnd;
        int i = srcBegin;
        char[] val = value;   /* avoid getfield opcode */
        while (i < n) {
            dst[j++] = (byte)val[i++];
        }
    }
}

static final byte LATIN1 = 0;
static ToIntFunction<String> STRING_CODER = ...
byte[] buf = ...;
int off;

void writeString(String str) {
    if (STRING_CODER != null) {
        // improved for JDK 9 LATIN1
        int coder = STRING_CODER.applyAsInt(str);
        if (coder == LATIN1) {
            str.getBytes(0, str.length(), buf, off);
            off += str.length();
            return;
        }
    }
    // normal logic
}

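Reference implementation:

The FASTJSON2 project uses the techniques above; its JDKUtils and UnsafeUtils classes implement them.

Caveats:

These techniques are not recommended for beginners. Understand how they work before using them.
Author | 温绍锦(高铁)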
