Class File Parsing

2016/01/24 JVM

The Class File (ClassFile)

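When we covered the parent-delegation model of class loading, we repeatedly encountered calls to parseClassFile() in the ClassFileParser class. Loading a class involves more than locating the binary stream of its Class file. The VM must also parse the information the file contains and convert it into a C/C++ representation, so that it can operate on that information conveniently at run time.

Before looking at parseClassFile(), let us first review two things: the Class file format defined by the Java Virtual Machine Specification, and some important fields of the ClassFileParser class.

The Class file format is as follows: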

ClassFile {
    u4             magic;
 
    u2             minor_version;
    u2             major_version;
 
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
 
    u2             access_flags;
    u2             this_class;
    u2             super_class;
 
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
 
    u2             fields_count;
    field_info     fields[fields_count];
 
    u2             methods_count;
    method_info    methods[methods_count];
 
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
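The following minimal C++ sketch makes the layout concrete. It is not HotSpot code. It reads the fixed-size fields at the start of a Class file from an in-memory byte buffer. The struct ClassHeader and the functions be_u2(), be_u4() and read_header() are hypothetical names introduced for illustration. Multibyte values are decoded in big-endian order, as the specification requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for the fixed-size fields that open every Class file.
struct ClassHeader {
  uint32_t magic;
  uint16_t minor_version;
  uint16_t major_version;
  uint16_t constant_pool_count;
};

// Decode a big-endian u2 / u4 starting at buf[pos].
static uint16_t be_u2(const std::vector<uint8_t>& buf, size_t pos) {
  return static_cast<uint16_t>((buf[pos] << 8) | buf[pos + 1]);
}
static uint32_t be_u4(const std::vector<uint8_t>& buf, size_t pos) {
  return (static_cast<uint32_t>(be_u2(buf, pos)) << 16) | be_u2(buf, pos + 2);
}

// Reads magic, minor_version, major_version and constant_pool_count (10 bytes).
ClassHeader read_header(const std::vector<uint8_t>& buf) {
  if (buf.size() < 10) throw std::runtime_error("truncated class file");
  ClassHeader h;
  h.magic               = be_u4(buf, 0);
  h.minor_version       = be_u2(buf, 4);
  h.major_version       = be_u2(buf, 6);
  h.constant_pool_count = be_u2(buf, 8);
  if (h.magic != 0xCAFEBABEu) throw std::runtime_error("bad magic");
  return h;
}
```

Consider the bytes CA FE BA BE 00 00 00 34 00 1D. They decode to minor_version 0, major_version 52 (a Java 8 class file) and constant_pool_count 29. A count of 29 means the pool has usable indices 1 to 28.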

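The Class Parser (ClassFileParser)

HotSpot defines the ClassFileParser class to help read and store information while a class is parsed. The class and its important fields are defined as follows:

Source location: src/share/vm/classfile/classFileParser.hpp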

class ClassFileParser VALUE_OBJ_CLASS_SPEC {
 private:
  u2        _major_version;
  u2        _minor_version;
  Symbol*   _class_name;
  ClassLoaderData*         _loader_data;
  KlassHandle              _host_klass;
  GrowableArray<Handle>*   _cp_patches; // overrides for CP entries
 
 
  // class attributes parsed before the instance klass is created:
  bool        _synthetic_flag;
  int         _sde_length;
  char*       _sde_buffer;
  u2          _sourcefile_index;
  u2          _generic_signature_index;
 
  // Metadata created before the instance klass is created.  Must be deallocated
  // if not transferred to the InstanceKlass upon successful class loading
  // in which case these pointers have been set to NULL.
  instanceKlassHandle _super_klass;
  ConstantPool*    _cp;
  Array<u2>*       _fields;
  Array<Method*>*  _methods;
  Array<u2>*       _inner_classes;
  Array<Klass*>*   _local_interfaces;
  Array<Klass*>*   _transitive_interfaces;
  // ...
  InstanceKlass*   _klass;  // InstanceKlass once created.
  // ...
  ClassFileStream* _stream;              // Actual input stream
  // ...
};

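The field names indicate what each one stores. The most important fields are:

- _cp holds the constant pool.
- _fields holds field information.
- _methods holds the methods.
- _klass holds the class itself (the InstanceKlass, once it has been created).

The _stream field gives convenient access to the input stream.

The class also defines many important functions. Examples include parse_constant_pool() and parse_constant_pool_entries() for parsing the constant pool, parse_methods() for parsing methods, and parse_fields() for parsing fields.

The File Stream (_stream)

The _stream field holds the byte stream of the class file. To read the contents of a Class file, the VM first needs the corresponding byte stream. ClassFileStream maintains an internal buffer that points to the bytes of the Class file.

The ClassFileStream object is created in ClassLoader::load_classfile(), which was mentioned earlier in the discussion of parent delegation. Loading a class may go through SystemDictionary::load_instance_class(), which implements the parent-delegation logic. When the bootstrap class loader is used, that function may call load_classfile() to load the class. load_classfile() is implemented as follows:

Source location: src/share/vm/classfile/classLoader.cpp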

instanceKlassHandle ClassLoader::load_classfile(Symbol* h_name, TRAPS) {
 
  stringStream st;
  st.print_raw(h_name->as_utf8());
  st.print_raw(".class");
  const char* name = st.as_string(); // obtain the file name from st
 
  // Lookup stream for parsing .class file
  ClassFileStream* stream = NULL;
  {
    ClassPathEntry* e = _first_entry;
    while (e != NULL) {
      stream = e->open_stream(name, CHECK_NULL);
      if (stream != NULL) {
        break;
      }
      e = e->next();
    }
  }
  ...
}

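The loop walks the class path to find the class file to load. Once the file is found under some path entry, a ClassFileStream object is created for it. ClassPathEntry forms a linked list, because there can be several class path entries, and it declares the virtual function open_stream(). The loop walks the list until it reaches an entry under which a file named name exists. At that point open_stream() returns a ClassFileStream instance; for entries that do not contain the file, it returns NULL.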
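The lookup is a plain linked-list walk over polymorphic entries. The following self-contained sketch mirrors that pattern. PathEntry, DirEntry, Stream and find_class() are hypothetical stand-ins for ClassPathEntry, its subclasses and ClassFileStream. A std::map stands in for the file system.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ClassFileStream: the bytes of one file.
struct Stream {
  std::string source;                 // which entry produced the stream
  std::vector<unsigned char> bytes;
};

// Hypothetical stand-in for ClassPathEntry: a singly linked list node whose
// virtual open_stream() returns nullptr when the file is not found.
class PathEntry {
 public:
  virtual ~PathEntry() = default;
  virtual std::unique_ptr<Stream> open_stream(const std::string& name) = 0;
  PathEntry* next() const { return _next; }
  void set_next(PathEntry* n) { _next = n; }
 private:
  PathEntry* _next = nullptr;
};

// A "directory" entry backed by an in-memory map instead of a real file system.
class DirEntry : public PathEntry {
 public:
  DirEntry(std::string dir, std::map<std::string, std::vector<unsigned char>> files)
      : _dir(std::move(dir)), _files(std::move(files)) {}
  std::unique_ptr<Stream> open_stream(const std::string& name) override {
    auto it = _files.find(name);
    if (it == _files.end()) return nullptr;
    return std::unique_ptr<Stream>(new Stream{_dir, it->second});
  }
 private:
  std::string _dir;
  std::map<std::string, std::vector<unsigned char>> _files;
};

// Same shape as the loop in load_classfile(): stop at the first entry that has the file.
std::unique_ptr<Stream> find_class(PathEntry* first, const std::string& name) {
  for (PathEntry* e = first; e != nullptr; e = e->next()) {
    std::unique_ptr<Stream> s = e->open_stream(name);
    if (s != nullptr) return s;
  }
  return nullptr;
}
```

Earlier entries shadow later ones. If two path entries both contain Foo.class, the one closer to the head of the list wins, which matches the `break` in load_classfile().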

After obtaining the ClassFileStream instance, load_classfile() calls the parseClassFile() method of ClassFileParser, as follows:


instanceKlassHandle ClassLoader::load_classfile(Symbol* h_name, TRAPS) {
  // ...
 
  instanceKlassHandle h;
  if (stream != NULL) {
    // class file found, parse it
    ClassFileParser parser(stream);
    ClassLoaderData* loader_data = ClassLoaderData::the_null_class_loader_data();
    Handle protection_domain;
    TempNewSymbol parsed_name = NULL;
    instanceKlassHandle result = parser.parseClassFile(h_name,loader_data,protection_domain,parsed_name,false,CHECK_(h));
    // add to package table
    if (add_package(name, classpath_index, THREAD)) {
      h = result;
    }
  }
 
  return h;
}

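parseClassFile() returns the instanceKlass object that represents the Java class. load_classfile() finally returns an instanceKlassHandle, a handle through which that object is accessed. The implementation of parseClassFile() is examined in detail later.

Here is a brief look at some frequently called methods of ClassFileStream: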


u1 ClassFileStream::get_u1(TRAPS) {
  return *_current++;
}
 
u2 ClassFileStream::get_u2(TRAPS) {
  u1* tmp = _current;
  _current += 2;
  return Bytes::get_Java_u2(tmp);
}
 
u4 ClassFileStream::get_u4(TRAPS) {
  u1* tmp = _current;
  _current += 4;
  return Bytes::get_Java_u4(tmp);
}
 
u8 ClassFileStream::get_u8(TRAPS) {
  u1* tmp = _current;
  _current += 8;
  return Bytes::get_Java_u8(tmp);
}
 
void ClassFileStream::skip_u1(int length, TRAPS) {
  _current += length;
}
 
void ClassFileStream::skip_u2(int length, TRAPS) {
  _current += length * 2;
}
 
void ClassFileStream::skip_u4(int length, TRAPS) {
  _current += length * 4;
}

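A Class file consists of a stream of 8-bit bytes. All 16-bit, 32-bit and 64-bit quantities are constructed by reading two, four and eight consecutive 8-bit bytes, respectively. Multibyte data items are always stored in big-endian order. x86 and similar processors store data in the opposite, little-endian order, so the bytes must be converted on x86. The code is as follows:

Source location: openjdk/hotspot/src/cpu/x86/vm/bytes_x86.hpp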

// Efficient reading and writing of unaligned unsigned data in Java
// byte ordering (i.e. big-endian ordering). Byte-order reversal is
// needed since x86 CPUs use little-endian format.
static inline u2   get_Java_u2(address p)           { return swap_u2(get_native_u2(p)); }
static inline u4   get_Java_u4(address p)           { return swap_u4(get_native_u4(p)); }
static inline u8   get_Java_u8(address p)           { return swap_u8(get_native_u8(p)); }

The helper functions it calls are:

Source location: openjdk/hotspot/src/cpu/x86/vm/bytes_x86.hpp

// Efficient reading and writing of unaligned unsigned data in platform-specific byte ordering
// (no special code is needed since x86 CPUs can access unaligned data)
static inline u2   get_native_u2(address p)         { return *(u2*)p; }
static inline u4   get_native_u4(address p)         { return *(u4*)p; }
static inline u8   get_native_u8(address p)         { return *(u8*)p; }
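The x86 version reads the bytes in native (little-endian) order and then swaps them. A portable alternative, sketched below using only standard C++, assembles the value from individual bytes with shifts. That yields the big-endian interpretation on any host, and it also avoids the unaligned access the x86 code relies on. The function names are hypothetical and not part of HotSpot. native_then_swap_u4() imitates the "read native, then swap" approach with memcpy, so it matches java_u4() only on a little-endian host.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable big-endian decoding: byte 0 is the most significant byte.
static inline uint16_t java_u2(const uint8_t* p) {
  return static_cast<uint16_t>((uint16_t(p[0]) << 8) | p[1]);
}
static inline uint32_t java_u4(const uint8_t* p) {
  return (uint32_t(java_u2(p)) << 16) | java_u2(p + 2);
}
static inline uint64_t java_u8(const uint8_t* p) {
  return (uint64_t(java_u4(p)) << 32) | java_u4(p + 4);
}

// Explicit 32-bit byte swap, the counterpart of swap_u4().
static inline uint32_t swap_u4(uint32_t x) {
  return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// "Native read, then swap": memcpy is safe for unaligned p, but the result
// equals java_u4(p) only on a little-endian host such as x86.
static inline uint32_t native_then_swap_u4(const uint8_t* p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return swap_u4(v);
}
```

The shift-based form costs a few more instructions than a single load plus bswap. Compilers commonly recognize the pattern, though, and it removes both the endianness and the alignment assumptions.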

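Parsing the Class File

The entry point of class-file parsing is the parseClassFile() method defined in ClassFileParser. After the byte stream stream has been obtained as described in the previous subsection, ClassLoader::load_classfile() calls parseClassFile(). The calling code is:

Source location: src/share/vm/classfile/classLoader.cpp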

instanceKlassHandle h;
if (stream != NULL) {
    // class file found, parse it
    ClassFileParser parser(stream);
    ClassLoaderData* loader_data = ClassLoaderData::the_null_class_loader_data();
    Handle protection_domain;
    TempNewSymbol parsed_name = NULL;
    instanceKlassHandle result =
                            parser.parseClassFile(h_name,loader_data,protection_domain,parsed_name,false,CHECK_(h));
    // add to package table
    if (add_package(name, classpath_index, THREAD)) {
      h = result;
    }
}

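Some other functions also call parseClassFile() when needed. One example is SystemDictionary::resolve_from_stream(), which is invoked when the Java main class is loaded.

The parseClassFile() overload that is called is implemented as follows: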

instanceKlassHandle parseClassFile(Symbol* name,
                                     ClassLoaderData* loader_data,
                                     Handle protection_domain,
                                     TempNewSymbol& parsed_name,
                                     bool verify,
                                     TRAPS) {
    KlassHandle no_host_klass;
    return parseClassFile(name, loader_data, protection_domain, no_host_klass, NULL, parsed_name, verify, THREAD);
}

That overload forwards to another one, whose prototype is:

instanceKlassHandle ClassFileParser::parseClassFile(Symbol* name,
                                                    ClassLoaderData* loader_data,
                                                    Handle protection_domain,
                                                    KlassHandle host_klass,
                                                    GrowableArray<Handle>* cp_patches,
                                                    TempNewSymbol& parsed_name,
                                                    bool verify,
                                                    TRAPS)

This method is very complex, so it is described below in several steps.

1. Parsing the magic number and the minor and major version numbers

ClassFileStream* cfs = stream();
...
u4 magic = cfs->get_u4_fast();
guarantee_property(magic == JAVA_CLASSFILE_MAGIC,"Incompatible magic value %u in class file %s",magic, CHECK_(nullHandle));
// Version numbers
u2 minor_version = cfs->get_u2_fast();
u2 major_version = cfs->get_u2_fast();
…
_major_version = major_version;
_minor_version = minor_version;

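The magic number is read to verify that it equals 0xCAFEBABE (JAVA_CLASSFILE_MAGIC). The minor and major version numbers are then read and saved in the _minor_version and _major_version fields of the ClassFileParser instance.

2. Parsing the access flags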

// Access flags
AccessFlags access_flags;
jint flags = cfs->get_u2_fast() & JVM_RECOGNIZED_CLASS_MODIFIERS;
 
if ((flags & JVM_ACC_INTERFACE) && _major_version < JAVA_6_VERSION) {
    // Set abstract bit for old class files for backward compatibility
    flags |= JVM_ACC_ABSTRACT;
}
access_flags.set_flags(flags);

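The access flags are read, and any bits not included in JVM_RECOGNIZED_CLASS_MODIFIERS are masked off. If an interface comes from a class file older than Java 6 (major version below JAVA_6_VERSION), ACC_ABSTRACT is also set for backward compatibility. These flags are used later when fields and methods are parsed, mainly to tell whether those fields or methods are declared in an interface or in a class. JVM_RECOGNIZED_CLASS_MODIFIERS is a macro defined as follows: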

#define JVM_RECOGNIZED_CLASS_MODIFIERS (JVM_ACC_PUBLIC     |    \
                                        JVM_ACC_FINAL      |    \
                                        JVM_ACC_SUPER      |    /* affects invokespecial */ \
                                        JVM_ACC_INTERFACE  |    \
                                        JVM_ACC_ABSTRACT   |    \
                                        JVM_ACC_ANNOTATION |    \
                                        JVM_ACC_ENUM       |    \
                                        JVM_ACC_SYNTHETIC)

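The last flag, ACC_SYNTHETIC, is added by the front-end compiler (such as javac). It marks a synthetic type, that is, one that does not appear in the source code.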
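The masking and the backward-compatibility fix can be reproduced in a few lines. The sketch below uses the ACC_* values from the class access-flag table of the JVM specification. The names kRecognized, kJava6Major and normalize_class_flags() are hypothetical. 50 is the major version of Java 6 class files, which is what JAVA_6_VERSION denotes.

```cpp
#include <cassert>
#include <cstdint>

// Class access flags as listed in the JVM specification (class file, section 4.1).
constexpr uint16_t ACC_PUBLIC     = 0x0001;
constexpr uint16_t ACC_FINAL      = 0x0010;
constexpr uint16_t ACC_SUPER      = 0x0020;
constexpr uint16_t ACC_INTERFACE  = 0x0200;
constexpr uint16_t ACC_ABSTRACT   = 0x0400;
constexpr uint16_t ACC_SYNTHETIC  = 0x1000;
constexpr uint16_t ACC_ANNOTATION = 0x2000;
constexpr uint16_t ACC_ENUM       = 0x4000;

// Counterpart of JVM_RECOGNIZED_CLASS_MODIFIERS: every other bit is dropped.
constexpr uint16_t kRecognized = ACC_PUBLIC | ACC_FINAL | ACC_SUPER | ACC_INTERFACE |
                                 ACC_ABSTRACT | ACC_ANNOTATION | ACC_ENUM | ACC_SYNTHETIC;

constexpr uint16_t kJava6Major = 50;  // major_version of Java 6 class files

// Mask unknown bits; interfaces from pre-Java-6 class files also get ACC_ABSTRACT.
uint16_t normalize_class_flags(uint16_t raw, uint16_t major_version) {
  uint16_t flags = raw & kRecognized;
  if ((flags & ACC_INTERFACE) && major_version < kJava6Major) {
    flags |= ACC_ABSTRACT;
  }
  return flags;
}
```

Take an interface compiled for Java 5 (major version 49) with flags 0x0201. It comes out as 0x0601, because the missing ACC_ABSTRACT bit has been added.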

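3. Parsing the this-class index

The class index (this_class) is a u2 value that determines the fully qualified name of this class. It points to a CONSTANT_Class_info class descriptor in the constant pool. The index stored in that descriptor in turn leads to a CONSTANT_Utf8_info string in the constant pool.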

// This class and superclass
u2 this_class_index = cfs->get_u2_fast();
 
Symbol*  class_name  = cp->unresolved_klass_at(this_class_index);
assert(class_name != NULL, "class_name can't be null");
 
// Update _class_name which could be null previously to be class_name
_class_name = class_name;

The name of the current class is saved in the _class_name field of the ClassFileParser instance.

cp->unresolved_klass_at() is implemented as follows:

Source location: /hotspot/src/share/vm/oops/constantPool.hpp


// Returns the Symbol* of an unresolved (not yet linked) entry
// This method should only be used with a cpool lock or during parsing or gc
Symbol* unresolved_klass_at(int which) {     // Temporary until actual use
    intptr_t* oaar = obj_at_addr_raw(which);
    Symbol* tmp = (Symbol*)OrderAccess::load_ptr_acquire(oaar);
    Symbol* s = CPSlot(tmp).get_symbol();
    // check that the klass is still unresolved.
    assert(tag_at(which).is_unresolved_klass(), "Corrupted constant pool");
    return s;
}

For example:


#3 = Class         #17        // TestClass
...
#17 = Utf8          TestClass 

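The class index is 0x0003, so the VM looks up the class descriptor at constant pool index 3. The name index stored in that descriptor is 17, and the string at index 17 is "TestClass". The slot located by obj_at_addr_raw() holds a pointer to the Symbol object that represents the string "TestClass". This is because, when the constant pool entries are parsed, the name index 17 originally stored for the class entry is replaced with a pointer to that Symbol object.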
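The two-step lookup can be modelled with a toy constant pool. In the sketch below, CpTag, CpEntry and class_name_of() are hypothetical. They store the raw indices from the class file, which is the state before HotSpot replaces the name index with a Symbol pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Only the two tags needed for this_class (values from the JVM specification).
enum class CpTag : uint8_t { Invalid = 0, Utf8 = 1, Class = 7 };

// Toy constant pool entry.
struct CpEntry {
  CpTag tag = CpTag::Invalid;
  uint16_t name_index = 0;   // meaningful when tag == Class
  std::string utf8;          // meaningful when tag == Utf8
};

// this_class -> CONSTANT_Class_info -> name_index -> CONSTANT_Utf8_info.
std::string class_name_of(const std::vector<CpEntry>& cp, uint16_t this_class) {
  if (this_class == 0 || this_class >= cp.size() || cp[this_class].tag != CpTag::Class)
    throw std::runtime_error("this_class must refer to a CONSTANT_Class_info");
  uint16_t name_index = cp[this_class].name_index;
  if (name_index == 0 || name_index >= cp.size() || cp[name_index].tag != CpTag::Utf8)
    throw std::runtime_error("name_index must refer to a CONSTANT_Utf8_info");
  return cp[name_index].utf8;
}
```

With entry 3 set to Class #17 and entry 17 set to the Utf8 string "TestClass", class_name_of(cp, 3) returns "TestClass". Pointing this_class directly at a Utf8 entry is rejected.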

obj_at_addr_raw() is implemented as follows:

intptr_t*   obj_at_addr_raw(int which) const {
    assert(is_within_bounds(which), "index out of bounds");
    return (intptr_t*) &base()[which];
}
intptr_t*   base() const {
  return (intptr_t*) (
     (  (char*) this  ) + sizeof(ConstantPool)
  );
}

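base() is a member function of ConstantPool, so this points to the start address of the current ConstantPool object in memory. Adding the size of the ConstantPool class itself moves the pointer to the constant pool data that follows the object. That data is an array of length pointer-sized slots, where length is the number of constant pool entries. (intptr_t*)&base()[which] therefore yields the address of the slot for constant pool index which. In the example above, that slot holds a pointer to a Symbol object.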
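The "header followed by trailing slots" layout behind base() can be reproduced in standard C++. In this sketch, Pool, Pool::create(), Pool::destroy() and slot_addr() are hypothetical. The object is constructed in place at the start of a raw allocation that has room for length intptr_t slots directly after it. The slots therefore start at (char*)this + sizeof(Pool), just as in base().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical miniature of ConstantPool: a fixed header followed in memory
// by _length pointer-sized slots that are not declared as a member.
class Pool {
 public:
  static Pool* create(int length) {
    size_t bytes = sizeof(Pool) + sizeof(intptr_t) * static_cast<size_t>(length);
    void* mem = std::calloc(1, bytes);            // zero-filled raw memory
    if (mem == nullptr) throw std::bad_alloc();
    return new (mem) Pool(length);                // construct the header in place
  }
  static void destroy(Pool* p) { p->~Pool(); std::free(p); }

  int length() const { return _length; }

  // Same arithmetic as ConstantPool::base(): skip over the header.
  intptr_t* base() const {
    return (intptr_t*)(((char*)this) + sizeof(Pool));
  }
  intptr_t* slot_addr(int which) const {
    assert(0 <= which && which < _length && "index out of bounds");
    return &base()[which];
  }

 private:
  explicit Pool(int length) : _tags(nullptr), _length(length) {}
  void* _tags;    // stands in for the real header fields
  int   _length;
};

static_assert(sizeof(Pool) % alignof(intptr_t) == 0, "slots must stay aligned");
```

The static_assert captures the one property the trick depends on: the header size must keep the first slot suitably aligned.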

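4. Parsing the superclass index

The superclass index (super_class) is a u2 value that determines the fully qualified name of this class's superclass. Java does not allow multiple inheritance of classes, so there is only one superclass index. It points to a CONSTANT_Class_info class descriptor in the constant pool. The index stored in that descriptor in turn leads to a CONSTANT_Utf8_info string in the constant pool.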

u2 super_class_index = cfs->get_u2_fast();
instanceKlassHandle super_klass = parse_super_class(super_class_index,CHECK_NULL);

parse_super_class() is implemented as follows:


instanceKlassHandle ClassFileParser::parse_super_class(int super_class_index,TRAPS) {
 
  instanceKlassHandle super_klass;
  if (super_class_index == 0) { // only java.lang.Object has no superclass
    check_property(_class_name == vmSymbols::java_lang_Object(),
                   "Invalid superclass index %u in class file %s",super_class_index,CHECK_NULL);
  } else {
    check_property(valid_klass_reference_at(super_class_index),
                   "Invalid superclass index %u in class file %s",super_class_index,CHECK_NULL);
    // The class name should be legal because it is checked when parsing constant pool.
    // However, make sure it is not an array type.
    bool is_array = false;
    constantTag mytemp = _cp->tag_at(super_class_index);
    if (mytemp.is_klass()) {
       super_klass = instanceKlassHandle(THREAD, _cp->resolved_klass_at(super_class_index));
    }
  }
  return super_klass;
}

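Suppose the constant pool entry at super_class_index has already been resolved, meaning its tag is JVM_CONSTANT_Class. In that case the InstanceKlass representing the superclass is obtained directly through super_class_index. Otherwise the returned handle is NULL. An index of 0 is legal only for java.lang.Object, which has no superclass.

resolved_klass_at() is implemented as follows:

Source location: /hotspot/src/share/vm/oops/constantPool.hpp

// Returns the Klass* of a resolved (linked) entry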
Klass* resolved_klass_at(int which) const {  // Used by Compiler
    // Must do an acquire here in case another thread resolved the klass
    // behind our back, lest we later load stale values thru the oop.
    Klass* tmp = (Klass*)OrderAccess::load_ptr_acquire(obj_at_addr_raw(which));
    return CPSlot(tmp).get_klass(); 
} 

The CPSlot class is implemented as follows:

class CPSlot VALUE_OBJ_CLASS_SPEC {
  intptr_t  _ptr;
 public:
  CPSlot(intptr_t ptr): _ptr(ptr) {}
  CPSlot(Klass*   ptr): _ptr((intptr_t)ptr) {}
  CPSlot(Symbol*  ptr): _ptr((intptr_t)ptr | 1) {} // setting bit 0 marks the slot as unresolved (it holds a Symbol*)
 
  intptr_t value()     { return _ptr; }
  bool is_resolved()   { return (_ptr & 1) == 0; }
  bool is_unresolved() { return (_ptr & 1) == 1; }
 
  Symbol* get_symbol() {
    assert(is_unresolved(), "bad call");
    return (Symbol*)(_ptr & ~1);
  }
  Klass* get_klass() {
    assert(is_resolved(), "bad call");
    return (Klass*)_ptr;
  }
};  
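CPSlot relies on the fact that Symbol and Klass objects are at least 2-byte aligned. Bit 0 of a real pointer is therefore always 0 and can carry the "unresolved" mark. The sketch below shows the same low-bit tagging with two hypothetical stand-in types, FakeSymbol and FakeKlass, and a hypothetical Slot class. It illustrates the technique and is not HotSpot's code.

```cpp
#include <cassert>
#include <cstdint>

struct FakeSymbol { const char* text; };   // hypothetical stand-in for Symbol
struct FakeKlass  { const char* name; };   // hypothetical stand-in for Klass

// Pointer tagging: bit 0 set => unresolved (holds a FakeSymbol*),
// bit 0 clear => resolved (holds a FakeKlass*).
class Slot {
 public:
  explicit Slot(FakeKlass* k)  : _ptr(reinterpret_cast<intptr_t>(k)) {}
  explicit Slot(FakeSymbol* s) : _ptr(reinterpret_cast<intptr_t>(s) | 1) {}

  bool is_resolved()   const { return (_ptr & 1) == 0; }
  bool is_unresolved() const { return (_ptr & 1) == 1; }

  FakeSymbol* get_symbol() const {
    assert(is_unresolved() && "bad call");
    return reinterpret_cast<FakeSymbol*>(_ptr & ~intptr_t(1));  // strip the tag bit
  }
  FakeKlass* get_klass() const {
    assert(is_resolved() && "bad call");
    return reinterpret_cast<FakeKlass*>(_ptr);                  // no tag to strip
  }

 private:
  intptr_t _ptr;
};

static_assert(alignof(FakeSymbol) >= 2 && alignof(FakeKlass) >= 2,
              "bit 0 must be free for the tag");
```

A single word can thus hold either kind of pointer, and a reader can tell which one it has without consulting the separate tag array.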

5. Parsing the implemented interfaces

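The interface table follows these rules:

- Each value in the interfaces[] array must be a valid index into the constant_pool table, and the array has interfaces_count elements.
- For 0 ≤ i < interfaces_count, the entry referenced by interfaces[i] must be a CONSTANT_Class_info.
- The interfaces appear in interfaces[] in the same left-to-right order as in the source code, so interfaces[0] corresponds to the leftmost interface in the source.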
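Reading the table itself is just a count followed by that many u2 indices. The hypothetical helper read_interfaces() below returns the indices in file order, so element 0 is the leftmost interface in the source. It checks only that each index lies inside the constant pool. A real parser also checks the tags of the referenced entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads interfaces_count and interfaces[] starting at buf[pos]; advances pos.
std::vector<uint16_t> read_interfaces(const std::vector<uint8_t>& buf, size_t& pos,
                                      uint16_t constant_pool_count) {
  auto u2 = [&]() -> uint16_t {
    if (pos + 2 > buf.size()) throw std::runtime_error("truncated class file");
    uint16_t v = static_cast<uint16_t>((buf[pos] << 8) | buf[pos + 1]);
    pos += 2;
    return v;
  };
  uint16_t count = u2();
  std::vector<uint16_t> indices;
  indices.reserve(count);
  for (uint16_t i = 0; i < count; i++) {
    uint16_t idx = u2();
    // Valid constant pool indices are 1 .. constant_pool_count - 1.
    if (idx == 0 || idx >= constant_pool_count)
      throw std::runtime_error("interface index out of range");
    indices.push_back(idx);  // keeps the source (left-to-right) order
  }
  return indices;
}
```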

u2 itfs_len = cfs->get_u2_fast();
Array<Klass*>* local_interfaces =
parse_interfaces(itfs_len, protection_domain, _class_name,&has_default_methods, CHECK_(nullHandle));

parse_interfaces() is implemented as follows:

Array<Klass*>* ClassFileParser::parse_interfaces(int     length,
                                                 Handle  protection_domain,
                                                 Symbol* class_name,
                                                 bool*   has_default_methods,
                                                 TRAPS
){
  if (length == 0) {
    _local_interfaces = Universe::the_empty_klass_array();
  } else {
    ClassFileStream* cfs = stream();
    _local_interfaces = MetadataFactory::new_array<Klass*>(_loader_data, length, NULL, CHECK_NULL);
 
    int index;
    for (index = 0; index < length; index++) {
      u2 interface_index = cfs->get_u2(CHECK_NULL);
      KlassHandle interf;
 
      if (_cp->tag_at(interface_index).is_klass()) {
        interf = KlassHandle(THREAD, _cp->resolved_klass_at(interface_index));
      } else {
        Symbol*  unresolved_klass  = _cp->klass_name_at(interface_index);
 
        Handle   class_loader(THREAD, _loader_data->class_loader());
 
        // Call resolve_super so classcircularity is checked
        Klass* k = SystemDictionary::resolve_super_or_fail(class_name,
                                                           unresolved_klass,
                                                           class_loader,
                                                           protection_domain,
                                                           false, CHECK_NULL);
        // wrap the InstanceKlass representing the interface in a KlassHandle
        interf = KlassHandle(THREAD, k);
      }
 
      if (InstanceKlass::cast(interf())->has_default_methods()) {
         *has_default_methods = true;
      }
      _local_interfaces->at_put(index, interf());
    }
 
    if (!_need_verify || length <= 1) {
       return _local_interfaces;
    }
  }
  return _local_interfaces;
}

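The loop processes each interface the class implements. It uses interface_index to find the interface's C++ representation, an InstanceKlass instance, wraps that instance in a KlassHandle, and stores it in the _local_interfaces array. The loop also records whether any of the interfaces has default methods.

The key point is how the InstanceKlass is obtained from interface_index:

- If the constant pool entry already holds the InstanceKlass, the entry has been resolved, and _cp->resolved_klass_at() returns it directly.
- If the entry holds only the name (a Symbol), SystemDictionary::resolve_super_or_fail() must be called to resolve it. That method is covered together with linking and is not examined further here.

klass_name_at() is implemented as follows: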


Symbol* ConstantPool::klass_name_at(int which) {
  assert(tag_at(which).is_unresolved_klass() || tag_at(which).is_klass(),
         "Corrupted constant pool");
  // A resolved constantPool entry will contain a Klass*, otherwise a Symbol*.
  // It is not safe to rely on the tag bit's here, since we don't have a lock, and the entry and
  // tag is not updated atomicly.
  CPSlot entry = slot_at(which);
  if (entry.is_resolved()) { // resolved: the slot holds a pointer to the InstanceKlass
    // Already resolved - return entry's name.
    assert(entry.get_klass()->is_klass(), "must be");
    return entry.get_klass()->name();
  } else {  // unresolved: the slot holds a tagged pointer to a Symbol
    assert(entry.is_unresolved(), "must be either symbol or klass");
    return entry.get_symbol();
  }
}

slot_at() is implemented as follows:

CPSlot slot_at(int which) {
    assert(is_within_bounds(which), "index out of bounds");
    // Uses volatile because the klass slot changes without a lock.
    volatile intptr_t adr = (intptr_t)OrderAccess::load_ptr_acquire(obj_at_addr_raw(which));
    assert(adr != 0 || which == 0, "cp entry for klass should not be zero");
    return CPSlot(adr);
}

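slot_at() likewise calls obj_at_addr_raw(). It reads, with acquire semantics, the value stored at the given index of the ConstantPool, and returns that value wrapped in a CPSlot object.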
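The acquire loads in slot_at() and resolved_klass_at() are meant to pair with a release store by the thread that resolves the entry. A reader that observes the new Klass* is then guaranteed to see everything written before that pointer was published. The sketch below expresses the same publication pattern with std::atomic. KlassLike, slot, publish() and read_slot() are hypothetical. HotSpot itself uses OrderAccess rather than std::atomic, and representing "unresolved" as 0 is a simplification. The demo runs on one thread; the ordering guarantee only matters when publish() and read_slot() run on different threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct KlassLike { int size; };             // hypothetical resolved metadata

std::atomic<intptr_t> slot{0};              // one constant pool slot; 0 = unresolved here

// Writer: fully initialize the object, then publish its address with release.
void publish(KlassLike* k) {
  k->size = 64;
  slot.store(reinterpret_cast<intptr_t>(k), std::memory_order_release);
}

// Reader: acquire load; if the pointer is visible, so is k->size.
int read_slot() {
  intptr_t v = slot.load(std::memory_order_acquire);
  if (v == 0) return -1;                    // not resolved yet
  return reinterpret_cast<KlassLike*>(v)->size;
}
```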

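6. Parsing the class attributes

ClassAnnotationCollector parsed_annotations;
parse_classfile_attributes(&parsed_annotations, CHECK_(nullHandle));

parse_classfile_attributes() parses the class attributes. Its implementation is tedious but simple in principle: each attribute is decoded according to its format. Interested readers can study it on their own.

7. Parsing the constant pool

When ClassFileParser::parseClassFile() parses a class file, it calls ClassFileParser::parse_constant_pool() to parse the constant pool. In the actual method this happens right after the version numbers are read and before the access flags. The reason is that the constant pool follows the version numbers in the file, and steps 3 to 5 above look up entries in it. The call is: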

constantPoolHandle cp = parse_constant_pool(CHECK_(nullHandle));

parse_constant_pool() is implemented as follows:


constantPoolHandle ClassFileParser::parse_constant_pool(TRAPS) {
  ClassFileStream* cfs = stream();
  constantPoolHandle nullHandle;
 
  u2 length = cfs->get_u2_fast();
  ConstantPool* constant_pool = ConstantPool::allocate(_loader_data, length,
                                                        CHECK_(nullHandle));
  _cp = constant_pool; // save in case of errors
  constantPoolHandle cp (THREAD, constant_pool);
  // ...
  // parsing constant pool entries
  parse_constant_pool_entries(length, CHECK_(nullHandle));
  return cp;
}

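ConstantPool::allocate() creates the ConstantPool object. parse_constant_pool_entries() then parses the entries of the constant pool and stores them in that object.

Let us first look at the ConstantPool class. An instance of this class represents a concrete constant pool and holds its metadata.

1) The ConstantPool class

The class is defined as follows: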

class ConstantPool : public Metadata {
 private:
  Array<u1>*           _tags;        // the tag array describing the constant pool's contents
  ConstantPoolCache*   _cache;       // the cache holding interpreter runtime information
  InstanceKlass*       _pool_holder; // the corresponding class
  Array<u2>*           _operands;    // for variable-sized (InvokeDynamic) nodes, usually empty
 
  // Array of resolved objects from the constant pool and map from resolved
  // object index to original constant pool index
  jobject              _resolved_references; // jobject is a pointer type
  Array<u2>*           _reference_map;
 
  int                  _flags;  // old fashioned bit twiddling
  int                  _length; // number of elements in the array
 
  union {
    // set for CDS to restore resolved references
    int                _resolved_reference_length;
    // keeps version number for redefined classes (used in backtrace)
    int                _version;
  } _saved;
 
  Monitor*             _lock;
  ...
}

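ConstantPool represents constant pool metadata, so it inherits from Metadata. Its main fields work as follows:

- _tags describes the contents of the constant pool. The number of entries is stored in _length, so the _tags array also has _length elements. Each element holds the tag value, as defined by the JVM specification, of the corresponding entry.
- _cache holds information that assists the interpreter and is covered when interpretation is discussed.

The remaining fields are not discussed further here.

2) Creating the ConstantPool instance

ClassFileParser::parse_constant_pool() first calls ConstantPool::allocate() to create the ConstantPool instance. The method is implemented as follows: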

ConstantPool* ConstantPool::allocate(ClassLoaderData* loader_data, int length, TRAPS) {
  // Tags are RW but comment below applies to tags also.
  Array<u1>* tags = MetadataFactory::new_writeable_array<u1>(loader_data, length, 0, CHECK_NULL);
 
  int size = ConstantPool::size(length);
 
  // CDS considerations:
  // Allocate read-write but may be able to move to read-only at dumping time
  // if all the klasses are resolved.  The only other field that is writable is
  // the resolved_references array, which is recreated at startup time.
  // But that could be moved to InstanceKlass (although a pain to access from
  // assembly code).  Maybe it could be moved to the cpCache which is RW.
  return new (loader_data, size, false, MetaspaceObj::ConstantPoolType, THREAD) ConstantPool(tags);
}

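The length parameter is the number of constant pool entries. ConstantPool::size() computes the amount of memory to allocate, in words, and the ConstantPool object is then created and returned. size() is implemented as follows: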

static int size(int length){
      int s = header_size();
      return align_object_size(s + length);
}
 
// Sizing (in words)
static int header_size() {
      int num = sizeof(ConstantPool);
      return num/HeapWordSize;
}

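As the implementation shows, the size is the memory occupied by the ConstantPool object itself plus length pointer-sized slots, rounded up by align_object_size(). The final memory layout of a ConstantPool object is shown in the figure below.

_valid is an int field defined in Metadata that exists only in debug builds. Product builds lack it, so there Metadata occupies only 8 bytes (its vtable pointer). Object memory layout was covered earlier and is not repeated here.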

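In a debug build with pointer compression disabled (via an -XX option), sizeof(ConstantPool) is 88 bytes. header_size() therefore returns 88 / HeapWordSize = 11 words. Adding length pointer-sized slots gives the memory a ConstantPool object needs.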
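The size computation can be checked with a little arithmetic. The sketch below is a hypothetical re-implementation of ConstantPool::size(). It assumes HeapWordSize is 8 (a 64-bit VM). It models align_object_size() as rounding up to a multiple of an object alignment expressed in words, passed here as a parameter. With the 88-byte debug header quoted above and 1-word alignment, a pool of 29 entries needs 11 + 29 = 40 words, that is, 320 bytes.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kHeapWordSize = 8;   // assumption: 64-bit VM

// Round up to a multiple of alignment_words (assumed to be a power of two).
constexpr size_t align_up_words(size_t words, size_t alignment_words) {
  return (words + alignment_words - 1) & ~(alignment_words - 1);
}

// header_size(): the C++ object itself, in words.
constexpr size_t header_words(size_t sizeof_constant_pool) {
  return sizeof_constant_pool / kHeapWordSize;
}

// size(): header plus one pointer-sized slot per constant pool entry.
constexpr size_t pool_size_words(size_t sizeof_constant_pool, size_t length,
                                 size_t alignment_words = 1) {
  return align_up_words(header_words(sizeof_constant_pool) + length, alignment_words);
}
```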

Memory is allocated through an overloaded operator new. The overload is defined in MetaspaceObj, from which ConstantPool inherits indirectly:

void* MetaspaceObj::operator new(size_t size, ClassLoaderData* loader_data,
                                 size_t word_size, bool read_only,
                                 MetaspaceObj::Type type, TRAPS) throw() {
  // Klass has it's own operator new
  return Metaspace::allocate(loader_data, word_size, read_only,
                             type, CHECK_NULL);
}

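Metaspace::allocate() allocates the memory from Metaspace, the native-memory area where class metadata lives, rather than from the Java heap. It is covered in detail in the discussion of garbage collection. For now it is enough to know that it allocates word_size words (the size value computed above) and zero-fills them.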
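The expression new (loader_data, size, false, MetaspaceObj::ConstantPoolType, THREAD) ConstantPool(tags) works because C++ passes the parenthesized arguments to a class-specific operator new after the implicit size argument. The sketch below shows that mechanism with a hypothetical class, MetaLike. Its operator new takes one extra argument, a word count, and allocates that many zero-filled words. std::calloc stands in for Metaspace::allocate(), and the remaining HotSpot parameters are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical metadata class allocated the way ConstantPool is: the caller
// passes the total size in words as an extra operator new argument.
class MetaLike {
 public:
  // size (= sizeof(MetaLike)) is supplied implicitly by the compiler;
  // word_size comes from the parenthesized arguments in the new-expression.
  static void* operator new(std::size_t size, std::size_t word_size) {
    assert(word_size * sizeof(intptr_t) >= size);
    void* p = std::calloc(word_size, sizeof(intptr_t));  // zero-filled memory
    if (p == nullptr) throw std::bad_alloc();
    return p;
  }
  // Used by delete; the memory came from calloc, so release it with free.
  static void operator delete(void* p) { std::free(p); }

  explicit MetaLike(int length) : _length(length) {}
  int length() const { return _length; }
  // Slots live directly after the object, as with ConstantPool::base().
  intptr_t* base() { return reinterpret_cast<intptr_t*>(this + 1); }

 private:
  void* _holder = nullptr;  // placeholder header field
  int   _length;
};
```

Usage: new (words) MetaLike(5), where words covers sizeof(MetaLike) plus five slots. The constructor only touches the header, so the trailing slots stay zero, the same guarantee the text relies on for ConstantPool.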

The ConstantPool constructor then initializes some of the fields:

ConstantPool::ConstantPool(Array<u1>* tags) {
  set_length(tags->length());
  set_tags(NULL);
  set_cache(NULL);
  set_reference_map(NULL);
  set_resolved_references(NULL);
  set_operands(NULL);
  set_pool_holder(NULL);
  set_flags(0);
 
  // only set to non-zero if constant pool is merged by RedefineClasses
  set_version(0);
  set_lock(new Monitor(Monitor::nonleaf + 2, "A constant pool lock"));
 
  // initialize tag array
  int length = tags->length();
  for (int index = 0; index < length; index++) {
    tags->at_put(index, JVM_CONSTANT_Invalid);
  }
  set_tags(tags);
}

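This initializes tags, _length, _lock and other fields. Every element of the tags array is set to JVM_CONSTANT_Invalid. As the individual constant pool entries are parsed, the elements are updated to values from the following enum:

Source location: hotspot/src/share/vm/prims/jvm.h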

enum {
    JVM_CONSTANT_Utf8 = 1,      // 1
    JVM_CONSTANT_Unicode,       // 2      /* unused */
    JVM_CONSTANT_Integer,       // 3
    JVM_CONSTANT_Float,         // 4
    JVM_CONSTANT_Long,          // 5
    JVM_CONSTANT_Double,        // 6
    JVM_CONSTANT_Class,         // 7
    JVM_CONSTANT_String,        // 8
    JVM_CONSTANT_Fieldref,      // 9
    JVM_CONSTANT_Methodref,     // 10
    JVM_CONSTANT_InterfaceMethodref,   // 11
    JVM_CONSTANT_NameAndType,          // 12
    JVM_CONSTANT_MethodHandle           = 15,  // JSR 292
    JVM_CONSTANT_MethodType             = 16,  // JSR 292
    //JVM_CONSTANT_(unused)             = 17,  // JSR 292 early drafts only
    JVM_CONSTANT_InvokeDynamic          = 18,  // JSR 292
    JVM_CONSTANT_ExternalMax            = 18   // Last tag found in classfiles
};

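These are the tag values of constant pool entries. Entry 0, however, keeps the value JVM_CONSTANT_Invalid.

The entry formats defined by the JVM specification are as follows: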

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}
 
CONSTANT_Integer_info {
    u1 tag;
    u4 bytes;
}
 
CONSTANT_Float_info {
    u1 tag;
    u4 bytes;
}
 
CONSTANT_Long_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
}
 
CONSTANT_Double_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
}
 
CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}
 
 
CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}
 
CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
 
CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
 
CONSTANT_InterfaceMethodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}
 
CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
}
 
 
CONSTANT_MethodHandle_info {
    u1 tag;
    u1 reference_kind;
    u2 reference_index;
}
 
CONSTANT_MethodType_info {
    u1 tag;
    u2 descriptor_index;
}
 
CONSTANT_InvokeDynamic_info {
    u1 tag;
    u2 bootstrap_method_attr_index;
    u2 name_and_type_index;
}

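During constant pool parsing, once an entry has been identified by its index, its tag is stored in the _tags array of ConstantPool, whose indices correspond to constant pool indices. The remaining information can only be stored in the length pointer-sized slots reserved after the ConstantPool object. These slots can be viewed as an array of length pointers whose indices likewise correspond to constant pool indices.

A pointer is 8 bytes wide on 64-bit platforms, so one slot can hold the information (apart from the tag) of every kind of constant pool entry except CONSTANT_Utf8_info. For CONSTANT_Double_info, for example, the high-order 4 bytes of the 8-byte value come from high_bytes and the low-order 4 bytes from low_bytes. A CONSTANT_Utf8_info entry is converted into a Symbol object, so its slot only needs to hold a pointer to that Symbol.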
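The arithmetic for the two 8-byte entry types is simple. high_bytes supplies the upper 32 bits and low_bytes the lower 32 bits of the value, and a double is the IEEE 754 bit pattern of that 64-bit value. The sketch below defines two hypothetical helpers, long_from_parts() and double_from_parts(). The second one performs the reinterpretation with std::memcpy, which is the well-defined C++ equivalent of the *(jdouble*)&bytes cast used in parse_constant_pool_entries().

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CONSTANT_Long_info: ((long) high_bytes << 32) + low_bytes
int64_t long_from_parts(uint32_t high_bytes, uint32_t low_bytes) {
  return static_cast<int64_t>((static_cast<uint64_t>(high_bytes) << 32) | low_bytes);
}

// CONSTANT_Double_info: the same 64 bits, reinterpreted as an IEEE 754 double.
double double_from_parts(uint32_t high_bytes, uint32_t low_bytes) {
  uint64_t bits = (static_cast<uint64_t>(high_bytes) << 32) | low_bytes;
  double d;
  static_assert(sizeof d == sizeof bits, "double must be 64-bit");
  std::memcpy(&d, &bits, sizeof d);
  return d;
}
```

For instance, high_bytes 0x3FF00000 with low_bytes 0 is the bit pattern of 1.0.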

parse_constant_pool() calls parse_constant_pool_entries() to parse each entry of the constant pool. The implementation is as follows:

void ClassFileParser::parse_constant_pool_entries(int length, TRAPS) {
  // Use a local copy of ClassFileStream. It helps the C++ compiler to optimize
  // this function (_current can be allocated in a register, with scalar
  // replacement of aggregates). The _current pointer is copied back to
  // stream() when this function returns. DON'T call another method within
  // this method that uses stream().
  ClassFileStream*  cfs0 = stream();
  ClassFileStream   cfs1 = *cfs0;
  ClassFileStream*  cfs  = &cfs1;
 
  Handle class_loader(THREAD, _loader_data->class_loader());
 
  // Used for batching symbol allocations.
  const char*   names[SymbolTable::symbol_alloc_batch_size];
  int           lengths[SymbolTable::symbol_alloc_batch_size];
  int           indices[SymbolTable::symbol_alloc_batch_size];
  unsigned int  hashValues[SymbolTable::symbol_alloc_batch_size];
  int           names_count = 0;
 
  // parsing  Index 0 is unused
  for (int index = 1; index < length; index++) {
    // Each of the following case guarantees one more byte in the stream
    // for the following tag or the access_flags following constant pool,
    // so we don't need bounds-check for reading tag.
    u1 tag = cfs->get_u1_fast();
    switch (tag) {
      case JVM_CONSTANT_Class :
        {
          cfs->guarantee_more(3, CHECK);  // name_index, tag/access_flags
          u2 name_index = cfs->get_u2_fast();
          _cp->klass_index_at_put(index, name_index);
        }
        break;
      case JVM_CONSTANT_Fieldref :
        {
          cfs->guarantee_more(5, CHECK);  // class_index, name_and_type_index, tag/access_flags
          u2 class_index = cfs->get_u2_fast();
          u2 name_and_type_index = cfs->get_u2_fast();
          _cp->field_at_put(index, class_index, name_and_type_index);
        }
        break;
      case JVM_CONSTANT_Methodref :
        {
          cfs->guarantee_more(5, CHECK);  // class_index, name_and_type_index, tag/access_flags
          u2 class_index = cfs->get_u2_fast();
          u2 name_and_type_index = cfs->get_u2_fast();
          _cp->method_at_put(index, class_index, name_and_type_index);
        }
        break;
      case JVM_CONSTANT_InterfaceMethodref :
        {
          cfs->guarantee_more(5, CHECK);  // class_index, name_and_type_index, tag/access_flags
          u2 class_index = cfs->get_u2_fast();
          u2 name_and_type_index = cfs->get_u2_fast();
          _cp->interface_method_at_put(index, class_index, name_and_type_index);
        }
        break;
      case JVM_CONSTANT_String :
        {
          cfs->guarantee_more(3, CHECK);  // string_index, tag/access_flags
          u2 string_index = cfs->get_u2_fast();
          _cp->string_index_at_put(index, string_index);
        }
        break;
      case JVM_CONSTANT_MethodHandle :
      case JVM_CONSTANT_MethodType :
        if (tag == JVM_CONSTANT_MethodHandle) {
          cfs->guarantee_more(4, CHECK);  // ref_kind, method_index, tag/access_flags
          u1 ref_kind = cfs->get_u1_fast();
          u2 method_index = cfs->get_u2_fast();
          _cp->method_handle_index_at_put(index, ref_kind, method_index);
        } else if (tag == JVM_CONSTANT_MethodType) {
          cfs->guarantee_more(3, CHECK);  // signature_index, tag/access_flags
          u2 signature_index = cfs->get_u2_fast();
          _cp->method_type_index_at_put(index, signature_index);
        } else {
          ShouldNotReachHere();
        }
        break;
      case JVM_CONSTANT_InvokeDynamic :
        {
          cfs->guarantee_more(5, CHECK);  // bsm_index, nt, tag/access_flags
          u2 bootstrap_specifier_index = cfs->get_u2_fast();
          u2 name_and_type_index = cfs->get_u2_fast();
          if (_max_bootstrap_specifier_index < (int) bootstrap_specifier_index)
            _max_bootstrap_specifier_index = (int) bootstrap_specifier_index;  // collect for later
          _cp->invoke_dynamic_at_put(index, bootstrap_specifier_index, name_and_type_index);
        }
        break;
      case JVM_CONSTANT_Integer :
        {
          cfs->guarantee_more(5, CHECK);  // bytes, tag/access_flags
          u4 bytes = cfs->get_u4_fast();
          _cp->int_at_put(index, (jint) bytes);
        }
        break;
      case JVM_CONSTANT_Float :
        {
          cfs->guarantee_more(5, CHECK);  // bytes, tag/access_flags
          u4 bytes = cfs->get_u4_fast();
          _cp->float_at_put(index, *(jfloat*)&bytes);
        }
        break;
      case JVM_CONSTANT_Long :
        {
          cfs->guarantee_more(9, CHECK);  // bytes, tag/access_flags
          u8 bytes = cfs->get_u8_fast();
          _cp->long_at_put(index, bytes);
        }
        index++;   // Skip entry following eigth-byte constant, see JVM book p. 98
        break;
      case JVM_CONSTANT_Double :
        {
          cfs->guarantee_more(9, CHECK);  // bytes, tag/access_flags
          u8 bytes = cfs->get_u8_fast();
          _cp->double_at_put(index, *(jdouble*)&bytes);
        }
        index++;   // Skip entry following eigth-byte constant, see JVM book p. 98
        break;
      case JVM_CONSTANT_NameAndType :
        {
          cfs->guarantee_more(5, CHECK);  // name_index, signature_index, tag/access_flags
          u2 name_index = cfs->get_u2_fast();
          u2 signature_index = cfs->get_u2_fast();
          _cp->name_and_type_at_put(index, name_index, signature_index);
        }
        break;
      case JVM_CONSTANT_Utf8 :
        {
          cfs->guarantee_more(2, CHECK);  // utf8_length
          u2  utf8_length = cfs->get_u2_fast();
          u1* utf8_buffer = cfs->get_u1_buffer();
          assert(utf8_buffer != NULL, "null utf8 buffer");
          // Got utf8 string, guarantee utf8_length+1 bytes, set stream position forward.
          cfs->guarantee_more(utf8_length+1, CHECK);  // utf8 string, tag/access_flags
          cfs->skip_u1_fast(utf8_length);
 
          if (EnableInvokeDynamic && has_cp_patch_at(index)) {
            Handle patch = clear_cp_patch_at(index);
 
            char* str = java_lang_String::as_utf8_string(patch());
            // (could use java_lang_String::as_symbol instead, but might as well batch them)
            utf8_buffer = (u1*) str;
            utf8_length = (int) strlen(str);
          }
 
          unsigned int hash;
          Symbol* result = SymbolTable::lookup_only((char*)utf8_buffer, utf8_length, hash);
          if (result == NULL) {
            names[names_count] = (char*)utf8_buffer;
            lengths[names_count] = utf8_length;
            indices[names_count] = index;
            hashValues[names_count++] = hash;
            if (names_count == SymbolTable::symbol_alloc_batch_size) {
              SymbolTable::new_symbols(_loader_data, _cp, names_count, names, lengths, indices, hashValues, CHECK);
              names_count = 0;
            }
          } else {
            _cp->symbol_at_put(index, result);
          }
        }
        break;
      default:
        classfile_parse_error("Unknown constant tag %u in class file %s", tag, CHECK);
        break;
    }
  }
 
  // Allocate the remaining symbols
  if (names_count > 0) {
    SymbolTable::new_symbols(_loader_data, _cp, names_count, names, lengths, indices, hashValues, CHECK);
  }
 
  cfs0->set_current(cfs1.current());
}

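The loop processes the length constant pool entries. Entry 0 is unused, so the loop index starts at 1. Long and Double entries occupy two indices, which is why those two cases increment index once more.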

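Once the layout of each entry type is known, the code is straightforward. The first byte of every entry is a tag describing the entry type. After reading it with cfs->get_u1_fast(), the switch statement handles each type separately. For CONSTANT_Utf8 entries, the SymbolTable is searched first for an existing Symbol. Strings not found there are collected and turned into Symbols in batches of SymbolTable::symbol_alloc_batch_size via SymbolTable::new_symbols().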
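The overall shape of the loop can be sketched independently of HotSpot. The hypothetical function parse_cp_tags() below is a simplified model with three limitations:

- It handles only a subset of the tags: Utf8, Integer, Float, Long, Double, Class, String, the three *ref kinds and NameAndType. Any other tag is rejected, much like classfile_parse_error().
- It records only each entry's tag and skips over the payload bytes.
- It leaves the slot after a Long or Double unused, in the same way as the extra index++ above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Parses constant_pool_count and the entries that follow, starting at buf[pos].
// Returns one tag per index; index 0 and the slot after Long/Double stay 0.
std::vector<uint8_t> parse_cp_tags(const std::vector<uint8_t>& buf, size_t& pos) {
  auto need = [&](size_t n) {
    if (pos + n > buf.size()) throw std::runtime_error("truncated constant pool");
  };
  auto u1 = [&]() -> uint8_t { need(1); return buf[pos++]; };
  auto u2 = [&]() -> uint16_t {
    need(2);
    uint16_t v = static_cast<uint16_t>((buf[pos] << 8) | buf[pos + 1]);
    pos += 2;
    return v;
  };

  uint16_t length = u2();
  std::vector<uint8_t> tags(length, 0);
  for (uint16_t index = 1; index < length; index++) {   // index 0 is unused
    uint8_t tag = u1();
    switch (tag) {
      case 1: {                                  // Utf8: u2 length, then the bytes
        uint16_t n = u2();
        need(n);
        pos += n;
        break;
      }
      case 3: case 4:                            // Integer, Float: one u4
        need(4); pos += 4;
        break;
      case 5: case 6:                            // Long, Double: u8, two slots
        need(8); pos += 8;
        tags[index++] = tag;                     // the next slot stays unusable
        continue;
      case 7: case 8:                            // Class, String: one u2
        need(2); pos += 2;
        break;
      case 9: case 10: case 11: case 12:         // *ref, NameAndType: two u2
        need(4); pos += 4;
        break;
      default:
        throw std::runtime_error("unknown constant tag");
    }
    tags[index] = tag;
  }
  return tags;
}
```

Take a pool with constant_pool_count 6 holding Utf8 "Hi", Class #1, a Long and an Integer. It yields the tags 0, 1, 7, 5, 0, 3, and the Long's second slot keeps 0.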
