java.util.HashMap 源码解读及其进化

概述

java.util.HashMap 是JDK里散列的一个实现，JDK6里采用位桶+链表的形式实现，Java8里采用的是位桶+链表/红黑树的方式，非线程安全。关于散列可以看这篇文章，

这篇文章主要是对JDK6和Java8里java.util.HashMap的一些源码的解读。Java8里的改进主要是为了解决哈希碰撞攻击。

这个源码解读主要关注基础数据结构、put(key,value)逻辑和遍历所有键值对的逻辑。

JDK1.6

基础数据结构

先看下JDK1.6里面HashMap的数据结构，先是链表的结构：

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key ;
    V value;
    Entry<K,V> next;  // 指向链表的下一个节点。
    int hash ;    // 根据key的hashCode计算得到，即使进行resize也不会改变。
        //  其他方法省略
}

单向链表。

位桶的结构：

public class HashMap<K, V> extends AbstractMap<K, V> implements Map<K, V>,
   Cloneable, Serializable {

     // 位桶，每个键值对都封装为Entry结构。
     // 位桶会初始化为一个大于用户指定的初始容量的initialCapacity的
   transient Entry<K,V>[] table;

     // HashMap发生结构性修改的次数，也用于判断是否存在并发修改或迭代视图的快速失败，
     // 如果存在这两种情况，抛出ConcurrentModificationException异常。
     // 用volatile保证了内存可见性
   transient volatile int modCount;

     // 实际存储的对键值对的数量
   transient int size;

     // 负载因子
     final float loadFactor;

     // 当键值对的数量达到或超过这个阀值时，将对位桶进行扩展，再把所有元素重新散列到新位桶。
     int threshold;

      // 下面从构造函数看看上面几个属性的关系。
     public HashMap(int initialCapacity, float loadFactor) {
          if (initialCapacity < 0)
               throw new IllegalArgumentException("Illegal initial capacity: "
                         + initialCapacity);

          if (initialCapacity > MAXIMUM_CAPACITY)
               initialCapacity = MAXIMUM_CAPACITY;

          if (loadFactor <= 0 || Float.isNaN(loadFactor))
               throw new IllegalArgumentException("Illegal load factor: "
                    + loadFactor);

          // 位桶的长度必须是2的幂，大于initialCapacity的最小的且是2的幂的数。
          // Find a power of 2 >= initialCapacity
          int capacity = 1;
          while (capacity < initialCapacity)
               capacity <<= 1;

          this.loadFactor = loadFactor;                                  
          threshold = ( int) (capacity * loadFactor); // 每次resize都会改变
          table = new Entry[capacity];
          init();          // 空方法，留给子类实现
     }

     // 其他方法和常量省略
}

基础方法

// 计算hash值h在长度为length的位桶的位置。
static int indexFor(int h, int length) {
     // 用位运算替代求模，这就是为什么位桶的长度必须是2的幂。
     return h & (length - 1);
}

// 根据key的hashCode h来计算一个新的hash值。
// 按源码里的说明，只用key的hashCode的低位字节容易碰撞冲突。
static int hash(int h) {
     // This function ensures that hashCodes that differ only by
     // constant multiples at each bit position have a bounded
     // number of collisions (approximately 8 at default load factor).
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
}

put方法

public V put(K key, V value) {
     // 由于HashMap运行key为空，所以需要对key为null的进行特殊处理。
     if (key == null)
          return putForNullKey(value);

     // 根据key的hashCode h来计算一个新的hash值。
     int hash = hash(key.hashCode());

     // 确定key所在位桶的位置。
     int i = indexFor(hash, table.length);

     // 遍历链表，hashCode相等的不一定就是同一个对象。
     for (Entry<K, V> e = table[i]; e != null; e = e.next) {
          Object k;
          // hashCode相等的不一定就是同一个对象。
          // 先判断对象引用相等，一般情况下更新一个key-value的key
         //    都是同一个对象的，所以可以减少一次方法调用。
          if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
               // 如果key已存在，则更新对应的值
               V oldValue = e.value;
               e.value = value;
               e.recordAccess( this);     // 空方法
               return oldValue;
          }
     }

     // 更新计数器，如果正在迭代，则可以快速失败，
     // 抛出ConcurrentModificationException异常。
     modCount++;     
     addEntry(hash, key, value, i);     // 添加新节点
     return null ;
}

private V putForNullKey(V value) {
     // key为null的Entry总是放在位桶0。
     for (Entry<K, V> e = table[0]; e != null; e = e.next) {
           if (e.key == null) {
               V oldValue = e.value;
               e.value = value;
               e.recordAccess( this);
               return oldValue;
          }
     }
     modCount++;
     addEntry(0, null, value, 0);
     return null ;
}
// 感觉对key为null的Entry比较蛋疼，如果这个Entry作为一个独立的属性，
// 则在遍历的时候需要特殊处理下。


void addEntry(int hash, K key, V value, int bucketIndex) {
     // 取出所在位桶的链表的首节点。
     Entry<K, V> e = table[bucketIndex];

     // 把原头节点作为新节点的下一个节点，把新节点查到链表的首部。
     table[bucketIndex] = new Entry<K, V>(hash, key, value, e);

     // 如果达到或超过阀值，则进行扩容和重新散列。
     if (size++ >= threshold)
          resize(2 * table.length);
}

get方法

与put方法的逻辑很相似。

resize方法

resize时原有的Entry不会变，直接散列到新的位桶。

void resize(int newCapacity) {
     Entry[] oldTable = table;
     int oldCapacity = oldTable.length;
     if (oldCapacity == MAXIMUM_CAPACITY) {
          threshold = Integer.MAX_VALUE;
          return;
     }

     Entry[] newTable = new Entry[newCapacity]; // 创建指定容量的新的位桶
     transfer(newTable);                  // 把Entry散列到新的位桶
     table = newTable;
     threshold = ( int) (newCapacity * loadFactor);// 计算新的阀值
}

void transfer(Entry[] newTable) {
     // 这里把对象的属性赋值给本地变量是为了提升性能，方法的本地变量是存放在方法栈上的，
     // 对象的属性是存放在堆上的，访问栈比访问堆要高效。这种做法在很多地方都可以看到。
     Entry[] src = table;
     int newCapacity = newTable.length;
     for (int j = 0; j < src.length; j++) {     // 遍历位桶
          Entry<K, V> e = src[j];
          if (e != null) {
               src[j] = null;
               // 遍历链表
               do {
                    Entry<K, V> next = e.next;
                    // 计算Entry在新位桶的位置
                    int i = indexFor(e.hash, newCapacity);
                    // 下面两行把Entry插到链表的首部
                    e.next = newTable[i];
                    newTable[i] = e;
                    e = next;
               } while (e != null);
          }
     }
}

逻辑不算复杂。

遍历

这里只说遍历entrySet，它是通过一个HashIterator的内部类来实现的。

private abstract class HashIterator<E> implements Iterator<E> {
     Entry<K, V> next; // 待返回的下一个entry
     int expectedModCount; // 用于快速失败
     int index; // 遍历的当前槽位，默认从0号槽位开始
     Entry<K, V> current; // 当前已返回的最后的entry

     HashIterator() {
          expectedModCount = modCount;
          if (size > 0) { // advance to first entry
               Entry[] t = table;
              //  定位到第一个不为null的Entry
               while (index < t.length && (next = t[index++]) == null)
                    ;
          }
     }

     public final boolean hasNext() {
          return next != null;
     }

     final Entry<K, V> nextEntry() {
          // 检测是否存在并发修改。
          if (modCount != expectedModCount)
               throw new ConcurrentModificationException();

          Entry<K, V> e = next;

          // 防止不调用hasNext()方法判断是否还有更多元素而直接调用next()方法的情况。
          if (e == null)  
               throw new NoSuchElementException();

          if ((next = e.next) == null) {
               // 当前所在链表遍历完成，定位到下一个不为null的entry
               Entry[] t = table;
               while (index < t.length && (next = t[index++]) == null)
                    ;
           }
          current = e;
          return e;
     }

     // 其他方法省略
}

在Java8的进化

说明下：这个是基于Java 8 EA 版里的源码。

Java 8 里HashMap的一个重大改进是为了解决哈希碰撞攻击，这个哈希碰撞攻击简单说就是通过精心构建一组key，使这些key都映射到同一个槽位，使整个HashMap退化为一个单链表，这会大大劣化HashMap的操作性能，更多信息可以google之。

改进主要有下面这些：

对于key为null的entry，放在独立的属性里。
当某个槽位的链表的长度达到某个阀值时，这个链表将被转换为红黑树。
modCount去除了volatile修饰符，毕竟这个类是声明为非线程安全的。volatile变量的读写需要一些内存关卡，开销大点。
增加了一个hashSeed属性，用于提升key散列到槽的随机性。

在下面的代码逻辑里，红黑树内部如何处理，这里不涉及。

基础数据结构

// 链表节点
static class Entry<K, V> implements Map.Entry<K, V> {
     final K key;
     V value;
     Object next; // 可以是 Entry 或 TreeNode，类型不再是Entry
     final int hash;

     // 其他方法省略
}

// 红黑树的节点
final static class TreeNode<K, V> {
     TreeNode parent; // red-black tree links
     TreeNode left;
     TreeNode right;
     TreeNode prev; // needed to unlink next upon deletion
     boolean red;
     final HashMap.Entry<K, V> entry;     // 

     // 其他方法省略
}


// 红黑树
final class TreeBin {
     static final int TREE_THRESHOLD = 16;  // 链表长度达到这个阀值时将转换为红黑树
     TreeNode<K, V> root; // 红黑树的根节点
     TreeNode<K, V> first; // 有序遍历红黑树时的第一个节点
     // 其他方法省略
}

// HashMap的主要属性
public class HashMap<K, V> extends AbstractMap<K, V> implements Map<K, V>,
               Cloneable, Serializable {
     // 省略一些常量定义
     // 槽数组的类型变为Object，不再是Entry，因为槽还可以放TreeBin了。
     transient Object[] table = EMPTY_TABLE;
     transient int size;
     int threshold;
     final float loadFactor;

     // 去除了volatile修饰符，毕竟这个类是声明为非线程安全的。
     // volatile变量的读写需要一些内存关卡，开销大点
     transient int modCount;

     // 随机种子，增加key散列到槽的随机性。
     transient final int hashSeed;

     // 存储key为null的entry。
     transient Entry<K, V> nullKeyEntry = null;

     // 其他方法省略
}

基础方法

final int hash(Object k) {
     // hashSeed 是每个实例一个的，用于增加随机性。
     int h = hashSeed ^ k.hashCode();     

     // This function ensures that hashCodes that differ only by
     // constant multiples at each bit position have a bounded
     // number of collisions (approximately 8 at default load factor).
     h ^= (h >>> 20) ^ (h >>> 12);
     return h ^ (h >>> 7) ^ (h >>> 4);
}

// indexFor 方法没有改变
static int indexFor(int h, int length) {
     // assert Integer.bitCount(length) == 1 :
     // "length must be a non-zero power of 2";
     return h & (length - 1);
}

put方法

public V put(K key, V value) {
     // 如果草数组为空则初始化
     if (table == EMPTY_TABLE) {
          inflateTable(threshold);
     }
     // 特殊处理key为null的键值对
     if (key == null)
          return putForNullKey(value);

     // 根据key计算哈希值
     int hash = hash(key);

     // 定位槽位
     int i = indexFor(hash, table.length);

     boolean checkIfNeedTree = false; // 是否需要把链表转换为 TreeBin?

     if (table[i] instanceof Entry) {// 是个链表
          // Bin contains ordinary Entries. Search for key in the linked list
          // of entries, counting the number of entries. Only check for
          // TreeBin conversion if the list size is >= TREE_THRESHOLD.
          // (The conversion still may not happen if the table gets resized.)
          int listSize = 0;     // 计算链表长度
          Entry<K, V> e = (Entry<K, V>) table[i];
          for (; e != null; e = (Entry<K, V>) e.next) {
               Object k;
               // 查找key，如果找到则更新对应的值
               if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                    V oldValue = e.value;
                    e.value = value;
                    e.recordAccess( this);
                    return oldValue;
               }
               listSize++;
          }
          // 没有找到，需要添加Entry
          // 判断是否需要把链表转换为红黑树
          checkIfNeedTree = listSize >= TreeBin.TREE_THRESHOLD;

     } else if (table[i] != null) {
          // 这个槽为存放的不是链表，且不为空，则必然是红黑树
          TreeBin e = (TreeBin) table[i];
          TreeNode p = e.putTreeNode(hash, key, value, null);
          if (p == null) { // putTreeNode() 添加了一个新节点
               modCount++;
               size++;
               // 如果键值对的数量已达到阀值，重散列
               if (size >= threshold) {
                    resize(2 * table.length);
               }
               return null ;
          } else { // putTreeNode() 发现这个key已存在
               Entry<K, V> pEntry = (Entry<K, V>) p.entry;
               V oldVal = pEntry.value;
               pEntry.value = value;
               pEntry.recordAccess( this);
               return oldVal;
          }
     }
     modCount++;
     // 槽位为空或在链表里没有找到对应的key，新增节点。
     addEntry(hash, key, value, i, checkIfNeedTree);
     return null ;
}

void addEntry(int hash, K key, V value, int bucketIndex,
                boolean checkIfNeedTree) {
     // assert key != null;
     if ((size >= threshold) && (null != table[bucketIndex])) {
          resize(2 * table.length);
          hash = hash(key);  // 应该不需要重新计算哈希值啊？？
          bucketIndex = indexFor(hash, table.length);
     }
     createEntry(hash, key, value, bucketIndex, checkIfNeedTree);
}

void createEntry(int hash, K key, V value, int bucketIndex,
               boolean checkIfNeedTree) {
      // assert key != null;
     @SuppressWarnings( "unchecked")
     Entry<K, V> e = (Entry<K, V>) table[bucketIndex];
     // 把新的Entry查到链表首部先
     table[bucketIndex] = newEntry(hash, key, value, e);
     size++;

     if (checkIfNeedTree) {// 需要把链表转换为红黑树
          int listSize = 0;
          for (e = (Entry<K, V>) table[bucketIndex]; e != null; e = (Entry<K, V>) e.next) {
               listSize++;
               if (listSize >= TreeBin.TREE_THRESHOLD) { // Convert to TreeBin
                    // 如果key是String类型或实现了java.lang.Comparable接口才转换。
                    if (comparableClassFor(key) != null) {
                         TreeBin t = new TreeBin();// 创建一个红黑树
                         t.populate((Entry) table[bucketIndex]);// 把链表里的Entry填充到红黑树
                         table[bucketIndex] = t;// 把红黑树放到槽位上。
                    }
                    break;
               }
          }
     }
}

// 用独立的Entry来存储key为null的情况，简单多了。
private V putForNullKey(V value) {
     if (nullKeyEntry != null) {
          V oldValue = nullKeyEntry.value;
          nullKeyEntry.value = value;
          nullKeyEntry.recordAccess( this);
          return oldValue;
     }
     modCount++;
     size++; // newEntry() skips size++
     nullKeyEntry = newEntry(0, null, value, null);
     return null ;
}

get逻辑

final Entry<K, V> getEntry(Object key) {
     // 没有键值对
     if (size == 0) {
          return null ;
     }
     // key为null，直接返回对应的Entry
     if (key == null) {
          return nullKeyEntry;
     }

     // 计算哈希值，定位槽位
     int hash = hash(key);
     int bin = indexFor(hash, table.length);

     if (table[bin] instanceof Entry) {
          // 槽位存放的链表，逻辑与旧的相同
          Entry<K, V> e = (Entry<K, V>) table[bin];
          for (; e != null; e = (Entry<K, V>) e.next) {
               Object k;
               if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                    return e;
               }
          }
     } else if (table[bin] != null) {
          // 存放的是红黑树，委托给红黑树去查找
          TreeBin e = (TreeBin) table[bin];
          TreeNode p = e.getTreeNode(hash, (K) key);
          if (p != null) {
               // assert p.entry.hash == hash && p.entry.key.equals(key);
               return (Entry<K, V>) p.entry;
          } else {
               return null ;
          }
     }
     return null ;
}

遍历所有Entry的逻辑

遍历所有Entry仍然借助类HashIterator。

private abstract class HashIterator<E> implements Iterator<E> {
     Object next; // 下一个待返回的键值对，可以是Entry 或 TreeNode
     int expectedModCount; // 用于快速失败
     int index; // 当前遍历的槽位，默认从槽位0开始
     Object current; // 当前返回的最后一个键值对，可以是 Entry 或 TreeNode

     HashIterator() {
          expectedModCount = modCount;
          if (size > 0) { // advance to first entry
               if (nullKeyEntry != null) {
                    // nullKeyEntry作为一个独立属性，不会下一个节点，需要特殊处理，优先返回。
                    next = nullKeyEntry;
               } else {
                    findNextBin();
               }
          }
     }

     public final boolean hasNext() {
          return next != null;
     }

     @SuppressWarnings( "unchecked")
     final Entry<K, V> nextEntry() {
          if (modCount != expectedModCount) {
               throw new ConcurrentModificationException();
          }
          Object e = next;
          Entry<K, V> retVal;

          if (e == null)
               throw new NoSuchElementException();

          if (e instanceof TreeNode) { // 正在遍历红黑树
               retVal = (Entry<K, V>) ((TreeNode) e).entry;
               next = retVal.next;
          } else { // 正在遍历链表
               retVal = (Entry<K, V>) e;
               next = ((Entry<K, V>) e).next;
          }

          // 如果当前链表或红黑树已遍历完，定位到下一个槽。
          if (next == null) {
               findNextBin();
          }
          current = e;
          return retVal;
     }

     private void findNextBin() {
          // assert next == null;
          Object[] t = table;

          // 定位到下一个不为空的槽位
          while (index < t.length && (next = t[index++]) == null)
               ;
          if (next instanceof HashMap.TreeBin) {
               // 如果槽位上存放的是红黑树，则用next指向红黑树的第一个节点。
               next = ((TreeBin) next).first;
               // assert next != null; // There should be no empty TreeBins
          }
     }

     // 其他方法省略
}

欢迎关注我的微信公众号: coderbee笔记，可以更及时回复你的讨论。

coderbee笔记

日拱一卒，不期速成，厚积薄发。 wen866595@gmail.com

java.util.HashMap 源码解读及其进化

概述

JDK1.6

基础数据结构

基础方法

put方法

get方法

resize方法

遍历

在Java8的进化

基础数据结构

基础方法

put方法

get逻辑

遍历所有Entry的逻辑

发表回复取消回复

概述

JDK1.6

基础数据结构

基础方法

put方法

get方法

resize方法

遍历

在Java8的进化

基础数据结构

基础方法

put方法

get逻辑

遍历所有Entry的逻辑

发表回复 取消回复

发表回复取消回复