什么是 BitSet？

BitSet 类实现了一个按需增长的位向量。位 Set 的每一个组件都有一个 boolean 值。用非负的整数将 BitSet 的位编入索引。可以对每个编入索引的位进行测试、设置或者清除。通过逻辑与、逻辑或和逻辑异或操作，可以使用一个 BitSet 修改另一个 BitSet 的内容。

默认情况下，set 中所有位的初始值都是 false。

每个位 set 都有一个当前大小，也就是该位 set 当前所用空间的位数。注意，这个大小与位 set 的实现有关，所以它可能随实现的不同而更改。位 set 的长度与位 set 的逻辑长度有关，并且是与实现无关而定义的。

BitSet 的索引范围从 0 到 nbits-1 。

实现

public class BitSet implements Cloneable, java.io.Serializable {
    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
    private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

    /* Used to shift left or right for a partial word mask */
    private static final long WORD_MASK = 0xffffffffffffffffL;

    /**
     * @serialField bits long[]
     *
     * The bits in this BitSet.  The ith bit is stored in bits[i/64] at
     * bit position i % 64 (where bit position 0 refers to the least
     * significant bit and 63 refers to the most significant bit).
     */
    private static final ObjectStreamField[] serialPersistentFields = {
        new ObjectStreamField("bits", long[].class),
    };

    /**
     * The internal field corresponding to the serialField "bits".
     */
    private long[] words;
  
  	...

可以看到，BitSet 的底层实现是使用 long 数组作为内部存储结构的，所以 BitSet 的大小为 long 类型大小 (64 位) 的整数倍。

构造函数

/**
 * Creates a new bit set. All bits are initially {@code false}.
 */
public BitSet() {
    initWords(BITS_PER_WORD);
    sizeIsSticky = false;
}

/**
 * Creates a bit set whose initial size is large enough to explicitly
 * represent bits with indices in the range {@code 0} through
 * {@code nbits-1}. All bits are initially {@code false}.
 *
 * @param  nbits the initial size of the bit set
 * @throws NegativeArraySizeException if the specified initial size
 *         is negative
 */
public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    sizeIsSticky = true;
}

BitSet ()：创建一个新的位 set，默认大小是 64 位。
BitSet (int nbits)：创建一个位 set，它的初始大小足以显式表示索引范围在 0 到 nbits-1 的位。

计算 words 的长度

在构造函数中，需要为字段 words （它本质是一个 long []）分配内存空间，那 BitSet 会给它分配多大呢？

    /*
     * BitSets are packed into arrays of "words."  Currently a word is
     * a long, which consists of 64 bits, requiring 6 address bits.
     * The choice of word size is determined purely by performance concerns.
     */
    private final static int ADDRESS_BITS_PER_WORD = 6;
    
		private void initWords(int nbits) {
        words = new long[wordIndex(nbits-1) + 1];
    }
    /**
     * Given a bit index, return word index containing it.
     */
    private static int wordIndex(int bitIndex) {
        return bitIndex >> ADDRESS_BITS_PER_WORD;
    }

words 数组的最大长度计算：(maxValue - 1) » 6 + 1

如果指定了 bitset 的初始化大小，那么会把他规整到一个大于或者等于这个数字的 64 的整倍数。比如 64 位，bitset 的大小是 1 个 long，而 65 位时，bitset 大小是 2 个 long，即 128 位。

读取方法

我们再来看看如何在给定一个数值时，定位到这个数值对应在 long [] 中的 bit 位：

/**
 * Returns the value of the bit with the specified index. The value
 * is {@code true} if the bit with the index {@code bitIndex}
 * is currently set in this {@code BitSet}; otherwise, the result
 * is {@code false}.
 *
 * @param  bitIndex   the bit index
 * @return the value of the bit with the specified index
 * @throws IndexOutOfBoundsException if the specified index is negative
 */
public boolean get(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    checkInvariants();

    int wordIndex = wordIndex(bitIndex);
    return (wordIndex < wordsInUse)
        && ((words[wordIndex] & (1L << bitIndex)) != 0);
}
private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

首先，通过 currentNumber » 6 计算出位于哪一个 long（或者说 long [] 中的索引位置），(words[wordIndex] & (1L << bitIndex)) != 0 等价于判断 (words[wordIndex] & (1L << bitIndex)) 是否为 0。这里的含义是将 1 左移 bitIndex 个位，将右移后的结果值与 words [wordIndex] 进行与运算，若为 1，则表明 wordIndex 的对应 flag 为 true。

当 words 为 {0} 时，表示所有 flag 均为 false。
当 words 为 {1} 时，表示 0 的 flag 为 true，其余均为 false。
当 words 为 {2} 时，表示 0 的 flag 为 true， 1 的 flag 为 true，其余均为 false。

使用场景

常见的应用场景是对海量数据进行一些统计工作，比如日志分析、用户数统计等。

比如一道面试题：有 1 千万个随机数，随机数的范围在 1 到 1 亿之间。现在要求写出一种算法，将 1 到 1 亿之间没有在随机数中的数求出来？

代码示例如下：　

public static void main(String[] args)
{
    Random random=new Random();
    
    List<Integer> list=new ArrayList<>();
    for(int i=0;i<10000000;i++)
    {
        int randomResult=random.nextInt(100000000);
        list.add(randomResult);
    }
    System.out.println("产生的随机数有");
    for(int i=0;i<list.size();i++)
    {
        System.out.println(list.get(i));
    }
    BitSet bitSet=new BitSet(100000000);
    for(int i=0;i<10000000;i++)
    {
        bitSet.set(list.get(i));
    }
    
    System.out.println("0~1亿不在上述随机数中有"+bitSet.size());
    for (int i = 0; i < 100000000; i++)
    {
        if(!bitSet.get(i))
        {
            System.out.println(i);
        }
    }     
}

简化版实现

class BitMap {
    private long maxValue;
    private static int[] words;

    //构造函数中传入数据中的最大值（范围为 1-最大值）
    public BitMap(long maxValue) {
        this.maxValue = maxValue;
        // 根据长度算出，所需数组大小
        words = new int[(int) ((maxValue - 1) >> 5)];   // 用移位运算代替除法，效率更高
    }

    public int getBit(long index) {
        int word = words[(int) ((index - 1) >> 5)];
        //  (index - 1) & 31 相当于对 index-1 按 32 求余数
        int offset = (int) ((index - 1) & 31);
        return (word >> offset) & 0x01;
    }


    public void setBit(long index) {
        // 求出该index - 1所在bitMap的下标
        int wordIndex = (int) ((index - 1) >> 5);
        // 求出该值的偏移量(求余)
        int offset = (int) ((index - 1) & 31);
        int word = words[wordIndex];
        words[wordIndex] = word | (0x01 << offset);


        (words[wordIndex] & (bitIndex >> 1L))
    }

    public static void main(String[] args) {
        BitMap bitMap = new BitMap(32);
        bitMap.setBit(32);
        System.out.println(bitMap.getBit(1));
        System.out.println(bitMap.getBit(32));
    }
}

Reference

Java BitSet 使用场景和示例 - https://www.cnblogs.com/xujian2014/p/5491286.html
Java 面试中常用的 BitMap 代码 - https://www.jianshu.com/p/9e7f8f33a61a

【Java】源码 - BitSet