Java File I/O: Simple Reads and Writes, Random Access, NIO, and MappedByteBuffer


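Creating and deleting files and directories is straightforward, so this article skips it and concentrates on reading and writing files. It covers:

Simple file reads and writes
Random-access file reads and writes
NIO file reads and writes with FileChannel
Reading and writing files with MappedByteBuffer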

Simple file reads and writes

FileOutputStream

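Streams are one-directional, so simple file writes use FileOutputStream and simple file reads use FileInputStream.

All data is written to a file as bytes, whether it is an image, audio, or video. Take an image: without a parser for its format, an image file is nothing more than bytes stored according to some layout.

FileOutputStream is a byte output stream for files. It writes bytes to a file sequentially and can append to the end of the file, but it cannot write at an arbitrary position.

The following example opens a file output stream and writes data to it.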

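import java.io.FileOutputStream;
import java.io.IOException;

public class FileOutputStreamStu {
    public void testWrite(byte[] data) throws IOException {
        // The second argument (true) opens the file in append mode.
        try (FileOutputStream fos = new FileOutputStream("/tmp/test.file", true)) {
            fos.write(data);
            fos.flush();
        }
    }
}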

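Note that unless the stream is opened in append mode, new FileOutputStream truncates the existing file, and the constructors that take no append flag open the stream in non-append mode.

The first constructor argument is the file name and the second controls append mode: if it is true, bytes are written to the end of the file rather than the beginning.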
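The difference between the two modes is easy to check with a small experiment. The sketch below is only an illustration: the class name, the helper methods, and the temporary file are made up for this example. It writes to the same file twice and returns what the file contains afterwards.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class AppendVsTruncateDemo {
    /** Writes text to the file, appending when append is true, truncating otherwise. */
    static void write(Path file, String text, boolean append) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file.toFile(), append)) {
            fos.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Returns the file content after a truncating write followed by a second write. */
    static String writeTwice(boolean appendSecond) throws IOException {
        Path file = Files.createTempFile("append-demo", ".txt"); // hypothetical temp file
        try {
            write(file, "first", false);
            write(file, "second", appendSecond);
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } finally {
            Files.delete(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeTwice(true));  // second write appended
        System.out.println(writeTwice(false)); // second write replaced the content
    }
}
```

With appendSecond set to true the file keeps both writes; with false only the second write survives.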

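The purpose of calling flush is to push any buffered data out before the stream is closed. FileOutputStream itself keeps no buffer in the JVM, so calling flush on it is not actually necessary. Flushing here means handing data buffered in JVM memory to the operating system through the write system call; it does not by itself force the data onto the disk. BufferedOutputStream is a stream that does buffer: when its write method is called and the buffer is not yet full, the write system call is not invoked, as the JDK source below shows.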

public class BufferedOutputStream extends FilterOutputStream { 
    public synchronized void write(byte b[], int off, int len) throws IOException { 
        if (len >= buf.length) { 
            flushBuffer(); 
            out.write(b, off, len); 
            return; 
        } 
        if (len > buf.length - count) { 
            flushBuffer(); 
        } 
        System.arraycopy(b, off, buf, count, len); // copied into the buffer only
        count += len; 
    } 
}
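The buffering can be observed without touching a file. In the minimal sketch below, a ByteArrayOutputStream stands in for the file so that the number of bytes that have reached the underlying stream can be inspected; the class and method names are made up for this illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class BufferedFlushDemo {
    /** Returns {bytes in the sink before flush, bytes in the sink after flush}. */
    static int[] sizesAroundFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a file
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192);
        out.write(new byte[]{1, 2, 3}); // smaller than the buffer: only copied into buf
        int before = sink.size();
        out.flush();                    // pushes the buffered bytes to the sink
        int after = sink.size();
        out.close();
        return new int[]{before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Nothing reaches the sink until flush (or close) is called, which is why flush matters for buffered streams but not for a bare FileOutputStream.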

FileInputStream

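FileInputStream is a byte input stream for files. It reads the bytes of a file into memory sequentially; apart from skip, which discards bytes going forward, it offers no way to jump to an arbitrary position.

The following example opens a file input stream and reads data from it.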

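import java.io.FileInputStream;
import java.io.IOException;

public class FileInputStreamStu {
    public void testRead() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log")) {
            byte[] buf = new byte[1024];
            // Number of bytes actually read into buf, or -1 at end of file.
            int realReadLength = fis.read(buf);
        }
    }
}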

The bytes actually read are those in buf at indexes 0 through realReadLength - 1. A return value of -1 means the end of the file has been reached and no data was read.
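A single read call may return fewer bytes than the file contains, so reading a whole file usually means looping until -1 is returned. The following is a minimal sketch of that loop; the class name and the readAll helper are invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Path;

class ReadLoopDemo {
    /** Reads the whole file by calling read(buf) until it reports end of file. */
    static byte[] readAll(Path file) throws IOException {
        try (FileInputStream fis = new FileInputStream(file.toFile());
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = fis.read(buf)) != -1) {
                out.write(buf, 0, n); // only the first n bytes of buf are valid
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of any readable file as the first argument.
        System.out.println(readAll(java.nio.file.Paths.get(args[0])).length + " bytes read");
    }
}
```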

We can also read one byte at a time, as shown below.

public class FileInputStreamStu {
    public void testRead() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log")) {
            int byteData = fis.read(); // return value is in the range [-1, 255]
            if (byteData == -1) {
                return; // reached the end of the file
            }
            byte data = (byte) byteData;
            // data holds the byte that was read
        }
    }
}

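How the bytes are used depends on what the file stores.

If the whole file is an image, the entire file has to be read before it can be decoded according to its format. If the file is a configuration file, it can be read line by line, treating each \n as the end of a line, as shown below.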

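import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.junit.Test; // JUnit 4 assumed

public class FileInputStreamStu {
    @Test
    public void testRead() throws IOException {
        try (FileInputStream fis = new FileInputStream("/tmp/test/test.log")) {
            ByteBuffer buffer = ByteBuffer.allocate(1024); // assumes no line exceeds 1024 bytes
            int byteData;
            while ((byteData = fis.read()) != -1) {
                if (byteData == '\n') {
                    printLine(buffer);
                    continue;
                }
                buffer.put((byte) byteData);
            }
            if (buffer.position() > 0) {
                printLine(buffer); // last line without a trailing \n
            }
        }
    }

    // Decodes the buffered bytes as one line (UTF-8 assumed) and resets the buffer.
    private static void printLine(ByteBuffer buffer) {
        buffer.flip();
        System.out.println(new String(buffer.array(), 0, buffer.remaining(), StandardCharsets.UTF_8));
        buffer.clear();
    }
}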

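On top of InputStream and OutputStream, Java provides many more APIs that make reading and writing files easier, such as BufferedReader (used together with InputStreamReader). If you would rather not memorize them, FileInputStream and FileOutputStream are enough.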
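For comparison, here is a minimal sketch of the same line-by-line reading done with BufferedReader. The class name and the readLines helper are invented for this example, and UTF-8 is an assumption about the file's encoding.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class BufferedReaderDemo {
    /** Reads every line of the file; readLine strips the line terminator. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of file
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of any text file as the first argument.
        readLines(java.nio.file.Paths.get(args[0])).forEach(System.out::println);
    }
}
```

The reader takes care of buffering, decoding, and the final line without a trailing newline, which the hand-written loop has to handle itself.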

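Random-access file reads and writes

RandomAccessFile behaves like a combination of FileInputStream and FileOutputStream: it can both read and write, and it can move to a given position in the file before reading or writing.

RandomAccessFile is used as follows.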

public class RandomAccessFileStu {
    public void testRandomWrite(long index, long offset) throws IOException {
        try (RandomAccessFile randomAccessFile = new RandomAccessFile("/tmp/test.idx", "rw")) {
            // indexLength() and toByte() are helpers whose definitions are not shown here
            randomAccessFile.seek(index * indexLength());
            randomAccessFile.write(toByte(index));
            randomAccessFile.write(toByte(offset));
        }
    }
}

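RandomAccessFile constructor: the first argument is the file path and the second is the mode, where "r" opens the file read-only and "rw" opens it for reading and writing ("rws" and "rwd" additionally make updates synchronous). There is no write-only "w" mode;

seek method: on Linux and Unix it calls the operating system's lseek function.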
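To make the effect of seek concrete, the sketch below writes fixed-length entries at computed positions and reads one back. It is an illustration under assumptions: the 16-byte entry layout (two longs) and the class and method names are invented here, and writeLong/readLong are used in place of the toByte helper from the example above.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessIndexDemo {
    static final int ENTRY_LENGTH = 16; // two 8-byte longs per entry (assumed layout)

    /** Writes (index, offset) into the slot that belongs to index. */
    static void writeEntry(File file, long index, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek(index * ENTRY_LENGTH); // jump straight to the slot
            raf.writeLong(index);
            raf.writeLong(offset);
        }
    }

    /** Reads the offset stored in the slot that belongs to index. */
    static long readOffset(File file, long index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(index * ENTRY_LENGTH + 8); // skip the stored index value
            return raf.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("index", ".idx"); // hypothetical temp file
        writeEntry(file, 3, 999L);
        System.out.println(readOffset(file, 3));
        file.delete();
    }
}
```

Seeking past the current end of the file is allowed; the file grows once data is written there.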

The seek method of RandomAccessFile is implemented by a native method, whose source is shown below.

JNIEXPORT void JNICALL 
Java_java_io_RandomAccessFile_seek0(JNIEnv *env, 
                    jobject this, jlong pos) { 
    FD fd; 
    fd = GET_FD(this, raf_fd); 
    if (fd == -1) { 
        JNU_ThrowIOException(env, "Stream Closed"); 
        return; 
    } 
    if (pos < jlong_zero) { 
        JNU_ThrowIOException(env, "Negative seek offset"); 
    } 
    // #define IO_Lseek lseek 
    else if (IO_Lseek(fd, pos, SEEK_SET) == -1) { 
        JNU_ThrowIOExceptionWithLastError(env, "Seek failed"); 
    } 
}

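In Java_java_io_RandomAccessFile_seek0, the this parameter is the RandomAccessFile object and pos is the offset. The IO_Lseek call in the function is the operating system's lseek.

Reading, writing, and seeking in RandomAccessFile are all carried out by calling operating system functions, and the same is true of the file input and output streams described earlier.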

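NIO file reads and writes with FileChannel

A Channel represents an open connection between an IO source and a target. It resembles a traditional stream, but a Channel cannot access data directly; it only interacts with a Buffer. A Channel transfers data between a buffer on one side and an entity on the other (such as a file or a socket), and it supports transfer in both directions.

Just as SocketChannel is the channel through which a client and a server communicate, FileChannel is the channel through which we read and write files. FileChannel is thread-safe, so one FileChannel can be shared by several threads. Operations that change the channel's position or the file's size are serialized, so only one of them proceeds at a time. If the order of writes from multiple threads must be guaranteed, the writes have to be funneled through a queue.

A FileChannel can be obtained from a FileOutputStream, FileInputStream, or RandomAccessFile, or opened directly with FileChannel#open.

The example below obtains a FileChannel from a FileOutputStream; the call is the same for FileInputStream and RandomAccessFile.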

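public class FileChannelStu {
    public void testGetFileChannel() {
        // Note: opening the stream without append mode truncates /tmp/test.log.
        try (FileOutputStream fos = new FileOutputStream("/tmp/test.log");
             FileChannel fileChannel = fos.getChannel()) {
            // do....
        } catch (IOException exception) {
            throw new UncheckedIOException(exception); // do not swallow the error
        }
    }
}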

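Note that a FileChannel obtained from a FileOutputStream can only write, and one obtained from a FileInputStream can only read; the source of getChannel shows why.

A FileChannel obtained from a FileOutputStream, FileInputStream, or RandomAccessFile is closed when the stream or file is closed; see the close methods of these classes.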
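Both behaviors can be checked directly. The sketch below, whose class and method names are invented for this illustration, tries to write through the channel of a FileInputStream and then checks whether the channel is still open after its stream has been closed.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.NonWritableChannelException;
import java.nio.file.Path;

class ChannelDirectionDemo {
    /** Returns true if writing through a FileInputStream's channel is rejected. */
    static boolean writeRejected(Path file) throws IOException {
        try (FileInputStream fis = new FileInputStream(file.toFile())) {
            FileChannel channel = fis.getChannel();
            try {
                channel.write(ByteBuffer.wrap(new byte[]{1}));
                return false;
            } catch (NonWritableChannelException expected) {
                return true; // the channel was opened for reading only
            }
        }
    }

    /** Returns true if the channel is closed once its stream has been closed. */
    static boolean closedWithStream(Path file) throws IOException {
        FileInputStream fis = new FileInputStream(file.toFile());
        FileChannel channel = fis.getChannel();
        fis.close();
        return !channel.isOpen();
    }

    public static void main(String[] args) throws IOException {
        Path file = java.nio.file.Files.createTempFile("direction", ".log");
        System.out.println(writeRejected(file) + " " + closedWithStream(file));
        java.nio.file.Files.delete(file);
    }
}
```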

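A FileChannel that supports both reading and writing can be obtained from a RandomAccessFile opened in "rw" mode, or opened with the open method as shown below.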

public class FileChannelStu {
    public void testOpenFileChannel() throws IOException {
        // rootPath and postion.fileName come from the surrounding project and are not shown here
        try (FileChannel channel = FileChannel.open(
                Paths.get(URI.create("file:" + rootPath + "/" + postion.fileName)),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // do....
        }
    }
}

Passing StandardOpenOption.READ and StandardOpenOption.WRITE as the variable-length options of open (everything after the path) yields a channel that can both read and write.
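As a self-contained illustration, the sketch below opens such a channel, writes a long, moves back to the start, and reads the value again through the same channel. The class name, the roundTrip helper, and the extra CREATE option are choices made for this example only.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OpenReadWriteDemo {
    /** Writes value at the start of the file and reads it back through the same channel. */
    static long roundTrip(Path file, long value) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);
            buffer.putLong(value);
            buffer.flip();                 // switch the buffer from filling to draining
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            buffer.clear();
            channel.position(0);           // go back to where the value was written
            while (buffer.hasRemaining()) {
                if (channel.read(buffer) == -1) {
                    break;
                }
            }
            buffer.flip();
            return buffer.getLong();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = java.nio.file.Files.createTempFile("open-rw", ".bin");
        System.out.println(roundTrip(file, 42L));
        java.nio.file.Files.delete(file);
    }
}
```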

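FileChannel allows a file to be locked. File locks work at the process level, not the thread level, so they address concurrent access to the same file by multiple processes rather than by threads of one JVM. The lock is held by the current process. Once acquired, it should be released with a call to release; it is also released automatically when the corresponding FileChannel is closed or when the JVM process exits.

The following example shows how a file lock is used.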

public class FileChannelStu {
    public void testFileLock() throws IOException {
        FileChannel channel = this.channel;
        FileLock fileLock = null;
        try {
            fileLock = channel.lock(); // acquire the file lock (blocks until it is available)
            // perform the writes
            channel.write(...);
            channel.write(...);
        } finally {
            if (fileLock != null) {
                fileLock.release(); // release the file lock
            }
        }
    }
}
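Since FileLock implements AutoCloseable, the same pattern can be written with try-with-resources. The following is a minimal runnable sketch; the class name, the appendUnderLock helper, and the choice of APPEND mode are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileLockDemo {
    /** Appends data while holding an exclusive lock on the file; returns the bytes written. */
    static int appendUnderLock(Path file, byte[] data) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileLock lock = channel.lock()) { // released automatically, before the channel closes
            ByteBuffer buffer = ByteBuffer.wrap(data);
            int written = 0;
            while (buffer.hasRemaining()) {
                written += channel.write(buffer);
            }
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = java.nio.file.Files.createTempFile("lock", ".log");
        System.out.println(appendUnderLock(file, "hello".getBytes()) + " bytes appended");
        java.nio.file.Files.delete(file);
    }
}
```

Resources are closed in reverse order of declaration, so the lock is released first and the channel is closed afterwards.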

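Of course, as long as we can guarantee that only one process writes to the file at a time, there is no need to lock it. RocketMQ does not take a file lock around writes to its log files either: each Broker has its own data directory, so even when several Brokers are deployed on one machine, no two processes operate on the same log file.

With the file lock removed, the example above becomes: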

public class FileChannelStu{ 
    public void testWrite(){ 
        FileChannel channel = this.channel; 
        channel.write(...); 
        channel.write(...); 
    } 
}

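One problem remains: concurrent writes. Although FileChannel is thread-safe, two consecutive write calls are not atomic as a pair. To guarantee that the two writes land next to each other, a lock is still required. In RocketMQ, a reference counter takes the place of the lock.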
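A minimal sketch of the locking approach is shown below. It is not how RocketMQ does it: the class, its methods, and the self-check helper are invented for this illustration, and a plain synchronized block is used as the lock.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PairedWriter {
    private final FileChannel channel;
    private final Object lock = new Object();

    PairedWriter(FileChannel channel) {
        this.channel = channel;
    }

    /** Writes two longs back to back; the lock keeps other pairs from interleaving. */
    void writePair(long first, long second) throws IOException {
        synchronized (lock) {
            writeLong(first);
            writeLong(second);
        }
    }

    private void writeLong(long value) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES);
        buffer.putLong(value);
        buffer.flip();
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
    }

    /** Lets several threads write (id, id) pairs, then checks that every pair stayed together. */
    static boolean pairsStayAdjacent(Path file, int threads, int pairsPerThread)
            throws IOException, InterruptedException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            PairedWriter writer = new PairedWriter(channel);
            Thread[] workers = new Thread[threads];
            for (int t = 0; t < threads; t++) {
                final long id = t;
                workers[t] = new Thread(() -> {
                    try {
                        for (int i = 0; i < pairsPerThread; i++) {
                            writer.writePair(id, id);
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                workers[t].start();
            }
            for (Thread worker : workers) {
                worker.join();
            }
        }
        ByteBuffer all = ByteBuffer.wrap(Files.readAllBytes(file));
        int pairs = 0;
        while (all.remaining() >= 2 * Long.BYTES) {
            if (all.getLong() != all.getLong()) {
                return false; // two halves of different pairs ended up next to each other
            }
            pairs++;
        }
        return pairs == threads * pairsPerThread;
    }
}
```

Calling PairedWriter.pairsStayAdjacent(file, 4, 200) on a scratch file exercises the writer from four threads. Without the synchronized block in writePair, the halves of different pairs could interleave.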

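FileChannel provides the force method for flushing to disk, which calls the operating system's fsync (or fdatasync) function. It is used as follows.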

public class FileChannelStu{ 
    public void closeChannel(){ 
        this.channel.force(true); 
        this.channel.close(); 
    }         
}

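The argument to force indicates whether changes to the file's metadata must be forced to storage in addition to changes to its content. Later, when working with MappedByteBuffer, its own force method can be used directly.

The C function that FileChannel's force method ultimately calls is shown below.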

JNIEXPORT jint JNICALL 
Java_sun_nio_ch_FileDispatcherImpl_force0(JNIEnv *env, jobject this, 
                                          jobject fdo, jboolean md) 
{ 
    jint fd = fdval(env, fdo); 
    int result = 0; 
    if (md == JNI_FALSE) { 
        result = fdatasync(fd); 
    } else { 
        result = fsync(fd); 
    } 
    return handle(env, result, "Force failed"); 
}

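The md parameter corresponds to the metaData argument passed to force.

FileChannel can also move to a given position before reading or writing, much like seek, through its position method, as shown below.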

public class FileChannelStu {
    public void testSeekWrite(long index, long offset) throws IOException {
        FileChannel channel = this.channel;
        synchronized (channel) {
            channel.position(100);
            // toByte() is a helper whose definition is not shown here
            channel.write(ByteBuffer.wrap(toByte(index)));
            channel.write(ByteBuffer.wrap(toByte(offset)));
        }
    }
}

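The example above moves the position to the physical offset of byte 100 and then writes index and offset in sequence. Reading works the same way, as shown below.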

public class FileChannelStu {
    public void testSeekRead() throws IOException {
        FileChannel channel = this.channel;
        synchronized (channel) {
            channel.position(100);
            ByteBuffer buffer = ByteBuffer.allocate(16);
            int realReadLength = channel.read(buffer);
            if (realReadLength == 16) {
                buffer.flip(); // switch from filling to draining before calling getLong
                long index = buffer.getLong();
                long offset = buffer.getLong();
            }
        }
    }
}

read returns the number of bytes actually read; a return value of -1 means the end of the file has been reached and nothing is left to read.
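As an alternative to position followed by read or write, FileChannel also offers positional overloads, read(ByteBuffer, long) and write(ByteBuffer, long), which take the file offset as an argument and leave the channel's own position untouched. The sketch below is a different technique from the examples above, not a rewrite of them; the class name, the helpers, and the 16-byte entry layout are assumptions for this illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class PositionalIoDemo {
    /** Writes index and offset as one 16-byte entry starting at the given file position. */
    static void writeEntry(FileChannel channel, long position, long index, long offset)
            throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        buffer.putLong(index);
        buffer.putLong(offset);
        buffer.flip();
        long at = position;
        while (buffer.hasRemaining()) {
            at += channel.write(buffer, at); // positional write: channel.position() is unchanged
        }
    }

    /** Reads the 16-byte entry at the given file position and returns {index, offset}. */
    static long[] readEntry(FileChannel channel, long position) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(16);
        long at = position;
        while (buffer.hasRemaining()) {
            int n = channel.read(buffer, at);
            if (n == -1) {
                throw new EOFException("entry is truncated");
            }
            at += n;
        }
        buffer.flip();
        return new long[]{buffer.getLong(), buffer.getLong()};
    }

    /** Returns {index, offset, channel position} after a write and read at byte 100. */
    static long[] roundTrip(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            writeEntry(channel, 100, 7L, 4096L);
            long[] entry = readEntry(channel, 100);
            return new long[]{entry[0], entry[1], channel.position()};
        }
    }
}
```

Because both values travel in one buffer and no shared position is modified, this variant does not need the synchronized block for the position bookkeeping.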

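Reading and writing files with MappedByteBuffer

MappedByteBuffer is Java's API for file reads and writes based on the operating system's virtual memory mapping (mmap). Underneath, it no longer relies on system calls such as read, write, and lseek to access the file.

A region of the file has to be mapped into memory with FileChannel#map, as shown below.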

public class MappedByteBufferStu{ 
  @Test 
  public void testMappedByteBuffer() throws IOException { 
      FileChannel fileChannel = FileChannel.open(Paths.get(URI.create("file:/tmp/test/test.log")), 
                StandardOpenOption.WRITE, StandardOpenOption.READ); 
      MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, 4096); 
      fileChannel.close(); 
      mappedByteBuffer.position(1024); 
      mappedByteBuffer.putLong(10000L); 
      mappedByteBuffer.force();     
  } 
}

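The code above uses FileChannel to map the region [0, 4096) of the file into memory. The map method returns a MappedByteBuffer, and the channel is closed once the mapping exists. The code then writes an 8-byte long at the given position and finally calls force to write the data from memory back to disk.

Once established, a mapping does not depend on the file channel that created it, so the channel can be closed right after the MappedByteBuffer is created without affecting the validity of the mapping.

In practice, mapping a file into memory is more expensive than reading or writing a few tens of KB through the read and write system calls. From a performance point of view, MappedByteBuffer is best suited to mapping large files, from hundreds of MB to several GB.

The map method of FileChannel takes three parameters:

MapMode: the mapping mode, one of READ_ONLY (read-only mapping), READ_WRITE (read/write mapping), or PRIVATE (private mapping). READ_ONLY allows reads only, READ_WRITE allows reads and writes, and PRIVATE is copy-on-write: changes are visible only in memory and are not written back to the file;
position and size: the region to map, in bytes, which can be the whole file or part of it.

Note that if the FileChannel was opened read-only, the mapping mode cannot be READ_WRITE. If the file has just been created (or is shorter than the mapped region), a successful READ_WRITE mapping grows the file to position + size bytes.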
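Two of these points, the file growing under a READ_WRITE mapping and PRIVATE changes not reaching the file, can be observed with a short experiment. The sketch below is an illustration only; the class and method names are invented for it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MapModeDemo {
    /** Maps [0, size) of the file in READ_WRITE mode and returns the resulting file size. */
    static long sizeAfterMapping(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            return channel.size();
        }
    }

    /** Modifies byte 0 through a PRIVATE mapping and returns byte 0 as stored in the file. */
    static byte firstByteAfterPrivateWrite(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer copy = channel.map(FileChannel.MapMode.PRIVATE, 0, 1);
            copy.put(0, (byte) 'X');          // changes only this mapping's private copy
            ByteBuffer check = ByteBuffer.allocate(1);
            channel.read(check, 0);           // read the byte that is really in the file
            return check.get(0);
        }
    }
}
```

Calling sizeAfterMapping on an empty file shows the file growing to the mapped size, and firstByteAfterPrivateWrite on a file that starts with 'A' still returns 'A'.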

The following example reads data through a MappedByteBuffer:

public class MappedByteBufferStu{ 
    @Test 
    public void testMappedByteBufferOnlyRead() throws IOException { 
        FileChannel fileChannel = FileChannel.open(Paths.get(URI.create("file:/tmp/test/test.log")), 
                    StandardOpenOption.READ); 
        MappedByteBuffer mappedByteBuffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, 4096); 
        fileChannel.close(); 
        mappedByteBuffer.position(1024); 
        long value = mappedByteBuffer.getLong(); 
        System.out.println(value); 
    } 
}

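mmap bypasses the read and write system calls and avoids one copy of the data from kernel space to user space, which is why it is described as zero-copy. MappedByteBuffer uses direct memory outside the JVM heap.

mmap only reserves address space in virtual memory; physical memory is allocated the first time that virtual memory is accessed. After mmap returns, the file content has not been loaded into physical pages. When the process touches an address in the range, the page table lookup finds that the page is not resident in physical memory and a page fault is raised. The kernel's page fault handler then loads the corresponding part of the file into physical memory one page at a time (typically 4096 bytes).

Physical memory is limited, so when the data written through mmap exceeds it, the operating system replaces pages: following its eviction algorithm, it swaps out pages to make room for the new ones. Memory backing an mmap can therefore be evicted. If an evicted page is dirty (its content was modified by a write), the operating system writes it back to disk before evicting it.

A write proceeds as follows:

1. The data is written to the corresponding virtual memory address;
2. If that virtual address is not yet backed by physical memory, a page fault occurs and the kernel loads the page into physical memory;
3. The data is written to the physical memory behind the virtual address;
4. When the page is evicted or a flush is requested, the operating system writes the dirty page back to disk.

RocketMQ uses MappedByteBuffer in exactly this way to read and write its index files, implementing a HashMap on top of the file system.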
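To give a feel for what a HashMap on top of a mapped file can look like, here is a deliberately simplified sketch. It is not RocketMQ's index file format: the slot layout (an 8-byte key followed by an 8-byte value), the use of key 0 as the empty marker, linear probing, and all names are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedHashIndex {
    private static final int SLOT_SIZE = 16; // 8-byte key + 8-byte value (assumed layout)
    private static final long EMPTY = 0L;    // key 0 marks a free slot, so keys must be non-zero
    private final int slots;
    private final MappedByteBuffer buffer;

    MappedHashIndex(Path file, int slots) throws IOException {
        this.slots = slots;
        this.buffer = map(file, slots);
    }

    private static MappedByteBuffer map(Path file, int slots) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // A new file is extended to the mapped size when it is mapped READ_WRITE.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, (long) slots * SLOT_SIZE);
        }
    }

    /** Stores value under key, probing linearly from the key's home slot. */
    void put(long key, long value) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("key must be non-zero");
        }
        int home = Math.floorMod(Long.hashCode(key), slots);
        for (int probe = 0; probe < slots; probe++) {
            int pos = ((home + probe) % slots) * SLOT_SIZE;
            long existing = buffer.getLong(pos);
            if (existing == EMPTY || existing == key) {
                buffer.putLong(pos, key);
                buffer.putLong(pos + 8, value);
                return;
            }
        }
        throw new IllegalStateException("index is full");
    }

    /** Returns the value stored under key, or null if the key is absent. */
    Long get(long key) {
        int home = Math.floorMod(Long.hashCode(key), slots);
        for (int probe = 0; probe < slots; probe++) {
            int pos = ((home + probe) % slots) * SLOT_SIZE;
            long existing = buffer.getLong(pos);
            if (existing == EMPTY) {
                return null;
            }
            if (existing == key) {
                return buffer.getLong(pos + 8);
            }
        }
        return null;
    }

    /** Asks the operating system to write the dirty pages back to the file. */
    void flush() {
        buffer.force();
    }
}
```

Because the table lives in the mapped file, an index opened later on the same file with the same slot count sees the entries written earlier.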

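When RocketMQ creates a new CommitLog file and obtains a MappedByteBuffer from the FileChannel, it performs a warm-up: it writes zero bytes (0x00) into every memory page and forces a flush so that the data reaches the file. The point is to use these writes to bring the whole mmap region into physical memory. After the warm-up, RocketMQ also locks the memory. This prevents the operating system from moving the warmed-up pages out to the swap area, which would cause page faults when the program later accesses them.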
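The warm-up idea can be sketched in plain Java as follows. This is an illustration, not RocketMQ's code: the 4096-byte page size, the single zero byte per page, and all names are assumptions, and the memory-locking step is left out because it needs a native call that standard Java does not provide. The sketch overwrites one byte per page, so it is only suitable for a freshly created file.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WarmUpDemo {
    static final int PAGE_SIZE = 4096; // assumed operating system page size

    /** Touches every page of the mapping once, flushes, and returns the number of pages touched. */
    static int warmUp(MappedByteBuffer mapped) {
        int pages = 0;
        for (int i = 0; i < mapped.capacity(); i += PAGE_SIZE) {
            mapped.put(i, (byte) 0); // the write faults the page into physical memory
            pages++;
        }
        mapped.force();              // write the touched pages back to the file
        return pages;
    }

    /** Maps [0, size) of a new file in READ_WRITE mode and warms the mapping up. */
    static int mapAndWarmUp(Path file, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            return warmUp(mapped);
        }
    }
}
```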


This article is reposted from the WeChat official account 「Java艺术」. To repost it, please contact the Java艺术 account.