Friday, November 28, 2014

Avoid using java.io.File.lastModified() for sorting Files

Keywords:
java File lastModified timestamp ext3 ext4 linux windows sort order milliseconds

Problem:
It seems like a basic use case: order a list of files by when they were last modified. File.lastModified() looks like a reasonable basis for a Comparator, given the documentation:
A long value representing the time the file was last modified, measured in milliseconds since the epoch ...

So, assuming the files are not created any faster than one per millisecond, this should work fine, right? No. For some reason, on Linux-based systems File.lastModified() is always rounded to the second. This appears to be a (legacy) limitation of 'ext3' (and earlier?) file-systems that persists to this day, even if the file-system is ext4 (with nanosecond precision). On a Windows-based system (with NTFS, 100ns precision) the File.lastModified() values are precise to the millisecond. I'm not the first person to notice the difference: see "File.lastModified() on windows vs linux".
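To make the failure concrete, here is a minimal sketch of the naive comparator and of what second-rounding does to it. The class name NaiveModifiedOrder, the helper truncateToSecond and the two sample timestamps are made up for illustration, and the code uses Java 8 syntax. The rounding is simulated with plain arithmetic, so the result does not depend on the file-system the sketch runs on.

```java
import java.io.File;
import java.util.Comparator;

public class NaiveModifiedOrder {
    // The naive comparator: order files by File.lastModified()
    static final Comparator<File> BY_LAST_MODIFIED =
            Comparator.comparingLong(File::lastModified);

    // Simulates the observed rounding: drops the millisecond part
    // (valid for timestamps after the epoch)
    static long truncateToSecond(long epochMillis) {
        return (epochMillis / 1000) * 1000;
    }

    public static void main(String[] args) {
        // Two hypothetical modification times, 333ms apart within the same second
        long first = 1417132800123L;
        long second = 1417132800456L;

        // With millisecond precision the two values are distinguishable
        System.out.println("full precision, first before second: "
                + (Long.compare(first, second) < 0));

        // After rounding to the second they compare as equal,
        // so the sort order of the two files is no longer defined by time
        System.out.println("rounded to the second, tie: "
                + (Long.compare(truncateToSecond(first), truncateToSecond(second)) == 0));
    }
}
```

Any two files modified within the same second tie in this way, and a sort is then free to leave them in either order.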

Problems with sorting files on Linux don't seem to be isolated to Java (see "Order files by creation time to the millisecond in Bash"), though I suspect that thread is actually based on an ext3 file-system, as on an ext4 file-system you can get this working with ls -latr --full-time.

Q: How do you know what type of file-system you're running?
A: df -T

Q: What precision will the file-system support?
A: See the 'Comparison of file systems' article and its 'Max Timestamp Resolution' column.

Q: If I'm still unsure, is there a test case I can try?
A: Here's a basic one:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class FileModified {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("timestamp-test-", ".txt");
        FileWriter writer = new FileWriter(file);
        writer.append("updated");
        writer.close();
        
        // On an affected system the printed value always ends in 000
        System.out.println("via " + file.getClass().getName()
                    + "\n\t" + file.getAbsolutePath() + "\n\t\tmodified: " + file.lastModified() + "ms");
    }
}
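
The questions above can also be answered from inside Java. The following is a rough sketch, assuming Java 7 or later for java.nio.file: it prints the file-system type reported for the temp directory (the same information df -T gives) and checks whether any of a handful of freshly written files has a non-zero millisecond part. The class name TimestampProbe, the helper names, the file count and the sleep interval are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class TimestampProbe {
    // Type name reported for the store holding the path, for example "ext4" or "NTFS"
    static String fileSystemType(Path path) throws IOException {
        FileStore store = Files.getFileStore(path);
        return store.type();
    }

    // True if the timestamp carries a non-zero millisecond part
    static boolean hasSubSecondPart(long epochMillis) {
        return epochMillis % 1000 != 0;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("timestamp-probe-");
        System.out.println("file-system type: " + fileSystemType(dir));

        boolean subSecondSeen = false;
        for (int i = 0; i < 5; i++) {
            Path file = Files.createTempFile(dir, "probe-", ".txt");
            Files.write(file, "updated".getBytes());
            long millis = file.toFile().lastModified();
            subSecondSeen |= hasSubSecondPart(millis);
            System.out.println(file.getFileName() + " modified: " + millis + "ms");
            // An odd interval, so the writes do not all land on a whole second
            Thread.sleep(137);
        }
        System.out.println(subSecondSeen
                ? "File.lastModified() reports sub-second values here"
                : "File.lastModified() looks rounded to the second here");
    }
}
```

If every value ends in 000 even though the reported file-system supports finer timestamps, the rounding is happening somewhere between the file-system and File.lastModified().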

If you establish that the target environment(s) for your Java application are on file-systems that do support better than one-second precision, how do you get access to it from Java?


Solution:
The closest bug report I can find is JDK-6939260. But it appears that the bug title has been changed to propose exposing higher than millisecond precision for files, and a side note in the comments states "it's not possible to support this with java.io.File because it specifies that the last modified time is returned in milliseconds". This misses the point of the original bug description, which states "The end of the number is always 000". That is, leaving micro/nanosecond precision aside, shouldn't the method provide at least millisecond precision if the underlying file-system supports this (or higher)?

It appears that new features (and bug fixes?) related to the capabilities of the file-system are going to be implemented in java.nio.file. So if you're using Java 1.8+ you may be able to access more accurate timestamps via Files.getLastModifiedTime(Path,...). The updated test case becomes:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.TimeUnit;

public class FileModified {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("timestamp-test-", ".txt");
        // try-with-resources closes the writer even if the append fails
        try (FileWriter writer = new FileWriter(file)) {
            writer.append("updated");
        }

        // Legacy API: on an affected system this value always ends in 000
        System.out.println("via " + file.getClass().getName()
                           + "\n\t" + file.getAbsolutePath() + "\n\t\tmodified: " + file.lastModified() + "ms");

        // NIO API: FileTime can carry finer granularity than milliseconds
        Path filePath = file.toPath();
        FileTime modified = Files.getLastModifiedTime(filePath);
        System.out.println("via " + modified.getClass().getName() + "\n\t" + filePath
                           + "\n\t\tmodified: " + modified.toMillis() + "ms (" + modified.to(TimeUnit.NANOSECONDS) + "ns)");
    }
}

Note that although java.nio.file.attribute.FileTime exists in Java 1.7, JDK-6939260 claims that the fix is only going into 8 onwards (I can't tell which update; I tried the latest available and it still fails).
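
Where Files.getLastModifiedTime does return sub-second values, the sort itself could be based on FileTime, which is Comparable. This is only a sketch under that assumption: the class name SortByFileTime and the method oldestFirst are hypothetical, and the code uses Java 8 syntax. Each timestamp is read once up front, so the comparator works on stable values even if a file is modified while the sort is running.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByFileTime {
    // Returns a new list with the least recently modified path first
    static List<Path> oldestFirst(List<Path> paths) throws IOException {
        // Snapshot the timestamps before sorting
        Map<Path, FileTime> modified = new HashMap<>();
        for (Path path : paths) {
            modified.put(path, Files.getLastModifiedTime(path));
        }
        List<Path> sorted = new ArrayList<>(paths);
        sorted.sort(Comparator.comparing((Path path) -> modified.get(path)));
        return sorted;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("sort-test-");
        Path a = Files.createTempFile(dir, "a-", ".txt");
        Path b = Files.createTempFile(dir, "b-", ".txt");

        // Write b first and a second, a few milliseconds apart
        Files.write(b, "updated".getBytes());
        Thread.sleep(20);
        Files.write(a, "updated".getBytes());

        // Whether b is listed before a depends on the precision actually available
        for (Path path : oldestFirst(Arrays.asList(a, b))) {
            System.out.println(path.getFileName() + " " + Files.getLastModifiedTime(path));
        }
    }
}
```

On a platform where the timestamps are still rounded to the second, this has the same tie problem as a comparator built on File.lastModified().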

If you need to support Java 1.7 or below on Linux there is no straightforward option (short of parsing ls --full-time command output). If your system is in control of the files being written, my suggestion would be to maintain a separate record of creation/modified time, assuming order by time is important and you need better than per-second precision.
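
As a rough idea of what such a separate record could look like, here is an in-memory sketch. The class WriteOrderLog and its methods are hypothetical, and the code uses Java 8 syntax. It assumes every write goes through the application, which calls recordWrite afterwards. One deliberate departure from a pure timestamp record: the order is decided by a counter rather than by the clock, so two writes within the same millisecond (or a clock adjustment) cannot produce a tie or a reversal. The wall-clock value is kept only for display.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class WriteOrderLog {
    // What is remembered per file: a strictly increasing sequence number plus the wall-clock time
    static final class Stamp {
        final long sequence;
        final long epochMillis;

        Stamp(long sequence, long epochMillis) {
            this.sequence = sequence;
            this.epochMillis = epochMillis;
        }
    }

    private final AtomicLong counter = new AtomicLong();
    private final Map<Path, Stamp> lastWrite = new ConcurrentHashMap<>();

    // Call right after the application has written to the file
    public void recordWrite(Path path) {
        Path key = path.toAbsolutePath().normalize();
        lastWrite.put(key, new Stamp(counter.incrementAndGet(), System.currentTimeMillis()));
    }

    // Recorded paths, least recently written first
    public List<Path> oldestFirst() {
        // Copy first, so concurrent writes cannot change values mid-sort
        Map<Path, Stamp> snapshot = new HashMap<>(lastWrite);
        List<Path> paths = new ArrayList<>(snapshot.keySet());
        paths.sort(Comparator.comparingLong((Path p) -> snapshot.get(p).sequence));
        return paths;
    }

    public static void main(String[] args) {
        WriteOrderLog log = new WriteOrderLog();
        log.recordWrite(Paths.get("a.txt"));
        log.recordWrite(Paths.get("b.txt"));
        log.recordWrite(Paths.get("a.txt")); // a.txt is written again, so it becomes the newest
        for (Path path : log.oldestFirst()) {
            System.out.println(path.getFileName());
        }
    }
}
```

A record like this is lost when the process exits, so a real system would also need to persist it somewhere, which is outside the scope of this sketch.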