Friday, November 28, 2014

Avoid using java.io.File.lastModified() for sorting Files

Keywords:
java File lastModified timestamp ext3 ext4 linux windows sort order milliseconds

Problem:
It seems like a basic use-case - order a list of files based on the order they were modified. File.lastModified() seems a reasonable choice for basing a Comparator on, based on the documentation:
A long value representing the time the file was last modified, measured in milliseconds since the epoch ...

So, assuming the files are not created any faster than one per millisecond, this should work fine, right? No. For some reason, on Linux-based systems File.lastModified() is always rounded to the second. This appears to be a (legacy) limitation of 'ext3' (and earlier?) file-systems that persists to this day - even if the file-system is ext4 (with nanosecond precision). On a Windows-based system (with NTFS, 100ns precision) the File.lastModified() values are precise to the millisecond (I'm not the first person to notice the difference - see "File.lastModified() on windows vs linux").
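
To make the consequence for sorting concrete, here is a minimal sketch. It does not touch the file-system at all: the three timestamps are made-up values and truncateToSeconds() is a hypothetical helper that only mimics the rounding described above. It counts how many neighbouring entries a comparator can no longer tell apart.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RoundedTimestampDemo {

    /** Truncates a millisecond timestamp to whole seconds (hypothetical helper mimicking the rounding). */
    static long truncateToSeconds(long millis) {
        return (millis / 1000L) * 1000L;
    }

    /** Sorts a copy of the timestamps and counts adjacent pairs that compare as equal. */
    static int countTies(List<Long> timestamps) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.naturalOrder());
        int ties = 0;
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).equals(sorted.get(i - 1))) {
                ties++;
            }
        }
        return ties;
    }

    public static void main(String[] args) {
        // Made-up modification times (ms since the epoch), all within the same second
        long[] actual = {1417132800123L, 1417132800456L, 1417132800789L};

        List<Long> precise = new ArrayList<>();
        List<Long> rounded = new ArrayList<>();
        for (long t : actual) {
            precise.add(t);
            rounded.add(truncateToSeconds(t));
        }

        System.out.println("ties with millisecond precision: " + countTies(precise));
        System.out.println("ties after rounding to seconds:  " + countTies(rounded));
    }
}
```

Once the values tie, the resulting order depends on the input order rather than on when the files were actually modified.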

Problems with sorting files on Linux don't seem to be isolated to Java (see "Order files by creation time to the millisecond in Bash") - though I suspect that thread is actually based on an ext3 file-system, as on an ext4 file-system you can get this working with ls -latr --full-time.

Q: How do you know what type of file-system you're running?
A: df -T

Q: What precision will the file-system support?
A: See Comparison of file systems article and the 'Max Timestamp Resolution' column.

Q: If I'm still unsure, is there a test case I can try?
A: Here's a basic one:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class FileModified {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("timestamp-test-", ".txt");
        FileWriter writer = new FileWriter(file);
        try {
            writer.append("updated");
        } finally {
            // Always release the file handle, even if the write fails
            writer.close();
        }

        // If the value printed here consistently ends in 000, the timestamp is only precise to the second
        System.out.println("via " + file.getClass().getName()
                + "\n\t" + file.getAbsolutePath()
                + "\n\t\tmodified: " + file.lastModified() + "ms");
    }
}
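
As a complement to df -T, the file-system type can also be read from inside the JVM (Java 7 onwards). This is only a sketch: the class name is made up, and the string returned by FileStore.type() is implementation specific, so treat it as a hint rather than something to branch on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FileStoreType {

    /** Returns the type of the file store holding the given (existing) path, e.g. "ext4" or "NTFS". */
    static String typeOf(Path path) {
        try {
            FileStore store = Files.getFileStore(path);
            return store.type();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the temp directory, which is where the test case above writes its file
        Path target = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println(target + " is on a '" + typeOf(target) + "' file-system");
    }
}
```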

If you establish that the target environment(s) for your Java application are on file-systems that do support better than 1-second precision, how do you get access to this from Java?


Solution:
The closest bug report I can find is JDK-6939260. But it appears that the bug title has been changed to propose exposing higher than millisecond precision for files - the side note made in the comments states "it's not possible to support this with java.io.File because it specifies that the last modified time is returned in milliseconds". This misses the point of the original bug description, which states "The end of the number is always 000" - i.e. leaving micro/nanosecond precision aside, shouldn't the method provide at least millisecond precision if the underlying file-system supports this (or higher)?

It appears that new features (and bug fixes?) related to the capabilities of the file-system are going to be implemented in java.nio.file. So if you're using Java 1.8+ you may be able to access more accurate timestamps via Files.getLastModifiedTime(Path,...). The updated test case becomes:
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.concurrent.TimeUnit;

public class FileModified {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("timestamp-test-", ".txt");
        // try-with-resources closes the writer even if the write fails
        try (FileWriter writer = new FileWriter(file)) {
            writer.append("updated");
        }

        // Legacy API: java.io.File
        System.out.println("via " + file.getClass().getName()
                + "\n\t" + file.getAbsolutePath()
                + "\n\t\tmodified: " + file.lastModified() + "ms");

        // NIO API: the same file, addressed as a java.nio.file.Path
        Path filePath = file.toPath().toAbsolutePath();
        FileTime modified = Files.getLastModifiedTime(filePath);
        System.out.println("via " + modified.getClass().getName()
                + "\n\t" + filePath
                + "\n\t\tmodified: " + modified.toMillis() + "ms ("
                + modified.to(TimeUnit.NANOSECONDS) + "ns)");
    }
}

Note that although java.nio.file.attribute.FileTime exists in java 1.7, JDK-6939260 claims that it's only going to be fixed in 8 onwards (I can't tell which update - I tried the latest available and it still fails).
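
Coming back to the original use-case, a Comparator can be based on the java.nio.file value instead of File.lastModified(). The following is a rough sketch (Java 8 syntax; the class and method names are my own) and it can only order files as precisely as the JVM and file-system actually report the timestamp.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortByModifiedTime {

    /** Reads the last-modified time via java.nio.file, rethrowing I/O failures unchecked so it fits a Comparator. */
    static FileTime modifiedTime(Path path) {
        try {
            return Files.getLastModifiedTime(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a new list ordered from least to most recently modified. */
    static List<Path> oldestFirst(List<Path> paths) {
        List<Path> sorted = new ArrayList<>(paths);
        // FileTime is Comparable, so it can be used directly as the sort key
        sorted.sort(Comparator.comparing(SortByModifiedTime::modifiedTime));
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Path p = Files.createTempFile("timestamp-test-", ".txt");
            Files.write(p, "updated".getBytes());
            p.toFile().deleteOnExit();
            files.add(p);
        }
        for (Path p : oldestFirst(files)) {
            System.out.println(modifiedTime(p).toMillis() + "ms\t" + p);
        }
    }
}
```

The comparator re-reads the timestamp on every comparison, so this assumes the files are not being modified while the sort runs.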

If you need to support Java 1.7 or below on Linux there is no straightforward option (short of parsing ls --full-time command output). If your system is in control of the files being written, my suggestion would be to maintain a separate record of creation/modified time - assuming order by time is important and you need higher than per-second precision.
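
One possible shape for such a separate record is sketched below. Everything in it is hypothetical (class and method names included), it is written in Java 8 syntax for brevity, and it swaps the wall-clock time for a simple counter, since a counter gives an unambiguous order even for writes within the same millisecond. It is in-memory only, so a real system would need to persist it somewhere.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class WriteOrderLog {

    // Monotonically increasing counter: a later write always gets a higher number
    private final AtomicLong sequence = new AtomicLong();
    private final Map<Path, Long> lastWrite = new ConcurrentHashMap<>();

    /** Call after each successful write to the file; overwrites any earlier entry for the same path. */
    void recordWrite(Path path) {
        lastWrite.put(path, sequence.incrementAndGet());
    }

    /** Returns the recorded paths ordered from the earliest to the latest recorded write. */
    List<Path> oldestFirst() {
        List<Path> paths = new ArrayList<>(lastWrite.keySet());
        paths.sort(Comparator.comparing((Path p) -> lastWrite.get(p)));
        return paths;
    }

    public static void main(String[] args) {
        WriteOrderLog log = new WriteOrderLog();
        log.recordWrite(Paths.get("a.txt"));
        log.recordWrite(Paths.get("b.txt"));
        log.recordWrite(Paths.get("a.txt")); // a.txt is written again, so it is now the newest
        System.out.println(log.oldestFirst());
    }
}
```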

Friday, June 20, 2014

Using JCR-SQL2 for querying ACLs in a Jackrabbit repository

Keywords:
jcr-sql2 query jackrabbit ACL ACE access controls nt:hierarchyNode rep:policy

Problem:
The underlying problem was how to 'gracefully' remove Principals from the (default) Jackrabbit security workspace when they may still be referenced by either: (a) groups; or (b) access controls (ACLs). This may be a topic for another (much more detailed) post, but for now the focus is on (b): detecting if there are any ACLs in the repository that reference the Principal.

Aside: why is it a concern? Removing the Principal won't affect enforcement of the access controls, but anything attempting to process the ACL definition will hit an error for the missing Principal:
javax.jcr.InvalidItemStateException: Item does not exist anymore: 6e332039-2956-323c-8e82-212de8f88ff0

The AccessControl documentation on the Jackrabbit wiki states:
How Resource-based ACLs are stored
Resource-based ACLs are stored per resource/node in a special child node rep:policy. This one will have a list of rep:GrantACE child nodes (usually named allow, allow0,...) for grant access control entries and rep:DenyACE child nodes (usually named deny, deny0,...) for deny access control entries.

Each ACE node has a rep:principalName STRING property pointing to the user or group this ACE belongs to, and a rep:privileges NAME multi-value property, containing all the privileges of this ACE.

Note that you can read/browse these nodes using the JCR API, but cannot modify them. This must always happen through the JCR access control API.

How do you target these rep:policy items in a JCR-SQL2 query ... and more importantly has anyone done this before - to save me the time?


Solution:
I couldn't find an example, but by careful reading of the builtin_nodetypes.cnd (and some trial and error) I arrived at the following query, which lists all access control entries (ACEs) that reference a given Principal (i.e. a User or Group) within a given access control list (ACL) set on a resource (file or folder).
select resource.*, ace.*
    from [nt:hierarchyNode] as resource
    inner join [rep:ACL] as acl
       ON ISCHILDNODE(acl, resource)
    inner join [rep:ACE] as ace
       ON ISCHILDNODE(ace, acl)
    where ace.[rep:principalName] = "kevin"

The results will look like (in table form):
Result Node-path | resource.jcr:createdBy | resource.jcr:created | resource.jcr:primaryType | ace.rep:glob | ace.rep:nodePath | ace.rep:principalName | ace.jcr:primaryType
/files/example | jack-admin | 2014-05-06T07:08:09.100+11:00 | nt:folder | | | kevin | rep:GrantACE

Note that 'Result Node-path' - the full path to the resource - won't (and can't) be a "column" Value in the result javax.jcr.query.Row items, but it can be obtained via a javax.jcr.Node item referenced by the javax.jcr.query.Row. Also note that the Principal reference here is via the 'name' as stored in the protected rep:principalName field - even if using the default security workspace, this won't be the full 'principal path' (path to the org.apache.jackrabbit.api.security.principal.ItemBasedPrincipal).
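
The query above hard-codes the principal name. If it needs to be built for an arbitrary name, something like the sketch below could be used. It deliberately has no JCR dependency and only produces the statement text. The helper names are made up, and it assumes the JCR-SQL2 rule (as I understand it) that a single quote inside a single-quoted literal is escaped by doubling it, so check this against your Jackrabbit version. A bind variable set through javax.jcr.query.Query.bindValue would avoid the escaping question altogether.

```java
final class AceQueryBuilder {

    /** Wraps the value in single quotes, doubling any embedded single quote (assumed JCR-SQL2 escaping rule). */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Builds the JCR-SQL2 statement listing the ACEs that reference the given principal name. */
    static String acesReferencing(String principalName) {
        return "select resource.*, ace.*\n"
             + "    from [nt:hierarchyNode] as resource\n"
             + "    inner join [rep:ACL] as acl\n"
             + "       ON ISCHILDNODE(acl, resource)\n"
             + "    inner join [rep:ACE] as ace\n"
             + "       ON ISCHILDNODE(ace, acl)\n"
             + "    where ace.[rep:principalName] = " + quote(principalName);
    }

    public static void main(String[] args) {
        // Prints the statement for the principal used in the example above
        System.out.println(acesReferencing("kevin"));
    }
}
```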

Notes:
The stackoverflow [jcr-sql2] info page includes a pretty neat summary of what JCR-SQL2 is with links to the reference specs and implementations (note the above is specific to jackrabbit). The JCR 2.0 SQL-2 Grammar diagrams are particularly useful.