Is it possible to create a regular expression that finds strings that DO NOT match the pattern?

I have a situation where I need to pull a bunch of zip files from a server using a pre-created SFTP client. I only need those that do not have _PROCESSED in the file name. For example, covers.zip will be fine, but covers_PROCESSED.zip will not. I have a working solution for this: I run lsFiles(), which returns all the file names in a directory, and then run a function that filters them based on whether they have this keyword in the file name. The remaining files are then pulled from the server.

However, the SFTP client that I use also has this function: lsFiles(String pattern), which returns everything that matches the pattern. I want to use it to get only the file names that I want, as this will shorten and optimize my code. The problem is that I do not know how to create a regular expression that matches only names that DO NOT contain the given pattern (or whether that is even possible). Can someone tell me if this is possible and, if so, provide an example of how to do this?

3 answers

Sure thing, boss.

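 /^(?!.*_PROCESSED).*$/ 

This is a negative lookahead, and it is supported in most regex flavors. A lookahead only inspects the text that follows the current position, so it is anchored at the start of the name and scans ahead with .* to rule out _PROCESSED anywhere in it.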
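To check the idea against the two file names from the question, here is a minimal Java sketch. The class and method names are made up for illustration, and it assumes the whole file name is tested with java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any SFTP library.
class NegativeLookaheadDemo {
    // The lookahead scans the whole name for _PROCESSED and rejects it if found;
    // the trailing .* then consumes the name so that matches() can succeed.
    static final Pattern UNPROCESSED = Pattern.compile("^(?!.*_PROCESSED).*$");

    static boolean isUnprocessed(String fileName) {
        return UNPROCESSED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUnprocessed("covers.zip"));           // true
        System.out.println(isUnprocessed("covers_PROCESSED.zip")); // false
    }
}
```

Whether lsFiles(String pattern) accepts this syntax depends on the client, so it is worth trying the pattern on a small directory first.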


I adapted the answer to this question to help you.

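    import java.io.File;
    import java.io.FileFilter;
    import java.util.regex.Pattern;

    public static File[] listFilesMatching(File root, String regex) {
        if (!root.isDirectory()) {
            throw new IllegalArgumentException(root + " is not directory.");
        }
        // careful: Pattern.compile throws PatternSyntaxException on an invalid regex
        final Pattern p = Pattern.compile(regex);
        return root.listFiles(new FileFilter() {
            @Override
            public boolean accept(File file) {
                // matches() tests the entire file name against the pattern
                return p.matcher(file.getName()).matches();
            }
        });
    }

    // the lookahead scans the whole name, and .* consumes it so matches() can succeed
    listFilesMatching(new File("/some/path"), "^(?!.*_PROCESSED).*$");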

Here are the docs for FileFilter


If your client filters with true regex then

 lsFiles("^(?!.*_PROCESSED).*\\.zip$") 

should return all zip files that were not processed. But file filters usually allow only wildcard expansion (along the lines of *.zip), so I would be surprised if this really worked. If it does not, then selecting all .zip files and filtering them afterwards is the right approach, but you already knew that.
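If the client only understands wildcards, the fallback of listing everything and filtering afterwards could look like the sketch below. The class and method names are hypothetical, and the list of names stands in for whatever lsFiles() actually returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; the input list stands in for the result of lsFiles().
class ZipNameFilter {
    static List<String> unprocessedZips(List<String> names) {
        return names.stream()
                .filter(n -> n.endsWith(".zip"))        // keep zip files only
                .filter(n -> !n.contains("_PROCESSED")) // drop already processed ones
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("covers.zip", "covers_PROCESSED.zip", "notes.txt");
        System.out.println(unprocessedZips(names)); // [covers.zip]
    }
}
```

For a fixed keyword like this, a plain contains() check is arguably easier to read than a regular expression.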


You can exclude a specific substring using this type of pattern:

 ^(?>[^_]++|_(?!PROCESSED))+$ 

As you can see, this is an alternation between [^_]++ (everything that is not _) and _(?!PROCESSED) (a _ that is not followed by PROCESSED).

The advantage of this type of pattern is that the lookahead is not tested at every character to see whether _PROCESSED follows, but only when you encounter the first character of the string you want to exclude (here, an underscore). Thus, the number of tests is significantly reduced.
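A small Java sketch of how this pattern behaves on a few sample names (java.util.regex supports both atomic groups and possessive quantifiers). The class and method names are invented for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the atomic-group pattern above.
class ExcludeSubstringDemo {
    static final Pattern NO_PROCESSED =
            Pattern.compile("^(?>[^_]++|_(?!PROCESSED))+$");

    static boolean accepts(String fileName) {
        return NO_PROCESSED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("covers.zip"));           // true
        System.out.println(accepts("my_covers.zip"));        // true: this _ is not followed by PROCESSED
        System.out.println(accepts("covers_PROCESSED.zip")); // false
    }
}
```

Note that the pattern requires at least one character, so an empty string is rejected.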

The function suggested by naomik seems to be suitable for what you are trying to do:

 listFilesMatching(new File("/some/path"), "^(?>[^_]++|_(?!PROCESSED))+$"); 

Source: https://habr.com/ru/post/1493694/

