On a recent project I needed to find all the possible permutations of a given URL by stripping off subdomains, paths, and query parameters. Here is the first part of the solution: a method that takes a string and strips it down one chunk at a time, based on a given divider, in a given direction, up to a given limit.
Here’s the code:
private static String[] getPermutations(String whole, String divider, int lim, int dir) {
    // Quote the divider so it is always split on literally ("." would otherwise be a regex wildcard).
    String[] chunks = whole.split(java.util.regex.Pattern.quote(divider));
    System.out.println("chunks.length: " + chunks.length);

    // Nothing to strip: return the input unchanged.
    if (chunks.length <= lim) {
        System.out.println("return whole: " + whole);
        return new String[]{whole};
    }

    String[] permutations = new String[chunks.length - lim];
    if (dir == 1) {
        // Strip chunks from the front (e.g. subdomains).
        permutations[0] = whole;
        System.out.println("permutations[0]: " + permutations[0]);
        for (int i = 1; i < chunks.length - lim; i++) {
            String permutation = "";
            for (int o = i; o < chunks.length; o++) {
                permutation += (o == i ? "" : divider) + chunks[o];
            }
            permutations[i] = permutation;
            System.out.println("permutations[" + i + "]: " + permutations[i]);
        }
    } else if (dir == -1) {
        // Strip chunks from the end (e.g. path segments or query parameters).
        for (int i = 0; i < chunks.length - lim; i++) {
            String permutation = "";
            for (int o = 0; o < chunks.length - i; o++) {
                permutation += (o == 0 ? "" : divider) + chunks[o];
            }
            permutations[i] = permutation;
            System.out.println("permutations[" + i + "]: " + permutations[i]);
        }
    }
    return permutations;
}
Here is an example of it being used.
Input:
getPermutations("com", ".", 1, 1);
getPermutations("google.com", ".", 1, 1);
getPermutations("a.b.c.d.e.cool.google.com", ".", 1, 1);
getPermutations("/path/asdf/a/b/c/d/e/f", "/", 0, -1);
getPermutations("a=b&c=d&e=f", "&", 0, -1);
Output:
$ javac Runme.java && java Runme
chunks.length: 1
return whole: com
chunks.length: 2
permutations[0]: google.com
chunks.length: 8
permutations[0]: a.b.c.d.e.cool.google.com
permutations[1]: b.c.d.e.cool.google.com
permutations[2]: c.d.e.cool.google.com
permutations[3]: d.e.cool.google.com
permutations[4]: e.cool.google.com
permutations[5]: cool.google.com
permutations[6]: google.com
chunks.length: 9
permutations[0]: /path/asdf/a/b/c/d/e/f
permutations[1]: /path/asdf/a/b/c/d/e
permutations[2]: /path/asdf/a/b/c/d
permutations[3]: /path/asdf/a/b/c
permutations[4]: /path/asdf/a/b
permutations[5]: /path/asdf/a
permutations[6]: /path/asdf
permutations[7]: /path
permutations[8]: 
chunks.length: 3
permutations[0]: a=b&c=d&e=f
permutations[1]: a=b&c=d
permutations[2]: a=b
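For comparison, here is a minimal sketch of the same stripping logic without the debug printing. The class name PermutationSketch and the method name permutations are hypothetical, and it makes a few assumptions that differ from the method above: it returns a List, it rejects any dir other than 1 or -1, and it always rebuilds each result from the chunks (so an input with a trailing divider could come out slightly different).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PermutationSketch {

    // Hypothetical helper, not the original method.
    // dir == 1 drops chunks from the front; dir == -1 drops them from the end.
    static List<String> permutations(String whole, String divider, int lim, int dir) {
        if (dir != 1 && dir != -1) {
            throw new IllegalArgumentException("dir must be 1 or -1");
        }
        // Pattern.quote treats the divider as a literal string, not a regex.
        String[] chunks = whole.split(Pattern.quote(divider));
        List<String> result = new ArrayList<>();
        if (chunks.length <= lim) {
            // Too few chunks to strip anything: return the input as-is.
            result.add(whole);
            return result;
        }
        for (int i = 0; i < chunks.length - lim; i++) {
            // Keep either the tail (dir == 1) or the head (dir == -1) of the chunks.
            String[] kept = (dir == 1)
                    ? Arrays.copyOfRange(chunks, i, chunks.length)
                    : Arrays.copyOfRange(chunks, 0, chunks.length - i);
            result.add(String.join(divider, kept));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(permutations("a.b.c.d.e.cool.google.com", ".", 1, 1));
        System.out.println(permutations("a=b&c=d&e=f", "&", 0, -1));
    }
}
```

Arrays.copyOfRange and String.join replace the inner loops, which keeps the two directions down to a single line each.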
Next I'll need to combine the URL components to get a list of valid URLs.
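As a rough idea of what that combining step might look like, here is a sketch only. It assumes "combining" simply means pairing every host variant with every path variant. The class name CombineSketch and the method name combine are hypothetical, and deciding which of the results count as valid URLs is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombineSketch {

    // Hypothetical helper: pairs every host variant with every path variant.
    static List<String> combine(List<String> hosts, List<String> paths) {
        List<String> urls = new ArrayList<>();
        for (String host : hosts) {
            for (String path : paths) {
                urls.add(host + path);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("cool.google.com", "google.com");
        List<String> paths = Arrays.asList("/path/asdf", "/path", "");
        for (String url : combine(hosts, paths)) {
            System.out.println(url);
        }
    }
}
```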