With the advent of Java 8, which introduced java.util.stream and lambda expressions, one can do data munging in Java as follows:
Java 8:
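import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;
import static java.util.stream.IntStream.range;

// the class name is arbitrary
public class NucleicAcidSequences {
    public static void main(String[] args) {
        Map<Integer, Character> nucleobases = new HashMap<>();
        nucleobases.put(1, 'A');
        nucleobases.put(2, 'G');
        nucleobases.put(3, 'T');
        nucleobases.put(4, 'C');
        range(0, 100)
            // generate a stream of 100 random strings of length 10
            .mapToObj(i -> randomNucleicAcidSequence(new Random(), nucleobases, 10))
            // sort the elements in the stream into natural ordering
            .sorted()
            // group the strings into sub-lists keyed by initial letter
            // (the result is a Map; its key order is not guaranteed)
            .collect(groupingBy(name -> name.charAt(0)))
            // print each sub-list's common initial letter and the constituent strings
            .forEach((letter, names) -> System.out.println(letter
                + "\n\t"
                + names.stream().collect(joining("\n\t"))));
    }

    public static String randomNucleicAcidSequence(Random r, Map<Integer, Character> map, int length) {
        // the upper bound of ints() is exclusive, so (1, 5) yields the keys 1 to 4
        return r.ints(1, 5).limit(length).mapToObj(
            x -> Character.toString(map.get(x))).collect(Collectors.joining());
    }
}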
This is remarkably similar to a program written in Mathematica, using the munging style I use all the time:
(* There is not a built-in RandomString[…] *)
nucleobases = {"A", "G", "T", "C"};
randomNucleicAcidSequence[length_Integer] := StringJoin[nucleobases[[#]] & /@ RandomInteger[{1, 4}, length]]
Composition[
Print[StringJoin[{#[[1]],":\n",StringJoin[Riffle[#[[2]],"\n"]]}]]& /@ # &,
(First[Characters[Part[#, 1]]]-> #) & /@ # &,
GatherBy[#, First[Characters[#]]&]&,
Sort,
Map[Function[{i}, randomNucleicAcidSequence[10]], #] &
][Range[0, 99]]
Both programs print the nucleic acid sequences grouped by initial nucleobase:
A:
AAAATACCTC
AAATAATCAT
AACAGATACG
ACAACTACGG
ACCATCAAAT
...
C:
CAACGGGGTT
CAAGAAGAGC
CACACCCACA
CACACTCTAC
CAGGACCGGA
...
G:
GAACGTTTCA
GAACTAAGCG
GACCAGTTCT
GAGAACACGT
GAGCCGCCAC
...
T:
TAAAATTGCC
TAAGGTGAGG
TAGCCGGTTA
TAGGCGGTGA
TAGTTCGAGC
...
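As a rough sketch of the grouping step on its own (not part of the original program), the three-argument form of Collectors.groupingBy can collect into a TreeMap, so that the groups iterate in sorted key order like the listing above. The class name GroupByInitial, the helper names groupByInitial and render, and the four-letter sample strings are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupByInitial {
    // Hypothetical helper: sorts the sequences, then groups them by first
    // character into a TreeMap so that the keys iterate in sorted order.
    public static Map<Character, List<String>> groupByInitial(List<String> sequences) {
        return sequences.stream()
            .sorted()
            .collect(Collectors.groupingBy(
                s -> s.charAt(0),      // classifier: the initial letter
                TreeMap::new,          // map factory: keeps keys sorted
                Collectors.toList())); // downstream: one list per letter
    }

    // Hypothetical helper: renders each group as "X:" followed by its members,
    // one per line.
    public static String render(Map<Character, List<String>> groups) {
        return groups.entrySet().stream()
            .map(e -> e.getKey() + ":\n" + String.join("\n", e.getValue()))
            .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Made-up sample data, fixed so the result is reproducible
        List<String> sample = Arrays.asList("TAGC", "GATT", "ACGT", "CATG", "AACC");
        System.out.println(render(groupByInitial(sample)));
    }
}
```

Because the stream is sorted before it is collected and the downstream toList() preserves encounter order, each group stays sorted as well; the TreeMap only affects the order of the keys.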
Data streams and the algorithms for processing them have recently become a hot research area in computer science. It seems natural to me that the Java standard library will include more and more stream algorithms in the future.