
With the arrival of Java 8, which adds java.util.stream and lambda expressions, data munging in Java can be written like this:

Java 8:

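import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.joining;
import static java.util.stream.Collectors.toList;
import static java.util.stream.IntStream.range;

public class NucleicAcidSequences {

    public static void main(String[] args) {

        HashMap<Integer, Character> nucleobases = new HashMap<>();
        nucleobases.put(1, 'A');
        nucleobases.put(2, 'G');
        nucleobases.put(3, 'T');
        nucleobases.put(4, 'C');

        Random random = new Random();

        range(0, 100)
                // generate a stream of 100 random strings of length 10
                .mapToObj(i -> randomNucleicAcidSequence(random, nucleobases, 10))
                // sort the elements of the stream into natural ordering
                .sorted()
                // group the strings by initial letter into a map of lists (TreeMap keeps the keys sorted)
                .collect(groupingBy(name -> name.charAt(0), TreeMap::new, toList()))
                // print each group's common initial letter and its constituent strings
                .forEach((letter, names) -> System.out.println(letter
                        + "\n\t"
                        + names.stream().collect(joining("\n\t"))));
    }

    public static String randomNucleicAcidSequence(Random r, Map<Integer, Character> map, int length) {
        // the bound of ints(origin, bound) is exclusive, so 5 is needed to reach key 4 ('C')
        return r.ints(1, 5).limit(length).mapToObj(
                x -> Character.toString(map.get(x))).collect(joining());
    }
}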

↑ Code in GitHub
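
As a side note, the lookup map is not strictly needed: the random index can point straight into a string of the four letters. The following is only a minimal sketch of that variation, not the code from the repository. The names NucleobaseSketch, NUCLEOBASES and randomSequence are hypothetical, and the fixed seed is there only to make runs repeatable.

```java
import java.util.Random;
import java.util.stream.Collectors;

class NucleobaseSketch {
    // The four letters used in the post; the position in the string replaces the map key.
    static final String NUCLEOBASES = "AGTC";

    // Hypothetical helper: builds a random sequence by indexing into NUCLEOBASES.
    static String randomSequence(Random r, int length) {
        // ints(streamSize, origin, bound): the bound is exclusive, so indices are 0..3
        return r.ints(length, 0, NUCLEOBASES.length())
                .mapToObj(i -> String.valueOf(NUCLEOBASES.charAt(i)))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed, an assumption made for repeatability
        System.out.println(randomSequence(r, 10));
    }
}
```

Because the upper bound of Random.ints is exclusive, passing the alphabet length as the bound keeps every index valid and lets all four letters appear.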

This is remarkably similar to a program written in Mathematica in the munging style I use all the time:

(* There is not a built-in RandomString[…] *)
nucleobases = {"A", "G", "T", "C"};

randomNucleicAcidSequence[length_Integer] := StringJoin[nucleobases[[#]] & /@ RandomInteger[{1, 4}, length]]

Composition[
    Print[StringJoin[{#[[1]],":\n",StringJoin[Riffle[#[[2]],"\n"]]}]]& /@ # &,
    (First[Characters[Part[#, 1]]]-> #) & /@ # &,
    GatherBy[#, First[Characters[#]]&]&,
    Sort,
    Map[Function[{i}, randomNucleicAcidSequence[10]], #] &
][Range[0, 100]]
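
To make the resemblance more concrete, the Composition[...] style can be approximated in Java by naming each stage as a java.util.function.Function and chaining the stages with andThen. This is just a sketch under my own assumptions: CompositionSketch, SORT, GATHER, FORMAT and PIPELINE are hypothetical names, and the input is a small hand-written list rather than random sequences.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class CompositionSketch {
    // Each stage is a named function, mirroring one argument of Composition[...].
    static final Function<List<String>, List<String>> SORT =
            xs -> xs.stream().sorted().collect(Collectors.toList());

    // Group by first character; the TreeMap keeps the groups in letter order.
    static final Function<List<String>, Map<Character, List<String>>> GATHER =
            xs -> xs.stream().collect(
                    Collectors.groupingBy(s -> s.charAt(0), TreeMap::new, Collectors.toList()));

    // Render each group as "X:" followed by its members, one per line.
    static final Function<Map<Character, List<String>>, String> FORMAT =
            groups -> groups.entrySet().stream()
                    .map(e -> e.getKey() + ":\n" + String.join("\n", e.getValue()))
                    .collect(Collectors.joining("\n"));

    // andThen applies left to right, whereas Composition[...] applies right to left.
    static final Function<List<String>, String> PIPELINE =
            SORT.andThen(GATHER).andThen(FORMAT);

    public static void main(String[] args) {
        System.out.println(PIPELINE.apply(Arrays.asList("GATT", "ACGT", "AACC", "TTAG")));
    }
}
```

The stages read top to bottom here, while the Mathematica version lists them bottom to top; Function.compose would give the same right-to-left order as Composition.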

Both programs print the nucleic acid sequences grouped by initial nucleobase:

A:
AAAATACCTC
AAATAATCAT
AACAGATACG
ACAACTACGG
ACCATCAAAT
...
C:
CAACGGGGTT
CAAGAAGAGC
CACACCCACA
CACACTCTAC
CAGGACCGGA
...
G:
GAACGTTTCA
GAACTAAGCG
GACCAGTTCT
GAGAACACGT
GAGCCGCCAC
...
T:
TAAAATTGCC
TAAGGTGAGG
TAGCCGGTTA
TAGGCGGTGA
TAGTTCGAGC
...

Data streams and the algorithms for processing them have recently become a hot research area in computer science. It seems natural to me that the Java standard library will include more and more stream algorithms in the future.
