There is a lot to learn about streams and functional programming, and it is hard to learn it purely from a book, so I decided to see for myself whether I could make something interesting with them.

Below is the code I wrote. The setup was as follows:

  • From gutenberg.org I downloaded the text files of three books: Moby Dick, Oliver Twist and Pride and Prejudice.
  • I put these in the project folder and created a simple class Book (I did not include it in the code below).
  • A Book object contains a field of type Path, pointing to the text file.
  • Starting with the three paths, the code tries to find the ten most frequently used words that start with a capital letter, excluding “I”.
  • If it works well enough, we should see the main characters appear in the end result
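
Since the Book class is not shown, here is a minimal sketch of what it could look like. Only the Path field and the getPath() accessor are implied by the post; the constructor, the null check and the BookDemo class are assumptions made for illustration.

```java
import java.nio.file.Path;
import java.util.Objects;

// Hypothetical reconstruction: the post only states that Book has a Path field.
final class Book {
    private final Path path;

    // Assumed constructor; rejects null so getPath() never returns null.
    Book(Path path) {
        this.path = Objects.requireNonNull(path, "path");
    }

    Path getPath() {
        return path;
    }
}

class BookDemo {
    public static void main(String[] args) {
        // The file does not need to exist just to build the object.
        Book book = new Book(Path.of("Moby Dick.txt"));
        System.out.println(book.getPath().getFileName());
    }
}
```

Because the sketch does not override equals() or hashCode(), each Book instance is its own key when it is used as a map key.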

The code:

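// Imports for this snippet; the statement itself belongs inside a method.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

// Book is a small class with a Path field and a getPath() accessor (not shown).
Stream.of(book1, book2, book3)
        // Step 1: map each book to the full text of its file.
        .collect(Collectors.toMap(Function.identity(), book -> {
            try {
                return Files.readString(book.getPath());
            } catch (IOException e) {
                // The lambda passed to toMap cannot throw a checked exception, so wrap it.
                throw new UncheckedIOException(e);
            }
        }))

        .entrySet().stream()
        // Step 2: per book, clean the text, count capitalized words and keep the top ten.
        .map(entry -> {
            String[] words = entry.getValue()
                    .replaceAll("[\\r\\n]", " ")
                    .replaceAll("\\s+", " ")
                    // Caution: the unescaped '.' matches any character, so "Mrs" also becomes "Mr".
                    .replaceAll("Mr.", "Mr")
                    .replaceAll("Mrs.", "Mrs")
                    // Remove the capital that starts a sentence, so sentence openers are not counted.
                    .replaceAll("\\. [A-Z]", "\\. ")
                    .replaceAll("  [A-Z]", " ")
                    // The replacement is literal text, not a pattern: words that start with two
                    // capitals end up starting with '[' and are skipped by the filter below.
                    .replaceAll("[A-Z][A-Z]", "[a-z][a-z]")
                    .split("[\\.?!,\\s\\s+]");
            Map<String, Integer> frequencyTable = new HashMap<>();
            for (String word : words) {
                // Keep words longer than one character that start with A-Z (this drops "I").
                if (word.length() > 1 && word.charAt(0) >= 'A' && word.charAt(0) <= 'Z') {
                    frequencyTable.merge(word, 1, Integer::sum);
                }
            }

            List<Map.Entry<String, Integer>> orderedFrequencyList = new ArrayList<>(frequencyTable.entrySet());
            // Sort by descending count.
            Collections.sort(orderedFrequencyList, new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                    return Integer.compare(o2.getValue(), o1.getValue());
                }
            });

            // Keep at most ten entries; a fixed end index would throw for shorter lists.
            int limit = Math.min(10, orderedFrequencyList.size());
            return Map.entry(entry.getKey(), orderedFrequencyList.subList(0, limit));
        })
        .forEach(x -> System.out.printf("%s - %s%n%n", x.getKey().getPath().getFileName(), x.getValue()));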

The result:

Moby Dick.txt - [Ahab=379, Whale=210, Stubb=200, Queequeg=184, Captain=178, Starbuck=146, Sperm=133, Pequod=125, God=92, The=91]

Pride and Prejudice.txt - [Mr=954, Elizabeth=471, Darcy=359, Miss=263, Jane=231, Bingley=222, Bennet=153, Wickham=141, Collins=135, Lydia=114]

Oliver Twist.txt - [Mr=1189, Oliver=620, Bumble=320, Sikes=288, Jew=271, Fagin=261, The=167, Brownlow=144, Rose=139, Monks=117]

First evaluation:

  • It sort of works: Ahab, Whale, Elizabeth, Mr Darcy, Oliver and Bumble all appear near the top of the frequency lists.
  • I’m terrible with regex; I needed to google and use an AI assistant a lot.
  • The try-catch, mandatory because Files.readString throws a checked IOException, takes a lot of lines.
  • Apart from that, the code is reasonably compact. I like that.
  • I think it will take others quite some time to understand exactly how the code works.
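
Two of the regex surprises in the code above are easy to reproduce in isolation. The sketch below is not the code that produced the results. It uses a made-up sample sentence to show that an unescaped dot in "Mr." also matches "Mrs", and that the replacement argument of replaceAll is literal text rather than a pattern. The class and method names are invented for the example, and the lower-casing variant assumes Java 9 or later.

```java
import java.util.regex.Pattern;

class RegexPitfalls {
    // Unescaped '.' matches any character, so "Mrs" is rewritten to "Mr" as well.
    static String unescapedDot(String text) {
        return text.replaceAll("Mr.", "Mr");
    }

    // With the dot escaped, only a literal "Mr." or "Mrs." loses its dot.
    static String escapedDot(String text) {
        return text.replaceAll("Mrs\\.", "Mrs").replaceAll("Mr\\.", "Mr");
    }

    // The second argument is inserted as literal text; it does not lower-case anything.
    static String literalReplacement(String text) {
        return text.replaceAll("[A-Z][A-Z]", "[a-z][a-z]");
    }

    // One possible way to really lower-case runs of two or more capitals (Java 9+).
    static String lowerCaseAllCaps(String text) {
        return Pattern.compile("[A-Z]{2,}")
                .matcher(text)
                .replaceAll(match -> match.group().toLowerCase());
    }

    public static void main(String[] args) {
        String sample = "Mr. and Mrs. Bennet";
        System.out.println(unescapedDot(sample));       // Mr and Mr. Bennet
        System.out.println(escapedDot(sample));         // Mr and Mrs Bennet
        System.out.println(literalReplacement("AB cd")); // [a-z][a-z] cd
        System.out.println(lowerCaseAllCaps("CHAPTER I. Mr Darcy")); // chapter I. Mr Darcy
    }
}
```

Switching the original code to the escaped patterns would change the counts, for instance by separating Mrs from Mr, so the results shown above would no longer match exactly.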

What I learned

  • While the code gets less verbose with functional programming and streams, writing it still takes a lot of time. Every detail takes effort.
  • I think there are too many methods and classes available to become fluent in this part of the language, unless you specialize.
  • Fortunately there is good documentation (docs.oracle.com).
  • Java has some really great methods, for example Files.readString() and the methods of the Collectors class.
  • The .merge() method from the Map interface is amazing. If the key is not in the map yet, it stores the given value; if the key is already there, it applies the BiFunction to the old value and the given value and stores the result.
  • I learned to play with Map.Entry; it is in my vocabulary now.
  • I wrote a Comparator to be used with Collections.sort.
  • One day I will learn better regex.
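
The counting and sorting steps mentioned in this list can be tried on a tiny input. The sketch below uses a made-up word list rather than the book data, and the class name and the topWords helper are hypothetical. It counts with Map.merge and orders the Map.Entry objects with a Comparator. The alphabetical tie-break is an addition so that the output is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordFrequencyDemo {
    // Returns at most 'limit' entries, highest count first.
    static List<Map.Entry<String, Integer>> topWords(List<String> words, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // Absent key: store 1. Present key: store Integer.sum(oldValue, 1).
            counts.merge(word, 1, Integer::sum);
        }

        List<Map.Entry<String, Integer>> ordered = new ArrayList<>(counts.entrySet());
        // Descending by count, then ascending by word for equal counts.
        ordered.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));

        // Math.min guards against lists shorter than the requested limit.
        return ordered.subList(0, Math.min(limit, ordered.size()));
    }

    public static void main(String[] args) {
        List<String> words = List.of("Ahab", "Whale", "Ahab", "Stubb", "Whale", "Ahab");
        System.out.println(topWords(words, 2)); // [Ahab=3, Whale=2]
    }
}
```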

All in all, a rather rewarding experience; I really learned a lot.

