Small Map/Reduce Quiz

  • Due Oct 20, 2019 at 11:59pm
  • Points 100
  • Questions 6
  • Available until Oct 24, 2019 at 11:59pm
  • Time Limit None

Instructions

Use Map/Reduce to count the words in Kafka's book, Metamorphosis. Use the following version of the text, and include all the extraneous text that is present in the file.

http://djp3.westmont.edu/gutenberg/stacks/gutenberg/5/2/0/5200/5200-h/5200-h.htm

You will be asked questions about your results. 

Tokenize the text so that any run of consecutive alphabetic characters and apostrophes counts as a single word. This regular expression achieves the proper tokenization:

(\p{IsAlphabetic}|['])+

You can test it here (click on "Java")
 
If you apply that regular expression to this text:
 
"What's happened to me?" he thought. It wasn't a dream. His room, a proper human room although a little too small, lay peacefully between its four familiar walls.
 
You should break up the text into:
What's
happened
to
me
he
thought
It
wasn't
a
dream
His
room
a
proper
human
room
although
a
little
too
small
lay
peacefully
between
its
four
familiar
walls
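
As a rough illustration of the tokenization above, here is a minimal Java sketch that applies the regular expression with `java.util.regex`. The class name `Tokenizer` and the method `tokenize` are made up for this example and are not part of the assignment. Only the pattern itself comes from the instructions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for illustration.
class Tokenizer {
    // The pattern from the instructions, with backslashes escaped for a Java string literal.
    private static final Pattern WORD = Pattern.compile("(\\p{IsAlphabetic}|['])+");

    // Returns every maximal run of alphabetic characters and apostrophes, in order.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "\"What's happened to me?\" he thought. It wasn't a dream.";
        // Prints one token per line, starting with What's, happened, to, me.
        for (String token : tokenize(sample)) {
            System.out.println(token);
        }
    }
}
```

Note that the pattern only accepts the plain ASCII apostrophe ('), so a typographic apostrophe in the input would split a word in two.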

Then convert each word to lowercase using US locale rules before counting it, so that "It" and "it" count as the same word. You can achieve that with:

someString.toLowerCase(Locale.US);
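
To show how tokenizing, lowercasing, and counting fit together, here is a small sketch in plain Java that imitates the map and reduce phases in memory. It is not tied to any particular Map/Reduce framework, and the names `WordCountSketch`, `map`, `reduce`, and `countWords` are assumptions made for illustration. A real job would express the same two steps through whatever mapper and reducer interfaces your framework provides.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical in-memory stand-in for a Map/Reduce word count.
class WordCountSketch {
    private static final Pattern WORD = Pattern.compile("(\\p{IsAlphabetic}|['])+");

    // Map phase: emit a (lowercased word, 1) pair for every token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        Matcher matcher = WORD.matcher(line);
        while (matcher.find()) {
            String word = matcher.group().toLowerCase(Locale.US);
            pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Reduce phase: sum the emitted counts for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map phase over every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }
}
```

For example, calling `countWords` on the two lines "His room, a proper human room" and "although a little too small" yields a count of 2 for both "room" and "a", and 1 for "his".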

 

Here is a series of 3 videos that are excellent tutorials:

 
 