I learnt about Markov Chains today.
In mathematics, a Markov chain, named after Andrey Markov, is a discrete random process with the Markov property. A discrete random process means a system which can be in various states, and which changes randomly in discrete steps. It can be helpful to think of the system as evolving once a minute, although strictly speaking the "step" may have nothing to do with time. The Markov property states that the probability distribution for the system at the next step (and in fact at all future steps) only depends on the current state of the system, and not additionally on the state of the system at previous steps. Since the system changes randomly, it is generally impossible to predict the exact state of the system in the future. However, the statistical properties of the system at a great many steps in the future can often be described. In many applications it is these statistical properties that are important. - From the wikipedia entry on Markov Chains
It was so cool that I decided to do something with it. So I wrote a simple helper that generates articles for the app that I was testing. Here is the code in all its glory. It is not the fastest piece of code and is a memory hog, but who cares?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
module Markov | |
class TextGenerator | |
attr_accessor :markov | |
attr_accessor :input_file | |
def initialize(input_file) | |
@input_file=input_file | |
end | |
def get_words | |
content=File.read(@input_file).split(/[\.\?]/) | |
content.each do |sentence| | |
sentence.strip! | |
before = "" | |
sentence.split.each do |word| | |
if(word.strip != "") | |
yield before, word | |
before = word | |
end | |
end | |
end | |
end | |
def generate_markov | |
@markov = {} | |
if(File.exist?(@input_file)) | |
get_words do |before, word| | |
markov[before] ||= [] | |
markov[before] << word | |
end | |
end | |
@markov | |
end | |
def generate_paragraph(len=20) | |
sentences=[] | |
len.times do | |
sentences << generate_sentence | |
end | |
sentences.join " " | |
end | |
def generate_article(len=10) | |
paragraphs=[] | |
len.times do | |
paragraphs << generate_paragraph | |
end | |
paragraphs.join("\n\n") | |
end | |
def generate_sentence(len=20) | |
if(not @markov) | |
generate_markov | |
end | |
words = [] | |
before = "" | |
loop do | |
before = @markov[before][Kernel.rand(@markov[before].length)] rescue nil | |
break if before.nil? | |
break if (words + [before]).length >= len | |
words << before | |
end | |
return words.join(" ")+"." | |
end | |
end | |
end |
This is how you use it.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
require 'markov' | |
puts Markov::TextGenerator.new('alice.txt').generate_article |
And this produced this fine article. Isn't this cool?