Well I've been way too busy with University (work and life) to even consider working on my media share project - which is sad because at the end of the year I won't be in halls any more and will not have much desire to make it a peer to peer system.
Anyway the point of this post is my Text Prediction system which was my final project for my programming module. Technically its very good if I do say so myself - but the write up that goes with it hangs in the balance. Will have to wait to see what mark I get.
Whilst I was working on it people started talking to me about binary trees and searches which I had no clue about. They learnt about them from another module which I am not taking. I considered working them in but I thought that was a bit lame. I wanted to make something more functional.
Several times I considered optimising the code. A binary search would be more efficient than a string search - an array in some places better than a vector, and a linked list better in others. Incidentally my program is composed almost entirely of vectors. The reason for this is that I just wanted to get the method done and not worry about the efficiency of the variables. I was having trouble using STL linked lists and arrays would be too much work with resizing so I used STL vectors instead.
Instead of wasting all my time worrying about efficiency or a certain technique I hadn't been taught which in my honest opinion other people were wasting effort on, I tried (and succeeded) in doing something different (although I clearly wasn't the first to do this by far).
There came a point during the project when every started talking about how many words their program could predict - each successive giving a bigger number because they found a bigger list of words online. I considered this utter rubbish because it doesn't display the predictive power of your program, just which website you went on to get your word list.
I wanted something with a higher predictive power so I started looking at Markov chains. My text prediction system assembles each group of 4 words into a word trigram, transition probability and resulting word set. Each time the same 4 words appear in order the information is added to the system causing transition frequencies to change.
This allows the system to predict entire sentences.
Strictly speaking this would not be considered a proper Markov chain because the probabilities are constantly changing but this is done because of the stochastic nature of the system.
There is no possible way I could assemble all possible trigrams as well as suppling appropriate transition probabilities. In addition this would cost memory with alot being wasted on sets the user never writes. The system is therefore setup so that it is constantly training - adapting to the way the user types.
I can't include the source code for a while but my last build when handing it in is available for download. It should be noted - as I mentioned several times in my write up to avoid losing marks - The interface you see is not the text prediction program. It should be considered a typing program (which was crudely coded in no time at all) + a fully featured text prediction program (or class - which is what this is all about ).
The text prediction code is simply dropped into the typing program which was used as a test program because the text prediction system only takes strings as input. The test program takes data from the text stream and sends it to the text prediction system. The text prediction system then returns the predictions and the test program - or whatever program the system has been dropped into is free to do what it wants with those predictions.