As of 04 07 2008

Processing SMS from the database / Testing

Task 12 – Completed

Next, find synonyms for the selected words. Synonyms are found by posting each word to a free dictionary website and analyzing the HTML response. Again, calculate the tf-idf weight of each word from the database. The synonyms with the highest tf-idf weight are selected. The data is stored in the “sms_synonym” table.


Fig 1. sms_synonym table
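
The synonym step can be sketched roughly as follows. This is only an illustration, not the project's code: the dictionary site and its markup are not named here, so the `<a class="synonym">` structure, the `SynonymParser` class, and the `top_synonyms` helper are all assumptions. The tf-idf weights are taken as already computed from the database.

```python
from html.parser import HTMLParser


class SynonymParser(HTMLParser):
    """Collect the text of <a class="synonym"> elements.

    The markup is hypothetical; a real dictionary page will differ.
    """

    def __init__(self):
        super().__init__()
        self._in_synonym = False
        self.synonyms = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and ("class", "synonym") in attrs:
            self._in_synonym = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_synonym = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_synonym and text:
            self.synonyms.append(text.lower())


def extract_synonyms(html):
    """Return the synonyms found in one HTML response, in page order."""
    parser = SynonymParser()
    parser.feed(html)
    parser.close()
    return parser.synonyms


def top_synonyms(synonyms, weights, k=1):
    """Keep the k synonyms with the highest tf-idf weight.

    `weights` maps word -> tf-idf weight; unknown words count as 0.
    """
    ranked = sorted(synonyms, key=lambda s: weights.get(s, 0.0), reverse=True)
    return ranked[:k]
```

For example, `top_synonyms(extract_synonyms(response_html), weights)` would give the single best synonym for one word, which could then be written to the “sms_synonym” table.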

Find poetry lines from the database where the selected synonym is used in the same context as in the SMS. Select the final poetry line that maximizes the tf weight and minimizes the emotional weight difference to the user's SMS. The result is stored in the “sms_poem_line” table.


Fig 2. sms_poem_line table
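
One possible way to combine the two selection criteria is a single score, sketched below. How the project actually trades off tf weight against emotional weight difference is not stated, so the linear combination, the `alpha` parameter, and the `Candidate` record are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    line: str         # poem line text
    tf_weight: float  # tf weight of the synonym in this line
    emotion: float    # emotional weight of the line (scale is assumed)


def select_line(candidates, sms_emotion, alpha=1.0):
    """Pick the candidate with high tf weight and small emotional difference.

    score = tf_weight - alpha * |emotion - sms_emotion|
    alpha (hypothetical) controls how strongly an emotional mismatch is penalized.
    """
    def score(c):
        return c.tf_weight - alpha * abs(c.emotion - sms_emotion)

    return max(candidates, key=score)
```

With `alpha = 0` this reduces to picking the line with the highest tf weight. Larger values favor lines whose emotional weight is closer to that of the SMS.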

Poetry Selection

Given a query of several words, the goal is to calculate the weight $w_{i,d}$ of each query word $i$ in every poem line $d$.

\begin{equation} w_{i,d} = tf_{i,d} \cdot \log(n/df_{i}) \end{equation}

Where $tf_{i,d}$ is the term frequency of the $i^{th}$ word in poem line $d$, n is the total number of poem lines, and $df_{i}$ is the document frequency of the $i^{th}$ word, that is, the number of poem lines that contain it. The system then returns the poem lines for which $\sum_{i} w_{i,d}$ is maximized.
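
The weight calculation above can be sketched as follows. This is a minimal illustration that treats each poem line as a document and splits on whitespace. The function names and the tokenization are assumptions, not the project's implementation.

```python
import math
from collections import Counter


def tfidf_scores(query_words, poem_lines):
    """Return, for each poem line d, the sum over query words i of w_{i,d}."""
    tokenized = [line.lower().split() for line in poem_lines]
    n = len(tokenized)

    # df[word] = number of poem lines that contain the word at least once
    df = Counter()
    for tokens in tokenized:
        for word in set(tokens):
            df[word] += 1

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this line
        score = 0.0
        for word in query_words:
            if df[word] == 0:
                continue  # word appears in no line, so it contributes nothing
            score += tf[word] * math.log(n / df[word])
        scores.append(score)
    return scores


def best_line(query_words, poem_lines):
    """Return the poem line with the highest summed weight, and that weight."""
    scores = tfidf_scores(query_words, poem_lines)
    best = max(range(len(poem_lines)), key=scores.__getitem__)
    return poem_lines[best], scores[best]
```

Note that a word occurring in every poem line gets weight 0, because $\log(n/df_{i}) = \log 1 = 0$, so very common words do not influence the ranking.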


Fig 3. Poetry selection data flow diagram

Revised Project Plan


Fig 4. New project plan

Changes to be made

  1. GUI to edit config files
  2. Options to enable polling results, display polling results, and choose the type of polling results display (pie chart, bar chart, etc.)
  3. Back up "sms_log" data to text files
  4. Config file editor
  5. Instead of connecting to the Internet, use an offline dictionary to find synonyms
  6. Remove the SMS max length restriction; it should be handled by the display application
  7. Documentation