LIBSVM TUTORIAL PART 3 – Training the Model

Now that we have data to feed into the SVM as a training set, there are a couple more tweaks to make.  First, the classes need numeric labels (1 = SPAM and 0 = HAM).  Second, the feature indices on each line must be in ascending order.  The final training file should look like this:

1 1:1 2:1 3:1
1 3:1 4:1 5:1 6:1 7:1
1 1:1 4:1 8:1 9:1
1 1:1 10:1 11:1 12:1
1 1:1 2:1 4:1 7:1 10:1 12:1 13:1 14:1
0 15:1 16:1 17:1 18:1 19:1
0 16:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1
0 16:1 27:1 28:1 29:1 30:1 31:1
0 22:1 23:1 24:1 26:1 32:1 33:1 34:1
0 16:1 18:1 22:1 28:1 35:1 36:1 37:1 38:1 39:1

As you can see, each line represents one of our sample emails.  A line that starts with 1 is a SPAM email, and a line that starts with 0 is a HAM email (not spam).  Each index:value pair after the label represents a word found in the email: the index identifies the word, and the value 1 marks it as present.  For example, any line with a “1:1” entry means that the word “buy” was found in that email.
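With more than a handful of emails, typing these lines by hand gets tedious, so here is a minimal Python sketch of how they could be generated. It is only an illustration: the function names and the vocabulary are hypothetical, and the only word-to-index mapping taken from this tutorial is “buy” = 1.

```python
# Hypothetical vocabulary: only "buy" -> 1 is stated in the tutorial;
# the other words and indices are made up for illustration.
VOCABULARY = {"buy": 1, "cheap": 2, "now": 3}

SPAM, HAM = 1, 0


def to_libsvm_line(label, words, vocabulary=VOCABULARY):
    """Format one email as '<label> <index>:1 <index>:1 ...'."""
    # Map known words to indices; the set drops repeated words,
    # and words missing from the vocabulary are ignored.
    indices = {vocabulary[w] for w in words if w in vocabulary}
    # Write the feature indices in ascending order.
    features = " ".join(f"{i}:1" for i in sorted(indices))
    return f"{label} {features}".strip()


def write_training_file(path, labelled_emails):
    """Write one line per (label, words) pair to the training file."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for label, words in labelled_emails:
            handle.write(to_libsvm_line(label, words) + "\n")


# The word order in the email does not matter; the output is sorted.
print(to_libsvm_line(SPAM, ["now", "buy", "cheap", "buy"]))
```

With this made-up vocabulary, the example reproduces the first line of the training file above. A real vocabulary would come from the word list built in the earlier parts.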

Now that we have the input file, save it as a file named “spam.train”.
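Before training, it can help to confirm that every line follows the format described above. The sketch below is a hypothetical helper, not part of LIBSVM. It assumes feature indices start at 1, and it rejects any line whose indices are not strictly ascending.

```python
def check_libsvm_line(line):
    """Return (label, {index: value}); raise ValueError if malformed."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty line")
    label = float(tokens[0])  # raises ValueError if not numeric
    features = {}
    previous = 0  # assumption: feature indices start at 1
    for token in tokens[1:]:
        index_text, _, value_text = token.partition(":")
        # int() and float() raise ValueError on a malformed pair,
        # including a token that has no ":" separator.
        index, value = int(index_text), float(value_text)
        if index <= previous:
            raise ValueError(f"index {index} is not in ascending order")
        features[index] = value
        previous = index
    return label, features


def check_training_file(path):
    """Check every line of the file and return the number of samples."""
    count = 0
    with open(path, encoding="ascii") as handle:
        for number, line in enumerate(handle, start=1):
            try:
                check_libsvm_line(line)
            except ValueError as error:
                raise ValueError(f"line {number}: {error}") from error
            count += 1
    return count
```

Calling the hypothetical check_training_file on the saved file should return 10 for the sample data in this part, one per email.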

To create the predictive model, run the following command (on Windows).  With no options given, svm-train uses its defaults (C-SVC with an RBF kernel):

C:\Program Files\LibSVM\windows>svm-train.exe spam.train

*
optimization finished, #iter = 5
nu = 1.000000
obj = -7.583904, rho = 0.229345
nSV = 10, nBSV = 10
Total nSV = 10

After running this command, there will be a “spam.train.model” file, which is used as input when classifying new emails.  We will see that in the next part.

2 thoughts on “LIBSVM TUTORIAL PART 3 – Training the Model”

  1. Amzee says:

    Hi, thanks for the tutorial. I have a doubt, do u mean that we need to type each line of code? I have 14000 + cases of patients data, indicating whether the patient has disease or not, what shall I do?

  2. Firdaus says:

    @amzee if you have 14000+ data, you should use NN. benefit of SVM is that it can solve small sample problems.
