LIBSVM TUTORIAL PART 3 – Training the Model

Now that we have data to feed into the SVM as a training set, there are a couple more tweaks that need to be made. First, each class needs a numeric label (1 = SPAM and 0 = HAM). Second, the feature indices on each line must be in ascending order. So the final training file should look like:

1 1:1 2:1 3:1
1 3:1 4:1 5:1 6:1 7:1
1 1:1 4:1 8:1 9:1
1 1:1 10:1 11:1 12:1
1 1:1 2:1 4:1 7:1 10:1 12:1 13:1 14:1
0 15:1 16:1 17:1 18:1 19:1
0 16:1 20:1 21:1 22:1 23:1 24:1 25:1 26:1
0 16:1 27:1 28:1 29:1 30:1 31:1
0 22:1 23:1 24:1 26:1 32:1 33:1 34:1
0 16:1 18:1 22:1 28:1 35:1 36:1 37:1 38:1 39:1

As you can see, each line represents one of our sample emails. A line starting with 1 is a SPAM email, and a line starting with 0 is a HAM email (not spam). Each index:value pair that follows represents a word found in the email: the index identifies the word, and the value 1 means the word is present. For example, any line containing "1:1" means that the word "buy" was found in that email.
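
Typing these lines by hand only works for a handful of emails. Below is a minimal sketch of generating one line programmatically. It assumes you already have a word-to-index vocabulary like the one built in the earlier parts; the names `to_libsvm_line` and `vocab` are hypothetical, chosen for illustration.

```python
def to_libsvm_line(label, words, vocab):
    """Format one email as a LIBSVM line: '<label> <index>:1 ...'.

    label: 1 for SPAM, 0 for HAM
    words: the words found in the email (duplicates are ignored)
    vocab: hypothetical word -> feature-index mapping, e.g. {"buy": 1}
    """
    # Collect the index of each distinct known word; unknown words are skipped.
    indices = {vocab[w] for w in words if w in vocab}
    # Feature indices must appear in ascending order on every line.
    features = " ".join(f"{i}:1" for i in sorted(indices))
    # rstrip() drops the trailing space when an email has no known words.
    return f"{label} {features}".rstrip()


vocab = {"buy": 1, "now": 2, "cheap": 3}  # hypothetical vocabulary
print(to_libsvm_line(1, ["cheap", "buy", "now", "buy"], vocab))  # prints "1 1:1 2:1 3:1"
```

Using a set removes duplicate words, and sorting the indices takes care of the ascending-order requirement automatically.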

Now that we have the input file, save it as a file named "spam.train".
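
Before training, it can help to check that every line is well formed. Here is a minimal sketch that validates each line and then writes the file. It assumes the format described in LIBSVM's README (a numeric label followed by index:value pairs with integer indices starting from 1, in ascending order). The function names are hypothetical.

```python
import re

# One "index:value" token, e.g. "12:1".
PAIR = re.compile(r"^(\d+):(\S+)$")


def is_valid_libsvm_line(line):
    """Return True if line is '<label> <index>:<value> ...' with a numeric
    label, numeric values and strictly ascending indices >= 1."""
    tokens = line.split()
    if not tokens:
        return False
    try:
        float(tokens[0])  # the class label, e.g. 1 (SPAM) or 0 (HAM)
    except ValueError:
        return False
    previous = 0
    for token in tokens[1:]:
        match = PAIR.match(token)
        if match is None:
            return False
        try:
            float(match.group(2))  # the feature value must be numeric
        except ValueError:
            return False
        index = int(match.group(1))
        if index <= previous:  # rejects index 0 and out-of-order indices
            return False
        previous = index
    return True


def write_training_file(path, lines):
    """Validate every line, then write them to path (e.g. 'spam.train')."""
    for number, line in enumerate(lines, start=1):
        if not is_valid_libsvm_line(line):
            raise ValueError(f"line {number} is not valid LIBSVM input: {line!r}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

For example, `is_valid_libsvm_line("1 3:1 2:1")` returns False because index 2 comes after index 3.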

To create the predictive model, run the following command line (on Windows):

C:\Program Files\LibSVM\windows>svm-train.exe spam.train

*
optimization finished, #iter = 5
nu = 1.000000
obj = -7.583904, rho = 0.229345
nSV = 10, nBSV = 10
Total nSV = 10
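
A short reading of that output follows. This is a sketch that assumes svm-train's documented defaults when no options are given: C-SVC, RBF kernel, C = 1, and γ = 1/(number of features), which gives γ = 1/39 for this file. LIBSVM maps the two class labels to y_i = ±1 internally, and the learned classifier has the form below, where ρ is the rho printed above:

```latex
f(\mathbf{x}) = \operatorname{sgn}\Big( \sum_{i \in \mathrm{SV}} y_i \alpha_i K(\mathbf{x}_i, \mathbf{x}) - \rho \Big),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\big( -\gamma \lVert \mathbf{x}_i - \mathbf{x} \rVert^2 \big),
\qquad
0 \le \alpha_i \le C
```

Here nSV counts the training vectors with α_i > 0, and nBSV counts those at the upper bound α_i = C. So nSV = nBSV = 10 says that every one of the ten sample emails is a support vector with α_i = C = 1. obj is the final value of the dual objective that the solver minimizes.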

After running this command, there will be a new file named "spam.train.model", which will be used as input when classifying new emails. We will see that in the next part.
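
If you want to inspect the model before the next part, here is a minimal sketch that reads its header. It assumes the plain-text layout that LIBSVM's model writer produces: one "key value..." line per setting (keys such as svm_type, kernel_type, rho and total_sv), followed by a line containing only SV and then the support vectors. Check this against your own file. The function name `read_model_header` is hypothetical.

```python
def read_model_header(path):
    """Read the key/value header of a LIBSVM model file into a dict.

    Each value is kept as a list of strings, because some keys (such as
    'label') carry one entry per class. Parsing stops at the 'SV' line,
    so the support vectors themselves are not read by this sketch.
    """
    header = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == "SV":  # support vectors start after this line
                break
            if not line:
                continue
            key, _, value = line.partition(" ")
            header[key] = value.split()
    return header
```

Usage: `read_model_header("spam.train.model")["total_sv"]` should agree with the "Total nSV" line printed by svm-train.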
