LIBSVM TUTORIAL PART 4 – Testing the Model

The whole purpose of using a Support Vector Machine is to be able to predict whether new instances of an object belong to a certain group.  This could be detecting whether documents are sensitive, stocks are a “buy”, or whether it will rain.  In our case, we are trying to determine whether emails are SPAM or not.  So to test this, we need to come up with more instances of emails.

For the first test, we will use a sample email:

James, can you pick up the dog?

To translate this into the proper input format for testing, we once again need to map each word to its index in our previous vector of words:

james=16

can=???

you=17

pick=???

up=???

the=29

dog=26

A ??? means that we didn’t see that word during the training phase, so we will essentially ignore it for now.  Converted to the proper input format, the new email looks like:

0 16:1 17:1 26:1 29:1
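For anyone who would rather not do this mapping by hand, here is a minimal Python sketch of the same step. It assumes a hypothetical `VOCAB` dictionary holding only the four word indices shown above (the real training vocabulary is larger), and the function name `to_libsvm_line` is made up for this illustration; it is not part of LIBSVM.

```python
import re

# Hypothetical subset of the training vocabulary: only the four
# word indices shown in this post are included here.
VOCAB = {"james": 16, "you": 17, "dog": 26, "the": 29}

def to_libsvm_line(label, text, vocab):
    """Return one input line of the form '<label> <index>:1 ...'."""
    # Lowercase the text and pull out the words, dropping punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    # Keep only words seen during training; the set drops repeats,
    # and sorting lists the indices in ascending order.
    indices = sorted({vocab[w] for w in words if w in vocab})
    return " ".join([str(label)] + [f"{i}:1" for i in indices])

line = to_libsvm_line(0, "James, can you pick up the dog?", VOCAB)
print(line)  # 0 16:1 17:1 26:1 29:1
```

Words missing from the dictionary (can, pick, up) are skipped, which matches the ??? entries in the list above.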

Since we know that this email should not be classified as SPAM, we start it off with a “0”, then include a value for any words we already know about.  We will then save this text in a file named “sample.1.txt” and run the following command line to test it:

c:\Program Files\LibSVM\windows>svm-predict.exe sample.1.txt spam.train.model sample.1.predicted.txt
Accuracy = 100% (1/1) (classification)

The output on the command line tells us that the algorithm predicted the email was HAM and not SPAM: we labeled the line with “0”, and an accuracy of 100% means the prediction matched that label.  Also, if you open up sample.1.predicted.txt, you will see a single entry of “0”, indicating that it predicted the first line in the input file belonged to the class “0”, or HAM.

Now let’s add some new sample emails to test:

Cheap viagra by mail!

James, you need viagra.

The first email is obviously a SPAM type email, but the second one is a little more interesting.  If it was sent by a stranger, then it might be SPAM; however, if my wife sent it, it may not be 🙂

So let’s add them to our input file along with the first one.  It would look like:

0 16:1 17:1 26:1 29:1
1 2:1 3:1 8:1 9:1
0 2:1 16:1 17:1

And then, let’s run the algorithm to see what it thinks:

c:\Program Files\LibSVM\windows>svm-predict.exe sample.1.txt spam.train.model sample.1.predicted.txt
Accuracy = 100% (3/3) (classification)

As you can see, the algorithm did very well.  It thought that the first email was HAM, the second was SPAM, and the third was HAM.  The output file shows 0, 1, 0.
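To make the accuracy figure less mysterious, here is a small Python sketch of the comparison behind it: the first number on each input line is the label we expected, and each line of the output file is the label that was predicted. The helper names `read_labels` and `accuracy` are made up for this illustration, and the lines are typed in directly rather than read from sample.1.txt and sample.1.predicted.txt.

```python
def read_labels(lines):
    """Take the first token of each non-empty line as an integer label."""
    return [int(float(line.split()[0])) for line in lines if line.strip()]

def accuracy(expected, predicted):
    """Return (number of matching labels, total number of labels)."""
    correct = sum(e == p for e, p in zip(expected, predicted))
    return correct, len(expected)

# The three test emails from the input file above.
input_lines = [
    "0 16:1 17:1 26:1 29:1",
    "1 2:1 3:1 8:1 9:1",
    "0 2:1 16:1 17:1",
]
# The output file contents as described in the post: 0, 1, 0.
predicted_lines = ["0", "1", "0"]

expected = read_labels(input_lines)
predicted = read_labels(predicted_lines)
correct, total = accuracy(expected, predicted)
print(f"Accuracy = {100 * correct / total:g}% ({correct}/{total}) (classification)")
```

If the third email had been labeled “1” in the input file instead, the same predictions would score 2 out of 3, so the accuracy only means something when the labels in the test file are the ones you actually expect.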

While this was a trivial and made-up example, I hope that I met the overall goal, which was to show how to use LIBSVM for classification problems.  By building an input training set, generating a predictive model, and testing it against new inputs, we showed that Support Vector Machines can be powerful and easy-to-use tools in Machine Learning.

29 thoughts on “LIBSVM TUTORIAL PART 4 – Testing the Model”

  1. blanche says:

    Found your blog while searching for a good LibSVM tutorial. Very Easy to understand and fun to read. I almost laughed when I read the third email for testing. Hope you didn’t need it. Thank you for the nice tutorial!

  2. James says:

    Glad it was helpful… and hopefully I won’t get that email from my wife any time soon 🙂

  3. A.Geethapriya says:

    Thank u Mr.James, as just started my research work in opinion mining,very helpful and got clear understanding of creating training and testing data using libsvm. But for long text and larger no of documents,is there any way to create train and test data automatically

    • James says:

      If you are interested in running some machine learning algorithms against larger data sets and are just getting started, I would recommend Weka. This tool has a UI to interact with and integrates with LibSVM and others.

      Also, Weka can read in test files and get them into the correct format for testing.
      http://www.cs.waikato.ac.nz/ml/weka/

      Let me know if you would like to have me do a tutorial on it, and I can see what I can do.

  4. Peace Abiemo says:

    Hello James

    Thanks for the tutorial, it has been helpful. I will be using libsvm but one set of my data is a measurement of two dependent parameters against time. So its more like a time series data, is this time of data learnable by SVM? I actually want to detect outliers.

    Thanks in advance

    • James says:

      Hi Peace,

      Time series data can be a little trickier, and I haven’t actually done any work that would relate exactly. Support Vector Machines are pretty good at not letting dependent parameters skew the results, so you may have luck.

      Also, if what you’re really looking for is outlier detection, I would suggest investigating a clustering algorithm. Check out k-Means. The idea is to arrange data in clusters, and then find the data points that are furthest from the centers of the clusters.

  5. Peter says:

    Hello James

    Very Easy to understand and fun to read.
    Thanks for the tutorial.

    Can you write a tutorial for LIBLINEAR?

  6. Nina says:

    Thanks! very helpful!:)

  7. ashish says:

    i want to classify words in a sentence to different parts of speech categories. I want to write features like words occuring before and and after a particular word, prefixes,suffixes etc….how should i write the train file.Morever this is a multi-classification problem.please help me

  8. Sandra says:

    You, Sir, are just AWESOME!!! Thanks a bunch for this tutorial! Found it just in time 🙂

  9. Andre says:

    Thank You very much!

    A tip for MatLab Users: It has another interface with different syntax.
    See: https://ece.uwaterloo.ca/~nnikvand/Coderep/libsvm-3.16/matlab/libsvm-3.pdf

    Usage =====
    matlab> model = svmtrain(training_label_vector, training_instance_matrix [, 'libsvm_options']);
    -training_label_vector:
    An m by 1 vector of training labels (type must be double).
    -training_instance_matrix:
    An m by n matrix of m training instances with n features. It can be dense or sparse (type must be double).
    -libsvm_options:
    A string of training options in the same format as that of
    LIBSVM.
    matlab> [predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model [, 'libsvm_options']);
    -testing_label_vector:
    An m by 1 vector of prediction labels. If labels of test
    data are unknown, simply use any random values. (type must be
    double)
    -testing_instance_matrix:
    An m by n matrix of m testing instances with n features. It can be dense or sparse. (type must be double)
    -model:
    The output of svmtrain.
    -libsvm_options:
    A string of testing options in the same format as that of
    LIBSVM.

  11. Sujit says:

    hi…

    Can u explain , how kernel boundaries are generated ??
    also. Please explain what is there in sample.train.model file ???

    Thanks in advance …. 😀

  12. Del says:

    Very nice tutorial James.

    I am using libsvm java API for document classification of resumes. I am able to run prediction on test data and get the accuracy. How do I get the label for that particular prediction?
    How do I use libsvm for multiple classes. Please help.

  13. Nitin says:

    Thank you very much James… 🙂

  14. Divyaa says:

    Hi James,

    I am trying to implement image classification. My feature set will be histograms. Is there anyway I can do this using libsvm?
    Because libsvm assumes inputs as vectors, but I will have a histogram as my featureset. How do I work it, any idea?

    Thanks!

  15. Numl says:

    Dear all,

    I have a question regarding LibSVM. Can we use libsvm testing without labels. I mean first train the system with labels (e.g 1 or -1) and in testing, do not label and use the data to see which data row show which class and then compared with the original one.

    I want to do testing without labeling and then compare with original output to find difference.

    Waiting for your kind reply.

    Thanks in advance

  16. saad says:

    hello james,kindly tell that how libsvm can be used for regression

  17. Mayur Kulkarni says:

    Thanks for this funny but illustrative tutorial! Had fun reading it 🙂

  18. avinash says:

    Funny and easiest way to learn libsvm 🙂

  19. amine says:

    thanks for the tutorial. if some one need more deep knowledge in machine learning i recommended:
    https://www.coursera.org/learn/machine-learning

    amine.b

  20. Sayali says:

    I have text data descriptions.Using those descriptions I want to produce training set for further descriptions to be predicted whether they are valid or not.I am trying to achieve it using weka libsvm,but not getting desired output as desired. I want output as : 1 1:10 2:3 3:11 for positive one and -1 1:7 2:5 5:6 where 1 indicate positive -1 indicate negative and in 1:7 indicates 1st word occurred 7 times.How can I achieve this using libsvm ? and in what format I should build my text file or csv file ?

  21. Eliana says:

    Hi James! I have a question.

    You put cero 0 or one 1 in prediction file.
    But I want clasify, and I don’t know the label for the prediction file.

    I undertood that later the training model I use the prediction and I have a file with cero or one and this is the response.

    pd:Sorry my bad English

  22. Abdulwahab says:

    Thanks James

  24. H Pham says:

    I need to test a text data email. the email have 10 words . if only 1 word is spam and 9 words extant are not of train data… >> the email is spam or no spam

  25. Odysius says:

    Hello James,

    this is a great tutorials, can you share the code with me, please ,my email : odysius.anwar@gmail.com
    It will help me so much,

    Thank you

  26. SNEHITHAPRASAD says:

    Hi James! Thanks for such a great explanation.Now I am very happy that I got to know how to work with lib SVM. But, I have one question in my mind which is not letting me to sleep.Could you please suggest me on how to achieve it.The scenario is below.

    You kept zero 0 or one 1 in prediction file.
    But I want classify, and I don’t know the label for the prediction file.How to come across this.Your help is greatly appreciated.
