LibSVM Tutorial Part 1 – Overview

Overview

Machine learning is a complex topic. Many articles about it have been written online, but most of them are hard to understand.  I would like this tutorial to serve as a starting point for understanding the basics, figuring out how to use LibSVM, and applying it to machine learning use cases.

Some background on LibSVM: it is a freely available library for training support vector machines (SVMs).  Essentially, it lets you take some historical data, train an SVM to build a model, and then use that model to predict the outcome for new instances of your data.

The Data

For this tutorial, I’m going to use the standard use case of SPAM detection.  If we look at past emails that have been marked as SPAM or Not SPAM, can we accurately predict whether a new email is SPAM?  While the data in this tutorial is obviously contrived, it demonstrates how the same logic could be applied to non-trivial cases.

Here we go…

Here are the sample emails we will use for our training set.  The first set of emails will be our SPAM set and the second will be valid, Not SPAM emails.

SPAM

Email1

“Buy Viagra cheap”

Email2

“Cheap drugs, with no prescriptions”

Email3

“Buy drugs by mail”

Email4

“Viagra, Cialis, ED, others”

Email5

“Buy prescriptions drugs like viagra, cialis, and others.”

Not SPAM

Email6

“Hi James you are great”

Email7

“James, here is a picture of my dog”

Email8

“Adding James to the email list”

Email9

“Send me a picture of your dog”

Email10

“James  we are going to give you a raise”
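
To make the training set concrete, here is a minimal Python sketch that holds the ten emails above as labeled pairs. The numeric labels (+1 for SPAM, -1 for Not SPAM) and the names `training_emails` and `count_by_label` are my own assumptions for illustration; they are not part of LibSVM.

```python
# Assumed label convention for this sketch: +1 = SPAM, -1 = Not SPAM.
SPAM, NOT_SPAM = 1, -1

# The ten training emails from this post, each paired with its label.
training_emails = [
    (SPAM, "Buy Viagra cheap"),
    (SPAM, "Cheap drugs, with no prescriptions"),
    (SPAM, "Buy drugs by mail"),
    (SPAM, "Viagra, Cialis, ED, others"),
    (SPAM, "Buy prescriptions drugs like viagra, cialis, and others."),
    (NOT_SPAM, "Hi James you are great"),
    (NOT_SPAM, "James, here is a picture of my dog"),
    (NOT_SPAM, "Adding James to the email list"),
    (NOT_SPAM, "Send me a picture of your dog"),
    (NOT_SPAM, "James  we are going to give you a raise"),
]


def count_by_label(emails):
    """Return a dict mapping each label to the number of emails carrying it."""
    counts = {}
    for label, _text in emails:
        counts[label] = counts.get(label, 0) + 1
    return counts


# Quick sanity check that both classes are equally represented.
print(count_by_label(training_emails))
```

Keeping the label next to the text makes it easy to check that the two classes are balanced (five emails each) before any pre-processing happens.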

There you have it.  The initial data is 10 emails.  In the next step, we will pre-process these emails into a format that LibSVM understands, so that we can train our model.
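
As a rough preview of what that format looks like: LibSVM reads one training instance per line, written as a label followed by `index:value` pairs with the indices in ascending order. The sketch below is only one possible way to get there. The bag-of-words counting, the tokenizer, and the helper names (`tokenize`, `build_vocabulary`, `to_libsvm_line`) are my assumptions for illustration, and the pre-processing in the next part may well differ.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Assign each distinct word a 1-based index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, text, vocab):
    """Format one email as '<label> <index>:<count> ...' with ascending indices."""
    counts = {}
    for word in tokenize(text):
        index = vocab.get(word)
        if index is None:
            continue  # Words outside the vocabulary are skipped in this sketch.
        counts[index] = counts.get(index, 0) + 1
    features = " ".join(f"{i}:{counts[i]}" for i in sorted(counts))
    return f"{label:+d} {features}"


# Two of the emails from above; +1 = SPAM, -1 = Not SPAM (assumed labels).
emails = [
    (1, "Buy Viagra cheap"),
    (-1, "Hi James you are great"),
]
vocab = build_vocabulary(text for _, text in emails)
lines = [to_libsvm_line(label, text, vocab) for label, text in emails]
for line in lines:
    print(line)
```

Each word becomes a numbered feature and each email becomes a sparse list of the features it contains, which is why only the words that actually appear in an email show up on its line.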
