Category Archives: Programming

LibSVM Tutorial Part 1 – Overview


Overview

Machine learning is a complex topic, and many articles have been written about it online, but most of them are hard to understand. I would like to create a resource on the web that can serve as a starting point for understanding the basics, learning how to use LibSVM, and applying it to machine learning use cases.

Some background on LibSVM: it is a “free” library that is available here. Essentially, it lets you take some historical data, train an SVM on it to build a model, and then use that model to predict the outcome for new instances of your data.
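To make the train-then-predict workflow concrete, here is a minimal sketch in plain Python of a linear SVM trained with Pegasos-style stochastic subgradient descent on the regularized hinge loss. This is a swapped-in technique for illustration only; it is not how LibSVM trains internally (LibSVM solves the SVM dual problem and also supports non-linear kernels). The function names `train_linear_svm` and `predict`, the toy data, and the parameters `lam` and `epochs` are all hypothetical.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, epochs=2000, seed=0):
    """Hypothetical helper: train a linear SVM with Pegasos-style SGD.

    Minimizes the L2-regularized hinge loss. Labels must be +1 or -1.
    A constant 1.0 feature is appended to each sample so the last weight
    acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: shrink every weight toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only samples inside the margin move the weights.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy "historical data": two clusters of 2-D points with known labels.
X = [[2.0, 1.0], [3.0, 2.0], [-2.0, -1.0], [-3.0, -2.0]]
y = [1, 1, -1, -1]

w = train_linear_svm(X, y)  # train the SVM to build a model
print(predict(w, [2.5, 1.5]), predict(w, [-2.5, -1.5]))  # prints: 1 -1
```

In the SPAM setting, each sample would be an email's feature vector and the label would mark it as SPAM or Not SPAM; the model is just the learned weight vector `w`.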

The Data

For this tutorial, I’m going to use the fairly standard use case of SPAM detection. If we look at past emails that have been marked as SPAM or Not SPAM, can we accurately predict whether a new email is SPAM? While the data in this tutorial is obviously contrived, it demonstrates how the same logic can be applied to non-trivial cases.

Here we go…

Here are the sample emails we will use as our training set. The first five emails make up our SPAM set, and the last five are valid, Not SPAM emails.

SPAM

Email1

“Buy Viagra cheap”

Email2

“Cheap drugs, with no prescriptions”

Email3

“Buy drugs by mail”

Email4

“Viagra, Cialis, ED, others”

Email5

“Buy prescriptions drugs like viagra, cialis, and others.”

Not SPAM

Email6

“Hi James you are great”

Email7

“James, here is a picture of my dog”

Email8

“Adding James to the email list”

Email9

“Send me a picture of your dog”

Email10

“James  we are going to give you a raise”


There you have it: our initial data set is 10 emails. In the next step, we will pre-process these emails into a format that LibSVM understands, so that we can train our model.
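One common way to do that pre-processing is a bag-of-words encoding written in LibSVM's sparse text format, where each line is `<label> <index>:<value> ...`, feature indices start at 1, and indices appear in ascending order. The sketch below is only an assumption about how this could look, not necessarily the approach taken in the next part: it labels SPAM as +1 and Not SPAM as -1, numbers words from 1 in order of first appearance, and uses word counts as feature values. The helper names `tokenize`, `build_vocabulary`, and `to_libsvm_line` are hypothetical.

```python
import re

# The ten training emails from this post.
# Assumption: +1 marks SPAM and -1 marks Not SPAM.
EMAILS = [
    (1, "Buy Viagra cheap"),
    (1, "Cheap drugs, with no prescriptions"),
    (1, "Buy drugs by mail"),
    (1, "Viagra, Cialis, ED, others"),
    (1, "Buy prescriptions drugs like viagra, cialis, and others."),
    (-1, "Hi James you are great"),
    (-1, "James, here is a picture of my dog"),
    (-1, "Adding James to the email list"),
    (-1, "Send me a picture of your dog"),
    (-1, "James  we are going to give you a raise"),
]


def tokenize(text):
    """Lowercase the text and keep only runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(texts):
    """Give each distinct word a 1-based feature index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def to_libsvm_line(label, text, vocab):
    """Encode one email as '<label> <index>:<count> ...' with ascending indices."""
    counts = {}
    for word in tokenize(text):
        if word in vocab:  # words not in the vocabulary are skipped
            index = vocab[word]
            counts[index] = counts.get(index, 0) + 1
    features = " ".join(f"{i}:{counts[i]}" for i in sorted(counts))
    return f"{label} {features}".rstrip()


vocab = build_vocabulary(text for _, text in EMAILS)
lines = [to_libsvm_line(label, text, vocab) for label, text in EMAILS]
print("\n".join(lines))  # first line printed: 1 1:1 2:1 3:1
```

Written to a text file, one line per email, this is the kind of file LibSVM's `svm-train` program takes as training input.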


Free The Patents

I recently invested in a company called Vringo, whose sole reason for existing right now is to sue Google. I invested in it purely for the money (hopefully), since it was the first publicly traded patent troll I had heard of.

This has set some weird thoughts rolling around in my head about how the patent system is flawed. Should this company really be able to sue Google just because it came up with an idea 10 years ago about showing relevant ads to users? Obviously, it wasn’t that single idea that made Google all its money; it was the fact that Google had the best search engine. The idea the patent is based on certainly didn’t keep Lycos in business.

There are some super smart people out there who can dream up ideas all day long. What if they patented all those ideas? Would they be billionaires 10 years from now? I’m not sure.

The one thing I can think of to combat this weird use of patents is to let people throw ideas out into a “public” space. Once an idea is public, no one else can patent it, right?

If we built an open database where people could submit random ideas that they want someone to build “royalty free”, would anyone have an incentive to contribute? Would people really spend 15 minutes writing out an intelligent description of an algorithm or process that could be used to kill a patent lawsuit?

I’m going to think about this for a while. Maybe there could be some alternative incentives to get people to share…
