Monthly Archives: October 2012

LibSVM Tutorial Part 1 – Overview

Part 1
Part 2
Part 3
Part 4

Overview

Machine learning is a complex topic. Many articles about it have been written online, but most of them are hard to understand.  I would like to create a resource on the web that can serve as a starting point for understanding the basics, figuring out how to use LibSVM, and applying it to machine learning use cases.

Some background about LibSVM: it is a “free” library that is available here.  Essentially, it lets you take some historical data, train an SVM to build a model, and then use that model to predict the outcome for new instances of your data.

The Data

For this tutorial, I’m going to use the fairly standard use case of SPAM detection.  If we can look at past emails that have been marked as SPAM or Not SPAM, can we accurately predict whether a new email is SPAM or not?  While the data used in this tutorial is obviously contrived, it demonstrates how the same logic could be applied to non-trivial cases.

Here we go…

Here are the sample emails we will use for our training set.  The first set of emails will be our SPAM set and the second will be valid, Not SPAM emails.

SPAM

Email1

“Buy Viagra cheap”

Email2

“Cheap drugs, with no prescriptions”

Email3

“Buy drugs by mail”

Email4

“Viagra, Cialis, ED, others”

Email5

“Buy prescriptions drugs like viagra, cialis, and others.”

Not SPAM

Email6

“Hi James you are great”

Email7

“James, here is a picture of my dog”

Email8

“Adding James to the email list”

Email9

“Send me a picture of your dog”

Email10

“James  we are going to give you a raise”

 

There you have it.  The initial data is 10 emails.  In the next step, we will pre-process these emails into a format that LibSVM understands, so that we can train our model.
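To give a rough idea of what that pre-processing can look like, here is a minimal sketch. LibSVM reads one instance per line in the form `<label> <index>:<value> ...`, with feature indices in ascending order. Everything else here is an assumption made for illustration: the bag-of-words encoding (one feature per distinct word, valued by its count), the labels 1 for SPAM and -1 for Not SPAM, and the helper names `tokenize`, `buildVocabulary`, and `toLibsvmLine`. The later parts of this tutorial may encode the emails differently.

```javascript
// Training emails from the lists above.
const spam = [
  'Buy Viagra cheap',
  'Cheap drugs, with no prescriptions',
  'Buy drugs by mail',
  'Viagra, Cialis, ED, others',
  'Buy prescriptions drugs like viagra, cialis, and others.',
];
const notSpam = [
  'Hi James you are great',
  'James, here is a picture of my dog',
  'Adding James to the email list',
  'Send me a picture of your dog',
  'James  we are going to give you a raise',
];

// Hypothetical helper: lowercase the text and split it into words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Hypothetical helper: give every distinct word a 1-based feature index.
function buildVocabulary(emails) {
  const vocabulary = new Map();
  for (const email of emails) {
    for (const word of tokenize(email)) {
      if (!vocabulary.has(word)) {
        vocabulary.set(word, vocabulary.size + 1);
      }
    }
  }
  return vocabulary;
}

// Hypothetical helper: encode one email as "<label> <index>:<count> ...".
function toLibsvmLine(label, email, vocabulary) {
  const counts = new Map();
  for (const word of tokenize(email)) {
    const index = vocabulary.get(word);
    if (index === undefined) continue; // word not seen when building the vocabulary
    counts.set(index, (counts.get(index) || 0) + 1);
  }
  const features = [...counts.entries()]
    .sort((a, b) => a[0] - b[0]) // indices must be ascending
    .map(([index, count]) => `${index}:${count}`);
  return [label, ...features].join(' ');
}

const vocabulary = buildVocabulary([...spam, ...notSpam]);
const trainingLines = [
  ...spam.map((email) => toLibsvmLine(1, email, vocabulary)),      // assumed label: 1 = SPAM
  ...notSpam.map((email) => toLibsvmLine(-1, email, vocabulary)),  // assumed label: -1 = Not SPAM
];
console.log(trainingLines.join('\n'));
```

Only words that actually appear in an email are written out, which matches the sparse layout of the format: a missing index is treated as zero.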

 

Getting Weather from Weather.gov using JavaScript and jQuery

I had a simple task: get a 7-day forecast of the high and low temperatures. To start, I looked at a number of possible sources for this information. Here is a good discussion about weather APIs, mostly geared toward iPhone.

I decided to go with the NOAA REST service and get the data from Weather.gov.

I also published the code on GitHub if you want the full source.

The first step was to use jQuery to request the NOAA URI with the appropriate parameters:

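// zip holds the ZIP code to look up; the callback receives the XML response
$.get('http://graphical.weather.gov/xml/SOAP_server/ndfdXMLclient.php?whichClient=NDFDgenMultiZipCode&zipCodeList=' + zip + '&product=time-series&maxt=maxt&mint=mint&Submit=Submit', function (xml) { /* parse the response here */ });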
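The query string in that request can also be assembled from named parameters instead of one long concatenated string, which makes it easier to see what is being asked for. This is only a sketch: `buildForecastUrl` and `NDFD_ENDPOINT` are names made up for illustration, and the parameter values are simply the ones from the URL above.

```javascript
// Endpoint taken from the request above.
const NDFD_ENDPOINT = 'http://graphical.weather.gov/xml/SOAP_server/ndfdXMLclient.php';

// Hypothetical helper: build the forecast URL for one ZIP code.
function buildForecastUrl(zip) {
  const params = new URLSearchParams({
    whichClient: 'NDFDgenMultiZipCode',
    zipCodeList: zip,
    product: 'time-series',
    maxt: 'maxt', // daily maximum temperature
    mint: 'mint', // daily minimum temperature
    Submit: 'Submit',
  });
  return `${NDFD_ENDPOINT}?${params.toString()}`;
}

console.log(buildForecastUrl('20500'));
```

The resulting string can be passed to `$.get` in place of the hand-built URL; `URLSearchParams` also takes care of escaping if a value ever contains special characters.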

The request returns a large XML document, which then needs to be parsed.

// Parse the XML response to get out the values we want
$(xml).find('temperature').each(function () {
  // ...
  // Iterate over the day values and assign them to the right array
  $(this).find('value').each(function () {
    array[count] = $(this).text();
    count = count + 1;
  });
});
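For a version that can be tried without a browser or jQuery, here is a minimal sketch of the same idea: group the `value` entries by the `type` attribute of each `temperature` element. The sample XML is an assumption about the shape of the response, trimmed down to the parts used here, and `extractTemperatures` is a made-up helper. It uses regular expressions only to stay self-contained; a real XML parser is the safer choice for anything beyond a quick experiment.

```javascript
// Hypothetical helper: map each temperature type to its list of numeric values.
function extractTemperatures(xml) {
  const result = {};
  const blockPattern = /<temperature\b[^>]*\btype="([^"]+)"[^>]*>([\s\S]*?)<\/temperature>/g;
  let block;
  while ((block = blockPattern.exec(xml)) !== null) {
    const type = block[1];
    const values = [];
    const valuePattern = /<value>([^<]*)<\/value>/g;
    let value;
    while ((value = valuePattern.exec(block[2])) !== null) {
      values.push(Number(value[1]));
    }
    result[type] = values;
  }
  return result;
}

// Assumed, heavily shortened response; the real document is much larger.
const sampleXml = `
<parameters>
  <temperature type="maximum" units="Fahrenheit">
    <name>Daily Maximum Temperature</name>
    <value>71</value>
    <value>68</value>
  </temperature>
  <temperature type="minimum" units="Fahrenheit">
    <name>Daily Minimum Temperature</name>
    <value>52</value>
    <value>49</value>
  </temperature>
</parameters>`;

const temps = extractTemperatures(sampleXml);
console.log('highs:', temps.maximum, 'lows:', temps.minimum);
```

Keying the arrays by type avoids the shared counter from the jQuery snippet and keeps highs and lows from getting mixed up.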

The overall takeaway is that NOAA provides a free interface for getting weather data, though the interface seems to have been designed a while back and is a little cumbersome to use.

If you want more data than just the highs and lows for temperature, check out this website which shows you all the different options you can query for.
