Thursday, August 30, 2012

LaTeX Template for Lecture or Class Notes

A friend of mine asked me to create a LaTeX template for lecture notes. He wanted to learn LaTeX by typesetting all of his class notes. I think I created a pretty nice template and wanted to share it with everyone.
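The actual template is in the download link below; as a rough idea of what such a template looks like, here is a minimal sketch of a lecture-notes preamble. The specific packages and environments are my own assumptions for illustration, not necessarily what the template itself uses.

\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm} % math symbols and theorem environments
\usepackage{fancyhdr}               % course name and lecture number in the running header

\newtheorem{theorem}{Theorem}[section]
\newtheorem{definition}[theorem]{Definition}

\pagestyle{fancy}
\lhead{Course Name}
\rhead{Lecture 1: Topic}

\begin{document}

\section{First Topic}
\begin{definition}
  A definition from the lecture goes here.
\end{definition}

\end{document}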



You can download the template from here! Happy TeXing!

Tuesday, August 28, 2012

Syntax Highlighting in LaTeX with Minted

Here is a quick tip for those of you who use MacPorts and want pretty syntax highlighting in your LaTeX documents. Recently I discovered a great looking package called minted, which depends on an external python syntax highlighter called Pygments. I couldn't find any decent tutorials online for a quick and easy setup, so I decided to write one. I hope it will be helpful for someone.
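The short version of the setup, as a hedged sketch for MacPorts users: install Pygments and compile with shell escape enabled so minted can call the highlighter. The port name py27-pygments assumes MacPorts' Python 2.7 and may differ on your system; if the pygmentize command is not found afterwards, MacPorts may have installed it under a versioned name.

sudo port install py27-pygments
pdflatex -shell-escape yourdocument.tex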

To my readers who didn't come upon this article through Google, the minted package is a better alternative to the listings package. Minted allows you to present code in techno-color! This means the programs you have worked hard on, like:

#include <stdio.h>

int main() {
    printf("hello, world");
    return 0;
}

will no longer be confined to plain black and white:


With minted your code will finally be noticed in the appendix of that report...
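For the curious, here is a minimal sketch of the minted markup that produces highlighted output like the hello-world program above. It assumes Pygments is installed and the document is compiled with -shell-escape, since minted shells out to Pygments to do the highlighting.

\documentclass{article}
\usepackage{minted}

\begin{document}

\begin{minted}{c}
#include <stdio.h>

int main() {
    printf("hello, world");
    return 0;
}
\end{minted}

\end{document}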


Disclaimer: Minted can do many things, but it will not make bad programs work. Though it will make them look like they could work.

Wednesday, August 22, 2012

How to Count the Number of Wikipedia Edits

This week I finally found some time to catch up on The Colbert Report and I discovered a mischievous little gem. Stephen Colbert and his staff brought up an old Washington Post article, "Wikipedia Edits Forecast Vice Presidential Picks", and subtly suggested that editing is like voting - only with no photo ID checks. From five minutes of Googling, I found a torrent of news articles repeating some version of what I summarized in the previous sentence. Then I discovered this article, which somewhat substantiates the claim with an ugly Excel chart, below.


To remind those who have forgotten, Mitt Romney announced Paul Ryan as his running mate on August 11, 2012.

In all the articles I read on this claim, I was surprised that almost no news source visualized the data or told readers how to gather the data. I've previously posted on some fun analytics with Wikipedia using Mathematica. However, with my recent clean installation of Mountain Lion, it has become clear to me that not everyone has access to Mathematica. In this post, I will show how to count the number of revisions/edits on a Wikipedia article using delightfully free python.

The python script I have written is quite easy to understand and depends on the external Numpy library. The script starts by requesting the raw revision data for a given article. There are two ways to get the revision history of a Wikipedia article: one is to scrape the HTML, which can take some effort; my code uses the second, simply calling the MediaWiki API. The revision data requested through the API comes back as XML, which is read into python and parsed for just the date of each revision. The dates are then counted by a small tallying function I have written and sorted by date. Finally, the script returns the revision counts as a properly formatted array, just waiting to be plotted. The commented python code is presented below:

import urllib2
import xml.etree.ElementTree as ET
import dateutil.parser
from collections import Counter
import numpy as np

def tally(datalist):
    # counts how many times each date appears and returns [date, count] pairs
    lst = Counter(datalist)
    a = []
    for i, j in lst.items():
        a.append([i, j])
    return a

def revisions(articlename):
    articlename = articlename.replace(' ', '_')  # format article title for the url
    url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=%s&' % articlename + \
          'rvprop=user|timestamp&rvlimit=max&redirects&format=xml'  # rvlimit=max returns at most 500 revisions
    req = urllib2.urlopen(url)
    data = req.read(); req.close()  # reads the url response
    root = ET.fromstring(data)      # parses the xml data
    group = root.find(".//revisions")
    results = []
    # collects the revision times from the xml data
    for elem in group.getiterator('rev'):
        timestamp = elem.get('timestamp')
        timestamp = dateutil.parser.parse(timestamp).date()  # parses the timestamp and keeps only the date
        results.append(timestamp)
    a = tally(results)  # tallies revisions by date
    datetally = np.array(sorted(a, key=lambda x: x[0]))  # sorts the tally by date
    return datetally

Here is a quick example of how to plot with the array that is returned. I chose Tim Pawlenty and Marco Rubio to show the limitations of the MediaWiki API. I am also biased towards Pawlenty because of his amazing ads during the GOP primaries. Some Wikipedia pages have low daily revision activity for long stretches of time, while others pick up huge numbers of revisions in very short periods. The MediaWiki API will return only the previous 500 revisions of any article unless you have superuser status.


from matplotlib import pyplot as plt

a = revisions('Tim Pawlenty')
b = revisions('Marco Rubio')

fig = plt.figure()
graph = fig.add_subplot(111)
graph.plot(a[:,0], a[:,1], 'r', b[:,0], b[:,1], 'b')
fig.autofmt_xdate()
plt.legend(('Tim Pawlenty', 'Marco Rubio'), loc='upper left')
plt.title('Number of Wikipedia Edits')
plt.show()

One warning for this script: please use some common sense when interpreting the results. You can download the source here.
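The script above stops at that 500-revision ceiling. If you need the full history, the API supports continuation; the sketch below is my own untested extension, and it assumes the rvcontinue token that the current MediaWiki API returns inside a continue element (older API versions used a query-continue element instead, so check the raw response if this comes back empty).

import urllib
import urllib2
import xml.etree.ElementTree as ET
import dateutil.parser

def all_revision_dates(articlename):
    articlename = articlename.replace(' ', '_')
    base = ('http://en.wikipedia.org/w/api.php?action=query&prop=revisions'
            '&titles=%s&rvprop=timestamp&rvlimit=max&format=xml' % articlename)
    dates = []
    rvcontinue = None
    while True:
        url = base if rvcontinue is None else base + '&rvcontinue=' + urllib.quote(rvcontinue)
        req = urllib2.urlopen(url)
        root = ET.fromstring(req.read())
        req.close()
        for elem in root.findall('.//rev'):
            # keep only the date of each revision, as in revisions() above
            dates.append(dateutil.parser.parse(elem.get('timestamp')).date())
        cont = root.find('.//continue')  # present only when more revisions remain
        if cont is None:
            break
        rvcontinue = cont.get('rvcontinue')
    return dates

The list of dates it returns can be passed straight to the tally function above.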

Wednesday, August 15, 2012

Historical Intraday Stock Price Data with Python

By popular request, this post will present how to acquire intraday stock data from Google Finance using python. The general structure of the code is pretty simple to understand. First a request URL is created and sent. The response is then read by python to create an array, or matrix, of the financial data and a vector of time data. The array is built with the help of the popular Numpy package, which can be downloaded from here. Then, in one if-statement, the time data is restructured into proper Unix timestamps and translated to a more familiar date format for each financial data point. The translated time vector is then joined with the financial array to produce a single, easy-to-work-with financial time series array. Numpy has been ported to Python 3, so the code should be straightforward to adapt for either Python 2.X or 3.X (under 3.X the urllib2 and urllib imports become urllib.request and urllib.parse). Here it is:

import urllib2
import urllib
import numpy as np
from datetime import datetime

urldata = {}
urldata['q'] = ticker = 'JPM'   # stock symbol
urldata['x'] = 'NYSE'           # exchange symbol
urldata['i'] = '60'             # interval in seconds
urldata['p'] = '1d'             # number of past trading days (max has been 15d)
urldata['f'] = 'd,o,h,l,c,v'    # requested data: d is time, o is open, c is closing, h is high, l is low, v is volume

url_values = urllib.urlencode(urldata)
url = 'http://www.google.com/finance/getprices'
full_url = url + '?' + url_values
req = urllib2.Request(full_url)
response = urllib2.urlopen(req).readlines()

getdata = response
del getdata[0:7]  # drops the header lines that describe the columns

numberoflines = len(getdata)
returnMat = np.zeros((numberoflines, 5))
timeVector = []

index = 0
for line in getdata:
    line = line.strip('a')            # full timestamps are prefixed with 'a'
    listFromLine = line.split(',')
    returnMat[index,:] = listFromLine[1:6]
    timeVector.append(int(listFromLine[0]))
    index += 1

# convert Unix or epoch time to something more familiar;
# large values are full timestamps, small values are offsets (in intervals) from the last full timestamp
for x in timeVector:
    if x > 500:
        z = x
        timeVector[timeVector.index(x)] = datetime.fromtimestamp(x)
    else:
        y = z + x*60  # multiply the offset by the interval
        timeVector[timeVector.index(x)] = datetime.fromtimestamp(y)

tdata = np.array(timeVector)
time = tdata.reshape((len(tdata), 1))
intradata = np.concatenate((time, returnMat), axis=1)  # array of all the data with properly formatted times
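Not part of the original script, but here is a minimal plotting sketch. It assumes the code above has already run, so that timeVector holds datetimes and returnMat holds the numeric columns. The column order in returnMat is whatever Google's COLUMNS header line (deleted above) says it is, so check the raw response before labeling anything.

from matplotlib import pyplot as plt

fig = plt.figure()
graph = fig.add_subplot(111)
graph.plot(timeVector, returnMat[:, 0])  # first returned price column; confirm its meaning from the COLUMNS header
fig.autofmt_xdate()
plt.title('%s intraday prices' % ticker)
plt.show()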

Saturday, August 11, 2012

The Walking Dead Model

Today, I finally discovered a useful application of my chemical engineering degree: initiating engaging philosophical and mathematical conversations about zombies with grumpy old men.

As I sat at my terminal watching the first episode of The Walking Dead, a raspy voice next to me muttered, "That would never happen."

I replied with the usual, "Yes, I know zombies don't exist." Usually that avoids the unwanted confrontation and allows for a change of subject. It is a formulaic reply that Dale Carnegie would use - if he was a zombie enthusiast. Then he took me by surprise when he explained that the population of a small town would not be completely zombified or dead.


In all honesty, I have wondered what I would do in certain zombie apocalypse scenarios, but I have never thought about the larger picture, like the population of a town. With a few Auntie Anne's napkins and 45 minutes before boarding, we began outlining a population model using the zombie rules from the universe of The Walking Dead.

Tuesday, August 7, 2012

Installing Scipy from MacPorts on Mountain Lion

After doing a fresh install of Mountain Lion, I proceeded with my age-old tradition of reinstalling all of my favorite libraries from MacPorts. Everything installed wonderfully except for PyObjC, Scipy, and a few of their dependencies like gcc45 and libunwind-headers. After an hour of trial and error that included playing with the pip installer and fighting the temptation to switch to Homebrew, I found a quick and easy fix for MacPorts.

If you have already attempted to install any of the ports I have listed, please clean the ones that failed. For example, my installation of the py27-scipy port failed, so I had to use the following command to clean it:

sudo port clean --all py27-scipy
The quick fix that lets MacPorts install these wonderful ports properly is to reconfirm the Xcode license agreement with the following command:

sudo xcodebuild -license
Continuing from my example, sudo port install py27-scipy should now execute and install without a hitch. The same goes for the other ports I listed earlier.