Night Time Curiosities — a blog focused on analytics of various subjects using Python and C/C++, LaTeX tricks, and other neat things. By Miguel Wong.<br />
<br />
<h3>How to Handle Annoyingly Large CSV Files with MySQL (2013-01-29)</h3>I recently found some time to look at the data for the <a href="http://www.kaggle.com/c/event-recommendation-engine-challenge" target="_blank">Event Recommendation Engine Challenge</a> on <a href="http://www.kaggle.com/" target="_blank">kaggle</a>. I was very annoyed when I saw that one of the files was a 1.19&nbsp;GB CSV. My MacBook Pro only has 4&nbsp;GB of RAM, and most of it is already being used by <a href="http://www.chromium.org/" target="_blank">chromium</a>. In an effort to stave off the realization that my laptop is five years old, I decided to load that CSV file into a MySQL database. The conversion process was a bit trickier than I expected, so I'm writing this post as more of a note to myself.<br />
<br />
First make sure <a href="http://www.macports.org/" target="_blank">MacPorts</a> is installed. Then install Python 2.7, MySQL5, the MySQL5 server and the MySQL python library.<br />
<br />
<syntaxhighlighter class="brush: bash; gutter: false;">
sudo port install python27 mysql5 mysql5-server py27-mysql
</syntaxhighlighter>
<br />
Create the default databases by executing the following command:<br />
<br />
<syntaxhighlighter class="brush: bash; gutter: false;">
sudo -u mysql mysql_install_db5
</syntaxhighlighter>
<br />
Start MySQL (once the server has started, press control+c to escape):<br />
<br />
<syntaxhighlighter class="brush: bash; gutter: false;">
sudo /opt/local/lib/mysql5/bin/mysqld_safe &
</syntaxhighlighter>
<br />
Then secure the MySQL server with a password:<br />
<br />
<syntaxhighlighter class="brush: bash; gutter: false;">
/opt/local/lib/mysql5/bin/mysqladmin -u root password [password goes here]
</syntaxhighlighter>
<br />
Here comes the tricky part. For other programs to find MySQL, we need to create symbolic links to the MySQL socket in the locations where clients expect it.<br />
<br />
<syntaxhighlighter class="brush: bash;">
sudo ln -s /opt/local/var/run/mysql5/mysqld.sock /tmp/mysql.sock
sudo mkdir /var/mysql
sudo ln -s /opt/local/var/run/mysql5/mysqld.sock /var/mysql/mysql.sock
</syntaxhighlighter>
<br />
Now the fun of converting a large CSV file to a SQL database begins. I'm only using python for the conversion because the algorithm I wrote for the competition was written in python. If any readers are interested, I can write a quick post on converting in other languages. The rules of the competition forbid me from redistributing the data provided for the competition, so let's instead use a CSV file containing the <a href="http://electoral-vote.com/evp2013/Info/datagalore.html" target="_blank">polling results</a> of the 2008 presidential election for the rest of this blog post. The data looks something like this:<br />
<br />
<table align="center" border="1">
<tbody>
<tr>
<th>State</th>
<th>Obama</th>
<th>McCain</th>
<th>Start</th>
<th>End</th>
<th>Pollster</th>
</tr>
<tr>
<td>AL</td>
<td>36</td>
<td>61</td>
<td>27-Oct</td>
<td>28-Oct</td>
<td>SurveyUSA</td>
</tr>
<tr>
<td>AL</td>
<td>34</td>
<td>54</td>
<td>15-Oct</td>
<td>16-Oct</td>
<td>Capital Survey</td>
</tr>
<tr>
<td>AL</td>
<td>35</td>
<td>62</td>
<td>8-Oct</td>
<td>9-Oct</td>
<td>SurveyUSA</td>
</tr>
<tr>
<td align="center">⋮</td>
<td align="center">⋮</td>
<td align="center">⋮</td>
<td align="center">⋮</td>
<td align="center">⋮</td>
<td align="center">⋮</td>
</tr>
</tbody></table>
<br />
<br />
Conceptually, the script is very simple. A connection to the MySQL server is made from python. If the database already exists, it is deleted and a new one is created. The header of the csv file is then used to create the fields for a new table. The data from the csv file is then inserted into the table, with blank entries converted to None so they become SQL NULLs. The last bit tests the new 'polls' table by pulling all of the polling results from Florida.
<br />
<br />
<syntaxhighlighter class="brush: python;">
import csv
import MySQLdb

# open the connection to the MySQL server using MySQLdb
mydb = MySQLdb.connect(host='localhost',
                       user='root',
                       passwd='[your password goes here]')
cursor = mydb.cursor()

# create the database!
# if one with the same name exists, delete it first
cursor.execute('DROP DATABASE IF EXISTS test')
cursor.execute('CREATE DATABASE test')
cursor.execute('USE test')

# read the 2008-pres-polls.csv file with the csv module
csv_data = csv.reader(open('2008-pres-polls.csv'))

# read the header line
header = csv_data.next()

# build the field definitions from the header
fields = []
for col in header:
    fields.append('%s VARCHAR(64)' % col)
sqlfields = ',\n'.join(fields)

# create a table named 'polls' with fields from the header
cursor.execute('CREATE TABLE polls (%s)' % sqlfields)

# convert empty entries to None so they become SQL NULLs
def nullify(d):
    return [x if x != '' else None for x in d]

# build the parameterized MySQL insert statement
def buildInsert(table, numfields):
    datholder = ', '.join(['%s'] * numfields)
    return 'INSERT INTO %s VALUES (%s)' % (table, datholder)

# insert the data from each row of the csv file
query = buildInsert('polls', len(header))
for line in csv_data:
    cursor.execute(query, nullify(line))
mydb.commit()

# test the database
cursor.execute("SELECT * FROM polls WHERE State = 'FL'")
results = cursor.fetchall()
print results
</syntaxhighlighter>
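If you just want to experiment with the CSV-to-SQL pattern without running a MySQL server, the same idea works with Python's built-in sqlite3 module. This is only a sketch of the approach above, not part of my competition code; the sample rows are made up for illustration:

```python
import csv
import io
import sqlite3

# build a table from a csv header and insert the rows,
# mirroring the MySQL script: fields from the header,
# blank entries stored as NULL
def csv_to_sqlite(csv_text, table, conn):
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    cols = ', '.join('%s VARCHAR(64)' % c for c in header)
    cur = conn.cursor()
    cur.execute('CREATE TABLE %s (%s)' % (table, cols))
    holders = ', '.join('?' * len(header))  # sqlite uses ? placeholders
    for line in rows:
        cur.execute('INSERT INTO %s VALUES (%s)' % (table, holders),
                    [x if x != '' else None for x in line])
    conn.commit()

# a made-up sample in the same shape as the polling data
sample = 'State,Obama,McCain\nAL,36,61\nFL,51,48\n'
conn = sqlite3.connect(':memory:')
csv_to_sqlite(sample, 'polls', conn)
print(conn.execute("SELECT * FROM polls WHERE State = 'FL'").fetchall())
# prints [('FL', '51', '48')]
```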
<br />
Hope this is helpful to someone! You can download all of the files used in this post from <a href="http://bit.ly/UA4BqN" target="_blank">here</a>.<br />
<br />
<h3>Kaggling Dark Halos (2013-01-23)</h3>Success! I scored higher than <a href="http://www.oamp.fr/cosmology/lenstool/" target="_blank">Lenstool</a>!<br />
<div>
<br /></div>
<div>
<a href="http://www.kaggle.com/c/DarkWorlds" target="_blank">This</a> was one of those Kaggle competitions that I had no right to compete in. My goal was to beat the benchmark, which was a state-of-the-art program in 2007. I will be the first to admit this wasn't the most ambitious goal. However, the closest I have ever been to an astronomy class was an episode of <a href="http://www.dailymotion.com/video/xjpqfn_the-magic-school-bus-1x01-gets-lost-in-space_shortfilms" target="_blank">The Magic School Bus</a> - it was the one where they explored all nine planets. Though my best method ranked 121 out of 357, I still consider it a victory (Lenstool ranked 201 out of 357). My victory definitely exemplifies how childish logic can successfully stand in for cosmological knowledge.<br />
<a name='more'></a></div>
<div>
<br /></div>
<div>
I imagined that, without any dark matter influence, galaxies would look perfectly circular and no galaxy would influence another. When dark matter is introduced, some force causes the galaxies to look elongated, or elliptical. </div>
<div>
<br /></div>
<div>
The python script I wrote to find the coordinates of dark halos reflects this logic. Random points are selected and used to calculate the force of elongation on each galaxy. The difference between the observed ellipticity and the calculated elongation is taken for each galaxy, and the differences are squared and summed. By my childish logic, the candidate point with the lowest total should be where the dark halo is located. For multiple dark halos, the elongation caused by the previously found dark halo(s) should be taken into account before further calculations.</div>
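A minimal sketch of this random search in Python (with NumPy) looks like the following. Everything specific here is a stand-in I picked for illustration: the 1/r falloff, its scale, and the strength coefficient are made up, whereas the actual script regressed its coefficients from the training data.

```python
import numpy as np

# ellipticity of each galaxy measured tangentially to a candidate halo position
def tangential_ellipticity(gal_xy, e1, e2, halo_xy):
    dx = gal_xy[:, 0] - halo_xy[0]
    dy = gal_xy[:, 1] - halo_xy[1]
    phi = np.arctan2(dy, dx)
    return -(e1 * np.cos(2 * phi) + e2 * np.sin(2 * phi))

# sum of squared differences between the observed tangential ellipticity
# and a toy elongation model (the 1/r form and coefficients are made up)
def score_candidate(gal_xy, e1, e2, halo_xy, strength=0.1, scale=1000.0):
    r = np.hypot(gal_xy[:, 0] - halo_xy[0], gal_xy[:, 1] - halo_xy[1])
    model = strength / np.maximum(r / scale, 1.0)
    observed = tangential_ellipticity(gal_xy, e1, e2, halo_xy)
    return np.sum((observed - model) ** 2)

# pick random points; the lowest score marks the estimated halo position
def find_halo(gal_xy, e1, e2, n_candidates=2000, sky_size=4200.0, seed=0):
    rng = np.random.RandomState(seed)
    candidates = rng.uniform(0.0, sky_size, size=(n_candidates, 2))
    scores = [score_candidate(gal_xy, e1, e2, c) for c in candidates]
    return candidates[int(np.argmin(scores))]
```

On synthetic galaxies elongated exactly according to the toy model, the lowest-scoring random point lands near the true halo, which is all the childish logic claims.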
<div>
<br /></div>
<div>
For the competition, I created various potentials and regressed for coefficients using the training data. For this blog post, I chose a potential with a double power-law density distribution and randomly picked coefficients. This random potential actually works pretty well as shown below with the data from training skies 1, 101, and 201. The blue circles show where dark halos are located and the red circles are the estimated positions from my python script.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7Zk1JH6LhoNXdaxq-Jm6hl6NcZtlwO56JAeYL1C4ez0q1nWWt1UGLt3GbThQ_HCGV9f_ap3r1hUMcl6Dej4TrwCv1v5_5lJRnqKsGlJObL2d93EFPIpti0WwoYKjLFjqjiQIq9ert6nZA/s1600/figure_1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh7Zk1JH6LhoNXdaxq-Jm6hl6NcZtlwO56JAeYL1C4ez0q1nWWt1UGLt3GbThQ_HCGV9f_ap3r1hUMcl6Dej4TrwCv1v5_5lJRnqKsGlJObL2d93EFPIpti0WwoYKjLFjqjiQIq9ert6nZA/s400/figure_1.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Training sky 1 with one dark halo</td></tr>
</tbody></table>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIVZ_CslC8IpNb8D8kpoDdv___cili0u2ji9GWlKNgh_KEg_ikDtPyrrlG7Cwa9IIm9tj4p-0ptdgeyiS1pAZhgBI3XfdaQd9XIkmjYu9QUUjIWZUH1tgQYMPflPlDXeJRRUv36GIwrhLx/s1600/figure_101.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIVZ_CslC8IpNb8D8kpoDdv___cili0u2ji9GWlKNgh_KEg_ikDtPyrrlG7Cwa9IIm9tj4p-0ptdgeyiS1pAZhgBI3XfdaQd9XIkmjYu9QUUjIWZUH1tgQYMPflPlDXeJRRUv36GIwrhLx/s400/figure_101.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Training sky 101 with two dark halos</td></tr>
</tbody></table>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzwXXzDJqyn1sg8ptx2dNaOOrwnieC8BILiPLD11YYRAbwWOsDA4kVw85LKvnqcKeMMP434wQr3gylPlCcJ5zpqsT03KKwOfKWi6F-v449uDLyQYySSoY1xI8m2N03sSflIG0AH65Ok2qs/s1600/figure_201.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzwXXzDJqyn1sg8ptx2dNaOOrwnieC8BILiPLD11YYRAbwWOsDA4kVw85LKvnqcKeMMP434wQr3gylPlCcJ5zpqsT03KKwOfKWi6F-v449uDLyQYySSoY1xI8m2N03sSflIG0AH65Ok2qs/s400/figure_201.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Training sky 201 with three dark halos</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
To run the script I have written please download it and the competition's training data from <a href="http://bit.ly/10Jvz3a" target="_blank">here</a> and make sure you have <a href="http://www.python.org/download/releases/2.7/" target="_blank">Python 2.7</a>, <a href="http://dan.iel.fm/emcee/" target="_blank">emcee</a>, and <a href="http://www.numpy.org/" target="_blank">NumPy</a> installed. Thanks should be given to David Harvey and Thomas Kitching for creating the training data to the <a href="https://www.kaggle.com/c/DarkWorlds" target="_blank">Observing Dark Worlds Competition</a>. I encourage you to download the data and as Ms. Frizzle says, "Take chances, make mistakes, and get messy."</div>
<h3>Smelling Dark Matter (2012-12-04)</h3>I've been using the little free time I have to compete in kaggle's <a href="http://www.kaggle.com/c/DarkWorlds" target="_blank">Observing Dark Worlds</a> competition. I have to say that this machine learning competition has given me a new appreciation of what astronomers deal with in their research. It also illustrated why so many of the astronomy majors I knew in college always looked so sleep deprived - I had attributed it to the nocturnal nature of twinkles. <div>
<br /></div>
<div>
In the Observing Dark Worlds competition, the goal is to pinpoint the locations of invisible dark matter halos given only the coordinates of galaxies and their respective ellipticities. Below is a quick visual example of the difficulty of this task (my code for the visualizations can be found <a href="http://www.kaggle.com/c/DarkWorlds/forums/t/3197/code-for-plotting-the-galaxies-in-python" target="_blank">here</a>).</div>
<div>
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR59u8SX1gIbEmGMIERBO818m4KKQz39tzW8fJ3gjTmNM8mkKlxxlMNd0UKLw051YOHbqcIw_5tseW25_JbTIlAmp7Yn4KW5NLeMHGAu-BH55GQDO7LHJ3DsNIpRtvFFWp47K7kTCW65Ww/s1600/sky150_figure_3.png" imageanchor="1" style="clear: left; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiR59u8SX1gIbEmGMIERBO818m4KKQz39tzW8fJ3gjTmNM8mkKlxxlMNd0UKLw051YOHbqcIw_5tseW25_JbTIlAmp7Yn4KW5NLeMHGAu-BH55GQDO7LHJ3DsNIpRtvFFWp47K7kTCW65Ww/s320/sky150_figure_3.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">This is the visualization of the data from training sky 150.</td></tr>
</tbody></table>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHtJMsyB3cdBfB5voB-gX1H9kmEzjRq9umyJ83CH8TiQFfJkFgk_v3MhwBXA9v_sixs7mq0ab3QkmxjdCmu9mGM1GqDICdQWwpqpY3douCB5pHq0jiNDPjorBFS0ej-M_V8zhhfVgZQil1/s1600/sky150_figure_4.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHtJMsyB3cdBfB5voB-gX1H9kmEzjRq9umyJ83CH8TiQFfJkFgk_v3MhwBXA9v_sixs7mq0ab3QkmxjdCmu9mGM1GqDICdQWwpqpY3douCB5pHq0jiNDPjorBFS0ej-M_V8zhhfVgZQil1/s320/sky150_figure_4.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The two blue circles are the dark halos hiding in training sky 150.</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
I haven't made much progress in my analysis - in fact I have never been this stumped on any problem. So far I have only been able to determine the location of one dark matter halo exactly and generate a few probable locations of the other one. Below is the visualization of my progress for training sky 150 and training sky 124 from the competition. The red circles are my estimated locations and the blue circles are the true locations of the dark matter halos.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNMvjwlbF59zRLmi1TqL2JJqpoeIUhIPbNvBXtA1uFpDl96nbnnO537AO3Vw9FMyVXk9oPv87t7lJQgq1n3NtxBfET2n4NHCENYleTrDUCBVss7I_iuJqvHck-yuW6MdwjGVzT2CvB_OsR/s1600/sky150_figure_2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgNMvjwlbF59zRLmi1TqL2JJqpoeIUhIPbNvBXtA1uFpDl96nbnnO537AO3Vw9FMyVXk9oPv87t7lJQgq1n3NtxBfET2n4NHCENYleTrDUCBVss7I_iuJqvHck-yuW6MdwjGVzT2CvB_OsR/s320/sky150_figure_2.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Training Sky 150</td></tr>
</tbody></table>
<br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkZ85fsfTc0O02J0nLMJVjtSrjyCo9b77wSb3e73fR6Uos9vtCE59TAuLFCwfG3kVzCIMAtVndxJ_d2wRPJ71KRiXxUiM0eJX8qz3ispEuTBxccsKgdOeKEcXmztVYddxwbwMQGOXEqdjB/s1600/sky124_figure_1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkZ85fsfTc0O02J0nLMJVjtSrjyCo9b77wSb3e73fR6Uos9vtCE59TAuLFCwfG3kVzCIMAtVndxJ_d2wRPJ71KRiXxUiM0eJX8qz3ispEuTBxccsKgdOeKEcXmztVYddxwbwMQGOXEqdjB/s320/sky124_figure_1.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Training Sky 124</td></tr>
</tbody></table>
<div>
<br /></div>
<div>
If it is before the competition deadline and you are interested in my method or code, I am more than happy to win this competition through a team effort! For my regular readers, I will make a detailed posting after the competition deadline.</div>
<h3>What is the Probability of Getting (k) Heads in a Row for (n) Consecutive Tosses? (2012-09-04)</h3>I asked myself a fun question after reading a <a href="https://www.quantnet.com/threads/choosing-tt-vs-th-problem.10914/" target="_blank">post</a> on QuantNet. What are the odds of getting two, four, or six heads in a row after five, ten, or a hundred consecutive tosses of a fair coin? It seemed like a fun high-school-level math problem, and with some quick python I was able to generate a pretty graph to answer it.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgj8Kxd_WgGkPGUTvz_PBc4O-G4HlCy6HgFf2XTT39zVgu4jcCbKE6omv6GRUSYg51SfoJpZjxsKI3dVemEKMLQG9BoTFTbiRRsLtSfx9tyBnr5OjfPoaBHFp2i96HxQ_XirX5LRaJ874_/s1600/OddsofHeadsinaRow.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgj8Kxd_WgGkPGUTvz_PBc4O-G4HlCy6HgFf2XTT39zVgu4jcCbKE6omv6GRUSYg51SfoJpZjxsKI3dVemEKMLQG9BoTFTbiRRsLtSfx9tyBnr5OjfPoaBHFp2i96HxQ_XirX5LRaJ874_/s400/OddsofHeadsinaRow.png" width="400" /></a></div>
<br />
<a name='more'></a>My approach to this problem is actually quite brutish because of the tricky counting. You cannot assume that the probability at each step in the sequence is identical. Even though the inherent probability of the fair coin is still 0.5, the outcomes at each toss are not independent of the past: once a run of heads has been completed, that sequence is removed from further consideration. Here is a quick demonstration of counting two heads in a row out of five tosses to illustrate this point.<br />
<table>
<tbody>
<tr>
<td></td>
<th colspan="2"><br />
<h3>
Possible Outcomes</h3>
</th>
</tr>
<tr>
<td valign="top">Toss 1</td>
<td align="center">H</td>
<td align="center">T</td>
</tr>
<tr>
<td valign="top">Toss 2</td>
<td align="center"><span style="background-color: yellow;">HH</span><br />
TH</td>
<td align="center">HT<br />
TT</td>
</tr>
<tr>
<td valign="top">Toss 3</td>
<td align="center"><del>HHH</del><br />
HTH<br />
<span style="background-color: yellow;">THH</span><br />
TTH</td>
<td align="center"><del>HHT</del><br />
HTT<br />
THT<br />
TTT
</td>
</tr>
<tr>
<td valign="top">Toss 4</td>
<td align="center"><del>HHHH</del><br />
<del>HHTH</del><br />
<span style="background-color: yellow;">HTHH</span><br />
HTTH<br />
<del>THHH</del><br />
THTH<br />
<span style="background-color: yellow;">TTHH</span><br />
TTTH
</td>
<td align="center"><del>HHHT</del><br />
<del>HHTT</del><br />
HTHT<br />
HTTT<br />
<del>THHT</del><br />
THTT<br />
TTHT<br />
TTTT
</td>
</tr>
<tr>
<td valign="top">Toss 5</td>
<td align="center"><del>HHHHH</del><br />
<del>HHHTH</del><br />
<del>HTHHH</del><br />
<del>HHHTT</del><br />
<del>HHTTH</del><br />
HTHTH<br />
<del>THHHT</del><br />
<span style="background-color: yellow;">THTHH</span><br />
<del>HHTTT</del><br />
HTTHT<br />
<del>THHTT</del><br />
THTTH<br />
TTHTH<br />
HTTTT<br />
TTHTT<br />
TTTTH
</td>
<td align="center"><del>HHHHT</del><br />
<del>HHTHH</del><br />
<del>THHHH</del><br />
<del>HHTHT</del><br />
<del>HTHHT</del><br />
<span style="background-color: yellow;">HTTHH</span><br />
<del>THHTH</del><br />
<del>TTHHH</del><br />
HTHTT<br />
HTTTH<br />
THTHT<br />
<del>TTHHT</del><br />
<span style="background-color: yellow;">TTTHH</span><br />
THTTT<br />
TTTHT<br />
TTTTT
</td>
</tr>
</tbody></table>
<br />
The strikethroughs are the outcomes that have been eliminated because they previously yielded two heads in a row. The highlighted sequences are the outcomes that successfully produce two consecutive heads on that toss. Using the example count for a sequence of two heads out of three tosses: on the third toss there is one possibility of producing a sequence with two consecutive heads (instead of three) and there are six possible outcomes (instead of eight). The difference derives from the first successful sequencing of two heads in a row on the second toss.<br />
<br />
This counting was performed for the sequences of three and four heads in a row. The results are summarized in the tables below, along with the sequence of two heads in a row.<br />
<br />
<table align="center" border="1">
<caption> Two Heads in a Row</caption>
<tbody>
<tr>
<th scope="col"></th>
<th scope="col">Toss 1 </th>
<th scope="col">Toss 2 </th>
<th scope="col">Toss 3 </th>
<th scope="col">Toss 4 </th>
<th scope="col">Toss 5 </th>
<th scope="col">Toss 6 </th>
</tr>
<tr>
<th scope="row">Successes </th>
<td align="center">0 </td>
<td align="center">1 </td>
<td align="center">1 </td>
<td align="center">2 </td>
<td align="center">3 </td>
<td align="center">5 </td>
</tr>
<tr>
<th scope="row">Failures </th>
<td align="center">2 </td>
<td align="center">3 </td>
<td align="center">5 </td>
<td align="center">8 </td>
<td align="center">13 </td>
<td align="center">21 </td>
</tr>
<tr>
<th scope="row">Outcomes </th>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">6 </td>
<td align="center">10 </td>
<td align="center">16 </td>
<td align="center">26 </td>
</tr>
</tbody></table>
<br />
<table align="center" border="1">
<caption> Three Heads in a Row</caption>
<tbody>
<tr>
<th scope="col"></th>
<th scope="col">Toss 1 </th>
<th scope="col">Toss 2 </th>
<th scope="col">Toss 3 </th>
<th scope="col">Toss 4 </th>
<th scope="col">Toss 5 </th>
<th scope="col">Toss 6 </th>
<th scope="col">Toss 7 </th>
</tr>
<tr>
<th scope="row">Successes </th>
<td align="center">0 </td>
<td align="center">0 </td>
<td align="center">1 </td>
<td align="center">1 </td>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">7 </td>
</tr>
<tr>
<th scope="row">Failures </th>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">7 </td>
<td align="center">13 </td>
<td align="center">24 </td>
<td align="center">44 </td>
<td align="center">81 </td>
</tr>
<tr>
<th scope="row">Outcomes </th>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">8 </td>
<td align="center">14 </td>
<td align="center">26 </td>
<td align="center">48 </td>
<td align="center">88 </td>
</tr>
</tbody></table>
<br />
<table align="center" border="1">
<caption> Four Heads in a Row</caption>
<tbody>
<tr>
<th scope="col"></th>
<th scope="col">Toss 1 </th>
<th scope="col">Toss 2 </th>
<th scope="col">Toss 3 </th>
<th scope="col">Toss 4 </th>
<th scope="col">Toss 5 </th>
<th scope="col">Toss 6 </th>
<th scope="col">Toss 7 </th>
<th scope="col">Toss 8 </th>
</tr>
<tr>
<th scope="row">Successes </th>
<td align="center">0 </td>
<td align="center">0 </td>
<td align="center">0 </td>
<td align="center">1 </td>
<td align="center">1 </td>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">8 </td>
</tr>
<tr>
<th scope="row">Failures </th>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">8 </td>
<td align="center">15 </td>
<td align="center">29 </td>
<td align="center">56 </td>
<td align="center">108 </td>
<td align="center">208 </td>
</tr>
<tr>
<th scope="row">Outcomes </th>
<td align="center">2 </td>
<td align="center">4 </td>
<td align="center">8 </td>
<td align="center">16 </td>
<td align="center">30 </td>
<td align="center">58 </td>
<td align="center">112 </td>
<td align="center">216 </td>
</tr>
</tbody></table>
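The counts in these tables can be double-checked by brute force. The short sketch below (the helper names are my own) enumerates every length-t sequence of H and T and counts the ones whose first run of k heads is completed exactly on toss t, reproducing the Successes rows above:

```python
from itertools import product

# toss index (1-based) at which the first run of k heads completes, or None
def first_success_toss(seq, k):
    run = 0
    for i, s in enumerate(seq, 1):
        run = run + 1 if s == 'H' else 0
        if run == k:
            return i
    return None

# for t = 1..n, count length-t sequences whose first k-heads run ends on toss t
def successes(n, k):
    return [sum(1 for seq in product('HT', repeat=t)
                if first_success_toss(seq, k) == t)
            for t in range(1, n + 1)]

print(successes(6, 2))  # [0, 1, 1, 2, 3, 5] - the Fibonacci numbers
print(successes(7, 3))  # [0, 0, 1, 1, 2, 4, 7] - the tribonacci numbers
```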
<br />
Here is the elegance that I miss about high school mathematics. You have probably already noticed that the successes, the failures, and the outcomes are expressible as <a href="http://en.wikipedia.org/wiki/Generalizations_of_Fibonacci_numbers" target="_blank">generalized Fibonacci numbers</a>. The successes for the sequence involving two heads in a row are the well-known Fibonacci numbers. The successes for the sequence involving three heads in a row are the lesser known tribonacci numbers. The successes for the sequence involving four heads in a row are the almost unknown tetranacci numbers. These higher-order sequences, with k as the number of heads in a row, can be expressed recursively as:<br />
<br />
\[Fib_{k}(n)=\sum_{i=1}^{k} Fib_{k}(n-i)\]<br />
Then to calculate the probability of success for k heads in a row at the n<i><span style="font-size: x-small;">th</span></i> toss would be:<br />
<br />
\[probability_n=1-\frac{Fib_{k}(n+k)}{2 \times Fib_{k}(n-1+k)}\]<br />
To calculate the total probability of seeing k heads in a row after n tosses would be:<br />
<br />
\[probability=1-\prod_{i=1}^{n}\frac{Fib_{k}(i+k)}{2 \times Fib_{k}(i-1+k)}\]<br />
This result is readily translated into python to produce the pretty graph at the beginning of the post. The code is presented below:<br />
<syntaxhighlighter class="brush: python;">
from decimal import Decimal
import numpy as np
import matplotlib.pyplot as plt

# n-th term of the k-step generalized fibonacci sequence
def Fib(n, k):
    if n < k-1:
        return 0
    elif n == k-1:
        return 1
    else:
        f = [0] * (n+1)
        f[k-1] = 1
        for i in xrange(k, n+1):
            f[i] = sum(f[i-j] for j in xrange(1, k+1))
        return f[n]

# fraction of surviving outcomes at toss n that still lack k heads in a row
def ProbHnFail(n, k):
    b = 2*Fib(n-1+k, k)   # number of outcomes
    if n < k:
        a = b             # no run of k heads is possible yet
    else:
        a = Fib(n+k, k)   # number of failures
    return Decimal(a)/Decimal(b)

# calculate the cumulative probabilities for k = 2, 4, 6
Tosses = 100
t1 = np.arange(1, Tosses+1, 1)
a = 1 - np.cumprod([ProbHnFail(x, 2) for x in t1])
b = 1 - np.cumprod([ProbHnFail(x, 4) for x in t1])
c = 1 - np.cumprod([ProbHnFail(x, 6) for x in t1])

# plot the results
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(t1, a, t1, b, t1, c)
ax.set_ylim([0, 1.01])
ax.grid(True)
ax.legend(('HH', 'HHHH', 'HHHHHH'), loc='best')
ax.set_xlabel('Number of Consecutive Tosses')
ax.set_ylabel('Probability of Appearance')
ax.set_title('Probability after n-Tosses')
plt.show()
</syntaxhighlighter>
<br />
Happy flipping!<br />
<br />
<h3>LaTeX Template for Lecture or Class Notes (2012-08-30)</h3>A friend of mine asked me to create a LaTeX template for lecture notes. He wanted to learn LaTeX by typesetting all of his class notes. I think I created a pretty nice template and wanted to share it with everyone.<br />
<br />
<iframe src="http://docs.google.com/viewer?url=https%3A%2F%2Fdl.dropbox.com%2Fu%2F5311161%2FBlogSource%2Fpdfs%2FLecture_Notes-example.pdf&embedded=true" width="600" height="780" style="border: none;"></iframe><br />
<br />
You can download the template from <a href="http://bit.ly/N1Pp3k" target="_blank">here</a>! Happy TeXing!<br />
<br />
<h3>Syntax Highlighting in LaTeX with Minted (2012-08-28)</h3>Here is a quick tip for those of you who use MacPorts and want pretty syntax highlighting in your LaTeX documents. Recently I discovered a great looking package called <a href="http://www.ctan.org/pkg/minted" target="_blank">minted</a>, which depends on an external python syntax highlighter called <a href="http://pygments.org/" target="_blank">Pygments</a>. I couldn't find any decent tutorials online for a quick and easy setup, so I decided to write one. I hope it will be helpful for someone.<br />
<br />
To my readers who didn't come upon this article through google, the minted package is a better alternative to the listings package. Minted allows you to present code in techno-color! This means the programs you have worked hard on, like:<br />
<syntaxhighlighter class="brush: c">
int main() {
printf("hello, world");
return 0;
}
</syntaxhighlighter>
will no longer be marginalized in black and white:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO82SDhXAXHDqaG1NS0s_Dc0JrQobmhp8IO18uSv0tLzzEB51KJyuki42LST3CO_wGxrYKkjV1_-N6bS-8pDTm93ob6wXSd28biUq8h7tqX9CE-JIUXrInXqMJO8ePciIm1Cg1BdHd0nUd/s1600/minimal-listings.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="67" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgO82SDhXAXHDqaG1NS0s_Dc0JrQobmhp8IO18uSv0tLzzEB51KJyuki42LST3CO_wGxrYKkjV1_-N6bS-8pDTm93ob6wXSd28biUq8h7tqX9CE-JIUXrInXqMJO8ePciIm1Cg1BdHd0nUd/s200/minimal-listings.png" width="200" /></a></div>
<br />
With minted your code will finally be noticed in the appendix of that report...<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuXxsfg7g8Fy4aCIRJO1zsnLS4pKfb94Y5Hhz1mI0YzrQXPjJf0d5Si_3LzkFvfT6wmyBP5Bh3wqRG5HiCh809SocxV2QQVtLLJBF3U1kPOnnsIk9Vmd_PBuHOuo5E1tLqGHOHKwKCMNmM/s1600/minimal.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="71" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuXxsfg7g8Fy4aCIRJO1zsnLS4pKfb94Y5Hhz1mI0YzrQXPjJf0d5Si_3LzkFvfT6wmyBP5Bh3wqRG5HiCh809SocxV2QQVtLLJBF3U1kPOnnsIk9Vmd_PBuHOuo5E1tLqGHOHKwKCMNmM/s200/minimal.png" width="200" /></a></div>
<br />
<span style="font-size: xx-small;">Disclaimer: Minted can do many things, but it will not make bad programs work. Though it will make them look like they could work.</span>
<br />
<a name='more'></a><br />
<h3>
The Setup</h3>
1. Make sure you have <a href="http://www.macports.org/" target="_blank">MacPorts</a> and <a href="http://tug.org/mactex/" target="_blank">LaTeX</a> installed.<br />
<br />
2. Install Python. If you don't have it installed already, here is an example of how you would install Python 2.7 with MacPorts in the terminal:<br />
<syntaxhighlighter class="brush: shell">
sudo port install python27 </syntaxhighlighter><br />
3. Install the proper Pygments package. For consistency, I'm going to show how to do this with the version of Python installed in step 2. In the terminal type the following:<br />
<syntaxhighlighter class="brush: shell">
sudo port install py27-pygments </syntaxhighlighter><br />
4. Finally, create a soft link from the Pygments package to your /usr/local/bin folder and the setup is complete.<br />
<syntaxhighlighter class="brush: shell">
sudo ln -s /opt/local/Library/Frameworks/Python.framework/Versions/2.7/bin/pygmentize /usr/local/bin/pygmentize </syntaxhighlighter><br />
<h3>
The Usage</h3>
Let's run some examples! One of the basic examples provided in the <a href="http://mirror.jmu.edu/pub/CTAN/macros/latex/contrib/minted/minted.pdf" target="_blank">minted documentation</a> is called minimal.tex. If you are feeling lazy, you can download it from <a href="http://bit.ly/PYJr2P" target="_blank">here</a>. The TeX file contains the first program everyone writes in C:<br />
<syntaxhighlighter class="brush: latex">
\documentclass{article}
\usepackage{minted}
\begin{document}
\begin{minted}[linenos]{c}
int main() {
    printf("hello, world");
    return 0;
}
\end{minted}
\end{document}
</syntaxhighlighter>
<br />
The minted documentation assumes that you are compiling minimal.tex in the terminal. If this is true, to compile minimal.tex you need only to send this command:<br />
<syntaxhighlighter class="brush: shell">
pdflatex -shell-escape minimal
</syntaxhighlighter>
I usually like to use <a href="http://pages.uoregon.edu/koch/texshop/" target="_blank">TeXShop</a> for my LaTeX needs. For us TeXShoppers, some tinkering is needed to mimic the "-shell-escape" flag required to compile documents that use the minted package. Go to the TeXShop Preferences, choose the Engine tab, and type "-shell-escape" under the pdfTeX/Latex section.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDGyfCK_-lyAs5SRKUOQUaK9rLohmdrAS5k2H82yDN06vK3__SXdnVA_RWqwcTGkX0KLClYfT6YmVbGMhmwQTS9s19hv9A6V4LPGt9jpsUc0BKia8EN9YPoZWMM2z_A13Sf7s4gA05d-BG/s1600/Screen.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDGyfCK_-lyAs5SRKUOQUaK9rLohmdrAS5k2H82yDN06vK3__SXdnVA_RWqwcTGkX0KLClYfT6YmVbGMhmwQTS9s19hv9A6V4LPGt9jpsUc0BKia8EN9YPoZWMM2z_A13Sf7s4gA05d-BG/s400/Screen.png" width="319" /></a></div>
<br />
Now typeset it! Both methods should result in a PDF containing:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuXxsfg7g8Fy4aCIRJO1zsnLS4pKfb94Y5Hhz1mI0YzrQXPjJf0d5Si_3LzkFvfT6wmyBP5Bh3wqRG5HiCh809SocxV2QQVtLLJBF3U1kPOnnsIk9Vmd_PBuHOuo5E1tLqGHOHKwKCMNmM/s1600/minimal.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="115" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuXxsfg7g8Fy4aCIRJO1zsnLS4pKfb94Y5Hhz1mI0YzrQXPjJf0d5Si_3LzkFvfT6wmyBP5Bh3wqRG5HiCh809SocxV2QQVtLLJBF3U1kPOnnsIk9Vmd_PBuHOuo5E1tLqGHOHKwKCMNmM/s320/minimal.png" width="320" /></a></div>
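If you would rather not paste source code into the TeX file itself, minted can also typeset an external file directly via \inputminted. A minimal sketch (script.py is a placeholder for whatever file you want to include):

```latex
\documentclass{article}
\usepackage{minted}
\begin{document}
% Typeset an external file; the arguments are the lexer and the file name.
\inputminted[linenos]{python}{script.py}
\end{document}
```

As with minimal.tex above, this still needs to be compiled with the -shell-escape flag.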
<br />Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com4tag:blogger.com,1999:blog-6855656749413817524.post-1743962022671293432012-08-22T15:12:00.001-04:002012-08-22T15:12:33.715-04:00How to Count the Number of Wikipedia EditsThis week I finally found some time to catch up on The Colbert Report and I discovered a mischievous little gem. Stephen Colbert and his staff brought up an old Washington Post article, "<a href="http://www.washingtonpost.com/wp-dyn/content/article/2008/08/29/AR2008082902691.html" target="_blank">Wikipedia Edits Forecast Vice Presidential Picks</a>" and subtly suggested that editing is like voting - only with no photo id checks. From five minutes of Googling, I found a torrent of news articles with litanies of what I summarized in the previous sentence. Then I discovered <a href="http://techpresident.com/news/22680/how-spot-romneys-vice-president-pick-advance?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed:+techpres+%28techPresident%29" target="_blank">this article</a> that somewhat substantiates the claim with an ugly excel chart, below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://techpresident.com/files/Screen%20shot%202012-08-06%20at%2012.40.52%20PM_0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="220" src="http://techpresident.com/files/Screen%20shot%202012-08-06%20at%2012.40.52%20PM_0.png" width="320" /></a></div>
<br />
To remind those who have forgotten, Mitt Romney announced Paul Ryan as his running mate on August 11, 2012.<br />
<br />
In all the articles I read on this claim, I was surprised that almost no news source visualized the data or told readers how to gather the data. I've previously <a href="http://nighttimecuriosities.blogspot.com/2012/04/can-i-trust-wikipedia.html" target="_blank">posted</a> on some fun analytics with Wikipedia using Mathematica. However, with my recent clean installation of Mountain Lion, it has become clear to me that not everyone has access to Mathematica. In this post, I will show how to count the number of revisions/edits on a Wikipedia article using delightfully free python.<br />
<br />
The python script I have written is quite easy to understand and depends on the external Numpy library. The script starts out by requesting raw revision data for a given article. There are two ways to get the revision history of a Wikipedia article. One is to scrape the HTML, which can take some effort. My code uses the second: simply calling the <a href="http://www.mediawiki.org/wiki/MediaWiki" target="_blank">MediaWiki</a> API. The revision data requested through the API is returned as XML, which is read into python and parsed down to just the date of each revision. The dates are then tallied by a small tallying function I have written and sorted chronologically. Finally, the script returns the revision data as a properly formatted array, ready to be plotted. The commented python code is presented below:
<syntaxhighlighter class="brush: python;">
import urllib2
import xml.etree.ElementTree as ET
import dateutil.parser
from collections import Counter
import numpy as np

def tally(datalist):
    lst = Counter(datalist)
    a = []
    for i,j in lst.items():
        a.append([i,j])
    return a

def revisions(articlename):
    articlename = articlename.replace(' ','_') # format article title for url
    url = 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=%s&' % articlename + \
          'rvprop=user|timestamp&rvlimit=max&redirects$rvuser&format=xml' # data limit is set to max for 500 edits
    req = urllib2.urlopen(url)
    data = req.read(); req.close() # reads url response
    root = ET.fromstring(data) # reads xml data
    group = root.find(".//revisions")
    results = []
    ## gets revision times from xml data
    for elem in group.getiterator('rev'):
        timestamp = elem.get('timestamp')
        timestamp = dateutil.parser.parse(timestamp).date() # parses timestamp and returns only date
        results.append(timestamp)
    a = tally(results) # tallies by date
    datetally = np.array(sorted(a, key=lambda x: x[0])) # sorts tally by date
    return datetally
</syntaxhighlighter>
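Before pointing the script at the live API, the tally-and-sort step can be sanity-checked on its own. A quick sketch with hand-made dates (the dates below are invented for illustration):

```python
from collections import Counter
from datetime import date

def tally(datalist):
    # count occurrences of each date, as in the script above
    return [[d, c] for d, c in Counter(datalist).items()]

# three revisions spread over two days
dates = [date(2012, 8, 11), date(2012, 8, 11), date(2012, 8, 10)]
datetally = sorted(tally(dates), key=lambda x: x[0])
print(datetally)  # [[datetime.date(2012, 8, 10), 1], [datetime.date(2012, 8, 11), 2]]
```

The full script then wraps this result in np.array so the date and count columns can be sliced out for plotting.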
Here is a quick example of how to plot with the array that is returned. I chose to use Tim Pawlenty and Marco Rubio to show the limitations of the MediaWiki API. I am also biased towards Pawlenty because of his <a href="http://www.youtube.com/watch?v=YfkNEq1XioE" target="_blank">amazing ads</a> during the GOP primaries. There are Wikipedia pages that have low daily revision activity for long stretches of time and pages with high amounts of revisions in very short periods. The MediaWiki API will return only the previous 500 revisions on any article unless you have a super user status.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9mw9ICNi8yD7wVamDqMtPTSKrr6KGvhT9gqwM49PU6NY2P3Z2ReK3pK3fqVzB1cVc7M1wUXtMTsZgtZk2hgsGIIgyeYbv52uIBuyYQ7huBfzqGyZxYWrOEQzId0J9ga866_YuKg2DLO-B/s1600/wikipediaedits.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9mw9ICNi8yD7wVamDqMtPTSKrr6KGvhT9gqwM49PU6NY2P3Z2ReK3pK3fqVzB1cVc7M1wUXtMTsZgtZk2hgsGIIgyeYbv52uIBuyYQ7huBfzqGyZxYWrOEQzId0J9ga866_YuKg2DLO-B/s400/wikipediaedits.png" width="400" /></a></div>
<br />
<syntaxhighlighter class="brush: python;">
from matplotlib import pyplot as plt
a = revisions('Tim Pawlenty')
b = revisions('Marco Rubio')
fig = plt.figure()
graph = fig.add_subplot(111)
graph.plot(a[:,0], a[:,1], 'r', b[:,0], b[:,1], 'b')
fig.autofmt_xdate()
plt.legend(('Tim Pawlenty', 'Marco Rubio'), loc='upper left')
plt.title('Number of Wikipedia Edits')
plt.show()
</syntaxhighlighter>
My warning for this script: please use some common sense when interpreting the results. You can download the source <a href="http://bit.ly/R0Rd9c" target="_blank">here</a>.Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com1tag:blogger.com,1999:blog-6855656749413817524.post-24858148390608092592012-08-15T13:30:00.001-04:002012-09-11T13:22:48.284-04:00Historical Intraday Stock Price Data with PythonBy popular request, this post will present how to acquire intraday stock data from google finance using python. The general structure of the code is pretty simple to understand. First a request url is created and sent. The response is then read by python to create an array or matrix of the financial data and a vector of time data. This array is created with the help of the popular Numpy package that can be downloaded from <a href="http://numpy.scipy.org/" target="_blank">here</a>. Then, in one if-statement, the time data is restructured into a proper unix time format and translated to a more familiar date string for each financial data point. The translated time vector is then joined with the financial array to produce a single, easy-to-work-with financial time series array. Note that although Numpy has been ported to Python 3, the urllib2 calls below are specific to Python 2.X, so the code targets Python 2.X. Here it is:
<syntaxhighlighter class="brush: python;">
import urllib2
import urllib
import numpy as np
from datetime import datetime

urldata = {}
urldata['q'] = ticker = 'JPM' # stock symbol
urldata['x'] = 'NYSE' # exchange symbol
urldata['i'] = '60' # interval
urldata['p'] = '1d' # number of past trading days (max has been 15d)
urldata['f'] = 'd,o,h,l,c,v' # requested data: d is time, o is open, c is closing, h is high, l is low, v is volume

url_values = urllib.urlencode(urldata)
url = 'http://www.google.com/finance/getprices'
full_url = url + '?' + url_values
req = urllib2.Request(full_url)
response = urllib2.urlopen(req).readlines()
getdata = response
del getdata[0:7]
numberoflines = len(getdata)
returnMat = np.zeros((numberoflines, 5))
timeVector = []
index = 0
for line in getdata:
    line = line.strip('a')
    listFromLine = line.split(',')
    returnMat[index,:] = listFromLine[1:6]
    timeVector.append(int(listFromLine[0]))
    index += 1
# convert Unix or epoch time to something more familiar
for x in timeVector:
    if x > 500:
        z = x
        timeVector[timeVector.index(x)] = datetime.fromtimestamp(x)
    else:
        y = z + x*60 # multiply by interval
        timeVector[timeVector.index(x)] = datetime.fromtimestamp(y)
tdata = np.array(timeVector)
time = tdata.reshape((len(tdata),1))
intradata = np.concatenate((time, returnMat), axis=1) # array of all data with the properly formatted times
</syntaxhighlighter>
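The second loop above is the subtle part of google's response format: a row whose timestamp starts with 'a' carries a full unix timestamp, and the rows that follow carry offsets counted in intervals from that anchor. Here is a self-contained sketch of the same decoding rule, keying off the 'a' prefix rather than the magnitude test used above (the epoch value is made up for illustration):

```python
from datetime import datetime

INTERVAL = 60  # seconds per bar, matching the 'i' request parameter

def decode_times(raw_times, interval=INTERVAL):
    """'a<epoch>' rows anchor a block; bare integers are offsets from it."""
    decoded, anchor = [], None
    for token in raw_times:
        if token.startswith('a'):
            anchor = int(token[1:])  # full unix timestamp
            decoded.append(datetime.fromtimestamp(anchor))
        else:
            # bare rows count intervals forward from the last anchor
            decoded.append(datetime.fromtimestamp(anchor + int(token) * interval))
    return decoded

times = decode_times(['a1344869400', '1', '2'])
print(times[1] - times[0])  # 0:01:00
```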
<div>
<a name='more'></a>Here is a quick example of what can be done with this data in Python 2.7 using the <a href="http://numpy.scipy.org/" target="_blank">Numpy</a> and <a href="http://matplotlib.sourceforge.net/" target="_blank">matplotlib</a> libraries. It is nothing fancy, just the usual moving average computations and some styling preferences.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrEEgz8vrE7gaaU5CCd50apajIX7-zQpDnqBvL37V3VV94RmJNIFEqJKi1JblP62Z4KyhfxNtgpvOOz3BXsoFiH-4mgX1_p_x5w71obeEKZGuN93h9Y1R7bWUDgBXG3gXyFaZCoFnYYPVY/s1600/jpmquotes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="480" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrEEgz8vrE7gaaU5CCd50apajIX7-zQpDnqBvL37V3VV94RmJNIFEqJKi1JblP62Z4KyhfxNtgpvOOz3BXsoFiH-4mgX1_p_x5w71obeEKZGuN93h9Y1R7bWUDgBXG3gXyFaZCoFnYYPVY/s640/jpmquotes.png" width="640" /></a></div>
<div>
<br /></div>
The python code that was used to create the plot above is posted below:
<syntaxhighlighter class="brush: python;">
### Example on how to use and plot this data
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import numpy as np
def relative_strength(prices, n=14):
    deltas = np.diff(prices)
    seed = deltas[:n+1]
    up = seed[seed>=0].sum()/n
    down = -seed[seed<0].sum()/n
    rs = up/down
    rsi = np.zeros_like(prices)
    rsi[:n] = 100. - 100./(1.+rs)
    for i in range(n, len(prices)):
        delta = deltas[i-1] # cause the diff is 1 shorter
        if delta>0:
            upval = delta
            downval = 0.
        else:
            upval = 0.
            downval = -delta
        up = (up*(n-1) + upval)/n
        down = (down*(n-1) + downval)/n
        rs = up/down
        rsi[i] = 100. - 100./(1.+rs)
    return rsi
def moving_average(p, n, type='simple'):
    """
    compute an n period moving average.
    type is 'simple' | 'exponential'
    """
    p = np.asarray(p)
    if type=='simple':
        weights = np.ones(n)
    else:
        weights = np.exp(np.linspace(-1., 0., n))
    weights /= weights.sum()
    a = np.convolve(p, weights, mode='full')[:len(p)]
    a[:n] = a[n]
    return a
def moving_average_convergence(p, nslow=26, nfast=12):
    """
    compute the MACD (Moving Average Convergence/Divergence) using a fast and slow exponential moving avg
    return value is emaslow, emafast, macd which are len(p) arrays
    """
    emaslow = moving_average(p, nslow, type='exponential')
    emafast = moving_average(p, nfast, type='exponential')
    return emaslow, emafast, emafast - emaslow
plt.rc('axes', grid=True)
plt.rc('grid', color='0.75', linestyle='-', linewidth=0.5)
textsize = 9
left, width = 0.1, 0.8
rect1 = [left, 0.7, width, 0.2]
rect2 = [left, 0.3, width, 0.4]
rect3 = [left, 0.1, width, 0.2]
fig = plt.figure(facecolor='white')
axescolor = '#f6f6f6' # the axes background color
ax1 = fig.add_axes(rect1, axisbg=axescolor) #left, bottom, width, height
ax2 = fig.add_axes(rect2, axisbg=axescolor, sharex=ax1)
ax2t = ax2.twinx()
ax3 = fig.add_axes(rect3, axisbg=axescolor, sharex=ax1)
### plot the relative strength indicator
prices = intradata[:,4]
rsi = relative_strength(prices)
t = intradata[:,0]
fillcolor = 'darkgoldenrod'
ax1.plot(t, rsi, color=fillcolor)
ax1.axhline(70, color=fillcolor)
ax1.axhline(30, color=fillcolor)
ax1.fill_between(t, rsi, 70, where=(rsi>=70), facecolor=fillcolor, edgecolor=fillcolor)
ax1.fill_between(t, rsi, 30, where=(rsi<=30), facecolor=fillcolor, edgecolor=fillcolor)
ax1.text(0.6, 0.9, '>70 = overbought', va='top', transform=ax1.transAxes, fontsize=textsize)
ax1.text(0.6, 0.1, '<30 = oversold', transform=ax1.transAxes, fontsize=textsize)
ax1.set_ylim(0, 100)
ax1.set_yticks([30,70])
ax1.text(0.025, 0.95, 'RSI (14)', va='top', transform=ax1.transAxes, fontsize=textsize)
ax1.set_title('%s daily'%ticker)
### plot the price and volume data
low = intradata[:,3]
high = intradata[:,2]
deltas = np.zeros_like(prices)
deltas[1:] = np.diff(prices)
up = deltas>0
ax2.vlines(t[up], low[up], high[up], color='black', label='_nolegend_')
ax2.vlines(t[~up], low[~up], high[~up], color='black', label='_nolegend_')
ma5 = moving_average(prices, 5, type='simple')
ma50 = moving_average(prices, 50, type='simple')
linema5, = ax2.plot(t, ma5, color='blue', lw=2, label='MA (5)')
linema50, = ax2.plot(t, ma50, color='red', lw=2, label='MA (50)')
props = font_manager.FontProperties(size=10)
leg = ax2.legend(loc='center left', shadow=True, fancybox=True, prop=props)
leg.get_frame().set_alpha(0.5)
volume = (intradata[:,4]*intradata[:,5])/1e6 # dollar volume in millions
vmax = volume.max()
poly = ax2t.fill_between(t, volume, 0, label='Volume', facecolor=fillcolor, edgecolor=fillcolor)
ax2t.set_ylim(0, 5*vmax)
ax2t.set_yticks([])
### compute the MACD indicator
fillcolor = 'darkslategrey'
nslow = 26
nfast = 12
nema = 9
emaslow, emafast, macd = moving_average_convergence(prices, nslow=nslow, nfast=nfast)
ema9 = moving_average(macd, nema, type='exponential')
ax3.plot(t, macd, color='black', lw=2)
ax3.plot(t, ema9, color='blue', lw=1)
ax3.fill_between(t, macd-ema9, 0, alpha=0.5, facecolor=fillcolor, edgecolor=fillcolor)
ax3.text(0.025, 0.95, 'MACD (%d, %d, %d)'%(nfast, nslow, nema), va='top',
         transform=ax3.transAxes, fontsize=textsize)
# turn off upper axis tick labels, rotate the lower ones, etc
for ax in ax1, ax2, ax2t, ax3:
    if ax!=ax3:
        for label in ax.get_xticklabels():
            label.set_visible(False)
    else:
        for label in ax.get_xticklabels():
            label.set_rotation(30)
            label.set_horizontalalignment('right')
    ax.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
class MyLocator(mticker.MaxNLocator):
    def __init__(self, *args, **kwargs):
        mticker.MaxNLocator.__init__(self, *args, **kwargs)
    def __call__(self, *args, **kwargs):
        return mticker.MaxNLocator.__call__(self, *args, **kwargs)
# at most 5 ticks, pruning the upper and lower so they don't overlap
# with other ticks
ax2.yaxis.set_major_locator(MyLocator(5, prune='both'))
ax3.yaxis.set_major_locator(MyLocator(5, prune='both'))
plt.show()
</syntaxhighlighter>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com9tag:blogger.com,1999:blog-6855656749413817524.post-8747397773150810692012-08-11T20:24:00.000-04:002012-08-15T13:35:13.974-04:00The Walking Dead ModelToday, I finally discovered a useful application of my chemical engineering degree: initiating engaging philosophical and mathematical conversations about zombies with grumpy old men.<br />
<br />
As I sat at my terminal watching the first episode of <a href="http://www.amazon.com/gp/product/B005CA4SQK/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=B005CA4SQK&linkCode=as2&tag=nighttimecuri-20">The Walking Dead</a><img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=nighttimecuri-20&l=as2&o=1&a=B005CA4SQK" style="border: none !important; margin: 0px !important;" width="1" />, a raspy voice next to me muttered, "That would never happen."<br />
<br />
I replied with the usual, "Yes, I know zombies don't exist." Usually that avoids the unwanted confrontation and allows for a change of subject. It is a formulaic reply that Dale Carnegie would use - if he was a zombie enthusiast. Then he took me by surprise when he explained that the population of a small town would not be completely zombified or dead.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://img.photobucket.com/albums/v698/Leuthen2/WalkingDead/WalkingDeadLocations/TWDLScreencaps/Hospital001.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="225" src="http://img.photobucket.com/albums/v698/Leuthen2/WalkingDead/WalkingDeadLocations/TWDLScreencaps/Hospital001.jpg" width="400" /></a></div>
<br />
In all honesty, I have wondered what I would do in certain zombie apocalypse scenarios. But I have never thought about the larger picture like the population of a town. With a few Auntie Anne's napkins and 45 minutes before boarding we began outlining a population model using the zombie rules from the universe of The Walking Dead.<br />
<a name='more'></a><br />
In The Walking Dead, every person possesses some type of dormant infection that will turn people who die of natural causes into zombies. The bites from these zombies will turn the living into zombies after a period of time. Zombies can be annihilated by the destruction of the reptilian brain. For those readers who are chemical engineers, you should notice that these rules are mathematically very similar to a time dependent batch reaction. The correct model is a simple mass balance, since bodies are conserved.<br />
<br />
The balance for those who are living includes births, non-zombie related deaths, and those who are bitten from zombie related encounters, respectively:<br />
<br />
\[\frac{dL}{dt}=B-nL-xLZ\]<br />
<br />
The balance for those who are actively infected includes those who are bitten from zombie related encounters, those who turn into zombies, and the ones that die from non-zombie related causes, respectively:<br />
<br />
\[\frac{dI}{dt}=xLZ-pI-nI\]<br />
<br />
The balance for zombies contains those who turn into zombies from bites, those who are resurrected from non-zombie related deaths, and those who are killed by an encounter with the living, respectively:<br />
<br />
\[\frac{dZ}{dt}=pI+rT-aLZ\]
<br />
The final balance is a "kill" count that includes the living who died of non-zombie related causes, the actively infected who die of 'natural' causes, zombies that have been destroyed, and the dead who have been resurrected into zombies, respectively:<br />
<br />
\[\frac{dT}{dt}=nL+nI+aLZ-rT\]<br />
<br />
Here is a summary of the notation I used in my model:<br />
<br />
<ul>
<li>L: the number of living</li>
<li>I: the number of bitten</li>
<li>Z: the number of zombies</li>
<li>T: the number of people "killed"</li>
<li>B: the population birth rate</li>
<li>n: the chance of a non-zombie related death</li>
<li>x: the chance of living being bitten in a zombie encounter</li>
<li>p: the chance of a bitten person turning into a zombie</li>
<li>r: the chance a dead person is resurrected into a zombie</li>
<li>a: the chance a zombie is totally destroyed</li>
</ul>
<br />
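A quick consistency check on the claim that this is a mass balance: summing the four balances, every zombie-related term cancels in pairs and only the birth rate survives, so d(L + I + Z + T)/dt = B. The sketch below verifies the cancellation numerically; the state values and rate constants are arbitrary, chosen only to exercise every term:

```python
# With B = 0, the four balances must sum to zero: bodies only change
# state (living -> bitten -> zombie -> "killed"), they never vanish.
B, n, x, p, r, a = 0.0, 0.0001, 0.0055, 0.5, 0.9, 0.0075

def balances(L, I, Z, T):
    dL = B - n*L - x*L*Z
    dI = x*L*Z - p*I - n*I
    dZ = p*I + r*T - a*L*Z
    dT = n*L + n*I + a*L*Z - r*T
    return dL, dI, dZ, dT

total = sum(balances(500.0, 10.0, 5.0, 2.0))
print(abs(total - B) < 1e-9)  # True: bodies are conserved
```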
Without access to Mathematica on my laptop, I had to resort to a more old school method, Python. The development of the model and coding went pretty smoothly. It was coming to an agreement on the probabilities of the model that caused us to leave our discussion unresolved before zone 4 boarding. To Tim the taxidermist from Texas and my blog readers, please tell me what you think plausible probabilities are, using the model in the following python code.<br />
<syntaxhighlighter class="brush: python;">
# modeling the walking dead
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
B = 0 # births
n = 0.0001 # natural death probability (per day)
x = 0.0055 # transmission probability (per day)
r = 0.9000 # resurrect probability (per day)
a = 0.0075 # destroy probability (per day)
p = 0.5000 # infection probability (per day)
# solve the system dy/dt = f(y, t)
def f(y, t):
    Li = y[0]
    Ii = y[1]
    Zi = y[2]
    Ti = y[3]
    # our modeling equations
    f0 = B - x*Li*Zi - n*Li
    f1 = x*Li*Zi - p*Ii - n*Ii
    f2 = p*Ii + r*Ti - a*Li*Zi
    f3 = n*Li + n*Ii + a*Li*Zi - r*Ti
    return [f0, f1, f2, f3]
# initial conditions
L0 = 500 # initial population
I0 = 0 # initial infected population
Z0 = 0 # initial zombie population
T0 = 0 # initial death population
y0 = [L0, I0, Z0, T0] # initial condition vector
t = np.linspace(0, 30, 5000) # time grid
# solve the DEs
soln = odeint(f, y0, t)
L = soln[:, 0]
I = soln[:, 1]
Z = soln[:, 2]
T = soln[:, 3]
# plot results
plt.figure()
plt.plot(t, L, label='Living')
plt.plot(t, Z, label='Zombies')
plt.xlabel('Days from outbreak')
plt.ylabel('Population')
plt.title('Apocalypse through Rules of The Walking Dead')
plt.legend(loc=0)
#plt.show()
plt.figure()
plt.plot(t, L, label='Living')
plt.plot(t, I, label='Bitten')
plt.xlabel('Days from outbreak')
plt.ylabel('Population')
plt.title('Apocalypse through Rules of The Walking Dead')
plt.legend(loc=0)
plt.show()
</syntaxhighlighter>
The probabilities used in my code were found through some trial and error to closely match those in The Walking Dead series on AMC. Assuming that the zombie outbreak began when Rick Grimes was shot, Rick was in a coma for about three weeks before the military is seen opening fire on the hordes of zombies at the hospital. Then a few days later Rick wakes up to find no living person around. The plots created from the model above replicate this timing pretty well.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiSzjkpwr5RtJ6WICqlTYm83jduz7Nwt-6Kv1JGhGoQZlY9RTvzNHZNK8YwYi5_OOZjceV-vtQWJM-RQnY4SkG25NJ2Bn9lst-gxGUn2QGAp9dQU1L4oDZ3aGHlw5fyKOGBqZP0Vpie8mp/s1600/livingvzombies.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiSzjkpwr5RtJ6WICqlTYm83jduz7Nwt-6Kv1JGhGoQZlY9RTvzNHZNK8YwYi5_OOZjceV-vtQWJM-RQnY4SkG25NJ2Bn9lst-gxGUn2QGAp9dQU1L4oDZ3aGHlw5fyKOGBqZP0Vpie8mp/s400/livingvzombies.png" width="400" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyILelcc0MwQcq2jOboZLrMQbCDFwG0w5_BYf2nHZ_Q8mt-Q3sZ09TZy0ltmu0tOjcFsYiK_GM9fBpAgocaVPWB7PmbWsDRyuKdAUkl1idG18ZPkP45qXAGLNThADAe4ygPc39OJMEOIL6/s1600/livingvbitten.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyILelcc0MwQcq2jOboZLrMQbCDFwG0w5_BYf2nHZ_Q8mt-Q3sZ09TZy0ltmu0tOjcFsYiK_GM9fBpAgocaVPWB7PmbWsDRyuKdAUkl1idG18ZPkP45qXAGLNThADAe4ygPc39OJMEOIL6/s400/livingvbitten.png" width="400" /></a></div>
<br />Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-7887677212170315362012-08-07T11:28:00.002-04:002012-08-07T11:30:57.790-04:00Installing Scipy from MacPorts on Mountain LionAfter doing a fresh install of Mountain Lion I proceeded with my age old tradition of reinstalling all of my favorite libraries from <a href="http://www.macports.org/" target="_blank">MacPorts</a>. Everything installed wonderfully except for <a href="http://pyobjc.sourceforge.net/" target="_blank">PyObjC</a>, <a href="http://www.scipy.org/" target="_blank">Scipy</a>, and a few of their dependencies like gcc45 and libunwind-headers. With an hour of trial and error that included playing with the pip installer and fighting the temptation to switch to <a href="http://mxcl.github.com/homebrew/" target="_blank">Homebrew</a>, I found an easy and quick fix for MacPorts.<br />
<div>
<br /></div>
<div>
If you have already attempted to install any of those ports I have listed, please clean those ports that have failed to install. For example, my installation of the py27-scipy port failed so I must use the following command to clean it:</div>
<div>
<br /></div>
<syntaxhighlighter class="brush: bash">
sudo port clean --all py27-scipy
</syntaxhighlighter>
<br />
<div>
The quick fix to allow MacPorts to install these wonderful ports properly is done by reconfirming the Xcode license agreement with the following command:</div>
<div>
<br /></div>
<syntaxhighlighter class="brush: bash">
sudo xcodebuild -license
</syntaxhighlighter>
<br />
<div>
Continuing from my example, <i>sudo port install py27-scipy</i> should now execute and install without a hitch. The same goes for the other ports I listed earlier. </div>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-375810942461883002012-07-30T17:10:00.001-04:002012-07-30T17:10:33.904-04:00Creating a Sonic Screwdriver with an Arduino Nano, Part 1I was pretty excited to hear about the release of a <a href="http://www.engadget.com/2012/07/11/think-geek-mark-vii-sonic-screwdriver-universal-remote/" target="_blank">Sonic Screwdriver remote</a>. But after watching the YouTube demo, it seems kind of lame. The appeal of the Sonic Screwdriver, used by the good Doctor, comes from its universal applicability to help the Doctor get out of sticky situations or rid general annoyances. A programmable remote just doesn't cut it.<br />
<div>
<br /></div>
<div>
For those in the US who have never caught an old episode of Doctor Who at 5:30 in the morning on PBS or watched the newer episodes on <a href="http://www.amazon.com/gp/product/B003HYSSZK" target="_blank">Amazon Prime</a>, a Sonic Screwdriver is akin to<span style="background-color: white;"> MacGyver's Swiss Army knife, only more futuristic - the BBC prop makers added an LED to show that it is from the future.</span></div>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.wired.com/geekdad/wp-content/uploads/2012/04/a28d8ccd461c2af9_289715_327710703909630_127031120644257_1507609_1089709115_o.preview.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="273" src="http://www.wired.com/geekdad/wp-content/uploads/2012/04/a28d8ccd461c2af9_289715_327710703909630_127031120644257_1507609_1089709115_o.preview.jpg" width="400" /></a></div>
<br /></div>
<div>
<span style="background-color: white;"><br /></span></div>
<div>
<span style="background-color: white;">My disappointment in the Sonic Screwdriver remote (<a href="http://www.thinkgeek.com/product/ee4a/" target="_blank">available on ThinkGeek</a>) has inspired me to design a Sonic Screwdriver that is more true to fiction. I just can't see the Doctor sitting somewhere waving his screwdriver in front of an old Panasonic remote. In this post I will go over my plans to integrate a <a href="http://www.tvbgone.com/" target="_blank">TV-B-Gone</a> into a Sonic Screwdriver. In future posts, I may show how to integrate other functions such as </span><span style="background-color: white;">128 kHz</span><span style="background-color: white;"> RFID technology (to open my apartment or work doors) and possibly 13.56 MHz MIFARE </span><span style="background-color: white;">technology (in DC we use SmarTrip for public transportation).</span>
<div>
<span style="background-color: white;"></span><br />
<a name='more'></a></div>
<div>
Here is a list of materials necessary for the first screwdriver prototype:<br />
<br /></div>
<a href="http://www.amazon.com/gp/product/B003M71Z2K/ref=as_li_ss_tl?ie=UTF8&tag=nighttimecuri-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B003M71Z2K" target="_blank">Doctor Who Eleventh Doctor Die Cast Sonic Screwdriver</a><img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=nighttimecuri-20&l=as2&o=1&a=B003M71Z2K" style="border: none !important; margin: 0px !important;" width="1" />
<br />
<a href="http://www.amazon.com/gp/product/B003YVL34O/ref=as_li_ss_tl?ie=UTF8&tag=nighttimecuri-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B003YVL34O" target="_blank">Arduino Nano v3.0</a><img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=nighttimecuri-20&l=as2&o=1&a=B003YVL34O" style="border: none !important; margin: 0px !important;" width="1" /><br />
IR LED (950 nm)<br />
100 Ohm Resistor (I prefer the metal-film ones)<br />
NPN Transistor (I had a 2N4401 laying around)<br />
Breadboard<br />
Tact Button<br />
Some Wires<br />
A23 Battery<br />
N-Battery Holder<br />
<a href="http://www.amazon.com/gp/product/B000B5YIYS/ref=as_li_ss_tl?ie=UTF8&tag=nighttimecuri-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B000B5YIYS" target="_blank">Weller WP35 35-Watt Professional Soldering Iron</a><img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=nighttimecuri-20&l=as2&o=1&a=B000B5YIYS" style="border: none !important; margin: 0px !important;" width="1" />
<br />
<a href="http://www.amazon.com/gp/product/B00030AP48/ref=as_li_ss_tl?ie=UTF8&tag=nighttimecuri-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B00030AP48" target="_blank">American Terminal AT-31604 60-40 Rosin Core Solder (4 Ounces)</a><img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=nighttimecuri-20&l=as2&o=1&a=B00030AP48" style="border: none !important; margin: 0px !important;" width="1" /><br />
<br />
Now let's get started by prototyping a TV-B-Gone with the Arduino Nano. Usually this process would require some knowledge of C and microcontroller programming, but luckily <a href="http://www.arcfn.com/2010/11/improved-arduino-tv-b-gone.html" target="_blank">Ken Shirriff</a> has already created an Arduino port that serves the purposes of this post. Simply follow the schematic below to set up an Arduino Nano TV-B-Gone that works with Ken Shirriff's <a href="https://github.com/shirriff/Arduino-TV-B-Gone" target="_blank">code</a>.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTEnI1vC5zp_ss2t2e_HSctp7DaP0A2ZECAnyEO9P2sLjeuzPEQQ-UJ9n_vwC4zi8XrSRQO3BrDrSjWNX7-fCUFFm_pS04ZO7zrTWjerz5W56fIrLvgEaouSH_0qTvVeLJuhh1nHFBx5JQ/s1600/arduino_nano-tvbgone.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="288" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiTEnI1vC5zp_ss2t2e_HSctp7DaP0A2ZECAnyEO9P2sLjeuzPEQQ-UJ9n_vwC4zi8XrSRQO3BrDrSjWNX7-fCUFFm_pS04ZO7zrTWjerz5W56fIrLvgEaouSH_0qTvVeLJuhh1nHFBx5JQ/s400/arduino_nano-tvbgone.png" width="400" /></a></div>
<br />
To test the set-up above, connect the Arduino Nano to your computer with a mini-B USB cable. Open up the .pde file downloaded from Ken's blog with the <a href="http://arduino.cc/en/Main/Software" target="_blank">Arduino IDE</a>. In the toolbar, under Tools > Board, make sure the correct board is selected. If you ordered the Arduino Nano v3.0 from the link above, the correct selection is listed as "Arduino Nano w/ ATmega328." Under Tools > Serial Port, make sure the selected port is connected to the Arduino. On OS X Lion 10.7.4, I ran into a slight problem where my Mac could not detect the port my Nano was connected on. I was able to fix this by installing an FTDI driver for my system, from <a href="http://www.ftdichip.com/Drivers/VCP.htm" target="_blank">here</a>. Now the correct serial port (/dev/tty.usbserial-AE01K9BX) appears when my Arduino is connected. When both the Board and Serial Port are correctly selected, press the upload button (the right-pointing arrow in the top left corner). After uploading is complete, press the tact button on the breadboard and use a digital camera to check that the IR LED is pulsing (pictured below). If it does not pulse, check that your connections match the schematic above. Note that the transistor has its flat side facing towards you, the short leg of the IR LED is the ground, and the tact button is made up of two separate strips of conductive metal on the bottom. For my European readers, please remember to ground pin 5 or follow <a href="http://bit.ly/PgmYKb" target="_blank">this schematic</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyeWjEaoZl2QhBUS_3rIhDKSdKjKOMk53XpNBTJ_26GGfaYXpyngHbLwn63Y49HppVF5lV0ZFeOyGyJ6sBLtBhJsdmSy4e_j_yc3RIW8gnZC_-AzhnScWmkcLSp4zU5HDNPvclOcYNqLl1/s1600/arduino_tvbgone_breadboard.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyeWjEaoZl2QhBUS_3rIhDKSdKjKOMk53XpNBTJ_26GGfaYXpyngHbLwn63Y49HppVF5lV0ZFeOyGyJ6sBLtBhJsdmSy4e_j_yc3RIW8gnZC_-AzhnScWmkcLSp4zU5HDNPvclOcYNqLl1/s320/arduino_tvbgone_breadboard.jpg" width="320" /></a></div>
<br />
<br />
To disassemble the head of the Sonic Screwdriver, just pull really hard and the insides should come out easily.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9dRP1q1twKu2cHKfT6zQ1V1C0-yUj0KEKXxGX2a3-or17IH51JHrPKu7lt3UqSZWHHq-WfnA2I_lGUdOjlnSl2UK55kSewV_owzRD4JK7hB_rD9ABmD321K-LBbAie8chDq1PdXaORzj6/s1600/sonicscrewdriver-inside.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj9dRP1q1twKu2cHKfT6zQ1V1C0-yUj0KEKXxGX2a3-or17IH51JHrPKu7lt3UqSZWHHq-WfnA2I_lGUdOjlnSl2UK55kSewV_owzRD4JK7hB_rD9ABmD321K-LBbAie8chDq1PdXaORzj6/s320/sonicscrewdriver-inside.jpg" width="320" /></a></div>
<br />
<br />
The next steps will be a bit trickier. I chose the Arduino Nano for its tiny size, which fits perfectly into the head of the disassembled Sonic Screwdriver. However, the headers of the assembled Nano make it oversized.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL-X31jL56_ptdXwfDZz36-W8SaMQt8KNhu1VZ0EUXbDqlrU_mWBy9rjBxkenBtWwImmf73RRFEaBnNlEK6dAiXpPhhqFehuXyWCiMQ6azvnSs3ALEr-ljPsVGM6U38DspXD2UEVTtptIE/s1600/nano_inside.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgL-X31jL56_ptdXwfDZz36-W8SaMQt8KNhu1VZ0EUXbDqlrU_mWBy9rjBxkenBtWwImmf73RRFEaBnNlEK6dAiXpPhhqFehuXyWCiMQ6azvnSs3ALEr-ljPsVGM6U38DspXD2UEVTtptIE/s320/nano_inside.jpg" width="240" /></a></div>
<br />
One solution is to start the project by buying an unassembled Arduino Nano. The solution I chose was to desolder and remove the headers, which will be covered in my next post. For those of you who just can't wait, all of the necessary parts to complete the project have been presented in this post. To get the basic functions of a TV-B-Gone in a Sonic Screwdriver just solder or <a href="http://www.amazon.com/gp/product/B000Z9H7ZW/ref=as_li_ss_tl?ie=UTF8&tag=nighttimecuri-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B000Z9H7ZW" target="_blank">wire glue</a><img alt="" border="0" height="1" src="http://www.assoc-amazon.com/e/ir?t=nighttimecuri-20&l=as2&o=1&a=B000Z9H7ZW" style="border: none !important; margin: 0px !important;" width="1" />
everything together, connect the N-battery holder to the VIN pin (next to the ground pin), and you are ready to go! The next posts will cover how to assemble the screwdriver with sound effects and how to increase its range.Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com3tag:blogger.com,1999:blog-6855656749413817524.post-83339167551810437282012-07-27T14:00:00.000-04:002012-07-28T13:34:12.662-04:00Face and Eye Tracking Fun!I recently discovered the <a href="http://www.openframeworks.cc/" target="_blank">openframeworks</a> toolkit. It is an open source C++ toolkit that wraps together many commonly used libraries for simple and fast experimentation. I absolutely love it!<br />
<br />
When I shared code in the past, it took quite a bit of time to document which libraries, which versions, and sometimes (even with C++) which operating system a new user needed for the code to compile properly. With <a href="http://www.openframeworks.cc/" target="_blank">openframeworks</a>, that annoyance is a thing of the past.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXP3PTUn9CXX8eNhfvUoIfqHMi5LxY14NQ_H0UEjVA840klHpvPPNsTs9D5BhgZR8S7y-OaPvz36B5u9CLuh3ihgWreX4aAJpfe0tgH-tdnWHjw7mKd6ioXJI6390vfsVZAG-9RS2LvOEd/s1600/facetracking.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="298" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXP3PTUn9CXX8eNhfvUoIfqHMi5LxY14NQ_H0UEjVA840klHpvPPNsTs9D5BhgZR8S7y-OaPvz36B5u9CLuh3ihgWreX4aAJpfe0tgH-tdnWHjw7mKd6ioXJI6390vfsVZAG-9RS2LvOEd/s400/facetracking.png" width="400" /></a></div>
<br />
For this post I decided to make a quick and fun example adapted from one of the examples openframeworks provides for detecting faces in pictures. The example can be found under the folder: examples/addons/opencvHaarFinderExample.<br />
<br />
I decided to extend their example to detect faces from a live webcam or a QuickTime movie. I was amazed at how few lines of code it took. To get started,<span style="background-color: white;"> download the appropriate toolkit for your IDE </span><a href="http://www.openframeworks.cc/download/" style="background-color: white;" target="_blank">here</a><span style="background-color: white;">. Then copy the folder opencvHaarFinderExample from examples/addons/ to apps/myApps/, and replace the code in testApp.cpp with the following:</span><br />
<syntaxhighlighter class="brush: cpp">
#include "testApp.h"

//--------------------------------------------------------------
void testApp::setup(){
    camWidth = 480;
    camHeight = 360;
    // uncomment and comment lines below to test different detectors
    finder.setup("haarcascade_frontalface_default.xml");
    //finder.setup("haarcascade_frontalface_alt.xml");
    //finder.setup("haarcascade_frontalface_alt2.xml");
    //finder.setup("haarcascade_eyeglasses.xml");
    //finder.setup("haarcascade_eye.xml");
#ifdef _USE_LIVE_VIDEO
    VideoGrabber.setVerbose(true);
    VideoGrabber.initGrabber(camWidth,camHeight,OF_IMAGE_COLOR);
    colorImg.allocate(camWidth,camHeight,OF_IMAGE_COLOR);
#else
    VideoPlayer.loadMovie("ThemeFromShaft.mov");
    VideoPlayer.play();
#endif
}

//--------------------------------------------------------------
void testApp::update(){
    ofBackground(100,100,100);
    bool newFrame = false;
#ifdef _USE_LIVE_VIDEO
    VideoGrabber.grabFrame();
    newFrame = VideoGrabber.isFrameNew();
#else
    VideoPlayer.idleMovie();
    newFrame = VideoPlayer.isFrameNew();
#endif
    if (newFrame){
#ifdef _USE_LIVE_VIDEO
        colorImg.setFromPixels(VideoGrabber.getPixels(), camWidth, camHeight, OF_IMAGE_COLOR);
#else
        colorImg.setFromPixels(VideoPlayer.getPixels(), camWidth, camHeight, OF_IMAGE_COLOR);
#endif
        finder.findHaarObjects(colorImg);
    }
}

//--------------------------------------------------------------
void testApp::draw(){
    ofSetHexColor(0xffffff);
    colorImg.draw(0,0);
    ofNoFill();
    for(int i = 0; i < finder.blobs.size(); i++) {
        ofRectangle dim = finder.blobs[i].boundingRect;
        ofRect(dim.x, dim.y, dim.width, dim.height);
    }
    ofSetHexColor(0xffffff);
    char reportStr[1024];
    sprintf(reportStr, "Number Detected: %lu fps: %f", finder.blobs.size(), ofGetFrameRate());
    ofDrawBitmapString(reportStr, 20, 20+camHeight);
}

//--------------------------------------------------------------
void testApp::keyPressed (int key){
    switch (key){
        case ' ':
            break;
    }
}

//--------------------------------------------------------------
void testApp::mouseMoved(int x, int y ){
}

//--------------------------------------------------------------
void testApp::mouseDragged(int x, int y, int button){
}

//--------------------------------------------------------------
void testApp::mousePressed(int x, int y, int button){
}

//--------------------------------------------------------------
void testApp::mouseReleased(int x, int y, int button){
}

//--------------------------------------------------------------
void testApp::windowResized(int w, int h){
}
</syntaxhighlighter>
<span style="background-color: white;">Then replace the code in testApp.h with the following:</span><br />
<syntaxhighlighter class="brush: cpp">
#ifndef _TEST_APP
#define _TEST_APP

#include "ofMain.h"
#include "ofxCvHaarFinder.h"

#define _USE_LIVE_VIDEO //comment to use video, uncomment to use webcam

class testApp : public ofBaseApp{
    public:
        void setup();
        void update();
        void draw();
        void keyPressed (int key);
        void mouseMoved(int x, int y );
        void mouseDragged(int x, int y, int button);
        void mousePressed(int x, int y, int button);
        void mouseReleased(int x, int y, int button);
        void windowResized(int w, int h);

        int camWidth, camHeight;
#ifdef _USE_LIVE_VIDEO
        ofVideoGrabber VideoGrabber;
#else
        ofVideoPlayer VideoPlayer;
#endif
        ofxCvHaarFinder finder;
        ofImage colorImg;
};

#endif
</syntaxhighlighter>
For those Mac users out there, just download <a href="http://bit.ly/MKnbuF">this</a>, open /apps/myApps/HeadTracking/TrackingExample.xcodeproj in Xcode (3.2 and above) and hit run!Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-16921187920631642012012-07-11T13:24:00.002-04:002012-07-11T13:24:55.375-04:00Historical Minute-by-Minute Stock Prices in MATLABThis is an extension of <a href="http://nighttimecuriosities.blogspot.com/2011/04/google-finance-backfill.html" target="_blank">my first post</a> on this blog, which went over how to gather minute-by-minute stock price data from Google Finance using Mathematica. Using the same general programming structure, I've created a function to do this in MATLAB. <div>
<span style="background-color: white;"><br /></span></div>
<div>
The IntraDayStockData function can be obtained <a href="http://bit.ly/Lfvm0k">here</a>. <span style="background-color: white;">This function takes required inputs of a stock symbol and the name of the exchange the stock is traded in. This function also takes optional inputs of quote </span><span style="background-color: white;">frequency</span><span style="background-color: white;"> (in seconds) and number of previous trading days. By default, the optional inputs are set for 1 minute quote intervals and 15 previous trading days. The outputs of this function are the time, the time in string format, trade volume, highest prices, lowest prices, and closing prices. Here is a quick example for getting the minute-by-minute quotes of the latest trading day (today) for JP Morgan Chase:</span></div><br />
<syntaxhighlighter class="brush: matlabkey">
jpm = IntraDayStockData('JPM','NYSE','60','1d');
plot(jpm.date,jpm.close,'b-'); hold on;
plot(jpm.date,jpm.high,'r-'); hold on;
plot(jpm.date,jpm.low,'g-');
datetick('x',16);
</syntaxhighlighter>
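For readers curious what the function is parsing under the hood: the Google Finance intraday feed returns CSV rows whose first column is either an absolute Unix timestamp prefixed with "a" or a bare row offset that must be multiplied by the quote interval. Here is a minimal Python sketch of that decoding step — the sample rows and the column order are illustrative assumptions, not output from IntraDayStockData:

```python
# Sketch of the timestamp scheme in the Google Finance intraday feed.
# Rows starting with "a" carry an absolute Unix timestamp; later rows
# carry an offset that is multiplied by the quote interval (in seconds).
def decode_rows(lines, interval=60):
    quotes = []
    base = None
    for line in lines:
        fields = line.split(",")
        stamp = fields[0]
        if stamp.startswith("a"):             # absolute timestamp row
            base = int(stamp[1:])
            t = base
        else:                                 # offset row
            t = base + int(stamp) * interval
        quotes.append((t, float(fields[1])))  # keep (time, close) only
    return quotes

# invented sample rows in the feed's shape
sample = [
    "a1341324000,35.10,35.20,35.00,35.05,120000",
    "1,35.12,35.15,35.08,35.10,80000",
    "2,35.20,35.22,35.11,35.12,64000",
]
print(decode_rows(sample))
```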
<div>
<span style="background-color: white;"><br /></span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeFkx1DwZoyM1BE25zk5IG40CSIidvTFTDfCETPDtMUbK-RPy6U3v7oZoL3edVkr4wfhxJIULukdwWWBAlkCkCVGLaRfgA6OoAkAHpXMo4h88F5D1zre7wxm-X7ciVZRDG5xjoSjxPQwxB/s1600/jpmexample.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgeFkx1DwZoyM1BE25zk5IG40CSIidvTFTDfCETPDtMUbK-RPy6U3v7oZoL3edVkr4wfhxJIULukdwWWBAlkCkCVGLaRfgA6OoAkAHpXMo4h88F5D1zre7wxm-X7ciVZRDG5xjoSjxPQwxB/s400/jpmexample.png" width="400" /></a></div>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com4tag:blogger.com,1999:blog-6855656749413817524.post-8910514579865298092012-07-03T12:38:00.000-04:002012-07-03T12:38:39.007-04:00Where are the Homicides in Chicago?Recently, CNN did a <a href="http://www.cnn.com/video/#/video/bestoftv/2012/07/02/exp-eb-chicago-violence-crisis.cnn?iref=allsearch" target="_blank">report</a> on the high number of homicides that is plaguing Chicago. One of the sure fire solutions suggested was to place more police in the areas where these killings occur. It got me wondering where these homicides actually occur.<br />
<br />
Using the code from my <a href="http://nighttimecuriosities.blogspot.com/2012/05/chicago-crime-updated.html" target="_blank">previous post</a>, I called the crime API that the city of Chicago provides and plotted out all the reported homicides between August 1st, 2011 and May 31st, 2012. There were 403 in total.<br />
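For anyone who wants to reproduce the count without Mathematica, the same filtering step can be sketched with Python's standard library. The column names and sample rows below are stand-ins for the city's actual CSV export, not its real schema:

```python
# Count rows whose primary description is HOMICIDE within a date window.
# The columns here are simplified stand-ins for the city's CSV export.
import csv
import io
from datetime import datetime

sample_csv = """id,date,primary_type
1,08/15/2011 10:30 PM,HOMICIDE
2,09/01/2011 01:00 AM,THEFT
3,06/02/2012 11:00 PM,HOMICIDE
4,03/30/2012 09:15 PM,HOMICIDE
"""

start = datetime(2011, 8, 1)
end = datetime(2012, 5, 31, 23, 59)

def count_homicides(text):
    n = 0
    for row in csv.DictReader(io.StringIO(text)):
        when = datetime.strptime(row["date"], "%m/%d/%Y %I:%M %p")
        if row["primary_type"] == "HOMICIDE" and start <= when <= end:
            n += 1
    return n

print(count_homicides(sample_csv))  # one homicide falls outside the window
```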
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY54Fif3aNEHy_KkKD0GyJW1F0nk48wzv25MnrreuQlH6vZw8mRj-_b3j4MMYVxFhD4hAN5GO_vNfYSWYEtbs__76U1L4TVUqVkGTYJLDs1nAylYfOfFRFCM-Tqmwvdn373ecsQsPeJyNL/s1600/Chicago_Homicides.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="190" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjY54Fif3aNEHy_KkKD0GyJW1F0nk48wzv25MnrreuQlH6vZw8mRj-_b3j4MMYVxFhD4hAN5GO_vNfYSWYEtbs__76U1L4TVUqVkGTYJLDs1nAylYfOfFRFCM-Tqmwvdn373ecsQsPeJyNL/s640/Chicago_Homicides.png" width="640" /></a></div>
<br />
<span style="background-color: white;">Now where did these incidents occur?</span><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrfG7SjVPCcO6TXAMBpgefXF3YxO6zLHWa_D9f64vcmQ0YJ2ShyphenhypheniryvFUi7JGil90L-Bf3lstsRKXjlD1-v53gTmRfwmnyp2Nm0wOfrb-6Ltr12ePCy-qm3su2-dP5xI423KLxMUW3MMdw/s1600/Chicago_HomicidesGIS.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrfG7SjVPCcO6TXAMBpgefXF3YxO6zLHWa_D9f64vcmQ0YJ2ShyphenhypheniryvFUi7JGil90L-Bf3lstsRKXjlD1-v53gTmRfwmnyp2Nm0wOfrb-6Ltr12ePCy-qm3su2-dP5xI423KLxMUW3MMdw/s640/Chicago_HomicidesGIS.png" width="534" /></a></div>
<br />
<br />
Initially it looks as if the homicides are widespread throughout the city. However, if we do a breakdown by neighborhood, it becomes obvious that some are much worse off than others. <span style="background-color: white;">Below is a table of each Chicago neighborhood and its associated number of homicides </span><span style="background-color: white;">between August 1st, 2011 and May 31st, 2012</span><span style="background-color: white;">. I've also provided a more convenient graphical representation of this table. The Mathematica code and data files I used for this post can be found <a href="http://bit.ly/M3ICBm" target="_blank">here</a>.</span><br />
<br />
<div class="post-column-left">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKjWUTIA9KW6s7lHc9QpHgIMDTd2eLFHRYirsni5BpWIlY2meVv1YNopl1i3ggZuFQKUh71KMdMNZtfx1iF1lf4dAJ-0uBjS1hdttUwxP1uLTd_ZCmUpCQSQ0qBe_Eilwv5H0scBxBsAXp/s1600/Chicago_HomicidesbyNeighborhoods.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKjWUTIA9KW6s7lHc9QpHgIMDTd2eLFHRYirsni5BpWIlY2meVv1YNopl1i3ggZuFQKUh71KMdMNZtfx1iF1lf4dAJ-0uBjS1hdttUwxP1uLTd_ZCmUpCQSQ0qBe_Eilwv5H0scBxBsAXp/s1600/Chicago_HomicidesbyNeighborhoods.png" /></a></div>
<br />
<div class="post-column-right">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyh4AHOzLFJK8ehQhAulEiq7N3sp_YFLtQ6ngjs_h5drc1ta29B7ZHMz469ou0IMo7jl37WJluDJqJk6vvByRMZiwwaMti8D3ITROB6sA3tdZMlSqcL-MVh6eKZPqOEjn_MYWJMoqp9lA7/s1600/Chicago_HomicideDensity.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyh4AHOzLFJK8ehQhAulEiq7N3sp_YFLtQ6ngjs_h5drc1ta29B7ZHMz469ou0IMo7jl37WJluDJqJk6vvByRMZiwwaMti8D3ITROB6sA3tdZMlSqcL-MVh6eKZPqOEjn_MYWJMoqp9lA7/s320/Chicago_HomicideDensity.png" width="320" /></a></div>
<br />Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-31123700497335251662012-06-27T16:20:00.001-04:002012-06-28T13:24:13.139-04:00Hidden Markov Model of Chicago CrimesThis is a bit of a side note from my attempts to model the number of daily crimes in the city of Chicago. I've been treating the Chicago crime data as a parametric random process where the parameters of the stochastic process can be estimated in a well-defined manner. My first few attempts used various autoregressive models that assumed the crime data was a signal that cross-correlated with itself. From my knowledge of statistics, this is the proper and prescribed treatment for the data. But I have been wondering how well a strictly stochastic model like the Hidden Markov Model (HMM) will perform.<br />
<br />
A hidden Markov model is not that hard of a conceptual stretch when applied to crime modeling. Let's assume that all criminal activities in Chicago evolve according to a Markov process, which may have a finite or infinite number of states. The underlying Markov process is not directly observable because these states are hidden. Luckily, for each state there is an associated stochastic process which produces an observable signal (a crime report). So as this criminal system in Chicago evolves over time, governed by the hidden Markov chain, it produces crime reports as observable outcomes.<br />
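To make the generative story concrete, here is a toy Python simulation of that process — a hidden two-state chain emitting Gaussian observations. Every number in it is invented for illustration:

```python
# Toy sketch of the generative story: a hidden two-state Markov chain
# where each state emits a Gaussian "daily change in crime counts".
# All parameter values are made up for illustration.
import random

random.seed(0)

A = [[0.95, 0.05],     # transition matrix: rows = current state
     [0.10, 0.90]]
means = [-0.02, 0.05]  # emission mean per hidden state
stds  = [0.15, 0.04]   # emission std dev per hidden state

def simulate(T, state=0):
    states, obs = [], []
    for _ in range(T):
        states.append(state)
        obs.append(random.gauss(means[state], stds[state]))
        # draw the next hidden state from the current row of A
        state = 0 if random.random() < A[state][0] else 1
    return states, obs

states, obs = simulate(250)
print(len(obs), states[:10])
```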
<br />
For a full explanation of the mathematical theory behind HMMs please refer to <a href="http://www.cs.cornell.edu/Courses/cs4758/2012sp/materials/hmm_paper_rabiner.pdf" target="_blank">this</a> article by Lawrence Rabiner. Even though Rabiner focuses on HMM applications to speech recognition, the article covers in great detail the theory used in my implementation. To simplify things for myself and to cut down on computation time, I created a hidden Markov model with univariate Gaussian outcomes. The model is fitted using maximum likelihood estimation, where the parameter values are those that maximize the likelihood of the observed data. Here, I used the <a href="http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm" target="_blank">Baum-Welch Algorithm</a> to compute the maximum likelihood estimates.<br />
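The likelihood that Baum-Welch climbs is evaluated with Rabiner's scaled forward recursion. A minimal Python sketch of that evaluation for Gaussian emissions (with made-up parameters) looks like this:

```python
# Scaled forward recursion for an HMM with Gaussian emissions: alpha_t is
# renormalized at every step, and the log-likelihood is recovered from the
# scaling constants. Parameters below are illustrative, not fitted values.
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(obs, iota, A, means, stds):
    N = len(iota)
    alpha = [iota[i] * gauss_pdf(obs[0], means[i], stds[i]) for i in range(N)]
    log_l = 0.0
    for t in range(len(obs)):
        if t > 0:
            b = [gauss_pdf(obs[t], means[j], stds[j]) for j in range(N)]
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * b[j]
                     for j in range(N)]
        c = sum(alpha)        # 1/c is the scaling factor nu_t
        log_l += math.log(c)  # log-likelihood accumulates log c_t
        alpha = [a / c for a in alpha]
    return log_l

obs = [0.10, -0.20, 0.05, 0.00]
print(log_likelihood(obs, [0.5, 0.5],
                     [[0.9, 0.1], [0.2, 0.8]],
                     [0.0, 0.1], [0.2, 0.1]))
```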
<br />
The observed Chicago crime data is the total number of daily reported crimes between July 1, 2011 and March 31, 2012. A log difference transformation of this data was performed to create a more normal distribution of observables. This is all done in Mathematica with six lines of code:
<br />
<pre class="prettyprint lang-mma">csv = Import[
"https://data.cityofchicago.org/api/views/x2n5-8w5q/rows.csv",
"CSV"];
start = "07/01/2011"; end = "03/31/2012";
range = Join[
First[Position[
If[StringMatchQ[#, start ~~ __], 1, 0] & /@
Drop[csv, 1][[All, 2]], 1]] + 1,
Last[Position[
If[StringMatchQ[#, end ~~ __], 1, 0] & /@
Drop[csv, 1][[All, 2]], 1]] + 1];
crimeseries =
Tally[DateString[
DateList[{ToString[#], {"Month", "/", "Day", "/", "Year", " ",
"Hour12", ":", "Minute", " ", "AMPM"}}], {"MonthName", " ",
"Day", " ", "Year"}] & /@ Take[csv, range][[All, 2]]];
</pre>
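The log-difference transform itself is tiny; in Python it amounts to the following (the daily counts here are invented, not the actual Chicago tallies):

```python
# Log-difference transform of a daily count series: y_t = log(c_t) - log(c_{t-1}).
import math

counts = [820, 790, 845, 860, 810]  # invented daily totals
log_diff = [math.log(b) - math.log(a) for a, b in zip(counts, counts[1:])]
print([round(v, 4) for v in log_diff])
```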
The HMM fitting of the crime data uses three functions I created in Mathematica:
<br />
<pre class="prettyprint lang-mma">MarkovChainEquilibrium[TransitionMatrix_] :=
Inverse[Transpose[TransitionMatrix] -
IdentityMatrix[Length[TransitionMatrix]] + 1].ConstantArray[1,
Length[TransitionMatrix]];
RandomInitial[Data_, States_] := Module[
{TransMat, i, vnGamma, ProbVec, StateMean, vnQ, StateStD, t, mnW},
TransMat = (#/Total[#]) & /@ (RandomReal[{0.01, 0.99}, {States,
States}]);
ProbVec = MarkovChainEquilibrium[TransMat];
StateMean = ((#/Total[#]) &[
RandomReal[{-0.99, 0.99}, States]]) Mean[Data];
StateStD =
Sqrt[((#/Total[#]) &[RandomReal[{0.1, 0.99}, States]]) Variance[
Data]];
{ProbVec, TransMat, StateMean, StateStD}];
Options[HMM] = {"LikelihoodTolerance" -> 10^(-7),
"MaxIterations" -> 500};
HMM[Data_, {InitialIota_, InitialA_, InitialMean_, InitialStD_},
OptionsPattern[]] := Module[
{TransMat, matAlpha, matB, matBeta, nBIC, matGamma, ProbVec,
StateLogLikelihood, iMaxIter, StateMean, iN, StateNu, StateStD, t,
iT, nTol, matWeights, matXi},
nTol = OptionValue["LikelihoodTolerance"];
iMaxIter = OptionValue["MaxIterations"];
iT = Length[Data]; iN = Length[InitialA];
ProbVec = InitialIota; TransMat = InitialA;
StateMean = InitialMean; StateStD = InitialStD;
matB =
Table[PDF[NormalDistribution[StateMean[[#]], StateStD[[#]]],
Data[[t]]] & /@ Range[iN], {t, 1, iT}];
matAlpha = Array[0. &, {iT, iN}];
StateNu = Array[0. &, iT];
matAlpha[[1]] = ProbVec matB[[1]];
StateNu[[1]] = 1/Total[matAlpha[[1]]];
matAlpha[[1]] *= StateNu[[1]];
For[t = 2, t <= iT, t++,
matAlpha[[t]] = (matAlpha[[t - 1]].TransMat) matB[[t]];
StateNu[[t]] = 1/Total[matAlpha[[t]]];
matAlpha[[t]] *= StateNu[[t]]];
StateLogLikelihood = {-Total[Log[StateNu]]};
Do[matBeta = Array[0. &, {iT, iN}];
matBeta[[iT, ;;]] = StateNu[[iT]];
For[t = iT - 1, t >= 1, t--,
matBeta[[t]] =
TransMat.(matBeta[[t + 1]] matB[[t + 1]]) StateNu[[t]]];
matGamma = (#/Total[#]) & /@ (matAlpha matBeta);
matXi = Array[0. &, {iN, iN}];
For[t = 1, t <= iT - 1, t++,
matXi += (#/Total[Flatten[#]]) &[
TransMat KroneckerProduct[matAlpha[[t]],
matBeta[[t + 1]] matB[[t + 1]]]]];
TransMat = (#/Total[#]) & /@ matXi;
ProbVec = matGamma[[1]];
matWeights = (#/Total[#]) & /@ Transpose[matGamma];
StateMean = matWeights.Data;
StateStD = Sqrt[Total /@ (matWeights (Data - # & /@ StateMean)^2)];
matB =
Table[PDF[NormalDistribution[StateMean[[#]], StateStD[[#]]],
Data[[t]]] & /@ Range[iN], {t, 1, iT}];
matAlpha = Array[0. &, {iT, iN}];
StateNu = Array[0. &, iT];
matAlpha[[1]] = ProbVec matB[[1]];
StateNu[[1]] = 1/Total[matAlpha[[1]]];
matAlpha[[1]] *= StateNu[[1]];
For[t = 2, t <= iT, t++,
matAlpha[[t]] = (matAlpha[[t - 1]].TransMat) matB[[t]];
StateNu[[t]] = 1/Total[matAlpha[[t]]];
matAlpha[[t]] *= StateNu[[t]]];
StateLogLikelihood =
Append[StateLogLikelihood, -Total[Log[StateNu]]];
If[StateLogLikelihood[[-1]]/StateLogLikelihood[[-2]] - 1 <= nTol,
Break[]],
{iMaxIter}];
nBIC = -2 StateLogLikelihood[[-1]] + (iN (iN + 2) - 1) Log[iT];
{"\[Iota]" -> ProbVec, "a" -> TransMat, "\[Mu]" -> StateMean,
"\[Sigma]" -> StateStD, "\[Gamma]" -> matGamma, "BIC" -> nBIC,
"LL" -> StateLogLikelihood}];
</pre>
The first function computes the equilibrium distribution for an ergodic Markov chain. The second function generates a random parameter set with inputs of the observed data and the number of states; its output is a list containing the initial state probability vector, the transition matrix of the Markov chain, the mean of each state, and the standard deviation of each state. The third function performs the maximum likelihood fit of the HMM. It takes as inputs the observed data and a random parameter set from the second function. Its output is a list that contains the initial state probability vector, the transition matrix, the mean vector, the standard deviation vector, the per-period state probabilities, the Bayesian Information Criterion (BIC) for the fit, and a history of the log-likelihood at each iteration.<br />
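As a quick illustration of what the first function returns, a two-state ergodic chain has the closed-form stationary distribution π = (b, a)/(a + b), where a and b are the probabilities of leaving states 1 and 2. A Python sketch with an invented transition matrix:

```python
# Closed-form equilibrium distribution for a two-state ergodic Markov
# chain (a special case of the general formula used in Mathematica).
def equilibrium_2state(A):
    a = A[0][1]  # probability of leaving state 1
    b = A[1][0]  # probability of leaving state 2
    return [b / (a + b), a / (a + b)]

A = [[0.9, 0.1],
     [0.2, 0.8]]
pi = equilibrium_2state(A)
print(pi)  # the stationary vector is unchanged by one step of the chain
```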
<br />
As an example of how to use these functions, let's generate a two-state fit with the log-differenced crime data. The first step is to generate a random initial parameter set. To hedge against a few really bad initial guesses, I generated a set of 20 initial guesses and ran each for 10 iterations. From this set, I then chose the candidate with the highest log-likelihood as the starting point for fitting the model.
<br />
<pre class="prettyprint lang-mma">guess = HMM[crimedifference[[All, 2]], #, "MaxIterations" -> 10] & /@
Table[RandomInitial[crimedifference[[All, 2]], 2], {20}];
bestguessLL =
Flatten[Position[#, Max[#]] &[Last["LL" /. #] & /@ guess]][[1]];
hmmfit = HMM[
crimedifference[[All,
2]], {"\[Iota]", "a", "\[Mu]",
"\[Sigma]"} /. (guess[[bestguessLL]])];
</pre>
It is usually a good idea to check the convergence of the likelihood function to make sure nothing has gone awry.
<br />
<pre class="prettyprint lang-mma">ListPlot["LL" /. hmmfit, PlotRange -> All]
</pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFRQ7ZXpxJOUT9705auQSpzYOAUD8FFyXxHnxFCnCrqRXQ3WCu8yYeptGPSMOW_2_sHFtsVTS9U2UklMnFlmD3kbUrLNjdSByZ8dv_4XJxtsN2O5kR6OfmB_2hhp1voypPzRpUw_WDeoDe/s1600/convergence.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="184" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFRQ7ZXpxJOUT9705auQSpzYOAUD8FFyXxHnxFCnCrqRXQ3WCu8yYeptGPSMOW_2_sHFtsVTS9U2UklMnFlmD3kbUrLNjdSByZ8dv_4XJxtsN2O5kR6OfmB_2hhp1voypPzRpUw_WDeoDe/s320/convergence.png" width="320" /></a></div>
<br />
To cut down on unnecessary typing, I have written a Do loop to evaluate models with different numbers of states and used a For loop to create a table of the different models with their associated BIC.
<br />
<pre class="prettyprint lang-mma">MaxGaussians = 3;
Do[InitialGuess["Crimes", i] =
HMM[crimedifference[[All, 2]], #, "MaxIterations" -> 10] & /@
Table[RandomInitial[crimedifference[[All, 2]], i], {20}];
BestGuessLL["Crimes", i] =
Flatten[Position[#, Max[#]] &[
Last["LL" /. #] & /@ InitialGuess["Crimes", i]]][[1]];
FittedHMM["Crimes", i] =
HMM[crimedifference[[All,
2]], {"\[Iota]", "a", "\[Mu]",
"\[Sigma]"} /. (InitialGuess["Crimes", i][[
BestGuessLL["Crimes", i]]])],
{i, 1, MaxGaussians, 1}]
bic = {}; states = {};
For[i = 1, i < MaxGaussians + 1, i++, AppendTo[states, i];
j = "BIC" /. FittedHMM["Crimes", i]; AppendTo[bic, j]]
TableForm[Transpose[{states, bic}],
TableHeadings -> {None, {"#States", "BIC"}},
TableAlignments -> Center]
</pre>
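The BIC computed in the last line of the loop can also be checked by hand: for an N-state Gaussian HMM there are N(N+2) − 1 free parameters. A Python sketch with invented log-likelihoods:

```python
# BIC for an N-state Gaussian HMM. Free parameters:
# (N - 1) initial probs + N(N - 1) transition probs + N means + N std devs
# = N(N + 2) - 1, matching the penalty term in the Mathematica code.
import math

def hmm_bic(log_lik, n_states, n_obs):
    k = n_states * (n_states + 2) - 1
    return -2 * log_lik + k * math.log(n_obs)

# invented log-likelihoods for 1-, 2-, and 3-state fits of 275 observations
for n, ll in [(1, 210.0), (2, 226.0), (3, 228.0)]:
    print(n, round(hmm_bic(ll, n, 275), 2))
```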
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1b4iWDPC_I8-aCYI6vujeejzyaXUhRO7SUpuA_JskvFfRfvOPixGYm4oeaLL_ZZhAISg7Lo3JgK5e5P0y_d0ApTn0jVXi84nvAajf88tnx1F4ScPKThtb-OcF-Ep7nRPJYWWIVJka5h7d/s1600/bictable.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1b4iWDPC_I8-aCYI6vujeejzyaXUhRO7SUpuA_JskvFfRfvOPixGYm4oeaLL_ZZhAISg7Lo3JgK5e5P0y_d0ApTn0jVXi84nvAajf88tnx1F4ScPKThtb-OcF-Ep7nRPJYWWIVJka5h7d/s1600/bictable.png" /></a></div>
<br />
By the BIC, it seems that an HMM with two univariate Gaussian states is the best model. If we examine the Markov chain equilibrium distribution, crime in Chicago spends about 91% of the time in a highly volatile state with decreasing crime counts, and about 9% of the time in a low-volatility state with moderate increases. Looking at the main diagonal of the transition matrix, the large transition probabilities indicate that once the system enters a state, it tends to stay there for a while.
<br />
<pre class="prettyprint lang-mma">Print["Equilibrium Distribution = ",
MatrixForm[MarkovChainEquilibrium["a" /. FittedHMM["Crimes", 2]]]]
Print["Transition Matrix from Data = ",
MatrixForm["a" /. FittedHMM["Crimes", 2]]]
</pre>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNOUBPdQYyCi8Et_X67IPypNg1jHzReIWttM5kI1f35j1yNmDXu-Jcn6elhYZzQo0N69FYW7IXrajYs8DTy2iNuyrJBZayozjb6tkezkV_tFMYBYFmlW6zmepjah1vxWeTs2retVBkDUaj/s1600/eqdist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhNOUBPdQYyCi8Et_X67IPypNg1jHzReIWttM5kI1f35j1yNmDXu-Jcn6elhYZzQo0N69FYW7IXrajYs8DTy2iNuyrJBZayozjb6tkezkV_tFMYBYFmlW6zmepjah1vxWeTs2retVBkDUaj/s1600/eqdist.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRJjzYol0zjljSf0kn8Kqc_8G5CH0eBJRVeebfGp-PGcZpF_nL4JqiYU5YNYk5jQ3TaQ_bVKo4IVnzbG_IwZtBUdZfZxKCmq2ECKJHd9u0UafLSXKofE7fokICgSl1i1Hku94whnqh-Clr/s1600/transmat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="26" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRJjzYol0zjljSf0kn8Kqc_8G5CH0eBJRVeebfGp-PGcZpF_nL4JqiYU5YNYk5jQ3TaQ_bVKo4IVnzbG_IwZtBUdZfZxKCmq2ECKJHd9u0UafLSXKofE7fokICgSl1i1Hku94whnqh-Clr/s320/transmat.png" width="320" /></a></div>
<br />
That is all for now; I'll try to create a simulation with the HMM data for the next post. All the files used for this post can be found <a href="http://bit.ly/LtELfJ" target="_blank">here</a>.Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-33552882665096504002012-06-26T10:20:00.001-04:002012-06-28T13:25:08.203-04:00First Attempt at Modeling Crime in Chicago, Part 3I've been trying to keep my attempts to model crimes in Chicago relatively simple. I was hoping to be able to generate some decent predictions by examining a few months of crime data as a signal using a simple autoregressive moving average (ARMA) model. And I've finally made some progress!<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6AjWRddukhYkKVvawHavvve3h7GtW1Dpk0AiMGASwx2xAkGlOiMWUzNIDMMhmBQ4y7si2oARx9izIrzIyBcwFP0pccSQA57ZKAsIAGAiex0EvnlL5IZxuMToWm6nrQX8N6Fef2r_GIiL5/s1600/Chicago_Crime-ARMASimulation1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="208" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg6AjWRddukhYkKVvawHavvve3h7GtW1Dpk0AiMGASwx2xAkGlOiMWUzNIDMMhmBQ4y7si2oARx9izIrzIyBcwFP0pccSQA57ZKAsIAGAiex0EvnlL5IZxuMToWm6nrQX8N6Fef2r_GIiL5/s640/Chicago_Crime-ARMASimulation1.png" width="640" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBgO0iHHclM8KlcaGebfZKYb2Rev5xGHUF8OoE4dzzb9CLrIb4oLprO1C76vzliQUi6hlhkVk8Z3sBQyMOQazmx1yD5tFJb3Z8P5-RfU_eV33FN9ORTKS2KOwxn01GwETLwzJ-gOIR9p6a/s1600/Chicago_Crime-ARMASimulation2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="208" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgBgO0iHHclM8KlcaGebfZKYb2Rev5xGHUF8OoE4dzzb9CLrIb4oLprO1C76vzliQUi6hlhkVk8Z3sBQyMOQazmx1yD5tFJb3Z8P5-RfU_eV33FN9ORTKS2KOwxn01GwETLwzJ-gOIR9p6a/s640/Chicago_Crime-ARMASimulation2.png" width="640" /></a></div>
<br />
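Simulated paths like the two above can be generated from ARMA parameters with a short sketch. Below is a minimal Python version; the coefficients are placeholders for illustration, not my regressed values, and since the model is fit to the first difference of the data, a simulated path still has to be cumulatively summed back to a level series.

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, burn=200, seed=None):
    """Simulate n points of a zero-mean ARMA(p, q) process
    x_t = phi_1 x_{t-1} + ... + e_t + theta_1 e_{t-1} + ...,
    discarding `burn` start-up points so transients wash out."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    total = n + burn
    eps = rng.normal(0.0, sigma, total)
    x = np.zeros(total)
    for t in range(total):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = ar + eps[t] + ma
    return x[burn:]

# Placeholder ARMA(2, 3) coefficients, not the fitted values from this post
diff_path = simulate_arma([0.5, -0.2], [0.3, 0.1, -0.05], 30, seed=1)
level_path = np.cumsum(diff_path)  # undo the first difference
```

The cumulative sum recovers a level path from the simulated first differences, up to the unknown starting value.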
I found that an ARMA(2, 3) process acts as a reasonable descriptor for the first difference of the crime data between July 1st, 2011 and March 31st, 2012. The graphs above are simulations for April 2012 that were generated with my regressed ARMA(2, 3) parameters. My choices of an ARMA model and of fitting the first difference of the data are explained in a <a href="http://nighttimecuriosities.blogspot.com/2012/06/first-attempt-at-modeling-crime-in_12.html" target="_blank">previous post</a>. The Mathematica code that performs the fitting and diagnostic checks is quite extensive, so I may release the work for this post as a package in a later post and document all of the functions then. But if you want to play around with my code and do not mind working with Mathematica, you can find everything used to create this post <a href="http://bit.ly/KKofwy" target="_blank">here</a> (warning: it is still a work in progress).Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-51298650444444237692012-06-21T13:25:00.001-04:002012-06-28T13:26:08.024-04:00Simple Step to Create a Slicker SiteIn my last few posts I have been skimping on a great deal of procedural detail and presenting only the fun results. This could be blamed on my dwindling free time and my compulsion to make things presentable. These reasons are significant hurdles to blogging about a semi-programming language like Mathematica, a document markup language like LaTeX, or an intermediate-level language like C++. Other than displaying code as plaintext or a cropped screenshot, there are few efficient options for posting it, and even fewer for displaying its syntax in a manner that fits a personalized blog design on Blogger.<br />
<br />
A fantastic solution to this problem is the combined use of <a href="http://code.google.com/p/google-code-prettify/" target="_blank">google-code-prettify</a> and <a href="http://alexgorbatchev.com/SyntaxHighlighter/" target="_blank">SyntaxHighlighter</a>. Setting up these JavaScript libraries on Blogger is quite simple, the configuration options are nearly limitless, and the results are visually appealing.<br />
<br />
For example, suppose I wanted to write a quick, crazy Mathematica post on how to gather all the historical drawings for the <a href="http://www.megamillions.com/" target="_blank">Mega Millions</a> lottery. It would've taken me some time to create a Blogger-ready png of the code to upload, and it would've annoyed my readers because they would have to manually type out the code below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9VWk3AfYXE5MZ64z4bRvXSv4ySXHeZksacZhC8fLxG-FMGmmPEKeTUEH1ZqlWUxWWB0PkH6j1Bppd50TeVOxeQg40JNOaE9VffApOlWCE82U31KZoShWNfCiyNtVySjDT1NMzymiYoEz/s1600/MegaMillions.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="104" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgy9VWk3AfYXE5MZ64z4bRvXSv4ySXHeZksacZhC8fLxG-FMGmmPEKeTUEH1ZqlWUxWWB0PkH6j1Bppd50TeVOxeQg40JNOaE9VffApOlWCE82U31KZoShWNfCiyNtVySjDT1NMzymiYoEz/s640/MegaMillions.png" width="640" /></a></div>
<br />
Using google-code-prettify, I can copy and paste my code in nanoseconds, and my readers can use it to pick their winning Mega Millions numbers right away.<br />
<pre class="prettyprint lang-mma">dat[x_] :=
Drop[Drop[
Flatten[Import[
"http://www.usamega.com/mega-millions-history.asp?p=" <>
ToString[x] <> "", "Data"], 4], 5], -3]
n = 61;
dat1 = ToString[Flatten[dat[#] & /@ Range[n], 1]];
data = ToExpression[
StringReplace[dat1, {"\[CenterDot]" -> ",", "+" -> ","}]];
number1 = data[[All, 4]];
number2 = data[[All, 5]];
number3 = data[[All, 6]];
number4 = data[[All, 7]];
number5 = data[[All, 8]];
megaball = data[[All, 9]];
</pre>
To get google-code-prettify set up on Blogger, you will need a free <a href="https://www.dropbox.com/" target="_blank">DropBox</a> or <a href="https://www.box.com/" target="_blank">box</a> account and the three files I provide <a href="http://bit.ly/L8rFnU" target="_blank">here</a>. <span style="background-color: white;">The lang-mma.js and prettify.css files I have provided for this post were forked from the</span><span style="background-color: white;"> </span><a href="http://meta.mathematica.stackexchange.com/questions/268/including-google-code-prettify-language-extension-for-mathematica-into-this-site" target="_blank">Mathematica - Stack Exchange</a><span style="background-color: white;">.</span><span style="background-color: white;"> Then follow these steps:</span><br />
<span style="background-color: white;"></span><br />
<ol><span style="background-color: white;">
<li><span style="background-color: white;">Place the uncompressed files anywhere in your preferred cloud drive account.</span></li>
<br />
<li><span style="background-color: white;">Get the sharable hyperlinks of those three files.</span></li>
<br />
<li><span style="background-color: white;">In Blogger (with the new 2012 interface) go to the Template tab on the left, then navigate to the Edit HTML button and proceed.</span></li>
<br />
<li><span style="background-color: white;">Copy and paste the following code into your HTML template (the best spot is right before </head>), insert the links you gathered in step 2, and save.</span></li>
</span></ol>
<br />
<span style="background-color: white;">
</span>
<br />
<div>
<syntaxhighlighter class="brush: html">
<link href='INSERT LINK TO prettify.css' rel='stylesheet' type='text/css'/>
<script language='javascript' src='INSERT LINK TO prettify.js' type='text/javascript'/>
<script language='javascript' src='INSERT LINK TO lang-mma.js' type='text/javascript'/>
<script type='text/javascript'>
document.addEventListener('DOMContentLoaded',function() {
prettyPrint();
});
</script>
</syntaxhighlighter>
<br />
You can now create Mathematica-styled syntax in a Blogger post by using a "pre" tag with the "prettyprint" class in the post's HTML. Try it out with the Mega Millions Mathematica code from above.<br />
<br />
<syntaxhighlighter class="brush: html">
<pre class="prettyprint lang-mma">
<!--Paste Mathematica code here-->
</pre>
</syntaxhighlighter>
<br />
As seen in this post for other languages, I prefer to use <a href="http://alexgorbatchev.com/SyntaxHighlighter/" style="background-color: white;" target="_blank">SyntaxHighlighter</a> over google-code-prettify. The instructions on how to set up and use SyntaxHighlighter on Blogger can be found <a href="http://alexgorbatchev.com/SyntaxHighlighter/manual/installation.html" target="_blank">here</a>, so I will not be covering it - unless requested. You can find all the files used for this post <a href="http://bit.ly/L8rFnU" target="_blank">here</a>. Happy Blogging Everyone!</div>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-16666555376064008252012-06-12T17:21:00.001-04:002012-06-28T13:26:36.174-04:00First Attempt at Modeling Crime in Chicago, Part 2In the last <a href="http://nighttimecuriosities.blogspot.com/2012/06/first-attempt-at-modeling-crime-in.html" target="_blank">post</a>, my main goal was to develop a Mathematica 8 program that could regress coefficients for general autoregressive (AR) models. In my excitement, I skipped an essential step of data modeling: I forgot to check whether the data could be accurately described by an AR model.<br />
<br />
To keep things simple for the remainder of my attempts to model the number of daily crimes in Chicago, I will be using only the data between July 1st, 2011 and March 31st, 2012. This will also give me a reason to use the new <a href="http://nighttimecuriosities.blogspot.com/p/data.html" target="_blank">Data section</a> of my blog.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh02FJ4vlypWB5TePU8dfovtnx3C-8lto11FBiSs_bdziU6y0rarFjVOUWLTg4VgXPm-cVEID_PW6NeNN_fXDSXobmhzeDvTE2O2Aj8XydQWNdcR2eVut-wN6eGwARnaTFb_F0w_hKBNhOG/s1600/CrimeTimeSeries.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="424" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh02FJ4vlypWB5TePU8dfovtnx3C-8lto11FBiSs_bdziU6y0rarFjVOUWLTg4VgXPm-cVEID_PW6NeNN_fXDSXobmhzeDvTE2O2Aj8XydQWNdcR2eVut-wN6eGwARnaTFb_F0w_hKBNhOG/s640/CrimeTimeSeries.png" width="640" /></a></div>
<div>
<br />
<div>
<br /></div>
<div>
Before examining the legitimacy of an AR model selection, I need to check whether the data is stationary. A stationary time series is the basis for any time series analysis, because we can essentially only develop models and forecasts for stationary series. In my experience, there is no clear demarcation between stationary and non-stationary data. The usual approach to determining stationarity is to plot the autocorrelation function: if the plot doesn't dampen at long lags, then the data is most likely not stationary. This logic escapes me. It may be because my statistical skills come from my engineering and computational chemistry training, but I don't think the autocorrelation function is even defined for non-stationary processes.</div>
<div>
<br /></div>
<div>
One of the tricks to transform a non-stationary dataset into a stationary one is to take the difference of the dataset. So for me, a <a href="http://en.wikipedia.org/wiki/Variogram" target="_blank">variogram</a> is the more logical tool for determining stationarity. A variogram is the ratio of the variance of differences some <i>k</i> time units apart to the variance of differences only one time unit apart. Looking at the equations below, as <i>k</i> grows the variances at lag <i>k</i> and lag <i>k</i>+1 eventually become the same. Ideally, you can conclude a dataset is stationary when the variogram shows a stable asymptote.</div>
<div>
<br />
<span style="font-size: large;">\[G_{k}=\frac{V(z_{t+k}-z_{t})}{V(z_{t+1}-z_{t})},\;k=1,2,3...\]</span><br />
<div style="text-align: left;">
where</div>
<span style="background-color: transparent; font-size: large;">\[V(z_{t+k}-z_{t})=\frac{\sum_{t=1}^{n-k}(d^{ k}_{t}-(n-k)^{-1}\sum d^{ k}_{t})^2}{n-k-1}\]</span><br />
<span style="font-size: large;">\[d^{k}_{t}=z_{t+k}-z_{t}\]</span></div>
</div>
<br />
In Mathematica, a sample variogram can be calculated accordingly:<br />
<br />
<pre class="prettyprint lang-mma">LagDifferences[list_, k_Integer?Positive] /; k < Length[list] :=
  Drop[list, k] - Drop[list, -k];

Variogram[list_, k_] :=
  (#/First@#) &@(Variance /@ Table[LagDifferences[list, i], {i, k}]);</pre>
<br />
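The same sample variogram can also be sketched outside Mathematica; here is a minimal Python version (numpy assumed; `ddof=1` reproduces the <i>n</i>-<i>k</i>-1 denominator in the formula above):

```python
import numpy as np

def variogram(series, max_lag):
    """Sample variogram G_k: the variance of the lag-k differences divided
    by the variance of the lag-1 differences (ddof=1 gives the n - k - 1
    denominator of the sample variance)."""
    x = np.asarray(series, dtype=float)
    v = np.array([np.var(x[k:] - x[:-k], ddof=1)
                  for k in range(1, max_lag + 1)])
    return v / v[0]
```

For white noise the ratios hover near 1 at every lag, while for a random walk they grow roughly linearly in <i>k</i> instead of settling onto an asymptote.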
Now we can finally move on to model selection. As a rule of thumb, an AR(<i>p</i>) model can adequately model a set of data if the autocorrelation function (Acf) plot tails off like a damped exponential or sine wave and the partial autocorrelation function (PAcf) plot cuts off after <i>p</i> lags. Plots of the variogram, autocorrelation function, and partial autocorrelation function for the Chicago crime data and its differences are shown below.<br />
<br />
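The sample Acf and PAcf behind plots like the ones below can be sketched in a few lines of Python; this minimal version computes the PAcf with the standard Durbin-Levinson recursion (numpy assumed):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[k:], x[:-k]) / (len(x) * c0)
                     for k in range(1, max_lag + 1)])

def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    phi = np.zeros((max_lag + 1, max_lag + 1))
    phi[1, 1] = r[0]
    for k in range(2, max_lag + 1):
        num = r[k - 1] - np.dot(phi[k - 1, 1:k], r[k - 2::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[:k - 1])
        phi[k, k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, k - 1:0:-1]
    return np.array([phi[k, k] for k in range(1, max_lag + 1)])
```

For an AR(1) series the PAcf is large at lag 1 and near zero afterwards, which is exactly the cut-off pattern the rule of thumb looks for.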
<table border="1">
<colgroup span="1" style="background: #ffffff;">
</colgroup><colgroup span="2" style="background: #ffffff;">
</colgroup><thead>
<tr>
<th></th>
<th>Variogram</th>
</tr>
</thead>
<tbody>
<tr>
<td>Crime Series</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuoIQg_cdZD-mA0ZQOeMkWGMt8Bzf_WQMK90Pj9bHjf8ok5iBZO3_-wJrSAMeCQcyBE4SwsoKHGB2vpfqyF3zKu9p85sVFWlOvclr1lS-vkQmD7SxxoKCKb3S7lf7aFin5DxmPIXpHRCeZ/s1600/CrimeSeries-variogram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="128" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuoIQg_cdZD-mA0ZQOeMkWGMt8Bzf_WQMK90Pj9bHjf8ok5iBZO3_-wJrSAMeCQcyBE4SwsoKHGB2vpfqyF3zKu9p85sVFWlOvclr1lS-vkQmD7SxxoKCKb3S7lf7aFin5DxmPIXpHRCeZ/s200/CrimeSeries-variogram.png" width="200" /></a></div>
<br /></td>
</tr>
<tr>
<td>1st Difference</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpaxxWBa9EQE14u3J7ABpvEqRgQYy299qSoA__noaW_LIjJ5E1tLUuH_tFhyphenhyphenDHfXDBYQPCYdJZ0vGa8pzeJuEIE3OdyGEI1ebulo9o6r2JCAhL1vrSURM_Jq31zzIgVBhzggFAJZQWGooB/s1600/Diff1CrimeSeries-variogram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="128" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpaxxWBa9EQE14u3J7ABpvEqRgQYy299qSoA__noaW_LIjJ5E1tLUuH_tFhyphenhyphenDHfXDBYQPCYdJZ0vGa8pzeJuEIE3OdyGEI1ebulo9o6r2JCAhL1vrSURM_Jq31zzIgVBhzggFAJZQWGooB/s200/Diff1CrimeSeries-variogram.png" width="200" /></a></div>
<br /></td>
</tr>
<tr>
<td>2nd Difference</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGcgYLbv9v4n9wL5OYHdbO51ygnPiP5szjsIq-FXC4FrSwFtW86M5V5QpXMIJ7mcAw-GYDOW1P-B9M9-0_aDuaVUUqwvQj-rfj3PJXLMeLyiLJHYALNADCXCvP-afBigBnlRabrB0ZYrY3/s1600/Diff2CrimeSeries-variogram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="128" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjGcgYLbv9v4n9wL5OYHdbO51ygnPiP5szjsIq-FXC4FrSwFtW86M5V5QpXMIJ7mcAw-GYDOW1P-B9M9-0_aDuaVUUqwvQj-rfj3PJXLMeLyiLJHYALNADCXCvP-afBigBnlRabrB0ZYrY3/s200/Diff2CrimeSeries-variogram.png" width="200" /></a></div>
<br /></td>
</tr>
</tbody>
</table>
<br />
<table border="1">
<colgroup span="1" style="background: #ffffff;">
</colgroup><colgroup span="2" style="background: #ffffff;">
</colgroup><thead>
<tr>
<th></th>
<th>Autocorrelation</th>
<th>Partial Autocorrelation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Crime Series</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs12MsrXe7WZlJBvXJo_YHl2VN_tRq0r0C9eeGwk8Ks2LSRRAbEkbtHOCPTQ7PLD2MLDuYDz1dyLHIH1p2Oz8BvMnIxBdX7Hm8MUO7vW8XOrmcJ9rUqPXuDL4RyyXF7yZ0lfKQPV4Qtquh/s1600/CrimeSeries-acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgs12MsrXe7WZlJBvXJo_YHl2VN_tRq0r0C9eeGwk8Ks2LSRRAbEkbtHOCPTQ7PLD2MLDuYDz1dyLHIH1p2Oz8BvMnIxBdX7Hm8MUO7vW8XOrmcJ9rUqPXuDL4RyyXF7yZ0lfKQPV4Qtquh/s200/CrimeSeries-acf.png" width="200" /></a></div>
<br /></td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg30WvV9aJxid_J9Pjc0jNEOKYvK_rSZoyS9IZmzid8_XkVBHA82GXXY4VIEaUNEUQ6ZAnCCdwoY_7wPtlE9Vln83LQ4eR1NFkhmHJUXtI8PpJYsWGaE0h3huH5cL17OE1I7QG1S2sMM9o2/s1600/CrimeSeries-pacf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg30WvV9aJxid_J9Pjc0jNEOKYvK_rSZoyS9IZmzid8_XkVBHA82GXXY4VIEaUNEUQ6ZAnCCdwoY_7wPtlE9Vln83LQ4eR1NFkhmHJUXtI8PpJYsWGaE0h3huH5cL17OE1I7QG1S2sMM9o2/s200/CrimeSeries-pacf.png" width="200" /></a></div>
<br /></td>
</tr>
<tr>
<td>1st Difference</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjX5v9Yz5N18vG9q3eDlPavFmHu9Na56zSf_y2VP1C320utjP9xhfTIfyyVKvE2twkcxU2PAA7DHr7bfW23VF9Q3RBZZ4BV2aX7VTrw_xAqYfl0VKP73eOAwa_zHLw9kRT3nifLy_mKzj15/s1600/Diff1CrimeSeries-acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjX5v9Yz5N18vG9q3eDlPavFmHu9Na56zSf_y2VP1C320utjP9xhfTIfyyVKvE2twkcxU2PAA7DHr7bfW23VF9Q3RBZZ4BV2aX7VTrw_xAqYfl0VKP73eOAwa_zHLw9kRT3nifLy_mKzj15/s200/Diff1CrimeSeries-acf.png" width="200" /></a></div>
<br /></td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgw4hSczbY8nHOCByAQamnpcPbHL9BTqkn5WMYgyhgb4MgE09VoQHV6YoPQPDvGdV3hAPZRsBrCWO8_vJD1mA38sAjbZaUF66tBRPiL7vSmXBiFR3nmeRR_3DrThHqDm2tCDiTlcuudFLSx/s1600/Diff1CrimeSeries-pacf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgw4hSczbY8nHOCByAQamnpcPbHL9BTqkn5WMYgyhgb4MgE09VoQHV6YoPQPDvGdV3hAPZRsBrCWO8_vJD1mA38sAjbZaUF66tBRPiL7vSmXBiFR3nmeRR_3DrThHqDm2tCDiTlcuudFLSx/s200/Diff1CrimeSeries-pacf.png" width="200" /></a></div>
<br /></td>
</tr>
<tr>
<td>2nd Difference</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUUpg3UsFTY_Un09-hGYXlNy2btKfVVEhjnOHLF-o9XE2VTztOFOvU1ZVq8hTaG-Dm-j6CLdTaToYYFINN-c3GjL_mOUk948VPvZLGOZUuAqmclpBP9d0ystUzcwjSqWr-wFBNbj6pfnDy/s1600/Diff2CrimeSeries-acf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgUUpg3UsFTY_Un09-hGYXlNy2btKfVVEhjnOHLF-o9XE2VTztOFOvU1ZVq8hTaG-Dm-j6CLdTaToYYFINN-c3GjL_mOUk948VPvZLGOZUuAqmclpBP9d0ystUzcwjSqWr-wFBNbj6pfnDy/s200/Diff2CrimeSeries-acf.png" width="200" /></a></div>
<br /></td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht1BCRFE61YcXkzmflXIuHPNdBvL7AnE9px4-oXx9CPIFM5qN7_o-uSVaFnMQBhxsc5pae4IYXBMlo8Yy8pj9cIWsSsICM4uT0iwkUth_twqCpOVeTGgl54DVIFLc3KCi3eJzUKqPDS0hB/s1600/Diff2CrimeSeries-pacf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht1BCRFE61YcXkzmflXIuHPNdBvL7AnE9px4-oXx9CPIFM5qN7_o-uSVaFnMQBhxsc5pae4IYXBMlo8Yy8pj9cIWsSsICM4uT0iwkUth_twqCpOVeTGgl54DVIFLc3KCi3eJzUKqPDS0hB/s200/Diff2CrimeSeries-pacf.png" width="200" /></a></div>
<br /></td>
</tr>
</tbody>
</table>
<br />
It looks like taking the first difference will produce a more stationary time series. The large negative autocorrelation at lag 1 in both the Acf and PAcf plots suggests the data may be better described with a moving average component rather than with just an autoregressive model. Looks like it is bollocks on me for jumping the gun. I'll be back next post with a moving average model made in Mathematica. Until then, have fun with the source files for this post <a href="http://bit.ly/LkCKTW" target="_blank">here</a>.Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-3791483302486749892012-06-05T18:56:00.003-04:002012-06-05T18:56:38.685-04:00First Attempt at Modeling Crime in ChicagoIn my last <a href="http://nighttimecuriosities.blogspot.com/2012/05/chicago-crime-updated.html" target="_blank">post</a>, I showed how to use the <a href="http://data.cityofchicago.org/api/docs" target="_blank">Socrata Open Data API</a> (SODA) to download data on the crimes reported in Chicago, how to plot their locations, and how to count the number of crimes that occurred in a certain area. This post will chronicle my first attempts to model that data in Mathematica.<br />
<br />
The time series analysis add-on to Mathematica 8 costs about $295, so I've decided to write my own autoregressive (AR) model package as a start to modeling the crimes in Chicago. My AR code in Mathematica is based on the <a href="http://cran.r-project.org/web/packages/FitAR/index.html" target="_blank">FitAR</a> R package and its associated <a href="http://www.jstatsoft.org/v28/i02/paper" target="_blank">paper</a>. The mathematics and derivations of autoregressive models are already heavily covered on other <a href="http://en.wikipedia.org/wiki/Autoregressive_model" target="_blank">websites</a>, so I will not be explaining them here. The alpha version of my code can be found <a href="http://bit.ly/LxBFsp" target="_blank">here</a>; note that it is not complete and does not yet have all the functions of FitAR or the Mathematica time series add-on.<br />
<br />
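As a rough outside check, the core of an AR(<i>p</i>) regression can be sketched in a few lines of Python. This is a plain Yule-Walker (method-of-moments) estimator, not the exact maximum-likelihood fit that FitAR defaults to, so its estimates will differ slightly from the tables below:

```python
import numpy as np

def fit_ar(x, p):
    """Fit an AR(p) model by solving the Yule-Walker equations.
    Returns (phi, mu, sigma2): AR coefficients, series mean,
    and innovation variance."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    z = x - mu
    n = len(z)
    # Sample autocovariances c_0 .. c_p
    c = np.array([np.dot(z[k:], z[:n - k]) / n for k in range(p + 1)])
    # Toeplitz system: R phi = (c_1, ..., c_p)
    R = np.array([[c[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, c[1:])
    sigma2 = c[0] - phi @ c[1:]
    return phi, mu, sigma2
```

On a long simulated AR(2) series this recovers the generating coefficients to within sampling error, which is the sanity check I want before trusting any fit of the crime data.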
My code, at this point, provides fits that are comparable to the FitAR package. Using the default "lynx" dataset in R, the following fits and associated residual autocorrelation plots were produced:<br />
<br />
<style type="text/css">
.mask{
position: relative;
overflow: hidden;
margin: 0px auto;
width: 100%;
background-color: #ffffff
}
.header{
float: left;
width: 100%;
background-color: #ffffff
}
.colleft{
position: relative;
width: 100%;
right: 50%;
background-color: #ffffff
}
.col1{
position: relative;
overflow: hidden;
float: left;
width: 48%;
left: 101%;
background-color: #ffffff
}
.col2{
position: relative;
overflow: hidden;
float: left;
width: 48%;
left: 3%;
background-color: #ffffff
}
.footer{
float: left;
width: 100%;
background-color: #ffffff
}
</style><br />
<div class="mask">
<div class="colleft">
<div class="col1">
Mathematica<br />
<br />
<table border="1" summary="Results from my Mathematica implementation of a AR(1) fit for the lynx data."><caption><em>AR(1) Fit</em></caption> <tbody>
<tr><th></th><th>MLE</th><th>sd</th><th>Z-ratio</th></tr>
<tr><th>phi(1)</th><td>0.71729</td><td>0.0652589</td><td>10.9914</td></tr>
<tr><th>mu</th><td>1537.94</td><td>364.477</td><td>4.21957</td></tr>
</tbody></table>
<br />
<div style="text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBXtA4gyEbHuzXRoMdj8jNX41FvBnVOrsvaKka-RFoGRiPKin9D8VVwR0IVvq5W5Z1jg5Md1HKGHrsUAkqCo778l6_p09-RZnPBgH7mhMi-tZ-xiOTqRU4f6PfOCx2XMlMZTIGR7sI-IgK/s1600/Residual+Auto+Mathematica+AR1.png" imageanchor="1" style="clear: left; margin-bottom: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBXtA4gyEbHuzXRoMdj8jNX41FvBnVOrsvaKka-RFoGRiPKin9D8VVwR0IVvq5W5Z1jg5Md1HKGHrsUAkqCo778l6_p09-RZnPBgH7mhMi-tZ-xiOTqRU4f6PfOCx2XMlMZTIGR7sI-IgK/s200/Residual+Auto+Mathematica+AR1.png" width="200" /></a></div>
<br />
<table border="1" summary="Results from my Mathematica implementation of a AR(4) fit for the lynx data."><caption><em> </em></caption><caption><em> </em></caption><caption><em style="background-color: transparent;"> </em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;">AR(4) Fit</em></caption><tbody>
<tr><th></th><th>MLE</th><th>sd</th><th>Z-ratio</th></tr>
<tr><th>phi(1)</th><td>1.12428</td><td>0.0905874</td><td>12.411</td></tr>
<tr><th>phi(2)</th><td>-0.716667</td><td>0.136707</td><td>-5.24237</td></tr>
<tr><th>phi(3)</th><td>0.262661</td><td>0.136707</td><td>1.92135</td></tr>
<tr><th>phi(4)</th><td>-0.253983</td><td>0.0905874</td><td>-2.80374</td></tr>
<tr><th>mu</th><td>1537.94</td><td>135.755</td><td>11.3288</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBme_fuxN-aqAOWqpSIqGwBzPrFqfkmu4RM3A8CBCmXLUK6Nk8I8ODAexIlbmTsm0fRg-XOG7jEluYHXV3iGj5YjqpTC9vd4hYX1EVnK8vhTxcKlBZ_R4Lvj3kC_3xWvY-Q1tWQdQ5UlBJ/s1600/Residual+Auto+Mathematica+AR4.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBme_fuxN-aqAOWqpSIqGwBzPrFqfkmu4RM3A8CBCmXLUK6Nk8I8ODAexIlbmTsm0fRg-XOG7jEluYHXV3iGj5YjqpTC9vd4hYX1EVnK8vhTxcKlBZ_R4Lvj3kC_3xWvY-Q1tWQdQ5UlBJ/s200/Residual+Auto+Mathematica+AR4.png" width="200" /></a></div>
<br />
<table border="1" summary="Results from my Mathematica implementation of subset AR fit for the lynx data."><caption><em> </em></caption><caption><em> </em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;">Subset AR(1,2,4,5,7,10,11) Fit</em></caption><tbody>
<tr><th></th><th>MLE</th><th>sd</th><th>Z-ratio</th></tr>
<tr><th>phi(1)</th><td>0.820455</td><td>0.0178649</td><td>45.9256</td></tr>
<tr><th>phi(2)</th><td>-0.632818</td><td>0.0972914</td><td>-6.50436</td></tr>
<tr><th>phi(4)</th><td>-0.141951</td><td>0.0684639</td><td>-2.07337</td></tr>
<tr><th>phi(5)</th><td>0.141893</td><td>0.0748193</td><td>1.89647</td></tr>
<tr><th>phi(7)</th><td>0.202038</td><td>0.0992876</td><td>2.03488</td></tr>
<tr><th>phi(10)</th><td>-0.314105</td><td>0.0917768</td><td>-3.42249</td></tr>
<tr><th>phi(11)</th><td>-0.368658</td><td>0.0870617</td><td>-4.23444</td></tr>
<tr><th>mu</th><td>6.68591</td><td>0.100657</td><td>66.4225</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcPbqc-_k9ri-KrDRrOaRyuAnn2ZxItfJMJeOg5nKypZuFTR4dyW7uxm26Bhti3iwmKoNHhDPZk38ZPejxa8JgHdsFBzCJ5kaKfWHDVa8bWr5rA0Sqaksg2LrdRQQY3S5_Dq7EoXaNN_Nf/s1600/Residual+Auto+Mathematica+AR.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="81" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcPbqc-_k9ri-KrDRrOaRyuAnn2ZxItfJMJeOg5nKypZuFTR4dyW7uxm26Bhti3iwmKoNHhDPZk38ZPejxa8JgHdsFBzCJ5kaKfWHDVa8bWr5rA0Sqaksg2LrdRQQY3S5_Dq7EoXaNN_Nf/s200/Residual+Auto+Mathematica+AR.png" width="200" /></a></div>
</div>
<div class="col2">
R<br />
<br />
<table border="1" summary="This is the R produced AR(1) fit of the lynx data."><caption><em>AR(1) Fit</em></caption> <tbody>
<tr><th></th><th>MLE</th><th>sd</th><th>Z-ratio</th></tr>
<tr><th>phi(1)</th><td>0.717303</td><td>0.0652577</td><td>10.9918</td></tr>
<tr><th>mu</th><td>1538.02</td><td>363.986</td><td>4.22548</td></tr>
</tbody></table>
<br />
<div style="text-align: left;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5IpJu-sa1gOa3lcCWFKpq-4LgEEX9ljRKTvUNCNpkkZyLqVP7pfnDmzizlCgzt5_kOhXumKMZ8raRmLzHbGdOOtLXmrJX-UMy0kEM3VF9JfTo_KC69hvXFnjhrY1JHR2ThLRV7P78-_9i/s1600/Residual+AutoCorrelation-AR1.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="86" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg5IpJu-sa1gOa3lcCWFKpq-4LgEEX9ljRKTvUNCNpkkZyLqVP7pfnDmzizlCgzt5_kOhXumKMZ8raRmLzHbGdOOtLXmrJX-UMy0kEM3VF9JfTo_KC69hvXFnjhrY1JHR2ThLRV7P78-_9i/s200/Residual+AutoCorrelation-AR1.png" width="200" /></a></div>
<br />
<table border="1" style="background-color: transparent; text-align: center;" summary="This is the R produced AR(4) fit of the lynx data."><caption><em style="background-color: transparent;"> </em></caption><caption><em style="background-color: transparent;"> </em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;">AR(4) Fit</em></caption><tbody>
<tr><th></th><th>MLE</th><th>sd</th><th>Z-ratio</th></tr>
<tr><th>phi(1)</th><td>1.12463</td><td>0.0905803</td><td>12.4158</td></tr>
<tr><th>phi(2)</th><td>-0.717396</td><td>0.1367150</td><td>-5.24738</td></tr>
<tr><th>phi(3)</th><td>0.263355</td><td>0.136715</td><td>1.92630</td></tr>
<tr><th>phi(4)</th><td>-0.254273</td><td>0.0905803</td><td>-2.80716</td></tr>
<tr><th>mu</th><td>1538.02</td><td>135.469</td><td>11.3533</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht-q94uED5M1wkx0TZgKSA69HzlDEpidkICt899oLXt8w9mNJOxvQGJRv8D-PbZfW-b2-zOwPSiN5ymj4QuDuHNIovkZftkRm0647MQovH4SfjU69cLX4JDIo1UlI_YUHNixTolJo6NRtq/s1600/Residual+AutoCorrelation-AR4.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="85" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEht-q94uED5M1wkx0TZgKSA69HzlDEpidkICt899oLXt8w9mNJOxvQGJRv8D-PbZfW-b2-zOwPSiN5ymj4QuDuHNIovkZftkRm0647MQovH4SfjU69cLX4JDIo1UlI_YUHNixTolJo6NRtq/s200/Residual+AutoCorrelation-AR4.png" width="200" /></a></div>
<br />
<table border="1" style="background-color: transparent; text-align: center;" summary="This is the R produced subset AR fit of the lynx data."><caption><em style="background-color: transparent;"> </em></caption><caption><em style="background-color: transparent;"> </em></caption><caption><i><br /></i></caption><caption><i><br /></i></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;"><br /></em></caption><caption><em style="background-color: transparent;">Subset AR(1,2,4,5,7,10,11) Fit</em></caption><tbody>
<tr><th></th><th>MLE</th><th>sd</th><th>Z-ratio</th></tr>
<tr><th>phi(1)</th><td>0.8204490</td><td>0.01786005</td><td>45.937651</td></tr>
<tr><th>phi(2)</th><td>-0.6328433</td><td>0.09731087</td><td>-6.503316</td></tr>
<tr><th>phi(4)</th><td>-0.1420888</td><td>0.06847123</td><td>-2.075161</td></tr>
<tr><th>phi(5)</th><td>0.1421388</td><td>0.07481409</td><td>1.899894</td></tr>
<tr><th>phi(7)</th><td>0.2021250</td><td>0.09927376</td><td>2.036036</td></tr>
<tr><th>phi(10)</th><td>-0.3140954</td><td>0.09178210</td><td>-3.422186</td></tr>
<tr><th>phi(11)</th><td>-0.3686589</td><td>0.08706171</td><td>-4.234455</td></tr>
<tr><th>mu</th><td>6.6859329</td><td>0.09999207</td><td>66.864631</td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOohuQV2yXOooeoTlSg7zHAvKs-7pKmgsSTyJmNKmsFfT4PsU6KqqawlRwJzEexLQ0897xl3lfWXauC7GWylAECCB53XLUAcCs0RXg-y7S69lY14MkYL-Fa_ouRN_fdWB0SRbIJelNcOpm/s1600/Residual+AutoCorrelation-AR.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="85" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOohuQV2yXOooeoTlSg7zHAvKs-7pKmgsSTyJmNKmsFfT4PsU6KqqawlRwJzEexLQ0897xl3lfWXauC7GWylAECCB53XLUAcCs0RXg-y7S69lY14MkYL-Fa_ouRN_fdWB0SRbIJelNcOpm/s200/Residual+AutoCorrelation-AR.png" width="200" /></a></div>
</div>
<br /></div>
</div>
<br />
Now that I know my code is correct, the next step will be figuring out if an autoregressive model can properly describe the crime data from Chicago. The lynx data from above is almost trivial when compared to the chaotic mess of crime data.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsbKdrLsi-1RgQ63W-zHyUB0SctW19l-uGM5yk1zQRF1IbCst8NWn6CwKU7apEGejAFN7KQMze_CcHhy0DQflgLLp-huK4U6uSyhr-IjZz_s7-toY9v_L6YvwxjQrcY7IZVVmMBX_E29Oo/s1600/TimeSeriesPlotLynx.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="155" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjsbKdrLsi-1RgQ63W-zHyUB0SctW19l-uGM5yk1zQRF1IbCst8NWn6CwKU7apEGejAFN7KQMze_CcHhy0DQflgLLp-huK4U6uSyhr-IjZz_s7-toY9v_L6YvwxjQrcY7IZVVmMBX_E29Oo/s400/TimeSeriesPlotLynx.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Time Series Plot of lynx data</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDKlOBahndchv9DQF8J_Qr3gy92eewYJBzaH3lhqsyS1e2JmlHQcT2U5ylLpb3NJARjCbkOrZi-jSFUpyFfudPf5kiKzj_bcCQ4taNmpd-T3qBdAXCFx82qRRp5WOh8G37MPQk5knapdez/s1600/Chicago_Crime-ARTimeSeries.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="265" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDKlOBahndchv9DQF8J_Qr3gy92eewYJBzaH3lhqsyS1e2JmlHQcT2U5ylLpb3NJARjCbkOrZi-jSFUpyFfudPf5kiKzj_bcCQ4taNmpd-T3qBdAXCFx82qRRp5WOh8G37MPQk5knapdez/s400/Chicago_Crime-ARTimeSeries.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Plot of daily totals of crimes in Chicago.</td></tr>
</tbody></table>
<br />
Just for fun, this is the residual autocorrelation of an AR(5) regressed with the crime series data from above:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8dibbJd8b2XJTPA6YCTDdjh7tnGO8mUai7m41FiC5h03UnkGnMIjt-JKnyjKzlTpJ2DPAQH0WqSmTgvyAlkK_lTB51LjHkZcFwjdWafZ1jB9vdBBqoltC5mEC-yFcxPpKd5HsCJEJZqM7/s1600/Chicago_Crime-ARTimeSeriesAR5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="130" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj8dibbJd8b2XJTPA6YCTDdjh7tnGO8mUai7m41FiC5h03UnkGnMIjt-JKnyjKzlTpJ2DPAQH0WqSmTgvyAlkK_lTB51LjHkZcFwjdWafZ1jB9vdBBqoltC5mEC-yFcxPpKd5HsCJEJZqM7/s320/Chicago_Crime-ARTimeSeriesAR5.png" width="320" /></a></div>
<br />
Not a model anyone should ever depend on. The files used in this post can be found <a href="http://bit.ly/LxBFsp" target="_blank">here</a>.Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-62129985359968635812012-05-07T14:00:00.002-04:002012-05-07T14:00:54.049-04:00Chicago Crime - UpdatedI was a bit dismayed to discover that the EveryBlock API stopped working. One of my most popular posts, <a href="http://nighttimecuriosities.blogspot.com/2011/05/chicago-crime-in-24-hours.html" target="_blank">Chicago Crime</a>, depended on this API heavily.<br />
<br />
Here I present an alternative data source for my previous <a href="http://nighttimecuriosities.blogspot.com/2011/05/chicago-crime-in-24-hours.html" target="_blank">post</a> and a fun update. The update is a quick and dirty method to create popular crime density plots, as shown below.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvNK-7nTncKig7LCIGkrbxd_2SWs97FS8kiT9ISNyZc3u8B00NqXmahLlGnM8NKr2pMTuT8MD8xUuJw_26al4tZwMmgscJuM7r-hcOCsbwqpCQhoi2cyJPzADG4d1loZqrKQaRExAb8WAq/s1600/Chicago_Crime_DensityPoints.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvNK-7nTncKig7LCIGkrbxd_2SWs97FS8kiT9ISNyZc3u8B00NqXmahLlGnM8NKr2pMTuT8MD8xUuJw_26al4tZwMmgscJuM7r-hcOCsbwqpCQhoi2cyJPzADG4d1loZqrKQaRExAb8WAq/s1600/Chicago_Crime_DensityPoints.png" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
In the spirit of transparency, the city of Chicago has taken the initiative to electronically release a great deal of data with the <a href="http://data.cityofchicago.org/api/docs" target="_blank">Socrata Open Data API</a> (SODA). Luckily for us, this release includes all filed police reports. The records available through this API range from 2001 to the present, which dwarfs the two-week range provided by EveryBlock's API. However, I would not recommend downloading all of this data, as the text file is approximately 2GB in size.<br />
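To give a concrete sense of how SODA is queried, here is a hypothetical Python sketch that builds a filtered CSV request. The dataset identifier and field name are assumptions on my part — consult the SODA documentation for the current values — and actually downloading is left to the reader:

```python
from urllib.parse import urlencode

# Hypothetical example: build a SODA query URL for the Chicago crime
# dataset.  The dataset identifier ("ijzp-q8t2") and the "date" field
# name are assumptions -- check the SODA docs for the current values.
def soda_csv_url(dataset_id, where, limit=50000):
    base = "https://data.cityofchicago.org/resource/%s.csv" % dataset_id
    return base + "?" + urlencode({"$where": where, "$limit": limit})

url = soda_csv_url("ijzp-q8t2", "date > '2011-05-07T00:00:00'")
```

Restricting the request with a `$where` clause avoids pulling down the full multi-gigabyte file.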
<br />
In this post I use only the filed police reports from one year prior to present. The data is available in both JSON and CSV format. I only use the CSV format here for the convenience of those with an earlier version of Mathematica. To import this data into Mathematica:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvANcQltDsP6XavdflvT2Si7atBo98YIbcjOLX0f0EnvQ5KLVod6lLWBJpLz9KkZOF423gZIZlelmnbyZD9sj9C9237WO6Hs247fEUzuj2vnKO-ue1BBXwxtSkk_jnrTvxo4d3cO7FxQZK/s1600/Chicago_Crime_csv.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvANcQltDsP6XavdflvT2Si7atBo98YIbcjOLX0f0EnvQ5KLVod6lLWBJpLz9KkZOF423gZIZlelmnbyZD9sj9C9237WO6Hs247fEUzuj2vnKO-ue1BBXwxtSkk_jnrTvxo4d3cO7FxQZK/s1600/Chicago_Crime_csv.png" /></a></div>
<br />
<br />
<br />
<br />
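The import step above is shown as a Mathematica screenshot; for readers without Mathematica, an equivalent read can be sketched with Python's standard library. The column names used here are assumptions based on the published dataset, not taken from the screenshot:

```python
import csv
import io

# A minimal stand-in for the Mathematica import step: read the crime
# CSV into a list of dictionaries keyed by column name.  The headers
# ("Date", "Primary Type", "Arrest") are assumed, not authoritative.
sample = io.StringIO(
    "Date,Primary Type,Arrest\n"
    "03/15/2012,NARCOTICS,true\n"
    "03/16/2012,THEFT,false\n"
)
rows = list(csv.DictReader(sample))
```

For the real file, replace the `io.StringIO` sample with `open("crimes.csv")`.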
A plot of all reported crimes in Chicago between February 1st and March 31st of 2012 was created with the data.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhn2qVAdyzIRp2wOn1tKrvNLLUa2phHbIUqSE9afAqRBuTrv-ej5St3jeOEH4YzL3m5ohHbKeUq0F4AcCOqoOQDc3lckMMxSyFBVTBaEY9AakDAbFxqgXosShMSsJD3gTMoH2Z7K6Q0F2C_/s1600/Chicago_Crime_Plot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhn2qVAdyzIRp2wOn1tKrvNLLUa2phHbIUqSE9afAqRBuTrv-ej5St3jeOEH4YzL3m5ohHbKeUq0F4AcCOqoOQDc3lckMMxSyFBVTBaEY9AakDAbFxqgXosShMSsJD3gTMoH2Z7K6Q0F2C_/s1600/Chicago_Crime_Plot.png" /></a></div>
<br />
Plots of incidents for any period within the year can easily be created with the following Mathematica code. Restrict the data to the range of interest by replacing the dates in the start and end strings. Dates must be in the format mm/dd/yyyy. This code simply finds the positions of the strings that match the input dates and plots the daily totals of incidents.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4IGV4_WowJCKeZwdV8rwSq7NfuBJWG6Y-gKYcJJBGYG6cFoGjXKLwZEWDD-iJ4eIdPw4vStoWdsXNUzM8QYZrrbZsAjIinn3AtzuY1r94RBJnwCG_xV03EKr9g6_FTDAKVIhKFFKtsy1v/s1600/Chicago_Crime_AllPlot.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4IGV4_WowJCKeZwdV8rwSq7NfuBJWG6Y-gKYcJJBGYG6cFoGjXKLwZEWDD-iJ4eIdPw4vStoWdsXNUzM8QYZrrbZsAjIinn3AtzuY1r94RBJnwCG_xV03EKr9g6_FTDAKVIhKFFKtsy1v/s1600/Chicago_Crime_AllPlot.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
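The logic described above — match the start and end dates, then total incidents per day — can be sketched in Python as follows (the record layout is illustrative, not the dataset's exact schema):

```python
from collections import Counter
from datetime import datetime

# Sketch of the daily-totals step: restrict records to a start/end
# date (mm/dd/yyyy, as in the post) and count incidents per day.
def daily_totals(dates, start, end):
    fmt = "%m/%d/%Y"
    lo, hi = datetime.strptime(start, fmt), datetime.strptime(end, fmt)
    kept = [d for d in dates if lo <= datetime.strptime(d, fmt) <= hi]
    return Counter(kept)

dates = ["02/01/2012", "02/01/2012", "03/31/2012", "04/01/2012"]
totals = daily_totals(dates, "02/01/2012", "03/31/2012")
```

The resulting per-day counts are what gets handed to the plotting step.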
The data can be parsed into primary descriptions including arson, homicide, sex offense, weapons violation, crim sexual assault, interfere with public officer, gambling, public peace violation, criminal trespass, liquor law violation, assault, deceptive practice, burglary, criminal damage, robbery, motor vehicle theft, offense involving children, narcotics, theft, other offense, and battery. Below is the plot and code to filter for only narcotics incidents.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdthLzhtjFtVX7OR266QxNjPy2ZO8iWmgqEaj8SY8i7xa4_fRPws_PM5oA0oeTZwxMTVM9LN3y_9pA85c4EStezrI5hLBXxYV4PATx2EKNUuOkl78rtj097V7LgRLvMMPFGbxtq46FWCC-/s1600/primary.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdthLzhtjFtVX7OR266QxNjPy2ZO8iWmgqEaj8SY8i7xa4_fRPws_PM5oA0oeTZwxMTVM9LN3y_9pA85c4EStezrI5hLBXxYV4PATx2EKNUuOkl78rtj097V7LgRLvMMPFGbxtq46FWCC-/s1600/primary.png" /></a></div>
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU_k0hZcBFtA6yenlYLM8BNAsUmrLNF72A7FOE3KaidlpBJaBU5M0wdIOkFaH_JQRYl8yM_6BVguxxT79UkAwTc0PdW-xi8gYs7NeenjQtgSUMxLU2RzPaRD22px2AMHKkCuwqFvtWmKWW/s1600/Chicago_Crime_Prime.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="292" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiU_k0hZcBFtA6yenlYLM8BNAsUmrLNF72A7FOE3KaidlpBJaBU5M0wdIOkFaH_JQRYl8yM_6BVguxxT79UkAwTc0PdW-xi8gYs7NeenjQtgSUMxLU2RzPaRD22px2AMHKkCuwqFvtWmKWW/s640/Chicago_Crime_Prime.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
A secondary description can also be used to parse the data. Some secondary descriptions include aggravated: handgun, poss: crack, armed: handgun, harassment by telephone, poss: heroin(white), to land, retail theft, unlawful entry, strongarm - no weapon, to property, poss: cannabis 30gms or less, from building, $500 and under, forcible entry, to vehicle, domestic battery simple, simple, automobile, over $500, telephone threat, and wireroom/sports. Below is the plot and code to filter for incidents with a primary description of narcotics and a secondary description of poss: cannabis 30gms or less.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb9ps66sF5FKTjJ9bMevQAMaWVNoqqeMf5V-1jhEKnNGccImXOe0j4Bl7hpIoBXqajhXSgeoO7atm8CQ95qaFkUkrchtvKHD-Y_hUyk8dt5WDB8uSdFDCVHvQx_sL2FYK3MaqX19XerW4o/s1600/secondary.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhb9ps66sF5FKTjJ9bMevQAMaWVNoqqeMf5V-1jhEKnNGccImXOe0j4Bl7hpIoBXqajhXSgeoO7atm8CQ95qaFkUkrchtvKHD-Y_hUyk8dt5WDB8uSdFDCVHvQx_sL2FYK3MaqX19XerW4o/s1600/secondary.png" /></a></div>
<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZL3_jwIIFqy8uJ-xPw9t9EkeYouLTJ5qemc_Dpaq3uWVsbKXqN2jyhNl0-gdVEZ0DNz52UOQhm5YoOGeIQB5SoRg2pKwmTEyxATRPGGVSfbxSIzews2HBmEGDbytUpV-ExCPCFCVzJT7I/s1600/Chicago_Crime_Second.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="292" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZL3_jwIIFqy8uJ-xPw9t9EkeYouLTJ5qemc_Dpaq3uWVsbKXqN2jyhNl0-gdVEZ0DNz52UOQhm5YoOGeIQB5SoRg2pKwmTEyxATRPGGVSfbxSIzews2HBmEGDbytUpV-ExCPCFCVzJT7I/s640/Chicago_Crime_Second.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
The incidents may also be separated into those where arrests were made and those where no arrests were made. Below is the code to filter for incidents with a primary description of narcotics and a secondary description of poss: cannabis 30gms or less, both with and without arrests. Plots comparing each subset to the total narcotics incidents are shown.<br />
<br />
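All three filters used in this post — primary description, secondary description, and arrest flag — amount to the same select-by-field operation. A hypothetical Python sketch, with illustrative field names:

```python
# Sketch of the three filters discussed in this post.  The field
# names ("primary", "secondary", "arrest") are illustrative; the
# real CSV's headers may differ.
def filter_incidents(rows, primary=None, secondary=None, arrest=None):
    out = rows
    if primary is not None:
        out = [r for r in out if r["primary"] == primary]
    if secondary is not None:
        out = [r for r in out if r["secondary"] == secondary]
    if arrest is not None:
        out = [r for r in out if r["arrest"] == arrest]
    return out

rows = [
    {"primary": "narcotics",
     "secondary": "poss: cannabis 30gms or less", "arrest": True},
    {"primary": "narcotics",
     "secondary": "poss: heroin(white)", "arrest": False},
    {"primary": "theft", "secondary": "retail theft", "arrest": False},
]
hits = filter_incidents(rows, primary="narcotics",
                        secondary="poss: cannabis 30gms or less",
                        arrest=True)
```

Passing `arrest=False` instead yields the no-arrest subset for the comparison plots.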
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDXegyoTcVQAU1Dp9QnxceVSjhbQUdWUzF_Zim2v2jTXM0nfJZAcKIarj24DIvX0lAS-4M4ka_RBztrajzsKkQ8tRtZPnB0XuH0Wo4r4VPP1SwmXQU2sSO2C2Sa9sGu9wL90BsWlepqVf8/s1600/arrests.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDXegyoTcVQAU1Dp9QnxceVSjhbQUdWUzF_Zim2v2jTXM0nfJZAcKIarj24DIvX0lAS-4M4ka_RBztrajzsKkQ8tRtZPnB0XuH0Wo4r4VPP1SwmXQU2sSO2C2Sa9sGu9wL90BsWlepqVf8/s1600/arrests.png" /></a></div>
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiC27vCKuyYpFywCgmZ4g3N0oYgsALy3hHLgs-DceHwG7fF2ruLxVgx1vmmE3AhfMVD3ObvU3BQ0V80HWC3RtjDIcHJTPdZQS0MDFL2TNULYyBjM5M5N8osgiOAz8F-6pNyHuNtR0j27OOX/s1600/Chicago_Crime_Arrests.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="292" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiC27vCKuyYpFywCgmZ4g3N0oYgsALy3hHLgs-DceHwG7fF2ruLxVgx1vmmE3AhfMVD3ObvU3BQ0V80HWC3RtjDIcHJTPdZQS0MDFL2TNULYyBjM5M5N8osgiOAz8F-6pNyHuNtR0j27OOX/s640/Chicago_Crime_Arrests.png" width="640" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
The method to display where certain incidents occur in Chicago is similar to my previous post, <a href="http://nighttimecuriosities.blogspot.com/2011/05/chicago-crime-in-24-hours.html" target="_blank">Chicago Crime</a>. Please refer to that post for a more detailed explanation of how to create the plot below. The plot is a geographical visualization of where arrests occurred for narcotics incidents involving possession of 30 grams of cannabis or less.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4-TmsWUVNX9WrCeKcTV2uSj048_TkiJjA7pV3VewjDoigzKQegq9fkFV-l_lym8cvgHW1VDrdhnzvd9B0Zop3lO70uczz8siGLjKvuKvStkaTCeX16_tfB78m9iF6oHatA-Xb3VlvLr3y/s1600/Chicago_Crime_GIS.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4-TmsWUVNX9WrCeKcTV2uSj048_TkiJjA7pV3VewjDoigzKQegq9fkFV-l_lym8cvgHW1VDrdhnzvd9B0Zop3lO70uczz8siGLjKvuKvStkaTCeX16_tfB78m9iF6oHatA-Xb3VlvLr3y/s1600/Chicago_Crime_GIS.png" /></a></div>
<br />
The method to tabulate the number of incidents within neighborhoods is also similar to my previous post, <a href="http://nighttimecuriosities.blogspot.com/2011/05/chicago-crime-in-24-hours.html" target="_blank">Chicago Crime</a>. This calculation totals the number of incidents within each neighborhood boundary. It is the most costly in computational time and power because of the generality of the counting code, which allows it to use any boundaries (such as census tracts, police beats, etc.). The tabulation of arrests made for narcotics incidents involving possession of 30 grams of cannabis or less in each neighborhood is presented.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwemZ_qBuksAbT1_VTJ6bNIAuh71UKfqWbvcTmkXE4op1lay0SOV-1MMPKPPdtBv5huB1gdWmGhc1-X6AyN-lk6k6xjE3u89_nturuHShhjjL2em1lysdsJ1ftFidRK6BW-Sjr5-QFdudX/s1600/Chicago_Crime_Count.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgwemZ_qBuksAbT1_VTJ6bNIAuh71UKfqWbvcTmkXE4op1lay0SOV-1MMPKPPdtBv5huB1gdWmGhc1-X6AyN-lk6k6xjE3u89_nturuHShhjjL2em1lysdsJ1ftFidRK6BW-Sjr5-QFdudX/s1600/Chicago_Crime_Count.png" /></a></div>
<br />
While doing some initial research for this post, I found that "crime hotspots" are a popular visualization option. A similar visualization is easily created in Mathematica with three simple lines of code.<br />
<br />
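The idea behind such a density plot can be sketched outside Mathematica as well: bin the incident coordinates on a grid and look at the counts (Mathematica's density-plot functions additionally smooth these counts). A minimal Python sketch, with made-up coordinates:

```python
from collections import Counter

# Rough stand-in for the "hotspot" plot: count incidents per
# (lat, lon) grid cell and find the densest cell.  The coordinates
# below are made up for illustration.
def grid_counts(points, cell=0.01):
    """Count incidents per grid cell of width `cell` degrees."""
    return Counter((round(lat / cell), round(lon / cell))
                   for lat, lon in points)

pts = [(41.88, -87.63), (41.881, -87.629), (41.95, -87.70)]
counts = grid_counts(pts)
hot = counts.most_common(1)[0]  # densest cell and its count
```

Feeding the cell counts into any heat-map renderer reproduces the familiar hotspot look.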
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisU0-vOrKNvbHbb3KlhZ8uNE3O9gy1MAj4unlSz4vGEi7vchMEJFgr7qhdJZAQtnIIpFo3LFJguvpYT1c3yvD-NPKUw6G1kmsVVctXdOm3V5yREYhMxI1JbknVmId6p9vyKNRPBkEwnU0D/s1600/Chicago_Crime_DensityCode.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEisU0-vOrKNvbHbb3KlhZ8uNE3O9gy1MAj4unlSz4vGEi7vchMEJFgr7qhdJZAQtnIIpFo3LFJguvpYT1c3yvD-NPKUw6G1kmsVVctXdOm3V5yREYhMxI1JbknVmId6p9vyKNRPBkEwnU0D/s1600/Chicago_Crime_DensityCode.png" /></a></div>
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd6W2o4HdcGTnpPErUAWcwVBbtqKBMPoKzJQC6GcbPDPYvpRjFKUoqhZRAuyvjSQgweMh7KdsVbHq9jdK2EtVNM8uPgk_R5t1wO26l-6eHxEI4N8efR5muD05Xd4gc5bBtBwdXvJ3suoJz/s1600/Chicago_Crime_Density.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhd6W2o4HdcGTnpPErUAWcwVBbtqKBMPoKzJQC6GcbPDPYvpRjFKUoqhZRAuyvjSQgweMh7KdsVbHq9jdK2EtVNM8uPgk_R5t1wO26l-6eHxEI4N8efR5muD05Xd4gc5bBtBwdXvJ3suoJz/s320/Chicago_Crime_Density.png" width="320" /></a></div>
<br />
The Mathematica code used in this post can be downloaded <a href="http://bit.ly/KAeu4H" target="_blank">here</a>. Some fun analysis of crimes in Chicago will follow in future posts.Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-35370848727574206822012-04-19T13:06:00.000-04:002012-04-19T13:07:02.882-04:00Can I Trust Wikipedia?A question as dreaded as "Who's on first?" And one that I am forced to ask, as I often feel lucky. One quick and fun way to address how much you can trust a Wikipedia article is by looking at what else a contributor edits. This can easily be done through the magic of the <a href="https://www.mediawiki.org/wiki/API" target="_blank">MediaWiki</a> API and <a href="http://www.wolfram.com/mathematica/" target="_blank">Mathematica</a>. Here is the result, with a <a href="http://www.cnn.com/" target="_blank">CNN</a> inspired example, of Mitt Romney's Wikipedia <a href="http://en.wikipedia.org/wiki/Mitt_Romney" target="_blank">article</a> from April 19th, 2012.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyT4kFoWjb2p7mrmVuecRSjWojXuRMUUIFo65TMcEzAo1e_0VD3XPrnSrnsu4HgwW1fVyBsQQrd_viKZdhIX3vyRKWRPrB5mFbzhexXV9iTZKTgsH0C1TEb-hRD02u7pXkoK6PKDSXE2-f/s1600/mitt+romney+wikipedia+analysis.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="498" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyT4kFoWjb2p7mrmVuecRSjWojXuRMUUIFo65TMcEzAo1e_0VD3XPrnSrnsu4HgwW1fVyBsQQrd_viKZdhIX3vyRKWRPrB5mFbzhexXV9iTZKTgsH0C1TEb-hRD02u7pXkoK6PKDSXE2-f/s640/mitt+romney+wikipedia+analysis.jpg" width="640" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">The top 20 contributors to the Mitt Romney Wikipedia article are highlighted in yellow, and common articles edited by these contributors are highlighted in blue.</td></tr>
</tbody></table>
<br />
Personally, I see red flags when contributors of the Mitt Romney article are also subject-matter experts of <a href="http://en.wikipedia.org/wiki/Katy_Perry" target="_blank">Katy Perry</a>, <a href="http://en.wikipedia.org/wiki/Springfield_(The_Simpsons)" target="_blank">Springfield (The Simpsons)</a>, and <a href="http://en.wikipedia.org/wiki/Savage_Love" target="_blank">Savage Love</a>. I wouldn't put too much faith in its accuracy.<br />
<br />
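For readers who want to reproduce this without Mathematica, the underlying MediaWiki API calls can be sketched in Python. The two queries — the revision authors of an article, then each author's other contributions — use documented API modules, but treat the exact parameters as assumptions and check the MediaWiki API reference before relying on them:

```python
from urllib.parse import urlencode

# Sketch of the MediaWiki API queries behind this analysis.  The
# module and parameter names below are believed correct but should
# be verified against the MediaWiki API documentation.
API = "https://en.wikipedia.org/w/api.php"

def revision_authors_url(title, limit=500):
    """URL listing recent revision authors of an article."""
    return API + "?" + urlencode({
        "action": "query", "prop": "revisions", "titles": title,
        "rvprop": "user", "rvlimit": limit, "format": "json",
    })

def user_contribs_url(user, limit=500):
    """URL listing an editor's other contributions."""
    return API + "?" + urlencode({
        "action": "query", "list": "usercontribs", "ucuser": user,
        "uclimit": limit, "format": "json",
    })

url = revision_authors_url("Mitt Romney")
```

Tallying the article titles returned by the second query for each top contributor gives the "common articles" highlighted in blue above.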
A copy of the notebook can be downloaded <a href="http://bit.ly/I5gcI5" target="_blank">here</a>. To use it, simply copy the title of the Wikipedia article of interest and paste it into the third line. More to come the next time I skip lunch!Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-12179900385973837972011-11-26T11:32:00.001-05:002011-12-16T11:33:52.001-05:00Making Sense of Census-less Data<div class="separator" style="clear: both; text-align: left;">
<span style="text-align: -webkit-auto;">I have always found that supplemental 3D visualizations help provide better insights on census data. While playing with the compiled data from the 2000 census, from </span><a href="http://www.cityofchicago.org/content/city/en/depts/doit/supp_info/census_maps.html" style="text-align: -webkit-auto;" target="_blank">cityofchicago.org</a><span style="text-align: -webkit-auto;">, I found an interesting anomaly.</span></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhE-jC6qV0OvbPqAeVsKTVg1AzczmGA_NRVJdR61E-ASV7fdeZvIsEf73KsizLvvQ-LUWhQWZ-Xytzki0u38aLtm17xKTS9G0sEq-voeek0THozYvvHt2s0pv8yAaXy3E66uvebPEJs6W3N/s1600/Income_MedianHousehold.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhE-jC6qV0OvbPqAeVsKTVg1AzczmGA_NRVJdR61E-ASV7fdeZvIsEf73KsizLvvQ-LUWhQWZ-Xytzki0u38aLtm17xKTS9G0sEq-voeek0THozYvvHt2s0pv8yAaXy3E66uvebPEJs6W3N/s640/Income_MedianHousehold.png" width="488" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidFBk7vXIcYha49A7xFIJ4CyKr8lKVusOuTe86L4LgWAuaGqvl1iJ2c783ljjB_7dJd9n9qSYQauP-wnZluDdy2Sz-pjon_O2U3q25S7yA89SPxYtl7VaB_TBBq389ibQX5QRG_1HDFfLW/s1600/Chicago3Dmap.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" height="595" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEidFBk7vXIcYha49A7xFIJ4CyKr8lKVusOuTe86L4LgWAuaGqvl1iJ2c783ljjB_7dJd9n9qSYQauP-wnZluDdy2Sz-pjon_O2U3q25S7yA89SPxYtl7VaB_TBBq389ibQX5QRG_1HDFfLW/s640/Chicago3Dmap.png" width="640" /></a></div>
<br />
After some investigation, I found the red spike represents 15 families with a median income of $255,000. Was there mischief afoot by those 15 families?<br />
<br />
I'll let the more curious readers decide. Attached <a href="http://bit.ly/vLIhTY" target="_blank">here</a> is the census data and a Mathematica notebook. The notebook has been generalized to allow users to create 3D visualizations of all provided census data.<br />
<br />
Download <a href="http://bit.ly/vLIhTY" target="_blank">here</a>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-65289483772302438482011-11-10T14:17:00.000-05:002011-11-10T15:18:38.979-05:00Batman Letterhead in ScribTeXAs soon as I made my last post, I got a few emails asking how to use LaTeX to create the Batman letterhead.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgssPZ7T9RDioQbpY-I_7DP7pLBADi-Lt9lyBGp6rGB4oBZefoeRSaTs5HN3_FKWUOi9HuG6Q-aimr7RSEh6AQu81rEFibucGTvQhPkliVNNzcRxFEh9SBFxJoVEA-D1Rs9NHlj8b7OpvlR/s1600/Batman_Letterhead.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgssPZ7T9RDioQbpY-I_7DP7pLBADi-Lt9lyBGp6rGB4oBZefoeRSaTs5HN3_FKWUOi9HuG6Q-aimr7RSEh6AQu81rEFibucGTvQhPkliVNNzcRxFEh9SBFxJoVEA-D1Rs9NHlj8b7OpvlR/s1600/Batman_Letterhead.jpg" /></a></div>
<br />
Through the magic of "the cloud," there is an easy way to play with the letterhead I created without downloading LaTeX. First, download the source file, <a href="http://bit.ly/rY6OFc" target="_blank">here</a>.<br />
<br />
There is a great website called <a href="http://www.scribtex.com/" target="_blank">ScribTeX</a> that allows the compilation of TeX files through the website. Signing up is painless and free. Once you have signed up and logged on, you will see the Dashboard page. Click on New Project and you will be asked to name the new project. I chose the name "Batman Letterhead."<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_VDwy84XwcGpEbCO7bu8XZMYPzbbcttIneHE-y7P6oU9t67s1bwTKhJhnF9ozdqpxsss_2beQVQcMve79uEEz4ISvdq7gHx8RJMwg-9mBe3f9-TPLW23I1zyw_t2MexVP4vdbEHW23N4L/s1600/new+proj.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="360" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_VDwy84XwcGpEbCO7bu8XZMYPzbbcttIneHE-y7P6oU9t67s1bwTKhJhnF9ozdqpxsss_2beQVQcMve79uEEz4ISvdq7gHx8RJMwg-9mBe3f9-TPLW23I1zyw_t2MexVP4vdbEHW23N4L/s640/new+proj.png" width="640" /></a></div>
<br />
Then upload the TeX file containing the letterhead.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjC328-OBD83Qx7MPBxCFHtruZ8dxfCrMh1eYyIJgo-yfqOck2dSb_UeVbXmetauHaMqkZ9WXR52E1fLcoOE5VJ9wMq9_Zm2oytaoHDhJpFAG0V3_voT8ri8dl-N08_ibeZHzGrzVKvMHa/s1600/upload.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjC328-OBD83Qx7MPBxCFHtruZ8dxfCrMh1eYyIJgo-yfqOck2dSb_UeVbXmetauHaMqkZ9WXR52E1fLcoOE5VJ9wMq9_Zm2oytaoHDhJpFAG0V3_voT8ri8dl-N08_ibeZHzGrzVKvMHa/s640/upload.png" width="640" /></a></div>
<br />
For the TeX file to compile properly, you must change the compiler to <i>latex</i> from ScribTeX's default <i>pdflatex</i> compiler. The <i>pdflatex</i> compiler compiles directly to a PDF, while the <i>latex</i> compiler creates a PostScript file before producing a PDF. The packages used to plot the Batman logo (pstricks and pst-plot) rely on PostScript and cannot be used with <i>pdflatex</i>. To change the compiler, click on "Settings" from the project page, then the "Compiler Settings" tab. After choosing <i>latex</i>, click "Save" then "Files."<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK3YFVaN2qU4jDp0x7S6eVjkZNzKJoZwTLXZRfFXgK-FEaR5b60v71ydv3VZDEi3oIyKXDYboVotmiaHCXoT_7mFGuUpBwk8L9nSnt7HmbxlOjCFXL9DH_jDqqI4LxZznc3EW73GILydmp/s1600/compiler.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgK3YFVaN2qU4jDp0x7S6eVjkZNzKJoZwTLXZRfFXgK-FEaR5b60v71ydv3VZDEi3oIyKXDYboVotmiaHCXoT_7mFGuUpBwk8L9nSnt7HmbxlOjCFXL9DH_jDqqI4LxZznc3EW73GILydmp/s640/compiler.png" width="640" /></a></div>
<br />
Now open the uploaded TeX file and you will see that it is the same file as in my previous post, with one major difference: instead of using a separate file containing the data points, this one has the data points stored within.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvhcxH5lrXb4CKFI-lovStz-BJAPWW3SmppCSwM5u3inOuh4oPlAToCxJy_8sQo5owHIcfT3fqTvbXOi25unogRh4OL4oMKgSZfe2VahcHc-h0tYuLOVNkbDOZ2YgXkMeRzMPdqxc-ZKw8/s1600/changedtex.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvhcxH5lrXb4CKFI-lovStz-BJAPWW3SmppCSwM5u3inOuh4oPlAToCxJy_8sQo5owHIcfT3fqTvbXOi25unogRh4OL4oMKgSZfe2VahcHc-h0tYuLOVNkbDOZ2YgXkMeRzMPdqxc-ZKw8/s640/changedtex.png" width="640" /></a></div>
<br />
To compile the letterhead, click "Compile" and a new window will open with the compiled PDF.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2dzXO4z6aTGml7GnYxjR69nZJYBz2zN43o-DhXJfAiliJbYYk9eI1S_Qsc3snhNfHujPAYFQQxsdu0vUpJ7hinsjXAGDiUJnsT43CCl83VxGe-lJGzPUJAARFlR8qLyHg8KCDmiIQRKdX/s1600/compiledpdf.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2dzXO4z6aTGml7GnYxjR69nZJYBz2zN43o-DhXJfAiliJbYYk9eI1S_Qsc3snhNfHujPAYFQQxsdu0vUpJ7hinsjXAGDiUJnsT43CCl83VxGe-lJGzPUJAARFlR8qLyHg8KCDmiIQRKdX/s640/compiledpdf.png" width="640" /></a></div>
<br />
The TeX file used in this post can be found <a href="http://bit.ly/rY6OFc" target="_blank">here</a>. Enjoy TeXing as Batman!Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-33400765113749023052011-11-10T13:04:00.001-05:002011-11-10T13:05:40.515-05:00Batman LetterheadAfter reading a post from <a href="http://www.hardocp.com/news/2011/07/29/batman_equation/" target="_blank">HardOCP</a> that gave the roots of the Batman symbol, I could not help imagining the letterhead of the world's greatest detective. The letterhead needs to say "Batman is classy."<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgssPZ7T9RDioQbpY-I_7DP7pLBADi-Lt9lyBGp6rGB4oBZefoeRSaTs5HN3_FKWUOi9HuG6Q-aimr7RSEh6AQu81rEFibucGTvQhPkliVNNzcRxFEh9SBFxJoVEA-D1Rs9NHlj8b7OpvlR/s1600/Batman_Letterhead.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgssPZ7T9RDioQbpY-I_7DP7pLBADi-Lt9lyBGp6rGB4oBZefoeRSaTs5HN3_FKWUOi9HuG6Q-aimr7RSEh6AQu81rEFibucGTvQhPkliVNNzcRxFEh9SBFxJoVEA-D1Rs9NHlj8b7OpvlR/s1600/Batman_Letterhead.jpg" /></a></div>
<div>
<br />
I created this letterhead using the "Batman equation", Mathematica, and LaTeX. The computer in the Bat Cave is probably Unix-based and cannot use Microsoft Word. Therefore, Batman must use LaTeX. The first step to creating this letterhead is to plot the symbol. A post from <a href="http://playingwithmathematica.com/2011/08/06/the-batman-curve/" target="_blank">Playing With Mathematica</a> has done this for me already with seven simple lines.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgswIapieDJy1yDw2rOJE31he7R-uQcQ-RpyUni5OxvpuiT5LqGS1GUX_Jl5bIUm8JNM90RINu9A67GAoWdwU-ZWoo6-SwwMBzhN3WAgn8WrMsQoRIwu05n5g1cgj7_geMMRUywVOcTYGsD/s1600/BatmanEquations.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="87" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgswIapieDJy1yDw2rOJE31he7R-uQcQ-RpyUni5OxvpuiT5LqGS1GUX_Jl5bIUm8JNM90RINu9A67GAoWdwU-ZWoo6-SwwMBzhN3WAgn8WrMsQoRIwu05n5g1cgj7_geMMRUywVOcTYGsD/s640/BatmanEquations.png" width="640" /></a></div>
<div>
<br /></div>
<div>
The Show function is used to overlay the six plots to reveal the desired product.<br />
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg41wNuAmxUY4rAacOfME8Omo6Q5-ubAz66DKOYd1x2K7OAdbDtV2uHg19aPpV4HyhEy_tTQCOVf-WgqyjLJWMtp2DX2jVGGMcaNz_5eqDtSEjPxQKCPS9de83elTSVcGOVPjBrMW8minJ5/s1600/BatmanPlot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg41wNuAmxUY4rAacOfME8Omo6Q5-ubAz66DKOYd1x2K7OAdbDtV2uHg19aPpV4HyhEy_tTQCOVf-WgqyjLJWMtp2DX2jVGGMcaNz_5eqDtSEjPxQKCPS9de83elTSVcGOVPjBrMW8minJ5/s320/BatmanPlot.png" width="320" /></a></div>
<div>
<br />
To extract the points used for the final plot, the following command must be executed for each root equation:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgl9BWZvVDZPw4CZuRnfn8RtuRxF-CuDMvVjQpJUl7QfsWbsjeteTQ5c0nxeF_yoRMS7f-zInx2Y6LDjKJHgGrN96UAief0n0SYWKH4NVlWLgM7smYAOI2nbQhabIvIdRrWBSO-O7UQkob3/s1600/Batman-extractplot.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="23" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgl9BWZvVDZPw4CZuRnfn8RtuRxF-CuDMvVjQpJUl7QfsWbsjeteTQ5c0nxeF_yoRMS7f-zInx2Y6LDjKJHgGrN96UAief0n0SYWKH4NVlWLgM7smYAOI2nbQhabIvIdRrWBSO-O7UQkob3/s320/Batman-extractplot.png" width="320" /></a></div>
<br />
<br />
<br />
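For readers without Mathematica, the extract-to-file step can be approximated in Python. The sketch below is illustrative only: it samples the large ellipse (x/7)&sup2; + (y/3)&sup2; = 1 restricted to |x| &ge; 3, which is roughly the wing outline of the Batman curve, and writes whitespace-separated x&ndash;y pairs in the plain format that pstricks' \readdata can load. The function names are mine, not from the post; the six actual root equations would each be handled the same way.

```python
import numpy as np

def wing_points(n=200):
    """Sample the ellipse (x/7)^2 + (y/3)^2 = 1 with |x| >= 3 -- roughly
    the wing outline of the Batman curve (an illustrative stand-in for
    the six root equations from the post)."""
    x = np.concatenate([np.linspace(-7.0, -3.0, n), np.linspace(3.0, 7.0, n)])
    y = 3.0 * np.sqrt(1.0 - (x / 7.0) ** 2)
    # Upper and lower halves of the ellipse.
    return np.concatenate([x, x]), np.concatenate([y, -y])

def write_dat(path="BatData.dat"):
    """Write 'x y' pairs, one per line -- the layout \\readdata expects."""
    xs, ys = wing_points()
    with open(path, "w") as fh:
        for x, y in zip(xs, ys):
            fh.write(f"{x:.6f} {y:.6f}\n")
```

Repeating this for each root equation and concatenating the results yields a single .dat file of the whole symbol.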
After extracting and compiling all data points into one .dat file, we are ready to create the letterhead in LaTeX. The following LaTeX code relies on the pstricks and pst-plot packages (plus hyperref for the \href link). It plots the points from the saved data file and draws the horizontal rule that sets off Batman's contact information.<br />
<br />
<syntaxhighlighter class="brush: plain; gutter: false;">
\begin{document}
\readdata{\BatData}{BatData.dat}
\noindent\begin{minipage}[b]{5cm}
\begin{center}
\psset{xunit=0.35cm,yunit=0.5cm}
\begin{pspicture}(0,0)(0,2)
\listplot[plotstyle=dots]{\BatData}
\end{pspicture}
\end{center}
\end{minipage}
\hfill
\begin{minipage}[b]{7cm}
\begin{flushright}
\footnotesize{\itshape 1007 Mountain Drive {\scriptsize$\bullet$} Gotham City, NY 10027}
\end{flushright}
\end{minipage}
\vskip-2mm
{\hspace{40mm}\hfill\rule[0.5mm]{130mm}{0.5pt}}
\vskip-2mm
\hfill
\begin{minipage}[b]{7cm}
\begin{flushright}
\footnotesize{\itshape 555.555.5555 {\scriptsize$\bullet$} \href{mailto:Bruce@WayneCorp.com}{Bruce@WayneCorp.com}}
\end{flushright}
\end{minipage}
</syntaxhighlighter></div>
<div>
<br /></div>
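One note on the code above: it begins at \begin{document}, so the preamble is not shown. A minimal preamble consistent with the packages mentioned in this post (plus hyperref, which \href requires) might look like the following sketch. Keep in mind that pstricks documents are traditionally compiled with latex and dvips rather than pdflatex.

```latex
\documentclass[11pt]{article}
\usepackage{pstricks}   % PostScript drawing macros
\usepackage{pst-plot}   % provides \readdata and \listplot
\usepackage{hyperref}   % provides \href for the mailto link
```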
<div>
All of the files discussed in this post can be downloaded <a href="http://bit.ly/sUUXDA" target="_blank">here</a>, along with a <a href="http://bit.ly/t7xLHk" target="_blank">letter</a> Batman would've sent to The Joker.</div>
<div>
<br /></div>
<div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQqJjiyAXYQmmhUY9GmPICvzEQzoG0UCPnXADSbPj9qQPAIIMbVaHJu75zKAT0ne2W_FIaUfGm2HL7OinfB7VoywYPZe31cqM46z3eWP4K1Lzf9l3bfNzKgbskc0roPs718cvEgo0iSlZM/s1600/Batman_Letter.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQqJjiyAXYQmmhUY9GmPICvzEQzoG0UCPnXADSbPj9qQPAIIMbVaHJu75zKAT0ne2W_FIaUfGm2HL7OinfB7VoywYPZe31cqM46z3eWP4K1Lzf9l3bfNzKgbskc0roPs718cvEgo0iSlZM/s1600/Batman_Letter.jpg" /></a></div>
</div>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0tag:blogger.com,1999:blog-6855656749413817524.post-44973880600046909402011-11-08T22:08:00.001-05:002012-06-21T13:38:13.898-04:00Google Insight Data to Predict Stock Market Trends, Part 1Previously, I demonstrated <a href="http://nighttimecuriosities.blogspot.com/2011/11/installing-pythonika.html" target="_blank">how</a> to install pythonika and <a href="http://nighttimecuriosities.blogspot.com/2011/11/after-reading-web-search-queries-can.html" target="_blank">why</a> weekends need to be considered when analyzing market trends. Here I will share my code for importing Google Insights data into Mathematica and provide a neat example of how to parse and analyze the data.<br />
<div>
<br /></div>
<div>
To get started, please install pythonika and download my <a href="http://bit.ly/Mwg7iq" target="_blank">sample notebook</a>. Change the first line to point to your compiled pythonika executable.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4wI2Nfy_wVWjdr-w8r_pXyw6Nt-EsjrViqy3lHmm-qyWGUqUjaZd-FqI36U4OwZ5dl-QTyLTqynjjFeJ0AOmnRkGxI4PK4luG0iXn-anRNnF9ocb2lQ5BMffs2R9w9HYElcpJx2EHVogU/s1600/path.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4wI2Nfy_wVWjdr-w8r_pXyw6Nt-EsjrViqy3lHmm-qyWGUqUjaZd-FqI36U4OwZ5dl-QTyLTqynjjFeJ0AOmnRkGxI4PK4luG0iXn-anRNnF9ocb2lQ5BMffs2R9w9HYElcpJx2EHVogU/s1600/path.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
Then replace "USERNAME" and "PASSWORD" in the following line with your google account.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuUP620YeySR-1B1dGfngWNwYDNelS7vZFLYzaTtyZUmUIiYtNN7y-xfo-oNfzqfDPUE3egWEgyXWaCYGb9uXAF2KQsIjMZ7NCsNtEF7xFiORMzM-bjPg05fbnBfU6zv48QVgyaT__mDwq/s1600/login.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhuUP620YeySR-1B1dGfngWNwYDNelS7vZFLYzaTtyZUmUIiYtNN7y-xfo-oNfzqfDPUE3egWEgyXWaCYGb9uXAF2KQsIjMZ7NCsNtEF7xFiORMzM-bjPg05fbnBfU6zv48QVgyaT__mDwq/s1600/login.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
Evaluating the Python cell will download the daily search data for "AAPL" over the last 90 days. To limit the search to the last 30 days, the date variable must be set to "today 01-m." An important note: any data beyond 90 days is averaged over a week and will need to be parsed differently from the example here.</div>
<div>
<br /></div>
<div>
This command will parse the CSV data for the dates and total daily searches:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqvkNvOhPFk_jScN1UKd5ZmIX593gwGlsrlFlQDWOXPvtCY06rqFRUDby4Tz15r3hw0ls3Df-f8kfabZ0DE4OCyeMZYA3IFJ5_QccaB3F6WX_6IkgTPPQZOEoizQrMARKQFJGYZ5gwCEch/s1600/parse.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="19" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqvkNvOhPFk_jScN1UKd5ZmIX593gwGlsrlFlQDWOXPvtCY06rqFRUDby4Tz15r3hw0ls3Df-f8kfabZ0DE4OCyeMZYA3IFJ5_QccaB3F6WX_6IkgTPPQZOEoizQrMARKQFJGYZ5gwCEch/s640/parse.png" width="640" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
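The parsing step amounts to pulling (date, count) pairs out of the CSV text. A hedged Python equivalent of the Mathematica command above (the row layout in the comments is an assumption; the real Insights export also carries header and footer sections, which the try/except simply skips):

```python
import csv
from datetime import datetime

def parse_daily_counts(lines):
    """Reduce Insights-style CSV rows to (date, total searches) pairs.
    Assumes data rows look like '2011-11-01,45'; header, footer, and
    blank rows fail to parse and are skipped."""
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        try:
            day = datetime.strptime(row[0].strip(), "%Y-%m-%d").date()
            count = int(row[1])
        except ValueError:
            continue  # not a data row (header, footer, or missing count)
        pairs.append((day, count))
    return pairs
```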
<div>
Since I am interested in the financials of Apple (AAPL), specifically its daily traded volume, I have set:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTQeAoKl9cc4-6fgyna_sK4rxpAeBGoM6nteSUtinI7w2KVR0EHsYoptkLAPCAdKxqnLrU4cPjK2POuE3CnbEWbNcjXY2Fahfan7Ji3PtSIkOgB3fEewzeiRZQIPsLjcShuneK6S1KzXTT/s1600/financedata.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTQeAoKl9cc4-6fgyna_sK4rxpAeBGoM6nteSUtinI7w2KVR0EHsYoptkLAPCAdKxqnLrU4cPjK2POuE3CnbEWbNcjXY2Fahfan7Ji3PtSIkOgB3fEewzeiRZQIPsLjcShuneK6S1KzXTT/s1600/financedata.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br />
The variables googstartdate and googenddate are automatically set to the date range indicated in the first cell. The next part of the code is used to pair up trading dates with query dates, filter out the weekends, and remove lines with missing data.</div>
<div>
<br /></div>
<div>
There may be a more efficient method in Mathematica for matching two lists of unequal lengths, but this was the first approach I thought of that could account for holidays. Essentially, I have created one set of {dates, queries} and another set of {dates, stock volume}, and I must match the dates up to create a single set of {dates, queries, stock volume}. First I joined the lists together, then I reordered the combined list by:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEURBhr96li24zfukQx5c6NwE3zyFSVsovrm4agQD81uBVD8-4XWrEVnaC-nfuZWqvMaZ3z9WnKkro137fw6W0c45lYTTcd-6oGGQxAT0Y6EbfV5YhlKIUZqMVEbrmwPWIw_2mwZPNM7yE/s1600/order.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="24" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEURBhr96li24zfukQx5c6NwE3zyFSVsovrm4agQD81uBVD8-4XWrEVnaC-nfuZWqvMaZ3z9WnKkro137fw6W0c45lYTTcd-6oGGQxAT0Y6EbfV5YhlKIUZqMVEbrmwPWIw_2mwZPNM7yE/s640/order.png" width="640" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
The new set will contain disparities from weekends and possibly blank results from the Google Insight data. To filter out these disparities from the new set, this command is applied:</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIQNcTI00aQMuFCiTGNfO88auSNyJBZFGpyAW6JpK4PlK1lG6jkpmW1H4-Xq3o7MZqcyG6frDIqoI04JfHRsm1xz_Sm-kUmeqLVDM0C9wfoK0YsjE9875-oM3k2E5dYPCJDT1TvAyOMXhk/s1600/filter.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjIQNcTI00aQMuFCiTGNfO88auSNyJBZFGpyAW6JpK4PlK1lG6jkpmW1H4-Xq3o7MZqcyG6frDIqoI04JfHRsm1xz_Sm-kUmeqLVDM0C9wfoK0YsjE9875-oM3k2E5dYPCJDT1TvAyOMXhk/s1600/filter.png" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
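In any language, the pair-and-filter logic reduces to intersecting the two date sets, which drops weekends, holidays, and blank rows in one pass. A small Python sketch with illustrative names:

```python
def match_series(queries, volumes):
    """queries: [(date, query_count)]; volumes: [(date, traded_volume)].
    Keep only dates present in both series, yielding
    [(date, query_count, traded_volume)] -- weekends and holidays have
    no trading volume, so they fall out automatically."""
    vol_by_date = dict(volumes)
    return [(d, q, vol_by_date[d]) for d, q in queries if d in vol_by_date]
```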
<div>
A quick plot of the filtered and unfiltered data will show whether the weekends were properly removed.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9xXKtOpvwDpZALKCxLRxpDECbfuwVmOljUNCpq4OHGZhyphenhyphenE9Fb4c7OJ8ZVrs4RTNOKrLrkuu4W_23fkhgLVHMsYbwleGVpk6NqBlSuwK8m-AlDOeNXz7MTsGRLQsutvHh5PskOQAZwDnK1/s1600/weekends.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9xXKtOpvwDpZALKCxLRxpDECbfuwVmOljUNCpq4OHGZhyphenhyphenE9Fb4c7OJ8ZVrs4RTNOKrLrkuu4W_23fkhgLVHMsYbwleGVpk6NqBlSuwK8m-AlDOeNXz7MTsGRLQsutvHh5PskOQAZwDnK1/s1600/weekends.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Unfiltered for weekends data</td></tr>
</tbody></table>
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyCPXl3wOH-wngTq8pYtc8ZRDeCbhKqIwvDSij6ESvefVtuRrrygdOhnVGJjinX-rNjqLPDIp2c4ZCIoh8eihysmNqmt-faks9rRRgOYyximifaE8P1w_QzWR7ZCumD0e88rB1SicFWsgz/s1600/noweekends.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjyCPXl3wOH-wngTq8pYtc8ZRDeCbhKqIwvDSij6ESvefVtuRrrygdOhnVGJjinX-rNjqLPDIp2c4ZCIoh8eihysmNqmt-faks9rRRgOYyximifaE8P1w_QzWR7ZCumD0e88rB1SicFWsgz/s1600/noweekends.png" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Filtered for weekends data</td></tr>
</tbody></table>
<div>
<br />
A few papers have claimed that query volumes can be used to predict trading volumes, but one can clearly see this is not the case with Apple. This observation can be quantified with a direct Granger causality test. I have provided the code necessary to perform the test, but I leave it to the reader to test the lag sensitivity and establish the causal relation or feedback. Below, I will explain the code used for the Granger causality test.<br />
<br />
To create a new set in the form of {ylag1, ylag2, ylag3, xlag1, xlag2, xlag3, y} with a max lag of 3:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVefG40xpdoV5jMuOHOFkzEqA0ETsYZTfI8kTPTQc662HkWHqmggjSnNnDuzHsGcAz5ywH8tMLawgxEURACCC8idAj-wHv3rfdki9wYmevE5e3w7C_qG89dqT_hzCL3YzOjl9NoZ7c2rVJ/s1600/lag.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="38" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVefG40xpdoV5jMuOHOFkzEqA0ETsYZTfI8kTPTQc662HkWHqmggjSnNnDuzHsGcAz5ywH8tMLawgxEURACCC8idAj-wHv3rfdki9wYmevE5e3w7C_qG89dqT_hzCL3YzOjl9NoZ7c2rVJ/s640/lag.png" width="640" /></a></div>
<br />
<br />
<br />
<br /></div>
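The same lagged set can be built in a few lines of Python; with maxlag = 3 each row has the form {ylag1, ylag2, ylag3, xlag1, xlag2, xlag3, y} described above (a sketch, not the notebook's code):

```python
def lagged_rows(y, x, maxlag=3):
    """Rows of [y[t-1]..y[t-maxlag], x[t-1]..x[t-maxlag], y[t]],
    one row per usable observation t."""
    rows = []
    for t in range(maxlag, len(y)):
        ylags = [y[t - k] for k in range(1, maxlag + 1)]
        xlags = [x[t - k] for k in range(1, maxlag + 1)]
        rows.append(ylags + xlags + [y[t]])
    return rows
```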
<div>
Then to perform the OLS regression and sum the squared residuals:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv8t8P-D02Ou2Cm4Fhimgiwn4h-n00gPQu_W9VceMggqJNIbIpKZ8PB0nVuF2fx7MLr7xsMgooLhXTOAmv_U8mYXoE9IQvrdF-IuZWfz5aTZyUtUnMPcVks_reRy4j0jaKsMZmXhJSmTxG/s1600/fit.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgv8t8P-D02Ou2Cm4Fhimgiwn4h-n00gPQu_W9VceMggqJNIbIpKZ8PB0nVuF2fx7MLr7xsMgooLhXTOAmv_U8mYXoE9IQvrdF-IuZWfz5aTZyUtUnMPcVks_reRy4j0jaKsMZmXhJSmTxG/s1600/fit.png" /></a></div>
<br />
<br />
<br />
<br /></div>
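The regress-and-sum step translates to ordinary least squares of the last column on the others. A numpy sketch (an intercept column is my assumption and may differ from the notebook):

```python
import numpy as np

def ols_rss(rows):
    """Regress the last column on the others (with intercept) and
    return the residual sum of squares."""
    data = np.asarray(rows, dtype=float)
    X = np.column_stack([np.ones(len(data)), data[:, :-1]])
    y = data[:, -1]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)
```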
<div>
<br /></div>
<div>
To compute the test statistic:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN5csU6rW0Ml4fPalZS3XZwJfp-epT9GKd7zVQ_2zi-s__SclN6eNaPuEt88B-iHO-7Fv064Ly4SJnDCP95oMOg4JIsJH7E9LXxUmNm8iWEAVHksDvjIlR3aX6mJ6WhJ7f_raHbcBRzEko/s1600/teststat.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiN5csU6rW0Ml4fPalZS3XZwJfp-epT9GKd7zVQ_2zi-s__SclN6eNaPuEt88B-iHO-7Fv064Ly4SJnDCP95oMOg4JIsJH7E9LXxUmNm8iWEAVHksDvjIlR3aX6mJ6WhJ7f_raHbcBRzEko/s1600/teststat.png" /></a></div>
<br /></div>
<div>
<br /></div>
<div>
<br />
And since xlag = ylag, we only need to compute one F-test to get the corresponding p-value:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGdGNPY7fcZx5fg6dV2MuPU6uqqY8azZmhVvwIwQR2SUYhdFXA-a94LErlPoe7_sHjl5pAjWjN5GaJeuopOB4lU1I3SanJRVuSTCbQeplDMJMGCWuwnH-lP9RVn0-2WUgqegoeqXHtdtgD/s1600/ftest.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiGdGNPY7fcZx5fg6dV2MuPU6uqqY8azZmhVvwIwQR2SUYhdFXA-a94LErlPoe7_sHjl5pAjWjN5GaJeuopOB4lU1I3SanJRVuSTCbQeplDMJMGCWuwnH-lP9RVn0-2WUgqegoeqXHtdtgD/s1600/ftest.png" /></a></div>
<br /></div>
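Putting the last two steps together: with the restricted (y lags only) and unrestricted (y and x lags) residual sums in hand, the statistic is F = ((RSS_r - RSS_u)/m) / (RSS_u/(T - k)), and the p-value comes from the F distribution. A sketch assuming scipy is available (the function and argument names are mine, not the notebook's):

```python
from scipy.stats import f as f_dist

def granger_f(rss_restricted, rss_unrestricted, n_obs, maxlag):
    """F statistic and p-value for a Granger causality test with
    `maxlag` lags; the unrestricted model is assumed to have
    k = 2*maxlag + 1 parameters (including an intercept)."""
    k = 2 * maxlag + 1
    fstat = ((rss_restricted - rss_unrestricted) / maxlag) / (
        rss_unrestricted / (n_obs - k)
    )
    pvalue = float(f_dist.sf(fstat, maxlag, n_obs - k))
    return fstat, pvalue
```

A small F statistic (large p-value) here means the query lags add nothing beyond the volume's own history, matching the eyeball conclusion above.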
<div>
<br /></div>
<div>
<br />
The Mathematica notebook for this post can be downloaded <a href="http://bit.ly/Mwg7iq" target="_blank">here</a>.</div>
<div>
<br /></div>Miguel Wonghttp://www.blogger.com/profile/00102716660812077803noreply@blogger.com0