Saturday, November 26, 2011

Making Sense of Census-less Data

I have always found that supplemental 3D visualizations provide better insight into census data. While playing with the compiled data from the 2000 census, published on cityofchicago.org, I found an interesting anomaly.


After some investigation, I found the red spike represents 15 families with a median income of $255,000. Was there mischief afoot by those 15 families?

I'll let the more curious readers decide. Attached here are the census data and a Mathematica notebook. The notebook has been generalized so that users can create 3D visualizations of any of the provided census data.
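In outline, the plotting step in the notebook amounts to something like this; the file name and column layout below are illustrative, not the notebook's actual ones:

(* a minimal sketch, assuming a CSV whose rows are {x, y, median income};
   "census.csv" is a placeholder for the actual data file *)
data = Import["census.csv", "CSV"];
ListPlot3D[data, PlotRange -> All, AxesLabel -> {"x", "y", "median income"}]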

Download here

Thursday, November 10, 2011

Batman Letterhead in ScribTeX

As soon as I made my last post, I got a few emails asking how to use LaTeX to create the Batman letterhead.


Through the magic of "the cloud," there is an easy way to play with the letterhead I created without downloading LaTeX. First, download the source file, here.

There is a great website called ScribTeX that lets you compile TeX files entirely in the browser. Signing up is painless and free. Once you have signed up and logged in, you will see the Dashboard page. Click on New Project and you will be asked to name the new project. I chose the name "Batman Letterhead."


Then upload the TeX file containing the letterhead.


For the TeX file to compile properly, you must change the compiler from ScribTeX's default, pdflatex, to latex. The pdflatex compiler compiles directly to a PDF, while the latex compiler goes through a PostScript stage before producing the PDF. The packages used to plot the Batman logo (pstricks and pst-plot) emit raw PostScript and therefore cannot be used with pdflatex. To change the compiler, click on "Settings" from the project page, then the "Compiler Settings" tab. After choosing latex, click "Save" and then "Files."


Now open the uploaded TeX file and you will see that it is the same file as in my previous post, with one major difference: instead of using a separate file containing the data points, this one stores the data points inline.


To compile the letterhead, click "Compile" and a new window will open with the compiled PDF.


The TeX file used in this post can be found here. Enjoy TeXing as Batman!

Batman Letterhead

After reading a post from HardOCP showing how the Batman symbol can be traced from the roots of a set of equations, I could not help imagining the letterhead of the world's greatest detective. The letterhead needs to say "Batman is classy."


I created this letterhead using the "Batman equation," Mathematica, and LaTeX. The computer in the Bat Cave is probably Unix-based and cannot run Microsoft Word; therefore, Batman must use LaTeX. The first step in creating this letterhead is to plot the symbol. A post from Playing With Mathematica has already done this for me in seven simple lines.


The Show function is used to overlay the six plots to reveal the desired product.
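In outline: each root equation is drawn with ContourPlot over its own restricted region, and the pieces are then overlaid. Here is a minimal sketch with two stand-in pieces; wing uses the elliptical component of the Batman equation, while head is a made-up placeholder rather than one of the real six:

(* each piece is a ContourPlot of one root equation over a restricted region *)
wing = ContourPlot[(x/7)^2 + (y/3)^2 == 1, {x, -7, 7}, {y, -3, 3}];
head = ContourPlot[y == 2 (1 - Abs[x]/4), {x, -4, 4}, {y, 0, 2.2}]; (* placeholder curve *)
(* Show overlays the individual graphics on one set of axes *)
Show[wing, head, PlotRange -> All]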


To extract the points used in the final plot, the following command must be executed for each root equation.
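The command follows a standard Mathematica idiom: Normal converts the plot's internal GraphicsComplex into explicit Line primitives, and Cases pulls out their coordinate lists. Using the wing piece from the sketch above:

(* harvest the {x, y} points sampled for one root equation *)
pts = Cases[Normal[wing], Line[p_] :> p, Infinity];
(* after repeating this for every piece, flatten the points and save them *)
Export["BatData.dat", Flatten[pts, 1], "Table"];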




After extracting and compiling all data points into one .dat file, we are ready to create the letterhead in LaTeX. The following LaTeX code relies on the pstricks and pst-plot packages. It plots the points in the saved data file and creates the bar that separates Batman's contact information.

% a minimal preamble for the latex -> dvips route (pstricks cannot run under pdflatex)
\documentclass{article}
\usepackage{pstricks,pst-plot}
\usepackage{hyperref}

\begin{document}
\readdata{\BatData}{BatData.dat}
\noindent\begin{minipage}[b]{5cm}
\begin{center}
\psset{xunit=0.35cm,yunit=0.5cm}
\begin{pspicture}(0,0)(0,2)
\listplot[plotstyle=dots]{\BatData}
\end{pspicture}
\end{center}
\end{minipage}
\hfill
\begin{minipage}[b]{7cm}
\begin{flushright}
\footnotesize{\itshape 1007 Mountain Drive {\scriptsize$\bullet$} Gotham City, NY 10027}
\end{flushright}
\end{minipage}
\vskip-2mm
{\hspace{40mm}\hfill\rule[0.5mm]{130mm}{0.5pt}}
\vskip-2mm
\hfill
\begin{minipage}[b]{7cm}
\begin{flushright}
\footnotesize{\itshape 555.555.5555 {\scriptsize$\bullet$} \href{mailto:Bruce@WayneCorp.com}{Bruce@WayneCorp.com}}
\end{flushright}
\end{minipage}
% (letter body goes here)
\end{document}

All of the files discussed in this post can be downloaded here, along with a letter Batman would've sent to The Joker.

Tuesday, November 8, 2011

Google Insights Data to Predict Stock Market Trends, Part 1

Previously, I demonstrated how to install Pythonika and why weekends need to be considered when analyzing market trends. Here I will share my code for importing Google Insights data into Mathematica and provide a neat example of how to parse and analyze the data.

To get started, please install Pythonika and download my sample notebook. Change the first line to point to your compiled executable of Pythonika.
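That first line is just a MathLink Install call; the path below is a placeholder for wherever your compiled binary actually lives:

(* load Pythonika over MathLink; replace the placeholder path with your own *)
Install["/path/to/Pythonika/Pythonika"]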




Then replace "USERNAME" and "PASSWORD" in the following line with your Google account credentials.
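The cell itself is Python, executed through Pythonika. As a rough sketch of its shape (the endpoints and parameter names below follow the Google Insights download scripts that circulated at the time, not necessarily my exact cell):

import urllib, urllib2

# sign in via Google's ClientLogin API ("trendspro" was the service name used for Insights)
params = urllib.urlencode({'accountType': 'GOOGLE', 'Email': 'USERNAME',
                           'Passwd': 'PASSWORD', 'service': 'trendspro',
                           'source': 'insights-example'})
resp = urllib2.urlopen('https://www.google.com/accounts/ClientLogin', params).read()
auth = [l for l in resp.splitlines() if l.startswith('Auth=')][0][len('Auth='):]

# request the daily CSV for "AAPL" over the last 90 days ("today 03-m")
query = urllib.urlencode({'q': 'AAPL', 'date': 'today 03-m',
                          'cmpt': 'q', 'content': 1, 'export': 1})
req = urllib2.Request('http://www.google.com/insights/search/overviewReport?' + query,
                      headers={'Authorization': 'GoogleLogin auth=' + auth})
csv = urllib2.urlopen(req).read()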







Evaluating the Python cell will download the daily search data for "AAPL" over the last 90 days. To limit the search to the last 30 days, the date variable must be set to "today 01-m." An important note: any data beyond 90 days is averaged over a week and will need to be parsed in a manner different from the example here.

This command will parse the CSV data for the dates and total daily searches:
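Assuming the downloaded CSV text lands in a Mathematica string named csv, the parsing amounts to something like this (variable names are illustrative):

(* split the CSV into rows, then keep only {date, count} rows such as {"2011-08-15", 47} *)
rows = ImportString[csv, "CSV"];
queries = Cases[rows, {d_String, n_?NumericQ} /; StringMatchQ[d, DigitCharacter .. ~~ "-" ~~ __] :> {d, n}];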



Since I am interested in the financials of Apple (AAPL), specifically its daily traded volume, I have set:
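The call uses Mathematica's built-in FinancialData; the variable name volumes is my own:

(* daily traded volume for AAPL over the query window *)
volumes = FinancialData["AAPL", "Volume", {googstartdate, googenddate}];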




The variables googstartdate and googenddate are automatically set to the date range indicated in the first cell.  The next part of the code is used to pair up trading dates with query dates, filter out the weekends, and remove lines with missing data.

There may be a more efficient method in Mathematica for matching two lists of unequal lengths, but this was the first one I thought of that could account for holidays. Essentially, I have created one set of {dates, queries} and another set of {dates, stock volume}, and I must match the dates up to create a single set of {dates, queries, stock volume}. First I joined the lists together, then I reordered the combined list by:
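A reconstruction of those two steps (qrows and combined are illustrative names):

(* convert the query-date strings into {y, m, d} lists so they compare equal
   to the dates FinancialData returns *)
qrows = {DateList[First[#]][[1 ;; 3]], Last[#]} & /@ queries;
(* stack the two lists; the query rows deliberately precede the volume rows *)
combined = Join[qrows, volumes];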




The new set will contain disparities from weekends and possibly blank results from the Google Insights data. To filter these disparities out of the new set, this command is applied:
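One compact way to express the filter (again a reconstruction, not necessarily the exact command):

(* group rows by date: a date with both a query row and a volume row forms a pair,
   which drops weekends, holidays, and blank Insights entries in one pass *)
paired = SortBy[Cases[GatherBy[combined, First], {{d_, q_}, {d_, v_}} :> {d, q, v}], First];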



A quick plot of the filtered and unfiltered data will show whether the weekends were properly filtered out.
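For example:

(* overlay the unfiltered query series on the weekend-filtered one *)
DateListPlot[{qrows, paired[[All, {1, 2}]]}, Joined -> True]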

Data unfiltered for weekends

Data filtered for weekends

There have been a few papers claiming that query volumes can be used to predict trading volumes, but anyone can clearly see this is not the case with Apple. This observation can be quantified with a direct Granger causality test. I have provided the code necessary to perform the test, but I leave it to the reader to test the lag sensitivity and establish the causal relation or feedback. Below I explain the code used for the Granger causality test.

To create a new set in the form of {ylag1, ylag2, ylag3, xlag1, xlag2, xlag3, y} with a max lag of 3:
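A sketch of that construction (qry, vol, and lagged are my names, standing in for the notebook's variables):

(* build rows of the form {ylag1, ylag2, ylag3, xlag1, xlag2, xlag3, y} *)
maxlag = 3;
qry = paired[[All, 2]]; (* x: query counts  *)
vol = paired[[All, 3]]; (* y: traded volume *)
lagged = Table[
  Join[vol[[t - 1 ;; t - maxlag ;; -1]], qry[[t - 1 ;; t - maxlag ;; -1]], {vol[[t]]}],
  {t, maxlag + 1, Length[vol]}];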





Then to perform the OLS regression and sum the squared residuals:
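Using LinearModelFit, which accepts rows of the form {predictors..., response} (a reconstruction):

(* unrestricted model: y on its own three lags plus the three x lags *)
vars = {yl1, yl2, yl3, xl1, xl2, xl3};
unres = LinearModelFit[lagged, vars, vars];
rssU = Total[unres["FitResiduals"]^2];
(* restricted model: y on its own lags only *)
restr = LinearModelFit[lagged[[All, {1, 2, 3, 7}]], {yl1, yl2, yl3}, {yl1, yl2, yl3}];
rssR = Total[restr["FitResiduals"]^2];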






To compute the test statistic:
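The Granger F statistic compares the two residual sums, with maxlag restrictions in the numerator and n - 2 maxlag - 1 residual degrees of freedom in the denominator:

n = Length[lagged];
fstat = ((rssR - rssU)/maxlag)/(rssU/(n - 2 maxlag - 1));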




And since the number of x lags equals the number of y lags, we only need to compute one F-test to get the corresponding p-value:
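Using the built-in F-ratio distribution:

(* right-tail p-value of the F statistic *)
pval = 1 - CDF[FRatioDistribution[maxlag, n - 2 maxlag - 1], fstat]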




The Mathematica notebook for this post can be downloaded here.

Monday, November 7, 2011

Installing Pythonika

Pythonika enables Mathematica to run Python. I ran into some problems installing the version from the official Pythonika webpage with Mathematica 8 and OS X Lion 10.7.2. Here are some quick tips for those performing a similar install.

  • Download my appended version, here
  • Install 64-bit Python 2.7.1, here
  • Soft-link mathlink.framework from inside the Mathematica application into your ~/Library/Frameworks directory using this command in the terminal:
    • ln -s /Applications/Mathematica.app/SystemFiles/Links/MathLink/DeveloperKit//CompilerAdditions/mathlink.framework ~/Library/Frameworks/


In my appended version, the main modifications were to the file Makefile.osx. For everything to compile properly, the new makefile changes CADDSDIR to point to ${MLINKDIR}/CompilerAdditions, deletes the SYS path, specifies CC=g++, and adds -lstdc++ before -framework CoreFoundation. Simple examples of how to use Pythonika can be found in the Pythonika.nb file on the project's Google Code site.

Web Search Queries Predict Stock Market Trading Volumes, really?

After reading "Web search queries can predict stock market volumes" I decided to replicate the study using Google Insights for Search to look for more meaningful trends.  A more in-depth post will be done later, here I am just presenting some cautious considerations for the readers of the study.

Part of the paper's curious methodology was its filtering of non-working days. Using "AAPL" as an example, we can see why. How would the inclusion of the increased weekend queries influence the tests for Granger causality?


Normalized daily trading and query volumes for AAPL over the last 90 days, excluding weekends

Normalized daily trading and query volumes for AAPL over the last 90 days, including weekends