Tuesday, June 5, 2012

First Attempt at Modeling Crime in Chicago

In my last post, I showed how to use the Socrata Open Data API (SODA) to download data on crimes reported in Chicago, plot their locations, and count the number of crimes that occurred within a given area. This post chronicles my first attempts to model that data in Mathematica.

The Time Series add-on for Mathematica 8 costs about $295, so I decided to write my own autoregressive (AR) model package as a starting point for modeling crime in Chicago. My Mathematica AR code is based on the FitAR R package and its associated paper. The mathematics and derivations of autoregressive models are covered extensively elsewhere, so I will not explain them here. The alpha version of my code can be found here; note that it is incomplete and does not yet include all the functions of FitAR or the Mathematica Time Series add-on.
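For readers who want a concrete picture of what fitting an AR(p) model involves, here is a minimal Python sketch. It is not the Mathematica package described here, and it does not use FitAR's method. The tables below report maximum-likelihood estimates (MLE). This sketch instead uses the simpler Yule-Walker equations, solved with the Levinson-Durbin recursion, which usually gives similar but not identical coefficients. The function names are hypothetical.

```python
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariances gamma(0), ..., gamma(max_lag) of x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def yule_walker(x, p):
    """Fit AR(p) by the Yule-Walker equations, solved with Levinson-Durbin.

    Returns (phi, sigma2, mu): phi[j] is the coefficient on lag j + 1,
    sigma2 is the innovation-variance estimate, mu is the sample mean.
    This is NOT exact maximum likelihood; estimates are usually close
    to, but not identical to, the MLE.
    """
    gamma = autocovariance(x, p)
    phi = []
    sigma2 = gamma[0]
    for k in range(1, p + 1):
        # Partial autocorrelation at lag k (the "reflection coefficient").
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2
        # Update the lag 1..k-1 coefficients, then append the new lag-k one.
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2 *= 1.0 - kappa ** 2
    return phi, sigma2, sum(x) / len(x)


# Usage: simulate an AR(1) series with phi = 0.7 and recover the coefficient.
random.seed(0)
series = [0.0]
for _ in range(5000):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))
phi, sigma2, mu = yule_walker(series, 1)
print(phi, sigma2, mu)  # phi[0] should land near 0.7
```

Levinson-Durbin avoids inverting the p-by-p autocovariance matrix directly. It also yields the partial autocorrelations along the way, which are handy for choosing p.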

At this point, my code produces fits comparable to those from the FitAR package. Using R's built-in lynx dataset, I produced the following fits and their residual autocorrelation plots:


Mathematica

AR(1) Fit
Parameter   MLE        sd          Z-ratio
phi(1)      0.71729    0.0652589   10.9914
mu          1537.94    364.477     4.21957
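The sd column can be checked quickly. For an AR(1) model, the large-sample standard error of the coefficient estimate is sqrt((1 - phi^2)/n), and the Z-ratio is simply MLE/sd. Assuming the fit used the full lynx series (n = 114 annual observations, 1821-1934), this formula reproduces the reported 0.0652589. A minimal Python sketch follows; the helper name `ar1_phi_se` is hypothetical.

```python
import math


def ar1_phi_se(phi, n):
    """Large-sample standard error of an AR(1) coefficient: sqrt((1 - phi^2) / n)."""
    return math.sqrt((1.0 - phi ** 2) / n)


n = 114            # length of R's lynx series (annual counts, 1821-1934)
phi_hat = 0.71729  # Mathematica AR(1) estimate from the table above
se = ar1_phi_se(phi_hat, n)
z = phi_hat / se   # Z-ratio = MLE / sd
print(round(se, 5), round(z, 2))  # prints 0.06526 10.99
```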

AR(4) Fit
Parameter   MLE         sd          Z-ratio
phi(1)      1.12428     0.0905874   12.411
phi(2)      -0.716667   0.136707    -5.24237
phi(3)      0.262661    0.136707    1.92135
phi(4)      -0.253983   0.0905874   -2.80374
mu          1537.94     135.755     11.3288

Subset AR(1,2,4,5,7,10,11) Fit
Parameter   MLE         sd          Z-ratio
phi(1)      0.820455    0.0178649   45.9256
phi(2)      -0.632818   0.0972914   -6.50436
phi(4)      -0.141951   0.0684639   -2.07337
phi(5)      0.141893    0.0748193   1.89647
phi(7)      0.202038    0.0992876   2.03488
phi(10)     -0.314105   0.0917768   -3.42249
phi(11)     -0.368658   0.0870617   -4.23444
mu          6.68591     0.100657    66.4225

R

AR(1) Fit
Parameter   MLE        sd          Z-ratio
phi(1)      0.717303   0.0652577   10.9918
mu          1538.02    363.986     4.22548

AR(4) Fit
Parameter   MLE         sd          Z-ratio
phi(1)      1.12463     0.0905803   12.4158
phi(2)      -0.717396   0.1367150   -5.24738
phi(3)      0.263355    0.136715    1.92630
phi(4)      -0.254273   0.0905803   -2.80716
mu          1538.02     135.469     11.3533

Subset AR(1,2,4,5,7,10,11) Fit
Parameter   MLE          sd           Z-ratio
phi(1)      0.8204490    0.01786005   45.937651
phi(2)      -0.6328433   0.09731087   -6.503316
phi(4)      -0.1420888   0.06847123   -2.075161
phi(5)      0.1421388    0.07481409   1.899894
phi(7)      0.2021250    0.09927376   2.036036
phi(10)     -0.3140954   0.09178210   -3.422186
phi(11)     -0.3686589   0.08706171   -4.234455
mu          6.6859329    0.09999207   66.864631
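The relative difference between the two sets of estimates puts a number on "comparable." The small Python sketch below computes it from the AR(4) values copied out of the Mathematica and R tables above. The `relative_differences` helper is only for illustration.

```python
# (parameter, Mathematica MLE, R FitAR MLE), copied from the AR(4) tables above
ar4 = [
    ("phi(1)", 1.12428, 1.12463),
    ("phi(2)", -0.716667, -0.717396),
    ("phi(3)", 0.262661, 0.263355),
    ("phi(4)", -0.253983, -0.254273),
    ("mu", 1537.94, 1538.02),
]


def relative_differences(rows):
    """Return {name: |a - b| / |b|} for each (name, a, b) row, with b as the reference."""
    return {name: abs(a - b) / abs(b) for name, a, b in rows}


for name, rel in relative_differences(ar4).items():
    print(f"{name:7s} {100 * rel:.3f}%")
```

For the AR(4) fits, the largest gap is in phi(3), at roughly 0.26%.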



Now that my code reproduces the FitAR results closely, the next step is figuring out whether an autoregressive model can adequately describe Chicago's crime data. The lynx data above is almost trivial compared with the chaotic mess of the crime data.

Time Series Plot of lynx data

Plot of daily totals of crimes in Chicago.
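Turning individual crime reports into a daily-totals series like the one plotted above is a simple group-and-count. Below is a minimal Python sketch. It assumes each downloaded record is a dict with an ISO 8601 timestamp string; the field name `date` and the sample records are hypothetical. Days with no reports are filled in with 0 so the series has no gaps.

```python
from collections import Counter
from datetime import date, timedelta


def daily_totals(records):
    """Count records per calendar day, filling days with no reports with 0.

    Assumes (hypothetically) each record has a 'date' string such as
    "2012-06-01T13:45:00"; only the first 10 characters are used.
    """
    counts = Counter(date.fromisoformat(r["date"][:10]) for r in records)
    if not counts:
        return []
    day, last = min(counts), max(counts)
    series = []
    while day <= last:
        series.append((day, counts[day]))  # Counter returns 0 for missing days
        day += timedelta(days=1)
    return series


records = [
    {"date": "2012-06-01T13:45:00"},
    {"date": "2012-06-01T22:10:00"},
    {"date": "2012-06-03T02:30:00"},
]
for day, n in daily_totals(records):
    print(day, n)  # 2012-06-01 2, then 2012-06-02 0, then 2012-06-03 1
```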

Just for fun, here is the residual autocorrelation plot of an AR(5) model fit to the crime series above:


Not a model anyone should ever depend on. The files used in this post can be found here.
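The residual autocorrelation plots in this post show, at each lag, the sample autocorrelation of a fitted model's residuals. A well-specified model leaves residuals that look like white noise, so large spikes point to structure the model missed. Below is a minimal Python sketch of that diagnostic. It uses the common ±1.96/sqrt(n) approximation for the 95% band; FitAR's own plots may use more refined limits. The function names are hypothetical.

```python
import math


def residual_acf(residuals, max_lag):
    """Sample autocorrelations r(1), ..., r(max_lag) of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    d = [e - mean for e in residuals]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def flag_lags(residuals, max_lag):
    """Lags whose autocorrelation falls outside the approximate 95% band +/-1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return [k for k, r in enumerate(residual_acf(residuals, max_lag), start=1)
            if abs(r) > bound]


# Usage: a strongly alternating residual series is badly autocorrelated.
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
print(flag_lags(alternating, 5))  # prints [1, 2, 3, 4, 5]
```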
