<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Gosset&#039;s student</title>
	<atom:link href="http://gossetsstudent.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://gossetsstudent.wordpress.com</link>
	<description>&#34;Fisher would have discovered it anyway...&#34;</description>
	<lastBuildDate>Fri, 16 Dec 2011 16:09:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='gossetsstudent.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Gosset&#039;s student</title>
		<link>http://gossetsstudent.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://gossetsstudent.wordpress.com/osd.xml" title="Gosset&#039;s student" />
	<atom:link rel='hub' href='http://gossetsstudent.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Machine learning techniques in the biomedical literature</title>
		<link>http://gossetsstudent.wordpress.com/2011/12/16/machine-learning-techniques-in-the-biomedical-literature/</link>
		<comments>http://gossetsstudent.wordpress.com/2011/12/16/machine-learning-techniques-in-the-biomedical-literature/#comments</comments>
		<pubDate>Fri, 16 Dec 2011 09:52:29 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cancer classification]]></category>
		<category><![CDATA[machine learning]]></category>
		<category><![CDATA[random forests]]></category>
		<category><![CDATA[support vector machines]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=269</guid>
		<description><![CDATA[There are relatively few articles published on using machine learning techniques on what many would consider &#8220;classical&#8221; biomedical study designs (e.g. a sample size of 200 and about 10 parameters) and approaches to dealing with . But they may start being published. This is a list to get going with. Not all of the articles [...]]]></description>
			<content:encoded><![CDATA[<p>There are relatively few published articles on applying machine learning techniques to what many would consider &#8220;classical&#8221; biomedical study designs (e.g. a sample size of 200 and about 10 parameters) and approaches to dealing with .  But they may start being published.  This is a list to get going with.  Not all of the articles below fit the above criteria, but I&#8217;ve kept them here as they&#8217;re interesting (at least to me).  </p>
<p>This post was motivated by <a href="http://stats.stackexchange.com/questions/1856/application-of-machine-learning-techniques-in-small-sample-clinical-studies">this question</a> on Cross Validated.  I will add to the list as I find more articles or people point them out to me.  It&#8217;s very short at the moment!  Let me know of any broken links.  </p>
<p><strong>Articles</strong><br />
Statnikov A, Wang L, Aliferis CF. <a href="http://www.ncbi.nlm.nih.gov/pubmed/18647401">A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification</a>. BMC Bioinformatics. 2008 Jul 22;9:319.</p>
<p>Van Loon K, Guiza F, Meyfroidt G, Aerts JM, Ramon J, Blockeel H, Bruynooghe M, Van Den Berghe G, Berckmans D.  <a href="http://www.ncbi.nlm.nih.gov/pubmed?term=19745380%20">Dynamic data analysis and data mining for prediction of clinical stability</a>.  Stud Health Technol Inform. 2009;150:590-4.</p>
<p>Luaces O, Taboada F, Albaiceta GM, Domínguez LA, Enríquez P, Bahamonde A; GRECIA Group. Predicting the probability of survival in intensive care unit patients from a small number of variables and training examples. Artif Intell Med. 2009 Jan;45(1):63-76. Epub 2009 Jan 29.</p>
<p>Wu TT, Chen YF, Hastie T, Sobel E, Lange K. Genome-wide association analysis by lasso penalized logistic regression.  Bioinformatics. 2009 Mar 15;25(6):714-21. Epub 2009 Jan 28.</p>
<p>Schwaighofer A, Schroeter T, Mika S, Blanchard G. <a href="http://www.ncbi.nlm.nih.gov/pubmed/19519325">How wrong can we get? A review of machine learning approaches and error bars</a>. Comb Chem High Throughput Screen. 2009 Jun;12(5):453-68.</p>
<p>Huang H, Chanda P, Alonso A, Bader JS, Arking DE. <a href="http://www.ncbi.nlm.nih.gov/pubmed/21829371">Gene-based tests of association</a>. PLoS Genet. 2011 Jul;7(7):e1002177. Epub 2011 Jul 28.</p>
<p>Liu Z, Shen Y, Ott J. <a href="http://www.ncbi.nlm.nih.gov/pubmed/21958005">Multilocus association mapping using generalized ridge logistic regression</a>. BMC Bioinformatics. 2011 Sep 29;12:384.</p>
<p><strong>Theses</strong><br />
Hug, Caleb W. <a href="http://dspace.mit.edu/handle/1721.1/38326">Predicting the risk and trajectory of intensive care patients using survival models</a>. Massachusetts Institute of Technology, 2006.</p>
<p><strong>Talks / slides / videos:</strong><br />
Victoria Stodden&#8217;s <a href="http://www.stanford.edu/~vcs/talks/MicrosoftMay082008.pdf">slides</a></p>
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/cancer-classification/'>cancer classification</a>, <a href='http://gossetsstudent.wordpress.com/tag/machine-learning/'>machine learning</a>, <a href='http://gossetsstudent.wordpress.com/tag/random-forests/'>random forests</a>, <a href='http://gossetsstudent.wordpress.com/tag/support-vector-machines/'>support vector machines</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2011/12/16/machine-learning-techniques-in-the-biomedical-literature/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>
	</item>
		<item>
		<title>Logistic regression &#8211; simulation for a power calculation&#8230;</title>
		<link>http://gossetsstudent.wordpress.com/2010/11/18/logistic-regression-simulation-for-a-power-calculation/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/11/18/logistic-regression-simulation-for-a-power-calculation/#comments</comments>
		<pubDate>Thu, 18 Nov 2010 17:58:38 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Logistic regression]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=205</guid>
		<description><![CDATA[Please note - I&#8217;ve spotted a problem with the approach taken in this post &#8211; it seems to underestimate power in certain circumstances. I&#8217;ll post again with a correction or a more full explanation when I&#8217;ve sorted it. So, I posted an answer on cross validation regarding logistic regression.   I thought I&#8217;d post it [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>
<strong>Please note </strong>- <em>I&#8217;ve spotted a problem with the approach taken in this post &#8211; it seems to underestimate power in certain circumstances.  I&#8217;ll post again with a correction or a more full explanation when I&#8217;ve sorted it. </em> </p></blockquote>
<p>So, I posted an answer on Cross Validated regarding logistic regression.   I thought I&#8217;d post it in a little more depth here, with a few illustrative figures.  It&#8217;s based on the approach which <a href="http://stats.stackexchange.com/questions/2988/sample-size-calculation-for-univariate-logistic-regression/3008#3008">Stephen Kolassa</a> described.  </p>
<p>Power calculations for logistic regression are discussed in some detail in Hosmer and Lemeshow (Ch 8.5).  One approach with R is to simulate a dataset a few thousand times under the effect size you expect, fit the model to each one, and see how often the test for the coefficient comes out significant (e.g. p &lt; 0.05).  That proportion estimates the power: if the test is significant in 95% of the simulated datasets, you have roughly 95% power.</p>
<p>In this code we use the approach which <a href="http://sas-and-r.blogspot.com/2009/06/example-72-simulate-data-from-logistic.html">Kleinman and Horton</a> use to simulate data for a logistic regression.  We first calculate the overall proportion of events.   To change that proportion, adjust intercept (and, to a lesser extent, odds.ratio).  The independent variable is assumed to be normally distributed with mean 0 and variance 1.  </p>
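<p><pre class="brush: plain;">
nn &lt;- 950                 ## sample size of each simulated dataset
runs &lt;- 10000             ## number of simulated datasets
intercept &lt;- log(9)       ## log odds of y = 1 when x = 0
odds.ratio &lt;- 1.5         ## odds ratio per unit increase in x
beta &lt;- log(odds.ratio)
proportion  &lt;-  replicate(
              n = runs,
              expr = {
                  xtest &lt;- rnorm(nn)
                  linpred &lt;- intercept + (xtest * beta)
                  prob &lt;- exp(linpred)/(1 + exp(linpred))   ## inverse logit
                  runis &lt;- runif(length(xtest),0,1)
                  ytest &lt;- ifelse(runis &lt; prob,1,0)
                  ## ytest is 0/1, so this is the proportion of observations with ytest equal to 0
                  prop &lt;- length(which(ytest &lt;= 0.5))/length(ytest)
                  }
            )
summary(proportion)
</pre></p>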
<p>This plot shows how the intercept and odds ratio affect the overall proportion of events per trial:<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/11/odds-ratio-and-intercept1.gif"><img src="http://gossetsstudent.files.wordpress.com/2010/11/odds-ratio-and-intercept1.gif?w=600&#038;h=553" alt="" title="odds ratio and intercept" width="600" height="553" class="alignnone size-full wp-image-222" /></a></p>
<p>When you&#8217;re happy that the proportion of events is right (with some prior knowledge of the dataset), you can then fit a model and calculate a p value for that model.  We use R&#8217;s built-in function replicate() to do this 10,000 times, and count the proportion of runs in which the effect is detected (i.e. p &lt; 0.05).  That proportion is essentially the power of the logistic regression for your number of cases, odds ratio and intercept.</p>
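<p><pre class="brush: plain;">
result &lt;-  replicate(
              n = runs,
              expr = {
                  ## simulate one dataset exactly as before
                  xtest &lt;- rnorm(nn)
                  linpred &lt;- intercept + (xtest * beta)
                  prob &lt;- exp(linpred)/(1 + exp(linpred))
                  runis &lt;- runif(length(xtest),0,1)
                  ytest &lt;- ifelse(runis &lt; prob,1,0)
                  ## fit the model; TRUE if the p value for the slope (row 2, column 4) is below 0.05
                  summary(model &lt;- glm(ytest ~ xtest,  family = &quot;binomial&quot;))$coefficients[2,4] &lt; .05
                  }
            )
## proportion of simulated datasets in which the slope was significant
print(sum(result)/runs)
</pre></p>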
<p>I checked it against the examples given in <a href="http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19980730)17:14%3C1623::AID-SIM871%3E3.0.CO;2-S/abstract">Hsieh, 1998</a>.  It seemed to work pretty well, giving a power within ~1% of that of the examples in table II of that paper.</p>
<p>We can do some interesting things with R.  I simulated a range of odds ratios and a range of sample sizes.  The plot of these looks like this (each line represents an odds ratio):<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/11/power-calculation.gif"><img src="http://gossetsstudent.files.wordpress.com/2010/11/power-calculation.gif?w=600" alt="" title="power calculation"   class="alignnone size-full wp-image-217" /></a></p>
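<p>For anyone wanting to reproduce a plot like the one above, here is a minimal sketch of how the simulation could be run over a grid of sample sizes and odds ratios.  It is not the code used for the figure: sim_power() is a hypothetical helper, the grid values are picked purely for illustration, and plogis() and rbinom() are swapped in for the exp() and runif() steps used earlier.  It inherits whatever caveats apply to the simulation approach itself.</p>
<p><pre class="brush: plain;">
## Sketch only: sim_power() is a hypothetical helper wrapping the simulation above
sim_power &lt;- function(nn, odds.ratio, intercept = log(9), runs = 1000, alpha = 0.05) {
  beta &lt;- log(odds.ratio)
  result &lt;- replicate(runs, {
    xtest &lt;- rnorm(nn)                          ## covariate ~ N(0, 1)
    prob  &lt;- plogis(intercept + xtest * beta)   ## inverse logit of the linear predictor
    ytest &lt;- rbinom(nn, 1, prob)                ## Bernoulli outcomes
    fit   &lt;- glm(ytest ~ xtest, family = &quot;binomial&quot;)
    summary(fit)$coefficients[2, 4] &lt; alpha     ## was the slope significant?
  })
  mean(result)                                  ## estimated power
}

## grid of sample sizes and odds ratios (values chosen for illustration only)
grid &lt;- expand.grid(nn = c(200, 500, 950), odds.ratio = c(1.25, 1.5, 2))
set.seed(1)
grid$power &lt;- mapply(sim_power, grid$nn, grid$odds.ratio,
                     MoreArgs = list(runs = 200))
grid
</pre></p>
<p>Each row of grid then holds one simulated power estimate, ready to be plotted as one line per odds ratio.  With only 200 runs per cell the estimates are noisy, so increase runs for anything beyond a quick look.</p>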
<p>We can also keep the odds ratio constant, but adjust the proportion of events per trial.  This looks like this (each line represents an event rate):<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/11/logistic.png"><img src="http://gossetsstudent.files.wordpress.com/2010/11/logistic.png?w=600&#038;h=587" alt="" title="logistic regression power - varying event rate" width="600" height="587" class="alignnone size-full wp-image-225" /></a></p>
<p>As ever, if anyone can spot an error or suggest a simpler way to do this then let me know.  I haven&#8217;t tested my simulation against any packages which calculate power for logistic regression, but if anyone can it would be great to hear from you.</p>
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/logistic-regression/'>Logistic regression</a>, <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/11/18/logistic-regression-simulation-for-a-power-calculation/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/11/odds-ratio-and-intercept1.gif" medium="image">
			<media:title type="html">odds ratio and intercept</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/11/power-calculation.gif" medium="image">
			<media:title type="html">power calculation</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/11/logistic.png" medium="image">
			<media:title type="html">logistic regression power - varying event rate</media:title>
		</media:content>
	</item>
		<item>
		<title>Science is vital &#8211; what we don&#8217;t know yet</title>
		<link>http://gossetsstudent.wordpress.com/2010/10/06/science-is-vital-what-we-dont-know-yet/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/10/06/science-is-vital-what-we-dont-know-yet/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 22:07:40 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=202</guid>
		<description><![CDATA[This post is not about R (for a change). For working UK scientists, science is vital &#8211; sign the on-line petition to preserve science funding. For my contribution of what we don&#8217;t know yet - We don&#8217;t know whether we can use biomarkers of kidney injury to personalise the doses of medications to maximise the [...]]]></description>
			<content:encoded><![CDATA[<p>This post is not about R (for a change).  For working UK scientists, science is vital &#8211; sign the on-line petition to <a href="http://scienceisvital.org.uk/">preserve science funding</a>.</p>
<p>For my contribution of what we don&#8217;t know yet -</p>
<p>We don&#8217;t know whether we can use biomarkers of kidney injury to personalise the doses of medications to maximise the dose for the patient whilst minimising any renal side effects.</p>
]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/10/06/science-is-vital-what-we-dont-know-yet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>
	</item>
		<item>
		<title>How to check if a file exists with HTTP and R</title>
		<link>http://gossetsstudent.wordpress.com/2010/09/01/how-to-check-if-a-file-exists-with-http-and-r/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/09/01/how-to-check-if-a-file-exists-with-http-and-r/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 21:05:28 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[General post]]></category>
		<category><![CDATA[Curl]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[RCurl]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=192</guid>
		<description><![CDATA[So, there&#8217;s probably an easier way to do this (please let me know if you know it)&#8230; Suppose you&#8217;re working with a system which creates (binary) files and posts them for download on a website. You know the names of the files that will be created. However, they may not have been made yet (they&#8217;re [...]]]></description>
			<content:encoded><![CDATA[<p>So, there&#8217;s probably an easier way to do this (please let me know if you know it)&#8230;</p>
<p>Suppose you&#8217;re working with a system which creates (binary) files and posts them for download on a website.  You know the names of the files that will be created.  However, they may not have been made yet (they&#8217;re generated on the fly, and appear in a vaguely random order over time).  There are several of them, and you want to know which ones are there already and, once enough have been uploaded, run an analysis.</p>
<p>I spent quite a bit of time trying to work this out, and eventually came up with the following solution:</p>
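<p><pre class="brush: plain;">
library(RCurl)
## two example URLs: the second is there to show the failure case
newurl &lt;- c(&quot;http://cran.r-project.org/web/packages/RCurl/RCurl.pdf&quot;,
            &quot;http://cran.r-project.org/web/packages/RCurl/RCurl2.pdf&quot;)
for (n in seq_along(newurl)){
   z &lt;- &quot;&quot;   ## placeholder of length 1, kept if the download fails
   ## failonerror = TRUE turns an HTTP error into an R error, which try() catches
   try(z &lt;- getBinaryURL(newurl[n], failonerror = TRUE), silent = TRUE)
   ## a successful download leaves a raw vector of the file's bytes in z
   if (length(z) &gt; 1) {print(paste(newurl[n], &quot; exists&quot;, sep = &quot;&quot;))
      } else {print(paste(newurl[n], &quot; doesn't exist&quot;, sep =  &quot;&quot;))}
   }
</pre></p>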
<p>This uses RCurl to download the file into a variable z, and then checks whether z now contains the file: z starts out as an empty string (length 1), whereas a successful download leaves a raw vector of the file&#8217;s bytes, so length(z) &gt; 1 means the file exists.  </p>
<p>If the file doesn&#8217;t exist, getBinaryURL() raises an error (because failonerror = TRUE), and your loop (if you are doing several files) will quit.  Wrapping the getBinaryURL() in try() means that the error won&#8217;t stop the loop from trying the next file (if you don&#8217;t trust me, try doing the above without the try wrapper).  You can see how wrapping this in a loop could quickly go through several files and download the ones which exist.</p>
<p>I&#8217;d really like to be able to do this, but not actually download the whole file (e.g. just the first 100 bytes) to see how many files of interest have been created, and if enough have, then download them all.  I just can&#8217;t work out how to yet &#8211; I tried the range option of getBinaryURL() but this just crashed R.  This would be useful if you are collecting data in real time, and you know you need at least (for example) 80% of the data to be available before you jump into a computationally expensive algorithm.  </p>
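<p>One possible route, sketched here under assumptions rather than tested against the system described above, is RCurl&#8217;s url.exists(), which asks the server for the headers only rather than the file itself.  The helper enough_available(), its threshold argument and its exists_fun argument are hypothetical names introduced for illustration; the 80% figure is just the example from the paragraph above.</p>
<p><pre class="brush: plain;">
## Sketch only: count how many of the expected files are available without
## downloading them. enough_available() is a hypothetical helper, not part of RCurl.
enough_available &lt;- function(urls, threshold = 0.8,
                             exists_fun = RCurl::url.exists) {
  ## check each URL in turn; treat any error as &quot;not there yet&quot;
  found &lt;- vapply(urls, function(u) {
    isTRUE(tryCatch(exists_fun(u), error = function(e) FALSE))
  }, logical(1))
  ## enough is TRUE once the fraction of available files reaches the threshold
  list(found = found, enough = mean(found) &gt;= threshold)
}

newurl &lt;- c(&quot;http://cran.r-project.org/web/packages/RCurl/RCurl.pdf&quot;,
            &quot;http://cran.r-project.org/web/packages/RCurl/RCurl2.pdf&quot;)
## status &lt;- enough_available(newurl)   ## needs RCurl and a network connection
## if (status$enough) { download with getBinaryURL() and run the analysis }
</pre></p>
<p>Whether this works will depend on the server answering header-only requests sensibly, so it is worth checking against a file you know exists and one you know doesn&#8217;t before relying on it.</p>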
<p>So, there must be an easier way to do all this, but can I find it? &#8230;</p>
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/curl/'>Curl</a>, <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/rcurl/'>RCurl</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/09/01/how-to-check-if-a-file-exists-with-http-and-r/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>
	</item>
		<item>
		<title>An HSV colour wheel in R</title>
		<link>http://gossetsstudent.wordpress.com/2010/08/09/a-hsv-colour-wheel-in-r/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/08/09/a-hsv-colour-wheel-in-r/#comments</comments>
		<pubDate>Mon, 09 Aug 2010 22:51:25 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[colour wheel]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=174</guid>
		<description><![CDATA[If you&#8217;ve read any of my previous posts, you&#8217;ll notice that they&#8217;re rather scanty on colour. There&#8217;s a reason for this. Mainly, that to get a good colour output takes some time. I recently read a commentary in Nature methods (sorry if you don&#8217;t have access to it, but this looks like it may be [...]]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;ve read any of my previous posts, you&#8217;ll notice that they&#8217;re rather scanty on colour.  There&#8217;s a reason for this: getting good colour output takes some time.  I recently read a commentary in Nature Methods (sorry if you don&#8217;t have access to it, but this looks like it may be the first part of an interesting series of articles), which discusses <a href="http://www.nature.com/nmeth/journal/v7/n8/full/nmeth0810-573.html">colour in graphics</a>.  The author suggests a colour wheel, and I thought I&#8217;d have a go in R:</p>
<p><a href="http://gossetsstudent.files.wordpress.com/2010/08/colour-wheel.png"><img src="http://gossetsstudent.files.wordpress.com/2010/08/colour-wheel.png?w=300&#038;h=300" alt="" title="colour wheel" width="300" height="300" class="alignnone size-medium wp-image-177" /></a></p>
<p>You have to click on it to read the text, sorry.  There are probably much easier ways to do it, and it takes a silly amount of time to render (several seconds! &#8211; all those nested loops), but the code below makes the colour wheel.  If you set the variables t.hue, t.sat and t.val, the bottom right box shows the resulting colour (the small box at the bottom right of the colour wheel shows the hue with sat and val set to 1.0).  On the right is the plot of val, and below is the plot of sat.  As you go anti-clockwise round from the positive x axis, the hue increases from 0.0 to 1.0.</p>
<p>So you can play around with colour, see what works and what doesn&#8217;t.  This uses the <a href="http://en.wikipedia.org/wiki/HSL_and_HSV">HSV</a> approach, which seemed okay for my purposes.  If you are more familiar with RGB, rgb2hsv() converts RGB values into HSV.  There are lots of other resources for colour in R; one of my favourites is <a href="http://research.stowers-institute.org/efg/R/Color/Chart/">here</a>, and of course you can always search <a href="http://www.r-bloggers.com/">R-bloggers</a>.</p>
<p><pre class="brush: plain;">
## colour plot

require(graphics)
t.hue &lt;- 0.65     ## this is the user entered hue, sat and value
t.sat &lt;- 0.5
t.val &lt;- 0.9
def.par &lt;- par(no.readonly = TRUE)
layout( matrix(c(1,1,2,1,1,2,3,3,4), 3, 3, byrow = TRUE))

## prepare the plot for the wheel 
x &lt;- (-100:100)*0.01
y &lt;- (-100:100)*0.01
## blank plot to prepare the axis
plot(x,y, pch = 20, col = 0, bty = &quot;n&quot;, xaxt = &quot;n&quot;, yaxt = &quot;n&quot;, ann = F) 

## make the wheel
for (x in (-100:100)*0.01){
  for (y in (-100:100)*0.01){
    theta &lt;- atan2(y,x)     # theta is the angle
    hue &lt;-  Mod(theta/(pi)) # make the hue dependent upon the angle 
    sat &lt;- (x^2 + y^2)      # saturation is the squared distance from the origin
    if (x^2 + y^2 &lt;= 1){
       # y &gt;= 0 (rather than y &gt; 0) so that the row at y = 0 is drawn too
       if (y &gt;= 0) {points(x,y, pch = 19, col = hsv(h = hue/2, s = sat, v = 1))}
       if (y &lt; 0) {points(-x,y, pch = 19, col = hsv(h = hue/2 + 0.5, s = sat, v = 1))}
      }
    }
  }
legend(&quot;center&quot;, &quot;hue&quot;, bty = &quot;n&quot;)
text(0.9,0, labels = &quot;0.0&quot;)
text(0,0.9, labels = &quot;0.25&quot;)
text(-0.9,0, labels = &quot;0.5&quot;)
text(0,-0.9, labels = &quot;0.75&quot;) 
## bottom right colour box inset into wheel
for (x in (80:100)*0.01){
  for (y in (-80:-100)*0.01){
    points (x,y, pch = 19, col = hsv(t.hue, s = 1, v = 1))
    }
  }

## right sided v scale 
x &lt;- (0:100)*0.01
y &lt;- (0:100)*0.01
plot(x,y, pch = 20, col = 0, xaxt = &quot;n&quot;, yaxt = &quot;n&quot;, bty = &quot;n&quot;, ann = F)
for (x in (50:100)*0.01){
  for (y in (0:100)*0.01){
    hue &lt;-  t.hue
    sat &lt;- 1
    points(x,y, pch = 19, col = hsv(h = hue, s = sat, v = y))
    }
  }
legend(&quot;topleft&quot;, &quot;value&quot;, bty = &quot;n&quot;)
arrows(0.0, t.val, 0.5, t.val,length = 0.01, angle = 20)

  ## bottom saturation scale 
x &lt;- (0:100)*0.01
y &lt;- (0:100)*0.01
plot(x,y, pch = 20, col = 0, xaxt = &quot;n&quot;, yaxt = &quot;n&quot;, bty = &quot;n&quot;, ann = F)
for (x in (0:100)*0.01){
  for (y in (0:50)*0.01){
    hue &lt;-  t.hue
    points(x,y, pch = 19, col = hsv(h = hue, s = x, v = 1))
    }
  }
legend(&quot;topleft&quot;, &quot;saturation&quot;, bty = &quot;n&quot;)
arrows(t.sat,1.0, t.sat, 0.5, length = 0.01, angle = 20)

## bottom right plot
x &lt;- (0:100)*0.01
y &lt;- (0:100)*0.01
plot(x,y, pch = 20, col = 0, xaxt = &quot;n&quot;, yaxt = &quot;n&quot;, bty = &quot;n&quot;, ann = F)
for (x in (0:25)*0.01){
  for (y in (0:100)*0.01){    
    points(x,y, pch = 19, col = hsv(h = t.hue, s = t.sat, v = t.val))
    }
  }
legtr &lt;- paste( &quot;hue=&quot;, t.hue, sep = &quot;&quot;)
legr  &lt;- paste( &quot;sat=&quot;, t.sat, sep = &quot;&quot;)
legbr &lt;- paste(&quot;val=&quot;, t.val, sep = &quot;&quot;)
legend(&quot;topright&quot;, legtr, bty = &quot;n&quot;)
legend(&quot;right&quot;, legr, bty = &quot;n&quot;)
legend(&quot;bottomright&quot;, legbr, bty = &quot;n&quot;)

## reset the graphics display to default
par(def.par)
</pre></p>
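<p>The nested loops above call points() once for every grid cell, which can be slow.  As a rough sketch (the variable names grid, wheel.hue and wheel.sat are made up for this example), the wheel on its own can be drawn in one vectorised pass, assuming the same 0.01 grid and the same mapping of angle to hue:</p>
<pre class="brush: plain;">
## vectorised colour wheel (sketch)
grid &lt;- expand.grid(x = (-100:100)*0.01, y = (-100:100)*0.01)
grid &lt;- grid[grid$x^2 + grid$y^2 &lt;= 1, ]     # keep the points inside the unit circle
theta &lt;- atan2(grid$y, grid$x)                # angle between -pi and pi
wheel.hue &lt;- (theta %% (2*pi)) / (2*pi)       # map one full turn onto 0..1
wheel.sat &lt;- grid$x^2 + grid$y^2              # squared distance from the origin
plot(grid$x, grid$y, pch = 19, bty = &quot;n&quot;, xaxt = &quot;n&quot;, yaxt = &quot;n&quot;,
     ann = FALSE, col = hsv(h = wheel.hue, s = wheel.sat, v = 1))
</pre>
<p>The hue labels from the script (0.25 at the top, 0.5 on the left, and so on) still apply, because the angle is mapped to hue in the same way.</p>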
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/colour-wheel/'>colour wheel</a>, <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/08/09/a-hsv-colour-wheel-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/08/colour-wheel.png?w=300" medium="image">
			<media:title type="html">colour wheel</media:title>
		</media:content>
	</item>
		<item>
		<title>Summary plots</title>
		<link>http://gossetsstudent.wordpress.com/2010/08/02/159/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/08/02/159/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 11:40:46 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=159</guid>
		<description><![CDATA[So, when you first look at some data, it&#8217;s helpful to get a feel of it. One way to do this is to do a plot or two. I&#8217;ve found myself continuously doing the same series of plots for different datasets, so in the end I wrote this short code to put all the plots [...]]]></description>
			<content:encoded><![CDATA[<p>When you first look at some data, it&#8217;s helpful to get a feel for it.  One way to do this is to draw a plot or two.  I&#8217;ve found myself repeatedly drawing the same series of plots for different datasets, so in the end I wrote this short script to put all the plots together as a time-saving device.  Not pretty, but it gets the job done.  </p>
<p>The output looks like this:<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/08/summary1.png"><img src="http://gossetsstudent.files.wordpress.com/2010/08/summary1.png?w=285&#038;h=300" alt="" title="summary" width="285" height="300" class="alignnone size-medium wp-image-169" /></a></p>
<p>At the top left is a histogram with a fitted normal curve, and at the top right a normal QQ plot with an Anderson-Darling p value.  In the middle on the left, the same data is put into different numbers of bins, to see how this affects the look of the histogram.  To the right of these, we pretend that each value is the next one in a time series with equal time intervals between readings, and plot the values in order.  Below this are the ACF and PACF plots, and at the bottom left is a boxplot labelled with the median and quartiles.  </p>
<p>Hope someone else finds this useful.  If there are easier ways to do this, let me know.  To use the code &#8211; put your data into a text file called data.txt in the working directory, as a series of numbers, make sure the nortest package is installed, and run this code: </p>
<p><pre class="brush: plain;">
## univariate data summary
require(nortest)
data &lt;- as.numeric(scan (&quot;data.txt&quot;))
# first job is to save the graphics parameters currently used
def.par &lt;- par(no.readonly = TRUE)
par(&quot;plt&quot; = c(.2,.95,.2,.8))
layout( matrix(c(1,1,2,2,1,1,2,2,4,5,8,8,6,7,9,10,3,3,9,10), 5, 4, byrow = TRUE))

#histogram on the top left
h &lt;- hist(data, breaks = &quot;Sturges&quot;, plot = FALSE)
xfit &lt;- seq(min(data),max(data),length=100)
yfit &lt;- dnorm(xfit,mean=mean(data),sd=sd(data))
## scale the density by the bin width and sample size so it matches the counts
yfit &lt;- yfit*diff(h$mids[1:2])*length(data)
plot (h, axes = TRUE, main = &quot;Sturges&quot;)
lines(xfit, yfit, col=&quot;blue&quot;, lwd=2)
leg1 &lt;- paste(&quot;mean = &quot;, round(mean(data), digits = 4))
leg2 &lt;- paste(&quot;sd = &quot;, round(sd(data),digits = 4)) 
legend(x = &quot;topright&quot;, c(leg1,leg2), bty = &quot;n&quot;)

## normal qq plot
qqnorm(data, bty = &quot;n&quot;, pch = 20)
qqline(data)
p &lt;- ad.test(data)
leg &lt;- paste(&quot;Anderson-Darling p = &quot;, round(p$p.value, digits = 4))
legend(x = &quot;topleft&quot;, leg, bty = &quot;n&quot;)

## boxplot (bottom left)
boxplot(data, horizontal = TRUE)
leg1 &lt;- paste(&quot;median = &quot;, round(median(data), digits = 4))
lq &lt;- quantile(data, 0.25)
leg2 &lt;- paste(&quot;25th percentile = &quot;, round(lq,digits = 4)) 
uq &lt;- quantile(data, 0.75)
leg3 &lt;- paste(&quot;75th percentile = &quot;, round(uq,digits = 4)) 
legend(x = &quot;top&quot;, leg1, bty = &quot;n&quot;)
legend(x = &quot;bottom&quot;, paste(leg2, leg3, sep = &quot;; &quot;), bty = &quot;n&quot;)


## the various histograms with different bins
h2 &lt;- hist(data,  breaks = (0:12 * (max(data) - min (data))/12)+min(data), plot = FALSE)
plot (h2, axes = TRUE, main = &quot;12 bins&quot;)

h3 &lt;- hist(data,  breaks = (0:10 * (max(data) - min (data))/10)+min(data), plot = FALSE)
plot (h3, axes = TRUE, main = &quot;10 bins&quot;)
 
h4 &lt;- hist(data,  breaks = (0:8 * (max(data) - min (data))/8)+min(data), plot = FALSE)
plot (h4, axes = TRUE, main = &quot;8 bins&quot;)

h5 &lt;- hist(data,  breaks = (0:6 * (max(data) - min (data))/6)+min(data), plot = FALSE)
plot (h5, axes = TRUE,main = &quot;6 bins&quot;)

## the time series, ACF and PACF
plot (data, main = &quot;Time series&quot;, pch = 20)
acf(data, lag.max = 20)
pacf(data, lag.max = 20)

## reset the graphics display to default
par(def.par)
</pre></p>
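<p>If you want to try the script above before pointing it at your own data, a minimal sketch like this writes a made-up sample to data.txt.  The sample size, mean and standard deviation are arbitrary assumptions for illustration, not values from any real dataset:</p>
<pre class="brush: plain;">
## write 200 simulated readings to data.txt, one number per line
set.seed(1)                                   # make the made-up sample reproducible
sim &lt;- rnorm(200, mean = 50, sd = 10)
write(sim, file = &quot;data.txt&quot;, ncolumns = 1)

## check that it reads back the same way the script reads it
check &lt;- as.numeric(scan(&quot;data.txt&quot;))
length(check)
</pre>
<p>Because the sample is drawn from a normal distribution, the QQ plot should sit close to the line; swapping rnorm() for something skewed, such as rexp(), shows how the panels change.</p>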
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/08/02/159/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/08/summary1.png?w=285" medium="image">
			<media:title type="html">summary</media:title>
		</media:content>
	</item>
		<item>
		<title>Visualizing 3d data &#8211; plotting quartiles separately</title>
		<link>http://gossetsstudent.wordpress.com/2010/07/30/visualizing-3d-data-plotting-quartiles-separately/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/07/30/visualizing-3d-data-plotting-quartiles-separately/#comments</comments>
		<pubDate>Fri, 30 Jul 2010 11:47:59 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[General post]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=130</guid>
		<description><![CDATA[In this previous post, we&#8217;ve looked at displaying three dimensional data.  One major problem is when there is a high density of data, it can be difficult to see what&#8217;s going on in a 3 dimensional plot. One way of looking at the data in more detail is to break it up.  Take a look [...]]]></description>
			<content:encoded><![CDATA[<p>In this <a href="http://gossetsstudent.wordpress.com/2010/07/23/turning-your-data-into-a-3d-chart/">previous post</a>, we&#8217;ve looked at displaying three dimensional data.  One major problem is when there is a high density of data, it can be difficult to see what&#8217;s going on in a 3 dimensional plot.</p>
<p>One way of looking at the data in more detail is to break it up.  Take a look at this graph:<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/07/quartiles-plot.png"><img src="http://gossetsstudent.files.wordpress.com/2010/07/quartiles-plot.png?w=600" alt="" title="quartiles plot"   class="alignnone size-full wp-image-145" /></a></p>
<p>This is a plot of air quality data for Nottingham, UK, recorded hourly in 2009 (the code to create it in base R is at the bottom of the page).  On the left is a scatterplot of ozone (vertical axis) against NO2 (horizontal axis) (plot A).  The different colours show the data grouped into quartiles by the level of ozone.  On the right are plots of NO2 against NO for the same data, with a separate plot for each quartile of the ozone data (plots B to E).  The points are all colour co-ordinated, so the red points indicating the upper quartile of the ozone data in plot A are matched by red points in plot B.</p>
<p>So you can see, by comparing plots E and D, that at the lowest quartile of ozone levels there is a greater spread of both NO2 and NO.</p>
<p>How this is done is pretty simple (most of the code is to make things vaguely pretty).  Essentially, the values for x, y and z are put into a matrix xyz.  The rows of the matrix are ordered according to the z variable.  The rows which delineate each quartile are calculated, and then plots B to E of x vs y are drawn, using only the rows for that quartile.  The axes are plotted so that they are the same scale for each of the plots.  There&#8217;s not much room for the axis labels &#8211; so these are added afterwards with the legend command.  </p>
<p>Then on the left the plot of y (on the horizontal axis) and z (on the vertical axis) is drawn, with some added lines to show where the boundaries of each quartile lie.  The colours are stored in the cols column of the xyz matrix.  Like most of my code, the graph is portable: you just need to input different values for x, y and z and re-label the names for each variable.  The <a href="http://www.box.net/shared/ucok2j31m7">original dataset</a> is the same one which I have used for my previous posts.  It is from the <a href="http://www.airquality.co.uk/data_and_statistics.php">UK airquality</a> database.  If you copy this file into your working directory and run the code below, you&#8217;ll repeat the plot.</p>
<p>Any suggestions for improvements / comments would be most appreciated!</p>
<p><pre class="brush: plain;">
## name the columns of the data
columns &lt;- c(&quot;date&quot;, &quot;time&quot;, &quot;NO&quot;, &quot;NO_status&quot;, &quot;NO_unit&quot;, &quot;NO2&quot;,
	&quot;NO2_status&quot;, &quot;NO2_unit&quot;, &quot;ozone&quot;, &quot;ozone_status&quot;, &quot;ozone_unit&quot;, 
	&quot;SO2&quot;, &quot;SO2_status&quot;, &quot;SO2_unit&quot;)
## read in the data, store it in variable data
data &lt;- read.csv(&quot;27899712853.csv&quot;, header = FALSE, skip = 7, 
	col.names = columns, stringsAsFactors = FALSE)

## now make the x,y and z variables

x &lt;- data$NO
y &lt;- data$NO2
z &lt;- data$ozone
cols &lt;- rep(1,length(z))

xyz &lt;- cbind(x, y, z, cols)
colq1 &lt;- 6
colq2 &lt;- 4
colq3 &lt;- 3
colq4 &lt;- 2

xl &lt;- &quot;NO&quot;
yl &lt;- &quot;NO2&quot;
zl &lt;- &quot;Ozone&quot;

point &lt;- 20 

# re order by z
xyz &lt;- xyz[order(xyz[,3]),]
# now define the row numbers for the quartile boundaries
maxxyz &lt;-  nrow(xyz)
q1xyz &lt;- round(maxxyz/4)
medianxyz &lt;-  round(maxxyz/2)
q3xyz &lt;- round(maxxyz*3/4)

# assign colours to the cols column (column 4) of the xyz matrix
xyz[1:q1xyz,4] &lt;- colq1
xyz[q1xyz:medianxyz,4] &lt;- colq2
xyz[medianxyz:q3xyz,4] &lt;- colq3
xyz[q3xyz:nrow(xyz),4] &lt;- colq4

# define the maximum values for x,y, and z 
# these are used to ensure all the axes are the same scale
maxx &lt;- max(x, na.rm = TRUE)
maxy &lt;- max(y, na.rm = TRUE)
maxz &lt;- max(z, na.rm = TRUE)


# now make the plot
# first job is to save the graphics parameters currently used
def.par &lt;- par(no.readonly = TRUE)
# define the margins around each plot
par(&quot;mar&quot; = c(2,2,0.5,0.5))
# make the layout for the plot
layout(matrix(c(5,1,5,2,5,3,5,4), 4, 2, byrow = TRUE))

# now do the four plots on the right
plot(xyz[q3xyz:maxxyz,1],xyz[q3xyz:maxxyz,2], col = colq4, 
	xlab = xl, ylab = yl, pch=point, xlim = c(0,maxx), 
	ylim = c(0,maxy))
legend(x = &quot;right&quot;, yl, bty = &quot;n&quot;)
legend(x = &quot;topright&quot;, &quot;B&quot;, bty = &quot;n&quot;)

plot(xyz[medianxyz:q3xyz,1],xyz[medianxyz:q3xyz,2], col = colq3,
	pch=point, xlim = c(0,maxx), ylim = c(0,maxy))
legend(x = &quot;right&quot;, yl, bty = &quot;n&quot;)
legend(x = &quot;topright&quot;, &quot;C&quot;, bty = &quot;n&quot;)

plot(xyz[q1xyz:medianxyz,1],xyz[q1xyz:medianxyz,2], col = colq2, 
	pch=point, xlim = c(0,maxx), ylim = c(0,maxy))
legend(x = &quot;right&quot;, yl, bty = &quot;n&quot;)
legend(x = &quot;topright&quot;, &quot;D&quot;, bty= &quot;n&quot;)

plot(xyz[1:q1xyz,1],xyz[1:q1xyz,2], col = colq1, pch=point, 
	xlim = c(0,maxx), ylim = c(0,maxy))
legend(x = &quot;right&quot;, yl, bty = &quot;n&quot;)
legend(x = &quot;bottom&quot;, xl, bty = &quot;n&quot;)
legend(x = &quot;topright&quot;, &quot;E&quot;, bty = &quot;n&quot;)

# now do the plot on the left
plot(xyz[,2],xyz[,3], col = xyz[,4], pch=point, xlim = c(0,maxy))
legend(x = &quot;bottom&quot;, yl, bty = &quot;n&quot;)
legend(x = &quot;right&quot;, zl, bty = &quot;n&quot;)
legend(x = &quot;topright&quot;, &quot;A&quot;, bty = &quot;n&quot;)

abline(h=xyz[q1xyz,3],col=3,lty=2)
abline(h=xyz[medianxyz,3],col=4)
abline(h=xyz[q3xyz,3],col=5,lty=2)

## reset the graphics display to default
par(def.par)
</pre></p>
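<p>The script above finds the quartiles by sorting the rows and counting.  An alternative sketch, not the method used above, is to let quantile() find the boundaries and cut() assign each reading to a group.  The z values here are simulated stand-ins for the ozone column, and the colour numbers are the same ones used above:</p>
<pre class="brush: plain;">
## quartile groups with quantile() and cut() (sketch)
set.seed(42)
z &lt;- runif(20, 0, 100)                         # made-up values standing in for ozone
qbreaks &lt;- quantile(z, probs = seq(0, 1, 0.25), na.rm = TRUE)
grp &lt;- cut(z, breaks = qbreaks, include.lowest = TRUE, labels = FALSE)
quartile.cols &lt;- c(6, 4, 3, 2)                 # lowest to highest quartile
point.cols &lt;- quartile.cols[grp]               # one colour per reading
table(grp)                                        # number of readings in each quartile
</pre>
<p>This keeps the rows in their original order, and each reading falls into exactly one quartile.  Note that cut() will stop with an error if two of the boundaries are equal, which can happen with heavily tied data.</p>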
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/07/30/visualizing-3d-data-plotting-quartiles-separately/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/07/quartiles-plot.png" medium="image">
			<media:title type="html">quartiles plot</media:title>
		</media:content>
	</item>
		<item>
		<title>Turning your data into a 3d chart</title>
		<link>http://gossetsstudent.wordpress.com/2010/07/23/turning-your-data-into-a-3d-chart/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/07/23/turning-your-data-into-a-3d-chart/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 17:28:05 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[3d graphic]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=107</guid>
		<description><![CDATA[Some charts are to help you analyse data. Some charts are to wow people. 3d charts are often the latter, but occasionally the former. In this post, we&#8217;ll look at how to turn your data into a 3d chart. Let&#8217;s use the data from this previous post. Use the code which turns the .csv spreadsheet [...]]]></description>
			<content:encoded><![CDATA[<p>Some charts are to help you analyse data.  Some charts are to wow people.  3d charts are often the latter, but occasionally the former.  In this post, we&#8217;ll look at how to turn your data into a 3d chart.  </p>
<p>Let&#8217;s use the data from this <a href="http://gossetsstudent.wordpress.com/2010/07/20/matrix-scatterplot-of-the-airquality-data-using-lattice/">previous post</a>.  Use the code which turns the .csv spreadsheet into 3 variables, x, y, and z.</p>
<p>3d charts generally need other packages.  We&#8217;ll kick off with scatterplot3d, which perhaps makes things too easy:</p>
<pre>
library(scatterplot3d)
scatterplot3d(x,y,z, highlight.3d = TRUE, angle = 75, scale.y = .5)</pre>
<p><a href="http://gossetsstudent.files.wordpress.com/2010/07/scatter3d.png"><img class="alignnone size-medium wp-image-109" title="scatter3d" src="http://gossetsstudent.files.wordpress.com/2010/07/scatter3d.png?w=300&#038;h=271" alt="" width="300" height="271" /></a></p>
<p>The difficulty with 3d plots is that by definition, you&#8217;re looking at a 3d plot on a 2d surface.  Wouldn&#8217;t you like to be able to rotate that plot around a bit?  We&#8217;ll use the package rgl.  Then type:</p>
<pre>
library(rgl)
plot3d(x,y,z)</pre>
<p>This pulls up an interactive window which you can rotate.  Very helpful? Perhaps, but there are too many points.  Perhaps you only want to look at the middle third of the points (i.e. look at a subset of the data)?</p>
<pre>
startplot &lt;- 33
endplot &lt;- 66
a &lt;- round(startplot/100*length(x))
b &lt;- round(endplot/100*length(x))
plot3d(x[a:b],y[a:b],z[a:b], col = heat.colors(1000))</pre>
<p>This looks much better.  With the startplot and endplot variables, we&#8217;ve said we&#8217;d start 33% of the way through the x,y,z co-ordinates and end at 66%.  This is helpful &#8211; remember this is one year of data, so we&#8217;ve just displayed the middle third of the year.  The heat.colors palette also helps to distinguish between points, but in this case it doesn&#8217;t add any extra information &#8211; more of that in posts to come.</p>
<p><a href="http://gossetsstudent.files.wordpress.com/2010/07/3dplot.png"><img src="http://gossetsstudent.files.wordpress.com/2010/07/3dplot.png?w=300&#038;h=202" alt="" title="3dplot" width="300" height="202" class="alignnone size-medium wp-image-118" /></a></p>
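<p>To make the index arithmetic concrete, here is a small sketch.  The figure of 8760 is an assumption (one reading per hour for a 365 day year); your own data will have a different length:</p>
<pre>
n &lt;- 8760                      # assumed number of hourly readings
startplot &lt;- 33
endplot &lt;- 66
a &lt;- round(startplot/100*n)    # 2891, the first row plotted
b &lt;- round(endplot/100*n)      # 5782, the last row plotted
b - a + 1                         # 2892 points are plotted</pre>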
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/3d-graphic/'>3d graphic</a>, <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/07/23/turning-your-data-into-a-3d-chart/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/07/scatter3d.png?w=300" medium="image">
			<media:title type="html">scatter3d</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/07/3dplot.png?w=300" medium="image">
			<media:title type="html">3dplot</media:title>
		</media:content>
	</item>
		<item>
		<title>Quick scatterplot with associated histograms</title>
		<link>http://gossetsstudent.wordpress.com/2010/07/22/quick-scatterplot-with-associated-histograms/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/07/22/quick-scatterplot-with-associated-histograms/#comments</comments>
		<pubDate>Thu, 22 Jul 2010 09:17:46 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=82</guid>
		<description><![CDATA[R can produce some beautiful graphics, and there are some excellent packages, such as lattice and ggplot2 to represent data in original ways.  But sometimes, all you want to do is explore the relationship between pairs of variables with the minimum of fuss. In this post we&#8217;ll use the data which we imported in the [...]]]></description>
			<content:encoded><![CDATA[<p>R can produce some beautiful graphics, and there are some excellent packages, such as lattice and ggplot2, to represent data in original ways.  But sometimes, all you want to do is explore the relationship between pairs of variables with the minimum of fuss.</p>
<p>In this post we&#8217;ll use the data which we imported in the <a href="http://gossetsstudent.wordpress.com/2010/07/20/getting-going-importing-data-and-plotting-a-simple-graphic/">previous post</a> to make a quick graphic.  I&#8217;ll assume you already got as far as importing the data and placing the variable for NO concentration into x and ozone into y.</p>
<p>We&#8217;re going to make a scatterplot with the histogram of x below the x axis, and the histogram of y rotated anti-clockwise through 90 degrees and alongside the y axis (all will become clear).  The first thing is to set up the graphics display:</p>
<pre>## start by saving the original graphical parameters
def.par &lt;- par(no.readonly = TRUE)
## then change the margins around each plot to 1
par("mar" = c(1,1,1,1))
## then set the layout of the graphic
layout(matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE))</pre>
<p>The layout command tells R to split the graphical output into a 3 by 3 array of panels.  Each panel is given a number corresponding to the order in which graphics are plotted into it.  To see this array, type:</p>
<pre>matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE)</pre>
<p>This output shows that the display is split into 4 zones.  The top right is a large area for plot 1, the top left is a smaller panel for plot 2, and the bottom right is for plot 3.  Zone 4, the bottom left corner, is left empty.</p>
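<p>If you would rather see the zones drawn than read them off the matrix, layout.show() outlines each one and labels it with its number.  A short sketch, using the same matrix as above (the names m and nf are just for this example):</p>
<pre>m &lt;- matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE)
nf &lt;- layout(m)     # layout() returns the number of zones, here 4
layout.show(nf)        # draw the outline and number of every zone</pre>
<p>This is only a preview; run the layout() line from the walk-through again before drawing the real plots.</p>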
<p>So then, we need something for the top right &#8211; a straightforward scatter plot of x vs y.  We set the maximum for the x axis with the xlim parameter of plot, using the maxx variable, which contains the maximum value held in the vector x (and likewise ylim and maxy for the y axis):</p>
<pre>maxx &lt;- x[which.max(x)]
maxy &lt;- y[which.max(y)]
plot(x, y, xlab = "", ylab = "", pch = 20, bty = "n", 
   xlim = c(0, maxx), ylim = c(0,maxy))</pre>
<p>Then, we need to create a histogram of the y values, and plot it to the left of the scatterplot, appropriately orientated.  To do this we first store a histogram in the variable yh, and then plot it with the barplot command.  The reason for this is that barplots can be easily rotated:</p>
<pre>breaks &lt;- 50
yh &lt;- hist(y, breaks = (maxy/breaks)*(0:breaks), plot = FALSE)
## density holds the bar heights (older versions of R also called it intensities)
barplot(-(yh$density),space=0,horiz=TRUE, axes = FALSE)</pre>
<p>The breaks variable stores the number of bins into which the histogram is divided, maxy is the maximum value of the vector y, and yh is the histogram object.  barplot then takes the heights of the bars from yh$density and draws them as a bar chart, flipped on its side.  The negative sign before yh$density points the bars to the left rather than the right.<br />
We do the same for the x values, where the negative sign points the bars downwards, and then reset the graphics display to its defaults.</p>
<pre>xh &lt;- hist(x, breaks = (maxx/breaks)*(0:breaks), plot = FALSE)
barplot(-(xh$density),space=0,horiz=FALSE, axes = FALSE)
## reset the graphics display to default
par(def.par)</pre>
<p>We get this output:<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/07/scat-hist.png"><img src="http://gossetsstudent.files.wordpress.com/2010/07/scat-hist.png?w=271&#038;h=300" alt="" title="scat-hist" width="271" height="300" class="alignnone size-medium wp-image-100" /></a><br />
The advantage of this over the straight scatterplot is that you can see the density of overlapping points on the histogram.  I&#8217;ve set the number of bins in the histogram to 50 &#8211; it&#8217;s worth playing around with this with your data.  There are more elegant ways of doing this, but if you have paired variables x and y, and you want to quickly look at their distributions and association, this code works fine.</p>
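<p>If you find yourself reusing these steps, they can be wrapped in a function.  The sketch below is one way to do it, under a few assumptions: the name scatter_hist and the bins argument are made up for this example, the data is assumed to be non-negative (as in the walk-through, where the axes start at 0), and the x and y used to try it out are simulated rather than the air quality data.  It takes the bar heights from the density component of the histogram object:</p>
<pre>scatter_hist &lt;- function(x, y, bins = 50) {
  ## save the graphical parameters and restore them when the function exits
  def.par &lt;- par(no.readonly = TRUE)
  on.exit(par(def.par))
  par(mar = c(1, 1, 1, 1))
  layout(matrix(c(2,1,1,2,1,1,4,3,3), 3, 3, byrow = TRUE))

  maxx &lt;- max(x, na.rm = TRUE)
  maxy &lt;- max(y, na.rm = TRUE)

  ## zone 1: the scatterplot
  plot(x, y, xlab = "", ylab = "", pch = 20, bty = "n",
       xlim = c(0, maxx), ylim = c(0, maxy))

  ## zone 2: histogram of y, on its side, bars pointing left
  yh &lt;- hist(y, breaks = seq(0, maxy, length.out = bins + 1), plot = FALSE)
  barplot(-yh$density, space = 0, horiz = TRUE, axes = FALSE)

  ## zone 3: histogram of x, bars pointing down
  xh &lt;- hist(x, breaks = seq(0, maxx, length.out = bins + 1), plot = FALSE)
  barplot(-xh$density, space = 0, horiz = FALSE, axes = FALSE)

  ## hand back the two histogram objects in case they are useful
  invisible(list(x = xh, y = yh))
}

## try it with made-up data
set.seed(1)
x &lt;- rexp(500, rate = 0.1)
y &lt;- rexp(500, rate = 0.05)
res &lt;- scatter_hist(x, y)</pre>
<p>Using on.exit() means the graphics settings are restored even if one of the plotting calls fails part way through.</p>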
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/07/22/quick-scatterplot-with-associated-histograms/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/07/scat-hist.png?w=271" medium="image">
			<media:title type="html">scat-hist</media:title>
		</media:content>
	</item>
		<item>
		<title>Matrix scatterplot of the Airquality data using lattice</title>
		<link>http://gossetsstudent.wordpress.com/2010/07/20/matrix-scatterplot-of-the-airquality-data-using-lattice/</link>
		<comments>http://gossetsstudent.wordpress.com/2010/07/20/matrix-scatterplot-of-the-airquality-data-using-lattice/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 23:47:43 +0000</pubDate>
		<dc:creator>respiratoryclub</dc:creator>
				<category><![CDATA[General post]]></category>
		<category><![CDATA[Lattice]]></category>
		<category><![CDATA[Matrix scatterplot]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://gossetsstudent.wordpress.com/?p=66</guid>
		<description><![CDATA[In this post we will build on the last one, and create a matrix scatterplot. The package lattice allows for some really excellent graphics. In case you haven&#8217;t already seen it I recommend the R Graph Gallery for some examples of what it can do &#8211; browse the graphics by package used to create them. [...]]]></description>
			<content:encoded><![CDATA[<p>In this post we will build on the <a href="http://gossetsstudent.wordpress.com/2010/07/20/getting-going-importing-data-and-plotting-a-simple-graphic/">last one</a> and create a matrix scatterplot.  The <a href="http://cran.r-project.org/web/packages/lattice/index.html">lattice</a> package produces some really excellent graphics.  If you haven&#8217;t already seen it, I recommend the <a href="http://addictedtor.free.fr/graphiques/">R Graph Gallery</a> for some examples of what it can do &#8211; browse the graphics by the package used to create them.  We&#8217;ll use the same dataset as last time, where we plotted NO levels in the atmosphere against ozone levels for Nottingham, UK.</p>
<p>The first step is to load the lattice package.</p>
<pre>require("lattice")</pre>
<p>Download the dataset from <a href="http://www.box.net/shared/ucok2j31m7">here</a>, and put the file in your working directory.  Now we&#8217;ll read the dataset into the data frame <span style="color:#993366;">data</span>.</p>
<pre>columns &lt;- c("date", "time", "NO", "NO_status", "NO_unit",
      "NO2", "NO2_status", "NO2_unit", "ozone", "ozone_status",
      "ozone_unit", "SO2", "SO2_status", "SO2_unit")
data &lt;- read.csv("27899712853.csv", header = FALSE,
      skip = 7, col.names = columns, stringsAsFactors = FALSE)
x &lt;- data$NO
y &lt;- data$ozone
z &lt;- data$SO2</pre>
<p>To make things easier to follow, I&#8217;ve extracted 3 vectors from the data frame: <span style="color:#993366;">x</span>, <span style="color:#993366;">y</span> and <span style="color:#993366;">z</span>.  These are the NO, ozone and SO2 columns of the data.  I usually do this when working with graphs (in the last post I extracted x and y).  If I make a nice graphic, I can then &#8220;cut and paste&#8221; the code into another program, change the data in <span style="color:#993366;">x</span>, <span style="color:#993366;">y</span> and <span style="color:#993366;">z</span> and, hey presto, the same graphic is instantly drawn with new data.</p>
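<p>Before plotting, it may be worth a quick check on what was actually read in.  A minimal sketch, using a small made-up data frame <span style="color:#993366;">demo_df</span> in place of <span style="color:#993366;">data</span> (the values are invented purely for illustration): <span style="color:#993366;">str()</span> shows the type of each column, and counting the NAs shows how many readings are missing.</p>
<pre># Made-up stand-in for the imported data frame (values are illustrative only)
demo_df &lt;- data.frame(NO    = c(12, NA, 7, 30),
                      ozone = c(40, 22, 51, 18),
                      SO2   = c(3, 5, NA, 4))

str(demo_df)                            # column names and types
na_counts &lt;- colSums(is.na(demo_df))    # missing values per column
print(na_counts)
summary(demo_df$NO)                     # range and quartiles of one column</pre>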
<p>For a matrix scatterplot, we need to make a matrix of the variables to compare.  We join the vectors into a matrix and then name the columns.</p>
<pre>mat &lt;- cbind(x, y, z)
colnames(mat) &lt;- c("NO", "ozone", "SO2")</pre>
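<p>An alternative, sketched here on the assumption that the data frame has columns named NO, ozone and SO2 as above, is to select the columns and convert them in one step with <span style="color:#993366;">as.matrix()</span>, which keeps the column names.  The small data frame <span style="color:#993366;">demo_df</span> is made up for illustration.</p>
<pre># Made-up stand-in for the imported data frame (values are illustrative only)
demo_df &lt;- data.frame(NO = c(12, 30, 7), ozone = c(40, 22, 51), SO2 = c(3, 5, 2))

# Select the three columns and convert to a numeric matrix in one step
demo_mat &lt;- as.matrix(demo_df[, c("NO", "ozone", "SO2")])
colnames(demo_mat)   # the names carry over from the data frame</pre>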
<p>You can look at the first 10 rows of <span style="color:#993366;">mat</span> with</p>
<pre>mat[1:10,]</pre>
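<p><span style="color:#993366;">head()</span> does the same job without the indexing, and <span style="color:#993366;">dim()</span> confirms the shape of the matrix.  A small sketch on a made-up matrix <span style="color:#993366;">demo_mat</span> (not the air quality data):</p>
<pre># Made-up 12 x 3 matrix, only to show the calls
demo_mat &lt;- cbind(NO = 1:12, ozone = 12:1, SO2 = rep(5, 12))

head(demo_mat, 10)   # first 10 rows, like demo_mat[1:10, ]
dim(demo_mat)        # number of rows, then number of columns</pre>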
<p>Finally we create the matrix plot:</p>
<pre>title &lt;- "Matrix scatterplot of air pollutants"
print(splom(mat, main = title))</pre>
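<p>To reuse the graphic with new data, the steps above could be wrapped in a small function.  This is only a sketch: <span style="color:#993366;">matrix_scatter</span> is a hypothetical helper name, not part of lattice, and the three vectors in the example call are simulated.</p>
<pre>library(lattice)

# Hypothetical helper (not from lattice): repeats the steps above
# for any three numeric vectors of equal length.
matrix_scatter &lt;- function(x, y, z, labels = c("x", "y", "z"), title = "") {
  mat &lt;- cbind(x, y, z)
  colnames(mat) &lt;- labels
  splom(mat, main = title)   # returns a trellis object; print() draws it
}

# Simulated stand-in data, only to show the call
set.seed(42)
demo_a &lt;- rnorm(100)
demo_b &lt;- demo_a + rnorm(100)
demo_c &lt;- rnorm(100)
p &lt;- matrix_scatter(demo_a, demo_b, demo_c,
                    labels = c("A", "B", "C"), title = "Simulated example")
print(p)</pre>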
<p>The final result is here:<br />
<a href="http://gossetsstudent.files.wordpress.com/2010/07/matrix-scatterplot.png"><img class="alignnone size-medium wp-image-71" title="matrix scatterplot" src="http://gossetsstudent.files.wordpress.com/2010/07/matrix-scatterplot.png?w=300&#038;h=300" alt="" width="300" height="300" /></a></p>
<p>For those unfamiliar with matrix scatterplots &#8211; this plot is essentially 3 scatterplots: x vs y, x vs z and y vs z.  Each pair appears twice, once on either side of the diagonal, with the axes swapped.  The middle left plot is the scatterplot created in this <a href="http://gossetsstudent.wordpress.com/2010/07/20/getting-going-importing-data-and-plotting-a-simple-graphic/">previous post</a>.  The lattice package can do lots more than this &#8211; open its built-in help page with the command</p>
<pre>?lattice</pre>
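<p>A few related ways into the documentation, all standard R help calls, in case the overview page is too general:</p>
<pre>library(lattice)

help(package = "lattice")   # index of the help pages in the package
?splom                      # help page for the function used above
example(splom)              # runs the examples from that help page</pre>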
<br /> Tagged: <a href='http://gossetsstudent.wordpress.com/tag/lattice/'>Lattice</a>, <a href='http://gossetsstudent.wordpress.com/tag/matrix-scatterplot/'>Matrix scatterplot</a>, <a href='http://gossetsstudent.wordpress.com/tag/r/'>R</a>, <a href='http://gossetsstudent.wordpress.com/tag/statistics/'>statistics</a>]]></content:encoded>
			<wfw:commentRss>http://gossetsstudent.wordpress.com/2010/07/20/matrix-scatterplot-of-the-airquality-data-using-lattice/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/f827197d6ac59cd0224e67e5f6e684a4?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">respiratoryclub</media:title>
		</media:content>

		<media:content url="http://gossetsstudent.files.wordpress.com/2010/07/matrix-scatterplot.png?w=300" medium="image">
			<media:title type="html">matrix scatterplot</media:title>
		</media:content>
	</item>
	</channel>
</rss>
