statscloud

StatsCloud and R

How StatsCloud works alongside the R programming language

When I first started developing statscloud, I had a choice to make about how to handle the actual "stats" part of it. There were two options open to me: I could either use an already well-established statistics package (courtesy of R) to run all the statistical analyses, or I could develop my own from scratch. So, what to do?

Using R was a very tempting option. It's widely regarded as one of the best tools for maths and statistics, it has a huge array of libraries, and it has excellent community support. Using R in an app is actually pretty simple too: if you're using it to handle all the stats, the only code you need to write is the code that communicates with it. Really, the biggest advantage of using R is that functions for pretty much every statistical analysis have already been written for it, and the thought of rewriting all those myself brought me out in hives.

There's a catch, though: R doesn't run on smartphones or tablets, and that's a problem when so many people use them. As touch devices are massively popular (and overtaking desktops and laptops as people's go-to device), creating a stats app that couldn't be installed or run on them simply wasn't an option for me. So, this needed a bit of thought.

As it happens, you can make use of R on a touch device, but only if R is running on a server somewhere and your device stays connected to it, sending data backwards and forwards constantly. There's nothing really wrong with that (it's how a lot of websites work) but, for an app focused solely on mathematical computation, it creates an awful lot of overhead you don't need. Also, to interact constantly with a server, you'd usually need to pay to access one, or set one up and maintain it yourself, and no one wants to do that.

No. By far the best solution is to run the analyses on your device. Fortunately, every device with a web browser already has a programming language built into it: JavaScript. JavaScript already powers some advanced modern web apps (such as Google Docs and Microsoft Office 365) and, being a programming language just like R, it's capable of running sophisticated mathematical computations too. So, why involve R on a server at all? If you're already using a programming language that's capable of running the analyses you want, firing up another one to do it and waiting for it to send back the results seems a bit pointless. It's like preheating the oven to cook a pizza and then ordering one in.

For these reasons, I decided against using R in statscloud. Practically, it just didn't make sense. Instead, all the statistical analyses are done on your device in JavaScript, so there's no need to communicate constantly with a server. When you load up statscloud, you load up the entire stats package with it, so everything runs locally on your device. That means you can carry on using statscloud without an internet connection, and you don't have to worry about sending lots of data backwards and forwards across a network.
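To give a feel for the kind of computation that can run entirely in the browser, here is a minimal sketch in plain JavaScript of Welch's t statistic for two independent samples. It is an illustration only, not statscloud's actual code: the function names `mean`, `sampleVariance` and `welchT` are hypothetical, and turning the statistic into a p-value would additionally need the CDF of the t distribution, which is left out here.

```javascript
// Hypothetical sketch of an on-device analysis; not statscloud's implementation.

// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Unbiased sample variance (divides by n - 1).
function sampleVariance(xs) {
  const m = mean(xs);
  return xs.reduce((sum, x) => sum + (x - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic plus Welch-Satterthwaite degrees of freedom
// for two independent samples with possibly unequal variances.
function welchT(a, b) {
  const va = sampleVariance(a) / a.length; // squared standard error of mean(a)
  const vb = sampleVariance(b) / b.length; // squared standard error of mean(b)
  const t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 /
    (va ** 2 / (a.length - 1) + vb ** 2 / (b.length - 1));
  return { t, df };
}

// Everything here runs locally: no network request is made.
const groupA = [5.1, 4.9, 5.6, 5.8, 6.0];
const groupB = [4.2, 4.8, 4.4, 5.0, 4.6];
console.log(welchT(groupA, groupB));
```

The point of the sketch is simply that the data never leaves the device: the numbers go in, the statistic comes out, and there is nothing to wait for from a server.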

I didn't want statscloud to just be a user interface for R either because, ironically, the best thing about R is that it doesn't have one. R is simply a programming language driven from a command-line interface, and it's precisely this that makes it so powerful. In any R script, you can include the code to download, install and load up any library you want, run any analyses you want, and then run everything on any computer with R installed. Provided the R file includes everything you need, everything will just work.

As soon as you make a user interface to replace the command-line interface in R, you lose all of that flexibility, and sort of undermine the point of it. If a web app like statscloud used R for its back-end, you'd end up with the worst of both worlds: no freedom to customise R and complete dependency on a server to access it. There would be a cost in user experience too; sending data over the internet and getting some back again adds a fair bit of latency to your workflow, and you'll get a face full of spinny wheels every time you want to run or refresh an analysis.

In statscloud, I wanted to get the best of both: an app that breaks free from servers wherever possible and gives you the option to export to R locally if you want to. To me, this also fits a good educational model for learning statistics. You start off with a nice, user-friendly graphical representation of what you want to do and then, when you're ready, migrate to R and settle in for the more advanced stuff.

I recognise how valuable it is to learn R and to enjoy the flexibility the command-line interface gives you. Because of this, I've spent quite a lot of time working on the new "R Export" window so you can do exactly this. This feature allows you to pack up your whole project and move it to R any time you want, whether you've just declared your data or you've finished running your analyses. Of course, just dumping all the R code in your lap without any context isn't all that helpful, so I've made sure to annotate every line in the console so you can follow exactly what R is doing every step of the way.
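The post doesn't describe how the R Export window builds its script, so the following is only a hypothetical sketch in JavaScript of the general idea: take a small description of a project and emit an R script where each step is preceded by a plain-English comment. The project shape (`variables`, `analyses`) and the helpers `toRVector` and `buildRExport` are assumptions made up for illustration. The generated R itself uses only base R: `data.frame` and `t.test`, whose formula interface runs a Welch two-sample t-test by default (`var.equal = FALSE`).

```javascript
// Hypothetical sketch: turn a project description into an annotated R script.
// The { variables, analyses } shape is an assumption, not statscloud's format.

// Format a JavaScript array as an R vector literal, e.g. c(1, 2) or c("a", "b").
function toRVector(values) {
  const items = values.map((v) =>
    typeof v === "string" ? JSON.stringify(v) : String(v));
  return `c(${items.join(", ")})`;
}

function buildRExport(project) {
  const lines = [];

  // Step 1: declare the data, one data-frame column per variable.
  lines.push("# Declare the data as an R data frame, one column per variable");
  const columns = Object.entries(project.variables)
    .map(([name, values]) => `  ${name} = ${toRVector(values)}`);
  lines.push(`data <- data.frame(\n${columns.join(",\n")}\n)`);

  // Step 2: one annotated call per analysis (only t-tests in this sketch;
  // any other analysis type is silently skipped).
  for (const analysis of project.analyses) {
    if (analysis.type === "t-test") {
      lines.push("");
      lines.push(`# Welch two-sample t-test of ${analysis.outcome} by ${analysis.group}`);
      lines.push(`t.test(${analysis.outcome} ~ ${analysis.group}, data = data)`);
    }
  }
  return lines.join("\n");
}

const script = buildRExport({
  variables: {
    group: ["A", "A", "A", "B", "B", "B"],
    score: [5.1, 4.9, 5.6, 4.2, 4.8, 4.4],
  },
  analyses: [{ type: "t-test", outcome: "score", group: "group" }],
});
console.log(script);
```

Because the output is an ordinary, self-contained R script, it can be pasted into any local R session and run as-is, which is the kind of portability the command-line workflow is valued for.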

In summary, R is a brilliant tool when it's installed on your device with access to its command-line interface. As the back-end for a web app though? Not so much. If you want to use R in your data analysis, nothing beats actually using R, and the best environment to use R in is one you have full control over. That means having it running locally on your computer, with access to the command line so you can install and run any packages you want.

No stats app can really serve as a replacement for R, and statscloud doesn't try to; instead, it provides a stepping stone to R and helps new users make the transition. So, if you think you'd like to get started with R, but you'd like a nice graphical user interface to help get you there, you're in the right place!

Daniel