Although I’ve been an R user for some time, and have taught a variety of statistics courses in R, I’ve never made much use of its data science tools. I had a little spare time over the summer, so I have been trying to catch up with the tidyverse, mostly by working through Hadley Wickham’s excellent book, R for Data Science.

Whilst I’m not sure I’ll ever be a data scientist, I find the power of these tools quite amazing, especially compared with how I used to teach graphing in R. It does take a little more time to learn, but filtering large data sets and graphing them in R becomes a breeze.

I’ve been working for some time on a statistical model for test cricket, which seems quite promising. I’ve used the yorkr package, modified a little for test cricket, to download every ball of test cricket available on the excellent Cricsheet website. There are some 415 published test matches, and after working through some data issues I’ve so far successfully converted 399 of them.

Anyway, to demonstrate how easy it is to get interesting results using the tidyverse, here’s some data on the number of runs scored and overs faced for each test wicket.

I do a lot of teaching in various forms, and I am constantly recommending resources to students. Here is a collection of the ones I find most useful.

Business Statistics / MBA / Statistics for Economics

Many first courses in business statistics at undergraduate level spend a lot of time on sampling for market research, inference from samples, hypothesis testing, and the normal, t and chi-squared distributions. Many students haven’t done A-level statistics and find this difficult. I find these books useful:

0. For an introduction to the mathematics needed, this book is very detailed; it even explains, for example, that 5t means “5 multiplied by t”, so it doesn’t assume knowledge that university students might not have covered, or might not remember, from school. For those struggling with mathematics rather than statistics, this is an ideal book.

1. Schaum’s Outline Series is very good indeed. Each book provides a very quick overview of the topic, then lots of worked examples, then exercises with solutions. The only way to succeed at mathematics and statistics is to practise, and this book gives lots of opportunity to do so. I’ve included a link to the latest edition on Amazon, but these books can be picked up on the internet for just a few pounds.

There are two different versions: one aimed at a straight statistics course, and the other aimed particularly at business students. Both cover the same kind of material, so which to choose depends on how applied your course is. There are other inexpensive books in the series too.

2. Much of the material introduced on Business/MBA courses is actually A Level (High School) statistics in disguise! In the UK, many bright students don’t study statistics even at A Level, which is a shame, so students often meet it for the first time as part of a Business/Economics course at undergraduate or even Master’s level. There are a lot of good free resources online, but as a textbook I recommend A Concise Course in Advanced Level Statistics with worked examples. There are various older editions of this book with fewer examples, but as the statistics taught at school hasn’t changed in 30 years, you can probably safely pick up an old edition for a few pounds.

If anyone wants the pleasure (ahem) of learning how to do statistics with R on a short course, or knows someone who does, there’s a course in April with a really excellent tutor. I cannot speak highly enough of him.

Computing and Modelling with R

10th-12th April 2018

The course is split into three days; participants can attend one day or more. All days will consist of interactive workshops, together with time for guided computational practice on the material, supported by the lecturer and additional experts on the R language. Lunch will be provided on each day. Computers are provided, or participants can use their own laptop.

Day 1 is suitable for people with no experience of R and will be an introduction to programming in R. Little knowledge of mathematical statistics is assumed.

Day 2 will be suitable for those who have attended Day 1, or who have some previous experience of R. It will give an overview of statistical modelling in R.

Day 3 will cover more advanced programming techniques in R, focusing on methods for visualisation in data science with examples drawn from biological applications. It assumes some programming knowledge in R, such as that from Day 1 of the course.

More details here

“I have two children, one of whom is a boy born on a Tuesday. What is the probability that my other child is a boy?” This is a great fun puzzle, and you don’t need much background in probability to understand it…

I made a YouTube video to demonstrate the answer (well, really to play with some new software and do something useful). However, that answer isn’t the only reasonable one.
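One common way to reach the usual textbook answer is to list every equally likely combination of sex and weekday for the two children, keep only the families with at least one boy born on a Tuesday, and count how many of those have two boys. The Python sketch below does exactly that, under some loudly stated assumptions: each child is independently a boy or a girl with probability 1/2, every weekday is equally likely, and the statement is read as “at least one of my children is a boy born on a Tuesday”. Other readings of how that information came to be revealed give different answers, which is why more than one answer is reasonable. The function name `tuesday_boy_probability` is purely illustrative.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def tuesday_boy_probability() -> Fraction:
    """P(both children are boys | at least one is a boy born on a Tuesday).

    Assumption: all 14 x 14 = 196 ordered (sex, day) combinations for
    the two children are equally likely, i.e. sex and day are independent
    and uniform. Other interpretations of the puzzle give other answers.
    """
    children = list(product(SEXES, DAYS))          # 14 possible children
    families = list(product(children, repeat=2))   # 196 ordered pairs

    # Condition on at least one child being a boy born on a Tuesday.
    condition = [f for f in families if ("boy", "Tue") in f]
    # Among those families, count the ones where both children are boys.
    both_boys = [f for f in condition if all(sex == "boy" for sex, _ in f)]

    return Fraction(len(both_boys), len(condition))


if __name__ == "__main__":
    print(tuesday_boy_probability())  # prints 13/27
```

Of the 196 families, 196 − 13 × 13 = 27 include a Tuesday boy, and 7 × 7 − 6 × 6 = 13 of those are two boys, giving 13/27. Under the same assumptions, dropping the Tuesday detail (conditioning only on “at least one boy”) gives 1/3, so the apparently irrelevant extra information pulls the answer close to 1/2.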