He’s getting ready to write a book

Eric Novik does some open-source planning:

My co-author, Jacki Buros, and I [Novik] have just signed a contract with Apress to write a book tentatively entitled “Predictive Analytics with R”, which will cover programming best practices, data munging, data exploration, and single and multi-level models with case studies in social media, healthcare, politics, marketing, and the stock market.

Why does the world need another R book? We think there is a shortage of books that deal with the complete and programmer centric analysis of real, dirty, and sometimes unstructured data. Our target audience are people who have some familiarity with statistics, but do not have much experience with programming. . . .

The book is projected to be about 300 pages across 8 chapters. This is my first experience with writing a book and everything I heard about the process tells me that this is going to be a long and arduous endeavor lasting anywhere from 6 to 8 months.

Novik emailed me and wrote:

The work seems overwhelming. I always wondered how you manage to produce such high volume of high quality content. What’s the secret?

The first secret is, I wouldn’t try to write a book in 6 to 8 months. The first edition of Bayesian Data Analysis took several years. Each new edition took a while too. So if “long and arduous” to you means “6 to 8 months,” I think your time management skills are already much better than mine!


  1. tom says:

    I look forward to it!!

  2. My favorite offering on programming practices is Hunt and Thomas’s Pragmatic Programmer. I won’t say “best practices” because of the business-speak associations and the false implication that there is a best way to run every project. A one-off project to munge data for a paper is very different from writing flight controller software, with public R packages somewhere in between. I also liked the first of Beck’s Extreme Programming books (which aren’t that extreme, actually, in their focus on testing and pair programming and simple incremental design goals); but the whole thing got merged into the “agile” thing, which is too dogmatic and again too business-speak-like for me. McConnell’s Code Complete is classic and packed with reasonable advice, but I find it a bit dry.

    It’s much easier to write a book of independent chapters and case studies than to write a narrative monograph that builds as it goes. This divide-and-conquer approach is also the key to modular programming.

    Technical writing is like programming in that testing is essential. So my biggest suggestion is to get feedback early and often and listen to it. You can do what Andrew does and dole out different chapters to different people for feedback (more divide and conquer). I usually write with my past self as an intended audience, but alas, I don’t have my past self as a test reader.

    • Eric Novik says:

      Thanks Bob. Good advice.


      PS I like McConnell’s Code Complete. I also like “Think Python: How to Think Like a Computer Scientist.”

  3. Ethan Bolker says:

    Ben Bolker wrote _Ecological Models and Data in R_, which provides his community with the combination of theory, programming and real data in his discipline that you propose for yours. It took him a lot longer than a year, with much class testing and concomitant feedback along the way.

    One further recommendation: several reviews note that one of the book’s strengths is that it was written with Sweave (knitr would do too), so that the R code in the text is (essentially) error free because it is interpreted by R every time TeX is invoked to typeset it.

    Truth in advertising – I’m a proud dad.

    • Eric Novik says:

      Thanks Ethan, I will take a look. We originally wanted to write the book using knitr, but tech trade book publishers like Apress are not set up to handle LaTeX.