Goodliffe, Becoming a Better Programmer (O’Reilly, 2014): A non-professional’s take

Goodliffe’s Becoming a Better Programmer is marketed to a wide range of readers: veterans, newcomers, and those who program on the side as a hobby (Hi!). This is not entirely accurate– the book clearly has professional developers in mind most of the time– but I found it an interesting discussion of the art and craft of programming all the same.

The first two parts of the book discuss a number of features of writing and working with code, both the theoretical/philosophical side and lower-level issues like producing and maintaining consistently formatted code. These sections are clearly oriented primarily towards professional developers, who are probably working in a production environment, have an existing codebase to work with, and may well be under pressure to skimp on design or testing in order to ship code more quickly. Even so, in talking about all of the problems that particular code or a particular codebase can have, these parts also talk at length about the principles and design behind good, sane code, and I found these sections useful and interesting. He discusses cohesion and coupling, omitting needless code (the YAGNI principle), and producing simple and sufficient code– along with practical advice about stuff like testing and version control.

The last three parts of the book are concerned with the softer side of being a developer, both personally and interpersonally– working well with your team, responding to superiors, and even personal things like ethical considerations and the importance of good posture. These sections are lighter weight (and often briefer), and sometimes repetitively summarize earlier points in the book. But they’re an easy read, and can be humorous in a way that the sometimes strained jokes of other sections aren’t. (For example, Goodliffe talks about your relationship with your primary language as a marriage, but then notes that, unlike most marriages, it can be quite helpful to play around on your “spouse.”)

It’s worth pointing out that Goodliffe’s book seems much more oriented towards discussion than to armchair reading. Each chapter takes up a subject, discusses different approaches to that subject (sometimes briefly), and then gives a set of questions. In most cases, Goodliffe is undogmatic– he lays out his position in the text, but the questions leave open the possibility that other experienced developers might have a different take. This format seems like it would work well for reading with a mentor (as the book suggests) or even a book club.

Goodliffe’s language-agnostic approach makes the book broadly accessible but also somewhat abstract. I think the book would have been stronger if he had been clearer about applying principles to particular languages. As a result, Goodliffe’s book will not replace the resources that give advice and best practices for the idiomatic use of whatever language you’re working in, but it’s a quick read, and may get you thinking about the way you code, even if it’s only something you do in your spare time.

Janssens, Data Science at the Command Line (O’Reilly, 2014)

It normally takes me a week or two to read through a new tech book, but Janssens’ Data Science at the Command Line went by quickly. In part, this was because I was unusually excited about the premise of the book. I’ve been working with a number of my own data files recently, both on the command line and in Perl, and I was eager to learn new tricks and techniques. Does Janssens’ book live up to my (admittedly high) expectations? Partly, but the book was also a quick read because it’s more limited than I had hoped.

To start with the positive, Janssens’ book introduces users to a number of the most important command line tools: sed, awk, and grep, among others. A real strength of the book is that Janssens covers a number of lesser-known tools that are welcome additions to the usual suspects: jq (for working with JSON data), curlicue (a curl variant that handles the hassle of OAuth authentication), and the tools of csvkit (for both working with CSV files and converting other formats to CSV). Janssens has even written a few of his own tools that serve to soften the sometimes steep learning curve of the command line.

Furthermore, Janssens gives a helpful overview of ways of working with data on the command line. Like many users, I know a fair amount about working with text at the command line, but Janssens opens up topics like creating attractive visualizations and using GNU Parallel for managing parallel commands. In giving this overview, Janssens demonstrates how the philosophy of the *nix command line can be applied to data analysis. However, the book seems intended more to prove the viability of doing data analysis at the command line than to serve as a systematic introduction. Important points are occasionally glossed over; the book fails to mention that regular users will be unable to chmod files they do not own (which usually means anything outside their home directory) without sudo, for example (pp. 44-45). Likewise, I imagine many readers would benefit from a clear discussion of using the tee command to save a copy of the data to a file partway through a pipeline. Janssens gives examples of using sed and awk, but with only brief explanations of how they operate; I imagine that many users will need to turn to the clearer, more systematic discussions in other resources (like Classic Shell Scripting or Unix Power Tools) to really move beyond the examples Janssens provides.
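For readers who have not met tee, here is a rough sketch of the idea in Python rather than in the shell: data flows through a pipeline unchanged while a copy is saved to a file along the way. The names tee_lines and run_pipeline, and the two pipeline stages, are my own inventions for illustration. The real tee command works on standard input and output, not on Python iterables.

```python
import os
import tempfile


def tee_lines(lines, path):
    """Pass each line through unchanged while writing a copy to `path`."""
    with open(path, "w", encoding="utf-8") as snapshot:
        for line in lines:
            snapshot.write(line)  # save a copy for later inspection
            yield line            # hand the same line to the next stage


def run_pipeline(rows, snapshot_path):
    """A made-up two-stage pipeline with a tee between the stages."""
    # Stage 1: drop comment lines.
    kept = (row for row in rows if not row.startswith("#"))
    # Tee: record what stage 1 produced before stage 2 changes it.
    tapped = tee_lines(kept, snapshot_path)
    # Stage 2: upper-case everything.
    return [row.upper() for row in tapped]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snapshot_file = os.path.join(tmp, "after_stage1.txt")
        result = run_pipeline(["# header\n", "a,1\n", "b,2\n"], snapshot_file)
        print(result)
        with open(snapshot_file, encoding="utf-8") as fh:
            print(fh.read(), end="")
```

The point of the pattern is that the intermediate file lets you check what an early stage produced without re-running or rearranging the rest of the pipeline.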

Furthermore, I’m not convinced that the command line is always the best approach, particularly if you’re already more comfortable with another way of working with your data. Some of the approaches Janssens suggests are rather clunky. There are heaps of XML (and HTML) data out there, for example, but the book suggests the awkward approach of converting HTML to JSON to CSV. Having spent time fussing with XML parsing, I genuinely understand the attraction of this approach, but it would have been nice if he’d covered proper XML parsing as well as just dumping everything into CSV files. (To be frank, though, I’m not sure there is a robust way to work with XML on the command line.)
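To make the contrast concrete, here is a minimal sketch of what I mean by parsing XML properly and only then flattening it to CSV, using Python’s standard library. This is not the book’s method. The sample document, the element names (library, book, title, publisher), and the function name books_to_csv are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A tiny, invented XML document standing in for real data.
SAMPLE_XML = """<library>
  <book year="2014"><title>First Title</title><publisher>Example Press</publisher></book>
  <book year="2013"><title>Second Title</title><publisher>Sample House</publisher></book>
</library>"""


def books_to_csv(xml_text):
    """Parse the XML into a tree, then write one CSV row per <book>."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["year", "title", "publisher"])
    for book in root.iter("book"):
        writer.writerow([
            book.get("year", ""),                    # attribute value
            book.findtext("title", default=""),      # child element text
            book.findtext("publisher", default=""),
        ])
    return out.getvalue()


if __name__ == "__main__":
    print(books_to_csv(SAMPLE_XML), end="")
```

Because the parser understands the tree, attributes and nested elements are addressed by name instead of being recovered from flattened text. That is the step that gets lost when everything is dumped straight into CSV.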

In conclusion, then, Janssens’ book is worth a read, and I will be exploring the possibilities of command line data analysis in greater detail after reading this book. On the other hand, Janssens’ book is something of a missed opportunity: it is not the final statement on the subject, and is a bit skimpy as an introductory resource.

DuBois, MySQL Cookbook, 3rd ed. (O’Reilly, 2014)

MySQL (and SQL more generally) is a funny beast; or maybe it’s more like a familiar (but still odd) country. You can get around on the subway and get your bearings on the street, but you go to the post office or try to do something at the bank and realize that you’re still not thoroughly acclimated to the country. I’ve felt that way about MySQL from time to time– I have a sense of how to write basic queries, but there are parts of MySQL I don’t visit because I don’t have any idea they’re there; and there are things about MySQL that still seem a bit odd. The new, third edition of DuBois’s MySQL Cookbook is a rich handbook for MySQL, a Baedeker for those occasions when you need to venture out into the MySQL countryside.

There are many, many helpful recipes in this book, to the extent that picking particular ones out for praise becomes difficult. The discussion of working with strings and character set collations is quite nice, and shows how to do a fair amount within MySQL; Chapter 9 (on stored routines and triggers) is great; and the discussion of joins in Chapter 14 was one of the best I’ve read. I’ve felt at times that my ability to work with MySQL has been hampered by an imperfect sense of its possibilities; browsing the Cookbook demystifies MySQL and demonstrates how to do useful things with it.

However, one weakness of the book is that it tells you how to do lots of things in MySQL (and often covers the range of ways to do them) without giving a sense of whether it makes sense to do them in MySQL at all, as opposed to a proper programming language. For example, the book discusses regular expressions within SQL, but leaves for the end the important point that SQL regular expressions do not work with multibyte character sets (UTF-8, most crucially). Similarly, some of the non-MySQL code in the book seems old-fashioned; chapters 18-21 include some Ruby and Python code, but lean very heavily on Perl. To return to the travel metaphor, it’s like a travel guide that is very solid on its main subject, but much less reliable about the towns across the border. On the other hand, the book is quite good at telling you what is MySQL-dependent and what is portable across SQL and database implementations. These caveats reduce the usefulness of the book only slightly; as a guide to using MySQL, it is both thorough and impressive.
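The multibyte caveat is easier to appreciate with a small demonstration. The sketch below uses Python, not MySQL, and does not reproduce MySQL’s regex engine. It only illustrates the general failure mode, on the assumption (mine) that the trouble comes from a pattern being matched byte by byte instead of character by character. The sample word and the function name count_based_match are invented for the example.

```python
import re


def count_based_match(text):
    """Test a four-unit pattern against text as characters and as UTF-8 bytes.

    Returns a (matches_as_characters, matches_as_bytes) pair.
    """
    encoded = text.encode("utf-8")
    # Character-aware matching: '.' consumes one whole character.
    as_chars = re.fullmatch(r".{4}", text) is not None
    # Byte-wise matching: '.' consumes one byte, so a multibyte
    # character such as 'é' (two bytes in UTF-8) is counted twice.
    as_bytes = re.fullmatch(rb".{4}", encoded) is not None
    return as_chars, as_bytes


if __name__ == "__main__":
    print(count_based_match("cafe"))  # plain ASCII: both views agree
    print(count_based_match("café"))  # accented: the two views disagree
```

With plain ASCII the two views agree. As soon as an accented character appears, the byte-wise view sees five units where a reader sees four, which is the kind of surprise that makes regular expressions risky on UTF-8 data when the engine is not character-aware.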