Janssens, Data Science at the Command Line (O’Reilly, 2014)

It normally takes me a week or two to read through a new tech book, but Janssens’ Data Science at the Command Line went by quickly. In part, this was because I was unusually excited about the premise of the book. I’ve been working with a number of my own data files recently, both on the command line and in Perl, and I was eager to learn new tricks and techniques. Does Janssens’ book live up to my (admittedly high) expectations? Partly, but the book was also a quick read because it’s more limited than I had hoped.

To start with the positive, Janssens’ book introduces readers to many of the most important command line tools: sed, awk, and grep, among others. A real strength of the book is that Janssens covers several lesser-known tools that are welcome additions to the usual suspects: jq (for working with JSON data), curlicue (a wrapper around curl that takes care of the hassle of OAuth authentication), and the tools of csvkit (both for working with CSV files and for converting other formats to CSV). Janssens has even written a few of his own tools that serve to soften the sometimes steep learning curve of the command line.

Janssens also gives a helpful overview of ways of working with data on the command line. Like many users, I know a fair amount about working with text at the command line, but Janssens opens up topics like creating attractive visualizations and using GNU Parallel to run commands in parallel. In giving this overview, Janssens demonstrates how the philosophy of the *nix command line can be applied to data analysis. However, the book seems intended more to prove the viability of doing data analysis at the command line than to serve as a systematic introduction. Important points are occasionally glossed over; for example, the book fails to mention that regular users cannot chmod files they do not own (such as most files outside their home directory) without sudo (pp. 44-5). Likewise, I imagine many readers would benefit from a clear discussion of using the tee command to drop a copy of the data into a file when you’re piping data all over the place. Janssens gives examples of using sed and awk, but with only brief explanations of how they operate; I imagine that many users will need to turn to the clearer, more systematic discussions in other resources (like Classic Shell Scripting or Unix Power Tools) to really move beyond the examples Janssens provides.
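In the shell, tee sits in the middle of a pipeline such as sort data.csv | tee sorted.csv | uniq -c, writing one copy of its input to the named file while passing another copy down the pipe. To make that behavior concrete, here is a minimal Python sketch of the same idea. The tee function and the stand-in "files" are hypothetical illustrations, not anything from Janssens’ book, and the real tee also handles raw bytes and options such as -a (append).

```python
import io
from typing import Iterable, TextIO


def tee(lines: Iterable[str], *sinks: TextIO) -> int:
    """Write every incoming line to each sink, as the tee command does.

    Hypothetical helper for illustration only. Returns the number of
    lines copied.
    """
    count = 0
    for line in lines:
        for sink in sinks:
            sink.write(line)  # one copy per destination
        count += 1
    return count


# Usage: keep a snapshot of an intermediate stage while passing it along.
stage = io.StringIO("b\na\nb\n")
snapshot = io.StringIO()    # stands in for a file such as sorted.csv
downstream = io.StringIO()  # stands in for the next command in the pipe
copied = tee(stage, snapshot, downstream)
print(copied, snapshot.getvalue() == downstream.getvalue())  # 3 True
```

The point of the design is that nothing downstream has to change: the snapshot is a side copy, so you can inspect an intermediate stage later without re-running the whole pipeline.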

Moreover, if you’re more comfortable working with your data some other way than on the command line, I’m not convinced that the command line is always the best approach. Some of the approaches Janssens suggests are rather clunky. There are heaps of XML (and HTML) data out there, but the book suggests the awkward approach of converting HTML to JSON to CSV. Having spent time fussing with XML parsing, I genuinely understand the attraction of this approach, but it would have been nice if he’d covered proper XML parsing as well as just dumping everything into CSV files. (To be frank, though, I’m not sure there is a robust way to work with XML on the command line.)
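For comparison, here is roughly what proper XML parsing followed by flattening to CSV can look like, sketched in Python with the standard library’s xml.etree.ElementTree and csv modules. This steps outside the command-line toolchain the book covers, and the sample document and its element names are hypothetical; real data would need its own mapping from elements to columns.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element and attribute names are made up.
SAMPLE = """<books>
  <book id="1"><title>Unicode Explained</title><year>2006</year></book>
  <book id="2"><title>MySQL Cookbook</title><year>2014</year></book>
</books>"""


def xml_to_csv(xml_text: str) -> str:
    """Parse XML with a real parser, then flatten each record to a CSV row."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "title", "year"])
    for book in root.iter("book"):
        # Attributes via .get(), child element text via .findtext().
        writer.writerow([book.get("id"), book.findtext("title"), book.findtext("year")])
    return out.getvalue()


print(xml_to_csv(SAMPLE), end="")
```

The design choice here is to do the structural work (navigating elements and attributes) in a real parser and produce CSV only at the very end, rather than routing the data through an intermediate JSON step.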

In conclusion, then, Janssens’ book is worth a read, and after reading it I will be exploring the possibilities of command line data analysis in greater detail. On the other hand, Janssens’ book is something of a missed opportunity: it is not the final statement on the subject, and it is a bit skimpy as an introductory resource.

DuBois, MySQL Cookbook, 3rd ed. (O’Reilly, 2014)

MySQL (and SQL more generally) is a funny beast; or maybe it’s more like a familiar (but still odd) country. You can get around on the subway and get your bearings on the street, but you go to the post office or try to do something at the bank and realize that you’re still not thoroughly acclimated to the country. I’ve felt that way about MySQL from time to time– I have a sense of how to write basic queries, but there are parts of MySQL I don’t visit because I don’t have any idea they’re there; and there are things about MySQL that still seem a bit odd. The new, third edition of DuBois’s MySQL Cookbook is a rich handbook for MySQL, a Baedeker for those occasions when you need to venture out into the MySQL countryside.

There are many, many helpful recipes in this book, to the extent that picking particular ones out for praise becomes difficult. The discussion of working with strings, character sets, and collations is quite nice, and shows how much you can do within MySQL itself; Chapter 9 (on stored routines and triggers) is great; and the discussion of joins in Chapter 14 was one of the best I’ve read. I’ve felt at times that my ability to work with MySQL has been hampered by an imperfect sense of its possibilities; browsing the Cookbook both demystifies MySQL and demonstrates how to do useful things with it.

However, one weakness of the book is that it tells you how to do lots of things in MySQL (often covering the full range of ways to do them) without giving a sense of whether a given task belongs in MySQL at all, as opposed to a general-purpose programming language. For example, the book discusses regular expressions within SQL, but leaves until the end the important point that MySQL’s regular expression operators are not multibyte-safe, and so do not work reliably with multibyte character sets (UTF-8, most crucially). Similarly, some of the non-MySQL code in the book seems old-fashioned; chapters 18-21 include some Ruby and Python code, but lean very heavily on Perl. To return to the travel metaphor, it’s like a travel guide that is very solid on its main subject, but much less reliable about the towns across the border. On the other hand, the book is quite good at telling you what is MySQL-specific and what is portable across SQL dialects and database implementations. These caveats reduce the usefulness of the book only slightly, though; as a guide to using MySQL, the book is both thorough and impressive.
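Why does a byte-oriented regular expression engine trip over UTF-8? A character such as é occupies two bytes, so a pattern element meant to match "one character" matches only half of it. The Python sketch below imitates that behavior by running the same pattern against a str (character-aware) and against its UTF-8 bytes (byte-oriented). It illustrates the general problem and is not MySQL’s actual implementation.

```python
import re

WORD = "caf\u00e9"                # "café" with a precomposed é (U+00E9)
WORD_UTF8 = WORD.encode("utf-8")  # b"caf\xc3\xa9": the é takes two bytes

# Character-aware matching: "." consumes one code point, so it covers é.
char_match = re.fullmatch(r"caf.", WORD)

# Byte-oriented matching: "." consumes one byte, so a single "." can no
# longer stand for the two-byte é and the whole-string match fails.
byte_match = re.fullmatch(rb"caf.", WORD_UTF8)

print(len(WORD), len(WORD_UTF8))                       # 4 5
print(char_match is not None, byte_match is not None)  # True False
```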

Fluent Conference 2013: JavaScript & Beyond Complete Video Compilation (O’Reilly)

I want to begin this review by emphasizing that I am not primarily a JavaScript person, though it’s hard to do web work without needing to write some JavaScript now and again. I also want to point out that, like some other reviewers, I have not had time to watch all of these videos, and further exploration will likely uncover more useful material from this conference.

That said, I found the videos from this conference only somewhat useful for viewers who, like me, work with JavaScript only in passing. Some of the topics– such as the extended tutorial on AngularJS– will probably seldom be of use to the casual JS user (though I thought this presentation was very well done). Other topics were more immediately relevant, such as Manor’s presentation on improving your jQuery, though Manor’s occasionally nervous manner distracted somewhat from the content of his presentation. Not all of the presenters at this conference were equally confident as public speakers; this might not matter much in the moment, but it might make the collection somewhat less useful as a point of reference.

I wonder, too, whether the value of such a video collection is diminished by the passing of time. Brendan Eich has given a more recent talk on the state of JavaScript in the interim, and it’s unclear whether Content Security Policy (the subject of Ben Vinegar’s talk, and a method of preventing XSS by restricting the origins from which scripts may be loaded) is still a live subject in 2014: much of the discussion seems to have petered out in 2013, after a burst of activity in 2011 and 2012.

The advantage of such a conference (and such a video collection) is that it can show you how things work in the wild, in production environments, and at major companies; it offers insight into the way that leading figures in the field are currently thinking about a subject. This thinking can take a while to be codified in books and other instructional materials. As a casual JS user, I hesitate to call the snapshot of the JavaScript world from this 2013 conference essential viewing, but other viewers may get more from it.

Then again, one of the perks of attending a conference– and even of watching the videos online– is serendipity: you attend a talk as an afterthought, and it turns out to be quite useful. I found many of the non-JS talks to be rather good. McKesson and Wilson’s brief talk on “responsive publishing” and the O’Reilly Atlas project (the interface of books, ebooks, and web publishing) was genuinely fascinating; Kalin’s talk on licensing was too brief, but interesting and informative; and Verou’s discussion of web standards was illuminating. Similarly, Bootstrap hadn’t been on my radar, but I was glad to have an extended introduction to it in Jen Kramer’s tutorial.

Sklar and Trachtenberg, PHP Cookbook (3rd ed., 2014)

Sklar and Trachtenberg’s PHP Cookbook is a difficult book to review; the book is clearly written with at least two different audiences in mind, and this means that parts of the book vary in sophistication and depth. On the one hand, the book is intended in part to complement Sklar’s (2004!) Learning PHP 5: to serve as a second book for PHP novices, covering some of the many topics that book’s “PHP with training wheels” approach did not. On the other hand, the book is intended for readers who are familiar with the basics of the language and who want to learn how to do things well in PHP.

For the beginner, for example, Chap. 6 begins with an introduction to functions, and Chap. 7 likewise begins with a very gentle introduction to objects; Chap. 1 covers the basics of working with substrings, and Chap. 18 introduces some issues in PHP security. However, beginners might be better served by reading the relevant sections in Tatroe et al.’s 2013 Programming PHP. Both books cover much of the same ground, but I find Programming PHP’s coverage clearer and more in-depth.

This book also does a lot of what it says on the tin by providing a reference for a lot of situations you might run into when programming PHP. Need to work with email? Drop in some regular expressions? Mess around with an object using array syntax? (Look at 4.25 for the latter– a nice trick.) I personally don’t spend all of my time in PHP, and it’s nice to have code snippets at hand when you need them.

More than these individual recipes, the value of the book to my mind lies in the sections for the programmer who realizes that there are different ways to tackle a particular problem in PHP. PHP comes with a heap of built-in functions (some of which are redundant); these functions are further complemented by libraries, packages, and software that extend and supplement the core of PHP. A lot of PHP programming is simply coming to terms with all of these competing ways to do things, and this is one of the strengths of the Cookbook. Sklar and Trachtenberg often tell you which function to use, and why (though some parts simply list different possibilities without much differentiation). To pick an example at random (*rimshot*), they explain why the (built-in) mt_rand() function is better than the (built-in) rand() function for generating random numbers within a particular range. This is not something Programming PHP is always good at, actually: its function reference simply lists all of the functions without explaining the differences between them (php.net can often be helpful in this regard, too).

The usefulness of the book is limited, however, by the fact that you have to go digging for these sections: you don’t know in advance whether the recipe you’re interested in is a brief, introductory discussion for the beginner or a helpful guide through the PHP wilds. In some ways, this book is like the maps nature parks often give out to tourists: some parts only give you a vague idea, while other parts of the map can be a reliable guide to the terrain. Sklar and Trachtenberg’s PHP Cookbook can still help you to get around, but it’s a good idea to keep your wits about you, and to make use of other resources, as well.

Korpela, Unicode Explained (O’Reilly, 2006)

Korpela’s Unicode Explained was originally intended for three audiences, I think. The first was the casual user who might need to make some basic use of Unicode in everyday life (entering a little bit of Unicode in Windows, for example). The second was the advanced user who might need to draw on some Unicode wizardry in a few specialized cases: programming, HTML or other markup, or the internet. The third was the reader wanting an introduction to the principles behind Unicode without making a brute force attack on the Unicode Standard itself. The passage of time means that the book may now be less useful for both casual users and advanced users (who really do need current information). Nevertheless, Korpela’s work remains helpful, and his discussion of the theoretical side of Unicode is excellent, both clear and nuanced.

Major changes have happened in the Unicode world since this book was written. Characters and scripts have been added, of course, but the real difference is that Unicode support is far more pervasive than it was in 2006. Unicode is much, much more common on web sites now; Emacs 24 finally has long-overdue support for certain Unicode features such as bidi text (though vim is still a holdout in this case); and most programming languages have adopted Unicode as fundamental to the way strings work (as Python 3 has, for example).

Many of these changes occurred after the 2006 publication of Korpela’s book, so at points the book reads like a period piece: the changes were in the foreseeable future, but not there yet. This also means that some parts of the book are badly dated. The section on Perl, for example, is completely out of date: however sluggish it may have seemed in 2006, Perl has since adopted Unicode to such an extent that it has changed some of the fundamental ways Perl works. Long-beloved character class shortcuts such as \d and \w now match Unicode characters, which means it’s often less trouble to just spell out full character classes such as [0-9]. (For more on Unicode in Perl, check out the relevant sections in the llama and– if you’re brave of heart– in the camel.)
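Python 3 behaves the same way in this respect, which makes for an easy demonstration; this is Python rather than Perl, and the sample string is an arbitrary choice. In str patterns, \d matches any Unicode decimal digit, such as the Arabic-Indic digits below, while an explicit [0-9] class, or the re.ASCII flag, keeps the narrow, ASCII-only meaning.

```python
import re

ARABIC_INDIC = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three

# Unicode-aware by default: \d matches any Unicode decimal digit, not just 0-9.
shortcut = re.fullmatch(r"\d+", ARABIC_INDIC)

# An explicit class, or the re.ASCII flag, restores the ASCII-only meaning.
explicit = re.fullmatch(r"[0-9]+", ARABIC_INDIC)
ascii_only = re.fullmatch(r"\d+", ARABIC_INDIC, flags=re.ASCII)

print(shortcut is not None, explicit is not None, ascii_only is not None)
# True False False
```

If a field must contain only the ASCII digits 0 through 9 (a ZIP code, say), spelling out [0-9] states that intent directly instead of relying on how a given regex engine interprets \d.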

Much of the utility of a book like this lies in expert discussion of such advanced topics; having to check the book’s information against more recent sources defeats the purpose. On the other hand, I suspect that because much of the core structure of Unicode was already in place in 2006, many of the basic ways of working with Unicode remain the same. Though I’m no expert, my sense is that working with Unicode in HTML is unchanged, even for complicated stuff (bidi), while working with Unicode in MS Word can still be a pain.

But the real reason to read this book isn’t so much the practical advice– much of which you’d be better off looking up on Stack Exchange anyway– but the lucid explication of the structure and design of the Unicode framework. Not all of the explanations are equally clear– I found the first chapter a bit muddled, oddly– but Korpela remains a useful guide to the Unicode terrain.

Learning PHP, MySQL, JavaScript, CSS, & HTML5, by Robin Nixon (3rd ed., O’Reilly, 2014)

If you work around your house or mess with electronics, you come to have a sentimental attachment to your toolbox (or at least I do). You and your toolbox have seen a lot together. More than this, there’s a practicality to your toolbox: in a small space, you have a lot of what you need to handle a variety of situations. Robin Nixon’s Learning PHP, MySQL, JavaScript, CSS, & HTML5 is the web programming equivalent of a well-stocked toolbox. It’s not going to have what you need for all possible situations as a web programmer, but it packs a lot of utility into a compact space.

The core of the book is Nixon’s concise coverage of the basics of PHP, MySQL, and JavaScript. The book then delves into each of these topics in a further chapter or two, giving some further ways to use PHP, or tips on working with MySQL databases. Along the way, Nixon covers many of the most common ways readers are likely to want to use these tools: working with forms, cookies, sessions, and authentication, for example. For such a comprehensive book, it does an admirable job of thoroughly explaining topics (such as AJAX) that other books often skim over with a few code snippets. Covering topics this substantial in a single book forces brevity, and one might worry that some of them get less coverage than they merit. It is to Nixon’s great credit, I think, that there are far fewer gaps than one might expect. The only really egregious one, to my mind, is that jQuery merits only a brief mention (on p. 420).

The second major part of the book moves from the web programming side of things to consider CSS and HTML5. Aspiring web designers should be aware that– despite the book’s occasional claims to be for those who want to learn how to “style and lay out” web pages (p. xxii)– this is not the book from which to learn the nuances of web design with CSS and HTML5. The book does not cover the new semantic elements in HTML5 (though an explanation for this is given on p. 601), nor does it cover all of web design’s intricacies (divs, spans, floats). The book’s chapters on CSS could serve as a helpful primer or refresher for the web programmer who needs to do some light web design work, though. The coverage of CSS3 and HTML5 is good; Nixon discusses many of the ways that HTML5 and CSS3 are changing (and often simplifying) the way things are done on the web, from streamlining layout and display to embedding audio and video, while still explaining how to support older browsers.

This may not be the final word on web programming– given how much things are in transition at the moment, it is hard to know how any single book could be– but as an introduction and a practical set of tools, this book is nevertheless recommended.

Building Web Apps with WordPress, by Messenlehner and Coleman (O’Reilly, 2014)

Writers of books on WordPress are presented with a bit of a quandary, I think. On the one hand, one of the best resources for working with WordPress is the WordPress Codex itself, which is free, complete, regularly updated, and can cover a lot more territory than neatly fits within the covers of any one book. On the other hand, writers of WordPress books have to contend with the fact that a phenomenal book on WordPress already exists, Williams, Damstra, and Stern’s Professional WordPress: Design and Development. Messenlehner and Coleman’s Building Web Apps with WordPress enters this crowded field and acquits itself reasonably well. It’s no Professional WordPress, and it’s not the book it might have been, but it is a solid addition.

The book covers a lot of territory, from the basics of WordPress as a CMS and an app platform all the way to optimizing WordPress performance. The basic idea of the book, then, is that it will take you from a rudimentary understanding of WordPress and WordPress plugins through to scaling and optimizing your wildly successful app in a production environment. Early chapters introduce WordPress and give some rough idea of how it works. Chapters 4 through 8 are the core of the book, and cover themes, custom post types, users and roles, other miscellaneous APIs and objects, and security. Later chapters introduce more specialized, supplementary topics such as mobile WordPress apps and ecommerce apps.

Despite the clear layout of the chapters, the organization could be better, and important material is hidden in unexpected places. For example, the section of chapter 5 on custom post types does not actually cover the functions used to work with post metadata. Fundamental concepts like the loop, hooks, and the standard WordPress global variables are not in chapter 2, on WordPress Basics, but buried in chapter 3, on Leveraging WordPress Plugins.

The quality of the chapters varies. Some chapters of the book are introductory overviews while others are advanced discussions; some are crammed full of advice, insight, and helpful code examples while others are essentially a function reference (a wider failing of PHP books, in my experience). Thorough, insightful discussions of WordPress development are scattered through the book: the authors’ comparison of custom taxonomies and post metadata in chapter 5, for example, is one of the best discussions I’ve seen. In general, though, I think the book is hampered by the decision to cover WordPress from the basics through advanced topics. This means that the book competes with Professional WordPress on its own turf (not to mention a whole host of other books that cover the basics of WordPress), rather than striking out for fresher territory.

Messenlehner and Coleman do have experience designing and building apps, and it would have been interesting to get a deeper perspective on the nuances of WordPress app development. For one, there is a range of ways to interact with your data in WordPress, everything from the WP_Query class to the $wpdb object to using custom tables. Some of this is touched on, in chapter 3 and much later in chapter 16. But the book’s commitment to the whole basics-to-advanced gamut means that these discussions are less sustained, and less helpful, than they might have been had the authors simply dropped the pretense. Doing so might also have helped to resolve some of the book’s organizational problems: the authors discuss working with custom tables in chapter 3, but the full explanation doesn’t come until chapter 16. (Part of that explanation has to do with performance when querying post metadata, which goes unmentioned in the treatment of post metadata in chapters 2 and 5.)

For similar reasons, the book uses a single app (its Schoolpress app) as the example throughout. As a number of reviewers on Amazon have pointed out, this sample app is not in fact complete (it is in private beta at the moment), nor is the code up on GitHub. If this app was in the early design stages when the book was written, one possibility would have been to give more thorough consideration to a range of examples, and a range of design possibilities: an app where much of the work is in the theme, with the code in the functions.php file; a middle-of-the-road app, with some custom post types; and a very complex app like Schoolpress. I don’t think the fact that the Schoolpress app is incomplete is entirely fatal, but it seems like a missed opportunity: if the development of Schoolpress hasn’t gone as smoothly as anticipated, the book might well have been enriched by the lessons of that process.

Though I’ve dwelt on the book’s problems, the book contains insightful discussions of working with WordPress and making your app work well– though they may not be where you’d expect. With some reorganization, and a clearer sense of the book’s purpose, a second edition of this book may well earn a place next to Professional WordPress as an essential work for WordPress development.