Müller & Guido, Introduction to Machine Learning with Python (O’Reilly, 2016)

From O’Reilly and others, there’s been a profusion of data science books in the past few years. Given that many of these books are intended to introduce readers to data science methods and tools, it’s perhaps unsurprising that many of these books overlap at various points: you’ve got to introduce the reader to NumPy, pandas, matplotlib and the rest somehow, after all.

Müller & Guido’s Introduction to Machine Learning with Python is distinct from many of these other works in both its stated aims and in its execution. In contrast to many of the more introductory books on data science, Müller & Guido give readers with a serious interest in the practice of machine learning a thorough introduction to scikit-learn. That is to say, their Introduction largely eschews coverage of the data science tools often treated in introductory data science texts (though they briefly note the other tools they draw upon in Chapter 1). At the same time, because their book focuses on practice and scikit-learn, they neither discuss the mathematical underpinnings of machine learning, nor do they cover writing algorithms from scratch.

What is here is a comprehensive overview of things already implemented in scikit-learn (which is a considerable amount, as they show). More precisely, they focus on classification and regression in supervised learning, and clustering and signal decomposition in unsupervised learning. If your interest falls in those areas (particularly the former), their coverage is quite good. Chapters 2 and 3 discuss the algorithms for supervised and unsupervised learning respectively, and in considerable detail. That said– and though it’s somewhat less thorough– I might turn to the discussion of some of the same algorithms in Chapter 5 of VanderPlas’ Python Data Science Handbook before Müller & Guido’s; VanderPlas’ treatment is more conversational and less dry. (Note, however, that Müller & Guido do cover more territory.) Similarly, I was left wanting more from Chapter 7’s coverage of working with text.

Müller & Guido’s book really shines, though, when it discusses all of the other things that go into machine learning, beyond their march through the algorithms themselves. Chapter 4 discusses ways to numerically model categorical variables, also (briefly) covering ANOVA and other techniques of feature selection; Chapter 5 covers cross-validation and techniques for carefully tuning model parameters; Chapter 6 compellingly explains the importance of using the Pipeline class to prevent data leakage (during preprocessing, for example); and Chapter 8 discusses where scikit-learn and Python fit within the wider horizons of machine learning. The strongest parts of the book, then– and the parts where it’s the most fun to read– are where Müller & Guido discuss the practical details of machine learning. (One wonders if they felt a bit hamstrung by avoiding the mathematics of the algorithms they discuss.) There are points where the book is less engaging than other introductory data science books, but then it’s not really in the same category; rather than an introductory overview of the entire landscape, Müller & Guido provide a clear, comprehensive, detailed guidebook to one particular part of the map.
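
To make the Chapter 6 point about Pipeline and data leakage a little more concrete, here is a minimal sketch of my own (not an example from the book). The dataset, the scaler, and the SVC classifier are all just choices I made for illustration. The idea is that when scaling lives inside the pipeline, cross-validation refits the scaler on the training portion of each fold only, so nothing about the held-out data leaks into preprocessing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# A small dataset bundled with scikit-learn (an arbitrary choice for this sketch).
X, y = load_breast_cancer(return_X_y=True)

# Preprocessing and model are chained, so they are always fit together.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm", SVC()),
])

# For each of the 5 folds, the scaler is fit on the training part only,
# then applied to the held-out part before scoring.
scores = cross_val_score(pipe, X, y, cv=5)
print("mean accuracy:", scores.mean())
```

The leaky alternative would be to call StandardScaler().fit_transform(X) on the whole dataset first and then cross-validate the SVC alone; the code is barely shorter, which is probably why the mistake is so easy to make.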

Why Twitter has lost; Or, against brevity

In the wake of the rumors that Twitter was going to up its character limit, I started spiffing up my Twitter profiles: I added a few photos, started adding people to my various lists, and even started using it a bit more. Then, of course, it seems that those rumors provoked such a backlash within the hardcore Twitter community that Jack Dorsey was forced to shelve any modifications to the format. Here we have, in a nutshell, the reason that Twitter has lost: it’s utterly unwilling to make any modifications to its established product that might make it attractive or useful to those who aren’t already committed users.

For better or worse, for example, I’m friends with a lot of colleagues on Facebook. This is annoying– sometimes I just want to post something silly or random, and it’s annoying that I have so many professional colleagues mixed up in my FB. (And yes, I know there are ways to tweak that, but who has the time?)

In a way that’s only rivaled by a very few high quality email listservs, Facebook is the place I go to hear what people in my field are talking about and working on (and from a usability standpoint, it’s actually easier to skim and follow discussions on Facebook than in my Gmail). My colleagues make comments about work they’ve been doing, share fellowship, grant, and job postings, pose questions, and generally take advantage of the fact that we’re all working on our computers N hours a day. My colleagues from grad school have a really phenomenal little group that often contains very specialized questions: requests for bibliography, questions about translations, and the like. It would be nice, in some ways, if some of these discussions were on Twitter: we could draw on the breadth of Twitter’s userbase, have discussions in real time to a greater extent, and get away from some of the ickiness that attaches to FB (and perhaps bring in people who stay away from FB because of said ickiness).

But just as a for instance, I was messing around this morning with looking at the character length of these discussions. These aren’t Tolstoyan ruminations or Herodotean digressions: most of these discussions are sparked by a brief, sometimes humorous comment someone has made about their work or something they’ve found in their research. Perhaps unsurprisingly, almost all of them are over 140 characters. Even just setting up the necessary context for many of these comments takes more than 140 characters. The only comments by my colleagues that fall under the 140 character limit are quick, humorous, and usually relate to popular culture (and so don’t have anything to do with professional communication at all).
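
For anyone who wants to try the same kind of check, something along these lines would do it. This is only a sketch: the function names are mine, the sample strings are placeholders rather than real posts, and Python's len() counts Unicode code points, which is not exactly how Twitter counts characters.

```python
TWEET_LIMIT = 140  # the character limit discussed in this post


def summarize(posts, limit=TWEET_LIMIT):
    """Count how many posts would not fit within the character limit."""
    over = [p for p in posts if len(p) > limit]
    share = len(over) / len(posts) if posts else 0.0
    return {"total": len(posts), "over_limit": len(over), "share_over": share}


# Placeholder posts, standing in for copied discussion openers.
sample_posts = [
    "x" * 90,   # a short quip that fits
    "y" * 260,  # a comment that needs some context
    "z" * 410,  # a longer question to colleagues
]

print(summarize(sample_posts))
```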

Let’s be clear here: these are professional writers, and ones who’ve had a lot of success, too. These are people who write books and articles, who communicate for a living, and who– as their posts make clear– are constantly engaged in the process of moving their ideas from insights to well crafted arguments, and for a variety of audiences, too. The argument that these people are incapable of concision and brevity strikes me as completely off base. (I make no such claims about my own capacity for brevity, however.)

The reality, I think, is that Twitter works great for subjects where everyone knows what you’re talking about: if you’re just railing about the latest idiotic or offensive thing that Trump has said, or some piece of celebrity gossip (and we all do), you don’t need any context. If you have something to say on subjects that require context or nuance, forget it.

But it’s not just that: I’m frequently astonished at how often Twitter falls down at its core functions. Many, many times the most salient or compelling quotation on the news just won’t fit into 140 characters. A while ago I found a great analysis of some of the religious freedom legislation that’s been going through legislatures around the country, but the best quotations from and the core insights of the article just wouldn’t fit into 140 characters, and so I never ended up posting that analysis.

The result is that other services are eating Twitter’s lunch. Facebook, as I said, is pretty standard for a lot of scholars in my field. People in visually or design-oriented fields make a lot more use of Instagram than they do of Twitter. But it’s more than the fact that people self-select into platforms tailored to what they do instead of Twitter: it’s that these platforms are constructed in such a way that allows for novel kinds of use, and meaningful discussions beyond (and perhaps in spite of) the intentions of the platform’s designers in particular. I was surprised by how much substantive discussion there was on Instagram, for example, after the Freddie Gray murder and the unrest in Baltimore, and in a way that totally changed my feelings about the platform. Facebook allows for (even if it does not always brilliantly facilitate) real moments of connection: a friend going through medical difficulties, contact after a long period of disconnection, political debate that (sometimes) goes beyond kneejerk reaction. (And these are just things that have happened to me in the past week or so.) Every time I’ve tried to recommit to Twitter, I’ve had the opposite reaction: a lack of users beyond a narrow band of journalists, technology writers, and bots; friends with accounts who never post anything (and whose tweets get lost pretty quickly in the maelstrom); and, above all, the utter lack of any meaningful contact or communication through the platform, and the sheer disinterest of the company in fostering it. Twitter’s decision to stick with the current design of their broken platform may keep its users in the short term, but will do little to win anyone else over.

Building Web Apps with WordPress, by Messenlehner and Coleman (O’Reilly, 2014)

Writers of books on WordPress are presented with a bit of a quandary, I think. On the one hand, one of the best resources for working with WordPress is the WordPress Codex itself, which is free, complete, regularly updated, and can cover a lot more territory than neatly fits within the covers of any one book. On the other hand, writers of WordPress books have to contend with the fact that a phenomenal book on WordPress already exists, Williams, Damstra, and Stern’s Professional WordPress: Design and Development. Messenlehner and Coleman’s Building Web Apps with WordPress enters this crowded field and acquits itself reasonably well. It’s no Professional WordPress, and it’s not the book it might have been, but it is a solid addition.

The book covers a lot of territory, starting from the basics of WordPress as a CMS and an app platform all the way to how to optimize your WordPress performance. The basic idea of the book, then, is that it will take you from some basic understanding of WordPress and WordPress plugins through to scaling and optimizing your wildly successful app in a production environment. Early chapters introduce WordPress and give some rough idea of how it works. Chapters 4 through 8 are the core of the book, and cover themes, custom post types, users and roles, other miscellaneous APIs and objects, and security. Later chapters introduce more specialized, supplementary topics such as mobile WordPress apps, and ecommerce apps.

Despite the clear layout of the chapters, the organization could be better, and important material is hidden in unexpected places. For example, the section of chapter 5 on custom post types does not actually cover the functions used to work with post metadata. Fundamental concepts like the loop, hooks, and the standard WordPress global variables are not in chapter 2, on WordPress Basics, but buried in Chapter 3, on Leveraging WordPress Plugins.

The quality of the chapters varies. Some chapters of the book are introductory overviews while others are advanced discussions; some are crammed full of advice, insight, and helpful code examples while others are essentially a function reference (a wider failing of PHP books, in my experience). Thorough, insightful discussions of WordPress development are scattered through the book: their comparison of custom taxonomies and post metadata in chapter 5, for example, is one of the best discussions I’ve seen. In general, though, I think the book is hampered by the decision to make it cover WordPress from basics to advanced topics. This means that the book competes with Professional WordPress on its own turf (not to mention a whole host of other books that cover the basics of WordPress), rather than striking out for fresher territory.

Messenlehner and Coleman do have experience designing and building apps, and it would have been interesting to get a deeper perspective on the nuances of WordPress app development. For one, there are a range of ways to interact with your data in WordPress, everything from the WP_Query class to the $wpdb object to using custom tables. Some of this is touched on, in chapter 3 and much later in chapter 16. But the commitment of the book to the whole basics-to-advanced gamut means that these discussions are less sustained, and less helpful, than they might have been if they had just dropped the pretense. This might also help to resolve some of the organizational problems the book has: they discuss working with custom tables in chapter 3, but the full explanation doesn’t come until chapter 16. (Part of the explanation has to do with performance when querying post metadata, which is not discussed in the discussions of post metadata in chapters 2 and 5.)

For similar reasons, the book uses a single app as the example throughout the book (their Schoolpress app). As a number of reviewers on Amazon have pointed out, this sample app is not in fact complete (in private beta, at the moment), nor is the code up on GitHub. If this app was in the early design stages when the book was written, one possibility would have been to give more thorough consideration to a range of examples and design possibilities: an app where much of the work is in the theme, and the code in the functions.php file; a middle-of-the-road app, with some custom post types; and a very complex app like Schoolpress. I don’t think the fact that the Schoolpress app is incomplete is entirely fatal, but it seems like a missed opportunity: if the development process of Schoolpress hasn’t gone as smoothly as anticipated, the book might well have been enriched by the lessons of the development process.

Though I’ve dwelt on the book’s problems, the book contains insightful discussions of working with WordPress and making your app work well– though they may not be where you’d expect. With some reorganization, and a clearer sense of the book’s purpose, a second edition of this book may well earn a place next to Professional WordPress as an essential work for WordPress development.

Facebook as a community

I hate Facebook, for many reasons (and I am not using the word lightly). I think it’s despicable how Facebook presents you with a choice between economic exploitation and social connection, and I think it speaks to a deep erosion of what is good in American society: there are fewer and fewer spaces for human existence without the omnipresent hum of monetization, even exploitation. I like what I’ve seen of social networking tools beyond Facebook (esp. Friendica), and I’m looking forward to using them more.

That being said, I think Facebook feels more like a community than it did a couple of years ago (at least for the people I know). There’s a group of people I know will be on Facebook on a regular basis; I can message them or post something with them in mind and know they’ll probably see it.

I’m sure there are technical changes that have imperceptibly adjusted the Facebook experience, and it probably helps that Facebook has changed from something novel and stylish to something functional. But in the end, I think it’s less that the technology has changed than that people have changed the way they use it. When Facebook started, there was a certain pressure to burnish one’s reputation, to be aspirational in choosing what to post. Now, Facebook has so many people at different ages that aspirational posts are less salient than they used to be: a stylish post might just as easily receive a comment from your mom or brother as a carefully curated set of likes (and it’s kind of delightful to see this sort of misfire in Facebook posting). Facebook has become a community space in some way, a neighborhood bar rather than a trendy wine bar (or whatever is trendy: no fucking idea). And as numerous as its problems are, and as hateful as I find its venality, it’s hard for me not to see something valuable in the way we use Facebook now.

the joys of programming

I spend so much of my time reading and writing and doing things that are sort of intangible that it’s been a real pleasure to be doing some (light) programming and having things just work (or, often, not work, but then being able to fix them and get them to work). There’s a longer post in here on the intellectual pleasures of different kinds of language, and probably also something to be said in favor of Perl and CPAN, but I’ll just leave it at this point for now.

Random bits

Things that have caught my attention recently:
C64 SID music (as in the excellent compilation here: https://www.youtube.com/watch?v=Sq9ZZ8zilDw ) How did I not know this exists? And is often so good? (h/t to Jimmy Maher’s excellent blog.)

Similarly: the Cygwin project. How nice it is to have a *nix terminal in Windows (especially if you alias some Windows files in Cygwin)! This is totally something I’m going to install on every computer from now on.

New EMA – pretty fucking rad.

The DM Genie – I was already excited looking at the screenshots of this, but then I saw the ones where you can simulate the weather… damn.

textual remixing

So, I was just reading a discussion of why there hasn’t been a big backlash against DRM in ebooks, and one answer was that most people don’t want to remix the content in their books.

But I thought of a really compelling reason for textual remixing: it would be really great to be able to make your own Loebs, in effect, by remixing the ebooks of new translations of texts with source texts– source texts which are, at least in some cases, freely available online. There have been interesting tablet-based apps for reading different texts, but it would be great if you could simply sync up translations with source texts on a tablet.
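
Just to show how little machinery the basic idea needs, here is a rough sketch of the pairing step. It assumes something that won't always hold: that both the source text and the translation have already been split into sections keyed by the same reference numbers (like "1.1"). The function names and the placeholder strings are made up for illustration; getting usable text out of a DRM-locked ebook is exactly the part this doesn't solve.

```python
def align_sections(source, translation):
    """Pair each source section with the translated section sharing its reference.

    Both arguments are dicts mapping a section reference to its text.
    Sections missing from the translation are paired with None.
    """
    return [(ref, text, translation.get(ref)) for ref, text in source.items()]


def render_facing(pairs):
    """Lay out each source section followed by its translation, Loeb-style."""
    lines = []
    for ref, original, translated in pairs:
        lines.append(f"[{ref}] {original}")
        shown = translated if translated is not None else "(no translation)"
        lines.append(f"[{ref}] {shown}")
    return "\n".join(lines)


# Placeholder content, not real text from any edition.
source = {"1.1": "<source text of 1.1>", "1.2": "<source text of 1.2>"}
translation = {"1.1": "<translation of 1.1>"}

pairs = align_sections(source, translation)
print(render_facing(pairs))
```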

And the more I read ebooks, the more I think that the current set-up (in Kindle, for example, with an ebook linked directly to a dictionary) is pretty suboptimal. Reading Wilkie Collins, for example, there are a lot of expressions, words, etc., that are old-fashioned. God knows there are a lot of readers of Victorian novels, and it would be great if they could simply comment on words and phrases, and then have those comments be reincorporated into the ebooks themselves. Ebooks, and even their comments, are pretty lightweight…