Thursday, July 19, 2007

QuickTime / iTunes Update breaks Eclipse?

I upgraded QuickTime and iTunes today and found that all of a sudden I couldn't compile anything in Eclipse! Apparently, the update removed the /System/Library/Java/Extensions/QTJSupport.jar file, which is part of the default Java 1.5 JRE definition in Eclipse. When it went missing, Eclipse freaked out.


The fix, found at note19, turned out to be simple.



Open Eclipse's Preferences... menu and select Java > Installed JREs, and make sure that Eclipse can locate the OS X Java 1.5 JRE. If it can't (as was the case for me), add it manually. It lives in the following folder:

/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home

Works like a charm now.
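For the record, the underlying failure is just a JRE definition that lists a jar which no longer exists. Here's a minimal sketch of how you might check for that yourself, assuming you already know which paths your JRE definition uses. The JreCheck class is hypothetical, it doesn't read Eclipse's own configuration, and the two paths in main() are simply the ones from this post:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: reports which entries of a JRE definition are missing on disk.
// It does not read Eclipse's settings; you supply the list of paths yourself.
final class JreCheck {

    // Returns every path in the list that is not present on the filesystem.
    static List<String> missingEntries(List<String> paths) {
        List<String> missing = new ArrayList<String>();
        for (String path : paths) {
            if (!new File(path).exists()) {
                missing.add(path);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Paths taken from this post; adjust them for your own setup.
        List<String> paths = Arrays.asList(
            "/System/Library/Java/Extensions/QTJSupport.jar",
            "/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home");
        for (String path : missingEntries(paths)) {
            System.out.println("missing: " + path);
        }
    }
}
```

Run it after an update; any path it prints is one that a JRE definition referencing it is going to trip over.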


Wednesday, July 11, 2007

ProScope HR: Coolest Geek Toy Ever?

In preparation for my impending fatherhood, I've been keeping an eye out for fun things to do with my daughter once she's got a few years on her. Like any good geek, I'm looking for fun, geeky things to do that will pique her interest in the world around her and stimulate her little brain. I think I found something that fits the bill: the ProScope HR.


We were watching Alton Brown's Pretzel Logic episode and he was using this microscope hooked up to his PowerBook to look at the differences between various kinds of salt. Apparently it's also featured heavily on the various CSI shows.


I remember the microscope I had growing up. It was your standard light microscope; I think it went up to 200x magnification. It came with a number of prepared slides of various microorganisms, which were cool, but making new slides was kind of tedious. Plus, dealing with a microscope while wearing glasses has always been a pain. With a ProScope, you can take a look at anything you damn well please, not to mention taking high-resolution still photos and live-action and time-lapse videos. I can just see us following our daughter around the house, laptop in tow, looking at everything from pennies to raspberries to bugs in the backyard to our cats' paws.


I think that'd be pretty cool, anyway.


Properly setting up a crontab entry using 'date' to generate timestamps

I have a custom backup script that I want to run every night. I'd also like to have all the output (standard and error) redirected to a timestamped log file for subsequent review. My first stab at defining a cron job to handle this was something like the following:

00 0 * * * ~/bin/backup.sh > backup_$(date +%Y-%m-%d).log 2>&1

The only problem is it doesn't work!

Cron mails any output from a job, including the errors from a failed one, to the job's owner through the local system mail, which on Mac OS X you can read using the 'mailx' program that comes with the system. Doing so, I saw this message:

/bin/sh: -c: line 1: unexpected EOF while looking for matching `)'

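A little browsing of the crontab(5) man page turned up the culprit: in the command field, an unescaped "%" is turned into a newline, and everything after the first one is sent to the command as standard input. So the shell only saw the part of the line before the first "%", which cut my $(date ...) off before its closing parenthesis. My timestamp generation was causing the job to fail. Grrr.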

The solution? Escape each "%" with a backslash. The working cron job definition is:

00 0 * * * ~/bin/backup.sh > backup_$(date +\%Y-\%m-\%d).log 2>&1

Sanity restored.
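To make the "%" behavior concrete, here's a small Java model of what cron does with the command field, based on the description in the crontab(5) man page. It's a simplified illustration, not cron's actual source code, and the CronPercent class and its methods are hypothetical:

```java
// Simplified model of cron's "%" handling as described in crontab(5):
// an unescaped "%" ends the command, the text after it goes to the command's
// stdin (with any further unescaped "%" turned into newlines), and "\%" becomes "%".
final class CronPercent {

    // The part of the command field that cron hands to the shell.
    static String commandPart(String field) {
        return split(field)[0];
    }

    // The text cron would write to the command's standard input (may be empty).
    static String stdinPart(String field) {
        return split(field)[1];
    }

    // Escapes every "%" so cron passes it through to the shell unchanged.
    static String escape(String command) {
        return command.replace("%", "\\%");
    }

    private static String[] split(String field) {
        StringBuilder command = new StringBuilder();
        StringBuilder stdin = new StringBuilder();
        StringBuilder current = command;
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\' && i + 1 < field.length() && field.charAt(i + 1) == '%') {
                current.append('%');          // "\%" -> literal "%"
                i++;                          // skip the escaped "%"
            } else if (c == '%') {
                if (current == command) {
                    current = stdin;          // first unescaped "%": rest goes to stdin
                } else {
                    current.append('\n');     // later unescaped "%": newline in stdin
                }
            } else {
                current.append(c);
            }
        }
        return new String[] { command.toString(), stdin.toString() };
    }

    public static void main(String[] args) {
        String entry = "~/bin/backup.sh > backup_$(date +%Y-%m-%d).log 2>&1";
        System.out.println(commandPart(entry));          // what the shell actually gets
        System.out.println(commandPart(escape(entry)));  // the full command, intact
    }
}
```

Feeding it the unescaped entry shows the shell receiving a command that stops right after "date +", which is exactly why it complained about a missing ")".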

Tuesday, July 10, 2007

Liskov's Substitution Principle, equals, and Hibernate Proxies

Hibernate's CGLIB dynamic proxy classes reared their ugly heads today. I've been using the Eclipse-generated hashCode() and equals() methods in my domain objects for quite a while with no problems. Then today I wrote a test that does a simple equality check, and everything blew up!


The problem seems to be due to the way my equals() methods were written, and how Hibernate's default dynamic proxy strategy interacts with that. I was doing this:



if (getClass() != obj.getClass())
    return false;

However, in my exploding test, one of my objects being compared was a regular old domain object, while the other was a proxied object. Since the proxied object that Hibernate creates is actually technically a subclass of my class (with additional Hibernate-specific methods and such), my getClass()-based equality test was choking badly; after all, com.foo.MyClass is most definitely not com.foo.MyClass$$EnhancerByCGLIB$$beb95050.

It turns out that this runs afoul of something called the Liskov Substitution Principle, which formalizes the intuition behind inheritance in object-oriented programming: if S is a subtype of T, then you should be able to use an instance of S wherever an instance of T is called for, and nothing breaks. The corollary is that if something does break, some of your assumptions might need re-examining.

In this case, specifying the equals() method in terms of getClass() is too restrictive; Hibernate proxies should be usable anywhere one of my domain objects is used (that's the whole point!). Josh Bloch offers a way to solve this problem in his book Effective Java: use an instanceof-based test instead:



if (!(obj instanceof MyClass))
    return false;

Here, the proxied object, as a subclass of MyClass, is also an instance of MyClass. Since the CGLIB proxy doesn't override equals(), it inherits the same implementation of equals() as the base class, thus preserving the symmetry that any valid equals() implementation must have. If the subclass did override equals(), things would be different, but then you'd have a violation of the Liskov Substitution Principle. Also, as Bloch states in Effective Java, page 30:


It turns out that this is a fundamental problem of equivalence relations in object-oriented languages. There is simply no way to extend an instantiable class and add an aspect while preserving the equals contract. (emphasis in original)
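If you're paranoid, you might be tempted to declare your implementation of equals() final, so you can be sure that it is never overridden. Be careful with that, though: the Hibernate reference documentation says CGLIB proxies can't be used for classes with final methods, since a final method can't be intercepted and would run against the proxy's own, uninitialized state.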

The "downside" of this approach, if it can be called that, is that each "terminal" domain object needs its own implementation of equals(). Note that the test names the class explicitly, as in (obj instanceof MyClass); instanceof needs a type known at compile time, so there's no instanceof equivalent of comparing against getClass(). In other words, you can't define a general equals() in a superclass and let it do all the heavy lifting for inheriting classes. However, in the grand scheme of things, I don't really see that as much of a downside. Yeah, it's a bit of a pain to write equals() methods, but it has to be done anyway (it's your job as a designer), and it is arguably a more accurate approach to take. As a designer, you need to be aware of the implications of what you code. If the getClass() approach works for you, fine; just be aware of what that implies. Ditto for the instanceof approach. I'm convinced that in my particular case, the instanceof approach is the semantically correct one.
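Here's a minimal sketch of the instanceof approach, using a hand-written subclass as a stand-in for a CGLIB proxy so it runs without Hibernate. Person, getName(), and FakePersonProxy are hypothetical names for illustration. One assumption worth calling out: it compares through the getter rather than the field, since a lazy proxy's own fields are typically left unpopulated and only the getter reaches the real data.

```java
import java.util.Objects;

// Hypothetical domain class; "name" stands in for whatever business key you compare on.
class Person {
    private String name;

    protected Person() { }                // proxy subclasses need a no-arg constructor

    Person(String name) { this.name = name; }

    public String getName() { return name; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        // instanceof (not getClass()) lets subclasses such as proxies compare equal
        if (!(obj instanceof Person)) return false;
        Person other = (Person) obj;
        // Go through the getter: a proxy's own "name" field is never set
        return Objects.equals(getName(), other.getName());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(getName());
    }

    // The getClass()-style test from the original equals(), for comparison.
    static boolean getClassStyleEquals(Person a, Object b) {
        if (b == null || a.getClass() != b.getClass()) return false;
        return Objects.equals(a.getName(), ((Person) b).getName());
    }
}

// Stand-in for a proxy: subclasses Person, delegates to a target instance,
// and does not override equals() or hashCode().
class FakePersonProxy extends Person {
    private final Person target;

    FakePersonProxy(Person target) { this.target = target; }

    @Override
    public String getName() { return target.getName(); }
}
```

With this setup, real.equals(proxy) and proxy.equals(real) both come out true, while the getClass()-style test says the two objects differ.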


Update: I just checked my copy of Java Persistence with Hibernate and, sure enough, they use instanceof in their equals() implementations. Clearly, the interaction with the proxies is a driving reason to use this formulation. Apart from that, though, I still think that using instanceof is more semantically correct.



Thursday, June 28, 2007

Cost-based vacuum delay caveat in Postgres

I've been trying to vacuum a 25M-row table in Postgres, and it has been taking forever; we're talking over 22 hours (I thought I'd let it run while I flew to Philadelphia for this conference). A bit of Googling turned up this thread:

VACUUM ANALYZE taking a long time, %I/O and %CPU very low

This guy was seeing the same behaviour as I was: VACUUM ANALYZE was taking forever, and CPU and I/O percentages were hovering around 0. He had the "vacuum_cost_delay" parameter set to 70, which means that Postgres will go to sleep for 70ms whenever the accumulated I/O cost of the vacuum exceeds a certain limit ("vacuum_cost_limit"). Since a 25M-row table isn't going to fit into memory, there's going to be a good deal of reading blocks in from disk, and thus you're going to hit that cost limit, and sleep, over and over again.

Somehow I had set my delay to 500ms. No wonder it was taking so long. I dropped it down to 0, effectively disabling the cost-based delay feature. Now, 10 minutes later, my table has been vacuumed and analyzed.
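To see why the delay adds up so fast, here's a back-of-the-envelope model. It assumes every page is read from disk at the 8.2-era default cost (vacuum_cost_page_miss = 10) against the default vacuum_cost_limit of 200, and it simplifies the server's bookkeeping to "one full delay each time the limit is reached". The one-million-page table in main() is a made-up size, not my actual table, and VacuumDelayEstimate is a hypothetical name:

```java
// Back-of-the-envelope model of PostgreSQL's cost-based vacuum delay.
// Simplifying assumption: every page costs the same, and each time the accumulated
// cost reaches the limit, vacuum sleeps for one full delay and the count starts over.
final class VacuumDelayEstimate {

    static long estimateSleepMillis(long pages, int costPerPage, int costLimit, int delayMillis) {
        if (costLimit <= 0) throw new IllegalArgumentException("costLimit must be positive");
        if (delayMillis <= 0) return 0;             // a delay of 0 disables the feature
        long totalCost = pages * costPerPage;
        long sleeps = totalCost / costLimit;        // times the limit is reached
        return sleeps * delayMillis;
    }

    public static void main(String[] args) {
        long pages = 1000000L;  // hypothetical table size, every page read from disk
        int missCost = 10;      // assumed vacuum_cost_page_miss (8.2 default)
        int limit = 200;        // assumed vacuum_cost_limit (8.2 default)
        for (int delay : new int[] { 0, 70, 500 }) {
            double hours = estimateSleepMillis(pages, missCost, limit, delay) / 3600000.0;
            System.out.printf("vacuum_cost_delay=%d ms -> about %.1f hours asleep%n", delay, hours);
        }
    }
}
```

Under these assumptions vacuum sleeps once every 20 pages, so the delay setting, not the disk, ends up dominating the run time.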

You can also leave the work to the autovacuum daemon: the pg_autovacuum table (where you specify table-specific vacuum parameters) lets you set a per-table cost delay. Thus, you can set its "vac_cost_delay" column to 0 to get quick autovacuums of your big tables, while still keeping a system-wide vacuum_cost_delay for other, smaller, less critical tables. It looks like a manually started vacuum still uses the system-wide defaults instead of the values from pg_autovacuum, though (why?). Since you can change vacuum_cost_delay in your own session without reloading the server, if you need to do a manual vacuum, do a
SET vacuum_cost_delay = 0;
first (or something higher than 0 if you can't afford to peg your disk I/O), and then VACUUM. The SET only affects your current session, but if you keep using that session afterwards, remember to set vacuum_cost_delay back to what it was (RESET vacuum_cost_delay does this). If you do this from the command line, you might want to write a small wrapper script that does this instead of running vacuumdb.
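If you'd rather do this from Java than from a shell script, a sketch along these lines should work, assuming you already have a JDBC Connection to the database (the PostgreSQL JDBC driver isn't shown, and VacuumWrapper and its methods are hypothetical names). Each statement is sent on its own, since VACUUM can't run inside a transaction block:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical manual-vacuum wrapper: lower the cost delay for this session only,
// vacuum one table, then restore the session's default delay.
final class VacuumWrapper {

    // The statements to run, in order.
    static List<String> statements(String table, int delayMillis) {
        // Accept only plain (optionally schema-qualified) names, to avoid SQL injection
        if (!table.matches("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        List<String> sql = new ArrayList<String>();
        sql.add("SET vacuum_cost_delay = " + delayMillis);
        sql.add("VACUUM ANALYZE " + table);
        sql.add("RESET vacuum_cost_delay");
        return sql;
    }

    // Executes each statement separately in autocommit mode, so VACUUM
    // never ends up inside an explicit transaction block.
    static void vacuum(Connection conn, String table, int delayMillis) throws SQLException {
        conn.setAutoCommit(true);
        try (Statement stmt = conn.createStatement()) {
            for (String sql : statements(table, delayMillis)) {
                stmt.execute(sql);
            }
        }
    }
}
```

Because the connection stays in autocommit mode and the statements go out one at a time, VACUUM runs outside any transaction, and the RESET at the end puts the session back to its default delay.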

The lesson here? Always read the directions, kids.

Monday, June 25, 2007

Transformers: Members of the Coalition of the Willing?

I'm going to a conference tomorrow, and decided to check on the TSA's website to make sure I wasn't going to be breaking any of their wonderfully inane rules, like bringing 4 oz. of shampoo (horrors!) in my carry-on luggage.

I was quite surprised to find that they specifically allow "Toy Transformer Robots" (scroll down near the bottom). Even without that, Megatron would still be OK, because toy guns (so long as they don't look like real guns) are cool.

Furthermore, meat cleavers are prohibited by name in carry-on luggage (come on, you ban sabers and swords, ninja stars, and ice picks, and with all that, you still have to call out meat cleavers?!?!)

I'm glad our government is hard at work protecting us from Shampoo Bombers and insane butchers, but alas, they are falling behind in preventing the impending robot invasion!

Thursday, June 21, 2007

Unreal

If anyone ever doubts that the Internet can truly be a powerful democratizing force in the world, where the average person can say something and have it matter, check this out.

I started this blog last week. I've never blogged anywhere before, and a search for my name on Google isn't going to bring up any significant hits to me (except now for this post I'm about to talk about!). In other words, I'm not a "big voice" on the Internet.

A few days ago, I posted my third blog post ever to this free Blogger account. I wrote about how I liked David Weinberger's book Everything is Miscellaneous, and made an observation about how the themes he develops tie into what I work with, namely the human genome. Nothing big, maybe a little insightful (I thought it was neat, anyway). I wasn't really writing "for" anyone... this blog is just a place I can write some of my own thoughts down, and if that might be useful or interesting to someone somewhere, then all the better.

Today I'm sifting through my newsfeeds, and I see that David Weinberger has linked to my post on the main page of his book's website.

Think about that for just a second.

Thanks to the infrastructure that has been built up around the Internet (Google indexing, Technorati blog indexing, folksonomic tagging, etc.), the words that I wrote were found and read by the author of the book I was talking about. This isn't a top-down organization, either: there aren't professional indexers, catalogers, and abstractors out there reading and organizing everything that gets published online. This is truly bottom-up organization, growing organically out of the miscellaneous pile of information we're building online: the content, the usage patterns, the metadata, everything. Nobody needs to say, "Ah, Christopher Maier has published a post on Everything Is Miscellaneous; we need to properly file it in the 'Everything Is Miscellaneous' bin (or was it the 'genomics' bin, or...)." Furthermore, very few people, in the grand scheme of things, are going to particularly care that I've done such a thing. However, for the people who would care about it and are looking for something about Everything Is Miscellaneous, or genomes, or whatever else I talk about, this infrastructure presents it to them, as if by magic.

It is difficult, if not downright impossible, to imagine this kind of thing happening before the advent of the Internet. And it's really exciting to think about where this will ultimately lead.

Monday, June 18, 2007

Rodenbach

I recently discovered the joy of Flemish sour ale. That's some damn fine beer.

Sunday, June 17, 2007

The Genome Is Miscellaneous

Hopefully by now you have read David Weinberger's Everything Is Miscellaneous: The Power of the New Digital Disorder. It's quite an interesting and absorbing read, one of those books that makes you look at the world just a bit differently. I seem to be doing that an awful lot lately, finding unexpected applications of Weinberger's thesis all over the place. The latest? The human genome!

The ENCODE Project just published its findings from a detailed investigation of 1% of the human genome, and it looks like it's waaaaaaaaaay more complex and interesting than we thought. There's the main article (DOI: 10.1038/nature05874) in the current issue of the journal Nature, and a whole slew of additional articles in this month's Genome Research. I've been working through Gerstein et al.'s What is a gene, post-ENCODE? History and updated definition (DOI: 10.1101/gr.6339607) for a very absorbing look at how our notion of a "gene" has changed dramatically in the years since Mendel and his peas, and where our understanding of "gene" stands in light of this exciting new data from ENCODE.

It looks like the genome, far from being a nicely organized library of genetic building blocks, is a messy snarl of bits of coding DNA, all mixed up together in a pile. There is of course some physical structure to it all, but it seems pretty well jumbled up; the parts of a gene don't even need to be on the same chromosome. It reminded me of Weinberger's big miscellaneous pile, into which all our information goes, waiting to be organized by users and searchers according to their needs and desires. In the Miscellaneous Genome, the users and searchers are the complex regulatory networks of the cell, which seek out and assemble the bits they need to create the machinery and processes of life. They know how to read the genomic metadata that we are trying to grasp; once we can read the metadata, we'll be able to sift through the Miscellaneous Genome with ease.

Go read the book; go read the articles. Good stuff.

Tuesday, June 12, 2007

Postgres 8.2.4 Segmentation Fault on Mac OS X

I've been having an annoying segmentation fault with the recent install of PostgreSQL on Mac OS X. This happens whenever I quit psql after changing to a different database.

psql(336) malloc: *** error for object 0x1811000: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug
psql(336) malloc: *** set a breakpoint in szone_error to debug
Segmentation fault


Looks like others have run into this as well:
http://www.entropy.ch/phpbb2/viewtopic.php?p=10266
http://archives.postgresql.org/pgsql-hackers/2006-11/msg00331.php

Apparently it has something to do with the readline library... I'm not sure exactly what, though. It's not a deal-breaker or anything, just annoying.

Sunday, June 10, 2007

Concatenating PDFs

A while back I downloaded the Basic Cryptanalysis Army Field Manual from the University of Michigan. The manual is available as a PDF-per-chapter, but I'd like to have the entire manual as one complete PDF.

Then I found out that Mac OS X already has this capability. With a pointer from this site I put together this command to create my single PDF:

/System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/join.py -o military_cryptanalysis.pdf toc.pdf pref.pdf intro.pdf ch1.pdf ch2.pdf ch3.pdf ch4.pdf ch5.pdf ch6.pdf ch7.pdf ch8.pdf ch9.pdf ch10.pdf ch11.pdf ch12.pdf ch13.pdf ch14.pdf ch15.pdf appa.pdf appb.pdf appc.pdf appd.pdf appe.pdf appf.pdf gloss.pdf ref.pdf index.pdf

The resulting PDF is rather large (~31MB), so there's probably room for some compression. But the point is that you can concatenate PDFs easily, right out of the box, with OS X.
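If you'd rather not type all 27 file names, here's a small sketch that builds the same argument list and runs join.py via ProcessBuilder. The file-name pattern is the one from the command above; the JoinPdfs class is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that rebuilds the join.py command from this post.
final class JoinPdfs {

    static final String JOIN_PY =
        "/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py";

    // Input files in reading order: front matter, chapters 1-15, appendices A-F, back matter.
    static List<String> inputFiles() {
        List<String> files = new ArrayList<String>();
        files.add("toc.pdf");
        files.add("pref.pdf");
        files.add("intro.pdf");
        for (int ch = 1; ch <= 15; ch++) {
            files.add("ch" + ch + ".pdf");
        }
        for (char app = 'a'; app <= 'f'; app++) {
            files.add("app" + app + ".pdf");
        }
        files.add("gloss.pdf");
        files.add("ref.pdf");
        files.add("index.pdf");
        return files;
    }

    // Full argument list: the script, "-o <output>", then the inputs in order.
    static List<String> command(String output, List<String> inputs) {
        List<String> cmd = new ArrayList<String>();
        cmd.add(JOIN_PY);
        cmd.add("-o");
        cmd.add(output);
        cmd.addAll(inputs);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command("military_cryptanalysis.pdf", inputFiles());
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

Because ProcessBuilder passes arguments directly rather than through a shell, the spaces in the join.py path don't need backslash escaping.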