Restarting remote desktop VNC server on OS X

October 27th, 2008

Remote Desktop’s VNC interface on OS X is crashy. When it stops responding, I find myself having to restart it by running

/System/Library/CoreServices/RemoteManagement/ARDAgent.app/Contents/\
Resources/kickstart -restart -agent -vnclegacy on

That makes it responsive again (once I reconnect via VNC). Now, is there any way to force it to remain responsive in the first place?

Memoization

October 11th, 2008

Memoization is a useful technique that comes courtesy of the LISP
community and was popularized by Peter Norvig in _Paradigms of
Artificial Intelligence Programming_. The technique uses a hash table
to cache the results of a slow-to-compute function (which may need to be
computed multiple times with the same arguments). A wonderful module
called Memoize seamlessly implements the technique for Perl.

One application where caching is useful is DNS. While DNS queries are
already cached by the server, server load and per-query overhead can
be reduced by caching in the code. This is especially useful when
dealing with thousands of queries per second (e.g. monitoring a cluster
which emits periodic “heartbeat” messages). Memoize::Expire makes
this especially easy by providing a way to expire stale cache entries.

Doing so is trivial. Here’s an example of caching reverse DNS:

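use Memoize;
use Memoize::Expire;
use Socket;
 
# Back the memoize cache with a hash whose entries expire on their own
tie my %cache => 'Memoize::Expire',
        LIFETIME => 21600; # Cache for six hours
memoize('_reverse', SCALAR_CACHE => [ HASH => \%cache ]);
 
# Return the hostname for a dotted-quad IPv4 address (undef if none)
sub _reverse {
        my $ip = shift;
        return [gethostbyaddr(inet_aton($ip),AF_INET)]->[0];
}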
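
For readers who don't use Perl, here is a rough sketch of the same idea in Python: a hash (dict) caches results, and each entry carries an expiry time. This is not how Memoize::Expire is implemented; the names memoize_with_expiry and reverse_dns, and the injectable clock parameter, are my own assumptions made for illustration.

```python
import socket
import time
from functools import wraps


def memoize_with_expiry(lifetime, clock=time.monotonic):
    """Cache a function's results, forgetting each one after `lifetime` seconds.

    `clock` is injectable (an assumption of this sketch) so that expiry can be
    exercised without actually waiting.
    """
    def decorator(func):
        cache = {}  # maps the argument tuple -> (expiry_time, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            entry = cache.get(args)
            if entry is not None and entry[0] > now:
                return entry[1]          # still fresh: skip the slow call
            value = func(*args)          # missing or expired: recompute
            cache[args] = (now + lifetime, value)
            return value

        return wrapper
    return decorator


@memoize_with_expiry(lifetime=21600)  # six hours, as in the Perl example
def reverse_dns(ip):
    """Return the primary hostname for an IP address (raises if none)."""
    return socket.gethostbyaddr(ip)[0]
```

Calling reverse_dns("192.0.2.1") twice within six hours would only hit the resolver once. Failed lookups raise an exception and are not cached in this sketch.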

Should you use Hadoop?

August 11th, 2008

There’s a lot of buzz right now about Map/Reduce: what it is supposed to be and why you should use it. Some see it as a sort of panacea for every scaling problem: a general database, a general clustering or cloud computing system. In short, before you even begin thinking of Hadoop and Map/Reduce, ask yourself:

  • Is the data inherently impossible to treat as relational data without any drawbacks?
  • Can the data be handled by just a Perl script?
  • Is disk seek time the bottleneck? (If it isn’t, Map/Reduce may be a solution, but not in the form of Hadoop, which, through its use of Java, is not the most efficient general “process-spawning” engine.)

Map/Reduce with a distributed file system (and Hadoop as an implementation of Map/Reduce layered on top of HDFS) is unique in that it is a way to distribute disk seek times across a cluster of commodity hardware. Disk seek time is a resource for which parallelization on a single node (multiple cores, virtualization, threading) doesn’t work. What sort of tasks require disk seek times? Data processing and data access.
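
For anyone unsure what the Map/Reduce model itself looks like, here is a minimal single-process sketch in Python. It shows only the map, shuffle and reduce phases; it does not distribute anything and is not Hadoop's API. The log format and the names run_job, count_hits and total are assumptions made up for the illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record; the mapper yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, which a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def count_hits(line):
    # Hypothetical log format: "<ip> <path>"
    _ip, path = line.split()
    yield path, 1


def total(_key, values):
    return sum(values)


LOG_LINES = ["10.0.0.1 /index", "10.0.0.2 /about", "10.0.0.3 /index"]
print(run_job(LOG_LINES, count_hits, total))
```

In a real cluster the records would be file blocks spread over many disks, and each node would run the map step on the blocks it holds locally; that is where the seek times get distributed.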

Map/Reduce, however, isn’t the only approach to data processing or distributing seek times. If the data is relational to start with, a much better approach is simply using a partitioned and replicated database cluster (MySQL or Postgres): write partitioning distributes seek times on writes, replication distributes seek times on reads. However, if your data is not relational to start with and can’t be inserted rapidly in a relational fashion, you have to look elsewhere. An example is logging every single page view, or even emitting multiple events per page view, where you can’t afford to wait for a database statement to insert a row (with a primary key value) into a table.

The simplest solution, however, if you need data logged rapidly and processed offline, is to perform an asynchronous write (e.g. to syslog’s/syslog-ng’s UNIX domain socket, or to your own UDP client/server) and then process the data with a Perl script run from a cron job. This solution can scale by storing the data on a file server or simply writing it to multiple log servers (a built-in feature of syslog, which could also be done using UDP broadcast or multicast). The time when a Perl script won’t cut it, however, is when reading the data in from a file and performing sorting/data distribution (to child processes for computation) takes so long that by the time the script is finished the data is no longer useful.
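
As a rough illustration of the fire-and-forget UDP approach, here is a small Python sketch. The function names, the message text and the use of the loopback address are all assumptions; a real setup would more likely point at syslog or a dedicated log host.

```python
import socket


def make_sender(host, port):
    """Return a function that sends one event per UDP datagram."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send_event(message):
        # sendto() returns once the datagram is handed to the kernel;
        # no acknowledgement is awaited, so the caller is never blocked
        # on the collector (and a lost datagram goes unnoticed).
        sock.sendto(message.encode("utf-8"), (host, port))

    return send_event


def make_collector(host="127.0.0.1", port=0):
    """Bind a UDP socket for receiving events (port 0 picks a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_event(sock, timeout=1.0):
    """Read one event, raising socket.timeout if none arrives in time."""
    sock.settimeout(timeout)
    data, _addr = sock.recvfrom(65535)
    return data.decode("utf-8")
```

The collector side would typically just append each event to a file for the cron-driven script to pick up later. The trade-off is that UDP gives no delivery guarantee, which is usually acceptable for page-view style logging.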

If your data is stored in a relational database, using Map/Reduce would add redundant steps. If the database solution can’t scale (response times slow down and database contention grows), replace it by using Hadoop (or another non-relational process) first and then bulk loading the processed data into the database (e.g. MySQL LOAD DATA INFILE).

If disk time itself is not a source of contention, but computation and/or network I/O is, it is better to distribute the task using threading or regular UNIX forking (if there’s intensive computation) or using event loops (select() or epoll()) if there’s a lot of network I/O (for a general discussion of concurrent network I/O, see The c10k problem). Java processes (which is what Hadoop would use for every running map task) are fairly expensive to spawn and context switch through (compared to threads or traditional UNIX processes).

How to lose weight: iterative approach

August 5th, 2008

(Warning and disclaimer: I am not a doctor or a personal trainer; this is a personal experience, so do your own research and don’t just take my word for it. I am not responsible for any damage to your health you may incur as a result of my advice, nor do I guarantee my results will be duplicated.)

There was an “Ask YC” post on Hacker News asking how to lose weight. I posted a response, describing my own experience. Since then I’ve decided to write more on the subject, perhaps tailoring this as a “weight loss for hackers” article.

From ~January 2006 to ~November 2007 I lost over fifty pounds (going from 200+ lbs. to <150 lbs.). The difference in appearance has been noticeable, and people (including waiters at restaurants I frequent) have been asking me what my secret is.

To put it simply, there is no secret other than exercise. Is weight loss possible without exercise? Maybe, but most people will find that tactic actually more difficult than exercising. Dieting is tricky, and one wrong move could put your body into a self-preservation mode where anything you eat turns into fat. Furthermore, a “hard-core” diet isn’t fun; you’ll constantly think about it. Should some changes be made to what you eat? Since you’re already overweight, likely yes (I will discuss them later on), but the most important change is to begin exercising.*

From a hacker’s point of view, weight gained or lost (the two differ only in sign :-)) is a function of caloric intake and energy used. There’s the energy we expend by going about our daily chores and there’s additional energy we can expend through exercise. In the last fifty years, in the United States, we’ve undertaken a massive move: from walkable cities to suburbs. I am not going to argue here whether this is good or not (perhaps that deserves another post), but the fact is that getting to work or school requires less walking than it did fifty years ago. In addition, the work most of us do is less menial (and that is obviously a positive thing) and the previously menial household tasks are now more automated.

As a result, we have to either diet or exercise to maintain a constant weight (“stay in shape”). Both of these seem like daunting tasks, but the trick here is to break them down into smaller deliverable chunks, each with its own visible results, rather than hoping for an overnight change (and burning out due to frustration).


Odd Hadoop problem

June 29th, 2008

When running Hadoop against a relatively large (~100,000 file) dataset, I found Hadoop would permanently enter a state where new tasks would be impossible to execute, issuing the following error and failing at all tasks (whether native Java tasks or streaming tasks):

ERROR org.apache.hadoop.mapred.TaskTracker:
Caught exception: java.net.SocketTimeoutException: timed out waiting for rpc response

After a long search, I found that the issue is due to the “userlogs/” directory in $HADOOP_HOME/logs/ filling up. My guess is that this is due to running out of available file descriptors (“df -i” didn’t yield any indication of running out of inodes). Simply removing the directory with “rm -rf” and restarting Hadoop fixed it.
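
To spot the problem before tasks start failing, one could periodically count the entries in the userlogs directory. Below is a small Python sketch of that check. The helper names and the threshold are assumptions for illustration; I don't know the exact count at which Hadoop starts failing, so pick a value that suits your setup.

```python
import os


def userlogs_dir(hadoop_home):
    """Path of the per-task log directory under a Hadoop install."""
    return os.path.join(hadoop_home, "logs", "userlogs")


def count_entries(directory):
    """Count directory entries without building a full listing in memory."""
    with os.scandir(directory) as entries:
        return sum(1 for _ in entries)


def userlogs_too_full(hadoop_home, threshold):
    """True when userlogs/ holds at least `threshold` entries.

    `threshold` is a hypothetical, site-specific limit, not a Hadoop setting.
    """
    return count_entries(userlogs_dir(hadoop_home)) >= threshold
```

A cron job could call userlogs_too_full(os.environ["HADOOP_HOME"], threshold) and alert, or archive old task logs, when it returns True.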

Emacs compile command

June 18th, 2008

I’ve found a snippet online that lets one use the Emacs M-x compile
command with custom, language-sensitive settings.
Unfortunately I don’t have the original URL that I found this on, but
here is the relevant snippet, with example settings for Perl and PHP.

(require 'compile)
 
(defvar compile-guess-command-table
  '((cperl-mode . "perl -c %s")
    (php-mode . "php -l %s")))
 
(defun compile-guess-command ()
  (let ((command-for-mode (cdr (assq major-mode
				     compile-guess-command-table))))
    (if (and command-for-mode
	     (stringp buffer-file-name))
	(let* ((file-name (file-name-nondirectory buffer-file-name))
	       (file-name-sans-suffix (if (and (string-match "\\.[^.]*\\'"
							     file-name)
					       (> (match-beginning 0) 0))
					  (substring file-name
						     0 (match-beginning 0))
					nil)))
	  (if file-name-sans-suffix
	      (progn
		(make-local-variable 'compile-command)
		(setq compile-command
		      (if (stringp command-for-mode)
			  ;; Optimize the common case.
			  (format command-for-mode
				  file-name file-name-sans-suffix)
			(funcall command-for-mode
				 file-name file-name-sans-suffix)))
		compile-command)
	    nil))
      nil)))
 
(add-hook 'cperl-mode-hook (function compile-guess-command))
(add-hook 'php-mode-hook (function compile-guess-command))

Back-propagation Neural Networks, Perl example

June 15th, 2008

I’ve been looking for a simple example of a Back Propagation Neural Network to base a class project on. The example I found was bpnn.py by Neil Schemenauer. A more thorough, Python-based introduction by IBM can be found here. For class (and personal) purposes, I’ve transliterated bpnn.py into Perl (see Bpnn.pm). Here is a quick example of how a five-node Neural Network (two input, two hidden and one output node) can be trained to recognize the XOR of two inputs:

#!/usr/bin/perl -w
 
package XorBpnn;
 
use strict;
use warnings;
 
use lib qw(.);
 
use Bpnn;
 
use base qw(Bpnn);
 
sub new {
    my $class = shift;
    my $self = $class->SUPER::new(2,2,1);
 
    my $patterns = [
            [[0,0], [0]],
            [[0,1], [1]],
            [[1,0], [1]],
            [[1,1], [0]]
           ];
    $self->train($patterns);
 
    return $self;
}
 
sub round {
    my $number = shift;
 
    return int($number + .5 * ($number <=> 0));
}
 
sub calculate {
    my $self = shift;
    my ($x,$y) = @_;
 
    return round($self->update([$x,$y])->[0]);
}
 
1;
 
package main;
 
my $ann = XorBpnn->new();
 
print $ann->calculate(1,1) . "\n";
print $ann->calculate(0,0) . "\n";
print $ann->calculate(1,0) . "\n";
print $ann->calculate(0,1) . "\n";

New colocation

May 16th, 2008

Pretty much transitioned most hosted sites to a new machine colocated at the Applied Operations - Layer42 datacenter. Big thanks go to their team for getting all of the gear set up properly. Transitioned from 32-bit FreeBSD 5.4-RELEASE to 64-bit Ubuntu Feisty. The general strlen.net site still goes to the FreeBSD machine, pending a transition (and possibly a rewrite).