Monday, March 30, 2009

Being a Good Consumer

I'm in the "thinking" stage of a side project which has been on my mind for the past few weeks. A lot of ideas cross my mind but this one has stayed with me longer than any of the others. I've been programming long enough to know that thinking this idea through before jumping in and blindly coding will pay dividends later. In my thinking I find myself bouncing back and forth between two different approaches. Knowing some background about the problem will help explain my predicament.

The main source of data for my application will come from a 3rd party API. It has a REST-based interface which anyone in the world has access to. It's kind of like the Twitter API but it's not Twitter. Hitting the API to grab data is easy if I'm only concerned with getting data for a few entities. The problem comes when I'm dealing with tens of thousands of entities. How often can I query the API before I negatively impact the service? So I start thinking of ways to minimize my use of the API but still provide timely data.

Then I come back to one of my core philosophies of software development: "Don't Solve Problems That Don't Exist". I've been on projects in the past where people try to predict how the application will be used in the future and code in functionality that isn't meaningful now but might be at some point. The problem with that approach is that you can't predict how the application will be used or what users will want out of it once they start using it in earnest. You can't second guess your users or usage patterns so it's futile to solve those problems before you know what they will be.

However, one of my other core philosophies is "Don't Be Dumb". It's important to design an application in an intelligent way so you can quickly respond to shifting demands. I don't have to solve every problem right now but I do need to be able to get the application back on track quickly when things blow up. A really bad design decision could cause a total breakdown once usage reaches a certain level.

Adding new hardware isn't a viable solution to scaling since I expect the main bottleneck to be the 3rd party API. Much private testing will need to be done to determine whether this really is the case.

In any case, it's time to stop thinking and start doing. However, I don't expect to release anything to the public until I've done enough testing and benchmarking to have a better idea of how the API will respond to a large number of users.

Friday, March 20, 2009

UNIX Shell Tip: Alias Frequently Used Directories

If you work in UNIX you probably spend most of your time in a directory several levels deep. That may or may not be relative to your home directory. Perhaps you switch back and forth between Apache's configuration directory and your home directory several times a day. It would be nice not to have to type those directory names out all the time. Even with tab completion you still have to type at least a couple of characters per directory and that assumes you've memorized the paths.

For the directories you use most often, just alias them in your .bash_login file. Here are a few examples.

alias sb='cd /var/www/foo/app'
alias mods='cd /var/www/foo/app/hosts/bar/modules'
alias aconf='cd /etc/apache2/'
alias logs='cd /opt/var/log'

Obviously, your own aliases should be suited to your own environment. This makes moving around the filesystem much easier.
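
If you share one login file across several machines, some of those directories won't exist everywhere. Here's a rough sketch of one way to handle that. The diralias helper is my own invention for illustration, not a bash built-in, and it assumes the directory path contains no single quotes.

```shell
# Hypothetical helper: define a "cd" alias only when the target directory exists.
#   $1 = alias name, $2 = directory to jump to
diralias() {
    if [ -d "$2" ]; then
        # Quote the path so directories containing spaces still work.
        alias "$1=cd '$2'"
    fi
}

# Same idea as the plain aliases above; missing directories are silently skipped.
diralias aconf /etc/apache2
diralias logs /opt/var/log
```

On a machine without /opt/var/log the logs alias simply never gets defined, so you get a "command not found" instead of a cd error.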

Wednesday, March 18, 2009

Managing Users and Groups in OS X Leopard

OS X is basically a flavor of UNIX with a really nice user interface. Anything you can do in a typical UNIX environment you can do in OS X. Well, almost anything. One of the ways OS X differs quite a bit from the typical UNIX environment is in how it manages users and groups. You're not going to find useradd or usermod anywhere. Instead OS X keeps all of that information in a directory service called Open Directory. Open Directory is Apple's implementation of LDAP and is how the operating system manages users and network resources. It's only the users we're interested in here so that's all I'm going to discuss in this post.

How do you manage users and groups then? There's a command line utility that ships with OS X Leopard (and Tiger I believe) called dscl. I would assume this stands for Directory Service Command Line. Enter the following command to start dscl in interactive mode.

dscl .

Being a directory service, resources are arranged in a tree structure much the same way the filesystem is. Type ls to see what items exist at the root level. You should notice a 'Users' entry near the bottom of that list. Switch to the Users directory the same way you would on the filesystem.

cd Users

Do an ls again and you'll see a bunch of system accounts and near the bottom of the list will be a few account names you might be more familiar with. Pick your own username and switch into that directory.

cd codingtank

To see all the attributes associated with your account, enter the read command.

read

You should see what appears to be a bunch of name/value pairs. Take some time to read through this information. It's good to know what kind of information the directory stores about users.
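
When you only care about one of those attributes you can skip the interactive prompt and pass the read straight to dscl. A minimal sketch follows. The user_shell function and the DSCL override variable are names I made up for illustration, and it assumes the output has the form "UserShell: /bin/bash".

```shell
# Print the login shell stored for a user in the local directory node.
# Assumes OS X with dscl on the PATH. DSCL is a hypothetical override so the
# function can be exercised with a stand-in command on other systems.
user_shell() {
    # "dscl . -read <path> <key>" prints "Key: value"; keep only the value.
    "${DSCL:-dscl}" . -read "/Users/$1" UserShell | awk '{print $2}'
}

# Example (on a Mac):
#   user_shell codingtank
```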

How do you change the value for one of those attributes? For example, you've been using your Mac for so long that your account still uses the tcsh shell and you want to switch it to the more popular and modern bash shell. Since this involves modifying the directory you should start dscl with sudo. To change your shell using the interactive prompt you'd do something like this:

sudo dscl .
-create /Users/codingtank UserShell /bin/bash

The 'create' command will either create the attribute if it doesn't already exist or modify it if it does. The first argument is the user you're dealing with. The second argument is the attribute you're trying to add or change. The last argument is the value of that attribute. Alternatively you can do this with a single command:

sudo dscl . -create /Users/codingtank UserShell /bin/bash

Now we want our user account to be associated with the 'www' group. We're not going to modify the user in this case. We're going to modify the group and add our account to the list of members. The command for that change looks like this.

sudo dscl . -append /Groups/www GroupMembership codingtank

Use the 'append' command when you want to add something to an existing value instead of replacing it.
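
To check that the append took, you can read the group's GroupMembership attribute back and look for your account. This is only a sketch. The in_group function and the DSCL override variable are hypothetical names of mine, and it only sees members listed explicitly on the group, not accounts that have it as their primary group.

```shell
# Succeed if user $2 appears in the GroupMembership attribute of group $1.
# Assumes OS X with dscl on the PATH. DSCL is a hypothetical override for testing.
in_group() {
    # Output looks like "GroupMembership: user1 user2"; put one word per line
    # and look for an exact, literal match on the user name.
    "${DSCL:-dscl}" . -read "/Groups/$1" GroupMembership 2>/dev/null |
        tr ' ' '\n' | grep -qxF "$2"
}

# Example (on a Mac):
#   in_group www codingtank && echo "codingtank is in www"
```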

To learn more about dscl you can study the online man page. It provides a few usage examples at the end. Also, people who appear to know much more about dscl than I do have written some helpful articles on it.

Compiling and Installing MySQL 5 on Mac OS X Leopard - See section on creating MySQL group and user.

dscl at U Mac

Add a User From the OS X Command Line

Easing Into dscl

Sunday, March 15, 2009

The MySQL Query Log in OS X Leopard

I installed MySQL 5.1 Community Server on my MacBook Pro using the convenient installer provided by the folks at MySQL. Something you should be aware of when doing this is that it will not install a configuration file anywhere. You'll be running with defaults unless you create a my.cnf file (or copy one of the provided samples) in one of the appropriate locations.

Default options may be fine for most local workstation installations. However, I wanted to see the query log. The easiest way to enable the query log is to add an entry to your my.cnf file under [mysqld] that looks something like this.

log=/var/log/mysqld.log

I initially chose /var/log because that's where OS X keeps some of its other log files and I thought it would be nice to keep log files together in the same place. I restarted the MySQL server and ran a couple of queries but the log file never showed up. It turns out there's a problem with the permissions on the /var/log directory. It's owned by root:wheel and is only writable by the owner. The mysql user is not root, and being in the wheel group wouldn't help anyway, so it cannot create files in this directory.

Given that OS X maintains this directory I know any changes I make to it might get reverted at any point in the future. So I chose to simply create my logs in another location. An easy candidate is /usr/local/mysql/data but you can use any directory you choose if you don't want to mix data and logs.
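
Before pointing the log setting at a new directory it's worth checking that the account running mysqld can actually write there. Here's a small sketch. The can_log_to function is a name I made up, and it tests the permissions of whoever runs it, so you'd need to run it as the MySQL account (through sudo -u, for example, depending on your setup) to see what the server sees.

```shell
# Report whether the current user can create files in a directory.
# Run it as the account mysqld runs under to mirror what the server can do.
can_log_to() {
    if [ -d "$1" ] && [ -w "$1" ]; then
        echo "writable: $1"
    else
        echo "not writable: $1"
        return 1
    fi
}

# Example:
#   can_log_to /usr/local/mysql/data
```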

Thursday, March 5, 2009

Own Your Data Model

Application frameworks can be divided up into two broad categories: those where you define your own data model and those where the data model is provided for you. Examples of the former include Symfony and Ruby on Rails. Examples of the latter include Joomla and Drupal. Any web application framework where you do *not* create and own your own data model is probably *not* a framework you want to base a long-term business on.

The more generic a solution is, the less likely it is to solve any one particular problem deeply. It may solve a bunch of problems partially. But it does so superficially and it will never cover all of your requirements and use cases. It most certainly will not keep up with your changing business needs. With a generic solution you will find yourself shaping your site to fit the framework.

The people who design and write these packages are smart but they're subject to the same laws of software development as the rest of us. They still have to make assumptions about how the product will be used and what kind of users will be involved. They have to make assumptions about what kind of sites will be produced with their tools and how those tools may be customized and extended. Even with ones that are extensible there are still limits on just how far you can take your custom modules or plugins.

When starting a new business it can be very tempting to use one of these off-the-shelf solutions to get a product out the door quickly. You can do your due diligence and figure that the product will meet all of your requirements and will only need a few customizations. What you won't have taken into account is that the business will change. It will change in unpredictable ways. It will change faster than anticipated. Then you start running into the limitations of your off-the-shelf solution really fast. Then you start spending a lot of engineering time slogging through the framework or outright fighting against it. Short term gain; long term loss.

At the very least you need to go with a framework where you own your own data model. The data model is your most important asset. It is not an area to cut corners in or make compromises. Your data model informs the rest of your application. If your business is online then your database is the core of your business.

Chances are good that an application framework that lets you specify your own data model will also implement a decent MVC pattern. Conversely, the all-in-one systems aren't likely to have any approximation of MVC at all. The more code you invest in those systems, the bigger the mess you'll have to deal with. That mess is going to happen sooner than you think. Remember, the business will change in unpredictable ways, very fast.

Chances are also good that an application framework that lets you specify your own data model will also implement a decent testing framework and may even allow for easier scalability. Easier than the all-in-one systems anyway.

So, if you think you're smart and will save your project a lot of money and time by using a ready-made content management system which you can customize please please please reconsider. You're setting yourself up for a lot of headache in the near future.

Monday, March 2, 2009

New Job

I'm happy to say I'm done doing interviews for a while. I started a new job today.

The Good
Nice clean office furniture that appears to be fairly new.
A better than average chair
A group of guys who on first blush appear to be really cool.

The Bad
The standard issue Lenovo laptop with Vista. I'll be working on my personal MacBook Pro thankyouverymuch.

Small company. We all fit in the same room. Little to no privacy for the occasional personal phone call. Not that I make a habit of it but sometimes you gotta talk to the kids during the day. Ya, that's going to be weird.

Drupal is not exactly the epitome of object oriented design.