Wednesday, June 17, 2009

When Not to Use the Ternary Statement

The ternary operator is designed as a shorthand way to assign a value to a variable based on the result of an expression. The benefit of this is more compact code than the corresponding if/then/else structure. The drawback is that it's slightly less readable. But used properly it can enhance the quality of your code.

An if/then/else statement reads much like an English sentence. The structure can be understood at a quick glance. On the other hand, the ternary operator requires at least a tiny bit of thought before you understand what's happening. Some of that mental processing time can be mitigated by using the ternary operator consistently and only where it's appropriate so readers of your code can build familiarity with how you're using it. So what is the appropriate way to use it?

I'm a supporter of keeping lines of code shorter than 80 characters. That idea is certainly up for debate and is a subject for another post, but just for now let's assume that anything longer than 80 characters is going to wrap and will greatly decrease the readability -- not to mention beauty -- of your code. It makes sense then to keep your ternary statements under 80 characters. This can be difficult since you're dealing with four different entities: a variable, an expression, the true condition and the false condition. With all of these things having to fit on a single line, each part needs to be extremely terse, at least if you want to maintain readability. My rule of thumb is that if I can't keep a ternary statement under 80 characters then it gets converted into an if/then/else structure. Readability is more important than conciseness in my book. Let's look at some examples.

Good Example:

$foo = isset($_GET['foo']) ? intval($_GET['foo']) : 0;

You might see this in a typical web application today. You want to initialize a variable to whatever was specified as a request parameter or a default value if a request parameter wasn't given.

Bad Example:

$foo = (count($articles) < $min_article_count) ? get_articles_by_genre($status, $author, $site_id, $genre) : get_hot_topics($board_id, $platform_id, $min_date);

You really have to comb through that statement to understand what's going on. Every time someone comes across this in the code they're going to have to stop and figure it out. This example isn't even half as bad as some of the stuff I've seen in the wild. The above would be better written like this.

if (count($articles) < $min_article_count) {
    $foo = get_articles_by_genre($status, $author, $site_id, $genre);
}
else {
    $foo = get_hot_topics($board_id, $platform_id, $min_date);
}

Big difference. The lengthy ternary statement when broken up into multiple lines in an if/then/else statement becomes much easier to read. You get a sense of the logic very easily.

Another Bad Example:

foreach (innitech_comment_operations($arg == 'approval' ? 'publish' : 'unpublish') as $key => $value) {
// stuff here
}

Some programmers feel compelled to stuff as much logic into a single line as possible. I don't understand this.

Tuesday, May 5, 2009

Iterating Over Form Elements with jQuery

I was recently tasked with making jQuery iterate over the elements of a form. The form would be specified by name and the script had to loop over all of its input elements. After some Googling and trial and error this is the solution I came up with.

$("form[name=foo] :input").each(function(i) {
    console.log($(this).attr('id') + " / " + $(this).val());
});

Instead of outputting to the console you'll probably want to do something more interesting with the results.

Trying to explain a jQuery selector string is a bit like trying to explain how to improvise a solo in music. But I'll try anyway.

form
This will select all form elements in the DOM.

[name=foo]
This limits the search to only the form with a name attribute with a value of 'foo'.

:input
This selects all of the input elements of the previously selected form. An 'input' element in this case is any form control.

Monday, March 30, 2009

Being a Good Consumer

I'm in the "thinking" stage of a side project which has been on my mind for the past few weeks. A lot of ideas cross my mind but this one has stayed with me longer than any of the others. I've been programming long enough to know that thinking on this idea before jumping in and blindly coding will pay dividends later. In my thinking I find myself bouncing back and forth between two different approaches. Knowing some background about the problem will help you understand my predicament.

The main source of data for my application will come from a 3rd party API. It has a REST-based interface which anyone in the world has access to. It's kind of like the Twitter API but it's not Twitter. Hitting the API to grab data is easy if I'm only concerned with getting data for a few entities. The problem is when I'm dealing with tens of thousands of entities. How often can I query the API before I negatively impact the service? So I start thinking of ways to minimize my use of the API but still provide timely data.

Then I come back to one of my core philosophies of software development. That being "Don't Solve Problems That Don't Exist". I've been on projects in the past where people try to predict how the application will be used in the future and code in functionality that isn't meaningful now but supposedly will be at some point in the future. The problem with that approach is that you can't predict how the application will be used or what users will want out of it once they start using it in earnest. You can't second-guess your users or usage patterns so it's futile to solve those problems before you know what they will be.

However, one of my other core philosophies is "Don't Be Dumb". It's important to design an application in an intelligent way so you can quickly respond to shifting demands. I don't have to solve every problem right now but I do need to be able to get the application back on track quickly when things blow up. A really bad design decision could mean a total breakdown once usage reaches a certain level.

Adding new hardware isn't a viable solution to scaling since I expect the main bottleneck to be the 3rd party API. Much private testing will need to be done to determine whether this will really be the case.

In any case, it's time to stop thinking and start doing. However, I don't expect to release anything to the public until I've done enough testing and benchmarking to have a better idea of how the API will respond to a large number of users.

Friday, March 20, 2009

UNIX Shell Tip: Alias Frequently Used Directories

If you work in UNIX you probably spend most of your time in a directory several levels deep. That may or may not be relative to your home directory. Perhaps you switch back and forth between Apache's configuration directory and your home directory several times a day. It would be nice not to have to type those directory names out all the time. Even with tab completion you still have to type at least a couple of characters per directory and that assumes you've memorized the paths.

For the directories you use the most often just alias them in your .bash_login file. Here are a few examples.

alias sb='cd /var/www/foo/app'
alias mods='cd /var/www/foo/app/hosts/bar/modules'
alias aconf='cd /etc/apache2/'
alias logs='cd /opt/var/log'

Obviously, your own aliases should be suited to your own environment. This makes moving around the filesystem much easier.

Wednesday, March 18, 2009

Managing Users and Groups in OS X Leopard

OS X is basically a flavor of UNIX with a really nice user interface. Anything you can do in a typical UNIX environment you can do in OS X. Well, almost anything. One of the ways OS X differs quite a bit from the typical UNIX environment is in how it manages users and groups. You're not going to find useradd or usermod anywhere. Instead OS X keeps all of that information in a directory service called Open Directory. Open Directory is Apple's LDAP-based directory service and is how the operating system manages users and network resources. It's only the users we're interested in here so that's all I'm going to discuss in this post.

How do you manage users and groups then? There's a command line utility that ships with OS X Leopard (and Tiger I believe) called dscl. I would assume this stands for Directory Service Command Line. Enter the following command to start dscl in interactive mode.

dscl .

Being a directory service, resources are arranged in a tree structure much the same way the filesystem is. Type ls to see what items exist at the root level. You should notice a 'Users' entry near the bottom of that list. Switch to the Users directory the same way you would on the filesystem.

cd Users

Do an ls again and you'll see a bunch of system accounts and near the bottom of the list will be a few account names you might be more familiar with. Pick your own username and switch into that directory.

cd codingtank

To see all the attributes associated with your account, enter the read command.

read

You should see what appears to be a bunch of name/value pairs. Take some time to read through this information. It's good to know what kind of information the directory stores about users.

How do you change the value for one of those attributes? For example, you've been using your Mac for so long that your account still uses the tcsh shell and you want to switch it to the more popular and modern bash shell. Since this involves modifying the directory you should start dscl with sudo. To change your shell using the interactive prompt you'd do something like this:

sudo dscl .
create /Users/codingtank UserShell /bin/bash

The 'create' command will either create the attribute if it doesn't already exist or modify it if it does. The first argument is the user you're dealing with. The second argument is the attribute you're trying to add or change. The last argument is the value of that attribute. Alternatively you can do this with a single command:

sudo dscl . -create /Users/codingtank UserShell /bin/bash

Now we want our user account to be associated with the 'www' group. We're not going to modify the user in this case. We're going to modify the group and add our account to the list of members. The command for that change looks like this.

sudo dscl . -append /Groups/www GroupMembership codingtank

Use the 'append' command when you want to add something to an existing value instead of replacing it.

To learn more about dscl you can study the online man page. It provides a few usage examples at the end. Also, people who appear to know much more about dscl than myself have written some helpful articles on it.

Compiling and Installing MySQL 5 on Mac OS X Leopard - See section on creating MySQL group and user.

dscl at U Mac

Add a User From the OS X Command Line

Easing Into dscl

Sunday, March 15, 2009

The MySQL Query Log in OS X Leopard

I installed MySQL 5.1 Community Server on my MacBook Pro using the convenient installer provided by the folks at MySQL. Something you should be aware of when doing this is that it will not install a configuration file anywhere. You'll be running with defaults unless you create a my.cnf file (or copy one of the provided samples) into one of the appropriate locations.

Default options may be fine for most local workstation installations. However, I wanted to see the query log. The easiest way to enable the query log is to add an entry to your my.cnf file under [mysqld] that looks something like this.

log=/var/log/mysqld.log

I initially chose /var/log because that's where OS X keeps some of its other log files and I thought it would be nice to keep log files together in the same place. I restarted the MySQL server and ran a couple of queries but the log file never showed up. It turns out there's a problem with the permissions on the /var/log directory. It's owned by root:wheel and is only writable by the owner. The mysql user is not in the wheel group and thus cannot create files in this directory.

Given that OS X maintains this directory I know any changes I make to it might get reverted at any point in the future. So I chose to simply create my logs in another location. An easy candidate is /usr/local/mysql/data but you can use any directory you choose if you don't want to mix data and logs.

Thursday, March 5, 2009

Own Your Data Model

Application frameworks can be divided up into two broad categories: those where you define your own data model and those where the data model is provided for you. Examples of the former include Symfony and Ruby on Rails. Examples of the latter include Joomla and Drupal. Any web application framework where you do *not* create and own your own data model is probably *not* a framework you want to base a long-term business on.

The more generic a solution is, the less likely it is to solve any one particular problem deeply. It may solve a bunch of problems partially. But it does so superficially and it will never cover all of your requirements and use cases. It most certainly will not keep up with your changing business needs. With a generic solution you will find yourself shaping your site to fit the framework.

The people who design and write these packages are smart but they're subject to the same laws of software development as the rest of us. They still have to make assumptions about how the product will be used and what kind of users will be involved. They have to make assumptions about what kind of sites will be produced with their tools and how it may be customized and extended. Even with ones that are extensible there are still limits on just how far you can take your custom modules or plugins.

When starting a new business it can be very tempting to use one of these off-the-shelf solutions to get a product out the door quickly. You can do your due diligence and figure that the product will meet all of your requirements and will only need a few customizations. What that doesn't take into account is that the business will change. It will change in unpredictable ways. It will change faster than anticipated. Then you start running into the limitations of your off-the-shelf solution really fast. Then you start spending a lot of engineering time slogging through the framework or outright fighting against it. Short term gain; long term loss.

At the very least you need to go with a framework where you own your own data model. The data model is your most important asset. It is not an area to cut corners in or make compromises. Your data model informs the rest of your application. If your business is online then your database is the core of your business.

Chances are good that an application framework that lets you specify your own data model will also implement a decent MVC pattern. Conversely, the all-in-one systems aren't likely to have any approximation of MVC at all. The code you invest in those systems means a bigger mess to deal with. That mess is going to happen sooner than you think. Remember, the business will change in unpredictable ways very fast.

Chances are also good that an application framework that lets you specify your own data model will also implement a decent testing framework and may even allow for easier scalability. Easier than the all-in-one systems anyway.

So, if you think you're smart and will save your project a lot of money and time by using a ready-made content management system which you can customize please please please reconsider. You're setting yourself up for a lot of headache in the near future.

Monday, March 2, 2009

New Job

I'm happy to say I'm done doing interviews for a while. I started a new job today.

The Good
Nice clean office furniture that appears to be fairly new.
A better than average chair
A group of guys who on first blush appear to be really cool.

The Bad
The standard issue Lenovo laptop with Vista. I'll be working on my personal MacBook Pro thankyouverymuch.

Small company. We'll all fit in the same room. Little to no privacy for the occasional personal phone call. Not that I make a habit of it but sometimes you gotta talk to the kids during the day. Yeah, that's going to be weird.

Drupal is not exactly the epitome of object oriented design.

Thursday, February 12, 2009

SQL: Finding Duplicate Values

It turns out there's another use for the HAVING clause that I hadn't considered before. This came to me when taking a written quiz for a job interview.

"Write a query to find duplicate 'name' values in a 'user' table"

I had to think about this for a few minutes. It's really easy to suppress duplicates. But returning only duplicates is not so obvious. Here's the answer I gave.

select
    name,
    count(name) as name_count
from
    users
group by
    name
having
    name_count > 1;

After getting home a quick test with my local copy of MySQL revealed my answer was indeed correct. As an added bonus you also get the number of times each value appears.
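If you want to sanity-check the trick yourself without MySQL, here's a quick sketch using Python's built-in sqlite3 module (the table and names are made up for illustration; SQLite also lets HAVING reference the alias, so the query carries over unchanged):

```python
import sqlite3

# In-memory database with a minimal users table and some made-up names.
conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")
conn.executemany("insert into users values (?)",
                 [("alice",), ("bob",), ("bob",), ("carol",), ("bob",)])

# Same shape as the MySQL query: group by name, keep groups larger than 1.
rows = conn.execute("""
    select name, count(name) as name_count
    from users
    group by name
    having name_count > 1
""").fetchall()

print(rows)  # [('bob', 3)]
```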

Monday, January 26, 2009

Ruby or Python

I'm trying to figure out if I should learn Ruby or Python next. Both seem like popular languages and ideally I'd like to learn them both. But one has to come first. Since my current motivation for learning anything new is to make myself more valuable to potential employers I'm going to use that as the basis for my choice.

I loaded up Craigslist (SF Bay Area) this morning and performed a few searches.

All Jobs
Ruby = 125
Python = 133

This would seem to tip the scales in favor of Python. But I'm a web guy so I'm specifically interested in Internet Engineering jobs. Let's see how the numbers break down when I apply that filter.

Internet Engineering Jobs
Ruby = 45
Python = 32

That would seem to favor Ruby. Let's try the search terms for their respective web application frameworks.

Internet Engineering Jobs
Rails = 28
Django = 10

That seals it. I'm learning Ruby. Not that I'm trying to pit one language against the other. I'm just making a practical decision about which one to learn first. I've heard very positive things about both and would like to eventually get into both.

Thursday, January 15, 2009

SQL Group By

Besides doing SQL joins the other thing you'll be asked in job interviews is to use a GROUP BY clause in some fashion. Even though GROUP BY is a rare sight in the wild, potential employers still want to know that you know how to use it.

Let's take a look at our standard job interview tables.
The employee table:

+---------------+--------------+
| Field         | Type         |
+---------------+--------------+
| employee_id   | int(11)      |
| department_id | int(11)      |
| name          | varchar(255) |
| salary        | int(11)      |
+---------------+--------------+

The department table:

+---------------+--------------+
| Field         | Type         |
+---------------+--------------+
| department_id | int(11)      |
| name          | varchar(255) |
+---------------+--------------+

We want to list all departments along with how many employees are in each one. Here's the answer.

select
    d.name,
    count(e.employee_id)
from
    department d
left outer join
    employee e using (department_id)
group by
    d.department_id;

To better understand how GROUP BY works it sometimes helps to construct the query without the COUNT function and GROUP BY clause to see what the raw result set looks like. In this case the query would look something like this.

select
    d.name,
    e.employee_id
from
    department d
left outer join
    employee e using (department_id);

Which produces a result set like the following.

department              employee id
----------------------  -----------
accounting              1
accounting              2
accounting              3
accounting              4
accounting              5
accounting              6
human resources         7
human resources         8
human resources         9
human resources         10
human resources         11
human resources         12
human resources         13
human resources         14
information technology  15
information technology  16
information technology  17
information technology  18
information technology  19
information technology  20
marketing               21
marketing               22
marketing               23
marketing               24
research & development  [NULL]
engineering             25
engineering             26
engineering             27
engineering             28
engineering             29

Note, because we did a LEFT OUTER JOIN we got a row for "research & development" even though there aren't any corresponding employees. Now, let's add the COUNT function and see what happens.

select
    d.name,
    count(e.employee_id)
from
    department d
left outer join
    employee e using (department_id);

This query results in...

department  count
----------  -----
accounting  29

We only get one result. That's because we told the query to count all the rows in the result set. In addition to the count we also told it to return the department name. Since d.name in this case is completely ambiguous MySQL just picks the first one from the uncounted/uncollapsed result set.

Now instead of counting all the rows we want to count the rows for each department. This is where GROUP BY comes into play.

select
    d.name,
    count(e.employee_id)
from
    department d
left outer join
    employee e using (department_id)
group by
    d.department_id;

This query results in...

department              count
----------------------  -----
accounting              6
human resources         8
information technology  6
marketing               4
research & development  0
engineering             5

Now MySQL returns a row for each unique department. It also picks the department name from the first result in each group to return for the 'department' column.

It's layoff season and management wants to know which departments have people making six figures. Let's look at adding a WHERE clause.

select
    d.name,
    count(e.employee_id) as emp_count
from
    department d
inner join
    employee e using (department_id)
where
    e.salary >= 100000
group by
    d.department_id;

Results in...

department              emp_count
----------------------  ---------
accounting              1
human resources         1
information technology  1
engineering             4

So almost every department has at least one person who makes 100K or more. That's not interesting. What management really wants to know is which departments have more than 2 people making six figures. If an interviewer asks you this chances are good they're slightly evil. Here's how you beat the evil.

select
    d.name,
    count(e.employee_id) as emp_count
from
    department d
left outer join
    employee e using (department_id)
where
    e.salary >= 100000
group by
    d.department_id
having
    emp_count > 2;

Results in...

department   emp_count
-----------  ---------
engineering  4

The HAVING clause! It works in conjunction with the GROUP BY to further filter the result set. The GROUP BY clause takes the raw result set, performs the calculation and collapses the rows into groups. The HAVING clause picks up the reduced result set and applies an additional filter. The column used in the HAVING condition must be an explicitly selected column. In this case that's either d.name or emp_count.

Tip: It's a good idea to name your calculated columns to make them easier to reference later in the query if required.

That should give you enough ammunition to pass any GROUP BY questions an interviewer throws at you. If you do get a crazy question not covered in the article let me know in the comments and I'll add it. Good luck.
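If you want to replay the final query without setting up MySQL, here's a sketch using Python's sqlite3 module with a tiny made-up data set (the departments and salaries are invented for illustration):

```python
import sqlite3

# Build two small tables mirroring the interview schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (department_id integer primary key, name text);
    create table employee (employee_id integer primary key,
                           department_id integer, name text, salary integer);
    insert into department values (1, 'engineering'), (2, 'marketing');
    insert into employee values
        (1, 1, 'ann', 120000), (2, 1, 'ben', 110000),
        (3, 1, 'cal', 105000), (4, 2, 'dee', 101000);
""")

# Departments with more than 2 people making six figures.
rows = conn.execute("""
    select d.name, count(e.employee_id) as emp_count
    from department d
    inner join employee e using (department_id)
    where e.salary >= 100000
    group by d.department_id
    having emp_count > 2
""").fetchall()

print(rows)  # [('engineering', 3)]
```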

Monday, January 12, 2009

SQL Joins

If you are interviewing for a web engineering position you are guaranteed to be asked to write an SQL join or two. This is an easy way for the interviewer to verify that you really do know how to write SQL and aren't lying about it. When I learned SQL back in the old days (early 90's) we didn't have fancy join statements. We did everything in the WHERE clause and we liked it. Seriously, I liked it. I had a hard time adjusting to reading and writing join statements. But over the last couple of years I've become accustomed to it. Let's look at a few different types of joins you can do. First let's define a couple of tables. Here are your standard job interview test tables. NOTE: This article is written from a MySQL point of view.

The employee table:

+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| employee_id   | int(11)      | NO   | PRI | NULL    | auto_increment |
| department_id | int(11)      | NO   |     | NULL    |                |
| name          | varchar(255) | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

The department table:

+---------------+--------------+------+-----+---------+----------------+
| Field         | Type         | Null | Key | Default | Extra          |
+---------------+--------------+------+-----+---------+----------------+
| department_id | int(11)      | NO   | PRI | NULL    | auto_increment |
| name          | varchar(255) | NO   |     | NULL    |                |
+---------------+--------------+------+-----+---------+----------------+

I populated these tables with some random data. Most employees have corresponding records in the department table but some don't. Additionally, there are some departments without any associated employees.

Let's look at a few questions and answers to demonstrate the various types of joins you can perform on these two tables.

List all employees along with their department.

select
    e.name,
    d.name
from
    employee e
inner join
    department d using (department_id)

This uses an 'inner join' since the result set will only include rows from both tables where department_id matches. We could have omitted 'inner' and just used 'join'. The two are equivalent. However, I prefer to use 'inner join' since it explicitly states what is happening.

List all employees along with their department. Include employees who don't have a department.

select
    e.name,
    d.name
from
    employee e
left outer join
    department d using (department_id)

When performing a join the 'left' table is the table specified in the 'from' clause. To include all rows from this table whether or not there is a matching record in the department table you need to use a 'left outer join'. Technically you could just write 'left join' but again I prefer the more verbose syntax since it leaves no question about the intent of the query.

List all employees along with their department. Also include departments which don't have any employees.

select
    e.name,
    d.name
from
    employee e
right outer join
    department d using (department_id)

Using a 'right outer join' will force all rows from the department table to be represented in the result set. Right joins in a production environment are extremely rare. Personally, I've never seen or used one in any of the code I've worked on in my 13 years of experience. The concept is confusing and performance is poor. I strongly recommend against using them. But in case you're asked to explain a 'right join' in an interview you now know.

I'm not going to discuss the 'full outer join' since MySQL doesn't support it and, like the 'right join', you shouldn't be using it anyway. But it does what you would expect it to: include rows from both left and right tables regardless of whether there's a match.

Something else you shouldn't ever do in MySQL is substitute 'cross join' for 'inner join'. Seriously, in MySQL the two are syntactically equivalent which makes no sense since the phrase 'cross join' has always meant a cartesian product in my experience.
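To see the inner/outer difference concretely, here's a small sketch using Python's sqlite3 module (the rows are made up; SQLite stands in for MySQL but the join syntax is the same):

```python
import sqlite3

# One department ('r&d') deliberately has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (department_id integer primary key, name text);
    create table employee (employee_id integer primary key,
                           department_id integer, name text);
    insert into department values (1, 'accounting'), (2, 'r&d');
    insert into employee values (1, 1, 'ann');
""")

# The inner join drops the empty department; the left outer join
# keeps it, with NULL (None in Python) for the missing employee.
inner = conn.execute("""
    select d.name, e.name from department d
    inner join employee e using (department_id)
    order by d.department_id
""").fetchall()
outer = conn.execute("""
    select d.name, e.name from department d
    left outer join employee e using (department_id)
    order by d.department_id
""").fetchall()

print(inner)  # [('accounting', 'ann')]
print(outer)  # [('accounting', 'ann'), ('r&d', None)]
```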

Thursday, January 8, 2009

SQL Injection

SQL injection is kind of old news. We've been hearing about it for years and things like cross-site scripting are more interesting now anyway. But how many of you have actually been paying attention to how SQL injection attacks happen? How many of you are actually doing something about it?

An SQL injection is when someone passes arbitrary SQL to your application. This usually happens through the HTML forms on your site. For example, you have a user login form where a user types in their email address and password. Without knowing anything about the database structure of the application I could enter something like this in the email field.

blah' or 'n'='n

or worse...

'; drop table user;--

When your code drops those strings into the middle of an SQL statement bad things are going to happen. I'm not going to spend a lot of time explaining how to mount an SQL injection attack. People much smarter than me have already done so. Fire up Google and learn how it's done by the pros. The question is, are you protected? Here's how.

Sanitize Input
All data entering your database should be considered suspect. Any value that can vary in an SQL statement needs to be sanitized before it's used. Don't try to code this routine yourself. Use what the database vendor already gives you. If you're using MySQL then you're going to want to use mysql_real_escape_string().

Hide Error Messages
The public should never ever see error messages generated by the application server. Those are for developers to look at and no one else. These messages give away all kinds of information useful to attackers. Many times you're going to want to avoid even indicating an error occurred. This paper shows how an attacker was able to use the presence or absence of a 500 error page to know if they were able to generate valid SQL or not. How to handle errors and bad user input is entirely application dependent. Use your best judgement.

Limit Database Permissions
The web application should connect to the database as a user with a limited set of permissions. If the user is anonymous, a read-only db user may be all you need. Again, this is application dependent. Use your judgment but think about database permissions. Think about it hard.

Parameterized Queries
I saved the best for last. In my experience 99% of all web applications piece together SQL on the fly. This is bad for performance and even worse for security. I'm not judging because I've written tons of code like that myself. However, there is a better way. Using parameterized queries gives you some performance gains as well as a huge improvement in security. No longer can your original statement be modified into something you never intended.

Parameterized queries are where you construct your SQL statement using '?' placeholders for variable data and then bind the actual values to those placeholders. Once again, an example speaks far better than my prose. This is in PHP 5.

$sql = "select first_name, last_name from user where email = ?";
$stmt = $mysqli->prepare($sql);  // $mysqli is an existing mysqli connection

$email = "someone@yahoo.com";
$stmt->bind_param("s", $email);

$stmt->execute();
$stmt->bind_result($first, $last);

while ($stmt->fetch()) {
    print("$first $last");
}

Monday, January 5, 2009

Cookie Security

"Describe the security model for browser cookies."

That question was asked of me in a job interview recently. Unfortunately, I didn't have a good answer because I haven't actually worked with the details of creating and reading cookies for a long time. When you don't exercise a muscle it loses its tone. If you don't exercise knowledge it fades. That's what happened here. Unless you're coding new sites frequently you're probably not dealing with cookie security that often. It was time for me to hit the books and brush up on my knowledge.

So, what is the security model for browser cookies? Here goes.

First, browsers will only send cookies to the site which set the cookie. In this case we define "site" by a domain name and path. If a cookie is set with a domain of news.google.com then only news.google.com can read that cookie. A cookie set with mail.google.com can only be read by mail.google.com.

What if you want to share cookies among subdomains in a given domain? For example you want news.google.com and mail.google.com to read the same cookie. If the cookie is set to a subset of the fully qualified name then the cookie will be shared among any server whose tail matches the domain of the cookie.

For example: a cookie with a domain of google.com will be sent to mail.google.com as well as www.google.com and news.google.com.

Furthermore, a cookie with a domain of news.google.com will be sent to foo.news.google.com and foo.bar.news.google.com.
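The tail-matching rule is simple enough to sketch in a few lines of Python. This illustrates the matching logic only; real browsers layer additional rules on top of it, such as the two-part TLD restrictions discussed next:

```python
def domain_matches(cookie_domain, host):
    # A host receives the cookie when it equals the cookie's domain
    # or ends with '.' followed by that domain (the "tail match").
    cookie_domain = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == cookie_domain or host.endswith("." + cookie_domain)

print(domain_matches("google.com", "mail.google.com"))               # True
print(domain_matches("news.google.com", "foo.bar.news.google.com"))  # True
print(domain_matches("news.google.com", "mail.google.com"))          # False
```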

When setting the domain for a cookie you must specify a domain name and not just a top-level domain. Meaning, you can't set a cookie for just .com or .edu. Browsers won't let you do this. But what about co.uk? That's a TLD but it has two parts so according to the specification it's fair game. Older browsers do in fact allow this to happen. However, newer browsers have restrictions to prevent cookies being tied to these two-part TLDs, although each browser implements the restrictions a little differently.

So now we understand domains and tail matching. We can also restrict a cookie to a particular path on the server. For example, if a cookie's path is set to /blah/ it will only be sent with requests within the /blah/ directory and any sub-directory such as /blah/hooga/.

There is one extra layer of built-in protection that's worth noting. When creating a cookie you can specify if it is to be sent only over HTTPS or not. However, since the majority of my work exists in the plain HTTP world this feature doesn't do me a lot of good.

We now understand the rules for which cookies get sent to which servers, but there are a couple of other problems. First, cookies are sent as clear text across the net and can be eavesdropped on. Second, users can modify their cookies to contain values you may not want. How do we protect against eavesdropping and tampering?

Let's address the eavesdropping question first. We'll use two-way encryption to reduce the chance of eavesdropping. We're using two-way encryption because we want to send an encrypted value over the net but we need to decrypt the value so we can actually read it once the cookie has landed on the server. This doesn't make the cookie bulletproof because it can still be decrypted by a 3rd party with brute force. Because of this and other reasons, you should store as little personal information as possible in cookies. If the cookie is intended to recognize a logged in user you may want to minimally include userid and ip address.

Moving on... let's say a cookie has been modified by whatever means and it may contain bad values. Values that might allow a user to impersonate other users. Like admins. We don't want that.

Here is where we're going to use one-way hashing on the cookie value itself. Let's use the logged in user example. We want to store the username in a cookie for users who are logged in so the site will recognize them between sessions. We'll take the username, say 'bob', add a salt to it and generate a hash using whatever algorithm floats your boat. Now we'll append the hash to the username and that becomes our full cookie value. Of course, you'll need a delimiter between the individual fields so you can pick them apart when you need to read it later. For example...

bob|cac991e4b010585f61ed2e40641ec77e

This is our basic cookie value. This is what we're going to encrypt and decrypt. Upon decryption we're going to rehash the username and make sure it matches up with the hash sent in the cookie. If the hashes match we have a good cookie. If not, then you know something fishy is going on.
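The salt-and-hash scheme can be sketched in a few lines. This example uses Python and an HMAC, the standard keyed-hash construction, where the server-side secret plays the role of the salt (the secret and username here are made up for illustration):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key; never ship it in the cookie

def make_cookie(username):
    # value = username + '|' + keyed hash of the username
    mac = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return username + "|" + mac

def verify_cookie(value):
    # Recompute the hash and compare; a mismatch means tampering.
    username, _, mac = value.partition("|")
    expected = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return username if hmac.compare_digest(mac, expected) else None

cookie = make_cookie("bob")
print(verify_cookie(cookie))               # bob
print(verify_cookie("admin|" + "0" * 64))  # None
```

You would still encrypt the whole value before setting the cookie, exactly as described above: the hash guards against tampering, the encryption against eavesdropping.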