Thursday, May 31, 2007

PHP Session Security

1. Shared web servers—anyone else on the server can read your session files (typically in the /tmp directory) if PHP is running as an Apache module (so the session files belong to the web user) and possibly when PHP is used as a CGI (depending on how sessions are implemented).

Someone browsing the session files (probably) won’t know which site on the server the sessions apply to (so may not be able to use a username/password combination they find), but you may still be putting sensitive info (like credit card details) somewhere for all to see. Plus they’ve got a list of valid session IDs…

If you’re just storing passwords in the session, you can get away with this by using md5() (preferably twice) to one-way hash the password. This doesn’t help, though, if you need to recover the original value of a session variable.
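
For example, here is a minimal sketch of that idea (assuming the password arrives from a login form; the double md5() mirrors the suggestion above and is not the only possible scheme):

<?php
session_start();

// Store only a one-way hash of the password, so anyone browsing the
// session files in /tmp sees nothing directly reusable.
if (isset($_POST['password'])) {
    $_SESSION['pw_hash'] = md5(md5($_POST['password']));
}

// On a later request, re-check a supplied password against the stored
// hash without ever keeping the clear-text value in the session.
function password_matches_session($supplied)
{
    return isset($_SESSION['pw_hash'])
        && md5(md5($supplied)) === $_SESSION['pw_hash'];
}
?>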

Using a custom session handler to store the sessions in a database is probably the best solution. You might consider MySQL HEAP tables if performance is an issue (assuming MySQL is running on the same machine as Apache). If it gets to very high traffic, it’s time to think about getting your own server…

2. XSS exploits (and session hijacking)—using JavaScript, users can be fooled into giving away their active session ID.

All someone needs to “hijack” a session is the unique session ID. It’s like the key to a railway station locker. The locker doesn’t check that you’re the valid owner of the key before allowing you to open it, so anyone with the key can get in.

Research XSS and how to prevent it.

Accept that session hijacking cannot be entirely prevented (checks on IP address, for example, are foiled by AOL, which assigns a new client IP on more or less every page request), so double-check “critical actions” a user can perform when logged in, e.g. when changing a password—require the old password, which the session hijacker will (hopefully) not know. Displaying credit card information—do like Amazon and only display the last four digits. Basically, limit the damage someone can do if they hijack a session.
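
A rough sketch of the “require the old password” check (lookup_password_hash() and update_password_hash() are hypothetical helpers standing in for your own user storage):

<?php
session_start();

// Even with a stolen session ID, a hijacker still has to know the
// current password before they can change it.
if (isset($_POST['old_password'], $_POST['new_password'])) {

    // Hypothetical helper: fetch the stored hash for the logged-in user.
    $stored_hash = lookup_password_hash($_SESSION['user_id']);

    if (md5(md5($_POST['old_password'])) !== $stored_hash) {
        die('Old password is incorrect.');
    }

    // Hypothetical helper: only now write the new password hash.
    update_password_hash($_SESSION['user_id'], md5(md5($_POST['new_password'])));
}
?>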

3. Session IDs in URL (and hijacking)—if you’re using session IDs in the URL (as opposed to a session cookie), make sure offsite links do not contain the session ID (or the remote site will be able to hijack the session)—PHP should take care of this. Also, your visitors may give away the session ID in the referrer field—ideally, pass offsite links through a redirect page to eliminate the referrer (although, unfortunately, some browsers keep the last few pages viewed, I believe—unsure of facts).

Ideally, don’t pass session ids in the URL—require users to accept a cookie if they need to “log in”.

4. Session Fixation (pre-hijacking) (see http://www.acros.si/papers/session_fixation.pdf).

If you assign a session to a visitor to your site before they are logged in (for example, for clickpath analysis), make sure that you assign them a new session ID when they do log in, so that even if someone pre-generated the initial session ID for them, the attacker won’t know the new one.

For PHP 4.2.0+, see session_regenerate_id() (in particular the user-submitted comments). For earlier PHP versions, the session_id() function (http://www.php.net/session_id) may also be useful (haven't explored it in this context myself).
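
A minimal sketch of the idea at login time (check_credentials() is a hypothetical stand-in for your own login check):

<?php
session_start();

if (isset($_POST['username'], $_POST['password'])
    && check_credentials($_POST['username'], $_POST['password'])) {

    // Throw away the ID the visitor arrived with (which an attacker may
    // have fixed in advance) and issue a fresh one before logging them in.
    session_regenerate_id();

    $_SESSION['logged_in'] = true;
    $_SESSION['username']  = $_POST['username'];
}
?>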

5. Sniffing Packets (use SSL [HTTPS])—a session ID can be “sniffed” between the client and your server. If it’s a site where money is changing hands or other sensitive personal information is involved, SSL is a requirement.

Otherwise, without SSL, you have to live with the risk (just like you do every time you use that FTP client…).

6. Cookies are not for session data—on a related note, don’t use cookies to store sensitive information.

Cookie data, unlike session data, gets stored on the client side. Apart from the “sniffing risk”, a large majority of Windows users have little idea of security and may already be “owned by a haxor”.

Otherwise, cookies (aside from the session cookie PHP creates for you) are generally meant for long-term (i.e. between visits) data persistence (e.g. “Remember Me”) rather than “active session” persistence.

There are probably more things to watch out for (or facts to correct)—suggestions appreciated.

5 Traits of a Successful Project

Why do some companies, I.T. teams, or project leaders always seem to complete difficult implementations successfully while others struggle? The reason is that there are similar actions taken on most, if not all, successful technology implementations. Regardless of the development methodology employed, leaders should do the following to make sure every major project has a shot at success.

1. Balance demand with capacity.
One of the most important traits of all successful organizations is the balance between demand and capacity. Successful governance committees know with accuracy the available capacity of their technology implementation teams. When this committee commits to a project, it knows demand and capacity are balanced. In other words, when it commits the troops, the committee recognizes it has sufficient resources to carry out the mission. Projects descend into chaos when demand exceeds capacity. Most of you, I wager, have witnessed the chaos of overcommitted implementation teams. Committees on the right side of this issue avoid creating a mess for themselves.

2. Dedicate resources the team can count on day in and day out.
Successful projects have resources the team can rely on. If a person is dedicated, for example, 50% of the time to the project, this doesn't mean 45% or 25%. It means the project leader knows he or she has 50% of that person's time - guaranteed. Before the successful implementation begins, the project manager details the type and number of the human resources required for the project. Then the organization provides those resources and keeps them dedicated.

3. Include skilled business analysts on the implementation team.
Successful implementations are based on a thorough business analysis of desired outcomes. Insightful business analysis relies on skilled and experienced investigators, whose curiosity drives them to discover the heart of an issue or problem and then participate in devising a solution. In-depth business analysis, at the conclusion of the implementation, leads to a "Wow!" from users of the new system, not an "Oh, that is not what we wanted." I predict that in the next several years implementation teams will routinely include business analysts who are certified by the International Institute of Business Analysis, because business analysis is fast becoming a profession, not a part-time job.

4. Rely on project managers that exemplify mature professionalism.
Successful implementations always have at their head experienced, mature project managers who know the science of project management and possess leadership skills to rally the troops. These leaders inspire confidence. They listen and get out from behind their desks. They make sure the project team is trained on and uses calibrated project management tools. Competent project managers know on any given day within 10% where a project is in terms of cost and progress. These project leaders serve as a hub for communications - sending information down from the governance committee and up from the implementation team.

5. Make fact-based decisions.
The one unvarying trait competent project leaders possess is honesty. They give truthful evaluations and are mature enough to make timely reports on bad news to the governance committee. This unflinching honesty makes it possible for the organization to kill "bad" projects before they waste resources and destroy morale. This is in itself a measure of success - limiting risk and loss to the enterprise. Successful project leaders have both responsibility and authority. For example, they have the authority to dedicate additional resources should that become necessary.

Successful governance committees are trained on and use portfolio management tools. This means that they have a window into the process and make fast, effective decisions to stop small problems from becoming major ones. These committees are integrated into an effective two-way flow of information down to the project leaders and up from the implementation team. Fundamentally, it does not seem to matter whether an organization subscribes to agile project management, the waterfall model, incremental or spiral development, Scrum, Crystal, lean development, or the Project Management Body of Knowledge (PMBOK). Success truly rests on:

  • A governance committee that balances capacity with demand
  • Sufficient, dedicated, equipped, and experienced resources
  • Thorough business analysis
  • Mature, secure, honest project leaders who have responsibility and authority
  • Fact-based decisions

These organizations have done everything humanly possible to ensure that technology implementations either deliver business improvements or are cancelled before they waste precious resources.

Wednesday, May 16, 2007

14 rules for fast web pages

Steve Souders of Yahoo's "Exceptional Performance Team" gave an insanely great presentation at Web 2.0 about optimizing website performance by focusing on front end issues. Unfortunately I didn't get to see it in person but the Web 2.0 talks have just been put up and the ppt is fascinating and absolutely a must-read for anyone involved in web products.

His work has been serialized on the Yahoo user interface blog, and will also be published in an upcoming O'Reilly title (est publish date: Sep 07).

We have so much of this wrong at topix now that it makes me want to cry but you can bet I've already emailed this ppt to my eng team. :) Even if you're pure mgmt or product marketing you need to be aware of these issues and how they directly affect user experience. We've seen a direct correlation between site speed and traffic.

This is a big presentation, with a lot of data in it (a whole book's worth apparently), but halfway through he boils it down into 14 rules for faster front end performance:

  1. Make fewer HTTP requests
  2. Use a CDN
  3. Add an Expires header
  4. Gzip components
  5. Put CSS at the top
  6. Move JS to the bottom
  7. Avoid CSS expressions
  8. Make JS and CSS external
  9. Reduce DNS lookups
  10. Minify JS
  11. Avoid redirects
  12. Remove duplicate scripts
  13. Turn off ETags
  14. Make AJAX cacheable and small

The full talk has details on what all of these mean in practice. The final slide of the deck is a set of references and resources, which I've pulled out here for clickability:

book: http://www.oreilly.com/catalog/9780596514211/
examples: http://stevesouders.com/examples/
image maps: http://www.w3.org/TR/html401/struct/objects.html#h-13.6
CSS sprites: http://alistapart.com/articles/sprites
inline images: http://tools.ietf.org/html/rfc2397
jsmin: http://crockford.com/javascript/jsmin
dojo compressor: http://dojotoolkit.org/docs/shrinksafe
HTTP status codes: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
IBM Page Detailer: http://alphaworks.ibm.com/tech/pagedetailer
Fasterfox: http://fasterfox.mozdev.org/
LiveHTTPHeaders: http://livehttpheaders.mozdev.org/
Firebug: http://getfirebug.com/
YUIBlog: http://yuiblog.com/blog/2006/11/28/performance-research-part-1/
http://yuiblog.com/blog/2007/01/04/performance-research-part-2/
http://yuiblog.com/blog/2007/03/01/performance-research-part-3/
http://yuiblog.com/blog/2007/04/11/performance-research-part-4/
YDN: http://developer.yahoo.net/blog/archives/2007/03/high_performanc.html
http://developer.yahoo.net/blog/archives/2007/04/rule_1_make_few.html
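
To make rules 3 and 4 a bit more concrete, here's a minimal PHP sketch (assuming mod_php with the zlib extension; the bundled file name is made up) that serves a JavaScript component gzipped and with a far-future Expires header:

<?php
// Rule 4: gzip the response if the zlib extension is available.
if (extension_loaded('zlib')) {
    ob_start('ob_gzhandler');
}

// Rule 3: far-future Expires header (30 days here), so repeat visitors
// pull this component straight from their browser cache.
$ttl = 30 * 24 * 3600;
header('Content-Type: application/x-javascript');
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $ttl) . ' GMT');
header('Cache-Control: max-age=' . $ttl);

// Hypothetical bundled, minified script (rules 1 and 10).
readfile('site.min.js');
?>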

5 Ways People Screw Up AJAX

I had noticed that not many articles existed on the negative aspects and implementation pitfalls of Ajax, so I came up with this top 5 list of things people screw up when using Ajax.

1. No back button!:
One of the most annoying things to a user is the inability to go backwards. They may visit a site, perform a few searches, and want to go back two searches. Some sites using Ajax make the simple task of going back extremely difficult and end up bringing the user back to the initial page they clicked on to reach that function, thereby wiping out the user's history.

2. No more links:
As mentioned in item 4, if people can't find your site or a specific section of it, you'll lose traffic. Poor implementations that fetch all content dynamically via Ajax requests don't give the user a web link they can forward along or bookmark.

3. Over complication when it isn't needed
As with any technology, things can get more complicated than is really needed, and people get carried away when a new technology comes out. Do you really need to ajaxify your contact form?

4. Removing site indexability:
Depending on how your dynamic content is implemented web spiders may have a hard time finding all of the content available on your site. This can happen when content is stored in a DB only accessible via AJAX and web service calls. If a crawler can't obtain your content, how are users supposed to find it?

5. Web Server connections increase:
One of the advantages of Ajax is that responses are tiny compared to the full-page responses of classic web browsing. While this may reduce bandwidth, it can also exhaust your web server's maximum connections and require retuning the server, or, in the worst case, throwing more hardware at a poorly implemented design. I'm not stating this is the case for most Ajax implementations by any means; however, more requests (either via polling or direct user actions) mean more connections on average per user, which, depending on your user base, can really add up.

Tuesday, May 8, 2007

The 7 myths about protecting your web applications

Today, web applications deliver critical information to a growing number of employees and partners. Most organizations have already invested heavily in network security devices and thus often believe they are also protected at the application layer; in fact, they are not.

Myth 1: IPS defeat application attacks

Intrusion Prevention Systems, initially developed to monitor and alert on suspicious activity and system behavior, are becoming widely deployed. IPSs are useful for detecting known attacks, but they are inadequate against new types of attack targeting web applications, and they are often blind to traffic secured by SSL.

Myth 2: Firewalls protect the application layer

Most companies have deployed firewall technology to protect and control traffic in and out of the network. Firewalls are designed to control access by allowing or blocking IP addresses and port numbers. But just as firewalls still fail to protect against worms and viruses, they are also not suited to protecting web applications against application-level attacks.

Network firewalls only protect or "validate" the HTTP protocol and do not secure the most critical part: the application.

Myth 3: Application vulnerabilities are similar to network and system vulnerabilities

A common problem in web applications is the lack of input validation in web forms. For example, a web form field requesting an email address should only accept characters that are allowed to appear in email addresses and should carefully reject all other input. An attacker could potentially delete or modify a database ‘safely’ hidden behind state-of-the-art network firewalls, IPSs and web servers by entering SQL query syntax into the unvalidated email field, exploiting a SQL injection vulnerability!


Web application attacks do not target protocols; they target badly written applications that happen to use HTTP(S).
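
As an illustration of the point (a sketch only, assuming a PHP application using mysqli; the connection details, table and column names are made up), the email field from the example above could be validated and then bound as a parameter instead of being concatenated into the query:

<?php
// Reject anything that does not look like an email address before it
// gets anywhere near the database.
$email = isset($_POST['email']) ? trim($_POST['email']) : '';
if (!preg_match('/^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$/', $email)) {
    die('Invalid email address.');
}

// Bind the value as a parameter so SQL syntax in the input is never
// interpreted as part of the query.
$db   = new mysqli('localhost', 'dbuser', 'dbpass', 'appdb');
$stmt = $db->prepare('SELECT id, name FROM customers WHERE email = ?');
$stmt->bind_param('s', $email);
$stmt->execute();
?>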

Myth 4: Network devices can understand the application context

To correctly protect web applications and web services, a full understanding of the application structure and logic must be acquired, and the application state and associated sessions must be tracked. Different technologies, such as cookie insertion, automated process detection, application profiling and web single sign-on, are required to obtain adequate application protection.

Myth 5: SSL secures the application

SSL was initially developed to secure and authenticate traffic in transit. It protects against man-in-the-middle attacks (eavesdropping) and data alteration attacks (modifying data in transit), but it does not secure the application logic.

Most vulnerabilities found in today’s web servers are exploitable via unsecured HTTP connections as well as via ‘secured’ HTTPS connections.

Myth 6: Vulnerability scanners protect the web environment

Vulnerability scanners look for weaknesses based on signature matching. When a match is found a security issue is reported.

Vulnerability scanners work almost perfectly for popular systems and widely deployed applications, but they fall short at the web application layer, because companies do not all run the same web software; most build their own custom web applications.

Myth 7: Vulnerability assessment and patch management will do the job

While yearly security assessments of a web site are often required, the typical web application life cycle calls for more frequent security reviews. As each new revision of a web application is developed and pushed, the potential for new security issues increases, so penetration tests and vulnerability assessments will always lag behind.

Furthermore, it is an illusion to think that patch management will help you respond rapidly to the identified vulnerabilities.

Real life

Web applications are currently proving to be one of the most powerful communication and business tools. But they also come with weaknesses and potential risks that network security devices are simply not designed to protect against.

Key security concepts such as Security Monitoring, Attack Prevention, User Access control and Application Hardening, remain true. Since the web application domain is so wide and different, these concepts need to be implemented with new “application oriented” technologies.

Web Site Usability Checklist

Web Site Usability Checklist 1.0

Site Structure:

· Does everything in the site contribute to the purpose of the site?
· Is the overall site structure confusing, vague, or seemingly endless?
· Is the overall site structure capable of being grasped?
· Does it have definite boundaries or does it seem endless?
· Does the user have some feedback about where he is in the site?
· Is the site too cluttered (information overload) or too barren (information underload)?
· Is the most important content displayed in a more PROMINENT manner?
· Are the more frequently used functions more PROMINENT on the site?
· Does the site use technologies that lend themselves to the web (such as graphics, sound, motion, video, or other new technology)?
· Does the site use advanced technologies only in a manner that enhances the purpose of the site?
· Does the site have too many useless bells and whistles?
· Is the site so aesthetic (or comedic, etc) that it distracts from the overall site purpose?
· Is it clear to the novice how to move within the site?
· Is the site so narrow and deep that the user has to keep clicking through to find something, and gets lost?
· Is the site so broad and shallow that the user has to keep scrolling to find something?
Content:
· From the viewpoint of the user, is the site full of trivial content or vital content?
· Is the overall purpose of the site muddy or clear?
Usual purposes:
1) to exchange money for a product or service or
2) educate about someone or something.
· Does the site use words, abbreviations, or terms that would be unfamiliar to a novice user?
· Does part of the site establish the credibility, trustworthiness, or honesty of the owners when necessary?
· Does the site allow for suggestions and feedback from the users?
· Does the site allow for the users to communicate with each other via chat rooms or internal newsgroups thus creating a sense of community?
Readability:
· Is the text easy to read?
· Does the font style contribute to the purpose of the site without losing readability?
· Is there sufficient contrast between the text and the background?
· Is there too much contrast between the text and the background?
· Are the characters too small? Too large? Does the novice know how to change their size for easier reading?

· Do the colors enhance the user's experience while not sacrificing text legibility?
Graphics:
· Do the graphics contribute to the overall purpose of the site or distract from it?
· Do the images load quickly or does the user have to wait impatiently?
Speed:
· Is it hard to locate a target item, causing the user to lose patience and leave?
· For a large-content site, is there an internal search engine?
· Does the user have to go through too many steps to accomplish a task? (buying, joining, registering)?
· Does an expert user have options that allow them higher speed?
· Is the site designed using generally accepted human factors principles? (feedback, transfer of training, natural mapping, movement compatibility, cultural compatibility, logical compatibility, etc.)


Page design sometimes gets the most attention. After all, with current web browsers, you see only one page at a time. The site itself is never explicitly represented on the screen. But from a usability perspective, site design is more challenging and usually also more important than page design.

Once users arrive at a page, they can usually figure out what to do there, if only they would take a little time (OK, users don't take the time to study pages carefully, which is why we also have many usability problems at the page level). But getting the user to the correct page in the first place is not easy.

In a study by Jared Spool and colleagues, when users were started out at the home page and given a simple problem to solve, they could find the correct page only 42 percent of the time. In a different study by Mark Hurst and myself, the success rate was even lower; only 26 percent of users were capable of accomplishing a slightly more difficult task which, in the case of our study, was to find a job opening and apply for it (averaged across six representative corporate sites with job listings).

The reason for the lower success rate in our study relative to Jared Spool's study was not because we had picked particularly poorly designed sites; on the contrary, we were looking at sites from fairly large and well-respected companies. The difference in success rates was due to differences in the task complexity. The 42 percent success rate was the average outcome across a range of tasks where users were asked to find the answers to specific questions on a website; in other words, the exact task the Web is best for. In contrast, the 26 percent success rate was the average when users had to carry out a sequence of steps in order to complete the task of finding and applying for a job. If a user was prevented from progressing through any one of the individual steps, then he or she would not be able to perform the task. After all, you can't apply for a job if you can't find it. But it also does you no good to find a job posting if the application form is too difficult.

The problem is that web usability suffers dramatically as soon as we take users off the home page and start them navigating or problem solving. The Web was designed as an environment for reading papers, and its usability has not improved in step with the ever-higher levels of complexity users are asked to cope with. Therefore, site design must be aimed at simplicity above all else, with as few distractions as possible and with a very clear information architecture and matching navigation tools.

Top 5 javascript frameworks


5) Yahoo! User Interface Library

The Yahoo! User Interface (YUI) Library is a set of utilities and controls, written in JavaScript, for building richly interactive web applications using techniques such as DOM scripting, DHTML and AJAX. The YUI Library also includes several core CSS resources. All components in the YUI Library have been released as open source under a BSD license and are free for all uses.

Features

Two different types of components are available: utilities and controls. The YUI utilities simplify in-browser development that relies on cross-browser DOM scripting, as do all web applications with DHTML and AJAX characteristics. The YUI Library controls provide highly interactive visual design elements for your web pages. These elements are created and managed entirely on the client side and never require a page refresh.

Utilities available:

  • Animation: Create “cinematic effects” on your pages by animating the position, size, opacity or other characteristics of page elements. These effects can be used to reinforce the user’s understanding of changes happening on the page.
  • Browser History Manager: Developers of rich internet applications want bookmarks to target not just pages but page states and they want the browser’s back button to operate meaningfully within their application’s screens. Browser History Manager provides bookmarking and back button control in rich internet applications.
  • Connection Manager: This utility library helps manage XMLHttpRequest (commonly referred to as AJAX) transactions in a cross-browser fashion, including integrated support for form posts, error handling and callbacks. Connection Manager also supports file uploading.
  • DataSource Utility: DataSource provides an interface for retrieving data from arrays, XHR services, and custom functions with integrated caching and Connection Manager support.
  • Dom Collection: The DOM Utility is an umbrella object comprising a variety of convenience methods for common DOM-scripting tasks, including element positioning and CSS style management.
  • Drag & Drop: Create draggable objects that can be picked up and dropped elsewhere on the page. You write code for the “interesting moments” that are triggered at each stage of the interaction (such as when a dragged object crosses over a target); the utility handles all the housekeeping and keeps things working smoothly in all supported browsers.

Controls available:

  • AutoComplete: The AutoComplete Control allows you to streamline user interactions involving text-entry; the control provides suggestion lists and type-ahead functionality based on a variety of data-source formats and supports server-side data-sources via XMLHttpRequest.
  • Button Control: The Button Control provides checkbox, radio button, submit and menu-button UI elements that are more impactful visually and more powerful programmatically than the browser’s built-in form widgets.
  • Calendar: The Calendar Control is a graphical, dynamic control used for date selection.
  • Container: The Container family of controls supports a variety of DHTML windowing patterns including Tooltip, Panel, Dialog and SimpleDialog. The Module and Overlay controls provide a platform for implementing additional, customized DHTML windowing patterns.
  • DataTable Control: DataTable leverages the semantic markup of the HTML table and enhances it with sorting, column-resizing, inline editing of data fields, and more.
  • Logger: The YUI Logger provides a quick and easy way to write log messages to an on-screen console, the FireBug extension for Firefox, or the Safari JavaScript console. Debug builds of YUI Library components are integrated with Logger to output messages for debugging implementations.
  • Menu: Application-style fly-out menus require just a few lines of code with the Menu Control. Menus can be generated entirely in JavaScript or can be layered on top of semantic unordered lists.

Download and more information: here

4) Prototype

Prototype is a JavaScript Framework that aims to ease development of dynamic web applications.

Featuring a unique, easy-to-use toolkit for class-driven development and the nicest Ajax library around, Prototype is quickly becoming the codebase of choice for web application developers everywhere.

Features

  • Easily deploy ajax applications: Besides simple requests, this module also deals in a smart way with JavaScript code returned from a server and provides helper classes for polling.
  • DOM extending: adds many convenience methods to elements returned by the $() function: for instance, you can write $('comments').addClassName('active').show() to get the element with the ID 'comments', add a class name to it and show it (if it was previously hidden).
  • Utilizes JSON (JavaScript Object Notation): JSON is a light-weight and fast alternative to XML in Ajax requests

Download and more information here

3) Rico

Designed for building rich Internet applications.

Features

  • Animation Effects: provides responsive animation for smooth effects and transitions that can communicate change in richer ways than traditional web applications have explored before. Unlike most effects, Rico 2.0 animation can be interrupted, paused, resumed, or have other effects applied to it to enable responsive interaction that the user does not have to wait on
  • Styling: Rico provides several cinematic effects as well as some simple visual style effects in a very simple interface.
  • Drag And Drop: Desktop applications have long used drag and drop in their interfaces to simplify user interaction. Rico provides one of the simplest interfaces for enabling your web application to support drag and drop. Just register any HTML element or JavaScript object as a draggable and any other HTML element or JavaScript object as a drop zone and Rico handles the rest.
  • AJAX Support: Rico provides a very simple interface for registering Ajax request handlers as well as HTML elements or JavaScript objects as Ajax response objects. Multiple elements and/or objects may be updated as the result of one Ajax request.

Download and more information here

2) Qooxdoo

qooxdoo is one of the most comprehensive and innovative Open Source multipurpose AJAX frameworks, dual-licensed under LGPL/EPL. It includes support for professional JavaScript development, a state-of-the-art GUI toolkit and high-level client-server communication.

Features

  • Client detection: qooxdoo knows what browser is being used and makes this information available to you.
  • Browser abstraction: qooxdoo includes a browser abstraction layer which tries to abstract all browser specifics to one common “standard”. This simplifies the real coding of countless objects by allowing you to focus on what you want and not “how to want it”. The browser abstraction layer comes with some basic functions often needed when creating real GUIs. For example, runtime styles or positions (in multiple relations: page, client and screen) of each element in your document.
  • Advanced property implementation: qooxdoo supports “real” properties for objects. This means any class can define properties which the created instances should have. The addProperty handler also adds getter and setter functions. The only thing one needs to add - should you need it - is a modifier function.
  • Event Management: qooxdoo comes with its own event interface. This includes event registration and deregistration functions.

    Furthermore there is the possibility to call the target function in any object context. (The default is the object which defines the event listener.) The event system normalizes differences between the browsers, includes support for mousewheel, doubleclick and other fancy stuff. qooxdoo also comes with an advanced capture feature which allows you to capture all events when a user drags something around for example.

Download and more information here

1) Dojo

Dojo allows you to easily build dynamic capabilities into web pages and any other environment that supports JavaScript sanely. You can use the components that Dojo provides to make your web sites more usable, responsive, and functional. With Dojo you can build degradable user interfaces more easily, prototype interactive widgets quickly, and animate transitions. You can use the lower-level APIs and compatibility layers from Dojo to write portable JavaScript and simplify complex scripts. Dojo's event system, I/O APIs, and generic language enhancement form the basis of a powerful programming environment. You can use the Dojo build tools to write command-line unit-tests for your JavaScript code. The Dojo build process helps you optimize your JavaScript for deployment by grouping sets of files together and reusing those groups through "profiles".

Features

  • Multiple Points Of Entry: A fundamental concept in the design of Dojo is “multiple points of entry”. This term means that Dojo should work very hard to make sure that users should be able to start using Dojo at the level they are most comfortable with.
  • Interpreter Independence: Dojo tries very hard to ensure that it’s possible to support at least the very core of the system on as many JavaScript enabled platforms as possible. This will allow Dojo to serve as a “standard library” for JavaScript programmers as they move between client-side, server-side, and desktop programming environments.
  • Unifies several codebases: builds on several contributed code bases (nWidgets, Burstlib, and f(m)).

Download and more information here

Web 2.0 Threats and Risks for Financial Services

Web 2.0 technologies are gaining momentum worldwide, penetrating all industries as enterprise 2.0 applications. Financial services are no exception to this trend. One of the key driving factors behind the penetration of Web 2.0 into the financial services sector is the “timely availability of information”. Wells Fargo, Merrill Lynch and JP Morgan are developing their next generation technologies using Web 2.0 components; components that will be used in banking software, trading portals and other peripheral services. The true advantage of RSS components is to push information to the end user rather than have them pull it from the Internet. The financial industry estimates that 95% of its information exists in non-RSS formats and could become a key strategic advantage if it can be converted into RSS format. Wells Fargo has already implemented systems on the ground and these have started to yield benefits. Financial services are tuning into Web 2.0 but are simultaneously exposing their systems to next generation threats such as Cross-site Scripting (XSS), Cross-Site Request Forgery (CSRF) and application interconnection issues due to SOA.

With regard to security, two dimensions are very critical for financial systems – Identity and Data privacy. Adopting the Web 2.0 framework may involve risks and threats against these two dimensions along with other security concerns. Ajax, Flash (RIA) and Web Services deployment is critical for Web 2.0 applications. Financial services are putting these technologies in place; most without adequate threat assessment exercises. Let’s look at threats to financial services applications using Web 2.0.


Cross site scripting with Ajax

In the last few months, several cross-site scripting attacks have been observed, where malicious JavaScript code from a particular Web site gets executed on the victim’s browser thereby compromising information on the victim’s system. Poorly written Ajax routines can be exploited in financial systems. Ajax uses DOM manipulation and JavaScript to leverage a browser’s interface. It is possible to exploit document.write and eval() calls to execute malicious code in the current browser context. This can lead to identity theft by compromising cookies. Browser session exploitation is becoming popular with worms and viruses too. Infected sessions in financial services can be a major threat. The attacker is only required to craft a malicious link to coax unsuspecting users to visit a certain page from their Web browsers. This vulnerability existed in traditional applications as well but AJAX has added a new dimension to it.
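
On the server side, one simple line of defence is to escape anything user-supplied before echoing it back into a page or an Ajax response; a minimal PHP sketch (the parameter name is made up):

<?php
// Hypothetical Ajax endpoint that echoes a user-supplied search term back.
// htmlspecialchars() turns <script> payloads into harmless text before
// they ever reach document.write() or innerHTML on the client.
$term = isset($_GET['q']) ? $_GET['q'] : '';
$safe = htmlspecialchars($term, ENT_QUOTES, 'UTF-8');

header('Content-Type: text/html; charset=UTF-8');
echo '<p>Results for: ' . $safe . '</p>';
?>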

RSS injection

RSS feeds exist in Web 2.0 data format. This format can be pushed to the web application to trigger an event. RSS feeds are a common means of sharing information on portals and Web applications. These feeds are consumed by Web applications and sent to the browser on the client-side. Literal JavaScripts can be injected into RSS feeds to generate attacks on the client browser. An end user visits a particular Web site that loads a page with an RSS feed. A malicious script – a script that can install software or steal cookies – embedded in the RSS feed gets executed. Financial services that use RSS feeds aggressively can pose a potential threat to resource integrity and confidentiality. RSS readers bundled with applications run by end clients can cause identity thefts if they fail to sanitize incoming information.

Untrusted data sources

One of the key elements of a Web 2.0 application is its flexibility in talking to several data sources from a single application or page. This is a great feature, but from a security perspective it can be deadly. Financial services running Web 2.0 applications provide key features to users such as selecting RSS feeds, search triggers, news feeds, etc. Using these features, end users can tune various sources from one location. All these sources can have different points of origin and are totally untrusted. What if one of these sources injects a malicious JavaScript code snippet camouflaged as a harmless hyperlink? Applications that trust these sources blindly can backfire. Clicking such a link can compromise the browser session and lead to identity theft. Dealing with untrusted sources in an application framework is a challenge on the security front.

Client-side routines

Web 2.0 based financial applications use Ajax routines to do a lot of work on the client-side, such as client-side validation for data types, content-checking, date fields, etc. Normally client-side checks must be backed up by server-side checks as well. Most developers fail to do so; their reasoning being the assumption that validation is taken care of in Ajax routines. Ajax has shifted a lot of business logic to the client side. This itself is a major threat because it is possible to reverse-engineer or decode these routines and extract internal information. This can help an attacker to harvest critical information about the system.
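
A short sketch of repeating the checks on the server (field names and formats are made up), regardless of what the Ajax routines already validated in the browser:

<?php
// Client-side code can be bypassed or reverse-engineered entirely,
// so validate again before acting on the request.
$amount = isset($_POST['amount']) ? $_POST['amount'] : '';
$date   = isset($_POST['value_date']) ? $_POST['value_date'] : '';

$errors = array();

if (!preg_match('/^\d+(\.\d{1,2})?$/', $amount)) {
    $errors[] = 'Amount must be a positive number with at most two decimals.';
}
if (!preg_match('/^\d{4}-\d{2}-\d{2}$/', $date)) {
    $errors[] = 'Date must be in YYYY-MM-DD format.';
}

if (count($errors) > 0) {
    header('HTTP/1.0 400 Bad Request');
    echo implode("\n", $errors);
    exit;
}

// ...continue processing the (now validated) transaction...
?>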

Widgets exploitation

Widgets are small components that can be integrated into an application very easily without obtaining actual source code. These widgets are offered as part of larger libraries or created by users and posted on the Internet. It is very tempting to use them to achieve short term goals. It must be kept in mind that it is possible that these widgets can be exploited by an attacker if they are poorly written. If financial applications use widgets then it must be made a focal point for analysis. Any weak spot in this widget can lead to script injection on the browser side. It is imperative to analyze the source code of the widget for viruses, worms or possible weaknesses.

Web Services enumeration

Web Services are picking up in the financial services sector and are becoming part of trading and banking applications. Service-oriented architecture is a key component of Web 2.0 applications. WSDL (Web Services Description Language) is an interface to Web services. This file provides sensitive information about technologies, exposed methods, invocation patterns, etc. that can aid in defining exploitation methods. Unnecessary functions or methods left open can spell potential disaster for Web services. Web Services must follow WS-Security standards to counter the threat of information leakage from the WSDL file. WSDL enumeration helps an attacker build an exploit, and WSDL file access by unauthorized users can lead to private data exposure.

XML poisoning and Injections

SOAP, XML-RPC and REST are the new standard protocols for information-sharing and object invocation. These standards use XML as underlying sources and financial applications use these standards for client-to-server or application-to-application communication. Not uncommon is the technique of applying recursive payloads to similar-producing XML nodes multiple times. An engine’s poor handling of XML information may result in a denial of services on the server.
Web services consume information and variables from SOAP messages. It is possible to manipulate these variables. For example, if a SOAP message contains a node whose value is 10, an attacker can start manipulating that node by trying different injection attacks – SQL, LDAP, XPATH, command shell – and exploring possible attack vectors to get a hold of internal machines. XML poisoning and payload injection are another emerging threat domain for Web 2.0 financial applications.

CSRF with Web 2.0 applications

CSRF allows transactions to be carried out without an end user’s consent, making it one of the most effective attack vectors in financial applications. In Web 2.0 applications, Ajax talks with backend Web services over XML-RPC, SOAP or REST, and it is possible to invoke them using GET and POST methods. In other words, it is also possible to make cross-site calls to these Web services and, in doing so, compromise a victim’s profile interfaced with Web services. CSRF is an interesting attack vector that takes on a new dimension in this newly defined endpoint scenario. These endpoints may be for Ajax or Web services but can also be invoked by cross-domain requests. Key financial transactions cannot depend simply on authenticated sessions, but must take extra care to process information, either by manually validating the password or by using CAPTCHA.
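
One common safeguard alongside password re-validation or CAPTCHA is a per-session token that a forged cross-site request cannot know; a rough PHP sketch (form and field names are made up):

<?php
session_start();

// Issue a random token once per session and embed it in every form.
if (!isset($_SESSION['csrf_token'])) {
    $_SESSION['csrf_token'] = md5(uniqid(mt_rand(), true));
}

// When rendering the (hypothetical) transfer form:
// echo '<input type="hidden" name="token" value="' . $_SESSION['csrf_token'] . '">';

// When processing the transaction, refuse requests whose token does not
// match; a cross-site request has no way to read the session value.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    if (!isset($_POST['token']) || $_POST['token'] !== $_SESSION['csrf_token']) {
        die('Invalid request token.');
    }
    // ...carry out the transaction...
}
?>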

Conclusion

A lot more analysis needs to be done before financial applications can be integrated with their core businesses using Web 2.0. The Web security space is filling up with new attacks as we speak or offering new ways of delivering old attacks – both are dangerous where “monetary transactions” are involved. Here, we have seen just a small set of attacks. There are several other attack vectors with respect to Web 2.0 frameworks. A better threat model is required to undertake a thorough security analysis. Web 2.0 is a promising technology but also one that needs careful coding and usage practices prior to being consumed in applications.

Monday, May 7, 2007

Storing PHP Sessions in a Database

There are many reasons to utilize sessions when creating a web-based application using PHP. Session information, by default, is stored in a file on your web server. But what if that becomes a problem? In this article, I'll talk about why you might want to move your PHP sessions to a database, and show you how to do it.

So you've finished your largest project yet, a robust order-taking system for a very successful company. To give the best user experience possible, you made heavy use of sessions in order to keep information about the customer handy, from page to page, as any good PHP programmer would. You deliver the site to your customer and it goes live, to great fanfare. Traffic is slow at first, but as the customers begin to use it, traffic picks up and in a few months the site is serving over 5 million hits a day, between shoppers, search engines, and actual customers. The web server is dying under the stress, and the company turns to you to figure out how to increase their bandwidth.

A quick analysis of the database shows you that its usage is quite low. It's the actual load on the web server that's causing the issue. Your code is tight and uses caching where available (as is obvious by the low load on the database). The problem seems to be the sheer traffic on your web server. How do you cope?

The above problem is one we should all be so lucky to face: being overly popular. If your website is as wildly successful as the one in the above example, it means you should have the resources available to address this kind of issue.

And just how do you fix this problem? In our example, let's address the problem by adding web server(s). It doesn't matter if you round-robin them, or add them to a true load balancer. But adding additional servers will allow the load to be split among more than one machine, allowing each of them to serve data in a more efficient fashion.

This will solve the immediate load problem. But what will it do to your application?

For those of you who use sessions but have never written an application for multi-server distribution, you may be surprised to know that your sessions will fail miserably if left alone. Why is that?

Sessions, by default, write to a temporary file on the web server. So if you choose to store data in your session (a user's name, for example), it is available on any page just by reading from the session. This works great, until you bring more servers into the equation.

Think about this for a moment. Let's say you have three web servers, all with the same website on them. Furthermore, these web servers are set up as a round robin. A round robin means that when a new request comes in, it's handed to the next server in the series. So with three web servers, requests would be handled in the order of "1, 2, 3, 1, 2, 3, etc." This means that, as a visitor is surfing your site, they are potentially visiting different servers in the same session. With an intelligent load balancer, the handling of connections is not as crude, but it is still possible for a user to visit a different server with each click of the mouse.

So now, let's take the users on your site. As they click through your web pages, they will be moving from server to server. So if you saved something to a session variable while you were on server 1, it would not be available to you if your next click took you to server 3. This doesn't mean that you coded your application incorrectly, it merely means you need to reconsider the session configuration.

With more than one server hosting the same website, your options narrow. If you wish to keep using disk space to store your session information, then all of your web servers need to mount the same share, so they all have access to the file. Another option, and the one we are going to explore in better detail, is to store your session inside a database instead of on disk. This way, your session information is available no matter which web server you are on.

Luckily, PHP has a built-in ability to override its default session handling. The function session_set_save_handler() lets the programmer specify which functions should actually be called when it is time to read or write session information.

As I just said, the session_set_save_handler() function will allow us to override the default method of storing data. According to the documentation, here is the format for this function:

bool session_set_save_handler ( callback $open, callback $close, callback $read, callback $write, callback $destroy, callback $gc );

In our example, I'm going to create a session class that can be used to store information to the database instead of to a file.

For starters, create a new file called "sessions.php." Inside this file, put the following code:

class SessionManager {

var $life_time;

function SessionManager() {

// Read the maxlifetime setting from PHP
$this->life_time = get_cfg_var("session.gc_maxlifetime");

// Register this object as the session handler
session_set_save_handler(
array( &$this, "open" ),
array( &$this, "close" ),
array( &$this, "read" ),
array( &$this, "write"),
array( &$this, "destroy"),
array( &$this, "gc" )
);

}

}

In the above example, the SessionManager() class and its constructor are created. You will notice that instead of merely passing function names into the session_set_save_handler() function, I sent arrays allowing me to identify class methods as the intercepts for the session actions.

On the previous page, when we called session_set_save_handler() to override the session handling functions, the first two arguments passed were for the open and close logic. Let's add these methods to our new class, then take a closer look.

function open( $save_path, $session_name ) {

global $sess_save_path;

$sess_save_path = $save_path;

// Don't need to do anything. Just return TRUE.

return true;

}

function close() {

return true;

}

The above code should be added inside the SessionManager class we started on the previous page.

Taking a close look, you will see that, for the most part, we didn't do anything in the open and close methods. That would be because we will be writing our information to a database. In this lesson, I am assuming that the application you are adding this to ALREADY has an open database connection to your database. If this is the case, then the above code is good as it is. If not, then the open function would need to include code for creating your database connection, and the close would close said connection.

If we were writing a new file-handling session manager, the open function would handle opening the file descriptor. The close function would then close said file descriptor, to prevent data from being lost.

So far, so good. But before we get to the point where we are actually writing the session information into the database, we need some place to put it. In your application database, you will need to create a "sessions" table to store the session information. Here's the one I defined, in MySQL create format:

CREATE TABLE `sessions` (

`session_id` varchar(100) NOT NULL default '',

`session_data` text NOT NULL,

`expires` int(11) NOT NULL default '0',

PRIMARY KEY (`session_id`)

) TYPE=MyISAM;

I would definitely suggest adding this table to your application's database. Not only does it keep it together with everything else, but it also makes it possible to share the same database connections for your sessions and your application itself.

Okay. We've intercepted PHP's session handling logic, and are systematically replacing it with our own. After the open and close, the next to areas to address are the read and write methods.

Let's first take a look at the read method:

function read( $id ) {

// Set empty result
$data = '';

// Fetch session data from the selected database

$time = time();

$newid = mysql_real_escape_string($id);
$sql = "SELECT `session_data` FROM `sessions` WHERE
`session_id` = '$newid' AND `expires` > $time";

$rs = db_query($sql);
$a = db_num_rows($rs);

if($a > 0) {
$row = db_fetch_assoc($rs);
$data = $row['session_data'];
}

return $data;

}

In the above example, you will see that I used functions called db_query(), db_num_rows(), and db_fetch_assoc(). These functions are from the application I wrote this class for.

But a close look will show you that when the function is called, the unique session identifier is passed along with it. I then query the database to see if I can find a record for that session that has not expired. If successful, you return the data to the calling program.

Now take a look at the code to write the data.

function write( $id, $data ) {

// Build query
$time = time() + $this->life_time;

$newid = mysql_real_escape_string($id);
$newdata = mysql_real_escape_string($data);

$sql = "REPLACE `sessions`
(`session_id`,`session_data`,`expires`) VALUES('$newid',
'$newdata', $time)";

$rs = db_query($sql);

return TRUE;

}

In the above example, you see that the write function is passed the unique session identifier, as well as the data to save to the database. One thing to note is what we are doing with the time. We grab the current time, then add to it the number of seconds that were defined in the constructor as lifetime. So basically, each time the data is written, we reset the timeout. So if your system is configured to expire sessions after 20 minutes of inactivity, this code supports it.

You will also notice that, when writing to the database, we use REPLACE instead of an INSERT. REPLACE works exactly like an INSERT when no matching record exists; if a record with the same session_id already exists, it is overwritten with the new data.

And assuming all went well with the update, we return true.

You can't properly have your own session handler without writing code to clean up after yourself. The last two methods for our class will be the destroy and gc (garbage collection) methods.

When the session_destroy() function is called, it triggers a call to our destroy method. Here's a look at the code:

function destroy( $id ) {

// Build query
$newid = mysql_real_escape_string($id);
$sql = "DELETE FROM `sessions` WHERE `session_id` =
'$newid'";

db_query($sql);

return TRUE;

}

The above logic is fairly straightforward. The destroy method is called, passing the unique session identifier. We then make a call to delete the record. Boom, session information no longer exists.

Periodically, PHP will trigger a garbage collection routine, meant to handle the conditions where people left before the system got a chance to clean up their sessions. Here's the code:

function gc( $maxlifetime ) {

// Garbage Collection

// Build DELETE query. Delete all records that have
// passed the expiration time ($maxlifetime is unused here,
// since we track expiry in the `expires` column ourselves)
$sql = "DELETE FROM `sessions` WHERE `expires` < " . time();

db_query($sql);

// Always return TRUE
return true;

}

In the above query, we delete all records that are expired, and SHOULD have been deleted by the destroy method, but the method was apparently never called. This garbage collection could be considered "self-maintenance," and stops you from needing to write a cronjob to keep your sessions table clean.

On the last few pages, we've talked about a bunch of little pieces that make up the sessions class. I thought I would take a minute and give you the complete class in one shot.

class SessionManager {

var $life_time;

function SessionManager() {

// Read the maxlifetime setting from PHP
$this->life_time = get_cfg_var("session.gc_maxlifetime");

// Register this object as the session handler
session_set_save_handler(
array( &$this, "open" ),
array( &$this, "close" ),
array( &$this, "read" ),
array( &$this, "write"),
array( &$this, "destroy"),
array( &$this, "gc" )
);

}

function open( $save_path, $session_name ) {

global $sess_save_path;

$sess_save_path = $save_path;

// Don't need to do anything. Just return TRUE.

return true;

}

function close() {

return true;

}

function read( $id ) {

// Set empty result
$data = '';

// Fetch session data from the selected database

$time = time();

$newid = mysql_real_escape_string($id);
$sql = "SELECT `session_data` FROM `sessions` WHERE
`session_id` = '$newid' AND `expires` > $time";

$rs = db_query($sql);
$a = db_num_rows($rs);

if($a > 0) {
$row = db_fetch_assoc($rs);
$data = $row['session_data'];

}

return $data;

}

function write( $id, $data ) {

// Build query
$time = time() + $this->life_time;

$newid = mysql_real_escape_string($id);
$newdata = mysql_real_escape_string($data);

$sql = "REPLACE `sessions`
(`session_id`,`session_data`,`expires`) VALUES('$newid',
'$newdata', $time)";

$rs = db_query($sql);

return TRUE;

}

function destroy( $id ) {

// Build query
$newid = mysql_real_escape_string($id);
$sql = "DELETE FROM `sessions` WHERE `session_id` =
'$newid'";

db_query($sql);

return TRUE;

}

function gc( $maxlifetime ) {

// Garbage Collection

// Build DELETE query. Delete all records that have
// passed the expiration time ($maxlifetime is unused here,
// since we track expiry in the `expires` column ourselves)
$sql = "DELETE FROM `sessions` WHERE `expires` < " . time();

db_query($sql);

// Always return TRUE
return true;

}

}

Throughout this article, I've talked about possible reasons for wanting your session information in a database. I've also showed you how to create a sessions class that would replace the built-in PHP session handling. Now I just need to show you how to implement it.

Let's go back to our example from the beginning of the article. You have an application built that uses session information heavily. Most likely, you have common functions and logic broken out into include files, including the database connection and session starting code. If you've actually coded the session_start() call separately in each file where you use it, you'll have a bunch more work to do.

In any case, here's the code to add to your application in order to make PHP use your class instead. Find your session_start() code, and make it look like this:

require_once("sessions.php");
$sess = new SessionManager();
session_start();

So in the above example, we require the code. This adds the class declaration into memory. With the second line, we instantiate the SessionManager class, which in turn calls session_set_save_handler() so that the class methods are used rather than the default handlers. And finally, session_start() is called, which begins using the new database-backed handler right away.

For most structured code, the hardest part of the scenario would be to replace the db_() calls with calls to your own application's database API. Once your class is created, it takes mere seconds to drop it into your code. Copy that code across all of your web servers, and they instantly begin using the database instead of files for their session reads and writes.

One last warning before I go. Depending on how your code is structured, PHP does not always automatically save any session data. To be certain you are retaining your session data all the time, be sure to call the session_write_close() function at the end of each page.
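
For example, at the very end of each page (or in a shared footer include), a single call is enough to push the data through the SessionManager write() method explicitly:

<?php
// Flush session data to the database now, rather than relying on
// PHP's shutdown sequence to do it for us.
session_write_close();
?>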

I hope the information I've shared with you during this article has, at a minimum, helped you understand how and why you would approach moving session data from files to a database. Whether you use the example code I've supplied, or create your own from scratch, it's been my goal to educate you on the whys and hows, so you can make the best decisions for your own applications.

Thursday, May 3, 2007

Most Important Search Engine Ranking Factors

Improving your site's search engine rankings can drive huge search engine traffic your way. But while search engine optimization is essential, it is important to know which factors actually influence search engine rankings. I read this amazing Search Engine Ranking Factors report which compiled the wisdom of 37 leaders in the world of organic search engine optimization. They voted on the various factors that are estimated to comprise Google's ranking algorithm.

While I tried to grasp the concepts underlying these SEO factors, understanding the whole document takes time and effort. To make it simpler, the guys at Search Engine Journal compiled a quick reference (color chart) sorted by absolute rating of importance rather than by category/type as in the original document.

4 factors which were rated high importance with full consensus were
* Keyword Use in Title Tag
* Global Link Popularity of Site
* Anchor Text of Inbound Link
* Link Popularity within the Site’s Internal Link Structure

Some factors of High importance with some consensus were
* Age of Site
* Topical Relevance of Inbound Links to Site
* Link Popularity of Site in Topical Community
* Keyword Use in Body Text
* Quality/Relevance of Links to External Sites/Pages
* Rate of New Inbound Links to Site
* Topical Relationship of Linking Page
* Relationship of Body Text Content to Keywords (Topic Analysis)
* Keyword Use in H1 Tag
* Age of Document
* Amount of Indexable Text Content
* Topical Relationship of Linking Site
* Age of Link
* Text Surrounding the Link
Negative Factors
* External Links to Low Quality/Spam Sites
* Overuse of Targeted Keywords (Stuffing/Spamming)
* Content Very Similar or Duplicate of Existing Content in the Index

These SEO factors are really insightful in helping improve your rankings in SERPs. If you carefully analyze these factors and apply them to your site, you should definitely see a rise in search engine traffic.