Chuck's Horde 4 Thoughts

Overall Design

http://www.25hoursaday.com/weblog/2008/08/04/AvoidingTheSecondSystemEffectInSoftwareDevelopment.aspx

Controllers

Horde_Controller - make Horde_Controller_Dispatcher Horde_FrontController_Http or similar, then add other front controllers for Soap, XmlRpc, JsonRpc, Cli ...

User Interface

Get rid of popups. Any new window functionality should be in an ajax overlay, or a full new browser window.

API design

Documentation

Developer docs (PHPDoc alternatives)

Debug support

http://code.google.com/p/webgrind/
http://www.sitepoint.com/blogs/2008/05/13/useful-in-browser-development-tools-for-php/
http://badapi.trib.tv/
http://bergie.iki.fi/blog/sql-level_debugging_with_midgard.html
http://code.google.com/p/formaldehyde/
Debug "wrapper" drivers - encapsulate another driver and delegate all calls, but provide before/after hooks for any function along with timing, profiling, reporting of calls and arguments, etc.
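A rough sketch of what such a wrapper could look like, using PHP's `__call` to delegate. The class names `DebugWrapper` and `ArrayCacheDriver` are made up for illustration and are not existing Horde classes; the only hook shown is timing plus a call log.

```php
<?php
// Hypothetical debug wrapper: delegates every call to the wrapped driver
// and records the method name, arguments, and elapsed time.
class DebugWrapper
{
    private $driver;
    private $log = [];

    public function __construct($driver)
    {
        $this->driver = $driver;
    }

    public function __call($method, array $args)
    {
        $start = microtime(true);                 // "before" hook
        try {
            return call_user_func_array([$this->driver, $method], $args);
        } finally {
            $this->log[] = [                      // "after" hook
                'method'  => $method,
                'args'    => $args,
                'seconds' => microtime(true) - $start,
            ];
        }
    }

    public function getLog()
    {
        return $this->log;
    }
}

// Stand-in driver used only for this example.
class ArrayCacheDriver
{
    private $data = [];

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    public function get($key)
    {
        return isset($this->data[$key]) ? $this->data[$key] : null;
    }
}

$cache = new DebugWrapper(new ArrayCacheDriver());
$cache->set('greeting', 'hello');
$value = $cache->get('greeting');
```

Because the wrapper does not implement the driver's interface, type-hinted callers would need a per-interface variant; that trade-off is left open here.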

URLs

We should sign (with a timestamp and HMAC, per Horde::signQueryString) all URLs that perform destructive actions.
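A minimal sketch of timestamp-plus-HMAC signing, to make the idea concrete. This is not how Horde::signQueryString is implemented; the parameter names `_ts` and `_sig`, the 10-minute window, and both function names are assumptions.

```php
<?php
// Sketch only: _ts, _sig and the 600 second window are assumptions.
function sign_url($path, array $params, $secret, $now)
{
    $params['_ts'] = $now;
    ksort($params);
    $query = http_build_query($params);
    $sig = hash_hmac('sha256', $path . '?' . $query, $secret);
    return $path . '?' . $query . '&_sig=' . $sig;
}

function verify_url($url, $secret, $now, $maxAge = 600)
{
    $parts = parse_url($url);
    if (!isset($parts['path'], $parts['query'])) {
        return false;
    }
    parse_str($parts['query'], $params);
    if (!isset($params['_sig'], $params['_ts']) || !is_string($params['_sig'])) {
        return false;
    }
    $sig = $params['_sig'];
    unset($params['_sig']);
    ksort($params);
    // Recompute the signature over the same canonical string used when signing.
    $expected = hash_hmac('sha256', $parts['path'] . '?' . http_build_query($params), $secret);
    $ts = (int) $params['_ts'];
    $fresh = $ts <= $now && ($now - $ts) <= $maxAge;
    return hash_equals($expected, $sig) && $fresh;
}

$secret = 'example-secret';   // would come from configuration
$url = sign_url('/mail/delete.php', ['id' => 42], $secret, 1000);
```

A destructive action would then run only when `verify_url()` returns true for the requested URL.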

Configuration

use Horde_Policy
Allow conf.d directory styles, like Apache2 config (see http://bugs.horde.org/ticket/4747).
Use return $... in PHP config files to avoid defining local-scope variables? (http://www.urdalen.com/blog/?p=257)
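For reference, a small sketch of the `return` style: the config file returns an array and the loader picks it up from `require`, so nothing leaks into the including scope. The file contents, the keys, and `load_config()` are invented for the example.

```php
<?php
// Sketch: a config file that returns its settings instead of defining
// variables in the including scope. File name and keys are made up.
$file = tempnam(sys_get_temp_dir(), 'conf');
file_put_contents($file, "<?php\nreturn ['sql' => ['driver' => 'sqlite'], 'debug' => false];\n");

function load_config($path)
{
    $conf = require $path;   // value of the included file's return statement
    if (!is_array($conf)) {
        throw new UnexpectedValueException("$path did not return an array");
    }
    return $conf;
}

$conf = load_config($file);
unlink($file);
```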

Permissions

Profiling

Lots of overlap with debugging

http://docs.kohanaphp.com/libraries/profiler
See Zend_Db_Profiler, and idea for Cache profiler also, including Firebug plugins for both

Testing

Error handling

Jabber/XMPP support

http://www.xmpp.org/extensions/xep-0114.html

DFS/DHT uses

Idea sources

Don't get too caught up in everything everyone else is doing. However, some things that might be useful food for thought are listed below. Projects may be added to or discarded from this list quickly as they are synthesized.

Horde 4 Administration

http://bergie.iki.fi/blog/asgard_welcome_page_just_got_useful.html

Package structure

http://www.apsstandard.com/doc

Unsorted

http://incubator.apache.org/thrift/

http://www.gearmanproject.org/

http://codeigniter.com/wiki/Modular_Extensions_-_HMVC/

http://ojay.othermedia.org/articles/keyboard.html

http://www.lkozma.net/autocomplete.html

http://writer.bighugelabs.com/

http://www.spread.org/

http://www.backhand.org/wackamole/

protocol-independent URLs:

http://nedbatchelder.com/blog/200710.html#e20071017T215538

// Return a 304 if the file hasn't been modified since the If-Modified-Since date;
// there is no point in resending all the data if the browser already has it cached.
if (function_exists('apache_request_headers')) {
    $headers = apache_request_headers();
    if (!empty($headers['If-Modified-Since'])) {
        $ims = strtotime($headers['If-Modified-Since']);
        // strtotime() returns false for an unparseable date.
        if ($ims !== false && $ims >= $serve_data['modified_time']) {
            header('HTTP/1.0 304 Not Modified');
            exit(0);
        }
    }
}

horde apps - "instance" of a horde app == installed horde app + a group of horde_policies that configure it

let those policies be named

instead of using shipping/foo api calls, use $instance->foo()

$Horde->api->method() (chaining)?

I just went through my first signup process that required an

SMS-capable device for confirmation. It also didn't make me pick my

credit card type, and instead used my country code (+1) to decide on a

card detection algorithm.

update_client.pl /modules/future_contribution /modules/future_signup

I think I have now found the right mysql-server settings, with which performance is quite OK. Increasing the sort_buffer_size was one of the changes that helped.

skip-external-locking

skip-thread-priority

key_buffer = 64M

max_connections = 1024

max_connect_errors = 1000

max_allowed_packet = 8M

table_cache = 512

sort_buffer_size = 8M

read_buffer_size = 1M

read_rnd_buffer_size = 2M

myisam_sort_buffer_size = 64M

thread_cache_size = 50

query_cache_size = 128M

tmp_table_size = 1024M

thread_concurrency = 12

wait_timeout = 60

interactive_timeout = 60

log_slow_queries

add dynamic finders (find_by_name, find_by_id, etc.) to Rdo Mappers or Horde_Db_Model or whatever
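One way dynamic finders could work, sketched with `__call`. `ExampleMapper` is hypothetical and filters an in-memory array; a real Rdo mapper would build a query instead. The field whitelist is an assumption to keep arbitrary method names from turning into column names.

```php
<?php
// Hypothetical mapper over an in-memory row list; a real mapper would
// build a query instead of filtering an array.
class ExampleMapper
{
    private $rows;
    private $fields = ['id', 'name'];   // whitelist of searchable fields

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function __call($method, array $args)
    {
        if (strpos($method, 'find_by_') !== 0 || count($args) !== 1) {
            throw new BadMethodCallException("Unknown method $method");
        }
        $field = substr($method, strlen('find_by_'));
        if (!in_array($field, $this->fields, true)) {
            throw new BadMethodCallException("Unknown field $field");
        }
        $value = $args[0];
        return array_values(array_filter($this->rows, function ($row) use ($field, $value) {
            return $row[$field] === $value;
        }));
    }
}

$mapper = new ExampleMapper([
    ['id' => 1, 'name' => 'chuck'],
    ['id' => 2, 'name' => 'jan'],
]);
$found = $mapper->find_by_name('jan');
```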

Controller classes/objects vs. Action classes/objects vs. Resources vs. API

how to develop? give up central config?

http://www.w3.org/Provider/Style/URI

index.php - global dispatcher

how to do themes/custom templates? chain local -> app -> horde?

a horde 4 installation:

config/

lib/

apps/

public/ <- with app/ subdirs containing images, etc.

everything routable goes in apps/

apps/

help/

prefs/

admin/

etc...

... auto-install web files to a writable dir, either in web ui or in cli? keep apps self-contained that way?

app name is the first part of the route > /login

subdomain support

route aliases

an app should uncompress over a horde/ dir - /config/app/*.php -> config dir is compiled/cached

horde is not rails. it is designed as a container for multiple, collaborating apps

need Horde_Db, whatever implements DML, DDL, and SQL - Mad? MDB2?

prefer PHP over XML

merge Rdo and Mad into Horde_Db

allow for overriding the mappers so that non-SQL can be used, but, default to SQL/sqlite and leverage it

framework repository/module

Horde/lib/...

Rampage/lib/...

use subpackages or multiple *.xml for packages to avoid silliness?

apps should be installable into a horde container. shouldn't be tied to the app name - keep imp, krono, etc, but install as mail, cal, events (should be able to install two versions of krono w/ different permissions - see HordeSpaces)

installing gives a slug, that slug manages config, templates, themes, perms, etc.

figure out how to merge luxor into Chora

for now, build Horde_Content_* based on Rdo, then move to Horde_Db

Horde_Db provides Horde_Db_Mapper which creates Horde_Model_Base objects

apps have a config/ dir, but that's just defaults and defining base routes, policies, etc. user settings are stored in the db or a global directory.

should have parallel web and cli configuration and installation/update tools; web requires webserver to have write access to a config/ dir and to public/; cli tools do not (if run as another user)

Horde 4 app - a Horde 3.x app updated for PHP 5 and to use the latest libraries

Rampage app - "RAD" (rapid application development) MVC app that uses Horde 4

/horde/page/ -> dispatcher for Rampage modules w/ views (overridable), routes, controllers, etc.?

have generic views for rampage_login, rampage_admin_*, etc.

configuration:

config/routes.php

config/routes_local.php -> do this for all config files

Horde_Content_Index -> horde-wide search

Random Horde Ideas

mini-cms for building your own sidebar/menu/etc?

shortcuts to any bit of horde

labels labels labels

keywords also or just labels? probably just flexible labels

"smart folders"

Getting Things Done support? (other apps that do it - Tracks, Kinkless GTD, Midnight Inbox)

make mnemo into more of a snippet keeper? sort of like a personal cms - or wiki. carry the encryption feature through to other kinds of content

create an outliner!

tags/labels for mail

rename virtual folders to smart folders? too apple?

freetext boolean mail searches:

apples & oranges

apples | oranges

apples ! oranges (apples but not oranges)

apples & (oranges | lemons)
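A sketch of how those queries might be evaluated against a message body. The notes don't define precedence, so this version assumes operators apply left to right and relies on parentheses for grouping; terms are matched as case-insensitive substrings. All function names are invented.

```php
<?php
// Sketch: evaluates a query against a message body. Operators are applied
// left to right with no precedence (an assumption); use parentheses to group.
function search_matches($query, $text)
{
    preg_match_all('/[&|!()]|[^\s&|!()]+/', $query, $m);
    $tokens = $m[0];
    $pos = 0;
    $result = search_expr($tokens, $pos, strtolower($text));
    if ($pos !== count($tokens)) {
        throw new InvalidArgumentException('Unexpected token in query');
    }
    return $result;
}

function search_expr(array $tokens, &$pos, $text)
{
    $value = search_term($tokens, $pos, $text);
    while ($pos < count($tokens) && in_array($tokens[$pos], ['&', '|', '!'], true)) {
        $op = $tokens[$pos++];
        $right = search_term($tokens, $pos, $text);
        if ($op === '&') {
            $value = $value && $right;
        } elseif ($op === '|') {
            $value = $value || $right;
        } else {
            $value = $value && !$right;   // "a ! b": a but not b
        }
    }
    return $value;
}

function search_term(array $tokens, &$pos, $text)
{
    if ($pos >= count($tokens)) {
        throw new InvalidArgumentException('Query ended unexpectedly');
    }
    $token = $tokens[$pos++];
    if ($token === '(') {
        $value = search_expr($tokens, $pos, $text);
        if ($pos >= count($tokens) || $tokens[$pos++] !== ')') {
            throw new InvalidArgumentException('Missing closing parenthesis');
        }
        return $value;
    }
    if (in_array($token, ['&', '|', '!', ')'], true)) {
        throw new InvalidArgumentException("Unexpected '$token'");
    }
    return strpos($text, strtolower($token)) !== false;
}
```

For a real mailbox the same parse tree would be translated into an IMAP SEARCH or SQL query rather than evaluated in PHP.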

security of redirects:

http://www.xssed.com/mirror/39494/

This is sort of an interesting one. For the actual attack he merely figured out that we are base64 encoding the successurl and reflecting back whatever is there. The interesting thing is that merely filtering the unencoded data is not going to save us here. The string was javascript:alert(/XSS.By.Mityo/) and was being loaded into the URL field of a meta redirect. So our max filter of strip_tags is useless. It just illustrates the rationale for Phase 2 of the security build out where we have to be careful when we are dealing with redirects.

In this particular case, we need to make sure we are getting a valid URL format. That will prevent javascript insertions. But we also want to make sure the URL is not redirecting outside the intended domain for some phishing scam. In this case I will fix the problem by validating the URL on the cons/login.inc.php where the data is coming in but will also try doing it on the generic show_redirect_message() call if I think I can do so without breaking other pages.
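The two checks described above (valid URL format, no redirect outside the intended domain) could look roughly like this. `safe_redirect_target()`, the host list, and the fallback path are placeholders, not the actual fix that went into cons/login.inc.php. Relative URLs are rejected here for simplicity.

```php
<?php
// Sketch: accept a redirect target only if it is an http(s) URL on an
// allowed host. The host list and fallback path are placeholders.
function safe_redirect_target($url, array $allowedHosts, $fallback = '/')
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $fallback;   // malformed, relative, or scheme-only (javascript:...)
    }
    $scheme = strtolower($parts['scheme']);
    if ($scheme !== 'http' && $scheme !== 'https') {
        return $fallback;
    }
    if (!in_array(strtolower($parts['host']), $allowedHosts, true)) {
        return $fallback;   // would leave the intended domain
    }
    return $url;
}

$hosts = ['www.example.com'];
$ok  = safe_redirect_target('https://www.example.com/inbox', $hosts);
$bad = safe_redirect_target('javascript:alert(/XSS/)', $hosts);
$off = safe_redirect_target('http://evil.example.net/login', $hosts);
```

The result still needs HTML escaping when it is written into the meta redirect.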

Event-driven apps:

"Understanding and implementing this event model can free your application from the constraints of defined elements. For example, instead of applying an event listener for each link in a menu, you can assign a single listener to the menu item itself and retrieve the event target. That way you don't need to change your script when the menu gets larger or when links get removed from it."

http://yuiblog.com/blog/2007/01/17/event-plan/

tagging/instant hierarchies as specialized permission-based search

RBAC

what is horde?

groupware?

horde data services?

horde data access?

ui layers

be the php dojo framework? or the php yui framework?

see http://tigermouse.epsi.pl/ ?

or, don't do desktop-like widgets? see UI design bookmarks

move away from gettext, at least as a default? midgard i18n notes:

http://www.midgard-project.org/discussion/developer-forum/midgard-s-multilang-support/

try to rely only on thread-safe extensions?

reduce dependency tree

avoid globals and non horde-namespaced functions/methods in framework and core app code

class-based registry apis

against edge cases: http://www.bakesalehq.com/contents/show/12/

features from Prado? http://www.urdalen.com/blog/?p=198

use functions where appropriate for shortcuts/helpers, like Mike's t("translated string") function? but would be horde_t? would call configured translation system

helper sets for dojo, protaculous, yui - simple functions like dojo_editor(), dojo_pane(), yui_map(), etc. Load with something like Horde/Layout/Helpers/YUI.php, etc. See http://www.ngcoders.com/projax/

Horde as a set of apps and methodology needs to pick a js lib, pick a template methodology, etc. - this is Rampage. Horde as a framework can allow for flexibility.

To make it even better, separate the control logic from the presentation. That way, back could be reverse, etc. I do this in all my forms since application logic and presentation "word play" are two distinct things to me. This is what I use:

</form>

Then, you can have a simple routine that captures submit actions regardless of the presentation value. You check for the array submit -- count 1 and whitelist against the acceptable values. A multi-row table can expand upon the theme by using this: submitedit_3, submitdelete_3, submitedit_5, etc.
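The original form markup didn't survive, so the following is only a guess at the routine being described. It assumes buttons named like `submit[edit_3]`, so that the array key carries the action (and optional row) while the button's value stays free for presentation. `submitted_action()` is an invented name.

```php
<?php
// Sketch, assuming buttons such as:
//   <input type="submit" name="submit[edit_3]" value="Edit">
// The key carries the action; the value is presentation only.
function submitted_action(array $post, array $allowed)
{
    if (!isset($post['submit']) || !is_array($post['submit']) || count($post['submit']) !== 1) {
        return null;
    }
    $keys = array_keys($post['submit']);
    $key = (string) $keys[0];
    if (!preg_match('/^([a-z]+)(?:_(\d+))?\z/', $key, $m)) {
        return null;
    }
    if (!in_array($m[1], $allowed, true)) {
        return null;   // not on the whitelist
    }
    return ['action' => $m[1], 'row' => isset($m[2]) ? (int) $m[2] : null];
}

$edit = submitted_action(['submit' => ['edit_3' => 'Edit']], ['edit', 'delete', 'back']);
$back = submitted_action(['submit' => ['back' => 'Go back']], ['edit', 'delete', 'back']);
$drop = submitted_action(['submit' => ['drop_3' => 'Drop']], ['edit', 'delete', 'back']);
```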

caching

make sure Rdo and other services allow dropping in caching rules

http://sebastian-bergmann.de/pages/talks.html

phpunit - @test markup in methods

phpunit + selenium

cruise control?

really hope google will integrate any product of theirs with any other products of theirs? receive an email, transform it to document, add spreadsheet, add notes, add bookmarks saved from search history and a link to an event in calendar anyone?

From nyphp-talk:

The other day I had to get an application started in a hurry.  It's

doing something useful at < 700 lines, but I'm considering options that

could grow it out to about 10 times that. It depends on a "core

library" that's < 500 lines. This library deals with common issues in

string handling, parameter handling, and HTML form generation.

About 10% of the application,  or 70 lines,  is a microframework

that's loosely built on Struts. About 20 of those lines are in 2

functions which would be generally useful for microframeworks (such as

file_exists_in_include_path()). Like Struts, the microframework

chooses an "action" based on form parameters: the action then chooses a

"view" -- a "view" is basically a template that a designer can edit

which can be supplemented by an optional "query" which pulls stuff out

of the database. Like Ruby-on-Rails, the microframework uses

convention instead of configuration: the dispatcher computes an "action

name" based on query parameters, and uses that to compute a

filename... It checks that the file exists and executes it with the

"require method".

The microframework uses no object-oriented techniques.  That's not

because I have any antipathy to OO, but because I didn't need it, and

I like writing my actions, queries, and views in a style that "feels

like PHP".

Yes, my microframework is nowhere near as powerful as CakePHP or

Symfony. Yet, it's more flexible, because I can codesign it with my

application. Because it's so simple, I can easily adapt it to do what

I want. If I decide I really hate it, I can write a new one in an

hour. I'm an expert on it, because I developed it, and I wouldn't

have to take on the technical, social and emotional burdens of

"forking" an open-source codebase if I wanted to make a change in direction.

I'm moving towards a vision of web app architecture where we move

towards shared vocabulary and standardized interfaces. Rather than

working with a "comprehensive framework" that does everything, I'd like

to have a "framework construction set" that contains a number of

elements that I can take or leave.
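To make the convention-over-configuration dispatch in that post concrete, here is a guess at its shape. The poster's code isn't shown, so the `action` parameter, the `actions/` directory, and `dispatch()` are all assumptions; the name check is there so a request can't reach files outside that directory.

```php
<?php
// Sketch of convention-based dispatch: the "action" parameter maps to
// actions/<name>.php. Directory layout and parameter name are assumptions.
function dispatch(array $params, $baseDir, $default = 'index')
{
    $action = isset($params['action']) ? $params['action'] : $default;
    // Restrict names so a request cannot escape the actions directory.
    if (!is_string($action) || !preg_match('/^[a-z][a-z0-9_]*\z/', $action)) {
        return null;
    }
    $file = $baseDir . '/actions/' . $action . '.php';
    if (!is_file($file)) {
        return null;
    }
    return require $file;   // here the action file returns a view name
}

// Demo with a throwaway action file.
$base = sys_get_temp_dir() . '/mf_' . uniqid();
mkdir($base . '/actions', 0700, true);
file_put_contents($base . '/actions/hello.php', "<?php\nreturn 'hello view';\n");
$result  = dispatch(['action' => 'hello'], $base);
$missing = dispatch(['action' => '../etc/passwd'], $base);
unlink($base . '/actions/hello.php');
rmdir($base . '/actions');
rmdir($base);
```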

Resources:

http://www.ryandaigle.com/articles/2006/06/30/whats-new-in-edge-rails-activeresource-is-here

mixins: http://www.symfony-project.com/book/trunk/17-Extending-Symfony

split db ideas: http://pear.php.net/pepr/pepr-proposal-show.php?id=359

http://dataspill.org/pages/projects/ruby-activeldap

More php features to look in to:

__toString works everywhere

SPL features: Regex Iterators, SplFileObject CSV support, Caching Iterator

Data: stream support

DateTime and DateTimeZone classes

set date.timezone ini setting automatically based on user?

Search engine sitemap stuff - of use at all? maybe support in rampage cms

http://p7.hostingprod.com/@www.ysearchblog.com/archives/000437.html

I want a registration info tab like Inbox.lv where they can change their personal stuff they put on file with us on the signup forms.
We may need Windows address book synchronization (this is a feature that fastmail is adding, and hotmail already has, so I guess we will have to also?). It is not a must in my books.
I want to add a new feature next to the attach button that is like send message after attached, so if they are uploading a big file they can leave and it will be sent automatically.
We MUST have an easy user interface. Fastmail has lots of features, and they try to make it so you can do everything in 2 clicks or less. We need to try to do this. Fastmail is all bunched up and looks like shit, though. We need to make ours packed with features like Fastmail, but spread out like AOL. This will attract all the old people and internet beginners who have just gotten off AOL and moved to DSL. Fastmail looks like it is only made for advanced users and is hard to get used to. We need to

Have a main navigation bar, which is on every page at the top and has all the mail icons that people use the most, like compose, inbox, addressbook, and options. Then we will have a subnavigation bar for each other page. For example, if you were to hit the calendar icon on the main navigation bar that is at the top of EVERY page, it would take you to the calendar page and show you the calendar, and the subnavigation bar would have all the calendar icons like add events, etc. I was thinking, in IMP we could have the logo at the top left corner of the page, then on the top right we could have all the main navigation icons. Both the logo and the main navigation would be on every single page in IMP, so it would be easy to get around. Then the subnavigation bars could go where the main navigation bar is now in IMP, understand?

Make a bounce button like fastmail.fm. This is how fastmail explains their bounce button:

"'Bounce' takes the currently selected emails and sends back an email to the addresses the email(s) came from saying basically that 'the email address does not exist' in a standard internet email protocol way. Some more organised spammers remove these from their lists. After sending the bounce response, the messages are deleted."

If accessed with a browser, public folder is also a personal web-site, accessible at http://username.fastmail.fm
Provide tool allowing synchronization of Outlook Express etc address book with FastMail contacts, possibly using LDAP
Use JavaScript for browsers that support it to speed up many actions, such as searching through the address book
A general notification system, so you can send a pager message, SMS message, instant message, or short email

eGroupWare over Horde reasons

Linking: There is the "infolog" for linking items. An infolog item can be a to-do, call, or note. It can link to the addressbook, projects, calendar, or another infolog item. That is very flexible.

Access Control: Under Preferences, there is a "Grant Access" link for the calendar, addressbook, infolog, and projects. It allows you to select Read, Add, Edit, Delete, and Private access for each group and each user. Again, very flexible.

Categories: Multiple category selection is allowed in the addressbook, projects, calendar and infolog.

Custom Fields: I can create custom fields.

PHP_SELF

Executive summary: PHP_SELF intentionally includes extra URL garbage (or

valuable URL variables, take your pick) tacked on by the user. Don't use

it without knowing what it does.

Here's what you get when you hit the URL:

http://example.com/info.php/testing1?testing2 :

_SERVER["REQUEST_URI"] /info.php/testing1?testing2

_SERVER["PHP_SELF"] /info.php/testing1

_SERVER["SCRIPT_NAME"] /info.php

Get it? If you don't want that extra stuff tacked on by the user, use the

correct _SERVER variable. If you use REQUEST_URI or PHP_SELF, be aware the

user can affect the contents of that variable. 99% of the time, you want

SCRIPT_NAME, not PHP_SELF.

By the way, here's another test:

http://example.com/info.php/testing<script>?testing :

_SERVER["REQUEST_URI"] /info.php/testing%3Cscript%3E?testing

_SERVER["PHP_SELF"] /info.php/testing<script>

_SERVER["SCRIPT_NAME"] /info.php

Note that the REQUEST_URI variable, which comes from Apache, is encoded,

while the PHP_SELF variable, which comes from PHP, is not. So PHP 5.2.0

still makes it possible to shoot yourself in the foot, and as I've pointed

out below, well-known PHP authorities actually recommend that you do so.
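A small sketch of the safer habit, using the values from the second test above: take the form action from SCRIPT_NAME and escape it on output anyway. `form_action()` is just an illustrative helper name.

```php
<?php
// Sketch: build a self-referencing form action from SCRIPT_NAME and
// escape it for HTML output regardless.
function form_action(array $server)
{
    return htmlspecialchars($server['SCRIPT_NAME'], ENT_QUOTES, 'UTF-8');
}

// Values as they would appear for /info.php/testing<script>?testing
$server = [
    'PHP_SELF'    => '/info.php/testing<script>',
    'SCRIPT_NAME' => '/info.php',
];
$action = form_action($server);
$escapedSelf = htmlspecialchars($server['PHP_SELF'], ENT_QUOTES, 'UTF-8');
```

Escaping PHP_SELF defuses the markup, but the user-supplied path info is still in the string, which is the reason to prefer SCRIPT_NAME.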

Here's the email that I sent in July 2005:

Subject: Re: nyphp-talk $_SERVER['PHP_SELF'} not working?

Date: Friday 22 July 2005 12:05 pm

From: Michael Sims <jellicle@gmail.com>

To: NYPHP Talk <talk@lists.nyphp.org>

On Thursday 21 July 2005 17:16, Dan Cech wrote:

You could put:

$_SERVER['PHP_SELF'] = $_SERVER['SCRIPT_NAME'];

into one of your common include files.

Yes. I'm afraid I don't understand this entire thread. Apparently

because of the numerous PHP developer articles recommending it, and

because of the php.net page which for whatever reason lists it first on

the list of predefined variables, people are using PHP_SELF when they

really want SCRIPT_NAME. SCRIPT_NAME solves all the problems mentioned

in this thread - it's just the script name, without any extra garbage

that might be tacked on by the user. PHP_SELF explicitly includes that

extra garbage, so solutions in this thread that involve stripping the

garbage off of PHP_SELF to make it safe are really, really missing the

point - just use SCRIPT_NAME instead. Please don't use FORM ACTION="";

according to the spec, what the browser does with that is undefined, so

even if it works in current browsers, it might not work in future ones.

People can be forgiven for making this mistake -- I'm here holding my

copy of _Learning PHP 5_, and it recommends on page 8 and again on page

86 the use of PHP_SELF for self-referencing forms, ahem -- but it's time

to put it to bed: PHP_SELF is unsafe for any usage where it is echoed

back to the page.

SESSIONS:

I'll try to reply to this and some other people who replied to my previous message.

I'll start with my background. I've often been the person who the buck stops with --

somebody else develops an application that almost works (perhaps even puts it in

production) and then I have to clean up the mess. The app might be written in PHP,

Java, Cold Fusion, Perl, you name it. I've learned to see session variables as a "bad

smell".

When I develop my own applications, I use cookies for personalization and caching. I

use the authentication system described in

http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz

this mechanism can carry a "session id", which in turn can be used as a key against

application state stored in a relational database. I think through the boundary cases,

and find that my greenfield apps behave predictably -- my only woe is that you'll

discover that browsers have a lot of undocumented behavior connected with cookies, form

handling, and caching. All problems that you still need to fight with if you use

sessions, see the comments for

http://www.php.net/manual/en/function.session-cache-limiter.php

The context of this is that the average web application is poor in the areas of

usability and security: recent studies show that 80% of web applications have serious

security problems

http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf

Jakob Nielsen's website has been chronicling the sorry state of web application

usability:

http://www.useit.com/

Perhaps the top 20% of programmers can write applications with $_SESSION that don't

have serious security and usability problems, but what about the other 80%?

(1) Session variables are treacherous. Odd things can happen in boundary cases, such

as when sessions expire, or when you are targeted by session fixation attacks.

http://shiflett.org/articles/security-corner-feb2004

I've looked at many apps that use sessions that seem to be working... Until you walk

away for two hours, come back, and discover that you're logged in as somebody else. I

suppose I could have spent hours or days tracking down an intermittent problem, which

involved some confluence of browser oddness (IE was fine, Firefox was screwy), the

behavior of the session system, and crooked logic in the application. Or I could use

cryptographically signed cookies to implement an authentication system which won't give

me surprises in the future.
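For readers who haven't seen one, a signed cookie can be as small as the sketch below. It is an illustration of the general idea, not the exact scheme from the slides linked earlier; the `user|expires|mac` layout and both function names are assumptions.

```php
<?php
// Sketch of a signed authentication cookie value: "user|expires|mac".
// An illustration only, not the exact scheme from the cited slides.
function make_auth_cookie($user, $expires, $secret)
{
    // rawurlencode() keeps a "|" in the user name from breaking the layout.
    $payload = rawurlencode($user) . '|' . (int) $expires;
    return $payload . '|' . hash_hmac('sha256', $payload, $secret);
}

function read_auth_cookie($cookie, $secret, $now)
{
    $parts = explode('|', $cookie);
    if (count($parts) !== 3) {
        return null;
    }
    list($user, $expires, $mac) = $parts;
    $expected = hash_hmac('sha256', $user . '|' . $expires, $secret);
    if (!hash_equals($expected, $mac)) {
        return null;             // tampered with, or signed with another key
    }
    if ((int) $expires < $now) {
        return null;             // expired
    }
    return rawurldecode($user);
}

$cookie = make_auth_cookie('paul|x', 2000, 'server-side-secret');
```

The server keeps no per-user session file, so there is nothing to expire out from under the user between requests.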

Anybody can write applications that work 95% of the time with $_SESSION. Getting the

other 5% right requires a deep understanding of state and statelessness on the web...

Which is what (many) people are trying to avoid when they use $_SESSION variables.

There are more than twenty configuration variables that affect the way sessions work

under PHP. Incorrect configuration of any of these can cause applications to fail,

often in intermittent ways. The use of a custom session handler can have unpredictable

effects on security, reliability and performance.

Other languages are a lot worse than PHP -- the use of the "scope" concept in

languages such as Cold Fusion and Tango makes it easy to use a session variable without

realizing it... Resulting in an application that "works" sometimes, but fails in

mysterious ways.

(2) Session variables are bound to a particular language. In the real world, I work

with legacy systems that might be written in other languages. I might have some old

pages in Cold Fusion that work just fine, and I won't rework them in PHP until I've got

a good reason. If users can set a customization parameter, such as the background of a

page, it's easy to write a cookie that all languages can read. Applications stuck in

the session variable roach motel aren't as maintainable and portable.

(3) PHPSESSID. Do I need to say more? I consider the client that wants user tracking

and can't accept cookies, so all the pages on their

site look like

http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**

Three months later they come back and wonder why their site isn't being indexed in

Google. Yes, there's a saner way to use this feature, but this "cure" to privacy

violation is worse than the cookie "disease", since session ids will leak out through

referrers, bookmarks, links that people cut-and-paste...

(4) The back button. When somebody asks a question about sessions on a forum, they'll

usually ask another question a few days or weeks later: "How do I disable the back

button?"

The underlying problem is a deep aspect of the structure of the web. There is certain

state information that's particular to a request (GET and POST variables) and certain

state information that has a more persistent scope (cookies, session information, a

relational database.) The back button makes it possible for these two things to get out

of sync.

Ultimately, we need a systematic strategy to deal with this. One pattern is to put

the complete state of the application in form variables. Applications that use this

pattern always work perfectly with the back button. This pattern doesn't work always

(hitting the back button shouldn't cancel your order on an e-commerce site), but it

works often... For instance, you can use hidden variables to hold onto form variables

for complicated forms that spread over several pages.

(5) Multiple windows. I think it's a human right to be able to have more than one window

open on a web site. If I'm shopping, for instance, I'd like to be able to look at two

products simultaneously. An application that keeps state in form variables doesn't care

how many you have open. If you're looking for jobs at an organization that uses

taleo.net's software, you'll find that it uses trickery to prevent you from having more

than one window open... So you can't look at two jobs at once, or look at the job

description while you're filling out the application. I suspect that they did this

because they don't want to spend forever debugging "race conditions" that could be caused

by a user acting in two windows simultaneously.

Session variables introduce problems of locking. PHP gets an exclusive lock on the

session for each page displayed. This hurts the performance of pages that use

dynamically generated images and Javascript, and can mysteriously deadlock AJAX

applications.

(6) Scalability, Reliability, and all that. This is a tricky one, because it depends

on particulars. Sessions can be lightning-fast in systems that keep them in RAM, such

as Java and Cold Fusion. The default session handler in PHP uses files, and is probably

faster than a relational database in a direct comparison: however, the session handler

will load all of the data into RAM, whereas a relational implementation may only need to

load information when it's needed. Keeping information in POST variables or cookies also

involves a tradeoff -- this is as scalable as it gets so far as server resources, but

requires that the state be passed back and forth between the browser and server. This is

no big deal if the state is 500 bytes. It's unacceptable if the state is 500 megabytes.

In most cases, it starts looking expensive when we're passing an extra 10k-100k around.

I've recently been working on a legacy app that contains a query (select a subset of

items) and reporting (display user-selected fields of those items) function. The

interface between those modules is simple: the query system passes a comma-separated

list of item identifiers to the reporting system. I like this, because it meant that

one system could be changed without affecting the other. I had to update the app so it

would work with a changed database schema, so both sides needed some work.

I discovered that the app was passing the item list as a session variable. This worked:

unless I was using the application in two windows at a time. In that case, a query in

one window would change the report delivered in another window. I thought about it, and

realized that in this case, result sets would always be under about 10k, and usually be

around 1k. Therefore, it made sense to pass this as a hidden variable in the form and

ditch the session variable.

This shows the kind of problems that regularly turn up in the applications that

developers "throw over the wall" to testers and clients. Choose a session variable, and

your application behaves mysteriously for a user who didn't respect the "one window at a

time" assumption you made. Passing hidden variables in forms, on the other hand, might

work OK when you're testing with a small data set over a LAN, but could rapidly become a

performance nightmare for dialup users using a production database.

Performance can be improved in a number of ways: for instance, by delta-sigma

compressing the item list, or creating a "form scope" variable that's keyed against a

unique identifier in the form. Either way, quality web applications take quality

thought.

(7) Lack of engineered application state: Engineered Application State is the gem of

database-backed web applications.

If you keep the state of your application in a relational database, you need to ~design~

the state of your application. You need to ~think~ every time you add or change a table

in your relational database. You can add a new variable to your application as easily as

typing '$'.

Desktop apps keep the application state in a tangle of pointers. C and C++ applications

tend to contain 5 or more defects per thousand lines of code. Errors show up in data

structures over time, just as mutations occur in your cells. Memory leaks, application

hangs, and crashes are cancers caused by these mutations.

PHP apps die at the end of each request, and are reborn for the next request. They

don't accumulate errors over time. Web application environments such as Java and Cold

Fusion that involve a long-running process regularly hang or crash and require restarts.

When is the last time you've had to restart PHP?

A database protects you from errors in multiple ways. Transactions, for instance,

protect against data corruption caused by crashing scripts. It's easy to write

$_SESSION["logged_in"]=true;

in one place and

$_SESSION["logged-in"]=false;

in another, introducing unpredictable behavior and security holes. A relational

database will give you an error if you try something like that.

Can users of $_SESSION avoid the seven deadly sins?

Yes.

In practice they don't.

Paul,

That looks like a lot of info to digest without specific examples. Is there a book or

other resource on session management that you recommend that deals with these issues in

more detail?

Thanks.

-Leo

I'm not aware of one, but I wish there was. I think the question isn't so much "session management" but about how to manage state in a stateless protocol -- sessions

are one abstraction for doing that, but other abstractions exist too.

I think the best approach here is the "Pattern Vocabulary" approach. There are

certain practices, that when applied to an application, have certain results.

For instance, there's the pattern of "Stateless Server" -- the complete state of the

application (or subsystem thereof) is kept in hidden POST and GET variables. You accept

some limits, but get some real benefits: infinite scalability, no headaches with the

back button, no need for cookies...

You might try the above and then notice that you're passing 100K around in your hidden

form variables... People are complaining that your app is slow. Now you can generate a

unique id each time you draw a form ("Generated Form Scope", for lack of a better term.)

You can stuff your "hidden" variables into the database under this key, and restore

them when the key comes back... If your code is organized right (does something like

$vars=$_POST, and only looks at $vars afterwards), you can do this transparently to the

rest of your app.

The same kind of thinking can protect you against certain kinds of back button woes --

you can at least stop people from submitting the same form more than once, by checking

to see if a form with that unique id has been submitted before.
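A minimal sketch of the "Generated Form Scope" idea, assuming PHP 7 for random_bytes(). The FormScopeStore class and its method names are made up for illustration, and in-memory arrays stand in for the database table:

```php
<?php
// Hypothetical helper: arrays stand in for the database table that would
// hold the stashed form variables under each generated id.
class FormScopeStore
{
    private $forms = array();      // form id => saved variables
    private $submitted = array();  // form id => true once it has been submitted

    // Called when drawing a form: stash the variables, return the unique id.
    public function issue(array $vars)
    {
        $id = bin2hex(random_bytes(16));
        $this->forms[$id] = $vars;
        return $id;
    }

    // Called when the form comes back: restore the variables exactly once.
    // Returns null for an unknown id or a repeated submission.
    public function redeem($id)
    {
        if (!isset($this->forms[$id]) || isset($this->submitted[$id])) {
            return null;
        }
        $this->submitted[$id] = true;
        return $this->forms[$id];
    }
}

$store = new FormScopeStore();
$formId = $store->issue(array('step' => 2, 'email' => 'leo@example.com'));
$vars = $store->redeem($formId);   // first submission: variables restored
$again = $store->redeem($formId);  // second submission: rejected
```

Only the short id travels in the hidden form field, so the rest of the app can keep reading from $vars as before.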

"Shopping Cart" is another pattern. People often use session variables to handle

shopping carts, but that's really not ideal from a user interface perspective...

Ideally, each instance of a shopping cart has its own unique id... Imagine we want to

make an e-commerce site that behaves like amazon.com:

(1) User visits e-commerce site from a home computer -- a long-term tracking cookie gets

stuck on their browser

(2) User adds item A to their shopping cart... A new shopping cart is created with id

#101, associated with the tracking cookie.

(3) User adds items B, C, D, and E to their

shopping cart in the course of 30 minutes of browsing. Each time an item is added, we

add a row to a table in the database that links the item id to the shopping cart id.

(4) 4-year old hits reset button

(5) User comes back to e-commerce site... He's happy to find his cart is still there.

User creates account #202 to check out. Shopping cart #101 is associated with account

#202

(6) User checks out shopping cart.

(7) User comes back a week later, wants to buy a few more items. The site recognizes

who he is. He adds two of item A and an item F to a newly created shopping cart with id

#102, associated with user account #202.

(8) User goes to work, logs in... The system sees that he has shopping cart #102 open.

He adds item G, and then checks out.

(9) User learns that he can trust this site to work correctly and becomes a loyal

customer.

It's nice that we've got a historical record of the shopping cart after the fact, but

there's a more important point -- we could have lost the customer's dollar at many points

in the above transaction if we were using a $_SESSION based cart. The session wouldn't

have survived step 4, for instance. A good user interface isn't academic here... It

puts money in our pocket.

The above scenario is complex, and it might not be fair to expect that a

first-generation shopping cart has those features. A $_SESSION-based shopping cart would

need to be completely reworked to add the features above. A cart that uses a unique

"cart id" and relational back end, will be a lot more maintainable... You could even

start out using $_SESSION to keep track of the "cart id", then keep it in a cookie,

then associate it with a user name, add the facility to promote an anonymous cart to an

authenticated cart and so on. Starting with a good design, we can provide the interface

that we ~want~ to provide, not the one that our abstraction layer ~forces~ us to provide.
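To make the "cart id" design concrete, here is a rough sketch. CartStore and its methods are hypothetical, and plain arrays stand in for the two relational tables (carts, and item-to-cart links):

```php
<?php
// Hypothetical sketch: arrays stand in for the relational tables.
class CartStore
{
    private $carts = array();  // cart id => array('cookie' => ..., 'account' => ...)
    private $items = array();  // rows linking an item id to a cart id
    private $nextId = 101;

    // Create a cart associated with a long-term tracking cookie.
    public function create($cookie)
    {
        $id = $this->nextId++;
        $this->carts[$id] = array('cookie' => $cookie, 'account' => null);
        return $id;
    }

    // Each added item is one row linking the item to the cart.
    public function addItem($cartId, $itemId)
    {
        $this->items[] = array('cart' => $cartId, 'item' => $itemId);
    }

    // Promote an anonymous cart to an authenticated one.
    public function attachAccount($cartId, $accountId)
    {
        $this->carts[$cartId]['account'] = $accountId;
    }

    // Find the cart for a returning browser; null if there is none.
    public function findByCookie($cookie)
    {
        foreach ($this->carts as $id => $cart) {
            if ($cart['cookie'] === $cookie) {
                return $id;
            }
        }
        return null;
    }

    // List the item ids linked to one cart.
    public function itemsIn($cartId)
    {
        $found = array();
        foreach ($this->items as $row) {
            if ($row['cart'] === $cartId) {
                $found[] = $row['item'];
            }
        }
        return $found;
    }
}

$store = new CartStore();
$cartId = $store->create('tracking-cookie-abc');  // step 2: cart #101
foreach (array('A', 'B', 'C') as $item) {
    $store->addItem($cartId, $item);
}
// Steps 4 and 5: the machine restarts, but the cookie still finds the cart.
$found = $store->findByCookie('tracking-cookie-abc');
$store->attachAccount($found, 202);
```

Nothing in the sketch depends on $_SESSION, which is why the cart survives the reset in step 4.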

In regards to slides 29 and 30, can you elaborate and give a more detailed

example what they are trying to say? Are they saying that the session key

should contain a hash of the data? Or does the hash become the "salt" in

encrypting the data? Finally, how does doing that make it easier to prevent

circumvention and forgery?

Let's take it a step at a time... Imagine we've got a token of the following format...

$token="$user_id:$session_id"

The session_id doesn't have to be unpredictable -- it could come from an

auto_increment column in a database table... With the caveat that people could estimate

the usage of your site by looking at the session id's.

You could put this in a cookie, and it would work quite well, as long as you didn't

have users who knew how to look at or change the cookies. An attacker who understands

cookies can easily change the user id, or session_id.

To protect the cookies from tampering, we could do something like

$hash=sha1($token);

$signed_token="$hash:$token";

We could check the integrity of the token by recomputing the hash and seeing if it

matches the one in the signed token. This protects against accidental damage, or very

simple attacks. Still, it's quite possible that an attacker could guess what you're

doing: it wouldn't be safe at all in an open source system.

That's where the salt comes in... For a particular web site, we create a random

"salt" that, effectively, gives us a unique hash function for our web site.

$salt="... a random salt defined in a per-site configuration file ...";

function private_hash($token) {

global $salt;

return sha1("$salt:$token");

}

$private_hash=private_hash($token);

$signed_token="$private_hash:$token";

Now, nobody can alter your tokens unless they know your salt.
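For what it's worth, PHP ships a keyed hash built for this job: hash_hmac(), with hash_equals() for a timing-safe comparison. The sketch below swaps HMAC-SHA256 in for the sha1("$salt:$token") construction above; sign_token(), verify_token() and the $secret value are illustrative names, not part of the original scheme:

```php
<?php
// Placeholder; in practice this would come from a per-site configuration file.
$secret = 'replace-with-a-long-random-per-site-value';

// Prefix the token with its keyed hash.
function sign_token($token, $secret)
{
    return hash_hmac('sha256', $token, $secret) . ':' . $token;
}

// Returns the original token if the signature checks out, false otherwise.
function verify_token($signed, $secret)
{
    // The hex MAC never contains a colon, so split on the first one only.
    $parts = explode(':', $signed, 2);
    if (count($parts) !== 2) {
        return false;
    }
    list($mac, $token) = $parts;
    $expected = hash_hmac('sha256', $token, $secret);
    return hash_equals($expected, $mac) ? $token : false;
}

$signed = sign_token('42:1001', $secret);   // "$user_id:$session_id"
$ok = verify_token($signed, $secret);       // the untouched token verifies

// An attacker keeps the old MAC but changes the user id.
$parts = explode(':', $signed, 2);
$tampered = verify_token($parts[0] . ':43:1001', $secret);
```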

Because the tokens are cryptographically signed, the token itself is a proof that

somebody has logged in -- you don't need to look at the database or keep ~any~ server

side state. This makes it a highly scalable system... This basic approach is used on

some of the biggest sites in the world, such as yahoo.com.

Except for one little detail: replay attacks.

Nothing stops a person from saving his token and presenting it later -- after his account

may have been deactivated, or after associated session information has been purged (an

error condition.) An attacker that gets the person's cookie jar, or who intercepts

network traffic, can also steal the token.

It's not possible to completely protect against sophisticated attacks where a hostile

party controls your network without installing complex software on both ends, and

solving some intrinsically difficult problems having to do with mutual authentication.

Let's just say that the developers of SSL have solved these problems, and that you

should use SSL for applications with the strongest security needs.

We can, however, make replay attacks a lot harder by adding a timestamp... Now the

token looks like

$timestamp:$user_id:$session_id

Now we're keeping a table on the server that looks like

create table session (

session_id ... session id ... primary key,

user_id ... user id ...,

last_updated ... timestamp ...,

begin_time ... timestamp ...,

end_time ... timestamp ...

);

Now we've got two constants:

REFRESH_TIME: how old a timestamp is before we issue a token with a new timestamp and

write the timestamp to the last_updated column.

EXPIRE_TIME: how old a timestamp is before we eliminate the session.
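The two constants might be applied like this. The concrete values and the token_state() helper are assumptions for illustration; the text doesn't specify either:

```php
<?php
// Illustrative values, in seconds; the text gives no concrete numbers.
const REFRESH_TIME = 300;
const EXPIRE_TIME = 3600;

// Classify a token by the age of its timestamp.
function token_state($timestamp, $now)
{
    $age = $now - $timestamp;
    if ($age >= EXPIRE_TIME) {
        return 'expired';  // eliminate the session
    }
    if ($age >= REFRESH_TIME) {
        return 'refresh';  // issue a new token and write last_updated
    }
    return 'ok';           // accept the token as it is
}

$state = token_state(time() - 10, time());
```

With a scheme like this the database is only written when a token is refreshed, not on every request.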

You might think you could put the client ip address in the token, and lock the

session to an ip address to make it harder to steal tokens. I tried this, but found out

that some of the largest ISPs (such as aol) have a proxy server that makes users seem to

"jump around". You can do it if you know people are logging from a sane ISP, but you

can't do it in general.

This system can be improved in numerous ways, such as adding anonymous sessions,

operating in a split http/https mode, and caching authorization information in the token.

If you're worried about information leakage (you don't want someone to know that he

got session 88427 yesterday and 99105 today), you can encrypt the token. But be

careful... It's easy to use cryptography the wrong way: don't rely on encryption to

protect token integrity against tampering -- most of the obvious schemes don't really

work.

cookie usage:

20 per domain, 4094 characters (bytes) in the value

Horde_Model -> Horde_Rdo_Model extends it

Horde_Type

Page/Block object

how to return block from driver, inherit Block methods, but also inherit Rdo_Base?

Mapper! _Mappers are the drivers_

Nag - tasks are a model

different models for different sources of tasks

so maybe horde_rdo_model isn't extension but delegate?

types are string, etc.

types can be used by rdo as well as by forms (models)

form helpers go into horde_view helper pack

Horde_Model:

validation:

validatesPresenceOf

validatesUniquenessOf

validatesAcceptanceOf

validatesConfirmationOf

one database, one real filesystem space

no globals

webroot has:

index.php

.htaccess

assets/ (css, images, js)

mod_rewrite rules

everything else pear-installable

make assets pear installable somehow

viewbuilder/pagebuilder - custom views

command line and web service actions (still api/method/params)

catalyst::message() - replaces logmessage - fatal, notification, observer - has a return value (?)

session object management

cms for rampage based on (replacing) ulaform + wicked + giapeto

horde_form

db and xml descriptions instead of just php building

reconcile driver architecture with Rdo Models

apps provide models instead of forms?

apps provide route bundles? (if frontcontroller)

forms are models!

reconcile models and mappers

what do routes point to (models? mappers? views?) -> controllers

controllers handle mappers vs. models?

composite mapper? (turba, etc.)

After reading that theserverside.com entry, it seems like we've been doing this in Solar (framework for PHP5) for a little while now. Essentially, after processing a form, you call $this->_redirectNoCache('controller/action') and you shouldn't get any re-POST troubles.

Boring code from the page-controller follows.

<http://solarphp.com/svn/trunk/Solar/Controller/Page.php>

/**
 *
 * Redirects to another page and action after disabling HTTP caching.
 *
 * The _redirect() method is often called after a successful POST
 * operation, to show a "success" or "edit" page. In such cases,
 * clicking "back" or "reload" will generate a warning in the
 * browser allowing for a possible re-POST if the user clicks OK.
 * Typically this is not what you want.
 *
 * In those cases, use _redirectNoCache() to turn off HTTP caching, so
 * that the re-POST warning does not occur.
 *
 * This method sends the following headers before setting Location:
 *
 * {{code: php
 *     header("Cache-Control: no-store, no-cache, must-revalidate");
 *     header("Cache-Control: post-check=0, pre-check=0", false);
 *     header("Pragma: no-cache");
 * }}
 *
 * @param Solar_Uri_Action|string $spec The URI to redirect to.
 *
 * @param int|string $code The HTTP status code to redirect with; default
 * is '303 See Other'.
 *
 * @return void
 *
 */
protected function _redirectNoCache($spec, $code = 303)
{
    // reset cache-control
    $this->_response->setHeader(
        'Cache-Control',
        'no-store, no-cache, must-revalidate'
    );

    // append cache-control
    $this->_response->setHeader(
        'Cache-Control',
        'post-check=0, pre-check=0',
        false
    );

    // reset pragma header
    $this->_response->setHeader('Pragma', 'no-cache');

    // continue with redirection
    return $this->_redirect($spec, $code);
}

apps provide models instead of forms

apps provide route bundles

apps provide controllers

seekable iterators?

use of ArrayIterator

adding LimitIterators and FilterIterators on top of Rdo

match up RDO with making resources first class - a wiki page, a task, etc. all get a URI
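A quick sketch of the iterator notes above using SPL classes that ship with PHP (ArrayIterator, CallbackFilterIterator, LimitIterator). The task rows are invented, and nothing here is Rdo's actual API; an ArrayIterator just stands in for whatever a mapper returns:

```php
<?php
// Plain arrays stand in for rows that a mapper might return (an assumption).
$tasks = new ArrayIterator(array(
    array('name' => 'write docs', 'done' => false),
    array('name' => 'fix bug',    'done' => true),
    array('name' => 'add tests',  'done' => false),
    array('name' => 'release',    'done' => false),
));

// Keep only open tasks, then take the first two of those.
$open = new CallbackFilterIterator($tasks, function ($task) {
    return !$task['done'];
});
$page = new LimitIterator($open, 0, 2);

$names = array();
foreach ($page as $task) {
    $names[] = $task['name'];
}

// ArrayIterator is also a SeekableIterator: jump straight to the third row.
$tasks->seek(2);
$third = $tasks->current();
```

The filter and limit are applied lazily while iterating, so the same stacking could sit on top of a result set that is fetched row by row.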

Meanwhile HTTP was designed for access to resources, the "primary key" being determined by its URL (vs. having to worry about the insert id). If you think "documents", it's clear there's no need to make a distinction between creating and updating -- creating a document results in the first version. Updating means overwriting an existing document with a new version. But in both cases the client is POSTing the same thing and does not need to be aware of whether the document already existed or not.

Meanwhile a common first demo app for server side frameworks is a CRUD example. The implication here is frameworks place a strong emphasis on the database, while HTTP is largely ignored (it's rare to even see HTTP status codes as a fundamental part of a framework).

Avoiding a long filesystem vs. database discussion (like the need for virtual file systems with extensible properties) suffice to say -- consider how Dokuwiki stores wiki pages 1-to-1 as files compared to MediaWiki. What makes more sense to you? Perhaps our websites have been driven too far by the database?

The point here is, given the mismatch between HTTP and CRUD, we've put CRUD first which in turn makes actions first class in our frameworks. We aim to support N different types of action (verbs) when really we should have been dealing with only three -- GET, POST and DELETE (the latter being perhaps re-routed to a specific "resource class" method according to some framework / form conventions).
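As a rough illustration of making resources first class rather than actions, the sketch below routes the three verbs to methods on a "resource class". DocumentResource and dispatch() are hypothetical names, and an array stands in for real storage:

```php
<?php
// Hypothetical resource class: one method per HTTP verb, keyed by URL path.
class DocumentResource
{
    private $docs = array();  // URL path => document body

    public function get($path)
    {
        return isset($this->docs[$path])
            ? array(200, $this->docs[$path])
            : array(404, '');
    }

    // Creating and updating are the same operation: store the new version.
    public function post($path, $body)
    {
        $this->docs[$path] = $body;
        return array(200, $body);
    }

    public function delete($path)
    {
        unset($this->docs[$path]);
        return array(204, '');
    }
}

// Map the request method onto the resource; anything else is refused.
function dispatch(DocumentResource $resource, $method, $path, $body = '')
{
    switch (strtoupper($method)) {
        case 'GET':
            return $resource->get($path);
        case 'POST':
            return $resource->post($path, $body);
        case 'DELETE':
            return $resource->delete($path);
        default:
            return array(405, '');
    }
}

$wiki = new DocumentResource();
$first = dispatch($wiki, 'POST', '/wiki/HomePage', 'version 1');
$second = dispatch($wiki, 'POST', '/wiki/HomePage', 'version 2');
$read = dispatch($wiki, 'GET', '/wiki/HomePage');
```

The client sends the same POST whether or not /wiki/HomePage already exists, and the status code is part of every return value.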

Nannying: tell me how to get organised -- clear signposts for where to put my code.
Just add water: give me my prototype now!
Don't make me think: I can do this stuff even on my dumbest days.
DRY: making the same change 50 times is not cool.
Anti-pasta: help me avoid spaghetti
Security: no nasty surprises please. Help me get this right first time.
Testing: help me protect myself against myself.

To me, what we should look at is the basic reasons why we want to manage web-pages and satisfy them:

centralized control over page rights and access
ability to remap urls due to changes in web-site structure
handling 404-errors intelligently
ability to dynamically add headers and footers to pages for displaying alerts such as "system going down at 5pm"
separates content from presentation in a reasonable manner, eg. with templates
managing tainted data (eg. POSTS, GETS, COOKIES)

AJAX Considered Harmful

Please pardon the provocative title, but this post is intended to

surface one point I buried in yesterday's presentation in the hopes

that by making it a separate post it will attract a wider audience.

I intend for this post to be constructive, so I will focus on two

specific suggestions which hopefully will serve as the seed for the

development of a set of best practices for AJAX. Here are the two

humble suggestions on things that people should standardize on:

the data should first be encoded as octets according to the UTF-8 character encoding
GET should never be used to initiate another operation which will change state

Rationale for these two suggestions follows.

Encoding

For the former, I proposed a simple test:

The first thing I want you to do is to copy the string

"Iñtërnâtiônàlizætiøn" into your tool

and observe what comes out the other side.

When expressed as a part of the query component of a URI, it should

look like I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n.

Standardizing improves interoperability, and the reason why I am

suggesting UTF-8 is that it is backwards compatible with ASCII, can

express the full range of the Unicode character set, and is widely

implemented.
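In PHP the test looks something like this, assuming PHP 7 for the \u{...} escapes (which make the string UTF-8 regardless of how the source file is saved):

```php
<?php
// The test string, spelled with code point escapes so that it is UTF-8.
$text = "I\u{F1}t\u{EB}rn\u{E2}ti\u{F4}n\u{E0}liz\u{E6}ti\u{F8}n";

// Percent-encode the UTF-8 octets for use in a query component.
$encoded = rawurlencode($text);

// Decoding should give back exactly the same octets.
$roundTrip = rawurldecode($encoded);

echo $encoded, "\n";
```

If a tool produces single-byte escapes such as %F1 instead of %C3%B1, it is encoding as Latin-1 rather than UTF-8.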

Idempotency

Looking into the current PHP implementation of SAJAX, you will see the

following:

// Bust cache in the head

header ("Expires: Mon, 26 Jul 1997 05:00:00 GMT");    // Date in the past

header ("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");

                                                      // always modified

header ("Cache-Control: no-cache, must-revalidate");  // HTTP/1.1

header ("Pragma: no-cache");                          // HTTP/1.0

This code should be a rather large clue that you are probably doing

something wrong. Apparently the author recognized that these headers

are somewhat sporadically and inconsistently implemented, and hoped

that by combining them that the chances of success would be improved.

The danger that the responses may be cached is actually the smaller of

several concerns. A much bigger concern is that unsuspecting

grandmothers and bots everywhere can be tricked into modifying online

databases simply by following a link.

Judicious use of HTTP GET can be a very good thing. Perhaps toolkits

can adopt a convention that procedure names that start with the

characters "Get" use GET, everything else uses POST.
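That convention is small enough to sketch. http_method_for() is a made-up helper name, and the match is case-sensitive on purpose:

```php
<?php
// Hypothetical helper: pick the HTTP method from the procedure name.
// Only names that start with exactly "Get" are treated as safe reads.
function http_method_for($procedure)
{
    return strncmp($procedure, 'Get', 3) === 0 ? 'GET' : 'POST';
}

$read = http_method_for('GetUserList');  // safe read
$write = http_method_for('DeleteUser');  // changes state
```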

possible dispatcher:

<?php

define('HORDE_CONFIG_DSN', 'file:///var/horde/config/head-site1');

define('HORDE_BASE', '/var/horde/head');

require_once HORDE_BASE . '/core.php';

Horde_Rampage_Dispatcher::run();

no-cache headers:

header("Cache-Control: no-store, private, must-revalidate, proxy-revalidate, post-check=0, pre-check=0, max-age=0, s-maxage=0");

meta tags to include:

Things to watch out for:

PHP_SELF

SERVER_NAME

Referer - never depend on it

passwords - don't use just md5, add a salt.

or, consider everything in $_SERVER tainted. Of course $_GET, $_POST,

$_REQUEST, $_COOKIE are.

Edge cases: $_SESSION, backend databases. If you don't consider it

input, then it's part of your application for security purposes.

Never display credit card info - this means it shouldn't be

repopulated!

Filtering is _inspection_, not correction. Don't try to correct

invalid data. Casts are relatively safe but still miss simplistic

attacks.

When possible, whitelist - prove data valid. Simple list of values, or

a regexp. Everything else is bad.

Need a model for making the filtered data clearly available, and don't

touch the tainted data.

ctype_* - fast, and charset aware. Much better than regexp tests.
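One possible shape for "prove data valid, keep the filtered data separate": a function that copies only whitelisted values into a $clean array and never modifies the tainted input. The field names (color, quantity) are invented for the example:

```php
<?php
// Hypothetical form fields; $input stands in for $_POST.
function filter_input_fields(array $input)
{
    $clean = array();

    // Whitelist by a simple list of allowed values.
    $colors = array('red', 'green', 'blue');
    if (isset($input['color']) && in_array($input['color'], $colors, true)) {
        $clean['color'] = $input['color'];
    }

    // ctype check: digits only, so the cast afterwards cannot surprise us.
    if (isset($input['quantity'])
        && is_string($input['quantity'])
        && ctype_digit($input['quantity'])) {
        $clean['quantity'] = (int) $input['quantity'];
    }

    // Anything not proven valid is simply absent from $clean.
    return $clean;
}

$clean = filter_input_fields(array(
    'color'    => 'red',
    'quantity' => '3',
    'extra'    => 'ignored',
));
$rejected = filter_input_fields(array(
    'color'    => 'purple',
    'quantity' => '3; DROP TABLE users',
));
```

Invalid values are dropped rather than corrected, which matches "filtering is inspection, not correction".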

Output filtering

Escaping is preservation, not changing data.

HTML, javascript, cli output, session data, rss feeds, XML, etc. Any

remote destination.

Need a clever way to integrate this into the template system! Perhaps

a content-type on variables (too much?) of text/html, text/plain,

text/xml, etc.? How about instead of tag, have text:foo, html:foo,

xml:foo? Or <tag:foo type="html">, defaulting to type="text".

Escaping MUST be charset aware. Data escaped for us-ascii might result

in JavaScript in Japanese (not necessarily a valid example).
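One reading of the type="text" default, sketched in plain PHP rather than in any real template engine: values are treated as plain text and escaped for HTML with an explicit charset, unless they are explicitly marked as already being HTML. render_var() is a hypothetical helper:

```php
<?php
// Hypothetical template helper: escape by default, pass through on request.
function render_var($value, $type = 'text', $charset = 'UTF-8')
{
    if ($type === 'html') {
        return $value;  // the caller asserts this is already safe HTML
    }
    // Default: plain text, escaped for an HTML context in the given charset.
    return htmlspecialchars($value, ENT_QUOTES, $charset);
}

$escaped = render_var('<b>Tom & "Jerry"</b>');
$raw = render_var('<b>bold</b>', 'html');
```

The escaping only changes how the data is written out for one destination; the stored value stays as it was.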

For filtering complex data, use checksums instead.

fopen_wrappers - turn off if possible?

display_errors - write a custom error handler, handle errors elegantly

& integrated with Log object.

complexity leads to mistakes

http://phpsec.org/

http://brainbulb.com/

http://shiflett.org/

http://md5.rednoize.com/

http://www.midgard-project.org/updates/2003-05-29-000.html

Standardized URL-to-object mapping
Standardized object-to-application mapping
Standardized navigational system
Standardized object extensibility API
Standardized way to make application output configurable

So, MidCOM is about standardizing how to build Midgard applications

and site features. Let's look at each of the points in more detail.

Standardized URL-to-object mapping

Before MidCOM, Midgard site and application developers have had to

figure out how to map URL requests into Midgard objects, typically to

topics and articles. Everybody has rolled their own solution for this,

using object names, IDs or GUIDs as the identifiers, and using either

GET parameters or active page arguments.

With MidCOM, application development no longer has to start by

writing a URL parser, as the MidCOM system provides this already. URL

parsing happens completely in topic and article space, using object

names as the identifiers. This makes for very clean URLs. Consider the

following:

/gallery/spring-2003/IMG_2442.html

This example would translate to an article named "IMG_2442" in topic

"spring-2003" under topic "gallery". Clean, pronounceable and easy to

use. Even better, any Midgard object instantiated using a MidCOM

component is aware of its location, providing the URL through MidCOM's

metadata API.
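The mapping described above can be sketched in a few lines. This is not MidCOM's parser or API, just a hypothetical parse_object_url() showing how the path splits into topic names and an article name:

```php
<?php
// Hypothetical parser; it only illustrates the URL-to-object mapping.
function parse_object_url($path)
{
    // Every segment but the last names a topic, from the root downwards.
    $segments = explode('/', trim($path, '/'));
    $last = array_pop($segments);

    // The last segment, minus its extension, names the article.
    $article = pathinfo($last, PATHINFO_FILENAME);

    return array('topics' => $segments, 'article' => $article);
}

$parsed = parse_object_url('/gallery/spring-2003/IMG_2442.html');
```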

Standardized object-to-application mapping

In addition to connecting URLs to Midgard objects, URLs also need to

be connected to specific applications, or in MidCOM terms, components.

All topics in MidCOM are assigned to be managed by a component. This

means that different parts of the site can work in different ways. For

example, the URL

/news/midgard-tutorial.html

could load a "news ticker" component, and provide the topic "news" and

article "midgard-tutorial" to be handled and displayed by it.

The newsticker component can fully control the administrative

interface for managing content under it, and the output provided by

URLs it manages.

A component is selected for each topic separately. This means that the

example URL

/news/contacts/bergius.html

could be handled by an "employee directory" component.

Standardized navigational system

Each MidCOM component provides all navigational information about

objects managed by it to a system called NAP, which is accessible by

an easy object-oriented API.

The NAP system means that site developers don't worry about different

components or object types when writing the site's navigational

interface. You can write one script for generating the whole site

navigation, and it will work with the site and any component under it.

This makes standardized navigational tools like breadcrumbs or the

NemeinNavBar utility much more useful, as they can be used with any

MidCOM-based site. I expect that in the near future site developers will

have a huge library of prebuilt navigational systems to select from.

Standardized object extensibility API

Enabling content managers to define their own object types or metadata

fields has always been a problem with Midgard, meaning that any new

metadata field has forced site developers to write their own content

creation UIs.

MidCOM provides an easier system for this called datamanager. With

datamanager, site developers can define their own custom data

structures, called "layouts". Layouts are PHP arrays telling

datamanager what fields to allow for objects handled for that

component, how to present those fields in an administrative interface,

and where to store them (parameters, object fields or attachments).

Using datamanager component writers don't really have to care about

what object fields site developers will want to use, they just need to

use the datamanager utility. Data structure "layouts" can be provided

as part of the default component configuration, and can be overridden

on a per-sitegroup basis.

Datamanager is integrated into the MidCOM AIS content management

interface, providing customized editing forms for all components based

on widgets defined in the "layouts" configuration. The widgets can be

anything from text input boxes to a WYSIWYG editor or image upload

system.

Standardized way to make application output configurable

The MidCOM specification requires that all application output is

handled through the MidCOM style system. MidCOM's style engine is an

extension of the Midgard style engine, allowing component outputs to

be configured using style elements, but also for fallback elements to

be provided as snippets.

This means that output of any MidCOM component will be fully

configurable by site developers using the familiar Midgard style

engine. The style to be used can be defined separately for each topic,

allowing for different output styles from the same components on a per-site-area

basis.

Because components can be loaded dynamically to a Midgard page, site

developers can have different parts of the same page use different

styles, making administration of the style elements much easier.

Conclusions

MidCOM brings into Midgard something that has been lacking so far: a

"write once and run everywhere" framework for building site

components, styles and navigational tools.

This promotes component sharing and code reuse, both within a single

Midgard solution provider company, and within the international Open

Source community.

So far Midgard has provided a nice content management framework, but

actual sites have needed to be built from scratch. MidCOM promises to

change that, making Midgard much easier to implement.

Of course, sloppy coding is still possible with MidCOM, but if

component writers adhere to the MidCOM specification, PEAR coding

standards and use NemeinLocalization for internationalizing their

components, we should achieve global reusability.

I invite all Midgard developers to seriously study and consider MidCOM

for their projects. There is some learning curve, but real code

reusability should repay that very quickly.

The Midgard Framework is a powerful toolkit for managing online

information. Writing applications and functionalities to the platform

is done using the easy-to-learn PHP scripting language. All

interfacing with the system is done via a regular Web browser, and no

special tools are needed for developers or content authors.

Main features of Midgard Framework include:

Easy and well documented Application Programming Interface (API)
Efficient management of Web content using a hierarchical topic system
Separation of layout, content and site logic
Support for editorial workflow and approval mechanisms
Attachment of metadata to all content objects
Management of PIM data including contacts and calendaring information
Multilingual support (including Unicode) and localization
Replication for clustered setups and staging
Multi-company support using virtual databases
Flexible user and group management

Midgard works on most common UNIX platforms, including Linux, FreeBSD

and Solaris. Prebuilt binary packages are available for some Linux

platforms (including Red Hat, Debian and Mandrake), and the system can

be installed from sources to most other environments.

For other environments, including hosted servers and Windows systems,

there is the pure-PHP implementation, Midgard Lite.

The Midgard Application Server is free software developed

internationally with the Open Source model and distributed under the

GNU licenses. Commercial support, applications and services for the

platform are available from a range of companies worldwide.

The PHPmole toolkit provides Midgard developers with a

freely-available Integrated Development Environment (IDE) comparable

to DreamWeaver and MS Visual Studio, with additional content

management functionalities.

With the Midgard CMS package, the ease-of-use of productivity software

and office suites can be brought to Midgard content management.

query building:

<?php

// Instantiate the Query Builder for seeking MidgardArticles

$query = new MidgardQueryBuilder("MidgardArticle");

// Next add the SQL constraints you need
// List articles only from specific topic

$query->addConstraint("topic", "=", $topic->id);

// List only articles that have been approved since some timestamp

$query->addConstraint("approved", ">", $starting_time);

// Order the articles based on their approval time

$query->addOrder("approved", "DESC");

// Get only 20 articles for this particular view

$query->setLimit(20);

// Start from the Nth page of this article list

$query->setOffset($_REQUEST["startfrom"
