2024-04-24

Diff for ChucksHorde4Thoughts between 36 and 37

[[toc]]



+ Chuck's Horde 4 Thoughts



++ Overall Design



* http://www.25hoursaday.com/weblog/2008/08/04/AvoidingTheSecondSystemEffectInSoftwareDevelopment.aspx





++ Controllers



Horde_Controller - turn Horde_Controller_Dispatcher into Horde_FrontController_Http or similar, then add other front controllers for Soap, !XmlRpc, !JsonRpc, Cli ...





++ User Interface



* Get rid of popups. Any new window functionality should be in an ajax overlay, or a full new browser window.





++ API design



* http://www.lornajane.net/posts/2009/Error-Feedback-for-Web-Services

* http://www.lornajane.net/posts/2009/Status-Codes-for-Web-Services

* http://www.lornajane.net/posts/2009/Version-Parameters-for-Web-Services





++ Documentation



+++ Developer docs (PHPDoc alternatives)



* http://ajaxian.com/archives/beautiful-code-documentation

** http://sphinx.pocoo.org/





++ Debug support



* http://code.google.com/p/webgrind/

* http://www.sitepoint.com/blogs/2008/05/13/useful-in-browser-development-tools-for-php/

* http://badapi.trib.tv/

* http://bergie.iki.fi/blog/sql-level_debugging_with_midgard.html

* http://code.google.com/p/formaldehyde/

* Debug "wrapper" drivers - encapsulate another driver and delegate all calls, but provide before/after hooks for any function along with timing, profiling, reporting of calls and arguments, etc.
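A minimal sketch of the debug "wrapper" driver idea, assuming PHP's `__call` overloading is an acceptable way to delegate. `DebugWrapper` and `ExampleDriver` are hypothetical names for illustration, not existing Horde classes.

```php
<?php
// Hypothetical debug wrapper: wraps any driver object and delegates every call.
class DebugWrapper
{
    private $driver;
    private $log = [];

    public function __construct($driver)
    {
        $this->driver = $driver;
    }

    // Intercepts calls to methods that are not defined on the wrapper itself.
    public function __call($method, array $args)
    {
        $start = microtime(true); // "before" hook: start the timer
        try {
            return call_user_func_array([$this->driver, $method], $args);
        } finally {
            // "after" hook: record the call, its arguments and its duration
            $this->log[] = [
                'method'  => $method,
                'args'    => $args,
                'seconds' => microtime(true) - $start,
            ];
        }
    }

    public function getLog()
    {
        return $this->log;
    }
}

// Stand-in driver, for illustration only.
class ExampleDriver
{
    public function fetch($id)
    {
        return "row $id";
    }
}

$driver = new DebugWrapper(new ExampleDriver());
$row = $driver->fetch(42);
$calls = $driver->getLog();
```

The wrapped driver needs no changes; the cost is that type hints against the concrete driver class will not accept the wrapper, so this works best where drivers are typed by interface or not at all.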

++ URLs







* We should sign (with a timestamp and HMAC, per Horde::signQueryString) all URLs that perform destructive actions.
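A rough sketch of what timestamp-plus-HMAC signing could look like. This is not Horde::signQueryString itself; the function names, the `_t` and `_h` parameter names, and the 30 minute lifetime are all assumptions made for the example.

```php
<?php
// Hypothetical helpers; not Horde's actual signQueryString implementation.
function sign_query(array $params, $secret, $now)
{
    $params['_t'] = $now;                 // timestamp is part of the signed data
    ksort($params);                       // canonical order before hashing
    $query = http_build_query($params);
    $params['_h'] = hash_hmac('sha256', $query, $secret);
    return http_build_query($params);
}

function verify_query($queryString, $secret, $now, $maxAge = 1800)
{
    parse_str($queryString, $params);
    if (!isset($params['_h'], $params['_t']) || !is_string($params['_h'])) {
        return false;
    }
    $given = $params['_h'];
    unset($params['_h']);
    ksort($params);
    $expected = hash_hmac('sha256', http_build_query($params), $secret);
    if (!hash_equals($expected, $given)) {
        return false;                     // parameters or signature were altered
    }
    $age = $now - (int)$params['_t'];
    return $age >= 0 && $age <= $maxAge;  // reject stale or future-dated links
}

$qs = sign_query(['action' => 'delete', 'id' => '7'], 's3cret', 1000);
```

Signing only protects the link from being altered or replayed after it expires; the destructive action should still require POST and a permission check.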





++ Configuration



* use Horde_Policy

* Allow conf.d directory styles, like Apache2 config (see http://bugs.horde.org/ticket/4747).

* Use return $... in PHP config files to avoid defining local-scope variables? (http://www.urdalen.com/blog/?p=257)
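A small sketch of the last two ideas combined, under the assumption that each config file simply returns an array. `load_config()` and `load_config_dir()` are made-up names, and the temporary file exists only to keep the example self-contained.

```php
<?php
// Write a throwaway config file so the sketch is self-contained.
$file = tempnam(sys_get_temp_dir(), 'conf');
file_put_contents($file, "<?php\nreturn ['sql' => ['host' => 'localhost']];\n");

// Loads one config file inside a function, so nothing the file defines
// leaks into the caller's scope; only the returned array comes back.
function load_config($path)
{
    $conf = require $path;
    return is_array($conf) ? $conf : [];
}

// conf.d style: merge every *.php file in a directory. glob() returns the
// files sorted by name, so later names override earlier ones.
function load_config_dir($dir)
{
    $conf = [];
    $files = glob(rtrim($dir, '/') . '/*.php');
    foreach ($files ?: [] as $path) {
        $conf = array_replace_recursive($conf, load_config($path));
    }
    return $conf;
}

$conf = load_config($file);
unlink($file);
```

With this layout a local override is just another file, for example `20-local.php` next to `10-base.php`, rather than an edit to the shipped file.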





++ Permissions



* http://blog.wolff-hamburg.de/archives/25-A-pragmatic-approach-to-rights-management.html

* http://stonean.com/wiki/lockdown





++ Profiling



Lots of overlap with debugging



* http://docs.kohanaphp.com/libraries/profiler

* See Zend_Db_Profiler, and idea for Cache profiler also, including Firebug plugins for both





++ Testing



* http://www.phpunit.de/pocket_guide/3.2/en/database.html

* http://sebastian-bergmann.de/archives/702-Data-Providers-in-PHPUnit-3.2.html

* http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/

* http://mikenaberezny.com/2008/10/17/php-temporary-streams/





++ Error handling



* http://derickrethans.nl/five_reasons_why_the_shutop_operator_@_should_be_avoided.php

* http://eirikhoem.wordpress.com/2008/03/15/dying-with-grace-phps-register_shutdown_function/





++ Jabber/XMPP support





* http://www.danga.com/mogilefs/

* http://hadoop.apache.org/core/

* http://hadoop.apache.org/core/docs/r0.16.4/hdfs_design.html

* http://en.wikipedia.org/wiki/Hadoop 

* http://wiki.apache.org/hadoop/ProjectDescription

* http://code.google.com/p/the-cassandra-project/





++ Object instantiation (sometimes known as dependency injection)



* http://bergie.iki.fi/blog/midcom_3_and_context_injectors.html

* http://www.sitepoint.com/blogs/2008/02/04/dealing-with-dependencies/

* http://usrportage.de/archives/897-Antipattern-the-verbose-constructor.html

* http://usrportage.de/archives/904-Antipattern-chaining-stateless-protocol-requests.html



<code>

horde_ctx::getQueue

$horde-> (no, no global variable)

Horde::$queue (__getStatic?)

Horde::queue() (__callStatic, include extra $args)

Horde_Factory::make(...)

    ::makeQueue() (using __callStatic to introspect)

Horde_Builder

</code>
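One way the `::makeQueue()` variant could work, sketched with `__callStatic`. `Example_Factory`, `Example_Queue` and the name-to-class map are hypothetical; nothing here is an existing Horde API.

```php
<?php
// Hypothetical classes for illustration; not existing Horde classes.
class Example_Queue
{
    public $name;

    public function __construct($name = 'default')
    {
        $this->name = $name;
    }
}

class Example_Factory
{
    // Maps the part after "make" to the class to build (an assumption).
    private static $map = ['Queue' => 'Example_Queue'];

    // Handles static calls such as Example_Factory::makeQueue('mail').
    public static function __callStatic($method, $args)
    {
        if (strpos($method, 'make') !== 0) {
            throw new BadMethodCallException("Unknown method $method");
        }
        $what = substr($method, 4);
        if (!isset(self::$map[$what])) {
            throw new InvalidArgumentException("Don't know how to make $what");
        }
        // Extra arguments are passed straight through to the constructor.
        $class = new ReflectionClass(self::$map[$what]);
        return $class->newInstanceArgs($args);
    }
}

$queue = Example_Factory::makeQueue('mail');
```

The trade-off against explicit `makeQueue()` methods is that IDEs and static analysis cannot see the magic methods without extra annotations.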





++ Idea sources



Don't get too caught up in everything everyone else is doing. However, some things that might be useful food for thought are listed below. Projects may be added to or discarded from this list quickly as they are synthesized.



* http://kohanaphp.com/home.html

* http://code.whytheluckystiff.net/camping

* http://api.rubyonrails.com/

* http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html

* http://www.xml.lt/Resources/Framework

* http://cognifty.com/

* http://docs.kohanaphp.com/

* http://merbivore.com/documentation.html

* http://weblog.rubyonrails.org/2009/2/1/rails-2-3-0-rc1-templates-engines-rack-metal-much-more

* http://static.repoze.org/bfgdocs/

* http://oddments.org/?p=78

* http://www.brandonsavage.net/use-registry-to-remember-objects-so-you-dont-have-to/





++ Horde 4 Administration



* http://bergie.iki.fi/blog/asgard_welcome_page_just_got_useful.html





++ Package structure



* http://www.apsstandard.com/doc





++ Unsorted



http://incubator.apache.org/thrift/

http://www.gearmanproject.org/

http://codeigniter.com/wiki/Modular_Extensions_-_HMVC/

http://ojay.othermedia.org/articles/keyboard.html

http://www.lkozma.net/autocomplete.html

http://writer.bighugelabs.com/

http://www.spread.org/

http://www.backhand.org/wackamole/





protocol-independent URLs:

http://nedbatchelder.com/blog/200710.html#e20071017T215538





// Return a 304 if the file hasn't been modified since the If-Modified-Since date;
// there is no point in resending all the data if the browser already has it cached.
if (function_exists('apache_request_headers')) {
    $headers = apache_request_headers();

    // !empty() avoids a notice when the browser did not send the header.
    if (!empty($headers['If-Modified-Since'])) {
        $ims = strtotime($headers['If-Modified-Since']);
        // strtotime() returns false for an unparseable date.
        if ($ims !== false && $ims >= $serve_data['modified_time']) {
            header('HTTP/1.0 304 Not Modified');
            exit(0);
        }
    }
}




I just went through my first signup process that required an

SMS-capable device for confirmation. It also didn't make me pick my

credit card type, and instead used my country code (+1) to decide on a

card detection algorithm.

skip-external-locking

skip-thread-priority

key_buffer = 64M

max_connections = 1024

max_connect_errors = 1000

max_allowed_packet = 8M

table_cache = 512

sort_buffer_size = 8M

read_buffer_size = 1M

read_rnd_buffer_size = 2M

myisam_sort_buffer_size = 64M

thread_cache_size = 50

query_cache_size = 128M

tmp_table_size= 1024M

thread_concurrency = 12

wait_timeout = 60

interactive_timeout = 60

log_slow_queries








index.php - global dispatcher

how to do themes/custom templates? chain local -> app -> horde?



a horde 4 installation:

  config/

  lib/

  apps/

  public/ <- with app/ subdirs containing images, etc.

everything routable goes in apps/



apps/

  login/

  help/

  prefs/

  admin/

  etc...

app name is the first part of the route > /login

subdomain support

route aliases


 - prefer PHP over XML

merge Rdo and Mad into Horde_Db



     use subpackages or multiple *.xml for packages to avoid silliness?


should have parallel web and cli configuration and installation/update tools; web requires webserver to have write access to a config/ dir and to public/; cli tools do not (if run as another user)







Horde 4 app - a Horde 3.x app updated for PHP 5 and to use the latest libraries



Rampage app - "RAD" (rapid application development) MVC app that uses Horde 4



/horde/page/ -> dispatcher for Rampage modules w/ views (overridable), routes, controllers, etc.?

have generic views for rampage_login, rampage_admin_*, etc.



configuration:

config/routes.php

config/routes_local.php -> do this for all config files



Horde_Content_Index -> horde-wide search





Random Horde Ideas



mini-cms for building your own sidebar/menu/etc?

- shortcuts to any bit of horde



labels labels labels

keywords also or just labels? probably just flexible labels

"smart folders"



Getting Things Done support? (other apps that do it - Tracks, Kinkless GTD, Midnight Inbox)



make mnemo into more of a snippet keeper? sort of like a personal cms - or wiki. carry the encryption feature through to other kinds of content



create an outliner!



tags/labels for mail



rename virtual folders to smart folders? too apple?



freetext boolean mail searches:

apples & oranges

apples | oranges

apples ! oranges (apples but not oranges)

apples & (oranges | lemons)
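A sketch of how these queries could be evaluated against a message body. The grammar (with `&` and `!` binding tighter than `|`) and the case-insensitive substring matching are assumptions; `BooleanQuery` is a made-up class name.

```php
<?php
// Grammar assumed for this sketch:
//   expr   := term ('|' term)*
//   term   := factor (('&' | '!') factor)*    "a ! b" means "a and not b"
//   factor := WORD | '(' expr ')'
class BooleanQuery
{
    private $tokens;
    private $pos = 0;
    private $text;

    public static function matches($query, $text)
    {
        $q = new self();
        // Split into operators, parentheses and words.
        preg_match_all('/[&|!()]|[^\s&|!()]+/', $query, $m);
        $q->tokens = $m[0];
        $q->text = strtolower($text);
        $result = $q->expr();
        if ($q->pos < count($q->tokens)) {
            throw new InvalidArgumentException('Unexpected token: ' . $q->tokens[$q->pos]);
        }
        return $result;
    }

    private function peek()
    {
        return isset($this->tokens[$this->pos]) ? $this->tokens[$this->pos] : null;
    }

    private function expr()
    {
        $value = $this->term();
        while ($this->peek() === '|') {
            $this->pos++;
            $right = $this->term(); // always parse the right side
            $value = $value || $right;
        }
        return $value;
    }

    private function term()
    {
        $value = $this->factor();
        while (in_array($this->peek(), ['&', '!'], true)) {
            $op = $this->tokens[$this->pos++];
            $right = $this->factor();
            $value = $value && ($op === '&' ? $right : !$right);
        }
        return $value;
    }

    private function factor()
    {
        $token = $this->peek();
        if ($token === null || in_array($token, ['&', '|', '!', ')'], true)) {
            throw new InvalidArgumentException('Expected a word or "("');
        }
        $this->pos++;
        if ($token === '(') {
            $value = $this->expr();
            if ($this->peek() !== ')') {
                throw new InvalidArgumentException('Missing ")"');
            }
            $this->pos++;
            return $value;
        }
        // A word matches if it occurs anywhere in the text.
        return strpos($this->text, strtolower($token)) !== false;
    }
}
```

A real mail search would translate the parsed tree into IMAP SEARCH criteria or an index query instead of scanning text, but the parsing step would look much the same.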








Event-driven apps:

"Understanding and implementing this event model can free your application from the constraints of defined elements. For example, instead of applying an event listener for each link in a menu, you can assign a single listener to the menu item itself and retrieve the event target. That way you don't need to change your script when the menu gets larger or when links get removed from it."

http://yuiblog.com/blog/2007/01/17/event-plan/






tagging/instant hierarchies as specialized permission-based search

RBAC




what is horde?


groupware?

horde data services?

horde data access?

ui layers


be the php dojo framework? or the php yui framework?

see http://tigermouse.epsi.pl/ ?

or, don't do desktop-like widgets? see UI design bookmarks




try to rely only on thread-safe extensions?

reduce dependency tree



avoid globals and non horde-namespaced functions/methods in framework and core app code

class-based registry apis





against edge cases: http://www.bakesalehq.com/contents/show/12/






features from Prado? http://www.urdalen.com/blog/?p=198



use functions where appropriate for shortcuts/helpers, like Mike's t("translated string") function? but would be horde_t? would call configured translation system





helper sets for dojo, protaculous, yui - simple functions like dojo_editor(), dojo_pane(), yui_map(), etc. Load with something like Horde/Layout/Helpers/YUI.php, etc. See http://www.ngcoders.com/projax/



Horde as a set of apps and methodology needs to pick a js lib, pick a template methodology, etc. - this is Rampage. Horde as a framework can allow for flexibility.





To make it even better, separate the control logic from the presentation. That way, back could be reverse, etc. I do this in all my forms since application logic and presentation "word play" are two distinct things to me. This is what I use:



<form method="post" action="form.php">

<input name="submit[back]" value="reverse" type="submit" />


<input name="submit[next]" value="speed ahead" type="submit" />

&nbsp;

<input name="submit[home]" value="no place like home" type="submit" />

</form>


Then, you can have a simple routine that captures submit actions regardless of the presentation value. You check that the submit array has exactly one element and whitelist its key against the acceptable values. A multi-row table can expand upon the theme by using this: submit[edit_3], submit[delete_3], submit[edit_5], etc.
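The routine described above might look roughly like this. The function names are made up for the example, and the `verb_id` key format for table rows is taken from the submit[edit_3] examples.

```php
<?php
// Returns the single key of $post['submit'], or null if the request does not
// contain exactly one submit entry.
function single_submit_key(array $post)
{
    if (!isset($post['submit']) || !is_array($post['submit']) || count($post['submit']) !== 1) {
        return null;
    }
    return (string)array_keys($post['submit'])[0];
}

// Simple buttons: submit[back], submit[next], ... The button caption (the
// value) is never inspected, so it can be reworded or translated freely.
function submitted_action(array $post, array $allowed)
{
    $key = single_submit_key($post);
    return in_array($key, $allowed, true) ? $key : null;
}

// Table rows: submit[edit_3], submit[delete_3], ...
function submitted_row_action(array $post, array $allowedVerbs)
{
    $key = single_submit_key($post);
    if ($key === null || !preg_match('/\A([a-z]+)_(\d+)\z/', $key, $m)) {
        return null;
    }
    if (!in_array($m[1], $allowedVerbs, true)) {
        return null;
    }
    return ['verb' => $m[1], 'id' => (int)$m[2]];
}

$action = submitted_action(
    ['submit' => ['back' => 'reverse']],
    ['back', 'next', 'home']
);
```

In a real handler `$_POST` would be passed in place of the literal array.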

caching

make sure Rdo and other services allow dropping in caching rules




http://sebastian-bergmann.de/pages/talks.html

phpunit - @test markup in methods

  phpunit + selenium

  cruise control?





really hope google will integrate any product of theirs with any other products of theirs? receive an email, transform it to document, add spreadsheet, add notes, add bookmarks saved from search history and a link to an event in calendar, anyone?





From nyphp-talk:

    The other day I had to get an application started in a hurry.  It's

doing something useful at < 700 lines,  but I'm considering options that

could grow it out to about 10 times that.  It depends on a "core

library" that's < 500 lines.  This library deals with common issues in

string handling,  parameter handling,  and HTML form generation.

    About 10% of the application,  or 70 lines,  is a microframework

that's loosely built on Struts.  About 20 of those lines are in 2

functions which would be generally useful for microframeworks (such as

file_exists_in_include_path()).  Like Struts,  the microframework

chooses an "action" based on form parameters:  the action then chooses a

"view" -- a "view" is basically a template that a designer can edit

which can be supplemented by an optional "query" which pulls stuff out

of the database.  Like Ruby-on-Rails,  the microframework uses

convention instead of configuration:  the dispatcher computes an "action

name" based on query parameters,  and uses that to compute a

filename...  It checks that the file exists and executes it with the

"require method".



    The microframework uses no object-oriented techniques.  That's not

because I have any antipathy to OO,  but because I didn't need it,  and

I like writing my actions,  queries,  and views in a style that "feels

like PHP".



    Yes, my microframework is nowhere near as powerful as CakePHP or

Symfony.  Yet,  it's more flexible,  because I can codesign it with my

application.  Because it's so simple,  I can easily adapt it to do what

I want.  If I decide I really hate it,  I can write a new one in an

hour.  I'm an expert on it,  because I developed it,  and I wouldn't

have to take on the technical,  social and emotional burdens of

"forking" an open-source codebase if I wanted to make a change in direction.



    I'm moving towards a vision of web app architecture where we move

towards shared vocabulary and standardized interfaces.  Rather than

working with a "comprehensive framework" that does everything,  I'd like

to have a "framework construction set" that contains a number of

elements that I can take or leave.
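The convention-based dispatcher described in the post could be as small as the sketch below. Only the name file_exists_in_include_path() comes from the post; its body, the `action` parameter name, and the `actions/` directory are guesses.

```php
<?php
// Named in the post; this body is an assumption about what it did.
function file_exists_in_include_path($file)
{
    return stream_resolve_include_path($file) !== false;
}

// Convention instead of configuration: ?action=show_item maps to
// actions/show_item.php. Returns null for names that are not plain identifiers.
function action_file(array $params, $default = 'index')
{
    $action = isset($params['action']) && is_string($params['action'])
        ? $params['action']
        : $default;
    // Only simple names are allowed, so the parameter cannot escape actions/.
    if (!preg_match('/\A[a-z0-9_]+\z/', $action)) {
        return null;
    }
    return 'actions/' . $action . '.php';
}

// Runs the action file if it exists; the caller can turn false into a 404.
function dispatch(array $params)
{
    $file = action_file($params);
    if ($file === null || !file_exists_in_include_path($file)) {
        return false;
    }
    require $file;
    return true;
}
```

A front script would call `dispatch($_GET)`. The whitelist on the action name matters: without it, a dispatcher that computes a filename from a request parameter is a file inclusion hole.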









Resources:

http://www.ryandaigle.com/articles/2006/06/30/whats-new-in-edge-rails-activeresource-is-here





mixins: http://www.symfony-project.com/book/trunk/17-Extending-Symfony


http://dataspill.org/pages/projects/ruby-activeldap




SPL features: Regex Iterators, SplFileObject CSV support, Caching Iterator

Data: stream support

set date.timezone ini setting automatically based on user?







'Bounce' takes the currently selected emails and sends back an email to the addresses the email(s) came from saying basically that 'the email address does not exist' in a standard internet email protocol way. Some more organised spammers remove these from their lists. After sending the bounce response, the messages are deleted.

 

* If accessed with a browser, public folder is also a personal web-site, accessible at http://username.fastmail.fm 

* Provide tool allowing synchronization of Outlook Express etc address book with FastMail contacts, possibly using LDAP 




Access Control:  Under Preferences, there is a "Grant Access" link for the calendar, addressbook, infolog, and projects.  It allows you to select Read, Add, Edit, Delete, and Private access for each group and each user.  Again, very flexible.



Categories: Multiple category selection is allowed in the addressbook, projects, calendar and infolog.



Custom Fields:  I can create custom fields.









PHP_SELF



Executive summary: PHP_SELF intentionally includes extra URL garbage (or

valuable URL variables, take your pick) tacked on by the user.  Don't use

it without knowing what it does.



Here's what you get when you hit the URL:



http://example.com/info.php/testing1?testing2 :



_SERVER["REQUEST_URI"]         /info.php/testing1?testing2

_SERVER["PHP_SELF"]    /info.php/testing1

_SERVER["SCRIPT_NAME"]         /info.php



Get it?  If you don't want that extra stuff tacked on by the user, use the

correct _SERVER variable.  If you use REQUEST_URI or PHP_SELF, be aware the

user can affect the contents of that variable.  99% of the time, you want

SCRIPT_NAME, not PHP_SELF.



By the way, here's another test:



http://example.com/info.php/testing<script>?testing :



_SERVER["REQUEST_URI"]         /info.php/testing%3Cscript%3E?testing

_SERVER["PHP_SELF"]    /info.php/testing<script>

_SERVER["SCRIPT_NAME"]         /info.php



Note that the REQUEST_URI variable, which comes from Apache, is encoded,

while the PHP_SELF variable, which comes from PHP, is not.  So PHP 5.2.0

still makes it possible to shoot yourself in the foot, and as I've pointed

out below, well-known PHP authorities actually recommend that you do so.
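A short illustration of the advice, using the values from the second test URL above. `self_form_tag()` is a hypothetical helper; escaping SCRIPT_NAME as well is an extra precaution, not something the text requires.

```php
<?php
// Builds the opening tag of a self-referencing form from a $_SERVER-like array.
// SCRIPT_NAME carries no user-supplied path info; it is escaped for HTML anyway.
function self_form_tag(array $server)
{
    $action = htmlspecialchars($server['SCRIPT_NAME'], ENT_QUOTES, 'UTF-8');
    return '<form method="post" action="' . $action . '">';
}

// Values copied from the second test URL above.
$server = [
    'REQUEST_URI' => '/info.php/testing%3Cscript%3E?testing',
    'PHP_SELF'    => '/info.php/testing<script>',
    'SCRIPT_NAME' => '/info.php',
];

echo self_form_tag($server), "\n";
// prints: <form method="post" action="/info.php">
```

Had the helper echoed PHP_SELF unescaped instead, the `<script>` from the URL would have landed in the page.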




Subject: Re: [nyphp-talk] $_SERVER['PHP_SELF'} not working?

Date: Friday 22 July 2005 12:05 pm

From: Michael Sims <jellicle@gmail.com>

To: NYPHP Talk <talk@lists.nyphp.org>



On Thursday 21 July 2005 17:16, Dan Cech wrote:

You could put:



$_SERVER['PHP_SELF'] = $_SERVER['SCRIPT_NAME'];



into one of your common include files.



Yes.  I'm afraid I don't understand this entire thread.  Apparently

because of the numerous PHP developer articles recommending it, and

because of the php.net page which for whatever reason lists it first on

the list of predefined variables, people are using PHP_SELF when they

really want SCRIPT_NAME.  SCRIPT_NAME solves all the problems mentioned

in this thread - it's just the script name, without any extra garbage

that might be tacked on by the user.  PHP_SELF explicitly includes that

extra garbage, so solutions in this thread that involve stripping the

garbage off of PHP_SELF to make it safe are really, really missing the

point - just use SCRIPT_NAME instead.  Please don't use FORM ACTION="";

according to the spec, what the browser does with that is undefined, so

even if it works in current browsers, it might not work in future ones.



People can be forgiven for making this mistake -- I'm here holding my

copy of _Learning PHP 5_, and it recommends on page 8 and again on page

86 the use of PHP_SELF for self-referencing forms, ahem -- but it's time

to put it to bed: PHP_SELF is unsafe for any usage where it is echoed

back to the page.





SESSIONS:



  I'll try to reply to this and some other people who replied to my previous message.

   I'll start with my background.  I've often been the person who the buck stops with --

somebody else develops an application that almost works (perhaps even puts it in

production) and then I have to clean up the mess.  The app might be written in PHP,  

Java,  Cold Fusion,  Perl,  you name it.  I've learned to see session variables as a "bad

smell".



   When I develop my own applications,  I use cookies for personalization and caching.  I

use the authentication system described in



   this mechanism can carry a "session id",  which in turn can be used as a key against

application state stored in a relational database.  I think through the boundary cases,  

and find that my greenfield apps behave predictably -- my only woe is that you'll

discover that browsers have a lot of undocumented behavior connected with cookies,  form

handling,  and caching.  All problems that you still need to fight with if you use

sessions,  see the comments for

usability and security:  recent studies show that 80% of web applications have serious

security problems



http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf




http://www.useit.com/


   Perhaps the top 20% of programmers can write applications with $_SESSION that don't

have serious security and usability problems,  but what about the other 80%?


(1)  Session variables are treacherous.  Odd things can happen in boundary cases,  such

as when sessions expire,  or when you are targeted by session fixation attacks.


   I've looked at many apps that use sessions that seem to be working...  Until you walk

away for two hours,  come back,  and discover that you're logged in as somebody else.  I

suppose I could have spent hours or days tracking down an intermittent problem,  which

involved some confluence of browser oddness (IE was fine,  Firefox was screwy),  the

behavior of the session system,  and crooked logic in the application.  Or I could use

cryptographically signed cookies to implement an authentication system which won't give

me surprises in the future.

other 5% right requires a deep understanding of state and statelessness on the web...  

Which is what (many) people are trying to avoid when they use $_SESSION variables.



   There are more than twenty configuration variables that affect the way sessions work

under PHP.  Incorrect configuration of any of these can cause applications to fail,  

often in intermittent ways.  The use of a custom session handler can have unpredictable

effects on security,  reliability and performance.



   Other languages are a lot worse than PHP -- the use of the "scope" concept in

languages such as Cold Fusion and Tango makes it easy to use a session variable without

realizing it...  Resulting in an application that "works" sometimes,  but fails in

mysterious ways.



(2) Session variables are bound to a particular language.  In the real world,  I work

with legacy systems that might be written in other languages.  I might have some old

pages in Cold Fusion that work just fine,  and I won't rework them in PHP until I've got

a good reason.  If users can set a customization parameter,  such as the background of a

page,  it's easy to write a cookie that all languages can read.  Applications stuck in

the session variable roach motel aren't as maintainable and portable.



(3) PHPSESSID.  Do I need to say more?  I consider the client that wants user tracking

and can't accept cookies,  so all the pages on their

site look like



http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**



   Three months later they come back and wonder why their site isn't being indexed in

Google.  Yes,  there's a saner way to use this feature,  but this "cure" to privacy

violation is worse than the cookie "disease",  since session ids will leak out through

referrers,  bookmarks,  links that people cut-and-paste...



(4) The back button.  When somebody asks a question about sessions on a forum,  they'll

usually ask another question a few days or weeks later:  "How do I disable the back

button?"





   The underlying problem is a deep aspect of the structure of the web.  There is certain

state information that's particular to a request (GET and POST variables) and certain

state information that has a more persistent scope (cookies,  session information,  a

relational database.)  The back button makes it possible for these two things to get out

of sync.

the complete state of the application in form variables.  Applications that use this

pattern always work perfectly with the back button.  This pattern doesn't always work

(hitting the back button shouldn't cancel your order on an e-commerce site),  but it

works often...  For instance,  you can use hidden variables to hold onto form variables

for complicated forms that spread over several pages,


(5) Multiple windows.  I think it's a human right to be able to have more than one window

open on a web site.  If I'm shopping,  for instance, I'd like to be able to look at two

products simultaneously.  An application that keeps state in form variables doesn't care

how many you have open.  If you're looking for jobs at an organization that uses

taleo.net's software,  you'll find that it uses trickery to prevent you from having more

than one window open...  So you can't look at two jobs at once,  or look at the job

description while you're filling out the application.  I suspect that they did this

because they don't want to spend forever debugging "race conditions" that could be caused

by a user acting in two windows simultaneously.


session for each page displayed.  This hurts the performance of pages that use

dynamically generated images and Javascript,  and can mysteriously deadlock AJAX

applications.

on particulars.  Sessions can be lightning-fast in systems that keep them in RAM,  such

as Java and Cold Fusion.  The default session handler in PHP uses files,  and is probably

faster than a relational database in a direct comparison:  however,  the session handler

will load all of the data into RAM,  whereas a relational implementation may only need to

load information when it's needed.  Keeping information in POST variables or cookies also

involves a tradeoff -- this is as scalable as it gets so far as server resources,  but

requires that the state be passed back and forth between the browser and server.  This is

no big deal if the state is 500 bytes.  It's unacceptable if the state is 500 megabytes.  

In most cases,  it starts looking expensive when we're passing an extra 10k-100k around.



I've recently been working on a legacy app that contains a query (select a subset of

items) and reporting (display user-selected fields of those items) function.  The

interface between those modules is simple:  the query system passes a comma-separated

list of item identifiers to the reporting system.  I like this,  because it meant that

one system could be changed without affecting the other.  I had to update the app so it

would work with a changed database schema,  so both sides needed some work.



I discovered that the app was passing the item list as a session variable.  This worked:  

unless I was using the application in two windows at a time.  In that case,  a query in

one window would change the report delivered in another window.  I thought about it,  and

realized that in this case,  result sets would always be under about 10k,  and usually be

around 1k.  Therefore,  it made sense to pass this as a hidden variable in the form and

ditch the session variable.



This shows the kind of problems that regularly turn up in the applications that

developers "throw over the wall" to testers and clients.  Choose a session variable,  and

your application behaves mysteriously for a user who didn't respect the "one window at a

time" assumption you made.  Passing hidden variables in forms,  on the other hand,  might

work OK when you're testing with a small data set over a LAN,  but could rapidly become a

performance nightmare for dialup users using a production database.



Performance can be improved in a number of ways:  for instance,  by delta-sigma

compressing the item list,  or creating a "form scope" variable that's keyed against a

unique identifier in the form.  Either way,  quality web applications take quality

thought.



(7) Lack of engineered application state:  Engineered Application State is the gem of

database-backed web applications.



If you keep the state of your application in a relational database,  you need to ~design~

the state of your application.  You need to ~think~ every time you add or change a table

in your relational database.  You can add a new variable to your application as easily as

typing '$'.



Desktop apps keep the application state in a tangle of pointers.  C and C++ applications

tend to contain 5 or more defects per thousand lines of code.  Errors show up in data

structures over time,  just as mutations occur in your cells.  Memory leaks,  application

hangs,  and crashes are cancers caused by these mutations.


don't accumulate errors over time.  Web application environments such as Java and Cold

Fusion that involve a long-running process regularly hang or crash and require restarts.  

When is the last time you've had to restart PHP?







$_SESSION["logged-in"]=false;



in another,  introducing unpredictable behavior and security holes.  A relational

database will give you an error if you try something like that.



-------------



Can users of $_SESSION avoid the seven deadly sins?



Yes.



In practice they don't.





Paul,

That looks like a lot of info to digest without specific examples. Is there a book or

other resource on session management that you recommend that deals with these issues in

more detail?

Thanks.

-Leo

  I'm not aware of one,  but I wish there was.  I think the question isn't so much "session management" but about how to manage state in a stateless protocol -- sessions

are one abstraction for doing that,  but other abstractions exist too.





   For instance,  there's the pattern of "Stateless Server" -- the complete state of the

application (or subsystem thereof) is kept in hidden POST and GET variables.  You accept

some limits,  but get some real benefits:  infinite scalability,  no headaches with the

back button,  no need for cookies...


form variables...  People are complaining that your app is slow.  Now you can generate a

unique id each time you draw a form ("Generated Form Scope",  for lack of a better term.)

 You can stuff your "hidden" variables into the database under this key,  and restore

them when the key comes back...  If your code is organized right (does something like

$vars=$_POST,  and only looks at $vars afterwards),  you can do this transparently to the

rest of your app.


you can at least stop people from submitting the same form more than once,  by checking

to see if a form with that unique id has been submitted before.
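A sketch of the "Generated Form Scope" idea, with arrays standing in for the database table. The class name, the `form_id` field name, and the in-memory storage are assumptions for illustration.

```php
<?php
// In-memory stand-in for a table keyed by form id; a real application would
// persist this in the database.
class FormScope
{
    private $saved = [];      // form id => stored "hidden" variables
    private $submitted = [];  // form ids that have already come back

    // Call when drawing a form: stores the variables and returns the id to
    // embed in the form as its only hidden field.
    public function save(array $vars)
    {
        $id = bin2hex(random_bytes(16));
        $this->saved[$id] = $vars;
        return $id;
    }

    // Call when the form comes back. Returns the request data merged with the
    // stored variables (stored values win on a name clash), or null for an
    // unknown id or a second submission of the same form.
    public function restore(array $post)
    {
        $id = isset($post['form_id']) && is_string($post['form_id']) ? $post['form_id'] : '';
        if (!isset($this->saved[$id]) || isset($this->submitted[$id])) {
            return null;
        }
        $this->submitted[$id] = true;
        return $this->saved[$id] + $post;
    }
}

$scope = new FormScope();
$formId = $scope->save(['items' => '1,2,3']);
$vars = $scope->restore(['form_id' => $formId, 'name' => 'x']);
```

Code that reads only `$vars`, never `$_POST` directly, does not need to know whether the values travelled through the browser or were restored from storage.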


   "Shopping Cart" is another pattern.  People often use session variables to handle

shopping carts,  but that's really not ideal from a user interface perspective...  

Ideally,  each instance of a shopping cart has its own unique id...  Imagine we want to

make an e-commerce site that behaves like amazon.com:


(1) User visits e-commerce site from a home computer -- a long-term tracking cookie gets

stuck on their browser

(2) User adds item A to their shopping cart...  A new shopping cart is created with id

#101,  associated with the tracking cookie.

(3) User adds items B,C,D, and E to their

shopping cart in the course of 30 minutes of browsing.  Each time an item is added,  we

add a row to a table in the database that links the item id to the shopping cart id.

(4) 4-year old hits reset button

(5) User comes back to e-commerce site... He's happy to find his cart is still there.  

User creates account #202 to check out.  Shopping cart #101 is associated with account

#202

(6) User checks out shopping cart.

(7) User comes back a week later,  wants to buy a few more items.  The site recognizes

who he is.  He adds two of item A and an item F to a newly created shopping cart with id

#102,  associated with user account #202.

(8) User goes to work, logs in...  The system sees that he has shopping cart #102 open.  

He adds item G,  and then checks out.

(9) User learns that he can trust this site to work correctly and becomes a loyal

customer.



   It's nice that we've got a historical record of the shopping cart after the fact,  but

there's a more important point -- we could have lost the customer's dollar at many points

in the above transaction if we were using a $_SESSION based cart.   The session wouldn't

have survived step 4,  for instance.  A good user interface isn't academic here...  It

puts money in our pocket.



   The above scenario is complex,  and it might not be fair to expect that a

first-generation shopping cart has those features.  A $_SESSION-based shopping cart would

need to be completely reworked to add the features  above.  A cart that uses a unique

"cart id" and relational back end,  will be a lot more maintainable...  You could even

start out using $_SESSION to keep track of the "cart id",  then keep it in a cookie,  

then associate it with a user name,  add the facility to promote an anonymous cart to an

authenticated cart and so on.  Starting with a good design,  we can provide the interface

that we ~want~ to provide,  not that one that our abstract layer ~forces~ us to provide.
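The cart walkthrough above can be modelled in a few lines. In this sketch arrays stand in for the relational tables, ids start at 101 only to mirror the walkthrough, and `CartStore` and its method names are invented for the example.

```php
<?php
// Arrays stand in for the cart and cart-item tables (an assumption to keep
// the sketch self-contained).
class CartStore
{
    private $carts = [];   // cart id => [tracking, account, open]
    private $items = [];   // rows of [cart id, item id]
    private $nextId = 101;

    // Finds the open cart for a tracking cookie or account, or creates one.
    public function openCart($tracking, $account = null)
    {
        foreach ($this->carts as $id => $cart) {
            $mine = $cart['tracking'] === $tracking
                || ($account !== null && $cart['account'] === $account);
            if ($cart['open'] && $mine) {
                return $id;
            }
        }
        $id = $this->nextId++;
        $this->carts[$id] = ['tracking' => $tracking, 'account' => $account, 'open' => true];
        return $id;
    }

    // One row per added item, linking the item id to the cart id.
    public function addItem($cartId, $itemId)
    {
        $this->items[] = [$cartId, $itemId];
    }

    // Promotes an anonymous cart to an authenticated one.
    public function attachAccount($cartId, $account)
    {
        $this->carts[$cartId]['account'] = $account;
    }

    // Closed carts stay in the store as the historical record.
    public function checkOut($cartId)
    {
        $this->carts[$cartId]['open'] = false;
    }

    public function items($cartId)
    {
        $found = [];
        foreach ($this->items as $row) {
            if ($row[0] === $cartId) {
                $found[] = $row[1];
            }
        }
        return $found;
    }
}
```

Because the cart is found by cookie or account rather than by session, it survives a browser restart and can be picked up from a second machine after login.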







In regards to slides 29 and 30, can you elaborate and give a more detailed

example what they are trying to say?  Are they saying that the session key

should contain a hash of the data? Or does the hash become the "salt" in

crypting the data? Finally, how does doing that make it easier to prevent

circumvention and forgeability.

  Let's take it a step at a time...  Imagine we've got a token of the following format...

$token="$user_id:$session_id";


   The session_id doesn't have to be unpredictable -- it could come from an

auto_increment column in a database table...  With the caveat that people could estimate

the usage of your site by looking at the session ids.


have users who knew how to look at or change the cookies.  An attacker who understands

cookies can easily change the user id,  or session_id.


$hash=sha1($token);

$signed_token="$hash:$token";



   We could check the integrity of the token by recomputing the hash and seeing if it

matches the one in the signed token.  This protects against accidental damage,  or very

simple attacks.  Still,  it's quite possible that an attacker could guess what you're

doing:  it wouldn't be safe at all in an open source system.



   That's where the salt comes in...  For a particular web site,  we create a random

"salt" that,  effectively,  gives us a unique hash function for our web site.


function private_hash($token) {

   global $salt;

   return sha1("$salt:$token");



}

$private_hash=sha1("$salt:$token");

$signed_token="$private_hash:$token";

somebody has logged in -- you don't need to look at the database or keep ~any~ server

side state.  This makes it a highly scalable system...  This basic approach is used on

some of the biggest sites in the world,  such as yahoo.com.
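
A minimal sketch of signing and checking such a token. Note that it swaps in HMAC-SHA256 (`hash_hmac()`) for the salted `sha1()` shown above, and compares with `hash_equals()`; the function names `sign_token()` and `verify_token()` are invented for this example, and `$secret` plays the role of the site-wide "salt".

```php
<?php
// Sketch only: HMAC-SHA256 is used here instead of sha1("$salt:$token").
// $secret is the private, site-wide value; it must never reach the client.

function sign_token(string $token, string $secret): string
{
    return hash_hmac('sha256', $token, $secret) . ':' . $token;
}

/** Return the original token if the signature checks out, else null. */
function verify_token(string $signed, string $secret): ?string
{
    // The MAC is hex and contains no colon, so split at the first colon only.
    $parts = explode(':', $signed, 2);
    if (count($parts) !== 2) {
        return null;
    }
    list($mac, $token) = $parts;
    $expected = hash_hmac('sha256', $token, $secret);
    // hash_equals() compares in constant time.
    return hash_equals($expected, $mac) ? $token : null;
}
```

Verification needs only the secret and the cookie value, which is what makes the scheme work without server-side state.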


   Nothing stops a person from saving his token and presenting it later -- after his account

may have been deactivated,  or after associated session information has been purged (an

error condition.)  An attacker that gets the person's cookie jar,  or who intercepts

network traffic,  can also steal the token.


   It's not possible to completely protect against sophisticated attacks where a hostile

party controls your network without installing complex software on both ends,  and

solving some intrinsically difficult problems having to do with mutual authentication.  

Let's just say that the developers of SSL have solved these problems,  and that you




   We can,  however,  make replay attacks a lot harder by adding a timestamp...  Now the

token looks like


create table session (

   session_id      ... session id ... primary key,

   user_id          ... user id ...,

   last_updated  ... timestamp ...,

   begin_time    ... timestamp ...,

   end_time       ... timestamp ...

);


   Now we've got two constants:

write the timestamp to the last_updated column.

EXPIRE_TIME: how old a timestamp is before we eliminate the session.
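
A small sketch of how two such constants might be used. `EXPIRE_TIME` is named in the text; `UPDATE_INTERVAL`, both numeric values, and the two helper functions are assumptions made for illustration.

```php
<?php
// Hypothetical values; the notes above do not give concrete numbers.
const UPDATE_INTERVAL = 300;   // assumed name: seconds before last_updated is rewritten
const EXPIRE_TIME     = 3600;  // seconds of inactivity before the session is eliminated

/** True if the session was last touched too long ago. */
function session_expired(int $lastUpdated, int $now): bool
{
    return ($now - $lastUpdated) > EXPIRE_TIME;
}

/** True if the token's timestamp is old enough that last_updated should be rewritten. */
function needs_touch(int $tokenTimestamp, int $now): bool
{
    return ($now - $tokenTimestamp) >= UPDATE_INTERVAL;
}
```

Keeping the update interval well below the expiry time means most requests can be checked from the token alone, with a database write only every few minutes.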


   You might think you could put the client ip address in the token,  and lock the

session to an ip address to make it harder to steal tokens.  I tried this,  but found out

that some of the largest ISPs (such as AOL) have a proxy server that makes users seem to

"jump around".  You can do it if you know people are logging in from a sane ISP,  but you

can't do it in general.


   This system can be improved in numerous ways,  such as adding anonymous sessions,  

operating in a split http/https mode,  and caching authorization information in the token.


   If you're worried about information leakage (you don't want someone to know that he

got session 88427 yesterday and 99105 today),  you can encrypt the token.  But be

careful...  It's easy to use cryptography the wrong way:  don't rely on encryption to

protect token integrity against tampering -- most of the obvious schemes don't really

work.

cookie usage:

20 per domain, 4094 characters (bytes) in the value



Page/Block object

- how to return block from driver, inherit Block methods, but also inherit Rdo_Base?

Mapper! _Mappers are the drivers_



Nag - tasks are a model

different models for different sources of tasks

so maybe horde_rdo_model isn't extension but delegate?




form helpers go into horde_view helper pack



Horde_Model:

validation:


validatesAcceptanceOf

validatesConfirmationOf



webroot has:

index.php

.htaccess

assets/ (css, images, js)

mod_rewrite rules

everything else pear-installable

make assets pear installable somehow



viewbuilder/pagebuilder - custom views

command line and web service actions (still api/method/params)



catalyst::message() - replaces logmessage - fatal, notification, observer - has a return value (?)



session object management



cms for rampage based on (replacing) ulaform + wicked + giapeto





horde_form

 - db and xml descriptions instead of just php building





reconcile


reconcile driver architecture with Rdo Models



apps provide models instead of forms?

apps provide route bundles? (if frontcontroller)

forms are models!

what do routes point to (models? mappers? views?) -> controllers

controllers handle mappers vs. models?

composite mapper? (turba, etc.)








     * browser allowing for a possible re-POST if the user clicks OK.

     * Typically this is not what you want.

     *



     *

     * {{code: php

     *     header("Cache-Control: no-store, no-cache, must-revalidate");

     *     header("Cache-Control: post-check=0, pre-check=0", false);

     *     header("Pragma: no-cache");

     * }}

     * @param int|string $code The HTTP status code to redirect with; default

     * is '303 See Other'.


     * @return void

     *

     */

    protected function _redirectNoCache($spec, $code = 303)

    {

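        // reset cache-control

        $this->_response->setHeader(

            'Cache-Control',

            'no-store, no-cache, must-revalidate'

        );



        // append cache-control

        $this->_response->setHeader(

            'Cache-Control',

            'post-check=0, pre-check=0',

            false

        );



        // reset pragma header

        $this->_response->setHeader('Pragma', 'no-cache');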

        // continue with redirection

        return $this->_redirect($spec, $code);

    }


apps provide route bundles

apps provide controllers





seekable iterators?

use of ArrayIterator

adding LimitIterators and FilterIterators on top of Rdo
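
To make the iterator idea concrete, here is a sketch using only SPL classes. Plain arrays stand in for whatever rows Rdo would return, so nothing here reflects the real Rdo API.

```php
<?php
// Plain arrays stand in for rows that an Rdo query might return.
$rows = new ArrayIterator([
    ['id' => 1, 'done' => false],
    ['id' => 2, 'done' => true],
    ['id' => 3, 'done' => false],
    ['id' => 4, 'done' => false],
]);

// Keep only unfinished rows ...
$open = new CallbackFilterIterator($rows, function (array $row): bool {
    return !$row['done'];
});

// ... then skip the first match and take at most two.
$page = new LimitIterator($open, 1, 2);

$ids = [];
foreach ($page as $row) {
    $ids[] = $row['id'];
}
```

`ArrayIterator` already implements `SeekableIterator`, so `$rows->seek($n)` is available on the unfiltered set.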


# Just add water: give me my prototype now!

# Don't make me think: I can do this stuff even on my dumbest days.

# DRY: making the same change 50 times is not cool.

# Anti-pasta: help me avoid spaghetti

# Security: no nasty surprises please. Help me get this right first time.

# Testing: help me protect myself against myself.



   1. centralized control over page rights and access

   2. ability to remap urls due to changes in web-site structure

   3. handling 404-errors intelligently

   4. ability to dynamically add headers and footers to pages for displaying alerts such as "system going down at 5pm"

   5. separates content from presentation in a reasonable manner, eg. with templates

   6. managing tainted data (eg. POSTS, GETS, COOKIES) 





Please pardon the provocative title, but this post is intended to

surface one point I buried in yesterday's presentation in the hopes

that by making it a separate post it will attract a wider audience.



I intend for this post to be constructive, so I will focus on two

specific suggestions which hopefully will serve as the seed for the

development of a set of best practices for AJAX.  Here are the two

humble suggestions on things that people should standardize on:

    * the data should first be encoded as octets according to the

      UTF-8 character encoding


    * GET should never be used to initiate another operation which

      will change state


    "Iñtërnâtiônàlizætiøn" into your tool

    and observe what comes out the other side.

When expressed as a part of the query component of a URI, it should

look like I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n.



Standardizing improves interoperability, and the reason why I am

suggesting UTF-8 is that it is backwards compatible with ASCII, can

express the full range of the Unicode character set, and is widely

implemented.
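
A quick way to check the encoding from PHP, as a sketch. The test string is written with `\u{...}` escapes (PHP 7+) so that the result does not depend on how this page or the source file is encoded.

```php
<?php
// The string is spelled with \u{...} escapes so the example does not depend
// on the encoding of this file; PHP emits them as UTF-8 bytes.
$text = "I\u{F1}t\u{EB}rn\u{E2}ti\u{F4}n\u{E0}liz\u{E6}ti\u{F8}n";

// rawurlencode() percent-encodes each octet of the UTF-8 string.
$encoded = rawurlencode($text);
```

If a toolkit produces anything else for this input (Latin-1 octets such as `%F1`, for instance), it is not encoding as UTF-8 first.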



Idempotency



Looking into the current PHP implementation of SAJAX, you will see the

following:



// Bust cache in the head

header ("Expires: Mon, 26 Jul 1997 05:00:00 GMT");    // Date in the past

header ("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");

               // always modified

header ("Cache-Control: no-cache, must-revalidate");  // HTTP/1.1

header ("Pragma: no-cache");                          // HTTP/1.0





This code should be a rather large clue that you are probably doing

something wrong.  Apparently the author recognized that these headers

are somewhat sporadically and inconsistently implemented, and hoped

that by combining them that the chances of success would be improved.



The danger that the responses may be cached is actually the smaller of

several concerns.  A much bigger concern is that unsuspecting

grandmothers and bots everywhere can be tricked into modifying online

databases simply by following a link.



Judicious use of HTTP GET can be a very good thing.  Perhaps toolkits

can adopt a convention that procedure names that start with the

characters "Get" use GET, everything else uses POST.
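
The naming convention suggested above could look roughly like this. Both function names are hypothetical, and the server-side guard is an addition of this sketch rather than part of the original suggestion.

```php
<?php
/** Pick the HTTP method for a remote procedure by its name. */
function http_method_for(string $procedure): string
{
    return strncmp($procedure, 'Get', 3) === 0 ? 'GET' : 'POST';
}

/** Server-side guard: refuse state-changing procedures that arrive via GET. */
function is_request_allowed(string $procedure, string $requestMethod): bool
{
    return http_method_for($procedure) === 'GET'
        || strtoupper($requestMethod) === 'POST';
}
```

With the guard in place, a link followed by a bot or a prefetcher can read data but cannot modify it.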







meta tags to include:



<meta name="MSSmartTagsPreventParsing" content="true" />

<meta http-equiv="imagetoolbar" content="no" />


PHP_SELF

SERVER_NAME

Referer - never depend on it

passwords - don't use just md5, add a salt.
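
For the password note, a sketch using PHP's built-in `password_hash()` and `password_verify()` (PHP 5.5+) rather than md5 plus a hand-rolled salt; the example password is obviously made up.

```php
<?php
// password_hash() generates a random salt itself and embeds it in the result.
$stored = password_hash('correct horse', PASSWORD_DEFAULT);

// password_verify() re-derives the hash using the salt stored in $stored.
$ok  = password_verify('correct horse', $stored);
$bad = password_verify('wrong guess', $stored);
```

Only `$stored` goes into the database; no separate salt column is needed.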




Edge cases: $_SESSION, backend databases. If you don't consider it

input, then it's part of your application for security purposes.


Never display credit card info - this means it shouldn't be

repopulated!


Filtering is _inspection_, not correction. Don't try to correct

invalid data. Casts are relatively safe but still miss simplistic

attacks.



When possible, whitelist - prove data valid. Simple list of values, or

a regexp. Everything else is bad.

Need a model for making the filtered data clearly available, and don't

touch the tainted data.
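
One possible shape for such a model, as a sketch: a hypothetical `whitelist_filter()` that copies only proven-valid values into a separate clean array and never modifies the tainted input.

```php
<?php
/**
 * Copy only the fields that pass their whitelist check into a clean array.
 * Invalid values are rejected, never "corrected"; $tainted is left untouched.
 *
 * Each rule is either a list of allowed values or a regular expression.
 */
function whitelist_filter(array $tainted, array $rules): array
{
    $clean = [];
    foreach ($rules as $field => $rule) {
        if (!isset($tainted[$field]) || !is_string($tainted[$field])) {
            continue;
        }
        $value = $tainted[$field];
        $valid = is_array($rule)
            ? in_array($value, $rule, true)      // simple list of values
            : preg_match($rule, $value) === 1;   // or a regexp
        if ($valid) {
            $clean[$field] = $value;
        }
    }
    return $clean;
}
```

Fields without a rule never reach `$clean`, so anything the application forgot to whitelist is treated as bad by default.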



HTML, javascript, cli output, session data, rss feeds, XML, etc. Any

remote destination.



Need a clever way to integrate this into the template system! Perhaps

a content-type on variables (too much?) of text/html, text/plain,

text/xml, etc.? How about instead of tag, have text:foo, html:foo,

xml:foo? Or <tag:foo type="html">, defaulting to type="text".



Escaping MUST be charset aware. Data escaped for us-ascii might result

in JavaScript in Japanese (not necessarily a valid example).
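
A sketch of the "type on variables" idea with charset-aware escaping. `escape_html()` and `render_var()` are hypothetical helpers, and treating type "html" as already-trusted markup is an assumption of this example.

```php
<?php
/** Escape a value for HTML output, naming the charset explicitly. */
function escape_html(string $value, string $charset = 'UTF-8'): string
{
    return htmlspecialchars($value, ENT_QUOTES, $charset);
}

/** Render a template variable; type defaults to "text", which is escaped. */
function render_var(string $value, string $type = 'text'): string
{
    // 'html' means the value is already trusted markup; everything else is escaped.
    return $type === 'html' ? $value : escape_html($value);
}
```

Defaulting to "text" means forgetting the type attribute yields over-escaped output rather than an injection hole.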







display_errors - write a custom error handler, handle errors elegantly

& integrated with Log object.
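
A sketch of a custom error handler that hands errors to a logger instead of displaying them. The `$logger` callable is a stand-in for the Log object, and `install_error_logger()` is a made-up name.

```php
<?php
/**
 * Install an error handler that passes every PHP error to $logger
 * (a hypothetical stand-in for the Log object) instead of displaying it.
 */
function install_error_logger(callable $logger): void
{
    set_error_handler(
        function (int $errno, string $errstr, string $errfile = '', int $errline = 0) use ($logger): bool {
            $logger(sprintf('[%d] %s in %s:%d', $errno, $errstr, $errfile, $errline));
            return true; // handled: skip PHP's built-in error output
        }
    );
}
```

In production this would sit alongside `display_errors = Off`, so nothing leaks to the browser even for errors the handler never sees (fatal errors, for instance).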


http://phpsec.org/

http://brainbulb.com/

http://shiflett.org/

http://md5.rednoize.com/

http://www.midgard-project.org/updates/2003-05-29-000.html





* Standardized URL-to-object mapping

* Standardized object-to-application mapping

* Standardized navigational system

* Standardized object extensibility API

* Standardized way to make application output configurable


So, MidCOM is about standardizing how to build Midgard applications

and site features. Let's look at each of the points in more detail.

Before MidCOM, Midgard site and application developers had to

figure out how to map URL requests into Midgard objects, typically to

topics and articles. Everybody has rolled their own solution for this,

using object names, IDs or GUIDs as the identifiers, and using either

GET parameters or active page arguments.


With MidCOM, application development no longer has to start by

writing a URL parser, as the MidCOM system provides this already. URL

parsing happens completely in topic and article space, using object

names as the identifiers. This makes for very clean URLs. Consider the

following:

"spring-2003" under topic "gallery". Clean, pronounceable and easy to

use. And even better, any Midgard object instantiated using a MidCOM

component is aware of its location, providing the URL through MidCOM's

metadata API.



In addition to connecting URLs to Midgard objects, URLs also need to

be connected to specific applications, or in MidCOM terms, components.



All topics in MidCOM are assigned to be managed by a component. This

means that different parts of the site can work in different ways. For

example, URL:

article "midgard-tutorial" to be handled and displayed by it.



The newsticker component can fully control the administrative

interface for managing content under it, and the output provided by

URLs it manages.


A component is selected for each topic separately. This means that

the example URL:



/news/contacts/bergius.html



could be handled by an "employee directory" component.



Standardized navigational system



Each MidCOM component provides all navigational information about

objects managed by it to a system called NAP, which is accessible by

an easy object-oriented API.



The NAP system means that site developers don't have to worry about different

components or object types when writing the site's navigational

interface. You can write one script for generating the whole site

navigation, and it will work with the site and any component under it.



This makes standardized navigational tools like breadcrumbs or the

NemeinNavBar utility much more useful, as they can be used with any

MidCOM-based site. I expect that in the near future site developers will

have a huge library of prebuilt navigational systems to select from.



Standardized object extensibility API



Enabling content managers to define their own object types or metadata

fields has always been a problem with Midgard, meaning that any new

metadata field has forced site developers to write their own content

creation UIs.



MidCOM provides an easier system for this called datamanager. With

datamanager, site developers can define their own custom data

structures, called "layouts". Layouts are PHP arrays telling

datamanager what fields to allow for objects handled for that

component, how to present those fields in an administrative interface,

and where to store them (parameters, object fields or attachments).



Using datamanager, component writers don't really have to care about

what object fields site developers will want to use; they just need to

use the datamanager utility. Data structure "layouts" can be provided

as part of the default component configuration, and can be overridden

on a per-sitegroup basis.


interface, providing customized editing forms for all components based

on widgets defined in the "layouts" configuration. The widgets can be

anything from text input boxes to a WYSIWYG editor or image upload

system.



Standardized way to make application output configurable



The MidCOM specification requires that all application output is

handled through the MidCOM style system. MidCOM's style engine is an

extension of the Midgard style engine, allowing component outputs to

be configured using style elements, but also for fallback elements to

be provided as snippets.



This means that output of any MidCOM component will be fully

configurable by site developers using the familiar Midgard style

engine. The style to be used can be defined separately for each topic,

allowing for different output styles from the same components on a

per-site-area basis.



Because components can be loaded dynamically to a Midgard page, site

developers can have different parts of the same page use different

styles, making administration of the style elements much easier.



Conclusions



MidCOM brings into Midgard something that has been lacking so far: a

"write once and run everywhere" framework for building site

components, styles and navigational tools.



This promotes component sharing and code reuse, both within a single

Midgard solution provider company, and within the international Open

Source community.



So far Midgard has provided a nice content management framework, but

actual sites have needed to be built from scratch. MidCOM promises to

change that, making Midgard much easier to implement.



Of course, sloppy coding is still possible with MidCOM, but if

component writers adhere to the MidCOM specification, PEAR coding

standards and use NemeinLocalization for internationalizing their

components, we should achieve global reusability.



I invite all Midgard developers to seriously study and consider MidCOM

for their projects. There is a learning curve, but real code

reusability should repay that very quickly.





The Midgard Framework is a powerful toolkit for managing online

information. Writing applications and functionalities to the platform

is done using the easy-to-learn PHP scripting language. All

interfacing with the system is done via a regular Web browser, and no

special tools are needed for developers or content authors.



Main features of Midgard Framework include:



    * Easy and well documented Application Programming Interface (API)

    * Efficient management of Web content using a hierarchical topic system

    * Separation of layout, content and site logic

    * Support for editorial workflow and approval mechanisms

    * Attachment of metadata to all content objects

    * Management of PIM data including contacts and calendaring information

    * Multilingual support (including Unicode) and localization

    * Replication for clustered setups and staging

    * Multi-company support using virtual databases

    * Flexible user and group management 


Midgard works on most common UNIX platforms, including Linux, FreeBSD

and Solaris. Prebuilt binary packages are available for some Linux

platforms (including Red Hat, Debian and Mandrake), and the system can

be installed from sources to most other environments.


For other environments, including hosted servers and Windows systems,

there is the pure-PHP implementation, Midgard Lite.


The Midgard Application Server is free software developed

internationally with the Open Source model and distributed under the

GNU licenses. Commercial support, applications and services for the

platform are available from a range of companies worldwide.


The PHPmole toolkit provides Midgard developers with a

freely-available Integrated Development Environment (IDE) comparable

to DreamWeaver and MS Visual Studio, with additional content

management functionalities.



With the Midgard CMS package, the ease-of-use of productivity software

and office suites can be brought to Midgard content management.




query building:

// Instantiate the Query Builder for seeking MidgardArticles

$query = new MidgardQueryBuilder("MidgardArticle");

// List articles only from specific topic

$query->addConstraint("topic", "=", $topic->id);


// List only articles that have been approved since some timestamp

$query->addConstraint("approved", ">", $starting_time);



// Order the articles based on their approval time

$query->addOrder("approved", "DESC");

$query->setLimit(20);



// Start from the Nth page of this article list (cast the tainted request value to int)

$query->setOffset((int) $_REQUEST["startfrom"]);



// Execute the query returning an array of matching MidgardArticle objects

// The MidgardArticles are the full article objects with all regular methods

$articles = $query->execute();



if (!$articles)

{

    // Handle error

}



// And then display your articles

print_r($articles);

?>



Query Builder in action

Thanks to Jukka's efforts, we already have a working MidgardQueryBuilder.



Let's start with a simple example.



/* Define which MgdSchema type should be used and returned by QB */

$qb = new midgardquerybuilder("NewMidgardArticle");



/* Define constraints */

$qb->addConstraint("topic", "<", 2);

$qb->addConstraint("title", "=", "News");



/* Execute SQL query and return array*/

$f = $qb->execute();



MySQL query executed:



SELECT article.id FROM article_i,article 

WHERE 

article.topic < 2 AND article_i.title = 'News' 

AND article.id=article_i.sid





As you can see, the title property is defined in the article_i table, while the topic property is defined in the article table.

Query Builder follows the class's table definitions and is able to search for objects that use more than one table as storage.

$qb->execute(); returned an array with only one object (because the SELECT returned a single record), so



print_r($f[0]);





 NewMidgardArticle Object

        (

             [sitegroup] => 0

            [author] => 0

            [owner] => 0

            [realm] => article

            [guid] => cedda8cb461c9f846c73f043aaf888e9

            [changed] => 

            [updated] => 

            [action] => create

            [errno] => 0

            [errstr] => 

            [id] => 28

            [calstart] => 0000-00-00



etc etc etc



Let's try to use datetime fields:





$qb = new midgardquerybuilder("NewMidgardArticle");

$qb->addConstraint("revised", ">", "2003-04-30 09:46:00");

$f = $qb->execute();



MySQL query executed:



SELECT article.id FROM article_i,article 

WHERE

article.revised > '2003-04-30 09:46:00' 

AND article.id=article_i.sid





Now $qb->execute() returned an array with 5 objects. I do not want to

print them all, so let's check whether the revised properties were selected

correctly:



print_r($f);



Array(



    [0] => NewMidgardArticle Object

       

$query->setOffset($_REQUEST["startfrom"