[[toc]]

+ Chuck's Horde 4 Thoughts

++ Overall Design

* http://www.25hoursaday.com/weblog/2008/08/04/AvoidingTheSecondSystemEffectInSoftwareDevelopment.aspx

++ Controllers

Horde_Controller: turn Horde_Controller_Dispatcher into Horde_FrontController_Http or similar, then add other front controllers for Soap, !XmlRpc, !JsonRpc, Cli, ...

++ User Interface

* Get rid of popups. Any new-window functionality should be in an Ajax overlay or a full new browser window.

++ API design

* http://www.lornajane.net/posts/2009/Error-Feedback-for-Web-Services
* http://www.lornajane.net/posts/2009/Status-Codes-for-Web-Services
* http://www.lornajane.net/posts/2009/Version-Parameters-for-Web-Services

++ Documentation

+++ Developer docs (PHPDoc alternatives)

* http://ajaxian.com/archives/beautiful-code-documentation
* http://sphinx.pocoo.org/

++ Debug support

* http://code.google.com/p/webgrind/
* http://www.sitepoint.com/blogs/2008/05/13/useful-in-browser-development-tools-for-php/
* http://badapi.trib.tv/
* http://bergie.iki.fi/blog/sql-level_debugging_with_midgard.html
* http://code.google.com/p/formaldehyde/
* Debug "wrapper" drivers: encapsulate another driver and delegate all calls to it, but provide before/after hooks for any function, along with timing, profiling, reporting of calls and arguments, etc.

++ URLs

* We should sign (with a timestamp and HMAC, per Horde::signQueryString) all URLs that perform destructive actions.

++ Configuration

* Use Horde_Policy.
* Allow conf.d directory styles, like Apache2 config (see http://bugs.horde.org/ticket/4747).
* Use return $... in PHP config files to avoid defining local-scope variables? (http://www.urdalen.com/blog/?p=257)

++ Permissions

* http://blog.wolff-hamburg.de/archives/25-A-pragmatic-approach-to-rights-management.html
* http://stonean.com/wiki/lockdown

++ Profiling

Lots of overlap with debugging.

* http://docs.kohanaphp.com/libraries/profiler
* See Zend_Db_Profiler; also consider a Cache profiler, with Firebug plugins for both.

++ Testing

* http://www.phpunit.de/pocket_guide/3.2/en/database.html
* http://sebastian-bergmann.de/archives/702-Data-Providers-in-PHPUnit-3.2.html
* http://www.xaprb.com/blog/2008/08/19/how-to-unit-test-code-that-interacts-with-a-database/
* http://mikenaberezny.com/2008/10/17/php-temporary-streams/

++ Error handling

* http://derickrethans.nl/five_reasons_why_the_shutop_operator_@_should_be_avoided.php
* http://eirikhoem.wordpress.com/2008/03/15/dying-with-grace-phps-register_shutdown_function/

++ Jabber/XMPP support

* http://www.danga.com/mogilefs/
* http://hadoop.apache.org/core/
* http://hadoop.apache.org/core/docs/r0.16.4/hdfs_design.html
* http://en.wikipedia.org/wiki/Hadoop
* http://wiki.apache.org/hadoop/ProjectDescription
* http://code.google.com/p/the-cassandra-project/

++ Object instantiation (sometimes known as dependency injection)

* http://bergie.iki.fi/blog/midcom_3_and_context_injectors.html
* http://www.sitepoint.com/blogs/2008/02/04/dealing-with-dependencies/
* http://usrportage.de/archives/897-Antipattern-the-verbose-constructor.html
* http://usrportage.de/archives/904-Antipattern-chaining-stateless-protocol-requests.html

<code>
horde_ctx::getQueue
$horde-> (no, no global variable)
Horde::$queue (__getStatic?)
Horde::queue() (__callStatic, include extra $args)
Horde_Factory::make(...)
::makeQueue() (using __callStatic to introspect)
Horde_Builder
</code>

++ Idea sources

Don't get too caught up in everything everyone else is doing. However, some things that might be useful food for thought are listed below. Projects may be added to or discarded from this list quickly as they are synthesized.
* http://kohanaphp.com/home.html
* http://code.whytheluckystiff.net/camping
* http://api.rubyonrails.com/
* http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html
* http://www.xml.lt/Resources/Framework
* http://cognifty.com/
* http://docs.kohanaphp.com/
* http://merbivore.com/documentation.html
* http://weblog.rubyonrails.org/2009/2/1/rails-2-3-0-rc1-templates-engines-rack-metal-much-more
* http://static.repoze.org/bfgdocs/
* http://oddments.org/?p=78
* http://www.brandonsavage.net/use-registry-to-remember-objects-so-you-dont-have-to/

++ Horde 4 Administration

* http://bergie.iki.fi/blog/asgard_welcome_page_just_got_useful.html

++ Package structure

* http://www.apsstandard.com/doc

++ Unsorted

* http://incubator.apache.org/thrift/
* http://www.gearmanproject.org/
* http://codeigniter.com/wiki/Modular_Extensions_-_HMVC/
* http://ojay.othermedia.org/articles/keyboard.html
* http://www.lkozma.net/autocomplete.html
* http://writer.bighugelabs.com/
* http://www.spread.org/
* http://www.backhand.org/wackamole/
* Protocol-independent URLs: http://nedbatchelder.com/blog/200710.html#e20071017T215538

<code>
// Return a 304 if the file hasn't been modified since the If-Modified-Since date:
// no point in resending all the data if the browser already has it cached.
// $_SERVER works under every SAPI, unlike apache_request_headers(), whose keys
// also keep whatever capitalization the client sent.
// $serve_data['modified_time'] is the resource's modification time (Unix timestamp).
if (!empty($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $ims = strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']);
    // strtotime() returns false for an unparseable date; resend in that case.
    if ($ims !== false && $ims >= $serve_data['modified_time']) {
        http_response_code(304);
        exit(0);
    }
}
</code>

I just went through my first signup process that required an SMS-capable device for confirmation. It also didn't make me pick my credit card type, and instead used my country code (+1) to decide on a card detection algorithm.
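The 304 snippet above only pays off if the browser has a date to send back, so the first response also needs a Last-Modified header. Below is a minimal sketch. It assumes the resource's modification time is available as a Unix timestamp. lastModifiedHeader() and isNotModified() are hypothetical helper names, not Horde API.

```php
<?php
// Hypothetical helpers, not part of Horde: format a Unix timestamp as an
// HTTP-date and decide whether a conditional GET can be answered with 304.

function lastModifiedHeader(int $mtime): string
{
    // HTTP dates are always expressed in GMT.
    return 'Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT';
}

function isNotModified(?string $ifModifiedSince, int $mtime): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false;
    }
    $ims = strtotime($ifModifiedSince);
    // strtotime() returns false for unparseable dates; treat those as modified.
    return $ims !== false && $ims >= $mtime;
}

// Usage in a page script (skipped when run from the command line):
if (PHP_SAPI !== 'cli') {
    $mtime = filemtime(__FILE__);
    header(lastModifiedHeader($mtime));
    if (isNotModified($_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, $mtime)) {
        http_response_code(304);
        exit(0);
    }
}
```

Keeping the date logic in small functions makes it testable from the command line. The header() calls only run under a web SAPI.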
<code>
skip-external-locking
skip-thread-priority
key_buffer = 64M
max_connections = 1024
max_connect_errors = 1000
max_allowed_packet = 8M
table_cache = 512
sort_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 64M
thread_cache_size = 50
query_cache_size = 128M
tmp_table_size = 1024M
thread_concurrency = 12
wait_timeout = 60
interactive_timeout = 60
log_slow_queries
</code>

* index.php - global dispatcher
* How to do themes/custom templates? Chain local -> app -> horde?

A Horde 4 installation:

<code>
config/
lib/
apps/
public/  <- with app/ subdirs containing images, etc.
</code>

Everything routable goes in apps/:

<code>
apps/
    login/
    help/
    prefs/
    admin/
    etc...
</code>

* The app name is the first part of the route, e.g. /login
* Subdomain support
* Route aliases - prefer PHP over XML
* Merge Rdo and Mad into Horde_Db
* Use subpackages or multiple *.xml files for packages to avoid silliness?
* Should have parallel web and CLI configuration and installation/update tools; the web tools require the webserver to have write access to a config/ dir and to public/; the CLI tools do not (if run as another user)
* Horde 4 app - a Horde 3.x app updated for PHP 5 and to use the latest libraries
* Rampage app - "RAD" (rapid application development) MVC app that uses Horde 4
* /horde/page/ -> dispatcher for Rampage modules with views (overridable), routes, controllers, etc.?
* Have generic views for rampage_login, rampage_admin_*, etc.
* Configuration: config/routes.php plus config/routes_local.php -> do this for all config files
* Horde_Content_Index -> Horde-wide search

Random Horde Ideas:

* Mini-CMS for building your own sidebar/menu/etc.? - shortcuts to any bit of Horde
* Labels, labels, labels
* Keywords also, or just labels? Probably just flexible labels
* "Smart folders"
* Getting Things Done support? (other apps that do it: Tracks, Kinkless GTD, Midnight Inbox)
* Make Mnemo into more of a snippet keeper? Sort of like a personal CMS, or wiki.
* Carry the encryption feature through to other kinds of content
* Create an outliner!
* Tags/labels for mail
* Rename virtual folders to smart folders? Too Apple?
* Freetext boolean mail searches:

<code>
apples & oranges
apples | oranges
apples ! oranges          (apples but not oranges)
apples & (oranges | lemons)
</code>

Event-driven apps: "Understanding and implementing this event model can free your application from the constraints of defined elements. For example, instead of applying an event listener for each link in a menu, you can assign a single listener to the menu item itself and retrieve the event target. That way you don't need to change your script when the menu gets larger or when links get removed from it." http://yuiblog.com/blog/2007/01/17/event-plan/

* Tagging/instant hierarchies as specialized permission-based search
* RBAC
* What is Horde? Groupware? Horde data services? Horde data access? UI layers
* Be the PHP Dojo framework? Or the PHP YUI framework? See http://tigermouse.epsi.pl/ ?
* Or, don't do desktop-like widgets? See UI design bookmarks
* Try to rely only on thread-safe extensions?
* Reduce the dependency tree
* Avoid globals and non-Horde-namespaced functions/methods in framework and core app code
* Class-based registry APIs
* against edge cases: http://www.bakesalehq.com/contents/show/12/
* Features from Prado? http://www.urdalen.com/blog/?p=198
* Use functions where appropriate for shortcuts/helpers, like Mike's t("translated string") function? But it would be horde_t()? It would call the configured translation system.
* Helper sets for Dojo, Protaculous, YUI: simple functions like dojo_editor(), dojo_pane(), yui_map(), etc. Load with something like Horde/Layout/Helpers/YUI.php, etc. See http://www.ngcoders.com/projax/
* Horde as a set of apps and a methodology needs to pick a JS lib, pick a template methodology, etc. - this is Rampage
* Horde as a framework can allow for flexibility

To make it even better, separate the control logic from the presentation. That way, back could be reverse, etc.
I do this in all my forms since application logic and presentation "word play" are two distinct things to me. This is what I use:

<code>
<form method="post" action="form.php">
<input name="submit[back]" value="reverse" type="submit" />
<input name="submit[next]" value="speed ahead" type="submit" />
<input name="submit[home]" value="no place like home" type="submit" />
</form>
</code>

Then you can have a simple routine that captures submit actions regardless of the presentation value: check that the submit array has exactly one element, and whitelist its key against the acceptable values. A multi-row table can expand upon the theme by using submit[edit_3], submit[delete_3], submit[edit_5], etc.

* Caching: make sure Rdo and other services allow dropping in caching rules
* http://sebastian-bergmann.de/pages/talks.html
* PHPUnit - @test markup in methods
* PHPUnit + Selenium
* CruiseControl?

Really hope Google will integrate any product of theirs with any other product of theirs? Receive an email, transform it into a document, add a spreadsheet, add notes, add bookmarks saved from search history and a link to an event in the calendar, anyone?

From nyphp-talk:

The other day I had to get an application started in a hurry. It's doing something useful at < 700 lines, but I'm considering options that could grow it out to about 10 times that. It depends on a "core library" that's < 500 lines. This library deals with common issues in string handling, parameter handling, and HTML form generation.

About 10% of the application, or 70 lines, is a microframework that's loosely built on Struts. About 20 of those lines are in 2 functions which would be generally useful for microframeworks (such as file_exists_in_include_path()).

Like Struts, the microframework chooses an "action" based on form parameters: the action then chooses a "view" -- a "view" is basically a template that a designer can edit, which can be supplemented by an optional "query" which pulls stuff out of the database.
Like Ruby-on-Rails, the microframework uses convention instead of configuration: the dispatcher computes an "action name" based on query parameters, and uses that to compute a filename... It checks that the file exists and executes it with require.

The microframework uses no object-oriented techniques. That's not because I have any antipathy to OO, but because I didn't need it, and I like writing my actions, queries, and views in a style that "feels like PHP".

Yes, my microframework is nowhere near as powerful as CakePHP or Symfony. Yet it's more flexible, because I can codesign it with my application. Because it's so simple, I can easily adapt it to do what I want. If I decide I really hate it, I can write a new one in an hour. I'm an expert on it, because I developed it, and I wouldn't have to take on the technical, social and emotional burdens of "forking" an open-source codebase if I wanted to make a change in direction.

I'm moving towards a vision of web app architecture where we move towards shared vocabulary and standardized interfaces. Rather than working with a "comprehensive framework" that does everything, I'd like to have a "framework construction set" that contains a number of elements that I can take or leave.

* Resources: http://www.ryandaigle.com/articles/2006/06/30/whats-new-in-edge-rails-activeresource-is-here
* Mixins: http://www.symfony-project.com/book/trunk/17-Extending-Symfony
* http://dataspill.org/pages/projects/ruby-activeldap
* SPL features: Regex Iterators, SplFileObject CSV support, Caching Iterator
* data: stream support
* Set the date.timezone ini setting automatically based on the user?

"'Bounce' takes the currently selected emails and sends back an email to the addresses the email(s) came from saying basically that 'the email address does not exist' in a standard internet email protocol way. Some more organised spammers remove these from their lists. After sending the bounce response, the messages are deleted."
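For the date.timezone question above, one option is to set the zone per request with date_default_timezone_set() instead of changing php.ini. Below is a sketch. It assumes the user's preferred zone has already been loaded from preferences as a string, and it falls back to UTC (also an assumption). applyUserTimezone() is a hypothetical name.

```php
<?php
// Hypothetical helper: apply the user's preferred zone for this request
// instead of relying on the server-wide date.timezone ini setting.

function applyUserTimezone(?string $userTz, string $fallback = 'UTC'): string
{
    // Accept only identifiers PHP knows. listIdentifiers() defaults to
    // DateTimeZone::ALL, which leaves out legacy aliases such as "US/Eastern".
    $valid = $userTz !== null
        && in_array($userTz, DateTimeZone::listIdentifiers(), true);
    $tz = $valid ? $userTz : $fallback;
    date_default_timezone_set($tz);
    return $tz;
}

// e.g. with the zone loaded from the user's preferences:
applyUserTimezone('Europe/Berlin');
```

If legacy aliases must be accepted, DateTimeZone::listIdentifiers(DateTimeZone::ALL_WITH_BC) includes them.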
* If accessed with a browser, the public folder is also a personal web site, accessible at http://username.fastmail.fm
* Provide a tool allowing synchronization of Outlook Express (etc.) address books with FastMail contacts, possibly using LDAP

Access Control: Under Preferences, there is a "Grant Access" link for the calendar, addressbook, infolog, and projects. It allows you to select Read, Add, Edit, Delete, and Private access for each group and each user. Again, very flexible.

Categories: Multiple category selection is allowed in the addressbook, projects, calendar and infolog.

Custom Fields: I can create custom fields.

PHP_SELF

Executive summary: PHP_SELF intentionally includes extra URL garbage (or valuable URL variables, take your pick) tacked on by the user. Don't use it without knowing what it does.

Here's what you get when you hit the URL http://example.com/info.php/testing1?testing2 :

<code>
_SERVER["REQUEST_URI"]   /info.php/testing1?testing2
_SERVER["PHP_SELF"]      /info.php/testing1
_SERVER["SCRIPT_NAME"]   /info.php
</code>

Get it? If you don't want that extra stuff tacked on by the user, use the correct _SERVER variable. If you use REQUEST_URI or PHP_SELF, be aware the user can affect the contents of that variable. 99% of the time, you want SCRIPT_NAME, not PHP_SELF.

By the way, here's another test, http://example.com/info.php/testing<script>?testing :

<code>
_SERVER["REQUEST_URI"]   /info.php/testing%3Cscript%3E?testing
_SERVER["PHP_SELF"]      /info.php/testing<script>
_SERVER["SCRIPT_NAME"]   /info.php
</code>

Note that the REQUEST_URI variable, which comes from Apache, is encoded, while the PHP_SELF variable, which comes from PHP, is not. So PHP 5.2.0 still makes it possible to shoot yourself in the foot, and as I've pointed out below, well-known PHP authorities actually recommend that you do so.

Subject: Re: [nyphp-talk] $_SERVER['PHP_SELF'} not working?
Date: Friday 22 July 2005 12:05 pm
From: Michael Sims <jellicle@gmail.com>
To: NYPHP Talk <talk@lists.nyphp.org>

On Thursday 21 July 2005 17:16, Dan Cech wrote: You could put $_SERVER['PHP_SELF'] = $_SERVER['SCRIPT_NAME']; into one of your common include files.

Yes. I'm afraid I don't understand this entire thread. Apparently because of the numerous PHP developer articles recommending it, and because of the php.net page which for whatever reason lists it first on the list of predefined variables, people are using PHP_SELF when they really want SCRIPT_NAME. SCRIPT_NAME solves all the problems mentioned in this thread - it's just the script name, without any extra garbage that might be tacked on by the user. PHP_SELF explicitly includes that extra garbage, so solutions in this thread that involve stripping the garbage off of PHP_SELF to make it safe are really, really missing the point - just use SCRIPT_NAME instead.

Please don't use FORM ACTION=""; according to the spec, what the browser does with that is undefined, so even if it works in current browsers, it might not work in future ones.

People can be forgiven for making this mistake -- I'm here holding my copy of _Learning PHP 5_, and it recommends on page 8 and again on page 86 the use of PHP_SELF for self-referencing forms, ahem -- but it's time to put it to bed: PHP_SELF is unsafe for any usage where it is echoed back to the page.

SESSIONS:

I'll try to reply to this and some other people who replied to my previous message. I'll start with my background. I've often been the person who the buck stops with -- somebody else develops an application that almost works (perhaps even puts it in production) and then I have to clean up the mess. The app might be written in PHP, Java, Cold Fusion, Perl, you name it.

I've learned to see session variables as a "bad smell". When I develop my own applications, I use cookies for personalization and caching.
I use the authentication system described in this mechanism can carry a "session id", which in turn can be used as a key against application state stored in a relational database. I think through the boundary cases, and find that my greenfield apps behave predictably -- my only woe is that you'll discover that browsers have a lot of undocumented behavior connected with cookies, form handling, and caching.

All problems that you still need to fight with if you use sessions, see the comments for usability and security: recent studies show that 80% of web applications have serious security problems

* http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf
* http://www.useit.com/

Perhaps the top 20% of programmers can write applications with $_SESSION that don't have serious security and usability problems, but what about the other 80%?

(1) Session variables are treacherous. Odd things can happen in boundary cases, such as when sessions expire, or when you are targeted by session fixation attacks. I've looked at many apps that use sessions that seem to be working... Until you walk away for two hours, come back, and discover that you're logged in as somebody else. I suppose I could have spent hours or days tracking down an intermittent problem, which involved some confluence of browser oddness (IE was fine, Firefox was screwy), the behavior of the session system, and crooked logic in the application. Or I could use cryptographically signed cookies to implement an authentication system which won't give me surprises in the future.

other 5% right requires a deep understanding of state and statelessness on the web... Which is what (many) people are trying to avoid when they use $_SESSION variables. There are more than twenty configuration variables that affect the way sessions work under PHP. Incorrect configuration of any of these can cause applications to fail, often in intermittent ways.
The use of a custom session handler can have unpredictable effects on security, reliability and performance. Other languages are a lot worse than PHP -- the use of the "scope" concept in languages such as Cold Fusion and Tango makes it easy to use a session variable without realizing it... Resulting in an application that "works" sometimes, but fails in mysterious ways.

(2) Session variables are bound to a particular language. In the real world, I work with legacy systems that might be written in other languages. I might have some old pages in Cold Fusion that work just fine, and I won't rework them in PHP until I've got a good reason. If users can set a customization parameter, such as the background of a page, it's easy to write a cookie that all languages can read. Applications stuck in the session variable roach motel aren't as maintainable and portable.

(3) PHPSESSID. Do I need to say more? I consider the client that wants user tracking and can't accept cookies, so all the pages on their site look like

http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**

Three months later they come back and wonder why their site isn't being indexed in Google. Yes, there's a saner way to use this feature, but this "cure" to privacy violation is worse than the cookie "disease", since session ids will leak out through referrers, bookmarks, links that people cut-and-paste...

(4) The back button. When somebody asks a question about sessions on a forum, they'll usually ask another question a few days or weeks later: "How do I disable the back button?" The underlying problem is a deep aspect of the structure of the web. There is certain state information that's particular to a request (GET and POST variables) and certain state information that has a more persistent scope (cookies, session information, a relational database.) The back button makes it possible for these two things to get out of sync.

the complete state of the application in form variables.
Applications that use this pattern always work perfectly with the back button. This pattern doesn't always work (hitting the back button shouldn't cancel your order on an e-commerce site), but it works often... For instance, you can use hidden variables to hold onto form variables for complicated forms that spread over several pages,

(5) Multiple windows. I think it's a human right to be able to have more than one window open on a web site. If I'm shopping, for instance, I'd like to be able to look at two products simultaneously. An application that keeps state in form variables doesn't care how many you have open. If you're looking for jobs at an organization that uses taleo.net's software, you'll find that it uses trickery to prevent you from having more than one window open... So you can't look at two jobs at once, or look at the job description while you're filling out the application. I suspect that they did this because they don't want to spend forever debugging "race conditions" that could be caused by a user acting in two windows simultaneously.

session for each page displayed. This hurts the performance of pages that use dynamically generated images and Javascript, and can mysteriously deadlock AJAX applications.

on particulars. Sessions can be lightning-fast in systems that keep them in RAM, such as Java and Cold Fusion. The default session handler in PHP uses files, and is probably faster than a relational database in a direct comparison: however, the session handler will load all of the data into RAM, whereas a relational implementation may only need to load information when it's needed. Keeping information in POST variables or cookies also involves a tradeoff -- this is as scalable as it gets so far as server resources go, but requires that the state be passed back and forth between the browser and server. This is no big deal if the state is 500 bytes. It's unacceptable if the state is 500 megabytes.
In most cases, it starts looking expensive when we're passing an extra 10k-100k around.

I've recently been working on a legacy app that contains a query (select a subset of items) and reporting (display user-selected fields of those items) function. The interface between those modules is simple: the query system passes a comma-separated list of item identifiers to the reporting system. I like this, because it meant that one system could be changed without affecting the other. I had to update the app so it would work with a changed database schema, so both sides needed some work. I discovered that the app was passing the item list as a session variable. This worked, unless I was using the application in two windows at a time. In that case, a query in one window would change the report delivered in another window. I thought about it, and realized that in this case, result sets would always be under about 10k, and usually be around 1k. Therefore, it made sense to pass this as a hidden variable in the form and ditch the session variable.

This shows the kind of problems that regularly turn up in the applications that developers "throw over the wall" to testers and clients. Choose a session variable, and your application behaves mysteriously for a user who didn't respect the "one window at a time" assumption you made. Passing hidden variables in forms, on the other hand, might work OK when you're testing with a small data set over a LAN, but could rapidly become a performance nightmare for dialup users using a production database. Performance can be improved in a number of ways: for instance, by delta-sigma compressing the item list, or creating a "form scope" variable that's keyed against a unique identifier in the form. Either way, quality web applications take quality thought.

(7) Lack of engineered application state: Engineered Application State is the gem of database-backed web applications.
If you keep the state of your application in a relational database, you need to ~design~ the state of your application. You need to ~think~ every time you add or change a table in your relational database. You can add a new variable to your application as easily as typing '$'. Desktop apps keep the application state in a tangle of pointers. C and C++ applications tend to contain 5 or more defects per thousand lines of code. Errors show up in data structures over time, just as mutations occur in your cells. Memory leaks, application hangs, and crashes are cancers caused by these mutations.

don't accumulate errors over time. Web application environments such as Java and Cold Fusion that involve a long-running process regularly hang or crash and require restarts. When is the last time you've had to restart PHP?

$_SESSION["logged-in"]=false; in another, introducing unpredictable behavior and security holes. A relational database will give you an error if you try something like that.

-------------

Can users of $_SESSION avoid the seven deadly sins? Yes. In practice, they don't.

Paul,

That looks like a lot of info to digest without specific examples. Is there a book or other resource on session management that you recommend that deals with these issues in more detail? Thanks.

-Leo

I'm not aware of one, but I wish there was. I think the question isn't so much "session management" but about how to manage state in a stateless protocol -- sessions are one abstraction for doing that, but other abstractions exist too.

For instance, there's the pattern of "Stateless Server" -- the complete state of the application (or subsystem thereof) is kept in hidden POST and GET variables. You accept some limits, but get some real benefits: infinite scalability, no headaches with the back button, no need for cookies...

form variables... People are complaining that your app is slow.
Now you can generate a unique id each time you draw a form ("Generated Form Scope", for lack of a better term.) You can stuff your "hidden" variables into the database under this key, and restore them when the key comes back... If your code is organized right (does something like $vars=$_POST, and only looks at $vars afterwards), you can do this transparently to the rest of your app.

you can at least stop people from submitting the same form more than once, by checking to see if a form with that unique id has been submitted before.

"Shopping Cart" is another pattern. People often use session variables to handle shopping carts, but that's really not ideal from a user interface perspective... Ideally, each instance of a shopping cart has its own unique id... Imagine we want to make an e-commerce site that behaves like amazon.com:

(1) User visits e-commerce site from a home computer -- a long-term tracking cookie gets stuck on their browser.
(2) User adds item A to their shopping cart... A new shopping cart is created with id #101, associated with the tracking cookie.
(3) User adds items B, C, D, and E to their shopping cart in the course of 30 minutes of browsing. Each time an item is added, we add a row to a table in the database that links the item id to the shopping cart id.
(4) 4-year-old hits reset button.
(5) User comes back to e-commerce site... He's happy to find his cart is still there. User creates account #202 to check out. Shopping cart #101 is associated with account #202.
(6) User checks out shopping cart.
(7) User comes back a week later, wants to buy a few more items. The site recognizes who he is. He adds two of item A and an item F to a newly created shopping cart with id #102, associated with user account #202.
(8) User goes to work, logs in... The system sees that he has shopping cart #102 open. He adds item G, and then checks out.
(9) User learns that he can trust this site to work correctly and becomes a loyal customer.
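The scenario above comes down to a cart id that outlives any session, plus one row per added item. Below is a minimal sketch. It assumes in-memory arrays in place of the two database tables. It also assumes carts are matched by account when the user is known and by tracking cookie otherwise. CartStore and its method names are hypothetical.

```php
<?php
// In-memory stand-in for two tables: carts(id, tracking_id, account_id, open)
// and cart_items(cart_id, item_id). CartStore is a hypothetical class.

final class CartStore
{
    /** @var array<int, array{tracking: ?string, account: ?int, open: bool}> */
    private array $carts = [];
    /** @var array<int, array{0: int, 1: string}> one row per item added */
    private array $items = [];
    private int $nextId;

    public function __construct(int $firstId = 101)
    {
        $this->nextId = $firstId;
    }

    /** Find this visitor's open cart (by account if known, else by cookie), or create one. */
    public function openCart(?string $trackingId, ?int $accountId): int
    {
        foreach ($this->carts as $id => $cart) {
            if (!$cart['open']) {
                continue;
            }
            $matches = $accountId !== null
                ? $cart['account'] === $accountId
                : ($trackingId !== null && $cart['tracking'] === $trackingId);
            if ($matches) {
                return $id;
            }
        }
        $id = $this->nextId++;
        $this->carts[$id] = ['tracking' => $trackingId, 'account' => $accountId, 'open' => true];
        return $id;
    }

    public function addItem(int $cartId, string $itemId): void
    {
        $this->items[] = [$cartId, $itemId];
    }

    /** Promote an anonymous cart to an authenticated one. */
    public function attachToAccount(int $cartId, int $accountId): void
    {
        $this->carts[$cartId]['account'] = $accountId;
    }

    /** Close the cart; its rows stay behind as a historical record. */
    public function checkout(int $cartId): array
    {
        $this->carts[$cartId]['open'] = false;
        return $this->itemsIn($cartId);
    }

    public function itemsIn(int $cartId): array
    {
        $found = [];
        foreach ($this->items as [$id, $item]) {
            if ($id === $cartId) {
                $found[] = $item;
            }
        }
        return $found;
    }
}

// Replaying steps (2) to (8) of the scenario:
$store = new CartStore();
$home = $store->openCart('cookie-home', null);          // (2) cart #101
foreach (['A', 'B', 'C', 'D', 'E'] as $item) {           // (3)
    $store->addItem($home, $item);
}
$cart = $store->openCart('cookie-home', null);          // (5) still cart #101
$store->attachToAccount($cart, 202);                     //     account #202
$firstOrder = $store->checkout($cart);                  // (6)
$cart = $store->openCart('cookie-home', 202);           // (7) new cart #102
foreach (['A', 'A', 'F'] as $item) {
    $store->addItem($cart, $item);
}
$workCart = $store->openCart('cookie-work', 202);       // (8) same cart #102
$store->addItem($workCart, 'G');
$secondOrder = $store->checkout($workCart);
```

The cart is found by cookie or account rather than by session. As a result, step (4) loses nothing, and promoting the anonymous cart at step (5) is a single update of its account column.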
It's nice that we've got a historical record of the shopping cart after the fact, but there's a more important point -- we could have lost the customer's dollar at many points in the above transaction if we were using a $_SESSION-based cart. The session wouldn't have survived step 4, for instance. A good user interface isn't academic here... It puts money in our pocket.

The above scenario is complex, and it might not be fair to expect that a first-generation shopping cart has those features. A $_SESSION-based shopping cart would need to be completely reworked to add the features above. A cart that uses a unique "cart id" and a relational back end will be a lot more maintainable... You could even start out using $_SESSION to keep track of the "cart id", then keep it in a cookie, then associate it with a user name, add the facility to promote an anonymous cart to an authenticated cart and so on. Starting with a good design, we can provide the interface that we ~want~ to provide, not the one that our abstraction layer ~forces~ us to provide.

In regards to slides 29 and 30, can you elaborate and give a more detailed example of what they are trying to say? Are they saying that the session key should contain a hash of the data? Or does the hash become the "salt" in crypting the data? Finally, how does doing that make it easier to prevent circumvention and forgeability?

Let's take it a step at a time... Imagine we've got a token of the following format...

<code>
$token = "$user_id:$session_id";
</code>

The session_id doesn't have to be unpredictable -- it could come from an auto_increment column in a database table... With the caveat that people could estimate the usage of your site by looking at the session ids.

have users who knew how to look at or change the cookies. An attacker who understands cookies can easily change the user id, or session_id.
<code>
$hash = sha1($token);
$signed_token = "$hash:$token";
</code>

We could check the integrity of the token by recomputing the hash and seeing if it matches the one in the signed token. This protects against accidental damage, or very simple attacks. Still, it's quite possible that an attacker could guess what you're doing: it wouldn't be safe at all in an open source system. That's where the salt comes in... For a particular web site, we create a random "salt" that, effectively, gives us a unique hash function for our web site.

<code>
// $salt is the random, site-wide secret described above.
function private_hash($token) {
    global $salt;
    return sha1("$salt:$token");
}

$private_hash = private_hash($token);
$signed_token = "$private_hash:$token";
</code>

somebody has logged in -- you don't need to look at the database or keep ~any~ server side state. This makes it a highly scalable system... This basic approach is used on some of the biggest sites in the world, such as yahoo.com.

Nothing stops a person from saving his token and presenting it later -- after his account may have been deactivated, or after associated session information has been purged (an error condition.) An attacker that gets the person's cookie jar, or who intercepts network traffic, can also steal the token. It's not possible to completely protect against sophisticated attacks where a hostile party controls your network without installing complex software on both ends, and solving some intrinsically difficult problems having to do with mutual authentication. Let's just say that the developers of SSL have solved these problems, and that you

We can, however, make replay attacks a lot harder by adding a timestamp... Now the token looks like

<code>
create table session (
    session id ... session id ... primary key
    user_id ... user id ...,
    last_updated ... timestamp ...,
    begin_time ... timestamp ...,
    end_time ... timestamp ...
);
</code>

Now we've got two constants:

write the timestamp to the last_updated column.
EXPIRE_TIME: how old a timestamp is before we eliminate the session.
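The salted sha1("$salt:$token") above is a home-made keyed hash. PHP's hash_hmac() provides the standard HMAC construction for the same job, and the timestamp can be signed along with the rest of the token. The following is a sketch under stated assumptions: the token layout "$user_id:$session_id:$timestamp", the 3600-second EXPIRE_TIME, and the helper names signToken()/verifyToken() are all illustrative.

```php
<?php
// Hypothetical helpers: sign "$user_id:$session_id:$timestamp" with HMAC-SHA256
// and reject tokens that were tampered with or are older than EXPIRE_TIME.

const EXPIRE_TIME = 3600; // seconds; illustrative value

function signToken(string $key, int $userId, int $sessionId, int $now): string
{
    $token = "$userId:$sessionId:$now";
    return hash_hmac('sha256', $token, $key) . ':' . $token;
}

/** Returns [userId, sessionId] for a valid token, or null. */
function verifyToken(string $key, string $signed, int $now): ?array
{
    $parts = explode(':', $signed, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$mac, $token] = $parts;
    // Constant-time comparison avoids leaking how many characters matched.
    if (!hash_equals(hash_hmac('sha256', $token, $key), $mac)) {
        return null;
    }
    $fields = explode(':', $token);
    if (count($fields) !== 3) {
        return null;
    }
    [$userId, $sessionId, $issued] = array_map('intval', $fields);
    if ($now - $issued > EXPIRE_TIME) {
        return null; // too old: make the user log in again
    }
    return [$userId, $sessionId];
}
```

Checking a token still needs no database lookup. Revoking one before it expires (for a deactivated account, say) still needs server-side state, such as the session table above.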
You might think you could put the client IP address in the token and lock the session to that address to make it harder to steal tokens. I tried this, but found out that some of the largest ISPs (such as AOL) have proxy servers that make users seem to "jump around". You can do it if you know people are logging in from a sane ISP, but you can't do it in general.

This system can be improved in numerous ways, such as adding anonymous sessions, operating in a split http/https mode, and caching authorization information in the token. If you're worried about information leakage (you don't want someone to know that he got session 88427 yesterday and 99105 today), you can encrypt the token. But be careful... It's easy to use cryptography the wrong way. Don't rely on encryption to protect token integrity against tampering; most of the obvious schemes don't really work.

* cookie usage: 20 per domain, 4096 bytes per cookie (the RFC 2109 minimums)
* Page/Block object: how to return a block from a driver, inherit Block methods, but also inherit Rdo_Base? Mapper! _Mappers are the drivers_
* Nag: tasks are a model; different models for different sources of tasks, so maybe horde_rdo_model isn't an extension but a delegate?
* form helpers go into the horde_view helper pack
* Horde_Model validation: validatesAcceptanceOf, validatesConfirmationOf
* webroot has: index.php, .htaccess, assets/ (css, images, js), mod_rewrite rules; everything else is pear-installable
* make assets pear-installable somehow
* viewbuilder/pagebuilder: custom views
* command line and web service actions (still api/method/params)
* catalyst::message(): replaces logmessage; fatal, notification, observer; has a return value (?)
* session object management
* cms for rampage based on (replacing) ulaform + wicked + giapeto
* horde_form: db and xml descriptions instead of just php building
* reconcile driver architecture with Rdo Models
* apps provide models instead of forms? apps provide route bundles? (if frontcontroller) forms are models!
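One open question in these notes is whether Rdo result sets should be exposed as seekable iterators, with LimitIterator and FilterIterator layered on top. The following is a minimal sketch using only SPL classes. A plain array stands in for an Rdo result set. The row shape, the `OpenTaskFilter` class and the `open_task_page()` helper are hypothetical.

```php
<?php
// A plain array stands in for rows an Rdo mapper might return (assumed shape).
$rows = [
    ['id' => 1, 'name' => 'write docs', 'completed' => false],
    ['id' => 2, 'name' => 'fix bug',    'completed' => true],
    ['id' => 3, 'name' => 'review',     'completed' => false],
    ['id' => 4, 'name' => 'release',    'completed' => false],
];

// FilterIterator subclass: only pass through tasks that are still open.
class OpenTaskFilter extends FilterIterator
{
    public function accept(): bool
    {
        return !$this->current()['completed'];
    }
}

// Page through the open tasks: FilterIterator drops rows, then
// LimitIterator applies the offset/count window to what is left.
function open_task_page(array $rows, int $offset, int $count): array
{
    $open = new OpenTaskFilter(new ArrayIterator($rows));
    $ids = [];
    foreach (new LimitIterator($open, $offset, $count) as $row) {
        $ids[] = $row['id'];
    }
    return $ids;
}
```

ArrayIterator implements SeekableIterator, so LimitIterator can call seek() on it directly. Once a FilterIterator sits in between, however, LimitIterator has to step through the rows one at a time. For large result sets, it is probably cheaper to push the filter into the Rdo query itself.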
* what do routes point to (models? mappers? views?) -> controllers
* controllers handle mappers vs. models? composite mapper? (turba, etc.)

<code>
 * browser allowing for a possible re-POST if the user clicks OK.
 * Typically this is not what you want.
 *
 * {{code: php
 * header("Cache-Control: no-store, no-cache, must-revalidate");
 * header("Cache-Control: post-check=0, pre-check=0", false);
 * header("Pragma: no-cache");
 * }}
 *
 * @param int|string $code The HTTP status code to redirect with; default
 * is '303 See Other'.
 *
 * @return void
 *
 */
protected function _redirectNoCache($spec, $code = 303)
{
    // reset cache-control, replacing any earlier value
    $this->_response->setHeader('Cache-Control', 'no-store, no-cache, must-revalidate', true);
    // append the post-check/pre-check directives as a second header
    $this->_response->setHeader('Cache-Control', 'post-check=0, pre-check=0', false);
    // HTTP/1.0 clients
    $this->_response->setHeader('Pragma', 'no-cache', true);
    // continue with redirection
    return $this->_redirect($spec, $code);
}
</code>

* apps provide route bundles
* apps provide controllers
* seekable iterators? use of ArrayIterator, adding LimitIterators and FilterIterators on top of Rdo

# Just add water: give me my prototype now!
# Don't make me think: I can do this stuff even on my dumbest days.
# DRY: making the same change 50 times is not cool.
# Anti-pasta: help me avoid spaghetti.
# Security: no nasty surprises please. Help me get this right the first time.
# Testing: help me protect myself against myself.

1. centralized control over page rights and access
2. ability to remap URLs due to changes in web site structure
3. handling 404 errors intelligently
4. ability to dynamically add headers and footers to pages for displaying alerts such as "system going down at 5pm"
5. separates content from presentation in a reasonable manner, e.g. with templates
6. managing tainted data (e.g. POSTs, GETs, COOKIEs)

Please pardon the provocative title, but this post is intended to surface one point I buried in yesterday's presentation, in the hope that by making it a separate post it will attract a wider audience.
I intend for this post to be constructive, so I will focus on two specific suggestions which hopefully will serve as the seed for the development of a set of best practices for AJAX. Here are the two humble suggestions on things that people should standardize on:

* the data should first be encoded as octets according to the UTF-8 character encoding
* GET should never be used to initiate another operation which will change state

Enter "Iñtërnâtiônàlizætiøn" into your tool and observe what comes out the other side. When expressed as part of the query component of a URI, it should look like I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n. Standardizing improves interoperability, and the reason I am suggesting UTF-8 is that it is backwards compatible with ASCII, can express the full range of the Unicode character set, and is widely implemented.

Idempotency

Looking into the current PHP implementation of SAJAX, you will see the following:

<code>
// Bust cache in the head
header ("Expires: Mon, 26 Jul 1997 05:00:00 GMT");    // Date in the past
header ("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
                                                      // always modified
header ("Cache-Control: no-cache, must-revalidate");  // HTTP/1.1
header ("Pragma: no-cache");                          // HTTP/1.0
</code>

This code should be a rather large clue that you are probably doing something wrong. Apparently the author recognized that these headers are somewhat sporadically and inconsistently implemented, and hoped that combining them would improve the chances of success.

The danger that the responses may be cached is actually the smaller of several concerns. A much bigger concern is that unsuspecting grandmothers and bots everywhere can be tricked into modifying online databases simply by following a link. Judicious use of HTTP GET can be a very good thing. Perhaps toolkits can adopt a convention that procedures whose names start with the characters "Get" use GET, and everything else uses POST.
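Both suggestions are easy to check in PHP. In the minimal sketch below, the function names are hypothetical. The code percent-encodes a UTF-8 string with rawurlencode(), which reproduces the query-component form quoted above. It also picks the HTTP method from the procedure name using the "Get"-prefix convention, and it refuses a state-changing call that arrives over GET.

```php
<?php
// Percent-encode a string for a URI query component. rawurlencode() works
// on bytes, so the input is assumed to already be UTF-8 encoded.
function encode_query_value(string $utf8): string
{
    return rawurlencode($utf8);
}

// Convention: procedures named Get* are safe, idempotent reads and may use
// GET; everything else might change state and must use POST.
function http_method_for(string $procedure): string
{
    return strncmp($procedure, 'Get', 3) === 0 ? 'GET' : 'POST';
}

// Server side: never run a state-changing procedure from a GET request,
// so following a link can't modify the database.
function is_allowed(string $procedure, string $requestMethod): bool
{
    return http_method_for($procedure) === 'GET'
        || strtoupper($requestMethod) === 'POST';
}

// "Iñtërnâtiônàlizætiøn" written with \u{} escapes so this file's own
// encoding doesn't matter.
$sample = "I\u{F1}t\u{EB}rn\u{E2}ti\u{F4}n\u{E0}liz\u{E6}ti\u{F8}n";
```

The prefix check is deliberately naive: a procedure named "Getaway" would also be treated as a read. A real toolkit would probably want an explicit list of safe procedures.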
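The security notes that follow, on salted passwords, whitelisting and charset-aware escaping, map onto a few built-in PHP calls. In this minimal sketch, the helper names and the username pattern are hypothetical. password_hash() is swapped in for a hand-salted md5, because it generates a random salt itself and stores the salt in the returned string. The escaping helper is keyed on a content type, following the "content-type on variables" idea.

```php
<?php
// Passwords: password_hash() generates a random salt, uses a deliberately
// slow algorithm, and stores both inside the returned hash string.
function store_password(string $plain): string
{
    return password_hash($plain, PASSWORD_DEFAULT);
}

function check_password(string $plain, string $stored): bool
{
    return password_verify($plain, $stored);
}

// Whitelisting: prove the value is valid, never "correct" it; null = reject.
function filter_sort_order(string $raw): ?string
{
    return in_array($raw, ['asc', 'desc'], true) ? $raw : null;
}

function filter_username(string $raw): ?string
{
    // Assumed rule; the /D modifier stops $ from matching before a trailing newline.
    return preg_match('/^[a-z][a-z0-9_]{2,31}$/D', $raw) === 1 ? $raw : null;
}

// Escaping keyed on the destination's content type, with the charset stated
// explicitly; ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning "".
function escape_for(string $contentType, string $value): string
{
    switch ($contentType) {
        case 'text/html':
            return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
        case 'text/xml':
            return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1, 'UTF-8');
        case 'text/plain':
            return $value;
    }
    throw new InvalidArgumentException("No escaping rule for $contentType");
}
```

Throwing on an unknown content type follows the "everything else is bad" rule: output with no escaping rule is refused rather than passed through.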
meta tags to include:
<meta name="MSSmartTagsPreventParsing" content="true" />
<meta http-equiv="imagetoolbar" content="no" />

* PHP_SELF
* SERVER_NAME
* Referer: never depend on it
* passwords: don't use just md5; add a salt.
* Edge cases: $_SESSION, backend databases. If you don't consider it input, then it's part of your application for security purposes.
* Never display credit card info; this means it shouldn't be repopulated!
* Filtering is _inspection_, not correction. Don't try to correct invalid data. Casts are relatively safe but still miss simplistic attacks.
* When possible, whitelist: prove data valid with a simple list of values or a regexp. Everything else is bad.
* Need a model for making the filtered data clearly available, and don't touch the tainted data.
* Output destinations: HTML, JavaScript, CLI output, session data, RSS feeds, XML, etc. Any remote destination.
* Need a clever way to integrate this into the template system! Perhaps a content-type on variables (too much?) of text/html, text/plain, text/xml, etc.? How about, instead of tag, having text:foo, html:foo, xml:foo? Or <tag:foo type="html">, defaulting to type="text".
* Escaping MUST be charset aware. Data escaped for us-ascii might result in JavaScript in Japanese (not necessarily a valid example).
* display_errors: write a custom error handler; handle errors elegantly and integrate them with the Log object.

* http://phpsec.org/
* http://brainbulb.com/
* http://shiflett.org/
* http://md5.rednoize.com/

http://www.midgard-project.org/updates/2003-05-29-000.html

* Standardized URL-to-object mapping
* Standardized object-to-application mapping
* Standardized navigational system
* Standardized object extensibility API
* Standardized way to make application output configurable

So, MidCOM is about standardizing how to build Midgard applications and site features.
Let's look at each of the points in more detail.

Before MidCOM, Midgard site and application developers had to figure out how to map URL requests into Midgard objects, typically to topics and articles. Everybody has rolled their own solution for this, using object names, IDs or GUIDs as the identifiers, and using either GET parameters or active page arguments.

With MidCOM, application development no longer has to start by writing a URL parser, as the MidCOM system provides this already. URL parsing happens completely in topic and article space, using object names as the identifiers. This makes for very clean URLs. Consider the following: "spring-2003" under topic "gallery". Clean, pronounceable and easy to use. Even better, any Midgard object instanced using a MidCOM component is aware of its location, providing the URL through MidCOM's metadata API.

In addition to connecting URLs to Midgard objects, URLs also need to be connected to specific applications, or in MidCOM terms, components. All topics in MidCOM are assigned to be managed by a component. This means that different parts of the site can work in different ways. For example, URL: article "midgard-tutorial" to be handled and displayed by it. The newsticker component can fully control the administrative interface for managing content under it, and the output provided by URLs it manages.

A component is selected for each topic separately. This means that the example URL /news/contacts/bergius.html could be handled by an "employee directory" component.

Standardized navigational system

Each MidCOM component provides all navigational information about the objects it manages to a system called NAP, which is accessible through an easy object-oriented API. The NAP system means that site developers don't have to worry about different components or object types when writing the site's navigational interface.
You can write one script for generating the whole site navigation, and it will work with the site and any component under it. This makes standardized navigational tools like breadcrumbs or the NemeinNavBar utility much more useful, as they can be used with any MidCOM-based site. I expect that in the near future site developers will have a huge library of prebuilt navigational systems to select from.

Standardized object extensibility API

Enabling content managers to define their own object types or metadata fields has always been a problem with Midgard: any new metadata field has forced site developers to write their own content creation UIs. MidCOM provides an easier system for this called datamanager. With datamanager, site developers can define their own custom data structures, called "layouts". Layouts are PHP arrays telling datamanager what fields to allow for objects handled by that component, how to present those fields in an administrative interface, and where to store them (parameters, object fields or attachments).

Using datamanager, component writers don't really have to care about which object fields site developers will want to use; they just need to use the datamanager utility. Data structure "layouts" can be provided as part of the default component configuration, and can be overridden on a per-sitegroup basis.

interface, providing customized editing forms for all components based on widgets defined in the "layouts" configuration. The widgets can be anything from text input boxes to a WYSIWYG editor or an image upload system.

Standardized way to make application output configurable

The MidCOM specification requires that all application output is handled through the MidCOM style system. MidCOM's style engine is an extension of the Midgard style engine. It allows component outputs to be configured using style elements, and also allows fallback elements to be provided as snippets.
This means that the output of any MidCOM component will be fully configurable by site developers using the familiar Midgard style engine. The style to be used can be defined separately for each topic, allowing different output styles from the same components on a per-site-area basis. Because components can be loaded dynamically into a Midgard page, site developers can have different parts of the same page use different styles, making administration of the style elements much easier.

Conclusions

MidCOM brings into Midgard something that has been lacking so far: a "write once and run everywhere" framework for building site components, styles and navigational tools. This promotes component sharing and code reuse, both within a single Midgard solution provider company and within the international Open Source community.

So far Midgard has provided a nice content management framework, but actual sites have needed to be built from scratch. MidCOM promises to change that, making Midgard much easier to implement. Of course, sloppy coding is still possible with MidCOM, but if component writers adhere to the MidCOM specification and the PEAR coding standards, and use NemeinLocalization for internationalizing their components, we should achieve global reusability.

I invite all Midgard developers to seriously study and consider MidCOM for their projects. There is some learning curve, but real code reusability should repay that very quickly.

The Midgard Framework is a powerful toolkit for managing online information. Applications and functionality for the platform are written in the easy-to-learn PHP scripting language. All interfacing with the system is done via a regular Web browser, and no special tools are needed for developers or content authors.
Main features of Midgard Framework include:

* Easy and well documented Application Programming Interface (API)
* Efficient management of Web content using a hierarchical topic system
* Separation of layout, content and site logic
* Support for editorial workflow and approval mechanisms
* Attachment of metadata to all content objects
* Management of PIM data including contacts and calendaring information
* Multilingual support (including Unicode) and localization
* Replication for clustered setups and staging
* Multi-company support using virtual databases
* Flexible user and group management

Midgard works on most common UNIX platforms, including Linux, FreeBSD and Solaris. Prebuilt binary packages are available for some Linux platforms (including Red Hat, Debian and Mandrake), and the system can be installed from sources in most other environments. For other environments, including hosted servers and Windows systems, there is the pure-PHP implementation, Midgard Lite.

The Midgard Application Server is free software developed internationally with the Open Source model and distributed under the GNU licenses. Commercial support, applications and services for the platform are available from a range of companies worldwide.

The PHPmole toolkit provides Midgard developers with a freely available Integrated Development Environment (IDE) comparable to DreamWeaver and MS Visual Studio, with additional content management functionality. With the Midgard CMS package, the ease of use of productivity software and office suites can be brought to Midgard content management.
query building:

<code>
// Instantiate the Query Builder for seeking MidgardArticles
$query = new MidgardQueryBuilder("MidgardArticle");

// List articles only from a specific topic
$query->addConstraint("topic", "=", $topic->id);

// List only articles that have been approved since some timestamp
$query->addConstraint("approved", ">", $starting_time);

// Order the articles based on their approval time
$query->addOrder("approved", "DESC");
$query->setLimit(20);

// Skip the first N articles to page through the list; $_REQUEST is
// tainted input, so cast it to an integer before use
$query->setOffset(isset($_REQUEST["startfrom"]) ? (int) $_REQUEST["startfrom"] : 0);

// Execute the query, returning an array of matching MidgardArticle objects.
// The MidgardArticles are full article objects with all the regular methods.
$articles = $query->execute();
if (!$articles) {
    // Handle error
}

// And then display your articles
print_r($articles);
</code>

Query Builder in action

Thanks to Jukka's efforts, we already have a working MidgardQueryBuilder. Let's start with a simple example.

<code>
/* Define which MgdSchema type should be used and returned by QB */
$qb = new midgardquerybuilder("NewMidgardArticle");

/* Define constraints */
$qb->addConstraint("topic", "<", 2);
$qb->addConstraint("title", "=", "News");

/* Execute SQL query and return array */
$f = $qb->execute();
</code>

MySQL query executed:

SELECT article.id FROM article_i,article WHERE article.topic < 2 AND article_i.title = 'News' AND article.id=article_i.sid

As you can see, the title property is defined in the article_i table, while the topic property is defined in the article table. Query Builder follows the class's table definitions and is able to search for objects that use more than one table as storage.
$qb->execute() returned an array with only one object (matching the single record returned by the SELECT), so print_r($f[0]) shows:

<code>
NewMidgardArticle Object
(
    [sitegroup] => 0
    [author] => 0
    [owner] => 0
    [realm] => article
    [guid] => cedda8cb461c9f846c73f043aaf888e9
    [changed] =>
    [updated] =>
    [action] => create
    [errno] => 0
    [errstr] =>
    [id] => 28
    [calstart] => 0000-00-00
    etc etc etc
</code>

Let's try to use datetime fields:

<code>
$qb = new midgardquerybuilder("NewMidgardArticle");
$qb->addConstraint("revised", ">", "2003-04-30 09:46:00");
$f = $qb->execute();
</code>

MySQL query executed:

SELECT article.id FROM article_i,article WHERE article.revised > '2003-04-30 09:46:00' AND article.id=article_i.sid

Now $qb->execute() returned an array with 5 objects. I do not want to print them all, so let's check that the revised properties were selected correctly:

print_r($f);

Array( [0] => NewMidgardArticle Object