[[toc]]

+ Chuck's Horde 4 Thoughts

++ Debug support

* http://code.google.com/p/webgrind/
* http://www.sitepoint.com/blogs/2008/05/13/useful-in-browser-development-tools-for-php/
* http://bergie.iki.fi/blog/sql-level_debugging_with_midgard.html
* Debug "wrapper" drivers - encapsulate another driver and delegate all calls, but provide before/after hooks for any function along with timing, profiling, reporting of calls and arguments, etc.

++ Profiling

Lots of overlap with debugging

* http://docs.kohanaphp.com/libraries/profiler

++ Testing

* http://www.phpunit.de/pocket_guide/3.2/en/database.html
* http://sebastian-bergmann.de/archives/702-Data-Providers-in-PHPUnit-3.2.html

++ Idea sources

* http://kohanaphp.com/home.html
* http://code.whytheluckystiff.net/camping
* http://api.rubyonrails.com/
* http://toys.lerdorf.com/archives/38-The-no-framework-PHP-MVC-framework.html
* http://www.xml.lt/Resources/Framework
* http://cognifty.com/
* http://docs.kohanaphp.com/

++ Horde 4 Administration

* http://bergie.iki.fi/blog/asgard_welcome_page_just_got_useful.html

++ Unsorted

http://www.sitepoint.com/blogs/2008/02/04/dealing-with-dependencies/
http://codeigniter.com/wiki/Modular_Extensions_-_HMVC/
http://ojay.othermedia.org/articles/keyboard.html
http://www.lkozma.net/autocomplete.html
http://writer.bighugelabs.com/
protocol-independent URLs: http://nedbatchelder.com/blog/200710.html#e20071017T215538
using register_shutdown_function to show a nicer error page on fatal errors: http://eirikhoem.wordpress.com/2008/03/15/dying-with-grace-phps-register_shutdown_function/

    // return a 304 if the file hasn't been modified since the If-Modified-Since date
    // no point in resending all the data if the browser already has it cached
    if (function_exists("apache_request_headers")) {
        $headers = apache_request_headers();
        // !empty() avoids an undefined-index notice when the header is absent
        if (!empty($headers['If-Modified-Since'])) {
            $ims = strtotime($headers['If-Modified-Since']);
            // strtotime() returns false for an unparseable date
            if ($ims !== false && $ims >= $serve_data['modified_time']) {
                header("HTTP/1.0 304 Not Modified");
                exit(0);
            }
        }
    }

horde apps - "instance" of a horde app == installed horde app + a group of horde_policies that configure it
let those policies be named
instead of using shipping/foo api calls, use $instance->foo()
$Horde->api->method() (chaining)?

I just went through my first signup process that required an SMS-capable device for confirmation. It also didn't make me pick my credit card type, and instead used my country code (+1) to decide on a card detection algorithm.

update_client.pl
/modules/future_contribution
/modules/future_signup

I think I have now found the right mysql-server settings, with which performance is quite OK. Increasing the sort_buffer_size was one of the changes that helped.

    skip-external-locking
    skip-thread-priority
    key_buffer = 64M
    max_connections = 1024
    max_connect_errors = 1000
    max_allowed_packet = 8M
    table_cache = 512
    sort_buffer_size = 8M
    read_buffer_size = 1M
    read_rnd_buffer_size = 2M
    myisam_sort_buffer_size = 64M
    thread_cache_size = 50
    query_cache_size = 128M
    tmp_table_size = 1024M
    thread_concurrency = 12
    wait_timeout = 60
    interactive_timeout = 60
    log_slow_queries

add dynamic finders (find_by_name, find_by_id, etc.) to Rdo Mappers or Horde_Db_Model or whatever
Controller classes/objects vs. Action classes/objects vs. Resources vs. API
how to develop? give up central config?
http://www.w3.org/Provider/Style/URI
index.php - global dispatcher
how to do themes/custom templates? chain local -> app -> horde?

a horde 4 installation:

    config/
    lib/
    apps/
    public/ <- with app/ subdirs containing images, etc.

everything routable goes in apps/

    apps/
      login/
      help/
      prefs/
      admin/
      etc...
      ...

auto-install web files to a writable dir, either in web ui or in cli? keep apps self-contained that way?
app name is the first part of the route > /login
subdomain support
route aliases
an app should uncompress over a horde/ dir - /config/app/*.php -> config dir is compiled/cached

horde is not rails. it is designed as a container for multiple, collaborating apps
horde apps are configured by Horde_Policy objects
need Horde_Db, whatever implements DML, DDL, and SQL - Mad? MDB2? - prefer PHP over XML
merge Rdo and Mad into Horde_Db
allow for overriding the mappers so that non-SQL can be used, but, default to SQL/sqlite and leverage it

framework repository/module Horde/lib/... Rampage/lib/...
use subpackages or multiple *.xml for packages to avoid silliness?

apps should be installable into a horde container. shouldn't be tied to the app name - keep imp, krono, etc, but install as mail, cal, events (should be able to install two versions of krono w/ different permissions - see HordeSpaces)
installing gives a slug, that slug manages config, templates, themes, perms, etc.

----

figure out how to merge luxor into Chora

----

for now, build Horde_Content_* based on Rdo, then move to Horde_Db
Horde_Db provides Horde_Db_Mapper which creates Horde_Model_Base objects

apps have a config/ dir, but that's just defaults and defining base routes, policies, etc. user settings are stored in the db or a global directory.
should have parallel web and cli configuration and installation/update tools; web requires webserver to have write access to a config/ dir and to public/; cli tools do not (if run as another user)

Horde 4 app - a Horde 3.x app updated for PHP 5 and to use the latest libraries
Rampage app - "RAD" (rapid application development) MVC app that uses Horde 4

/horde/page/ -> dispatcher for Rampage
modules w/ views (overridable), routes, controllers, etc.?
have generic views for rampage_login, rampage_admin_*, etc.
configuration: config/routes.php config/routes_local.php -> do this for all config files
Horde_Content_Index -> horde-wide search

Random Horde Ideas

mini-cms for building your own sidebar/menu/etc? - shortcuts to any bit of horde
labels labels labels
keywords also or just labels? probably just flexible labels
"smart folders"
Getting Things Done support? (other apps that do it - Tracks, Kinkless GTD, Midnight Inbox)
make mnemo into more of a snippet keeper? sort of like a personal cms - or wiki.
carry the encryption feature through to other kinds of content
create an outliner!
tags/labels for mail
rename virtual folders to smart folders? too apple?

freetext boolean mail searches:

    apples & oranges
    apples | oranges
    apples ! oranges (apples but not oranges)
    apples & (oranges | lemons)

security of redirects: http://www.xssed.com/mirror/39494/

This is sort of an interesting one. For the actual attack he merely figured out that we are base64 encoding the successurl and reflecting back whatever is there. The interesting thing is that merely filtering the unencoded data is not going to save us here. The string was javascript:alert(/XSS.By.Mityo/) and was being loaded into the URL field of a meta redirect. So our max filter of strip_tags is useless. It just illustrates the rationale for Phase 2 of the security build out where we have to be careful when we are dealing with redirects. In this particular case, we need to make sure we are getting a valid URL format. That will prevent javascript insertions. But we also want to make sure the URL is not redirecting outside the intended domain for some phishing scam. In this case I will fix the problem by validating the URL on the cons/login.inc.php where the data is coming in but will also try doing it on the generic show_redirect_message() call if I think I can do so without breaking other pages.

Event-driven apps: "Understanding and implementing this event model can free your application from the constraints of defined elements.
For example, instead of applying an event listener for each link in a menu, you can assign a single listener to the menu item itself and retrieve the event target. That way you don't need to change your script when the menu gets larger or when links get removed from it." http://yuiblog.com/blog/2007/01/17/event-plan/

tagging/instant hierarchies as specialized permission-based search RBAC
what is horde? groupware? horde data services? horde data access? ui layers
be the php dojo framework? or the php yui framework? see http://tigermouse.epsi.pl/ ? or, don't do desktop-like widgets? see UI design bookmarks
move away from gettext, at least as a default? midgard i18n notes: http://www.midgard-project.org/discussion/developer-forum/midgard-s-multilang-support/
try to rely only on thread-safe extensions?
reduce dependency tree
avoid globals and non horde-namespaced functions/methods in framework and core app code
class-based registry apis against edge cases: http://www.bakesalehq.com/contents/show/12/
features from Prado? http://www.urdalen.com/blog/?p=198
use functions where appropriate for shortcuts/helpers, like Mike's t("translated string") function? but would be horde_t? would call configured translation system
helper sets for dojo, protaculous, yui - simple functions like dojo_editor(), dojo_pane(), yui_map(), etc. Load with something like Horde/Layout/Helpers/YUI.php, etc. See http://www.ngcoders.com/projax/

Horde as a set of apps and methodology needs to pick a js lib, pick a template methodology, etc. - this is Rampage
Horde as a framework can allow for flexibility

To make it even better, separate the control logic from the presentation. That way, back could be reverse, etc. I do this in all my forms since application logic and presentation "word play" are two distinct things to me.
This is what I use:

    <form method="post" action="form.php">
    <input name="submit[back]" value="reverse" type="submit" />
    <input name="submit[next]" value="speed ahead" type="submit" />
    <input name="submit[home]" value="no place like home" type="submit" />
    </form>

Then, you can have a simple routine that captures submit actions regardless of the presentation value. You check for the array submit -- count 1 and whitelist against the acceptable values. A multi-row table can expand upon the theme by using this: submit[edit_3], submit[delete_3], submit[edit_5], etc.

caching
make sure Rdo and other services allow dropping in caching rules

http://sebastian-bergmann.de/pages/talks.html
phpunit - @test markup in methods
phpunit + selenium
cruise control?

really hope google will integrate any product of theirs with any other products of theirs? receive an email, transform it to document, add spreadsheet, add notes, add bookmarks saved from search history and a link to an event in calendar anyone?

From nyphp-talk:

The other day I had to get an application started in a hurry. It's doing something useful at < 700 lines, but I'm considering options that could grow it out to about 10 times that.

It depends on a "core library" that's < 500 lines. This library deals with common issues in string handling, parameter handling, and HTML form generation.

About 10% of the application, or 70 lines, is a microframework that's loosely built on Struts. About 20 of those lines are in 2 functions which would be generally useful for microframeworks (such as file_exists_in_include_path()).

Like Struts, the microframework chooses an "action" based on form parameters: the action then chooses a "view" -- a "view" is basically a template that a designer can edit which can be supplemented by an optional "query" which pulls stuff out of the database.
Like Ruby-on-Rails, the microframework uses convention instead of configuration: the dispatcher computes an "action name" based on query parameters, and uses that to compute a filename... It checks that the file exists and executes it with the "require method".

The microframework uses no object-oriented techniques. That's not because I have any antipathy to OO, but because I didn't need it, and I like writing my actions, queries, and views in a style that "feels like PHP".

Yes, my microframework is nowhere near as powerful as CakePHP or Symfony. Yet, it's more flexible, because I can codesign it with my application. Because it's so simple, I can easily adapt it to do what I want. If I decide I really hate it, I can write a new one in an hour. I'm an expert on it, because I developed it, and I wouldn't have to take on the technical, social and emotional burdens of "forking" an open-source codebase if I wanted to make a change in direction.

I'm moving towards a vision of web app architecture where we move towards shared vocabulary and standardized interfaces. Rather than working with a "comprehensive framework" that does everything, I'd like to have a "framework construction set" that contains a number of elements that I can take or leave.

Resources: http://www.ryandaigle.com/articles/2006/06/30/whats-new-in-edge-rails-activeresource-is-here
mixins: http://www.symfony-project.com/book/trunk/17-Extending-Symfony
split db ideas: http://pear.php.net/pepr/pepr-proposal-show.php?id=359
http://dataspill.org/pages/projects/ruby-activeldap

More php features to look into:
__toString works everywhere
SPL features: Regex Iterators, SplFileObject CSV support, Caching Iterator
Data: stream support
DateTime and DateTimeZone classes
set date.timezone ini setting automatically based on user?

Search engine sitemap stuff - of use at all? maybe support in rampage cms
http://p7.hostingprod.com/@www.ysearchblog.com/archives/000437.html

5.
I want a registration info tab like Inbox.lv where they can change the personal stuff they put on file with us on the signup forms.

14. We may need Windows address book synchronization (this is a feature that fastmail is adding, and hotmail already has, so I guess we will have to also?). It is not a must in my books.

17. I want to add a new feature next to the attach button that is like "send message after attached", so if they are uploading a big file they can leave and it will be sent automatically.

19. We MUST have an easy user interface. Fastmail has lots of features and they try to make it where you can do everything in 2 clicks or less. We need to try to do this. Fastmail is all bunched up and looks like shit though. We need to make ours packed with features like fastmail, but spread out like AOL has or fastmail. This will attract all the old people and beginners of the internet who have just gotten off of AOL and moved to DSL. Fastmail looks like it is only made for advanced users and is hard to get used to. We need to have a main navigation bar on every page, which has all the mail icons that people use the most, like compose, inbox, addressbook, options, and the main navigation bar should be on every page at the top. Then we will have a subnavigation bar for each other page; for example, if you were to hit the calendar icon on the main navigation bar that is on the top of EVERY page, then it would take you to the calendar page and show you the calendar, and the subnavigation bar would have all the calendar icons like add events etc. I was thinking, in IMP we could have the logo at the top left corner of the page, then on the top right we could have all the main navigation icons. Both the logo and the main navigation would be on every single page in IMP, so it would be easy to get around. Then the subnavigation bars could go where the main navigation bar is now in IMP, understand?

2. Make a bounce button like fastmail.fm.
This is how fastmail explains their bounce button: "'Bounce' takes the currently selected emails and sends back an email to the addresses the email(s) came from saying basically that 'the email address does not exist' in a standard internet email protocol way. Some more organised spammers remove these from their lists. After sending the bounce response, the messages are deleted."

* If accessed with a browser, public folder is also a personal web-site, accessible at http://username.fastmail.fm
* Provide tool allowing synchronization of Outlook Express etc address book with FastMail contacts, possibly using LDAP
* Use JavaScript for browsers that support it to speed up many actions, such as searching through the address book
* A general notification system, so you can send a pager message, SMS message, instant message, or short email

eGroupWare over Horde reasons

Linking: There is the "infolog" for linking items. An infolog item can be a to-do, call, or note. It can link to the addressbook, projects, calendar, or another infolog item. That is very flexible.

Access Control: Under Preferences, there is a "Grant Access" link for the calendar, addressbook, infolog, and projects. It allows you to select Read, Add, Edit, Delete, and Private access for each group and each user. Again, very flexible.

Categories: Multiple category selection is allowed in the addressbook, projects, calendar and infolog.

Custom Fields: I can create custom fields.

PHP_SELF

Executive summary: PHP_SELF intentionally includes extra URL garbage (or valuable URL variables, take your pick) tacked on by the user. Don't use it without knowing what it does.

Here's what you get when you hit the URL: http://example.com/info.php/testing1?testing2 :

    _SERVER["REQUEST_URI"]   /info.php/testing1?testing2
    _SERVER["PHP_SELF"]      /info.php/testing1
    _SERVER["SCRIPT_NAME"]   /info.php

Get it? If you don't want that extra stuff tacked on by the user, use the correct _SERVER variable.
If you use REQUEST_URI or PHP_SELF, be aware the user can affect the contents of that variable. 99% of the time, you want SCRIPT_NAME, not PHP_SELF.

By the way, here's another test: http://example.com/info.php/testing<script>?testing :

    _SERVER["REQUEST_URI"]   /info.php/testing%3Cscript%3E?testing
    _SERVER["PHP_SELF"]      /info.php/testing<script>
    _SERVER["SCRIPT_NAME"]   /info.php

Note that the REQUEST_URI variable, which comes from Apache, is encoded, while the PHP_SELF variable, which comes from PHP, is not. So PHP 5.2.0 still makes it possible to shoot yourself in the foot, and as I've pointed out below, well-known PHP authorities actually recommend that you do so.

Here's the email that I sent in July 2005:

Subject: Re: [nyphp-talk] $_SERVER['PHP_SELF'} not working?
Date: Friday 22 July 2005 12:05 pm
From: Michael Sims <jellicle@gmail.com>
To: NYPHP Talk <talk@lists.nyphp.org>

On Thursday 21 July 2005 17:16, Dan Cech wrote:

    You could put:
    $_SERVER['PHP_SELF'] = $_SERVER['SCRIPT_NAME'];
    into one of your common include files.

Yes.

I'm afraid I don't understand this entire thread. Apparently because of the numerous PHP developer articles recommending it, and because of the php.net page which for whatever reason lists it first on the list of predefined variables, people are using PHP_SELF when they really want SCRIPT_NAME.

SCRIPT_NAME solves all the problems mentioned in this thread - it's just the script name, without any extra garbage that might be tacked on by the user. PHP_SELF explicitly includes that extra garbage, so solutions in this thread that involve stripping the garbage off of PHP_SELF to make it safe are really, really missing the point - just use SCRIPT_NAME instead.

Please don't use FORM ACTION=""; according to the spec, what the browser does with that is undefined, so even if it works in current browsers, it might not work in future ones.
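As a rough illustration of the advice above, here is a minimal sketch of a self-referencing form that builds its action from SCRIPT_NAME and escapes it before output. The helper name `self_action()` and the hand-built `$server` array are hypothetical, used only so the example runs outside a web server; the values mirror the second test URL above.

```php
<?php
// Hypothetical helper: return the form action for the current script.
// It reads SCRIPT_NAME (never PHP_SELF) and HTML-escapes the result.
function self_action(array $server): string
{
    $script = $server['SCRIPT_NAME'] ?? '';
    return htmlspecialchars($script, ENT_QUOTES, 'UTF-8');
}

// Stand-in for $_SERVER, using the values from the second test URL.
$server = [
    'PHP_SELF'    => '/info.php/testing<script>',
    'SCRIPT_NAME' => '/info.php',
];

// The user-supplied path info never reaches the page.
echo '<form method="post" action="' . self_action($server) . '">', "\n";
```

In a real page the call would presumably be `self_action($_SERVER)`. Escaping is kept even for SCRIPT_NAME as a defensive habit, on the assumption that anything echoed into an attribute should be escaped.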
People can be forgiven for making this mistake -- I'm here holding my copy of _Learning PHP 5_, and it recommends on page 8 and again on page 86 the use of PHP_SELF for self-referencing forms, ahem -- but it's time to put it to bed: PHP_SELF is unsafe for any usage where it is echoed back to the page.

SESSIONS:

I'll try to reply to this and some other people who replied to my previous message.

I'll start with my background. I've often been the person who the buck stops with -- somebody else develops an application that almost works (perhaps even puts it in production) and then I have to clean up the mess. The app might be written in PHP, Java, Cold Fusion, Perl, you name it. I've learned to see session variables as a "bad smell".

When I develop my own applications, I use cookies for personalization and caching. I use the authentication system described in http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz . This mechanism can carry a "session id", which in turn can be used as a key against application state stored in a relational database. I think through the boundary cases, and find that my greenfield apps behave predictably -- my only woe is that you'll discover that browsers have a lot of undocumented behavior connected with cookies, form handling, and caching. These are all problems that you still need to fight with if you use sessions; see the comments for http://www.php.net/manual/en/function.session-cache-limiter.php

----

The context of this is that the average web application is poor in the areas of usability and security: recent studies show that 80% of web applications have serious security problems http://www.whitehatsec.com/home/resources/presentations/files/wh_security_stats_webinar.pdf

Jakob Nielsen's website has been chronicling the sorry state of web application usability: http://www.useit.com/

Perhaps the top 20% of programmers can write applications with $_SESSION that don't have serious security and usability problems, but what about the other 80%?
----

(1) Session variables are treacherous. Odd things can happen in boundary cases, such as when sessions expire, or when you are targeted by session fixation attacks. http://shiflett.org/articles/security-corner-feb2004

I've looked at many apps that use sessions that seem to be working... Until you walk away for two hours, come back, and discover that you're logged in as somebody else. I suppose I could have spent hours or days tracking down an intermittent problem, which involved some confluence of browser oddness (IE was fine, Firefox was screwy), the behavior of the session system, and crooked logic in the application. Or I could use cryptographically signed cookies to implement an authentication system which won't give me surprises in the future.

Anybody can write applications that work 95% of the time with $_SESSION. Getting the other 5% right requires a deep understanding of state and statelessness on the web... Which is what (many) people are trying to avoid when they use $_SESSION variables.

There are more than twenty configuration variables that affect the way sessions work under PHP. Incorrect configuration of any of these can cause applications to fail, often in intermittent ways. The use of a custom session handler can have unpredictable effects on security, reliability and performance.

Other languages are a lot worse than PHP -- the use of the "scope" concept in languages such as Cold Fusion and Tango makes it easy to use a session variable without realizing it... Resulting in an application that "works" sometimes, but fails in mysterious ways.

(2) Session variables are bound to a particular language. In the real world, I work with legacy systems that might be written in other languages. I might have some old pages in Cold Fusion that work just fine, and I won't rework them in PHP until I've got a good reason. If users can set a customization parameter, such as the background of a page, it's easy to write a cookie that all languages can read.
Applications stuck in the session variable roach motel aren't as maintainable and portable.

(3) PHPSESSID. Do I need to say more? I consider the client that wants user tracking and can't accept cookies, so all the pages on their site look like

    http://www.example.com/about_us.php?PHPSESSID=**pseudo-random blob**

Three months later they come back and wonder why their site isn't being indexed in Google. Yes, there's a saner way to use this feature, but this "cure" to privacy violation is worse than the cookie "disease", since session ids will leak out through referrers, bookmarks, links that people cut-and-paste...

(4) The back button. When somebody asks a question about sessions on a forum, they'll usually ask another question a few days or weeks later: "How do I disable the back button?"

The underlying problem is a deep aspect of the structure of the web. There is certain state information that's particular to a request (GET and POST variables) and certain state information that has a more persistent scope (cookies, session information, a relational database.) The back button makes it possible for these two things to get out of sync.

Ultimately, we need a systematic strategy to deal with this. One pattern is to put the complete state of the application in form variables. Applications that use this pattern always work perfectly with the back button. This pattern doesn't work always (hitting the back button shouldn't cancel your order on an e-commerce site), but it works often... For instance, you can use hidden variables to hold onto form variables for complicated forms that spread over several pages.

(5) Multiple windows. I think it's a human right to be able to have more than one window open on a web site. If I'm shopping, for instance, I'd like to be able to look at two products simultaneously. An application that keeps state in form variables doesn't care how many you have open.
If you're looking for jobs at an organization that uses taleo.net's software, you'll find that it uses trickery to prevent you from having more than one window open... So you can't look at two jobs at once, or look at the job description while you're filling out the application. I suspect that they did this because they don't want to spend forever debugging "race conditions" that could be caused by a user acting in two windows simultaneously.

Session variables introduce problems of locking. PHP gets an exclusive lock on the session for each page displayed. This hurts the performance of pages that use dynamically generated images and Javascript, and can mysteriously deadlock AJAX applications.

(6) Scalability, Reliability, and all that. This is a tricky one, because it depends on particulars. Sessions can be lightning-fast in systems that keep them in RAM, such as Java and Cold Fusion. The default session handler in PHP uses files, and is probably faster than a relational database in a direct comparison: however, the session handler will load all of the data into RAM, whereas a relational implementation may only need to load information when it's needed.

Keeping information in POST variables or cookies also involves a tradeoff -- this is as scalable as it gets so far as server resources, but requires that the state be passed back and forth between the browser and server. This is no big deal if the state is 500 bytes. It's unacceptable if the state is 500 megabytes. In most cases, it starts looking expensive when we're passing an extra 10k-100k around.

I've recently been working on a legacy app that contains a query (select a subset of items) and reporting (display user-selected fields of those items) function. The interface between those modules is simple: the query system passes a comma-separated list of item identifiers to the reporting system. I like this, because it meant that one system could be changed without affecting the other.
I had to update the app so it would work with a changed database schema, so both sides needed some work. I discovered that the app was passing the item list as a session variable. This worked: unless I was using the application in two windows at a time. In that case, a query in one window would change the report delivered in another window. I thought about it, and realized that in this case, result sets would always be under about 10k, and usually be around 1k. Therefore, it made sense to pass this as a hidden variable in the form and ditch the session variable.

This shows the kind of problems that regularly turn up in the applications that developers "throw over the wall" to testers and clients. Choose a session variable, and your application behaves mysteriously for a user who didn't respect the "one window at a time" assumption you made. Passing hidden variables in forms, on the other hand, might work OK when you're testing with a small data set over a LAN, but could rapidly become a performance nightmare for dialup users using a production database. Performance can be improved in a number of ways: for instance, by delta-sigma compressing the item list, or creating a "form scope" variable that's keyed against a unique identifier in the form. Either way, quality web applications take quality thought.

(7) Lack of engineered application state: Engineered Application State is the gem of database-backed web applications. If you keep the state of your application in a relational database, you need to ~design~ the state of your application. You need to ~think~ every time you add or change a table in your relational database. You can add a new variable to your application as easily as typing '$'.

Desktop apps keep the application state in a tangle of pointers. C and C++ applications tend to contain 5 or more defects per thousand lines of code. Errors show up in data structures over time, just as mutations occur in your cells.
Memory leaks, application hangs, and crashes are cancers caused by these mutations. PHP apps die at the end of each request, and are reborn for the next request. They don't accumulate errors over time. Web application environments such as Java and Cold Fusion that involve a long-running process regularly hang or crash and require restarts. When is the last time you've had to restart PHP?

A database protects you from errors in multiple ways. Transactions, for instance, protect against data corruption caused by crashing scripts. It's easy to write

    $_SESSION["logged_in"]=true;

in one place and

    $_SESSION["logged-in"]=false;

in another, introducing unpredictable behavior and security holes. A relational database will give you an error if you try something like that.

-------------

Can users of $_SESSION avoid the seven deadly sins? Yes. In practice they don't.

Paul,

That looks like a lot of info to digest without specific examples. Is there a book or other resource on session management that you recommend that deals with these issues in more detail?

Thanks.

-Leo

I'm not aware of one, but I wish there was. I think the question isn't so much "session management" but about how to manage state in a stateless protocol -- sessions are one abstraction for doing that, but other abstractions exist too.

I think the best approach here is the "Pattern Vocabulary" approach. There are certain practices, that when applied to an application, have certain results. For instance, there's the pattern of "Stateless Server" -- the complete state of the application (or subsystem thereof) is kept in hidden POST and GET variables. You accept some limits, but get some real benefits: infinite scalability, no headaches with the back button, no need for cookies...

You might try the above and then notice that you're passing 100K around in your hidden form variables... People are complaining that your app is slow.
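For concreteness, the "Stateless Server" pattern might look something like the following sketch, in which state from earlier pages of a multi-page form is carried forward as hidden fields. The function names `collect_state()` and `hidden_fields()` and the field names are hypothetical, and the whitelist step is an added assumption rather than something the message above prescribes.

```php
<?php
// Hypothetical helper: keep only the whitelisted keys from the submitted data,
// so arbitrary user-supplied fields are not echoed back into the next page.
function collect_state(array $post, array $allowed): array
{
    return array_intersect_key($post, array_flip($allowed));
}

// Hypothetical helper: render each state entry as an HTML-escaped hidden input.
function hidden_fields(array $state): string
{
    $html = '';
    foreach ($state as $name => $value) {
        $html .= sprintf(
            '<input type="hidden" name="%s" value="%s" />' . "\n",
            htmlspecialchars((string) $name, ENT_QUOTES, 'UTF-8'),
            htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
        );
    }
    return $html;
}

// Page 2 of a form: carry page 1's answers forward without any server-side state.
$state = collect_state(['name' => 'A&B', 'evil' => 'x'], ['name', 'email']);
echo hidden_fields($state);
```

Because nothing is stored on the server, each browser window carries its own copy of the state; the cost is that the whole state travels with every request.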
Now you can generate a unique id each time you draw a form ("Generated Form Scope", for lack of a better term.) You can stuff your "hidden" variables into the database under this key, and restore them when the key comes back... If your code is organized right (does something like $vars=$_POST, and only looks at $vars afterwards), you can do this transparently to the rest of your app.

The same kind of thinking can protect you against certain kinds of back button woes -- you can at least stop people from submitting the same form more than once, by checking to see if a form with that unique id has been submitted before.

"Shopping Cart" is another pattern. People often use session variables to handle shopping carts, but that's really not ideal from a user interface perspective... Ideally, each instance of a shopping cart has its own unique id... Imagine we want to make an e-commerce site that behaves like amazon.com:

(1) User visits e-commerce site from a home computer -- a long-term tracking cookie gets stuck on their browser
(2) User adds item A to their shopping cart... A new shopping cart is created with id #101, associated with the tracking cookie.
(3) User adds items B,C,D, and E to their shopping cart in the course of 30 minutes of browsing. Each time an item is added, we add a row to a table in the database that links the item id to the shopping cart id.
(4) 4-year old hits reset button
(5) User comes back to e-commerce site... He's happy to find his cart is still there. User creates account #202 to check out. Shopping cart #101 is associated with account #202
(6) User checks out shopping cart.
(7) User comes back a week later, wants to buy a few more items. The site recognizes who he is. He adds two of item A and an item F to a newly created shopping cart with id #102, associated with user account #202.
(8) User goes to work, logs in... The system sees that he has shopping cart #102 open. He adds item G, and then checks out.
(9) User learns that he can trust this site to work correctly and becomes a loyal customer.

It's nice that we've got a historical record of the shopping cart after the fact, but there's a more important point -- we could have lost the customer's dollar at many points in the above transaction if we were using a $_SESSION based cart. The session wouldn't have survived step 4, for instance. A good user interface isn't academic here... It puts money in our pocket.

The above scenario is complex, and it might not be fair to expect that a first-generation shopping cart has those features. A $_SESSION-based shopping cart would need to be completely reworked to add the features above. A cart that uses a unique "cart id" and a relational back end will be a lot more maintainable... You could even start out using $_SESSION to keep track of the "cart id", then keep it in a cookie, then associate it with a user name, add the facility to promote an anonymous cart to an authenticated cart and so on. Starting with a good design, we can provide the interface that we ~want~ to provide, not the one that our abstraction layer ~forces~ us to provide.

In regards to slides 29 and 30, can you elaborate and give a more detailed example of what they are trying to say? Are they saying that the session key should contain a hash of the data? Or does the hash become the "salt" in crypting the data? Finally, how does doing that make it easier to prevent circumvention and forgeability?

Let's take it a step at a time... Imagine we've got a token of the following format...

    $token="$user_id:$session_id"

The session_id doesn't have to be unpredictable -- it could come from an auto_increment column in a database table... With the caveat that people could estimate the usage of your site by looking at the session ids. You could put this in a cookie, and it would work quite well, as long as you didn't have users who knew how to look at or change the cookies.
An attacker who understands cookies can easily change the user id, or session_id. To protect the cookies from tampering, we could do something like

    $hash=sha1($token);
    $signed_token="$hash:$token";

We could check the integrity of the token by recomputing the hash and seeing if it matches the one in the signed token. This protects against accidental damage, or very simple attacks. Still, it's quite possible that an attacker could guess what you're doing: it wouldn't be safe at all in an open source system.

That's where the salt comes in... For a particular web site, we create a random "salt" that, effectively, gives us a unique hash function for our web site.

    $salt="... a random salt defined in a per-site configuration file ...";

    function private_hash($token) {
        global $salt;
        return sha1("$salt:$token");
    }

    $private_hash=private_hash($token);
    $signed_token="$private_hash:$token";

Now, nobody can alter your tokens unless they know your salt. Because the tokens are cryptographically signed, the token itself is a proof that somebody has logged in -- you don't need to look at the database or keep ~any~ server side state. This makes it a highly scalable system... This basic approach is used on some of the biggest sites in the world, such as yahoo.com.

Except for one little detail: replay attacks. Nothing stops a person from saving his token and presenting it later -- after his account may have been deactivated, or after associated session information has been purged (an error condition). An attacker that gets the person's cookie jar, or who intercepts network traffic, can also steal the token.

It's not possible to completely protect against sophisticated attacks where a hostile party controls your network without installing complex software on both ends, and solving some intrinsically difficult problems having to do with mutual authentication.
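The salted signing step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names SITE_SALT, sign_token() and verify_token() are made up for the example, and hash_hmac() with SHA-256 is swapped in for the plain sha1("$salt:$token") construction from the text, since a keyed HMAC is the more usual way to build a "private hash". It does nothing about the replay problem.

```php
<?php
// Hypothetical per-site secret; in practice it would come from a
// per-site configuration file, as the text suggests.
const SITE_SALT = 'replace-with-a-long-random-per-site-value';

// Sign a token. hash_hmac() stands in for sha1("$salt:$token").
function sign_token(string $token, string $salt = SITE_SALT): string
{
    return hash_hmac('sha256', $token, $salt) . ':' . $token;
}

// Return the original token if the signature matches, null otherwise.
function verify_token(string $signed, string $salt = SITE_SALT): ?string
{
    // The signature is hex, so the first ':' separates it from the token.
    $parts = explode(':', $signed, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$mac, $token] = $parts;
    $expected = hash_hmac('sha256', $token, $salt);
    // hash_equals() compares in constant time.
    return hash_equals($expected, $mac) ? $token : null;
}

$signed = sign_token('202:101');   // "$user_id:$session_id"
var_dump(verify_token($signed));   // the untouched token verifies

// Keep the signature but change the user id: verification fails.
$forged = explode(':', $signed, 2)[0] . ':1:101';
var_dump(verify_token($forged));
```

The check needs no database lookup, which is the scalability point made above; anyone holding a valid signed token can still present it again later.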
Let's just say that the developers of SSL have solved these problems, and that you should use SSL for applications with the strongest security needs.

We can, however, make replay attacks a lot harder by adding a timestamp... Now the token looks like

    $timestamp:$user_id:$session_id

Now we're keeping a table on the server that looks like

    create table session (
        session_id ... session id ... primary key,
        user_id ... user id ...,
        last_updated ... timestamp ...,
        begin_time ... timestamp ...,
        end_time ... timestamp ...
    );

Now we've got two constants:

REFRESH_TIME: how old a timestamp is before we issue a token with a new timestamp and write the timestamp to the last_updated column.
EXPIRE_TIME: how old a timestamp is before we eliminate the session.

You might think you could put the client ip address in the token, and lock the session to an ip address to make it harder to steal tokens. I tried this, but found out that some of the largest ISPs (such as AOL) have a proxy server that makes users seem to "jump around". You can do it if you know people are logging in from a sane ISP, but you can't do it in general.

---

This system can be improved in numerous ways, such as adding anonymous sessions, operating in a split http/https mode, and caching authorization system in the token. If you're worried about information leakage (you don't want someone to know that he got session 88427 yesterday and 99105 today), you can encrypt the token. But be careful... It's easy to use cryptography the wrong way: don't rely on encryption to protect token integrity against tampering -- most of the obvious schemes don't really work.

cookie usage: 20 per domain, 4094 characters (bytes) in the value

Horde_Model -> Horde_Rdo_Model extends it
Horde_Type
Page/Block object - how to return block from driver, inherit Block methods, but also inherit Rdo_Base? Mapper!
_Mappers are the drivers_

Nag - tasks are a model
different models for different sources of tasks
so maybe horde_rdo_model isn't extension but delegate?

types are string, etc.
types can be used by rdo as well as by forms (models)
form helpers go into horde_view helper pack

Horde_Model: validation:
validatesPresenceOf
validatesUniquenessOf
validatesAcceptanceOf
validatesConfirmationOf

one database, one real filesystem space
no globals
webroot has: index.php, .htaccess, assets/ (css, images, js), mod_rewrite rules
everything else pear-installable
make assets pear installable somehow
viewbuilder/pagebuilder - custom views
command line and web service actions (still api/method/params)
catalyst::message() - replaces logmessage - fatal, notification, observer - has a return value (?)
session object management
cms for rampage based on (replacing) ulaform + wicked + giapeto
horde_form - db and xml descriptions instead of just php building
reconcile driver architecture with Rdo Models
apps provide models instead of forms?
apps provide route bundles? (if frontcontroller)
forms are models!
reconcile models and mappers
what do routes point to (models? mappers? views?) -> controllers
controllers handle mappers vs. models?
composite mapper? (turba, etc.)

After reading that theserverside.com entry, it seems like we've been doing this in Solar (framework for PHP5) for a little while now. Essentially, after processing a form, you call $this->_redirectNoCache('controller/action') and you shouldn't get any re-POST troubles. Boring code from the page-controller follows.

<http://solarphp.com/svn/trunk/Solar/Controller/Page.php>

    /**
     *
     * Redirects to another page and action after disabling HTTP caching.
     *
     * The _redirect() method is often called after a successful POST
     * operation, to show a "success" or "edit" page. In such cases,
     * clicking "back" or "reload" will generate a warning in the
     * browser allowing for a possible re-POST if the user clicks OK.
     * Typically this is not what you want.
     *
     * In those cases, use _redirectNoCache() to turn off HTTP caching, so
     * that the re-POST warning does not occur.
     *
     * This method sends the following headers before setting Location:
     *
     * {{code: php
     *     header("Cache-Control: no-store, no-cache, must-revalidate");
     *     header("Cache-Control: post-check=0, pre-check=0", false);
     *     header("Pragma: no-cache");
     * }}
     *
     * @param Solar_Uri_Action|string $spec The URI to redirect to.
     *
     * @param int|string $code The HTTP status code to redirect with; default
     * is '303 See Other'.
     *
     * @return void
     *
     */
    protected function _redirectNoCache($spec, $code = 303)
    {
        // reset cache-control
        $this->_response->setHeader(
            'Cache-Control',
            'no-store, no-cache, must-revalidate'
        );

        // append cache-control
        $this->_response->setHeader(
            'Cache-Control',
            'post-check=0, pre-check=0',
            false
        );

        // reset pragma header
        $this->_response->setHeader('Pragma', 'no-cache');

        // continue with redirection
        return $this->_redirect($spec, $code);
    }

apps provide models instead of forms
apps provide route bundles
apps provide controllers

seekable iterators? use of ArrayIterator
adding LimitIterators and FilterIterators on top of Rdo
match up RDO with making resources first class - a wiki page, a task, etc. all get a URI

Meanwhile HTTP was designed for access to resources, the "primary key" being determined by its URL (vs. having to worry about the insert id). If you think "documents", it's clear there's no need to make a distinction between creating and updating -- creating a document results in the first version. Updating means overwriting an existing document with a new version. But in both cases the client is POSTing the same thing and does not need to be aware of whether the document already existed or not.

Meanwhile a common first demo app for server side frameworks is a CRUD example.
The implication here is frameworks place a strong emphasis on the database, while HTTP is largely ignored (it's rare to even see HTTP status codes as a fundamental part of a framework). Avoiding a long filesystem vs. database discussion (like the need for virtual file systems with extensible properties) suffice to say -- consider how Dokuwiki stores wiki pages 1-to-1 as files compared to MediaWiki. What makes more sense to you? Perhaps our websites have been driven too far by the database?

The point here is, given the mismatch between HTTP and CRUD, we've put CRUD first which in turn makes actions first class in our frameworks. We aim to support N different types of action (verbs) when really we should have been dealing with only three -- GET, POST and DELETE (the latter being perhaps re-routed to a specific "resource class" method according to some framework / form conventions).

# Nannying: tell me how to get organised -- clear signposts for where to put my code.
# Just add water: give me my prototype now!
# Don't make me think: I can do this stuff even on my dumbest days.
# DRY: making the same change 50 times is not cool.
# Anti-pasta: help me avoid spaghetti
# Security: no nasty surprises please. Help me get this right first time.
# Testing: help me protect myself against myself.

To me, what we should look at is the basic reasons why we want to manage web-pages and satisfy them:

1. centralized control over page rights and access
2. ability to remap urls due to changes in web-site structure
3. handling 404-errors intelligently
4. ability to dynamically add headers and footers to pages for displaying alerts such as "system going down at 5pm"
5. separates content from presentation in a reasonable manner, eg. with templates
6. managing tainted data (eg.
POSTS, GETS, COOKIES)

AJAX Considered Harmful

Please pardon the provocative title, but this post is intended to surface one point I buried in yesterday's presentation in the hopes that by making it a separate post it will attract a wider audience. I intend for this post to be constructive, so I will focus on two specific suggestions which hopefully will serve as the seed for the development of a set of best practices for AJAX.

Here are the two humble suggestions on things that people should standardize on:

* the data should first be encoded as octets according to the UTF-8 character encoding
* GET should never be used to initiate another operation which will change state

Rationale for these two suggestions follows.

Encoding

For the former, I proposed a simple test: The first thing I want you to do is to copy the string "Iñtërnâtiônàlizætiøn" into your tool and observe what comes out the other side. When expressed as a part of the query component of a URI, it should look like I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n.

Standardizing improves interoperability, and the reason why I am suggesting UTF-8 is that it is backwards compatible with ASCII, can express the full range of the Unicode character set, and is widely implemented.

Idempotency

Looking into the current PHP implementation of SAJAX, you will see the following:

    // Bust cache in the head
    header ("Expires: Mon, 26 Jul 1997 05:00:00 GMT");    // Date in the past
    header ("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");    // always modified
    header ("Cache-Control: no-cache, must-revalidate");    // HTTP/1.1
    header ("Pragma: no-cache");    // HTTP/1.0

This code should be a rather large clue that you are probably doing something wrong. Apparently the author recognized that these headers are somewhat sporadically and inconsistently implemented, and hoped that by combining them the chances of success would be improved.
The danger that the responses may be cached is actually the smaller of several concerns. A much bigger concern is that unsuspecting grandmothers and bots everywhere can be tricked into modifying online databases simply by following a link.

Judicious use of HTTP GET can be a very good thing. Perhaps toolkits can adopt a convention that procedure names that start with the characters "Get" use GET, everything else uses POST.

possible dispatcher:

    <?php
    define('HORDE_CONFIG_DSN', 'file:///var/horde/config/head-site1');
    define('HORDE_BASE', '/var/horde/head');
    // HORDE_BASE has no trailing slash, so add the separator here
    require_once HORDE_BASE . '/core.php';
    Horde_Rampage_Dispatcher::run();

no-cache headers:

    header("Cache-Control: no-store, private, must-revalidate, proxy-revalidate, post-check=0, pre-check=0, max-age=0, s-maxage=0");

meta tags to include:

    <meta name="MSSmartTagsPreventParsing" content="true" />
    <meta http-equiv="imagetoolbar" content="no" />

Things to watch out for:
PHP_SELF
SERVER_NAME
Referer - never depend on it
passwords - don't use just md5, add a salt.

or, consider everything in $_SERVER tainted. Of course $_GET, $_POST, $_REQUEST, $_COOKIE are. Edge cases: $_SESSION, backend databases. If you don't consider it input, then it's part of your application for security purposes.

Never display credit card info - this means it shouldn't be repopulated!

Filtering is _inspection_, not correction. Don't try to correct invalid data.
Casts are relatively safe but still miss simplistic attacks.
When possible, whitelist - prove data valid. Simple list of values, or a regexp. Everything else is bad.
Need a model for making the filtered data clearly available, and don't touch the tainted data.
ctype_* - fast, and charset aware. Much better than regexp tests.

Output filtering
Escaping is preservation, not changing data.
HTML, javascript, cli output, session data, rss feeds, XML, etc. Any remote destination.
Need a clever way to integrate this into the template system! Perhaps a content-type on variables (too much?)
of text/html, text/plain, text/xml, etc.? How about instead of tag, have text:foo, html:foo, xml:foo? Or <tag:foo type="html">, defaulting to type="text".
Escaping MUST be charset aware. Data escaped for us-ascii might result in JavaScript in Japanese (not necessarily a valid example).
For filtering complex data, use checksums instead.
fopen_wrappers - turn off if possible?
display_errors - write a custom error handler, handle errors elegantly & integrated with Log object.
complexity leads to mistakes

http://phpsec.org/
http://brainbulb.com/
http://shiflett.org/
http://md5.rednoize.com/

http://www.midgard-project.org/updates/2003-05-29-000.html

* Standardized URL-to-object mapping
* Standardized object-to-application mapping
* Standardized navigational system
* Standardized object extensibility API
* Standardized way to make application output configurable

So, MidCOM is about standardizing how to build Midgard applications and site features. Let's look at each of the points in more detail.

Standardized URL-to-object mapping

Before MidCOM, Midgard site and application developers have had to figure out how to map URL requests into Midgard objects, typically to topics and articles. Everybody has rolled their own solution for this, using object names, IDs or GUIDs as the identifiers, and using either GET parameters or active page arguments.

With MidCOM, application development no longer has to start by writing a URL parser, as the MidCOM system provides this already. URL parsing happens completely in topic and article space, using object names as the identifiers. This makes for very clean URLs. Consider the following:

    /gallery/spring-2003/IMG_2442.html

This example would translate to the article named "IMG_2442" in topic "spring-2003" under topic "gallery". Clean, pronounceable and easy to use. Even better, any Midgard object instanced using a MidCOM component is aware of its location, providing the URL through MidCOM's metadata API.
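To make the name-based mapping concrete, here is a rough sketch of how a path like the one above could be split into a topic chain and an article name. The function parse_content_path() is hypothetical and has nothing to do with MidCOM's actual parser; it only illustrates the idea, assuming that a trailing "name.html" segment always names an article.

```php
<?php
// Hypothetical helper, not part of MidCOM: split a clean URL path into
// the chain of topic names and the article name it refers to.
function parse_content_path(string $path): array
{
    // Drop empty segments so leading, trailing or doubled slashes are harmless.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    if ($segments === []) {
        return ['topics' => [], 'article' => null];
    }

    $last = array_pop($segments);
    // A trailing "name.html" names an article; otherwise the last
    // segment is treated as one more topic.
    if (substr($last, -5) === '.html') {
        return ['topics' => $segments, 'article' => substr($last, 0, -5)];
    }

    $segments[] = $last;
    return ['topics' => $segments, 'article' => null];
}

// Topics "gallery" and "spring-2003", article "IMG_2442".
print_r(parse_content_path('/gallery/spring-2003/IMG_2442.html'));
```

A real implementation would then look each name up in the object store and decide what to do when a topic or article does not exist, which this sketch leaves out.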
Standardized object-to-application mapping

In addition to connecting URLs to Midgard objects, URLs also need to be connected to specific applications, or in MidCOM terms, components. All topics in MidCOM are assigned to be managed by a component. This means that different parts of the site can work in different ways. For example, the URL

    /news/midgard-tutorial.html

could load a "news ticker" component, and provide the topic "news" and article "midgard-tutorial" to be handled and displayed by it. The newsticker component can fully control the administrative interface for managing content under it, and the output provided by URLs it manages.

A component is selected for each topic separately. This means that the example URL

    /news/contacts/bergius.html

could be handled by an "employee directory" component.

Standardized navigational system

Each MidCOM component provides all navigational information about objects managed by it to a system called NAP, which is accessible by an easy object-oriented API. The NAP system means that site developers don't worry about different components or object types when writing the site's navigational interface. You can write one script for generating the whole site navigation, and it will work with the site and any component under it.

This makes standardized navigational tools like breadcrumbs or the NemeinNavBar utility much more useful, as they can be used with any MidCOM-based site. I expect that in the near future site developers will have a huge library of prebuilt navigational systems to select from.

Standardized object extensibility API

Enabling content managers to define their own object types or metadata fields has always been a problem with Midgard, meaning that any new metadata field has forced site developers to write their own content creation UIs. MidCOM provides an easier system for this called datamanager.

With datamanager, site developers can define their own custom data structures, called "layouts".
Layouts are PHP arrays telling datamanager what fields to allow for objects handled for that component, how to present those fields in an administrative interface, and where to store them (parameters, object fields or attachments).

Using datamanager, component writers don't really have to care about what object fields site developers will want to use, they just need to use the datamanager utility. Data structure "layouts" can be provided as part of the default component configuration, and can be overridden on a per-sitegroup basis.

Datamanager is integrated into the MidCOM AIS content management interface, providing customized editing forms for all components based on widgets defined in the "layouts" configuration. The widgets can be anything from text input boxes to a WYSIWYG editor or image upload system.

Standardized way to make application output configurable

The MidCOM specification requires that all application output is handled through the MidCOM style system. MidCOM's style engine is an extension of the Midgard style engine, allowing component outputs to be configured using style elements, but also for fallback elements to be provided as snippets.

This means that output of any MidCOM component will be fully configurable by site developers using the familiar Midgard style engine. The style to be used can be defined separately for all topics, allowing for different output styles from the same components on a per-site-area basis.

Because components can be loaded dynamically to a Midgard page, site developers can have different parts of the same page use different styles, making administration of the style elements much easier.

Conclusions

MidCOM brings into Midgard something that has been lacking so far: a "write once and run everywhere" framework for building site components, styles and navigational tools. This promotes component sharing and code reuse, both within a single Midgard solution provider company, and within the international Open Source community.
So far Midgard has provided a nice content management framework, but actual sites have needed to be built from scratch. MidCOM promises to change that, making Midgard much easier to implement.

Of course, sloppy coding is still possible with MidCOM, but if component writers adhere to the MidCOM specification and PEAR coding standards, and use NemeinLocalization for internationalizing their components, we should achieve global reusability.

I invite all Midgard developers to seriously study and consider MidCOM for their projects. There is some learning curve, but real code reusability should repay that very quickly.

The Midgard Framework is a powerful toolkit for managing online information. Writing applications and functionalities to the platform is done using the easy-to-learn PHP scripting language. All interfacing with the system is done via a regular Web browser, and no special tools are needed for developers or content authors.

Main features of Midgard Framework include:

* Easy and well documented Application Programming Interface (API)
* Efficient management of Web content using a hierarchical topic system
* Separation of layout, content and site logic
* Support for editorial workflow and approval mechanisms
* Attachment of metadata to all content objects
* Management of PIM data including contacts and calendaring information
* Multilingual support (including Unicode) and localization
* Replication for clustered setups and staging
* Multi-company support using virtual databases
* Flexible user and group management

Midgard works on most common UNIX platforms, including Linux, FreeBSD and Solaris. Prebuilt binary packages are available for some Linux platforms (including Red Hat, Debian and Mandrake), and the system can be installed from sources to most other environments. For other environments, including hosted servers and Windows systems, there is the pure-PHP implementation, Midgard Lite.
The Midgard Application Server is free software developed internationally with the Open Source model and distributed under the GNU licenses. Commercial support, applications and services for the platform are available from a range of companies worldwide.

The PHPmole toolkit provides Midgard developers with a freely-available Integrated Development Environment (IDE) comparable to DreamWeaver and MS Visual Studio, with additional content management functionalities. With the Midgard CMS package, the ease-of-use of productivity software and office suites can be brought to Midgard content management.

query building:

    <?php
    // Instantiate the Query Builder for seeking MidgardArticles
    $query = new MidgardQueryBuilder("MidgardArticle");

    // Next add the SQL constraints you need
    // List articles only from specific topic
    $query->addConstraint("topic", "=", $topic->id);
    // List only articles that have been approved since some timestamp
    $query->addConstraint("approved", ">", $starting_time);

    // Order the articles based on their approval time
    $query->addOrder("approved", "DESC");

    // Get only 20 articles for this particular view
    $query->setLimit(20);
    // Start from the Nth page of this article list
    $query->setOffset($_REQUEST["startfrom"]);

    // Execute the query returning an array of matching MidgardArticle objects
    // The MidgardArticles are the full article objects with all regular methods
    $articles = $query->execute();
    if (!$articles) {
        // Handle error
    }

    // And then display your articles
    print_r($articles);
    ?>

Query Builder in action

Thanks to Jukka's efforts, we already have a working MidgardQueryBuilder. Let's start with a simple example.
    /* Define which MgdSchema type should be used and returned by QB */
    $qb = new midgardquerybuilder("NewMidgardArticle");
    /* Define constraints */
    $qb->addConstraint("topic", "<", 2);
    $qb->addConstraint("title", "=", "News");
    /* Execute SQL query and return array */
    $f = $qb->execute();

MySQL query executed:

    SELECT article.id FROM article_i,article WHERE article.topic < 2 AND article_i.title = 'News' AND article.id=article_i.sid

As you notice, the title property is defined in the article_i table while the topic property is defined in the article table. Query Builder follows the class's table definitions and is able to search for objects which have more than one table as storage.

$qb->execute(); returned an array with only one object (due to the record returned by SELECT), so

    print_r($f[0]);

    NewMidgardArticle Object
    (
        [sitegroup] => 0
        [author] => 0
        [owner] => 0
        [realm] => article
        [guid] => cedda8cb461c9f846c73f043aaf888e9
        [changed] =>
        [updated] =>
        [action] => create
        [errno] => 0
        [errstr] =>
        [id] => 28
        [calstart] => 0000-00-00
        etc etc etc

Let's try to use datetime fields:

    $qb = new midgardquerybuilder("NewMidgardArticle");
    $qb->addConstraint("revised", ">", "2003-04-30 09:46:00");
    $f = $qb->execute();

MySQL query executed:

    SELECT article.id FROM article_i,article WHERE article.revised > '2003-04-30 09:46:00' AND article.id=article_i.sid

Now $qb->execute() returned an array with 5 objects. I do not want to print them all, so let's just check that the revised properties were selected correctly:

    print_r($f);

    Array(
        [0] => NewMidgardArticle Object ( [revised] => 2003-04-30 10:30:06
        [1] => NewMidgardArticle Object ( [revised] => 2003-04-30 10:01:18
        [2] => NewMidgardArticle Object ( [revised] => 2003-04-30 11:03:31
        [3] => NewMidgardArticle Object ( [revised] => 2005-04-05 16:29:16
        [4] => NewMidgardArticle Object ( [revised] => 2005-05-12 12:36:18

Simple, fast and useful :)

OK, now try to read about classes which extend MgdSchema classes and think how this could be used with Query Builder.
PHP class names are not case-sensitive, but MgdSchema type names are. So if we used only lowercase for type and class names in MgdSchema, we could extend MgdSchema classes and use our own classes and objects with Query Builder too. Just like this:

    class Amerigard extends NewMidgardArticle { }
    class FlyHigh extends Amerigard { }

    $qb = new midgardquerybuilder("FlyHigh");
    $qb->addConstraint("topic", "<", 2);
    $qb->addConstraint("title", "=", "News");
    $f = $qb->execute();

MySQL query executed:

    SELECT article.id FROM article_i,article WHERE article.topic < 2 AND article_i.title = 'News' AND article.id=article_i.sid

The above is not a working example of course, but it could be :)
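The case-insensitivity this idea relies on is easy to check in plain PHP. A minimal sketch follows; NewMidgardArticle here is an empty stand-in stub so the snippet runs without Midgard, not the real MgdSchema class, and nothing below touches the Query Builder itself.

```php
<?php
// Empty stand-in for the MgdSchema class (hypothetical stub).
class NewMidgardArticle {}
class Amerigard extends NewMidgardArticle {}
class FlyHigh extends Amerigard {}

// PHP resolves class names case-insensitively...
$article = new flyhigh();
var_dump($article instanceof NEWMIDGARDARTICLE);
var_dump(class_exists('flyhigh'));

// ...but get_class() reports the name as it was declared, so code that
// hands the name to a case-sensitive registry has to normalise it itself.
echo get_class($article), "\n";
echo strtolower(get_class($article)), "\n";
```

So a lowercase-only convention on the MgdSchema side would let any spelling of a PHP subclass name be normalised with strtolower() before lookup.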