Horde_Controller - make Horde_Controller_Dispatcher Horde_FrontController_Http or similar, then add other front controllers for Soap, XmlRpc, JsonRpc, Cli ...
Lots of overlap with debugging
Don't get too caught up in everything everyone else is doing. However, some things that might be useful food for thought are listed below. Projects may be added to or discarded from this list quickly as they are synthesized.
http://incubator.apache.org/thrift/
http://www.gearmanproject.org/
http://codeigniter.com/wiki/Modular\_Extensions\_-\_HMVC/
http://ojay.othermedia.org/articles/keyboard.html
http://www.lkozma.net/autocomplete.html
http://writer.bighugelabs.com/
http://www.backhand.org/wackamole/
protocol-independent URLs:
http://nedbatchelder.com/blog/200710.html#e20071017T215538
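// Conditional GET: answer 304 when the client's copy is still current.
// $serve_data is assumed to be populated earlier in the script.
$headers = apache_request_headers();
if (isset($headers['If-Modified-Since'])) {
    // strtotime() returns false for an unparseable date
    $ims = strtotime($headers['If-Modified-Since']);
    if ($ims !== false && $ims >= $serve_data['modified_time']) {
        header('HTTP/1.0 304 Not Modified');
        exit(0);
    }
}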
horde apps - "instance" of a horde app == installed horde app + a group of horde_policies that configure it
let those policies be named
instead of using shipping/foo api calls, use $instance->foo()
$Horde->api->method() (chaining)?
I just went through my first signup process that required an
SMS-capable device for confirmation. It also didn't make me pick my
credit card type, and instead used my country code (+1) to decide on a
card detection algorithm.
update_client.pl /modules/future_contribution /modules/future_signup
I think I have now found the right mysql-server settings, with which the performance is quite OK. Increasing the sort_buffer_size was one of the changes that helped.
skip-external-locking
skip-thread-priority
key_buffer = 64M
max_connections = 1024
max_connect_errors = 1000
max_allowed_packet = 8M
table_cache = 512
sort_buffer_size = 8M
read_buffer_size = 1M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 64M
thread_cache_size = 50
query_cache_size = 128M
tmp_table_size = 1024M
thread_concurrency = 12
wait_timeout = 60
interactive_timeout = 60
log_slow_queries
add dynamic finders (find_by_name, find_by_id, etc.) to Rdo Mappers or Horde_Db_Model or whatever
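A minimal sketch of how dynamic finders could be layered on a mapper with PHP's `__call`. `Sketch_Mapper`, its in-memory rows, and `findBy()` are hypothetical stand-ins, not existing Rdo or Horde_Db API:

```php
<?php
// Hypothetical mapper; rows live in an array so the sketch runs without a database.
class Sketch_Mapper
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Generic finder that the dynamic finders delegate to.
    public function findBy($field, $value)
    {
        $matches = array();
        foreach ($this->rows as $row) {
            if (array_key_exists($field, $row) && $row[$field] === $value) {
                $matches[] = $row;
            }
        }
        return $matches;
    }

    // Turn find_by_<field>($value) into findBy('<field>', $value).
    public function __call($method, $args)
    {
        if (strpos($method, 'find_by_') === 0 && count($args) === 1) {
            return $this->findBy(substr($method, strlen('find_by_')), $args[0]);
        }
        throw new BadMethodCallException("Unknown method: $method");
    }
}

$mapper = new Sketch_Mapper(array(
    array('id' => 1, 'name' => 'imp'),
    array('id' => 2, 'name' => 'kronolith'),
));
$found = $mapper->find_by_name('kronolith');
echo $found[0]['id'], "\n"; // prints 2
```

A real mapper would translate the field name into a WHERE clause instead of scanning an array, and would probably check it against the known columns first.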
Controller classes/objects vs. Action classes/objects vs. Resources vs. API
how to develop? give up central config?
http://www.w3.org/Provider/Style/URI
index.php - global dispatcher
how to do themes/custom templates? chain local -> app -> horde?
a horde 4 installation:
config/
lib/
apps/
public/ <- with app/ subdirs containing images, etc.
everything routable goes in apps/
apps/
login/
help/
prefs/
admin/
etc...
... auto-install web files to a writable dir, either in web ui or in cli? keep apps self-contained that way?
app name is the first part of the route > /login
subdomain support
route aliases
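Rough sketch of "app name is the first part of the route" plus aliases. The alias map and the `horde` fallback are assumptions made for illustration only:

```php
<?php
// Resolve the app for a request path from its first segment, honouring aliases.
function sketch_resolve_app($path, array $aliases, $default = 'horde')
{
    $urlPath = parse_url($path, PHP_URL_PATH);
    if (!is_string($urlPath)) {
        return $default;
    }
    $segments = explode('/', trim($urlPath, '/'));
    $first = $segments[0];
    if ($first === '') {
        return $default;
    }
    // An alias such as "mail" maps onto the installed app behind it.
    return isset($aliases[$first]) ? $aliases[$first] : $first;
}

$aliases = array('mail' => 'imp');
echo sketch_resolve_app('/mail/inbox?page=2', $aliases), "\n"; // prints imp
echo sketch_resolve_app('/login', $aliases), "\n";             // prints login
```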
an app should uncompress over a horde/ dir - /config/app/*.php -> config dir is compiled/cached
horde is not rails. it is designed as a container for multiple, collaborating apps
need Horde_Db, whatever implements DML, DDL, and SQL - Mad? MDB2?
merge Rdo and Mad into Horde_Db
allow for overriding the mappers so that non-SQL can be used, but, default to SQL/sqlite and leverage it
framework repository/module
Horde/lib/...
Rampage/lib/...
use subpackages or multiple \*.xml for packages to avoid silliness?
apps should be installable into a horde container. shouldn't be tied to the app name - keep imp, krono, etc, but install as mail, cal, events (should be able to install two versions of krono w/ different permissions - see HordeSpaces)
installing gives a slug, that slug manages config, templates, themes, perms, etc.
figure out how to merge luxor into Chora
for now, build Horde_Content_* based on Rdo, then move to Horde_Db
Horde_Db provides Horde_Db_Mapper which creates Horde_Model_Base objects
apps have a config/ dir, but that's just defaults and defining base routes, policies, etc. user settings are stored in the db or a global directory.
should have parallel web and cli configuration and installation/update tools; web requires webserver to have write access to a config/ dir and to public/; cli tools do not (if run as another user)
Horde 4 app - a Horde 3.x app updated for PHP 5 and to use the latest libraries
Rampage app - "RAD" (rapid application development) MVC app that uses Horde 4
/horde/page/ -> dispatcher for Rampage modules w/ views (overridable), routes, controllers, etc.?
have generic views for rampage_login, rampage_admin_*, etc.
configuration:
config/routes.php
config/routes_local.php -> do this for all config files
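One possible shape for the `*_local.php` convention, as a sketch. It assumes each config file returns an array, which is not decided anywhere in these notes; `sketch_load_config()` is a made-up name:

```php
<?php
// Load <dir>/<name>.php, then overlay <dir>/<name>_local.php when it exists.
function sketch_load_config($dir, $name)
{
    $config = require "$dir/$name.php";
    $local = "$dir/{$name}_local.php";
    if (is_file($local)) {
        $overrides = require $local;
        // Local values win; defaults that are not overridden are kept.
        $config = array_replace_recursive($config, $overrides);
    }
    return $config;
}

// Demo with throwaway files standing in for a real config/ directory.
$dir = sys_get_temp_dir() . '/sketch_config_' . uniqid();
mkdir($dir);
file_put_contents("$dir/routes.php", "<?php return array('default_app' => 'horde', 'aliases' => array('mail' => 'imp'));");
file_put_contents("$dir/routes_local.php", "<?php return array('aliases' => array('cal' => 'kronolith'));");

$routes = sketch_load_config($dir, 'routes');
print_r($routes);
```

The shipped file stays untouched on upgrade, and the local file only needs to carry the differences.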
Horde_Content_Index -> horde-wide search
Random Horde Ideas
mini-cms for building your own sidebar/menu/etc?
labels labels labels
keywords also or just labels? probably just flexible labels
"smart folders"
Getting Things Done support? (other apps that do it - Tracks, Kinkless GTD, Midnight Inbox)
make mnemo into more of a snippet keeper? sort of like a personal cms - or wiki. carry the encryption feature through to other kinds of content
create an outliner!
tags/labels for mail
rename virtual folders to smart folders? too apple?
freetext boolean mail searches:
apples & oranges
apples | oranges
apples ! oranges (apples but not oranges)
apples & (oranges | lemons)
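A small evaluator for the syntax above, as a sketch only. Assumptions not settled in these notes: operators apply left to right with equal precedence, `a ! b` means "a and not b", and a term matches as a case-insensitive substring. All function names are made up:

```php
<?php
// Evaluate a boolean search such as "apples & (oranges | lemons)" against a text.
function sketch_search_matches($query, $text)
{
    // Split into operators, parentheses, and bare terms.
    preg_match_all('/[&|!()]|[^\s&|!()]+/', $query, $m);
    $tokens = $m[0];
    $pos = 0;
    $result = sketch_parse_expr($tokens, $pos, $text);
    if ($pos !== count($tokens)) {
        throw new InvalidArgumentException('Unexpected token in query');
    }
    return $result;
}

function sketch_parse_expr(array $tokens, &$pos, $text)
{
    $value = sketch_parse_term($tokens, $pos, $text);
    while ($pos < count($tokens) && in_array($tokens[$pos], array('&', '|', '!'), true)) {
        $op = $tokens[$pos++];
        $right = sketch_parse_term($tokens, $pos, $text);
        if ($op === '&') {
            $value = $value && $right;
        } elseif ($op === '|') {
            $value = $value || $right;
        } else {
            $value = $value && !$right; // "a ! b": a but not b
        }
    }
    return $value;
}

function sketch_parse_term(array $tokens, &$pos, $text)
{
    if ($pos >= count($tokens)) {
        throw new InvalidArgumentException('Query ended unexpectedly');
    }
    $token = $tokens[$pos++];
    if ($token === '(') {
        $value = sketch_parse_expr($tokens, $pos, $text);
        if ($pos >= count($tokens) || $tokens[$pos++] !== ')') {
            throw new InvalidArgumentException('Missing closing parenthesis');
        }
        return $value;
    }
    if (in_array($token, array('&', '|', '!', ')'), true)) {
        throw new InvalidArgumentException("Unexpected '$token'");
    }
    return stripos($text, $token) !== false;
}

var_dump(sketch_search_matches('apples & (oranges | lemons)', 'Apples and lemons')); // bool(true)
var_dump(sketch_search_matches('apples ! oranges', 'apples, oranges'));              // bool(false)
```

For real mail search the parsed tree would be translated into an IMAP SEARCH or SQL query rather than evaluated against a string.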
security of redirects:
http://www.xssed.com/mirror/39494/
This is sort of an interesting one. For the actual attack he merely figured out that we are base64 encoding the successurl and reflecting back whatever is there. The interesting thing is that merely filtering the unencoded data is not going to save us here. The string was javascript:alert(/XSS.By.Mityo/) and was being loaded into the URL field of a meta redirect, so our max filter of strip_tags is useless: there are no tags to strip. It just illustrates the rationale for Phase 2 of the security build out where we have to be careful when we are dealing with redirects.
In this particular case, we need to make sure we are getting a valid URL format. That will prevent javascript insertions. But we also want to make sure the URL is not redirecting outside the intended domain for some phishing scam. In this case I will fix the problem by validating the URL on the cons/login.inc.php where the data is coming in but will also try doing it on the generic show_redirect_message() call if I think I can do so without breaking other pages.
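A sketch of the two checks described above: valid URL format with a web scheme, and a host inside the intended domain. `sketch_is_safe_redirect()` and the allowed-host list are hypothetical and not the actual fix in cons/login.inc.php; this is also not an exhaustive defence against every URL-parsing trick:

```php
<?php
// Accept a redirect target only if it is an absolute http(s) URL on an allowed host.
function sketch_is_safe_redirect($url, array $allowedHosts)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    // Rejects javascript:, data:, and any other non-web scheme.
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
        return false;
    }
    // Rejects redirects that leave the intended domain.
    return in_array(strtolower($parts['host']), $allowedHosts, true);
}

$allowed = array('www.example.com'); // assumed per-site setting

// The attack string arrived base64-encoded, so validate *after* decoding.
$encoded = base64_encode('javascript:alert(/XSS.By.Mityo/)');
$decoded = base64_decode($encoded, true);
var_dump($decoded !== false && sketch_is_safe_redirect($decoded, $allowed)); // bool(false)
var_dump(sketch_is_safe_redirect('https://www.example.com/inbox', $allowed)); // bool(true)
var_dump(sketch_is_safe_redirect('http://evil.example.net/login', $allowed)); // bool(false)
```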
Event-driven apps:
"Understanding and implementing this event model can free your application from the constraints of defined elements. For example, instead of applying an event listener for each link in a menu, you can assign a single listener to the menu item itself and retrieve the event target. That way you don't need to change your script when the menu gets larger or when links get removed from it."
http://yuiblog.com/blog/2007/01/17/event-plan/
tagging/instant hierarchies as specialized permission-based search
RBAC
what is horde?
groupware?
horde data services?
horde data access?
ui layers
be the php dojo framework? or the php yui framework?
see http://tigermouse.epsi.pl/ ?
or, don't do desktop-like widgets? see UI design bookmarks
move away from gettext, at least as a default? midgard i18n notes:
http://www.midgard-project.org/discussion/developer-forum/midgard-s-multilang-support/
try to rely only on thread-safe extensions?
reduce dependency tree
avoid globals and non horde-namespaced functions/methods in framework and core app code
class-based registry apis
against edge cases: http://www.bakesalehq.com/contents/show/12/
features from Prado? http://www.urdalen.com/blog/?p=198
use functions where appropriate for shortcuts/helpers, like Mike's t("translated string") function? but would be horde_t? would call configured translation system
helper sets for dojo, protaculous, yui - simple functions like dojo_editor(), dojo_pane(), yui_map(), etc. Load with something like Horde/Layout/Helpers/YUI.php, etc. See http://www.ngcoders.com/projax/
Horde as a set of apps and methodology needs to pick a js lib, pick a template methodology, etc. - this is Rampage. Horde as a framework can allow for flexibility.
To make it even better, separate the control logic from the presentation. That way, back could be reverse, etc. I do this in all my forms since application logic and presentation "word play" are two distinct things to me. This is what I use:
<form method="post" action="form.php">
<input name="submitback" value="reverse" type="submit" />
<input name="submitnext" value="speed ahead" type="submit" />
<input name="submithome" value="no place like home" type="submit" />
</form>
Then, you can have a simple routine that captures submit actions regardless of the presentation value. You collect the fields whose names start with submit, require that there is exactly one, and whitelist it against the acceptable values. A multi-row table can expand upon the theme by using this: submitedit_3, submitdelete_3, submitedit_5, etc.
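A sketch of such a routine. The function name and the return shape (action plus optional row id) are my own choices for illustration, not part of the quoted advice:

```php
<?php
// Find which submit button was pressed from its *name*, ignoring the label
// shown to the user. Returns array(action, row id or null), or null when the
// request does not contain exactly one whitelisted submit field.
function sketch_submit_action(array $post, array $allowedActions)
{
    $found = array();
    foreach (array_keys($post) as $key) {
        // Matches submitnext as well as submitedit_3.
        if (preg_match('/^submit([a-z]+)(?:_(\d+))?$/', (string) $key, $m)) {
            $found[] = array($m[1], isset($m[2]) ? (int) $m[2] : null);
        }
    }
    if (count($found) !== 1 || !in_array($found[0][0], $allowedActions, true)) {
        return null;
    }
    return $found[0];
}

// Simulated $_POST contents so the sketch runs from the command line.
$post = array('note' => 'example', 'submitnext' => 'speed ahead');
print_r(sketch_submit_action($post, array('back', 'next', 'home')));

$post = array('submitdelete_3' => 'Delete');
print_r(sketch_submit_action($post, array('edit', 'delete')));
```

The button labels can then be reworded or translated without touching the control logic.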
caching
make sure Rdo and other services allow dropping in caching rules
http://sebastian-bergmann.de/pages/talks.html
phpunit - @test markup in methods
phpunit + selenium
cruise control?
really hope google will integrate any product of theirs with any other products of theirs? receive an email, transform it to document, add spreadsheet, add notes, add bookmarks saved from search history and a link to an event in calendar anyone?
From nyphp-talk:
The other day I had to get an application started in a hurry. It's
doing something useful at < 700 lines, but I'm considering options that
could grow it out to about 10 times that. It depends on a "core
library" that's < 500 lines. This library deals with common issues in
string handling, parameter handling, and HTML form generation.
About 10% of the application, or 70 lines, is a microframework
that's loosely built on Struts. About 20 of those lines are in 2
functions which would be generally useful for microframeworks (such as
file_exists_in_include_path()). Like Struts, the microframework
chooses an "action" based on form parameters: the action then chooses a
"view" -- a "view" is basically a template that a designer can edit
which can be supplemented by an optional "query" which pulls stuff out
of the database. Like Ruby-on-Rails, the microframework uses
convention instead of configuration: the dispatcher computes an "action
name" based on query parameters, and uses that to compute a
filename... It checks that the file exists and executes it with the
"require method".
The microframework uses no object-oriented techniques. That's not
because I have any antipathy to OO, but because I didn't need it, and
I like writing my actions, queries, and views in a style that "feels
like PHP".
Yes, my microframework is nowhere near as powerful as CakePHP or
Symfony. Yet, it's more flexible, because I can codesign it with my
application. Because it's so simple, I can easily adapt it to do what
I want. If I decide I really hate it, I can write a new one in an
hour. I'm an expert on it, because I developed it, and I wouldn't
have to take on the technical, social and emotional burdens of
"forking" an open-source codebase if I wanted to make a change in direction.
I'm moving towards a vision of web app architecture where we move
towards shared vocabulary and standardized interfaces. Rather than
working with a "comprehensive framework" that does everything, I'd like
to have a "framework construction set" that contains a number of
elements that I can take or leave."
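For reference, a sketch of the dispatch-by-convention idea from the post above. The post gives no code, so the `action` parameter, the `actions/` directory, and every function name here are assumptions; `stream_resolve_include_path()` is the built-in that does what a `file_exists_in_include_path()` helper would:

```php
<?php
// Check the include path for a file, as the post's helper presumably did.
function sketch_file_exists_in_include_path($file)
{
    // stream_resolve_include_path() returns false when nothing matches.
    return stream_resolve_include_path($file) !== false;
}

// Compute an action name from the request, map it to a file, and require it.
function sketch_dispatch(array $params, $default = 'index')
{
    $action = isset($params['action']) ? $params['action'] : $default;
    // Only plain names are allowed, so a request cannot escape actions/.
    if (!is_string($action) || !preg_match('/^[a-z][a-z0-9_]*$/', $action)) {
        return false;
    }
    $file = "actions/$action.php";
    if (!sketch_file_exists_in_include_path($file)) {
        return false;
    }
    require $file;
    return true;
}

// Demo: a throwaway app directory with one action, added to the include path.
$base = sys_get_temp_dir() . '/sketch_app_' . uniqid();
mkdir("$base/actions", 0777, true);
file_put_contents("$base/actions/hello.php", '<?php echo "hello from action\n";');
set_include_path($base . PATH_SEPARATOR . get_include_path());

sketch_dispatch(array('action' => 'hello'));                   // prints "hello from action"
var_dump(sketch_dispatch(array('action' => '../etc/passwd'))); // bool(false)
```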
Resources:
http://www.ryandaigle.com/articles/2006/06/30/whats-new-in-edge-rails-activeresource-is-here
mixins: http://www.symfony-project.com/book/trunk/17-Extending-Symfony
split db ideas: http://pear.php.net/pepr/pepr-proposal-show.php?id=359
http://dataspill.org/pages/projects/ruby-activeldap
More php features to look in to:
__toString works everywhere
SPL features: Regex Iterators, SplFileObject CSV support, Caching Iterator
Data: stream support
DateTime and DateTimeZone classes
set date.timezone ini setting automatically based on user?
Search engine sitemap stuff - of use at all? maybe support in rampage cms
http://p7.hostingprod.com/@www.ysearchblog.com/archives/000437.html
I want a registration info tab like Inbox.lv where they can change their personal stuff they put on file with us on the signup forms.
We may need Windows address book synchronization (this is a feature that fastmail is adding, and hotmail already has, so I guess we will have to also?). It is not a must in my books.
I want to add a new feature next to the attach button that is like send message after attached, so if they are uploading a big file they can leave and it will be sent automatically.
We MUST have an easy user interface. Fastmail has lots of features and they try to make it where you can do everything in 2 clicks or less. We need to try to do this. Fastmail is all bunched up and looks like shit though. We need to make ours packed with features like fastmail, but spread out like AOL. This will attract all the old people and beginners of the internet who have just gotten off of AOL and moved to DSL. Fastmail looks like it is only made for advanced users and is hard to get used to. We need to
have a main navigation bar which is on every page at the top and has all the mail icons that people use the most, like compose, inbox, addressbook, and options. Then we will have a subnavigation bar for each other page. For example, if you were to hit the calendar icon on the main navigation bar that is on the top of EVERY page, it would take you to the calendar page and show you the calendar, and the subnavigation bar would have all the calendar icons like add events etc. I was thinking, in IMP we could have the logo at the top left corner of the page, then on the top right we could have all the main navigation icons. Both the logo and the main navigation would be on every single page in IMP, so it would be easy to get around. Then the subnavigation bars could go where the main navigation bar is now in IMP, understand?
'Bounce' takes the currently selected emails and sends back an email to the addresses the email(s) came from saying basically that 'the email address does not exist' in a standard internet email protocol way. Some more organised spammers remove these from their lists. After sending the bounce response, the messages are deleted."
eGroupWare over Horde reasons
Linking: There is the "infolog" for linking items. An infolog item can be a to-do, call, or note. It can link to the addressbook, projects, calendar, or another infolog item. That is very flexible.
Access Control: Under Preferences, there is a "Grant Access" link for the calendar, addressbook, infolog, and projects. It allows you to select Read, Add, Edit, Delete, and Private access for each group and each user. Again, very flexible.
Categories: Multiple category selection is allowed in the addressbook, projects, calendar and infolog.
Custom Fields: I can create custom fields.
PHP_SELF
Executive summary: PHP_SELF intentionally includes extra URL garbage (or
valuable URL variables, take your pick) tacked on by the user. Don't use
it without knowing what it does.
Here's what you get when you hit the URL:
http://example.com/info.php/testing1?testing2 :
_SERVER["REQUEST_URI"] /info.php/testing1?testing2
_SERVER["PHP_SELF"] /info.php/testing1
_SERVER["SCRIPT_NAME"] /info.php
Get it? If you don't want that extra stuff tacked on by the user, use the
correct _SERVER variable. If you use REQUEST_URI or PHP_SELF, be aware the
user can affect the contents of that variable. 99% of the time, you want
SCRIPT_NAME, not PHP_SELF.
By the way, here's another test:
http://example.com/info.php/testing\<script>?testing :
_SERVER["REQUEST_URI"] /info.php/testing%3Cscript%3E?testing
_SERVER["PHP_SELF"] /info.php/testing<script>
_SERVER["SCRIPT_NAME"] /info.php
Note that the REQUEST_URI variable, which comes from Apache, is encoded,
while the PHP_SELF variable, which comes from PHP, is not. So PHP 5.2.0
still makes it possible to shoot yourself in the foot, and as I've pointed
out below, well-known PHP authorities actually recommend that you do so.
Here's the email that I sent in July 2005:
Subject: Re: nyphp-talk $_SERVER['PHP_SELF'} not working?
Date: Friday 22 July 2005 12:05 pm
From: Michael Sims <jellicle@gmail.com>
To: NYPHP Talk <talk@lists.nyphp.org>
On Thursday 21 July 2005 17:16, Dan Cech wrote:
You could put:
$_SERVER['PHP_SELF'] = $_SERVER['SCRIPT_NAME'];
into one of your common include files.
Yes. I'm afraid I don't understand this entire thread. Apparently
because of the numerous PHP developer articles recommending it, and
because of the php.net page which for whatever reason lists it first on
the list of predefined variables, people are using PHP_SELF when they
really want SCRIPT_NAME. SCRIPT_NAME solves all the problems mentioned
in this thread - it's just the script name, without any extra garbage
that might be tacked on by the user. PHP_SELF explicitly includes that
extra garbage, so solutions in this thread that involve stripping the
garbage off of PHP_SELF to make it safe are really, really missing the
point - just use SCRIPT_NAME instead. Please don't use FORM ACTION="";
according to the spec, what the browser does with that is undefined, so
even if it works in current browsers, it might not work in future ones.
People can be forgiven for making this mistake -- I'm here holding my
copy of _Learning PHP 5_, and it recommends on page 8 and again on page
86 the use of PHP_SELF for self-referencing forms, ahem -- but it's time
to put it to bed: PHP_SELF is unsafe for any usage where it is echoed
back to the page.
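In practice that comes down to something like the following sketch. The helper name is made up, and the `$_SERVER` values are set by hand (mirroring the second test request above) so it can run from the command line:

```php
<?php
// Build a self-referencing form action from SCRIPT_NAME rather than PHP_SELF,
// and escape it anyway before echoing it into HTML.
function sketch_form_action(array $server)
{
    return htmlspecialchars($server['SCRIPT_NAME'], ENT_QUOTES, 'UTF-8');
}

$server = array(
    'REQUEST_URI' => '/info.php/testing%3Cscript%3E?testing',
    'PHP_SELF'    => '/info.php/testing<script>',
    'SCRIPT_NAME' => '/info.php',
);

// prints: <form method="post" action="/info.php">
echo '<form method="post" action="', sketch_form_action($server), '">', "\n";
```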
SESSIONS:
I'll try to reply to this and some other people who replied to my previous message.
I'll start with my background. I've often been the person who the buck stops with --
somebody else develops an application that almost works (perhaps even puts it in
production) and then I have to clean up the mess. The app might be written in PHP,
Java, Cold Fusion, Perl, you name it. I've learned to see session variables as a "bad
smell".
When I develop my own applications, I use cookies for personalization and caching. I
use the authentication system described in
http://cookies.lcs.mit.edu/pubs/webauth:sec10-slides.ps.gz
this mechanism can carry a "session id", which in turn can be used as a key against
application state stored in a relational database. I think through the boundary cases,
and find that my greenfield apps behave predictably -- my only woe is that you'll
discover that browsers have a lot of undocumented behavior connected with cookies, form
handling, and caching. These are all problems that you still need to fight with if you use
sessions; see the comments for
http://www.php.net/manual/en/function.session-cache-limiter.php
The context of this is that the average web application is poor in the areas of
usability and security: recent studies show that 80% of web applications have serious
security problems
http://www.whitehatsec.com/home/resources/presentations/files/wh\_security\_stats\_webinar.pdf
Jakob Nielsen's website has been chronicling the sorry state of web application
usability:
Perhaps the top 20% of programmers can write applications with $_SESSION that don't
have serious security and usability problems, but what about the other 80%?
(1) Session variables are treacherous. Odd things can happen in boundary cases, such
as when sessions expire, or when you are targeted by session fixation attacks.
http://shiflett.org/articles/security-corner-feb2004
I've looked at many apps that use sessions that seem to be working... Until you walk
away for two hours, come back, and discover that you're logged in as somebody else. I
suppose I could have spent hours or days tracking down an intermittent problem, which
involved some confluence of browser oddness (IE was fine, Firefox was screwy), the
behavior of the session system, and crooked logic in the application. Or I could use
cryptographically signed cookies to implement an authentication system which won't give
me surprises in the future.
Anybody can write applications that work 95% of the time with $_SESSION. Getting the
other 5% right requires a deep understanding of state and statelessness on the web...
Which is what (many) people are trying to avoid when they use $_SESSION variables.
There are more than twenty configuration variables that affect the way sessions work
under PHP. Incorrect configuration of any of these can cause applications to fail,
often in intermittent ways. The use of a custom session handler can have unpredictable
effects on security, reliability and performance.
Other languages are a lot worse than PHP -- the use of the "scope" concept in
languages such as Cold Fusion and Tango makes it easy to use a session variable without
realizing it... Resulting in an application that "works" sometimes, but fails in
mysterious ways.
(2) Session variables are bound to a particular language. In the real world, I work
with legacy systems that might be written in other languages. I might have some old
pages in Cold Fusion that work just fine, and I won't rework them in PHP until I've got
a good reason. If users can set a customization parameter, such as the background of a
page, it's easy to write a cookie that all languages can read. Applications stuck in
the session variable roach motel aren't as maintainable and portable.
(3) PHPSESSID. Do I need to say more? I consider the client that wants user tracking
and can't accept cookies, so all the pages on their
site look like
http://www.example.com/about\_us.php?PHPSESSID=**pseudo-random blob**
Three months later they come back and wonder why their site isn't being indexed in
Google. Yes, there's a saner way to use this feature, but this "cure" to privacy
violation is worse than the cookie "disease", since session ids will leak out through
referrers, bookmarks, links that people cut-and-paste...
(4) The back button. When somebody asks a question about sessions on a forum, they'll
usually ask another question a few days or weeks later: "How do I disable the back
button?"
The underlying problem is a deep aspect of the structure of the web. There is certain
state information that's particular to a request (GET and POST variables) and certain
state information that has a more persistent scope (cookies, session information, a
relational database.) The back button makes it possible for these two things to get out
of sync.
Ultimately, we need a systematic strategy to deal with this. One pattern is to put
the complete state of the application in form variables. Applications that use this
pattern always work perfectly with the back button. This pattern doesn't work always
(hitting the back button shouldn't cancel your order on an e-commerce site), but it
works often... For instance, you can use hidden variables to hold onto form variables
for complicated forms that spread over several pages.
(5) Multiple windows. I think it's a human right to be able to have more than one window
open on a web site. If I'm shopping, for instance, I'd like to be able to look at two
products simultaneously. An application that keeps state in form variables doesn't care
how many you have open. If you're looking for jobs at an organization that uses
taleo.net's software, you'll find that it uses trickery to prevent you from having more
than one window open... So you can't look at two jobs at once, or look at the job
description while you're filling out the application. I suspect that they did this
because they don't want to spend forever debugging "race conditions" that could be caused
by a user acting in two windows simultaneously.
Session variables introduce problems of locking. PHP gets an exclusive lock on the
session for each page displayed. This hurts the performance of pages that use
dynamically generated images and Javascript, and can mysteriously deadlock AJAX
applications.
(6) Scalability, Reliability, and all that. This is a tricky one, because it depends
on particulars. Sessions can be lightning-fast in systems that keep them in RAM, such
as Java and Cold Fusion. The default session handler in PHP uses files, and is probably
faster than a relational database in a direct comparison: however, the session handler
will load all of the data into RAM, whereas a relational implementation may only need to
load information when it's needed. Keeping information in POST variables or cookies also
involves a tradeoff -- this is as scalable as it gets so far as server resources, but
requires that the state be passed back and forth between the browser and server. This is
no big deal if the state is 500 bytes. It's unacceptable if the state is 500 megabytes.
In most cases, it starts looking expensive when we're passing an extra 10k-100k around.
I've recently been working on a legacy app that contains a query (select a subset of
items) and reporting (display user-selected fields of those items) function. The
interface between those modules is simple: the query system passes a comma-separated
list of item identifiers to the reporting system. I like this, because it meant that
one system could be changed without affecting the other. I had to update the app so it
would work with a changed database schema, so both sides needed some work.
I discovered that the app was passing the item list as a session variable. This worked:
unless I was using the application in two windows at a time. In that case, a query in
one window would change the report delivered in another window. I thought about it, and
realized that in this case, result sets would always be under about 10k, and usually be
around 1k. Therefore, it made sense to pass this as a hidden variable in the form and
ditch the session variable.
This shows the kind of problems that regularly turn up in the applications that
developers "throw over the wall" to testers and clients. Choose a session variable, and
your application behaves mysteriously for a user who didn't respect the "one window at a
time" assumption you made. Passing hidden variables in forms, on the other hand, might
work OK when you're testing with a small data set over a LAN, but could rapidly become a
performance nightmare for dialup users using a production database.
Performance can be improved in a number of ways: for instance, by delta-sigma
compressing the item list, or creating a "form scope" variable that's keyed against a
unique identifier in the form. Either way, quality web applications take quality
thought.
(7) Lack of engineered application state: Engineered Application State is the gem of
database-backed web applications.
If you keep the state of your application in a relational database, you need to ~design~
the state of your application. You need to ~think~ every time you add or change a table
in your relational database. You can add a new variable to your application as easily as
typing '$'.
Desktop apps keep the application state in a tangle of pointers. C and C++ applications
tend to contain 5 or more defects per thousand lines of code. Errors show up in data
structures over time, just as mutations occur in your cells. Memory leaks, application
hangs, and crashes are cancers caused by these mutations.
PHP apps die at the end of each request, and are reborn for the next request. They
don't accumulate errors over time. Web application environments such as Java and Cold
Fusion that involve a long-running process regularly hang or crash and require restarts.
When is the last time you've had to restart PHP?
A database protects you from errors in multiple ways. Transactions, for instance,
protect against data corruption caused by crashing scripts. It's easy to write
$_SESSION["logged_in"]=true;
in one place and
$_SESSION["logged-in"]=false;
in another, introducing unpredictable behavior and security holes. A relational
database will give you an error if you try something like that.
Can users of $_SESSION avoid the seven deadly sins?
Yes.
In practice they don't.
Paul,
That looks like a lot of info to digest without specific examples. Is there a book or
other resource on session management that you recommend that deals with these issues in
more detail?
Thanks.
-Leo
I'm not aware of one, but I wish there was. I think the question isn't so much "session management" but about how to manage state in a stateless protocol -- sessions
are one abstraction for doing that, but other abstractions exist too.
I think the best approach here is the "Pattern Vocabulary" approach. There are
certain practices, that when applied to an application, have certain results.
For instance, there's the pattern of "Stateless Server" -- the complete state of the
application (or subsystem thereof) is kept in hidden POST and GET variables. You accept
some limits, but get some real benefits: infinite scalability, no headaches with the
back button, no need for cookies...
You might try the above and then notice that you're passing 100K around in your hidden
form variables... People are complaining that your app is slow. Now you can generate a
unique id each time you draw a form ("Generated Form Scope", for lack of a better term.)
You can stuff your "hidden" variables into the database under this key, and restore
them when the key comes back... If your code is organized right (does something like
$vars=$_POST, and only looks at $vars afterwards), you can do this transparently to the
rest of your app.
The same kind of thinking can protect you against certain kinds of back button woes --
you can at least stop people from submitting the same form more than once, by checking
to see if a form with that unique id has been submitted before.
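A sketch of "Generated Form Scope" with the double-submit check folded in. The class is hypothetical, and its array store is only a stand-in for the database table the post has in mind:

```php
<?php
// Park bulky form state on the server under a unique per-form id, and use the
// same id to refuse a second submission of that form.
class Sketch_FormScope
{
    private $store = array();

    // Called when the form is drawn; the returned id goes into a hidden field.
    public function save(array $vars)
    {
        $id = bin2hex(random_bytes(16));
        $this->store[$id] = array('vars' => $vars, 'submitted' => false);
        return $id;
    }

    // Called when the form comes back. Returns the saved vars, or null for an
    // unknown id or a form that was already submitted.
    public function submit($id)
    {
        if (!is_string($id) || !isset($this->store[$id]) || $this->store[$id]['submitted']) {
            return null;
        }
        $this->store[$id]['submitted'] = true;
        return $this->store[$id]['vars'];
    }
}

$scope = new Sketch_FormScope();
$id = $scope->save(array('items' => '12,15,31'));
var_dump($scope->submit($id) !== null); // bool(true): first submission accepted
var_dump($scope->submit($id));          // NULL: repeat refused
```

With a real table the "submitted" flag would have to be set atomically (for example inside a transaction) so two simultaneous submissions cannot both pass.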
"Shopping Cart" is another pattern. People often use session variables to handle
shopping carts, but that's really not ideal from a user interface perspective...
Ideally, each instance of a shopping cart has its own unique id... Imagine we want to
make an e-commerce site that behaves like amazon.com:
(1) User visits e-commerce site from a home computer -- a long-term tracking cookie gets
stuck on their browser
(2) User adds item A to their shopping cart... A new shopping cart is created with id
#101, associated with the tracking cookie.
(3) User adds items B,C,D, and E to their shopping cart in the course of 30 minutes of
browsing. Each time an item is added, we add a row to a table in the database that links
the item id to the shopping cart id.
(4) 4-year old hits reset button
(5) User comes back to e-commerce site... He's happy to find his cart is still there.
User creates account #202 to check out. Shopping cart #101 is associated with account
#202
(6) User checks out shopping cart.
(7) User comes back a week later, wants to buy a few more items. The site recognizes
who he is. He adds two of item A and an item F to a newly created shopping cart with id
#102, associated with user account #202.
(8) User goes to work, logs in... The system sees that he has shopping cart #102 open.
He adds item G, and then checks out.
(9) User learns that he can trust this site to work correctly and becomes a loyal
customer.
It's nice that we've got a historical record of the shopping cart after the fact, but
there's a more important point -- we could have lost the customer's dollar at many points
in the above transaction if we were using a $_SESSION based cart. The session wouldn't
have survived step 4, for instance. A good user interface isn't academic here... It
puts money in our pocket.
The above scenario is complex, and it might not be fair to expect that a
first-generation shopping cart has those features. A $_SESSION-based shopping cart would
need to be completely reworked to add the features above. A cart that uses a unique
"cart id" and a relational back end will be a lot more maintainable... You could even
start out using $_SESSION to keep track of the "cart id", then keep it in a cookie,
then associate it with a user name, add the facility to promote an anonymous cart to an
authenticated cart and so on. Starting with a good design, we can provide the interface
that we ~want~ to provide, not the one that our abstraction layer ~forces~ us to provide.
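A minimal sketch of the "cart id" design described above, assuming PHP 7.4 or later. In-memory arrays stand in for the relational tables, and the class and method names (CartStore, createCart, promoteToAccount, ...) are made up for illustration only.

```php
<?php
// Hypothetical in-memory stand-in for a "cart" table and a "cart_item" table.
final class CartStore
{
    private array $carts = [];   // cart id => ['cookie' => string, 'account' => ?int, 'open' => bool]
    private array $items = [];   // cart id => [item id => quantity]
    private int $nextId = 101;

    // Create a cart associated with an anonymous tracking cookie.
    public function createCart(string $trackingCookie): int
    {
        $id = $this->nextId++;
        $this->carts[$id] = ['cookie' => $trackingCookie, 'account' => null, 'open' => true];
        $this->items[$id] = [];
        return $id;
    }

    // Equivalent of adding a row that links an item id to a cart id.
    public function addItem(int $cartId, string $itemId, int $qty = 1): void
    {
        $this->items[$cartId][$itemId] = ($this->items[$cartId][$itemId] ?? 0) + $qty;
    }

    // Promote an anonymous cart to an authenticated cart.
    public function promoteToAccount(int $cartId, int $accountId): void
    {
        $this->carts[$cartId]['account'] = $accountId;
    }

    // Find the account's open cart, whichever browser the user logs in from.
    public function openCartFor(int $accountId): ?int
    {
        foreach ($this->carts as $id => $cart) {
            if ($cart['open'] && $cart['account'] === $accountId) {
                return $id;
            }
        }
        return null;
    }

    // Closing the cart keeps its rows as a historical record.
    public function checkOut(int $cartId): void
    {
        $this->carts[$cartId]['open'] = false;
    }

    public function itemsIn(int $cartId): array
    {
        return $this->items[$cartId];
    }
}
```

Because the cart lives behind its own id, the thing that points at it (session, cookie, account) can change without touching the cart rows.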
In regards to slides 29 and 30, can you elaborate and give a more detailed
example what they are trying to say? Are they saying that the session key
should contain a hash of the data? Or does the hash become the "salt" in
encrypting the data? Finally, how does doing that make it easier to prevent
circumvention and forgeability?
Let's take it a step at a time... Imagine we've got a token of the following format...
$token="$user_id:$session_id"
The session_id doesn't have to be unpredictable -- it could come from an
auto_increment column in a database table... With the caveat that people could estimate
the usage of your site by looking at the session ids.
You could put this in a cookie, and it would work quite well, as long as you didn't
have users who knew how to look at or change the cookies. An attacker who understands
cookies can easily change the user id, or session_id.
To protect the cookies from tampering, we could do something like
$hash=sha1($token);
$signed_token="$hash:$token";
We could check the integrity of the token by recomputing the hash and see if it
matches the one in the signed token. This protects against accidental damage, or very
simple attacks. Still, it's quite possible that an attacker could guess what you're
doing: it wouldn't be safe at all in an open source system.
That's where the salt comes in... For a particular web site, we create a random
"salt" that, effectively, gives us a unique hash function for our web site.
$salt="... a random salt defined in a per-site configuration file ...";
function private_hash($token) {
global $salt;
return sha1("$salt:$token");
}
$private_hash=private_hash($token);
$signed_token="$private_hash:$token";
Now, nobody can alter your tokens unless they know your salt.
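A hedged sketch of signing and checking such a token. Note that it swaps in PHP's hash_hmac() (HMAC-SHA256) for the plain sha1("$salt:$token") shown above, and hash_equals() for the comparison; the function names sign_token and verify_token, and the $secret parameter standing in for the per-site salt, are assumptions for illustration.

```php
<?php
// Assumption: $secret plays the role of the per-site "salt" from the text.
function sign_token(string $token, string $secret): string
{
    // HMAC-SHA256 is used here instead of sha1("$salt:$token").
    return hash_hmac('sha256', $token, $secret) . ':' . $token;
}

// Returns the original token if the signature matches, or null otherwise.
function verify_token(string $signed, string $secret): ?string
{
    // Split only at the first colon; the token itself contains colons.
    $parts = explode(':', $signed, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$mac, $token] = $parts;
    $expected = hash_hmac('sha256', $token, $secret);
    // hash_equals() compares the two strings in constant time.
    return hash_equals($expected, $mac) ? $token : null;
}
```

Usage would look like $signed = sign_token("$user_id:$session_id", $secret) when issuing the cookie, and verify_token($_COOKIE[...], $secret) on each request, treating null as "not logged in".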
Because the tokens are cryptographically signed, the token itself is a proof that
somebody has logged in -- you don't need to look at the database or keep ~any~ server
side state. This makes it a highly scalable system... This basic approach is used on
some of the biggest sites in the world, such as yahoo.com.
Except for one little detail: replay attacks.
Nothing stops a person from saving his token and presenting it later -- after his account
may have been deactivated, or after associated session information has been purged (an
error condition). An attacker that gets the person's cookie jar, or who intercepts
network traffic, can also steal the token.
It's not possible to completely protect against sophisticated attacks where a hostile
party controls your network without installing complex software on both ends, and
solving some intrinsically difficult problems having to do with mutual authentication.
Let's just say that the developers of SSL have solved these problems, and that you
should use SSL for applications with the strongest security needs.
We can, however, make replay attacks a lot harder by adding a timestamp... Now the
token looks like
$timestamp:$user_id:$session_id
Now we're keeping a table on the server that looks like
create table session (
session_id ... session id ... primary key,
user_id ... user id ...,
last_updated ... timestamp ...,
begin_time ... timestamp ...,
end_time ... timestamp ...
);
Now we've got two constants:
REFRESH_TIME: how old a timestamp is before we issue a token with a new timestamp and
write the timestamp to the last_updated column.
EXPIRE_TIME: how old a timestamp is before we eliminate the session.
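One way the two constants might drive the per-request decision, as a sketch only: the concrete values and the function name token_action are assumptions, and the database writes are left as comments.

```php
<?php
const REFRESH_TIME = 300;    // assumed value: 5 minutes
const EXPIRE_TIME  = 3600;   // assumed value: 1 hour

// Decide what to do with a token whose timestamp is $tokenTime, seen at $now.
// Returns 'expire', 'refresh' or 'ok'.
function token_action(int $tokenTime, int $now): string
{
    $age = $now - $tokenTime;
    if ($age >= EXPIRE_TIME) {
        // Too old: eliminate the session; the user has to log in again.
        return 'expire';
    }
    if ($age >= REFRESH_TIME) {
        // Issue a token with a new timestamp and write it to last_updated.
        return 'refresh';
    }
    // Fresh enough: accept the token without touching the database.
    return 'ok';
}
```

With this split, most requests never hit the session table; only one request per REFRESH_TIME window does, and a stolen token stops working once it is older than EXPIRE_TIME.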
You might think you could put the client ip address in the token, and lock the
session to an ip address to make it harder to steal tokens. I tried this, but found out
that some of the largest ISPs (such as AOL) have a proxy server that makes users seem to
"jump around". You can do it if you know people are logging in from a sane ISP, but you
can't do it in general.
This system can be improved in numerous ways, such as adding anonymous sessions,
operating in a split http/https mode, and caching authorization information in the token.
If you're worried about information leakage (you don't want someone to know that he
got session 88427 yesterday and 99105 today), you can encrypt the token. But be
careful... It's easy to use cryptography the wrong way: don't rely on encryption to
protect token integrity against tampering -- most of the obvious schemes don't really
work.
cookie usage:
20 per domain, 4094 characters (bytes) in the value
Horde_Model -> Horde_Rdo_Model extends it
Horde_Type
Page/Block object
Mapper! _Mappers are the drivers_
Nag - tasks are a model
different models for different sources of tasks
so maybe horde_rdo_model isn't extension but delegate?
types are string, etc.
types can be used by rdo as well as by forms (models)
form helpers go into horde_view helper pack
Horde_Model:
validation:
validatesPresenceOf
validatesUniquenessOf
validatesAcceptanceOf
validatesConfirmationOf
one database, one real filesystem space
no globals
webroot has:
index.php
.htaccess
assets/ (css, images, js)
mod_rewrite rules
everything else pear-installable
make assets pear installable somehow
viewbuilder/pagebuilder - custom views
command line and web service actions (still api/method/params)
catalyst::message() - replaces logmessage - fatal, notification, observer - has a return value (?)
session object management
cms for rampage based on (replacing) ulaform + wicked + giapeto
horde_form
reconcile driver architecture with Rdo Models
apps provide models instead of forms?
apps provide route bundles? (if frontcontroller)
forms are models!
reconcile models and mappers
what do routes point to (models? mappers? views?) -> controllers
controllers handle mappers vs. models?
composite mapper? (turba, etc.)
After reading that theserverside.com entry, it seems like we've been doing this in Solar (framework for PHP5) for a little while now. Essentially, after processing a form, you call $this->_redirectNoCache('controller/action') and you shouldn't get any re-POST troubles.
Boring code from the page-controller follows.
<http://solarphp.com/svn/trunk/Solar/Controller/Page.php>;;
/**
 *
 * Redirects to another page and action after disabling HTTP caching.
 *
 * The _redirect() method is often called after a successful POST
 * operation, to show a "success" or "edit" page. In such cases,
 * clicking "back" or "reload" will generate a warning in the
 * browser allowing for a possible re-POST if the user clicks OK.
 * Typically this is not what you want.
 *
 * In those cases, use _redirectNoCache() to turn off HTTP caching, so
 * that the re-POST warning does not occur.
 *
 * This method sends the following headers before setting Location:
 *
 * {{code: php
 *     header("Cache-Control: no-store, no-cache, must-revalidate");
 *     header("Cache-Control: post-check=0, pre-check=0", false);
 *     header("Pragma: no-cache");
 * }}
 *
 * @param Solar_Uri_Action|string $spec The URI to redirect to.
 *
 * @param int|string $code The HTTP status code to redirect with; default
 * is '303 See Other'.
 *
 * @return void
 *
 */
protected function _redirectNoCache($spec, $code = 303)
{
    // reset cache-control
    $this->_response->setHeader(
        'Cache-Control',
        'no-store, no-cache, must-revalidate'
    );

    // append cache-control
    $this->_response->setHeader(
        'Cache-Control',
        'post-check=0, pre-check=0',
        false
    );

    // reset pragma header
    $this->_response->setHeader('Pragma', 'no-cache');

    // continue with redirection
    return $this->_redirect($spec, $code);
}
apps provide models instead of forms
apps provide route bundles
apps provide controllers
seekable iterators?
use of ArrayIterator
adding LimitIterators and FilterIterators on top of Rdo
match up RDO with making resources first class - a wiki page, a task, etc. all get a URI
Meanwhile HTTP was designed for access to resources, the "primary key" being determined by its URL (vs. having to worry about the insert id). If you think "documents", it's clear there's no need to make a distinction between creating and updating -- creating a document results in the first version. Updating means overwriting an existing document with a new version. But in both cases the client is POSTing the same thing and does not need to be aware of whether the document already existed or not.
Meanwhile a common first demo app for server side frameworks is a CRUD example. The implication here is frameworks place a strong emphasis on the database, while HTTP is largely ignored (it's rare to even see HTTP status codes as a fundamental part of a framework).
Avoiding a long filesystem vs. database discussion (like the need for virtual file systems with extensible properties), suffice to say -- consider how Dokuwiki stores wiki pages 1-to-1 as files compared to MediaWiki. What makes more sense to you? Perhaps our websites have been driven too far by the database?
The point here is, given the mismatch between HTTP and CRUD, we've put CRUD first, which in turn makes actions first class in our frameworks. We aim to support N different types of action (verbs) when really we should have been dealing with only three -- GET, POST and DELETE (the latter being perhaps re-routed to a specific "resource class" method according to some framework / form conventions).
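To make the "resources first, three verbs" idea concrete, here is a small sketch assuming PHP 7.4 or later. The class DocumentResource and its handle() method are hypothetical, and an in-memory array stands in for real storage; the point is only that the client POSTs the same thing whether or not the document already exists, and that status codes are part of the result.

```php
<?php
// Hypothetical resource class: documents addressed by their URL path.
final class DocumentResource
{
    private array $docs = [];   // path => body

    // Dispatch on the HTTP verb; returns [status code, body].
    public function handle(string $method, string $path, string $body = ''): array
    {
        switch ($method) {
            case 'GET':
                return isset($this->docs[$path]) ? [200, $this->docs[$path]] : [404, ''];
            case 'POST':
                // Creating and updating are the same request for the client;
                // only the status code tells them apart.
                $status = isset($this->docs[$path]) ? 200 : 201;
                $this->docs[$path] = $body;
                return [$status, $body];
            case 'DELETE':
                if (!isset($this->docs[$path])) {
                    return [404, ''];
                }
                unset($this->docs[$path]);
                return [204, ''];
            default:
                // Any other verb is not supported by this resource.
                return [405, ''];
        }
    }
}
```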
To me, what we should look at is the basic reasons why we want to manage web-pages and satisfy them:
centralized control over page rights and access
ability to remap urls due to changes in web-site structure
handling 404-errors intelligently
ability to dynamically add headers and footers to pages for displaying alerts such as "system going down at 5pm"
separates content from presentation in a reasonable manner, eg. with templates
managing tainted data (eg. POSTS, GETS, COOKIES)
AJAX Considered Harmful
Please pardon the provocative title, but this post is intended to
surface one point I buried in yesterday's presentation in the hopes
that by making it a separate post it will attract a wider audience.
I intend for this post to be constructive, so I will focus on two
specific suggestions which hopefully will serve as the seed for the
development of a set of best practices for AJAX. Here are the two
humble suggestions on things that people should standardize on:
the data should first be encoded as octets according to the
UTF-8 character encoding
GET should never be used to initiate another operation which
will change state
Rationale for these two suggestions follows.
Encoding
For the former, I proposed a simple test:
The first thing I want you to do is to copy the string
"Iñtërnâtiônàlizætiøn" into your tool
and observe what comes out the other side.
When expressed as a part of the query component of a URI, it should
look like I%C3%B1t%C3%ABrn%C3%A2ti%C3%B4n%C3%A0liz%C3%A6ti%C3%B8n.
Standardizing improves interoperability, and the reason why I am
suggesting UTF-8 is that it is backwards compatible with ASCII, can
express the full range of the Unicode character set, and is widely
implemented.
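The test above can be reproduced in PHP with rawurlencode(). In this sketch the string is written as explicit UTF-8 bytes, so the result does not depend on how the source file itself is saved; the variable names are arbitrary.

```php
<?php
// "Iñtërnâtiônàlizætiøn" spelled out as UTF-8 octets.
$utf8 = "I\xC3\xB1t\xC3\xABrn\xC3\xA2ti\xC3\xB4n\xC3\xA0liz\xC3\xA6ti\xC3\xB8n";

// rawurlencode() percent-encodes every non-ASCII octet.
$encoded = rawurlencode($utf8);

// Decoding gives back the same octets, so the round trip is lossless.
$decoded = rawurldecode($encoded);
```

If $encoded does not match the expected query-component form given above, the input was probably not UTF-8 by the time it reached the encoder.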
Idempotency
Looking into the current PHP implementation of SAJAX, you will see the
following:
header ("Expires: Mon, 26 Jul 1997 05:00:00 GMT"); // Date in the past
header ("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
// always modified
header ("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
header ("Pragma: no-cache"); // HTTP/1.0
This code should be a rather large clue that you are probably doing
something wrong. Apparently the author recognized that these headers
are somewhat sporadically and inconsistently implemented, and hoped
that by combining them that the chances of success would be improved.
The danger that the responses may be cached is actually the smaller of
several concerns. A much bigger concern is that unsuspecting
grandmothers and bots everywhere can be tricked into modifying online
databases simply by following a link.
Judicious use of HTTP GET can be a very good thing. Perhaps toolkits
can adopt a convention that procedure names that start with the
characters "Get" use GET, everything else uses POST.
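The naming convention suggested above could look roughly like this. Both helpers are hypothetical (they are not part of SAJAX or any other toolkit); the second one shows how a server might enforce the rule rather than just follow it.

```php
<?php
// Hypothetical helper: pick the HTTP method from the procedure name.
// Names starting with "Get" (case-sensitive) use GET, everything else POST.
function method_for_procedure(string $name): string
{
    return strncmp($name, 'Get', 3) === 0 ? 'GET' : 'POST';
}

// Hypothetical server-side guard: a procedure that may change state
// is only accepted when the request actually arrived via POST.
function is_allowed(string $name, string $requestMethod): bool
{
    return method_for_procedure($name) === 'GET' || $requestMethod === 'POST';
}
```

With the guard in place, following a link (always a GET) can never trigger a state-changing procedure.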
possible dispatcher:
<?php
define('HORDE_CONFIG_DSN', 'file:///var/horde/config/head-site1');
define('HORDE_BASE', '/var/horde/head');
require_once HORDE_BASE . '/core.php';
Horde_Rampage_Dispatcher::run();
no-cache headers:
header("Cache-Control: no-store, private, must-revalidate, proxy-revalidate, post-check=0, pre-check=0, max-age=0, s-maxage=0");
meta tags to include:
<meta name="MSSmartTagsPreventParsing" content="true" />
<meta http-equiv="imagetoolbar" content="no" />
Things to watch out for:
PHP_SELF
SERVER_NAME
Referer - never depend on it
passwords - don't use just md5, add a salt.
or, consider everything in $_SERVER tainted. Of course $_GET, $_POST,
$_REQUEST, $_COOKIE are.
Edge cases: $_SESSION, backend databases. If you don't consider it
input, then it's part of your application for security purposes.
Never display credit card info - this means it shouldn't be
repopulated!
Filtering is _inspection_, not correction. Don't try to correct
invalid data. Casts are relatively safe but still miss simplistic
attacks.
When possible, whitelist - prove data valid. Simple list of values, or
a regexp. Everything else is bad.
Need a model for making the filtered data clearly available, and don't
touch the tainted data.
ctype_* - fast, and charset aware. Much better than regexp tests.
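A sketch of whitelist filtering along these lines: values are inspected, never corrected, and only proven-valid ones are copied into a separate clean array. The function name, the field names (color, id) and the list of allowed colors are made up for the example.

```php
<?php
// Copy only proven-valid values into $clean; the tainted input is never modified.
function filter_request(array $tainted): array
{
    $clean = [];

    // Whitelist by a simple list of values.
    $colors = ['red', 'green', 'blue'];
    if (isset($tainted['color']) && in_array($tainted['color'], $colors, true)) {
        $clean['color'] = $tainted['color'];
    }

    // Whitelist by character class: ctype_digit() is true only for
    // non-empty strings made up entirely of digits.
    if (isset($tainted['id']) && is_string($tainted['id']) && ctype_digit($tainted['id'])) {
        $clean['id'] = $tainted['id'];
    }

    // Anything not proven valid is simply absent from the result.
    return $clean;
}
```

The rest of the application would then read from the returned array only, e.g. $clean = filter_request($_POST), and never from $_POST directly.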
Output filtering
Escaping is preservation, not changing data.
HTML, javascript, cli output, session data, rss feeds, XML, etc. Any
remote destination.
Need a clever way to integrate this into the template system! Perhaps
a content-type on variables (too much?) of text/html, text/plain,
text/xml, etc.? How about instead of tag, have text:foo, html:foo,
xml:foo? Or <tag:foo type="html">, defaulting to type="text".
Escaping MUST be charset aware. Data escaped for us-ascii might result
in JavaScript in Japanese (not necessarily a valid example).
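One possible shape for the typed-variable idea, as a sketch only. The type names follow the text/html/xml suggestion above, with 'text' as the default; the function name escape_for_output and the rule that 'html' values are passed through untouched are assumptions. The charset is an explicit parameter because the escaping has to be charset aware.

```php
<?php
// Escape a template variable according to its declared type.
function escape_for_output(string $value, string $type = 'text', string $charset = 'UTF-8'): string
{
    switch ($type) {
        case 'html':
            // Assumption: the value is already trusted HTML, so it is left as-is.
            return $value;
        case 'xml':
            // Escape for an XML 1.0 destination (RSS feeds and the like).
            return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, $charset);
        case 'text':
        default:
            // Plain text going into an HTML page: preserve it by escaping.
            return htmlspecialchars($value, ENT_QUOTES, $charset);
    }
}
```

Defaulting to 'text' means a template author has to opt in to unescaped output rather than remember to escape.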
For filtering complex data, use checksums instead.
fopen_wrappers - turn off if possible?
display_errors - write a custom error handler, handle errors elegantly
& integrated with Log object.
complexity leads to mistakes
http://www.midgard-project.org/updates/2003-05-29-000.html
So, MidCOM is about standardizing how to build Midgard applications
and site features. Let's look at each of the points in more detail.
Standardized URL-to-object mapping
Before MidCOM Midgard site and application developers have had to
figure out how to map URL requests into Midgard objects, typically to
topics and articles. Everybody has rolled their own solution for this,
using object names, IDs or GUIDs as the identifiers, and using either
GET parameters or active page arguments.
With MidCOM, application development no longer has to start by
writing a URL parser, as the MidCOM system provides this already. URL
parsing happens completely in topic and article space, using object
names as the identifiers. This makes for very clean URLs. Consider the
following:
/gallery/spring-2003/IMG_2442.html
This example would translate to the article named "IMG_2442" in topic
"spring-2003" under topic "gallery". Clean, pronounceable and easy to
use. And even better, any Midgard object instanced using a MidCOM
component is aware of its location, providing the URL through MidCOM's
metadata API.
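The name-based mapping can be illustrated with a few lines of plain PHP (7.4 or later). This is not MidCOM's parser or API; parse_object_url is a hypothetical function that only splits a path into topic names and an article name the way the example above describes.

```php
<?php
// Hypothetical sketch: split a clean URL into topic names and an article name.
function parse_object_url(string $path): array
{
    // Drop empty segments caused by leading or trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), fn($s) => $s !== ''));

    // A last segment ending in ".html" names an article; the rest are topics.
    $article = null;
    if ($segments && substr(end($segments), -5) === '.html') {
        $article = substr(array_pop($segments), 0, -5);
    }

    return ['topics' => $segments, 'article' => $article];
}
```

For "/gallery/spring-2003/IMG_2442.html" this yields the topics "gallery" and "spring-2003" and the article "IMG_2442"; a real implementation would then look each name up in the object tree.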
Standardized object-to-application mapping
In addition to connecting URLs to Midgard objects, URLs also need to
be connected to specific applications, or in MidCOM terms, components.
All topics in MidCOM are assigned to be managed by a component. This
means that different parts of the site can work in different ways. For
example, URL:
/news/midgard-tutorial.html
Could load a "news ticker" component, and provide the topic "news" and
article "midgard-tutorial" to be handled and displayed by it.
The newsticker component can fully control the administrative
interface for managing content under it, and the output provided by
URLs it manages.
A component is selected for each topic separately. This means that the
example URL:
/news/contacts/bergius.html
could be handled by an "employee directory" component.
Standardized navigational system
Each MidCOM component provides all navigational information about
objects managed by it to a system called NAP, which is accessible by
an easy object-oriented API.
The NAP system means that site developers don't worry about different
components or object types when writing the site's navigational
interface. You can write one script for generating the whole site
navigation, and it will work with the site and any component under it.
This makes standardized navigational tools like breadcrumbs or the
NemeinNavBar utility much more useful, as they can be used with any
MidCOM-based site. I expect that in near future site developers will
have a huge library of prebuilt navigational systems to select from.
Standardized object extensibility API
Enabling content managers to define their own object types or metadata
fields has always been a problem with Midgard, meaning that any new
metadata field has forced site developers to write their own content
creation UIs.
MidCOM provides an easier system for this called datamanager. With
datamanager, site developers can define their own custom data
structures, called "layouts". Layouts are PHP arrays telling
datamanager what fields to allow for objects handled for that
component, how to present those fields in an administrative interface,
and where to store them (parameters, object fields or attachments).
Using datamanager component writers don't really have to care about
what object fields site developers will want to use, they just need to
use the datamanager utility. Data structure "layouts" can be provided
as part of the default component configuration, and can be overridden
on a per-sitegroup basis.
Datamanager is integrated to the MidCOM AIS content management
interface, providing customized editing forms for all components based
on widgets defined in the "layouts" configuration. The widgets can be
anything from text input boxes to a WYSIWYG editor or image upload
system.
Standardized way to make application output configurable
The MidCOM specification requires that all application output is
handled through the MidCOM style system. MidCOM's style engine is an
extension of the Midgard style engine, allowing component outputs to
be configured using style elements, but also for fallback elements to
be provided as snippets.
This means that output of any MidCOM component will be fully
configurable by site developers using the familiar Midgard style
engine. Style to be used can be defined separately for all topics,
allowing for different output styles from same components on per site
area basis.
Because components can be loaded dynamically to a Midgard page, site
developers can have different parts of the same page use different
styles, making administration of the style elements much easier.
Conclusions
MidCOM brings into Midgard something that has been lacking so far: a
"write once and run everywhere" framework for building site
components, styles and navigational tools.
This promotes component sharing and code reuse, both within a single
Midgard solution provider company, and within the international Open
Source community.
So far Midgard has provided a nice content management framework, but
actual sites have needed to be built from scratch. MidCOM promises to
change that, making Midgard much easier to implement.
Of course, sloppy coding is still possible with MidCOM, but if
component writers adhere to the MidCOM specification, PEAR coding
standards and use NemeinLocalization for internationalizing their
components, we should achieve global reusability.
I invite all Midgard developers to seriously study and consider MidCOM
for their projects. There is some learning curve, but real code
reusability should repay that very quickly.
The Midgard Framework is a powerful toolkit for managing online
information. Writing applications and functionalities to the platform
is done using the easy-to-learn PHP scripting language. All
interfacing with the system is done via a regular Web browser, and no
special tools are needed for developers or content authors.
Main features of Midgard Framework include:
Midgard works on most common UNIX platforms, including Linux, FreeBSD
and Solaris. Prebuilt binary packages are available for some Linux
platforms (including Red Hat, Debian and Mandrake), and the system can
be installed from sources to most other environments.
For other environments, including hosted servers and Windows systems,
there is the pure-PHP implementation, Midgard Lite.
The Midgard Application Server is free software developed
internationally with the Open Source model and distributed under the
GNU licenses. Commercial support, applications and services for the
platform are available from a range of companies worldwide.
The PHPmole toolkit provides Midgard developers with a
freely-available Integrated Development Environment (IDE) comparable
to DreamWeaver and MS Visual Studio, with additional content
management functionalities.
With the Midgard CMS package, the ease-of-use of productivity software
and office suites can be brought to Midgard content management.
query building:
<?php
$query = new MidgardQueryBuilder("MidgardArticle");
// Next add the SQL constraints you need
// List articles only from specific topic
$query->addConstraint("topic", "=", $topic->id);
$query->addConstraint("approved", ">", $starting_time);
$query->addOrder("approved", "DESC");
$query->setLimit(20);
// Start from the Nth page of this article list
$query->setOffset($_REQUEST["startfrom"