PHP 5 CMS Framework Development / 2nd Edition
上QQ阅读APP看书,第一时间看更新

Chapter 1. CMS Architecture

This chapter lays the groundwork that helps us to understand what Content Management Systems (CMS) are all about. First, it summarizes the whole idea of a CMS where it came from and what it looks like. This is followed by a review of the technology that is advocated here for CMS building. Next, we will take account of how the circumstances in which a CMS is deployed affect its design; some of the important environmental factors, including security, are considered. Finally, all these things are brought together in an overview of CMS architecture. Along the way, Aliro is introduced—the CMS framework that is used for illustrating implementations throughout this book.

The idea of a CMS

Since you are reading this book, most likely you have already decided to build or use a CMS. But before we go into any detail, it is worth spending some time presenting a clear picture of where we are and how we got here. To be more precise, I will describe how I got here, in the expectation that at least some aspects of my experiences are quite typical.

The World Wide Web (WWW) is a huge set of interlinked documents built using a small group of simple protocols, originally put together by Tim Berners-Lee. Prominent among them was HTML, a simplified markup language. The protocols utilized the Internet with the immediate aim of sharing academic papers. The Web performed this useful function for some years while the Internet remained relatively closed, with access limited primarily to academics. As the Internet opened up during the nineties, early efforts at web pages were very simple. I started up a monthly magazine that reflected my involvement at the time with OS/2 and wrote the pages using a text editor. While writing a page, a tag was needed occasionally, but the work was simple, since for the most part the only tags used were headings and paragraphs, with the occasional bold or italic. With the addition of the odd graphic, perhaps including a repeating background, the result was perfectly presentable by the standards of the time.

But that was followed by a period in which competition between browsers was accompanied by radical development of complex HTML to create far higher standards of presentation. It became much harder for amateurs to create presentable websites, and people started to look for tools. One early success was the development of Lotus Notes as a CMS, by grafting HTML capability onto the existing document-handling features. While this was not a final solution, it certainly demonstrated some key features of CMS. One was the attempt to separate the skills of the web designer from the knowledge of the people who understood the content. Another was to take account of the fact that websites increasingly needed a way to organize large volumes of regularly changing material.

As HTML evolved, so did the servers and programs that delivered it. A significant evolutionary step was the introduction of server-side scripting languages, the most notable being PHP. They built on traditional "third generation" programming language concepts, but allied to special features designed for the creation of HTML for the Web. As they evolved, scripting languages acquired numerous features that are geared specifically to the web environment.

The next turning point was the appearance of complete systems designed to organize material, and present it in a slick way. In particular, open source systems offered website-building capabilities to people with little or no budget. That was exactly my situation a few years ago, as a consultant wanting a respectable website that could be easily maintained, but costing little or nothing to buy and run. A number of systems could lay claim to being ground breakers in this area, and I tried a few that seemed to me to not quite achieve a solution.

The idea of a CMS

For me, the breakthrough came with Mambo 4.5. It installed in a few minutes, and already there was the framework of a complete website, with navigation and a few other useful capabilities. The vital feature was that it came with templates that made my plain text look good. By spending a small amount of money, it was possible to have a personalized template that looked professional, and then it took no special skills to insert articles of one kind or another. Mambo also included some simple publishing to support the workflow involved in the creation and publication of articles. Mambo and its grown up offspring Joomla! have become well-known features in the CMS world.

My own site relied on Mambo for a number of years, and I gradually became more and more involved with the software, eventually becoming leader of the Mambo development team for a critical period in the development of version 4.6. For various reasons, though, I finally departed from the Mambo organization and eventually wrote my own CMS framework, called Aliro. Extensions that I develop are usually capable of running on any of MiaCMS, Mambo, Joomla!, or Aliro. The Aliro system is used to provide all the code examples given here, and you can find a site that is running the exact software described in this book at http://packt.aliro.org.

Some people said of the first edition of this book that it was only about Aliro. In one sense that is true, but in another it is not. Something like a CMS consists of many parts, but they all need to integrate successfully. This makes it difficult to take one part from here, another from there, and hope to make them work together. And in order to give code examples that could be relied on to work, I was anxious to take them from a complete system. However, when creating Aliro I sought to question every single design decision and never do anything without considering alternatives. This book aims to explain the issues that were reviewed along the way, as well as the choices made. You may look at the same issues and make different choices, but I hope to help you in making your choices. I also hope that people will find that some of the ideas here can be applied in areas other than CMS frameworks.

From time to time, you will find mentions of backwards compatibility, mostly in relation to the code examples taken from Aliro. In this context, backwards compatibility should be understood to be features that have been put into Aliro so that software originally designed to run with Mambo (or its various descendants) can be used with relatively little modification in Aliro. The vast majority of the Aliro code is completely new, and no feature of older systems has been retained if it seriously restricts desirable features or requires serious compromise of sound design.

Critical CMS features

It might seem that we have now defined a CMS as a system for managing content on the Web. That would be to look backwards rather than forwards, though. In retrospect, it is apparent that one of the limitations of systems like Mambo is that their design is geared too heavily to handling documents. While every website has some pages of text, few are now confined to that. Even where text is primary, older systems are pushed to the limit by demands for more flexibility in who has access to what, and who can do what.

While the so called "core" Mambo system could be installed with useful functionality, an essential part of Mambo's success was the ability to add extensions. Outside the core development, numerous extra functions were created. The existence of this pool of added capabilities was vital to many users of Mambo. For many common requirements, there was an extension available off the shelf. For unusual cases, either the existing code could be customized or new code could be commissioned within the Mambo framework. The big advantages were the ability to impose overall styling and the existence of site-wide schemes for navigation and other basic services.

The outcome is that the systems have outgrown the CMS tag, as the world of the Web has become ever more interactive. Sites such as Amazon and eBay have inspired many other innovations where the website is far more than a compendium of articles. This is reflected in a trend for the CMS to migrate towards being a framework for the creation of web capabilities. Presentation of text, often with illustrations, is one important capability, but flexibility and extensibility are critical.

So what is left? As with computing, generally, new ideas are often implemented as islands. There is then pressure to integrate them. At the very least, the aim is to show users a single, rich interface, preferably with a common look and feel. The functionality is likely to be richer if the integration runs deeper than the top presentation level. For example, integration is excessively superficial if users have to authenticate themselves separately for different facilities in the same website. Ideally, the CMS framework would be able to take the best-of-breed applications and weave them together through commonly-agreed APIs, RESTful interfaces, and XML-RPC exchanges. Today's reality is far from this, and progress has been slow, but some integration is possible.

It should now be possible to create a list of essential requirements and another list of desirable features for a CMS. The essentials are:

  • Continuity: Despite the limitations of basic web protocols, many website functions need to retain information through a series of user interactions and the information must be protected from hijacking. The framework should handle this in a way that makes it easy for extensions to keep whatever data they need.
  • User management: The framework needs to provide the fundamentals for a system of controlling users via some form of authentication. But this needs to be flexible so that the least amount of code is installed to handle the requirement, which can range from a single administrative user to handling hundreds of thousands of distinct users and a variety of authentication systems.
  • Access control: Constraints are always required, if only to limit who can configure the website. Often much more is needed as various groups of users are allocated different privileges. It is now widely agreed that the best approach is the Role-Based Access Control (RBAC) system. This means that it is roles that are granted permissions, and accessors are allocated roles. It is preferable to think of accessors rather than users, since roles also need to be given to things other than just users, such as computer systems.
  • Extension management: A framework is useful if it can be easily extended. There is no single user visible facility that is essential to every website, so ideally the framework is stripped of all such functions. Each capability visible to users can then be added as an extension. When the requirements for building a website are considered, it turns out that there are several different kinds of extensions. One well known classification is into components, modules, plugins, and templates. These are explained in detail in Chapter 8,Handling Extensions.
  • Security and error handling: Everyone is aware of the tide of threats from spam to malicious cracking of websites. To be effective, security has to be built in from the start so that the framework not only achieves the best possible security, but also provides a helpful environment for building secure extensions. Errors are significant both as a usability problem and a potential security flaw, so a standard error handling mechanism is also required.

Desirable CMS features

Most people would not be content to stop with the list of critical features. Although they are the essentials, it is likely that more facilities will be needed in practice, especially if the creation of extensions is to be made easy. The list of desirable features certainly includes:

  • Efficient and maintainable code handling: The framework is likely to consist of a number of separate code files. It is essential that they be loaded when needed, and preferable that they are not loaded if not needed. The mechanisms used need to be capable of handling extra code files added as extensions.
  • Database interface: Many web applications need access to a database to be able to function efficiently. The framework itself needs a database to perform its own functions. While PHP provides an interface to various databases, there is much that can be done in a CMS framework to provide higher level functions to meet common requirements. These are needed both by the framework and by many extensions.
  • Caches: These are used in many different contexts for Internet processing. To date, the two most productive areas have been object and XHTML caching. Both the speed of operation and the processing load benefit considerably from well implemented caches. So it is highly desirable for a CMS framework to provide suitable mechanisms that are lightweight and easy to use.
  • Menus: These are a common feature of websites, especially when taken in the widest sense to include such things as navigation bars and other ways to present what are essentially lists of links. It is not desirable for the framework to create final XHTML because that preempts decisions about presentation that should belong to templates or other extensions. But it is desirable for the framework to provide the logic for creating and managing menus, including a standard interface to extensions for menu creation. The framework should also provide menu data in a way that makes it easy to create a menu display.
  • Languages: Nowadays, as a minimum, software development should take account of the requirements imposed by implementation in different languages, including those that need multi-byte characters. It is now broadly agreed that part of the solution to this requirement is the use of UTF-8. A mechanism to allow fixed text to be translated is highly desirable. The bundle of issues raised by demands for language support are usually described using the terms internationalization and localization. The first is the building of capabilities into a system to support different ways of doing things, of which the most prominent is choice of language. Localization is the deployment of specific local characteristics into a system that has been internationalized. Apart from language itself, matters to be considered include the presentation of dates, times, monetary amounts, and numbers.

Many other services are useful, such as handling the sending of e-mails, assistance in the creation of XHTML, insulating applications from the file system, and so on. But before considering an approach to implementation, there is an important matter of how a CMS is to be managed.

System management

In this discussion of system management, it is assumed that a web interface is provided. The person in control of a site, typically called the manager or administrator, is often in the same situation as the user of the site. That is to say, the site itself is installed on a hosted web server distant from both its users and its managers. A logical response to this scenario is to implement all interactions with the site through web interfaces.

There are disagreements about how much, if any, system management should be kept apart from user access. One school of thought requires a distinct management login using a slightly different URI. Opposing this is the view that everything should be done from the same starting point, but allowing different facilities according to the identity of the user. Drupal is the best known example of the latter approach, while Mambo and Joomla! keep the administrator separate. Aliro continues along the path trodden by Mambo and Joomla!

There is some justification for the idea that everything should be merged, with no distinct administrator area. As the CMS grows in sophistication, user groups proliferate; the distinction between an administrator and a privileged user is hard to sustain. Typically, visitors may be given quite a lot of read access to site material, but constrained write access, mainly because of misuse problems. But users who have identified themselves to the site may be given quite extensive capabilities. These might extend to having areas of the site where they are able to publish their own material. The registered user can thus become an administrator of his/her own material, needing similar facilities to a site administrator.

The argument in favor of splitting off some administrative functions is largely to do with security. Someone at the highest administrator level is likely to have access to tools that are capable of destroying the site and possibly the whole server. With everything merged, the safety of key administrative functions depends critically on the robustness of user management. It is difficult to be completely confident in this, especially as the total volume of software deployed on a site becomes large. Allowing access to the most sensitive administrative functions only through a distinct URI and login mechanism allows for other security mechanisms to be combined with the CMS user management. This might be a different user and password scheme implemented using Apache, or it might be a constraint on the IP addresses permitted to access the administrator login URI. No security mechanism is perfect, but combining more than one mechanism increases the chances of keeping out intruders. More information is said about security issues in a later section of this chapter.

Because of the separatist arguments, Aliro is implemented with a distinct administrator login to a small range of critical functions. Extensions added to the CMS have the ability to implement an administrator-side interface, but are free to make their own design decisions on the balance to be struck. The functions provided by the Aliro base system for administrators are as follows:

  • Basic system configuration such as details of databases used, caching options, mailing options, and presentation of system information
  • Management of extensions through the ability to install packages of software or to remove them, and the ability to manage what appears on which display
  • A particular part of extension management is the handling of themes (formerly known as templates in the Mambo world) that affect the presentation of the whole site
  • Management of a folder system that supports a tree structure of arbitrary depth, around which site content can be constructed
  • Creation and management of menu information
  • Access to error reports that contain detailed diagnostic information
  • A generalized system for modifying URIs to be friendly to humans and search engines, and to manage metadata
  • Whatever management functions are provided by extensions to the basic CMS

In Aliro, some of the critical classes that provide these facilities are not known to the general user side of the system, which provides another obstacle to misuse. Indeed it is possible to rename the directory under which code exclusive to the administrator side of the system resides. Code on the general user side does not have any straightforward means to find out where the administrator code exists. On balance, I believe that splitting off the most fundamental administrative functions is the more secure policy.

Now we have lists of essential and desirable CMS features, together with a set of administrator functions. We also need to start thinking about the technology needed for building a CMS.