Here is a list of standard options.
- Optimize the access to the database to only do what you need, efficiently. A good DBA can help here a lot. This is a basic step that most companies do.
- Cache data away from the database using something like memcached. This is usually done at the application layer, and is highly effective. Virtually every competent website should do this.
- More ambitiously, maintain read-only copies of the database, and direct queries there when possible. On the database side the necessary technology is called “replication” and the read-only copies are often also backups for failover from the main database. If you’re doing a million dynamic pages per hour, odds are that you are doing this, or have thought about it.
- Buy really, really expensive hardware for the database. I know that PayPal did this as of 4 years ago, and changing their architecture would have been difficult so they possibly still are.
- Shard the database into multiple pieces with ranges of data. This is a very intrusive change into application design. A well-known example of a company that does this is eBay.
- Try to use a database that scales onto multiple machines. Oracle RAC scales onto clusters, but doesn’t let you distribute data widely. Other offerings exist that are supposed to be easier to distribute, including Microsoft’s SQL Azure and FathomDB. I have not used those offerings and don’t know how well they work. I suspect better than nothing, but I doubt they scale horizontally that well.
- Relational databases generally try to provide ACID guarantees. But the CAP theorem makes it very difficult to do that in a distributed system, particularly while letting you do things like join data. Therefore people have come up with many NoSQL alternatives that explicitly offer weaker guarantees and avoid problematic operations in return for fully distributed scalability. Well-known examples of companies that use scalable NoSQL data stores include Google, Facebook and Twitter.