├── .gitignore ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── _pages ├── a-real-world-scalable-architecture.md ├── about.md ├── contribute.md ├── contributors.md ├── development-and-workflow.md ├── elastic-architecture.md ├── object-caching.md ├── page-caching.md ├── query-performance.md ├── resources.md ├── scale.md └── searching-for-scale.md ├── diagrams ├── continuous_integration.png ├── horizontal_scale.png ├── mysql_replica.png ├── object_cache.png ├── page_caching.png ├── pull_requests.png ├── real_world.png ├── search_index.png ├── simple_cluster.png └── varnish_vs_batcache.png └── logo.png /.gitignore: -------------------------------------------------------------------------------- 1 | # PHPStorm Files # 2 | ################## 3 | .idea/ 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | This repo is linked to the [scalewp.io](https://www.scalewp.io) website. The core website content is managed here to facilitate contributions from the community. 2 | 3 | While some discussion happens in issues, the best contributions start with a concrete suggestion in the form of a pull request. 4 | 5 | Pull requests merged to master go directly to the site. 6 | 7 | ### Editing existing pages 8 | 9 | Pull requests can be made against any file in this repository. Please fork the repository and name your branch based on the change you are making. 10 | 11 | The WordPress site to which this repository connects uses the [GitHub Flavored Markdown for WordPress](https://github.com/makotokw/wp-gfm) plugin to render text, so Markdown as rendered by GitHub for this repository should match what appears on the WordPress site. 12 | 13 | #### Shortcodes 14 | 15 | Many of the files in this repository use WordPress shortcodes for related links and share widgets. Leave those sections as-is when making pull requests.
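Reviewers can check mechanically that a pull request left the shortcodes untouched. The following is a minimal Python sketch of such a check, assuming shortcodes use the simple `[name attr="value"]` form seen on these pages (for example `[social_links]` and `[link-library categorylistoverride="20"]`). The regex and the helper names are hypothetical; they are not part of this repository or of the sync plugins.

```python
import re
from collections import Counter

# Hypothetical pattern for simple WordPress-style shortcodes such as
# [social_links] or [link-library categorylistoverride="20"].
# The negative lookahead (?!\() skips Markdown links like [text](url),
# and requiring a lowercase first letter skips most link text.
SHORTCODE_RE = re.compile(r'\[[a-z][a-z0-9_-]*(?:\s[^\]]*)?\](?!\()')


def extract_shortcodes(text: str) -> Counter:
    """Count every shortcode occurrence in a page's Markdown source."""
    return Counter(SHORTCODE_RE.findall(text))


def shortcodes_preserved(before: str, after: str) -> bool:
    """Return True if an edit kept every shortcode exactly as it was."""
    return extract_shortcodes(before) == extract_shortcodes(after)


if __name__ == "__main__":
    original = 'Intro.\n\n[social_links]\n\n[link-library categorylistoverride="20"]\n'
    edited = ('Sharper intro with a [link](https://example.com).\n\n'
              '[social_links]\n\n[link-library categorylistoverride="20"]\n')
    print(shortcodes_preserved(original, edited))
```

Comparing counts rather than sets also catches a shortcode that was accidentally duplicated or deleted once out of several occurrences.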
16 | 17 | #### Images 18 | 19 | Images are added to the git repository in the top-level `diagrams` folder rather than through the WordPress user interface. Images can be linked through GitHub's 'raw' subdomain using URLs like https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/master/diagrams/simple_cluster.png. For an example, see [the Elastic Architecture page](https://github.com/pantheon-systems/wordpress-at-scale/blob/master/_pages/elastic-architecture.md). 20 | 21 | ### Adding pages 22 | 23 | Here is an example of text that would go in `/_pages/new-file.md`. 24 | 25 | ``` 26 | --- 27 | post_title: New File 28 | layout: page 29 | published: true 30 | --- 31 | 32 | New Text goes here 33 | 34 | ## You can use headings. 35 | 36 | ``` 37 | 38 | The new page will be picked up by the WordPress site once the new file is merged to master. This sync happens through the [WordPress GitHub Sync plugin](https://wordpress.org/plugins/wp-github-sync/). The page will have a numeric ID once WordPress saves it in the database. Following this save, the WordPress GitHub Sync plugin will write back to this repository on GitHub to include the numeric ID and permalink as part of the YAML Frontmatter at the top of the .md file. 39 | 40 | #### Properties to include 41 | 42 | When adding a new file, include the following properties in the YAML Frontmatter at the top of the .md file. 43 | 44 | * `post_title` - The name of the post needs to correspond to the name of the .md file (see the next section, "Naming the file"). 45 | * `layout` - This value should simply be `page`. 46 | * `published` - This value should simply be `true`. The [WordPress GitHub Sync plugin](https://wordpress.org/plugins/wp-github-sync/) does allow syncing of unpublished content, but our configuration does not include such pages. 47 | 48 | #### Naming the file 49 | 50 | The new file on GitHub must have a name that matches the title within the post.
WordPress GitHub Sync names files based on the title and will delete mismatched files, which makes git history harder to track. [Enforcing this match is currently a manual step in the PR process](https://github.com/pantheon-systems/wordpress-at-scale/issues/2). The file name should be the title in lowercase, with special characters removed and spaces converted to hyphens (for example, "A Real-World Scalable Architecture" becomes `a-real-world-scalable-architecture.md`). 51 | 52 | ### Resource links 53 | 54 | Currently we track the resource links only in the WordPress database. You'll notice the code for the resources page is just a [collection of shortcodes](https://github.com/pantheon-systems/wordpress-at-scale/blob/master/_pages/resources.md). However, we are very open to including more valuable resources. Please [file an issue](https://github.com/pantheon-systems/wordpress-at-scale/issues/new?labels=resource) if you have a resource suggestion. 55 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Attribution 4.0 International Public License 2 | 3 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 4 | 5 | Section 1 – Definitions.
6 | 7 | Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 8 | Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 9 | Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. 10 | Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 11 | Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 12 | Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 
13 | Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 14 | Licensor means the individual(s) or entity(ies) granting rights under this Public License. 15 | Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 16 | Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 17 | You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. 18 | Section 2 – Scope. 19 | 20 | License grant. 21 | Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 22 | reproduce and Share the Licensed Material, in whole or in part; and 23 | produce, reproduce, and Share Adapted Material. 24 | Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 25 | Term. The term of this Public License is specified in Section 6(a). 26 | Media and formats; technical modifications allowed. 
The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 27 | Downstream recipients. 28 | Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 29 | No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 30 | No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). 31 | Other rights. 32 | 33 | Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 34 | Patent and trademark rights are not licensed under this Public License. 
35 | To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. 36 | Section 3 – License Conditions. 37 | 38 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 39 | 40 | Attribution. 41 | 42 | If You Share the Licensed Material (including in modified form), You must: 43 | 44 | retain the following if it is supplied by the Licensor with the Licensed Material: 45 | identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 46 | a copyright notice; 47 | a notice that refers to this Public License; 48 | a notice that refers to the disclaimer of warranties; 49 | a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 50 | indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 51 | indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 52 | You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 53 | If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 54 | If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License. 
55 | Section 4 – Sui Generis Database Rights. 56 | 57 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 58 | 59 | for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; 60 | if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and 61 | You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. 62 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 63 | Section 5 – Disclaimer of Warranties and Limitation of Liability. 64 | 65 | Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You. 
66 | To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You. 67 | The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 68 | Section 6 – Term and Termination. 69 | 70 | This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 71 | Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 72 | 73 | automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 74 | upon express reinstatement by the Licensor. 75 | For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 76 | For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 77 | Sections 1, 5, 6, 7, and 8 survive termination of this Public License. 78 | Section 7 – Other Terms and Conditions. 79 | 80 | The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 
81 | Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 82 | Section 8 – Interpretation. 83 | 84 | For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 85 | To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 86 | No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 87 | Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 88 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | WordPress at Scale 2 | 3 | # WordPress at Scale 4 | 5 | Gathering best practices from the community to help developers and site owners find success in scaling WordPress. The contents of this repository are synced with [scalewp.io](https://www.scalewp.io/) via [the WordPress GitHub Sync plugin](https://wordpress.org/plugins/wp-github-sync/). 6 | 7 | ## The Problem 8 | 9 | WordPress can scale; this is proven. But the expertise on exactly how to do so exists in silos and pockets, as quote-unquote "tribal knowledge".
Many people who are new to WordPress (or new to the challenges of scaling) are uncertain, and it isn't easy to find correct answers by searching online. 10 | 11 | ## Mission 12 | 13 | This repository exists as a resource for any developer or site owner to learn about the challenges of scalability and how they can be overcome using bulletproof, industry-standard approaches. It is meant to provide developers with architectural guidance for infrastructure and services, and to educate site owners so they can better evaluate their options. 14 | 15 | ## Contents 16 | 17 | The content is broken up into several pages to make it easier to consume and share: 18 | 19 | 1. [Outline](_pages/scale.md) 20 | 2. [Elastic Architecture](_pages/elastic-architecture.md) 21 | 3. [Page Caching](_pages/page-caching.md) 22 | 4. [Object Caching](_pages/object-caching.md) 23 | 5. [Query Performance](_pages/query-performance.md) 24 | 6. [Search](_pages/searching-for-scale.md) 25 | 7. [Full Stack Diagram](_pages/a-real-world-scalable-architecture.md) 26 | 8. [Development and Workflow](_pages/development-and-workflow.md) 27 | 28 | 29 | ## Contributing 30 | 31 | Pull requests are welcome! The data collected here is the result of many years of work by many people, and there's always more to learn. Make a contribution, and we would be happy to list you on the Contributors page! 32 | 33 | To suggest a change or addition to the main content, please fork the repo and make a pull request against the appropriate file in the `_pages` directory. If you have a more general question, issue, or suggestion, you may open a GitHub issue.
34 | -------------------------------------------------------------------------------- /_pages/a-real-world-scalable-architecture.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 20 3 | post_title: A Real-World Scalable Architecture 4 | author: Josh Koenig 5 | post_date: 2015-12-04 10:38:45 6 | post_excerpt: > 7 | Scalable Architecture is More Than Just 8 | LAMP 9 | layout: page 10 | permalink: > 11 | http://www.scalewp.io/a-real-world-scalable-architecture/ 12 | published: true 13 | --- 14 | 15 | ## Putting it all Together 16 | 17 | ### This Ain't Your Parents' "LAMP Stack" 18 | 19 | Putting all these components together, it’s clear that the “stack” has become a lot more complex since WordPress got started. Rather than the classic Linux, Apache, MySQL, PHP (LAMP) configuration, you now have: 20 | 21 | * Linux 22 | * Varnish or Edge Cache Provider 23 | * Content Delivery Network 24 | * Apache or Nginx 25 | * PHP 26 | * Memcached or Redis 27 | * Elasticsearch or Apache Solr 28 | * MySQL and Database Replication 29 | 30 | It doesn’t lend itself to a nice acronym, and a full diagram can start to feel overwhelming: 31 | 32 | 33 | 34 | **Don’t panic!** This is why the discipline of DevOps exists. There are professionals who are experienced with setting up this style of implementation, and several of the component pieces are now available as cloud services. 35 | 36 | Also, you don’t have to run it yourself. This style of architecture can be obtained from managed hosting and platform providers. If you are looking to outsource your website infrastructure (an increasingly common choice), you should now have enough knowledge to evaluate providers: 37 | 38 | * Do they provide load-balancing and reverse-proxy caching? 39 | * Is their infrastructure truly elastic? What is the turnaround time for scaling horizontally? 40 | * How do they handle the need for a network filesystem for uploads?
41 | * Which persistent object caching system(s) do they support? 42 | * Can they provide MySQL replication? Do they support HyperDB? 43 | * What options do they offer for a search index? 44 | 45 | Beyond that, there’s another key concern when it comes to running WordPress at scale, but it isn’t about the production infrastructure. It’s about how you can be agile and effective at developing and debugging your site, and how to make sure this kind of complexity doesn't lead to a fear of deploying changes. 46 | 47 | [social_links] 48 | 49 | [link-library categorylistoverride="20"] 50 | -------------------------------------------------------------------------------- /_pages/about.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 30 3 | post_title: About 4 | author: Josh Koenig 5 | post_date: 2015-12-04 11:02:30 6 | post_excerpt: "" 7 | layout: page 8 | permalink: > 9 | http://live-wp-microsite.pantheon.io/about/ 10 | published: true 11 | --- 12 | 13 | ## About "WordPress at Scale" 14 | 15 | WordPress now powers about [1 in every 4 websites](https://ma.tt/2015/11/seventy-five-to-go/) and continues to grow. Many of these sites have massive audiences and traffic, and that [trend is accelerating](https://pantheon.io/blog/wordpress-moves-upmarket). To serve this growing market, we need more developers to understand how to build WordPress sites that scale. 16 | 17 | ### Enter the Community 18 | This site is the product of a lot of real-world experience, many discussions, and collaboration across our community. This site couldn’t have happened without [our contributors](/contributors/) coming together to share these best practices. 19 | 20 | ### It’s Not Done Yet! 21 | This is a living repository of knowledge. Help us improve this site and [contribute your insights](/contribute). All of our content is stored in GitHub and editable via Pull Requests.
22 | 23 | ### About the Site 24 | This site is made with [WordPress](https://wordpress.org/), was designed and built by a team at [Pantheon](https://pantheon.io/), and is a living example of all of these best practices in use. 25 | 26 | Our theme is built with the [Underscores starter theme](http://underscores.me/). Notable plugins include [WordPress GitHub Sync](https://github.com/mAAdhaTTah/wordpress-github-sync) and [GitHub Flavored Markdown for WordPress](https://github.com/makotokw/wp-gfm) to sync content from GitHub, [Google (XML) Sitemaps Generator for WordPress](http://www.arnebrachhold.de/projects/wordpress-plugins/google-xml-sitemaps-generator) to help our search engine friends, and [WP Redis](https://wordpress.org/plugins/wp-redis/) to use Redis for [persistent object caching](./object-caching). 27 | -------------------------------------------------------------------------------- /_pages/contribute.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 27 3 | post_title: Contribute 4 | author: admin 5 | post_date: 2015-12-04 10:58:04 6 | post_excerpt: "" 7 | layout: page 8 | permalink: http://www.scalewp.io/contribute/ 9 | published: true 10 | --- 11 | 12 | ## Contribute to this Website 13 | 14 | ### Pull Requests are Welcome! 15 | 16 | We welcome you to [contribute to "WordPress at Scale" on GitHub](https://github.com/pantheon-systems/wordpress-at-scale). We use a version control repository to collaboratively maintain these pages. They have [many authors who have contributed over time](/contributors), and your additional input and insight are welcome! 17 | 18 | For more information on how to get started, see the [CONTRIBUTING.md](https://github.com/pantheon-systems/wordpress-at-scale/blob/master/CONTRIBUTING.md) file in the GitHub repository. 19 | 20 | ## Other ways to help 21 | 22 | The value of this initiative lies in serving as a "center of gravity" for information about WordPress at Scale.
If you support this effort but don't have specific content to contribute, we can still use your help. 23 | 24 | ### Building the resource library 25 | 26 | One site will never have all the answers, so we need help collecting links to valuable resources. Do you have a go-to guide you use for solving specific problems at scale? [Let us know](https://github.com/pantheon-systems/wordpress-at-scale/issues/new?labels=resource), and we can add it to our resources page. 27 | 28 | ### Spreading the word 29 | 30 | Making the most of this effort means having this content be widely viewed. Help us make others aware of this site: you can tweet it out, of course, but more importantly you can reference it in your own work or on other issue threads. Let's put an end to "forum threads to nowhere" by making sure people find the best answers possible. -------------------------------------------------------------------------------- /_pages/contributors.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 395 3 | post_title: Contributors 4 | layout: page 5 | published: true 6 | --- 7 | 8 | ## Meet Our Contributors 9 | 10 | ### The Proverbial "Shoulders of Giants" 11 | 12 | The contents of this website are the result of many individual efforts. This page is still a work in progress as we gather everyone's preferred one-line bio, headshot, and link. 13 | 14 | ### Daniel Bachhuber 15 | 16 | 17 | 18 | Daniel Bachhuber is hardcore about open source. He’s the Principal at [Hand Built](https://handbuilt.co/), maintains WP-CLI, is a contributing developer to the WP-API project, and believes wholeheartedly in the pull request workflow. 19 | 20 | Daniel lives in the suburbs of Portland, Oregon, with his wife Leah, daughter Ava, and son Charlie. On Fridays at lunch, he likes to take a break and play soccer in the park. If you fancy an invite to his home, he’ll likely make grilled pizza for dinner.
21 | 22 | ### Zack Tollman 23 | 24 | 25 | 26 | Zack Tollman is Lead Developer at WIRED and has contributed a great deal of code, documentation, and thought leadership to the WordPress community on scale and performance via caching. His blog ([tollmanz.com](https://www.tollmanz.com/)) is a must-follow for developers interested in pushing WordPress forward. 27 | 28 | ### Weston Ruter 29 | 30 | 31 | 32 | Weston Ruter is CTO at [XWP](https://xwp.co/) and has been developing with Web technologies since the late '90s. He has a degree in Computer Science and has long worked to push the envelope of what can be done on the Web. He has been developing with WordPress for most of its existence, and he is an active contributor and committer to WordPress core, focusing on the Customizer. 33 | 34 | ### Scott Walkinshaw 35 | 36 | 37 | 38 | Scott Walkinshaw is one of the members of [Roots](https://roots.io/about/), providing developers with tools to help them build better WordPress sites. 39 | 40 | ### Josh Koenig 41 | 42 | 43 | 44 | Josh is a co-founder and head of product for [Pantheon](https://pantheon.io), the website management platform for WordPress and Drupal. He has been building the internet for twenty years, including over a decade specializing in high-performance, large-scale, and enterprise use cases. He is supremely grateful to all the fine engineers and developers who've mentored him over the years. 45 | 46 | ### Steve Persch 47 | 48 | 49 | 50 | Steve is an Agency and Community Engineer at [Pantheon](https://pantheon.io). Steve spent years building WordPress and Drupal sites for non-profit organizations, higher education institutions, and media companies. He also loves contributing code back to the open source community and speaking at camps and conferences.
51 | -------------------------------------------------------------------------------- /_pages/development-and-workflow.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 22 3 | post_title: Development and Workflow 4 | author: admin 5 | post_date: 2015-12-04 10:39:18 6 | post_excerpt: > 7 | Measure Twice. Cut Once. Revert as 8 | Needed 9 | layout: page 10 | permalink: > 11 | http://www.scalewp.io/development-and-workflow/ 12 | published: true 13 | --- 14 | 15 | ## Scale Your Development Process 16 | 17 | ### Measure Twice. Cut Once. Revert as Needed. 18 | 19 | One of the big challenges with WordPress at Scale is the developer experience. The best production infrastructure in the world doesn’t matter much if the website doesn’t have the right design and functionality, if you are constantly suffering from downtime because of deployments, or if shipping hits bottlenecks that can hold things up for weeks on end. 20 | 21 | Agile development workflows are a whole discipline, and this page is by no means comprehensive. However, the era of Cowboy Coding is coming to a close, and two industry-standard best practices are increasingly common among professional WordPress teams. 22 | 23 | Anyone interested in developing for Scale should strongly consider adopting both **Pull Requests** and **Continuous Integration.** To achieve these, you need to have every change to the website codebase tracked in version control, preferably Git, which is widely considered the industry standard. You’ll also need workflow documentation, and ideally some automation, to follow the processes correctly. 24 | 25 | #### Pull Requests 26 | 27 | This workflow has been widely popularized by GitHub, but it is also an important part of Git Flow in general. Basically, it means having developers collaborate on a branch in version control, allowing them to work without stepping on other people’s toes.
When code in a branch is deemed ready to go, a "pull request" is created, which allows for a line-by-line review. 28 | 29 | 30 | 31 | In a perfect world, developers can also stand up a running website for the branch under development. This allows normal acceptance testing or QA on a running copy of what’s proposed in the pull request, creating a much more transparent process that is friendly to project managers and site owners. The pull request can also be integrated with a continuous integration service, such as Travis CI, to run automated tests on the codebase and verify its quality before it is allowed to be merged and deployed. 32 | 33 | #### Continuous Integration 34 | 35 | Continuous Integration is the practice of merging development branches to a project's master branch as frequently as possible to minimize divergence between development environments and the live environment. Additionally, Continuous Integration encourages a structured flow for QA and acceptance testing before a deploy to production. This means having feature branch teams pull from the main version control branch (usually “master”) as soon as possible, as well as being able to frequently pull content back from the Live environment to their development or testing spaces. 36 | 37 | 38 | 39 | In the Continuous Integration workflow, code changes tracked in version control are deployed “out” from left to right, through Test to the Live environment. The content of the site (database, uploads, etc.) is cloned “back” from right to left. It’s the bulletproof way to deploy to a running website. 40 | 41 | There should be standard processes (or better yet scripts) that describe how a developer freshens their workspace and how a test environment is prepared. Developers need to be able to do their work against a recent “snapshot of reality”, and you definitely want to test against an up-to-date copy of production data before you deploy.
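The freshening process described above is a good candidate for a script. The following is a minimal sketch in Python. It assumes WP-CLI is installed in the development environment and that a database export from Live has already been copied there. The snapshot file name and both URLs are hypothetical. The script loads the snapshot, rewrites Live URLs to local ones, and flushes the object cache, using the standard WP-CLI commands `wp db import`, `wp search-replace`, and `wp cache flush`. By default it only prints the commands and does not run them.

```python
import shlex
import subprocess


def freshen_plan(snapshot_file, live_url, local_url):
    """Return WP-CLI commands that load a Live snapshot into a dev environment."""
    return [
        # Load the Live export into the local database.
        ["wp", "db", "import", snapshot_file],
        # Rewrite Live URLs to local ones in every table; WP-CLI also
        # handles URLs stored inside serialized PHP data.
        ["wp", "search-replace", live_url, local_url, "--all-tables"],
        # Discard cached objects that still reference the old data.
        ["wp", "cache", "flush"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical snapshot file and URLs, for illustration only.
    plan = freshen_plan("live-snapshot.sql",
                        "https://www.example.com",
                        "http://example.test")
    run_plan(plan)  # dry run: prints the three commands
```

The plan is kept as data (lists of arguments), which makes it easy to review in a pull request and to reuse when preparing a Test environment.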
42 | 43 | Finally, every release should be tagged so you have a record of changes to production. Deploying from tags is essential in a scalable environment: you need to know that a deploy is identical across all PHP Application servers. 44 | 45 | ### Environmental Consistency 46 | 47 | Because running a horizontally scalable cluster is costly, both in servers and in maintenance complexity, it’s still common for the “production” instance to be unique. That creates unfortunate tradeoffs. 48 | 49 | If you have only one production instance, then development, acceptance testing, and any load or performance tests are being done on a different architecture, be that a developer’s laptop or a separate “staging” server. Inevitably that leads to bugs that manifest only in production, unexpected performance regressions, and protestations that “it worked on my machine.” 50 | 51 | The way to deal with this is to have as much consistency as is reasonable. Tools like Vagrant (for example, the VVV project) and the emerging container model can help minimize these gaps in local development environments. For online development or staging environments, try to minimize the compromises you make. Ultimately, perfect consistency may not be feasible, but understanding where the differences are, and what impacts to expect, can help you mitigate consistency risks. 52 | 53 | ### Challenges 54 | 55 | * **Site Configuration:** the configuration of WordPress sites is in a tricky in-between state. Frequently you’d like to deploy config changes along with code, but most configuration is stored in the `wp_options` table in the database. The best answer is to use the WP-CFM plugin, which allows you to export configuration to JSON and track it in version control along with code. Still, this can be a challenge.
56 | * **Local Development:** while the quality of virtualization tools is progressing, they are still complex and require their own upkeep; losing a day of productivity due to local dev problems is always a pain. In addition, it’s very difficult to represent a scalable multi-server environment on a developer’s laptop, so some compromises here are inevitable. 57 | * **Setup Cost:** for many teams without an existing platform or scripts, setting up a best practice workflow is a costly investment of time, which may or may not be supported by the project budget. Unfortunately many projects suffer because the proper tooling isn’t put into place, resulting in stressful deployment windows. 58 | 59 | 60 | 61 | [social_links] 62 | ####Further Reading 63 | [link-library settings="1" categorylistoverride="21"] 64 | -------------------------------------------------------------------------------- /_pages/elastic-architecture.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 11 3 | post_title: Elastic Architecture 4 | author: Josh Koenig 5 | post_date: 2015-12-04 09:59:36 6 | post_excerpt: > 7 | Horizontal Scalability Is the 8 | Scalability That Matters 9 | layout: page 10 | permalink: > 11 | http://www.scalewp.io/elastic-architecture/ 12 | published: true 13 | --- 14 | 15 | ## Take the Air-Quotes Out of "Cloud" 16 | 17 | ### Horizontal Scalability Is the Scalability That Matters 18 | 19 | The key to running a WordPress site that can handle a large amount of traffic consistently, without risking downtime, is an elastic architecture. Simply put, this means the ability to run the website on many machines at once. Your website *must* transcend a single server in order to scale. 20 | 21 | With an elastic architecture, when traffic increases, you can provision more machines to run your website. When traffic has calmed down a bit, you can save resources and turn off your extra capacity. 
Without an elastic architecture, your website is inherently not scalable. 22 | 23 | This is also the only way to ensure that your site is **Highly Available** — not at the mercy of a single server’s uptime. This capability is also often called “horizontal scalability”, from the way architecture diagrams have been drawn since the 1990s. Let’s take a look. The simplest representation of an elastic architecture is drawn as a four-square cluster, like so: 24 | 25 | 26 | 27 | This cluster model lets you add additional (n) PHP Application servers (sometimes called “Web Heads”) as well as additional replica databases to increase your website’s capacity to serve pages. That’s fundamental to elasticity. As a drawing, the architecture appears to scale out “horizontally”: 28 | 29 | 30 | 31 | In addition to allowing you to add capacity horizontally, this style of architecture lets any of the PHP App servers go down while the others take the load. Likewise, if a database fails, a replica can be promoted to the new master. That’s how you get away from any individual machine being a single point of failure. This is **High Availability.** 32 | 33 | While there are practical limitations to any architecture — and real-world implementations are usually more complex, as we’ll see — this is the model of how most web applications scale, including WordPress. This architecture is used today to push individual WordPress sites into the hundreds of millions and billions of pageviews. 34 | 35 | In the modern era of the cloud, best-in-class architecture allows you to expand or contract on demand. This is a key part of being elastic — if it takes weeks (or even days) to bring a new box online, that’s better than nothing, but it’s really not where you want to be.
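To make "expand or contract on demand" concrete, the number of PHP App servers to run can be derived from expected traffic. This is a minimal sketch. It assumes you have measured how many requests per second one server can sustain; the figures below are hypothetical. It adds headroom for spikes and never drops below two servers, so that losing a single machine does not take the site down.

```python
import math


def desired_app_servers(peak_rps, rps_per_server, headroom=0.25, minimum=2):
    """Number of PHP App servers needed for a given load.

    peak_rps       -- expected peak requests per second reaching PHP
    rps_per_server -- measured sustainable throughput of one server
    headroom       -- spare fraction kept for spikes (0.25 = 25%)
    minimum        -- floor of 2 keeps the tier highly available
    """
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    needed = math.ceil(peak_rps * (1 + headroom) / rps_per_server)
    return max(minimum, needed)


# Hypothetical figures: 1,000 req/s at peak, 100 req/s per server.
print(desired_app_servers(1000, 100))  # 13
# Quiet period: capacity shrinks, but never below the availability floor.
print(desired_app_servers(10, 100))    # 2
```

Re-running this calculation as traffic changes, or letting an autoscaler do the equivalent, lets you add machines during a spike and turn them off again once traffic calms down.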
36 | 37 | ### Challenges 38 | 39 | The common challenges of running an elastic, horizontally scalable, and highly available architecture are: 40 | 41 | * **Load Balancing:** Something needs to distribute traffic across the available PHP App servers. Open-source tools like Nginx, HAProxy, and Pound can fill this role, but you can also solve this via hardware (e.g. an F5 appliance) or with a cloud-based load balancer (e.g. Amazon’s ELBs). 42 | * **Shared Media:** One of WordPress’s most important functions is managing media — images, documents, etc. — that go along with posts. These are placed in the `uploads` area of `wp-content`, but in order to make WordPress horizontally scalable, you must find a way for uploads to be available to all PHP App servers. Open-source tools like GlusterFS, NFS, and Ceph are common answers, and Amazon’s EFS is an option in that cloud ecosystem. 43 | * **Consistency:** A classic downfall for clusters is a lack of consistency. If the environments at each layer do not all share the same characteristics and configuration, elasticity is compromised and maddeningly elusive bugs appear. Likewise, changes to the application itself — e.g. a WordPress core update — must be deployed consistently, which requires some kind of orchestration and workflow. These are challenges solved by systems administrators and DevOps practitioners, using tools such as Chef, Puppet, Ansible, and Capistrano. There are also cloud tools, such as AWS’s CloudFormation or OpenStack’s Heat, that can help.
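The load-balancing role in the first bullet comes down to picking a healthy PHP App server for each request. This Python model is for illustration only. Real deployments would use Nginx, HAProxy, a hardware appliance, or a cloud load balancer rather than application code. The sketch uses round-robin selection and skips servers whose health check has failed, which is also how the cluster keeps serving when one App server goes down. The server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Cycle through PHP App servers, skipping ones marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        """Called when a health check fails."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Called when a server passes its health check again."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy PHP App servers available")


lb = RoundRobinBalancer(["app1", "app2", "app3"])
lb.mark_down("app2")  # simulate a failed health check
print([lb.next_server() for _ in range(4)])  # ['app1', 'app3', 'app1', 'app3']
```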
44 | 45 | 46 | 47 | [social_links] 48 | ####Further Reading 49 | [link-library categorylistoverride="15"] 50 | -------------------------------------------------------------------------------- /_pages/object-caching.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 16 3 | post_title: Object Caching 4 | author: admin 5 | post_date: 2015-12-04 10:37:15 6 | post_excerpt: > 7 | Persistent Object Caching Speeds Up 8 | Dynamic Pageviews 9 | layout: page 10 | permalink: http://www.scalewp.io/object-caching/ 11 | published: true 12 | --- 13 | 14 | ## Speed Up Dynamic Pageviews 15 | 16 | ### Persistent Object Caching, aka The WordPress Turbo Button 17 | 18 | As a content management system, WordPress is naturally heavily dependent on its database, and database efficiency is crucial to scaling WordPress. If requests to your website generate a large number of database queries, your database server’s resources can become overwhelmed. With your database server overloaded, your site performance and uptime will suffer across the board. 19 | 20 | In 2005, WordPress introduced its internal object cache — a way of automatically storing any data from the database (not just objects) in PHP memory to prevent unnecessary queries. However, out of the box, WordPress will discard all of those objects at the end of the request, requiring them to be rebuilt from scratch for the next pageload. Not very efficient. 21 | 22 | Luckily, WordPress easily integrates with persistent/external storage backends like Redis or Memcached via object cache drop-in plugins, making it possible to persist the object cache between requests. This speeds up PHP execution time while lessening the load on the Database, a real win-win scenario. 23 | 24 | 25 | 26 | Much like reverse proxy caching for pages, object caching is an established pattern in software development. 
It is an element of every large-scale application architecture, from Facebook or Google Docs to WordPress.com. Your WordPress site should be no exception. 27 | 28 | Persisting database objects with a high-performance storage backend is a must-have for scaling WordPress for logged-in users and dynamic pages. Without it, the inefficiency of regenerating the data in the object cache will start to hurt the performance of your website, particularly as site complexity and dynamic traffic increase. 29 | 30 | While it’s possible to persist objects within a single PHP instance (e.g. APC), scaling your object cache requires a persistence layer shared among PHP Application servers. If you don’t share the persistent object cache values between PHP application servers, you’ll quickly run into problems of “cache (in)coherency” — when your application servers have different versions of cached objects, leading to buggy behavior. 31 | 32 | Further, for all that object caching gives WordPress automatically, it is of even more value to developers optimizing a particular use case. The challenges of serving varying data to many concurrent logged-in users, or high volumes of dynamic traffic, are the hard problems that engineers love, and there is no “one size fits all” solution. 33 | 34 | WordPress stores its object cache as simple named key=>value pairs, so many backends can serve as a persistent object cache. The most common open source tools for persistent object caching are Memcached and Redis. In addition, there are cloud services (e.g. AWS’s ElastiCache, Azure’s Managed Cache) that provide equivalent functionality. 35 | 36 | ### Challenges: 37 | 38 | * **Complexity:** This is yet another layer in the stack, and it adds operational overhead. In addition, you will need to make sure the same object caching solution is present in development environments and as part of your acceptance testing/QA process.
39 | * **Invalidation:** If your site is actively used, WordPress data can be updated frequently. However, you don’t necessarily want this activity to invalidate the object cache with the same frequency. You may need to purge specific keys selectively rather than flushing the whole object cache. 40 | * **Eviction:** Most popular cache backends have storage limits, and typically use an LRU (least recently used) strategy for “evicting” items from the cache when more room is needed. This can create unexpected expirations and can sometimes confuse developers. 41 | * **Optimization:** Using a persistent storage backend to cache objects for a highly dynamic application isn’t as simple as implementing a full page cache. You’ll need to decide what to cache by weighing how expensive the data is to generate, how frequently it’s requested (i.e. how likely a cached copy is to actually be served), and how much capacity you have in your persistent storage backend. 42 | 43 | 44 | 45 | [social_links] 46 | ####Further Reading 47 | [link-library settings="1" categorylistoverride="16"] 48 | -------------------------------------------------------------------------------- /_pages/page-caching.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 13 3 | post_title: Page Caching 4 | author: Josh Koenig 5 | post_date: 2015-12-04 10:00:19 6 | post_excerpt: > 7 | The Key to Internet-Scale Traffic is 8 | Reverse-Proxy Caching 9 | layout: page 10 | permalink: http://www.scalewp.io/page-caching/ 11 | published: true 12 | --- 13 | 14 | ## Handle Internet-Scale Traffic 15 | 16 | ### Reverse Proxy Caching — Don't Go Viral Without It. 17 | 18 | Everyone using WordPress knows caching needs to be a part of their performance and scalability strategy, which is why there are several cache-oriented tools in the top 10 list of WordPress plugins, such as Batcache.
A page caching plugin can mean the difference between weathering a traffic spike and going down in flames, but when the solution is driven via the app itself, serving repetitive traffic becomes the full-time job of the PHP Application servers, making it more difficult to scale. 19 | 20 | The front page of a vanilla WordPress installation with one post requires billions of CPU instructions. Even with the fastest processors and the most highly optimized site, a caching approach that still requires loading the application on every request isn’t the optimal solution for scalability. 21 | 22 | Take a look at [these charts from Joe Hoyle at Human Made](https://hmn.md/2012/12/17/testing-batcache-versus-varnish/) comparing Batcache (which loads WordPress to serve its cache) to Varnish (which doesn’t touch WordPress at all): 23 | 24 | 25 | 26 | As concurrency increases, Batcache’s response times remain well within the realm of acceptable, under a tenth of a second. However, a server load of 7 indicates that the application environment is pegged. The engine is redlining. While it’s nice that this flood of visitors is still seeing pages relatively quickly, good luck trying to do anything in the admin area (e.g. fix a crucial typo) with the server load that high. 27 | 28 | The good news is that the wider open source community has been working on this problem since the early days of the internet, as it is a problem shared by every website and web application. There are well-established tools and patterns that can be put to use, which the WordPress community has adopted and integrated as best practices. The answer for page caching with WordPress at scale is to use a reverse proxy. 29 | 30 | 31 | 32 | In this model, requests and responses to/from WordPress flow through an intermediary service known as a reverse proxy. The proxy can serve a cached copy of a response for a specified period of time. Caches can also be actively expired or flushed.
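The reverse proxy model described above can be illustrated with a short Python model. The sketch covers cached responses with a TTL, active expiry (purging), and cookie-based bypass; TTL expiry and cookies also come up under the page's Challenges. This is not how Varnish or any CDN is implemented or configured. `ReverseProxyCache`, the `wordpress` callable standing in for WordPress, and the bypass rule are all assumptions made for illustration. The TTL is read from the `max-age` directive of the `Cache-Control` header. Requests carrying a cookie whose name starts with `wordpress_logged_in_`, which WordPress sets for logged-in users, are always passed through to WordPress.

```python
import re
import time


class ReverseProxyCache:
    """Toy reverse proxy that serves cached pages until their TTL expires."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend  # callable(path) -> (headers, body); stands in for WordPress
        self.clock = clock
        self._cache = {}        # path -> (expires_at, body)

    @staticmethod
    def ttl_from(headers):
        """Read the TTL from Cache-Control max-age (0 means do not cache)."""
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        return int(match.group(1)) if match else 0

    def handle(self, path, cookie_names=()):
        # Logged-in visitors see personalized pages: always pass them through.
        if any(name.startswith("wordpress_logged_in_") for name in cookie_names):
            return self.backend(path)[1]
        entry = self._cache.get(path)
        if entry is not None and entry[0] > self.clock():
            return entry[1]  # hit: WordPress is not touched at all
        headers, body = self.backend(path)  # miss: ask WordPress
        ttl = self.ttl_from(headers)
        if ttl > 0:
            self._cache[path] = (self.clock() + ttl, body)
        return body

    def purge(self, path):
        """Actively expire one URL, e.g. after its post is updated."""
        self._cache.pop(path, None)


calls = []


def wordpress(path):
    """Hypothetical backend that records every request it has to render."""
    calls.append(path)
    return {"Cache-Control": "public, max-age=600"}, f"<html>{path}</html>"


proxy = ReverseProxyCache(wordpress)
proxy.handle("/")                                            # miss: rendered
proxy.handle("/")                                            # hit: from cache
proxy.handle("/", cookie_names=["wordpress_logged_in_abc"])  # bypass
print(len(calls))  # 2
```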
33 | 34 | To put this in more concrete terms, consider a request from a visitor to your website’s homepage. On the very first request for the homepage, the reverse proxy doesn’t yet have a cached version of the homepage, so it passes the request back to WordPress. WordPress generates the homepage and serves the response. The reverse proxy passes the response along, but also stores a copy for itself. Then, on a subsequent visit to the homepage, the reverse proxy can serve the response without needing to connect to WordPress. 35 | 36 | This model helps solve the challenge of scaling WordPress because a reverse proxy can be three orders of magnitude (1000x) more efficient than a PHP web server at delivering cached responses. PHP web servers need to perform a large number of relatively complex operations — connect to a database, execute thousands of lines of code, process business logic, etc. All a reverse proxy needs to do is serve a stored response from cache. 37 | 38 | The most popular open source reverse proxy is Varnish, but Nginx also has reverse proxy functionality, and there are hardware and commercial options available as well. In addition, some CDNs can act as massive distributed reverse proxies. Akamai, Cloudflare, and Fastly are all excellent products for globally distributed reverse proxy solutions. With these services, not only will your site scale for high traffic, but your visitors will also get much faster responses, because they'll be connecting to edge servers located near them rather than to the web servers in your data center. 39 | 40 | ### Challenges: 41 | 42 | * **Cache TTL and Expiration:** The basic communication between WordPress and the reverse proxy happens through HTTP headers, which can specify the lifetime, or Time To Live (“TTL”), of a cached page.
However, actively expiring or flushing a cache requires additional work — you will have to devise a mechanism for clearing the cache that’s more sustainable than restarting the proxy. 43 | * **Cookies:** Most proxies rely heavily on request characteristics sent by the browser, such as cookies, to decide whether a request can be given a cached response or must be passed back to WordPress. If the reverse proxy or WordPress is improperly configured, too many requests can be sent back to WordPress, which can drastically decrease the effectiveness of the proxy cache. This can present some challenges in a world where many different systems use cookies differently. 44 | * **Complexity:** For many developers, the conceptual complexity of introducing a new system, and not having the web browser talk “directly” to WordPress, can create confusion around how to debug problems. Taking the time to ensure everyone on the team is comfortable with the implementation is important. 45 | 46 | 47 | 48 | [social_links] 49 | ####Further Reading 50 | [link-library settings="1" categorylistoverride="18"] 51 | -------------------------------------------------------------------------------- /_pages/query-performance.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 18 3 | post_title: Query Performance 4 | author: Josh Koenig 5 | post_date: 2015-12-04 10:38:02 6 | post_excerpt: The Database Is the Ultimate Bottleneck 7 | layout: page 8 | permalink: http://www.scalewp.io/query-performance/ 9 | published: true 10 | --- 11 | 12 | ## Scale Your Queries 13 | 14 | ### Database Replicas, When Great Caching Just Isn't Enough 15 | 16 | Your website’s database is the ultimate bottleneck when scaling. The two caching strategies we’ve outlined mostly serve to prevent load on the database, first by handling pages before they even hit WordPress, and then by making WordPress less dependent on the database.
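A back-of-the-envelope model shows how the two caching layers combine to shield the database. It makes several simplifying assumptions. Every page-cache miss runs WordPress. Each uncached request issues a fixed number of queries that are all eligible for the object cache. The hit ratios are averages. Under these assumptions, the query rate reaching the database is requests per second × (1 − page-cache hit ratio) × queries per request × (1 − object-cache hit ratio). The figures below are hypothetical.

```python
def db_queries_per_second(requests_per_sec, page_hit_ratio,
                          queries_per_request, object_hit_ratio):
    """Estimate the query rate that reaches the database."""
    # Only page-cache misses reach WordPress at all.
    wordpress_requests = requests_per_sec * (1 - page_hit_ratio)
    # Each of those issues queries; object-cache hits never touch the database.
    return wordpress_requests * queries_per_request * (1 - object_hit_ratio)


# Hypothetical site: 1,000 req/s and 30 queries per uncached request.
print(round(db_queries_per_second(1000, 0.0, 30, 0.0)))  # no caching: 30000
print(round(db_queries_per_second(1000, 0.9, 30, 0.0)))  # page cache only: 3000
print(round(db_queries_per_second(1000, 0.9, 30, 0.8)))  # both layers: 600
```

Because the two hit ratios multiply, even modest object-cache hit rates matter once the page cache has absorbed most anonymous traffic.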
17 | 18 | However, you'll eventually need to scale your database, most likely to handle a high volume of read requests (`SELECT` queries). An elastic architecture allows you to increase your capacity through the use of replicas. 19 | 20 | 21 | 22 | WordPress has had support for scalable read replicas for quite some time via the HyperDB plugin. This allows you not only to scale out, but also to implement specific strategies for how queries use the master and replica instances. 23 | 24 | For example, it’s easy to configure your site to use the master database for all requests related to administrative or logged-in functionality, while using replicas exclusively for anonymous traffic. This pattern ensures that content editors are always able to use the site, even if it’s under sustained heavy load from normal content reading. 25 | 26 | ### Avoiding “Queries of Death” 27 | 28 | Scaling via database replication still assumes that your queries are generally performant. If you have a large content footprint (hundreds of thousands or millions of posts), the WordPress default query builder (`WP_Query`) may generate “queries of death”: requests to the database that can take several seconds to compute. 29 | 30 | These are called “queries of death” for a reason. They can suddenly and drastically affect site performance, even to the point of causing downtime. Long-running queries are intensive, often involving the creation of a whole temporary table to compute the result. They bog down database performance for all queries and tie up PHP application capacity, a lose-lose combination. 31 | 32 | Slow queries block the PHP Application threads that kick them off. If they happen in high volume, they can overwhelm even a horizontally scalable “elastic” infrastructure. Eventually all your PHP threads are waiting for slow queries to respond, at which point the site is effectively offline.
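The arithmetic behind "all your PHP threads are waiting" is Little's law. It states that the average number of busy workers equals the request arrival rate multiplied by the average time each request holds a worker. The quick sketch below uses hypothetical numbers: a total pool of 200 PHP workers across the cluster, and a page whose query suddenly starts taking 4 seconds. It shows how fast a slow query can exhaust capacity.

```python
def busy_workers(arrival_rate_per_sec, seconds_per_request):
    """Little's law: average concurrency = arrival rate x time in system."""
    return arrival_rate_per_sec * seconds_per_request


def is_saturated(arrival_rate_per_sec, seconds_per_request, worker_pool):
    """True when the average demand for PHP workers reaches the pool size."""
    return busy_workers(arrival_rate_per_sec, seconds_per_request) >= worker_pool


POOL = 200  # hypothetical total PHP workers across all App servers

# 50 uncached requests/sec at 0.2 s each keep about 10 workers busy.
print(busy_workers(50, 0.2), is_saturated(50, 0.2, POOL))
# The same traffic hitting a 4-second "query of death" needs all 200.
print(busy_workers(50, 4), is_saturated(50, 4, POOL))
```

A 20x slowdown in one query means 20x the workers for the same traffic, and adding App servers only raises the pool while the database absorbs even more of the slow query. This is why fixing the query usually beats scaling around it.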
33 | 34 | Even with best-practice architecture, an important part of scalability hygiene is reviewing query performance. MySQL has a built-in capability called the slow query log, which records queries that run longer than a configurable threshold (`long_query_time`) so you can analyze your query times. You may also find value here in using application performance monitoring tools such as New Relic. 35 | 36 | ### Challenges: 37 | 38 | * **Query routing:** The HyperDB plugin has a lot of options for distributing queries across your database instances, but it’s usually best to keep it simple for starters. While your specific use case may require customization, it’s best to minimize the amount of change and moving parts as you take the first step away from a monolithic single-instance implementation. 39 | * **Replication lag:** Under ideal circumstances replication lag is nearly instantaneous, but not all circumstances are ideal. You’ll need to be sure that your implementation can sanely handle multi-second lag, and you’ll need to monitor whether lag spikes to unacceptable levels or replication breaks down altogether. 40 | * **Debuggability:** When dealing with troublesome queries, you need not only access to the slow query log, but also a way to safely and reliably reproduce the situation in order to conclusively resolve the issue. That means having an environment to debug in with all the data needed to trigger the slow query. It also means taking the time to isolate where in the site code the query is being generated, so that an alternative can be implemented. 41 | * **Regressions:** As noted above, the impacts from a query of death can be swift. For sites with large datasets, it is important to consider the implications of new queries, as well as to test them before they are released into production.
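Two ideas from this page can be combined in a small decision function. The first is the master/replica split: writes and logged-in or admin traffic go to the master, and anonymous reads go to replicas. The second is the replication-lag caveat. This is a Python model of the idea, not HyperDB's configuration format. The server names, the `max_lag_seconds` threshold, and the lag figures are hypothetical. In practice the lag values would come from your monitoring, for example MySQL's `Seconds_Behind_Master` in `SHOW SLAVE STATUS`.

```python
import random

WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP")


def choose_server(sql, logged_in, replica_lag, max_lag_seconds=5, rng=random):
    """Pick a database server for one query.

    replica_lag maps replica name -> seconds behind master (None if broken).
    """
    is_write = sql.lstrip().upper().startswith(WRITE_PREFIXES)
    # Writes, and everything for editors/admins, go to the master so they
    # always see their own changes.
    if is_write or logged_in:
        return "master"
    # Anonymous reads go to a replica that is reasonably up to date.
    fresh = [name for name, lag in replica_lag.items()
             if lag is not None and lag <= max_lag_seconds]
    if not fresh:
        return "master"  # fall back rather than serve badly stale data
    return rng.choice(fresh)


lag = {"replica1": 0.3, "replica2": 42, "replica3": None}
print(choose_server("SELECT * FROM wp_posts", False, lag))      # replica1
print(choose_server("SELECT * FROM wp_posts", True, lag))       # master
print(choose_server("UPDATE wp_options SET ...", False, lag))   # master
```

Falling back to the master when every replica is lagging keeps reads correct, at the cost of master load. Whether that trade-off is acceptable depends on your traffic, which is part of why keeping routing rules simple at first is good advice.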
42 | 43 | 44 | 45 | [social_links] 46 | ####Further Reading 47 | [link-library settings="1" categorylistoverride="19"] 48 | -------------------------------------------------------------------------------- /_pages/resources.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 200 3 | post_title: Resources 4 | author: Josh Koenig 5 | post_date: 2015-12-13 20:35:59 6 | post_excerpt: "" 7 | layout: page 8 | permalink: http://www.scalewp.io/resources/ 9 | published: true 10 | --- 11 | 12 | ## Further Reading 13 | 14 | ### Knowledge is Power. Arm Yourself. 15 | 16 | In addition to the core content on this site, we maintain a library of quality external resources and implementation guides. If you have additional resource links to suggest, we welcome those contributions. See the [contributing](/contribute/) page for more information. 17 | 18 | ### Elastic Architecture 19 | [link-library settings="1" categorylistoverride="15"] 20 | 21 | ### Page Caching 22 | [link-library settings="1" categorylistoverride="18"] 23 | 24 | ### Object Caching 25 | [link-library settings="1" categorylistoverride="16"] 26 | 27 | ### Query Performance 28 | [link-library settings="1" categorylistoverride="19"] 29 | 30 | ### A Real-World Scalable Architecture 31 | [link-library settings="1" categorylistoverride="20"] 32 | 33 | ### Development and Workflow 34 | [link-library settings="1" categorylistoverride="21"] 35 | 36 | Have something to add to this page? [File an issue on GitHub](https://github.com/pantheon-systems/wordpress-at-scale/issues/new?labels=resource) and we can consider adding it to the library. 
37 | 38 | 39 | 40 | [social_links] 41 | -------------------------------------------------------------------------------- /_pages/scale.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 7 3 | post_title: WordPress at Scale 4 | author: Josh Koenig 5 | post_date: 2015-12-04 09:54:57 6 | post_excerpt: > 7 | WordPress is moving upmarket, powering 8 | high-traffic, mission-critical websites. 9 | Find out what it takes to run WordPress 10 | at scale. 11 | layout: page 12 | permalink: http://www.scalewp.io/ 13 | published: true 14 | --- 15 |
16 | ## Can WordPress Scale? 17 | 18 | ### Yes. But there are requirements: 19 | 20 | A single WordPress site can push hundreds of millions of pageviews a month; it can serve tens of thousands of concurrent logged-in users; and it can be lightning-fast the whole time. It is known. The question is not whether WordPress itself can scale, but whether or not your implementation is ready. 21 | 22 | This site contains best practices from teams with real-world experience in delivering WordPress at scale. They provide a solid foundation for planning a scalable implementation, for development teams and site owners alike. 23 | 24 | [bullet_box align="left" title="Developers"] 25 | We all stand on the shoulders of giants. Learn trusted architectural patterns for scalable website infrastructure, as well as the optimizations that nearly all big sites leverage. 26 | [/bullet_box] 27 | [bullet_box align="right" title="Site Owners"] 28 | When and how to invest in scalability is one of the most important business decisions you'll make. Use this site to have productive discussions with your developers and vendors. 29 | [/bullet_box] 30 | 31 | This website will arm you with the knowledge you need to realize your vision for a highly scalable WordPress implementation. Get started with the links below. 32 | 33 | 34 | 35 | [article_links] 36 | 37 | [social_links] -------------------------------------------------------------------------------- /_pages/searching-for-scale.md: -------------------------------------------------------------------------------- 1 | --- 2 | ID: 382 3 | post_title: Searching for Scale 4 | author: Josh Koenig 5 | post_date: 2015-12-22 16:14:52 6 | post_excerpt: > 7 | Improve User Experience and Database 8 | Performance With a Search Index 9 | layout: page 10 | permalink: > 11 | http://www.scalewp.io/searching-for-scale/ 12 | published: true 13 | --- 14 | ## Enjoy Faster, More Relevant Results 15 | 16 | ### Improved User Experience and Database Performance?
Yes, Please. 17 | 18 | One of the first queries to hit the wall in terms of scalability is WordPress's built-in content search. It runs slowly if you have a large number of posts, and its relevance ranking is rudimentary. It also does not have any built-in faceting or “drill down” capabilities, leaving many users wanting more in features as well as speed. 19 | 20 | This is a known weak spot for WordPress, and the good news is that there is an emerging best practice for scaling site search: using a dedicated search index. An index improves performance significantly and allows a richer user experience. It can even help scale problematic queries not related to content searching. 21 | 22 | Scaling search queries via an index gets those queries out of the database, allowing them to be handled by a dedicated high-performance subsystem. 23 | 24 | 25 | 26 | One of the most exciting techniques for scaling large sites with complex content is using the search index to handle a wider range of queries that would normally go to the database, taking this tool beyond the basic content search. With a proper search index backend, developers can route various resource-intensive queries — autocompletes, queries by postmeta, multi-taxonomy queries — to the most efficient resource, speeding things up for users and avoiding “queries of death.” 27 | 28 | Of course, you can always develop around many of the challenges of big datasets by using object caching to reduce the volume of slow queries, and/or by refactoring the queries themselves to be more efficient. However, the ability to “throw the queries at the index” is an excellent option to have at your disposal. 29 | 30 | Most large-scale sites want a better answer for core content search, and the ability to use a search index as an alternate target for gnarly queries can be a bonus win from having that infrastructure in place.
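WordPress's built-in search runs `LIKE` comparisons against post columns. A dedicated index instead looks up only the posts containing each term, which helps explain why it is faster and can rank by relevance. Here is a toy inverted index in Python to illustrate the data structure. It is a simplification for illustration only. Elasticsearch and Solr use far more sophisticated text analysis and scoring (such as BM25). This sketch splits text on word characters, ranks results by raw term frequency, and uses hypothetical post data.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on runs of word characters."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Map each term to the posts containing it, with term counts."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {post_id: count}

    def add(self, post_id, text):
        for term, count in Counter(tokenize(text)).items():
            self.postings[term][post_id] = count

    def search(self, query):
        """Return post IDs ranked by total term frequency (ties by ID)."""
        scores = Counter()
        for term in tokenize(query):
            # Only posts listed under the term are touched; no full scan.
            for post_id, count in self.postings.get(term, {}).items():
                scores[post_id] += count
        return sorted(scores, key=lambda pid: (-scores[pid], pid))


index = InvertedIndex()
index.add(1, "Scaling WordPress with Varnish")
index.add(2, "Object caching: caching WordPress objects in Redis")
index.add(3, "A recipe for lemon cake")
print(index.search("caching wordpress"))  # [2, 1]
```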
Most large-scale WordPress sites make use of an Elasticsearch or Apache Solr search index, though some also use AWS’s CloudSearch, which uses similar technology. 31 | 32 | ### Challenges 33 | 34 | * **Overriding WP_Query():** most implementations of a search index backend involve overriding the built-in WP_Query() object. While this is relatively straightforward to do for vanilla out-of-the-box WordPress post search, doing it for more specific queries will require care and attention from a developer. 35 | * **Index Rebuilds:** while not common, you may come across a situation that requires you to rebuild your content index. This means being able to “fall back” to the database, at least temporarily. 36 | * **Complexity:** as with other dedicated subsystems, a search index is yet another piece of infrastructure to set up, monitor, and manage. While the payoffs are usually worth it, this does become another ongoing responsibility to maintain. 37 | 38 | 39 | 40 | [social_links] 41 | ####Further Reading 42 | [link-library settings="1" categorylistoverride="22"] 43 | -------------------------------------------------------------------------------- /diagrams/continuous_integration.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/continuous_integration.png -------------------------------------------------------------------------------- /diagrams/horizontal_scale.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/horizontal_scale.png -------------------------------------------------------------------------------- /diagrams/mysql_replica.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/mysql_replica.png
--------------------------------------------------------------------------------
/diagrams/object_cache.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/object_cache.png
--------------------------------------------------------------------------------
/diagrams/page_caching.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/page_caching.png
--------------------------------------------------------------------------------
/diagrams/pull_requests.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/pull_requests.png
--------------------------------------------------------------------------------
/diagrams/real_world.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/real_world.png
--------------------------------------------------------------------------------
/diagrams/search_index.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/search_index.png
--------------------------------------------------------------------------------
/diagrams/simple_cluster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/simple_cluster.png
--------------------------------------------------------------------------------
/diagrams/varnish_vs_batcache.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/diagrams/varnish_vs_batcache.png
--------------------------------------------------------------------------------
/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pantheon-systems/wordpress-at-scale/79bbfb94cc2611d73de535fc6374291b4d56624d/logo.png
--------------------------------------------------------------------------------