├── .gitignore ├── LICENSE ├── README.md ├── draft ├── Ruby Optimization.docx ├── Ruby Optimization.pdf ├── conclusion.md ├── introduction.md ├── mri.md └── rubinius.md ├── final ├── final.md └── the_final.pdf └── styles.css /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2014 John Otander 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Ruby Language Optimization Techniques 2 | 3 | NICHOLAS BENDER, Boise State University 4 | BEN NEELY, Boise State University 5 | JOHN OTANDER, Boise State University 6 | 7 | The Ruby programming language has experienced a recent period of intense adoption and growth due to its excellent speed of iteration, elegant syntax, and passionate community. Additionally, the popular web framework, Ruby on Rails, has given the Ruby language exceptional legitimacy, especially in the prototyping and startup space. Ruby is a tool that emphasizes developer happiness and productivity, and it places responsibility for the program in the developer's hands. This gives the language a lot of power, but can serve as a double-edged sword. When leveraged incorrectly, projects can swiftly become inefficient and unmaintainable. Additionally, allowing this flexibility has serious implications for memory management, efficiency, and execution times. 8 | 9 | While support is growing steadily for the language, it is largely dismissed as not scaling effectively and as having far slower runtimes than compiled, statically typed languages. In this article, we propose that many sophisticated techniques exist to enhance Ruby’s performance, both by using existing runtimes to compile Ruby to statically typed languages and by avoiding common anti-patterns to improve performance natively. Through experimentation and thorough research we conclude that Ruby performs competitively against its similar scripting language counterparts, and can see large performance increases in many cases.
10 | 11 | __Categories and Subject Descriptors:__ D.2.3 [Coding Tools and Techniques]: Object-oriented programming, B.6.3 [Design Aids]: Optimization 12 | __General Terms:__ Optimization, Algorithms, Performance 13 | __Additional Key Words and Phrases:__ Ruby, Web Development, JRE, C++, C 14 | 15 | ## INTRODUCTION 16 | 17 | Ruby is an object-oriented, dynamically typed, high-level scripting language. It is a programming language that was written for humans and just happens to run on computers. It's intended to promote developer happiness through simplicity, elegant libraries, and terse, readable syntax. Ruby also uses duck typing, meaning that an object's type is determined by the methods and properties it exposes rather than by its class. Each of these techniques and language features comes with certain sacrifices. In this exploration we conclude that the best practice for stable, performant Ruby programs is to use the newest versions of the core language properly. 18 | 19 | In recent years, the Ruby programming language has grown its community and established itself as a valuable, popular tool for many tasks [O’Donoghue, 2014]. The success of Ruby on Rails as a prototyping framework, as well as a full-stack solution for some larger companies, has brought forth a myriad of techniques to ensure that the language’s speed differences compared to similar languages are minimal. Ruby’s slower performance, as compared to C or Java, is attributed to interpreted execution, dynamic typing, meta-programming support, and the Global Interpreter Lock [Odaira, Castanos, and Tomari, 2014]. This increase in popularity has caused a large number of independent optimization efforts to arise from large corporations such as IBM and AT&T, as well as efforts from the Ruby open-source community. 20 | 21 | > When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
_- Heim, Michael (2007)._ 22 | 23 | ## 1. MRI (> 1.9) 24 | 25 | The MRI is short for Matz's Ruby Interpreter, which is sometimes also referred to as CRuby. The MRI is named after Yukihiro Matsumoto, the chief designer of the Ruby language. The original MRI was the runtime environment from Ruby's inception to 1.8.7. 26 | 27 | ### 1.1 Program Execution at a High Level 28 | 29 | ``` 30 | | --------- | 31 | | Ruby | 32 | | --------- | 33 | | Tokens | 34 | | --------- | 35 | | AST Nodes | 36 | | --------- | 37 | | 38 | | Interpret 39 | | 40 | v 41 | | --------- | 42 | | C | 43 | | --------- | 44 | | Machine | 45 | | Language | 46 | | --------- | 47 | ``` 48 |
Fig. 1. Ruby Program Execution
49 | 50 | A Ruby script undergoes a tokenization step, which is then parsed into an Abstract Syntax Tree. The Ruby C code (MRI), reads and executes the AST. Note that there is no compilation or translation step. 51 | 52 | ### 1.2 Performance 53 | 54 | Since there isn't a bytecode compilation step, the execution of Ruby programs requires walking the MRI's internal Abstract Syntax Tree. This slows the execution speed significantly because it's more costly to interpret the AST data structure during runtime. 55 | 56 | ![ch_abstract_syntree](https://cloud.githubusercontent.com/assets/1424573/2803533/3c359946-cc9d-11e3-9b35-217ccda504df.png) 57 |
Fig. 2. Abstract Syntax Tree
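The tokenize-then-parse pipeline described above can be observed from Ruby itself. The following is a minimal sketch using the `ripper` standard library, which ships with MRI; the one-line program and the helper names `token_types` and `syntax_tree` are illustrative assumptions, and the output reflects the parser of whichever Ruby version runs the script rather than the historical 1.8 interpreter.

```ruby
require 'ripper'

# Hypothetical one-line program, used only for illustration.
SOURCE = "puts 1 + 2"

# Step 1: tokenization. Each lexer entry holds a position, a token type,
# and the matched text; only the token types are kept here.
def token_types(source)
  Ripper.lex(source).map { |token| token[1] }
end

# Step 2: parsing. Ripper.sexp returns the syntax tree as nested arrays.
def syntax_tree(source)
  Ripper.sexp(source)
end

p token_types(SOURCE)
p syntax_tree(SOURCE)
```

An AST-walking interpreter evaluates a nested structure like this one node by node, which is the cost this section refers to.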
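58 | 59 | ### 1.3 Optimizations 60 | 61 | Prefer methods that modify their receiver in place (such as chomp!) over their copying counterparts (such as chomp) when the original value is no longer needed, because doing so avoids allocating a copy of the string. Note that every variable referencing the same object observes the change, as str2 does in the session below. 62 | 63 | ``` 64 | 2.1.1 :003 > str = "A string.\n" 65 | => "A string.\n" 66 | 2.1.1 :004 > str2 = str 67 | => "A string.\n" 68 | 2.1.1 :005 > str.chomp! 69 | => "A string." 70 | 2.1.1 :006 > str2 71 | => "A string." 72 | 2.1.1 :007 > 73 | ``` 74 |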
Fig. 3. Receiver modifying methods vs receiver duplicating methods
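The difference between the two method families can be checked with object identity. This is a small sketch; the helper names `chomp_copy` and `chomp_in_place` are hypothetical and exist only to make the contrast explicit.

```ruby
# Copying variant: returns a new String and leaves the receiver untouched.
def chomp_copy(str)
  str.chomp
end

# In-place variant: strips the trailing newline from the receiver itself.
# chomp! returns nil when nothing was removed, so the receiver is returned
# explicitly instead of relying on the method's return value.
def chomp_in_place(str)
  str.chomp!
  str
end

original = "A string.\n".dup          # dup guarantees a mutable string
copy = chomp_copy(original)
puts copy.equal?(original)            # false: a second object was allocated

same = chomp_in_place(original)
puts same.equal?(original)            # true: the original object was modified
```

Because `chomp!` returns `nil` when there is nothing to remove, chaining further calls directly onto a bang method is a common source of bugs.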
75 | 76 | ### 1.4 Summary 77 | 78 | The initial implementation of the MRI is one of the primary reasons that Ruby gets its "bad rap" for code execution speed. 79 | 80 | ## 2. JRUBY 81 | 82 | ### 2.1 Purpose 83 | 84 | JRuby endeavors to solve many Ruby performance issues by eliminating the standard interpreter and instead compiling as much Ruby code as possible, including the core libraries, to Java bytecode. Current versions of JRuby support both just-in-time compilation and ahead-of-time compilation to Java bytecode. Combining these compilation modes with interpretation gives JRuby several advantages over the standard interpreter. 85 | 86 | One of the more obvious improvements is the ability to call and use standard Java libraries and classes from within Ruby projects. For larger organizations already using Java for core library support, this allows for improved flexibility of the development environment. 87 | 88 | ### 2.2. Performance 89 | 90 | In 2007, JRuby’s overall performance was compared with Ruby 1.8.5, the YARV interpreter (now merged into Ruby’s official interpreter), and Rubinius. In that comparison, JRuby outperformed standard Ruby in only 10% of the tests performed. JRuby did, however, manage to run all Ruby benchmarks without timing out or producing an error, a claim that no other non-standard Ruby implementation could make. 91 | 92 | In benchmarks performed in 2014, the latest JRuby release performed comparably to standard Ruby. While JRuby ran some benchmarks faster, its increased memory overhead (>10x) makes scaling Ruby applications problematic. 93 | 94 | In addition to JRuby's memory woes, the biggest performance downside of JRuby comes from the time needed to initialize the JVM in the first place. A simple Ruby script that would take the MRI a fraction of a second to run would require several additional seconds just due to JVM launch times.
95 | 96 | ### 2.3 Lack of C Support 97 | 98 | While JRuby allows for enhanced support and compatibility with Java libraries and applets, the majority of Ruby users (especially those using Ruby on Rails) are used to libraries that rely on native C extensions. In choosing to support Java, JRuby gives up compatibility with native C extensions, most notably those behind a variety of database interfaces and web servers. 99 | 100 | ### 2.4. Development Lag 101 | 102 | Because JRuby can only implement and support a Ruby version after that version has been released, it lags behind the reference implementation. The most recent release of JRuby supports only Ruby 1.9.3, which was initially released in 2011. 103 | 104 | ### 2.5 Summary 105 | 106 | While JRuby does offer improved benchmark performance in a minority of cases, the slow development cycle and the potential for a massive increase in memory footprint make it an unsuitable option for pure Ruby development stacks. 107 | 108 | ## 3. Rubinius 109 | 110 | ### 3.1 Purpose 111 | 112 | Rubinius is an implementation of the Ruby programming language and includes a bytecode virtual machine, Ruby syntax parser, bytecode compiler, generational garbage collector, just-in-time (JIT) native machine code compiler, and Ruby Core and Standard Libraries. Rubinius is written using Ruby and C++. 113 | 114 | ### 3.2 History 115 | 116 | Rubinius was originally created to be a Ruby virtual machine and runtime written in pure Ruby. The standard Ruby interpreter is primarily written in non-Ruby languages such as C. From 2007 to 2013, the software company Engine Yard was a primary backer of Rubinius. During that time the focus of Rubinius evolved from creating a completely bootstrapped Ruby VM to instead offering an implementation of Ruby with increased performance.
Under this new direction, Rubinius partially abandoned the idea of bootstrapping the Ruby VM entirely in Ruby code, and instead sought to use C++ to increase performance and establish Rubinius as the fastest Ruby implementation. Recently Rubinius has focused on supporting concurrency and multi-threading. 117 | 118 | ### 3.3 Performance 119 | 120 | Rubinius initially achieved performance equal to or slightly better than that of the YARV interpreter. However, in recent years the MRI interpreter has consistently outperformed Rubinius on most benchmark tests. 121 | 122 | Rubinius consistently benchmarks as one of the slowest modern implementations of the Ruby language. 123 | 124 | ### 3.4 Concurrency 125 | 126 | Rubinius does outperform the MRI in threading and concurrency benchmark tests. As shown in the figure below, Rubinius (represented by rbx-2.0.0) has a nontrivial advantage over MRI when executing multithreaded code. 127 | Unlike MRI, Rubinius does not have a Global Interpreter Lock (GIL). MRI's GIL allows only one thread to execute Ruby code at a time, no matter how many processor cores are available. Not implementing the GIL gives Rubinius the ability to support true parallel threading. 128 | 129 | ### 3.5 Summary 130 | 131 | Rubinius’ development has been spotty, depending heavily on a few developers and a few corporate sponsors. As a result Rubinius has constantly shifted focus. Rubinius currently offers a significant advantage over the standard interpreter only with regard to programming involving threading and concurrency. For all other uses, the standard MRI Ruby interpreter is faster and more consistently supported. 132 | 133 | ## 4. YARV 134 | 135 | ### 4.1 Background 136 | 137 | ``` 138 | CODE => TOKENIZATION => PARSE TREE => COMPILATION => YARV INSTRUCTIONS 139 | ``` 140 | 141 | When a Ruby program is executed, the interpreter first tokenizes the program.
This means that the contents are converted into a collection of tokens with associated types. Ruby uses an LALR parser (Look-Ahead LR: a left-to-right scan that produces a rightmost derivation in reverse) to apply meaning to the tokens and construct the Abstract Syntax Tree. The compilation step was introduced with Ruby 1.9, and is where YARV (Yet Another Ruby Virtual Machine) comes into play. It translates the syntax tree into bytecode, or YARV instructions. 142 | 143 | ``` 144 | ~|||$ irb 145 | 2.1.1 :001 > code = <<CODE 146 | 2.1.1 :002"> puts 1 + 2 147 | 2.1.1 :003"> CODE 148 | => "puts 1 + 2\n" 149 | 2.1.1 :004 > puts RubyVM::InstructionSequence.compile(code).disasm 150 | == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>========== 151 | 0000 trace 1 ( 1) 152 | 0002 putself 153 | 0003 putobject_OP_INT2FIX_O_1_C_ 154 | 0004 putobject 2 155 | 0006 opt_plus 156 | 0008 opt_send_simple 157 | 0010 leave 158 | => nil 159 | ``` 160 |
Fig. 4. YARV instructions for a simple program
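The same listing can be produced from a script rather than an irb session. The sketch below assumes MRI, since `RubyVM` is not available on JRuby or Rubinius; `yarv_listing` is a hypothetical helper name, and the exact instruction names and operands differ between Ruby versions.

```ruby
# Compile a source string to YARV bytecode and return the human-readable
# disassembly. RubyVM::InstructionSequence is specific to MRI (CRuby).
def yarv_listing(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

puts yarv_listing("puts 1 + 2")
```

Comparing listings for two equivalent snippets is a quick way to see whether the compiler treats them differently before reaching for a benchmark.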
161 | 162 | The introduction of the compilation step and YARV have significantly helped the execution speed of Ruby programs. However, there's always room for more improvements. 163 | 164 | ### 4.2 Purpose 165 | 
166 | The Ruby MRI is short for Matz's Ruby Interpreter, and is the reference implementation for the Ruby programming language. It was released to the public in 1995, and is still actively developed, with the latest stable build being Ruby 2.1.1. 167 | 168 | YARV is a bytecode interpreter developed by Koichi Sasada that's also known as the KRI. It was developed in order to reduce the execution time of Ruby programs, and was very successful. As a result, YARV was merged into Ruby 1.9.0, where it replaced the original AST-walking interpreter inside the MRI. 169 | 170 | As the default interpreter for the Ruby programming language, the MRI has received its fair share of criticism, primarily due to its execution speed and memory consumption. However, recent Ruby versions have seen significant enhancements, and are on par with similar scripting languages like Python. Moreover, some comparisons pit a compiled language against a scripting language, which is apples to oranges. Developers typically choose Ruby for its ease of writing and prototyping, understanding that its execution time will always be significantly slower than that of its compiled counterparts. 171 | 172 | That being said, there are numerous methods and best practices that developers can follow in order to ensure that they're avoiding unnecessary bottlenecks. 173 | 174 | ### 4.3 Performance Out of the Box 175 | 176 | Thanks to the introduction of YARV, vanilla Ruby, on a single thread, has the ability to outperform alternative Ruby implementations. Consider the following figure, which measures Rails requests per second. 177 | 178 | ![screen shot 2014-05-02 at 6 22 04 pm](https://cloud.githubusercontent.com/assets/1424573/2869079/0305431c-d259-11e3-8b58-6f2ea6ff23e9.png) 179 |
Fig. 5. Rails requests per second.
180 | 181 | ### 4.4 Global Interpreter Lock 182 | 183 | When attempting to optimize execution speed, threads are often utilized in order to process tasks concurrently. This is a feature that Ruby supports, too. However, the MRI/YARV incorporates a Global Interpreter Lock, or GIL, that doesn't permit any true parallelism. A GIL is a single interpreter-wide lock that a thread must hold in order to run, so only one thread executes Ruby code at any given time. This results in little to no actual gain in speed when running CPU-bound threads on a multiprocessor machine. 184 | 185 | The primary reason for the GIL is to avoid race conditions within C extensions. There are also thread safety reasons: parts of Ruby aren't thread safe (Hash), and neither are numerous C libraries that are wrapped by Ruby's internals. Additionally, the GIL is integral to data integrity, because it protects the interpreter's internal state even when the developer writes unsafe threading code. 186 | 187 | This, interestingly enough, runs contrary to the fundamental principles of the Ruby language, where all the responsibility is laid on the developer. Ruby allows the developer to have the ultimate freedom without hand holding, yet the GIL is just that, hand holding. 188 | 189 | The GIL isn’t going anywhere. It is deeply intertwined with Ruby and its internals, and many influential Ruby-core figures don't plan on removing the GIL anytime in the near future. This doesn't mean that concurrency can't be achieved, though. 190 | 191 | You can sidestep the GIL with multiple virtual machines. Sasada Koichi has proposed a Multiple VM (MVM) solution, which is currently being developed. This would consist of multiple virtual machines, each running its own process and communicating via sockets. 192 | 193 | Granted, this is a drastic step away from typical threading, but some proponents believe that traditional threading isn't necessarily the correct paradigm to follow.
This is especially so considering that Ruby 1.8 used green threads scheduled by the interpreter rather than by the OS, and that Ruby 1.9 and later map each Ruby thread to a native thread that must still hold the GIL to run. 194 | 195 | > Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads. 196 | 197 | ### 4.5 Simple Code Enhancements 198 | 199 | String interpolation is typically faster than chained concatenation because YARV compiles an interpolated literal into a single concatstrings instruction that builds the result at once, whereas each << or + is a separate method call (and + additionally allocates a new intermediate string). 200 | 201 | ```ruby 202 | require 'benchmark' 203 | 204 | concat_time = Benchmark.measure do 205 | 20000000.times do 206 | str = 'str1' << 'str2' << 'str3' 207 | end 208 | end 209 | 210 | # => # 211 | 212 | interp_time = Benchmark.measure do 213 | 20000000.times do 214 | str = "#{ 'str1' }#{ 'str2' }#{ 'str3' }" 215 | end 216 | end 217 | 218 | # => # 219 | ``` 220 |
Fig. 6. Interpolation vs concatenation of Ruby Strings
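To reproduce the comparison with all three string-building styles side by side, something like the following sketch could be used. The helper names and the iteration count are illustrative assumptions, and no timings are quoted here because they vary with Ruby version and hardware.

```ruby
require 'benchmark'

# Arbitrary, illustrative iteration count; raise it for steadier timings.
ITERATIONS = 200_000

# Concatenation with +: allocates an intermediate string for a + b.
def build_with_plus(a, b, c)
  a + b + c
end

# Appending with <<: mutates its receiver, so dup keeps the caller's
# string unmodified.
def build_with_append(a, b, c)
  a.dup << b << c
end

# Interpolation: a single interpolated literal.
def build_with_interpolation(a, b, c)
  "#{a}#{b}#{c}"
end

Benchmark.bm(15) do |bm|
  bm.report('plus')          { ITERATIONS.times { build_with_plus('str1', 'str2', 'str3') } }
  bm.report('append')        { ITERATIONS.times { build_with_append('str1', 'str2', 'str3') } }
  bm.report('interpolation') { ITERATIONS.times { build_with_interpolation('str1', 'str2', 'str3') } }
end
```

Wrapping each style in a method adds the same call overhead to all three, so the relative ordering is what matters, not the absolute numbers.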
221 | 222 | Calling collect|map with an explicit block can be faster than passing a symbol with &, as in map(&:id), although the latter is typically preferable to read. (collect and map are aliases; both return a new array when given a block, and an enumerator only when called without one.) The Symbol#to_proc form is slower because to_proc must first be called on the symbol to perform the following conversion: 223 | 224 | ``` 225 | :method.to_proc 226 | # => -> x { x.method } 227 | fake_data = 20.times.map { |t| Fake.new(t) } 228 | 229 | proc_time = Benchmark.measure do 230 | 200000000.times do 231 | fake_data.map(&:id) 232 | end 233 | end 234 | 235 | # => # 236 | ``` 237 | 238 | ``` 239 | block_time = Benchmark.measure do 240 | 200000000.times do 241 | fake_data.map { |d| d.id } 242 | end 243 | end 244 | 245 | # => # 246 | ``` 247 | 248 | ``` 249 | collect_time = Benchmark.measure do 250 | 200000000.times do 251 | fake_data.collect { |d| d.id } 252 | end 253 | end 254 | 255 | # => # 256 | 257 | ``` 258 |
Fig. 7. Procs vs Blocks vs Collects
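The figure above relies on a `Fake` class that is not defined in the text. A self-contained variant might look like the sketch below, where `Fake` is assumed to be a simple struct with an `id` field and the run count is an arbitrary, much smaller value chosen so the script finishes quickly.

```ruby
require 'benchmark'

# Fake is a hypothetical stand-in for the class used in the figure.
Fake = Struct.new(:id)
FAKE_DATA = Array.new(20) { |i| Fake.new(i) }

# Illustrative run count only.
RUNS = 100_000

# Symbol#to_proc form: &:id converts the symbol into a proc that calls
# the id method on each element.
def ids_with_symbol(data)
  data.map(&:id)
end

# Explicit block form.
def ids_with_block(data)
  data.map { |d| d.id }
end

Benchmark.bm(8) do |bm|
  bm.report('&:id')  { RUNS.times { ids_with_symbol(FAKE_DATA) } }
  bm.report('block') { RUNS.times { ids_with_block(FAKE_DATA) } }
end
```

Both forms return the same array, so the choice between them is purely a trade-off between readability and whatever speed difference a given Ruby version shows.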
259 | 260 | There are also garbage collection modifications that can be made in order to further optimize Ruby execution speed for most systems. 261 | 262 | ``` 263 | # This is 60(!) times larger than default 264 | RUBY_HEAP_MIN_SLOTS=600000 265 | 266 | # This is 7 times larger than default 267 | RUBY_GC_MALLOC_LIMIT=59000000 268 | 269 | # This is 24 times larger than default 270 | RUBY_HEAP_FREE_MIN=100000 271 | ``` 272 |
Fig. 8. Garbage Collection Modification
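Before changing any of these settings, it helps to measure how often the collector actually runs for a given workload. The following sketch uses only `GC.count` and `GC.stat` from core Ruby; `gc_runs_during` is a hypothetical helper, the allocation loop is a made-up workload, and the `:heap_live_slots` key is an assumption that holds on recent MRI versions but has been named differently in older ones.

```ruby
# Returns how many garbage collection runs happened while the block ran.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

runs = gc_runs_during do
  # Made-up workload: allocate many short-lived strings.
  200_000.times { "temporary string " * 4 }
end

puts "GC runs during workload: #{runs}"
# GC.stat key names vary between Ruby versions, hence the fallback value.
puts "Live heap slots: #{GC.stat.fetch(:heap_live_slots, 'n/a')}"
```

Running the same script with and without the environment variables set gives a rough indication of whether the larger heap reduces the number of collections.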
273 | 274 | ### 4.6 Use Unicorn 275 | 276 | For Ruby on Rails web applications, a server typically runs as a single process, which means that requests are processed one at a time. This can create a significant bottleneck in your application. Fortunately, there are libraries that add concurrency to your application. One such library is Unicorn. 277 | 278 | Unicorn uses Unix forks within a dyno (Heroku's term for a web worker) to create multiple instances of itself. There are then multiple OS processes that can all respond to requests and complete tasks concurrently. This results in smaller queues, quicker responses, and a faster web application as a whole. The only drawback is memory usage, which can grow large. With decreasing hardware costs, though, this becomes a worthwhile expenditure to ensure quick development time for the software components. This approach also doesn't require thread safe code, since each worker is a self-sufficient clone of the parent. 279 | 280 | Ruby 2.0 makes process forking even more efficient with Unicorn because its garbage collector is Copy-on-Write (CoW) friendly, which means that a parent and child share physical memory until a write needs to be made. This is a very efficient sharing of resources that can drastically reduce memory use. 281 | 282 | Workers can still leak memory, or get stuck and time out. With the inclusion of the unicorn-worker-killer gem, and a small snippet of code that's included below, these edge cases are covered by restarting a worker once it exceeds a request or memory limit.
283 | 284 | ``` 285 | if ENV['RAILS_ENV'] == 'production' 286 | require 'unicorn/worker_killer' 287 | 288 | max_request_min = 500 289 | max_request_max = 600 290 | 291 | # Max requests per worker 292 | use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max 293 | 294 | oom_min = (240) * (1024**2) 295 | oom_max = (260) * (1024**2) 296 | 297 | # Max memory size (RSS) per worker 298 | use Unicorn::WorkerKiller::Oom, oom_min, oom_max 299 | end 300 | 301 | require ::File.expand_path('../config/environment', __FILE__) 302 | run YourApp::Application 303 | ``` 304 |
Fig. 9. Example Unicorn Implementation
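The pre-fork model that Unicorn builds on can be illustrated with plain Ruby. This is only a sketch of the general idea and not Unicorn's implementation: `run_workers` is a hypothetical helper, each "request" is just a line written to a pipe, and it assumes a platform that provides `fork` (Linux or macOS on MRI).

```ruby
# Forks the given number of worker processes. Each worker reports its
# index and process id to the parent through its own pipe.
def run_workers(count)
  readers = Array.new(count) do |index|
    reader, writer = IO.pipe
    fork do
      reader.close
      writer.puts "worker #{index} pid #{Process.pid}"
      writer.close
      exit!(0) # leave immediately, skipping at_exit handlers copied from the parent
    end
    writer.close # the parent keeps only the read end
    reader
  end

  results = readers.map do |reader|
    line = reader.read
    reader.close
    line.chomp
  end
  Process.waitall # reap every child so none is left as a zombie
  results
end

puts run_workers(3)
```

Each child starts as a copy of the parent, which is why no thread safe code is required and why copy-on-write memory sharing matters for the total footprint.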
305 | 306 | ## CONCLUSIONS 307 | 308 | In this article, we examined a number of independent Ruby optimization efforts. Each of these efforts seeks to achieve performance improvements through a variety of techniques. In our examination we’ve determined that each of these techniques involves certain sacrifices that outweigh the marginal benefits gained. Unless a particular feature is needed (such as full threading support or inline Java), the best practice for stable, performant Ruby code is to use the newest versions of the core language. 309 | 310 | ## ACKNOWLEDGMENTS 311 | 312 | The authors would like to thank Douglas Wiegley. He knows what he did. 313 | 314 | ## REFERENCES 315 | 316 | ROBERT O'DONOGHUE. 2014. Careers Close-up: programmers and software engineers. (March 2014). Retrieved March 31, 2014 http://www.siliconrepublic.com/careers/item/36001-crs-cls-up 317 | 318 | REI ODAIRA, JOSE G. CASTANOS, HISANOBU TOMARI. 2014. Eliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory. PPoPP’14, February 15-19 2014, Orlando, FL, USA. DOI: http://dx.doi.org/10.1145/2555243.2555247 319 | 320 | ANTONIO CANGIANO. 2007. The Great Ruby Shootout (December 2007). Retrieved March 31, 2014 http://programmingzen.com/2007/12/03/the-great-ruby-shootout/ 321 | 322 | PAT SHAUGHNESSY. 2014. Ruby Under a Microscope: An Illustrated Guide to Ruby Internals 323 | 324 | BUSSINK DIRKJAN. Rubinius - Tales from the Trenches of Developing a Ruby implementation, Barcelona Ruby Conference, 2012. 325 | 326 | NUTTER CHARLES. Why JRuby?, Aloha Ruby Conf, 2012. 327 | 328 | SASADA KOICHI. YARV: Yet Another RubyVM-The Implementation and Evaluation. Transactions of Information Processing Society of Japan. Volume 47. 2006. Pages 57-73. 329 | 330 | SASADA KOICHI. YARV: yet another RubyVM: innovating the ruby interpreter. OOPSLA '05 Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications.
Pages 158-159. 331 | 332 | SHAUGHNESSY PAT. Visualizing Garbage Collection in Rubinius, JRuby and Ruby 2.0, Ruby Conference, 2013. 333 | 334 | YUKIHIRO MATSUMOTO. 2010. From Lisp to Ruby to Rubinius. 335 | -------------------------------------------------------------------------------- /draft/Ruby Optimization.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johno/ruby_optimization_techniques/27f5528e00a04332f886cef0cc846d54c4bf6111/draft/Ruby Optimization.docx -------------------------------------------------------------------------------- /draft/Ruby Optimization.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johno/ruby_optimization_techniques/27f5528e00a04332f886cef0cc846d54c4bf6111/draft/Ruby Optimization.pdf -------------------------------------------------------------------------------- /draft/conclusion.md: -------------------------------------------------------------------------------- 1 | Conclusion 2 | -------------------------------------------------------------------------------- /draft/introduction.md: -------------------------------------------------------------------------------- 1 | # An Analysis of Ruby Optimization Techniques 2 | 3 | The Ruby programming language has experienced a recent period of intense adoption and growth due to its excellent speed of iteration and due, in no small part, to the acceptance of the Ruby on Rails web framework within the startup sphere. While support is growing steadily for the language, it is largely dismissed as not having effective scalability, or having far slower runtimes than more traditional strongly-typed complex languages. In this article, we propose that many sophisticated techniques exist to enhance Ruby’s performance both in using existing runtimes to compile ruby to statically typed languages, and in using common anti-patterns to improve performance natively. 
Through experimentation and thorough research we conclude that Ruby performs competitively against it’s similar scripting language counterparts, and can see increases of [XXXXX]% in many cases. 4 | 5 | ## Introduction 6 | 7 | In recent years, the Ruby programming language has grown its community and established itself as a valuable and popular tool for many tasks [O’Donoghue, 2014]. The success of Ruby on Rails as a prototyping framework as well as a full-stack solution for some larger companies has brought forth a myriad of techniques to ensure that the language’s speed differences compared to similar languages are minimal. Ruby’s slower performance as compared to C or Java is attributed to interpreted execution, dynamic typing, meta-programming support, and the Global Interpreter Lock [Odaira, Castanos, and Tomari, 2014]. This increase in popularity has caused a large number independent optimization efforts to arise from large corporations such as IBM, as well as efforts from the Ruby open-source community. 8 | 9 | With each of these techniques there exist certain sacrifices, but in this exploration we will conclude that the best practices for stable, performant Ruby code exist by utilizing the newest versions of the core language properly, and not by utilizing other third party interpreters or solutions. 10 | 11 | ## An Overview of the Ruby Language 12 | 13 | Ruby is an object oriented, dynamically-typed, high-level scripting language. It is a programming language that was written for humans and just happens to run on computers. It's intended to promote developer happiness through simplicity, elegant libraries, and terse, readable syntax. Ruby uses duck typing, meaning type is determined through methods and properties. 14 | 15 | > When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck. 16 | 17 | Heim, Michael (2007). Exploring Indiana Highways. Exploring America's Highway. p. 68. ISBN 978-0-9744358-3-1. 
18 | 19 | ### Program Execution at a High Level 20 | 21 | CODE => TOKENIZATION => PARSE TREE => COMPILATION => YARV INSTRUCTIONS 22 | 23 | #### Tokenizing a Ruby program 24 | 25 | When a Ruby program is executed, it first tokenizes the program. This means that the contents are converted into a collection of tokens with associated types. 26 | 27 | #### Parsing the tokens 28 | 29 | Ruby uses the LALR (Look-Ahead Left Reversed Rightmost Derivation) Parser to apply meaning to the tokens and construct the Abstract Syntax Tree. 30 | 31 | #### The compilation step 32 | 33 | The compilation step was introduced with Ruby 1.9, and is where the YARV (Yet Another Ruby Virtual Machine) comes into play. It translates the code into _bytecode_, or YARV instructions. 34 | 35 | #### YARV instructions for a simple program 36 | 37 | ``` 38 | ~|||$ irb 39 | 2.1.1 :001 > code = < puts 1 + 2 41 | 2.1.1 :003"> CODE 42 | => "puts 1 + 2\n" 43 | 2.1.1 :004 > puts RubyVM::InstructionSequence.compile(code).disasm 44 | == disasm: @>========== 45 | 0000 trace 1 ( 1) 46 | 0002 putself 47 | 0003 putobject_OP_INT2FIX_O_1_C_ 48 | 0004 putobject 2 49 | 0006 opt_plus 50 | 0008 opt_send_simple 51 | 0010 leave 52 | => nil 53 | ``` 54 | 55 | ### A significant performance boost 56 | 57 | The introduction of the compilation step and YARV have significantly helped the execution speed of Ruby programs. However, there's always room for more improvements. 58 | -------------------------------------------------------------------------------- /draft/mri.md: -------------------------------------------------------------------------------- 1 | # What is the MRI/YARV 2 | 3 | The Ruby MRI is short for Matz's Ruby Interpreter, and is the reference implementation for the Ruby programming language. It was released to the public in 1995, and is still actively developed, with the latest stable build being Ruby 2.1.1. 4 | 5 | The YARV is an interpreter developed by Koichi Sasada that's also known as the KRI. 
It was developed in order to reduce the execution time of Ruby programs, and was very successful. As a result, YARV was merged into Ruby 1.9.0 and has replaced the MRI. 6 | 7 | As the default interpreter for the Ruby programming language, the MRI has received it's fair share of criticism, primarily due to it's execution speeds and memory consumption. However, recent Ruby versions have seen significant enhancements, and is on par with similar scripting languages like Python. Not to mention the fact that some comparisons have the audacity to compare a compiled language to a scripting language, which is apples to oranges. Developers typically choose Ruby for it's ease of writing/prototyping, understanding the fact that its execution time will _always_ be significantly slower than its compiled counterparts. 8 | 9 | That being said, there are numerous methods and best practices that developers can follow in order to ensure that they're avoiding unnecessary bottlenecks. 10 | 11 | ## Performance out of the box 12 | 13 | Thanks to the introduction of YARV, vanilla Ruby, on a single thread, has the ability to outperform other alternative Ruby implementations. Consider the following figure, that measures Rails requests per second. 14 | 15 | ![screen shot 2014-03-31 at 5 06 28 pm](https://cloud.githubusercontent.com/assets/1424573/2574040/8c8da974-b929-11e3-84c8-04d792bcbbd9.png) 16 | http://www.isrubyfastyet.com/ 17 | 18 | ## Global Interpreter Lock 19 | 20 | When attempting to optimize execution speed, threads are often utilized in order to process tasks concurrently. This is a feature that Ruby supports, too. However, the MRI/YARV incorporates a Global Interpreter Lock, or GIL, that doesn't permit any true concurrency. A GIL refers to an interpreter thread that doesn't allow code that isn't thread safe to share itself with other threads. This results in little, to no, actual gain in speed when running threads on a multiprocessor machine. 
21 | 22 | ![screen shot 2014-03-31 at 5 15 19 pm](https://cloud.githubusercontent.com/assets/1424573/2574079/5e0967fe-b92a-11e3-9806-65ea3d4d04cf.png) 23 | http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/ 24 | 25 | ### Why implement a GIL? 26 | 27 | The primary reason is that the GIL is used to avoid race conditions within C extensions. There are also thread safety reasons, too. Parts of Ruby aren't thread safe (Hash), and numerous C libraries that are wrapped by Ruby's internals. Additionally, the GIL is integral to data integrity, because it ensures that the developer doesn't write any unsafe threading code. 28 | 29 | This, interestingly enough, runs contrary to the fundamental principles of the Ruby language, where all the responsibility is laid on the developer. Ruby allows the developer to have the ultimate freedom without hand holding, yet the GIL is just that, hand holding. 30 | 31 | ### The GIL is here to stay 32 | 33 | The GIL is deeply intertwined with Ruby and its internals, and many influential Ruby-core figures don't plan on removing the GIL anytime in the near future. Though, this doesn't mean the concurrency can't be achieved. 34 | 35 | ### Sidestep the GIL with multiple virtual machines 36 | 37 | Sasada Koichi has proposed a Multiple VM (MVM) solution, which is currently being developed. This would consist of multiple virtual machines, running their own processes, and communicate via sockets. 38 | 39 | Granted, this is a drastic step away from typical threading, but some proponents believe that traditional threading isn't necessarily the correct paradigm to follow. Especially considering the fact the Ruby leverages green threads above the GIL rather than talking to the OS directly. 40 | 41 | > Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. 
Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads. 42 | 43 | ## Some simple code enhancements 44 | 45 | ### String Optimization 46 | 47 | String interpolation is significantly more performant than concatentation because it doesn't need to allocate new strings, it just modifies a single string in place. 48 | 49 | ```ruby 50 | require 'benchmark' 51 | 52 | concat_time = Benchmark.measure do 53 | 20000000.times do 54 | str = 'str1' << 'str2' << 'str3' 55 | end 56 | end 57 | 58 | # => # 59 | 60 | interp_time = Benchmark.measure do 61 | 20000000.times do 62 | str = "#{ 'str1' }#{ 'str2' }#{ 'str3' }" 63 | end 64 | end 65 | 66 | # => # 67 | ``` 68 | 69 | ### Blocks vs Procs 70 | 71 | The `collect|map` methods with blocks are faster because it returns a new array rather than an enumerator. This can be leveraged to increase speed when compared to `Symbol.to_proc` implementations. Though, the latter is typically much more preferable to read. 
The reason that the `Symbol.to_proc` is slower is because `to_proc` is called on the symbol to perform the following conversion: 72 | 73 | ```ruby 74 | :method.to_proc 75 | # => -> x { x.method } 76 | ``` 77 | 78 | ```ruby 79 | fake_data = 20.times.map { |t| Fake.new(t) } 80 | 81 | proc_time = Benchmark.measure do 82 | 200000000.times do 83 | fake_data.map(&:id) 84 | end 85 | end 86 | 87 | # => # 88 | 89 | block_time = Benchmark.measure do 90 | 200000000.times do 91 | fake_data.map { |d| d.id } 92 | end 93 | end 94 | 95 | # => # 96 | 97 | collect_time = Benchmark.measure do 98 | 200000000.times do 99 | fake_data.collect { |d| d.id } 100 | end 101 | end 102 | 103 | # => # 104 | :037 > 105 | ``` 106 | 107 | ### Modify Garbage Collection 108 | 109 | ```ruby 110 | RUBY_HEAP_MIN_SLOTS=600000 # This is 60(!) times larger than default 111 | RUBY_GC_MALLOC_LIMIT=59000000 # This is 7 times larger than default 112 | RUBY_HEAP_FREE_MIN=100000 # This is 24 times larger than default 113 | ``` 114 | 115 | ### Use Unicorn 116 | 117 | For Ruby on Rails web applications, a server typically runs on a single process, which means that every request is processed one at a time. This can create a significant bottle neck in your application. Fortunately, there are libraries to incorporate concurrency in your application. One of which is Unicorn. 118 | 119 | Unicorn uses Unix forks within a dyno (web worker) to create multiple instances of itself. Now, there are multiple OS instances that can all respond to requests, and complete tasks concurrently. This results in smaller queues, quicker responses, and a faster web application as a whole. The only drawback is memory usage, which can grow to large sizes. Though, with decreasing hardware costs, this becomes a worthwhile expenditure to ensure quick development time for the software components. This also doesn't require thread safe code, since each worker is a self-sufficient clone of the parent. 
120 | 121 | Ruby 2.0 makes process forking even more efficient with Unicorn because it implements Copy-on-Write (CoW), which means that a parent and child share physical memory until a write needs to be made. This is a very efficient sharing of resources that can drastically reduce memory use. 122 | 123 | Sometimes, there are still issues with memory leakage, which occurs when workers get stuck or timeout. With the inclusion of a gem, and a small snippet of code that's included below, these edge cases are covered. 124 | 125 | ```ruby 126 | # --- Start of unicorn worker killer code --- 127 | 128 | if ENV['RAILS_ENV'] == 'production' 129 | require 'unicorn/worker_killer' 130 | 131 | max_request_min = 500 132 | max_request_max = 600 133 | 134 | # Max requests per worker 135 | use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max 136 | 137 | oom_min = (240) * (1024**2) 138 | oom_max = (260) * (1024**2) 139 | 140 | # Max memory size (RSS) per worker 141 | use Unicorn::WorkerKiller::Oom, oom_min, oom_max 142 | end 143 | 144 | # --- End of unicorn worker killer code --- 145 | 146 | require ::File.expand_path('../config/environment', __FILE__) 147 | run YourApp::Application 148 | ``` 149 | 150 | ###### GIL 151 | 152 | - https://mail.python.org/pipermail/python-3000/2007-May/007414.html 153 | - http://archive.is/yCFB 154 | - http://archive.is/X1kh 155 | - http://www.confreaks.com/videos/1272-rubyconf2012-implementation-details-of-ruby-2-0-vm 156 | - https://news.ycombinator.com/item?id=3070382 157 | - http://merbist.com/2011/10/18/data-safety-and-gil-removal/ 158 | - http://merbist.com/2011/10/03/about-concurrency-and-the-gil/ 159 | 160 | ###### GC 161 | 162 | - http://www.rubyenterpriseedition.com/documentation.html 163 | - https://lightyearsoftware.com/2012/11/speed-up-mri-ruby-1-9/ 164 | 165 | ###### Unicorn 166 | 167 | - https://www.digitalocean.com/community/articles/how-to-optimize-unicorn-workers-in-a-ruby-on-rails-app 168 | 169 | ###### Use Ruby 
Threads and Fibers 170 | 171 | - http://merbist.com/2011/02/22/concurrency-in-ruby-explained/ 172 | 173 | ###### Fine Tune Your Objects 174 | 175 | - http://patshaughnessy.net/2013/2/8/ruby-mri-source-code-idioms-3-embedded-objects 176 | 177 | ###### Code optimizations 178 | 179 | - http://www.ruby-doc.org/core-2.1.1/Array.html#M000249 180 | -------------------------------------------------------------------------------- /draft/rubinius.md: -------------------------------------------------------------------------------- 1 | ## What is Rubinius? 2 | Rubinius is an implementation of the Ruby programming language and includes a bytecode virtual machine, Ruby syntax parser, bytecode compiler, generational garbage collector, just-in-time (JIT) native machine code compliler, and Ruby Core and Standard Libraries. [1](http://rubini.us/doc) Rubinius is written using Ruby and C++. 3 | 4 | ## History 5 | Rubinius was originally created to be a Ruby virtual machine and runtime written in pure ruby. The current ruby interpreter is primarily writen in non-Ruby langauges such as C. From 2007 to 2013, the software company Engine Yard was a primary backer of Rubinius. During that time the focus of Rubinius evolved from creating a completely bootstrapped Ruby VM to instead offering an implementation of Ruby with increased performance. Under this new direction, Rubinius partially abandoned the idea of bootstrapping the Ruby VM in all Ruby code, and instead sought to use C++ to increase performance and establish Rubinius as the fastest Ruby implementation. Sadly this effort was not successful, as is shown below. [2](http://programmingzen.com/2010/07/19/the-great-ruby-shootout-july-2010/) 6 | 7 | https://www.dropbox.com/s/2og3qad0d05wryo/Screenshot%202014-03-31%2017.28.28.png 8 | 9 | The one area that Rubinius has shined is in it's support for concurrency and multi-threading, which has become the focus of the Rubinius project. 
10 | 11 | https://www.dropbox.com/s/ahdzxveyhcubg89/Screenshot%202013-11-06%2013.12.50.png?m= 12 | https://www.dropbox.com/s/dx5s3zbntfsis04/Screenshot%202014-03-31%2017.36.58.png 13 | 14 | ## How does Rubinius Work? 15 | 16 | -------------------------------------------------------------------------------- /final/final.md: -------------------------------------------------------------------------------- 1 | # Ruby Language Optimization Techniques 2 | 3 | NICHOLAS BENDER, Boise State University 4 | BEN NEELY, Boise State University 5 | JOHN OTANDER, Boise State University 6 | 7 | The Ruby programming language has experienced a recent period of intense adoption and growth due to its excellent speed of iteration, elegant syntax, and passionate community. Additionally, the popular web framework, Ruby on Rails, has given the Ruby language exceptional legitimacy, especially in the prototyping and startup space. It is a tool that emphasizes developer happiness and productivity, and places responsibility for the program in the developer's hands. This gives the language a lot of power, but can serve as a double-edged sword. When leveraged incorrectly, projects can swiftly become inefficient and unmaintainable. Additionally, allowing this flexibility has serious implications for memory management, efficiency, and execution times. 8 | 9 | While support is growing steadily for the language, it is largely dismissed as not scaling effectively and as having far slower runtimes than compiled, strongly-typed languages. In this article, we propose that many sophisticated techniques exist to enhance Ruby’s performance, both by using existing runtimes to compile Ruby to statically typed languages and by avoiding common anti-patterns to improve performance natively. Through experimentation and thorough research we conclude that Ruby performs competitively against its similar scripting language counterparts, and can see large performance increases in many cases. 
10 | 11 | __Categories and Subject Descriptors:__ D.2.3 [Coding Tools and Techniques]: Object-oriented programming, B.6.3 [Design Aids]: Optimization 12 | __General Terms:__ Optimization, Algorithms, Performance 13 | __Additional Key Words and Phrases:__ Ruby, Web Development, JRE, C++, C 14 | 15 | ## INTRODUCTION 16 | 17 | Ruby is an object-oriented, dynamically-typed, high-level scripting language. It is a programming language that was written for humans and just happens to run on computers. It's intended to promote developer happiness through simplicity, elegant libraries, and terse, readable syntax. Ruby also uses duck typing, meaning that an object's type is determined by its methods and properties. Each of these techniques and language features comes with certain sacrifices. In this exploration we conclude that stable, performant Ruby programs are best achieved by properly using the newest versions of the core language. 18 | 19 | In recent years, the Ruby programming language has grown its community and established itself as a valuable, popular tool for many tasks [O’Donoghue, 2014]. The success of Ruby on Rails as a prototyping framework, as well as a full-stack solution for some larger companies, has brought forth a myriad of techniques to ensure that the language’s speed differences compared to similar languages are minimal. Ruby’s slower performance, as compared to C or Java, is attributed to interpreted execution, dynamic typing, meta-programming support, and the Global Interpreter Lock [Odaira, Castanos, and Tomari, 2014]. This increase in popularity has caused a large number of independent optimization efforts to arise from large corporations such as IBM and AT&T, as well as from the Ruby open-source community. 20 | 21 | > When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
_- Heim, Michael (2007)._ 22 | 23 | ## 1. MRI (< 1.9) 24 | 25 | MRI is short for Matz's Ruby Interpreter, which is sometimes also referred to as CRuby. The MRI is named after Yukihiro Matsumoto, the chief designer of the Ruby language. The original MRI was the runtime environment from Ruby's inception to 1.8.7. 26 | 27 | ### 1.1 Program Execution at a High Level 28 | 29 | ``` 30 | | --------- | 31 | | Ruby | 32 | | --------- | 33 | | Tokens | 34 | | --------- | 35 | | AST Nodes | 36 | | --------- | 37 | | 38 | | Interpret 39 | | 40 | v 41 | | --------- | 42 | | C | 43 | | --------- | 44 | | Machine | 45 | | Language | 46 | | --------- | 47 | ``` 48 |
Fig. 1. Ruby Program Execution
49 | 50 | A Ruby script is first tokenized, and the tokens are then parsed into an Abstract Syntax Tree (AST). The MRI's C code reads and executes the AST directly. Note that there is no compilation or translation step. 51 | 52 | ### 1.2 Performance 53 | 54 | Since there isn't a bytecode compilation step, the execution of Ruby programs requires walking the MRI's internal Abstract Syntax Tree. This slows execution significantly because interpreting the AST data structure at runtime is costly. 55 | 56 | ![ch_abstract_syntree](https://cloud.githubusercontent.com/assets/1424573/2803533/3c359946-cc9d-11e3-9b35-217ccda504df.png) 57 |
Fig. 2. Abstract Syntax Tree
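
To make the tree in Fig. 2 more concrete, the standard library's `Ripper` module can print the syntax tree that the parser builds for an expression. This is only a minimal sketch: `Ripper` ships with Ruby 1.9 and later, so it illustrates the shape of the data structure rather than the pre-1.9 interpreter itself, and the variable names are purely illustrative.

```ruby
require 'ripper'

# Parse a tiny expression into a nested-array syntax tree (S-expression).
source = '1 + 2'
tree = Ripper.sexp(source)

# The root node is :program; its first child is the list of statements.
root_type, statements = tree
first_statement = statements.first

# A binary operation node has the shape [:binary, left, operator, right].
node_type, left, operator, right = first_statement

puts "root: #{root_type}"
puts "node: #{node_type}"
puts "expression: #{left[1]} #{operator} #{right[1]}"
```

An AST-walking interpreter has to visit nodes like these every time the code runs, which is the cost described in section 1.2.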
58 | 59 | ### 1.3 Optimizations 60 | 61 | Use receiver-modifying methods (such as `chomp!`) whenever possible, because they avoid allocating a copy of the string. 62 | 63 | ``` 64 | 2.1.1 :003 > str = "A string.\n" 65 | => "A string.\n" 66 | 2.1.1 :004 > str2 = str 67 | => "A string.\n" 68 | 2.1.1 :005 > str.chomp! 69 | => "A string." 70 | 2.1.1 :006 > str2 71 | => "A string." 72 | 2.1.1 :007 > 73 | ``` 74 |
Fig. 3. Receiver modifying methods vs receiver duplicating methods
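
The difference between the two kinds of methods can be checked with object identity. The following is a small sketch with illustrative variable names; it uses `dup` so that the example also works when string literals are frozen.

```ruby
# chomp returns a NEW string and leaves the receiver untouched.
original = "A string.\n".dup
copied = original.chomp
copy_is_new_object = !copied.equal?(original)

# chomp! edits the receiver in place and returns that same object
# (or nil when there was nothing to remove).
in_place = "A string.\n".dup
alias_ref = in_place            # second reference to the same object
returned = in_place.chomp!
same_object = returned.equal?(in_place)

puts copy_is_new_object
puts same_object
puts alias_ref.inspect          # the alias sees the in-place change
```

Because `chomp!` returns `nil` when nothing changes, it is usually safer not to chain further calls onto its return value.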
75 | 76 | ### 1.4 Summary 77 | 78 | The initial implementation of the MRI is one of the primary reasons that Ruby gets its "bad rap" for code execution speed. 79 | 80 | ## 2. JRUBY 81 | 82 | ### 2.1 Purpose 83 | 84 | JRuby endeavors to solve many Ruby performance issues by eliminating the standard interpreter and instead taking Ruby syntax and compiling as much of the core libraries as possible to Java bytecode. Current versions of JRuby support both just-in-time compilation and ahead-of-time compilation to Java bytecode. Using these various stages of bytecode, in addition to some portions of the standard interpreter, allows for several advantages over the standard interpreter. 85 | 86 | One of the more obvious improvements is the ability to call and use standard Java libraries and classes from within Ruby projects. For larger organizations already using Java for core library support, this allows for improved flexibility of the development environment. 87 | 88 | ### 2.2 Performance 89 | 90 | In 2007, JRuby’s overall performance was compared with Ruby 1.8.5, the YARV interpreter (now merged into Ruby’s official interpreter), and Rubinius. In that comparison, JRuby outperformed standard Ruby in only 10% of the tests performed. JRuby, however, still managed to run all Ruby benchmarks without timing out or producing an error, a claim that no other non-standard Ruby implementation could make. 91 | 92 | In benchmarks performed in 2014, the latest implementation of JRuby was comparable to standard Ruby. While some benchmarks showed an improved runtime, the increased memory overhead of JRuby (>10x) makes scaling Ruby applications problematic. 93 | 94 | In addition to JRuby's memory woes, the biggest performance downside of JRuby comes from the time needed to initialize the JVM to begin with. A simple Ruby script that would take the MRI a fraction of a second to run would require several additional seconds just due to JVM launch times. 
95 | 96 | ### 2.3 Lack of C Support 97 | 98 | While JRuby allows for enhanced support and compatibility with Java libraries and applets, the majority of Ruby users (especially those using Ruby on Rails) are used to libraries that contain native C support. In choosing to support Java, JRuby gives up compatibility with native C extensions. Most notable among these are a variety of database interfaces and web servers. 99 | 100 | ### 2.4 Development Lag 101 | 102 | Because JRuby can only implement and support a Ruby release after that release exists, there is an unfortunately long lag time: the most recent release of JRuby only supports Ruby version 1.9.3, which was initially released in 2011. 103 | 104 | ### 2.5 Summary 105 | 106 | While JRuby does offer some improved benchmark performance in a minority of cases, the slow development cycle and potential for a massive increase in memory footprint make it an unsuitable option for pure Ruby development stacks. 107 | 108 | ## 3. Rubinius 109 | 110 | ### 3.1 Purpose 111 | 112 | Rubinius is an implementation of the Ruby programming language and includes a bytecode virtual machine, Ruby syntax parser, bytecode compiler, generational garbage collector, just-in-time (JIT) native machine code compiler, and Ruby Core and Standard Libraries. Rubinius is written using Ruby and C++. 113 | 114 | ### 3.2 History 115 | 116 | Rubinius was originally created to be a Ruby virtual machine and runtime written in pure Ruby. The current Ruby interpreter is primarily written in non-Ruby languages such as C. From 2007 to 2013, the software company Engine Yard was a primary backer of Rubinius. During that time the focus of Rubinius evolved from creating a completely bootstrapped Ruby VM to instead offering an implementation of Ruby with increased performance. 
Under this new direction, Rubinius partially abandoned the idea of bootstrapping the Ruby VM in all Ruby code, and instead sought to use C++ to increase performance and establish Rubinius as the fastest Ruby implementation. Recently Rubinius has focused on supporting concurrency and multi-threading. 117 | 118 | ### 3.3 Performance 119 | 120 | Rubinius initially achieved performance equal to or slightly better than that of the YARV interpreter. However, in recent years the MRI has consistently outperformed Rubinius on most benchmark tests. 121 | 122 | Rubinius consistently benchmarks as one of the slowest modern implementations of the Ruby language. 123 | 124 | ### 3.4 Concurrency 125 | 126 | Rubinius does outperform the MRI in threading and concurrency benchmark tests. As shown in the figure below, Rubinius (represented by rbx-2.0.0) has a nontrivial advantage over MRI and other Ruby implementations when executing multithreaded code. 127 | Unlike the MRI, Rubinius does not have a Global Interpreter Lock (GIL). In the MRI, the GIL allows only one thread to execute at a time, no matter how many processor cores are available. Not implementing the GIL gives Rubinius the ability to support true threading. 128 | 129 | ### 3.5 Summary 130 | 131 | Rubinius’ development has been spotty, depending heavily on a few developers and a few corporate sponsors. As a result Rubinius has constantly shifted focus. Rubinius currently offers a significant advantage over other Ruby interpreters only with regard to programming involving threading and concurrency. For all other uses, the standard MRI Ruby interpreter is faster and more consistently supported. 132 | 133 | ## 4. YARV 134 | 135 | ### 4.1 Background 136 | 137 | ``` 138 | CODE => TOKENIZATION => PARSE TREE => COMPILATION => YARV INSTRUCTIONS 139 | ``` 140 | 141 | When a Ruby program is executed, the interpreter first tokenizes the source. 
This means that the contents are converted into a collection of tokens with associated types. Ruby uses an LALR (Look-Ahead LR: left-to-right scan, rightmost derivation in reverse) parser to apply meaning to the tokens and construct the Abstract Syntax Tree. The compilation step was introduced with Ruby 1.9, and is where YARV (Yet Another Ruby Virtual Machine) comes into play. It translates the code into bytecode, or YARV instructions. 142 | 143 | ``` 144 | ~|||$ irb 145 | 2.1.1 :001 > code = <<CODE 146 | 2.1.1 :002"> puts 1 + 2 147 | 2.1.1 :003"> CODE 148 | => "puts 1 + 2\n" 149 | 2.1.1 :004 > puts RubyVM::InstructionSequence.compile(code).disasm 150 | == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>========== 151 | 0000 trace 1 ( 1) 152 | 0002 putself 153 | 0003 putobject_OP_INT2FIX_O_1_C_ 154 | 0004 putobject 2 155 | 0006 opt_plus 156 | 0008 opt_send_simple 157 | 0010 leave 158 | => nil 159 | ``` 160 |
Fig. 4. YARV instructions for a simple program
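
The same pipeline can be reproduced in a script rather than in irb. The sketch below is only an illustration: it assumes an MRI interpreter, since `RubyVM::InstructionSequence` is MRI-specific, and it uses the standard library's `Ripper` to show the tokenization step. The exact instruction names in the disassembly vary between Ruby versions.

```ruby
require 'ripper'

source = "puts 1 + 2\n"

# Step 1: tokenization. Ripper.lex returns one entry per token;
# each entry holds a position, a token type, and the token text.
tokens = Ripper.lex(source).map { |_position, type, text| [type, text] }
tokens.each { |type, text| puts format('%-12s %p', type, text) }

# Steps 2 and 3: parsing and compilation to YARV instructions.
iseq = RubyVM::InstructionSequence.compile(source)
puts iseq.disasm
```

Comparing the token list with the disassembly shows how much structure the parser and compiler add before any instruction is executed.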
161 | 162 | The introduction of the compilation step and YARV has significantly improved the execution speed of Ruby programs. However, there is always room for further improvement. 163 | 164 | ### 4.2 Purpose 165 | 
 166 | The Ruby MRI is short for Matz's Ruby Interpreter, and is the reference implementation for the Ruby programming language. It was released to the public in 1995, and is still actively developed, with the latest stable build being Ruby 2.1.1. 167 | 168 | YARV is an interpreter developed by Koichi Sasada that's also known as the KRI. It was developed in order to reduce the execution time of Ruby programs, and was very successful. As a result, YARV was merged into Ruby 1.9.0 and has replaced the original MRI. 169 | 170 | As the default interpreter for the Ruby programming language, the MRI has received its fair share of criticism, primarily due to its execution speed and memory consumption. However, recent Ruby versions have seen significant enhancements, and are on par with similar scripting languages like Python. Moreover, some comparisons pit a compiled language against a scripting language, which is comparing apples to oranges. Developers typically choose Ruby for its ease of writing and prototyping, understanding that its execution time will always be significantly slower than that of its compiled counterparts. 171 | 172 | That being said, there are numerous methods and best practices that developers can follow in order to ensure that they're avoiding unnecessary bottlenecks. 173 | 174 | ### 4.3 Performance Out of the Box 175 | 176 | Thanks to the introduction of YARV, vanilla Ruby, on a single thread, has the ability to outperform alternative Ruby implementations. Consider the following figure, which measures Rails requests per second. 177 | 178 | ![screen shot 2014-05-02 at 6 22 04 pm](https://cloud.githubusercontent.com/assets/1424573/2869079/0305431c-d259-11e3-8b58-6f2ea6ff23e9.png) 179 |
Fig. 5. Rails requests per second.
180 | 181 | ### 4.4 Global Interpreter Lock 182 | 183 | When attempting to optimize execution speed, threads are often utilized in order to process tasks concurrently. This is a feature that Ruby supports, too. However, the MRI/YARV incorporates a Global Interpreter Lock, or GIL, that prevents threads from running Ruby code in parallel. A GIL is a lock held by the interpreter that allows only one thread to execute at a time, so that code that isn't thread safe is never run by several threads at once. This results in little to no actual gain in speed when running threads on a multiprocessor machine. 184 | 185 | The primary reason for the GIL is to avoid race conditions within C extensions. There are thread safety reasons as well: parts of Ruby (such as Hash) aren't thread safe, and neither are numerous C libraries that are wrapped by Ruby's internals. Additionally, the GIL helps preserve data integrity, because it limits the damage that unsafe threading code can do. 186 | 187 | This, interestingly enough, runs contrary to the fundamental principles of the Ruby language, where all the responsibility is laid on the developer. Ruby allows the developer to have the ultimate freedom without hand holding, yet the GIL is just that, hand holding. 188 | 189 | The GIL isn’t going anywhere. It is deeply intertwined with Ruby and its internals, and many influential Ruby-core figures don't plan on removing the GIL anytime in the near future. This doesn't mean that concurrency can't be achieved, however. 190 | 191 | One way to sidestep the GIL is to use multiple virtual machines. Koichi Sasada has proposed a Multiple VM (MVM) solution, which is currently being developed. This would consist of multiple virtual machines, each running in its own process and communicating via sockets. 192 | 193 | Granted, this is a drastic step away from typical threading, but some proponents believe that traditional threading isn't necessarily the correct paradigm to follow. 
Especially considering that Ruby leverages green threads above the GIL rather than talking to the OS directly. 194 | 195 | > Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads. 196 | 197 | ### 4.5 Simple Code Enhancements 198 | 199 | String interpolation is significantly more performant than concatenation because it builds the result in one step rather than creating and then joining separate strings. 200 | 201 | ```ruby 202 | require 'benchmark' 203 | 204 | concat_time = Benchmark.measure do 205 | 20000000.times do 206 | str = 'str1' << 'str2' << 'str3' 207 | end 208 | end 209 | 210 | # => # 211 | 212 | interp_time = Benchmark.measure do 213 | 20000000.times do 214 | str = "#{ 'str1' }#{ 'str2' }#{ 'str3' }" 215 | end 216 | end 217 | 218 | # => # 219 | ``` 220 |
Fig. 6. Interpolation vs concatenation of Ruby Strings
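
Since the timings in Fig. 6 depend on the Ruby version and the machine, it can help to rerun the comparison locally. The following is a hedged sketch rather than the authors' benchmark: the iteration count is deliberately small, the variable names are illustrative, and it adds the `+` operator as a third variant.

```ruby
require 'benchmark'

iterations = 200_000 # kept small so the sketch finishes quickly
a = 'str1'
b = 'str2'
c = 'str3'

results = {
  # + allocates an intermediate string for every step.
  plus: Benchmark.realtime { iterations.times { a + b + c } },
  # << appends to a single buffer; dup keeps the inputs unchanged.
  append: Benchmark.realtime { iterations.times { a.dup << b << c } },
  # Interpolation builds the result string in one expression.
  interpolation: Benchmark.realtime { iterations.times { "#{a}#{b}#{c}" } }
}

results.each { |name, seconds| puts format('%-14s %.4fs', name, seconds) }
```

All three variants produce the same string, so any difference in the printed timings comes from how the result is assembled.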
221 | 222 | The `collect` and `map` methods are faster when called with a block than when called with a `Symbol#to_proc` argument. With a block, they return a new array rather than an enumerator. The `Symbol#to_proc` form is typically more pleasant to read, but it is slower because `to_proc` is called on the symbol to perform the following conversion: 223 | 224 | ``` 225 | :method.to_proc 226 | # => -> x { x.method } 227 | fake_data = 20.times.map { |t| Fake.new(t) } # Fake: assumed to be any class exposing #id 228 | 229 | proc_time = Benchmark.measure do 230 | 200000000.times do 231 | fake_data.map(&:id) 232 | end 233 | end 234 | 235 | # => # 236 | ``` 237 | 238 | ``` 239 | block_time = Benchmark.measure do 240 | 200000000.times do 241 | fake_data.map { |d| d.id } 242 | end 243 | end 244 | 245 | # => # 246 | ``` 247 | 248 | ``` 249 | collect_time = Benchmark.measure do 250 | 200000000.times do 251 | fake_data.collect { |d| d.id } 252 | end 253 | end 254 | 255 | # => # 256 | 257 | ``` 258 |
Fig. 7. Procs vs Blocks vs Collects
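
The three forms in Fig. 7 are interchangeable in what they return, which the short sketch below checks. `fake_class` is a hypothetical stand-in (a `Struct` with an `id` field) for the `Fake` class used in the figure, which is not defined in this article.

```ruby
# Hypothetical stand-in for the Fake class used in the benchmark.
fake_class = Struct.new(:id)
fake_data = 20.times.map { |t| fake_class.new(t) }

# Symbol#to_proc builds a proc that sends the named message to its argument.
id_proc = :id.to_proc

via_symbol  = fake_data.map(&:id)
via_block   = fake_data.map { |d| d.id }
via_collect = fake_data.collect { |d| d.id } # collect is an alias of map

puts id_proc.call(fake_data.first)
puts via_symbol == via_block && via_block == via_collect
```

Because the results are identical, the choice between the forms comes down to readability versus whatever speed difference a benchmark shows on the Ruby version in use.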
259 | 260 | There are also garbage collection modifications that can be made in order to further optimize Ruby execution speed for most systems. 261 | 262 | ``` 263 | # This is 60(!) times larger than default 264 | RUBY_HEAP_MIN_SLOTS=600000 265 | 266 | # This is 7 times larger than default 267 | RUBY_GC_MALLOC_LIMIT=59000000 268 | 269 | # This is 24 times larger than default 270 | RUBY_HEAP_FREE_MIN=100000 271 | ``` 272 |
Fig. 8. Garbage Collection Modification
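
The settings in Fig. 8 are environment variables that the interpreter reads at startup, so they are set in the shell or the server configuration rather than in Ruby code. Their names differ between Ruby versions (Ruby 2.1, for example, renamed `RUBY_HEAP_MIN_SLOTS` to `RUBY_GC_HEAP_INIT_SLOTS`), so the documentation for the interpreter in use should be checked. To judge whether a change helps, the number of collections can be measured from inside the program; the sketch below uses only `GC.count`, `GC.start`, and `GC.stat`, with an arbitrary allocation-heavy loop as a placeholder workload.

```ruby
# Count the collections triggered by an allocation-heavy loop.
gc_runs_before = GC.count
100_000.times { 'x' * 100 }
gc_runs_during = GC.count - gc_runs_before

# Force one collection and read the interpreter's GC statistics.
GC.start
count_after_forced = GC.count
stat = GC.stat

puts "GC runs during loop: #{gc_runs_during}"
puts "total GC runs:       #{stat[:count]}"
```

Running the same script with and without the tuned environment variables gives a rough before-and-after comparison of GC activity.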
273 | 274 | ### 4.6 Use Unicorn 275 | 276 | For Ruby on Rails web applications, a server typically runs as a single process, which means that requests are processed one at a time. This can create a significant bottleneck in your application. Fortunately, there are libraries that add concurrency to your application. One of them is Unicorn. 277 | 278 | Unicorn uses Unix forks within a dyno (web worker) to create multiple instances of itself. There are then multiple OS processes that can all respond to requests and complete tasks concurrently. This results in smaller queues, quicker responses, and a faster web application as a whole. The only drawback is memory usage, which can grow large. With decreasing hardware costs, however, this becomes a worthwhile expenditure to ensure quick development time for the software components. This approach also doesn't require thread safe code, since each worker is a self-sufficient clone of the parent. 279 | 280 | Ruby 2.0 makes process forking even more efficient with Unicorn because its garbage collector is Copy-on-Write (CoW) friendly, which means that a parent and child share physical memory until a write needs to be made. This is a very efficient sharing of resources that can drastically reduce memory use. 281 | 282 | Sometimes, there are still issues with memory leakage, which occurs when workers get stuck or time out. With the inclusion of the `unicorn-worker-killer` gem and the small snippet of code included below, these edge cases are covered. 
283 | 284 | ``` 285 | if ENV['RAILS_ENV'] == 'production' 286 | require 'unicorn/worker_killer' 287 | 288 | max_request_min = 500 289 | max_request_max = 600 290 | 291 | # Max requests per worker 292 | use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max 293 | 294 | oom_min = (240) * (1024**2) 295 | oom_max = (260) * (1024**2) 296 | 297 | # Max memory size (RSS) per worker 298 | use Unicorn::WorkerKiller::Oom, oom_min, oom_max 299 | end 300 | 301 | require ::File.expand_path('../config/environment', __FILE__) 302 | run YourApp::Application 303 | ``` 304 |
Fig. 9. Example Unicorn Implementation
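
The pre-fork model that Unicorn relies on can be illustrated without Unicorn itself. The sketch below is not Unicorn's implementation: it assumes a Unix-like system where `fork` is available, and the "request" is just a line written to a pipe, whereas a real server would have its workers accept connections on a shared listening socket.

```ruby
worker_count = 2
reader, writer = IO.pipe

# The parent forks one child process per worker.
pids = worker_count.times.map do |index|
  fork do
    reader.close
    # Each worker is a separate OS process with its own pid.
    writer.puts "worker #{index} handled a request in pid #{Process.pid}"
    writer.close
    exit!(0) # leave the child without running the parent's exit hooks
  end
end

# The parent waits for its workers and collects their replies.
writer.close
pids.each { |pid| Process.wait(pid) }
replies = reader.read.lines.map(&:chomp)
reader.close

puts replies
```

Each worker runs with its own copy of the interpreter state, which is why this model does not require thread safe code.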
305 | 306 | ## CONCLUSIONS 307 | 308 | In this article, we examined a number of independent Ruby optimization efforts. Each of these efforts seeks to achieve performance improvements through a variety of techniques. In our examination, we determined that each of these techniques involves certain sacrifices that outweigh the marginal benefits gained. Unless a particular feature is needed (such as full threading support or inline Java), stable, performant Ruby code is best achieved by using the newest versions of the core language. 309 | 310 | ## ACKNOWLEDGMENTS 311 | 312 | The authors would like to thank Douglas Wiegley. He knows what he did. 313 | 314 | ## REFERENCES 315 | 316 | ROBERT O'DONOGHUE. 2014. Careers Close-up: programmers and software engineers. (March 2014). Retrieved March 31, 2014 http://www.siliconrepublic.com/careers/item/36001-crs-cls-up 317 | 318 | REI ODAIRA, JOSE G. CASTANOS, HISANOBU TOMARI. 2014. Eliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory. PPoPP’14, February 15-19 2014, Orlando, FL, USA. DOI: http://dx.doi.org/10.1145/2555243.2555247 319 | 320 | ANTONIO CANGIANO. 2007. The Great Ruby Shootout (December 2007). Retrieved March 31, 2014 http://programmingzen.com/2007/12/03/the-great-ruby-shootout/ 321 | 322 | PAT SHAUGHNESSY. 2014. Ruby Under a Microscope: An Illustrated Guide to Ruby Internals 323 | 324 | BUSSINK DIRKJAN. Rubinius - Tales from the Trenches of Developing a Ruby implementation, Barcelona Ruby Conference, 2012. 325 | 326 | NUTTER CHARLES. Why JRuby?, Aloha Ruby Conf, 2012. 327 | 328 | SASADA KOICHI. YARV: Yet Another RubyVM-The Implementation and Evaluation. Transactions of Information Processing Society of Japan. Volume 47. 2006. Pages 57-73. 329 | 330 | SASADA KOICHI. YARV: yet another RubyVM: innovating the ruby interpreter. OOPSLA '05 Companion to the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications. 
Pages 158-159. 331 | 332 | SHAUGHNESSY PAT. Visualizing Garbage Collection in Rubinius, JRuby and Ruby 2.0, Ruby Conference, 2013. 333 | 334 | YUKIHIRO MATSUMOTO. 2010. From Lisp to Ruby to Rubinius. 335 | -------------------------------------------------------------------------------- /final/the_final.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/johno/ruby_optimization_techniques/27f5528e00a04332f886cef0cc846d54c4bf6111/final/the_final.pdf -------------------------------------------------------------------------------- /styles.css: -------------------------------------------------------------------------------- 1 | * { 2 | margin-left: 20px; 3 | margin-right: 20px; 4 | } 5 | 6 | h1, 7 | h2, 8 | h3, 9 | h4, 10 | h5, 11 | h6 { 12 | margin-top: 60px !important; 13 | font-weight: 600; 14 | font-family: "Helvetica Neue"; 15 | } 16 | 17 | h1 { 18 | margin: 40px 40px 40px 20px; 19 | } 20 | 21 | h2 { 22 | margin: 20px 20px 20px 20px; 23 | } 24 | 25 | p { 26 | line-height: 1.5em; 27 | font-size: 25px; 28 | font-family: 'Times New Roman'; 29 | } 30 | 31 | pre { 32 | margin: 10px; 33 | font-family: Monaco; 34 | background-color: #eee; 35 | border-top: 10px solid #ddd; 36 | border-bottom: 10px solid #ddd; 37 | padding: 20px; 38 | } 39 | 40 | blockquote { 41 | background-color: #eee; 42 | padding: 20px; 43 | border-left: 10px solid #ddd; 44 | } 45 | 46 | .figure, 47 | .figure a { 48 | color: #888; 49 | font-size: 10px; 50 | margin-left: 10px; 51 | } 52 | 53 | img { 54 | margin: 10px; 55 | } 56 | 57 | --------------------------------------------------------------------------------