├── .gitignore
├── LICENSE
├── README.md
├── draft
    ├── Ruby Optimization.docx
    ├── Ruby Optimization.pdf
    ├── conclusion.md
    ├── introduction.md
    ├── mri.md
    └── rubinius.md
├── final
    ├── final.md
    └── the_final.pdf
└── styles.css


/.gitignore:
--------------------------------------------------------------------------------
1 | *.DS_Store


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | The MIT License (MIT)
 2 | 
 3 | Copyright (c) 2014 John Otander
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Ruby Language Optimization Techniques
  2 | 
  3 | NICHOLAS BENDER, Boise State University
  4 | BEN NEELY, Boise State University
  5 | JOHN OTANDER, Boise State University
  6 | 
  7 | The Ruby programming language has experienced a recent period of intense adoption and growth due to its excellent speed of iteration, elegant syntax, and passionate community. The popular web framework Ruby on Rails has also given the language exceptional legitimacy, especially in the prototyping and startup space. Ruby is a tool that emphasizes developer happiness and productivity, and it places responsibility for the program in the developer's hands. This gives the language a lot of power, but it can be a double-edged sword: when leveraged incorrectly, projects can swiftly become inefficient and unmaintainable. This flexibility also has serious implications for memory management, efficiency, and execution time.
  8 | 
  9 | While support for the language is growing steadily, it is largely dismissed as not scaling effectively and as having far slower runtimes than compiled, statically typed languages. In this article, we propose that many sophisticated techniques exist to enhance Ruby’s performance, both by using alternative runtimes that compile Ruby to other targets (such as Java bytecode) and by avoiding common anti-patterns to improve performance natively. Through experimentation and thorough research, we conclude that Ruby performs competitively against its scripting-language counterparts and can see large performance gains in many cases.
 10 | 
 11 | __Categories and Subject Descriptors:__ D.2.3 [Coding Tools and Techniques]: Object-oriented programming, B.6.3 [Design Aids]: Optimization<br>
 12 | __General Terms:__ Optimization, Algorithms, Performance<br>
 13 | __Additional Key Words and Phrases:__ Ruby, Web Development, JRE, C++, C
 14 | 
 15 | ## INTRODUCTION
 16 | 
 17 | Ruby is an object-oriented, dynamically typed, high-level scripting language. It is a programming language that was written for humans and just happens to run on computers. It is intended to promote developer happiness through simplicity, elegant libraries, and terse, readable syntax. Ruby also uses duck typing, meaning an object's suitability is determined by the methods it responds to rather than by its class. Each of these techniques and language features comes with certain sacrifices. In this exploration, we conclude that stable, performant Ruby programs are best achieved by properly using the newest versions of the core language.
 18 | 
 19 | In recent years, the Ruby programming language has grown its community and established itself as a valuable, popular tool for many tasks [O’Donoghue, 2014]. The success of Ruby on Rails as a prototyping framework, and as a full-stack solution for some larger companies, has brought forth a myriad of techniques to minimize the language’s speed differences compared to similar languages. Ruby’s slower performance, compared to C or Java, is attributed to interpreted execution, dynamic typing, metaprogramming support, and the Global Interpreter Lock [Odaira, Castanos, and Tomari, 2014]. This increase in popularity has led to a large number of independent optimization efforts, from large corporations such as IBM and AT&T as well as from the Ruby open-source community.
 20 | 
 21 | > When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck. <br>_- Heim, Michael (2007)._
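A minimal sketch of duck typing, using two hypothetical classes that share no ancestor: `make_it_quack` accepts either one because it only checks whether the object responds to `quack`, never what class it is.

```ruby
# Two unrelated classes (hypothetical, for illustration only).
class Duck
  def quack
    'Quack!'
  end
end

class Robot
  def quack
    'QUACK (synthesized)'
  end
end

# Accepts any object that responds to #quack, regardless of its class.
def make_it_quack(thing)
  unless thing.respond_to?(:quack)
    raise ArgumentError, "#{thing.class} cannot quack"
  end
  thing.quack
end

[Duck.new, Robot.new].each { |t| puts make_it_quack(t) }
```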
 22 | 
 23 | ## 1. MRI (< 1.9)
 24 | 
 25 | MRI is short for Matz's Ruby Interpreter, which is sometimes also referred to as CRuby. It is named after Yukihiro Matsumoto, the chief designer of the Ruby language. The original MRI was Ruby's runtime environment from the language's inception through version 1.8.7.
 26 | 
 27 | ### 1.1 Program Execution at a High Level
 28 | 
 29 | ```
 30 | | --------- |
 31 | |   Ruby    |
 32 | | --------- |
 33 | |  Tokens   |
 34 | | --------- |
 35 | | AST Nodes |
 36 | | --------- |
 37 |      |
 38 |      | Interpret
 39 |      |
 40 |      v
 41 | | --------- |
 42 | |     C     |
 43 | | --------- |
 44 | |  Machine  |
 45 | |  Language |
 46 | | --------- |
 47 | ```
 48 | <div class="figure">Fig. 1. Ruby Program Execution</div>
 49 | 
 50 | A Ruby script is first tokenized, and the tokens are then parsed into an Abstract Syntax Tree (AST). The MRI, written in C, then reads and executes the AST directly. Note that there is no compilation or translation step.
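To see the tokenize and parse steps on a real program, MRI exposes its own lexer and parser through the standard library's `Ripper` extension. This is a sketch for a modern MRI (Ripper ships with Ruby 1.9 and later, so it postdates the 1.8 interpreter described here), and the exact shape of the printed tree varies between Ruby versions.

```ruby
require 'ripper'
require 'pp'

source = 'puts 1 + 2'

# Step 1: tokenization. Ripper.tokenize returns the raw token strings.
tokens = Ripper.tokenize(source)
p tokens

# Ripper.lex adds each token's position and type (e.g. :on_ident, :on_int).
Ripper.lex(source).each do |(line, col), type, text, *|
  puts format('%d:%-2d %-12s %p', line, col, type, text)
end

# Step 2: parsing. Ripper.sexp returns the syntax tree as nested arrays.
pp Ripper.sexp(source)
```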
 51 | 
 52 | ### 1.2 Performance
 53 | 
 54 | Since there isn't a bytecode compilation step, executing a Ruby program requires walking the MRI's internal Abstract Syntax Tree. This slows execution significantly, because the interpreter must traverse the tree again at runtime, following pointers from node to node and dispatching on each node's type, every time the code runs.
 55 | 
 56 | ![ch_abstract_syntree](https://cloud.githubusercontent.com/assets/1424573/2803533/3c359946-cc9d-11e3-9b35-217ccda504df.png)
 57 | <div class="figure">Fig. 2. Abstract Syntax Tree <http://edwinmeyer.com/Release_Integrated_RHG_09_10_2008/intro.html></div>
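A toy tree-walking evaluator illustrates where this cost comes from. The `IntNode`, `AddNode`, and `CallNode` structs are hypothetical and far simpler than MRI's real node types; the point is that every run of `puts 1 + 2` re-traverses the tree, paying a type dispatch and a recursive call per node, work that a bytecode compiler can flatten into a linear instruction sequence ahead of time.

```ruby
# Hypothetical AST node types for a tiny arithmetic language.
IntNode  = Struct.new(:value)
AddNode  = Struct.new(:left, :right)
CallNode = Struct.new(:name, :arg)

# A naive tree-walking evaluator: each call inspects the node's type,
# recurses into its children, and only then computes a result.
def evaluate(node, out)
  case node
  when IntNode then node.value
  when AddNode then evaluate(node.left, out) + evaluate(node.right, out)
  when CallNode
    raise ArgumentError, "unknown method #{node.name}" unless node.name == :puts
    out << evaluate(node.arg, out).to_s
    nil
  else
    raise ArgumentError, "unknown node #{node.inspect}"
  end
end

# AST for `puts 1 + 2`
ast = CallNode.new(:puts, AddNode.new(IntNode.new(1), IntNode.new(2)))
output = []
evaluate(ast, output)
puts output
```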
 58 | 
 59 | ### 1.3 Optimizations
 60 | 
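 61 | Where it is safe to do so, use receiver-modifying ("bang") methods such as `String#chomp!`, because they avoid allocating a copy of the string. Be aware that every reference to the same object sees the change: in the example below, `str2` is modified along with `str`.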
 62 | 
 63 | ```
 64 | 2.1.1 :003 > str = "A string.\n"
 65 |  => "A string.\n"
 66 | 2.1.1 :004 > str2 = str
 67 |  => "A string.\n"
 68 | 2.1.1 :005 > str.chomp!
 69 |  => "A string."
 70 | 2.1.1 :006 > str2
 71 |  => "A string."
 72 | 2.1.1 :007 >
 73 | ```
 74 | <div class="figure">Fig. 3. Receiver modifying methods vs receiver duplicating methods</div>
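The identity checks below make the difference visible. Two caveats are worth keeping in mind before switching every call to a bang method: the change is seen through every reference to the object, and `chomp!` returns `nil` instead of the string when there is nothing to remove, so its return value should not be chained blindly.

```ruby
original = "A string.\n"

# Non-bang: allocates and returns a new String; the receiver is untouched.
copy = original.chomp
puts copy.equal?(original)   # false: chomp returned a different object
puts original.inspect        # the receiver still ends with "\n"

# Bang: modifies the receiver in place and returns it.
result = original.chomp!
puts result.equal?(original) # true: same object, no copy made

# Gotcha: with nothing left to remove, chomp! returns nil.
p original.chomp!            # nil
```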
 75 | 
 76 | ### 1.4 Summary
 77 | 
 78 | The original implementation of the MRI is one of the primary reasons that Ruby has a "bad rap" for execution speed.
 79 | 
 80 | ## 2. JRUBY
 81 | 
 82 | ### 2.1 Purpose
 83 | 
 84 | JRuby endeavors to solve many Ruby performance issues by running Ruby on the Java Virtual Machine (JVM) instead of the standard interpreter, compiling as much Ruby code as possible, including the core libraries, to Java bytecode. Current versions of JRuby support both just-in-time and ahead-of-time compilation to Java bytecode. Combining bytecode compilation with interpretation where needed gives JRuby several advantages over the standard interpreter.
 85 | 
 86 | One of the more obvious improvements is the ability to call and use standard Java libraries and classes from within Ruby projects. For larger organizations that already rely on Java for core libraries, this allows for a more flexible development environment.
 87 | 
 88 | ### 2.2. Performance
 89 | 
 90 | In 2007, JRuby’s overall performance was compared with Ruby 1.8.5, the YARV interpreter (now merged into Ruby’s official interpreter), and Rubinius. JRuby outperformed standard Ruby in only 10% of the tests. However, JRuby ran all of the Ruby benchmarks without timing out or producing an error, a claim that no other non-standard Ruby implementation could make.
 91 | 
 92 | Recent benchmarks performed in 2014 show the latest implementations of JRuby performing comparably to standard Ruby. While some benchmarks showed a faster runtime, JRuby's increased memory overhead (more than 10x) makes scaling Ruby applications problematic.
 93 | 
 94 | In addition to JRuby's memory woes, its biggest performance downside is the time needed to start the JVM. A simple Ruby script that the MRI runs in a fraction of a second can take several additional seconds under JRuby purely because of JVM launch time.
 95 | 
 96 | ### 2.3 Lack of C Support
 97 | 
 98 | While JRuby allows for enhanced support and compatibility with Java libraries and applets, the majority of Ruby users (especially those using Ruby on Rails) rely on libraries that include native C extensions. By targeting the JVM, JRuby gives up compatibility with these native C extensions. The most notable examples are various database interfaces and web servers.
 99 | 
100 | ### 2.4. Development Lag
101 | 
102 | Because JRuby can only implement a Ruby version after that version has been released, it lags behind the reference implementation; at the time of writing, the most recent release of JRuby only supports Ruby 1.9.3, which was initially released in 2011.
103 | 
104 | ### 2.5 Summary
105 | 
106 | While JRuby does offer improved benchmark performance in a minority of cases, its slow development cycle and potential for a massive increase in memory footprint make it an unsuitable option for pure Ruby development stacks.
107 | 
108 | ## 3. Rubinius
109 | 
110 | ### 3.1 Purpose
111 | 
112 | Rubinius is an implementation of the Ruby programming language that includes a bytecode virtual machine, a Ruby syntax parser, a bytecode compiler, a generational garbage collector, a just-in-time (JIT) native machine code compiler, and the Ruby Core and Standard Libraries. Rubinius is written in Ruby and C++.
113 | 
114 | ### 3.2 History
115 | 
116 | Rubinius was originally created to be a Ruby virtual machine and runtime written in pure Ruby, whereas the standard Ruby interpreter is written primarily in C. From 2007 to 2013, the software company Engine Yard was a primary backer of Rubinius. During that time, the focus of Rubinius evolved from creating a completely bootstrapped Ruby VM to offering a higher-performance implementation of Ruby. Under this new direction, Rubinius partially abandoned the idea of writing the VM entirely in Ruby and instead used C++ to increase performance, aiming to establish Rubinius as the fastest Ruby implementation. Recently, Rubinius has focused on supporting concurrency and multithreading.
117 | 
118 | ### 3.3 Performance
119 | 
120 | Rubinius initially achieved performance equal to or slightly better than that of the YARV interpreter. In recent years, however, the MRI has consistently outperformed Rubinius on most benchmark tests.
121 | 
122 | Rubinius consistently benchmarks as one of the slowest modern implementations of the Ruby language.
123 | 
124 | ### 3.4 Concurrency
125 | 
126 | Rubinius does outperform the MRI in threading and concurrency benchmark tests. As shown in the figure below, Rubinius (represented by rbx-2.0.0) has a nontrivial advantage over MRI and other Ruby implementations when executing multithreaded code.
127 | Unlike the MRI, Rubinius does not have a Global Interpreter Lock (GIL). The MRI's GIL allows only one thread to execute Ruby code at a time, no matter how many processor cores are available. Without a GIL, Rubinius can support truly parallel threads.
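As a rough illustration, the sketch below splits a CPU-bound sum across four threads. The answer is the same on any implementation; on MRI the GIL makes the threads take turns, while on a GIL-free implementation such as Rubinius 2.x they may run on separate cores. The thread count and workload size are arbitrary, and the timings it prints will vary by machine and implementation. `reduce` is used instead of `Range#sum`, which modern MRI can compute with a closed-form formula.

```ruby
require 'benchmark'

# Sum 1..n in `thread_count` slices, one thread per slice.
def threaded_sum(n, thread_count)
  slice = (n / thread_count.to_f).ceil
  threads = (0...thread_count).map do |i|
    first = i * slice + 1
    last  = [(i + 1) * slice, n].min
    # Each thread computes a partial sum (an empty range yields 0).
    Thread.new { (first..last).reduce(0) { |acc, x| acc + x } }
  end
  # Thread#value waits for the thread and returns its result.
  threads.sum(&:value)
end

n = 5_000_000
serial   = Benchmark.realtime { (1..n).reduce(0) { |acc, x| acc + x } }
parallel = Benchmark.realtime { threaded_sum(n, 4) }
puts "#{RUBY_ENGINE}: serial #{serial.round(2)}s, 4 threads #{parallel.round(2)}s"
```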
128 | 
129 | ### 3.5 Summary
130 | 
131 | Rubinius’ development has been spotty, depending heavily on a few developers and a few corporate sponsors. As a result, Rubinius has constantly shifted focus. It currently offers a significant advantage over other Ruby interpreters only for programs involving threading and concurrency. For all other uses, the standard MRI Ruby interpreter is faster and more consistently supported.
132 | 
133 | ## 4. YARV
134 | 
135 | ### 4.1 Background
136 | 
137 | ```
138 | CODE => TOKENIZATION => PARSE TREE => COMPILATION => YARV INSTRUCTIONS
139 | ```
140 | 
141 | When a Ruby program is executed, Ruby first tokenizes it: the contents are converted into a collection of tokens with associated types. Ruby then uses an LALR (Look-Ahead LR, i.e. Left-to-right, Rightmost derivation) parser to apply meaning to the tokens and construct the Abstract Syntax Tree. The compilation step was introduced with Ruby 1.9, and is where YARV (Yet Another Ruby Virtual Machine) comes into play: it translates the AST into bytecode, or YARV instructions.
142 | 
143 | ```
144 | ~|||$ irb
145 | 2.1.1 :001 > code = <<CODE
146 | 2.1.1 :002"> puts 1 + 2
147 | 2.1.1 :003"> CODE
148 |  => "puts 1 + 2\n"
149 | 2.1.1 :004 > puts RubyVM::InstructionSequence.compile(code).disasm
150 | == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
151 | 0000 trace            1                                               (   1)
152 | 0002 putself
153 | 0003 putobject_OP_INT2FIX_O_1_C_
154 | 0004 putobject        2
155 | 0006 opt_plus         <callinfo!mid:+, argc:1, ARGS_SKIP>
156 | 0008 opt_send_simple  <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>
157 | 0010 leave
158 |  => nil
159 | ```
160 | <div class="figure">Fig. 4. YARV instructions for a simple program</div>
161 | 
162 | The introduction of the compilation step and YARV have significantly helped the execution speed of Ruby programs. However, there's always room for more improvements.
163 | 
164 | ### 4.2 Purpose
165 |  
166 | The Ruby MRI, short for Matz's Ruby Interpreter, is the reference implementation of the Ruby programming language. It was released to the public in 1995 and is still actively developed; the latest stable release at the time of writing is Ruby 2.1.1.
167 | 
168 | YARV is a bytecode virtual machine developed by Koichi Sasada; it is also known as KRI (Koichi's Ruby Interpreter). It was developed to reduce the execution time of Ruby programs and was very successful. As a result, YARV was merged into Ruby 1.9.0, replacing the original AST-walking MRI interpreter.
169 | 
170 | As the default interpreter for the Ruby programming language, the MRI has received its fair share of criticism, primarily due to its execution speed and memory consumption. However, recent Ruby versions have seen significant enhancements and are on par with similar scripting languages like Python. Moreover, some comparisons pit a compiled language against a scripting language, which is comparing apples to oranges. Developers typically choose Ruby for its ease of writing and prototyping, accepting that its execution time will always be significantly slower than that of its compiled counterparts.
171 | 
172 | That being said, there are numerous methods and best practices that developers can follow in order to ensure that they're avoiding unnecessary bottlenecks.
173 | 
174 | ### 4.3 Performance Out of the Box
175 | 
176 | Thanks to the introduction of YARV, vanilla Ruby running on a single thread can outperform alternative Ruby implementations. Consider the following figure, which measures Rails requests per second.
177 | 
178 | ![screen shot 2014-05-02 at 6 22 04 pm](https://cloud.githubusercontent.com/assets/1424573/2869079/0305431c-d259-11e3-8b58-6f2ea6ff23e9.png)
179 | <div class="figure">Fig. 5. Rails requests per second.</div>
180 | 
181 | ### 4.4 Global Interpreter Lock
182 | 
183 | When attempting to optimize execution speed, developers often use threads to process tasks concurrently, and Ruby supports threads. However, the MRI/YARV incorporates a Global Interpreter Lock (GIL) that prevents true parallelism. The GIL is a lock that a thread must hold in order to execute Ruby code, so only one thread runs Ruby code at any given moment. As a result, CPU-bound threads see little to no speedup on a multiprocessor machine.
184 | 
185 | The primary reason for the GIL is to avoid race conditions within C extensions. There are broader thread-safety reasons, too: parts of Ruby's core (such as Hash) are not thread safe, and neither are many of the C libraries that Ruby's internals wrap. The GIL therefore protects the integrity of the interpreter's internal data, although it does not, on its own, make application-level threaded code safe.
186 | 
187 | Interestingly, this runs contrary to a fundamental principle of the Ruby language, which lays all of the responsibility on the developer. Ruby gives the developer ultimate freedom without hand-holding, yet the GIL is exactly that: hand-holding.
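Even with the GIL, Ruby does not promise that a compound operation such as `@count += 1` (a read, an add, and a write) runs atomically, so shared state still needs explicit locking. A minimal sketch using the core `Mutex` class; the `Counter` class is hypothetical:

```ruby
# A counter shared between threads (hypothetical example class).
class Counter
  attr_reader :count

  def initialize
    @count = 0
    @lock  = Mutex.new
  end

  # `@count += 1` is a read, an add, and a write; the Mutex makes each
  # increment exclusive so no update can be lost between those steps.
  def increment
    @lock.synchronize { @count += 1 }
  end
end

counter = Counter.new
10.times.map { Thread.new { 1_000.times { counter.increment } } }.each(&:join)
puts counter.count
```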
188 | 
189 | The GIL isn’t going anywhere. It is deeply intertwined with Ruby's internals, and many influential Ruby core figures do not plan to remove it in the near future. This does not mean, however, that concurrency cannot be achieved.
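One form of concurrency already works under the GIL: MRI releases the lock while a thread sleeps or blocks on I/O, so I/O-bound threads overlap. A minimal sketch, with `sleep` standing in for a network or disk wait; the thread count and delay are arbitrary:

```ruby
# Start `count` threads that each wait `seconds`, and time the whole batch.
# The GIL is released while a thread sleeps, so the waits overlap.
def overlapped_waits(count, seconds)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  count.times.map { Thread.new { sleep(seconds) } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

elapsed = overlapped_waits(5, 0.2)
puts format('5 waits of 0.2s took %.2fs (back to back would be about 1.0s)', elapsed)
```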
190 | 
191 | One way to sidestep the GIL is to use multiple virtual machines. Koichi Sasada has proposed a Multiple VM (MVM) solution, which is currently in development. It would consist of multiple virtual machines, each running its own process, communicating via sockets.
192 | 
193 | Granted, this is a drastic step away from typical threading, but some proponents believe that traditional threading isn't necessarily the correct paradigm to follow. This is especially true in Ruby: since 1.9, each Ruby thread is backed by a native OS thread, but the GIL still lets only one of them run Ruby code at a time, so threads gain little over separate processes for CPU-bound work.
194 | 
195 | > Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads.
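Process-level parallelism can already be sketched with core Ruby, without assuming anything about the proposed MVM interface. In the hypothetical `forked_sum` below, each child process is a separate interpreter with its own GIL and reports its partial sum through an `IO.pipe`. `fork` is unavailable on some platforms, notably Windows, so the sketch falls back to a serial sum there.

```ruby
# Split 1..n across `workers` child processes. Each child is a separate
# interpreter with its own GIL, so the slices can run on different cores.
def forked_sum(n, workers)
  slice  = (n / workers.to_f).ceil
  ranges = (0...workers).map { |i| (i * slice + 1)..[(i + 1) * slice, n].min }

  # fork is not available everywhere (e.g. Windows): fall back to serial.
  return ranges.sum { |r| r.reduce(0, :+) } unless Process.respond_to?(:fork)

  children = ranges.map do |range|
    reader, writer = IO.pipe
    pid = Process.fork do
      reader.close
      writer.write(range.reduce(0, :+).to_s) # send the partial sum as text
      writer.close
      exit!(0) # leave immediately, skipping the parent's at_exit handlers
    end
    writer.close # the parent only reads
    [pid, reader]
  end

  # Read each partial sum, then reap the child process.
  children.sum do |pid, reader|
    partial = reader.read.to_i
    reader.close
    Process.wait(pid)
    partial
  end
end

puts forked_sum(1_000_000, 4)
```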
196 | 
197 | ### 4.5 Simple Code Enhancements
198 | 
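199 | String interpolation is significantly faster than concatenation in the benchmark below. On each pass, the concatenation version allocates a new String for every literal and then grows the receiver as each piece is appended, whereas interpolation builds its result into a single new String.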
200 | 
201 | ```ruby
202 | require 'benchmark'
203 | 
204 | concat_time = Benchmark.measure do
205 |   20000000.times do
206 |     str = 'str1' << 'str2' << 'str3'
207 |   end
208 | end
209 | 
210 | # => #<Benchmark::Tms:0x007fdf9ba49ea8 @label="", @real=79.152523, @cstime=0.0, @cutime=0.0, @stime=0.04000000000000001, @utime=79.11, @total=79.15>
211 | str1, str2, str3 = 'str1', 'str2', 'str3' # the interpolation below needs these defined
212 | interp_time = Benchmark.measure do
213 |   20000000.times do
214 |     str = "#{str1}#{str2}#{str3}"
215 |   end
216 | end
217 | 
218 | # => #<Benchmark::Tms:0x007fdf9b990bd8 @label="", @real=22.713976, @cstime=0.0, @cutime=0.0, @stime=0.009999999999999995, @utime=22.689999999999998, @total=22.7>
219 | ```
220 | <div class="figure">Fig. 6. Interpolation vs concatenation of Ruby Strings</div>
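Wall-clock time alone does not show why one version is faster. A minimal sketch that counts object allocations with `GC.stat(:total_allocated_objects)` for the same two expression shapes as Fig. 6, assuming Ruby 2.2 or later (where that key exists) and that frozen string literals are not enabled: if the allocation explanation holds, the concatenation loop should report more objects than the interpolation loop.

```ruby
# Count Ruby objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

str1, str2, str3 = 'str1', 'str2', 'str3'
iterations = 100_000

# Same shapes as Fig. 6: append to fresh literals vs. interpolate variables.
concat_allocs = allocations { iterations.times { 'str1' << 'str2' << 'str3' } }
interp_allocs = allocations { iterations.times { "#{str1}#{str2}#{str3}" } }

puts "concatenation: #{concat_allocs} objects"
puts "interpolation: #{interp_allocs} objects"
```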
221 | 
222 | Calling `map` (or its alias `collect`) with an explicit block was faster than using `Symbol#to_proc` in our benchmarks. Both forms return a new Array; the difference comes from `&:id`, which converts the symbol into a Proc (roughly as shown below) and then calls that Proc for every element. Because `map` and `collect` are aliases, the gap between their two timings is best read as measurement noise. The `Symbol#to_proc` form is, however, typically much easier to read.
223 | 
224 | ```
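225 | # Symbol#to_proc performs roughly this conversion: :id.to_proc behaves like -> x { x.id }
226 | Fake = Struct.new(:id) # assumed definition; the original Fake class is not shown
227 | fake_data = 20.times.map { |t| Fake.new(t) }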
228 | 
229 | proc_time = Benchmark.measure do
230 |   200000000.times do
231 |     fake_data.map(&:id)
232 |   end
233 | end
234 | 
235 | #  => #<Benchmark::Tms:0x007fdf9b8b0498 @label="", @real=491.332415, @cstime=0.0, @cutime=0.0, @stime=4.8, @utime=426.06999999999994, @total=430.86999999999995>
236 | ```
237 | 
238 | ```
239 | block_time = Benchmark.measure do
240 |   200000000.times do
241 |     fake_data.map { |d| d.id }
242 |   end
243 | end
244 | 
245 | # => #<Benchmark::Tms:0x007fdf9b931d40 @label="", @real=431.731424, @cstime=0.0, @cutime=0.0, @stime=2.66, @utime=416.21000000000004, @total=418.87000000000006>
246 | ```
247 | 
248 | ```
249 | collect_time = Benchmark.measure do
250 |   200000000.times do
251 |     fake_data.collect { |d| d.id }
252 |   end
253 | end
254 | 
255 | # => #<Benchmark::Tms:0x007fdf9b821518 @label="", @real=386.234513, @cstime=0.0, @cutime=0.0, @stime=1.1800000000000006, @utime=384.28, @total=385.46>
257 | ```
258 | <div class="figure">Fig. 7. Procs vs Blocks vs Collects</div>
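A smaller, self-contained version of the Fig. 7 comparison follows, assuming `Fake` is a struct with an `id` attribute (its definition is not shown in the original). It first checks that all three forms return the same Array and then times them with `Benchmark.bm`; the iteration count is arbitrary, and small differences between runs are likely noise.

```ruby
require 'benchmark'

# Assumed stand-in for the Fake class used in Fig. 7.
Fake = Struct.new(:id)
fake_data = 20.times.map { |t| Fake.new(t) }

# All three forms return a new Array; none of them returns an Enumerator.
results = [
  fake_data.map(&:id),
  fake_data.map { |d| d.id },
  fake_data.collect { |d| d.id }
]
raise 'forms disagree' unless results.uniq.size == 1

n = 100_000
Benchmark.bm(12) do |x|
  x.report('map(&:id)')   { n.times { fake_data.map(&:id) } }
  x.report('map { }')     { n.times { fake_data.map { |d| d.id } } }
  x.report('collect { }') { n.times { fake_data.collect { |d| d.id } } }
end
```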
259 | 
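260 | Garbage collection settings can also be tuned to further optimize Ruby execution speed for many workloads. The environment variables below apply to Ruby 1.9 and 2.0; Ruby 2.1 renamed several of them (for example, `RUBY_HEAP_MIN_SLOTS` became `RUBY_GC_HEAP_INIT_SLOTS`), so check the names supported by your Ruby version.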
261 | 
262 | ```
263 | # This is 60(!) times larger than default
264 | RUBY_HEAP_MIN_SLOTS=600000
265 | 
266 | # This is 7 times larger than default
267 | RUBY_GC_MALLOC_LIMIT=59000000
268 | 
269 | # This is 24 times larger than default
270 | RUBY_HEAP_FREE_MIN=100000
271 | ```
272 | <div class="figure">Fig. 8. Garbage Collection Modification</div>
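Because the benefit of GC tuning depends on the workload, it is worth measuring before and after a change. The sketch below uses the core `GC.stat` counters to report how many collections and object allocations a block of work triggers; the workload is hypothetical. Running it once with default settings and once with tuning variables exported (using the names your Ruby version supports) gives a simple comparison.

```ruby
# Report GC activity caused by a block of work, using core GC.stat keys.
def gc_profile
  before = GC.stat
  yield
  after = GC.stat
  {
    gc_runs:     after[:count] - before[:count],
    allocations: after[:total_allocated_objects] - before[:total_allocated_objects]
  }
end

# Hypothetical workload: build many short-lived strings and arrays.
stats = gc_profile do
  200_000.times { |i| [i.to_s, i.to_s * 2] }
end

puts "GC runs: #{stats[:gc_runs]}, objects allocated: #{stats[:allocations]}"
puts "RUBY_GC_MALLOC_LIMIT=#{ENV.fetch('RUBY_GC_MALLOC_LIMIT', '(default)')}"
```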
273 | 
274 | ### 4.6 Use Unicorn
275 | 
276 | For Ruby on Rails web applications, a server typically runs in a single process, which means that requests are processed one at a time. This can create a significant bottleneck in your application. Fortunately, there are libraries that bring concurrency to your application; one of them is Unicorn.
277 | 
278 | Unicorn uses Unix `fork` within a dyno (a Heroku web worker) to create multiple worker processes. With multiple OS processes available to respond to requests, tasks complete concurrently, resulting in shorter queues, quicker responses, and a faster web application as a whole. The main drawback is memory usage, which can grow large. However, with decreasing hardware costs, this can be a worthwhile expenditure in exchange for quick development time. Unicorn also doesn't require thread-safe code, since each worker is a self-sufficient clone of the parent.
279 | 
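280 | Ruby 2.0 makes process forking with Unicorn more memory efficient because its garbage collector is Copy-on-Write (CoW) friendly. With CoW, which the operating system provides on fork, a parent and child share physical memory pages until one of them writes to a page. Ruby 2.0 keeps garbage collection mark bits in a separate bitmap instead of inside each object, so marking no longer writes to, and un-shares, every page the child inherited. This sharing of resources can drastically reduce memory use.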
281 | 
282 | Workers can still leak memory and grow over time. The unicorn-worker-killer gem, configured in `config.ru` as in the snippet below, covers these edge cases by restarting each worker after a randomized number of requests or once its memory use (RSS) exceeds a randomized threshold.
283 | 
284 | ```
285 | if ENV['RAILS_ENV'] == 'production'
286 |   require 'unicorn/worker_killer'
287 | 
288 |   max_request_min =  500
289 |   max_request_max =  600
290 | 
291 |   # Max requests per worker
292 |   use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max
293 | 
294 |   oom_min = (240) * (1024**2)
295 |   oom_max = (260) * (1024**2)
296 | 
297 |   # Max memory size (RSS) per worker
298 |   use Unicorn::WorkerKiller::Oom, oom_min, oom_max
299 | end
300 | 
301 | require ::File.expand_path('../config/environment',  __FILE__)
302 | run YourApp::Application
303 | ```
304 | <div class="figure">Fig. 9. Example Unicorn Implementation</div>
305 | 
306 | ## CONCLUSIONS
307 | 
308 | In this article, we examined a number of independent Ruby optimization efforts. Each of these efforts seeks to improve performance through a variety of techniques. In our examination, we determined that each of these techniques comes with certain sacrifices that outweigh the marginal benefits gained. Unless a particular feature is needed (such as full threading support or inline Java), the best practices for stable, performant Ruby code lie in using the newest versions of the core language.
309 | 
310 | ## ACKNOWLEDGMENTS
311 | 
312 | The authors would like to thank Douglas Wiegley. He knows what he did.
313 | 
314 | ## REFERENCES
315 | 
316 | ROBERT O'DONOGHUE. 2014. Careers Close-up: programmers and software engineers. (March 2014). Retrieved March 31, 2014 http://www.siliconrepublic.com/careers/item/36001-crs-cls-up
317 | 
318 | REI ODAIRA, JOSE G. CASTANOS, HISANOBU TOMARI. 2014. Eliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory. PPoPP’14, February 15-19 2014, Orlando, FL, USA. DOI: http://dx.doi.org/10.1145/2555243.2555247
319 | 
320 | ANTONIO CANGIANO. 2007. The Great Ruby Shootout (December 2007). Retrieved March 31, 2014 http://programmingzen.com/2007/12/03/the-great-ruby-shootout/ 
321 | 
322 | PAT SHAUGHNESSY. 2014. Ruby Under a Microscope: An Illustrated Guide to Ruby Internals.
323 | 
324 | DIRKJAN BUSSINK. 2012. Rubinius: Tales from the Trenches of Developing a Ruby Implementation. Barcelona Ruby Conference 2012.
325 | 
326 | CHARLES NUTTER. 2012. Why JRuby? Aloha Ruby Conf 2012.
327 | 
328 | KOICHI SASADA. 2006. YARV: Yet Another RubyVM: The Implementation and Evaluation. Transactions of Information Processing Society of Japan 47, 57-73.
329 | 
330 | KOICHI SASADA. 2005. YARV: Yet Another RubyVM: Innovating the Ruby Interpreter. In OOPSLA '05 Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, 158-159.
331 | 
332 | PAT SHAUGHNESSY. 2013. Visualizing Garbage Collection in Rubinius, JRuby and Ruby 2.0. Ruby Conference 2013.
333 | 
334 | YUKIHIRO MATSUMOTO. 2010. From Lisp to Ruby to Rubinius. 
335 | 


--------------------------------------------------------------------------------
/draft/Ruby Optimization.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/johno/ruby_optimization_techniques/27f5528e00a04332f886cef0cc846d54c4bf6111/draft/Ruby Optimization.docx


--------------------------------------------------------------------------------
/draft/Ruby Optimization.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/johno/ruby_optimization_techniques/27f5528e00a04332f886cef0cc846d54c4bf6111/draft/Ruby Optimization.pdf


--------------------------------------------------------------------------------
/draft/conclusion.md:
--------------------------------------------------------------------------------
1 | Conclusion
2 | 


--------------------------------------------------------------------------------
/draft/introduction.md:
--------------------------------------------------------------------------------
 1 | # An Analysis of Ruby Optimization Techniques
 2 | 
 3 | The Ruby programming language has experienced a recent period of intense adoption and growth due to its excellent speed of iteration and due, in no small part, to the acceptance of the Ruby on Rails web framework within the startup sphere. While support for the language is growing steadily, it is largely dismissed as not scaling effectively, or as having far slower runtimes than more traditional, statically typed languages. In this article, we propose that many sophisticated techniques exist to enhance Ruby’s performance, both by using alternative runtimes that compile Ruby to other targets and by avoiding common anti-patterns to improve performance natively. Through experimentation and thorough research, we conclude that Ruby performs competitively against its scripting-language counterparts, and can see increases of [XXXXX]% in many cases.
 4 | 
 5 | ## Introduction
 6 | 
 7 | In recent years, the Ruby programming language has grown its community and established itself as a valuable and popular tool for many tasks [O’Donoghue, 2014]. The success of Ruby on Rails as a prototyping framework, and as a full-stack solution for some larger companies, has brought forth a myriad of techniques to minimize the language’s speed differences compared to similar languages. Ruby’s slower performance, compared to C or Java, is attributed to interpreted execution, dynamic typing, metaprogramming support, and the Global Interpreter Lock [Odaira, Castanos, and Tomari, 2014]. This increase in popularity has led to a large number of independent optimization efforts, from large corporations such as IBM as well as from the Ruby open-source community.
 8 | 
 9 | With each of these techniques there exist certain sacrifices, but in this exploration we will conclude that the best practices for stable, performant Ruby code exist by utilizing the newest versions of the core language properly, and not by utilizing other third party interpreters or solutions.
10 | 
11 | ## An Overview of the Ruby Language
12 | 
13 | Ruby is an object oriented, dynamically-typed, high-level scripting language. It is a programming language that was written for humans and just happens to run on computers. It's intended to promote developer happiness through simplicity, elegant libraries, and terse, readable syntax. Ruby uses duck typing, meaning type is determined through methods and properties.
14 | 
15 | > When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
16 | 
17 | Heim, Michael (2007). Exploring Indiana Highways. Exploring America's Highway. p. 68. ISBN 978-0-9744358-3-1.
18 | 
19 | ### Program Execution at a High Level
20 | 
21 | CODE => TOKENIZATION => PARSE TREE => COMPILATION => YARV INSTRUCTIONS
22 | 
23 | #### Tokenizing a Ruby program
24 | 
25 | When a Ruby program is executed, Ruby first tokenizes it. This means that the contents are converted into a collection of tokens with associated types.
26 | 
27 | #### Parsing the tokens
28 | 
29 | Ruby uses an LALR (Look-Ahead LR, i.e. Left-to-right, Rightmost derivation) parser to apply meaning to the tokens and construct the Abstract Syntax Tree.
30 | 
31 | #### The compilation step
32 | 
33 | The compilation step was introduced with Ruby 1.9, and is where the YARV (Yet Another Ruby Virtual Machine) comes into play. It translates the code into _bytecode_, or YARV instructions.
34 | 
35 | #### YARV instructions for a simple program
36 | 
37 | ```
38 | ~|||$ irb
39 | 2.1.1 :001 > code = <<CODE
40 | 2.1.1 :002"> puts 1 + 2
41 | 2.1.1 :003"> CODE
42 |  => "puts 1 + 2\n" 
43 | 2.1.1 :004 > puts RubyVM::InstructionSequence.compile(code).disasm
44 | == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
45 | 0000 trace            1                                               (   1)
46 | 0002 putself          
47 | 0003 putobject_OP_INT2FIX_O_1_C_ 
48 | 0004 putobject        2
49 | 0006 opt_plus         <callinfo!mid:+, argc:1, ARGS_SKIP>
50 | 0008 opt_send_simple  <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>
51 | 0010 leave            
52 |  => nil
53 | ```
54 | 
55 | ### A significant performance boost
56 | 
57 | The introduction of the compilation step and YARV have significantly helped the execution speed of Ruby programs. However, there's always room for more improvements.
58 | 


--------------------------------------------------------------------------------
/draft/mri.md:
--------------------------------------------------------------------------------
  1 | # What is the MRI/YARV?
  2 | 
  3 | MRI, short for Matz's Ruby Interpreter, is the reference implementation of the Ruby programming language. It was released to the public in 1995 and is still actively developed; at the time of writing, the latest stable release is Ruby 2.1.1.
  4 | 
  5 | YARV is a bytecode virtual machine developed by Koichi Sasada, also known as KRI (Koichi's Ruby Interpreter). It was developed to reduce the execution time of Ruby programs and was very successful; as a result, YARV was merged into Ruby 1.9.0, replacing the original MRI interpreter.
  6 | 
  7 | As the default interpreter for the Ruby programming language, MRI has received its fair share of criticism, primarily for its execution speed and memory consumption. However, recent Ruby versions have brought significant improvements, and Ruby's performance is now on par with that of similar scripting languages such as Python. Moreover, some comparisons pit a compiled language against a scripting language, which is comparing apples to oranges. Developers typically choose Ruby for its ease of writing and prototyping, accepting that its execution time will _always_ be significantly slower than that of its compiled counterparts.
  8 | 
  9 | That being said, developers can follow numerous methods and best practices to avoid unnecessary bottlenecks.
 10 | 
 11 | ## Performance out of the box
 12 | 
 13 | Thanks to the introduction of YARV, vanilla Ruby running on a single thread can outperform alternative Ruby implementations. Consider the following figure, which measures Rails requests per second.
 14 | 
 15 | ![screen shot 2014-03-31 at 5 06 28 pm](https://cloud.githubusercontent.com/assets/1424573/2574040/8c8da974-b929-11e3-84c8-04d792bcbbd9.png)
 16 | http://www.isrubyfastyet.com/
 17 | 
 18 | ## Global Interpreter Lock
 19 | 
 20 | When attempting to optimize execution speed, developers often use threads to process tasks concurrently, and Ruby supports threads. However, MRI/YARV incorporates a Global Interpreter Lock, or GIL, that prevents true parallelism. A GIL is a single lock that a thread must hold in order to execute Ruby code, so only one thread runs Ruby code at any given moment. As a result, threads yield little to no speedup for CPU-bound work on a multiprocessor machine, although a thread waiting on blocking I/O releases the lock, so I/O-bound work can still overlap.
 21 | 
 22 | ![screen shot 2014-03-31 at 5 15 19 pm](https://cloud.githubusercontent.com/assets/1424573/2574079/5e0967fe-b92a-11e3-9806-65ea3d4d04cf.png)
 23 | http://www.igvita.com/2008/11/13/concurrency-is-a-myth-in-ruby/
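
The effect of the GIL can be sketched with the standard `Benchmark` module. Under MRI, the CPU-bound work below should take roughly as long on four threads as it does sequentially, while the four sleeping threads overlap because a waiting thread releases the lock. The workload sizes are arbitrary, and actual timings depend on the Ruby version and hardware, so none are given here.

```ruby
require 'benchmark'

# CPU-bound work: a deliberately slow recursive Fibonacci.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

WORK = [22, 22, 22, 22]

sequential = nil
threaded = nil
cpu_seq = Benchmark.realtime { sequential = WORK.map { |n| fib(n) } }
cpu_thr = Benchmark.realtime do
  threaded = WORK.map { |n| Thread.new { fib(n) } }.map(&:value)
end

# A sleeping thread (standing in for one blocked on I/O) releases the lock,
# so these four waits overlap instead of running back to back.
io_thr = Benchmark.realtime do
  4.times.map { Thread.new { sleep 0.2 } }.each(&:join)
end

puts format('CPU-bound: sequential %.3fs, 4 threads %.3fs', cpu_seq, cpu_thr)
puts format('4 threads sleeping 0.2s each took %.3fs', io_thr)
```

On an implementation without a GIL, such as Rubinius, the threaded CPU-bound timing may drop noticeably on a multicore machine.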
 24 | 
 25 | ### Why implement a GIL?
 26 | 
 27 | The GIL exists primarily to avoid race conditions within C extensions. There are also thread-safety reasons: parts of Ruby's core, such as Hash, are not thread safe, and neither are many of the C libraries that Ruby's internals wrap. Additionally, the GIL protects the integrity of the interpreter's own data, which shields developers from many (though not all) threading bugs; it does not, by itself, make application code thread safe.
 28 | 
 29 | Interestingly, this runs contrary to a fundamental principle of the Ruby language: that responsibility is laid on the developer. Ruby gives developers ultimate freedom without hand-holding, yet the GIL is exactly that, hand-holding.
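
The GIL does not make application code thread safe: a thread switch can still happen between two Ruby operations, so shared mutable state still needs explicit synchronization. A minimal sketch using the core `Mutex` class:

```ruby
# Ten threads increment a shared counter. `counter += 1` is a read followed
# by a write, so a thread switch between the two could lose an update; the
# mutex makes each increment atomic with respect to the other threads.
counter = 0
lock = Mutex.new

threads = Array.new(10) do
  Thread.new do
    1_000.times { lock.synchronize { counter += 1 } }
  end
end
threads.each(&:join)

puts counter # prints 10000
```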
 30 | 
 31 | ### The GIL is here to stay
 32 | 
 33 | The GIL is deeply intertwined with Ruby and its internals, and many influential Ruby core figures don't plan on removing it anytime in the near future. That does not mean, however, that parallel execution cannot be achieved.
 34 | 
 35 | ### Sidestep the GIL with multiple virtual machines
 36 | 
 37 | Koichi Sasada has proposed a Multiple VM (MVM) solution, which is currently under development. It would consist of multiple virtual machines, each running independently and communicating via sockets.
 38 | 
 39 | Granted, this is a drastic step away from typical threading, but some proponents believe that traditional threading isn't necessarily the correct paradigm to follow, especially since Ruby's threads, although they have been native OS threads since Ruby 1.9, can only run Ruby code one at a time under the GIL.
 40 | 
 41 | > Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads.
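
A minimal sketch of the multi-process approach, assuming a Unix-like system where `fork` is available (it is not on Windows). `parallel_map` is a hypothetical helper, not a standard-library method: each item is handled in its own child process, which sends its result back to the parent over a pipe.

```ruby
# Runs the block for each item in its own child process and collects the
# results through one pipe per child. Each child process has its own
# interpreter and its own GIL, so CPU-bound blocks can use several cores.
def parallel_map(items)
  workers = items.map do |item|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      writer.write(Marshal.dump(yield(item))) # serialize the result for the parent
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close # the parent only reads from this pipe
    [pid, reader]
  end

  workers.map do |pid, reader|
    result = Marshal.load(reader.read) # reads until the child closes its end
    reader.close
    Process.wait(pid)
    result
  end
end

squares = parallel_map([1, 2, 3, 4]) { |n| n * n }
p squares # prints [1, 4, 9, 16]
```

`Marshal` lets the block return any serializable Ruby object. Forking has a real startup and memory cost, so this only pays off when each item involves substantial work.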
 42 | 
 43 | ## Some simple code enhancements
 44 | 
 45 | ### String Optimization
 46 | 
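 47 | String interpolation significantly outperformed concatenation in the benchmark below (about 23 seconds versus 79 seconds for 20 million iterations). The concatenation version allocates three string objects on every iteration and then appends to the first one in place with `<<`. Because every interpolated value here is a string literal, Ruby may fold the whole interpolated expression into a single literal while parsing, so this benchmark likely overstates the advantage of interpolating variables.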
 48 | 
 49 | ```ruby
 50 | require 'benchmark'
 51 | 
 52 | concat_time = Benchmark.measure do
 53 |   20000000.times do
 54 |     str = 'str1' << 'str2' << 'str3'
 55 |   end
 56 | end
 57 | 
 58 | # => #<Benchmark::Tms:0x007fdf9ba49ea8 @label="", @real=79.152523, @cstime=0.0, @cutime=0.0, @stime=0.04000000000000001, @utime=79.11, @total=79.15> 
 59 | 
 60 | interp_time = Benchmark.measure do
 61 |   20000000.times do
 62 |     str = "#{ 'str1' }#{ 'str2' }#{ 'str3' }"
 63 |   end
 64 | end
 65 | 
 66 | # => #<Benchmark::Tms:0x007fdf9b990bd8 @label="", @real=22.713976, @cstime=0.0, @cutime=0.0, @stime=0.009999999999999995, @utime=22.689999999999998, @total=22.7>
 67 | ```
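
Because the benchmark above uses only string literals, a fairer comparison builds the same string from variables. The sketch below compares `+`, `<<`, and interpolation with the standard `Benchmark` module; the iteration count is arbitrary, and no timings are given because they depend on the Ruby version and hardware.

```ruby
require 'benchmark'

a = 'str1'
b = 'str2'
c = 'str3'
n = 500_000

plus   = -> { a + b + c }               # a + b allocates an intermediate string
append = -> { String.new(a) << b << c } # copies a once, then appends in place
interp = -> { "#{a}#{b}#{c}" }          # builds the result from the three parts

Benchmark.bm(6) do |x|
  x.report('+')      { n.times { plus.call } }
  x.report('<<')     { n.times { append.call } }
  x.report('interp') { n.times { interp.call } }
end
```

`String.new(a)` is used so that `<<` appends to a copy; appending to `a` directly would change it on every call.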
 68 | 
 69 | ### Blocks vs Procs
 70 | 
 71 | In the benchmarks below, passing an explicit block to `map` was faster than passing `&:id`. All three forms return a new array (`map` returns an enumerator only when it is called without a block), and `collect` is an alias of `map`, so the gap between the last two timings is most likely measurement noise. The `&:id` form is typically preferable to read; it was likely slower because `Symbol#to_proc` is called on the symbol to convert it into a proc, roughly as follows, and every element is then processed through that proc:
 72 | 
 73 | ```ruby
 74 | :id.to_proc
 75 | # behaves roughly like ->(x) { x.id }
 76 | ```
 77 | 
 78 | ```ruby
 79 | Fake = Struct.new(:id) # hypothetical stand-in for objects with an #id method
 80 | fake_data = 20.times.map { |t| Fake.new(t) }
 81 | proc_time = Benchmark.measure do
 82 |   200000000.times do
 83 |     fake_data.map(&:id)
 84 |   end
 85 | end
 86 | 
 87 | #  => #<Benchmark::Tms:0x007fdf9b8b0498 @label="", @real=491.332415, @cstime=0.0, @cutime=0.0, @stime=4.8, @utime=426.06999999999994, @total=430.86999999999995>
 88 | 
 89 | block_time = Benchmark.measure do
 90 |   200000000.times do
 91 |     fake_data.map { |d| d.id }
 92 |   end
 93 | end
 94 | 
 95 | # => #<Benchmark::Tms:0x007fdf9b931d40 @label="", @real=431.731424, @cstime=0.0, @cutime=0.0, @stime=2.66, @utime=416.21000000000004, @total=418.87000000000006>
 96 | 
 97 | collect_time = Benchmark.measure do
 98 |   200000000.times do
 99 |     fake_data.collect { |d| d.id }
100 |   end
101 | end
102 | 
103 | # => #<Benchmark::Tms:0x007fdf9b821518 @label="", @real=386.234513, @cstime=0.0, @cutime=0.0, @stime=1.1800000000000006, @utime=384.28, @total=385.46> 
105 | ```
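
A smaller sketch of the same comparison, with a hypothetical `Fake` struct standing in for the objects in the benchmark above. It also checks that all three forms return the same new array and that only a call to `map` without a block returns an enumerator; the iteration count is arbitrary.

```ruby
require 'benchmark'

Fake = Struct.new(:id) # hypothetical stand-in for objects with an #id method
data = Array.new(20) { |i| Fake.new(i) }

by_proc    = data.map(&:id)            # Symbol#to_proc
by_block   = data.map { |d| d.id }     # explicit block
by_collect = data.collect { |d| d.id } # collect is an alias of map

enum = data.map # no block given: returns an Enumerator, not an Array

n = 50_000
Benchmark.bm(7) do |x|
  x.report('&:id')    { n.times { data.map(&:id) } }
  x.report('block')   { n.times { data.map { |d| d.id } } }
  x.report('collect') { n.times { data.collect { |d| d.id } } }
end
```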
106 | 
107 | ### Modify Garbage Collection
108 | 
109 | ```shell
110 | export RUBY_HEAP_MIN_SLOTS=600000     # This is 60(!) times larger than default
111 | export RUBY_GC_MALLOC_LIMIT=59000000  # This is 7 times larger than default
112 | export RUBY_HEAP_FREE_MIN=100000      # This is 24 times larger than default
113 | ```
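
One way to check whether settings like these help a particular application is to count garbage-collection runs around a representative workload, then compare runs with and without the variables set. `gc_runs_during` is a hypothetical helper built on the core `GC.count` method, and the workload is a placeholder.

```ruby
# Counts how many garbage collections run while the block executes.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

# Placeholder workload: allocate many short-lived strings.
runs = gc_runs_during { 200_000.times { 'x' * 10 } }
puts "GC runs during workload: #{runs}"
```

Because the variables are read when the interpreter starts, they must be set in the environment of the `ruby` process itself, not assigned from within Ruby code.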
114 | 
115 | ### Use Unicorn
116 | 
117 | For Ruby on Rails web applications, the server often runs as a single process, which means that requests are processed one at a time. This can create a significant bottleneck in your application. Fortunately, there are libraries that bring concurrency to your application; Unicorn is one of them.
118 | 
119 | Unicorn uses Unix `fork` to create multiple worker processes within a single server (or, on Heroku, within a single dyno). Multiple OS processes can then respond to requests and complete tasks concurrently, which results in shorter queues, quicker responses, and a faster web application as a whole. The main drawback is memory usage, which can grow large. However, as hardware costs decrease, this often becomes a worthwhile expense in exchange for quick development of the software components. This approach also doesn't require thread-safe code, since each worker is a self-sufficient clone of the parent.
120 | 
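121 | Ruby 2.0 makes process forking with Unicorn even more efficient because its garbage collector is copy-on-write (CoW) friendly. With CoW, which the operating system provides for forked processes, a parent and child share physical memory pages until one of them writes to a page. Earlier Ruby versions stored garbage-collection mark bits inside the objects themselves, so a collection in a child wrote to nearly every shared page; Ruby 2.0 keeps these bits in a separate bitmap, which can drastically reduce memory use.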
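
The pre-forking model can be sketched in plain Ruby, assuming a Unix-like system with `fork`. This toy is not Unicorn's implementation (Unicorn's workers accept connections from a shared listening socket), and `APP`, `handle`, and `prefork` are hypothetical names. The parent loads its state once; each forked worker inherits that state through copy-on-write and handles its share of the requests on its own, without locks.

```ruby
# Loaded once in the parent; forked workers share these pages copy-on-write.
APP = { greeting: 'hello' }.freeze

def handle(request_id)
  "#{APP[:greeting]} from worker #{Process.pid} for request #{request_id}"
end

# Forks one worker per slice of requests and collects the responses the
# workers write, one complete line per write, to a shared pipe.
def prefork(worker_count, requests)
  reader, writer = IO.pipe
  writer.sync = true # each write goes straight to the pipe
  slice_size = (requests.size / worker_count.to_f).ceil
  pids = requests.each_slice(slice_size).map do |slice|
    fork do
      reader.close
      slice.each { |id| writer.write("#{handle(id)}\n") }
      writer.close
      exit!(0)
    end
  end
  writer.close # EOF arrives once every worker has closed its copy
  responses = reader.each_line.map(&:chomp)
  reader.close
  pids.each { |pid| Process.wait(pid) }
  responses
end

responses = prefork(3, (1..6).to_a)
puts responses
```

Short lines written in a single call are delivered to a pipe atomically, which keeps responses from different workers from interleaving mid-line.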
122 | 
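123 | Memory leaks can still occur, and workers can get stuck or time out. The unicorn-worker-killer gem, together with the small snippet below (placed in `config.ru`), covers these edge cases by restarting a worker once it has served a certain number of requests or once its memory use (RSS) exceeds a limit; each limit is picked at random between the given minimum and maximum so that workers do not all restart at once.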
124 | 
125 | ```ruby
126 | # --- Start of unicorn worker killer code ---
127 | 
128 | if ENV['RAILS_ENV'] == 'production' 
129 |   require 'unicorn/worker_killer'
130 | 
131 |   max_request_min =  500
132 |   max_request_max =  600
133 | 
134 |   # Max requests per worker
135 |   use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max
136 | 
137 |   oom_min = (240) * (1024**2)
138 |   oom_max = (260) * (1024**2)
139 | 
140 |   # Max memory size (RSS) per worker
141 |   use Unicorn::WorkerKiller::Oom, oom_min, oom_max
142 | end
143 | 
144 | # --- End of unicorn worker killer code ---
145 | 
146 | require ::File.expand_path('../config/environment',  __FILE__)
147 | run YourApp::Application
148 | ```
149 | 
150 | ###### GIL
151 | 
152 |   - https://mail.python.org/pipermail/python-3000/2007-May/007414.html
153 |   - http://archive.is/yCFB
154 |   - http://archive.is/X1kh
155 |   - http://www.confreaks.com/videos/1272-rubyconf2012-implementation-details-of-ruby-2-0-vm
156 |   - https://news.ycombinator.com/item?id=3070382
157 |   - http://merbist.com/2011/10/18/data-safety-and-gil-removal/
158 |   - http://merbist.com/2011/10/03/about-concurrency-and-the-gil/
159 | 
160 | ###### GC
161 | 
162 |   - http://www.rubyenterpriseedition.com/documentation.html
163 |   - https://lightyearsoftware.com/2012/11/speed-up-mri-ruby-1-9/
164 | 
165 | ###### Unicorn
166 | 
167 |   - https://www.digitalocean.com/community/articles/how-to-optimize-unicorn-workers-in-a-ruby-on-rails-app
168 | 
169 | ###### Use Ruby Threads and Fibers
170 | 
171 |   - http://merbist.com/2011/02/22/concurrency-in-ruby-explained/
172 | 
173 | ###### Fine Tune Your Objects
174 | 
175 |   - http://patshaughnessy.net/2013/2/8/ruby-mri-source-code-idioms-3-embedded-objects
176 | 
177 | ###### Code optimizations
178 | 
179 |   - http://www.ruby-doc.org/core-2.1.1/Array.html#M000249
180 | 


--------------------------------------------------------------------------------
/draft/rubinius.md:
--------------------------------------------------------------------------------
 1 | ## What is Rubinius?
 2 | Rubinius is an implementation of the Ruby programming language and includes a bytecode virtual machine, Ruby syntax parser, bytecode compiler, generational garbage collector, just-in-time (JIT) native machine code compiler, and Ruby Core and Standard Libraries. [1](http://rubini.us/doc) Rubinius is written using Ruby and C++.
 3 | 
 4 | ## History
 5 | Rubinius was originally created to be a Ruby virtual machine and runtime written in pure Ruby, whereas the standard Ruby interpreter is primarily written in non-Ruby languages such as C. From 2007 to 2013, the software company Engine Yard was a primary backer of Rubinius. During that time the focus of Rubinius evolved from creating a completely bootstrapped Ruby VM to offering an implementation of Ruby with increased performance. Under this new direction, Rubinius partially abandoned the idea of bootstrapping the Ruby VM entirely in Ruby code, and instead sought to use C++ to increase performance and establish Rubinius as the fastest Ruby implementation. Sadly, this effort was not successful, as shown below. [2](http://programmingzen.com/2010/07/19/the-great-ruby-shootout-july-2010/)
 6 | 
 7 | https://www.dropbox.com/s/2og3qad0d05wryo/Screenshot%202014-03-31%2017.28.28.png
 8 | 
 9 | The one area in which Rubinius has shone is its support for concurrency and multi-threading, which has become the focus of the Rubinius project.
10 | 
11 | https://www.dropbox.com/s/ahdzxveyhcubg89/Screenshot%202013-11-06%2013.12.50.png?m=
12 | https://www.dropbox.com/s/dx5s3zbntfsis04/Screenshot%202014-03-31%2017.36.58.png
13 | 
14 | ## How does Rubinius Work?
15 | 
16 | 


--------------------------------------------------------------------------------
/final/final.md:
--------------------------------------------------------------------------------
  1 | # Ruby Language Optimization Techniques
  2 | 
  3 | NICHOLAS BENDER, Boise State University
  4 | BEN NEELY, Boise State University
  5 | JOHN OTANDER, Boise State University
  6 | 
  7 | The Ruby programming language has experienced a recent period of intense adoption and growth due to its excellent speed of iteration, elegant syntax, and passionate community. Additionally, the popular web framework Ruby on Rails has given the Ruby language exceptional legitimacy, especially in the prototyping and startup space. Ruby is a tool that emphasizes developer happiness and productivity, and it places responsibility for the program in the developer's hands. This gives the language a lot of power, but it can be a double-edged sword: when leveraged incorrectly, projects can swiftly become inefficient and unmaintainable. This flexibility also has serious implications for memory management, efficiency, and execution times.
  8 | 
  9 | While support for the language is growing steadily, it is largely dismissed as not scaling effectively and as having far slower runtimes than compiled, statically typed languages. In this article, we propose that many sophisticated techniques exist to enhance Ruby’s performance, both by using existing runtimes that compile Ruby to bytecode for other platforms, such as the JVM, and by avoiding common anti-patterns to improve performance natively. Through experimentation and thorough research, we conclude that Ruby performs competitively against its similar scripting-language counterparts and can see large performance gains in many cases.
 10 | 
 11 | __Categories and Subject Descriptors:__ D.2.3 [Coding Tools and Techniques]: Object-oriented programming, B.6.3 [Design Aids]: Optimization<br>
 12 | __General Terms:__ Optimization, Algorithms, Performance<br>
 13 | __Additional Key Words and Phrases:__ Ruby, Web Development, JRE, C++, C
 14 | 
 15 | ## INTRODUCTION
 16 | 
 17 | Ruby is an object-oriented, dynamically typed, high-level scripting language. It is a programming language that was written for humans and just happens to run on computers. It's intended to promote developer happiness through simplicity, elegant libraries, and terse, readable syntax. Ruby also uses duck typing, meaning that whether an object can be used in a given place is determined by the methods it responds to rather than by its class. Each of these language features comes with trade-offs. In this exploration we conclude that stable, performant Ruby programs are best achieved by properly using the newest versions of the core language.
 18 | 
 19 | In recent years, the Ruby programming language has grown its community and established itself as a valuable, popular tool for many tasks [O’Donoghue, 2014]. The success of Ruby on Rails as a prototyping framework, as well as a full-stack solution for some larger companies, has brought forth a myriad of techniques for minimizing the language’s speed differences compared to similar languages. Ruby’s slower performance, as compared to C or Java, is attributed to interpreted execution, dynamic typing, meta-programming support, and the Global Interpreter Lock [Odaira, Castanos, and Tomari, 2014]. This increase in popularity has given rise to a large number of independent optimization efforts, both from large corporations such as IBM and AT&T and from the Ruby open-source community.
 20 | 
 21 | > When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck. <br>_- Heim, Michael (2007)._
 22 | 
 23 | ## 1. MRI (< 1.9)
 24 | 
 25 | MRI is short for Matz's Ruby Interpreter, and is sometimes also referred to as CRuby. It is named after Yukihiro "Matz" Matsumoto, the chief designer of the Ruby language. The original MRI was the runtime environment from Ruby's inception through version 1.8.7.
 26 | 
 27 | ### 1.1 Program Execution at a High Level
 28 | 
 29 | ```
 30 | | --------- |
 31 | |   Ruby    |
 32 | | --------- |
 33 | |  Tokens   |
 34 | | --------- |
 35 | | AST Nodes |
 36 | | --------- |
 37 |      |
 38 |      | Interpret
 39 |      |
 40 |      v
 41 | | --------- |
 42 | |     C     |
 43 | | --------- |
 44 | |  Machine  |
 45 | |  Language |
 46 | | --------- |
 47 | ```
 48 | <div class="figure">Fig. 1. Ruby Program Execution</div>
 49 | 
 50 | A Ruby script is first tokenized, and the tokens are then parsed into an Abstract Syntax Tree (AST). MRI's C code then reads and executes the AST directly. Note that there is no bytecode compilation or translation step.
 51 | 
 52 | ### 1.2 Performance
 53 | 
 54 | Since there is no bytecode compilation step, executing a Ruby program requires walking MRI's internal Abstract Syntax Tree. This slows execution significantly, because the interpreter must traverse the tree and dispatch on the type of every node each time that code runs, which is more costly than executing a flat sequence of precompiled instructions.
 55 | 
 56 | ![ch_abstract_syntree](https://cloud.githubusercontent.com/assets/1424573/2803533/3c359946-cc9d-11e3-9b35-217ccda504df.png)
 57 | <div class="figure">Fig. 2. Abstract Syntax Tree <http://edwinmeyer.com/Release_Integrated_RHG_09_10_2008/intro.html></div>
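
The difference between walking a tree and executing compiled instructions can be sketched with a toy arithmetic language. The node format and the method names are hypothetical, and this is not how MRI or YARV are implemented; it only shows that the tree is re-traversed on every evaluation, whereas compilation flattens it once into an instruction list that a simple loop can execute.

```ruby
# Toy AST (hypothetical format): [:lit, n], [:add, left, right], [:mul, left, right].

# Tree-walking evaluation: the whole tree is traversed every time it runs.
def walk(node)
  case node[0]
  when :lit then node[1]
  when :add then walk(node[1]) + walk(node[2])
  when :mul then walk(node[1]) * walk(node[2])
  else raise ArgumentError, "unknown node #{node[0].inspect}"
  end
end

# Compilation: flatten the tree once into stack-machine instructions.
def compile(node, code = [])
  case node[0]
  when :lit
    code << [:push, node[1]]
  when :add, :mul
    compile(node[1], code)
    compile(node[2], code)
    code << [node[0]]
  else
    raise ArgumentError, "unknown node #{node[0].inspect}"
  end
  code
end

# Execution: a simple loop over the flat instruction list.
def execute(code)
  stack = []
  code.each do |op, arg|
    case op
    when :push then stack.push(arg)
    when :add  then right = stack.pop; stack.push(stack.pop + right)
    when :mul  then right = stack.pop; stack.push(stack.pop * right)
    end
  end
  stack.pop
end

tree = [:mul, [:add, [:lit, 1], [:lit, 2]], [:lit, 4]] # (1 + 2) * 4
puts walk(tree)             # prints 12
p compile(tree)             # prints [[:push, 1], [:push, 2], [:add], [:push, 4], [:mul]]
puts execute(compile(tree)) # prints 12
```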
 58 | 
 59 | ### 1.3 Optimizations
 60 | 
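 61 | Where it is safe to do so, use receiver-modifying ("bang") methods such as `String#chomp!`, because they modify the receiver in place instead of allocating a modified copy. As Fig. 3 shows, this has a side effect: every variable that references the same object sees the change.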
 62 | 
 63 | ```
 64 | 2.1.1 :003 > str = "A string.\n"
 65 |  => "A string.\n"
 66 | 2.1.1 :004 > str2 = str
 67 |  => "A string.\n"
 68 | 2.1.1 :005 > str.chomp!
 69 |  => "A string."
 70 | 2.1.1 :006 > str2
 71 |  => "A string."
 72 | 2.1.1 :007 >
 73 | ```
 74 | <div class="figure">Fig. 3. Receiver modifying methods vs receiver duplicating methods</div>
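
A short sketch of the trade-off shown in Fig. 3: `String#chomp` allocates a new string, while `String#chomp!` changes the receiver, so every reference to it sees the change. Bang methods also return `nil` when nothing was modified, which makes them unsafe to chain.

```ruby
original = "A string.\n"
alias_ref = original         # a second reference to the same object

copy = original.chomp        # non-bang: allocates and returns a new String
puts copy.equal?(original)   # prints false
puts original.inspect        # prints "A string.\n"

result = original.chomp!     # bang: removes the newline in place
puts result.equal?(original) # prints true
puts alias_ref.inspect       # prints "A string." (every reference sees the change)

# Bang methods return nil when nothing changes.
puts original.chomp!.inspect # prints nil

# If other code must not see a change, modify a copy instead.
safe = alias_ref.dup
safe << '!'
puts alias_ref.inspect       # prints "A string."
```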
 75 | 
 76 | ### 1.4 Summary
 77 | 
 78 | The initial implementation of MRI is one of the primary reasons that Ruby got its "bad rap" for code execution speed.
 79 | 
 80 | ## 2. JRUBY
 81 | 
 82 | ### 2.1 Purpose
 83 | 
 84 | JRuby endeavors to solve many Ruby performance issues by replacing the standard interpreter: it takes Ruby code and compiles as much of it as possible, including the core libraries, to Java bytecode. Current versions of JRuby support both just-in-time and ahead-of-time compilation to Java bytecode. Combining these compilation stages with interpretation of the remaining code gives JRuby several advantages over the standard interpreter.
 85 | 
 86 | One of the more obvious improvements is the ability to call and use standard Java libraries and classes from within Ruby projects. For larger organizations that already use Java for core libraries, this allows for a more flexible development environment.
 87 | 
 88 | ### 2.2 Performance
 89 | 
 90 | In 2007, JRuby’s overall performance was compared with Ruby 1.8.5, the YARV interpreter (now merged into Ruby’s official interpreter), and Rubinius. JRuby outperformed standard Ruby in only 10% of the tests performed. However, JRuby still managed to run all of the Ruby benchmarks without timing out or producing an error, a claim that no other non-standard Ruby implementation could make.
 91 | 
 92 | Recent benchmarks performed in 2014 show the latest implementations of JRuby performing comparably to standard Ruby. While JRuby was faster in some benchmarks, its increased memory overhead (>10x) makes scaling Ruby applications problematic.
 93 | 
 94 | In addition to JRuby's memory woes, its biggest performance downside is the time it takes to start the JVM. A simple Ruby script that MRI runs in a fraction of a second can take several additional seconds under JRuby because of JVM launch time.
 95 | 
 96 | ### 2.3 Lack of C Support
 97 | 
 98 | While JRuby offers enhanced support for, and compatibility with, Java libraries and applets, the majority of Ruby users (especially those using Ruby on Rails) rely on libraries that contain native C extensions. By targeting the JVM, JRuby gives up compatibility with these native C extensions, most notably a variety of database interfaces and web servers.
 99 | 
100 | ### 2.4 Development Lag
101 | 
102 | Because JRuby can only implement a Ruby version after that version has been released, it has an unfortunately long lag time: the most recent release of JRuby only supports Ruby version 1.9.3, which was initially released in 2011.
103 | 
104 | ### 2.5 Summary
105 | 
106 | While JRuby does offer improved benchmark performance in a minority of cases, its slow development cycle and potential for a massive increase in memory footprint make it an unsuitable option for pure Ruby development stacks.
107 | 
108 | ## 3. Rubinius
109 | 
110 | ### 3.1 Purpose
111 | 
112 | Rubinius is an implementation of the Ruby programming language and includes a bytecode virtual machine, Ruby syntax parser, bytecode compiler, generational garbage collector, just-in-time (JIT) native machine code compiler, and Ruby Core and Standard Libraries. Rubinius is written using Ruby and C++.
113 | 
114 | ### 3.2 History
115 | 
116 | Rubinius was originally created to be a Ruby virtual machine and runtime written in pure Ruby, whereas the standard Ruby interpreter is primarily written in non-Ruby languages such as C. From 2007 to 2013, the software company Engine Yard was a primary backer of Rubinius. During that time the focus of Rubinius evolved from creating a completely bootstrapped Ruby VM to offering an implementation of Ruby with increased performance. Under this new direction, Rubinius partially abandoned the idea of bootstrapping the Ruby VM entirely in Ruby code, and instead sought to use C++ to increase performance and establish Rubinius as the fastest Ruby implementation. Recently, Rubinius has focused on supporting concurrency and multi-threading.
117 | 
118 | ### 3.3 Performance
119 | 
120 | Rubinius initially achieved performance equal to or slightly better than that of the YARV interpreter. However, in recent years the MRI interpreter has consistently outperformed Rubinius on most benchmark tests.
121 | 
122 | Rubinius consistently benchmarks as one of the slowest modern implementations of the Ruby language.
123 | 
124 | ### 3.4 Concurrency
125 | 
126 | Rubinius does outperform MRI in threading and concurrency benchmark tests. As shown in the figure below, Rubinius (represented by rbx-2.0.0) has a nontrivial advantage over MRI and other Ruby implementations when executing multithreaded code.
127 | Unlike MRI, Rubinius does not have a Global Interpreter Lock (GIL). MRI's GIL allows only one thread to execute Ruby code at a time, no matter how many processor cores are available. Not implementing a GIL gives Rubinius the ability to run threads truly in parallel.
128 | 
129 | ### 3.5 Summary
130 | 
131 | Rubinius’ development has been spotty, depending heavily on a few developers and a few corporate sponsors. As a result, Rubinius has constantly shifted focus. Rubinius currently offers a significant advantage over other Ruby interpreters only for programs involving threading and concurrency. For all other uses, the standard MRI Ruby interpreter is faster and more consistently supported.
132 | 
133 | ## 4. YARV
134 | 
135 | ### 4.1 Background
136 | 
137 | ```
138 | CODE => TOKENIZATION => PARSE TREE => COMPILATION => YARV INSTRUCTIONS
139 | ```
140 | 
141 | When a Ruby program is executed, the interpreter first tokenizes it: the source text is converted into a sequence of tokens, each with an associated type. Ruby uses an LALR (Look-Ahead, Left-to-right, Rightmost derivation in reverse) parser, generated from the grammar in `parse.y`, to give meaning to the tokens and construct the Abstract Syntax Tree. The compilation step was introduced in Ruby 1.9, and is where YARV (Yet Another Ruby Virtual Machine) comes into play: it translates the AST into bytecode, or YARV instructions.
142 | 
143 | ```
144 | ~|||$ irb
145 | 2.1.1 :001 > code = <<CODE
146 | 2.1.1 :002"> puts 1 + 2
147 | 2.1.1 :003"> CODE
148 |  => "puts 1 + 2\n"
149 | 2.1.1 :004 > puts RubyVM::InstructionSequence.compile(code).disasm
150 | == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
151 | 0000 trace            1                                               (   1)
152 | 0002 putself
153 | 0003 putobject_OP_INT2FIX_O_1_C_
154 | 0004 putobject        2
155 | 0006 opt_plus         <callinfo!mid:+, argc:1, ARGS_SKIP>
156 | 0008 opt_send_simple  <callinfo!mid:puts, argc:1, FCALL|ARGS_SKIP>
157 | 0010 leave
158 |  => nil
159 | ```
160 | <div class="figure">Fig. 4. YARV instructions for a simple program</div>
161 | 
162 | The introduction of the compilation step and YARV significantly improved the execution speed of Ruby programs. However, there is always room for further improvement.
163 | 
164 | ### 4.2 Purpose
165 |  
166 | MRI, short for Matz's Ruby Interpreter, is the reference implementation of the Ruby programming language. It was released to the public in 1995 and is still actively developed; at the time of writing, the latest stable release is Ruby 2.1.1.
167 | 
168 | YARV is a bytecode virtual machine developed by Koichi Sasada, also known as KRI (Koichi's Ruby Interpreter). It was developed to reduce the execution time of Ruby programs and was very successful; as a result, YARV was merged into Ruby 1.9.0, replacing the original MRI interpreter.
169 | 
170 | As the default interpreter for the Ruby programming language, MRI has received its fair share of criticism, primarily for its execution speed and memory consumption. However, recent Ruby versions have brought significant improvements, and Ruby's performance is now on par with that of similar scripting languages such as Python. Moreover, some comparisons pit a compiled language against a scripting language, which is comparing apples to oranges. Developers typically choose Ruby for its ease of writing and prototyping, accepting that its execution time will always be significantly slower than that of its compiled counterparts.
171 | 
172 | That being said, developers can follow numerous methods and best practices to avoid unnecessary bottlenecks.
173 | 
174 | ### 4.3 Performance Out of the Box
175 | 
176 | Thanks to the introduction of YARV, vanilla Ruby running on a single thread can outperform alternative Ruby implementations. Consider the following figure, which measures Rails requests per second.
177 | 
178 | ![screen shot 2014-05-02 at 6 22 04 pm](https://cloud.githubusercontent.com/assets/1424573/2869079/0305431c-d259-11e3-8b58-6f2ea6ff23e9.png)
179 | <div class="figure">Fig. 5. Rails requests per second.</div>
180 | 
181 | ### 4.4 Global Interpreter Lock
182 | 
183 | When attempting to optimize execution speed, developers often use threads to process tasks concurrently, and Ruby supports threads. However, MRI/YARV incorporates a Global Interpreter Lock, or GIL, that prevents true parallelism. A GIL is a single lock that a thread must hold in order to execute Ruby code, so only one thread runs Ruby code at any given moment. As a result, threads yield little to no speedup for CPU-bound work on a multiprocessor machine, although a thread waiting on blocking I/O releases the lock, so I/O-bound work can still overlap.
184 | 
185 | The GIL exists primarily to avoid race conditions within C extensions. There are also thread-safety reasons: parts of Ruby's core, such as Hash, are not thread safe, and neither are many of the C libraries that Ruby's internals wrap. Additionally, the GIL protects the integrity of the interpreter's own data, which shields developers from many (though not all) threading bugs; it does not, by itself, make application code thread safe.
186 | 
187 | Interestingly, this runs contrary to a fundamental principle of the Ruby language: that responsibility is laid on the developer. Ruby gives developers ultimate freedom without hand-holding, yet the GIL is exactly that, hand-holding.
188 | 
189 | The GIL isn’t going anywhere: it is deeply intertwined with Ruby and its internals, and many influential Ruby core figures don't plan on removing it anytime in the near future. That does not mean, however, that parallel execution cannot be achieved.
190 | 
191 | One way to sidestep the GIL is to use multiple virtual machines. Koichi Sasada has proposed a Multiple VM (MVM) solution, which is currently under development. It would consist of multiple virtual machines, each running independently and communicating via sockets.
192 | 
193 | Granted, this is a drastic step away from typical threading, but some proponents believe that traditional threading isn't necessarily the correct paradigm to follow, especially since Ruby's threads, although they have been native OS threads since Ruby 1.9, can only run Ruby code one at a time under the GIL.
194 | 
195 | > Nevertheless, you're right the GIL is not as bad as you would initially think: you just have to undo the brainwashing you got from Windows and Java proponents who seem to consider threads as the only way to approach concurrent activities. Just because Java was once aimed at a set-top box OS that didn't support multiple address spaces, and just because process creation in Windows used to be slow as a dog, doesn't mean that multiple processes (with judicious use of IPC) aren't a much better approach to writing apps for multi-CPU boxes than threads.
196 | 
197 | ### 4.5 Simple Code Enhancements
198 | 
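199 | String interpolation significantly outperformed concatenation in the benchmark below (about 23 seconds versus 79 seconds for 20 million iterations). The concatenation version allocates three string objects on every iteration and then appends to the first one in place with `<<`. When every interpolated value is a string literal, Ruby may fold the whole interpolated expression into a single literal while parsing, so a literal-only benchmark likely overstates the advantage of interpolating variables.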
200 | 
201 | ```ruby
202 | require 'benchmark'
203 | 
204 | concat_time = Benchmark.measure do
205 |   20000000.times do
206 |     str = 'str1' << 'str2' << 'str3'
207 |   end
208 | end
209 | 
210 | # => #<Benchmark::Tms:0x007fdf9ba49ea8 @label="", @real=79.152523, @cstime=0.0, @cutime=0.0, @stime=0.04000000000000001, @utime=79.11, @total=79.15>
211 | 
212 | interp_time = Benchmark.measure do
213 |   20000000.times do
214 |     str = "#{ 'str1' }#{ 'str2' }#{ 'str3' }"
215 |   end
216 | end
217 | 
218 | # => #<Benchmark::Tms:0x007fdf9b990bd8 @label="", @real=22.713976, @cstime=0.0, @cutime=0.0, @stime=0.009999999999999995, @utime=22.689999999999998, @total=22.7>
219 | ```
220 | <div class="figure">Fig. 6. Interpolation vs concatenation of Ruby Strings</div>
221 | 
222 | In the benchmarks below, passing an explicit block to `map` was faster than passing `&:id`. All three forms return a new array (`map` returns an enumerator only when it is called without a block), and `collect` is an alias of `map`, so the gap between the last two timings is most likely measurement noise. The `&:id` form is typically preferable to read; it was likely slower because `Symbol#to_proc` is called on the symbol to convert it into a proc, roughly as follows, and every element is then processed through that proc:
223 | 
224 | ```ruby
225 | :id.to_proc
226 | # behaves roughly like ->(x) { x.id }
227 | Fake = Struct.new(:id) # hypothetical stand-in for objects with an #id method
228 | fake_data = 20.times.map { |t| Fake.new(t) }
229 | proc_time = Benchmark.measure do
230 |   200000000.times do
231 |     fake_data.map(&:id)
232 |   end
233 | end
234 | 
235 | #  => #<Benchmark::Tms:0x007fdf9b8b0498 @label="", @real=491.332415, @cstime=0.0, @cutime=0.0, @stime=4.8, @utime=426.06999999999994, @total=430.86999999999995>
236 | ```
237 | 
238 | ```ruby
239 | block_time = Benchmark.measure do
240 |   200000000.times do
241 |     fake_data.map { |d| d.id }
242 |   end
243 | end
244 | 
245 | # => #<Benchmark::Tms:0x007fdf9b931d40 @label="", @real=431.731424, @cstime=0.0, @cutime=0.0, @stime=2.66, @utime=416.21000000000004, @total=418.87000000000006>
246 | ```
247 | 
248 | ```ruby
249 | collect_time = Benchmark.measure do
250 |   200000000.times do
251 |     fake_data.collect { |d| d.id }
252 |   end
253 | end
254 | 
255 | # => #<Benchmark::Tms:0x007fdf9b821518 @label="", @real=386.234513, @cstime=0.0, @cutime=0.0, @stime=1.1800000000000006, @utime=384.28, @total=385.46>
257 | ```
258 | <div class="figure">Fig. 7. Procs vs Blocks vs Collects</div>
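The sketch below checks that the three forms in Fig. 7 produce the same array, and that `map` returns an enumerator only when it is called without a block. The `Item` struct is a hypothetical stand-in for the `Fake` class, whose definition is not shown.

```ruby
# Hypothetical stand-in for the Fake class used in Fig. 7.
Item = Struct.new(:id)
items = 20.times.map { |t| Item.new(t) }

to_proc_ids = items.map(&:id)            # block derived from Symbol#to_proc
block_ids   = items.map { |d| d.id }     # explicit block
collect_ids = items.collect { |d| d.id } # collect is an alias of map

id_proc = :id.to_proc                    # the proc that &:id stands for
puts id_proc.call(items.first)           # prints 0
puts to_proc_ids.class                   # prints Array
puts items.map.class                     # prints Enumerator (no block given)
```

All three forms return equal arrays. The choice between them is therefore about readability versus the small per-call overhead measured in Fig. 7.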
259 | 
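260 | Ruby's garbage collector can also be tuned through environment variables. Raising the initial heap size and the malloc limit above their defaults makes the collector run less often. This trades higher memory use for less time spent in garbage collection, and whether it pays off depends on the application's allocation pattern. These settings target the Ruby 1.9.3/2.0 garbage collector. Ruby 2.1 renamed several of the variables; for example, `RUBY_HEAP_MIN_SLOTS` became `RUBY_GC_HEAP_INIT_SLOTS`.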
261 | 
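262 | ```shell
263 | # Initial number of heap slots; 60(!) times larger than the default of 10000
264 | export RUBY_HEAP_MIN_SLOTS=600000
265 | 
266 | # Bytes malloc'd before a GC is triggered; about 7 times larger than the default of 8000000
267 | export RUBY_GC_MALLOC_LIMIT=59000000
268 | 
269 | # Minimum number of free slots kept after a GC; about 24 times larger than default
270 | export RUBY_HEAP_FREE_MIN=100000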
272 | <div class="figure">Fig. 8. Garbage Collection Modification</div>
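You can check whether such tuning helps a particular application by counting collections. `GC.count` returns the number of garbage collections since the process started. The helper name `gc_runs_during` and the workload (allocating 500,000 plain objects) are hypothetical. Run the script once with the default environment and once with the variables above exported to get a rough before-and-after comparison.

```ruby
# Hypothetical helper: number of garbage collections that run during the block.
def gc_runs_during
  before = GC.count
  yield
  GC.count - before
end

# Hypothetical workload: allocate many short-lived objects.
runs = gc_runs_during do
  500_000.times { Object.new }
end

puts "GC runs during workload: #{runs}"
```

A lower count under the tuned settings suggests that the collector is running less often, at the cost of a larger heap.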
273 | 
274 | ### 4.6 Use Unicorn
275 | 
276 | In a typical simple deployment, a Ruby on Rails web application runs in a single server process, so requests are processed one at a time. This can create a significant bottleneck. Fortunately, there are libraries that add concurrency to an application; one of them is Unicorn.
277 | 
278 | Unicorn uses Unix `fork` to create multiple worker processes from a single master process (for example, inside a single Heroku dyno). Each worker is a separate operating system process that can respond to requests, so several requests are handled concurrently. This results in shorter request queues, quicker responses, and a faster web application as a whole. The main drawback is memory usage: every worker holds its own copy of the application, so the total can grow large. With decreasing hardware costs, however, the extra memory is often a worthwhile expenditure in exchange for quicker development. The application code does not need to be thread safe, since each worker is a self-sufficient clone of the parent.
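Plain Ruby can illustrate the pre-fork model. The sketch below is not Unicorn's implementation. It only shows a parent forking a fixed number of workers (three, an arbitrary choice), each running as its own OS process with its own PID, and the parent waiting for them. It requires a platform that supports `fork` (Unix-like systems, not Windows).

```ruby
# Minimal sketch of the pre-fork model (illustrative, not Unicorn's code).
WORKERS = 3 # arbitrary worker count for the example

reader, writer = IO.pipe

pids = Array.new(WORKERS) do |i|
  fork do
    # Child: a copy of the parent process. Writes stay private to this
    # process, so workers need no locking between each other.
    reader.close
    writer.puts "worker #{i} running as pid #{Process.pid}"
    writer.close
  end
end

# Parent: close its write end so reading stops at EOF once all children exit.
writer.close
pids.each { |pid| Process.wait(pid) }

reports = reader.readlines.map(&:chomp)
reader.close
puts reports.sort
```

Each reported PID differs from the parent's and from the other workers', which is what lets the workers serve requests independently.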
279 | 
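280 | Ruby 2.0 makes process forking with Unicorn more memory efficient because its garbage collector is copy-on-write (CoW) friendly. After a fork, the operating system lets the parent and child share physical memory pages until one of them writes to a page. Earlier versions of Ruby stored garbage collection mark bits inside each object, so the first collection in a worker wrote to, and therefore copied, most of the shared heap. Ruby 2.0's bitmap marking keeps those bits in a separate table, so far more memory stays shared, which can drastically reduce memory use.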
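The following is a minimal sketch of a Unicorn configuration file that takes advantage of copy-on-write. Placing it at `config/unicorn.rb` is a common convention, assumed here. `preload_app true` loads the application in the master before forking, so workers start out sharing its memory pages. `worker_processes`, `timeout`, `preload_app`, `before_fork` and `after_fork` are Unicorn configuration directives. The values (3 workers, 30 seconds) are illustrative assumptions, and the ActiveRecord hooks assume a Rails application that uses ActiveRecord.

```ruby
# config/unicorn.rb (illustrative values; tune for the host's memory)
worker_processes 3 # number of forked worker processes
timeout 30         # the master kills a worker stuck on a request longer than this (seconds)
preload_app true   # load the app in the master so workers share its pages via CoW

before_fork do |server, worker|
  # Database connections must not be shared across forked processes.
  defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each worker opens its own database connection after forking.
  defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection
end
```

This file is evaluated by Unicorn's configurator, not run as a standalone script.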
281 | 
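282 | Memory can still become a problem. A worker's memory use may grow over time, for example because of leaks in application code. Unicorn kills workers that exceed its timeout, but it does not restart workers whose memory keeps growing. The unicorn-worker-killer gem covers these cases by restarting a worker once it has served a maximum number of requests or once its resident memory (RSS) exceeds a limit. Each limit is picked randomly between a minimum and a maximum, so that workers do not all restart at the same time. The gem is enabled in the application's `config.ru`: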
283 | 
284 | ```ruby
285 | if ENV['RAILS_ENV'] == 'production' # in config.ru; enabled only in production
286 |   require 'unicorn/worker_killer'
287 | 
288 |   max_request_min = 500
289 |   max_request_max = 600
290 | 
291 |   # Restart each worker after a random number of requests between these bounds
292 |   use Unicorn::WorkerKiller::MaxRequests, max_request_min, max_request_max
293 | 
294 |   oom_min = 240 * (1024**2) # 240 MiB in bytes
295 |   oom_max = 260 * (1024**2) # 260 MiB in bytes
296 | 
297 |   # Restart a worker once its memory size (RSS) exceeds a random limit between these bounds
298 |   use Unicorn::WorkerKiller::Oom, oom_min, oom_max
299 | end
300 | 
301 | require ::File.expand_path('../config/environment', __FILE__)
302 | run YourApp::Application # YourApp stands for the application's module name
303 | ```
304 | <div class="figure">Fig. 9. Example Unicorn Implementation</div>
305 | 
306 | ## CONCLUSIONS
307 | 
308 | In this article, we examined a number of independent Ruby optimization efforts, each of which seeks to improve performance through a different set of techniques. In our examination, we determined that each of these techniques involves certain sacrifices that outweigh the marginal benefits gained. Unless a particular feature is needed (such as full threading support or inline Java), the best practice for stable, performant Ruby code is to use the newest version of the core language.
309 | 
310 | ## ACKNOWLEDGMENTS
311 | 
312 | The authors would like to thank Douglas Wiegley. He knows what he did.
313 | 
314 | ## REFERENCES
315 | 
316 | ROBERT O'DONOGHUE. 2014. Careers Close-up: programmers and software engineers. (March 2014). Retrieved March 31, 2014 from http://www.siliconrepublic.com/careers/item/36001-crs-cls-up
317 | 
318 | REI ODAIRA, JOSE G. CASTANOS, AND HISANOBU TOMARI. 2014. Eliminating Global Interpreter Locks in Ruby through Hardware Transactional Memory. In PPoPP '14, February 15-19, 2014, Orlando, FL, USA. DOI: http://dx.doi.org/10.1145/2555243.2555247
319 | 
320 | ANTONIO CANGIANO. 2007. The Great Ruby Shootout. (December 2007). Retrieved March 31, 2014 from http://programmingzen.com/2007/12/03/the-great-ruby-shootout/
321 | 
322 | PAT SHAUGHNESSY. 2014. Ruby Under a Microscope: An Illustrated Guide to Ruby Internals.
323 | 
324 | DIRKJAN BUSSINK. 2012. Rubinius - Tales from the Trenches of Developing a Ruby Implementation. Barcelona Ruby Conference.
325 | 
326 | CHARLES NUTTER. 2012. Why JRuby? Aloha Ruby Conf.
327 | 
328 | KOICHI SASADA. 2006. YARV: Yet Another RubyVM - The Implementation and Evaluation. Transactions of Information Processing Society of Japan 47, 57-73.
329 | 
330 | KOICHI SASADA. 2005. YARV: Yet Another RubyVM: Innovating the Ruby Interpreter. In OOPSLA '05 Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, 158-159.
331 | 
332 | PAT SHAUGHNESSY. 2013. Visualizing Garbage Collection in Rubinius, JRuby and Ruby 2.0. Ruby Conference.
333 | 
334 | YUKIHIRO MATSUMOTO. 2010. From Lisp to Ruby to Rubinius.
335 | 


--------------------------------------------------------------------------------
/final/the_final.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/johno/ruby_optimization_techniques/27f5528e00a04332f886cef0cc846d54c4bf6111/final/the_final.pdf


--------------------------------------------------------------------------------
/styles.css:
--------------------------------------------------------------------------------
 1 | * {
 2 |   margin-left: 20px;
 3 |   margin-right: 20px;
 4 | }
 5 | 
 6 | h1,
 7 | h2,
 8 | h3,
 9 | h4,
10 | h5,
11 | h6 {
12 |   margin-top: 60px !important;
13 |   font-weight: 600;
14 |   font-family: "Helvetica Neue";
15 | }
16 | 
17 | h1 {
18 |   margin: 40px 40px 40px 20px;
19 | }
20 | 
21 | h2 {
22 |   margin: 20px 20px 20px 20px;
23 | }
24 | 
25 | p {
26 |   line-height: 1.5em;
27 |   font-size: 25px;
28 |   font-family: 'Times New Roman'; 
29 | }
30 | 
31 | pre {
32 |   margin: 10px;
33 |   font-family: Monaco;
34 |   background-color: #eee;
35 |   border-top: 10px solid #ddd;
36 |   border-bottom: 10px solid #ddd;
37 |   padding: 20px;
38 | }
39 | 
40 | blockquote {
41 |   background-color: #eee;
42 |   padding: 20px;
43 |   border-left: 10px solid #ddd;
44 | }
45 | 
46 | .figure,
47 | .figure a {
48 |   color: #888;
49 |   font-size: 10px;
50 |   margin-left: 10px;
51 | }
52 | 
53 | img {
54 |   margin: 10px;
55 | }
56 | 
57 | 


--------------------------------------------------------------------------------