Effective Scala

├── .gitignore
├── fmt.sh
├── .gitattributes
├── proc.sh
├── footer-cn.html.inc
├── footer-jp.html.inc
├── footer-ru.html.inc
├── footer.html.inc
├── publish.sh
├── README.md
├── toc.sh
├── Makefile
├── header.html.inc
├── header-cn.html.inc
├── header-jp.html.inc
├── header-ru.html.inc
├── LICENSE
├── effectivescala-cn.mo
├── effectivescala-ja.mo
└── effectivescala.mo

/.gitignore:
--------------------------------------------------------------------------------
*.html
*.png
--------------------------------------------------------------------------------
/fmt.sh:
--------------------------------------------------------------------------------
#!/bin/bash

sed 's;^\.LP $.*$$;


Marius Eriksen, Twitter Inc.
marius@twitter.com (@marius)

[translated by hongjiang(@hongjiang), tongqing(@tongqing)]


.TOC

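Other languages

English
日本語
Русский

## Introduction

[Scala][Scala] is one of the main application programming languages used at Twitter. Much of our infrastructure is written in Scala, and [we have several large libraries](https://github.com/twitter/) supporting our use. While highly effective, Scala is also a large language, and experience has taught us to apply it with great care. What are its pitfalls? Which features should we embrace, and which should we avoid? When do we employ a "purely functional style", and when do we steer clear of it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays that provide a set of best practices. We use Scala mainly to build high-volume services that form distributed systems, and our advice is biased accordingly, but most of it should translate naturally to other domains. This is not the law, but deviation should be well justified.

Scala provides many tools that enable succinct expression. Less typing means less reading, and less reading is usually faster reading, so brevity enhances clarity. However, brevity is a blunt tool that can also have the opposite effect: after correctness, always think of the reader.

Above all, program in Scala. You are not writing Java, Haskell, or Python; a Scala program is unlike one written in any of these. To use the language effectively, you must phrase your problems in its terms. There is no use coercing a Java program into Scala, for in most ways it will be inferior to the original.

This is not an introduction to Scala; we assume the reader is familiar with the language. Some resources for learning Scala are:

* [Scala School](https://twitter.github.com/scala_school/)
* [Learning Scala](https://www.scala-lang.org/node/1305)
* [Learning Scala in Small Bites](https://matt.might.net/articles/learning-scala-in-small-bites/)

This is a living document that will change to reflect our current best practices, but its core ideas are unlikely to change: always favor readability; write generic code, but not at the expense of clarity; take advantage of simple language features that afford great power, but avoid the esoteric ones (especially in the type system). Above all, always be aware of the trade-offs you make. A sophisticated language requires a complex implementation, and complexity begets complexity: of reasoning, of semantics, of interaction between features, and of the understanding of your collaborators. Complexity is thus the tax on sophistication, and you must always make sure that its utility exceeds its cost.

And have fun.

## Formatting

The specifics of code formatting are of little consequence, so long as they are practical. By definition a style cannot be inherently good or bad, and almost everybody has a personal preference. However, applying the same formatting rules consistently will almost always enhance readability. A reader already familiar with a particular style does not have to grasp yet another set of local conventions or decipher yet another corner of the language grammar.

This is of particular importance to Scala, because its grammar has a high degree of overlap. One telling example is method invocation: a method can be invoked with "." followed by parentheses, or without ".", using whitespace and no parentheses (for nullary or unary methods). Furthermore, the different styles of method invocation expose different ambiguities in the grammar. Consistently applying a carefully chosen set of formatting rules resolves a great deal of ambiguity for both people and machines.

We adhere to the [Scala style guide](https://docs.scala-lang.org/style/) plus the following rules.

### Whitespace

Indent by two spaces. Avoid lines longer than 100 columns. Use one blank line between method, class, and object definitions.

### Naming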

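Use short names for small scopes:

`i`s, `j`s and `k`s are all but expected in loops.

Use longer names for larger scopes:

External APIs should have longer, self-explanatory names, for example `Future.collect` rather than `Future.all`.

Use common abbreviations but avoid esoteric ones:

Everyone knows `ok`, `err` or `defn`, whereas `sfri` is not so common.

Don't rebind names for different uses:

Use `val`s. (Translator's note: a `val` is an immutable binding in Scala.)

Avoid using `` ` ``s to overload reserved names:

Use `typ` instead of `` `type` ``.

Use active names for operations with side effects:

`user.activate()` rather than `user.setActive()`.

Use descriptive names for methods that return values:

`src.isDefined` rather than `src.defined`.

Don't prefix getters with `get`:

The `get` prefix is redundant: `site.count` rather than `site.getCount`.

Don't repeat names that are already encapsulated in a package or object name:

Prefer:

    object User {
      def get(id: Int): Option[User] = ???  // body elided
    }

rather than:

    object User {
      def getUser(id: Int): Option[User] = ???  // body elided
    }

Compared with `get`, the `User` in `getUser` is redundant: it repeats the enclosing object's name and provides no extra information.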
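The naming rules above can be seen together in a minimal sketch. The `User` and `Site` types and their members here are hypothetical, invented only for illustration; they are not part of any Twitter library.

```scala
// Hypothetical types, used only to illustrate the naming rules.
final case class User(id: Int, name: String)

object User {
  private val users = Map(1 -> User(1, "ada"))

  // `get`, not `getUser`: the enclosing object already says "User".
  def get(id: Int): Option[User] = users.get(id)
}

final class Site(visits: Seq[Int]) {
  // Accessor without a `get` prefix.
  def count: Int = visits.size

  // Descriptive predicate name for a method that returns a value.
  def isEmpty: Boolean = visits.isEmpty
}
```

At the call site this reads as `User.get(1)` and `site.count`, which is the point of the rules: the context already carries the information a longer name would repeat.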

### Imports

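Sort import lines alphabetically:

This makes them easy to examine visually and simple to automate.

Use braces when importing several names from a package:

    import com.twitter.concurrent.{Broker, Offer}

Use wildcards when more than six names are imported:

e.g.: `import com.twitter.concurrent._`
Don't apply this blindly: some packages export too many names.

When using collections, qualify names by importing `scala.collection.immutable` (immutable collections) and/or `scala.collection.mutable` (mutable collections):

Mutable and immutable collections share the same names. Qualifying the names makes it obvious to the reader which variant is being used (e.g. "`immutable.Map`").

(Translator's note: it is also common to rely on the immutable defaults and to import `mutable` explicitly only where it is needed.)

Do not use relative imports from other packages:

Avoid

    import com.twitter
    import concurrent

in favor of the unambiguous

    import com.twitter.concurrent

(Translator's note: the two-line form above does not actually compile as intended; the second import would have to be `import twitter.concurrent`, because importing a package effectively brings that package's name into scope as an alias.)

Put imports at the top of the file:

The reader can refer to all imports in one place.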
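As a small sketch of these import rules (sorted lines, braces for several names, and qualified collection names), assuming only the Java and Scala standard libraries. The `ImportSketch` object and its members are hypothetical names chosen for this example.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.{immutable, mutable}

// Hypothetical example object; the names are illustrative only.
object ImportSketch {
  private val lookups = new AtomicInteger(0)

  // The qualified names make the variant obvious at the use site.
  val defaults: immutable.Map[String, Int] = immutable.Map("retries" -> 3)
  val overrides: mutable.Map[String, Int] = mutable.Map.empty[String, Int]

  // An override wins over a default; a missing key yields None.
  def effective(key: String): Option[Int] = {
    lookups.incrementAndGet()
    overrides.get(key).orElse(defaults.get(key))
  }

  def lookupCount: Int = lookups.get()
}
```

A reader who sees `mutable.Map` in a signature knows immediately that the value may change under them, without having to scroll back to the import block.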

### Braces

Braces are used to create compound expressions; the value of a compound expression is its last expression. Avoid using braces for simple expressions; write

    def square(x: Int) = x*x

.LP rather than

    def square(x: Int) = {
      x * x
    }

.LP even though it may be tempting to use the second form to set the method body apart syntactically. The first alternative has less clutter and is easier to read. Avoid syntactic ceremony unless it clarifies.

### Pattern matching

Use pattern matching directly in function definitions whenever applicable. For example, the `match` in

    list map { item =>
      item match {
        case Some(x) => x
        case None => default
      }
    }

.LP should be collapsed into

    list map {
      case Some(x) => x
      case None => default
    }

.LP This makes it clear that the list items are being mapped over; the extra indirection explains nothing.

### Comments

Use [ScalaDoc](https://wiki.scala-lang.org/display/SW/Scaladoc) to provide API documentation. Use the following style:

    /**
     * ServiceBuilder builds services
     * ...
     */

.LP rather than the standard ScalaDoc style:

    /** ServiceBuilder builds services
     * ...
     */

Do not resort to ASCII art or other visual embellishments. Document APIs, but do not add unnecessary comments. If you find yourself adding comments to explain the behavior of your code, first ask whether it can be restructured so that what it does becomes obvious. Prefer "obviously it works" to "it works, obviously".

## Types and Generics

The primary objective of a type system is to detect programming errors. The type system effectively provides a limited form of static verification, allowing us to express certain kinds of invariants about our code that the compiler can verify. Type systems provide other benefits too, of course, but error checking is their raison d'être.

Our use of the type system should reflect this goal, but we must remain mindful of the reader (translator's note: the person reading your code): judicious use of types enhances clarity, while being unduly clever only obfuscates.

Scala's powerful type system is a common source of academic exploration and exercise (e.g. [Type level programming in Scala](http://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/)). While this is a fascinating academic topic, these techniques rarely find useful application in production code. They are to be avoided.

### Return type annotations

While Scala allows return types to be omitted, such annotations provide good documentation; this is especially important for public methods. Where a method is not exposed and its return type is obvious, omit the annotation.

This is especially important when instantiating objects with mixins, because the Scala compiler creates singleton types for these. For example, `make` in

    trait Service
    def make() = new Service {
      def getId = 123
    }

.LP does not have a return type of `Service`; the compiler creates the refinement type `Object with Service{def getId: Int}` (translator's note: `with` is Scala's mixin syntax). Instead use an explicit annotation:

    def make(): Service = new Service{}

Now the author is free to mix in more traits without changing the public type of `make`, which makes backwards compatibility easier to manage.

### Variance

Variance arises when generics are combined with subtyping. It defines how subtyping of the *contained* type relates to subtyping of the *container* type. Because Scala has declaration-site variance annotations, authors of common libraries, especially collections, must be prolific annotators. Such annotations are important for the usability of shared code, but misapplication can be dangerous.

Invariants are an advanced but necessary aspect of Scala's type system, and should be used widely (and correctly), because they aid the application of subtyping.

*Immutable collections should be covariant*. Methods that receive the contained type should "downgrade" the collection appropriately:

    trait Collection[+T] {
      def add[U >: T](other: U): Collection[U]
    }

*Mutable collections should be invariant*. Covariance is typically invalid with mutable collections. Consider

    trait HashSet[+T] {
      def add[U >: T](item: U)
    }

.LP and the following type hierarchy:

    trait Mammal
    trait Dog extends Mammal
    trait Cat extends Mammal

.LP If I now have a hash set of dogs

    val dogs: HashSet[Dog]

.LP and treat it as a set of mammals and add a cat

    val mammals: HashSet[Mammal] = dogs
    mammals.add(new Cat{})

.LP then it is no longer a HashSet that holds only dogs!

### Type aliases

Use type aliases when they provide convenient naming or clarify purpose, but do not alias types that are self-explanatory. For example,

    () => Int

.LP is clearer than the alias `IntMaker` defined below:

    type IntMaker = () => Int
    IntMaker

.LP However, the aliases in

    class ConcurrentPool[K, V] {
      type Queue = ConcurrentLinkedQueue[V]
      type Map   = ConcurrentHashMap[K, Queue]
      ...
236 | } 237 | 238 | .LP 是有用的，因为它表达了目的并更加简短。 239 | 240 | 当使用类型别名的时候不要使用子类型化(subtyping) 241 | 242 | trait SocketFactory extends (SocketAddress => Socket) 243 | 244 | .LP SocketFactory 是一个生产Socket的方法。使用一个类型别名更好： 245 | 246 | type SocketFactory = SocketAddress => Socket 247 | 248 | .LP 我们现在可以对 SocketFactory类型的值提供函数字面量(function literals) ,也可以使用函数组合： 249 | 250 | val addrToInet: SocketAddress => Long 251 | val inetToSocket: Long => Socket 252 | 253 | val factory: SocketFactory = addrToInet andThen inetToSocket 254 | 255 | 类型别名通过用 package object 将名字绑定在顶层: 256 | 257 | package com.twitter 258 | package object net { 259 | type SocketFactory = (SocketAddress) => Socket 260 | } 261 | 262 | 注意类型别名不是新类型——他们等价于在语法上用别名代替了原类型。 263 | 264 | ### 隐式转换 265 | 266 | 隐式转换是类型系统里一个强大的功能，但应当谨慎地使用。它们有复杂的解决规则，使得通过简单的词法检查领会实际发生了什么很困难。在下面的场景使用隐式转换是OK的： 267 | 268 | * 扩展或增加一个Scala风格的集合 269 | * 适配或扩展一个对象(pimp my library模式）（译注参见：http://www.artima.com/weblogs/viewpost.jsp?thread=179766) 270 | * 通过提供约束证据来加强类型安全。 271 | * 提供了类型的证据 (typeclassing，haskell中的概念，指定义一组函数，其实现因所给的数据类型不同而不同) 272 | * 用于Manifests (注：Manifest[T]包含类型T的运行时信息) 273 | 274 | 如果你发现自己在用隐式转换，总要问问自己是否不使用这种方式也可以达到目的。 275 | 276 | 不要使用隐式转换对两个相似的数据类型做自动转换(例如，把list转换为stream);显示地做更好，因为不同类型有不同的语意，读者应该意识到这些含义。 277 | 译注： 1）一些单词的意义不同，但翻译为中文时可能用的相似的词语，比如mutable， Immutable 这两个翻译为可变和不可变，它们是指数据的可变与不可变。 278 | variance, invariant 也翻译为可变和不可变，（variance也翻译为“变型”），它们是指类型的可变与不可变。variance指支持协变或逆变的类型，invariant则相反。 279 | 280 | ## 集合 281 | 282 | Scala有一个非常通用，丰富，强大，可组合的集合库；集合是高阶的(high level)并暴露了一大套操作方法。很多集合的处理和转换可以被表达的简洁又可读，但不审慎地用它们的功能也会导致相反的结果。每个Scala程序员应该阅读集合设计文档；通过它可以很好地洞察集合库，并了解设计动机。 283 | 284 | 总使用最简单的集合来满足你的需求 285 | 286 | ### 层级 287 | 288 | 集合库很大：除了精心设计的层级(Hierarchy)——根是 Traversable[T] —— 大多数集合都有不可变(immutable)和可变(mutable)两种变体。无论其复杂性，下面的图表包含了可变和不可变集合层级的重要差异。 289 | 290 |

291 | .cmd 292 | pic2graph -format png >coll.png < "one", 2 -> "two", 3 -> "three") 331 | 332 | .LP 这种风格从语意上分离了集合与它的实现，让集合库使用更适当的类型：你需要Map，而不是必须一个红黑树(Red-Black Tree，注：红黑树TreeMap是Map的实现者) 333 | 此外，默认的构造函数通常使用专有的表达式，例如：Map() 将使用有3个成员的对象(专用的Map3类)来映射3个keys。 334 | 335 | 上面的推论是：在你自己的方法和构造函数里，适当地接受最宽泛的集合类型。通常可以归结为Iterable, Seq, Set, 或 Map中的一个。如果你的方法需要一个 sequence，使用 Seq[T]，而不是List[T] 336 | 337 | 341 | 342 | ### 风格 343 | 344 | 函数式编程鼓励使用流水线转换将一个不可变的集合塑造为想要的结果。这常常会有非常简明的方案，但也容易迷糊读者——很难领悟作者的意图，或跟踪所有隐含的中间结果。例如，我们想要从一组语言中汇集不同的程序语言的投票，按照得票的顺序显示(语言，票数)： 345 | 346 | val votes = Seq(("scala", 1), ("java", 4), ("scala", 10), ("scala", 1), ("python", 10)) 347 | val orderedVotes = votes 348 | .groupBy(_._1) 349 | .map { case (which, counts) => 350 | (which, counts.foldLeft(0)(_ + _._2)) 351 | }.toSeq 352 | .sortBy(_._2) 353 | .reverse 354 | 355 | .LP 上面的代码简洁并且正确，但几乎每个读者都不能理解作者的原本意图。一个策略是声明中间结果和参数： 356 | 357 | val votesByLang = votes groupBy { case (lang, _) => lang } 358 | val sumByLang = votesByLang map { case (lang, counts) => 359 | val countsOnly = counts map { case (_, count) => count } 360 | (lang, countsOnly.sum) 361 | } 362 | val orderedVotes = sumByLang.toSeq 363 | .sortBy { case (_, count) => count } 364 | .reverse 365 | 366 | .LP 代码也同样简洁，但更清晰的表达了转换的发生(通过命名中间值)，和正在操作的数据的结构(通过命名参数)。如果你担心这种风格污染了命名空间，用大括号{}来将表达式分组: 367 | 368 | val orderedVotes = { 369 | val votesByLang = ... 370 | ... 
371 | } 372 | 373 | 374 | ### 性能 375 | 376 | 高阶集合库（通常也伴随高阶构造）使推理性能更加困难：你越偏离直接指示计算机——即命令式风格——就越难准确预测一段代码的性能影响。然而推理正确性通常很容易；可读性也是加强的。在Java运行时使用Scala使得情况更加复杂，Scala对你隐藏了装箱(boxing)/拆箱(unboxing)操作，可能引发严重的性能或内存空间问题。 377 | 378 | 在关注于低层次的细节之前，确保你使用的集合适合你。确保你的数据结构没有不期望的渐进复杂度。各种Scala集合的复杂性描述在[这儿](http://www.scala-lang.org/docu/files/collections-api/collections_40.html)。 379 | 380 | 性能优化的第一条原则是理解你的应用为什么这么慢。不要使用空数据操作。在执行前分析^[[Yourkit](http://yourkit.com) 是一个很好的profiler]你的应用。关注的第一点是热循环(hot loops) 和大型的数据结构。过度关注优化通常是浪费精力。记住Knuth(高德纳)的格言：“过早优化是万恶之源”。 381 | 382 | 如果是需要更高性能或者空间效率的场景，通常更适合使用低级的集合。对大序列使用数组替代列表(List) (不可变Vector提供了一个指称透明的转换到数组的接口) ，并考虑使用buffers替代直接序列的构造来提高性能。 383 | 384 | ### Java集合 385 | 386 | 使用 scala.collection.JavaConverters 与Java集合交互。它有一系列的隐式转换，添加了asJava和asScala的转换方法。使用它们这些方法确保转换是显式的，有助于阅读： 387 | 388 | import scala.collection.JavaConverters._ 389 | 390 | val list: java.util.List[Int] = Seq(1,2,3,4).asJava 391 | val buffer: scala.collection.mutable.Buffer[Int] = list.asScala 392 | 393 | ## 并发 394 | 395 | 现代服务是高度并发的—— 服务器通常是在10-100秒内并列上千个同时的操作——处理隐含的复杂性是创作健壮系统软件的中心主题。 396 | 397 | *线程*提供了一种表达并发的方式：它们给你独立的，堆共享的(heap-sharing)由操作系统调度的执行上下文。然而，在Java里线程的创建是昂贵的，是一种必须托管的资源，通常借助于线程池。这对程序员创造了额外的复杂，也造成高度的耦合：很难从所使用的基础资源中分离应用逻辑。 398 | 399 | 当创建高度分散(fan-out)的服务时这种复杂度尤其明显：每个输入请求导致一大批对另一层系统的请求。在这些系统中，线程池必须被托管以便根据每一层请求的比例来平衡：一个线程池的管理不善会导致另一个线程池也出现问题。 400 | 401 | 一个健壮系统必须考虑超时和取消，两者都需要引入更多“控制”线程，使问题更加复杂。注意若线程很廉价这些问题也将会被削弱：不再需要一个线程池，超时的线程将被丢弃，不再需要额外的资源管理。 402 | 403 | 因此，资源管理危害了模块化。 404 | 405 | ### Future 406 | 407 | 使用Future管理并发。它们将并发操作从资源管理里解耦出来：例如，Finagle（译注：twitter的一个RFC框架）以有效的方式在少量线程上实现并发操作的复用。Scala有一个轻量级的闭包字面语法(literal syntax)，所以Futures引入了很少的语法开销，它们成为很多程序员的第二本能。 408 | 409 | Futures允许程序员用一种可扩充的，有处理失败原则的声明风格，来表达并发计算。这些特性使我们相信它们尤其适合在函数式编程中用，这也是鼓励使用的风格。 410 | 411 | *更愿意转换（transforming）future而非自己创造*。Future的转换(transformations)确保失败会传播，可以通过信号取消，对于程序员来说不必考虑Java内存模型的含义。甚至一个仔细的程序员会写出下面的代码，顺序地发出10次RPC请求而后打印结果： 412 | 413 | val p = new Promise[List[Result]] 414 | var results: List[Result] = Nil 
415 | def collect() { 416 | doRpc() onSuccess { result => 417 | results = result :: results 418 | if (results.length < 10) 419 | collect() 420 | else 421 | p.setValue(results) 422 | } onFailure { t => 423 | p.setException(t) 424 | } 425 | } 426 | 427 | collect() 428 | p onSuccess { results => 429 | printf("Got results %s\n", results.mkString(", ")) 430 | } 431 | 432 | 程序员不得不确保RPC失败是可传播的，代码散布在控制流程中；糟糕的是，代码是错误的！没有声明results是volatile，我们不能确保results每次迭代会保持前一次值。Java内存模型是一个狡猾的野兽，幸好我们可以通过用声明式风格(declarative style)避开这些陷阱： 433 | 434 | def collect(results: List[Result] = Nil): Future[List[Result]] = 435 | doRpc() flatMap { result => 436 | if (results.length < 9) 437 | collect(result :: results) 438 | else 439 | Future.value(result :: results) 440 | } 441 | 442 | collect() onSuccess { results => 443 | printf("Got results %s\n", results.mkString(", ")) 444 | } 445 | 446 | 我们用flatMap顺序化操作，把我们处理中的结果预追加(prepend)到list中。这是一个通用的函数式编程习语的Futures译本。这是正确的，不仅需要的样板代码(boilerplate)可以减少，易出错的可能性也会减少，并且读起来更好。 447 | 448 | *Future组合子(combinators)的使用*。当操作多个futures时，Future.select，Future.join和Future.collect应该被组合编写出通用模式。 449 | 450 | ### 集合 451 | 452 | 并发集合的主题充满着意见、微妙(subtleties)、教条、恐惧/不确定/怀疑(FUD)。在大多实际场景都不存在问题：总是先用最简单,最无聊，最标准的集合解决问题。在你知道不能使用synchronized前不要去用一个并发集合：JVM有着老练的手段来使得同步开销更小，所以它的效率能让你惊讶。 453 | 454 | 如果一个不可变(immutable)集合可行，就尽可能用不可变集合——它们是指称透明的(referentially transparent)，所以在并发上下文推断它们是简单的。不可变集合的改变通常用更新引用到当前值(一个var单元或一个AtomicReference)。必须小心正确地应用：原子型的(atomics)必须重试(retried)，变量(var类型的)必须声明为volatile以保证它们发布(published)到它们的线程。 455 | 456 | 可变的并发集合有着复杂的语义，并利用Java内存模型的微妙的一面，所以在你使用前确定你理解它的含义——尤其对于发布更新(新的公开方法)。同步的集合同样写起来更好：像getOrElseUpdate操作不能够被并发集合正确的实现，创建复合(composite)集合尤其容易出错。 457 | 458 | 469 | 470 | 471 | ## 控制结构 472 | 473 | 函数式风格的程序倾向于需要更少的传统的控制结构，并且使用声明式风格写的程序读起来更好。这通常意味着打破你的逻辑，拆分到若干个小的方法或函数，用匹配表达式(match expression)把他们粘在一起。函数式程序也倾向于更多面向表达式(expression-oriented)：条件分支是同一类型的值计算，for（..) 
yield 表达式，以及递归都是司空见惯的。 474 | 475 | ### 递归 476 | 477 | *用递归术语来表达你的问题常常会使问题简化*，如果应用了尾递归优化(可以通过@tailrec注释检测)，编译器甚至会将你的代码转换为正常的循环。对比一个标准的命令式版本的堆排序(fix-down): 478 | 479 | def fixDown(heap: Array[T], m: Int, n: Int): Unit = { 480 | var k: Int = m 481 | while (n >= 2*k) { 482 | var j = 2*k 483 | if (j < n && heap(j) < heap(j + 1)) 484 | j += 1 485 | if (heap(k) >= heap(j)) 486 | return 487 | else { 488 | swap(heap, k, j) 489 | k = j 490 | } 491 | } 492 | } 493 | 494 | 每次进入while循环，我们工作在前一次迭代时污染过的状态。每个变量的值是那一分支所进入函数，当找到正确的位置时会在循环中返回。 495 | (敏锐的读者会在Dijkstra的[“Go To声明是有害的”](http://www.u.arizona.edu/~rubinson/copyright_violations/Go_To_Considered_Harmful.html)一文找到相似的观点) 496 | 497 | 考虑尾递归的实现^[From [Finagle's heap balancer](https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer/Heap.scala#L41)]: 498 | 499 | @tailrec 500 | final def fixDown(heap: Array[T], i: Int, j: Int) { 501 | if (j < i*2) return 502 | 503 | val m = if (j == i*2 || heap(2*i) < heap(2*i+1)) 2*i else 2*i + 1 504 | if (heap(m) < heap(i)) { 505 | swap(heap, i, m) 506 | fixDown(heap, m, j) 507 | } 508 | } 509 | 510 | .LP 每次迭代都是一个明确定义的历史清白的变量，并且没有引用单元：到处都是不变的(invariants)。更容易实现，也容易阅读。也没有性能方面的惩罚：因为方法是尾递归的，编译器会转换为标准的命令式的循环。 511 | 512 | 515 | 516 | 517 | ### 返回(Return) 518 | 519 | 并不是说命令式结构没有价值。在很多例子中它们很适合于提前终止计算而非对每个可能终止的点存在一个条件分支：的确在上面的fixDown函数，如果我们已经在堆的结尾，一个return用于提前终止。 520 | 521 | Returns可以用于切断分支和建立不变量(establish invariants)。这减少了嵌套，并且容易推断后续的代码的正确性，从而帮助了读者。这尤其适用于守卫语句(guard clauses)： 522 | 523 | def compare(a: AnyRef, b: AnyRef): Int = { 524 | if (a eq b) 525 | return 0 526 | 527 | val d = System.identityHashCode(a) compare System.identityHashCode(b) 528 | if (d != 0) 529 | return d 530 | 531 | // slow path.. 
532 | } 533 | 534 | 使用return增加了可读性 535 | 536 | def suffix(i: Int) = { 537 | if (i == 1) return "st" 538 | else if (i == 2) return "nd" 539 | else if (i == 3) return "rd" 540 | else return "th" 541 | } 542 | 543 | .LP 上面是针对命令式语言的，在Scala中鼓励省略return 544 | 545 | def suffix(i: Int) = 546 | if (i == 1) "st" 547 | else if (i == 2) "nd" 548 | else if (i == 3) "rd" 549 | else "th" 550 | 551 | .LP 但使用模式匹配更好： 552 | 553 | def suffix(i: Int) = i match { 554 | case 1 => "st" 555 | case 2 => "nd" 556 | case 3 => "rd" 557 | case _ => "th" 558 | } 559 | 560 | 注意，return会有隐性开销：当在闭包内部使用时。 561 | 562 | seq foreach { elem => 563 | if (elem.isLast) 564 | return 565 | 566 | // process... 567 | } 568 | 569 | .LP 在字节码层实现为一个异常的捕获/声明(catching/throwing)对，用在频繁的执行的代码中，会有性能影响。 570 | 571 | ### for循环和for推导 572 | 573 | for对循环和聚集提供了简洁和自然的表达。它在扁平化(flattening)很多序列时特别有用。for语法通过分配和派发闭包隐藏了底层的机制。这会导致意外的开销和语义；例如： 574 | 575 | for (item <- container) { 576 | if (item != 2) return 577 | } 578 | 579 | .LP 如果容器延迟计算(delays computation)会引起运行时错误，使返回不在本地上下文 (making the return nonlocal) 580 | 因为这些原因，常常更可取的是直接调用foreach, flatMap, map和filter —— 但在其意义清楚的时候使用for。 581 | 582 | ### 要求require和断言(assert) 583 | 584 | 要求(require)和断言(assert)都起到可执行文档的作用。两者都在类型系统不能表达所要求的不变量(invariants)的场景里有用。 585 | assert用于代码假设的不变量(invariants) （内部或外部的）例如：(译注，不变量 invariant 是指类型不可变，即不支持协变或逆变的类型变量) 586 | 587 | val stream = getClass.getResourceAsStream("someclassdata") 588 | assert(stream != null) 589 | 590 | 相反，require用于表达API契约： 591 | 592 | def fib(n: Int) = { 593 | require(n > 0) 594 | ... 
595 | } 596 | 597 | ## 函数式编程 598 | 599 | 面向值（value-oriented ）编程有很多优势，特别是用在与函数式编程结构相结合。这种风格强调值的转换（译注：由一个不变的值生成另一个不变的值）而非状态的改变，生成的代码是指称透明的(referentially transparent)，提供了更强的不变型(invariants)，因此容易实现。Case类(也被翻译为样本类)，模式匹配，解构绑定(destructuring bindings)，类型推断，轻量级的闭包和方法创建语法都是这一类的工具。 600 | 601 | ### Case类模拟代数数据类型 602 | 603 | Case类可实现代数数据类型(ADT)编码：它们对大量的数据结构进行建模时有用，用强不变类型(invariants)提供了简洁的代码。尤其在结合模式匹配情况下。模式匹配实现了全面解析提供更强大的静态保护。 604 | (译注：ADTs是Algebraic Data Type代数数据类型的缩写，关于这个概念见我的另一篇[博客](http://hongjiang.info/scala-case-class-and-algebraic-data-type/)) 605 | 606 | 下面是用case类模拟代数数据类型的模式 607 | 608 | sealed trait Tree[T] 609 | case class Node[T](left: Tree[T], right: Tree[T]) extends Tree[T] 610 | case class Leaf[T](value: T) extends Tree[T] 611 | 612 | 类型 Tree[T] 有两个构造函器：Node和Leaf。定义类型为sealed(封闭类)允许编译器进行彻底的分析(这是针对模式匹配的，参考Programming in Scala)因为构造器将不能从外部源文件中添加。 613 | 614 | 与模式匹配一同，这个建模使得代码简洁并且显然是正确的(obviously correct) 615 | 616 | def findMin[T <: Ordered[T]](tree: Tree[T]) = tree match { 617 | case Node(left, right) => Seq(findMin(left), findMin(right)).min 618 | case Leaf(value) => value 619 | } 620 | 621 | 尽管一些递归结构，如树的组成是典型的ADTs(代数数据类型)应用，它们的用处领域更大。 622 | disjoint,unions特别容易的用ADTs建模；这些频繁发生在状态机上(state machines)。 623 | 624 | ### Options 625 | 626 | Option类型是一个容器，空(None)或满(Some(value))二选一。它提供了使用null的另一种安全选择，应该尽可能的替代null。它是一个集合(最多只有一个元素)并用集合操作所修饰，尽量用Option。 627 | 628 | 用 629 | 630 | var username: Option[String] = None 631 | ... 632 | username = Some("foobar") 633 | 634 | .LP 代替 635 | 636 | var username: String = null 637 | ... 
638 | username = "foobar" 639 | 640 | .LP 因为前者更安全：Option类型静态地强制username必须对空(emptyness)做检测。 641 | 642 | 对一个Option值做条件判断应该用foreach 643 | 644 | if (opt.isDefined) 645 | operate(opt.get) 646 | 647 | .LP 上面的代码应该用下面的方式替代： 648 | 649 | opt foreach { value => 650 | operate(value)} 651 | 652 | 风格可能看起来有些古怪，但更安全，更简洁。如果两种情况都有(Option的None或Some)，用模式匹配 653 | 654 | opt match { 655 | case Some(value) => operate(value) 656 | case None => defaultAction() 657 | } 658 | 659 | .LP 但如果缺少的是缺省值，用getOrElse方法： 660 | 661 | operate(opt getOrElse defaultValue) 662 | 663 | 不要过度使用Option：如果有一个明确的缺省值——一个[*Null对象*](http://en.wikipedia.org/wiki/Null_Object_pattern)——直接用Null而不必用Option 664 | 665 | Option还有一个方便的构造器用于包装空值(nullable value) 666 | 667 | Option(getClass.getResourceAsStream("foo")) 668 | 669 | .LP 得到一个 Option[InputStream] 假定空值(None)时getResourceAsStream会返回null。 670 | 671 | ### 模式匹配 672 | 673 | 模式匹配(x match { …) 在良好的Scala代码中无处不在：用于合并条件执行、解构(destructuring) 、在构造中造型。使用好模式匹配可以增加程序的明晰度和安全性。 674 | 675 | 使用模式匹配实现类型转换： 676 | 677 | obj match { 678 | case str: String => ... 679 | case addr: SocketAddress => ... 
680 | 681 | 682 | 模式匹配在和解构(destructuring)联合使用时效果最好（例如你要匹配case类）；下面的写法 683 | 684 | animal match { 685 | case dog: Dog => "dog (%s)".format(dog.breed) 686 | case _ => animal.species 687 | } 688 | 689 | .LP 应该被替代为： 690 | 691 | animal match { 692 | case Dog(breed) => "dog (%s)".format(breed) 693 | case other => other.species 694 | } 695 | 696 | 697 | 写[自定义的抽取器](http://www.scala-lang.org/node/112) (extractor)时必须有双重构造器(译注：成对出现的apply方法与unapply方法)，否则可能是不适合的。 698 | 699 | 当默认的方法更有意义时，对条件执行不要用模式匹配。集合库的方法通常返回Options，避免： 700 | 701 | val x = list match { 702 | case head :: _ => head 703 | case Nil => default 704 | } 705 | 706 | .LP 因为 707 | 708 | val x = list.headOption getOrElse default 709 | 710 | .LP 更短并且更能表达目的。 711 | 712 | ### 偏函数 713 | 714 | Scala提供了定义PartialFunction的语法简写： 715 | 716 | val pf: PartialFunction[Int, String] = { 717 | case i if i%2 == 0 => "even" 718 | } 719 | 720 | .LP 它们也可能和 orElse 组合: 721 | 722 | val tf: (Int => String) = pf orElse { case _ => "odd"} 723 | 724 | tf(1) == "odd" 725 | tf(2) == "even" 726 | 727 | 偏函数出现在很多场景，并以PartialFunction有效地编码，例如方法参数： 728 | 729 | trait Publisher[T] { 730 | def subscribe(f: PartialFunction[T, Unit]) 731 | } 732 | 733 | val publisher: Publisher[Int] = .. 734 | publisher.subscribe { 735 | case i if isPrime(i) => println("found prime", i) 736 | case i if i%2 == 0 => count += 2 737 | /* ignore the rest */ 738 | } 739 | 740 | .LP 或在返回一个Option的情况下： 741 | 742 | // Attempt to classify the the throwable for logging. 
743 | type Classifier = Throwable => Option[java.util.logging.Level] 744 | 745 | .LP 可以更好的用PartialFunction表达 746 | 747 | type Classifier = PartialFunction[Throwable, java.util.Logging.Level] 748 | 749 | .LP 因为它提供了更好的可组合性： 750 | 751 | val classifier1: Classifier 752 | val classifier2: Classifier 753 | 754 | val classifier = classifier1 orElse classifier2 orElse { _ => java.util.Logging.Level.FINEST } 755 | 756 | 757 | ### 解构绑定 758 | 759 | 解构绑定与模式匹配有关。它们用了相同的机制，但解构绑定可应用在当匹配只有一种选项的时候 (以免你接受异常的可能)。解构绑定特别适用于元组(tuple)和样本类(case class). 760 | 761 | val tuple = ('a', 1) 762 | val (char, digit) = tuple 763 | 764 | val tweet = Tweet("just tweeting", Time.now) 765 | val Tweet(text, timestamp) = tweet 766 | 767 | ### 惰性赋值 768 | 769 | 当使用lazy修饰一个val成员时，其赋值情况是在需要时才赋值的(by need)，因为Scala中成员与方法是等价的（除了private[this]成员） 770 | 771 | lazy val field = computation() 772 | 773 | .LP 相当于下面的简写: 774 | 775 | var _theField = None 776 | def field = if (_theField.isDefined) _theField.get else { 777 | _theField = Some(computation()) 778 | _theField.get 779 | } 780 | 781 | .LP 也就是说，它在需要时计算结果并会记住结果，在要达到这种目的时使用lazy成员；但当语意上需要惰性赋值时(by semantics)，要避免使用惰性赋值，这种情况下，最好显式赋值因为它使得成本模型是明确的，并且副作用被严格的控制。 782 | 783 | Lazy成员是线程安全的。 784 | 785 | ### 传名调用 786 | 787 | 方法参数可以指定为传名参数 (by-name) 意味着参数不是绑定到一个值，而是一个可能需要反复进行的计算。这一特性需要小心使用；期待传值(by-value)语法的调用者会感到惊讶。这一特性的动机是构造语法自然的 DSLs——使新的控制结构可以看起来更像本地语言特征。 788 | 789 | 只在下面的控制结构中使用传名调用, 调用者明显传递的是一段代码块(block)而非一个确定的计算结果。传名参数必须放在参数列表的最后一位。当使用传名调用时，确保方法名称让调用者明显感知到方法参数是传名参数。 790 | 791 | 当你想要一个值被计算多次，特别是这个计算会引起副作用时，使用显式函数： 792 | 793 | class SSLConnector(mkEngine: () => SSLEngine) 794 | 795 | .LP 这样意图很明确，调用者不会感到惊奇。 796 | 797 | ### `flatMap` 798 | 799 | flatMap——结合了map 和 flatten —— 的使用要特别小心，它有着难以琢磨的威力和强大的实用性。类似它的兄弟 map，它也是经常在非传统的集合中使用的，例如 Future , Option。它的行为由它的（函数）签名揭示；对于一些容器 Container[A] 800 | 801 | flatMap[B](f: A => Container[B]): Container[B] 802 | 803 | .LP flatMap对集合中的每个元素调用了函数 f 产生一个新的集合，将它们全部 flatten 后放入结果中。例如，获取两个字符的字符串的所有排列，相同的字符不能出现两次 804 | 805 | val chars = 'a' to 'z' 806 
| val perms = chars flatMap { a => 807 | chars flatMap { b => 808 | if (a != b) Seq("%c%c".format(a, b)) 809 | else Seq() 810 | } 811 | } 812 | 813 | .LP 等价于下面这段更简洁的 for-comprehension （基本就是针对上面的语法糖） 814 | 815 | val perms = for { 816 | a <- chars 817 | b <- chars 818 | if a != b 819 | } yield "%c%c".format(a, b) 820 | 821 | `flatMap`在处理Options常常很有用—— 它将多个options链合并为一个， 822 | 823 | val host: Option[String] = .. 824 | val port: Option[Int] = .. 825 | 826 | val addr: Option[InetSocketAddress] = 827 | host flatMap { h => 828 | port map { p => 829 | new InetSocketAddress(h, p) 830 | } 831 | } 832 | 833 | .LP 也可以使用更简洁的for来实现： 834 | 835 | val addr: Option[InetSocketAddress] = for { 836 | h <- host 837 | p <- port 838 | } yield new InetSocketAddress(h, p) 839 | 840 | 对flatMap在在Futures中的使用futures一节中有讨论。 841 | 842 | ## 面向对象的编程 843 | 844 | Scala的博大很大程度上在于它的对象系统。Scala中所有的值都是对象，就这一意义而言Scala是门纯粹的语言；基本类型和组合类型没有区别。Scala也提供了mixin的特性允许更多正交地、细粒度地构造一些在编译时受益于静态类型检测的可被灵活组装的模块。 845 | 846 | mixin系统的背后动机之一是消除传统的依赖注入。这种“组件风格(component style)”编程的高潮是[the cake pattern](http://jonasboner.com/real-world-scala-dependency-injection-di/). 847 | 848 | ### 依赖注入 849 | 850 | 在我们的使用中，我们发现Scala本身删除了很多经典(构造函数)依赖注入的语法开销，我们更愿意就这样用: 它更清晰，依赖仍然植根于（构造）类型，而类构造语法是如此微不足道而变得轻而易举。有些无聊，简单，但有效。*对模块化编程时使用依赖注入*，特别是，*组合优于继承*—这使得程序更加模块化和可测试的。当遇到需要继承的情况，问问自己：在语言缺乏对继承支持的情况下如何构造程序？答案可能是令人信服的。 851 | 852 | 依赖注入典型的使用到 trait （译注：可以理解为和Java中Interface相似） 853 | 854 | trait TweetStream { 855 | def subscribe(f: Tweet => Unit) 856 | } 857 | class HosebirdStream extends TweetStream ... 858 | class FileStream extends TweetStream .. 
859 | 860 | class TweetCounter(stream: TweetStream) { 861 | stream.subscribe { tweet => count += 1 } 862 | } 863 | 864 | 这是常见的注入*工厂* -- 用于产生其他对象的对象。在这些例子中，更青睐用简单的函数而非专有的工厂类型。 865 | 866 | class FilteredTweetCounter(mkStream: Filter => TweetStream) { 867 | mkStream(PublicTweets).subscribe { tweet => publicCount += 1 } 868 | mkStream(DMs).subscribe { tweet => dmCount += 1 } 869 | } 870 | 871 | ### Trait 872 | 873 | 依赖注入不妨碍使用公共接口，或在trait中实现公共代码。恰恰相反—正是因为以下原因而高度鼓励使用trait：一个具体的类可以实现多接口(traits)，公共的代码可以通过这些类复用。 874 | 875 | 保持traits简短并且是正交的：不要把分离的功能混在一个trait里，考虑将最小的相关的意图放在一起。例如，想象一下你要做一些IO的操作： 876 | 877 | trait IOer { 878 | def write(bytes: Array[Byte]) 879 | def read(n: Int): Array[Byte] 880 | } 881 | 882 | .LP 分离两个行为： 883 | 884 | trait Reader { 885 | def read(n: Int): Array[Byte] 886 | } 887 | trait Writer { 888 | def write(bytes: Array[Byte]) 889 | } 890 | 891 | .LP 可以将它们以混入(mix)的方式实现一个IOer : new Reader with Writer...接口最小化促使更好的正交性和更清晰的模块化。 892 | 893 | ### 可见性 894 | 895 | Scala有很丰富的可见性修饰。使用这些可见性修饰很重要，因为它们定义了哪些构成公开API。公开APIs应该限制，这样用户不会无意中依赖实现细节并限制了作者修改它们的能力: 896 | 它们对于好的模块化设计是至关重要的。一般来说，扩展公开APIs比收缩公开的APIs容易的多。差劲的注释（annotation）也能危害到你代码向后的二进制兼容性。（译注：comments和annotation都可翻译成注释，但意义不同。annotation在Java和Scala有特定的含义） 897 | 898 | 899 | #### `private[this]` 900 | 901 | 一个类的成员标记为私有的， 902 | 903 | private val x: Int = ... 904 | 905 | .LP 它对这个类的所有实例来说都是可见的（但对其子类不可见）。大多情况，你想要的是 private[this] 。 906 | 907 | private[this] val: Int = .. 908 | 909 | .LP 这个修饰限制了它只对当前特定的实例可见。Scala编译器会把private[this]翻译为一个简单的字段访问(因为访问仅限于静态定义的类)，这样有时有助于性能优化。 910 | 911 | #### 单例类型 912 | 913 | 在Scala中创建单例类型是很常见的，例如： 914 | 915 | def foo() = new Foo with Bar with Baz { 916 | ... 917 | } 918 | 919 | .LP 在这种情况下，可以通过声明返回类型来限制可见性： 920 | 921 | def foo(): Foo with Bar = new Foo with Bar with Baz { 922 | ... 
923 | } 924 | 925 | .LP foo()方法的调用者会看到以返回实例(Foo with Bar)的受限视图。 926 | 927 | ### 结构类型 928 | 929 | 不要在正常情况下使用结构类型。结构类型有着便利且强大的特性，但不幸的是在JVM上的实现不是很高效。 930 | 然而——由于实现的怪癖——它提供了对执行反射(reflection)很好的简写形式。 931 | 932 | val obj: AnyRef 933 | obj.asInstanceOf[{def close()}].close() 934 | 935 | ## 垃圾回收 936 | 937 | 我们在生产中花了很多时间来调整垃圾回收。垃圾回收的关注点与Java大致相似，尽管一些惯用的Scala代码比起惯用的Java代码会容易产生更多(短暂的)垃圾——函数式风格的副产品。Hotspot的分代垃圾收集通常使这不成问题，因为短暂的(short-lived)垃圾在大多情形下会被有效的释放掉。 938 | 939 | 在谈GC调优话题前，先看看[这个](http://www.infoq.com/presentations/JVM-Performance-Tuning-twitter)Attila的报告，它阐述了我们在GC方面的一些经验。 940 | 941 | Scala固有的问题，你能够缓解GC的方法是产生更少的垃圾；但不要在没有数据的情况下行动。除非你做了某些明显的恶化，使用各种Java的profiling工具——我们拥有的包括[heapster](https://github.com/mariusaeriksen/heapster)和[gcprof](https://github.com/twitter/jvmgcprof)。 942 | 943 | 944 | ## Java 兼容性 945 | 946 | 当我们写的Scala代码被Java调用时，我们要确保从Java来用仍然习惯。这常常不需要额外的努力——class和纯的trait等价于Java的中的对应类型 —— 但有时需要提供独立的Java API。一种感受你的库中的Java API好的方式是用Java写单元测试(只是为了兼容性);这也确保了你的库中的Java视图保持稳定，在这一点上不会随着时间因Scala编译器的波动而影响。 947 | 948 | 包含部分实现的Trait不能直接被Java使用：改为 extends 一个抽象类 949 | 950 | // 不能直接被java使用 951 | trait Animal { 952 | def eat(other: Animal) 953 | def eatMany(animals: Seq[Animal) = animals foreach(eat(_)) 954 | } 955 | 956 | // 改为这样: 957 | abstract class JavaAnimal extends Animal 958 | 959 | ## Twitter标准库 960 | 961 | Twitter最重要的标准库是 [Util](http://github.com/twitter/util) 和 [Finagle](https://github.com/twitter/finagle)。Util 可以理解为Scala和Java的标准库扩展，提供了标准库中没有的功能或已有功能的更合适的实现。Finagle 是我们的RPC系统，核心分布式系统组件。 962 | 963 | ### Future 964 | 965 | Futures已经在并发一节中简单讨论过。它是调异步处理的中心机制，渗透在我们代码库中，也是Finagle的核心。Futures允许组合并发事件，简化了高并发操作。也是JVM上异步并发的一种高效的实现。 966 | 967 | Twitter的future是*异步*的，所以基本上任何操作(阻塞操作)——基本上任何可以suspend它的线程的执行；网络IO和磁盘IO是就是例子——必须由系统处理，它为结果提供future。Finagle为网络IO提供了这样一种系统。 968 | 969 | Futures清晰简单：它们持有一个尚未完成运算结果的 promise 。它们是一个简单的容器——一个占位符。一次计算当然可能会失败，这种状况必须被编码：一个Future可以是三种状态之一： pending, failed, completed。 970 | 971 |
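The three states can be sketched with the standard library alone. This is a stand-in under stated assumptions: it uses `scala.concurrent.Promise` rather than Twitter's `com.twitter.util.Future`, and the `State` type and `classify` function are hypothetical names introduced only for this illustration.

```scala
import scala.concurrent.Promise
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Util or Finagle.
object FutureStates {
  sealed trait State
  case object Pending extends State
  case object Failed extends State
  case object Completed extends State

  // A poll result of None means the computation has not finished yet.
  def classify[A](polled: Option[Try[A]]): State = polled match {
    case None             => Pending
    case Some(Failure(_)) => Failed
    case Some(Success(_)) => Completed
  }

  def main(args: Array[String]): Unit = {
    val p = Promise[Int]()
    println(classify(p.future.value)) // prints "Pending"
    p.success(42)
    println(classify(p.future.value)) // prints "Completed"
  }
}
```

Because `State` is sealed, the compiler can check that every state is handled wherever a value of this type is matched.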

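Aside: Composition

Let's revisit what we mean by composition: combining simpler components into more complicated ones. The canonical example is function composition. Given functions f and g, the composite function (g∘f)(x) = g(f(x)), that is, the result of applying f to x first and then applying g to that result, can be written in Scala:

    val f = (i: Int) => i.toString
    val g = (s: String) => s+s+s
    val h = g compose f  // : Int => String

    scala> h(123)
    res0: java.lang.String = 123123123

.LP The composite h is a new function, built from the previously defined functions f and g.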
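As a further sketch of the same idea, `compose` applies its argument first, while `andThen` reads left to right. The functions `parse`, `double` and `render` are hypothetical names chosen for this example; only `compose` and `andThen` come from the standard `Function1`.

```scala
object ComposeSketch {
  // Hypothetical building blocks, for illustration only.
  val parse: String => Int = _.trim.toInt
  val double: Int => Int = _ * 2
  val render: Int => String = n => s"<$n>"

  // (render ∘ double ∘ parse)(x) = render(double(parse(x)))
  val viaCompose: String => String = render compose double compose parse

  // The same pipeline, written in the order in which the steps run.
  val viaAndThen: String => String = parse andThen double andThen render

  def main(args: Array[String]): Unit = {
    println(viaCompose(" 21 ")) // prints "<42>"
    println(viaAndThen(" 21 ")) // prints "<42>"
  }
}
```

Which form to use is a readability choice: `andThen` tends to suit pipelines that are read as a sequence of steps, while `compose` mirrors the mathematical notation.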

984 | 985 | Futures是一种集合类型——它是个包含0或1个元素的容器——你可以发现他们有标准的集合方法（eg：map, filter, foreach）。因为Future的值是延迟的，结果应用这些方法中的任何一种必然也延迟；在 986 | 987 | val result: Future[Int] 988 | val resultStr: Future[String] = result map { i => i.toString } 989 | 990 | .LP 函数 { i => i.toString } 不会被调用，直到int值可用；转换集合的resultStr在可用之前也一直是待定状态。 991 | 992 | List可以被扁平化(flattened)： 993 | 994 | val listOfList: List[List[Int]] = .. 995 | val list: List[Int] = listOfList.flatten 996 | 997 | .LP 这对future也是有意义的： 998 | 999 | val futureOfFuture: Future[Future[Int]] = .. 1000 | val future: Future[Int] = futureOfFuture.flatten 1001 | 1002 | .LP 因为future是延迟的，flatten的实现——立即返回——不得不返回一个等待外部future (Future[Future[Int]]) 完成的future (Future[Future[Int]]).如果外部future失败，内部flattened future也将失败。 1003 | 1004 | Future (类似List) 也定义了flatMap；Future[A] 定义方法flatMap的签名 1005 | 1006 | flatMap[B](f: A => Future[B]): Future[B] 1007 | 1008 | .LP 如同组合 map 和 flatten，我们可以这样实现： 1009 | 1010 | def flatMap[B](f: A => Future[B]): Future[B] = { 1011 | val mapped: Future[Future[B]] = this map f 1012 | val flattened: Future[B] = mapped.flatten 1013 | flattened 1014 | } 1015 | 1016 | 这是一种有威力的组合！使用flatMap我们可以定义一个 Future 作为两个Future序列的结果。第二个future 的计算基于第一个的结果。想象我们需要2次RPC调用来验证一个用户身份，我们可以用下面的方式组合操作： 1017 | 1018 | def getUser(id: Int): Future[User] 1019 | def authenticate(user: User): Future[Boolean] 1020 | 1021 | def isIdAuthed(id: Int): Future[Boolean] = 1022 | getUser(id) flatMap { user => authenticate(user) } 1023 | 1024 | .LP 这种组合类型的一个额外的好处是错误处理是内置的：如果getUser(..)或authenticate(..)失败，future 从 isAuthred(..)返回时将会失败。这里我们没有额外的错误处理的代码。 1025 | 1026 | #### 风格 1027 | 1028 | Future回调方法(respond, onSuccess, onFailure, ensure) 返回一个新的Future，并链接到调用者。这个Future被保证只有在它调用者完成后才完成，使用模式如下： 1029 | 1030 | acquireResource() 1031 | future onSuccess { value => 1032 | computeSomething(value) 1033 | } ensure { 1034 | freeResource() 1035 | } 1036 | 1037 | .LP freeResource() 被保证只有在 computeSomething之后才执行，这样就模拟了try-finally 模式。 1038 | 1039 | 使用 onSuccess替代 foreach —— 它与 onFailure 
方法对称，命名的意图更明确，并且也允许 chaining。 1040 | 1041 | 永远避免直接创建Promise实例：几乎每一个任务都可以通过使用预定义的组合子完成。这些组合子确保错误和取消是可传播的, 通常鼓励的数据流风格的编程，不再需要同步和volatility声明。 1042 | 1043 | 用尾递归风格编写的代码不再导致堆栈空间泄漏，并使得以数据流风格高效的实现循环成为可能： 1044 | 1045 | case class Node(parent: Option[Node], ...) 1046 | def getNode(id: Int): Future[Node] = ... 1047 | 1048 | def getHierarchy(id: Int, nodes: List[Node] = Nil): Future[Node] = 1049 | getNode(id) flatMap { 1050 | case n@Node(Some(parent), ..) => getHierarchy(parent, n :: nodes) 1051 | case n => Future.value((n :: nodes).reverse) 1052 | } 1053 | 1054 | Future定义很多有用的方法：使用 Future.value() 和 Future.exception() 来创建未满意(pre-satisfied) 的future。Future.collect(), Future.join() 和 Future.select() 提供了组合子将多个future合成一个(例如：scatter-gather操作的gather部分)。 1055 | 1056 | #### Cancellation 1057 | 1058 | Future实现了一种弱形式的取消。调用Future#cancel 不会直接终止运算，而是发送某个级别的可被任何处理查询的触发信号，最终满足这个future。Cancellation信号流向相反的方向：一个由消费者设置的cancellation信号，会传播到它的生产者。生产者使用 Promise的onCancellation来监听信号并执行相应的动作。 1059 | 1060 | 这意味这cancellation语意上依赖生产者，没有默认的实现。cancellation只是一个提示。 1061 | 1062 | #### Local 1063 | 1064 | Util的[Local](https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Local.scala#L40)提供了一个位于特定的future派发树(dispatch tree)的引用单元(cell)。设定一个local的值，使这个值可以用于被同一个线程的Future 延迟的任何计算。有一些类似于thread locals(注：Java中的线程机制），不同的是它们的范围不是一个Java线程，而是一个 future 线程树。在 1065 | 1066 | trait User { 1067 | def name: String 1068 | def incrCost(points: Int) 1069 | } 1070 | val user = new Local[User] 1071 | 1072 | ... 

    user() = currentUser
    rpc() ensure {
      user().incrCost(10)
    }

.LP `user()` in the `ensure` block refers to the value of the `user` local at the time the callback was added.

As with thread locals, Locals can be very convenient, but they should almost always be avoided: use them only when you are sure the problem cannot be solved well enough by passing data around explicitly, even if doing so is somewhat burdensome.

Locals are used effectively by core libraries for very common concerns (threading through RPC traces, propagating monitors, creating stack traces for future callbacks) where any other solution would unduly burden the user. Locals are inappropriate in almost any other situation.

### Offer/Broker

Concurrent systems are complicated by the need to coordinate access to data and resources. [Actors](http://www.scala-lang.org/api/current/scala/actors/Actor.html) offer one simplifying strategy: each actor is a sequential process that maintains its own state and resources, and data is shared with other actors by messaging. Sharing data requires communication between actors.

Offer/Broker builds on this in three important ways. First, communication channels (Brokers) are first class: you send messages via Brokers, not to an actor directly. Second, Offer/Broker is a synchronous mechanism: to communicate is to synchronize. This means we can use Brokers as a coordination mechanism: when process `a` has sent a message to process `b`, both `a` and `b` agree on the state of the system. Lastly, communication can be performed selectively: a process can propose several different communications, and exactly one of them will obtain.

In order to support selective communication (as well as other composition) in a general manner, we need to decouple the description of a communication from the act of communicating. This is what an `Offer` does: it is a persistent value that describes a communication. To perform that communication (act on the offer), we synchronize via its `sync()` method

    trait Offer[T] {
      def sync(): Future[T]
    }

.LP which returns a `Future[T]` that yields the exchanged value when the communication obtains.

A `Broker` coordinates the exchange of values through offers; it is the channel of communication:

    trait Broker[T] {
      def send(msg: T): Offer[Unit]
      val recv: Offer[T]
    }

.LP so that, when creating two offers

    val b: Broker[Int]
    val sendOf = b.send(1)
    val recvOf = b.recv

.LP and `sendOf` and `recvOf` are both synchronized

    // In process 1:
    sendOf.sync()

    // In process 2:
    recvOf.sync()

.LP both offers obtain and the value `1` is exchanged.

Selective communication is performed by combining several offers with `Offer.choose`

    def choose[T](ofs: Offer[T]*): Offer[T]

.LP which yields a new offer that, when synchronized, obtains exactly one of `ofs`: the first one to become available. When several are available immediately, one is chosen at random.

The `Offer` object has a number of one-off Offers that are used to compose with Offers from a Broker.

    Offer.timeout(duration): Offer[Unit]

.LP is an offer that activates after the given duration. `Offer.never` never obtains, and `Offer.const(value)` obtains immediately with the given value. These are useful for composition via selective communication. For example, to apply a timeout on a send operation:

    Offer.choose(
      Offer.timeout(10.seconds),
      broker.send("my value")
    ).sync()

It may be tempting to compare Offer/Broker to [SynchronousQueue](http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/SynchronousQueue.html), but they differ in subtle but important ways. Offers can be composed in ways that such queues cannot. For example, consider a set of queues, represented as Brokers:

    val q0 = new Broker[Int]
    val q1 = new Broker[Int]
    val q2 = new Broker[Int]

.LP Now let's create a merged queue for reading:

    val anyq: Offer[Int] = Offer.choose(q0.recv, q1.recv, q2.recv)

.LP `anyq` is an offer that reads from the first available queue. Note that `anyq` is still synchronous: we still have the semantics of the underlying queues. Such composition is not possible using queues.

#### Example: a simple connection pool

Connection pools are common in network applications, and they are often tricky to implement. For example, it is often desirable to have a timeout when acquiring a connection from the pool, since different clients have different latency requirements. Pools are simple in principle: we maintain a queue of connections and satisfy waiters as they come in. With traditional synchronization primitives this typically involves keeping two queues: one of waiters (when no connections are available) and one of connections (when there are no waiters).

Using Offer/Brokers, we can express this quite naturally:

    class Pool(conns: Seq[Conn]) {
      private[this] val waiters = new Broker[Conn]
      private[this] val returnConn = new Broker[Conn]

      val get: Offer[Conn] = waiters.recv
      def put(c: Conn): Unit = { returnConn ! c }

      private[this] def loop(connq: Queue[Conn]): Unit = {
        Offer.choose(
          if (connq.isEmpty) Offer.never else {
            val (head, rest) = connq.dequeue
            waiters.send(head) { _ => loop(rest) }
          },
          returnConn.recv { c => loop(connq enqueue c) }
        ).sync()
      }

      loop(Queue.empty ++ conns)
    }

`loop` always offers to have a connection returned, but only offers to send one when the queue is nonempty. Using a persistent queue simplifies the reasoning further. The interface to the pool is also through an Offer, so a caller that wishes to apply a timeout can do so with combinators:

    val conn: Future[Option[Conn]] = Offer.choose(
      pool.get { conn => Some(conn) },
      Offer.timeout(1.second) { _ => None }
    ).sync()

No extra bookkeeping was required to implement timeouts; this is due to the semantics of Offers: if `Offer.timeout` is selected, there is no longer an offer to receive from the pool, so the pool and its caller never simultaneously agreed to receive and send, respectively, on the `waiters` broker.

#### Sieve of Eratosthenes (translator's note: an algorithm for sieving out prime numbers)

It is often useful, and sometimes vastly simplifying, to structure concurrent programs as a set of sequential processes that communicate synchronously. Offers and Brokers provide a set of tools to make this simple and uniform. Indeed, their application goes beyond what one might consider "classic" concurrency problems: concurrent programming (with the aid of Offer/Broker) is a useful structuring tool, just as subroutines, classes, and modules are. This is an important idea from CSP (translator's note: short for Communicating Sequential Processes).

One example of this is the [Sieve of Eratosthenes](http://ja.wikipedia.org/wiki/%E3%82%A8%E3%83%A9%E3%83%88%E3%82%B9%E3%83%86%E3%83%8D%E3%82%B9%E3%81%AE%E7%AF%A9), which can be structured as the successive application of filters to a stream of integers. First, we need a source of integers:

    def integers(from: Int): Offer[Int] = {
      val b = new Broker[Int]
      def gen(n: Int): Unit = b.send(n).sync() ensure gen(n + 1)
      gen(from)
      b.recv
    }

.LP `integers(n)` is simply the offer of all consecutive integers starting at `n`. Then we need a filter:

    def filter(in: Offer[Int], prime: Int): Offer[Int] = {
      val b = new Broker[Int]
      def loop(): Unit = {
        in.sync() onSuccess { i =>
          if (i % prime != 0)
            b.send(i).sync() ensure loop()
          else
            loop()
        }
      }
      loop()

      b.recv
    }

.LP `filter(in, p)` returns the offer that removes multiples of the prime `p` from `in`. Finally, we define our sieve:

    def sieve = {
      val b = new Broker[Int]
      def loop(of: Offer[Int]): Unit = {
        for (prime <- of.sync(); _ <- b.send(prime).sync())
          loop(filter(of, prime))
      }
      loop(integers(2))
      b.recv
    }

.LP `loop()` works simply: it reads the next prime from `of`, and then applies a filter to `of` that excludes this prime. As `loop` recurses, successive primes are filtered out, and we have a sieve. We can now print the first 10000 primes:

    val primes = sieve
    0 until 10000 foreach { _ =>
      println(primes.sync()())
    }

Besides being structured into simple, orthogonal components, this approach gives you a streaming sieve: you do not need to compute the set of primes you are interested in beforehand, which further enhances modularity.

## Acknowledgments

The lessons herein are those of Twitter's Scala community; I hope I have been a faithful chronicler.

Blake Matheny, Nick Kallen, Steve Gury, and Raghavendra Prabhu provided much helpful guidance and many excellent suggestions.

[Scala]: https://www.scala-lang.org/
[Finagle]: https://github.com/twitter/finagle
[Util]: https://github.com/twitter/util

--------------------------------------------------------------------------------
/effectivescala-ja.mo:
--------------------------------------------------------------------------------


Marius Eriksen, Twitter Inc.
marius@twitter.com (@marius)

[translated by Yuta Okamoto (@okapies) and Satoshi Kobayashi (@scova0731)]


.TOC

Other languages

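English
Русский
简体中文

## Introduction

[Scala][Scala] is one of the main application programming languages used at Twitter. Much of Twitter's infrastructure is written in Scala, and [we have several large libraries](http://github.com/twitter/) supporting our work. While highly effective, Scala is also a large language, and our experience has taught us that it must be applied with great care. What are its pitfalls? Which features do we embrace, and which do we eschew? When do we adopt a "purely functional style", and when do we avoid it? In other words: what have we found to be an effective use of Scala? The aim of this guide is to provide a set of *best practices* distilled from our experience. At Twitter we use Scala mainly to build high-volume services that form distributed systems, so our advice is biased; still, most of the advice here should translate naturally to other domains. This is not law, but deviations should be well justified.

Scala provides many tools that enable succinct expression. Less typing means less reading, and less reading usually means faster reading, so brevity enhances clarity. However, brevity is also a blunt tool that can deliver the opposite effect: after correctness, always think of the reader.

Above all, *program in Scala*. You are not writing Java, nor Haskell, nor Python; a Scala program is unlike one written in any of these. To use a language effectively, you must phrase your problems in its terms. Forcing a Java program into Scala will, in most cases, yield something inferior to the original.

This is not an introduction to Scala; this guide assumes the reader is familiar with the language. Some resources for learning Scala are:

* [Scala School](https://twitter.github.com/scala_school/)
* [Learning Scala](https://www.scala-lang.org/node/1305)
* [Learning Scala in Small Bites](https://matt.might.net/articles/learning-scala-in-small-bites/)

This guide is a living document, so it may change to reflect our current "best practices", but its core ideas are unlikely to change: always favor readability; write generic code, but not at the expense of clarity; take advantage of simple language features. Simplicity affords great power, and it avoids esoteric corners (especially in the type system). Above all, always be aware of the trade-offs you make. A sophisticated language requires a complex implementation, and complexity begets complexity: of reasoning, of semantics, of interaction between features, and of your collaborators' understanding. Complexity is thus the tax of sophistication, and you must always make sure that utility exceeds cost.

And have fun.

## Formatting

The specifics of code *formatting* do not matter, so long as they are practical. By definition, no style is inherently good or bad, and almost everybody differs in personal preference. However, *consistent* application of the same formatting rules almost always enhances readability. A reader already familiar with a particular style does not have to grasp yet another set of local conventions or decipher yet another corner of the language grammar.

This is of particular importance to Scala, whose grammar has a high degree of overlap. Method invocation is a telling example: methods can be invoked with "`.`" or with whitespace and, likewise, with or without parentheses for nullary or unary methods, and so on. Furthermore, the different styles of method invocation expose different ambiguities in the grammar! Surely the consistent application of a carefully chosen set of formatting rules resolves a great deal of ambiguity for both human and machine.

We adhere to the [Scala style guide](http://docs.scala-lang.org/style/) and add the following rules.

### Whitespace

Indent by two spaces. Avoid lines longer than 100 columns. Leave one blank line between method, class, and object definitions.

### Naming
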

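Use short names for small scopes

In loops, `i`, `j`, and `k` are all but expected.

Use longer names for larger scopes

External APIs should have longer, explanatory names that confer meaning: for example, `Future.collect`, not `Future.all`.

Use common abbreviations but eschew esoteric ones

Everyone knows `ok`, `err`, and `defn`, whereas `sfri` is not so common.

Don't rebind names for different uses

Use `val`s.

Avoid using backticks to overload reserved names

Use `typ` instead of `` `type` ``.

Use active names for operations with side effects

`user.activate()`, not `user.setActive()`.

Use descriptive names for methods that return values

`src.isDefined`, not `src.defined`.

Don't prefix getters with `get`

As per the previous rule, it is redundant: `site.count`, not `site.getCount`.

Don't repeat names that are already encapsulated in the package or object name

Instead of

    object User {
      def getUser(id: Int): Option[User]
    }

write

    object User {
      def get(id: Int): Option[User]
    }

`User.getUser` is redundant and conveys no more information than `User.get`.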

### Imports

Sort import lines alphabetically

This makes them easy to examine visually and simple to automate.

Use braces when importing several names from a package

`import com.twitter.concurrent.{Broker, Offer}`

Use wildcards when six or more names are imported

e.g.: `import com.twitter.concurrent._`
Don't apply this blindly: some packages export too many names.

When using collections, qualify names by importing `scala.collection.immutable` and/or `scala.collection.mutable`

Mutable and immutable collections have overlapping names. Qualifying the names makes it obvious to the reader which variant is being used (e.g. "`immutable.Map`").

Do not use relative imports from other packages

Avoid

    import com.twitter
    import concurrent

in favor of the unambiguous

    import com.twitter.concurrent

Put imports at the top of the file

The reader can refer to all imports in one place.
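
The collection-import rule can be sketched as follows. This is only an illustration under stated assumptions: `ImportStyle` and `countWords` are hypothetical names invented for the example. The point is that the qualified names `mutable.Map` and `immutable.Map` tell the reader at each use which variant is involved.

```scala
// Imports at the top of the file, braces for several names from one package.
import scala.collection.{immutable, mutable}

object ImportStyle {
  // Hypothetical helper: counts occurrences of each word.
  def countWords(words: Seq[String]): immutable.Map[String, Int] = {
    // The qualifier makes it obvious that this map is mutated in place.
    val counts = mutable.Map.empty[String, Int]
    words.foreach { w => counts(w) = counts.getOrElse(w, 0) + 1 }
    // Hand an immutable snapshot back to the caller.
    counts.toMap
  }
}
```

The mutable map never escapes the method, so callers only ever see the immutable result.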


### Braces

Braces are used to create compound expressions (they serve other uses in the "module language"), where the value of the compound expression is the last expression in the list. Avoid using braces for simple expressions; write

    def square(x: Int) = x*x

.LP even though it may be tempting to distinguish the method body syntactically:

    def square(x: Int) = {
      x * x
    }

.LP The first alternative has less clutter and is easier to read. Avoid syntactical ceremony unless it clarifies.

### Pattern matching

Use pattern matching directly in function definitions whenever applicable. The indirect form

    list map { item =>
      item match {
        case Some(x) => x
        case None => default
      }
    }

.LP obscures the intent. Collapse the match instead

    list map {
      case Some(x) => x
      case None => default
    }

.LP which makes it clear that the list items are being mapped over.

### Comments

Use [ScalaDoc](https://wiki.scala-lang.org/display/SW/Scaladoc) to provide API documentation. Use the following style:

    /**
     * ServiceBuilder builds services
     * ...
     */

.LP but avoid the standard ScalaDoc style:

    /** ServiceBuilder builds services
     * ...
     */

Do not resort to ASCII art or other visual embellishments. Document APIs, but do not add unnecessary comments. If you find yourself adding comments to explain how your code behaves, first ask whether it can be restructured so that what it does is obvious. Prefer "obviously it works" to "it works, obviously" (with apologies to Hoare^[Translators' note: in his Turing Award lecture, [Tony Hoare](http://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%B3%E3%83%88%E3%83%8B%E3%83%BC%E3%83%BB%E3%83%9B%E3%83%BC%E3%82%A2) remarked, in effect, that *it is far more difficult to make a design so simple that there are "obviously" no deficiencies than to make it so complicated that there are no "obvious" deficiencies*. The author's position, by contrast, is that the intent of an implementation should be evident from the code at a glance, which is the reverse of Hoare's claim.]).

## Types and Generics

The primary objective of a type system is to detect programming errors. The type system effectively provides a limited form of static verification, allowing us to express certain kinds of invariants about our code that the compiler can verify. Type systems provide other benefits too, of course, but error checking is their raison d'être.

Our use of the type system should reflect this goal, but we must remain mindful of the reader: judicious use of types can enhance clarity, while being unduly clever only obfuscates.

Scala's powerful type system is a common subject of academic exploration and exercise (e.g. [Type level programming in Scala](http://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/)). While a fascinating academic topic, these techniques rarely find useful application in production code. They are to be avoided.

### Return type annotations

While Scala allows return type annotations to be omitted, such annotations provide good documentation; this is especially important for public methods. Where a method is not exposed and its return type is obvious, omit them.

This is especially important when instantiating objects with mixins, as the Scala compiler creates singleton types for these. For example, if the `make` function in

    trait Service
    def make() = new Service {
      def getId = 123
    }

.LP does *not* have a return type of `Service`, the compiler creates the refinement type `Object with Service{def getId: Int}`. Instead, use an explicit annotation:

    def make(): Service = new Service{}

Now the author is free to mix in more traits without changing the public type of `make`, which makes backwards compatibility easier to manage.

### Variance

Variance arises when generics are combined with subtyping. Variance defines how subtyping of the *contained* type relates to subtyping of the *container* type. Because Scala has declaration-site variance annotations, authors of common libraries, especially collections, must be prolific annotators. Such annotations are important for the usability of shared code, but misapplication can be dangerous.

Variance is an advanced but necessary aspect of Scala's type system, and should be used widely (and correctly), as it aids the application of subtyping.

*Immutable collections should be covariant*. Methods that receive the contained type should "downgrade" the collection appropriately:

    trait Collection[+T] {
      def add[U >: T](other: U): Collection[U]
    }

*Mutable collections should be invariant*. Covariance is typically invalid with mutable collections. Consider

    trait HashSet[+T] {
      def add[U >: T](item: U): Unit
    }

.LP and the following type hierarchy:

    trait Mammal
    trait Dog extends Mammal
    trait Cat extends Mammal

.LP If we now have a hash set of dogs

    val dogs: HashSet[Dog]

.LP we can treat it as a set of mammals and add a cat.

    val mammals: HashSet[Mammal] = dogs
    mammals.add(new Cat{})

.LP This is no longer a HashSet of dogs!

### Type aliases

Use type aliases when they provide convenient naming or clarify purpose, but do not alias types that are self-explanatory.

    () => Int

.LP is clearer than

    type IntMaker = () => Int
    IntMaker

.LP since it is both short and uses a common type. However

    class ConcurrentPool[K, V] {
      type Queue = ConcurrentLinkedQueue[V]
      type Map   = ConcurrentHashMap[K, Queue]
      ...
    }

.LP is helpful since it communicates purpose and enhances brevity.

Do not use subclassing when an alias will do.

    trait SocketFactory extends (SocketAddress => Socket)

.LP a `SocketFactory` *is* a function that produces a `Socket`. Using a type alias

    type SocketFactory = SocketAddress => Socket

.LP is better. We may now provide function literals for values of type `SocketFactory` and also use function composition:

    val addrToInet: SocketAddress => Long
    val inetToSocket: Long => Socket

    val factory: SocketFactory = addrToInet andThen inetToSocket

Type aliases are bound to top-level names by using package objects:

    package com.twitter
    package object net {
      type SocketFactory = (SocketAddress) => Socket
    }

Note that type aliases are not new types: they are equivalent to syntactically substituting the aliased name for its type.

### Implicits

Implicits are a powerful type system feature, but they should be used sparingly. They have complicated resolution rules and make it difficult, even by simple lexical examination, to grasp what is actually happening. It is definitely OK to use implicits in the following situations:

* Extending or adding a Scala-style collection
* Adapting or extending an object ("pimp my library" pattern)
* Use to *enhance type safety* by providing [constraint evidence](http://www.ne.jp/asahi/hishidama/home/tech/scala/generics.html#h_generalized_type_constraints)
* To provide type evidence (typeclassing)
* For `Manifest`s (`TypeTag`s since Scala 2.10)

If you do find yourself using implicits, always ask whether there is a way to achieve the same thing without their help.

Do not use implicits to convert automatically between similar datatypes (for example, converting a list to a stream). Convert explicitly instead: the types behave differently, and the reader would otherwise have to watch for implicit conversions at work.

## Collections

Scala has a very generic, rich, powerful, and composable collections library. Collections are high level and expose a large set of operations. Many collection manipulations and transformations can be expressed succinctly and readably, but careless application of these features often leads to the opposite result. Every Scala programmer should read the [collections design document](http://docs.scala-lang.org/ja/overviews/collections/introduction.html); it provides great insight into, and motivation for, the Scala collections library.

Always use the simplest collection that meets your needs.

### Hierarchy

The collections library is large: in addition to an elaborate hierarchy, the root of which is `Traversable[T]`, there are `immutable` and `mutable` variants for most collections. Whatever the complexity, the following diagram contains the important distinctions for both the `immutable` and `mutable` hierarchies.


[Figure: collection hierarchy diagram (coll.png)]

`Iterable[T]` is any collection that may be iterated over; it provides an `iterator` method (and thus `foreach`). `Seq[T]`s are collections that are *ordered*, `Set[T]`s are mathematical sets (unordered collections of unique items), and `Map[K, V]`s are associative arrays, also unordered.

### Use

*Prefer using immutable collections.* They are applicable in most circumstances, and because they are referentially transparent they are thread-safe by default and make programs easier to reason about.

*Use the `mutable` namespace explicitly.* Don't import `scala.collection.mutable._` and refer to `Set`; instead

    import scala.collection.mutable
    val set = mutable.Set()

.LP makes it clear that the mutable variant of `Set` is being used.

*Use the default constructor for the collection type.* Whenever you need an ordered sequence (and not necessarily linked-list semantics), use the `Seq()` constructor, and so on:

    val seq = Seq(1, 2, 3)
    val set = Set(1, 2, 3)
    val map = Map(1 -> "one", 2 -> "two", 3 -> "three")

.LP This style separates the semantics of the collection from its implementation, letting the collections library use the most appropriate type: you need a `Map`, not necessarily a red-black tree. Furthermore, these default constructors often use specialized representations: for example, `Map()` uses a three-field object (the `Map3` class) for maps with three keys.

The corollary to the above is: in your own methods and constructors, *accept the most generic collection type that is appropriate*. This typically boils down to one of the above: `Iterable`, `Seq`, `Set`, or `Map`. If your method needs a sequence, use `Seq[T]`, not `List[T]`.

.LP (A word of caution: the default `Traversable`, `Iterable`, and `Seq` types, defined in `scala.package`, are the `scala.collection` versions, as opposed to `Map` and `Set`, defined in `Predef.scala`, which are the `scala.collection.immutable` versions. This means that, for example, the default `Seq` type can be *both* the immutable and the mutable implementation. Thus, if your method relies on a collection parameter being immutable and you are using `Traversable`, `Iterable`, or `Seq`, you must specifically require/import the immutable variant; otherwise someone may pass you the mutable version.)

### Style

Functional programming encourages pipelining transformations of an immutable collection to shape it into the desired result. This often leads to very succinct solutions, but it can also confuse the reader: it is often difficult to discern the author's intent, and the reader is left to keep track of all the intermediate results that are only implied. For example, suppose we want to aggregate votes for different programming languages from a sequence of (language, num votes), showing them in order of most votes to least. We could write:

    val votes = Seq(("scala", 1), ("java", 4), ("scala", 10), ("scala", 1), ("python", 10))
    val orderedVotes = votes
      .groupBy(_._1)
      .map { case (which, counts) =>
        (which, counts.foldLeft(0)(_ + _._2))
      }.toSeq
      .sortBy(_._2)
      .reverse

.LP This is both succinct and correct, but nearly every reader will have a difficult time recovering the original intent of the author. A strategy that often serves to clarify is to name intermediate results and parameters:

    val votesByLang = votes groupBy { case (lang, _) => lang }
    val sumByLang = votesByLang map { case (lang, counts) =>
      val countsOnly = counts map { case (_, count) => count }
      (lang, countsOnly.sum)
    }
    val orderedVotes = sumByLang.toSeq
      .sortBy { case (_, count) => count }
      .reverse

.LP This code is nearly as succinct, but it expresses much more clearly both the transformations that take place (by naming intermediate values) and the structure of the data being operated on (by naming parameters). If you worry about namespace pollution with this style, group expressions with `{}`:

    val orderedVotes = {
      val votesByLang = ...
      ...
    }


### Performance

High-level collections libraries (as with higher-level constructs generally) make reasoning about performance more difficult: the further you stray from instructing the computer directly (in other words, from imperative style), the harder it is to predict the exact performance implications of a piece of code. Reasoning about correctness, however, is typically easier, and readability is also enhanced. With Scala the picture is further complicated by the Java runtime: Scala hides boxing and unboxing operations from you, which can incur severe performance or space penalties.

Before focusing on low-level details, make sure you are using a collection appropriate for your use, and that your data structure does not have unexpected asymptotic complexity. The complexities of the various Scala collections are described [here](http://docs.scala-lang.org/ja/overviews/collections/performance-characteristics.html).

The first rule of optimizing for performance is to understand *why* your application is slow. Do not operate without data: profile^[[Yourkit](http://yourkit.com) is a good profiler.] your application before proceeding. Focus first on hot loops and large data structures. Excessive focus on optimization is typically wasted effort. Remember Knuth's maxim: "Premature optimisation is the root of all evil."

It is often appropriate to use lower-level collections in situations that require better performance or space efficiency. Use arrays instead of lists for large sequences (the immutable `Vector` collection provides a referentially transparent interface to arrays), and use buffers instead of direct sequence construction when performance matters.

### Java Collections

Use `scala.collection.JavaConverters` to interoperate with Java collections. It is a set of implicits that add the `asJava` and `asScala` conversion methods. Using these keeps such conversions explicit, which aids the reader:

    import scala.collection.JavaConverters._

    val list: java.util.List[Int] = Seq(1,2,3,4).asJava
    val buffer: scala.collection.mutable.Buffer[Int] = list.asScala

## Concurrency

Modern services are highly concurrent, and it is common for servers to coordinate tens to hundreds of thousands of simultaneous operations. Handling the implied complexity is a central theme in authoring robust systems software.

*Threads* provide a means of expressing concurrency: they give you independent, heap-sharing execution contexts that are scheduled by the operating system. However, thread creation is expensive in Java, so threads are a resource that must be managed, typically with the use of pools. This creates additional complexity for the programmer, and also a high degree of coupling: it is difficult to divorce application logic from its use of the underlying resources.

This complexity is especially apparent when creating services that have a high degree of fan-out: each incoming request results in a multitude of requests to yet another tier of systems. In these systems, thread pools must be managed so that they are balanced according to the ratios of requests in each tier: mismanagement of one thread pool bleeds into another.

Robust systems must also consider timeouts and cancellation, both of which require the introduction of yet more "control" threads, complicating the problem further. Note that if threads were cheap these problems would be diminished: no pooling would be required, timed-out threads could be discarded, and no additional resource management would be needed.

Thus resource management compromises modularity.

### Futures

Use Futures to manage concurrency. They decouple concurrent operations from resource management: for example, [Finagle][Finagle] multiplexes concurrent operations onto few threads in an efficient manner. Scala has lightweight closure literal syntax, so Futures introduce little syntactic overhead, and they become second nature to most programmers.

Futures allow the programmer to express concurrent computation in a declarative style, are composable, and have principled handling of failure. These qualities have convinced us that they are especially well suited to functional programming languages, where this is the encouraged style.

*Prefer transforming futures over creating your own.* Future transformations ensure that failures are propagated and that cancellations are signalled, and they free the programmer from thinking about the implications of the Java memory model. Even a careful programmer might write the following to issue an RPC 10 times in sequence and then print the results:

    val p = new Promise[List[Result]]
    var results: List[Result] = Nil
    def collect(): Unit = {
      doRpc() onSuccess { result =>
        results = result :: results
        if (results.length < 10)
          collect()
        else
          p.setValue(results)
      } onFailure { t =>
        p.setException(t)
      }
    }

    collect()
    p onSuccess { results =>
      printf("Got results %s\n", results.mkString(", "))
    }

The programmer had to insert control flow in several places to ensure that RPC failures are propagated; worse, the code is wrong! Without declaring `results` volatile, we cannot ensure that `results` holds the previous value in each iteration. The Java memory model is a subtle beast, but luckily we can avoid all of these pitfalls by using the declarative style:

    def collect(results: List[Result] = Nil): Future[List[Result]] =
      doRpc() flatMap { result =>
        if (results.length < 9)
          collect(result :: results)
        else
          Future.value(result :: results)
      }

    collect() onSuccess { results =>
      printf("Got results %s\n", results.mkString(", "))
    }

We use `flatMap` to sequence operations and prepend the result onto the list as we proceed. This is a common functional programming idiom translated to Futures. It is correct, requires less boilerplate, is less error prone, and also reads better.

*Use the Future combinators.* `Future.select`, `Future.join`, and `Future.collect` codify common patterns when operating over multiple futures that should be combined.

### Collections

The subject of concurrent collections is fraught with opinions, subtleties, dogma, and FUD. In most practical situations they are a nonissue: always start with the simplest, most boring, and most standard collection that serves the purpose. Don't reach for a concurrent collection before you *know* that a synchronized one won't do: the JVM has sophisticated machinery to make synchronization cheap, so its efficacy may surprise you.

If an immutable collection will do, use it. Immutable collections are referentially transparent, so reasoning about them in a concurrent context is simple. Mutations in immutable collections are typically handled by updating a reference to the current value (in a `var` cell or an `AtomicReference`). Care must be taken to apply these correctly: to publish them to other threads, atomics must be retried and `var`s must be declared volatile.

Mutable concurrent collections have complicated semantics and make use of subtler aspects of the Java memory model, so make sure you understand the implications, especially with respect to publishing updates, before you use them. Synchronized collections also compose better: operations like `getOrElseUpdate` cannot be implemented correctly by concurrent collections, and creating composite concurrent collections is especially error prone.


## Control structures

Programs in the functional style tend to require fewer traditional control structures, and read better when written in the declarative style. This typically implies breaking your logic up into several small methods or functions and gluing them together with `match` expressions. Functional programs also tend to be more expression-oriented: branches of conditionals compute values of the same type, `for (..) yield` computes comprehensions, and recursion is commonplace.

### Recursion

*Phrasing your problem in recursive terms often simplifies it*, and if tail call optimization applies (which can be checked with the `@tailrec` annotation), the compiler will translate your code into a regular loop.

Consider a fairly standard imperative version of heap fix-down:

    def fixDown(heap: Array[T], m: Int, n: Int): Unit = {
      var k: Int = m
      while (n >= 2*k) {
        var j = 2*k
        if (j < n && heap(j) < heap(j + 1))
          j += 1
        if (heap(k) >= heap(j))
          return
        else {
          swap(heap, k, j)
          k = j
        }
      }
    }

Every time the while loop is entered, we are working with state dirtied by the previous iteration. The value of each variable depends on which branches were taken, and the method returns in the middle of the loop once the correct position is found (the keen reader will find similar arguments in Dijkstra's ["Go To Statement Considered Harmful"](http://www.u.arizona.edu/~rubinson/copyright_violations/Go_To_Considered_Harmful.html) ^[Translators' note: [Edsger Dijkstra](http://ja.wikipedia.org/wiki/%E3%82%A8%E3%83%89%E3%82%AC%E3%83%BC%E3%83%BB%E3%83%80%E3%82%A4%E3%82%AF%E3%82%B9%E3%83%88%E3%83%A9) was an advocate of structured programming. His essay "Go To Statement Considered Harmful" is famous for starting the "GOTO considered harmful" debate.]).

Consider a (tail) recursive implementation^[From [Finagle's heap balancer](https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer/Heap.scala#L41)]:

    @tailrec
    final def fixDown(heap: Array[T], i: Int, j: Int): Unit = {
      if (j < i*2) return

      val m = if (j == i*2 || heap(2*i) < heap(2*i+1)) 2*i else 2*i + 1
      if (heap(m) < heap(i)) {
        swap(heap, i, m)
        fixDown(heap, m, j)
      }
    }

.LP Here every iteration starts with a well-defined clean slate, and there are no reference cells, so invariants abound. The method is much easier to reason about and easier to read as well. There is also no performance penalty: since the method is tail-recursive, the compiler translates it into a standard imperative loop.


### Returns

The previous section showed the benefits of recursion, but this is not to say that imperative structures are without value. In many cases, terminating a computation early is better than having conditional branches at every possible point of termination: indeed, in the `fixDown` above, a `return` is used to terminate early when we reach the end of the heap.

Returns can be used to cut down on branching and to establish invariants. This helps the reader by reducing nesting and making the code easier to follow, and it makes the correctness of subsequent code easier to reason about (for instance, checking that an array cannot be accessed out of bounds). This is especially useful in "guard" clauses:

    def compare(a: AnyRef, b: AnyRef): Int = {
      if (a eq b)
        return 0

      val d = System.identityHashCode(a) compare System.identityHashCode(b)
      if (d != 0)
        return d

      // slow path..
    }

Use `return`s to clarify and enhance readability, but not as you would in an imperative language; that is, avoid using them to return the results of a computation. Instead of

    def suffix(i: Int): String = {
      if      (i == 1) return "st"
      else if (i == 2) return "nd"
      else if (i == 3) return "rd"
      else             return "th"
    }

.LP prefer:

    def suffix(i: Int) =
      if      (i == 1) "st"
      else if (i == 2) "nd"
      else if (i == 3) "rd"
      else             "th"

.LP but using a `match` expression is superior to either:

    def suffix(i: Int) = i match {
      case 1 => "st"
      case 2 => "nd"
      case 3 => "rd"
      case _ => "th"
    }

Note that returns can have hidden costs when used inside a closure.

    seq foreach { elem =>
      if (elem.isLast)
        return

      // process...
    }

.LP This `return` is implemented in bytecode as code that throws and catches an exception, which has performance implications when used in hot code.

### `for` loops and comprehensions

`for` provides a succinct and natural way to express loops and aggregation. It is especially useful when flattening many sequences. The syntax of `for` belies the underlying mechanism, which allocates and dispatches closures. This can lead to both unexpected costs and unexpected semantics. For example,

    for (item <- container) {
      if (item != 2) return
    }

.LP may cause a runtime error if the container delays computation, making the `return` nonlocal!

For these reasons, it is often preferable to call `foreach`, `flatMap`, `map`, and `filter` directly, but do use `for`s when they clarify.

.LP (Translators' supplement: Scala's `for` expression is syntactic sugar for calls to `foreach`, `flatMap`, `map`, and `withFilter`, and the expressions in the loop body are converted at compile time into anonymous functions passed to those methods. For example, the `for` expression above is executed as:

    container foreach { item =>
      if (item != 2) return
    }

.LP This guide recommends writing it this way from the start.

Incidentally, a `return` expression inside a nested anonymous function is translated into the `throw` and `catch` of a `NonLocalReturnControl`, a runtime exception. In that case, if the `container` in the `for` expression is evaluated lazily, the `return` may not behave as intended. Readers interested in the details may refer to [this discussion](https://github.com/scalajp/effectivescala/commit/8b448ef819e6d87d21fa78310b84fc72593b0226#commitcomment-996948).)

### `require` and `assert`

`require` and `assert` both serve as executable documentation. Both are useful in situations where the type system cannot express the required invariants. `assert` is used for *invariants* that the code assumes (either internal or external), for example

    val stream = getClass.getResourceAsStream("someclassdata")
    assert(stream != null)

Whereas `require` is used to express API contracts:

    def fib(n: Int) = {
      require(n > 0)
      ...
    }

## Functional programming

*Value-oriented* programming confers many advantages, especially when used in conjunction with functional programming constructs. This style emphasizes the transformation of values over stateful mutation, yielding code that is referentially transparent and provides stronger invariants, and is thus easier to reason about. Case classes, pattern matching, destructuring bindings, type inference, and lightweight closure- and method-creation syntax are the tools of this trade.

### Case classes as algebraic data types

Case classes encode algebraic data types (ADTs): they are useful for modelling a large number of data structures and provide succinct code with strong invariants, especially when used in conjunction with pattern matching. The pattern matcher implements exhaustivity analysis, providing even stronger static guarantees.

Use the following pattern when encoding ADTs with case classes:

    sealed trait Tree[T]
    case class Node[T](left: Tree[T], right: Tree[T]) extends Tree[T]
    case class Leaf[T](value: T) extends Tree[T]

.LP The type `Tree[T]` has two constructors: `Node` and `Leaf`. Declaring the type `sealed` allows the compiler to do exhaustivity analysis, since constructors cannot be added outside the source file.

Together with pattern matching, such modelling results in code that is both succinct and "obviously correct":

    def findMin[T <: Ordered[T]](tree: Tree[T]): T = tree match {
      case Node(left, right) => Seq(findMin(left), findMin(right)).min
      case Leaf(value) => value
    }

While recursive structures like trees constitute classic applications of ADTs, their domain of usefulness is much larger. Disjoint unions in particular, which occur frequently in state machines, are readily modelled with ADTs.

### Options

The `Option` type is a container that is either empty (`None`) or full (`Some(value)`). It provides a safe alternative to `null` and should be used instead of `null` whenever possible. Options are collections (of at most one item) and they come with the collection operations. Use them!

Writing the following is unsafe:

    var username: String = null
    ...
    username = "foobar"

.LP Write this instead:

    var username: Option[String] = None
    ...
    username = Some("foobar")

.LP The `Option` type is safer because it statically enforces checking whether `username` is empty.

Conditional execution on an `Option` value should be done with `foreach`; instead of

    if (opt.isDefined)
      operate(opt.get)

.LP write

    opt foreach { value =>
      operate(value)
    }

The style may seem odd, but it provides greater safety (we don't call the exceptional `get`) and brevity. If both the empty and the non-empty branch matter, use pattern matching:

    opt match {
      case Some(value) => operate(value)
      case None => defaultAction()
    }

.LP but if all that is missing is a default value, use `getOrElse`

    operate(opt getOrElse defaultValue)

Do not overuse `Option`: if there is a sensible default, such as a [*Null Object*](http://en.wikipedia.org/wiki/Null_Object_pattern), use that instead.

`Option` also comes with a handy constructor for wrapping nullable values:

    Option(getClass.getResourceAsStream("foo"))

.LP is an `Option[InputStream]` that is `None` should `getResourceAsStream` return `null`.

### Pattern matching

Pattern matches (`x match { ...`) are pervasive in well-written Scala code: they conflate conditional execution, destructuring, and casting into one construct. Used well, they enhance both clarity and safety.

Use pattern matching to implement type switches:

    obj match {
      case str: String => ...
      case addr: SocketAddress => ...
    }

Pattern matching works best when combined with destructuring. When matching case classes, instead of

    animal match {
      case dog: Dog => "dog (%s)".format(dog.breed)
      case _ => animal.species
    }

.LP write

    animal match {
      case Dog(breed) => "dog (%s)".format(breed)
      case other => other.species
    }

Write [custom extractors](http://www.scala-lang.org/node/112) only with a dual constructor (`apply`); otherwise their use may be out of place.

Don't use pattern matching for conditional execution when defaults make more sense. The collections libraries usually provide methods that return `Option`s; avoid

    val x = list match {
      case head :: _ => head
      case Nil => default
    }

.LP because

    val x = list.headOption getOrElse default

.LP is both shorter and communicates purpose.

### Partial functions

Scala provides syntactical shorthand for defining a `PartialFunction`:

    val pf: PartialFunction[Int, String] = {
      case i if i%2 == 0 => "even"
    }

.LP and they may be composed with `orElse`

    val tf: (Int => String) = pf orElse { case _ => "odd"}

    tf(1) == "odd"
    tf(2) == "even"

Partial functions arise in many situations and are effectively encoded with `PartialFunction`, for example as arguments to methods

    trait Publisher[T] {
      def subscribe(f: PartialFunction[T, Unit]): Unit
    }

    val publisher: Publisher[Int] = ...
    publisher.subscribe {
      case i if isPrime(i) => println("found prime", i)
      case i if i%2 == 0 => count += 2
      /* ignore the rest */
    }

.LP or in situations that might otherwise call for returning an `Option`:

    // Attempt to classify the throwable for logging.
    type Classifier = Throwable => Option[java.util.logging.Level]

.LP might be better expressed with a `PartialFunction`

    type Classifier = PartialFunction[Throwable, java.util.logging.Level]

.LP as it affords greater composability:

    val classifier1: Classifier
    val classifier2: Classifier

    val classifier = classifier1 orElse classifier2 orElse { case _ => java.util.logging.Level.FINEST }


### Destructuring bindings

Destructuring bindings^[Translators' note: Japanese renderings of "destructuring binding" include 構造化代入 and 分配束縛. See [this discussion](https://github.com/scalajp/effectivescala/issues/4) for details.] are a form of value binding related to pattern matching. They use the same mechanism but are applicable only when there is exactly one option (lest you accept the possibility of an exception). Destructuring binds are particularly useful for tuples and case classes.

    val tuple = ('a', 1)
    val (char, digit) = tuple

    val tweet = Tweet("just tweeting", Time.now)
    val Tweet(text, timestamp) = tweet

### Laziness

Fields in Scala are computed *by need* when a `val` is prefixed with `lazy`. Because fields and methods are equivalent in Scala (use `private[this]` if you want a field equivalent to a Java field),

    lazy val field = computation()

.LP is (roughly) shorthand for

    var _theField: Option[A] = None  // A: the result type of computation()
    def field = if (_theField.isDefined) _theField.get else {
      _theField = Some(computation())
      _theField.get
    }

.LP i.e., it computes a result and memoizes it. Use lazy fields for this purpose, but avoid using laziness when laziness is required by semantics. In these cases it is better to be explicit, since that makes the cost model clear and allows side effects to be controlled more precisely.

Lazy fields are thread safe.

### Call by name

Method parameters may be specified by-name, meaning the parameter is bound not to a value but to a *computation* that may be repeated. This feature must be applied with care: a caller expecting by-value semantics will be surprised. The motivation for this feature is to construct syntactically natural DSLs; new control constructs in particular can be made to look much like native language features.

Only use call-by-name for such control constructs, where it is obvious to the caller that what is being passed in is a "block" rather than the result of an unsuspecting computation. Likewise, only use call-by-name arguments in the last position of the last argument list. When using call-by-name, ensure that the method is named so that it is obvious to the caller that its argument is call-by-name.

値を複数回計算させたくて、特にその計算が副作用を持つなら明示的な関数を使おう: 794 | 795 | class SSLConnector(mkEngine: () => SSLEngine) 796 | 797 | .LP 提供者の意図はシグネチャから明らかであり、呼び出し側を驚かせることがない。 798 | 799 | ### `flatMap` 800 | 801 | `flatMap` は `map` と `flatten` を組み合わせたもので、その巧妙な力と素晴らしい実用性によって特別な注目を浴びるに値する。`flatMap` は、その仲間の `map` と同様に、`Future` や `Option` のような従来とは異なるコレクションにおいてもしばしば利用可能だ。その振る舞いはシグネチャから明らかだ。ある `Container[A]` について、 802 | 803 | flatMap[B](f: A => Container[B]): Container[B] 804 | 805 | .LP flatMap はコレクションの要素に対し、各要素から新しいコレクションを作り出す関数 f を呼び出した後、その生成した（全ての）コレクションを平坦化 (flatten) した結果を返す。例えば、同じ文字を繰り返さないような 2 文字の文字列の順列を全て取得するには: 806 | 807 | val chars = 'a' to 'z' 808 | val perms = chars flatMap { a => 809 | chars flatMap { b => 810 | if (a != b) Seq("%c%c".format(a, b)) 811 | else Seq() 812 | } 813 | } 814 | 815 | .LP これは、上記のコードをより簡潔に記述できる糖衣構文である for 内包記法と（おおよそ）等価だ: 816 | 817 | val perms = for { 818 | a <- chars 819 | b <- chars 820 | if a != b 821 | } yield "%c%c".format(a, b) 822 | 823 | `flatMap` は `Option` を扱う際にしばしば有用だ。`flatMap` を使うと、Option の連鎖を畳み込んで一つにできる。 824 | 825 | val host: Option[String] = ... 826 | val port: Option[Int] = ... 
827 | 828 | val addr: Option[InetSocketAddress] = 829 | host flatMap { h => 830 | port map { p => 831 | new InetSocketAddress(h, p) 832 | } 833 | } 834 | 835 | .LP これも for を使えばもっと簡潔に記述できる。 836 | 837 | val addr: Option[InetSocketAddress] = for { 838 | h <- host 839 | p <- port 840 | } yield new InetSocketAddress(h, p) 841 | 842 | `Future` における `flatMap` の使い方は”Future”の章で議論する。 843 | 844 | ## オブジェクト指向プログラミング 845 | 846 | Scala の偉大さの大部分は、オブジェクトシステムによるものだ。Scala は*すべての値*がオブジェクトであるという意味で*純粋な*言語であり、プリミティブ型と複合型の間に違いはない。Scala にはミックスイン (mixin) の機能もある。ミックスインを使うと、もっと直交的かつ段階的にモジュールを組み合わせられるだけでなく、そこにコンパイル時の静的な型検査を柔軟に組み合わせて、その恩恵をすべて享受できる。 847 | 848 | ミックスインシステムの背景にある動機は、従来の依存性注入 (dependency injection) を不要にすることだ。その”コンポーネントスタイル”のプログラミングの極致こそが [Cake 849 | パターン](http://jonasboner.com/real-world-scala-dependency-injection-di/)（[日本語訳](http://eed3si9n.com/ja/real-world-scala-dependency-injection-di)）だ。 850 | 851 | ### 依存性注入 852 | 853 | しかし、Scala 自身が、”古典的な”（コンストラクタへの）依存性注入を利用する際の面倒な構文を、ほとんど取り除いてくれることが分かったので、Twitter ではむしろ依存性注入を使うようにしている: それはより明快で、依存性はやはり（コンストラクタの）型によってエンコードされ、クラスを構築する構文はとりたてて難しくなく扱いやすい。それは退屈で単純だが、うまくいく。*依存性注入はプログラムをモジュール化するために使おう*。そして、特に*継承より合成を使おう*。これにより、よりモジュール化されてテストが容易なプログラムになる。継承が必要な状況に遭遇したら、こう考えてみよう: ”もし継承をサポートしない言語を使うとしたら、このプログラムをどのように構造化するだろう？”と。その答えには説得力があるかもしれない。 854 | 855 | 通常、依存性注入にはトレイトを使う。 856 | 857 | trait TweetStream { 858 | def subscribe(f: Tweet => Unit) 859 | } 860 | class HosebirdStream extends TweetStream ... 861 | class FileStream extends TweetStream ... 
862 | 863 | class TweetCounter(stream: TweetStream) { 864 | stream.subscribe { tweet => count += 1 } 865 | } 866 | 867 | 一般的には、注入するのは他のオブジェクトを生成するオブジェクトである*ファクトリ (factory)* だ。この場合、特化したファクトリ型ではなくシンプルな関数を使うべきだ。（訳者による補足: つまり、戻り値を持つあらゆる関数はファクトリとみなせるということ） 868 | 869 | class FilteredTweetCounter(mkStream: Filter => TweetStream) { 870 | mkStream(PublicTweets).subscribe { tweet => publicCount += 1 } 871 | mkStream(DMs).subscribe { tweet => dmCount += 1 } 872 | } 873 | 874 | ### トレイト 875 | 876 | 依存性注入を使用するからといって、共通の*インタフェース*や、トレイト (trait) に実装された共通コードを同時に使ってはならないということは全くない。それどころか、一つの具象クラスが複数のインタフェース（トレイト）を実装するかもしれず、また共通コードを全てのクラスで横断的に再利用するかもしれないので、トレイトの使用は強く推奨される。 877 | 878 | トレイトは短くして直交性を保とう: 分割できる機能を一つのトレイトの塊にしてはいけない。互いに組み合わさる関連するアイデアのうち最小のものを考えるようにする。例えば、IO を行う何かがあるとしよう: 879 | 880 | trait IOer { 881 | def write(bytes: Array[Byte]) 882 | def read(n: Int): Array[Byte] 883 | } 884 | 885 | .LP これを二つの振る舞いに分離する: 886 | 887 | trait Reader { 888 | def read(n: Int): Array[Byte] 889 | } 890 | trait Writer { 891 | def write(bytes: Array[Byte]) 892 | } 893 | 894 | .LP そして、これらを互いに new Reader with Writer… のようにミックスインして、先ほどの IOer を形成する。インターフェイスの最小化は、よりよい直交性とモジュール化をもたらす。 895 | 896 | ### 可視性 897 | 898 | Scala は、可視性を制御するための非常に表現力の高い修飾子を持つ。これらの修飾子は、何を*公開 API* として構成するかを定義するのに使うので重要だ。公開する API は限定されるべきだ。それによってユーザが実装の詳細にうっかり依存することがなくなり、また、作者が API を変更する能力を制限する。これらは、良いモジュール性にとって極めて重要だ。原則的に、公開 API を拡張するのは縮小するよりもはるかに簡単だ。また、アノテーションが貧弱だと、コードのバイナリの後方互換性が危うくなる。 899 | 900 | #### `private[this]` 901 | 902 | `private` に指定したクラスメンバは、 903 | 904 | private val x: Int = ... 905 | 906 | .LP そのクラス（サブクラスは除く）の全てのインスタンスから可視になる。ほとんどの場合、private[this] としたいだろう。 907 | 908 | private[this] val x: Int = ... 909 | 910 | .LP これで x の可視性は特定のインスタンスに制限される。Scala コンパイラは、private[this] を単純なフィールドへのアクセスに変換できる（メンバへのアクセスが、静的に定義されたクラスに限定されるため）。これは時に、性能の最適化に寄与する。 911 | 912 | #### シングルトンクラス型 913 | 914 | Scala において、シングルトンクラス型を生成するのは一般的だ。例えば、 915 | 916 | def foo() = new Foo with Bar with Baz { 917 | ... 
918 | } 919 | 920 | .LP このような状況で可視性を制限するには、戻り型を宣言する: 921 | 922 | def foo(): Foo with Bar = new Foo with Bar with Baz { 923 | ... 924 | } 925 | 926 | .LP foo() の呼び出し側は、返されたインスタンスの限定されたビュー (Foo with Bar) を見ることになる。 927 | 928 | ### 構造的部分型 929 | 930 | 通常、構造的部分型 (structural type^[訳注: "structural typing" を直訳すると”構造的な型付け”だが、Scalaの文脈では”構造的部分型(structural subtyping)”と同じ意味だと考えて良い。この用語の背景については[こちらの解説](https://github.com/scalajp/effectivescala/pull/1#r455268)を参照して欲しい。]) を使うべきではない。構造的部分型は便利で強力な機能だが、残念なことに JVM 上では効率的に実装されない。しかし、実装上の気まぐれによって、構造的部分型はリフレクションのためのとても優れた簡易記法を提供する。 931 | 932 | val obj: AnyRef 933 | obj.asInstanceOf[{def close()}].close() 934 | 935 | ## エラー処理 936 | 937 | Scala は例外機能を提供するが、正確さのためにプログラマが適切に対処すべき場合に、これを一般的なエラーに対して使ってはいけない。代わりに `Option` や `com.twitter.util.Try` を使うのは、慣習的で良い選択だ。これらは、型システムを利用して、ユーザがエラー処理を適切に考慮するようにする。 938 | 939 | 例えば、レポジトリを設計する時に、以下のような API にしたくなるかもしれない: 940 | 941 | trait Repository[Key, Value] { 942 | def get(key: Key): Value 943 | } 944 | 945 | .LP しかし、これを実装すると key が存在しない時に例外を投げる必要がある。より良いやり方は Option を使うことだ: 946 | 947 | trait Repository[Key, Value] { 948 | def get(key: Key): Option[Value] 949 | } 950 | 951 | .LP このインタフェースなら、レポジトリがあらゆる key を含まなくてもよく、またプログラマが key が見つからない場合に対処しなければならないことが明確になる。さらに、Option はこうしたケースに対処するための数多くのコンビネータを備えている。例えば、getOrElse は key が見つからない場合にデフォルト値を供給するのに使われる: 952 | 953 | val repo: Repository[Int, String] 954 | repo.get(123) getOrElse "defaultString" 955 | 956 | ### 例外処理 957 | 958 | Scala の例外機構は非チェック例外、つまりプログラマが可能性のある例外をカバーしているかをコンパイラが静的にチェックできないので、例外処理において広い網をかけたくなりがちだ。 959 | 960 | しかし、いくつかの *fatal*（致命的）な例外は捕捉 (catch) するべきではない。 961 | 962 | However, some exceptions are *fatal* and should never be caught; the 963 | code 964 | 965 | try { 966 | operation() 967 | } catch { 968 | case _ => ... 969 | } 970 | 971 | .LP このコードは伝搬するべき致命的なエラーを捕捉してしまうので、ほとんどの場合で誤りだ。代わりに、非致命的な例外のみを捕捉する com.twitter.util.NonFatal 抽出子を使う。 972 | 973 | try { 974 | operation() 975 | } catch { 976 | case NonFatal(exc) => ... 
977 | } 978 | 979 | ## ガベージコレクション 980 | 981 | Twitter では、実運用において多くの時間をガベージコレクションのチューニングに費している。ガベージコレクションにおける関心事は Java のそれとほとんど同じだが、関数型スタイルの副作用として、慣習的な Scala コードは Java よりも多くの（生存期間の短い）ガベージを生成する。HotSpot の世代別ガベージコレクションは、生存期間の短いガベージを効率的に解放するので、ほとんどの状況ではこれは問題にならない。 982 | 983 | GC の性能問題に取り組む前に、Twitter でのいくつかの GC チューニングの経験について解説した Attila の[プレゼンテーション](http://www.infoq.com/presentations/JVM-Performance-Tuning-twitter)を見て欲しい。 984 | 985 | GC 問題を軽減するための Scala に特有な唯一の手段は、ガベージの生成をより少なくすることだ。しかし、データ無しで行動してはならない！明らかに性能を劣化させる何かをしているのでなければ、Java の様々なプロファイリングツールを使おう。Twitter 自身も [heapster](https://github.com/mariusaeriksen/heapster) や 986 | [gcprof](https://github.com/twitter/jvmgcprof) といったツールを提供している。 987 | 988 | ## Java との互換性 989 | 990 | Twitter では、Java から使われるコードを Scala で書くとき、Java での慣習的な使い方ができるようにしている。多くの場合、追加の努力は必要ない。クラスと純粋な（実装を含まない）トレイトは、Java において対応するものと正確に同じものになる。しかし、時々、別個の Java API を提供する必要がある。ライブラリの Java API の感じをつかむ良い方法は、Java で単体テストを書くことだ（コンパイルが通れば良い）。このテストによってライブラリの Java ビューが安定していることが保証されるので、将来、Scala コンパイラが生成する実装が変化しても検出できる。 991 | 992 | 実装を含むトレイトは直接 Java から利用できない。代わりに、抽象クラスをトレイトと共に拡張する。 993 | 994 | // 直接 Java からは利用できない 995 | trait Animal { 996 | def eat(other: Animal) 997 | def eatMany(animals: Seq[Animal) = animals foreach(eat(_)) 998 | } 999 | 1000 | // しかし、これなら利用できる 1001 | abstract class JavaAnimal extends Animal 1002 | 1003 | ## Twitterの標準ライブラリ 1004 | 1005 | Twitter において、最も重要な標準ライブラリは [Util](http://github.com/twitter/util) と [Finagle](https://github.com/twitter/finagle) だ。Util は、Scala や Java の標準ライブラリの拡張という位置付けで、それらに欠けている機能やより適切な実装を提供する。Finagle は、Twitter の RPC システムで、分散システムの構成要素の中核だ。 1006 | 1007 | ### Future 1008 | 1009 | Future については、並行性の章でも簡単に議論した。Future は非同期処理の協調において中心的な機構で、Twitter のコードベースや Finagle のコアで広く使われている。Future は並行イベントの合成を可能にするとともに、並行性の高い操作についての推論を単純化する。また Future を使うと、JVM 上で並行操作を非常に効率的に実装できる。 1010 | 1011 | Twitter の Future は*非同期*だ。だから、例えばネットワーク入出力やディスク入出力のように、スレッドの実行を一時停止させうるブロッキング操作はシステムが処理する必要がある。このとき、システムはブロッキング操作の結果に対する Future を提供する。Finagle 
provides such a system for network I/O.

Futures are plain and simple: a Future holds the *promise* of the result of a computation that is not yet complete. It is a simple container, a placeholder. A computation can of course fail, so failure must be encoded as well: a Future is in exactly one of three states, *pending*, *failed*, or *completed*.
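The three states can be made concrete with a toy model. This is only a sketch in plain Scala: `State`, `Pending`, `Failed`, `Completed`, and `describe` are names invented here for illustration and are not the API of `com.twitter.util.Future`.

```scala
object FutureStateSketch {
  // Hypothetical model: a value of this type is in exactly one state.
  sealed trait State[+A]
  case object Pending extends State[Nothing]
  final case class Failed(cause: Throwable) extends State[Nothing]
  final case class Completed[A](value: A) extends State[A]

  // Because the trait is sealed, the compiler can check that the match
  // handles all three states.
  def describe[A](state: State[A]): String = state match {
    case Pending          => "pending"
    case Failed(cause)    => "failed: " + cause.getMessage
    case Completed(value) => "completed: " + value
  }
}
```

A real Future moves from pending to one of the other two states at most once; this model only shows that the states are mutually exclusive.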


Aside: Composition


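To recap, composition means combining simple components into more complex ones. Function composition is the canonical example: given functions f and g, the composite function (g∘f)(x) = g(f(x)) is the result of first applying f to x and then applying g to that result. In Scala, this composite function is written: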


val f = (i: Int) => i.toString
val g = (s: String) => s+s+s
val h = g compose f  // : Int => String

scala> h(123)
res0: java.lang.String = 123123123

.LP The function h is the composite: a new function that combines both f and g in a predetermined way.
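As a small companion sketch, the same composite can be written in either direction using the standard `Function1` methods `compose` and `andThen`. The object name `ComposeSketch` and the value `h2` are introduced here only for illustration.

```scala
object ComposeSketch {
  val f = (i: Int) => i.toString        // Int => String
  val g = (s: String) => s + s + s      // String => String

  // g compose f applies f first, then g: (g ∘ f)(x) = g(f(x)).
  val h: Int => String = g compose f

  // f andThen g describes the same pipeline, read left to right.
  val h2: Int => String = f andThen g
}
```

Which spelling to use is mostly a readability choice: `andThen` follows the order in which data flows, while `compose` follows the mathematical notation.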

1028 | 1029 | Future はコレクションの一種だ。つまり、ゼロ個または一個の要素を持つコンテナであり、`map` や `filter` や `foreach` のような標準コレクションメソッドを持つ。Future の値は遅延されるので、これらのメソッドを適用した結果もまた必然的に遅延される。 1030 | 1031 | val result: Future[Int] 1032 | val resultStr: Future[String] = result map { i => i.toString } 1033 | 1034 | .LP 関数 { i => i.toString } は、整数値 i が利用可能になるまで呼び出されない。また、変換されたコレクション resultStr もその時まで保留状態になる。 1035 | 1036 | リストは平坦化 (flatten) できる; 1037 | 1038 | val listOfList: List[List[Int]] = ... 1039 | val list: List[Int] = listOfList.flatten 1040 | 1041 | .LP 同様に、Future においても平坦化は意味をなす: 1042 | 1043 | val futureOfFuture: Future[Future[Int]] = ... 1044 | val future: Future[Int] = futureOfFuture.flatten 1045 | 1046 | .LP Future は遅延するので、flatten の実装は（直ちに）Future を返す必要がある。この Future は、外側の Future(Future[Future[Int]]) が完了して、そのあとに内側の Future(Future[Future[Int]]) が完了するのを待っている結果だ。もし外側の Future が失敗したら、平坦化された Future も失敗する必要がある。 1047 | 1048 | Future は、List と同様に `flatMap` を定義している。`Future[A]` が定義するシグネチャは、 1049 | 1050 | flatMap[B](f: A => Future[B]): Future[B] 1051 | 1052 | .LP これは map と flatten の組み合わせたようなもので、そのように実装すると以下のようになる: 1053 | 1054 | def flatMap[B](f: A => Future[B]): Future[B] = { 1055 | val mapped: Future[Future[B]] = this map f 1056 | val flattened: Future[B] = mapped.flatten 1057 | flattened 1058 | } 1059 | 1060 | これは強力な組み合わせだ！ `flatMap` を使うと、二つの Future を順番に実行した結果である Future を定義できる。これは、一つ目の Future の結果に基づいて計算される二つ目の Future だ。ユーザ (ID) の認証のために、二つの RPC を行う必要があると想像しよう。合成された操作は以下の方法で定義できる: 1061 | 1062 | def getUser(id: Int): Future[User] 1063 | def authenticate(user: User): Future[Boolean] 1064 | 1065 | def isIdAuthed(id: Int): Future[Boolean] = 1066 | getUser(id) flatMap { user => authenticate(user) } 1067 | 1068 | .LP こうした種類の結合のもう一つの恩恵は、エラー処理が組み込みになっていることだ: getUser(..) か authenticate(..) がさらにエラー処理をしない限り、isIdAuthed(..) 
が返す Future は失敗する。 1069 | 1070 | #### スタイル 1071 | 1072 | Future のコールバックメソッド (`respond`, `onSuccess`, `onFailure`, `ensure`) は、その親に*連鎖する*新たな Future を返す。この Future は、親が完了した後でのみ完了することが保証されている。このパターンを実現するには、 1073 | 1074 | acquireResource() onSuccess { value => 1075 | computeSomething(value) 1076 | } ensure { 1077 | freeResource() 1078 | } 1079 | 1080 | .LP このとき freeResource() は computeSomething の後でのみ実行されることが保証される。これにより、ネイティブな try .. finally パターンのエミュレートを可能にする。 1081 | 1082 | `foreach` の代わりに `onSuccess` を使おう。`onSuccess` の方が、`onFailure` と対称的で目的をより良く表す名前だし、連鎖も可能だ。 1083 | 1084 | `Promise` インスタンスを直接作るのはいつでも避けるようにしよう: ほとんどのあらゆるタスクは、定義済みのコンビネータを使うことで達成できる。これらのコンビネータは、エラーやキャンセルの伝播を保証する。また、コンビネータは一般的に*データフロー・スタイル*のプログラミングを促進し、これにより普段は同期化や `volatile` 宣言が不要になる。 1085 | 1086 | 末尾再帰方式で書かれたコードはスタック空間のリークを引き起こさないので、データフロー・スタイルでループを効率的に実装できる: 1087 | 1088 | case class Node(parent: Option[Node], ...) 1089 | def getNode(id: Int): Future[Node] = ... 1090 | 1091 | def getHierarchy(id: Int, nodes: List[Node] = Nil): Future[Node] = 1092 | getNode(id) flatMap { 1093 | case n@Node(Some(parent), ..) 
=> getHierarchy(parent, n :: nodes) 1094 | case n => Future.value((n :: nodes).reverse) 1095 | } 1096 | 1097 | `Future` は有用なメソッドをたくさん定義している。`Future.value()` や `Future.exception()` を使うと、事前に結果が満たされた Future を作れる。`Future.collect()`, `Future.join()`, `Future.select()` は、複数の Future を一つにまとめるコンビネータを提供する（つまり scatter-gather 操作の gather 部分）。 1098 | 1099 | #### キャンセル 1100 | 1101 | Future は弱いキャンセルを実装している。`Future#cancel` の呼び出しは、直ちに計算を終了させる代わりに、どれが最終的に Future を満たしたプロセスがなのか問い合わせることができる*シグナル*をレベルトリガで伝播する。キャンセルは、値とは反対方向へ伝播する: コンシューマ (consumer) がセットしたキャンセル・シグナルはプロデューサ (producer) へと伝播する。プロデューサは `Promise` の `onCancellation` を使って、シグナルに応じて作動するリスナーを指定する。 1102 | 1103 | つまり、キャンセルの動作はプロデューサに依存するし、デフォルトの実装は存在しない。*キャンセルはヒントに過ぎない。* 1104 | 1105 | #### Local 1106 | 1107 | Util ライブラリの [`Local`](https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Local.scala#L40) は、特定の Future のディスパッチツリーに対するローカルな参照セルを提供する。`Local` に値をセットすると、同じスレッド内の Future によって遅延されるあらゆる計算がこの値を利用できるようになる。これらはスレッドローカルに似ているが、そのスコープが Java スレッドでなく”Future スレッド”のツリーである点が異なる。 1108 | 1109 | trait User { 1110 | def name: String 1111 | def incrCost(points: Int) 1112 | } 1113 | val user = new Local[User] 1114 | 1115 | ... 
1116 | 1117 | user() = currentUser 1118 | rpc() ensure { 1119 | user().incrCost(10) 1120 | } 1121 | 1122 | .LP ここで ensure ブロック内の user() は、コールバックが追加された時点でのローカルな user の値を参照する。 1123 | 1124 | スレッドローカルと同様に `Local` は非常に便利なこともあるが、ほとんどの場合は避けるべきだ: たとえそうした方が負担が少ないときでも、データを明示的に渡して回る方法では問題を十分に解決できないことを確認しよう。 1125 | 1126 | Local は、コアライブラリにおける*非常に*一般的な関心事のために効果的に使われる: 例えば、RPC のトレースを使ったスレッド管理、モニタの伝播、Future コールバックのための”スタックトレース”の作成など、その他の解決策ではユーザに過度な負担がある場合だ。その他のほとんどの状況で Local は不適切だ。 1127 | 1128 | ### Offer と Broker 1129 | 1130 | 並行システムは非常に複雑だ。それは、共有データやリソースへのアクセスを協調させる必要があるからだ。[アクター (Actor)](http://doc.akka.io/api/akka/current/index.html#akka.actor.Actor) は、並行システムを単純にする一つの戦略を提起している。アクターは逐次的なプロセスで、それぞれのアクターが自分自身の状態やリソースを保持し、メッセージングによって他のアクターとデータを共有する。共有データはアクター間で通信する必要がある。 1131 | 1132 | Offer と Broker は、これに基づいて三つの重要な考え方を取り入れている。一つ目は、通信チャネル (Broker) が第一級 (first class) であること。すなわち、アクターに直接メッセージを送るのではなく Broker 経由で送信する。二つ目は、Offer や Broker が同期化メカニズムであること: 通信することは同期化することだ。この意味は、Broker は協調メカニズムとして使えるということだ: プロセス `a` がプロセス `b` にメッセージを送信したとき、`a` と `b` は共にシステムの状態について合意する。三つ目は、通信が*選択的に*実行できること: 一つのプロセスはいくつか異なる通信を提案でき、それらのうちただ一つが有効になる。 1133 | 1134 | 一般的な（他の合成と同様の）やり方で選択的な通信をサポートするには、通信の行為 (act of communicating) から通信の記述 (description of a communication) を分離する必要がある。これをやるのが `Offer` だ。Offer は通信を記述する永続的な値で、（Offer に作用する）通信を実行するには Offer の `sync()` メソッドで同期化する。 1135 | 1136 | trait Offer[T] { 1137 | def sync(): Future[T] 1138 | } 1139 | 1140 | .LP `sync()` は、通信が値を得たときに、交換された値を生成する Future[T] を返す。 1141 | 1142 | `Broker` は通信のチャネルであり、Offer を使って値の交換を協調する: 1143 | 1144 | trait Broker[T] { 1145 | def send(msg: T): Offer[Unit] 1146 | val recv: Offer[T] 1147 | } 1148 | 1149 | .LP そして、二つの Offer を生成するとき、 1150 | 1151 | val b: Broker[Int] 1152 | val sendOf = b.send(1) 1153 | val recvOf = b.recv 1154 | 1155 | .LP sendOf と recvOf はどちらも同期化されており、 1156 | 1157 | // In process 1: 1158 | sendOf.sync() 1159 | 1160 | // In process 2: 1161 | recvOf.sync() 1162 | 1163 | .LP 両方の Offer が値を得て 1 の値が交換される。 1164 | 1165 | 
選択的な通信は、`Offer.choose` でいくつかの Offer を結合することで行われる。 1166 | 1167 | def choose[T](ofs: Offer[T]*): Offer[T] 1168 | 1169 | .LP は新しい Offer を生成する。これは、同期化すると、ofs のうち最初に利用可能になったものを唯一つ取得する。いくつかが即座に利用可能になった場合は、取得する `Offer` はランダムに選ばれる。 1170 | 1171 | `Offer` オブジェクトは、Broker から得た Offer と組み合わせて使うワンオフの Offer をたくさん持っている。 1172 | 1173 | Offer.timeout(duration): Offer[Unit] 1174 | 1175 | .LP は与えられた期間の後に起動する Offer だ。Offer.never は決して値を取得しない。一方、Offer.const(value) は、与えられた値を直ちに取得する。これらは、選択的な通信によって合成するのにも有用だ。例えば、送信操作にタイムアウトを適用するには: 1176 | 1177 | Offer.choose( 1178 | Offer.timeout(10.seconds), 1179 | broker.send("my value") 1180 | ).sync() 1181 | 1182 | Offer と Broker を使う方法と [SynchronousQueue](http://docs.oracle.com/javase/jp/6/api/java/util/concurrent/SynchronousQueue.html) を比べてみたくなるが、両者には微妙だが重要な違いがある。Offer は、そうしたキューではとてもできないような方法で組み立てることができる。例えば、Broker で表した一連のキューを考える: 1183 | 1184 | val q0 = new Broker[Int] 1185 | val q1 = new Broker[Int] 1186 | val q2 = new Broker[Int] 1187 | 1188 | .LP ここで、読み込みのためのマージされたキューを作ってみる: 1189 | 1190 | val anyq: Offer[Int] = Offer.choose(q0.recv, q1.recv, q2.recv) 1191 | 1192 | .LP anyq は Offer で、最初に利用可能になったキューから読み込む。なお、この anyq はやはり同期的であり、内部にあるキューの動作を利用できる。こうした合成は、キューを使う方法ではとても不可能だ。 1193 | 1194 | #### 例: 簡単なコネクションプール 1195 | 1196 | コネクションプールは、ネットワークアプリケーションでは一般的で、たいていは実装しにくい。例えば、個々のクライアントは異なるレイテンシを要求するため、プールからの取得にタイムアウトがあるのが多くの場合で望ましい。プールは原理的には単純だ: コネクションのキューを保持し、待機クライアント (waiter) が入ってきたら満たしてやる。従来の同期化プリミティブでは、典型的には二つのキューを使う。一つは waiters で、コネクション (connection) がない時に使われる。もう一つは connections で、これは待機クライアント (waiter) がない時に使われる。 1197 | 1198 | Offer と Broker を使うと、これをとても自然に表現できる: 1199 | 1200 | class Pool(conns: Seq[Conn]) { 1201 | private[this] val waiters = new Broker[Conn] 1202 | private[this] val returnConn = new Broker[Conn] 1203 | 1204 | val get: Offer[Conn] = waiters.recv 1205 | def put(c: Conn) { returnConn ! 
c } 1206 | 1207 | private[this] def loop(connq: Queue[Conn]) { 1208 | Offer.choose( 1209 | if (connq.isEmpty) Offer.never else { 1210 | val (head, rest) = connq.dequeue 1211 | waiters.send(head) { _ => loop(rest) } 1212 | }, 1213 | returnConn.recv { c => loop(connq enqueue c) } 1214 | ).sync() 1215 | } 1216 | 1217 | loop(Queue.empty ++ conns) 1218 | } 1219 | 1220 | `loop` は、コネクションが返却された状態にすることを常にオファー (offer) すると共に、キューが空でない場合のみ送信をオファーする。永続的なキューを使うことで推論をより単純にできる。プールのインタフェースにも Offer を使っているので、呼び出し側はコンビネータを使うことでタイムアウトを適用できる: 1221 | 1222 | val conn: Future[Option[Conn]] = Offer.choose( 1223 | pool.get { conn => Some(conn) }, 1224 | Offer.timeout(1.second) { _ => None } 1225 | ).sync() 1226 | 1227 | タイムアウトを実装するのにこれ以上の簿記は必要ない。これは Offer の動作によるものだ: もし `Offer.timeout` が選択されたら、もはやプールからの受信をオファーしない。つまり、プールと呼び出し側が `waiters` Broker 上での送信と受信をそれぞれ同時に合意することはない。 1228 | 1229 | #### 例: エラトステネスの篩 1230 | 1231 | 並行プログラムを、同期的に通信する一連の逐次的なプロセスとして構築するのは多くの場合で有用だし、場合によってはプログラムを非常に単純化できる。Offer と Broker は、これを単純化し統一化する手段を提供する。実際、それらのアプリケーションは、人によっては”古典的な”並行性の問題として考えるかもしれないものを乗り越える。サブルーチンやクラス、モジュールと同じように、（Offer や Broker を用いた）並行プログラミングは有用な*構造化*ツールだ。これは、制約充足問題 (Constraint Satisfaction Problem; CSP) からのもう一つの重要なアイデアだ。 1232 | 1233 | これの一つの例は[エラトステネスの篩](http://ja.wikipedia.org/wiki/%E3%82%A8%E3%83%A9%E3%83%88%E3%82%B9%E3%83%86%E3%83%8D%E3%82%B9%E3%81%AE%E7%AF%A9)で、整数ストリームに対するフィルタの連続的な適用として構造化できる。まず、整数の生成源が必要だ: 1234 | 1235 | def integers(from: Int): Offer[Int] = { 1236 | val b = new Broker[Int] 1237 | def gen(n: Int): Unit = b.send(n).sync() ensure gen(n + 1) 1238 | gen(from) 1239 | b.recv 1240 | } 1241 | 1242 | .LP integers(n) は、単に n から始まる全ての連続した整数の Offer だ。次に、フィルタが必要だ: 1243 | 1244 | def filter(in: Offer[Int], prime: Int): Offer[Int] = { 1245 | val b = new Broker[Int] 1246 | def loop() { 1247 | in.sync() onSuccess { i => 1248 | if (i % prime != 0) 1249 | b.send(i).sync() ensure loop() 1250 | else 1251 | loop() 1252 | } 1253 | } 1254 | loop() 1255 | 1256 | b.recv 1257 | } 1258 | 1259 | .LP filter(in, p) 
は、in から素数 p の倍数を取り除く Offer を返す。最後に、篩 (sieve) を定義する: 1260 | 1261 | def sieve = { 1262 | val b = new Broker[Int] 1263 | def loop(of: Offer[Int]) { 1264 | for (prime <- of.sync(); _ <- b.send(prime).sync()) 1265 | loop(filter(of, prime)) 1266 | } 1267 | loop(integers(2)) 1268 | b.recv 1269 | } 1270 | 1271 | .LP loop() の動作は単純だ: of から次の素数を読み取り、この素数を除いた of にフィルタを適用する。loop が再帰するにつれて連続した素数がフィルタされ、篩が手に入る。これで、最初の 10000 個の素数を出力できる: 1272 | 1273 | val primes = sieve 1274 | 0 until 10000 foreach { _ => 1275 | println(primes.sync()()) 1276 | } 1277 | 1278 | このアプローチは、篩を単純かつ直交するコンポーネントへと構造化できるだけでなく、ストリームとして扱える: 君は、興味がある素数の集合を事前に計算する必要がなく、いっそうモジュラリティを拡張できる。 1279 | 1280 | ## 謝辞 1281 | 1282 | 本レッスンは、Twitter 社の Scala コミュニティによるものだ。私は誠実な記録者でありたい。 1283 | 1284 | Blake Matheny と Nick Kallen、Steve Gury、そして Raghavendra Prabhu には、とても有益な助言と多くの優れた提案を与えてもらった。 1285 | 1286 | ### 日本語版への謝辞 1287 | 1288 | 本ドキュメントの日本語訳は、[@okapies](http://github.com/okapies) と [@scova0731](https://github.com/scova0731) が担当しました。 1289 | 1290 | 翻訳にあたっては、日本の Scala コミュニティから数多くの貢献を頂きました: [@xuwei-k](http://github.com/xuwei-k) さん、[@kmizu](http://github.com/kmizu) さん、[@eed3si9n](http://github.com/eed3si9n) さん、[@akr4](http://github.com/akr4) さん、[@yosuke-furukawa](http://github.com/yosuke-furukawa) さん、m hanada さん、および[日本 Scala ユーザーズグループ](http://jp.scala-users.org/)の皆さん。（以上、順不同） 1291 | 1292 | また、[@kmizu](http://github.com/kmizu) さんと[@eed3si9n](http://github.com/eed3si9n) さんには、高度に専門的な議論について貴重な助言を頂きました。 1293 | 1294 | ありがとうございます。 1295 | 1296 | [Scala]: http://www.scala-lang.org/ 1297 | [Finagle]: http://github.com/twitter/finagle 1298 | [Util]: http://github.com/twitter/util 1299 | -------------------------------------------------------------------------------- /effectivescala.mo: -------------------------------------------------------------------------------- 1 |


Marius Eriksen, Twitter Inc.
marius@twitter.com (@marius)


.TOC

Other languages

日本語
Русский
简体中文

## Introduction

[Scala][Scala] is one of the main application programming languages used at Twitter. Much of our infrastructure is written in Scala and [we have several large libraries](https://github.com/twitter/) supporting our use. While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ "purely functional style", and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of *best practices*. Our use of Scala is mainly for creating high volume services that form distributed systems -- and our advice is thus biased -- but most of the advice herein should translate naturally to other domains. This is not the law, but deviation should be well justified.

Scala provides many tools that enable succinct expression. Less typing is less reading, and less reading is often faster reading, and thus brevity enhances clarity. However brevity is a blunt tool that can also deliver the opposite effect: After correctness, think always of the reader.

Above all, *program in Scala*. You are not writing Java, nor Haskell, nor Python; a Scala program is unlike one written in any of these. In order to use the language effectively, you must phrase your problems in its terms. There's no use coercing a Java program into Scala, for it will be inferior in most ways to its original.

This is not an introduction to Scala; we assume the reader is familiar with the language.
Some resources for learning Scala are:

* [Scala School](https://twitter.github.com/scala_school/)
* [Learning Scala](https://docs.scala-lang.org/)
* [Learning Scala in Small Bites](https://matt.might.net/articles/learning-scala-in-small-bites/)

This is a living document that will change to reflect our current "best practices," but its core ideas are unlikely to change: Always favor readability; write generic code but not at the expense of clarity; take advantage of simple language features that afford great power but avoid the esoteric ones (especially in the type system). Above all, be always aware of the trade-offs you make. A sophisticated language requires a complex implementation, and complexity begets complexity: of reasoning, of semantics, of interaction between features, and of the understanding of your collaborators. Thus complexity is the tax of sophistication -- you must always ensure that its utility exceeds its cost.

And have fun.

## Formatting

The specifics of code *formatting* -- so long as they are practical -- are of little consequence. By definition style cannot be inherently good or bad and almost everybody differs in personal preference. However the *consistent* application of the same formatting rules will almost always enhance readability. A reader already familiar with a particular style does not have to grasp yet another set of local conventions, or decipher yet another corner of the language grammar.

This is of particular importance to Scala, as its grammar has a high degree of overlap. One telling example is method invocation: Methods can be invoked with "`.`", with whitespace, without parentheses for nullary or unary methods, with parentheses for these, and so on. Furthermore, the different styles of method invocations expose different ambiguities in its grammar!
Surely the consistent application of a carefully chosen set of formatting rules will resolve a great deal of ambiguity for both man and machine.

We adhere to the [Scala style guide](https://docs.scala-lang.org/style/) plus the following rules.

### Whitespace

Indent by two spaces. Try to avoid lines greater than 100 columns in length. Use one blank line between method, class, and object definitions.

### Naming

Use short names for small scopes

`i`s, `j`s and `k`s are all but expected in loops.

Use longer names for larger scopes

External APIs should have longer and explanatory names that confer meaning. `Future.collect` not `Future.all`.

Use common abbreviations but eschew esoteric ones

Everyone knows `ok`, `err` or `defn` whereas `sfri` is not so common.

Don't rebind names for different uses

Use vals

Avoid using backticks (`` ` ``) to overload reserved names.

typ instead of `type`

Use active names for operations with side effects

user.activate() not user.setActive()

Use descriptive names for methods that return values

src.isDefined not src.defined

Don't prefix getters with get

As per the previous rule, it's redundant: site.count not site.getCount

Don't repeat names that are already encapsulated in package or object name

Prefer:

    object User {
      def get(id: Int): Option[User]
    }

to

    object User {
      def getUser(id: Int): Option[User]
    }

They are redundant in use: `User.getUser` provides no more information than `User.get`.

### Imports

Sort import lines alphabetically

This makes it easy to examine visually, and is simple to automate.

Use braces when importing several names from a package

import com.twitter.concurrent.{Broker, Offer}

Use wildcards when more than six names are imported

e.g.: `import com.twitter.concurrent._`
Don't apply this blindly: some packages export too many names

When using collections, qualify names by importing `scala.collection.immutable` and/or `scala.collection.mutable`

Mutable and immutable collections have dual names. Qualifying the names makes it obvious to the reader which variant is being used (e.g. "`immutable.Map`").

Do not use relative imports from other packages

Avoid

import com.twitter
import concurrent

in favor of the unambiguous

import com.twitter.concurrent

Put imports at the top of the file

The reader can refer to all imports in one place.
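Taken together, the import rules above might look like the following in a small file. This is a sketch only: `ImportsSketch`, `hits`, `frozen`, and `counts` are hypothetical names, and the imported classes are standard JDK and Scala library ones chosen to keep the example self-contained.

```scala
// Imports sit at the top of the file, sorted alphabetically.
import java.util.concurrent.atomic.AtomicInteger
// Several names from one package share braces; importing the
// immutable and mutable packages lets us qualify collection names.
import scala.collection.{immutable, mutable}

object ImportsSketch {
  val hits = new AtomicInteger(0)

  // The qualifier makes the variant obvious at the use site.
  val frozen: immutable.Map[String, Int] = immutable.Map("a" -> 1)
  val counts: mutable.Map[String, Int] = mutable.Map.empty[String, Int]

  counts("a") = frozen("a") + hits.incrementAndGet()
}
```

A reader who sees `mutable.Map` in a signature knows immediately that the value may change underneath them, without scrolling back to the import list.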

### Braces

Braces are used to create compound expressions (they serve other uses in the "module language"), where the value of the compound expression is the last expression in the list. Avoid using braces for simple expressions; write

    def square(x: Int) = x*x

.LP but not

    def square(x: Int) = {
      x * x
    }

.LP even though it may be tempting to distinguish the method body syntactically. The first alternative has less clutter and is easier to read. Avoid syntactical ceremony unless it clarifies.

### Pattern matching

Use pattern matching directly in function definitions whenever applicable; instead of

    list map { item =>
      item match {
        case Some(x) => x
        case None => default
      }
    }

.LP collapse the match

    list map {
      case Some(x) => x
      case None => default
    }

.LP it's clear that the list items are being mapped over -- the extra indirection does not elucidate.

### Comments

Use [ScalaDoc](https://wiki.scala-lang.org/display/SW/Scaladoc) to provide API documentation. Use the following style:

    /**
     * ServiceBuilder builds services
     * ...
     */

.LP but not the standard ScalaDoc style:

    /** ServiceBuilder builds services
     * ...
     */

Do not resort to ASCII art or other visual embellishments. Document APIs but do not add unnecessary comments. If you find yourself adding comments to explain the behavior of your code, ask first if it can be restructured so that it becomes obvious what it does. Prefer "obviously it works" to "it works, obviously" (with apologies to Hoare).

## Types and Generics

The primary objective of a type system is to detect programming errors.
The type system effectively provides a limited form of static verification, allowing us to express certain kinds of invariants about our code that the compiler can verify. Type systems provide other benefits too of course, but error checking is their raison d'être.

Our use of the type system should reflect this goal, but we must remain mindful of the reader: judicious use of types can serve to enhance clarity; being unduly clever only obfuscates.

Scala's powerful type system is a common source of academic exploration and exercise (e.g. [Type level programming in Scala](https://apocalisp.wordpress.com/2010/06/08/type-level-programming-in-scala/)). While a fascinating academic topic, these techniques rarely find useful application in production code. They are to be avoided.

### Return type annotations

While Scala allows these to be omitted, such annotations provide good documentation: this is especially important for public methods. Where a method is not exposed and its return type obvious, omit them.

This is especially important when instantiating objects with mixins as the Scala compiler creates singleton types for these. For example, `make` in:

    trait Service
    def make() = new Service {
      def getId = 123
    }

.LP does not have a return type of Service; the compiler creates the refinement type Object with Service{def getId: Int}. Instead use an explicit annotation:

    def make(): Service = new Service{}

Now the author is free to mix in more traits without changing the public type of `make`, making it easier to manage backwards compatibility.

### Variance

Variance arises when generics are combined with subtyping. Variance defines how subtyping of the *contained* type relates to subtyping of the *container* type.
Because Scala has declaration site variance 260 | annotations, authors of common libraries -- especially collections -- 261 | must be prolific annotators. Such annotations are important for the 262 | usability of shared code, but misapplication can be dangerous. 263 | 264 | Variance annotations are an advanced but necessary aspect of Scala's type system, 265 | and should be used widely (and correctly) as they aid the application 266 | of subtyping. 267 | 268 | *Immutable collections should be covariant*. Methods that receive 269 | the contained type should "downgrade" the collection appropriately: 270 | 271 | trait Collection[+T] { 272 | def add[U >: T](other: U): Collection[U] 273 | } 274 | 275 | *Mutable collections should be invariant*. Covariance 276 | is typically invalid with mutable collections. Consider 277 | 278 | trait HashSet[+T] { 279 | def add[U >: T](item: U) 280 | } 281 | 282 | .LP and the following type hierarchy: 283 | 284 | trait Mammal 285 | trait Dog extends Mammal 286 | trait Cat extends Mammal 287 | 288 | .LP If I now have a hash set of dogs 289 | 290 | val dogs: HashSet[Dog] 291 | 292 | .LP I can treat it as a set of Mammals and add a cat. 293 | 294 | val mammals: HashSet[Mammal] = dogs 295 | mammals.add(new Cat{}) 296 | 297 | .LP This is no longer a HashSet of dogs! 298 | 299 | 303 | 304 | ### Type aliases 305 | 306 | Use type aliases when they provide convenient naming or clarify 307 | purpose, but do not alias types that are self-explanatory. 308 | 309 | () => Int 310 | 311 | .LP is clearer than 312 | 313 | type IntMaker = () => Int 314 | IntMaker 315 | 316 | .LP since it is both short and uses a common type. However 317 | 318 | class ConcurrentPool[K, V] { 319 | type Queue = ConcurrentLinkedQueue[V] 320 | type Map = ConcurrentHashMap[K, Queue] 321 | ... 322 | } 323 | 324 | .LP is helpful since it communicates purpose and enhances brevity. 325 | 326 | Don't use subclassing when an alias will do. 
327 | 328 | trait SocketFactory extends (SocketAddress => Socket) 329 | 330 | .LP a SocketFactory is a function that produces a Socket. Using a type alias 331 | 332 | type SocketFactory = SocketAddress => Socket 333 | 334 | .LP is better. We may now provide function literals for values of type SocketFactory and also use function composition: 335 | 336 | val addrToInet: SocketAddress => Long 337 | val inetToSocket: Long => Socket 338 | 339 | val factory: SocketFactory = addrToInet andThen inetToSocket 340 | 341 | Type aliases are bound to toplevel names by using package objects: 342 | 343 | package com.twitter 344 | package object net { 345 | type SocketFactory = (SocketAddress) => Socket 346 | } 347 | 348 | Note that type aliases are not new types -- they are equivalent to 349 | syntactically substituting the aliased name for its type. 350 | 351 | ### Implicits 352 | 353 | Implicits are a powerful type system feature, but they should be used 354 | sparingly. They have complicated resolution rules and make it 355 | difficult -- by simple lexical examination -- to grasp what is actually 356 | happening. It's definitely OK to use implicits in the following 357 | situations: 358 | 359 | * Extending or adding a Scala-style collection 360 | * Adapting or extending an object ("pimp my library" pattern) 361 | * Enhancing *type safety* by providing constraint evidence 362 | * Providing type evidence (typeclassing) 363 | * For `Manifest`s 364 | 365 | If you do find yourself using implicits, always ask yourself if there is 366 | a way to achieve the same thing without their help. 367 | 368 | Do not use implicits to do automatic conversions between similar 369 | datatypes (for example, converting a list to a stream); these are 370 | better done explicitly because the types have different semantics, and 371 | the reader should be aware of these implications. 
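As a minimal sketch of the typeclass use listed above, consider a hypothetical `Show` typeclass. The names `Show`, `describe` and `ShowSketch` are illustrative assumptions and do not come from any library. The implicit parameter acts as evidence that a type supports the operation, so a missing instance is rejected at compile time instead of being papered over by a conversion:

```scala
// Hypothetical typeclass: the names here are illustrative only.
trait Show[A] {
  def show(a: A): String
}

object Show {
  // Instances live in the companion object, so implicit resolution
  // finds them without any extra imports at the call site.
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = "Int(" + a + ")"
  }
  implicit val stringShow: Show[String] = new Show[String] {
    def show(a: String): String = "String(" + a + ")"
  }
}

object ShowSketch {
  // The implicit parameter is the evidence that A can be shown.
  def describe[A](a: A)(implicit ev: Show[A]): String = ev.show(a)

  def main(args: Array[String]): Unit = {
    println(describe(42))   // prints "Int(42)"
    println(describe("hi")) // prints "String(hi)"
    // describe(3.14) would not compile: no Show[Double] instance exists.
  }
}
```

The caller never names the instance, yet nothing is converted behind the reader's back: the value passed to `describe` keeps its type, and only the capability is looked up implicitly.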
372 | 373 | ## Collections 374 | 375 | Scala has a very generic, rich, powerful, and composable collections 376 | library; collections are high level and expose a large set of 377 | operations. Many collection manipulations and transformations can be 378 | expressed succinctly and readably, but careless application of these 379 | features can often lead to the opposite result. Every Scala programmer 380 | should read the [collections design 381 | document](https://www.scala-lang.org/docu/files/collections-api/collections.html); 382 | it provides great insight and motivation for Scala collections 383 | library. 384 | 385 | Always use the simplest collection that meets your needs. 386 | 387 | ### Hierarchy 388 | 389 | The collections library is large: in addition to an elaborate 390 | hierarchy -- the root of which being `Traversable[T]` -- there are 391 | `immutable` and `mutable` variants for most collections. Whatever 392 | the complexity, the following diagram contains the important 393 | distinctions for both `immutable` and `mutable` hierarchies 394 | 395 |

396 | [Figure: diagram of the collections hierarchy, coll.png] 397 | .LP `Iterable[T]` is any collection that may be iterated over; it provides an `iterator` method (and thus `foreach`). `Seq[T]`s are collections that are ordered, `Set[T]`s are mathematical sets (unordered collections of unique items), and `Map[K, V]`s are associative arrays, also unordered. 419 | 420 | ### Use 421 | 422 | *Prefer using immutable collections.* They are applicable in most 423 | circumstances, and make programs easier to reason about since they are 424 | referentially transparent and are thus also threadsafe by default. 425 | 426 | *Use the `mutable` namespace explicitly.* Don't import 427 | `scala.collection.mutable._` and refer to `Set`; instead, 428 | 429 | import scala.collection.mutable 430 | val set = mutable.Set() 431 | 432 | .LP makes it clear that the mutable variant is being used. 433 | 434 | *Use the default constructor for the collection type.* Whenever you 435 | need an ordered sequence (and not necessarily linked list semantics), 436 | use the `Seq()` constructor, and so on: 437 | 438 | val seq = Seq(1, 2, 3) 439 | val set = Set(1, 2, 3) 440 | val map = Map(1 -> "one", 2 -> "two", 3 -> "three") 441 | 442 | .LP This style separates the semantics of the collection from its implementation, letting the collections library use the most appropriate type: you need a Map, not necessarily a Red-Black Tree. Furthermore, these default constructors will often use specialized representations: for example, Map() will use a 3-field object for maps with 3 keys. 443 | 444 | The corollary to the above is: in your own methods and constructors, *receive the most generic collection 445 | type appropriate*. This typically boils down to one of the above: 446 | `Iterable`, `Seq`, `Set`, or `Map`. If your method needs a sequence, 447 | use `Seq[T]`, not `List[T]`. 
(A word of caution: the *default* 448 | `Traversable`, `Iterable` and `Seq` types in scope – defined in 449 | `scala.package` – are the `scala.collection` versions, as opposed to 450 | `Map` and `Set` – defined in `Predef.scala` – which are the `scala.collection.immutable` 451 | versions. This means that, for example, the default `Seq` type can 452 | be both the immutable *and* mutable implementations. Thus, if your 453 | method relies on a collection parameter being immutable, and you are 454 | using `Traversable`, `Iterable` or `Seq`, you *must* specifically 455 | require/import the immutable variant, otherwise someone *may* pass 456 | you the mutable version.) 457 | 458 | 462 | 463 | ### Style 464 | 465 | Functional programming encourages pipelining transformations of an 466 | immutable collection to shape it to its desired result. This often 467 | leads to very succinct solutions, but can also be confusing to the 468 | reader -- it is often difficult to discern the author's intent, or keep 469 | track of all the intermediate results that are only implied. For example, 470 | let's say we wanted to aggregate votes for different programming 471 | languages from a sequence of (language, num votes), showing them 472 | in order of most votes to least, we could write: 473 | 474 | val votes = Seq(("scala", 1), ("java", 4), ("scala", 10), ("scala", 1), ("python", 10)) 475 | val orderedVotes = votes 476 | .groupBy(_._1) 477 | .map { case (which, counts) => 478 | (which, counts.foldLeft(0)(_ + _._2)) 479 | }.toSeq 480 | .sortBy(_._2) 481 | .reverse 482 | 483 | .LP this is both succinct and correct, but nearly every reader will have a difficult time recovering the original intent of the author. 
A strategy that often serves to clarify is to name intermediate results and parameters: 484 | 485 | val votesByLang = votes groupBy { case (lang, _) => lang } 486 | val sumByLang = votesByLang map { case (lang, counts) => 487 | val countsOnly = counts map { case (_, count) => count } 488 | (lang, countsOnly.sum) 489 | } 490 | val orderedVotes = sumByLang.toSeq 491 | .sortBy { case (_, count) => count } 492 | .reverse 493 | 494 | .LP the code is nearly as succinct, but much more clearly expresses both the transformations that take place (by naming intermediate values), and the structure of the data being operated on (by naming parameters). If you worry about namespace pollution with this style, group expressions with {}: 495 | 496 | val orderedVotes = { 497 | val votesByLang = ... 498 | ... 499 | } 500 | 501 | 502 | ### Performance 503 | 504 | High level collections libraries (as with higher level constructs 505 | generally) make reasoning about performance more difficult: the 506 | further you stray from instructing the computer directly -- in other 507 | words, imperative style -- the harder it is to predict the exact 508 | performance implications of a piece of code. Reasoning about 509 | correctness, however, is typically easier; readability is also 510 | enhanced. With Scala the picture is further complicated by the Java 511 | runtime; Scala hides boxing/unboxing operations from you, which can 512 | incur severe performance or space penalties. 513 | 514 | Before focusing on low level details, make sure you are using a 515 | collection appropriate for your use. Make sure your data structure 516 | doesn't have unexpected asymptotic complexity. The complexities of the 517 | various Scala collections are described 518 | [here](https://www.scala-lang.org/docu/files/collections-api/collections_40.html). 519 | 520 | The first rule of optimizing for performance is to understand *why* 521 | your application is slow. 
Do not operate without data; 522 | profile^[[Yourkit](https://www.yourkit.com/) is a good profiler] your 523 | application before proceeding. Focus first on hot loops and large data 524 | structures. Excessive focus on optimization is typically wasted 525 | effort. Remember Knuth's maxim: "Premature optimisation is the root of 526 | all evil." 527 | 528 | It is often appropriate to use lower level collections in situations 529 | that require better performance or space efficiency. Use arrays 530 | instead of lists for large sequences (the immutable `Vector` 531 | collection provides a referentially transparent interface to arrays); 532 | and use buffers instead of direct sequence construction when 533 | performance matters. 534 | 535 | ### Java Collections 536 | 537 | Use `scala.collection.JavaConverters` to interoperate with Java collections. 538 | These are a set of implicits that add `asJava` and `asScala` conversion 539 | methods. The use of these ensures that such conversions are explicit, aiding 540 | the reader: 541 | 542 | import scala.collection.JavaConverters._ 543 | 544 | val list: java.util.List[Int] = Seq(1,2,3,4).asJava 545 | val buffer: scala.collection.mutable.Buffer[Int] = list.asScala 546 | 547 | ## Concurrency 548 | 549 | Modern services are highly concurrent -- it is common for servers to 550 | coordinate 10s-100s of thousands of simultaneous operations -- and 551 | handling the implied complexity is a central theme in authoring robust 552 | systems software. 553 | 554 | *Threads* provide a means of expressing concurrency: they give you 555 | independent, heap-sharing execution contexts that are scheduled by the 556 | operating system. However, threads are expensive to create in Java and are 557 | a resource that must be managed, typically with the use of pools. 
This 558 | creates additional complexity for the programmer, and also a high 559 | degree of coupling: it's difficult to divorce application logic from 560 | their use of the underlying resources. 561 | 562 | This complexity is especially apparent when creating services that 563 | have a high degree of fan-out: each incoming request results in a 564 | multitude of requests to yet another tier of systems. In these 565 | systems, thread pools must be managed so that they are balanced 566 | according to the ratios of requests in each tier: mismanagement of one 567 | thread pool bleeds into another. 568 | 569 | Robust systems must also consider timeouts and cancellation, both of 570 | which require the introduction of yet more "control" threads, 571 | complicating the problem further. Note that if threads were cheap 572 | these problems would be diminished: no pooling would be required, 573 | timed out threads could be discarded, and no additional resource 574 | management would be required. 575 | 576 | Thus resource management compromises modularity. 577 | 578 | ### Futures 579 | 580 | Use Futures to manage concurrency. They decouple 581 | concurrent operations from resource management: for example, [Finagle][Finagle] 582 | multiplexes concurrent operations onto few threads in an efficient 583 | manner. Scala has lightweight closure literal syntax, so Futures 584 | introduce little syntactic overhead, and they become second nature to 585 | most programmers. 586 | 587 | Futures allow the programmer to express concurrent computation in a 588 | declarative style, are composable, and have principled handling of 589 | failure. These qualities have convinced us that they are especially 590 | well suited for use in functional programming languages, where this is 591 | the encouraged style. 
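To illustrate the declarative style and failure handling described above, here is a small sketch. It assumes the standard library's `scala.concurrent.Future` in place of the Twitter `Future` used in this guide, so some names differ (for example `Future.successful` and `Future.failed` here, versus `Future.value` and `Future.exception`). The functions `userId`, `score` and `scoreFor` are hypothetical stand-ins for RPCs:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical lookups standing in for remote calls.
  def userId(name: String): Future[Int] =
    if (name.nonEmpty) Future.successful(name.length)
    else Future.failed(new IllegalArgumentException("empty name"))

  def score(id: Int): Future[Int] = Future(id * 10)

  // flatMap sequences the two operations; if userId fails,
  // score is never invoked and the failure is passed along.
  def scoreFor(name: String): Future[Int] =
    userId(name).flatMap(score)

  def main(args: Array[String]): Unit = {
    val ok = Await.result(scoreFor("scala"), 1.second)
    val recovered = Await.result(
      scoreFor("").recover { case _: IllegalArgumentException => -1 },
      1.second)
    println(ok)        // prints 50
    println(recovered) // prints -1
  }
}
```

Blocking with `Await` is done here only to print the results in a standalone program; in a service the futures would normally be composed further rather than awaited.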
592 | 593 | *Prefer transforming futures over creating your own.* Future 594 | transformations ensure that failures are propagated, that 595 | cancellations are signalled, and free the programmer from thinking 596 | about the implications of the Java memory model. Even a careful 597 | programmer might write the following to issue an RPC 10 times in 598 | sequence and then print the results: 599 | 600 | val p = new Promise[List[Result]] 601 | var results: List[Result] = Nil 602 | def collect() { 603 | doRpc() onSuccess { result => 604 | results = result :: results 605 | if (results.length < 10) 606 | collect() 607 | else 608 | p.setValue(results) 609 | } onFailure { t => 610 | p.setException(t) 611 | } 612 | } 613 | 614 | collect() 615 | p onSuccess { results => 616 | printf("Got results %s\n", results.mkString(", ")) 617 | } 618 | 619 | The programmer had to ensure that RPC failures are propagated, 620 | interspersing the code with control flow; worse, the code is wrong! 621 | Without declaring `results` volatile, we cannot ensure that `results` 622 | holds the previous value in each iteration. The Java memory model is a 623 | subtle beast, but luckily we can avoid all of these pitfalls by using 624 | the declarative style: 625 | 626 | def collect(results: List[Result] = Nil): Future[List[Result]] = 627 | doRpc() flatMap { result => 628 | if (results.length < 9) 629 | collect(result :: results) 630 | else 631 | Future.value(result :: results) 632 | } 633 | 634 | collect() onSuccess { results => 635 | printf("Got results %s\n", results.mkString(", ")) 636 | } 637 | 638 | We use `flatMap` to sequence operations and prepend the result onto 639 | the list as we proceed. This is a common functional programming idiom 640 | translated to Futures. This is correct, requires less boilerplate, is 641 | less error prone, and also reads better. 642 | 643 | *Use the Future combinators*. 
`Future.select`, `Future.join`, and 644 | `Future.collect` codify common patterns when operating over 645 | multiple futures that should be combined. 646 | 647 | *Do not throw your own exceptions in methods that return Futures.* 648 | Futures represent both successful and failed computations. Therefore, it's 649 | important that errors involved in that computation are properly encapsulated in 650 | the returned Future. Concretely, return Future.exception instead of 651 | throwing that exception: 652 | 653 | def divide(x: Int, y: Int): Future[Result] = { 654 | if (y == 0) 655 | return Future.exception(new IllegalArgumentException("Divisor is 0")) 656 | 657 | Future.value(x/y) 658 | } 659 | 660 | Fatal exceptions should not be represented by Futures. These exceptions 661 | include ones that are thrown when resources are exhausted, like 662 | OutOfMemoryError, and also JVM-level errors like NoSuchMethodError. These 663 | conditions are ones under which the JVM must exit. 664 | 665 | The predicate scala.util.control.NonFatal should be used to 666 | identify exceptions which should be returned as a Future.exception. 667 | 668 | ### Collections 669 | 670 | The subject of concurrent collections is fraught with opinions, 671 | subtleties, dogma and FUD. In most practical situations they are a 672 | nonissue: Always start with the simplest, most boring, and most 673 | standard collection that serves the purpose. Don't reach for a 674 | concurrent collection before you *know* that a synchronized one won't 675 | do: the JVM has sophisticated machinery to make synchronization cheap, 676 | so their efficacy may surprise you. 677 | 678 | If an immutable collection will do, use it -- they are referentially 679 | transparent, so reasoning about them in a concurrent context is 680 | simple. Mutations in immutable collections are typically handled by 681 | updating a reference to the current value (in a `var` cell or an 682 | `AtomicReference`). 
Care must be taken to apply these correctly: 683 | atomics must be retried, and `vars` must be declared volatile in order 684 | for them to be published to other threads. 685 | 686 | Mutable concurrent collections have complicated semantics, and make 687 | use of subtler aspects of the Java memory model, so make sure you 688 | understand the implications -- especially with respect to publishing 689 | updates -- before you use them. Synchronized collections also compose 690 | better: operations like `getOrElseUpdate` cannot be implemented 691 | correctly by concurrent collections, and creating composite 692 | collections is especially error prone. 693 | 694 | 705 | 706 | 707 | ## Control structures 708 | 709 | Programs in the functional style tend to require fewer traditional 710 | control structures, and read better when written in the declarative 711 | style. This typically implies breaking your logic up into several 712 | small methods or functions, and gluing them together with `match` 713 | expressions. Functional programs also tend to be more 714 | expression-oriented: branches of conditionals compute values of 715 | the same type, `for (..) yield` computes comprehensions, and recursion 716 | is commonplace. 717 | 718 | ### Recursion 719 | 720 | *Phrasing your problem in recursive terms often simplifies it,* and if 721 | the tail call optimization applies (which can be checked by the `@tailrec` 722 | annotation), the compiler will even translate your code into a regular loop. 723 | 724 | Consider a fairly standard imperative version of heap fix-down: 726 | 727 | def fixDown(heap: Array[T], m: Int, n: Int): Unit = { 728 | var k: Int = m 729 | while (n >= 2*k) { 730 | var j = 2*k 731 | if (j < n && heap(j) < heap(j + 1)) 732 | j += 1 733 | if (heap(k) >= heap(j)) 734 | return 735 | else { 736 | swap(heap, k, j) 737 | k = j 738 | } 739 | } 740 | } 741 | 742 | Every time the while loop is entered, we're working with state dirtied 743 | by the previous iteration. 
The value of each variable is a function of 744 | which branches were taken, and it returns in the middle of the loop 745 | when the correct position was found (The keen reader will find similar 746 | arguments in Dijkstra's ["Go To Statement Considered Harmful"](http://www.u.arizona.edu/~rubinson/copyright_violations/Go_To_Considered_Harmful.html)). 747 | 748 | Consider a (tail) recursive 749 | implementation^[From [Finagle's heap 750 | balancer](https://github.com/twitter/finagle/blob/master/finagle-core/src/main/scala/com/twitter/finagle/loadbalancer/Heap.scala#L41)]: 751 | 752 | @tailrec 753 | final def fixDown(heap: Array[T], i: Int, j: Int) { 754 | if (j < i*2) return 755 | 756 | val m = if (j == i*2 || heap(2*i) < heap(2*i+1)) 2*i else 2*i + 1 757 | if (heap(m) < heap(i)) { 758 | swap(heap, i, m) 759 | fixDown(heap, m, j) 760 | } 761 | } 762 | 763 | .LP here every iteration starts with a well-defined clean slate, and there are no reference cells: invariants abound. It’s much easier to reason about, and easier to read as well. There is also no performance penalty: since the method is tail-recursive, the compiler translates this into a standard imperative loop. 764 | 765 | 768 | 769 | 770 | ### Returns 771 | 772 | This is not to say that imperative structures are not also valuable. 773 | In many cases they are well suited to terminate computation early 774 | instead of having conditional branches for every possible point of 775 | termination: indeed in the above `fixDown`, a `return` is used to 776 | terminate early if we're at the end of the heap. 777 | 778 | Returns can be used to cut down on branching and establish invariants. 779 | This helps the reader by reducing nesting (how did I get here?) and 780 | making it easier to reason about the correctness of subsequent code 781 | (the array cannot be accessed out of bounds after this point). 
This is 782 | especially useful in "guard" clauses: 783 | 784 | def compare(a: AnyRef, b: AnyRef): Int = { 785 | if (a eq b) 786 | return 0 787 | 788 | val d = System.identityHashCode(a) compare System.identityHashCode(b) 789 | if (d != 0) 790 | return d 791 | 792 | // slow path.. 793 | } 794 | 795 | Use `return`s to clarify and enhance readability, but not as you would 796 | in an imperative language; avoid using them to return the results of a 797 | computation. Instead of 798 | 799 | def suffix(i: Int) = { 800 | if (i == 1) return "st" 801 | else if (i == 2) return "nd" 802 | else if (i == 3) return "rd" 803 | else return "th" 804 | } 805 | 806 | .LP prefer: 807 | 808 | def suffix(i: Int) = 809 | if (i == 1) "st" 810 | else if (i == 2) "nd" 811 | else if (i == 3) "rd" 812 | else "th" 813 | 814 | .LP but using a match expression is superior to either: 815 | 816 | def suffix(i: Int) = i match { 817 | case 1 => "st" 818 | case 2 => "nd" 819 | case 3 => "rd" 820 | case _ => "th" 821 | } 822 | 823 | Note that returns can have hidden costs: when used inside of a closure, 824 | 825 | seq foreach { elem => 826 | if (elem.isLast) 827 | return 828 | 829 | // process... 830 | } 831 | 832 | .LP this is implemented in bytecode as an exception catching/throwing pair which, used in hot code, has performance implications. 833 | 834 | ### `for` loops and comprehensions 835 | 836 | `for` provides both succinct and natural expression for looping and 837 | aggregation. It is especially useful when flattening many sequences. 838 | The syntax of `for` belies the underlying mechanism as it allocates 839 | and dispatches closures. This can lead to both unexpected costs and 840 | semantics; for example 841 | 842 | for (item <- container) { 843 | if (item != 2) return 844 | } 845 | 846 | .LP may cause a runtime error if the container delays computation, making the return nonlocal! 
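To make the hidden closures visible, the following sketch places a `for` comprehension next to the direct combinator calls it roughly corresponds to. The object and value names are illustrative assumptions:

```scala
object ForSketch {
  val xs = Seq(1, 2)
  val ys = Seq(10, 20)

  // A comprehension over two generators...
  val viaFor: Seq[Int] = for {
    x <- xs
    y <- ys
  } yield x * y

  // ...corresponds roughly to nested flatMap and map calls,
  // each of which is handed a closure.
  val direct: Seq[Int] = xs.flatMap { x => ys.map { y => x * y } }

  def main(args: Array[String]): Unit = {
    println(viaFor == direct) // prints true
  }
}
```

Because the body of the comprehension ends up inside those closures, a `return` written in it has to unwind through the enclosing `flatMap` or `foreach` call before it can leave the method.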
847 | 848 | For these reasons, it is often preferable to call `foreach`, 849 | `flatMap`, `map`, and `filter` directly -- but do use `for`s when they 850 | clarify. 851 | 852 | ### `require` and `assert` 853 | 854 | `require` and `assert` both serve as executable documentation. Both are 855 | useful for situations in which the type system cannot express the required 856 | invariants. `assert` is used for *invariants* that the code assumes (either 857 | internal or external), for example 858 | 859 | val stream = getClass.getResourceAsStream("someclassdata") 860 | assert(stream != null) 861 | 862 | Whereas `require` is used to express API contracts: 863 | 864 | def fib(n: Int) = { 865 | require(n > 0) 866 | ... 867 | } 868 | 869 | ## Functional programming 870 | 871 | *Value oriented* programming confers many advantages, especially when 872 | used in conjunction with functional programming constructs. This style 873 | emphasizes the transformation of values over stateful mutation, 874 | yielding code that is referentially transparent, provides stronger 875 | invariants, and is thus also easier to reason about. Case classes, pattern 876 | matching, destructuring bindings, type inference, and lightweight 877 | closure- and method-creation syntax are the tools of this trade. 878 | 879 | ### Case classes as algebraic data types 880 | 881 | Case classes encode ADTs: they are useful for modelling a large number 882 | of data structures and provide for succinct code with strong 883 | invariants, especially when used in conjunction with pattern matching. 884 | The pattern matcher implements exhaustivity analysis providing even 885 | stronger static guarantees. 886 | 887 | Use the following pattern when encoding ADTs with case classes: 888 | 889 | sealed trait Tree[T] 890 | case class Node[T](left: Tree[T], right: Tree[T]) extends Tree[T] 891 | case class Leaf[T](value: T) extends Tree[T] 892 | 893 | .LP The type Tree[T] has two constructors: Node and Leaf. 
Declaring the type sealed allows the compiler to do exhaustivity analysis since constructors cannot be added outside the source file. 894 | 895 | Together with pattern matching, such modelling results in code that is 896 | both succinct and "obviously correct": 897 | 898 | def findMin[T <: Ordered[T]](tree: Tree[T]): T = tree match { 899 | case Node(left, right) => Seq(findMin(left), findMin(right)).min 900 | case Leaf(value) => value 901 | } 902 | 903 | While recursive structures like trees constitute classic applications of 904 | ADTs, their domain of usefulness is much larger. Disjoint unions in particular are 905 | readily modelled with ADTs; these occur frequently in state machines. 906 | 907 | ### Options 908 | 909 | The `Option` type is a container that is either empty (`None`) or full 910 | (`Some(value)`). It provides a safe alternative to the use of `null`, 911 | and should be used instead of null whenever possible. Options are 912 | collections (of at most one item) and they are embellished with 913 | collection operations -- use them! 914 | 915 | Write 916 | 917 | var username: Option[String] = None 918 | ... 919 | username = Some("foobar") 920 | 921 | .LP instead of 922 | 923 | var username: String = null 924 | ... 925 | username = "foobar" 926 | 927 | .LP since the former is safer: the Option type statically enforces that username must be checked for emptiness. 928 | 929 | Conditional execution on an `Option` value should be done with 930 | `foreach`; instead of 931 | 932 | if (opt.isDefined) 933 | operate(opt.get) 934 | 935 | .LP write 936 | 937 | opt foreach { value => 938 | operate(value) 939 | } 940 | 941 | The style may seem odd, but provides greater safety (we don't call the 942 | exceptional `get`) and brevity. 
If both branches are taken, use 943 | pattern matching: 944 | 945 | opt match { 946 | case Some(value) => operate(value) 947 | case None => defaultAction() 948 | } 949 | 950 | .LP but if all that's missing is a default value, use getOrElse 951 | 952 | operate(opt getOrElse defaultValue) 953 | 954 | Do not overuse `Option`: if there is a sensible 955 | default -- a [*Null Object*](https://en.wikipedia.org/wiki/Null_Object_pattern) -- use that instead. 956 | 957 | `Option` also comes with a handy constructor for wrapping nullable values: 958 | 959 | Option(getClass.getResourceAsStream("foo")) 960 | 961 | .LP is an Option[InputStream] that assumes a value of None should getResourceAsStream return null. 962 | 963 | ### Pattern matching 964 | 965 | Pattern matches (`x match { ...`) are pervasive in well written Scala 966 | code: they conflate conditional execution, destructuring, and casting 967 | into one construct. Used well they enhance both clarity and safety. 968 | 969 | Use pattern matching to implement type switches: 970 | 971 | obj match { 972 | case str: String => ... 973 | case addr: SocketAddress => ... 974 | 975 | Pattern matching works best when also combined with destructuring (for 976 | example if you are matching case classes); instead of 977 | 978 | animal match { 979 | case dog: Dog => "dog (%s)".format(dog.breed) 980 | case _ => animal.species 981 | } 982 | 983 | .LP write 984 | 985 | animal match { 986 | case Dog(breed) => "dog (%s)".format(breed) 987 | case other => other.species 988 | } 989 | 990 | Write [custom extractors](https://docs.scala-lang.org/tour/extractor-objects.html) but only with 991 | a dual constructor (`apply`), otherwise their use may be out of place. 992 | 993 | Don't use pattern matching for conditional execution when defaults 994 | make more sense. 
The collections libraries usually provide methods 995 | that return `Option`s; avoid 996 | 997 | val x = list match { 998 | case head :: _ => head 999 | case Nil => default 1000 | } 1001 | 1002 | .LP because 1003 | 1004 | val x = list.headOption getOrElse default 1005 | 1006 | .LP is both shorter and communicates purpose. 1007 | 1008 | ### Partial functions 1009 | 1010 | Scala provides syntactical shorthand for defining a `PartialFunction`: 1011 | 1012 | val pf: PartialFunction[Int, String] = { 1013 | case i if i%2 == 0 => "even" 1014 | } 1015 | 1016 | .LP and they may be composed with orElse 1017 | 1018 | val tf: (Int => String) = pf orElse { case _ => "odd"} 1019 | 1020 | tf(1) == "odd" 1021 | tf(2) == "even" 1022 | 1023 | Partial functions arise in many situations and are effectively 1024 | encoded with `PartialFunction`, for example as arguments to 1025 | methods 1026 | 1027 | trait Publisher[T] { 1028 | def subscribe(f: PartialFunction[T, Unit]) 1029 | } 1030 | 1031 | val publisher: Publisher[Int] = ... 1032 | publisher.subscribe { 1033 | case i if isPrime(i) => println("found prime", i) 1034 | case i if i%2 == 0 => count += 2 1035 | /* ignore the rest */ 1036 | } 1037 | 1038 | .LP or in situations that might otherwise call for returning an Option: 1039 | 1040 | // Attempt to classify the throwable for logging. 
1041 | type Classifier = Throwable => Option[java.util.logging.Level] 1042 | 1043 | .LP might be better expressed with a PartialFunction 1044 | 1045 | type Classifier = PartialFunction[Throwable, java.util.logging.Level] 1046 | 1047 | .LP as it affords greater composability: 1048 | 1049 | val classifier1: Classifier 1050 | val classifier2: Classifier 1051 | 1052 | val classifier: Classifier = classifier1 orElse classifier2 orElse { case _ => java.util.logging.Level.FINEST } 1053 | 1054 | 1055 | ### Destructuring bindings 1056 | 1057 | Destructuring value bindings are related to pattern matching; they use the same 1058 | mechanism but are applicable when there is exactly one option (unless you accept 1059 | the possibility of an exception). Destructuring binds are particularly useful for 1060 | tuples and case classes. 1061 | 1062 | val tuple = ('a', 1) 1063 | val (char, digit) = tuple 1064 | 1065 | val tweet = Tweet("just tweeting", Time.now) 1066 | val Tweet(text, timestamp) = tweet 1067 | 1068 | ### Laziness 1069 | 1070 | Fields in Scala are computed *by need* when `val` is prefixed with 1071 | `lazy`. Because fields and methods are equivalent in Scala (unless the fields 1072 | are `private[this]`) 1073 | 1074 | lazy val field = computation() 1075 | 1076 | .LP is (roughly) short-hand for 1077 | 1078 | var _theField = None 1079 | def field = if (_theField.isDefined) _theField.get else { 1080 | _theField = Some(computation()) 1081 | _theField.get 1082 | } 1083 | 1084 | .LP i.e., it computes a result and memoizes it. Use lazy fields for this purpose, but avoid using laziness when laziness is required by semantics. In these cases it's better to be explicit since it makes the cost model explicit, and side effects can be controlled more precisely. 1085 | 1086 | Lazy fields are thread safe. 
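The memoizing behavior can be observed directly. In this minimal sketch the counter `computations` and the object name `LazySketch` are hypothetical additions used only to make the evaluation visible:

```scala
object LazySketch {
  // Counts how many times the (hypothetical) expensive computation runs.
  var computations = 0

  def computation(): Int = {
    computations += 1
    42
  }

  // Not evaluated here; evaluated on first access and then cached.
  lazy val field: Int = computation()

  def main(args: Array[String]): Unit = {
    println(computations)  // prints 0: nothing has been computed yet
    println(field + field) // prints 84: the first access triggers the computation
    println(computations)  // prints 1: the second access reused the cached value
  }
}
```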

### Call by name

Method parameters may be specified by-name, meaning the parameter is
bound not to a value but to a *computation* that may be repeated. This
feature must be applied with care; a caller expecting by-value
semantics will be surprised. The motivation for this feature is to
construct syntactically natural DSLs -- new control constructs in
particular can be made to look much like native language features.

Only use call-by-name for such control constructs, where it is obvious
to the caller that what is being passed in is a "block" rather than
the result of an unsuspecting computation. Only use call-by-name arguments
in the last position of the last argument list. When using call-by-name,
ensure that the method is named so that it is obvious to the caller that
its argument is call-by-name.

When you do want a value to be computed multiple times, and especially
when this computation is side effecting, use explicit functions:

    class SSLConnector(mkEngine: () => SSLEngine)

.LP The intent remains obvious and the caller is left without surprises.

### `flatMap`

`flatMap` -- the combination of `map` with `flatten` -- deserves special
attention, for it has subtle power and great utility. Like its sibling `map`, it is frequently
available in nontraditional collections such as `Future` and `Option`. Its behavior
is revealed by its signature; for some `Container[A]`

    flatMap[B](f: A => Container[B]): Container[B]

.LP flatMap invokes the function f for the element(s) of the collection producing a new collection, (all of) which are flattened into its result. For example, to get all permutations of two-character strings that aren't the same character repeated twice:

    val chars = 'a' to 'z'
    val perms = chars flatMap { a =>
      chars flatMap { b =>
        if (a != b) Seq("%c%c".format(a, b))
        else Seq()
      }
    }

.LP which is equivalent to the more concise for-comprehension (which is -- roughly -- syntactical sugar for the above):

    val perms = for {
      a <- chars
      b <- chars
      if a != b
    } yield "%c%c".format(a, b)

`flatMap` is frequently useful when dealing with `Option`s -- it will
collapse chains of options down to one,

    val host: Option[String] = ...
    val port: Option[Int] = ...

    val addr: Option[InetSocketAddress] =
      host flatMap { h =>
        port map { p =>
          new InetSocketAddress(h, p)
        }
      }

.LP which is also made more succinct with for

    val addr: Option[InetSocketAddress] = for {
      h <- host
      p <- port
    } yield new InetSocketAddress(h, p)

The use of `flatMap` in `Future`s is discussed in the
futures section.

## Object oriented programming

Much of Scala's vastness lies in its object system. Scala is a *pure*
language in the sense that *all values* are objects; there is no
distinction between primitive types and composite ones.
Scala also features mixins allowing for more orthogonal and piecemeal
construction of modules that can be flexibly put together at compile
time with all the benefits of static type checking.

A motivation behind the mixin system was to obviate the need for
traditional dependency injection. The culmination of this "component
style" of programming is [the cake
pattern](http://jonasboner.com/real-world-scala-dependency-injection-di/).
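
For readers who have not met it, a minimal sketch of the cake pattern might look as follows. All component names (`UserRepositoryComponent`, `UserServiceComponent`, `Registry`) are hypothetical and chosen for illustration only; the linked article describes the pattern in full.

```scala
// A component declares the dependency it provides as an abstract member.
trait UserRepositoryComponent {
  def userRepository: UserRepository

  trait UserRepository {
    def find(id: Int): Option[String]
  }
}

// The self-type states that this component can only be mixed into
// something that also provides a UserRepositoryComponent.
trait UserServiceComponent { self: UserRepositoryComponent =>
  class UserService {
    def greet(id: Int): String =
      userRepository.find(id).map(n => s"hello, $n").getOrElse("unknown user")
  }
}

// Wiring happens at compile time by mixing the components together.
object Registry extends UserServiceComponent with UserRepositoryComponent {
  val userRepository: UserRepository = new UserRepository {
    private[this] val users = Map(1 -> "alice")
    def find(id: Int): Option[String] = users.get(id)
  }
  val userService = new UserService
}
```

The dependencies are checked statically, at the cost of noticeably more ceremony than passing a repository to a constructor.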

### Dependency injection

In our use, however, we've found that Scala itself removes so much of
the syntactical overhead of "classic" (constructor) dependency
injection that we'd rather just use that: it is clearer, the
dependencies are still encoded in the (constructor) type, and class
construction is so syntactically trivial that it becomes a breeze.
It's boring and simple and it works. *Use dependency injection for
program modularization*, and in particular, *prefer composition over
inheritance* -- for this leads to more modular and testable programs.
When encountering a situation requiring inheritance, ask yourself: how
would you structure the program if the language lacked support for
inheritance? The answer may be compelling.

Dependency injection typically makes use of traits,

    trait TweetStream {
      def subscribe(f: Tweet => Unit)
    }
    class HosebirdStream extends TweetStream ...
    class FileStream extends TweetStream ...

    class TweetCounter(stream: TweetStream) {
      stream.subscribe { tweet => count += 1 }
    }

It is common to inject *factories* -- objects that produce other
objects. In these cases, favor the use of simple functions over specialized
factory types.

    class FilteredTweetCounter(mkStream: Filter => TweetStream) {
      mkStream(PublicTweets).subscribe { tweet => publicCount += 1 }
      mkStream(DMs).subscribe { tweet => dmCount += 1 }
    }

### Traits

Dependency injection does not at all preclude the use of common *interfaces*, or
the implementation of common code in traits. Quite the contrary -- the use of traits is
highly encouraged for exactly this reason: multiple interfaces
(traits) may be implemented by a concrete class, and common code can
be reused across all such classes.
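
A self-contained sketch of constructor injection through a trait, filling in the pieces the `TweetCounter` example above leaves out. The `Tweet` case class and `InMemoryStream` are hypothetical stand-ins, assumed here so that the example runs without any external service.

```scala
// Hypothetical minimal Tweet type, for illustration only.
final case class Tweet(text: String)

// The common interface that is injected.
trait TweetStream {
  def subscribe(f: Tweet => Unit): Unit
}

// Hypothetical test double: replays a fixed sequence of tweets.
class InMemoryStream(tweets: Seq[Tweet]) extends TweetStream {
  def subscribe(f: Tweet => Unit): Unit = tweets.foreach(f)
}

// The dependency is visible in the constructor type.
class TweetCounter(stream: TweetStream) {
  private[this] var n = 0
  stream.subscribe { _ => n += 1 }
  def count: Int = n
}
```

Because `TweetCounter` depends only on the trait, a test can pass in `new InMemoryStream(Seq(Tweet("a"), Tweet("b")))` and production code can pass in a real stream, with no change to the counter.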

Keep traits short and orthogonal: don't lump separable functionality
into a trait; think of the smallest related ideas that fit together. For example,
imagine you have something that can do IO:

    trait IOer {
      def write(bytes: Array[Byte])
      def read(n: Int): Array[Byte]
    }

.LP separate the two behaviors:

    trait Reader {
      def read(n: Int): Array[Byte]
    }
    trait Writer {
      def write(bytes: Array[Byte])
    }

.LP and mix them together to form what was an IOer: new Reader with Writer… Interface minimalism leads to greater orthogonality and cleaner modularization.

### Visibility

Scala has very expressive visibility modifiers. It's important to use
these as they define what constitutes the *public API*. Public APIs
should be limited so users don't inadvertently rely on implementation
details and limit the author's ability to change them: They are crucial
to good modularity. As a rule, it's much easier to expand public APIs
than to contract them. Poor annotations can also compromise backwards
binary compatibility of your code.

#### `private[this]`

A class member marked `private`,

    private val x: Int = ...

.LP is visible to all instances of that class (but not their subclasses). In most cases, you want private[this].

    private[this] val x: Int = ...

.LP which limits visibility to the particular instance. The Scala compiler is also able to translate private[this] into a simple field access (since access is limited to the statically defined class) which can sometimes aid performance optimizations.

#### Singleton class types

It's common in Scala to create singleton class types, for example

    def foo() = new Foo with Bar with Baz {
      ...
    }

.LP In these situations, visibility can be constrained by declaring the returned type:

    def foo(): Foo with Bar = new Foo with Bar with Baz {
      ...
    }

.LP where callers of foo() will see a restricted view (Foo with Bar) of the returned instance.

### Structural typing

Do not use structural types in normal use. They are a convenient and
powerful feature, but unfortunately do not have an efficient
implementation on the JVM. However -- due to an implementation quirk --
they provide a very nice shorthand for doing reflection.

    val obj: AnyRef
    obj.asInstanceOf[{def close()}].close()

## Error handling

Scala provides an exception facility, but do not use it for
commonplace errors, when the programmer must handle errors properly
for correctness. Instead, encode such errors explicitly: using
`Option` or `com.twitter.util.Try` are good, idiomatic choices, as
they harness the type system to ensure that the user is properly
considering error handling.

For example, when designing a repository, the following API may
be tempting:

    trait Repository[Key, Value] {
      def get(key: Key): Value
    }

.LP but this would require the implementor to throw an exception when the key is absent. A better approach is to use an Option:

    trait Repository[Key, Value] {
      def get(key: Key): Option[Value]
    }

.LP This interface makes it obvious that the repository may not contain every key, and that the programmer must handle missing keys. Furthermore, Option has a number of combinators to handle these cases. For example, getOrElse is used to supply a default value for missing keys:

    val repo: Repository[Int, String]
    repo.get(123) getOrElse "defaultString"

### Handling exceptions

Because Scala's exception mechanism isn't *checked* -- the compiler
cannot statically tell whether the programmer has covered the set of
possible exceptions -- it is often tempting to cast a wide net when
handling exceptions.

However, some exceptions are *fatal* and should never be caught; the
code

    try {
      operation()
    } catch {
      case _ => ...
    }

.LP is almost always wrong, as it would catch fatal errors that need to be propagated. Instead, use the scala.util.control.NonFatal extractor to handle only nonfatal exceptions.

    try {
      operation()
    } catch {
      case NonFatal(exc) => ...
    }

## Garbage collection

We spend a lot of time tuning garbage collection in production. The
garbage collection concerns are largely similar to those of Java
though idiomatic Scala code tends to generate more (short-lived)
garbage than idiomatic Java code -- a byproduct of the functional
style. HotSpot's generational garbage collection typically makes this
a nonissue as short-lived garbage is effectively free in most circumstances.

Before tackling GC performance issues, watch
[this](https://www.infoq.com/presentations/JVM-Performance-Tuning-twitter)
presentation by Attila that illustrates some of our experiences with
GC tuning.

In Scala proper, your only tool to mitigate GC problems is to generate
less garbage; but do not act without data!
Unless you are doing
something obviously degenerate, use the various Java profiling tools
-- our own include
[heapster](https://github.com/mariusaeriksen/heapster) and
[gcprof](https://github.com/twitter/jvmgcprof).

## Java compatibility

When we write code in Scala that is used from Java, we ensure
that usage from Java remains idiomatic. Oftentimes this requires
no extra effort -- classes and pure traits are exactly equivalent
to their Java counterpart -- but sometimes separate Java APIs
need to be provided. A good way to get a feel for your library's Java
API is to write a unit test in Java (just for compilation); this also ensures
that the Java-view of your library remains stable over time as the Scala
compiler can be volatile in this regard.

Traits that contain implementation are not directly
usable from Java: extend an abstract class with the trait
instead.

    // Not directly usable from Java
    trait Animal {
      def eat(other: Animal)
      def eatMany(animals: Seq[Animal]) = animals foreach(eat(_))
    }

    // But this is:
    abstract class JavaAnimal extends Animal

## Twitter's standard libraries

The most important standard libraries at Twitter are
[Util](https://github.com/twitter/util) and
[Finagle](https://github.com/twitter/finagle). Util should be
considered an extension to the Scala and Java standard libraries,
providing missing functionality or more appropriate implementations. Finagle
is our RPC system, the kernel of our distributed systems components.

### Futures

Futures have been discussed
briefly in the concurrency section. They
are the central mechanism for coordinating asynchronous
processes and are pervasive in our codebase and core to Finagle.
Futures allow for the composition of concurrent events, and simplify
reasoning about highly concurrent operations. They also lend themselves
to a highly efficient implementation on the JVM.

Twitter's futures are *asynchronous*, so blocking operations --
basically any operation that can suspend the execution of its thread;
network IO and disk IO are examples -- must be handled by a system
that itself provides futures for the results of said operations.
Finagle provides such a system for network IO.

Futures are plain and simple: they hold the *promise* for the result
of a computation that is not yet complete. They are a simple container
-- a placeholder. A computation could fail of course, and this must
also be encoded: a Future can be in exactly one of three states: *pending*,
*failed* or *completed*.
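
The three states can be observed directly. The sketch below is an assumption-laden stand-in: it uses the Scala standard library's `scala.concurrent.Promise` and `Future` rather than Twitter's `com.twitter.util.Future`, so that it runs without extra dependencies, and it creates promises by hand only to pin each future in a known state.

```scala
import scala.concurrent.Promise
import scala.util.Success

object FutureStates {
  // Pending: the promise has not been satisfied yet.
  val pending = Promise[Int]().future

  // Completed: the promise was satisfied with a value.
  val completed = Promise[Int]().success(42).future

  // Failed: the promise was satisfied with an exception.
  val failed = Promise[Int]().failure(new RuntimeException("boom")).future
}
```

In this standard-library API, `value` returns `None` for a pending future and `Some(Success(..))` or `Some(Failure(..))` once it has been satisfied.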

Aside: Composition

Let's revisit what we mean by composition: combining simpler components
into more complicated ones. The canonical example of this is function
composition: Given functions f and
g, the composite function (g∘f)(x) = g(f(x)) -- the result
of applying f to x first, and then applying g to the result
of that -- can be written in Scala:

    val f = (i: Int) => i.toString
    val g = (s: String) => s+s+s
    val h = g compose f  // : Int => String

    scala> h(123)
    res0: java.lang.String = 123123123

.LP the function h being the composite. It is a new function that combines both f and g in a predefined way.
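
As a runnable variant of the aside, the sketch below also shows `andThen`, which builds the same composite with the functions written in application order. The wrapping `Composition` object and the name `h2` are introduced here for illustration.

```scala
object Composition {
  val f = (i: Int) => i.toString
  val g = (s: String) => s + s + s

  // g compose f applies f first, then g.
  val h: Int => String = g compose f

  // f andThen g is the same composite, read left to right.
  val h2: Int => String = f andThen g
}
```

Both `Composition.h(123)` and `Composition.h2(123)` evaluate to `"123123123"`.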


Futures are a type of collection -- they are a container of
either 0 or 1 elements -- and you'll find they have standard
collection methods (e.g. `map`, `filter`, and `foreach`). Since a Future's
value is deferred, the result of applying any of these methods
is necessarily also deferred; in

    val result: Future[Int]
    val resultStr: Future[String] = result map { i => i.toString }

.LP the function { i => i.toString } is not invoked until the integer value becomes available, and the transformed collection resultStr is also in pending state until that time.

Lists can be flattened;

    val listOfList: List[List[Int]] = ...
    val list: List[Int] = listOfList.flatten

.LP and this makes sense for futures, too:

    val futureOfFuture: Future[Future[Int]] = ...
    val future: Future[Int] = futureOfFuture.flatten

.LP since futures are deferred, the implementation of flatten -- it returns immediately -- has to return a future that is the result of waiting for the completion of the outer future (Future[Future[Int]]) and after that the inner one (Future[Int]). If the outer future fails, the flattened future must also fail.

Futures (like Lists) also define `flatMap`; `Future[A]` defines its signature as

    flatMap[B](f: A => Future[B]): Future[B]

.LP which is like the combination of both map and flatten, and we could implement it that way:

    def flatMap[B](f: A => Future[B]): Future[B] = {
      val mapped: Future[Future[B]] = this map f
      val flattened: Future[B] = mapped.flatten
      flattened
    }

This is a powerful combination! With `flatMap` we can define a Future that
is the result of two futures sequenced, the second future computed based
on the result of the first one.
Imagine we needed to do two RPCs in order
to authenticate a user (id); we could define the composite operation in the
following way:

    def getUser(id: Int): Future[User]
    def authenticate(user: User): Future[Boolean]

    def isIdAuthed(id: Int): Future[Boolean] =
      getUser(id) flatMap { user => authenticate(user) }

.LP an additional benefit to this type of composition is that error handling is built-in: the future returned from isIdAuthed(..) will fail if either of getUser(..) or authenticate(..) does with no extra error handling code.

#### Style

Future callback methods (`respond`, `onSuccess`, `onFailure`, `ensure`)
return a new future that is *chained* to its parent. This future is guaranteed
to be completed only after its parent, enabling patterns like

    acquireResource() onSuccess { value =>
      computeSomething(value)
    } ensure {
      freeResource()
    }

.LP where freeResource() is guaranteed to be executed only after computeSomething, allowing for emulation of the native try .. finally pattern.

Use `onSuccess` instead of `foreach` -- it is symmetrical to `onFailure` and
is a better name for the purpose, and also allows for chaining.

Always try to avoid creating `Promise` instances directly: nearly every task
can be accomplished via the use of predefined combinators. These
combinators ensure errors and cancellations are propagated, and generally
encourage *dataflow style* programming which usually obviates the need for synchronization and
volatility declarations.

Code written in tail-recursive style is not subject to stack-space leaks,
allowing for efficient implementation of loops in dataflow-style:

    case class Node(parent: Option[Node], ...)
    def getNode(id: Int): Future[Node] = ...

    def getHierarchy(id: Int, nodes: List[Node] = Nil): Future[List[Node]] =
      getNode(id) flatMap {
        case n@Node(Some(parent), ..) => getHierarchy(parent, n :: nodes)
        case n => Future.value((n :: nodes).reverse)
      }

`Future` defines many useful methods: Use `Future.value()` and
`Future.exception()` to create pre-satisfied futures.
`Future.collect()`, `Future.join()` and `Future.select()` provide
combinators that turn many futures into one (i.e. the gather part of a
scatter-gather operation).

#### Cancellation

Futures implement a weak form of cancellation. Invoking `Future#cancel`
does not directly terminate the computation but instead propagates a
level-triggered *signal* that may be queried by whichever process
ultimately satisfies the future. Cancellation flows in the opposite
direction from values: a cancellation signal set by a consumer is
propagated to its producer. The producer uses `onCancellation` on
`Promise` to listen to this signal and act accordingly.

This means that the cancellation semantics depend on the producer,
and there is no default implementation. *Cancellation is but a hint*.

#### Locals

Util's
[`Local`](https://github.com/twitter/util/blob/master/util-core/src/main/scala/com/twitter/util/Local.scala#L40)
provides a reference cell that is local to a particular future dispatch tree. Setting the value of a local makes this
value available to any computation deferred by a Future in the same thread. They are analogous to thread locals,
except their scope is not a Java thread but a tree of "future threads". In

    trait User {
      def name: String
      def incrCost(points: Int)
    }
    val user = new Local[User]

    ...

    user() = currentUser
    rpc() ensure {
      user().incrCost(10)
    }

.LP user() in the ensure block will refer to the value of the user local at the time the callback was added.

As with thread locals, `Local`s can be very convenient, but should
almost always be avoided: make sure the problem cannot be sufficiently
solved by passing data around explicitly, even if it is somewhat
burdensome.

Locals are used effectively by core libraries for *very* common
concerns -- threading through RPC traces, propagating monitors,
creating "stack traces" for future callbacks -- where any other solution
would unduly burden the user. Locals are inappropriate in almost any
other situation.

### Offer/Broker

Concurrent systems are greatly complicated by the need to coordinate
access to shared data and resources.
[Actors](https://doc.akka.io/api/akka/current/index.html#akka.actor.Actor)
present one strategy of simplification: each actor is a sequential process
that maintains its own state and resources, and data is shared by
messaging with other actors. Sharing data requires communicating between
actors.

Offer/Broker builds on this in three important ways. First,
communication channels (Brokers) are first class -- that is, you send
messages via Brokers, not to an actor directly. Secondly, Offer/Broker
is a synchronous mechanism: to communicate is to synchronize. This
means we can use Brokers as a coordination mechanism: when process `a`
has sent a message to process `b`, both `a` and `b` agree on the state
of the system. Lastly, communication can be performed *selectively*: a
process can propose several different communications, and exactly one
of them will obtain.

In order to support selective communication (as well as other
composition) in a general way, we need to decouple the description of
a communication from the act of communicating. This is what an `Offer`
does -- it is a persistent value that describes a communication; in
order to perform that communication (act on the offer), we synchronize
via its `sync()` method

    trait Offer[T] {
      def sync(): Future[T]
    }

.LP which returns a Future[T] that yields the exchanged value when the communication obtains.

A `Broker` coordinates the exchange of values through offers -- it is the channel of communications:

    trait Broker[T] {
      def send(msg: T): Offer[Unit]
      val recv: Offer[T]
    }

.LP so that, when creating two offers

    val b: Broker[Int]
    val sendOf = b.send(1)
    val recvOf = b.recv

.LP and sendOf and recvOf are both synchronized

    // In process 1:
    sendOf.sync()

    // In process 2:
    recvOf.sync()

.LP both offers obtain and the value 1 is exchanged.

Selective communication is performed by combining several offers with
`Offer.choose`

    def choose[T](ofs: Offer[T]*): Offer[T]

.LP which yields a new offer that, when synchronized, obtains exactly one of ofs -- the first one to become available. When several are available immediately, one is chosen at random to obtain.

The `Offer` object has a number of one-off Offers that are used to compose with Offers from a Broker.

    Offer.timeout(duration): Offer[Unit]

.LP is an offer that activates after the given duration. Offer.never will never obtain, and Offer.const(value) obtains immediately with the given value. These are useful for composition via selective communication. For example, to apply a timeout on a send operation:

    Offer.choose(
      Offer.timeout(10.seconds),
      broker.send("my value")
    ).sync()

It may be tempting to compare the use of Offer/Broker to
[SynchronousQueue](https://docs.oracle.com/javase/6/docs/api/java/util/concurrent/SynchronousQueue.html),
but they are different in subtle but important ways. Offers can be composed in ways that such queues simply cannot. For example, consider a set of queues, represented as Brokers:

    val q0 = new Broker[Int]
    val q1 = new Broker[Int]
    val q2 = new Broker[Int]

.LP Now let's create a merged queue for reading:

    val anyq: Offer[Int] = Offer.choose(q0.recv, q1.recv, q2.recv)

.LP anyq is an offer that will read from the first available queue. Note that anyq is still synchronous -- we still have the semantics of the underlying queues. Such composition is simply not possible using queues.

#### Example: A Simple Connection Pool

Connection pools are common in network applications, and they're often
tricky to implement -- for example, it's often desirable to have
timeouts on acquisition from the pool since various clients have different latency
requirements. Pools are simple in principle: we maintain a queue of
connections, and we satisfy waiters as they come in. With traditional
synchronization primitives this typically involves keeping two queues:
one of waiters (when there are no connections), and one of connections
(when there are no waiters).

Using Offer/Brokers, we can express this quite naturally:

    class Pool(conns: Seq[Conn]) {
      private[this] val waiters = new Broker[Conn]
      private[this] val returnConn = new Broker[Conn]

      val get: Offer[Conn] = waiters.recv
      def put(c: Conn) { returnConn ! c }

      private[this] def loop(connq: Queue[Conn]) {
        Offer.choose(
          if (connq.isEmpty) Offer.never else {
            val (head, rest) = connq.dequeue
            waiters.send(head) map { _ => loop(rest) }
          },
          returnConn.recv map { c => loop(connq.enqueue(c)) }
        ).sync()
      }

      loop(Queue.empty ++ conns)
    }

`loop` will always offer to have a connection returned, but only offer
to send one when the queue is nonempty. Using a persistent queue simplifies
reasoning further. The interface to the pool is also through an Offer, so if a caller
wishes to apply a timeout, they can do so through the use of combinators:

    val conn: Future[Option[Conn]] = Offer.choose(
      pool.get map { conn => Some(conn) },
      Offer.timeout(1.second) map { _ => None }
    ).sync()

No extra bookkeeping was required to implement timeouts; this is due to
the semantics of Offers: if `Offer.timeout` is selected, there is no
longer an offer to receive from the pool -- the pool and its caller
never simultaneously agreed to receive and send, respectively, on the
`waiters` broker.

#### Example: Sieve of Eratosthenes

It is often useful -- and sometimes vastly simplifying -- to structure
concurrent programs as a set of sequential processes that communicate
synchronously. Offers and Brokers provide a set of tools to make this simple
and uniform. Indeed, their application transcends what one might think
of as "classic" concurrency problems -- concurrent programming (with
the aid of Offer/Broker) is a useful *structuring* tool, just as
subroutines, classes, and modules are -- another important
idea from CSP.

One example of this is the [Sieve of
Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes),
which can be structured as a successive application of filters to a
stream of integers. First, we'll need a source of integers:

    def integers(from: Int): Offer[Int] = {
      val b = new Broker[Int]
      def gen(n: Int): Unit = b.send(n).sync() ensure gen(n + 1)
      gen(from)
      b.recv
    }

.LP integers(n) is simply the offer of all consecutive integers starting at n. Then we need a filter:

    def filter(in: Offer[Int], prime: Int): Offer[Int] = {
      val b = new Broker[Int]
      def loop() {
        in.sync() onSuccess { i =>
          if (i % prime != 0)
            b.send(i).sync() ensure loop()
          else
            loop()
        }
      }
      loop()

      b.recv
    }

.LP filter(in, p) returns the offer that removes multiples of the prime p from in. Finally, we define our sieve:

    def sieve = {
      val b = new Broker[Int]
      def loop(of: Offer[Int]) {
        for (prime <- of.sync(); _ <- b.send(prime).sync())
          loop(filter(of, prime))
      }
      loop(integers(2))
      b.recv
    }

.LP loop() works simply: it reads the next (prime) number from of, and then applies a filter to of that excludes this prime. As loop recurses, successive primes are filtered, and we have a Sieve. We can now print out the first 10000 primes:

    val primes = sieve
    0 until 10000 foreach { _ =>
      println(primes.sync()())
    }

Besides being structured into simple, orthogonal components, this
approach gives you a streaming Sieve: you do not a priori need to
compute the set of primes you are interested in, further enhancing
modularity.
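
The same "successive filters over a stream of integers" structure can be tried out without Offer/Broker. The sketch below swaps in the standard library's `LazyList` (Scala 2.13 or later) as the stream, so it is sequential rather than concurrent and is not the implementation above; the `StreamingSieve` object is a name assumed for illustration.

```scala
object StreamingSieve {
  // An unbounded, lazily evaluated source of consecutive integers.
  def integers(from: Int): LazyList[Int] = LazyList.from(from)

  // Emit the head as a prime, then sieve the rest with its multiples removed.
  // #:: defers evaluation of its right-hand side, so the recursion is lazy.
  def sieve(in: LazyList[Int]): LazyList[Int] =
    in.head #:: sieve(in.tail.filter(_ % in.head != 0))

  val primes: LazyList[Int] = sieve(integers(2))
}
```

As with the Offer-based version, the consumer decides how many primes to take: `StreamingSieve.primes.take(10).toList` forces only as much of the stream as it needs.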

## Acknowledgments

The lessons herein are those of Twitter's Scala community -- I hope
I've been a faithful chronicler.

Blake Matheny, Nick Kallen, Steve Gury, and Raghavendra Prabhu
provided much helpful guidance and many excellent suggestions.

[Scala]: https://www.scala-lang.org/
[Finagle]: https://github.com/twitter/finagle
[Util]: https://github.com/twitter/util
