├── 5W 相关.md ├── A cheat sheet for uncommon Git commands.md ├── AS 相关.md ├── AWS Serverless Workshop - lambda.md ├── AWS 知识.md ├── Better Bash history.md ├── Camo 详解.md ├── DCO.md ├── DNS issue in AWS China.md ├── Docker Compose 学习.md ├── Docker Compose 安装.md ├── Docker 之 Insecure Registry.md ├── Docker 加速器配置脚本说明.md ├── Docker 安装.md ├── Docker 常见问题汇总.md ├── Dockerfile 中的 ENTRYPOINT 指令.md ├── Dockerfile 解析.md ├── Dragonfly 项目研究.md ├── Git 相关.md ├── GitHub 和 GitLab 之间如何进行合理的迁移.md ├── Harbor HA solution proposals.md ├── Harbor REST API.md ├── Harbor 之 Notary 和 Clair.md ├── Harbor 信息汇总.md ├── Harbor 升级 v0.5.0 到 v1.2.2.md ├── Harbor 升级 v1.2.2 到 v1.6.3.md ├── Harbor 升级和数据库迁移向导 - master 分支.md ├── Harbor 升级和数据库迁移向导 - release-1.2.0 分支.md ├── Harbor 升级和数据库迁移向导 - release-1.5.0 分支.md ├── Harbor 升级和数据库迁移向导 - release-1.6.0 分支.md ├── Harbor 升级问题汇总.md ├── Harbor 实验环境构建.md ├── Harbor 工具开发.md ├── Harbor 服务搭建之网络互通问题.md ├── Hunter agent 开发.md ├── Hunter 项目介绍.md ├── LICENSE ├── Netfilter 与 Connection Tracking.md ├── New Relic 知识梳理.md ├── OpenCensus Agent.md ├── PPA 相关.md ├── Prometheus 知识梳理.md ├── README.md ├── Registry Mirror 与 --registry-mirror.md ├── Sentry 之 Nginx 使用.md ├── Sentry 显示用户 URL 为内网地址问题排查.md ├── Sentry 服务搭建.md ├── Sentry 现状调查.md ├── Terminal session recorder and others.md ├── Understanding AWS stolen CPU and how it affects.md ├── Zsh 和 iTerm2 的配置问题.md ├── bazel.md ├── cadvisor 调试记录.md ├── front-end Sentry 访问不稳定问题调查.md ├── frontend Sentry 由于被爬导致日志暴增问题排查.md ├── frontend Sentry 连接时断时续问题解决.md ├── ftrace.md ├── git flow 使用.md ├── git 命令之 rebase:reset:reflog 使用.md ├── gosu 使用.md ├── harbor prod 使用的 subnet 和 VPC subnet 冲突问题.md ├── iostat cheatsheet.md ├── opencensus blog summary.md ├── opencensus-go 之 zPages.md ├── opencensus-go 研究.md ├── opencensus-service 研究.md ├── perf Examples.md ├── perf_events cheatsheet.md ├── prometheus 线上环境 fd 超限问题排查.md ├── sentry 上传文件失败排查.md ├── shell 脚本中颜色输出问题.md ├── strace 的使用场景.md ├── 利用 Chrome 原生工具进行网页长截图.md ├── 动态追踪技术.md ├── 各大互联网公司线上故障处理资料汇总.md ├── 基于 docker 本地构建 kafka 测试环境.md ├── 基于内核源码研究 netstat -st 输出信息.md ├── 如何关闭 iptables 中 connection tracking (conntrack).md ├── 如何在构建镜像过程中进行 docker 调试.md ├── 实验研究 TCP accept queue 溢出后的表现.md ├── 常用 wireshark 表达式.md ├── 常见 Dockerfile 问题.md ├── 当新连接源端口和 CLOSE_WAIT 状态占用端口发生碰撞时.md ├── 性能分析 - CPU 篇.md ├── 性能分析 -- Mem 基础概念篇.md ├── 手撸 dev0 环境.md ├── 排查 Sentry 上传 sourcemap 时遇到的 500 错误.md ├── 流利说客户端内置网络监测功能背景信息整理.md ├── 流利说线上 SQLBuffet DNS 问题排查.md ├── 流利说线上 k8s node 问题排查.md ├── 流利说线上 k8s pod 问题排查.md ├── 流利说线上 tengine 问题排查.md ├── 流利说线上问题排查报告.md ├── 系统分析 - 文件系统写.md ├── 系统分析 - 文件系统读写时 memory 使用工具分析.md ├── 线上 kong 内存占用问题排查.md ├── 线上事故处理指导手册.md ├── 调整 vagrant VM 中的磁盘空间大小.md └── 高峰期 Presto Autoscaling 节点 DNS 解析问题.md /5W 相关.md: -------------------------------------------------------------------------------- 1 | # 5W 相关 2 | 3 | 相关文章: 4 | 5 | - Joel Spolsky 原文:《[Five whys](https://www.joelonsoftware.com/2008/01/22/five-whys/)》 6 | - 阮一峰 译文:《[五个为什么(译文)](http://www.ruanyifeng.com/blog/2009/08/five_whys.html)》 7 | - 《[丰田人为什么反复问:5个为什么?](http://www.sohu.com/a/19828737_114834)》 8 | 9 | 10 | 职场新人在遇到问题的时候经常会浅尝辄止,没能对问题根因寻根究底,错过了很多快速成长的机会。“5w 工作法”就是一种简单有效的、可用于指导工作和生过的小技巧。 11 | 12 | 13 | ## 0x01 14 | 15 | 日本丰田创始人的思想:**当某个地方出错的时候,就问为什么,一遍遍的追问,直到找到根本原因为止**; 16 | 17 | - 机房连接中断了; 18 | - **为什么**?网线接口好像不工作了; 19 | - **为什么**?网速/双工模式不匹配造成; 20 | - **为什么**?交换机的网速开关设在了自动调节,而没有被手动设置在固定档; 21 | - **为什么**?许多年前,我们就直到有可能发生此故障,但始终没有写出一份书面技术文档,用于知道交换机在生产环境中配置; 22 | - **为什么**?我们总是很狭隘的看待技术文档,觉得只有在找不到管理员的情况下才需要看。没有意识到要把它作为技术操作的标准和确认清单; 23 | 24 | 25 | ## 0x02 26 | 27 | 
**在持续改善的过程中反复问为什么,就可以不为表象所蒙蔽,找到问题发生的真正根源,从而真正找到有针对性的解决方案**。 28 | 29 | 第二次世界大战后,日本丰田公司曾陷入非常危险的境地,年汽车销量下降到了区区 3275 辆。 30 | 31 | 汽车销售不出去,工人开始罢工,而且持续相当长时间,丰田几乎濒临破产。不但资金短缺,还面临着原材料供应不足,而且日本汽车制造业的生产率与美国差距巨大。 32 | 33 | 在如此严峻的现实面前,丰田喜一郎提出:降低成本,消除不必要的浪费。用三年时间赶上美国!否则,日本的汽车产业将难以为继! 34 | 35 | 为了实现创业者的雄心壮志与迫在眉睫的目标,时任丰田汽车公司副社长的大野耐一日思夜想:为什么美国的生产率比日本高出几倍?一定是日本存在着大量的浪费!那么如何能找到更好的生产方式呢?通过对生产现场认真细致的研究,最终形成了一套严谨成熟的“准时生产”体系,这就是影响了全世界的“**丰田生产方式**”。 36 | 37 | 丰田生产方式的**核心**方式就是,“**必要的产品,只在必要的时间以最低的成本完成必要的数量**”。 38 | 39 | 为了达到这样的目标,就必须持续改善。而在改善的过程中之中,解决实际问题时也经常会用到**“5个为什么“分析法**。 40 | 41 | 在进行”5个为什么“分析之前,你要明确问题是什么,用丰田的术语来说就是“摸清情况”。 42 | 43 | **摸清情况要从以开放的心态观察情况,并把实际情况和标准进行对比开始**。然后,对问题的起因进行初步的分析。问题是在哪里发现的?这将带你追根溯源,接近问题的最根本原因。随后,通过“5个为什么”分析法就可以找到结果。 44 | 45 | 比如,一台机器不转动了,你就要问: 46 | 47 | - “**为什么**机器停了?” 48 | - “因为超负荷,保险丝断了。” 49 | - “**为什么**超负荷了呢?” 50 | - “因为轴承部分的润滑不够。” 51 | - “**为什么**润滑不够?” 52 | - “因为润滑泵吸不上油来。” 53 | - “**为什么**吸不上油来呢?” 54 | - “因为油泵轴磨损,松动了。” 55 | - “**为什么**磨损了呢?” 56 | - “因为没有安装过滤器,混进了铁屑。” 57 | 58 | 反复追问上述5个“为什么”就会发现需要安装过滤器。而如果“为什么”没有问到底,换上保险丝或者换上油泵轴就了事,那么,几个月以后就会再次发生同样的故障。 59 | 60 | 61 | -------------------------------------------------------------------------------- /A cheat sheet for uncommon Git commands.md: -------------------------------------------------------------------------------- 1 | # A cheat sheet for uncommon Git commands 2 | 3 | > From [RehanSaeed/Git-Cheat-Sheet](https://github.com/RehanSaeed/Git-Cheat-Sheet/blob/master/README.md) 4 | 5 | ## Configuration 6 | 7 | | Command | Description | 8 | | - | - | 9 | | `git config --global user.name "foo"` | Set user name | 10 | | `git config --global user.email "foo"` | Set user email | 11 | 12 | ## Branches 13 | 14 | | Command | Description | 15 | | - | - | 16 | | `git branch foo` | Create a new branch | 17 | | `git checkout foo` | Switch to a branch | 18 | | `git merge foo` | Merge branch into current branch | 19 | 20 | ## Staged Changes 21 | 22 | | Command | Description | 23 | | - | - | 24 | | `git mv file1.txt file2.txt` | Move/rename file | 25 | | `git rm --cached file.txt` | Unstage file | 26 | | `git rm --force file.txt` | Unstage and delete file | 27 | | `git reset HEAD` | Unstage changes | 28 | | `git reset --hard HEAD` | Unstage and delete changes | 29 | 30 | ## Changing Commits 31 | 32 | | Command | Description | 33 | | - | - | 34 | | `git reset 5720fdf` | Reset current branch but not working area to commit | 35 | | `git reset --hard 5720fdf` | Reset current branch and working area to commit | 36 | | `git commit --amend` | Change the last commit | 37 | | `git revert 5720fdf` | Revert a commit | 38 | | `git rebase --interactive` | Squash, rename and drop commits | 39 | | `git rebase --continue` | Continue an interactive rebase | 40 | | `git rebase --abort` | Cancel an interactive rebase | 41 | 42 | ## Compare 43 | 44 | | Command | Description | 45 | | - | - | 46 | | `git diff` | See difference between working area and current branch | 47 | | `git diff HEAD HEAD~2` | See difference between te current commit and two previous commits | 48 | | `git diff master other` | See difference between two branches | 49 | 50 | ## View 51 | 52 | | Command | Description | 53 | | - | - | 54 | | `git log` | See commit list | 55 | | `git log --patch` | See commit list and line changes | 56 | | `git log --graph --oneline --decorate` | See commit visualization | 57 | | `git log --grep foo` | See commits with foo in the message | 58 | | `git show HEAD` | Show the current commit | 59 | | `git show HEAD^` or `git show HEAD~1` | Show the previous commit | 60 | | 
`git show HEAD^^` or `git show HEAD~2` | Show the commit going back two commits | 61 | | `git show master` | Show the last commit in a branch | 62 | | `git show 5720fdf` | Show named commit | 63 | | `git blame file.txt` | See who changed each line and when | 64 | 65 | ## Stash 66 | 67 | | Command | Description | 68 | | - | - | 69 | | `git stash` | Stash staged files | 70 | | `git stash --include-untracked` | Stash working area and staged files | 71 | | `git stash list` | List stashes | 72 | | `git stash apply` | Moved last stash to working area | 73 | | `git stash apply 0` | Moved named stash to working area | 74 | | `git stash clear` | Clear the stash | 75 | 76 | ## Remote 77 | 78 | | Command | Description | 79 | | - | - | 80 | | `git remote -v` | List remote repositories | 81 | | `git remote show origin` | Show remote repository details | 82 | | `git remote add upstream ` | Add remote upstream repository | 83 | | `git remote -v` | List remote repositories | 84 | | `git push --tags` | Push tags to remote repository | 85 | 86 | -------------------------------------------------------------------------------- /AS 相关.md: -------------------------------------------------------------------------------- 1 | # AS 相关 2 | 3 | > 以下信息通过 https://bgp.he.net/ 查得 4 | 5 | | Origin AS (ASN 数据) | Announcement (CIDR | Description | 6 | | -- | -- | -- | 7 | | AS55960 | 54.222.0.0/19 | Beijing Guanghuan Xinwang Digital Technology co.Ltd. | 8 | | AS4808 | 124.65.0.0/16
124.65.192.0/18 | China Unicom Beijing province network | 9 | | AS4808 | 202.96.13.0/24 | China National Instruments Import & Export Corp | 10 | | AS4837 | 219.158.96.0/19 | CNC group | 11 | | AS4134 | 202.97.0.0/19 | CHINANET backbone network | 12 | | AS9394 | 61.236.0.0/15
61.237.0.0/16
61.237.0.0/17 | China TieTong Telecommunications Corporation | 13 | | AS9808 | 218.200.0.0/13
218.204.0.0/14
218.206.0.0/15
218.207.192.0/19

112.0.0.0/10
112.5.0.0/16

211.136.0.0/13
211.140.0.0/14
211.142.0.0/15 | China Mobile Communications Corporation | 14 | | AS9808 | 211.143.144.0/20 | China Mobile Communications Corporation - fujian | 15 | | | xx通 | | 16 | | AS9929 | | China Netcom Backbone | 17 | | AS9800 | | CHINA UNICOM | 18 | | AS4538 | | China Education and Research Network Center | 19 | | AS9306 | | China International Electronic Commerce Center | 20 | | AS4799 | | CNCGROUP Jitong IP network | 21 | | | 城域网 | | 22 | | AS17623 | | China Unicom Shenzen network | 23 | | AS17816 | | China Unicom IP network China169 Guangdong province | 24 | | | IDC | | 25 | | AS4816 | | China Telecom (Group) | 26 | | AS23724 | | IDC, China Telecommunications Corporation | 27 | | AS4835 | | China Telecom (Group) | 28 | | | ISP | | 29 | | AS9812 | | Oriental Cable Network Co., Ltd. | 30 | | | 中国电信北方九省 | | 31 | | AS17785 | | asn for Henan Provincial Net of CT | 32 | | AS17896 | | asn for Jilin Provincial Net of CT | 33 | | AS17923 | | asn for Neimenggu Provincial Net of CT | 34 | | AS17897 | | asn for Heilongjiang Provincial Net of CT | 35 | | AS17883 | | asn for Shanxi Provincial Net of CT | 36 | | AS17799 | | asn for Liaoning Provincial Net of CT | 37 | | AS17672 | | asn for Hebei Provincial Net of CT | 38 | | AS17638 | | ASN for TIANJIN Provincial Net of CT | 39 | | AS17633 | | ASN for Shandong Provincial Net of CT | 40 | | | IXP/NAP | | 41 | | AS4847 | | China Networks Inter-Exchange | 42 | | AS4839 | | NAP2 at CERNET located in Shanghai | 43 | | AS4840 | | NAP3 at CERNET located in Guangzhou | 44 | | | CN2 | | 45 | | AS4809 | | China Telecom Next Generation Carrier Network | 46 | | | 教育城域网 | | 47 | | AS9806 | | Beijing Educational Information Network Service Center Co., Ltd | 48 | | | IPv6 Test Network | | 49 | | AS23912 | | China Japan Joint IPv6 Test Network | 50 | | AS9808 | | Guangdong Mobile Communication Co.Ltd. | 51 | 52 | ## Origin AS 53 | 54 | > ref: https://www.arin.net/resources/originas.html 55 | 56 | The **Origin Autonomous System** (`AS`) field is an optional field collected by `ARIN` during all IPv4 and IPv6 block transactions (allocation and assignment requests, reallocation and reassignment actions, transfer and experimental requests). This additional field is used by IP address block holders (including legacy address holders) to record a list of the **Autonomous System Numbers** (`ASNs`), separated by commas or whitespace, from which the addresses in the address block(s) may originate. 57 | 58 | Collecting and making Origin AS information available to the community is part of the implementation of Policy [ARIN-2006-3: Capturing Originations in Templates](https://www.arin.net/vault/policy/proposals/2006_3.html), included in the ARIN Number Resource Policy Manual (`NRPM`) Section 3.5: "[Autonomous System Originations](https://www.arin.net/policy/nrpm.html#three5)." This information is available using our [Bulk Whois](https://www.arin.net/resources/request/bulkwhois.html) service. 
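Besides the lookup tools listed in the next section (bgp.he.net, tools.ipip.net, etc.), ASN data can also be queried from the command line. A minimal sketch, assuming the Team Cymru whois service is reachable from your network; the IP and ASN below are taken from the table above:

```
# Map an IP address back to its origin AS (Team Cymru IP-to-ASN mapping service)
whois -h whois.cymru.com " -v 54.222.1.1"

# Query the registration record of an ASN
whois AS55960
```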
59 | 60 | ## AS 相关信息 61 | 62 | - http://www.caida.org/home/ -- 一个研究 AS 级的拓扑结构的网站,在这个网站可以找到因特网 AS 级的拓扑资料和各种分析;其分析数据的三个来源是: 63 | - http://www.routeviews.org/ - BGP 64 | - http://www.caida.org/tools/measurement/skitter/ -- RouterTrace 65 | - Whois 66 | - http://www.caida.org/analysis/topology/as_core_network/historical.xml -- 全球 AS 爆炸性增长的一个直观印象 67 | - [自治系统 - 维基百科](https://zh.wikipedia.org/wiki/%E8%87%AA%E6%B2%BB%E7%B3%BB%E7%BB%9F) 68 | - [Exploring Autonomous System Numbers - The Internet Protocol Journal - Volume 9, Number 1 - Cisco](http://www.cisco.com/c/en/us/about/press/internet-protocol-journal/back-issues/table-contents-12/autonomous-system-numbers.html) 69 | - [Request Resources](https://www.arin.net/resources/request/asn.html) 70 | - 从 ip 地址反查 AS 信息的工具:https://tools.ipip.net/as.php 71 | 72 | Resources 73 | 74 | - 最完整: [Autonomous System (AS) Numbers](http://www.iana.org/assignments/as-numbers/as-numbers.xml) 75 | - 最客观: [Mapping Local Internet Control - Country Report: China](http://cyber.law.harvard.edu/netmaps/country_detail.php/?cc=CN) 76 | - 最全面: [AS info for the country China](http://www.tcpiputils.com/browse/as/cn) 77 | - 最专业: [Networks of China - bgp.he.net](http://bgp.he.net/country/CN/) 78 | - 最清晰: [Whois](http://ipwhois.cnnic.cn/) 79 | 80 | 参考: 81 | 82 | - http://www.cnblogs.com/webmedia/archive/2006/01/28/324031.html 83 | - https://github.com/idealhack/notes/blob/master/notes/BGP.md 84 | 85 | 86 | 87 | -------------------------------------------------------------------------------- /Better Bash history.md: -------------------------------------------------------------------------------- 1 | # Better Bash history 2 | 3 | > 原文地址:https://sanctum.geek.nz/arabesque/better-bash-history/ 4 | 5 | By default, the Bash shell keeps the history of your most recent session in the `.bash_history` file, and the commands you’ve issued in your current session are also available with a `history` call. These defaults are useful for keeping track of what you’ve been up to in the shell on any given machine, but with disks much larger and faster than they were when Bash was designed, **a little tweaking in your `.bashrc` file can record history more permanently, consistently, and usefully**. 6 | 7 | ## Append history instead of rewriting it 8 | 9 | You should start by setting the `histappend` option, which will mean that when you close a session, your history will be appended to the `.bash_history` file rather than overwriting what’s in there. 10 | 11 | ``` 12 | shopt -s histappend 13 | ``` 14 | 15 | ## Allow a larger history file 16 | 17 | The default maximum number of commands saved into the `.bash_history` file is a rather meager 500. If you want to keep history further back than a few weeks or so, you may as well bump this up by explicitly setting `$HISTSIZE` to a much larger number in your `.bashrc`. We can do the same thing with the `$HISTFILESIZE` variable. 18 | 19 | ``` 20 | HISTFILESIZE=1000000 21 | HISTSIZE=1000000 22 | ``` 23 | 24 | The man page for Bash says that `HISTFILESIZE` can be `unset` to stop truncation entirely, but unfortunately this **doesn’t work** in `.bashrc` files **due to the order in which variables are set**; it’s therefore more straightforward to simply set it to a very large number. 25 | 26 | If you’re on **a machine with resource constraints**, it might be a good idea to **occasionally archive old `.bash_history` files to speed up login and reduce memory footprint**. 
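One way to do that archiving — a minimal sketch, not part of the original article; the archive directory, file naming and the 10000-line cut-off are arbitrary choices:

```
# Rotate the live history file into a dated archive and trim it;
# run this occasionally (e.g. from cron), not on every login.
archive_dir="$HOME/.bash_history_archive"
mkdir -p "$archive_dir"
cp ~/.bash_history "$archive_dir/bash_history.$(date +%Y%m%d)"
# keep only the most recent 10000 lines in the live file
tail -n 10000 ~/.bash_history > ~/.bash_history.tmp && mv ~/.bash_history.tmp ~/.bash_history
```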
27 | 28 | ## Don’t store specific lines 29 | 30 | You can prevent commands that start with a space from going into history by setting `$HISTCONTROL` to `ignorespace`. You can also ignore duplicate commands, for example repeated `du` calls to watch a file grow, by adding `ignoredups`. There’s a shorthand to set both in `ignoreboth`. 31 | 32 | ``` 33 | HISTCONTROL=ignoreboth 34 | ``` 35 | 36 | You might also want to remove the use of certain commands from your history, whether for privacy or readability reasons. This can be done with the `$HISTIGNORE` variable. It’s common to use this to exclude `ls` calls, **job control** builtins like `bg` and `fg`, and calls to `history` itself: 37 | 38 | ``` 39 | HISTIGNORE='ls:bg:fg:history' 40 | ``` 41 | 42 | ## Record timestamps 43 | 44 | If you set `$HISTTIMEFORMAT` to something useful, Bash will record the timestamp of each command in its history. In this variable you can specify the format in which you want this timestamp displayed when viewed with `history`. I find the full date and time to be useful, because it can be sorted easily and works well with tools like `cut` and `awk`. 45 | 46 | ``` 47 | HISTTIMEFORMAT='%F %T ' 48 | ``` 49 | 50 | ## Use one command per line 51 | 52 | To make your `.bash_history` file a little easier to parse, you can force commands that you entered on more than one line to be adjusted to fit on only one with the `cmdhist` option: 53 | 54 | ``` 55 | shopt -s cmdhist 56 | ``` 57 | 58 | ## Store history immediately 59 | 60 | **By default, Bash only records a session to the `.bash_history` file on disk when the session terminates. This means that if you crash or your session terminates improperly, you lose the history up to that point**. You can fix this by recording each line of history as you issue it, through the `$PROMPT_COMMAND` variable: 61 | 62 | ``` 63 | PROMPT_COMMAND='history -a' 64 | ``` 65 | 66 | ## Related Posts 67 | 68 | - [Bash history expansion](https://sanctum.geek.nz/arabesque/bash-history-expansion/) 69 | - [Bash prompts](https://sanctum.geek.nz/arabesque/bash-prompts/) 70 | - [Shell config subfiles](https://sanctum.geek.nz/arabesque/shell-config-subfiles/) 71 | 72 | 73 | -------------------------------------------------------------------------------- /Camo 详解.md: -------------------------------------------------------------------------------- 1 | # Camo 详解 2 | 3 | ## 起因 4 | 5 | 在 github 上经常看到“动图”,例如 6 | 7 | ![](https://camo.githubusercontent.com/bdc860dbbe237022f883e63f231b7966b59d1a96/68747470733a2f2f63646e2e7261776769742e636f6d2f62617272797a2f676f63692f33373262636363622f64656d6f6e7374726174696f6e2e737667) 8 | 9 | 查看其链接信息中均有 https://camo.githubusercontent.com/xxx ,那么 camo 到底是什么呢? 10 | 11 | ## [About anonymized image URLs](https://help.github.com/articles/about-anonymized-image-urls/) 12 | 13 | > If you upload an image to GitHub, the URL of the image will be modified so your information is not trackable. 14 | 15 | 当你上传图片到 GitHub 时,图片的 URL 可能会被修改,进而导致图片无法被正确访问; 16 | 17 | > To host your images, GitHub uses the [open-source project Camo](https://github.com/atmos/camo). **Camo generates an anonymous URL proxy for each image** that starts with https://camo.githubusercontent.com/ and hides your browser details and related information from other users. 18 | 19 | - GitHub 使用 Camo 提供图片服务; 20 | - Camo 会为每一个图片都生成匿名 URL proxy ,并以 https://camo.githubusercontent.com/ 作为链接的起始部分; 21 | - Camo 能够隐藏你的浏览器等相关信息,防止其他用户获取; 22 | 23 | > Anyone who receives your anonymized image URL, directly or indirectly, may view your image. 
To keep sensitive images private, restrict them to a private network or a server that requires authentication instead of using Camo. 24 | 25 | - 收到你 anonymized image URL 的任何人,都能够直接或间接查看你的图片; 26 | - 若想确保图片的私密性,请将其限制在私有网络中,或者使用需要鉴权的服务器作为图片宿主机,而不是 Camo ; 27 | 28 | ## 使用 Camo 时可能遇到的问题 29 | 30 | - An image is not showing up 31 | - An image that changed recently is not updating 32 | - Removing an image from Camo's cache 33 | - Viewing images on private networks 34 | 35 | ## [Proxying User Images](https://blog.github.com/2014-01-28-proxying-user-images/) 36 | 37 | > A while back, we started **[proxying all non-https images](https://blog.github.com/2010-11-13-sidejack-prevention-phase-3-ssl-proxied-assets/)** to avoid mixed-content warnings using a custom **node server** called [camo](https://github.com/atmos/camo). We’re making a small change today and **proxying HTTPS images as well**. 38 | 39 | > **Proxying these images will help protect your privacy**: your browser information won’t be leaked to other third party services. Since we’re also routing images through our CDN, you should also see faster overall load times across GitHub, as well as fewer broken images in the future. 40 | 41 | ## [atmos/camo](https://github.com/atmos/camo) -- Ruby + CoffeeScript 42 | 43 | 一句话:an http proxy to route images through SSL 44 | 45 | ## [cactus/go-camo](https://github.com/cactus/go-camo) -- Golang 46 | 47 | 一句话:Go secure image proxy server 48 | 49 | > 值得深入研究一下 50 | 51 | ## 使用 52 | 53 | > TODO 54 | -------------------------------------------------------------------------------- /DCO.md: -------------------------------------------------------------------------------- 1 | # DCO 2 | 3 | Ref: https://github.com/apps/dco 4 | 5 | > This App enforces the [Developer Certificate of Origin](https://developercertificate.org/) (DCO) on Pull Requests. It requires all commit messages to contain the `Signed-off-by` line with an email address that matches the commit author. 6 | 7 | - DCO 作用于 Pull Request 8 | - DCO 要求 commit 中必须包含 `Signed-off-by` 用于匹配 author 的 email ; 9 | 10 | > The Developer Certificate of Origin (DCO) is a lightweight way for contributors to certify that they wrote or otherwise have the right to submit the code they are contributing to the project. Here is the full [text of the DCO](https://developercertificate.org/), reformatted for readability: 11 | 12 | DCO 是一种确认 contributors 是否具有所有权的轻量级方式; 13 | 14 | >> By making a contribution to this project, I certify that: 15 | >> 16 | >> The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or 17 | >> 18 | >> The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or 19 | >> 20 | >> The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. 21 | >> 22 | >> I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. 
23 | 24 | 上面是完整的 DCO 文本内容; 25 | 26 | > Contributors **sign-off** that they adhere to these requirements by adding a `Signed-off-by` line to commit messages. 27 | 28 | ``` 29 | This is my commit message 30 | 31 | Signed-off-by: Random J Developer 32 | ``` 33 | 34 | sign-off 的格式; 35 | 36 | > `Git` even has a `-s` command line option to append this automatically to your commit message: 37 | 38 | ``` 39 | $ git commit -s -m 'This is my commit message' 40 | ``` 41 | 42 | > Once installed, this integration will set the [status](https://developer.github.com/v3/repos/statuses/) to `failed` if commits in a Pull Request do not contain a valid `Signed-off-by` line. 43 | 44 | ![](https://cloud.githubusercontent.com/assets/173/24482273/a35dc23e-14b5-11e7-9371-fd241873e2c3.png) 45 | 46 | 47 | ---------- 48 | 49 | 50 | ## 实际问题 51 | 52 | 53 | ``` 54 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) git remote -v 55 | my git@github.com:moooofly/community.git (fetch) 56 | my git@github.com:moooofly/community.git (push) 57 | origin git@github.com:goharbor/community (fetch) 58 | origin git@github.com:goharbor/community (push) 59 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) 60 | 61 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) git lg1 62 | * c0086b9 - (25 minutes ago) update — moooofly (HEAD -> proposal/a_go_cli_tool_for_harbor, my/proposal/a_go_cli_tool_for_harbor) 63 | * df816e3 - (36 minutes ago) New Proposal: A Go CLI Client for Harbor — moooofly 64 | * 5c9858c - (3 days ago) Merge pull request #5 from goharbor/summarize_meeting_minutes_0913 — Steven Zou (origin/master, origin/HEAD, master) 65 | |\ 66 | | * 5c0d16a - (3 days ago) Add meeting minutes for 2018/09/13 meeting — Steven Zou 67 | |/ 68 | * 74a7ae7 - (5 days ago) Merge pull request #3 from goharbor/add_proposal_folders — Steven Zou 69 | |\ 70 | | * d405cee - (5 days ago) Add the missing expected existing folder 'proposals' — Steven Zou 71 | * | e857eb1 - (5 days ago) Merge pull request #2 from goharbor/add_proposal_folders — Steven Zou 72 | |\ \ 73 | | |/ 74 | | * 4ed6afa - (5 days ago) Add proposal folders to keep raisd proposals from community and also update the README to reflect the change — Steven Zou 75 | |/ 76 | | * 89faa64 - (10 days ago) Add the very drafted version of governance model docs — Steven Zou (origin/build_governance_model) 77 | |/ 78 | * 62f6dc7 - (11 days ago) Add slides link to the meeting minutes doc — Steven Zou 79 | * 06f0cb7 - (11 days ago) Fix conflicts — Steven Zou 80 | |\ 81 | | * 2ae792b - (11 days ago) Add meeting minutes of 2018/09/05 — Steven Zou 82 | * | ab4e070 - (11 days ago) Add meeting minutes of 2018/09/05 — Steven Zou 83 | |/ 84 | * d1264b0 - (4 weeks ago) Set up repo and add some related materia — Steven Zou 85 | * 7ee0fb2 - (4 weeks ago) Initial commit — Steven Zou 86 | 87 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) git rebase HEAD~2 --signoff 88 | Current branch proposal/a_go_cli_tool_for_harbor is up to date, rebase forced. 89 | First, rewinding head to replay your work on top of it... 90 | Applying: New Proposal: A Go CLI Client for Harbor 91 | Applying: update 92 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) 93 | 94 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) git push --force my proposal/a_go_cli_tool_for_harbor 95 | Counting objects: 10, done. 96 | Delta compression using up to 4 threads. 97 | Compressing objects: 100% (10/10), done. 98 | Writing objects: 100% (10/10), 1.55 KiB | 1.55 MiB/s, done. 
99 | Total 10 (delta 6), reused 0 (delta 0) 100 | remote: Resolving deltas: 100% (6/6), completed with 2 local objects. 101 | To github.com:moooofly/community.git 102 | + 6f8d98d...c0086b9 proposal/a_go_cli_tool_for_harbor -> proposal/a_go_cli_tool_for_harbor (forced update) 103 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) 104 | 105 | 106 | ➜ new git:(proposal/a_go_cli_tool_for_harbor) git log 107 | commit c0086b9278b87bcebad20512ab3a4d3b882a19c1 (HEAD -> proposal/a_go_cli_tool_for_harbor, my/proposal/a_go_cli_tool_for_harbor) 108 | Author: moooofly 109 | Date: Mon Sep 17 11:15:59 2018 +0800 110 | 111 | update 112 | 113 | Signed-off-by: moooofly 114 | 115 | commit df816e3e3f197c48a476eb489e62c77c49e87327 116 | Author: moooofly 117 | Date: Mon Sep 17 11:05:30 2018 +0800 118 | 119 | New Proposal: A Go CLI Client for Harbor 120 | 121 | Signed-off-by: moooofly 122 | 123 | commit 5c9858c9fa3940f79619551591d11e5819b6339b (origin/master, origin/HEAD, master) 124 | Merge: 74a7ae7 5c0d16a 125 | Author: Steven Zou 126 | Date: Fri Sep 14 13:51:39 2018 +0800 127 | 128 | Merge pull request #5 from goharbor/summarize_meeting_minutes_0913 129 | 130 | Add meeting minutes for 2018/09/13 meeting 131 | 132 | commit 5c0d16ad930611e0b9bcbaefefc033d489c57790 133 | Author: Steven Zou 134 | Date: Fri Sep 14 13:48:59 2018 +0800 135 | 136 | Add meeting minutes for 2018/09/13 meeting 137 | 138 | Signed-off-by: Steven Zou 139 | ... 140 | ``` 141 | 142 | ![DCO failed and resolve method](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/DCO%20failed%20and%20resolve%20method.png) 143 | 144 | ![DCO success](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/DCO%20success.png) 145 | 146 | 147 | ---------- 148 | 149 | 150 | ## signoff 151 | 152 | - git-commit 153 | 154 | ``` 155 | ➜ ~ man git-commit 156 | ... 157 | -s, --signoff 158 | Add Signed-off-by line by the committer at the end of the commit log message. The meaning of a signoff depends 159 | on the project, but it typically certifies that committer has the rights to submit this work under the same 160 | license and agrees to a Developer Certificate of Origin (see http://developercertificate.org/ for more 161 | information). 162 | ``` 163 | 164 | - git-rebase 165 | 166 | ``` 167 | ➜ ~ man git-rebase 168 | ... 169 | --signoff 170 | This flag is passed to git am to sign off all the rebased commits (see git-am(1)). Incompatible with the --interactive option. 171 | ``` 172 | 173 | - git-am 174 | 175 | ``` 176 | ➜ ~ man git-am 177 | ... 178 | -s, --signoff 179 | Add a Signed-off-by: line to the commit message, using the committer identity of yourself. See the signoff option in git-commit(1) for more information. 180 | ``` -------------------------------------------------------------------------------- /DNS issue in AWS China.md: -------------------------------------------------------------------------------- 1 | # DNS issue in AWS China 2 | 3 | Ref: https://github.com/kubernetes/kops/issues/2858#issuecomment-315422011 4 | 5 | So DNS ... one of our challenges. Here is some background information and what I understand. 6 | 7 | In order to do cool stuff like HA control plane (masters) and have a functional rolling update of a cluster, you need a way for Etcd and Nodes to find the stuff it needs to find. The option that is generic i.e. works on multiple platforms, of course, is DNS. But **DNS can be a bit of a pain, and for instance, in AWS China, there is not route53 DNS**. 
Also what about running in GCE, when you do not want to use a public DNS domain? As far as I know, a user cannot have a private domain hosted by GCE. 8 | 9 | So enter `gossip`. As mentioned above `gossip` is a communication pattern which allows a distributed system to maintain eventual consistency. `Gossip` is actually a very good pattern for eventual consistency. Many systems like **Weave** and **Cassandra** use `gossip`. 10 | 11 | Quite simply instead of using DNS, **a kops clusters can perform lookups using `gossip`**. Here is an example. During the creation of a kops cluster or a rolling update of a cluster, Etcd nodes need to discover each other. Instead of using Route53 for DNS, the hostnames are now updated via protokube container running `gossip`. 12 | 13 | Trade offs. At this point using `gossip` is new, and is not hardened by users breaking it. So that is exactly what we need the user base to do, is use it, break it for us, so we can make kops great! Also without DNS, you do not get stuff like a DNS name for the API endpoint. -------------------------------------------------------------------------------- /Docker Compose 安装.md: -------------------------------------------------------------------------------- 1 | # Docker Compose 安装 2 | 3 | > 参考:[Install Docker Compose](https://docs.docker.com/compose/install/) 4 | 5 | ## Prerequisites 6 | 7 | Docker Compose relies on **Docker Engine** for any meaningful work, so make sure you have Docker Engine installed either locally or remote, depending on your setup. 8 | 9 | - On desktop systems like **Docker for Mac** and **Windows**, Docker Compose is included as part of those desktop installs. 10 | - On **Linux** systems, first install the Docker for your OS as described on the Get Docker page, then come back here for instructions on installing Compose on Linux systems. 11 | 12 | ## Install Compose on Linux systems 13 | 14 | On Linux, you can download the Docker [Compose binary from the Compose repository release page on GitHub](https://github.com/docker/compose/releases). 15 | 16 | step by step instructions: 17 | 18 | ``` 19 | # Use the latest Compose release number in the download command. 20 | $ sudo curl -L https://github.com/docker/compose/releases/download/1.16.1/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose 21 | 22 | # Apply executable permissions to the binary 23 | $ sudo chmod +x /usr/local/bin/docker-compose 24 | 25 | # Optionally, install command completion for the bash and zsh shell. 26 | # https://docs.docker.com/compose/completion/ 27 | 28 | # Test the installation 29 | docker-compose --version 30 | ``` 31 | 32 | ## 脚本 33 | 34 | Ref: https://github.com/moooofly/scaffolding/blob/master/docker-compose_setup.sh -------------------------------------------------------------------------------- /Docker 之 Insecure Registry.md: -------------------------------------------------------------------------------- 1 | # Docker 之 Insecure Registry 2 | 3 | > 以下内容取自:[dockerd](https://docs.docker.com/engine/reference/commandline/dockerd/) 4 | 5 | 6 | ---------- 7 | 8 | > `dockerd` is the persistent process that manages containers. Docker uses different binaries for the **daemon** and **client**. To run the daemon you type `dockerd`. 9 | 10 | ``` 11 | Usage: dockerd COMMAND 12 | 13 | A self-sufficient runtime for containers. 14 | 15 | Options: 16 | ... 17 | --insecure-registry list Enable insecure registry communication (default []) 18 | ... 
19 | ``` 20 | 21 | 22 | > Docker considers a `private registry` either **secure** or **insecure**. In the rest of this section, registry is used for private registry, and `myregistry:5000` is a placeholder example for a private registry. 23 | 24 | private registry 分为 **secure** 或 **insecure** 两种; 25 | 26 | > A `secure registry` uses TLS and a copy of its CA certificate is placed on the Docker host at `/etc/docker/certs.d/myregistry:5000/ca.crt`. An `insecure registry` is either not using TLS (i.e., listening on plain text HTTP), or is using TLS with a CA certificate not known by the Docker daemon. The latter can happen when the certificate was not found under `/etc/docker/certs.d/myregistry:5000/`, or if the certificate verification failed (i.e., wrong CA). 27 | 28 | **secure registry** 要求: 29 | 30 | - 启用 TLS 31 | - 将 CA certificate 放置到 /etc/docker/certs.d/myregistry:5000/ca.crt 32 | 33 | insecure registry 特征: 34 | 35 | - 未启用 TLS 36 | - 启用了 TLS 但 Docker daemon 不知道 CA certificate 在何处(未放置到正确位置 or 证书验证失败) 37 | 38 | > By default, Docker assumes all, but local, registries are secure. Communicating with an insecure registry is not possible if Docker assumes that registry is secure. In order to communicate with an insecure registry, the Docker daemon requires `--insecure-registry` in one of the following two forms: 39 | 40 | Docker 默认会认为除了 local 以外的所有 registries 都是 secure 的;因此在和 insecure registry 进行通信时,在未设置 `--insecure-registry` 的情况下,会失败;可以通过如下两种方式进行设置: 41 | 42 | 43 | > - `--insecure-registry myregistry:5000` tells the Docker daemon that `myregistry:5000` should be considered insecure. 44 | > - `--insecure-registry 10.1.0.0/16` tells the Docker daemon that all registries whose domain resolve to an IP address is part of the subnet described by the **CIDR** syntax, should be considered insecure. 45 | 46 | > The flag **can be used multiple times** to allow multiple registries to be marked as insecure. 47 | 48 | 可以基于上述选项设置多次; 49 | 50 | > If an `insecure registry` is not marked as insecure, `docker pull`, `docker push`, and `docker searc`h will result in an error message prompting the user to either secure or pass the `--insecure-registry` flag to the Docker daemon as described above. 51 | 52 | > **Local registries**, whose IP address falls in the 127.0.0.0/8 range, **are automatically marked as insecure** as of Docker 1.3.2. It is not recommended to rely on this, as it may change in the future. 53 | 54 | Local registries 默认就被认为是 insecure 的; 55 | 56 | > Enabling `--insecure-registry`, i.e., allowing un-encrypted and/or untrusted communication, can be useful when running a local registry. However, because its use creates security vulnerabilities it should ONLY be enabled for testing purposes. For increased security, users should add their CA to their system’s list of trusted CAs instead of enabling `--insecure-registry`. 
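Besides passing `--insecure-registry` on the `dockerd` command line, the same setting can be kept in the daemon configuration file `/etc/docker/daemon.json`. A minimal sketch — `myregistry:5000` is the placeholder used throughout this section, and note this overwrites any existing `daemon.json`:

```
# mark a private registry as insecure via daemon.json, then restart the daemon
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "insecure-registries": ["myregistry:5000"]
}
EOF
sudo systemctl restart docker
```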
57 | 58 | 59 | ---------- 60 | 61 | ## 证书问题导致的登录失败 62 | 63 | ### 失败一 64 | 65 | 登录基于备份数据恢复出来的 Harbor 66 | 67 | ``` 68 | ➜ ~ docker login 172.31.2.7 69 | Username: fei.sun 70 | Password: 71 | Error response from daemon: Get https://172.31.2.7/v2/: x509: cannot validate certificate for 172.31.2.7 because it doesn't contain any IP SANs 72 | ➜ ~ 73 | ``` 74 | 75 | - 基于备份恢复出来的环境,其证书配置仍旧为原来针对 `prod-reg.llsops.com` 生成的那个(花钱购买的),因此基于 IP 地址登录时,会触发上述错误; 76 | - 解决办法: 77 | - 在 `prod-reg.llsops.com` 绑定的 IP 地址未进行变更前(变更 DNSPod 上的配置),可以先创建一个自签名的、基于 IP 的证书使用; 78 | - 在目标机器的 `/etc/hosts` 中进行临时配置; 79 | 80 | ### 失败二 81 | 82 | 在 Mac 上登录 vagrant 中的 Harbor 83 | 84 | ``` 85 | ➜ ~ docker login 11.11.11.12 86 | Username (admin): 87 | Password: 88 | Error response from daemon: Get https://11.11.11.12/v2/: x509: certificate signed by unknown authority 89 | ➜ ~ 90 | ``` 91 | 92 | 需要调整 Docker for Mac 的配置(即 client 访问的本地 docker daemon 配置),Preferences -> Daemon -> Basic -> Insecure registries -> 添加 Harbor 服务对应的 ip 地址,之后点击 Apply & Restart ; 93 | 94 | ``` 95 | ➜ ~ docker login 11.11.11.12 96 | Username (admin): 97 | Password: 98 | Login Succeeded 99 | ➜ ~ 100 | ``` 101 | 102 | ### 失败三 103 | 104 | ``` 105 | Error response from daemon: Get https://myregistrydomain.com/v1/users/: dial tcp myregistrydomain.com:443 getsockopt: connection refused. 106 | ``` 107 | 108 | > Harbor supports HTTP by default and Docker client tries to connect to Harbor using HTTPS first, so if you encounter an error as below when you pull or push images, you need to add '`--insecure-registry`' option to `/etc/default/docker` (ubuntu) or `/etc/sysconfig/docker` (centos) and restart Docker. 109 | 110 | > If this private registry supports only HTTP or HTTPS with an unknown CA certificate, please add 111 | `--insecure-registry myregistrydomain.com` to the daemon's start up arguments. 112 | 113 | > In the case of HTTPS, if you have access to the registry's CA certificate, simply place the CA certificate at `/etc/docker/certs.d/myregistrydomain.com/ca.crt` . 114 | 115 | 总结: 116 | 117 | - 若 private registry 只支持 HTTP ,或基于自签名证书提供 HTTPS 服务,但没有将 ca.crt 放到合适的目录中,此时通过 docker cli 访问时会触发上述错误; 118 | - 若将 ca.crt 放到了合适的目录,且在 docker client 侧配置了 `--insecure-registry` 则能访问成功; 119 | 120 | ### 成功情况分析 121 | 122 | ``` 123 | # 登录原始 registry 124 | ➜ ~ docker login prod-reg.llsops.com 125 | Username: fei.sun 126 | Password: 127 | Login Succeeded 128 | ➜ ~ 129 | ``` 130 | 131 | - `prod-reg.llsops.com` 在 DNSPod 中进行了注册; 132 | - `docker login prod-reg.llsops.com` 登录过程中,会 133 | - 基于 `prod-reg.llsops.com` 进行证书验证; 134 | - 将 `prod-reg.llsops.com` 转为 IP 地址用于登录; 135 | 136 | 137 | ## 信息补充 138 | 139 | **SSL needs identification of the peer**, otherwise your connection might be against a man-in-the-middle which decrypts + sniffs/modifies the data and then forwards them encrypted again to the real target. **Identification is done with x509 certificates which need to be validated against a trusted CA** and which need to identify the target you want to connect to. 140 | 141 | Usually the target is given as a `hostname` and this is checked against the `subject` and `subject alternative names` of the certificate. In this case your target is a **IP**. To validate the certifcate successfully, the IP must be given in the certificate inside the `subject alternative names` section, but not as an DNS entry (e.g. hostname) but instead as IP. 
142 | 143 | 参考:[这里](https://serverfault.com/questions/611120/failed-tls-handshake-does-not-contain-any-ip-sans) 144 | 145 | 146 | ---------- 147 | 148 | DNSPod 变更域名绑定信息后,大约 1min 左右就能成功更新,可以通过 `dig xxxx` 进行确认 -------------------------------------------------------------------------------- /Docker 安装.md: -------------------------------------------------------------------------------- 1 | # Docker 安装 2 | 3 | NOTE: last update 2019-10-24 4 | 5 | > 参考:[Get Docker CE for Ubuntu](https://docs.docker.com/engine/installation/linux/docker-ce/ubuntu/) 6 | 7 | ## OS requirements of Docker CE 8 | 9 | To install Docker CE, you need the 64-bit version of one of these Ubuntu versions: 10 | 11 | - Disco 19.04 12 | - Cosmic 18.10 13 | - Bionic 18.04 (LTS) 14 | - Xenial 16.04 (LTS) 15 | 16 | ## Uninstall old versions 17 | 18 | Older versions of Docker were called `docker` or `docker-engine`. If these are installed, uninstall them: 19 | 20 | ``` 21 | $ sudo apt-get remove docker docker-engine docker.io containerd runc 22 | ``` 23 | 24 | The Docker CE package is now called `docker-ce`. 25 | 26 | The contents of `/var/lib/docker/`, including images, containers, volumes, and networks, are preserved. The **Docker Engine - Community** package is now called `docker-ce`. 27 | 28 | 29 | > **If you need to use `aufs`** 30 | > 31 | > Docker CE now uses the `overlay2` storage driver by default, and it is recommended that you use it instead of `aufs`. If you need to use `aufs`, you will need to do additional preparation. 32 | > 33 | > **XENIAL 16.04 AND NEWER** 34 | > 35 | > For Ubuntu 16.04 and higher, the Linux kernel includes support for **OverlayFS**, and Docker CE will use the `overlay2` storage driver by default. If you need to use `aufs` instead, you need to configure it manually. 36 | 37 | ## Install Docker CE 38 | 39 | You can install Docker CE in different ways, depending on your needs: 40 | 41 | - Most users set up Docker’s repositories and install from them, for ease of installation and upgrade tasks. This is the recommended approach. 42 | - Some users download the DEB package and install it manually and manage upgrades completely manually. This is useful in situations such as installing Docker on air-gapped systems with no access to the internet. 43 | - In testing and development environments, some users choose to use automated convenience scripts to install Docker. 44 | 45 | > 推荐第一种 46 | 47 | Before you install Docker CE for the first time on a new host machine, you need to set up the Docker repository. Afterward, you can install and update Docker from the repository. 48 | 49 | 构建 Docker repository ; 50 | 51 | ``` 52 | # Update the apt package index 53 | $ sudo apt-get update 54 | 55 | # Install packages to allow apt to use a repository over HTTPS 56 | $ sudo apt-get install \ 57 | apt-transport-https \ 58 | ca-certificates \ 59 | curl \ 60 | gnupg-agent \ 61 | software-properties-common 62 | 63 | # Add Docker’s official GPG key 64 | $ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - 65 | 66 | # Verify that you now have the key with the fingerprint 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88, by searching for the last 8 characters of the fingerprint. 67 | $ sudo apt-key fingerprint 0EBFCD88 68 | 69 | # Use the following command to set up the stable repository. 
70 | # for amd64 only 71 | $ sudo add-apt-repository \ 72 | "deb [arch=amd64] https://download.docker.com/linux/ubuntu \ 73 | $(lsb_release -cs) \ 74 | stable" 75 | ``` 76 | 77 | 安装 Docker CE ; 78 | 79 | ``` 80 | # Update the apt package index. 81 | $ sudo apt-get update 82 | 83 | # Install the latest version of Docker CE, or go to the next step to install a specific version. Any existing installation of Docker is replaced. 84 | $ sudo apt-get install docker-ce docker-ce-cli containerd.io 85 | 86 | # On production systems, you should install a specific version of Docker CE instead of always using the latest. This output is truncated. List the available versions. 87 | $ apt-cache madison docker-ce 88 | The contents of the list depend upon which repositories are enabled. Choose a specific version to install. 89 | $ sudo apt-get install docker-ce= 90 | 91 | # Verify that Docker CE is installed correctly by running the hello-world image. 92 | $ sudo docker run hello-world 93 | ``` 94 | 95 | ## 脚本 96 | 97 | Ref: https://github.com/moooofly/scaffolding/blob/master/docker_setup.sh 98 | -------------------------------------------------------------------------------- /Dockerfile 中的 ENTRYPOINT 指令.md: -------------------------------------------------------------------------------- 1 | # Dockerfile 中的 ENTRYPOINT 指令 2 | 3 | Ref: 4 | 5 | - [dockerfile_best-practices/#entrypoint](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#entrypoint) 6 | - [reference/builder/#entrypoint](https://docs.docker.com/engine/reference/builder/#entrypoint) 7 | 8 | ## 示例脚本 9 | 10 | [Postgres Official Image](https://hub.docker.com/_/postgres/) 使用如下脚本内容作为其 ENTRYPOINT : 11 | 12 | > **Set options directly on the run line**. The `entrypoint` script is made so that any options passed to the docker command will be passed along to the postgres server daemon. 13 | 14 | 好处在于,可以在执行 `docker run` 等命令时,指定需要传入到内部具体应用的 options ; 15 | 16 | > If you need to write a starter script for a single executable, you can **ensure that the final executable receives the Unix signals** by using `exec` and `gosu` commands: 17 | 18 | 在使用了 `exec` 和 `gosu` 后,能确保目标应用程序收到 Unix signals ; 19 | 20 | ``` 21 | #!/bin/bash 22 | set -e 23 | 24 | if [ "$1" = 'postgres' ]; then 25 | chown -R postgres "$PGDATA" 26 | 27 | if [ -z "$(ls -A "$PGDATA")" ]; then 28 | gosu postgres initdb 29 | fi 30 | 31 | exec gosu postgres "$@" 32 | fi 33 | 34 | exec "$@" 35 | ``` 36 | 37 | 注意:`gosu` 后面的参数 postgres 为 `user-spec` ; 38 | 39 | ## 基础 40 | 41 | `ENTRYPOINT` 指令具有两种使用形式: 42 | 43 | - `ENTRYPOINT ["executable", "param1", "param2"]` -- `exec` 形式,推荐 44 | - `ENTRYPOINT command param1 param2` -- `shell` 形式 45 | 46 | 若存在多条 ENTRYPOINT 指令,则只有最后一条生效; 47 | 48 | ## 常规用法 49 | 50 | > The best use for `ENTRYPOINT` is to set the image’s main command, allowing that image to be run as though it was that command (and then use `CMD` as the default flags). 51 | 52 | 在 `ENTRYPOINT` 指令中设置镜像主命令;在 `CMD` 中设置默认 flags ; 53 | 54 | > This is useful because the image name can double as a reference to the binary as shown in the command above. 55 | 56 | 好处在于,可以直接镜像名字当做二进制程序命令来用; 57 | 58 | ## 高级用法 59 | 60 | > The `ENTRYPOINT` instruction can also be used in combination with a **helper script**, allowing it to function in a similar way to the command above, even when starting the tool may require more than one step. 
61 | 62 | 将 `ENTRYPOINT` 指令和 script 脚本结合使用; 63 | 64 | > The helper script is copied into the container and run via `ENTRYPOINT` on container start: 65 | 66 | 使用套路如下 67 | 68 | ``` 69 | COPY ./docker-entrypoint.sh / 70 | ENTRYPOINT ["/docker-entrypoint.sh"] 71 | CMD ["postgres"] 72 | ``` 73 | 74 | 使用了上述脚本后,用户可以通过如下几种方式和 Postgres 进行销户: 75 | 76 | - 简单启动 Postgres 77 | 78 | ``` 79 | $ docker run postgres 80 | ``` 81 | 82 | - 启动 Postgres 的同时传递一些 parameters 到内部 83 | 84 | ``` 85 | $ docker run postgres postgres --help 86 | ``` 87 | 88 | 注:`postgres --help` 将作为替代 Dockerfile 中 CMD 指令的内容使用; 89 | 90 | - 启动 Bash 等其他工具(**方便调试使用**) 91 | 92 | ``` 93 | $ docker run --rm -it postgres bash 94 | ``` 95 | 96 | ## Q&A 97 | 98 | ### [The exec builtin command](http://wiki.bash-hackers.org/commands/builtin/exec) 99 | 100 | > The `exec` builtin command is used to 101 | > 102 | > - **replace** the shell with a given program (executing it, **not as new process**) 103 | > - set redirections for the program to execute or for the current shell 104 | 105 | exec 的用途: 106 | 107 | - 使用指定程序替代当前 shell 进程(不创建新进程); 108 | - 用于重定向指定程序的 stdin/stdout/stderr ; 109 | 110 | > If only redirections are given, the redirections affect the current shell without executing any program. 111 | 112 | ### Why we want to configure app as PID 1 113 | 114 | > This script **uses the `exec` Bash command** so that the final running application **becomes the container’s PID 1**. This **allows the application to receive any Unix signals sent to the container**. 115 | 116 | - 使用 exec 执行目标程序后,目标程序将会成为容器的 PID 1 进程; 117 | - 容器的 PID 1 进程可以接收到任何发送给容器的 Unix signals ; 118 | 119 | > The `shell` form (of `ENTRYPOINT`) **prevents** any `CMD` or `run` command line arguments from being used, but has the **disadvantage** that your `ENTRYPOINT` will be started as a subcommand of `/bin/sh -c`, which **does not pass signals**. This means that the executable **will not be the container’s PID 1** - and **will not receive Unix signals** - so your executable **will not receive a SIGTERM from `docker stop `**. 120 | 121 | - shell 形式的 ENTRYPOINT 指令在运行指定程序时,会将其作为 `/bin/sh -c` 的子命令(子进程); 122 | - 因为常规 shell 进程不会负责 signals 转发,通常情况下 signals 转发是由具有 daemon 特性的程序来负责的,例如 init 等;因此,无法进行 Unix signals 的传递就没啥奇怪的了; 123 | - 理论上讲,PID 1 的特殊性在于其作为容器内外沟通的桥梁,例如针对 signals 的处理,所以,要么将业务进程直接作为 PID 1 运行,要么就需要使用一个具有 signals 处理功能的 daemon 进程来管理业务进程;很明显,常规 shell 不具备该能力; 124 | 125 | 126 | ### Understand how CMD and ENTRYPOINT interact 127 | 128 | Both `CMD` and `ENTRYPOINT` instructions define what command gets executed when running a container. There are few rules that describe their co-operation. 129 | 130 | - Dockerfile should specify at least one of `CMD` or `ENTRYPOINT` commands. 131 | - `ENTRYPOINT` should be defined when using the container as an executable. 132 | - `CMD` should be used as a way of defining default arguments for an `ENTRYPOINT` command or for executing an ad-hoc command in a container. 133 | - `CMD` will be overridden when running the container with alternative arguments. 
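As a quick illustration of the last two rules, reusing the postgres image from the earlier example (its `ENTRYPOINT` is `["/docker-entrypoint.sh"]` and its `CMD` is `["postgres"]`):

```
# CMD supplies the default argument to ENTRYPOINT ...
docker run postgres                   # runs: /docker-entrypoint.sh postgres
# ... and is replaced by whatever is passed on the command line
docker run postgres postgres --help   # runs: /docker-entrypoint.sh postgres --help
docker run --rm -it postgres bash     # runs: /docker-entrypoint.sh bash
```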
134 | 135 | The table below shows what command is executed for different `ENTRYPOINT` / `CMD` combinations: 136 | 137 | | - | No ENTRYPOINT | ENTRYPOINT exec_entry p1_entry | ENTRYPOINT [“exec_entry”, “p1_entry”] | 138 | | -- | -- | -- | -- | 139 | | No CMD | error, not allowed | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry | 140 | | CMD [“exec_cmd”, “p1_cmd”] | exec_cmd p1_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry exec_cmd p1_cmd | 141 | | CMD [“p1_cmd”, “p2_cmd”] | p1_cmd p2_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry p1_cmd p2_cmd | 142 | | CMD exec_cmd p1_cmd | /bin/sh -c exec_cmd p1_cmd | /bin/sh -c exec_entry p1_entry | exec_entry p1_entry /bin/sh -c exec_cmd p1_cmd | 143 | 144 | -------------------------------------------------------------------------------- /Git 相关.md: -------------------------------------------------------------------------------- 1 | # Git 相关 2 | 3 | ## 记不牢的 Git 操作 4 | 5 | ### develop via a single branch 6 | 7 | ``` 8 | git branch -m {{branch}} 9 | git fetch origin 10 | git rebase origin/master -i 11 | git push origin {{branch}} 12 | ``` 13 | 14 | ### create a new branch 15 | 16 | ``` 17 | git checkout -b {{branch}} 18 | checkout remote branch 19 | git checkout -b {{branch}} origin/{{branch}} 20 | ``` 21 | 22 | ### merge branch to master 23 | 24 | ``` 25 | git checkout master 26 | git merge {{branch}} 27 | ``` 28 | 29 | ### delete branch 30 | 31 | ``` 32 | git branch -D {{localBranch}} 33 | git push --delete origin {{remoteBranch}} 34 | ``` 35 | 36 | ### rename repo 37 | 38 | ``` 39 | git remote -v 40 | // View existing remotes 41 | // origin https://github.com/user/repo.git (fetch) 42 | // origin https://github.com/user/repo.git (push) 43 | 44 | git remote set-url origin https://github.com/user/repo2.git 45 | // Change the 'origin' remote's URL 46 | ``` 47 | 48 | ### add tag 49 | 50 | ``` 51 | git tag {{tag}} 52 | git push --tags 53 | ``` 54 | 55 | ### add tag for a history commit 56 | 57 | ``` 58 | // Set the HEAD to the old commit that we want to tag 59 | git checkout {{leading 7 chars of commit}} 60 | 61 | // temporarily set the date to the date of the HEAD commit, and add the tag 62 | GIT_COMMITTER_DATE="$(git show --format=%aD | head -1)" git tag -a {{tag}} -m "{{commit message}}" 63 | 64 | // set HEAD back to whatever you want it to be 65 | git checkout master 66 | 67 | git push --tags 68 | ``` 69 | 70 | ### delete tag 71 | 72 | ``` 73 | git tag --delete {{tag}} 74 | git push --delete origin {{tag}} 75 | ``` 76 | 77 | ### gh-pages 78 | 79 | ``` 80 | http://{{group}}.github.io/{{repo}}/ 81 | ``` 82 | 83 | After rename the repo, you need to push at least a commit to activate it. 
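If there is nothing new to commit after the rename, an empty commit should be enough to trigger the build — a small sketch using the same placeholder style as above:

```
git commit --allow-empty -m "trigger gh-pages build"
git push origin {{branch}}
```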
84 | 85 | ### npm add owner 86 | 87 | ``` 88 | npm owner add {{name}} 89 | ``` 90 | 91 | ### modify commit author 92 | 93 | ``` 94 | $ git config user.name 'yourname' 95 | $ git config user.email 'youremail' 96 | 97 | $ git rebase -i -p 98 | 99 | # Then mark all of your bad commits as "edit" in the rebase file 100 | 101 | # Then, repeat the two commands below 102 | $ git commit --amend --reset-author 103 | $ git rebase --continue 104 | 105 | $ git push -f 106 | ``` 107 | 108 | ### Splitting a subfolder out into a new repository 109 | 110 | https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/ 111 | 112 | ### moving files from one git repository to another preserving history 113 | 114 | http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/ 115 | 116 | > NOTE: `mv * ` won't move hidden files such as `.eslintrc`, `mv * .* ` will move hidden files and directories with dot prefix including directory `.git` 117 | 118 | 119 | 120 | 121 | ## 合作开发中的 git 使用 122 | 123 | Ref: https://github.com/fool2fish/blog/issues/16 124 | 125 | ### 做个受人欢迎的 pr 提交者 126 | 127 | ``` 128 | #1. 从源分支进行 rebase,确保基线是最新的 129 | $ git rebase ${sourceBranch} 130 | 131 | #2. 使用 eslint 进行代码风格检查 132 | $ npm run lint 133 | 134 | #3. 运行测试,切记删除 node_modules,确保测试结果和 ci 保持一致 135 | $ rm -rf node_modules 136 | $ npm install 137 | $ npm run test 138 | # 哪怕一行最简单的修改都必须运行测试确保通过 139 | 140 | #4. 合并 commits,使 commit 更有意义,也更便于 rebase 141 | $ git rebase -i HEAD~${commitCount} 142 | # 在打开的编辑器中,将除第一条外的 commit 标记从 pick 改为 squash,保存后退出 143 | # 在打开的编辑器中,更新 commit 信息,保存后退出 144 | # 对于已经 push 过的 commit,可运行 `$ git push --force` 强制 push 145 | ``` 146 | 147 | ### 利用 ci 工具把控代码提交质量 148 | 149 | ``` 150 | #1. lint 151 | #2. test 152 | #3. test-cov 153 | # 最后使用 [WIP] mr 进行人工 review 154 | ``` 155 | 156 | ### 发布一个 npm 模块 157 | 158 | ``` 159 | #0. 首先确保已经安装 git-extra 160 | 161 | #1. 修改 History 162 | $ git changelog 163 | 164 | #2. 修改 package.json 的版本号 165 | 166 | #3. 发布模块 167 | $ npm publish 168 | 169 | #4. 打 tag 170 | $ git release ${version} 171 | ``` 172 | 173 | ### cherry-pick 174 | 175 | ``` 176 | $ git checkout ${targetBranch} 177 | $ git cherry-pick ${commit} 178 | 179 | # 如有冲突,解决冲突后再 add, commit 即可 180 | ``` 181 | -------------------------------------------------------------------------------- /GitHub 和 GitLab 之间如何进行合理的迁移.md: -------------------------------------------------------------------------------- 1 | # GitHub 和 GitLab 之间如何进行合理的迁移 2 | 3 | - GitHub -> GitLab 和 GitLab -> GitHub 4 | - 为何要进行迁移 5 | - 如何保证迁移内容的完整性 6 | - 迁移过程中需要注意哪些地方 7 | 8 | 相关链接: 9 | 10 | - [How to create a project in GitLab](https://git.llsapp.com/help/gitlab-basics/create-project#push-to-create-a-new-project) 11 | - [Import your project from GitHub to GitLab](https://git.llsapp.com/help/user/project/import/github.md) 12 | -------------------------------------------------------------------------------- /Harbor HA solution proposals.md: -------------------------------------------------------------------------------- 1 | # Harbor HA solution proposals 2 | 3 | > [issues/3728](https://github.com/vmware/harbor/issues/3728) 4 | > [issues/3582](https://github.com/vmware/harbor/issues/3582) 5 | > [T43199](https://phab.llsapp.com/T43199) 6 | 7 | 8 | ---------- 9 | 10 | 11 | We will use this Page to discuss the **Harbor HA solution proposals**. 12 | 13 | Harbor will plan to release some script to help make the HA setup easily. 14 | 15 | Please feel free give out comments/suggestions/questions. 
We will adjust the proposal according to your feedback. So your voice are valuable to Harbor. 16 | 17 | ## Harbor HA solutions proposals 18 | 19 | This document will cover the follow **four** solutions which provide different severity of Harbor HA. each solution will have it's cons and pros. We need to balance them and choose the one which is most fit the use scenario. 20 | 21 | ### Solution 1: Active-Active with scale out ability 22 | 23 | ![](https://user-images.githubusercontent.com/1715683/32586942-a92aa8da-c540-11e7-9af7-db03fc467451.png) 24 | 25 | As seen in the figure, components involved in the architecture are: 26 | 27 | - **VIP**: Virtual IP. The Harbor user will access Harbor through this VIP address. This VIP will only active on one load balancer node at the same time. It will automatically switch to the other node if the active loadbalancer node is down. 28 | 29 | - **LoadBalancer01 and 02**: They together compose as a group which avoid single point failure of load balancer nodes. `Keepalived` is installed on both load balancer nodes. The two `Keepalived` instance will form a **VRRP** group to provide the VIP and ensure the VIP only shows on one node at the same time. The `LVS` component in `Keepalived` is responsible for router the requests between different Harbor servers according to the routing algorithm. 30 | 31 | - **Harbor server 1..n**: These are the running Harbor instances. They are in active-active mode. User can setup multiple nodes according to their workload. 32 | 33 | - **DB cluster**: The MariaDB is used by Harbor to store user authentication information, image metadata information and so on. **User should follow its best practice to make it HA protect**. 34 | 35 | - **Shared Storage**: The shared storage is used for storing Docker Volumes used by Harbor. Images pushed by users are actually stored in this shared storage. The shared storage makes sure that multiple Harbor instances have consistent storage backend. Shared Storages can be Swift, NFS, S3, azure, GCS or OSS. **User should follow its best practice to make it HA protect**. 36 | 37 | - **Redis**: The purpose of having Redis is to store UI session data and store the registry metadata cache. When one Harbor instance fails or the load balancer routes a user request to another Harbor instance, any Harbor instance can query the Redis to retrieve session information to make sure an end-user has a continued session. **User should follow the best practice of Redis to make it HA protect**. 38 | 39 | > 由上面可以看出:DB cluster 和 Shared Storage 和 Redis 的 HA 都需要用户自己搞定; 40 | 41 | 42 | #### Limitation 43 | 44 | Currently it doesn’t support Clair and Notary. 45 | 46 | #### Setup Prerequisites 47 | 48 | - MariaDB cluster 49 | - Shared Storage 50 | - Redis cluster 51 | 52 | > Item 1,2,3 are considered external components to Harbor. Before configuring Harbor HA, we assume these components are present and all of them are HA protected. Otherwise, any of these components can be a single point of failure. 53 | 54 | - 2 VMs for Load balancer 55 | - n VMs for Harbor instances (n >=2) 56 | - n+1 static IPs (1 for VIP and the other n IPs will be used by harbor servers) 57 | 58 | 59 | ### Solution 2 Classical Active-Standby solution 60 | 61 | ![](https://user-images.githubusercontent.com/1715683/32590156-d88e6568-c553-11e7-9188-07eebbe207c7.png) 62 | 63 | This solution is ideal for situations when the workload is not very high, this is the classical two node HA solution. 
The difference compared with solution 1 is that it doesn’t need the two extra load balancer nodes and only needs 1 static IP address. This solution doesn’t support scale out. 64 | 65 | - **Keepalived**: Keepalived will be installed on both VM1 and VM2, and the two `Keepalived` instances will form a **VRRP** group to provide the VIP. 66 | - **VIP**: Users will access the harbor cluster by the VIP. Only the server that holds the VIP will provide the service. 67 | - **Harbor instance1,2**: The harbor instances will share VM1,2 with Keepalived. 68 | - **DB Cluster**: Same as the DB cluster in solution 1 69 | - **Shared storage**: Same as the shared storage in solution 1 70 | - **Redis cluster**: Same as the Redis cluster in solution 1 71 | 72 | #### Setup Prerequisites 73 | 74 | - Standalone MariaDB (cluster) is needed 75 | - Shared storage for registry is needed 76 | - Redis (cluster) for Harbor session and registry metadata cache is needed. 77 | - 2 Ubuntu16.04 VMs 78 | - 1 Static IP address (used as VIP) 79 | 80 | ### Solution 3 81 | 82 | Since the above two solutions both require shared storage, for scenarios that don’t have shared storage but still have some kind of HA requirement, we can use image replication. 83 | 84 | ![](https://user-images.githubusercontent.com/1715683/32590229-485dd112-c554-11e7-80ca-59308f63885d.png) 85 | 86 | Set up one harbor as the **master** Harbor instance. All the images will be pushed to this Harbor instance. In the meanwhile, there is a group of **read-only** harbors, which serve the pull requests from Docker hosts. 87 | 88 | This solution is easy to implement and can meet the requirement of load balancing and scale out. 89 | 90 | 91 | ### Solution 4 92 | 93 | This is another solution that uses replication to achieve low-level “HA”. Same as solution 3, there is no need for shared storage, an external database cluster or a Redis cluster. This solution can keep the images protected against single node failure. 94 | 95 | ![](https://user-images.githubusercontent.com/1715683/32590250-6f937f02-c554-11e7-849a-fa0d03ddd606.png) 96 | 97 | As the figure shows, it only needs to set up a full replication between the two harbor nodes. This use case is more suitable for geographically distributed teams.
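
补充:方案 1 和方案 2 都依赖 Keepalived 通过 VRRP 提供 VIP,VIP 是否漂移取决于健康检查的结果。下面是一个极简的健康检查脚本草稿(仅为示意:端点、超时等均为假设,并非 Harbor 官方提供),Keepalived 可在 `vrrp_script` 中调用它、再由 `vrrp_instance` 的 `track_script` 引用,从而在本机 Harbor 不可用时把 VIP 让给另一台节点:

```
#!/bin/bash
# check_harbor.sh —— 供 Keepalived 的 vrrp_script 调用的健康检查示例(假设脚本)
# 思路:探测本机 Harbor 的 registry /v2/ 接口;进程存活但未认证时通常返回 401,
#       因此 200/401 都视为健康,其他情况(超时、5xx 等)返回非 0,触发 VIP 漂移。

HARBOR_ENDPOINT="https://127.0.0.1/v2/"   # 假设 Harbor 在本机 443 端口提供服务

code=$(curl -k -s -o /dev/null -m 5 -w '%{http_code}' "$HARBOR_ENDPOINT")

if [ "$code" = "200" ] || [ "$code" = "401" ]; then
    exit 0
else
    echo "harbor unhealthy, http code: $code" >&2
    exit 1
fi
```

检查间隔、降权值(weight)等在 keepalived.conf 里按需配置即可,这里不展开。
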
98 | 99 | 100 | ## 结论 101 | 102 | ### Classical Active-Active 103 | 104 | - 适用于任何场景; 105 | - 但要求的资源最多; 106 | - 将 Harbor 拆分为无状态部分和有状态部分(数据相关) 107 | 108 | 109 | ### Classical Active-Standby 110 | 111 | - 适用于工作负载不是很高的场景; 112 | - 将 Harbor 拆分为无状态部分和有状态部分(数据相关) 113 | - 不支持横向扩展,但节省了 LB 需要的机器和 static ip 等东东; 114 | - 依赖 Keepalived (VIP) ; 115 | 116 | 117 | ### Master-Slaves Replication 118 | 119 | - 适用于没有共享存储的场景; 120 | - 基于主从单向复制实现 HA ; 121 | - 一主多从,只有 master 接受 push 动作,所有 slave 只提供 read-only 的 pull 操作; 122 | - 能够根据自定义配置决定镜像的分布情况,在一定程度上实现了负载均衡和扩展能力;但配置规则会有一定的维护成本; 123 | 124 | ### Master-Master Replication 125 | 126 | - 适用于没有共享存储、没有外部 DB cluster 的场景; 127 | - 适用于地理上分布在不同地点团队需要相互共享镜像的场景; 128 | - 基于主主双向复制实现 HA ; 129 | - 每个 Harbor 上都具有全量数据; -------------------------------------------------------------------------------- /Harbor 信息汇总.md: -------------------------------------------------------------------------------- 1 | # Harbor 信息汇总 2 | 3 | ## 文档汇总 4 | 5 | - [Harbor 升级 v0.5.0 到 v1.2.2](https://github.com/moooofly/MarkSomethingDownLLS/blob/master/Harbor%20%E5%8D%87%E7%BA%A7%20v0.5.0%20%E5%88%B0%20v1.2.2.md) 6 | - [Harbor 升级和数据库迁移向导 - release-1.2.0 分支](https://github.com/moooofly/MarkSomethingDownLLS/blob/master/Harbor%20%E5%8D%87%E7%BA%A7%E5%92%8C%E6%95%B0%E6%8D%AE%E5%BA%93%E8%BF%81%E7%A7%BB%E5%90%91%E5%AF%BC%20-%20release-1.2.0%20%E5%88%86%E6%94%AF.md) 7 | - [Harbor 升级和数据库迁移向导 - release-1.5.0 分支](https://github.com/moooofly/MarkSomethingDownLLS/blob/master/Harbor%20%E5%8D%87%E7%BA%A7%E5%92%8C%E6%95%B0%E6%8D%AE%E5%BA%93%E8%BF%81%E7%A7%BB%E5%90%91%E5%AF%BC%20-%20release-1.5.0%20%E5%88%86%E6%94%AF.md) 8 | - [Harbor 升级和数据库迁移向导 - release-1.6.0 分支](https://github.com/moooofly/MarkSomethingDownLLS/blob/master/Harbor%20%E5%8D%87%E7%BA%A7%E5%92%8C%E6%95%B0%E6%8D%AE%E5%BA%93%E8%BF%81%E7%A7%BB%E5%90%91%E5%AF%BC%20-%20release-1.6.0%20%E5%88%86%E6%94%AF.md) 9 | - [Harbor 升级和数据库迁移向导 - master 分支](https://github.com/moooofly/MarkSomethingDownLLS/blob/master/Harbor%20%E5%8D%87%E7%BA%A7%E5%92%8C%E6%95%B0%E6%8D%AE%E5%BA%93%E8%BF%81%E7%A7%BB%E5%90%91%E5%AF%BC%20-%20master%20%E5%88%86%E6%94%AF.md) 10 | - [Harbor 升级问题汇总](https://github.com/moooofly/MarkSomethingDownLLS/blob/master/Harbor%20%E5%8D%87%E7%BA%A7%E9%97%AE%E9%A2%98%E6%B1%87%E6%80%BB.md) 11 | 12 | 13 | ## 数据库变更 14 | 15 | - Harbor DB 引擎 `MySQL` 在 harbor 1.3 中被替换成了 `MariaDB` ; 16 | - 从 v1.6.0 开始,Harbor 将自身使用的 DB 从 `MariaDB` 迁移成 `Postgresql` ; 17 | 18 | ## migrator tools 变更 19 | 20 | - https://hub.docker.com/r/vmware/harbor-db-migrator/tags/ -- 基本已废弃,除非有人真的使用了老版本,因为这个还是 harbor 项目未进入 CNCF 前维护的内容;涉及到的 tags 有 21 | - 0.4.5 22 | - 1.2 23 | - 1.3 24 | - 1.4 25 | - https://hub.docker.com/r/vmware/harbor-migrator/tags -- 过渡用? 
26 | - v1.5.0 27 | - https://hub.docker.com/r/goharbor/harbor-db-migrator/tags/ -- 这个应该是 harbor 加入 CNCF 后将之前的 migrator tools 迁移过来的内容,理论上有了这个后,上面那个就没有用了 28 | - 1.2 29 | - 1.3 30 | - 1.4 31 | - https://hub.docker.com/r/goharbor/harbor-migrator/tags/ -- 这里提供了更新版本的工具 32 | - v1.5.0 33 | - v1.6.0 34 | - v1.6.1 35 | - v1.6.3 36 | 37 | ## migrator tools 的使用 38 | 39 | [从 v1.1.1 升级到 v1.5.0](https://github.com/goharbor/harbor/issues/5745) 40 | 41 | - 先使用 `vmware/harbor-db-migrator:1.2` 从 v1.1.1 升级到 v1.2.0 42 | - 再使用 `vmware/harbor-migrator:v1.5.0` 从 v1.2.0 升级到 v1.5.0 43 | 44 | [从 1.2.0 升级到高版本](https://github.com/goharbor/harbor/issues/5232) 45 | 46 | - 首先使用 `vmware/harbor-db-migrator:1.2` 备份数据; 47 | - 再根据目标版本的 tag 选择相应的 migrator 版本进行升级; 48 | 49 | [从 1.2.2 升级到 1.3.0](https://github.com/goharbor/harbor/issues/3949) 50 | 51 | - 使用 **migrator v1.2** 备份数据; 52 | - 删除 `/data/database` 53 | - Start and stop harbor-db v1.3.0 to initiate an empty `mariadb` instance 54 | - 使用 **migrator v1.3** 恢复之前的数据库备份; 55 | - 使用 **migrator v1.3** 进行数据库升级; 56 | 57 | [从 1.2.2 升级到 1.6.0](https://github.com/goharbor/harbor/issues/6139) 58 | 59 | - 使用 `goharbor/harbor-db-migrator:1.2` 备份数据; 60 | - 使用 `goharbor/harbor-migrator:v1.6.0` 升级; 61 | 62 | [从 1.5.1 升级到 1.6.0](https://github.com/goharbor/harbor/issues/6004) 63 | 64 | - 使用 `goharbor/harbor-migrator:v1.6.0` 升级; 65 | -------------------------------------------------------------------------------- /Harbor 升级 v1.2.2 到 v1.6.3.md: -------------------------------------------------------------------------------- 1 | # Harbor 升级 v1.2.2 到 v1.6.3 2 | 3 | ## 备份 4 | 5 | ``` 6 | # 方便回滚时切换 7 | /opt/apps/harbor 8 | 9 | # 完整的数据目录(默认位置),备份时可以将其中的 job_logs/ 下的内容删除 10 | /data 11 | 12 | # 在执行 backup 命令时生成,在通过 restore 命令进行回滚时使用 13 | /path/to/backup/registry.sql 14 | ``` 15 | 16 | ## 升级 17 | 18 | ``` 19 | # Log in to the host that Harbor runs on, stop and remove existing Harbor instance if it is still running 20 | cd /opt/apps/harbor 21 | docker-compose down 22 | 23 | # Back up Harbor's current files so that you can roll back to the current version when it is necessary. 24 | cd .. 25 | cp -rf harbor /my_backup_dir/harbor 26 | 27 | # Get the lastest Harbor release package from https://github.com/goharbor/harbor/releases 28 | docker pull goharbor/harbor-migrator:v1.6.3 29 | docker pull goharbor/harbor-db-migrator:1.2 30 | 31 | # Back up database and harbor.cfg to a directory such as /path/to/backup. 32 | docker run -it --rm -e DB_USR=root -e DB_PWD='xxxx' -v /data/database:/var/lib/mysql -v /opt/apps/harbor/harbor.cfg:/harbor-migration/harbor-cfg/harbor.cfg -v /path/to/backup:/harbor-migration/backup goharbor/harbor-migrator:v1.6.3 backup 33 | 34 | # (最后使用的是这个) 35 | docker run -it --rm -e DB_USR=root -e DB_PWD='xxxx' -v /data/database:/var/lib/mysql -v /opt/apps/harbor/harbor.cfg:/harbor-migration/harbor-cfg/harbor.cfg -v /path/to/backup:/harbor-migration/backup goharbor/harbor-db-migrator:1.2 backup 36 | 37 | # Upgrade database schema, harbor.cfg and migrate data. 38 | 39 | # The following command handles the upgrade for Harbor DB and CFG, not include Notary and Clair DB. 40 | # (最后使用的是这个) 41 | docker run -it --rm -e DB_USR=root -e DB_PWD='xxxx' -v /data/database:/var/lib/mysql -v /opt/apps/harbor/harbor.cfg:/harbor-migration/harbor-cfg/harbor.cfg goharbor/harbor-migrator:v1.6.3 up 42 | 43 | # You must run migration of Notary and Clair's DB before launch Harbor. 
If you want to upgrade Notary and Clair DB, refer to the following commands 44 | # 由于我之前没有使用 notary 和 clair ,故可以直接跳过这里 45 | docker run -it --rm -e DB_USR=root -v /data/notary-db/:/var/lib/mysql -v /data/database:/var/lib/postgresql/data goharbor/harbor-migrator:${tag} --db up 46 | 47 | docker run -it --rm -v /data/clair-db/:/clair-db -v /data/database:/var/lib/postgresql/data goharbor/harbor-migrator:${tag} --db up 48 | 49 | # 解压 50 | tar zxvf harbor-offline-installer-v1.6.3.tgz 51 | cd harbor 52 | 53 | # 调整配置 54 | vi harbor.cfg 55 | #(主要调整存储使用 s3) 56 | vi common/templates/registry/config.yml 57 | #(主要是 timeout 配置) 58 | vi common/templates/nginx/nginx.http.conf 59 | vi common/templates/nginx/nginx.https.conf 60 | 61 | # Under the directory ./harbor, run the ./install.sh script to install the new Harbor instance. If you choose to install Harbor with components like Notary and/or Clair, refer to Installation & Configuration Guide for more information. 62 | # https://github.com/goharbor/harbor/blob/release-1.6.0/docs/installation_guide.md 63 | ./install.sh --with-notary --with-clair --with-chartmuseum 64 | ``` 65 | 66 | 测试 mysql 访问是否正常: 67 | 68 | ``` 69 | docker run -it --rm -e DB_USR=root -e DB_PWD='xxxx' -v /data/database:/var/lib/mysql -v /opt/apps/harbor/harbor.cfg:/harbor-migration/harbor-cfg/harbor.cfg goharbor/harbor-migrator:v1.6.3 test 70 | ``` 71 | 72 | 可能会看到的错误信息 73 | 74 | ``` 75 | /usr/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: . 76 | ``` 77 | 78 | > will be occurred during upgrading **from harbor <= v1.5.0 to harbor v1.6.0**, just ignore them if harbor can start successfully. 79 | -------------------------------------------------------------------------------- /Harbor 升级和数据库迁移向导 - master 分支.md: -------------------------------------------------------------------------------- 1 | # Harbor 升级和数据库迁移向导 - master 分支 2 | 3 | > ref: https://github.com/goharbor/harbor/blob/c9d51f2a7534a4d63f35e865cc1510dddbd91468/docs/migration_guide.md 4 | 5 | 该文档适用于从 v1.6.0 迁移到更高的版本(当前为 v1.7.0-rc2) 6 | 7 | > When upgrading your existing Harbor instance to a newer version, you may need to **migrate the data in your database and the settings in `harbor.cfg`**. Since the migration may alter the database schema and the settings of `harbor.cfg`, you should always back up your data before any migration. 8 | 9 | - 升级已存在的 harbor 实例到更新的版本时,需要迁移数据库中的数据和 `harbor.cfg` 的配置;因为迁移过程会导致数据库 schema 以及 `harbor.cfg` 中的配置发生变更; 10 | - 进行迁移前一定要进行数据库的备份; 11 | 12 | > NOTE: 13 | > 14 | > - Again, you must **back up your data** before any data migration. 15 | > - This guide only covers the **migration from v1.6.0 to current version**, if you are upgrading from earlier versions please refer to the migration guide in release branch to upgrade to v1.6.0 and follow this guide to do the migration to later version. 16 | > - From v1.6.0 on, Harbor will automatically try to do the migrate the DB schema when it starts, so **if you are upgrading from v1.6.0 or above it's not necessary to call the migrator tool to migrate the schema**. 17 | > - From v1.6.0 on, **Harbor migrates DB from `MariaDB` to `PostgreSQL`**, and combines Harbor, Notary and Clair DB into one. 18 | > - For the change in Database schema please refer to change log. 
19 | 20 | 注意: 21 | 22 | - 数据迁移前一定要进行备份; 23 | - 该向导仅覆盖了从 v1.6.0 迁移到最新版本的步骤;如果您打算基于更早的版本进行迁移,则需要参考对应 release 分支中的迁移向导,先完成向 v1.6.0 版本的迁移;之后在基于当前向导迁移到更新的版本; 24 | - 从 v1.6.0 版本开始,Harbor 将在启动后自动进行 DB schema 迁移的尝试,因此,如果你打算你从 v1.6.0 或更高的版本进行迁移,则没有必要再调用迁移工具进行 schema 迁移了; 25 | - 从 v1.6.0 开始,Harbor 将 DB 从 `MariaDB` 变更为 `PostgreSQL` ,并且将 `Harbor`, `Notary` 和 `Clair` 的 DB 合并成了一个(即均使用 PostgreSQL); 26 | - 具体的数据库 schema 变更详见 [change log](https://github.com/goharbor/harbor/blob/c9d51f2a7534a4d63f35e865cc1510dddbd91468/tools/migration/db/changelog.md) ; 27 | 28 | ## Upgrading Harbor and migrating data 29 | 30 | - 登录 + 停止运行中的 Harbor 31 | 32 | ``` 33 | cd harbor 34 | docker-compose down 35 | ``` 36 | 37 | - 备份 Harbor 相关文件以便必要时回滚 38 | 39 | ``` 40 | mv harbor /my_backup_dir/harbor 41 | ``` 42 | 43 | - 备份数据库(默认位于 `/data/database`) 44 | 45 | ``` 46 | cp -r /data/database /my_backup_dir/ 47 | ``` 48 | 49 | - 获取最新 Harbor 发布包:https://github.com/goharbor/harbor/releases 50 | 51 | - 进行 Harbor 升级前需要先完成数据库迁移操作;迁移工具通过 docker image 提供,需要从 docker hub 上进行下载;在使用如下命令时,替换 `[tag]` 为指定的 release version(例如 v1.5.0): 52 | 53 | ``` 54 | docker pull goharbor/harbor-db-migrator:[tag] 55 | ``` 56 | 57 | 这里的说明存在问题:首先 v1.5.0 是不存在的,详见 https://hub.docker.com/r/goharbor/harbor-db-migrator/tags/ ;其次,应该是 `goharbor/harbor-db-migrator` 而不是 `goharbor/harbor-migrator` ,上面已调整; 58 | 59 | - 升级 `harbor.cfg` 文件;注意:`${harbor_cfg}` 会被覆盖,因此你必须将先备份,之后再拷贝回安装目录; 60 | 61 | ``` 62 | docker run -it --rm -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg goharbor/harbor-db-migrator:[tag] --cfg up 63 | ``` 64 | 65 | 注意:schema upgrade 和数据库的数据迁移均在 Harbor 启动时由 adminserver 完成,如果迁移发生了失败,请查看 adminserver 的日志进行 debug ; 66 | 67 | - 进入 `./harbor` 目录,运行 `./install.sh` 脚本以安装新的 Harbor 实例;如果你打算安装具有 `Notary`, `Clair` 和 `chartmuseum` 等组件的 Harbor ,请参考 [Installation & Configuration Guide](https://github.com/goharbor/harbor/blob/c9d51f2a7534a4d63f35e865cc1510dddbd91468/docs/installation_guide.md) 文档; 68 | 69 | ## Roll back from an upgrade 70 | 71 | > For any reason, if you want to roll back to the previous version of Harbor, follow the below steps: 72 | > 73 | > **NOTE**: **Roll back doesn't support upgrade across v1.5.0**, like from v1.2.0 to v1.7.0. This is because Harbor changes DB to PostgreSQL from v1.7.0, the migrator cannot roll back data to MariaDB. 74 | 75 | 注意:回滚操作不支持跨 v1.5.0 版本的情况(例如从 v1.2.0 到 v1.7.0);因为 Harbor 从 v1.7.0 开始(实际是从 v1.6.0 开始)将数据库变成了 `PostgreSQL` ;而迁移工具无法将数据重新回滚为 `MariaDB` 格式; 76 | 77 | > - 停止并移除当前正在运行的 Harbor 服务 78 | 79 | ``` 80 | cd harbor 81 | docker-compose down 82 | ``` 83 | 84 | > - 删除当前 Harbor instance 目录 85 | 86 | ``` 87 | rm -rf harbor 88 | ``` 89 | 90 | > - 恢复打算回滚的目标 Harbor 旧版本 91 | 92 | ``` 93 | mv /my_backup_dir/harbor harbor 94 | ``` 95 | 96 | > - 恢复数据库,从备份目录中拷贝数据文件到你的数据卷中,默认为 `/data/database` ; 97 | > - 使用之前(备份)的配置重启 Harbor 服务;如果之前的 Harbor 是基于 release build 安装的,则可以执行 98 | 99 | ``` 100 | cd harbor 101 | ./install.sh 102 | ``` -------------------------------------------------------------------------------- /Harbor 升级和数据库迁移向导 - release-1.2.0 分支.md: -------------------------------------------------------------------------------- 1 | # Harbor 升级和数据库迁移向导 - release-1.2.0 分支 2 | 3 | > ref: https://github.com/goharbor/harbor/blob/release-1.2.0/docs/migration_guide.md 4 | 5 | 本文仅摘取关键差异部分; 6 | 7 | 8 | ---------- 9 | 10 | > NOTE: 11 | > 12 | > - **From v1.2 on**, you need to use the release version as the tag of the migrator image. 'latest' is no longer used for new release. 13 | > - You must back up your data before any data migration. 
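
结合本仓库《Harbor 升级 v1.2.2 到 v1.6.3》中列出的备份清单,"备份"至少应包含安装目录和数据目录两部分,大致操作如下(路径均为示例假设,请按实际安装/数据目录调整):

```
# 先停掉 Harbor,保证备份到的数据是一致的
cd /opt/apps/harbor && docker-compose down

# 1. 备份安装目录(harbor.cfg、docker-compose.yml、common/ 等),方便回滚时直接切回
cp -rf /opt/apps/harbor /my_backup_dir/harbor

# 2. 备份数据目录(默认 /data,其中 job_logs/ 可以不备份)
cp -r /data /my_backup_dir/data
```

数据库的 SQL 备份(registry.sql)则由 migrator 镜像的 backup 子命令生成,参见本仓库其他几篇升级笔记。
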
14 | 15 | ## Upgrading Harbor and migrating data 16 | 17 | > The migration tool is delivered as a docker image, so you should pull the image from docker hub. Replace `[tag]` with the release version of Harbor (e.g. 1.2) in the below command: 18 | 19 | ``` 20 | docker pull vmware/harbor-db-migrator:[tag] 21 | ``` 22 | 23 | > - Configure Harbor by modifying the file `harbor.cfg`, you may need to refer to the configuration files you've backed up during step 2. Refer to [Installation & Configuration Guide](https://github.com/goharbor/harbor/blob/release-1.2.0/docs/installation_guide.md) for more information. Since the content and format of `harbor.cfg` may have been changed in the new release, **DO NOT directly copy `harbor.cfg` from previous version of Harbor**. 24 | 25 | > **IMPORTANT**: If you are upgrading a Harbor instance with LDAP/AD authentication, you must make sure `auth_mode` is set to `ldap_auth` in `harbor.cfg` before launching the new version of Harbor. Otherwise, users may not be able to log in after the upgrade. 26 | 27 | > - To assist you in **migrating the `harbor.cfg` file from v0.5.0 to v1.1.x**, a script is provided and described as below. For other versions of Harbor, you need to manually migrate the file `harbor.cfg`. 28 | 29 | ``` 30 | cd harbor 31 | ./upgrade --source-loc source_harbor_cfg_loc --source-version 0.5.0 --target-loc target_harbor_cfg_loc --target-version 1.1.x 32 | ``` 33 | 34 | > **NOTE**: After running the script, make sure you go through `harbor.cfg` to verify all the settings are correct. You can make changes to `harbor.cfg` as needed. 35 | 36 | ## Migration tool reference 37 | 38 | > Use `help` command to show instructions of the migration tool: 39 | 40 | ``` 41 | docker run --rm -e DB_USR=root -e DB_PWD=xxxx vmware/harbor-db-migrator:[tag] help 42 | ``` 43 | -------------------------------------------------------------------------------- /Harbor 升级和数据库迁移向导 - release-1.5.0 分支.md: -------------------------------------------------------------------------------- 1 | # Harbor 升级和数据库迁移向导 - release-1.5.0 分支 2 | 3 | > ref: https://github.com/goharbor/harbor/blob/release-1.5.0/docs/migration_guide.md 4 | 5 | 本文仅摘取关键差异部分; 6 | 7 | 8 | ---------- 9 | 10 | 11 | > If your install Harbor for the first time, or the database version is the same as that of the lastest version, you do not need any database migration. 12 | 13 | 首次安装 Harbor 或数据库版本和最新版本相同,则无需进行数据库迁移; 14 | 15 | > NOTE: 16 | > 17 | > - From v1.5.0 on, the migration tool add support for the `harbor.cfg` migration, which supports upgrade from v1.2.x, v1.3.x and v1.4.x. 18 | > - From v1.2 on, you need to use the release version as the tag of the migrator image. 'latest' is no longer used for new release. 19 | > - You must back up your data before any data migration. 20 | > - To migrate harbor OVA, please refer [migrate OVA guide](https://github.com/goharbor/harbor/blob/release-1.5.0/docs/migrate_ova_guide.md) 21 | 22 | 注意: 23 | 24 | - 从 v1.5.0 开始,迁移工具增加了针对 `harbor.cfg` 迁移的支持;支持以 v1.2.x, v1.3.x 和 v1.4.x 为基础版本的迁移; 25 | - 从 v1.2 开始,你需要使用 release version 作为 migrator 镜像的 tag ;'latest' 这个 tag 已经不再用作新 release 来使用; 26 | - 在任何数据迁移前,你必须备份数据; 27 | 28 | 29 | ## Upgrading Harbor and migrating data 30 | 31 | - Before upgrading Harbor, perform migration first. The migration tool is delivered as a docker image, so you should pull the image from docker hub. Replace [tag] with the release version of Harbor (e.g. 
v1.5.0) in the below command: 32 | 33 | ``` 34 | docker pull vmware/harbor-db-migrator:[tag] 35 | ``` 36 | 37 | 这里的说明存在问题:首先 v1.5.0 是不存在的,详见 https://hub.docker.com/r/vmware/harbor-db-migrator/tags/ ;其次,应该是 `vmware/harbor-db-migrator` 而不是 `vmware/harbor-migrator` ,上面已调整; 38 | 39 | 40 | - **Back up** database/`harbor.cfg` to a directory such as `/path/to/backup`. 41 | 42 | > NOTE: Upgrade from harbor 1.2 or older to harbor 1.3 must use `vmware/harbor-db-migrator:1.2`. Because DB engine replaced by `MariaDB` in harbor 1.3 43 | 44 | 注意:从 harbor 1.2 或之前的版本升级到 harbor 1.3 必须使用 `vmware/harbor-db-migrator:1.2` ,因为 DB 引擎在 harbor 1.3 中被替换成了 `MariaDB` ; 45 | 46 | ``` 47 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg -v ${backup_path}:/harbor-migration/backup vmware/harbor-db-migrator:[tag] backup 48 | ``` 49 | 50 | > NOTE: By default, the migrator handles the backup for DB and CFG. If you want to backup DB or CFG only, refer to the following commands: 51 | 52 | ``` 53 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql -v ${backup_path}:/harbor-migration/backup vmware/harbor-db-migrator:[tag] --db backup 54 | 55 | docker run -it --rm -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg -v ${backup_path}:/harbor-migration/backup vmware/harbor-db-migrator:[tag] --cfg backup 56 | ``` 57 | 58 | 默认情况下,migrator 会同时备份 DB 和 CFG ;可以通过上述两个命令进行单独备份; 59 | 60 | 61 | - **Upgrade** database schema, `harbor.cfg` and migrate data. 62 | 63 | ``` 64 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg vmware/harbor-db-migrator:[tag] up 65 | ``` 66 | 67 | > NOTE: By default, the migrator handles the upgrade for DB and CFG. If you want to upgrade DB or CFG only, refer to the following commands: 68 | 69 | ``` 70 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql vmware/harbor-db-migrator:[tag] --db up 71 | 72 | docker run -it --rm -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg vmware/harbor-db-migrator:[tag] --cfg up 73 | ``` 74 | 75 | > NOTE: Some errors like 76 | 77 | ``` 78 | [ERROR] Missing system table mysql.roles_mapping; please run mysql_upgrade to create it 79 | [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position ... ... 80 | [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler. 81 | [Warning] Failed to load slave replication state from table mysql.gtid_slave_pos: 1146: Table 'mysql.gtid_slave_pos' doesn't exist 82 | ``` 83 | 84 | > will be occurred during upgrading **from harbor 1.2 to harbor 1.3**, just ignore them if harbor can start successfully. 85 | 86 | 可以忽略的错误信息; 87 | 88 | ### Roll back from an upgrade 89 | 90 | > NOTE: **Rollback from harbor 1.3 to harbor 1.2** should delete `/data/database` directory first, then create new database directory by `docker-compose up -d && docker-compose stop`. And must use `vmware/harbor-db-migrator:1.2` to restore. Because of DB engine change. 
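
按照上面 NOTE 的要求,从 harbor 1.3 回滚到 1.2 的操作序列大致可以整理成下面的草稿(命令顺序取自上文;`restore` 子命令及挂载路径参考本仓库其他笔记中 backup 的用法,属于示例假设,执行前请先核对):

```
# 停掉当前的 1.3 实例
cd harbor && docker-compose down

# DB 引擎发生过变更(MySQL -> MariaDB),必须先删掉现有数据库目录
rm -rf /data/database

# 用旧版(1.2)的 compose 文件拉起再停掉,初始化出一个空的数据库目录
docker-compose up -d && docker-compose stop

# 用 1.2 版本的 migrator 把之前备份的数据恢复回去
docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} \
    -v /data/database:/var/lib/mysql \
    -v ${backup_path}:/harbor-migration/backup \
    vmware/harbor-db-migrator:1.2 restore
```
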
91 | 92 | 从 harbor 1.3 回滚到 harbor 1.2 要求:先删除 `/data/database` 目录,之后通过 `docker-compose up -d && docker-compose stop` 创建新的数据库目录;同时必须使用 `vmware/harbor-db-migrator:1.2` 进行恢复; 93 | 94 | 95 | > Use `test` command to test mysql connection: 96 | 97 | ``` 98 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg vmware/harbor-db-migrator:[tag] test 99 | ``` 100 | 101 | 可以使用 `test` 命令测试 mysql 连接; 102 | -------------------------------------------------------------------------------- /Harbor 升级和数据库迁移向导 - release-1.6.0 分支.md: -------------------------------------------------------------------------------- 1 | # Harbor 升级和数据库迁移向导 - release-1.6.0 分支 2 | 3 | > ref: https://github.com/goharbor/harbor/blob/release-1.6.0/docs/migration_guide.md 4 | 5 | 本文仅摘取关键差异部分; 6 | 7 | ---------- 8 | 9 | > NOTE: 10 | > 11 | > - Please use `goharbor/harbor-migrator:1.6.3` instead if you're performing migration, as the v1.6.3 includes a fix for issue https://github.com/goharbor/harbor/issues/6465. 12 | > - **From v1.6.0 on, Harbor migrates DB from `MariaDB` to `Postgresql`**, and combines Harbor, Notary and Clair DB into one. 13 | > - **From v1.5.0 on**, the migration tool add support for the `harbor.cfg` migration, which supports upgrade from v1.2.x, v1.3.x and v1.4.x. 14 | > - **From v1.2 on**, you need to use the release version as the tag of the migrator image. 'latest' is no longer used for new release. 15 | > - You must back up your data before any data migration. 16 | > - To migrate harbor OVA, please refer [migrate OVA guide](https://github.com/goharbor/harbor/blob/release-1.6.0/docs/migrate_ova_guide.md). 17 | 18 | 注意: 19 | 20 | - 数据库迁移工具需要使用 [`goharbor/harbor-migrator:1.6.3`](https://hub.docker.com/r/goharbor/harbor-migrator/tags/) ; 21 | - 从 v1.6.0 开始,Harbor 将自身使用的 DB 从 `MariaDB` 迁移成 `Postgresql` ; 22 | 23 | ## Upgrading Harbor and migrating data 24 | 25 | > NOTE: **[Before harbor 1.5](https://hub.docker.com/r/goharbor/harbor-db-migrator/tags/)** , image name of the migration tool is `goharbor/harbor-db-migrator:[tag]` 26 | 27 | ``` 28 | docker pull goharbor/harbor-migrator:[tag] 29 | ``` 30 | 31 | > NOTE: Upgrade from harbor 1.2 or older to harbor 1.3 must use `goharbor/harbor-db-migrator:1.2`. Because DB engine replaced by MariaDB in harbor 1.3 32 | 33 | > NOTE: **In v1.6.0, you needs to DO three sequential steps to fully migrate Harbor, `Notary` and `Clair`'s DB**. The migration of `Notary` and `Clair`'s DB depends on Harbor's DB, you need to first upgrade Harbor's DB, then upgrade `Notary` and `Clair`'s DB. The following command handles the upgrade for Harbor DB and CFG, not include `Notary` and `Clair` DB. 34 | 35 | 注意:在 v1.6.0 版本中,你需要按顺序执行如下三个步骤:先迁移 Harbor 数据库,之后才是 `Notary` 和 `Clair` 的数据库; 36 | 37 | ``` 38 | # Harbor 的 DB (MariaDB) 和 CFG 迁移 39 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg goharbor/harbor-migrator:[tag] up 40 | ``` 41 | 42 | > NOTE: **You must run migration of Notary and Clair's DB before launch Harbor**. 
If you want to upgrade Notary and Clair DB, refer to the following commands: 43 | 44 | ``` 45 | # 迁移 notary-db (postgresql) 46 | docker run -it --rm -e DB_USR=root -v /data/notary-db/:/var/lib/mysql -v /data/database:/var/lib/postgresql/data goharbor/harbor-migrator:${tag} --db up 47 | 48 | # 迁移 clair-db (postgresql) 49 | docker run -it --rm -v /data/clair-db/:/clair-db -v /data/database:/var/lib/postgresql/data goharbor/harbor-migrator:${tag} --db up 50 | ``` 51 | 52 | > NOTE: If you want to upgrade DB or CFG only, refer to the following commands: 53 | 54 | ``` 55 | # 迁移 Harbor 的 DB 56 | docker run -it --rm -e DB_USR=root -e DB_PWD={db_pwd} -v ${harbor_db_path}:/var/lib/mysql goharbor/harbor-migrator:[tag] --db up 57 | 58 | # 迁移 notary 的 DB 59 | docker run -it --rm -e DB_USR=root -v /data/notary-db/:/var/lib/mysql -v /data/database:/var/lib/postgresql/data goharbor/harbor-migrator:${tag} --db up 60 | 61 | # 迁移 clair 的 DB 62 | docker run -it --rm -v /data/clair-db/:/clair-db -v /data/database:/var/lib/postgresql/data goharbor/harbor-migrator:${tag} --db up 63 | 64 | # 迁移 Harbor 的 CFG 65 | docker run -it --rm -v ${harbor_cfg}:/harbor-migration/harbor-cfg/harbor.cfg goharbor/harbor-migrator:[tag] --cfg up 66 | ``` 67 | 68 | DB 和 CFG 分开迁移的玩法; 69 | 70 | > NOTE: Some errors like 71 | 72 | ``` 73 | [ERROR] Missing system table mysql.roles_mapping; please run mysql_upgrade to create it 74 | [ERROR] Incorrect definition of table mysql.event: expected column 'sql_mode' at position ... ... 75 | [ERROR] mysqld: Event Scheduler: An error occurred when initializing system tables. Disabling the Event Scheduler. 76 | [Warning] Failed to load slave replication state from table mysql.gtid_slave_pos: 1146: Table 'mysql.gtid_slave_pos' doesn't exist 77 | ``` 78 | 79 | will be occurred during **upgrading from harbor 1.2 to harbor 1.3**, just ignore them if harbor can start successfully. 80 | 81 | ``` 82 | /usr/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: . 83 | ``` 84 | 85 | will be occurred during **upgrading from harbor <= v1.5.0 to harbor v1.6.0**, just ignore them if harbor can start successfully. 86 | 87 | 可能遇到的错误,如果 harbor 能够成功启动,则直接忽略; 88 | 89 | ## Roll back from an upgrade 90 | 91 | > NOTE: **Roll back doesn't support upgrade across v1.5.0, like from v1.2.0 to v1.6.0**. It's because Harbor changes DB to `Postgresql` from v1.6.0, the migrator cannot roll back data to `MariaDB`. 92 | 93 | 注意:回滚操作不支持跨 v1.5.0 版本的情况(例如从 v1.2.0 到 v1.6.0);因为 Harbor 从 v1.6.0 开始将自身使用的数据库变成了 `PostgreSQL` ;而迁移工具无法将数据重新回滚成 `MariaDB` 格式; 94 | 95 | -------------------------------------------------------------------------------- /Harbor 升级问题汇总.md: -------------------------------------------------------------------------------- 1 | # Harbor 升级问题汇总 2 | 3 | ## [where I can get goharbor/harbor-db-migrator:1.6.3 as per release-1.6.0/docs/migration_guide.md](https://github.com/goharbor/harbor/issues/6528) 4 | 5 | migrator tools 的变更历史: 6 | 7 | - https://hub.docker.com/r/vmware/harbor-db-migrator/tags/ -- almost deprecated, but can work well in some cases, not suggest to use. It is a hub keeping migrator tools before harbor entering into CNCF. Tags involved: 8 | - 0.4.5 9 | - 1.2 10 | - 1.3 11 | - 1.4 12 | - https://hub.docker.com/r/vmware/harbor-migrator/tags -- for what use? 
13 | - v1.5.0 14 | - https://hub.docker.com/r/goharbor/harbor-db-migrator/tags/ -- This hub is used to keep migrator tools after harbor entering into CNCF, as you can see, the contents are almost the same as above. So it is just a copy. Tags involved: 15 | - 1.2 16 | - 1.3 17 | - 1.4 18 | - https://hub.docker.com/r/goharbor/harbor-migrator/tags/ -- This hub keeps the most recently migrator tools. Tags involved: 19 | - v1.5.0 20 | - v1.6.0 21 | - v1.6.1 22 | - v1.6.3 23 | 24 | ## [upgrade harbor from 1.2.2 to 1.6.0,but occur error while database backup](https://github.com/goharbor/harbor/issues/6139) 25 | 26 | 要点: 27 | 28 | - please use the [`goharbor/harbor-db-migrator:1.2`](https://hub.docker.com/r/goharbor/harbor-db-migrator/tags) to **backup** data, and follow the [guide on branch release-1.2.0](https://github.com/goharbor/harbor/blob/release-1.2.0/docs/migration_guide.md) 29 | - "**How should I rollback from 1.6.0 to 1.2.2** because the guide said don't support it.", the harbor doesn't support rollback (这里应该是指“不支持从 1.6.0 回滚到 1.2.2 版本,因为底层数据库发生了变更”). But, I think you can do it manually. Just clean your data folder and import the dumped data into Db. 30 | - "**import the dumped data into Db** -- Do you mean that login the db container and exec the dumped sql ?". The easiest way to backup data is just to copy files. But, I suggest you just use the `harbor-db-migrator:1.2` to backup `MySQL` data. If any issue during upgrade, you can rollback the data with MySQL command in the DB container, no need to login. 31 | 32 | 这里没有回答清楚究竟应该使用哪个 hub 上的 1.2 ;已经在 issue 中提问; 33 | 34 | ## [Upgrade from v1.3.0-rc4 to v1.6.3 fails](https://github.com/goharbor/harbor/issues/6523) 35 | 36 | By default, the `harbor.cfg` used by the `install.sh` is the one in the offline installer, please either to modify it or to replace it with upgraded one. 37 | 38 | > 疯狂更新中 39 | 40 | ## [Unknown error has occured - Error Message](https://github.com/goharbor/harbor/issues/5975) 41 | 42 | 问题现象: 43 | 44 | - 在 ui.log 中 45 | 46 | ``` 47 | ... 48 | Oct 1 13:44:42 172.18.0.1 ui[597]: 2018-10-01T11:44:42Z [WARNING] Harbor is not deployed with Clair, it's not impossible to get vulnerability details. 49 | Oct 1 13:44:42 172.18.0.1 ui[597]: 2018/10/01 11:44:42 #033[1;44m[D] [server.go:2619] | 10.3.0.133|#033[41m 503 #033[0m| 701.153µs| match|#033[44m GET #033[0m /api/repositories/windows/visual-studio-2017-build-tools/tags/latest/vulnerability/details r:/api/repositories/*/tags/:tag/vulnerability/details#033[0m 50 | ... 51 | ``` 52 | 53 | - 在 proxy.log 中 54 | 55 | ``` 56 | Oct 18 08:37:17 172.18.0.1 proxy[616]: 10.3.0.133 - "GET /api/repositories/windows/visual-studio-2017-build-tools/tags/latest/vulnerability/details HTTP/1.1" 503 1 "https://10.100.152.165/harbor/projects/5/repositories/windows%2Fvisual-studio-2017-build-tools/tags/latest" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36" 0.000 0.000 . 57 | ``` 58 | 59 | > - This error message appears after **upgrading harbor from v1.1.2 to v1.5.0**. 60 | > - There were no errors while updating habor. 61 | > - I tried to **Update harbor from v1.5.0 to v1.6.0** but the error message still appears. 62 | 63 | 通过 web console 可以看到调用 /vulnerability/details 这个 API 时触发了 503 ; 64 | 65 | 66 | 问题原因: 67 | 68 | > This 503 issue was because UI tries to **get information from Clair while Clair is not installed**, this should be fixed in 1.6, could you confirm you still see this issue after 1.6.0? 
69 | 70 | > So the error was because the browser tries to load the vulnerability of an image when Clair is not installed, **this was a known issue in 1.5 and fixed in 1.6** 71 | 72 | > If you keep seeing this in 1.6, it's **maybe due to browser cache**, could you clean up the cache and retry? If the problem persists please upload the complete logs again. 73 | 74 | 75 | ## [Is it possible to migrate from 1.5.1 to 1.6.0 version](https://github.com/goharbor/harbor/issues/6004) 76 | 77 | you can **use `goharbor/harbor-migrator:v1.6.0` to migrate from 1.5.1 to 1.6.0** 78 | 79 | ## [harbor upgrade directly from v1.1.1 to v1.5.0, only to v1.4.0](https://github.com/goharbor/harbor/issues/5745) 80 | 81 | 问题现象: 82 | 83 | > I am using harbor v1.1.1 and I want to upgrade to v1.5.0 84 | > 85 | > I was upgrading harbor using the command below, but it just upgrade to v1.4.0. 86 | 87 | ``` 88 | docker run -it --rm -e DB_USR=root -e DB_PWD=root123 -v /data/database:/var/lib/mysql -v /root/harbor_test/harbor/harbor.cfg:/harbor-migration/harbor-cfg/harbor.cfg vmware/harbor-migrator:v1.5.0 up 89 | ``` 90 | 91 | 问题原因: 92 | 93 | > Can you please use the v1.2.0 migrator to upgrade to v1.2.0 DB scheme firstly? 94 | > 95 | > And then to upgrade to v1.5.0, it seems that there is a conflict in the alchemy framework. 96 | 97 | ## [No image to migrate from v1.2.0 to higher version for Harbor](https://github.com/goharbor/harbor/issues/5232) 98 | 99 | > So you should probably try to use `vmware/harbor-db-migrator:1.2` (**only for backing up your DB**) 100 | > 101 | > If you want to upgrade your DB, use the command from the upgrade notes with the matching tag (for example "v1.5.0" for version 1.5) 102 | 103 | > The `vmware/harbor-db-migrator:1.2` image is just used for backup 104 | 105 | ## [DB schema upgrade problem from 1.2.2 to 1.3.0](https://github.com/goharbor/harbor/issues/3949) 106 | 107 | > you should 108 | > 109 | > - (这里缺失了 backup 步骤) 110 | > - delete `/data/database` dir first. 111 | > - And start harbor with empty MySQL. 112 | > - Then restore with migrator v1.2 113 | 114 | 这个是 vmware 的人给的方法,然后说的不清楚,也不完整; 115 | 116 | > I faced the same problem last week. 
It works with the following process: 117 | > 118 | > - **Backup** your database with **migrator v1.2** 119 | > - Delete `/data/database` 120 | > - Start and stop harbor-db v1.3.0 to initiate an empty mariadb instance 121 | > - **Restore** your backup with **migrator v1.3** 122 | > - Upgrade with migrator v1.3 123 | 124 | 这是另外一个人给的完整步骤,这个应该是 ok 的; 125 | -------------------------------------------------------------------------------- /Harbor 服务搭建之网络互通问题.md: -------------------------------------------------------------------------------- 1 | # Harbor 服务搭建之网络互通问题 2 | 3 | staging 和 prod 环境之间网络互通涉及以下两方面配置: 4 | 5 | - Security Groups 6 | - VPC peering 7 | 8 | ## Security Groups 9 | 10 | ### Amazon EC2 Security Groups for Linux Instances 11 | 12 | > 英文地址:[这里](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) 13 | > 中文地址:[Linux 实例的 Amazon EC2 个安全组](http://docs.aws.amazon.com/zh_cn/AWSEC2/latest/UserGuide/using-network-security.html) 14 | 15 | 安全组基础知识介绍; 16 | 17 | ### harbor staging SG 18 | 19 | > EC2 信息查询:[这里](http://splaydock.llsstaging.com/instances?utf8=%E2%9C%93&dept=all_dept&name=registry&private_ip=&ip=&commit=Search) 20 | > SG 完整列表:[这里](http://splaydock.llsstaging.com/security_groups) 21 | 22 | | Name | Security Group Id | Vpc Id | Description | Inbound | 23 | | -- | -- | -- | -- | -- | 24 | | shared-default | sg-ddd7e9b8 | vpc-a0e20cc4 | Default shared SG for ssh created 2016-08-09 14:05:26 | tcp://22 => 0.0.0.0/0
tcp://23333~23433 => 10.1.0.0/16
tcp://9100 => 10.1.0.0/16 | 25 | | docker-registry-harbor | sg-d17a68b4 | vpc-a0e20cc4 | docker-registry-harbor created at 2016-10-24 14:50:49 +0800 CST | tcp://80 => 10.1.0.0/16,10.2.0.0/16,10.3.0.0/16,10.4.0.0/16
tcp://443 => 10.1.0.0/16,10.2.0.0/16,10.3.0.0/16,10.4.0.0/16
tcp://5201 => 10.0.0.0/8 | 26 | | spinnaker-web | sg-27df3d43 | vpc-a0e20cc4 | spinnaker-web created at 2016-12-6 10:50:49 +0800 CST | tcp://80 => 52.80.44.111/32,54.223.229.211/32
tcp://443 => 52.80.44.111/32,54.223.229.211/32 | 27 | 28 | 29 | ### harbor prod SG 30 | 31 | > EC2 信息:[这里](https://playdock.llsapp.com/instances?utf8=%E2%9C%93&dept=all_dept&name=harbor&private_ip=&ip=&commit=Search) 32 | > SG 完整列表:[这里](https://playdock.llsapp.com/security_groups) 33 | 34 | | Name | Security Group Id | Vpc Id | Description | Inbound | 35 | | -- | -- | -- | -- | -- | 36 | | shared-default | sg-bea396db | vpc-72abbd10 | Default shared SG for ssh created 2016-07-11 13:43:00 +0800 | tcp://22 => 172.1.0.0/16,172.2.0.0/16,172.31.0.0/16
tcp://23333~23433 => 172.1.0.0/16,172.2.0.0/16,172.31.0.0/16
tcp://9100 => 172.31.0.0/16 | 37 | | harbor | sg-51759d35 | vpc-72abbd10 | harbor created at 2016-12-23 16:38:36 +0800 CST | tcp://80 => 172.31.0.0/16,172.1.0.0/16,172.2.0.0/16,172.3.0.0/16,10.1.1.37/32,10.1.0.13/32,54.223.229.211/32
tcp://22 => 172.31.0.0/16
tcp://443 => 172.31.0.0/16,172.1.0.0/16,172.2.0.0/16,172.3.0.0/16,10.1.1.37/32,10.1.0.13/32,54.223.229.211/32 | 38 | 39 | 40 | ### 基于 Terraform 生成的安全组信息 41 | 42 | 以下内容取自 `spiral/platform/harbor/sg.tf` ,可以基于其内容更好的理解上面的规则; 43 | 44 | ``` 45 | # https://www.terraform.io/docs/providers/aws/r/security_group.html 46 | resource "aws_security_group" "harbor" { 47 | name = "harbor" 48 | description = "harbor created at 2016-12-23 16:38:36 +0800 CST" 49 | vpc_id = "vpc-72abbd10" 50 | 51 | # Inbound 52 | ingress { 53 | # ssh 54 | from_port = 22 55 | to_port = 22 56 | protocol = "tcp" 57 | cidr_blocks = ["172.31.0.0/16"] 58 | } 59 | 60 | # Inbound 61 | ingress { 62 | # ssh 63 | from_port = 80 64 | to_port = 80 65 | protocol = "tcp" 66 | # 172.31 is the share cluster 67 | # 172.1 is prod0 k8s vpc 68 | # 172.2 is prod1 k8s vpc 69 | # 172.3 is prod2 k8s vpc 70 | # 10.1.1.37/32 is old harbor staging to sync replica 71 | # 10.1.0.13/32 is new harbor staging to sync replica 72 | # 54.223.229.211/32 is spinnaker in staging env 73 | cidr_blocks = ["172.31.0.0/16", "172.1.0.0/16", "172.2.0.0/16", "172.3.0.0/16", "10.1.1.37/32", "10.1.0.13/32", "54.223.229.211/32"] 74 | } 75 | 76 | # Inbound 77 | ingress { 78 | # ssh 79 | from_port = 443 80 | to_port = 443 81 | protocol = "tcp" 82 | # 172.31 is the share cluster 83 | # 172.1 is prod0 k8s vpc 84 | # 172.2 is prod1 k8s vpc 85 | # 172.3 is prod2 k8s vpc 86 | # 10.1.1.37/32 is old harbor staging to sync replica 87 | # 10.1.0.13/32 is new harbor staging to sync replica 88 | # 54.223.229.211/32 is spinnaker in staging env 89 | cidr_blocks = ["172.31.0.0/16", "172.1.0.0/16", "172.2.0.0/16", "172.3.0.0/16", "10.1.1.37/32", "10.1.0.13/32", "54.223.229.211/32"] 90 | } 91 | 92 | # Outbound 93 | # All traffic for outbound, it's OK in most cases 94 | egress { 95 | from_port = 0 96 | to_port = 0 97 | protocol = "-1" 98 | cidr_blocks = ["0.0.0.0/0"] 99 | } 100 | } 101 | ``` 102 | 103 | ---------- 104 | 105 | ## VPC peering 106 | 107 | **VPC peering**: A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses. Instances in either VPC can communicate with each other as if they are within the same network. You can create a VPC peering connection between your own VPCs, or with a VPC in another AWS account. In both cases, the VPCs must be in the same region. 
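
对应到本文的场景(打通 staging 的 vpc-a0e20cc4 与 prod 的 vpc-72abbd10),如果用 AWS CLI 手工建立 peering,大致是下面三步(仅为示意草稿,实际以 Terraform/控制台中的配置为准;pcx-/rtb- 等 ID 均为占位,跨账号时还需加 `--peer-owner-id`):

```
# 1. 从请求方 VPC 发起 peering 请求(两个 VPC 必须在同一 region)
aws ec2 create-vpc-peering-connection \
    --vpc-id vpc-72abbd10 \
    --peer-vpc-id vpc-a0e20cc4

# 2. 在接受方确认该请求(pcx-xxxxxxxx 为上一步返回的 peering id)
aws ec2 accept-vpc-peering-connection \
    --vpc-peering-connection-id pcx-xxxxxxxx

# 3. 双方路由表各加一条指向对端网段的路由,例如 prod 侧指向 staging 的 10.1.0.0/16
aws ec2 create-route \
    --route-table-id rtb-xxxxxxxx \
    --destination-cidr-block 10.1.0.0/16 \
    --vpc-peering-connection-id pcx-xxxxxxxx
```

另外注意:仅有路由还不够,还要像上面 SG 规则那样放行对端网段/IP(例如 10.1.0.13/32 访问 80/443),否则流量仍会被安全组挡住。
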
108 | 109 | 参考: 110 | 111 | - [VPC Peering](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-peering.html) 112 | - [What is VPC Peering?](http://docs.aws.amazon.com/AmazonVPC/latest/PeeringGuide/Welcome.html) -------------------------------------------------------------------------------- /Hunter 项目介绍.md: -------------------------------------------------------------------------------- 1 | # Hunter 项目介绍 2 | 3 | 项目主页:https://phab.llsapp.com/w/engineer/ops/hunter/ 4 | 5 | ## 项目价值 6 | 7 | - 耗时问题 8 | - 服务依赖关系展示 9 | - 组件失败展示,失败原因展示 10 | - opencensus 和 opentracing 相比,标准定义更详细,约束更明确;通过 opencensus 的 stats 可以统一化 metric 的上报格式问题; 11 | 12 | ## 系统架构 13 | 14 | Ver1 结构图 15 | 16 | ![](https://phab.llsapp.com/file/data/n5oxrxqonv7rpub2br6d/PHID-FILE-6clrcshyobcewttn5d55/image.png) 17 | 18 | Ver2 结构图 19 | 20 | ![](https://phab.llsapp.com/file/data/ik4gzber3dcnjocwxdyo/PHID-FILE-c5vbf5fctmpaeqlczjwb/image.png) 21 | 22 | ## 技术栈 23 | 24 | - [opencensus](https://opencensus.io/) 25 | - [opencensus-go](https://github.com/census-instrumentation/opencensus-go) 26 | - [opencensus-ruby](https://github.com/census-instrumentation/opencensus-ruby) 27 | - opencensus [exporter](https://github.com/census-instrumentation/opencensus-go/tree/master/exporter) -- 是 opencensus-go 的一部分,目前不支持对接 kafka ,**需要自行实现**(moooofly); 28 | - [go-kit/tracing/opencensus](https://github.com/go-kit/kit/tree/master/tracing/opencensus) -- Go kit uses the opencensus-go implementation to power its middlewares. -- 用于学习参考; 29 | - kafka 30 | - spark 31 | - spark streaming -- 从 kafka 消费 opencensus 数据,进行流式处理后,再写入 kafka ;**需要自行实现**; 32 | - spark job -- 应该是类似 Map-Reduce 的东西,从 cassandra 中获取当天的数据计算服务之间的拓扑关系,以便 jaeger ui 展示需要;**需要自行实现**; 33 | - [vizceral](https://github.com/Netflix/vizceral) -- 能够从全局的角度展示服务之间的拓扑关系以及流量和错误率等信息; 34 | - vizceral collector -- **需要自行实现**,从 kafka 获取输出,转换成 vizceral 需要的格式后,提供给 vizceral ; 35 | - hunter adapter:从 kafka 消费 opencensus 数据,转换成 jaeger collector 需要的数据格式,输出到 jaeger collector ;**需要自行实现**; 36 | - jaeger -- jaeger clients are language specific implementations of the OpenTracing API. 
37 | - [jaeger collector](https://github.com/jaegertracing/jaeger/tree/master/cmd/collector): 从 hunter adapter 处获取 opencensus 数据,对数据进行加工处理,之后存储到 cassandra 中;**需要自行实现**; 38 | - [jaeger-client-go](https://github.com/jaegertracing/jaeger-client-go) -- 应该会用到; 39 | - [jaeger query](https://github.com/jaegertracing/jaeger/tree/master/cmd/query) -- 从 cassandra 查询数据,供 jaeger-ui 展示使用; 40 | - [jaeger-ui](https://github.com/jaegertracing/jaeger-ui) 41 | - [cassandra](https://github.com/apache/cassandra) 42 | 43 | > 注意:v2 中已经将 hunter adapter 和 jaeger collector 合并为 hunter collector ; 44 | 45 | ## 关键点 46 | 47 | - 第一阶段只考虑 trace ,不考虑 stats 问题; 48 | - 针对 exporter 的实现,需要考虑 49 | - 针对数据库的 query 和 update 操作进行 wrap ; 50 | - 针对那些操作不需要 wrap ; 51 | - 涉及到的 trace 点 52 | - grpc -- 重点 53 | - http -- 次重点 54 | - redis 55 | - mysql 56 | 57 | 58 | ## Related 59 | 60 | - [opencensus-go](https://github.com/census-instrumentation/opencensus-go) 61 | - [opencensus-specs](https://github.com/census-instrumentation/opencensus-specs) -- 标准文档 62 | - [Shopify/sarama](https://github.com/Shopify/sarama) -- 业务组使用的 kafka 库; 63 | - [elastic/beats](https://github.com/elastic/beats/) -- 包含对 Shopify/sarama 的使用,参考使用; 64 | 65 | 66 | ## protobuf 67 | 68 | 可以对比看 69 | 70 | - https://github.com/census-instrumentation/opencensus-proto/blob/master/opencensus/proto/trace/trace.proto 71 | - https://developers.google.com/protocol-buffers/docs/reference/proto3-spec 72 | 73 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 moooofly 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /New Relic 知识梳理.md: -------------------------------------------------------------------------------- 1 | # New Relic 知识梳理 2 | 3 | ## New Relic APM 4 | 5 | 一句话总结:提供应用性能监控能力; 6 | 7 | [New Relic APM Features](https://newrelic.com/application-monitoring/features) 主要分为以下几个方面(按照功能分类): 8 | 9 | > 按照产品分类移步[这里](https://newrelic.com/application-monitoring/features/edition); 10 | 11 | - Application Monitoring (*) 12 | - Database Monitoring 13 | - Availability & Error 14 | - Monitoring 15 | - Reports 16 | - Team Collaboration 17 | - Security 18 | 19 | > Some features are exclusive to **New Relic APM Pro**. 
20 | 21 | 很多功能都是收费的~~ 22 | 23 | ### Application Monitoring 24 | 25 | > Application Monitoring **in one place** allows you to see your application performance **trends** at a glance – from **page load times**, to **error rates**, **slow transactions**, and a list of servers running the app. 26 | 27 | 关键: 28 | 29 | - 基于单一视图查看应用性能变化趋势; 30 | - 提供 31 | - 页面加载时间 32 | - 错误率 33 | - 慢事务 34 | - 构成当前 app 的运行服务器列表 35 | 36 | #### Response Time, Throughput, and Error Rates 37 | 38 | > **Response time** is an average of the total time spent across all web transactions occurring within a selected time frame on the server-side. **Throughput** is requests per minute through the application. **Error rates** measures the percentage of errors over a set time period for your application. 39 | 40 | 关键: 41 | 42 | - **Response time** 针对的是: 43 | - 服务器侧 44 | - 选定时间片内 45 | - 全部 web transactions 上花费总时间的平均值 46 | - **Throughput** 为特定应用上每分钟的请求数量(requests per minute, `rpm`) 47 | - **Error rates** 给出的是目标应用在一定时间段内错误百分比; 48 | 49 | 50 | #### Most Time-Consuming Transactions 51 | 52 | > Transactions can be accessed both on the overview page and in the left hand navigation. This list of most time-consuming web transactions provides **aggregated details** around the surfaced slow transactions occurring during a specified time window alongside the total **response time** for each. Drill into these transactions to discover **specific details** around each individual transaction, where time is being spent, throughput and drill into **transaction traces**. 53 | 54 | 关键: 55 | 56 | - 排序规则:基于 Transactions Time-Consuming 数值从大到小排序; 57 | - `Transactions Time-Consuming = Avg.response time * Throughput` 58 | - 排查问题过程中,既要考虑 **Avg.response time** 的绝对数值,也要考虑 **Throughput** 的绝对数值; 59 | - 在未选中特定 transaction 前,展示的是 aggregated details ;选中后展示的是 specific details ; 60 | - aggregated details 仅适合观察最慢的几个接口都有谁,以及总体的变化趋势; 61 | - **specific details 适用于查看具体接口慢在那一层次的调用上**; 62 | - **Transaction traces** 提供了代码中出现的、深层次的 slowdowns 信息,并能深入到某个特定的 method 或 database query 上,但需要升级到 New Relic Pro 才能使用; 63 | 64 | #### Transaction Metrics and Traces 65 | 66 | > Visualizing where your app is spending its time is a game-changer. You can't fix what you can't see. Transaction tracing extends the New Relic service to the tiniest transaction detail levels - Collecting data for your slowest individual HTTP requests, all the way down to the SQL. 67 | 68 | 用不了 69 | 70 | 71 | #### Performance of External Services 72 | 73 | > External service instrumentation captures calls to out-of-process services such as web services, resources in the cloud, and any other network calls. The external services dashboard provides charts with your top five external services by response time and external **calls per minute**. 74 | 75 | 关键: 76 | 77 | - 用于排查**由于外部服务导致的访问延迟**问题; 78 | - 需要理解当前都有哪些外部服务以及常规表现,当前看到的外部服务有: 79 | - **ELB**: internal-venet-internal-prod-elb-1301483333.cn-north-1.elb.amazonaws.com.cn 80 | - **环信**: a1.easemob.com 81 | - **homepage**: homepage-prod-api-lb.backend 82 | - **wechat-prod**: 172.31.9.10 83 | - **微博开放平台**:api.weibo.com 84 | - api.weixin.qq.com 85 | 86 | 87 | ## New Relic Insights 88 | 89 | 一句话总结:主要提供 data analysis 能力; 90 | 91 | > Organize, visualize, evaluate. With in-depth analytics, you can better understand the end-to-end business impact of your software performance. 92 | 93 | 用不了 94 | 95 | ## New Relic Infrastructure 96 | 97 | 一句话总结:主要提供 monitoring 能力; 98 | 99 | > Get a precise picture of your dynamically changing systems. Scale rapidly. Deploy intelligently. 
Run your business optimally. 100 | 101 | 用不了 102 | 103 | ## Q&A 104 | 105 | ### 0x01 我们当前使用的版本和 ESSENTIALS/PRO 的关系? 106 | 107 | > xxx 108 | 109 | ### 0x02 Apdex 是什么? 110 | 111 | Apdex 是一种工业标准(industry standard),**用于衡量访问 web 应用和服务时,用户对 response time 的满意程度**;可以将其认为是一种简化的 Service Level Agreement (SLA) 解决方案,由此为 application owner 提供用户满意度的参考数据;和传统的平均应答时间(average response time)等 metrics 相比,其不容易受少量的、超长时间的 responses 的影响; 112 | 113 | 以下内容取自 http://apdex.org/overview.html 114 | 115 | > **Apdex (Application Performance Index)** is an open standard developed by an alliance of companies that defines a standardized method to **report**, **benchmark**, and **track** application performance. 116 | > 117 | > 评价: 118 | > 119 | > "Apdex represents a new milestone in application analysis. The network management industry needs an agreed upon standard to **reflect the experience of the end-user** for everyday applications. **Apdex is a natural extension to our tried and proven Application Response Time analysis** already incorporated into our existing products." 120 | > 121 | > 产生背景: 122 | > 123 | > Enterprises are swimming in IT performance numbers, but have no insight into how well their applications perform from a business point of view. Response time values do not reveal whether users are productive, and a greater number of samples leads to confusion. Averaging the samples washes out significant details about frustration with slow response times. Time values are not uniform across different applications. There should be a better way to analyze and measure what matters. 124 | 125 | ### 0x03 Apdex 的度量方式 126 | 127 | Apdex 是基于一组 threshold 值来对 response time 进行度量的;其计算了 satisfactory response 次数和 unsatisfactory response 次数的比例;response time 的获取是通过计算针对某个 asset 发起请求,与收到完整应答的时间差值计算得到的; 128 | 129 | application owner 定义一个 response time 的 threshold 为 T ;定义所有在 T 时间完成处理的 response 都是满足了 user 的; 130 | 131 | 可以针对每一个 New Relic apps 定义单独的 Apdex T 数值;还可以定义独立的 Apdex T thresholds 值用于 key transactions ; 132 | 133 | Apdex tracks three response counts: 134 | 135 | - **Satisfied**: The response time is **less than or equal to T**. 136 | - **Tolerating**: The response time is **greater than T and less than or equal to 4T**. 137 | - **Frustrated**: The response time is **greater than 4T**. 138 | 139 | For example, if T is 1.2 seconds and a response completes in 0.5 seconds, then the user is **satisfied**. All responses greater than 1.2 seconds **dissatisfy** the user. Responses greater than 4.8 seconds **frustrate** the user. 140 | 141 | 142 | ## 参考 143 | 144 | - https://newrelic.com/ 145 | - https://docs.newrelic.com/docs/apm -------------------------------------------------------------------------------- /PPA 相关.md: -------------------------------------------------------------------------------- 1 | # PPA 相关 2 | 3 | ## 使用场景示例 4 | 5 | > If running Ubuntu < 15.04, you’ll need to install from a different PPA. We recommend [chris-lea/redis-server](https://launchpad.net/~chris-lea/+archive/ubuntu/redis-server) 6 | 7 | 在某些发行版中,由于内置的软件包版本比较低,因此需要自行添加“源”,以安装期望的版本; 8 | 9 | 针对上述场景,细节如下: 10 | 11 | - Adding this PPA to your system 12 | 13 | You can update your system with **unsupported packages** from this **untrusted PPA** by adding `ppa:chris-lea/redis-server` to your system's Software Sources. 
14 | 15 | > 相当于自动安装 16 | 17 | ``` 18 | sudo add-apt-repository ppa:chris-lea/redis-server 19 | sudo apt-get update 20 | ``` 21 | 22 | - Technical details about this PPA 23 | 24 | > 相当于手动安装 25 | 26 | This PPA can be added to your system **manually by copying** the lines below and adding them to your system's software sources. 27 | 28 | Display `sources.list` entries for: 29 | 30 | [ Zesty (17.04) ] 31 | 32 | ``` 33 | deb http://ppa.launchpad.net/chris-lea/redis-server/ubuntu zesty main 34 | deb-src http://ppa.launchpad.net/chris-lea/redis-server/ubuntu zesty main 35 | ``` 36 | 37 | [ Xenial (16.04) ] 38 | 39 | ``` 40 | deb http://ppa.launchpad.net/chris-lea/redis-server/ubuntu xenial main 41 | deb-src http://ppa.launchpad.net/chris-lea/redis-server/ubuntu xenial main 42 | ``` 43 | 44 | [ Vivid (15.04) ] 45 | 46 | ``` 47 | deb http://ppa.launchpad.net/chris-lea/redis-server/ubuntu vivid main 48 | deb-src http://ppa.launchpad.net/chris-lea/redis-server/ubuntu vivid main 49 | ``` 50 | 51 | 52 | Signing key: 53 | 54 | ``` 55 | 1024R/136221EE520DDFAF0A905689B9316A7BC7917B12 (What is this?) 56 | ``` 57 | 58 | Fingerprint: 59 | 60 | ``` 61 | 136221EE520DDFAF0A905689B9316A7BC7917B12 62 | ``` 63 | 64 | 65 | ---------- 66 | 67 | ## How do I use software from a PPA? 68 | 69 | To start installing and using software from a **Personal Package Archive**, you first need to tell Ubuntu where to find the PPA. 70 | 71 | > **Important**: The contents of Personal Package Archives are **not** checked or monitored. You install software from them **at your own risk**. 72 | 73 | If you're using the most recent version of Ubuntu (or any version from Ubuntu 9.10 onwards), you can add a PPA to your system with a single line in your terminal. 74 | 75 | **Step 1**: On the PPA's overview page, look for the heading that reads `Adding this PPA to your system`. Make a note of the PPA's location, which looks like: 76 | 77 | ``` 78 | ppa:gwibber-daily/ppa 79 | ``` 80 | 81 | **Step 2**: Open a terminal and enter: 82 | 83 | ``` 84 | sudo add-apt-repository ppa:user/ppa-name 85 | ``` 86 | 87 | Replace `ppa:user/ppa-name` with the PPA's location that you noted above. 88 | 89 | Your system will now **fetch the PPA's key**. This enables your Ubuntu system to verify that the packages in the PPA have not been interfered with since they were built. 90 | 91 | **Step 3**: Now, as a one-off, you should tell your system to **pull down the latest list of software** from each archive it knows about, including the PPA you just added: 92 | 93 | ``` 94 | sudo apt-get update 95 | ``` 96 | 97 | Now you're ready to start installing software from the PPA! 98 | 99 | ---------- 100 | 101 | 102 | ## [Packaging/PPA](https://help.launchpad.net/Packaging/PPA) 103 | 104 | ### Overview 105 | 106 | Using a **Personal Package Archive (PPA)**, you can **distribute** software and updates directly to Ubuntu users. **Create** your source package, **upload** it and Launchpad will **build** binaries and then **host** them in your own apt repository. 107 | 108 | That means Ubuntu users can install your packages in just the same way they install standard Ubuntu packages and they'll automatically receive updates as and when you make them. 109 | 110 | Every individual and team in Launchpad can have one or more PPAs, each with its own URL. 111 | 112 | Packages you publish in your PPA will remain there until you remove them, they're superseded by another package that you upload or the version of Ubuntu against which they're built becomes obsolete. 
113 | 114 | > **Note**: CommercialHosting allow you to have private PPAs. 115 | 116 | #### Size and transfer limits 117 | 118 | Each PPA gets 2 GiB of disk space. If you need more space for a particular PPA, ask us. 119 | 120 | While we don't enforce a strict limit on data transfer, we will get in touch with you if your data transfer looks unusually high. 121 | 122 | #### Supported architectures 123 | 124 | When Launchpad builds a source package in a PPA, by default it creates binaries for: 125 | 126 | - x86 127 | - AMD64 128 | 129 | You may also request builds for **arm64**, **armhf**, and/or **ppc64el**. Use the "Change details" page for the PPA to enable the architectures you want. 130 | 131 | Changing the set of architectures for which a PPA builds does not create new builds for source packages that are already published in that PPA; it only affects which builds will be created for new uploads. If you need to create builds for newly-enabled architectures without reuploading, go to "View package details" and then "Copy packages", select all the packages for which you want to create builds, select "This PPA", "The same series", and "Copy existing binaries", and submit the form using the "Copy Packages" button. 132 | 133 | We use **OpenStack clouds** for security during the build process, ensuring that each build has a clean build environment and different developers cannot affect one another's builds accidentally. These clouds do not yet have support for the **powerpc** and **s390x** architectures; when they do, it will also be possible to request those architectures in PPAs. 134 | 135 | #### Supported series 136 | 137 | When building a source package you can specify one of the supported series in your changelog file which are listed at the Launchpad PPA page. 138 | 139 | If you specify a different series the build will fail. 140 | 141 | ### Activating a PPA 142 | 143 | Before you can start using a PPA, whether it's your own or it belongs to a team, you need to **activate** it on your profile page or the team's overview page. If you already have one or more PPAs, this is also where you'll be able to create additional archives. 144 | 145 | #### Your PPA's key 146 | 147 | Launchpad **generates** a unique key for each PPA and uses it to **sign** any packages built in that PPA. 148 | 149 | This means that people downloading/installing packages from your PPA can verify their source. After you've activated your PPA, uploading its first package causes Launchpad to start generating your key, which can take up to a couple of hours to complete. 150 | 151 | Your key, and instructions for adding it to Ubuntu, are shown on the PPA's overview page. 152 | 153 | ### Deleting a PPA 154 | 155 | When you no longer need your PPA, you can delete it. This deletes all of the PPA's packages, and removes the repository from ppa.launchpad.net. You'll have to wait up to an hour before you can recreate a PPA with the same name. 156 | 157 | ### Next steps 158 | 159 | You can familiarise yourself with how PPAs work by [installing a package from an existing PPA](https://help.launchpad.net/Packaging/PPA/InstallingSoftware). You can also jump straight into [uploading your source packages](https://help.launchpad.net/Packaging/PPA/Uploading). 
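
补充:上面 "Next steps" 里提到的 "uploading your source packages",典型流程大致如下(示意草稿:`ppa:user/ppa-name`、包名/版本号均为占位,前提是已在 Launchpad 上激活 PPA 并配置好 GPG key):

```
# 在包含 debian/ 目录的源码树中只构建 source package(生成 .dsc 和 _source.changes)
debuild -S -sa

# 将 _source.changes 上传到自己的 PPA,Launchpad 会自动为启用的架构排队构建二进制包
dput ppa:user/ppa-name ../mypackage_1.0-1ubuntu1_source.changes
```
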
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # MarkSomethingDownLLS 2 | 3 | “对我来说,博客首先是一种知识管理工具,其次才是传播工具。我的技术文章,主要用来整理我还不懂的知识。我只写那些我还没有完全掌握的东西,那些我精通的东西,往往没有动力写。炫耀从来不是我的动机,好奇才是。” by *阮一峰* 4 | 5 | 在这里,我可以直言不讳的说,离开流利说的原因是因为遇到了3个小人:一个是姓姚的数据团队负责人,一个是从阿里淘来的不知道几手的姓王的 HR,还有一个是张口闭口和你扯 ROI,然后把所有脏活累活推给别人,有事没事背后还会插你两刀的姓李的; 6 | 7 | 从我个人心底来讲,我觉得流利说是一个不错的公司,在这边认识了一些不错的伙伴,也学到了很多;虽然从流利说离开的时候被 HR 威胁说会在离职证明和档案上怎样怎样,不过说实话,我并不怕,想通过卑劣手段达成目的,至少要先掂量一下自己的斤两,呵呵 8 | -------------------------------------------------------------------------------- /Registry Mirror 与 --registry-mirror.md: -------------------------------------------------------------------------------- 1 | # Registry Mirror 与 --registry-mirror 2 | 3 | ## Docker Hub 拉取镜像慢解决方案 4 | 5 | - Registry 中包含一个或多个 Repository ; 6 | - Repository 中包含一个或多个 Image ; 7 | - Image 用 GUID 标识,有一个或多个 Tag 与之关联; 8 | - index.docker.io 其实是一个 Index Server ,和 Registry 有区别; 9 | - **配置 Registry Mirror 之后,Docker 拉取镜像的过程**:Docker CLI 会试图获得授权,例如会向 https://index.docker.io/v1 请求认证,认证完成后,认证服务器会返回一个对应的 Token 。注意,这里用户认证与配置的 Registry Mirror 完全无关,这样我们就不用担心使用 Mirror 的安全问题了; 10 | - 不幸的是,**Docker Hub 并没有在国内放服务器或者用国内的 CDN** ;为了克服跨洋网络延迟,一般有两个解决方案:一是使用**私有 Registry** ,另外是使用 **Registry Mirror** ; 11 | - 方案一:搭建或者使用现有的私有 Registry ,通过定期和 Docker Hub 同步热门的镜像,私有 Registry 上保存了一些镜像的副本,然后大家可以通过 `docker pull private-registry.com/user-name/ubuntu:latest`,从这个私有 Registry 上拉取镜像。因为这个方案需要定期同步 Docker Hub 镜像,因此它比较适合于使用的镜像相对稳定,或者都是私有镜像的场景。而且用户需要显式的映射官方镜像名称到私有镜像名称,私有 Registry 更多被大家应用在企业内部场景。私有 Registry 部署也很方便,可以直接在 Docker Hub 上下载 Registry 镜像,即拉即用,具体部署可以参考官方文档。 12 | - 方案二:使用 Registry Mirror ,它的原理类似于缓存,如果镜像在 Mirror 中命中则直接返回给客户端,否则从存放镜像的 Registry 上拉取并自动缓存在 Mirror 中。最酷的是,是否使用 Mirror 对 Docker 使用者来讲是透明的,也就是说在配置 Mirror 以后,大家可以仍然输入 `docker pull ubuntu` 来拉取 Docker Hub 镜像,除了速度变快了,和以前没有任何区别。 13 | 14 | 15 | ---------- 16 | 17 | ## DaoCloud 提供的 Registry Mirror 服务 18 | 19 | 下面的例子,使用的是由 DaoCloud 提供的 Registry Mirror 服务,在申请开通 Mirror 服务后你会得到一个 Mirror 地址(我的地址:http://99a63370.m.daocloud.io ),然后我们要做的就是把这个地址配置在 Docker Server 启动脚本中,重启 Docker 服务后 Mirror 配置就生效了; 20 | 21 | 22 | 简单介绍下**如何在 DaoCloud 申请一个 Mirror 服务**: 23 | 24 | - 账号注册:http://www.daocloud.io/ 25 | - 加速器:https://www.daocloud.io/mirror#accelerator-doc ,启动成功后,你就拥有了一个你**专用的 Registry Mirror 地址**了,加速器链接就是你要设置 "`--registry-mirror`" 的地址。目前每个用户有 10G 的加速流量(Tips:如果流量不够用可以邀请好友获得奖励流量) 26 | 27 | **加速器说明**: 28 | 29 | - 天下容器,唯快不破:Docker Hub 提供众多镜像,你可以从中自由下载数十万计的免费应用镜像, 这些镜像作为 docker 生态圈的基石,是我们使用和学习 docker 不可或缺的资源。**为了解决国内用户使用 Docker Hub 时遇到的稳定性及速度问题 DaoCloud 推出永久免费的新一代加速器服务**。 30 | - 原生技术:新一代 Docker 加速器采用自主研发的智能路由及缓存技术,并引入了先进的协议层优化,极大提升拉取镜像的速度和体验。完全兼容 Docker 原生的 `--registry-mirror` 参数配置方式。支持 Linux,MacOS ,Windows 三大平台。使您能够更加方便地配置和使用镜像加速功能。 31 | - 配置 Docker 加速器: 32 | - **Linux**:`curl -sSL https://get.daocloud.io/daotools/set_mirror.sh | sh -s http://99a63370.m.daocloud.io`,该脚本可以将 `--registry-mirror` 加入到你的 Docker 配置文件 `/etc/docker/daemon.json` 中(在 vagrant 环境中使用了一下,ps 后没找到和这个 mirror 地址相关的信息)。适用于 Ubuntu14.04、Debian、CentOS6 、CentOS7、Fedora、Arch Linux、openSUSE Leap 42.1,其他版本可能有细微不同。更多详情请[访问文档](http://guide.daocloud.io/dcs/daocloud-9153151.html)。 33 | - **MacOS**:**Docker For Mac** 右键点击桌面顶栏的 docker 图标,选择 Preferences ,在 Daemon 标签(Docker 17.03 之前版本为 Advanced 标签)下的 Registry mirrors 列表中加入下面的镜像地址: `http://99a63370.m.daocloud.io` ,点击 Apply & Restart 按钮使设置生效。 34 | 35 | > 针对 set_mirror.sh 脚本内容的详细说明,见《[Docker 
加速器配置脚本说明](https://gitee.com/moooofly/SecretGarden/blob/master/Docker%20%E5%8A%A0%E9%80%9F%E5%99%A8%E9%85%8D%E7%BD%AE%E8%84%9A%E6%9C%AC%E8%AF%B4%E6%98%8E.md)》; 36 | 37 | - **Docker 加速器是什么,我需要使用吗**?使用 Docker 的时候,需要经常从官方获取镜像,但是由于显而易见的网络原因,拉取镜像的过程非常耗时,严重影响使用 Docker 的体验。因此 DaoCloud 推出了加速器工具解决这个难题,通过智能路由和缓存机制,极大提升了国内网络访问 Docker Hub 的速度,目前已经拥有了广泛的用户群体,并得到了 Docker 官方的大力推荐。如果您是在国内的网络环境使用 Docker,那么 Docker 加速器一定能帮助到您。 38 | - **Docker 加速器对 Docker 的版本有要求吗**?需要 Docker 1.8 或更高版本才能使用,如果您没有安装 Docker 或者版本较旧,请安装或升级。 39 | - **Docker 加速器支持什么系统**?Linux, MacOS 以及 Windows 平台。 40 | - **Docker 加速器是否收费**?DaoCloud 为了降低国内用户使用 Docker 的门槛,提供永久免费的加速器服务,请放心使用。 41 | 42 | 43 | ---------- 44 | 45 | ## Harbor 提供的 Registry Mirror 服务 46 | 47 | > 原文地址:[Configuring Harbor as a local registry mirror](https://github.com/vmware/harbor/blob/master/contrib/Configure_mirror.md) 48 | 49 | Harbor runs as a **local registry** by default. It can also be configured as a **registry mirror**, which caches downloaded images for subsequent use. *Note that under this setup, the Harbor registry only acts as a mirror server and no longer accepts image pushing requests.* Edit `Deploy/templates/registry/config.yml` before executing `./prepare`, and append a proxy section as follows: 50 | 51 | 52 | ``` 53 | proxy: 54 | remoteurl: https://registry-1.docker.io 55 | ``` 56 | 57 | In order to access **private images** on the Docker Hub, a username and a password can be supplied: 58 | 59 | ``` 60 | proxy: 61 | remoteurl: https://registry-1.docker.io 62 | username: [username] 63 | password: [password] 64 | ``` 65 | 66 | You will need to pass the `--registry-mirror` option to your **Docker daemon** on startup: 67 | 68 | ``` 69 | docker --registry-mirror=https:// daemon 70 | ``` 71 | 72 | > 说明: 73 | > 74 | > - 上面提及的 "Docker daemon" 是指 docker client 侧使用(依赖)的那个; 75 | > - 上面的 startup 命令一般会配置在不同 linux 发行版中用于进行服务管理的服务中(如 systemd) 76 | 77 | 78 | For example, if your mirror is serving on `http://reg.yourdomain.com`, you would run: 79 | 80 | ``` 81 | docker --registry-mirror=https://reg.yourdomain.com daemon 82 | ``` 83 | 84 | Refer to the [Registry as a pull through cache](https://docs.docker.com/registry/recipes/mirror/) for detailed information. 85 | 86 | 87 | ---------- 88 | 89 | 90 | ## 与 linux 发行版及 docker 版本相关的问题 91 | 92 | - Docker 版本在 1.12 或更高 93 | 94 | **创建**或**修改** `/etc/docker/daemon.json` 文件,修改为如下形式 (请将 **加速地址** 替换为在[加速器](https://www.daocloud.io/mirror#accelerator-doc)页面获取的专属地址) 95 | 96 | ``` 97 | { 98 | "registry-mirrors": [ 99 | "加速地址" 100 | ], 101 | "insecure-registries": [] 102 | } 103 | ``` 104 | 105 | - Docker 版本在 1.8 与 1.11 之间 106 | 107 | 您可以找到 Docker 配置文件,不同的 Linux 发行版的配置路径不同,具体路径请参考 [Docker官方文档](https://docs.docker.com/engine/admin/),在配置文件中的 `DOCKER_OPTS` 加入 `--registry-mirror=加速地址` 重启Docker,不同的 Linux 发行版的重启命令不一定相同,一般为 `service docker restart` ; 108 | 109 | 110 | ---------- 111 | 112 | - By running a **local registry mirror**, you can keep most of the redundant image fetch traffic on your local network. 113 | - if the set of images you are using is well delimited, you can simply pull them manually and push them to **a simple, local, private registry**. 114 | - if your images are all built in-house, not using the Hub at all and relying entirely on your local registry is the simplest scenario. 115 | - It’s currently not possible to mirror another private registry. Only the central Hub can be mirrored. 116 | - The Registry can be configured as a `pull through cache`. 
In this mode a Registry responds to all normal docker pull requests but stores all content locally. 117 | 118 | 119 | ---------- 120 | 121 | 122 | 参考: 123 | 124 | - [玩转Docker镜像](http://blog.daocloud.io/how-to-master-docker-image/) 125 | - [daocloud mirror accelerator](https://www.daocloud.io/mirror#accelerator-doc) - 给出基于自身用户的 mirror 地址 126 | - [Docker 加速器](http://guide.daocloud.io/dcs/daocloud-9153151.html) 127 | - [Registry as a pull through cache](https://docs.docker.com/registry/recipes/mirror/) 128 | -------------------------------------------------------------------------------- /Terminal session recorder and others.md: -------------------------------------------------------------------------------- 1 | # Terminal Session Recorder and Others 2 | 3 | ## asciinema/asciinema 4 | 5 | > github: [asciinema/asciinema](https://github.com/asciinema/asciinema) 6 | 7 | 一句话说明:Terminal session recorder 8 | 9 | 官网地址:https://asciinema.org 10 | 11 | [asciinema](https://asciinema.org) is a default asciinema-server instance, and prints a secret link you can use to watch your recording in a web browser. 12 | 13 | ### 原理 14 | 15 | 将终端的操作记录成 JSON 格式,然后使用 JavaScript 解析,配合 CSS 展示,看起来像是视频播放器。实际上就是文本,相比 GIF 和视频文件体积非常之小(时长 2 分 50 秒的录屏只有 325KB),无需缓冲播放,也可以方便的分享给别人或嵌入到网页中。 16 | 17 | 经确认,存在两种格式: 18 | 19 | - [asciicast file format (version 1)](https://github.com/asciinema/asciinema/blob/master/doc/asciicast-v1.md) -- asciicast file is JSON file containing meta-data like duration or title of the recording, and the actual content printed to terminal's stdout during recording. 20 | - [asciicast file format (version 2)](https://github.com/asciinema/asciinema/blob/master/doc/asciicast-v2.md) -- asciicast v2 file is newline-delimited JSON file. 21 | 22 | > NOTE: 23 | > 24 | > - [asciicast](https://github.com/asciinema/asciinema/blob/master/doc/asciicast-v1.md) is asciinema recording. 25 | > - Suggested file extension of asciicast v2 is `.cast`. 26 | > - Suggested file extension of asciicast v1 is `.json`. 27 | 28 | ### 存在的问题 29 | 30 | 使用时通过嵌入一个链接到 asciinema.org 上静态图片,点击后,跳转到目标网站再播放;用户体验没有本地直接播放好; 31 | 32 | 例如: 33 | [![demo](https://asciinema.org/a/113463.png)](https://asciinema.org/a/113463?autoplay=1) 34 | 35 | ### 安装 36 | 37 | - Mac 38 | 39 | ``` 40 | brew install asciinema 41 | ``` 42 | 43 | - Ubuntu 44 | 45 | ``` 46 | sudo apt-add-repository ppa:zanchey/asciinema 47 | sudo apt-get update 48 | sudo apt-get install asciinema 49 | ``` 50 | 51 | 可能会报如下错误信息,不过不影响使用; 52 | 53 | ``` 54 | ... 55 | Fetched 3,832 kB in 11s (341 kB/s) 56 | Reading package lists... Done 57 | W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: http://storage.googleapis.com/bazel-apt stable InRelease: The following signatures were invalid: KEYEXPIRED 1527185977 KEYEXPIRED 1527185977 KEYEXPIRED 1527185977 58 | W: Failed to fetch http://storage.googleapis.com/bazel-apt/dists/stable/InRelease The following signatures were invalid: KEYEXPIRED 1527185977 KEYEXPIRED 1527185977 KEYEXPIRED 1527185977 59 | W: Some index files failed to download. They have been ignored, or old ones used instead. 
60 | [#3#root@ubuntu-1604 ~]$ 61 | ``` 62 | 63 | 64 | ### Examples 65 | 66 | - Record your first session 67 | 68 | ``` 69 | asciinema rec first.cast 70 | ``` 71 | 72 | > first.cast is a JSON file 73 | 74 | - Now replay it with double speed: 75 | 76 | ``` 77 | asciinema play -s 2 first.cast 78 | ``` 79 | 80 | - Or with normal speed but with idle time limited to 2 seconds: 81 | 82 | ``` 83 | asciinema play -i 2 first.cast 84 | ``` 85 | 86 | - Print full output of recorded asciicast to a terminal immediately. 87 | 88 | ``` 89 | asciinema cat first.cast 90 | ``` 91 | 92 | - If you want to watch and share it on the asciinema.org, upload it: 93 | 94 | ``` 95 | asciinema upload first.cast 96 | ``` 97 | 98 | - Replay the recorded session from asciinema.org URL. 99 | 100 | ``` 101 | asciinema play https://asciinema.org/a/Rv4XsAFMRdbr8SFZVRgEgE8oM 102 | ``` 103 | 104 | - You can record and upload in one step by omitting the filename: 105 | 106 | ``` 107 | asciinema rec 108 | ``` 109 | 110 | This spawns a new shell instance and records all terminal output. When you're ready to finish simply exit the shell either by typing `exit` or hitting `Ctrl-D`. 111 | 112 | - If you want to manage your recordings on asciinema.org, you need to authenticate. Run the following command and open displayed URL in your web browser: 113 | 114 | ``` 115 | asciinema auth 116 | ``` 117 | 118 | ### 配置文件 119 | 120 | - [`$HOME/.config/asciinema/config`](https://github.com/asciinema/asciinema#configuration-file) -- 需要自行创建 121 | - [`$HOME/.config/asciinema/install-id`](https://github.com/asciinema/asciinema#auth) -- Install ID is a random ID (UUID v4) generated locally when you run asciinema for the first time. It's purpose is to connect local machine with uploaded recordings, so they can later be associated with asciinema.org account. 122 | 123 | 124 | ## marionebl/svg-term-cli 125 | 126 | > github: [marionebl/svg-term-cli](https://github.com/marionebl/svg-term-cli) 127 | 128 | 一句话说明:Share terminal sessions via **SVG** and **CSS** 129 | 130 | ### 功能 131 | 132 | - Render `asciicast` to animated SVG 133 | - Share `asciicasts` everywhere (sans JS) 134 | - Style with common color profiles 135 | 136 | 存在的价值: 137 | 138 | - 替代 GIF 格式的 `asciicast` recordings 为 SVG 格式,解决某些情况下无法正常使用 `asciinema` player 的问题; 139 | 140 | > Replace **GIF** `asciicast` recordings where you can not use the [`asciinema` player](https://asciinema.org/), e.g. `README.md` files on GitHub and the `npm` registry. 141 | > 142 | > The image at the top of this README is an example. See how sharp the text looks, even when you zoom in? That’s because it’s an **SVG**! 
143 | 144 | ### 安装 145 | 146 | ``` 147 | # Install asciinema via: https://asciinema.org/docs/installation 148 | npm install -g svg-term-cli 149 | ``` 150 | 151 | ### Examples 152 | 153 | - 基于上传到 asciinema 上的 [asciicast](https://asciinema.org/a/113643) 生成 `parrot.svg` 154 | 155 | ``` 156 | svg-term --cast 113643 157 | svg-term --cast 113643 --out examples/parrot.svg 158 | svg-term --cast=113643 --out examples/parrot.svg --window 159 | svg-term --cast 113643 --out examples/parrot.svg --window --no-cursor --from=4500 160 | ``` 161 | 162 | - 基于预生成的 asciicast v1/v2 数据创建 SVG 163 | 164 | ``` 165 | # asciicast v1 166 | cat rec.json | svg-term 167 | # asciicase v2 168 | cat first.cast | svg-term --out first_window.svg --window 169 | cat first.cast | svg-term --out first_window_frame_profile_hw2.svg --window --profile=Seti --term=iterm2 --height=40 --width=100 170 | ``` 171 | 172 | - [commitlint](https://github.com/marionebl/commitlint) 提供的功能展示图。基于 `svg-term-cli` 生成 173 | 174 | ``` 175 | cat docs/assets/commitlint.json | svg-term --out docs/assets/commitlint.svg --frame --profile=Seti --height=20 --width=80 176 | ``` 177 | 178 | - harbor 使用 179 | 180 | ``` 181 | cat 03_statistics.cast | svg-term --out 03_statistics.svg --window --profile=Seti --term=iterm2 --height=35 --width=120 182 | ``` 183 | 184 | ## 组合技 185 | 186 | - 基于 asciinema 录制 asciicast 文件 187 | 188 | 189 | ``` 190 | asciinema rec 00_harbor_setup.cast 191 | ``` 192 | 193 | - 基于 svg-term 将 asciicast 转换成 SVG 194 | 195 | ``` 196 | cat 00_harbor_setup.cast | svg-term --out 00_harbor_setup.svg --window --profile=Seti --term=iterm2 --height=35 --width=120 197 | ``` 198 | 199 | 200 | ## 其他 201 | 202 | - [Making Your Code Beautiful](https://hackernoon.com/presenting-your-code-beautifully-fdbab9e6fb68) -- (*) 203 | - [sindresorhus/gifski-app](https://github.com/sindresorhus/gifski-app) -- Convert videos to high-quality GIFs on your Mac. 204 | - [giphycapture](https://giphy.com/apps/giphycapture) -- GIPHY CAPTURE is the best way to create GIFs on your Mac. 205 | - [kap](https://getkap.co/) -- An open-source screen recorder built with web technology. 206 | 207 | -------------------------------------------------------------------------------- /Understanding AWS stolen CPU and how it affects.md: -------------------------------------------------------------------------------- 1 | # Understanding AWS stolen CPU and how it affects your apps 2 | 3 | > https://www.datadoghq.com/blog/understanding-aws-stolen-cpu-and-how-it-affects-your-apps/ 4 | 5 | 6 | ---------- 7 | 8 | 9 | > **Stolen CPU** is a metric that’s often looked at but can be hard to understand. It implies some malevolent intent from your virtual neighbors. **In reality, it is a relative measure of the cycles a CPU should have been able to run but could not due to the `hypervisor` diverting cycles away from the instance.** From the point of view of your application, stolen CPU cycles are cycles that your application could have used. 10 | 11 | - 和可能(malevolent intent)有恶意的 virtual neighbors 有关; 12 | - Stolen CPU 代表本应该能够运行却(由于 hypervisor 的缘故)无法运行的 CPU 时间; 13 | 14 | > Some of these `diverted cycles stem` from the **hypervisor enforcing a quota based on the `ECU` you have purchased**. 
In other cases, such as the one shown below, the amount of diverted or stolen CPU cycles varies over time, presumably due to **other instances on the same physical hardware also requesting CPU cycles from the underlying hardware.** 15 | 16 | 导致 Stolen CPU 的两种情况: 17 | 18 | - hypervisor 基于你所购买的 `ECU` 进行 quota 限定; 19 | - 位于同一台物理硬件上的其他 instances 同时在请求 CPU cycles 导致; 20 | 21 | ## Seeing AWS stolen CPU 22 | 23 | > Here is a graph of CPU usage on a host with `stolen` CPU in **yellow**. **Light blue** denotes “`idle`” or available cycles, **purple** denotes “`user`” or cycles spent executing application code, and **dark blue** denotes “`system`” or cycles spent executing kernel code. In this case we can see that the amount of “**stolen**” CPU is clearly visible. 24 | 25 | ![](https://datadog-prod.imgix.net/img/blog/understanding-aws-stolen-cpu-and-how-it-affects-your-apps/aws-stolen-cpu-seeing-it.png) 26 | 27 | 查看 CPU 各项的内容; 28 | 29 | 30 | ## Are noisy neighbors stealing from you? 31 | 32 | > Let us now find out whether **other tenants** on the same machine can affect the amount of stolen CPU. The following graphs show the amount of **stolen CPU** (top) and the amount of **idle CPU** (bottom), both measured in percent of all CPU cycles for the same machine at the same time. CPU usage is sampled from the operating system of the instance every 15 seconds. 33 | 34 | ![](https://datadog-prod.imgix.net/img/blog/understanding-aws-stolen-cpu-and-how-it-affects-your-apps/aws-stolen-system-cpu-metric-show.png) 35 | 36 | > The interesting part occurs when `idle` CPU reaches zero. All cycles have been accounted for, either doing useful work (not represented here) or being taken away by the hypervisor (the stolen CPU graph). 37 | 38 | 当 idle 为 0 时: 39 | 40 | - 要么 CPU 被用于完成 useful work 41 | - 要么 CPU 被 hypervisor 给 steal 了 42 | 43 | > Notice that at in the highlighted sections, the amount of stolen CPU differs from **30% (red)** to **50% (purple)**. If the **ECU quota** were the only thing causing CPU to be stolen, we should expect the stolen CPU to be equal at these two points in time. 44 | 45 | 这里反证了:如果 ECU quota 是唯一导致 stolen CPU 的前提条件的话,那么应该有一致的数值输出,然并卵; 46 | 47 | ## How AWS stolen CPU affects your applications 48 | 49 | > **Stolen CPU is particularly problematic for CPU intensive applications** which will more frequently attempt to run at or near the threshold for ECUs you’ve purchased. These applications use more cycles and have idle CPU at zero more often. 50 | 51 | 对于 CPU 密集型应用更容易收到 Stolen CPU 问题影响; 52 | 53 | > **If the hypervisor is simply enforcing a quota based on ECUs you’ve purchased, you would expect consistent values of stolen CPU when idle CPU approaches zero.** If this is case, the limitation set by your instance quota would slow down your application in a consistent and predictable way. You are getting the ECUs you purchased and can increase application performance by upgrading your instance. 54 | 55 | 针对非实际情况的讨论(反证); 56 | 57 | > **If instead you’re seeing varying values for stolen CPU it’s likely that you and other tenants are requesting more CPU cycles than are available on your hardware.** Because you have no insight into other tenants CPU usage behavior, it can be difficult or impossible to predict the amount of stolen CPU you will see as your application approaches idle CPU of zero. In this case, you are not consistently getting the ECUs you purchased and the changing CPU availability means you can’t predictably forecast performance for your application. 
58 | 59 | 针对实际情况的讨论: 60 | 61 | - 无法实际看到其他租户(tenants)的 CPU 使用情况; 62 | - 当你的应用看到 idle CPU 接近 0 时 ,很难或无法预测 stolen CPU 准确数量; 63 | - 因此无法真正得到与购买的 ECUs 匹配的算力,以及即使升级也无法准确判定性能改善的量; 64 | 65 | ##How to resolve EBS performance issues from AWS stolen CPU 66 | 67 | - Buy more powerful EC2 instances 68 | - Baseline your application compute needs 69 | - Profile your app on an EC2 instance before finalizing a deploy decision 70 | - Re-deploy your application in another instance 71 | -------------------------------------------------------------------------------- /Zsh 和 iTerm2 的配置问题.md: -------------------------------------------------------------------------------- 1 | # Zsh 和 iTerm2 的配置问题 2 | 3 | 常见的组合推荐:**agnoster 主题 + Solarized 配色方案 + Powerline 字体** 4 | 5 | 6 | ## 切换 zsh 主题 7 | 8 | 编辑 `~/.zshrc` 9 | 10 | ``` 11 | ZSH_THEME="robbyrussell" 12 | 改为 13 | ZSH_THEME="agnoster" 14 | ``` 15 | 16 | 重新打开 iTerm2 ; 17 | 18 | > 注意:此时会有奇怪字符出现,因为尚未安装好相应的 Powerline-patched 字体; 19 | 20 | 21 | 根据 [agnoster/agnoster-zsh-theme](https://github.com/agnoster/agnoster-zsh-theme) 中的说明可知,该主题是专门为使用 **Solarized 配色方案** + **Git** + 22 | **Unicode-compatible 字体和终端**的人设计的; 23 | 24 | 对于 Mac 用户来说,一般会建议使用 iTerm 2 + Solarized Dark + 能够正确显示 `echo "\ue0b0 \u00b1 \ue0a0 \u27a6 \u2718 \u26a1 \u2699"` 命令输出结果的字体; 25 | 26 | 27 | ## [为 iTerm2 安装一个 Solarized 配色方案](https://github.com/altercation/solarized/tree/master/iterm2-colors-solarized) 28 | 29 | 特点:颜色柔和,看着不累; 30 | 31 | 问题:模拟 fish shell 提示功能的 zsh-autosuggestions 的提示的颜色被遮挡,看不到提示信息; 32 | 33 | ``` 34 | $ mkdir tmpfiles 35 | $ cd tmpfiles 36 | $ wget https://raw.githubusercontent.com/altercation/solarized/master/iterm2-colors-solarized/Solarized%20Dark.itermcolors 37 | $ wget https://raw.githubusercontent.com/altercation/solarized/master/iterm2-colors-solarized/Solarized%20Light.itermcolors 38 | $ open . 39 | 40 | # 双击对应文件就可以安装到 iterm2 中 41 | # 或 42 | # 打开 iTerm2 并通过 iTerm2 -> Preferences -> Profiles -> Colors -> Color Presets... -> Import 导入上述下载好的 Solarized 配色方案 43 | # 44 | # 安装后,需要选中对应主题才会生效 45 | ``` 46 | 47 | ## 安装 Powerline 字体 48 | 49 | > 解决 agnoster 主题中出现的奇怪字符; 50 | 51 | 几点说明: 52 | 53 | - 特殊的字符都是对字体 patch 后才有的; 54 | - 别人已经 patch 好的字体:[powerline/fonts](https://github.com/powerline/fonts) (推荐使用); 55 | - 自行对字体进行 patch 的工具:[powerline/fontpatcher](https://github.com/powerline/fontpatcher) (别自找麻烦了); 56 | 57 | 58 | 字体安装(参考:https://github.com/powerline/fonts) 59 | 60 | ``` 61 | $ cd tmpfiles 62 | $ git clone https://github.com/powerline/fonts.git --depth=1 63 | $ cd fonts/ 64 | $ ./install.sh 65 | # 字体文件会被安装到 $HOME/Library/Fonts 目录下面(mac 上)或 $HOME/.local/share/fonts 目录下面(linux 上) 66 | $ cd .. 
67 | # 成功安装后,清理之前下载的 github 项目文件 68 | $ rm -rf fonts 69 | # 安装好的字体自动会出现在 iTerm2 -> Preferences -> Profiles -> Text -> Font -> Change Font 下的列表中,自行选择相应的字体即可 70 | ``` 71 | 72 | 73 | ## 调整 iTerm2 设置 74 | 75 | - 选择配色方案; 76 | - 选择 Font 和 Non-ASCII Font ; 77 | - 把 Text Rendering 里的 "Draw bold text in bright colors" 给去掉(否则可能导致 ls 时列表无着色); 78 | 79 | 80 | ---------- 81 | 82 | ## 我的选择 83 | 84 | > (*) 为推荐项 85 | 86 | 87 | ZSH Themes 88 | 89 | - robbyrussell (*) (default) 90 | - agnoster (*) 91 | - miloshadzic 92 | - pygmalion 93 | 94 | 95 | 96 | Color Presets 97 | 98 | - Dark Background 99 | - Solarized Light (*) 100 | - Tango Dark (*) 101 | 102 | Font 103 | 104 | - Ayuthaya 105 | - Courier New 106 | - D2Coding for Powerline (*) 107 | - Menlo 108 | - Meslo LG S DZ for Powerline 109 | - Monaco (*) 110 | - Roboto Mono Light for Powerline 111 | 112 | 113 | Non-ASCII Font 114 | 115 | - D2Coding for Powerline (*) 116 | - Meslo LG S DZ for Powerline 117 | - Roboto Mono Light for Powerline (*) 118 | 119 | 120 | 121 | ---------- 122 | 123 | 124 | ## [powerline/powerline](https://github.com/powerline/powerline) 125 | 126 | 127 | Powerline is a **statusline plugin** for `vim`, and provides statuslines and prompts for several other applications, including `zsh`, `bash`, `tmux`, IPython, Awesome and Qtile. 128 | 129 | - 安装配置文档:https://powerline.readthedocs.io/en/latest/ 130 | - 配套使用的 pre-patched 字体:[`powerline/fonts`](https://github.com/powerline/fonts) 131 | - Powerline 衍生品:[`vim-airline`](https://github.com/vim-airline/vim-airline) 132 | 133 | ## [powerline/fonts](https://github.com/powerline/fonts) 134 | 135 | Patched fonts for Powerline users. 136 | 137 | 快速安装: 138 | 139 | ``` 140 | # clone 141 | git clone https://github.com/powerline/fonts.git --depth=1 142 | # install 143 | cd fonts 144 | ./install.sh 145 | # clean-up a bit 146 | cd .. 147 | rm -rf fonts 148 | ``` 149 | 150 | 详细安装文档: 151 | 152 | - for **linux**: https://powerline.readthedocs.io/en/latest/installation/linux.html#fonts-installation 153 | - for **OS X**: https://powerline.readthedocs.io/en/latest/installation/osx.html 154 | 155 | iTerm2 users need to set both the **Regular font** and the **Non-ASCII Font** in "`iTerm > Preferences > Profiles > Text`" to use a patched font (per this issue). 156 | 157 | > In some distributions, Terminess Powerline is ignored by default and must be explicitly allowed. A fontconfig file is provided which enables it. Copy this file from the fontconfig directory to your home folder under `~/.config/fontconfig/conf.d` (create it if it doesn't exist) and re-run `fc-cache -vf`. 
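上面这段说明对应的操作大致如下(仅为示意:假设 powerline/fonts 仓库仍保留在当前目录的 `fonts/` 下,具体 conf 文件名以仓库中 fontconfig 目录的实际内容为准):

```
# 创建 fontconfig 配置目录(如果不存在)
mkdir -p ~/.config/fontconfig/conf.d
# 将仓库 fontconfig 目录下的配置文件拷贝过去
cp fonts/fontconfig/*.conf ~/.config/fontconfig/conf.d/
# 重建字体缓存
fc-cache -vf
```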
158 | 159 | 160 | ## [vim-airline/vim-airline](https://github.com/vim-airline/vim-airline) 161 | 162 | 163 | lean & mean status/tabline for vim that's light as air 164 | 165 | 166 | 167 | -------------------------------------------------------------------------------- /git 命令之 rebase:reset:reflog 使用.md: -------------------------------------------------------------------------------- 1 | # git 命令之 rebase/reset/reflog 使用 2 | 3 | 4 | ## 主要命令 5 | 6 | ``` 7 | git rebase -i HEAD~4 8 | git reset --hard 9 | git reflog 10 | ``` 11 | 12 | ## 实验过程 13 | 14 | ### 当前 git log 状态 15 | 16 | 17 | ``` 18 | git log --graph --all --format=format:'%C(bold blue)%h%C(reset) - %C(bold green)(%ar)%C(reset) %C(white)%s%C(reset) %C(bold white)— %an%C(reset)%C(bold yellow)%d%C(reset)' --abbrev-commit --date=relative 19 | ``` 20 | 21 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%201.png) 22 | 23 | ### 调整最后 4 次 commit 的 message 内容 24 | 25 | ``` 26 | git rebase -i HEAD~4 27 | ``` 28 | 29 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%202.png) 30 | 31 | 差别对比: 32 | 33 | - 直接以 `:q` 退出时,得到 34 | 35 | ``` 36 | Successfully rebased and updated refs/heads/master. 37 | ``` 38 | 39 | 并会在 `git reflog` 输出中增加 40 | 41 | ``` 42 | 53775be (HEAD -> master) HEAD@{0}: rebase -i (finish): returning to refs/heads/master 43 | 53775be (HEAD -> master) HEAD@{1}: rebase -i (start): checkout HEAD~4 44 | ``` 45 | 46 | - 删除所有未被 `#` 注释的内容后退出,得到 47 | 48 | ``` 49 | Nothing to do 50 | ``` 51 | 52 | 不会在 `git reflog` 输出中增加任何内容; 53 | 54 | 结论:后一种方式才是推荐使用的正确的 abort 当前 rebase 的方式; 55 | 56 | ### 尝试直接修改 57 | 58 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%203.png) 59 | 60 | ### 触发 reword 命令 61 | 62 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%204.png) 63 | 64 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%205.png) 65 | 66 | ### 触发 edit 命令 67 | 68 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%206.png) 69 | 70 | ### 手动执行 git commit --amend 进行 commit 修改 71 | 72 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%207.png) 73 | 74 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%208.png) 75 | 76 | ### 手动执行 git rebase --continue 恢复 rebase 执行 77 | 78 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%209.png) 79 | 80 | ### 观察对比 rebase 前后 hash 值的变化 81 | 82 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2010.png) 83 | 84 | ### 从 reflog 中查看整个 rebase 的执行过程 85 | 86 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2011.png) 87 | 88 | ### 再次进行 rebase 调整 89 | 90 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2012.png) 91 | 92 | ### 第一次新调整 93 | 94 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2013.png) 95 | 96 | ### 调整顺序后触发冲突 97 | 98 | 
![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2014.png) 99 | 100 | ### 操作尝试:直接 abort 当前 rebase 操作 101 | 102 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2015.png) 103 | 104 | > 可以看出状态直接回到了未进行 rebase 前; 105 | 106 | ### 操作尝试:使用 skip 命令跳过冲突 commit 107 | 108 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2017.png) 109 | 110 | > 可以看到能够完成其他 rebase 命令动作; 111 | 112 | ### 执行 skip 命令后的其他 rebase 命令操作 113 | 114 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2016.png) 115 | 116 | ### 从 git log 中可以看到 commit 被成功合并 117 | 118 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2018.png) 119 | 120 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2019.png) 121 | 122 | ### 从目标文件内容可以看出被 skip 掉的 commit 对应的内容已丢失 123 | 124 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2020.png) 125 | 126 | ### 从 reflog 中查看再次 rebase 的执行过程 127 | 128 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2021.png) 129 | 130 | 131 | ### 在 rebase 过程中删除指定 commit 132 | 133 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2022.png) 134 | 135 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2023.png) 136 | 137 | ### 删除后的效果 138 | 139 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2024.png) 140 | 141 | ### 直接通过 reset 命令实现删除 commit 的效果 142 | 143 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/git%20rebase_reset_reflog%20usage%20-%2025.png) 144 | 145 | 差别对比: 146 | 147 | - 基于 rebase 方式进行 commit 删除,粒度更加细致,能够在删除的过程中同时完成其他调整; 148 | - 基于 reset 方式进行 commit 删除,适用于简单的回退操作; 149 | 150 | 151 | 152 | 153 | -------------------------------------------------------------------------------- /gosu 使用.md: -------------------------------------------------------------------------------- 1 | # gosu 使用 2 | 3 | github 地址:https://github.com/tianon/gosu 4 | 5 | 一句话介绍:Simple Go-based setuid+setgid+setgroups+exec 6 | 7 | ## 解决的问题 8 | 9 | > This is a simple tool grown out of the simple fact that **`su` and `sudo` have very strange and often annoying TTY and signal-forwarding behavior**. They're also somewhat **complex to setup and use** (especially in the case of `sudo`), which allows for a great deal of expressivity, but falls flat if all you need is "**run this specific application as this specific user and get out of the pipeline**". 10 | 11 | - su 和 sudo 在处理 tty 和 signal 转发相关的东西时会有奇怪的表现; 12 | - 由于 expressivity 太过丰富,su 和 sudo 在使用时比较复杂;在“使用指定用户身份运行特定应用后,从 pipeline 中脱离”的场景中很难使用; 13 | 14 | ## 工作原理 15 | 16 | > The core of how `gosu` works is stolen directly from how `Docker/libcontainer` itself starts an application inside a container (and in fact, is using the `/etc/passwd` processing code directly from libcontainer's codebase). 
17 | 18 | gosu 工作原理的核心来自于 `Docker/libcontainer` 中使用的、在容器中启动应用的方法; 19 | 20 | > Once the user/group is processed, we switch to that user, then we `exec` the specified process and `gosu` itself is no longer resident or involved in the process lifecycle at all. This avoids all the issues of signal passing and TTY, and punts them to the process invoking `gosu` and the process being invoked by `gosu`, where they belong. 21 | 22 | 一旦 user/group 的相关处理完成后,则成功切换到对应的 user 身份,之后再通过 `exec` 执行指定的进程;而 `gosu` 本身在业务进程的整个生命周期中将不会存在;通过这种方式,避免了 signal 处理和 tty 问题; 23 | 24 | > Additionally, due to the fact that `gosu` is using Docker's own code for processing these `user:group`, it has exact 1:1 parity with Docker's own `--user` flag. 25 | 26 | > If you're curious about the edge cases that gosu handles, see [Dockerfile.test](https://github.com/tianon/gosu/blob/master/Dockerfile.test) for the "test suite" (and the associated [test.sh](https://github.com/tianon/gosu/blob/master/test.sh) script that wraps this up for testing arbitrary binaries). 27 | 28 | ## 使用场景 29 | 30 | > The core use case for `gosu` is to step down from `root` to a non-privileged user during container startup (specifically in the `ENTRYPOINT`, usually). 31 | 32 | `gosu` 的核心使用场景:在容器启动的过程中(尤其是用于 `ENTRYPOINT`),将 `root` 身份降权为非特权用户身份; 33 | 34 | > Uses of `gosu` beyond that could very well suffer from vulnerabilities such as CVE-2016-2779 (from which the Docker use case naturally shields us); see tianon/gosu#37 for some discussion around this point. 35 | 36 | 超出该使用范围,则可能容易受到漏洞问题影响; 37 | 38 | ## 安装方法 39 | 40 | - [Docker 内安装](https://github.com/tianon/gosu/blob/master/INSTALL.md) 41 | - [宿主机安装](https://github.com/moooofly/scaffolding/blob/master/gosu_setup.sh) 42 | 43 | ## 用法 44 | 45 | ``` 46 | $ gosu 47 | Usage: ./gosu user-spec command [args] 48 | ie: ./gosu tianon bash 49 | ./gosu nobody:root bash -c 'whoami && id' 50 | ./gosu 1000:1 id 51 | 52 | ./gosu version: 1.1 (go1.3.1 on linux/amd64; gc) 53 | ``` 54 | 55 | ## gosu 使用示例 56 | 57 | - 基于 su 58 | 59 | ``` 60 | [#381#root@ubuntu-1604 ~]$docker run -it --rm ubuntu:trusty su -c 'exec ps aux' 61 | USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 62 | root 1 0.0 0.2 46640 2608 pts/0 Ss+ 06:57 0:00 su -c exec ps a 63 | root 6 0.0 0.2 15580 2112 ? Rs 06:57 0:00 ps aux 64 | [#382#root@ubuntu-1604 ~]$ 65 | ``` 66 | 67 | - 基于 sudo 68 | 69 | ``` 70 | [#382#root@ubuntu-1604 ~]$docker run -it --rm ubuntu:trusty sudo ps aux 71 | USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 72 | root 1 0.0 0.3 46024 3076 pts/0 Ss+ 06:58 0:00 sudo ps aux 73 | root 7 0.0 0.2 15580 2096 pts/0 R+ 06:58 0:00 ps aux 74 | [#383#root@ubuntu-1604 ~]$ 75 | ``` 76 | 77 | - 基于 gosu 78 | 79 | ``` 80 | [#383#root@ubuntu-1604 ~]$docker run -it --rm -v /usr/local/bin/gosu:/usr/local/bin/gosu:ro ubuntu:trusty gosu root ps aux 81 | USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 82 | root 1 0.0 0.0 7148 848 pts/0 Rs+ 06:59 0:00 ps aux 83 | [#384#root@ubuntu-1604 ~]$ 84 | ``` 85 | 86 | ## 和 gosu 类似的工具 87 | 88 | ### [su-exec](https://github.com/ncopa/su-exec) 89 | 90 | As mentioned in `INSTALL.md`, `su-exec` is a very minimal re-write of `gosu` in C, making for a much smaller binary, and is available in the main **Alpine** package repository. 
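下面是 `su-exec` 在 Alpine 环境下的一个使用示意,用法与 gosu 基本一致(`su-exec user-spec command [args]`);所用镜像与命令仅为示例:

```
docker run -it --rm alpine sh -c 'apk add --no-cache su-exec && su-exec nobody id'
```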
91 | 92 | 93 | ### chroot 94 | 95 | With the `--userspec` flag, `chroot` can provide similar benefits/behavior: 96 | 97 | ``` 98 | [#388#root@ubuntu-1604 ~]$docker run -it --rm ubuntu:trusty chroot --userspec=root / ps aux 99 | USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND 100 | root 1 0.0 0.0 7148 756 pts/0 Rs+ 07:09 0:00 ps aux 101 | [#389#root@ubuntu-1604 ~]$ 102 | ``` 103 | -------------------------------------------------------------------------------- /iostat cheatsheet.md: -------------------------------------------------------------------------------- 1 | # iostat cheatsheet 2 | 3 | Background: 4 | 5 | - `iostat` reports at **physics device**/**sector level** (i.e. beneath cache and IO scheduling) 6 | 7 | Overall utilization: 8 | 9 | - avg-cpu / `%iowait`: how busy is the CPU, or “amount of computation waiting for IO” 10 | - device / `%util`: **how busy is the device** 11 | 12 | Columns (listed per device): 13 | 14 | - `rrqm/s` `wrqm/s`: r/w requests **merged** per second 15 | - **Block IO subsystem** may merge **physically adjacent** requests 16 | - `r/s` `w/s`: number of (possibly merged) r/w requests 17 | - `rsec/s` `wsec/s`: number of sectors read/written 18 | - `iostat` can also be set to use units rkB/s wkB/s rMB/s wMB/s 19 | - `avgrq-sz`: **average size of request**, in #sectors 20 | - **larger size indicates sequential IO, smaller indicates random IO** 21 | - `avgqu-sz`: **average length of request queue**, i.e. average #request waiting to be served 22 | - `await`: average (**wait** + **serve**) time for a request, in ms 23 | - `r_await` / `w_await`: the same time for r/w requests 24 | - `svctm`: average **serve** time for a request, in ms 25 | 26 | > 需要重点关注 await 、svctm 以及 await - svctm 的数值关系;一般来讲,await 和 svctm 数值接近才是好的表现;如果 await 远超 svctm 的值,则说明 IO device 可能已经过载了; 27 | 28 | A monitor command with iostat: 29 | 30 | ``` 31 | # -x: extended report 32 | # -t: print time for report 33 | # -z: omit inactive devices 34 | # 1: report every 1 second 35 | 36 | $ iostat -xtz 1 37 | ``` 38 | 39 | ## Linux Disk IO Subsystem 40 | 41 | > a basic background knowledge of the Disk IO Subsystem. 42 | 43 | | Layer | Unit | Typical Unit Size | 44 | | -- | -- | -- | 45 | | User Space System Calls | read() , write() | | 46 | | Virtual File System Switch (VFS) | Block | 4096 Bytes | 47 | | Disk Caches | Page | | 48 | | Filesystem (For example ext3) | Blocks | 4096 Bytes (Can be set at FS creation) | 49 | | Generic Block Layer | Page Frames / Block IO Operations (bio) | | 50 | | I/O Scheduler Layer | bios per block device (Which this layer may combine) | | 51 | | Block Device Driver | Segment | 512 Bytes | 52 | | Hard Disk | Sector | 512 Bytes | 53 | 54 | 55 | 可以认为 iostat 工作在 I/O Scheduler Layer 之下 56 | 57 | > There are two basic system calls, `read()` and `write()`, that a user process can make to read data from a file system. In the kernel these are handled by the Linux **Virtual Filesystem Switch (VFS)**. VFS is an abstraction to all file systems so they look the same to the user space and it also handles the interface between the file system and the block device layer. The **caching layer** provides caching of disk reads and writes in terms of memory pages. The **generic block layer** breaks down IO operations that might involve many different non-contiguous blocks into multiple IO operations. The **I/O scheduling layer** takes these IO operations and schedules them based on order on disk, priority, and/or direction. 
Lastly, the **device driver** handles interfacing with the hardware for the actual operations in terms of disk sectors which are usually 512 bytes. 58 | 59 | `iostat` can break down the statistics at both the **partition level** and then **device level**. 60 | 61 | **Device utilization** is "The percentage of time the device spent servicing requests as opposed to being idle." 62 | 63 | It is import to note that **`iostat` shows requests to the device (or partition) and not read and write requests from user space**. So in the table above `iostat` is reading below the disk cache layer. Therefore, **`iostat` says noting about your cache hit ratio for block devices**. So it is possible that disk IO problems might be able to be resolved by memory upgrades. 64 | 65 | > iostat 展示的是针对 device 或 partition 的读写,不是来自用户空间的读写请求;iostat 没有给出关于 cache hit/miss 的相关信息; 66 | 67 | One of the main things to do when examining disk IO is **to determine if the disk access patterns are sequential or random**. This information can aid in our disk choices. When operations are random the seek time of the disk becomes more important. This is because physically the drive head has to jump around. Seek time is the measurement of the speed at which the heads can do this. **For small random reads solid state disks can be a huge advantage**. 68 | 69 | Snapshot of **Random Read** Test: 70 | 71 | ``` 72 | Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util 73 | sda 0.00 0.00 172.67 0.00 1381.33 0.00 8.00 0.99 5.76 5.76 99.47 74 | ``` 75 | 76 | Snapshot of **Sequential Read** Test: 77 | 78 | ``` 79 | Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util 80 | sda 13.00 0.00 367.00 0.00 151893.33 0.00 413.88 2.46 6.71 2.72 100.00 81 | ``` 82 | 83 | > 如何模拟顺序读和随机读; 84 | 85 | `rrqm/s` and `wrqm/s`, are read and write requests merged per second. I mentioned that that the scheduler can combine operations. This can be done **when multiple operations are physically adjacent to each other on the device**. So in sequential operation it would make sense to often see a large number of merges. In the snapshot of the random reads, we see no merges. However, the merging layer feels a little bit like “magic” and I don’t believe it is the best indicator of if the patterns are random or sequential. 86 | 87 | read and write requests to the device (`r/s`, `w/s`), followed by the amount of sectors read and written from the device (`rsec/s`, `wsec/s`), and then the size of each request (`avgrq-sz`). However the average request size does not differentiate between reads and writes. 88 | 89 | > avgrq-sz — average sectors per request (for both reads and writes). 90 | 91 | the average queue length of requests to the device (`avgqu-sz`), how long requests took to be serviced including their time in the queue (`await`), how long requests took to be serviced by the device after they left the queue (`svctm`), and lastly the utilization percentage (`util`). 92 | 93 | 94 | > await — average response time (ms) of IO requests to a device. 95 | > svctim — average time (ms) a device was servicing requests. This is a component of total response time of IO requests. 96 | 97 | 98 | Utilization (`util`), in more detail, is the `service time in ms * total IO operations / 1000 ms`. This gives the percentage of how busy the single disk was during the given time slice. 
99 | 100 | 101 | > `utilization = ( (read requests + write requests) * service time in ms / 1000 ms ) * 100%` 102 | > 103 | > or 104 | > 105 | > `%util = ( r + w ) * svctim /10` = ( 10.18 + 9.78 ) * 8.88 = 17.72448 106 | 107 | 108 | In the end it seems **average request size is the key to show if the disk usage patterns are random or not since this is post merging**. Taking this into the context of the layers above this might not mirror what an application is doing. This is because a read or write operations coming from user space might operate on a fragmented file in which case the generic block layer will break it up and it appears as random disk activity. 109 | 110 | 111 | ---------- 112 | 113 | 参考: 114 | 115 | - [Interpreting iostat Output](https://blog.serverfault.com/2010/07/06/777852755/) 116 | - [Basic I/O Monitoring On Linux](https://blog.pythian.com/basic-io-monitoring-on-linux/) 117 | - [Is there a way to get Cache Hit/Miss ratios for block devices in Linux?](https://serverfault.com/questions/157612/is-there-a-way-to-get-cache-hit-miss-ratios-for-block-devices-in-linux/157724#157724) 118 | - [9 Linux iostat Command Examples for Performance Monitoring](https://linoxide.com/monitoring-2/find-linux-disk-utilization-iostat/) 119 | - [5 TOOLS FOR MONITORING DISK ACTIVITY IN LINUX](https://www.opsdash.com/blog/disk-monitoring-linux.html) 120 | 121 | -------------------------------------------------------------------------------- /opencensus blog summary.md: -------------------------------------------------------------------------------- 1 | # opencensus blog summary 2 | 3 | ## [“Hello, world!” for web servers in Go with OpenCensus](https://medium.com/@orijtech/hello-world-for-web-servers-in-go-with-opencensus-29955b3f02c6) 4 | 5 | > Apr 18, 2018 6 | 7 | 基于输出 "Hello, world!" 的 web server 介绍了如何针对 HTTP 使用 opencensus 提供的东东; 8 | 9 | 主要涉及导出到 Stackdriver/Prometheus/AWS X-Ray 的基础代码编写; 10 | 11 | ## [OpenCensus’s journey ahead: platforms and languages](https://opensource.googleblog.com/2018/05/opencensus-journey-ahead-part-1.html) 12 | 13 | > May 7, 2018 14 | 15 | ### Istio 16 | 17 | OpenCensus will soon have out-of-the-box tracing and metrics collection in Istio. We’re currently working through our initial designs and implementation for integrations with the Envoy Sidecar and Istio Mixer service. Our goal is to provide Istio users with a great out of box tracing and metrics collection experience. 18 | 19 | ### Kubernetes 20 | 21 | We have two primary use cases in mind for Kubernetes deployments: providing cluster-wide visibility via z-pages, and better labeling of traces, stats, and metrics. Cluster-wide z-pages will allow developers to view telemetry in real time across an entire Kubernetes deployment, independently of their back-end. This is incredibly useful when debugging immediate high-impact issues like service outages. 22 | 23 | ## [How Google uses Census internally](https://opensource.googleblog.com/2018/03/how-google-uses-opencensus-internally.html) 24 | 25 | > March 7, 2018 26 | 27 | Google adopted or invented new technologies, including distributed tracing (Dapper) and metrics processing, in order to operate some of the world’s largest web services. However, building analysis systems didn’t solve the difficult problem of instrumenting and extracting data from production services. This is what Census was created to do. 
28 | 29 | The Census project provides uniform instrumentation across most Google services, capturing trace spans, app-level metrics, and other metadata like log correlations from production applications. One of the biggest benefits of uniform instrumentation to developers inside of Google is that it’s almost entirely automatic: any service that uses gRPC automatically collects and exports basic traces and metrics. 30 | 31 | ### Incident Management 32 | 33 | When latency problems or new errors crop up in a highly distributed environment, visibility into what’s happening is critical. For example, when the latency of a service crosses expected boundaries, we can view distributed traces in Dapper to find where things are slowing down. Or when a request is returning an error, we can look at the chain of calls that led to the error and examine the metadata captured during a trace (typically logs or trace annotations). This is effectively a bigger stack trace. In rare cases, we enable custom trigger-based sampling which allows us to focus on specific kinds of requests. 34 | 35 | Once we know there’s a production issue, we can use Census data to determine the regions, services, and scope (one customer vs many) of a given problem. You can use service-specific diagnostics pages, called “z-pages,” to monitor problems and the results of solutions you deploy. These pages are hosted locally on each service and provide a firehose view of recent requests, stats, and other performance-related information. 36 | 37 | ### Performance Optimization 38 | 39 | At Google’s scale, we need to be able to instrument and attribute costs for services. We use Census to help us answer questions like: 40 | 41 | - How much CPU time does my query consume? 42 | - Does my feature consume more storage resources than before? 43 | - What is the cost of a particular user operation at a particular layer of the stack? 44 | - What is the total cost of a particular user operation across all layers of the stack? 45 | 46 | We’re obsessed with reducing the tail latency of all services, so we’ve built sophisticated analysis systems that process traces and metrics captured by Census to identify regressions and other anomalies. 47 | 48 | ### Quality of Service 49 | 50 | Google also improves performance dynamically depending on the source and type of traffic. Using Census tags, traffic can be directed to more appropriate shards, or we can do things like load shedding and rate limiting. 51 | 52 | 53 | ## [The value of OpenCensus](https://opensource.googleblog.com/2018/03/the-value-of-opencensus.html) 54 | 55 | > March 13, 2018 56 | 57 | Google’s reasons for developing and promoting OpenCensus apply to partners at all levels. 58 | 59 | Service developers reap the benefits of having automatic traces and stats collection, along with vendor-neutral APIs for manually interacting with these. Developers who use open source backends like Prometheus or Zipkin benefit from having a single set of well-supported instrumentation libraries that can export to both services at once. 60 | 61 | For APM vendors, being able to take advantage of already-provided language support and framework integrations is huge, and the exporter API allows traces and metrics to be sent to an ingestion API without much additional work. Developers who might have been working on instrumentation code can now focus on other more important tasks, and vendors get traces and metrics back from places they previously didn’t have coverage for. 
62 | 63 | Cloud and API providers have the added benefit of being able to include OpenCensus in client libraries, allowing customers to gain insight into performance characteristics and debug issues without having to contact support. In situations where customers were still not able to diagnose their own issues, customer traces can be matched with internal traces for faster root cause analysis, regardless of which tracing or APM product they use. 64 | 65 | ## [OpenCensus with Prometheus and Kubernetes](https://kausal.co/blog/opencensus-prometheus-kausal/) 66 | 67 | > Jan 18, 2018 68 | 69 | This [example app](https://github.com/census-instrumentation/opencensus-go/blob/master/exporter/prometheus/example/main.go) shows how to take measurements, and then expose them to a `/metrics` endpoint for Prometheus. 70 | 71 | Note that the concept of an “exporter” here is different to exporters in Prometheus land, where it usually means a separate process that translates between an applications native metrics interface and the prometheus exposition format, running as a sidecar or colocated in some other way with the application. 72 | 73 | Before we can measure anything, the measure type needs to be registered. 74 | 75 | ``` 76 | videoCount = stats.Int64("example.com/measures/video_count", "number of processed videos", stats.UnitDimensionless) 77 | ``` 78 | 79 | To record a single measurement we need to obtain a measure reference either by creating a new one, or by finding one via FindMeasure(name). 80 | 81 | This increased the videoCount counter by 1. 82 | 83 | ``` 84 | stats.Record(ctx, videoCount.M(1)) 85 | ``` 86 | 87 | This recording would be ignored because we have not registered the measure as part of a view. 88 | 89 | ``` 90 | if err = view.Register( 91 | &view.View{ 92 | Name: "video_count", 93 | Description: "number of videos processed over time", 94 | Measure: videoCount, 95 | Aggregation: view.Count(), 96 | }, 97 | ... 98 | ); err != nil { 99 | log.Fatalf("Cannot register the view: %v", err) 100 | } 101 | ``` 102 | 103 | After a view like the above uses the videoCount measure, its recordings will be kept, and by calling Subscribe() it makes its metrics available to the exporters. 104 | 105 | ``` 106 | xxx.Subscribe() 107 | ``` 108 | 109 | > NOTE: 此处,文章中的说明和实际示例代码中已经不一致; 110 | 111 | New labels can be added via tags. 112 | 113 | ``` 114 | routeKey, err := tag.NewKey("route") 115 | if err != nil { 116 | log.Fatal(err) 117 | } 118 | 119 | tagMap, err2 := tag.NewMap(ctx, 120 | tag.Insert(routeKey, "/myroute"), 121 | ) 122 | if err2 != nil { 123 | log.Fatal(err2) 124 | } 125 | ctx = tag.NewContext(ctx, tagMap) 126 | ``` 127 | 128 | This augmented context is later used in the stats.Record() call where it passes on those tags as labels. 129 | 130 | 131 | 132 | 133 | 134 | ## [OpenCensus: A Stats Collection and Distributed Tracing Framework](https://opensource.googleblog.com/2018/01/opencensus.html) 135 | 136 | > January 17, 2018 137 | 138 | OpenCensus, a vendor-neutral open source library for metric collection and tracing. OpenCensus is built to add minimal overhead and be deployed fleet wide, especially for microservice-based architectures. 139 | 140 | OpenCensus is the open source version of Google’s Census library, written based on years of optimization experience. It aims to make the collection and submission of app metrics and traces easier for developers. 
It is a vendor neutral, single distribution of libraries that automatically collects traces and metrics from your app, displays them locally, and sends them to analysis tools. 141 | 142 | Developers can use this powerful, out-of-the box library to instrument microservices and send data to any supported backend. For an Application Performance Management (APM) vendor, OpenCensus provides free instrumentation coverage with minimal work, and affords customers a simple setup experience. -------------------------------------------------------------------------------- /opencensus-go 之 zPages.md: -------------------------------------------------------------------------------- 1 | # opencensus 之 zPages 2 | 3 | ## zPages 4 | 5 | > Ref: https://opencensus.io/core-concepts/z-pages/ 6 | 7 | OpenCensus provides in-process web pages that displays collected data from the process. These pages are called `zPages` and they are useful to see collected data from a specific process without having to depend on any metric collection or distributed tracing backend. 8 | 9 | `zPages` can be useful during the development time or when the process to be inspected is known in production. `zPages` can also be used to debug exporter issues. 10 | 11 | In order to serve `zPages`, register their handlers and start a web server. Below, there is an example how to serve these pages from 127.0.0.1:7777. 12 | 13 | ```golang 14 | package main 15 | 16 | import ( 17 | "log" 18 | "net/http" 19 | "go.opencensus.io/zpages" 20 | ) 21 | 22 | func main() { 23 | // Using the default serve mux, but you can create your own 24 | mux := http.DefaultServeMux 25 | zpages.Handle(mux, "/") 26 | log.Fatal(http.ListenAndServe("127.0.0.1:7777", mux)) 27 | } 28 | ``` 29 | 30 | Once handler is registered, there are various pages provided from the libraries: 31 | 32 | - 127.0.0.1:7777/rpcz 33 | - 127.0.0.1:7777/tracez 34 | 35 | ### rpcz 36 | 37 | `/rpcz` serves stats about sent and received RPCs. 38 | 39 | Available stats include: 40 | 41 | - Number of RPCs made per minute, hour and in total. 42 | - Average latency in the last minute, hour and since the process started. 43 | - RPCs per second in the last minute, hour and since the process started. 44 | - Input payload in KB/s in the last minute, hour and since the process started. 45 | - Output payload in KB/s in the last minute, hour and since the process started. 46 | - Number of RPC errors in the last minute, hour and in total. 47 | 48 | ### tracez 49 | 50 | `/tracez` serves details about the trace spans collected in the process. It provides several sample spans per latency bucket and sample errored spans. 51 | 52 | ## code review 53 | 54 | > TODO 55 | -------------------------------------------------------------------------------- /opencensus-go 研究.md: -------------------------------------------------------------------------------- 1 | # opencensus-go 研究 2 | 3 | github: [opencensus-go](https://github.com/census-instrumentation/opencensus-go) 4 | 5 | 一句话:A stats collection and distributed tracing framework. 6 | 7 | > OpenCensus Go is a Go implementation of OpenCensus, a toolkit for collecting application performance and behavior monitoring data. Currently it consists of three major components: **tags**, **stats**, and **tracing**. 
8 | 9 | - opencensus-go 能够针对应用性能进行数据收集,针对应用行为进行监控; 10 | - opencensus-go 目前主要由以下三部分构成 11 | - tags 12 | - stats 13 | - tracing 14 | 15 | ## 安装 16 | 17 | ``` 18 | go get -u go.opencensus.io 19 | ``` 20 | 21 | 注意:不是 `go get -u github.com/census-instrumentation/opencensus-go`,原因在于 package 的引用问题; 22 | 23 | > The API of this project is still evolving. The use of vendoring or a dependency management tool is recommended. 24 | 25 | 官方建议使用依赖管理工具来避免代码变更导致的不兼容问题; 26 | 27 | ## 项目组件 28 | 29 | 目前已存在的、集成了 OpenCensus 的 RPC framework 30 | 31 | - net/http -- 重点关注 32 | - gRPC -- 重点关注 33 | - database/sql 34 | - Go kit 35 | - Groupcache 36 | - Caddy webserver 37 | - MongoDB 38 | - Redis gomodule/redigo 39 | - Redis goredis/redis 40 | - Memcache 41 | 42 | 使用上述 RPC framework 时,能够方便的生成 instrumentation data ; 43 | 44 | 目前已经存在的 Exporters 45 | 46 | - [Prometheus](https://godoc.org/go.opencensus.io/exporter/prometheus) for stats 47 | - [OpenZipkin](https://godoc.org/go.opencensus.io/exporter/zipkin) for traces 48 | - [Stackdriver](https://godoc.org/contrib.go.opencensus.io/exporter/stackdriver) Monitoring for stats and Trace for traces 49 | - [Jaeger](https://godoc.org/go.opencensus.io/exporter/jaeger) for traces 50 | - [AWS X-Ray](https://github.com/census-ecosystem/opencensus-go-exporter-aws) for traces 51 | - [Datadog](https://github.com/DataDog/opencensus-go-exporter-datadog) for stats and traces 52 | 53 | ## Overview 54 | 55 | ![](https://camo.githubusercontent.com/198c3be40c5fe727bf7d00877aacc6199d3eaf68/68747470733a2f2f692e696d6775722e636f6d2f636634456c48452e6a7067) 56 | 57 | > In a **microservices** environment, a user request may go through multiple services until there is a response. OpenCensus allows you to instrument your services and collect diagnostics data all through your services end-to-end. 58 | 59 | 在微服务环境下的使用; 60 | 61 | ## Tags 62 | 63 | > **Tags** represent propagated key-value pairs. They are **propagated** using `context.Context` **in the same process** or can be encoded to be transmitted **on the wire**. Usually, this will be handled by an integration plugin, e.g. `ocgrpc.ServerHandler` and `ocgrpc.ClientHandler` for gRPC. 64 | 65 | - tags 代表被传递的 key/value 对; 66 | - tags 可在同一个进程内部通过 `context.Context` 传递,也可以编码后跨进程传递; 67 | 68 | > Package `tag` allows **adding** or **modifying** tags in the current context. 69 | > 70 | > ``` 71 | > ctx, err = tag.New(ctx, 72 | > tag.Insert(osKey, "macOS-10.12.5"), 73 | > tag.Upsert(userIDKey, "cde36753ed"), 74 | > ) 75 | > if err != nil { 76 | > log.Fatal(err) 77 | > } 78 | > ``` 79 | 80 | 可以针对 tag 进行添加和修改; 81 | 82 | ## Stats 83 | 84 | > OpenCensus is a low-overhead framework even if instrumentation is always enabled. In order to be so, **it is optimized to make recording of data points fast and separate from the data aggregation**. 85 | > 86 | > OpenCensus **stats** collection happens in two stages: 87 | > 88 | > - Definition of **measures** and recording of data points 89 | > - Definition of **views** and aggregation of the recorded data 90 | 91 | stats 收集在两个 stage 上发生: 92 | 93 | - 定义 measures 时:记录 data points 时(关键词 **record**) 94 | - 定义 views 时:针对记录(数据)进行聚合时(关键词 **snapshot**) 95 | 96 | ### Recording 97 | 98 | > **Measurements** are data points associated with a **measure**. 
**Recording** implicitly tags the set of Measurements with the tags from the provided context: 99 | > ``` 100 | > stats.Record(ctx, videoSize.M(102478)) 101 | > ``` 102 | 103 | - **Measurements** 对应了与某个 measure 关联的 recorded data points 集合; 104 | - **Recording** 对应了基于 tags 对 Measurements 进行的分组; 105 | 106 | ### Views 107 | 108 | > **Views** are how Measures are aggregated. You can think of them as **queries** over the set of recorded data points (**measurements**). 109 | > 110 | > **Views** have two parts: 111 | > 112 | > - the **tags** to group by, and 113 | > - the **aggregation type** used. 114 | > 115 | > Currently three types of aggregations are supported: 116 | > 117 | > - **CountAggregation** is used to count the number of times a sample was recorded. 118 | > - **DistributionAggregation** is used to provide a histogram of the values of the samples. 119 | > - **SumAggregation** is used to sum up all sample values. 120 | > 121 | > ``` 122 | > countAgg := view.Count() 123 | > distAgg := view.Distribution(0, 1<<32, 2<<32, 3<<32) 124 | > sumAgg := view.Sum() 125 | > ``` 126 | 127 | - Views 对应了 Measures 被聚合的方式;可以认为 view 是针对 measurements 的 query 查询; 128 | - Views 由两部分构成 129 | - 用于进行 group by 的 tags 130 | - 所使用的 aggregation 类型 131 | - aggregations 类型 132 | - **CountAggregation** 用于统计样本被 record 的次数; 133 | - **DistributionAggregation** 用于提供样本数值分布的 histogram ; 134 | - **SumAggregation** 用于对所有样本值进行求和; 135 | 136 | 137 | > Here we create a **view** with the **DistributionAggregation** over our **measure**. 138 | > 139 | > ``` 140 | > if err := view.Register(&view.View{ 141 | > Name: "example.com/video_size_distribution", 142 | > Description: "distribution of processed video size over time", 143 | > Measure: videoSize, 144 | > Aggregation: view.Distribution(0, 1<<32, 2<<32, 3<<32), 145 | > }); err != nil { 146 | > log.Fatalf("Failed to register view: %v", err) 147 | > } 148 | > ``` 149 | > 150 | > **Register** (`view.Register`) begins collecting data for the view. Registered views' data will be exported via the registered exporters. 151 | 152 | 通过 `view.Register` 进行 view 注册后,将触发针对该 view 的数据收集;与注册 view 相关的数据会通过已注册的 exporter 导出到外部; 153 | 154 | ## Traces 155 | 156 | > A **distributed trace** tracks the progression of a single user request as it is handled by the services and processes that make up an application. Each step is called a **span** in the **trace**. Spans include **metadata** about the step, including especially the time spent in the step, called the **span’s latency**. 157 | 158 | - trace 由 span 构成,用于跟踪指定用户请求的完整处理过程; 159 | - span 中会包含关于当前 step 的各种 metadata 数据;其中 span 产生的 latency 是我们重点关注的; 160 | 161 | ### Spans 162 | 163 | > **Span** is the unit step in a **trace**. Each span has a **name**, **latency**, **status** and **additional metadata**. 164 | > 165 | > Below we are starting a span for a cache read and ending it when we are done: 166 | > 167 | > ``` 168 | > ctx, span := trace.StartSpan(ctx, "cache.Get") 169 | > defer span.End() 170 | > 171 | > // Do work to get from cache. 172 | > ``` 173 | 174 | 每个 span 都具有以下属性: 175 | 176 | - name 177 | - latency 178 | - status 179 | - additional metadata 180 | 181 | 182 | ### Propagation 183 | 184 | > Spans can have **parents** or can be **root spans** if they don't have any parents. The current span is propagated **in-process** and **across the network** to allow associating new child spans with the parent. 185 | > 186 | > **In the same process**, `context.Context` is used to propagate spans. 
`trace.StartSpan` creates a new span as a root if the current context doesn't contain a span. Or, it creates a child of the span that is already in current context. The returned context can be used to keep propagating the newly created span in the current context. 187 | > 188 | > **Across the network**, OpenCensus provides different propagation methods for different protocols. 189 | > 190 | > - **gRPC integrations** uses the **OpenCensus**' [binary propagation format](https://godoc.org/go.opencensus.io/trace/propagation). 191 | > - **HTTP integrations** uses **Zipkin**'s [B3](https://github.com/openzipkin/b3-propagation) by default but can be configured to use a custom propagation method by setting another [propagation.HTTPFormat](https://godoc.org/go.opencensus.io/trace/propagation#HTTPFormat). 192 | 193 | - Spans 具有父子关系; 194 | - 在同一个进程内,通过使用 `context.Context` 进行 span 信息的 propagate ; 195 | - 在跨网络传递时,OpenCensus 提供了不同的 propagation 方法: 196 | - **gRPC integrations** 使用的是 **OpenCensus** 自己实现的 [binary propagation format](https://godoc.org/go.opencensus.io/trace/propagation) ; 197 | - **HTTP integrations** 默认使用的是 **Zipkin** 的 [B3](https://github.com/openzipkin/b3-propagation) ,但可以通过设置 [propagation.HTTPFormat](https://godoc.org/go.opencensus.io/trace/propagation#HTTPFormat) 配置成使用自定义 propagation 方法 ; 198 | 199 | 200 | ---------- 201 | 202 | 203 | ## 代码阅读 204 | 205 | 206 | 在 `stats/measure.go` 中有 207 | 208 | > **Measure** represents a single numeric value to be tracked and recorded. For example, latency, request bytes, and response bytes could be measures to collect from a server. 209 | > 210 | > Measures by themselves have no outside effects. **In order to be exported, the measure needs to be used in a View**. If no Views are defined over a measure, there is very little cost in recording it. 211 | 212 | > **Measurement** is the numeric value measured when recording stats. Each measure provides methods to create measurements of their kind. For example, `Int64Measure` provides `M` to convert an `int64` into a measurement. 213 | 214 | 在 `stats/view/view.go` 中有 215 | 216 | > **View** allows users to aggregate the recorded `stats.Measurements`. **Views need to be passed to the Register function to be before data will be collected and sent to Exporters**. 217 | 218 | 在 `trace/basetypes.go` 中有 219 | 220 | > **Annotation** represents a text annotation with a set of attributes and a timestamp. 221 | 222 | > **Attribute** represents a key-value pair on a span, link or annotation. Construct with one of: BoolAttribute, Int64Attribute, or StringAttribute. 223 | 224 | > **Link** represents a reference from one span to another span. 225 | 226 | 227 | 在 `trace/trace.go` 中有 228 | 229 | > **Span** represents a span of a trace. It has an associated SpanContext, and stores data accumulated while the span is active. 230 | 231 | > **SpanData** representing the current state of the Span. 
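Putting the pieces above together, here is a minimal end-to-end sketch based on the API calls quoted in this note (the measure, view and tag names are made up for illustration, and no exporter is wired up, so this is not a complete production setup):

```
package main

import (
	"context"
	"log"
	"time"

	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
	"go.opencensus.io/trace"
)

// videoSize is a hypothetical measure, reusing the naming from the snippets quoted above.
var videoSize = stats.Int64("example.com/measure/video_size", "size of processed videos", "By")

func main() {
	// A tag key used for grouping; the name is illustrative.
	osKey, err := tag.NewKey("example.com/keys/client-os")
	if err != nil {
		log.Fatal(err)
	}

	// 1. Register a view: data collection for the measure only starts here,
	//    and only registered views are handed to registered exporters.
	if err := view.Register(&view.View{
		Name:        "example.com/views/video_size_distribution",
		Description: "distribution of processed video size over time",
		Measure:     videoSize,
		TagKeys:     []tag.Key{osKey},
		Aggregation: view.Distribution(0, 1<<16, 1<<32),
	}); err != nil {
		log.Fatalf("Failed to register view: %v", err)
	}

	// 2. Put a tag into the context; in-process propagation happens via context.Context.
	ctx, err := tag.New(context.Background(), tag.Insert(osKey, "macOS-10.12.5"))
	if err != nil {
		log.Fatal(err)
	}

	// 3. Start a span and record a measurement inside it.
	ctx, span := trace.StartSpan(ctx, "example.com/ProcessVideo")
	defer span.End()

	stats.Record(ctx, videoSize.M(25648))

	// No exporter is registered in this sketch, so nothing is actually exported;
	// real code would first register e.g. a Prometheus or Jaeger exporter.
	time.Sleep(time.Second)
}
```

Note the ordering: the view is registered before any `stats.Record` call — as quoted above, measurements recorded against a measure with no registered view are effectively discarded.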
232 | 233 | -------------------------------------------------------------------------------- /shell 脚本中颜色输出问题.md: -------------------------------------------------------------------------------- 1 | # shell 脚本中颜色输出问题 2 | 3 | > 写在开始: 4 | > 5 | > - 若使用 `#!/bin/sh` ,由于其链接到 `dash` 上,因此在使用 echo 输出时不需要指定 -e 选项; 6 | > - 若使用 `#!/bin/bash` ,则必须指定 -e 选项才行; 7 | 8 | 输出示例 9 | 10 | ``` 11 | echo -e "\e[1;34mThis is a blue text.\e[0m" 12 | ``` 13 | 14 | 控制开关 15 | 16 | - \e[`attribute code`;`text color code`;`background color code`m 17 | - \e[0m 18 | 19 | > 经确认,`attribute code`、`text color code` 和 `background color code` 的顺序没有严格要求,因为数值范围不同,不会导致错误; 20 | 21 | | Attribute codes | 英文 | 中文 | 22 | | -- | -- | -- | 23 | | 00 | none | 重新设置属性到缺省设置 | 24 | | 01 | bold | 设置粗体 | 25 | | 02 | | 设置一半亮度 | 26 | | 03 | | 设置斜体 | 27 | | 04 | underscore | 设置下划线 | 28 | | 05 | blink | 设置闪烁 | 29 | | 07 | reverse | 反白显示 | 30 | | 08 | concealed | 不可见/消隐 | 31 | 32 | | Text color codes | 英文 | 中文 | 33 | | -- | -- | -- | 34 | | 30 | black | 黑色 | 35 | | 31 | red | 红色 | 36 | | 32 | green | 绿色 | 37 | | 33 | yellow | 黄色 | 38 | | 34 | blue | 蓝色 | 39 | | 35 | magenta | 紫红色 | 40 | | 36 | cyan | 青蓝色 | 41 | | 37 | white | 白色 | 42 | | 38 | | 在缺省的前景颜色上设置下划线 | 43 | | 39 | | 在缺省的前景颜色上关闭下划线 | 44 | 45 | | Background color codes | 英文 | 中文 | 46 | | -- | -- | -- | 47 | | 40 | black | 黑色 | 48 | | 41 | red | 红色 | 49 | | 42 | green | 绿色 | 50 | | 43 | yellow | 黄色 | 51 | | 44 | blue | 蓝色 | 52 | | 45 | magenta | 紫红色 | 53 | | 46 | cyan | 青蓝色 | 54 | | 47 | white | 白色 | 55 | 56 | 57 | 可以输出全部颜色的脚本(略微调整) 58 | 59 | ``` 60 | #/bin/bash 61 | 62 | for STYLE in 0 1 2 3 4 5 6 7; do 63 | for FG in 30 31 32 33 34 35 36 37; do 64 | for BG in 40 41 42 43 44 45 46 47; do 65 | CTRL="\033[${STYLE};${FG};${BG}m" 66 | END="\033[0m" 67 | echo "${CTRL}moooofly${END} <--> ${STYLE};${FG};${BG}" 68 | done 69 | echo 70 | done 71 | echo 72 | done 73 | # Reset 74 | echo "\033[0m" 75 | ``` 76 | 77 | 更实际的一个脚本 78 | 79 | ``` 80 | #!/bin/bash 81 | 82 | # NOTE: 83 | # 若使用 #!/bin/sh ,由于其链接到 dash 上,因此使用 echo 输出时不需要指定 -e 选项 84 | # 若使用 #!/bin/bash ,则必须指定 -e 选项才行 85 | 86 | echo -e "\033[47;30;5m david use echo say \033[0m Hello World" 87 | 88 | echo -e "\033[0m none \033[0m" 89 | echo -e "\033[30m black \033[0m" 90 | echo -e "\033[1;30m dark_gray \033[0m" 91 | echo -e "\033[0;34m blue \033[0m" 92 | echo -e "\033[1;34m light_blue \033[0m" 93 | echo -e "\033[0;32m green \033[0m" 94 | echo -e "\033[1;32m light_green \033[0m" 95 | echo -e "\033[0;36m cyan \033[0m" 96 | echo -e "\033[1;36m light_cyan \033[0m" 97 | 98 | echo -e "\033[0;31m red \033[0m" 99 | echo -e "\033[1;31m light_red \033[0m" 100 | echo -e "\033[0;35m purple \033[0m" 101 | echo -e "\033[1;35m light_purple \033[0m" 102 | echo -e "\033[0;33m brown \033[0m" 103 | echo -e "\033[1;33m yellow \033[0m" 104 | echo -e "\033[0;37m light_gray \033[0m" 105 | echo -e "\033[1;37m white \033[0m" 106 | echo -e "\033[0m none \033[0m" 107 | 108 | echo -e "\033[40;37m 黑底白字 \033[0m" 109 | echo -e "\033[41;30m 红底黑字 \033[0m" 110 | echo -e "\033[41;30;1m 红底加粗黑字 \033[0m" 111 | echo -e "\033[42;34m 绿底蓝字 \033[0m" 112 | echo -e "\033[42;34;1m 绿底加粗蓝字 \033[0m" 113 | echo -e "\033[42;30;1m 绿底加粗黑字 \033[0m" 114 | echo -e "\033[43;34m 黄底蓝字 \033[0m" 115 | echo -e "\033[44;30m 蓝底黑字 \033[0m" 116 | echo -e "\033[44;30;1m 蓝底加粗黑字 \033[0m" 117 | echo -e "\033[45;30m 紫底黑字 \033[0m" 118 | echo -e "\033[46;30m 天蓝底黑字 \033[0m" 119 | echo -e "\033[46;30;1m 天蓝底加粗黑字 \033[0m" 120 | echo -e "\033[47;34m 白底蓝字 \033[0m" 121 | echo -e "\033[47;30m 白底黑字 \033[0m" 122 | echo -e 
"\033[47;30;1m 白底加粗黑字 \033[0m" 123 | echo -e "\033[4;31m 下划线红字 \033[0m" 124 | echo -e "\033[5;31m 红字在闪烁 \033[0m" 125 | echo -e "\033[8m 消隐 \033[0m" 126 | 127 | echo "---------" 128 | 129 | RED_COLOR='\033[1;31m' 130 | YELOW_COLOR='\033[1;33m' 131 | BLUE_COLOR='\033[1;34m' 132 | RESET='\033[0m' 133 | 134 | echo -e "${RED_COLOR}===david say red color===${RESET}" 135 | echo -e "${YELOW_COLOR}===david say yelow color===${RESET}" 136 | echo -e "${BLUE_COLOR}===david say green color===${RESET}" 137 | 138 | echo "---------" 139 | 140 | ESC=" 141 | 142 | RED_COLOR="${ESC}[31m" 143 | YELOW_COLOR="${ESC}[33m" 144 | BLUE_COLOR="${ESC}[34m" 145 | RESET="${ESC}[0m" 146 | 147 | echo -e "${RED_COLOR}===david say red color===${RESET}" 148 | echo -e "${YELOW_COLOR}===david say yelow color===${RESET}" 149 | echo -e "${BLUE_COLOR}===david say green color===${RESET}" 150 | ``` 151 | 152 | ---------- 153 | 154 | 155 | 参考: 156 | 157 | - [Bash: Using Colors](http://webhome.csc.uvic.ca/~sae/seng265/fall04/tips/s265s047-tips/bash-using-colors.html) 158 | - [shell脚本输出输出带颜色内容](http://blog.csdn.net/David_Dai_1108/article/details/70478826) 159 | - [Bash Shell怎么打印各种颜色](https://segmentfault.com/q/1010000000122806) 160 | 161 | 162 | 163 | 164 | 165 | -------------------------------------------------------------------------------- /strace 的使用场景.md: -------------------------------------------------------------------------------- 1 | # strace 的使用场景 2 | 3 | > https://eklitzke.org/strace 4 | 5 | A lot of people know about `strace`, but in my opinion fail to use it effectively because they don’t use it at the right time. 6 | 7 | - **Any time a program is crashing right after you start it**, you can probably figure out what’s going on with `strace`. The most common situation here is that the program is making a failed system call and then exiting after the failed system call. Frequently this will be something like a missing file or a permissions error (e.g. **ENOENT** or **EPERM**), but it could be almost any failed system call. As I said before, try to work backwards from the end of the `strace` output. 8 | 9 | - **Any time you have a program that appears to be hung**, you should use `strace -p` to see what system call it’s stuck on (or what sequence of system calls it’s stuck on). **Any time you have a program that appears to be running really slowly**, you should use `strace` or `strace -p` and see if there are system calls that are taking a long time. 10 | 11 | In both of the cases I just listed, there are **a few common patterns** that you’ll see: 12 | 13 | - Programs that are **stuck in a `wait` call** are waiting for a child process to complete, proceed by tracing the child processes. 14 | - Programs that are **stuck in `select(2)` or `epoll_wait(2)` calls** or loops without making other system calls are typically waiting forever for network data. To debug, try to figure out what file descriptors are in the `select` or `epoll` (you can typically figure this out with more `strace`, by looking in `/proc`, or using `lsof`). This is potentially kind of tricky to do with edge-triggered epoll loops, you may need to use `gdb` or restart the program if you don’t have sufficient context. 15 | - Programs that are **stuck in calls like `connect(2)`, `read(2)`, `write(2)`, `sendto(2)`, `recvfrom(2)`, etc. are also stuck doing I/O**, and once again you should use something like `lsof` to figure out what the other end of the file descriptor is. 
16 | - Programs that are **stuck in the `futex(2)` system call** typically have some sort of **multi-threading related problem** as this is the primitive that `Pthreads` uses on Linux. You’ll see this also in higher level programs written in languages like **Python**, since Python threads are implemented on Linux using `Pthreads`. This can’t easily be debugged with `strace` but often knowing that it’s a **threading problem** is sufficient to point you in the right direction. If you need to do more low level debugging, proceed with `gdb` (I would recommend starting by attaching to the process and getting a backtrace). 17 | 18 | Another really common use case for me is **programs that aren’t loading the right files**. Here’s one that comes up a lot. I frequently see people who have a Python program that has an import statement that is either loading the wrong module, or that is failing even though the developer thinks that the module should be able to be imported. **How can you figure out where Python is attempting to load the module from?** You can use `strace` and then look for calls to `open(2)` (or perhaps `acces(2)`), and then `grep` for the module name in question. Quickly you’ll see what files the Python interpreter is actually opening (or attempting to open), and that will dispell any misconceptions about what may or may not be in your `sys.path`, will identify stale bytecode files, etc. 19 | 20 | **If there aren’t any slow system calls or the system calls don’t appear to be looping then the process is likely stuck on some sort of userspace thing**. For instance, if the program is computing the value of pi to trillions of decimal places it will probably not be doing any system calls (or doing very few system calls that complete quickly). In this case you need to use a userspace debugger like `gdb`. 21 | 22 | One final piece of advice I have is to try to make an effort to learn as many system calls as you can and figure out why programs are making them. If you’ve never used `strace` before the output is going to be really overwhelming and confusing, particularly if you don’t have much Unix C programming experience. However, there actually aren’t that many system calls that are regularly used or that are useful when debugging these issues. There are also common sequences or patterns to system calls (e.g. for things like program initialization, memory allocation, network I/O, etc.) that you can pick up quickly that will make scanning `strace` output really fast. All of the system calls you see will have man pages available that explain all of the arguments and return codes, so you don’t even need to leave your terminal or have an internet connection to figure out what’s going on. 23 | 24 | Happy debugging, and may the `strace` gods bless you with quick results. 
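To make the advice above concrete, here are a few typical invocations (a sketch only — `<pid>` and the `requests` module are placeholders, and the flag choices are a matter of taste):

```
# Attach to a hung or slow process: -f follows child processes,
# -tt prints timestamps, -T prints the time spent inside each syscall.
strace -f -tt -T -p <pid>

# Just the per-syscall counts and total time (good for "slow", not "stuck"):
strace -c -p <pid>

# Where is Python really importing a module from? Trace only open/openat
# and grep for the module name (strace writes to stderr, hence 2>&1):
strace -f -e trace=open,openat python -c 'import requests' 2>&1 | grep requests

# If the process is parked in select/epoll_wait/read on some fd,
# map the fd number back to a file or socket:
ls -l /proc/<pid>/fd
lsof -p <pid>
```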
25 | 26 | 27 | 28 | 29 | -------------------------------------------------------------------------------- /利用 Chrome 原生工具进行网页长截图.md: -------------------------------------------------------------------------------- 1 | > 相信一定有朋友曾经好奇过,朋友圈中发的长图是怎么制作出来的,诚然方法很多,不一而足。​本文介绍一种基于 Chrome 开发者工具进行制作的办法。 2 | 3 | --- 4 | 5 | 相信很多朋友不知道,Chrome 开发者工具中其实自带了截图命令,但需要首先确保 Chrome 已升级至 59 或更高版本; 6 | 7 | ## 召唤出调试界面 8 | 9 | mac 命令 10 | 11 | ```shell 12 | ⌘Command + ⌥Option + I 13 | ``` 14 | 15 | windows 命令 16 | 17 | ```shell 18 | Ctrl + Shift + I 19 | ``` 20 | 21 | ## 截取常规尺寸网页长图 22 | 23 | mac 命令 24 | 25 | ```shell 26 | ⌘Command + ⇧Shift + P 27 | ``` 28 | 29 | windows 命令 30 | 31 | ```shell 32 | Ctrl + Shift + P 33 | ``` 34 | 35 | 之后输入命令 "Capture full size screenshot"(只输前几个字母就能找到),敲下回车; 36 | 37 | ## 截取手机版网页长图 38 | 39 | 只需要按下如下命令即可模拟移动设备; 40 | 41 | mac 命令 42 | 43 | ```shell 44 | ⌘Command + ⇧Shift + M 45 | ``` 46 | 47 | windows 命令 48 | 49 | ```shell 50 | Ctrl + Shift + M 51 | ``` 52 | 53 | > 注:已知该快捷键和某输入法“系统菜单”打开命令有冲突,需要自行关闭 54 | 55 | 图 56 | 57 | 可以看到,在顶部的工具栏中,可以设置想要模拟的设备和分辨率等内容; 58 | 59 | ## 准确截取网页的某一部分 60 | 61 | mac 命令 62 | 63 | ``` 64 | ⌘Command + ⇧Shift + C 65 | ``` 66 | 67 | > windows 上实测该命令不生效,可以在调用 `Ctrl + Shift + P` 之后,自己通过上下箭头手动选择 68 | 69 | 选中想要的部分后,再运行 Capture node screenshot 命令,一张完美的选区截图就诞生了。 70 | 71 | 72 | -------------------------------------------------------------------------------- /基于 docker 本地构建 kafka 测试环境.md: -------------------------------------------------------------------------------- 1 | # 基于 docker 本地构建 kafka 测试环境 2 | 3 | - kafka 环境:https://github.com/wurstmeister/kafka-docker/wiki/Connectivity 4 | - [edenhill/kafkacat](https://github.com/edenhill/kafkacat) 5 | 6 | 7 | ## kafkacat 8 | 9 | 10 | > kafkacat is a generic non-JVM producer and consumer for Apache Kafka >=0.8, think of it as a netcat for Kafka. 11 | 12 | - 通用的、基于命令行的、non-JVM 的,Kafka producer 和 consumer ; 13 | - 适用于 Apache Kafka >=0.8 14 | - 等价于 netcat 的作用 15 | 16 | ## 模式 17 | 18 | > In **producer mode** kafkacat reads messages from `stdin`, delimited with a configurable delimiter (`-D`, defaults to newline), and produces them to the provided Kafka cluster (`-b`), topic (`-t`) and partition (`-p`). 19 | 20 | - 在 producer 模式下,kafkacat 从 stdin 进行 messages 读取,默认的多 messages 分隔符号为换行符(可以通过 `-D` 进行改变); 21 | - 在 producer 模式下,kafkacat 可以将 message 发送到指定的 Kafka 集群(`-b`)中,指定的 topic 上(`-t`),以及指定的 partition 中(`-p`); 22 | 23 | 24 | > In **consumer mode** kafkacat reads messages from a topic and partition and prints them to stdout using the configured message delimiter. 25 | 26 | - 在 consumer 模式下,kafkacat 将从指定的 topic 和 partition 中进行 messages 的读取,并将其从 stdout 上输出; 27 | - 输出时同样可以控制 message 使用的分隔符号; 28 | 29 | > kafkacat also features a Metadata list (`-L`) mode to display the current state of the Kafka cluster and its topics and partitions. 30 | 31 | kafkacat 还支持 Metadata 模式(`-L`),用于查看 kafka 集群的元数据信息,包括 topics 和 partitions 信息; 32 | 33 | > There's also support for the Kafka >=0.9 high-level balanced consumer, use the `-G ` switch and provide a list of topics to join the group. 
34 | 35 | kafkacat 还支持 balanced consumer ; 36 | 37 | 38 | ## 安装 39 | 40 | - Ubuntu 41 | 42 | ``` 43 | apt-get install kafkacat 44 | sudo apt-get install librdkafka-dev libyajl-dev 45 | ``` 46 | 47 | - Mac OS X 48 | 49 | ``` 50 | brew install kafkacat 51 | ``` 52 | 53 | 54 | ## 使用 55 | 56 | > 更多高级用法,详见 github 上的 README.md 57 | 58 | - 获取 metadata 59 | 60 | ``` 61 | root@proxy-hangzhou:~# kafkacat -L -b 47.98.126.155:32776,47.98.126.155:32777,47.98.126.155:32778 62 | Metadata for all topics (from broker 1001: 47.98.126.155:32776/1001): 63 | 3 brokers: 64 | broker 1001 at 47.98.126.155:32776 65 | broker 1003 at 47.98.126.155:32778 66 | broker 1002 at 47.98.126.155:32777 67 | 2 topics: 68 | topic "test_topic" with 1 partitions: 69 | partition 0, leader 1001, replicas: 1001, isrs: 1001 70 | topic "beats" with 1 partitions: 71 | partition 0, leader 1001, replicas: 1001, isrs: 1001 72 | root@proxy-hangzhou:~# 73 | ``` 74 | 75 | - 创建 consumer 76 | 77 | ``` 78 | root@proxy-hangzhou:~# kafkacat -C -b 47.98.126.155:32776,47.98.126.155:32777,47.98.126.155:32778 -t test_topic 79 | 80 | % Reached end of topic test_topic [0] at offset 15 81 | 82 | 83 | 1:{"order_id":1,"order_ts":1534772501276,"total_amount":10.50,"customer_name":"Bob Smith"} 84 | 2:{"order_id":2,"order_ts":1534772605276,"total_amount":3.32,"customer_name":"Sarah Black"} 85 | 3:{"order_id":3,"order_ts":1534772742276,"total_amount":21.00,"customer_name":"Emma Turner"} 86 | % Reached end of topic test_topic [0] at offset 18 87 | 88 | 89 | 90 | topic 91 | % Reached end of topic test_topic [0] at offset 19 92 | topic 93 | % Reached end of topic test_topic [0] at offset 20 94 | 95 | ``` 96 | 97 | - 创建 producer 98 | 99 | 100 | ``` 101 | root@proxy-hangzhou:~# kafkacat -P -b 47.98.126.155:32776,47.98.126.155:32777,47.98.126.155:32778 -t test_topic < 1:{"order_id":1,"order_ts":1534772501276,"total_amount":10.50,"customer_name":"Bob Smith"} 103 | > 2:{"order_id":2,"order_ts":1534772605276,"total_amount":3.32,"customer_name":"Sarah Black"} 104 | > 3:{"order_id":3,"order_ts":1534772742276,"total_amount":21.00,"customer_name":"Emma Turner"} 105 | > EOF 106 | root@proxy-hangzhou:~# 107 | root@proxy-hangzhou:~# 108 | root@proxy-hangzhou:~# kafkacat -P -b 47.98.126.155:32776,47.98.126.155:32777,47.98.126.155:32778 -t test_a < aaa 110 | > EOF 111 | root@proxy-hangzhou:~# kafkacat -P -b 47.98.126.155:32776,47.98.126.155:32777,47.98.126.155:32778 -t test_a < topic 117 | > EOF 118 | root@proxy-hangzhou:~# 119 | root@proxy-hangzhou:~# 120 | root@proxy-hangzhou:~# kafkacat -P -b 47.98.126.155:32776,47.98.126.155:32777,47.98.126.155:32778 -t test_a < https://github.com/moooofly/MarkSomethingDownLLS/issues/78#issuecomment-479787685 133 | 134 | 135 | 136 | -------------------------------------------------------------------------------- /基于内核源码研究 netstat -st 输出信息.md: -------------------------------------------------------------------------------- 1 | # 基于内核源码研究 netstat -st 输出信息 2 | 3 | > 以下内容基于 linux-3.13.11 源码 4 | 5 | ## TCPBacklogDrop 6 | 7 | 含义:**由于内存不足导致 skb 被 drop** ; 8 | 9 | TCPBacklogDrop 对应 LINUX_MIB_TCPBACKLOGDROP 10 | 11 | 在 `tcp_v4_rcv()` 中有 12 | 13 | ``` 14 | 2015 } else if (unlikely(sk_add_backlog(sk, skb, 15 | 2016 sk->sk_rcvbuf + sk->sk_sndbuf))) { 16 | 2017 bh_unlock_sock(sk); 17 | 2018 NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP); 18 | ``` 19 | 20 | 在 `include/net/sock.h` 中有 21 | 22 | ``` 23 | 788 /* 24 | 789 * Take into account size of receive queue and backlog queue 25 | 790 * Do not take into account this skb truesize, 26 | 791 * to allow 
even a single big packet to come. 27 | 792 */ 28 | 793 static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb, 29 | 794 unsigned int limit) 30 | 795 { 31 | 796 unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc); 32 | 797 33 | 798 return qsize > limit; 34 | 799 } 35 | 800 36 | 801 /* The per-socket spinlock must be held here. */ 37 | 802 static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb, 38 | 803 unsigned int limit) 39 | 804 { 40 | 805 if (sk_rcvqueues_full(sk, skb, limit)) 41 | 806 return -ENOBUFS; 42 | 807 43 | 808 __sk_add_backlog(sk, skb); 44 | 809 sk->sk_backlog.len += skb->truesize; 45 | 810 return 0; 46 | 811 } 47 | ``` 48 | 49 | 在 `arch/alpha/include/uapi/asm/errno.h` 中有 50 | 51 | ``` 52 | 31 #define ENOBUFS 55 /* No buffer space available */ 53 | ``` 54 | 55 | 其中 56 | 57 | - `sk->sk_rcvbuf` 对应 size of receive buffer in bytes 58 | - `sk->sk_sndbuf` 对应 size of send buffer in bytes 59 | 60 | 综上,该参数的含义(大致意思)是由于内存不足导致 skb 被 drop ; 61 | 62 | 63 | 64 | ## ListenOverflows 65 | 66 | 含义:**当收到 TCP 三次握手的第一个 SYN 或最后一次握手的 ACK 时,若发现 accept queue 已满,则说明发生了 overflow 的问题**; 67 | 68 | ListenOverflows 对应 LINUX_MIB_LISTENOVERFLOWS 69 | 70 | 在 `tcp_v4_conn_request()` 中有 71 | 72 | ``` 73 | 1476 /* Accept backlog is full. If we have already queued enough 74 | 1477 * of warm entries in syn queue, drop request. It is better than 75 | 1478 * clogging syn queue with openreqs with exponentially increasing 76 | 1479 * timeout. 77 | 1480 */ 78 | 1481 if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) { 79 | 1482 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS); 80 | 1483 goto drop; 81 | 1484 } 82 | ``` 83 | 84 | - `sk_acceptq_is_full(sk)` 判定 accept queue 是否已满; 85 | - `inet_csk_reqsk_queue_young(sk)` 获取 SYN 队列中还没有握手完成的请求数,也就是 young request sock 的数量; 86 | 87 | > `tcp_v4_conn_request()` 实现了针对 listen 状态下收到三次握手的第一个 SYN 的处理; 88 | 89 | 在 `tcp_v4_syn_recv_sock()` 中有 90 | 91 | ``` 92 | 1635 if (sk_acceptq_is_full(sk)) 93 | 1636 goto exit_overflow; 94 | ... 95 | 1703 exit_overflow: 96 | 1704 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS); 97 | ``` 98 | 99 | > `tcp_v4_syn_recv_sock()` 实现了处于 SYN_RECV 状态下收到一个(三次握手最后的)合法 ACK 后,新建一个 socket 的处理; 100 | 101 | 综上,该参数的含义是当收到 TCP 三次握手的第一个 SYN 或最后一次握手的 ACK 时,若发现 accept queue 已满,则说明发生了 overflow 的问题; 102 | 103 | ## ListenDrops 104 | 105 | 含义:可以看到,很多分支都会触发 drop 操作; 106 | 107 | ListenDrops 对应 LINUX_MIB_LISTENDROPS 108 | 109 | 在 `tcp_v4_conn_request()` 中有 110 | 111 | ``` 112 | 1461 /* Never answer to SYNs send to broadcast or multicast */ 113 | 1462 if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST)) 114 | 1463 goto drop; 115 | 1464 116 | 1465 /* TW buckets are converted to open requests without 117 | 1466 * limitations, they conserve resources and peer is 118 | 1467 * evidently real one. 119 | 1468 */ 120 | 1469 if ((sysctl_tcp_syncookies == 2 || 121 | 1470 inet_csk_reqsk_queue_is_full(sk)) && !isn) { 122 | 1471 want_cookie = tcp_syn_flood_action(sk, skb, "TCP"); 123 | 1472 if (!want_cookie) 124 | 1473 goto drop; 125 | 1474 } 126 | 1475 127 | 1476 /* Accept backlog is full. If we have already queued enough 128 | 1477 * of warm entries in syn queue, drop request. It is better than 129 | 1478 * clogging syn queue with openreqs with exponentially increasing 130 | 1479 * timeout. 
131 | 1480 */ 132 | 1481 if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) { 133 | 1482 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS); 134 | 1483 goto drop; 135 | 1484 } 136 | 1485 137 | 1486 req = inet_reqsk_alloc(&tcp_request_sock_ops); 138 | 1487 if (!req) 139 | 1488 goto drop; 140 | ... 141 | 1611 drop: 142 | 1612 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS); 143 | ``` 144 | 145 | 在 `tcp_v4_syn_recv_sock()` 中有 146 | 147 | ``` 148 | 1697 if (__inet_inherit_port(sk, newsk) < 0) 149 | 1698 goto put_and_exit; 150 | ... 151 | 1707 exit: 152 | 1708 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS); 153 | 1709 return NULL; 154 | 1710 put_and_exit: 155 | 1711 inet_csk_prepare_forced_close(newsk); 156 | 1712 tcp_done(newsk); 157 | 1713 goto exit; 158 | ``` 159 | 160 | 161 | ---------- 162 | 163 | 164 | 相关代码: 165 | 166 | - include/uapi/linux/snmp.h 167 | - ./net/ipv4/tcp_ipv4.c 168 | - include/net/sock.h 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | -------------------------------------------------------------------------------- /如何关闭 iptables 中 connection tracking (conntrack).md: -------------------------------------------------------------------------------- 1 | # 如何关闭 iptables 中 connection tracking (conntrack) 功能 2 | 3 | 4 | 在《[How to disable conntrack protocol parsing in the linux kernel?](https://security.stackexchange.com/questions/121513/how-to-disable-conntrack-protocol-parsing-in-the-linux-kernel)》中有 5 | 6 | 7 | > There are a few ways to set module parameters, both **temporary** and **persistent**. 8 | > 9 | > - **Persistent changes** 10 | > 11 | > The change will **take effect as soon as the module is loaded**, whether it is done manually or automatically at boot. If the module is already loaded, you must either reboot, or unload it and load it, which may or may not be possible if it has unremovable dependencies. To do this, create a file, such as `/etc/modprobe.d/no_conntrack_helper.conf`, with the following contents: 12 | > 13 | > ``` 14 | > options nf_conntrack nf_conntrack_helper=0 15 | > ``` 16 | > 17 | > - **Temporary changes** (`modprobe`) 18 | > 19 | > This **requires the module be unloaded before you run the command**. The changes will disappear when the module is unloaded or when the system reboots. You can change specific parameters by passing them as arguments to the `modprobe` utility when loading the module. Load the module as root: 20 | > 21 | > ``` 22 | > modprobe nf_conntrack nf_conntrack_helper=0 23 | > ``` 24 | > 25 | > - **Temporary changes** (`sysfs`) 26 | > 27 | > **Some modules can have their parameters modified even after the module has been loaded**. This can be done by writing to a special file in `sysfs`. I do not know if the specific parameter you want to change can be modified at runtime, but if it is, you would want to run the following command as root: 28 | > 29 | > ``` 30 | > echo 0 > /sys/module/nf_conntrack/parameters/nf_conntrack_helper 31 | > ``` 32 | 33 | 34 | 在 ubuntu 上为 `/proc/sys/net/netfilter/nf_conntrack_helper` ; 35 | 36 | 37 | ---------- 38 | 39 | 40 | 在《[Secure use of iptables and connection tracking helpers](https://home.regit.org/netfilter-en/secure-use-of-helpers/)》中有 41 | 42 | > **Disable helper by default** 43 | > 44 | > - Principle 45 | > 46 | > Once a helper is loaded, it will treat packets for a given port and all IP addresses. As explained before, this is not optimal and is even a security risk. 
**A better solution is to load the module helper and deactivate their parsing by default**. Each helper we need to use is then set by using a call to the CT target. 47 | > 48 | > - Method 49 | > 50 | > Since **Linux 3.5**, it is possible to desactivate the automatic conntrack helper assignment. This can be **done when loading** the `nf_conntrack` module 51 | > 52 | > ``` 53 | > modprobe nf_conntrack nf_conntrack_helper=0 54 | > ``` 55 | > This can also be **done after the module is loading** by using a `/proc` entry 56 | > 57 | > ``` 58 | > echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper 59 | > ``` 60 | > 61 | > Please note that flows that already got a helper will keep using it even if automatic helper assignment has been disabled. 62 | 63 | 64 | ---------- 65 | 66 | 67 | 在《[How can I disable connection tracking (conntrack) in IPTables?](https://access.redhat.com/solutions/972673)》中有(2014年6月27日,有点过时) 68 | 69 | 环境 70 | 71 | > Red hat Enterprise Linux (RHEL) 72 | 73 | 问题 74 | 75 | > - The system is dropping packets due to [**the nf_conntrack table being filled**](https://access.redhat.com/solutions/8721). 76 | > - It is **not necessary to perform NAT** on this system. 77 | > - How can conntrack be disabled? 78 | 79 | 决议 80 | 81 | > IPTables provides a special target, **NOTRACK**, which is part of the raw table: 82 | > 83 | >> raw: 84 | >> 85 | >> This table is used mainly for configuring exemptions from 86 | >> connection tracking in combination with the NOTRACK target. 87 | >> It registers at the netfilter hooks with higher priority and 88 | >> is thus called before ip_conntrack, or any other IP tables. 89 | >> It provides the following built-in chains: PREROUTING (for 90 | >> packets arriving via any network interface) OUTPUT (for 91 | >> packets generated by local processes) 92 | > 93 | > **This makes it possible to disable connection tracking for connections matching a specific criteria**. Taking as an example a very busy DNS server/s (such as a tier-1 ISP DNS infrastructure), the following procedure will disable connection tracking for all DNS connections made to/from the system: 94 | > 95 | > - Add **NOTRACK** rules in the `raw` table for both TCP and UDP, port 53, incoming and outgoing (since DNS servers can work as recursive and query other DNS servers on behalf of clients). 96 | > 97 | > ``` 98 | > iptables -t raw -A PREROUTING -p tcp --dport 53 -j NOTRACK 99 | > iptables -t raw -A PREROUTING -p udp --dport 53 -j NOTRACK 100 | > iptables -t raw -A PREROUTING -p tcp --sport 53 -j NOTRACK 101 | > iptables -t raw -A PREROUTING -p udp --sport 53 -j NOTRACK 102 | > iptables -t raw -A OUTPUT -p tcp --sport 53 -j NOTRACK 103 | > iptables -t raw -A OUTPUT -p udp --sport 53 -j NOTRACK 104 | > iptables -t raw -A OUTPUT -p tcp --dport 53 -j NOTRACK 105 | > iptables -t raw -A OUTPUT -p udp --dport 53 -j NOTRACK 106 | > ``` 107 | > 108 | > - Save the new IPTables rules, to make them persistent: 109 | > 110 | > ``` 111 | > service iptables save 112 | > ``` 113 | > 114 | > - Restart IPTables to confirm the rules are in place. Note that the raw table must be specified since by default the filter table is shown. 115 | > 116 | > ``` 117 | > service iptables restart 118 | > iptables -L -t raw 119 | > ``` 120 | > 121 | > - Once this has been done, there are two ways to test this. 
Either monitor the following files: 122 | > 123 | > ``` 124 | > watch -n1 cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count 125 | > watch -n1 cat /proc/net/ip_conntrack 126 | > ``` 127 | > 128 | > - Or monitor the current rules in the raw table, showing the amount of packets matched by each rule: 129 | > 130 | > ``` 131 | > iptables -nvL -t raw 132 | > ``` 133 | > 134 | > - Perform connections again and see if either conntrack does not rise or packets are matched in the NOTRACK rules. 135 | 136 | 根源 137 | 138 | > **It is not possible to disable conntrack globally, as IPTables is tightly coupled with it**. Instead, conntrack must be disabled for specific connections based on normal IPTables rule processing (e.g. address, port, protocol, etc). 139 | 140 | 141 | ---------- 142 | 143 | 144 | 在《[Packet drops when using ip_conntrack or nf_conntrack, logs say 'ip_conntrack: table full, dropping packet.' or 'nf_conntrack: table full, dropping packet'](https://access.redhat.com/solutions/8721)》中有 145 | 146 | > - "ip_conntrack: table full, dropping packet." seen in `/var/log/messages` means that Packet drops on this system for connections using `ip_conntrack` or `nf_conntrack`. 147 | > - To change the default value of `ip_conntrack_max`, modify the default value for hashsize. 148 | > - The default `ip_conntrack_max` will become 8 times for RHEL5. 149 | > - the default `nf_conntrack_max` will be 4 times for RHEL6/7. 150 | > - The `ip_conntrack` module uses a portion of the system memory to track connections called a **connection tracking table**. The size of this table is set when the `ip_conntrack` module is loaded, and is usually determined automatically by a hash of the installed system RAM. 151 | > - To check the number of packets dropped on each CPU by conntrack overflowing, check `/proc/net/stat/nf_conntrack` stat `early_drop`. 
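Before applying any of the workarounds above, it is worth confirming that the conntrack table really is the bottleneck. A quick checklist (the commands are generic; the limit and hashsize values at the end are arbitrary examples, not recommendations):

```
# How full is the table?
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# Per-CPU counters, including the early_drop column mentioned above
# (values are printed in hexadecimal):
cat /proc/net/stat/nf_conntrack

# The same counters in readable form, if conntrack-tools is installed:
conntrack -S

# Temporary relief: raise the limit (and, ideally, the hash size with it).
sysctl -w net.netfilter.nf_conntrack_max=262144
echo 32768 > /sys/module/nf_conntrack/parameters/hashsize
```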
152 | 153 | 154 | 155 | 156 | ---------- 157 | 158 | 159 | 在《[nf_conntrack 超限问题](https://github.com/moooofly/MarkSomethingDown/blob/0637c7fb716b8becfa0f4562ca67305ad685c8b8/Linux/nf_conntrack%20%E8%B6%85%E9%99%90%E9%97%AE%E9%A2%98.md)》中有 160 | 161 | 注意: 162 | 163 | - nf_conntrack 跟 NAT 有关,用来跟踪连接条目; 164 | - **任何调用 iptables NAT 功能的操作都会触发 nf_conntrack 模块被 load** ; 165 | 166 | 执行如下命令 167 | 168 | ``` 169 | root@ip-172-31-6-140:~# iptables -L -t nat 170 | Chain PREROUTING (policy ACCEPT) 171 | target prot opt source destination 172 | 173 | Chain INPUT (policy ACCEPT) 174 | target prot opt source destination 175 | 176 | Chain OUTPUT (policy ACCEPT) 177 | target prot opt source destination 178 | 179 | Chain POSTROUTING (policy ACCEPT) 180 | target prot opt source destination 181 | root@ip-172-31-6-140:~# 182 | ``` 183 | 184 | 会导致 nat 功能被启用,之后会自动加载 NAT 相关内核模块 185 | 186 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/nf_conntrack%20%E5%8A%A0%E8%BD%BD%E5%89%8D%E5%90%8E%E5%86%85%E6%A0%B8%E6%A8%A1%E5%9D%97%E5%AF%B9%E6%AF%94.png) 187 | 188 | 同时会出现如下内核参数 189 | 190 | ``` 191 | root@ip-172-31-6-140:~# sysctl -a|grep conn 192 | net.core.somaxconn = 128 193 | net.netfilter.nf_conntrack_acct = 0 194 | net.netfilter.nf_conntrack_buckets = 16384 195 | net.netfilter.nf_conntrack_checksum = 1 196 | net.netfilter.nf_conntrack_count = 12289 197 | net.netfilter.nf_conntrack_events = 1 198 | net.netfilter.nf_conntrack_events_retry_timeout = 15 199 | net.netfilter.nf_conntrack_expect_max = 256 200 | net.netfilter.nf_conntrack_generic_timeout = 600 201 | net.netfilter.nf_conntrack_helper = 1 202 | net.netfilter.nf_conntrack_icmp_timeout = 30 203 | net.netfilter.nf_conntrack_log_invalid = 0 204 | net.netfilter.nf_conntrack_max = 65536 205 | net.netfilter.nf_conntrack_tcp_be_liberal = 0 206 | net.netfilter.nf_conntrack_tcp_loose = 1 207 | net.netfilter.nf_conntrack_tcp_max_retrans = 3 208 | net.netfilter.nf_conntrack_tcp_timeout_close = 10 209 | net.netfilter.nf_conntrack_tcp_timeout_close_wait = 60 210 | net.netfilter.nf_conntrack_tcp_timeout_established = 432000 211 | net.netfilter.nf_conntrack_tcp_timeout_fin_wait = 120 212 | net.netfilter.nf_conntrack_tcp_timeout_last_ack = 30 213 | net.netfilter.nf_conntrack_tcp_timeout_max_retrans = 300 214 | net.netfilter.nf_conntrack_tcp_timeout_syn_recv = 60 215 | net.netfilter.nf_conntrack_tcp_timeout_syn_sent = 120 216 | net.netfilter.nf_conntrack_tcp_timeout_time_wait = 120 217 | net.netfilter.nf_conntrack_tcp_timeout_unacknowledged = 300 218 | net.netfilter.nf_conntrack_timestamp = 0 219 | net.netfilter.nf_conntrack_udp_timeout = 30 220 | net.netfilter.nf_conntrack_udp_timeout_stream = 180 221 | net.nf_conntrack_max = 65536 222 | root@ip-172-31-6-140:~# 223 | ``` 224 | 225 | 移除上述自动加载的内核模块的命令 226 | 227 | ``` 228 | modprobe -r iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat nf_conntrack 229 | ``` 230 | 231 | 移除前后对比图 232 | 233 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/nf_conntrack%20%E5%8A%A0%E8%BD%BD%E5%89%8D%E5%90%8E%E5%86%85%E6%A0%B8%E6%A8%A1%E5%9D%97%E5%AF%B9%E6%AF%94.png) -------------------------------------------------------------------------------- /如何在构建镜像过程中进行 docker 调试.md: -------------------------------------------------------------------------------- 1 | # 如何在构建镜像过程中进行 docker 调试 2 | 3 | - 调试过程中的日志 4 | - 中间镜像 5 | - 什么样的镜像才能够满足被调试的条件 6 | 7 | 8 | -------------------------------------------------------------------------------- /常用 wireshark 表达式.md: 
-------------------------------------------------------------------------------- 1 | # 常用 wireshark 过滤表达式 2 | 3 | 参考:http://www.lovemytool.com/blog/2010/04/top-10-wireshark-filters-by-chris-greer.html 4 | 5 | ## ip.addr == 10.0.0.1 6 | 7 | Sets a filter for any packet with 10.0.0.1, as either the source or dest 8 | 9 | ## ip.addr==10.0.0.1 && ip.addr==10.0.0.2 10 | 11 | sets a conversation filter between the two defined IP addresses 12 | 13 | ## http or dns 14 | 15 | sets a filter to display all http and dns 16 | 17 | ## tcp.port==4000 18 | 19 | sets a filter for any TCP packet with 4000 as a source or dest port 20 | 21 | ## tcp.flags.reset==1 22 | 23 | displays all TCP resets 24 | 25 | ## http.request 26 | 27 | displays all HTTP GET requests 28 | 29 | ## tcp contains traffic 30 | 31 | displays all TCP packets that contain the word ‘traffic’. Excellent when searching on a specific string or user ID 32 | 33 | ## !(arp or icmp or dns) 34 | 35 | masks out arp, icmp, dns, or whatever other protocols may be background noise. Allowing you to focus on the traffic of interest 36 | 37 | ## udp contains 33:27:58 38 | 39 | sets a filter for the HEX values of 0x33 0x27 0x58 at any offset 40 | 41 | ## tcp.analysis.retransmission 42 | 43 | displays all retransmissions in the trace. Helps when tracking down slow application performance and packet loss 44 | 45 | ## dns.qry.name contains xxx.com.cn 46 | 47 | 过滤 DNS Query string 中包含指定子串的 Query 和 Query Response 48 | 49 | ## dns.qry.type == 1 || dns.qry.type == 28 50 | 51 | 过滤类型为 A (Host Address) 和 AAAA (IPv6 Address) 的 Query 和 Query Response 52 | 53 | 54 | 55 | 56 | -------------------------------------------------------------------------------- /常见 Dockerfile 问题.md: -------------------------------------------------------------------------------- 1 | # 常见 Dockerfile 问题 2 | 3 | > Ref: [9 Common Dockerfile Mistakes](https://runnable.com/blog/9-common-dockerfile-mistakes) 4 | 5 | - [Running apt-get](#running-apt-get) 6 | - [Using ADD instead of COPY](#using-add-instead-of-copy) 7 | - [Adding your entire application directory in one line](#adding-your-entire-application-directory-in-one-line) 8 | - [Using :latest](#using-latest) 9 | - [Using external services during the build](#using-external-services-during-the-build) 10 | - [Adding EXPOSE and ENV at the top of your Dockerfile](#adding-expose-and-env-at-the-top-of-your-dockerfile) 11 | - [Multiple FROM statements](#multiple-from-statements) 12 | - [Multiple services running in the same container](#multiple-services-running-in-the-same-container) 13 | - [Using VOLUME in your build process](#using-volume-in-your-build-process) 14 | 15 | 16 | ## Running apt-get 17 | 18 | > The first is running `apt-get upgrade`. This will update all your packages to their latests versions — which is bad because **it prevents your Dockerfile from creating consistent, immutable builds**. 19 | 20 | 使用 `apt-get upgrade` 会破坏一致性; 21 | 22 | > Another issue is with running `apt-get update` in a different line than running your `apt-get install` command. The reason why this is bad is because a line with only `apt-get update` will get cached by the build and won't actually run every time you need to run `apt-get install`. Instead, make sure you run `apt-get update` in the same line with all the packages to ensure all are updated correctly. 
23 | 24 | `apt-get update` 和 `apt-get install` 必须在一个 RUN 指令中使用,否则会存在缓存失效问题; 25 | 26 | a good example 27 | 28 | ``` 29 | # From https://github.com/docker-library/golang 30 | RUN apt-get update && \ 31 | apt-get install -y --no-install-recommends \ 32 | g++ \ 33 | gcc \ 34 | libc6-dev \ 35 | make \ 36 | && rm -rf /var/lib/apt/lists/* 37 | ``` 38 | 39 | ## Using ADD instead of COPY 40 | 41 | > While similar, `ADD` and `COPY` are actually different commands. `COPY` is the simplest of the two, since it just copies a file or a directory from your host to your image. `ADD` does this too, but also has some more magical features like extracting TAR files or fetching files from remote URLs. In order to reduce the complexity of your Dockerfile and prevent some unexpected behavior, it's usually best to always use `COPY` to copy your files. 42 | 43 | 只使用 COPY 就对了; 44 | 45 | ``` 46 | FROM busybox:1.24 47 | 48 | ADD example.tar.gz /add # Will untar the file into the ADD directory 49 | COPY example.tar.gz /copy # Will copy the file directly 50 | ``` 51 | 52 | ## Adding your entire application directory in one line 53 | 54 | > Being explicit about what part of your code should be included in your build, and at what time, might be the most important thing you can do to significantly speed up your builds. 55 | 56 | 加速构建的关键在于合理拆分被包含的代码,以便最大程度利用缓存; 57 | 58 | bad one: 59 | 60 | ``` 61 | # !!! ANTIPATTERN !!! 62 | COPY ./my-app/ /home/app/ 63 | RUN npm install # or RUN pip install or RUN bundle install 64 | # !!! ANTIPATTERN !!! 65 | ``` 66 | 67 | good one: 68 | 69 | ``` 70 | COPY ./my-app/package.json /home/app/package.json # Node/npm packages 71 | WORKDIR /home/app/ 72 | RUN npm install 73 | 74 | # Maybe you have to install python packages too? 75 | COPY ./my-app/requirements.txt /home/app/requirements.txt 76 | RUN pip install -r requirements.txt 77 | COPY ./my-app/ /home/app/ 78 | ``` 79 | 80 | ## Using :latest 81 | 82 | > Many Dockerfiles use the `FROM node:latest` pattern at the top of their Dockerfiles to pull the latest image from a Docker registry. **While simple, using the latest tag for an image means that your build can suddenly break if that image gets updated**. Figuring this out might prove to be very difficult, since the maintainer of the Dockerfile didn’t actually make any changes. To prevent this, just make sure you use a specific tag of an image (example: `node:6.2.1`). This will ensure your Dockerfile remains immutable. 83 | 84 | 不要使用 `:latest` 就对了 85 | 86 | ## Using external services during the build 87 | 88 | > Many people forget the difference between building a Docker image and running a Docker container. When building an image, Docker reads the commands in your Dockerfile and creates an image from it. **Your image should be immutable and reusable until any of your dependencies or your code changes**. This process should be completely independent of any other container. Anything that requires interaction with other containers or other services (like a database) should happen when you run the container. 89 | 90 | 镜像构建以 immutable 和 reusable 为首要考虑因素; 91 | 92 | ## Adding EXPOSE and ENV at the top of your Dockerfile 93 | 94 | > `EXPOSE` and `ENV` are cheap commands to run. If you bust the cache for them, rebuilding them is almost instantaneous. Therefore, **it’s best to declare these commands as late as possible**. You should only ever declare `ENV`s whenever you need them in your build process. 
If they’re not needed during build time, then they should be at the end of your Dockerfile, along with `EXPOSE`. 95 | 96 | `EXPOSE` 和 `ENV` 的变更会破会 build cache ,因此最佳实践为尽可能晚的声明; 97 | 98 | 一个好的示例 99 | 100 | ``` 101 | ENV GOLANG_VERSION 1.7beta1 102 | ENV GOLANG_DOWNLOAD_URL https://golang.org/dl/go$GOLANG_VERSION.linux-amd64.tar.gz 103 | ENV GOLANG_DOWNLOAD_SHA256 a55e718935e2be1d5b920ed262fd06885d2d7fc4eab7722aa02c205d80532e3b 104 | 105 | RUN curl -fsSL "$GOLANG_DOWNLOAD_URL" -o golang.tar.gz \ 106 | && echo "$GOLANG_DOWNLOAD_SHA256 golang.tar.gz" | sha256sum -c - \ 107 | && tar -C /usr/local -xzf golang.tar.gz \ 108 | && rm golang.tar.gz 109 | 110 | ENV GOPATH /go 111 | ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH 112 | ``` 113 | 114 | ## Multiple FROM statements 115 | 116 | > It might be tempting to try to combine different images together by using multiple `FROM` statements; this won’t work. Instead, Docker will just use the last `FROM` specified and ignore everything before that. 117 | 118 | 在 Dockerfile 中想要通过使用多个 `FROM` 达到将多个不同的镜像合并的目的是行不通的; 119 | 120 | ## Multiple services running in the same container 121 | 122 | > It's a well established best-practice that **every different service which composes your application should run in its own container**. It's tempting to add multiple services to one docker image, but this practice has some downsides. 123 | > 124 | > - First, you'll make it more difficult to horizontally scale your application. 125 | > - Second, the additional dependencies and layers will make your build slower. 126 | > - Finally, it'll make your Dockerfile harder to write, maintain, and debug. 127 | 128 | 最佳实践为每个容器中只运行一个服务; 129 | 130 | 一个容器中运行多个服务的缺点: 131 | 132 | - 难以水平扩展 133 | - 由于存在额外的依赖,将导致 layer 数量的增加,进而导致构建速度变慢 134 | - 难以编写、维护,以及调试 Dockerfile 135 | 136 | > If you want to quickly setup a Django+Nginx application for development, it might make sense to just run them in the same container and have a different Dockerfile in production where they run separately. 137 | 138 | ## Using VOLUME in your build process 139 | 140 | > **Volumes in your image are added when you run your container, not when you build it**. In a similar way to #5, **you should never interact with your declared volume in your build process**. Rather, you should only use it when you run the container. 141 | 142 | 在构建镜像时 VOLUME 是不能被使用的; 143 | 144 | 145 | good one 146 | 147 | ``` 148 | FROM busybox:1.24 149 | RUN echo "hello-world!!!!" > /myfile.txt 150 | 151 | CMD ["cat", "/myfile.txt"] 152 | 153 | $ docker run volume-in-build 154 | hello-world!!!! 155 | ``` 156 | 157 | bad one 158 | 159 | ``` 160 | FROM busybox:1.24 161 | VOLUME /data 162 | RUN echo "hello-world!!!!" > /data/myfile.txt 163 | 164 | CMD ["cat", "/data/myfile.txt"] 165 | 166 | $ docker run volume-in-build 167 | cat: can't open '/data/myfile.txt': No such file or directory 168 | ``` 169 | 170 | > An interesting gotcha for this is that if any of your previous layers has a `VOLUME` declaration (which might be several `FROM`s away) you will still run into the same issue. For that reason, it's a good idea to be aware of what volumes your parent images declare. Use `docker inspect` if you run into problems. 
171 | 172 | 还存在一种情况,如果之前的某层 layer 进行了 `VOLUME` 声明,则也会导致上述问题; 173 | 174 | -------------------------------------------------------------------------------- /排查 Sentry 上传 sourcemap 时遇到的 500 错误.md: -------------------------------------------------------------------------------- 1 | # 排查 Sentry 上传 sourcemap 时遇到的 500 错误 2 | 3 | > https://phab.llsapp.com/T54237 4 | 5 | ## 背景信息 6 | 7 | by jianhua.cheng 8 | 9 | > - 跟文件数不知道有没有关系,我之前一个分支 40 个文件就能上传成功,换了构建脚本,现在有 60 个文件,就会出现 **Internal Server 500** 的错误; 10 | > - 这回看到了更具体的错误信息:`StatusCodeError: 500 - "{"detail": "Internal Error", "errorId": "a9f76f2d3b774e02b5fc80cdfd86a259"}"`;还是上传 sourcemap 文件到 ft sentry 服务器上遇到的失败问题; 11 | > - 我主要用库做的这件事情, 不是直接用的 `sentry-cli` ,要贴也就是贴个 `node scripts/build.js`,错误结果就是上面的了,没有更多; 12 | > - 我只是在一个分支里改了个构建配置,同样的构建和上传 sourcemap 代码在另一个分支就是一直成功的; 13 | > - 我的配置再有问题,生成的也只是 JavaScript 文件,`sentry-cli` 的输入也只是多个文件而已,而且错误是 500 ;如果是我参数有问题的话按理也不会是 5xx 的错误码吧; 14 | 15 | 16 | by moooofly 17 | 18 | > - 每次都失败,还是偶尔失败? 19 | > - 请把完整执行命令和结果贴出来,我看下; 20 | > - 能用等价的 sentry-cli 命令确认一下么? 21 | > - 是否有可能是域名变更导致的? 22 | 23 | 24 | ## 问题表现 25 | 26 | - jianhua.cheng 反馈的 500 上传错误 27 | 28 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/sentry%20%E4%B8%8A%E4%BC%A0%E9%97%AE%E9%A2%98%20-%20%E5%BC%80%E5%8F%91%E4%BA%BA%E5%91%98%E5%8F%8D%E9%A6%88%E7%9A%84%20500%20%E4%B8%8A%E4%BC%A0%E9%94%99%E8%AF%AF.png) 29 | 30 | - 自测复现的 500 上传错误 31 | 32 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/sentry%20%E4%B8%8A%E4%BC%A0%E9%97%AE%E9%A2%98%20-%20%E8%87%AA%E6%B5%8B%E5%A4%8D%E7%8E%B0%E7%9A%84%20500%20%E4%B8%8A%E4%BC%A0%E9%94%99%E8%AF%AF.png) 33 | 34 | ## 测试脚本 35 | 36 | ``` 37 | #!/bin/sh 38 | # VERSION=`sentry-cli releases propose-version` 39 | VERSION="b513627c6215aadc86805975e441bf56dfd6a1b8." 
40 | 41 | export SENTRY_PROJECT="lms-mobile-staging" 42 | export SENTRY_ORG="lls" 43 | export SENTRY_AUTH_TOKEN="50682a06586c4f8e9133183a54972b5c9812cc44e6f14731b67d21849ef7f882" 44 | export SENTRY_URL="https://prod-ft.llsops.com" 45 | 46 | sentry-cli releases new "$VERSION" 47 | 48 | sentry-cli \ 49 | releases files "$VERSION" \ 50 | upload-sourcemaps --url-prefix "https://cdn.llscdn.com/hybrid/lms-mobile/" "./dist" \ 51 | --validate 52 | 53 | sentry-cli releases finalize "$VERSION" 54 | ``` 55 | 56 | 测试文件 57 | 58 | (略) 59 | 60 | 61 | ## 问题解决 62 | 63 | - sentry_web 输出的触发 500 错误的异常日志 64 | 65 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/sentry%20%E4%B8%8A%E4%BC%A0%E9%97%AE%E9%A2%98%20-%20sentry_web%20%E8%BE%93%E5%87%BA%E7%9A%84%E8%A7%A6%E5%8F%91%20500%20%E9%94%99%E8%AF%AF%E7%9A%84%E5%BC%82%E5%B8%B8%E6%97%A5%E5%BF%97.png) 66 | 67 | - 导致 500 的几个名字过长的文件 68 | 69 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/sentry%20%E4%B8%8A%E4%BC%A0%E9%97%AE%E9%A2%98%20-%20%E5%AF%BC%E8%87%B4%20500%20%E7%9A%84%E8%BF%87%E9%95%BF%E6%96%87%E4%BB%B6%E5%90%8D.png) 70 | 71 | - 去掉超长名字文件后上传成功 72 | 73 | ![](https://raw.githubusercontent.com/moooofly/ImageCache/master/Pictures/sentry%20%E4%B8%8A%E4%BC%A0%E9%97%AE%E9%A2%98%20-%20%E5%8E%BB%E6%8E%89%E8%B6%85%E9%95%BF%E5%90%8D%E5%AD%97%E6%96%87%E4%BB%B6%E5%90%8E%E4%B8%8A%E4%BC%A0%E6%88%90%E5%8A%9F.png) 74 | 75 | ## 结论 76 | 77 | - 需要 frontend 提供文件名长度的明确上限值; 78 | - (我)从代码层面确认如何进行调整; 79 | - 建议文件命名在不影响理解的情况下,应该尽量缩写; 80 | 81 | 82 | 83 | 84 | 85 | 86 | -------------------------------------------------------------------------------- /流利说客户端内置网络监测功能背景信息整理.md: -------------------------------------------------------------------------------- 1 | # 流利说客户端内置网络监测功能背景信息整理 2 | 3 | > hi, 现在 client infra 团队的耀文他们在定位 AWS 外网链路的网络性能问题,麻烦你跟进一下吧;这个对我们的产品体验很重要, 需要我们投入更大的精力还进一步优化; 回头 yaowen 会议写 task 跟进;thx 4 | 5 | ## 链接 6 | 7 | - https://phab.llsapp.com/T70745 8 | 9 | ## 我要干什么 10 | 11 | - 和 aws 就此问题进行沟通:确保信息对问题解决的全面性和有效性,避免扯皮问题的发生; 12 | 13 | ## 背景 14 | 15 | 当前,偶尔会出现**用户无法访问 aws 服务器**,**网络无法访问**的情况(无法访问是否和 dns 解析有关不可知) 16 | 17 | ![](https://phab.llsapp.com/file/data/qbspqlegnjtwzfhh5u6w/PHID-FILE-ggb2rv6wcklym7kbvbgb/image_%282%29.png) 18 | 19 | ## 需求 20 | 21 | 当用户上报上述问题时,(能够同时)提供机制(附加信息)验证是否确实存在网络问题 22 | 23 | 根据 ops 同学(tony)的专业判断,提供如下信息可以更好地帮助定位问题: 24 | 25 | - DNS 是否正常 26 | - 如果 DNS 正常,server 的 IP 是什么 27 | - client ip 28 | - 具体时间点 29 | 30 | ## 解决方案 31 | 32 | 在 `帮助` 页面中,提供网络诊断功能(效果如下图所示),上述需求中要求的信息应该都满足了 33 | 34 | ![](https://phab.llsapp.com/file/data/bhv75odfllavlqya6xwr/PHID-FILE-hm27iuyxpwnihlpqw2vn/WechatIMG518.jpeg) 35 | 36 | 37 | 38 | ---------- 39 | 40 | 41 | ## 问题 42 | 43 | - 客户端都包括哪些设备? 44 | - 客户端发生问题的规律是什么? 45 | - 零星出现(时间点)? 46 | - 一小段时间出现(时间段)? 47 | - 发生问题的时段是否存在规律? 48 | - 地区相关性? 49 | - 运营商相关性? 50 | - 帮助页面中的网络诊断是在发生问题的时间段触发的么?还是发生问题后触发?如何保证发生问题的同时进行测试? 
51 | 52 | 53 | 54 | ## 扩展 55 | 56 | bilibili 相关: 57 | 58 | - www.bilibi.com 59 | - interface.bilibi.com 60 | - comment.bilibi.com 61 | - api.bilibi.com 62 | - app.bilibi.com 63 | - passport.bilibi.com 64 | - account.bilibi.com 65 | - bangumi.bilibi.com 66 | 67 | 68 | liulishuo 相关: 69 | 70 | - 公司暴露的测速页面: 71 | - 后端测试页:https://ping-faas-prod.thellsapi.com/ 72 | - 前端测试页:http://ping.fe.thellsapi.com/ 73 | 74 | 阿里昆仑用户诊断工具:https://cdn.dns-detect.alicdn.com/https/doc.html 75 | 76 | > 已被用作用户 Ping 页面 77 | 78 | ![](https://phab.llsapp.com/file/data/akszi2bcqmadbpavylmu/PHID-FILE-2syuzqxwovqyzll672af/600568140916688410.jpg) 79 | 80 | 81 | ## frontend 组的处理步骤 82 | 83 | - [Android] 调研可行性 84 | - [iOS] 调研可行性 85 | - 新增 traceroute 功能 86 | 87 | > 背景: 88 | > 89 | > 根据 ops 同学的反馈,**如果 DNS 解析成功,但是 IP 连接失败的话**,当前的信息无法定位问题,需要 `traceroute` 的信息 90 | > 91 | > 功能: 92 | > 93 | > - 用户检测连通性时,如果存在 DNS 解析成功,ping IP 失败的情况下,自动触发 `traceroute` ; 94 | > - 如果一个域名有多个 A 记录的情况下,只有第一条 A 记录失败的情况才会触发 `traceroute` ; 95 | > - 存在多个 ping IP 失败的情况下,只去前面三条; 96 | > - `traceroute` 信息会被 append 在复制粘贴的日志中 97 | > 98 | > 建议增加: 99 | > 100 | > - 用户可以取消 `traceroute` 的功能; 101 | > - 用户应该可以看到 `traceroute` 的进度(至少是以百分比的形式); 102 | > - 建议加入最多多少跳,否则终止(例如 30)(连续 N 跳没反应终止的话,可能不是一个好的条件,4 跳无返回之后可能会给出 echo reply 的返回); 103 | > - 使用 ICMP echo request/reply 的方式实现效果比 UDP 垃圾数据 + unreachable 的 port 效果要好,即我们实现方案可能会比 mac 上的 `traceroute` 要好; 104 | > 105 | > Optional: 106 | > 107 | > - 给用户显示实时的 `traceroute` 进度; 108 | > - 点击条目可以触发的 `traceroute` ; 109 | > - 完成 `traceroute` 在条目上给出标记,但不影响重新触发; 110 | 111 | 112 | - 检测信息写入 LogX 113 | - 客户端通过 LogX 上报问题:每次检测把有问题的检测结果(DNS 解析错误或者 Ping 不通)上报到 LogX ; 114 | - 用户支付未打开案例分析 115 | - 用户 Ping 页面 116 | - 用户网络监测页面信息 117 | - kibana 上用户相关的请求信息 118 | 119 | 120 | -------------------------------------------------------------------------------- /线上事故处理指导手册.md: -------------------------------------------------------------------------------- 1 | # 线上事故处理手册(2018年3月27日 - 初稿) 2 | 3 | ## 故障发生前 4 | 5 | - 将“线上故障处理任务”进行细分 6 | - 按服务类别(基础服务 or 业务) 7 | - 按耦合程度 8 | - 确定故障排查责任人,以及责任人在排查过程中必须确定的 checkpoint(取决于问题域影响范围) 9 | - 定义故障排查+处理的时间窗口值(取决于实际情况,讨论合理性和必要性) 10 | - 确定并行操作的人员角色(依赖问题域的规模) 11 | - A 负责服务降级 12 | - B 负责服务器扩容 13 | - C 负责核心监控曲线的巡检 14 | - D 负责线上数据的收集+初步分析 15 | - 针对常规排障手段编写《线上事故处理手册》,明确定义实施操作的约束条件(因为一般情况下很难在短时间内定位并解决故障) 16 | - **服务降级**(什么时候,什么情况下进行) 17 | - **服务紧急扩容**(哪些场景) 18 | - **回退版本**(是否直接作为第一优先措施) 19 | - 枚举出之前遇到的各种需要通过“**回滚,重启,扩容**”进行问题解决的场景,并写入《线上事故处理手册》中 20 | - 确定核心业务指标(曲线)的基准线(明确定义什么样的情况算正常,什么样的情况算异常) 21 | - 可能需要对业务类型进行分类 22 | - 可能需要基于业务采用的语言和架构模型进行分类 23 | - 枚举出之前遇到的各种需要通过“回滚,重启,扩容”进行问题解决的场景,并写入《线上事故处理手册》中 24 | - 针对特定问题域制定**降级方案**或**临时方案**,并写入《线上事故处理手册》中(降级方案/临时方案一般是通过削减非必要功能来保证核心功能的正常运行) 25 | 26 | 27 | ## 故障发生时 28 | 29 | - 确定故障的**紧急程度**和**严重程度** 30 | - 在例行检查时主动发现的问题(建立例行检查机制)-- 不紧急+不严重 31 | - 系统监控告警(建立常规系统指标异常知识库) -- 视情况而定 32 | - 系统的 CPU 使用率 33 | - Load average 34 | - Memory 35 | - I/O (网络与磁盘) 36 | - SWAP 使用情况 37 | - 线程数 38 | - File Description 文件描述符 39 | - 等等 40 | - 业务监控告警(核心业务指标上大屏) -- 紧急+严重 41 | - 接口的响应时间 42 | - QPS 43 | - 调用频次 44 | - 接口成功率 45 | - 接口波动率等 46 | - 关联系统故障追溯(上下游系统出现问题时) -- 不紧急+可能严重 47 | - 生产事件上报(来自用户的问题反馈) -- 紧急+可能严重 48 | - 确定故障影响范围 49 | - 是不是真正的线上故障(可能是系统变更或业务发布导致) 50 | - 整个服务不可用?某些服务不可用?特定服务不可用?核心服务或组件不可用? 51 | - 故障系统是否对上下游产生了连带影响? 52 | - 问题类型筛查,如果条件具备,请给出针对问题原因的推断(基于历史经验,主要用于缩小问题范围,出现偏差的问题只能通过不断完善经验库解决) 53 | - 故障系统是否新版本发布? 54 | - 依赖的基础平台与资源是否升级过? 55 | - 依赖的系统是否上过线?升过级? 56 | - 是否业务量暴涨 57 | - 攻击 58 | - 特定场景下的暴涨(秒杀、大促、流量小生) 59 | - 上下游服务调用异常 60 | - 网络是否有波动? 61 | - 运营是否在系统内做过运营变更? 62 | - 运营方是否有促销活动? 
63 | - 服务器故障(磁盘满,内存爆) 64 | - 数据库故障 65 | - 应用自身故障 66 | - 按照线上故障紧急处理预案进行操作 67 | - 回滚,重启,扩容 68 | - 降级/临时方案 69 | 70 | 71 | ## 故障发生后 72 | 73 | - 线上脏数据处理 74 | - 是否能够准确描述事故的发生前因后果(最好能基于时间线描述问题) 75 | - 是否能够将问题重现(以便更深入的挖掘问题) 76 | - 是否监控存在缺失 77 | - 是否在排障过程中存在不当操作 78 | - 输出线上事故报告(经验总结和积累) 79 | - 是否要确定责任人(主要用于提醒,非惩罚) 80 | - 是否能将一些流程化的工作自动化(减少人为操作,提高效率,降低错误发生率) 81 | - 哪些环节存在弱点?哪些流程/规范/制度需要优化?这类问题是否在其他系统或者团队中也存在? 82 | - 提出整改措施 83 | - 类似的问题哪些地方还有可能发生? 84 | - 做了哪些改进,事故就不会再发生? 85 | - 做了哪些改进,即使发生故障,也不会产生较大影响? 86 | 87 | 88 | 89 | --------------------------------------------------------------------------------