├── LICENSE
├── Monitoring.md
└── README.md


/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2016, Ruairi Carroll
 2 | All rights reserved.
 3 | 
 4 | Redistribution and use in source and binary forms, with or without
 5 | modification, are permitted provided that the following conditions are met:
 6 | 
 7 | * Redistributions of source code must retain the above copyright notice, this
 8 |   list of conditions and the following disclaimer.
 9 | 
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 |   this list of conditions and the following disclaimer in the documentation
12 |   and/or other materials provided with the distribution.
13 | 
14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 | 


--------------------------------------------------------------------------------
/Monitoring.md:
--------------------------------------------------------------------------------
 1 | # Monitoring Tools
 2 | 
 3 | Below is a list of tools that can be used to monitor your network.
 4 | 
 5 | ## SNMP Based
 6 | 
 7 | - [LibreNMS](http://www.librenms.org/)
 8 | - [Observium](http://observium.org/)
 9 | - [Cacti](http://www.cacti.net/)
10 | - [OpenNMS](http://www.opennms.org/)
11 | - [Icinga](https://icinga.com/)
12 | 
13 | ## IPFIX/NetFlow/sFlow based
14 | 
15 | - [SiLK](https://tools.netsa.cert.org/silk/)
16 | - [pmacct](http://www.pmacct.net/)
17 | - [NFDump](http://nfdump.sourceforge.net/)
18 | - [ntop](http://www.ntop.org/)
19 | 
20 | ## SPAN/Mirror/pcap based
21 | 
22 | - [tstat](http://tstat.polito.it/)
23 | 
24 | 
25 | ## Time Series based
26 | 
27 | - [Collectd](https://collectd.org/)
28 | - Graphite
29 | 	- [Graphite-web - Frontend](https://github.com/graphite-project/graphite-web)
30 | 	- [Carbon - Metric processing](https://github.com/graphite-project/carbon)
31 | 	- [Whisper - Time Series DB](https://github.com/graphite-project/whisper)
32 | - [Prometheus](https://prometheus.io/)
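Carbon (listed above) accepts datapoints over a simple plaintext protocol: one `<metric path> <value> <timestamp>` line per datapoint, conventionally on TCP port 2003. The sketch below is a minimal example that assumes a Carbon listener reachable at a hypothetical host `graphite.example.net` on the default port. The metric path `network.core1.eth0.rx_bps` is an illustrative naming scheme, not a convention taken from any tool.

```python
from __future__ import annotations

import socket
import time


def carbon_line(path: str, value: float, timestamp: int | None = None) -> str:
    """Format one datapoint in Carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if " " in path:
        raise ValueError("metric path must not contain spaces")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_carbon(lines: list[str], host: str, port: int = 2003) -> None:
    """Send pre-formatted plaintext lines to a Carbon listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path and fixed timestamp, for illustration only.
    line = carbon_line("network.core1.eth0.rx_bps", 123456.0, 1700000000)
    print(line, end="")
    # send_to_carbon([line], "graphite.example.net")  # hypothetical host
```

Batching several lines into one `sendall` call keeps the number of TCP connections low when a poller emits many interface metrics per cycle.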
33 | 
34 | ## Time Series Prediction/Aberrant Behavior Detection
35 | 
36 | - [Banshee](https://github.com/eleme/banshee)
37 | - [RRDTool](http://cricket.sourceforge.net/aberrant/rrd_hw.htm)
38 | 
39 | ## Dataplane monitoring
40 | - [todd](https://github.com/mierdin/todd)
41 | 
42 | ## Log collection
43 | - [Graylog](https://www.graylog.org)
44 | - [ELK Stack](https://www.elastic.co/downloads)
45 | 
46 | ## External Monitoring
47 | - [RIPE Atlas](https://atlas.ripe.net)
48 | - [StatusCake](https://statuscake.com)
49 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Manifesto
  2 | 
  3 | Network Engineers' Manifesto
  4 | 
  5 | ### Key motivating factors:
  6 | 
  7 | - Data-driven decisions
  8 | - Excellence in all things
  9 | - Technical depth and no technology religion
 10 | - Clarity of vision, clarity of execution
 11 | - Lead the business in decisions related to the transport of packets
 12 | - Dedication to all our customers
 13 | - "Good enough" is too low a bar
 14 | 
 15 | 
 16 | ### [Monitoring](Monitoring.md) 
 17 | 
 18 | - Monitor from the outside:
 19 |     - Implement end-to-end tests (e.g. server to server, end-user connections to the DC)
 20 |     - Make use of the external monitoring services listed in [Monitoring](Monitoring.md)
 21 | - Monitor, at a minimum:
 22 |     - Per switch:
 23 |         - Interface pps, ups, mps, bitrate, drops, errors, buffer depth
 24 |         - CPU, memory, ICMP messages generated
 25 |         - STP states
 26 |     - Per router:
 27 |         - All routing protocol states
 28 |         - Interface pps, ups, mps, bitrate, drops, errors, buffer depth
 29 |         - CPU, memory, ICMP messages generated
 30 |     - Per firewall:
 31 |         - Interface pps, ups, mps, bitrate, drops, errors, buffer depth
 32 |         - CPU, memory, ICMP messages generated
 33 |         - CPS (connections per second), throughput
 34 |         - Dropped connections
 35 |         - ASIC drops
 36 |     - Per load balancer:
 37 |         - Interface pps, ups, mps, bitrate, drops, errors, buffer depth
 38 |         - CPU, memory, ICMP messages generated
 39 |         - CPS, throughput per VIP
 40 |         - Dropped connections
 41 |         - ASIC drops
 42 |     - Per access point:
 43 |         - Interface pps, ups, mps, bitrate, drops, errors, buffer depth
 44 |         - CPU, memory, ICMP messages generated
 45 |         - Logged-in users, failed login attempts
 46 |     - Per service:
 47 |         - p99 and p95 metrics for service latency:
 48 |             - For end-to-end transactions
 49 |             - For TCP retransmissions
 50 |             - For latency and drops to the server from all DCs
 51 | - All monitoring should form a single pane of glass for our users, and be API-driven so they can extract their own data
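Most of the per-interface figures above (pps, bitrate, drops, errors) are typically exposed over SNMP as ever-increasing counters (for example IF-MIB's 64-bit `ifHCInOctets`), so a poller has to turn two successive readings into a rate. The sketch below is a minimal version that assumes at most one counter wrap between polls. The function names are hypothetical. Pass `width_bits=32` for 32-bit counters.

```python
def counter_delta(prev: int, curr: int, width_bits: int = 64) -> int:
    """Increase between two counter readings, assuming at most one wrap in between."""
    return (curr - prev) % (2 ** width_bits)


def octets_to_bps(prev: int, curr: int, interval_s: float, width_bits: int = 64) -> float:
    """Convert two octet-counter readings taken interval_s seconds apart into bits per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(prev, curr, width_bits) * 8 / interval_s


if __name__ == "__main__":
    # A 32-bit counter that wrapped between two polls 60 seconds apart:
    # 296 octets before the wrap plus 7204 after it = 7500 octets.
    print(octets_to_bps(4_294_967_000, 7_204, 60, width_bits=32))  # 1000.0
```

A device reboot also resets counters, and this arithmetic cannot tell that apart from a wrap. Comparing `sysUpTime` between polls is one way to discard such samples.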
 52 | 
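The per-service p95 and p99 figures are percentiles over a window of latency samples. The sketch below uses the nearest-rank method. Other definitions, such as linear interpolation, give slightly different values on small samples. The sample latencies are made up for illustration.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank; integer maths first avoids float drift
    return ordered[rank - 1]


# Illustrative end-to-end transaction latencies in milliseconds.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 13,
                14, 15, 12, 17, 13, 19, 14, 12, 16, 900]

if __name__ == "__main__":
    print("p95:", percentile_nearest_rank(latencies_ms, 95))  # p95: 250
    print("p99:", percentile_nearest_rank(latencies_ms, 99))  # p99: 900
```

Most samples sit between 11 and 19 ms, yet p95 and p99 surface the two slow transactions that an average would blur.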
 53 | 
 54 | ### Documentation
 55 | 
 56 | - Everything required to understand the network should be documented
 57 | - Documentation must never be out of date; automation can help with this
 58 | - Use documentation to explain why choices have been made
 59 | - Use documentation to explain which other options were rejected
 60 | 
 61 |         
 62 | ### Deployment
 63 | 
 64 | - Static routing is to be avoided wherever possible
 65 | - Zero-touch deployment for new gear
 66 | - Entirely templated configlets:
 67 |     - Base system configuration, including AAA and logging
 68 |     - OSPF
 69 |     - STP
 70 |     - BGP
 71 |     - IPsec tunneling
 72 | - Absolutely no manual configuration pushes to production
 73 | - Design and build a working lab for prototyping configurations
 74 | - Goal: provide an API so our end users can deploy their infrastructure as they see fit
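The sketch below shows a minimal templated configlet built with Python's standard-library `string.Template`, which renders per-device values into a shared template. The IOS-style OSPF syntax, the variable names and the device data are hypothetical. In practice, a richer templating engine such as Jinja2 fed from a source-of-truth database is a common choice.

```python
from __future__ import annotations

from string import Template

# Hypothetical IOS-style OSPF configlet; adapt the syntax to your platform.
OSPF_CONFIGLET = Template(
    "router ospf $process_id\n"
    " router-id $router_id\n"
    " passive-interface default\n"
    "$networks"
)


def render_ospf(process_id: int, router_id: str,
                networks: list[tuple[str, str, int]]) -> str:
    """Render the OSPF configlet; substitute() raises KeyError if a variable is missing."""
    network_lines = "".join(
        f" network {prefix} {wildcard} area {area}\n"
        for prefix, wildcard, area in networks
    )
    return OSPF_CONFIGLET.substitute(
        process_id=process_id, router_id=router_id, networks=network_lines
    )


if __name__ == "__main__":
    print(render_ospf(1, "10.0.0.1", [("10.0.0.0", "0.0.0.255", 0)]), end="")
```

`substitute()` fails loudly on a missing variable instead of emitting a half-rendered config. This fits the no-manual-pushes rule, because a broken render stops in the pipeline rather than reaching a device.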
 75 | 
 76 | ### Planning for failure
 77 | - You need redundancy and failovers
 78 |     - Your `[storage|servers|routers|switches|uplinks|etc.]` are going to fail, sometimes in an isolated manner, sometimes in spectacular simultaneous blowouts. Plan for automatic alternatives.
 79 |     - Having three independent fail-safe systems is just fluff if you don't periodically test failover.
 80 | 
 81 | ### Remote offices
 82 | 
 83 | - Regular, random polling of remote-office users about their office internet connection and their general impression of the office network
 84 | - Track this data over time to make sure every user is represented
 85 | - Dynamic monitoring and failover of IPsec tunnels
 86 | - Monthly SLA reporting of WAN performance, based on full-mesh pinging between remote offices
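In full-mesh pinging, every office probes every other office, so N offices produce N * (N - 1) directed paths. The sketch below is a minimal monthly roll-up. It assumes probe results have already been collected as per-path (sent, received) counts. The office names, the counts (one probe per minute for 30 days) and the 99.9% target are hypothetical.

```python
from __future__ import annotations

from itertools import permutations


def mesh_pairs(offices: list[str]) -> list[tuple[str, str]]:
    """All directed (source, destination) pairs in a full mesh: N * (N - 1) paths."""
    return list(permutations(offices, 2))


def sla_report(
    results: dict[tuple[str, str], tuple[int, int]],
    offices: list[str],
    target_pct: float = 99.9,
) -> dict[tuple[str, str], str]:
    """Delivery percentage per directed path, compared against target_pct.

    results maps (source, destination) -> (probes_sent, replies_received).
    Paths with no probes are reported as MISSING so coverage gaps stay visible.
    """
    report: dict[tuple[str, str], str] = {}
    for pair in mesh_pairs(offices):
        sent, received = results.get(pair, (0, 0))
        if sent == 0:
            report[pair] = "MISSING"
            continue
        pct = 100.0 * received / sent
        status = "OK" if pct >= target_pct else "BREACH"
        report[pair] = f"{pct:.2f}% {status}"
    return report


# Hypothetical month of one-per-minute probes (30 days * 1440 = 43200 per path).
OFFICES = ["dublin", "london", "nyc"]
RESULTS = {
    ("dublin", "london"): (43200, 43200),
    ("london", "dublin"): (43200, 43100),
    ("dublin", "nyc"): (43200, 43190),
    ("nyc", "dublin"): (43200, 43200),
    ("london", "nyc"): (43200, 43200),
    # ("nyc", "london") is deliberately absent to show a coverage gap.
}

if __name__ == "__main__":
    for (src, dst), status in sla_report(RESULTS, OFFICES).items():
        print(f"{src:>7} -> {dst:<7} {status}")
```

The report lists missing paths explicitly instead of skipping them. This keeps "100% meshed" honest, because a silent probe gap would otherwise look like a perfect month.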
 87 | 
 88 | 
 89 | ### Reporting
 90 | 
 91 | - Every single SNMP trap has to be actionable 
 92 | - Every single packet drop in our network has to be actionable
 93 | - Every single TCP retransmission inside the borders of our administrative control has to be actionable
 94 | - Apply predictive algorithms to our graphs to alert on trends before they become issues
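A simple form of trend-based alerting fits a least-squares line to recent utilization samples and extrapolates it to estimate when a link will cross a threshold. This sketch swaps in plain linear regression for illustration. Seasonal methods, such as the Holt-Winters forecasting behind RRDtool's aberrant behavior detection (see [Monitoring](Monitoring.md)), handle daily and weekly cycles better. The sample data and the 80% threshold are hypothetical.

```python
from __future__ import annotations


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(
    days: list[float], util_pct: list[float], threshold_pct: float
) -> float | None:
    """Days after the last sample until the fitted trend reaches threshold_pct.

    Assumes samples are in time order. Returns None when the trend is flat or
    falling, and 0.0 if the fitted line is already at or above the threshold.
    """
    slope, intercept = linear_fit(days, util_pct)
    if slope <= 0:
        return None
    crossing_day = (threshold_pct - intercept) / slope
    return max(0.0, crossing_day - days[-1])


if __name__ == "__main__":
    # Hypothetical weekly peak utilization (%) of one uplink.
    days = [0, 7, 14, 21, 28]
    peak_util = [40, 44, 48, 52, 56]
    eta = days_until_threshold(days, peak_util, threshold_pct=80)
    print(f"80% reached in about {eta:.1f} days")
```

A straight line ignores daily and weekly cycles. Fitting it to a daily or weekly peak, rather than to raw five-minute samples, is one way to keep the extrapolation meaningful.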
 95 | 
 96 | 
 97 | ### Personal Development
 98 | 
 99 | - Everyone must commit to self-improvement
100 | - Certification track: optional but highly recommended
101 | - Regular hardware deep dives based on freely available vendor documentation, talks, and presentations
102 | 
103 | 


--------------------------------------------------------------------------------