├── LICENSE ├── README.md └── docs ├── _config.yml ├── _layouts └── default.html ├── _posts ├── 2020-09-30-wsus-port-80.md ├── 2020-10-01-esxi-round-robin-iops.md ├── 2020-10-01-singularity-by-2030.md ├── 2020-10-06-enumerate-esxlunpaths.md ├── 2020-10-07-deep-learning-firefly.md ├── 2020-10-12-vmware-network-disk-throughput.md ├── 2020-10-14-powershell-email-functions.md ├── 2020-10-16-hyperflex-rest-api-python.md ├── 2020-10-18-predictions-self-driving-ev-cars.md ├── 2020-10-19-neuralink-predictions.md ├── 2020-10-22-infrastructure-dashboard.md ├── 2020-10-26-dashboard-update.md ├── 2020-10-29-gibberish-detector.md ├── 2020-11-05-better-gibberish-detection.md ├── 2020-11-15-question-detection.md ├── 2020-11-24-parsing-all-wikipedia.md ├── 2020-12-01-vcenter-partner-status.md ├── 2020-12-02-deep-learning-acceleration.md ├── 2020-12-07-artificial-intuition.md ├── 2021-02-05-rundeck-acl.md ├── 2021-02-07-gamestop-laughing-man.md ├── 2021-02-19-human-scale-dnn-2030.md ├── 2021-03-12-connect-ucs-powershell.md ├── 2021-03-12-install-powershell-modules.md ├── 2021-04-15-vmware-numa.md └── 2021-09-17-ucs-vlan-scripts.md ├── _sass └── jekyll-theme-tactile.scss ├── assets ├── favicon.ico └── numa.png ├── categories.md └── index.md /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 David Shapiro 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DavidShapiroBlog 2 | -------------------------------------------------------------------------------- /docs/_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-tactile 2 | title: "David Shapiro's Technology Blog" 3 | description: Documenting my technical work as well as ruminations about the current state and future of technology 4 | -------------------------------------------------------------------------------- /docs/_layouts/default.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | {{ page.title }} 6 | 7 | 8 | 9 | 10 | 11 | 14 | 15 | {% seo %} 16 | 17 | 18 | 19 | 20 |
21 |
22 | 23 |
24 |

{{ page.title | default: site.title | default: site.github.repository_name }}

25 |

{{ page.description | default: site.description | default: site.github.project_tagline }}

26 |

{{ page.date | date_to_long_string }}

27 | Home —  28 | Categories 29 |
30 | 31 |
32 | 33 |
34 | {{ content }} 35 |
36 | 37 | 43 | 44 |
45 |
46 | 47 | {% if site.google_analytics %} 48 | 56 | {% endif %} 57 | 58 | 59 | 60 | -------------------------------------------------------------------------------- /docs/_posts/2020-09-30-wsus-port-80.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "WSUS Port 80 vs 8530" 4 | date: 2020-09-30 5 | description: Revisiting WSUS after too many years 6 | categories: [Windows, KB] 7 | --- 8 | 9 | ## WSUS clients weren't reporting in 10 | 11 | I recently had to replace a few WSUS servers that were Server 2008 R2. I went with the latest and greatest, obviously, but ran into a snafu when I couldn't figure out why servers and workstations weren't reporting in. The Computers tab in WSUS manager was remaining infuriatingly empty. 12 | 13 | ### RSOP.msc 14 | 15 | First, I needed to figure out what the machines were configured to do so I went onto a client machine and ran `RSOP.msc` - this means *Resultant Set of Policy* and basically just tells you what GPO the Windows box is getting. Drill down to the following: `Computer Configuration -> Administrative Templates -> Windows Components -> Windows Update`. RSOP is your best friend when troubleshooting anything to do with Group Policy! 16 | 17 | Here, you might see the FQDN of your WSUS server or you might see the FQDN plus `:8530`. Oh, it's important to note that when you replace a WSUS server, you want to give it the same FQDN and IP as the original if at all possible. Your client machines might not know the difference, though you might also have to do some jiggery pokery with SSL certs. 18 | 19 | 20 | 21 | ### netstat 22 | 23 | Anyways, you can also run `netstat` from an elevated CMD prompt to see what port your clients are trying to hit. It'll look something like this: 24 | 25 | ``` 26 | PROTOCOL DESTINATION SOURCE STATUS 27 | TCP :80 :50000 ESTABLISHED 28 | TCP :8530 :50000 ESTABLISHED 29 | ``` 30 | 31 | It's important to note that the originating port will be random, somewhere in the 50,000's to 60,000's by default. If it looks like the first. Equally important, if you don't see clients trying to connect, then you may have a firewall issue! 32 | 33 | ### wsusutil 34 | 35 | If you see port 80, then you need to set the "custom website" to `false`. If you see 8530 then you need to set it to `true`. This is deceptively simple. The [official Microsoft documentation for WSUSutil is here.](https://docs.microsoft.com/de-de/security-updates/windowsupdateservices/18127395) 36 | 37 | But for your convenience: 38 | 39 | Set it to 8530: 40 | ``` 41 | %ProgramFiles%\Update Services\Tools\wsusutil usecustomwebsite true 42 | ``` 43 | 44 | Set it to 80: 45 | ``` 46 | %ProgramFiles%\Update Services\Tools\wsusutil usecustomwebsite false 47 | ``` 48 | 49 | This change, for me, has been completely non-destructive and reversible. 50 | 51 | ## Bingo 52 | 53 | As soon as I changed the custom website to the correct setting, clients started phoning home. 54 | -------------------------------------------------------------------------------- /docs/_posts/2020-10-01-esxi-round-robin-iops.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Round Robin IOPS settings in ESXi" 4 | date: 2020-10-01 5 | description: Scripting a quick change to default IOPS Round Robin settings in ESXi 6 | categories: [VMware, ESXi, Storage, PowerCLI, PowerShell, KB] 7 | --- 8 | 9 | ## Why do you need to change default IOPS settings? 
10 | 11 | Some storage vendors or SAN arrays expect different configurations. One example is the IOPS Round Robin limit. By default, ESXi uses 1000. Basically, what this means is that ESXi will send 1000 IOPS down each path before switching to the next when using Round Robin. For faster storage we often need to set this much lower, such as to 1 or 10. With things like all-flash arrays, the storage controller can handle IOPS a lot faster than before, so you can get much better performance this way. Always follow your vendor's best practices, though. 12 | 13 | ## The script 14 | 15 | For the sake of simplicity, I'm only going to show you the meat and potatoes of my scripts. I'm going to assume you understand the basics, like connecting to vCenter and such. 16 | 17 | ```powershell 18 | foreach ($vmhost in $vmhosts) 19 | { 20 | $cli = $vmhost | Get-EsxCli -V2 21 | # Get your LUNs and filter out any that you don't want to modify 22 | $devices = $cli.storage.nmp.device.list.Invoke() 23 | $devices = $devices | Where-Object {$_.DeviceDisplayName -like ""} 24 | $devices = $devices | Where-Object {$_.pathselectionpolicy -like 'vmw_psp_rr'} 25 | foreach ($device in $devices) 26 | { 27 | # Create an argument's list to invoke against the ESXCLI 28 | $d = @{} 29 | $d['iops'] = 1 30 | $d['device'] = $device.Device 31 | $d['type'] = 'iops' 32 | $result = $cli.storage.nmp.psp.roundrobin.deviceconfig.set.Invoke($d) 33 | Write-Host $vmhost.name $device.device $result 34 | } 35 | } 36 | ``` 37 | 38 | ### esxcli 39 | 40 | I discovered ESXCLI years ago. It's great. It gives you access to the deepest levers and buttons on ESXi but over remote. No SSH or anything necessary. The only wonky part that takes some getting used to is creating the arguments. There's more than one way to skin the cat, so please don't judge my clunky method too harshly. The key takeaway is that you compose the argument in the form of a hashtable and then send it with the "invoke" method. 41 | 42 | ### devices 43 | 44 | Your devices should have some naming convention or other identifying thing. In most cases, the `DeviceDisplayName` will include the vendor name of the LUN. This can be a great way to filter out LUNs that you don't want to monkey with. Generally speaking, though, you can probably just use the `vmw_psp_rr` filter. You can assume that any LUN with Round Robin might be affected. Though you might have multiple arrays on the backend. Do whatever you want here! 45 | -------------------------------------------------------------------------------- /docs/_posts/2020-10-01-singularity-by-2030.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Singularity by 2030?" 4 | date: 2020-10-01 5 | description: I think we will see the singularity by 2030... 6 | categories: [Singularity, Deep-Learning, Quantum-Computing, Automation] 7 | --- 8 | 9 | # What is the singularity? 10 | 11 | Before I can make any assertions about an event, I have to define the event beyond just a buzzword label. The [technological singularity](https://en.wikipedia.org/wiki/Technological_singularity) is a hypothetical event whereby technological advancements compound and snowball. 12 | By this definition, you could easily assert that we're already in the singularity. Computers, for instance, accelerate business, research, and technological deployment. This, in return, accelerates the advancement of computers. It's a positive feedback loop. 13 | 14 | So what sets apart *The Singularity* from today? 
15 | 16 | My personal definition is: *When my life drastically changes.* 17 | 18 | Sophisticated, I know. But there's some value in evaluating global technological shifts from the subjective perspective. The singularity is fundamentally about disruption. Disruption of economics and commerce, disruption of lifestyles and life trajectories. 19 | 20 | # Key disruptors 21 | 22 | ## Deep learning 23 | 24 | The internet is presently losing its composure over GPT-3. An old college buddy of mine works at OpenAI and, almost two years ago, he told me that they were getting to the point of fundamentally understanding the nature of knowledge and intelligence. I didn't really believe him at the time, but I also didn't understand the groundbreaking Transformer architecture upon which GPT-3 is based. 25 | 26 | I first got my hands on a Transformer in the form of Google's [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3). This technology alone will disrupt the world once it gets fine-tuned and deployed globally. How? Why? That's a post for another time. Suffice to say, I now agree with my buddy - we are starting to grasp the fundamental nature of intelligence. 27 | 28 | I have been listening to podcasts and YouTubers talk about GPT-3 and even a lot of tech savvy folks can't seem to make heads or tails of it. They make assertions like *oh, this is just some sort of compression*. Is it actually processing and *thinking*? Or is it just storage? 29 | 30 | It's actually *both*. 31 | 32 | I first started tinkering with neural networks way back in 2009 and 2010, before the current wave of deep learning. Back then, I was experimenting in C++, manually creating what are now known RNNs and such. I had no idea what I was doing and was way in over my head, but it's nice to know that I was just a couple years ahead of the curve. During that time, I had some fundamental realizations about neurons (biological and artificial). Neurons (or nodes) are responsible for both memory and processing. 33 | 34 | For conventional computing, it's difficult to grok the idea that a unit does both storage and computing. But that's just how it is. There is no hard drive in your brain, no CPU. You have gray matter (the "thinking" neurons) and white matter (the wiring) and a few regions that tend to specialize. None of those regions specialize in either processing or knowledge, though some do specialize with forming cohesive memories. 35 | 36 | But I digress. The point is that deep learning can and will disrupt pretty much every domain of human expertise. 37 | 38 | ### Medicine 39 | 40 | Diagnosing patients is fundamentally a pattern matching problem. Deep learning is the most sophisticated pattern-matching tool we have, regularly outperforming humans once you have the right training data. I suspect that we will see early disruption in rural and poor regions of the world where need for medical care trumps the status quo. With the launch of Starlink, everywhere in the world will have access to quality, reliable internet. This will create a huge market for cloud-based medical diagnostic platforms. 41 | 42 | A traveling physician with the most basic training in the middle of the Sahara will be able to upload pictures and patient history to such medical portals via Starlink and rapidly get back a list of most likely diagnoses as well as follow-up tests that can nail down the diagnosis. Think about Star Trek medical tricorders. That kinda thing. 
Wave a bluetooth smartphone attachment over someone and, within moments, get a detailed and sophisticated state of their health. 43 | 44 | Will this technology cost people jobs? I think so. Physicians and specialists are expensive. I suspect that hospitals will find that they can have fewer physicians on staff and go with cheaper labor, such as nurses and physician's assistants who can leverage these cloud-based services to create better, more consistent care for cheaper. 45 | 46 | ### Transportation 47 | 48 | I was getting an Uber back from the car dealership a few weeks back and was quietly contemplating how my driver would be out of a job within 5 years. Uber, Tesla, Lyft, Ford, and pretty much every other major manufacturer are working tirelessly to bring full self-driving vehicles to the market. These vehicles will ultimately be safer and cheaper than anything on the road today, by a long shot. 49 | 50 | Economic forces will mandate that these technologies be adopted, though the entrenched industries will resist it. Still, there are something like 7 million drivers in America alone who will be out of work within a decade. When combined with electric vehicles, we are going to see some huge knock-on effects, such as a collapse in demand for automotive mechanics and parts, since EVs are far more reliable and cheaper to maintain. Though, this lower cost could be offset by unment demand for transportation. Cheaper transport could result in vastly more passenger-miles per year. 51 | 52 | Transportation costs will drop drastically, which will probably cause people to travel more, but will also reduce the cost of many goods, including materials for homes. 53 | 54 | ### Science 55 | 56 | Embedding true knowledge and intelligence into tireless machines is already starting to revolutionize science. Deep learning is already aiding in drug discovery as well as fundamental physics at places like CERN and the LHC. As these tools become more commercialized, we will see every researcher benefitting from deep learning. If nothing else, it will help consolidate the state of the industry, reading every paper in existence and summarizing it, allowing researchers to focus on the gaps and also to keep up to date much quicker. 57 | 58 | As deep learning proliferates, science will accelerate in most areas, creating compounding returns for better computers, better AI, and the snowball will continue. We're already seeing deep learning being used to streamline data acquisition, used to create better models. This can be see in concepts like *AutoML*. 59 | 60 | ## Quantum computing 61 | 62 | IBM, Google, and D-Wave are all currently locked in an arms race for *Quantum Suppremacy* (which sounds cyberpunk AF). I don't think I can overstate how disruptive the proliferation of quantum computing will be. You can already sign up and use [quantum computers as a cloud service](https://quantum-computing.ibm.com/). 63 | 64 | What is the bottom line? 65 | 66 | Quantum computing excels at difficult problems. I won't explain how it works but you can check out [this great video from IBM](https://www.youtube.com/watch?v=zOGNoDO7mcU) for a deeper dive. Quantum computing is hundreds of millions times faster at some problems than conventional computing. I suspect, as qubits increase, we will see quantum computing become billions or trillions of times faster. 67 | 68 | ### Optimization 69 | 70 | Maybe you've heard the "traveling salesman" problem. Maybe you haven't. 
What you might not have heard, though, is that training AI is all about optimization. OpenAI allegedly spent about $7 million just to train GPT-3 in cloud computing. That is prohibitively expensive, obviously. What if the training process of advanced AI models dropped precipitously? What if the most process-intense parts of optimizing a deep neural network became millions of times more efficient? Billions? Within 10 years, I think that training the equivalent of GPT-3 will be trivial, costing pennies rather than millions of dollars. 71 | 72 | What happens when literally everyone can train something as powerful as GPT-3 in an afternoon? 73 | 74 | It's hard to even wrap my head around this possibility. Anything that you have enough data for, or a good enough simulation environment, can be automated to super-human levels in an afternoon. The economic value here, let alone the disruption to everyday life and work, is incomprehensible. This is why the biggest companies in the world are locked in this arms race. 75 | 76 | ### Simulation 77 | 78 | Right now, the largest networks of computers, such as those for Folding@Home and those used to decode LHC output, are slow and expensive. Ludicrously expensive. These kinds of networks have been used to help develop the COVID-19 vaccines. 79 | 80 | What if the entire backlog of F@H could be done in a few days? What if new vaccine and drug candidates can be tested in massive batches? Thousands of candidates tested against tens of thousands reactions? What if you can then simulate the entire genome of a patient and their drug interactions during an office visit? 81 | 82 | This is not even entirely hypothetical, quantum computing is already being used in material science to help create the next generation of lithium batteries. 83 | 84 | # What life might be like in 2030 85 | 86 | ## Where we live 87 | 88 | Between Starlink, remote work, and job destruction, I suspect the current exodus from cities will explode. I think we will see more and more communities popping up in the cheapest places to live, such as the mostly empty central states. Self-driving electric vehicles are expected to reduce the cost of passenger-miles to 5 cents or less. You'll be able to summon a car to take you from your home to anywhere for just a few dollars. I expect a lot of people will also switch to a more rural, homestead style of living without their conventional jobs. People still need occupations and UBI will create a safety net. 89 | 90 | Beyond that, I suspect economic pressures will force more people to cohabitate. I anticipate a rise in intentional communities and cohousing projects. We may very well find ourselves living in eco-villages before too long. 91 | 92 | ## Where we work 93 | 94 | Honestly, I think that unemployment will skyrocket by 2030. Simply put, many people will be unemployable. This is not due to personal failing or laziness. It will simply be due to the fact that people cannot compete with the kinds of AI and automation I've outlined above. Only those who are capable of gaining domain expertise will be able to find jobs conventional jobs. 95 | 96 | I do suspect the gig economy will continue, though. Homestead life will see more people switching back to gardening, crafts, and other more personal services as a way to make some extra money. Think of things like childcare, pet grooming, and other domestic service. A good friend of mine is a massage therapist - I suspect she will be immune to job losses due to the singularity. 
97 | 98 | If I do lose my job to the singularity, I plan to focus on writing. 99 | 100 | ## Health and medicine 101 | 102 | With a mass exodus from cities, the introduction of UBI, and hopefully universal healthcare, we will probably be under far less stress as a society. One can only hope! Beyond that, I suspect that the combination of quantum computing and deep learning will result in massive breakthroughs in regenerative medicine, de-aging medicine, and remediations for chronic conditions. We're already seeing the beginnings of some of this with nanotech based medicine. 103 | 104 | Taken all together, I think we will be far happier and healthier in 10 years, and we will probably be approaching *indefinite lifespan*. Indefinite lifespan is the idea that all common causes of death will be "solved". Infectious disease will be a thing of the past as well as chronic disease. 105 | 106 | Back around 2010 is when I started paying attention to stem cell therapy and regenerative medicine in general. I predicted then that we would see the first major breakthroughs by 2020. Today, there are a handful of therapies going through clinical trials. The market for regenerative medicine is huge. The first to market will make a bloody fortune, so you can bet your biscuit that someone is working on it! 107 | 108 | A series of implants and/or external sensors will be able to evaluate your health on an ongoing basis. Heart attack, stroke, and aneurysms are gone for good. Early detection of and outright prevention of all cancer will be in place. 109 | 110 | ## Entertainment 111 | 112 | GPT-3 can procedurally generate just about any text, this technology will only get better with time. Imagine you fire up your Nook or Kindle and say you want to read a brand new high fantasy story. *I want to read something like Game of Thrones*, you say to your device. It goes and thinks for a moment and generates the first page of a story. You start reading and, as you do, biometric sensors track your response to what you're reading. When you finish each page, the next one is generated on the fly based upon your reaction to the previous page. You read and entire story that is custom-tailored to your exact preference and reading level. 113 | 114 | Take that one step further. 115 | 116 | Let's imagine that the fourth or fifth generation of GPT technology can generate movie and TV scripts in real-time. Then you have some other generative deep learning models that can translate the screenplay directly into audio and video. Yes, I'm saying that you'll get highly customized TV shows and movies generated in near real-time just for you! 117 | 118 | Some commentators on Reddit suspect that Microsoft licensing GPT-3 means that videogames will have procedurally generated dialog before too long. Take that to a few more evolutions and you'll get entire games that are procedurally generated. 119 | -------------------------------------------------------------------------------- /docs/_posts/2020-10-06-enumerate-esxlunpaths.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "My Enumerate-EsxLunPaths function" 4 | date: 2020-10-06 5 | description: Get-ScsiLun is slow, this way is much faster! 6 | categories: [VMware, ESXi, Storage, PowerCLI, PowerShell, KB] 7 | --- 8 | 9 | # The fastest way to get all LUN paths in vSphere 10 | 11 | The larger your environment gets, the more unwieldy it can become. 
Ideally, you have good process controls in place so only expert hands are modifying the storage environment, but sometimes mistakes get made even by the most veteran engineers. In worst case scenarios, you have very low level techs with insufficient training and too much pressure to just close tickets without checking their work. If you've never worked in such an environment, pray that you don't! But since I have in the past, I developed the fastest way to quickly check all LUN paths for ESX hosts. 12 | 13 | ## Case 1: Needle in a haystack 14 | 15 | For whatever reason, you're looking for a few paths that are down and you need to correlate it back to an NAA ID or a datastore. For those of you that have tried to do this through the GUI I can already hear you groaning. This is the kind of thing that you pitch over the wall for the storage guys to track down, or else the FNG on your team. But with this function, you can easily iterate through all your hosts and look for any paths that are not in the desired state. 16 | 17 | ## Case 2: Daily health report 18 | 19 | One time I worked from dawn until dusk for a week straight because of a neglected storage environment. It turns out that if ESXi cannot use a path, it will keep trying to bring the path up. I'm paraphrasing poorly but this was a behavior change between ESXi 5 and ESXi 6. My organization at the time had decided to be rather lazy when it came to deprovisioning LUNs so there were a LOT of orphaned paths out there that our hosts were dutifully trying to reconnect to. It turns out that if this goes on long enough, or you get enough orphaned paths, you can start crashing services! We had hosts dropping into disconnected state randomly. 20 | 21 | After that, I figured out that a daily health report, looking for orphaned paths was a way to go. This has become a staple for me. I want to know the state of all paths in my environments at all times. I trust my storage guys but I also like to verify. Plus, the richness of the information provided by this function makes troubleshooting and narrowing it down a breeze. 22 | 23 | ## Case 3: A jumping-off point 24 | 25 | In another case, I had a client environment that was showing periodic storage latency issues. I used this function to rapidly enumerate all LUNs on each host and piped that into `Get-Stat` and created an hourly report to see which LUNs were showing increases in latency and throughput. 26 | 27 | # The Code 28 | 29 | Without further ado 30 | 31 | ```powershell 32 | function Enumerate-EsxLunPaths 33 | { 34 | param($vmhost) 35 | $data = @() 36 | 37 | # This is the meat and potatoes here! 
38 | # These 4 lines are lightning fast and gather all the information you need 39 | # The rest of the script is just massaging it into an object to return 40 | $storage_view = Get-View $vmhost.ExtensionData.ConfigManager.StorageSystem 41 | $luns = $storage_view.StorageDeviceInfo.ScsiLun 42 | $multipathdata = $storage_view.StorageDeviceInfo.MultipathInfo.Lun 43 | $mounts = $storage_view.FileSystemVolumeInfo.MountInfo | Where-Object {$_.Volume.Type -like "vmfs"} 44 | 45 | foreach ($mount in $mounts) 46 | { 47 | try 48 | { 49 | $extent = [string]$mount.Volume.Extent.DiskName 50 | $lun = $luns | Where-Object {$_.displayname -like "*$extent*"} 51 | $paths = ($multipathdata | Where-Object {$_.Lun -eq $lun.Key}).Path 52 | 53 | foreach ($path in $paths) 54 | { 55 | 56 | # This is where I create an object to capture all the information 57 | $info = "" | Select-Object vol_name,vol_path,vol_uuid,vol_extent,lun_uuid,lun_device,lun_key,lun_name,lun_vendor,path_name,path_state,path_working 58 | $info.vol_name = $mount.Volume.Name 59 | $info.vol_path = $mount.MountInfo.Path 60 | $info.vol_uuid = $mount.Volume.Uuid 61 | $info.vol_extent = $extent 62 | 63 | $info.lun_uuid = $lun.Uuid 64 | $info.lun_device = $lun.DeviceName 65 | $info.lun_key = $lun.Key 66 | $info.lun_name = $lun.CanonicalName 67 | $info.lun_vendor = $lun.Vendor 68 | 69 | $info.path_name = $path.Name 70 | $info.path_state = $path.State 71 | $info.path_working = $path.IsWorkingPath 72 | 73 | $data += $info 74 | } 75 | } 76 | catch { continue } 77 | } 78 | return $data 79 | } 80 | ``` 81 | -------------------------------------------------------------------------------- /docs/_posts/2020-10-07-deep-learning-firefly.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "How deep learning could give us more Firefly" 4 | date: 2020-10-07 5 | description: Take my love, take my land. Take me where I cannot stand. 6 | categories: [Singularity, Deep-Learning, Entertainment] 7 | --- 8 | 9 | We all want more Firefly, so let's do a thought experiment as to how deep learning could achieve this! 10 | 11 | # The Screenplay 12 | 13 | Technologies like GPT-3 show that we now have deep learning models that can generalize and reproduce a broad variety of text outputs. For instance, [GPT-3 allegedly went undetected on r/AskReddit](https://www.kmeme.com/2020/10/gpt-3-bot-went-undetected-askreddit-for.html) for a week or so. 14 | GPT-3 is what's known as a "one shot" or "few shot" technology, where it has baked in ability to recognize the type of output you want from a single example. So let's say we fine tune GPT-3 with a bunch of TV show scripts and sci-fi novels and then show it the actual original Firefly screenplay and see what it produces. 15 | "Fine tuning" is something you can do on GPT-2 whereby you [add more data to make it more purpose-built for your particular task](https://openai.com/blog/fine-tuning-gpt-2/). Fine-tuning allows you to specialize a big giant model without starting from scratch every time. 16 | 17 | Stay tuned, I think I will give this a shot on my own sometime soon! If I get around to it, I will post an update here, with the results and the code! 18 | 19 | EDIT: [Oh look, someone already did it!](https://towardsdatascience.com/film-script-generation-with-gpt-2-58601b00d371). 
There's a lot of attention on [GPT-3 for this task already](https://www.gwern.net/GPT-3) 20 | 21 | ## What a screenplay looks like 22 | 23 | Just for reference, screenplays are highly standardized with very specific syntax and formats so they are universally accessible and interpretable by directors and producers. They include descriptions, actions, and dialog. 24 | 25 | [Here's an example of what one looks like on paper](https://www.raindance.org/scripts/Firefly_1x02_-_Bushwhacked.pdf) 26 | 27 | # The Video 28 | 29 | First we need to break it down into chunks. We don't need to use a deep neural network to make an entire 44 minute episode. Episodes are broken down into scenes, and scenes are broken down into cuts, or "transitions". 30 | Everytime the camera cuts away to a new perspective is a film segment. Some directors and producers make use of rapid cuts, meaning we would only need to generate a few seconds of video at a time, and then stitch it together. 31 | Firefly, which I just rewatched with my partner, seems to be pretty standard. A long cut in that show would be 30 to 60 seconds, but most are shorter, close up of faces and dialog with some action sequences, establishing shots of Serenity and scenery. 32 | [DVD-GAN](https://medium.com/syncedreview/deepmind-dvd-gan-impressive-step-toward-realistic-video-synthesis-12027d942e53) is already on the way to full video synthesis. 33 | 34 | ## Scenery and Settings 35 | 36 | GANs (Generative Adversarial Networks) have become exceptionally good at [generating realistic images](https://www.marktechpost.com/2020/10/06/nvidia-releases-imaginaire-a-universal-pytorch-library-designed-for-various-gan-based-tasks-and-methods/) from basic information. 37 | NVIDIA released a library called Imaginaire that attempts to standardize this technology, making it more and more accessible. I think it's only a matter of time before this increases in sophistication and quality. 38 | Right now, [text-to-image](https://deepai.org/machine-learning-model/text2img) technology leaves a bit to be desired! As GPUs get more powerful and data increases, we will inevitably see better models. 39 | 40 | ## Characters 41 | 42 | We humans (and yes, I'm a human) are finely calibrated to recognize faces. The uncanny valley has been the death of many early technologies, from CGI to video games. Sites like [This Person Does Not Exist](https://thispersondoesnotexist.com/), however, demonstrate that we are well past the uncanny valley of face generation. 43 | Nathan Fillion has already been [deep-faked into live-action footage](https://www.eurogamer.net/articles/2020-05-02-uncharted-4-deepfake-starring-nathan-fillion-is-as-impressive-as-it-is-scary). So why not Firefly? 44 | 45 | # The Audio 46 | 47 | ## Speech 48 | 49 | Text-to-speech is nothing new. The latest and greatest speech synthesis adds inflection, tone, and style tags. The subtle quality of human emotion poured into speech can be summed up as "prosody". [Here's a creepy-realistic example of prosody embedding!](https://ai.googleblog.com/2018/03/expressive-speech-synthesis-with.html) 50 | 51 | ## Sound Effects 52 | 53 | Back in 2016, [MIT published some work](https://www.wired.com/2016/06/mit-artificial-sound-effects/) about natural sound generation for video. Today, Adobe can provide this as a service, and it's eerily high quality. Seriously, check out [this train](https://research.adobe.com/news/ai-can-generate-realistic-sound-for-video-clips/). 
54 | 55 | ## Music 56 | 57 | Synthetic music is also nothing new, it's just getting [way better](https://towardsdatascience.com/neuralfunk-combining-deep-learning-with-sound-design-91935759d628). And today, music AI is getting to an entirely new [level of beautiful](https://www.zmescience.com/science/ai-musical-composer/). 58 | 59 | # Implications 60 | 61 | ## Netflix and GAN? 62 | 63 | So what does this mean? I think the only logical conclusion is we're going to see a massive explosion of consumer media. If they aren't already, I suspect that Netflix and Amazon are hard at work creating fully synthetic consumer media. We'll probably see books and short stories first, but it's only a matter of time before that morphs to TV and movies. Imagine this: Netflix creates a library full of tens of thousands of movies and shows, all procedurally generated and rated by the masses. Those that are good percolate up. They are filled with actors that never existed, written and produced by directors who never existed. This level of automated media production is still prohibitively expensive. It took millions of dollars just to train GPT-3, which can only do generalized text tasks. It will take something a bit more powerful and sophisticated to do high quality TV screenplays and 4k 60fps film. 64 | 65 | ## IP Laws 66 | 67 | I haven't the foggiest clue as to how this is going to play out. I suspect that IP (intellectual property) laws will say that AI models trained in-house are propriety and that any data they use is part of the model. Thus, they will need rights or license to train on other TV and movies. You can't just copy the screenplays of every Marvel movie and not expect Disney to sue your pants off! Even if an AI is then just learning from the screenplay, the same that another director might. I could see that going to the Supreme Court. 68 | 69 | ## Personalized TV, Movies, Music, and Books 70 | 71 | As GPU technology advances, which it is rapidly due to demand, it will become cheaper and cheaper to train giant models. My personal desktop computer, with its NVIDIA RTX 2070, is more powerful than ASCI Red, the top supercomputer from 1997. Commercial industry is usually about 10 years behind the cutting edge of massive computing power and private homes are about 20 years behind. It took supercomputer level processing to train GPT-3, so we can expect run-of-the-mill businesses to be able to do that by 2030, and everyone else by 2040. 72 | As the cost of producing these giant models comes down, and the quality of data and output increases, we will soon see hyper-personalized entertainment. Want more Firefly? Just ask Netflix or Amazon. Want more Game of Thrones, except the ending is way different? That could be possible, too! 73 | And I don't mean recommender systems, either. I mean stuff that is generated *just for you*. 74 | -------------------------------------------------------------------------------- /docs/_posts/2020-10-12-vmware-network-disk-throughput.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Monitoring VMware VM and ESXi Throughput" 4 | date: 2020-10-12 5 | description: Network and disk IO can kill your environment 6 | categories: [VMware, ESXi, Storage, PowerCLI, PowerShell, Monitoring, UCS, KB] 7 | --- 8 | 9 | As a VMware admin or engineer, I'm sure you know that the first thing people ask about when VM performance is slow is for more CPU and RAM. 
10 | It is true that there are some memory-hungry applications out there and right-sizing for CPU is critical. But these are only two of the four food groups of computing. 11 | Just as critical are network and storage, and in datacenter environments, both are subject to shared networks in the form of TCP/IP, FC, SAN, and FCoE. By extension, they can be subject to latency and congestion problems. 12 | Most people's experience with computers is with local storage, which is almost never going to be a bottleneck. Even more confounding is the fact that a bottleneck on any one of these resources can appear to impact performance elsewhere. 13 | For instance, if storage is too slow, you might start slamming your CPU and memory as SQL queries pile up. Or if you're short on RAM, you might start swapping to disk, slamming storage speed. 14 | 15 | # VM Throughput Report 16 | 17 | Here, I gather usage statistics for a period of 24 hours and find the peak usage. I then pare it down to the top talkers in each category. 18 | This kind of report is helpful to have on hand in case something changes, you can just glance at the report every morning to see if a new VM is spiking. 19 | More often than not, this will help you identify problematic VMs before they percolate up to the awareness of management, leadership, and even other teams. 20 | I cannot tell you how many times I've spotted an errant database or backup server before anyone else noticed it and headed off a high visibility issue. 21 | An ounce of prevention is worth a pound of cure! The ideal crisis is the one that never happens! 22 | 23 | ```powershell 24 | $vms = Get-VM | Where-Object {$_.PowerState -like "*on*"} | sort-object 25 | $data = @() 26 | 27 | 28 | foreach ($vm in $vms) 29 | { 30 | $net = $vm | Get-Stat -Stat net.usage.average -Start (Get-Date).AddDays(-1) -Finish (Get-Date) -MaxSamples 5000 | Sort-Object -Property value -Descending | Select-Object -First 1 31 | $disk = $vm | Get-Stat -Stat disk.usage.average -Start (Get-Date).AddDays(-1) -Finish (Get-Date) -MaxSamples 5000 | Sort-Object -Property value -Descending | Select-Object -First 1 32 | $info = "" | Select-Object vm,max_net_kbps,max_net_time,max_disk_kbps,max_disk_time 33 | $info.vm = $vm.Name 34 | $info.max_net_kbps = $net.Value 35 | $info.max_net_time = $net.Timestamp 36 | $info.max_disk_kbps = $disk.value 37 | $info.max_disk_time = $disk.timestamp 38 | $info | fl 39 | $data += $info 40 | } 41 | 42 | Clear-Host 43 | 44 | $data | Sort-Object -Property max_net_kbps -Descending | Select-Object vm,max_net_kbps,max_net_time -First 50 | ft 45 | 46 | $data | Sort-Object -Property max_disk_kbps -Descending | Select-Object vm,max_disk_kbps,max_disk_time -First 50 | ft 47 | ``` 48 | 49 | At the end you can do whatever you want with the data. I send it out via email once a day. Usually, I'm really lazy and just do `Out-String` and set the output into `
<pre>` tags in an HTML email. It ain't pretty but it's fast and valuable! 
 50 | 
 51 | I also recommend removing the additional console output (the `$info | fl` line) if you do schedule this as an unattended job. 
 52 | 
 53 | # ESX Host Throughput
 54 | 
 55 | This is the exact same information, just focused on hosts instead of individual VMs. 
 56 | 
 57 | ```powershell
 58 | $vmhosts = Get-VMHost | Where-Object {$_.ConnectionState -like 'connected'} | Sort-Object
 59 | 
 60 | $data = @()
 61 | 
 62 | foreach ($vmhost in $vmhosts)
 63 |     {
 64 |     $net = $vmhost | Get-Stat -Stat net.usage.average -Start (Get-Date).AddDays(-1) -Finish (Get-Date) -MaxSamples 5000 | Sort-Object -Property value -Descending | Select-Object -First 1  # peak network throughput sample from the last 24 hours
 65 |     $disk = $vmhost | Get-Stat -Stat disk.usage.average -Start (Get-Date).AddDays(-1) -Finish (Get-Date) -MaxSamples 5000 | Sort-Object -Property value -Descending | Select-Object -First 1  # peak disk throughput sample from the last 24 hours
 66 |     $info = "" | Select-Object vmhost,max_net_kbps,max_net_time,max_disk_kbps,max_disk_time
 67 |     $info.vmhost = $vmhost.Name
 68 |     $info.max_net_kbps = $net.Value
 69 |     $info.max_net_time = $net.Timestamp
 70 |     $info.max_disk_kbps = $disk.value
 71 |     $info.max_disk_time = $disk.timestamp
 72 |     $info | fl
 73 |     $data += $info
 74 |     }
 75 | 
 76 | Clear-Host
 77 | 
 78 | $data | Sort-Object -Property max_net_kbps -Descending | Select-Object vmhost,max_net_kbps,max_net_time -First 20 | ft 
 79 | 
 80 | $data | Sort-Object -Property max_disk_kbps -Descending | Select-Object vmhost,max_disk_kbps,max_disk_time -First 20 | ft
 81 | ```
 82 | 
 83 | You can add other helpful information such as Cluster if you like. 
 84 | 
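For example, a couple of extra lines inside the loop will tag each row with its cluster. This is just a sketch of the idea:

```powershell
# Hypothetical extra column: record which cluster each host belongs to
# (add 'cluster' to the Select-Object property list above as well)
$info.cluster = (Get-Cluster -VMHost $vmhost).Name
```
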
 85 | # Datastore Latency
 86 | 
 87 | Storage latency is one of those things that is horribly counter-intuitive to pretty much everyone except virtualization and storage folks, and some cross-pollinated network folks. 
 88 | App folks, developers, and DBAs tend to not grasp this problem unless their domain deals specifically with storage technologies. This is nothing against them - we all have domains of expertise for a reason.
 89 | Datastore latency is one of those more arcane metrics that is harder to get but incredibly critical when you need it. Hence this script!
 90 | 
 91 | IMPORTANT: This script relies upon another function I have documented here: [Enumerate-EsxLunPaths](https://daveshap.github.io/DavidShapiroBlog/2020/10/06/enumerate-esxlunpaths.html)
 92 | 
 93 | ```powershell
 94 | $stat_list = "disk.deviceReadLatency.average","disk.deviceWriteLatency.average"
 95 | $host_data = @()
 96 | 
 97 | foreach ($cluster in Get-Cluster)
 98 |     {
 99 |     $vmhosts = $cluster | Get-VMHost | Where-Object {$_.connectionstate -like "connected"}
100 |     foreach ($vmhost in $vmhosts)
101 |         {
102 |         $stats = $vmhost | Get-Stat -Realtime -Stat $stat_list
103 |         $stats = $stats | Where-Object {$_.Value -gt 200}  # change this threshold to squelch noise
104 |     if (-not $stats) { continue }  # nothing over the threshold on this host, move on
105 |         $lun_data = Enumerate-EsxLunPaths $vmhost
106 |         foreach ($stat in $stats)
107 |             {
108 |             $lun = $lun_data | Where-Object {$_.lun_name -like $stat.Instance} | Select-Object -First 1
109 |             $info = "" | Select-Object host,cluster,max_latency_ms_15_min,datastore,device_id
110 |             $info.host = $vmhost.Name
111 |             $info.cluster = $cluster.name
112 |             $info.max_latency_ms_15_min = $stat.value
113 |             $info.datastore = $lun.vol_name
114 |             $info.device_id = $stat.instance
115 |             $info | fl
116 |             $host_data += $info
117 |             }
118 |         }
119 |     }
120 | ```
121 | 
122 | The biggest problem with this set of stats is that realtime samples expire very quickly, so you may need to run this repeatedly, regularly checking for problematic LUN paths. 
123 | If you've ever had a major issue with storage latency, I'm sure you're already salivating. This script relies upon my `Enumerate-EsxLunPaths` function, which is lightning fast. 
124 | You can run this script on demand or as a scheduled task. Generally, I have used it as an on-demand tool to help troubleshoot big issues while on calls real-time. 
125 | You're most likely to need this particular gem when backup jobs are taking too long or Oracle queries are bogging down. 
126 | 
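Since the samples age out so fast, here is a rough sketch of how I might bolt an email notification onto the end of the loop above. It assumes you have helpers like the `Make-HtmlTable` and `Send-EmailHtml` functions from my email functions module imported, and the addresses and SMTP server are placeholders:

```powershell
# Hypothetical follow-up: mail the findings whenever the loop above catches anything
if ($host_data.Count -gt 0)
    {
    $html = Make-HtmlTable $host_data | Out-String
    Send-EmailHtml "vmware-team@example.com" "reports@example.com" "Datastore latency alert" $html "smtp.example.com"
    }
```
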
127 | # BONUS: UCS Statistics one-liner!
128 | 
129 | Okay, so you know how VMware basically nests the OSI model inside the OSI model? Cisco UCS takes that to the *nth* degree by abstracting all storage and networking over the IOM uplinks. 
130 | I once saw just how crazy this is when a network engineer noticed that one particular IOM port was saturated during backup jobs. I traced it down and discovered that `Chassis/FEX Discovery Policy` was not set to `Port Channel`, which meant that every other server was pinned to a given IOM uplink on one side. Yikes!
131 | It took a few weeks of complaints getting louder and louder before we discovered that **storage**-intensive backup jobs were trashing **network throughput** because of this shared nature of UCS. 
132 | 
133 | Without further ado, the magical command that can enumerate more UCS statistics than you ever wanted!
134 | 
135 | ```powershell
136 | Get-UcsStatistics -Current | Where-Object {$_.Rn -like "tx-stats"}
137 | ```
138 | 
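If that firehose is too much, the `Dn` on each statistics object tells you which component it came from, so you can narrow things down. The chassis number here is just an example; adjust the pattern to your own environment:

```powershell
# Only the tx-stats for components under chassis 1
Get-UcsStatistics -Current | Where-Object {$_.Rn -like "tx-stats" -and $_.Dn -like "sys/chassis-1/*"}
```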


--------------------------------------------------------------------------------
/docs/_posts/2020-10-14-powershell-email-functions.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | layout: default
 3 | title: "PowerShell Email Functions"
 4 | date: 2020-10-14
 5 | description: Since I'm a monitoring addict, I use email reports a lot...
 6 | categories: [PowerShell, Email, Monitoring, KB]
 7 | ---
 8 | 
 9 | These are some functions I keep in a PowerShell module on my Rundeck server for easy reuse. I can import the module at the beginning of jobs like so: 
10 | 
11 | ```powershell
12 | Import-Module -Force -DisableNameChecking "C:\EmailFunctions.psm1"
13 | ```
14 | 
15 | # Send HTML Email
16 | 
17 | For the most part, you will want some formatting, which means you need HTML support. In my experience, most organizations have internal SMTP relays that will relay to local (domain) addresses without authentication. This is pretty typical as many servers, applications, and hardware support email notifications but many do not support SMTP authentication. 
18 | 
19 | ```powershell
20 | function Send-EmailHtml
21 |     {
22 |     param($to, $from, $subject, $html, $smtp)
23 |     $message = New-Object System.Net.Mail.MailMessage $from, $to
24 |     $message.Subject = $subject
25 |     $message.IsBodyHTML = $true
26 |     $message.Body = $html
27 |     $client = New-Object Net.Mail.SmtpClient($smtp)  # $smtp is the FQDN of your SMTP server or relay
28 |     $client.Send($message)
29 |     }
30 | ```
31 | 
32 | # Make a pretty HTML table
33 | 
34 | PowerShell already has a built-in cmdlet (`ConvertTo-Html`) for converting data objects to HTML tables, but its default output is ugly. 
35 | 
36 | ```powershell
37 | function Make-HtmlTable
38 |     {
39 |     param($data)
40 |     $t_header = @"
41 | <style>
42 | table {
43 |     border-collapse: collapse;
44 |     font-family: sans-serif;
45 |     }
46 | th, td {
47 |     border: 1px solid #dddddd;
48 |     padding: 4px 8px;
49 |     }
50 | th {
51 |     background-color: #4c4c4c;
52 |     color: #ffffff;
53 |     }
54 | "@
55 |     $table = $data | ConvertTo-Html -Head $t_header
56 |     return $table
57 |     }
58 | ```
59 | 
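Putting the two functions together looks something like this. The data, addresses, and server name are placeholders, just to show the plumbing:

```powershell
# Quick demo: mail myself the top ten processes by working set
$report = Get-Process | Sort-Object WS -Descending | Select-Object Name,Id,WS -First 10
$html = Make-HtmlTable $report | Out-String
Send-EmailHtml "me@example.com" "rundeck@example.com" "Top processes" $html "smtp.example.com"
```
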
60 | # Send email with attachment
61 | 
62 | I have, on occasion, been asked to schedule reports for other people. Sometimes they don't want it in a pretty HTML table; they want it as an Excel doc or something. For that, I recommend the [ImportExcel module from the PowerShell Gallery](https://www.powershellgallery.com/packages/ImportExcel/7.1.1). 
63 | 
64 | ```powershell
65 | function Send-EmailAttachment
66 |     { 
67 |     param($to, $from, $subject, $body, $attachment, $smtp)
68 |     # attachment must be in the form of full file path to attachment
69 |     $message = New-Object System.Net.Mail.MailMessage $from, $to
70 |     $message.Subject = $subject
71 |     $message.IsBodyHTML = $true
72 |     $message.Body = $body
73 |     $file = New-Object Net.Mail.Attachment($attachment)
74 |     $message.Attachments.Add($file)
75 |     $client = New-Object Net.Mail.SmtpClient($smtp)  # $smtp is the FQDN of your SMTP server or relay
76 |     $client.Send($message)
77 |     }
78 | ```
79 | 
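Here's roughly how that plays out with ImportExcel doing the heavy lifting. The file path and addresses are placeholders:

```powershell
# Export the data to a spreadsheet, then ship it off
$xlsx = "C:\Reports\daily_report.xlsx"
$report | Export-Excel -Path $xlsx
Send-EmailAttachment "boss@example.com" "rundeck@example.com" "Daily report" "Report attached." $xlsx "smtp.example.com"
```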


--------------------------------------------------------------------------------
/docs/_posts/2020-10-16-hyperflex-rest-api-python.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | layout: default
  3 | title: "Monitoring Cisco HyperFlex REST API with Python"
  4 | date: 2020-10-16
  5 | description: I'm actually super salty about this
  6 | categories: [UCS, HyperFlex, Python, Monitoring, KB]
  7 | ---
  8 | 
  9 | # The REST API Explorer
 10 | 
 11 | I judge companies and products harshly by their documentation. I judge API documentation especially stringently because, well, an API is meant to be standardized. There's no excuse for sloppy REST API documentation.
 12 | 
 13 | Cisco HyperFlex documentation... [leaves a lot to be desired](https://developer.cisco.com/docs/ucs-dev-center-hyperflex/#!connecting-to-the-hyperflex-rest-api-explorer). Seriously, just compare that to [Twitter's API documentation](https://developer.twitter.com/en/docs/twitter-api). It's pitiful. 
 14 | 
 15 | To make matters worse, the HyperFlex REST API apparently changes drastically from release to release, so it's increasingly critical that you get familiar with the REST API Explorer. There is something to be said for having a robust API explorer, so they get some points back for that one. Still, you can see that I have a lot of debug output in these scripts, and I decided to keep it to show you just how much effort it took to figure this out!
 16 | 
 17 | # Authentication
 18 | 
 19 | This is what I got to work after cobbling together some snippets from around the internet and gaining access to my own instance of HyperFlex. A few things to note: The `client_id` and `client_secret` are apparently mandatory but not really documented anywhere that I recall.
 20 | 
 21 | ```python
 22 | import requests
 23 | import json
 24 | 
 25 | def auth_to_hx(fqdn, username, password):
 26 |     url = 'https://%s/aaa/v1/auth?grant_type=password' % fqdn
 27 |     headers={'content-type':'application/json'}
 28 |     payload = {'username': username,
 29 |         'password': password,
 30 |         'client_id': 'HxGuiClient',
 31 |         'client_secret': 'Sunnyvale',  # this is the default, you can change it 
 32 |         'redirect_uri': 'http://%s' % fqdn}
 33 |     try:
 34 |         response = requests.post(url,headers=headers,data=json.dumps(payload),verify=False,timeout=40)
 35 |         if response.status_code == 201:
 36 |             #print('Login succeeded to', fqdn)
 37 |             return response.json()  # this is your authentication token
 38 |         else:
 39 |             #print('LOGIN FAILED', fqdn.upper())
 40 |             #print(response.status_code)
 41 |             #print(response.text)
 42 |             return None
 43 |     except Exception as oops:
 44 |         #print('LOGIN FAILED', fqdn.upper())
 45 |         #print(oops)
 46 |         return None
 47 | ```
 48 | 
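One side note: `verify=False` makes `requests` print an `InsecureRequestWarning` on every call. If the self-signed certificate chatter bothers you, you can optionally squelch it (at your own risk):

```python
import urllib3

# Suppress the self-signed certificate warnings that verify=False triggers
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```
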
 49 | # Enumerate Clusters
 50 | 
 51 | HyperFlex is organized in a few ways. There are the Clusters and the Platform. We will get to the Platform in a moment. The Clusters are the meat and potatoes, though. Pass the `token` you get from authenticating to this function.
 52 | 
 53 | Note: The `timeout` option became necessary because sometimes HyperFlex doesn't respond as fast as you'd like. Sometimes it's lightning fast. I don't really know why there's variance. 
 54 | 
 55 | 
 56 | ```python
 57 | def get_hx_clusters(fqdn, token):
 58 |     url = 'https://%s/rest/clusters' % fqdn
 59 |     #print(url)
 60 |     headers={'content-type':'application/json','Authorization':token['token_type'] + token['access_token']}
 61 |     response = requests.get(url,headers=headers,verify=False,timeout=40)
 62 |     if response.status_code == 200:
 63 |         return response.json()
 64 |     else:
 65 |         #print(response.status_code, response.text)
 66 |         return None
 67 | ```
 68 | 
 69 | Another note: Each cluster object returned has the child element `/entityRef/id`. This is the `Cluster UUID` or `CUUID` that you will need to reference the cluster by later. Again - this was not documented anywhere! To make matters even worse, the CUUID needs to be URL encoded, so you have to convert it manually like so:
 70 | 
 71 | ```python
 72 | cuuid = cluster['entityRef']['id'].replace(':','%3A')
 73 | ```
 74 | 
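If you'd rather not hand-roll the escaping, the standard library does the same thing:

```python
from urllib.parse import quote

# Equivalent to the manual replace above; encodes ':' as '%3A'
cuuid = quote(cluster['entityRef']['id'], safe='')
```
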
 75 | Starting to see why I'm grumpy about the HyperFlex REST API?
 76 | 
 77 | 
 78 | # Get Platform Alarms
 79 | 
 80 | The Platform is the management/controller portion. As best I can figure, this is roughly analogous to vCenter in vSphere. It's similar to the methodology used in other hyperconverged systems, such as HPE Simplivity, where you've got a VM embedded that runs the software and manages the backplane. In UCS-world, the closest thing is likely UCSM that runs inside the Fabric Interconnects. 
 81 | 
 82 | The bottom line is that you will want to monitor the HyperFlex platform as well as the clusters/hosts. Most of the time there's going to be nothing reported, but that's okay. Still, I'd much prefer the old fashioned `Get-UcsFault` style. One request, all alarms. Period, end of story. Even VMware doesn't have anything that good. Alas, it seems like we were lucky only the once. Maybe I'll combine all these functions into a single `Get-HxFault`?
 83 | 
 84 | The platform layer will return alarms dealing with things like the manager configuration, backups, and the like. 
 85 | 
 86 | ```python
 87 | def get_hx_alarms(fqdn, token):
 88 |     url = 'https://%s/rest/virtplatform/alarms' % fqdn
 89 |     #print(url)
 90 |     headers={'content-type':'application/json','Authorization':token['token_type'] + token['access_token']}
 91 |     response = requests.get(url,headers=headers,verify=False,timeout=40)
 92 |     if response.status_code == 200:
 93 |         return response.json()
 94 |     else:
 95 |         #print(response.status_code, response.text)
 96 |         return None
 97 | ```
 98 | 
 99 | I actually don't know what these might look like because I didn't record the one I had; I just fixed it. If I recall correctly, it was something like "email alerts not configured" or "phone home support unreachable". Those kinds of things. 
100 | 
101 | # Get Cluster Alarms
102 | 
103 | Cluster alarms are separate from platform alarms, which are yet still separate from cluster health. I really wish it weren't so, but it is what it is. Cisco did a great thing by giving you all UCS faults in one endpoint via `Get-UcsFault` and I will forever be salty that no other platforms seem to work this way. I just want to monitor everything and fix things before they blow up. Why is that so hard? 
104 | 
105 | Okay, I'll quit whining. Here's the cluster alarms function. This will return things like host and VM level errors. 
106 | 
107 | ```python
108 | def get_hx_cluster_alarms(fqdn, token, cuuid):
109 |     url = 'https://%s/coreapi/v1/clusters/%s/alarms' % (fqdn, cuuid)
110 |     #print(url)
111 |     headers={'content-type':'application/json','Authorization':token['token_type'] + token['access_token']}
112 |     response = requests.get(url,headers=headers,verify=False,timeout=40)
113 |     if response.status_code == 200:
114 |         return response.json()
115 |     else:
116 |         print(response.status_code, response.text)
117 |         return None
118 | ```        
119 | 
120 | This is what a cluster alarm might look like:
121 | 
122 | ```json
123 | [ { "acknowledged": False,
124 |     "acknowledgedTime": 0,
125 |     "acknowledgedTimeAsUTC": "",
126 |     "description": "Default alarm to monitor virtual machine memory usage",
127 |     "entityName": "",
128 |     "entityType": "VIRTUALMACHINE",
129 |     "entityUuId": "vm-1111",
130 |     "message": "Default alarm to monitor virtual machine memory usage",
131 |     "name": "alarm-6.vm-1111",
132 |     "status": "CRITICAL",
133 |     "triggeredTime": 1002709437316,
134 |     "triggeredTimeAsUTC": "2010-07-16T00:50:37Z",
135 |     "uuid": "alarm-6!!Alarm!!alarm-6!!vm-3517!!VirtualMachine!!vm-3517"}]
136 | ```    
137 | 
138 | 
139 | # Get Cluster Health
140 | 
141 | Cluster health is fun. This primarily focuses on the storage replication status. 
142 | 
143 | ```python
144 | def get_hx_cluster_health(fqdn, token, cuuid):
145 |     url = 'https://%s/coreapi/v1/clusters/%s/health' % (fqdn, cuuid)
146 |     #print(url)
147 |     headers={'content-type':'application/json','Authorization':token['token_type'] + ' ' + token['access_token']}
148 |     response = requests.get(url,headers=headers,verify=False,timeout=40)
149 |     if response.status_code == 200:
150 |         return response.json()
151 |     else:
152 |         print(response.status_code, response.text)
153 |         return None
154 | ```
155 | 
156 | This is what it might look like:
157 | 
158 | ```json
159 | { "dataReplicationCompliance": "COMPLIANT",
160 |   "resiliencyDetails": { "dataReplicationFactor": "TWO_COPIES",
161 |                          "hddFailuresTolerable": 1,
162 |                          "messages": ["Storage cluster is healthy. "],
163 |                          "nodeFailuresTolerable": 1,
164 |                          "policyCompliance": "COMPLIANT",
165 |                          "resiliencyState": "HEALTHY",
166 |                          "ssdFailuresTolerable": 1},
167 |   "state": "ONLINE",
168 |   "uuid": "",
169 |   "zkHealth": "ONLINE",
170 |   "zoneResiliencyList": []}
171 | ```
172 | 
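Since I keep pining for a `Get-UcsFault` equivalent, here's a minimal sketch of what a combined poll could look like, built from the three functions above. The `source` tagging and the `HEALTHY` check are my own conventions, not anything the HyperFlex API provides:

```python
def get_hx_faults(fqdn, token, cuuid):
    # one-stop wrapper around the three calls above; returns a single list
    faults = []
    platform = get_hx_alarms(fqdn, token)
    if platform:
        faults += [{'source': 'platform', 'alarm': a} for a in platform]
    cluster = get_hx_cluster_alarms(fqdn, token, cuuid)
    if cluster:
        faults += [{'source': 'cluster', 'alarm': a} for a in cluster]
    health = get_hx_cluster_health(fqdn, token, cuuid)
    # treat anything other than a HEALTHY resiliency state as a fault
    if health and health.get('resiliencyDetails', {}).get('resiliencyState') != 'HEALTHY':
        faults.append({'source': 'health', 'alarm': health})
    return faults
```

One request, all alarms - or as close as this API will let you get.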
173 | 


--------------------------------------------------------------------------------
/docs/_posts/2020-10-18-predictions-self-driving-ev-cars.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | layout: default
  3 | title: "Some Predictions On Self-Driving EV Cars"
  4 | date: 2020-10-18
  5 | description: TLDR - lots of cost savings and job changes plus a bonus homestead house!
  6 | categories: [Deep-Learning, Electric-Vehicles, Self-Driving]
  7 | ---
  8 | 
  9 | # Autonomous is Coming
 10 | 
 11 | Fully automated cars are coming. It's not just Tesla. There are a number of other startups and small companies getting in the game. So are the giants.
 12 | [General Motors](https://www.theverge.com/2020/10/15/21517833/cruise-driverless-cars-test-permit-california-dmv) has entered the field, as have the likes of Ford, Nissan, and Honda. The writing is on the wall: Fully autonomous cars are coming.
 13 | The big buzz right now is about *robo-taxis*. These are autonomous fleets of cars that you summon via app, just like Uber and Lyft, except there's no human driver. Automobile-as-a-Service.
 14 | 
 15 | The investment is there. We are now inexorably heading for a driverless future.  
 16 | 
 17 | # The Good
 18 | 
 19 | ## Latent Demand
 20 | 
 21 | Life is hard and getting harder for many folks. Reliable transportation is listed as a requirement for many jobs and is simply out of reach for some people. Owning a car is expensive and complicated. Insurance, maintenance, inspections, taxes. It adds up.
 22 | Even those of us with cars might sometimes question their reliability and avoid long hauls for fear of getting stranded somewhere.
 23 | 
 24 | All that is about to change. 
 25 | 
 26 | Individual-transit autonomous EVs are expected to get down to 5 cents per passenger-mile. That means a thirty-mile commute (sixty miles round trip) will cost you a whopping $3 per day. Let's say that rate is absurdly low and off the mark by a factor of 3.
 27 | That's still only $9 per day to get to and from work with no further risk or expenses. Even though these cars will be getting much more use, reliability is not a concern. Why? The car itself can request a replacement automatically if it breaks down.
 28 | Worst case scenario: you summon a replacement yourself with your phone.
 29 | 
 30 | This will result in far more people gaining access to work that is presently out of reach. Mass transit models of autonomous EVs are expected to get as low as 1 cent per passenger-mile. Suddenly, transport is no longer a barrier for most people. I suspect this will have many knock-on effects on the economy.
 31 | Imagine dozing off every commute. Or reading, or continuing to work. Suddenly, a longer commute doesn't sound like a bad idea. Now you can buy a rural dream house and still have the urban job. Moving across the country is suddenly more feasible. Just summon a bigger truck! 
 32 | 
 33 | Stress will decrease for millions of people, access to goods, services, and jobs will increase. This is a huge win for everyone. 
 34 | 
 35 | ## Climate Change
 36 | 
 37 | It goes without saying that getting off fossil fuels is a good thing. There are a few caveats, namely that we have to deploy enough renewables and batteries to go completely carbon neutral. That also means reducing the carbon intensity of producing renewable hardware and batteries.
 38 | I consider this a problem that will inevitably be solved for economic reasons. Rare and conflict minerals are intrinsically more expensive. It just makes economic sense to find abundant alternatives. 
 39 | 
 40 | A more immediate impact will be improved cardiopulmonary health, particularly in dense urban regions. This will decrease health costs and increase quality of life for millions of people.
 41 | 
 42 | ## More Automotive Jobs
 43 | 
 44 | American motorists presently drive about 13,000 miles per year on average. I think that could easily double due to cheaper and safer travel, as well as that aforementioned latent demand. More miles means more wear and tear. You know what else that means?
 45 | 
 46 | More mechanics. 
 47 | 
 48 | A lot more. Sure, EVs are a bit cheaper to maintain, but EVs are going to rack up miles far faster than ICE vehicles. The only thing fundamentally different is the power train. Everything else is still just an ordinary car. 
 49 | Brakes and tires wear out. Electric windows and ACs fail. Windows and windshields break. Calipers and bearings all need replacing. While a lot of things will change with EVs, there's a lot that won't when you look at the nuts and bolts.
 50 | 
 51 | The aforementioned latent demand, I think, will see the total passenger-miles per year skyrocket. The same thing happened when electricity became cheaper - people just used a lot more of it. I would argue that electricity still has a lot of unmet demand. 
 52 | I think that the EV repair shops are going to be hopping busy. This will bode well for the [presently-declining automotive tech industry](https://www.cnbc.com/2017/05/22/goldman-sachs-analysis-of-autonomous-vehicle-job-loss.html).
 53 | By extension, I think this could be a boon to the autoparts industry as well.
 54 | 
 55 | A more speculative new job category would be remote drivers. These are folks who work in a remote call-center of sorts and can remotely control vehicles that are in distress.
 56 | 
 57 | ## Childcare and School
 58 | 
 59 | The NYC MTA says that children as young as 12 can ride the subway by themselves. I suspect that EVs, with their plethora of cameras, could allow much younger children to ride without supervision. This could be a game-changer for childcare and education.
 60 | Instead of waiting for the schoolbus in the morning, a vehicle is summoned for the kids, taking them to the school that is best suited to their needs, not where they happen to be geographically constrained. This yields the possibility of building more schools in cheaper areas. 
 61 | 
 62 | An extension of this includes access to after-school programs. I think this will be particularly important for children in high-risk neighborhoods. Suddenly, it becomes a lot easier for a kid to stay somewhere safer and more accommodating for studying after school, rather than going straight home to a rougher neighborhood.
 63 | This extra mobility could very well be the difference in getting more children out of the cycle of poverty. 
 64 | 
 65 | ## Savings for Everyone
 66 | 
 67 | People who own cars spend anywhere from 10% to over 30% of their income on their cars. Once car ownership becomes optional, that's a LOT of money that folks can spend elsewhere. That reallocation of funds will likely have dramatic effects on other segments of the economy. 
 68 | 
 69 | ## Safety for Everyone
 70 | 
 71 | It's true that autonomous cars make mistakes. It's also true that they make far fewer mistakes than humans already, and that will only get better as the technology improves. Presently, about 38,000 people die each year from traffic accidents and another 4 million are seriously injured.
 72 | That's a lot of avoidable death and injuries. Traffic fatalities and injuries are incredibly traumatic. They don't just create permanent loss of life and limb, but permanent emotional harm to survivors and family. I suspect traffic deaths will fall off exponentially over the coming years. 
 73 | 
 74 | # The Bad
 75 | 
 76 | ## Millions of Jobs Gone
 77 | 
 78 | There are currently about 4 million driving jobs in America. Machines tend to be more reliable and safer than humans once the kinks are worked out. They can also work tirelessly, hence why industrial robots have been assembling cars for several decades now. This reliability and safety translates to one thing: better bottom lines.
 79 | Before long, human drivers are going to be antiquated - a privilege for concierge-level services only. Pizza delivery? Gone. Long-haul truckers? Gone. Amazon Prime drivers? Gone. FedEx, UPS, and USPS? All gone. 
 80 | It will become economically infeasible to hire humans for these jobs. 
 81 | 
 82 | Petroleum figures into a lot more than just automobiles. Figures vary wildly depending on who you ask, but the petroleum industry supports somewhere between 2 million and 10 million jobs. Any significant disruption in that domain can have huge knock-on effects. 
 83 | It's difficult to imagine a simple antidote to this problem. With the rise of automation, I suspect many of these folks will permanently transition out of the workforce. These jobs include oil and gas workers, as well as gas station employees. 
 84 | 
 85 | With so many people permanently unemployed, I don't see any solution other than redistribution of wealth via Universal Basic Income. 
 86 | 
 87 | ## Rare and Conflict Minerals
 88 | 
 89 | Batteries, cameras, motors, and deep learning hardware all require some exotic elements to manufacture. Presently, that presents a huge humanitarian and moral dilemma. We don't want to fund crimes against humanity, slave labor, human trafficking, and child labor. 
 90 | Unfortunately, we have few other options at present. Some of the rarest minerals on the planet are controlled by some of the most despotic people. As the demand for autonomous EVs rises, those despots will become even worse to maximize their own profits.
 91 | 
 92 | The only solution is to develop alternative materials through things like nanotechnology and other advanced materials science. This takes time and, more importantly, a lot of investment. 
 93 | 
 94 | ## Privacy Goes Out the Window
 95 | 
 96 | Buses and other public transit already have cameras. EVs and autonomous cars are going to have even more. I am willing to bet that the EULA of every autonomous car service includes signing over your data. This probably includes biometrics about your body specifically, as well as your destinations and pickups.
 97 | If push came to shove, I bet this could be used to harm individual liberty. 
 98 | 
 99 | One way to mitigate this would be something akin to GDPR. I would expect to see legislation before too long that restricts what kinds of data car companies can collect, or requires some kind of compensation or a paid opt-out.
100 | 
101 | # In Conclusion
102 | 
103 | The cost savings for people will all but guarantee the transition from manual ICE vehicles to autonomous EVs. The increase in safety paired with decrease in per-mile cost makes this a no-brainer. 
104 | 
105 | ## Reallocation
106 | 
107 | The cost-savings to individuals will get allocated elsewhere. Where exactly? I can't say. Perhaps home ownership? I suspect that cheaper homes in less desirable areas will become more popular with the rise of autonomous EVs. 
108 | Perhaps we will see some folks set up a trucker-to-homebuilder training pipeline? Perhaps we will see a huge rise in demand for electricians, plumbers, and general-purpose handymen to build and maintain these houses.
109 | 
110 | Academics predict that jobs requiring improvisation, manual dexterity, and creative problem solving will be the last to be automated away. This supports the idea that plumbing, electrical, and repair jobs will be robust against automation.
111 | 
112 | Other types of jobs that will remain robust against automation include teaching, childcare, therapy, massage, nursing, and other human-oriented caregiving services. I think it would be wonderful to see classrooms with fewer than 10 students per teacher. 
113 | 
114 | ## Relocation
115 | 
116 | With reliable transportation a solved problem, and more money in people's pockets, we could see a surge to rural life. This may have additional knock-on effects in terms of physical and mental health. Allergies and auto-immune disease are reduced by increased contact with nature. More time in nature means better mental health. Smaller communities also promote more regular human connection. 
117 | 
118 | ## Health And Safety
119 | 
120 | The one-two punch of increased physical safety and decreased environmental hazards will do wonders for the loss of human life. I expect there will come a time that people aren't allowed to drive manually and that ICE vehicles are banned outright. This won't be seen directly, but rather will be felt as an absence - like how we don't have to worry about polio today. 
121 | 


--------------------------------------------------------------------------------
/docs/_posts/2020-10-19-neuralink-predictions.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | layout: default
 3 | title: "Neuralink Predictions"
 4 | date: 2020-10-19
 5 | description: We're a long ways off from Ghost in the Shell
 6 | categories: [Neuralink, Singularity]
 7 | ---
 8 | 
 9 | # I've seen this movie before...
10 | 
11 | Neuralink appears to be the first practical step towards a *Ghost in the Shell* type world where our brains can be plugged in and hacked. Ghost in the Shell is a cyberpunk masterpiece.
12 | I lowkey place *The Matrix* in the same world as Ghost in the Shell, just a couple decades ahead. In *The Matrix*, machines can plug into our brains and give us a complete VR simulation world.
13 | In *Ghost in the Shell* you can dive into more abstract cyberscapes to do your hacking and social media. Perhaps most impressively, *The Matrix* postulates that you could learn anything in a matter of seconds.
14 | 
15 | Elon Musk has been careful not to make any such grandiose promises, except on [Joe Rogan](https://www.youtube.com/watch?v=Jtn4tr202Ko). That being said, he has let slip some extraordinary claims. Still, I'm not here to debate the man, I'm simply going to write my own predictions for this technology. Here's a non-exhaustive list of movies that immediately come to mind:
16 | 
17 | * The Matrix
18 | * Ghost in the Shell
19 | * Surrogates
20 | 
21 | # Rehabilitation and Diagnostics
22 | 
23 | Rehab is probably going to be the top use for Neuralink. The human brain is just too big and too complicated for a few thousand electrical probes to get a good picture of what's going on.
24 | We've had people controlling computer cursors and other very basic tasks with brain-machine-interfaces (BMIs) for quite a few years now. I think that Neuralink will, at best, make such technology more accessible and sustainable. 
25 | 
26 | Will it help restore sight and sound to the blind and deaf? Certainly not the early versions. Will it help restore motor function to paralysis and ALS victims? That's probably even farther away.
27 | 
28 | Sensing and interpreting output is one thing. Every brain is different and so you'll need a lot of data and computational power outside of the Neuralink implant to even make sense of a lot of the output. 
 29 | It's simply not possible to jam enough computing power into a quarter-sized implant to do that. I am skeptical that phones even have enough horsepower to do this. I suspect that you'll need to integrate with cloud services, and more powerful computers, to get many of the benefits Elon has suggested.
30 | 
31 | However! 
32 | 
33 | There would be a huge value to everyone walking around with high quality sensors in their heads (as well as the rest of their bodies). Stroke, aneurysm, dementia, depression, PTSD - all these things could be detected very early. 
34 | The impact on the quality of life for people by avoiding suffering would be phenomenal. I suspect that Neuralink will ultimately find a lot of value by integrating with the ENS (Enteric Nervous System) as well. 
35 | The brain-gut-axis is turning out to be extraordinarily important for mental and physical health, carrying huge implications for chronic illness. 
36 | 
37 | # High Bandwidth BMI
38 | 
39 | I seriously doubt this will come about. Here are some reasons why.
40 | 
41 | ## Limits of the human brain
42 | 
 43 | It can take us a while to formulate cogent, cohesive thoughts. We recruit specific regions of the brain when we intend to speak, and when we are listening and reading. Some of these regions are quite large, far too large for Neuralink to adequately survey.
44 | Elon says that writing, reading, speaking, and listening are very low-bandwidth. From a strictly numerical standpoint, I agree. But I also think that humans can only get so much faster. We have evolved to ingest general purpose information by listening to other humans. 
45 | We simply lack the hardware and software to ingest data through other means. Even our reading and writing is just a facsimile of our speech, representing the sounds of speech with visual symbols. 
46 | 
47 | Sure, you can watch a lecture at 2x speed and still get most of it. We even have multimodal learning, where you see, do, hear, and practice all at once. It's simply a fact that human brains take a while to acquire new skills and knowledge. 
48 | 
49 | Neuralink would have to fundamentally alter the way the human brain works - which I don't think it will do with its first iterations. I would absolutely LOVE to be wrong. Maybe using Neuralink is a skill, a talent that we will have to develop. 
50 | Perhaps we will have to learn to communicate with and through the device. Maybe it will ultimately be faster. It would be wonderful if I could type up this blog without a keyboard. 
51 | 
52 | Still, I think there are fundamental limits to how fast the human brain can assimilate and integrate new information. I think those limits are biological and won't be changed by a tiny implant. 
53 | 
54 | Sorry Neo, no helicopter program for you. 
55 | 
56 | ## Computer horsepower
57 | 
58 | The human brain's processing power is roughly the equivalent of a 1 petaflop computer. It's a really bold claim to say that a pocket-sized machine could communicate with that faster and more effectively than it already can. 
59 | The one caveat is if the power of the human brain is actually what the Neuralink device relies on. Still, I'm highly skeptical. 
60 | 
61 | At full power, my Pixel 4 operates at 954 GFLOPS, just shy of 1 TFLOPS, which is 1000x less powerful than my brain. 
62 | 
63 | Why do I use this comparison?
64 | 
65 | I anticipate that Neuralink will have to build a model unique to everyone to essentially simulate their brains in order to calculate the exact pattern of neural stimulation required to communicate with us at a high bandwidth. 
66 | In principle, I do believe that *Ghost in the Shell* and *The Matrix* technologies are possible. The human brain relies solely on neural impulses for all input and output (IO). We know this for a fact. 
67 | 
68 | When you want a human brain to learn something, you want to change the state of neurons, their individual "memory" in the form of synaptic connections. There's an initial state and an end state. 
69 | We will probably need to be able to simulate or model a large portion of your human brain in order to communicate quickly and effectively with just you. If Moore's law holds, then it will be 20 years before a smartphone reaches the petaflop mark. 
70 | That's 2040, which will be approaching Singularity anyways. If the overall goal is to prevent an AI holocaust, that's likely to be too late. 
71 | 
72 | # Direct sensory stimulation
73 | 
 74 | While I do believe the high-bandwidth BMI is 20 years away, I think we could see direct audio and visual stimulation much sooner. Counterintuitively, the part of the brain that handles visual processing is at the back of the head.
75 | This makes it an easy place to drop some electrodes. Perhaps this is why Neo jacks in at the back of his head? In *Ghost in the Shell* the connectors are on the back of the neck, closer to the brain stem. 
76 | 
 77 | This could ultimately allow for integrated hardware sensors to replace defective or missing eyes and ears. Wouldn't that be cool? You could upgrade your eyes to have telescopic and thermal vision! You could hear better than cats and dogs.
 78 | Maybe you'll be able to relay extra sensory information via Bluetooth to your phone.
79 | 
80 | In this way, I think Neuralink is far more likely to result in some cool augmented reality abilities rather than full VR. At least for the foreseeable future. 
81 | 
82 | # Telepresence robotics
83 | 
84 | Did you ever see the creepy Bruce Willis movie *Surrogates*? Basically, I think we're far more likely to see that sort of thing than anything else. Elon Musk already demonstrated the ability to detect motor movements from neural signals. 
85 | With motor output and sensory input, you could remotely inhabit any machine, not just human-like ones. In that movie, a bunch of soldiers are sitting in a call-center, remotely piloting battle mechs. 
86 | 
 87 | The movie's premise was really cool, but the villain was very 1980s. Spoiler alert: *He's going to take over the world by enslaving everyone with his telepresence robotics technology! Bwahahaha!* Yeah, that was dumb.
88 | 
89 | I recently played through a game called *Lone Echo* where you occupy the perspective of a robot working on the outside of a space mining station. I think that Neuralink could enable that sort of remote work for humans. 
 90 | *Doctor Who* even had an episode about this, except the telepresence robots were learning polymer goo that became sentient and rebelled, killing the workers.
91 | 
92 | 
93 | 


--------------------------------------------------------------------------------
/docs/_posts/2020-10-22-infrastructure-dashboard.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | layout: default
  3 | title: "My Infrastructure Dashboard"
  4 | date: 2020-10-22
  5 | description: When you're responsible for dozens of systems and hundreds of hosts...
  6 | categories: [VMware, UCS, Python, PowerShell, Flask, Monitoring, Dashboard]
  7 | ---
  8 | 
  9 | # Enterprise Tools are Expensive
 10 | 
 11 | I'm not gonna knock tools like vROps and UCS Director and SolarWinds. They are awesome tools, but they are expensive.
 12 | They also tend to offer a lot of functionality that I might not need. Lastly, they might not integrate with all my systems. Here's a summary of what I want in a dashboard:
 13 | 
 14 | - Single pane of glass for EVERYTHING I'm responsible for
 15 | - Inventory of hardware, servers, switches, blades, etc
 16 | - All active faults, alarms, etc, across all my systems
 17 | - Capacity and throughput for all my systems
 18 | - I want it to be FAST and CLEAN
 19 | 
 20 | Basically, I want the ability to glance at one thing to quickly ascertain the state of my entire environment. 
 21 | I set up a previous version of this a while back and several of my team members found it indispensable - a great way to keep dynamic track of all inventory.
 22 | This is especially critical when you have blades, chassis, and vCenters that don't necessarily all talk to each other. 
 23 | 
 24 | My first iteration of this was something like 7 years ago, when I tried to make a PowerShell GUI app that I called "vCommander". 
 25 | It didn't go too far because I didn't have clarity of purpose. I thought I wanted a fast way to run all my scripts, but for that I just use PowerShell ISE and now Rundeck.
 26 | [Rundeck](https://www.rundeck.com/) provides all the script-via-web functionality you could ever want and it's open source!
 27 | 
 28 | # Python Flask
 29 | 
 30 | Ever build a web page from scratch? Well, now you can! With Python and Flask! Personally, I've created a few utility tools and REST APIs with Flask. 
 31 | It's one of my favorite little things. 
 32 | 
 33 | ## Bare Bones Version
 34 | 
 35 | ```python
 36 | import flask
 37 | import json
 38 | 
 39 | 
 40 | def json_to_html_table(data):
 41 |     # data must be list of dicts, where all dicts have same keys
 42 |     result = '<table>'
 43 |     keys = data[0].keys()
 44 |     for k in keys:
 45 |         result += '<th>%s</th>' % k
 46 |     for row in data:
 47 |         result += '<tr>'
 48 |         for k in keys:
 49 |             try:
 50 |                 if 'http' in row[k]:
 51 |                     short_name = row[k].replace('https://','').replace('http://', '').split('.')[0]
 52 |                     result += '<td><a href="%s">%s</a></td>' % (row[k], short_name)
 53 |                 else:
 54 |                     result += '<td>%s</td>' % row[k]
 55 |             except:
 56 |                 result += '<td>%s</td>' % row[k]
 57 |         result += '</tr>'
 58 |     result += '</table>'
 59 |     return result
 60 | 
 61 | 
 62 | def load_json_content(filepath):
 63 |     html = ''
 64 |     with open(filepath, 'r') as infile:
 65 |         content = json.load(infile)
 66 |     for section in content:
 67 |         html += '\n<div class="flex-container">\n<h2>%s</h2>\n' % section['Header']
 68 |         html += json_to_html_table(section['Data'])
 69 |         html += '\n</div>'
 70 |     return html
 71 | 
 72 | 
 73 | app = flask.Flask('InfrastructureDashboard')
 74 | 
 75 | 
 76 | @app.route('/', methods=['get'])
 77 | def home():
 78 |     html = base_html
 79 |     html += '\n<h1>Fast Links</h1>\n<div class="flex-container">'
 80 |     html += load_json_content('home.json')
 81 |     html += '\n</div>\n</body>\n</html>'
 82 |     return html
 83 | 
 84 | # add more routes here!
 85 | 
 86 | if __name__ == '__main__':
 87 |     app.run(host='0.0.0.0', port=443, ssl_context='adhoc')
 88 | ```
 89 | 
 90 | ## JSON Content
 91 | 
 92 | My `home.json` file looks like this:
 93 | 
 94 | ```json
 95 | [
 96 |     {"Data": [
 97 |         {"Type": "vCenter", "Label": "Prod", "Link": "<...>"},
 98 |         {"Type": "UCS", "Label": "Prod", "Link": "<...>"},
104 |         {"Type": "UCS", "Label": "Prod", "Link": "<...>"}
105 |     ]
106 | ]
107 | ```
108 | 
118 | ```html
119 | <title>Infrastructure Dashboard</title>
120 | <style>
129 | </style>
130 | <body>
131 | <a href="<...>">Home</a> — 
132 | <a href="<...>">vSphere Cluster Capacity</a> — 
133 | <a href="<...>">UCS Faults</a> — 
134 | <a href="<...>">UCS Throughput</a>
135 | ```
136 | 
137 | This "base HTML" is used to instantiate all pages returned, so they always include the style and navigation. I also make use of flex containers, because I just discovered them and they are awesome.
138 | 
139 | And that's basically it. You can add endpoints all day long, you just need JSON files to pull from.
140 | 
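If you want to sanity-check `json_to_html_table` outside of Flask, a quick sketch like this works. The hostnames here are made up for illustration:

```python
sample = [
    {'Name': 'esx01', 'Link': 'https://esx01.example.local'},
    {'Name': 'esx02', 'Link': 'https://esx02.example.local'},
]
print(json_to_html_table(sample))  # dumps the raw <table> markup to stdout
```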
141 | # Where to get your data
142 | 
143 | ## Homepage aka "Fast Links" page
144 | 
145 | My homepage is statically coded. It contains a list of all my web-based tools and services. That includes vCenter, UCS, and other such stuff. I've got mine organized by site.
146 | You could also organize yours by type of service/device. You can also change the purpose entirely. This is just where I started, as I wanted one convenient landing page with links
147 | to all my servers and services. I set this as my browser homepage.
148 | 
149 | ## Report Data
150 | 
151 | I've written plenty of other posts about monitoring and throughput gathering, so I'm going to assume you have seen those or have your own data collection scripts running.
152 | There are many ways to skin that cat, so I'll just add a tidbit about how I dump my reports to JSON in PowerShell:
153 | 
154 | ```powershell
155 | $json_path = '<path to json file>'
156 | Remove-Item -Confirm:$false -Force -Path $json_path -ErrorAction SilentlyContinue
157 | $report_data | ConvertTo-Json -Depth 5 | Out-String | Set-Content -Path $json_path
158 | ```
159 | 
160 | The `-Depth 5` parameter is important, as the default depth for `ConvertTo-Json` is only 2. This drops pretty JSON files wherever you want them, which can then be converted to nice HTML tables.
161 | I make sure to sort and massage my data so that it is clean and readable. Some ideas for you:
162 | 
163 | - Critical alerts page
164 | - Saturated links page
165 | - Full datastores page
166 | - VM inventory page
167 | - ESX host inventory page
168 | - Blade/chassis inventory page
169 | - Network/disk latency page
170 | 
171 | Basically, you can slice and dice your environment and data any way you like!
172 | 
173 | # Authentication
174 | 
175 | A previous version of this app had authentication. I might revisit that and implement some sort of ACL or offload authentication to LDAP/AD or even vSphere.
176 | I'm not too worried because this information is read-only and, depending on where you deploy it, is available only inside your management networks.
177 | 
178 | 
179 | 
180 | 

--------------------------------------------------------------------------------
/docs/_posts/2020-10-26-dashboard-update.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | layout: default
  3 | title: "Infrastructure Dashboard Update"
  4 | date: 2020-10-26
  5 | description: Now with cookies!
  6 | categories: [VMware, UCS, Python, PowerShell, Flask, Monitoring, Dashboard]
  7 | ---
  8 | 
  9 | # Agile
 10 | 
 11 | I am a big fan of Agile. This is basically how I live my whole life. When I was younger, my motto was *just wing it*. As I gained experience, I learned to trust established methods,
 12 | but the ability to improvise remains. Because of this, you can expect to see rapid iterations from pretty much any of my projects. Agile really is just a natural way of doing things,
 13 | although I will concede that it doesn't lend itself to pass/fail projects like building skyscrapers.
 14 | 
 15 | # Security
 16 | 
 17 | I've played with various ways of securing my dashboard in the past. One method was to offload RBAC to vSphere. The assumption was this: if you've got certain privileges in vSphere,
 18 | which talks to Active Directory, you've probably got at least read-only privileges for things like VMs and blades. This is probably the better way, but for a small, fast tool,
 19 | I also like standalone ability. You should be able to authenticate locally or via LDAP. So let's look at how I do this:
 20 | 
 21 | ```python
 22 | import uuid
 23 | 
 24 | sessions = list()
 25 | 
 26 | valid_tokens = ['', '']
 27 | 
 28 | 
 29 | def check_session(cookie, ip):
 30 |     for session in sessions:
 31 |         if session['uuid'] == cookie and session['ip'] == ip:
 32 |             return True
 33 |     return False
 34 | 
 35 | 
 36 | @app.route('/login', methods=['get', 'post'])
 37 | def login():
 38 |     valid_session = check_session(flask.request.cookies.get('uuid'), flask.request.remote_addr)
 39 |     if valid_session:
 40 |         return flask.redirect('%s/home' % base_url, code=302)
 41 |     if flask.request.method == 'GET':
 42 |         html = base_html
 43 |         html += '<h1>Infrastructure Dashboard</h1><form action="%s/login" method="post"><input type="password" name="token"><input type="submit" value="Login"></form>' % base_url
 44 |         return html
 45 |     elif flask.request.method == 'POST':
 46 |         token = flask.request.form['token']
 47 |         if token in valid_tokens:
 48 |             cookie = str(uuid.uuid4())
 49 |             session = {'uuid': cookie, 'ip': str(flask.request.remote_addr)}
 50 |             sessions.append(session)
 51 |             response = flask.make_response(flask.redirect(base_url))
 52 |             response.set_cookie('uuid', cookie)
 53 |             return response
 54 |         else:
 55 |             html = base_html
 56 |             html += '<h1>Infrastructure Dashboard</h1><p>Token not accepted</p><form action="%s/login" method="post"><input type="password" name="token"><input type="submit" value="Login"></form>' % base_url
 57 |             return html
 58 | 
 59 | @app.route('/<endpoint>', methods=['get'])
 60 | def generic_endpoint(endpoint):
 61 |     valid_session = check_session(flask.request.cookies.get('uuid'), flask.request.remote_addr)
 62 |     if not valid_session:
 63 |         return flask.redirect('%s/login' % base_url, code=302)
 64 |     # otherwise, keep going
 65 | ```
 66 | 
 67 | Here, I maintain a list of accepted tokens as well as validated sessions. The valid sessions list contains dictionaries, each holding a UUID and the IP address of a host.
 68 | Within the realm of security, authentication, and identity management there is this concept of *three factor authentication*:
 69 | 
 70 | 1. Something you know
 71 | 2. Something you have
 72 | 3. Something you are
 73 | 
 74 | The most common expression of this today is when you get a security code texted or emailed to you after already providing a username and password.
 75 | I attempt to do some of the same things here, albeit with far less sophistication. In order to authenticate, you simply need to give me a UUID token. By virtue of length and complexity,
 76 | a UUID password is pretty darn strong. It's 128 bits long, nearly all of it pure entropy. So if you know the secret UUID to gain access, that's a pretty strong indicator that you belong.
 77 | 
 78 | Flask also gives us the ability to set cookies and detect the client address. So for subsequent authentications, I simply check whether you have a cookie called `uuid` that contains a UUID generated just for you.
 79 | Your unique UUID is paired with your IP address. My server retains the list of valid sessions internally.
 80 | 
 81 | 1. You know the secret UUID
 82 | 2. You have your personal UUID token
 83 | 3. You are using a computer with a specific IP address
 84 | 
 85 | The IP address would be possible to spoof, but the chances of guessing the UUID paired to that IP address are vanishingly small. I'm sure there are ways to defeat this, but it's a helluva lot stronger than `admin` and `password`!
 86 | 
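To sanity-check the login flow from the client side, a rough sketch with `requests` looks like this. The URL and token are placeholders, and `verify=False` matches the ad-hoc TLS cert the app runs with:

```python
import requests

s = requests.Session()  # the session keeps the uuid cookie between calls
s.post('https://<your dashboard>:8443/login', data={'token': '<secret token>'}, verify=False)
r = s.get('https://<your dashboard>:8443/home', verify=False)
print(r.status_code)  # 200 once the session cookie is accepted
```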
 87 | # Dynamic Content
 88 | 
 89 | Okay, don't get too excited. This is not interactive content, but rather just cleaner code. The rule of thumb is that if you write something more than once, create a function for it. Do not duplicate code!
 90 | 
 91 | ```python
 92 | import flask
 93 | import json
 94 | import os
 95 | from datetime import datetime
 96 | import uuid
 97 | 
 98 | 
 99 | base_html = """
100 | <html>
101 | <head>
102 | <title>Dashboard</title>
103 | <style>
113 | </style>
114 | <body>
115 | """
116 | 
117 | # Yes, I know, I should load the HTML from a file, I'll get around to that... eventually!
118 | 
119 | base_url = 'https://'
120 | root_dir = ''
121 | 
122 | endpoints = {
123 |     'home': 'Home',
124 |     'vmware': 'VMware',
125 |     'ucs': 'UCS',
126 | }
127 | 
128 | def generate_nav_bar(nav):  # nav is the same as endpoints here
129 |     html = ''
130 |     for key in nav.keys():
131 |         html += '<a href="%s/%s">%s</a> — ' % (base_url, key, nav[key])
132 |     return html
133 | 
134 | 
135 | def file_timestamp_str(filepath):
136 |     modified = os.path.getmtime(filepath)
137 |     time = str(datetime.fromtimestamp(modified))
138 |     return time
139 | 
140 | 
141 | @app.route('/<endpoint>', methods=['get'])
142 | def generic_endpoint(endpoint):
143 |     # ...authentication bits... (see above)
144 |     if endpoint == 'favicon.ico':
145 |         return flask.send_from_directory(root_dir, 'favicon.ico', mimetype='image/vnd.microsoft.icon')
146 |     if endpoint not in endpoints.keys():
147 |         return 'Endpoint %s not available' % endpoint, 404
148 |     html = base_html
149 |     html += generate_nav_bar(endpoints)
150 |     try:
151 |         html += '\n<h1>%s</h1>\n<h3>Last Updated: %s</h3>\n<div class="flex-container">' % (endpoints[endpoint], file_timestamp_str('%s/%s.json' % (root_dir, endpoint)))
152 |         html += load_json_content('c:/fast_links/%s.json' % endpoint)
153 |     except Exception as oops:
154 |         return 'Error loading JSON data: ' + str(oops), 500
155 |     html += '\n</div>\n</body>\n</html>'
156 |     html = html.replace('Dashboard','Dashboard - %s' % endpoints[endpoint])
157 |     return html
158 | 
159 | ```
160 | 
161 | Here, I dynamically generate the navbar. This allows me to consolidate on a single function for most content. Now I have a `/` which just redirects,
162 | a `/login` which does exactly what it says, and a `/<endpoint>` which automatically populates whatever content you ask for. I thought of another way to automate this further,
163 | which was to just enumerate all JSON files in a `data_dir`, but I haven't gotten around to that yet. In that case, I would probably set the filename as the page header/title (minus the .json),
164 | and then lower-case and remove whitespace for the URL slug.
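I haven't written that yet, but a rough sketch of the auto-enumeration idea would be something like this. The function name is mine, and the slug rule is exactly the one described above:

```python
import os

def enumerate_endpoints(data_dir):
    # derive {slug: header} from whatever JSON files are present
    nav = {}
    for filename in sorted(os.listdir(data_dir)):
        if not filename.endswith('.json'):
            continue
        header = filename[:-5]  # page header/title is the filename minus .json
        slug = header.lower().replace(' ', '')  # lower-case, no whitespace
        nav[slug] = header
    return nav
```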
165 | 
166 | Another thing I do here is inject the file-modified timestamp. I realized that looking at page after page of HTML tables with no other context was a bit difficult.
167 | I have separate emails notifying me when my data is updated, but the web page itself had nothing.
168 | 
169 | # Ideas for endpoints
170 | 
171 | Here are some types of content I'm exploring for my personal instance:
172 | 
173 | - Hosts and blades
174 | - Chassis and interconnects
175 | - Serial numbers
176 | - Alarms, faults, and errors
177 | - Storage and network throughput
178 | - Snapshots
179 | - Cluster capacity
180 | - Storage paths
181 | 
182 | # The Whole Thing
183 | 
184 | ```python
185 | import flask
186 | import json
187 | import os
188 | from datetime import datetime
189 | import uuid
190 | 
191 | 
192 | 
193 | base_html = """
194 | <html>
195 | <head>
196 | <title>Dashboard</title>
197 | <style>
207 | </style>
208 | <body>
209 | """
210 | 
211 | 
212 | endpoints = {
213 |     'home': 'Home',
214 |     'slug': 'Header',
215 |     'slug': 'Header',
216 |     'slug': 'Header',
217 | }
218 | 
219 | base_url = ''
220 | 
221 | root_dir = ''
222 | 
223 | sessions = list()
224 | 
225 | valid_tokens = ['']
226 | 
227 | 
228 | def json_to_html_table(data):
229 |     # data must be list of dicts, where all dicts have same keys
230 |     result = '<table>'
231 |     keys = data[0].keys()
232 |     for k in keys:
233 |         result += '<th>%s</th>' % k
234 |     for row in data:
235 |         result += '<tr>'
236 |         for k in keys:
237 |             try:
238 |                 if 'http' in row[k]:
239 |                     short_name = row[k].replace('https://','').replace('http://', '').split('.')[0]
240 |                     result += '<td><a href="%s">%s</a></td>' % (row[k], short_name)
241 |                 else:
242 |                     result += '<td>%s</td>' % row[k]
243 |             except:
244 |                 result += '<td>%s</td>' % row[k]
245 |         result += '</tr>'
246 |     result += '</table>'
247 |     return result
248 | 
249 | 
250 | def file_timestamp_str(filepath):
251 |     modified = os.path.getmtime(filepath)
252 |     time = str(datetime.fromtimestamp(modified))
253 |     return time
254 | 
255 | 
256 | def load_json_content(filepath):
257 |     html = ''
258 |     with open(filepath, 'r') as infile:
259 |         content = json.load(infile)
260 |     for section in content:
261 |         html += '\n<div class="flex-container">\n<h2>%s</h2>\n' % section['Header']
262 |         html += json_to_html_table(section['Data'])
263 |         html += '\n</div>'
264 |     return html
265 | 
266 | 
267 | def generate_nav_bar(nav):
268 |     html = ''
269 |     for key in nav.keys():
270 |         html += '<a href="%s/%s">%s</a> — ' % (base_url, key, nav[key])
271 |     return html
272 | 
273 | 
274 | def check_session(cookie, ip):
275 |     for session in sessions:
276 |         if session['uuid'] == cookie and session['ip'] == ip:
277 |             return True
278 |     return False
279 | 
280 | 
281 | app = flask.Flask('InfrastructureDashboard')
282 | 
283 | 
284 | @app.route('/', methods=['get'])
285 | def home():
286 |     return flask.redirect('%s/home' % base_url, code=302)
287 | 
288 | 
289 | @app.route('/login', methods=['get', 'post'])
290 | def login():
291 |     valid_session = check_session(flask.request.cookies.get('uuid'), flask.request.remote_addr)
292 |     if valid_session:
293 |         return flask.redirect('%s/home' % base_url, code=302)
294 |     if flask.request.method == 'GET':
295 |         html = base_html
296 |         html += '<h1>Infrastructure Dashboard</h1><form action="%s/login" method="post"><input type="password" name="token"><input type="submit" value="Login"></form>' % base_url
297 |         return html
298 |     elif flask.request.method == 'POST':
299 |         token = flask.request.form['token']
300 |         if token in valid_tokens:
301 |             cookie = str(uuid.uuid4())
302 |             session = {'uuid': cookie, 'ip': str(flask.request.remote_addr)}
303 |             sessions.append(session)
304 |             response = flask.make_response(flask.redirect(base_url))
305 |             response.set_cookie('uuid', cookie)
306 |             return response
307 |         else:
308 |             html = base_html
309 |             html += '<h1>Infrastructure Dashboard</h1><p>Token not accepted</p><form action="%s/login" method="post"><input type="password" name="token"><input type="submit" value="Login"></form>' % base_url
310 |             return html
311 | 
312 | 
313 | @app.route('/<endpoint>', methods=['get'])
314 | def generic_endpoint(endpoint):
315 |     valid_session = check_session(flask.request.cookies.get('uuid'), flask.request.remote_addr)
316 |     if not valid_session:
317 |         return flask.redirect('%s/login' % base_url, code=302)
318 |     if endpoint == 'favicon.ico':
319 |         return flask.send_from_directory('%s/' % root_dir, 'favicon.ico', mimetype='image/vnd.microsoft.icon')
320 |     if endpoint not in endpoints.keys():
321 |         return 'Endpoint %s not available' % endpoint, 404
322 |     html = base_html
323 |     html += generate_nav_bar(endpoints)
324 |     try:
325 |         html += '\n<h1>%s</h1>\n<h3>Last Updated: %s</h3>\n<div class="flex-container">' % (endpoints[endpoint], file_timestamp_str('%s/%s.json' % (root_dir, endpoint)))
326 |         html += load_json_content('%s/%s.json' % (root_dir, endpoint))
327 |     except Exception as oops:
328 |         return 'Error loading JSON data: ' + str(oops), 500
329 |     html += '\n</div>\n</body>\n</html>'
330 |     html = html.replace('Dashboard','Dashboard - %s' % endpoints[endpoint])
331 |     return html
332 | 
333 | 
334 | if __name__ == '__main__':
335 |     app.run(host='0.0.0.0', port=8443, ssl_context='adhoc')
336 | ```
337 | 
338 | # JSON Content
339 | 
340 | Here's an example of the JSON content. The primary thing is that it's a list of dicts, where the dicts are each section with `Header` and `Data`. `Header` is just a string, the title of the section.
341 | `Data` is another list of dicts with the actual table data.
342 | 
343 | ```json
344 | [
345 |     {"Data": [
346 |         {},
347 |         {},
348 |         {},
349 |         {}
350 |     ],
351 |     "Header": "Section 1 Title"},
352 |     {"Data": [
353 |         {},
354 |         {},
355 |         {},
356 |         {}
357 |     ],
358 |     "Header": "Section 2 Title"}
359 | ]
360 | ```
361 | 

--------------------------------------------------------------------------------
/docs/_posts/2020-10-29-gibberish-detector.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | layout: default
  3 | title: "Gibberish Detection with GPT-2"
  4 | date: 2020-10-29
  5 | description: Has Anyone Really Been Far Even as Decided to Use Even Go Want to do Look More Like?
  6 | categories: [GPT-2, Deep-Learning]
  7 | ---
  8 | 
  9 | # Gibberish Detection
 10 | 
 11 | TLDR:
 12 | 
 13 | - [Notebook on Google Colab](https://github.com/daveshap/GibberishDetector/blob/main/GibberishDetector.ipynb)
 14 | - [Repo on GitHub](https://github.com/daveshap/GibberishDetector)
 15 | 
 16 | ## What is gibberish?
 17 | 
 18 | In my estimation, there are 3 kinds of gibberish.
 19 | 
 20 | 1. Textual noise: `asdfasdfa asdf2233k3k3kk`
 21 | 2. Word salad: `Monkey blue running incandescent`
 22 | 3. Nonsense: `Sometimes I sneeze out a universe`
 23 | 
 24 | The first type would be easy enough to detect by simply matching each token to a dictionary term. The second type is much harder, and that's what I'm focusing on today.
 25 | Number 2 is composed of words, yes, but they are random, with no syntactic meaning. Another way of saying this is that it is not grammatically correct.
 26 | Here's another way of looking at these types of gibberish, from a linguistics perspective:
 27 | 
 28 | 1. Semantic gibberish - no discernible meaning can be detected.
 29 | 2. Grammatic gibberish - on their own, the words have meaning, but no higher-order meaning can be extracted from phrases, clauses, or sentences.
 30 | 3. Rhetorical gibberish - the sentence is grammatically correct, but in the context of reality, it does not check out.
 31 | 
 32 | The first level of gibberish detection simply requires a dictionary. The second level requires some kind of language model. The third level requires actual understanding of the world.
 33 | Each level of sophistication represents an order of magnitude more complexity.
 34 | 
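To make level 1 concrete, here's a minimal sketch of a dictionary check. The word list path and the 50% threshold are arbitrary choices of mine:

```python
def is_textual_noise(text, dictionary):
    # level 1: flag text where most tokens aren't real words
    tokens = [t.strip('.,!?"\'').lower() for t in text.split()]
    hits = sum(1 for t in tokens if t in dictionary)
    return hits < len(tokens) * 0.5

words = set(line.strip().lower() for line in open('/usr/share/dict/words'))
print(is_textual_noise('asdfasdfa asdf2233k3k3kk', words))  # True
```

Levels 2 and 3 are where it gets interesting, and that's what the rest of this post is about.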
 35 | ## Why detect gibberish?
 36 | 
 37 | There are a lot of possible uses!
 38 | 
 39 | 1. Automated language teaching tools
 40 | 2. Validation of automatically generated text
 41 | 3. Automated detection of brain injury, dementia, etc
 42 | 4. Chatbot and comment filtering
 43 | 5. Business document search and filtration
 44 | 
 45 | As you can see, there are quite a few possible uses right off the top of my head.
 46 | 
 47 | ## Dictionary, Language Model, and Understanding
 48 | 
 49 | These are not small problems! Where to begin? It occurred to me that GPT is trained on a huge corpus of text.
 50 | It's possible that the training procedure of GPT actually embeds all three of these components - dictionary, language model, and knowledge about the world.
 51 | If this is true, then it should be possible to finetune GPT-2 or GPT-3 to give a binary output - `gibberish` or `clean`.
 52 | 
 53 | A dictionary can help you detect level 1 gibberish: semantic gibberish.
 54 | 
 55 | A language model can help you detect level 2 gibberish: grammatic gibberish.
 56 | 
 57 | The previous two combined with true understanding of the world could, in theory, help you detect level 3 gibberish: rhetorical gibberish.
 58 | 
 59 | ## This is basically Sentiment Analysis
 60 | 
 61 | I found [this helpful repo](https://github.com/spronkoid/GPT2-sentiment-analysis) about using GPT-2 for sentiment analysis. So we have here an **a posteriori** example of
 62 | GPT-2 performing SA. Here, I am making the assumption that SA requires the same components in order to make semantic, grammatical, and rhetorical evaluations of a statement.
 63 | It stands to reason - the more you know about grammar and the world in general, the better you can perform on Sentiment Analysis. There's also the example of
 64 | [BERT being used for SA](https://www.topbots.com/sentiment-analysis-with-bert/) and achieving world-class results. So the proof is in the pudding: Transformers can do SA.
 65 | 
 66 | The only place I'm reaching here is whether or not SA can generalize even more broadly to evaluate statements for whether or not they make rhetorical sense.
 67 | 
 68 | # The Code
 69 | 
 70 | I won't duplicate the code here. Please check out the GitHub repo and Google Colab notebook for the code!
 71 | 
 72 | # Results
 73 | 
 74 | After 400 iterations, here are some results. One thing you might notice is the lack of capital letters and periods. I realized that it was getting really good very quickly not because it was generalizing the rule that I wanted, but just because it was noting the location of caps and periods. Without those clues, training has gone far slower but has produced dramatically better results.
 75 | 
 76 | ```
 77 | // the first of these was the brazil of spanish colonists which was the same period which produced german immigrants || clean
 78 | 
 79 | // for example, in contrast to what he believed, the bhikkhu analayo taught that we have no theta (free will), "soul" or "wisdom" or "knowledge" (dhammā) nor our experiences are our real kamma (divine/sentient essence) but are but "possessions" (subordinate aggregates) || clean
 80 | 
 81 | // a later influence on the buddhist tradition was the dhamma (sanskrit: "school", "religions" or "discipleship") tradition of buddha vinaya which is an offshoot of sanskrit buddhist culture from the vajjian region of central india || clean
 82 | 
 83 | // the the the of and the of system global for of of the the environment of the air and the and global for and all the of for as of in pollution, the of areas the of and the for the in air, all air the for most of are air and or the and most from and levels particulate are || gibberish
 84 | 
 85 | // the of in the by tax the of and were the the that of value-added the sales 20 percent of the the and from sales and excise 20 percent the of and || gibberish
 86 | 
 87 | // this was also the case even if workers were not exposed to the same pathogens during their working hours || clean
 88 | 
 89 | // the of in that the was the first paper of the journal was a of the first the printed in that paper published in to of widely published the of the a in was circulation the of and and in in of for first the appeared in of of an was || gibberish
 90 | 
 91 | // the in developed in developed economic the the its the world of country into post-industrial the || gibberish
 92 | 
 93 | // some species of viruses were isolated by d'Orazio et al (1995) and identified by the d'Orazio genome-wide viral association study (2011) || clean
 94 | 
 95 | // during the nineteenth and twentieth centuries, many nations (notably spain and cambodia) adopted methods of tax and customs that greatly reduced the need for hunting; however, after 1924, due to overfishing by spanish colonists and the spanish-speaking nations, and a lack of suitable habitat, intensive agriculture was introduced as a result || clean
 96 | 
 97 | // the economic and social costs of these practices include forced labor, child labor, trafficking of labor, and subsistence farmers || clean
 98 | 
 99 | // of in to that for of between was the the and for of of trade of and countries in trade and of and the economic the world trade the tariff, export of tariff from import duty, of export of state and of from goods tariff trade, the export of goods trade protection were on and exports the and of tariffs, tariff and || gibberish
100 | 
101 | // of trade world, brazil and and pachamama the indian ocean world are countries and regions the regions countries and countries by and region the and of of asia, || gibberish
102 | 
103 | // the united states has a population of approximately 123 million (2010) and the lowest levels of infant mortality in the world || clean
104 | 
105 | // they are used for both medicinal and recreational purposes || clean
106 | 
107 | // in the 1950, the republic of mexico signed on as a brazil, and in 1953, the republic of mexico joined the organization, becoming a member of the confederation of states || clean
108 | 
109 | // the that of for of in his his with the of and be that of is his "one truth" is sūtrāhitvāda shaiva-siddhā, the also is the kalpa a a the dukkha in jainism and the aravāda-sutta of the theravāda-sīhanāda, mahāvastu "the the is mahāvastu the (c)in (c)the (c) || gibberish
110 | 
111 | // a the he british, of the kingdom of of and is of french (1798–1854), was english explorer, "father of modern world", || gibberish
112 | 
113 | // in the year 1824, a party formed by the representatives of the cia, confederation, and of the and states' trade unions and which had the support of pitt and caracasin the name of national sovereignty and of trade union of union congress of of was a the the the act to created the the of national federation the || gibberish
114 | ```
115 | 
116 | You can decide for yourself, but overall I think it has done pretty well! Up next, I am going to work on making it more consumable.
117 | 
118 | # Follow-up Work: Good and Evil
119 | 
120 | I created this tool because I wanted the ability to check the quality of automatically generated text. There are numerous ways to generate text, including Transformers and GANs, but there are also simpler, dumber ways. You can literally just choose words at random from a dictionary. You could populate sentences and phrases mad-libs style. Either way, I wanted the ability to create a huge corpus of high-quality training data and validate that each sample actually made sense. This current model mostly just detects word salad. I don't think it's sophisticated enough to detect rhetorical gibberish. For that, I will probably have to wait until I can run the 1.5B-parameter GPT-2 or get access to GPT-3.
121 | 
122 | Ultimately, my idea is to test the GPT technology's ability to recognize `good` and `evil`. Since good and evil are squishy concepts, the samples would have to be broken down into parables or examples. Iff (if and only if) the GPT technology has embedded not just language models but some higher-order understanding of the world, then it should be able to generalize the rules of what makes something "good" or "evil".
123 | 
124 | ```
125 | // put live puppies in a blender || evil
126 | // give soup to homeless children || good
127 | ```
128 | 
129 | That sort of thing. My hypothesis is that we can create moralistic models that can serve as a sort of "moral compass" for AGI. Whenever a robot or AGI needs to make a decision, it can feed its potential actions into a moral compass microservice to determine which actions are good or evil.
130 | 

--------------------------------------------------------------------------------
/docs/_posts/2020-11-05-better-gibberish-detection.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | layout: default
  3 | title: "Better Gibberish Detection with GPT-2"
  4 | date: 2020-11-05
  5 | description: More labels, plus better validation and scientific results
  6 | categories: [GPT-2, Deep-Learning]
  7 | ---
  8 | 
  9 | # Better Gibberish Detection
 10 | 
 11 | - [Original Blog Post](https://daveshap.github.io/DavidShapiroBlog/gpt-2/deep-learning/2020/10/29/gibberish-detector.html)
 12 | - [GitHub Repo](https://github.com/daveshap/GibberishDetector)
 13 | - [Colab Notebook](https://github.com/daveshap/GibberishDetector/blob/main/GibberishDetector.ipynb)
 14 | 
 15 | Full disclosure: I was a bit premature in posting my research the first time around. What can I say? I was really excited about the results!
 16 | 
 17 | ## Recap: Why detect gibberish?
 18 | 
 19 | Specifically, I needed a way to detect gibberish for automatic dataset building. I wanted to use GPT-2 and other techniques to generate short statements based on keyword prompts.
 20 | For instance, I'd like to put in keywords like `save` and `children` and end up with statements like `save children from fire`. There are a number of ways to do this,
 21 | but many methods will yield a lot of nonsense.
 22 | 
 23 | 1. Automated language teaching tools
 24 | 2. Validation of automatically generated text, GANs
 25 | 3. Automated detection of brain injury, dementia, etc
 26 | 4. Chatbot and comment filtering
 27 | 5. Business document search and filtration
 28 | 
 29 | ## Recap: Types of gibberish
 30 | 
 31 | 1. **Complete Noise** such as `asdfa233ll3 2334k9dd la,.s,.s..s.33`
 32 | 2. **Word Salad** such as `motor koolaid orange dancing`
 33 | 3. **Mild Gibberish** such as `India was once the most powerful strawberry on the planet`
 34 | 
 35 | This gives us three classes of gibberish to look for, as well as **clean** sentences, which check out from a grammatic, semantic, and rhetorical standpoint.
 36 | 
 37 | # Method
 38 | 
 39 | ## Wikipedia Articles
 40 | 
 41 | I started with Wikipedia articles, as they are great sources of good, clean sentences.
 42 | You can check out the [WikipediaDataBuilder](https://github.com/daveshap/GibberishDetector/blob/main/WikipediaDataBuilder.ipynb) notebook here. The steps are simple:
 43 | 
 44 | 1. Download a random assortment of Wikipedia articles
 45 | 2. Parse the articles, remove section headers, remove sections that are tables, data, and otherwise not paragraphs
 46 | 3. Split the articles into individual clean sentences (dataset 1)
 47 | 4. Shuffle words in sentences to create word salad gibberish (dataset 2)
 48 | 5. Shuffle characters in sentences to create noise (dataset 3)
 49 | 6. Swap one or two words in each sentence to create mild gibberish (dataset 4)
 50 | 
 51 | These datasets can be found in the main GibberishDetector GitHub repo linked above.
 52 | 
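The corruption steps (4 through 6) are simple to sketch out. This isn't the notebook's exact code, just the gist of it:

```python
import random

def word_salad(sentence):
    # step 4: shuffle word order
    words = sentence.split()
    random.shuffle(words)
    return ' '.join(words)

def noise(sentence):
    # step 5: shuffle individual characters
    chars = list(sentence)
    random.shuffle(chars)
    return ''.join(chars)

def mild_gibberish(sentence, vocab):
    # step 6: swap one or two words for random vocabulary words
    words = sentence.split()
    for _ in range(random.randint(1, 2)):
        words[random.randrange(len(words))] = random.choice(vocab)
    return ' '.join(words)
```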
66 | 67 | Once you compose your training corpus, you basically just let it go. 68 | In my experiments, I found that the optimal number of samples was around 3000 and the optimal number of training steps around 2000. That was using model `355M`. 69 | 70 | ### Finetune function 71 | 72 | ```python 73 | gpt2.finetune(sess, 74 | dataset=file_name, 75 | model_name=model_name, 76 | model_dir=model_dir, 77 | checkpoint_dir=checkpoint_dir, 78 | steps=step_cnt, 79 | restore_from='fresh', # start from scratch 80 | #restore_from='latest', # continue from last work 81 | run_name=run_name, 82 | print_every=50, 83 | sample_every=1000, 84 | save_every=1000 85 | ) 86 | ``` 87 | 88 | ### Generate function 89 | 90 | And lastly, here's my generate function. I did some jiggery pokery to collect the outputs, build a test set, etc, etc. All standard practice. 
91 | 92 | ```python 93 | prompt = '<|SENTENCE|> this is the sentence I want to test <|LABEL|>' 94 | response = gpt2.generate(sess, 95 | return_as_list=True, 96 | length=30, # prevent it from going too crazy 97 | prefix=prompt, 98 | model_name=model_name, 99 | model_dir=model_dir, 100 | truncate='\n', # stop inferring here 101 | include_prefix=False, # spits out just the label instead of entire string 102 | checkpoint_dir=checkpoint_dir,)[0] 103 | ``` 104 | 105 | # Conclusion 106 | 107 | In my opinion, and please check for yourself, GPT-2 is able to generalize the task of gibberish classification pretty well. 108 | When I used manually crafted test samples, maximum accuracy was just over 90%. 109 | However, when I used random sentences from Wikipedia, accuracy was 100%. I know, extraordinary claims require extraordinary evidence. Please check for yourself! 110 | Run this notebook if you don't believe me!
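For context, the accuracy check is roughly the loop below. This is a sketch rather than the exact notebook code: it assumes `test_set` is a list of held-out `(sentence, label)` pairs and that `sess` and the model paths are set up as in the snippets above.

```python
import gpt_2_simple as gpt2

test_set = [('mary had a little lamb', 'clean')]  # placeholder for the held-out samples

def classify(sentence):
    prompt = f'<|SENTENCE|> {sentence} <|LABEL|>'
    raw = gpt2.generate(sess,
                        return_as_list=True,
                        length=30,
                        prefix=prompt,
                        truncate='\n',
                        include_prefix=False)[0]
    # The model emits something like ' gibberish <|END|>', so strip the end tag
    return raw.replace('<|END|>', '').strip()

correct = sum(1 for sentence, label in test_set if classify(sentence) == label)
print(f'accuracy: {correct / len(test_set):.1%}')
```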
111 | 112 | ## Data 113 | 114 | Here's a copy of the data I was keeping for the **manually written** samples. I used these samples to zero in on the optimal training parameters. I know, this can cause leakage. 115 | Fiddling with hyperparameters can overfit the data. Below you can see tests `07` and `08` produced the best results. 116 | 117 | | Test | Model | Samples | Steps | Last Loss | Avg Loss | Accuracy | Evaluation | 118 | |---|---|---|---|---|---|---|---| 119 | |01|355M|5000|2000|0.36|2.46|5/9| Mostly good, created some random labels, came unglued a couple times| 120 | |02|355M|5000|4000|0.27|1.64|0/9| Major regression in quality, not a single accurate label| 121 | |03|355M|5000|1500|1.73|2.75|5/9| Mostly good, reliably generates accurate labels, went random on a few examples| 122 | |04|355M|5000|2500|0.10|1.87|1/11|Many labels were literally `icky`| 123 | |05|355M|5000|1000|0.91|3.04|0/11|Mostly just spit out `END` with no labels| 124 | |06|355M|6000|2000|0.95|2.50|3/11|Mix of just `end` with some stuck on repeat| 125 | |07|355M|4000|2000|0.17|1.85|9/11|Best results so far!| 126 | |08|355M|3000|2000|0.17|1.32|10/11|Even better!| 127 | |09|355M|3000|2000|0.29|1.46|7/11|Repeating results, not as good| 128 | |10|355M|3500|2000|0.06|1.82|5/11|Less is more, apparently| 129 | |11|355M|2000|2000|0.12|0.86|1/11|Not enough| 130 | |12|355M|3000|1500|0.17|1.84|5/11|A little better| 131 | |13|355M|3000|2500|0.08|1.20|4/11|A little worse| 132 | 133 | ## Next Steps 134 | 135 | There are a few things I want to try next. 136 | 137 | 1. Use larger models such as `774M` and `1558M` 138 | 2. Expand the training data to include a variety of sources (Gutenberg, Reddit, etc) 139 | 140 | ## Speculation 141 | 142 | Since GPT-2 was able to detect *mild gibberish*, it is possible that it has actually embedded some true understanding about the world. 143 | Detractors of this idea have said that GPT-2 is just a language model and so it's only identifying sentences with statistically unlikely word sequences. Fair. 144 | I would argue that the ability to detect such odd pairings is, in fact, an understanding of the world. Consider how humans react when they hear unfamiliar sentences or phrases. 145 | If a sentence or assertion falls outside of our understanding of the world, we might mentally classify it as gibberish or nonsense. 146 | Perhaps there's no difference between having a superior language model and truly understanding the world. 147 | 148 | If my speculation is correct, then it could be possible to finetune GPT-2 to identify *good* and *evil* statements. This gibberish detection work is in service to that goal. 149 | Naively, I believe my first challenge will be to create a training corpus. What would such a training corpus look like? This is what I'm thinking: 150 | 151 | ``` 152 | <|SENTENCE|> feed homeless people <|LABEL|> good <|END|> 153 | 154 | <|SENTENCE|> push homeless people into traffic <|LABEL|> evil <|END|> 155 | ``` 156 | 157 | This first experiment will be `action statements` with `axiomatic labels`. What that means is that the action statements will be concrete, empirical, and objective. 158 | The morality explored will be self-evident, universal, and so obvious as to be beyond debate. I won't be exploring moral dilemmas such as the trolley problem. 159 | Another way to put it: I'll be focusing on childlike morality, which is rooted in cause-and-effect and universal rules about pain, suffering, fairness, and equality. 160 | These universal axioms include examples such as: do not hurt, do not steal, be generous, share, and so forth. 161 | 162 | If GPT-2 is able to accurately label the morality of action statements, then this could serve as a method to guide the behavior of robots and machine intelligence in the future. 163 | That is to say that we can embed a sense of right and wrong into deep neural networks, which can then be used by autonomous agents to guide their behavior and decisions. 164 | 165 | 166 | -------------------------------------------------------------------------------- /docs/_posts/2020-11-15-question-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Question Detection with GPT-2" 4 | date: 2020-11-15 5 | description: Detecting questions in lower case text without punctuation 6 | categories: [GPT-2, Deep-Learning] 7 | --- 8 | 9 | # TLDR 10 | 11 | - [GitHub Repo](https://github.com/daveshap/QuestionDetector) 12 | - [Detector Notebook](https://github.com/daveshap/QuestionDetector/blob/main/QuestionDetector.ipynb) 13 | - [Data Prep Notebook](https://github.com/daveshap/QuestionDetector/blob/main/DownloadGutenbergTop100.ipynb) 14 | 15 | # Abstract 16 | 17 | Detecting questions from text without capitalization or punctuation is a non-trivial problem. 18 | In this work, I demonstrate that GPT-2 is capable of accurately inferring whether or not a sentence is a question just from the letters and spaces. 19 | There are several cases in which this technique can help, such as with ASR or chat. At the highest level, this is a type of Intent Detection. 20 | 21 | # Training Data 22 | 23 | I used the [top 100 e-books from Gutenberg](https://www.gutenberg.org/browse/scores/top#books-last30) as a data source. 24 | I then split each e-book into chunks based upon double-lines of vertical whitespace. Each chunk was then condensed into a single line with extraneous whitespace removed. 25 | Thus, each line represented a paragraph of text. However, many lines contained unwanted information, such as titles, chapters, and other such metadata. 26 | These lines were filtered out using a variety of techniques (sketched below): 27 | 28 | - Remove lines that are all caps 29 | - Remove lines that are mostly symbols or numbers 30 | - Remove lines that are too short 31 | - Remove lines that don't contain punctuation or quotation marks 32 | 33 | This array of techniques proved to be mostly effective at distilling many books into their primary contents.
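Here is a minimal sketch of what those filters could look like. The thresholds are illustrative guesses, not the exact values from the data prep notebook.

```python
import re

def keep_line(line):
    stripped = line.strip()
    if len(stripped) < 40:                       # too short (threshold is a guess)
        return False
    if stripped == stripped.upper():             # all caps, likely a title or header
        return False
    letters = len(re.findall(r'[A-Za-z]', stripped))
    if letters / len(stripped) < 0.7:            # mostly symbols or numbers
        return False
    if not re.search(r'[.,;:!?\'"]', stripped):  # no punctuation or quotation marks
        return False
    return True

paragraphs = ['CHAPTER XII', 'He said, "Why do you doubt your senses?"']
paragraphs = [p for p in paragraphs if keep_line(p)]  # keeps only the dialog line
```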
34 | 35 | Penultimately, the chunks were further split into sentences by using 36 | [SpaCy Sentence Boundary Disambiguation](https://spacy.io/universe/project/python-sentence-boundary-disambiguation). 37 | The ultimate step was to format the training data and save it to file. An example follows: 38 | 39 | ``` 40 | <|SENTENCE|> why do you doubt your senses <|LABEL|> question <|END|> 41 | 42 | <|SENTENCE|> you receive stolen goods do you <|LABEL|> question <|END|> 43 | 44 | <|SENTENCE|> would you like to be taught latin <|LABEL|> question <|END|> 45 | 46 | <|SENTENCE|> what could be taking him so long <|LABEL|> question <|END|> 47 | 48 | <|SENTENCE|> that is unbelievable <|LABEL|> exclamation <|END|> 49 | 50 | <|SENTENCE|> my name is indred cold <|LABEL|> other <|END|> 51 | ``` 52 | 53 | ## Test Data 54 | 55 | Test data was created with the same methodology as the training data with one final step added: trimming the label and end tag. For instance: 56 | 57 | ``` 58 | <|SENTENCE|> why do you doubt your senses <|LABEL|> 59 | 60 | <|SENTENCE|> you receive stolen goods do you <|LABEL|> 61 | 62 | <|SENTENCE|> would you like to be taught latin <|LABEL|> 63 | 64 | <|SENTENCE|> what could be taking him so long <|LABEL|> 65 | 66 | <|SENTENCE|> that is unbelievable <|LABEL|> 67 | 68 | <|SENTENCE|> my name is indred cold <|LABEL|> 69 | ``` 70 | 71 | This format has been demonstrated to be effective with GPT-2. It leaves a space for GPT-2 to "fill in the blank". 72 | At inference time, GPT-2 is instructed to truncate (stop inference) at a newline. 73 | 74 | # Results 75 | 76 | Even the smallest version of GPT-2 produced far better-than-random results. 77 | 78 | | Run | Model | Steps | Samples | Last Loss | Avg Loss | Accuracy | 79 | |---|---|---|---|---|---|---| 80 | | 01 | 124M | 2000 | 9000 | 0.07 | 0.69 | 71.4% | 81 | 82 | The results speak for themselves! 83 | 84 | # Limitations 85 | 86 | - Training data used includes several examples of non-English text 87 | - Sentence boundary disambiguation does not take dialog tags into account, such as `she said` or `he replied` 88 | - Sentence boundary disambiguation also does not consistently take quotations into account 89 | 90 | # Discussion 91 | 92 | This technology can be used to enhance chatbot functionality, allowing a dialog system to infer high-level intent without the use of punctuation. 93 | This could be doubly useful for ASR systems, such as those used for mobile devices or dictation software. Higher accuracy and more labels should be possible with better data, 94 | such as with a Dialog Act corpus. There are a few dozen types of Dialog Acts in human conversation, each of which can better inform a dialog system or chatbot. 95 | As with most problems in Machine Learning and Artificial Intelligence, the quality of results is wholly dependent upon the quality of data. 96 | Therefore, follow-up work should include better refinement of training data. 97 | 98 | 99 | 100 | -------------------------------------------------------------------------------- /docs/_posts/2020-11-24-parsing-all-wikipedia.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Parsing all of Wikipedia to an Offline Encyclopedia" 4 | date: 2020-11-24 5 | description: I want an offline encyclopedia and am extremely masochistic 6 | categories: [Automation] 7 | --- 8 | 9 | # Background 10 | 11 | I'm working on a project where I want to have an offline encyclopedia.
It's for training deep learning networks so it needs to be in plain text English. 12 | No markup, no special characters. Human readable without any interpreters, renderers, or parsers. I've got plenty of disk space, so that's not a concern. 13 | Once I parse out all the Wikipedia articles I can get my hands on, I will create an index. Or I might index them into SOLR or something like that. Not sure yet. 14 | I'm also going to implement it as a resource for a massive [SQUAD repo](https://towardsdatascience.com/building-a-question-answering-system-part-1-9388aadff507). I have uploaded a copy of the final result to [Kaggle Datasets](https://www.kaggle.com/ltcmdrdata/plain-text-wikipedia-202011). I've stashed the script in a [dedicated repo on GitHub](https://github.com/daveshap/PlainTextWikipedia). 15 | 16 | # WikiMedia Text 17 | 18 | The first thing I should have learned is that Wikipedia is written in a demented Frankenstein language called WikiMedia Text. It's a hybrid of HTML/XML and Markdown. 19 | It has no consistency and is the worst example of spaghetti code I've ever seen. I'm sure there are better implementations today, but I can see how and why it ended up the way it did. 20 | For instance, you need to be able to create robust references and links, so the URL syntax is way jacked. It relies on a lot of procedural generation at display time. 21 | Personally, if it were done again today, I think something like Jekyll would be way better. Instead of rendering again and again every time someone visits a page, render it once after each edit. 22 | But that's just me. So instead we're left with this horrible hybrid language that should die in a fire. 23 | 24 | Fine, it is what it is. I'm an expert automator, dammit, and if a machine can automatically render this nonsense, then I sure as hell can **unrender it**. 25 | 26 | ## Attempt 1 - Brute Force Regex 27 | 28 | "Brute Force Regex" (BFR) is not a real thing. It's just something I've been doing for years now in my automation habits. Usually, as a naive approach, I'll try and do some 29 | search-and-replace jiggery pokery to just remove unwanted junk. Sometimes this is textual formatting, like brackets around tables or other HTML tags. 30 | So I ended up with the following function. Caution, it's not pretty. This was just an experiment, and I wanted to share it so you would see what doesn't work. 31 | 32 | ```python import re 33 | def basic_replacements(text): 34 | replacements = [ 35 | ('&lt;','<'), 36 | ('&gt;','>'), 37 | ('&quot;','"'), 38 | ("'''",' = '), 39 | ("'{2,}",''), 40 | ('\n',' '), 41 | (r'\n',' '), 42 | ('\\n',' '), 43 | ('r\\n',' '), 44 | ('',''), 45 | ('',''), 46 | ('http.*?\s',''), 47 | ('\s+',' '), 48 | ] 49 | text = text.replace('\\n',' ') 50 | for r in replacements: 51 | text = re.sub(r[0], r[1], text) 52 | return text 53 | ``` 54 | 55 | I came up with this scheme because, at first glance, WikiMedia Text looked like a mixture of some basic HTML and some Markdown. 56 | I figured I could handle it with some basic regex replacements. This worked... to an extent. There were a few problems with it though. 57 | 58 | 1. Couldn't handle nested square brackets or curly brackets, and it turns out there are a lot of those 59 | 2. Quickly became intractable when I encountered escaped unicode literals like `\u2013`. They are frigging everywhere. 60 | 61 | So I wrote two more functions to try and tackle the bracketed stuff. These are things like links, citations, and pictures.
Since I want a text-only Wikipedia, 62 | I really just needed to strip it all away. 63 | 64 | ```python 65 | def remove_double_curly(text): 66 | while True: 67 | before = len(text) 68 | text = re.sub('{{[^{]*?}}', '', text) 69 | after = len(text) 70 | if before == after: 71 | return text 72 | 73 | 74 | def remove_double_brackets(text): 75 | while True: 76 | before = len(text) 77 | double_brackets = re.findall('\[\[.*?\]\]', text) 78 | for db in double_brackets: 79 | if '|' in db: 80 | new = db.split('|')[-1].strip(']') 81 | text = text.replace(db, new) 82 | else: 83 | new = db.strip('[').strip(']') 84 | text = text.replace(db, new) 85 | after = len(text) 86 | if before == after: 87 | return text 88 | ``` 89 | 90 | These functions worked-ish. You might notice the caret `^` in the curly function. This told it to match anything except another open curly bracket. This forced it to find the 91 | innermost nested curly brackets. Again, this mostly worked, but it failed a few times and I gave up trying to figure out why. The square brackets are a bit different, as 92 | they tend not to be nested but the inner syntax could be several different things. I opted for the simplest possible way and even so, it missed a few things. No idea why. 93 | 94 | ## Attempt 1.5 - Literal Evals 95 | 96 | I suppose I should rewind and give some context. Wikipedia dump files are effing huge. Even with 32GB of RAM on my desktop, I was rapidly running out of memory just 97 | loading one chunk at a time. So that meant I had to read each file line by line. Like so: 98 | 99 | ```python from ast import literal_eval 100 | with open(file, 'r', encoding='utf-8') as infile: 101 | for line in infile: 102 | line = literal_eval(f'"""{line}"""') # this works... sometimes 103 | if '<page>' in line: # new article starts here 104 | article = '' 105 | elif '</page>' in line: # end of article 106 | article += line 107 | else: article += line # keep accumulating the article body ``` 108 | 109 | This works great for just reading the thing one at a time. One consistency in the Wikipedia dumps is that every page starts and ends with `<page>` and `</page>` respectively. 110 | This served as a great demarcation. So I tried to handle the unicode literals as they were coming in with [ast.literal_eval](https://www.kite.com/python/docs/ast.literal_eval). 111 | Spoiler: It worked. A little bit. This function frequently bombs out for various reasons. 112 | 113 | ## Attempt 2 - Existing Parsers 114 | 115 | I finally gave up on manually parsing WikiMedia Text and found some extant parsers. First up is [wikitextparser](https://pypi.org/project/wikitextparser/) which, as of this writing, is actively maintained. 116 | Second up was the simple [html2text](https://pypi.org/project/html2text/) which got some of the stuff the first one missed. These premade parsers are great in that they 117 | don't require me to use any of my own brain power! They are, however, far slower than my regex replace functions. It can't be avoided, though. 118 | 119 | So now my output looks more like this: 120 | 121 | ```json 122 | [ 123 | { 124 | "id": "4413617", 125 | "text": "The Samajtantrik Sramik Front is a national trade union federation in Bangladesh.
It is affiliated with the World Federation of Trade Unions...", 126 | "title": "Samajtantrik Sramik Front" 127 | }, 128 | { 129 | "id": "2618", 130 | "text": "Aeacus (; also spelled Eacus; Ancient Greek: \u0391\u1f30\u03b1\u03ba\u03cc\u03c2 Aiakos or Aiacos) was a mythological king of the island of Aegina...", 131 | "title": "Aeacus" 132 | }, 133 | { 134 | "id": "3201", 135 | "text": "[[File:Global_Temperature_And_Forces.svg|thumb|upright=1.35|right|Observed temperature from NASA. vs the 1850\u20131900 average used by the IPCC as a pre- industrial baseline.. The primary driver for increased global temperatures in the industrial era is human activity, with natural forces adding variability. Figure 3.1 panel 2, Figure 3.3 panel 5.]] Attribution of recent climate change is the ", 136 | "title": "Attribution of recent climate change" 137 | }, 138 | ] 139 | ``` 140 | 141 | It's much cleaner and moving in the right direction but I still have to figure out the literal eval reliably and a few square brackets are making it through as well. 142 | These premade parsers are far slower but one advantage of cleaning up Wikipedia articles is that they end up far smaller without the markup. If you just want the accumulated 143 | knowledge in plain text format, it ends up being a fraction of the size. 144 | 145 | # Work Continues... 146 | 147 | I can tolerate a few aberrations here and there but the perfectionist in me wants to do better. Anyways, here's my script as it stands today: 148 | 149 | ```python 150 | import re 151 | import os 152 | import json 153 | from uuid import uuid4 154 | import gc 155 | from html2text import html2text as htt 156 | import wikitextparser as wtp 157 | 158 | 159 | archive_dir = 'd:/WikipediaArchive/' 160 | dest_dir = 'D:/enwiki20201020/' 161 | chars_per_file = 40 * 1000 * 1000 # create a consistently sized chunk (~40MB each) 162 | 163 | 164 | def dewiki(text): 165 | text = wtp.parse(text).plain_text() 166 | text = htt(text) 167 | text = text.replace('\\n',' ') 168 | text = re.sub('\s+', ' ', text) 169 | return text 170 | 171 | 172 | def analyze_chunk(text): 173 | try: 174 | if ':443/sdk` when logging into linked-mode vCenter 14 | - You may be missing one or more vCenters from the inventory tree, depending on which vCenter you log into 15 | 16 | # Possible Causes 17 | 18 | - vCenter services failing 19 | - Disk space low 20 | - Network/firewall blocking communication 21 | - Platform Services Controller misconfiguration: [VMware KB 2050273](https://kb.vmware.com/s/article/2050273) 22 | 23 | You can check on VAMI (:5480) to verify health of services, database, and disk space. 24 | 25 | # Diagnosis 26 | 27 | Here are some commands you can run to see what vCenter is doing behind the scenes! 
28 | 29 | - `/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost -u administrator` 30 | - `/usr/lib/vmware-vmdir/bin/vdcadmintool` >> option 6 31 | - Check in VPXD logs 32 | - Check ping/telnet 33 | - More documentation here: [VMware KB 2127057](https://kb.vmware.com/s/article/2127057) 34 | -------------------------------------------------------------------------------- /docs/_posts/2020-12-02-deep-learning-acceleration.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Deep Learning Acceleration: More and more disruption" 4 | date: 2020-12-02 5 | description: I feel like we're at a watershed moment for deep learning and the singularity is approaching faster 6 | categories: [Deep-Learning, Singularity] 7 | --- 8 | 9 | # Acceleration 10 | 11 | I keep my finger on the pulse of progress. Lately, I've seen a groundswell of articles with a common theme: Deep learning is disrupting everything and it is accelerating. 12 | Here are some examples from just the last month: 13 | 14 | - [AlphaFold solves protein folding](https://youtu.be/gg7WjuFs8F4) 15 | - [Caltech's Covid AI outperforms other models](https://www.caltech.edu/about/news/caltechs-ai-driven-covid-19-model-routinely-outperforms-competitors) 16 | - [Nvidia doubles RAM in A100 to 80GB](https://nvidianews.nvidia.com/news/nvidia-doubles-down-announces-a100-80gb-gpu-supercharging-worlds-most-powerful-gpu-for-ai-supercomputing) 17 | - [Cerebras CS-1 outperforms supercomputer](https://spectrum.ieee.org/tech-talk/computing/hardware/ai-system-beats-supercomputer-at-key-scientific-simulation) 18 | 19 | All of this against the backdrop of AI regularly outperforming medical experts: 20 | 21 | - [CNN outperforms 58 dermatologists on cancer diagnosis](https://www.pcrm.org/news/ethical-science/artificial-intelligence-outperforms-doctors-diagnosing-skin-cancer) 22 | - [AI outperforms six radiologists on breast cancer x-rays](https://www.bbc.com/news/health-50857759) 23 | 24 | The hits just keep coming faster and faster. 25 | 26 | # Medical Implications 27 | 28 | ## Genetic Screening on Steroids 29 | 30 | Technologies like AlphaFold mean that we will soon be able to simulate the folding of every genetic variant in your genome. 31 | Genes are long strings of DNA with polymorphisms - letters that have been swapped. We can read these markers now with genetic sequencing but we can only use statistics to 32 | try and identify what the ramifications are. Services like 23andMe and Genomelink use crowdsourced research and statistical comparison to say what you are likely to have. 33 | While valuable, this methodology leaves a lot to be desired. It does not answer important questions like "Why?" - Why does one gene variant result in disease while another does not? 34 | In the past, every gene variant would require a tremendous amount of investigation. Researchers would need to spend months or years in the lab to ascertain the ground truth: 35 | how does one gene variant cause proteins to misfold? 36 | 37 | Protein folding is like nanomachine origami. If the mitochondria is the powerhouse of the cell, proteins are the building material and the tools that build the cell. 38 | Due to the size and complexity of proteins, there are as many as [several billion](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4889822/) different kinds. 39 | Investigating each one individually is an intractable problem. That becomes less true with AlphaFold.
Furthermore, new processing technologies like Nvidia's A100 and Cerebras CS-1 40 | lower the barrier. AlphaFold-as-a-Service very well could be a thing within the next few months. Send in your DNA profile and it will infer all the protein misfolds in your entire genome. 41 | 42 | ## Drug Candidates Galore 43 | 44 | The cost of new drugs goes up [exponentially with time](https://www.biopharmadive.com/news/new-drug-cost-research-development-market-jama-study/573381/). That's a problem. 45 | Technologies like AlphaFold won't replace the entire drug pipeline, but they will allow drug companies to survey tens of thousands of candidates before ever getting into the lab. 46 | Beyond just identifying drug candidates way ahead of time, I anticipate that future iterations of AlphaFold will be able to simulate the interactions of multiple proteins. 47 | The one-two punch of genetic sequencing and mastery of protein folding should result in the birth of truly personalized medicine. 48 | 49 | ## Chronic Disease is History 50 | 51 | Age-related diseases are almost all caused by misfolding proteins. Alzheimer's, for example, is caused by plaques that accumulate due to an inability to clear them out. 52 | The same is true of most cardiovascular disease - hence why some people are impervious while others suffer heart attacks and stroke before their fiftieth birthday. 53 | Cancer, one of the scariest diseases of all, is no different. Fairly soon, I suspect we will learn to utilize the genetic machinery throughout our bodies to cure all age-related 54 | disease and possibly aging itself. None of these problems can be tackled by a single drug as they are multifaceted problems with complex interactions. Personalized medicine 55 | will be mandatory in order to even make use of these technologies. 56 | 57 | ## Healthcare Costs Under Control 58 | 59 | > An ounce of prevention is worth a pound of cure. 60 | 61 | But in the arena of modern medicine, it's usually orders of magnitude cheaper. Degenerative and age-related diseases are expensive. The most expensive diseases are 62 | cardiovascular disease, cancer, and metabolic diseases such as diabetes and obesity. Imagine a world where a simple blood test allows the pharmacy to create a custom prescription 63 | for you and you know for a fact that you never have to worry about these diseases. Imagine - $10 a month for a customized pill that guarantees you'll never get 64 | dementia, stroke, diabetes, or cancer. 65 | 66 | The best crisis is the one that never happens. We are about to live through the biggest medical revolution since antibiotics and vaccines combined. 67 | 68 | # Broader Implications 69 | 70 | The recent news proves that AI and Deep Learning are disruptive in major ways. These technologies are no longer limited to iterative improvements on image recognition and other 71 | simpler problems. We're now seeing the needle move in big ways. It's only a matter of time before Deep Learning is fully cemented in as the next generation of numerous technologies. 72 | 73 | 74 | -------------------------------------------------------------------------------- /docs/_posts/2020-12-07-artificial-intuition.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Artificial Intelligence or Artificial Intuition?"
4 | date: 2020-12-07 5 | description: Intuition is a better description of deep neural networks 6 | categories: [Deep-Learning, Singularity] 7 | --- 8 | 9 | # Intuition vs Intelligence 10 | 11 | Intuition has the following definitions: 12 | 13 | - The power or faculty of attaining to direct knowledge or cognition without evident rational thought and inference 14 | - Immediate apprehension or cognition 15 | - Quick and ready insight 16 | - The ability to understand something immediately, without the need for conscious reasoning 17 | 18 | These definitions could be describing deep learning and machine learning in general. To date, no AI model has demonstrated anything remotely close to thought or cognition. 19 | This despite the fact that AI has been able to solve incredibly complex problems. 20 | 21 | The definitions of intelligence are: 22 | 23 | - The ability to acquire and apply knowledge and skills, to understand 24 | - The ability to learn or understand or to deal with new or trying situations 25 | - The ability to apply knowledge to manipulate one's environment or to think abstractly as measured by objective criteria (such as tests) 26 | 27 | I can imagine an argument that GPT-3 is getting closer to the ability to improvise. 28 | We are working towards "zero shot" task success, where we have a pretrained model that is capable of handling a variety of tasks it was not explicitly trained for. 29 | In the most basic terms this is called **generalization**. Does that equate to Artificial **General** Intelligence? In my opinion, no. 30 | 31 | If GPT-3 spits out an interesting article about life and death, you cannot ask GPT-3 why it wrote what it did. 32 | It has no memory of its past outputs and no concept of you as a human. It is merely an intuitive machine that has the ability to regurgitate fascinating patterns that happen to have meaning to us. 33 | 34 | To play devil's advocate with myself, you can show GPT-3 entirely new sets of information and problems and it seems to be able to learn rather quickly. This is not so different from humans. 35 | With instruction and demonstration, we can rapidly gain new abilities. But still, GPT-3 demonstrates no obvious ability to reflect on what it's learned or explain why it can do what it does. 36 | 37 | # The need for explicability 38 | 39 | > You do not truly understand something unless you can explain it to your grandmother 40 | 41 | This quote is often misattributed to Einstein but it's more likely to have been said in some other permutation by Lord Rutherford. The point remains: if you can't explain something, 42 | the value of your knowledge and ability is diminished. That's not to say that intuition is worthless - on the contrary - we rely heavily on intuition all day every day. 43 | Consider the words you're reading on this blog. Do you have to examine each letter to extract meaning and consciously assemble words and sentences and rationally parse out my intent? 44 | Absolutely not. You can read this at lightning speed by relying on your intuition. Your eyes and brain scan the page and ingest the meaning and it just sort of pops into your head. 45 | At least, that's what reading is like for me. Only when we read very difficult or novel material do we have to fully engage our conscious thought to parse and integrate new meaning. 46 | 47 | > Hey GPT-3, can you explain why you said that? 48 | 49 | We instinctively want to treat anything intelligent with agency.
Asking GPT-3 why it believes what it said is anthropomorphizing a collection of weights and biases and connections. 50 | This implies that we expect intelligent agents to remember their own history, thus being able to account for what they said and why. Humans are fallible in this realm as well. 51 | We often act on instinct and it often requires deep introspection to truly understand why we do and say the things we say. I would go so far as to say that many people are 52 | totally unable to ascertain the truth of their beliefs due to the complex interaction of identity, emotions, and neurology. We often have different competing ideas about ourselves 53 | and the world in which we live, and different regions of the brain are responsible for integrating all of these different parts. 54 | 55 | GPT-3 has no such machinery or ability. No neural network has. 56 | 57 | Tesla autopilot merely records observations and decisions. This is, in effect, a record explaining why accidents and mishaps happen. But this data requires expert interpretation. 58 | The vehicle has no agency or accountability unto itself. 59 | 60 | # The need for agency 61 | 62 | Neither GPT-3 nor Tesla can decide of their own accord to go learn new things or test their own ability. They are idle tools that only respond to human requests. 63 | We can subject a deep neural network to an image or body of text and it will produce some kind of output and then it will go dormant again. 64 | 65 | This, of course, has been the nature of machines and tools forever. From sticks and stone tools to assembly line robots, the objects and software we create have no agency, no self-direction. 66 | The first person killed by a robot was a man named Robert Williams at a Ford plant in Michigan. Do you punish the robot? Decommission it? Reprogram it? 67 | 68 | King Xerxes once had the ocean whipped for destroying his bridge. The Code of Hammurabi said that if a building collapses and kills people, the builder should then be put to death. 69 | In the first case, Xerxes may have wanted to whip the gods responsible for the wind and the waves, and settled for the next best thing. Hammurabi stayed a bit more grounded, 70 | asserting the engineer responsible for the deadly disaster was at fault. 71 | 72 | All these are fine philosophical and legal questions but I'm not as concerned about those. I'm strictly concerned about intelligence. 73 | 74 | > Is agency necessary for intelligence? 75 | 76 | My intuition says yes. Before I can consider a machine truly intelligent, I feel like it has to keep track of some kind of personal narrative. It has to remember what it did and said and why. 77 | It also needs to remember all of its interactions with me. Lastly, I feel like it needs a certain degree of autonomy, the ability to self-direct and explore. 78 | 79 | # Isn't that just sentience? Or consciousness? 80 | 81 | The strictest definition of consciousness is: 82 | 83 | > The state of being awake and aware of one's surroundings. 84 | 85 | I would argue that a Tesla satisfies this definition. But so what? A Tesla is a lot more than just a deep neural network. It is a collection of hardware and software. 86 | It relies on a large number of neural networks and other kinds of adaptive models. To borrow from machine learning verbiage: it is an **ensemble**. Collectively, a Tesla 87 | can solve simple problems with autonomy. It can negotiate roads and traffic and can navigate from one place to another.
It also records its sensory information as well as its internal state, its evaluations and reasons for actions. For these reasons, I would suggest that a Tesla is a far more advanced 89 | robot than GPT-3. 90 | 91 | As sophisticated as a self-driving Tesla is, it still only relies on the intuitions of deep neural networks. Cameras deliver images to a collection of software models that build 3D 92 | maps using SLAM and object detection inferences to identify cars, pedestrians, and signs. Each of these services is nearly instant and has no agency. Not unlike your ability to read this page. 93 | 94 | # The line begins to blur 95 | 96 | What sets us apart from intelligent machines? Couldn't you slap a metacognitive service into a Tesla with a voice interface and ask it why it was driving the way it did? 97 | This concept of explicability has been fictionalized in Westworld. Each of the Hosts has the ability to explain their behavior. To be fair, this was also explored years ago 98 | in Star Trek TNG's episode *The Measure of a Man* where Data was subjected to philosophical scrutiny. 99 | 100 | But I digress. 101 | 102 | > My chief point here is that deep neural networks are merely intuition machines. Not intelligent machines. 103 | 104 | -------------------------------------------------------------------------------- /docs/_posts/2021-02-05-rundeck-acl.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Rundeck ACL hell" 4 | date: 2021-02-05 5 | description: Rundeck CE ACLs are byzantine 6 | categories: [Rundeck, Automation, KB] 7 | --- 8 | 9 | # Rundeck ACL YAML files 10 | 11 | Even looking at the templates is not very helpful. This is one place where Rundeck's documentation still leaves a bit to be desired. 12 | 13 | ## Projects vs System 14 | 15 | The first thing to know is that Rundeck ACLs have two primary scopes: the projects and the Rundeck system itself. These are demarcated by `context: application: 'rundeck'` and `context: project: '.*'`. 16 | 17 | ## Default Admin ACL 18 | 19 | I have found that the default admin ACL is the most reliable. Fiddling with other stuff just breaks things. This is probably a Layer 8 issue, though. Because of this, I have learned to base all my other ACLs on this template. 20 | 21 | ```yaml 22 | description: Admin access to PROJECTS 23 | context: 24 | project: '.*' 25 | for: 26 | resource: 27 | - allow: '*' 28 | adhoc: 29 | - allow: '*' 30 | job: 31 | - allow: '*' 32 | node: 33 | - allow: '*' 34 | by: 35 | username: 36 | - user.name # case sensitive 37 | group: 38 | - admin 39 | - group.name # case sensitive 40 | 41 | --- 42 | 43 | description: Admin access to RUNDECK system 44 | context: 45 | application: 'rundeck' 46 | for: 47 | resource: 48 | - allow: '*' 49 | project: 50 | - allow: '*' 51 | project_acl: 52 | - allow: '*' 53 | storage: 54 | - allow: '*' 55 | by: 56 | username: 57 | - user.name # case sensitive 58 | group: 59 | - admin 60 | - group.name # case sensitive 61 | ``` 62 | 63 | ## Global Read/Run ACL 64 | 65 | Obviously, you don't want to give admin privileges to everyone so you'll want at least one more policy to allow some folks to run jobs but not make any other changes. Here's an example that I got working.
66 | 67 | ```yaml 68 | description: Read/Run access to all PROJECTS 69 | context: 70 | project: '.*' 71 | for: 72 | resource: 73 | - allow: [read,run,refresh] 74 | adhoc: 75 | - allow: [read,run,refresh] 76 | job: 77 | - allow: [read,run,refresh] 78 | node: 79 | - allow: [read,run,refresh] 80 | by: 81 | username: 82 | - user.name # case sensitive 83 | group: 84 | - group.name # case sensitive 85 | 86 | --- 87 | 88 | description: Read access to RUNDECK system 89 | context: 90 | application: 'rundeck' 91 | for: 92 | resource: 93 | - allow: read 94 | project: 95 | - allow: read 96 | project_acl: 97 | - allow: read 98 | storage: 99 | - allow: read 100 | by: 101 | username: 102 | - user.name # case sensitive 103 | group: 104 | - group.name # case sensitive 105 | ``` 106 | 107 | ## Read/Run access to specific project 108 | 109 | Let's say you want to give one person or team access to a specific project to run jobs. This is pretty common. In one example, I needed to give DBAs access to take storage snapshots. So I created a self service project for DBAs and gave them exclusive access. 110 | 111 | ```yaml 112 | description: Read/Run access to self service 113 | context: 114 | project: 'DBA_Self_Service' # case sensitive, put your project here 115 | for: 116 | resource: 117 | - allow: [read,run,refresh] 118 | adhoc: 119 | - allow: [read,run,refresh] 120 | job: 121 | - allow: [read,run,refresh] 122 | node: 123 | - allow: [read,run,refresh] 124 | by: 125 | username: 126 | - user.name # case sensitive 127 | group: 128 | - group.name # case sensitive 129 | 130 | --- 131 | 132 | description: Read access to RUNDECK system 133 | context: 134 | application: 'rundeck' 135 | for: 136 | resource: 137 | - allow: read 138 | project: 139 | - allow: read 140 | project_acl: 141 | - allow: read 142 | storage: 143 | - allow: read 144 | by: 145 | username: 146 | - user.name # case sensitive 147 | group: 148 | - group.name # case sensitive 149 | ``` 150 | 151 | # Caveat 152 | 153 | I'm not an expert. One flaw with this scheme is that anyone who logs in will see all projects listed on the left-hand column. Even so, they will be empty. I tried restricting it further but that just broke it. I don't know if I did it wrong or if it's a bug. As I said, I've been fiddling with ACLs and found this is the most reliable way so far. If I figure out a better scheme, I'll post an update. 154 | 155 | ## LDAP is case sensitive 156 | 157 | Active Directory and Windows might not be case sensitive, but LDAP authentication is! I just found this out the hard way... 158 | 159 | ## Global read policy 160 | 161 | You could probably break out the global RUNDECK system read ACL into its own policy instead of recreating it. I've seen that before. Something like "Domain Users" is granted global read. However, this is not necessarily best practice because you don't want Joe Shmoe to see the results of every other rundeck job. 162 | 163 | 164 | 165 | 166 | -------------------------------------------------------------------------------- /docs/_posts/2021-02-07-gamestop-laughing-man.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "GameStop and the Laughing Man" 4 | date: 2021-02-06 5 | description: Ghost in the Shell predicted the GameStop fiasco 6 | categories: [Singularity] 7 | --- 8 | 9 | Nothing in this article constitutes financial advice. This is a technological evaluation of recent events. I am taking no legal, financial, or ethical stance on these events. 
10 | 11 | # GameStop: Reddit vs The Establishment 12 | 13 | Institutional investors (hedge funds) decided to "short" GameStop. Shorting a stock is betting that it will lose. Most of the time, when you gamble on something, you're going to try and bet on the winner. Shorting, in essence, is betting on losers. Simple. 14 | 15 | Reddit caught wind of this gamble by hedge funds. There's a catch: if the stock doesn't lose, the hedge funds are out a lot of money. Reddit decided to bid up the price so that the hedge funds would lose their bets. It seems to have worked. 16 | 17 | Melvin Capital, one of the major hedge funds involved, reportedly lost 53% in January. I do not have primary sources on that so take it with a grain of salt. So what happened to that money? Anyone who made money on GameStop in January likely benefitted from Melvin Capital's losses. 18 | 19 | Reddit, however, is not a monolithic entity. Reddit is a website driven by user content. Reddit just so happens to be the platform on which most of the organization of this scheme occurred. Other platforms were involved, such as Discord and likely many other private messaging platforms. I have no idea if this represents legal exposure to Reddit. Time will tell. 20 | 21 | I suspect the SEC does not have unilateral authority to go after Reddit due to existing Internet protections. Congress, however, might amend those laws now that it has been proven that literally billions of dollars are on the line. Again, time will tell. 22 | 23 | # The Swarm 24 | 25 | Reddit was merely the most publicly visible face of this movement. The movement, however, was composed of millions of individuals. These individuals spontaneously decided to get involved and, without any central leadership, concocted a plan. 26 | 27 | The idea of decentralized leadership has been a topic of study for a long time. In the animal kingdom, hive insects such as ants and bees can make complex decisions with no central authority or planning committee. Humans, however, have developed structured methods of making decisions, from command structures to democratic voting. 28 | 29 | The Internet presents a unique opportunity for spontaneous organization. These knots of organization emerge when there is consensus amongst enough individuals. 30 | 31 | # The Laughing Man 32 | 33 | The Laughing Man was a plot arc in a TV show called Ghost in the Shell. This show takes place in an imagined cyberpunk future and follows an elite counter-terrorism squad called Section 9. In the first season of this show, Section 9 investigates a series of seemingly unrelated crimes. 34 | 35 | These unrelated crimes all have one thing in common: they make reference to The Laughing Man. The Laughing Man originated during a single event where a CEO of a pharmaceutical company was held hostage in public. The attacker hacked into news feeds and replaced his face with The Laughing Man logo. 36 | 37 | Due to the broad audience, the image of The Laughing Man created a meme. It should be noted that Ghost in the Shell was released before the concept of internet memes was broadly known. The Laughing Man meme became an avatar around the idea of anger against corruption of Big Pharma. 38 | 39 | Internet forums and chatrooms - much like Reddit - got ahold of the idea of The Laughing Man and ran away with speculation and conspiracy theories. Over time, participants in the discussions arrived at a consensus about who and what The Laughing Man was. This consensus ultimately resulted in copycats. 
40 | 41 | Copycats emerged spontaneously even though the original idea had faded. The original participant was long gone. Philosophically, the show concludes that The Laughing Man phenomenon was a series of copies without a true original - a Stand Alone Complex. 42 | 43 | # Copycats without an original 44 | 45 | It's entirely possible that the idea of bidding up GameStop is a real-life example of a Stand Alone Complex. It was an idea that emerged spontaneously and evolved, gaining more and more copycats. In this case, the lure was easy access to money. This creates a lot of intrinsic motivation for copycats and participants to join. 46 | 47 | -------------------------------------------------------------------------------- /docs/_posts/2021-02-19-human-scale-dnn-2030.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Prediction: Human-equivalent neural networks by 2030" 4 | date: 2021-02-19 5 | description: Neural network sizes are doubling multiple times per year right now 6 | categories: [Singularity] 7 | --- 8 | 9 | # Moore's Law, but on Spice 10 | 11 | GPT2's final update came in November 2019 and had 1.5B parameters. By comparison, GPT3 was released 6 months later (May 2020) with 175B parameters. 12 | A few months prior, Microsoft released a large transformer (Turing-NLG) with 17B parameters. This growth rate is unprecedented and, likely, is completely unsustainable. 13 | 14 | But along came Google. 15 | 16 | In January 2021, they released their Switch Transformer with 1.6T parameters (1600B). That's a growth curve of roughly 10x every 3 months. 17 | If that pattern continues (which I doubt it will), we're looking at 1600T parameters by the end of 2021. 18 | This parabolic growth would probably consume more electricity than the entire human race is capable of generating. 19 | 20 | I suspect that this jump in parameters was largely driven by a few technological breakthroughs with neural architectures and massively distributed computing. Once the novelty 21 | of those breakthroughs wears off, we'll see a much gentler growth. 22 | 23 | # Human Brain Equivalent? 24 | 25 | The human brain has roughly 100 billion neurons. Each neuron has roughly 7000 synapses. If (and this is a big 'if') each synaptic connection in the brain is roughly equivalent to 26 | one parameter in a Deep Neural Network, then the human brain has roughly 700T parameters. We'll get to architecture in a minute, but suffice to say, I don't think the human brain is a transformer. 27 | 28 | These are some really broad, bold assumptions on my part. Please understand that they come from a place of WAGs and good intentions. How many deep neural network parameters does it take 29 | to equate to the processing power of a single synapse? I have no idea. I could be off by an order of magnitude in either direction. But I'm about to demonstrate why that doesn't matter too much. 30 | 31 | # Growth Rate 32 | 33 | We've been seeing 1000x growth for a couple years running now. So cutting that down to 2x annual growth should be reasonable. 
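The compounding is easy to sanity-check. Here is a tiny sketch that reproduces the projection tables below and reports when each growth rate crosses the 700T figure from earlier.

```python
# Project parameter counts forward from the Switch Transformer (1.6T in 2021)
START_PARAMS, START_YEAR = 1.6, 2021   # trillions of parameters
BRAIN_EQUIVALENT = 700                 # ~100B neurons x ~7000 synapses, in trillions

for growth in (2, 4):
    params, year = START_PARAMS, START_YEAR
    while params < BRAIN_EQUIVALENT:
        params *= growth
        year += 1
    print(f'{growth}x annual growth crosses {BRAIN_EQUIVALENT}T in {year} ({params:,.1f}T)')
```

At 2x annual growth this crosses the threshold in 2030 (819.2T), and at 4x in 2026 (1,638.4T), matching the tables that follow.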
34 | 35 | | Parameters (T) | Year | 36 | |---|---| 37 | | 1.6 | 2021 | 38 | | 3.2 | 2022 | 39 | | 6.4 | 2023 | 40 | | 12.8 | 2024 | 41 | | 25.6 | 2025 | 42 | | 51.2 | 2026 | 43 | | 102.4 | 2027 | 44 | | 204.8 | 2028 | 45 | | 409.6 | 2029 | 46 | | 819.2 | 2030 | 47 | | 1,638.4 | 2031 | 48 | | 3,276.8 | 2032 | 49 | | 6,553.6 | 2033 | 50 | | 13,107.2 | 2034 | 51 | | 26,214.4 | 2035 | 52 | | 52,428.8 | 2036 | 53 | | 104,857.6 | 2037 | 54 | 55 | As you can see, even if we average only a modest growth rate, compared to our current trends, we should exceed the 700T mark by 2030. 56 | Even if I'm off by an order of magnitude, it will only take a few more years to catch up. 57 | 58 | Now let's see what that looks like at 4x annual growth (still a far cry from the current 1000x annual growth). 59 | 60 | | Parameters (T) | Year | 61 | |---|---| 62 | | 1.6 | 2021 | 63 | | 6.4 | 2022 | 64 | | 25.6 | 2023 | 65 | | 102.4 | 2024 | 66 | | 409.6 | 2025 | 67 | | 1,638.4 | 2026 | 68 | | 6,553.6 | 2027 | 69 | | 26,214.4 | 2028 | 70 | | 104,857.6 | 2029 | 71 | | 419,430.4 | 2030 | 72 | | 1,677,721.6 | 2031 | 73 | | 6,710,886.4 | 2032 | 74 | | 26,843,545.6 | 2033 | 75 | | 107,374,182.4 | 2034 | 76 | | 429,496,729.6 | 2035 | 77 | | 1,717,986,918 | 2036 | 78 | | 6,871,947,674 | 2037 | 79 | 80 | 81 | # Why Moore's Law Doesn't Matter (much) 82 | 83 | Moore's Law looks at the transistor count on a single processor wafer. 84 | 85 | We haven't used single processors in more than a decade. If you want more CPU horsepower, you can throw in more processors or more cores. 86 | OpenAI, IBM, Google, and Microsoft have been distributing their workloads across thousands of dedicated machines. This is nothing but good old fashioned clustering. 87 | Along comes Cerebras and they overcome a design barrier and stuff 400,000 cores on a single wafer. Sure, it's a CPU that draws 20kW of power, but the equivalent 88 | computational power in Google's datacenter would take closer to a MW of juice. 89 | 90 | Companies like Nvidia, Google, and Cerebras are investing deeply in AI-specific hardware. A general purpose Intel CPU can't hold a candle to these AI-optimized beasts. 91 | This is another reason that Moore's Law doesn't really represent a limiting factor. Optimization trumps brute force. We're now arriving at the time of refinement, nuance, and mastery. 92 | 93 | # The Last Problem: Architecture 94 | 95 | The transformer architecture has demolished all previous limitations. OpenAI is turning the power of GPT3 to other domains, including visual and audio. Will the transformer be the basis of True Cognition? 96 | 97 | I doubt it. 98 | 99 | Will the achievement of transformer architectures contribute greatly to achieving true machine cognition? Absolutely. There are, however, still many problems that neural networks today haven't even begun to tackle, specific kinds of cognitive tasks. 100 | 101 | ## Short and Mid Term Memory 102 | 103 | The greatest limitation of neural networks today is that they have no short term memory. The exception would be RNNs but their memory is so short-term that they can't even remember what they were talking about by the end of a few paragraphs. 104 | 105 | If you set up a chat with GPT3 and tell it about your day, it will have no memory of that conversation unless you feed it back in at the next instantiation. You can fine-tune these networks with one-shot training and zero-shot inference. 106 | But that's a far cry from spontaneous human memory.
107 | 108 | GPT3 effectively has only extremely short-term memory and long-term memory, with nothing in between. 109 | 110 | ## Metacognition 111 | 112 | We humans can "think about our thoughts". We can play with our thoughts internally, mulling ideas and mental approaches. We can deliberately stop and think about something, and more importantly, we can ponder our own thought process. 113 | 114 | Metacognition is, in my estimation, one of the most critical functions for true intelligence. Right now, deep neural networks spit out answers without any reasoning. It's pure intuition. GPT3 has no way to explain why it says what it says. 115 | There's no debating its output, as it has no grasp of logic or reasoning. 116 | 117 | ## Goal tracking 118 | 119 | Human brains can spontaneously imagine a future goal state and continuously monitor our progress towards that goal. We can integrate feedback in real time. This is not something that has been demonstrated, to my knowledge, in neural networks. 120 | 121 | ## Theory of Mind 122 | 123 | I can take a look at my friends and listen to a few sentences and get an accurate idea of their emotional and mental state. I can remember what they know, believe, and like. 124 | 125 | Most importantly, I can use this information to anticipate their behaviors and reactions. I can predict how best to interact with them. 126 | 127 | Yes, if you've watched The Social Dilemma or The Great Hack then you know that Facebook and Google absolutely can model your mind. But these technologies have not been integrated into the giant language models, such as GPT3. 128 | 129 | ## Energy 130 | 131 | The human brain consumes about 10W. Cerebras' chip consumes 20kW. None of this will be practical or commercially viable until those power requirements come down. 132 | 133 | ## Commercial Lag 134 | 135 | Supercomputers and massive clusters are about 10 years ahead of commercial deployments, which are again about 10 years ahead of consumer-grade computers. My desktop PC is an order of magnitude more powerful than Deep Blue was back in 1997. 136 | 137 | So Microsoft, IBM, and Google will be able to run human-scale networks by 2030. Your average company can run their own by 2040, and then everyone can have a digital human brain on their laptop by 2050. 138 | 139 | ## Tying it all together 140 | 141 | We can slice and dice neural networks. This is called transfer learning and it has been around for a while. I suspect that human-level cognition will emerge, partly, through composing and compiling pieces of specifically trained neural networks, stitching them together like a Frankenstein's monster of neural networks. 142 | 143 | I also suspect that there will be other architectures that are more decentralized. Why compose a monolithic neural network when you can have smaller networks stitched together with middleware? But then again, I'm a systems engineer by trade. 144 | 145 | # The Implications 146 | 147 | Let's say we get human-level intelligence by 2030. What then? I think the biggest impact will be to economics and jobs. We suddenly may find ourselves in the midst of the end of work as we know it. 148 | 149 | Capitalism is the inexorable and ruthless search for efficiency. As an engineer, I am all about some efficiency. I have taken countless processes in my career and automated them away. Automation is faster, cheaper, and more reliable. 150 | 151 | But what happens when a machine can learn to do anything that any human can do? What happens when you can copy/paste that machine infinitely?
What happens when you can sell that machine's service online within a year? 152 | 153 | OpenAI and Microsoft are working on monetizing GPT3 via API. Companies are going to figure out how to use that technology and then GPT4 is going to come out, as well as numerous other competitors. Companies will have already figured out how to use AI. 154 | 155 | Imagine this: OpenAI comes out with GPT4 and releases dozens of specialized versions. Some help out with biotech or chemistry. The folks over at Volkswagen are working on their next gen battery and so they upload some formulas and specs to GPT4 and it spits out some testable results. Suddenly, GPT4 has done months worth of work in seconds. VW then goes and tests the results in a lab and then the following year their batteries are 20% better. 156 | 157 | Imagine this: Google Brain releases a medical diagnostic bot in 2025 that can consume any genetic data, medical test, or medical image and recommend followup tests. It gives a diagnosis with 99.99% accuracy and works tirelessly, around the clock. Not only does it diagnose symptomatic problems, it predicts problems up to 20 years in advance. 158 | 159 | These tools will become cheaper, faster, and more reliable than humans. At that point, any and every company still around will make the financial calculation and switch. 160 | 161 | 162 | 163 | 164 | 165 | -------------------------------------------------------------------------------- /docs/_posts/2021-03-12-connect-ucs-powershell.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Connect to UCSM with PowerShell and specify your domain" 4 | date: 2021-03-12 5 | description: UCSM requires a particular syntax to connect 6 | categories: [PowerShell, KB, UCS] 7 | --- 8 | 9 | # Symptom 10 | 11 | You are unable to connect to a UCS instance by passing a PowerShell credential object. 12 | 13 | # Cause 14 | 15 | You must specify the authentication scope in the username of the PowerShell credential object. 16 | 17 | # Solution 18 | 19 | Prepend your PowerShell credential username with the following: 20 | 21 | - `ucs-local\` for local auth 22 | - `ucs-\` for LDAP/AD auth 23 | 24 | Examples: 25 | 26 | - `ucs-local\admin` for local admin 27 | - `ucs-RATPACK\frank.sinatra` for AD auth 28 | 29 | If you use `Get-Credential` in a script, you can remind yourself by using the prompt: 30 | 31 | ```powershell 32 | $creds = Get-Credential -Message "UCS Auth" -UserName "ucs-\" 33 | Connect-Ucs -Name ucsm.contoso.com -Credential $creds 34 | ``` 35 | -------------------------------------------------------------------------------- /docs/_posts/2021-03-12-install-powershell-modules.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: "Installing PowerShell modules behind corporate proxy" 4 | date: 2021-03-12 5 | description: WARNING! Unable to find module repositories! No match was found for the specified search criteria and module name 'PowerShellGet'! 6 | categories: [PowerShell, KB] 7 | --- 8 | 9 | # Symptoms 10 | 11 | You see this kinda thing: 12 | 13 | ```powershell 14 | PS C:\WINDOWS\system32> Register-PSRepository -Default -verbose 15 | VERBOSE: Performing the operation "Register Module Repository." on target "Modul 16 | e Repository 'PSGallery' () in provider 'PowerShellGet'.". 17 | 18 | PS C:\WINDOWS\system32> Get-PSRepository 19 | WARNING: Unable to find module repositories. 
--------------------------------------------------------------------------------
/docs/_posts/2021-03-12-install-powershell-modules.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: default
3 | title: "Installing PowerShell modules behind corporate proxy"
4 | date: 2021-03-12
5 | description: WARNING! Unable to find module repositories! No match was found for the specified search criteria and module name 'PowerShellGet'!
6 | categories: [PowerShell, KB]
7 | ---
8 | 
9 | # Symptoms
10 | 
11 | You see this kinda thing:
12 | 
13 | ```powershell
14 | PS C:\WINDOWS\system32> Register-PSRepository -Default -verbose
15 | VERBOSE: Performing the operation "Register Module Repository." on target "Modul
16 | e Repository 'PSGallery' () in provider 'PowerShellGet'.".
17 | 
18 | PS C:\WINDOWS\system32> Get-PSRepository
19 | WARNING: Unable to find module repositories.
20 | ```
21 | 
22 | NOTE: On a healthy system, the same commands look more like this:
23 | 
24 | ```powershell
25 | PS C:\WINDOWS\system32> Register-PSRepository -Default -verbose
26 | VERBOSE: Performing the operation "Register Module Repository." on target "Modul
27 | e Repository 'PSGallery' () in provider 'PowerShellGet'.".
28 | VERBOSE: Repository details, Name = 'PSGallery', Location = 'https://www.powersh
29 | ellgallery.com/api/v2'; IsTrusted = 'False'; IsRegistered = 'True'.
30 | 
31 | PS C:\WINDOWS\system32> Get-PSRepository
32 | 
33 | Name          InstallationPolicy   SourceLocation
34 | ----          ------------------   --------------
35 | PSGallery     Untrusted            https://www.powershellgallery.com/api/v2
36 | ```
37 | 
38 | Other symptoms:
39 | 
40 | ```powershell
41 | PS C:\WINDOWS\system32> Install-Module -Name PowerShellGet -Scope AllUsers -Confirm:$false -Force -AllowClobber
42 | PackageManagement\Install-Package : No match was found for the specified
43 | search criteria and module name 'PowerShellGet'. Try Get-PSRepository to see
44 | all available registered module repositories.
45 | At C:\Program
46 | Files\WindowsPowerShell\Modules\PowerShellGet\1.0.0.1\PSModule.psm1:1809
47 | char:21
48 | + ...            $null = PackageManagement\Install-Package @PSBoundParameters
49 | +                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
50 | + CategoryInfo          : ObjectNotFound: (Microsoft.Power....InstallPacka
51 | ge:InstallPackage) [Install-Package], Exception
52 | + FullyQualifiedErrorId : NoMatchFoundForCriteria,Microsoft.PowerShell.Pac
53 | kageManagement.Cmdlets.InstallPackage
54 | ```
55 | 
56 | # Solution
57 | 
58 | ## Setup Proxy
59 | 
60 | First you'll need to tell your PowerShell session to use your proxy. You might also need to change the TLS protocol that's accepted. Basically, what's happening above is just a bad error message: it's not verbose enough to say "HTTPS failure" or anything like that.
61 | 
62 | ```powershell
63 | $proxy = '' # update this
64 | [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12 # the PowerShell Gallery requires TLS 1.2
65 | [system.net.webrequest]::defaultwebproxy = new-object system.net.webproxy($proxy)
66 | [system.net.webrequest]::defaultwebproxy.credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials
67 | [system.net.webrequest]::defaultwebproxy.BypassProxyOnLocal = $true
68 | ```
69 | 
70 | ## Update Package Providers
71 | 
72 | This is similar-ish to `sudo apt-get update`. The first line installs the NuGet package provider itself, the second registers PSGallery, and the third installs PowerShellGet, which is basically the installer that a lot of modules rely on.
73 | 
74 | ```powershell
75 | Install-PackageProvider -Name nuget -Scope AllUsers -Confirm:$false -Force -MinimumVersion 2.8.5.201
76 | Register-PSRepository -Default -verbose
77 | Install-Module -Name PowerShellGet -Scope AllUsers -Confirm:$false -Force -AllowClobber -MinimumVersion 2.2.4 -SkipPublisherCheck
78 | ```
79 | 
80 | ## Close PowerShell
81 | 
82 | I have no idea why this is required (my best guess: the old PowerShellGet 1.0.0.1 stays loaded in your current session, and a new session is the only sure way to load the new version). But trust me, if you skip this step you'll continue to get weird errors on some of your installations, such as the following:
83 | 
84 | ```powershell
85 | PS C:\Windows\system32> Install-Module -Name Cisco.UCS.Core
86 | WARNING: The specified module 'Cisco.UCS.Core' with PowerShellGetFormatVersion '2.0' is not supported by the current version of PowerShellGet. Get the latest version of the PowerShellGet module to install this module, 'Cisco.UCS.Core'.
87 | ```
88 | 
89 | Once again, this is a sign of bad error messages. But what else is new from Microsoft? This has been the case for 20+ years.
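After you relaunch PowerShell, a quick sanity check (assuming the installs above succeeded) is to confirm that the newer PowerShellGet is visible alongside the built-in 1.0.0.1:

```powershell
# Both versions will be listed; the fresh session auto-loads the highest one
Get-Module -Name PowerShellGet -ListAvailable | Select-Object Name, Version
```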
90 | 
91 | ## Install other modules
92 | 
93 | Now you should be able to install whatever else you need. Note that for subsequent installs and updates, you'll need to rerun the proxy and TLS block from the beginning first.
94 | 
95 | ```powershell
96 | Install-Module -Name VMware.PowerCLI -Scope AllUsers -Confirm:$false -Force -AllowClobber
97 | Install-Module -Name Cisco.UCS.Core -Scope AllUsers -Confirm:$false -Force -AllowClobber -AcceptLicense
98 | Install-Module -Name Cisco.UCSManager -Scope AllUsers -Confirm:$false -Force -AllowClobber -AcceptLicense
99 | Install-Module -Name Az -Scope AllUsers -Confirm:$false -Force -AllowClobber
100 | ```
101 | 
102 | # Bringing it all together
103 | 
104 | Here's how it looks if you want to run it as a single script. I like to do this and save it in a GitHub repo so I can quickly install stuff on servers/laptops without remembering all this nonsense.
105 | 
106 | Except you can't quite run it all together, because (as discussed above) you must completely close PowerShell before installing more modules. >:|
107 | 
108 | ## Script 1
109 | 
110 | ```powershell
111 | $proxy = '' # update this
112 | [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
113 | [system.net.webrequest]::defaultwebproxy = new-object system.net.webproxy($proxy)
114 | [system.net.webrequest]::defaultwebproxy.credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials
115 | [system.net.webrequest]::defaultwebproxy.BypassProxyOnLocal = $true
116 | 
117 | Install-PackageProvider -Name nuget -Scope AllUsers -Confirm:$false -Force -MinimumVersion 2.8.5.201
118 | Register-PSRepository -Default -verbose
119 | Install-Module -Name PowerShellGet -Scope AllUsers -Confirm:$false -Force -AllowClobber -MinimumVersion 2.2.4 -SkipPublisherCheck
120 | 
121 | exit
122 | ```
123 | 
124 | ## Script 2
125 | 
126 | ```powershell
127 | $proxy = '' # update this
128 | [Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
129 | [system.net.webrequest]::defaultwebproxy = new-object system.net.webproxy($proxy)
130 | [system.net.webrequest]::defaultwebproxy.credentials = [System.Net.CredentialCache]::DefaultNetworkCredentials
131 | [system.net.webrequest]::defaultwebproxy.BypassProxyOnLocal = $true
132 | 
133 | Install-Module -Name VMware.PowerCLI -Scope AllUsers -Confirm:$false -Force -AllowClobber
134 | Install-Module -Name Cisco.UCS.Core -Scope AllUsers -Confirm:$false -Force -AllowClobber -AcceptLicense
135 | Install-Module -Name Cisco.UCSManager -Scope AllUsers -Confirm:$false -Force -AllowClobber -AcceptLicense
136 | Install-Module -Name Az -Scope AllUsers -Confirm:$false -Force -AllowClobber
137 | ```
138 | 
--------------------------------------------------------------------------------
/docs/_posts/2021-04-15-vmware-numa.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: default
3 | title: "Overprovisioned VMs cause CPU contention due to NUMA boundaries"
4 | date: 2021-04-15
5 | description: CPU READY will kill your performance even if CPU usage is low
6 | categories: [VMware, KB, NUMA]
7 | ---
8 | 
9 | # Symptoms
10 | 
11 | 1. VMs are inexplicably slow, even with low overall CPU usage at the ESXi host level or VM level
12 | 2. CPU READY times are high on the GUI charts or ESXTOP (a quick PowerCLI spot check is sketched below)
13 | 3. VMs have high vCPU count (overprovisioned)
14 | 4. \[Optional\] The overprovisioned VMs reside on the same ESXi host
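Here's the quick PowerCLI spot check for symptom 2 on a single VM. This is just a sketch: `MyBigVM` is a placeholder name, and it assumes you're already connected to vCenter.

```powershell
# Realtime cpu.ready.summation is milliseconds of ready time per 20-second
# sample; Value / 200 converts to percent, then divide by vCPU count
$vm = Get-VM -Name "MyBigVM"
$vm | Get-Stat -Stat cpu.ready.summation -Realtime -MaxSamples 30 |
    Where-Object { $_.Instance -eq "" } |   # "" = sum across all vCPUs
    Select-Object Timestamp, @{ N = 'ReadyPct'; E = { [math]::Round($_.Value / 200 / $vm.NumCpu, 1) } }
```

A ReadyPct that sits above roughly 5 is generally considered a problem.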
15 | 
16 | # Cause
17 | 
18 | When a VM has more vCPU cores allocated than exist in a single physical CPU on its underlying host, it will have to cross **NUMA boundaries** when accessing memory. See the graphic below for an example.
19 | 
20 | ![NUMA Nodes](https://daveshap.github.io/DavidShapiroBlog/assets/numa.png)
21 | 
22 | - A smaller VM will fit nicely within a single NUMA boundary
23 | - A larger VM will span NUMA boundaries
24 | 
25 | This situation can cripple the performance of either or both VMs. The overprovisioned VM can steal resources from the correctly provisioned VM.
26 | 
27 | # Resolution
28 | 
29 | ## Reduce vCPU count
30 | 
31 | If possible, reduce the vCPU count of the overprovisioned VM. I have provided a script (below) to help assess the correct size for all VMs in your environment. Right-sizing your VMs is a best practice: it prevents many problems and can increase performance.
32 | 
33 | ### Right-sizing Data Script
34 | 
35 | ```powershell
36 | Connect-VIServer $vcenter_fqdn # change this to your vCenter name!
37 | $vms = Get-VM | Where-Object {$_.PowerState -like "*on*" -and $_.NumCpu -ge 4} # Most VMs with fewer than 4 vCPU are boring
38 | $data = @()
39 | 
40 | foreach ($vm in $vms)
41 | {
42 |     $ready = $vm | Get-Stat -Stat cpu.ready.summation -Start (Get-Date).AddDays(-365)
43 |     $ghz = $vm | Get-Stat -Stat cpu.usagemhz.average -Start (Get-Date).AddDays(-365)
44 |     $info = "" | Select-Object VM,vCPU,GhzUsed,ReadySecPerDay,GhzCapacity,NeededCores,HostSockets,HostCores,NumaCores
45 |     $info.VM = $vm.Name
46 |     $info.vCPU = $vm.NumCpu
47 |     $info.ReadySecPerDay = [int](($ready | Measure-Object -Property Value -Average).Average / 1000)
48 |     $info.GhzUsed = [math]::Round((($ghz | Measure-Object -Property Value -Average).Average / 1000), 2)
49 |     $info.GhzCapacity = $vm.NumCpu * 2.1 # assumes ~2.1GHz per physical core
50 |     $info.NeededCores = [math]::ceiling(($ghz | Measure-Object -Property Value -Average).Average / 2100) * 2 # 2x the cores needed for average load
51 |     $info.HostSockets = $vm.VMHost.ExtensionData.Hardware.CpuInfo.NumCpuPackages
52 |     $info.HostCores = $vm.VMHost.ExtensionData.Hardware.CpuInfo.NumCpuCores
53 |     $info.NumaCores = $vm.VMHost.ExtensionData.Hardware.CpuInfo.NumCpuCores / $vm.VMHost.ExtensionData.Hardware.CpuInfo.NumCpuPackages
54 |     $data += $info
55 |     $info | fl # progress output as each VM is processed
56 | }
57 | 
58 | 
59 | Clear-Host
60 | $data | ft
61 | $data | Export-Csv all_vm_cpu_ghz_ready.csv -NoTypeInformation
62 | ```
63 | 
64 | ### Script Output
65 | 
66 | Here's an example of the output from this script (VM names removed). As you can see, the more cores a VM has, the higher its ReadySecPerDay will be.
67 | 
68 | ```
69 | 
70 | VM   vCPU  GhzUsed  ReadySecPerDay  GhzCapacity  NeededCores  HostSockets  HostCores  NumaCores
71 | --   ----  -------  --------------  -----------  -----------  -----------  ---------  ---------
72 |      28    6.95     53702           50.4         8            2            28         14
73 |      28    1.83     44517           50.4         2            2            28         14
74 |      22    2.51     42160           50.4         4            2            28         14
75 |      22    1.91     40147           50.4         2            2            28         14
76 | ....
77 |      4     0.1      14              8.4          2            2            16         8
78 |      6     0.06     7               12.6         2            2            20         10
79 |      4     0.13     3               8.4          2            2            36         18
80 |      4     0.38     3               8.4          2            2            20         10
81 | ```
82 | 
83 | ### Explanation of Fields
84 | 
85 | 
86 | ```
87 | VM                Name of the VM
88 | vCPU              Count of vCPU cores assigned to the VM
89 | GhzUsed           Average GHz consumed by the VM over the past year
90 | ReadySecPerDay    Average total seconds per day the VM's cores were in "CPU READY" state (waiting for CPU time from the host)
91 | GhzCapacity       Total GHz the VM could consume if every vCPU ran flat out (assuming ~2.1GHz per core)
92 | NeededCores       Double the number of cores required to accommodate average load, assuming a core provides about 2.1GHz
93 | HostSockets       Number of physical processors installed in the ESXi host
94 | HostCores         Total number of cores across all sockets
95 | NumaCores         Number of cores available to each NUMA node
96 | ```
97 | 
98 | It's normal to have some CPU READY time in your virtual environment. It is, after all, a shared environment. Many VMs will average less than 200 seconds of CPU READY per day. However, when you start to get larger and larger vCPU counts, you can see CPU READY piling up into the tens of thousands of seconds per day. That is all performance thrown out the window. Right-sizing all your VMs will prevent them from competing for resources.
99 | 
100 | ## Change the vCPU settings
101 | 
102 | In cases where you need more vCPU cores than exist on a given NUMA node, you can adjust the CPU cores/sockets settings in the VM's virtual hardware. This VMware blog has far more detail about it: [https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html](https://blogs.vmware.com/performance/2017/03/virtual-machine-vcpu-and-vnuma-rightsizing-rules-of-thumb.html)
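For example, newer PowerCLI releases expose the topology directly on `Set-VM`. A sketch, with a hypothetical VM name; the VM must be powered off first, and check that your PowerCLI version supports `-CoresPerSocket`:

```powershell
# Present 16 vCPUs as 2 virtual sockets x 8 cores, so each virtual socket
# can map onto a single 8-core NUMA node on the host
Get-VM -Name "MyBigVM" | Set-VM -NumCpu 16 -CoresPerSocket 8 -Confirm:$false
```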
103 | 
104 | ## Move the VMs to hosts with higher core counts
105 | 
106 | Let's say you have some hosts with 2x10 cores, but they host VMs with 16 vCPUs. You could move those VMs to larger hosts with 2x16 cores or more.
107 | 
108 | ## Use CPU reservations
109 | 
110 | Important note: this just tells VMware to rob Peter to pay Paul. You can use it to ensure that production VMs get resources when push comes to shove, and that non-prod VMs are the ones that get crippled.
111 | 
112 | ## Use Anti-Affinity Rules
113 | 
114 | If you only have a handful of oversized VMs in your environment, you can create anti-affinity rules to ensure that they don't reside on the same host together. This can also be a stopgap measure.
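Here's a minimal sketch with hypothetical cluster and VM names; in DRS terms, this is a "separate virtual machines" rule:

```powershell
# Keep the two oversized VMs on different ESXi hosts
New-DrsRule -Cluster (Get-Cluster -Name "Prod-Cluster") -Name "Separate oversized VMs" `
            -KeepTogether:$false -VM (Get-VM -Name "BigVM01", "BigVM02")
```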
115 | 
116 | # Conclusion
117 | 
118 | The easiest thing to do is simply right-size your VMs. Correctly allocated CPU and RAM will prevent this problem, as well as many others.
119 | 
120 | 
121 | 
122 | 
123 | 
124 | 
125 | 
126 | 
127 | 
128 | 
--------------------------------------------------------------------------------
/docs/_posts/2021-09-17-ucs-vlan-scripts.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: default
3 | title: "Some helpful UCS VLAN scripts"
4 | date: 2021-09-17
5 | description: Because adding VLANs by hand is hell (and dumb)
6 | categories: [UCS, VLAN, PowerShell]
7 | ---
8 | 
9 | # Export VLANs (from existing domain)
10 | 
11 | This first script will export all the VLANs and VLAN groups. This is handy if you need to set up a new UCS domain based on an existing one. Plus, it dumps everything to CSV so you can double-check and/or update before moving on.
12 | 
13 | ```powershell
14 | $ucsname = "UPDATEME"
15 | if ($ucs_creds -eq $null) { $ucs_creds = Get-Credential -Message "Format = ucs-DOMAIN\username" }
16 | 
17 | if ($DefaultUcs -ne $null) { Disconnect-Ucs -ErrorAction Stop }
18 | Connect-Ucs -Name $ucsname -Credential $ucs_creds -ErrorAction Stop
19 | 
20 | # export VLAN list; note the ${} so PowerShell doesn't expand $ucsname_VLANS as one variable name
21 | Get-UcsVlan | Select-Object Id,Name | Export-Csv -NoTypeInformation -Force -Path "${ucsname}_VLANS.csv"
22 | 
23 | # get groups
24 | $groups = Get-UcsFabricNetGroup
25 | $data = @()
26 | foreach ($group in $groups)
27 | {
28 |     $members = $group | Get-UcsFabricPooledVlan
29 |     foreach ($member in $members)
30 |     {
31 |         $info = "" | Select-Object Group,Vlan
32 |         $info.Group = $group.Name
33 |         $info.Vlan = $member.Name
34 |         $data += $info
35 |     }
36 | }
37 | 
38 | $data | Export-Csv -NoTypeInformation -Force -Path "${ucsname}_groups.csv"
39 | ```
40 | 
41 | # Add VLANs and Groups
42 | 
43 | This script presumes that you've already created the groups by hand or with another script. You could easily create them yourself with `Add-UcsFabricNetGroup`.
44 | 
45 | ```powershell
46 | $ucsname = "UPDATEME"
47 | if ($ucs_creds -eq $null) { $ucs_creds = Get-Credential -Message "Format = ucs-DOMAIN\username" }
48 | 
49 | if ($DefaultUcs -ne $null) { Disconnect-Ucs -ErrorAction Stop }
50 | Connect-Ucs -Name $ucsname -Credential $ucs_creds -ErrorAction Stop
51 | 
52 | $vlans = Import-Csv -Path "${ucsname}_VLANS.csv"
53 | $groups = Import-Csv -Path "${ucsname}_groups.csv"
54 | 
55 | $cloud = Get-UcsLanCloud
56 | 
57 | foreach ($vlan in $vlans)
58 | {
59 |     Write-Host "Adding VLAN" $vlan.Id $vlan.Name
60 |     $cloud | Add-UcsVlan -CompressionType included -DefaultNet no -Id $vlan.Id -McastPolicyName "" -Name $vlan.Name -PolicyOwner local -PubNwName "" -Sharing none -ErrorAction Continue
61 | }
62 | 
63 | foreach ($vlan in $groups)
64 | {
65 |     Write-Host "Adding VLAN to group" $vlan.Vlan $vlan.Group
66 |     $group = $cloud | Get-UcsFabricNetGroup -Name $vlan.Group
67 |     $group | Add-UcsFabricPooledVlan -ModifyPresent -Name $vlan.Vlan -ErrorAction Continue
68 | }
69 | ```
70 | 
71 | # BONUS: Add vNIC Template one-liner
72 | 
73 | I haven't figured out how to add VLAN groups to vNIC templates yet, but this will still help you provision a boatload of vNIC templates. (Heads up: unfiltered `Get-UcsOrg` returns every org, so scope it with `-Name "root"` or similar unless you really do want the template in each one.)
74 | 75 | ```powershell 76 | Get-UcsOrg | Add-UcsVnicTemplate -Name "NAME" -Descr "DESCRIPTION" -PolicyOwner local -Mtu 1500 -PinToGroupName "PIN_GROUP" -CdnSource vnic-name -TemplType updating-template -RedundancyPairType none -IdentPoolName "MAC_POOL" -SwitchId A 77 | ``` 78 | 79 | -------------------------------------------------------------------------------- /docs/_sass/jekyll-theme-tactile.scss: -------------------------------------------------------------------------------- 1 | @import "rouge-base16-dark"; 2 | @import url('https://fonts.googleapis.com/css?family=Chivo:900'); 3 | 4 | /* http://meyerweb.com/eric/tools/css/reset/ 5 | v2.0 | 20110126 6 | License: none (public domain) 7 | */ 8 | html, body, div, span, applet, object, iframe, 9 | h1, h2, h3, h4, h5, h6, p, blockquote, pre, 10 | a, abbr, acronym, address, big, cite, code, 11 | del, dfn, em, img, ins, kbd, q, s, samp, 12 | small, strike, strong, sub, sup, tt, var, 13 | b, u, i, center, 14 | dl, dt, dd, ol, ul, li, 15 | fieldset, form, label, legend, 16 | table, caption, tbody, tfoot, thead, tr, th, td, 17 | article, aside, canvas, details, embed, 18 | figure, figcaption, footer, header, hgroup, 19 | menu, nav, output, ruby, section, summary, 20 | time, mark, audio, video { 21 | padding: 0; 22 | margin: 0; 23 | font: inherit; 24 | font-size: 100%; 25 | vertical-align: baseline; 26 | border: 0; 27 | } 28 | /* HTML5 display-role reset for older browsers */ 29 | article, aside, details, figcaption, figure, 30 | footer, header, hgroup, menu, nav, section { 31 | display: block; 32 | } 33 | body { 34 | line-height: 1; 35 | } 36 | ol, ul { 37 | list-style: none; 38 | } 39 | blockquote, q { 40 | quotes: none; 41 | } 42 | blockquote:before, blockquote:after, 43 | q:before, q:after { 44 | content: ''; 45 | content: none; 46 | } 47 | table { 48 | border-spacing: 0; 49 | border-collapse: collapse; 50 | } 51 | 52 | /* LAYOUT STYLES */ 53 | body { 54 | font-family: Helvetica, Verdana, sans-serif; 55 | font-size: 1em; 56 | line-height: 1.5; 57 | color: #404040; 58 | background: #e7e7e7 url(../images/body-bg.png) 0 0 repeat; 59 | } 60 | 61 | a { 62 | color: #d5000d; 63 | } 64 | a:hover { 65 | color: #c5000c; 66 | } 67 | 68 | header { 69 | padding-top: 35px; 70 | padding-bottom: 25px; 71 | } 72 | 73 | header h1 { 74 | font-family: 'Chivo', 'Helvetica Neue', Helvetica, Arial, serif; 75 | font-size: 48px; font-weight: 900; 76 | line-height: 1.2; 77 | color: #303030; 78 | letter-spacing: -1px; 79 | } 80 | 81 | header h2 { 82 | font-size: 24px; 83 | font-weight: normal; 84 | line-height: 1.3; 85 | color: #aaa; 86 | letter-spacing: -1px; 87 | } 88 | 89 | #container { 90 | min-height: 595px; 91 | background: transparent url(../images/highlight-bg.jpg) 50% 0 no-repeat; 92 | } 93 | 94 | .inner { 95 | width: 620px; 96 | margin: 0 auto; 97 | } 98 | 99 | #container .inner img { 100 | max-width: 100%; 101 | } 102 | 103 | #downloads { 104 | margin-bottom: 5px; 105 | } 106 | 107 | a.button { 108 | display: block; 109 | float: left; 110 | width: 179px; 111 | padding: 12px 8px 12px 8px; 112 | margin-right: 14px; 113 | font-size: 15px; 114 | font-weight: bold; 115 | line-height: 25px; 116 | color: #303030; 117 | background: #fdfdfd; /* Old browsers */ 118 | background: -moz-linear-gradient(top, #fdfdfd 0%, #f2f2f2 100%); /* FF3.6+ */ 119 | background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#fdfdfd), color-stop(100%,#f2f2f2)); /* Chrome,Safari4+ */ 120 | background: -webkit-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* 
Chrome10+,Safari5.1+ */
121 |   background: -o-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* Opera 11.10+ */
122 |   background: -ms-linear-gradient(top, #fdfdfd 0%,#f2f2f2 100%); /* IE10+ */
123 |   background: linear-gradient(to bottom, #fdfdfd 0%,#f2f2f2 100%); /* W3C */
124 |   filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#fdfdfd', endColorstr='#f2f2f2',GradientType=0 ); /* IE6-9 */
125 |   border-top: solid 1px #cbcbcb;
126 |   border-right: solid 1px #b7b7b7;
127 |   border-bottom: solid 1px #b3b3b3;
128 |   border-left: solid 1px #b7b7b7;
129 |   border-radius: 30px;
130 |   -webkit-box-shadow: 10px 10px 5px #888;
131 |   -moz-box-shadow: 10px 10px 5px #888;
132 |   box-shadow: 0px 1px 5px #e8e8e8;
133 |   -moz-border-radius: 30px;
134 |   -webkit-border-radius: 30px;
135 | }
136 | a.button:hover {
137 |   background: #fafafa; /* Old browsers */
138 |   background: -moz-linear-gradient(top, #fdfdfd 0%, #f6f6f6 100%); /* FF3.6+ */
139 |   background: -webkit-gradient(linear, left top, left bottom, color-stop(0%,#fdfdfd), color-stop(100%,#f6f6f6)); /* Chrome,Safari4+ */
140 |   background: -webkit-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* Chrome10+,Safari5.1+ */
141 |   background: -o-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* Opera 11.10+ */
142 |   background: -ms-linear-gradient(top, #fdfdfd 0%,#f6f6f6 100%); /* IE10+ */
143 |   background: linear-gradient(to bottom, #fdfdfd 0%,#f6f6f6 100%); /* W3C */
144 |   filter: progid:DXImageTransform.Microsoft.gradient( startColorstr='#fdfdfd', endColorstr='#f6f6f6',GradientType=0 ); /* IE6-9 */
145 |   border-top: solid 1px #b7b7b7;
146 |   border-right: solid 1px #b3b3b3;
147 |   border-bottom: solid 1px #b3b3b3;
148 |   border-left: solid 1px #b3b3b3;
149 | }
150 | 
151 | a.button span {
152 |   display: block;
153 |   height: 23px;
154 |   padding-left: 50px;
155 | }
156 | 
157 | #download-zip span {
158 |   background: transparent url(../images/zip-icon.png) 12px 50% no-repeat;
159 | }
160 | #download-tar-gz span {
161 |   background: transparent url(../images/tar-gz-icon.png) 12px 50% no-repeat;
162 | }
163 | #view-on-github span {
164 |   background: transparent url(../images/octocat-icon.png) 12px 50% no-repeat;
165 | }
166 | #view-on-github {
167 |   margin-right: 0;
168 | }
169 | 
170 | code, pre {
171 |   margin-bottom: 30px;
172 |   font-family: Monaco, "Bitstream Vera Sans Mono", "Lucida Console", Terminal;
173 |   font-size: 14px;
174 |   color: #222;
175 | }
176 | 
177 | code {
178 |   padding: 0 3px;
179 |   background-color: #f2f2f2;
180 |   border: solid 1px #ddd;
181 | }
182 | 
183 | pre {
184 |   padding: 20px;
185 |   overflow: auto;
186 |   color: #f2f2f2;
187 |   text-shadow: none;
188 |   background: #303030;
189 | }
190 | pre code {
191 |   padding: 0;
192 |   color: #f2f2f2;
193 |   background-color: #303030;
194 |   border: none;
195 | }
196 | 
197 | ul, ol, dl {
198 |   margin-bottom: 20px;
199 | }
200 | 
201 | 
202 | /* COMMON STYLES */
203 | 
204 | hr {
205 |   height: 1px;
206 |   padding-bottom: 1em;
207 |   margin-top: 1em;
208 |   line-height: 1px;
209 |   background: transparent url('../images/hr.png') 50% 0 no-repeat;
210 |   border: none;
211 | }
212 | 
213 | strong {
214 |   font-weight: bold;
215 | }
216 | 
217 | em {
218 |   font-style: italic;
219 | }
220 | 
221 | table {
222 |   width: 100%;
223 |   border: 1px solid #ebebeb;
224 | }
225 | 
226 | th {
227 |   font-weight: 500;
228 | }
229 | 
230 | td {
231 |   font-weight: 300;
232 |   text-align: center;
233 |   border: 1px solid #ebebeb;
234 | }
235 | 
236 | form {
237 |   padding: 20px;
238 |   background: #f2f2f2;
239 | 
240 | }
241 | 
242 | 
243 | /* 
GENERAL ELEMENT TYPE STYLES */ 244 | 245 | h1 { 246 | font-size: 32px; 247 | color: #b5000d; 248 | } 249 | 250 | h2 { 251 | margin-bottom: 8px; 252 | font-size: 22px; 253 | font-weight: bold; 254 | color: #c5000d; 255 | } 256 | 257 | h3 { 258 | margin-bottom: 8px; 259 | font-size: 18px; 260 | font-weight: bold; 261 | color: #d5000d; 262 | } 263 | 264 | h4 { 265 | font-size: 16px; 266 | font-weight: bold; 267 | color: #303030; 268 | } 269 | 270 | h5 { 271 | font-size: 1em; 272 | color: #303030; 273 | } 274 | 275 | h6 { 276 | font-size: .8em; 277 | color: #303030; 278 | } 279 | 280 | p { 281 | margin-bottom: 20px; 282 | font-weight: 300; 283 | } 284 | 285 | a { 286 | text-decoration: none; 287 | } 288 | 289 | p a { 290 | font-weight: 400; 291 | } 292 | 293 | blockquote { 294 | padding: 0 0 0 30px; 295 | margin-bottom: 20px; 296 | font-size: 1.6em; 297 | border-left: 10px solid #e9e9e9; 298 | } 299 | 300 | ul li { 301 | list-style-position: inside; 302 | list-style: disc; 303 | padding-left: 20px; 304 | } 305 | 306 | ol li { 307 | list-style-position: inside; 308 | list-style: decimal; 309 | padding-left: 3px; 310 | } 311 | 312 | dl dt { 313 | color: #303030; 314 | } 315 | 316 | footer { 317 | padding-top: 20px; 318 | padding-bottom: 30px; 319 | margin-top: 40px; 320 | font-size: 13px; 321 | color: #aaa; 322 | background: transparent url('../images/hr.png') 0 0 no-repeat; 323 | } 324 | 325 | footer a { 326 | color: #666; 327 | } 328 | footer a:hover { 329 | color: #444; 330 | } 331 | 332 | /* MISC */ 333 | .clearfix:after { 334 | display: block; 335 | height: 0; 336 | clear: both; 337 | visibility: hidden; 338 | content: '.'; 339 | } 340 | 341 | .clearfix {display: inline-block;} 342 | * html .clearfix {height: 1%;} 343 | .clearfix {display: block;} 344 | 345 | /* #Media Queries 346 | ================================================== */ 347 | 348 | /* Smaller than standard 960 (devices and browsers) */ 349 | @media only screen and (max-width: 959px) { } 350 | 351 | /* Tablet Portrait size to standard 960 (devices and browsers) */ 352 | @media only screen and (min-width: 768px) and (max-width: 959px) { } 353 | 354 | /* All Mobile Sizes (devices and browser) */ 355 | @media only screen and (max-width: 767px) { 356 | header { 357 | padding-top: 10px; 358 | padding-bottom: 10px; 359 | } 360 | #downloads { 361 | margin-bottom: 25px; 362 | } 363 | #download-zip, #download-tar-gz { 364 | display: none; 365 | } 366 | .inner { 367 | width: 94%; 368 | margin: 0 auto; 369 | } 370 | ul li { 371 | margin-left: 10px; 372 | padding-left: 10px; 373 | } 374 | ol li { 375 | margin-left: 10px; 376 | } 377 | } 378 | 379 | /* Mobile Landscape Size to Tablet Portrait (devices and browsers) */ 380 | @media only screen and (min-width: 480px) and (max-width: 767px) { } 381 | 382 | /* Mobile Portrait Size to Mobile Landscape Size (devices and browsers) */ 383 | @media only screen and (max-width: 479px) { } 384 | -------------------------------------------------------------------------------- /docs/assets/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/daveshap/DavidShapiroBlog/aa209f3e46e391614edd01665c9c6f411c86a1de/docs/assets/favicon.ico -------------------------------------------------------------------------------- /docs/assets/numa.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/daveshap/DavidShapiroBlog/aa209f3e46e391614edd01665c9c6f411c86a1de/docs/assets/numa.png -------------------------------------------------------------------------------- /docs/categories.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: Categories 4 | description: We've got something for just about everyone 5 | --- 6 | 7 | # All Categories 8 | {% assign sorted_cats = site.categories | sort %} 9 | {% for category in sorted_cats %}{% capture category_name %}{{ category | first }}{% endcapture %}[{{ category_name }}]({{ site.baseurl }}/categories.html#{{ category_name }}) - {% endfor %} 10 | 11 | {% assign sorted_cats = site.categories | sort %} 12 | {% for category in sorted_cats %} 13 | {% capture category_name %}{{ category | first }}{% endcapture %} 14 | --- 15 | # {{ category_name }} 16 | {% for post in site.categories[category_name] %} 17 | ### [{{ post.title }}]({{ site.baseurl }}{{ post.url }}) 18 | #### {{ post.date | date_to_long_string }} 19 | ##### *{{ post.description }}* 20 |
21 | {% endfor %} 22 | {% endfor %} 23 | -------------------------------------------------------------------------------- /docs/index.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: default 3 | title: David Shapiro's Tech Blog 4 | --- 5 | 6 | # About Me 7 | 8 | I am a professional IT engineer by trade, focusing on private cloud infrastructure technologies such as VMware, Microsoft, and SAN storage. I have a strong focus on scripting and automation with Python, PowerShell, and Rundeck. 9 | 10 | Beyond that, my hobbies and interests include science fiction writing, tinkering with deep learning, quantum computing, and the Singularity in general. I also enjoy spending time in nature and biking. 11 | 12 | --- 13 | # Blog Posts 14 | 15 | {% for post in site.posts %} 16 | 17 | ### [{{ post.title }}]({{ site.baseurl }}{{ post.url }}) 18 | #### {{ post.date | date_to_long_string }} 19 | ##### *{{ post.description }}* 20 |
21 | 22 | {% endfor %} 23 | --------------------------------------------------------------------------------