Surgo's Homepage

├── CNAME ├── images ├── carla.png ├── bridgeport_cvs.jpg ├── drillpress_pulleys.jpeg ├── jack_latency_comparison.png └── native_latency_comparison.png ├── index.html ├── surge └── scope_proposal.html ├── roller.html ├── speedcontrols.html ├── ai └── format_proposal.html └── jack.html /CNAME: -------------------------------------------------------------------------------- 1 | www.surgo.net -------------------------------------------------------------------------------- /images/carla.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mx/mx.github.io/main/images/carla.png -------------------------------------------------------------------------------- /images/bridgeport_cvs.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mx/mx.github.io/main/images/bridgeport_cvs.jpg -------------------------------------------------------------------------------- /images/drillpress_pulleys.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mx/mx.github.io/main/images/drillpress_pulleys.jpeg -------------------------------------------------------------------------------- /images/jack_latency_comparison.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mx/mx.github.io/main/images/jack_latency_comparison.png -------------------------------------------------------------------------------- /images/native_latency_comparison.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mx/mx.github.io/main/images/native_latency_comparison.png -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Surgo's Homepage 4 | 5 | 6 | 7 |

Just a basic homepage for a few of my random musings or useful things. I don't like fancy web tools and stuff so all of this is written by hand in plain html/css/javascript.

8 | 9 |

Dice roller for my D&D game.
JACK and Carla are Windows Audio's best friends. 12 |
About mechanical vs. electronic speed controls.
Design doc for an inbuilt oscilloscope for Surge synthesizer.
Design doc for an extensible model format for image generation (diffusion) models and their frontends.

16 | 17 | 18 | -------------------------------------------------------------------------------- /surge/scope_proposal.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Oscilloscope Proposal for Surge 4 | 5 | 6 | 7 |

Overall Goal

8 |

Issue #1970 requests an oscilloscope. Personally I need one as well for my own use case, so I went about adding one. In the process, however, I found that there might be some generic use case for concepts like "tapping the audio chain at a certain point", or "subscribe to the output". In other words, some future feature might want to do the same thing. So I wanted to propose some abstractions and design that could satisfy these use cases.

9 | 10 |

Initial Steps

11 |

Before anything else and before we overengineer anything, let's just get a basic scope done. So that's the following:

12 |

Overlay component that can be opened from the main menu to turn on the scope.
Basic ring buffer plopped into SurgeSynthesizer.h.
SurgeSynthesizer::process, will push to the ring buffer if the scope is subscribed to it.
Overlay component performs FFT on ring buffer blocks (probably on a separate thread from the UI update), displays results on UI update. Or just displays the waveform (based on user's desire).

18 |

Simple enough concept. It's probably worth abstracting that second step into a generic SPSC RingBuffer class with a proper API (better than just leaving a bunch of atomics and arrays lying around). That's all it will take to get the first version of the scope done.

19 | 20 |

Future Work: Generalized SPMC PubSub

21 |

I don't think I'm the first, and I probably won't be the last, person who wants to pull data from the audio pipeline. So after the above is all done, it's worth considering augmenting the ring buffer infrastructure to support multiple consumers. We could handle this in a few different ways, and possibly all of them:

22 |

Just keep separate markers for each reader, and give a reader a token when registering to remember who is who.
Have a process() method that, when calls, executes a callback on all the readers. Obviously the process() would itself need to be called outside the realtime audio thread.
Have #2, but let the reader say if they want to be notified or want to do the pulling themselves.

27 | 28 |

Anyway, then we can move the data at the end of the audio pipeline into this structure and anyone who cares can subscribe to it. We could probably move the channel loudness markers to read from this too if we wanted to clean that up.

29 | 30 |

Future work: Attach Anywhere

31 |

This one's a bit more complicated. There's no reason we should limit ourselves to just pulling data from the end of the audio pipeline.

32 | 33 |

SurgeSynthesizer::process is a pretty rigid function. It's simple enough but it really just routes blocks from one thing to the next to the next. There's no concept of hooking into things. To make this idea work with minimal changes, we'd basically be adding a ring buffer in between every pipeline stage, and filling it as we move. (Obviously, the filling would be a no-op if there were no subscribers.) That...works, but it's not very elegant and not all that extensible either.

34 | 35 |

Going beyond this would require a serious refactoring of SurgeSynthesizer::process() to make the whole thing a little more modular. You could imagine an abstract class representing a pipeline unit, and said class would hold the data it's producing. Or something like that. Anyway, this is way beyond the scope of a basic oscilloscope. I just bring this up to mention possible future ideas for hooking in anywhere you want. This is the sort of thing you'd want to do to help implement issue #4355. It would also pave the way for turning Surge into a more modular synthesizer (which is also, basically, the path that the aforementioned issue is going down).

36 | 37 |

I can try to write up a design for that sort of thing, but there's a couple of dangers there. First of all I'm a new contributor so I already feel like I'm stepping on toes with this doc alone. Second of all I think there's several pitfalls around real-time processing that I'm simply not aware of. There's a lot of subtleties around real-time work which makes this difficult to get right.

38 | 39 | 40 | -------------------------------------------------------------------------------- /roller.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Dice Roller 4 | 5 | 22 | 23 | 181 | 182 | 183 | 184 |

185 | Input: 186 |
187 | 191 |
192 | 193 | 194 | 196 | Collate 197 |
198 | Output: 199 |
200 | 202 |

203 |

204 |

Instructions

205 |

Put a roll string in the input. Such a string looks like, minus the [] brackets:

206 |

name #x#d#+#d#+#

207 |

The name and first number, including the "x", is optional. It states how many times to run the roll:

208 |

5x3d6 => Roll 3d6, 5 separate times.

209 |

You can include as many different dice expressions (#d#) as you want, summed together:

210 |

damage 3d6+1d8 => Roll 3d6 and 1d8, add it together. Labeled as "damage" in the output.

211 |

You can optionally include a final adder or subtractor:

212 |

1d8+1d4-3 => Roll 1d8 and 1d4, add them together, and subtract 3.

213 |

Each dice expression (#d#) can include an optional specifier afterwards. You can DROP the lowest value of dice by specifying r, the highest value of dice by specifying R or EXPLODE on a certain value or higher by specifying e or E:

214 |

6x4d6r1 => Roll 4d6 and drop the lowest die, six times (basic stats).

215 |

2d20R1 => Roll 2d20, drop the highest die (basic disadvantage).

216 |

6d6e6 => Roll 6d6, exploding on sixes.

217 |

6d6e5 => Roll 6d6, exploding on fives, and the exploded dice can also explode (up to 5 times, hard limit).

218 |

Check the "collate" box to have the die results sorted and counted by the number of times the face appeared:

219 |

6d6 (collated) => [1x1,2x3,3x6] -> 25

220 |

221 | 222 | 223 | -------------------------------------------------------------------------------- /speedcontrols.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Mechanical and Electronic Speed Control: Advanced Tech Isn't Always Better 4 | 5 | 6 | 7 | 8 |

What Happened Here?

9 |

I was watching machining video recently from Adam Booth, in which he attempts a simple job: put a hole into some flat steel bar. It's basic job-shop work but he was using it as a learning experience for computer-controlled (CNC) machining, which he is not used to. It didn't go very well, and to his credit he posted the video with his mistakes.

10 | 11 |

In this video, he attempts to use a massive drill bit (over 1" diameter) in a CNC milling machine to go from completely unbroken steel to a close-to-sized hole. (For those unaware, you can think of a milling machine as a very large, beefed-up drill press that can also travel in lateral directions, not just up and down.) He is unsure if this will work at all, and indeed, it does not. The interesting part here is that this likely would have worked if he had done the operation on one of his manual milling machines. The reason it doesn't work on his fancy computer-controlled machine is extremely interesting and represents an example of technological advancement not always leading to better outcomes.

12 | 13 |

What Went Wrong?

14 |

When a milling machine, or a drill press, fails to drive a drill bit through a piece of steel, a common problem will be the motor stalling, which is what happened in the video. This happens because the motor does not produce enough power to make the cut. This might come as a bit of a surprise because Adam is using, as far as I can tell from the video, a Milltronics TRM3016 or larger model which comes with a motor that is at least 10 horsepower. That's a lot of power! If he were to put that drill bit into one of his manual machines, it's likely that said machine would only have a 3 horsepower motor. Despite only having 3/10ths the power of the CNC machine, they would likely be able to make the cut just fine.

15 | 16 |

But wait. Only 3/10ths the power, but would be powerful enough to make the cut when the 10 horsepower motor failed it? What sort of magic makes that work? We have to talk about speed control.

17 | 18 |

Mechanical Speed Control

19 |

If you've ever operated a drill press before and opened it up at the head, you'd see an array of pulleys with a belt connecting them. Even very cheap drill presses have this. Here's an example from the Washington Open Proftech book "Introduction to Machining", which shows all the parts:

20 | 21 |

22 | 23 |

This clever little arrangement of belts and pulleys allows the operator to transform the speed of the motor, which is roughly constant, to a more convenient speed for the drill bit to do the job at hand. For drilling into steel this involves reducing the speed enormously, from roughly 3600 RPM from the motor into something like 250 RPM at the drill bit. Making the change is somewhat inconvenient since you have to pop open the machine head, slacken and move multiple belts, and re-tension everything once you're done. As a result it's best to drill a bunch of similar holes at once so you can minimize the number of times you have to do that.

24 | 25 |

So, now we've transformed the motor from 3600 RPM to 250 RPM, or roughly stepped it down about 14:1. The fascinating thing about doing it with belts and pulleys (or equivalently with gears) is that the motor is still running at 3600 RPM. That means it's still going to produce its fully rated power, even though that power is being brought to bear by the drill bit running much slower! From your physics class, you might remember that the power of a rotating system is equal to its rate of rotation multiplied by its torque. Our power is constant, and the rate of rotation has dropped by about 14x, so that means the torque at the bit has gone up by 14x! Remember that, it's going to be important later.

26 | 27 |

Mechanical Variable Speed Control

28 |

Not to put too fine a point on it, but belts and gears suck to use. They're inconvenient to change all the time, and they lock you into a few discrete speeds. Wouldn't it be great if there was a way to get the speeds you want without having to change all these belts and pulleys? It turns out, there is.

29 | 30 |

31 | 32 |

Some machines come with what's known as a variable-speed head. Whoever first invented this was an absolute genius truly deserving of the word. This is an ungodly contraption of belts and pulleys shaped like cones, designed so that you tilt a plate and it makes a belt ride up and down to set an approximate speed. It's complicated as hell, it's heavy, and it works beautifully. You can see all the individual parts and how they work together in this video by Watch Wes Work where he takes apart and repairs such a head.

33 | 34 |

These heads are not so common anymore. They aren't used in computer controlled machines at all because the speed readout is somewhat approximate and not constant. Still, they work great for manual machines and just like with the belts-and-pulleys assembly (because it basically is a belts-and-pulleys assembly) the motor is able to bring its full power to bear at any given speed.

35 | 36 |

What Went Wrong: Electronic Variable Speed Control

37 |

Computer controlled machines use what's known as electronic variable speed control. As the name "electronic" implies, there's no belts or knobs involved here. Instead a device known as a variable-frequency drive which is capable of changing the speed of an AC motor by varying the frequency of the electricity applied to it. This is extraordinarily convenient for computer control! It can be done entirely with software talking to solid-state hardware. The speeds themselves can be controlled extremely finely, down to the RPM. It's an ideal system for when you need to tell a drill bit exactly how fast it needs to be turning. Well...almost.

38 | 39 |

Remember how in those mechanical systems we discussed above, the motor was always running at its full rate, and somewhere between the motor and the drill bit we stepped the speed down to the desired number? That is not how it works with electronic control. Instead it changes the speed of the motor directly. That has massive consequences and, if you happen to ever go shopping for CNC machines, will explain why you see motors sized the way they are.

40 | 41 |

When you reduce the speed of a motor with a variable-frequency drive, the torque remains constant. Previously, when we took our drill bit from 3600 RPM to 250 RPM, the speed went down roughly 14x but the bit's torque went up 14x to compensate. With an electronic control reducing our speed, on the other hand, the speed still goes down roughly 14x but the torque remains the same! That means that the overall power available at the drill bit doing the cutting is 1/14th the original power of the motor! That 10 horsepower motor that the machine came equipped with can now only put out a piddling 0.71 horsepower! We're never getting through steel with a 1" diameter drill bit with that! That cut that Adam was trying to make with his Milltronic machine was a failure from the start.

42 | 43 |

This is why CNC machines come standard with such enormously overpowered motors compared to manual machines. In order to make reasonable power at slower speeds, the motors must be ridiculously oversized to still put out enough power at lower speeds. Coincidentally, this is why you should also avoid buying products like variable-speed drill presses. They just won't have the power to do anything reasonable at slow speeds.

44 | 45 |

So What Does This Mean?

46 |

Advanced tech isn't always better ("Pareto Optimal" if you're a nerd like me). Sometimes the older ways of doing something have a particular advantage. That advantage might be forgotten about in the relentless march of time and technology, but sometimes we run into problems with the new way that wouldn't exist in the old way because of how the old way had solved it. Electronic speed control is a fantastic technology that powers most of modern machining, but it can lead us to make errors in judgment because of new trade-offs that we didn't have to think about before.

47 | 48 |

Aside: What About Sewing Machines?

49 |

When I was drafting this post, I was discussing it with my wife and she wondered about how this works on her FlexShaft™. A FlexShaft™ is a rotating tool you hold in your hand, kind of like a Dremel™, attached by a cord to a motor with its speed controlled by a foot pedal. Far older than any of the other machines discussed, are sewing machines. A sewing machine's speed is similarly controlled by a foot pedal. While the machines we've been talking about above use AC motors, both FlexShafts™ and sewing machines tend to use DC motors. Foot pedal control uses a potentiometer (or something electronically similar) to change the applied voltage to the motor, which reduces its speed. Since the speed is reduced at the motor, it doesn't gain any torque from doing this, so they have the same issues as variable-frequency drives with AC motors.

50 | 51 | 52 | 53 | -------------------------------------------------------------------------------- /ai/format_proposal.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | Single-File Format Proposal 8 | 9 | 10 | 11 |

Overall Goal

12 |

We currently have a proliferation of standards for distributing models, grown organically. This worked quite well in the days of small models with small text encoders (SD 1.5, SDXL). As both diffusion models and their accompanying text encoders have grown larger, however, we've started to hit problems. We have users that want to train individual pieces of the model, such as only the DiT of the SD3 model. This then leaves an open question of how to distribute the resulting trained piece. A single file that any UI can load is the most convenient form of such distribution. However, as of this moment, single files must include any corresponding text encoders (reaching 10+ gigabytes) for proper loading, even if they are untouched as part of the fine-tuning process. This is not a good long-term prospect.

13 | 14 |

The format that competes with single-file distribution is the Diffusers format. This represents a directory structure with the individual pieces of the model as different files. Splitting out the model like this is advantageous, but Diffusers format is not a panacea. You still have to distribute all the pieces of the model, even if you haven't modified them from their original forms. Furthermore, it is not a single file! It could be zipped up, but this is explicitly not part of the standard.

15 | 16 |

Thus, this document proposes a new standard for single-file model distribution going forward. None of the pieces here hold any technical challenge. The only difficulty is organizational: getting all inference frontends and trainers to agree on this format.

17 | 18 |

To aid with future compatibility, this standard also specifies a key naming format that implementing models will use. This key naming format is very simple: for full models, it is the naming that came with the original model. For derived models like LoRAs, the names have _omi appended. Further details are in the individual derived format sections. For the purposes of compatibility, OMI provides a reference implementation in Python for converting models between known key names in the wild. This implementation can be found under the namespace "omi.key_conversion".

19 | 20 |

The Format: Full Models

21 |

A model will be distributed as a single .safetensors file. It will have, contained in the metadata section of the file, the following JSON key:

22 |


 23 | "omi_data": {
 24 |   "schema_version": 1,
 25 |   // A pipeline lays out the type of model this file contains, and how to find
 26 |   // each of the pieces. The pipeline is optional. If the file doesn't include
 27 |   // a pipeline, the value of this key is null.
 28 |   "pipeline": {
 29 |     // The type comes from a pre-defined set of supported pipelines.
 30 |     // See the list below
 31 |     "type": "SDXL",
 32 |     "models": {
 33 |       // Each value in this dictionary can be either an object (dict), or a string.
 34 |       // An object means the model needs to be loaded from another file. A
 35 |       // string means that the model is contained in this file, with details in
 36 |       // the corresponding "models" object below.
 37 |       "clip_l_tokenizer": {
 38 |         "model_type": "CLIP_L/TOKENIZER",
 39 |         // The file hash is the simplest possible way to locate a model piece.
 40 |         // It is the raw hash of the entire file the piece is located in.
 41 |         "file_hash": "sha256:0xdeadbeef",
 42 |         // Optionally, other hashes can be specified. They are detailed in the
 43 |         // hashes section below.
 44 |         "hashes": {
 45 |           "content_hash": "sha256:0x0000",
 46 |           "similarity_hash": "sha256:0x0000"
 47 |         }
 48 |       },
 49 |       "clip_l": "clip_l",
 50 |       "unet": "sdxl_unet",
 51 |       "clip_g_tokenizer": "...",
 52 |       "clip_g": "...",
 53 |       "vae": "..."
 54 |     },
 55 |     "info": {
 56 |       // Optional, free-form object that can contain arbitrary data.
 57 |     }
 58 |   },
 59 |   // A dictionary of models contained in this file. For example, the text
 60 |   // encoder, tokenizer, or unet. The dictionary keys do not have any meaning;
 61 |   // they are used only as references to the models in the "pipeline" dict above
 62 |   // and as a prefix for tensor keys in this file.
 63 |   "models": {
 64 |     "clip_l": {
 65 |       // The type is a name from a pre-defined set of supported models. This
 66 |       // information is intended to be informative for the user.
 67 |       "type": "CLIP_L/TEXT_ENCODER",
 68 |       // Specifies the key layout of the model. Each model architecture has a
 69 |       // single "default" key layout. Other key layouts are only added to support
 70 |       // specific needs like quantization schemes that distribute a single
 71 |       // parameter across multiple tensors.
 72 |       // See below for more details.
 73 |       "key_layout": "default",
 74 |       "data": {
 75 |         // A structured object containing additional information about the model.
 76 |         // Each model_type has a specific set of keys that can be defined here.
 77 |         // Every key has a default value. If a key is not defined here, the
 78 |         // default value is assumed. See model-specific data
 79 |         // for additional information.
 80 |       },
 81 |       // Several hashes can be defined on the model, for ease of use in
 82 |       // frontends for locating and displaying choices to the user.
 83 |       // See hash types for the full list and what each one
 84 |       // means.
 85 |       "hashes": {
 86 |         "content_hash": "sha256:0xdeadbeef",
 87 |         "similarity_hash": "sha256:0xdeadbeef"
 88 |       },
 89 |       // An optional structure that can contain arbitrary data. Intentionally
 90 |       // not defined.
 91 |       "info": {}
 92 |     },
 93 |     "sdxl_unet": {
 94 |       "type": "SDXL/UNET",
 95 |       "hashes": {
 96 |         "content_hash": "sha256:0x0000",
 97 |         "similarity_hash": "sha256:0x0000",
 98 |       },
 99 |       "data": {
100 |         // See model-specific data.
101 |         "prediction_type": "eps",
102 |         "clip_l_layer": 2,
103 |         "clip_g_layer": 2,
104 |       }
105 |     }
106 |   }
107 | }
108 |

109 | 110 |

Key Definitions

111 | 112 |

Pipeline Names

113 |

A name that uniquely identifies the base type of the model. Each name specifies which components are listed under the models section. Current name/component mappings:

114 |

SD1.5: clip_l, clip_l_tokenizer, unet, vae
SD2: clip_l_clip, clip_l_tokenizer, unet, vae
SDXL: clip_l, clip_l_tokenizer, clip_g, clip_g_tokenizer, unet, vae
SD3: clip_l, clip_l_tokenizer, clip_g, clip_g_tokenizer, t5_xxl, t5_xxl_tokenizer, transformer, vae
FLUX: clip, clip_l_tokenizer, t5_xxl, t5_xxl_tokenizer, transformer, vae
PIXART ALPHA: t5_xxl, t5_xxl_tokenizer, transformer, vae
PIXART SIGMA: t5_xxl, t5_xxl_tokenizer, transformer, vae
HUNYUAN DIT: t5_xxl, t5_xxl_tokenizer, transformer, vae
WUERSTCHEN 2: prior_clip_l, prior_clip_l_tokenizer, prior_prior, effnet_encoder, clip_l, clip_l_tokenizer, decoder, vqgan
STABLE CASCADE: prior_clip_l, prior_clip_l_tokenizer, prior_prior, effnet_encoder, clip_l, clip_l_tokenizer, decoder, vqgan

126 | 127 |

Hashes

128 |

Models in the pipeline section must include a file hash, which is a sha256 hash of the file in which the component can be found. This is the minimum necessary to allow a frontend to easily find a model piece that is not included directly in the file.

129 |

Optionally, additional hashes can be included to aid the frontend in the search for a possibly similar model. The names, and their algorithms, are described here. Reference code for calculating these is included in the OMI Python repository, under the name "omi.hashes".

130 | 131 |

content_hash

132 |

The content hash is defined by hashing the concatenation of the first 4 kilobytes (4096 byte) of every tensor in the model, or less if the tensor is smaller. The ordering of the concatenation is done key name in alphabetical order.

133 | 134 |

similarity_hash

135 |

The similarity_hash is currently undefined.

136 | 137 |

Model-Specific Data

138 |

Some models have data that define how they operate. The most obvious example is the clip text encoder skip layer, which is set to 2 by default in SDXL, but there are others. The following are defined by the standard.

139 | 140 |

prediction_type

141 |

This can be one of "eps", "v", or "x0". For most model types, it defaults to "eps", but for SD2 it defaults to "v". Any DDPM/DDIM model can have a prediction_type entry. At the time of this document that includes SD1.5, SD2, SDXL, PIXART_ALPHA, PIXART_SIGMA, and HUNYUAN_DIT.

142 | 143 |

*_layer

144 |

Where * is the name of a text encoder. The layer at which the output should be taken from text encoder. A number of "0" means the first layer, "1" second layer, and so on. If undefined, the model default will be used. This data element can be included in any model that takes a text encoder output as input.

145 | 146 |

Key Layouts

147 |

The following strings are specified for the key_format field:

148 |

default: the default key layout for each model
151 | bnb_nf4: same as default, but the weight matrix of each linear layer is spread across 6 tensors. 152 | A tensor with the original weight name and 5 additional tensors with the postfix ".absmax", ".nested_absmax", ".quant_map", ".nested_quant_map" and ".quant_state.bitsandbytes__nf4" 153 |
bnb_int8: TBD

156 | 157 |

This list will be added to if formats become used in the community. This list should not be seen as an endorsement of the quality of any of the quantization schemes in it.

158 | 159 |

TODO

160 |

Not done yet: PEFT formats, bundled embeddings. 161 | 162 | 163 | -------------------------------------------------------------------------------- /jack.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Jack and Carla Make Windows Audio Bearable 4 | 5 | 6 | 7 |

I've been working on my musical education lately. After hurting my hand I haven't really been able to practice guitar like I want, so I started learning synthesizers. My hand has slowly been recovering (yay!) and I want to be able to play both guitar and synthesizers. I've been doing that, sometimes to my frustration, on Windows.

8 | 9 |

Linux audio is in kind of a weird state. In terms of system and software it's a little annoying. USB audio works fine, but it has to be USB. Super low latency Thunderbolt interfaces (like the PreSonus Quantum 2626, my ideal) just don't work at all, you need either Windows or Mac. (PreSonus, if you're reading this, get with the program already and release Linux drivers!) Software plugins are the same way, almost nobody commercially is releasing a Linux build even though the far-and-away most popular framework supports it natively. There's solutions like yabridge but they add CPU performance penalties. What Linux does have going for it though, is Pipewire.

10 | 11 |

Pipewire is this snazzy audio subsystem that lets you take sound coming out of any program or microphone and direct it into any other program or speaker, and it does it all without any additional latency. There's somewhat more to it than that, but that's the elevator pitch. And it is so convenient! I can send all my inputs through a compressor or a limiter, and all my outputs through a system-wide equalizer. Nothing you couldn't do with a DAW, but it works on everything on the system. It isn't limited to plugins you'd load in your favorite DAW.

12 | 13 |

Windows on the other hand, is a nightmare. It has multiple questionably-compatible sound subsystems, and trying to do something as simple as getting a program to send sound out of a speaker instead of your headphones is so painful that everyone has to use a third-party program just to accomplish this! That's not even getting to the latency, which is horrendous. I don't want to be playing my guitar or software synthesizer through this.

14 | 15 |

Enter ASIO

16 | 17 |

ASIO is Yet Another sound subsystem for Windows, developed by Steinberg (owned by Yamaha). Through a variety of techno mumbo-jumbo and being entirely in userspace, it bypasses all the latency issues of the multiple Windows subsystems. It's something of an industry standard: basically any commercial audio interface will come with a specialized ASIO driver. This lets you output sound into it and it Just Works, with the best possible latency. For example, here's the latency numbers for my system. I have REAPER, my DAW, sending a signal out my audio interface, a Behringer UMC1820. The signal then comes back in and is recorded into REAPER. The difference is the latency. (By the way, if anyone knows how to do this faster than manually matching sample points in REAPER, please let me know, because this is a pain in the ass.) The left side is with the Windows native APIs, and the right side is with ASIO.

18 | 19 |

20 | 21 |

My interface sample rate is 48khz, so that translates into 5.66 milliseconds under ASIO, and 37.7 msec under Windows native API. That's a pretty massive difference! It makes a huge difference when it comes to usability as well. For me 5.66 milliseconds or so is about my limit before I start noticing latency when playing an instrument, meanwhile the Windows native latency is practically unplayable. Note that I use a buffer size of 64 samples for this test; I could probably get a lot better on ASIO by dropping to 32 (maybe 4msec or so). Now that I've done this, I'll probably do exactly that.

22 | 23 |

Room Issues

24 | 25 |

This is all well and good, but where does the subject of this post come in? Well, that's due to further dissatisfaction. My "studio" is a tiny little room in this house that serves double-duty as our library. It just so happens to have a small corner cutout, making the room L-shaped, where we had enough free space that I could stick my desk and speakers. Music studio? Music room? Hah, I would be so lucky! What I have here is a music corner. And putting your setup in a room corner pretty much ignores every single piece of advice in the world about how to set up a music room.

26 | 27 |

When practicing guitar or synthesizer, or even just listening to music, I noticed that some sounds would create serious echoes. It would sound like blowing into a glass tube. Not really surprising that this corner setup would have some serious sonic deficiencies! Some testing with a spectrum analyzer revealed that it would be centered around 130 Hz. Testing with Room EQ Wizard confirmed this, and some other problem areas. What to do?

28 | 29 |

This is where a dedicated person would start investing in some acoustic treatment to make the room sound good. I might be dedicated, but I'm also very cheap. I figure that the only place that matters that the room sounds okay is the place that I'm actually sitting in, so I just have to make that particular spot sound acceptable. Can we set up an equalizer profile to make the sound okay here?

30 | 31 |

As it turns out, yes. There's a lot of ways to do this. I chose to use IK Multimedia ARC X, because I happened to get it for free when I purchased another piece of their software (Amplitube 5). It's a pretty nifty piece of software: you fire it up, use a measurement microphone (I chose a Behringer ECM8000, because again, I'm cheap) to take several measurements around your listening area, and it generates an equalizer profile that should correct that position into something reasonably flat-sounding. That's the theory, anyway.

32 | 33 |

ARC X, or rather, Windows, has a flaw. There's no way to just have the entire system's worth of sound flow into it to be corrected before going out my speakers. ARC X itself is meant to be used as a plugin in a DAW, taking all audio on the master bus and applying the final correction before going straight to speakers. Remember Pipewire on Linux? Sure would be nice if we had that on Windows. Luckily, we can get something like it.

34 | 35 |

Enter JACK

36 | 37 |

JACK can be thought of as the predecessor to Pipewire and it operates on a similar concept. Take audio out of program X, stick it into program Y. There's a lot under the hood there but that's the basics. Turns out it works just as well on Windows as it does on Linux! Well, sort of.

38 | 39 |

Windows doesn't really know what to do with JACK. It doesn't speak any of the native Windows APIs. It can't really get sound into it, though luckily you can get sound out of it into your speakers, so that's handy. JACK has a trick up its sleeve though. It exposes an extra ASIO audio interface, acting sort of like a soundcard. Any program that can speak ASIO can dump its sound to that virtual soundcard, thus getting the sound into JACK. Unfortunately, not that many programs speak ASIO.

40 | 41 |

Luckily, somebody else went ahead and solved this problem. The fine folks over at VB Audio have released a donationware bit of software called HiFi Cable. This exposes a sound interface that virtually all Windows programs can speak, and sends the other end into any ASIO device. In our case, we'll hook it up to our JACK audio device. Now any Windows sound can get into JACK.

42 | 43 |

Once it's into JACK, what do we do with it? Well, the whole point of this exercise was to get it all transformed by ARC X. ARC X doesn't have a standalone mode for this. It's what's known as an audio plugin, in VST format. That's something meant to be loaded by a DAW, but we don't want to use that. We just want it to sit in the background transforming everything! What can we do?

44 | 45 |

Enter Carla

46 | 47 |

Carla is a kind of patchbay-like system for JACK. It can handle all the routing between programs, which is really convenient for us because JACK's UI is kinda bad at doing that. What's even more convenient though, is that it can load VST plugins like a DAW can and expose them to JACK. It was written for Linux, but luckily it works on Windows too. So now we can take all that sound that HiFi Cable is dumping into JACK, and send it through ARC X that gets loaded into Carla. Then ARC X can send it right out our speakers.

48 | 49 |

50 | 51 |

Success! It no longer sounds like I'm playing music in a train tunnel whenever I hit a note or harmonic at 130Hz! The rest of the room probably sounds terrible but, who cares? I only sit in this spot anyway.

52 | 53 |

How Does This All Work Anyway?

54 | 55 |

Honestly...it's kind of flaky. A lot of that has to do with Windows, which has trouble initializing things in order. JACK has to launch Carla and HiFi Cable, in that order, which I do by means of a VBS script that JACK launches at startup. Resuming from sleep mode is a real problem; oftentimes HiFi Cable will just stop responding and need to be forcefully killed and restarted. From a fresh boot though, works great!

56 | 57 |

Latency's not bad, either. JACK advertises itself as theoretically zero latency. That's great and all but theoretically I'm the world's most handsome boy (according to my mother) and we all know how that turned out. Here's the same comparison I made before, between REAPER going in/out the raw Behringer ASIO interface and going through JACK before going out. ARC X and HiFi cable aren't included in this, because they aren't used in any latency-sensitive applications and I'll run ARC X as a plugin in REAPER when I'm doing music work.

58 | 59 |

60 | 61 |

So that's 5.66 msec for our audio interface by itself, and 8.33 when going through the extra layer of JACK. Not exactly zero latency but honestly, not that bad. Remember you're going to be bypassing JACK entirely and working directly (ASIO in -> DAW -> ASIO out) when doing actual music work.

62 | 63 |

Thanks JACK! And Carla! And VB Audio! You've made Windows usable for general-purpose audio for me!

64 | 65 |

Wishlist

66 | 67 |

There's a few things that aren't great that I wish would improve.

68 | 69 |

First of all, Linux support. PreSonus, get some drivers for your Quantum 2626 and other thunderbolt devices! If you do this, I will personally buy a 2626 and tell every single person I know to do so as well. Plugin developers, you too. Where's the Linux support? I buy plugins extremely rarely because there's no Linux support. Melda, if you start releasing Linux versions of your plugins I will personally buy your complete bundle. IK Multimedia, you guys are a big commercial outfit, let's get Linux support into Amplitube. C'mon already.

70 | 71 |

On the Windows side, I figure Windows will never get any better because lol. There are a few things that can be done to improve this stack though.

72 | 73 |

The aforementioned VB Cable flakiness on sleep/awake. Not really sure what can be done about this.
Carla, it would be really nice if it could be minimized to the system tray.
Carla again, it would be nice if they could cut a new release. CLAP support has been in the repo for a while, but it's been an extremely long time since an official release was made.
JACK, I wonder if the 128 sample latency (two buffers, basically) that's added on top of the audio interface can be improved or not. Would be nice if so though.
Not sure if it's possible to create default routing in Carla and JACK, but it would be great if I could automatically route everything to ARC X inputs as it appears rather than it automatically go to the system playback inputs.
I don't use headphones very often, but it would be great if I could enable/disable ARC X or change its profile when they're plugged in. This is complicated by the UMC1820 not providing a signal to the OS when headphones are plugged in (I think, ayway). 80 |

81 | 82 | 83 | --------------------------------------------------------------------------------