
@amodm
Created February 14, 2022 17:14

On limitations of human interface devices (HIDs)

This post is in response to the WeekendDevPuzzle of 2022-02-12, which has submissions from people who chose to share their perspectives on the puzzle. I'll be linking this post to this Twitter thread, to avoid polluting the original one.

Motivation for today's topic

Every piece of technology is engineered for a certain set of characteristics. For example, the vehicle you use was designed with a certain typical and maximum capacity in mind. The same is true for the devices we use to interact with computing systems, be it your workstation, your laptop, or your smartphone. But how often do we reflect on those design characteristics?

Today's puzzle is about throwing some light on these devices, with the hope that it leaves us more informed.

Dissecting the puzzle

Flow of information

So, we have an arcade style game being played on a computer. Clearly, the flow of information would be something like this:

  1. The CPU of the computer updates its game model (which alien ships or bullets are at which positions, etc.) and tells the GPU what to draw.
  2. The GPU uses this model information received from the CPU to paint a frame, and sends it over the display cable to the screen.
  3. The screen uses these rapidly received frames to tell all the pixels to change to their new values.
  4. Our sensors, pointed at the screen, observe these pixels (or blocks of pixels) changing, and send that information over the wire to our program (running on a different computer).
  5. Our program does some calculations and determines what steps to take, e.g. pressing the left key 4 times. This is fed as an electrical signal to wires we've attached to the keyboard keys, to electrically press a key.
  6. The input is received by the OS and fed to the game program, looping back to step 1.
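The decision-making part of this loop (steps 4 and 5) can be sketched in a few lines. This is a hypothetical toy, not the actual bot: the frame representation and the function names (`run_bot_loop`, the `alien_x`/`ship_x` fields) are invented for illustration.

```python
# Toy sketch of steps 4-5: consume observed frames, emit key presses.
# The frame dicts and field names are hypothetical placeholders.

def run_bot_loop(frames):
    """For each observed frame, decide which key (if any) to press."""
    actions = []
    for frame in frames:
        threat_x = frame["alien_x"]  # step 4: what the sensor observed
        ship_x = frame["ship_x"]
        # step 5: trivial decision logic - dodge away from the threat
        if threat_x > ship_x:
            actions.append("LEFT")
        elif threat_x < ship_x:
            actions.append("RIGHT")
    return actions
```

In the real setup, each iteration of this loop is where every latency discussed below gets paid: the frame is already stale by the time the sensor sees it, and the key press takes additional time to land.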

Potential areas of bottleneck

Looking at the above, the potential bottlenecks map almost 1:1 to the primary actor in each of those steps. But we can make some simplifying assumptions to narrow them down. E.g. given that the game mechanics are quite simple (old arcade style), we can make an informed guess that the CPU+GPU of the computer are not a bottleneck. This assumption breaks down if the computer is quite old, but let's stick with it for now.

Likewise, we can assume that the external sensors we've placed in front of the screen are not a bottleneck, because we can use the fastest available sensors and optimise the capture pipeline to keep their contribution negligible.

That leaves us with the following areas:

  • Latency of the monitor screen
  • Latency of the keyboard
  • OS latency when receiving inputs. This is the time from when the OS detects a keyboard input to when it delivers it to the application. Again, assuming a fast enough computer, we'll ignore this for now.
  • Bot/human latency, introduced by the time difference between the sensor observing something and our logic controller (on which our code is running) making a decision by sending a keyboard event. We can assume this to be reducible to the order of microseconds, given that it's a fairly simple set of logic, thanks to the simple game mechanics.

Let's analyse the first two in more detail.

Screen latency

When we're using any screen, there are three parameters that become important to us in this scenario:

  1. Response time. This is a measure of how fast pixels can flip (typically measured in milliseconds). The actual number depends on the screen technology in use. You can read here for a comparison between different technologies. For our calculations, we'll consider two scenarios: 1ms (possible in today's gaming monitors) and 10ms (typical of IPS monitors).
  2. Input lag. This is a measure of the time between when a signal is received and when it's converted into pixel-flipping signals. This can vary from microseconds to several hundred milliseconds if too much image processing is going on (this is why TVs often have a "game mode": it switches off this processing). We'll assume this to be zero, on the basis that we can turn it off in our hypothetical scenario too.
  3. Refresh rate. Closely related to response time, this captures how many images per second can be displayed. So a 60Hz monitor can paint the whole screen 60 times per second. This matters to us because it effectively adds latency to our sensor, which can only read what has been displayed. So no matter how fast our sensor operates, a 240Hz screen limits us to 1000/240 ≈ 4.2ms of latency in the sensor. For our analysis, we'll assume two scenarios: 60Hz (typical LCD) and 360Hz (extreme gaming screens). Before you say wtf to the number 360, please read this.

So, our scenarios are:

  • Latency added due to response time: 1ms, 10ms
  • (Effective) latency addition in sensor due to screen refresh rate: 16ms (for 60Hz), 2.7ms (for 360Hz).
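The two screen-side numbers combine additively: a pixel has to finish flipping, and the sensor has to wait for the frame containing it to be scanned out. A quick sanity check of the scenario numbers above (the helper name is ours, not from the post):

```python
# Screen-side latency = pixel response time + one refresh interval
# (the sensor can only see a change once it appears in a refreshed frame).

def screen_latency_ms(response_time_ms, refresh_hz):
    """Latency the screen adds before the sensor can possibly react."""
    frame_interval_ms = 1000.0 / refresh_hz  # time between refreshes
    return response_time_ms + frame_interval_ms

# 1ms panel at 360Hz:  ~3.8ms total
# 10ms panel at 60Hz:  ~26.7ms total
```

This is a worst-case bound; on average the change lands mid-interval, halving the refresh contribution.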

Keyboard latency

You might be forgiven for thinking that every time you press a key on your keyboard, you raise an interrupt. Indeed, at one point in time, that's how PS/2 ports for keyboards & mice used to work. But modern keyboards work over USB, which doesn't rely on interrupts; instead, the OS polls the device. The poll rate depends upon the device, the USB negotiation done, and driver settings. Higher polling rates require a better USB controller in the keyboard and have a higher power draw, but give better latency.

For our scenario analysis, we'll use values of: 10ms (typical), and 0.125ms (extreme).
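The mapping from polling rate to latency is straightforward: a key press lands somewhere inside a polling interval, so the worst case is the full interval and the average is half of it. A small sketch (the helper name is ours):

```python
# USB polling latency: the host asks the keyboard for data every
# 1/poll_rate seconds; a key press waits until the next poll.

def usb_poll_latency_ms(poll_rate_hz):
    """Return (worst_case_ms, average_ms) for a given USB polling rate."""
    interval = 1000.0 / poll_rate_hz
    return interval, interval / 2.0  # press lands uniformly in the interval

# 125Hz (a common full-speed USB default) -> worst 8ms, average 4ms
# 8000Hz (extreme gaming keyboards)       -> worst 0.125ms
```

The 0.125ms "extreme" figure in the analysis corresponds to an 8000Hz polling rate.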

Remember that wireless keyboards are also a factor. We can consider two kinds of wireless:

  1. Bluetooth, where the keyboard talks directly to the computer's BT radio. These have horrible latencies (20-80ms), so we'll skip them entirely.
  2. A USB dongle in the computer, which talks wirelessly to the keyboard. Latencies here are, at best, comparable to a wired USB keyboard, depending upon the quality of the wireless hardware, so we'll skip this scenario as well and just use a wired USB keyboard for our analysis.

Analysis

We can break down all the factors as follows:

| Screen Response Time | Screen Refresh Rate | Keyboard Latency | Limited By |
|----------------------|---------------------|------------------|------------|
| 1ms                  | 60Hz (16ms)         | 10ms or 0.125ms  | Screen Refresh Rate |
| 1ms                  | 360Hz (2.7ms)       | 10ms             | Keyboard Latency |
| 1ms                  | 360Hz (2.7ms)       | 0.125ms          | Too close to call |
| 10ms                 | 60Hz (16ms)         | 10ms or 0.125ms  | Screen Refresh Rate |
| 10ms                 | 360Hz (2.7ms)       | 10ms             | Screen RT/Keyboard |
| 10ms                 | 360Hz (2.7ms)       | 0.125ms          | Screen Response Time |
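The "Limited By" column is just a comparison of the three numbers, with rows declared too close to call when the top two are within a couple of milliseconds of each other. A sketch of that logic (the function name and the 2ms threshold are our choices, not from the post):

```python
# Determine the dominant latency source for one scenario row.
# Threshold of 2ms for "too close to call" is an arbitrary assumption.

def bottleneck(response_ms, refresh_interval_ms, keyboard_ms):
    factors = {
        "Screen Response Time": response_ms,
        "Screen Refresh Rate": refresh_interval_ms,
        "Keyboard Latency": keyboard_ms,
    }
    ranked = sorted(factors.items(), key=lambda kv: kv[1], reverse=True)
    (top, top_v), (second, second_v) = ranked[0], ranked[1]
    if top_v - second_v < 2.0:
        return f"{top}/{second} (too close to call)"
    return top
```

Running it over the scenario rows reproduces the table, e.g. `bottleneck(1, 16, 10)` gives "Screen Refresh Rate" and `bottleneck(10, 2.7, 0.125)` gives "Screen Response Time".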

Simplistic much?

The astute amongst you would've noted that the assumptions we've called out till now are not sufficient. There's also the reality that our sensor can observe multiple frames before making a decision (assuming that the alien ship's attacks take a few frames to reach us). So in that sense, we should really be dividing the (screen response time + screen refresh rate) by a factor (say 10, assuming we can afford to lose 10 frames before making a decision), and the answer would look quite a bit different. But in the interest of keeping this short, we'll ignore this.
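The amortization described above is a one-line change to the earlier screen-latency arithmetic. A sketch, assuming (as the post does) a budget of 10 frames before a decision is required:

```python
# If the bot can afford to watch N frames before reacting, the
# screen-side latency is effectively amortized over those frames.

def effective_screen_latency_ms(response_ms, refresh_hz, frames_budget=10):
    per_frame_ms = response_ms + 1000.0 / refresh_hz
    return per_frame_ms / frames_budget

# 10ms panel at 60Hz: ~26.7ms per frame -> ~2.7ms effective,
# which no longer dominates a 10ms keyboard.
```

With this adjustment, several rows of the table above would flip from "Screen ..." to "Keyboard Latency".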

Conclusion

As always, the answer to the puzzle depends on the exact configuration chosen. A bunch of folks who responded had the right intuition, largely from their gaming background I suspect, but I hope some of this was still new information. Personally, while setting up this puzzle, the 360Hz monitor was a bit of a revelation :)


amodm commented Feb 18, 2022

@smit-hinsu makes an excellent point: do away with the monitor + sensor altogether and read the frames directly from HDMI (assuming no HDCP), without running into DRM restrictions. That would firmly place the bottleneck on the keyboard.

@animesh-chouhan

A great puzzle indeed; it forces us to examine the latencies of various devices such as the monitor, the CPU, and HID devices. After researching a bit, and from my experience, a polling rate of 1000Hz for HID (human interface device) devices is more than enough for humans, as these devices are designed for us. So for the setup described, an HID device isn't well suited to this application. But since this is a human-playable game, it expects human inputs, and therefore we are constrained to use one.

But if we are capturing frames directly, we can use synthetic keyboard events, which would remove this bottleneck. I also tried to evaluate the computation latency, and it would be on the order of 10ms even for a very simple game. This estimate is based on the fact that just looping over 1920*1080 pixels and doing a dummy operation, like incrementing a variable, takes around 1s for 1000 frames in a C program. That gives 1ms per frame; assuming some operations will be performed after analyzing the frame, we can easily see that this is also a possible bottleneck candidate. My intuition suggests that the bot program will be the bottleneck by a large margin.
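The per-frame cost estimate described in the comment above can be sketched as follows. This is a Python rendition of the experiment (the commenter's ~1ms/frame figure was for compiled C; an interpreted loop like this will be far slower, so treat the code as illustrating the methodology, not the number):

```python
# Time one frame's worth of dummy per-pixel work, as in the estimate
# above: iterate over width*height "pixels", incrementing a counter.

import time

def time_one_frame(width=1920, height=1080):
    counter = 0
    start = time.perf_counter()
    for _ in range(width * height):
        counter += 1  # stand-in for trivial per-pixel processing
    return (time.perf_counter() - start) * 1000.0  # elapsed time in ms
```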


amodm commented Feb 24, 2024

@animesh-chouhan, you're right - this isn't the ideal setup for automating gameplay. The idea was to explore latencies of HIDs, and the puzzle was constructed around that, as you can guess :)

The point about the computational latency limit is quite valid when processing on a regular CPU. One can argue, though, that such processing can always be moved to an FPGA or a GPU (latencies well below the 100µs range). Even on a high-end CPU with large vector register files + HBM3, this should be possible.
