X11 is the protocol used for displaying windows on most Linux and Unix-like systems. There are two libraries for talking to an X11 server: Xlib and XCB.
XCB was designed to support "latency-hiding". In X11 there are two kinds of requests that a client can send to the X server: those that cause a reply from the server and those that don't (all of them can cause errors, but let's ignore those for now). Since X11 can be used over the internet, waiting for a reply can take some time. "Latency-hiding" means that one can send multiple requests at once and doesn't have to wait for each reply before sending the next request. The "Basic XCB notions" section in this document gives some examples of this. With Xlib one cannot do something like this.
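As a rough illustration of why this matters, the following sketch estimates how long a client spends waiting for replies. It is a simplification under stated assumptions: the requests are independent of each other, and server processing time and bandwidth are ignored. The function name total_wait is made up for this example.

```python
def total_wait(num_requests: int, round_trip: float, pipelined: bool) -> float:
    """Estimate the seconds spent waiting for replies (simplified model)."""
    if num_requests == 0:
        return 0.0
    if pipelined:
        # All requests are sent at once, so the waits for their replies overlap.
        return round_trip
    # Each request is only sent after the previous reply has arrived.
    return num_requests * round_trip


print(total_wait(6, 0.5, pipelined=False))  # 3.0
print(total_wait(6, 0.5, pipelined=True))   # 0.5
```

Requests that depend on an earlier reply cannot be overlapped like this, so real savings are usually smaller than the model suggests.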
But just because XCB supports this, stuff doesn't magically get faster. The application that uses XCB has to be written with this in mind. But how does one measure how badly one's code behaves? My idea was to write a program that accepts a TCP connection and forwards everything to the X11 server while adding some latency. This way one can simulate how much the latency hurts. Together with xtrace's "--relative-timestamps" option, one can take a closer look at which requests are hurt by this latency.
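A minimal sketch of such a latency-adding proxy could look like this. It is not the program used for the measurements here. It assumes that the X server accepts TCP connections (servers started with "-nolisten tcp" do not), and the names pump and start_proxy are made up for this example.

```python
import asyncio


async def pump(reader, writer, delay):
    """Copy bytes from reader to writer, holding each chunk back by `delay` seconds."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()

    async def deliver():
        try:
            while True:
                due, data = await queue.get()
                # Sleep until this chunk is due. Chunks that arrived close
                # together are released close together, so the delay acts as
                # latency and does not accumulate from chunk to chunk.
                await asyncio.sleep(max(0.0, due - loop.time()))
                if data is None:  # end-of-stream marker
                    break
                writer.write(data)
                await writer.drain()
        except ConnectionError:
            pass
        finally:
            writer.close()

    sender = asyncio.create_task(deliver())
    try:
        while data := await reader.read(65536):
            queue.put_nowait((loop.time() + delay, data))
    except ConnectionError:
        pass
    queue.put_nowait((loop.time() + delay, None))
    await sender


async def start_proxy(listen_port, target_host, target_port, delay):
    """Listen on localhost and forward each connection with `delay` per direction."""

    async def handle(client_reader, client_writer):
        server_reader, server_writer = await asyncio.open_connection(
            target_host, target_port
        )
        await asyncio.gather(
            pump(client_reader, server_writer, delay),
            pump(server_reader, client_writer, delay),
        )

    return await asyncio.start_server(handle, "127.0.0.1", listen_port)


# Hypothetical usage, assuming display :0 is reachable on TCP port 6000:
#
#     async def main():
#         server = await start_proxy(6001, "127.0.0.1", 6000, 0.5)
#         async with server:
#             await server.serve_forever()
#
#     asyncio.run(main())
```

Display :N uses TCP port 6000+N, so with the usage shown in the comment a client started with DISPLAY=localhost:1 would reach display :0 through the proxy.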
With these tools, I analyzed awesome's behavior. For example, this is xtrace's output while awesome determines the number and geometry of the available monitors (my "latency" program was set to delay the data transfer by 0.5 seconds, which means the round-trip time was one second):
2.107 000:<:0046: 16: Request(98): QueryExtension name='RANDR'
2.610 000:>:0046:32: Reply to QueryExtension: present=true(0x01) major-opcode=144 first-event=91 first-error=148
2.610 000:<:0047: 12: RANDR-Request(144,0): QueryVersion major-version=1 minor-version=1
3.113 000:>:0047:32: Reply to QueryVersion: major-version=1 minor-version=1
3.113 000:<:0048: 8: RANDR-Request(144,8): GetScreenResources window=0x00000111
3.620 000:>:0048:676: Reply to GetScreenResources: timestamp=0x010f89c5 config-timestamp=0x010fc55c [...]
3.659 000:<:0049: 16: Request(98): QueryExtension name='XINERAMA'
4.165 000:>:0049:32: Reply to QueryExtension: present=true(0x01) major-opcode=145 first-event=0 first-error=0
4.165 000:<:004a: 4: XINERAMA-Request(145,4): IsActive
4.669 000:>:004a:32: Reply to IsActive: state=true(0x00000001)
4.669 000:<:004b: 4: XINERAMA-Request(145,5): QueryScreens
5.173 000:>:004b:40: Reply to QueryScreens: screens={x=0 y=0 width=1500 height=500};
How does one read xtrace's output? Let's take a closer look at the first line:
2.107 000:<:0046: 16: Request(98): QueryExtension name='RANDR'
2.107 seconds after connecting to the X server (this is what --relative-timestamps prints), client number "000" sent a request to the X server ("<") with the sequence number "0046" (the sequence number is printed in hexadecimal and is increased with each new request; the first request gets the sequence number 1). In this request, awesome asks the server about the extension 'RANDR'. As one sees from the output, awesome waits for the reply after each request before continuing.
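Reading such a trace by hand gets tedious, so here is a small sketch that pairs each request with its reply and reports how long the client waited. The regular expression is only derived from the lines shown above, not from xtrace's documentation, and lines that do not have this shape are skipped. The name reply_waits is made up for this example.

```python
import re

# timestamp, client number, direction, sequence number (hex), length, rest of the line
LINE = re.compile(r"^(\d+\.\d+) (\d+):([<>]):([0-9a-f]+):\s*(\d+): (.*)$")


def reply_waits(lines):
    """Return (sequence, request text, seconds until the reply) per answered request."""
    pending = {}
    waits = []
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # not a request or reply in the shape shown above
        stamp, client, direction, seq, _length, text = match.groups()
        key = (client, int(seq, 16))
        if direction == "<":
            pending[key] = (float(stamp), text)
        elif key in pending and text.startswith("Reply to"):
            sent, request = pending.pop(key)
            waits.append((key[1], request, round(float(stamp) - sent, 3)))
    return waits


trace = [
    "2.107 000:<:0046: 16: Request(98): QueryExtension name='RANDR'",
    "2.610 000:>:0046:32: Reply to QueryExtension: present=true(0x01) "
    "major-opcode=144 first-event=91 first-error=148",
]
for seq, request, wait in reply_waits(trace):
    print(f"{seq:04x} waited {wait:.3f}s for {request}")
```

Sorting the result by the wait time quickly shows which requests stall the client the longest.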
To speed this up, something had to be done. XCB wants to provide latency-hiding, so there has to be a way to do this. A quick look at the docs turns up a function called "xcb_prefetch_extension_data()". This function sends the QueryExtension request without waiting for the reply.
This resulted in a patch for awesome that prefetches the needed extensions and moves some code to later places, so that we don't have to wait for the replies any more.
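The effect of prefetching can be illustrated with a toy model. This is not XCB's API and not the actual patch: ToyConnection, its methods and startup are invented for this sketch and use a simulated clock. In real code, xcb_prefetch_extension_data() plays the role of prefetch_extension, and xcb_get_extension_data() is the call that waits for the reply if it has not arrived yet.

```python
class ToyConnection:
    """Toy stand-in for a connection with a fixed round-trip time (simulated clock)."""

    def __init__(self, round_trip):
        self.round_trip = round_trip
        self.now = 0.0        # simulated time in seconds
        self.reply_due = {}   # extension name -> time at which its reply arrives

    def prefetch_extension(self, name):
        # Send the QueryExtension request now and return immediately.
        self.reply_due.setdefault(name, self.now + self.round_trip)

    def get_extension(self, name):
        # Send the request if nobody prefetched it, then wait for the reply.
        self.prefetch_extension(name)
        self.now = max(self.now, self.reply_due[name])
        return name


def startup(prefetch):
    """Return the simulated time needed to learn about both extensions."""
    conn = ToyConnection(round_trip=0.5)
    if prefetch:
        for name in ("RANDR", "XINERAMA"):
            conn.prefetch_extension(name)
    conn.get_extension("RANDR")
    conn.get_extension("XINERAMA")
    return conn.now


print(startup(prefetch=False))  # 1.0
print(startup(prefetch=True))   # 0.5
```

In this model the second QueryExtension reply is already on its way while the client waits for the first one, which saves one round trip.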