Nixers Book Club - Book #3: The Wayland Book

seninha
Long time nixers
As proposed in the last thread, the next book of the Nixers Book Club is gonna be «The Wayland Book» by Drew DeVault.

The book is online for free here.

Since it's a short book, we can do 3 chapters a week (or 4, depending on how we go).

The next book club session, to discuss the first chapters, will be next Saturday, March 20.
venam
Administrators
Here's my summary/overview/review of the first 3 chapters.

The book starts with a very high-level view of what computer interaction
is about: input and output devices as shared resources, with the kernel
taking care of abstracting interaction with them via drivers and libraries
that expose a common interface.
We're presented with a rundown from hardware to kernel to userspace.
This takes the form of evdev for exposing input devices in /dev, and DRM to
abstract graphics, mapping each card into userspace under /dev/dri.
KMS, a subset of DRM, is used to enumerate displays and configure their
settings.

Wayland is explained as a protocol to do compositing, drawing windows
on output devices, and providing common access to input and output
devices. Wayland is the protocol allowing interaction between the
different clients and the compositor, through objects implementing
interfaces (requests and events) related to these graphics, input,
and output devices.

In the Wayland world different libraries are used, the author lists a
couple of them:
  • libdrm is the user-space interface to the kernel's DRM.
  • Mesa provides vendor-optimized OpenGL (and Vulkan) implementations and GBM (Generic Buffer Management). From what I know, OpenGL is a graphics standard, so it is abstract, while DRM is a Linux-specific implementation.
  • libinput: abstracts the handling of input devices
  • xkbcommon: used for keyboard mapping, from scan codes to symbols. It reminds me of the keysymdef header.
  • pixman: pixel manipulation library (image compositing)
  • libwayland: the most commonly used implementation of the Wayland protocol, with tools to generate high-level code from Wayland protocol definitions (XML files). This is sort of like what XCB does for X.


That's as far as the introductory chapter goes; the next two chapters
(2 and 3), in my opinion, go over the basic Wayland wire protocol and
the library implementation.
The main message I took from them was that Wayland is built on the
concept of separation of concerns. The clients and servers implement
well-known interfaces: they agree on a shared abstract language and
promise to respect it.
Practically, this takes the form of auto-generated functions, built from
agreed-upon primitive types, that take care of marshalling two things
for each implemented interface: events and requests.

When installing the basic Wayland packages you're given the protocol
definitions as XML files, the main one being wayland.xml; on my machine
they are found in /usr/share/wayland and /usr/share/wayland-protocols.
In these directories you can also find a distinction between stable
and unstable protocols, which is explained to be the way Wayland decides
whether something should be supported by everyone or kept on the side
as testing.

A tool called wayland-scanner can be used to convert these XML files
into code, for either the server or the client.
libwayland, if I understood properly, ships the code pregenerated from
the core protocol.
For each interface, wayland-scanner generates functions to send requests
and listener structs to receive events, such as "enter" and "leave".
The interface name acts as the namespace: it's the prefix of every
generated function.
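To get a feel for the generated code, here's a minimal sketch (the handler
names are mine) of listening for the wl_surface "enter"/"leave" events and
sending a request:

Code:
#include <wayland-client.h>

/* Handlers for the events declared on wl_surface in wayland.xml. */
static void surface_enter(void *data, struct wl_surface *surface,
                          struct wl_output *output)
{
    /* the surface is now (at least partly) shown on this output */
}

static void surface_leave(void *data, struct wl_surface *surface,
                          struct wl_output *output)
{
    /* the surface left this output */
}

/* The generated listener struct has one function pointer per event;
 * the interface name is the prefix of everything. */
static const struct wl_surface_listener surface_listener = {
    .enter = surface_enter,
    .leave = surface_leave,
};

/* After creating the surface:
 *   wl_surface_add_listener(surface, &surface_listener, NULL);
 * and requests go the other way, e.g.:
 *   wl_surface_commit(surface);
 */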


As far as the protocol is concerned, the messages are exchanged over
Unix sockets, with the Wayland compositor, the server, doing the routing.
The clients find the address of the server by checking different
environment variables:
WAYLAND_SOCKET, or WAYLAND_DISPLAY along with XDG_RUNTIME_DIR, or they
fall back to assuming the socket is XDG_RUNTIME_DIR/wayland-0.
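In practice libwayland does that lookup for us; a minimal client
connection is roughly (a sketch):

Code:
#include <stdio.h>
#include <wayland-client.h>

int main(void)
{
    /* NULL means: honour WAYLAND_SOCKET / WAYLAND_DISPLAY, or fall
     * back to "wayland-0" inside XDG_RUNTIME_DIR. */
    struct wl_display *display = wl_display_connect(NULL);
    if (!display) {
        fprintf(stderr, "failed to connect to a Wayland compositor\n");
        return 1;
    }
    printf("connected\n");
    wl_display_disconnect(display);
    return 0;
}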


Objects can live on either the client side or the server side.
Objects implement some of these interfaces, and knowing which interfaces
an object implements lets you know what actions you can do with it; these
are either requests or events.
Objects are known to both client and server, and they carry state.
On the client side, objects are interacted with through the wl_proxy
interface, which is used to marshall requests of a specific interface
to be ready to go on the wire.
On the server side they are accessed using wl_resource, which also gives
a reference to the wl_client owning the resource.
The manpages say wl_proxy "Represents a protocol object on the client
side."

For example, we've seen the wl_surface interface, which represents a
drawable, sort of like in X.
There are a lot of other interfaces besides wayland.xml in the
wayland-protocols directory on my machine, for different things. These
are probably the protocol extensions mentioned.

The basic protocol also implements atomicity. When issuing multiple
requests, they don't take effect until the "commit" request is sent. I
think the summary in the XML is better than the one in the book: "commit
pending surface state".

Chapter 3 ends by asking the question of how clients and the server know
about all the objects, and their ids, that are currently available.
seninha
Long time nixers
The introduction chapter gives an overview of the Wayland environment from the hardware to the userspace and explains what is packaged into Wayland. wayland.xml defines the high-level protocol and wayland-scanner generates the glue code from this XML.

Chapter 2 explains how requests and events work under that protocol.
Surfaces are kinda what Windows and Drawables are for X11, and there is a “damage” request that is similar to the “Expose” event on X11. The damage request is a recurring example through the book. Wayland is an asynchronous protocol. Requests are accumulated until commit.

Chapter 3 deals with libwayland and its implementation. It explains how wayland-scanner works and how it generates client and server headers and the glue code from the XML protocol. It also explains how objects are referred to by clients and the server.

In chapter 4 we are taught how to connect to the display, which is similar to an X11 display. We create our first client, a simple program that just connects to the display. We are also taught how to implement the event loop.
movq
Long time nixers
Whoops, I somehow thought it said “Sunday”, not “Saturday”. Are we doing 1-3 or 1-4? Here’s 1-3, which were rather basic I think, and I got 4 ready where we finally write some code.


1 Introduction

Quote:designed and built by the alumni of the venerable Xorg server

Very important point right at the start: It’s made by the same people that were involved with X11. So we can hope that they learned from past mistakes and did better this time. Put differently, it’s not a bunch of “know-it-alls” who saw X11 and arrogantly proclaimed: “Nah, I can do that better!”

Quote:Outside of the box, we have your displays, keyboard, mouse, perhaps some speakers and a cute USB cup warmer.

Oops, guilty. https://movq.de/v/66aa180637/cupwarmer.jpg (Works. Kind of.)

Chapter 1.1 finishes with a rough overview of the several libraries involved. Mostly nothing new to learn here.


2 Protocol Design

First interesting point: Unix domain sockets are so important for Wayland, because we can transfer file descriptors between processes. That’ll be used to transfer clipboard contents, yes? Good! So we can finally get rid of the horrible X11 clipboard mechanism.

In 2.2, we already see a bit of the protocol on the wire. Looks good, simple.

Okay, requests, events, nothing spectacular.

2.3 tells us how the XML files are to be read. Nothing fancy. I couldn’t help but notice that the XML files are rather simple – unlike some other beasts.

2.4: Atomicity. Much different from X11, where it didn’t exist. Someone (probably mort?) once said in IRC, that X11 looks terribly ugly with all the partial screen updates, but you just don’t notice them, unless you have used Wayland for a while. I hope that isn’t true, because I didn’t plan on switching to Wayland right away. We’ll see.


3 libwayland in depth

Hmm, alright, we get an overview of the code that is generated from the XML specs. It doesn’t look too complicated and it’s really only just a bunch of objects, requests, and events. I’m starting to see the pieces come together, but we’re not quite there yet. It’s probably just the foundation for the next chapters.
movq
Long time nixers
(21-03-2021, 03:45 AM)movq Wrote: 2.4: Atomicity.

That, of course, complicates things. For example, this blog post of Drew mentions (in the footnotes) a common scenario in sway: Changing the tiling layout. That requires resizing all currently visible clients and we want to do that atomically. An X11 WM like dwm just loops over the clients, resizes them, done. Sway has to “reach deeper into our shell implementations to atomically syncronize the resizing of several clients at once”. Not sure what that means, haven’t read the code.

On the other hand, there’s probably nothing that forces you to do this in that scenario (tiling WM changing its layout). It’s just that sway wants to provide as much atomicity as possible, I guess.

(Another thing I just realized: You can’t really enforce atomicity anyway. sway might resize/move/whatever my terminal emulator atomically, okay, good. But the terminal then tells, let’s say, Vim that the window has been resized. That’s done using SIGWINCH, which takes some time to be emitted and processed. I’m not saying Wayland’s efforts to promote atomicity are pointless, but it’s still something to keep in mind.)
seninha
Long time nixers
Chapter 4 deals with the display, the process of connecting to the server. The process, for clients, is very similar to opening a display in X. We can open a display based on an environment variable, or directly via a file descriptor. The process for servers is shown: the display is created and the socket is added, possibly automatically.

The routine wl_display_run runs libwayland's internal event loop on the server. It processes events and blocks until the display is terminated via wl_display_terminate. You can add file descriptors, timers and signal handlers to this event loop, so the programmer does not have to implement their own. But it is also possible to process client updates manually: wl_event_loop_get_fd gives a poll-able file descriptor.

For clients, the event loop must be implemented by the programmer. wl_display_dispatch dispatches the queued events for the programmer to process them.

Chapter 5 deals with the wl_registry interface. If I understood correctly, when creating the registry object, the server sends an event for each global object on the server. Once the object is created, we can use its interface. An example is shown where we connect to the display, obtain the object registry and add a listener to it, implementing a prototype of weston-info, from the weston project. Then, the chapter shows how to register globals on the server side.

Chapter 6 deals with the wl_buffer and wl_surface interfaces. Through the wl_compositor interface, we send the window to the server to be composited with the other windows. A surface seems to be what a Window is to X11, a rectangular area with a local coordinate system that can receive input and be displayed. Before it is presented, we need to attach a wl_buffer to it.
I did not try the examples, as I couldn't use my Linux machine this week. I'll try to do the compositing thing next week.
venam
Administrators
Here are my notes/reviews of chapter 4-6.

In chapters 4 and 5 we're presented with the concept of global singleton
objects, core globals, that implement features requested by all other
objects and are omnipresent in every connection. Basically what the "wayland
compositor" or server usually implements.

We learn that one of them is the object implementing wl_display,
which has id 1, so it's easy to find.
It implements a request called get_registry, which is used for registering
new objects, removing them, and getting the list of them. So from it we
can learn about all that is living on the server.

For clients the connection to the wl_display is done by selecting a file
descriptor manually or through an environment variable.
On the server, it's about creating that file descriptor and running the
Wayland internal loop. A lifecycle as follows:
  • wl_display_create
  • wl_display_add_socket_auto automatically creates the Unix socket file; we can also specify where the file is created.
  • wl_display_run (runs libwayland's internal event loop and blocks until wl_display_terminate is called.)
  • wl_display_destroy

The server runs the wl_event_loop, obtained using wl_display_get_event_loop.
Then it's all event-driven programming: we listen to the events we configured
and act on them.
The events are received through a pollable fd, so we can possibly even
drive the loop ourselves manually using wl_event_loop_get_fd,
wl_event_loop_dispatch, and wl_display_flush_clients.
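A minimal sketch of that server-side lifecycle (just the calls listed
above, no globals or error handling):

Code:
#include <wayland-server.h>

int main(void)
{
    struct wl_display *display = wl_display_create();

    /* Creates e.g. $XDG_RUNTIME_DIR/wayland-0 and returns its name;
     * wl_display_add_socket() would let us pick the name ourselves. */
    const char *socket = wl_display_add_socket_auto(display);
    if (!socket)
        return 1;

    /* libwayland's own event loop: blocks until wl_display_terminate(). */
    wl_display_run(display);

    wl_display_destroy(display);
    return 0;
}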

Clients don't get libwayland's event loop; a simple dispatch loop on
the display suffices.
Code:
while (wl_display_dispatch(display) != -1) {
    /* This space deliberately left blank */
}

When implementing an interface we have to implement its methods. On
the server side, for the wl_registry, that means we have to handle the
registry to keep a list of the global objects.

On the client side, when binding to the registry we have to announce
which interfaces we implement, so that others become aware of them. Then
the server emits a "global" event for every object; when we catch it, it
tells us which objects exist on the server, their id, and the interface
and version they implement.
With that we have the ids and we know which interfaces are implemented
by whom, so we can interact with them.

I'm not sure why they go into dissecting the binary protocol. There should
be a parser to turn this into human-readable form, and there is: it's
available by setting the env var WAYLAND_DEBUG to 1.

So overall that gives a sort of loop to interact with the objects for clients (sketched below):
- connect to the display
- get wl_registry from the wl_display singleton (id=1)
- add a listener for the registry events "global" and "global_remove"
- let the handler print info when it receives global events, such as
the interface and object id (here it's called 'name')
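Here's a minimal sketch of that loop, close to the book's globals example
(the handler names are mine):

Code:
#include <stdio.h>
#include <wayland-client.h>

static void registry_global(void *data, struct wl_registry *registry,
                            uint32_t name, const char *interface,
                            uint32_t version)
{
    /* one "global" event per object the server advertises */
    printf("interface: '%s', version: %u, name: %u\n",
           interface, version, name);
}

static void registry_global_remove(void *data,
                                   struct wl_registry *registry,
                                   uint32_t name)
{
    /* a global went away */
}

static const struct wl_registry_listener registry_listener = {
    .global = registry_global,
    .global_remove = registry_global_remove,
};

int main(void)
{
    struct wl_display *display = wl_display_connect(NULL);
    struct wl_registry *registry = wl_display_get_registry(display);
    wl_registry_add_listener(registry, &registry_listener, NULL);
    /* one round trip is enough to receive the initial burst of globals */
    wl_display_roundtrip(display);
    wl_display_disconnect(display);
    return 0;
}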

So far this example is pretty clean. I like the way the listeners are
defined through structures that have pointers to handlers, and how
everything is done locally through a pollable Unix socket file descriptor.

Chapter 6 goes into the inner workings of the graphics philosophy.
In Wayland it's done through buffers (wl_buffer) and surfaces (wl_surface).
From what I understood, surfaces are sort of like windows, rectangular
onscreen areas that have roles and can receive user input, and
buffers are the pixels that will be displayed on the window.
We have to explicitly attach the buffer to the surface, using
wl_surface_attach, damage, and commit.

The buffers can live in shared memory, mmapped and with the file
descriptor given to the server, or go through a GPU interface like DMA-BUF.

To make this possible, the server (the object implementing wl_display
I guess, so id 1, or others that should be caught with the "global" event)
implements two interfaces: one called
wl_compositor, which can be asked to create new surface objects,
returning their ids, and another interface called wl_shm, which can be
asked to create shared memory pools (wl_shm_pool) for clients to use as
buffers for their surfaces.

It's interesting that manipulating the mmapped data is then equivalent to
manipulating the pixels. Obviously, we have to define the pixel format
and all that.
The rendering technique seems very "raw", but I'm sure it's way more
optimized when using the GPU with DMA instead of the CPU to manipulate the
pixels (through wl_egl_window_create and wl_egl_window, which apparently
aren't part of the standard wayland.xml description).
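A rough sketch of that shared-memory path (assuming wl_shm was already
bound from the registry; create_shm_file() is a hypothetical helper, like
the shm_open-based one in the book):

Code:
#include <stdint.h>
#include <sys/mman.h>
#include <wayland-client.h>

/* Hypothetical helper returning an anonymous file of the given size. */
int create_shm_file(int size);

static struct wl_buffer *create_buffer(struct wl_shm *shm)
{
    const int width = 640, height = 480;
    const int stride = width * 4;            /* XRGB8888: 4 bytes/pixel */
    const int size = stride * height;

    int fd = create_shm_file(size);
    uint32_t *pixels = mmap(NULL, size, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);

    /* Hand the same memory to the compositor as a pool, carve out a buffer. */
    struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
    struct wl_buffer *buffer = wl_shm_pool_create_buffer(pool, 0,
            width, height, stride, WL_SHM_FORMAT_XRGB8888);
    wl_shm_pool_destroy(pool);

    /* Writing into the mapping *is* drawing the pixels. */
    for (int i = 0; i < width * height; i++)
        pixels[i] = 0xFF00FF00;              /* opaque green */

    return buffer;
}

/* Later, on a surface created via wl_compositor_create_surface():
 *   wl_surface_attach(surface, buffer, 0, 0);
 *   wl_surface_damage(surface, 0, 0, width, height);
 *   wl_surface_commit(surface);
 * (it still needs a role before anything shows up; that comes later.)
 */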

Quote:With the shared memory approach, sending buffers from the client to the
compositor in such cases is very inefficient, as the client has to read
their data from the GPU to the CPU, then the compositor has to read it
from the CPU back to the GPU to be rendered.

The chapter ends with a teaser about the role that the wl_surface can
play. The description in wayland.xml is interesting too.

Quote:A surface without a "role" is fairly useless: a compositor does
not know where, when or how to present it. The role is the
purpose of a wl_surface. Examples of roles are a cursor for a
pointer (as set by wl_pointer.set_cursor), a drag icon
(wl_data_device.start_drag), a sub-surface
(wl_subcompositor.get_subsurface), and a window as defined by a
shell protocol (e.g. wl_shell.get_shell_surface).

So far, I think the design thinking behind Wayland is still in touch with
my initial impression: very role-centric, or can be seen as a
decentralization/separation of concerns. I like that all the possible interactions,
requests and events are defined in XML format, that makes for a no-surprise
environment. However, I kind of know that this is also limiting when the basic
protocol doesn't implement all that you need. We'll see more of that later in
the book I guess.
movq
Long time nixers
4 The Wayland display

Alright, so the “display” is the “core” object that everything hinges on? (Much like a display in X11.) It provides access to the “registry”, which you can use to allocate or retrieve other objects.

In 4.1, we finally get our hands dirty with some code! To test the first code example, I just fired up `weston` inside of my X session. I am a tiny little bit confused at this point, because I can use my standard hotkey to launch a new terminal – inside of weston. How is that possible? Left-overs from previous Wayland experiments? Oh, no! GTK simply prefers Wayland over X11, so when I press my hotkey, my normal X11 hotkey daemon interprets it and starts a new terminal process, which then proceeds to … pop up in my nested Wayland session. Bit confusing at first, but understandable.

Wait, I’m stumped. `wl_display_run()` in the server runs the event loop, but how do I get to process the events? What am I missing here? I would have expected that there is a way for me to specify callbacks or something like that. “On event $foo, call function $bar.”

I peeked at chapter 5 and expect these questions to be answered there.

Nope, but they’re answered here: https://github.com/swaywm/wlroots/blob/m...ywl.c#L910 I guess it’s just because the book is focused on clients. It worries me a bit that tinywl links to Drew’s blog instead of “official” documentation.


5 Globals & the registry

When you bind to the registry, the server emits events for each existing object. Is that correct? That’s a bit unusual, isn’t it? I kind of expected that I could query the registry and receive a list of objects by doing a synchronous request. How do I know when the server has finished emitting those events? I probably don’t.

Also a bit surprising to see that the client allocates IDs.

We learn about the environment variable `$WAYLAND_DEBUG`. That’s going to be handy, I guess.

Registering globals on the server appears to be a bit tedious at first glance with all the interfaces and their implementations.


6 Buffers and surfaces

So, a surface is, I would say, a “window” and a buffer is something that clients actually draw into – or “a source of pixels”, as they call it.

I tried to play the puzzle, i.e. create a working client that displays a window showing the XOR pattern. I failed. I’m missing the basic program skeleton: The example in 5.1 only calls “wl_display_roundtrip(display);” once, but what is the full event loop supposed to look like? At which point do I create and commit my surface? At the moment, my “main()” looks like this:

Code:
int
main(int argc, char *argv[])
{
    struct wl_display *display = wl_display_connect(NULL);
    struct wl_registry *registry = wl_display_get_registry(display);
    struct our_state state = { 0 };
    wl_registry_add_listener(registry, &registry_listener, &state);
    while (wl_display_dispatch(display) != -1) {
        wl_display_roundtrip(display);
        wl_display_flush(display);
    }
    return 0;
}

At the end of “registry_handle_global()”, I have this:

Code:
if (state->compositor && state->shm)
        pixels(state);

And “pixels()” does the whole dance of creating the buffer and committing the surface. Doesn’t work and I’m a bit too tired at the moment. Maybe I’ll find out tomorrow.

– edit: The end of chapter 7 contains a full working example. My code above is very wrong.
venam
Administrators
Here's my summary/review/notes on chapters 7 to 9.
These were heavier, more hands-on chapters: after getting to know the
basic ideas from previous chapters regarding interfaces and decoupling,
we now put it all into practical action with actual windows and inputs.

XDG shell basics

The XDG (cross-desktop group) shell is a standard protocol extension for
Wayland which describes the semantics for application windows. Roles are
like child classes in OOP, or traits, defining extra methods for wl_surfaces.
This pattern of extending wl_surface will keep reappearing over and over;
that's how everything is done in Wayland: adding capabilities,
functionality, and features on top of it, as new traits.

In xdg-shell, there are 2 roles defined, toplevel and popup, which form
a tree of surfaces. It's not part of the core protocol; it's defined in
the extensions, at
/usr/share/wayland-protocols/stable/xdg-shell/xdg-shell.xml.
xdg_surfaces are surfaces in the domain of xdg-shell, new traits that
add functionality over normal surfaces, namely toplevel and popup:
xdg_toplevel and xdg_popup. xdg_toplevel is finally our window.

I initially tried the example movq showed on IRC, using wl_shell's
get_shell_surface method, but apparently we should use xdg_wm_base
with get_xdg_surface instead.
As we said, it's not defined in the usual Wayland headers that come
with the distro packages, so at this point we need to generate the files
using the wayland-scanner we've seen before and include them in our Makefile.

The actual drawing of pixels happens on the configure event and is done
after acknowledging it.
To answer movq: we assume that after the roundtrip,
wl_display_roundtrip(state.wl_display), we have all the globals from the registry.
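Here's a minimal sketch of that configure/ack/draw dance, close to the
book's example (the our_state fields and the draw_frame() helper are
assumptions standing in for the full example):

Code:
#include <wayland-client.h>
#include "xdg-shell-client-protocol.h"   /* generated by wayland-scanner */

struct our_state {
    struct xdg_wm_base *xdg_wm_base;
    struct wl_surface *wl_surface;
};

/* Hypothetical helper that fills and returns a wl_buffer. */
struct wl_buffer *draw_frame(struct our_state *state);

static void xdg_surface_configure(void *data,
        struct xdg_surface *xdg_surface, uint32_t serial)
{
    struct our_state *state = data;

    /* Acknowledge the configure first, then attach and commit a buffer. */
    xdg_surface_ack_configure(xdg_surface, serial);

    struct wl_buffer *buffer = draw_frame(state);
    wl_surface_attach(state->wl_surface, buffer, 0, 0);
    wl_surface_commit(state->wl_surface);
}

static const struct xdg_surface_listener xdg_surface_listener = {
    .configure = xdg_surface_configure,
};

static void setup_window(struct our_state *state)
{
    /* xdg_wm_base was bound from the registry, wl_surface already created. */
    struct xdg_surface *xdg_surface =
            xdg_wm_base_get_xdg_surface(state->xdg_wm_base, state->wl_surface);
    xdg_surface_add_listener(xdg_surface, &xdg_surface_listener, state);

    struct xdg_toplevel *toplevel = xdg_surface_get_toplevel(xdg_surface);
    xdg_toplevel_set_title(toplevel, "example client");

    /* First commit carries no buffer; the compositor replies with configure. */
    wl_surface_commit(state->wl_surface);
}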

The example is extensive, it takes quite a lot of implementation to get
a window drawn, and without decoration. It's even more decoupled than
I thought. Yet, in the following chapter about inputs, we add even more
boiler/glue code.
Also, there's no decoration, but let's wait, maybe the book will talk
about that later.

Surfaces in depth.

This chapter dives more into the functionality of wl_surface, which,
if I understood correctly, is extended heavily into different roles,
with extensions adding traits and methods it can fill: from wl_surface
to xdg_surface to xdg_toplevel.

These wl_surfaces have their own lifecycle.
The wl_surface drives the atomicity through its pending, committed,
and applied states.
A lot of state can be changed before committing to the surface, such
as the wl_buffer, the damage region, the input region, etc.
To give the surface its first state, you need to give it a role, allocate
and attach the buffer, then commit again.
We see that in the previous example at the end of the configure handler,
xdg_surface_listener.configure. I guess that's what it does.
Quote:The next question is: when should I prepare a new frame?

And how, too? In the event loop? … And no, it's preferably done after
receiving the "done" event from the wl_callback, or after input
events in event-driven applications.
This is interestingly efficient and also low-level: we can manage
each frame.

To get this frame-callback behavior, we need an object implementing
the wl_callback interface, which we get from the wl_surface.frame
request. Then we set a listener for the "done" event;
its callback_data is the current time in milliseconds.
Inside this callback, we destroy and re-request the callback, call
draw again, reattach the buffer to the wl_surface, damage the entire
surface, and commit. The destruction and re-creation of the callback
is a bit confusing.
So I guess internally it'll automatically only redraw what needs to
be redrawn if we damage only a certain area.
That's exactly what is done in section 8.3.
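Roughly, that loop looks like this (a sketch modelled on the book's
moving-pattern example; our_state and draw_frame() stand in for the real
state and drawing code):

Code:
#include <stdint.h>
#include <wayland-client.h>

struct our_state {
    struct wl_surface *wl_surface;
    /* ...plus whatever draw_frame() needs... */
};

/* Hypothetical helper that draws the next frame into a fresh wl_buffer. */
struct wl_buffer *draw_frame(struct our_state *state);

static const struct wl_callback_listener frame_listener;

static void frame_done(void *data, struct wl_callback *callback,
                       uint32_t time_ms)
{
    struct our_state *state = data;

    /* The callback is one-shot: destroy it and request the next one. */
    wl_callback_destroy(callback);
    callback = wl_surface_frame(state->wl_surface);
    wl_callback_add_listener(callback, &frame_listener, state);

    /* Draw, reattach, damage, commit. */
    struct wl_buffer *buffer = draw_frame(state);
    wl_surface_attach(state->wl_surface, buffer, 0, 0);
    wl_surface_damage_buffer(state->wl_surface, 0, 0, INT32_MAX, INT32_MAX);
    wl_surface_commit(state->wl_surface);
}

static const struct wl_callback_listener frame_listener = {
    .done = frame_done,
};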

Overall, it seems like we need to keep all these structures, these objects
adding "traits" to the surface and manipulate them from everywhere.
The global state invades everything, passed to all events. I'm sure it
could be done otherwise though.

Surface regions
wl_compositor can be used to create an object of type region, a wl_region,
by calling the create_region request.
A region is a group of rectangles forming an arbitrary shape, built by
doing operations between different rectangles.
These arbitrary regions can then be passed to wl_surface either as the
opaque region, declaring which part of the wl_surface is opaque, or as
the input region, declaring which part can accept input.
These are interesting for controlling surfaces, I think.
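A small sketch of the region requests (assuming the wl_compositor global
and a surface already exist):

Code:
#include <wayland-client.h>

static void restrict_input(struct wl_compositor *compositor,
                           struct wl_surface *surface)
{
    /* Build an arbitrary shape out of rectangles... */
    struct wl_region *region = wl_compositor_create_region(compositor);
    wl_region_add(region, 0, 0, 640, 480);
    wl_region_subtract(region, 100, 100, 200, 200);   /* punch a hole */

    /* ...and hand it to the surface, here as the clickable area. */
    wl_surface_set_input_region(surface, region);
    /* wl_surface_set_opaque_region(surface, region) works the same way. */
    wl_region_destroy(region);
    wl_surface_commit(surface);
}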

subsurfaces
In the core protocol, wayland.xml, only one surface role is defined, and
that's the subsurface.
Subsurfaces are child surfaces that are positioned relative to a parent
surface, with a z-order, kind of like transient/modal/popup/dialog windows.
This can be used to do window decorations.
Funnily, these are created from yet another global object:
wl_subcompositor; even more separation of roles, yey!
The subsurface can then be manipulated like a normal surface but has
"place_above/below" functions. It stays in sync with the parent surface
lifecycle as far as the atomic operations on buffers and others go.
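A small sketch of that, assuming the wl_subcompositor global and both
surfaces already exist (the titlebar idea is just an illustration):

Code:
#include <wayland-client.h>

static void attach_decoration(struct wl_subcompositor *subcompositor,
                              struct wl_surface *parent,
                              struct wl_surface *child)
{
    /* wl_subcompositor is yet another global caught from the registry. */
    struct wl_subsurface *sub =
            wl_subcompositor_get_subsurface(subcompositor, child, parent);

    wl_subsurface_set_position(sub, 0, -30);  /* parent-relative, e.g. a titlebar */
    wl_subsurface_place_above(sub, parent);

    /* In synchronized mode (the default) the child's state is only applied
     * together with the parent's commit, keeping the pair atomic. */
    wl_subsurface_set_sync(sub);

    wl_surface_commit(child);
    wl_surface_commit(parent);
}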

high density surface (HiDPI)
wl_output, which represents a display object, I guess, sends an event
announcing the scale factor in use. This scaling factor can then be applied
to the wl_surface via set_buffer_scale.
I think that's a really nice way to handle HiDPI, and it should solve a
lot of things. However, as with everything, Wayland is only the protocol,
which practically is only the XML definition of interfaces, so we have
to handle this manually.

Chapter 9: Seats, handling inputs

Finally, we're going to interact with windows.

A seat represents a user with their inputs: pointer and keyboard.
It's yet another global that is accessible and that you can bind during startup.
It offers a pointer, keyboard, and touch device that you can get through requests.
Each has its own interface defining how to interact with it.
You can know what is supported from the capabilities, which is a bitfield;
you can do bitwise operations to compare it with constants of the form
`WL_SEAT_CAPABILITY_*`.
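A small sketch of that capabilities check inside a wl_seat listener (the
our_state fields are assumptions):

Code:
#include <wayland-client.h>

struct our_state {
    struct wl_pointer  *wl_pointer;
    struct wl_keyboard *wl_keyboard;
    struct wl_touch    *wl_touch;
};

static void seat_capabilities(void *data, struct wl_seat *seat,
                              uint32_t capabilities)
{
    struct our_state *state = data;

    if (capabilities & WL_SEAT_CAPABILITY_POINTER)
        state->wl_pointer = wl_seat_get_pointer(seat);
    if (capabilities & WL_SEAT_CAPABILITY_KEYBOARD)
        state->wl_keyboard = wl_seat_get_keyboard(seat);
    if (capabilities & WL_SEAT_CAPABILITY_TOUCH)
        state->wl_touch = wl_seat_get_touch(seat);
    /* ...then add a listener to each object we obtained. */
}

static void seat_name(void *data, struct wl_seat *seat, const char *name)
{
    /* e.g. "seat0" */
}

static const struct wl_seat_listener seat_listener = {
    .capabilities = seat_capabilities,
    .name = seat_name,
};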

A concept of serial IDs is introduced, each input event is associated
with an ID that needs to be sent back so that the server can decide
whether to respect it or not.

Another concept, the input frame, is introduced: each input event is
actually fragmented into multiple ones, which are sent separately, until
a final "frame" event is received, indicating that they were a single set
of states belonging together to the same input.
We're advised to buffer things until we actually receive that event. That
mindset goes along with the drawing on wl_surface; it's efficient.

The book then dives into each input type.

The first one is pointer input, returning a `wl_pointer`.
It has all the usual events we would guess: from enter, to leave, to
button clicks, scrolling/axis, etc.
We can notice the serial id being included.
We can create a cursor for the pointing device using the set_cursor
request and passing a surface.
It's interesting how the axis source is well-defined.

The second one is keyboard input, returning a `wl_keyboard`.
We get an explanation of XKB, keymaps, and scancodes; this shouldn't
be new for people who have been using X11. xkbcommon is the standalone
library offering the translation from scancodes to keymap symbols.
We receive the keymap from the wl_keyboard in an event called "keymap",
which has a format and a file descriptor. I would've guessed it would be
a string, but no.
Quote:Bulk data like this is transferred over file descriptors.
We could simply read from the file descriptor, but in general it's
recommended to mmap it instead.
Well… that's something new!
The keymap mmap seems to fail on my machine for the example though;
I had to use plain malloc and read the file descriptor manually instead.
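For reference, a hedged sketch of what the keymap handler aims to do
(newer wl_keyboard versions expect MAP_PRIVATE for this fd, which might
be related to the failure; the our_state fields are assumptions):

Code:
#include <sys/mman.h>
#include <unistd.h>
#include <wayland-client.h>
#include <xkbcommon/xkbcommon.h>

struct our_state {
    struct xkb_context *xkb_context;
    struct xkb_keymap  *xkb_keymap;
    struct xkb_state   *xkb_state;
};

static void keyboard_keymap(void *data, struct wl_keyboard *keyboard,
                            uint32_t format, int32_t fd, uint32_t size)
{
    struct our_state *state = data;

    if (format != WL_KEYBOARD_KEYMAP_FORMAT_XKB_V1) {
        close(fd);
        return;
    }

    /* Recent wl_keyboard versions require the fd to be mapped MAP_PRIVATE. */
    char *map_str = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map_str == MAP_FAILED) {
        close(fd);
        return;
    }

    state->xkb_keymap = xkb_keymap_new_from_string(state->xkb_context,
            map_str, XKB_KEYMAP_FORMAT_TEXT_V1, XKB_KEYMAP_COMPILE_NO_FLAGS);
    munmap(map_str, size);
    close(fd);

    state->xkb_state = xkb_state_new(state->xkb_keymap);
    /* unref'ing any previous keymap/state is omitted here */
}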

The keyboard events themselves are also somewhat obvious, key, key_state,
modifiers.
I like that they are separated, maybe that should fix some issues I've
personally had while manipulating keys in X11 when they get modified.
Key repeat event, alright...

The third one is touch input, returning a `wl_touch`.
Now, that makes it easy to go to the next level on multi-touch screens!
The "frame" event makes a lot of sense in this case, when multiple
fingers press the screen, each finger having a different id.
"down", "up", "motion", yep, that's nice.

Now for the example code.
That's extensive, new globals, states, and listeners everywhere, glue
code programming.
We add the wl_seat, wl_keyboard, wl_pointer, wl_touch.
Set the wl_seat in the listener for the registry, and set listeners for
the capabilities that it supports.
We then create storage structures for pointer events, and obviously also
add them to our global state.
After that we can check whether we have the pointer capability, set the
pointer, and add the many related listeners that will update the new
structure in our global state.
Interestingly, I've discovered that my touchpad supports 3 touch input
buttons: single finger, two fingers, and three fingers (similar to
middle mouse).

I think in general this could be always present and people would just
need to handle the "frame" event instead of all this.
We do a similar thing for the keyboard, but this time to compile we need
libxkbcommon.
We also get a glimpse of the wl_array object helpers, with their
wl_array_for_each.

After these chapters, I think I have a better idea of Wayland mechanisms
and way of thinking. Having things defined in a protocol defining objects
and interfaces makes it easy to know what to expect, yet it also somehow
decouples things a bit too much.
Still, if the glue code is present it's a really clear and clean way to
handle things.
movq
Long time nixers
7 XDG shell basics

So far, we’ve dealt with “display server stuff”, now we’re entering the realm of “window management”.

So, an `xdg_surface` is a `wl_surface` with the additional meaning of “this is a standard desktop window”. It’s like the relationship between a “drawable” and a “window” in X11, isn’t it?

The configure / ack_configure dance: This is how pending changes are finally “committed”.

`xdg_toplevel` finally is a “real” application window. There’s a clear object hierarchy, which is nice.

Oh, so there’s an “app ID”. Is this the equivalent of X11’s “WM_CLASS” property? Is this what wayland window managers will use to apply rules to windows? (An “ID” is something very different from a “class”, though. Hmm, we’ll see.)

And there’s the working example of a client. At the beginning, we set up a listener and then call `wl_display_roundtrip()`, which, I think, is a big misnomer. To me, “roundtrip” means: “We go back and forth once.” Like a roundtrip of an ICMP ping. But the manual page says:

Quote: int wl_display_roundtrip (struct wl_display * display)
Block until all pending request are processed by the server

This function blocks until the server has processed all
currently issued requests by sending a request to the display
server and waiting for a reply before returning.

So, this can be more than literally one “roundtrip on the networking level”, right? It’s actually something like “wl_display_process_pending()”?

This whole business of having 1000 objects and listeners and all that is something that I’ll have to get used to. As is the asynchronous nature of everything. I’m not super happy with it, yet. (And I’m also not convinced that so many things have to be asynchronous these days.) It’s clear and well structured, but it does make the code and the events harder to follow. At least that’s my initial impression of it.

It’s a bit strange that we draw the buffer in `xdg_surface_configure()`. More so, we first acknowledge the configure event and then draw? Maybe I’m too stuck in X11 terminology, where “configure” meant something else entirely? Is drawing unrelated to configuration anyway and we just use the “configure” event to know when a surface is ready to be drawn into?

Our window with the XOR pattern is shown on screen. But, there are no window decorations. So you’re telling me, the default is to have no decorations? What do I have to do to get them, implement another extension? Unfortunate. Or is this just a property of Weston?


8 Surfaces in depth

The moving XOR pattern introduces draw callbacks. That’s more like what I expected in chapter 7. Kind of. This callback fires for every new frame, which is required for animations/movies/games.

I’m not imaginative enough to understand why we need surface regions.

Wow, subsurfaces? At first, I thought this is very much not needed (this was once used in X11 a lot, but eventually abandoned because it was simply too convoluted). They give some good reasons for using them, but hmm … Not 100% convinced, yet.

Wayland provides HiDPI information per output now. How did that work in X11 again? My client had to manually query XRandR, right?


9 Seats: Handling input

Quote:A single input event from an input device may be broken up into several Wayland events for practical reasons.

Must be fun for toolkits to reassemble that stuff.

Quote:there is strictly speaking an uppercase version of 'ß', it's hardly ever used and certainly never typed
Hey, hey! I have an ẞ on my keyboard and it’s Shift+ß! (Okay, I admit it, I have a custom keymap.)

Okay, I admit that I only skimmed over this chapter. I find input handling to be mostly dull.
seninha
Long time nixers
7. XDG Shell Basics

Chapter 7 deals with the window (or surface) management.

We have the xdg_surface interface, a window-like surface obtained from a wl_surface.

(03-04-2021, 02:42 PM)movq Wrote: It’s like the relationship between a “drawable” and a “window” in X11, isn’t it?
Yeah, it seems like it. Except that xdg_surfaces are toplevel windows, while in X11 you can have non-toplevel windows down the hierarchy.

Instead of override-redirect windows which manage themselves, we have popups and the positioner interface.
Popups and toplevel windows are the two kinds of XDG surfaces.
If I understood correctly, the configure and ack_configure event and request have the same semantics as the configure request and configure notify events on X11.

Toplevel windows have events related to minimization and maximization, setting titles and so on.
To create an XDG toplevel from an XDG surface we use the get_toplevel request of the xdg_surface interface.
The set_app_id request is to Wayland what changing the window class property is to X11.
I don't know what the semantic differences between an ID and a class are, but I think they are used to apply different management rules to different sets of windows.

8. Surfaces in Depth

Chapter 8 deals with the intrinsics of the surface interface, their events/requests, etc.

Surfaces have a pending state, which is their state while being drawn, and an applied state, which is the state once the surface is committed and both server and client agree that it represents a consistent state.
There is also the initial state, which is related to the unmapped state in X11.

The compositor uses the frame callback request to tell the client it is ready for a new frame.
With surface damaging, we can request the compositor to redraw part of a surface.

Subsurfaces are what subwindows are in X11.
They are constrained by the bounds of their parent surface, have a z-order relative to each other, and have an origin for their position relative to the parent surface.

In Wayland, there is a scale factor attached to each output, and clients need to apply this scale factor to their interfaces.

9. Seats: Handling Input

We deal with input with the wl_seat interface.

Quote:The server sends the client a capabilities event to signal what kinds of input devices are supported by this seat (represented by a bitfield of capability values) and the client can bind to the input devices it wishes to use accordingly.
Depending on what inputs we have at our seat, we can use different requests and interfaces.

The introduction to this chapter also deals with input grabbing, one of the most misunderstood topics on X11.
On Wayland, the server decides whether to accept an input grab for the client, so the client cannot abuse it.

To use an input device, we first need to get it from the wl_seat interface. Then, events are sent to it.

Like movq, I only skimmed through this chapter.
But most of the concepts seem to be straightforward.
mcol
Nixers
(23-03-2021, 01:51 PM)movq Wrote: Changing the tiling layout. That requires resizing all currently visible clients and we want to do that atomically. An X11 WM like dwm just loops over the clients, resizes them, done. Sway has to “reach deeper into our shell implementations to atomically syncronize the resizing of several clients at once”. Not sure what that means, haven’t read the code.

...

(Another thing I just realized: You can’t really enforce atomicity anyway. sway might resize/move/whatever my terminal emulator atomically, okay, good. But the terminal then tells, let’s say, Vim that the window has been resized. That’s done using SIGWINCH, which takes some time to be emitted and processed. I’m not saying Wayland’s efforts to promote atomicity are pointless, but it’s still something to keep in mind.)

(Ideally a sway dev might correct me where I'm wrong but) as I understand it from having spelunked in the sway source, sway will loop over each client to do a resize on it while in a pre-commit state. Clients that have listeners subscribed to these events (probably all of them) will respond by reconfiguring their content (or in the case of a terminal running Vim, will send it a SIGWINCH) and setting some damage, which is noted by the compositor. When the compositor finishes its loop it has accumulated a bunch of damage and then commits it and re-renders the damaged regions. By the time it comes to committing, it's possible that clients will have dealt with the change and submitted their damage, so that the change of both WM and client content is rendered simultaneously. Of course a TUI is still a bit outside of this dance so might update a bit later, but with e.g. GTK clients there doesn't appear to be a content update lagging behind the resizing. When you look out for it, or when you go back to X after using wayland for a while, the difference is striking.
venam
Administrators
Chapter 10-12

XDG Shell in depth
Oh a mention of client-side decoration, finally.

The xdg_toplevel has states that can be updated, it knows it needs to
update them whenever it receives a configure event, then it sends back
an ack_configure when it's done updating them.
That's very much async!

Popups require a special object requested from the xdg_wm_base called
an xdg_positioner. That's weird.

Wayland clients are responsible for their own decorations, done through
an extension xdg-decoration.
You can request the compositor to take over and perform the interactive
move with the "move" request.
You can also specify resizing edges for a window and ask the compositor
to take over.

We clearly see from these chapters how the base protocol gets extended,
and it's through these countless different extensions that we find
useful functionality. Though, I guess they aren't standard and so that
makes them non-portable.

Chapter 11 is missing, but I guess it would go over clipboard handling,
like what wl-clipboard does, found here: https://github.com/bugaevc/wl-clipboard
Or it could be implemented outside Wayland through a D-Bus service.
Same for chapter 12.

Conclusion:
I think that after reading these chapters I get a better idea of what
Wayland is about. It's all based on defining the protocol before anything
else, and then generating wrappers for the protocol. That makes for a
clearly defined API of what's available.
There's a lot of juggling to get basic things going because of that,
asynchronicity everywhere. Also it seems that when the basic protocol
doesn't fit your case you are forced to use something outside, a specific
extension. Yet, the protocol that's already there seems fine.

I think at this point, I'd be comfortable reading the code of
libraries. I'd especially be looking for the ones that wrap-up all the
code that seems just to be there to fulfill the interfaces but would
mostly be expected to do the same thing everywhere.
I guess I'll start getting into what compositors are available and
test things.
movq
Long time nixers
(10-04-2021, 06:06 AM)venam Wrote: it's through these countless and different extensions that we find useful functionality. Though, I guess they aren't standard and so that makes them non-portable.

And that is my main takeaway from the book. At the core, Wayland is rather simple, and then the protocol extensions on top provide “actual” functionality.

Now, I’m either very stupid or my main concern about Wayland still stands: The compositor, a single program, has to implement all of that. I don’t see a way for, say, clipboard stuff to be implemented entirely in another program? This means when there’s an update to the clipboard protocol, then all compositors that implement it have to be updated as well. Right?

Given that the compositor is both display server and window manager (and people’s opinions on window managing vary a lot), we can expect there to be lots and lots of compositors, eventually. It’ll be absolutely imperative that, say, 2-4 libraries exist which can be used to build a compositor. Otherwise, this is going to be an enormous mess.

It would have been lovely if the protocol encouraged splitting all this into different programs, I think. Something that manages the basic display functions, a window manager, a clipboard implementation, and so on. Like a micro kernel, not a monolithic one. You know, this has always been one of my favorite aspects of the Linux world: You can re-combine things. Wayland appears to make this a bit harder – but it’s out of scope of the book, and I’m just speculating anyway.

Funny enough, reading the book has not yet convinced me to jump on the Wayland bandwagon. Yes, atomicity. Yes, no “override redirect” hack anymore. But other than that? Alright, X11 carries historical baggage, lots of it (stippled lines, looking at you) … and we can never get rid of that if we still want to call it “X11”. That’s probably the issue. Still, maybe a stripped down implementation of X11 that only offers what we really need today? So that we don’t have to reinvent the entire wheel? Hmm. Maybe it doesn’t work that way. Maybe we need a clean cut.


(09-04-2021, 07:02 AM)mcol Wrote:
Thanks! That’s not as bad as I expected, actually. The book made it sound much worse. :)
seninha
Long time nixers
Sorry for not posting during the weekend, I had to stay home with gramps taking care of him and had almost no time.

Chapter 10 continues the topic on the XDG Shell.

Quote:Previously, we created a window at a fixed size of our choosing: 640x480. However, the compositor will often have an opinion about what size our window should assume, and we may want to communicate our preferences as well. Failure to do so will often lead to undesirable behavior, like parts of your window being cut off by a compositor who's trying to tell you to make your surface smaller.

I witnessed that undesirable behavior on X11. I forgot to communicate the size the wm set on the window back to the client, and drag-and-drop refused to work. I wonder if the same undesirable behavior occurs in Wayland.

The client can also request some constraints in the size of the window. This reminds me of the XSizeHints on X11.

Popups implement what in X11 is done with override-redirect windows. While in X the client configures an override-redirect window by itself, in Wayland it is the compositor who configures it using the positioner.

In xmenu I have to calculate the position of the menu relative to the monitor size so the menu is not mapped off the screen. I have to use Xinerama to query monitor information for that. On Wayland, it would be done by the compositor, not by the client itself.

Quote:This is used to allow the compositor to participate in the positioning of popups using its priveleged information, for example to avoid having the popup extend past the edge of the display.

Just like a X11 menu, a Wayland popup needs to grab the input. On xmenu I need to grab the input on the client side. Any client can grab the input. On wayland, it is the compositor who decides whether or not the client can grab the input, and in what conditions. I think it is a saner thing to do, otherwise a client can grab the input when not requested by the user. It is also the compositor who cancels the input grab (when the user presses escape or clicks outside of the popup). On xmenu, I have to cancel the grab manually on those conditions.

Then, the chapter goes to the topic of moving and resizing toplevel windows. On X11, there is an EWMH hint (_NET_WM_MOVERESIZE) for a client with CSD (client-side decoration) to initiate the move or resize operation. On Wayland, this is done with a request. Again, like the popup input grabbing, the client needs to provide an input event serial to the compositor in order to start the interactive operation (so a client does not begin to move unexpectedly).

The chapter continues on the topic of CSD. On my window manager, I ignore any client-side decoration and draw the decoration on any window. I think that all windows should be uniformly decorated by the compositor/window manager.

The chapter is incomplete, as the section on positioners is missing its final part. The positioner is a topic I am really interested in, because of xmenu.

Chapter 11 and 12 are also incomplete.

(11-04-2021, 01:56 PM)movq Wrote: It would have been lovely if the protocol encouraged splitting all this into different programs, I think. Something that manages the basic display functions, a window manager, a clipboard implementation, and so on. Like a micro kernel, not a monolithic one. You know, this has always been one of my favorite aspects of the Linux world: You can re-combine things. Wayland appears to make this a bit harder – but it’s out of scope of the book, and I’m just speculating anyway.

Funny enough, reading the book has not yet convinced me to jump on the Wayland bandwagon. Yes, atomicity. Yes, no “override redirect” hack anymore. But other than that? Alright, X11 carries historical baggage, lots of it (stippled lines, looking at you) … and we can never get rid of that if we still want to call it “X11”. That’s probably the issue. Still, maybe a stripped down implementation of X11 that only offers what we really need today? So that we don’t have to reinvent the entire wheel? Hmm. Maybe it doesn’t work that way. Maybe we need a clean cut.

That's also what I think of Wayland. I tend to combine different applications to compose my graphical environment (and I'm often writing such applications). I overuse the override redirect hack on my programs, as I said before, and they have to know things outside of the scope of the application (like using Xrandr/Xinerama to request the monitor size). Wayland makes it easier (or less hacky) to write such a menu application.