wingolog: a mostly dorky weblog by Andy Wingo

compost, a leaf function compiler for guile
18 February 2014
https://wingolog.org/2014/02/18/compost-a-leaf-function-compiler-for-guile

What's that out by the woodshed? It's a steaming pile -- it's full of bugs -- it's compost, a leaf function compiler for Guile!

Around this time last year, a few of us cooked up some hack-dishes to bring to a potluck for Guile 2.0's release anniversary. Last year, mine was a little OpenGL particle demo.

That demo was neat but it couldn't be as big as I would have liked it to be because it was too slow. So, this year when the potluck season rolled around again I sat down to make a little compiler for the subset of Scheme that you see in inner numeric loops -- bytevector access, arithmetic, and loops.

The result is compost. Compost compiles inner loops into native x86-64 machine code that operates on unboxed values.

As you would imagine, compost-compiled code is a lot faster than code interpreted by Guile's bytecode interpreter. I go from being able to compute and render 5K particles at 60 fps up to 400K particles or so -- an 80-fold improvement. That's swell but it gets sweller. The real advantage is that with fast inner loops, I can solve more complicated problems.

Like this one!

(Videos on the internet are a surprisingly irritating problem. Maybe because it's not a cat? Check wingolog.org/pub/ for various other versions of 1300-bodies-512x416 if that doesn't play for you.)

Last year's demo hard-coded a gravitational attractor at (0, 0, 0). This one has no hard-coded attractor -- instead, each particle attracts every other particle. This so-called n-body simulation is an n-squared problem, so you need to be really efficient with the primitives to scale up, and even then you hit the limit quickly.
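To make the n-squared shape concrete, here is a sketch of a one-axis force accumulation in ordinary Scheme, with the bytevector accessors spelled out. This is not the demo's actual code -- the names, the softening constant, and the one-dimensional simplification are all mine -- but it shows why the work is n times n: every particle's inner loop visits every other particle.

(use-modules (rnrs bytevectors))

;; Hypothetical helpers, same idiom as the f32-ref/f32-set! pair below.
(define (f32-ref bv n) (bytevector-ieee-single-native-ref bv (* n 4)))
(define (f32-set! bv n v) (bytevector-ieee-single-native-set! bv (* n 4) v))

;; For each particle i, sum the pull from every particle j: n * n work.
(define (accumulate-x-forces! fx px n)
  (let lp ((i 0))
    (when (< i n)
      (let inner ((j 0) (acc 0.0))
        (if (< j n)
            (let* ((dx (- (f32-ref px j) (f32-ref px i)))
                   (d2 (+ (* dx dx) 0.01)))  ; softening term avoids dividing by 0
              (inner (1+ j) (+ acc (/ dx d2))))
            (f32-set! fx i acc)))
      (lp (1+ i)))))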

With compost, I can get to about 1650 particles at 60 frames per second, using 700% CPU on this 4-core 2-thread-per-core i7-3770 machine, including display with the free software radeon drivers. Without compost -- that is to say, just with Guile's bytecode virtual machine -- I max out at something more like 120 particles, and only 200% CPU.

The rest of this post describes how compost works. If compilers aren't your thing, replace the rest of the words with cat noises.

meow meow meow meow meow meow meow meow

The interface to compost is of course a macro, define/compost. Here's a little loop to multiply two vectors into a third, element-wise:

(use-modules (compost syntax) (rnrs bytevectors))
(define/compost (multiply-vectors (dst bytevector?)
                                  (a bytevector?)
                                  (b bytevector?)
                                  (start exact-integer?)
                                  (end exact-integer?))
  (let lp ((n start))
    (define (f32-ref bv n)
      (bytevector-ieee-single-native-ref bv (* n 4)))
    (define (f32-set! bv n val)
      (bytevector-ieee-single-native-set! bv (* n 4) val))
    (when (< n end)
      (f32-set! dst n (* (f32-ref a n) (f32-ref b n)))
      (lp (1+ n)))))

It's low-level but that's how we roll. If you evaluate this form and all goes well, it prints out something like this at run-time:

;;; loading /home/wingo/.cache/guile/compost/rmYZoT-multiply-vectors.so

This indicates that compost compiled your code into a shared object at macro-expansion time, and then printed out that message when it loaded it at run-time. If composting succeeds, compost writes out the compiled code into a loadable shared object (.so file). It then residualizes a call to dlopen to load that file at run-time, followed by code to look up the multiply-vectors symbol and create a foreign function. If composting fails, it prints out a warning and falls back to normal Scheme (by residualizing a plain lambda).
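In sketch form, the success path might expand to something like the following, using Guile's (system foreign) module. Treat every detail here as an assumption -- the exact expansion, argument marshalling, and return type are compost's business, and the path is just the one printed above:

;; A sketch only: not compost's literal expansion.
(use-modules (system foreign) (rnrs bytevectors))

(define multiply-vectors
  (let* ((lib (dynamic-link
               "/home/wingo/.cache/guile/compost/rmYZoT-multiply-vectors.so"))
         (f (dynamic-func "multiply-vectors" lib))
         ;; assumed signature: three pointers and two machine integers
         (native (pointer->procedure '* f (list '* '* '* long long))))
    (lambda (dst a b start end)
      ;; assumed marshalling: pass pointers to the bytevector contents
      (native (bytevector->pointer dst) (bytevector->pointer a)
              (bytevector->pointer b) start end))))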

In the beginning of the article, I called compost a "leaf function compiler". Composted functions should be "leaf functions" -- they shouldn't call other functions. This restriction applies only to the low-level code, however. The first step in composting is to run the function through Guile's normal source-to-source optimizers, resulting in a CPS term. The upshot is that you can use many kinds of abstraction inside the function, like the little f32-ref/f32-set! helpers above, but in the end Guile should have inlined or contified them all away. It's a restriction, but hey, this is just a little hack.
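For example, a call to a top-level procedure would presumably make composting bail, because the optimizer can't inline a binding that could be redefined at any time. A made-up illustration, not from compost's test suite:

;; opaque-helper is a top-level procedure, so the call to it survives
;; optimization; the residual call means the function is not a leaf,
;; and compost would warn and fall back to plain Scheme.
(define (opaque-helper x)
  (* x 2.0))

(define/compost (scale-first! (dst bytevector?))
  (bytevector-ieee-single-native-set! dst 0 (opaque-helper 1.0)))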

Let's look at some assembly. We could get disassembly just by calling objdump -d /home/wingo/.cache/guile/compost/rmYZoT-multiply-vectors.so, but let's do it a different way. Let's put that code into a file, say "/tmp/qux.scm", and add on this code at the end:

(define size #e1e8) ;; 100 million
(define f32v (make-f32vector size 2.0))
(multiply-vectors f32v f32v f32v 0 size)

OK. Now we run Guile under GDB:

$ gdb --args guile /tmp/qux.scm 
(gdb) b 'multiply-vectors'
Function "multiply-vectors" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 ('multiply-vectors') pending.
(gdb) r
Starting program: /opt/guile/bin/guile /tmp/qux.scm
[New Thread 0x7ffff604b700 (LWP 13729)]
;;; loading /home/wingo/.cache/guile/compost/Kl0Xpc-multiply-vectors.so

Breakpoint 1, 0x00007ffff5322000 in multiply-vectors () from /home/wingo/.cache/guile/compost/Kl0Xpc-multiply-vectors.so
(gdb) step
multiply-vectors () at /tmp/qux.scm:12
12	    (when (< n end)

Word! GDB knows about the symbol, multiply-vectors. That's top. We are able to step into it, and it prints Scheme code!

Both of these swell things are because compost residualizes its compiled code as ELF files, and actually re-uses Guile's linker. The ELF that we generate can be loaded by dlopen, and its symbol tables and DWARF debugging information are known to GDB.

(In my last article on ELF I mentioned that Guile had no plans to use the system dynamic library loader (dlopen). That's still true; Guile has its own loader. I used the system loader here, though, just because I thought it was a neat hack.)

We can tell GDB to disassemble the next line:

(gdb) set disassemble-next-line on
(gdb) step
9	      (bytevector-ieee-single-native-ref bv (* n 4)))
=> 0x00007ffff532201d <multiply-vectors+29>:	4c 0f af c9	imul   %rcx,%r9
(gdb) 
13	      (f32-set! dst n (* (f32-ref a n) (f32-ref b n)))
=> 0x00007ffff532203b <multiply-vectors+59>:	f2 0f 59 c1	mulsd  %xmm1,%xmm0
   0x00007ffff532203f <multiply-vectors+63>:	49 b9 04 00 00 00 00 00 00 00	movabs $0x4,%r9

GDB does OK with these things, but it doesn't have special support for Scheme, and really you would want column pointers, not just lines. That data is in the DWARF but it's not processed by GDB. Anyway here's the disassembly:

(gdb) disassemble
Dump of assembler code for function multiply-vectors:
   0x00007ffff5322000 <+0>:	push   %rbx
   0x00007ffff5322001 <+1>:	push   %rbp
   0x00007ffff5322002 <+2>:	push   %r12
   0x00007ffff5322004 <+4>:	push   %r13
   0x00007ffff5322006 <+6>:	push   %r14
   0x00007ffff5322008 <+8>:	push   %r15
   0x00007ffff532200a <+10>:	cmp    %r8,%rcx
   0x00007ffff532200d <+13>:	jge    0x7ffff5322060 <multiply-vectors+96>
   0x00007ffff5322013 <+19>:	movabs $0x4,%r9
   0x00007ffff532201d <+29>:	imul   %rcx,%r9
   0x00007ffff5322021 <+33>:	cvtss2sd (%rsi,%r9,1),%xmm0
   0x00007ffff5322027 <+39>:	movabs $0x4,%r9
   0x00007ffff5322031 <+49>:	imul   %rcx,%r9
   0x00007ffff5322035 <+53>:	cvtss2sd (%rdx,%r9,1),%xmm1
=> 0x00007ffff532203b <+59>:	mulsd  %xmm1,%xmm0
   0x00007ffff532203f <+63>:	movabs $0x4,%r9
   0x00007ffff5322049 <+73>:	imul   %rcx,%r9
   0x00007ffff532204d <+77>:	cvtsd2ss %xmm0,%xmm15
   0x00007ffff5322052 <+82>:	movss  %xmm15,(%rdi,%r9,1)
   0x00007ffff5322058 <+88>:	inc    %rcx
   0x00007ffff532205b <+91>:	jmpq   0x7ffff532200a <multiply-vectors+10>
   0x00007ffff5322060 <+96>:	movabs $0x804,%rdi
   0x00007ffff532206a <+106>:	mov    %rdi,%rax
   0x00007ffff532206d <+109>:	pop    %r15
   0x00007ffff532206f <+111>:	pop    %r14
   0x00007ffff5322071 <+113>:	pop    %r13
   0x00007ffff5322073 <+115>:	pop    %r12
   0x00007ffff5322075 <+117>:	pop    %rbp
   0x00007ffff5322076 <+118>:	pop    %rbx
   0x00007ffff5322077 <+119>:	retq   
End of assembler dump.
(gdb) 

Now if you know assembly, this is pretty lame stuff -- it saves registers it doesn't use, it multiplies instead of adds to get the bytevector indexes, it loads constants many times, etc. It's a proof of concept. Sure beats heap-allocated floating-point numbers, though.

safety and semantics

Compost's goal is to match Guile's semantics, while processing values with native machine operations. This means that it needs to assign concrete types and representations to all values in the function. To do this, it uses the preconditions, return types from primitive operations, and types from constant literals to infer other types in the function. If it succeeds, it then chooses representations (like "double-precision floating point") and assigns values to registers. If the types don't check out, or something is unsupported, compilation bails and runtime will fall back on Guile's normal execution engine.
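To make that concrete, here is roughly how inference flows through the multiply-vectors example from the top of the post. This is an informal sketch of the idea, not compost's actual diagnostics:

;; start, end : exact integers          -- stated preconditions
;; n          : integer                 -- bound to start, stepped with 1+
;; (< n end)                            -- integer comparison
;; (f32-ref a n)                        -- primitive returns a flonum
;; (* (f32-ref a n) (f32-ref b n))      -- flonum * flonum is a flonum
;; so n lives in a general register and the products in XMM registers,
;; which matches the imul/mulsd mix in the disassembly above.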

There are a couple of caveats, however.

One is that compost assumes that small integers do not promote to bignums. We could remove this assumption with better range analysis. Compost does do some other analysis, like sign analysis to prove that the result of sqrt is real. Complex numbers will cause compost to bail.
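For instance, in a kernel that computes a distance, sign analysis can prove the argument to sqrt is non-negative, so the result is real; an unconstrained argument could go complex and kill the compilation. A hypothetical pair of examples:

(sqrt (+ (* dx dx) (* dy dy)))  ; sum of squares: provably non-negative, real sqrt
(sqrt (- x y))                  ; sign unknown: possibly complex, compost bails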

Compost also doesn't check bytevector bounds at run-time. This is terrible. I didn't do it though because to do this nicely you need to separate the bytevector object into two variables: the pointer to the contents and the length. Both should be register-allocated separately, and range analysis would be nice too. Oh well!
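In the meantime the check is the caller's job. A hypothetical guard like this one, run once before entering the composted loop, is the defensive pattern:

;; Hypothetical caller-side guard; the composted code itself won't check.
(define (checked-multiply-vectors dst a b start end)
  (define (f32-length bv) (/ (bytevector-length bv) 4))  ; 4 bytes per f32
  (unless (and (<= 0 start end)
               (<= end (min (f32-length dst) (f32-length a) (f32-length b))))
    (error "bytevector index out of range" start end))
  (multiply-vectors dst a b start end))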

Finally, compost is really limited in terms of the operations it supports. In fact, if register allocation would spill on the function, it bails entirely :)

the source

If it's your thing, have fun over on yon gitorious. Compost needs Guile from git, and the demos need Figl, the GL library. For me this project is an ephemeral thing; a trial for future work, with some utility now, but not something I really want to support. Still, if it's useful to you, have at it.

coda

I woke up this morning at 5 thinking about the universe, and how friggin big it is. I couldn't go back to sleep. All these particles swirling and interacting in parallel, billions upon zillions, careening around, somehow knowing what forces are on them, each somehow a part of the other. I studied physics and I never really boggled at the universe in the way I did this morning thinking about this n-body simulation. Strange stuff.

I remember back in college when I was losing my catholicism and I would be at a concert or a dance show or something and I would think "what is this, this is just particles vibrating, bodies moving in the nothing; there is no meaning here." It was a thought of despair and unmooring. I don't suffer any more over it, but mostly because I don't think about it any more. I still don't believe in some omniscient skydude or skylady, but if I did, I know he or she has got a matrix somewhere with every particle's position and velocity.

Swirl on, friends, and until next time, happy hacking.

opengl particle simulation in guile
16 February 2013
https://wingolog.org/2013/02/16/opengl-particle-simulation-in-guile

Hello, internets! Did you know that today marks two years of Guile 2? Word!

Guile 2 was a major upgrade to Guile's performance and expressiveness as a language, and I can say as a user that it is so nice to be able to rely on such a capable foundation for hacking the hack.

To celebrate, we organized a little birthday hack-feast -- a communal potluck of programs that Guilers brought together to share with each other. Like last year, we'll collect them all and publish a summary article on Planet GNU later on this week. This is an article about the hack that I worked on.

figl

Figl is a Guile binding to OpenGL and related libraries, written using Guile's dynamic FFI.

Figl is a collaboration between myself and Guile super-hacker Daniel Hartwig. It's pretty complete, with a generated binding for all of OpenGL 2.1 based on the specification files. It doesn't have a good web site yet -- it might change name, and probably we'll move to savannah -- but it does have great documentation in texinfo format, built with the source.

But that's not my potluck program. My program is a little demo particle simulation built on Figl.

Source available here. It's a fairly modern implementation using vertex buffer objects, doing client-side geometry updates.

To run, assuming you have some version of Guile 2.0 installed:

git clone git://gitorious.org/guile-figl/guile-figl.git
cd guile-figl
autoreconf -vif
./configure
make
# 5000 particles
./env guile examples/particle-system/vbo.scm 5000

You'll need OpenGL development packages installed -- for the .so links, not the header files.

performance

If you look closely at the video, you'll see that the program itself is rendering at 60 fps, using 260% CPU. (The CPU utilization drops as soon as the video starts, because of the screen capture program.) This is because the updates to the particle positions and to the vertices are done using as many processors as you have -- 4, in my case (2 physical cores with 2 hardware threads each). I've gotten it up to 310% or so, which is pretty good utilization considering that the draw itself is single-threaded and there are other things that need to use the CPU, like the X server and the compositor.

Each particle has a position and a velocity, and is attracted to the center using a gravity-like force. If you consider that updating a particle's position should probably take about 100 cycles (for a square root, some divisions, and a few other floating point operations), then my 2.4GHz processors should allow for a particle simulation with 1.6 million particles, updated at 60fps. In that context, getting to 5K particles at 60 fps is not so impressive -- it's sub-optimal by a factor of 320.
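For the curious, the arithmetic behind those numbers, as something you could paste at the REPL:

;; 2.4e9 cycles/s over 100 cycles/update gives updates per second per
;; thread; times 4 hardware threads, divided by 60 frames per second:
(/ (* (/ 2.4e9 100) 4) 60)   ; => 1600000.0 particles per frame
(/ 1.6e6 5e3)                ; => 320.0, the sub-optimality factor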

This slowness is oddly exciting to me. It's been a while since I've felt Guile to be slow, and this is a great test case. The two major things to work on next in Guile are its newer register-based virtual machine, which should speed things up by a factor of 2 or so in this benchmark; and an ahead-of-time native compiler, which should add another factor of 5 perhaps. Optimizations to an ahead-of-time native compiler could add another factor of 2, and a JIT would probably make up the rest of the difference.

One thing I was really wondering about when making this experiment was how the garbage collector would affect the feeling of the animation. Back when I worked with the excellent folks at Oblong, I realized that graphics could be held to a much higher standard of smoothness than I was used to. Guile allocates floating-point numbers on the heap and uses the Boehm-Demers-Weiser conservative collector, and I was a bit skeptical that things could be smooth because of the pause times.

I was pleasantly surprised that the result was very smooth, without noticeable GC hiccups. However, GC is noticeable on an overloaded system. Since the simulation advances time by 1/60th of a second at each frame, regardless of the actual frame rate, differing frame processing times under load can show a sort of "sticky-zipper" phenomenon.
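The fixed-step scheme is simple; in sketch form (update-particles! and render-frame! are stand-ins, not the demo's actual procedure names):

;; Fixed timestep: each frame advances simulation time by exactly 1/60 s,
;; so under load, frames arrive late but the simulation never speeds up
;; to compensate -- hence the sticky-zipper look.
(define dt (/ 1.0 60.0))
(define (draw-one-frame!)
  (update-particles! dt)   ; stand-in: integrate positions and velocities
  (render-frame!))         ; stand-in: upload vertices and draw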

If I ever got to optimal update performance, I'd probably need to use a better graphics card than my Intel i965 embedded controller. And of course vertex shaders are really the way to go if I cared about the effect and not the compiler -- but I'm more interested in compilers than graphics, so I'm OK with the state of things.

Finally I would note that it has been pure pleasure to program with general, high-level macros, knowing that the optimizer would transform the code into really efficient low-level code. Of course I had to check the optimizations, with the ,optimize command at the REPL. I even had to add a couple of transformations to the optimizer at one point, but I was able to do that without restarting the program, which was quite excellent.

So that's my happy birthday dish for the Guile 2 potluck. More on other programs after the hack-feast is over; until then, see you in #guile!

so you want to build a compositor
26 July 2008
https://wingolog.org/2008/07/26/so-you-want-to-build-a-compositor

a dialog with someone i don't know

One way to think about it is that it's like driving a car down the road, and suddenly swapping the steering wheel and brakes out for a tiller and gear shifter. And having to downshift for braking until you learn that the brakes moved to the turn indicator lever. By trial and error.

That's the internet's Joey Hess, a wonderful writer, who recently switched window managers.

A while back I discussed some new interaction possibilities, and a lot of them hinged on the relation between people and the set of tasks that they do on the computer -- not just the tasks themselves, but the relationships between the tasks, and with focus.

In the free desktop, the locus of this experience is in the window manager. Again, Joey tries to give analogies:

Another way to look at it is adopting a new philosophy. Or, in some cases a cult. (In some cases, with crazy cult leaders.) Whether they use Windows or a Mac, or Linux, most computer users are members of a big established religion, with some implicit assumptions, like "thy windows shall be overlapping, like papers on the desktop, and thou shalt move them with thy mouse".

The obvious step if you want to apostatize yourself from the "desktop metaphor", then, is to start hacking window managers. That's how I spent my hack-time in the last month and a half; this writing is an attempt at exegesis.

descent from the mountain, source in hand

I wanted to make a 3D compositing window manager. By 3D, I mean that I wanted the pixels behind the monitor glass to come entirely from one process' OpenGL. By "compositing", I mean that the graphical output from other programs should be redirected through my program. (Compositors don't necessarily need to be implemented with OpenGL; xcompmgr and metacity's compositor are implemented with XRender.)

As an implementation strategy, I took this opportunity to check out Clutter, a GL-based canvas library. What follows is a minimal translation into C of the basic concepts, a broken but fun-to-hack substrate for experimentation.

int main (int argc, char *argv[])
{
    prep_clutter (&argc, &argv);
    prep_root ();
    prep_overlay ();
    prep_stage ();
    prep_input ();

    g_main_loop_run (g_main_loop_new (NULL, FALSE));

    return 0;
}

The main function is pretty simple. We'll be looking at the various functions one by one, but out of order. Let's start with prep_root:

Display *dpy;
Window root;
void prep_root (void)
{
    dpy = clutter_x11_get_default_display ();
    root = DefaultRootWindow (dpy);

    XCompositeRedirectSubwindows (dpy, root, CompositeRedirectAutomatic);
    XSelectInput (dpy, root, SubstructureNotifyMask);
}

This function does two things of interest: first, it tells the X server to redirect output of all windows into backing pixmaps, such that the contents of all mapped windows are available, even if the X server does not think that they are visible. Second, we ask the X server to give us notification when windows come and go, and when they change size.

Window overlay;
void prep_overlay (void)
{
    overlay = XCompositeGetOverlayWindow (dpy, root);
    allow_input_passthrough (overlay);
}

The Composite extension, which allows us to redirect window output, also provides for the existence of an "overlay" window, one that is above all other windows. You can draw directly to the overlay window, but the normal way that you use it is as a parent window -- you create your own window, then reparent it to the overlay.

Composite is pretty cool, except that it is only for output -- that is, you can put the output of a window anywhere, like on the corner of a Compiz cube, but you can't redirect input. By that I mean that when you have a rotated Compiz cube, you can't interact with the window.

The reason for this is that in order to interact with a window, for example to click a button in it, the X server needs to know how to map the pointer position to a position inside a certain window. This is a tricky problem that needs to be solved inside the X server, and no solution has been merged yet.

The upshot is that what happens today with compositing window managers is that the compositor is very careful to make sure that the underlying X window is exactly positioned behind where the compositor is drawing it. Then, when the user clicks on what essentially is the redirected image of the button, the click actually falls behind the overlay to the actual X window.

Hence, the need to allow events (clicks, pointer motion, etc) to pass through the overlay:

void allow_input_passthrough (Window w)
{
    XserverRegion region = XFixesCreateRegion (dpy, NULL, 0);
 
    XFixesSetWindowShapeRegion (dpy, w, ShapeBounding, 0, 0, 0);
    XFixesSetWindowShapeRegion (dpy, w, ShapeInput, 0, 0, region);

    XFixesDestroyRegion (dpy, region);
}

Basically we call a few undocumented functions, whose goal is to tell the X server that the window in question is not to receive pointer input events.

ClutterActor *stage;
Window stage_win;
void prep_stage (void)
{
    ClutterColor color;

    stage = clutter_stage_get_default ();
    clutter_stage_fullscreen (CLUTTER_STAGE (stage));
    stage_win = clutter_x11_get_stage_window (CLUTTER_STAGE (stage));
    clutter_color_parse ("DarkSlateGrey", &color);
    clutter_stage_set_color (CLUTTER_STAGE (stage), &color);
    
    XReparentWindow (dpy, stage_win, overlay, 0, 0);
    XSelectInput (dpy, stage_win, ExposureMask);
    allow_input_passthrough (stage_win);
    
    clutter_actor_show_all (stage);
}

The next step is to create the clutter "stage", the GLX window to which all of our output will go. We reparent the stage to the overlay, so that it will always be on top, then we allow input events to pass through it as well.

Window input;
void prep_input (void)
{
    XWindowAttributes attr;

    XGetWindowAttributes (dpy, root, &attr);
    input = XCreateWindow (dpy, root,
                           0, 0,  /* x, y */
                           attr.width, attr.height,
                           0, 0, /* border width, depth */
                           InputOnly, DefaultVisual (dpy, 0), 0, NULL);
    XSelectInput (dpy, input,
                  StructureNotifyMask | FocusChangeMask | PointerMotionMask
                  | KeyPressMask | KeyReleaseMask | ButtonPressMask
                  | ButtonReleaseMask | PropertyChangeMask);
    XMapWindow (dpy, input);
    XSetInputFocus (dpy, input, RevertToPointerRoot, CurrentTime);

    attach_event_source ();
}

So if the events fall through the stage, and fall through the overlay, what happens if they don't fall onto a window? Well, you probably want the stage to get the last crack at them, instead of the root window. Hence this terrible trick: make a separate fullscreen input-only window, located below all windows except the root window. We select for all input on this window, so that we get e.g. key presses as well.

Then we install a GSource to process pending X events, redirecting events from the input window to the stage window:

GPollFD event_poll_fd;
static GSourceFuncs event_funcs = {
    event_prepare,
    event_check,
    event_dispatch,
    NULL
};
void attach_event_source (void)
{
    GSource *source;

    source = g_source_new (&event_funcs, sizeof (GSource));

    event_poll_fd.fd = ConnectionNumber (dpy);
    event_poll_fd.events = G_IO_IN;

    g_source_set_priority (source, CLUTTER_PRIORITY_EVENTS);
    g_source_add_poll (source, &event_poll_fd);
    g_source_set_can_recurse (source, TRUE);
    g_source_attach (source, NULL);
}

The prepare() and check() functions are a bit boring, but the dispatch() is worth a look:

static gboolean
event_dispatch (GSource *source, GSourceFunc callback, gpointer user_data)
{
    ClutterEvent *event;
    XEvent xevent;

    clutter_threads_enter ();

    while (!clutter_events_pending () && XPending (dpy)) {
        XNextEvent (dpy, &xevent);
        
        /* here the trickiness */
        if (xevent.xany.window == input) {
            xevent.xany.window = stage_win;
        }

        clutter_x11_handle_event (&xevent);
    }
    
    if ((event = clutter_event_get ())) {
        clutter_do_event (event);
        clutter_event_free (event);
    }

    clutter_threads_leave ();

    return TRUE;
}

We just pick off events, translating the input to the stage window, then give the events to clutter.

Obviously this is a crapload of code just to shunt events around; normally with clutter this is not necessary, but with the X compositing architecture being like it is, we have to roll our own GSource to do the event translation.

This brings me back to the beginning, prep_clutter:

GType texture_pixmap_type;
void prep_clutter (int *argc, char ***argv)
{
    clutter_x11_disable_event_retrieval ();
    clutter_init (argc, argv);
    clutter_x11_add_filter (event_filter, NULL);
    
    if (getenv ("NO_TFP"))
        texture_pixmap_type = CLUTTER_X11_TYPE_TEXTURE_PIXMAP;
    else
        texture_pixmap_type = CLUTTER_GLX_TYPE_TEXTURE_PIXMAP;
}

Here we see that you have to turn off event retrieval before calling clutter_init. Then we tell Clutter to call our event_filter whenever it gets an X event. I'll get back to that function in a minute.

I suppose that this is as good a time as any to speak of getting window contents into OpenGL. The basic idea is that since we're already storing all of the window's pixels into a backing pixmap (because we are redirecting all output), we don't actually need to transfer the pixels back over the wire to the compositor to have the compositor show the window on the screen. There is a GLX extension, texture from pixmap or TFP, which tells the libGL implementation to get a texture's contents from a pixmap that it already knows about, automatically updating when the pixmap updates.

I digress for a moment to mention something that I did not know when I was getting into all of this. There are two kinds of GLX rendering, direct and indirect. Perhaps you recall these words from running glxinfo to see whether your GL implementation has hardware acceleration or not. In fact, direct vs. indirect lies on a different axis from accelerated vs. software rendering. What "indirect rendering" means is that instead of talking directly to the graphics card through the kernel, a GL application sends all of its commands over the wire to the X server, which then does the rendering.

But that server-side rendering may be accelerated; indeed, that was the whole point of AIGLX. Technically, with free drivers, this is implemented via having the server's GLX rendering use the same DRI library as libGL does when doing direct rendering. It is the DRI library which does the accelerated communication with the hardware-specific kernel module (the DRM module).

Indirect rendering is slower, however, and it is advantageous to use direct rendering when possible. The problem comes when wanting to use TFP and direct rendering; if I want to bind a GL texture to a pixmap corresponding to some other application, and I am not going to go through the X server to do so, then obviously the kernel itself has to know about X drawables. If I have a named pixmap open from one process that was created from another process, there needs to be a unified memory manager in the kernel:

It's a crude approximation but the most crucial difference between the nvidia architecture and DRI/DRM is that nvidia actually have a memory manager - and a unified one at that. Without a memory manager it's impossible to allocate offscreen buffers (hence, no pbuffers or fbos) and without a unified memory manager it's impossible to reconcile 2D and 3D operations (hence no redirected Direct Rendering). The Accelerated Indirect GLX feature that the freetards were busy raving about is an endless source of confusion - and ultimately a hack to workaround their lack of a memory manager.

(That from Linux hater, someone I have warmed to in recent weeks -- I basically agree with Jeremy Allison's position.)

Anyway. You can't do direct TFP with free drivers, is the conclusion of that digression. That's OK, because you can always force indirect rendering via setting LIBGL_ALWAYS_INDIRECT=1 in your environment. But you can't do indirect rendering in Xephyr, only direct, so it's tough to test out window managers. Fortunately, you can get window contents into clutter without TFP, using the fallbacks -- hence the NO_TFP check in prep_clutter.

courage, pilgrim

OK, at this point we're almost there. Here's the event handler:

static ClutterX11FilterReturn
event_filter (XEvent *ev, ClutterEvent *cev, gpointer unused)
{
    switch (ev->type) {
    case CreateNotify:
        window_created (ev->xcreatewindow.window);
        return CLUTTER_X11_FILTER_REMOVE;

    default:
        return CLUTTER_X11_FILTER_CONTINUE;
    }
}

It's pretty simple: an X event is a tagged union. We respond to CreateNotify events, which come because we selected for SubstructureNotifyMask on the root window. It turns out we don't need any more events, because clutter handles the rest:

static void
window_created (Window w)
{
    XWindowAttributes attr;    
    ClutterActor *tex;

    if (w == overlay)
        return;

    XGetWindowAttributes (dpy, w, &attr);
    if (attr.class == InputOnly)
        return;
    
    tex = g_object_new (texture_pixmap_type, "window", w,
                        "automatic-updates", TRUE, NULL);

    g_signal_connect (tex, "notify::mapped",
                      G_CALLBACK (window_mapped_changed), NULL);
    g_signal_connect (tex, "notify::window-x",
                      G_CALLBACK (window_position_changed), NULL);
    g_signal_connect (tex, "notify::window-y",
                      G_CALLBACK (window_position_changed), NULL);

    {
        gint mapped;
        g_object_get (tex, "mapped", &mapped, NULL);
        if (mapped)
            window_mapped_changed (tex, NULL, NULL);
    }
}

Once we create the texture, Clutter will listen for changes in the window's state -- resizes, maps or unmaps, and ultimately its destruction. (Or at least it will, once #1020 is applied.) Here we connect with the minimum bits necessary to make a window usable. Finally, the functions to show, hide, and resize windows:

static void
window_position_changed (ClutterActor *tex, GParamSpec *pspec, gpointer unused)
{
    gint x, y, window_x, window_y;
    g_object_get (tex, "x", &x, "y", &y, "window-x", &window_x,
                  "window-y", &window_y, NULL);
    if (x != window_x || y != window_y)
        clutter_actor_set_position (tex, window_x, window_y);
}

static void
window_mapped_changed (ClutterActor *tex, GParamSpec *pspec, gpointer unused)
{
    gint mapped;
    g_object_get (tex, "mapped", &mapped, NULL);

    if (mapped) {
        clutter_container_add_actor (CLUTTER_CONTAINER (stage), tex);
        clutter_actor_show (tex);
        window_position_changed (tex, NULL, NULL);
    } else {
        clutter_container_remove_actor (CLUTTER_CONTAINER (stage), tex);
    }
}

final notes

Code is here, compile as:

 gcc `pkg-config --cflags --libs clutter-glx-0.8` \
     -o mini-clutter-wm mini-clutter-wm.c 

You might want to run it in a Xephyr; do it with the script here:

 ./run-xephyr ./mini-clutter-wm

Joey again:

So ideally, "I switched to a new window manager" doesn't mean "my screen has some different widgets on it now". It means "I'm looking at the screen with new eyes."

This blog is already way too long, so I'll revisit interface concepts at some point in the future. For now I just wanted to pull together this knowledge in one place. Happy hacking!

regarding decadence
10 June 2008
https://wingolog.org/2008/06/10/regarding-decadence

When people notice that they are bored, they stop being bored.

Paris 1968

Regarding my last post, Benjamin said to me that "the biggest omission was a solution to the problem". Well, I don't have a solution, not really anyway. But while the topic is fresh, and it seems to have resonated with a few people, I do have some thoughts on the topic.

technology

I seem to recall Christian complaining about the new GDM animations on login, that they feel slow. Well that's because we have a static conception of interface, that things are static by default, and that motion is the abnormal case. This is a mindset that distorts how we make programs (static with sprinkled animation), and one that does not reflect the way that hardware works these days.

Three years on, Jon Smirl's breakdown of the Free graphics stack is still relevant. Go read it! But instead of his conclusion ("make free applications faster through a new X server"), I would draw the line differently: let us make our applications with GL. Then the applications that we make will be new, 100 times a second.

This hertzian rebirth liberates the computer to react to us, and to anticipate our desires.

For example, as the user starts to focus on an actionable surface, that surface might become more prominent at the same time as other, related actions present themselves. To take a most basic example, if you are typing in a word processor, the mouse cursor goes away. (Or can be made to do so.) If you then grab the mouse again, probably you want to click somethingorother; yet the clickable surfaces are present even when typing, unnecessarily, and are equally small once you do grab the mouse.

Or to take a more familiar example, the "expanding icon" behaviour of the Mac OS dock contextually expands the icon that you are focusing on, and anticipates focus of adjacent choices; and it's something that we still do not have, years later.

It is for these reasons that I think that a GL-based canvas like Clutter is an excellent option for GTK+. The library provides nouns and verbs that compose to make a dynamic, pleasing interface; and if you are programming in a fast language, you can implement your own nouns and verbs, which then form part of the GL rebirth/draw loop.

Another thing to think about is that the mouse is simply a way of indicating focus and a limited form of selection. Our verbs are point, click, and drag. New input methods are here, or coming soon: things like multitouch, head tracking, and gestural recognition. These input methods do not map exactly to the mouse. In fact, once you have such capabilities, other, more "normal" parts of standard interfaces start to look restricting. Menu bars are made for mice and framebuffers; the panel starts to look superfluous; etc.

Obviously this strategy has problems when it comes to power consumption; I do not know the solution. I assume the OpenGL ES people have thought about this somehow.

audience

Whether or not you think these ideas are good or bad, you can rest assured that they will not come to pass in GNOME as it is now.

The problem, to my mind, is that we have painted ourselves into a corner of serving a fictional clientele that is not ourselves. As Havoc says:

Is GNOME for UNIX and shell users? Is it for Linus Torvalds? Is it for ourselves? Is it for American college students? 35-year-old corporate office workers with an IT staff? And are we willing to tell whoever it's not for to jump in a lake?

...There is a default audience if a project doesn't find a way to choose one deliberately, and it's what I've called "by and for developers." If there's no way to stay out of that gravity, it's better to embrace it wholeheartedly IMO...

I think we should have a project for hackers. A GNOME skunkworks, if you will; a place in which hackers can create something new for ourselves, in which innovation and even failed experimentation is possible and encouraged. If something developed there is useful to a wider public, as we would hope would be the case, then by all means folks can pull it out, and start to put it somewhere more stable.

Note that "for hackers" does not necessarily entail complexity; it entails only that complexity which is necessary. Physics does not strive to make ugly theories, string theory aside; we as hackers, though oft maligned, neither want such things. Working together, with the best spirit of peer review, we could come out with something with power, and simplicity, and diversity.

Such a project would question everything: what is our interface metaphor? Is it a desktop? Is it a space, like Neuromancer? Is it a set of overlapping planes, like in Minority Report? Is it something else? What is it populated with? How do we compose separate parts into one graphical whole? How do we choose parts of the whole to focus on, to interact with? What is the nature of a part of the whole? How do focus and applications interact? Does the phrase "switching applications" make sense, and if so, how do we do so, or start new applications? Do applications exist in windows? How do multiple users interact with one live system, when we run GNOME on our TV and on our laptop on the arm of the sofa, and using the wiimote as an input device?

interlude

I think we can learn a couple of things from the Pyro experiment. In case you don't recall, Pyro was an attempt to bring web developers out of the confines of the browser window, to let them manipulate the whole "desktop". It was a really neat hack. It seems to have failed, also; I don't hear much about its uptake, a year later. At the risk of poking Alex's old wounds, we should probably wonder why.

If I were to guess, I would say that Pyro failed for cultural reasons. Its target "audience" was web developers, and also those parts of GNOME that would be comfortable changing desktop development into web development. But web developers like to work for an audience of users, and of these there were not many.

But more than that, and reasonable people may disagree, I think that Pyro failed because it wasn't exciting to the developers that we do have, to our culture. I don't think this situation was particularly gratifying to Alex, who (I would imagine, I do not know) found more fulfillment in other parts of his life. Ah well. "One must bear a chaos inside to give birth to a dancing star."

Regardless of your judgment on Pyro, I think that there's still loads of interesting work to do on "the client side", not least experiments with the GPU and self-renewing interfaces. So we should embrace the hackers that we do have, and see where that takes us.

steps

Let's take a number of cases of existing applications, and see what they might look like, coming out of the skunkworks.

The web browser breaks out of its four walls.

Marvin is browsing planet gnome. The web page fills up the entire screen. He finds some link of interest, and selects it. The Planet GNOME page seems to recede in distance as the new page fades in to replace the full screen.

He reads for a while, somehow scrolling the page, then realizes he wanted to go back to some other link. He bangs the mouse against the edge of the screen, or gives a certain hand gesture, and the web page shrinks down from fullscreen to just one page in a series of pages fading off into the distance, representing his browsing history. As he runs his mouse over that series, the pages pop up under the cursor, with contents if available. The effect is similar to that of flipping pages in a phone book, or passing the mouse over the Mac OS dock.

The "window manager" expands its scope, and admits new forms of input.

Marvin middle-clicks on a link to open it in a new "window". The existing train of web pages slides left, off the screen, leaving space for the new page to fade in. Marvin reads for a while, then decides he wants to get some hacking done. He holds up his hand, pushes back the space, pans his applications around until he is on Emacs, and lets his fingers fall into a fist. Emacs comes close to him and fills him and the screen with its radiant light.

Alternately, Marvin could pan via moving the mouse while holding down the shift key, or pressing some appropriate key combination that does not conflict with his carefully-crafted emacs keybindings.

A photo browser fills the screen.

Erinn, having just finished hacking the good hack, wants to enjoy her photos on the TV in the living room. She sits down on the couch, and grabs the wiimote. She presses a button to open a media navigator, showing her all of the media shares on the home LAN, neuromancer-style. She pilots into the living room machine, selects the photos, and begins browsing. The photos are arranged on a timeline rolling into the distance; she focuses in on the set from last year's GUADEC, and begins flipping through them.

OK, so you see where I want to go with this. Try it yourself, perhaps taking "video chat" as an example. How are calls received? How are they made? How does video chat interact with your photo app, with the rest of your applications?

My point is that although we have reached a kind of local equilibrium with the "desktop" metaphor, the existence of the GPU gives us all kinds of possibilities, unique capabilities of the box connected to the monitor. My ideas (and these are not all mine, I stole most of them from folk at work) are those that I see from here, which is not very far. We can do better than this.

conclusion

I recognize that these are words from a marginal player in GNOME, and a Schemer at that. But I think that somehow, a skunkworks is how GNOME will "revision up", whether it comes from the free community, or whether it is dropped on us from some corporate lab.

The concrete step that interested hackers should take is to learn GL, to play around with it. Use the texture_from_pixmap extensions to pull in a WebKit window or two, toss them around. Use Clutter, if that's what you feel like. Build a video chat application (the Telepathy people swear it's possible), build a new window manager (with tight Ruby bindings), make a mockup. Then if your mockup is inspiring, and organic enough to live in, we can start a GNOME skunkworks to play around in.

But above all, make it for you. Pay no attention to mental questions of how your mother would see this, those questions will fix themselves in time. A focus on beauty and simplicity and power cannot fail to make something interesting. Code against boredom!