      Aryan Kaushik: Create Custom System Call on Linux 6.8

      news.movim.eu / PlanetGnome • 28 February, 2025 • 4 minutes

    Ever wanted to create a custom system call? Whether it be for an assignment, just for fun, or to learn more about the kernel, system calls are a cool way to learn more about your system.

    Note - crossposted from my article on Medium

    Why follow this guide?

    There are various guides on this topic, but the pace of kernel development quickly makes them obsolete. Most guides now throw a bunch of errors, hence I’m writing this post after going through those errors and solving them :)

    Set system for kernel compile

    On Red Hat / Fedora / openSUSE based systems, you can simply do

    sudo dnf builddep kernel
    sudo dnf install kernel-devel
    

    On Debian / Ubuntu based

    sudo apt-get install build-essential vim git cscope libncurses-dev libssl-dev bison flex
    

    Get the kernel

    Clone the kernel source tree. We’ll be cloning the v6.8 tag specifically, but the instructions should work on newer versions as well (until the kernel devs change the process again).

    git clone --depth=1 --branch v6.8 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
    

    Ideally, the cloned version should be equal to or higher than your current kernel version.

    You can check the current kernel version through the command

    uname -r
    

    Create the new syscall

    Perform the following

    cd linux
    make mrproper
    mkdir hello
    cd hello
    touch hello.c
    touch Makefile
    

    This will create a folder called “hello” inside the downloaded kernel source code, and two files in it: hello.c with the syscall code and Makefile with the rules for compiling it.

    Open hello.c in your favourite text editor and put the following code in it

    #include <linux/kernel.h>
    #include <linux/syscalls.h>
    SYSCALL_DEFINE0(hello)
    {
            pr_info("Hello World\n");
            return 0;
    }
    

    It prints “Hello World” in the kernel log.

    As per the kernel.org docs:

    “…the SYSCALL_DEFINEn() macro rather than explicitly. The ‘n’ indicates the number of arguments to the system call, and the macro takes the system call name followed by the (type, name) pairs for the parameters as arguments.”

    As we are just going to print and take no arguments, we use n=0.
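
    For comparison, a syscall that does take arguments would use a larger n and list (type, name) pairs after the name. A hypothetical one-argument variant (not used anywhere in this guide, shown only to illustrate the macro) could look like this:

    #include <linux/kernel.h>
    #include <linux/syscalls.h>
    /* Hypothetical example: n=1 because this syscall takes one argument */
    SYSCALL_DEFINE1(hello_count, int, count)
    {
            pr_info("Hello World, count=%d\n", count);
            return 0;
    }
    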

    Now add the following to the Makefile

    obj-y := hello.o
    

    Now

    cd ..
    cd include/linux/
    

    Open the file “syscalls.h” inside this directory, and add

    asmlinkage long sys_hello(void);
    

    This is a prototype for the syscall function we created earlier.

    Open the file “Kbuild” in the kernel root (cd ../..) and add the following at the bottom of it

    obj-y += hello/
    

    This tells the kernel build system to also compile our newly included folder.

    Once done, we then need to also add the syscall entry to the architecture-specific table.

    Each CPU architecture can have its own specific syscalls, and we need to tell the build system which architectures our syscall is available on.

    For x86_64 the file is

    arch/x86/entry/syscalls/syscall_64.tbl
    

    Add your syscall entry there, keeping in mind to use a free number and to avoid any ranges reserved in the table’s comments.
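
    A quick way to check which numbers are already taken near the end of the table (these commands are only a convenience and assume you are eyeing 462; they are not part of the original steps):

    tail -n 15 arch/x86/entry/syscalls/syscall_64.tbl
    grep -E '^462[[:space:]]' arch/x86/entry/syscalls/syscall_64.tbl || echo "462 looks free"
    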

    For me 462 was free, so I added the new entry as such

    462 common hello sys_hello
    

    Here 462 is the number mapped to our syscall, “common” means the entry is shared by the 64-bit and x32 ABIs, our syscall is named hello, and its entry point is sys_hello.

    Compiling and installing the new kernel

    Perform the following commands

    NOTE: I in no way or form guarantee the safety, security, integrity or stability of your system if you follow this guide. All instructions listed here are based on my own experience and don’t guarantee the same outcome on your system. Proceed with caution and care.

    Now that we have the legal stuff done, let’s proceed ;)

    cp /boot/config-$(uname -r) .config
    make olddefconfig
    make -j $(nproc)
    sudo make -j $(nproc) modules_install
    sudo make install
    

    Here we copy the config file of the currently booted kernel and ask the build system to reuse its values, picking defaults for anything new. Then we build the kernel in parallel, using the number of cores reported by nproc. After that we install the modules and the custom kernel itself (at your own risk).

    Kernel compilation takes a lot of time, so get a coffee or 10 and enjoy lines of text scrolling by on the terminal.

    It can take a few hours depending on system speed, so your mileage may vary. Your fan might also scream at this stage to keep temperatures in check (happened to me too).

    The fun part, using the new syscall

    Now that our syscall is baked into our kernel, reboot the system and make sure to select the new custom kernel from GRUB while booting.

    Once booted, let’s write a C program to leverage the syscall

    Create a file, maybe “test.c” with the following content

    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    int main(void) {
      printf("%ld\n", syscall(462));
      return 0;
    }
    

    Here replace 462 with the number you chose for your syscall in the table.
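
    Optionally, the test program can be made a little more defensive. The sketch below (still assuming 462 is your chosen number) prints the errno message, so if you accidentally boot the stock kernel you will see “Function not implemented” (ENOSYS) instead of a bare -1:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    /* Hypothetical constant: replace 462 with the number you picked in the table */
    #define SYS_hello 462
    int main(void) {
      long ret = syscall(SYS_hello);
      if (ret < 0) {
        printf("syscall failed: %s\n", strerror(errno));
        return 1;
      }
      printf("%ld\n", ret);
      return 0;
    }
    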

    Compile the program and then run it

    make test
    chmod +x test
    ./test
    

    If all goes right, your terminal will print a “0” and the syscall output will be visible in the kernel logs.

    Access the logs with dmesg

    sudo dmesg | tail
    

    And voila, you should be able to see your syscall message printed there.

    Congratulations if you made it 🎉

    Please again remember the following points:

    • Compiling the kernel takes a lot of time
    • The newly compiled kernel takes up quite a bit of disk space, so please ensure enough is available
    • The Linux kernel moves fast with code changes, so this guide may eventually go out of date too

      Thibault Martin: Prosthetics that don't betray

      news.movim.eu / PlanetGnome • 28 February, 2025 • 13 minutes

    Tech takes a central place in our lives. Banking and administrative tasks are happening more and more online. It's becoming increasingly difficult to get through life without a computer or a smartphone. They have become external organs necessary to live our lives.

    Steve Jobs called the computer the bicycle for the mind. I believe computers & smartphones have become prosthetics, extensions of people that should unconditionally and entirely belong to them. We must produce devices and products the general public can trust.

    Microsoft, Google and Apple are three American companies that build the operating systems our computers, phones, and servers run on. This American hegemony over ubiquitous devices is dangerous for all citizens worldwide, especially under an unpredictable, authoritarian American administration.

    Producing devices and an operating system for them is a gigantic task. Fortunately, it is not necessary to start from zero. In this post I share what I think is the best foundation for a respectful operating system and how to get it into European, and maybe American, hands. In a follow-up post I will talk more about distribution channels for older devices.

    [!warning] The rest of the world matters

    In this post I take a European-centric view. The rest of the world matters, but I am not familiar with what their needs are nor with how to address them.

    We're building prosthetics

    Prosthetics are extensions of ourselves as individuals. They are deeply personal. We must ensure our devices & products are:

    • Transparent about what they do. They must not betray people and do things behind their backs. Our limbs do what we tell them. When they don't, it's considered a problem and we go to a physician to fix it.
    • Intuitive, documented, accessible and stable. People shouldn't have to re-learn how to do things they were used to doing. When they don't know how to do something, it must be easy for them to look it up or find someone to explain it to them. The devices must also be accessible and inclusive to reduce inequalities, instead of reinforcing them. Those requirements are a social matter, not a technical one.
    • Reliable, affordable, and repairable. Computers & smartphones must not allow discrimination based on social status and wealth. Everyone must have access to devices they can count on, and be able to maintain them in a good condition. This is also a social problem and not a technical one. It is worth noting that "the apps I need must be available for my system" is an often overlooked aspect of reliability, and "I don't have to install the system because it's bundled with my machine" is an important aspect of affordability.

    I believe that the GNOME project is one of the best placed to answer those challenges, especially when working in coordination with the excellent postmarketOS people who work on resurrecting older devices abandoned by their manufacturers. There is real stagnation in the computing industry that we must see as a social opportunity.

    Constraints are good

    GNOME is a computing environment aiming for simplicity and efficiency. Its opinionated approach benefits both users and developers:

    • From the user perspective , apps look and feel consistent and sturdy, and are easy to use thanks to well thought out defaults.
    • From the developer perspective , the opinionated human interface guidelines let them develop simpler, more predictable apps with less edge cases to test for.

    GNOME is a solid foundation to build respectful tech on. It doesn't betray people by doing things behind their back. It aims for simplicity and stability, although it could use some more user research to back design decisions if there were funding to do so, as has successfully been the case for GNOME 40.

    Mobile matters

    GNOME's Human Interface Guidelines and development tooling make it easy to run GNOME apps on mobile devices. Some volunteers are also working on making GNOME Shell (the "desktop" view) render well on mobile devices.

    postmarketOS already offers it as one of the UIs you can install on your phone. With mobile taking over traditional computer usage, it is critical to consider the mobile side of computing too.

    Hackability and safety

    As an open source project, GNOME remains customizable by advanced users who know they are bringing unsupported changes, can break their system in the process, and deal with it. It doesn't make customization easy for those advanced users, because it doesn't optimize for them.

    The project also has its fair share of criticism, some valid, and some not. I agree that sometimes the project can be too opinionated and rigid, optimizing for extreme consistency at the expense of user experience. For example, while I agree that system trays are suboptimal, they're also a pattern people have been used to for decades, and removing them is very frustrating for many.

    But some criticism is also coming from people who want to tinker with their system and spend countless hours building a system that's the exact fit for their needs. Those are valid use cases, but GNOME is not built to serve them. GNOME aims to be easy to use for the general public, which includes people who are not tech-experts and don't want to be.

    We're actually building prototypes

    As mighty as the GNOME volunteers might be, there is still a long way before the general public can realistically use it. GNOME needs to become a fully fledged product shipped on mainstream devices, rather than an alternative system people install. It also needs to involve representatives of the people it intends to serve.

    You just need to simply be tech-savvy

    GNOME is not (yet) an end user product. It is a desktop environment that needs to be shipped as part of a Linux distribution. There are many distributions to choose from. They do not all ship the same version of GNOME, and some patch it more or less heavily. This kind of fragmentation is one of the main factors holding the Linux desktop back.

    The general public doesn't want to have to pick a distribution and bump into every edge case that creates. They need a system that works predictably, that lets them install the apps they need, and that gives them safe ways to customize it as a user.

    That means they need a system that doesn't let them shoot themselves in the foot in the name of customizability, and that prevents them from doing some things unless they sign with their blood that they know it could make it unusable. I share Adrian Vovk's vision for A Desktop for All and I think it's the best way to productize GNOME and make it usable by the general public.

    People don't want to have to install an "alternative" system . They want to buy a computer or a smartphone and use it. For GNOME to become ubiquitous, it needs to be shipped on devices people can buy.

    For GNOME to really take off, it needs to become a system people can use both in their personal life and at work. It must become a compelling product for enterprise deployments: to route enough money towards development and maintenance, to make it an attractive platform for vendors to build software for, and to make it an attractive platform for device manufacturers to ship.

    What about the non tech-savvy?

    GNOME aims to build a computing platform everyone can trust. But it doesn't have a clear, scalable governance model with representatives of those it serves. GNOME has rudimentary governance to define what is part of the project and what is not thanks to its Release Team, but it is largely a do-ocracy, as highlighted in the Governance page of GNOME's Handbook as well as in GNOME designer Tobias Bernard's series Community Power.

    A do-ocracy is a very efficient way to onboard volunteers and empower people who can give away their free time to get things done fast. It is however not a great way to get work done on areas that matter to a minority who can't afford to give away free time or pay someone to work on it.

    The GNOME Foundation is indeed not GNOME's vendor today, and it doesn't contribute the bulk of the design and code of the project. It maintains the infrastructure (technical and organizational) the project builds on. A critical, yet little visible task.

    To be a meaningful, fair, inclusive project for more than engineers with spare time and spare computers, the project needs to improve in two areas:

    1. It needs a Product Committee to set a clear product direction so GNOME can meaningfully address the problems of its intended audience. The product needs a clear purpose, a clear audience, and a robust governance to enforce decisions. It needs a committee with representatives of the people it intends to serve, designers, and solution architects. Of course it also critically needs a healthy set of public and private organizations funding it.
    2. It needs a Development Team to implement the direction the committee has set. This means doing user research and design, technical design, implementing the software, doing advocacy work to promote the project to policymakers, manufacturers, private organizations' IT department and much more.

    [!warning] Bikeshedding is a real risk

    A Product Committee can be a useful structure for people to express their needs, draft a high-level and realistic solution with designers and solution architects, and test it. Designers and technical architects must remain in charge of designing and implementing the solution.

    The GNOME Foundation appears as a natural host for these organs, especially since it's already taking care of the assets of the project like its infrastructure and trademark. A separate organization could more easily pull the project in a direction that serves its own interests.

    Additionally, the GNOME Foundation taking on this kind of work doesn't conflict with the present do-ocracy, since volunteers and organizations could still work on what matters to them. But it would remain a major shift in the project's organization and would likely upset some volunteers who would feel that they have less control over the project.

    I believe this is a necessary step to make the public and private sector invest in the project, generate stable employment for people working on it, and ultimately make GNOME have a systemic, positive impact on society.

    [!warning] GNOME needs solution architects

    The GNOME community has designers who have a good product vision. It is also full of experts on their own modules, but it has a shortage of people with a good technical overview of the project, who can turn product issues into technical ones at the scale of the whole project.

    So what now?

    "The year of the Linux desktop" has become a meme now for a reason. The Linux community, if such a nebulous thing exists, is very good at solving technical problems. But building a project bigger than ourselves and putting it in the hands of the millions of people who need it is not just a technical problem.

    Here are some critical next steps for the GNOME Community and Foundation to reclaim personal computing from the trifecta of tech behemoths, and fulfill an increasingly important need for democracies.

    Learn from experience

    Last year, a team of volunteers led by Sonny Piers and Tobias Bernard wrote a grant bid for the Sovereign Tech Fund and was granted €1M. There are some major takeaways from this adventure.

    At risk of stating the obvious, money does solve problems! The team tackled significant technical issues not just for GNOME but for the free desktop in general. I urge organizations and governments that take their digital independence seriously to contribute meaningfully to the project.

    Uncertainty and understaffing have a cost. Everyone working on that budget was paid €45/hour, which is way lower than the market average. The project leads were only billing half-time on the project but worked much more than that in practice, and burnt out on it. Add some operational issues within the Foundation, which wasn't prepared to properly support this initiative, and you get massive drama that could have been avoided.

    Finally and unsurprisingly, one-offs are not sustainable . The Foundation needs to build sustainable revenue streams from a diverse portfolio to grow its team. A €1M grant is extremely generous from a single organization. It was a massive effort from the Sovereign Tech Agency, and a significant part of their 2024 budget. But it is also far from enough to sustain a project like GNOME if every volunteer was paid, let alone paid a fair wage.

    Tread carefully, change democratically

    Governance and funding are a chicken-and-egg problem. Funders won't send money to the project if they are not confident that the project will use it wisely, and if they can't weigh in on the project's direction. Without money to support the effort, only volunteers can set up the technical governance processes, in their spare time.

    Governance changes must be done carefully though. Breaking the status quo without a plan comes with significant risks. It can demotivate current volunteers, make the project lose traction with newcomers, and die before enough funding makes it to the project to sustain it. A lot of people have invested significant amounts of time and effort into GNOME, and this must be treated with respect.

    Build a focused MVP

    For the STF project, the GNOME Foundation relied on contractors and consultancies. To be fully operational and efficient, it must get into a position to hire people with the most critical skills. I believe right now the most critical profile is that of the solution architect. With more revenue, developers and designers can join the team as it grows.

    But for that to happen, the Foundation needs to:

    1. Define who GNOME is for in priority, bearing in mind that "everyone" doesn't exist.
    2. Build a team of representatives of that audience, and a product roadmap: what problems do these people have that GNOME could solve, how could GNOME solve it for them, how could people get to using GNOME, and what tradeoffs would they have to make when using GNOME.
    3. Build the technical roadmap (the steps to make it happen).
    4. Fundraise to implement the roadmap, factoring in the roadmap creation costs.
    5. Implement, and test

    The Foundation can then build on this success and start engaging with policymakers, manufacturers, and vendors to extend its reach.

    Alternative proposals

    The proposed model has a significant benefit: it gives clarity. You can give money to the GNOME Foundation to contribute to the maintenance and evolution of the GNOME project, instead of only supporting its infrastructure costs. It unlocks the possibility of funding user research that would also benefit all the downstreams.

    It is possible to take the counter-point and argue that GNOME doesn't have to be an end-user product, but should remain an upstream that several organizations use for their own product and contribute to.

    The "upstream only" model is status-quo, and the main advantage of this model is that it lets contributing organizations focus on what they need the most. The GNOME Foundation would need to scale down to a minimum to only support the shared assets and infrastructure of the project and minimize its expenses. Another (public?) organization would need to tackle the problem of making GNOME a well integrated end-user product.

    In the "upstream only" model, there are two choices:

    • Either the governance of GNOME itself remains the same , a do-ocracy where whoever has the skills, knowledge and financial power to do so can influence the project.
    • Or the Community can introduce a more formal governance model to define what is part of GNOME and what is not, like Python's PEPs and Rust's RFCs.

    It's an investment

    Building an operating system usable by the masses is a significant effort and requires a lot of expertise. It is tempting to think that since Microsoft, Google and Apple are already shipping several operating systems each, we don't need one more.

    However, let's remember that these are all American companies, building proprietary ecosystems that they have complete control over. In these uncertain times, Europe must not treat the USA as a direct enemy, but the current administration makes it clear that it would be reckless to continue treating it as an ally.

    Building an international, transparent operating system that provides an open platform for people to use and for which developers can distribute apps will help secure EU's digital sovereignty and security, at a cost that wouldn't even make a dent in the budget. It's time for policymakers to take their responsibilities and not let America control the digital public space.

      Felipe Borges: GNOME is participating in Google Summer of Code 2025!

      news.movim.eu / PlanetGnome • 27 February, 2025

    The Google Summer of Code 2025 mentoring organizations have just been announced and we are happy that GNOME’s participation has been accepted!

    If you are interested in an internship with GNOME, check gsoc.gnome.org for our project ideas and getting-started information.

      Jussi Pakkanen: The price of statelessness is eternal waiting

      news.movim.eu / PlanetGnome • 27 February, 2025 • 4 minutes

    Most CI systems I have seen have been stateless. That is, they start by getting a fresh Docker container (or building one from scratch), doing a Git checkout, building the thing and then throwing everything away. This is simple and mathematically pure, but really slow. The approach is further driven by the fact that in cloud computing CPU time and network transfers are cheap but storage is expensive, probably because the cloud vendor needs to take care of things like backups, can no longer dispatch a task to just any machine on the planet but only to the one that already has the required state, and so on.

    How much could you reduce resource usage (or, if you prefer, improve CI build speed) by giving up on statelessness? Let's find out by running some tests. To get a reasonably large code base I used LLVM. I did not actually use any cloud or Docker in the tests, but I simulated them on a local media PC. I used 16 cores to compile and 4 to link (any more would saturate the disk). Tests were not run.

    Baseline

    Creating a Docker container with all the build deps takes a few minutes. Alternatively you can prebuild it, but then you need to download a 1 GB image.

    Doing a full Git checkout would be wasteful. There are basically three different ways of doing a partial checkout: shallow clone, blobless and treeless. They take the following amounts of time and space:

    • shallow: 1m, 259 MB
    • blobless: 2m 20s, 961 MB
    • treeless: 1m 46s, 473 MB

    Doing a full build from scratch takes 42 minutes.

    With CCache

    Using CCache in Docker is mostly a question of bind mounting a persistent directory in the container's cache directory. A from-scratch build with an up to date CCache takes 9m 30s.
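
    As a rough sketch (the image name and host paths are made up, and the source checkout is assumed to already exist on the host), the bind mount can look something like this:

    # Persist the compiler cache across otherwise stateless container builds
    docker run --rm \
        -e CCACHE_DIR=/ccache \
        -v /srv/ci/ccache:/ccache \
        -v /srv/ci/llvm-project:/src:ro \
        llvm-build-deps \
        bash -c 'cmake -S /src/llvm -B /tmp/build -G Ninja \
                     -DCMAKE_C_COMPILER_LAUNCHER=ccache \
                     -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
                 && ninja -C /tmp/build'
    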

    With stashed Git repo

    Just like the CCache dir, the Git checkout can also be persisted outside the container. Doing a git pull on an existing full checkout takes only a few seconds. You can even mount the repo dir read only to ensure that no state leaks from one build invocation to another.

    With Danger Zone

    One main thing a CI build ensures is that the code keeps on building when compiled from scratch. It is quite possible to have a bug in your build setup that manifests itself so that the build succeeds if a build directory has already been set up, but fails if you try to set it up from scratch. This was especially common back in ye olden times when people used to both write Makefiles by hand and to think that doing so was a good idea.

    Nowadays build systems are much more reliable and this is not such a common issue (though it can definitely still occur). So what if you were willing to give up full from-scratch checks on merge requests? You could, for example, still have a daily build that validates that use case. For some organizations this would not be acceptable, but for others it might be a reasonable tradeoff. After all, why should a CI build take noticeably longer than an incremental build on the developer's own machine? If anything it should be faster, since servers are a lot beefier than developer laptops. So let's try it.

    The implementation for this is the same as for CCache: you just persist the build directory as well. To run the build you do a Git update, mount the repo, build dir and (optionally) CCache dir into the container, and go.
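
    Continuing the sketch above (same made-up names, and assuming the build directory was configured by a previous run), this amounts to one more bind mount plus a git pull on the host before each run:

    # Refresh the persistent checkout on the host, then reuse the build directory
    git -C /srv/ci/llvm-project pull
    docker run --rm \
        -e CCACHE_DIR=/ccache \
        -v /srv/ci/ccache:/ccache \
        -v /srv/ci/llvm-build:/tmp/build \
        -v /srv/ci/llvm-project:/src:ro \
        llvm-build-deps \
        ninja -C /tmp/build
    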

    I tested this by doing a git pull on the repo and then doing a rebuild. There were a couple of new commits, so this should be representative of real-world workloads. An incremental build took 8m 30s, whereas a from-scratch rebuild using a fully up to date cache took 10m 30s.

    Conclusions

    The amount of wall clock time used for the three main approaches were:

    • Fully stateless
      • Image building: 2m
      • Git checkout: 1m
      • Build: 42m
      • Total : 45m
    • Cached from-scratch
      • Image building: 0m (assuming it is not "apt-get update"d for every build)
      • Git checkout: 0m
      • Build: 10m 30s
      • Total : 10m 30s
    • Fully cached
      • Image building: 0m
      • Git checkout: 0m
      • Build: 8m 30s
      • Total : 8m 30s

    Similarly, the amount of data transferred was:

    • Fully stateless
      • Image: 1G
      • Checkout: 300 MB
    • Cached from-scratch:
      • Image: 0
      • Checkout: O(changes since last pull), typically a few kB
    • Fully cached
      • Image: 0
      • Checkout: O(changes since last pull)

    The differences are quite clear. Just by using CCache the build time drops by almost 80%. Persisting the build dir reduces the time by a further 15%. It turns out that having machines dedicated to a specific task can be a lot more efficient than rebuilding the universe from atoms every time. Fancy that.

    The final 2 minute improvement might not seem like that much, but on the other hand do you really want your developers to spend 2 minutes twiddling their thumbs for every merge request they create or update? I sure don't. Waiting for CI to finish is one of the most annoying things in software development.

      Sebastian Pölsterl: scikit-survival 0.24.0 released

      news.movim.eu / PlanetGnome • 26 February, 2025 • 4 minutes

    It’s my pleasure to announce the release of scikit-survival 0.24.0.

    A highlight of this release is the addition of cumulative_incidence_competing_risks(), which implements a non-parametric estimator of the cumulative incidence function in the presence of competing risks. In addition, the release adds support for scikit-learn 1.6, including support for missing values in ExtraSurvivalTrees.

    Analysis of Competing Risks

    In classical survival analysis, the focus is on the time until a specific event occurs. If no event is observed during the study period, the time of the event is considered censored. A common assumption is that censoring is non-informative, meaning that censored subjects have a similar prognosis to those who were not censored.

    Competing risks arise when each subject can experience an event due to one of $K$ ($K \geq 2$) mutually exclusive causes, termed competing risks. Thus, the occurrence of one event prevents the occurrence of other events. For example, after a bone marrow transplant, a patient might relapse or die from transplant-related causes (transplant-related mortality). In this case, death from transplant-related mortality precludes relapse.

    The bone marrow transplant data from Scrucca et al., Bone Marrow Transplantation (2007) includes data from 35 patients grouped into two cancer types: Acute Lymphoblastic Leukemia (ALL; coded as 0), and Acute Myeloid Leukemia (AML; coded as 1).

    from sksurv.datasets import load_bmt
    bmt_features, bmt_outcome = load_bmt()
    diseases = bmt_features["dis"].cat.rename_categories(
    {"0": "ALL", "1": "AML"}
    )
    diseases.value_counts().to_frame()
    
    dis count
    AML 18
    ALL 17

    During the follow-up period, some patients might experience a relapse of the original leukemia or die while in remission (transplant related death). The outcome is defined similarly to standard time-to-event data, except that the event indicator specifies the type of event, where 0 always indicates censoring.

    import pandas as pd
    status_labels = {
        0: "Censored",
        1: "Transplant related mortality",
        2: "Relapse",
    }
    risks = pd.DataFrame.from_records(bmt_outcome).assign(
        label=lambda x: x["status"].replace(status_labels)
    )
    risks["label"].value_counts().to_frame()
    
    label count
    Relapse 15
    Censored 11
    Transplant related mortality 9

    The table above shows the number of observations for each status.

    Non-parametric Estimator of the Cumulative Incidence Function

    If the goal is to estimate the probability of relapse, transplant-related death is a competing risk event. This means that the occurrence of relapse prevents the occurrence of transplant-related death, and vice versa. We aim to estimate curves that illustrate how the likelihood of these events changes over time.

    Let’s begin by estimating the probability of relapse using the complement of the Kaplan-Meier estimator. With this approach, we treat deaths as censored observations. One minus the Kaplan-Meier estimator provides an estimate of the probability of relapse before time $t$.

    import matplotlib.pyplot as plt
    from sksurv.nonparametric import kaplan_meier_estimator
    times, km_estimate = kaplan_meier_estimator(
    bmt_outcome["status"] == 1, bmt_outcome["ftime"]
    )
    plt.step(times, 1 - km_estimate, where="post")
    plt.xlabel("time $t$")
    plt.ylabel("Probability of relapsing before time $t$")
    plt.ylim(0, 1)
    plt.grid()
    
    bmt-kaplan-meier.svg

    However, this approach has a significant drawback: considering death as a censoring event violates the assumption that censoring is non-informative. This is because patients who died from transplant-related mortality have a different prognosis than patients who did not experience any event. Therefore, the estimated probability of relapse is often biased.

    The cause-specific cumulative incidence function (CIF) addresses this problem by estimating the cause-specific hazard of each event separately. The cumulative incidence function estimates the probability that the event of interest occurs before time $t$, and that it occurs before any of the competing causes of an event. In the bone marrow transplant dataset, the cumulative incidence function of relapse indicates the probability of relapse before time $t$, given that the patient has not died from other causes before time $t$.
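
    For reference, the usual non-parametric estimator of the cause-specific CIF (a textbook sketch, not copied from the scikit-survival documentation) is

    $$\hat{F}_k(t) = \sum_{t_i \le t} \hat{S}(t_{i-1}) \frac{d_{k,i}}{n_i},$$

    where $\hat{S}(t_{i-1})$ is the all-cause Kaplan-Meier estimate of remaining event-free just before $t_i$, $d_{k,i}$ is the number of type-$k$ events at $t_i$, and $n_i$ is the number of subjects still at risk at $t_i$.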

    from sksurv.nonparametric import cumulative_incidence_competing_risks
    times, cif_estimates = cumulative_incidence_competing_risks(
        bmt_outcome["status"], bmt_outcome["ftime"]
    )
    plt.step(times, cif_estimates[0], where="post", label="Total risk")
    for i, cif in enumerate(cif_estimates[1:], start=1):
        plt.step(times, cif, where="post", label=status_labels[i])
    plt.legend()
    plt.xlabel("time $t$")
    plt.ylabel("Probability of event before time $t$")
    plt.ylim(0, 1)
    plt.grid()
    
    bmt-cumulative-incidence.svg

    The plot shows the estimated probability of experiencing an event at time $t$ for both the individual risks and for the total risk.

    Next, we want to estimate the cumulative incidence curves for the two cancer types — acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML) — to examine how the probability of relapse depends on the original disease diagnosis.

    _, axs = plt.subplots(2, 2, figsize=(7, 6), sharex=True, sharey=True)
    for j, disease in enumerate(diseases.unique()):
        mask = diseases == disease
        event = bmt_outcome["status"][mask]
        time = bmt_outcome["ftime"][mask]
        times, cif_estimates, conf_int = cumulative_incidence_competing_risks(
            event,
            time,
            conf_type="log-log",
        )
        for i, (cif, ci, ax) in enumerate(
            zip(cif_estimates[1:], conf_int[1:], axs[:, j]), start=1
        ):
            ax.step(times, cif, where="post")
            ax.fill_between(times, ci[0], ci[1], alpha=0.25, step="post")
            ax.set_title(f"{disease}: {status_labels[i]}", size="small")
            ax.grid()
    for ax in axs[-1, :]:
        ax.set_xlabel("time $t$")
    for ax in axs[:, 0]:
        ax.set_ylim(0, 1)
        ax.set_ylabel("Probability of event before time $t$")
    
    bmt-cumulative-incidence-by-diagnosis.svg

    The left column shows the estimated cumulative incidence curves (solid lines) for patients diagnosed with ALL, while the right column shows the curves for patients diagnosed with AML, along with their 95% pointwise confidence intervals. The plot indicates that the estimated probability of relapse at $t=40$ days is more than three times higher for patients diagnosed with ALL compared to AML.

    If you want to run the examples above yourself, you can execute them interactively in your browser using binder .

      Michael Meeks: 2025-01-17 Friday

      news.movim.eu / PlanetGnome • 17 January, 2025

    • Up early, sync with Dave, Anuj, lunch with Julia, worked away at contractuals. Onto mail catch-up, and slides.

      Michael Meeks: 2025-01-16 Thursday

      news.movim.eu / PlanetGnome • 16 January, 2025

    • Up too early; train - with Christian, sky-train, some data analysis on the plane, heathrow-express.
    • Home, read minutes of calls I missed: seems I should miss more calls; text review, dinner with the family. Worked after dinner, missed bible-study group, bed early.

      Luis Villa: non-profit social networks: benchmarking responsibilities and costs

      news.movim.eu / PlanetGnome • 15 January, 2025 • 4 minutes

    I’m trying to blog quicker this year. I’m also sick with the flu. Forgive any mistakes caused by speed, brevity, or fever.

    Monday brought two big announcements in the non-traditional (open? open-ish?) social network space, with Mastodon moving towards non-profit governance (asking for $5M in donations this year), and Free Our Feeds launching to do things around ATProto/Bluesky (asking for $30+M in donations).

    It’s a little too early to fully understand what either group will do, and this post is not an endorsement of specifics of either group—people, strategies, etc.

    Instead, I just want to say: they should be asking for millions.

    There’s a lot of commentary like this one floating around:

    I don’t mean this post as a critique of Jan or others. (I deliberately haven’t linked to the source, please don’t pile on Jan!) Their implicit question is very well-intentioned. People are used to very scrappy open source projects, so millions of dollars just feels wrong. But yes, millions is what this will take.

    What could they do?

    I saw a lot of comments this morning that boiled down to “well, people run Mastodon servers for free, what does anyone need millions for”? Putting aside that this ignores that any decently-sized Mastodon server has actual server costs (and that great servers like botsin.space shut down regularly, in part because of those costs), and that it treats the time and emotional trauma of moderation as free… what else could these orgs be doing?

    Just off the top of my head:

    • Moderation, moderation, moderation, including:
      • moderation tools, which by all accounts are brutally badly needed in Masto and would need to be rebuilt from scratch by FoF. (Donate to IFTAS !)
      • multi-lingual and multi-cultural, so you avoid the Meta trap of having 80% of users outside the US/EU but 80% of moderation in the US/EU.
    • Jurisdictionally-distributed servers and staff
      • so that when US VP Musk comes after you, there’s still infrastructure and staff elsewhere
      • and lawyers for this scenario
    • Good governance
      • which, yes, again, lawyers, but also management, coordination, etc.
      • (the ongoing WordPress meltdown should be a great reminder that good governance is both important and not free)
    • Privacy compliance
      • Mention “GDPR compliance” and “Mastodon” in the same paragraph and lots of lawyers go pale; doing this well would be a fun project for a creative lawyer and motivated engineers, but a very time-consuming one.
      • Bluesky has similar challenges, which get even harder as soon as meaningfully mirrored.

    And all that’s just to have the same level of service as currently.

    If you actually want to improve the software in any way, well, congratulations: that’s hard for any open source software, and it’s really hard when you are doing open source software with millions of users. You need product managers, UX designers, etc. And those aren’t free. You can get some people at a slight discount if you’re selling them on a vision (especially a pro-democracy, anti-harassment one), but in the long run you either need to pay near-market or you get hammered badly by turnover, lack of relevant experience, etc.

    What could that cost, $10?

    So with all that in mind, some benchmarks to help frame the discussion. Again, this is not to say that an ATProto- or ActivityPub-based service aimed at achieving Twitter or Instagram-levels of users should necessarily cost exactly this much, but it’s helpful to have some numbers for comparison.

    • Wikipedia: ( source )
      • legal: $10.8M in 2023-2024 (and Wikipedia plays legal on easy mode in many respects relative to a social network—no DMs, deliberately factual content, sterling global brand)
      • hosting: $3.4M in 2023-2024 (that’s just hardware/bandwidth, doesn’t include operations personnel)
    • Python Package Index
      • $20M/year in bandwidth from Fastly in 2021 ( source ) (packages are big, but so is social media video, which is table stakes for a wide-reaching modern social network)
    • Twitter
      • operating expenses, not including staff , of around $2B/year in 2022 ( source )
    • Signal
    • Content moderation
      • Hard to get useful information on this on a per company basis without a lot more work than I want to do right now, but the overall market is in the billions ( source ).
      • Worth noting that lots of the people leaving Meta properties right now are doing so in part because tens of thousands of content moderators, paid unconscionably low wages , are not enough .

    You can handwave all you want about how you don’t like a given non-profit CEO’s salary, or you think you could reduce hosting costs by self-hosting, or what have you. Or you can push the high costs onto “volunteers”.

    But the bottom line is that if you want there to be a large-scale social network, even “do it as cheap as humanly possible” means millions in costs borne by someone.

    What this isn’t

    This doesn’t mean “give the proposed new organizations a blank check”. As with any non-profit, there’s danger of over-paying execs, boards being too cozy with execs and not moving them on fast enough, etc. ( Ask me about founder syndrome sometime !) Good governance is important.

    This also doesn’t mean I endorse Bluesky’s VC funding; I understand why they feel they need money, but taking that money before the techno-social safeguards they say they want are in place is begging for problems. (And in fact it’s exactly because of that money that I think Free Our Feeds is intriguing—it potentially provides a non-VC source of money to build those safeguards.)

    But we have to start with a realistic appraisal of the problem space. That is going to mean some high salaries to bring in talented people to devote themselves to tackling hard, long-term, often thankless problems, and lots of data storage and bandwidth.

    And that means, yes, millions of dollars.

      Hans de Goede: IPU6 camera support status update

      news.movim.eu / PlanetGnome • 14 January, 2025 • 1 minute

    The initial IPU6 camera support that landed in Fedora 41 only works on a limited set of laptops. The reason for this is that with MIPI cameras every different sensor and glue chip, like IO-expanders, needs to be supported separately.

    I have been working on making the camera work on more laptop models. After receiving and sending many emails and blog post comments about this I have started filing Fedora bugzilla issues on a per sensor and/or laptop-model basis to be able to properly keep track of all the work.

    Currently the following issues are either actively being worked on or are being tracked to be fixed in the future.

    Issues which have fixes pending (review) upstream:


    Open issues with various states of progress:

    See all the individual bugs for more details. I plan to post semi-regular status updates on this on my blog.

    The above list of issues can also be found on my Fedora 42 change proposal tracking this work, and I intend to keep an updated, complete list of all x86 MIPI camera issues (including closed ones) there.


