FreeBSD has lots of AMD graphics code because the Sony PlayStation 4 and 5 are both based on FreeBSD. I'm not sure how much, but some of OpenBSD's AMD graphics code comes from FreeBSD[0]. Given the prolific success of the PS4/PS5, it would make sense that FreeBSD's AMD graphics code is considerable in size. If OpenBSD is using parts of it, then we should expect those parts to be large.
[0] https://www.phoronix.com/news/MTQzNjI
> 79% of OpenBSD kernel source is AMD DRM
The sys.tar.gz from https://ftp.eu.openbsd.org/pub/OpenBSD/7.7/ (normally unpacked into /usr/src/sys on an OpenBSD machine) represents the entire kernel source code of OpenBSD 7.7. (The userland and compiler are in src.tar.gz, Xorg is in xenocara.tar.gz, and ports are in ports.tar.gz.)
It has grown to 634MB unpacked for the entire kernel source tree.
But the vast majority of this growth is attributable to the sys/dev/pci/drm/amd directory, the AMD Direct Rendering Manager driver, standing at 499MB, with the include files at sys/dev/pci/drm/amd/include accounting for 458MB, the biggest part of which is the asic_reg directory.
499/634 is 79%.
It follows that 79% of the OpenBSD kernel, by source code, is dedicated to AMD's DRM implementation. Note that we're talking about the source code, NOT compiled code.
It's a huge part of Linux, too, over at drivers/gpu/drm/amd/include/asic_reg:
https://github.com/torvalds/linux/tree/master/drivers/gpu/dr...
In OpenBSD, the last release before the explosive growth was OpenBSD 6.5 (Apr 2019), with sys.tar.gz at 20MB. With OpenBSD 6.6 (Oct 2019) it went to 30MB; now with 7.7 (Apr 2025) it's 64MB compressed.
https://github.com/torvalds/linux/blob/master/drivers/gpu/dr...
11k lines of #defines
Is this truly necessary?
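For context, the entries in these asic_reg headers follow a repetitive offset / shift / mask pattern. A sketch of the style, with invented register and field names (not the actual contents of any real file):

    /* Illustrative sketch of the asic_reg header style. Register and
     * field names here are invented; the real files repeat this same
     * offset / shift / mask pattern tens of thousands of times. */

    /* offset header: register offset plus a base-index pair */
    #define mmEXAMPLE_STATUS                0x0004
    #define mmEXAMPLE_STATUS_BASE_IDX       0

    /* sh_mask header: per-field shift and mask */
    #define EXAMPLE_STATUS__BUSY__SHIFT     0x0
    #define EXAMPLE_STATUS__ERROR__SHIFT    0x1
    #define EXAMPLE_STATUS__BUSY_MASK       0x00000001L
    #define EXAMPLE_STATUS__ERROR_MASK      0x00000002L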
11k defines in gc is nothing; the files in the nbio dir are so big that GitHub even refuses to render many of them:
https://kernel.googlesource.com/pub/scm/linux/kernel/git/tor...
https://github.com/torvalds/linux/blob/master/drivers/gpu/dr...
The last file in nbio is a header file with 38900 lines — a single file of 3.92 MB.
There's actually another one in nbio that's 16MB:
https://github.com/torvalds/linux/blob/master/drivers/gpu/dr...
> The last file in nbio is a header file with 38900 lines — a single file of 3.92 MB.
Good programming practices gone extreme. We really need some low-memory machines for developers.
Brotli was able to crush it down to 229 KiB after about 30 seconds, but still, this is an absurd amount of unnecessary, low-value bullshit.
Clearly this is generated code. I have mixed feelings about this. On the one hand, I'm glad it's a single file, as it's faster to parse than if it'd been split up among a whole bunch of smaller files. And that it isn't generated during the build is a complexity advantage, even if it is huge. OTOH, no human is going to read this whole file, so I wonder if there wasn't a better way.
All other things being equal, I'd rather have a codegen step added to the build process for mechanical, non-human-maintained code than foist mega-files on everyone, if those were the only two choices.
I suppose it depends on the portability of the mega-files. It could be an output from a complex non-portable program.
Don't make or allow complex, non-portable programs. There's no reason for them. Simplicity and Turing completeness mean it can always be written in something understandable and maintainable.
Simple portable programs that perform nontrivial tasks are expensive. Open source overcomes this where possible by socializing the cost.
I use open-source OpenBSD because the entire source tree is small enough for me to understand and manipulate. I guess I expect it all to be human-generated. This unwieldy, proprietary chunk makes me want to ditch graphics support in order to keep my source tree significantly smaller.
After cleaning up the sources, the whole chip would still be an unwieldy proprietary chunk - you would just be able to ignore it more easily.
If you look at the history of these files, they've basically changed at most once after being committed years ago.
Regenerating such static data from some master source would be completely pointless and would add needless extra dependencies to the build process; in this specific case it may not even be possible, because of the proprietary nature of the tooling that may have been required to produce the initial files.
---
In OpenBSD, NetBSD and other systems, there's actually a whole bunch of machine-generated files that are always part of the repository.
Build manifests (lists of all the binary files in the shipping product, e.g., distrib/sets/lists/base/mi) and pcidevs/usbdevs are the things that immediately come to mind:
https://github.com/search?q=repo%3Aopenbsd%2Fsrc+sync&type=c...
https://github.com/search?q=repo%3Aopenbsd%2Fsrc+regen&type=...
Avoiding bison/yacc parser generators as a build dependency is another common case for the practice.
Personally, I'm a huge proponent of the practice. It reduces the complexity of the build system, increases the transparency of the history of changes, and gives people a better understanding of where things come from, because you can find those things directly in the respective pcidevs.h / usbdevs.h instead of wondering what is going on and where they are defined. It's a HUGE advantage.
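To make this concrete: the maintained source of truth is the human-edited pcidevs table, and the header regenerated from it is committed right next to it. A rough sketch of what the generated entries look like (the AMD vendor ID is real; the product entry is a hypothetical example, and the real file's exact formatting may differ):

    /* pcidevs.h-style committed output, regenerated from the pcidevs
     * table. 0x1022 is AMD's real PCI vendor ID; the product entry
     * below is invented purely for illustration. */
    #define PCI_VENDOR_AMD            0x1022    /* Advanced Micro Devices */
    #define PCI_PRODUCT_AMD_EXAMPLE   0x9999    /* hypothetical device */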
I never understood why so many people are horrified at the idea of small amounts of machine-generated code being manually committed straight into the repositories. It seems like they're incorrectly applying the general rule against such practice, ignoring the specific exceptions where it is clearly beneficial.
One of my favourite other examples is self-documenting artefacts, e.g., man pages or test results. For example, maybe you use Go, and your man pages are automatically generated from the inline documentation within each Go file. Committing such human-readable artefacts into the repository is a great idea if it allows everyone to immediately see what's going on with the documentation, instead of having to run the code to see how it works. This increases transparency and code-review efficiency, and makes it easier to promote changes, because it's very clear to everyone what's going on, without having to reverse-engineer the code, or apply the patches and recompile, etc.
Of course, if your whole idea is to hide things from management, and increase the complexity of the system to prevent the newcomers from catching up quickly, then such practices may indeed be detrimental.
> I never understood why so many people are horrified at the idea of small amounts of machine-generated code being manually committed straight into the repositories.
If you haven't understood, maybe you could think more about it, or ask, or reduce the hyperbole until you're looking at something reasonable.
The amount of code we're talking about here is by no measure small, nor is it "horrifying" people. Your post reads like the kind of weird advocacy that shows up in Jira pissing matches.
I'm not talking about this specific AMD DRM code that's taking over the entire tree; I'm talking about the specific other examples that I've outlined in my prior message.
Instead of violating the HN guidelines by assuming malice and portraying me as never having bothered to think about the issue, why don't you educate all of us on why exactly it is a problem to have the build lists and usbdevs/pcidevs artefacts be part of the repos of reputable open-source projects like OpenBSD / NetBSD? Specifically, addressing the benefits that I've identified in the status quo?
Because all these other people who take issue with the practice never bother to provide any convincing arguments for why the rule should never be violated, either, or for how the benefits I've identified aren't worth the hassle. They literally don't even listen to why we're doing it, and they never provide alternatives that fit the requirements without lots of extra work or hassle. And they're not even doing OSS themselves, so it's not like their code quality is any better; it's well known that the quality of closed-source software is often far worse than OSS, especially when we're talking about OpenBSD here.
Yes, it's hardware registers. Could be a better idea to generate that from some more compact format, though.
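A minimal sketch of that compact-format idea, assuming invented register and field names (this is not how AMD's headers are actually produced): keep one small table per IP block and expand it into the verbose defines at build time.

    /* Sketch: expand a compact register table into the verbose
     * offset/shift/mask #defines. All names and values invented. */
    #include <stdio.h>

    struct field { const char *name; unsigned shift, width; };
    struct reg   { const char *name; unsigned offset;
                   const struct field *fields; int nfields; };

    static const struct field status_fields[] = {
            { "BUSY",  0, 1 },
            { "ERROR", 1, 1 },
    };
    static const struct reg regs[] = {
            { "EXAMPLE_STATUS", 0x0004, status_fields, 2 },
    };

    int
    main(void)
    {
            for (size_t i = 0; i < sizeof(regs) / sizeof(regs[0]); i++) {
                    const struct reg *r = &regs[i];
                    printf("#define mm%s 0x%04x\n", r->name, r->offset);
                    for (int j = 0; j < r->nfields; j++) {
                            const struct field *f = &r->fields[j];
                            unsigned mask =
                                ((1u << f->width) - 1) << f->shift;
                            printf("#define %s__%s__SHIFT 0x%x\n",
                                r->name, f->name, f->shift);
                            printf("#define %s__%s_MASK 0x%08xL\n",
                                r->name, f->name, mask);
                    }
            }
            return 0;
    }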
But really, who needs THAT many hardware registers? I'm guessing that what's happening here is that the internal "registers" are actually structs in memory, and someone wanted to address one memory location and thought, "I know, I'll auto-generate #defines for all of the words/longs/bytes!" And now we're stuck with a billion "hardware registers" that are no such thing, because somewhere in the code they might be used as such.
The size of the MMIO addressable register space on many modern VLSI devices is shocking the first time you see it. However, OS device drivers often do not need to access more than a small subset of the registers. In addition, the register layout for functional units within a larger device is often identical after accounting for unremarkable changes to unit base addresses or iterative, generational addition of new registers.
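(To illustrate that layout reuse with a hedged sketch, with invented names and offsets: the same register block recurs per instance, and only the base address changes.)

    /* Invented example: one functional unit replicated at different
     * MMIO base addresses; every instance shares the same offsets. */
    #include <stdint.h>

    #define UNIT0_BASE      0x00001000u
    #define UNIT1_BASE      0x00002000u
    #define UNIT_STATUS     0x0004u   /* same offset in every instance */

    static inline uint32_t
    read_unit_status(volatile uint32_t *mmio, uint32_t base)
    {
            return mmio[(base + UNIT_STATUS) / 4]; /* word-indexed aperture */
    }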
The problem is that the language and toolchain for OS device drivers cannot consume the manifests RTL designers use to enumerate registers, and RTL designers rarely share the manifests and toolchains they use to generate source files for the OS developers. Instead, it is common to generate and share sources for the entire MMIO space of every supported chip revision.
To eliminate the source bloat this produces, OS driver developers would need to work with RTL design teams to release IP sanitized register manifests and tooling that can generate saner outputs for their own consumption. This is fairly specialized work and there is not a strong business incentive for most large firms to support it.
What about just committing the symbols that are actually referenced in the code? I bet almost all of those registers are never mentioned elsewhere and could be culled with an appropriate dead-code elimination step (per release).
I feel people would then complain that they're shipping "incomplete" or "obfuscated" code.
These headers are likely generated from AMD's internal RTL, and serve somewhat as documentation to allow other OSS users to understand the interface. Even if most registers aren't used by most client drivers, and are "just" some internal detail of the firmware/GPU command processing, or even optional and completely unused in current drivers, they may be useful to the community if the host can see and modify them.
Sure, you could argue that these should be "split out" from the headers and documented separately, but at this point I'm generally happy for the hardware companies to give the community as much as possible and let them decide what's relevant.
I’m not aware of any widely available tools that can identify unreferenced C preprocessor macros, and newer language constructs that are amenable to such analysis are still fairly new.
Removing unreferenced definitions from open source patches would also underscore the fact that driver code is already largely inaccessible to contributors that don’t have access to hardware specifications. The few that persist without it probably appreciate that the full listings are still published somewhere.
So it’s not Digital Rights Management?
Direct Rendering Manager
I've always thought that was an unfortunate name collision.
On the one hand I sort of understand it: by not having a stable ISA you don't get tied down by past mistakes and can have really clever hardware innovations. But on the other hand it leads to bullshit like this, where every hardware device is a special snowflake and effectively needs its own unique driver.
Final thoughts: it's not perfect, but I really appreciate AMD's more open stance that gives us this source. And to the absolute heroes who are able to take this big steaming Linux-centric turd and make it work on OpenBSD: huge salutes, well done.
However I do wish AMD would clean up their driver. Or at least settle on an ISA.
> However I do wish AMD would clean up their driver. Or at least settle on an ISA.
These two things have absolutely nothing to do with each other: drivers don't talk to the cores, which "run" the ISA; they talk to the command processors (which are RISC-V or MIPS or some other uC) using the command packet protocol.
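(A rough sketch of what "command packet" means here; this simplified header layout illustrates the idea only and is not the exact encoding of AMD's real packet formats.)

    /* Invented illustration: the kernel driver queues fixed-format
     * packets for the GPU's command processor instead of emitting
     * shader-ISA instructions. */
    #include <stdint.h>

    static inline uint32_t
    pkt_header(unsigned opcode, unsigned payload_dwords)
    {
            /* packet type in the top bits, then length and opcode */
            return (3u << 30) | ((payload_dwords & 0x3fffu) << 16) |
                   ((opcode & 0xffu) << 8);
    }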
This lends some credence to the idea that modern computers are actually GPUs and the CPU and OS are just boot support software.
related "Linux kernel maintainer says no to AMDGPU patch" https://news.ycombinator.com/item?id=13136426
So how did we end up with a situation where 80% of the kernel is AMD DRM?
OpenBSD != Linux
Not relevant because AMD DRM is still the same, and it is distributed via Linux.
For Mac and Windows users: Not to be confused with Digital Rights Management:
https://en.wikipedia.org/wiki/Direct_Rendering_Manager
Belongs in userspace. Absolutely not the kernel.
But we can't seem to move past archaic UNIX architecture.
While there are issues with Windows' WDM (badly written drivers abound), it really should be looked at as a model.
_but_ that would require a stable ABI, which is specifically called out as not desired here: https://www.kernel.org/doc/html/next/process/stable-api-nons...
There are valid concerns, but the analysis doesn't lay out the issues with the current design either, making it a one-sided review.
Could you tell us more about the WDM advantages?
Most of the graphics driver is already in user space. What's in the kernel is there to manage memory, coordinate command submission and scheduling, and handle display output configuration. There is no reason why this shouldn't be in the kernel.
But why do we need 499MB for such simple tasks, where all the rest of the kernel combined still fits in 135MB, with all the drivers for all the other devices, including Intel graphics?
I mean, it's 3.7x larger than all the rest of the kernel (135 x 3.7 ≈ 499); how is it in any way reasonable for such a small set of functions, across so few devices, to take up so much space?
AMD is bloat