
The fun thing about learning to boot from PXE is that you have to learn it every time you onboard a new type of hardware... or a new VM hypervisor... or new NIC firmware... or new BIOS firmware.

God help you if you actually want to install an operating system.

PXE is such a vital capability for working with on-prem servers. But it's ten different things which all have to play nicely together. Every time I build a PXE system I feel like I'm reinventing the universe in my tiny subnet.



I've not found this at all -- PXE "just works" on legacy boot or UEFI for me. I've used it for years to install hosts via Foreman (https://theforeman.org/), as well as for personal stuff on my home network, and it's so much better than getting people to use USB sticks or whatever else!


Agreed, PXE seems ideal for provisioning things, but it's just too hard to use, especially when you're not on a network you fully control.

I just want to start the computer, and have it download an immutable OS image from somewhere I decide (and supply a checksum for, etc). I don't want to set up TFTP or any of this other stuff. It feels like I should be able to just specify an IP (let's say) and a checksum (maybe supplying that information to the NIC directly somehow), and be off to the races after a reboot.
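For what it's worth, the "image plus pinned checksum" half of this wish is easy to sketch in Python. This is a minimal sketch, not an existing tool: the function names are made up for illustration, and it assumes the image is published with a SHA-256 checksum and arrives as some file-like stream.

```python
import hashlib
import hmac
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object chunk by chunk, so a large image is never held in memory whole."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def image_matches(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True only if the streamed image hashes to the pinned checksum."""
    actual = sha256_of_stream(stream)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Anything with a `read()` method works as the stream, so the same check could wrap an HTTP response body or a local file.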


replace the PXE stack with an OS installer written in UEFI. This bootloader can be installed in the EFI partition through a guest running on the host, or possibly through PXE or direct UEFI HTTP load.

this allows you to intermediate the boot process without coordinating with the administrative owner of the DHCP server, and is actually less janky than PXE


Not to sound incredibly lazy (because I am), but is there anything off the shelf that does this? Anyone doing something similar would also be great -- I saw some UEFI-in-Rust projects recently so maybe it's not too hard to hack through myself.

That said, my original need for this was on Hetzner, and what I did instead was actually completely automate their reset setup (thankfully they have an API for that), so I have a solution, but I would much rather be able to prebake and load images more easily.

I think these days Hetzner has much more UEFI support across its dedicated server fleet.

Side note: Every time I see Tinkerbell and other relatively new PXE boot projects and their sets of tools I feel relief, then dig in, then feel dread again around iPXE.

Side note 2: I really want to do two things:

1. Load OSes from the network easily

2. Run the OS in RAM (ECC)

I feel like it would take my server management experience to the next level -- I've spent an inordinate amount of time messing with Hetzner USB add-ons and Alpine to try to get things to work, but it wasn't reliable (it worked, but wasn't reliable).


If you're familiar with the Linux boot process, you may be aware that the system often does run from RAM for a period of time, before mounting a filesystem from a block device and calling pivot_root to transfer the rootfs over.

If you want to run the OS from RAM, the absolute simplest way to do it is to simply never transfer. Building a custom initramfs has never been easier. There are tools explicitly geared towards making an initramfs, like Dracut, but also a huge ecosystem of tools for building container images which could almost certainly be scripted to spit out a cpio without a lot of trouble. You can even use something like Buildroot.
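To make the "spit out a cpio" step concrete, here is a minimal Python sketch that writes the `newc` cpio format the Linux kernel unpacks as an initramfs. It is an illustration under assumptions, not a replacement for Dracut or Buildroot: the function names are hypothetical, everything is owned by root, and only regular files and directories are handled (no symlinks or device nodes).

```python
import io
import stat


def _newc_entry(name: str, mode: int, data: bytes = b"", ino: int = 0) -> bytes:
    """Encode one archive member in the 'newc' (magic 070701) cpio format."""
    name_bytes = name.encode() + b"\0"
    fields = [
        ino,              # c_ino
        mode,             # c_mode (file type + permission bits)
        0, 0,             # c_uid, c_gid (root)
        1,                # c_nlink
        0,                # c_mtime
        len(data),        # c_filesize
        0, 0, 0, 0,       # c_devmajor, c_devminor, c_rdevmajor, c_rdevminor
        len(name_bytes),  # c_namesize, including the trailing NUL
        0,                # c_check (unused for magic 070701)
    ]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * (-len(out) % 4)   # pad header + name to a 4-byte boundary
    out += data
    out += b"\0" * (-len(out) % 4)   # pad file data to a 4-byte boundary
    return out


def build_initramfs(files: dict[str, tuple[int, bytes]]) -> bytes:
    """Build an uncompressed newc archive from {path: (permission_bits, contents)}.

    Parent directories are emitted before the first file that lives in them.
    """
    out = io.BytesIO()
    seen_dirs: set[str] = set()
    ino = 1
    for path in sorted(files):
        perms, data = files[path]
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts)):
            directory = "/".join(parts[:depth])
            if directory not in seen_dirs:
                seen_dirs.add(directory)
                out.write(_newc_entry(directory, stat.S_IFDIR | 0o755, ino=ino))
                ino += 1
        out.write(_newc_entry("/".join(parts), stat.S_IFREG | perms, data, ino=ino))
        ino += 1
    out.write(_newc_entry("TRAILER!!!", 0))  # end-of-archive marker
    return out.getvalue()


# Example: a one-file archive whose /init is a placeholder shell script.
archive = build_initramfs({"init": (0o755, b"#!/bin/sh\nexec /bin/sh\n")})
```

The resulting bytes can be written to a file and optionally compressed (gzip is a common choice). The kernel runs `/init` from the unpacked archive, so a real image needs an actual executable there plus whatever it depends on, for example a static busybox.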

As far as loading it from the network easily, I would also look into Unified Kernel Images (UKIs). They combine the EFI stub, the kernel, and the initramfs into one single file. You should be able to load that directly from the PXE firmware.


Yeah I'm familiar with initramfs-es (well at least enough to be dangerous)! I've run Alpine from RAM before but it just wasn't stable and I ended up going back to a more traditional and perfectly fine setup.

> If you want to run the OS from RAM, the absolute simplest way to do it is to simply never transfer. Building a custom initramfs has never been easier. There are tools explicitly geared towards making an initramfs, like Dracut, but also a huge ecosystem of tools for building container images which could almost certainly be scripted to spit out a cpio without a lot of trouble. You can even use something like Buildroot.

This certainly seems like a lot of responsibility to take on; I'm just surprised there isn't already a distro that is very good at doing everything it does (with modern ease of use) with the caveat of the OS disk being in RAM. Technically there are (there's a whole list on Wikipedia), but last I tried it just didn't work well for me.

These days there are more build-your-own distro options like CoreOS, Flatcar, and LinuxKit that absolutely make things even easier, but they don't have that final bit of being runnable from RAM.

I question whether it's worth it to maintain my own UEFI/initramfs/boot setup rather than just building (or pulling off the shelf) an immutable OS image and using the reset + wipe automation to flash the disks once in a blue moon when reconfiguring/scaling machines up/down.

> As far as loading it from the network easily, I would also look into Unified Kernel Images (UKIs). They combine the EFI stub, the kernel, and the initramfs into one single file. You should be able to load that directly from the PXE firmware.

This is certainly interesting -- haven't looked into these much, but this would certainly be useful in the PXE firmware (or custom UEFI). But my problem here is that iPXE is dependent on support at the provider level (and Hetzner does not make this available to end users, though they use it internally obviously when you reset), so the other commenter's suggestion:

> replace the PXE stack with an OS installer written in UEFI.

Would not work with this approach. The suggestion is interesting though: a custom installer that pulls down a UKI sounds viable from my particular armchair.


Tails works like this, and has gained a lot of traction in the past few years — although one may argue that running from RAM is only indirectly responsible for its popularity.

I think this idea appeals to many people, also concerning remanence: keeping your system and user data separate to the point that you could virtually mount your /home on any given UNIX host, with the added bonus that if the host is not compatible with your setup, you can always reboot it on your USB stick, run a live ISO on RAM, and retrieve a decent work environment.


True PXE doesn't require coordination with the DHCP server.

https://en.wikipedia.org/wiki/Preboot_Execution_Environment

The network client boot stack sends a DHCPDISCOVER as a broadcast. Any machine can be listening on UDP 67 (bootps) for this. The real DHCP server responds with the DHCPOFFER containing the IP address the client should use. Around the same time, the PXE server responds with its own DHCPOFFER that does not issue an address, but does contain the values for the requested DHCP options.

The client basically keeps broadcasting DHCPDISCOVER until it gets both, then it does the unicast DHCPREQUEST and waits for the unicast DHCPACK with the normal DHCP server.
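The "options but no address" offer described above can be sketched as a packet layout. This is a minimal Python sketch under assumptions: the function names, the example MAC, the address 192.0.2.10, and the boot file name are all hypothetical, and a real PXE server would likely need more options than shown. It only builds the bytes; it does not listen on port 67.

```python
import socket
import struct

# Fixed BOOTP header: op, htype, hlen, hops, xid, secs, flags,
# ciaddr, yiaddr, siaddr, giaddr, chaddr, sname, file (236 bytes total).
BOOTP_FIXED = struct.Struct("!BBBBIHH4s4s4s4s16s64s128s")
MAGIC_COOKIE = b"\x63\x82\x53\x63"
DHCPOFFER = 2


def option(code: int, value: bytes) -> bytes:
    """Encode one DHCP option as code, length, value."""
    return bytes([code, len(value)]) + value


def build_proxy_offer(xid: int, client_mac: bytes, server_ip: str, boot_file: str) -> bytes:
    """Build an offer that carries boot information but leaves yiaddr at 0.0.0.0."""
    server = socket.inet_aton(server_ip)
    fixed = BOOTP_FIXED.pack(
        2, 1, 6, 0,            # op=BOOTREPLY, htype=Ethernet, hlen=6, hops=0
        xid,                   # echo the client's transaction id
        0, 0x8000,             # secs, flags (broadcast bit set)
        b"\0" * 4,             # ciaddr
        b"\0" * 4,             # yiaddr: no address is being offered
        server,                # siaddr: where to fetch the boot file
        b"\0" * 4,             # giaddr
        client_mac,            # chaddr (struct pads it to 16 bytes)
        b"",                   # sname
        boot_file.encode(),    # file (struct pads it to 128 bytes)
    )
    options = (
        option(53, bytes([DHCPOFFER]))   # DHCP message type
        + option(54, server)             # server identifier
        + option(60, b"PXEClient")       # vendor class: marks this as a PXE reply
        + b"\xff"                        # end option
    )
    return fixed + MAGIC_COOKIE + options


# Example with made-up values.
offer = build_proxy_offer(0x12345678, bytes.fromhex("525400123456"), "192.0.2.10", "ipxe.efi")
```

The point of the sketch is the all-zero `yiaddr`: the address itself still comes from the real DHCP server's own offer.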

Now, that said, I've only ever seen this work with commercial PXE servers like Microsoft RIS. To my knowledge, ISC DHCPD is unable to send a DHCPOFFER with options but no address. But my knowledge is at least a decade out of date.

At home I just set the options on the main DHCP server like every other hobbyist does, but this isn't true PXE, this is just plain old DHCP+TFTP remote boot.

Let's say you do have such a server that sends DHCPOFFER with the options and no IP address. If it's on its own machine, then it can listen on port 67, same as the real DHCP server on another machine. But, if it's on the same machine as the DHCP server, it has to listen on port 4011. In this case the client behaves a little differently. For this to work, the DHCP server must send as part of the DHCPOFFER an unsolicited option 60 to indicate that the client should go ahead and accept the IP then send a second unicast DHCPDISCOVER to port 4011 and await a DHCPOFFER from that port. Option 60 is only needed, and can only be used, if the independent PXE server is running on the same host as the DHCP server.
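The client-side decision in the same-machine case can be sketched like this. It is a minimal Python sketch that follows the description in this comment rather than the text of the PXE specification: the function names are made up, and it assumes the signal is option 60 with a value starting with `PXEClient`.

```python
PROXY_DHCP_PORT = 4011  # port of the PXE service when it shares a host with DHCP


def parse_options(raw: bytes) -> dict[int, bytes]:
    """Decode a DHCP options field (the bytes after the magic cookie)."""
    options: dict[int, bytes] = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == 255:            # end option
            break
        if code == 0:              # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(raw):      # truncated option, stop parsing
            break
        length = raw[i + 1]
        options[code] = raw[i + 2 : i + 2 + length]
        i += 2 + length
    return options


def needs_port_4011_query(offer_options: dict[int, bytes]) -> bool:
    """True if the DHCP server's offer carries the unsolicited option 60 hint."""
    return offer_options.get(60, b"").startswith(b"PXEClient")


# Example: an offer's option bytes with message type OFFER and option 60 set.
raw = bytes([53, 1, 2, 60, 9]) + b"PXEClient" + b"\xff"
if needs_port_4011_query(parse_options(raw)):
    next_port = PROXY_DHCP_PORT    # follow up with a unicast request here
```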

So there are basically 3 scenarios:

* Hobbyist: just configure the booting options on the real DHCP server.
* Real PXE, separate machine: both real DHCP and PXE listen for broadcast DHCPDISCOVER and respond with complementary DHCPOFFERs. The real DHCP server has no knowledge whatsoever about booting.
* Real PXE, same machine: the real DHCP server responds with unsolicited option 60 no matter what. This is the extent of its knowledge of booting. The separate PXE server runs on port 4011 instead of 67, and everything is unicast.

There may finally be hobbyist projects that support this model, but when I last did this stuff, there were not. Learning how RIS worked was a revelation for me, and it really made me wonder why the hobbyist community of the time seemed hellbent on not doing PXE correctly. The hobbyist approach annoyingly requires control over the options set by the real DHCP server, and often makes it impossible to do fun stuff like use different boot files for different clients.


we need to go /stalinmode/ on the whole bootup and initialization industry subsector. it should be required by law for that stuff to be open source and documented.

"but muh competitive advantage??"

it's literally a for loop that reads sectors from disk/network into memory and jumps to the start address.

if a local build of the (vendor provided source code) firmware doesn't match the checksum of the build that's flashed on the actual mobo, you get sent to a cobalt mine.


Boot by committee (UEFI) doesn't seem much better than boot by fiat (BIOS). For everything nice it gives you, you lose something nice that BIOS gave you ... or you have something nice that you lose when you exit boot services. Or there's an extension for something nice that isn't usable on mainstream hardware.

UEFI gives you nicer video modes, but not a text mode after boot services.

UEFI has an extension for booting images from the network, but afaik, it's impossible to use, and there's no reasonable way to boot from a disk image; working UEFI network boot has to pull pieces out of the filesystem and present them separately, as opposed to MEMDISK, which makes the image available as a BIOS disk and labels the image so that once the OS is loaded, the image can be used without BIOS hooks. If this is possible on UEFI generally, it isn't widely distributed knowledge. By that I mean something that will work on any UEFI system that makes it to iPXE, subject to changes to the OS in the image (which is reasonable... MEMDISK needs changes too, unless the OS runs all disk I/O through BIOS APIs).


It's been a long while since I did anything with UEFI, but my recollection is that the standard data structures are reasonably well documented, especially when they are meant to be part of booting an OS.

I imagine that whatever UEFI extension implements loading a disk image over the network probably also implements some way of knowing where the sectors are in RAM so that the OS bootloader can choose to hand off access to the memory disk to the OS.

Is such support implemented in any bootloaders? I have no idea. My guess is probably not because people would rather just have the bootloader use the available services to download the disk image itself.


You're getting downvotes for being hyperbolic about it, but boot integrity is really both a consumer safety and a national security issue.


I’m confused, are you talking about getting PXE enabled in the hardware, or customizing something about your PXE software for the new hardware?


There's a lot of nonsense at every level. Especially when dealing with heterogeneous infrastructure.

Some NICs support HTTP. Some NICs support TFTP. Some NICs have enough memory for a big iPXE, other NICs don't. Some BMC systems make next-boot-to-LAN easy, but not all.

We almost always use iPXE in order to normalize our PXE environment before OS kickstart. There's a lot to it and quite a lot of little things that can go wrong. Oh, and every bit of it becomes critical infra.


Ok, that makes more sense. I'm used to iPXE, and I guess that quick bootstrap from PXE->iPXE bypasses a lot of the nonstandard weirdness.


All of 'em.


Yeah, in order to automate, you’ve gotta know something about what you’re automating. PXE is no different.



