3 Mar 2025

Refreshing Wireguard VPN

Quick post:

I've been using Wireguard to connect to my offsite back-up server for a little while now. It works quite well, but I've encountered an issue with one of its limitations: dynamic IP addresses. The remote network has a dynamic IP address and uses dynamic DNS to provide name resolution. This works perfectly fine when initializing the WG interface, but if the IP changes afterwards, WG does not attempt to re-resolve the domain name. The only way to restore the link is to restart the WG service.
This limitation isn't much of a problem for static IPs or for devices that routinely disconnect and reconnect (like a cell phone or laptop), but when trying to run two servers with a 24/7 connection, it poses a problem.

Luckily, I think it's a problem with a simple solution:

#!/bin/sh
# Ping the remote peer over the tunnel; on failure, send a Pushover
# notification and bounce the Wireguard interface to force re-resolution.
if ping -c 1 10.10.0.15 > /dev/null 2>&1; then
  echo "success"
else
  echo "failure, restarting interface"
  curl -s \
    --form-string "token=$PUSHOVERTOKEN" \
    --form-string "user=$PUSHOVERUSER" \
    --form-string "message=Lost Connection to remote server - Restarting VPN" \
    https://api.pushover.net/1/messages.json > /dev/null 2>&1
  ifdown wg_remote
  sleep 5
  ifup wg_remote
fi

I created this script, which runs periodically on my router via cron. If the router can ping the remote server via its WG address (10.10.0.15), we assume that the VPN is functioning. If the ping fails, the script sends me a quick FYI via Pushover, then brings the VPN interface (wg_remote) down, waits a few seconds, and brings it back up, triggering a fresh domain resolution.
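
For reference, the cron entry itself is nothing fancy. A sketch, assuming the script lives at /usr/local/bin/wg-check.sh and a five-minute interval (both the path and the interval are illustrative; pick whatever suits your tolerance for downtime):

# m h dom mon dow  command
*/5 * * * * /usr/local/bin/wg-check.sh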

It's simple, but it seems to work.

21 Feb 2025

Cloudflare for the Selfhoster

If you selfhost any applications and have wondered how best to access those applications from outside your network, you’ve undoubtedly come across Cloudflare. Cloudflare offers two services in particular that might be attractive to homelabbers and selfhosters: reverse proxying and “tunnels”.

Both of these offer some degree of benefit: proxying potentially offers a degree of Denial of Service (DoS) [1] protection and use of their Web Application Firewall (WAF), and the tunnel additionally adds the benefit of circumventing the host’s firewall and port-forwarding rules. To sweeten the deal, both of these services are offered “for free”, with proxying actually being the default option on their free plans.

Let’s examine these benefits briefly and see why they’re as popular as they are.

Advantages


DoS Protection

By virtue of being such a massive network, Cloudflare’s proxies are able to absorb a massive number of requests and block those requests from ever reaching their customer. You can think of it as a dam: when a massive flood of requests comes in, Cloudflare stands in front of the customer holding back the waters.

WAF

Cloudflare’s WAF allows customers to set firewall rules on the proxy itself, which can block or limit outside requests from ever reaching the customer, similar to the DoS protection.

Bypassing Firewalls (tunnel)

Many firewalls are configured to allow outgoing connections and block incoming connections. This is great for security, since it prevents others from accessing your stuff from the public internet, but it can be a problem if you want people to access your website or services. Cloudflare’s tunnel service takes advantage of the “allow outgoing” rule to establish a connection from the host server to Cloudflare, and then allows incoming requests to tunnel through this connection and reach the host server.

Bypassing Port-Forwarding rules (tunnel)

If you’re hosting a service from behind an IPv4 address, there’s a good chance that you’re using a technology called Network Address Translation (NAT) [2]. This allows you to have multiple systems running behind a single public IP address. The concept can be taken further by Carrier Grade NAT (CGNAT), where the ISP applies NAT rules to its customers. The problem with NAT is that it becomes difficult to make an incoming request to a machine behind it: to an outside system, all of the machines behind the NAT layer appear to have a single address. To partially overcome this, a rule needs to be configured that says “when a request comes in for port X, forward it to private address Y, port Z.” This allows, in a limited way, an outside connection to use the NAT address and a particular port to send traffic to a specific machine behind the NAT layer. In the case of CGNAT, however, these rules would need to be created by the ISP, a service which most do not offer. Just as it takes advantage of outgoing firewall rules, a tunnel also uses outgoing connections to avoid the need for specific port-forwarding rules, and then directly tunnels a list of ports straight through to the host machine.
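
For illustration, on a plain Linux router that “port X to address Y, port Z” rule might look something like this iptables sketch (the interface, addresses, and ports are made up):

# forward incoming TCP on public port 8080 to the web server at 192.168.1.50:80
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 8080 \
  -j DNAT --to-destination 192.168.1.50:80
# (a matching accept rule in the FORWARD chain is usually needed as well)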

On the surface, this all sounds like a great product; it allows smaller customers to selfhost even if in a sub-optimal environment.



Considerations


Cloudflare’s proxy and tunnel models both have several aspects, however, that might make some want to reconsider their use.

DoS Protection

DoS attacks can vary in size and scope, and their impact will depend not only on the volume of the attack, but also on the capabilities of the targeted system. For example, for a small home server, a few hundred connections per second might be enough to cause a denial of service. At the extreme opposite, Cloudflare’s servers routinely handle an average of 100,000,000 requests per second without issue. With those numbers in mind, a DoS on a small home server might be so small that it goes unnoticed by Cloudflare’s protections. I do have to say “might” here, though, because I could not find a definitive answer to how Cloudflare determines what a DoS attack looks like. I would expect, however, that the scale of a home server would be so small compared to the majority of Cloudflare’s customer base that an effective DoS attack would not look significantly different from “normal” traffic.
Lastly, and quite importantly, this protection only exists for attacks targeted at a domain name. Domains exist to make the internet easier for humans; for a computer script or bot, an IP address is far simpler. If an attacker targets the host’s IP address directly, that attack will completely bypass whatever protection Cloudflare was providing. If you’re behind CGNAT, this means the attack will target the ISP, but if you have a publicly addressable IP address, that attack is directly targeting your home router.

Security

Cloudflare offers these services for free, and as the saying goes: “If you are not paying for it, you’re not the customer; you’re the product.” Cloudflare advertises that their proxy and tunnel services can optionally provide “end-to-end” (e2e) encryption. In the traditional and commonly used definition, this means that traffic is encrypted by the client (you or your users) and decrypted by the host (your server). Under the normal definition, no intermediary device can decrypt and read the traffic.

Cloudflare, however, uses the term a little differently, as you can see in this graphic.

cloudflare’s “end-to-end” encryption

Instead of providing traditional e2e, Cloudflare acts as a “Man in the Middle” (MITM), receiving the traffic, decrypting it, analyzing it, and then re-encrypting it before sending it along. Cloudflare does this in order to provide their services; they collect and store the unencrypted data in order to apply WAF rules, analyze patterns, and so on.
Now, Cloudflare is a giant company with billions in contracts; contracts they could potentially lose if they were found to be misusing customer data. They wouldn’t benefit from leaking your nude photos from your NextCloud instance or exposing your password for Home Assistant, but you should understand that by giving them the keys to your data, you are placing your trust in them. This MITM position also means that, theoretically, Cloudflare could alter your content without you (or your users) knowing about it. Normally this would cause a modern browser to display a very large “SOMEONE MIGHT BE TRYING TO STEAL YOUR DATA” warning, but because you are specifically allowing Cloudflare to intercept your data, the browser has no way of confirming whether the content actually came from your server or from Cloudflare themselves.
Cloudflare does have a privacy policy, which explains exactly how Cloudflare intends to use your data, and a transparency report, which is intended to show exactly how many times each year Cloudflare has provided customer data to Government entities.

The warning you would normally get during a MITM attack:
SSL Warning

Content Limitations

Lastly, Cloudflare, by default, only proxies or tunnels certain ports. If you want to forward an unusual port through Cloudflare, like 70 or 125, you would need to use a paid account or install additional software. Cloudflare's Terms of Service also limit the type of data you can serve through their free proxies and tunnels, such as prohibiting the streaming of video content on their free plans.

Alternatives


Are there other ways to get similar benefits?

  • If only a limited number of users need access to your server or network and you have a public IP address, the best solution by far is to use a VPN such as Wireguard. Wireguard allows you to securely access your network without exposing any attack surface to bots or malicious actors.
  • If you don’t have a public IP, there’s a service called “Tailscale” which uses the same outgoing-connection trick to bypass firewalls and CGNAT, but instead of acting as a MITM privy to all of your traffic, Tailscale simply coordinates the initial connection and then allows the client and host to establish their own secure, encrypted connection.
  • If you do want to expose a service to the world (like a public website) and you have a public IP address, you can simply forward the needed port(s) from your router to the server (typically 443 and possibly 80).
  • If you want your service to be exposed to the world, but are stuck behind CGNAT, then a Virtual Private Server (VPS) is a potential option. These typically have a small cost, but they provide you with a remote server that can act as a public gateway to your main server. They can also provide a degree of DoS protection, since they’re on a much larger network and you can simply turn off the connection between the VPS and your server until things calm down.

Closing thoughts

Cloudflare offers some great benefits, but if you're particularly security-minded, you may want to look into alternatives. Even though Cloudflare is trusted by numerous customers around the world, you still have to decide whether you want to trust them with your data. And while alternatives exist, Cloudflare's offerings are mostly free and comparatively easy to use.




1. DoS, or a "Denial-of-Service" attack, is a cyber attack in which an attacker floods a server with internet traffic to prevent legitimate users from accessing its services. Cloudflare offers an example of one type of DoS attack here.

2. NAT translates private IP addresses in an internal network to a public IP address before packets are sent to an external network.

31 Jan 2025

Short updates - new keyboard and SSDs

New Keyboard

I picked up a Feker Alice98 in an attempt to get something a little more "ergonomic" than my previous keyboard. It's an "Alice" layout, with the interesting distinction of having two "B" keys.
This keyboard is also compatible with VIA, making keymapping, backlighting, and macros easy to manage. It isn't listed on VIA's website, though, so it requires manually importing the keyboard definition file (attached to this post). Hopefully VIA will add it officially eventually.

"New" SSDs for the cluster

I also picked up some gently-used Intel TLC SSDs for my main Proxmox nodes.

I had noticed before that the IO delay [1] on my nodes would creep into the 80%+ range when running any reasonably write-intensive task, like a system update. I also received feedback that this blog seemed slow, which I suspected might be caused by the same issue. While the consumer QLC SSDs I was previously using seem fast for normal desktop use, their shortcomings become quite noticeable when running multiple VMs. Here's a screenshot of the IO delay during fairly normal use before, compared to the IO delay now while running updates:

(blue is the IO delay)

While the old ones would routinely creep into the 50%-or-higher range while running fairly simple tasks, the new SSDs peaked around 5%, even under load.

 

Note(s)

  1. ^ Amount of time that the CPU hangs idle while waiting for IO tasks (reading or writing to a drive) to complete.

26 Jan 2025

Blog Mirroring to Gopher

This blog is also available via gopher! And here's how:

Gophernicus

As I mentioned in my previous post, my gopher server of choice is Gophernicus. One of the benefits of Gophernicus is that it automatically generates gophermaps based on the files it finds in a directory. This means that adding entries is as simple as dropping a file into the directory. The next time the server is accessed, the new file will appear automatically.

Mirroring

All that remains is finding a way to easily add files to the gopher directory. Since I already have this blog, I decided the first thing I wanted to do was mirror these posts into the gopherspace. This blog uses Dotclear, which provides a plugin to access an RSS feed for the blog. By fetching (and then parsing) this feed, I can export the contents into text files accessible to gopher. I wrote a perl script to accomplish this and created systemd units to execute that script on a recurring schedule to pull in new entries. The full source code is available on my github. The script uses LWP::Protocol::https to fetch the RSS feed and XML::Feed to extract the title, date, and body of each entry. The date and title are used as the file name, and the body is reduced down to plain text and then written to the file.
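
The systemd side is a standard service-plus-timer pair. A minimal sketch, with the unit names, schedule, and script path as placeholders (the real units are in the repo):

# /etc/systemd/system/gopher-mirror.service
[Unit]
Description=Mirror blog RSS feed into the gopherhole

[Service]
Type=oneshot
ExecStart=/usr/local/bin/rss2gopher.pl

# /etc/systemd/system/gopher-mirror.timer
[Unit]
Description=Run the blog-to-gopher mirror hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target

Enabling the timer (rather than the service) starts the schedule: systemctl enable --now gopher-mirror.timer.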

If you'd like to use the script, it should, probably, maybe work with other RSS and ATOM feeds, but some feeds can be a bit loosey-goosey about how they handle their XML, so no guarantees.

18 Jan 2025

Let's Gopher!

I've decided it's time to get my gopher server running again, and wanted to outline the basic steps for anyone else looking to dive into the gopherhole.

install

I personally like gophernicus, a "modern full-featured (and hopefully) secure gopher daemon." Assuming that you're running a Debian server, we can install a simple gopher server via apt:

apt install gophernicus

installing screenshot

This should also install a socket unit and create a service template for gophernicus.

config

Gophernicus defaults to serving from either /var/gopher/ or /srv/gopher/. On my install, the default config file is stored at /etc/default/gophernicus. I feel like /etc/default/ isn't used very often these days, but the intention is to provide a simple way for a developer to ship a default configuration for their application. We can create a new config somewhere reasonable, like /etc/gophernicus, or just modify the existing file in /etc/default/ like a lunatic. I'm choosing the latter, of course.

The important thing to add is a hostname, but we can also turn off any features we don't want.

My config looks like OPTIONS=-r /srv/gopher -h gopher.k3can.us -nu.

The -nu option just disables the "personal" gopherspace (serving of ~user directories). Unless you know what this does and intend to use it, I'd suggest disabling it.

testing

We should now be able to start the service and access our empty gopherhole via systemctl start gophernicus.socket, which will create an instance of the gophernicus@.service unit. We can run a quick test by creating a file to serve and viewing the gophermap. To create the test file: touch /srv/gopher/test.txt, and then we can fetch the gophermap via telnet [ip address] 70.

 Trying 192.168.1.5...
Connected to 192.168.1.5.
Escape character is '^]'.

i[/]    TITLE   null.host   1
i       null.host   1
0hello-world.txt                       2025-Jan-18 09:47     0.1 KB /test.txt   gopher.k3can.us 70
.
Connection closed by foreign host.

That little jumble of text is our gophermap. Lines starting with i indicate "information", while the 0 indicates a link to a text file. Gophernicus creates this map automagically by examining the content of the directory (although it also provides the option of creating a map by hand). To add files and folders, we can simply copy them into /srv/gopher and gophernicus will update the map to include the new files.
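
If you ever do want to write a map by hand, each selector line follows a simple tab-separated layout: an item type plus display string, then the selector (path), host, and port. A made-up example (the <TAB> markers stand in for literal tab characters):

0About this server<TAB>/about.txt<TAB>gopher.k3can.us<TAB>70
1A subdirectory of stuff<TAB>/stuff<TAB>gopher.k3can.us<TAB>70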

From here, if we want to expose this publicly, we can simply route/port-forward through our router.

In my case, though, I'm going to need to configure a few more components before I open it up to the public... First, I use apparmor (https://apparmor.net/) to limit application access, and second, my webserver lives behind a reverse proxy.

apparmor

For apparmor, I created a profile for gophernicus:

include <tunables/global>
# AppArmor policy for gophernicus
# by k3can

/usr/sbin/gophernicus {
  include <abstractions/base>
  include <abstractions/hosts_access>
  network inet stream,

  /etc/ld.so.cache r,
  /srv/gopher/ r,
  /srv/gopher/** r,
  /usr/bin/dash mrix,

}

This profile limits which resources gophernicus has access to. While gophernicus should be fairly secure as is, this will prevent it from accessing anything it shouldn't on the off-chance that it somehow becomes compromised. Apparmor is linked above if you want to get into the details, but I'm essentially telling it that gophernicus is allowed to read a few commonly-needed files, run dash, and access its TCP stream. Gophernicus will then be denied access to anything not explicitly allowed above.
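
Loading the profile is the usual AppArmor routine; assuming the profile is saved under /etc/apparmor.d/ (the file name here is illustrative):

# load (or reload) the profile into the kernel
apparmor_parser -r /etc/apparmor.d/usr.sbin.gophernicus
# confirm it's active
aa-status | grep gophernicus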

nginx

Lastly, to forward gopher through my reverse proxy, I added this to my nginx configuration:

#Gopher Stream Proxy

stream {
    upstream gopher {
        server 192.168.1.5:70;
    }

    server {
        listen 70;
        proxy_pass gopher;
    }
}

Since nginx is primarily designed to proxy HTTP traffic, we have to use the stream module to forward the raw TCP stream to the upstream (host) server. It's worth noting that as a TCP stream, nginx isn't performing any virtual host matching; it's simply passing anything that comes in on port 70 along to the upstream server. This means that while I've defined a specific subdomain for the host, any subdomain will actually work as long as it comes in on port 70; gopher://blog.k3can.us, gopher://www.k3can.us, and even gopher://sdkfjhskjghsrkuhfsef.k3can.us should all drop you into the same gopherhole.

access

While telnet will show you the gophermap, the intended way to traverse gopher is through a proper client application. For Linux, Lynx is a command-line web browser/gopher client available in most repos, and for Android, DiggieDog is available through the Play Store.

next steps

Now all that's left for me to do is add content. I used to run a script that would download my Mastodon posts and save them to a "phlog" (a gopher blog), and I could potentially mirror this blog there, as well. That helped keep the content fresh without needing to manually add files. I haven't quite decided if I want the gopherhole to primarily be a mirror of my other content, or if I want to be more intentional with what I put there.

Besides figuring out content, I'm also curious about parsing my gophernicus logs through Crowdsec. Unsurprisingly, there's not currently a parser available on the Crowdsec Hub, so this might take a little tinkering...

12 Jan 2025

Wireguard - Securely connecting to your network

Background

After setting up my home server, I found that I wanted a way to securely access and manage my system when I was away from home. I could have opened the SSH port to the internet, but the idea of having such a well-known port exposed to the internet made me wary not only of attempts to gain unauthorized access, but even just simple DoS attacks. I decided the far better option was to use a VPN; specifically, Wireguard.

Wireguard is a simple, lightweight VPN, with clients for Linux, Android, Windows, and a handful of other operating systems. In addition to the security offered by its encrypted VPN tunnel, Wireguard is also a "quiet" service; that is, the application will entirely ignore all attempts to communicate with it unless the correct encryption key is used. This means that common intelligence gathering tactics, such as port scans, won't reveal the existence of the VPN. From the perspective of an attacker, the port being used by Wireguard appears to be closed. This can help prevent attacks, including DoS, because an attacker simply won't know that there's anything there to attack.

Now, when I set about trying to install and configure Wireguard, I found that many of the online guides were either overly complex, or so incredibly simple that they only provided a list of commands to run without ever explaining what those commands did. The overly complex explanation turned out to be too confusing for me, but on the other hand, I'm also not someone to run random commands on my system without understanding what they do. I did eventually figure it out, though, so I thought I'd try writing my own explanation to hopefully offer others a middle ground.

Introduction

Wireguard uses public key encryption to create a secure tunnel between two "peer" machines at the network layer. It's free, open source software, and is designed to be simple to use while still providing a minimum attack surface.

In essence, wireguard creates a new IPv4 network on top of an existing network. On a device running wireguard, the wireguard network simply appears as an additional network interface with a new IP address. On a typical laptop, for example, one might have several network interfaces, like lo, eth0 and wlan0, with an additional interface for wireguard appearing as wg0.

Installation

Installing Wireguard is typically done through your package manager (apt, dnf, pacman, etc.), or optionally via a container or packaging format such as podman, Flatpak, snap, AppImage, or docker.

For example, apt install wireguard-tools. Depending on your system setup, you may need to preface this and some of the other commands in this walkthrough with the word "sudo".

Configuration

Once installed, wireguard will need to be configured. Wireguard considers devices to be peers, rather than using a client/server relationship. That means there is no "primary" device that the others connect to; rather, they can all connect to each other, just like on a typical LAN. That said, for the sake of explanation, it is sometimes easier to understand if we use the server/client labels, despite the devices not technically functioning in that relationship. In other words, I'm going to say "server" and "client" because I find it less confusing than saying "peer A" and "peer B".

We'll start on the "server" (aka. Peer A, not technically a server, yada yada) by running wg genkey | tee private.key | wg pubkey > public.key.
wg genkey produces a random "private key", which will be used by the peer to decrypt the packets it receives. tee is a basic linux command that sends the output of one command to two destinations (like a T-shaped pipe). In this case, it is writing the private key to a file called "private.key" and also sending the same data to the next command, wg pubkey, which derives a "public key" from the private key it was given. Lastly, > writes the output of wg pubkey to the file public.key. After running that command, we have two files: private.key and public.key.

We can view the actual key by running cat private.key or cat public.key.

We'll repeat those same steps on the "client".

Next, let's decide what network address we want to use for the wireguard network. Remember that this needs to be different from your normal local network. There are several private ranges available, but the most common is 192.168.1.0/24. The first three "octets", 192, 168, and 1, define the network in this IP address, while the last one defines the device's address within that network. So what's important here is that if your primary network is 192.168.1.0/24, we assign the wireguard network a different address, like 192.168.2.0/24. By changing that 1 to a 2, we're addressing an entirely different network.

Now, we'll create the configuration files themselves. On the server, we'll create a configuration file using a text editor, such as nano. nano /etc/wireguard/wg0.conf will create a file called wg0.conf and open it for editing. The name of this configuration file will also be the name of the network interface. It doesn't need to be wg0, but that's a simple, easy-to-remember option.

Here is an example configuration file for the server:

[Interface]
PrivateKey = (Your server's private key here)
Address = 192.168.2.10/24  
ListenPort = 51820

[Peer]
PublicKey = (Client's public key here)
AllowedIPs = 192.168.2.0/24
Endpoint = (see below)

PrivateKey is the one that we generated on this server.

Address is the IP address that we're assigning to this server on the wireguard interface. This is our wireguard network address.

ListenPort is the "port" that wireguard is going to "listen" to. If you're not familiar with ports, think of it this way: an IP address is used to address packets to a specific device, and a port number is how you address those packets to a specific application on that device. In this case, Wireguard will be "listening" for packets that come in addressed to port number 51820. That number is somewhat arbitrary, but for uniformity, certain applications tend to use certain port numbers and 51820 is typical for wireguard.

The [Peer] section defines the other system/s on the network, in this case, our "client" system.

PublicKey is the public key created on the client device.

AllowedIPs tells wireguard which IP addresses it should forward to this peer. Here, we're telling it that any packets addressed to the 192.168.2.0/24 network should be forwarded out to this peer (our client system).

Endpoint is how we reach this peer. Because wireguard creates a network on top of an existing network, we need to tell wireguard how to get to the peer on the existing network before it can use the wireguard network. If you're connecting two devices on your LAN, you would just enter the normal IP address of the peer, such as 192.168.1.123. If you're connecting to another device over the internet, you will need a way to address packets to that device, such as a public IP address or a domain name. This will be specific to your network situation. Following the IP or domain name is a colon and the port number. For this example we'll assume the other machine is on your LAN at address 192.168.1.123, so we can just enter the IP address and the port number, like so: 192.168.1.123:51820. Lastly, save the file.

We'll then reverse this process on the 'client', assigning it a different IP address on the same network, such as 192.168.2.20/24, and using the client's private key and the server's public key. Once we've completed the configuration files on both devices, we can use the command wg-quick up wg0, telling wireguard to start (bring up) the wg0 interface we just configured.
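
Put together, a minimal client-side wg0.conf might look like this, using the example addresses from above (here the Endpoint points back at the server's address on the existing network):

[Interface]
PrivateKey = (Your client's private key here)
Address = 192.168.2.20/24

[Peer]
PublicKey = (Server's public key here)
AllowedIPs = 192.168.2.0/24
Endpoint = (Server's LAN or public address):51820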

Final Notes

It should be noted that the "Endpoint" setting is only required on one of the two peers. This can be useful if you want to use wireguard while you're away from home. My "server" at home may keep the same public IP all the time, but my portable laptop will have a different public IP address every time I move to a new network. In this case, I would remove the "endpoint" setting entirely from the configuration file on the server, and only use that setting on the laptop. Remember, one system needs to be able to connect to the other over an existing network before wireguard can create the VPN connection between them. Most of the issues a user might encounter while setting up wireguard aren't actually related to wireguard itself, but rather to the underlying network. Firewalls and Network Address Translation (NAT) are the most common causes of problems, but the steps to address these issues will vary significantly depending on your network situation.
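
As one concrete example of the firewall side: wireguard traffic is UDP, so if the reachable peer sits behind a host firewall like ufw, the listen port from the config has to be allowed through before the handshake can ever complete:

ufw allow 51820/udp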

9 Dec 2024

My Proxmox Disaster Recovery Test (of the unplanned variety)

Welp, I lost another drive.

Homelab disasters are inevitable; expected, even. Recently, a failed disk resulted in losing an entire node in my Proxmox cluster. While the initial shock was significant, a solid backup/restore plan and a High Availability (HA) setup ensured a pretty swift recovery.

When I first started playing with my homelab, I was primarily using RPis, and I lost a couple of SD cards to corruption or failure. This helped demonstrate to me the importance of regular backups, and I've made an effort to back up my homelab systems ever since. I now virtualize most of my servers via Proxmox and perform nightly backups to a local Proxmox Backup Server (PBS), which is then synchronized to an offsite PBS server. I've tested the restore function a few times and it seemed like a fairly straightforward process.

For background, my current Proxmox cluster consists of three Lenovo Tiny SFF PCs. Two of those PCs are currently running only a single internal storage device, which is used for both the boot and host partitions as well as storage for all of the guest OSes. This means that if a disk fails, it takes out the entire node and everything running on it.

...Which is exactly what happened a couple weeks ago. Booting into the BIOS showed that the drive had failed so hard that the system didn't even acknowledge there was a drive installed at all. The drive, by the way, was a Critical-branded NVMe that I had purchased only two months prior. That's just enough time to be outside of Amazon's return period, yet significantly short of any reasonable life expectancy… but I digress. With the death of the drive, I lost that node's Proxmox host OS and all of the VMs and containers running on that node. The HA-enabled guests were automatically migrated to one of the remaining nodes, exactly as intended (yay!). The non-HA guests I had to restore manually from PBS backup. I was quite pleased with how quick and easy it was to restore a guest from PBS; everything could be done through the web GUI in just a couple clicks. It's obviously never fun to lose a disk, but PBS made the recovery pretty painless overall.


With the guests back up and running again, I removed the failed node from the cluster and purged any of its remaining config files from /etc/pve/nodes.

For the failed node itself, I had to replace the drive and then reinstall Proxmox from scratch. From there, I pointed apt to my apt-cacher-ng server and then ran a quick Post Install Script, before configuring my network devices and finally adding the "new" node back into the cluster. The whole process took only a couple hours (including troubleshooting and physically installing the new drive), and most of the hosted systems (such as this blog) were only offline for a handful of minutes, thanks to the High Availability set-up.

Needless to say, I was quite happy with my PBS experience.   
...And not so much my experience with Critical’s NVMe drives. 

23 Nov 2024

Caching Apt with Apt-Cacher NG

It recently occurred to me that as I update each Linux container or VM, I'm downloading a lot of the same files over and over again.  While the downloads aren't huge, it still seems wasteful to request the same files from the repo mirrors so many times... So why not just download the update once and then distribute it locally to each of my systems?  

That's the purpose of a caching proxy.

I chose apt-cacher-ng as it's very simple to set up and use, so I spun up a dedicated LXC and installed apt-cacher-ng via apt. Once it was up and running, it was just a matter of following the included documentation to point all of my other systems to that cache.
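
For anyone curious, pointing a Debian client at the cache amounts to a single apt proxy setting; apt-cacher-ng listens on port 3142 by default (the hostname below is illustrative, and any file under apt.conf.d works):

# /etc/apt/apt.conf.d/01proxy
Acquire::http::Proxy "http://aptcache.lan:3142";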

After upgrading just a couple of systems, I can already see the cache doing its job:

Those "hits" are requests that were fulfilled locally from the cache instead of needing to download the files from the repo again. Since this caches every request, it actually becomes more efficient the more it's used, so hopefully the efficiency will increase even more over time.

So what exactly is happening?

First, this is not a full mirror of the Debian repos. Rather, apt-cacher-ng acts as a proxy and cache. When a local client system wants to perform an update, it requests the updated packages from apt-cacher-ng instead of the Debian repo directly. If the updated package is already in the local cache, the proxy simply provides it to the requesting client. If not, the proxy requests the package from the repo, provides it to the client, and saves a copy to the cache. Now it has a local copy in case another system requests the same package again.

Some packages, like Crowdsec, are only installed on a single machine on my network, so the cache won't provide a benefit there. However, since most of my systems are running Debian, even though they may be running different services, they will still all request a lot of the same packages every time they update, like openssh or Python. These only have to be downloaded the very first time they're requested, and all of the subsequent requests can be filled from the proxy's local cache.

Do you use a cache in your homelab? Let me know below!

1 Nov 2024

New Router: BananaPi R3 - Part 3 - Configuration

After being subjected to numerous mean glares from the wife and accusations of "breaking the internet", I think I've got it all configured now...

https://www.k3can.us/garfield.gif

Ironically, the more "exotic" configuration, like the multiple VPN and VLAN interfaces, was pretty simple to set up and worked without much fuss. The part that had me pulling my hair out and banging my head against the desk was just trying to get the wifi working on 2.4GHz and 5GHz at the same time... something wireless routers have been doing flawlessly for over a decade. After enough troubleshooting, googling, and experimenting, though, I had a working router.

I installed AdGuardHome (AGH) for DNS and ad blocking, but kept dnsmasq for DHCP and local rDNS/PTR requests. dnsmasq's DHCP options direct all clients to AGH, and AGH forwards any PTR requests it receives back to dnsmasq.
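
On OpenWRT, handing out the AGH address to DHCP clients is a one-line option under /etc/config/dhcp; a sketch, assuming AGH listens on the router at 192.168.1.1 (the address is illustrative):

config dhcp 'lan'
        option interface 'lan'
        # DHCP option 6: tell clients to use AdGuardHome for DNS
        list dhcp_option '6,192.168.1.1'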

Next, I installed a Crowdsec bouncer and linked it to my local Crowdsec engine. Now, when a scenario is triggered, instead of banning the offending address at the server, it will be blocked at the edge router.

 

Lastly, I installed and configured SQM (Smart Queue Management), which controls the flow of traffic through the router to the WAN interface. Without this management, the WAN interface buffer can get "bogged down" under heavy traffic loads, causing other devices to experience high latency or even lose their connection entirely. SQM performs automatic network scheduling, active queue management, traffic shaping, rate limiting, and QoS prioritization.
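
On OpenWRT, SQM comes from the sqm-scripts package (with an optional LuCI front-end); the setup is roughly:

opkg update
opkg install sqm-scripts luci-app-sqm
# then set the WAN interface and up/down bandwidth limits in /etc/config/sqm
# (or under Network -> SQM QoS in LuCI); cake/piece_of_cake.qos is the usual choice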

For a comparison, I used waveform to test latency under load.

Before SQM:

====== RESULTS SUMMARY ======
Bufferbloat Grade                                      C
Mean Unloaded Latency (ms)                             51.34
Increase In Mean Latency During Download Test (ms)     76.01
Increase In Mean Latency During Upload Test (ms)       8.69

After SQM:

====== RESULTS SUMMARY ======
Bufferbloat Grade                                      A
Mean Unloaded Latency (ms)                             38.92
Increase In Mean Latency During Download Test (ms)     12.75
Increase In Mean Latency During Upload Test (ms)       1.5

I have to say, I'm pretty happy with the results!
Going from a grade C with a 76 ms increase to a grade A with only a 12.75 ms increase is a pretty substantial difference. This does increase the load on the CPU, but with the BPI R3's quad-core processor, I expect that I'll still have plenty of overhead.

Overall, I think I'm happy with the configuration and the BPI R3 itself.

27 Oct 2024

Troubleshooting a minor network issue.

So, while surfing the web and reading up on how I wanted to configure my new BananaPi R3, I encountered a sudden issue with my network connection:

It seemed like I could reach the internet, but I suddenly lost access to my home network services. I tried to SSH into my webserver and was met with a "Permission denied" error. Since I had earlier attached an additional USB NIC to connect to the BPI, I thought that perhaps the laptop had gotten the interfaces confused and was no longer tagging packets with the correct VLAN ID for my home network. Most of my servers are configured to refuse any requests that don't come from the management VLAN, so this explanation made sense. After poking around in the network settings, all of the VLAN settings appeared correct, but I did notice that the link was negotiated at 100 Mbps, instead of the usual 1000. I tried to reconfigure my network settings, manually setting the link to 1000 Mbps, resetting interfaces, changing network priorities, etc. I then tried the classic "reboot and pray" technique, only to find that my wired network connection was down entirely. I wasn't receiving an IP from the DHCP server, and the laptop kept reporting that the interface was UP, then DOWN, then UP again, then DOWN again.

Now I started to think that perhaps the issue was hardware related. My usual NIC is built into the laptop's dock, so I thought I would try power cycling the dock itself. This didn't seem to have any effect, besides screwing up my monitors' placement and orientation. My next thought was that there might be an issue with the patch cable. "Fast Ethernet" (100 Mbps) can theoretically function on a damaged cable, so that might explain the lower link speed, and if the damage were causing an intermittent issue, that could also explain the up/down/up/down behavior.

Being the smart homelabber that I am, I disconnected both ends of the cable and connected my ethernet cable tester. All 8 wires showed proper continuity, though, suggesting that the cable was fine. When I plugged the cable back in, however, I noticed that I was suddenly getting the normal 1 GbE again, but the link was still going up and down. This led me to the conclusion that the cable likely was the issue, despite it passing the cable test. I replaced the cable entirely with a different one, and found that I now had a stable 1 GbE connection, an IP address, and I could access my network like usual.

Looking back, I think replacing the cable should have been troubleshooting step 1 or 2.

Also, in retrospect, there were some clues that might have let me fix the issue sooner, if I had only put the pieces together. I had noticed on another day that the link speed had dropped to 100 Mbps, but it seemed to correct itself, so I ignored it instead of investigating. While working, I found that Zoom and other applications had started to report my internet connection as "unstable", and a speedtest showed my internet bandwidth to be drastically lower than it used to be. I assumed this was due to my ISP just being unreliable, since reduced speeds and entire outages are not unusual where I live.

In hindsight, I think these were all indications that there was a layer 1 issue in my network. In the future, I'll have to remember not to over-think things, and maybe just try the simplest solutions first.

 

 

New Router: BananaPi R3 - Part 2 - Flashing

Part 1 is here.

Now that the router is assembled, the next step is to decide where to flash the firmware. As I mentioned in the last post, this device offers a handful of options: firmware can be flashed to the NOR, NAND, or eMMC, or simply run from the SD card. From what I've read, it's not possible to boot from an m.2 card, though; that slot is only for mass storage.

After a bit of reading, my decision was ultimately to install to all four!  Sort of...

The DIP switches and the leads connected to the UART header

My plan is to install a "clean" OpenWRT image to both the NOR and NAND. The NAND image will be fully configured into a working image, and then copied to the eMMC. The eMMC will then be the primary boot device going forward. If there's a problem with the primary image in the future, I'll have a cascading series of recovery images available: at the flip of a switch, I can revert to the known-working image in the NAND, and if that fails, I can fall back to the perfectly clean image in the NOR.

...And I do mean "at the flip of a switch". Due to the way the bootable storage options are connected, only 2 of the 4 can be accessed at a time. Switching between NOR/NAND and SD/eMMC requires powering off the BPI and toggling a series of 4 DIP switches, as seen in the official graphic below:

https://wiki.banana-pi.org/images/thumb/4/4c/BPI-R3-Jumper.png/320x107x320px-BPI-R3-Jumper.png.pagespeed.ic.Qyd9EK01n9.png

Switches A and B determine which device the BPI will attempt to boot from. Switch C sets whether the NOR or NAND is connected, and switch D does the same for the SD and eMMC. To copy an image from the SD card to the NOR, for example, switch D must be high (1) to access the SD card, and switch C must be low (0) to access the NOR. Since switches A and B set the boot device independent of which devices are actually connected, it would seem that you could set them to an impossible state and render the device unbootable, like 1110 or 1001.

To accomplish my desired install state, I had to first write the OpenWRT image to the SD card on a PC, and then insert it into the BPI. With the switches at 1101, I could write the image from the SD card to the NOR, then flip the switches (with the BPI powered off) to 1111 to copy the image to the NAND. Lastly, I can remove the SD card and reboot with the switches at 1010 to boot from the NAND. Then I'll configure the BPI into my fully configured state. This is the step I'm currently working on; I have it about 80% configured, but will need to actually install it in my network before I can complete and test the remaining 20%. Once it is installed, tested, and fully configured, I'll copy the NAND to the eMMC, before finally setting the switches to 0110 and booting from the eMMC for ongoing use.

Unfortunately, I haven't had a good opportunity to take my network offline to install the new router, so the last bit of configuration might need to wait a little while...

 

22 Oct 2024

New Router: BananaPi R3 - Part 1 - Hardware

I've been using a consumer router from 2016 (with OpenWRT hacked onto it) all the way here in 2024, and felt that it might finally be time for an upgrade. I settled on a BananaPi R3 because it was a reasonable price and seemed like it would be a fun project.

Here's the bare board as received:

You can see most of the physical features in this photo, including a USB3 port, two SFP ports, 5 RJ45 ports, an m.2 slot for a cellular modem, and a 26-pin GPIO header. On the bottom, there's also an m.2 slot intended for NVMe storage, as well as slots for a micro SD and a micro SIM. The CPU is a quad-core ARM chip paired with 2GB of RAM, and there's a handful of flash chips providing NAND, NOR, and eMMC. Quite a lot of options!

My plan is to install OpenWRT to the NAND storage. I suspect the NVMe slot might be useful if I wanted to run a small file server or something, but that's not in the plan for now.

 

The first step I took in assembly was to apply some thermal pads to the chips and then attach a cooler and fan.

The thermal pads are "OwlTree" brand, but I don't have any specific preference for them; I just happened to have them on-hand from a previous project. The CPU got a 0.5mm pad, and I applied 1.5mm pads to the remaining chips.

Thermal pad applied to CPU

After applying  pads to all of the chips, I attached the cooler and plugged in the fan.

The next step was to install the board into the case. I went with the official BPI-R3 case. The quality is surprisingly nice and it looks great once assembled. After installing the board, I attached the pigtails for the eight (yes, eight) antennas and applied some basic cable management.

Board installed into case and coax attached and routed to antenna ports.  

Now, I can't finish putting the case together quite yet, since I'll need access to the UART pins to install OpenWRT to the NAND flash. The UART header can be seen on the right side of this photo, but there's no way to access it once the case is assembled.

But, that's enough for today. I'll post an update once I make some progress towards getting OpenWRT flashed.