Your knock-on-the-door analogy is exactly right: when I started my instance, I had to search for every community I wanted to see directly by URL. My server would then send a message to that community’s server saying that I had subscribed to it. Now, every time a post is made in that community, its server sends my server an update. If I post a comment to a community on lemmy.ca (like I am now) from my kbin instance (remy.city), and you are reading it from kbin.social, that means my server first saved my comment locally, then sent it to lemmy.ca, and lemmy.ca sent it to your kbin.social because you subscribed to the community. So in that case, lemmy.ca is the ‘authority’ and is responsible for sending updates out to subscribed parties.
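To make that flow concrete, here is a rough sketch in Python. It is a toy in-memory model, not how Lemmy or kbin actually implement federation: the `Instance` class, its method names, and the community name `example` are all made up for illustration, and the `network` dict stands in for real HTTP delivery between servers.

```python
from collections import defaultdict


class Instance:
    """Toy model of one federated server (hypothetical, for illustration only)."""

    def __init__(self, domain, network):
        self.domain = domain
        self.network = network              # dict of domain -> Instance, stands in for HTTP
        self.followers = defaultdict(set)   # community name -> domains subscribed to it
        self.stored = []                    # comments this server holds a copy of
        network[domain] = self

    def subscribe(self, community, host):
        # "Knock on the door": tell the hosting server we want updates.
        self.network[host].followers[community].add(self.domain)

    def post_comment(self, community, host, text):
        comment = {"author_instance": self.domain, "community": community, "text": text}
        self.stored.append(comment)             # 1. save locally first
        self.network[host].announce(comment)    # 2. hand it to the community's host

    def announce(self, comment):
        # 3. Runs on the community's host: keep a copy and fan out to subscribers.
        recipients = self.followers[comment["community"]] | {self.domain}
        for domain in sorted(recipients):
            if domain != comment["author_instance"]:  # the author's server already has it
                self.network[domain].stored.append(comment)


network = {}
remy = Instance("remy.city", network)
lemmy_ca = Instance("lemmy.ca", network)
kbin_social = Instance("kbin.social", network)

remy.subscribe("example", host="lemmy.ca")
kbin_social.subscribe("example", host="lemmy.ca")
remy.post_comment("example", host="lemmy.ca", text="hello from remy.city")
```

After this runs, each of the three servers holds exactly one copy of the comment, and only lemmy.ca did any fan-out.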
There is no such thing for instances. Each new instance has to manually make a connection to another (i.e. a user on the new instance must subscribe to something from another instance). I think tools like fediverse.observer read comments or other activity from popular instances, and then compile a list of the instances they find by doing that. But there is no central server/authority that makes communication between instances possible. Each instance has to talk directly to every other instance it exchanges content with. It’s a bit inefficient, but it is necessary for decentralized communication.
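Purely as a guess at how that kind of discovery could work (I don't know what fediverse.observer really does), a crawler could collect the domains that appear in the actor URLs of activity it sees. The function name and the sample URLs below are hypothetical.

```python
from urllib.parse import urlparse


def discover_instances(actor_urls):
    """Return the sorted, de-duplicated domains found in a list of actor URLs."""
    domains = set()
    for url in actor_urls:
        host = urlparse(url).hostname   # None if the URL has no host part
        if host:
            domains.add(host.lower())
    return sorted(domains)


# Hypothetical actor URLs seen while reading one popular community.
seen = [
    "https://remy.city/u/alice",
    "https://kbin.social/u/bob",
    "https://lemmy.ca/c/example",
    "https://kbin.social/u/carol",
]
instances = discover_instances(seen)
print(instances)
```

An instance that nobody on a crawled server interacts with would never show up in a list built this way, which fits the point that there is no central registry.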
You can run your own instance and not allow anyone else to sign up, though I do agree it’s a lot of effort for just a single person. Spreading it out among friends, or other folks who don’t mind chipping in, makes it seem a bit more sensible. But there is always the option to turn off registrations, and on Lemmy at least you can make registrations require approval.
The only other way your instance could incur more running costs than you’d like is if you have a community on your instance that gets very popular, and folks from all instances start posting to it (think stuff on Beehaw, Lemmy.world, etc.). Then your server needs to be the man in the middle, facilitating communication between users on other instances. But you always have the option of not allowing communities to be created, or stopping federation altogether if it gets to be too much. There really isn’t a way it would suddenly cost you more money than you thought, unless you aren’t monitoring it enough (which isn’t much more than setting up notification emails for storage use, system crashes, etc.).
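The storage part of that monitoring can be very small. Here is a minimal sketch using only Python's standard library; the function name, the 80% threshold, and the idea of running it from a scheduled job that emails you the message are assumptions, not something Lemmy or kbin ship.

```python
import shutil


def storage_alert(path="/", threshold=0.8):
    """Return a warning string if the disk holding `path` is fuller than `threshold`."""
    usage = shutil.disk_usage(path)        # named tuple: total, used, free (bytes)
    fraction = usage.used / usage.total
    if fraction >= threshold:
        return f"Storage at {fraction:.0%} on {path}"
    return None                            # nothing to report


message = storage_alert(".")
if message:
    print(message)  # in practice, send this as a notification email instead
```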
Running your own instance is the only way to really be sure that the costs are being covered on the up-and-up. Otherwise you’re just taking folks at their word. Your data, in the end, could always be sold to anyone. It is publicly available through the ActivityPub protocol, after all. But that also means there’s really no need to pay for it, so no one would buy it.