[ $davids.sh ] — david shekunts blog

😢 See the fluffy one, prepare your butt 😢

# [ $davids.sh ] · message #214

Rabbit, well screw you, what are you doing

#pain #rabbitmq #mq

  • @ [ $davids.sh ] · # 1121

    Are you fucking kidding me: imagine, a queue with auto-delete, just like 4000 others on the same RMQ, with 1 to 10 consumers.

    With exactly 4 consumers on it, everything is fine; add a 5th and the queue is instantly deleted...

    According to all logs and metrics of all services, hardware, and nodes, everything is absolutely fine.

    How did we fix it? We restarted the RMQ nodes and it started working...

    What the hell is this...

    By what logic can it keep deleting a queue with competing consumers until you restart RMQ itself?

    This is not an error in our code: besides the fact that we interact with RMQ in the most straightforward way, a code error would still be reproducible after a restart.

    This is an error in "battle-tested" technology, which is supposed to guarantee its main duty - to deliver data from publisher to consumer via declared queues.
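
    For reference, the documented rule for an auto-delete queue is that it is removed once its last consumer is cancelled or disconnects, and a queue that never had a consumer is not removed at all. A toy pure-Python model of that rule (not RabbitMQ code; the class, method, and queue names are made up for illustration) shows why a 5th consumer joining should never trigger deletion:

```python
class AutoDeleteQueue:
    """Toy model of the documented auto-delete rule; not a RabbitMQ client."""

    def __init__(self, name):
        self.name = name
        self.consumers = set()
        self.had_consumer = False
        self.deleted = False

    def subscribe(self, consumer_tag):
        if self.deleted:
            raise RuntimeError(f"queue {self.name!r} no longer exists")
        self.consumers.add(consumer_tag)
        self.had_consumer = True

    def unsubscribe(self, consumer_tag):
        self.consumers.discard(consumer_tag)
        # Deletion is tied to the LAST consumer leaving, never to one joining.
        if self.had_consumer and not self.consumers:
            self.deleted = True


queue = AutoDeleteQueue("events")  # hypothetical queue name
for i in range(1, 6):              # consumers 1..5, including the "fatal" 5th
    queue.subscribe(f"consumer-{i}")
print(queue.deleted)               # False
```

    Under this rule the queue can only disappear after all five consumers are gone, which is the opposite of what was observed.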

    Do you know what they write about such situations in the community? Switch to quorum queues.

    And do you know what quorum queues are? They are queues with clustered persistent storage (Kafka, without a log).

    Why will this work? Because then the responsibility for the queue is distributed across the cluster, and there's a chance that one node won't allow another to harm the queue.

    But, firstly, the whole point of choosing RMQ was precisely that we don't need persistence in these queues: we wanted speed with guaranteed order. And secondly, why should clustering be the fix for errors in the core mechanics of the system itself, so that it stops deleting your queue "just because"?

    In fact, they suggest creating a latent version of Kafka just to increase the chance that RMQ won't screw you over with another internal bug...
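
    For context, the suggested switch comes down to one queue argument at declaration time. A minimal sketch, assuming a pika-style channel whose queue_declare accepts queue, durable, and arguments; the helper name, the queue name, and the recording channel are made up so the snippet runs without a broker:

```python
def declare_quorum_queue(channel, name):
    """Declare `name` as a quorum queue on a pika-style channel."""
    return channel.queue_declare(
        queue=name,
        durable=True,  # quorum queues must be declared durable
        arguments={"x-queue-type": "quorum"},
    )


class RecordingChannel:
    """Stand-in for a real channel: records what would be sent to the broker."""

    def __init__(self):
        self.calls = []

    def queue_declare(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


channel = RecordingChannel()
declare_quorum_queue(channel, "events")  # hypothetical queue name
print(channel.calls[0]["arguments"])     # {'x-queue-type': 'quorum'}
```

    The mandatory durable=True is exactly the persistence that these queues were never supposed to need.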

    P.S.

    When I wrote all this, I realized what frustrates me the most and hammers the final nail in RMQ's coffin:

    After deleting the queue, this bitch not only DID NOT SEND AN ERROR, but this scumbag CONTINUED TO HOLD OPEN CHANNELS to this queue.

    We were able to debug and saw that it wasn't the application holding the channels, but RMQ itself.

    And it didn't inform us of anything at all, no error, no information about the queue's death.

    There is no excuse for this.
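
    What a client would normally rely on here is RabbitMQ's consumer cancel notification extension: when a queue is deleted, the broker is supposed to send basic.cancel to its consumers, provided the client library supports the extension. A hedged sketch of that expected mechanism, assuming a pika-style channel with add_on_cancel_callback; the fake channel, the handler, and the dict used as a frame are illustrative stand-ins, and none of this explains why the broker stayed silent in the incident above:

```python
cancelled = []


def on_consumer_cancelled(method_frame):
    """Called when the broker cancels a consumer, e.g. because its queue was deleted."""
    cancelled.append(method_frame)
    # Illustrative reaction only: a real app might redeclare the queue and resubscribe.


def watch_for_cancel(channel):
    """Register the handler on a pika-style channel."""
    channel.add_on_cancel_callback(on_consumer_cancelled)


class FakeChannel:
    """Stand-in for a real channel so the sketch runs without a broker."""

    def __init__(self):
        self._cancel_callbacks = []

    def add_on_cancel_callback(self, callback):
        self._cancel_callbacks.append(callback)

    def simulate_broker_cancel(self, frame):
        for callback in self._cancel_callbacks:
            callback(frame)


channel = FakeChannel()
watch_for_cancel(channel)
channel.simulate_broker_cancel({"consumer_tag": "consumer-5"})  # hypothetical tag
print(len(cancelled))  # 1
```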

    P.P.S.

    I'm still so shocked that I hope we messed up somewhere and there's an explanation for all of this, because how else can you use RMQ if it can do such things (and many people use it).

    If the situation repeats and we get more information, I'll write.

  • @ Vassiliy ITK Kuzenkov · # 1122

    I've successfully avoided the rabbit my whole life.

  • @ Arsen IT-K Arakelyan · # 1123

    So what did you use instead, Kafka? Or did you avoid microservices?

  • @ Vassiliy ITK Kuzenkov · # 1124

    Yeah, I never really got into the microservices craze myself.

    Queues are only for workers, background jobs, and all sorts of processing. Usually Redis + Bull, and I've tried Amazon's SQS on one project.

  • @ Arsen IT-K Arakelyan · # 1125

    Could you please explain BG jobs?

  • @ Arsen IT-K Arakelyan · # 1126

    Was it never interesting or not really necessary?

    I watched Kirill explain that microservices are primarily an organizational solution, forced by a large number of teams and services: it's easier to split them into separate repos and keep track of them there than in a single monolith.

    Although from an infrastructure perspective, it's much more complex.

  • @ Vassiliy ITK Kuzenkov · # 1127

    There was never a need. Since I organize processes within the company and on projects, I always strive to ensure that everything can be supported by a small number of people with the right focus.

    Infrastructure-wise and technically, it's definitely more complex. Definitely more expensive and definitely more labor-intensive.

    The exception is when you have a bunch of services on other technologies, because it's more convenient/optimal that way. There, yes, the services are separate. We even had them communicate via SQS. )

  • @ Arsen IT-K Arakelyan · # 1128

    And have you tried serverless on projects? How was it?

  • @ Vassiliy ITK Kuzenkov · # 1129

    It came to me with legacy solutions. It was so-so ). In the Serverless Framework (which is the main one), everything is mixed together: infrastructure configuration with CloudFormation, Lambdas, and plugins. Then you assemble all these YAMLs, which of course adds a strange flavor 💩 when you write with it. ) Overall, though, everything there is not bad, just an acquired taste. And if you look at the final costs, it turns out quite cheap for startups that are testing a hypothesis and are more likely to fail, and somehow not very cheap if they don't. Again, there are specific cool use cases for Lambdas, for example when you need to run something inside AWS and trigger it via a webhook (something more complex than a bunch of CLI commands).

  • @ Arsen IT-K Arakelyan · # 1130

    Yes, I agree, or if you need a bunch of functions for generating documents/reports that would be very expensive if done on EC2 with round-the-clock payments, but a couple of Lambdas would be super cheap and fast enough, considering it's an internal corporate thing)

    Corporate users can wait 🤣

  • @ Vassiliy ITK Kuzenkov · # 1131

    We usually deploy ECS with Fargate. It's also very cheap. But Lambda has its niche too, I agree.

  • @ [ $davids.sh ] · # 1132

    Happy person…

    In the past, I mostly used it for workloads where every message had to be processed, so quorum queues made sense. Everything worked, but suboptimally, so eventually we switched to Kafka.

    This time the use case was more of a pub/sub queue.

    And RMQ is exactly that, but damn…

    By the way, further debugging shows that RMQ simply can't handle proper fail-over when the network is interrupted.

  • @ CENTURIONO · # 1222

    There's another reason: vertical scaling is more expensive than horizontal scaling.

  • @ [ $davids.sh ] · # 1223

    Really? It seems to me that vertical scaling is often cheaper than adding a new machine (because there's always a fixed cost per unit).

    It does have a clear ceiling, and it gets much more expensive the moment you hit that ceiling, that's true.
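
    A toy cost model makes the trade-off concrete. All numbers and function names below are invented for illustration and are not real prices: each machine carries a fixed overhead plus a cost per unit of capacity, and a single machine cannot grow past a ceiling.

```python
import math

FIXED_COST = 10        # hypothetical fixed overhead per machine
COST_PER_UNIT = 1      # hypothetical cost per unit of capacity
SMALL_MACHINE = 4      # hypothetical capacity of one horizontally scaled node
VERTICAL_CEILING = 16  # hypothetical largest single machine available


def vertical_cost(load):
    """One machine sized to the load; None once the ceiling is exceeded."""
    if load > VERTICAL_CEILING:
        return None
    return FIXED_COST + load * COST_PER_UNIT


def horizontal_cost(load):
    """Enough small machines to cover the load; each pays the fixed overhead."""
    machines = math.ceil(load / SMALL_MACHINE)
    return machines * (FIXED_COST + SMALL_MACHINE * COST_PER_UNIT)


print(vertical_cost(8), horizontal_cost(8))    # 18 28
print(vertical_cost(32), horizontal_cost(32))  # None 112
```

    Below the ceiling the single machine wins because it pays the fixed overhead once; past the ceiling this model simply has no vertical option left, rather than trying to price the jump.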

  • @ Artur G · # 1224

    That holds if the load is constant. But what if there are peaks and dips?

  • @ [ $davids.sh ] · # 1225

    Yes, I completely agree here.

  • @ CENTURIONO · # 1226

    Yes, that's right. My point is only about the ceiling: it differs from case to case, but it's always there.