🧠 Bugnetka v0.1 🧠
I want to play "danetka" with you about the discovered bug
UPD: solved by @gennadiixd
#bugnetka #pain
# [ $davids.sh ] · message #215
@ [ $davids.sh ] · # 1136
The rules are the same as in the game "Yes/No": I give you the premise, and you have to guess what happened. Along the way, you can ask me questions, and I will answer "yes", "no", "irrelevant", or "golang".
Let's go:
There is a service running in 3 replicas that serves the main HTTP API. It is part of a modular monolith (meaning other applications live under the same common name).
After yet another release, it returns 404 on the main endpoint (we have only one because it's RPC): about 90% of the time in the most loaded geo-region, 70% of the time in a less loaded one, and 30% of the time on development.
There are no errors, metrics are normal, and the health checks, both its own and those of other services, are fine.
What happened?
@ Vova hardvair smartvend 🛍️💻 · # 1137
Is the 404 being forwarded from another request?
@ [ $davids.sh ] · # 1138
I don't understand
@ [ $davids.sh ] · # 1139
You were in the thread where we discussed this, you have an advantage, so don't tell)))
@ Vova hardvair smartvend 🛍️💻 · # 1140
I mean errors from internal requests (API, Redis, etc.) being forwarded to the external response
@ [ $davids.sh ] · # 1141
No
@ Vova hardvair smartvend 🛍️💻 · # 1142
Well, I haven't read it, so there's no advantage. And besides, I dunno)
@ [ $davids.sh ] · # 1143
You were the one who first reported the bug)))
@ Vova hardvair smartvend 🛍️💻 · # 1144
So, does it work in the end? Or are we waiting again?
@ [ $davids.sh ] · # 1145
Works
@ Vova hardvair smartvend 🛍️💻 · # 1146
I filed a report and went to sleep
@ [ $davids.sh ] · # 1147
It's like: "cool guys don't look at explosions"
@ Vova hardvair smartvend 🛍️💻 · # 1148
I was poking around for 4 hours on and off, and the report was the last thing I did.
@ Gennadii IT-K Khotovytskyi · # 1149
Well, if I understand correctly, you have an API gateway and, behind it, a load balancer that distributes requests across the replicas. In that case, the first thing I would check is the load balancer. It may be mistakenly routing some requests to a replica of a neighboring service, which then returns a 404. And the higher the load on the correct replicas, the more often the balancer will land on the wrong one when it looks for the least loaded.
@ [ $davids.sh ] · # 1150
And that's the right answer!) I can take you with me for debugging))
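To make the winning hypothesis concrete, here is a toy simulation of it. This is only a sketch under stated assumptions, not the actual setup from the thread: it assumes a "least in-flight requests" balancing policy, 3 correct replicas plus 1 replica of a neighboring service that was added to the pool by mistake, and that the wrong replica answers 404 instantly, so it always looks idle. The function name `simulate_404_rate` and all parameters are hypothetical.

```python
import random


def simulate_404_rate(arrivals_per_tick, ticks=2000, correct_replicas=3,
                      service_ticks=4, seed=0):
    """Return the fraction of requests that end up on the wrong replica (404)."""
    rng = random.Random(seed)
    # in_flight[i] lists the remaining ticks of every request on correct replica i.
    in_flight = [[] for _ in range(correct_replicas)]
    wrong_hits = 0
    total = 0
    for _ in range(ticks):
        # Advance time: drop finished requests, age the rest by one tick.
        for queue in in_flight:
            queue[:] = [t - 1 for t in queue if t > 1]
        for _ in range(arrivals_per_tick):
            total += 1
            # Assumption: the wrong replica replies 404 immediately,
            # so its in-flight count is always 0 (last entry in the list).
            loads = [len(queue) for queue in in_flight] + [0]
            least = min(loads)
            candidates = [i for i, load in enumerate(loads) if load == least]
            choice = rng.choice(candidates)  # ties are broken at random
            if choice == correct_replicas:
                wrong_hits += 1
            else:
                in_flight[choice].append(service_ticks)
    return wrong_hits / total


if __name__ == "__main__":
    for load in (1, 3, 10):
        rate = simulate_404_rate(load)
        print(f"arrivals per tick: {load:2d}  share of 404s: {rate:.2f}")
```

In this model the share of 404s grows with the load, which matches the direction of the symptom (the busiest environment fails most often). The exact percentages depend entirely on the made-up parameters and are not meant to reproduce the 90% / 70% / 30% from the premise. It also shows why nothing looked broken: every replica, including the wrong one, is healthy and responds quickly.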