gRPC server scaling (bidirectional infinite streaming)

11/13/2017

We are building a gRPC service in Python that has a bidirectional streaming endpoint and also a unary endpoint.

We want the streams to live forever, so we have no timeouts, and the streams are working as expected. We are using Kubernetes and Docker for deployment.

But we are facing issues with scaling the service. How do we scale an infinite-streaming gRPC server? We can't scale based on the number of requests, because each stream involves only a single request, with data then sent back and forth as frames over the open connection.

How can we scale this service? Right now the worker thread pool has a maximum of 100 threads.

One quick solution is to raise the maximum number of worker threads and scale based on CPU load and memory usage.
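Since the service runs on Kubernetes, scaling on CPU load and memory usage can be expressed as a HorizontalPodAutoscaler. This is only a sketch: the deployment name is hypothetical, and the `autoscaling/v2` API shown here is newer than older `v2beta1` variants. Note also that the HPA only affects where *new* streams land; already-open streams stay pinned to their existing pods.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: grpc-stream-server        # hypothetical deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grpc-stream-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out when average CPU exceeds 70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # or when average memory exceeds 80%
```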

Is there a better way to do it?

-- Samarendra
grpc
kubernetes
python
python-2.7
python-3.x

1 Answer

11/15/2017

Right now we don't have a good answer: the thread-per-RPC assumption was baked into gRPC Python fairly early and deeply, well before we were aware of long-lived "just keep an open connection in case either side has anything to say" RPCs as a use case.
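The consequence of that thread-per-RPC assumption can be demonstrated with the standard library alone (no gRPC needed): each open stream pins one worker thread for its entire lifetime, so a pool of N threads can serve at most N concurrent streams, and any further RPC is queued behind them even though the pinned threads are mostly idle.

```python
# Stdlib-only demonstration of thread-per-RPC starvation with
# long-lived streams. A tiny pool stands in for the gRPC server's
# worker thread pool.
import threading
import time
from concurrent import futures

def long_lived_stream(stop: threading.Event) -> str:
    # Stands in for a bidirectional streaming handler that blocks
    # for the lifetime of the connection.
    stop.wait()
    return "stream closed"

pool = futures.ThreadPoolExecutor(max_workers=2)  # tiny pool for the demo
stop = threading.Event()

# Two "streams" occupy both worker threads indefinitely.
streams = [pool.submit(long_lived_stream, stop) for _ in range(2)]

# A third (unary) RPC is accepted by the executor but never starts:
# it sits in the queue behind the pinned stream threads.
unary = pool.submit(lambda: "unary response")
time.sleep(0.2)
print(unary.running(), unary.done())  # False False: starved by the streams

stop.set()  # closing the streams frees the worker threads
print(unary.result(timeout=5))        # unary response
pool.shutdown()
```

With 100 worker threads the same thing happens at the 101st concurrent stream, which is why raising `max_workers` only moves the ceiling rather than removing it.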

We're working on better solutions but they'll likely be a while in coming.

Increasing the number of worker threads definitely sounds like the right answer for the time being. I'd be very curious to hear how it works out since your threads will be mostly idle most of the time (right?).

An option worth trying that might work out well would be to design an object that implements the interface of futures.ThreadPoolExecutor but actually does some sophisticated internal multiplexing to service a great many more RPCs. It's an idea I've had on my mind for a while but haven't gotten around to testing out myself.

-- Nathaniel Manista At Google
Source: StackOverflow