C10B

After C10k (10 000 concurrency connections), C10M (10 000 000 concurrency connections), now C10B is near (10 Billions concurrent connections, 10Tbps, 10 billions packets per seconds, 1 billion connections/seconds). In my case: 1 Billion concurrent connections, 1Tbps, 1 billions packets per seconds, 1 millions connections/seconds)

I have test it with CatchChallenger, with 32 core Threadripper 2990WX , 256GB DDR4, Radeon RX Vega 64.

This experiment can be used for high speed network packet processing for ethernet 100G+ or infinityband device. I have do as R&D for my company router with 1Tbps (at 64B packet size) routing capability certified with benchmark.

I have do stateless filter on IPv6 input, each processing unit after do the state-full work, but with an special dispatching memory access to reduce the cache miss. Mean: IP/TCP processing GPU, CatchChallenger processing CPU.

Difficulty: Very limited into memory, I had used specialized swap technique with barer to delay some class of trafic (move on map: 95% of move, + not published protocol with another vector move to prevent memory access and only parse last pos with need to use previous data), bulk processing of this.

Result:
59ms average reply time
95% 342ms reply time
Burn 1.2Tbps of network bandwidth

Ref: https://link.springer.com/chapter/10.1007/978-3-642-24999-0_44

Ref2: https://shader.kaist.edu/packetshader/