MidWay Documentation

Scalability through funneling

MidWay provides the ability to have many more attached clients than there are servers, thus reducing the number of users and sessions on the third tier (usually the database). This keeps the use of resources (RAM most importantly) on the database server down to the absolute minimum.

RAM aside, there are limits to how many processes and sockets an OS can handle effectively. The number of processes is discussed in the next section. Here we briefly discuss the problem with TCP sessions and sockets.

Upper limits on the number of sockets/connections

Remember that when an operating system receives a TCP or UDP packet, it must find the proper queue to put the payload on. The proper queue is identified by the following key:

remote IP address + remote port + local port + local IP address

In IPv4 the key is 4 + 2 + 2 + 4 = 12 bytes, and in IPv6 it increases to 16 + 2 + 2 + 16 = 36 bytes. On a 32-bit machine, comparing a key requires 10 load and 10 compare instructions for every socket. OSes do a sequential search through the socket table in the kernel; at least Linux does. If we have 1 000 000 connections, that is on average 500 000 key tests, which is 10 000 000 instructions per packet. A top-of-the-line CPU today can do 1 000 000 000 instructions a second, so we have burnt off all available CPU resources with 100 packets a second. Most servers today still have only a megabyte or two of L2 cache, and we need 36 MBytes just to hold the socket table. Needless to say, you get a storm of cache misses, and you would get a 10-fold degradation of performance.
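The arithmetic above can be checked with a small sketch. This is not kernel code, only a model of a sequential search over a list of keys. The names SocketKey, linear_lookup and packets_per_second_at_saturation are made up for this illustration, and the figures of 20 instructions per key test and 1 000 000 000 instructions a second are the ones assumed in the text.

```python
from typing import NamedTuple

class SocketKey(NamedTuple):
    # The four fields that identify the receive queue for a packet.
    remote_ip: str
    remote_port: int
    local_port: int
    local_ip: str

def linear_lookup(table, key):
    """Sequentially search the table; return (index, number of key tests)."""
    tests = 0
    for index, candidate in enumerate(table):
        tests += 1
        if candidate == key:
            return index, tests
    return None, tests

# Assumptions taken from the text, not measured values.
INSTRUCTIONS_PER_TEST = 20                    # 10 loads + 10 compares
CPU_INSTRUCTIONS_PER_SECOND = 1_000_000_000   # a "1 Gips" CPU

def packets_per_second_at_saturation(n_sockets):
    """Packet rate at which the lookup alone uses the whole CPU."""
    average_tests = n_sockets / 2
    instructions_per_packet = average_tests * INSTRUCTIONS_PER_TEST
    return CPU_INSTRUCTIONS_PER_SECOND / instructions_per_packet

# A toy table of 1000 unique keys (the addresses are invented).
table = [SocketKey("10.0.0.%d" % (i % 250 + 1), 1024 + i, 80, "192.168.0.1")
         for i in range(1000)]
index, tests = linear_lookup(table, table[499])
print(index, tests)
print(packets_per_second_at_saturation(1_000_000))
```

Looking up the key in the middle of the table costs half the table in key tests, and with a million sockets the model saturates the CPU at 100 packets a second, the same figure as in the text.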

The important thing to remember is that the work needed to process all packets coming into a machine per second is

W = p * n

where p is the number of packets per second and n is the number of sockets. If there is one user per socket, then there is a relationship between p and n: p = n*r, where r is the rate of packets each user generates. This leaves us with

W = n^2*r or O(n^2)

It is the square here that gives us a particular problem, and it is why the limit on the maximum number of sockets doesn't increase all that much over time. Take the 1 Gips CPU above, and say that we can't use more than 1% of the CPU for processing packets. We have 20 instructions per key test, we on average have to test 1/2 of the sockets, and each user sends one packet a second (r = 1):

1 000 000 000 / 100 = 20 * 1/2 * n^2

n = 1000
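The same calculation can be written as a small function, which makes it easy to see how little a faster CPU helps. This is only a sketch of the formula above: the function name max_sockets and its parameters are made up for this illustration, and the rate of one packet per user per second is an assumption.

```python
import math

def max_sockets(cpu_ips, cpu_percent, instr_per_test, rate):
    """Solve budget = instr_per_test * (n / 2) * (n * rate) for n.

    cpu_ips        -- instructions per second the CPU can do
    cpu_percent    -- share of the CPU we allow for packet lookup
    instr_per_test -- instructions needed to test one socket key
    rate           -- packets per second generated by each user
    """
    budget = cpu_ips * cpu_percent / 100
    return math.sqrt(2 * budget / (instr_per_test * rate))

today = max_sockets(1_000_000_000, 1, 20, 1)
ten_times_faster = max_sockets(10_000_000_000, 1, 20, 1)
print(round(today), round(ten_times_faster))
```

Because of the square root, a CPU that is ten times faster raises the limit from 1000 sockets to only a little over 3000.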

A hard limit of 1000 sockets or, with a 1:1 relationship between sockets and users, 1000 users. The Web (HTTP) remedies this by connecting and disconnecting for every request. One problem remains: latency on the net is often quite high. Try some pings locally and to other continents. Over my ISDN line to my ISP I get a round trip time of 30 ms. To the US I usually get a round trip time of 200 ms. The thing is that, due to the TCP three-way handshake on connect and the exchange of the disconnect packets, the socket must exist for two round trips. Considering that a well-designed MidWay service should complete in 10 ms, you have a problem. (NB: Suddenly the speed of the user's access line to the Internet becomes your problem!) Fortunately, Linux at least only tests against connections in the ESTABLISHED state, which effectively divides the problem by 3 (or at least 2), but dividing by 3 still doesn't solve the O(n^2) problem, nor the RAM requirement for holding the socket table.
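To put numbers on the latency problem, here is a rough sketch of how long a socket has to exist when the client connects and disconnects for every request. The model is an assumption made for this illustration: lifetime = two round trips plus the service time, ignoring TIME_WAIT and everything else. The function name socket_lifetime_ms is made up.

```python
SERVICE_TIME_MS = 10.0  # what a well-designed service should need, per the text

def socket_lifetime_ms(rtt_ms, service_ms=SERVICE_TIME_MS):
    """Simplified model: two round trips (connect, disconnect) plus the work."""
    return 2 * rtt_ms + service_ms

# Round trip times quoted in the text.
for label, rtt_ms in (("ISP over ISDN", 30.0), ("to the US", 200.0)):
    lifetime = socket_lifetime_ms(rtt_ms)
    print(label, lifetime, "ms,", lifetime / SERVICE_TIME_MS, "x the service time")
```

Under this model the socket lives 7 times longer than the service call over the ISDN line, and 41 times longer across the Atlantic, so most of the socket's lifetime is spent waiting on the network.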

One remedy for this problem is Transactional TCP (T/TCP, RFC 1644). T/TCP bypasses the three-way handshake and shortens the TIME_WAIT state. T/TCP has some security issues; I picked up a draft RFC on the issue. T/TCP doesn't solve the O(n^2) problem either, but funneling does.

The RAM problem gets quite a bit worse. A socket needs queues for storing data being sent and received. On Unix these are two 16 kByte buffers, or 32 kB for every socket. With 10000 sockets you need 320 MB of buffer space, and we haven't even done any work outside the kernel yet. Now you can start to consider the kernel tables for open files, the process tables, the amount of work it takes just to schedule 10000 processes, etc.
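The buffer arithmetic is simple enough to tabulate. A minimal sketch, assuming the two 16 kB buffers per socket from the text and counting 1 MB as 1000 kB, as the text does. The function name buffer_space_mb is made up for this illustration.

```python
SEND_BUFFER_KB = 16   # per-socket send queue, figure from the text
RECV_BUFFER_KB = 16   # per-socket receive queue, figure from the text

def buffer_space_mb(n_sockets):
    """Kernel buffer space for n sockets, in MB (1 MB = 1000 kB)."""
    return n_sockets * (SEND_BUFFER_KB + RECV_BUFFER_KB) / 1000

for n_sockets in (50, 500, 10000):
    print(n_sockets, "sockets:", buffer_space_mb(n_sockets), "MB")
```

With 50 sockets the buffers take 1.6 MB, and with 10000 sockets they take the 320 MB mentioned above, all of it RAM that can't be used for anything else.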

In the literature, this is the main argument for using an SRB. A machine can handle 500 sockets, 1000 processes, and 1000 Mbytes of VM. It will go into thrashing with 10000 sockets, 20000 processes, and 20 GB of VM. When using an SRB, the optimal numbers are in the area of 50 sockets, 100 processes and 100 Mbytes of RAM. More importantly: you get to use the RAM for what it should be used for, DB cache!

If you look at Apache, its config has min servers and max servers settings. Apache will fork n copies of itself to handle all the incoming requests, with min < n < max. If Apache gets more requests than max, they are queued in the TCP/IP stack until a server is ready to process them. MidWay is just more flexible and more general. Apache doesn't solve the forking problem.


© 2000 Terje Eggestad