
Re: Observation from tonight's Bar BOF on IPsec Failover



Tero Kivinen <kivinen@xxxxxx> writes:
> Narayanan, Vidya writes:
>> > 4)      Keeping the "internal" IP address so that open 
>> > tunneled TCP connections can stay open and applications that 
>> > cache the client's IP address don't need to be restarted.
>> As I understand it, all the above 4 goals are supported by the current
>> solution draft - it is just that keeping the same internal address is
>> not mandated.  But, there should be no reason the IP address cannot be
>> maintained if the new gateway can actually support that.  Do you see an
>> issue with it? 
> Supporting the "keeping the internal IP address" is not something that
> can really be done by the gateway. It requires support from the rest
> of the back-end network. I.e. if the client has received 10.1.44.25 from
> the SGW1 (I will assume the classic VPN case as it is one of the
> easiest ones). Inside the back-end network the routing system is set
> up so that 10.1.44.0/24 is routed to SGW1, so it can forward packets
> with those IP-addresses to the clients. Client then has secsh
> connection from 10.1.44.25:2323 <-> 10.0.0.1:22 which it is using for
> shell access to the development compilation unix box (or actually any
> other long-lasting TCP connection can be used here).
>
> When client moves to SGW2 having different network allocated for the
> clients, i.e. 10.1.45.0/24, and the SGW2 gives new IP-address
> 10.1.45.94 to the client, then the previous TCP connection is broken.
>
> There are a few ways to fix it, but I do not think the current draft
> lists any of them.
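
To make the quoted failure case concrete: a TCP connection is identified by
its (local address, local port, remote address, remote port) 4-tuple, so a
new inner address means the old connection no longer matches anything on the
client. A minimal sketch, using only the addresses from the example above;
the helper name `connection_survives` is made up for illustration.

```python
# 4-tuple of the secsh session from the quoted example:
# (client inner address, client port, server address, server port)
old_conn = ("10.1.44.25", 2323, "10.0.0.1", 22)


def connection_survives(conn, new_inner_addr):
    """Return True if the client's inner address still matches the
    local address the existing connection is bound to."""
    local_addr = conn[0]
    return local_addr == new_inner_addr


# Client moves to SGW2 and is handed an address from 10.1.45.0/24.
survives_after_move = connection_survives(old_conn, "10.1.45.94")

# Client moves to SGW2 but is allowed to keep its old inner address.
survives_if_kept = connection_survives(old_conn, "10.1.44.25")
```

Keeping the address is only half of it, of course: the back-end still has to
deliver packets for 10.1.44.25 to SGW2, which is what the options below are
about.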

Some sort of description of potential solutions might be nice, admittedly,
but these do indeed seem out of scope for the work that is actually
needed, because there really isn't any protocol work as such.

> 1) Use "routing magic" to get packets going to 10.1.44.0/24 to come to
>    SGW2 instead of SGW1. This could include routing protocols, tunnels
>    to a router near SGW1 etc. The problem is that the failure dropping
>    SGW1 out from the net might take out more machines than just it
>    (power failure in the data center, network problem affecting
>    back-end network too etc).

I think it's not really magic as such - your average IGP, correctly
configured, could do this, provided the SGW knows what to do with the
packets (and knows that it should be the only one injecting the route
into the internal network). As a matter of fact, given correct
priorities, you should be able to announce both routes constantly and
simply let the loss of the primary route redirect the traffic (the
'exploding datacenter' case).
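
A rough sketch of what I mean by announcing both routes with different
priorities. This is not any particular IGP, just longest-prefix match with
the lowest metric as tie-breaker; the metric values and the function name
`best_next_hop` are assumptions for illustration.

```python
from ipaddress import ip_address, ip_network


def best_next_hop(routes, dst):
    """Pick the next hop for dst: longest matching prefix wins,
    and among equal prefixes the lowest metric wins."""
    dst = ip_address(dst)
    candidates = [r for r in routes if dst in ip_network(r["prefix"])]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda r: (-ip_network(r["prefix"]).prefixlen, r["metric"]),
    )
    return best["next_hop"]


# Both gateways announce the client prefix all the time; SGW1 is preferred.
routes = [
    {"prefix": "10.1.44.0/24", "next_hop": "SGW1", "metric": 10},
    {"prefix": "10.1.44.0/24", "next_hop": "SGW2", "metric": 100},
]
before_failure = best_next_hop(routes, "10.1.44.25")

# SGW1 goes away, its announcement is withdrawn, the backup takes over.
remaining = [r for r in routes if r["next_hop"] != "SGW1"]
after_failure = best_next_hop(remaining, "10.1.44.25")
```

Nothing has to be reconfigured at failure time; the only thing that changes
is that the preferred announcement goes away.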

Unfortunately, the reaction time of normal routing magic is roughly tens
of seconds even in the optimistic case, but with BFD keepalives between
the primary and the secondary the reaction could be almost instant. It
gets tricky when you have a large set of gateways/prefixes though.
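
Back-of-the-envelope arithmetic for the difference. The timer values are
illustrative assumptions: a 10 s hello with a 4x dead interval is the usual
OSPF default on broadcast links, and 50 ms x 3 is a commonly used BFD
setting, not something any draft mandates.

```python
def hello_based_detection_s(hello_interval_s, dead_multiplier):
    """Worst-case time for a hello-based IGP to declare a neighbor dead."""
    return hello_interval_s * dead_multiplier


def bfd_detection_ms(tx_interval_ms, detect_mult):
    """BFD detection time: negotiated interval times detect multiplier."""
    return tx_interval_ms * detect_mult


igp_s = hello_based_detection_s(10, 4)   # assumed OSPF-style defaults
bfd_ms = bfd_detection_ms(50, 3)         # assumed BFD timers

# How many times faster the failure is noticed with BFD.
speedup = (igp_s * 1000) / bfd_ms
```

With these numbers the neighbor is declared dead after 40 s without BFD and
after 150 ms with it, before route recomputation is even counted.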

A simple primary/secondary pair per prefix (whether the two sit in the
same geographic or network-topological location or not) isn't a problem
in my book, and it should cover the majority of the use cases.

Cheers,

-Markus