
Re: Peer liveliness



Just for clarification, DPD makes no mention of sending IC. It's "out of scope" ;-).

-g

Gregory Lebovitz wrote:
Moving the discussion back to IKEv2, for the moment...

Several of us have spent a lot of time discussing this issue in the past few
weeks. A main problem we are trying to solve (though not the only one) is
rapid recovery from a rebooted peer.

If you look at the current DPD draft for IKEv1, it calls for sending
INITIAL-CONTACT whenever a peer thinks this is its first contact, i.e. has
no established SAs with the remote peer. This is done, even in the case
where DPD is running on both peers, to let the other peer -- the persisted
peer (as opposed to the rebooted peer) -- know to delete the old SAs ASAP,
because the DPD timers might not catch it fast enough.

It is a very good idea to do this because sending the empty notify (DPD),
and the timer setting for how often, are totally optional. Therefore,
depending on settings, and *without* the INITIAL-CONTACT, it could be quite
some time before the persisted peer relinquishes its current SAs.


Charlie, I can't remember, is the sending of INITIAL-CONTACT a MUST in the
latest IKEv2 draft? Would it be a good idea to make INITIAL-CONTACT
notification a MUST, if it is not already? Doing so would help shorten the
tunnel black hole in most cases, regardless of dpd settings.

The next question is: what is the best behavior for a (rebooted) peer who
receives an invalid SPI? Today the mandate is to drop silently. But if two
rules are checked first, it can be fine (I think) to respond. Those rules
are:
  - do I have an active SA with the sender of the invalid SPI? If yes, drop
    silently. If no, go to the next rule check...
  - do I have the source IP of the sender in my SPD? i.e. is the sender a
    valid peer? If no, drop silently. If yes...
  - initiate IKE per SPD definition.
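
A minimal sketch of the rule check above, for illustration only. The
function name and the two sets standing in for the SADB and SPD lookups are
hypothetical, not taken from any draft or implementation:

```python
from enum import Enum


class Action(Enum):
    DROP_SILENTLY = "drop silently"
    INITIATE_IKE = "initiate IKE"


def on_invalid_spi(sender_ip, active_sa_peers, spd_peers):
    """Decide how a rebooted peer reacts to a packet with an unknown SPI.

    active_sa_peers: set of peer IPs we currently hold an SA with (hypothetical).
    spd_peers: set of peer IPs listed in our SPD (hypothetical).
    """
    # Rule 1: an active SA with the sender already exists, so stay silent.
    if sender_ip in active_sa_peers:
        return Action.DROP_SILENTLY
    # Rule 2: the sender is not a configured peer, so stay silent.
    if sender_ip not in spd_peers:
        return Action.DROP_SILENTLY
    # Both checks passed: initiate IKE as the SPD defines for this peer.
    return Action.INITIATE_IKE
```

Only a sender that is a configured peer with no live SA reaches the last
branch; everything else keeps today's drop-silently behavior.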

If these two rules are followed, the only threat I see to responding with
IKE initiation is that an attacker who knew all of my valid peers' IPs
could, at the moment of recovery from reboot (or power up), cause me to
establish IKE with all my peers listed in SPD, even though I might not have
otherwise made those establishments. Attacker would do so by sending me
invalid SPIs spoofed with source of each of my peers. I guess I see this as
a pretty tough attack to pull off in the real world (given spoof checking
used on most ISP routers these days), and the pay-off of the attack likely
doesn't merit the difficulty of execution. Does the value merit the risk?


Summary: IKEv2 aliveness checking doesn't ensure fast recovery. It provides
a mechanism that MAY be used for fast detection and recovery, but doesn't
guarantee it. However, combining the initiate-IKE response behavior +
INITIAL-CONTACT + liveness detection would ensure VERY fast
re-establishments for valid peers after one rebooted (and covers all other
cases too). If the liveness checking doesn't catch the failure fast enough,
the initiate IKE response w/ IC will.


Thoughts?

Gregory.


-----Original Message-----
From: Ravi [mailto:ravivsn@xxxxxxxxx]
Sent: Wednesday, May 14, 2003 9:59 PM
To: Charlie_Kaufman@xxxxxxxxxxxxxxxx
Cc: Gregory Lebovitz; 'ddukes@xxxxxxxxx'; ipsec@xxxxxxxxxxxxxxxxx;
Michael Choung Shieh; owner-ipsec@xxxxxxxxxxxxxxxxx
Subject: Re: Peer liveliness


Hi,
In IKEv2, the IKE SA is bound to the IPSEC SAs, and the IPSEC SAs (Child
SAs) are deleted whenever the IKE SA is dead. Because of this, I don't see
any problem with the approach described in the IKEv2 specification. But in
IKEv1, this binding is not mandated and an IPSEC SA can exist without a
corresponding IKE SA. This is where I see a problem, and the current DPD
specification does not seem to consider it. I proposed earlier the need
for Dead Tunnel detection on the remote SGs. I plan to come out with a
draft on this in 1 to 2 weeks. It is only applicable to IKEv1
implementations.


Regards
Ravi

Charlie_Kaufman@xxxxxxxxxxxxxxxx wrote:



I believe that the current IKEv2 spec addresses this issue in a way that
puts minimal requirements on implementations, guarantees interoperability
(though with less than ideal convergence time), and allows implementations
to do better.

But it's quite possible that I don't understand all of the things that
could go wrong, or have inadequately expressed what implementations MUST
do, or just plain screwed up.

The implementation requirements for robust interoperability are:

(1) An IKE SA and all of its associated child SAs fail together. You aren't
allowed a "partial crash" where some of the state is lost but some is kept.
This will fall out naturally in most implementations, but may require some
modular designs to have different modules poll one another for liveness.

(2) A node may not send on a set of SAs associated with a single IKE SA
indefinitely without hearing something back. If it hears nothing for long
enough, it should send an IKE message requiring a reply, and if no reply
comes it must declare all of the SAs dead.

(3) A node that has packets to send according to its SPD and no SA to send
them on must periodically attempt to open an SA for them.

I believe these three requirements alone guarantee that the right thing
will happen eventually. But it doesn't prescribe what the timers should be.
So it's possible it will take unacceptably long for things to converge. (If
network delays are long enough and timeouts short enough, the system could
fail to work at all, but I believe that problem is unavoidable).
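
One way to picture requirements (1) and (2) is the rough sketch below. It
is not from the spec: the class name, the single-probe simplification (a
real implementation would retransmit), and both timeout parameters are
assumptions, since the spec deliberately leaves the timers open.

```python
class IkeSaLiveness:
    """Tracks one IKE SA and its child SAs, which live and die together."""

    def __init__(self, idle_timeout, reply_timeout, now):
        self.idle_timeout = idle_timeout    # assumed value, not from the spec
        self.reply_timeout = reply_timeout  # assumed value, not from the spec
        self.last_heard = now
        self.probe_sent_at = None
        self.dead = False

    def on_traffic_received(self, now):
        # Any authenticated traffic from the peer proves liveness.
        self.last_heard = now
        self.probe_sent_at = None

    def tick(self, now):
        """Return 'probe', 'dead', or None for the current time."""
        if self.dead:
            return None
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= self.reply_timeout:
                self.dead = True  # IKE SA and all child SAs are dropped together
                return "dead"
            return None
        if now - self.last_heard >= self.idle_timeout:
            self.probe_sent_at = now  # send an IKE message that requires a reply
            return "probe"
        return None
```

Convergence time here is bounded by idle_timeout + reply_timeout, which is
exactly the quantity the spec does not prescribe.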

The problem with more sophisticated strategies is that they may be
exploitable for denial of service attacks. Anyone can forge an INVALID_SPI
notification message from an IP address of their choice (since such a
message is not cryptographically protected). If such a message were
sufficient to cause its recipient to shut down and restart the SA, it would
be a very effective attack. So the spec says that such a message may be
used only as a hint to a problem - for example to trigger a
cryptographically protected liveness test. This will cause the failure to
be detected more quickly, but will never cause one to be detected falsely.
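
The "hint only" handling can be sketched roughly as follows. The function,
the dict used as an SA table, and the liveness_probe callback (standing in
for a cryptographically protected IKE exchange) are all hypothetical names
chosen for illustration:

```python
def handle_unprotected_invalid_spi(peer, sa_table, liveness_probe):
    """React to an unauthenticated INVALID_SPI claimed to come from `peer`.

    sa_table: dict mapping peer -> SA state (hypothetical structure).
    liveness_probe: callable(peer) -> bool; stands in for a cryptographically
        protected IKE exchange and returns True if the peer answered.
    """
    if peer not in sa_table:
        return "ignored"       # nothing to protect or tear down
    # The notification is only a hint: never delete on its word alone.
    if liveness_probe(peer):
        return "kept"          # peer proved it still holds the SA
    del sa_table[peer]         # protected test failed, so the SAs are dead
    return "deleted"
```

A forged notification costs the recipient one protected probe at most; it
cannot by itself tear down a working SA.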


Similarly, the INITIAL_CONTACT notification can be used when setting up an
SA to assure the other end that it should abandon any SAs it has open to
the same identity. This is useful in - for example - the firewall case
where an identity is tied to a single box and it would be an error for that
box to bring up two connections at once. It would not be useful in the case
of a user who is allowed to remotely log in from multiple workstations at
the same time. Again, this makes convergence happen faster while never
making the wrong thing happen.
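
As an illustration of the INITIAL_CONTACT behavior just described, here is
a small sketch. The function and the identity-keyed dict are hypothetical,
and it assumes the notification arrived inside an authenticated exchange:

```python
def accept_new_ike_sa(identity, new_sa, sas_by_identity, initial_contact):
    """Install new_sa; return the list of older SAs that were abandoned.

    sas_by_identity: dict mapping an authenticated identity to a list of SAs
        (hypothetical structure).
    initial_contact: True if the peer sent INITIAL_CONTACT in this exchange.
    """
    existing = sas_by_identity.setdefault(identity, [])
    abandoned = []
    if initial_contact:
        # The peer asserts this is its only SA, so older ones can go at once.
        abandoned = list(existing)
        existing.clear()
    # Without INITIAL_CONTACT the older SAs stay: the peer may legitimately
    # hold several at once (e.g. a user logged in from multiple workstations).
    existing.append(new_sa)
    return abandoned
```

In the no-INITIAL_CONTACT case the old SAs are left for liveness checking
to clean up, which is slower but never removes an SA that is still in use.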

Responding to the individual comments below...

Gregory Lebovitz <Gregory@xxxxxxxxxxxxx> wrote on 04/29/2003:


[WE] won't achieve interoperability unless it's mandated that
[IMPLEMENTORS] must reply INVALID_SPI (in clear or initiate IKE back to the
sender) whenever they receive bad SPI packets.  Current IKEv2 draft doesn't
address this issue (only states you MAY reply a clear notify message).

IKEv1 vendors have implemented many ways to solve it, which leaves poor
interoperability.  We should just pick a method and clarify it in IKEv2.
===============
Michael Shieh


I think we did, but if you don't think it works, explain why.



We have been having quite a debate in the ICSA IPsec consortium mail list
recently trying to figure out how to handle this in IKEv1 (YES, STILL!!!)

Here is what we know for sure of this problem statement:
(a) detecting liveness/deadness of peer is a good thing, but does not solve
all the failure cases in and of itself

Which ones does it not solve?




(b) the behavior of a recently rebooted device when it receives an
encrypted packet for an SPI or IKE-SA not in its SADB MUST be mandated, or
else implementations will not interoperate (as is the case in IKEv1, 5
years later).

Can you give an example of how two implementations following IKEv2 could
fail to interoperate?



(c) the behavior of a peer that receives a new IKE from a peer that it has
an existing IKE-SA with (i.e. the rebooted peer that is trying to initiate
a new connection) MUST be mandated, or else implementations will not
interoperate (as is the case in IKEv1, 5 years later).

I believe it is mandated that the new IKE-SA must be accepted, and the old
one either closed immediately or closed after a timeout, though perhaps
that's just what I was thinking and not what I wrote. Is there anything
specific you would recommend?



Darren Dukes wrote:


I believe INVALID_SPI does what you are looking for. If I receive an
INVALID_SPI notify via an IKE SA I know to delete the SA and traffic will
bring up a new one.

I don't believe this will work, since it assumes that an IKE SA is
established. In the scenario, the IKE-SA would have been lost along with
the SPI of the CHILD-SA by the rebooted peer.


Until a new IKE-SA is established, any INVALID_SPI message would be
cryptographically unprotected and therefore not to be taken as other
than a hint. If a new IKE-SA is established, the INVALID_SPI could
be taken as trustworthy and used to abandon the old SA. Without the
INVALID_SPI message, abandonment would still happen but it would take
longer.



Recommendations to solve the problem:
- the empty notify as an aliveness check is a good idea. It accomplishes
what the DPD draft did. Keep using this.


Generating them is not mandated, but the ability to respond to them is.



- do what you can to use empty notify to detect dead peer ASAP. The faster
the persisting peer can delete the old SPI and IKE-SA, the better. The best
case is for Persisting Peer to detect death and initiate new IKE to
rebooted peer before rebooted peer gets packets with old SPI, IKE-SA.


If the rebooted peer knows that the SA is needed, it can do that. If it
sets them up based on traffic, it has to wait until a packet comes in from
one side or the other.



- On the Rebooted peer side: If an implementation receives a protected
packet from an unknown SPI,
  - simply relying on sending back an unprotected INVALID_SPI is not a good
    idea. It is too easy to DoS the persisting peer by simply spoofing the
    rebooted peer's address.
  - initiate IKE to the persisting peer.

This is allowed, although sending what looks like protected messages from
randomly chosen IP addresses to cause the node to attempt lots of IKE
connections is also a plausible DOS attack. Sending the INVALID_SPI message
will tell the other end to probe this end for liveness and initiate its own
new IKE connection if that liveness test fails. That's the path guaranteed
to work. Others will speed things up if implementations choose to do them.



- On the Persisting Peer:
  - If you get a new IKE request from a peer already in your SADB, respond
    with the under-attack, 6 message method. This will mitigate the DoS
    attack.



If you get all the way through SA and TS negotiation successfully, you are
assured (unless I'm missing something) that this really is your peer, and
that he re-initiated because he lost the original IKE-SA. Start using the
new IKE-SA and the new CHILD-SA and delete the previous ones after some
wait period.


Only if there is an INITIAL_CONTACT notification message. Otherwise it's
possible that the peer is opening multiple IKE SAs, perhaps because he is
replicated. In some configurations this might be acceptable. In firewall to
firewall tunnels, it would not and an implementation might reasonably treat
any IKE-SA as an INITIAL_CONTACT.



Would this proposal explicitly solve things?

Gregory.


--Charlie



--


The views presented in this mail are completely mine. The company is not
responsible for whatsoever.
------------------------------------------------------------------------
Ravi Kumar CH
Rendezvous On Chip (i) Pvt Ltd
Hyderabad, India
Ph: +91-40-2335 1214 / 1175 / 1184


ROC home page <http://www.roc.co.in>