Re: Peer liveliness

Just for clarification, DPD makes no mention of sending IC. It's "out of scope" ;-).


Gregory Lebovitz wrote:
Moving the discussion back to IKEv2, for the moment...

Several of us have spent a lot of time discussing this issue in the past few
weeks. A main problem we are trying to solve (though not the only one) is
rapid recovery from a rebooted peer.

If you look at the current DPD draft for IKEv1, it calls for sending
INITIAL-CONTACT whenever a peer thinks this is its first contact, i.e. has
no established SAs with the remote peer. This is done, even in the case
where DPD is running on both peers, to let the other peer -- the persisted
peer (as opposed to the rebooted peer) -- know to delete the old SAs ASAP,
because the DPD timers might not catch it fast enough.

It is a very good idea to do this because sending the empty notify (DPD),
and the timer setting for how often, are totally optional. Therefore,
depending on settings, and *without* the INITIAL-CONTACT, it could be quite
some time before the persisted peer relinquishes its current SAs.

Charlie, I can't remember, is the sending of INITIAL-CONTACT a MUST in the
latest IKEv2 draft? Would it be a good idea to make INITIAL-CONTACT
notification a MUST, if it is not already? Doing so would help shorten the
tunnel black hole in most cases, regardless of DPD settings.

The next question is: what is the best behavior for a (rebooted) peer who
receives an invalid SPI? Today the mandate is to drop silently. But, if two
rules are checked first, it can be fine (I think) to respond. Those rules are:
  - do I have an active SA with the sender of the invalid SPI? If yes, drop
silently. If no, go to next rule check...
  - do I have the source IP of the sender in my SPD? i.e. is the sender a
valid peer? If no, drop silently. If yes...
  - initiate IKE per SPD definition.
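
The rule check above can be sketched roughly as follows. This is only an
illustration, not text from any draft: `active_sa_peers` and `spd_peers` are
hypothetical sets of peer addresses standing in for the SADB and SPD lookups,
and `Action` is a made-up name for the two possible outcomes.

```python
from enum import Enum


class Action(Enum):
    DROP_SILENTLY = "drop"
    INITIATE_IKE = "initiate-ike"


def on_invalid_spi(src_ip, active_sa_peers, spd_peers):
    """Decide how to react to a packet that carries an unknown SPI.

    active_sa_peers and spd_peers are hypothetical sets of peer addresses
    (assumptions for this sketch, standing in for SADB and SPD lookups).
    """
    # Rule 1: an active SA with the sender already exists, so drop silently.
    if src_ip in active_sa_peers:
        return Action.DROP_SILENTLY
    # Rule 2: the sender is not a configured peer in the SPD, so drop silently.
    if src_ip not in spd_peers:
        return Action.DROP_SILENTLY
    # Both checks passed: initiate IKE per the SPD definition.
    return Action.INITIATE_IKE
```

Note that the source address is unauthenticated at this point, which is
exactly the spoofing exposure discussed next.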

If these two rules are followed, the only threat I see to responding with
IKE initiation is that an attacker who knew all of my valid peers' IPs
could, at the moment of recovery from reboot (or power up), cause me to
establish IKE with all my peers listed in SPD, even though I might not have
otherwise made those establishments. Attacker would do so by sending me
invalid SPIs spoofed with source of each of my peers. I guess I see this as
a pretty tough attack to pull off in the real world (given spoof checking
used on most ISP routers these days), and the pay-off of the attack likely
doesn't merit the difficulty of execution. Does the value merit the risk?

Summary: IKEv2 aliveness checking doesn't ensure fast recovery. It provides
a mechanism that MAY be used for fast detection and recovery, but doesn't
guarantee it. However, combining the initiate-IKE response behavior +
INITIAL-CONTACT + liveness detection would ensure VERY fast
re-establishments for valid peers after one rebooted (and covers all other
cases too). If the liveness checking doesn't catch the failure fast enough,
the initiate IKE response w/ IC will.



-----Original Message-----
From: Ravi [mailto:ravivsn@xxxxxxxxx]
Sent: Wednesday, May 14, 2003 9:59 PM
To: Charlie_Kaufman@xxxxxxxxxxxxxxxx
Cc: Gregory Lebovitz; 'ddukes@xxxxxxxxx'; ipsec@xxxxxxxxxxxxxxxxx;
Michael Choung Shieh; owner-ipsec@xxxxxxxxxxxxxxxxx
Subject: Re: Peer liveliness

In IKEv2, the IKE SA is bound to the IPsec SAs, and the IPsec SAs (Child
SAs) are deleted whenever the IKE SA is dead. Due to this, I don't see any
problem with the approach mentioned in the IKEv2 specification. But in
IKEv1, this binding is not mandated and an IPsec SA can exist without a
corresponding IKE SA. This is where I see a problem, and the current DPD
specification does not seem to consider it. I proposed earlier the need
for Dead Tunnel detection on the remote SGs. I plan to come out with a
draft on this in 1 to 2 weeks. It is only applicable to IKEv1
implementations.


Charlie_Kaufman@xxxxxxxxxxxxxxxx wrote:

I believe that the current IKEv2 spec addresses this issue in a way that
puts minimal requirements on implementations, guarantees [...] (though
with less than ideal convergence time), and allows [...] to do better.
But it's quite possible that I don't understand all of the things that
could go wrong, or have inadequately expressed what implementations MUST
do, or just plain screwed up.

The implementation requirements for robust interoperability are:

(1) An IKE SA and all of its associated child SAs fail together. You
aren't allowed a "partial crash" where some of the state is lost but some
is kept. This will fall out naturally in most implementations, but may
require some modular designs to have different modules poll one another
for liveness.

(2) A node may not send on a set of SAs associated with a single IKE SA
indefinitely without hearing something back. If it hears nothing for long
enough, it should send an IKE message requiring a reply, and if no reply
comes it must declare all of the SAs dead.

(3) A node that has packets to send according to its SPD and no SA to send
them on must periodically attempt to open an SA for them.

I believe these three requirements alone guarantee that the right thing
will happen eventually. But it doesn't prescribe what the timers should
be. So it's possible it will take unacceptably long for things to
converge. (If network delays are long enough and timeouts short enough,
the system could fail to work at all, but I believe that problem is
unavoidable).
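
Requirements (1) and (2) can be pictured as a small per-IKE-SA timer. The
sketch below is an assumption-laden illustration, not spec text: the class
name, `idle_limit`, and `reply_timeout` are hypothetical, and the spec
deliberately leaves the timer values open.

```python
class IkeSaState:
    """Tracks one IKE SA; its child SAs are assumed to live and die with it."""

    def __init__(self, idle_limit, reply_timeout, now=0.0):
        self.idle_limit = idle_limit        # assumed: silence tolerated before probing
        self.reply_timeout = reply_timeout  # assumed: how long to wait for a probe reply
        self.last_heard = now
        self.probe_sent_at = None
        self.alive = True

    def on_inbound(self, now):
        # Hearing anything back on these SAs resets the silence timer.
        self.last_heard = now
        self.probe_sent_at = None

    def tick(self, now):
        """Return 'probe' to send a liveness request, 'dead' to tear down, else None."""
        if not self.alive:
            return None
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= self.reply_timeout:
                # No reply to the probe: declare the IKE SA and all child SAs dead.
                self.alive = False
                return "dead"
            return None
        if now - self.last_heard >= self.idle_limit:
            # Silent for too long: send an IKE message that requires a reply.
            self.probe_sent_at = now
            return "probe"
        return None
```

A caller would invoke `tick()` periodically and `on_inbound()` whenever
protected traffic arrives; requirement (3) would then drive the attempt to
open a replacement SA once `tick()` has returned 'dead'.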

The problem with more sophisticated strategies is that they may be
exploitable for denial of service attacks. Anyone can forge [...]
notification message from an IP address of their choice (since such a
message is not cryptographically protected). If such a message were
sufficient to cause its recipient to shut down and restart the SA, it
would be a very effective attack. So the spec says that such a message may
be used only as a hint to a problem - for example to trigger a
cryptographically protected liveness test. This will cause the failure to
be detected more quickly, but will never cause one to be detected falsely.
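
The "hint only" rule might look like this in code. It is a minimal sketch
under stated assumptions: `known_sas` is a hypothetical set of SA
identifiers, and `run_protected_liveness_test` is a hypothetical callback
that returns True when the peer answers a cryptographically protected probe.

```python
def on_unprotected_notify(sa_id, known_sas, run_protected_liveness_test):
    """Handle an unauthenticated notification that refers to sa_id.

    Returns True if the SA was torn down, False otherwise.
    All three parameters are assumptions made for this sketch.
    """
    if sa_id not in known_sas:
        return False  # no matching state, nothing to test
    # The message may be forged, so never delete state on its word alone.
    # Treat it only as a reason to run the protected liveness test now.
    if run_protected_liveness_test(sa_id):
        return False  # peer answered: the hint was stale or forged
    known_sas.discard(sa_id)  # probe went unanswered: the failure is real
    return True
```

The worst a forger can achieve here is an extra liveness exchange, never a
teardown of a working SA.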

Similarly, the INITIAL_CONTACT notification can be used when setting up an
SA to assure the other end that it should abandon any SAs it has open to
the same identity. This is useful in - for example - the firewall case
where an identity is tied to a single box and it would be an error for
that box to bring up two connections at once. It would not be useful in
the case of a user who is allowed to remotely log in from multiple
workstations at the same time. Again, this makes convergence happen faster
while never making the wrong thing happen.
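
One way to picture the INITIAL_CONTACT handling, as a sketch only:
`sa_table` is a hypothetical dict mapping an authenticated identity to its
list of IKE SAs, and the function is assumed to be called only after the new
exchange has authenticated successfully.

```python
def accept_new_ike_sa(sa_table, identity, new_sa, initial_contact):
    """Record new_sa for identity and return the list of older SAs to delete.

    sa_table (identity -> list of IKE SAs) is an assumption of this sketch.
    Call this only after the peer has been authenticated.
    """
    existing = sa_table.setdefault(identity, [])
    stale = []
    if initial_contact:
        # The peer asserts this is its only SA: abandon everything older.
        stale = list(existing)
        existing.clear()
    # Without INITIAL_CONTACT the peer may legitimately hold several SAs
    # (for example a user logged in from multiple workstations).
    existing.append(new_sa)
    return stale
```

Without the notification, the older SAs are left for liveness checking to
clean up.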

Responding to the individual comments below...

Gregory Lebovitz <Gregory@xxxxxxxxxxxxx> wrote on 04/29/2003:

[WE] won't achieve interoperability unless it's mandated that [...]
reply INVALID_SPI (in clear or initiate IKE back to the sender) whenever
it receives bad SPI packets. Current IKEv2 draft doesn't address this
issue (only states you MAY reply a clear notify message).

IKEv1 vendors have implemented many ways to solve it, which leaves poor
interoperability. We should just pick a method and clarify it in IKEv2.
Michael Shieh

I think we did, but if you don't think it works, explain why.

We have been having quite a debate in the ICSA IPsec consortium mail list
recently trying to figure out how to handle this in IKEv1 [...]

Here is what we know for sure of this problem statement:
(a) detecting liveness/deadness of peer is a good thing, but does not
solve all the failure cases in and of itself

Which ones does it not solve?

(b) the behavior of a recently rebooted device when it receives an
encrypted packet for an SPI or IKE-SA not in its SADB MUST be mandated,
or else implementations will not interoperate (as is the case in IKEv1, 5
years later).

Can you give an example of how two implementations following IKEv2 could
fail to interoperate?

(c) the behavior of a peer that receives a new IKE from a peer that it has
an existing IKE-SA with (i.e. the rebooted peer that is trying to initiate
a new connection) MUST be mandated, or else implementations will not
interoperate (as is the case in IKEv1, 5 years later).

I believe it is mandated that the new IKE-SA must be accepted, and the old
one either closed immediately or closed after a timeout, though perhaps
that's just what I was thinking and not what I wrote. Is there anything
specific you would recommend?

Darren Dukes wrote:

I believe INVALID_SPI does what you are looking for. If I receive an
INVALID_SPI notify via an IKE SA I know to delete the SA and traffic will
bring up a new one.

I don't believe this will work, since it assumes that an IKE SA is
established. In the scenario, the IKE-SA would have been lost along with
the SPI of the CHILD-SA by the rebooted peer.

Until a new IKE-SA is established, any INVALID_SPI message would be
cryptographically unprotected and therefore not to be taken as other
than a hint. If a new IKE-SA is established, the INVALID_SPI could
be taken as trustworthy and used to abandon the old SA. Without the
INVALID_SPI message, abandonment would still happen but it would take
[...]

Recommendations to solve the problem:
- the empty notify as an aliveness check is a good idea. It [...] what the
DPD draft did. Keep using this.

Generating them is not mandated, but the ability to respond to them is.

- do what you can to use empty notify to detect dead peer ASAP. The sooner
the persisting peer can delete the old SPI and IKE-SA, the better. The
[...] case is for Persisting Peer to detect death and initiate new IKE to
[...] peer before rebooted peer gets packets with old SPI, IKE-SA.

If the rebooted peer knows that the SA is needed, it can do that. If it
sets them up based on traffic, it has to wait until a packet comes in from
one side or the other.

- On the Rebooted peer side: If an implementation receives a protected
packet from an unknown SPI,
- simply relying on sending back an unprotected INVALID_SPI is not a good
idea. It is too easy to DoS the persisting peer by simply spoofing the
rebooted peer's address.
- initiate IKE to the persisting peer.

This is allowed, although sending what looks like protected messages from
randomly chosen IP addresses to cause the node to attempt lots of IKE
connections is also a plausible DOS attack. Sending the [...] will tell
the other end to probe this end for liveness and initiate its own new IKE
connection if that liveness test fails. That's the path guaranteed to
work. Others will speed things up if implementations choose to do them.

- On the Persisting Peer:
- If you get a new IKE request from a peer already in your SADB, respond
with the under-attack, 6 message method. This will mitigate the DoS [...]
If you get all the way through SA and TS negotiation successfully, you are
assured (unless I'm missing something) that this really is your peer, and
that he re-initiated because he lost the original IKE-SA. Start using the
new IKE-SA and the new CHILD-SA and delete the previous ones after some
[...]

Only if there is an INITIAL_CONTACT notification message. Otherwise it's
possible that the peer is opening multiple IKE SAs, perhaps because he is
replicated. In some configurations this might be acceptable. In firewall
to firewall tunnels, it would not and an implementation might reasonably
treat [...]

Would this proposal explicitly solve things?




The views presented in this mail are completely mine. The company is not
responsible for whatsoever.
Ravi Kumar CH
Rendezvous On Chip (i) Pvt Ltd
Hyderabad, India
Ph: +91-40-2335 1214 / 1175 / 1184

ROC home page <http://www.roc.co.in>