Re: Peer liveliness
Just for clarification, DPD makes no mention of sending IC. It's "out of scope"
Gregory Lebovitz wrote:
Moving the discussion back to IKEv2, for the moment...
Several of us have spent a lot of time discussing this issue in the past few
weeks. A main problem we are trying to solve (though not the only one) is
rapid recovery from a rebooted peer.
If you look at the current DPD draft for IKEv1, it calls for sending
INITIAL-CONTACT whenever a peer thinks this is its first contact, i.e. has
no established SAs with the remote peer. This is done, even in the case
where DPD is running on both peers, to let the other peer -- the persisted
peer (as opposed to the rebooted peer) -- know to delete the old SAs asap,
because the DPD timers might not catch it fast enough.
It is a very good idea to do this because sending the empty notify (DPD),
and the timer setting for how often, are totally optional. Therefore,
depending on settings, and *without* the INITIAL-CONTACT, it could be quite
some time before the persisted peer relinquishes its current SAs.
Charlie, I can't remember, is the sending of INITIAL-CONTACT a MUST in the
latest IKEv2 draft? Would it be a good idea to make INITIAL-CONTACT
notification a MUST, if it is not already? Doing so would help shorten the
tunnel black hole in most cases, regardless of dpd settings.
The next question is: what is the best behavior for a (rebooted) peer who
receives an invalid SPI? Today the mandate is to drop silently. But, if two
rules are checked first, it can be fine (I think) to respond. Those rules are:
- do I have an active SA with the sender of the invalid SPI? If yes, drop
silently. If no, go to next rule check...
- do I have the source IP of the sender in my SPD? i.e. is the sender a
valid peer? If no, drop silently. If yes...
- initiate IKE per SPD definition.
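
The checks listed above can be sketched as a small decision function. This is only an illustration of the proposal, not anything taken from a draft: the function name, the Action values, and the use of plain sets of peer addresses in place of real SADB and SPD lookups are all assumptions.

```python
from enum import Enum


class Action(Enum):
    DROP = "drop silently"
    INITIATE_IKE = "initiate IKE per SPD"


def on_invalid_spi(src_ip, peers_with_active_sa, spd_peers):
    """Decide how a (possibly rebooted) peer reacts to an unknown SPI.

    Hypothetical helper: the two sets stand in for SADB and SPD lookups.
    """
    # Rule 1: an active SA with the sender exists, so drop silently.
    if src_ip in peers_with_active_sa:
        return Action.DROP
    # Rule 2: the sender is not a valid peer in the SPD, so drop silently.
    if src_ip not in spd_peers:
        return Action.DROP
    # Both checks passed: initiate IKE per the SPD definition.
    return Action.INITIATE_IKE


spd = {"192.0.2.10", "192.0.2.20"}                        # example addresses
print(on_invalid_spi("192.0.2.10", set(), spd))           # valid peer, no SA yet
print(on_invalid_spi("192.0.2.10", {"192.0.2.10"}, spd))  # SA already exists
print(on_invalid_spi("203.0.113.5", set(), spd))          # sender not in SPD
```

Note that the source address is unauthenticated at this point, which is exactly the spoofing exposure discussed next.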
If these two rules are followed, the only threat I see to responding with
IKE initiation is that an attacker who knew all of my valid peers' IPs
could, at the moment of recovery from reboot (or power up), cause me to
establish IKE with all my peers listed in SPD, even though I might not have
otherwise made those establishments. Attacker would do so by sending me
invalid SPIs spoofed with source of each of my peers. I guess I see this as
a pretty tough attack to pull off in the real world (given spoof checking
used on most ISP routers these days), and the pay-off of the attack likely
doesn't merit the difficulty of execution. Does the value merit the risk?
Summary: IKEv2 aliveness checking doesn't ensure fast recovery. It provides
a mechanism that MAY be used for fast detection and recovery, but doesn't
guarantee it. However, combining the initiate-IKE response behavior +
INITIAL-CONTACT + liveness detection would ensure VERY fast
re-establishments for valid peers after one rebooted (and covers all other
cases too). If the liveness checking doesn't catch the failure fast enough,
the initiate IKE response w/ IC will.
From: Ravi [mailto:ravivsn@xxxxxxxxx]
Sent: Wednesday, May 14, 2003 9:59 PM
Cc: Gregory Lebovitz; 'ddukes@xxxxxxxxx'; ipsec@xxxxxxxxxxxxxxxxx;
Michael Choung Shieh; owner-ipsec@xxxxxxxxxxxxxxxxx
Subject: Re: Peer liveliness
In IKEv2, the IKE SA is bound to the IPsec SAs (Child SAs), and the Child
SAs are deleted whenever the IKE SA is dead. Because of this, I don't see
any problem with the approach mentioned in the IKEv2 specification. But in
IKEv1 this binding is not mandated, and an IPsec SA can exist without a
corresponding IKE SA. This is where I see a problem, and the current DPD
specification does not seem to consider it. I was proposing before the
need for Dead Tunnel detection on the remote SGs. I plan to come out with
a draft on this in 1 to 2 weeks. It is only applicable to IKEv1
implementations.
I believe that the current IKEv2 spec addresses this issue
in a way that
puts minimal requirements on implementations, guarantees
(though with less than ideal convergence time), and allows
to do better.
But it's quite possible that I don't understand all of the
could go wrong, or have inadequately expressed what
do, or just plain screwed up.
The implementation requirements for robust interoperability are:
(1) An IKE SA and all of its associated child SAs fail together. You
aren't allowed a "partial crash" where some of the state is lost but
some is kept. This will fall out naturally in most implementations, but
may require some modular designs to have different modules poll one another
(2) A node may not send on a set of SAs associated with a single IKE SA
indefinitely without hearing something back. If it hears nothing for
long enough, it should send an IKE message requiring a reply, and if no
reply comes it must declare all of the SAs dead.
(3) A node that has packets to send according to its SPD and no SA to
send them on must periodically attempt to open an SA for them.
I believe these three requirements alone guarantee that the
will happen eventually. But it doesn't prescribe what the timers should be.
So it's possible it will take unacceptably long for things to converge. (If
network delays are long enough and timeouts short enough, the system could
fail to work at all, but I believe that problem is unavoidable.)
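
As a rough illustration of requirement (2), the sketch below tracks one IKE SA and decides when to send a message requiring a reply and when to declare everything dead. The class name, the tick() interface, and the two timeout values are assumptions chosen for the example; as noted above, no timer values are prescribed. A real implementation would also retransmit the probe several times before giving up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IkeSaLiveness:
    """Hypothetical per-IKE-SA liveness state (times in seconds)."""
    idle_timeout: float            # assumed: silence tolerated before probing
    probe_timeout: float           # assumed: wait for the probe reply
    last_heard: float = 0.0
    probe_sent_at: Optional[float] = None
    dead: bool = False

    def on_inbound(self, now):
        # Any authenticated traffic from the peer proves it is alive.
        self.last_heard = now
        self.probe_sent_at = None

    def tick(self, now):
        """Return "ok", "probe" (send an IKE message needing a reply) or "dead"."""
        if self.dead:
            return "dead"
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= self.probe_timeout:
                # Requirement (1): the IKE SA and all child SAs fail together.
                self.dead = True
                return "dead"
            return "ok"  # probe outstanding, keep waiting
        if now - self.last_heard >= self.idle_timeout:
            self.probe_sent_at = now
            return "probe"
        return "ok"


sa = IkeSaLiveness(idle_timeout=30.0, probe_timeout=10.0)
for t in (10.0, 30.0, 35.0, 40.0):
    print(t, sa.tick(t))
```

Once tick() reports "dead", requirement (3) takes over: if the SPD still has traffic for that peer, the node periodically tries to open a new SA.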
The problem with more sophisticated strategies is that they may be
exploitable for denial of service attacks. Anyone can forge
notification message from an IP address of their choice (since such a
message is not cryptographically protected). If such a message were
sufficient to cause its recipient to shut down and restart the SA, it
would be a very effective attack. So the spec says that such a message
may be used only as a hint to a problem - for example to trigger a
cryptographically protected liveness test. This will cause the failure
to be detected more quickly, but will never cause one to be
Similarly, the INITIAL_CONTACT notification can be used when setting up
an SA to assure the other end that it should abandon any SAs it has open
to the same identity. This is useful in - for example - the
where an identity is tied to a single box and it would be an error for
that box to bring up two connections at once. It would not be useful in
the case of a user who is allowed to remotely log in from multiple
the same time. Again, this makes convergence happen faster
making the wrong thing happen.
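
The "hint only" rule for unprotected notifications can be pictured as below. The function and the returned action strings are made up for illustration. The protected branch follows the remark elsewhere in this thread that an INVALID_SPI received over an established IKE-SA could be taken as trustworthy.

```python
def handle_invalid_spi_notify(protected):
    """Pick a reaction to a received INVALID_SPI notification.

    protected: True only if the notify arrived inside an established,
    cryptographically protected IKE SA. Hypothetical helper; the action
    names are not from any specification.
    """
    if protected:
        # Trustworthy: the old SA can be abandoned.
        return "abandon_old_sa"
    # Forgeable by anyone: never tear anything down on this alone,
    # only trigger a cryptographically protected liveness test.
    return "start_liveness_probe"


print(handle_invalid_spi_notify(protected=False))
print(handle_invalid_spi_notify(protected=True))
```

The point of the split is that a forged message can at worst cost one extra liveness exchange, never a working SA.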
Responding to the individual comments below...
Gregory Lebovitz <Gregory@xxxxxxxxxxxxx> wrote on 04/29/2003:
[WE] won't achieve interoperability unless it's mandated that
reply INVALID_SPI (in clear or initiate IKE back to the sender) whenever
it receives bad SPI packets. Current IKEv2 draft doesn't address this
issue (only states you MAY reply a clear notify message).
IKEv1 vendors have implemented many ways to solve it which
I think we did, but if you don't think it works, explain why.
interoperability. We should just pick a method and clarify it in IKEv2.
We have been having quite a debate in the ICSA IPsec consortium mail
list recently trying to figure out how to handle this in IKEv1
Here is what we know for sure of this problem statement:
(a) detecting liveness/deadness of peer is a good thing, but does not
solve all the failure cases in and of itself
Which ones does it not solve?
(b) the behavior of a recently rebooted device when it receives an
encrypted packet for an SPI or IKE-SA not in its SADB MUST be mandated,
or else implementations will not interoperate (as is the case in IKEv1,
5 years later).
Can you give an example of how two implementations following IKEv2 could
fail to interoperate?
(c) the behavior of a peer that receives a new IKE from a peer that it
has an existing IKE-SA with (i.e. the rebooted peer that is trying to
establish a new connection) MUST be mandated, or else implementations
will not interoperate (as is the case in IKEv1, 5 years later).
I believe it is mandated that the new IKE-SA must be accepted, and the
old one either closed immediately or closed after a timeout,
that's just what I was thinking and not what I wrote. Is
specific you would recommend?
Darren Dukes wrote:
I believe INVALID_SPI does what you are looking for. If I
INVALID_SPI notify via an IKE SA I know to delete the SA and
bring up a new one.
I don't believe this will work, since it assumes that an IKE SA is
established. In the scenario, the IKE-SA would have been lost along with
the SPI of the CHILD-SA by the rebooted peer.
Until a new IKE-SA is established, any INVALID_SPI message would be
cryptographically unprotected and therefore not to be taken as other
than a hint. If a new IKE-SA is established, the INVALID_SPI could
be taken as trustworthy and used to abandon the old SA. Without the
INVALID_SPI message, abandonment would still happen but it
Recommendations to solve the problem:
- the empty notify as an aliveness check is a good idea. It is
what the DPD draft did. Keep using this.
Generating them is not mandated, but the ability to respond to them is.
- do what you can to use empty notify to detect dead peer ASAP. The
the persisting peer can delete the old SPI and IKE-SA, the
case is for Persisting Peer to detect death and initiate new IKE to
peer before rebooted peer gets packets with old SPI, IKE-SA.
If the rebooted peer knows that the SA is needed, it can do that. If it
sets them up based on traffic, it has to wait until a packet comes in
from one side or the other.
- On the Rebooted peer side: If an implementation receives
packet from an unknown SPI,
- simply relying on sending back an unprotected INVALID_SPI is not a
good idea. It is too easy to DoS the persisting peer by simply
rebooted peer's address.
- initiate IKE to the persisting peer.
This is allowed, although sending what looks like protected
randomly chosen IP addresses to cause the node to attempt
lots of IKE
connections is also a plausible DOS attack. Sending the
will tell the other end to probe this end for liveness and
initiate its own
new IKE connection if that liveness test fails. That's the
to work. Others will speed things up if implementations
choose to do them.
- On the Persisting Peer:
- If you get a new IKE request from a peer already in your
with the under-attack, 6 message method. This will mitigate the DoS
If you get all the way through SA and TS negotiation
assured (unless I'm missing something) that this really is
your peer, and
that he re-initiated because he lost the original IKE-SA.
Start using the
new IKE-SA and the new CHILD-SA and delete the previous
ones after some
Only if there is an INITIAL_CONTACT notification message.
possible that the peer is opening multiple IKE SAs, perhaps
because he is
replicated. In some configurations this might be
acceptable. In firewall to
firewall tunnels, it would not and an implementation might
any IKE-SA as an INITIAL_CONTACT.
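
One way to picture the persisting peer's side of this is sketched below. The table layout, the function name, and the treat_as_initial_contact policy flag (standing in for the firewall-to-firewall case just mentioned) are all assumptions. The function is meant to be called only after the new IKE-SA has been fully authenticated.

```python
def accept_new_ike_sa(sa_table, identity, new_sa, initial_contact,
                      treat_as_initial_contact=False):
    """Record an authenticated new IKE SA; return the old SAs to delete.

    sa_table maps a peer identity to its list of IKE SA handles.
    treat_as_initial_contact is a hypothetical local policy knob for
    configurations where an identity is tied to a single box.
    """
    existing = sa_table.setdefault(identity, [])
    if initial_contact or treat_as_initial_contact:
        # The peer says (or policy assumes) it holds no other SAs with us:
        # everything older under this identity is stale.
        stale = list(existing)
        sa_table[identity] = [new_sa]
        return stale
    # No INITIAL_CONTACT: the peer may legitimately hold several IKE SAs
    # (for example if it is replicated), so keep the old ones.
    existing.append(new_sa)
    return []


table = {}
print(accept_new_ike_sa(table, "gw-a", "sa1", initial_contact=False))
print(accept_new_ike_sa(table, "gw-a", "sa2", initial_contact=False))
print(accept_new_ike_sa(table, "gw-a", "sa3", initial_contact=True))
print(table)
```

The returned list is left to the caller so that deletion can happen immediately or after a delay, whichever the implementation prefers.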
Would this proposal explicitly solve things?
The views presented in this mail are completely mine. The
company is not
responsible for whatsoever.
Ravi Kumar CH
Rendezvous On Chip (i) Pvt Ltd
Ph: +91-40-2335 1214 / 1175 / 1184
ROC home page <http://www.roc.co.in>