I think we need to be careful here. The answer to the question in the OP (and the title of this thread) is a very definite "NO". You cannot know the transient (e.g. impulse) response given only the frequency response, meaning the magnitude response. As Earl mentions, you also need to know the phase response, since this tells you when each frequency component will appear in the time domain.

That is why this sentiment is not strictly correct:

I think a way to look at that mathematical truth is that if the upper frequencies are present like with good tweeters, then the whole system reacts real quick... which is like saying the transient response is swell.

Invoking the term "mathematical truth" is just going to mislead the uninformed and, frankly, is anything but "truth".

What I think Ben is trying to say is that the upper extent of the frequency response sets a lower bound on how fast a system can respond to a stimulus. That is true to the extent that a band-limited system can only respond as fast as its high frequency components will allow, given the best possible phase response for the system.
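To put a rough number on that lower bound, here is a minimal sketch, not a loudspeaker model. It assumes an ideal brick-wall low-pass with zero phase (the "best possible" case), and the sample rate, FFT length and the helper name pulse_width are all made up for illustration. It counts how many samples of the impulse response sit above half its peak for two different cutoffs.

```python
import numpy as np

fs = 48_000   # assumed sample rate in Hz (illustrative only)
n = 1024      # assumed FFT length (illustrative only)

def pulse_width(cutoff_hz):
    """Number of samples of the zero-phase impulse response above half its peak."""
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    # Brick-wall magnitude with zero phase: 1 below the cutoff, 0 above it.
    H = (freqs <= cutoff_hz).astype(float)
    # Back to the time domain; the peak lands at sample 0 (circularly).
    h = np.fft.irfft(H, n=n)
    return int(np.sum(h > 0.5 * h.max()))

narrow_band = pulse_width(5_000)    # response limited to 5 kHz
wide_band = pulse_width(20_000)     # response extended to 20 kHz
print(narrow_band, wide_band)
```

The wider bandwidth gives the narrower pulse, which is all the bound says. It does not say that a system with that bandwidth will actually produce a pulse that narrow.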

But the converse is not always correct, i.e. that a system with flat and extended high frequency response MUST have good transient response. The missing piece of relevant information is the phase response.

If you do some hand waving and say that the loudspeaker must be well-behaved and well-designed, yadda yadda, then Ben's sentiment starts to ring true and we can make some educated guesses about what the phase response will look like.

So, to reiterate, if you only know the FR, then you cannot know for certain the time domain behavior. As an example, consider two systems with the exact same FR: one can reproduce a square wave perfectly, while the other turns it into a jumbled mess in the time domain. It's very likely that these will sound exactly the same thanks to your ears+brain not being all that sensitive to "phase distortion", but they absolutely do not have the same time domain response.
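If you want to see this for yourself, here is a small sketch of the two-systems example. It is a deliberately artificial setup and not a model of any real loudspeaker: system A has flat magnitude and zero phase, system B has the same flat magnitude with a randomly scrambled phase. The FFT length, the random seed, the square wave period and all variable names are assumptions for illustration, and the filtering is done as circular convolution via the FFT to keep it short.

```python
import numpy as np

n = 256                          # assumed FFT length (illustrative only)
rng = np.random.default_rng(0)   # fixed seed so the run is repeatable

# System A: flat magnitude, zero phase.
H_a = np.ones(n // 2 + 1, dtype=complex)

# System B: the same flat magnitude, but with a scrambled phase.
phase = rng.uniform(-np.pi, np.pi, n // 2 + 1)
phase[0] = 0.0    # DC bin must be real for a real impulse response
phase[-1] = 0.0   # Nyquist bin must be real as well
H_b = np.exp(1j * phase)

# Impulse responses of the two systems.
h_a = np.fft.irfft(H_a, n=n)
h_b = np.fft.irfft(H_b, n=n)

# Magnitude responses recovered from those impulse responses.
mag_a = np.abs(np.fft.rfft(h_a))
mag_b = np.abs(np.fft.rfft(h_b))

# Feed a square wave (period 64 samples) through both systems.
square = np.where(np.arange(n) % 64 < 32, 1.0, -1.0)
out_a = np.fft.irfft(np.fft.rfft(square) * H_a, n=n)
out_b = np.fft.irfft(np.fft.rfft(square) * H_b, n=n)

print("same magnitude response:", np.allclose(mag_a, mag_b))
print("impulse response peaks:", np.max(np.abs(h_a)), np.max(np.abs(h_b)))
print("square wave error A:", np.max(np.abs(out_a - square)))
print("square wave error B:", np.max(np.abs(out_b - square)))
```

Both systems measure identically on a magnitude plot, yet A returns a clean single-sample impulse and an intact square wave, while B smears the impulse across the whole record and mangles the square wave.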