It's called your mental sketchpad. You've got limited audio and limited visual processing capacity. It's the reason why PowerPoints fail to communicate ideas when text and identical verbal instructions are given together, and why verbal communication with associated pictures actually gets the idea across better. We're actually working on a long-term project with researchers at Berkeley right now concerning human processing.
Music is for moments. Words interfere with those. When dialogue is being spoken, music should usually take a backseat, with some exceptions.
This is actually very pertinent to the current field of online education.
"Hah! It's like we don't even have feelings. Now pardon me while I recline in my huge executive chair and guffaw, cigar in hand."
"I'll just go with what Winslow always says when something that funny about a location in Monkey Island is said"