I think the problem is one of expectations. 20 seconds is entirely reasonable given that your media library is over 1TB. Anyway, it only has to load once if you keep iTunes running (like I do).
How do you "know" that the iPhone 4 can do what the iPhone 5 does, and that there isn't a good technical reason for limiting voice to the iPhone 5? You're assuming this would run great on the iPhone 4, and that assumption may not be accurate.