[Speech Communication'2026] Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling