Distilled semantics for comprehensive scene understanding from videos