Rule-based structural analysis of web pages