Screen-scraping HTML is evil. APIs are better because they return structured data: XML, JSON or even SOAP (everyone laughs at the latter!).
If a page is marked up semantically using microformats, an XSL stylesheet can convert it into RDF. Put the stylesheet on a profile page and link to that profile from the HTML (<head profile="...">), and triplr can then generate triples from the page in various formats. The resulting triples can be queried with SPARQL.
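Python's standard library has no XSLT processor, so the sketch below swaps the XSL stylesheet for a hand-written HTMLParser subclass. It only illustrates the idea. It reads hCard class names from a made-up page and emits subject-predicate-object triples, then answers a SPARQL-style basic graph pattern with a toy matcher. The sample markup, the vcard:/rdf: prefixes, the blank-node naming and the functions extract_triples and match are all hypothetical. A real setup would use the profile's stylesheet, an RDF library and a proper SPARQL engine.

```python
from html.parser import HTMLParser

# Hypothetical page marked up with the hCard microformat.
SAMPLE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/~alice">Alice Example</a>
  <span class="org">Example Ltd</span>
</div>
"""

# Assumed mapping from hCard class names to vCard-style predicates.
PREDICATES = {"fn": "vcard:fn", "org": "vcard:org"}


class HCardToTriples(HTMLParser):
    """Collects (subject, predicate, object) triples from simple hCard markup."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.cards = 0       # counter used to mint blank-node subjects
        self.subject = None  # blank node for the hCard currently being read
        self.pending = []    # predicates waiting for the element's text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vcard" in classes:
            self.cards += 1
            self.subject = f"_:card{self.cards}"
            self.triples.append((self.subject, "rdf:type", "vcard:VCard"))
        if self.subject is None:
            return  # ignore markup outside any hCard
        if "url" in classes and attrs.get("href"):
            self.triples.append((self.subject, "vcard:url", attrs["href"]))
        preds = [PREDICATES[c] for c in classes if c in PREDICATES]
        if preds:
            self.pending = preds

    def handle_data(self, data):
        text = data.strip()
        if text and self.pending:
            for pred in self.pending:
                self.triples.append((self.subject, pred, text))
            self.pending = []

    def handle_endtag(self, tag):
        self.pending = []  # an empty property element yields no triple


def extract_triples(html):
    parser = HCardToTriples()
    parser.feed(html)
    parser.close()
    return parser.triples


def match(patterns, triples, binding=None):
    """Yield bindings satisfying every (s, p, o) pattern; terms starting with '?' are variables."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)


triples = extract_triples(SAMPLE)
# Roughly: SELECT ?name ?org WHERE { ?c vcard:fn ?name . ?c vcard:org ?org }
query = [("?c", "vcard:fn", "?name"), ("?c", "vcard:org", "?org")]
for row in match(query, triples):
    print(row["?name"], "works at", row["?org"])
```

Running it prints "Alice Example works at Example Ltd". The matcher joins patterns by threading variable bindings through each triple pattern in turn, which is the core of a SPARQL basic graph pattern, but it has no FILTER, OPTIONAL or indexing.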