UfXtract is an API that extracts microformats from web pages, HTML fragments or HTML files. It can output the results as JSON, XML or text, and it supports JSON-P for use with JavaScript. You can also download the .NET code from GitHub.
Example API call
https://ufxtract.com/api/?url=http://microformats.org/&format=hcard&output=json
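To make the structure of the example call explicit, the following minimal Python sketch splits it into the endpoint and its query parameters. The helper name `describe_call` is hypothetical and not part of UfXtract; only the URL itself comes from this page.

```python
from urllib.parse import urlsplit, parse_qs

# The example call shown above, copied verbatim.
EXAMPLE_CALL = (
    "https://ufxtract.com/api/"
    "?url=http://microformats.org/&format=hcard&output=json"
)


def describe_call(call_url):
    """Return the endpoint and a flat dict of the query parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(call_url)
    # parse_qs returns a list per parameter; each name appears once here.
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return endpoint, params


endpoint, params = describe_call(EXAMPLE_CALL)
for name, value in params.items():
    print(f"{name} = {value}")
```

The call therefore asks the API to fetch http://microformats.org/, parse hcard microformats and return JSON.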
API parameters
- url
- The address of the web page containing the microformats
- htmlfragment
- A piece of HTML containing the microformats. This does not need to be a complete HTML document.
- file
- An HTML file or zipped HTML file containing microformats. In zipped files, the name of the HTML document should be index.html. You must use a form POST to send a file.
- originurl
- The URL the HTML fragment came from. It is used to resolve relative paths.
- format
- The type of microformat you want to parse. This can be a single microformat name or a comma-delimited list of names. The currently supported names are: hcard, xfn, hreview, hcalendar, hatom, hresume, geo, adr, tag, nofollow, license, directory, home, enclosure, votelinks, test-suite and test-fixure.
- output
- The type of output: xml, json or text.
- callback
- A JSON-P function name to wrap the data in. This only works when the output is set to json.
- report
- Returns a summary of parsing information
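The parameters above can be combined into a request URL along the following lines. This is a minimal Python sketch, not official client code: the function name `build_request_url` is hypothetical, the endpoint is taken from the example call, and percent-encoding the values with `urlencode` is an assumption about what the server accepts (it is standard practice for query strings, but the example call on this page shows the url value unencoded). The report parameter is left out because its expected value is not documented here.

```python
from urllib.parse import urlencode

# Endpoint taken from the example call on this page.
API_ENDPOINT = "https://ufxtract.com/api/"


def build_request_url(page_url, formats, output="json", callback=None):
    """Build a UfXtract request URL (hypothetical helper).

    page_url: address of the page containing the microformats
    formats:  list of microformat names, sent as a comma-delimited list
    output:   "xml", "json" or "text"
    callback: optional JSON-P function name, only valid with json output
    """
    if callback is not None and output != "json":
        raise ValueError("callback only works when output is json")
    params = {
        "url": page_url,
        "format": ",".join(formats),
        "output": output,
    }
    if callback is not None:
        params["callback"] = callback
    # urlencode percent-encodes reserved characters in each value.
    return API_ENDPOINT + "?" + urlencode(params)


request_url = build_request_url(
    "http://microformats.org/", ["hcard", "xfn"], callback="handle"
)
print(request_url)
```

Rejecting a callback for non-JSON output up front mirrors the rule stated for the callback parameter, rather than relying on the server to ignore it.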
Example JSON output
{
    "microformats": {
        "vcard": [{
            "fn": "Tantek",
            "nickname": ["Tantek"],
            "photo": ["http:\/\/www.gravatar.com\/avatar\/02cd45622e90350cc061aaaa02229195?s=16&d=http:\/\/www.gravatar.com\/avatar\/ad516503a11cd5ca435acc9bb6523536?s=16&r=PG"],
            "url": ["http:\/\/tantek.com\/"]
        }]
    }
}
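As an illustration of consuming output shaped like the example above, here is a minimal Python sketch that pulls the name and URLs out of each hCard. The function name `list_hcards` is hypothetical, and the sample is a trimmed copy of the example (the photo property is omitted for brevity). Note that the escaped slashes (`\/`) are valid JSON and decode to plain `/`.

```python
import json

# Trimmed copy of the example output; the raw string keeps the \/ escapes.
SAMPLE_RESPONSE = r"""
{
    "microformats": {
        "vcard": [{
            "fn": "Tantek",
            "nickname": ["Tantek"],
            "url": ["http:\/\/tantek.com\/"]
        }]
    }
}
"""


def list_hcards(response_text):
    """Return (fn, urls) pairs for every hCard in a JSON response.

    Hypothetical helper; missing keys yield None or an empty list
    instead of raising an exception.
    """
    data = json.loads(response_text)
    cards = data.get("microformats", {}).get("vcard", [])
    return [(card.get("fn"), card.get("url", [])) for card in cards]


hcards = list_hcards(SAMPLE_RESPONSE)
for fn, urls in hcards:
    print(fn, urls)
```

In the example, fn is a single string while nickname, photo and url are arrays, so the sketch treats url as a list.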
Errors
{
    "microformats": {
        "errors": [{
            "msg": "Invalid URI: The hostname could not be parsed.",
            "url": "http:\/\/"
        }]
    }
}
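A client can check for an error report like the one above before using the results. The following Python sketch assumes errors always arrive under the "errors" key inside "microformats", as in this example; whether the API also signals failures through HTTP status codes, or returns errors alongside parsed results, is not stated on this page. The function name `get_errors` is hypothetical.

```python
import json

# The error example from above, kept as a raw string for the \/ escapes.
ERROR_RESPONSE = r"""
{
    "microformats": {
        "errors": [{
            "msg": "Invalid URI: The hostname could not be parsed.",
            "url": "http:\/\/"
        }]
    }
}
"""


def get_errors(response_text):
    """Return the list of error objects, or an empty list if none.

    Hypothetical helper; assumes the layout shown in the example.
    """
    data = json.loads(response_text)
    return data.get("microformats", {}).get("errors", [])


errors = get_errors(ERROR_RESPONSE)
for error in errors:
    print(f"{error['url']}: {error['msg']}")
```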
Other tools