UfXtract

Powerful, easy-to-use .Net microformats parser

UfXtract is an API that extracts microformats from web pages, HTML fragments or HTML files. It can output the results in JSON, XML or text. There is JSON-P support for use with JavaScript. You can also download the .Net code from GitHub.

Try out the API:

Extract microformats from a URL

Extract microformats from HTML fragment

Extract microformats from HTML file

Example API call

https://ufxtract.com/api/?url=http://microformats.org/&format=hcard&output=json
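The same call can be assembled programmatically. The following is a minimal Python sketch, not part of UfXtract itself: the helper name `build_ufxtract_url` is hypothetical, and only the endpoint and the parameter names are taken from the example call above. It builds the URL without sending a request. Unlike the hand-written example, it percent-encodes the page address, which is the safer choice when that address contains its own query string.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://ufxtract.com/api/"  # endpoint taken from the example call


def build_ufxtract_url(page_url, fmt="hcard", output="json"):
    """Return an API URL for the given page (hypothetical helper)."""
    # urlencode percent-encodes each value, including the nested page URL.
    query = urlencode({"url": page_url, "format": fmt, "output": output})
    return API_ENDPOINT + "?" + query


print(build_ufxtract_url("http://microformats.org/"))
```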

API parameters

url
The address of the web page containing the microformats
htmlfragment
A piece of HTML containing the microformats. It does not need to be a complete document.
file
An HTML file or zipped HTML file containing microformats. In a zipped file, the HTML document should be named index.html. You must use a form POST to send a file.
originurl
The URL the HTML fragment was taken from. It is used to resolve relative paths.
format
The type of microformat you want to parse. This can be a single microformat name or a comma-delimited list of names. The currently supported names are: hcard, xfn, hreview, hcalendar, hatom, hresume, geo, adr, tag, nofollow, license, directory, home, enclosure, votelinks, test-suite and test-fixture.
output
The type of output: xml, json or text
callback
A JSON-P function name to wrap the data in. This only works when output is set to json.
report
Returns a summary of parsing information
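As a rough illustration of how these parameters combine, the Python sketch below builds a query string from them. The function `build_query` is hypothetical, not part of UfXtract. It joins several format names into a comma-delimited list and rejects a callback when the output is not JSON, following the descriptions above. It assumes, without confirmation from this page, that originurl only matters alongside htmlfragment.

```python
from urllib.parse import urlencode


def build_query(formats, output="json", url=None, htmlfragment=None,
                originurl=None, callback=None):
    """Build a query string from the documented parameters (hypothetical helper)."""
    if callback is not None and output.lower() != "json":
        raise ValueError("callback only works when output is json")
    # format accepts a single name or a comma-delimited list of names.
    params = {"format": ",".join(formats), "output": output}
    if url is not None:
        params["url"] = url
    if htmlfragment is not None:
        params["htmlfragment"] = htmlfragment
        # Assumption: originurl is only useful together with a fragment.
        if originurl is not None:
            params["originurl"] = originurl
    if callback is not None:
        params["callback"] = callback
    return urlencode(params)


print(build_query(["hcard", "xfn"], url="http://microformats.org/",
                  callback="handleData"))
```

The file parameter is left out on purpose: it requires a form POST rather than a query string.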

Example JSON output

A single hCard on a page would return output like the example below. The format is based on ufJson, which is documented on the microformats wiki. The API compresses the JSON output by removing all spaces and line breaks. If you would like a more readable layout, try using the JavaScript beautifier.

{
    "microformats": {
        "vcard": [{
            "fn": "Tantek",
            "nickname": ["Tantek"],
            "photo": ["http:\/\/www.gravatar.com\/avatar\/02cd45622e90350cc061aaaa02229195?s=16&d=http:\/\/www.gravatar.com\/avatar\/ad516503a11cd5ca435acc9bb6523536?s=16&r=PG"],
            "url": ["http:\/\/tantek.com\/"]
        }]
    }
}
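A response of this shape can be read with any JSON parser. The Python sketch below is illustrative only: the sample string is a shortened, compressed version of the hCard example above, and the variable names are hypothetical. It pulls out the formatted names and URLs, then prints the data with indentation as an alternative to a separate beautifier.

```python
import json

# Compressed response text shaped like the hCard example (shortened sample).
raw = r'{"microformats":{"vcard":[{"fn":"Tantek","nickname":["Tantek"],"url":["http:\/\/tantek.com\/"]}]}}'

data = json.loads(raw)  # the escaped "\/" sequences decode to plain "/"

# "vcard" holds a list of hCards; default to an empty list if none were found.
cards = data["microformats"].get("vcard", [])
names = [card["fn"] for card in cards]
urls = [u for card in cards for u in card.get("url", [])]

print(names)
print(urls)
print(json.dumps(data, indent=4))  # readable, indented layout
```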

Errors

UfXtract has very simple built-in error reporting. Below is an example of the output from calling the API with an empty URL.

{
    "microformats": {
        "errors": [{
            "msg": "Invalid URI: The hostname could not be parsed.",
            "url": "http:\/\/"
        }]
    }
}
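A client can check for this structure before reading any microformat data. The following Python sketch assumes that errors always appear under microformats.errors, as in the example above. The helper name `extract_errors` is hypothetical.

```python
import json


def extract_errors(response_text):
    """Return the error messages found in a response, or [] if there are none."""
    data = json.loads(response_text)
    # Assumption: errors live under microformats.errors, as in the sample.
    errors = data.get("microformats", {}).get("errors", [])
    return [error.get("msg", "") for error in errors]


sample = r'{"microformats":{"errors":[{"msg":"Invalid URI: The hostname could not be parsed.","url":"http:\/\/"}]}}'
print(extract_errors(sample))
```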

Other tools

Download Code

MIT Open Source License

About the UfXtract .Net library