Getting data from the web with Import.io

By Chris Alexander

Import•io is a platform which facilitates the conversion of semi-structured information in web pages into structured data. This structured data can be used for anything from driving business decisions to integration with apps and other platforms.

For developers, we offer real-time data retrieval through our JSON REST-based and streaming APIs, integration with several common programming languages and data manipulation tools, as well as a federation platform which allows up to 100 data sources to be queried simultaneously.
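To make the idea of querying many sources at once more concrete, here is a minimal sketch of a fan-out-and-merge pattern in plain Python. It is not import•io's federation implementation: each "source" is assumed to be an ordinary Python function returning a list of row dictionaries, and the names `federated_query`, `shop_a` and `shop_b` are hypothetical. Only the 100-source cap comes from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical: a source is any callable that takes a query and returns rows.
# A real source would call a remote API instead of computing rows locally.
Source = Callable[[str], List[Row]]

MAX_SOURCES = 100  # mirrors the "up to 100 data sources" limit in the text


def federated_query(sources: Dict[str, Source], query: str) -> List[Row]:
    """Query every source concurrently and merge the rows into one list."""
    if len(sources) > MAX_SOURCES:
        raise ValueError(f"at most {MAX_SOURCES} sources can be queried at once")
    merged: List[Row] = []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # Submit every query first so they all run in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, future in futures.items():
            for row in future.result():
                # Tag each row with the source it came from.
                merged.append({**row, "source": name})
    return merged


def shop_a(query: str) -> List[Row]:
    return [{"name": f"{query} CD", "price": "9.99"}]


def shop_b(query: str) -> List[Row]:
    return [{"name": f"{query} vinyl", "price": "19.99"}]


if __name__ == "__main__":
    for row in federated_query({"shop_a": shop_a, "shop_b": shop_b}, "ellie goulding"):
        print(row)
```

Because the queries run concurrently, the total wait is roughly that of the slowest source rather than the sum of all of them.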

If you've ever found yourself writing a scraper, trying to get data out of the HTML of a web page, or combining data from multiple sources, then import•io may be for you.

Scraping problems

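Writing scrapers with current tools, especially in code, is a brittle and time-consuming process: the extraction logic is tied to the exact HTML of each page, so a small change to the page's markup can break it.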
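As an illustration of that brittleness, here is a minimal hand-written scraper using Python's standard `html.parser` module. The page markup and the `product-title` class name are hypothetical; the point is that the scraper depends on one specific detail of the HTML, and a simple rename makes it silently return nothing.

```python
from html.parser import HTMLParser
from typing import List


class ProductNameScraper(HTMLParser):
    """Collects the text of every <span class="product-title"> element.

    "product-title" is a hypothetical class name: the scraper is hard-wired
    to this one detail of the page's markup.
    """

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "product-title" in classes:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.names.append(data.strip())


def scrape_names(html: str) -> List[str]:
    parser = ProductNameScraper()
    parser.feed(html)
    parser.close()
    return parser.names


old_page = '<div><span class="product-title">Product A</span></div>'
new_page = '<div><span class="item-name">Product A</span></div>'
print(scrape_names(old_page))  # ['Product A']
print(scrape_names(new_page))  # [] because the class was renamed
```

Nothing fails loudly here: the second call simply returns an empty list, which is exactly the kind of breakage that makes hand-maintained scrapers costly.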

import•io replaces the bulk of this process with a UI tool that lets anyone train the system to recognise data in web pages. Once the platform has been trained, you can use our RESTful APIs to download the data it has extracted, or even to query the site in real time with your own inputs.

Example data

Here, I am going to show you how to get data from a sample data source I built in a few minutes. If you want to find out more about building your own source, we have plenty of tutorials and help documentation to get you going.

The example source I am going to use is what we call a "connector" to Amazon. We call it a connector because it connects your app to Amazon's product search engine over an API. You can see the data source on import.io here, or check out some example data from it here.

On that page you can also edit the search input and press "Refresh". The import•io platform will then fetch the latest search results for that term in real time. From the same page you can download the data as CSV (for Excel), as JSON, or in other formats.
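If you do download an export, Python's standard library can load it directly. This is a minimal sketch that assumes the CSV has a header row and that the JSON export is a top-level array of row objects; the column names `name` and `price` and the sample values are placeholders, not the connector's actual schema.

```python
import csv
import io
import json
from typing import Dict, List

Row = Dict[str, str]


def load_csv_export(text: str) -> List[Row]:
    """Parse CSV text with a header row into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_export(text: str) -> List[Row]:
    """Parse JSON text, assumed here to be a top-level array of row objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of rows")
    return data


# Placeholder sample data; a real export would be read from the downloaded file.
sample_csv = "name,price\nProduct A,9.99\nProduct B,11.49\n"
rows = load_csv_export(sample_csv)
print(rows[0]["name"], rows[0]["price"])  # Product A 9.99
```

For a file on disk, open it with `open("export.csv", newline="")` and pass the file object straight to `csv.DictReader`.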

However, we are more interested in getting this data into some code, so that we can manipulate it ourselves.

import•io Python client

We provide a Python client library that lets you run the same real-time queries we just illustrated on our site from your own code.

To use these examples, you will need to replace "YOUR_USER_GUID" and "YOUR_API_KEY" with the credentials of a valid import•io account. Once you have signed in, you can find both on your Account page.
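Rather than pasting the credentials into the script, you may prefer to read them from environment variables so they stay out of source control. This is a minimal sketch; `IMPORTIO_USER_GUID` and `IMPORTIO_API_KEY` are hypothetical variable names chosen for illustration, not names the client library looks for.

```python
import os
from typing import Tuple


def load_credentials() -> Tuple[str, str]:
    """Read the User GUID and API key from environment variables.

    IMPORTIO_USER_GUID and IMPORTIO_API_KEY are hypothetical names used
    only in this sketch; pick whatever names suit your setup.
    """
    user_guid = os.environ.get("IMPORTIO_USER_GUID")
    api_key = os.environ.get("IMPORTIO_API_KEY")
    if not user_guid or not api_key:
        raise RuntimeError(
            "Set IMPORTIO_USER_GUID and IMPORTIO_API_KEY before running the example"
        )
    return user_guid, api_key
```

Calling `user_guid, api_key = load_credentials()` near the top of the script, and passing those values wherever the example expects "YOUR_USER_GUID" and "YOUR_API_KEY", gives an early, clear error if either value is missing.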

The first step is to download our client library from this link. You can save this anywhere on your machine.

Next, save this example.py script in the same directory as the client library file. When run, it will query the data source we mentioned earlier and show you the results:

Don't forget to insert your User GUID and API key on line 51 of the script!

On line 98 of the example, we create a new query to import•io. It sets the product name we are searching for, in this case "ellie goulding", and the GUID of the data source we looked at earlier. Once the query has been dispatched to the server, the callback function defined at line 78 is called for each page of Amazon search results for that query (up to a maximum of 10 pages).
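The callback pattern described above can be sketched without the client library at all. In this sketch, `simulate_query` is a hypothetical stand-in for dispatching a query: it invents one row per page locally instead of contacting the server, and `ResultCollector` is an illustrative name. Only the one-callback-per-page behaviour and the 10-page cap come from the text; the real library's function names and signatures are not shown here.

```python
from typing import Callable, Dict, List

MAX_PAGES = 10  # the text caps a query at 10 pages of search results

Row = Dict[str, str]
PageCallback = Callable[[List[Row]], None]


class ResultCollector:
    """Accumulates rows that arrive page by page through a callback."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self.pages_seen = 0

    def on_page(self, rows: List[Row]) -> None:
        # Invoked once per page; keep every row for processing later.
        self.pages_seen += 1
        self.rows.extend(rows)


def simulate_query(search_term: str, total_pages: int, callback: PageCallback) -> None:
    """Hypothetical stand-in for dispatching a query to a connector.

    A real client would send `search_term` to the server and invoke
    `callback` as each page arrives; here the pages are made up locally.
    """
    for page in range(1, min(total_pages, MAX_PAGES) + 1):
        callback([{"name": f"{search_term} result {page}", "price": "0.00"}])


collector = ResultCollector()
simulate_query("ellie goulding", total_pages=25, callback=collector.on_page)
print(collector.pages_seen)  # 10
print(len(collector.rows))   # 10
```

Collecting rows in the callback and processing them only after the last page has arrived keeps the per-page handler small, which matches how the example separates the callback from the final processing step.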

Once all the data has arrived, line 112 goes through every result we received and outputs the name and price of each product found.
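The final processing step can be sketched as a small function over the collected rows. The `name` and `price` keys are assumptions made for this sketch; use whatever column names your own data source defines. Rows missing either field are skipped rather than printed with a gap.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def format_products(rows: Iterable[Row]) -> List[str]:
    """Turn result rows into "name: price" lines, skipping incomplete rows.

    The "name" and "price" keys are assumed column names.
    """
    lines = []
    for row in rows:
        name = row.get("name")
        price = row.get("price")
        if name and price:
            lines.append(f"{name}: {price}")
    return lines


results = [
    {"name": "Product A", "price": "9.99"},
    {"name": "Product B"},  # no price, so this row is skipped
]
for line in format_products(results):
    print(line)  # Product A: 9.99
```

Returning a list of lines instead of printing inside the loop makes the function easy to test and to reuse, for example when writing the results to a file.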

You should get some output that looks like this:

This is just one example of what you can do with import.io. The platform supports many other uses, such as crawling an entire website and then refreshing specific parts of the data, or extracting data from many different parts of the same web page.

If you liked this example, our integration guide shows how to build a similar integration in JavaScript, Node.js, Java, C#, Ruby and other languages.

About the Author

Chris is a Developer Experience Engineer at London-based big data startup import.io. He is responsible for contributing to the import•io platform and web applications in addition to making sure that every developer gets the most out of the service.

About Import.io

Turn websites into structured APIs in seconds. import•io is the ultimate tool for anyone who wants to take advantage of big data the easy way. Using our free app you can extract data from any website into a table or an API without writing any code. Get live data directly from the source, then combine multiple sources and query them all with a single API call.
