Part 1 – Power Query Basics – Indeed!

While I am eagerly awaiting the preview for Power BI; I need to have something to test out in this online offering once it’s available. So in my first installment on Power Query, Power Pivot, Power View and Power Map I wanted to create an example that would truly be BI for everyone. Job search and employment opportunities came to mind. I started poking around for accessible API’s only to discover Indeed
offered one that could be hooked into with Power Query; now we are off to the races. In this post I just want to cover the basics of Power Query; importing data, splitting columns and getting data from an API. It’s a good idea to have an overall plan though; what does this running example need to be eventually be?

Update 09/01/2013:

I realize after completing the fourth installation of this series that I never explained how to enable Power Query…foolish human.  So here it goes…

You need to use Power Query to walk through this series. In Excel 2013 Power Query is built in to the product but is not a visible tab without enabling it. In order to make this visible go to File and then Options and you should see the screen below.

090113_1738_Part4Power5.png

Go to the “Add-Ins” section and select the drop down and choose “COM Add-Ins”; then click “Go”.

090113_1738_Part4Power6.png

Enable “GeoFlow for Microsoft Excel” (now called Power Map), “Power View”,”Microsoft Power Query for Excel” and “Microsoft Office PowerPivot for Excel 2013” then click “Ok”.

 

Well the indeed API only lets me get 25 results maximum and defaults to 10; so it will need to keep adding rows to the data with every refresh; not just keep the current results. It might also be nice if a user can change the city; the job search criteria and the salary depending on the job they are looking for. So a parameterized accumulating list of records from Indeed
is the eventual goal for this Excel workbook. There are some interesting Power Query blogs and examples out there two of which sprang to mind. Chris Webb’s BI Blog has exactly what we need for accumulating records with Power Query; sweet! Ok now on to parameterizing a query to change values like the city; there are numerous examples out there in the blogosphere including one I liked from Faisal Mohamood
over at Microsoft who taps into the Yelp API. So with these two blogs as a starting point and a clear idea of where we need to go it’s time to dig in and explore the API we are trying to use.

Exploring the Indeed API

The first step is to head over to the Indeed Developer
section and get an account. Click on create an account and you’ll be presented with this generic little form; when you’re done click on the “I have read and agreed to the terms of Service. Create my account”

Once you’ve clicked the magic button you are in the developer section; which has a wealth of information for us! Click on the “XML Feed” tab at the top!

Scroll down to the Sample Request section and they provide you with a search query and the options for searching via their API.

Copy the example search out of the site and crack open excel; click on the power query tab and paste in your example query from indeed. Click Ok when you’re done!

So you’re now in the power query filter and shape window; but it’s not displaying much…what did you do wrong? Absolutely nothing; you need to expand the results…twice. The results are embedded two levels deep in the data you just got back from Indeed. So have at it!

I can now see 10 results! Exciting stuff…

As I’m scrolling across what was returned some data points are interesting and others; not so much. Do I really need to know that the query was Java? Probably not; but the company information would be nice as would the date. If you right click on any column; you can remove it.

Items I’m going to remove in this way are:

  • Query
  • Location
  • DupeFilter
  • Highlights
  • Total Results
  • Start
  • End
  • Radius
  • PageNumber
  • City
  • Country
  • Expired
  • OnMouseDown
  • Sponsered
  • State
  • AttributeVersion

So now we have a cleaner data set of only items which I’m interested in. Some of the items I removed are City and State; these are relevant but I wanted the opportunity to introduce you to a great little feature called split column. A lot of the data you will be dragging in from the web is text based and needs to be formatted to be useful in a table and Power Pivot. Let’ split the formattedLocation in to City and State using the split columns feature. Right click on formattedLocation and go to split columns; then delimiter.

Comma is already entered in and that’s what we want because the city and state are separated by a comma; we know we also want only two columns (one for city and one for state). Click ok and you should have a formattedLocation1 and formattedLocation2.

We could have used the formattedLocation2 which occasionally uses zip code to do the same thing; with a higher degree of accuracy if the Zip Code was provided. In this case we would have had to first split by the comma; to create two columns city and state-zip. Then we would use the split function again on the state-zip column and split the column by “Number of Characters”. In this case the number of characters to split by is three and we would have to limit the number of columns to two in the advanced options section of split.

I’ve chosen not to do this for this particular model but it’s up to you if you find it gives your data more meaning I say go for it. I’m going to remove the formattedLocationFull from my query. At this point I don’t see a whole lot else to split up; but the column names could use some work; if you double click on a column name you can change it. It highlights the cell in green and when you’re done just press enter to change the column name.

**This occasionally fails in the preview version I’m working with; while annoying it’s not the end of the world. Just click back on the column name and try again.

Column Name Changes:

  • Company
  • Date
  • City
  • State
  • RelativeTime
  • JobKey
  • JobTitle
  • Latitude
  • Longitude
  • Snippet
  • Source
  • URL

Now that we have some clean column names and clean data; press the giant “Done” checkmark in the bottom right of the screen.


So far you have tapped into an API, parsed some strings to get at City and State; removed columns and adjusted the column names. I know what you’re thinking though; “I’m not from Austin, TX and I’m not a Software engineer. Even if I was; ten results wouldn’t really help me all that much”. In Part two of the series we will correct these deficiencies and start getting into the more exciting features of Power Query and the M language that is behind what we just did.

The file we have been working with can be downloaded here.

Continue to Part 2….

One thought on “Part 1 – Power Query Basics – Indeed!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s