Part 3 – Power Query API – Table as Parameter for a Query Indeed!

This the third part in the Indeed! Series and we will cover how to parameterize a query with Power Query from a table in Excel. In Part 2 we setup our Indeed API calls to gather some basic parameterized data with Power Query. It’s nice that we can enter anything; but do we really want to manually enter information each time to change the query? Probably not. The file from Part 2 can be downloaded here. Hopefully you have had a chance to go over to Faisal Mohamood’s blog and read up on parameterizing a Yelp query or for a crazy parameterization in Power Query you can check out Devin Knight’s blog post.

Important:

Eliminate all steps after you invoked the function; the only step in this Power Query should be “GetJobs”.

Your “Query Results” tab should look like this:

Parameters from a Sheet in Excel

We don’t want to have to open up Power Query and keep entering these manually do we? I doubt our end users will either. So the next thing we are going to do is to make a shiny new table with the zip codes and other data we want to store as parameters. In the existing workbook I added a new sheet and named it “Parameters” and renamed “Sheet 2” to “Query”. The renaming is not needed; but it does help keep things organized.

The next step is to right click on the entire worksheet and format it as “Text” instead of “General”.

I then added two data rows with a column header as “ZipCodes”, “Salary”,”SearchCriteria” and “PublisherKey”. I then went to the insert tab and inserted “Table” using the three rows I just created including the header.

** I left no spaces in my column names; if I had I would have to add double quotes and a # symbol when referencing it in Power Query.

End Result:

Next we go over to the power query tab; highlight the entire table and click “From Table”.

We just created a new Power Query using the zip code table in the “Parameters” worksheet. Now we need to add a custom column to this query which references the “Parameters” table and calls our previous query “GetJobs”. This new Power Query acts as the glue; bringing together the “Parameters” sheet and our existing query. Right click on a column and go to “Insert Column” then click on “Custom”.

Click the continue button and set the workbook to public; then click save. These security features are there so you don’t unintentionally send company information to places like Indeed.

*** In my case the security pop up did not appear when using the columns; so I had to add regular text to the query and then set the security. In case you run into it; add some regular text data using “” to mark the text (be sure to include your PublisherKey, which I omitted here). You can also take the entire “Query” code block and just move it into another sheet which works just as well; but you have to setup your parameters again.

From here you can either delete the existing query with hard coded values or recode it to reflect the table you just made. I’m going to delete it in case you didn’t run into my specific security issues.

**

Right click and create a custom column. First add “GetJobs(” to the query, then select each column in order and click insert. You should end up with something like this by the end.

Now if you click “Ok”; you will get back to the screen below; in the “Custom” column header click on the table expander and click “Ok”. This will expand the query results from our call to “GetJobs”.

End Results…

If you added two rows of data to your job search; you will receive 20 results; 3 rows and you’ll get back 30. This is calling the function, which calls the API, for every row in your new table! At this point I looked back at the developers section of Indeed and realized; we are allowed up to 50 results (after plugging in 50 below; you only get back 25 but hey it’s better than 10!). So click “Done” in Power Query, rename your new sheet to “Parameterized Query Results” and then go back to your “Query” sheet.

Open up Power Query and go to the advanced editor. Change the sort by to “date”, radius to “25” and limit to “50” (….&sort=date&radius=25&st=&jt=&start=&limit=50…). Save your query and “Refresh All” from the Data tab in Excel. Now you should have 50 results for two rows of data and no update was needed to the “Parameterized Query Results” we just created. This is a great way to encapsulate logic in Power Query and change data on the fly without affecting anything we are doing downstream.

The end user of this little Excel may be looking for a particular job in a particular region; but salary is usually negotiable. Let’s update our table to look for only DBA jobs ranging from 70,000 to 150,000 in a particular zip code (if they are paying 150,000 for a DBA gig I want to know about it).

If we click refresh all at this point I would expect 125 results in my “Parameterized Query Results” tab; sadly this was not the case in my first run through.

I have only the rows from my original sheet….what the what? Something is obviously wrong so let’s go into Power Query again.

At this point I tried to recreate the “GetJobs” query in our Parameterized Query worksheet. This didn’t work out so well and resulted in the same error. I didn’t take my own advice:

Important:

Eliminate all steps after you invoked the function; the only step in this Power Query should be “GetJobs”.

I went through and re-created everything to end up at the same place with the error; until I realized that while adjusting the sorting and limit sizes in the API call I had tested it and not deleted that step in query….foolish human. Ok with that cleaned up; I am now able to add rows to my “Parameters” worksheet and get results.

Query Parameters

Results in 175 rows!

Ok I’m noticing a trend; I have a lot of duplicated data. Indeed lets you set the lower bound of the salary you are looking for; so when we search on 100,000 and 130,000 the same job can appear in both queries. Let’s clean that up by “Removing Duplicates” on the Job Key in the “Parameterized Query”.

The order of our “Parameters” table matters; in this context the API calls go from the top down so our first row of data is issued first and on down the list. We are setting the minimum salary to query during each call; so I’m calling jobs for 150,000 dollars first; then 130,000 next etc. Since the calls are being done in that order; deduplication is also being done in that order. It’s an important item to note; if we reversed the query and went from 65,000 at the top to 150,000 the jobs would be marked lowest to highest. Meaning a job that is a 100,000 salary gets queried and marked as 65,000 because 100,000 is greater than 65,000. This will matter later on the series but it is also an important concept to grasp when parameterizing queries.

After deduplication of the data we end up with only 3 job’s whose value is at 65,000 dollars; because all the other job searches listed those jobs at a salary above that range already. At the end of the day I ended up with 89 results in my query de-duplicated vs. 175 actually returned.

To recap; we have gone from just calling the Indeed API in Excel to being able to issue multiple parameterized queries, de-duplicate them and return the data. That’s awesome …

The state I really want to get to is any time we query the Indeed API; we accumulate the results instead of overwriting them. So stay tuned for Part 4 of the Indeed! Series where we will do just that!

The file for Part 3 in the series can be downloaded here.

Continue on to Part 4….

One thought on “Part 3 – Power Query API – Table as Parameter for a Query Indeed!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s