My Python Mechanical Turk API — mturkcore.py

mturkcore.py is now on Github: https://github.com/ctrlcctrlv/mturk-python

The current state of Mechanical Turk APIs for programming languages is somewhere between “horrible” and “confusing”. It is completely astounding to me how something as simple as a REST/SOAP web service is made so confusing in every single SDK I tried.

The Perl SDK is totally broken and has been for quite some time. I first ran into this early in my time as a Mechanical Turk requester, when I had to go through a totally convoluted process to install it just to use a script a friend of mine wrote (think: every test failing, compiling an old version of Perl, then forcing the install for it to “work”). This version seems to work, but is not easily discoverable.

The PHP SDK is totally confusing. It contains gems like building the SOAP responses itself with string formatting, and this:

/* URL validation - SORT LATER */
function mtValidHTTP($url)
{
   return TRUE;
}

On top of that, it is convoluted internally, comes with no example programs, and renames a ton of functions.

Even when the SDKs do work, you need to refer to two sets of documentation: the SDK’s documentation and the official mTurk API documentation. Why? Because many of them come up with their own cutely named methods (get_account_balance instead of GetAccountBalance).

Python is my favorite language, so after much Google searching I tried the boto mTurk connection. It’s a mess: it’s missing a ton of calls (it doesn’t seem to have been updated in a long time), and it’s overkill for most scripts (all of them, really).

Download latest mturk.py from Github: https://github.com/ctrlcctrlv/mturk-python

Download mturkcore.zip (2.4KB) (v 1.0)

Download old SOAP version (requires python-suds, not recommended for future projects) (2.8KB) (v 0.2)

Update 01/26: Version 0.2 released. This fixes a critical bug where the mTurk WSDL was NOT cached! Updating is strongly recommended.

Update 03/19: Version 1.0 released. This is the best version yet! mturkcore no longer uses Suds and now uses the mTurk REST API, making it the only major SDK I know of to do so! However, this release breaks backwards compatibility.

Update 02/27/2014: I’m using Github for this instead now. Please refer to the documentation there.

Because of all this, I wrote my own SDK, mturkcore.py. At just 103 lines of code, including documentation, I hope it is a real departure from the past. mturkcore.py is a complete Python Mechanical Turk SDK. It requires only Python 2.7, xmltodict and requests, and it uses standard Python types such as str, dict and list for parameters. Some example programs and usage info follow.
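For readers wondering what talking to the mTurk REST API involves, here is a minimal, stdlib-only sketch of building a signed request's parameters. It is not mturkcore's code. It assumes the legacy REST signing scheme, in which the Signature is a base64-encoded HMAC-SHA1 of Service + Operation + Timestamp; the default Version string is an assumption borrowed from the WSDL date mentioned in the comments. Amazon has since retired this legacy API, so treat it as an illustration only.

```python
import base64
import hashlib
import hmac
import time

SERVICE = "AWSMechanicalTurkRequester"


def sign(secret_key, operation, timestamp, service=SERVICE):
    # Legacy mTurk REST signature: base64(HMAC-SHA1(secret, Service + Operation + Timestamp))
    message = (service + operation + timestamp).encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def build_request_params(access_key, secret_key, operation, extra=None,
                         version="2012-03-25"):  # assumed API version string
    # UTC timestamp in ISO 8601 form; the same value is signed and sent
    timestamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    params = {
        "Service": SERVICE,
        "AWSAccessKeyId": access_key,
        "Operation": operation,
        "Timestamp": timestamp,
        "Version": version,
        "Signature": sign(secret_key, operation, timestamp),
    }
    # Operation-specific parameters, already flattened (e.g. "Reward.1.Amount")
    params.update(extra or {})
    return params
```

The resulting dict is what you would hand to something like `requests.get(service_url, params=params)`; the response body is XML, which is where xmltodict comes in.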

Your configuration, either passed as a dict to MechanicalTurk or saved in mturkconfig.json:

{
    "use_sandbox" : false,
    "stdout_log" : false,
    "aws_key" : "ACCESSID",
    "aws_secret_key" : "PASSWORD"
}
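If you keep your settings in mturkconfig.json, the file must be strict JSON: a trailing comma after the last entry makes it unparseable. The loader below is a hypothetical helper (load_config and its required-key check are assumptions, not part of mturkcore) showing one way to read the file and fail early with a clear message.

```python
import json

# Assumed required keys, taken from the sample configuration above
REQUIRED_KEYS = ("use_sandbox", "stdout_log", "aws_key", "aws_secret_key")


def load_config(path="mturkconfig.json"):
    # json.load raises ValueError (JSONDecodeError on Python 3) on malformed JSON
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("mturkconfig.json is missing: " + ", ".join(missing))
    return config
```

The returned dict has the same shape as the one you would otherwise pass to MechanicalTurk directly.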

Getting your balance

import mturkcore
m = mturkcore.MechanicalTurk()
m.create_request("GetAccountBalance")
if m.is_valid():
    print(m.get_response_element("AvailableBalance"))
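To illustrate what a validity check like is_valid() and a lookup like get_response_element() have to do, here is a hypothetical stdlib sketch that searches an XML response with ElementTree. mturkcore itself parses responses with xmltodict; the sample below is a simplified GetAccountBalance response with made-up values, and real responses may carry an XML namespace that these bare-tag lookups would not match.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative response (values are made up)
SAMPLE = """<GetAccountBalanceResponse>
  <GetAccountBalanceResult>
    <Request><IsValid>True</IsValid></Request>
    <AvailableBalance>
      <Amount>10000.00</Amount>
      <CurrencyCode>USD</CurrencyCode>
      <FormattedPrice>$10,000.00</FormattedPrice>
    </AvailableBalance>
  </GetAccountBalanceResult>
</GetAccountBalanceResponse>"""


def is_valid(root):
    # mTurk reports success in a Request/IsValid element
    flag = root.find(".//Request/IsValid")
    return flag is not None and flag.text == "True"


def find_element(root, name):
    # First element with this tag anywhere in the tree, or None if absent
    return next(root.iter(name), None)
```

For example, `find_element(ET.fromstring(SAMPLE), "Amount").text` gives the balance as a string.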

Assigning a qualification

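import mturkcore
m = mturkcore.MechanicalTurk()
workers = ["A1ZZZ", "A1QQQ"]  # Replace these, of course!
for worker in workers:
    m.create_request("AssignQualification", {"QualificationTypeId": "2MYQUALIFICATION",
                                             "WorkerId": worker, "IntegerValue": 100})
    if not m.is_valid():  # check each request, as in the balance example
        print("Could not assign the qualification to " + worker)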

Known bugs

None anymore :)

13 thoughts on “My Python Mechanical Turk API — mturkcore.py”

  1. Jake Harris

    I will give this a poke when I get home.

    I have been using the CLI as I have little to no experience with Java or C#, so I’m pretty glad I stumbled here from stackoverflow.

    Thanks for sharing!

  2. Chris A

    Hey there,

    Thanks for making this! I’ve been using it with a good amount of success especially when compared to boto.

    I have had one sticking point: sending a create_hit request to sandbox fails when a HITLayoutID is included, but it’s fine when sent to the production site. I get a TypeNotFound error from suds. Any idea why this is? My code is below:

    login_dict = {'use_sandbox': True,
                  'stdout_log': False,
                  'AWS_ACCESS_KEY_ID': 'my_key',
                  'AWS_SECRET_ACCESS_KEY': 'my_secret_key'}

    question = {"Title": "Test Layout",
                "Description": "Test Description",
                "HITLayoutId": 'the_layout_id',
                "HITLayoutParameter": [{"Name": "MATCHED_LE_NAME", "Value": LE_NAME},
                                       {"Name": "SOURCE_URL", "Value": SOURCE_URL}],
                "Reward": {"Amount": 0.08, "CurrencyCode": "USD"},
                "LifetimeInSeconds": 3600,
                "AssignmentDurationInSeconds": 172800
                }
    mtc.create_request("CreateHIT", question)

    Also, I made sure to use the HIT layout ID from the sandbox GUI, so that shouldn’t be an issue.

    Any ideas?

    1. Fredrick Post author

      @Chris A: That’s very interesting! I have not experimented with this myself, but from what I can see a HIT Layout is what you make on the requester site.

      http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_HITLayoutArticle.html

      I had no idea that you could use these with the API. That’s quite a cool feature.

      I tracked down the issue (sorry for the delay; I also offer paid services if you’re interested ;) ). For the sandbox I was using an outdated version of the Mechanical Turk WSDL, while for production I had the correct one (2011-12-01 vs. 2012-03-25).

      To fix this, change the URL in line 58 to http://mechanicalturk.sandbox.amazonaws.com/AWSMechanicalTurk/2012-03-25/AWSMechanicalTurkRequester.wsdl

      I’ll update this in mturkcore soon.

  3. Andy

    Thanks for putting this Python SDK together. Really helpful. I am having some difficulty extracting info from API responses, though. I can run the examples from the blog post without a problem. Here’s where I run into difficulty:

    >>> a = m.create_request("SearchHITs", {"SortProperty": None, "SortDirection": None})
    >>> a
    ...returns list of HITs...
    >>> m.get_response_element("HITId")
    ...returns nothing...
    >>> m.get_all_elements("HITId")
    {}
    >>> type(a)
    <type 'instance'>

    Instance of what? Some suds object? I don’t recognize the output as a Python dict or list. Is it some kind of extracted, formatted XML? I’m confused by the format; I’m not sure what it “is”, so I don’t know how to interact with it.

    Thanks again!

    1. Fredrick Post author

      Hi,

      You’re not doing it right. Take a look at the documentation:

      http://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/ApiReference_SearchHITsOperation.html

      The HIT IDs are stored in HIT elements. You need to get the HIT element first, then go down to its HITId element.

      Actually, I recommend using the new mturkcore, which returns OrderedDicts as responses. I just released it; I’ve been testing it for a few months on my own projects and it seems to be very stable.
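      As a rough sketch of the "get the HIT element first, then HITId" advice against dict-style responses: xmltodict returns a single dict when there is one child element and a list when there are several, so a small normalizing helper avoids surprises. The response path SearchHITsResponse → SearchHITsResult → HIT is assumed from the SearchHITs documentation linked above, and both helper names are hypothetical.

```python
def as_list(value):
    # xmltodict gives a dict for one repeated element and a list for several
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def hit_ids(response):
    # Assumed layout: SearchHITsResponse -> SearchHITsResult -> HIT (zero or more)
    result = response["SearchHITsResponse"]["SearchHITsResult"]
    return [hit["HITId"] for hit in as_list(result.get("HIT"))]
```

      With this, `hit_ids(m.create_request("SearchHITs", ...))` would give a plain list of IDs whether the search matched zero, one or many HITs.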

  4. Anthony

    This is great — thanks a lot for making it available. One question — what is the purpose of the “inner” argument to the _flatten method? I see it is set to True when _flatten is called recursively, but that seems to lead to incorrect output. For example, to specify a reward, Amazon wants a URL like this:

    &Reward.1.Amount=0.50&Reward.1.CurrencyCode=USD

    However, if you do:

    >>> m = mturkcore.MechanicalTurk()
    >>> params = {'Reward': [{'Amount': '0.50', 'CurrencyCode': 'USD'}]}
    >>> print m._flatten(params)
    {'Reward.1.CurrencyCode': 'USD', 'Reward.1': '', 'Reward.1.Amount': '0.50'}

    Notice the extraneous 'Reward.1': '' entry in the flattened dictionary. If I alter the _flatten method and set inner=False in the recursive call, I then get the following:

    >>> print m._flatten(params)
    {'Reward.1.CurrencyCode': 'USD', 'Reward.1.Amount': '0.50'}

    which is exactly correct.

    So, am I doing something wrong, or is that a bug in the _flatten method?

    Thanks.

    1. Fredrick Post author

      You just ran into one of my little hacks :)

      Take a look at this thread: https://forums.aws.amazon.com/message.jspa?messageID=169417

      Even though the docs don’t say it, mTurk sometimes requires something like this:

      'QualificationRequirement.1.LocaleValue':"", # Why?
      'QualificationRequirement.1.LocaleValue.Country':"USA",
      

      Not always, though. However, having an extra argument where one isn’t needed doesn’t make a difference to mTurk (it is silently ignored).

      So, while you could leave inner as False for that specific case, if you tried using an endpoint that had this odd problem on Amazon’s side you would find it doesn’t work.
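      The trade-off described in this thread can be sketched as a small standalone function. This is a hypothetical reimplementation, not mturkcore's actual _flatten: lists become 1-based indexed keys, dicts become dotted keys, and emit_parent plays the role of the inner hack by adding an empty-string entry for each nested dict's prefix.

```python
def flatten(params, prefix="", emit_parent=False):
    """Flatten nested dicts/lists into mTurk-style keys like 'Reward.1.Amount'."""
    flat = {}
    if isinstance(params, dict):
        # The "hack": also send an empty value for each nested structure's prefix
        if prefix and emit_parent:
            flat[prefix] = ""
        for key, value in params.items():
            name = prefix + "." + key if prefix else key
            flat.update(flatten(value, name, emit_parent))
    elif isinstance(params, (list, tuple)):
        # List elements are numbered from 1, as in Reward.1.Amount
        for index, value in enumerate(params, 1):
            flat.update(flatten(value, "%s.%d" % (prefix, index), emit_parent))
    else:
        flat[prefix] = str(params)
    return flat
```

      With emit_parent=False, Anthony's Reward example yields only the two keys Amazon documents; with emit_parent=True, a nested QualificationRequirement also produces the 'QualificationRequirement.1.LocaleValue': '' entry quoted above, which mTurk ignores when it isn't needed.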

  5. J Ruiz

    Thanks for posting this! It works great on Python 2.7, but in Python 3 hmac was changed to accept only bytes, and for the life of me I can’t get the signature to generate correctly. Any ideas how to get _generate_signature to work with Python 3?

    1. Daniel

      The following _generate_signature implementation seems to work for Python 3:

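      import binascii
      import hashlib
      import hmac

      def _generate_signature(self, service, operation, timestamp, secret_access_key):
          # HMAC-SHA1 over service + operation + timestamp; Python 3's hmac needs bytes
          my_sha_hmac = hmac.new(secret_access_key.encode('ascii'),
                                 (service + operation + timestamp).encode('ascii'),
                                 hashlib.sha1)
          # b2a_base64 appends a trailing newline, so strip it off
          my_b64_hmac_digest = binascii.b2a_base64(my_sha_hmac.digest())[:-1]
          return my_b64_hmac_digest.decode('ascii')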

      I did not yet test if everything else works without changes. I’ll report back ;)

  6. Brian Schiller

    Hi, I’m here from your answer to my stackoverflow question.

    Thanks again for your help. I’m surprised at how much easier this is than using boto.

    I noticed you don’t have an example on your site of creating a HIT. I hope it’s alright if I add this one.

    I recommend making a parameterized template of your assignment at https://requestersandbox.mturk.com/create/projects. Then you can take its HITLayoutId and use it in your code:

    QUESTION_PROTOTYPE = {
        'Title': 'your title',
        'Description': 'your description',
        'Keywords': 'your keywords',
        'HITLayoutId': 'get this from requester.mturk.com/create/projects',
        'HITLayoutParameter': [{'Name': 'image_url', 'Value': 'http://example.com/img1.jpg'}],
        'AssignmentDurationInSeconds': 60*60,  # one hour
        'LifetimeInSeconds': HIT_LIFETIME_IN_SECONDS,
        'AutoApprovalDelayInSeconds': AUTO_APPROVAL_DELAY_SECONDS,
        'Reward': {'Amount': '0.50', 'CurrencyCode': 'USD'}
    }
    resp = mtc.create_request('CreateHIT', QUESTION_PROTOTYPE)
    print(resp['CreateHITResponse']['HIT']['HITId'])
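    Since the prototype is meant to be reused, one way to create one HIT per input is to copy it and fill in HITLayoutParameter each time. The two helpers below are hypothetical (they are not part of mturkcore); deep-copying keeps the shared prototype from being mutated between HITs.

```python
import copy


def layout_parameters(values):
    # {"image_url": "..."} -> [{"Name": "image_url", "Value": "..."}], sorted by name
    return [{"Name": name, "Value": value} for name, value in sorted(values.items())]


def make_hit_request(prototype, **layout_values):
    # Deep-copy so the shared prototype is never modified
    request = copy.deepcopy(prototype)
    request["HITLayoutParameter"] = layout_parameters(layout_values)
    return request
```

    Usage would look like `mtc.create_request('CreateHIT', make_hit_request(QUESTION_PROTOTYPE, image_url=url))` inside a loop over your image URLs.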

  7. Matt Hayward

    Thanks for this library – gets the job done quickly and easily!

    I had one problem that drove me up a wall, and I have a suggested change to address it. I was composing my own HTMLQuestions, and they were quite large. Past a certain size, my CreateHIT requests would fail mysteriously, with an unhelpful error message on this line:
    self.response = xmltodict.parse(self.xml_response.encode('utf-8'))

    It turns out that the URL for the GET was too long, and the response was coming back empty with no request.text, which in turn caused the line above to fail when it tried to parse an empty string.

    The fix was simple for me: just change the request submission to a POST with data instead of params:

    request = requests.post(self.service_url, data=self.flattened_parameters)
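    For anyone without requests installed, here is a stdlib analogue of the same idea (Python 3 only). It is a sketch, not mturkcore's code: build_post is a hypothetical helper, and the URL you pass in is whatever service endpoint you use. The same key/value pairs are form-encoded into the request body instead of the query string, so the URL stays short however large the HTMLQuestion is.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(service_url, flattened_parameters):
    # Same pairs a GET would put in the query string, sent form-encoded in the body
    body = urlencode(flattened_parameters).encode("utf-8")
    # Supplying data makes urllib treat this as a POST request
    return Request(service_url, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
```

    Sending it is then `urllib.request.urlopen(build_post(url, params)).read()`, which returns the XML response body.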

