@mdg
Last active February 15, 2018 17:31

American word list:
color
flavor
humor
labor
center
fiber
liter
theater
analyze
defense
license
offense
pretense
analog
catalog
dialog

British word list:
colour
flavour
humour
labour
centre
fibre
litre
organise
recognise
analyse
defence
licence
offence
pretence

Product ID list:
2723439
135502
723439
2939135
536841
1224072

Types of English

TpT has many users from the US, Canada, Australia and New Zealand. Users from these countries may be more likely to buy resources localized (localised?) to their own version of English. To offer them this, we need to be able to identify the version of English used in a resource from its title and description.

We have identified lists of words that mark text as British English or American English, and we have a list of product IDs that we want to identify as British or American.
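One simple way to check a title or description against a word list, sketched in Python under the assumption that whole-word, case-insensitive matching is enough: lowercase the text, split it into alphabetic tokens, and intersect those tokens with the list. The helper name `matched_words` is hypothetical.

```python
import re

# Runs of letters only; apostrophes, digits and punctuation act as separators.
WORD_RE = re.compile(r"[a-z]+")


def matched_words(text: str, word_list: set[str]) -> set[str]:
    """Return the words from word_list that occur as whole words in text."""
    tokens = set(WORD_RE.findall(text.lower()))
    return tokens & word_list


american = {"color", "center", "analyze"}
british = {"colour", "centre", "analyse"}
sample = "Colour by number: analyse the shapes in the centre."
print(sorted(matched_words(sample, british)))   # ['analyse', 'centre', 'colour']
print(sorted(matched_words(sample, american)))  # []
```

Whole-word matching keeps "liter" from firing on "literature" or "analog" on "analogy", at the cost of missing inflected forms such as "colours"; the task notes say simple text processing is fine, so that trade-off seems acceptable here.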

Task

In the language of your choice, write a command line program that:

  1. Downloads the latest versions of the British and American word lists from a gist
  2. Reads the list of product IDs to test from a file on disk
  3. Fetches title and description of each product
  4. Checks each title and description for British or American words
  5. Flags each product as:
    • American English
    • British English
    • Unknown
    • Mixed British and American English
  6. Outputs the list of product IDs mapped to: british, american, unknown or mixed
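The four flags in step 5 follow from two yes/no checks: did any British word match, and did any American word match? A minimal sketch, assuming tab-separated `product_id<TAB>label` lines as the output format (the task does not specify one); `label_for` and `format_results` are hypothetical names.

```python
def label_for(has_british: bool, has_american: bool) -> str:
    """Map the two word-list checks onto the four output flags."""
    if has_british and has_american:
        return "mixed"
    if has_british:
        return "british"
    if has_american:
        return "american"
    return "unknown"


def format_results(labels: dict[str, str]) -> str:
    """Render one 'product_id<TAB>label' line per product, in insertion order."""
    return "\n".join(f"{pid}\t{label}" for pid, label in labels.items())
```

For example, `label_for(True, False)` returns `"british"`, and a product with no matches in either list comes out as `"unknown"`.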

Resources

  • a British word list
  • an American word list
  • a product id list
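Steps 1 and 2 both come down to reading one-entry-per-line text, which is how the word lists and product IDs appear in the gist. A minimal sketch using only the standard library; the raw gist URLs are not given here, so `download_text` takes whatever URL you supply, and all function names are hypothetical.

```python
import urllib.request


def parse_lines(text: str) -> list[str]:
    """Split one-item-per-line text into stripped, non-empty entries."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_word_list(text: str) -> set[str]:
    """Lowercase every entry so later matching can be case-insensitive."""
    return {word.lower() for word in parse_lines(text)}


def read_product_ids(path: str) -> list[str]:
    """Read product IDs from a file on disk, one ID per line."""
    with open(path, encoding="utf-8") as f:
        return parse_lines(f.read())


def download_text(url: str) -> str:
    """Fetch the latest version of a raw gist file as UTF-8 text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

Product IDs are kept as strings rather than integers; GraphQL's `ID` input type accepts strings as well as integers, so they can be sent as read.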

To fetch data from the GraphQL API:

  • submit a POST request to the following URL: https://www.teacherspayteachers.com/graph/graphql
  • submit the parameters as a JSON object with two fields: query and variables
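A minimal sketch of that POST request using Python's `urllib.request`. Sending the body with a `Content-Type: application/json` header is an assumption (a common GraphQL-over-HTTP convention, not something stated here); `build_request` and `post_graphql` are hypothetical names.

```python
import json
import urllib.request

GRAPHQL_URL = "https://www.teacherspayteachers.com/graph/graphql"


def build_request(query: str, variables: dict, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body has two fields: query and variables."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def post_graphql(query: str, variables: dict, url: str = GRAPHQL_URL) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(query, variables, url)) as response:
        return json.load(response)
```

Keeping request construction separate from sending lets tests inspect the request body without touching the network.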

Use the following as the GraphQL query text to fetch product title and description:

query productText($productIds: [ID]!) {
    products(ids: $productIds) {
        id
        name
        description
    }
}

This query expects a single variable as input (in the variables field), which should be structured as:

    {"productIds": [1, 2, 3, 4, ...]}
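Putting the query and its single variable together, and turning the response into text to classify. This sketch assumes the standard GraphQL response shape `{"data": {"products": [...]}}` and joins each product's name and description with a newline so both are checked; `make_variables` and `extract_texts` are hypothetical helpers.

```python
PRODUCT_TEXT_QUERY = """
query productText($productIds: [ID]!) {
    products(ids: $productIds) {
        id
        name
        description
    }
}
"""


def make_variables(product_ids: list[str]) -> dict:
    """Wrap the IDs in the single variable the query expects."""
    return {"productIds": list(product_ids)}


def extract_texts(response: dict) -> dict[str, str]:
    """Map each product ID to its name and description joined by a newline.

    Assumes the response shape {"data": {"products": [...]}}; a missing
    name or description is treated as empty text.
    """
    products = (response.get("data") or {}).get("products") or []
    texts = {}
    for product in products:
        if product is None:  # defensive; whether the API returns nulls is an assumption
            continue
        name = product.get("name") or ""
        description = product.get("description") or ""
        texts[str(product["id"])] = f"{name}\n{description}"
    return texts
```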

What matters to us?

  • Does it work?
  • Is the code readable?
  • Is the program well tested?

What doesn't matter so much?

  • This is meant to be about the mechanics of the system. You don't need to go deep into the NLP parts of the problem. Simple text processing is fine.