Search

Overview

This page documents how to interact with our Search endpoint. We recommend reading the full documentation for this endpoint so that you can craft queries that return high-quality results.

Making Requests

HTTP Request

  POST http://api.algo.com/v1/search

JSON Body Parameters

Note:
- A type enclosed in square brackets [] means that the API expects an array of that type, e.g. [String] refers to an array of Strings.
- Two types separated by a pipe | mean that either type is allowed.

Parameter (Type)
    Description

search_terms ([String], required)
    The list of terms you would like to search for.
    Max number of terms: 10
    Max number of words in a single term: 5

filter_terms ([String])
    Articles that contain any of these terms will be excluded from the result set.
    Max number of terms: 10
    Max number of words in a single term: 5

size (Integer)
    The number of articles to return.
    Min: 1, Max: 20, Default: 20

page (Integer)
    The page of results to return.
    Min: 1, Default: 1

min_matches (Integer | String)
    The number of search_terms that must be found in an article for it to be considered valid.
    Expressed as either an integer (e.g. 1) or a percentage String (e.g. "100%").
    Min: 1, Default: 1

min_score (Float)
    The minimum relevancy score required for an article to be considered valid.
    Min: 0.0, Max: 1.0, Default: 0.0

fields ([String])
    The article fields you would like returned.
    Default: All
    Allowed Fields: ["title", "pub_date", "source", "content", "keywords", "link", "image_url", "public_id"]
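The limits in the table above can be checked before a request is sent. The following is a minimal client-side sketch, assuming Python 3; the helper names build_search_body and _check_terms are hypothetical, and counting words by splitting on whitespace is an assumption about how the API counts words in a term. The server remains the authority on what it accepts.

```python
from typing import Any, Dict, List, Optional, Union

# Limits copied from the parameter table above.
MAX_TERMS = 10
MAX_WORDS_PER_TERM = 5
ALLOWED_FIELDS = {
    "title", "pub_date", "source", "content",
    "keywords", "link", "image_url", "public_id",
}


def _check_terms(name: str, terms: List[str]) -> None:
    """Reject term lists that exceed the documented limits."""
    if len(terms) > MAX_TERMS:
        raise ValueError(f"{name}: at most {MAX_TERMS} terms allowed")
    for term in terms:
        # Assumption: words are counted by splitting on whitespace.
        if len(term.split()) > MAX_WORDS_PER_TERM:
            raise ValueError(f"{name}: {term!r} has more than {MAX_WORDS_PER_TERM} words")


def build_search_body(
    search_terms: List[str],
    filter_terms: Optional[List[str]] = None,
    size: int = 20,
    page: int = 1,
    min_matches: Union[int, str] = 1,
    min_score: float = 0.0,
    fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate arguments against the documented limits; return a JSON-ready body."""
    if not search_terms:
        raise ValueError("search_terms is required")
    _check_terms("search_terms", search_terms)
    body: Dict[str, Any] = {"search_terms": search_terms}
    if filter_terms:
        _check_terms("filter_terms", filter_terms)
        body["filter_terms"] = filter_terms
    if not 1 <= size <= 20:
        raise ValueError("size must be between 1 and 20")
    if page < 1:
        raise ValueError("page must be at least 1")
    if isinstance(min_matches, int):
        if min_matches < 1:
            raise ValueError("min_matches must be at least 1")
    elif not (min_matches.endswith("%") and min_matches[:-1].isdigit()):
        raise ValueError('min_matches must be an integer or a string such as "80%"')
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    body.update(size=size, page=page, min_matches=min_matches, min_score=min_score)
    if fields is not None:
        # "score" is not requestable; it is always returned with each article.
        unknown = set(fields) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        body["fields"] = fields
    return body
```

The returned dictionary can be serialized with json.dumps and sent as the request body.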

Understanding the API

Most of the parameters above are straightforward to use. However, a few of them offer a lot of flexibility and can produce poor results if they aren't used properly.

We highly recommend reading the following sections to understand how best to use these search parameters.

search_terms

These are the terms you would like to search for. Each string is treated as a phrase.

For example, searching for ["steve jobs"] matches articles in which the word steve is immediately followed by the word jobs. Searching for ["steve", "jobs"], on the other hand, matches articles that contain either steve OR jobs.

This behaviour can be controlled with the min_matches parameter.

You can include up to 10 search terms in each request, and each term can consist of up to 5 words.
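The phrase-versus-separate-terms behaviour can be illustrated with a toy local matcher. This is not how the Search endpoint is implemented: the tokenizer (lowercase, keep runs of letters and digits) and the names _words, contains_phrase and matched_terms are hypothetical, chosen only to show the difference between ["steve jobs"] and ["steve", "jobs"].

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, term: str) -> bool:
    """True if the words of `term` appear consecutively, in order, in `text`."""
    words, phrase = _words(text), _words(term)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def matched_terms(text: str, search_terms: List[str]) -> List[str]:
    """Return the search terms (each treated as a phrase) found in `text`."""
    return [t for t in search_terms if contains_phrase(text, t)]
```

With the default min_matches of 1, an article would qualify under this toy model whenever matched_terms returns at least one term. For instance, "Steve applied for three jobs." matches both "steve" and "jobs" as separate terms, but not the phrase "steve jobs".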

filter_terms

Filter terms are a powerful way to narrow your search to exactly what you're looking for. If any of the filter_terms are found in an article, that article is excluded from your result set.

For example, if you were looking for news about cats but didn't want any articles that mention dogs, you could add "filter_terms": ["dogs"] to your request to exclude those articles.

You can include up to 10 filter terms in each request, and each term can consist of up to 5 words.
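A toy version of this exclusion rule, useful for reasoning about sample text locally. The matching (case-insensitive, whole words, any whitespace between the words of a phrase) is a hypothetical approximation, the sketch looks only at each article's content field, and the names mentions and apply_filter_terms are illustrative; the API's own matching may differ.

```python
import re
from typing import Dict, List


def mentions(text: str, term: str) -> bool:
    """Case-insensitive whole-word phrase match (a hypothetical approximation)."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in term.lower().split()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def apply_filter_terms(
    articles: List[Dict[str, str]], filter_terms: List[str]
) -> List[Dict[str, str]]:
    """Drop every article whose content mentions ANY of the filter terms."""
    return [
        a for a in articles
        if not any(mentions(a["content"], term) for term in filter_terms)
    ]
```

In the cats-and-dogs example, an article about "Cats and dogs" would be dropped by ["dogs"], while one that only mentions "hotdogs" would be kept under this whole-word assumption.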

min_matches

The min_matches field gives you greater control over what gets returned in your result set. The default min_matches value is 1, which means that only 1 search term needs to be found in an article for it to be considered relevant.

You can use either a percentage (e.g. "80%") or an integer (e.g. 3) to indicate how many of your search terms must be found in an article for it to be relevant. Keep in mind that the more search terms you require, the fewer articles you are likely to find.
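To reason about what a percentage means for a given query, it helps to convert min_matches into a term count. The sketch below assumes that a percentage is applied to the number of search terms and rounded down, with a floor of 1 (the documented minimum); the exact rounding the API uses is not stated on this page, and required_matches and is_relevant are hypothetical helper names.

```python
import math
from typing import Union


def required_matches(min_matches: Union[int, str], num_terms: int) -> int:
    """Convert a min_matches value into the number of search terms required.

    Assumption: a percentage is applied to num_terms and rounded down, never
    going below 1. An integer larger than num_terms can never be satisfied.
    """
    if isinstance(min_matches, str) and min_matches.endswith("%"):
        count = math.floor(num_terms * float(min_matches[:-1]) / 100)
    else:
        count = int(min_matches)
    return max(1, count)


def is_relevant(found: int, min_matches: Union[int, str], num_terms: int) -> bool:
    """True if an article with `found` matching terms satisfies min_matches."""
    return found >= required_matches(min_matches, num_terms)
```

Under this assumption, "80%" with 5 search terms requires 4 of them, while "50%" with 3 terms requires only 1.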

min_score and understanding relevancy

Along with each article in a result set, Algo provides a relevancy score. This score indicates how relevant an article is to the query you have provided.

The score ranges from 0.0 to 1.0, where 1.0 is the highest and signifies that the article is completely relevant to what you are searching for. As a rule of thumb, articles scoring below 0.3 generally mention your terms only once or twice and aren't very relevant, whereas articles scoring above 0.7 are very relevant.

The min_score field lets you set a minimum score that an article must reach to be considered valid and included in your result set. Keep in mind, however, that the more terms you include, the lower the relevancy scores will tend to be, as there's a lower chance of finding articles that include all of the terms you have provided.
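Before tightening min_score, it can help to look at how the scores of a loose query are distributed. The sketch below labels scores with the rule of thumb above and filters a list of articles locally. The helper names are hypothetical, and treating min_score as inclusive (a score equal to the threshold is kept) is an assumption, not documented behaviour.

```python
from typing import Any, Dict, List


def relevance_band(score: float) -> str:
    """Label a score using the rule of thumb: below 0.3 low, above 0.7 high."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("scores range from 0.0 to 1.0")
    if score < 0.3:
        return "low"
    if score > 0.7:
        return "high"
    return "moderate"


def above_min_score(articles: List[Dict[str, Any]], min_score: float) -> List[Dict[str, Any]]:
    """Mimic min_score locally; assumes the threshold is inclusive."""
    return [a for a in articles if a["score"] >= min_score]
```

For example, the article in the example response below has a score of 0.33, which this sketch labels "moderate" and which survives a min_score of 0.3.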

fields

The fields parameter lets you specify which fields of an article you would like returned in your result set.

This is useful if you only need a subset of the article information, as fields such as content can be large and can considerably increase the size of the JSON response.

The available fields of an article are:

Field (Type)
    Description

title (String)
    The title of the article.

pub_date (String, ISO 8601)
    The datetime the article was published.

source (String)
    Where the article was crawled from.

content (String)
    The main body content of the article. This content includes HTML.

link (String)
    The link to the original article on the writer's website.

image_url (String)
    The URL of the main image of the article.

keywords (String)
    A comma-separated list of entities and keywords extracted from the article.

public_id (String)
    The unique identifier we use for the article.

Note:
- All articles will include their relevancy score under the field score.
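Two of these fields arrive as strings that usually need converting: pub_date is an ISO 8601 datetime and keywords is a comma-separated list. The sketch below, with a hypothetical parse_article helper, converts only the fields that are present, since the fields parameter may leave some out. The trailing "Z" is replaced with "+00:00" because datetime.fromisoformat only accepts "Z" from Python 3.11 onwards.

```python
from datetime import datetime
from typing import Any, Dict


def parse_article(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an article with pub_date and keywords converted.

    Only fields present in `raw` are touched, because the `fields`
    request parameter may omit some of them.
    """
    article = dict(raw)
    if "pub_date" in article:
        # ISO 8601 with a trailing "Z"; swap it for an explicit UTC offset.
        article["pub_date"] = datetime.fromisoformat(
            article["pub_date"].replace("Z", "+00:00")
        )
    if "keywords" in article:
        # Comma-separated string -> list of trimmed, non-empty keywords.
        article["keywords"] = [
            k.strip() for k in article["keywords"].split(",") if k.strip()
        ]
    return article
```

Applied to the article in the example response below, keywords becomes ["oculus", "rift", "vive", "touch", "controllers"] and pub_date becomes a timezone-aware datetime in UTC.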

Examples

Example API request

The following example shows an API request for articles about "virtual reality" but not "augmented reality". The request also requires these articles to be reasonably relevant (a min_score of 0.3) and asks for a single article (a size of 1).

For clarity, we've included some of the parameters with their default values; however, only the search_terms parameter is required in a request.


curl \
    -X POST \
    -H "Content-type: application/json" \
    -d@- \
    "http://api.algo.com/v1/search" <<EOF
    {
        "search_terms": ["virtual reality", "vr"],
        "filter_terms": ["augmented reality"],
        "min_score":    0.3,
        "page":         1,
        "size":         1,
        "min_matches":  1
    }
EOF
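The same request can be made from Python with the standard library's urllib.request. This is a sketch: build_request and send are hypothetical helper names, the request is built but not sent (sending needs network access), and the 10-second timeout in send() is an arbitrary choice.

```python
import json
import urllib.request

SEARCH_URL = "http://api.algo.com/v1/search"


def build_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Same body as the curl example above.
example_body = {
    "search_terms": ["virtual reality", "vr"],
    "filter_terms": ["augmented reality"],
    "min_score": 0.3,
    "page": 1,
    "size": 1,
    "min_matches": 1,
}
request = build_request(example_body)
```

Calling send(request) performs the actual POST and returns the decoded response, shaped like the example response below.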

Example API response

This is the response from the above query.

{
  "meta": {
    "results": 1,
    "page": 1,
    "api_version": 1
  },
  "articles": [
    {
      "title": "HTC Vive vs Oculus Rift: which is better?",
      "source": "Techradar ",
      "score": 0.33,
      "public_id": "514fff",
      "pub_date": "2015-08-07T23:22:37.000Z",
      "link": "http://www.techradar.com/us/news/wearables/htc-vive-vs-oculus-rift-1301375?src=rss&attr=all",
      "keywords": "oculus, rift, vive, touch, controllers",
      "image_url": "http://www.techradar.com/us/news/wearables/htc-vive-vs-oculus-rift-1301375?src=rss&attr=all",
      "content": "<div id=\"content\"><h3> Hardware, design and controllers </h3><p>Virtual reality is...(truncated)"
    }
  ]
}
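A response like this can be consumed with the json module. The sketch below embeds a trimmed copy of the example response (only some of its fields kept) and a hypothetical helper, titles_and_scores, that returns results ordered by score. title is read with .get() because the fields parameter may leave it out, while score is always present.

```python
import json
from typing import List, Tuple

# A trimmed copy of the example response above (some fields omitted).
EXAMPLE_RESPONSE = """
{
  "meta": {"results": 1, "page": 1, "api_version": 1},
  "articles": [
    {"title": "HTC Vive vs Oculus Rift: which is better?",
     "source": "Techradar ", "score": 0.33, "public_id": "514fff"}
  ]
}
"""


def titles_and_scores(payload: str) -> List[Tuple[str, float]]:
    """Return (title, score) pairs from a response body, highest score first."""
    data = json.loads(payload)
    pairs = [(a.get("title", ""), a["score"]) for a in data["articles"]]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


meta = json.loads(EXAMPLE_RESPONSE)["meta"]
```

Here titles_and_scores(EXAMPLE_RESPONSE) yields a single pair for the HTC Vive article with a score of 0.33, and meta["page"] is 1.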