
NeuroStars RESTful Webservice Sketch

Building a RESTful web service will probably be my first task for NeuroStars during GSoC 2014: it is a good opportunity for me to get an overview of the codebase.

Some work has already been done in BioStar v1.0, and it could be the starting point.

API In BioStar 1

A simple API was built for BioStar v1.0. The available methods are:

1. Traffic

Url

GET api/traffic

Response

{
    "date": "Tue May 13 08:55:32 2014", 
    "timestamp": 1399992932.0, 
    "traffic": 512
}

Description

Number of post views over the last 60 minutes, filtered by unique IPs (each IP is counted once).
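A minimal sketch of how this number could be computed, assuming each post view is available as a hypothetical (ip, timestamp) record; the actual BioStar v1 code may store and query views differently. It builds a payload with the same three fields as the response above.

```python
import time
from typing import Iterable, Optional, Tuple

# Hypothetical post-view record: (ip_address, unix_timestamp).
Visit = Tuple[str, float]


def traffic_response(visits: Iterable[Visit], now: Optional[float] = None) -> dict:
    """Build a payload shaped like the api/traffic response.

    Counts post views in the last 60 minutes, counting each IP only once.
    """
    now = time.time() if now is None else now
    window_start = now - 60 * 60  # last 60 minutes, in seconds
    unique_ips = {ip for ip, ts in visits if window_start <= ts <= now}
    return {
        "date": time.ctime(now),  # e.g. "Tue May 13 08:55:32 2014", in local time
        "timestamp": now,
        "traffic": len(unique_ips),
    }
```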

Possible Improvements

  • Clearly state the units of measurement.
    We could, for instance, rename the field "traffic" to "post_views_last_60_min".
  • Add a minutes parameter: GET api/traffic?min=5 would return the number of post views over the last 5 minutes, filtered by unique IPs.
  • Improve the counting algorithm.
    Users on the same network (such as an office or a university) may share the same public IP, so the reported number may be underestimated.
    We could improve this by counting post visits of unique logged-in users plus post visits of anonymous users filtered by unique IPs.
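The minutes parameter and the improved counting could be combined roughly as follows. This is a sketch only: the (user_id, ip, timestamp) record and the name count_post_views are hypothetical, not existing BioStar code.

```python
import time
from typing import Iterable, Optional, Tuple

# Hypothetical post-view record: (user_id, or None for anonymous visitors, ip, unix timestamp).
Visit = Tuple[Optional[int], str, float]


def count_post_views(visits: Iterable[Visit], minutes: int = 60,
                     now: Optional[float] = None) -> int:
    """Unique logged-in users plus unique IPs of anonymous visitors in the window."""
    if minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    now = time.time() if now is None else now
    window_start = now - minutes * 60
    users, anonymous_ips = set(), set()
    for user_id, ip, ts in visits:
        if not window_start <= ts <= now:
            continue  # outside the requested window
        if user_id is not None:
            users.add(user_id)  # logged-in users count once each, whatever their IP
        else:
            anonymous_ips.add(ip)  # anonymous visitors fall back to IP deduplication
    return len(users) + len(anonymous_ips)
```

With this scheme two colleagues logged in behind the same office IP count as two views. The same person browsing both logged in and anonymously could be counted twice, which seems an acceptable trade-off for a rough traffic figure.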

2. User Details

Url

GET api/user/<id>

Response

{
    "date_joined": "2013-12-13", 
    "id": 2, 
    "joined_days_ago": 151, 
    "last_visited": "2013-12-17", 
    "name": "John Doe", 
    "vote_count": 156
}

Description

General info about a user.
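A sketch of how this payload could be assembled, using a hypothetical User dataclass in place of the BioStar user model. The example values are consistent with a request made on 2014-05-13, the date in the traffic example: 2013-12-13 is 151 days earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class User:  # hypothetical stand-in for the BioStar user model
    id: int
    name: str
    date_joined: date
    last_visited: date
    vote_count: int


def user_details(user: User, today: Optional[date] = None) -> dict:
    """Build a payload shaped like the api/user/<id> response."""
    today = today or date.today()
    return {
        "date_joined": user.date_joined.isoformat(),
        "id": user.id,
        "joined_days_ago": (today - user.date_joined).days,
        "last_visited": user.last_visited.isoformat(),
        "name": user.name,
        "vote_count": user.vote_count,
    }
```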

3. Post Details

Url

GET api/post/<id>

Response

{
    "answer_count": 0, 
    "author": "John Doe", 
    "author_id": 1, 
    "creation_date": "2013-12-16", 
    "id": 2, 
    "lastedit_date": "2013-12-16", 
    "parent_id": 1, 
    "rank": 5626.6076326, 
    "root_id": 1, 
    "score": 0, 
    "title": "C: How to cook Pizza?", 
    "type": "Comment", 
    "type_id": 3, 
    "xhtml": "<p>Wood fired oven is best!</p>\n"
}

Description

General info about a post.
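The parent_id and root_id fields suggest that a client can walk from any post up to the question it belongs to. The sketch below assumes that parent_id points to the direct parent and that a top-level post has root_id equal to its own id; thread_path and the fetch_post callback are hypothetical names.

```python
from typing import Callable, Dict, List

# Hypothetical callback returning the decoded api/post/<id> JSON for a post id.
FetchPost = Callable[[int], Dict]


def thread_path(post_id: int, fetch_post: FetchPost) -> List[int]:
    """Ids from the given post up to the root of its thread, following parent_id."""
    path = [post_id]
    post = fetch_post(post_id)
    while post["id"] != post["root_id"]:
        parent_id = post["parent_id"]
        if parent_id in path:  # guard against inconsistent data forming a cycle
            raise ValueError(f"cycle detected at post {parent_id}")
        post = fetch_post(parent_id)
        path.append(post["id"])
    return path
```

Against a live server, fetch_post could be something like `lambda pid: json.load(urllib.request.urlopen(f"{base}/api/post/{pid}"))`, where base is the site URL.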

4. Statistics

Url

GET api/stats/<days>

Response

{
    "answers": 70, 
    "comments": 53, 
    "date": "2014-05-03", 
    "days_ago": 10, 
    "questions": 30, 
    "timestamp": 1399179600.0, 
    "toplevel": 43, 
    "users": 103, 
    "votes": 215, 
    "x_new_posts": [2], 
    "x_new_users": [3], 
    "x_new_votes": [4]
}

Description

Website statistics from day-0 until <days> days ago.
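A sketch of the stats computation with a one-file-per-day JSON cache, assuming a hypothetical list of (kind, date) events. The function names and the stats-<date>.json file name are illustrative only. The variable holding the result is named payload, which sidesteps the json name clash listed among the improvements below.

```python
import json
import os
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical event record: (kind, creation date), e.g. ("questions", date(2014, 5, 1)).
Event = Tuple[str, date]


def stats_until(events: Iterable[Event], days: int,
                today: Optional[date] = None) -> dict:
    """Cumulative count of each kind of event from day-0 until `days` days ago."""
    today = today or date.today()
    until = today - timedelta(days=days)
    payload = {"date": until.isoformat()}
    for kind, created in events:
        if created <= until:
            payload[kind] = payload.get(kind, 0) + 1
    return payload


def cached_stats(cache_dir: str, events: Iterable[Event], days: int,
                 today: Optional[date] = None) -> dict:
    """Serve stats from a per-day JSON file, computing and caching them on a miss."""
    today = today or date.today()
    until = today - timedelta(days=days)
    path = os.path.join(cache_dir, f"stats-{until.isoformat()}.json")
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    # Naming the result `payload` rather than `json` keeps the json module
    # reachable; a local called `json` would shadow it in this whole function.
    payload = stats_until(events, days, today)
    with open(path, "w") as fh:
        json.dump(payload, fh)
    return payload
```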

Possible Improvements

  • Clearly state the date interval.
    In the response we could use the fields date_from (day-0) and date_until.
  • The code is currently broken.
    The cause is a name clash: a local variable called json shadows the json module.
  • <days> might not be user-friendly.
    Istvan pointed out that counting backwards from the current day seems simple to use, as it answers questions like “What was posted ten days ago?”. But it makes life harder for people who want to mine our system periodically, since the same <days> value points to a different date every day. He suggests counting forward from day-0.
    I guess we could accept three mutually exclusive parameters:

    • until, to provide stats from day-0 until the given date;
    • days_ago, to provide stats from day-0 until days_ago days ago;
    • days_from_zero, to provide stats from day-0 until day-0 + days_from_zero.
  • The number of cached files could grow indefinitely.
    I like the idea of caching statistics in JSON files, but we have to consider that their number grows without bound. We should monitor this: for instance, we could set a threshold of N files and, when it is reached, delete the file with the oldest modification timestamp. We might also simply ignore this point, as it amounts to one small text file per day. Alternatively, as Istvan suggests, we could generate these files once a day and serve them as static files.
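The three mutually exclusive parameters and the N-file cache threshold could be sketched as follows. DAY_ZERO is a placeholder date, resolve_interval and prune_cache are hypothetical names, and until is assumed to be an ISO date (YYYY-MM-DD).

```python
import os
from datetime import date, timedelta
from typing import Mapping, Optional, Tuple

DAY_ZERO = date(2014, 1, 1)  # placeholder: the real day-0 would come from the site data

PARAMS = ("until", "days_ago", "days_from_zero")


def resolve_interval(params: Mapping[str, str],
                     today: Optional[date] = None) -> Tuple[date, date]:
    """Turn exactly one of until / days_ago / days_from_zero into (date_from, date_until)."""
    today = today or date.today()
    given = [name for name in PARAMS if name in params]
    if len(given) != 1:
        raise ValueError("pass exactly one of: until, days_ago, days_from_zero")
    name = given[0]
    if name == "until":
        date_until = date.fromisoformat(params["until"])  # expects YYYY-MM-DD
    elif name == "days_ago":
        date_until = today - timedelta(days=int(params["days_ago"]))
    else:
        date_until = DAY_ZERO + timedelta(days=int(params["days_from_zero"]))
    return DAY_ZERO, date_until


def prune_cache(cache_dir: str, max_files: int) -> None:
    """Delete the oldest-modified .json files until at most max_files remain."""
    paths = [os.path.join(cache_dir, n) for n in os.listdir(cache_dir) if n.endswith(".json")]
    paths.sort(key=os.path.getmtime)  # oldest modification time first
    for path in paths[:max(0, len(paths) - max_files)]:
        os.remove(path)
```

Returning (date_from, date_until) maps directly onto the date_from and date_until fields proposed above, so the response can state the interval explicitly whichever parameter the client used.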

API In BioStar 2

BioStar v2 has no API so far. Porting the BioStar v1 API to BioStar v2 should not be too hard, but my impression is that the code needs some love before being ported. A solid design is also required if we plan to create an exhaustive set of API methods.

Ideas

Apart from the possible improvements mentioned above, we could consider the following.

a. Questions Endpoint

  • Get the most recent questions (filtering by id, date, tags, author, popularity, unanswered status, votes, bookmarks);
  • Create/edit a question;
  • Delete a question I have authored;
  • Vote/bookmark a question.
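Filtering and ordering questions could look like the following in-memory sketch. Question and list_questions are hypothetical; in BioStar this would be a database query, and only some of the filters listed above (tags, author, unanswered status, date, votes) are shown.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Sequence


@dataclass
class Question:  # hypothetical in-memory stand-in for a question post
    id: int
    title: str
    author_id: int
    created: date
    votes: int = 0
    answer_count: int = 0
    tags: List[str] = field(default_factory=list)


def list_questions(questions: Sequence[Question], tags: Sequence[str] = (),
                   author_id: Optional[int] = None, unanswered: bool = False,
                   order_by: str = "created", limit: int = 20) -> List[Question]:
    """Filter and order questions, newest (or most voted) first."""
    if order_by not in ("created", "votes"):
        raise ValueError("order_by must be 'created' or 'votes'")
    result = [
        q for q in questions
        if set(tags) <= set(q.tags)  # every requested tag must be present
        and (author_id is None or q.author_id == author_id)
        and (not unanswered or q.answer_count == 0)
    ]
    result.sort(key=lambda q: getattr(q, order_by), reverse=True)
    return result[:limit]
```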

b. Answers Endpoint

  • Get the most recent answers (filtering by id, question, date, author, votes, bookmarks);
  • Create/edit an answer;
  • Delete an answer I have authored;
  • Vote/bookmark an answer;
  • Accept/reject answers (only for the author of the question).
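Two of the rules above are permission checks: only the question author may accept or reject answers, and only an answer's author may delete it. A minimal sketch with a hypothetical Answer model and PermissionDenied exception; modeling deletion as a flag is also an assumption.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when a user attempts an action the API should not allow."""


@dataclass
class Answer:  # hypothetical minimal answer model
    id: int
    author_id: int
    question_author_id: int
    accepted: bool = False
    deleted: bool = False


def set_accepted(answer: Answer, user_id: int, accepted: bool) -> Answer:
    """Accept or reject an answer; only the author of the question may do this."""
    if user_id != answer.question_author_id:
        raise PermissionDenied("only the question author can accept or reject answers")
    answer.accepted = accepted
    return answer


def delete_answer(answer: Answer, user_id: int) -> Answer:
    """Mark an answer as deleted; only its own author may do this."""
    if user_id != answer.author_id:
        raise PermissionDenied("you can only delete answers you have authored")
    answer.deleted = True
    return answer
```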

c. Comments Endpoint

  • Get the most recent comments (filtering by id, question, answer, date, author, votes);
  • Create/edit a comment;
  • Vote a comment.

d. Search Endpoint

  • Search for posts (questions, answers, comments) meeting certain criteria;
  • Search for similar questions.
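For similar questions, one simple approach is Jaccard similarity between the word sets of two titles. This is a technique swapped in for illustration, not how BioStar does it; similar_questions and the 0.2 threshold are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lower-cased set of alphanumeric words in a title."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similar_questions(title: str, titles: Dict[int, str], top: int = 5,
                      min_score: float = 0.2) -> List[Tuple[int, float]]:
    """Rank existing question titles by Jaccard similarity to a new title."""
    query = tokens(title)
    scored = []
    for qid, other in titles.items():
        other_tokens = tokens(other)
        union = query | other_tokens
        if not union:
            continue  # both titles empty: nothing to compare
        score = len(query & other_tokens) / len(union)
        if score >= min_score:
            scored.append((qid, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:top]
```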

e. Users Endpoint

  • Get the list of users;
  • Get user profile;
  • Get posts (questions, answers, comments) authored by a user ordered by date or popularity (votes, views, answers).
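Listing users or a user's posts will need pagination and a choice of ordering. A sketch using field names from the api/post response above (author_id, creation_date, score, answer_count) plus a hypothetical views field; the items/has_more shape is loosely modeled on the Stack Exchange API response wrapper, and both function names are hypothetical.

```python
from typing import Dict, List, Sequence


def paginate(items: Sequence, page: int = 1, page_size: int = 30) -> Dict:
    """Slice a list into pages, returning the page plus a has_more flag."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    chunk = list(items[start:start + page_size])
    return {"items": chunk, "page": page, "has_more": start + page_size < len(items)}


def posts_by_user(posts: Sequence[Dict], author_id: int, sort: str = "date") -> List[Dict]:
    """Posts authored by a user, newest first or most popular first."""
    keys = {"date": "creation_date", "votes": "score", "views": "views", "answers": "answer_count"}
    if sort not in keys:
        raise ValueError(f"sort must be one of {sorted(keys)}")
    own = [p for p in posts if p["author_id"] == author_id]
    # ISO dates such as "2013-12-16" sort correctly as strings.
    return sorted(own, key=lambda p: p[keys[sort]], reverse=True)
```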

f. Authentication

Some of these methods require authentication; for instance, only authenticated users should be able to post a question.
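A common rule is that anyone may read while only authenticated users may write. A sketch of that rule, assuming a hypothetical user_id that is None for anonymous requests; Django REST Framework ships a similar permission class, IsAuthenticatedOrReadOnly.

```python
from typing import Optional

# HTTP methods that only read data (the same set Django REST Framework calls SAFE_METHODS).
SAFE_METHODS = ("GET", "HEAD", "OPTIONS")


def is_allowed(method: str, user_id: Optional[int]) -> bool:
    """Anyone may read; only authenticated users (user_id is not None) may write."""
    return method.upper() in SAFE_METHODS or user_id is not None
```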

g. Versioning

I suggest we first introduce a small set of methods and then gradually extend it. This requires us to put version numbers in the URL, like: api.biostars.org/1.0/answers
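Versioned URLs let old clients keep calling 1.0 while new methods appear in later versions. A sketch of routing on the version prefix; the routing table, the 1.1 version and its extra field are hypothetical.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[], dict]

# Hypothetical routing table: one handler per (version, resource) pair.
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("1.0", "answers"): lambda: {"version": "1.0", "items": []},
    ("1.1", "answers"): lambda: {"version": "1.1", "items": [], "has_more": False},
}


def dispatch(path: str) -> dict:
    """Route a path such as '/1.0/answers' to the handler for that API version."""
    parts = path.strip("/").split("/")
    if len(parts) != 2:
        raise LookupError(f"expected /<version>/<resource>, got {path!r}")
    version, resource = parts
    try:
        handler = ROUTES[(version, resource)]
    except KeyError:
        raise LookupError(f"unknown endpoint {resource!r} for API version {version!r}") from None
    return handler()
```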

h. Semantic backend

Satrajit plans to build a semantic backend for BioStar. This would let BioStar interact with external websites: for instance, we could provide any website with an “ask this in Biostar” button that automatically posts a new question. Since this semantic backend would work together with the API, we should keep it in mind when designing the API!

Notes

Building such an API would be a major benefit for our project, since it lets developers create new user interfaces such as mobile applications.

We should take inspiration from the well-designed StackOverflow API.

Tools

Django REST Framework

Bibliography

REST in Practice