Hey hexbears, just for fun I want to run some data analysis on which emojis are used the most and least often. Does anyone know the API endpoint and how I can use it to start extracting raw text from posts and comments? Thanks!!!!

  • bottech [he/him]
    ·
    2 years ago

    Because there are so many comments, simply requesting all of them with /comment/list doesn't work, so you have two options:

    • Use /comment/list, but request comments from one specific community at a time and repeat that for every community. Note: some communities (for example !chicago@hexbear.net) are not visible without logging in, so requests for them will not return anything unless you provide authorization.
    • Use /user to request one user's comments and repeat that for every user. Because of the number of users, and because the website limits requests to 30 per minute, this will take several hours.

    Documentation for /comment/list (there is no documentation for this exact API request, but it works exactly like get posts)
    Documentation for /user
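
A minimal sketch of the first option (paging through /comment/list one community at a time while staying under 30 requests per minute). It assumes the site exposes the Lemmy v3 HTTP API under `https://hexbear.net/api/v3` and that each entry in the response's `comments` list carries the text at `["comment"]["content"]`; the base URL, the parameter names, and the helper names `http_get` and `iter_community_comments` are assumptions for illustration, not something confirmed in this thread. Communities that need a login would also need an auth token, which is left out here.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://hexbear.net/api/v3"  # assumption: Lemmy v3 API path
MIN_INTERVAL = 60 / 30  # 30 requests per minute -> one request every 2 seconds


def http_get(path, params):
    """Fetch one API response and decode it as JSON."""
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_community_comments(community, fetch=http_get, delay=MIN_INTERVAL, limit=50):
    """Yield the raw text of every comment in one community, page by page.

    `fetch` is injectable so the paging logic can be tried without the network.
    """
    page = 1
    while True:
        data = fetch(
            "/comment/list",
            {
                "community_name": community,  # assumed parameter name
                "type_": "All",
                "sort": "New",
                "page": page,
                "limit": limit,
            },
        )
        comments = data.get("comments", [])
        if not comments:
            return  # an empty page means we have run out of comments
        for view in comments:
            yield view["comment"]["content"]
        page += 1
        time.sleep(delay)  # pause between pages to respect the rate limit
```

Something like `for text in iter_community_comments("chicago"): ...` would then be repeated for each community name. Sleeping a fixed two seconds between pages is the simplest way to stay under the limit; it wastes a little time but needs no bookkeeping.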

    If you have any questions, you can ask me; I have some experience with using the API.

    • LesbianLiberty [she/her]
      ·
      2 years ago

      Thank you bottech! I've actually set up a script that I'll try to push to my main server later. It has a decent number of delays built in, so it'll be more of a trickle over time. It essentially relies on the fact that posts are numbered sequentially: all I need is something that trawls through every post since, say, post 200000, processes the body, processes the comments, then appends the resulting emotes to an ever-expanding text file, which I'll process later when my ADHD brings me back around to this. This might be a really dumb solution; it just seemed good enough to my novice brain.
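
A rough sketch of the approach described above: walk post IDs in order, pull emotes out of each body and comment, and append them to a text file with a delay between posts. It assumes emotes appear in the raw text as `:shortcode:` (if the site stores them as markdown images instead, the pattern would need to change), and `fetch_texts` is a hypothetical callable that returns the post body plus its comment bodies for a given post ID. This is not the actual script from the thread, only an illustration of the idea.

```python
import re
import time
from collections import Counter

# Assumption: emotes look like :some-name: and start with a letter,
# which avoids matching timestamps such as 10:30:15.
EMOTE_RE = re.compile(r":([a-z][a-z0-9_+-]*):")


def extract_emotes(text):
    """Return every emote shortcode found in a piece of raw text."""
    return EMOTE_RE.findall(text or "")


def trawl_posts(start_id, end_id, fetch_texts, out_path, delay=2.0):
    """Append emotes from posts start_id..end_id to out_path and count them.

    `fetch_texts(post_id)` is a hypothetical helper that returns an iterable
    of strings (the post body and each comment body) for that post.
    """
    counts = Counter()
    with open(out_path, "a", encoding="utf-8") as out:
        for post_id in range(start_id, end_id + 1):
            for text in fetch_texts(post_id):
                for emote in extract_emotes(text):
                    out.write(emote + "\n")  # one emote per line, processed later
                    counts[emote] += 1
            time.sleep(delay)  # trickle requests instead of hammering the site
    return counts
```

Opening the file in append mode means the script can be stopped and restarted from a later post ID without losing what it has already collected; `counts.most_common()` gives a quick look at the results for one run.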

      • bottech [he/him]
        ·
        2 years ago

        If you want to start from post 200000 then your method should work well, though if you ever wanted to go through a much larger number of posts it would take a long time.