• utopiah@lemmy.ml
    1 month ago

    FWIW, if you are interested in such tooling, also consider soffice and pandoc, which (as far as I can tell) have similar features but have existed for years and are not related to Microsoft.

    Edit: not related to Microsoft AND Google. It seems the transcription aspect (which IMHO is still weird in that context, but OK) is done via Google's servers, cf. https://lemmy.ml/post/23629310/15586865
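    For anyone who wants to try the soffice/pandoc route mentioned above, here is a minimal sketch that builds and optionally runs the two commands from Python. It assumes `pandoc` and `soffice` are on your PATH; the file names are hypothetical. pandoc handles the conversion to Markdown (`-t gfm`), while soffice is used here for office-to-office conversions such as .docx to .odt.

```python
import shutil
import subprocess


def pandoc_to_markdown_cmd(src: str, dst: str) -> list[str]:
    """Build a pandoc command converting e.g. a .docx file to GitHub-flavored Markdown."""
    # pandoc infers the input format from the file extension; -t gfm selects GFM output.
    return ["pandoc", src, "-t", "gfm", "-o", dst]


def soffice_convert_cmd(src: str, target_ext: str, outdir: str) -> list[str]:
    """Build a headless LibreOffice command converting src to another format (e.g. "odt", "pdf")."""
    return ["soffice", "--headless", "--convert-to", target_ext, "--outdir", outdir, src]


def run_if_available(cmd: list[str]) -> bool:
    """Run cmd only if its executable is on PATH; return True if it exited successfully."""
    if shutil.which(cmd[0]) is None:
        return False
    return subprocess.run(cmd, check=False).returncode == 0


if __name__ == "__main__":
    # Hypothetical file names, for illustration only; nothing is executed here.
    print(pandoc_to_markdown_cmd("report.docx", "report.md"))
    print(soffice_convert_cmd("report.docx", "odt", "out"))
```

Building the argument list separately from running it keeps the commands easy to inspect and avoids shell quoting issues.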

  • loathsome dongeater@lemmygrad.ml
    1 month ago

    This could be useful to me. A while ago I was trying to make something that takes all unread posts from my feed reader, makes an EPUB out of them, and then puts it behind an OPDS server.

    I found that converting the HTML from RSS first to Markdown and then compiling it into an EPUB was the most reliable way to strip the unnecessary markup from the source HTML. I used pandoc for this.
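    A rough sketch of the RSS → Markdown → EPUB pipeline described above, assuming a plain RSS 2.0 feed (HTML in each item's `<description>`) and pandoc on the PATH. The helper names are hypothetical, and using `-t markdown-raw_html` to drop leftover HTML tags is an assumption about how one might tune pandoc for this, not necessarily what the commenter did.

```python
import xml.etree.ElementTree as ET


def rss_items(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, html_description) pairs from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="Untitled")
        html = item.findtext("description", default="")
        items.append((title, html))
    return items


def html_to_md_cmd(html_path: str, md_path: str) -> list[str]:
    # Step 1: HTML -> Markdown. Disabling the raw_html extension (assumption) keeps
    # pandoc from passing through HTML it cannot express in Markdown.
    return ["pandoc", "-f", "html", "-t", "markdown-raw_html", html_path, "-o", md_path]


def md_to_epub_cmd(md_paths: list[str], epub_path: str, title: str) -> list[str]:
    # Step 2: compile all Markdown chapters into a single EPUB with a book title.
    return ["pandoc", *md_paths, "-o", epub_path, "--metadata", f"title={title}"]
```

Each item's HTML would be written to its own file, run through `html_to_md_cmd`, and the resulting `.md` files passed together to `md_to_epub_cmd`.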

    • utopiah@lemmy.ml
      1 month ago

      > I used pandoc for this.

      Please come back and share whether it does better or worse, and if so, along which dimensions. I'm quite curious to better understand the differences.

    • Max-P@lemmy.max-p.me
      1 month ago

      ~Not really. All the features of that tool are basic functions we've had since LibreOffice was still OpenOffice.~

      ~Since this converts to Markdown, it's inherently a very lossy conversion. What's hard to pull off is preserving the full formatting when converting to an ODT or something.~

      Someone pointed out that it doesn't just convert Word documents to Markdown; it can also transcribe audio and do OCR, so I guess it does have some usefulness!

      • django@discuss.tchncs.de
        1 month ago

        I like LibreOffice, but converting audio files to Markdown must be a pretty recent feature, since I had never heard of it being part of LibreOffice before.

        • utopiah@lemmy.ml
          1 month ago

          > converting audio files to markdown must be a pretty recent feature

          Quite curious... does it actually do that, and if so, how? Speech-to-text (STT) producing a plain-text file or subtitles (with timing) has been available, and quite efficient, for a while now, e.g. via Whisper. If this does more, though, e.g. infers structure (distinguishing a title, a list, etc.), I'd like to learn how.
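          To illustrate the "plaintext or subtitle (with timing)" point above, here is a small sketch that turns Whisper-style segments (dicts with `start`, `end`, `text`, as returned by `whisper`'s `model.transcribe(...)["segments"]`) into an SRT subtitle file. It assumes the openai-whisper package for the optional local transcription step; the audio file name is hypothetical, and everything runs locally.

```python
from pathlib import Path


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render timed segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    try:
        import whisper  # openai-whisper; the model runs on the local machine
    except ImportError:
        whisper = None
    audio = Path("interview.mp3")  # hypothetical input file
    if whisper is not None and audio.exists():
        result = whisper.load_model("base").transcribe(str(audio))
        print(result["text"])  # plain-text transcript
        print(segments_to_srt(result["segments"]))  # subtitles with timing
```

Note that this gives flat text or timed cues only; nothing here infers document structure such as headings or lists.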

          • django@discuss.tchncs.de
            1 month ago

            There is nothing special going on. This whole project is just a bunch of Python libraries glued together into a CLI tool. It uses the SpeechRecognition package to connect to the Google speech recognition API: https://github.com/microsoft/markitdown/blob/main/src/markitdown/_markitdown.py#L691

            Pretty uninteresting and a bit disappointing. Pandoc is a lot more interesting.

            • utopiah@lemmy.ml
              1 month ago

              Thanks for the clarification. I checked the code you linked and noticed recognize_google, which seems to rely on https://github.com/Uberi/speech_recognition, which in turn seems to rely on https://github.com/Uberi/speech_recognition/blob/master/speech_recognition/recognizers/google.py. So basically, are they using an API and sending all the audio data to Google's servers?

              • django@discuss.tchncs.de
                1 month ago

                Yes, that's how I read it as well. The library supports using a local model, but they decided to just send the audio data to Google.
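                A hedged sketch of what choosing a local backend could look like with SpeechRecognition. It assumes a recent SpeechRecognition release that provides `recognize_whisper` (local openai-whisper) alongside `recognize_google` (network) and `recognize_sphinx` (local PocketSphinx); check your installed version. The `BACKENDS` table, `pick_backend`, and the `allow_remote` opt-in are hypothetical names for illustration, not part of markitdown or SpeechRecognition. `sr.AudioFile` reads WAV, AIFF, or FLAC.

```python
# Hypothetical table: backend name -> (SpeechRecognition method, sends audio off-machine?)
BACKENDS = {
    "google": ("recognize_google", True),     # Google Web Speech API over the network
    "whisper": ("recognize_whisper", False),  # local openai-whisper model
    "sphinx": ("recognize_sphinx", False),    # local CMU PocketSphinx
}


def pick_backend(name: str, allow_remote: bool = False) -> str:
    """Return the recognizer method name, refusing network backends unless opted in."""
    method, remote = BACKENDS[name]
    if remote and not allow_remote:
        raise PermissionError(
            f"{name!r} uploads audio to a third party; pass allow_remote=True to opt in"
        )
    return method


def transcribe(path: str, backend: str = "whisper", allow_remote: bool = False) -> str:
    """Transcribe an audio file, defaulting to a local model."""
    method = pick_backend(backend, allow_remote)  # checked before anything is loaded
    import speech_recognition as sr  # lazy import: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into AudioData
    return getattr(recognizer, method)(audio)
```

Defaulting to a local backend and requiring an explicit opt-in for the Google one makes the data flow visible to whoever calls the function.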

                • utopiah@lemmy.ml
                  1 month ago

                  This might open up a GDPR-related issue. I don't think people using such a library assume that it needs connectivity, nor that their data would be sent to a third party.