pointless

  • 0 Posts
  • 27 Comments
Joined 1 year ago
Cake day: June 23rd, 2023





  • 'Mastering Emacs' is a very highly regarded resource; & it might be the only one that fits your requirements -- it's laid out as a book that you can read from cover to cover: https://www.masteringemacs.org/

    Needless to say, though, for the concepts discussed in the book to sink in, reading alone won't suffice.




  • PyMuPDF is excellent for extracting 'structured' text from a pdf page — though I believe 'pulling out relevant information' will still be a manual task, UNLESS the text you're working with allows parsing into meaningful units.

    That's because 'textual' content in a pdf is nothing other than a bunch of instructions to draw glyphs inside a rect that represents a page; utilities that come with mupdf or poppler arrange those glyphs (not always perfectly) into 'blocks', 'lines', and 'words' based solely on whitespace separation; the programmer who uses those utilities in an end-user facing application then has to figure out how to create the illusion (so to speak) that the user is selecting/copying/searching for paragraphs, sentences, and so on, in proper reading order.

    PyMuPDF comes with a rich collection of convenience functions to make all that less painful, like dehyphenation, eliminating superfluous whitespace, etc.; but you still need some further processing to pick out humanly relevant info.

    Built-in regex capabilities of Python can suffice for that parsing; but if not, you might want to look into NLTK tools, which apply sophisticated methods to tokenize words & sentences.
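A minimal sketch of that pipeline, using only the standard library. It assumes word tuples in the shape that PyMuPDF's `page.get_text("words")` returns, `(x0, y0, x1, y1, text, block_no, line_no, word_no)`; the sample values, the `FIELD_RE` pattern, and the helper names are all made up for illustration, and the dehyphenation here is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical sample data, shaped like the output of page.get_text("words"):
# (x0, y0, x1, y1, text, block_no, line_no, word_no)
SAMPLE_WORDS = [
    (72.0, 100.0, 110.0, 112.0, "Invoice", 0, 0, 0),
    (115.0, 100.0, 150.0, 112.0, "No:", 0, 0, 1),
    (155.0, 100.0, 190.0, 112.0, "A-1042", 0, 0, 2),
    (72.0, 114.0, 100.0, 126.0, "Total", 0, 1, 0),
    (105.0, 114.0, 140.0, 126.0, "amo-", 0, 1, 1),
    (72.0, 128.0, 90.0, 140.0, "unt:", 0, 2, 0),
    (95.0, 128.0, 130.0, 140.0, "99.50", 0, 2, 1),
]


def words_to_lines(words):
    """Group word tuples by (block_no, line_no), then join left to right."""
    lines = defaultdict(list)
    for x0, _y0, _x1, _y1, text, block_no, line_no, _word_no in words:
        lines[(block_no, line_no)].append((x0, text))
    return [
        " ".join(text for _, text in sorted(items))
        for _, items in sorted(lines.items())
    ]


def dehyphenate(lines):
    """Merge words split by a trailing hyphen, then flatten to one string.

    Naive on purpose: a genuine hyphenated compound that happens to break
    at a line end would be merged as well.
    """
    text = "\n".join(lines)
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return text.replace("\n", " ")


# Illustrative pattern for one kind of 'meaningful unit'.
FIELD_RE = re.compile(
    r"Invoice No:\s*(?P<number>\S+).*?Total amount:\s*(?P<total>\d+\.\d{2})"
)

if __name__ == "__main__":
    flat = dehyphenate(words_to_lines(SAMPLE_WORDS))
    match = FIELD_RE.search(flat)
    print(flat)
    print(match.group("number"), match.group("total"))
```

With a real document, the tuples would come from a PyMuPDF page object rather than a hard-coded list, and the pattern would have to be written against whatever regularities the text actually has.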

    EDIT: I really should've mentioned some proper full text search tools. Once you have a good plaintext representation of a pdf page, you might want to feed that representation into tools like the following to index them properly for relevant info:

    https://lunr.readthedocs.io/en/latest/ -- this is easy to use, & set up, esp. in a python project.

    ... it's based on principles that are put to use in this full-scale, 'industrial strength' full text search engine: https://solr.apache.org/ -- it's a bit of a pain to set up; but python can interface with it through any http client. Once you set up some kind of mapping between search tokens/keywords/tags, the plaintext page, & the actual pdf, you can get from a phrase search, for example, to a bunch of vector graphics (i.e. the pdf) relatively painlessly.
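To make the 'mapping between search tokens, the plaintext page, & the actual pdf' a bit more concrete, here is a rough standard-library sketch of an inverted index. It is not how lunr or Solr work internally (they add stemming, ranking, phrase queries, and much more); the file names, page texts, and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical plaintext pages, keyed by (pdf filename, page number).
PAGES = {
    ("manual.pdf", 1): "Install the package with the package manager.",
    ("manual.pdf", 2): "The search index maps tokens to pages.",
    ("notes.pdf", 1): "Full text search needs a good index.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every token to the set of (pdf, page) locations containing it."""
    index = defaultdict(set)
    for location, text in pages.items():
        for token in tokenize(text):
            index[token].add(location)
    return index


def search(index, query):
    """Return the locations that contain every token of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(hits)


if __name__ == "__main__":
    idx = build_index(PAGES)
    print(search(idx, "search index"))
```

Each hit is a (pdf, page) pair, so an application can open the right file at the right page from a keyword query. A true phrase search would also need token positions, which this sketch leaves out.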


  • wvstolzing@lemmy.ml to Linux@lemmy.ml: D-Bus overview
    6 months ago

    Skimmed over the whole article -- I wish this had been available back when I was trying to piece together the basics from the documentation. There really needs to be a 2nd part, though, with some discussion of the GVariant signatures, which the author says were 'beyond the scope of' this article -- which is true; nevertheless, understanding that syntax (and how to use it e.g. with gdbus) is an absolute requirement for using dbus properly; and as a silly amateur, I lost so much time over them.
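For a feel of what reading those signatures involves, here is a small sketch that splits a type string into its top-level complete types. It covers only a subset I'm sure of: the basic one-letter types, `v` (variant), `a` (array of one element type), `(...)` (tuple/struct), and `{kv}` (dict entry with a basic key type). GVariant extras such as maybe types (`m`) are not handled, and the function names are my own.

```python
BASIC_TYPES = set("ybnqiuxtdsogh")


def _read_type(sig, pos):
    """Return the index just past the single complete type starting at pos."""
    if pos >= len(sig):
        raise ValueError("unexpected end of signature")
    ch = sig[pos]
    if ch in BASIC_TYPES or ch == "v":
        return pos + 1
    if ch == "a":
        # An array is 'a' followed by exactly one complete element type.
        return _read_type(sig, pos + 1)
    if ch == "(":
        # A tuple holds zero or more complete types up to the closing ')'.
        pos += 1
        while sig[pos:pos + 1] != ")":
            pos = _read_type(sig, pos)
        return pos + 1
    if ch == "{":
        # A dict entry is '{', one basic key type, one value type, '}'.
        if sig[pos + 1:pos + 2] not in BASIC_TYPES:
            raise ValueError("dict entry key must be a basic type")
        end = _read_type(sig, pos + 2)
        if sig[end:end + 1] != "}":
            raise ValueError("dict entry must hold exactly one key and one value")
        return end + 1
    raise ValueError(f"unknown type character {ch!r} at position {pos}")


def split_signature(sig):
    """Split a signature into its top-level complete types."""
    parts, pos = [], 0
    while pos < len(sig):
        end = _read_type(sig, pos)
        parts.append(sig[pos:end])
        pos = end
    return parts


if __name__ == "__main__":
    # 's' is a string, 'a{sv}' an array of string-to-variant entries,
    # '(ii)' a tuple of two 32-bit integers.
    print(split_signature("sa{sv}(ii)"))
```

Once a signature is split like this, it is easier to see how many arguments a method expects and what shape each one has, which is the part that tends to be confusing when typing arguments for gdbus by hand.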





  • Yeah I keep running into similar issues when trying to build pretty much anything on windows; for stuff that can't be 'nicely' configured & dependency-managed through an IDE, windows is pure pain.

    It really sounds like PySide would fit your use case better. Check out this website for a great starting point: https://www.pythonguis.com/pyqt6/ -- the author also has an entire book on packaging PySide programs for cross-platform distribution.

    As for installing Python itself; I think I'd stick with the plain installer from python.org, and afterwards, pip. In case of dependencies that are hard to get through PyPI, I think anaconda might be worth looking at as well: https://www.anaconda.com/download

    msys2 provides a package manager, & several development toolchains; it's an easy way to get native (mingw) gcc & bash on windows; cross-platform programs rely on it heavily, because it saves them from all the 'visual studio' BS: https://www.msys2.org/docs/what-is-msys2/ -- I believe any implementation of GTK on windows requires a mingw toolchain.


  • Am I missing something?

    It's impossible to tell without knowing what specific aspect had failed.

    Before we even get to GTK; there are some issues with python wheels under msys2; check out: https://www.msys2.org/docs/python/ -- some wheels just can't be built under msys2 due to various incompatibilities. Not being able to replace such packages with 'pure' python equivalents could end up being a (very annoying) roadblock.

    The roadblock that I recently ran into with my simple GTK4 app was unpredictable ids on d-bus interface exports. D-bus does work under msys2; though you have to start the user session manually; d-feet and gdbus also work; though, as always, there's a catch. On Linux I can automatically export 'action groups' that belong to GtkApplicationWindow widgets; & their 'object path's show up predictably under the application's path + / + the window's id. This makes it really convenient when you want to add basic 'remote controls' to your widgets. Under msys2, though, I can't figure out how to find those paths; which throws a monkey wrench, so to speak, in my 'remote control' implementation. Granted, d-bus is a linux-native technology; and expecting it to work w/o issues on windows is probably a bit too much.

    -- apart from those, I haven't run into any issues with GTK4 under msys2. The GTK3 packages available in their repos also work just fine.

    I do agree with the others who recommend PySide, though. Their cross platform support appears to be more robust. Their documentation has been improving as well.
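To illustrate what 'predictable' object paths means on Linux, here is a tiny sketch that builds the path where a window's action group would be expected. It rests on assumptions that should be checked with d-feet or gdbus: that the application's object path is derived from its id by turning `.` into `/` and `-` into `_`, and that windows sit under a `window` segment followed by their numeric id. The application id and the function names are hypothetical.

```python
def app_object_path(app_id):
    """Derive an object path from an application id (assumed rule)."""
    return "/" + app_id.replace(".", "/").replace("-", "_")


def window_actions_path(app_id, window_id, segment="window"):
    """Build the expected path of a window's exported action group.

    The 'window' segment is an assumption about the export layout.
    """
    return f"{app_object_path(app_id)}/{segment}/{window_id}"


if __name__ == "__main__":
    # 'org.example.MyApp' is a made-up application id.
    print(window_actions_path("org.example.MyApp", 1))
```

When the path can be computed like this, a 'remote control' script only needs the application id and the window id; the msys2 problem described above is precisely that the real paths don't seem to follow a rule that can be computed ahead of time.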





  • wvstolzing@lemmy.ml to Programmer Humor@lemmy.ml: trick OR treat
    8 months ago

    NAND and XOR aren't equivalent, though

    | X | Y | X NAND Y |
    | 0 | 0 | 1        |
    | 1 | 0 | 1        |
    | 0 | 1 | 1        |
    | 1 | 1 | 0        |
    
    | X | Y | X XOR Y |
    | 0 | 0 | 0       |
    | 1 | 0 | 1       |
    | 0 | 1 | 1       |
    | 1 | 1 | 0       |
    

    & XOR can be reduced to NAND; not sure if NAND can be reduced to XOR
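Both directions can be checked by brute force over the four input rows. The sketch below builds XOR from four NAND gates (a standard construction) and then enumerates everything that can be made from X, Y, and the constant 1 using only XOR. The closure contains just eight truth tables, and NAND isn't one of them, so NAND can't be reduced to XOR alone. The function names are my own.

```python
from itertools import product

# Same row order as the tables above: (X, Y).
INPUTS = [(0, 0), (1, 0), (0, 1), (1, 1)]


def nand(x, y):
    return 1 - (x & y)


def xor_from_nand(x, y):
    """XOR built from four NAND gates."""
    t = nand(x, y)
    return nand(nand(x, t), nand(y, t))


def table(fn):
    """Truth table of a two-input function as a tuple over INPUTS."""
    return tuple(fn(x, y) for x, y in INPUTS)


def xor_closure(seeds):
    """All truth tables reachable from the seeds by repeatedly XOR-ing pairs."""
    reachable = set(seeds)
    while True:
        new = {
            tuple(a ^ b for a, b in zip(f, g))
            for f, g in product(reachable, repeat=2)
        }
        if new <= reachable:
            return reachable
        reachable |= new


if __name__ == "__main__":
    print(table(xor_from_nand) == table(lambda x, y: x ^ y))
    seeds = [table(lambda x, y: x), table(lambda x, y: y), (1, 1, 1, 1)]
    print(table(nand) in xor_closure(seeds))
```

The intuition behind the second result: XOR-ing inputs together only ever yields parity-style functions (optionally inverted when a constant 1 is available), whereas NAND needs an AND-like term.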