• Kausta@lemm.ee · 2 months ago

    You haven't seen anything until you need to put a 4.2 GB gzipped CSV into a pandas DataFrame, which works without any issues, I should note.
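    Something like the following is all it takes; pandas even infers the compression from the file extension (the file name here is made up):

    ```python
    import pandas as pd

    # pandas infers gzip from the .gz suffix; compression= makes it explicit.
    df = pd.read_csv("events.csv.gz", compression="gzip")

    # If memory gets tight, chunked reading keeps the peak footprint bounded.
    chunks = pd.read_csv("events.csv.gz", compression="gzip", chunksize=1_000_000)
    df = pd.concat(chunks, ignore_index=True)
    ```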

      • Kausta@lemm.ee · 2 months ago

        Yeah, it was just a simple example. Although using plain pandas (without something like dask) to load terabytes of data at once into a single DataFrame may not be the best idea, even with enough memory.
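        For scale, a rough sketch of the dask route (the file pattern and column names are invented): dask keeps the dataframe partitioned and lazy instead of materialising everything at once.

        ```python
        import dask.dataframe as dd

        # Gzip isn't splittable, so blocksize=None reads one file per partition.
        ddf = dd.read_csv("data/part-*.csv.gz", compression="gzip", blocksize=None)

        # Everything stays lazy until .compute() materialises a small result.
        daily_totals = ddf.groupby("date")["amount"].sum().compute()
        ```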

  • QuizzaciousOtter@lemm.ee · 2 months ago

    Is 600 MB a lot for pandas? Of course, CSV isn't really optimal, but I would've sworn pandas happily works with gigabytes of data.
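    For what it's worth, checking the actual in-memory footprint is a one-liner (the column name below is hypothetical):

    ```python
    import pandas as pd

    df = pd.read_csv("data.csv")

    # deep=True counts the actual bytes held by object (string) columns too.
    print(f"{df.memory_usage(deep=True).sum() / 1e6:.0f} MB")

    # Converting repetitive string columns to categoricals often shrinks this a lot.
    df["label"] = df["label"].astype("category")
    ```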

  • Barx [none/use name] · 2 months ago

    And there are like 8 software projects dedicated to making pandas wrappers that work with large datasets, because this is somehow better than engineers and statisticians learning SQL or some kind of distributed calculation strategy.
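    As an illustration of the SQL route, DuckDB will run SQL straight over a CSV without loading it all into memory first (the path and columns here are invented):

    ```python
    import duckdb

    # DuckDB scans the file with its own streaming reader; only the
    # aggregated result comes back as a pandas DataFrame.
    totals = duckdb.sql("""
        SELECT user_id, SUM(amount) AS total
        FROM 'events.csv'
        GROUP BY user_id
    """).df()
    ```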