What’s your take on Parquet?

I’m still reading into it. Why is it so closely tied to Apache? Does only Apache push it? Meaning, if Apache dropped it, would nobody else have an interest in developing it further?

It’s published under the Apache License 2.0, which is a permissive license. Are there any drawbacks to that license?

Do you use it? When?

I assume that for sharing small data, CSV is sufficient. I also assume CSV is more accessible than Parquet.
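
One concrete thing to weigh when comparing the two: CSV carries no type information, whereas a Parquet file stores a schema with column types. Below is a minimal sketch of the CSV side only, using the Python standard library. The column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical sample data with an int, a float, and a bool column.
rows = [
    {"id": 1, "price": 9.5, "in_stock": True},
    {"id": 2, "price": 12.0, "in_stock": False},
]

# Write the rows as CSV into an in-memory buffer (stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price", "in_stock"])
writer.writeheader()
writer.writerows(rows)

# Read them back: every value is now a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

print(loaded[0])
```

After the round trip, the reader has to guess or be told which columns were numbers or booleans. For small, simple data that is often fine, which is probably why CSV stays so popular.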

  • @[email protected]
    83 months ago

    Yeah, it depends on what you’re using it for. CSV is terrible in many, many ways, but it is widely supported and much less complex.

    I would guess if you’re considering Parquet then your use case is probably one where you should use it.

    JSON is another option, but I would only use it if you can guarantee that you’ll never have more than like 100MB of data. Large JSON files are extremely painful.

    • Eager Eagle
      13 months ago

      Since the data is tabular, JSONL works better than JSON for large files.
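
To make the JSONL point concrete, here is a minimal sketch using only the Python standard library. JSON Lines puts one self-contained JSON object on each line, so a reader can process records one at a time instead of parsing the whole document up front. The helper name `iter_jsonl` and the sample records are assumptions for illustration, not part of any library.

```python
import io
import json

# Hypothetical tabular records.
records = [{"id": i, "value": i * i} for i in range(5)]

# Write JSON Lines: one JSON object per line, no enclosing array.
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record) + "\n")


def iter_jsonl(stream):
    """Yield one parsed record per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stream the records back and aggregate without building a full list.
buffer.seek(0)
total = sum(record["value"] for record in iter_jsonl(buffer))
print(total)
```

With a real file, `open(path)` would take the place of the in-memory buffer, and memory use stays roughly at one record at a time regardless of file size.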