Comparison of common data file formats
=== SQLite or Parquet Files ===

When the data volume exceeds Excel's row limit, SQLite and Parquet are worth considering.

A SQLite table can theoretically hold 2⁶⁴ rows (approximately 1.8 × 10¹⁹), but that figure is unreachable in practice because the maximum database file size of 281 TB is hit first. A database at that maximum size holds at most about 2 × 10¹³ rows, and only if it has no indexes and each row contains very little data.<ref>SQLite official documentation, Implementation Limits For SQLite: https://sqlite.org/limits.html</ref>

Parquet has no hard row limit. The format stores data in Row Groups, the horizontal partitioning unit within a Parquet file. Each Row Group holds a contiguous range of rows, stored column by column as column chunks. For example, a dataset with 1,000,000 rows and 10 columns that fits in a single Row Group is stored as 10 column chunks, one per column, rather than row by row. This layout lets a query read only the columns it needs instead of scanning entire rows, which can greatly improve read performance. The default Row Group size depends on the writer; the Rust arrow-rs writer, for example, caps each Row Group at about 1 million rows by default. A single file can contain any number of Row Groups, so the file as a whole has no theoretical row limit.<ref>Apache arrow-rs GitHub issue #5797, "Row groups are limited to 1M rows by default": https://github.com/apache/arrow-rs/issues/5797</ref> In practice, datasets of 500 million to 1 billion rows have been written successfully,<ref>Andy Cutler, 10 Billion Rows: Parquet File Size and Distribution When using CETAS: https://www.serverlesssql.com/row-size-and-parquet-file-distribution/</ref> and the bottleneck is typically disk space rather than the format itself.
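
The figures above can be checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only. It assumes the maximum page size (65,536 bytes) and maximum page count (4,294,967,294) documented for recent SQLite releases, and a Row Group size of 1,000,000 rows. The function names are hypothetical and do not belong to any library.

```python
# Assumed constants: the maximums documented for recent SQLite releases.
# Older builds may use a smaller page-count limit.
MAX_PAGE_SIZE_BYTES = 65_536        # largest allowed page size
MAX_PAGE_COUNT = 4_294_967_294      # largest allowed number of pages


def sqlite_max_db_bytes() -> int:
    """Largest database file size: page size times page count."""
    return MAX_PAGE_SIZE_BYTES * MAX_PAGE_COUNT


def implied_bytes_per_row(max_rows: float = 2e13) -> float:
    """Average bytes available per row if 2e13 rows fill the largest file."""
    return sqlite_max_db_bytes() / max_rows


def row_group_count(total_rows: int, rows_per_group: int = 1_000_000) -> int:
    """Number of Row Groups needed for total_rows (ceiling division)."""
    return -(-total_rows // rows_per_group)


def column_chunk_count(total_rows: int, num_columns: int,
                       rows_per_group: int = 1_000_000) -> int:
    """Each Row Group stores one column chunk per column."""
    return row_group_count(total_rows, rows_per_group) * num_columns


if __name__ == "__main__":
    print(f"Max SQLite file size: {sqlite_max_db_bytes() / 1e12:.1f} TB")
    print(f"Bytes per row at 2e13 rows: {implied_bytes_per_row():.1f}")
    print(f"Row Groups for 1 billion rows: {row_group_count(1_000_000_000)}")
    print(f"Column chunks for 1M rows x 10 columns: "
          f"{column_chunk_count(1_000_000, 10)}")
```

Under these assumptions the maximum file size works out to roughly 281 TB, which leaves only about 14 bytes per row at 2 × 10¹³ rows. That is why the row figure applies only to tables with no indexes and very small rows. The Row Group functions show that a larger file simply accumulates more Row Groups rather than hitting a row ceiling.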