Many organizations publish their data in CSV format (for example, the US Census Bureau, NOAA, and the US Department of Education). Xsv is a command-line tool for quickly analyzing and filtering CSV files without first importing the data into a database system. It can generate basic statistics such as the mean, standard deviation, median, and range for each column in the input file. CSV files can be combined using inner, outer, and cross join operations. They can also be reduced to a subset using regular expression searches, select operations on columns, and random sampling. Users can optionally create index files to improve performance. The xsv readme includes a "whirlwind tour" of many of these features using sample files from the Data Science Toolkit. Xsv is free software, dual-licensed under the MIT License and the Unlicense. Source code is available on GitHub. Executables can be downloaded for Windows, macOS, and Linux.
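To make the per-column statistics concrete, here is a minimal Python sketch of that kind of summary, using only the standard library. It is not how xsv is implemented and does not reproduce its output format. The sample data, the `column_stats` function name, and the choice of population standard deviation are all assumptions made for illustration.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented for this sketch.
SAMPLE = """city,population,area
A,100,10.5
B,200,20.0
C,300,15.5
"""


def column_stats(text):
    """Return {column: stats} for each column whose values all parse as numbers."""
    rows = list(csv.DictReader(io.StringIO(text)))
    result = {}
    if not rows:
        return result
    for name in rows[0].keys():
        try:
            values = [float(row[name]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns with missing or non-numeric values
        result[name] = {
            "mean": statistics.mean(values),
            # Population standard deviation is an assumption of this sketch.
            "stddev": statistics.pstdev(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return result


stats = column_stats(SAMPLE)
for name, summary in stats.items():
    print(name, summary)
```

Unlike this sketch, which loads every row into memory, a dedicated tool can process files that are too large for that approach, which is part of the appeal of using one.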