
Tips for dealing with large files
Some input files are large enough that reading them becomes slow. Here are some tips to speed up the process:
- Use external Unix tools to split a large file into smaller files that can be read in chunks. There is usually a field that you can split on; date fields are a good choice.
- Consider using external tools to replace long character strings with numeric codes or shorter strings. This will save valuable memory.
- Use input parameters to control how much data you read. You don't always have to read a file from the beginning: with read.table, for example, the skip and nrows arguments let you start reading at row 1,000,000 and stop after a fixed number of rows.
- Do not feel obliged to read all of the columns. Once you have determined which columns are truly needed, read only those columns; this will speed up the processing. For example, if you are using read.table, you can specify the quoted string "NULL" (not the NULL object) in the colClasses option to indicate that a column is to be skipped.
- Use the scan, fread, and readLines functions (fread comes from the data.table package; scan and readLines are part of base R). They give you a greater degree of control over the input, and can make input faster.
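
Several of the tips above can be sketched in base R. This is a minimal illustration under stated assumptions, not a definitive recipe: the sample file, its column names (id, date, note, value), and the row counts are all hypothetical and exist only so that the example is self-contained. fread is left out so that the sketch needs no extra packages; its select argument plays a similar role to colClasses for choosing columns.

```r
# Create a small hypothetical sample file so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
sample_data <- data.frame(
  id    = 1:10,
  date  = as.character(as.Date("2017-01-01") + 0:9),
  note  = paste("long free-text comment number", 1:10),
  value = (1:10) * 1.5
)
write.csv(sample_data, path, row.names = FALSE)

# Start in the middle of the file: skip the header line plus five data rows,
# then stop after three rows. Because the header is skipped, the column
# names are supplied explicitly through col.names.
middle_rows <- read.table(
  path, sep = ",", header = FALSE, skip = 6, nrows = 3,
  col.names = c("id", "date", "note", "value"),
  stringsAsFactors = FALSE
)

# Read only the needed columns: the class "NULL" (a quoted string,
# not the NULL object) tells read.table to skip 'date' and 'note'.
two_columns <- read.table(
  path, sep = ",", header = TRUE,
  colClasses = c("integer", "NULL", "NULL", "numeric")
)

# readLines gives line-level control, for example to peek at the
# first few lines before deciding how to parse the file.
first_lines <- readLines(path, n = 3)

# scan reads fields into a list of vectors; 'what' declares the type of
# each field, and a NULL entry in the list skips that field.
scanned <- scan(
  path, sep = ",", skip = 1, quiet = TRUE,
  what = list(id = integer(), date = NULL, note = NULL, value = numeric())
)
```

On a file of realistic size, the same skip, nrows, and colClasses arguments apply unchanged; only the numbers grow. Supplying colClasses for every column you keep also spares read.table from having to guess the column types.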