Q:

How to read a large CSV file with pandas?


A CSV (Comma-Separated Values) file is a plain text file that stores tabular data. As the name suggests, the values on each line are generally separated by commas. The first line usually names the data columns, and each subsequent line holds the values of one row.

col_1_name, col_2_name, col_3_name
row_1_value1, row_1_value2, row_1_value3
row_2_value1, row_2_value2, row_2_value3

Here, the separator character (,) is called the delimiter. Other popular delimiters include the tab (\t), the colon (:), and the semicolon (;).
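As a small illustration of a non-comma delimiter, the sketch below reads semicolon-separated text with the sep parameter of read_csv(). The sample data and column names are made up for the demo, and io.StringIO stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data, kept in memory for the demo
raw = "name;city;age\nAsha;Pune;31\nRavi;Delhi;28\n"

# sep tells read_csv() which character separates the values
df_semicolon = pd.read_csv(io.StringIO(raw), sep=";")

print(df_semicolon)
```

For a tab-delimited file, the same call would take sep="\t" instead.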

Large CSV files can cause problems when they are loaded into the main memory (RAM) of the device. If a file is too large to fit in the available memory, loading it in one go can slow the system down or make the program fail with a memory error.
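One rough way to judge whether a file will fit is to load only a few rows and measure them. The sketch below is an assumption-laden estimate, not an exact method: the data is generated in memory, the column names are hypothetical, and real files may have rows of very different sizes.

```python
import io
import pandas as pd

# Hypothetical CSV text with 1000 rows, standing in for a large file
sample_csv = "id,label\n" + "".join(f"{i},item_{i}\n" for i in range(1000))

# nrows limits how many rows read_csv() loads
sample = pd.read_csv(io.StringIO(sample_csv), nrows=100)

# deep=True also counts the actual size of string (object) columns
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)

print(f"approx. {bytes_per_row:.0f} bytes per row")
```

Multiplying this figure by the expected number of rows gives a ballpark for the memory the full DataFrame would need.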

To overcome this problem, instead of reading the full CSV file, we read chunks of the file into memory.

We just need to pass the chunksize parameter (an integer) to the read_csv() method. With it, read_csv() no longer returns a single DataFrame; it returns an iterator that reads the CSV file chunk by chunk. The chunk size is the number of rows read from the CSV file at a time.
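A minimal sketch of chunked reading, assuming a single numeric column named value (a made-up name) and a tiny in-memory CSV so that it runs anywhere. Each chunk is an ordinary DataFrame, so a running total can be kept without ever holding the whole file in memory.

```python
import io
import pandas as pd

# Hypothetical data: the numbers 1 to 10 in one column
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 11)) + "\n"

total = 0
rows_seen = 0

# chunksize=4 yields DataFrames of at most 4 rows each
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["value"].sum())
    rows_seen += len(chunk)

print(total, rows_seen)
```

With a real file, the io.StringIO object would be replaced by the file path and the chunk size would typically be in the thousands of rows.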

All Answers


Let us understand with the help of an example:

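# Importing pandas package
import pandas as pd

# Reading the dataset in chunks of 500 rows;
# with chunksize set, read_csv() returns an iterator of DataFrames
chunks = pd.read_csv('D:/test.csv', chunksize=500)

# Combining the chunks into a single DataFrame
# (note: the combined DataFrame must still fit in memory)
df = pd.concat(chunks)

# Print the dataset
print(df)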

Output:

[Figure: Example: Read a large CSV file]
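When the complete file does not fit in memory, concatenating every chunk brings the problem back. A possible variation, sketched below with made-up column names (id, score) and in-memory data, is to reduce each chunk first and concatenate only the rows that are kept.

```python
import io
import pandas as pd

# Hypothetical data: 20 rows, score cycles through 0..4
csv_text = "id,score\n" + "".join(f"{i},{i % 5}\n" for i in range(20))

kept = []

# Filter every chunk before storing it, so only matching rows stay in memory
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=6):
    kept.append(chunk[chunk["score"] == 4])

# Combine the filtered pieces into one small DataFrame
df_small = pd.concat(kept, ignore_index=True)

print(df_small)
```

The same pattern works for other per-chunk reductions, such as selecting columns or aggregating, as long as the reduced result is small enough to hold.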
