How Can One Use Python and/or R for Summary Statistics and Machine Learning on Data Sets Too Big to Fit into Memory?

When dealing with large data sets, often on the order of 400 million rows or many gigabytes of data files, it is easy to run out of memory. For in-house data analysis, machines with enormous memory capacity can be unaffordable for students and researchers. That doesn't mean average school or home computers can't carry out large data calculations. Data Science is an integral part of the Python Programming Training modules, and everyone should be able to work on large data sets easily, so we have devised several ways to make this possible.

Optimizing Calculations for Large Data Sets Using the GRIB Data Format

GRIB (GRIdded Binary) is a different data format from the ASCII text used in typical CSV and Excel files. GRIB takes up less space in memory because it stores data in a compact binary form, which means you can fit more data into memory and give your machine's calculations a larger scope. GRIB was originally used for meteorological data and is maintained as a World Meteorological Organization standard.
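
As a rough illustration, the sketch below reads a GRIB file message by message with the pygrib library and accumulates running sums, so summary statistics can be computed without ever loading the whole file into memory. The file name weather.grib is a placeholder, and pygrib is an assumed dependency (pip install pygrib); the same streaming idea also works with xarray and the cfgrib engine.

```python
# Minimal sketch: streaming summary statistics over a GRIB file with pygrib.
# Assumes pygrib is installed and "weather.grib" is a hypothetical input file.
import numpy as np
import pygrib

grbs = pygrib.open("weather.grib")

count = 0
total = 0.0
total_sq = 0.0

for grb in grbs:
    # Decode one GRIB message at a time; keep only valid (unmasked, finite) values.
    values = np.ma.masked_invalid(grb.values).compressed()
    count += values.size
    total += float(values.sum())
    total_sq += float(np.square(values).sum())

grbs.close()

# Mean and standard deviation from the running sums.
mean = total / count
std = np.sqrt(total_sq / count - mean ** 2)
print(f"mean={mean:.3f}, std={std:.3f}")
```

Because each message is decoded independently, peak memory use stays roughly at the size of a single grid, no matter how many messages the file contains.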