Everyone talks about "scalability" when comparing Python/pandas with PySpark. This might be a stupid question, but at what point do you actually need to switch from Python to PySpark for an ML application? How many rows of data, or how much memory, is the cutoff beyond which pandas and scikit-learn are no longer usable?
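To make the question concrete, here is a minimal sketch of the back-of-envelope size estimate I have in mind. It assumes a dense table where every value is an 8-byte float64, and it ignores index, string, and intermediate-copy overhead. The function name and the example dimensions are hypothetical, chosen only for illustration.

```python
def estimate_dense_size_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of a dense numeric table, in GiB.

    Assumption: every cell takes `bytes_per_value` bytes (8 for float64).
    Real DataFrames can be larger (object/string columns, index, copies).
    """
    return n_rows * n_cols * bytes_per_value / 2**30


# Hypothetical dataset: 100 million rows, 50 float64 feature columns.
size_gib = estimate_dense_size_gib(100_000_000, 50)
print(f"Estimated size: {size_gib:.1f} GiB")
```

Is comparing a number like this against the machine's RAM the right way to think about the cutoff, or are there other limits that show up first?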

LinkedIn: https://www.linkedin.com/in/andrew-hershy-a7779199/