How do you manage large text datasets for preprocessing?
Submitted by [deleted] on January 23, 2023 at 7:12 AM in deeplearning (8 comments)
[deleted]
timelyparadox wrote on January 23, 2023 at 8:31 AM:
Are you using Spark?

[deleted] (OP) replied on January 23, 2023 at 8:53 AM:
No, I’m not. Can you share any reference material?

timelyparadox replied on January 23, 2023 at 8:55 AM:
Medium article: https://medium.com/analytics-vidhya/how-to-pre-process-large-datasets-for-machine-learning-using-spark-19500155b521 It's not a perfect example, but it might help point you in the right direction.
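Spark splits a dataset into partitions and applies the same transformation to each one, possibly across many executors. The snippet below does not use Spark's API. It is a minimal standard-library sketch of the same chunked idea on a single machine.

It assumes the corpus is a UTF-8 text file with one document per line. The names `iter_chunks`, `clean` and `preprocess_stream` are hypothetical. The cleaning rules are placeholders for whatever your pipeline actually needs.

```python
import io
import re
from itertools import islice
from typing import Iterable, Iterator, List, TextIO


def iter_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield lists of at most chunk_size lines, pulling lazily from the input."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def clean(text: str) -> str:
    """Placeholder cleaning step (ASCII-only): lowercase, turn
    non-alphanumerics into spaces, collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def preprocess_stream(src: TextIO, dst: TextIO, chunk_size: int = 10_000) -> int:
    """Clean src one chunk at a time, write non-empty documents to dst,
    and return how many documents were written."""
    written = 0
    for chunk in iter_chunks(src, chunk_size):
        cleaned = [doc for doc in map(clean, chunk) if doc]
        dst.writelines(doc + "\n" for doc in cleaned)
        written += len(cleaned)
    return written


# Tiny in-memory demo; with real data, pass open file handles instead.
src = io.StringIO("Hello, World!\n\n  Spark   & Ray?\n")
dst = io.StringIO()
count = preprocess_stream(src, dst, chunk_size=2)
print(count, repr(dst.getvalue()))  # prints: 2 'hello world\nspark ray\n'
```

To run it on real files, open both as text and pass the handles in. In the example below, `corpus.txt` and `clean.txt` are placeholder names:

`with open("corpus.txt", encoding="utf-8") as src, open("clean.txt", "w", encoding="utf-8") as dst: preprocess_stream(src, dst)`

Memory use is bounded by `chunk_size` lines rather than by the corpus size. Spark layers parallelism and fault tolerance on top of this partitioning idea.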
developers_hutt wrote on January 23, 2023 at 7:06 PM:
Use your PC. You can use libraries like Ray or PySpark.

[deleted] (OP) replied on January 23, 2023 at 7:07 PM:
I think I just need to start working with Spark.

developers_hutt replied on January 23, 2023 at 7:09 PM:
Are you using Python? Then PySpark will be easy for you.
[deleted] (OP) wrote on January 23, 2023 at 7:08 PM:
[deleted]

[deleted] (OP) replied on January 23, 2023 at 7:09 PM:
Yes, Python.