Last day:
Yayyy, I got the code ready to implement this logic:
At any one time there are about 200 Spark tasks running at once (on a cluster with 50 task nodes), and the submitting thread waits for an application to finish before it creates a new Spark job. In other words, at most 200 jobs are ever issued to the cluster at the same time.
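A minimal Python sketch of that throttling logic, assuming each job can be submitted and awaited from its own worker thread. `run_spark_job` is a hypothetical stand-in (it only sleeps) for whatever actually submits one Spark job and blocks until it finishes, and `MAX_IN_FLIGHT` mirrors the 200-job cap. The submitting loop blocks on a semaphore, so a new job is only issued once a running one has released its slot.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 200  # cap on jobs issued to the cluster at once (200 in this run)


def run_spark_job(work_item):
    """Hypothetical stand-in: submit one Spark job and block until it finishes."""
    time.sleep(0.001)  # placeholder for the real job's runtime
    return work_item


def run_all(work_items, max_in_flight=MAX_IN_FLIGHT):
    """Run every work item, never letting more than max_in_flight jobs overlap.

    Returns the results in submission order and the peak number of jobs
    that were running at the same time.
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def worker(item):
        nonlocal in_flight, peak
        try:
            with lock:
                in_flight += 1
                peak = max(peak, in_flight)
            return run_spark_job(item)
        finally:
            with lock:
                in_flight -= 1
            slots.release()  # free the slot so the submitter can issue the next job

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = []
        for item in work_items:
            slots.acquire()  # wait here until a running job has finished
            futures.append(pool.submit(worker, item))
        results = [f.result() for f in futures]
    return results, peak
```

Blocking the submitter (instead of queueing everything into the pool) keeps the number of pending jobs bounded, which is the "wait for one to finish, then issue the next" behavior. Passing a larger `max_in_flight` changes the cap without touching the loop.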
Hmm, I expect this to take at least 2 days to finish: it takes these ~1000 billion rows, which had room for 50 million partitions, down to 5 million partitions. Before, there was one partition for every 20k rows; now there would be about 100k rows per partition. In practice the current partitions do not actually hold 20k rows each, because the consecutive row-id assigner does not always increment by the same step, so the ids are sparse. But that does not matter: 5 million partitions is already better than 50 million partitions of sparse data. So I guess this will run for 2 days or slightly longer, and then, yayy, the first DB table will be ready.
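The partition arithmetic can be checked with a small Python sketch. It assumes partitions are contiguous row-id ranges over roughly 10^12 ids (the post's ~1000 billion figure); `partition_of` and `rows_per_partition` are hypothetical helpers, not the actual job code. With 50 million partitions each range spans 20,000 ids, and with 5 million it spans 200,000 ids. Because the ids are sparse, the number of rows that actually land in a range is smaller than its width.

```python
from collections import Counter

TOTAL_ROW_IDS = 10**12  # ~1000 billion row ids (assumed size of the id range)


def ids_per_partition(total_ids, num_partitions):
    """Width of each contiguous row-id range."""
    return total_ids // num_partitions


def partition_of(row_id, total_ids, num_partitions):
    """Map a row id to its 0-based range partition."""
    width = ids_per_partition(total_ids, num_partitions)
    return min(row_id // width, num_partitions - 1)


def rows_per_partition(row_ids, total_ids, num_partitions):
    """Count the rows that actually fall into each partition (sparse ids give fewer rows)."""
    counts = Counter(partition_of(r, total_ids, num_partitions) for r in row_ids)
    return dict(counts)


print(ids_per_partition(TOTAL_ROW_IDS, 50_000_000))  # 20000
print(ids_per_partition(TOTAL_ROW_IDS, 5_000_000))   # 200000
```

For example, the sparse ids `[0, 5, 7, 200_000, 200_003]` with 5 million partitions give only 3 rows in partition 0 and 2 rows in partition 1, even though each range is 200,000 ids wide.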
After this task has started (or completed), I will continue with configuring the neural network that runs the dependency graph logic, to check and carry out its runtime evaluations alongside other NLP approaches.
Then I will resume studying and revising topology, so that later (maybe in 1 or 2 weeks) I can start on topological definitions of words from WordNet synonym clusters.
Hmm, so today's main study focus will be revising topology and improving the topology RDF definitions, that is, working out how to define such RDF.
---------------
Yayyy, I created the 50-node cluster, and instead of 200 tasks I set the runtime config to 800 Spark jobs at any one instant. I don't know yet whether 800 tasks will work; I will see now. If it works, the work will at least be more distributed than with 200 Spark jobs at a single instant. Yupp.
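A quick back-of-the-envelope check of what raising the cap means per node, as a Python sketch. The post does not give the instance type, so `cores_per_node` is a hypothetical parameter; the sketch also assumes one core per task, which is Spark's default (`spark.task.cpus` = 1). Going from 200 to 800 concurrent tasks on 50 nodes raises the load from 4 to 16 tasks per node, and any tasks beyond the cluster's total cores have to wait for a free core rather than run in parallel.

```python
def per_node_load(concurrent_tasks, nodes):
    """Average number of concurrent tasks per node."""
    return concurrent_tasks / nodes


def waiting_tasks(concurrent_tasks, nodes, cores_per_node):
    """Tasks that cannot get a core right away, assuming one core per task."""
    return max(0, concurrent_tasks - nodes * cores_per_node)


print(per_node_load(200, 50))  # 4.0
print(per_node_load(800, 50))  # 16.0
```

Whether 800 "works" then mostly depends on whether each node has about 16 free cores (and enough memory) for those tasks.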