Posts

Showing posts from April, 2023
yayyy so the project goal of setting up a particle physics lab in the ocean is making good progress: to invent new particle physics methods and, if possible, a better tomography device. hmm so by November I believe AI would be at a version capable of code-generating a quantum algebra and quantum mechanics simulator to build a quantum computer design. yayyy :) I think if I study like this for the rest of the days until November, AI by then would have generative science capabilities to quickly build science projects. Hmm, this first version's tasks are a mixture of various studies, but 0.2 would have fewer tasks and be more topology-RDF-grammar focused. hmm, I would finally finish studying those PhD books by the 0.2 version of the AI. yayy :) The first goal is building a particle physics lab in the middle of the ocean, to investigate particle physics theories crafted by AI. Or, similarly, creating a quantum computer and creating new quantum algebra designs with AI, while also writing exist...
I have been studying all day, without resting, to understand transformer code usage, and also PyTorch and NumPy. But there is something weird: e.g. BERT treats the paragraph or sentence scope as having a fixed length of 512 tokens, and the encoder generates an intermediate feature result at that length. But then the dropout functionality is written against the actual paragraph/sentence length, which is not 512. I initially modified things to extend them naively, but that's illogical, at least for the feature vector. I searched for a library that does constituency parsing and dependency graph generation and am currently working to understand it, to understand transformers more and learn their libraries (PyTorch and transformers), but there is such inconsistency. I would eventually construct a neural network model that does constituency parsing, after I learn from such PyTorch/transformers examples. But getting used to these advanced indexing schemes also takes ti...
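(For context, this length mismatch is usually handled with padding plus an attention mask rather than by extending anything: the tokenizer pads every sequence to the model's maximum length and marks which positions are real. A minimal sketch with the Hugging Face transformers library, using the standard bert-base-uncased checkpoint:)

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Pad the sentence out to BERT's fixed 512-token scope;
# attention_mask is 1 for real tokens and 0 for padding.
enc = tokenizer(
    "A short sentence.",
    padding="max_length",
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
out = model(**enc)
print(enc["attention_mask"].sum())   # number of real (non-padded) tokens
print(out.last_hidden_state.shape)   # torch.Size([1, 512, 768])
```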
yayy now it's time to fix some of the neural network's issues! yayyy, the AI project starts today, exactly now. hmm, first task: solving the library's issue that was expected to let a pretrained BERT system predict constituency and parse graph information. After this task the topology task resumes: converting the PTB tag graph to a topological language to be designed. That study would start with first revisiting some topics other than topology, since I forgot some of them, to define the basic topology grammar's open set concept etc. So today I would revisit some theoretical studies after the neural network starts to work correctly/consistently. (I have not coded it; right now I initially use a preexisting library, but it has some issue in some configuration that I would have to refactor. I think through this code library package I would get more introduced to the transformers library and its encoder/decoder methods.)
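(The post doesn't name the library, but as an illustration of what running a pretrained BERT-based constituency parser looks like, here is a minimal sketch with benepar, one such library; benepar_en3 is its standard English model:)

```python
import benepar
import spacy

benepar.download("benepar_en3")   # one-time model download
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

doc = nlp("The topology grammar maps sentences to open sets.")
sent = list(doc.sents)[0]
print(sent._.parse_string)        # bracketed PTB-style constituency tree
```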
( yeppp, something I saw made me think about whether I have a coder friend I have interacted with. hello :) lots of cheers :) ) (or it could be a coincidence and there is no coder friend. if it's not a coincidence and I do have a coder friend, cheers to the coder friend)
yayyy another awesome day starts, continuing to build the AI project yayy :) hmm, Monday is a holiday, so if I resolve the dependency graph problem today, hmm, then I would need to continue very quickly with the RDF definition of the topology grammar: to create an initial topology RDF version and start building the topological counterparts of the POS tags of the PTB. And it would also mix semantic RDFs into that, without a definite separation of syntax/semantics; I mean the translated RDF knowledge graph would basically be defined in the topology RDF. hmm, so as soon as dependency graph generation is observed, I don't directly switch to the problem of ontology concept prediction tasks, but rather move to a more theoretical task: the problem of the initial topology language design / the concept of a topology grammar. I think the task of ontology signature concept prediction is also very challenging and needs to be solved ASAP, but that can wait. But that would also requi...
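(The topology RDF itself isn't designed yet, so the following is only a hypothetical sketch of what a first vocabulary might look like, using rdflib; every term in the topo: namespace is invented here purely for illustration:)

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

TOPO = Namespace("http://example.org/topo#")   # hypothetical namespace

g = Graph()
g.bind("topo", TOPO)

# Toy vocabulary: open sets as a class, a noun tied to an open set.
g.add((TOPO.OpenSet, RDF.type, RDFS.Class))
g.add((TOPO.Noun, RDF.type, RDFS.Class))
g.add((TOPO.denotesOpenSet, RDF.type, RDF.Property))
g.add((TOPO.dog, RDF.type, TOPO.Noun))
g.add((TOPO.dog, TOPO.denotesOpenSet, Literal("U_dog")))

print(g.serialize(format="turtle"))
```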
yayyy today's task would be: resolving the library's problems for the pretrained BERT model, hopefully without a lot of effort, then checking the constituency/dependency graph information the AI generates. Yepp
Today's accomplishment was nice: finishing the entire course study in 5 consecutive hours of studying. But that's not enough for this AI project's schedule requirements, so I then continued, this time checking the ML libraries to integrate; that will be continued tomorrow. Now it's time to revise the topology study, which would actually be fun! Even if the grammar I design would most probably be silly, it's nevertheless fun :) (designing a topology grammar). I am already thinking of design principles for the grammar while studying the linguistics utterance types and token types that such a grammar would represent topologically. It's kind of fun to be going to create a topological universe definition, but starting with basic topological constructs, set-based constructs, hmm, and then mappings. hmm, in some cases verbs are somewhat isomorphic to the mapping concept in topology, but not always. And there is more in lambda calculus to the study of functions in topology ...
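(As a toy illustration of the "verbs as mappings" idea, my own sketch rather than anything from a book: in topology a map f: X → Y is continuous iff the preimage of every open set is open, and that check is easy to write for finite spaces:)

```python
def preimage(f, subset, domain):
    """Elements of the domain that f sends into the given subset."""
    return frozenset(x for x in domain if f[x] in subset)

def is_continuous(f, open_sets_X, open_sets_Y, X):
    """f is continuous iff every open set of Y pulls back to an open set of X."""
    return all(preimage(f, U, X) in open_sets_X for U in open_sets_Y)

# Tiny finite example: X with the discrete topology, Y = {0, 1}.
X = frozenset({"a", "b"})
open_sets_X = {frozenset(), frozenset({"a"}), frozenset({"b"}), X}
Y = frozenset({0, 1})
open_sets_Y = {frozenset(), frozenset({0}), frozenset({1}), Y}

f = {"a": 0, "b": 1}   # a hypothetical "verb" acting as a map between sets
print(is_continuous(f, open_sets_X, open_sets_Y, X))   # True
```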
yayyy it's 01:26 am and let's study the topology revisit and note down grammar designs. (Since, silly me, I ate some food at this hour, and it's unreasonable to rest now with this many calories taken in; it would directly turn to fat, I guess, if I rest now.) (Let's spend some calories, with some topology study/grammar design, before resting today, since I took in those calories 20 minutes ago.)
yayyy the fun of topology language grammar design starts during the topology revisiting study. I know my initial topology grammar design might be stupidly designed, since I am not a maths expert, so I anticipate the grammar I design would not be very well designed. But whether designed well or not, it would work as something an inference SQL engine can later be defined over. yayyy, after those sentences in the above paragraph reflecting a lack of courage about the status quo of the topology grammar design, let's attend to this topology grammar design task while in tandem revising topology knowledge from this BSc topology topics book (it's not even the PhD topology topics; I left those maybe a year ago and have not continued since). There are many points where I lack courage or reflect a lack of self-confidence in my capability to accomplish the AI project's tasks, like in the very first paragraphs, where I reflected my lack of confidence in the topology grammar task. But I would design bu...
It seems the source code I tried has had some errors regarding matrix multiplications. I might need to edit it to make it run, and meanwhile I would learn the PyTorch/transformers libraries. hmm, it seems I would need to refactor this code to make it possible to run with pretrained BERT models. hmm, but not today; I think I feel tired now. I already studied a lot today (finished a linguistics courseware PDF, also studied PTB-like topics, and started running this neural network configuration, etc.). hmm, I would bring this neural network to a correctly-predicting state tomorrow. I think that's enough coding-related tasks for today. Now I would either watch some TV show or read topology. I think I would read topology, since I need to revise it soonest (I mean revising the BSc topology introduction). Also, I am slightly bored of watching TV shows in every rest time; this time I would study the topology task in today's rest time. It's already around 22:45. I guess I would...
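(Not the library's actual bug, but for illustration: PyTorch matrix-multiplication errors are almost always inner-dimension mismatches, and printing shapes right before the failing line is the quickest diagnosis:)

```python
import torch

hidden = torch.randn(8, 512, 768)   # (batch, seq_len, hidden_size)
w_good = torch.randn(768, 256)
w_bad = torch.randn(512, 256)

out = hidden @ w_good               # OK: inner dims match -> (8, 512, 256)
print(out.shape)

try:
    hidden @ w_bad                  # inner dims 768 vs 512 do not match
except RuntimeError as e:
    print(e)                        # "...shapes cannot be multiplied..."
```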
yayy, now I would run a pretrained neural network that does both constituency and dependency parsing, an NN that uses a pretrained encoder/decoder network. But I also need to revise the other machine learning methods, e.g. visualization libraries. And I also forgot how the logistic regression / logit classification maths went, hmm. hmm, there are lots of neural network topics I forgot, since I have not been doing machine-learning-engineering-type tasks for a long time. I also need to re-study the transformer's internal query-key-value-based attention mechanism, but not now; initially I would just check the dependency graph parsing results. But this language-modeling-integrated layer model could be used for even ontological tasks, hmm, nice. I mean, not just this specific task of dependency graph prediction; ontological predictions could maybe also be accomplished. E.g. take a verb, and take an ontological signature. If we encoded ontolo...
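(As a refresher on the piece I need to re-study: scaled dot-product attention is softmax(QKᵀ/√d_k)·V, a few lines in PyTorch. A generic sketch, not this particular parser's code:)

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Padding positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V

Q = K = V = torch.randn(1, 512, 64)   # (batch, seq_len, d_k)
print(scaled_dot_product_attention(Q, K, V).shape)   # torch.Size([1, 512, 64])
```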
yayyy, rested and now back to the fun AI project!
Some rest time now (1 or 2 hours or so).
yayyy, I got introduced to computational linguistics in the last 5 hours or so. Now it's time to learn more about the POS tags, and then some parser analyzing time, to try to set up some constituency/dependency linkage studies/algorithms. hmm, with such a first definite linkage method, I might start creating knowledge-graph-based representations with topological RDFs. -----------------------------------------------------------------------------------------
hmm, so now I would analyze the POS tag sets referenced in the lecture to understand each POS tag. hmm, nice to have learnt about PropBank also; that might be very useful in the topological semantics translation.
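(For a concrete picture of both, NLTK ships a PTB-style tagger and a PropBank corpus reader. A minimal sketch; the example sentence is mine, and the exact frame ids depend on the corpus sample:)

```python
import nltk

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("propbank")
nltk.download("treebank")

# PTB-style POS tags for a sample sentence.
tokens = nltk.word_tokenize("The parser links every noun to an open set.")
print(nltk.pos_tag(tokens))
# [('The', 'DT'), ('parser', 'NN'), ('links', 'VBZ'), ...]

# A PropBank instance: one predicate with its semantic-role arguments.
from nltk.corpus import propbank
inst = propbank.instances()[0]
print(inst.roleset)     # a frame id such as 'join.01'
print(inst.arguments)   # (tree pointer, role label) pairs like ARG0, ARG1
```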
Some rest time now, yepp.
yayy, the course is finished / studied through.
yayy, more than half of the lectures are done, so I would give myself some rest time. 12 lectures remain, but this was an introduction to computational linguistics, to revise basic algorithms like Naive Bayes / SVM etc. More important is fully learning the PTB standards, and the algorithms regarding parse trees / constituency information. Then I need to revise topology ASAP, to craft a very initial, very basic topological RDF to translate these into, then start defining various POS topology representations, then move to the verb topology representations study. Initially, very basic versions of the topological representations: they would not be very polished RDFs initially, but a baseline to build more refined versions on afterwards. Then I would need to craft the constituency relations between such RDFs of paragraphs / the linkages between them. That could also be accomplished with pretrained models and some unsupervised modelings/metrics, even if not very accurate. I think...
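(As a quick refresher on the "basic algorithms" the course revises: a bag-of-words Naive Bayes text classifier in scikit-learn is a few lines. The toy data below is invented for illustration:)

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data, invented for illustration.
texts = ["open sets and continuity", "attention heads and tokens",
         "compact spaces and covers", "embeddings and gradients"]
labels = ["topology", "ml", "topology", "ml"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["gradients of the attention layer"]))   # -> ['ml']
```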
yayy, some rest time before studying the linguistics course.
Continuing linguistics learning from now: LING 1330/2330 Computational Linguistics (pitt.edu). yayy, the goal now is to quickly study all the courseware in this course, to have a much better initial understanding of the linguistics methodologies/algorithms. The following task would then be continuing the study of treebank domain knowledge, to get to know the PTB-2 syntax, then trying to analyze that PTB-2 extractor neural network. hmm. Then I need to devise the paragraph dependency graphs, in terms of linkages between paragraphs, and a covering RDF definition set to translate such interpreted language into. hmm, lots of tasks, but 3 days of holidays. I guess I wish to iterate a lot and reach a parse tree / treebank parser, to translate it to such a topological ontological language / knowledge graph (interpretation). hmm, today would most probably pass with linguistics domain knowledge study, then later topology studies: revisiting/remembering basic topology information (not PhD but BSc t...
Today's study of treebank grammars starts with reading this article: TreebankGrammars.pdf (cmu.edu). hmm, after reading: this has been a very nice introduction to linguistics methodologies :) ------------------------------------------------------------- Now more treebank tutorials/lectures to study, to get expert in treebanks and the commonly known linguistics algorithms/methods. hmm, I randomly search Google for articles to study. hmm, having now partially finished reading the second article, I did not like it, so I removed it from the blog paragraph after reading it. hmm, I would now find another article on linguistic methodologies (PTB), again from Google. But I did get some basic knowledge regarding the probabilistic space of CFG, TSG and TAG-like modelings from the very first article, and I would read more to understand these corpus-based probabilistic estimator constructions. hmm, one more thing: I got introduced to NLP ...
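(To make "corpus-based probabilistic estimators" concrete: NLTK can induce a PCFG directly from Penn Treebank trees, estimating each rule's probability from its corpus counts. A minimal sketch using NLTK's bundled treebank sample:)

```python
import nltk
from nltk import Nonterminal, induce_pcfg
from nltk.corpus import treebank

nltk.download("treebank")

# Collect rule occurrences from a slice of the treebank sample.
productions = []
for tree in treebank.parsed_sents()[:200]:
    productions += tree.productions()

# Relative-frequency estimate: P(rule) = count(rule) / count(lhs).
grammar = induce_pcfg(Nonterminal("S"), productions)
print(grammar.productions()[:5])   # rules annotated with probabilities
```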
hmm, yesterday I started introducing myself to NLP/linguistics topics. E.g. I had known Part-of-Speech tags slightly before, but now I am learning more details. And I started setting up the machine learning library dependencies, to start running the constituency parsers and dependency extractors, and the parse tree conceptual constructs/concepts. hmm, specifically the PTB dependency constructs are interesting to check and to later transform to the RDF with topological definitions. hmm, so today's task set would be mostly continuing the NLP/linguistics study, plus initiating the design of the translated RDF: the topological RDF definition of language constructs, and also tying the Wikidata ontological definitions to the topological RDF definitions. hmm, then all of these would be stored mainly in Neo4j, I think. Then I might also create a query language format for Neo4j, other than its usual SQL-like query interface, specific to inference capabilities over this multitude of ...
hmm, yesterday I switched to neither the ML task of dependency extraction nor topology, because I was very tired, but I would resume today. hmm, most data is now in a subject-predicate-object table yayy :) hmm, so today: initially I would study some NLP tools to check/analyze dependency extractors, starting with some encoder/decoder-method GitHub library, and also others. Then I would continue working with this table, to analyze its ontological hierarchies/data, and to reduce the table and create some other partitioned tables. Then later I would create local Neo4j databases for the clustered ontologies. Then I also need to start writing the WordNet ontology topological definitions with an RDF tool, on Sunday or Monday hopefully, to start translating language to its topological meaning/definition sets, I mean for verbs. hmm, then after that I would need to create the code that binds paragraphs of pages, plus a probabilistic-wise interpretation RDF design for read sentences o...
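(Loading a subject-predicate-object table into Neo4j can be sketched with the official Python driver; the connection details, labels, and the sample Wikidata triple below are placeholders, and execute_write is the recent driver's transaction API:)

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))   # placeholder creds

def add_triple(tx, s, p, o):
    # One relationship per subject-predicate-object row.
    tx.run(
        "MERGE (a:Resource {uri: $s}) "
        "MERGE (b:Resource {uri: $o}) "
        "MERGE (a)-[:REL {predicate: $p}]->(b)",
        s=s, p=p, o=o,
    )

with driver.session() as session:
    session.execute_write(add_triple, "wd:Q42", "wdt:P31", "wd:Q5")
driver.close()
```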
yayyy, AI project start time now! :) Monday is a holiday, so 3 days for the AI project yayyy :)
Just a minute: it's not 23 billion records (parsed via Jena), it's 23 trillion records. Wow. I generated 23 trillion records in 7.5 hours, with the notebook on the 50-node cluster setup, in 7.4 hours total :) nice :) (This was a challenge, reaching the peak-efficiency condition of setting up the cluster fully utilized to process this much data :) but the challenge was solved this past week :)) I would later share the Zeppelin and AWS EMR configuration settings that keep all nodes always used under the FAIR scheduler to process this much data in this little time (23 trillion records in 7.5 hours :)), so it can be helpful for anyone who needs such a config. But of course I won't share my notebook, or my code for keeping up to 400 Spark jobs running at the same time :) I won't share my silly notebook, which is coded with silly fors and whiles, some silly C-like code, and also some Scala futures logic there, to keep the cluster very utilized at all times (e.g. always 400 Spark jobs)...
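(The exact settings aren't shared in the post, but the generic Spark side of such a setup is public: enable the FAIR scheduler and assign jobs to a pool, so many concurrently submitted jobs share the executors instead of queueing FIFO. A minimal PySpark sketch; the pool name is a placeholder:)

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ttl-parsing")
         # FAIR mode round-robins tasks across concurrently submitted jobs,
         # so hundreds of small jobs keep every executor busy.
         .config("spark.scheduler.mode", "FAIR")
         .getOrCreate())

# Jobs submitted from this thread go into the named pool.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "parsing")
```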
Hey, the last day's record entries into the table amounted to about 23,104,000,000,000 resultant records in 7:30 hours (in a 50-node cluster, doing the parsing of the TTL with the Jena parser). hmm, but there are a lot of duplicates, due to the tumbling-window approach; I would reduce the row count later. Wow, 23 billion records resulting from the Jena parser run. Nice :) I mean, even if the code notebook is not very neat, running this parsing of the ontology DB works nicely imho, even though at the 7:30-hour mark it somehow says the cluster terminated. I am happy to have managed to set up the Zeppelin notebook code to handle huge data silo tasks like this. hmm, the last days have brought something like 200 euros of extra bill due to this task, but it's OK; it's 23 billion rows of data parsed with Jena. I really like AWS very much :) I mean, being able to spawn 50 cluster nodes at once :) and processing data silos at once :) nice :) I mean, it's kind of an affordable cluster solution. I saw other clouds' big data Spark solutions and ...
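(The duplicates from the overlapping windows can later be collapsed with a single PySpark dropDuplicates over the triple columns; the column names and paths below are hypothetical:)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://my-bucket/triples/")   # placeholder path

# Keep one copy of every subject-predicate-object triple.
deduped = df.dropDuplicates(["subject", "predicate", "object"])
deduped.write.mode("overwrite").parquet("s3://my-bucket/triples_dedup/")
```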
My lack of recent experience with such notebook tasks resulted in a bill of about 450 euros this month in cloud costs, due to constantly wasting a lot of CPU time: spawning a lot of clusters (50 nodes or so), debugging there for hours or trying to figure out the expected settings to run the tasks, and lots of data processing. But it's still cheap, I mean, to be able to spawn a 50-node cluster and do ML or feature engineering instantly. I anticipate at least 500 or 1000 euros more of such bills in the coming month, since I would continue to work with lots of data silos during this ontology-study phase of the project. It's that I have been spawning the 50-node cluster a lot, and it's anticipated to stay like this, since I spend unnecessarily many debugging hours with the 50-node cluster. Doesn't matter. ayy, today these new headphones would add the nostalgia of listening to epic rock to my AI study time :) I used to like metal music a lot :D so today, some nostalgia time with roc...
(I bought myself a gift. hmm, this noise-cancelling headphone is really cool :)) (Not that I ever bought such expensive headphones before, but I bought Bose headphones to gift myself during the AI project yayy :)) (a gift to myself)
hmm, yesterday I finally reached the stable Spark app condition, but I had to switch to the method of listing S3 files to get the partition folders page by page. Again, at any one instant at most 400 Spark jobs are submitted to the 50-node cluster. It seems to be working OK; only 4 hours have passed, and it seems about 20 more hours would also be necessary. Then later some of the row count would be reduced. hmm, it currently seems the data would initially end up as about 300k partitions of 2.5 megabytes or so, I think, but those would also be reduced to a tenth of that. Then the Neo4j DB, along with the ontology hierarchy parsings, would be crafted. ayy, yesterday passed entirely with reconfiguring the Spark job and finding where it gets stuck. It turned out the initial partitionId was very sparse, and just iterating over it is not possible, due to the original monotonically increasing row id mechanism's nonstandard increases. So I had moved to listing partition directories and adding them to Spark jobs, for once a...
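(Listing the partition folders page by page is exactly what the S3 API's paginated listing with a "/" delimiter gives you. A minimal boto3 sketch; the bucket and prefix are placeholders:)

```python
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Each CommonPrefix is one partition "folder"; pagination returns them
# page by page instead of iterating over a sparse partitionId.
partition_prefixes = []
for page in paginator.paginate(Bucket="my-bucket",
                               Prefix="ontology-table/",
                               Delimiter="/"):
    partition_prefixes += [p["Prefix"] for p in page.get("CommonPrefixes", [])]

print(len(partition_prefixes))
```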
Last day: yayyy, I readied the code for this logic: at any one run time there are about 200 Spark jobs at once (on a 50-task-node cluster), and the thread waits for an application to finish before creating a new Spark job; I mean there are always at most 200 jobs issued to the cluster at once. hmm, I anticipate this to run at least 2 days, to take these roughly 1000 billion rows from 50 million partitions down to 5 million partitions. Before, there was a partition for every 20k rows; now it would be around 100k rows. Currently each partition does not actually have 20k rows, because the consecutive row id assigner does not always increase by the same step, so it's more sparse. But it doesn't matter; 5 million partitions is initially better than 50 million partitions of sparse data. hmm, so this would, I guess, run for 2 days or slightly more, and then, yayy, the first DB table would be ready. After this task completes/starts, I would continue, this time with neu...
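(The "at most 200 jobs in flight" logic can be sketched with a thread pool whose size is the concurrency cap: each worker thread submits one Spark job and blocks until it finishes, so a new job starts only when a slot frees up. The job function here is a placeholder, not the post's actual code:)

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_partition(prefix):
    """Placeholder: run one Spark job over one partition directory."""
    ...

prefixes = []   # fill with e.g. the S3 partition folders listed earlier

# max_workers caps how many jobs are in flight at once; a finished
# thread immediately picks up the next pending partition.
with ThreadPoolExecutor(max_workers=200) as pool:
    futures = [pool.submit(process_partition, p) for p in prefixes]
    for f in as_completed(futures):
        f.result()   # re-raise any job failure
```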
hmm, I checked the DL templates for running a Jupyter notebook. I decided I would initially run a usual EC2 instance for that, to set up the Jupyter environment and run the dependency graph predictions with the neural network, as a development environment. Then, if one day training the NN becomes necessary, I could spawn GPU-type environments, or Habana environments with 100 cores or so. hmm, now I would continue with the topology study. So tomorrow's tasks would be: continuing the Jupyter notebook setup there first (hmm, I could also test on my PC; I have Jupyter installed there too), and creating the distributed 200-Spark-apps logic with the 50-node cluster task. hmm, so I would continue revising topology now.
hmm, I figured out that it was due to 20,000-line chunks of TTL being parsed with difficulty by Jena somehow. When I used e.g. 4,000-line TTL chunks, it could parse much faster. hmm, so the lesson learnt is: don't use ~20k-line chunks when parsing TTL with Jena. It does not scale at all linearly with respect to the line count; there is some nonlinear behavior, and the task just looks as if it never completes. hmm, now I have set up a cluster with 1 task node, and it seems to have already processed a lot of partitions very quickly. So I had to do an optimization to reduce the row sorting; e.g. this was used to get sorted text from each partition in storage:

```scala
val grouped_df = df
  .groupBy("partitionId")
  .agg(sort_array(collect_list(struct("rowId", "value")))
    .alias("collected_list"))
  .withColumn("data", concat_ws("\n", col("collected_list.value")))
  .drop("collected_list")
```
yayyy, studying starts now again. hmm. Yesterday I had no energy left to start studying the dependency graph creation part, since most of the time went to trying to figure out the computational efficiency problems of building the initial TTL table from Wikidata, and later the Neo4j format. So today I feel it's now time to start checking the dependency-graph-creation-enabling neural network. hmm, I wish to add some topology studying today too. hmm, yesterday was like being lost in the original raw table of unprocessed TTL data, testing various partition schemes. hmm, I had partitioned it into partitions of at most ~20k rows, and it seems Apache Jena has problems parsing that many lines; I mean it parses, but slowly. I might further initialize Jena with e.g. 1,000 lines of data from there, but with Spark-wise windowing (tumbling-window type) of every such 20,000 record entries, or something like that, as sketched below. I would also check how much CPU time it takes for ...
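(A tumbling-window-style regrouping like the one described, small fixed-size chunks keyed off the row id so each Jena parse stays small, might look like this in PySpark; the chunk size and column names follow the earlier Scala snippet:)

```python
from pyspark.sql import functions as F

CHUNK_SIZE = 4000   # Jena handled ~4k-line TTL chunks far better than 20k

# df: the raw TTL table with columns rowId (long) and value (one TTL line).
chunked = (df
    .withColumn("chunkId", F.floor(F.col("rowId") / CHUNK_SIZE))
    .groupBy("chunkId")
    .agg(F.sort_array(F.collect_list(F.struct("rowId", "value")))
          .alias("rows"))
    # Rebuild each chunk's TTL text in row order, one chunk per record.
    .withColumn("data", F.concat_ws("\n", F.col("rows.value")))
    .drop("rows"))
```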
Wow, there are ~996,461,580,000 rows, actually 996 billion rows, near to 1 trillion rows! So I am trying to figure out the economic processing of this many rows, I mean when constructing the ontology DB main table. hmm. Today's tasks: hmm, starting another test with this many rows, to see how many CPU hours some row count takes, to have estimates. Just writing them to cloud storage took nearly 4 hours; reading them partition-wise, extracting the ontology info, and storing them back to another cloud table might take at least twice as many hours. hmm, I am not spawning a huge cluster currently, e.g. at most 3 task nodes instead of the initial 30- or 50-node cluster; I go more iteratively with the 3-node cluster, since I still have time for this table task due to the other tasks that need to be accomplished. hmm, so first I would check how many hours the processing of some partitions takes (the data is already partitioned now), to decide on the cluster size to scale out to. Since tr...
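(The sizing estimate described here is simple proportional arithmetic: measure throughput on a small sample with the 3-node cluster, then scale. All the sample numbers below are hypothetical placeholders, not measurements from the post:)

```python
TOTAL_ROWS = 996_461_580_000      # ~996 billion rows, from the table count

# Hypothetical sample measurement on the 3-node cluster:
sample_rows = 2_000_000_000       # rows processed in the test run
sample_hours = 4.0                # wall-clock hours for that sample
nodes_in_sample = 3

rows_per_node_hour = sample_rows / (sample_hours * nodes_in_sample)
target_hours = 24                 # desired completion time

nodes_needed = TOTAL_ROWS / (rows_per_node_hour * target_hours)
print(f"~{nodes_needed:.0f} nodes for a {target_hours}h run")
```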