Extent DRUGan in order to be able to provide a personalized Data Science
at least what we've done
ex1={"someblog.com":
{
"source":7,
"title":"New type of robotics",
"time_to_read":"approximately 40 minutes",
"short_summary":"Some lines",
"our_tags": ["ML", "NLP"],
"given_by_blog_tags":["Discussion","Project","Research"],
"date": "19 Sep 2018",
"author_name": "name",
"number_of_comments":34,
"preview_picture":"Here will be a png" ,
"github_link":"https://github.com/",
"arxiv_link":"https://arxiv.org/abs/1807.02033v2",
"reddit_link":"https://www.reddit.com/r/MachineLearning/comments/",
"amount_of_facebooks":450,
"amount_of_twits":450
}}
ex2={"arxiv" : {
"id" : "http://arxiv.org/abs/1806.01660v4",
"date" : "19 Sep 2018",
"category" : "cs.[CV|CL|LG|AI|NE]/stat.ML",
"category1" : ["nlp", "CV"],
"title" : "Mask R-CNN",
"authors" : [{"name" : "Andrew Ng", "id" : 123}],
"trackbacks" : "www.kaggle.com",
"arxiv_summary" : "We present an auxiliary task to Mask R-CNN, an instance segmentation network, which......",
"our_summary":"We present an auxiliary task to Mask R-CNN, an instance segmentation network, which......",
"pages_count" : 50,
"Bibliographic":{"data":{
"references" : "another arxiv article",
"citations" : "direct quote",
"similar_abstract" : [],
"also_read" : []}
},
"num_of_submission" : 4,
"Comments" : {
"availability" : "yes",
"text" : "arXiv admin note: text overlap with" ,
"link" : "https://arxiv.org/abs/1802.05155"
},
"figures" : {
"caption_boundary": {
"x1": 152.66566806369357,
"x2": 693.7513987223307,
"y1": 273.42425452338324,
"y2": 284.6669514973958
},
"caption_text": "Table I. Objects carrying charges m\u2032\u00b5 and n\u00b5 in each theory related by dualities.",
"dpi": 100,
"figure_boundary": {
"x1": 262.0,
"x2": 584.0,
"y1": 285.0,
"y2": 433.0
},
"figure_type": "Table",
"name": "name",
"page": 2
},
"video_summary" : {
"videos" :"https://www.youtube.com/user/keeroyz/videos",
"text" : "http://www.shortscience.org/?s=cs"
},
"pdf" : {
"id" : "https://arxiv.org/pdf/1806.01660v4",
"results_conclusion" : "",
"bold_item" : "",
"pages" : 13
}
}}
At first it wasn't a list but a 2,350-line dictionary with a hierarchical structure.
A fragment of that dict:
dict_ex={
"structures used in natural language processing":{
"anaphora":{},
"context-free language":{},
"controlled natural language":{},
"corpus":{
"text corpus":{},
"speech corpus":{}},
"grammar":{
"context-free grammar (cfg)":{},
"constraint grammar (cg)":{},
"definite clause grammar (dcg)":{},
"functional unification grammar (fug)":{},
"generalized phrase structure grammar (gpsg)":{},
"head-driven phrase structure grammar (hpsg)":{},
"lexical functional grammar (lfg)":{},
"probabilistic context-free grammar (pcfg)":{},
"stochastic context-free grammar (scfg)":{},
"systemic functional grammar (sfg)":{},
"tree-adjoining grammar (tag)":{}},
"natural language":{},
"n-gram":{
"bigram":{},
"trigram":{}},
"ontology":{
"taxonomy":{
"hyponymy and hypernymy":{},
"taxonomy for search engines":{}}},
"textual entailment":{},
"triphone":{}}
}
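A minimal sketch of how a nested dict like this could be flattened into a plain list, keeping each entry's parent so the hierarchy is not lost. The helper name flatten_tags and the small dict_fragment are hypothetical, and this is not necessarily how the actual conversion was done.

```python
def flatten_tags(tree, parent=None):
    """Yield (tag, parent) pairs for every key in a nested dict of dicts.

    Hypothetical helper: walks the hierarchy depth-first, parents before children.
    """
    for name, children in tree.items():
        tag = name.strip()  # defensive: drop stray whitespace around keys
        yield tag, parent
        yield from flatten_tags(children, parent=tag)


# Small illustrative fragment in the same shape as dict_ex.
dict_fragment = {
    "ontology": {
        "taxonomy": {
            "hyponymy and hypernymy": {},
            "taxonomy for search engines": {},
        }
    },
    "n-gram": {"bigram": {}, "trigram": {}},
}

# Flat list of tags, plus a lookup from each tag to its parent.
tag_list = [tag for tag, _ in flatten_tags(dict_fragment)]
parent_of = dict(flatten_tags(dict_fragment))
```

Keeping the parent next to each tag means the flat list can still answer questions such as "which broader topic does bigram belong to" without the original dictionary.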
We used a library called sumy, which ships a whole bunch of summarizers; the challenge was choosing the right one.
We ended up choosing Luhn.
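A minimal sketch of calling sumy's Luhn summarizer on plain text. The wrapper name summarize_with_luhn, the default of 4 sentences, and the English language setting are assumptions for illustration; sumy's tokenizer also needs the NLTK punkt data to be installed.

```python
def summarize_with_luhn(text, sentence_count=4, language="english"):
    """Return an extractive summary, one selected sentence per line.

    Hypothetical wrapper; sentence_count=4 is an assumed default.
    """
    if not text.strip():
        return ""  # nothing to summarize

    # Imported lazily so this module still loads where sumy is not installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.luhn import LuhnSummarizer
    from sumy.utils import get_stop_words

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    summarizer = LuhnSummarizer(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)

    # The summarizer returns the chosen sentences of the parsed document.
    sentences = summarizer(parser.document, sentence_count)
    return "\n".join(str(sentence) for sentence in sentences)
```

The other sumy summarizers are called the same way, so comparing candidates mostly comes down to swapping the summarizer class and reading the outputs side by side.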
{"source": "FastAI", "title": "Andrew Ng says Deep learning is the \"New Electricity\"; what this means to your organization fast.ai", "timeToRead": 5, "Readability":"easy to read article","summary": "Deep learning models provide deeper insight and greater accuracy, make existing products better, improve operations (e.g. Google used deep learning to reduce data center cooling requirements by 40%!)\nDeep learning is particularly effective at handling noise in data, and in handling unstructured data - so if your data infrastructure is not in a good state, it is even more important that you invest in deep learning.\nLooking externally for deep learning experts, rather than developing deep learning expertise within your existing staff, means that you will be creating a gap between your domain experts and your new data experts.\nThe best approach, of course, is to do both: hire existing deep learning experts if you can, whilst developing skills of your own team at the same time.\n", "tags": ["ai-in-society", "meta learning", "deep learning", "neural network"], "date": "2016-10-11T00:00:00", "picture": null, "githubLink": null, "arxivLink": null}
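The record above stores its date in ISO 8601 form, while the ex1 and ex2 drafts use strings like "19 Sep 2018". A minimal sketch of one way to convert between the two with the standard library; the helper name to_iso_date is hypothetical, and the %b month abbreviation assumes an English locale.

```python
from datetime import datetime


def to_iso_date(text, fmt="%d %b %Y"):
    """Parse a date such as '19 Sep 2018' and return it as an ISO 8601 string.

    Hypothetical helper; fmt is an assumption based on the ex1/ex2 drafts.
    """
    return datetime.strptime(text, fmt).isoformat()


iso_date = to_iso_date("19 Sep 2018")
```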