Bio-medical NLP
Background
- An online journal or publication site can hold millions of biomedical research papers and articles, with hundreds of papers on a single topic. Organizing this much data in a well-structured manner is difficult, and skimming through it online to find a particular piece of information is rarely practical.
- Tagging each paper by the relevant entities it contains saves readers from wading through everything written on a subject. For instance, there could be around 2 million papers pertaining to medicine. If they are tagged with the entities extracted from them, you can quickly find the relevant information in these articles.
- Unstructured text is rich in information, but finding what is relevant is a challenging task. As the volume of academic articles grows, it becomes both harder and more important to extract, categorize, and learn from that information. Other NLP techniques can help with this, but when you want the categorized data to be well structured, Named Entity Recognition (NER) is your best choice.
Solution
To tag the relevant information in an article, the data is first preprocessed so that it is clean and only relevant content is considered.
Next, an NLP model is trained on the key tags (in this case diseases, genes, drugs, compounds, etc.) to identify the relevant information in each article.
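The two steps above, cleaning the text and then tagging entities, can be illustrated with a minimal sketch. It is not the project's actual pipeline: the `GAZETTEER` term lists and the `preprocess` and `tag_entities` helpers are hypothetical, and a simple dictionary lookup stands in for the trained NER model so that the example runs with the Python standard library alone (Python 3.9+).

```python
import html
import re

# Hypothetical term lists: a tiny stand-in for a trained NER model.
GAZETTEER = {
    "disease": ["breast cancer", "diabetes"],
    "gene": ["BRCA1", "TP53"],
    "drug": ["metformin", "tamoxifen"],
}


def preprocess(raw: str) -> str:
    """Strip HTML tags, decode HTML entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def tag_entities(text: str) -> list[tuple[int, int, str, str]]:
    """Return (start, end, matched_text, label) tuples, sorted by position."""
    hits = []
    for label, terms in GAZETTEER.items():
        for term in terms:
            # Whole-word, case-insensitive match for each known term.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                hits.append((match.start(), match.end(), match.group(0), label))
    return sorted(hits)


if __name__ == "__main__":
    raw = "<p>Mutations in   BRCA1 raise the risk of breast cancer.</p>"
    clean = preprocess(raw)
    for start, end, term, label in tag_entities(clean):
        print(f"{label:8} {term!r} at {start}-{end}")
```

A trained model would generalize to terms it has not seen, which a fixed dictionary cannot do; the sketch only shows the shape of the input and output at each step.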
- Access to the application is restricted.
- A keyword search finds relevant articles in the database.
- Articles are rendered with tags that identify their key terms.
- A graphical representation of the identified terms/keywords is also available.
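As a rough illustration of the search, tagged rendering, and term summary listed above, the following sketch uses a hypothetical in-memory `ARTICLES` dictionary in place of the real database. The `search`, `render`, and `label_counts` helpers and the `[term|label]` markup are assumptions made for the example; access restriction and the actual chart are left out.

```python
import re
from collections import Counter

# Hypothetical in-memory "database": article id -> (text, tagged entities).
ARTICLES = {
    "a1": ("Metformin is widely used to treat diabetes.",
           [("Metformin", "drug"), ("diabetes", "disease")]),
    "a2": ("TP53 mutations are common in breast cancer.",
           [("TP53", "gene"), ("breast cancer", "disease")]),
}


def search(keyword):
    """Return ids of articles whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [aid for aid, (text, _) in ARTICLES.items() if needle in text.lower()]


def render(text, entities):
    """Wrap every tagged term as [term|label] in a single pass over the text."""
    labels = {term.lower(): label for term, label in entities}
    if not labels:
        return text
    # Longest terms first so a longer term wins over a shorter one it contains.
    terms = sorted(labels, key=len, reverse=True)
    pattern = re.compile(
        "|".join(r"\b" + re.escape(t) + r"\b" for t in terms), re.IGNORECASE
    )
    return pattern.sub(
        lambda m: f"[{m.group(0)}|{labels[m.group(0).lower()]}]", text
    )


def label_counts(article_ids):
    """Count entity labels across articles: the data behind a bar chart."""
    return Counter(
        label for aid in article_ids for _, label in ARTICLES[aid][1]
    )


if __name__ == "__main__":
    for aid in search("cancer"):
        text, entities = ARTICLES[aid]
        print(aid, render(text, entities))
    print(label_counts(ARTICLES))
```

The counts returned by `label_counts` could be passed to any plotting library to produce the graphical view.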
Pre-processing Data
Application