Topics for each week

  • General

    The exam is on 27 January at 10:00 in K 106/107. You will be allowed to use a computer and the internet.

    Literature:

    • http://labs.rd.ciencias.ulisboa.pt/book/ => This book is used in this class
    • Linux command-line: https://nostarch.com/tlcl2
    • https://www.w3schools.com
      • tutorials about web-related technologies, e.g. XML, XPath

  • 1 October - 7 October

    This week we had an introduction to the THD ecosystem and did not yet start with the actual course topics. Everyone introduced themselves and their background.

  • 8 October - 14 October

    This week we had presentations by Marius, Navdeed, and Varun, followed by discussions. The topics were open data in life sciences, homomorphic encryption, and predictive analytics in life sciences.

  • 15 October - 21 October

    This week we start working with the computer shell. We will use the following book and slides:

    http://labs.rd.ciencias.ulisboa.pt/book/

    During the class we will use a Linux shell environment to learn how to use the shell. You can either bring your own computer with Linux or use the Linux server at THD. Details on how to connect to the THD server will be provided during the class.

    Interesting Links:

  • 22 October - 28 October

    Agenda for 22nd October

    • Feedback
    • recap of last week
      • structured vs unstructured data
        • e.g., table data vs running text
      • command line tools vs Python
        • e.g., available for many decades vs since the 1990s
        • suitable for easy tasks vs more complex tasks like algorithms
      • open data
        • e.g., https://wikidata.org
      • open source
        • what are the advantages & disadvantages?
      • open data formats
        • e.g., csv, xml, jpg, mp3
    • why are we using Linux/Unix instead of Windows?
    • what is a command-line interface?
      • nowadays we mostly interact with the operating system through a graphical interface, using mouse and keyboard
      • an alternative way of interacting with the computer is typing commands
    • exercises with command-line tools
      • ls
        • list
      • cd
        • change directory
      • mkdir
        • make directory
      • rm
        • remove files or directories
      • rmdir
        • remove empty directories
      • nano
        • simple text editor for use in the terminal
      • cat
        • concatenate
        • for reading the files and outputting their content directly to the command-line
          • e.g., cat horse-names.txt
        • also used for concatenation of two files
          • cat horse-names.txt cat-names.txt
      • curl
        • client for URL
        • for downloading webpages, files
      • grep
        • global / regular expression / print
        • global search for a regular expression and print out matching lines
      • >
        • redirection operator
        • for storing the output of a command in a file
        • e.g.,
          • curl aydos.de > source-of-aydos.de.html
      • awk
        • tool for manipulating structured data like csv, tsv
        • the name is formed from the initials of its authors: Aho, Weinberger, and Kernighan
      • sed
        • stream editor
        • for manipulating strings (text)
      • xargs
        • passes the output of the previous command as arguments to the next command in a pipeline
      • |
        • piping symbol
        • e.g., curl aydos.de | grep podcast
      • xmllint
        • for parsing and querying XML files (e.g., with XPath)
      • tr
      • sort
      • head
    • Creating an Amazon Search script
      • firefox amazon.com/s?k=$1
    • Analyzing the number of male horses on Wikidata
      • https://query.wikidata.org
      • Examples -> horses
      • downloading the CSV
      • cat horses.csv | grep -w male | wc -l

    For further practice with the Unix/Linux shell, please use Section 3.2 of the book.
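The male-horse count from the agenda can be tried offline on a tiny sample. The following is only a sketch: the file name horses-sample.csv and its three rows are made up for illustration and are not the actual Wikidata export.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical sample data; the real horses.csv from query.wikidata.org looks different.
printf '%s\n' \
  'horse,sex' \
  'Alpha,male' \
  'Bravo,female' \
  'Charlie,male' > "$workdir/horses-sample.csv"

# grep -w matches "male" only as a whole word, so "female" is not counted.
# wc -l counts the lines that are left.
male_count=$(grep -w male "$workdir/horses-sample.csv" | wc -l)
echo "male horses: $male_count"

# Without -w, lines with "female" also match because the word contains "male".
any_count=$(grep male "$workdir/horses-sample.csv" | wc -l)
echo "lines containing male: $any_count"

rm -r "$workdir"
```

Comparing the two counts shows why -w matters for this particular query.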

  • 29 October - 4 November

    Preparation

    • Chapters 2 and 3 of the book.
    • to learn about the Unix (or Linux) shell, read Section 3.2
      • we covered most of the material last week
    • If you do not have Linux on your computer, you can use the Linux server I set up. If you do not have an account for this server, please send me an email.
      • you can also use this server from home
        • connect to the university network using VPN
        • install x2go client
        • create a new session using
          • your username
          • host: linux.aydos.de
          • select LXDE as the window manager instead of KDE
          • connect

    Agenda for 29th October

    • Feedback
    • election of students' spokesperson for LSI students
      • central information dissemination point
      • the spokesperson collects all the concerns of the LSI students and is the contact person for the LSI course assistant, Johanna Aschenbrenner
      • Naveed was elected as the spokesperson
    • Recap of last week
  • 5 November - 11 November

    Preparation

    • To get messages from me regarding our class, you should enroll (einschreiben) in this class.
    • Command Line interface (1st priority)
      • section 3.2 of the second edition draft of the book (http://labs.rd.ciencias.ulisboa.pt/book/ => second edition draft)
      • you should practice commands on a command line shell
      • ways to play with a command line on your computer:
        • https://webminal.org
        • use the Linux server that I provided
        • you can install Linux on your Windows using Windows subsystem for Linux
          • Before installing Ubuntu on Windows or before the first run please open the Control Panel, visit Programs and Features' submenu Turn Windows features on or off and select Windows Subsystem for Linux. Click OK, reboot, and then your system is ready to run this app.
          • search for ubuntu in Windows Store and install it
        • install Virtualbox and Ubuntu
        • the best way is to have Ubuntu locally on your computer
      • if the book is too hard, then search for a Linux command line tutorial
    • Chapters 2 and 3 of the book, at least section 3.2.

    Agenda for 5th November

    • Feedback
    • please enroll in this class
    • recap last week
    • Data Resources
      • biomedical text
      • Semantics
    • Data Retrieval
      • Web Identifiers
      • data retrieval
      • data extraction
      • task repetition

    Exercise for next week

    Please do the exercise before our class next week. We will only discuss the results during the class.

  • 12 November - 18 November

    Agenda for 12th November

    • Feedback
      • work with NCBI
        • we already worked with Uniprot, ChEBI, and Wikidata. Working with NCBI follows a similar approach (using a web API)
    • recap last week
    • exercise sheet
    • Data Retrieval
      • data retrieval
      • data extraction
      • task repetition

    TODO for me

    also introduce $()

    prepare solutions for the exercise sheet

  • 19 November - 25 November

    Agenda for 19th November

    • locating anaconda
      • you can write the whole path /opt/anaconda3/bin/....
    • how to set up PATH
    • sudo apt install curl
    • Feedback
    • recap last week
      • files vs strings
        • piping only works when a command prints to standard output (there is also standard error, which does not go through the pipe)
      • xargs
        • data coming from the pipeline vs argument of a command
        • cat proteins | xargs ./lookup (xargs transforms this to) => ./lookup a
      • wildcards
        • ls * (not always the same as) ls | xargs echo
        • ls index*1
      • & => ampersand on command-line
        • anaconda-navigator & => does not block the command-line
      • https://www.ncbi.nlm.nih.gov/pubmed/?term=2298749&report=abstract&format=text
      • grep -o => print only the matching output
    • Data Retrieval - XML Processing
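The recap points on xargs and on standard output vs standard error can be reproduced with a small experiment. This is a sketch under assumptions: the file proteins.txt and the two identifiers in it are made up, and echo stands in for the ./lookup script used in class.

```shell
workdir=$(mktemp -d)
# Hypothetical identifiers, one per line.
printf '%s\n' P12345 Q67890 > "$workdir/proteins.txt"

# echo ignores standard input, so the piped data is lost and only an empty line is printed.
from_stdin=$(cat "$workdir/proteins.txt" | echo)

# xargs reads standard input and appends it as arguments: this runs "echo P12345 Q67890".
from_xargs=$(cat "$workdir/proteins.txt" | xargs echo)

# xargs -n 1 runs the command once per input item instead of once for all items.
per_item=$(cat "$workdir/proteins.txt" | xargs -n 1 echo lookup)

# Only standard output travels through a pipe. ls reports the missing file on
# standard error (discarded here with 2>/dev/null), so wc -l receives no lines.
piped_lines=$(ls "$workdir/no-such-file" 2>/dev/null | wc -l)

echo "via stdin: '$from_stdin'"
echo "via xargs: '$from_xargs'"
echo "$per_item"
echo "lines through the pipe: $piped_lines"

rm -r "$workdir"
```

Whether a command reads standard input or expects arguments is a property of that command, which is why xargs is needed for the second kind.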
  • 26 November - 2 December

    Agenda for 26th November
    • feedback
    • recap last week
      • files vs strings
      • xargs
      • wildcards
    • Data Retrieval - Text Retrieval
    • Text Processing - pattern matching
    • Text Processing - regular expressions

    Minutes

    • using xmllint instead of awk and grep (spoon vs fork analogy)
      • remember the XML with the two-brained creature
        • awk and grep did not help
    • case sensitivity
      • when searching for acronyms => case sensitive (e.g., MH)
      • when words or phrases => case insensitive (e.g., malignant hyperthermia)
    • regular expressions
      • extended vs basic
        • in basic regexp mode (GNU grep), operators such as ( ) | { } + ? must be escaped with a backslash to act as operators; in extended mode (grep -E) they work unescaped
      • (pattern1|pattern2) => pattern1 or pattern2
        • (ac|a) => ac or a
      • . => any single character
      • [a-zA-Z] => for defining a group (i.e., set) of characters
        • [ac] => a or c
        • [a-c] => a to c => a or b or c
      • quantifiers apply to the preceding element (a single character, a set, or a group)
        • * => zero or more times
        • + => one or more times
        • ? => zero times or once (remember British vs American words)
        • {n,m} => e.g., {2,5} means two to five times
      • [ac]? => nothing or a or c
      • [ac]+ => one or more characters, each a or c (e.g., a, c, ac, caa)
    • diff file1 file2
      • what changed from file1 to file2?
        • > bla
          • this text was added (is not present in file1)
        • < glu
          • this text was removed (is not present in file2)
    • using search engines like a pro
      • search engines are case insensitive
      • "malignant hyperthermia" vs malignant hyperthermia
      • filetype:pdf => search only for pdf files
      • site:th-deg.de => search only within th-deg.de
    • the exam
      • will contain essay questions
      • will contain script writing assignments
      • you will be allowed to use the internet
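The regular expression notes from these minutes can be checked directly with grep -E. This is a minimal sketch: the word list is made up, and -x (match the whole line) is used here only to keep the results easy to read.

```shell
# Hypothetical test words, one per line.
words=$(printf '%s\n' color colour colouur MH mh 'malignant hyperthermia')

# ? makes the preceding character optional (British vs American spelling).
optional=$(echo "$words" | grep -E -x 'colou?r')

# {2,5} requires the preceding character two to five times.
repeated=$(echo "$words" | grep -E -x 'colou{2,5}r')

# Alternation inside a group; -i ignores case, so MH and mh both match.
either=$(echo "$words" | grep -E -i -x '(malignant hyperthermia|MH)')

# -o prints only the matching part; [ac]+ is one or more characters from the set.
only_match=$(echo 'xxaccaxx' | grep -E -o '[ac]+')

echo "optional u:"; echo "$optional"
echo "repeated u:"; echo "$repeated"
echo "either:";     echo "$either"
echo "only match: $only_match"
```

Note that -i also accepts the lowercase acronym mh, which is why the minutes recommend case-sensitive search for acronyms.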
  • 3 December - 9 December

    Preparation:

    • from Data-Retrieval-XML-Processing to Text-Processing-Quantifiers
    • the workbook

    Agenda for 3rd December:

    • Feedback
    • questions about preparation materials
    • Precision vs recall
      • F1 score
    • Text-Processing-Regular-Expressions
      • Quantifiers
    • Text-Processing-Position to Text-Processing-Pattern-File

    Minutes:

    • sort -u
    • Quantifier
      • think about quantity
        • sets how many times the preceding character may be repeated
      • e.g., * => zero or more times
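As a small illustration of sort -u from the minutes, the sketch below removes duplicate entries from a list. The terms are made up for illustration.

```shell
# Hypothetical list with repeated entries.
terms=$(printf '%s\n' caffeine halothane caffeine dantrolene halothane)

# sort -u sorts the lines and keeps each distinct line only once.
unique_terms=$(echo "$terms" | sort -u)

echo "$unique_terms"
```

This is handy after grep -o, where the same match is often printed many times.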
  • 10 December - 16 December

    Preparation:

    • from Text-Processing-Quantifiers to
      • Text-Processing-Pattern-File

    Agenda for 10th December:

    • Feedback
    • questions about preparation materials
    • Text-Processing-Pattern-File
    • Text-Processing-Relation-Extraction to
      • Semantic-Processing-URI-and-Labels

    Minutes

    • Searching for relations between concepts using grep:
      • cat chebi_27732_sentences.txt | grep -i caffeine | grep -i -e 'malignant hyperthermia' -e 'MH' | grep -E -e '(malignant hyperthermia|MH).*caffeine' -e 'caffeine.*(malignant hyperthermia|MH)'
      • to print only the matched text between MH and caffeine instead of the whole line, add -o
        • cat chebi_27732_sentences.txt | grep -i caffeine | grep -i -e 'malignant hyperthermia' -e 'MH' | grep -E -e '(malignant hyperthermia|MH).*caffeine' -e 'caffeine.*(malignant hyperthermia|MH)' -o | grep diagnose
    • get context around a line found
      • grep -10 ...
      • grep -7 ...
    • ontology
      • cross references between languages
        • dates
          • xref arabic tamr
          • xref .. kajur
          • xref turkish hurma
    • semantics
    • RDF can be written in XML (RDF/XML) to describe relations
    • owl:onProperty is defined by the Web Ontology Language (OWL)
      • but the property itself can be defined by another ontology, e.g., the relation ontology (RO) has a concept with the name has role (http://purl.obolibrary.org/obo/RO_0000087)
      • you could also define your own ontology
    • computer uses URIs, humans use labels, words, phrases
    • if you have a German keyboard but in your x2go session English letters are written instead of German ones, change the keyboard layout to German in x2go client settings
    • extract the URI from label:
      • xmllint --xpath '//*[local-name()="label" and text()="malignant hyperthermia"]/..' doid.owl | head -1 | awk -F\" '{print $2}'
      • like a pro:
        • xmllint --xpath 'string(//*[local-name()="label" and text()="malignant hyperthermia"]/../@*[local-name()="about"])' doid.owl
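The grep-based relation search from these minutes can be tried on a few sentences without the original data. This is a sketch only: sentences.txt and its four sentences are invented and stand in for chebi_27732_sentences.txt.

```shell
workdir=$(mktemp -d)
# Hypothetical sentences, one per line.
printf '%s\n' \
  'Caffeine is used to diagnose malignant hyperthermia.' \
  'MH susceptibility was tested with a caffeine contracture test.' \
  'Caffeine is found in coffee.' \
  'Malignant hyperthermia is a rare disorder.' > "$workdir/sentences.txt"

# Keep sentences that mention both concepts, in either order.
# -E enables extended regular expressions, -i ignores case.
both=$(grep -E -i \
  -e '(malignant hyperthermia|MH).*caffeine' \
  -e 'caffeine.*(malignant hyperthermia|MH)' \
  "$workdir/sentences.txt")
echo "$both"

# Narrow the result to sentences that contain a specific relation word.
# grep -c prints the number of matching lines.
diagnose_count=$(echo "$both" | grep -c diagnose)
echo "sentences with 'diagnose': $diagnose_count"

rm -r "$workdir"
```

Because -i is applied to the acronym as well, a real corpus may produce false matches on words containing "mh"; a case-sensitive pattern for MH avoids that.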
  • 17 December - 23 December

    For recap of last week and questions:

    • Text-Processing-Relation-Extraction to
      • Semantic-Processing-URI-and-Labels

    Agenda for 17th December:

    • Feedback
    • recap last week and questions
    • Semantic-Processing-Synonyms to
      • Semantic-Processing-My lexicon

    Minutes:

    • URL vs URI
    • xmllint in shell mode supports namespace prefixes
      • echo -e "setrootns\ncat //rdfs:label[text()='malignant hyperthermia']/../@rdf:about" | xmllint --shell doid.owl | grep -E '[^/> -]' | awk -F'"' '{print $2}'
      • echo -e "setrootns\ncat //rdfs:label[text()='malignant hyperthermia']/../oboInOwl:hasExactSynonym/text()" | xmllint --shell doid.owl | grep -E '[^/> -]'
    • URL=$1
      echo -e "setrootns\ncat //owl:Class[@rdf:about='$URL']/rdfs:label/text()" | xmllint --shell doid.owl | grep -E '[^/> -]'
  • 24 December - 30 December

    🎄

    • 31 December - 6 January

      🎄

      • 7 January - 13 January

        Please recap the last class from the book and come with questions:

        • Semantic-Processing-URI-and-Labels
        • working with xpath expressions with namespaces

        Agenda:

        • Feedback
        • lecture recordings
        • Evaluation
          • http://tinyurl.com/tjoexgg
        • Recap last class
        • alternative xpath tool
        • How to use an API
        • PubMed, PubMed Central, Europe PMC
          • full-text vs open-access
        • Using stdin in xmllint
        • Semantic-Processing-Parent Classes to
          • Semantic-Processing-Performance

        Minutes:

        • exam contents
        • xidel
        • standard input vs arguments for program input
        • why do we use pipelines
        • forming XPath expressions
      • 14 January - 20 January

        Please recap the last class from the book and come with questions:

        • using web APIs
          • EuropePMC
        • synonyms
          • try to write the XPath expressions by yourself instead of reading and understanding them

        Agenda:

        • Feedback
        • lecture recordings
        • Evaluation
          • http://tinyurl.com/tjoexgg
        • Semantic-Processing-Parent Classes to
          • Semantic-Processing-Performance

        Minutes:

        • Wishes for informatics II
          • object-oriented programming
            • classes define functions (methods), and objects are created from classes
          • how to write algorithms
        • paywall
        • stdin vs arguments
          • stdin is read while a program is running
          • arguments are passed when a program is started
        • attribute name (e.g., about (from rdf:about)) vs attribute value (http:/...)
          • xmllint --xpath "//*[@*[local-name()='about']='http://purl.obolibrary.org/obo/DOID_8545']" doid.owl
        • for and while loops
        • stderr vs stdout

        Homework:

        variable=$(echo 'malignant hyperthermia' | ./geturi.sh doid.owl | ./getparent.sh doid.owl)

        while [[ $variable ]]
        do
          # print the current class first, then move on to its parent
          echo "$variable"
          variable=$(echo "$variable" | ./getparent.sh doid.owl)
        done

        • this script begins at malignant hyperthermia and ends at the root of the DO
        • you want to record every Class (node, disease) that you encounter, and also the synonyms of these classes in a file, e.g., dictionary.txt
        • do not forget the evaluation for intro to informatics
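One possible way to approach the recording part of the homework is sketched below. It does not use doid.owl: a made-up two-column table (parents.tsv) and a getparent shell function stand in for the ontology and for ./getparent.sh, and the class names leaf, middle, and root are placeholders.

```shell
workdir=$(mktemp -d)

# Hypothetical child<TAB>parent table; not real Disease Ontology content.
printf '%s\t%s\n' \
  leaf middle \
  middle root > "$workdir/parents.tsv"

# Stand-in for ./getparent.sh: reads one class from standard input
# and prints its parent, or nothing if the class has no parent.
getparent() {
  read -r child
  awk -F '\t' -v c="$child" '$1 == c {print $2}' "$workdir/parents.tsv"
}

current=leaf
: > "$workdir/dictionary.txt"     # start with an empty file

# [ -n "$current" ] is the POSIX spelling of the [[ $variable ]] test.
while [ -n "$current" ]
do
  echo "$current" >> "$workdir/dictionary.txt"   # record before moving up
  current=$(echo "$current" | getparent)
done

visited=$(cat "$workdir/dictionary.txt")
echo "$visited"
rm -r "$workdir"
```

Appending with >> inside the loop keeps earlier entries; the synonyms of each class could be appended in the same place with a second command.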
      • 21 January - 27 January

        Agenda:

        • Feedback
        • Evaluation results & discussion
          • also for ITI
        • Please bring your own computer and tools to the exam
        • questions for exam
        • for & while
        • selecting nodes that have an attribute equal to a value (node[attribute='value'])

        Minutes

        variable=$(echo 'malignant hyperthermia' | ./geturi.sh doid.owl | ./getparent.sh doid.owl)

        echo 'malignant hyperthermia' | ./geturi.sh doid.owl | ./getlabels.sh doid.owl

        while [[ $variable ]]
        do
          # print the labels of the current class first, then move on to its parent
          echo "$variable" | ./getlabels.sh doid.owl
          variable=$(echo "$variable" | ./getparent.sh doid.owl)
        done

        • boolean conditions
          • false and false = false
          • false and true = false
          • true and true = true
          • false or false = false
          • true or ... = true
          • false and ..... = false
        • for loop
          • iteration over objects
          • if you have to use a command on many files
        • calculating fibonacci numbers
        • xidel -e "//*[@rdf:about='{}']" $OWLFILE
          • selecting nodes that have an attribute equal to a value (node[attribute='value'])
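The loop and boolean topics from these minutes can be sketched in a few lines of shell. The file names in the for loop are placeholders, and the Fibonacci sequence is assumed to start at 0 and 1.

```shell
# for loop: iterate over a list of items (hypothetical file names).
processed=""
for name in a.txt b.txt c.txt
do
  processed="$processed$name;"
done
echo "processed: $processed"

# while loop with arithmetic: the first ten Fibonacci numbers.
a=0
b=1
count=0
fib=""
while [ "$count" -lt 10 ]
do
  fib="$fib$a "
  next=$((a + b))
  a=$b
  b=$next
  count=$((count + 1))
done
echo "fibonacci: $fib"

# Boolean conditions: && is "and", || is "or".
if true && false; then and_result=true; else and_result=false; fi
if true || false; then or_result=true; else or_result=false; fi
echo "true and false = $and_result"
echo "true or false = $or_result"
```

A for loop fits when the items are known in advance (e.g., a set of files), while a while loop fits when the number of steps depends on a condition.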