Projects & Participants

Andrew West (CC BY-NC-SA 2.0)

Participants join this program with a project that they either are already working on or want to develop during this program.

For this round of the OLS program, we are happy to have 66 participants with 37 projects.

Projects

Best practices for online collaboration/peer-production in citizen science

By: Katharina Kloppenborg

Mentored by: Fotis Psomopoulos

Status: graduated

Keywords: citizen science, peer-production, participatory design

Citizen science revolves around the idea of integrating the public in scientific research. However, there are different interpretations of this idea. A substantial share of citizen science projects allow laypeople to participate only in a limited scope of microtasks, and thus keep reinforcing the power gap between academic scientists and the public. Literature has called for more autonomy of citizen scientists by allowing them to participate in more phases of the research cycle. Commons-based peer-production, an alternative mode of production in which people self-organize to develop complex knowledge-commons like Wikipedia or open software, seems to be a promising approach to facilitate this. However, a design-centred approach implementing this for a specific use case is yet to be done. In my PhD project I am trying to fill this gap by redesigning the online ecosystem of Open Humans - an existing community of practice around citizen science - collaborating closely with this community in a user-centered design approach. As one of the first steps, I am working on a best practices guide, summarizing the experiences of existing similar projects.

A virtual conference management system with seamless open science integration

By: Simon Duerr

Mentored by: Emily Lescak

Status: graduated

Keywords: conferences, virtual, poster session

VCMS (vcms.simonduerr.eu) is a convenient tool for setting up a website for a virtual conference, including an abstract submission portal, timezone-adapted scheduling of talks, and an interactive virtual poster session with video chat and spotlights for posters; some features are still in development. The tool is currently in beta and will be released as FOSS under the MPL once the software is battle-tested (in mid-February).

Memory Collecting: Croatian Homeland War

By: Annalee Sekulic

Mentored by: Kate Simpson

Status: graduated

Keywords: database, video, generational memory, record, document, historical, Croatia, Diaspora

The “Memory Collecting: Croatian Homeland War” project aims to create a platform where survivors can submit video recordings of their own memories and reflections of the 1992 Homeland War. The repository will also store them in a publicly accessible database. By having the software be open to citizen scientists, the database will be one of the most inclusive and easily accessible memory banks. This initiative seeks to preserve the memory of the role of Croatian-Americans in the creation of free, modern Croatia during the Homeland War in the 1990s.

Open Phototroph

By: Steven Burgess

Mentored by: Stephen Klusza

Keywords: synthetic biology, community building, citizen science

I want to help build a culture of open science and good practice (as well as fun) within the plant synthetic biology community, with an initial emphasis on the US.

I hope to do this by (1) establishing an open toolset for genetic manipulation of algae and photosynthesis enzymes; (2) developing an open repository of protocols for genetic manipulation; (3) producing educational resources to aid experimentation, both in academia and for citizen scientists; and (4) building a community of interested individuals to expand and contribute to the project.

Opensource Transpiler of Synthetic Biology Lab Protocols for Wetlab Robotics

By: William Jackson

Mentored by:

Keywords: Robotics, Synthetic Biology, Open Source, Community Enhancement

An open-source software tool and associated protocol repository that translates wet-lab protocols into instruction sets for commonly available robotic liquid handlers. Protocols will be hosted on a publicly accessible website, and community members can edit, annotate, and report on different protocols. Think GitHub for biological protocols with an issue tracker.

Field and laboratory based research project researching, surveying, and discovering the palaeoecology and palaeogeography of West Cork

By: Robin Lewando

Mentored by: Bruno Soares

Keywords: palaeoecology, palaeoenvironment, palynology, interdependence, interconnectedness, landscape, public, geomorphology, geology, geography, microscopy, microfossils

This project is a field and laboratory based research project researching, surveying, and discovering the palaeoecology and palaeogeography of West Cork. The project will make use of: paper research methods; sampling and scientific analysis of sediments; digital mapping; field and site visits and landscape analysis; scientific processing, analysis and identification of microfossils from sediments; site visits and surveys; ecological surveys; and local enquiry. Results and findings will be published on a website in the form of stories, accounts, photographs, digital interactive maps, and graphics, with a prime emphasis on accessibility, understandability, and relevance. Principal attention will be paid to environmental areas that are productive of microfossils (bog and lake sediments); that have distinctive landscape features and sediment types (relict glacial and past and present fluvial landscape features); different natural habitat types and plant and animal communities; geological distinctiveness; and archaeological sites. Emphasis will be placed on the interconnectedness of these aspects of the current and past environments. The final step will be to show how, in each area, however local, these many and varied aspects have contributed to the present landscape and environment, and thus to give an understanding of how future development may progress.

Open data schema for actigraphy data in chronobiology and sleep research

By: Manuel Spitschan, Grégory Hammad

Mentored by: Mallory Freeberg

Status: graduated

Keywords: open data, open science, data schemas, metadata, actigraphy, actimetry, rest-activity cycles, circadian rhythms, chronobiology, sleep research

Actigraphy provides a measure of the 24h rest-activity cycles based on movement counts, typically of the wrist. It is obtained using wearable devices and is a widely used, non-invasive way to determine sleep and circadian properties. Importantly, metrics derived from actigraphy are being increasingly used in clinical contexts, where groups of psychiatric and neurological patients in specific conditions have been found to exhibit abnormal rest-activity rhythms and sleep. Sleep and circadian parameters from actigraphy are derived measures. These are obtained by converting the movement counts (usually obtained at a resolution of 1 minute) into sleep parameters and circadian metrics using algorithms ranging from threshold-based computations to machine learning techniques. Unfortunately, at present, there are no standards or schemas for specifying and sharing actigraphy data and corresponding algorithms.

The goal of this project is to develop a common schema for the use, analysis, reporting and open and interoperable sharing of actigraphy data across different actigraphy devices produced by different commercial manufacturers and for use by researchers and research users. This project builds upon core research and technical expertise amongst the team members, and provides a framework to structure the work of the newly funded Chronobiology Data Standards Interest Group (CDSIG).
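To illustrate what a "threshold-based computation" on movement counts can look like, here is a minimal sketch. It is a deliberately simplified illustration and not the algorithm of any particular device or of this project: the function names, the threshold value of 40 counts, and the rule "below threshold means sleep" are all assumptions made for the example.

```python
def score_epochs(counts, threshold=40):
    """Label each epoch 'sleep' or 'wake' from its movement count.

    The threshold (40 counts per epoch) is a hypothetical value chosen
    for illustration only.
    """
    return ["sleep" if c < threshold else "wake" for c in counts]


def total_sleep_minutes(counts, threshold=40, epoch_minutes=1):
    """Derive one sleep parameter: total minutes scored as sleep."""
    labels = score_epochs(counts, threshold)
    return labels.count("sleep") * epoch_minutes


if __name__ == "__main__":
    # Made-up one-minute epochs, purely for demonstration.
    minute_counts = [0, 10, 80, 5]
    print(score_epochs(minute_counts))
    print(total_sleep_minutes(minute_counts))
```

Even in this toy version, the derived value depends on the threshold and the epoch length, which is why a shared schema would need to record such algorithm settings alongside the raw counts.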

BioFerm: A web application used for kinetic modeling, parameter estimation and simulation of bioprocesses

By: Olayile Ejekwu

Mentored by: Renato Alves

Status: graduated

Keywords: Kinetic modelling, optimization, Bioprocesses, Microbial growth, parameter estimation

BioFerm is a web application platform which can be used for kinetic study, simulation and optimization of bioprocesses. The user is able to calculate the best initial conditions as well as overall operating conditions which will result in the highest product yield (or any user-specified output). Kinetic modelling can also be done to further analyse the process and to calculate and estimate yield and kinetic parameters respectively. This allows the prediction of substrate, product and biomass concentrations over the bioprocess period. The BioFerm web application will be able to take in a variety of bioreactor configurations (batch, fed-batch, continuous) and fit the results to a variety of models (inhibition and non-inhibition) to return the above-mentioned parameters. The software is currently being written in Python using an open-source app framework (Streamlit) to run the app, but will later be rewritten using Django, also a popular web framework.
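As a rough illustration of the kind of simulation described above, the sketch below predicts biomass and substrate concentrations in a batch reactor. It assumes Monod kinetics (a common non-inhibition model) and simple Euler integration; neither choice is confirmed as BioFerm's own method, and the function name and all parameter values are hypothetical.

```python
def simulate_batch(x0, s0, mu_max, ks, yield_xs, t_end, dt=0.01):
    """Simulate batch growth with Monod kinetics using Euler steps.

    x0, s0   : initial biomass and substrate concentrations (g/L)
    mu_max   : maximum specific growth rate (1/h)
    ks       : half-saturation constant (g/L)
    yield_xs : biomass formed per substrate consumed (g/g)
    Returns a list of (time, biomass, substrate) tuples.
    """
    steps = int(round(t_end / dt))
    x, s = x0, s0
    trajectory = [(0.0, x, s)]
    for i in range(1, steps + 1):
        # Monod specific growth rate; no growth once substrate is gone.
        mu = mu_max * s / (ks + s) if s > 0 else 0.0
        # Substrate needed for this step, capped at what is left.
        consumed = min(mu * x * dt / yield_xs, s)
        x += consumed * yield_xs
        s -= consumed
        trajectory.append((i * dt, x, s))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameter values, for demonstration only.
    result = simulate_batch(x0=0.1, s0=10.0, mu_max=0.5, ks=0.2,
                            yield_xs=0.5, t_end=20.0)
    t, x, s = result[-1]
    print(f"t={t:.1f} h  biomass={x:.3f} g/L  substrate={s:.3f} g/L")
```

Parameter estimation would then amount to adjusting `mu_max`, `ks` and `yield_xs` until the simulated curves match measured data; an inhibition model would replace only the line that computes `mu`.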

Intellectual Property, Indigenous Knowledges, and the Rise of Open Data in Australian Environmental Archaeology

By: Carly Monks

Mentored by: Esther Plomp

Status: graduated

Keywords: Australian archaeology, Open data, Indigenous Knowledge

This project will investigate existing literature on the benefits, risks, and limitations of open data practices in Australian environmental archaeology, seeking to characterise the ethical and practical issues associated with the dissemination of data owned or stewarded (either wholly or in part) by Indigenous communities. Environmental archaeology, and its partner field of palaeoecology, is inherently interdisciplinary, drawing on diverse lines of evidence including faunal and botanical remains, geomorphological records, and Indigenous knowledges in order to understand past and present human-environmental relationships. The project will consider the tensions between Western scientific and Indigenous epistemologies, including the ways in which ‘data’ are understood and connected (or disconnected) to people and places, and where the boundaries of ‘archaeological’ and ‘non-archaeological’ environmental records lie. This project will provide the groundwork for the development of a larger, collaborative project engaging Indigenous and non-Indigenous researchers to advance a Code of Conduct for Australian archaeologists and palaeoecologists seeking to work openly while supporting the rights of Indigenous communities to manage access as they consider appropriate.

MiSET Publication Standards: A tool for AI-assisted peer-review of experimental information

By: Fabienne Lucas

Mentored by: Sonika Tyagi

Status: graduated

Keywords: rigor, reproducibility, peer-review, publishing, research quality defects, experimental methods, flow cytometry, tool, AI-assisted peer-review

The MiSET initiative aims to develop a minimum set of quality standards in the form of a quality assessment tool that evaluates the technical aspects of cytometry publications, and to fully integrate these flow cytometry standards into grant submission and publication requirements across scientific fields (Lucas et al., Cytometry A 2019).

Global Distribution of APOL1 Genetic variants

By: John Ogunsola

Mentored by: Sam Haynes, Yo Yehudi

Keywords: bioinformatics, data visualization, open educational resource

Genetic variants of APOL1 commonly found in people of recent African ancestry can predispose to chronic kidney disease. It is however unknown if and to what extent the variants are present outside of Africa. This project aims to create a visual representation of the global distribution of the frequencies of these genetic variants, by mining genomic information from publicly available datasets.

LA-CoNGA physics (Latin American alliance for Capacity buildiNG in Advanced physics)

By: Reina Camacho Toro, Alexander Martinez Mendez

Mentored by: Laura Ación

Status: graduated

Keywords: Open educational content, Data science training, Open science training

LA-CoNGA physics is an Erasmus+ project, a European-Latin American network of 11 universities, 9 research institutions and 3 industrial partners (2 of them being in the data science field) in advanced physics. We aim to create a set of postgraduate courses in Advanced Physics (high energy physics and complex systems) that will be common and inter-institutional, supported by the installation of interconnected instrumentation laboratories. This program will be inserted as a specialization in the Physics masters of the 8 Latin American partners in Colombia, Ecuador, Peru and Venezuela. It will comply with the Bologna protocols and is based on three pillars: courses in physics theory/phenomenology, data science and instrumentation.

We are guided by the principles of open science and education:

* Content should be engaging and pedagogical.
* The content will be created and made available following the FAIR principles.
* Reproducibility is the base of the data science pillar. We want to teach the students how to use the correct tools to work with large amounts of data, but also create an environment where the reproducibility of their work, tasks and projects is inculcated and applied from the first day.

Our website: https://laconga.redclara.net

Our GitHub repository: https://github.com/LA-CoNGA

Junto Labs - Advancing Virtual Environments for Life Science Research and Active Learning

By: Lomax Boyd

Mentored by: Melissa Burke

Keywords: online research, mentorship, virtual environments, Jupyter notebooks

Inspired by the social clubs founded by Benjamin Franklin, the Junto Labs initiative seeks to provide life science researchers with an online space for pursuing collaborative research and supporting active learning. Life science laboratories can be open and highly collaborative spaces for in person research, learning and discovery. While online tools, such as Git and Jupyter notebooks, help facilitate openness and reproducibility among peers, they can also provide a highly creative and flexible medium for designing interactive educational experiences. The Junto Labs initiative aims to create a catalog of Jupyter notebooks that exemplify how to design virtual environments optimized for conducting research, facilitating mentorship, and encouraging active learning. Researchers would be able to more easily collaborate on active projects, but also expand active learning opportunities for students who may not otherwise have the chance to participate in research. Importantly, life science laboratories could use the resource to design and provide research and mentorship opportunities to students from under-resourced communities or universities where opportunities to participate in life science research are limited or nonexistent.

metaNanoPype: a reproducible Nanopore python pipeline for metabarcoding

By: António Sousa

Mentored by: Hans-Rudolf Hotz

Status: graduated

Keywords: metabarcoding, python pipeline, reproducible

The emergence of short-read NGS technologies has brought profound knowledge to the field of microbial ecology/evolution through the taxonomic identification of microbial communities - metabarcoding. However, their main limitation is their short read length, which has been overcome by long-read/real-time sequencing technologies such as Oxford Nanopore MinION. Currently, there are many standalone tools/algorithms to process this data, including bioinformatic pipelines, but they lack good integration. My proposal is the development of a modular Python pipeline for nanopore metabarcoding data - metaNanoPype - with the following modules: (I) demultiplexing; (II) quality-assessment; (III) quality-filtering and trimming; (IV) taxonomic classification; (V) diversity analyses. Each module could include several options to allow flexibility. Each step could generate a log file used later to build a report in html/pdf format describing the versions, commands and references of the software used. The resulting report would ensure reproducibility, transparency and acknowledgement, and could be used as supplementary material for papers. metaNanoPype could be made publicly available on GitHub (open source), with further documentation published with GitHub Pages.

MBiO: Designing an open-collaborative website in the field of molecular biology

By: Nihan Sultan Milat

Mentored by: Michael Landi, Renato Alves, Toby Hodges

Status: graduated

Keywords: open educational resource, open-collaborative, molecular biology

Molecular biology is a field that seeks to discover, identify and explain mechanisms at the level of DNA, RNA and protein in a cell. Although it is a relatively young discipline, its prominence in the life sciences keeps growing. Within the scope of my project, I aim to evaluate papers on molecular biology studies and make them accessible to everyone. The main idea is to choose a weekly topic and summarize it in a way that everyone can understand. As a workflow, I aim to write a brief introduction to the paper and its authors, describe the purpose of the study, and present the results. In short, I would like to design a publicly accessible website. I aim to make this website a resource for the academic community, students and anyone else who wants to read and learn. At the same time, I plan to prepare a section where readers can ask questions, in order to build a community that discusses the related article. I want to connect students and researchers in this field of science so they can improve their knowledge, share, and even find new ideas.

Open Life Science (OLS) Program, a driver of open science skills among early stage researchers and young leaders: mentee perspective

By: Muhammet Celik

Mentored by: Bérénice Batut, Yo Yehudi, Malvika Sharan

Keywords: value of OLS, participant perspective, internalize openness, pendown

OLS is a great platform for opening up the life sciences, with the objective of training young researchers in open science practices and skills. I am a graduate of OLS-2, which recently concluded, and I came out of the program feeling that it offers tremendous value. However, it might not be reaching as many people as it could. I think one way of extending its outreach beyond what has already been done is to pen down the experiences of participants in previous rounds. As a participant myself, I can see many ways in which one could promote the program, share the journey with readers, especially the younger generation, and highlight the essence of the program. I have therefore taken it upon myself to contribute in this direction by re-joining the program for OLS-3 with this as my project, with the goal of producing a tangible document in the form of a publication to be shared with the community at large.

Documentation enhancement with open science practices in sktime

By: Afzal Ansari, Abdulelah Al Mesfer

Mentored by: Toby Hodges

Status: graduated

Keywords: sktime, documentation, algorithm maintainer, codeowner

sktime is a new Python toolbox for machine learning with time series (https://github.com/alan-turing-institute/sktime). It provides state-of-the-art time series algorithms and scikit-learn compatible tools for building, tuning and evaluating complex models. The goal of this project is to improve sktime’s online documentation with a specific focus on documenting algorithm contributors. Algorithms form a major part of sktime. They require special expertise in their development and maintenance. We plan to enhance the existing documentation by making algorithm contributors more visible. The aim is (i) to make it easier for users and other developers to directly get in touch with the algorithm experts to ask questions or suggest code improvements and (ii) to recognize their contributions more visibly and formally to encourage long-term maintenance of their contributions. sktime has already defined a new community role as part of its governance guidelines to ensure that algorithm contributors have extra rights and responsibilities with regard to their algorithm. However, up-to-date documentation listing the current contributors and links to their algorithms is currently missing. Optionally, we can add other information like literature references. We plan to automate the generation of this documentation by making use of the existing documentation and other components such as the CODEOWNERS file and author strings in Python files.
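One way the automation step could start is by reading the CODEOWNERS file into a mapping from contributors to the paths they own. The sketch below is only an illustration under assumptions: it handles the basic `pattern @owner ...` line format with `#` comments, the function names are hypothetical, and the example paths and handles are invented rather than taken from the sktime repository.

```python
def parse_codeowners(text):
    """Map each path pattern in a CODEOWNERS file to its list of owners."""
    owners = {}
    for raw_line in text.splitlines():
        # Drop comments (full-line or trailing) and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *handles = line.split()
        owners[pattern] = handles
    return owners


def paths_by_owner(owners):
    """Invert the mapping so each owner lists the patterns they maintain."""
    by_owner = {}
    for pattern, handles in owners.items():
        for handle in handles:
            by_owner.setdefault(handle, []).append(pattern)
    return by_owner


if __name__ == "__main__":
    # Invented example content, not the real sktime CODEOWNERS file.
    example = """
    # algorithm maintainers
    sktime/forecasting/example_a.py @alice @bob
    sktime/classification/example_b.py @alice
    """
    for owner, patterns in paths_by_owner(parse_codeowners(example)).items():
        print(owner, "->", ", ".join(patterns))
```

The inverted mapping is the piece a documentation generator would render, for instance as one table row per contributor with links to their algorithms.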

An Open Source Service Area for Turing research projects

By: Sarah Gibson

Mentored by: Meag Doherty

Status: graduated

Keywords: open-source, research, strategy

This project is to develop a Turing Service Area in Open Source that will provide formal support in open working and embedding best practices of open software development into Turing projects. This service area will create an Open Developer Advocate position whose role will be to work with and guide projects into working openly and either build a community around their open project, or make a contribution to an existing open project. This guidance would take the form of regular meetings, co-working and/or drop-in sessions and would address roadmapping of the project in terms of its open goals, and developing project policies for engaging openly. The area would work with the Turing Way project to draw on existing material and contribute new processes there.

Towards FAIRer phytolith data

By: Javier Ruiz Pérez, Juan José García-Granero, Carla Lancelotti, Marco Madella

Mentored by: Emma Karoune

Status: graduated

Keywords: FAIR, data sharing, palaeobotany, phytolith research, archaeology, palaeoecology

Phytoliths are microfossils of plants used world-wide to address a variety of questions in fields like archaeology, palaeoecology and palaeontology. Diverse laboratory procedures, analyses and identification criteria are used resulting from different research traditions. Some steps, such as the normalisation of nomenclature through the International Phytolith Society, have been promoted to standardise the phytolith analysis and the subsequent publication of data. However, the standardisation of phytolith research and data publication is still far from being achieved. Moreover, a recent assessment of the data sharing practices within the phytolith community found only half of the publications share some form of data and the majority do not provide reusable data. This project has grown from initial efforts by Emma Karoune during OLS2 to raise awareness of issues with poor data sharing practice. It is part of a broader initiative supported by the International Phytolith Society on data sharing and represents the first steps towards the FAIRification of phytolith data: an evaluation of sharing practices in phytolith research; the creation of a GitHub repository for collaborative use by this working group and in the forthcoming FAIRification project; and the development of a webpage to provide the community with information as the project proceeds.

By: Arvinpreet Kaur, Ashutosh Tiwari, Robandeep Kaur, Mehak Chopra, Harpreet Singh, Prash Suravajhala

Mentored by: Prash Suravajhala, Harpreet Singh, Bérénice Batut

Status: graduated

Keywords: Obesity, Diabetes, Gut microbiome, Linkage disequilibrium, pleiotropy

Obesity causes approximately 4.7 million premature deaths annually, which accounts for a loss of ca. 8% globally. Obesity is an outcome of complex, heritable, and multi-factorial interaction of multiple genes, environmental factors, and behavioral traits that makes management and prevention challenging in the human population (Rao et al., 2014). Experimental research has demonstrated that altered metabolites in multiple metabolic pathways are associated with obesity (Zhao et al., 2016). Alteration in the proportion of Bacteroidetes and Firmicutes in the gut microbiome can trigger obesity. The gut microbiome’s influence on obesity is much more complicated than the imbalance of these bacterial groups. Modulation of the gut microbiome through diet, prebiotics, surgery, and antibiotics significantly affects the obesity epidemic (John & Mullin, 2016). Obesity is one of the major global health problems, associated with increased morbidity and mortality mediated by its association with several other metabolic disorders (Saini et al., 2018). We aim to target obesity and diabetes-associated metabolic disorders and annotate the genes common to these complex diseases using a systems genomic integrated approach, with Galaxy as a platform.

Boosting research visibility using Preprints

By: Didik Utomo, Hilyatuz Zahroh, Zenita Milla Luthfiya

Mentored by: Iratxe Puebla

Status: graduated

Keywords: preprints, open resource, open access, publishing

AKADEMISI PREPRINTS is a free distribution service for preprints from multidisciplinary fields. The server plans to include a connection hub to journals and an open peer review community. By doing so, we hope to promote the transparency and quick visibility of research results to the public.

Open Science Community in Saudi Arabia

By: Batool Almarzouq

Mentored by: Anelda van der Walt

Status: graduated

Keywords: Open science, Saudi Arabia, Community

Although there is an increasing number of initiatives in Saudi Arabia to raise awareness in Data Science (DS) and connect researchers in artificial intelligence (AI), there is no single community dedicated to stimulating responsible research practices and Open Science policies. I wish (with the help of a mentor) to establish an open science community in Saudi Arabia. Our target groups are researchers and students who are open and curious about open science but have little to no experience with open science practices.

COMPUTATIONAL DRUG DISCOVERY (CORONAVIRUS)

By: Anshika Sah

Mentored by: Yo Yehudi

Status: graduated

Keywords: SARS coronavirus 3C-like proteinase, IC50, pIC50, Bioactivity, Lipinski’s rule, Scatter plot, Frequency plot, Box plot, Mann-Whitney test

Biological activity data was retrieved from the ChEMBL database and pre-processed by selecting the target, which in this project was SARS coronavirus 3C-like proteinase. The data frame for the target protein was then filtered by removing molecules whose standard type is not IC50 and those with a missing standard value. The compounds were classified as active, inactive, or intermediate according to their IC50 values. The SMILES notation (representing the unique chemical structure of compounds) from the dataset was used to compute the molecular descriptors. The project uses Lipinski’s descriptors, which consider molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors. These descriptors are related to the pharmacokinetic properties of molecules. The exploratory data analysis was performed via Lipinski’s descriptors. Simple box plots and scatter plots were plotted to discern differences between the active and inactive sets of compounds. A Mann-Whitney U test was performed for each descriptor to determine whether there is a statistically significant difference between active and inactive molecules.
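The filtering and classification steps described above can be sketched as follows. This is an illustrative reconstruction and not the project's actual code: the record layout (dictionaries with `standard_type` and `standard_value` keys, IC50 in nM), the function names, and especially the cut-offs (at most 1,000 nM for active, at least 10,000 nM for inactive) are assumptions, since the source does not state them.

```python
import math


def clean_records(records):
    """Keep only IC50 measurements that have a standard value."""
    return [
        rec for rec in records
        if rec.get("standard_type") == "IC50"
        and rec.get("standard_value") not in (None, "")
    ]


def bioactivity_class(ic50_nm, active_max_nm=1000.0, inactive_min_nm=10000.0):
    """Label a compound by IC50 (nM). The cut-offs are assumed, not sourced."""
    if ic50_nm <= active_max_nm:
        return "active"
    if ic50_nm >= inactive_min_nm:
        return "inactive"
    return "intermediate"


def pic50_from_nm(ic50_nm):
    """Convert IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


if __name__ == "__main__":
    # Invented records, for demonstration only.
    raw = [
        {"standard_type": "IC50", "standard_value": "500"},
        {"standard_type": "Ki", "standard_value": "20"},
        {"standard_type": "IC50", "standard_value": None},
        {"standard_type": "IC50", "standard_value": "25000"},
    ]
    for rec in clean_records(raw):
        ic50 = float(rec["standard_value"])
        print(ic50, bioactivity_class(ic50), round(pic50_from_nm(ic50), 2))
```

Working on the pIC50 scale spreads the values more evenly than raw IC50, which makes the box plots and group comparisons between active and inactive compounds easier to read.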

Skills for Open Agrobiodiversity Data

By: Irene Ramos

Mentored by: Piraveen Gopalasingam

Status: graduated

Keywords: agrobiodiversity, open data, oer

I aim to develop training materials to support the use of open data by researchers working on agrobiodiversity conservation. At CONABIO (Mexico), a governmental agency that coordinates biodiversity data collection, I collaborate in the development of an Agrobiodiversity Information System (SIAgroBD); my role involves technical and community management responsibilities. Currently, twelve teams of students and researchers from different institutions contribute to field data collection for SIAgroBD. While we are committed to open practices at CONABIO and all collected data are open, some external contributors lack the skills to use these data, even if they have helped collect them, and are not familiar with open practices. Thus my project consists of developing training materials (OER) for an introductory workshop on open data with a focus on FAIR principles, biodiversity standards, and effective management strategies, among other skills that encourage contributors to become active users of data and not only collectors. The integration of social and biological information and the use of Indigenous data are distinctive features of agrobiodiversity research in Mexico that will also be addressed. I expect this to serve as a prototype for advanced training modules that could be used by future contributors or other researchers working on agrobiodiversity topics.

By: Jennifer Miller

Mentored by: Beth Duckles

Status: graduated

Keywords: postdoc, empirical legal research, systematic review, public policy, open notebook science

The project is an open notebook living systematic review of legal documents related to postdoctoral scholars and appointments (postdocs). The project aims to use the methods of empirical legal scholarship to describe and categorize the ways postdoctoral scholars and their appointments have been involved in the legal system. Briefly, empirical legal scholarship is a form of qualitative or mixed-methods research, often involving content analysis, that uses legal documents or decisions as its data source. We are not aware of any other research applying this method or data source to the study of postdocs. In fact, there has been little research of any kind on the legal aspects of postdoc appointments.

Building on a “file drawer” paper by Jennifer Miller (with Kristina Van Buskirk), we frame our project around the question of whether postdocs are employees or students. Based on economic theory, we expect the types of cases to reflect whether postdocs are employees producing in a labor market or students consuming in a services market.

More information about the project is available on GitHub https://github.com/JMMaok/postdoc/projects and Zotero https://www.zotero.org/groups/

The UKCRC Tissue Directory and Coordination Centre

By: Emma Lawrence, Jessica Sims

Mentored by: Sarah Gibson

Status: graduated

Keywords: Biobanking, research, samples, biospecimens, COVID19

The mission of the UKCRC Tissue Directory and Coordination Centre (UKCRC TDCC) is to maximise the use, value and impact of the UK’s human sample resources, in the UK and beyond. The UKCRC TDCC is creating a world-leading, research-enabling, and networked biobanking infrastructure to facilitate the discovery and use of the UK’s human samples and data. The UKCRC TDCC works to help researchers discover samples and data, help sample resources improve their data systems for sharing, and harmonise policy relating to the discovery and use of samples and data. The work of the UKCRC TDCC is guided by the belief that the biomedical research ecosystem should be based on open standards, open science, and pre-competitive collaboration.

Development of language resources for Hausa Natural Language Processing

By: Shamsuddeen Muhammad, Ibrahim Said Ahmad, Ruqayya Nasir Iro

Mentored by: Laura Carter

Status: graduated

Keywords: Natural Language Processing, Low-resources, Machine Learning, Corpus, Language resources

This work aims to create a Nigerian sentiment corpus, sentiment, and hate speech lexicon through manual annotation for three different languages (Hausa, Igbo, Yoruba). Our method for the creation of these language resources is as follows:

Nigerian Sentiment Corpus: To create the sentiment corpus, tweets from major Nigerian news headlines for each of the three languages will be crawled from Twitter using an existing Python crawler we developed. Ten thousand tweets will be extracted per language via the Twitter API. Thereafter, the tweets will be annotated by native annotators for each of the languages. These annotators will be hired and trained to perform the annotation. The annotation task consists of labeling each tweet as either positive, negative or neutral. To mitigate errors and bias, each dataset will be annotated by three different annotators, after which the project team will compute the kappa agreement between the annotators.

Nigerian Sentiment Lexicon: In the same way, manual annotation of the tweets will be used to create the sentiment lexicon for each of the three languages. The sentiment lexicon annotation task involves identifying sentiment-bearing words in each tweet and assigning each a sentiment score between +1 and +5 (with +1 being the most negative sentiment and +5 the most positive sentiment).

Nigerian Hate Speech Lexicon: Extreme negative sentiment from the sentiment lexicon will be used to develop the hate speech lexicon.

Annotation tool: We plan to use a web-based annotation tool, brat (Stenetorp et al., 2012), which many researchers have shown to be efficient for this type of task. The annotators must be native speakers of the language and follow the annotation guidelines provided by the project team.
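The agreement step described for the sentiment corpus can be sketched in a few lines. The text only says "kappa agreement"; because each tweet is labeled by three annotators, this sketch assumes Fleiss' kappa, which generalizes kappa to more than two raters. The function name, label strings and example data are hypothetical and not taken from the project.

```python
from collections import Counter

# Assumed label set, matching the three classes named in the annotation task.
LABELS = ("positive", "negative", "neutral")


def fleiss_kappa(ratings, labels=LABELS):
    """Fleiss' kappa for a list of items, each a list with one label per annotator.

    Every item must be rated by the same number of annotators (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of annotators who gave item i the label labels[j]
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[label] for label in labels])

    # Observed agreement: mean over items of the share of agreeing annotator pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each label.
    total = n_items * n_raters
    proportions = [sum(row[j] for row in counts) / total for j in range(len(labels))]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # Every rating is the same label; kappa is undefined, so report 1.0 by convention.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: two tweets, three annotators each.
example = [
    ["positive", "positive", "negative"],
    ["negative", "negative", "negative"],
]
print(round(fleiss_kappa(example), 2))
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance; the threshold the project team would treat as acceptable is not stated in the text.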

HausaNLP aims to create more language resources that can be used to train models in machine learning.

ProCAncer-I - An AI Platform integrating imaging data and models, supporting precision care through prostate cancer’s continuum

By: Haridimos Kondylakis, Stelios Sfakianakis

Mentored by: Harpreet Singh

Status: graduated

Keywords: Prostate Cancer, Open Data, PCa

In Europe, prostate cancer (PCa) is the second most frequent type of cancer in men and the third most lethal. Current clinical practices, which often lead to overdiagnosis and overtreatment of indolent tumors, suffer from a lack of precision, calling for advanced AI models that go beyond the state of the art (SoA). The ProCAncer-I project brings together 20 partners, including PCa centers of reference, world leaders in AI, and innovative SMEs, with the objective of designing, developing, and sustaining a cloud-based, secure European Image Infrastructure with tools and services for data handling. The platform hosts the largest collection of anonymized PCa multi-parametric MRI (mpMRI) image data worldwide (>17,000 cases), based on data donorship and in line with EU legislation (GDPR). Robust AI models are developed, based on novel ensemble learning methodologies, leading to vendor-specific and vendor-neutral AI models for addressing 8 PCa clinical scenarios. To accelerate the clinical translation of PCa AI models, we focus on improving trust in the solutions with respect to fairness, safety, explainability, and reproducibility. A roadmap for AI model certification is defined in interaction with regulatory authorities, thus contributing to a European regulatory roadmap for validating the effectiveness of AI-based models for clinical decision making.

Seeding

By: Dario Pescini, Marzia Di Filippo, Chiara Damiani, Paolo Pedaletti

Mentored by: Bérénice Batut

Keywords: community building, Systems Biology, Metabolism, quantitative Life Science, technology, infrastructure

The long-term objective of the project is to establish at my university a core team/lab able to help the community design and implement open science projects. To start this long-term project, I believe that working on a use case would help in several ways: it will help consolidate and harmonise the team's domain knowledge, begin to involve the technical and administrative staff, and gain visibility and credibility. The use case is a computational framework to aid metabolism modelling that we are currently developing in my lab and that is on its way to being published. The publication that will accompany this framework is nearly ready to be submitted, and the framework itself is developed wholly with open software. This use case is particularly suitable for following various aspects of the Open Science approach, from journal paper management to software publication. I think that this application can be a great opportunity to learn the open science approach in an organic way and to discover how to do it.

Creating a network of Open Science ambassadors in Spanish Health Research Institutes

By: Marta Marin, Santi Rello Varona, Iris San Pedro

Mentored by: Joyce Kao

Status: graduated

Keywords: Health Research Institutes, Network of Open Science Ambassadors, Best practices’ Toolkits, Open Science implementation

This project will create a network of Open Science (OS) ambassadors in Spanish Health Research Institutes (HRI). It aims to implement OS in HRI by raising awareness of the OS principles that apply to this particular field (i.e., reproducibility, transparency, dissemination and data sharing). To enable an easy and comprehensive transition to OS, researchers will be given access to best practices’ toolkits for OS implementation. As part of their activity, OS ambassadors will be encouraged to engage with the general public, patients and future generations of scientists.

To accomplish that, professionals in the HRI willing to be trained as OS ambassadors will be identified and recruited. This network will be in charge of promoting OS in their institutions. The ambassadors will identify potential OS activities that can be kickstarted, answer questions raised, and advise on best practices in their institutions.

At the end of this project, a network will have been created, together with a framework to keep this group active, and ambassadors will disseminate the knowledge gathered about the application of OS principles in health research in their institutes. This will allow the progressive implementation of OS in HRI.

FAIR MAFIL: FAIRification of imaging/neurophysiological data of MAFIL CEITEC MUNI laboratory for EOSC

By: Michal Růžička, Michal Javornik, Zdenka Dudova

Mentored by: Louise Bezuidenhout, Bérénice Batut

Status: graduated

The Multimodal and Functional Imaging Laboratory (MAFIL, https://mafil.ceitec.cz/en/) is one of the core facilities at CEITEC MUNI and part of the national large research infrastructure Czech-BioImaging and the European research infrastructure Euro-BioImaging. Its main role is to provide access to medical imaging technologies, mainly magnetic resonance imaging accompanied by various electrophysiological methods. Within this project we aim to prepare the data and metadata of neuroimaging datasets processed in MAFIL so that they follow FAIR principles and are ready for publication and cloud-based processing in EOSC. As MAFIL is an “open access” laboratory, i.e. it provides researchers outside of CEITEC with access to the laboratories, technologies, and experts of CEITEC to conduct their analysis and support their research needs, the procedures will be documented and training will be provided to MAFIL users (“customers”) so that they are aware of FAIRification procedures and able to apply them to their data, making them “EOSC ready”. The outputs and experiences will also be shared with other labs/nodes within the Czech-BioImaging/Euro-BioImaging infrastructures. We would therefore greatly appreciate and welcome any training, help, advice, or good practice on FAIRification and anonymisation of neuroimaging datasets.

The Turing Way - Developing a community health report and assessing its impact on the wider data science community

By: Ali Humayun

Mentored by: Malvika Sharan

In The Turing Way, we want to systematically understand the community practices, including the community engagement pathways, contributors’ roles and the nature of their participation, that have been successful at supporting its community of diverse contributors. Simultaneously, we want to identify factors that may currently prohibit short- or long-term commitments from our contributors and how they can be further supported.

With my participation in OLS-3, I will develop a community health report for the project, capturing community development aspects from growth to retention. I will build upon the Open Source community health metric (https://wiki.mozilla.org/Contribute/Community_Health), which involves evaluating the group of contributors actively involved in a project, the number of new contributors who join the project, and the members who leave. For online projects, it can also involve tracking the number of community ambassadors, the number of return attendees at events and the rate of churned attendees. Developing an ideal metric for this project will require further deliberation and consultation with The Turing Way team and core contributors. Hence, this project will be collaboratively designed with other community members by actively inviting their contributions and thoughts.
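As a rough illustration of the quantities mentioned above (active contributors, new contributors and members who leave), the following minimal sketch compares two reporting periods. It is not the Mozilla metric itself nor the report the project will produce; the function name, the period-based comparison and the contributor identifiers are assumptions made for the example.

```python
def community_health(previous_period, current_period):
    """Summarise contributor movement between two periods.

    Both arguments are iterables of contributor identifiers (hypothetical
    usernames here) that were active in the respective period.
    """
    previous = set(previous_period)
    current = set(current_period)

    returning = previous & current   # active in both periods
    new = current - previous         # joined in the current period
    churned = previous - current     # active before, not active now

    # Retention is undefined when there was nobody to retain.
    retention = len(returning) / len(previous) if previous else None

    return {
        "active": len(current),
        "new": len(new),
        "churned": len(churned),
        "retention_rate": retention,
    }


# Hypothetical contributor lists for two consecutive periods.
report = community_health(
    previous_period=["ana", "ben", "chen", "dita"],
    current_period=["ben", "chen", "eli"],
)
print(report)
```

Which periods to compare, and what counts as an "active" contribution, are exactly the kind of choices the text says need consultation with the community.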

Developing and embedding open science practices within the Research Application Management team at Turing

By: Aida Mehonic

Mentored by: Malvika Sharan

Keywords: open science workflow, open source code, stakeholder engagement, research application, ASG

I have just started a new role as a Research Application Manager at the Turing Institute. My responsibility is to define my own workplan as well as guide the development of the workplans of 2 other Research Application Managers once they are recruited. This is a new role and we do not yet have a blueprint for what a good research application manager at Turing looks like.

My goal is to embed open science practices into the philosophy and the workflow of the RAM team, as much as possible within the constraints of a given project and Theme.

Since I am personally new to the open science community, I would benefit from OLS training and mentorship. My ambition is to ensure that we create a good basis for open science practices within ASG and hopefully in other parts of Turing that RAMs interface with.

GyaNamuna: Virtual School Connecting Rural Students To The World

By: Prakriti Karki, Mohan Gupta, Ujwal Shrestha

Mentored by: Teresa Laguna

Status: graduated

Keywords: online, village, education, DIY, Makerspace

The project aims to connect rural school students to the cities of Nepal and to other countries. The pandemic has helped a few Nepalese rural schools get internet facilities. Our project will use the internet, online conferencing tools and a team of young people from different disciplines to connect our little heroes to the outside world, helping them learn languages, better understand other cultures, meet new friends, learn science, do DIY innovations and initiate a makerspace movement through a virtual collaborative environment.

By: Martina Vilas

Mentored by: Anna Krystalli

Status: graduated

Keywords: Reproducibility, AI trends, Data Science, Reproducible research practices, Computational research methods, Research software

In The Turing Way, we define reproducible research as work that can be independently recreated using the same data and code from the original study. Reproducible research is necessary to ensure that scientific output can be trusted and built upon. Despite this importance, many studies are difficult to reproduce, including those involving the application of a computational model.

To overcome this “Reproducibility Crisis” we need to identify and standardize reproducible practices that researchers can apply in their projects from the start. But these may vary across fields and methods. In this context, this project will quantitatively assess and derive those research practices that can ensure the reproducibility of studies involving the development and application of AI models for understanding cognitive systems, with the overarching goal of increasing their transparency and trustworthiness.

As a cognitive neuroscientist, I will develop a prototype of this assessment by identifying and openly documenting reproducible practices of computational modelling projects in my field. With my participation in OLS-3, I will review the reproducible practices of gold-standard studies and assess the level of transparency maintained in their research. I will also curate relevant guidelines and expert recommendations. The findings will be collaboratively reported as chapters in The Turing Way.

Towards an infrastructure for open-source (online) training in data science and AI

By: Mishka Nemes

Mentored by: Jez Cope

Keywords: education, training, data science, AI, open infrastructure, community

This project aims to devise, develop and implement an online tool that allows interested users to suggest or contribute to training courses in an open source fashion. The tool could involve a GitHub repository where users can suggest training ideas, review and comment on existing courses, or share their resources with the larger community. Given the Institute's role as the national institute for data science and AI promoting open, expert and ethical leadership, I believe the Institute and my team would be well placed to support such an engagement stream with the wider community of trainers and researchers.

Implementing a series of pedagogical games to teach pupils and citizens (metagenomic) data analysis

By: Teresa Müller, Alireza Khanteymoori, Masako Kaufmann, Florian Heyl

Mentored by: Yvan Le Bras

Status: graduated

Keywords: citizen science, DNA sequencing, metagenomics, Galaxy

As part of the Street Science Community, we successfully developed the BeerDeCoded project: a hands-on workshop for pupils and citizens with the general aim of scientific outreach. During these workshops, we help participants extract and identify the different yeasts contained in a beer sample. The identification is performed by sequencing the extracted yeast DNA, using our self-developed protocols, and analyzing the generated reads via an easy and straightforward Galaxy workflow. Because of the pandemic situation, we cannot run face-to-face workshops. For more scalable outreach to the public and the long-term sustainability of the project, we want to implement the data analysis as a series of fun and easy-to-understand online games. We will use already existing games to get participants interested and give them the biological background necessary for our project. Primarily, we will develop a game that teaches the data analysis of the BeerDeCoded project. Here, participants will get familiar with Galaxy, and run and play with their first data analysis pipeline. They will compare their results with others and use different available datasets. For this game, we will work with the Galaxy community on the technical part and with teachers on the pedagogical and gamification side.

Participants

William Jackson

Pronouns: He/Him
@0x174

Boston University

Expertise:
Robotics, Software, Automation, Computer Vision, Machine Learning

More about William

Abdulelah Al Mesfer

Pronouns: he/him
@asmesfer

More about Abdulelah

Afzal Ansari

Pronouns: he/him

Expertise:
Time series analysis with machine learning, Machine learning

More about Afzal

Aida Mehonic

Pronouns: she/her
@amehonic

The Alan Turing Institute

Expertise:
Creating user-friendly outputs from the research process, Data science, Biophysics

More about Aida

Ali Humayun

Pronouns: He/Him

Expertise:
Editing, Legal, Diversity issues

More about Ali

Annalee Sekulic

Pronouns: She/Her/Hers

Ohio State University

Expertise:
Arabian Botanicals, Public Databases, Social Memory

More about Annalee

Anshika Sah

Pronouns: SHE
@anshika24092962

Expertise:
Bioinformatics, Biology

More about Anshika

António Sousa

Pronouns: he/him
@antonioggsousa

Instituto Gulbenkian De Ciência

Expertise:
Bulk RNA-seq data analysis, Single-cell RNA-seq data analysis, Metabarcoding data analysis, R, Python, Bash

More about António

Arvinpreet Kaur

Pronouns: she/her/hers

Expertise:
Computer-aided drug design, Bioinformatics

Ashutosh Tiwari

Pronouns: He/him/his

Guru Ghasidas Vishwavidyalaya

Expertise:
Molecular Diagnostics, Microbial Technology

More about Ashutosh

Batool Almarzouq

Pronouns: She/Her
@batool664

Open Science Saudi Arabia

Expertise:
Reproducibility, Computational biology

More about Batool

Reina Camacho Toro

Pronouns: She/her
@rcamachotoro

CNRS/CERN. Co-Coordinator Of La-Conga Physics

Expertise:
Open education, Science and education capacity building, Virtual research and learning communities, Scientific connections between developed and developing countries, Particle physics

More about Reina

Carly Monks

Pronouns: She/Her
@archaeo_ecology

University Of Western Australia

Expertise:
Ethical data sharing, FAIR principles, CARE principles

More about Carly

Chiara Damiani

Carla Lancelotti

Pronouns: she-her
@cl379

Universitat Pompeu Fabra And ICREA

Expertise:
Archaeobotany

More about Carla

John Ogunsola

Pronouns: he/him
@JohnOgunsola

Institute Of Biodiversity, Animal Health & Comparative Medicine, University Of Glasgow

Expertise:
Genomics, Renal pathology, Trypanosome biology, Biological sciences, Trypanosomiasis, Veterinary pathology

More about John

Didik Utomo


Akademisi

Expertise:
Bioinformatics

More about Didik

Zdenka Dudova


Masaryk University

Expertise:
Biobanking software, Data management, IT infrastructure basics

More about Zdenka

Simon Duerr

Pronouns: he/him
@simonduerr

EPFL

Expertise:
Computational Chemistry, Biochemistry, Protein Design, Deep Learning

More about Simon

Emma Lawrence

Pronouns: She/hers
@emmaj22

UCL

Expertise:
Biobanking

More about Emma

Fabienne Lucas

Pronouns: she/hers
@DrFabLucas

Brigham And Women's Hospital

More about Fabienne

Grégory Hammad


University Of Liège

Expertise:
Physics, Actigraphy, Programming, Open-source software

More about Grégory

Harpreet Singh


Hans Raj Mahila Maha Vidyalaya Jalandhar

Expertise:
Bioinformatics, Molecular Modeling, Machine learning, R Programming

More about Harpreet

Florian Heyl


University Of Freiburg

Hilyatuz Zahroh

Pronouns: She/her
@hilyatuz_zahroh

Genetics Research Centre, Universitas Yarsi

Expertise:
Structural Bioinformatics, Human disease genetics, GWAS analysis, Pharmacogenomics, Open Science

More about Hilyatuz

Irene Ramos

Pronouns: she / her

National Commission For The Knowledge And Use Of Biodiversity (CONABIO)

Role in OLS: NASA Cohort Coordinator (contract)

Expertise:
FAIR, Open data, Data management, Agrobiodiversity, Sustainability, Transdisciplinary research

More about Irene

Iris San Pedro

Pronouns: She/her
@irisbotas

Expertise:
Open Science, Communication, Film

Ibrahim Said Ahmad

@Isabone

Bayero University Kano

Expertise:
Natural Language Processing

More about Ibrahim Said

Javier Ruiz Pérez

@J_Ruiz_Perez

Cases Research Group, Department Of Humanities, Universitat Pompeu Fabra

Expertise:
Phytolith analysis, Archaeobotany, Palaeoecology, South American archaeology, Amazonian archaeology

More about Javier

Jessica Sims

Pronouns: She/her
@jmaisi

University College London

Expertise:
Biobanking, Human samples for research, Policy, Engagement

More about Jessica

Expertise:
Public policy; science & technology policy; public management; program evaluation; civic tech

More about Jennifer

Juan José García-Granero

Pronouns: He/him/his

Spanish National Research Council

Expertise:
Archaeology

More about Juan José

Katharina Kloppenborg

Pronouns: she/her
@k_kloppenborg

Center For Research & Interdisciplinarity (CRI)

Expertise:
User experience, User-centered/participatory design, Citizen science, Peer production, Illustration

More about Katharina

Alireza Khanteymoori


University Of Freiburg

Expertise:
Machine Learning, Computational Intelligence, Bioinformatics

Haridimos Kondylakis

@kondylak

Collaborating Researcher, FORTH-ICS

Expertise:
Data Management, Semantics

More about Haridimos

Lomax Boyd

Pronouns: He/him
@lomaxboyd

The Rockefeller University

Expertise:
Neurogenetics, Evolution of the human brain

More about Lomax

Marco Madella

Pronouns: he/his
@m4bcn

Cases Research Group - ICREA - Universitat Pompeu Fabra

Expertise:
Archaeology, Palaeoenvironment

Marta Marin

Pronouns: She/her

EATRIS

Expertise:
Project management, Health research, Genetics, European research

Martina Vilas

Pronouns: She/her
@martinagvilas

Max-Planck-Institute Ae

Expertise:
Open source, Open source documentation, Open infrastructure, Open science communities, Version control, Computational Modeling, Machine learning, Neuroimaging, Neuroscience

More about Martina

Marzia Di Filippo


University Of Milano-Bicocca

Expertise:
Systems Biology, Metabolic Modelling, Constraint-based modelling

More about Marzia

Masako Kaufmann

Pronouns: She, her

Uniklinikum Freiburg

Expertise:
Genome editing

Mehak Chopra

Pronouns: she/her
@chopramehak18

Centre For Bioinformatics, Pondicherry University

More about Mehak

Michal Javornik


Masaryk University

Michal Růžička


Masaryk University, Institute Of Computer Science

Expertise:
Open science, FAIR data, Cybersecurity, Data management

More about Michal

Mishka Nemes

Pronouns: she/her
@mishkanemes

The Alan Turing Institute

Expertise:
Genomics, Neuroscience, Computational neuroscience, Computational modelling, Education

More about Mishka

Mohan Gupta

Pronouns: He/Him
@Mohan Gupta

Media Lab Nepal, Purbanchal University (PU)

Expertise:
Biotech research, Community science, Entrepreneurship

More about Mohan

Muhammet Celik

Pronouns: he/him

Alexander Martinez Mendez

@mxrtinez

Universidad Industrial De Santander / La-Conga Physics

Expertise:
Science reproducibility; open software; linux

More about Alexander

Nihan Sultan Milat

@MilatNihan

Bezmialem Vakif University, Beykoz Institute Of Life Science And Biotechnology

Expertise:
Life Sciences, Molecular biology, Developmental Genetics

More about Nihan Sultan

Olayile Ejekwu

Pronouns: She/Her

University Of Pretoria

Expertise:
Bioprocess Engineering

More about Olayile

Paolo Pedaletti


Università Milano Bicocca

Expertise:
Free / Open Source Software, Open data, Software licenses

More about Paolo

Dario Pescini

Pronouns: he/him
@darioPescini

University Of Milano-Bicocca

Expertise:
Systems biology, Computational biology, Systems simulation

Prakriti Karki

Pronouns: She/her

Tribhuvan University, Media Lab Nepal

Expertise:
Microbiology research, Community science, Women in science initiatives, DIY educational tools, Teaching

More about Prakriti

Prash Suravajhala

Pronouns: He/Him
@prashbio

Bioclues.Org

Expertise:
Systems genomics, bioinformatics

More about Prash

Robandeep Kaur

Pronouns: She/her

Expertise:
Bioinformatics

Robin Lewando


Independent

Expertise:
Geology, Palaeoecology, Palynology, QGIS, Website construction

More about Robin

Ruqayya Nasir Iro

Pronouns: She

Santi Rello Varona

Pronouns: he/him
@KropTor

Hospital La Paz Institute For Health Research

Expertise:
Cell Biology, Science Management

More about Santi

Sarah Gibson

Pronouns: she/her
@drsarahlgibson

The Alan Turing Institute

Expertise:
Reproducibility, Cloud infrastructure, Open source, Community building, Continuous integration

More about Sarah

Stelios Sfakianakis


FORTH-ICS

Expertise:
Bioinformatics, Software Design, Data Integration, Health Informatics

Shamsuddeen Muhammad

Pronouns: He/Him
@shmuhammadd

Bayero University, Kano - Nigeria

Expertise:
Natural language processing, Machine learning, R

More about Shamsuddeen

Steven Burgess

Pronouns: He/His/Him
@SJB_SynBio

University Of Illinois At Urbana-Champaign

Expertise:
Synthetic Biology

More about Steven

Manuel Spitschan

Pronouns: he/him/his
@mspitschan

University Of Oxford

Expertise:
Circadian neuroscience, Chronobiology, Visual neuroscience

More about Manuel

Teresa Müller

Pronouns: She/Her
@tesamueller

University Of Freiburg, Bioinformatics Group

Expertise:
RNA sequence-structure alignment, RNAseq, SELEX, Ribo-Seq

More about Teresa

Ujwal Shrestha

Pronouns: He/Him

Purbanchal University, Media Lab Nepal

Expertise:
Biotechnologist, Antibiotics, probiotics, Social entrepreneurship

More about Ujwal

Zenita Milla Luthfiya

Pronouns: She
@zenitamilla

Akademisi

More about Zenita Milla