Andrew West (CC BY-NC-SA 2.0)
Participants join this program with a project that they either are already working on or want to develop during this program.
For this round of the OLS program, we are happy to have 66 participants with 37 projects.
Mentored by: Fotis Psomopoulos
Status: graduated
Keywords: citizen science, peer-production, participatory design
Citizen science revolves around the idea of integrating the public in scientific research. However, there are different interpretations of this idea. A substantial share of citizen science projects allow laypeople to participate only in a limited scope of microtasks, thereby reinforcing the power gap between academic scientists and the public. The literature has called for more autonomy for citizen scientists by allowing them to participate in more phases of the research cycle. Commons-based peer production, an alternative mode of production in which people self-organize to develop complex knowledge commons like Wikipedia or open software, seems to be a promising approach to facilitate this. However, a design-centred approach implementing this for a specific use case is yet to be done. In my PhD project I am trying to fill this gap by redesigning the online ecosystem of Open Humans - an existing community of practice around citizen science - in close collaboration with this community, following a user-centered design approach. As one of the first steps, I am working on a best practices guide summarizing the experiences of similar existing projects.
By: Simon Duerr
Mentored by: Emily Lescak
Status: graduated
Keywords: conferences, virtual, poster session
VCMS (vcms.simonduerr.eu) is a convenient tool to set up a website for a virtual conference, including an abstract submission portal, time-zone-adapted scheduling of talks, and an interactive virtual poster session with video chat and spotlights for posters. Some features are still in development. The tool is currently in beta and will be released as FOSS under the MPL once the software is battle-tested (in mid-February).
By: Annalee Sekulic
Mentored by: Kate Simpson
Status: graduated
Keywords: database, video, generational memory, record, document, historical, Croatia, Diaspora
The “Memory Collecting: Croatian Homeland War” project aims to create a platform where survivors can submit video recordings of their own memories and reflections of the 1992 Homeland War. The repository will also store them in a publicly accessible database. By having the software be open to citizen scientists, the database will be one of the most inclusive and easily accessible memory banks. This initiative seeks to preserve the memory of the role of Croatian-Americans in the creation of free, modern Croatia during the Homeland War in the 1990s.
By: Steven Burgess
Mentored by: Stephen Klusza
Keywords: synthetic biology, community building, citizen science
I want to help build a culture of open science and good practice (as well as fun) within the plant synthetic biology community, with an initial emphasis on the US.
I hope to do this by (1) establishing an open toolset for genetic manipulation of algae and photosynthesis enzymes (2) developing an open repository of protocols for genetic manipulation (3) producing educational resources to aid experimentation, both in academia and for citizen scientists (4) building a community of interested individuals to expand and contribute to the project.
By: William Jackson
Mentored by:
Keywords: Robotics, Synthetic Biology, Open Source, Community Enhancement
An open-source software tool and associated protocol repository that translates wet-lab protocols into instruction sets for commonly available robotic liquid handlers. Protocols will be hosted on a publicly accessible website, and community members can edit, annotate, and report on different protocols. Think GitHub for biological protocols, with an issue tracker.
By: Robin Lewando
Mentored by: Bruno Soares
Keywords: palaeoecology, palaeoenvironment, palynology, interdependence, interconnectedness, landscape, public, geomorphology, geology, geography, microscopy, microfossils
This is a field- and laboratory-based research project researching, surveying, and discovering the palaeoecology and palaeogeography of West Cork. The project will make use of paper research methods; sampling and scientific analysis of sediments; digital mapping; field and site visits and landscape analysis; scientific processing, analysis and identification of microfossils from sediments; site visits and surveys; ecological surveys; and local enquiry. Results and findings will be published on a website in the form of stories, accounts, photographs, digital interactive maps, and graphics, with a prime emphasis on accessibility, understandability, and relevance. Principal attention will be paid to environmental areas that are productive of microfossils (bog and lake sediments); that have distinctive landscape features and sediment types (relict glacial and past and present fluvial landscape features); that have different natural habitat types and plant and animal communities; that are geologically distinctive; and that contain archaeological sites. Emphasis will be placed on the interconnectedness of these aspects of the current and past environments. The final step will be to show how, in each area, however local, these many and varied aspects have contributed to the present landscape and environment, and thus to give an understanding of how future development may progress.
By: Manuel Spitschan, Grégory Hammad
Mentored by: Mallory Freeberg
Status: graduated
Keywords: open data, open science, data schemas, metadata, actigraphy, actimetry, rest-activity cycles, circadian rhythms, chronobiology, sleep research
Actigraphy provides a measure of the 24h rest-activity cycles based on movement counts, typically of the wrist. It is obtained using wearable devices and is a widely used, non-invasive way to determine sleep and circadian properties. Importantly, metrics derived from actigraphy are being increasingly used in clinical contexts, where groups of psychiatric and neurological patients in specific conditions have been found to exhibit abnormal rest-activity rhythms and sleep. Sleep and circadian parameters from actigraphy are derived measures. These are obtained by converting the movement counts (usually obtained at a resolution of 1 minute) into sleep parameters and circadian metrics using algorithms ranging from threshold-based computations to machine learning techniques. Unfortunately, at present, there are no standards or schemas for specifying and sharing actigraphy data and corresponding algorithms.
The goal of this project is to develop a common schema for the use, analysis, reporting and open and interoperable sharing of actigraphy data across different actigraphy devices produced by different commercial manufacturers and for use by researchers and research users. This project builds upon core research and technical expertise amongst the team members, and provides a framework to structure the work of the newly funded Chronobiology Data Standards Interest Group (CDSIG).
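To make the idea of a threshold-based computation on movement counts concrete, here is a minimal sketch. It is not the project's algorithm: the function names, the threshold of 40 counts per epoch, and the sample data are all assumptions chosen for illustration, and published scoring algorithms are considerably more elaborate.

```python
from typing import List


def score_epochs(counts: List[int], threshold: int = 40) -> List[bool]:
    """Score each epoch as rest (True) when its movement count is below
    the threshold. The default threshold is an illustrative assumption."""
    return [count < threshold for count in counts]


def total_rest_minutes(counts: List[int], threshold: int = 40,
                       epoch_minutes: int = 1) -> int:
    """Sum the duration of all epochs scored as rest."""
    return sum(score_epochs(counts, threshold)) * epoch_minutes


# Hypothetical 1-minute movement counts from a wrist-worn device.
sample_counts = [120, 85, 30, 10, 0, 5, 60, 0]
rest_flags = score_epochs(sample_counts)
rest_minutes = total_rest_minutes(sample_counts)
```

Even this toy version shows why a shared schema matters: the result depends on the epoch length, the count definition, and the threshold, none of which are comparable across devices unless they are recorded alongside the data.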
By: Olayile Ejekwu
Mentored by: Renato Alves
Status: graduated
Keywords: Kinetic modelling, optimization, Bioprocesses, Microbial growth, parameter estimation
BioFerm is a web application platform which can be used for kinetic study, simulation and optimization of bioprocesses. The user is able to calculate the best initial conditions as well as overall operating conditions which will result in the highest product yield (or any user-specified output). Kinetic modelling can also be used to analyse the process further and to estimate yield and kinetic parameters. This allows the prediction of substrate, product and biomass concentrations over the bioprocess period. The BioFerm web application will be able to take in a variety of bioreactor configurations (batch, fed-batch, continuous) and fit the results to a variety of models (inhibition and non-inhibition) to return the above-mentioned parameters. The software is currently being written in Python using an open-source app framework (Streamlit) to run the app, but will later be rewritten using Django, also a popular web framework.
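As an illustration of the kind of kinetic simulation described above, the following sketch predicts biomass and substrate concentrations in a batch reactor. It assumes Monod kinetics (one common non-inhibition model) and simple Euler integration; the function names and parameter values are hypothetical and are not taken from BioFerm.

```python
from typing import List, Tuple


def monod_rate(substrate: float, mu_max: float, ks: float) -> float:
    """Specific growth rate (1/h) under Monod kinetics."""
    return mu_max * substrate / (ks + substrate)


def simulate_batch(x0: float, s0: float, mu_max: float, ks: float,
                   yield_xs: float, hours: float,
                   dt: float = 0.01) -> List[Tuple[float, float, float]]:
    """Integrate biomass (x) and substrate (s) over time with Euler steps.

    Returns a list of (time, biomass, substrate) tuples.
    """
    x, s = x0, s0
    history = [(0.0, x, s)]
    steps = int(round(hours / dt))
    for i in range(steps):
        mu = monod_rate(s, mu_max, ks)
        dx = mu * x * dt          # biomass formed in this step
        ds = dx / yield_xs        # substrate consumed to form it
        if ds > s:                # cannot consume more than remains
            ds = s
            dx = ds * yield_xs
        x += dx
        s -= ds
        history.append(((i + 1) * dt, x, s))
    return history


# Hypothetical parameters: concentrations in g/L, rates in 1/h.
trajectory = simulate_batch(x0=0.1, s0=10.0, mu_max=0.4, ks=0.5,
                            yield_xs=0.5, hours=24.0)
final_time, final_biomass, final_substrate = trajectory[-1]
```

Parameter estimation would run in the opposite direction: given measured concentrations, search for the `mu_max`, `ks`, and `yield_xs` values that minimise the difference between a simulation like this one and the data.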
By: Carly Monks
Mentored by: Esther Plomp
Status: graduated
Keywords: Australian archaeology, Open data, Indigenous Knowledge
This project will investigate existing literature on the benefits, risks, and limitations of open data practices in Australian environmental archaeology, seeking to characterise the ethical and practical issues associated with the dissemination of data owned or stewarded (either wholly or in part) by Indigenous communities. Environmental archaeology, and its partner field of palaeoecology, is inherently interdisciplinary, drawing on diverse lines of evidence including faunal and botanical remains, geomorphological records, and Indigenous knowledges in order to understand past and present human-environmental relationships. The project will consider the tensions between Western scientific and Indigenous epistemologies, including the ways in which ‘data’ are understood and connected (or disconnected) to people and places, and where the boundaries of ‘archaeological’ and ‘non-archaeological’ environmental records lie. This project will provide the groundwork for the development of a larger, collaborative project engaging Indigenous and non-Indigenous researchers to advance a Code of Conduct for Australian archaeologists and palaeoecologists seeking to work openly while supporting the rights of Indigenous communities to manage access as they consider appropriate.
By: Fabienne Lucas
Mentored by: Sonika Tyagi
Status: graduated
Keywords: rigor, reproducibility, peer-review, publishing, research quality defects, experimental methods, flow cytometry, tool, AI-assisted peer-review
The MiSET initiative aims to develop a minimum set of quality standards in the form of a quality assessment tool that evaluates the technical aspects of cytometry publications, and to fully integrate these flow cytometry standards into grant submission and publication requirements across scientific fields (Lucas et al., Cytometry A 2019).
By: John Ogunsola
Mentored by: Sam Haynes, Yo Yehudi
Keywords: bioinformatics, data visualization, open educational resource
Genetic variants of APOL1 commonly found in people of recent African ancestry can predispose to chronic kidney disease. It is however unknown if and to what extent the variants are present outside of Africa. This project aims to create a visual representation of the global distribution of the frequencies of these genetic variants, by mining genomic information from publicly available datasets.
By: Reina Camacho Toro, Alexander Martinez Mendez
Mentored by: Laura Ación
Status: graduated
Keywords: Open educational content, Data science training, Open science training
LA-CoNGA physics is an Erasmus+ project, a European-Latin American network of 11 universities, 9 research institutions and 3 industrial partners (2 of them in the data science field) in advanced physics. We aim to create a set of postgraduate courses in Advanced Physics (high energy physics and complex systems) that will be common and inter-institutional, supported by the installation of interconnected instrumentation laboratories. This program will be inserted as a specialization in the Physics masters of the 8 Latin American partners in Colombia, Ecuador, Peru and Venezuela. It will comply with the Bologna protocols and is based on three pillars: courses in physics theory/phenomenology, data science and instrumentation.
We are guided by the principles of open science and education:
* Content should be engaging and pedagogical.
* The content will be created and made available following the FAIR principles.
* Reproducibility is the base of the data science pillar. We want to teach the students how to use the correct tools to work with large amounts of data but also create an environment where the reproducibility of their work, tasks and projects is inculcated and applied from the first day.
Our website: https://laconga.redclara.net Our GitHub repository: https://github.com/LA-CoNGA
By: Lomax Boyd
Mentored by: Melissa Burke
Keywords: online research, mentorship, virtual environments, Jupyter notebooks
Inspired by the social clubs founded by Benjamin Franklin, the Junto Labs initiative seeks to provide life science researchers with an online space for pursuing collaborative research and supporting active learning. Life science laboratories can be open and highly collaborative spaces for in person research, learning and discovery. While online tools, such as Git and Jupyter notebooks, help facilitate openness and reproducibility among peers, they can also provide a highly creative and flexible medium for designing interactive educational experiences. The Junto Labs initiative aims to create a catalog of Jupyter notebooks that exemplify how to design virtual environments optimized for conducting research, facilitating mentorship, and encouraging active learning. Researchers would be able to more easily collaborate on active projects, but also expand active learning opportunities for students who may not otherwise have the chance to participate in research. Importantly, life science laboratories could use the resource to design and provide research and mentorship opportunities to students from under-resourced communities or universities where opportunities to participate in life science research are limited or nonexistent.
By: António Sousa
Mentored by: Hans-Rudolf Hotz
Status: graduated
Keywords: metabarcoding, python pipeline, reproducible
The emergence of short-read NGS technologies has brought profound knowledge to the field of microbial ecology/evolution through the taxonomic identification of microbial communities - metabarcoding. Their main limitation, the short read length, has been overcome by long-read/real-time sequencing technologies such as Oxford Nanopore MinION. Currently, there are many standalone tools/algorithms to process these data, including bioinformatic pipelines, but they lack good integration. My proposal is the development of a modular Python pipeline for nanopore metabarcoding data - metaNanoPype - with the following modules: (I) demultiplexing; (II) quality-assessment; (III) quality-filtering and trimming; (IV) taxonomic classification; (V) diversity analyses. Each module could include several options to allow flexibility. Each step could generate a log file used later to build a report in html/pdf format describing the versions, commands and references of the software used. The report would ensure reproducibility, transparency and acknowledgement, and could be used as supplementary material for papers. metaNanoPype could be made publicly available on GitHub (open source), with further documentation published with GitHub Pages.
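The modular design with per-step logging can be sketched as follows. This is an illustrative outline only, not metaNanoPype code: the `Pipeline` class, the two toy steps, and the log format are assumptions, and real modules would wrap external tools rather than operate on in-memory strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Pipeline:
    """Runs named steps in order and logs what each one did."""
    steps: List[Tuple[str, Callable, dict]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def add(self, name: str, func: Callable, **options) -> None:
        """Register a step together with its options."""
        self.steps.append((name, func, options))

    def run(self, reads: List[str]) -> List[str]:
        """Apply every step and record its options and read counts."""
        for name, func, options in self.steps:
            reads_in = len(reads)
            reads = func(reads, **options)
            self.log.append(
                f"{name} options={options} "
                f"reads_in={reads_in} reads_out={len(reads)}"
            )
        return reads


def filter_by_length(reads: List[str], min_length: int = 1000) -> List[str]:
    """Toy quality filter: keep reads of at least min_length bases."""
    return [read for read in reads if len(read) >= min_length]


def trim_ends(reads: List[str], n: int = 1) -> List[str]:
    """Toy trimming step: remove n bases from both ends of each read."""
    return [read[n:len(read) - n] for read in reads]


pipeline = Pipeline()
pipeline.add("quality-filtering", filter_by_length, min_length=5)
pipeline.add("trimming", trim_ends, n=1)
processed = pipeline.run(["ACGTACGT", "ACG", "TTTTTT"])
report = "\n".join(pipeline.log)  # plain-text basis for an html/pdf report
```

Keeping the options in the log next to the read counts is what would let a later report state exactly which settings produced a given result.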
Mentored by: Michael Landi, Renato Alves, Toby Hodges
Status: graduated
Keywords: open educational resource, open-collaborative, molecular biology
Molecular biology seeks to discover, identify and explain the mechanisms that operate at the level of DNA, RNA and protein in a cell. Although it is a relatively young discipline, its prominence in the life sciences keeps growing. Within the scope of my project, I aim to evaluate papers on molecular biology studies and make them available to everyone. The main idea is to choose a weekly topic and summarize it so that everyone can understand it. As a workflow, I aim to write a brief introduction to the paper and its authors, explain the purpose of the study, and present the results. In short, I would like to design a publicly accessible website. I aim to make this website a resource for the academic community, students and anyone else who wants to read and learn. At the same time, I plan to prepare a section where readers can ask questions and share them with other readers, to build a community that discusses the article in question. I want to connect students and researchers in this field of science so that they can improve their knowledge, share ideas and even find new ones.
By: Muhammet Celik
Mentored by: Bérénice Batut, Yo Yehudi, Malvika Sharan
Keywords: value of OLS, participant perspective, internalize openness, pendown
OLS is a great platform for opening up life science, with the objective of training young researchers in open science practices and skills. I am a graduate of OLS-2, which recently concluded, and I came out of the program feeling that it offers tremendous value. However, it might not be reaching as many people as it could. I think one way of extending the outreach beyond what has already been done is to pen down the experiences of participants from previous programs. As a participant myself, I can see that there are many ways to promote the program, share the journey with readers, especially the younger generation, and highlight its essence. I have therefore taken it upon myself to contribute in this direction by rejoining for OLS-3, with this as my project and the goal of producing a tangible document in the form of a publication to be shared with the community at large.
By: Afzal Ansari, Abdulelah Al Mesfer
Mentored by: Toby Hodges
Status: graduated
Keywords: sktime, documentation, algorithm maintainer, codeowner
sktime is a new Python toolbox for machine learning with time series (https://github.com/alan-turing-institute/sktime). It provides state-of-the-art time series algorithms and scikit-learn compatible tools for building, tuning and evaluating complex models. The goal of this project is to improve sktime’s online documentation with a specific focus on documenting algorithm contributors. Algorithms form a major part of sktime. They require special expertise in their development and maintenance. We plan to enhance the existing documentation by making algorithm contributors more visible. The aim is (i) to make it easier for users and other developers to get in touch directly with the algorithm experts to ask questions or suggest code improvements and (ii) to recognize their contributions more visibly and formally, to encourage long-term maintenance of their contributions. sktime has already defined a new community role as part of its governance guidelines to ensure that algorithm contributors have extra rights and responsibilities with regard to their algorithms. However, up-to-date documentation listing the current contributors with links to their algorithms is currently missing. Optionally, we can add other information such as literature references. We plan to automate the generation of this documentation by making use of the existing documentation and other components such as the CODEOWNERS file and author strings in Python files.
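A rough sketch of how such automation could start is shown below. It is not sktime's implementation: the function names, the sample CODEOWNERS content, the file path, and the user names are hypothetical, and the sketch assumes that author strings are stored in a module-level `__author__` variable. The CODEOWNERS parser is simplified and does not handle escaped `#` characters in patterns.

```python
import ast
from typing import Dict, List


def parse_codeowners(text: str) -> Dict[str, List[str]]:
    """Map each path pattern in a CODEOWNERS file to its listed owners."""
    owners: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        pattern, *names = line.split()
        owners[pattern] = names
    return owners


def read_authors(source: str) -> List[str]:
    """Extract a module-level __author__ string or list from Python source."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__author__":
                    value = ast.literal_eval(node.value)
                    return [value] if isinstance(value, str) else list(value)
    return []


def render_owner_table(owners: Dict[str, List[str]]) -> str:
    """Render one plain-text line per pattern for the documentation."""
    return "\n".join(
        f"{pattern}: {', '.join(names)}"
        for pattern, names in sorted(owners.items())
    )


# Hypothetical inputs for illustration only.
codeowners_text = """
# algorithm maintainers
sktime/example/_algo.py @alice @bob
"""
module_source = '__author__ = ["alice", "carol"]\n'

owners = parse_codeowners(codeowners_text)
authors = read_authors(module_source)
table = render_owner_table(owners)
```

Parsing the source with `ast` rather than importing the module avoids executing package code while the documentation is being built.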
By: Sarah Gibson
Mentored by: Meag Doherty
Status: graduated
Keywords: open-source, research, strategy
This project is to develop a Turing Service Area in Open Source that will provide formal support in open working and embedding best practices of open software development into Turing projects. This service area will create an Open Developer Advocate position whose role will be to work with and guide projects into working openly and either build a community around their open project, or make a contribution to an existing open project. This guidance would take the form of regular meetings, co-working and/or drop-in sessions and would address roadmapping of the project in terms of its open goals, and developing project policies for engaging openly. The area would work with the Turing Way project to draw on existing material and contribute new processes there.
By: Javier Ruiz Pérez, Juan José García-Granero, Carla Lancelotti, Marco Madella
Mentored by: Emma Karoune
Status: graduated
Keywords: FAIR, data sharing, palaeobotany, phytolith research, archaeology, palaeoecology
Phytoliths are microfossils of plants used world-wide to address a variety of questions in fields like archaeology, palaeoecology and palaeontology. Diverse laboratory procedures, analyses and identification criteria are used resulting from different research traditions. Some steps, such as the normalisation of nomenclature through the International Phytolith Society, have been promoted to standardise the phytolith analysis and the subsequent publication of data. However, the standardisation of phytolith research and data publication is still far from being achieved. Moreover, a recent assessment of the data sharing practices within the phytolith community found only half of the publications share some form of data and the majority do not provide reusable data. This project has grown from initial efforts by Emma Karoune during OLS2 to raise awareness of issues with poor data sharing practice. It is part of a broader initiative supported by the International Phytolith Society on data sharing and represents the first steps towards the FAIRification of phytolith data: an evaluation of sharing practices in phytolith research; the creation of a GitHub repository for collaborative use by this working group and in the forthcoming FAIRification project; and the development of a webpage to provide the community with information as the project proceeds.
By: Arvinpreet Kaur, Ashutosh Tiwari, Robandeep Kaur, Mehak Chopra, Harpreet Singh, Prash Suravajhala
Mentored by: Prash Suravajhala, Harpreet Singh, Bérénice Batut
Status: graduated
Keywords: Obesity, Diabetes, Gut microbiome, Linkage disequilibrium, pleiotropy
Obesity causes approximately 4.7 million premature deaths annually, which accounts for a loss of ca. 8% globally. Obesity is an outcome of a complex, heritable, and multi-factorial interaction of multiple genes, environmental factors, and behavioral traits that makes management and prevention challenging in the human population (Rao et al., 2014). Experimental research has demonstrated that altered metabolites in multiple metabolic pathways are associated with obesity (Zhao et al., 2016). Alteration in the proportion of Bacteroidetes and Firmicutes in the gut microbiome can trigger obesity. However, the gut microbiome’s influence on obesity is much more complicated than an imbalance between these bacterial groups. Modulation of the gut microbiome through diet, prebiotics, surgery, and antibiotics significantly affects the obesity epidemic (John & Mullin, 2016). Obesity is one of the largest global health problems, with increased morbidity and mortality mediated by its association with several other metabolic disorders (Saini et al., 2018). We aim to target obesity and diabetes-associated metabolic disorders and annotate the genes common to these complex diseases using an integrated systems genomics approach, with Galaxy as a platform.
By: Didik Utomo, Hilyatuz Zahroh, Zenita Milla Luthfiya
Mentored by: Iratxe Puebla
Status: graduated
Keywords: preprints, open resource, open access, publishing
AKADEMISI PREPRINTS is a free distribution service for preprints from multidisciplinary fields. The server plans to include a connection hub to journals and an open peer review community. By doing so, we hope to promote the transparency and quick visibility of research results to the public.
By: Batool Almarzouq
Mentored by: Anelda van der Walt
Status: graduated
Keywords: Open science, Saudi Arabia, Community
Although there is an increasing number of initiatives in Saudi Arabia to raise awareness in Data Science (DS) and connect researchers in artificial intelligence (AI), there is no single community dedicated to stimulating responsible research practices and Open Science policies. I wish (with the help of a mentor) to establish an open science community in Saudi Arabia. Our target groups are researchers and students who are open and curious about open science but have little to no experience with open science practices.
By: Anshika Sah
Mentored by: Yo Yehudi
Status: graduated
Keywords: SARS coronavirus 3C-like proteinase, IC50, pIC50, Bioactivity, Lipinski’s rule, Scatter plot, Frequency plot, Box plot, Mann-Whitney test
Biological activity data were retrieved from the ChEMBL database and pre-processed. The target selected for this project was SARS coronavirus 3C-like proteinase. The data frame for the target protein was filtered by removing molecules whose standard type was not IC50 and those with a missing standard value. Compounds were classified as active, inactive, or intermediate according to their IC50 values. The SMILES notation (representing the unique chemical structure of each compound) from the dataset was used to compute molecular descriptors. The project uses Lipinski’s descriptors, which consider molecular weight, LogP, number of hydrogen bond donors, and number of hydrogen bond acceptors. These descriptors are related to the pharmacokinetic properties of molecules. Exploratory data analysis was performed on Lipinski’s descriptors. Simple box plots and scatter plots were drawn to discern differences between the active and inactive sets of compounds. A Mann-Whitney U test was performed for each descriptor to determine whether the difference between active and inactive molecules was statistically significant.
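Two of the steps above, classifying compounds by IC50 and comparing groups with the Mann-Whitney U statistic, can be sketched with the standard library alone. The cut-offs used here (active at or below 1,000 nM, inactive at or above 10,000 nM) are assumptions, since the text does not state the thresholds, and the sample values are invented. Computing Lipinski's descriptors from SMILES needs a cheminformatics library and is not shown.

```python
import math
from typing import Sequence


def pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 (-log10 of molar IC50)."""
    return -math.log10(ic50_nm * 1e-9)


def bioactivity_class(ic50_nm: float, active_max: float = 1000.0,
                      inactive_min: float = 10000.0) -> str:
    """Label a compound by its IC50; the cut-offs are assumed defaults."""
    if ic50_nm <= active_max:
        return "active"
    if ic50_nm >= inactive_min:
        return "inactive"
    return "intermediate"


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U statistic for sample a: the number of (x, y) pairs with x > y,
    counting each tie as one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Invented descriptor values for two groups of compounds.
active_values = [6.2, 6.8, 7.1]
inactive_values = [4.0, 4.5, 5.0]
u_statistic = mann_whitney_u(active_values, inactive_values)
```

This returns only the U statistic. For a p-value, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would normally be used.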
By: Irene Ramos
Mentored by: Piraveen Gopalasingam
Status: graduated
Keywords: agrobiodiversity, open data, oer
I aim to develop training materials to support the use of open data by researchers working on agrobiodiversity conservation. At CONABIO (Mexico), a governmental agency that coordinates biodiversity data collection, I collaborate in the development of an Agrobiodiversity Information System (SIAgroBD); my role involves technical and community management responsibilities. Currently, twelve teams of students and researchers from different institutions contribute to field data collection for SIAgroBD. While we are committed to open practices at CONABIO and all collected data are open, some external contributors lack the skills to use these data, even if they have helped collect them, and are not familiar with open practices. Thus my project consists of developing training materials (OER) for an introductory workshop on open data with a focus on FAIR principles, biodiversity standards and effective management strategies, among other skills that encourage contributors to become active users of data and not only collectors. The integration of social and biological information and the use of Indigenous data are distinctive features of agrobiodiversity research in Mexico that will also be addressed. I expect this to serve as a prototype for advanced training modules that could be used by future contributors or other researchers working on agrobiodiversity topics.
By: Jennifer Miller
Mentored by: Beth Duckles
Status: graduated
Keywords: postdoc, empirical legal research, systematic review, public policy, open notebook science
The project is an open notebook living systematic review of legal documents related to postdoctoral scholars and appointments (postdocs). The project aims to use the methods of empirical legal scholarship to describe and categorize the ways postdoctoral scholars and their appointments have been involved in the legal system. Briefly, empirical legal scholarship is a form of qualitative or mixed-methods research, often involving content analysis, that uses legal documents or decisions as its data source. We are not aware of any other research applying this method or data source to the study of postdocs. In fact, there has been little research of any kind on the legal aspects of postdoc appointments.
Building on a “file drawer” paper by Jennifer Miller (with Kristina Van Buskirk), we frame our project around the question of whether postdocs are employees or students. Based on economic theory, we expect the types of cases to reflect whether postdocs are employees producing in a labor market or students consuming in a services market.
More information about the project is available on GitHub https://github.com/JMMaok/postdoc/projects and Zotero https://www.zotero.org/groups/
By: Emma Lawrence, Jessica Sims
Mentored by: Sarah Gibson
Status: graduated
Keywords: Biobanking, research, samples, biospecimens, COVID19
The mission of the UKCRC Tissue Directory and Coordination Centre (UKCRC TDCC) is to maximise the use, value and impact of the UK’s human sample resources, in the UK and beyond. The UKCRC TDCC is creating a world-leading, research-enabling, and networked biobanking infrastructure to facilitate the discovery and use of the UK’s human samples and data. The UKCRC TDCC works to help researchers discover samples and data, help sample resources improve their data systems for sharing, and harmonise policy relating to the discovery and use of samples and data. The work of the UKCRC TDCC is guided by the belief that the biomedical research ecosystem should be based on open standards, open science, and pre-competitive collaboration.
By: Shamsuddeen Muhammad, Ibrahim Said Ahmad, Ruqayya Nasir Iro
Mentored by: Laura Carter
Status: graduated
Keywords: Natural Language Processing, Low-resources, Machine Learning, Corpus, Language resources
This work aims to create a Nigerian sentiment corpus, a sentiment lexicon, and a hate speech lexicon through manual annotation for three different languages (Hausa, Igbo, Yoruba). Our method for the creation of these language resources is as follows:
Nigerian Sentiment Corpus: To create the sentiment corpus, tweets from major Nigerian news headlines for each of the three languages will be crawled from Twitter using an existing Python crawler we developed. Ten thousand tweets will be extracted per language via the Twitter API. Thereafter, the tweets will be annotated by native annotators for each of the languages. These annotators will be hired and trained to perform the annotation. The annotation task consists of labeling each tweet as positive, negative or neutral. To mitigate errors and bias, each dataset will be annotated by three different annotators, after which the project team will compute the kappa agreement between the annotators.
Nigerian Sentiment Lexicon: In the same way, manual annotation of the tweets will be used to create the sentiment lexicon for each of the three languages. The sentiment lexicon annotation task involves identifying sentiment-bearing words in each tweet and assigning a sentiment score from +1 to +5 (with +1 being the most negative sentiment and +5 the most positive sentiment).
Nigerian Hate Speech Lexicon: Extreme negative sentiment from the sentiment lexicon will be used to develop the hate speech lexicon.
Annotation tool: We plan to use a web-based annotation tool, brat (Stenetorp et al., 2012), which many researchers have shown to be efficient for this type of task. The annotators must be native speakers of the language and follow the annotation guidelines provided by the project team.
HausaNLP aims to create more language resources that can be used to train models in machine learning.
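The kappa agreement mentioned for the sentiment corpus could be computed along the following lines. The text does not say which kappa statistic will be used; this sketch assumes Fleiss' kappa, a common choice when each item is labeled by the same number of annotators (three here), and the sample labels are invented.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that each carry one label per annotator.

    ratings[i] holds the labels given to item i; every item must have
    the same number of labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of labels")

    label_totals: Counter = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        label_totals.update(counts)
        # Share of annotator pairs that agree on this item.
        agreement_sum += (
            sum(c * c for c in counts.values()) - n_raters
        ) / (n_raters * (n_raters - 1))

    observed = agreement_sum / n_items
    expected = sum(
        (total / (n_items * n_raters)) ** 2
        for total in label_totals.values()
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is used")
    return (observed - expected) / (1.0 - expected)


# Invented labels from three annotators for four tweets.
tweet_labels = [
    ["positive", "positive", "positive"],
    ["negative", "negative", "neutral"],
    ["neutral", "neutral", "neutral"],
    ["negative", "negative", "negative"],
]
kappa = fleiss_kappa(tweet_labels)
```

Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance, so a low score would point to guidelines that need revision before annotation continues.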
By: Haridimos Kondylakis, Stelios Sfakianakis
Mentored by: Harpreet Singh
Status: graduated
Keywords: Prostate Cancer, Open Data, PCa
In Europe, prostate cancer (PCa) is the second most frequent type of cancer in men and the third most lethal. Current clinical practices, often leading to overdiagnosis and overtreatment of indolent tumors, suffer from a lack of precision, calling for advanced AI models that go beyond the state of the art (SoA). The ProCAncer-I project brings together 20 partners, including PCa centers of reference, world leaders in AI, and innovative SMEs, with the objective to design, develop, and sustain a cloud-based, secure European Image Infrastructure with tools and services for data handling. The platform hosts the largest collection of anonymized PCa multi-parametric (mp)MRI image data worldwide (>17,000 cases), based on data donorship, in line with EU legislation (GDPR). Robust AI models are developed, based on novel ensemble learning methodologies, leading to vendor-specific and vendor-neutral AI models for addressing 8 PCa clinical scenarios. To accelerate the clinical translation of PCa AI models, we focus on improving trust in the solutions with respect to fairness, safety, explainability, and reproducibility. A roadmap for AI model certification is defined in interaction with regulatory authorities, thus contributing to a European regulatory roadmap for validating the effectiveness of AI-based models for clinical decision making.
By: Dario Pescini, Marzia Di Filippo, Chiara Damiani, Paolo Pedaletti
Mentored by: Bérénice Batut
Keywords: community building, Systems Biology, Metabolism, quantitative Life Science, technology, infrastructure
The long-term objective of the project is to establish at my university a core team/lab able to help the community design and implement open science projects. To start this long-term project, I believe that working on a use case would help in several ways: it will help to consolidate and harmonize the team's domain knowledge, to begin involving the technical and administrative sides, and to gain visibility and credibility. The use case is a computational framework to aid metabolism modelling that we are currently developing in my lab and that is on its way to being published. The publication that will accompany this framework is nearly ready to be submitted, and the framework itself is developed entirely with open software. This use case is particularly well suited to following various aspects of the Open Science approach, from journal paper management to software publication. I think that this application can be a great opportunity to learn the open science approach in an organic way and to discover how to put it into practice.
By: Marta Marin, Santi Rello Varona, Iris San Pedro
Mentored by: Joyce Kao
Status: graduated
Keywords: Health Research Institutes, Network of Open Science Ambassadors, Best practices’ Toolkits, Open Science implementation
This project will create a network of Open Science (OS) ambassadors in Spanish Health Research Institutes (HRI). It thus aims to implement OS in HRI by raising awareness of the OS principles that apply to this particular field (i.e., reproducibility, transparency, dissemination and data sharing). To enable an easy and comprehensive transition to OS, researchers will be provided with access to the best practices’ Toolkits for OS implementation. As part of their activity, OS ambassadors will be encouraged to engage with the general public, patients and the future generations of scientists.
To accomplish that, professionals in the HRI who are willing to be trained as OS ambassadors will be identified and recruited. This network will be in charge of promoting OS in their institutions. The ambassadors will identify potential OS activities that can be kickstarted, answer questions raised, and advise on best practices in their institutions.
At the end of this project, a network will have been created, together with a framework to keep this group active, and ambassadors will disseminate the knowledge gathered about the application of OS principles in health research in their institutes. This will allow the progressive implementation of OS in HRI.
By: Michal Růžička, Michal Javornik, Zdenka Dudova
Mentored by: Louise Bezuidenhout, Bérénice Batut
Status: graduated
Multimodal and Functional Imaging Laboratory (MAFIL, https://mafil.ceitec.cz/en/) is one of the core facilities at CEITEC MUNI and part of the national large research infrastructure Czech-BioImaging and the European research infrastructure Euro-BioImaging. Its main role is to provide access to medical imaging technologies – mainly magnetic resonance imaging accompanied by various electrophysiological methods. Within this project we aim to prepare the data and metadata of neuroimaging datasets processed in MAFIL to follow FAIR principles and be ready for publication and cloud-based processing in EOSC. As MAFIL is an “open access” laboratory, i.e. it provides researchers outside of CEITEC with access to the laboratories, technologies, and experts of CEITEC to conduct their analyses and support their research needs, the procedures will be documented and training will be provided to MAFIL users (“customers”) so that they are aware of FAIRification procedures and able to apply them to their data, making them “EOSC ready”. The outputs and experiences will also be shared with other labs/nodes within the Czech-BioImaging/Euro-BioImaging infrastructures. Thus, we would very much appreciate and welcome any training, help, advice, or good practice on FAIRification and anonymisation of neuroimaging datasets.
By: Ali Humayun
Mentored by: Malvika Sharan
In The Turing Way, we want to systematically understand community practices including the community engagement pathways, contributors’ roles and nature of their participation that have been successful at supporting its community of diverse contributors. Simultaneously, we want to identify factors that may currently prohibit short or long term commitments of our contributors and how they can be further supported.
With my participation in OLS-3, I will develop a community health report of the project, capturing community development aspects from growth to retention. I will build upon the Open Source community health metric (https://wiki.mozilla.org/Contribute/Community_Health), which involves evaluating the group of contributors that is actively involved in a project, the number of new contributors who join the project, and the members who leave. For online projects, it can also involve tracking the number of community ambassadors, the number of return attendees to events and the rate of churned attendees. Developing an ideal metric in this project will require further deliberation and consultation with The Turing Way team and core contributors. Hence, this project will be collaboratively designed with other community members by actively inviting their contributions and thoughts.
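To make the quantities mentioned above concrete, here is a minimal Python sketch that compares the contributors active in two consecutive periods. The function name, the input format (sets of contributor identifiers), and the formulas for churn and retention are assumptions made for illustration; they are not the definitions used by the linked community health metric or by The Turing Way.

```python
def community_health(previous, current):
    """Compare contributors active in two consecutive periods.

    `previous` and `current` are collections of contributor identifiers.
    All formulas below are illustrative assumptions.
    """
    previous, current = set(previous), set(current)
    new = current - previous        # joined in the current period
    left = previous - current       # active before, not active now
    retained = previous & current   # active in both periods
    base = len(previous)
    return {
        "active": len(current),
        "new": len(new),
        "left": len(left),
        "retained": len(retained),
        # Share of the previous period's contributors who did not return.
        "churn_rate": len(left) / base if base else 0.0,
        # Share of the previous period's contributors who stayed.
        "retention_rate": len(retained) / base if base else 0.0,
    }


# Placeholder identifiers (not real contributors).
report = community_health({"a", "b", "c", "d"}, {"c", "d", "e"})
```

The same comparison could be applied to event attendance, with attendee identifiers in place of contributor identifiers, to count return attendees and churned attendees.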
By: Aida Mehonic
Mentored by: Malvika Sharan
Keywords: open science workflow, open source code, stakeholder engagement, research application, ASG
I have just started a new role as a Research Application Manager at the Turing Institute. My responsibility is to define my own workplan as well as guide the development of the workplans of 2 other Research Application Managers once they are recruited. This is a new role and we do not yet have a blueprint for what a good research application manager at Turing looks like.
My goal is to embed open science practices into the philosophy and the workflow of the RAM team, as much as possible within the constraints of a given project and Theme.
Since I am personally new to the open science community, I would benefit from OLS training and mentorship. My ambition is to ensure that we create a good basis for open science practices within ASG and hopefully in other parts of Turing that RAMs interface with.
By: Prakriti Karki, Mohan Gupta, Ujwal Shrestha
Mentored by: Teresa Laguna
Status: graduated
Keywords: online, village, education, DIY, Makerspace
The project aims to connect rural school students to the cities of Nepal and to other countries. The pandemic has helped a few rural Nepalese schools get internet access. Our project will use the internet, online conferencing tools and a team of young people from different disciplines to connect our little heroes to the outside world, helping them learn languages, better understand other cultures, meet new friends, learn science, do DIY innovations and initiate a makerspace movement through a virtual collaborative environment.
By: Martina Vilas
Mentored by: Anna Krystalli
Status: graduated
Keywords: Reproducibility, AI trends, Data Science, Reproducible research practices, Computational research methods, Research software
In The Turing Way, we define reproducible research as work that can be independently recreated using the same data and code from the original study. Reproducible research is necessary to ensure that scientific output can be trusted and built upon. Despite this importance, many studies are difficult to reproduce, including those involving the application of a computational model.
To overcome this “Reproducibility Crisis”, we need to identify and standardize reproducible practices that researchers can apply in their projects from the start. But these may vary across fields and methods. In this context, this project will quantitatively assess and derive those research practices that can ensure the reproducibility of studies involving the development and application of AI models for understanding cognitive systems, with the overarching goal of increasing their transparency and trustworthiness.
As a cognitive neuroscientist, I will develop a prototype of this assessment by identifying and openly documenting reproducible practices of computational modelling projects in my field. With my participation in OLS-3, I will review the reproducible practices of gold-standard studies and assess the level of transparency maintained in their research. I will also curate relevant guidelines and expert-recommendations. The findings will be collaboratively reported as chapters in The Turing Way.
By: Mishka Nemes
Mentored by: Jez Cope
Keywords: education, training, data science, AI, open infrastructure, community
This project aims to devise, develop and implement an online tool that allows interested users to suggest or contribute to training courses in an open source fashion. The tool could involve a GitHub repository where users can suggest training ideas, review and comment on existing courses, or share their resources with the larger community. As the national institute for data science and AI promoting open, expert and ethical leadership, the Institute, together with my team, would in my view be well placed to support such an engagement stream with the wider community of trainers and researchers.
By: Teresa Müller, Alireza Khanteymoori, Masako Kaufmann, Florian Heyl
Mentored by: Yvan Le Bras
Status: graduated
Keywords: citizen science, DNA sequencing, metagenomics, Galaxy
As part of the Street Science Community, we successfully developed the BeerDeCoded project: a hands-on workshop for pupils and citizens with the general aim of scientific outreach. During these workshops, we help participants to extract and identify different yeasts contained in a beer sample. The identification is performed by sequencing the extracted yeast DNA, using our self-developed protocols, and analyzing the generated reads via an easy and straightforward Galaxy workflow. Because of the pandemic situation, we cannot run face-to-face workshops. For a more scalable outreach to the public and the long-term sustainability of the project, we want to implement the data analysis as a series of fun and easy-to-understand online games. We will use already existing games to get participants interested and give them the biological background necessary for our project. Primarily, we will develop a game that teaches the data analysis of the BeerDeCoded project. Here, participants will get familiar with Galaxy and run and play with their first data analysis pipeline. They are going to compare their results with others and use different available datasets. For this game, we will work with the Galaxy community on the technical part and with teachers on the pedagogical and gamification side.
I’m the primary Software Engineer at the DAMP Lab at Boston University, focusing mostly on integration of robotic systems and biological design tools. I enjoy dogs, beer, and fantasy novels.
Abdul founded two chapters of PyData in Saudi Arabia with a mission to support and grow the community of open source developers in the Middle East. He is passionate about enhancing computer science education around the world and especially in the Arabic-speaking communities.
Self-motivated, Dedicated, Focused
I am a Research Application Manager at The Alan Turing Institute. It’s a new role and one we hope to use to demonstrate how, by creating deeper connections between research teams and external stakeholders, the impact of the overall Programme can be significantly improved.
I am 23 years old and currently working at a sports media company as an editor. I graduated in October 2020 from UCL with an MSc in Digital Humanities, and I have a year of experience in journalism. I also have a previous background in history, and I am about to embark on a postgraduate diploma in law, with hopes of pursuing a career at the intersection of law and technology. Also, I’m very keen to help address educational inequity and to learn more about community building and the open science community!
I am driven to engage with communities to foster an appreciation of and connection to both intangible and tangible culture. Growing up in a small town, I learned what a gift it is to connect with people from another way of life; OLS makes that possible.
Currently a biochemistry student and very much interested in the field of data science and bioinformatics. Very keen to meet amazing people with the same interests.
I am a bioinformatician, with a background in molecular/cell biology, working in the Bioinformatics Unit at Instituto Gulbenkian de Ciência (IGC), Oeiras, Portugal.
A student and innovator from rural India.
Batool is a computational biologist affiliated with both KAIMRC in Saudi Arabia and the University of Liverpool in the UK. As an advocate for Open Science and its role in improving scientific and economic outputs in the Middle East, Batool established an Open Science Community in Saudi Arabia (OSCSA). OSCSA aims to create significant value towards Saudi Arabia’s Vision 2030, which focuses on enhancing knowledge and improving equal access to education in the Kingdom.
I divide my time between data analysis to understand the smallest components of matter, instrumentation R&D, and science and education capacity building programs to build the next generation of scientists in Latin America. I am an advocate for virtual research and learning communities as a way to strengthen scientific connections between Europe and Latin America.
Carly is a Senior Technician at the University of Western Australia, responsible for managing the archaeology laboratories, field safety, and equipment. She specialises in faunal analysis (zooarchaeology), and is passionate about connecting people with place and environment.
Archaeobotanist and ethnoarchaeologist interested in dryland agriculture
I am a budding scientist, interested in host-parasite interactions. I am open to continuous learning, and passionate about improving the health of man and animals.
Work as an IT analyst at a university, mainly serving as an interface between scientists and IT experts. Lead a small IT team at BBMRI.cz focused on data gathering and harmonization.
Grew up on a farm in Germany. I like open science, sustainability and exploring the outdoors. I am pursuing a PhD working on improving the design of stable metalloproteins using deep learning and molecular simulation.
I am a former immunologist who now works at the UK’s Biobanking Centre. My role is to engage with researchers and Biobankers to improve efficiency in the sector.
I am a German-born, German/UK/US-trained physician scientist and current Clinical Pathology Resident/ Pathology Fellow at Brigham and Women’s Hospital and Harvard Medical School. I am passionate about blood, translating research findings into patient diagnostics and care, and removing obstacles that prevent every patient getting the treatment that is right for them. As a clinical pathologist, I believe this starts with establishing a correct diagnosis - backed up by science and fair and transparent peer-review that holds everyone to the same standards.
High-energy physicist recruited by a neuroscience lab!
A bioinformatician who strongly believes in constant learning, collaboration, and teamwork.
Bioinformatician, working in pharmacogenomics and human genetic disease fields. Outside research, working with APBioNet as APBioNet ExCo and APBioNetTalks program coordinator.
Role in OLS:
NASA Cohort Coordinator (contract)
I work as a data manager at CONABIO, where I develop FAIR workflows for biodiversity and agricultural data. I am also pursuing a PhD at UNAM, and my research is focused on the challenges of integrating social and ecological data. I love working on interdisciplinary projects that combine my interests in sustainability, data and open research.
Ibrahim Said Ahmad is a lecturer in the Department of Information Technology, Bayero University Kano. He completed his PhD at Universiti Kebangsaan Malaysia in 2020, focusing on data science. His main areas of interest lie in Data Analytics, Natural Language Processing and Artificial Intelligence, specifically in business intelligence and computational intelligence. He has worked and published articles on sentiment analysis, natural language processing, and data mining.
I am an archaeologist with field experience in Bolivia, Brazil, India and Spain, specialising in phytolith analysis for archaeological and palaeoecological studies. Currently I am a final-year PhD candidate at the Universitat Pompeu Fabra, awaiting my viva. My main interests are prehistoric cultivation systems, landscape anthropization and the development of new techniques for phytolith analysis.
I work at the UKCRC Tissue Directory and Coordination Centre (TDCC). It is a collaborative project between UCL (my host institution) and the University of Nottingham to build and maintain an online directory of UK-based tissue samples. There I work to develop collaborations with external stakeholders to join up the UK’s biomedical research landscape in relation to biobanking - the collection, storage and use of (human) biological samples for research. My background is in policy development with a specific focus on social justice, and patient and public engagement in health research and UK clinical guidelines. I have a special interest in using creative activities and techniques, such as the use of performance and games, to engage both professionals and the public in research work.
Independent scholar advancing open knowledge through a portfolio of projects in open education, open science, and open data. PhD in Public Policy with research interests at the intersection of science & technology policy and the future of work. Expertise in issues facing early career researchers.
I am an archaeobotanist interested in how late prehistoric societies interacted with their environment in terms of plant food acquisition and transformation practices, particularly during the Neolithic and Bronze Age.
Katharina is a PhD student at the Peer-Produced Research Lab at the Center for Research & Interdisciplinarity in Paris, France. She is working on the participatory design of tools to support bottom-up communities in citizen science in peer-producing knowledge. She has a background in cognitive and media science and user experience consulting and is passionate about art and illustration.
Postdoctoral researcher, passionate about data management and semantics.
Geneticist, educator, and filmmaker with experience spanning neuroscience, evolutionary biology, and trekking through the muck in the Yukon Territories. My research focuses on the molecular and genetic mechanisms regulating human brain development but when I’m not in the laboratory, you can normally find me somewhere north of the Arctic Circle.
Martina is currently working at the Max-Planck-Institute AE, doing cognitive neuroscience research using computational modeling techniques. She is an open-science advocate who enjoys programming and contributing to open-source projects and communities. She provides infrastructure support for The Turing Way project as a core contributor.
I’m currently a postdoc at the Department of Statistics and Quantitative Methods, University of Milano-Bicocca, working on the development of a computational pipeline for the reconstruction of genome-scale metabolic networks of little-known organisms.
I am an enthusiastic student, pursuing a Master’s in Bioinformatics in India. I am passionate about science and scientific techniques. My interests include Genomics, Proteomics, Molecular Biology and Cheminformatics. I wish to learn and pursue research in the future to work for the betterment of human health.
Michal Růžička obtained a master’s degree in the field of Information Technology Security and is a graduate of doctoral studies in the field of informatics with a focus on advanced search methods in specialized digital repositories. He has worked on many international and national projects in the field of digital repositories, e.g. the European Digital Mathematical Library (EuDML), the Czech Digital Mathematical Library (DML-CZ), and various digital libraries of MUNI. In cooperation with industrial partners (TAČR project) he worked on the ScaleText project (advanced search in heterogeneous types of [text] data using machine learning methods). He is currently mainly involved in projects in the field of data management, protection and access (Open Science): the development of a system for long-term preservation of digital data (LTP) in the ARCLib project, and responsibility for Open/FAIR data activities in the HR4MUNI II project on the acceleration and advancement of Open Science at MUNI. He is also the leader of the Czech National Open Access Desk within the European project OpenAIRE, accelerating nationwide activities in research data management in the Czech Republic. He is (co)author of dozens of publications in the field of data management and digital libraries.
Interested in brains on all levels of analysis: from molecular to cognitive and computational. Keen to bridge disciplines (neuroscience x artificial intelligence) and ask pertinent research questions. Passionate about open science, community building and knowledge sharing.
Social entrepreneur, researcher, motivational speaker, DIY biologist.
I am a Systems Engineer passionate about applying Computer Science to improve the way we live. Enthusiast and promoter of open science in the region.
I am trying to be a good molecular biologist. I think that learning new things and sharing them in this field is great.
I am a PhD student at the University of Pretoria, South Africa. Currently working on optimizing chemical production from waste.
Physics degree, Computer science technician at Milano Bicocca University
Microbiology researcher from Nepal struggling to create next-generation scientists from rural Nepal.
A Systems Biologist with wide interests in the areas of functional genomics, protein informatics and interactions. *Principal Investigator for four or more projects coalescing keywords #HypotheticalProteins #VitaminK #LncRNAs #ProstateCancer *Founder of Bioclues.org, India’s largest bioinformatics society working for mentor-mentee relationships since 2005. *Advocate for #OpenAccess and #OpenSource
I am a semi-retired independent researcher looking into the palaeoecology and palaeogeography of West Cork in Ireland. I have degrees in Geology, Geography and Archaeology. I am committed to the ideals of Open Science.
After a PhD and several years in Cancer Research, I am now devoted to promoting and managing international research at La Paz University Hospital.
Sarah Gibson is a Research Software Engineer at the Alan Turing Institute where she helps solve real-world problems with cutting-edge techniques across academia, industry and the public sector. She is also a passionate open source contributor, primarily working with Project Binder to serve reproducible computational environments in the cloud around the world. On top of all that, she also promotes software best practices and reproducible workflows through her Fellowship with the Software Sustainability Institute.
I am a PhD candidate in computer science at the University of Porto, Portugal. I am also a faculty member at Bayero University, Kano, Nigeria.
Synthetic Biology enthusiast, Brit, cat dad, motivator and baker with a sweet tooth.
I’m a visual and circadian neuroscientist interested in how light affects our physiology and behaviour. I’m also passionate about improving science.
I am Teresa, a PhD student in the Backofen lab at the University of Freiburg. My PhD is in RNA bioinformatics, where I do data analysis, tool improvement and benchmarking. Apart from this, I am part of the Street Science Community, a scientific outreach group in Freiburg.
A researcher, motivational speaker
I am eager to learn