The need for assistance in the use of e-infrastructure within the social sciences and humanities

at the Faculty of Social Sciences at the University of Oslo.

Author

Dr. Athanasia M. Mowinckel

Background

Capturing and processing large data sets is becoming increasingly important in the social sciences and humanities, but project leaders can struggle to implement the technical solutions needed to meet increasing demands for open and reproducible science. Examples include working with open-source software like Python and R, using version control like git and GitHub, using the Open Science Framework, using parallel computing on Sigma2/TSD, using TSD efficiently, and structuring data according to the FAIR principles.

These increasing demands from funders and institutions to adopt new ways of working that require technical solutions are taxing for researchers who want to focus on the science rather than on learning new tools.

Researchers often need help in setting up systems to get their projects off on the right track, and the earlier this can be done, the more likely a project is to meet its deliverables. But it is not realistic to expect every researcher to have the skillset needed to meet the increasing demands on technical solutions. Having the ability to utilise specialised staff, with experience and expertise in utilising a variety of technical solutions for open and reproducible science, will enable researchers to focus on the research. 

Furthermore, most projects only have this need at the beginning (and maybe at the end) of a project, making it difficult to justify setting aside budgets to hire staff to do the work. So far, only large and established labs and centers with stable external funding have been able to employ such staff on a full-time basis.

We, a group of data scientists currently associated with LCBC and Promenta, believe a new model of organising this type of workforce is a necessity at UiO in today's research landscape. We are thus considering establishing a core facility dedicated to assisting UiO researchers in the capture, structuring and processing of complex life science data. For this reorganisation to occur, it would be beneficial for us to know how many researchers need this type of help, to what extent, and for what types of tasks. We would greatly appreciate any feedback and thoughts on this from our colleagues.

Summary

On February 1, 2023, the survey “Survey on need for data/technical assistance” was launched among researchers at the University of Oslo. As of February 10, 2023, the survey had collected 31 responses from 11 institutions. Of these, 14 respondents said they would budget for the proposed services if they were available, and another 15 would consider it, given their needs and the skills advertised. The responses clearly signal a need for assistance in the use of e-infrastructure, and show that researchers lack the support to utilise it properly, either because such support does not exist or because it is not sufficiently available.

Survey results

Of the 31 respondents, the majority (70%) were from SV, mainly within PSI (52%). A full breakdown of main workplaces is shown in Figure 1. The survey was disseminated by way of the PSI Sympa mailing list to all scientific staff at PSI, in addition to targeted e-mails to colleagues outside PSI who we knew might be interested in the services CAPRO aims to provide.

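# Label missing workplaces as "Undisclosed", then order levels by frequency
dt |>
  mutate(
    institute = if_else(is.na(institute), "Undisclosed", institute),
    institute = fct_infreq(institute)
  ) |>
  ggbar(institute) # ggbar() is presumably a plotting helper defined elsewhere in the report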

Figure 1: Summary of respondents' places of work, and their general field of research
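
The data preparation behind Figure 1 can be sketched without the tidyverse. The following is a minimal illustration in base R, assuming a made-up vector of workplaces (`Dept A`, `Dept B`, `Dept C` are hypothetical labels, not survey data). Missing entries are labelled "Undisclosed" and the factor levels are ordered by descending frequency, which is roughly what `if_else()` followed by `fct_infreq()` does in the code for Figure 1.

```r
# Toy data: hypothetical workplaces, NOT the survey responses
institute <- c("Dept A", NA, "Dept B", "Dept A", NA, "Dept A",
               "Dept C", "Dept B", NA, "Dept A")

# Label missing workplaces explicitly instead of dropping them
institute[is.na(institute)] <- "Undisclosed"

# Count each workplace and sort from most to least frequent
counts <- sort(table(institute), decreasing = TRUE)

# Store as a factor whose level order follows the frequency ranking,
# so a bar chart would draw the largest group first
institute <- factor(institute, levels = names(counts))

levels(institute)
```

Keeping the undisclosed group as its own level, rather than dropping it, means the bars still add up to the total number of respondents.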

Past needs

A large majority of respondents (90%) indicated that they had needed help in the past with utilising new technology to get their projects running. They report mainly asking other researchers for help, but some report asking internal center staff (RITMO, LCBC) and IT (local or central). As seen in Figure 2, most report getting “some” help, which was the lowest category of support it was possible to receive (apart from not receiving any).

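# Left panel: amount of help received, with "no" recoded to "none"
p1 <- dt |>
  drop_na(past_help) |>
  mutate(past_help = if_else(past_help == "no", "none", past_help)) |>
  ggbar(past_help) +
  labs(title = "Amount of received help in the past")

# Right panel: distribution of satisfaction ratings, drawn as horizontal bars
p2 <- dt |>
  ggplot(aes(past_satisfaction, group = past_satisfaction)) +
  geom_bar(aes(fill = past_satisfaction), show.legend = FALSE) +
  labs(y = "Count",
       title = "Satisfaction with received help",
       x = "Rating") +
  scale_x_continuous(limits = c(0, 10.5)) +
  coord_flip()

# Combine the two panels side by side (the `+` on plots requires the patchwork package)
p1 + p2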

Figure 2: Help received and needed


Future needs

The survey further probed what type of needs researchers have for e-infrastructure in their projects. These are summarised in Figure 3. Every category of service CAPRO can provide was selected by at least 11 respondents.

# Build a lookup table from answer codes to their full answer text
help_dictionary <- lapply(meta$elements$details[[14]], function(x) {
  tibble(text = x$answer_option, future_help = x$answer_codebook)
}) |> bind_rows()

dt |>
  # Multi-select answers are stored as "code;code;..."; give each code its own row
  separate_rows(future_help, sep = ";") |>
  group_by(future_help) |>
  tally() |>
  left_join(help_dictionary, by = "future_help") |>
  mutate(
    text = sprintf("%s: %s", future_help, text),
    text = str_wrap(text),
    pc = n / nrow(dt), # share of all respondents, so shares can sum to more than 100%
    future_help = fct_infreq(future_help)
  ) |>
  ggplot(aes(future_help, pc)) +
  # Zero-size points exist only to create a legend that maps codes to answer text
  geom_point(size = 0,
             aes(colour = text)) +
  geom_bar(stat = "identity", aes(fill = pc),
           alpha = .7,
           color = "grey82",
           show.legend = FALSE) +
  geom_label(aes(label = scales::percent(pc)),
             alpha = .5,
             nudge_y = .04) +
  labs(title = "", y = "") +
  scale_y_continuous(labels = scales::percent,
                     limits = c(0, 1)) +
  theme(axis.title.x = element_blank(),
        legend.position = "bottom",
        legend.title = element_blank(),
        legend.text = element_text(size = 12)) +
  guides(color = guide_legend(ncol = 1))

Figure 3: Areas in which researchers need help, probed in 8 categories (in addition to an extra free-text option). Each bar is its own category, showing the percentage of respondents who indicated they needed help with that specific category.
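
Because respondents could select several categories, each answer is stored as one semicolon-separated string that has to be split before counting. The following is a minimal base R sketch of that step, assuming made-up answer codes (`a`, `b`, `c`) rather than the survey's own codebook. As with `n/nrow(dt)` in the code for Figure 3, the share is computed against the number of respondents rather than the number of selections, so the percentages can sum to more than 100%.

```r
# Toy multi-select answers: hypothetical codes, NOT the survey data
future_help <- c("a;b", "a", "b;c;a", "c")
n_respondents <- length(future_help)

# Split each answer on ";" and flatten into one vector of selections
selections <- unlist(strsplit(future_help, ";", fixed = TRUE))

# Count how many respondents selected each category
n <- table(selections)

# Divide by the number of respondents, not the number of selections
pc <- as.numeric(n) / n_respondents
names(pc) <- names(n)

pc
```

Dividing by respondents makes each bar readable on its own ("this share of respondents need help with this category"), at the cost of the bars no longer forming parts of a whole.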

Conclusion

The need for, and interest in, assistance in setting up and utilising e-infrastructure exists among many researchers at PSI in particular, but likely also in other parts of UiO and at external institutions. CAPRO could provide valuable assistance in working with data more efficiently and more transparently, through custom and standardised applications as well as through workshops.