Notes on: How to use (Sci)BERT and PyTorch from within R

First, follow the steps described here to set up an RStudio project with a virtual environment (virtualenv) for Python. Alternatively, execute the following commands in your terminal, within the project folder:

    pip install virtualenv
    virtualenv python
    source python/bin/activate

Then create an .Rprofile file in your project folder with the following content:

    Sys.setenv(RETICULATE_PYTHON = "py_env/python/bin/python")

Next, install Rust, since it is needed to build the tokenizer used in the transformers library.
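Before moving on, it can help to confirm that the Python interpreter you ended up with is really the one inside the virtual environment and that a Rust toolchain is visible. The following is only a minimal sketch: the function name `environment_report` is made up for illustration, and checking for `cargo` on the PATH is an assumption about how Rust was installed.

```python
import shutil
import sys


def environment_report():
    """Collect a few facts about the running Python interpreter.

    Hypothetical helper for illustration; not part of reticulate or virtualenv.
    """
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the interpreter it was created from.
    in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    return {
        "python_executable": sys.executable,
        "in_virtualenv": in_virtualenv,
        # Assumption: a usable Rust toolchain exposes `cargo` on the PATH.
        "cargo_on_path": shutil.which("cargo") is not None,
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Run it with the interpreter that RETICULATE_PYTHON points to. If `in_virtualenv` is False or `cargo_on_path` is False, the setup steps above probably did not take effect in the current shell.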


Starting anew with Hugo and github.io

Using WordPress for my website led to me mostly abandoning it. A static website should also be enough for my purposes. I like Hugo because it lets me do all the authoring in Markdown files in a local folder, and automating the deployment to GitHub Pages is easy as well.

Next Steps

  • Change my GitHub Pages site to my own domain
  • Create content!

Are You from Neuenkirchen or Neuenkirchen? - Duplicate Place Names in Germany

When presenting information about German cities and municipalities, merging different data sources can cause problems, because some place names occur more than once within Germany. Based on the GADM data for Germany, this post shows that when merging data it is not enough to consider, in addition to the place name, the federal state (Bundesland), district (Landkreis), and administrative association (Verwaltungsgemeinschaft) a place belongs to. Only when the information on whether the name refers to a town (Stadt), a municipality (Gemeinde), or an unincorporated area (gemeindefreies Gebiet) is also taken into account can an accurate match be achieved.
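The effect can be sketched with a few toy records. Everything in this example is invented for illustration: the field names, the helper `ambiguous_keys`, and all values except the name Neuenkirchen are placeholders, not actual GADM columns or data. The sketch only shows how to test whether a chosen combination of fields identifies each place uniquely.

```python
from collections import Counter

# Invented toy records; placeholder values, not real GADM data.
PLACES = [
    {"name": "Neuenkirchen", "state": "State A", "district": "District 1",
     "association": "Association X", "type": "Gemeinde"},
    {"name": "Neuenkirchen", "state": "State B", "district": "District 2",
     "association": "Association Y", "type": "Gemeinde"},
    {"name": "Exampleburg", "state": "State A", "district": "District 1",
     "association": "Association X", "type": "Stadt"},
    {"name": "Exampleburg", "state": "State A", "district": "District 1",
     "association": "Association X", "type": "Gemeinde"},
]


def ambiguous_keys(records, fields):
    """Return every key (tuple of field values) shared by more than one record."""
    counts = Counter(tuple(record[field] for field in fields) for record in records)
    return sorted(key for key, n in counts.items() if n > 1)


admin_fields = ["name", "state", "district", "association"]

by_name = ambiguous_keys(PLACES, ["name"])
by_admin = ambiguous_keys(PLACES, admin_fields)
by_admin_and_type = ambiguous_keys(PLACES, admin_fields + ["type"])

print("ambiguous by name:", by_name)
print("ambiguous by name + admin units:", by_admin)
print("ambiguous by name + admin units + type:", by_admin_and_type)
```

In this toy data the name alone is ambiguous for both places, the administrative units still leave one collision, and only adding the type field makes every key unique. Running the same check on the real data before a merge shows whether the join key is safe.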


R: Plotting Map of Germany with ggplot2

This is a collection of links to blog posts and tutorials that helped me put together a choropleth map of Germany at the level of states and postal code areas. I started out with a set of postal codes (German: Postleitzahlen) and wanted to convert these to frequencies per city and per state. The resulting map was a choropleth map indicating counts per state, with an extra layer of circles indicating the counts per city and district (German: Stadt and Landkreis). This visualisation is useful if your postal code data is sparsely distributed across the country. If your data contains information for all or almost all cities or districts, a choropleth map at the corresponding level should do the trick. The following steps are needed:

  1. Download German Map Shapefiles
  2. Download German Postal Code Data
  3. Combine Map Data and PLZ Data
  4. Plot Choropleth Map
  5. Plot Map with Circles as Cities or Districts
  6. Add labels
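The aggregation behind step 3, turning a list of postal codes into counts per city and per state, can be sketched independently of the plotting. This is a rough illustration only: the lookup table is a tiny hand-written stand-in for the downloaded postal code data, and the function name `count_by_region` is hypothetical.

```python
from collections import Counter

# Tiny illustrative lookup: postal code -> (city, state).
# A real analysis would load this from the downloaded postal code data.
PLZ_LOOKUP = {
    "01067": ("Dresden", "Sachsen"),
    "10115": ("Berlin", "Berlin"),
    "20095": ("Hamburg", "Hamburg"),
    "80331": ("München", "Bayern"),
}


def count_by_region(postal_codes, lookup):
    """Count observations per city and per state; collect codes with no match."""
    city_counts = Counter()
    state_counts = Counter()
    unmatched = []
    for code in postal_codes:
        # German postal codes have five digits; codes read as numbers lose
        # their leading zero, so pad them back before the lookup.
        key = str(code).strip().zfill(5)
        if key not in lookup:
            unmatched.append(code)
            continue
        city, state = lookup[key]
        city_counts[city] += 1
        state_counts[state] += 1
    return city_counts, state_counts, unmatched


observed = ["10115", "10115", 1067, "80331", "99999"]
city_counts, state_counts, unmatched = count_by_region(observed, PLZ_LOOKUP)

print(dict(city_counts))
print(dict(state_counts))
print("unmatched:", unmatched)
```

The per-state counts would feed the choropleth fill and the per-city counts the circle sizes. Keeping the unmatched codes makes it easy to see how much data is lost in the join.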


Diagram Plotter

I wrote a little tool to create diagrams. I mainly use it myself, but I have also given it to some people, and they seem to use it from time to time. Just maybe, it might also be useful for you.

This tool plots a diagram with vertical data lines. It is especially useful for displaying ordinal survey data. It creates a PDF file that contains the diagram. With Inkscape installed, you can also create a PNG file. The PDF file can be vectorized with the appropriate software (e.g. Inkscape) and then be manipulated and saved in different formats.


Grouped Median Function for R – gmedian(), mQ()

The formula for the median of grouped frequency distributions has a nice feature for analysing ordinal data – even if the data isn’t really grouped. This formula computes the median and adds a decimal fraction to it. The decimal fraction is an estimate of where the median would lie if we were using a more granular scale. Social science data in particular is often not available in a metric format, so this formula is really helpful for making, for example, more differentiated comparisons between subsets of social data.
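As a rough illustration of the idea, here is a minimal Python sketch of the textbook grouped-median formula, median = L + ((n/2 - F) / f) * w, where L is the lower boundary of the median class, F the cumulative frequency below it, f its frequency, and w the class width. It assumes that each scale point k stands for the interval from k - w/2 to k + w/2. The function name `grouped_median` is made up here, and the sketch is not claimed to reproduce the R functions gmedian() or mQ().

```python
from collections import Counter


def grouped_median(values, width=1.0):
    """Grouped median, treating each value as the midpoint of a class.

    Assumption: value k represents the interval [k - width/2, k + width/2).
    """
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    half = len(values) / 2
    cumulative = 0  # observations in classes below the current one
    for value in sorted(counts):
        freq = counts[value]
        if cumulative + freq >= half:
            # This class contains the median; interpolate within it.
            lower = value - width / 2
            return lower + (half - cumulative) / freq * width
        cumulative += freq


ratings = [1, 2, 2, 3, 3, 3, 4]
print(grouped_median(ratings))
```

For these ratings the ordinary median is 3, while the grouped median comes out at roughly 2.67, which reflects that more of the remaining answers lie below 3 than above it. Python's standard library also offers `statistics.median_grouped`, which is based on the same kind of interpolation.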
