The tf-idf-Statistic For Keyword Extraction
The tf-idf statistic (“term frequency – inverse document frequency”) is a common tool for extracting keywords from a document by considering not just that single document but all documents...
Titanic challenge on Kaggle with decision trees (party) and SVMs (kernlab)
The Titanic challenge on Kaggle is about inferring from a number of personal details whether or not a passenger survived the disaster. I tried two algorithms: decision trees using R...
Germans used to have more Sex in Summer!
Wow – what a headline … okay, I admit it’s phrased quite sensationally, given that it anticipates just one possible interpretation of increasingly more births around summer / autumn compared to spring...
Humor is a powerful, alternative Method for processing Data and reporting...
“Je n’ai pas peur des représailles. Je n’ai pas de gosses, pas de femme, pas de voiture, pas de crédit. Ça fait sûrement un peu pompeux, mais je préfère mourir debout que vivre à genoux.” (“I am not...
As a Data Scientist it is my Obligation to support #nobagida, #nopegida and...
Political Opinion on a Scale from 0 to 2π
par(mfrow=c(2,1), mar=c(2,4,4,1))
barplot(rep(1,10), main="")
mtext("#[a-z]{2}gida = ...", side=3, adj=.1, line=1, cex=2, col=rgb(.4,.4,.4))...