<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">EQ</journal-id>
<journal-id journal-id-type="nlm-ta">EQ</journal-id>
<journal-id journal-id-type="publisher-id">EQ</journal-id>
<journal-title-group>
<journal-title>Educational Quest</journal-title>
</journal-title-group>
<issn pub-type="ppub">0976-7258</issn>
<issn pub-type="epub">2230-7311</issn>
<publisher>
<publisher-name>New Delhi Publishers</publisher-name>
<publisher-loc>India</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="other">EQ-11-03-169</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Paper</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Termination of Cyber-Sexual Harassment and Abuse with Teenagers using Artificial Intelligence</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Gandhi</surname><given-names>Ratnark</given-names></name></contrib></contrib-group>
<aff>Data Scientist, BTM Financial, 153-54, Spaze I Tech Park, Sector 49, Gurgaon-122018, India</aff>
<author-notes>
<corresp id="cor001">Corresponding author: <email>ratanxcd@gmail.com</email></corresp>
</author-notes>
<pub-date pub-type="ppub">
<month>12</month>
<year iso-8601-date="2020">2020</year>
</pub-date>
<volume>11</volume>
<issue>3</issue>
<fpage>169</fpage>
<lpage>174</lpage>
<history>
<date date-type="received" iso-8601-date="2020-09-11">
<day>11</day>
<month>09</month>
<year>2020</year>
</date>
<date date-type="revised" iso-8601-date="2020-11-17">
<day>17</day>
<month>11</month>
<year>2020</year>
</date>
<date date-type="accepted" iso-8601-date="2020-12-07">
<day>07</day>
<month>12</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9;2020 New Delhi Publishers. All rights reserved</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>New Delhi Publishers</copyright-holder>
</permissions>
<self-uri content-type="pdf" xlink:href="EQ-11-03-169.pdf"></self-uri>
<abstract>
<p>As innovative technologies rapidly transform our lives, the number of people willing to abuse those technologies is rising as well. The method I introduce in this paper monitors the textual conversations of underage adolescents on social media platforms and raises an alert through an application whenever a case of sexual harassment or abuse is found. I concentrate on the algorithm the application uses to classify text messages, with the aim of offering new insights into tackling cybercrime, rather than on the development of the application itself.</p>
</abstract>
<kwd-group>
<kwd>Adolescents</kwd>
<kwd>sexual harassment</kwd>
<kwd>cybercrime</kwd>
<kwd>abuse</kwd>
</kwd-group>
<counts>
<fig-count count="2"/>
<table-count count="2"/>
<ref-count count="5"/>
<page-count count="6"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1">
<title/>
<sec id="S1_1">
<title>Artificial Intelligence</title>
<p>Artificial Intelligence is a large branch of computer science that deals with the creation of smart machines capable of performing tasks that usually require human intelligence. AI is an interdisciplinary science with different approaches, but in nearly every field of the tech industry, developments in machine learning and deep learning are causing a paradigm change.</p>
</sec>
<sec id="S1_2">
<title>Why Teenagers?</title>
<p>Teenagers are still developing as people and do not yet know their full identity. It is very easy for the world to tell them who they should be and how they should behave, and adolescents are likely to adopt ideas without examining them and forming their own opinion. It is also easy for a person to put ideas in the head of a child, since children may lack the assertiveness that helps them decide what to say yes or no to, which makes mind games very difficult for them to recognise. That is the main reason teenagers are the focus of this paper.</p>
</sec>
<sec id="S1_3">
<title>What is sexual harassment?</title>
<p>According to the EEOC, sexual harassment can be defined as unwelcome sexual advances, requests for sexual favors, and other verbal or physical conduct of a sexual nature when:
<list list-type="bullet">
<list-item><p>Submission to such conduct is made either explicitly or implicitly a term or condition of an individual&#x2019;s employment.</p></list-item>
<list-item><p>Submission to or rejection of such conduct by an individual is used as a basis for employment decisions affecting such individual.</p></list-item>
<list-item><p>Such conduct has the purpose or effect of unreasonably interfering with an individual&#x2019;s work performance or creating an intimidating, hostile, or offensive working environment.</p></list-item></list></p>
<p>Unwelcome Behaviour is the critical word. Unwelcome does not mean &#x201C;involuntary.&#x201D; A victim may consent or agree to certain conduct and actively participate in it even though it is offensive and objectionable. Therefore, sexual conduct is unwelcome whenever the person subjected to it considers it unwelcome. Whether the person, in fact, welcomed a request for a date, sex-oriented comment, or joke depends on all the circumstances.</p>
<p><bold>How to cite this article:</bold> Gandhi, R. (2020). Termination of Cyber-Sexual Harassment and Abuse with Teenagers using Artificial Intelligence. <italic>Educational Quest: An Int. J. Edu. Appl. Soc. Sci.,</italic> <bold>11</bold>(3): 169-174.</p>
<p><bold>Source of Support:</bold> None; <bold>Conflict of Interest:</bold> None</p>
</sec>
<sec id="S1_4">
<title>Cyber Sexual Harassment</title>
<p>Sexual Harassment takes various forms; the one that is the main focus of this paper is Cyber Sexual Harassment. When the victim is targeted through an online platform such as Facebook, Instagram, Telegram, or e-mail, the case falls under the umbrella of Cyber Sexual Harassment. The means of harassment over the internet include:
<list list-type="bullet">
<list-item><p>Abusive Texts</p></list-item>
<list-item><p>Inappropriate Images</p></list-item>
<list-item><p>Sexual Threats</p></list-item>
<list-item><p>Offensive voice messages based on an individual&#x2019;s traits</p></list-item>
<list-item><p>Erotic stickers and GIFs</p></list-item>
<list-item><p>Links to sexual content</p></list-item>
</list></p>
</sec>
<sec id="S1_5">
<title>Effects of Cyber Sexual Harassment</title>
<p>Being harassed sexually can impact an individual in several ways, affecting physical well-being as well as causing mental traumas and breakdown.</p>
<p>Some of the consequential reactions are listed below [2]:</p>
<p><bold>Physiological Reactions</bold></p>
<list list-type="order">
<list-item><p>Headaches</p></list-item>
<list-item><p>Lethargy</p></list-item>
<list-item><p>Phobias</p></list-item>
<list-item><p>Panic Reactions</p></list-item>
<list-item><p>Sleep Disturbance</p></list-item>
<list-item><p>Nightmares</p></list-item>
<list-item><p>Weight Loss</p></list-item>
</list>
<p><bold>Psychological Reactions</bold></p>
<list list-type="order">
<list-item><p>Depression, anxiety, shock, denial</p></list-item>
<list-item><p>Anger, fear, frustration, irritability</p></list-item>
<list-item><p>Insecurity, embarrassment, feelings of betrayal</p></list-item>
<list-item><p>Confusion, feelings of being powerless</p></list-item>
<list-item><p>Shame, self-consciousness, low self-esteem</p></list-item>
<list-item><p>Guilt, self-blame, isolation</p></list-item>
</list>
</sec>
<sec id="S1_6">
<title>Statistics Involved</title>
<list list-type="bullet">
<list-item><p>There are more than 42 million survivors of sexual abuse in America. <italic>(National Association of Adult Survivors of Child Abuse)</italic></p></list-item>
<list-item><p>1 in 3 girls is sexually abused before the age of 18. <italic>(The Advocacy Centre)</italic></p></list-item>
<list-item><p>1 in 5 boys is sexually abused before the age of 18. <italic>(The Advocacy Centre)</italic></p></list-item>
<list-item><p>1 in 5 children is solicited sexually while on the Internet before the age of 18. <italic>(National Children&#x2019;s Alliance: Nationwide Child Abuse Statistics)</italic></p></list-item>
<list-item><p>30% of sexual abuse is never reported. <italic>(Child Sex Abuse Prevention and Protection Centre)</italic></p></list-item>
<list-item><p>Nearly 70% of all reported sexual assaults (including assaults on adults) occur to children age 17 and under. <italic>(Children&#x2019;s Advocacy Centre)</italic></p></list-item>
<list-item><p>90% of child sexual abuse victims know the perpetrator in some way. <italic>(U.S. Department of Justice)</italic></p></list-item>
<list-item><p>Approximately 20% of the victims of sexual abuse are under age eight. <italic>(Broward County)</italic></p></list-item>
<list-item><p>95% of sexual abuse is preventable through education. <italic>(Child Molestation Research and Prevention Institute)</italic></p></list-item>
<list-item><p>38% of the sexual abusers of boys are female. <italic>(Broward County)</italic></p></list-item>
<list-item><p>There is worse lasting emotional damage when a child&#x2019;s sexual abuse started before the age of six and lasted for several years. Among child and teen victims of sexual abuse, there is a 42 percent increased chance of suicidal thoughts during adolescence. <italic>(American Counselling Association)</italic></p></list-item>
<list-item><p>&#x201C;More than 90% of individuals with a developmental delay or disability will be sexually assaulted at least once in their lifetime.&#x201D; <italic>(Valenti-Heim, D., Schwartz, L.)</italic></p></list-item>
<list-item><p>&#x201C;There are nearly half a million registered sex offenders in the U.S. &#x2013; 80,000 to 100,000 of them are missing.&#x201D; <italic>(The National Centre for Missing and Exploited Children)</italic></p></list-item>
<list-item><p>&#x201C;A typical pedophile will commit 117 sexual crimes in a lifetime.&#x201D;</p></list-item>
</list>
</sec>
<sec id="S1_7">
<title>Around the Internet</title>
<p>In its 2016 Annual Report, the Internet Watch Foundation (IWF), whose remit is to remove Child Sexual Abuse Material hosted anywhere in the world, including non-photographic content hosted in the United Kingdom, stated that of the 105,420 reports processed in 2016, 57,162 were received from public sources, with the remainder identified by analysts actively searching the open Internet with the help of bespoke web crawlers. Of these reports, 57,335 URLs contained Child Sexual Abuse Content, and twenty-eight percent of them were confirmed as containing serious cases.</p>
<p>INHOPE, a global network of hotlines with 49 members whose remit is to deal with illegal content online and remove Child Sexual Abuse Material from the Internet, reported that it received 9,357,240 reports in 2016, of which 8,474,713 were confirmed as containing Child Sexual Abuse Material. Cybertip.ca, the hotline hosted by the Canadian Centre for Child Protection, processed 40,251 reports in 2016-17; 49% of them were forwarded to law enforcement, child welfare, and/or INHOPE, or led to a notice being sent to an electronic service provider about Child Sexual Abuse Material hosted by a service provider in Canada or the United States.</p>
<p>In 2015, in addition to receiving reports of online Child Sexual Abuse, 85% of hotlines also accepted other types of reports, including racism/hate speech (69%), adult pornography (64%), bullying (62%), and self-harm/suicide (44%).</p>
<p>However, in this global study of hotlines, approximately one-third of those surveyed indicated that Child Sexual Abuse reports made up the majority of their workload. The National Centre for Missing and Exploited Children (NCMEC) serves as the United States of America&#x2019;s clearinghouse for Child Sexual Abuse Material through the Cyber Tip-line, which provides an online mechanism for members of the public and electronic service providers (ESPs) to report incidents of suspected child sexual exploitation. These include Child Sexual Abuse Material, sexual exploitation of children in travel and tourism, online enticement, trafficking of children for sexual purposes, child sexual molestation, misleading domain names or words, and unsolicited obscene material sent to a child. A report to the United States House of Representatives Subcommittee in March 2017 noted that, over recent years, the volume of Cyber Tip-line reports received by NCMEC had increased from over 1.1 million in 2014 to more than 4.4 million in 2015 and to more than 8.2 million in 2016.</p>
</sec>
<sec id="S1_8">
<title>Detection of Cyber Sexual Harassment</title>
<p>The reports above suggest that about 20% of all teenagers experience Cyber Sexual Harassment through social media before reaching 18, and the true number may be much higher because proper resources to detect such crimes are lacking. We live in a world where everyone is free to share their views through social media, but some lawbreakers take advantage of the liberty these platforms provide to reach out to anyone and start targeting innocent children. My proposed architecture focuses on monitoring the text messages that children receive from known or unknown people who seek to harm them because of their sexual urges.</p>
</sec>
<sec id="S1_9">
<title>Proposed Architecture</title>
<list list-type="order">
<list-item><p>Texts exchanged across social media platforms are monitored in real time; the option to include or exclude specific platforms always remains under the control of the user&#x2019;s guardian.</p></list-item>
<list-item><p>A list of applications to be monitored is created internally according to the options chosen by the user together with the guardian. Whenever one of the chosen applications starts, monitoring starts automatically.</p></list-item>
<list-item><p>Whenever the algorithm tags a text message as Sexually Inappropriate, the application searches the messages sent by the other party for sexually offensive words (I already hold a collection of 453 such words). As soon as any word matches this list of abusive words, a warning trigger is generated and sent across different platforms to the local guardian.</p></list-item>
</list>
<fig id="F1">
<label>Fig. 1</label>
<caption>
<p>Process architecture</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EQ-11-03-169-f001.jpg"/>
</fig>
</sec>
<sec id="S1_10">
<title>User Data Protection</title>
<p>Since user data in criminal cases such as sexual harassment is very sensitive, as the developer of this application I am bound to keep users&#x2019; data completely private. I know that failing to do so would have disastrous consequences for my application from both a legal and a PR standpoint. I will take a few necessary steps to safeguard my application against security breaches:</p>
<p><bold>No Access to Sensitive Data:</bold> No one on the team of developers or executives (if any) should have access to a client&#x2019;s data unless the user has reported a technical issue and agrees to share data with the executives helping them. Executives should be able to help the user only through a single centralized system.</p>
<p><bold>Dedicated Server:</bold> Using shared servers to cut down costs is normal practice among businesses; however, it comes with a security risk, as these servers share risk with other websites. While dedicated servers are expensive, they provide an additional layer of protection for both the developer and the client.</p>
<p><bold>Proper Firewalls and Antivirus Protection:</bold> Basic security measures to safeguard data include the installation of firewalls and antivirus protection. Firewalls help to prevent unauthorized access, while antivirus software helps prevent, detect, and remove harmful programs, thereby reducing the vulnerabilities of the whole system.</p>
<p><bold>Periodic Security Updates:</bold> Attackers exploit known security holes, so developers should release regular updates that close those holes and push them to users.</p>
</sec>
</sec>
<sec id="S2">
<title>METHODOLOGY</title>
<sec id="S2_1">
<title>Sample Set</title>
<p>Finding a ready-made dataset for this specific task was nearly impossible, so I prepared my own by mixing two sources: sexual harassment conversations between victims and their harassers from around 8320 cases, extracted directly from stories told by multiple victims and provided by Safecity India, and 9360 normal text conversations provided by Unicamp. I also used a list of 453 sexually offensive words taken from GitHub.</p>
<p><bold>Training and Testing Set:</bold> After mixing the abusive and normal text messages, I had 17680 text messages. Splitting them 80:20 gave me 14144 tagged text messages for training the algorithm and 3536 for testing.</p>
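<p>The 80:20 split described above can be sketched as follows. This is only a minimal illustration using the Python standard library: the function name <monospace>split_80_20</monospace>, the fixed seed, and the placeholder messages are assumptions made for the example, and the paper does not state how the split was actually drawn.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random

def split_80_20(messages, seed=0):
    """Shuffle labelled messages and split them 80:20 into train and test lists."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 80 // 100  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Placeholder corpus with the sizes reported above (the contents are synthetic).
abusive = [("abusive text %d" % i, "Sexually Inappropriate") for i in range(8320)]
normal = [("normal text %d" % i, "Normal") for i in range(9360)]

train, test = split_80_20(abusive + normal)
print(len(train), len(test))
```
]]></preformat>
<p>With 17680 messages in total, the cut falls at 14144 training messages and leaves 3536 for testing, matching the counts given above.</p>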
</sec>
<sec id="S2_2">
<title>Research Methodology</title>
<p>Once an application on the list of apps to be monitored starts, real-time extraction of text begins and the following steps are used for detection:</p>
<p><bold>Reading and Conversion of raw and semi-structured text into structured format:</bold> Data extracted directly from text messages is always in a semi-structured format. To operate on the messages and extract details from them, I converted the semi-structured texts into a structured format.</p>
<p><bold>Cleaning Data:</bold> The structured texts still contain content that is unwanted or irrelevant for detection. Removing it makes reading and extracting meaning from the texts more efficient and decreases the execution time of the algorithms.</p>
<p>The steps involved in cleaning are:
<list list-type="bullet">
<list-item><p><bold>Removing Punctuation:</bold> I used the built-in <monospace>punctuation</monospace> constant from Python&#x2019;s <monospace>string</monospace> module, which holds the set of punctuation characters to remove.</p></list-item>
<list-item><p><bold>Tokenization:</bold> This is a way of separating a piece of text into smaller units called tokens. The tokens produced are then used to prepare a vocabulary, that is, the set of unique tokens in the corpus.</p></list-item>
<list-item><p><bold>Removing Stop Words:</bold> Stopwords are the most common words in any natural language; they add extra, irrelevant words to process. Words like &#x201C;is&#x201D;, &#x201C;in&#x201D;, &#x201C;for&#x201D;, and &#x201C;where&#x201D; are considered stopwords. After the removal of stopwords, I am left with only meaningful and relevant words for text processing.</p></list-item>
<list-item><p><bold>Lemmatising/Stemming:</bold> Stemming is the process of reducing inflected (or sometimes derived) words to their word stem or root. Lemmatising is the process of grouping together the inflected forms of a word so they can be analyzed as a single term, identified by the word&#x2019;s lemma. For example, words like &#x201C;typing&#x201D; and &#x201C;typed&#x201D; reduce to the word &#x201C;type&#x201D;. This leaves me with only the words necessary to understand the meaning of a message.</p></list-item>
</list></p>
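<p>The cleaning steps above can be sketched as a small pipeline. This is an illustrative sketch rather than the code used in this work: the stop-word list is a deliberately tiny, hypothetical one, and <monospace>crude_stem</monospace> is a toy suffix stripper standing in for a real stemmer or lemmatiser, so it maps &#x201C;typing&#x201D; and &#x201C;typed&#x201D; to the shared stem &#x201C;typ&#x201D; instead of the lemma &#x201C;type&#x201D;.</p>
<preformat preformat-type="code"><![CDATA[
```python
import string

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"is", "in", "for", "where", "the", "a", "an", "to", "and", "was"}

def remove_punctuation(text):
    # string.punctuation is a constant holding the ASCII punctuation characters.
    return "".join(ch for ch in text if ch not in string.punctuation)

def tokenize(text):
    # Lower-case the text and split it on whitespace.
    return text.lower().split()

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    # Toy suffix stripping for illustration only; not a real stemmer or lemmatiser.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def clean(text):
    tokens = remove_stop_words(tokenize(remove_punctuation(text)))
    return [crude_stem(t) for t in tokens]

print(clean("She was typing, and he typed the messages!"))
```
]]></preformat>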
<p><bold>Vectorizing Data:</bold> Vectorizing is the process of encoding text as integers to create feature vectors, where a feature vector is an n-dimensional vector of numerical features that represents some object. In my context, I took individual text messages and converted them into numerical vectors, so that Python and the machine learning algorithms can work with the data.</p>
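<p>As a rough illustration of vectorizing, the sketch below builds a vocabulary from tokenized messages and encodes each message as a vector of token counts (a bag-of-words encoding). The choice of plain counts is an assumption, since the weighting scheme used in this work is not specified, and the helper names and sample messages are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

def build_vocabulary(token_lists):
    """Map every unique token in the corpus to a column index (sorted for stability)."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx for idx, tok in enumerate(vocab)}

def count_vector(tokens, vocabulary):
    """Return the bag-of-words count vector for one message."""
    counts = Counter(tokens)
    # Dictionaries keep insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocabulary]

# Two synthetic, already-cleaned messages.
messages = [["send", "photo", "now"], ["see", "you", "now", "now"]]
vocabulary = build_vocabulary(messages)
vectors = [count_vector(m, vocabulary) for m in messages]

print(vocabulary)
print(vectors)
```
]]></preformat>
<p>Each message becomes a vector whose length equals the vocabulary size, which is the form the classifiers expect.</p>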
</sec>
<sec id="S2_3">
<title>Learning Techniques Used</title>
<p><bold><italic>Naive Bayes:</italic></bold> Naive Bayes is a family of algorithms that can be used for text classification. The member of the family I used is Multinomial Naive Bayes (MNB). The main reason for choosing it is that it gave me really good results even though little data was available and computational resources were scarce.</p>
<p><bold><italic>Support Vector Machines:</italic></bold> SVM is one of the most widely used algorithms for text classification. It is similar to Naive Bayes in that it does not need much training data to start providing accurate results. In short, SVM draws a line or hyperplane that divides a space into two subspaces: one subspace that contains the vectors that belong to a group and another that contains the vectors that do not belong to that group. These vectors represent the training texts, and a group is a tag given to text messages of a similar type.</p>
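<p>To make the idea behind Multinomial Naive Bayes concrete, here is a minimal from-scratch sketch. It is not the implementation used in this work: the class name <monospace>TinyMultinomialNB</monospace>, the add-one (Laplace) smoothing, and the synthetic training messages are all assumptions made for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter, defaultdict

class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over token counts."""

    def fit(self, token_lists, labels):
        self.class_counts = Counter(labels)          # messages per class
        self.token_counts = defaultdict(Counter)     # token counts per class
        for tokens, label in zip(token_lists, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def log_score(self, tokens, label):
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)  # log prior
        denom = sum(self.token_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][tok] + 1) / denom)
        return score

    def predict(self, tokens):
        # Pick the class with the highest log posterior score.
        return max(self.class_counts, key=lambda label: self.log_score(tokens, label))

# Synthetic, harmless stand-in data; labels follow the two tags used in this paper.
train_x = [["send", "photo", "alone"], ["photo", "secret", "alone"],
           ["homework", "due", "monday"], ["see", "you", "monday"]]
train_y = ["Sexually Inappropriate", "Sexually Inappropriate", "Normal", "Normal"]

model = TinyMultinomialNB().fit(train_x, train_y)
print(model.predict(["secret", "photo"]))
print(model.predict(["homework", "monday"]))
```
]]></preformat>
<p>Working in log space keeps the product of many small probabilities from underflowing, and the smoothing term prevents a single unseen token from forcing a class probability to zero.</p>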
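<p>The way a separating hyperplane assigns a tag can be sketched as below. The weights and bias are hypothetical, hand-picked values for a three-token vocabulary; a trained SVM would learn them from the training vectors, and that training step is not shown here.</p>
<preformat preformat-type="code"><![CDATA[
```python
def decision_value(weights, bias, vector):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w * x for w, x in zip(weights, vector)) + bias

def classify(weights, bias, vector):
    # Non-negative scores fall in the "Sexually Inappropriate" subspace.
    if decision_value(weights, bias, vector) >= 0:
        return "Sexually Inappropriate"
    return "Normal"

# Hypothetical hand-picked parameters; a trained SVM would learn these values
# by maximising the margin between the two groups of training vectors.
weights = [1.5, -1.0, 0.5]
bias = -1.0

print(classify(weights, bias, [1, 0, 1]))
print(classify(weights, bias, [0, 2, 0]))
```
]]></preformat>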
</sec>
<sec id="S2_4">
<title>Output</title>
<p>After training the classifiers on the training dataset, I tested them on the test set, where each text message is tagged as either Sexually Inappropriate or Normal.</p>
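<p>For readers unfamiliar with confusion matrices, the sketch below shows how one is built from actual and predicted tags and how accuracy follows from it. The label lists are made up for illustration and are not the results of this study; the function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import Counter

POSITIVE, NEGATIVE = "Sexually Inappropriate", "Normal"

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs and return them as TP, FN, FP, TN."""
    pairs = Counter(zip(actual, predicted))
    return {
        "TP": pairs[(POSITIVE, POSITIVE)],
        "FN": pairs[(POSITIVE, NEGATIVE)],
        "FP": pairs[(NEGATIVE, POSITIVE)],
        "TN": pairs[(NEGATIVE, NEGATIVE)],
    }

def accuracy(matrix):
    # Share of all messages whose predicted tag matches the actual tag.
    return (matrix["TP"] + matrix["TN"]) / sum(matrix.values())

# Made-up labels for illustration; these are not the paper's results.
actual = [POSITIVE, POSITIVE, POSITIVE, NEGATIVE,
          NEGATIVE, NEGATIVE, NEGATIVE, NEGATIVE]
predicted = [POSITIVE, POSITIVE, NEGATIVE, NEGATIVE,
             NEGATIVE, NEGATIVE, POSITIVE, NEGATIVE]

matrix = confusion_matrix(actual, predicted)
print(matrix, accuracy(matrix))
```
]]></preformat>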
<table-wrap id="T1">
<label>Table 1</label>
<caption>
<p>Confusion Matrix for MNB</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EQ-11-03-169-t001.jpg"/>
</table-wrap>
<table-wrap id="T2">
<label>Table 2</label>
<caption>
<p>Confusion Matrix for SVM</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EQ-11-03-169-t002.jpg"/>
</table-wrap>
<fig id="F2">
<label>Fig. 2</label>
<caption>
<p>Average and Individual Algorithm&#x2019;s Accuracy Plot</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="EQ-11-03-169-f002.jpg"/>
</fig>
</sec>
<sec id="S2_5">
<title>Alert Generation</title>
<p>Once a text message is tagged as sexually abusive, it is checked for abusive words from the list taken from GitHub; if any such word is present, a warning trigger is generated and sent to the local guardians.</p>
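<p>The two-stage check described above can be sketched as follows. The word list holds harmless placeholders instead of the 453 real entries, and <monospace>maybe_alert</monospace> and the <monospace>notify</monospace> callback are hypothetical names; how the warning actually reaches the guardian is outside the scope of this sketch.</p>
<preformat preformat-type="code"><![CDATA[
```python
import string

# Harmless placeholders standing in for the 453-word offensive-word list.
OFFENSIVE_WORDS = {"badword1", "badword2"}

def find_offensive_words(message):
    """Return the listed words present in a message, ignoring case and punctuation."""
    stripped = message.lower().translate(str.maketrans("", "", string.punctuation))
    return sorted(set(stripped.split()) & OFFENSIVE_WORDS)

def maybe_alert(message, tag, notify):
    """Call notify(...) only when the classifier tag and the word list both agree."""
    if tag != "Sexually Inappropriate":
        return False
    hits = find_offensive_words(message)
    if not hits:
        return False
    notify("Warning: flagged message contains %d listed word(s)" % len(hits))
    return True

sent = []  # stands in for the channels used to reach the guardian
maybe_alert("You are a BADWORD1!", "Sexually Inappropriate", sent.append)
maybe_alert("See you on Monday", "Normal", sent.append)
print(sent)
```
]]></preformat>
<p>Requiring both the classifier tag and a word-list match means a message is escalated only when the two signals agree, which is the behaviour described in this section.</p>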
</sec>
<sec id="S2_6">
<title>Tools Used</title>
<p>The programming and preprocessing of the collected dataset were done in the Spyder IDE, accessed through Anaconda, with the help of a few open-source packages that come pre-installed with it, including pandas and scikit-learn.</p>
</sec>
<sec id="S2_7">
<title>Similar Cyber Crimes and Statistics Involved</title>
<p>Around 9.2% of 630 adolescents surveyed in the Delhi-National Capital Region had experienced cyber-bullying and half of them had not reported it to teachers, guardians, or the social media companies concerned, a recent study by Child Rights and You, a non-governmental organization, found.</p>
<p>Vulnerability rose with internet use: 22.4% of respondents, aged 13-18 years, who used the internet for longer than three hours a day were vulnerable to online bullying, while up to 28% of respondents, who used the internet for more than four hours a day, faced cyber-bullying.</p>
</sec>
</sec>
<sec id="S3">
<title>CONCLUSION</title>
<p>The proposed emotion recognition approach, based on Natural Language Processing (NLP), recognizes sexual harassment and abuse efficiently and effectively. The learning techniques used gave an aggregate accuracy of 90.25% even with a limited amount of data available. I also believe that accuracy will improve substantially as the training dataset provided to the learning techniques is expanded.</p>
</sec>
</body>
<back>
<ref-list>
<ref id="R1"><mixed-citation publication-type="web"><source>ECPAT website</source> (<ext-link ext-link-type="uri" xlink:href="https://www.ecpat.org/news/online-child-sexual-abuse-material-the-facts/">https://www.ecpat.org/news/online-child-sexual-abuse-material-the-facts/</ext-link>) provides unknown facts on online child sexual abuse materials present worldwide.</mixed-citation></ref>
<ref id="R2"><mixed-citation publication-type="web"><source>Race Against Abuse of Children Everywhere (RAACE) website</source> (<ext-link ext-link-type="uri" xlink:href="https://www.raace.org/join-the-raace">https://www.raace.org/join-the-raace</ext-link>) has built their mission to prevent hidden epidemic of child sexual abuse.</mixed-citation></ref>
<ref id="R3"><mixed-citation publication-type="web"><source>Rhea Maheshwari, India Spend, 13 Mar 2020</source>, <date-in-citation>accessed 3 September 2020</date-in-citation>, Link: <ext-link ext-link-type="uri" xlink:href="https://scroll.in/article/956085/in-one-year-alone-cyberbullying-of-indian-women-and-teenagers-rose-by-36">https://scroll.in/article/956085/in-one-year-alone-cyberbullying-of-indian-women-and-teenagers-rose-by-36</ext-link></mixed-citation></ref>
<ref id="R4"><mixed-citation publication-type="web"><source>The U.S. Equal Employment Opportunity Commission (EEOC) website</source> (<ext-link ext-link-type="uri" xlink:href="https://www.eeoc.gov/overview">https://www.eeoc.gov/overview</ext-link>) mission is to prevent and remedy unlawful employment discrimination and advance equal opportunity for all in the work place.</mixed-citation></ref>
<ref id="R5"><mixed-citation publication-type="web"><source>University of South Florida (USF) website</source> (<ext-link ext-link-type="uri" xlink:href="https://www.usf.edu/student-affairs/victim-advocacy/types-of-crimes/sexual-harassment.aspx">https://www.usf.edu/student-affairs/victim-advocacy/types-of-crimes/sexual-harassment.aspx</ext-link>) has a separate department for crime info and support, which provides adequate definition and examples of sexual harassment.</mixed-citation></ref>
</ref-list>
</back>
</article>