Feed aggregator

Create a PDF in Memory

Tom Kyte - 3 hours 23 min ago
Is there a way to create a pdf in memory and save it in a blob? We don't want to create a database directory and create the actual pdf on the server. So, I am looking for a way to create it in memory and insert it into a blob field using pl/sql or a database procedure written in Java.
Categories: DBA Blogs

Copy data with BLOB using @dblink to my Staging database

Tom Kyte - 3 hours 23 min ago
Hi :)! I have a database @dblink and I need to copy files with BLOB from @dblink to my staging database. What can I do? I can't CREATE tables because of restrictions, so we only should use TRUNCATE, UPDATE and INSERT. Thank you so much in advance!
Categories: DBA Blogs

I want to automate this using Ansible

Tom Kyte - 3 hours 23 min ago
Fix Text: Remove any demonstration and sample databases, database applications, objects, and files from the DBMS. To remove an account and all objects owned by that account (using BI as an example): DROP USER BI CASCADE; To remove objects without removing their owner, use the appropriate DROP statement (DROP TABLE, DROP VIEW, etc.). Check: make sure all example schemas are removed. <code>select distinct(username) from dba_users where username in ('BI','HR','OE','PM','IX','SH','SCOTT');</code> Expected result: no rows returned. If rows are returned, then drop the users: <code>sqlplus / as sysdba
drop user bi cascade;
drop user hr cascade;
drop user oe cascade;
drop user pm cascade;
drop user ix cascade;
drop user sh cascade;
drop user scott cascade;</code>
Categories: DBA Blogs

Flashback Query using Flashback Data Archive and minvalue does not work as expected

Tom Kyte - 3 hours 23 min ago
Hello, I created a Flashback Data Archive and gave it a retention of one year. Now I have enabled flashback archive for a test table. My problem is that minvalue does not work as I expected. My expectation is that the specified query will return ALL values that are present in the flashback archive. In fact, it only returns the data from the last 15 minutes. Is this the correct behavior? If yes, how can I query all previous versions of a row? The example below shows the DDL of the test objects and an example query with the timestamps at which the statements were executed. Sorry for not using LiveSQL, but I wasn't able to recreate the problem there. <code>create table test (
  tst_id number generated by default as identity,
  tst varchar2(255)
);

create flashback archive flashback_test tablespace users_flashback retention 1 year;
alter table test flashback archive flashback_test;

-- 15.09.21 11:41:28,534508000 +02:00
insert into test(tst) values('Test1');
-- 15.09.21 11:43:15,736558000 +02:00
update test set tst = 'Test2' where tst_id = 1;
-- 15.09.21 11:45:47,551388000 +02:00
update test set tst = 'Test3' where tst_id = 1;

select tst, versions_starttime, versions_endtime
from test versions between scn minvalue and maxvalue
where tst_id = 1;

-- Run at 15.09.21 11:48:09,833296000 +02:00:
-- tst    versions_starttime   versions_endtime
-- Test3  15.09.21 11:45:47
-- Test2  15.09.21 11:43:22    15.09.21 11:45:47
-- Test1  15.09.21 11:41:22    15.09.21 11:43:22

-- Run at 15.09.21 11:58:20,512213000 +02:00:
-- Test3  15.09.21 11:45:47
-- Test2  15.09.21 11:43:22    15.09.21 11:45:47
-- Test1                       15.09.21 11:43:22

-- Run at 15.09.21 11:59:52,966693000 +02:00:
-- Test3  15.09.21 11:45:47
-- Test2                       15.09.21 11:45:47

-- Run at 15.09.21 12:04:07,629023000 +02:00:
-- Test3</code>
Categories: DBA Blogs

How to find the chr(10) or chr(13) is in between a CLOB string

Tom Kyte - 3 hours 23 min ago
I need to replace the chr(10) or chr(13) at the end of a CLOB string, but I need to exclude any chr(10) or chr(13) in the middle of the string. e.g.: (2S,4R)-1-{1-[(2-acetamido-2-deoxy-β-D- [(3-{5-[(2-acetamido-2-deoxy-β-Dgalactopyranosyl)oxy]pentanamido} 5,11,18-trioxo-14-oxa-6,10,17-triazanonacosan-29-oyl}-4-hydroxypyrrolidin-2- yl]methyl all-P-ambo-2'-O-methyl-P-thioguanylyl-(3'→5')-2'-O-methyl-Pthiouridylyl-(3'→5')-2'-O-methylcytidylyl-(3'→5')-2'-O-methyladenylyl-(3' In the above example, I must not replace the chr(10) or chr(13) in the middle of the string; only those at the end of the string should be replaced. Please help to provide a PL/SQL script for this. Thank you
Categories: DBA Blogs

Designing Good Audit Trails for an Oracle Database

Pete Finnigan - 5 hours 23 min ago
I have been asked to speak at the UKOUG Autumn Tech event. This is an online conference event. The agenda grid is live and I will speak from 15:00 to 15:45, but the link to the details of my....[Read More]

Posted by Pete On 23/09/21 At 09:58 AM

Categories: Security Blogs

Working With Jupyter Notebook

Online Apps DBA - 16 hours 58 min ago

* What Is Anaconda? Anaconda is a free and open-source distribution of the Python language for data science and machine learning-related applications. It can be installed on Windows, Linux, and macOS systems. Conda is an open-source, cross-platform package management system. Anaconda comes with many nice tools such as JupyterLab, Jupyter Notebook, Spyder, Glueviz, Visual Studio […]

The post Working With Jupyter Notebook appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

Azure RBAC vs Azure Policy vs Azure Blueprints

Online Apps DBA - Wed, 2021-09-22 23:47

➽ The blog post – https://k21academy.com/az30419 will cover an overview of Azure governance, its services, and their differences. ➽ What comes to your mind when you see the word Governance? Is it a rule, or is it a policy? Whatever it may be, don't you think a company needs governance to run effectively and efficiently? […]

The post Azure RBAC vs Azure Policy vs Azure Blueprints appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

B2B data sharing

Tom Kyte - Wed, 2021-09-22 19:26
Hi guys, I have a little question on how B2B data sharing works. Say I am the department of home affairs and I want to share the data I hold (data that lives in an Oracle db) with other departments of the country, especially the data about HR. So how to go about that: - give SQL*Plus access to each department? - create a web service that serves the desired data? What's commonly done in this kind of situation? Thanks in advance. Amine
Categories: DBA Blogs

Python for Data Science

Online Apps DBA - Wed, 2021-09-22 06:47

Data Science ⯮ Data science is a field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. Python ⯮ Python is a general-purpose, object-oriented, high-level programming language. Its design philosophy emphasizes code readability. Python can handle every job, from data cleaning to data visualization to website development to programming embedded systems. Why […]

The post Python for Data Science appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

[New Update] Oracle Cloud (OCI) Support Center

Online Apps DBA - Wed, 2021-09-22 06:40

If you work with Oracle Cloud, use its services, and need support from Oracle: earlier, customers could use My Oracle Support from a non-cloud platform only, but now you can raise & manage service requests from the OCI console. Oracle launched the Oracle Cloud Infrastructure (OCI) Support Center on July 8th, 2021, where Cloud support has […]

The post [New Update] Oracle Cloud (OCI) Support Center appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

Text mining in R

Rittman Mead Consulting - Wed, 2021-09-22 05:04

As data becomes increasingly available in the world today, the need to organise and understand it also increases. Since around 80% of data out there is in unstructured format, text mining becomes an extremely valuable practice for organisations to generate helpful insights and improve decision-making. So, I decided to experiment with some data in the programming language R and its text mining package “tm” – one of the most popular choices for text analysis in R – to see how helpful insights drawn from the social media platform Twitter were in understanding people’s sentiment towards the US elections in 2020.

What is Text Mining?

For machines to understand human language and extract meaning from unstructured data, that data must first be interpreted – a task known as natural language processing (NLP), a branch of machine learning. Text mining uses NLP techniques to transform unstructured data into a structured format for identifying meaningful patterns and new insights.

A fitting example would be social media data analysis; since social media is becoming an increasingly valuable source of market and customer intelligence, it provides us with raw data to analyse and predict customer needs. Text mining can also help us extract the sentiment behind tweets and understand people’s emotions towards what is being sold.

Setting the scene

That brings us to my analysis of a dataset of tweets made about the US elections that took place in 2020. Over a million tweets were made about Donald Trump and Joe Biden, which I put through R’s text mining tools to draw some interesting analytics and see how they measure up against the actual outcome – Joe Biden’s victory. My main aim was to perform sentiment analysis on these tweets to gauge what US citizens were feeling in the run up to the elections, and whether there was any correlation between these sentiments and the election outcome.

I found the Twitter data on Kaggle, containing two datasets: one of tweets made on Donald Trump and the other, Joe Biden. These tweets were collected using the Twitter API where the tweets were split according to the hashtags ‘#Biden’ and ‘#Trump’ and updated right until four days after the election – when the winner was announced after delays in vote counting. There was a total of 1.72 million tweets, meaning plenty of words to extract emotions from.

The process

I will outline the process of transforming the unstructured tweets into a more intelligible collection of words, from which sentiments could be extracted. But before I begin, there are some things I had to think about for processing this type of data in R:

1. Memory space – Your laptop may not provide the memory space you need for mining a large dataset in RStudio Desktop. I used RStudio Server on my Mac to access a more powerful CPU for the size of data at hand.

2. Parallel processing – I first used the ‘parallel’ package as a quick fix for memory problems encountered while creating the corpus. But I continued to use it for improved efficiency even after moving to RStudio Server, as it still proved useful.

3. Every dataset is different – I followed a clear guide on sentiment analysis posted by Sanil Mhatre. But I soon realised that although I understood the fundamentals, I would need to follow a different set of steps tailored to the dataset I was dealing with.

First, all the necessary libraries were loaded to run the various transformation functions: tm, wordcloud and syuzhet for the text mining processes; stringr for stripping symbols from tweets; parallel for parallel processing of memory-consuming functions; and ggplot2 for plotting visualisations.
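A minimal setup sketch along those lines:

<code># Install once if needed:
# install.packages(c("tm", "wordcloud", "syuzhet", "stringr", "ggplot2"))
library(tm)        # corpus creation and cleaning
library(wordcloud) # wordcloud plots
library(syuzhet)   # NRC emotion lexicon scoring
library(stringr)   # string handling while stripping symbols
library(parallel)  # parallelised lapply for heavy steps
library(ggplot2)   # sentiment bar charts</code>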

I worked on the Biden dataset first and planned to implement the same steps on the Trump dataset, provided everything went well the first time round. The first dataset was loaded in and stripped of all columns except the tweet column, as I aimed to use just the tweet content for sentiment analysis.
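A sketch of that loading step; the file and column names here are assumptions based on the Kaggle dataset’s description, not confirmed by the post:

<code># Hypothetical file/column names from the Kaggle dataset
biden_raw <- read.csv("hashtag_joebiden.csv", stringsAsFactors = FALSE)
tweets <- biden_raw$tweet  # keep only the tweet text column</code>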

The next steps require parallelising computations. First, clusters were set up based on (the number of processor cores – 1) available in the server; in my case, 8-1 = 7 clusters. Then, the appropriate libraries were loaded into each cluster with ‘clusterEvalQ’ before using a parallelised version of ‘lapply’ to apply the corresponding function to each tweet across the clusters. This is computationally efficient regardless of the memory space available.
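A sketch of that cluster setup with the parallel package:

<code>n_cores <- detectCores() - 1   # 8 - 1 = 7 workers on the server described above
cl <- makeCluster(n_cores)
clusterEvalQ(cl, library(tm))  # load required libraries on every worker
# parLapply(cl, tweets, FUN) then applies FUN to each tweet across the workers
# stopCluster(cl) releases the workers when done</code>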

So, the tweets were first cleaned by filtering out the retweet, mention, hashtag and URL symbols that cloud the underlying information. I created a larger function wrapping all the relevant substitution functions, each replacing a different symbol with a space character. This function was parallelised as some of the ‘gsub’ calls are inherently time-consuming.
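A sketch of such a cleaning function; the exact patterns are my assumptions, not the author’s:

<code># Assumed regex patterns for each symbol type
clean_tweet <- function(x) {
  x <- gsub("RT\\s+", " ", x)            # retweet markers
  x <- gsub("@\\w+", " ", x)             # mentions
  x <- gsub("#\\w+", " ", x)             # hashtags
  x <- gsub("http\\S+|www\\S+", " ", x)  # URLs
  x
}
cleaned <- parLapply(cl, tweets, clean_tweet)</code>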

A corpus of the tweets was then created, again with parallelisation. A corpus is a collection of text documents (in this case, tweets) organised in a structured format. ‘VectorSource’ interprets each element of the character vector of tweets as a document before ‘Corpus’ organises these documents, preparing them to be cleaned further using some functions provided by tm. Steps to further reduce the complexity of the corpus text being analysed included: converting all text to lowercase, removing any residual punctuation, stripping the whitespace (especially that introduced in the customised cleaning step earlier), and removing English stopwords that do not add value to the text.
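A simplified serial sketch of those tm steps (the post runs them in parallel):

<code>corpus <- Corpus(VectorSource(unlist(cleaned)))
corpus <- tm_map(corpus, content_transformer(tolower))       # lowercase
corpus <- tm_map(corpus, removePunctuation)                  # residual punctuation
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # English stopwords
corpus <- tm_map(corpus, stripWhitespace)                    # collapse extra spaces</code>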

The corpus then had to be converted into a matrix, known as a Term Document Matrix, describing the frequency of terms occurring in each document. The rows represent terms, and the columns documents. This matrix was still too large to process further without removing sparse terms, so a sparsity level of 0.99 was set and the resulting matrix only contained terms appearing in at least 1% of the tweets. It then made sense to sum each term’s occurrences across the tweets and create a data frame of the terms against their calculated cumulative frequencies.

Initially I experimented only with wordclouds, to get a sense of the output words. Upon observation, I realised common election terminology and US state names were also clouding the tweets, so I filtered out a character vector of them, i.e. ‘trump’, ‘biden’, ‘vote’, ‘Pennsylvania’ etc., alongside common Spanish stopwords, without adding an extra translation step. My criterion was to remove words that would not logically fit under any NRC sentiment category (see below). This removal method worked better than the one tm provides, which proved useless here and filtered none of the specified words. It was useful to watch the wordcloud distribution change as I removed the corresponding words; I started to understand whether the outputted words made sense regarding the elections and the process they were put through.
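A sketch of the matrix, frequency-table and wordcloud steps; the banned-word vector below is illustrative, not the full list used in the post:

<code>tdm <- TermDocumentMatrix(corpus)
tdm <- removeSparseTerms(tdm, 0.99)  # keep terms present in at least ~1% of tweets
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
df <- data.frame(word = names(freqs), freq = freqs)

# Illustrative custom stoplist: election terms, state names, Spanish stopwords
banned <- c("trump", "biden", "vote", "pennsylvania", stopwords("spanish"))
df <- df[!df$word %in% banned, ]

wordcloud(df$word, df$freq, max.words = 100)</code>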

The entire process was executed several times, involving adjusting parameters (in this case: the sparsity value and the vector of forbidden words), and plotting graphical results to ensure its reliability before proceeding to do the same on the Trump dataset. The process worked smoothly and the results were ready for comparison.

The results

First on the visualisation list were wordclouds – a compact display of the 100 most common words across the tweets, as shown below.

[Figures: Joe Biden's analysis wordcloud and barplot; Donald Trump's analysis wordcloud and barplot]

The bigger the word, the greater its frequency in tweets. Briefly, it appears the word distribution for both parties is moderately similar, with the biggest words being common across both clouds. This can be seen in the bar charts on the right, with the only differing words being ‘time’ and ‘news’. A few European stopwords that tm failed to filter remain in both corpora, the English ones being the more frequent. However, some of the English ones can be useful sentiment indicators, e.g. ‘can’ could indicate trust. Some smaller words are less valuable as they cause ambiguity in categorisation without a clear context, e.g. ‘just’, ‘now’, and ‘new’, which may be coming from ‘new york’ or pointing to anticipation for the ‘new president’. Nonetheless, there are some reasonable connections between the words and each candidate; some words in Biden’s cloud do not appear in Trump’s, such as ‘victory’, ‘love’ and ‘hope’. ‘Win’ is bigger in Biden’s cloud, whilst ‘white’ is bigger in Trump’s cloud, as are occurrences of ‘fraud’. Although many of the terms lack the context for us to base full judgement upon, we already get a sense of the kind of words being used in connection with each candidate.

Analysing further, emotion classification was performed to identify the distribution of emotions present in the run up to the elections. The syuzhet library adopts the NRC Emotion Lexicon – a large, crowd-sourced dictionary of words tallied against eight basic emotions and two sentiments: anger, anticipation, disgust, fear, joy, sadness, surprise and trust, plus negative and positive. The terms from the matrix were tallied against the lexicon and the cumulative frequency was calculated for each sentiment. Using ggplot2, a comprehensive bar chart was plotted for both datasets, as shown below.
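A sketch of that scoring and plotting step with syuzhet and ggplot2; weighting each term’s NRC scores by its tweet frequency is my assumption about how the tallying was done:

<code>nrc <- get_nrc_sentiment(as.character(df$word))  # ten NRC categories per term
totals <- colSums(nrc * df$freq)                 # assumed: weight scores by term frequency
sentiments <- data.frame(sentiment = names(totals), count = as.numeric(totals))

ggplot(sentiments, aes(x = sentiment, y = count)) +
  geom_col() +
  labs(title = "NRC sentiment distribution", x = NULL, y = "Cumulative frequency")</code>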

[Figure: side-by-side comparison of Biden's and Trump's sentiment distributions]

Some revealing insights can be drawn here. Straight away, there is an absence of anger and disgust in Biden’s plot, whilst anger is very much present in Trump’s. There is 1.6 times more positivity and 2.5 times more joy pertaining to Biden, as well as twice the amount of trust and 1.5 times more anticipation about his potential. This is strong data supporting him. Feelings of fear and negativity, however, are equal in both; perhaps the audience feared the other party would win, or even what America’s future holds under either outcome. There was also twice the sadness and surprise pertaining to Biden, which makes me wonder if citizens were expressing the emotions they would feel if Trump won (since the datasets were only split based on hashtags), alongside being genuinely sad or surprised that Biden was one of their options.

In the proportional bar charts, there is a wider gap between positivity and negativity regarding Biden than Trump, meaning a lower proportion of people felt negatively about Biden. On the other hand, there is still around 13% trust in Trump, and a higher proportion of anticipation about him. Only around 4% of the words express sadness and surprise for him, which is around 2% lower than for Biden – intriguing. We must also remember to factor in the period after the polls opened, when the results were being updated and broadcast, which may have affected people’s feelings – surprise and sadness may have risen for both Biden and Trump supporters whenever Biden took the lead. Also, a higher proportion feared Trump’s position, and anger may have crept in as Trump’s support coloured the bigger states.

[Figures: proportional distribution of Biden-related sentiments; proportional distribution of Trump-related sentiments; side-by-side comparison of the proportional sentiment distributions]

Conclusions
Being on the other side of the outcome, it is more captivating to observe the distribution of sentiments across Twitter data collected through the election period. Most patterns we observed in the data allude to predicting Joe Biden as the next POTUS, with a few exceptions where a couple of negative emotions were also felt regarding the current president; naturally, not everyone will be fully confident in every aspect of his pitch. Overall, however, we saw clear anger only towards Trump, along with less joy, trust and anticipation. These visualisations, plotted using R’s tm package in a few lines of code, helped us draw compelling insights that supported the actual election outcome. It is indeed impressive how easily text mining can be performed in R (once you have the technical aspects figured out) to create inferential results almost instantly.

Nevertheless, there were some limitations. We must consider that since the tweets were split according to the hashtags ‘#Biden’ and ‘#Trump’, there is a possibility the same tweets appear in both datasets. This may mean an overspill of emotions towards Trump in the Biden dataset and vice versa. Also, the analysis would have been clearer if we had contextualised the terms’ usage; maybe considering phrases instead would build a better picture of what people were feeling. Whilst plotting the wordclouds, each time I filtered out a few foreign stopwords more crept into the cloud, which calls for a more solid translation step before removing stopwords, so that all terms would then be in English. I also noted that despite trying to remove the “ ’s” term, which was in the top 10, it still filtered through to the end, serving as an anomaly in this experiment as every other word in my custom vector was removed.

This experiment can be considered a success for an initial dip into the world of text mining in R, seeing that there is relatively strong correlation between the prediction and the outcome. There are several ways to improve this data analysis which can be aided with further study into various areas of text mining, and then exploring if and how R’s capabilities can expand to help us achieve more in-depth analysis.

My code for this experiment can be found here.

Categories: BI & Warehousing

Adding transform=oid:n to import using SQL Developer Import Wizard

Tom Kyte - Wed, 2021-09-22 01:06
Hi, I know how to use transform=oid:n in a CMD box using impdp. Is there a way to add that parameter to the import when using the Datapump Import Wizard in SQL Developer? Thanks.
Categories: DBA Blogs

Certified Kubernetes Administrator | Day 1 & Day 2 Training Concepts

Online Apps DBA - Tue, 2021-09-21 02:45

K8s Architecture, Components, Installation, and Networking: Kubernetes is an open-source container orchestration tool. It can automate processes such as deploying and managing containerized applications. Kubernetes follows a very straightforward yet flexible architecture. It consists of master nodes and worker nodes. The master communicates with the worker nodes with the help of API servers. Kubernetes […]

The post Certified Kubernetes Administrator | Day 1 & Day 2 Training Concepts appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

Schedule And Automate SQL Server DB Replication

Online Apps DBA - Tue, 2021-09-21 01:57

In this blog, I have covered the steps to Schedule And Automate SQL Server DB Replication between different servers [SQL Server 2017 or Windows Server 2016]. We cover this in detail in our Azure Database Administrator [DP-300] Training program. Introduction: SQL Server DB replication is a technology that is used to copy or distribute data […]

The post Schedule And Automate SQL Server DB Replication appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

Top 50 Data Science Interview Questions

Online Apps DBA - Tue, 2021-09-21 00:52

Are you looking for some Data Science sample questions to practice for your interview? Here is a list of the best Data Science interview questions. Data Scientist ➪ A Data Scientist aspirant should have deep knowledge of, and experience with, core data science concepts. These top interview questions will help them polish their skills before any […]

The post Top 50 Data Science Interview Questions appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

AWS Solutions Architect | Day 3 Review and Q/A: Security Management on AWS [SAA-C02]

Online Apps DBA - Tue, 2021-09-21 00:12

AWS is a cloud computing platform that offers various compute services to build, test, store data, and deploy applications. AWS Identity and Access Management (IAM) ➪ helps you manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups and grant specific permissions. AWS Shield ➪ is […]

The post AWS Solutions Architect | Day 3 Review and Q/A: Security Management on AWS [SAA-C02] appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

CSV scripts worked well with 12c but in 19c, results are different - spaces are added before and after values!

Tom Kyte - Mon, 2021-09-20 12:26
Hi all, It's my first post. We are facing a problem in 19c when generating a csv file. In 12c, results were as expected. Spaces are added after or before the content of the column. Please find attached the information about the environment.

-- tables structure
<code>Column Name               Type       TABLE
-----------------         --------   -----------------------------
CODE_PORTEFEUILLE         CHAR(6)    EXPIT.inv_pool_prod_tmp
HISTO_DATE_VALORISATION   NUMBER(6)  GP2PROD.gestion_valorisation</code>

-- conversion date function
<code>CREATE FUNCTION "GP2PROD"."CONVERTION_DATE" (cd_date number) return date is
  cd_date2 date;
BEGIN
  cd_date2 := TO_DATE('14/09/1752','DD/MM/YYYY') + abs(cd_date);
  return (cd_date2);
END;</code>

-- script
<code>set heading off pagesize 0 linesize 0
set TRIMSPOOL ON
set RECSEP off
set verify off

CREATE TABLE inv_pool_prod_tmp as
select distinct dp.code_portefeuille
from descriptif_comp_portefeuille dcp, descriptif_portefeuille dp, contenu_ensemble_port cep
where dcp.flag_pool <> ' '
and dcp.code_portefeuille = dp.code_portefeuille
and dp.objectif_portefeuille <> ' '
and cep.code_portefeuille = dp.code_portefeuille
and cep.code_ensemble_port <> 'OPCLIQ'
group by dp.code_portefeuille
order by dp.code_portefeuille;

spool $GPFETAT/ctrl_nbr_pool_prod.csv

select B.CODE_PORTEFEUILLE, ';', convertion_date(max(B.HISTO_DATE_VALORISATION)) as premiere_valo
from inv_pool_prod_tmp a, gestion_valorisation b
where a.code_portefeuille = b.code_portefeuille
group by b.code_portefeuille
order by b.code_portefeuille;

spool off;
quit;</code>

-- as you can see, spaces are added after the value in the first column and before the value in the second column.
<code>more ctrl_nbr_pool_prod.csv
100801 ; 31-DEC-2014
100804 ; 31-DEC-2014
100805 ; 31-DEC-2014
100806 ; 31-DEC-2014
100809 ; 31-DEC-2014
100810 ; 31-DEC-2014
100811 ; 14-JUN-2016
100812 ; 14-JUN-2016
100813 ; 14-JUN-2016
100814 ; 14-JUN-2016
100815 ; 30-JUN-2016
100816 ; 30-JUN-2016
100817 ; 01-JUN-2017
126401 ; 10-NOV-2017</code>

Thanks in advance for your reply.
Categories: DBA Blogs

Everything you need to know about Power BI Service

Online Apps DBA - Mon, 2021-09-20 08:00

This blog walks you through the Power BI Service and everything you need to know about it. Microsoft Power BI Service: The Power BI Service is a cloud-based☁️ service where users view and interact with reports. The Microsoft Power BI Service, also referred to as Power BI Online, is the SaaS (Software […]

The post Everything you need to know about Power BI Service appeared first on Oracle Trainings for Apps & Fusion DBA.

Categories: APPS Blogs

RabbitMQ on Kubernetes in Skipper

Andrejus Baranovski - Mon, 2021-09-20 07:51
RabbitMQ works great for event-based microservices implementations. Katana ML Skipper is our open-source product; it helps to run workflows and connect multiple services, and we specialise specifically in ML workflows. In this video, I explain how we integrated RabbitMQ and how we run it on a Kubernetes cluster. I believe this can be helpful if you are researching how to run RabbitMQ on a Kubernetes cluster for your own use cases.
